
Using AI for improving the future of the Web

I found myself inspired to think about where the World Wide Web should go in utilizing Artificial Intelligence.

I would like to tackle this from the information security front. Like many security people as they mature, I have come to see security as a problem that goes beyond technology. I am not referring here to non-technical solutions for technical security problems (like user awareness training), but to the other way around: the non-technical security risks to people and societies that may call for technical solutions.

The first decade or two of the Web were devoted to making this information highway faster, more accessible, and capable of supporting advanced information-provisioning use cases (audio, video, interaction, etc.). The following decade or two were devoted to increasing the amount and diversity of that information and its sources, with Web 2.0 and user-generated content replacing the traditional waterfall model. The upcoming years should, in my opinion, be dedicated to improving the Web in areas that emerged as challenges created by our successes of the first 30 years.

The current Achilles' heel of the Web is no longer its speed or the amount of content, nor the ease with which new content gets created and circulated. Where it falls short is in content quality and suitability from the perspective of its consumers. Consumers vary, and net neutrality is king, but for any given consumer, most of the Web is noise, if not outright harmful. This problem is not just about user experience. Pollution of the public discourse is a threat to democracy, and dilution of the subjective value of web content to its users is a threat to the Web's overall usefulness and future. Our phenomenal success in making content so easy to produce and distribute, whatever this content is, plants the seeds of the biggest threat to both the Web's usability and to some of the societal norms that we appreciate. Artificial Intelligence, if deployed properly, can help us overcome the difficulties posed by the information overload and quality dilution that our success in implementing the Web has caused.

There are many reasons for content to be irrelevant to some users, and those reasons are always subjective. Even machine-generated content that we may classify as "noise" or "propaganda" may not be considered as such by some people, who may enjoy consuming it. Furthermore, the moment we architect judgmental models into the fabric of the Web, we put an inevitable end to much of what the Web is and should remain. Our solution should focus on enabling each user to overcome information overload and cope with the majority of web content being irrelevant; irrelevant by that user's own standard.

There is a precedent for this situation. Decades ago, we coped with a simplified version of the "mostly irrelevant content" problem. Then, too, the problem was highlighted by the high accessibility we built into our Internet protocols, and then, too, we solved it using (a simple version of) machine learning. The e-mail transfer protocol, SMTP, was created without much security built into it. Most notably, anyone can send an e-mail appearing to come from just about anyone else. Soon enough, the e-mail system was swamped by spam messages, which at one point accounted for almost 93% of all e-mail traffic. This was an existential threat to e-mail. Fortunately, Bayesian filtering was invented and implemented in SpamAssassin and other tools at different levels. This type of filtering, unlike simple static rule-based filtering, learns over time what types of genuine e-mails the user gets, and after a short period of being corrected by the user and the infrastructure operator, it reaches impressive filtering accuracy. It is only thanks to this mechanism (and others that followed) that we no longer consider spam mail a problem worth worrying about.
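To make the precedent concrete, here is a minimal sketch of the kind of naive Bayesian classifier that underlies tools like SpamAssassin's statistical engine. The class name, training data, and word-level tokenization are illustrative choices of mine, not a description of any particular tool's internals; real filters add many refinements (token hashing, header analysis, score thresholds).

```python
from collections import Counter
import math

class NaiveBayesFilter:
    """A toy Bayesian spam filter: it learns word frequencies per class
    from user-corrected examples, then scores new messages."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # The user "standing corrected" the filter: label is spam or ham.
        self.msg_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        # Naive Bayes in log space with add-one (Laplace) smoothing.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.msg_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            score = math.log(self.msg_counts[label] / total_msgs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            scores[label] = score
        # Normalize the two log scores into a spam probability.
        m = max(scores.values())
        exp = {k: math.exp(v - m) for k, v in scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])
```

The key property the post relies on is visible even in this sketch: the filter is trained per deployment, so what counts as "spam" is learned from each user's own corrections rather than from a global rulebook.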

It is time to expand this model dramatically. The challenge now is not just e-mail but web content in general: we no longer need to filter spam from ham (to use spam-filtering jargon) but to rank the relevance and appropriateness of content, and we need to account for different users who have entirely different definitions of content relevance or appropriateness, definitions which even they cannot always articulate. If we do this properly, we stand a chance of making today's massive amounts of diverse web content as attention-worthy to its consumers as web content used to be when it was carefully produced and consumed under the amount- and diversity-limiting waterfall model.
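One way to picture the shift from binary filtering to per-user ranking is a simple interest profile: accumulate a vector from content the user engaged with, then order candidate items by similarity to it. This is a deliberately crude sketch of my own, assuming word-count vectors and cosine similarity; the models the post envisions would be far richer, but the structure (learn per user, rank rather than block) is the same.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class UserProfile:
    """Accumulates a per-user interest vector from content the user
    engaged with, then ranks candidates by similarity to it.
    Nothing is discarded outright: everything is ranked, and the
    definition of 'relevant' is the user's own."""

    def __init__(self):
        self.profile = Counter()

    def record_interest(self, text):
        self.profile.update(vectorize(text))

    def rank(self, candidates):
        return sorted(candidates,
                      key=lambda t: cosine(self.profile, vectorize(t)),
                      reverse=True)
```

Note the design choice this toy shares with the post's argument: relevance is computed against a model owned by (and learned from) the individual consumer, not against a judgmental model baked into the Web's fabric.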

If I keep thinking about this, I will add future posts with further thoughts...

Comment by Raphael Bar-El:

Excellent perspective and approach. The case of e-mail spam is an excellent illustration. The next and complex step is how to make it adapt to personal needs.
