The semantic web

Today a number of major search engines, such as Google, Ask Jeeves and HotBot, are the main point of entry to the Web for many users.

Unfortunately their value has decreased as the Internet has expanded, because far too many "hits" are returned in response to even the simplest of queries. No one has the time to scan through thousands of possible answers, particularly as most of them are in fact irrelevant. As a result the order in which the returns are listed becomes all-important, and so it is who is found first that matters. That in turn means that those who are favoured most by the search engines (those who pay enough, often enough) win. This is not what the Internet is all about; it is supposed to be impartial.
The root of the problem lies in HTML. HTML uses a fixed set of tags, which simplifies the creation of Web pages. This was the correct approach in the first place because it got things moving, but the fixed tags mean that there is no flexibility in the mark-up language. HTML should therefore have been replaced in the early days by a fully functional mark-up language. Unfortunately things developed too fast and HTML became too entrenched to be easily replaced. There was a mark-up language already in existence which could have been used: SGML. In fact HTML was inspired by SGML, but SGML, which was developed for the publishing industry, was too complex. That made the development of tools difficult, which further encouraged the entrenchment of HTML. In any case SGML, while much more appropriate than HTML, was not ideal for the Web.
Fortunately the World Wide Web Consortium (W3C) was by then well established as the authority for Web-related standards, and it produced a new standard, more closely related to SGML than HTML was, but both appropriately simplified and extended to meet the requirements of electronic (as opposed to printed paper) documents. This standard, XML, is equally suited to Web pages and to documents. Unfortunately XML arrived too late to stop HTML from becoming entrenched.
The key difference between HTML and XML with respect to the Web is that XML tags can be used to define meaning, hence the term "Semantic Web", so that searches could be directed and would return a much shorter, more useful list of hits. The amusing example is a search for "apple": a search of HTML pages returns pages about Apple computers as well as pages about apple trees and cider, while a Semantic (XML) Web would allow a search for a company named Apple as opposed to the fruit. Unfortunately this needs all the Web pages to be properly marked up in XML, but they are all in HTML.
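The "apple" example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three pages, the file names and the tag names (page, company, fruit) are invented here and do not come from any real vocabulary. It contrasts a plain keyword search, which matches the word wherever it appears, with a search directed at a tag that carries meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages; the element names <company> and <fruit> are
# invented for this illustration and are not part of any standard.
PAGES = {
    "page1.xml": "<page><company>Apple</company> makes computers.</page>",
    "page2.xml": "<page>An <fruit>apple</fruit> tree yields fruit for cider.</page>",
    "page3.xml": "<page>The <company>Acme</company> canteen serves <fruit>apple</fruit> pie.</page>",
}


def keyword_search(pages, word):
    """Plain text search: match the word anywhere, whatever it means."""
    word = word.lower()
    hits = []
    for name, source in pages.items():
        # itertext() yields all the text in the page with the tags ignored.
        text = "".join(ET.fromstring(source).itertext()).lower()
        if word in text:
            hits.append(name)
    return sorted(hits)


def semantic_search(pages, tag, word):
    """Directed search: match only when the word is the content of `tag`."""
    word = word.lower()
    hits = []
    for name, source in pages.items():
        root = ET.fromstring(source)
        if any((el.text or "").strip().lower() == word for el in root.iter(tag)):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(keyword_search(PAGES, "apple"))
    print(semantic_search(PAGES, "company", "apple"))
    print(semantic_search(PAGES, "fruit", "apple"))
```

The keyword search returns every page, while the directed search separates the company from the fruit. It only works, of course, because every page in the sketch has been marked up consistently, which is exactly what the existing Web lacks.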
Is there any chance for the current Web to be replaced by a Semantic Web? The sad answer in the short and medium term is no, and so the Web will continue to be frustrating and unable to fulfil its promise. It is not so much a question of technology any more, since there is now a wealth of knowledge of XML and a good selection of products. By a fortunate coincidence the Web also created another problem as e-commerce applications were added to the original information dissemination concept. In order to make it possible for incompatible systems to interoperate, a formal method for formatting messages was needed. This is in fact simply an example of marking up a document, albeit a short one such as an invoice line, and so XML has become established, but not yet for Web pages.
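An invoice line marked up as a short XML message might look like the following Python sketch. The element names (invoiceLine, sku, description, quantity, unitPrice), the currency attribute and the sample values are all hypothetical and are not taken from any real e-commerce standard; the point is only that the receiving system needs to know the tags, not the internals of the sending system.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def build_invoice_line(sku, description, quantity, unit_price):
    """Sender side: mark up one invoice line as an XML message.

    All element and attribute names are hypothetical.
    """
    line = ET.Element("invoiceLine")
    ET.SubElement(line, "sku").text = sku
    ET.SubElement(line, "description").text = description
    ET.SubElement(line, "quantity").text = str(quantity)
    # The currency is fixed to GBP here purely to keep the sketch short.
    ET.SubElement(line, "unitPrice", currency="GBP").text = str(unit_price)
    return ET.tostring(line, encoding="unicode")


def read_invoice_line(message):
    """Receiver side: recover the fields from the tags alone."""
    line = ET.fromstring(message)
    quantity = int(line.findtext("quantity"))
    price = line.find("unitPrice")
    unit_price = Decimal(price.text)
    return {
        "sku": line.findtext("sku"),
        "description": line.findtext("description"),
        "quantity": quantity,
        "currency": price.get("currency"),
        "total": quantity * unit_price,
    }


if __name__ == "__main__":
    message = build_invoice_line("A-100", "Cider, 1 litre", 3, Decimal("2.50"))
    print(message)
    print(read_invoice_line(message))
```

Decimal is used rather than float so that the money arithmetic on the receiving side stays exact.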

Martin Healey is a pioneer in the development of Intel-based computers and client/server architecture. He is a director of a number of IT specialist companies and an Emeritus Professor of the University of Wales.