Say that you want to find the best price on the Web for that new Kate Bush album. Hell, say you just want to find it. If you had the skill and the inclination, you might write a special-purpose Web crawler that would search out sites like CDNow.com and Tunes.com that sell music online, query them for their price on the album, and then assemble the results for you. Finally, you might have your computer scan the online classifieds to see if anybody in your neighborhood is trying to sell a used copy of the disc at half price.
Although you could write such a program today, it would be a tremendous undertaking. That's because every online shopping site is different; each has its own way of searching for discs and its own way of displaying prices. As for hunting through the classifieds, unless you have a degree in artificial intelligence and natural-language processing, you'll have a hard time writing a computer program that can pick through all that noise to find some meaningful signal.
Today's Web is filled with online information. What's missing is data that describes the data - metadata.
Metadata is more than some new set of HTML tags that says things like "this is a CD title" and "this is a price." As envisioned by Tim Berners-Lee, the inventor of the Web and the director of the World Wide Web Consortium, metadata would be a comprehensive set of standards for describing data about data.
For example, CDNow.com might create a standard set of HTTP queries for searching the company's database and a standard template for sending the data back. Other companies could then implement those same standards. Pretty soon, building a program that could scan the Web for the best prices on discs wouldn't be too hard at all.
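To make that concrete, here is a minimal sketch of what such a price-scanning program could look like once every store answers in the same template. Everything specific here is an assumption for illustration: the JSON record layout, the field names, the canned responses (no network calls are made), and the Tunes.com price, which is invented. Nothing in it reflects a format either company actually published.

```python
import json
from decimal import Decimal

# Hypothetical canned responses. In this sketch each store is assumed to
# answer the same standard query with the same agreed-upon record layout.
# The field names and the Tunes.com price are made up for illustration.
RESPONSES = {
    "CDNow.com": '{"artist": "Kate Bush", "title": "Hounds of Love", '
                 '"price": "9.95", "currency": "USD"}',
    "Tunes.com": '{"artist": "Kate Bush", "title": "Hounds of Love", '
                 '"price": "11.49", "currency": "USD"}',
}


def parse_offer(store, body):
    """Turn one store's standard-format reply into a comparable offer."""
    record = json.loads(body)
    return {
        "store": store,
        "title": record["title"],
        "price": Decimal(record["price"]),  # Decimal avoids float rounding on money
        "currency": record["currency"],
    }


def best_offer(responses, title):
    """Return the cheapest offer for `title`, or None if no store lists it."""
    offers = [parse_offer(store, body) for store, body in responses.items()]
    matching = [offer for offer in offers if offer["title"] == title]
    return min(matching, key=lambda offer: offer["price"]) if matching else None


if __name__ == "__main__":
    offer = best_offer(RESPONSES, "Hounds of Love")
    print(offer["store"], offer["price"], offer["currency"])
```

The point of the sketch is that one `parse_offer` function serves every store. Without a shared template, each site would need its own hand-written parser.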
"The long-term objective is the automatable Web - basically, to put machine-readable information on the Web," Berners-Lee says. "It could have a very revolutionary effect."
Berners-Lee's vision is that metadata will be used to describe different kinds of provable assertions. One kind of assertion might be "We offer the Hounds of Love CD for US$9.95." A real-estate agency's Web site might have assertions such as "This house has four bedrooms." Consumers Union might have an assertion such as "This product is a CU Best Buy." Special metadata documents on the Web would describe the syntax of the assertions and what's meant by the vocabulary they use. It's likely that these ontologies won't be created by industry leaders, but by renegades who are trying to attract customers by offering consistently lower prices. Once one company starts offering online information in machine-readable form, others can follow in its footsteps using the same ontology. Pretty soon, even the industry leaders will be forced to compete on price, service, and selection - rather than on glitzy online graphics.
Here's FOLDOC's definition of ontology and how it might be used:
ontology -
1. n. [artificial intelligence (AI) - from philosophy]
An explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
For AI systems, what "exists" is that which can be represented. When the knowledge about a domain is represented in a declarative language, the set of objects that can be represented is called the universe of discourse. We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the universe of discourse (e.g. classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.
A set of agents that share the same ontology will be able to communicate about a domain of discourse without necessarily operating on a globally shared theory. We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology. The idea of ontological commitment is based on the Knowledge-Level perspective.
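A rough sketch may help show how a shared ontology and the kinds of assertions described earlier fit together. It assumes a toy ontology that maps each term to a human-readable meaning and an expected value type, and it treats "committing" to the ontology as using only defined terms with well-formed values. The term names and the checking rule are hypothetical and far simpler than the formal axioms the definition has in mind.

```python
# Hypothetical ontology: each term gets a human-readable meaning plus one
# simple constraint (the expected value type). Term names are invented.
ONTOLOGY = {
    "offers_price_usd": {
        "meaning": "The seller offers the item at this price in US dollars",
        "value_type": float,
    },
    "bedrooms": {
        "meaning": "Number of bedrooms in the house",
        "value_type": int,
    },
}


def commits_to(ontology, assertions):
    """Return True if every (subject, term, value) assertion uses a term the
    ontology defines and a value of the type the ontology expects."""
    for _subject, term, value in assertions:
        spec = ontology.get(term)
        if spec is None or not isinstance(value, spec["value_type"]):
            return False
    return True


# Assertions from three imaginary publishers.
store = [("Hounds of Love CD", "offers_price_usd", 9.95)]
agency = [("This house", "bedrooms", 4)]
rogue = [("This house", "has_pool", True)]  # "has_pool" is not a defined term

if __name__ == "__main__":
    for name, assertions in [("store", store), ("agency", agency), ("rogue", rogue)]:
        print(name, commits_to(ONTOLOGY, assertions))
```

Two programs that both pass this check can exchange assertions without knowing anything else about each other, which is the practical payoff of a shared vocabulary.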
Metadata could also enable Web browsers to assign relative credibility to the information they display.
"We have been saying for many years that we should have an 'oh yeah?' button on the browser," says Berners-Lee. Click this button and the browser will try to construct some kind of proof, based on metadata contained on the Web, of why you should believe the information on the screen. For example, if you clicked the "oh yeah?" button on Tim's own homepage some day in the future, your browser might come back with a chain of justifications like this:
"You should believe what's on this Web page because it's signed with Tim's digital signature, and Tim's Digital ID is on a list of MIT research affiliates that's signed by the master key of the Massachusetts Institute of Technology, and MIT's master key is signed with the VeriSign Class III CA key, which you trust."
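The chain of justifications in that example can be sketched as a walk from the page up to a key the user already trusts. This is only an illustration under loud assumptions: no cryptography is performed, each "signature" is a plain lookup in a table, and the step through MIT's list of research affiliates is collapsed into a single link. A real browser would have to verify an actual digital signature at every step.

```python
# "X is vouched for by Y" links taken from the example above. In this sketch
# a link is just a dictionary entry; real signatures would be verified
# cryptographically (that part is deliberately left out).
SIGNED_BY = {
    "Tim's homepage": "Tim's Digital ID",
    "Tim's Digital ID": "MIT master key",
    "MIT master key": "VeriSign Class III CA key",
}

# Keys the user has chosen to trust outright.
TRUSTED = {"VeriSign Class III CA key"}


def justify(item, signed_by, trusted, max_depth=10):
    """Follow signer links from `item` until a trusted key is reached.

    Returns the chain as a list, or None if the chain breaks, loops,
    or grows longer than `max_depth`.
    """
    chain = [item]
    while chain[-1] not in trusted:
        signer = signed_by.get(chain[-1])
        if signer is None or signer in chain or len(chain) >= max_depth:
            return None
        chain.append(signer)
    return chain


if __name__ == "__main__":
    chain = justify("Tim's homepage", SIGNED_BY, TRUSTED)
    print(" <- ".join(chain) if chain else "No reason to believe this page.")
```

An "oh yeah?" button built this way would either display the chain it found or admit that it could not find one.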
That's the vision, at least. Right now, the World Wide Web Consortium is involved in several metadata projects that are far more mundane. The first is W3C's notorious PICS project for labeling content on the Web. PICS labels are a form of metadata. Another is the W3C's digital signature initiative, which is designed to create a metadata language that will explain what is actually meant when somebody signs a particular document on the Web with a particular digital key. And W3C is working on XML, the Extensible Markup Language, which is intended to serve as a general-purpose language for denoting assertions.
Two other big metadata pushes are coming from librarians and data-retrieval companies, who want to use metadata to describe things like the author and title of a document, as well as the license agreement under which the document is being made available.
"What we are trying to do is keep in mind a path to the future whereby the assertions that you put on the Web now will be actually compatible with the [metadata] language as it gets more and more powerful," Berners-Lee says.
In the meantime, I'm probably better off searching for that new Kate Bush album myself.