1. What is the Semantic Web?
The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow the Web to fulfill more of its potential, in that data can be shared effectively by wider communities and processed automatically by tools as well as manually.
The Semantic Web allows two things.
- It allows data to be surfaced in the form of real data, so that a program doesn’t have to strip the formatting and pictures and ads off a Web page and guess where the data on it is.
- It allows people to write (or generate) files which explain, to a machine, the relationship between different sets of data. For example, one is able to make a “semantic link” between a database with a “zip-code” column and a form with a “zip” field, stating that they actually mean the same thing: they are the same abstract concept. This allows machines to follow links and hence automatically integrate data from many different sources.
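The “semantic link” between a “zip-code” column and a “zip” field can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the `SAME_CONCEPT` mapping, and the `http://example.org/terms#postalCode` identifier are all made up, and real Semantic Web tools would express the link in RDF rather than in a dictionary.

```python
# Two hypothetical sources that name the same concept differently.
database_rows = [{"name": "Alice", "zip-code": "02139"}]
form_entries = [{"name": "Bob", "zip": "94305"}]

# The "semantic link": both local names map to one shared concept.
# The URI below is a made-up identifier, used for illustration only.
POSTAL_CODE = "http://example.org/terms#postalCode"
SAME_CONCEPT = {"zip-code": POSTAL_CODE, "zip": POSTAL_CODE}


def normalize(record, links):
    """Rename keys that have a semantic link; leave other keys untouched."""
    return {links.get(key, key): value for key, value in record.items()}


def integrate(*sources, links):
    """Merge records from several sources under the shared vocabulary."""
    return [normalize(record, links) for source in sources for record in source]


merged = integrate(database_rows, form_entries, links=SAME_CONCEPT)
postal_codes = [record[POSTAL_CODE] for record in merged]
```

Once both sources are expressed against the shared concept, a program can read the postal codes from either one without knowing which local column name was used.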
Semantic Web technologies can be used in a variety of application areas, for example: in data integration, whereby data in various locations and various formats can be integrated in one seamless application; in resource discovery and classification, to provide better, domain-specific search engine capabilities; in cataloging, for describing the content and content relationships available at a particular Web site, page, or digital library; by intelligent software agents, to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical “document”; for describing intellectual property rights of Web pages (see, e.g., the Creative Commons); and in many others.
To achieve the goals described above, the most important requirement is to be able to define and describe the relations among data (i.e., resources) on the Web. This is not unlike the use of hyperlinks on the current Web, which connect the current page with another one: a hyperlink defines a relationship between the current page and the target. One major difference is that, on the Semantic Web, such relationships can be established between any two resources; there is no notion of a “current” page. Another major difference is that the relationship (i.e., the link) itself is named, whereas a link on the (traditional) Web is not, and its role is deduced by the human reader. The definition of those relations allows for a better and automatic interchange of data. RDF, which is one of the fundamental building blocks of the Semantic Web, gives a formal definition for that interchange.
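As a rough illustration of named relationships between arbitrary resources, the sketch below models each statement as a (subject, predicate, object) triple in plain Python. The `Triple` type, the helper functions, and every `http://example.org/` identifier are assumptions made for this example; they are not part of RDF itself or of any RDF library.

```python
from collections import namedtuple

# A statement links any two resources through a named relationship.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

EX = "http://example.org/"  # made-up namespace for illustration

triples = {
    Triple(EX + "page1", EX + "terms#author", EX + "people/alice"),
    Triple(EX + "people/alice", EX + "terms#worksFor", EX + "org/acme"),
    Triple(EX + "photo7", EX + "terms#depicts", EX + "people/alice"),
}


def relations_between(graph, subject, obj):
    """Return the names of all relationships from subject to obj."""
    return sorted(t.predicate for t in graph if t.subject == subject and t.obj == obj)


def resources_linked_to(graph, obj):
    """Return every resource that points at obj, whatever the relationship."""
    return sorted(t.subject for t in graph if t.obj == obj)
```

Unlike a hyperlink, none of these statements has to live inside the page it talks about, and each one carries the name of the relationship, so a program does not have to guess what the link means.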
On that basis, additional building blocks are built around this central notion. Some examples are:
- Tools to query information described through such relationships (eg, SPARQL)
- Tools to have a finer and more detailed classification and characterization of those relationships. This ensures interoperability and more complex automatic behaviors. For example, a community can agree what name to use for a relationship connecting a page to one’s calendar; this name can then be used by a large number of users and applications without the necessity to redefine such names every time. (E.g., RDF Schemas, OWL, SKOS)
- For more complex cases, tools are available to define logical relationships among resources and the relationships (for example, if a relationship binds a person to his/her email address, it is feasible to declare that the email address is unique, i.e., the address is not shared by several persons). (E.g., OWL, Rules)
- Tools to extract from, and to bind to traditional data sources to ensure their interchange with data from other sources. (E.g., GRDDL, RDFa)
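The email example in the list above can be sketched as follows: if a property is declared unique, two identifiers that share the same value for it must denote the same resource. This is a minimal plain-Python sketch, not how an OWL reasoner is implemented; the `EMAIL` property, the site URIs, and the function name are hypothetical.

```python
from collections import defaultdict

EMAIL = "http://example.org/terms#email"  # hypothetical property, declared unique

triples = [
    ("http://site-a.example/people/1", EMAIL, "mailto:alice@example.org"),
    ("http://site-b.example/users/alice", EMAIL, "mailto:alice@example.org"),
    ("http://site-b.example/users/bob", EMAIL, "mailto:bob@example.org"),
]


def same_resource_groups(graph, unique_property):
    """Group subjects that share a value of a property declared to be unique."""
    holders = defaultdict(set)
    for subject, predicate, obj in graph:
        if predicate == unique_property:
            holders[obj].add(subject)
    # Only values held by more than one identifier yield a conclusion.
    return [sorted(subjects) for subjects in holders.values() if len(subjects) > 1]


groups = same_resource_groups(triples, EMAIL)
```

Here the two identifiers published on different sites end up in one group, which is the kind of conclusion that lets data about the same person be integrated across sources.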
It is difficult to predict what a “killer application” is for a specific technology, and the prediction is often erroneous. That said, the integration of currently unbound and independent “silos” of data in a coherent application is certainly a good candidate. Specific examples are currently explored in areas like Health Care and Life Sciences, Public Administration, Engineering, etc.
Not necessarily, at least not directly. The Semantic Web technologies may act behind the scenes, resulting in a better user experience, rather than directly influencing the “look” in the browser. This is already happening: there are Web sites (e.g., Sun’s white paper collection site, Nokia’s support portal for their S60 series devices, Oracle’s virtual press room, or Harper’s online magazine) that use Semantic Web technologies in the background.
Like all innovative technologies, the Semantic Web underwent an evolution, starting at research labs, then being picked up by the Open Source community, then by small and specialized startups, and finally by business in general. Remember: the Web was originally developed in a High Energy Physics center!
At present, the Semantic Web is increasingly used by small and large businesses. Oracle, IBM, Adobe, Software AG, and Northrop Grumman are only some of the large corporations that have picked up this technology already and are selling tools as well as complete business solutions. Large application areas, like Health Care and Life Sciences, look at the data integration possibilities of the Semantic Web as one of the technologies that might offer significant help in solving their R&D problems.
First of all, as pointed out elsewhere in this document, one can develop Semantic Web applications without using ontologies. Very useful applications can be built without those, relying on the most fundamental, and simple concept of the Semantic Web. However, even if ontologies, rules, reasoners, etc, are used, the average user should not care about the complexities of, say, the details of reasoning. All this is done “under the hood”. What the developer needs to operate with are usually simple logical patterns of the sort “Given that (Flipper isA Dolphin) and (Dolphin isAlso Mammal), one can conclude that (Flipper isA Mammal)”.
Compare it to SQL. The official SQL standards, the formal semantics of SQL, and indeed its implementations are extremely complex and understood by a few specialists only. Nevertheless, a large number of users use SQL in practice, without caring about the underlying complexities.
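The Flipper pattern mentioned above can be written down as a single rule applied until nothing new can be concluded. The sketch below uses the relationship names from the text (`isA`, `isAlso`); the `infer` function is a hypothetical helper for illustration and says nothing about how actual reasoners work internally.

```python
facts = {
    ("Flipper", "isA", "Dolphin"),
    ("Dolphin", "isAlso", "Mammal"),
}


def infer(facts):
    """Apply the rule (x isA C) and (C isAlso D) => (x isA D) to a fixed point."""
    known = set(facts)
    while True:
        derived = {
            (x, "isA", d)
            for (x, p1, c) in known if p1 == "isA"
            for (c2, p2, d) in known if p2 == "isAlso" and c2 == c
        }
        if derived <= known:  # nothing new was concluded, so stop
            return known
        known |= derived


closure = infer(facts)
flipper_is_mammal = ("Flipper", "isA", "Mammal") in closure
```

The developer only states the two facts and asks the question; the repeated rule application is the part that stays “under the hood”.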
The Semantic Web is an extension of the current Web and not its replacement. Islands of RDF and possibly related ontologies can be developed incrementally. Major application areas (like Health Care and Life Sciences) may choose to “locally” adopt Semantic Web technologies, and this can then spread over the Web in general. In other words, one should not think in terms of “rebuilding” the Web.
The Semantic Web Activity at W3C groups together all the Working and Interest Groups whose goals are to improve the current Semantic Web technologies or to contribute to their wider adoption. The activity home page gives an up-to-date list of the current work at W3C.
2. How does the Semantic Web relate to…
Some parts of the Semantic Web technologies are based on results of Artificial Intelligence research, like knowledge representation (e.g., for ontologies), model theory (e.g., for the precise semantics of RDF and RDF Schemas), or various types of logic (e.g., for rules). However, it must be noted that Artificial Intelligence has a number of research areas (e.g., image recognition) that are completely orthogonal to the Semantic Web.
It is also true that the development of the Semantic Web brought some new perspectives to the Artificial Intelligence community: the “Web effect”, i.e., the merging of knowledge coming from different sources, the usage of URIs, the necessity to reason with incomplete data, etc.
Description Logic is the mathematical theory (stemming from knowledge representation) that is at the basis of some of the technologies defined on the Semantic Web: OWL-DL and OWL-Lite.
Both formalisms have their strengths and weaknesses; their area of usage is different. The two data models serve different constituencies and the choice really depends on the application. There is no better or worse; only different.
One of XML’s strengths is its ability to describe strict hierarchies. Applications may rely on, and indeed exploit, the position of an element in a hierarchy: for example, most browsers provide a different rendering of HTML’s li element depending on how “deep” the enclosing list is. XML makes it easy to control the content via XML Schemas and to combine XML data that abide by the same Schema or DTD.
However, combining different XML hierarchies (technically, DOM trees) within the same application may become very complex. XML is not an easy tool for data integration. On the other hand, RDF consists of a very loose set of relations (triples). Due to its usage of URIs, it is very easy to seamlessly merge triple sets, i.e., data described in RDF, within the same application; it is therefore ideal for the integration of possibly heterogeneous information on the Web. But this has its price: reconstructing hierarchies from RDF may become quite complex. As an example, it would be fairly complicated (and unnecessary) to describe, e.g., vector graphics using RDF; use SVG instead!
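To make the point about merging concrete: if triples are modeled as tuples, merging two data sets is just a set union, and statements about the same URI line up without any tree restructuring. This is a simplified sketch in plain Python; the URIs and the `describe` helper are invented for the example, and it ignores details of real RDF such as blank nodes.

```python
ALICE = "http://example.org/people/alice"  # made-up URI shared by both sources

source_a = {
    (ALICE, "http://example.org/terms#name", "Alice"),
}
source_b = {
    (ALICE, "http://example.org/terms#worksFor", "http://example.org/org/acme"),
    (ALICE, "http://example.org/terms#name", "Alice"),  # same statement as in source_a
}

# Merging is a set union: identical statements collapse, new ones are added.
merged = source_a | source_b


def describe(graph, subject):
    """Collect what the graph says about one subject (one value per predicate here)."""
    return {predicate: obj for s, predicate, obj in graph if s == subject}


alice = describe(merged, ALICE)
```

Because both sources use the same URI for the same resource, the merged description of that resource needs no source-specific code.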
This issue is also related to the issue of using XML or RDF, addressed in a previous question. First of all, let us quote from the OWL Guide recommendation:
- An ontology differs from an XML Schema in that it is a knowledge representation, not a message format. Most industry based Web standards consist of a combination of message formats and protocol specifications. These formats have been given an operational semantics, such as, “Upon receipt of this PurchaseOrder message, transfer Amount dollars from AccountFrom to AccountTo and ship Product.” But the specification is not designed to support reasoning outside the transaction context. For example, we won’t in general have a mechanism to conclude that because the Product is a type of Chardonnay it must also be a white wine.
- One advantage of OWL ontologies will be the availability of tools that can reason about them. Tools will provide generic support that is not specific to the particular subject domain, which would be the case if one were to build a system to reason about a specific industry-standard XML schema. […] They will benefit from third party tools based on the formal properties of the OWL language, tools that will deliver an assortment of capabilities that most organizations would be hard pressed to duplicate.
Also, XML data is very sensitive to the XML Schema it refers to. If the XML Schema changes, the same XML data may become invalid, i.e., be rejected by Schema-aware parsers. A somewhat similar dependence on RDF Schemas and Ontologies exists for RDF data, too: if the RDF Schema or OWL Ontology changes, the inferences drawn from the RDF data may change. However, the core RDF data is still usable; there is no notion of the data being “rejected” by, e.g., a parser due to a Schema/Ontology change. In general, RDF is more robust against changes in Schemas and Ontologies than XML is against changes in its Schemas.
The meta and link elements in HTML can be used to add metadata to an HTML page. In Semantic Web terms, this is equivalent to the process of defining RDF relationships for that page as a “source”. Note, however, that these elements can be used to define relationships for the enclosing HTML file only, whereas the Semantic Web allows the definition of relationships on any resource on the Web. That also means that the meta and link elements can be used by the author of the document only, whereas, on the Semantic Web, anybody could publish metadata concerning that page.
Tagging has emerged as a popular method of categorizing content. Users are allowed to attach arbitrary strings to their data items (for example, blog entries and photographs). While tagging is easy and somewhat useful, it destroys a lot of the semantics of the data. In the Semantic Web, instead of tagging data items with strings, they can be related to other resources which can be uniquely identified, like ones representing people and places. The relationships are very specific, like who took the photograph, who is in the photograph, where the photograph was taken.
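The difference between string tags and named relationships can be illustrated with a small plain-Python sketch. All identifiers below (the `http://example.org/` URIs, the `takenBy`, `takenIn`, and `depicts` relationship names, and the `photos` helper) are hypothetical and chosen only for this example.

```python
# Tagging: the strings do not say how "john" or "paris" relate to the photo.
tags = {"photo42": ["paris", "john"]}

# Relationships: each statement names its role and points at an identified resource.
EX = "http://example.org/"  # made-up namespace for illustration
triples = [
    (EX + "photo42", EX + "terms#takenBy", EX + "people/john"),
    (EX + "photo42", EX + "terms#takenIn", EX + "places/paris"),
    (EX + "photo43", EX + "terms#depicts", EX + "people/john"),
]


def photos(graph, relation, resource):
    """Find photos that stand in a specific relationship to a resource."""
    return [s for s, p, o in graph if p == relation and o == resource]


taken_by_john = photos(triples, EX + "terms#takenBy", EX + "people/john")
depicting_john = photos(triples, EX + "terms#depicts", EX + "people/john")
```

With tags alone, a search for "john" cannot tell the photographer from the person in the picture; with named relationships the two questions have different answers.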
Microformats are usually relatively small and simple sets of terms agreed upon by a community. Data models developed within the framework of the Semantic Web have the potential to be more expressive, rigorous, and formal (and are usually larger). Both can be used to express structured data within web pages. In some cases, microformats are appropriate because the extra features provided by Semantic Web technologies are not necessary. Other cases requiring more rigor will not be able to use microformats.
Each microformat addresses a specific problem area. One has to develop a program well adapted to a particular microformat and to the way it uses, say, the class and title attributes. It also becomes difficult (though possible) to combine different microformats. In contrast, RDF can represent any information, including that extracted from microformats present on the page. This is where microformats can benefit from RDF: the generality of the Semantic Web tools makes it easier to reuse existing tools (e.g., a query language), and combining statements from different origins easily belongs to the very essence of the Semantic Web.
Note that the GRDDL Working Group has developed a “bridge” to the microformats approach; it defines a general procedure whereby microformats stored in an XHTML file can be transformed into RDF on the fly. Also, the Semantic Web Deployment group’s work on RDFa develops an XHTML 1.1 module that makes it possible to use virtually any RDF vocabulary as annotations of the XHTML content; a bit like microformats, with somewhat more rigor and a better way of integrating different vocabularies within the same document. Finally, eRDF (developed by Talis) offers a formalism somewhere between the two: one can add general RDF data to an (X)HTML page without the need for a new module, although with restrictions on the type of RDF vocabularies that can be used this way.
One aspect of Web 2.0, beyond the exciting new interfaces, is that it pushes intelligence and active agents from the server to the client, more specifically the browser. The development of active client-side applications also means that these applications use all kinds of data: data that is on the Web somewhere, or data that is embedded in the page though not necessarily visible on the screen. Examples are microformat-type annotations of the page, calendar data on the Web, tagged images or links stored on a Web site, etc. This aspect of Web 2.0, i.e., that applications are based on combining various types of data (“mashing up” the data) that are spread all around the Web, coincides with the very essence of the Semantic Web. What the Semantic Web provides is a more consistent model and tools for the definition and the usage of qualified relationships among data on the Web. In other words, both technologies focus on intelligent data sharing. A number of typical Web 2.0 demonstrations and applications are emerging that, in the background, use Semantic Web tools combined with AJAX and other exciting user interface approaches.
In many cases, using RDF-based techniques makes the mashing up process easier, mainly when data collected by one application is reused by another one somewhere down the line. The general nature of RDF makes this “mashup chaining” straightforward, which is not always the case for simpler Web 2.0 applications.