The Road to the Semantic Web


Written by Alex Iskold and edited by Richard MacManus / November 14, 2006

John Markoff's recent article in The New York Times has generated an interesting discussion about Web 3.0 being the long-promised Semantic Web. For instance, a short post on Fred Wilson's blog drew a lot of lengthy comments attempting to define Web 1.0, Web 2.0 and Web 3.0. Some people think that the Semantic Web is about AI, some claim that it is more about semantics, while others say that it is about data annotation. All agree, however, that we will all be wonderfully more productive and simply happier when it arrives. Let's take a look at the ingredients, definitions and approaches to the Semantic Web, so that we can recognize it when it is finally here.


What is the Semantic Web?

Wikipedia defines the Semantic Web as a project that intends to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web. The core idea is to create metadata describing the data, which will enable computers to process the meaning of things. Once computers are equipped with semantics, they will be capable of solving complex semantic optimization problems. For example, as John Markoff describes in his article, a computer will be able to instantly return relevant search results if you tell it to find a vacation on a 3K budget.

In order for computers to be able to solve problems like this one, the information on the web needs to be annotated with descriptions and relationships. Basic examples of semantics include categorizing an object and describing its attributes. For example, books fall into a Books category, where each object has attributes such as the author, the number of pages and the publication date. A basic example of a relationship comes from the various social networks we are part of. In one network the relationship might be "a friend of", in another "a family member" and in another "works with".

RDF, OWL and the mathematical approach to annotation

There are billions of largely unstructured HTML pages which contain no annotations or metadata. The fundamental engineering question is: how do we go from today's unstructured web to one rich with semantic information? The W3C's specifications for RDF (Resource Description Framework) and OWL (Web Ontology Language) attempt to enable the collective capture and description of information, along with its ontology and its relationships with other pieces of information, in a rigorous, mathematical way.

RDF is an XML-based language which enables the description of relationships via predicates. Wikipedia explains: the subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
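The sky/blue triple above can be sketched in RDF/XML as follows (the example.org URIs are illustrative placeholders, not a real vocabulary):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <!-- Subject: the sky; predicate: ex:hasColor; object: the literal "blue" -->
  <rdf:Description rdf:about="http://example.org/things#sky">
    <ex:hasColor>blue</ex:hasColor>
  </rdf:Description>
</rdf:RDF>
```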

OWL is another XML-based language, used for describing and reasoning about ontologies. In a nutshell, OWL facilitates semantic descriptions such as "Dog is an animal" or "Dog has four legs". There are three flavors of OWL: OWL Lite, OWL DL and OWL Full, each capturing a different point in the trade-off between expressiveness and computability. The RDF/OWL framework is comprehensive, but it is difficult for people without a background in mathematics and computer science to understand. Given that this is a bottom-up approach, it is clear that if it is to succeed, there needs to be an automated mechanism that takes existing HTML content and turns it into RDF and OWL metadata. This, however, is a chicken-and-egg problem: if we could already do this, the problem would not exist to begin with. Still, we can envision tooling which does 80% of the work automatically and then interacts with a person to complete the other 20%.
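The two dog statements could be expressed in OWL like this (again with made-up example.org identifiers): the "is an animal" part is a subclass axiom, and the "four legs" part is a cardinality restriction on a hypothetical hasLeg property:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/animals#Dog">
    <!-- "Dog is an animal" -->
    <rdfs:subClassOf rdf:resource="http://example.org/animals#Animal"/>
    <!-- "Dog has four legs": exactly four values of the hasLeg property -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://example.org/animals#hasLeg"/>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">4</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
```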


Microformats - the practical approach to annotation

Recognizing the complexity of RDF and OWL, a group of people is trying a different approach called microformats. The goal of microformats is to embed basic semantics right into HTML pages. It is not yet as expressive as RDF and OWL, but it is very compact and uses available XHTML facilities to add semantics to pages. For example, there is a microformat for describing contact information called hCard. Using hCard, it is possible to annotate HTML so that a microformat-aware browser or search engine can deduce information about a person, such as first and last name, company or phone number. Another mature microformat, hCalendar, enables page authors to describe events. Many popular event sites, such as Facebook and Yahoo! Local, use this format to annotate events in their HTML pages.
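As an illustration, here is what hCard and hCalendar markup look like. The class names (vcard, fn, org, tel; vevent, summary, dtstart) are the published microformat property names, while the contact and event details are made up:

```html
<!-- hCard: a contact annotated with microformat class names -->
<div class="vcard">
  <span class="fn">Jane Smith</span> works at
  <span class="org">Example Corp</span> and can be reached at
  <span class="tel">+1-555-0100</span>.
</div>

<!-- hCalendar: an event, with the machine-readable date in the abbr title -->
<div class="vevent">
  <span class="summary">Semantic Web Meetup</span> on
  <abbr class="dtstart" title="2006-11-20">November 20</abbr>.
</div>
```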

Leaving the aesthetics of the representation aside, the microformats approach is clearly simpler than RDF and OWL. And even though it is less powerful, it is becoming very popular: many site authors are starting to embed microformats into their HTML pages. We are also seeing early examples of search engines based on microformats, like this one from Technorati. The immediate gain of using microformats in search is the removal of ambiguity. In a way, it is similar to a vertical search engine, which knows which vertical you are searching. With microformats inside the pages, the data is no longer ambiguous, so the search results are more precise.

Still, there are issues with microformats. The first is the same as with the previous bottom-up approach: people have to do the work of annotating the pages. The good news is that since the format is simpler, more can be done via reverse engineering and automation. The second issue is that the current set of microformats does not cover many of the things we encounter online. For example, we are not aware of a format that would represent a book or a movie. Many more formats need to be created before they can really "cover" the web.
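To see why the simpler format lends itself to automation, here is a toy sketch of extracting a few hCard properties using only the Python standard library. The class names (fn, org, tel) are the published hCard property names; the contact details are made up, and a production parser would also need to handle nested hCards, void elements and the rest of the vCard schema:

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Toy extractor for a few hCard properties (fn, org, tel)."""
    FIELDS = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.result = {}
        self._open = []  # hCard field name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        self._open.append(next((c for c in classes if c in self.FIELDS), None))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        for field in self._open:
            if field and text and field not in self.result:
                self.result[field] = text

# Sample hCard markup (hypothetical contact details)
hcard_html = """
<div class="vcard">
  <span class="fn">Jane Smith</span> works at
  <span class="org">Example Corp</span>; phone:
  <span class="tel">+1-555-0100</span>.
</div>
"""

parser = HCardParser()
parser.feed(hcard_html)
print(parser.result)
# → {'fn': 'Jane Smith', 'org': 'Example Corp', 'tel': '+1-555-0100'}
```

Because the vocabulary is small and fixed, even this naive parser recovers unambiguous structured data from ordinary HTML, which is exactly the automation opportunity the simpler format creates.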

Semantic Web is Personalized Web

The problem of annotating data is very complex and is far from being solved completely. But let's leave it aside for a moment and think about what we could do once all the data becomes annotated. The promise is that we will do less of what we are doing now, namely sifting through piles of irrelevant information. Given that the amount of information is growing exponentially and our tolerance for it is shrinking, this is a very intriguing proposition. If the computer can return relevant results instantly, we can potentially save a ton of time.

But having semantics and knowing all the relationships between the data is not enough to do that. Take the simple example of a travel agency. When you show up there for the first time, the agent does not know what to offer you, even though she knows the semantics of travel, the relationships between things and the prices of everything. In order to be effective, she needs to know where you have been already and what kinds of destinations you like. That's why she asks you questions. All the services we receive work this way, and the results become better and more refined over time, because the people serving us have time to learn what we like.

So the second important ingredient of the Semantic Web, the one that will facilitate productivity, is a set of persistent personal preferences. Once the computer knows your preferences and has a semantic representation of them online, it can run an algorithm to deliver precise, personalized results. To put it differently, your personal preferences are the filter that needs to be applied to the results the computer returns in response to: find a vacation for under 3K. When this happens, we can claim that the Semantic Web has arrived.
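The preferences-as-filter idea can be sketched in a few lines. Everything here is hypothetical sample data: each vacation is assumed to be a semantically annotated record, and the user's persistent preferences are applied as a filter over the candidate results:

```python
# Hypothetical annotated search results
vacations = [
    {"destination": "Paris",  "type": "city",  "price": 2800},
    {"destination": "Cancun", "type": "beach", "price": 1900},
    {"destination": "Aspen",  "type": "ski",   "price": 3500},
]

# Persistent personal preferences, stored once and reused across queries
preferences = {"liked_types": {"beach", "city"}, "max_budget": 3000}

def find_vacations(results, prefs):
    """Apply persistent preferences as a filter over annotated results."""
    return [r for r in results
            if r["price"] <= prefs["max_budget"]
            and r["type"] in prefs["liked_types"]]

for trip in find_vacations(vacations, preferences):
    print(trip["destination"], trip["price"])
```

Here the semantic annotations make the attributes machine-readable, and the stored preferences turn a generic query ("a vacation under 3K") into a personalized one without the user restating what they like.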



So will 'Web 3.0' be the Semantic Web? Probably. But are we there yet? Not quite. It will take time to annotate the world's information and then to capture personal information in the right way, to enable the kinds of applications we have discussed. We are certainly getting closer, and it will be interesting to see how things unfold over the next few years.

Incidentally, if you would like us to write more about the Semantic Web please let us know and we will do follow up posts. 

