Original article: An Intro To The Semantic Web. Original author: zenkat
zenkat published this article on Saturday, April 21st, 2007 at 12:23 pm and revised it on April 26th, 2007 at 5:20 pm.
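Notes on this article: The Semantic Web was proposed quite some time ago, but it made little progress over the past several years and has only now begun to develop rapidly. The traditional web is used to publish all kinds of information, and these documents are readable by humans, but machines do not understand their content or meaning. The original aim of the Semantic Web is therefore to let machines genuinely read and understand this information. Achieving that requires a set of standards and rules that machines can follow to interpret information. The introduction and development of the Resource Description Framework (RDF) and the Web Ontology Language (OWL), together with query languages such as SPARQL, now make the Semantic Web a realistic possibility. In fact, successful ontology-based applications in China and abroad have already realized some of the Semantic Web's functionality.
The Semantic Web can solve many problems in existing technology, such as the precision problem when retrieving information with general-purpose search engines. The original author, zenkat, gives a simple example of this problem and explains how the Semantic Web can solve it. The Semantic Web still faces problems of its own, however, because it is far from mature. Even so, it is reasonable to predict that it will develop well and find wide application.
The original text follows: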
The concept of the semantic web is a few years old now, but it is only now beginning to gain real-world traction. The idea is based upon the simple observation that the current web mainly consists of a network of human-readable documents, not computer-parsable data. Because of this, the web is extremely useful for humans to gather data and information, but of little use to computers. The semantic web seeks to overcome this limitation by promoting standards for information representation and exchange to create a “web of data”. Much in the same way that technical interchange standards like HTTP and HTML allowed the organic growth of the “web of documents”, new technical standards will provide a fertile ground for the growth of this “web of data”.
The key technical standards for the Semantic Web are RDF and OWL, both of which grew out of Tim Berners-Lee's vision of the Semantic Web and were developed into working standards by the collective efforts of many contributors to W3C working groups. These standards provide a consistent, unified way of representing knowledge and information as well as mechanisms for exchanging this information. There is also SPARQL, the emerging standard query language for RDF data stores.
So what can you do with the Semantic Web? Theoretically, lots. Consider the following simple question: what are the homepages for all of the Web 2.0 companies located in San Francisco? With today’s tools, this is a nearly impossible question to answer. Typing “web 2.0 company san francisco” into Google returns a confusing mishmash of 12 million hits, most of which are neither companies nor located in San Francisco. It’s up to you, the human on the other side of the screen, to sift through the dross of ads, conference announcements, articles, and blog chatter to find the few gems you are looking for. It’s also up to you, the human, to cut/paste all of these into a spreadsheet for tracking.
This is a royal pain in the ass. I should know — I’ve tried to compile this list, and was quickly frustrated.
The Semantic Web solves this problem by providing a standard mechanism for web sites to publish data, instead of documents. One could imagine that every company that wanted to make its presence known on the Semantic Web would publish a set of RDF statements, each a subject-predicate-object triple (<MyCompany, location, San Francisco> and <MyCompany, field, Web 2.0>), describing itself. With the information in a standard format, query tools could then allow construction of targeted queries that answer the specific question at hand.
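To make the idea concrete, here is a minimal sketch in plain Python of how such triples could answer the San Francisco question. It is a stand-in for a real RDF store and a SPARQL query, not an implementation of either. The company names, the example URLs, the `homepage` predicate, and the helper functions `subjects_with` and `homepages` are all hypothetical and chosen only for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# All company names and URLs below are made up for illustration.
TRIPLES = [
    ("ExampleCo", "location", "San Francisco"),
    ("ExampleCo", "field", "Web 2.0"),
    ("ExampleCo", "homepage", "http://example.com/"),
    ("OtherCo", "location", "San Francisco"),
    ("OtherCo", "field", "Biotech"),
    ("OtherCo", "homepage", "http://example.org/"),
    ("ThirdCo", "location", "New York"),
    ("ThirdCo", "field", "Web 2.0"),
    ("ThirdCo", "homepage", "http://example.net/"),
]


def subjects_with(triples, predicate, obj):
    """Return the set of subjects that have the given predicate/object pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def homepages(triples, field, location):
    """Return sorted homepages of subjects matching both field and location."""
    matches = subjects_with(triples, "field", field) & subjects_with(
        triples, "location", location
    )
    return sorted(o for (s, p, o) in triples if p == "homepage" and s in matches)


print(homepages(TRIPLES, "Web 2.0", "San Francisco"))
# prints ['http://example.com/']
```

The query is just a set intersection over two triple patterns followed by a lookup of a third, which is roughly the shape a SPARQL query for the same question would take.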
Of course, this is only a basic example. Imagine if scientific publications provided RDF representations of the data contained within them, or if data repositories like NCBI and PubChem provided RDF gateways into their data. We’ll be exploring these questions in an upcoming entry.
I also think there’s a lot of promise in using RDF data stores as a simple replacement for standard relational databases, especially in environments that require very dynamic data models. Since RDF entries in essence define their own schema, developers are no longer tied to a fixed data model. As the domain space changes and the data representation evolves, RDF can transparently allow entry of new attributes for key data. Anyone who has tried to maintain a LIMS will quickly understand the power of this method.
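A small sketch of what that flexibility looks like, again in plain Python rather than a real RDF store: the `TripleStore` class, the sample identifier, and the attribute names are hypothetical, loosely modeled on the kind of records a LIMS might hold. The point is that a new attribute is just another triple, with no schema migration step.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store; a hypothetical stand-in for an RDF data store."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one (subject, predicate, object) statement."""
        self._triples.add((subject, predicate, obj))

    def describe(self, subject):
        """Return every known predicate for a subject, mapped to its values."""
        record = defaultdict(list)
        for s, p, o in sorted(self._triples):
            if s == subject:
                record[p].append(o)
        return dict(record)


store = TripleStore()
store.add("sample-001", "type", "plasmid")
store.add("sample-001", "freezer", "F3")

# Later the lab starts tracking a new attribute. No table needs altering;
# the new predicate simply appears alongside the existing ones.
store.add("sample-001", "sequenced_on", "2007-04-21")

print(store.describe("sample-001"))
```

In a relational design the third `add` would typically require a new column or a side table; here the data model grows with the data.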
Radar Networks is one of a handful of companies that are focused on bringing “semantic web” technologies to market. Other players in the field include the startups Metaweb (developers of Freebase), Zepheira, and Franz. There are also a variety of public/academic/open-source efforts, including SIMILE, Jena, and DBpedia. But there are heavyweights, too: players like Oracle are active in the space.
The field is just emerging from its infancy, but many see a bright future ahead for the Semantic Web. We’ve seen what the Network Effect can do for document repositories, software development projects, and community building. Just imagine what it can do with the world’s collective knowledge!