1. Introduction
Web search in the pre-knowledge graph age: Documents + Keywords (not: disambiguated entities)
Web search in the knowledge graph age: Structured information + All entities are disambiguated
1.1 Definitions
There are entities and relations that are connected and form a graph
There is a set of entity and relation types. Those are often referred to as a schema or ontology.
There is no single, agreed-upon definition.
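A minimal sketch of one possible reading of the definitions above: entities connected by relations form the graph, and a small schema lists the allowed relation types. All entity names, type names and the helper conforms() are made up for illustration.

```python
# Toy schema (ontology): each relation type maps to (subject type, object type).
# The names used here are illustrative assumptions, not a standard vocabulary.
RELATION_TYPES = {"bornIn": ("Person", "City")}

# Entities, each assigned one entity type.
entities = {"Ada_Lovelace": "Person", "London": "City"}

# The graph itself: (subject, relation, object) edges between entities.
triples = [("Ada_Lovelace", "bornIn", "London")]


def conforms(triple):
    """Return True if the triple uses a known relation with correctly typed ends."""
    subject, relation, obj = triple
    if relation not in RELATION_TYPES:
        return False
    subject_type, object_type = RELATION_TYPES[relation]
    return entities.get(subject) == subject_type and entities.get(obj) == object_type


print([conforms(t) for t in triples])
```

The schema is kept separate from the edges on purpose: the same triples could be checked against a different schema without changing the data.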
The “Classic” Web
a.k.a. “World Wide Web”, “Document Web”
- Uses the HTTP protocol and URLs
- Information is accessible to humans, not to machines
- Full-text search by keywords
- Problems
- Finding information
Keyword based search instead of natural language questions
Different natural languages
Synonyms, homonyms and polysemous words
Ambiguity of natural language
- Processing information (formats and encodings)
- Making use of information (distributed across pages, e.g., a book’s author on the publisher’s site, the author’s address on his/her personal page)
- Finding information
Example: Wolfram Alpha
– Structured knowledge
– Explicit inference rules
– Good at precise factual questions
– Not good at common knowledge
Example: ChatGPT
– Unstructured knowledge
– Heuristic inference
– Good at common knowledge
– Not good at precise factual questions
Web Mining / Information Extraction: Extract information from the Web
Knowledge Graphs: Create machine-interpretable information
Semantic Web Vision
- Provide information in machine interpretable form
- Make (semantic) links between (data) documents usable
- Facilitate useful (!) complex queries
- Allow logical reasoning
(Enterprise) Knowledge Graph Vision
- Integrate data from different sources
- Make connections between entities in those sources
- Facilitate cross data source queries
- Overcome data silos
Syntactic Interoperability: Character Sets, Multilinguality, Unicode
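A small sketch of why character sets matter for syntactic interoperability: the same text yields different bytes under different encodings, and decoding with the wrong one silently garbles it. The sample word is an arbitrary choice.

```python
text = "Z\u00fcrich"  # "Zürich", written with an escape so the source file encoding does not matter

utf8_bytes = text.encode("utf-8")      # 'ü' becomes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # 'ü' is a single byte in Latin-1

# Decoding with the matching character set restores the text.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as Latin-1 raises no error but produces garbled text.
garbled = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(latin1_bytes), roundtrip == text, garbled == text)
```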
1.2 XML, HTML, XPath
eXtensible Markup Language
Universal format for data exchange and integration
Tags (nested) & attributes
Namespaces: used via prefixes (prefix:name); each namespace is itself identified by a URI
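A short sketch of nested tags, attributes and prefixed namespaces, parsed with Python's standard xml.etree.ElementTree module. The namespace URIs, element names and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two made-up namespaces, bound to the prefixes "lib" and "p".
doc = """<lib:library xmlns:lib="http://example.org/library"
             xmlns:p="http://example.org/person">
  <lib:book isbn="123">
    <lib:title>Some Title</lib:title>
    <p:author>Jane Doe</p:author>
  </lib:book>
</lib:library>"""

root = ET.fromstring(doc)

# ElementTree expands each prefix to its namespace URI: {uri}localname
print(root.tag)

# For lookups, prefixes are mapped to URIs explicitly.
ns = {"lib": "http://example.org/library", "p": "http://example.org/person"}
book = root.find("lib:book", ns)
isbn = book.get("isbn")                 # attribute access
author = book.find("p:author", ns).text  # nested element from the second namespace
print(isbn, author)
```

The prefix is only a local shorthand; what identifies the namespace is the URI it is bound to.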
HTML
HTML documents look like XML documents, but they are usually not well-formed!
XHTML
HTML as well-formed XML documents
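To illustrate the difference, the sketch below feeds a typical HTML fragment and its XHTML counterpart to a strict XML parser. The fragments and the helper is_well_formed() are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

html_fragment = "<p>First line<br>Second line</p>"    # <br> is never closed
xhtml_fragment = "<p>First line<br/>Second line</p>"  # <br/> closes itself


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_fragment), is_well_formed(xhtml_fragment))
```

Browsers tolerate the first fragment, which is why much real-world HTML cannot be processed with XML tools directly.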
XPath
Query language for XML
Accessing Information in XML
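A minimal sketch of XPath-style access, using the limited XPath subset that Python's xml.etree.ElementTree supports (full XPath needs a dedicated library). The document content is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book year="1999"><title>Book A</title></book>
  <book year="2005"><title>Book B</title></book>
</library>"""

root = ET.fromstring(doc)

# Path expression: all title elements below book elements.
all_titles = [t.text for t in root.findall("./book/title")]

# Predicate on an attribute: only books with year="2005".
titles_2005 = [t.text for t in root.findall("./book[@year='2005']/title")]

print(all_titles, titles_2005)
```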
1.3 XML: Document Type Definition (DTD)
Defines valid elements for a class of XML documents (names, allowed attributes, allowed nested child elements)
XML documents matching a DTD are called “valid”
1.4 XML Schema
XML schemas are XML files themselves
More flexible than DTDs
- Minimum and maximum number of elements
- Combinations of elements (either/or, combinations without fixed order, …)
- Data types (Numbers, dates, …), own types may be defined
- Support for namespaces
- Possibility to create modular schemas
Syntax: how are correct sentences formed?
Semantics: what do words and sentences mean?
Syntactic correctness does not guarantee semantic interpretability. Semantic interpretability does not require syntactic correctness (for humans).
1.5 What does a DTD/Schema Define?
XML Schema / DTD defines the syntax of an XML document, but not its semantics.
(XML has no semantic meaning. An intelligent agent should be able to combine information and dynamically create a system for a given purpose. As long as XML documents carry no machine-interpretable meaning, it is impossible to combine them meaningfully; the Semantic Web can combine information better.)
Tag names are not interpretable by machines
The Semantic Web is meant as a remedy to that problem (Semantic Web is/can do more than XML!)
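The following sketch shows the problem: two well-formed XML documents state the same fact with different tag names, and a program that only knows one of the names cannot tell that they mean the same thing. The tag names and the helper find_authors() are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Both documents are syntactically fine and, to a human, say the same thing.
doc_a = "<book><author>Jane Doe</author></book>"
doc_b = "<book><creator>Jane Doe</creator></book>"


def find_authors(xml_text):
    """Collect the text of all <author> elements; the tag name is just a string to match."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("author")]


print(find_authors(doc_a), find_authors(doc_b))
```

To the program, "author" and "creator" are unrelated strings; any link between them would have to be supplied from outside the XML.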
1.6 Uniform Resource Identifiers (URIs)
Used for naming and finding resources on the Web
URIs vs. URLs
Uniform Resource Locators are a subset of URIs. URIs can refer to arbitrary things. A URL refers to a resource on the Web.
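A rough sketch of the distinction using Python's urllib.parse: both strings below are syntactically URIs, but only the first one also tells a client where to fetch something. The example identifiers, the set WEB_SCHEMES and the helper looks_like_url() are assumptions for illustration; checking the scheme is a crude heuristic, not a formal definition.

```python
from urllib.parse import urlparse

# Assumption: schemes treated here as "locating a resource on the Web".
WEB_SCHEMES = {"http", "https", "ftp"}


def looks_like_url(uri):
    """Heuristic: a URI counts as a URL if its scheme names a Web access protocol."""
    return urlparse(uri).scheme in WEB_SCHEMES


web_page = "https://example.org/books/42"  # URL: names a resource and where to get it
book_id = "urn:isbn:0451450523"            # URI that only names a thing (a book)

print(urlparse(web_page).scheme, looks_like_url(web_page))
print(urlparse(book_id).scheme, looks_like_url(book_id))
```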