Introduction to Knowledge bases

是ひま呀

Categories: WDPS, course notes

Article link: https://blog.csdn.net/Odessa_R/article/details/103554615

Introduction to KB

The limits of text

Initially, retrieval was keyword-based (text).
Then it moved to entity-based retrieval (knowledge).
Knowledge is a familiarity, awareness or understanding of someone or something, such as facts, information, descriptions or skills.

build knowledge repositories from the web

Manifest knowledge (accessible to humans)
- knowledge bases or knowledge graphs
- typically constructed manually or from unstructured sources
Latent knowledge (hidden)
- latent models or latent feature models
- typically learned using machine learning techniques.

Manifest knowledge

logic is the language that humans designed to express knowledge.
opinion: knowledge is something we can interpret without ambiguities.
knowledge bases: crystallisation of factual knowledge in the form of associations between entities and relations
- can be expressed as first-order logic
- recently Google rebranded knowledge bases as knowledge graphs

First-order logic (FOL)
FOL comprises constant symbols, predicate symbols, function symbols, variables, connectives (∧ ∨ → ↔) and quantifiers (∃ ∀). For example:
Father(Mary) = Bob (function symbol)
father_of(Mary, Bob) (predicate symbol)
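
The difference between the two notations can be sketched in a few lines of Python. This is only an illustration: the function symbol is modelled as a mapping that returns a single term, and the predicate symbol as a set of tuples that either contains a fact or does not. The helper name `holds` is an assumption made for this sketch.

```python
# Function symbol: maps a term to exactly one term.
father = {"Mary": "Bob"}

# Predicate symbol: a relation, stored as the set of tuples for which it holds.
father_of = {("Mary", "Bob")}


def holds(relation, *args):
    """Return True if the ground atom relation(args) is in the knowledge base."""
    return tuple(args) in relation


print(father["Mary"] == "Bob")           # Father(Mary) = Bob
print(holds(father_of, "Mary", "Bob"))   # father_of(Mary, Bob)
print(holds(father_of, "Bob", "Mary"))   # not stated, so not in the relation
```

A function always yields one value per argument, whereas a relation can hold for any number of tuples, which is why knowledge bases usually store facts in the predicate form.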


Latent knowledge

Opinion: we do not need to be able to interpret knowledge, as long as it does what it is supposed to do.

Became popular with the rise of deep learning
e.g. Google's word2vec
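
To make "latent" concrete, here is a minimal sketch of how knowledge can sit in vector geometry rather than in readable facts. The three-dimensional vectors are made up for illustration; real word2vec vectors are learned from text and are much larger. Cosine similarity is one common way to compare such vectors.

```python
import math

# Made-up toy vectors (assumption); not output of any trained model.
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


royal = cosine(embeddings["king"], embeddings["queen"])
fruit = cosine(embeddings["king"], embeddings["apple"])
print(royal > fruit)  # the relatedness is encoded only in the numbers
```

Nothing in the vectors says "king and queen are related"; the relation only shows up when the model is used, which is the point of the opinion above.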

Knowledge bases on the Web

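WordNet - the most famous; organised as sets of synonyms (synsets)

Each word is either monosemous (one sense) or polysemous (several senses).
Each synset has a gloss (a short definition) and is linked to other synsets through different semantic relations. The most important are hypernyms (isA) and meronyms (partOf).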
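
A rough sketch of this structure in Python, assuming a tiny hand-written lexicon: the synset identifiers imitate WordNet's naming style, but the entries and glosses are toy data written for this example, not taken from WordNet itself.

```python
# Toy synsets (assumption): id -> gloss and hypernym (isA) link.
synsets = {
    "dog.n.01": {"gloss": "a domesticated canine", "hypernym": "canine.n.01"},
    "canine.n.01": {"gloss": "a member of the dog family", "hypernym": "animal.n.01"},
    "animal.n.01": {"gloss": "a living organism that can move", "hypernym": None},
}

# Toy word -> synsets index: one entry means monosemous, several mean polysemous.
senses = {
    "dog": ["dog.n.01"],
    "bank": ["bank.n.01", "bank.n.02"],
}


def is_polysemous(word):
    """A word is polysemous if it belongs to more than one synset."""
    return len(senses[word]) > 1


def hypernym_chain(synset_id):
    """Follow isA links upwards until a root synset is reached."""
    chain = []
    current = synsets[synset_id]["hypernym"]
    while current is not None:
        chain.append(current)
        current = synsets[current]["hypernym"]
    return chain


print(is_polysemous("bank"))
print(hypernym_chain("dog.n.01"))
```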

RDF - Resource Description Framework

At its core, RDF is a data model whose statements take the form of subject-predicate-object (SPO) triples.
An RDF dataset can be represented as a directed graph

Example
<http://www.vu.nl> <rdf:type> <wikipedia/University>

An RDF graph contains three kinds of terms: Internationalized Resource Identifiers (IRIs), blank nodes and literals.

The subject can be an IRI or a blank node.
The predicate is an IRI.
The object can be any of the three.
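
These position rules can be written down as a small check. The sketch below uses hand-rolled classes (`IRI`, `BlankNode`, `Literal`) and a helper `is_valid_triple`, all of which are names chosen for this example rather than the API of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj):
    """Check the term kind allowed in each position of an RDF triple."""
    return (
        isinstance(subject, (IRI, BlankNode))
        and isinstance(predicate, IRI)
        and isinstance(obj, (IRI, BlankNode, Literal))
    )


vu = IRI("http://www.vu.nl")
print(is_valid_triple(vu, IRI("rdf:type"), IRI("wikipedia/University")))
print(is_valid_triple(Literal("VU University"), IRI("rdf:label"), vu))
```

The second call is rejected because a literal appears in the subject position.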


SPARQL - to query an RDF dataset

SQL-inspired syntax
Finding answers to a SPARQL query corresponds to finding all possible graph homomorphisms between the query and the graph.

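Query keywords:

SELECT specifies the variables to query; * selects all of them.
WHERE specifies the graph pattern to match, playing a role similar to WHERE in SQL.
FROM specifies the RDF dataset to query.
PREFIX declares abbreviations for IRIs.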
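
What PREFIX does can be illustrated with a short Python sketch: a prefixed name is simply expanded into a full IRI by string concatenation. The `rdf` namespace IRI below is the standard one; the function name `expand` is an assumption made for this example.

```python
# Prefix table, as a PREFIX declaration would set it up.
prefixes = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}


def expand(prefixed_name, prefix_table):
    """Expand a prefixed name such as rdf:type into a full IRI in angle brackets."""
    prefix, _, local = prefixed_name.partition(":")
    return "<" + prefix_table[prefix] + local + ">"


print(expand("rdf:type", prefixes))
```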

A basic query (no pattern-matching filters)

SELECT ?X ?Y WHERE {
?X <rdf:type> <wikipedia/University> .
?X <rdf:label> ?Y .
}

Example input
<http://www.vu.nl> <rdf:type> <wikipedia/University> .
<http://www.vu.nl> <rdf:label> "VU University" .
_:x <http://www.vu.nl#studies> <http://www.vu.nl> .

Output
{
?X -> <http://www.vu.nl>
?Y -> "VU University"
}

To query all the data, make every position of the SPO triple a variable:

SELECT * WHERE {
	?s ?p ?o
}

Query the movies that 周星驰 (Stephen Chow) has acted in
*The value finally returned here is movieTitle

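# The default namespace is a placeholder (assumption); use the dataset's own.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://example.org/kg#>
SELECT ?n WHERE {
  ?s rdf:type :Person .
  ?s :personName '周星驰' .
  ?s :hasActedIn ?o .
  ?o :movieTitle ?n
}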

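SPARQL consists of two parts: a protocol and a query language.
A SPARQL query is essentially an RDF graph with variables.
In short, answering a SPARQL query takes three steps:

Build the query graph pattern, which takes the form of RDF with variables.

Match: find the subgraphs that fit the given graph pattern.

Bind: bind the results to the corresponding variables of the query graph pattern.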
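
The three steps can be sketched as a naive matcher in Python, run on the VU example from earlier. This is a teaching sketch under simplifying assumptions, not how a real SPARQL engine works: terms are plain strings, only strings starting with `?` count as variables, and the names `match_pattern` and `evaluate` are invented for this example.

```python
def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` maps onto `triple`.

    Returns the extended binding, or None if the two do not fit.
    """
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, graph):
    """Return every variable binding that maps all patterns into the graph."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Step 1: build the query graph pattern (RDF with variables).
query = [
    ("?X", "<rdf:type>", "<wikipedia/University>"),
    ("?X", "<rdf:label>", "?Y"),
]

graph = [
    ("<http://www.vu.nl>", "<rdf:type>", "<wikipedia/University>"),
    ("<http://www.vu.nl>", "<rdf:label>", '"VU University"'),
    ("_:x", "<http://www.vu.nl#studies>", "<http://www.vu.nl>"),
]

# Steps 2 and 3: match the pattern against the graph and bind the variables.
print(evaluate(query, graph))
# [{'?X': '<http://www.vu.nl>', '?Y': '"VU University"'}]
```

Each returned binding is one homomorphism from the query graph into the data graph, which is why the result has exactly one row here.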


DBpedia

Project to convert Wikipedia pages into RDF
Leverages structured content contained in the pages:
- Infoboxes
- Labels
- Categories
- Redirects

Contains links to other KBs
Widely popular in the “linked-data-cloud”
Fairly large ontology, but not rich in terms of expressiveness:
- 320 classes
- 1650 properties
Alignment between infoboxes and ontologies is done via community-provided mappings

YAGO - high standard of quality

Goals
- Unify Wikipedia and WordNet
- Exploit Wikipedia infoboxes to extract clean facts
- Check the plausibility of facts via type checking
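
As a rough idea of what a type-based plausibility check could look like, the sketch below accepts a candidate fact only if the subject and object have the types the relation expects. It is not YAGO's actual implementation: the relation signature, the entity types and the function name `is_plausible` are all assumptions made for this example.

```python
# Hypothetical type assignments and relation signatures (toy data).
entity_types = {
    "Albert_Einstein": {"person"},
    "Ulm": {"city"},
    "Physics": {"field"},
}
signatures = {"bornIn": ("person", "city")}  # relation -> (domain, range)


def is_plausible(subject, relation, obj):
    """Accept a fact only if subject and object match the relation's signature."""
    domain, range_ = signatures[relation]
    return (
        domain in entity_types.get(subject, set())
        and range_ in entity_types.get(obj, set())
    )


print(is_plausible("Albert_Einstein", "bornIn", "Ulm"))
print(is_plausible("Albert_Einstein", "bornIn", "Physics"))
```

The second candidate is rejected because its object is not a city, which is the kind of extraction error this check is meant to catch.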

Freebase - collaborative KB

Wikidata - mainly text

Data is validated by the community
Keeps provenance of the data
Multilingual by design
Supports plurality

BabelNet - merging WordNet and Wikipedia

linguistic community
can only be accessed through APIs
Three tasks

- Combine WordNet and Wikipedia by establishing a mapping between them
- Harvest multilingual lexicalizations (using Wikipedia inter-language links and machine translation)
- Establish relations between WordNet synsets

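Moving from keyword recognition to entity recognition means enabling search engines to understand the content of text; to turn data into knowledge, we must build a knowledge repository.

Knowledge is either manifest (explicit) or latent (implicit). The former can be stored in knowledge bases or knowledge graphs; the latter is not directly visible and requires latent models or latent feature models. Manifest knowledge is usually built manually or from unstructured sources, whereas latent knowledge is usually learned through machine learning.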
