Knowledge Graph: 10. Data Quality and Linking

10. Data Quality and Linking

10.1 How good is Linked Open Data in practice?

Linked Open Data Best Practices
Provide dereferenceable URIs
Set RDF links pointing at other data sources

Use terms from widely deployed vocabularies

Linked Open Vocabularies (LOV) project
– analyzes the usage of vocabularies

Make proprietary vocabulary terms dereferenceable

Map proprietary vocabulary terms to other vocabularies

Provide provenance metadata

Provide licensing metadata

Provide data-set-level metadata

Refer to additional access methods

More Indicators

10.2 Quality

Linked Data Conformance vs. Quality
Conformance: following standards and best practices; a technical dimension that can be evaluated automatically

Quality: how complete, correct, etc. the data is; a content dimension that is hard to evaluate automatically

Quality of Knowledge Graphs

Issues with Automatic Evaluation

Example: Crowd Evaluation of DBpedia

The quality of Linked Open Data is far from perfect, in both conformance and content
Improving the quality is an active field of research
– A 2017 survey lists more than 40 approaches
– Since then: a lot of work on KG embeddings

10.3 Links

Previously on Knowledge Graphs

  • Integrate data from different sources
  • Make connections between entities in those sources
  • Facilitate cross data source queries
  • Overcome data silos

Why do we need Links?

How do we Create the Links?

There is too much data to link by hand, so many datasets interlink themselves with other datasets.
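A link between two datasets is itself just an RDF triple. As a minimal sketch, the snippet below serializes entity pairs as owl:sameAs statements in N-Triples. The example.org and example.com URIs and the function name same_as_triples are hypothetical placeholders, not taken from the slides.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(pairs):
    """Serialize (source_uri, target_uri) pairs as N-Triples owl:sameAs links."""
    return [f"<{source}> <{OWL_SAME_AS}> <{target}> ." for source, target in pairs]


# Hypothetical URIs, for illustration only.
links = same_as_triples([
    ("http://example.org/a/University_of_Mannheim",
     "http://example.com/b/University_of_Mannheim"),
])
print("\n".join(links))
```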

10.3.1 Tool Support

A plethora of names
Mostly used for the schema level:

  • Ontology matching/alignment/mapping
  • Schema matching/mapping

Mostly used for the instance level:

  • Instance matching/alignment
  • Interlinking
  • Link discovery

10.3.2 Automating Interlinking

Summary and Takeaways

Basic Interlinking Techniques

Sources for Interlinking Signals

Simple String Based Metrics

  • String equality
    e.g. foo:University_of_Mannheim, bar:University_of_Mannheim
  • Common prefixes
    e.g. foo:United_States, bar:United_States_of_America
  • Common postfixes
    e.g. foo:Barack_Obama, bar:Obama
  • Typical usage of prefixes/postfixes: |common| / max(length), counted on the local names (underscores included)
    foo:United_States, bar:United_States_of_America → 13/24
    foo:Barack_Obama, bar:Obama → 5/12
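The prefix and postfix measures above can be sketched in a few lines. This is a minimal illustration, assuming CURIE-style input where everything before the first colon is a namespace prefix, and counting every character of the local name, underscores included. The function names are made up for this example.

```python
import os


def local_name(term: str) -> str:
    """Drop a namespace prefix such as 'foo:' (assumes CURIE-style input)."""
    return term.split(":", 1)[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    return len(os.path.commonprefix([a, b]))


def common_postfix_len(a: str, b: str) -> int:
    """Number of trailing characters shared by a and b."""
    return common_prefix_len(a[::-1], b[::-1])


def affix_similarity(a: str, b: str, affix_len) -> float:
    """|common| / max(length), computed on the local names."""
    a, b = local_name(a), local_name(b)
    longest = max(len(a), len(b))
    return affix_len(a, b) / longest if longest else 1.0


print(affix_similarity("foo:United_States", "bar:United_States_of_America",
                       common_prefix_len))
print(affix_similarity("foo:Barack_Obama", "bar:Obama", common_postfix_len))
```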

Edit Distance
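Edit distance (Levenshtein distance) is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below uses the standard dynamic-programming formulation with two rows. Normalizing by the longer string's length to get a similarity in [0, 1] is one common convention and an assumption here, not necessarily the one used in the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / max(length); 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


print(edit_distance("Mannheim", "Manheim"))
print(edit_similarity("Mannheim", "Manheim"))
```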

N-gram based Similarity
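N-gram based measures compare the sets of character sequences of length n that occur in two strings. As a rough sketch, the code below uses character trigrams and the Jaccard coefficient. Both choices are assumptions for illustration; other variants (padding, Dice coefficient, different n) are equally common.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams; strings shorter than n yield themselves."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard coefficient of the two n-gram sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(sorted(ngrams("Obama")))
print(ngram_similarity("Barack_Obama", "Obama"))
```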

Typical Preprocessing Techniques
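String metrics are usually applied after normalizing the labels. The sketch below shows a few widely used steps (stripping diacritics, lowercasing, replacing punctuation and underscores with spaces, tokenizing). The exact list in the original slide is not reproduced here, so treat this selection and the function names as assumptions.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip diacritics, and turn non-alphanumerics into spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Note: this ASCII-only pattern discards non-Latin scripts entirely.
    return re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).strip()


def tokens(label: str) -> list:
    """Split a normalized label into word tokens."""
    return normalize(label).split()


print(normalize("University_of_Mannheim"))
print(tokens("Universität  Mannheim!"))
```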

Language-specific Preprocessing

Using External Knowledge

From Matching Literals to Matching Entities

Preprocessing and Matching Pipelines

10.4 Schema Matching

10.5 Instance-based Matching

Enforcing 1:1 Mappings
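A matcher typically produces a similarity score for many candidate pairs, so one element may appear in several candidates. One simple way to enforce a 1:1 mapping is a greedy selection: take the best-scoring pair, discard everything that conflicts with it, and repeat. This is a sketch of that heuristic only; it is not guaranteed to maximize the total score (an optimal assignment algorithm would be needed for that), and the class names and threshold are hypothetical.

```python
def greedy_one_to_one(scores: dict, threshold: float = 0.0) -> list:
    """Pick pairs in descending score order, using each element at most once."""
    chosen, used_left, used_right = [], set(), set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (left, right), score in ranked:
        if score < threshold:
            break  # all remaining candidates score even lower
        if left in used_left or right in used_right:
            continue  # conflicts with an already selected pair
        chosen.append((left, right, score))
        used_left.add(left)
        used_right.add(right)
    return chosen


# Hypothetical candidate scores between classes of two schemas.
candidates = {
    ("a:Person", "b:Human"): 0.9,
    ("a:Person", "b:Agent"): 0.8,
    ("a:Organization", "b:Human"): 0.7,
    ("a:Organization", "b:Agent"): 0.6,
}
print(greedy_one_to_one(candidates, threshold=0.5))
```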

10.6 Matcher Combination
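Individual matchers (string-based, n-gram-based, instance-based, and so on) can be combined into one decision. The sketch below shows one simple option, a weighted average of the individual scores followed by a threshold. The matcher names, weights, and threshold are made-up values for illustration, and other combination strategies (maximum, voting, learned models) exist.

```python
def combine_scores(scores_by_matcher: dict, weights: dict) -> float:
    """Weighted average of the scores produced by several matchers."""
    total_weight = sum(weights[name] for name in scores_by_matcher)
    weighted_sum = sum(weights[name] * score
                       for name, score in scores_by_matcher.items())
    return weighted_sum / total_weight


def is_match(scores_by_matcher: dict, weights: dict, threshold: float) -> bool:
    """Accept a candidate pair if the combined score reaches the threshold."""
    return combine_scores(scores_by_matcher, weights) >= threshold


# Hypothetical scores of two matchers for a single candidate pair.
pair_scores = {"label": 0.8, "ngram": 0.4}
matcher_weights = {"label": 3.0, "ngram": 1.0}
print(combine_scores(pair_scores, matcher_weights))
print(is_match(pair_scores, matcher_weights, threshold=0.6))
```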

Evaluating Matchers
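Matchers are commonly evaluated by comparing the links they find against a reference alignment (gold standard) and reporting precision, recall, and F1. A minimal sketch, assuming both alignments are given as sets of (source, target) pairs; the pairs below are invented for the example.

```python
def evaluate(found, reference):
    """Return (precision, recall, f1) of found links against a reference."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical matcher output and gold standard.
found_links = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}
gold_links = {("a:1", "b:1"), ("a:2", "b:2"), ("a:4", "b:4"), ("a:5", "b:5")}
precision, recall, f1 = evaluate(found_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```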

Challenges in Matching

Summary and Takeaways
