Building Knowledge Graphs in Python: Zincbase, a Knowledge Graph Construction Toolkit

Hello!

The tech behind parts of ZincBase was acquired. This repo is still here for reference, but it is deprecated.

Fortunately, work still goes on. Apart from a couple of fringe bits, the active repo lives here.

The new owner of ZincBase as it is today is ComplexDB.

Alright, you still want to continue

ZincBase is a state-of-the-art knowledge base. It does the following:

Extract facts (aka triples and rules) from unstructured data/text

Store and retrieve those facts efficiently

Build them into a graph

Provide ways to query the graph, including via bleeding-edge graph neural networks.

ZincBase exists to answer questions like "what is the probability that Tom likes LARPing", "who likes LARPing", or "classify people into LARPers vs normies".

It combines the latest in neural networks with symbolic logic (think expert systems and Prolog) and graph search.
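To make the symbolic half a bit more concrete, here is a minimal sketch in plain Python of how a "who likes LARPing" style question can be answered by pattern matching over stored facts. This is not ZincBase's implementation: `ToyKB`, `is_variable`, and the three-argument `store`/`query` signatures are hypothetical names invented for this illustration. It only borrows the Prolog convention that capitalized arguments are variables.

```python
# Illustrative only: a toy fact store, NOT ZincBase's implementation.
# Following Prolog convention, arguments starting with an uppercase
# letter are variables; everything else is a constant.

def is_variable(term):
    return term[:1].isupper()

class ToyKB:
    def __init__(self):
        self.facts = []  # list of (predicate, subject, object) tuples

    def store(self, predicate, subject, obj):
        self.facts.append((predicate, subject, obj))

    def query(self, predicate, subject, obj):
        """Yield one {variable: value} binding dict per matching fact."""
        for pred, s, o in self.facts:
            if pred != predicate:
                continue
            bindings = {}
            matched = True
            for pattern, value in ((subject, s), (obj, o)):
                if is_variable(pattern):
                    # A repeated variable must bind to the same value.
                    if bindings.setdefault(pattern, value) != value:
                        matched = False
                elif pattern != value:
                    matched = False
            if matched:
                yield bindings

kb = ToyKB()
kb.store('likes', 'tom', 'larping')
kb.store('likes', 'ann', 'larping')
kb.store('likes', 'ann', 'chess')

who = [b['Who'] for b in kb.query('likes', 'Who', 'larping')]
print(who)  # ['tom', 'ann']
```

A purely symbolic store like this can only return facts it was told (or can derive from rules). The "what is the probability that..." style of question is where the learned graph embeddings come in.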

View full documentation here.

Quickstart

from zincbase import KB

kb = KB()

kb.store('eats(tom, rice)')

for ans in kb.query('eats(tom, Food)'):
    print(ans['Food']) # prints 'rice'

...

# The included assets/countries_s1_train.csv contains triples like:

# (namibia, locatedin, africa)

# (lithuania, neighbor, poland)

kb = KB()

kb.from_csv('./assets/countries_s1_train.csv')

kb.build_kg_model(cuda=False, embedding_size=40)

kb.train_kg_model(steps=2000, batch_size=1, verbose=False)

kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia')

0.8467
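For intuition about where a number like that can come from, here is a rough sketch of RotatE-style triple scoring (the Validation section compares ZincBase against the RotatE paper). In RotatE, a relation rotates each complex-valued dimension of the head embedding, and a triple is plausible when the rotated head lands near the tail. Whether `estimate_triple_prob` computes exactly this is an assumption. The function names, the `gamma` margin value, and the tiny hand-made embeddings below are all hypothetical.

```python
import cmath
import math

def rotate_distance(head, phases, tail):
    """Sum over dimensions of |h_i * e^{i*theta_i} - t_i|."""
    return sum(
        abs(h * cmath.exp(1j * theta) - t)
        for h, theta, t in zip(head, phases, tail)
    )

def triple_probability(head, phases, tail, gamma=1.0):
    """Sigmoid of (gamma - distance); gamma is a hypothetical margin."""
    score = gamma - rotate_distance(head, phases, tail)
    return 1.0 / (1.0 + math.exp(-score))

# Made-up 2-dimensional embeddings, purely for illustration.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]   # rotate by 90 and 180 degrees
good_tail = [0 + 1j, 0 - 1j]      # exactly the head rotated by `phases`
bad_tail = [1 + 0j, 1 + 0j]

p_good = triple_probability(head, phases, good_tail)
p_bad = triple_probability(head, phases, bad_tail)
print(p_good > p_bad)  # True
```

In a trained model the embeddings and phases are learned from the triples in the graph rather than written by hand.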

Requirements

Python 3

Libraries from requirements.txt

GPU preferable for large graphs but not required

Installation

pip install -r requirements.txt

Note: Requirements might differ for PyTorch depending on your system.

Testing

python test/test_main.py

python test/test_graph.py

python test/test_lists.py

python test/test_nn_basic.py

python test/test_nn.py

python test/test_neg_examples.py

python test/test_truthiness.py

python -m doctest zincbase/zincbase.py

Validation

"Countries" and "FB15k" datasets are included in this repo.

There is a script that checks ZincBase performs at least as well on the Countries dataset as the original RotatE paper (2019). From the repo's root directory:

python examples/eval_countries_s3.py

It tests the hardest Countries task and prints the ROC AUC, which should be about 0.95 to match the paper. It takes about 30 minutes to run on a modern GPU.
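If the metric is unfamiliar: ROC AUC is the probability that a randomly chosen true triple gets a higher score than a randomly chosen false one. Here is a small self-contained sketch of that definition. It is not the code used by the evaluation script; `roc_auc` is a name made up here and the labels and scores are toy data.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one positive and one negative example')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: 1 = true triple, 0 = false triple.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.833
```

The pairwise loop is quadratic, which is fine for a sketch. For real evaluation sets a rank-based implementation from an established library is the better choice.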

There is also a script to evaluate performance on FB15k: python examples/fb15k_mrr.py.
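Judging by the script's name, the metric there is mean reciprocal rank (MRR): for each test triple, rank the correct entity among all candidates, take 1/rank, and average. The sketch below shows only that arithmetic. The helper names and candidate scores are hypothetical, and details of the real script (for example, whether known true triples are filtered out before ranking) are not covered here.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct entity for each test triple."""
    if not ranks:
        raise ValueError('ranks must be non-empty')
    return sum(1.0 / r for r in ranks) / len(ranks)

def rank_of(correct, candidate_scores):
    """1-based rank of `correct` when candidates are sorted by descending score."""
    target = candidate_scores[correct]
    return 1 + sum(1 for s in candidate_scores.values() if s > target)

# Hypothetical scores for the tail of (fiji, locatedin, ?).
candidates = {'melanesia': 0.85, 'africa': 0.10, 'europe': 0.90}

# First rank is computed from the scores above; the other two are made up.
ranks = [rank_of('melanesia', candidates), 1, 4]
mrr = mean_reciprocal_rank(ranks)
print(ranks, round(mrr, 4))  # [2, 1, 4] 0.5833
```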

Building documentation

From the docs/ directory, run make html. If the code has changed a lot, first regenerate the API stubs with sphinx-apidoc -o . ..

TODO

Add documentation

to_csv method

utilize postgres as backend triple store

The to_csv/from_csv methods do not yet support node attributes.

Add relation extraction from arbitrary unstructured text

Add context to triples: text that is interpreted by BERT/ULM/GPT-2 or similar and turned into an embedding that is concatenated to the KG embedding.

Reinforcement learning for graph traversal.

References & Acknowledgements

Citing

If you use this software, please consider citing:

@software{zincbase,

author = {{Tom Grek}},

title = {ZincBase: A state of the art knowledge base},

url = {https://github.com/tomgrek/zincbase},

version = {0.1.1},

date = {2019-05-12}

}

Contributing

See CONTRIBUTING. And please do!
