Building a Knowledge Graph (KG) from 0 to 1: A Practice Exercise

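Reference: Knowledge Graph: Data Science Technique to Mine Information from Text (with Python code)

The link may require a VPN to access.

The article is fairly easy to follow; working through the author's approach and examples, I was able to reproduce the results.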

Dataset:

wiki_sentences_v2.csv
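
A minimal sketch of reading the sentences from this file with the standard-library csv module. It assumes the CSV has a header row and a column named sentence (the column name is an assumption, so check the file's first line); load_sentences is a hypothetical helper name of my own.

```python
import csv
from pathlib import Path


def load_sentences(path, column="sentence"):
    """Return the non-empty values of one CSV column as a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Missing cells come back as None, so normalise them to "" before stripping.
        values = [(row.get(column) or "").strip() for row in reader]
    return [value for value in values if value]


if __name__ == "__main__":
    csv_path = Path("wiki_sentences_v2.csv")  # assumed to sit in the working directory
    if csv_path.exists():
        sentences = load_sentences(csv_path)
        print(len(sentences), "sentences loaded")
        print(sentences[:3])
    else:
        print("wiki_sentences_v2.csv not found; download it first")
```

If the column turns out to have a different name, pass it as the second argument instead of editing the function.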

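When implementing the code, pay attention to the Python version and to library version compatibility. This matters most for spaCy, whose parameters differ between newer and older releases. The version I used: Python 3.8.19

To build a fresh environment, write the package list below into requirements.txt and run pip install -r requirements.txt in a terminal.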

Python Version: 3.8.19

absl-py==2.0.0
accelerate==0.23.0
aiofiles==23.2.1
aiohttp==3.8.6
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.1.2
annotated-types==0.6.0
anyio==3.7.1
asgiref==3.7.2
astor==0.8.1
async-timeout==4.0.3
attrdict==2.0.1
attrs==23.1.0
Babel==2.13.1
backports.zoneinfo==0.2.1
bce-python-sdk==0.8.95
beautifulsoup4==4.12.2
blinker==1.6.3
blis==0.7.11
boto3==1.28.82
botocore==1.31.82
bottle==0.12.25
cachetools==5.3.1
catalogue==2.0.10
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.0
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
common==0.1.2
confection==0.1.3
ConfigArgParse==1.7
contourpy==1.1.1
cpm-kernels==1.0.11
crcmod==1.7
cryptography==41.0.7
cssselect==1.2.0
cssutils==2.9.0
ctranslate2==3.20.0
cycler==0.12.1
cymem==2.0.8
Cython==3.0.5
data==0.4
datasets==2.19.0
decorator==4.4.2
dill==0.3.7
docopt==0.6.2
dual==0.0.10
dynamo3==0.4.10
easydict==1.11
en-core-web-sm==3.7.1
et-xmlfile==1.1.0
evaluate==0.4.1
exceptiongroup==1.1.3
faiss-cpu==1.7.1.post2
fastapi==0.103.2
fasttext-wheel==0.9.2
ffmpy==0.3.1
filelock==3.12.4
fire==0.5.0
Flask==3.0.0
flask-babel==4.0.0
flatbuffers==23.5.26
flywheel==0.5.4
fonttools==4.43.1
frozenlist==1.4.0
fsspec==2023.6.0
funcsigs==1.0.2
future==0.18.3
gast==0.3.3
gitdb==4.0.10
GitPython==3.1.37
google-auth==2.23.4
google-auth-oauthlib==1.0.0
gradio==3.47.1
gradio_client==0.6.0
grpcio==1.59.2
h11==0.14.0
httpcore==0.18.0
httpx==0.25.0
huggingface-cli==0.1
huggingface-hub==0.22.2
icetk==0.0.4
idna==3.4
imageio==2.32.0
imbalanced-learn==0.12.2
imgaug==0.4.0
importlib-metadata==6.8.0
importlib-resources==6.1.0
iopath==0.1.10
itsdangerous==2.1.2
jieba==0.42.1
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.3.2
jsonify==0.5
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
kiwisolver==1.4.5
langcodes==3.3.0
latex2mathml==3.75.2
layoutparser==0.3.4
Levenshtein==0.23.0
libaio==0.9.1
llvmlite==0.41.1
lmdb==1.4.1
lxml==4.9.3
Markdown==3.5
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.7.3
mdtex2html==1.2.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
networkx==3.1
nltk==3.8.1
numba==0.58.1
numpy==1.21.0
oauthlib==3.2.2
onnxruntime==1.10.0
opencv-contrib-python==4.2.0.32
opencv-python==4.6.0.66
OpenNMT-py==2.3.0
openpyxl==3.1.2
openxlab==0.0.29
opt-einsum==3.3.0
orjson==3.9.7
oss2==2.17.0
packaging==23.2
paddle==1.0.2
paddle-bfloat==0.1.7
paddleclas==2.5.1
paddleocr==2.7.0.3
paddlepaddle==2.4.1
pandas==2.0.3
pdf2docx==0.5.5
pdf2image==1.17.0
pdfminer.six==20231228
pdfplumber==0.11.0
peewee==3.17.0
peft==0.5.0
Pillow==10.0.0
pip==23.3.1
pipreqs==0.4.13
pkgutil_resolve_name==1.3.10
portalocker==2.8.2
premailer==3.10.0
preshed==3.0.9
prettytable==3.9.0
protobuf==3.20.0
prox==0.0.17
psutil==5.9.5
pyahocorasick==2.0.0
pyarrow==13.0.0
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybind11==2.11.1
pyclipper==1.3.0.post5
pycparser==2.21
pycryptodome==3.19.0
pydantic==2.4.2
pydantic_core==2.10.1
pydeck==0.8.1b0
pydub==0.25.1
Pygments==2.16.1
PyMuPDF==1.20.2
PyMuPDFb==1.23.6
pynndescent==0.5.12
pyonmttok==1.37.1
pyparsing==3.1.1
pypdfium2==4.29.0
PySocks==1.7.1
python-dateutil==2.8.2
python-docx==1.1.0
python-geoip-python3==1.3
python-Levenshtein==0.23.0
python-multipart==0.0.6
pytz==2023.3.post1
PyWavelets==1.4.1
pywin32==306
PyYAML==6.0.1
rapidfuzz==3.5.2
rarfile==4.1
referencing==0.30.2
regex==2023.10.3
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rich==13.4.2
rouge-chinese==1.0.3
rpds-py==0.10.4
rsa==4.9
s3transfer==0.7.0
sacrebleu==2.3.1
safetensors==0.4.3
scikit-image==0.17.2
scikit-learn==1.3.2
scipy==1.10.1
semantic-version==2.10.0
sentencepiece==0.1.95
setuptools==60.2.0
shapely==2.0.2
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
sqlparse==0.4.4
srsly==2.4.8
sse-starlette==1.6.5
starlette==0.27.0
streamlit==1.27.2
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.14.0
tensorboard-data-server==0.7.2
termcolor==2.3.0
thinc==8.2.1
threadpoolctl==3.2.0
tifffile==2023.7.10
tight==0.1.0
tokenizers==0.13.3
toml==0.10.2
toolz==0.12.0
torch==2.1.0+cu121
torchaudio==2.1.0
torchtext==0.5.0
torchvision==0.16.0
tornado==6.3.3
tqdm==4.65.2
transformers==4.26.1
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
tzlocal==5.1
ujson==5.8.0
umap==0.1.1
umap-learn==0.5.6
urllib3==1.26.18
uvicorn==0.23.2
validators==0.22.0
visualdl==2.5.3
waitress==2.1.2
wasabi==1.1.2
watchdog==3.0.0
wcwidth==0.2.9
weasel==0.3.3
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.41.2
xxhash==3.4.1
yarg==0.1.9
yarl==1.9.2
zipp==3.17.0
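
The list above is a full freeze of one environment, so after installing it can help to confirm that the handful of packages you care about really resolved to the pinned versions. A minimal sketch using the standard-library importlib.metadata; the helper name check_pins and the choice of which packages to check are my assumptions, only the version numbers come from the list.

```python
from importlib import metadata


def check_pins(pins):
    """Map each package name to (pinned version, installed version or None)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    # Versions copied from the list above; which packages to check is an assumption.
    pins = {"spacy": "3.7.2", "pandas": "2.0.3", "networkx": "3.1", "matplotlib": "3.7.3"}
    for name, (wanted, installed) in check_pins(pins).items():
        status = "ok" if installed == wanted else "MISMATCH"
        print(f"{name}: pinned {wanted}, installed {installed} -> {status}")
```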

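You also need a pretrained English language model, en_core_web_sm:

You can install it from a terminal with python -m spacy download en_core_web_sm, which fetches a model version compatible with the installed spaCy. The model version must match the spaCy version.

Or download the model to your machine: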

third-party-oneoffs/en-core-web-sm: spacy-models en-core-web-sm (github.com)

Or use the resource attached at the top of this post.

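Then install the downloaded archive from the terminal. Substitute the file name of the archive you actually downloaded; its version should match your spaCy install (the environment listed above pins en-core-web-sm==3.7.1):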

pip install en_core_web_sm-2.3.0.tar.gz
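
spaCy model packages are versioned a.b.c, where a.b is the spaCy release line they target, so a 3.7.x model goes with spaCy 3.7.x. A small sketch that compares the two; same_release_line is a hypothetical helper of my own, and the spaCy import is guarded so the snippet still runs where spaCy or the model is missing.

```python
def same_release_line(spacy_version, model_version):
    """True when both version strings share the same major.minor prefix."""
    return spacy_version.split(".")[:2] == model_version.split(".")[:2]


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("en_core_web_sm")
    except (ImportError, OSError) as err:
        # ImportError: spaCy is not installed; OSError: the model package is not installed.
        print("cannot check:", err)
    else:
        model_version = nlp.meta["version"]
        ok = same_release_line(spacy.__version__, model_version)
        print(f"spaCy {spacy.__version__}, model {model_version}, compatible: {ok}")
```

spaCy also ships a python -m spacy validate command that reports installed models that are incompatible with the current spaCy version.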

To confirm that the model loads, run this test code:

import spacy

# Load the small English pipeline installed above
nlp = spacy.load('en_core_web_sm')

doc = nlp("The 22-year-old recently won ATP Challenger tournament.")

# Print each token next to its dependency label
for tok in doc:
    print(tok.text, "...", tok.dep_)
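
The dependency labels printed by this loop are what entity extraction builds on: as I understand the referenced approach, tokens whose label contains subj are candidate subjects and tokens whose label contains obj are candidate objects. A minimal sketch over (text, dep) pairs; split_subject_object is a hypothetical helper, and the sample pairs are hand-written for illustration, since the real labels depend on the model version.

```python
def split_subject_object(pairs):
    """Collect token texts whose dependency label marks a subject or an object."""
    subjects = [text for text, dep in pairs if "subj" in dep]
    objects = [text for text, dep in pairs if "obj" in dep]
    return subjects, objects


# Illustrative (text, dep) pairs in the shape the loop prints; not real model output.
sample = [
    ("old", "nsubj"),
    ("recently", "advmod"),
    ("won", "ROOT"),
    ("tournament", "dobj"),
]
print(split_subject_object(sample))  # (['old'], ['tournament'])
```

With a real parse, the pairs can be built as [(tok.text, tok.dep_) for tok in doc].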

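After that, take the code from the article linked at the top and run it. Read the article carefully: both the dark-background and the white-background code boxes contain code you need.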
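
Once entity pairs and relations have been extracted, the knowledge graph itself is just a set of labeled directed edges. A minimal sketch with plain dictionaries, assuming triples of the form (source, relation, target); the function names and the sample triples are invented for illustration. For drawing the graph, a library such as networkx (pinned in the list above) would be the usual choice rather than this hand-rolled structure.

```python
from collections import defaultdict


def build_graph(triples):
    """Return {source: [(relation, target), ...]} for a list of triples."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
    return dict(graph)


def neighbors_by_relation(graph, relation):
    """List (source, target) pairs connected by the given relation."""
    return [
        (source, target)
        for source, edges in graph.items()
        for rel, target in edges
        if rel == relation
    ]


# Invented triples, for illustration only.
triples = [
    ("film", "directed by", "director"),
    ("film", "released in", "year"),
    ("soundtrack", "composed by", "composer"),
]
graph = build_graph(triples)
print(neighbors_by_relation(graph, "directed by"))  # [('film', 'director')]
```

Filtering by a single relation like this keeps the picture readable when the full graph has too many edges to plot at once.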

If you have any experiences or insights to share, feel free to message me.
