kg-gen: an AI tool for extracting knowledge graphs from any text

Latest recommended article published on 2025-02-23 07:15:00

程序猿李巡天




Article tags: artificial intelligence, knowledge graph, deep learning, computer vision, excel, network learning

Article link: https://blog.csdn.net/m0_59235945/article/details/145781891


Project overview

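kg-gen uses AI to help you extract knowledge graphs from any plain text. It handles both small and large text inputs, and it can also process messages in conversation format.
Why generate knowledge graphs? kg-gen is for you if you want to: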

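Create a graph to assist RAG (retrieval-augmented generation)
Create graph-based synthetic data for model training and testing
Structure any text as a graph
Analyze the relationships between concepts in the source text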

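Through LiteLLM we support both API-based and local model providers, including OpenAI, Ollama, Anthropic, Gemini, Deepseek, and more. DSPy handles structured output generation. To try it out, run the scripts in tests/. Instructions for running MINE, our KG benchmark, are in MINE/.
Read the paper: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models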

Quick start

Install the module:

pip install kg-gen

Then import and use kg-gen. You can provide your text input in either of two formats:

A single string
A list of message objects (each with a role and content)

Here are some example snippets:

```python
from kg_gen import KGGen

# Initialize KGGen with optional configuration
kg = KGGen(
    model="openai/gpt-4o",  # Default model
    temperature=0.0,        # Default temperature
    api_key="YOUR_API_KEY"  # Optional if set in environment
)

# EXAMPLE 1: Single string with context
text_input = "Linda is Josh's mother. Ben is Josh's brother. Andrew is Josh's father."
graph_1 = kg.generate(
    input_data=text_input,
    context="Family relationships"
)
# Output:
# entities={'Linda', 'Ben', 'Andrew', 'Josh'}
# edges={'is brother of', 'is father of', 'is mother of'}
# relations={('Ben', 'is brother of', 'Josh'),
#            ('Andrew', 'is father of', 'Josh'),
#            ('Linda', 'is mother of', 'Josh')}

# EXAMPLE 2: Large text with chunking and clustering
with open('large_text.txt', 'r') as f:
    large_text = f.read()

# Example input text:
# """
# Neural networks are a type of machine learning model. Deep learning is a subset of machine learning
# that uses multiple layers of neural networks. Supervised learning requires training data to learn
# patterns. Machine learning is a type of AI technology that enables computers to learn from data.
# AI, also known as artificial intelligence, is related to the broader field of artificial intelligence.
# Neural nets (NN) are commonly used in ML applications. Machine learning (ML) has revolutionized
# many fields of study.
# ...
# """

graph_2 = kg.generate(
    input_data=large_text,
    chunk_size=5000,  # Process text in chunks of 5000 chars
    cluster=True      # Cluster similar entities and relations
)
# Output:
# entities={'neural networks', 'deep learning', 'machine learning', 'AI', 'artificial intelligence',
#           'supervised learning', 'unsupervised learning', 'training data', ...}
# edges={'is type of', 'requires', 'is subset of', 'uses', 'is related to', ...}
# relations={('neural networks', 'is type of', 'machine learning'),
#            ('deep learning', 'is subset of', 'machine learning'),
#            ('supervised learning', 'requires', 'training data'),
#            ('machine learning', 'is type of', 'AI'),
#            ('AI', 'is related to', 'artificial intelligence'), ...}
# entity_clusters={
#   'artificial intelligence': {'AI', 'artificial intelligence'},
#   'machine learning': {'machine learning', 'ML'},
#   'neural networks': {'neural networks', 'neural nets', 'NN'},
#   ...
# }
# edge_clusters={
#   'is type of': {'is type of', 'is a type of', 'is a kind of'},
#   'is related to': {'is related to', 'is connected to', 'is associated with'},
#   ...
# }

# EXAMPLE 3: Messages array
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]
graph_3 = kg.generate(input_data=messages)
# Output:
# entities={'Paris', 'France'}
# edges={'has capital'}
# relations={('France', 'has capital', 'Paris')}

# EXAMPLE 4: Combining multiple graphs
text1 = "Linda is Joe's mother. Ben is Joe's brother."

# Input text 2: note that Joseph also goes by Joe
text2 = "Andrew is Joseph's father. Judy is Andrew's sister. Joseph also goes by Joe."

graph4_a = kg.generate(input_data=text1)
graph4_b = kg.generate(input_data=text2)

# Combine the graphs
combined_graph = kg.aggregate([graph4_a, graph4_b])

# Optionally cluster the combined graph
clustered_graph = kg.cluster(
    combined_graph,
    context="Family relationships"
)
# Output:
# entities={'Linda', 'Ben', 'Andrew', 'Joe', 'Joseph', 'Judy'}
# edges={'is mother of', 'is father of', 'is brother of', 'is sister of'}
# relations={('Linda', 'is mother of', 'Joe'),
#            ('Ben', 'is brother of', 'Joe'),
#            ('Andrew', 'is father of', 'Joe'),
#            ('Judy', 'is sister of', 'Andrew')}
# entity_clusters={
#   'Joe': {'Joe', 'Joseph'},
#   ...
# }
# edge_clusters={ ... }
```
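Every graph printed above has the same shape: a set of entities, a set of edges (the distinct relation labels), and a set of (subject, edge, object) relations, plus optional cluster maps after clustering. The sketch below models only that shape so it can be inspected offline. `SimpleGraph` and `is_consistent` are hypothetical names chosen for illustration. They are not kg-gen's own graph class.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for the graph objects printed above (not kg-gen's class)."""
    entities: set[str] = field(default_factory=set)
    edges: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def is_consistent(self) -> bool:
        # Every triple must use known entities as subject/object and a known edge as predicate.
        return all(
            s in self.entities and p in self.edges and o in self.entities
            for s, p, o in self.relations
        )


# The family graph from EXAMPLE 1, rebuilt by hand.
family = SimpleGraph(
    entities={"Linda", "Ben", "Andrew", "Josh"},
    edges={"is brother of", "is father of", "is mother of"},
    relations={
        ("Ben", "is brother of", "Josh"),
        ("Andrew", "is father of", "Josh"),
        ("Linda", "is mother of", "Josh"),
    },
)
print(family.is_consistent())  # True
```

A check like this is a cheap sanity test to run on any extracted graph before using it downstream.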

Features

Chunking large text
For long texts, you can pass a chunk_size parameter to process the text in chunks:

```python
graph = kg.generate(
    input_data=large_text,
    chunk_size=5000  # Process in chunks of 5000 characters
)
```
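The source does not say how kg-gen cuts text into chunks, so what follows is only a sketch of one plausible character-based strategy. `chunk_text` is a hypothetical helper. It emits pieces of at most `chunk_size` characters and, where possible, ends each piece after a sentence-ending period, which makes it less likely that a single fact is split across two chunks.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Hypothetical helper: split text into chunks of at most chunk_size characters.

    Prefers to end each chunk right after the last ". " inside the window and
    falls back to a hard cut when no sentence boundary is found.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for the last sentence boundary inside the current window.
            cut = text.rfind(". ", start, end)
            if cut != -1:
                end = cut + 1  # keep the period in this chunk
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]


text = "Linda is Josh's mother. Ben is Josh's brother. Andrew is Josh's father."
print(chunk_text(text, 30))
# ["Linda is Josh's mother.", "Ben is Josh's brother.", "Andrew is Josh's father."]
```

A setting like `chunk_size=5000` behaves the same way with larger windows. A sentence longer than the window falls back to a hard cut.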

Clustering similar entities and relations

You can cluster similar entities and relations either during generation or afterwards:

```python
# During generation
graph = kg.generate(
    input_data=text,
    cluster=True,
    context="Optional context to guide clustering"
)

# Or after generation
clustered_graph = kg.cluster(
    graph,
    context="Optional context to guide clustering"
)
```
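The source does not describe how kg-gen decides which names belong together. Once cluster maps like the `entity_clusters` and `edge_clusters` in the earlier outputs exist, though, applying them is simple bookkeeping. The sketch below assumes each map goes from a canonical name to its set of aliases, matching the printed outputs. `apply_clusters` is a hypothetical helper and is not part of kg-gen.

```python
Triple = tuple[str, str, str]


def apply_clusters(
    relations: set[Triple],
    entity_clusters: dict[str, set[str]],
    edge_clusters: dict[str, set[str]],
) -> set[Triple]:
    """Hypothetical helper: rewrite every triple to use canonical cluster names."""
    # Invert {canonical: {aliases}} into {alias: canonical} for constant-time lookups.
    entity_alias = {a: c for c, aliases in entity_clusters.items() for a in aliases}
    edge_alias = {a: c for c, aliases in edge_clusters.items() for a in aliases}
    return {
        (
            entity_alias.get(s, s),  # names outside any cluster are kept unchanged
            edge_alias.get(p, p),
            entity_alias.get(o, o),
        )
        for s, p, o in relations
    }


relations = {
    ("Andrew", "is father of", "Joseph"),
    ("Linda", "is mother of", "Joe"),
}
clustered = apply_clusters(relations, {"Joe": {"Joe", "Joseph"}}, {})
```

Because the result is a set, triples that become identical after renaming collapse into one.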

Aggregating multiple graphs

You can combine multiple graphs with the aggregate method:

```python
graph1 = kg.generate(input_data=text1)
graph2 = kg.generate(input_data=text2)
combined_graph = kg.aggregate([graph1, graph2])
```
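The source does not say exactly how `aggregate` merges graphs. The sketch below assumes the simplest behavior consistent with EXAMPLE 4: a set union of entities, edges, and relations. `MiniGraph` and this `aggregate` function are hypothetical stand-ins, not kg-gen's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MiniGraph:
    # Hypothetical stand-in for a kg-gen graph: three sets, as in the outputs above.
    entities: set[str] = field(default_factory=set)
    edges: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)


def aggregate(graphs: list[MiniGraph]) -> MiniGraph:
    """Hypothetical sketch: merge graphs by taking the union of each set.

    Identical entities and triples collapse automatically because sets
    deduplicate. Near-duplicates such as 'Joe' / 'Joseph' survive until a
    clustering step maps them to one name.
    """
    merged = MiniGraph()  # fresh sets, so the input graphs are never mutated
    for g in graphs:
        merged.entities |= g.entities
        merged.edges |= g.edges
        merged.relations |= g.relations
    return merged


g1 = MiniGraph({"Linda", "Joe"}, {"is mother of"}, {("Linda", "is mother of", "Joe")})
g2 = MiniGraph({"Andrew", "Joseph"}, {"is father of"}, {("Andrew", "is father of", "Joseph")})
combined = aggregate([g1, g2])
print(sorted(combined.entities))  # ['Andrew', 'Joe', 'Joseph', 'Linda']
```

This is why EXAMPLE 4 follows `aggregate` with an optional `cluster` call: a union alone keeps `Joe` and `Joseph` as separate entities.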

Message array processing

When processing a message array, kg-gen:

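Preserves the role information of each message
Maintains message order and boundaries
Can extract entities and relations:

  between concepts mentioned in the messages
  between speakers (roles) and concepts
  across multiple messages in the conversation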
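Keeping roles, order, and boundaries means no message loses its speaker or its place in the conversation. The source does not say how kg-gen represents this internally. The sketch below shows one simple way to get those three properties: flatten the list into role-tagged lines, one per message. `messages_to_transcript` is a hypothetical helper.

```python
def messages_to_transcript(messages: list[dict[str, str]]) -> str:
    """Hypothetical helper: flatten chat messages into one role-tagged text.

    Keeps each message's role, its position in the conversation, and a line
    break between messages so turn boundaries stay visible.
    """
    lines = []
    for i, msg in enumerate(messages):
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
        lines.append(f"{msg['role']}: {msg['content']}")
    return "\n".join(lines)


messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
print(messages_to_transcript(messages))
# user: What is the capital of France?
# assistant: The capital of France is Paris.
```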

For example, given this conversation:

```python
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]
```

the generated graph might include the following entities:

"user"
"assistant"
"France"
"Paris"

and the following relations:

("user", "asks about", "France")
("assistant", "states", "Paris")
("Paris", "is capital of", "France")
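To be useful for RAG, one of the use cases listed at the top, triples like these need to be retrievable by entity. The sketch below assumes the relations are available as plain (subject, relation, object) tuples. `build_index` is a hypothetical helper, and how a retriever would rank or format the returned facts is left open.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def build_index(relations: list[Triple]) -> dict[str, list[Triple]]:
    """Hypothetical helper: map each entity to the triples it takes part in."""
    index: dict[str, list[Triple]] = defaultdict(list)
    for s, p, o in relations:
        index[s].append((s, p, o))
        if o != s:  # avoid listing a self-loop twice
            index[o].append((s, p, o))
    return dict(index)


relations = [
    ("user", "asks about", "France"),
    ("assistant", "states", "Paris"),
    ("Paris", "is capital of", "France"),
]
index = build_index(relations)
for s, p, o in index["France"]:
    print(s, p, o)
# user asks about France
# Paris is capital of France
```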

API reference

KGGen class

Constructor parameters

`model` : str = "openai/gpt-4o" - the model used for generation
`temperature` : float = 0.0 - the sampling temperature for the model
`api_key` : Optional[str] = None - the API key used to access the model
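The constructor example notes that `api_key` is optional if the key is set in the environment. The sketch below shows one way such a fallback can work. The provider-to-variable mapping (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) follows common provider conventions and is an assumption here, and `resolve_api_key` is hypothetical rather than kg-gen's code.

```python
import os
from typing import Optional

# Assumed provider -> environment variable mapping (conventional names, not
# taken from kg-gen). Extend it for other providers as needed.
ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_api_key(model: str, api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicit key, else fall back to the environment.

    `model` uses the "provider/model" form seen above, e.g. "openai/gpt-4o".
    Returns None when no key is found (a local Ollama model, for instance,
    does not need one).
    """
    if api_key:
        return api_key
    provider = model.split("/", 1)[0]
    var = ENV_VARS.get(provider)
    return os.environ.get(var) if var else None
```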

generate() method parameters

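`input_data` : Union[str, List[Dict]] - a text string or a list of message dicts
`model` : Optional[str] - overrides the default model
`api_key` : Optional[str] - overrides the default API key
`context` : str = "" - a description of the data's context
`chunk_size` : Optional[int] - the size of the text chunks to process
`cluster` : bool = False - whether to cluster the graph after generation
`temperature` : Optional[float] - overrides the default temperature
`output_folder` - an optional path where partial progress is saved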
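`input_data` accepts either of the two formats described in the quick start. The sketch below is a hypothetical check of that documented union type, run before any model call. `check_input_data` is illustrative only, and kg-gen's own validation may differ.

```python
from typing import Union

InputData = Union[str, list[dict[str, str]]]


def check_input_data(input_data: InputData) -> str:
    """Hypothetical check mirroring the documented input_data type.

    Returns "text" for a plain string and "messages" for a list of message
    dicts with 'role' and 'content' keys; raises TypeError otherwise.
    """
    if isinstance(input_data, str):
        return "text"
    if isinstance(input_data, list) and all(
        isinstance(m, dict) and "role" in m and "content" in m for m in input_data
    ):
        return "messages"
    raise TypeError("input_data must be a str or a list of {'role', 'content'} dicts")
```

Checking the type up front gives an immediate, readable error instead of a failure partway through a paid API call.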

cluster() method parameters

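`graph` - the graph to cluster
`context` : str = "" - a description of the data's context
`model` : Optional[str] - overrides the default model
`temperature` : Optional[float] - overrides the default temperature
`api_key` : Optional[str] - overrides the default API key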

aggregate() method parameters

`graphs` : list of graphs - the graphs to combine

Project link

http://github.com/stair-lab/kg-gen

