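It snowed in the north this week, and heavily. Looking at such a beautiful snowy scene and then at the problem in my hands, I can't help sighing: why can't problems melt away as quietly as snowflakes? Oh well, time to roll up my sleeves and get to work. Heavy snow weighs on people, but pressure makes people grow.
In last week's article I mentioned three ways to summarize documents: Stuff, Refine, and Map-Reduce. A quick review: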
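- Stuff: all documents are "stuffed" into a single prompt and the model summarizes them in one call. This is the simplest approach and keeps every detail in view, but the combined text has to fit within the model's context window.
- Refine: the model summarizes the first chunk, then walks through the remaining chunks one at a time, updating the running summary with each new chunk. The calls are sequential, so it can handle documents longer than the context window at the cost of more calls.
- Map-Reduce: the document is split into smaller chunks, each chunk is summarized independently (map), and the partial summaries are then combined into one coherent summary (reduce). Because the map calls do not depend on each other, they can run in parallel, which suits large, complex documents.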
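The control flow of the three modes can be sketched in plain Python. This is only an illustration: CountingSummarizer is a hypothetical stand-in for an LLM call (it just truncates the text and counts how often it is called), and none of the function names below come from LangChain.

```python
from typing import Callable, List

Summarizer = Callable[[str], str]


class CountingSummarizer:
    """Hypothetical stand-in for an LLM call; counts how often it is used."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return text[:20]  # pretend the first 20 characters are the summary


def stuff(docs: List[str], summarize: Summarizer) -> str:
    # One call: every document goes into a single prompt.
    return summarize("\n\n".join(docs))


def refine(docs: List[str], summarize: Summarizer) -> str:
    # Sequential calls: each step sees the previous summary plus one new document.
    summary = summarize(docs[0])
    for doc in docs[1:]:
        summary = summarize(f"{summary}\n\n{doc}")
    return summary


def map_reduce(docs: List[str], summarize: Summarizer) -> str:
    # Map: summarize each document on its own. Reduce: summarize the summaries.
    partial_summaries = [summarize(doc) for doc in docs]
    return summarize("\n\n".join(partial_summaries))


sample_docs = ["alpha " * 10, "beta " * 10, "gamma " * 10]
for strategy in (stuff, refine, map_reduce):
    counter = CountingSummarizer()
    strategy(sample_docs, counter)
    print(strategy.__name__, counter.calls)
```

With three documents, the stub is called once for stuff, three times for refine, and four times for map_reduce, which hints at the cost trade-off between the modes.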
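The first mode, Stuff, works for short articles, but with long ones it takes a long time (and the text may not fit into the model's context window at all). So today we look at the second mode, Refine, for summarizing documents.
Below I use the LangChain Expression Language (LCEL) to implement what RefineDocumentsChain does, so that it gets all the built-in LCEL features. The focus here is streaming output (the streaming examples I found online all target older versions, and I could not find one for this newer language feature):
First, the version that returns the full text at once: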
from functools import partial
from operator import itemgetter

from langchain.callbacks.manager import trace_as_chain_group
from langchain.chat_models import ChatAnthropic
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser
from langchain_core.prompts import format_document

llm = ChatAnthropic()

# Prompt for the initial summary of the first document.
first_prompt = PromptTemplate.from_template("Summarize this content:\n\n{context}")

# Render a Document into plain text using only its page_content.
document_prompt = PromptTemplate.from_template("{page_content}")
partial_format_doc = partial(format_document, prompt=document_prompt)

# Initial chain: Document -> prompt -> LLM -> string.
summary_chain = {"context": partial_format_doc} | first_prompt | llm | StrOutputParser()

# Prompt for each refinement step: previous summary plus one new document.
refine_prompt = PromptTemplate.from_template(
    "Here's your first summary: {prev_response}. "
    "Now add to it based on the following context: {context}"
)
refine_chain = (
    {
        "prev_response": itemgetter("prev_response"),
        "context": lambda x: partial_format_doc(x["doc"]),
    }
    | refine_prompt
    | llm
    | StrOutputParser()
)


def refine_loop(docs):
    # Group all calls under one trace so the whole loop shows up as a single run.
    with trace_as_chain_group("refine loop", inputs={"input": docs}) as manager:
        # Summarize the first document.
        summary = summary_chain.invoke(
            docs[0], config={"callbacks": manager, "run_name": "initial summary"}
        )
        # Refine the summary with each remaining document in turn.
        for i, doc in enumerate(docs[1:]):
            summary = refine_chain.invoke(
                {"prev_response": summary, "doc": doc},
                config={"callbacks": manager, "run_name": f"refine {i}"},
            )
        manager.on_chain_end({"output": summary})
    return summary
from langchain.schema import Document
text = """Nuclear power in space is the use of nuclear power in outer space, typically either small fission systems or radioactive decay for electricity or heat. Another use is for scientific observation, as in a Mössbauer spectrometer. The most common type is a radioisotope thermoelectric generator, which has been used on many space probes and on crewed lunar missions. Small fission reactors for Earth observation satellites, such as the TOPAZ nuclear reactor, have also been flown.[1] A radioisotope heater unit is powered by radioactive decay and can keep components from becoming too cold to function, potentially over a span of decades.[2]
The United States tested the SNAP-10A nuclear reactor in space for 43 days in 1965,[3] with the next test of a nuclear reactor power system intended for space use occurring on 13 September 2012 with the Demonstration Using Flattop Fission (DUFF) test of the Kilopower reactor.[4]
After a ground-based test of the experimental 1965 Romashka reactor, which used uranium and direct thermoelectric conversion to electricity,[5] the USSR sent about 40 nuclear-electric satellites into space, mostly powered by the BES-5 reactor. The more powerful TOPAZ-II reactor produced 10 kilowatts of electricity.[3]
Examples of concepts that use nuclear power for space propulsion systems include the nuclear electric rocket (nuclear powered ion thruster(s)), the radioisotope rocket, and radioisotope electric propulsion (REP).[6] One of the more explored concepts is the nuclear thermal rocket, which was ground tested in the NERVA program. Nuclear pulse propulsion was the subject of Project Orion.[7]
Regulation and hazard prevention
After the ban of nuclear weapons in space by the Outer Space Treaty in 1967, nuclear power has been discussed at least since 1972 as a sensitive issue by states.[8] Particularly its potential hazards to Earth's environment and thus also humans has prompted states to adopt in the U.N. General Assembly the Principles Relevant to the Use of Nuclear Power Sources in Outer Space (1992), particularly introducing safety principles for launches and to manage their traffic.[8]
Benefits
Both the Viking 1 and Viking 2 landers used RTGs for power on the surface of Mars. (Viking launch vehicle pictured)
While solar power is much more commonly used, nuclear power can offer advantages in some areas. Solar cells, although efficient, can only supply energy to spacecraft in orbits where the solar flux is sufficiently high, such as low Earth orbit and interplanetary destinations close enough to the Sun. Unlike solar cells, nuclear power systems function independently of sunlight, which is necessary for deep space exploration. Nuclear-based systems can have less mass than solar cells of equivalent power, allowing more compact spacecraft that are easier to orient and direct in space. In the case of crewed spaceflight, nuclear power concepts that can power both life support and propulsion systems may reduce both cost and flight time.[9]
Selected applications and/or technologies for space include:
Radioisotope thermoelectric generator
Radioisotope heater unit
Radioisotope piezoelectric generator
Radioisotope rocket
Nuclear thermal rocket
Nuclear pulse propulsion
Nuclear electric rocket
"""
# Split on blank lines; this assumes the paragraphs in text are separated by blank lines.
docs = [
    Document(
        page_content=split,
        metadata={"source": "https://en.wikipedia.org/wiki/Nuclear_power_in_space"},
    )
    for split in text.split("\n\n")
]
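One thing worth checking at this point: splitting on "\n\n" only yields several documents when the paragraphs are separated by blank lines. The small sketch below uses a hypothetical sample string (not the article text) to show the difference, and also drops empty pieces, which is an extra step not present in the code above.

```python
sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.\n"

# Paragraphs separated by blank lines: one piece per paragraph.
pieces = [p.strip() for p in sample.split("\n\n") if p.strip()]
print(len(pieces))

# The same paragraphs separated by single newlines stay in one piece,
# so a refine loop over them would only run the initial summary step.
single_newlines = sample.replace("\n\n", "\n")
unsplit = single_newlines.split("\n\n")
print(len(unsplit))
```

If the text being summarized arrives with single newlines only, a different separator (or a text splitter) would be needed before the refine loop has anything to iterate over.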
Finally, call the function; the summary is printed to the console:
print(refine_loop(docs))
So how do we get streaming output? LCEL provides the stream and astream methods. Here is the source of the default astream:
async def astream(
    self,
    input: Input,
    config: Optional[RunnableConfig] = None,
    **kwargs: Optional[Any],
) -> AsyncIterator[Output]:
    """
    Default implementation of astream, which calls ainvoke.
    Subclasses should override this method if they support streaming output.
    """
    yield await self.ainvoke(input, config, **kwargs)
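What this default means in practice can be reproduced with a small stand-alone sketch. BaseRunnable and TokenRunnable below are hypothetical classes written for illustration, not LangChain code: the first mimics the default shown above, the second overrides astream to emit one word per chunk.

```python
import asyncio
from typing import AsyncIterator, List


class BaseRunnable:
    """Hypothetical stand-in that mimics the default astream behaviour."""

    async def ainvoke(self, text: str) -> str:
        return text.upper()

    async def astream(self, text: str) -> AsyncIterator[str]:
        # Default: wait for the full result, then yield it as a single chunk.
        yield await self.ainvoke(text)


class TokenRunnable(BaseRunnable):
    """Hypothetical subclass that overrides astream to emit word-sized chunks."""

    async def astream(self, text: str) -> AsyncIterator[str]:
        for word in text.upper().split():
            yield word


async def collect(runnable: BaseRunnable, text: str) -> List[str]:
    # Gather every chunk the runnable streams into a list.
    return [chunk async for chunk in runnable.astream(text)]


default_chunks = asyncio.run(collect(BaseRunnable(), "nuclear power in space"))
token_chunks = asyncio.run(collect(TokenRunnable(), "nuclear power in space"))
print(default_chunks)
print(token_chunks)
```

The default produces exactly one chunk holding the whole result, while the overriding subclass produces several; only the latter looks like streaming to the caller.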
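As the docstring says, this default simply awaits ainvoke and yields the complete result as a single chunk; subclasses that support streaming override it. Recall the methods we met when studying LCEL (invoke, batch, stream, ainvoke, and so on). A chain composed with | overrides stream and astream, so, provided the model itself supports streaming, the chunks it produces pass through StrOutputParser as they arrive. Calling ainvoke directly would only return the finished string, so inside refine_loop we iterate over stream instead of calling invoke (to feed the next step, collect the chunks and join them into the new summary):
for chunk in refine_chain.stream({"prev_response": summary, "doc": doc}):
    print(chunk, end="", flush=True)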
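Putting the pieces together, a streaming variant of the refine loop could look roughly like the sketch below. To keep it runnable without an API key, fake_stream is a hypothetical stand-in for calling stream on the chains above (it just echoes the last three words of its input), and refine_stream is an illustrative name rather than a LangChain function.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

StreamFn = Callable[[str], Iterable[str]]


def fake_stream(prompt: str) -> Iterator[str]:
    # Hypothetical stand-in for a chain's stream(): one chunk per word,
    # using only the last three words of the prompt.
    for word in prompt.split()[-3:]:
        yield word + " "


def refine_stream(docs: List[str], stream_fn: StreamFn) -> Iterator[Tuple[int, str]]:
    """Yield (step, chunk) pairs while refining a summary over docs."""
    summary = ""
    for step, doc in enumerate(docs):
        # Step 0 sees only the first document; later steps also see the summary so far.
        prompt = doc if step == 0 else f"{summary} {doc}"
        pieces = []
        for chunk in stream_fn(prompt):
            pieces.append(chunk)
            yield step, chunk
        # The complete text of this step becomes the input of the next one.
        summary = "".join(pieces).strip()


streamed = list(refine_stream(["a b c d", "e f g"], fake_stream))
for step, chunk in streamed:
    print(f"[{step}] {chunk}", flush=True)
```

Every step is streamed here so progress stays visible; only the chunks of the last step make up the final summary, so a caller that wants just the result can filter on the step index.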
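That completes document summarization with LangChain in Refine mode.
One plugin I found very handy while learning is ChatSNow. Whenever I ran into an unfamiliar concept or usage, I asked ChatSNow; compared with searching Baidu or Google and digging through the results myself, it gave me answers faster and more effectively. It is very convenient to use.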