LangChain Tutorial Series: Fixing Hallucinations with a Knowledge Base

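Large language models (LLMs) have a data freshness problem. Even the most powerful models, such as GPT-4, know nothing about recent events.

From an LLM's point of view, the world is frozen at a single point in time. It knows the world only as it appears in its training data.

This is a problem for any use case that depends on up-to-date information or on a specific dataset. For example, you may have internal company documents that you want to interact with through an LLM.

The first challenge is getting those documents into the LLM. We could try training the LLM on them, but that is time-consuming and expensive. And what happens when a new document is added? Retraining for every new document is so inefficient that it is effectively impractical.

So how do we handle this? We can use retrieval augmentation. This technique lets us retrieve relevant information from an external knowledge base and pass that information to our LLM.

The external knowledge base is our "window" onto the world beyond the LLM's training data. In this chapter, we will learn how to implement retrieval augmentation for LLMs using LangChain.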

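Creating a Knowledge Base

Two main types of knowledge are available to an LLM. Parametric knowledge is everything the LLM learned during training; it acts as a frozen snapshot of the world as the LLM saw it.

The second type is source knowledge. This covers any information fed into the LLM through the input prompt. When we talk about retrieval augmentation, we mean giving the LLM valuable source knowledge.

Getting Data for Our Knowledge Base

To help our LLM, we need to give it access to relevant source knowledge. To do that, we need to create our own knowledge base.

We start with a dataset. Which dataset to use naturally depends on the use case. It could be code documentation for a coding assistant, company documents for an internal chatbot, or anything else.

In our example, we will use a subset of Wikipedia. To get that data, we will use Hugging Face datasets like so: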

In[2]:

from datasets import load_dataset

data = load_dataset("wikipedia", "20220301.simple", split='train[:10000]')
data

Out[2]:

Dataset({
    features: ['id', 'url', 'title', 'text'],
    num_rows: 10000
})

In[3]:

data[6]

Out[3]:

{'id': '13',
 'url': 'https://simple.wikipedia.org/wiki/Alan%20Turing',
 'title': 'Alan Turing',
 'text': 'Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\n\nEarly life and family \nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.\n\nEducation \nTuring went to St. Michael's, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\n"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.\n\nThe Stoney family were once prominent landlords, here in North Tipperary. His mother Ethel Sara Stoney (1881–1976) was daughter of Edward Waller Stoney (Borrisokane, North Tipperary) and Sarah Crawford (Cartron Abbey, Co. Longford); Protestant Anglo-Irish gentry.\n\nEducated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.\n\nA brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the Allied victory in the war against Nazi Germany, possibly saving the lives of an estimated 2 million people, through his effort in shortening World War II.\n\nIn 2013, almost 60 years later, Turing received a posthumous Royal Pardon from Queen Elizabeth II. 
Today, the “Turing law” grants an automatic pardon to men who died before the law came into force, making it possible for living convicted gay men to seek pardons for offences now no longer on the statute book.\n\nAlas, Turing accidentally or otherwise lost his life in 1954, having been subjected by a British court to chemical castration, thus avoiding a custodial sentence. He is known to have ended his life at the age of 41 years, by eating an apple laced with cyanide.\n\nCareer \nTuring was one of the people who worked on the first computers. He created the theoretical  Turing machine in 1936. The machine was imaginary, but it included the idea of a computer program.\n\nTuring was interested in artificial intelligence. He proposed the Turing test, to say when a machine could be called "intelligent". A computer could be said to "think" if a human talking with it could not tell it was a machine.\n\nDuring World War II, Turing worked with others to break German ciphers (secret messages). He  worked for the Government Code and Cypher School (GC&CS) at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence.\nUsing cryptanalysis, he helped to break the codes of the Enigma machine. After that, he worked on other German codes.\n\nFrom 1945 to 1947, Turing worked on the design of the ACE (Automatic Computing Engine) at the National Physical Laboratory. He presented a paper on 19 February 1946. That paper was "the first detailed design of a stored-program computer". Although it was possible to build ACE, there were delays in starting the project. In late 1947 he returned to Cambridge for a sabbatical year. While he was at Cambridge, the Pilot ACE was built without him. It ran its first program on 10\xa0May 1950.\n\nPrivate life \nTuring was a homosexual man. In 1952, he admitted having had sex with a man in England. At that time, homosexual acts were illegal. Turing was convicted. 
He had to choose between going to jail and taking hormones to lower his sex drive. He decided to take the hormones. After his punishment, he became impotent. He also grew breasts.\n\nIn May 2012, a private member's bill was put before the House of Lords to grant Turing a statutory pardon. In July 2013, the government supported it. A royal pardon was granted on 24 December 2013.\n\nDeath \nIn 1954, Turing died from cyanide poisoning. The cyanide came from either an apple which was poisoned with cyanide, or from water that had cyanide in it. The reason for the confusion is that the police never tested the apple for cyanide. It is also suspected that he committed suicide.\n\nThe treatment forced on him is now believed to be very wrong. It is against medical ethics and international laws of human rights. In August 2009, a petition asking the British Government to apologise to Turing for punishing him for being a homosexual was started. The petition received thousands of signatures. Prime Minister Gordon Brown acknowledged the petition. He called Turing's treatment "appalling".\n\nReferences\n\nOther websites \nJack Copeland 2012. Alan Turing: The codebreaker who saved 'millions of lives'. BBC News / Technology \n\nEnglish computer scientists\nEnglish LGBT people\nEnglish mathematicians\nGay men\nLGBT scientists\nScientists from London\nSuicides by poison\nSuicides in the United Kingdom\n1912 births\n1954 deaths\nOfficers of the Order of the British Empire'}

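Most datasets contain records with a lot of text. Because of this, our first task is usually to build a preprocessing pipeline that splits those long texts into more concise chunks.

Creating Chunks

Splitting our text into smaller chunks matters for several reasons. Primarily, we want to:

  • Improve "embedding accuracy", which improves the relevance of later results.

  • Reduce the amount of text fed into our LLM as source knowledge. Limiting input improves the LLM's ability to follow instructions, reduces generation costs, and helps us get faster responses.

  • Provide users with more precise information sources, as we can narrow a source down to a smaller chunk of text.

  • Handle very long texts, which would otherwise exceed the maximum context window of our embedding or completion models. Splitting them makes it possible to add these longer documents to our knowledge base.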
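
To put rough numbers on the points about input size and context windows, the sketch below estimates how many chunks fit into a single prompt. The 4,096-token window, the token budgets reserved for the answer and the instructions, and the function name are all illustrative assumptions rather than values taken from this tutorial.

```python
def max_chunks_in_prompt(context_window, chunk_size,
                         reserved_for_answer, reserved_for_instructions):
    """Return how many full chunks fit once the other budget items are reserved."""
    available = context_window - reserved_for_answer - reserved_for_instructions
    if available <= 0:
        return 0
    return available // chunk_size


# Assumed numbers: a 4,096-token window, 400-token chunks,
# 500 tokens kept free for the answer and 100 for instructions and question.
n_chunks = max_chunks_in_prompt(4096, 400, 500, 100)
print(n_chunks)  # (4096 - 500 - 100) // 400 = 8
```

Smaller chunks leave room for more distinct sources in the same prompt, at the cost of less context per source.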

To create these chunks, we first need a way of measuring the length of our text. LLMs don't measure text by word or character; they measure it in "tokens".
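
As a toy illustration of why a token is often smaller than a word, the sketch below splits words into the longest pieces found in a tiny vocabulary. This is a greedy longest-match scheme with a made-up vocabulary, chosen for simplicity; it is not the byte-pair encoding that tiktoken implements, and the function name is hypothetical.

```python
def greedy_subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until one is in the vocabulary.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


toy_vocab = {"token", "ization", "chunk", "s"}  # hypothetical vocabulary
print(greedy_subword_tokenize("tokenization", toy_vocab))  # ['token', 'ization']
print(greedy_subword_tokenize("chunks", toy_vocab))        # ['chunk', 's']
```

One word can therefore count as several tokens, which is why chunk lengths are measured with the model's tokenizer rather than with a word count.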

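A token is typically the size of a word or sub-word, and tokenization varies by LLM. The tokens themselves are built by a tokenizer. We will use gpt-3.5-turbo as our completion model. Note that the snippet below loads the p50k_base encoding, whereas gpt-3.5-turbo itself is tokenized with cl100k_base (the encoding returned by tiktoken.encoding_for_model('gpt-3.5-turbo')), so the token counts shown in this section are approximate for that model. We can initialize the tokenizer like this: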

import tiktoken  # !pip install tiktoken

tokenizer = tiktoken.get_encoding('p50k_base')

Using the tokenizer, we can create tokens from plain text and count them. We will wrap this into a function called tiktoken_len:

In[28]:

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

tiktoken_len("hello I am a chunk of text and using the tiktoken_len function "
             "we can find the length of this chunk of text in tokens")

Out[28]:

28

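With our token counting function ready, we can initialize a LangChain RecursiveCharacterTextSplitter object. This object lets us split text into chunks no longer than the length we specify via the chunk_size parameter.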

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=20,
    length_function=tiktoken_len,
    separators=["\n\n", "\n", " ", ""]
)
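
To make chunk_size and chunk_overlap more concrete, here is a minimal sketch of fixed-size windows with overlap. It is not LangChain's algorithm: it ignores the separators list entirely and uses whitespace-split words as stand-in tokens, and the function name is hypothetical.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Return windows of at most `chunk_size` tokens; neighbours share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in split_with_overlap(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['one', 'two', 'three', 'four']
# ['four', 'five', 'six', 'seven']
# ['seven', 'eight', 'nine', 'ten']
```

The overlap means a sentence cut at a chunk boundary still appears with some of its surrounding context in the next chunk.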


Now we can split text like so:

In[6]:

chunks = text_splitter.split_text(data[6]['text'])[:3]
chunks


Out[6]:

['Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\n\nEarly life and family \nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.\n\nEducation \nTuring went to St. Michael's, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\n"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.\n\nThe Stoney family were once prominent landlords, here in North Tipperary. His mother Ethel Sara Stoney (1881–1976) was daughter of Edward Waller Stoney (Borrisokane, North Tipperary) and Sarah Crawford (Cartron Abbey, Co. Longford); Protestant Anglo-Irish gentry.\n\nEducated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.\n\nA brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the Allied victory in the war against Nazi Germany, possibly saving the lives of an estimated 2 million people, through his effort in shortening World War II.',
 'In 2013, almost 60 years later, Turing received a posthumous Royal Pardon from Queen Elizabeth II. Today, the “Turing law” grants an automatic pardon to men who died before the law came into force, making it possible for living convicted gay men to seek pardons for offences now no longer on the statute book.\n\nAlas, Turing accidentally or otherwise lost his life in 1954, having been subjected by a British court to chemical castration, thus avoiding a custodial sentence. He is known to have ended his life at the age of 41 years, by eating an apple laced with cyanide.\n\nCareer \nTuring was one of the people who worked on the first computers. He created the theoretical  Turing machine in 1936. The machine was imaginary, but it included the idea of a computer program.\n\nTuring was interested in artificial intelligence. He proposed the Turing test, to say when a machine could be called "intelligent". A computer could be said to "think" if a human talking with it could not tell it was a machine.\n\nDuring World War II, Turing worked with others to break German ciphers (secret messages). He  worked for the Government Code and Cypher School (GC&CS) at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence.\nUsing cryptanalysis, he helped to break the codes of the Enigma machine. After that, he worked on other German codes.',
 'From 1945 to 1947, Turing worked on the design of the ACE (Automatic Computing Engine) at the National Physical Laboratory. He presented a paper on 19 February 1946. That paper was "the first detailed design of a stored-program computer". Although it was possible to build ACE, there were delays in starting the project. In late 1947 he returned to Cambridge for a sabbatical year. While he was at Cambridge, the Pilot ACE was built without him. It ran its first program on 10\xa0May 1950.\n\nPrivate life \nTuring was a homosexual man. In 1952, he admitted having had sex with a man in England. At that time, homosexual acts were illegal. Turing was convicted. He had to choose between going to jail and taking hormones to lower his sex drive. He decided to take the hormones. After his punishment, he became impotent. He also grew breasts.\n\nIn May 2012, a private member's bill was put before the House of Lords to grant Turing a statutory pardon. In July 2013, the government supported it. A royal pardon was granted on 24 December 2013.\n\nDeath \nIn 1954, Turing died from cyanide poisoning. The cyanide came from either an apple which was poisoned with cyanide, or from water that had cyanide in it. The reason for the confusion is that the police never tested the apple for cyanide. It is also suspected that he committed suicide.\n\nThe treatment forced on him is now believed to be very wrong. It is against medical ethics and international laws of human rights. In August 2009, a petition asking the British Government to apologise to Turing for punishing him for being a homosexual was started. The petition received thousands of signatures. Prime Minister Gordon Brown acknowledged the petition. He called Turing's treatment "appalling".\n\nReferences\n\nOther websites \nJack Copeland 2012. Alan Turing: The codebreaker who saved 'millions of lives'. BBC News / Technology']


None of these chunks exceeds the 400-token chunk size limit we set earlier:

In[7]:

tiktoken_len(chunks[0]), tiktoken_len(chunks[1]), tiktoken_len(chunks[2])


Out[7]:

(397, 304, 399)


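With the text splitter, we get nicely sized chunks of text. We will use this functionality later during the indexing process. For now, let's take a look at embeddings.

Creating Embeddings

Vector embeddings are vital to retrieving relevant context for our LLM. We take the chunks of text we'd like to store in our knowledge base and encode each one as a vector embedding.

These embeddings act as "numerical representations" of the meaning of each chunk of text. This is possible because we create them with another AI language model that has learned to translate human-readable text into AI-readable embeddings.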

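We then store these embeddings in our vector database (more on this soon) and find text chunks with similar meanings by computing the distance between embeddings in vector space.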
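
The idea of finding similar chunks by distance can be sketched in a few lines of plain Python. The three-dimensional vectors and their labels are made up for illustration (real embeddings have far more dimensions), and cosine similarity is used here as one common choice of similarity measure.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def most_similar(query_vec, vectors, top_k=1):
    """Return the ids of the top_k vectors ranked by cosine similarity to the query."""
    ranked = sorted(
        vectors,
        key=lambda item_id: cosine_similarity(query_vec, vectors[item_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy embeddings.
toy_embeddings = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "tax law": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
print(most_similar(query_vec, toy_embeddings, top_k=2))  # ['cats', 'dogs']
```

A vector database performs the same kind of ranking, but with indexing structures that avoid comparing the query against every stored vector.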

The embedding model we will use is another OpenAI model called text-embedding-ada-002. We can initialize it via LangChain like so:

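import os

from langchain.embeddings.openai import OpenAIEmbeddings

# read the OpenAI API key from the environment (assumes OPENAI_API_KEY is set)
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

model_name = 'text-embedding-ada-002'

# note: newer LangChain releases may expect a single `model` argument instead
embed = OpenAIEmbeddings(
    document_model_name=model_name,
    query_model_name=model_name,
    openai_api_key=OPENAI_API_KEY
)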


Now we can embed our text:

In[10]:

texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed.embed_documents(texts)
len(res), len(res[0])


Out[10]:

(2, 1536)


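From this, we get two embeddings, as we passed in two chunks of text. Each embedding is a 1536-dimensional vector, which is simply the output dimensionality of text-embedding-ada-002.

With that, we have our dataset, text splitter, and embedding model. We have everything we need to begin building our knowledge base.

Vector Database

A vector database is a type of knowledge base that lets us scale similarity search over embeddings to billions of records, manage our knowledge base by adding, updating, or removing records, and even apply operations such as filtering.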
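
The operations listed above (add, update, delete, filtered search) can be illustrated with a small in-memory stand-in. This is only a sketch of the concepts: the class, its method names, and the exact-match filter are assumptions made for illustration and do not reproduce the Pinecone client's API or its indexing internals.

```python
class ToyVectorIndex:
    """In-memory stand-in for a vector database (illustrative only)."""

    def __init__(self):
        self._records = {}  # id -> (vector, metadata)

    def upsert(self, vectors):
        # Insert new ids and overwrite existing ones ("update or insert").
        for record_id, vector, metadata in vectors:
            self._records[record_id] = (vector, metadata)

    def delete(self, ids):
        for record_id in ids:
            self._records.pop(record_id, None)

    def query(self, vector, top_k=3, metadata_filter=None):
        """Return (id, score) pairs ranked by dot product, keeping only exact metadata matches."""
        metadata_filter = metadata_filter or {}
        scored = []
        for record_id, (stored, metadata) in self._records.items():
            if all(metadata.get(k) == v for k, v in metadata_filter.items()):
                score = sum(x * y for x, y in zip(vector, stored))
                scored.append((record_id, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


toy_index = ToyVectorIndex()
toy_index.upsert([
    ("a", [1.0, 0.0], {"title": "Alan Turing"}),
    ("b", [0.0, 1.0], {"title": "Italy"}),
    ("c", [0.7, 0.7], {"title": "Italy"}),
])
print(toy_index.query([1.0, 0.0], top_k=2))
# [('a', 1.0), ('c', 0.7)]
print(toy_index.query([1.0, 0.0], metadata_filter={"title": "Italy"}))
# [('c', 0.7), ('b', 0.0)]
toy_index.delete(["a"])
print(toy_index.query([1.0, 0.0], top_k=2))
# [('c', 0.7), ('b', 0.0)]
```

A real vector database differs mainly in scale: it keeps the same record shape (id, vector, metadata) but uses approximate search so that queries stay fast over very large collections.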

We will be using the Pinecone vector database. To use it, we need a free API key. Then we initialize our database index like so:

import pinecone

index_name = 'langchain-retrieval-augmentation'

pinecone.init(
        api_key="YOUR_API_KEY",  # find api key in console at app.pinecone.io
        environment="YOUR_ENV"  # find next to api key in console
)

# we create a new index
pinecone.create_index(
        name=index_name,
        metric='dotproduct',
        dimension=len(res[0]) # 1536 dim of text-embedding-ada-002
)


Then we connect to the new index:

In[12]:

index = pinecone.GRPCIndex(index_name)

index.describe_index_stats()


Out[12]:

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}


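The new Pinecone index has a total_vector_count of 0 because we haven't added any vectors yet. Our next task is to do that.

The indexing process consists of iterating over the data we'd like to add to our knowledge base, creating IDs, embeddings, and metadata, and then adding them to the index.

We can do this in batches to speed up the process.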

from tqdm.auto import tqdm
from uuid import uuid4

batch_limit = 100

texts = []
metadatas = []

for i, record in enumerate(tqdm(data)):
    # first get metadata fields for this record
    metadata = {
        'wiki-id': str(record['id']),
        'source': record['url'],
        'title': record['title']
    }
    # now we create chunks from the record text
    record_texts = text_splitter.split_text(record['text'])
    # create individual metadata dicts for each chunk
    record_metadatas = [{
        "chunk": j, "text": text, **metadata
    } for j, text in enumerate(record_texts)]
    # append these to current batches
    texts.extend(record_texts)
    metadatas.extend(record_metadatas)
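    # if we have reached the batch_limit we can add texts
    if len(texts) >= batch_limit:
        ids = [str(uuid4()) for _ in range(len(texts))]
        embeds = embed.embed_documents(texts)
        index.upsert(vectors=zip(ids, embeds, metadatas))
        texts = []
        metadatas = []

# add any texts left over after the final full batch
if len(texts) > 0:
    ids = [str(uuid4()) for _ in range(len(texts))]
    embeds = embed.embed_documents(texts)
    index.upsert(vectors=zip(ids, embeds, metadatas))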
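
The batching pattern used in the indexing loop can also be factored into a small helper that yields fixed-size batches, including the final partial one. The helper name is hypothetical, and the sketch batches plain integers; in the indexing loop, each batch would instead be embedded and upserted.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` items, including the final partial batch."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


for batch in batched(range(7), 3):
    print(batch)
# [0, 1, 2]
# [3, 4, 5]
# [6]
```

Because the helper always emits the leftover items, the caller does not need separate handling for a last batch smaller than the limit.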


We've now indexed everything. To check the number of records in our index, we call describe_index_stats again:

In[14]:

index.describe_index_stats()


Out[14]:

{'dimension': 1536,
 'index_fullness': 0.1,
 'namespaces': {'': {'vector_count': 27437}},
 'total_vector_count': 27437}


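Our index contains roughly 27,000 records. As mentioned earlier, we can scale this up to billions, but 27,000 is enough for our example.

LangChain Vector Store and Querying

We built our index independently of LangChain. That's because it's a straightforward process, and using the Pinecone client directly is faster. However, we're about to return to LangChain, so we should reconnect to our index through the LangChain library.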

from langchain.vectorstores import Pinecone

text_field = "text"

# switch back to normal index for langchain
index = pinecone.Index(index_name)

vectorstore = Pinecone(
    index, embed.embed_query, text_field
)


We can use the similarity_search method to query directly and return chunks of text, without any LLM generating a response.

In[16]:

query = "who was Benito Mussolini?"

vectorstore.similarity_search(
    query,  # our search query
    k=3  # return 3 most relevant docs
)


Out[16]:

[Document(page_content='Benito Amilcare Andrea Mussolini KSMOM GCTE (29 July 1883 – 28 April 1945) was an Italian politician and journalist. He was also the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party.\n\nBiography\n\nEarly life\nBenito Mussolini was named after Benito Juarez, a Mexican opponent of the political power of the Roman Catholic Church, by his anticlerical (a person who opposes the political interference of the Roman Catholic Church in secular affairs) father. Mussolini's father was a blacksmith. Before being involved in politics, Mussolini was a newspaper editor (where he learned all his propaganda skills) and elementary school teacher.\n\nAt first, Mussolini was a socialist, but when he wanted Italy to join the First World War, he was thrown out of the socialist party. He 'invented' a new ideology, Fascism, much out of Nationalist\xa0and Conservative views.\n\nRise to power and becoming dictator\nIn 1922, he took power by having a large group of men, "Black Shirts," march on Rome and threaten to take over the government. King Vittorio Emanuele III gave in, allowed him to form a government, and made him prime minister. In the following five years, he gained power, and in 1927 created the OVRA, his personal secret police force. Using the agency to arrest, scare, or murder people against his regime, Mussolini was dictator\xa0of Italy by the end of 1927. Only the King and his own Fascist party could challenge his power.', lookup_str='', metadata={'chunk': 0.0, 'source': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini', 'title': 'Benito Mussolini', 'wiki-id': '6754'}, lookup_index=0),
 Document(page_content='Fascism as practiced by Mussolini\nMussolini's form of Fascism, "Italian Fascism"- unlike Nazism, the racist ideology that Adolf Hitler followed- was different and less destructive than Hitler's. Although a believer in the superiority of the Italian nation and national unity, Mussolini, unlike Hitler, is quoted "Race? It is a feeling, not a reality. Nothing will ever make me believe that biologically pure races can be shown to exist today".\n\nMussolini wanted Italy to become a new Roman Empire. In 1923, he attacked the island of Corfu, and in 1924, he occupied the city state of Fiume. In 1935, he attacked the African country Abyssinia (now called Ethiopia). His forces occupied it in 1936. Italy was thrown out of the League of Nations because of this aggression. In 1939, he occupied the country Albania. In 1936, Mussolini signed an alliance with Adolf Hitler, the dictator of Germany.\n\nFall from power and death\nIn 1940, he sent Italy into the Second World War on the side of the Axis countries. Mussolini attacked Greece, but he failed to conquer it. In 1943, the Allies landed in Southern Italy. The Fascist party and King Vittorio Emanuel III deposed Mussolini and put him in jail, but he was set free by the Germans, who made him ruler of the Italian Social Republic puppet state which was in a small part of Central Italy. When the war was almost over, Mussolini tried to escape to Switzerland with his mistress, Clara Petacci, but they were both captured and shot by partisans. Mussolini's dead body was hanged upside-down, together with his mistress and some of Mussolini's helpers, on a pole at a gas station in the village of Millan, which is near the border  between Italy and Switzerland.', lookup_str='', metadata={'chunk': 1.0, 'source': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini', 'title': 'Benito Mussolini', 'wiki-id': '6754'}, lookup_index=0),
 Document(page_content='Fascist Italy \nIn 1922, a new Italian government started. It was ruled by Benito Mussolini, the leader of Fascism in Italy. He became head of government and dictator, calling himself "Il Duce" (which means "leader" in Italian). He became friends with German dictator Adolf Hitler. Germany, Japan, and Italy became the Axis Powers. In 1940, they entered World War II together against France, Great Britain, and later the Soviet Union. During the war, Italy controlled most of the Mediterranean Sea.\n\nOn July 25, 1943, Mussolini was removed by the Great Council of Fascism. On September 8, 1943, Badoglio said that the war as an ally of Germany was ended. Italy started fighting as an ally of France and the UK, but Italian soldiers did not know whom to shoot. In Northern Italy, a movement called Resistenza started to fight against the German invaders. On April 25, 1945, much of Italy became free, while Mussolini tried to make a small Northern Italian fascist state called the Republic of Salò. The fascist state failed and Mussolini tried to flee to Switzerland and escape to Francoist Spain, but he was captured by Italian partisans. On 28 April 1945 Mussolini was executed by a partisan.\n\nAfter World War Two \n\nThe state became a republic on June 2, 1946. For the first time, women were able to vote. Italian people ended the Savoia dynasty and adopted a republic government.\n\nIn February 1947, Italy signed a peace treaty with the Allies. They lost all the colonies and some territorial areas (Istria and parts of Dalmatia).\n\nSince then Italy has joined NATO and the European Community (as a founding member). It is one of the seven biggest industrial economies in the world.\n\nTransportation \n\nThe railway network in Italy totals . It is the 17th longest in the world. 
High speed trains include ETR-class trains which travel at .', lookup_str='', metadata={'chunk': 5.0, 'source': 'https://simple.wikipedia.org/wiki/Italy', 'title': 'Italy', 'wiki-id': '363'}, lookup_index=0)]


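All of these are relevant results, which tells us that the retrieval component of our system is working. The next step is to add our LLM so that it can generatively answer our question using the information in these retrieved contexts.

Generative Question Answering

In generative question answering (GQA), we pass our question to the LLM but instruct it to base its answer on the information returned from our knowledge base. We can do this in LangChain easily using the RetrievalQA chain.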

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# completion llm
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
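
The "stuff" chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea in plain Python; the template wording, the separator, and the function name are assumptions for illustration and are not LangChain's actual prompt.

```python
def build_stuffed_prompt(question, contexts):
    """Join all retrieved chunks into one context block placed ahead of the question."""
    context_block = "\n\n---\n\n".join(contexts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuffed_prompt(
    "who was Benito Mussolini?",
    [
        "Benito Mussolini was an Italian politician and journalist.",
        "In 1922, a new Italian government started.",
    ],
)
print(prompt)
```

Since every retrieved chunk lands in the same prompt, the number of chunks and their size together determine how much of the context window is consumed.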


Let's try this with our earlier query:

In[22]:

qa.run(query)


Out[22]:

'Benito Mussolini was an Italian politician and journalist who served as the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party and invented the ideology of Fascism. Mussolini was a dictator of Italy by the end of 1927, and his form of Fascism, "Italian Fascism," was different and less destructive than Hitler's Nazism. Mussolini wanted Italy to become a new Roman Empire and attacked several countries, including Abyssinia (now called Ethiopia) and Greece. He was removed from power in 1943 and was executed by Italian partisans in 1945.'


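The response we get this time is generated by our gpt-3.5-turbo LLM based on the information retrieved from our vector database.

Even so, we cannot fully prevent the model from producing convincing yet false hallucinations; they can still occur, and it is unlikely that the problem can be eliminated entirely. We can, however, do more to improve trust in the answers provided.

An effective way of doing this is to add citations to the response, allowing users to see where the information comes from. We can do this with a slightly different version of the RetrievalQA chain called RetrievalQAWithSourcesChain.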

In[23]:

from langchain.chains import RetrievalQAWithSourcesChain

qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)


In[24]:

qa_with_sources(query)


Out[24]:

{'question': 'who was Benito Mussolini?',
 'answer': 'Benito Mussolini was an Italian politician and journalist who was the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party and invented the ideology of Fascism. He became dictator of Italy by the end of 1927 and was friends with German dictator Adolf Hitler. Mussolini attacked Greece and failed to conquer it. He was removed by the Great Council of Fascism in 1943 and was executed by a partisan on April 28, 1945. After the war, several Neo-Fascist movements have had success in Italy, the most important being the Movimento Sociale Italiano. His granddaughter Alessandra Mussolini has outspoken views similar to Fascism. \n',
 'sources': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini, https://simple.wikipedia.org/wiki/Fascism'}


Now we have answered the question and also included the sources of the information used by the LLM.
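
Independently of how the chain assembles its 'sources' field, a simple deterministic way to show provenance is to list the 'source' metadata of every retrieved chunk. The sketch below assumes metadata dictionaries shaped like the ones stored during indexing; the function name is hypothetical.

```python
def collect_sources(retrieved_metadatas):
    """Return unique 'source' values in the order they were first retrieved."""
    seen = set()
    sources = []
    for metadata in retrieved_metadatas:
        source = metadata.get("source")
        if source and source not in seen:
            seen.add(source)
            sources.append(source)
    return sources


retrieved = [
    {"source": "https://simple.wikipedia.org/wiki/Benito%20Mussolini", "chunk": 0},
    {"source": "https://simple.wikipedia.org/wiki/Benito%20Mussolini", "chunk": 1},
    {"source": "https://simple.wikipedia.org/wiki/Italy", "chunk": 5},
]
print(", ".join(collect_sources(retrieved)))
```

This lists every source that was retrieved, which may be a superset of the sources the LLM actually drew on for its answer.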

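We've learned the basics of grounding large language models with source knowledge by using a vector database as our knowledge base. With this, we can encourage accuracy in our LLM's responses, keep source knowledge up to date, and improve trust in our system by providing citations with every answer.

We're already seeing LLMs and knowledge bases paired together in major products such as Bing's AI search, Google Bard, and ChatGPT plugins. Without a doubt, the future of LLMs is tightly coupled with high-performance, scalable, and reliable knowledge bases.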
