Rapidly Building Large-Model Applications with Hugging Face and LangChain

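Hugging Face is a startup focused on natural language processing (NLP), artificial intelligence, and distributed systems, founded in 2016. Its original main business was chatbots. After BERT was released in 2018, the company contributed a PyTorch-based pretrained BERT implementation, pytorch-pretrained-bert, which became very popular, and it then shifted its focus to maintaining the open-source NLP community.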

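Hugging Face consolidated the pretrained NLP models it had contributed and released the Transformers library. Transformers provides thousands of pretrained models (including the familiar BERT, GPT, GPT-2, and XLM) and supports text classification, information extraction, question answering, summarization, translation, and text generation in more than 100 languages. Its goal is to make state-of-the-art NLP easy for everyone to use.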

1.2 Activity

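The Hugging Face model hub hosts more than 60,000 shared models, and the dataset hub hosts more than 8,000 shared datasets. In keeping with the open-source spirit, all of these resources are free to use. The Hugging Face codebase is also evolving quickly. Because Hugging Face started out focused on NLP tasks, most of its models and datasets are NLP-oriented, but image and speech models are being added rapidly and are expected to become as complete and standardized as the NLP ones.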

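1.3 Toolset

Hugging Face divides the development of an AI project roughly into the following parts, as shown in Figure 1-1.

Figure 1-1

For each stage of the workflow, Hugging Face provides many utility classes that help developers implement it quickly. The toolset Hugging Face provides is shown in Figure 1-2.

Figure 1-2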

2. Hugging Face Tools

2.1 Pipelines

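2.1.1 Definition

A pipeline is an object designed to encapsulate most of the complex code in the Transformers library. It offers a simple API for running tasks such as named entity recognition (NER) and sentiment analysis. With a pipeline, users can pass text to a model and get the corresponding output. The process has three main steps:

Text preprocessing: convert the text into a format the model can understand.
Model prediction: feed the preprocessed input to the model for inference.
Postprocessing: process the model's predictions so that they carry concrete business meaning.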
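
The three stages can be pictured with a toy, dependency-free sketch. Everything here (the `ToySentimentPipeline` class, its tiny word lists, and the scoring rule) is hypothetical and only mirrors the preprocess, predict, postprocess flow; it is not how the Transformers library implements `pipeline`.

```python
from typing import Dict, List


class ToySentimentPipeline:
    """Hypothetical stand-in that mirrors the three pipeline stages."""

    # Assumed toy lexicon; a real pipeline would use a trained model instead.
    POSITIVE = {"good", "great", "love"}
    NEGATIVE = {"bad", "awful", "hate"}

    def preprocess(self, text: str) -> List[str]:
        # Stage 1: turn raw text into a format the "model" understands (tokens).
        return [tok.strip(".,!?").lower() for tok in text.split()]

    def predict(self, tokens: List[str]) -> int:
        # Stage 2: the "model" produces a raw score from the encoded input.
        return sum((t in self.POSITIVE) - (t in self.NEGATIVE) for t in tokens)

    def postprocess(self, score: int) -> Dict[str, object]:
        # Stage 3: map the raw score to a label with business meaning.
        label = "POSITIVE" if score > 0 else "NEGATIVE" if score < 0 else "NEUTRAL"
        return {"label": label, "score": score}

    def __call__(self, text: str) -> Dict[str, object]:
        return self.postprocess(self.predict(self.preprocess(text)))


if __name__ == "__main__":
    classify = ToySentimentPipeline()
    print(classify("I love this great phone!"))
```

Calling the object chains the three methods in order, which is the same shape of call a user makes when passing text to a real pipeline.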

2.1.2 Common parameters

Parameter	Meaning
task	The task defining which pipeline will be returned.
model	The model that will be used by the pipeline to make predictions. This can be a model identifier or an actual instance of a pretrained model inheriting from PreTrainedModel (for PyTorch) or TFPreTrainedModel (for TensorFlow).
tokenizer	The tokenizer that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained tokenizer inheriting from PreTrainedTokenizer.
feature_extractor	The feature extractor that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained feature extractor inheriting from PreTrainedFeatureExtractor.

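2.2 AutoClass

2.2.1 Definition

Because there are many different Transformer architectures, creating the right architecture for your checkpoint can be challenging. AutoClass automatically infers and loads the correct architecture from a given checkpoint, which is an important part of the Transformers core philosophy of being easy, simple, and flexible to use.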
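
The idea of inferring the architecture from a checkpoint can be sketched as a registry lookup keyed on the checkpoint's configuration. The names below (`ToyAutoModel`, `MODEL_REGISTRY`, `BertLike`, `GptLike`, and the `model_type` key of the config dictionary) are assumptions made for illustration; this is a simplified picture, not the Transformers implementation.

```python
from typing import Callable, Dict


class BertLike:
    def __init__(self, config: dict):
        self.config = config


class GptLike:
    def __init__(self, config: dict):
        self.config = config


# Hypothetical registry mapping an architecture tag to the class that builds it.
MODEL_REGISTRY: Dict[str, Callable[[dict], object]] = {
    "bert": BertLike,
    "gpt2": GptLike,
}


class ToyAutoModel:
    """Picks the concrete class from the checkpoint's config, not from user code."""

    @staticmethod
    def from_config(config: dict) -> object:
        model_type = config.get("model_type")
        if model_type not in MODEL_REGISTRY:
            raise ValueError(f"Unrecognized model_type: {model_type!r}")
        return MODEL_REGISTRY[model_type](config)


if __name__ == "__main__":
    model = ToyAutoModel.from_config({"model_type": "bert", "hidden_size": 768})
    print(type(model).__name__)
```

The caller never names the concrete class; swapping the checkpoint is enough to get a different architecture.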

2.2.2 Supported AutoClass types

Task type	AutoClass name
NLP tasks	AutoTokenizer
Vision tasks	AutoImageProcessor
Audio tasks	AutoFeatureExtractor
Multimodal tasks	AutoProcessor

3. Hugging Face Case Studies

3.1 Speech recognition with Pipelines

Example inference script for Facebook's speech recognition model:

from transformers import pipeline

model_name = "./huggingface/model/wav2vec2-base-960h"
transcriber = pipeline(task="automatic-speech-recognition", model=model_name)
output = transcriber("mlk.flac")
print(output)

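Model inference result:

3.2 Conversational interaction with a large model using AutoClass

Example inference script for the 6B, 32K-token version of the Zhipu large model 3.0 (ChatGLM3):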

from transformers import AutoTokenizer, AutoModel

model_name = "./huggingface/model/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load in half precision and move the model to the GPU
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()

# First turn: start with an empty history
question = "What kind of company is Xiaomi?"
response, history = model.chat(tokenizer, question, history=[])
print('Question: {}'.format(question))
print('Answer: {}'.format(response))
print('-----------------------------------------------------------------------')

# Second turn: pass the history back so the model keeps the context
question = "What are the company's core products?"
response, history = model.chat(tokenizer, question, history=history)
print('Question: {}'.format(question))
print('Answer: {}'.format(response))

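Model inference result:

3.3 Retrieval-augmented generation (RAG) with LangChain

3.3.1 Workflow

Retrieval augmentation for a large model with a vector database is implemented with LangChain, ChatGLM, and Chroma. The workflow is as follows:

Simplifying the diagram, the whole process needs only five steps:

Document conversion, that is, document preprocessing
Document splitting
Vectorizing the document chunk by chunk and persisting the vectors
Similarity search over the document vectors
Knowledge augmentation of the retrieved results through prompt engineering (note: the large model can also be connected through an API, and other large models can be used)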
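
The five steps can be sketched end to end without any external library. This is a toy illustration under stated assumptions: `split_text`, `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, the "embedding" is a plain bag-of-words count rather than a neural model, and the final LLM call is left out.

```python
import math
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Step 2: cut text into fixed-size character windows with optional overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Step 3 (toy): a bag-of-words count vector instead of a neural embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Step 4: rank chunks by similarity to the question and keep the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Step 5: prepend the retrieved text to the question before calling an LLM."""
    return "Using the following information: " + " ".join(context) + " Answer: " + question


if __name__ == "__main__":
    # Step 1 (document conversion) is assumed done; these are already plain-text chunks.
    chunks = [
        "plan a costs 10 yuan per month",
        "plan b includes 20 gb of data",
        "the office opens at nine",
    ]
    question = "how much does plan a cost"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real system the vectors would be computed once and stored, so that only the question needs to be embedded at query time.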

3.3.2 Case study

Example script:

# 1 Convert the document to text (supported formats: txt, docx, md, pdf)
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Source document (the file name means "mobile plan information")
loader = UnstructuredFileLoader("移动套餐信息.docx")
data = loader.load()

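# 2 Split the text into chunks
# chunk_size is the maximum size of each chunk; chunk_overlap is the overlap between adjacent chunks, which helps preserve continuity
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(data)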

# 3 Convert the text chunks to vectors and persist them
from langchain.vectorstores import Chroma
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import sentence_transformers

embedding_model_dict = {
    "text2vec3": "huggingface/model/text2vec-base-chinese",
}
EMBEDDING_MODEL = "text2vec3"
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_dict[EMBEDDING_MODEL])
# Replace the underlying SentenceTransformer with one loaded on the GPU
embeddings.client = sentence_transformers.SentenceTransformer(
    embeddings.model_name, device='cuda')

db = Chroma.from_documents(split_docs, embeddings, persist_directory="./data/doc_embeding")
# Persist to disk
db.persist()

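# 4 Vector retrieval
# The query is kept in Chinese so that it matches the Chinese source document
question = "什么是移动水秀卡"
db = Chroma(persist_directory="./data/doc_embeding", embedding_function=embeddings)
similarDocs = db.similarity_search(question, k=5)

# Concatenate the retrieved chunks into one context string
info = ""
for similardoc in similarDocs:
    info = info + similardoc.page_content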

# 5 Knowledge augmentation with the large model
from transformers import AutoTokenizer, AutoModel

model_name = "./huggingface/model/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()
model = model.eval()

history = []
# Put the retrieved context in front of the original question
question = "Based on the following information: " + info + " Answer: " + question
response, history = model.chat(tokenizer, question, history=history)

# Print the question in red and the answer in green
print("\033[31m Question: {}\033[0m".format(question))
print("\033[32m Answer: {}\033[0m".format(response))

RAG knowledge-base augmented inference result:

3.4 Action Agent with LangChain

3.4.1 Action Agent parameters

agent_executor = initialize_agent(
    custom_tool_list,  # list of tools
    llm=model,  # the large model to use
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # agent type
    verbose=True,  # whether to print detailed logs
    max_iterations=10,  # maximum number of steps
    early_stopping_method="generate",  # the default "force" only returns a constant string; "generate" makes one final pass through the LLM to produce the output
    agent_kwargs={"prefix": custom_prefix},  # custom arguments
    handle_parsing_errors="Check your output and make sure it conforms",  # how to handle output-parsing errors
)

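3.4.2 Action Agent types

Name	Meaning	Applicable scenario
ZERO_SHOT_REACT_DESCRIPTION	Uses the ReAct framework to decide which tool to use based solely on the tool descriptions	General tasks
REACT_DOCSTORE	Uses the ReAct framework to interact with a document store; must be used with a Search tool and a Lookup tool	Document retrieval
SELF_ASK_WITH_SEARCH	An agent that combines the self-ask method with an external search tool; it breaks a complex question into several smaller questions and answers them one by one	Deep document retrieval
CONVERSATIONAL_REACT_DESCRIPTION	Adds memory to the first agent type; must be used with memory	Scenarios that need context
CHAT_ZERO_SHOT_REACT_DESCRIPTION	Adds chat capability to the first agent type	Immediate responses that need no context
CHAT_CONVERSATIONAL_REACT_DESCRIPTION	Adds memory and chat capability to the first agent type; must be used with memory	Immediate responses that need context
STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION	Adds chat capability and structured input to the first agent type	Immediate responses and precise Q&A
OPENAI_FUNCTIONS	A single function call	OpenAI scenarios
OPENAI_MULTI_FUNCTIONS	Multiple OPENAI_FUNCTIONS calls	OpenAI scenarios

Overall, there are 4 major categories and 9 subtypes.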

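3.4.3 ReAct framework

ReAct was proposed by Shunyu Yao et al. in October 2022 to combine reasoning (for example, chain-of-thought prompting) and acting (for example, action plan generation) in language models on tasks such as language understanding and interactive decision making. The reasoning framework is as follows:

Question: what the problem is
Thought: how to go about solving it
Action: the next action to take
Observation: the result of the action

The prompt states that the Thought, Action, and Observation steps can repeat N times, and instructs the LLM to output a Final Answer once it knows the final result.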
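
The Thought, Action, Observation cycle can be sketched as a small loop. This is a minimal illustration, not LangChain's agent executor: `react_loop` and `scripted_llm` are hypothetical names, the `Action: tool[input]` text format is an assumed convention, and the "LLM" is a scripted function so the example runs without a model.

```python
import re
from typing import Callable, Dict


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    """Alternate Thought/Action/Observation until the model emits a Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        output = llm(transcript)  # the model proposes the next Thought and Action
        transcript += output + "\n"
        final = re.search(r"Final Answer:\s*(.*)", output)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", output)
        if action is None:
            raise ValueError("Model output contained neither an Action nor a Final Answer")
        name, arg = action.group(1), action.group(2)
        observation = tools[name](arg)  # run the tool and feed the result back
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("Stopped: max_iterations reached without a Final Answer")


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real LLM: replies based on what it has observed."""
    if "Observation:" not in transcript:
        return "Thought: I should use the calculator.\nAction: square[7]"
    observed = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I now know the result.\nFinal Answer: {observed}"


if __name__ == "__main__":
    tools = {"square": lambda x: str(int(x) ** 2)}
    print(react_loop("What is 7 squared?", scripted_llm, tools))
```

The `max_iterations` guard plays the same role as the step limit on an agent: it stops a model that never reaches a Final Answer.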

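3.4.4 PromptTemplate types

A PromptTemplate helps the language model generate better responses. Two types are mainly used:

Custom prompt template:

from langchain.prompts import PromptTemplate

prompt = PromptTemplate(input_variables=["question", "answer"], template="Question: {question}\n{answer}")

Few-shot examples template:

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": " ",
    }
]

# Template used to render each example
example_prompt = PromptTemplate(input_variables=["question", "answer"], template="Question: {question}\n{answer}")
print(example_prompt.format(**examples[0]))

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    # suffix is a required field; this minimal one is an assumed choice
    suffix="Question: {input}",
    input_variables=["input"],
)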
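
How a few-shot template assembles its final prompt can be shown with plain string formatting. The sketch below is not LangChain code: `format_few_shot` is a hypothetical helper, and the arithmetic example and separator are assumptions chosen only to make the output easy to read.

```python
from typing import Dict, List


def format_few_shot(
    examples: List[Dict[str, str]],
    example_template: str,
    suffix: str,
    separator: str = "\n\n",
    **inputs: str,
) -> str:
    """Render each example with the example template, then append the final query."""
    rendered = [example_template.format(**ex) for ex in examples]
    rendered.append(suffix.format(**inputs))
    return separator.join(rendered)


if __name__ == "__main__":
    # Hypothetical worked example shown to the model before the real question.
    demo_examples = [{"question": "What is 2 + 2?", "answer": "4"}]
    print(
        format_few_shot(
            demo_examples,
            example_template="Question: {question}\n{answer}",
            suffix="Question: {input}",
            input="What is 3 + 3?",
        )
    )
```

The worked examples come first and the open question comes last, so the model continues the pattern it has just seen.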

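3.4.5 Case study

A calculator agent is built with LangChain and ChatGLM. The steps are:

1. Obtain the llm object

2. Obtain the tools object

3. Call initialize_agent to obtain the agent object

4. Call the agent's run

Because large models are not good at arithmetic, the agent type chosen here is CHAT_CONVERSATIONAL_REACT_DESCRIPTION, and the prompt is adjusted so that, for calculation scenarios, the model picks the external tool instead of trying to compute the answer itself.

Example script: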

from math import pi
from typing import Union

from langchain.agents import AgentType, initialize_agent
from langchain.memory import ConversationBufferMemory
from langchain.tools import BaseTool


class CircumferenceTool(BaseTool):
    name: str = "Circumference calculator"
    description: str = "Use this tool when you need to calculate the circumference of a circle from its radius"

    def _run(self, radius: Union[int, float]):
        print("execute run")
        return float(radius) * 2.0 * pi

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")


tools = [CircumferenceTool()]

# System message that steers the model toward tools for any math question
agent_template = """When given a math problem, no matter how simple, rely on your trusted tools and never try to answer it yourself\n\n"""


def _handle_error(error) -> str:
    # Return a truncated parsing-error message to the agent
    return str(error)[:50]


memory = ConversationBufferMemory(memory_key="chat_history", k=5, return_messages=True)

# GLM is the custom LLM wrapper around the local ChatGLM model (see the appendix); its definition is not shown here
agent = initialize_agent(
    tools,
    GLM(),
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    handle_parsing_errors=_handle_error,
    verbose=True,
    memory=memory,
    max_iterations=3,
    early_stopping_method="generate",
)

# Rebuild the agent prompt with the custom system message
new_prompt = agent.agent.create_prompt(
    system_message=agent_template,
    tools=tools,
)
agent.agent.llm_chain.prompt = new_prompt

agent("Can you calculate the circumference of a circle with a radius of 7.81 mm?")

Inference result:

3.4.6 Common issues

Issue 1:

ValueError: Tool name should be Intermediate Answer, got {'Weather'}

Answer: The wrong agent type was selected.

Issue 2:

pydantic.v1.error_wrappers.ValidationError: 2 validation errors for LLMChain

Answer: The LLM object passed in is defined incorrectly.

Issue 3:

ValueError: not enough values to unpack (expected 2, got 1)

Answer: The string is too long.

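3.5 PlanAndExecute Agent with LangChain

3.5.1 Action Agent versus PlanAndExecute Agent

In this example, the Action Agent miscalculates the proportion of students in the class who dance hip-hop. There are two main reasons:

It does not truly grasp the complexity of the problem.
When multiple steps are involved, it sometimes omits intermediate reasoning steps, so a PlanAndExecute agent is needed in such cases.

In summary, an Action Agent decides its next action from the output of the previous step, whereas a PlanAndExecute Agent plans all the steps first and then executes them in order.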
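
The plan-first, execute-in-order behavior can be sketched as follows. All names here (`fixed_planner`, `tool_executor`, `plan_and_execute`) and the prices are hypothetical; both the planner and the executor are hard-coded stand-ins for what would be LLM calls, so this only illustrates the control flow.

```python
from typing import Callable, List

# Hypothetical prices in yuan, chosen only for illustration.
PRICES = {"coca_cola": 5.0, "pepsi": 3.0}


def fixed_planner(question: str) -> List[str]:
    """Stand-in for the planner LLM: returns the whole plan before anything runs."""
    return ["price_difference", "cube_previous"]


def tool_executor(step: str, previous: List[float]) -> float:
    """Stand-in for the executor: runs one planned step, reading earlier results."""
    if step == "price_difference":
        return abs(PRICES["coca_cola"] - PRICES["pepsi"])
    if step == "cube_previous":
        return previous[-1] ** 3
    raise ValueError(f"Unknown step: {step}")


def plan_and_execute(
    question: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, List[float]], float],
) -> List[float]:
    plan = planner(question)  # 1. fix all steps up front
    results: List[float] = []
    for step in plan:  # 2. execute them strictly in order
        results.append(executor(step, results))
    return results


if __name__ == "__main__":
    print(plan_and_execute("Cube of the price difference?", fixed_planner, tool_executor))
```

Unlike an action agent's loop, the list of steps never changes once planning is done, so no intermediate step can be skipped.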

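3.5.2 Case study

A product price comparison agent is built with LangChain and ChatGLM. The steps are:

1. Obtain the llm object

2. Obtain the tools object

3. Obtain the planner and executor objects

4. Call PlanAndExecute to obtain the agent object

5. Call the agent's run

Execution walkthrough:

Step 1: The agent runs large-model inference through the prompt and produces an execution plan:

Plan inferred by the large model:

langchain prompt resp:Plan:

1. First, calculate the price difference between Coca-Cola and Pepsi.

2. Then, calculate the third power of the price difference.

3. Finally, answer the user's original question.

<END_OF_PLAN>

Step 2: The agent follows the steps of the plan, calls the tool, and carries out the reasoning: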

STEP1

Step: Calculate the price difference between Coca-Cola and Pepsi.

Response: The price difference between Coca-Cola and Pepsi is 2 yuan.

> Entering new AgentExecutor chain...

Action:

```

{

"action": "Calculator",

"action_input": "2^3"

}

```

Observation: Answer: 8

I know what to respond

Action:

```

{

"action": "Final Answer",

"action_input": "The 3rd power of the price difference between Coca-Cola and Pepsi is 8."

}

```

> Finished chain.

STEP2

Step: Take the 3rd power of the price difference.

Response: The 3rd power of the price difference between Coca-Cola and Pepsi is 8.

> Entering new AgentExecutor chain...

Action:

```

{

"action": "Final Answer",

"action_input": "The price difference between Coca-Cola and Pepsi is 2 yuan, and the 3rd power of the price difference is 8."

}

```

> Finished chain.

STEP3

Step: Given the above steps taken, please respond to the user's original question.

Response: The price difference between Coca-Cola and Pepsi is 2 yuan, and the 3rd power of the price difference is 8.

> Finished chain.

The price difference between Coca-Cola and Pepsi is 2 yuan, and the 3rd power of the price difference is 8.

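3.6 LoRA fine-tuning of ChatGLM-6B

3.6.1 LoRA definition

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique; here the rank is the number of basis vectors of the vector space. Its core idea is to apply an implicit low-rank transformation to the weight matrices of the large model.

Given a suitable dataset, this kind of fine-tuning can be applied to any NLP task, such as text classification, named entity recognition, translation, or chat. The case below is an NLP text classification case.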
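
The low-rank idea can be made concrete with a small numeric sketch. In the usual LoRA formulation the pretrained weight W stays frozen and the update is factored as B times A with rank r, scaled by alpha / r. The helper names, the tiny matrices, and the 4096 x 4096 layer size below are illustrative assumptions, not values taken from ChatGLM.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float], alpha: float) -> List[float]:
    """h = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""
    r = len(A)  # rank = number of rows of A
    base = matvec(W, x)  # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank path: project down to r, then back up
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2 x 2 weight
    A = [[1.0, 1.0]]              # r x k with r = 1
    B = [[0.5], [0.0]]            # d x r
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0))
    # Full update of an assumed 4096 x 4096 layer versus a rank-8 LoRA update
    print(4096 * 4096, trainable_params(4096, 4096, 8))
```

When B is all zeros, the output equals the frozen model's output, which is why LoRA training can start from exactly the pretrained behavior.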

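3.6.2 Case study

About the food-delivery review dataset:

It contains 4,000 question-answer pairs for training and 2,000 for testing. Sample question-answer pairs are shown below:

Core code for LoRA fine-tuning on the food-delivery review dataset: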

from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
import torch
import torch.nn as nn
from peft import get_peft_model, LoraConfig, TaskType

model = AutoModel.from_pretrained(
    "./huggingface/model/chatglm3-6b-32k",
    load_in_8bit=False,
    trust_remote_code=True,
)

# Trade compute for memory during training
model.supports_gradient_checkpointing = True
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

# LoRA settings: rank 8, scaling factor 32, dropout 0.1
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

# Wrap the base model so that only the LoRA parameters are trainable
model = get_peft_model(model, peft_config)
model.is_parallelizable = True
model.model_parallel = True
model.print_trainable_parameters()

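Before fine-tuning, the model's accuracy was 87.8%. After fine-tuning on about 6,000 examples for roughly one hour, accuracy reached 90.3%.

4. References

Hugging Face documentation: Installation

Hugging Face model hub: https://hf-mirror.com/models

LangChain documentation: https://www.langchain.com.cn/modules/agents/tools/custom_tools

Appendix

1 How to load a custom local large model

Subclass the parent class LLM and implement its _call method.

Reference link: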

langchain/libs/core/langchain_core/language_models/llms.py at master · langchain-ai/langchain · GitHub

class LLM(BaseLLM):
    """Base LLM abstract class.

    The purpose of this class is to expose a simpler interface for working
    with LLMs, rather than expect the user to implement the full _generate method.
    """

    # Methods marked as abstract must be implemented by subclasses
    @abstractmethod
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Run the LLM on the given prompt and input."""

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Run the LLM on the given prompt and input."""
        return await run_in_executor(
            None,
            self._call,
            prompt,
            stop,
            run_manager.get_sync() if run_manager else None,
            **kwargs,
        )

    def _generate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> LLMResult:
        """Run the LLM on the given prompt and input."""
        # TODO: add caching here.
        generations = []
        new_arg_supported = inspect.signature(self._call).parameters.get("run_manager")
        for prompt in prompts:
            text = (
                self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
                if new_arg_supported
                else self._call(prompt, stop=stop, **kwargs)
            )
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)

    async def _agenerate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> LLMResult:
        """Run the LLM on the given prompt and input."""
        generations = []
        new_arg_supported = inspect.signature(self._acall).parameters.get("run_manager")
        for prompt in prompts:
            text = (
                await self._acall(prompt, stop=stop, run_manager=run_manager, **kwargs)
                if new_arg_supported
                else await self._acall(prompt, stop=stop, **kwargs)
            )
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)
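
The subclassing pattern can be tried out without LangChain installed. The sketch below is a stripped-down stand-in under stated assumptions: `MiniLLM` and `LocalEchoLLM` are hypothetical classes that only mimic the relationship between an abstract `_call` and a batch method that delegates to it; a real wrapper would subclass LangChain's `LLM` and call the local model inside `_call`.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class MiniLLM(ABC):
    """Hypothetical, simplified stand-in for the base-class pattern."""

    @abstractmethod
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """Subclasses implement this single method."""

    def generate(self, prompts: List[str], stop: Optional[List[str]] = None) -> List[str]:
        # The base class loops over prompts and delegates each one to _call.
        return [self._call(p, stop=stop) for p in prompts]


class LocalEchoLLM(MiniLLM):
    """Pretend local model; a real subclass would query the loaded model here."""

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = f"echo: {prompt}"
        for token in stop or []:
            text = text.split(token)[0]  # truncate at the first stop sequence
        return text


if __name__ == "__main__":
    llm = LocalEchoLLM()
    print(llm.generate(["hello", "a|b"], stop=["|"]))
```

Because `_call` is abstract, instantiating the base class directly fails, which is what forces every custom model wrapper to supply its own implementation.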