[Python] The langchain-huggingface library: integrating Hugging Face models and tools into the LangChain framework

彬彬侠

Published 2025-05-06 16:43:01

Article link: https://blog.csdn.net/u013172930/article/details/147742528

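langchain-huggingface is a package in the LangChain ecosystem that integrates Hugging Face models and tools into the LangChain framework. LangChain is a framework for building applications on top of language models, and langchain-huggingface connects it to the pretrained models, tokenizers, and embedding models on the Hugging Face Hub, covering text generation, chat, and embedding generation. It is particularly useful for developers who want to build applications such as RAG pipelines and agents on open models from the Hugging Face ecosystem (for example LLaMA, Mistral, BERT).

The sections below describe the library in detail: what it does, how to use it, and how it integrates with Hugging Face.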

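1. What langchain-huggingface does

Model integration: runs pretrained Hugging Face models for text generation and chat. PyTorch checkpoints are loaded through transformers; GGUF files run through llama-cpp-python.
Embedding generation: uses Hugging Face embedding models to turn text into vectors for semantic search and RAG (retrieval-augmented generation).
Tokenizer support: uses the Hugging Face tokenizer that ships with each model, so inputs are tokenized the way the model expects.
LangChain compatibility: works with LangChain prompt templates, memory, agents, and tool chains.
Hugging Face Hub integration: downloads models, datasets, and tokenizers from the Hugging Face Hub, which simplifies model management.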

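2. Installation and requirements

Python version: Python 3.9+ for recent releases (early releases also supported 3.8); check the package's PyPI page for the current minimum.
Dependencies:
- langchain-core: the LangChain core library.
- huggingface_hub: downloads models and tokenizers.
- transformers: Hugging Face's core library, used to load PyTorch models.
- sentence-transformers: used for embedding models.
- Optional: llama-cpp-python (for GGUF models).

Install commands:

pip install langchain-huggingface
pip install transformers huggingface_hub sentence-transformers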

GPU support (NVIDIA CUDA as the example):

Make sure a CUDA-enabled build of torch is installed alongside transformers:

pip install torch transformers --extra-index-url https://download.pytorch.org/whl/cu121

For GGUF models, build llama-cpp-python with CUDA enabled:

CMAKE_ARGS="-DGGML_CUDA=ON" pip install llama-cpp-python

Hugging Face token (needed for gated models such as LLaMA):

export HF_TOKEN="your_token"

Or set it in Python:

import os
os.environ["HF_TOKEN"] = "your_token"
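Before loading a gated model it can help to confirm that a token is actually visible to the process. The sketch below is a minimal check, assuming the two environment variable names that huggingface_hub looks at: HF_TOKEN (current) and HUGGING_FACE_HUB_TOKEN (older name). The helper name resolve_hf_token is hypothetical and not part of any library.

```python
import os


def resolve_hf_token(env=None):
    """Return the first non-empty Hugging Face token found, or None.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. The helper itself is illustrative, not a library API.
    """
    env = os.environ if env is None else env
    # HF_TOKEN is the current variable name; the second one is the older name.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


print("token found" if resolve_hf_token() else "no token set")
```

If the check reports that no token is set, downloads of public models still work; only gated or private repositories need the token.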

Verify the installation:

from langchain_huggingface import HuggingFacePipeline
print(HuggingFacePipeline.__module__)  # prints: langchain_huggingface.llms.huggingface_pipeline
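The import check above fails with a traceback if anything is missing. As a gentler alternative, the following sketch lists which of the packages from this section are not importable, without importing them. It uses only the standard library; the helper name missing_packages and the REQUIRED list are illustrative choices.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names (underscores), not pip names (hyphens).
REQUIRED = ["langchain_huggingface", "transformers", "huggingface_hub", "sentence_transformers"]

missing = missing_packages(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```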

3. Core features and usage

langchain-huggingface provides several core classes for plugging Hugging Face models into LangChain workflows, including HuggingFacePipeline (text generation), ChatHuggingFace (chat models), and HuggingFaceEmbeddings (embedding generation).

3.1 Text generation (HuggingFacePipeline)

Use HuggingFacePipeline to load a Hugging Face model for text generation.

from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Load a Hugging Face model
hf_pipeline = pipeline("text-generation", model="gpt2", max_length=50)
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Generate text
response = llm.invoke("The future of AI is")
print(response)

Sample output (actual text will vary):

The future of AI is bright, with advancements in machine learning and natural language processing driving innovation across industries.

Notes:

pipeline comes from transformers and supports several tasks (such as text-generation and text2text-generation).
HuggingFacePipeline wraps the Hugging Face pipeline so that it can be used as a LangChain LLM.
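One detail worth knowing about the max_length setting used above: for decoder-only models such as gpt2, transformers counts the prompt tokens toward max_length, whereas max_new_tokens limits only the newly generated tokens and takes precedence when both are set. The arithmetic can be sketched as follows; new_token_budget is a hypothetical helper written for illustration, not a transformers function.

```python
def new_token_budget(prompt_tokens, max_length=None, max_new_tokens=None):
    """Number of tokens generation may add under each length setting.

    Models the decoder-only case: max_length covers prompt + output,
    max_new_tokens covers output only and wins if both are given.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(0, max_length - prompt_tokens)
    raise ValueError("set max_length or max_new_tokens")


# A 10-token prompt with max_length=50 leaves room for 40 new tokens.
print(new_token_budget(10, max_length=50))
# A 60-token prompt with max_length=50 leaves no room at all.
print(new_token_budget(60, max_length=50))
```

For prompts of unpredictable length, max_new_tokens is usually the safer setting because the output budget does not shrink as the prompt grows.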

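3.2 Chat models (ChatHuggingFace)

ChatHuggingFace wraps an existing Hugging Face LLM (for example one loaded with HuggingFacePipeline) and exposes it through LangChain's role-based chat message interface.

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from langchain_core.messages import HumanMessage, SystemMessage

# Load the model (needs a GPU or a quantized model to keep memory manageable)
base_llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    # return_full_text=False keeps the prompt out of the returned text
    pipeline_kwargs={"max_new_tokens": 100, "return_full_text": False},
)
# Wrap the LLM so that messages go through the model's chat template
llm = ChatHuggingFace(llm=base_llm)

# Chat
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the capital of France?")
]
response = llm.invoke(messages)
print(response.content)

Sample output (actual text will vary):

The capital of France is Paris.

Notes:

model_id: the Hugging Face model ID passed to HuggingFacePipeline.from_model_id (for example meta-llama/Llama-3.2-3B-Instruct).
LangChain message types such as HumanMessage and SystemMessage are supported; whether a system message is accepted depends on the model's chat template.
Large models need a GPU or a quantized version.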
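Chat models in the Hugging Face ecosystem generally consume a list of role/content dictionaries, which the tokenizer's chat template then turns into a single prompt string. A wrapper therefore has to map LangChain message types onto those roles. The sketch below illustrates that mapping with plain tuples standing in for message objects; the type names "system", "human", and "ai" match LangChain's message type strings, while to_role_dicts is a hypothetical helper and not the library's internal code.

```python
# LangChain message type -> role name used by chat templates
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_role_dicts(messages):
    """Convert (message_type, content) pairs into role/content dicts."""
    converted = []
    for message_type, content in messages:
        if message_type not in ROLE_BY_TYPE:
            raise ValueError(f"unsupported message type: {message_type}")
        converted.append({"role": ROLE_BY_TYPE[message_type], "content": content})
    return converted


conversation = [
    ("system", "You are a helpful AI assistant."),
    ("human", "What is the capital of France?"),
]
for entry in to_role_dicts(conversation):
    print(entry)
```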

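3.3 GGUF model support

GGUF models run through llama-cpp-python. Download the file with huggingface_hub, then load it with the LlamaCpp wrapper from langchain-community (requires pip install langchain-community llama-cpp-python).

from huggingface_hub import hf_hub_download
from langchain_community.llms import LlamaCpp

# Download the GGUF file from the Hugging Face Hub (cached after the first call)
model_path = hf_hub_download(
    repo_id="hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF",
    filename="llama-3.2-3b-instruct-q8_0.gguf",
)

# Load the GGUF model through llama-cpp-python
llm = LlamaCpp(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=32,  # layers offloaded to the GPU
)

# Generate text
response = llm.invoke("Explain AI in simple terms.")
print(response)

Notes:

hf_hub_download fetches the GGUF file from the Hugging Face Hub, caches it, and returns the local path.
n_ctx (context length) and n_gpu_layers (number of layers offloaded to the GPU) are passed on to llama-cpp-python.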

3.4 Embedding generation (HuggingFaceEmbeddings)

Use HuggingFaceEmbeddings to turn text into vectors for semantic search or RAG.

from langchain_huggingface import HuggingFaceEmbeddings

# Load the embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Embed documents
texts = ["Hello, world!", "AI is amazing."]
embed_vectors = embeddings.embed_documents(texts)
print(len(embed_vectors), len(embed_vectors[0]))  # prints: 2 384

# Embed a query
query_vector = embeddings.embed_query("What is AI?")
print(len(query_vector))  # prints: 384

Notes:

model_name: sentence-transformers models are recommended, such as all-MiniLM-L6-v2 (lightweight, 384 dimensions).
embed_documents: embeds a batch of documents and returns one vector per document.
embed_query: embeds a single query string.
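Semantic search comes down to comparing the query vector against each document vector, most often with cosine similarity. The sketch below shows that comparison in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings (which would have 384 dimensions with all-MiniLM-L6-v2), and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors):
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embed_documents / embed_query output
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query_vector = [0.9, 0.1, 0.0]

best_index, best_score = rank_documents(query_vector, doc_vectors)[0]
print(best_index, round(best_score, 3))
```

In practice a vector store such as FAISS performs this ranking far more efficiently over large collections; the loop here is only meant to show what is being computed.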

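3.5 Integration with LangChain workflows

Combine LangChain prompt templates, memory, and retrievers to build more complex applications.

Example (RAG application):

from langchain_huggingface import ChatHuggingFace, HuggingFaceEmbeddings, HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough

# Embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Index the documents (FAISS needs the faiss-cpu package)
docs = ["Python is a programming language.", "AI is transforming industries."]
vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()

# Chat model: ChatHuggingFace wraps an LLM loaded with HuggingFacePipeline
base_llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 100, "return_full_text": False},
)
llm = ChatHuggingFace(llm=base_llm)

# Prompt template
prompt = ChatPromptTemplate.from_template(
    "Context: {context}\nQuestion: {question}\nAnswer:"
)

# Join the retrieved documents into one context string
def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)

# Build the RAG chain
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Query
response = chain.invoke("What is Python?")
print(response.content)

Sample output (actual text will vary):

Python is a programming language known for its simplicity and versatility, widely used in web development, data science, and AI.

Notes:

FAISS: the vector store used to retrieve relevant documents.
ChatPromptTemplate: formats the context and the question into a prompt.
RunnablePassthrough: passes the user input through to the chain unchanged.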
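A retriever returns document objects rather than strings, so the context usually has to be flattened into text before it reaches the prompt. The sketch below shows that step and the resulting prompt using only the standard library. RetrievedDoc is a hypothetical stand-in for LangChain's Document (only the page_content field is modelled), and build_prompt is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document; only page_content is modelled."""
    page_content: str


# Same template text as the RAG example above
TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer:"


def format_docs(docs):
    """Join the text of all retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    """Fill the template with the flattened context and the user question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


retrieved = [
    RetrievedDoc("Python is a programming language."),
    RetrievedDoc("AI is transforming industries."),
]
print(build_prompt(retrieved, "What is Python?"))
```

Printing the final prompt like this is also a handy way to debug a RAG chain, since it shows exactly what the model receives.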

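4. Advantages of the Hugging Face integration

Broad model selection: access to thousands of open models on the Hugging Face Hub (such as LLaMA, Mistral, Qwen).
Quantization support: quantized GGUF models can be run through llama-cpp-python, lowering hardware requirements.
Tokenization consistency: transformers.AutoTokenizer loads the tokenizer that matches the model.
Automated management: huggingface_hub downloads and caches models automatically.
Community support: the Hub provides detailed model cards, and community accounts such as TheBloke and hugging-quants publish quantized versions.

Recommended models:

Text generation: meta-llama/Llama-3.2-3B-Instruct, mistralai/Mixtral-8x7B-Instruct-v0.1
Embedding generation: sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-small-en-v1.5
GGUF models: hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF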
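5. Performance and optimization

Efficiency:
- HuggingFacePipeline can use the optimizations available in transformers, such as FP16 weights and INT8 quantization.
- GGUF models run through llama-cpp-python suit low-resource devices.
GPU acceleration:
- Make sure torch is installed with CUDA support, then place the model on the GPU when loading it:
```
llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    device=0,  # first CUDA device
)
```
- GGUF models are accelerated through the n_gpu_layers option of llama-cpp-python.
Memory management:
- Small embedding models such as all-MiniLM-L6-v2 have a low memory footprint (about 100 MB).
- Large models need 16 GB or more of RAM or GPU memory; quantized models are recommended.
Batch processing:
- embed_documents embeds texts in batches, which speeds up processing.
- Use the batch_size parameter of pipeline:
```
hf_pipeline = pipeline("text-generation", model="gpt2", batch_size=8)
```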
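The memory guidance above follows from simple arithmetic: the weights alone take roughly (number of parameters) × (bits per parameter) / 8 bytes. The sketch below works this out for a 7-billion-parameter model, chosen here purely as an illustrative size. It ignores activations, the KV cache, and framework overhead, so real usage is higher; estimate_weight_memory_gb is a hypothetical helper.

```python
def estimate_weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes).

    Covers weights only; activations, KV cache, and overhead are excluded.
    """
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    size_gb = estimate_weight_memory_gb(7e9, bits)
    print(f"7B parameters at {label}: about {size_gb:.1f} GB")
```

Halving the bits per parameter halves the weight footprint, which is why quantized models fit on hardware that cannot hold the FP16 version.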

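6. Practical use cases

Chatbots: build dialogue systems with ChatHuggingFace.
RAG systems: combine HuggingFaceEmbeddings with a retriever for knowledge-augmented generation.
Text summarization: use a generative model to summarize long documents.
Semantic search: use an embedding model to search documents.
Code generation: load a CodeLlama or StarCoder model to generate code.

Example (chatbot):

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate

# Load the model
base_llm = HuggingFacePipeline.from_model_id(
    model_id="Qwen/Qwen2-0.5B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 100, "return_full_text": False},
)
llm = ChatHuggingFace(llm=base_llm)

# Define the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant."),
    ("human", "{input}")
])

# Build the chain
chain = prompt | llm

# Chat
response = chain.invoke({"input": "Tell me a joke."})
print(response.content)

Sample output (actual text will vary):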

Why did the computer go to art school? Because it wanted to learn how to draw a better "byte"!
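
A chain like this one is stateless: each call sees only the messages passed in. For a multi-turn chatbot the caller has to resend earlier turns, and the history must be bounded so that it fits the model's context window. The sketch below shows one simple policy, keeping the system message plus the most recent messages. It uses (role, content) tuples like those accepted by ChatPromptTemplate.from_messages; trim_history and max_messages are hypothetical names, and counting messages rather than tokens is a simplification.

```python
def trim_history(messages, max_messages):
    """Keep all system messages plus the last `max_messages` other messages."""
    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [
    ("system", "You are a friendly AI assistant."),
    ("human", "Tell me a joke."),
    ("ai", "Why did the computer go to art school?"),
    ("human", "I don't know, why?"),
]

# Only the system message and the two most recent messages are kept.
for role, content in trim_history(history, max_messages=2):
    print(f"{role}: {content}")
```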