How to Self-Host AI Embeddings: A Deep Dive into Self-Hosted Embeddings and Hugging Face Models

llzwxh888

Published 2024-10-07 02:54:05

Tags: artificial intelligence, python

Original link: https://blog.csdn.net/ppoojjj/article/details/142734489

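In this article, we look at how to load and use Hugging Face models in a self-hosted environment with LangChain's SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes (from langchain_community.embeddings). We provide practical code examples, discuss potential challenges and their solutions, and point you to further resources.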

Introduction

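Most developers are used to consuming pretrained models through cloud-hosted services, but as the need for privacy and control over data grows, self-hosting AI models is becoming an important trend. In this article, we will learn how to run these embedding models on a local machine or on a cluster you control.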

Main Content

1. Environment Setup

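First, choose suitable hardware for the model. The examples in this article describe that hardware with Runhouse; for instance, an on-demand cluster with a single A100 GPU: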

import runhouse as rh

# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

2. Loading a Model with SelfHostedHuggingFaceEmbeddings

We can load a common pretrained model and generate text embeddings with it. Here is an example using SelfHostedHuggingFaceEmbeddings:

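from langchain_community.embeddings import SelfHostedHuggingFaceEmbeddings

# `gpu` is the Runhouse cluster created in step 1; with no model_id given,
# the class falls back to its default sentence-transformers model
embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)
text = "This is a test document."
query_result = embeddings.embed_query(text)  # a single vector (list of floats)
doc_result = embeddings.embed_documents([text])  # one vector per input text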
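embed_query returns a plain list of floats, and LangChain embedding classes also expose embed_documents for a list of texts. A typical next step is to compare the query vector against document vectors. The sketch below ranks documents by cosine similarity using only the standard library; the 3-dimensional vectors are made-up toy values standing in for real model output, and the helper names cosine_similarity and rank_documents are hypothetical, not part of LangChain.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float], doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for query_result and the output of embed_documents(...)
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
print(rank_documents(query_vec, doc_vecs))  # [1, 2, 0]
```

With real embeddings the vectors have hundreds of dimensions, but the comparison works the same way; for large collections a vector store would normally replace this linear scan.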

3. Custom Loading Function

Sometimes you need to customize how the model is loaded, for example to use a specific Hugging Face model. Here is an example:

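def get_pipeline():
    # Runs on the remote hardware: build a feature-extraction pipeline
    from transformers import AutoModel, AutoTokenizer, pipeline

    model_id = "facebook/bart-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # AutoModel (not AutoModelForCausalLM) so the pipeline returns hidden
    # states rather than vocabulary logits
    model = AutoModel.from_pretrained(model_id)
    return pipeline("feature-extraction", model=model, tokenizer=tokenizer)


def inference_fn(pipeline, prompt):
    # The pipeline returns [1][num_tokens][hidden_dim] per input;
    # [0][-1] takes the last token's vector from the final hidden layer
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]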

from langchain_community.embeddings import SelfHostedEmbeddings

embeddings = SelfHostedEmbeddings(
    model_load_fn=get_pipeline,  # builds the model on the cluster
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],  # requirements to set up on the cluster
    inference_fn=inference_fn,  # called with (pipeline, text or list of texts)
)

query_result = embeddings.embed_query(text)
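The key contract in step 3 is that inference_fn receives whatever get_pipeline returned, plus either one string or a list of strings. Because a feature-extraction pipeline returns nested lists shaped [1][num_tokens][hidden_dim] for each input, `[0][-1]` drops the batch axis and keeps the last token's vector. The sketch below lets you check that indexing without a GPU or transformers: fake_pipeline, LocalEmbeddings, last_token_fn, and mean_pool_fn are all hypothetical names, and LocalEmbeddings only mimics the "load once, infer per call" pattern locally; it is not how SelfHostedEmbeddings actually dispatches work to remote hardware.

```python
from typing import Callable, List, Union

# One input's output: [batch=1][num_tokens][hidden_dim]
Hidden = List[List[List[float]]]


def fake_pipeline(prompt: Union[str, List[str]]) -> Union[Hidden, List[Hidden]]:
    """Hypothetical stand-in for a feature-extraction pipeline.

    Each whitespace-separated token becomes the 2-d vector [len(token), position],
    wrapped in the same nesting a real pipeline returns.
    """
    def encode(text: str) -> Hidden:
        return [[[float(len(tok)), float(i)] for i, tok in enumerate(text.split())]]

    if isinstance(prompt, list):
        return [encode(p) for p in prompt]
    return encode(prompt)


def last_token_fn(pipeline: Callable, prompt):
    # Same indexing as inference_fn: [0] drops the batch axis, [-1] picks the last token
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]


def mean_pool_fn(pipeline: Callable, prompt):
    # Alternative: average every token vector instead of keeping only the last one
    def pool(emb: Hidden) -> List[float]:
        tokens = emb[0]
        return [sum(col) / len(tokens) for col in zip(*tokens)]

    if isinstance(prompt, list):
        return [pool(emb) for emb in pipeline(prompt)]
    return pool(pipeline(prompt))


class LocalEmbeddings:
    """Hypothetical local mimic of the model_load_fn / inference_fn pattern."""

    def __init__(self, model_load_fn: Callable, inference_fn: Callable):
        self._pipeline = model_load_fn()  # load the model once
        self._inference_fn = inference_fn

    def embed_documents(self, texts: List[str]):
        return self._inference_fn(self._pipeline, texts)

    def embed_query(self, text: str):
        return self._inference_fn(self._pipeline, text)


last = LocalEmbeddings(lambda: fake_pipeline, last_token_fn)
mean = LocalEmbeddings(lambda: fake_pipeline, mean_pool_fn)
print(last.embed_query("This is a test document."))  # [9.0, 4.0]
print(mean.embed_query("This is a test document."))  # [4.0, 2.0]
print(last.embed_documents(["a b", "hello"]))  # [[1.0, 1.0], [5.0, 0.0]]
```

Mean pooling is a common alternative for sentence embeddings because it uses every token rather than only the last one; which pooling works better depends on the model, so treat the choice in inference_fn as something to evaluate. Note that both helpers assume non-empty input text.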