Exploring IBM Watsonx.ai: Efficient Embeddings with LangChain

tt_jishu

Tags: artificial intelligence, langchain, python

Article link: https://blog.csdn.net/tt_jishu/article/details/142378937

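Introduction

IBM Watsonx.ai is a widely watched tool in the AI space. For natural language processing in particular, WatsonxEmbeddings, a wrapper around the IBM Watsonx.ai foundation models, provides text embedding functionality. This article walks through how to use LangChain to communicate with Watsonx.ai and generate embeddings.

Main Content

Setting Up the Environment

First, make sure the required library is installed: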

!pip install -qU langchain-ibm

Next, set the WML (Watson Machine Learning) credentials. You need to provide the API key of an IBM Cloud user.

import os
from getpass import getpass

watsonx_api_key = getpass()
os.environ["WATSONX_APIKEY"] = watsonx_api_key

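You can also pass other credentials through environment variables, such as the service URL or the login details for a CPD (Cloud Pak for Data) cluster: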

os.environ["WATSONX_URL"] = "your service instance url"
os.environ["WATSONX_TOKEN"] = "your token for accessing the CPD cluster"
os.environ["WATSONX_PASSWORD"] = "your password for accessing the CPD cluster"
os.environ["WATSONX_USERNAME"] = "your username for accessing the CPD cluster"
os.environ["WATSONX_INSTANCE_ID"] = "your instance_id for accessing the CPD cluster"
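When a notebook cell is rerun, being prompted for the key every time can get tedious. The following is a minimal sketch, assuming you only want to prompt when the variable is not already set. The helper name `ensure_env` is hypothetical and is not part of langchain-ibm.

```python
import os
from getpass import getpass


def ensure_env(name, prompt=getpass):
    """Return os.environ[name], prompting for it only if it is unset or empty.

    `ensure_env` is an illustrative helper, not part of langchain-ibm.
    The `prompt` argument exists so the function can be tested without a terminal.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt(f"{name}: ")
        os.environ[name] = value
    return value
```

For example, `ensure_env("WATSONX_APIKEY")` could replace the explicit `getpass()` call shown earlier.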

Loading the Model

Depending on your needs, you may want to adjust the model parameters:

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

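embed_params = {
    # Truncate each input to at most 3 tokens (a very small value, used here for demonstration)
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
    # Ask the service to return the input text along with each embedding
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}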

Initialize the WatsonxEmbeddings class, supplying the model ID, service URL, project ID, and the parameters defined above:

from langchain_ibm import WatsonxEmbeddings

watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",  # service endpoint for the us-south region
    project_id="PASTE YOUR PROJECT_ID HERE",
    params=embed_params,
)
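To avoid pasting the project ID into source code, the constructor arguments shown above could be collected from the environment instead. This is only a sketch: the function name `watsonx_embedding_kwargs` and the variable name `WATSONX_PROJECT_ID` are choices made for this example, not something the article or the library prescribes.

```python
import os

DEFAULT_URL = "https://us-south.ml.cloud.ibm.com"  # endpoint used in this article


def watsonx_embedding_kwargs(model_id="ibm/slate-125m-english-rtrvr", params=None):
    """Collect keyword arguments for WatsonxEmbeddings from environment variables.

    WATSONX_PROJECT_ID is a variable name chosen for this sketch (an assumption).
    """
    project_id = os.environ.get("WATSONX_PROJECT_ID")
    if not project_id:
        raise RuntimeError("Set WATSONX_PROJECT_ID before creating the embeddings client")
    kwargs = {
        "model_id": model_id,
        "url": os.environ.get("WATSONX_URL", DEFAULT_URL),
        "project_id": project_id,
    }
    # Only pass `params` when the caller actually supplied some
    if params is not None:
        kwargs["params"] = params
    return kwargs
```

With this in place, the client could be created as `WatsonxEmbeddings(**watsonx_embedding_kwargs(params=embed_params))`.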

Code Example

Let's try embedding some text:

text = "This is a test document."

query_result = watsonx_embedding.embed_query(text)
print(query_result[:5])
# [0.0094472, -0.024981909, -0.026013248, -0.040483925, -0.057804465]
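Since `embed_query` returns a plain list of floats, two embeddings can be compared with cosine similarity. The helper below is a minimal pure-Python sketch; `cosine_similarity` is a name introduced here, and the vectors in the demo are toy values rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

In practice you would pass two full vectors, for example the results of two `watsonx_embedding.embed_query(...)` calls.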

Embedding multiple documents:

texts = ["This is a content of the document", "This is another document"]

doc_result = watsonx_embedding.embed_documents(texts)
print(doc_result[0][:5])
# [0.009447193, -0.024981918, -0.026013244, -0.040483937, -0.057804447]
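The two methods above can be combined into a small similarity search: embed the query, embed the documents, and sort by cosine similarity. The sketch below assumes only that the embedder exposes `embed_query` and `embed_documents`, as `watsonx_embedding` does. `rank_documents` and the `LetterCountEmbeddings` stand-in are hypothetical names added so the example runs offline without credentials.

```python
import math


def rank_documents(embedder, query, documents):
    """Return (score, document) pairs sorted by cosine similarity to the query.

    `embedder` is any object with embed_query(text) and embed_documents(texts),
    such as the watsonx_embedding instance created earlier.
    """
    query_vec = embedder.embed_query(query)
    doc_vecs = embedder.embed_documents(documents)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, documents)]
    # Sort on the score only, highest first
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


class LetterCountEmbeddings:
    """Offline stand-in (hypothetical) that counts letters a-z; for testing only."""

    def embed_query(self, text):
        lowered = text.lower()
        return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]
```

With the real client you would call `rank_documents(watsonx_embedding, "some query", texts)`. Note that with TRUNCATE_INPUT_TOKENS set to 3, as in the parameters above, only the first few tokens of each text are embedded, so a larger value is probably needed for meaningful rankings.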