LlamaIndex整合ChatGLM

最新推荐文章于 2024-06-21 15:24:37 发布

slient_love

最新推荐文章于 2024-06-21 15:24:37 发布

阅读量772

点赞数 7

分类专栏： AI 文章标签： langchain embedding LlamaIndex python

本文链接：https://blog.csdn.net/slient_love/article/details/138190556

版权

AI 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

LlamaIndex整合chatglm

LlamaIndex官方网站上给出的示例都是采用的Open AI，那么对于没有open_api_key的用户怎么执行查看LlamaIndex的效果嘞。下面是LlamaIndex整合ChatGLM的一个简单示例：

1. 安装相关依赖

pip install zhipuai
pip install langchain
pip install langchain-openai
pip install llama-index
pip install llama-index-embeddings-huggingface

2. 定义获取ChatGLM

具体怎么获取，请参考这篇文章：python实现在线 ChatGLM调用
本次使用的是langchain方式调用在线ChatGLM。

新建一个zhipu_llm.py文件：

from langchain_openai import ChatOpenAI
import jwt
import time
from langchain_core.messages import HumanMessage

zhipuai_api_key = "智普清言的API-KEY"


def generate_token(apikey: str, exp_seconds: int):
    try:
        id, secret = apikey.split(".")
    except Exception as e:
        raise Exception("invalid apikey", e)

    payload = {
        "api_key": id,
        "exp": int(round(time.time() * 1000)) + exp_seconds * 1000,
        "timestamp": int(round(time.time() * 1000)),
    }

    return jwt.encode(
        payload,
        secret,
        algorithm="HS256",
        headers={"alg": "HS256", "sign_type": "SIGN"},
    )


class ChatZhiPuAI(ChatOpenAI):
    def __init__(self, model_name):
        super().__init__(model_name=model_name, openai_api_key=generate_token(zhipuai_api_key, 10),
                         openai_api_base="https://open.bigmodel.cn/api/paas/v4")

    def invoke(self, question):
        messages = [
            HumanMessage(content=question),
        ]
        return super().invoke(messages)

3. LlamaIndex整合ChatGLM实现自定义文档的简单问答

完整代码如下：

from llama_index.core import GPTVectorStoreIndex, SimpleDirectoryReader
from llm.zhipu_llm import ChatZhiPuAI
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# 加载数据，需确认数据目录的正确性
documents = SimpleDirectoryReader('data').load_data()

# 实例化BAAI/bge-small-en-v1.5模型
baai_embedding = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 使用 BAAI/bge-small-en-v1.5 模型初始化GPTVectorStoreIndex
index = GPTVectorStoreIndex.from_documents(documents, embed_model=baai_embedding)

chatglm = ChatZhiPuAI(model_name="glm-4")
query_engine = index.as_query_engine(llm=chatglm)
response = query_engine.query("LlamaIndex为何而生？")
print(response)

response = query_engine.query("LlamaIndex如何破局？")
print(response)

其中：
需要在上述代码同目录下新建一个data文件夹，里面放入需要检索的知识文档

from llm.zhipu_llm import ChatZhiPuAI这段代码导入的就是刚才新建的zhipu_llm.py文件中自定义的ChatZhiPuAI。

如果不指定嵌入向量模型的话，会默认使用OpenAI的Embedding，需要设置OPEN-API-KEY，因此这里使用llama_index.embeddings.huggingface提供的BAAI/bge-small-en-v1.5作为嵌入模型。
首先将其下载并实例化成一个BaseEmbedding对象，然后将该对象传递给GPTVectorStoreIndex的embed_model参数。然后初始化GPTVectorStoreIndex：

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

documents = SimpleDirectoryReader('data').load_data()
baai_embedding = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = GPTVectorStoreIndex.from_documents(documents, embed_model=baai_embedding)

4.运行效果如下：

LlamaIndex was created to enhance LLM (Large Language Model) applications, such as GPT, by integrating and structuring private or domain-specific data. These models typically interact with data through natural language interfaces and are pre-trained on a vast amount of publicly available data. However, applications built on top of LLMs often require the use of private or specialized data sources that are scattered across different platforms and storage systems, which can include being behind APIs, within SQL databases, or even in PDFs and presentations. LlamaIndex addresses this need by providing a framework to inject and access this diverse data.
LlamaIndex破局的方式是通过提供五大核心工具：Data connectors、Data indexes、Engines、Data agents以及Application integrations。这些工具协同工作，帮助用户注入、结构化并访问私有或特定领域的数据，从而增强基于自然语言交互的LLM模型。这些工具支持从不同来源整合数据，包括API、SQL数据库、PDF文件等，使得构建在LLM之上的应用程序能够更有效地利用这些数据。

slient_love

关注

7
点赞
踩
11

收藏

觉得还不错? 一键收藏
3
评论
LlamaIndex整合ChatGLM

python实现在线 ChatGLM调用本次使用的是langchain方式调用在线ChatGLM。新建一个zhipu_llm.py文件import jwtzhipuai_api_key = "智普清言的API-KEY"try:payload,secret,# 加载数据，需确认数据目录的正确性# 实例化BAAI/bge-small-en-v1.5模型# 使用 BAAI/bge-small-en-v1.5 模型初始化GPTVectorStoreIndex。
复制链接

扫一扫

专栏目录