How to Train ChatGPT on Custom Data and Create Your Own ChatGPT

nizaiganshenmoa

Tags: chatgpt, artificial intelligence, ai

Article link: https://blog.csdn.net/nizaiganshenmoa/article/details/136291895

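With recent updates to ChatGPT, you can now "train" it on your own data and build a ChatGPT app of your own. Strictly speaking, the model itself is not retrained: LlamaIndex indexes your documents and, for each question, passes the most relevant passages to the OpenAI model as context. Let's get started.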

Preparation

1. Make sure Python 3 is installed, then update pip:

python3 -m pip install --upgrade pip

Check the installed Python version:

python3 --version
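If you prefer to check from inside Python, here is a minimal sketch. MIN_VERSION = (3, 10) is an assumption based on recent Gradio releases requiring Python 3.10 or newer; adjust it to whatever the package versions you install actually require. The helper name python_is_recent_enough is hypothetical.

```python
import sys

# Assumption: recent Gradio releases need Python 3.10+; adjust for the versions you install.
MIN_VERSION = (3, 10)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_recent_enough():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current} is older than {MIN_VERSION[0]}.{MIN_VERSION[1]}; please upgrade")
```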

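2. Install the dependencies

pip3 install openai llama-index pypdf gradio

openai: the OpenAI Python library
llama-index: the LlamaIndex data framework for our LLM application
pypdf: an open-source Python library, used here to read the PDF files we train on
gradio: Gradio (Gradio.app), a library for building simple web UIs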
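To confirm that all four packages from the pip3 install line are visible to the interpreter you will run app.py with, here is a minimal standard-library sketch. The helper name missing_packages is hypothetical; it looks up installed distributions by the same names used on the pip3 command line.

```python
from importlib import metadata

# Distribution names taken from the pip3 install command above.
REQUIRED = ["openai", "llama-index", "pypdf", "gradio"]


def missing_packages(names=REQUIRED):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    if absent:
        print("Missing packages, run: pip3 install " + " ".join(absent))
    else:
        for name in REQUIRED:
            print(f"{name} {metadata.version(name)}")
```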

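3. Get an OpenAI API key (from mainland China, this requires a VPN)

If you have not signed up yet: sign-up link
If you already have an account, create a new API key (save it locally right away, because it is shown only once, when it is generated!): API keys
If you want to try the GPT-4 model: tutorial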
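Rather than pasting the key into app.py, you can keep it in the OPENAI_API_KEY environment variable, which the openai library reads by default. Below is a sketch of a hypothetical helper, get_api_key, that fails early with a readable message when the variable is missing or empty.

```python
import os


def get_api_key(env=None):
    """Return the OpenAI key from OPENAI_API_KEY, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fail before any API call instead of with an authentication error later.
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before starting the app"
        )
    return key


if __name__ == "__main__":
    key = get_api_key()
    print(f"Found a key ending in ...{key[-4:]}")
```

With the key exported in your shell (for example `export OPENAI_API_KEY=...` on macOS/Linux), the line in app.py that sets os.environ["OPENAI_API_KEY"] can be removed.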

Let's build it

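1. Create a folder named myAIApp. Inside it, create a trainingData folder (for the PDF files you want to train on) and a Python script named app.py. The layout looks like this:

myAIApp/
    app.py
    trainingData/    (your PDF files go here)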
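The same layout can be created with a short script. This is a sketch using pathlib; the helper names create_layout and list_pdfs are hypothetical. list_pdfs is useful for confirming that trainingData actually contains PDFs before you build the index.

```python
from pathlib import Path


def create_layout(root="myAIApp"):
    """Create myAIApp/trainingData/ and an empty myAIApp/app.py if missing."""
    root_dir = Path(root)
    data_dir = root_dir / "trainingData"
    data_dir.mkdir(parents=True, exist_ok=True)
    app_file = root_dir / "app.py"
    app_file.touch(exist_ok=True)  # does not overwrite an existing app.py
    return data_dir, app_file


def list_pdfs(data_dir):
    """Return the PDF files in data_dir (any .pdf/.PDF suffix), sorted by path."""
    return sorted(
        p for p in Path(data_dir).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    data_dir, app_file = create_layout()
    pdfs = list_pdfs(data_dir)
    print(f"Created {app_file}; {len(pdfs)} PDF file(s) in {data_dir}")
```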

2. Create the application

Code for app.py:

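# Written for llama-index 0.10+ and Gradio 4+. The older GPTVectorStoreIndex/LLMPredictor/
# ServiceContext, "from langchain import OpenAI" and gradio.inputs APIs are deprecated or removed.
import os

import gradio
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.openai import OpenAI

# Replace with your own key, or delete this line and set OPENAI_API_KEY in your shell
os.environ["OPENAI_API_KEY"] = "your API key"

INDEX_DIR = "indexes"      # directory where the index is stored
DATA_DIR = "trainingData"  # directory holding the PDF files to train on

# Maximum number of tokens the model may generate per answer
num_outputs = 256

# Answer questions with gpt-3.5-turbo (replaces LLMPredictor/ServiceContext)
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.5, max_tokens=num_outputs)


def construct_index(directory_path):
    # Load every document (e.g. PDFs) in the directory
    docs = SimpleDirectoryReader(directory_path).load_data()
    # Embed the documents and build a vector index over them
    index = VectorStoreIndex.from_documents(docs)
    # Save the index to disk so later runs can skip this step
    index.storage_context.persist(persist_dir=INDEX_DIR)
    return index


def load_or_build_index():
    # If the app was already trained, rebuild the storage context and load the index
    # (delete the indexes folder to re-train after changing the PDFs)
    if os.path.isdir(INDEX_DIR):
        storage_context = StorageContext.from_defaults(persist_dir=INDEX_DIR)
        return load_index_from_storage(storage_context)
    # Otherwise build the index from the documents in the trainingData folder
    return construct_index(DATA_DIR)


# Build (or load) the index once at startup instead of on every question
query_engine = load_or_build_index().as_query_engine()


def chatbot(input_text):
    response = query_engine.query(input_text)
    return response.response


# Create the web UI with Gradio
iface = gradio.Interface(
    fn=chatbot,
    inputs=gradio.Textbox(lines=5, label="Enter your question"),
    outputs="text",
    title="Custom-trained GPT-3.5",
)

# Launch the web UI; share=True also creates a temporary public link
iface.launch(share=True)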
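Despite the word "train", construct_index does not change the model's weights. Roughly, the vector index splits the documents into chunks and stores an embedding vector for each one (by default computed with an OpenAI embedding model); at query time, chatbot's query engine retrieves the chunks most similar to the question and sends them to gpt-3.5-turbo together with the question. The toy sketch below imitates that retrieve-then-prompt flow using only the standard library. It is not LlamaIndex's implementation: it assumes word-count vectors in place of real embeddings, word-based chunk sizes, and a hypothetical prompt template.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of up to chunk_size words (overlap < chunk_size)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy 'embedding': lowercase word counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents, chunk_size=50, overlap=10):
    """'Indexing': store a (chunk, vector) pair for every non-empty chunk."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc, chunk_size, overlap):
            if chunk:
                index.append((chunk, embed(chunk)))
    return index


def retrieve(index, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(index, question, top_k=2):
    """Hypothetical prompt template: retrieved context followed by the question."""
    context = "\n---\n".join(retrieve(index, question, top_k))
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {question}\nAnswer:")


if __name__ == "__main__":
    docs = ["The warranty covers parts and labor for two years.",
            "Returns are accepted within 30 days with a receipt."]
    idx = build_index(docs)
    print(build_prompt(idx, "How long is the warranty?", top_k=1))
```

In the real app, the text built at the end is what reaches the model, which is why answers stay grounded in your PDFs and why changing the PDFs requires rebuilding the index rather than retraining anything.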

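Run the application from inside the myAIApp folder:

python3 app.py

Gradio then prints a local URL (http://127.0.0.1:7860 by default) and, because of share=True, a temporary public link; open either one in a browser and ask questions about your documents.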