ChatRTX / Chat with RTX: Downloading It and Switching to a Chinese Model

Latest recommended article published on 2024-07-02 11:04:44

Author: 冰宸旻


Category: Configuration Tutorials. Tags: windows, artificial intelligence

Article link: https://blog.csdn.net/weixin_64404339/article/details/139701888


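Downloading ChatRTX

Download the installer from NVIDIA's page: Build a Custom LLM with ChatRTX | NVIDIA
On the author's machine the install directory is C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX
The conda environment directory is C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\env_nvd_rag

Activate it with: conda activate C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\env_nvd_rag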

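Installing additional models

The model you choose must be supported by TensorRT-LLM.
TensorRT-LLM/examples at main · NVIDIA/TensorRT-LLM (github.com) provides ready-made templates for models that TensorRT-LLM can convert and quantize.
The steps below use chatglm3-6b as the example.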

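Downloading the model

 # Open a terminal in C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\RAG\trt-llm-rag-windows-ChatRTX_0.3\model
  git lfs install
  git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
  # Expect a long wait: the download is 46.5 GB

After the download finishes, create three folders: engine, model_checkpoints, and tokenizer (inside the chatglm3-6b folder, judging by the paths used in the conversion commands later on).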
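The three folders can also be created from a script. This is a minimal sketch, assuming they belong inside the cloned chatglm3-6b directory; `create_model_dirs` is a hypothetical helper name, and the path in the usage comment must be adjusted to your installation.

```python
from pathlib import Path

# Folder names taken from the article.
SUBDIRS = ("engine", "model_checkpoints", "tokenizer")


def create_model_dirs(model_dir):
    """Create the three subfolders under model_dir and return their paths."""
    root = Path(model_dir)
    created = []
    for name in SUBDIRS:
        sub = root / name
        # exist_ok=True makes the call safe to repeat.
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created


# Example (adjust the path to your installation):
# create_model_dirs(r"C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\RAG\trt-llm-rag-windows-ChatRTX_0.3\model\chatglm3-6b")
```

Running it twice is harmless, because existing folders are left untouched.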

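Converting the model with TensorRT-LLM

Version 0.3 no longer ships the TensorRT-LLM source files, so you need to download them yourself.
Open a conda command line:

 # Adjust the environment path to your own installation
 conda activate C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\env_nvd_rag
 # Check the installed TensorRT-LLM version
 pip show tensorrt-llm

Download the source code for the matching version from Releases · NVIDIA/TensorRT-LLM (github.com). Placing it directly in the model folder is fine.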
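Matching the installed version to a release can be scripted as well. The sketch below reads the version with the standard-library `importlib.metadata` and builds a source-archive URL. The tag convention (`v` plus the version) and the GitHub archive URL pattern are assumptions that should be checked against the Releases page; `installed_version` and `source_zip_url` are hypothetical helper names.

```python
from importlib import metadata


def installed_version(package="tensorrt_llm"):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def source_zip_url(version):
    """Build the source-archive URL for a release.

    Assumption: releases are tagged 'v<version>' and GitHub serves the
    archive under /archive/refs/tags/. Verify on the Releases page.
    """
    tag = "v" + version
    return f"https://github.com/NVIDIA/TensorRT-LLM/archive/refs/tags/{tag}.zip"


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorrt_llm is not installed in this environment")
    else:
        print(version, source_zip_url(version))
```

Run it inside the activated env_nvd_rag environment so that it reports the version ChatRTX actually uses.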

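 # Convert the model weights into a format TensorRT-LLM can use.
 # Use the convert_checkpoint.py that belongs to your model.
 # --model_dir is the downloaded model, --output_dir is where the checkpoint is written.
 # All paths need to be adjusted to your setup.
 # In PowerShell the backtick must be the last character on the line, so no trailing comments.
 python .\TensorRT-LLM-0.9.0\examples\chatglm\convert_checkpoint.py `
     --model_dir .\trt-llm-rag-windows-ChatRTX_0.3\model\chatglm3-6b `
     --output_dir .\trt-llm-rag-windows-ChatRTX_0.3\model\chatglm3-6b\model_checkpoints

 # Build the TensorRT engine
 trtllm-build --checkpoint_dir .\trt-llm-rag-windows-ChatRTX_0.3\model\chatglm3-6b\model_checkpoints `
         --gemm_plugin float16 `
         --output_dir .\trt-llm-rag-windows-ChatRTX_0.3\model\chatglm3-6b\engine `
         --max_input_len=4096 `
         --max_output_len=2048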

max_input_len and max_output_len need to be adjusted for the model. For ChatGLM, for example, the TensorRT-LLM documentation notes: "Using ChatGLM2-6B-32K / ChatGLM3-6B-32K models, we need to guarantee max_batch_size * max_beam_width * (max_input_len + max_output_len) <= 78398 = 2^31 / (13696 * 2) due to constrain of TensorRT. For example, we will fail to build engine while using default max_batch_size (8) and adding arguments --max_beam_width=4 --max_input_len=20000 --max_output_len=100".
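The quoted constraint is easy to check before starting a long build. The sketch below only reproduces the arithmetic from the quote; `token_budget` and `fits_limit` are hypothetical helper names. The quote states the limit for the 32K variants, and the source does not say whether it also applies to the plain chatglm3-6b model.

```python
# Limit from the quoted note: 2^31 / (13696 * 2), rounded down.
LIMIT = 2**31 // (13696 * 2)


def token_budget(max_batch_size, max_beam_width, max_input_len, max_output_len):
    """Left-hand side of the quoted inequality."""
    return max_batch_size * max_beam_width * (max_input_len + max_output_len)


def fits_limit(max_batch_size, max_beam_width, max_input_len, max_output_len):
    """True if the settings satisfy the quoted constraint."""
    budget = token_budget(max_batch_size, max_beam_width, max_input_len, max_output_len)
    return budget <= LIMIT


print(LIMIT)                           # 78398
print(token_budget(8, 4, 20000, 100))  # 643200, the failing example from the quote
print(fits_limit(8, 4, 20000, 100))    # False
print(fits_limit(8, 1, 4096, 2048))    # True: 8 * 1 * 6144 = 49152
```

The last line uses the values from the build command in this article together with the default max_batch_size of 8 mentioned in the quote and an assumed beam width of 1.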

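Editing the config

Path: C:\Users\wyh\AppData\Local\NVIDIA\ChatRTX\RAG\trt-llm-rag-windows-ChatRTX_0.3\config\config.json

Make the change in the model-support section. The entry below is a reference. The // comments in it are annotations only and should be removed in the real file, because standard JSON does not allow comments.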

 {
                 "name": "ChatGLM 3 6B",
                 "id": "chatglm3-6b",
                 "ngc_model_name": "nvidia/chatglm3-6b-chat:1.0",
                 "is_downloaded_required": false,
                 "downloaded": true,         
                 "is_installation_required": false,
                 "setup_finished": true,
                 "min_gpu_memory": 8,
                 "should_show_in_UI": true,
                 "prerequisite": {
                     "checkpoints_files": [
                         "config.json",
                         "rank0.safetensors"
                     ],
                     "tokenizer_ngc_dir": "tokenizer",
                     "tokenizer_files": {
                         "config": "config.json",
                         "model": "tokenizer.model",
                         "tokenizer_config": "tokenizer_config.json"
                     },
                     "checkpoints_local_dir": "model_checkpoints",
                     "tokenizer_local_dir": "tokenizer",
                     "engine_build_command": "trtllm-build --checkpoint_dir %checkpoints_local_dir% --output_dir %engine_dir% --gemm_plugin float16 --max_batch_size 1 --max_input_len 7168 --max_output_len 1024",
                     "engine_dir": "engine"
                 },
                 "metadata": {
                     "engine": "rank0.engine",
                     "max_new_tokens": 512,              // maximum number of generated (output) tokens
                     "max_input_token": 7168,            // maximum number of input tokens
                     "temperature": 0.1
                 },
                 "model_info": "ChatGLM-6B is an open bilingual language model based on General Language Model framework, with 6.2 billion parameters | <a href= 'https://huggingface.co/THUDM/chatglm3-6b/blob/main/MODEL_LICENSE'> License </a>",
                 "model_license": "<a href= 'https://huggingface.co/THUDM/chatglm3-6b/blob/main/MODEL_LICENSE'> License </a>",
                 "model_size": "15.8GB"
 }
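Before launching ChatRTX it can help to confirm that every file named in the entry actually exists. The sketch below assumes the names are resolved relative to the model folder (chatglm3-6b here), which is inferred from the folder layout in this article and not confirmed by NVIDIA documentation. `expected_files` and `missing_files` are hypothetical helper names, and `ENTRY` is a trimmed copy of the reference entry above.

```python
from pathlib import Path

# Trimmed copy of the reference entry: only the keys the check needs.
ENTRY = {
    "prerequisite": {
        "checkpoints_files": ["config.json", "rank0.safetensors"],
        "tokenizer_files": {
            "config": "config.json",
            "model": "tokenizer.model",
            "tokenizer_config": "tokenizer_config.json",
        },
        "checkpoints_local_dir": "model_checkpoints",
        "tokenizer_local_dir": "tokenizer",
        "engine_dir": "engine",
    },
    "metadata": {"engine": "rank0.engine"},
}


def expected_files(model_dir, entry):
    """List every file the entry refers to, relative to model_dir (assumed layout)."""
    root = Path(model_dir)
    pre = entry["prerequisite"]
    paths = [root / pre["checkpoints_local_dir"] / name for name in pre["checkpoints_files"]]
    paths += [root / pre["tokenizer_local_dir"] / name for name in pre["tokenizer_files"].values()]
    paths.append(root / pre["engine_dir"] / entry["metadata"]["engine"])
    return paths


def missing_files(model_dir, entry):
    """Return the expected files that do not exist yet."""
    return [p for p in expected_files(model_dir, entry) if not p.is_file()]


# Example (adjust the path to your installation):
# for path in missing_files(r"...\model\chatglm3-6b", ENTRY):
#     print("missing:", path)
```

If the tokenizer files are reported missing, they probably need to be copied from the cloned model folder into tokenizer. The article does not spell out that step, so treat it as an assumption.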

Result
