Llama-Factory + Ollama 打造属于自己的中文版 Llama3_llama-factory ollama

月流霜

于 2024-07-22 09:00:00 发布

阅读量4

点赞数

文章标签： llama 人工智能

原文链接：https://blog.csdn.net/hyang1974/article/details/138470087?spm=1001.2100.3001.7377&utm_medium=distribute.pc_feed_blog_category.none-task-blog-classify_tag-1-138470087-null-null.nonecase&depth_1-utm_source=distribute.pc_feed_blog_category.none-task-blog-cla

版权

Meta 推出 Llama3 也有一小段时间了。Llama3 包含 8B 和 70B 两种参数规模，涵盖预训练和指令调优的变体。Llama 3 支持多种商业和研究用途，并已在多个行业标准测试中展示了其卓越的性能（关于Llama3的具体介绍可以参考本站另外一篇博文：尝鲜 Meta Llama 3）。和以往的原始 Llama 模型一样，Llama 3 对中文的支持效果欠佳，经常会出现你用中文提问，它用英文或中文+英文回复的现象。

例如当我问它：”你是谁？“，会得到如下的回答：


  
  
    
    
     
     
    
    
    
    
     
     
      
      >>> 你是谁？
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      I am LLaMA, an AI assistant developed 
      
      by Meta AI that can understand 
      
      and respond 
      
      to human input 
      
      in a conversational manner. I
      
      'm not a human, but a computer program designed to simulate conversation and answer questions to the best of my ability based on my training data. I can generate text on a wide range of topics, from science and history to entertainment and culture.

然后让它”介绍一下上海的旅游景点“，则会得到如下的回答：


  
  
    
    
     
     
    
    
    
    
     
     
      
      >>> 介绍一下上海的旅游景点
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      Shanghai, a city 
      
      with a rich history 
      
      and culture, has numerous tourist attractions that showcase its modernity, traditional charm, 
      
      and natural beauty. Here 
      
      are some 
      
      of the most popular tourist spots 
      
      in Shanghai:
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      **Landmarks:
      
      **
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      1. 
      
      **Oriental Pearl Tower
      
      **: A iconic TV tower 
      
      with panoramic views 
      
      of the city.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      2. 
      
      **Shanghai Tower
      
      **: The second-tallest building 
      
      in the world, offering stunning views 
      
      from its observation deck.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      3. 
      
      **Jin Mao Tower
      
      **: A 
      
      421-meter-tall skyscraper 
      
      and home 
      
      to the luxurious Mandarin Oriental Hotel.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      4. 
      
      **The Bund
      
      **: A famous waterfront promenade lined 
      
      with colonial-era buildings 
      
      and stunning views 
      
      of the Huangpu River.
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      **Historical Sites:
      
      **
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      1. 
      
      **Yu Garden
      
      **: A classic Chinese garden 
      
      with beautiful landscaping, pavilions, 
      
      and koi ponds.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      2. 
      
      **Shanghai Museum
      
      **: A museum showcasing Chinese art, history, 
      
      and culture.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      3. 
      
      **Xintiandi
      
      **: A charming neighborhood 
      
      with traditional Shikumen architecture 
      
      and modern boutiques.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      4. 
      
      **French Concession
      
      **: A historic neighborhood 
      
      with colonial-era buildings, cafes, 
      
      and restaurants.
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      **Nature Reserves:
      
      **
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      1. 
      
      **Changfeng Park
      
      **: A large public park 
      
      with gardens, lakes, 
      
      and walking trails.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      2. 
      
      **Fuxing Island
      
      **: A tranquil oasis 
      
      in the Huangpu River 
      
      with beautiful landscaping 
      
      and walking paths.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      3. 
      
      **Shanghai Botanical Garden
      
      **: A peaceful garden featuring various plant species 
      
      and a peaceful atmosphere.
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      **
      
      Other Attractions:
      
      **
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      1. 
      
      **Disney Resort Shanghai
      
      **: An amusement park 
      
      with Disney-themed rides, shows, 
      
      and attractions.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      2. 
      
      **Shanghai Grand Opera House
      
      **: A stunning opera house 
      
      with performances ranging 
      
      from music 
      
      to dance.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      3. 
      
      **Shanghai Zoo
      
      **: A popular zoo 
      
      with various animal species 
      
      and a panda enclosure.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      4. 
      
      **The Shanghai Tower
      
      's Observation Deck**: For panoramic views of the city.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      These are just a few of the many tourist attractions in Shanghai. Depending on your interests, there are plenty more hidden gems to explore! 😊

回答得还是很简洁而全面的。缺点就是全是英文答案，对于英文不太好的人有点吃力。对 Llama3 进行微调来支持中文是一项必须的工作。

Llama3 中文模型的现状

实际上，当 Llama3 推出以后，很多开发者在中文支持方面做了大量的工作，一大批微调的中文模型如雨后春笋般纷纷涌现。目前比较知名的中文模型有：

Unichat-llama3-Chinese（地址）：中国联通AI创新中心发布业界第一个llama3中文指令微调模型，全参数微调(非lora)，以Meta Llama 3为基础,增加中文数据进行训练,实现llama3模型高质量中文问答，模型上下文保持原生长度8K，支持长度64K版本将于后续发布。
OpenBuddy – Open Multilingual Chatbot（地址）：OpenBuddy是一个面向全球用户的强大的开放式多语言聊天机器人模型，强调对话式AI以及对英语、中文和其他语言的无缝多语言支持。
Llama3-Chinese（地址）：在500k高质量中文多轮SFT数据、100k英语多轮SFT数据和2k单轮自我认知数据上训练的大型模型，采用基于Meta的DORA和LORA+的训练方法。以Llama-3-8B为基础。
Llama3-8B-Chinese-Chat（地址）：是一个针对中文和英文用户的指令调整语言模型，具有角色扮演和工具使用等各种能力，建立在 Meta-Llama-3-8B-Instruct 模型的基础上。
Llama-3-8B-Instruct-Chinese-chat（地址）：使用 firefly-train-1.1M，moss-003-sft-data，school_math_0.25M，ruozhiba 等数据集微调的模型，基于Llama-3-8B-Instruct。
Bunny-Llama-3-8B-V（地址）：Bunny 是一系列轻量但功能强大的多模式模型。它提供多种即插即用视觉编码器，如 EVA-CLIP、SigLIP 和语言主干，包括 Llama-3-8B、Phi-1.5、StableLM-2、Qwen1.5、MiniCPM 和 Phi-2。为了弥补模型大小的减小，通过从更广泛的数据源中进行精选来构建信息更丰富的训练数据。
llava-llama-3-8b-v1_1（地址）：llava-llama-3-8b-v1_1 是一个 LLaVA 模型，由XTuner使用ShareGPT4V-PT和InternVL-SFT从 meta-llama/Meta-Llama-3-8B-Instruct和CLIP-ViT-Large-patch14-336进行微调。
。。。

还有很多新的模型就不一一列举了，想要尝试这些模型可以直接部署试用。本文则探讨如何使用 Llama-Factory 对 Llama3 进行中文微调的具体过程，并通过 Ollama 本地部署中文微调的 Llama3 模型，打造属于自己的个性化的 Llama3 LLM 。

使用 Llama-Factory 对 Llama3 进行中文微调

LLaMA-Factory 是一个开源项目，它提供了一套全面的工具和脚本，用于微调、服务和基准测试 LLaMA 模型。LLaMA（大型语言模型自适应）是 Meta AI 开发的一组基础语言模型，在各种自然语言任务上表现出色。

LLaMA-Factory 存储库提供以下内容，让您轻松开始使用 LLaMA 模型：

数据预处理和标记化的脚本
用于微调 LLaMA 模型的训练流程
使用经过训练的模型生成文本的推理脚本
评估模型性能的基准测试工具
用于交互式测试的 Gradio Web UI

关于 Llama-Factory 的具体介绍可以参考本站的另外一篇博文：LLaMA-Factory 简介。

安装 Llama-Factory

首先从 github 拉取 Llama-Factory：

git clone https://github.com/hiyouga/LLaMA-Factory.git

为了方便今后的调试和部署，我选择了使用 docker 的方式来运行 Llama-Factory。它提供了一个参考的 Dockerfile：


  
  
    
    
     
     
    
    
    
    
     
     
      
      FROM nvcr.io
      
      /nvidia
      
      /pytorch:
      
      24.01-py
      
      3
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      WORKDIR 
      
      /app
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      COPY requirements.txt 
      
      /app
      
      /
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      RUN pip install -r requirements.txt
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      COPY . 
      
      /app
      
      /
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      RUN pip install -e .[deepspeed,metrics,bitsandbytes,qwen]
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      VOLUME [ 
      
      "/root/.cache/huggingface/", 
      
      "/app/data", 
      
      "/app/output" ]
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      EXPOSE 
      
      7860
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      CMD [ 
      
      "llamafactory-cli webui" ]

可以根据自己的实际情况进行修改。我家里的网络自己搭建了 proxy server（为了访问 github, huggingface 等站点），所以更改 Dockerfile 如下。该 Dockerfile 从 docker buildx 命令行获取 http_proxy 和 https_proxy 变量，并设置 docker buildx 环境里的相应环境变量，这样编译 docker 镜像的过程中就能使用代理服务器了。


  
  
    
    
     
     
    
    
    
    
     
     
      
      FROM nvcr.io
      
      /nvidia
      
      /pytorch:
      
      24.01-py
      
      3
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # 使用构建参数设置环境变量
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ARG http_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ARG https_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # 设置环境变量
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV HTTP_PROXY
      
      =$http_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV HTTPS_PROXY
      
      =$https_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV http_proxy
      
      =$http_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV https_proxy
      
      =$https_proxy
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      WORKDIR 
      
      /app
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      COPY requirements.txt 
      
      /app
      
      /
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      RUN pip install -r requirements.txt
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      COPY . 
      
      /app
      
      /
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      RUN pip install -e .[deepspeed,metrics,bitsandbytes,qwen]
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # unset环境变量。Container运行过程中需要代理服务器的话通过-e 传入参数
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV HTTP_PROXY
      
      =
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV HTTPS_PROXY
      
      =
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV http_proxy
      
      =
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ENV https_proxy
      
      =
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      VOLUME [ 
      
      "/root/.cache/huggingface/", 
      
      "/app/data", 
      
      "/app/output" ]
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      EXPOSE 
      
      7860
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      CMD [ 
      
      "python", 
      
      "src/train_web.py" ]

运行如下脚本生成自己的 Llama-Factory docker 镜像：

docker buildx build --build-arg http_proxy=http://proxy_ip:port --build-arg https_proxy=http://proxy_ip:port -t llama-factory:v0.00 .

Docker 镜像编译成功后，运行 docker image list 就可以看到编译出来的 docker 镜像了：

docker image list

接下来我们需要运行 Llama-Factory container。Llama-Factory 提供了一个参考的 docker-compose.yml 文件来运行 docker，我们可以按照自己的实际情况进行修改。我这边修改的版本如下，修改的部分参考注释。


  
  
    
    
     
     
    
    
    
    
     
     
      
      version: 
      
      '3.8'
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      services:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
        llama-factory:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          #build:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          #  dockerfile: Dockerfile
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          #  context: .
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          image: llama-factory:v
      
      0.00 # 修改为编译出来的 docker image 名称
      
      /版本
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          container_name: llama_
      
      factory # container 名称
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          volumes:
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - .
      
      /volumes
      
      /huggingface:
      
      /root
      
      /.cache
      
      /huggingface
      
      /
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - .
      
      /volumes
      
      /
      
      data:
      
      /app
      
      /
      
      data
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - .
      
      /volumes
      
      /
      
      output:
      
      /app
      
      /
      
      output
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - 
      
      /mnt
      
      /dev
      
      /myprojects
      
      /llm-webui-docker
      
      /models:
      
      /app
      
      /models # 映射自己的models目录
     
     
    
    

    
    
     
     
    
    
    
    
     
         
      
      environment:
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - CUDA_VISIBLE_DEVICES
      
      =
      
      0
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - GRADIO_SERVER_PORT
      
      =
      
      7864 # webui跑在
      
      7864端口上，
      
      7860被comfyui占用了
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          ports:
     
     
    
    

    
    
     
     
    
    
    
    
     
          
      
       - 
      
      "7864:7864" # webui跑在
      
      7864端口上，
      
      7860被comfyui占用了
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          ipc: host
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          deploy:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
            resources:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              reservations:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                devices:
     
     
    
    

    
    
     
     
    
    
    
    
     
              
      
       - driver: nvidia
     
     
    
    

    
    
     
     
    
    
    
    
     
                 
      
      count: 
      
      "all"
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  capabilities: [gpu]
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          restart: unless-stopped

运行如下命令启动 Llama-Factory 容器：

docker-compose up --detach

现在可以看到 llama_factory container 已经在正常运行了：

访问 http://server_ip:7864 则可以看到如下的 Llama-Factory WebUI 界面：

使用 Llama-Factory 为 Llama3 训练中文 LoRA

1. 模型名称与路径

进入 Llama-Factory WebUI 后，先选择 Model name 为 LLaMA-8B，这时候 Model path 会自动变成 meta-llama/Meta-Llama-3-8B 并自动从 huggingface 拉取 Meta-Llama-3-8B 模型。我这里因为已经下载了 Meta-Llama-3-8B 并映射到了 Llama-Factory container 的 /app/models/llama3-hf/Meta-Llama-3-8B 目录，所以我这里的 Model path 也设置为该路径。

2. 设置 Advanced configurations

点右边的箭头展开 Advance configurations。量化位数（Quantization bit）可以选择 4 ，减小模型的体积并提高速度。Prompt template 则选择 llama3。如果有安装 flashattn2 或者 unsloth 的话可以在 boost 里选择，我这里没有安装所以选择 none。

3. 设置训练参数

在 Train 标签页里设置训练相关的参数。主要的参数有：

Stage：设置为 Supervised Fine-Tuning
Data dir：我这里设置为 data ，因为 Llama-Factory 项目自身也带了一些中文数据集，我打算直接使用。如果你自己下载了别的中文数据集，请设置相应的数据集所在的目录地址
Dataset：我选择了 Llama-Factory 项目里自带的 alpaca_zh，alpaca_gpt4_zh 和 oaast_sft_zh 数据集
Learning rate：安装自己的需要设置。我采用了缺省的 5e-5
Epochs：按照自己的需要设置。我采用了缺省的 3.0
Cutoff length：按照自己的需要设置。数字越大，对 GPU 和显存的要求越高；数字越小，则可能对长句的语义理解不够充分。我这里选择缺省的 1024
Batch size：按照自己的需要甚至。数字越大，对 GPU 和显存的要求越高。我这里选择缺省的 2
Output dir / Config path：按照自己的需要设置

设置完成后，可以点击 Preview dataset 来查看一下数据集内容。

点击 Preview command 可以查看训练过程的命令行参数。如果不希望使用 WebUI 进行训练，则可以直接执行命令行，这样也有助于进一步编程和自动化：

点击 Save arguments 则将目前的训练设置保存到指定的 json 文件。点击 Load arguments 则可以加载以前保存好的训练设置。

4. 开始训练

参数设置好后，就可以点击 Start 开始训练。

Llama-Factory 训练脚本开始解析数据。

整个训练的过程预计 8 个小时不到一点。

训练完成后，在 /app/output/llama3_cn_train_2024-04-27-16-32-46 目录下可以看到如下的文件目录结构：

这些就是 Llama-Factory 训练出来的 LoRA。可以在自动生成的 Readme.md 文件查看 LoRA 的信息：


  
  
    
    
     
     
    
    
    
    
     
     
      
      license: 
      
      other
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      library_name: peft
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      tags:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      - llama-factory
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      - lora
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      - generated_
      
      from_trainer
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      base_model: 
      
      /app
      
      /models
      
      /llama
      
      3-hf
      
      /Meta-Llama-
      
      3-
      
      8B
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      model-index:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      - name: llama
      
      3_cn_train_
      
      2024-
      
      04-
      
      27-
      
      16-
      
      32-
      
      46
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      。。。后面的内容省略

在 Ollama 中打造自己的中文版 Llama3

接下来，我们要在 Ollama 中运行 llama3 和我们训练出来的 LoRA，打造属于自己的中文版 Llama3。

Ollama 是一个开源的大模型管理工具，它提供了丰富的功能，包括模型的训练、部署、监控等。通过Ollama，你可以轻松地管理本地的大模型，提高模型的训练速度和部署效率。此外，Ollama还支持多种机器学习框架，如TensorFlow、PyTorch等，使得你可以根据自己的需求选择合适的框架进行模型的训练。

运行 Ollama docker

从 github 拉取 ollama 代码：

git clone https://github.com/ollama/ollama.git

Ollama github项目提供了参考的 Dockerfile，可以编译自己的 ollama 镜像并运行。也可以直接运行 ollama 官方发布的 docker 镜像：

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

转换 LoRA 格式

按照 Ollama modelfile ADAPTER 的说明，Ollama 支持 ggml 格式的 LoRA，所以我们需要把刚才生成的 LoRA 转换成为 ggml 格式。为此，我们需要使用到 Llama.cpp 的某些脚本。有关 Llama.cpp 开源项目的介绍请参考本站另外一篇博文：Llama.cpp 上手实战指南。

ADAPTER
The ADAPTER instruction is an optional instruction that specifies any LoRA adapter that should apply to the base model. The value of this instruction should be an absolute path or a path relative to the Modelfile and the file must be in a GGML file format. The adapter should be tuned from the base model otherwise the behaviour is undefined.

在 llama.cpp 项目中，有如下几个用于转换格式的 python 脚本。我们将使用 conver-lora-to-ggml.py 脚本来转换格式。

运行如下的命令（其中 /app/output/llama3_cn_train_2024-04-27-16-32-46 是 Llama-Factory 生成 LoRA 的路径）：

./conver-lora-to-ggml.py /app/output/llama3_cn_train_2024-04-27-16-32-46 llama

运行完这个命令后，将在 /app/output/llama3_cn_train_2024-04-27-16-32-46 下生成 ggml-adapter-model.bin 文件。这个文件就是 Ollama 需要的 ggml 格式的 LoRA 文件。

在 ollama 中创建自己的 llama3 中文模型

我们使用 ollama 的 modelfile 来创建自己的 llama3 中文模型。我自己使用的参考 llama3.modelfile 内容如下：


  
  
    
    
     
     
    
    
    
    
     
     
      
      # 
      
      set the base model
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      FROM llama
      
      3:
      
      8b
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # 
      
      set custom parameter 
      
      values
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER temperature 
      
      1
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER num_keep 
      
      24
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER 
      
      stop 
      
      <|
      
      start_header_id|
      
      >
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER 
      
      stop 
      
      <|
      
      end_header_id|
      
      >
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER 
      
      stop 
      
      <|eot_id|
      
      >
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      PARAMETER 
      
      stop 
      
      <|reserved_special_token
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # 
      
      set the model template
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      TEMPLATE 
      
      ""
      
      "
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      {{ if .System }}<|start_header_id|>system<|end_header_id|>
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      {{ .Response }}<|eot_id|>
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      "
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # 
      
      set the system message
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      SYSTEM You 
      
      are llama
      
      3 
      
      from Meta, customized 
      
      and hosted @ HY
      
      's Blog (https://blog.yanghong.dev).
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      # set Chinese lora support
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      ADAPTER /root/.ollama/models/lora/ggml-adapter-model.bin

首先，通过命令行进入 ollama container：

docker exec -it ollama /bin/bash

然后使用我们刚才生成的 llama3.modelfile 来创建自己的中文版 Llama3，命名为 llama3:hy。

ollama create llama3:hy -f llama3.modelfile

创建成功后，运行如下的命令，则可以看到刚才创建的 llama3:hy 模型。

ollama list

执行如下的命令来运行 llama3:hy 模型，接下来我们就可以愉快地跟它进行中文对话了。

ollama run llama3:hy

我们同样问它本文开头的那两个问题：”你是谁？“，”介绍一下中国的首都北京“，看看它的表现如何。

你是谁？


  
  
    
    
     
     
    
    
    
    
     
     
      
      >>> 你是谁？
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      我是一個 AI 型Chatbot，来自 Meta，目前被Customized 和 Hosted 在 HY 的博客（https:
      
      /
      
      /blog.yanghong.dev）。简单地说，我是一個能與人互動、回答問題和生成文本的 AI 机器人。

介绍一下中上海的旅游景点：


  
  
    
    
     
     
    
    
    
    
     
     
      
      >>> 介绍一下上海的旅游景点
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      上海是一座流行的旅游目的地，拥有悠久历史、多样文化和现代建筑。以下是上海一些最受欢迎的旅游景点：
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      1. 外滨路（The Bund）：黄浦江沿岸的一带，提供城市天际线的壮丽景象。此外，这里也有一些历史古楼、餐厅和酒吧。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      2. 上海塔（Shanghai Tower）：这是世界最高建筑之一，高度
      
      632米。从
      
      124层观光台可以欣赏到绕城全景。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      3. 豫园（Yu Garden）：明代的中国古典花园，known 
      
      for its beautiful architecture, lush greenery 
      
      and peaceful atmosphere.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      4. 法租区（French Concession）：一带拥有法国影响的街区，featuring colonial-era buildings, trendy boutiques和hip restaurants。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      5. 新天地（Xintiandi）：历史区域被转换为时尚购物和美食街区，known 
      
      for its cobblestone streets, traditional Shikumen houses和vibrant nightlife.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      6. 上海博物馆（Shanghai Museum）：展示城市历史、艺术和文化的综合博物馆，有超过
      
      10,000件展品。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      7. 人民广场（People
      
      's Square）：一个大型公共广场，surrounded by iconic buildings包括上海大剧院、上海市政厅和中国革命博物馆。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      8. 东方明珠塔（Oriental Pearl Tower）：一座电视塔，提供城市全景观从263层观光台。
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      9. 朱家角水乡（Zhujiajiao Water Town）：一个古老的水镇位于上海外围，known for its canals, bridges和traditional architecture.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      10. 上海迪士尼度假 resort（Disneyland Resort Shanghai）：一个世界一流的主题公园，featuring Disney, Pixar, Marvel和Star Wars attractions.
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      这些只是上海旅游景点之一。无论你是否感兴趣历史、文化、美食或娱乐，这座城市都有足够的选择！

可以看到，它基本上能以中文回答问题，我们的 LoRA fine tuning 过程成功完成了。

为中文 Llama3 添加 WebUI

Ollama 提供了 REST API 来跟 LLM 模型进行交互，比如最常用的 Generate，Chat 等方法。

Generate a response


  
  
    
    
     
     
    
    
    
    
     
     
      
      curl http:
      
      /
      
      /localhost:
      
      11434
      
      /api
      
      /
      
      generate -d 
      
      '{
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
        "model": "llama3",
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
        "prompt":"Why is the sky blue?"
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      }'

Chat with a model


  
  
    
    
     
     
    
    
    
    
     
     
      
      curl http:
      
      //localhost:11434/api/chat -d '{
     
     
    
    

    
    
     
     
    
    
    
    
     
       
      
      "model": 
      
      "llama3",
     
     
    
    

    
    
     
     
    
    
    
    
     
       
      
      "messages": [
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
          { 
      
      "role": 
      
      "user", 
      
      "content": 
      
      "why is the sky blue?" }
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
        ]
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
      }
      
      '

完整的 REST API 文档可参阅 github 。

因此，我们可以开发一个简单的 gradio 程序，通过 REST API 调用 llama3:hy 模型来进行中文交互。参考代码片段如下：


  
  
    
    
     
     
    
    
    
    
     
     
      
              response 
      
      = requests.post(
      
      'http://192.168.3.204:11434/api/generate',
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  json
      
      ={
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      'model': model
      
      2endpoint[model_name],
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      'prompt': prompt,
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      #
      
      'context': context,
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      'options': {
     
     
    
    

    
    
     
     
    
    
    
    
     
                         
      
      'top_k': 
      
      top_k,
     
     
    
    

    
    
     
     
    
    
    
    
     
                         
      
      'temperature': 
      
      top_p,
     
     
    
    

    
    
     
     
    
    
    
    
     
                         
      
      'top_p': temperature
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      }
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  },
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  stream
      
      =
      
      True
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              )
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              yield 
      
      "", history, user_message, 
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
             
      
      output 
      
      = 
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              # Check 
      
      if the request was successful
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              response.
      
      raise_
      
      for_
      
      status()
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              # 
      
      Initialize the 
      
      output 
      
      and history variables
     
     
    
    

    
    
     
     
    
    
    
    
     
             
      
      output 
      
      = 
      
      ""
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
              # Iterate over the streamed response 
      
      lines
     
     
    
    

    
    
     
     
    
    
    
    
     
             
      
      for idx, 
      
      line 
      
      in enumerate(response.iter_
      
      lines()):
     
     
    
    

    
    
     
     
    
    
    
    
     
                 
      
      if 
      
      line:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      # Parse the 
      
      line 
      
      as JSON
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      data 
      
      = json.loads(
      
      line)
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      token 
      
      = 
      
      data.
      
      get(
      
      'response', 
      
      '')  # Assuming 
      
      'response' 
      
      contains the text
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      # Check 
      
      if the token 
      
      is a special token
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      if 
      
      data.
      
      get(
      
      'special', 
      
      False):
     
     
    
    

    
    
     
     
    
    
    
    
     
                         
      
      continue
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      # Append the token 
      
      to the 
      
      output
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      output 
      
      +
      
      = token
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      # Update history 
      
      and chat 
      
      based 
      
      on the 
      
      index
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      if idx 
      
      =
      
      = 
      
      0:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                          history.append(
      
      output.strip())  # Append 
      
      initial 
      
      output
     
     
    
    

    
    
     
     
    
    
    
    
     
                     
      
      else:
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                          history[-
      
      1] 
      
      = 
      
      output.strip()  # Update the 
      
      last history entry
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      # Convert history 
      
      to chat 
      
      format
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      chat 
      
      = [
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                          (history[i], history[i 
      
      + 
      
      1]) 
      
      if i 
      
      + 
      
      1 
      
      < len(history) 
      
      else (history[i], 
      
      "")
     
     
    
    

    
    
     
     
    
    
    
    
     
                         
      
      for i 
      
      in range(
      
      0, len(history), 
      
      2)
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                      ]
     
     
    
    

    
    
     
     
    
    
    
    
     
      
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  # Yield the current chat, history, 
      
      and user message updates
     
     
    
    

    
    
     
     
    
    
    
    
     
     
      
                  yield chat, history, user_message, 
      
      ""

这样就可以通过上面的 WebUI 界面让它进行中文对话，问问题，帮我写代码了。同时也整合了其它的一些 coding LLM 在一起，碰到不会写的代码，就让它们在一起比比武。

以下视频是 fine tune 的中文 llama3 实际使用演示。

Llama3中文微调本地部署

如果您喜欢本文的内容，欢迎扫描下面的二维码访问作者的博客：HY's Blog

月流霜

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Llama-Factory + Ollama 打造属于自己的中文版 Llama3_llama-factory ollama

Meta 推出 Llama3 也有一小段时间了。Llama3 包含 8B 和 70B 两种参数规模，涵盖预训练和指令调优的变体。和以往的原始 Llama 模型一样，Llama 3 对中文的支持效果欠佳，经常会出现你用中文提问，它用英文或中文+英文回复的现象。例如当我问它：”你是谁？“，会得到如下的回答：>>> 你是谁？byandrespondtoin然后让它”介绍一下上海的旅游景点“，则会得到如下的回答：>>> 介绍一下上海的旅游景点withandandaresomeof。
复制链接

扫一扫