Exploring 360Zhinao3: A Pacesetter for the New Era of Artificial Intelligence

Article link: https://blog.csdn.net/weixin_41446370/article/details/147906303


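Artificial intelligence (AI) has become a core force driving change across industries, and large language models, one of the field's key breakthroughs, are reshaping how people interact with machines, find information, and solve problems. 360Zhinao (360 智脑) is the large language model developed by Qihoo 360. With strong performance, a broad feature set, and a wide range of application scenarios, it stands out among comparable products.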

Introduction

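🎉🎉🎉 360 Group recently upgraded and open-sourced its in-house 7B-parameter model, 360Zhinao3-7B. The model is now available in the 360zhinao3 repository on GitHub and is free for commercial use. Its capabilities have improved across the board: among small models with fewer than 10B parameters, 360Zhinao3-7B ranks first on multiple benchmarks.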

360Zhinao3-7B
360Zhinao3-7B-Instruct
360Zhinao3-7B-O1.5

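Notable features of the 360Zhinao3 models:

360Zhinao3-7B was continually pretrained from 360Zhinao2-7B on 700 billion high-quality tokens. The two models share exactly the same architecture; the performance gain comes mainly from higher-quality training data.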

Download links

Size	Model	BF16
7B	360Zhinao3-7B	🤗
7B	360Zhinao3-7B-Instruct	🤗
7B	360Zhinao3-7B-O1.5	🤗

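Model evaluation

Base model

We evaluated the model along multiple dimensions with the open-source tool OpenCompass. Its benchmark average ranks first among models with fewer than 10B parameters, making it competitive with other models of the same scale.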

Type	Datasets	language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
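As a quick sanity check on the table above, the `avg_zh` row appears to be the plain arithmetic mean of the five Chinese-language (`zh`) datasets. The sketch below recomputes it for the two 360Zhinao columns. The unweighted-mean formula is an assumption inferred from the numbers, not something the source states, and `avg_all` is not recomputed here.

```python
# Scores copied from the zh rows of the table above.
# Assumption: avg_zh is the unweighted mean of these five datasets.
ZH_SCORES = {
    "360Zhinao2-7B": {
        "ceval": 83.04, "cmmlu": 73.8, "C3": 93.26,
        "lcsts": 18.61, "eprstmt-dev": 90.0,
    },
    "360Zhinao3-7B": {
        "ceval": 84.7, "cmmlu": 82.17, "C3": 92.77,
        "lcsts": 18.85, "eprstmt-dev": 92.50,
    },
}


def mean_score(scores):
    """Return the arithmetic mean of a {dataset: score} mapping."""
    return sum(scores.values()) / len(scores)


for model, scores in ZH_SCORES.items():
    print(f"{model}: avg_zh = {mean_score(scores):.2f}")
```

Both recomputed values agree with the `avg_zh` row after rounding to two decimals.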

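Instruct model

We evaluated 360Zhinao3-7B-Instruct and compared it with other models on three popular benchmarks: IFEval, MT-bench, and CFBench. It ranks first on MT-bench and CFBench among open-source models of the same class. On IFEval (strict prompt), it is second only to GLM4-9B-Chat and scores highest among 7B models.

Model	MT-bench	IFEval (strict prompt)	CFBench CSR	CFBench ISR	CFBench PSR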
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64
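The ranking claims for the instruct model can be checked directly against the table. The sketch below does so for the MT-bench and IFEval columns; the helper name `rank_of` is purely illustrative.

```python
# (MT-bench, IFEval strict prompt) scores copied from the table above.
RESULTS = {
    "Qwen2.5-7B-Instruct": (8.07, 0.556),
    "Yi-9B-16k-Chat": (7.44, 0.455),
    "GLM4-9B-Chat": (8.08, 0.634),
    "InternLM2.5-7B-Chat": (7.39, 0.540),
    "360Zhinao2-7B-Chat-4k": (7.86, 0.577),
    "360Zhinao3-7B-Instruct": (8.17, 0.626),
}


def rank_of(model, column):
    """Return the 1-based rank of `model` in `column` (higher score is better)."""
    ordered = sorted(RESULTS, key=lambda name: RESULTS[name][column], reverse=True)
    return ordered.index(model) + 1


print("MT-bench rank:", rank_of("360Zhinao3-7B-Instruct", 0))
print("IFEval rank:", rank_of("360Zhinao3-7B-Instruct", 1))
```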

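Long-CoT model

We continued fine-tuning 360Zhinao3-7B-Instruct into a long chain-of-thought (long-CoT) model with our previously open-sourced Light-R1 method, and also applied RFT and GRPO. Although a gap to the latest OpenThinker2-7B remains, it outperforms all previous models built on the general-purpose Qwen2.5-7B-Instruct.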

Model	Date	Base Model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	25.4.3	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	25.1.28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	25.4.14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	25.2.11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	25.1.20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	25.3.12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	25.3.31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6

Quickstart

The simple examples below show how to get started with 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 using 🤗 Transformers.

🤗 Transformers

Base model inference demo

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Instruct model inference demo

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

#round-1
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


#round-2
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
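Both rounds decode `pred.cpu()[0][len(input_ids[0]):]` rather than the whole output because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. Here is a minimal sketch of that slicing, with plain Python lists standing in for tensors. The token ids are made up and `strip_prompt` is a hypothetical helper, not part of the model's code.

```python
def strip_prompt(prompt_ids, output_ids):
    """Return only the tokens that were generated after the prompt."""
    # The output is expected to begin with the prompt itself.
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt")
    return output_ids[len(prompt_ids):]


prompt_ids = [101, 7, 42]             # hypothetical ids of the chat-formatted prompt
output_ids = [101, 7, 42, 9, 15, 2]   # hypothetical generate() output: prompt + reply
print(strip_prompt(prompt_ids, output_ids))
```

Only the sliced part is decoded and appended to `messages`, so the conversation history never contains a duplicated prompt.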

Long-CoT model inference demo

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048


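def extract_thinking_and_answer(input_string):
    """Split a response into (thinking, answer) around the <think>...</think> tags."""
    thinking, answer = "", ""
    # Extract the answer: everything after the last </think>
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        # No </think> tag: treat the whole string as the answer
        return thinking, input_string

    # Extract the reasoning: the first <think>...</think> span
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer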


messages = []
messages.append({"role": "user", "content": "现有一笼子，里面有鸡和兔子若干只，数一数，共有头14个，腿38条，求鸡和兔子各有多少只？"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
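To see what `extract_thinking_and_answer` returns without loading the model, the function can be run on hand-written strings. The sketch below repeats the function so that it runs on its own. The sample responses are invented for illustration and are not real model output.

```python
import re


def extract_thinking_and_answer(input_string):
    """Same logic as the demo above, repeated so this snippet is self-contained."""
    thinking, answer = "", ""
    match_answer = re.search(r'.*</think>(.*)$', input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        return thinking, input_string
    match_thinking = re.search(r'<think>(.*?)</think>', input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)
    return thinking, answer


# Invented sample: reasoning wrapped in <think> tags, followed by the answer.
tagged = "<think>x + y = 14 and 2x + 4y = 38, so y = 5 and x = 9.</think>9 chickens and 5 rabbits."
print(extract_thinking_and_answer(tagged))

# Without a </think> tag the whole string is returned as the answer.
plain = "9 chickens and 5 rabbits."
print(extract_thinking_and_answer(plain))
```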

Model inference

Deployment

vLLM installation

We recommend vllm==0.6.0.

If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with the following command:

pip install vllm==0.6.0

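Otherwise, please refer to the official vLLM installation guide.

After installation, perform the following steps:

Copy vllm/zhinao.py into the vllm/model_executor/models folder of your vLLM installation directory (inside your python/conda environment).
Then add the following line to vllm/model_executor/models/__init__.py: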
```
"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
```
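If it is unclear where the vLLM installation directory is in the active environment, a small standard-library sketch such as the one below can locate it. `find_package_subdir` is a hypothetical helper written for this illustration. It only reports the path and does not copy or modify any files.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_subdir(package: str, subdir: str) -> Optional[Path]:
    """Return the path of `subdir` inside an installed package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not installed, or not a package with a directory
    for location in spec.submodule_search_locations:
        candidate = Path(location) / subdir
        if candidate.is_dir():
            return candidate
    return None


# Prints the folder that should receive zhinao.py, or None if vLLM is not installed.
print(find_package_subdir("vllm", "model_executor/models"))
```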

Starting the vLLM service

Start the service:

python -m vllm.entrypoints.openai.api_server \
    --model qihoo360/360Zhinao3-7B-O1.5 \
    --served-model-name 360Zhinao3-7B-O1.5 \
    --port 8360 \
    --host 0.0.0.0 \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code

Query the service with curl:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao3-7B-O1.5",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

Query the service with Python:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="360Zhinao3-7B-O1.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response)

If you need repetition penalties, we recommend setting presence_penalty and frequency_penalty rather than repetition_penalty.
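For readers unfamiliar with these parameters, the sketch below contrasts how the two kinds of penalty are commonly defined: presence and frequency penalties are subtracted from a token's logit, while a repetition penalty rescales it. This is a simplified illustration with made-up numbers, not vLLM's actual implementation, and the source does not state why the additive penalties are preferred.

```python
def apply_additive_penalties(logits, counts, presence_penalty, frequency_penalty):
    """Subtract presence and frequency penalties from each token's logit."""
    return [
        logit
        - frequency_penalty * count                       # grows with each repeat
        - presence_penalty * (1.0 if count > 0 else 0.0)  # flat cost once seen
        for logit, count in zip(logits, counts)
    ]


def apply_repetition_penalty(logits, counts, repetition_penalty):
    """Rescale logits of seen tokens: divide if positive, multiply if negative."""
    penalized = []
    for logit, count in zip(logits, counts):
        if count > 0:
            if logit > 0:
                logit = logit / repetition_penalty
            else:
                logit = logit * repetition_penalty
        penalized.append(logit)
    return penalized


logits = [2.0, -1.0, 0.5]  # made-up logits for three tokens
counts = [3, 1, 0]         # how often each token has appeared so far

print(apply_additive_penalties(logits, counts, presence_penalty=0.5, frequency_penalty=0.25))
print(apply_repetition_penalty(logits, counts, repetition_penalty=2.0))
```

In this toy example the additive penalties lower a token's logit further the more often it has appeared, whereas the repetition penalty applies the same scaling regardless of the count.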

Model fine-tuning

Training data

Training data: data/training_data_sample.json. This sample contains 10,000 rows drawn from multiturn_chat_0.8M and converted to the required format.

Data format:

[
  {
    "id": 1,
    "conversations": [
        {
            "from": "system",
            "value": "You are a helpful assistant."
        },
        {
            "from": "user",
            "value": "您好啊"
        },
        {
            "from": "assistant",
            "value": "你好！我今天能为您做些什么？有什么问题或需要帮助吗? 我在这里为您提供服务。"
        }
    ]
  }
]
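The training format uses `from`/`value` keys, whereas the inference demos pass `role`/`content` messages to `apply_chat_template`. The sketch below converts one record between the two shapes and rejects unexpected roles. The helper `to_messages` and the allowed-role set are assumptions inferred from the single example above; the actual fine-tuning script may accept more.

```python
import json

# Assumption: only the three roles shown in the sample record are valid.
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_messages(record):
    """Convert one training record into a list of {'role', 'content'} dicts."""
    messages = []
    for turn in record["conversations"]:
        role = turn["from"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


sample = json.loads("""
[
  {
    "id": 1,
    "conversations": [
      {"from": "system", "value": "You are a helpful assistant."},
      {"from": "user", "value": "您好啊"},
      {"from": "assistant", "value": "你好！"}
    ]
  }
]
""")

for record in sample:
    print(record["id"], to_messages(record))
```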

Fine-tuning script

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False
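The parameters in the script determine the effective global batch size and, roughly, how many optimizer steps and step-based checkpoints a run produces. The back-of-the-envelope sketch below assumes that the 10,000-row sample file yields 10,000 training samples (IS_CONCAT=False) and that a final partial batch still counts as a step. The exact numbers depend on how finetune.py builds its dataloader.

```python
import math

# Values copied from the script above.
BATCH_SIZE = 4         # --per_device_train_batch_size
NUM_NODES = 1          # --num_nodes
NUM_GPUS = 8           # --num_gpus (GPUs per node)
GRAD_ACCUM_STEPS = 1   # --gradient_accumulation_steps
EPOCHS = 3             # --num_train_epochs
SAVE_STEPS = 200       # --save_steps

# Assumption: one training sample per row of the 10,000-row sample file.
NUM_SAMPLES = 10_000

global_batch = BATCH_SIZE * NUM_NODES * NUM_GPUS * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_SAMPLES / global_batch)
total_steps = steps_per_epoch * EPOCHS
step_checkpoints = total_steps // SAVE_STEPS

print("global batch size:", global_batch)
print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
print("step-based checkpoints:", step_checkpoints)
```

Under these assumptions, one optimizer step consumes 32 samples, so changing NUM_GPUS or BATCH_SIZE changes the effective batch size unless gradient accumulation is adjusted to compensate.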