https://github.com/InternLM/Tutorial/blob/camp3/docs/L1/Demo/easy_readme.md
CLI demo deployment
The code below is copied from cli_demo.py, with comments added to make it easier to follow.
import torch  # PyTorch, a widely used deep learning framework
from transformers import AutoTokenizer, AutoModelForCausalLM  # Auto classes for loading the tokenizer and the causal language model
# Path to the pretrained model
model_name_or_path = "/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"
# Load the pretrained tokenizer; trust_remote_code=True allows the custom code shipped with the model to run, and device_map targets the CUDA device
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, device_map='cuda:0')
# Load the pretrained causal language model; torch_dtype=torch.bfloat16 reduces memory use, with the same trust_remote_code and device_map settings
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='cuda:0')
model = model.eval()  # Switch the model to evaluation mode
# System prompt describing the assistant's identity and capabilities
system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""
# Initialize the message history with the system prompt paired with an empty string as the initial input
messages = [(system_prompt, '')]
# Print the welcome banner
print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")
# Loop forever, waiting for user input
while True:
    input_text = input("\nUser >>> ")  # Read the user's input
    input_text = input_text.replace(' ', '')  # Remove all spaces from the input (this also strips spaces between English words)
    if input_text == "exit":  # Leave the loop when the user types "exit"
        break
    length = 0  # Number of characters of the response already printed
    # Stream the reply with the model's stream_chat method, passing the tokenizer, the user input, and the message history
    for response, _ in model.stream_chat(tokenizer, input_text, messages):
        if response is not None:  # Skip empty responses
            print(response[length:], flush=True, end="")  # Print only the newly generated part
            length = len(response)  # Remember how much has been printed so far
Test prompt: "Tell a short story of about 300 characters."
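The slicing in the loop above (`response[length:]`) suggests that each step of `stream_chat` yields the whole reply generated so far, not just the newest piece. The sketch below illustrates that printing pattern without loading a model. `fake_stream_chat` is a hypothetical stand-in written for this note, not part of InternLM or transformers.

```python
from typing import Iterator, List, Tuple


def fake_stream_chat(full_text: str, chunk_size: int = 4) -> Iterator[Tuple[str, List]]:
    """Hypothetical stand-in for model.stream_chat.

    Yields (cumulative_response, history) pairs, growing the response
    by chunk_size characters per step.
    """
    for end in range(chunk_size, len(full_text) + chunk_size, chunk_size):
        yield full_text[:end], []


def collect_increments(stream: Iterator[Tuple[str, List]]) -> List[str]:
    """Return only the new text produced at each step, as the CLI loop prints it."""
    printed = 0  # number of characters already emitted
    pieces = []
    for response, _ in stream:
        if response is not None:
            pieces.append(response[printed:])  # the part not emitted yet
            printed = len(response)
    return pieces


pieces = collect_increments(fake_stream_chat("Once upon a time"))
print("".join(pieces))
```

Joining the increments reproduces the full reply, which is why the CLI demo can print with `end=""` and never repeat text.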
Streamlit Web Demo deployment
Port mapping
LMDeploy deployment
(/root/share/pre_envs/icamp3_demo) root@intern-studio-50141768:~/demo# conda activate /root/share/pre_envs/icamp3_demo
(/root/share/pre_envs/icamp3_demo) root@intern-studio-50141768:~/demo# lmdeploy serve gradio /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b --cache-max-entry-count 0.1
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
2024-08-08 09:51:42,414 - lmdeploy - INFO - matching vision model: Xcomposer2VisionModel
Set max length to 4096
config.json: 4.76kB [00:00, 28.5MB/s]
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
2024-08-08 09:52:00,231 - lmdeploy - INFO - matching type of ModelType.XCOMPOSER2
2024-08-08 09:52:27,284 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=8192, max_batch_size=128, cache_max_entry_count=0.1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-08-08 09:52:27,284 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name=None, system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-08-08 09:52:27,317 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='internlm-xcomposer2', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-08-08 09:52:27,317 - lmdeploy - INFO - model_source: hf_model
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
2024-08-08 09:52:32,391 - lmdeploy - INFO - model_config:
[llama]
model_name = internlm-xcomposer2
model_arch = InternLMXComposer2ForCausalLM
tensor_para_size = 1
head_num = 16
kv_head_num = 8
vocab_size = 92544
num_layer = 24
inter_size = 8192
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 8192
weight_type = bf16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.1
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 2.0
use_dynamic_ntk = 1
use_logn_attn = 0
lora_policy = plora
lora_r = 256
lora_scale = 1.0
lora_max_wo_r = 256
lora_rank_pattern =
lora_scale_pattern =
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 8192.
2024-08-08 09:52:34,162 - lmdeploy - WARNING - get 411 model params
2024-08-08 09:52:53,356 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=8192, max_batch_size=128, cache_max_entry_count=0.1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 6 MB
[TM][INFO] [BlockManager] max_block_count = 49
[TM][INFO] [BlockManager] chunk_size = 49
[TM][WARNING] No enough blocks for `session_len` (8192), `session_len` truncated to 3136.
[TM][INFO] LlamaBatch<T>::Start()
Running on local URL: http://0.0.0.0:6006
Could not create share link. Missing file: /root/share/pre_envs/icamp3_demo/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2.
Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps:
1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /root/share/pre_envs/icamp3_demo/lib/python3.10/site-packages/gradio
2024-08-08 10:08:40,222 - lmdeploy - INFO - prompt: ('图中有什么?', [<PIL.Image.Image image mode=RGB size=2550x1390 at 0x7FCA842128C0>])
2024-08-08 10:08:40,222 - lmdeploy - WARNING - Can not found event loop in current thread. Create a new event loop.
2024-08-08 10:08:40,223 - lmdeploy - WARNING - auto append <IMAGE_TOKEN> at the beginning, the user can manually insert the token to prompt
2024-08-08 10:08:40,223 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-08-08 10:08:40,223 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-08-08 10:08:40,223 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-08-08 10:08:41,943 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 1.720s
2024-08-08 10:08:41,944 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-08-08 10:08:41,946 - lmdeploy - INFO - preprocess cost 1.724s
2024-08-08 10:08:41,946 - lmdeploy - INFO - input_ids: [1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 4452, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 505, 395, 7445, 17218, 2881, 7659, 1813, 4287, 1762, 560, 505, 8020, 684, 36956, 15358, 31288, 451, 68589, 76659, 71581, 699, 1226, 505, 6342, 442, 517, 11100, 328, 10894, 328, 454, 51978, 756, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 777, 3696, 454, 19187, 19829, 4563, 435, 410, 4287, 12032, 684, 410, 1341, 1893, 569, 6519, 454, 262, 69093, 756, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 505, 13026, 446, 12824, 2613, 454, 27943, 15613, 14644, 13585, 3285, 519, 410, 4054, 2321, 281, 92542, 364, 92543, 1008, 364, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73037, 69259, 60504, 92542, 364, 92543, 525, 11353, 364]
2024-08-08 10:08:41,946 - lmdeploy - INFO - Register stream callback for 0
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][INFO] ------------------------- step = 1370 -------------------------
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1371, max_q = 1371, max_k = 1371
[TM][INFO] ------------------------- step = 1380 -------------------------
[TM][INFO] ------------------------- step = 1390 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
2024-08-08 10:08:43,270 - lmdeploy - INFO - UN-register stream callback for 0
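The BlockManager lines in the log above show why `session_len` was truncated from 8192 to 3136: with `--cache-max-entry-count 0.1` only 49 KV cache blocks fit, and each block holds `cache_block_seq_len = 64` tokens. The sketch below redoes that arithmetic with the values printed in the log. The block-size formula (K and V tensors, per layer, per KV head, in bf16) is an assumption made for this note; it happens to reproduce the logged 6 MB, but it is not taken from the LMDeploy source.

```python
# Values copied from the model_config and BlockManager lines in the log.
num_layer = 24
kv_head_num = 8
size_per_head = 128
cache_block_seq_len = 64
max_block_count = 49
requested_session_len = 8192
bytes_per_value = 2  # bf16 weights and cache (weight_type = bf16)


def kv_block_bytes(layers: int, kv_heads: int, head_dim: int,
                   block_tokens: int, value_bytes: int) -> int:
    """Assumed size of one KV cache block: K and V for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * block_tokens * value_bytes


block_mib = kv_block_bytes(num_layer, kv_head_num, size_per_head,
                           cache_block_seq_len, bytes_per_value) / 2**20

# Tokens that fit in the cache, and the session length that results.
max_cached_tokens = max_block_count * cache_block_seq_len
effective_session_len = min(requested_session_len, max_cached_tokens)

print(f"block size: {block_mib} MiB")
print(f"effective session_len: {effective_session_len}")
```

Raising `--cache-max-entry-count` would allow more blocks and therefore a longer usable session, at the cost of more GPU memory.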
Port mapping
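The log reports the Gradio server on `http://0.0.0.0:6006`, so the port to forward to the local machine is 6006. A minimal sketch of building the SSH local-forwarding command follows. The host name `your.dev.machine` and SSH port `12345` are hypothetical placeholders; use the values shown for your own development machine.

```python
import shlex
from typing import List


def build_forward_command(port: int, user: str, host: str, ssh_port: int) -> List[str]:
    """Build an ssh command that forwards local `port` to the same port on the remote machine."""
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-L", f"{port}:127.0.0.1:{port}",  # local port -> remote 127.0.0.1:port
        f"{user}@{host}",
        "-p", str(ssh_port),
    ]


# 6006 comes from the log; the host and SSH port are placeholders (assumptions).
command = build_forward_command(6006, "root", "your.dev.machine", 12345)
print(shlex.join(command))
```

Run the printed command in a local terminal, keep it open, and then open `http://127.0.0.1:6006` in a local browser.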