A First Look at Huawei MindIE: Benchmarking Qwen1.5-14B-Chat at 40 Concurrent Requests

I recently noticed a new member of the Huawei NPU ecosystem, MindIE, which partially addresses large-model inference. Below is a brief overview of the Huawei Ascend NPU ecosystem.

1) The Huawei NPU ecosystem has gained MindIE, which partially addresses large-model inference.

2) How the Huawei Ascend NPU stack maps to the NVIDIA GPU stack, layer by layer:

CANN corresponds to CUDA

MindSpore corresponds to PyTorch

MindFormer corresponds to Transformers

MindIE corresponds to vLLM

3) MindIE inference performance test (on 910B4 cards):

Model: Qwen1.5-14B-Chat

Hardware: 4× 910B4

Concurrency: 40

Average first-token latency: 66 ms

Aggregate generation rate: ~1200 tokens/s

Per-request generation rate: ~30 tokens/s

Conclusion: the results show that MindIE's inference performance can basically meet the needs of a production environment.

Basic Concepts

First, the NVIDIA ecosystem contains, from the bottom layer up, the familiar CUDA, PyTorch, Transformers, and vLLM libraries. Correspondingly, the Huawei ecosystem has CANN, MindSpore, MindFormer, and MindIE, with the layer-by-layer mapping listed in the summary above.
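To make the layer correspondence concrete, here is a minimal sketch of the same tensor operation written at the "framework" layer of both stacks (PyTorch on the NVIDIA side, MindSpore on the Ascend side). The op and shapes are illustrative only, not taken from the original test:

```python
import numpy as np

# NVIDIA stack, framework layer: PyTorch (runs on CUDA underneath)
import torch

x = torch.tensor(np.ones((2, 2), dtype=np.float32))
print(torch.matmul(x, x))  # 2x2 matrix of 2.0

# Ascend stack, framework layer: MindSpore (runs on CANN underneath)
import mindspore as ms
from mindspore import ops

y = ms.Tensor(np.ones((2, 2), dtype=np.float32))
print(ops.matmul(y, y))  # same result, produced at the same layer of the stack
```

The point is not the API details but the layering: just as PyTorch sits on CUDA, MindSpore sits on CANN, and MindIE plays the serving role that vLLM plays on GPUs.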

For detailed introductions to MindSpore and MindIE, see the links below:

MindSpore: https://www.mindspore.cn/

MindIE: https://www.hiascend.com/software/mindie

MindIE Inference Results

Although the list of supported models is still limited, the 910B-series cards offer strong compute, so I ran a concurrent-inference benchmark on top of the MindIE framework. The results are as follows:

As the numbers show, running Qwen1.5-14B-Chat on 4× 910B4 cards at 40 concurrent requests gives an average first-token latency of 66 ms, an aggregate generation rate of about 1200 tokens/s, and roughly 30 tokens/s per request, which can basically meet the needs of a production environment.
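For reference, the sketch below shows one way to drive this kind of concurrency test from the client side. It assumes MindIE Service is running in an OpenAI-compatible streaming mode; the endpoint URL, port, model name, and the one-token-per-stream-chunk approximation are my assumptions, not details from the original test:

```python
import json
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Assumed deployment details -- adjust to your MindIE Service config.
BASE_URL = "http://127.0.0.1:1025/v1/chat/completions"  # hypothetical address
MODEL = "Qwen1.5-14B-Chat"
CONCURRENCY = 40

def one_request(prompt: str):
    """Send one streaming request; return (first-token latency in s, tokens/s)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "stream": True,  # streaming is required to observe first-token latency
    }
    start = time.time()
    first_token_at = None
    n_chunks = 0
    with requests.post(BASE_URL, json=payload, stream=True, timeout=300) as r:
        for line in r.iter_lines():
            if not line or not line.startswith(b"data:"):
                continue  # skip SSE keep-alives and blank lines
            chunk = line[len(b"data:"):].strip()
            if chunk == b"[DONE]":
                break
            delta = json.loads(chunk)["choices"][0].get("delta", {})
            if delta.get("content"):
                n_chunks += 1  # approximation: one stream chunk ~ one token
                if first_token_at is None:
                    first_token_at = time.time()
    elapsed = time.time() - start
    ttft = (first_token_at or time.time()) - start
    return ttft, n_chunks / elapsed

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(one_request, ["Explain KV-cache reuse."] * CONCURRENCY))

ttfts, rates = zip(*results)
print(f"avg first-token latency: {statistics.mean(ttfts) * 1000:.0f} ms")
print(f"per-request tokens/s:    {statistics.mean(rates):.1f}")
print(f"aggregate tokens/s:      {sum(rates):.0f}")  # rough: assumes full overlap
```

Streaming matters here: first-token latency can only be measured on the client when the server emits tokens as they are generated, which is what makes the 66 ms figure above observable in the first place.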

Reference: 华鲲振宇 "AI最优解" / Ascend-FAQ wiki on Gitee: https://gitee.com/HKZY-FAE/ascend-faq/wikis

Reposted from the CSDN blog post "A First Look at Huawei MindIE: Benchmarking Qwen1.5-14B-Chat at 40 Concurrent Requests".
