Deploying & Testing MiniCPM-o 2.6 (2025): the pitfalls I hit, including ImportError: flash_attn. Run `pip install flash_attn`

MiniCPM-o is the latest series of on-device multimodal large models, upgraded from MiniCPM-V. Models in this series accept images, video, text, and audio as input, and generate high-quality text and speech output in an end-to-end fashion.

Code: https://github.com/OpenBMB/MiniCPM-o

Model: https://huggingface.co/openbmb/MiniCPM-o-2_6

1. Setting up the environment.

# Create a virtual environment
conda create -n minicpm-o python=3.10

conda activate minicpm-o

# Install PyTorch
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

git clone https://github.com/OpenBMB/MiniCPM-o.git

cd MiniCPM-o

pip install -r requirements_o2.6.txt

cd web_demos/minicpm-o_2.6

# Launch the web demo

python chatbot_web_demo_o2.6.py

2. Testing the results.

The OCR performance is outstanding.

3. How to avoid the hard flash-attn2 dependency.

If you do not specify attn_implementation='flash_attention_2' at model initialization, inference does not actually depend on the flash-attn library. However, the dynamic module import mechanism in transformers had a bug in versions up to and including 4.44.2 that triggers the error below. The bug was fixed in transformers==4.45.0; see the corresponding PR for details.

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
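Whether you need the workaround at all can be gated on the installed transformers version. A minimal stdlib-only sketch (the helper name `needs_import_patch` is mine, not part of transformers):

```python
# Only transformers releases before 4.45.0 need the get_imports patch;
# 4.45.0 ships the upstream fix for the dynamic-import check.
def needs_import_patch(tf_version: str) -> bool:
    """Return True for transformers <= 4.44.x (assumes a 'major.minor.patch' string)."""
    major, minor = (int(p) for p in tf_version.split(".")[:2])
    return (major, minor) < (4, 45)

print(needs_import_patch("4.44.2"))  # True  -> apply the patch below
print(needs_import_patch("4.45.0"))  # False -> no patch needed
```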

On transformers <= 4.44.2, the workaround is to add the following code when initializing the model:

import os
import re
from typing import List, Union
from unittest.mock import patch

import torch
from transformers import AutoModel

def fixed_get_imports(filename: Union[str, os.PathLike]) -> List[str]:
    """
    Extracts all the libraries (not relative imports this time) that are imported in a file.

    Args:
        filename (`str` or `os.PathLike`): The module file to inspect.

    Returns:
        `List[str]`: The list of all packages required to use the input module.
    """
    with open(filename, "r", encoding="utf-8") as f:
        content = f.read()

    # filter out try/except block so in custom code we can have try/except imports
    content = re.sub(r"\s*try\s*:\s*.*?\s*except\s*.*?:", "", content, flags=re.MULTILINE | re.DOTALL)
    
    # filter out imports under the is_flash_attn_2_available block to avoid import issues in a CPU-only environment
    content = re.sub(
        r"if is_flash_attn[a-zA-Z0-9_]+available\(\):\s*(from flash_attn\s*.*\s*)+", "", content, flags=re.MULTILINE
    )

    # Imports of the form `import xxx`
    imports = re.findall(r"^\s*import\s+(\S+)\s*$", content, flags=re.MULTILINE)
    # Imports of the form `from xxx import yyy`
    imports += re.findall(r"^\s*from\s+(\S+)\s+import", content, flags=re.MULTILINE)
    # Only keep the top-level module
    imports = [imp.split(".")[0] for imp in imports if not imp.startswith(".")]
    
    return list(set(imports))


with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModel.from_pretrained(
        'openbmb/MiniCPM-o-2_6', torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        attn_implementation='sdpa',
        init_vision=True,
        init_audio=True,
        init_tts=True
    )
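To see why the patch works, here is a self-contained sketch of the filtering step in isolation: the same regexes drop the `is_flash_attn_2_available()` guard block before imports are collected, so `flash_attn` never shows up as a requirement (the sample file content below is made up for illustration):

```python
import re

# A toy snippet mimicking the remote modeling file that triggers the ImportError.
content = (
    "import torch\n"
    "if is_flash_attn_2_available():\n"
    "    from flash_attn import flash_attn_func\n"
)

# Same filter as in fixed_get_imports: remove the guarded flash_attn block.
content = re.sub(
    r"if is_flash_attn[a-zA-Z0-9_]+available\(\):\s*(from flash_attn\s*.*\s*)+",
    "", content, flags=re.MULTILINE,
)

# Same import collection as in fixed_get_imports.
imports = re.findall(r"^\s*import\s+(\S+)\s*$", content, flags=re.MULTILINE)
imports += re.findall(r"^\s*from\s+(\S+)\s+import", content, flags=re.MULTILINE)
print(imports)  # ['torch'] -> flash_attn is gone from the dependency list
```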

int4 inference works the same way: replace get_imports as above before initializing the model, and flash-attn no longer needs to be installed.
