[InternLM Hands-On] InternVL Multimodal Model Deployment and Fine-tuning in Practice

Today's result

(image)

Output of the model before fine-tuning

Test script:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build an lmdeploy inference pipeline from the local InternVL2-2B checkpoint
pipe = pipeline('/root/models/InternVL2-2B')

# Load a test meme image and ask the model to come up with a joke about it
image = load_image('/root/code/datasets/ex_images/004atEXYgy1gpbifbj0ozj60x70lvdvp02.jpg')
response = pipe(('请你根据这张图片,讲一个脑洞大开的梗', image))
print(response.text)

Test image:
(image)

Test result:

It mentions some features of the image, but the description is not very accurate, and it clearly cannot pull off the joke.

[prompt] 请你根据这张图片,讲一个脑洞大开的梗 (Based on this image, come up with a wildly creative meme)
[response] This picture shows a cat dressed up as an astronaut, sitting at a table with a birthday cake in front of it. This is a very amusing meme, often called "astronaut cat" or "astronaut birthday".

The humor of this meme comes from:

1. **The costume**: the cat is wearing a red spacesuit with "OMG" written on it, looking like a superhero, yet it is placed in an ordinary birthday scene. The contrast creates the humor.

2. **Pose and expression**: the cat looks both surprised and a little resigned, as if witnessing something very unusual. Its expression and pose suggest an unexpected reaction to the birthday cake.

3. **The birthday cake**: a birthday candle sticks out of the cake, reinforcing the birthday theme. The cat sits at the table quite naturally, as if attending an ordinary birthday party.

4. **The family atmosphere**: this kind of scene usually appears in family photos, especially warm, affectionate ones. The cat is particularly funny here because it breaks the solemnity of a traditional birthday celebration.

The meme relies on contrast, exaggeration, and a homely atmosphere to get a laugh. It shows off the cat's curiosity and playfulness while also conveying a relaxed, cheerful family mood.

Fine-tuning the image-understanding model

Step 1: Data preparation

I used the dataset the course provides. In real work this is actually the most important and most laborious part, but the course instructors very kindly supplied it: 2,000 meme images with matching punchlines (writing that many jokes can't have been easy…). Thanks again 🙏!

Training data format:

[
  {
    "image": "ex_images/007aPnLRgy1gm8gujsdekj30ci0elaey.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\n请你根据这张图片,讲一个脑洞大开的梗"
      },
      {
        "from": "gpt",
        "value": "摘下面具后露出了真容"
      }
    ]
  }
]
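
Before training, it is worth sanity-checking the annotation file. The snippet below is my own quick check rather than part of the course material; it assumes the JSON layout shown above and the data_root used later in the config, and it should be run as a standalone script (not pasted into the XTuner config, for the reason described in Step 2).

import json
import os

data_root = '/root/code/datasets/'                 # adjust to your own data_root
data_path = os.path.join(data_root, 'ex_cn.json')

with open(data_path, encoding='utf-8') as f:
    samples = json.load(f)

# Every sample should reference an image that actually exists under data_root.
missing = [s['image'] for s in samples
           if not os.path.exists(os.path.join(data_root, s['image']))]
print(f'{len(samples)} samples, {len(missing)} missing images')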

Step 2: Modify the config file

Note: just change the config exactly as the tutorial shows; if you are new to this, don't add your own embellishments!

For example, I imported the os module in the config, and the run failed immediately with "TypeError: cannot pickle 'module' object". At first I assumed my environment was broken, but after reading the failing code closely I found that the training script deep-copies the entire config; calling copy.deepcopy on a data structure that contains a module object (such as an imported os) raises exactly that error…
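
A minimal reproduction of that pitfall (my own illustration, not from the course): deep-copying any dict that holds a module object fails the same way, which is why an imported module must not end up among the config's variables.

import copy
import os

cfg = dict(data_root='/root/code/datasets/', helper=os)  # a config that accidentally references a module
copy.deepcopy(cfg)  # raises TypeError: cannot pickle 'module' object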

The complete config file:
(just replace data_path, data_root, and image_folder with your own)

# Copyright (c) OpenMMLab. All rights reserved.
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import AutoTokenizer

from xtuner.dataset import InternVL_V1_5_Dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.samplers import LengthGroupedSampler
from xtuner.engine.hooks import DatasetInfoHook
from xtuner.engine.runner import TrainLoop
from xtuner.model import InternVL_V1_5
from xtuner.utils import PROMPT_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
path = '/root/model/InternVL2-2B'

# Data
data_root = '/root/InternLM/datasets/CLoT_cn_2000/'
data_path = data_root + 'ex_cn.json'
image_folder = data_root
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 6656

# Scheduler & Optimizer
batch_size = 4  # per_device
accumulative_counts = 4
dataloader_num_workers = 4
max_epochs = 6
optim_type = AdamW
# official 1024 -> 4e-5
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0.05
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 1000
save_total_limit = 1  # Maximum checkpoints to keep (-1 means unlimited)

#######################################################################
#            PART 2  Model & Tokenizer & Image Processor              #
#######################################################################
model = dict(
    type=InternVL_V1_5,
    model_path=path,
    freeze_llm=True,
    freeze_visual_encoder=True,
    quantization_llm=True,  # or False
    quantization_vit=False,  # or True and uncomment visual_encoder_lora
    # comment the following lines if you don't want to use Lora in llm
    llm_lora=dict(
        type=LoraConfig,
        r=128,
        lora_alpha=256,
        lora_dropout=0.05,
        target_modules=None,
        task_type='CAUSAL_LM'),
    # uncomment the following lines if you want to use Lora in the visual encoder  # noqa
    # visual_encoder_lora=dict(
    #     type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05,
    #     target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'])
)

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
llava_dataset = dict(
    type=InternVL_V1_5_Dataset,
    model_path=path,
    data_paths=data_path,
    image_folders=image_folder,
    template=prompt_template,
    max_length=max_length)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=llava_dataset,
    sampler=dict(
        type=LengthGroupedSampler,
        length_property='modality_length',
        per_device_batch_size=batch_size * accumulative_counts),
    collate_fn=dict(type=default_collate_fn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=path,
    trust_remote_code=True)

custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        save_optimizer=False,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
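
As a quick sanity check on the settings above (back-of-the-envelope arithmetic, not part of the config): with a per-device batch of 4 and 4 gradient-accumulation steps, each optimizer step sees 16 samples, which matches the train_batch_size that DeepSpeed reports in the log further down.

# Illustrative only; the values mirror PART 1 of the config.
batch_size = 4            # per-device micro-batch
accumulative_counts = 4   # gradient accumulation steps
num_samples = 2000        # size of the CLoT meme dataset
max_epochs = 6
warmup_ratio = 0.03

effective_batch = batch_size * accumulative_counts    # 16 samples per optimizer step
steps_per_epoch = num_samples // effective_batch       # 125 optimizer steps per epoch
warmup_end = warmup_ratio * max_epochs                 # LinearLR warmup ends after 0.18 epochs

print(effective_batch, steps_per_epoch, warmup_end)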

Step 3: Start training

Then training begins~ On 50% of an A100 it takes about 4 hours. When you launch the training script, I recommend redirecting the output to a log file rather than printing it straight to the console; otherwise, if the VS Code connection drops partway through, the training logs are gone.

Training command:

cd XTuner

NPROC_PER_NODE=1 xtuner train /root/code/XTuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py  --work-dir /root/code/work_dir/internvl_ft_run_8_filter  --deepspeed deepspeed_zero1 > train.log 2>&1
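
Because the output goes to train.log, the loss curve can be pulled back out of the file afterwards. The snippet below is my own helper, not part of the course material; it assumes mmengine's default log format, in which each training-iteration line contains a "loss: <value>" field.

import re

# Collect the loss values that mmengine's LoggerHook writes every 10 iterations.
losses = []
with open('train.log', encoding='utf-8') as f:
    for line in f:
        m = re.search(r'loss: ([0-9]+\.[0-9]+)', line)
        if m:
            losses.append(float(m.group(1)))

print(f'{len(losses)} logged steps, last loss = {losses[-1] if losses else None}')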

The full training log:

[2024-10-04 18:08:49,038] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-10-04 18:09:13,665] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The cache directory for DeepSpeed Triton autotune, /root/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
10/04 18:09:21 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 1157427657
    GPU 0: NVIDIA A100-SXM4-80GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 12.2, V12.2.140
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
    PyTorch: 2.1.2
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.16.2
    OpenCV: 4.10.0
    MMEngine: 0.10.5

Runtime environment:
    launcher: none
    randomness: {'seed': None, 'deterministic': False}
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

10/04 18:09:21 - mmengine - INFO - Config:
accumulative_counts = 4
batch_size = 4
betas = (
    0.9,
    0.999,
)
custom_hooks = [
    dict(
        tokenizer=dict(
            pretrained_model_name_or_path='/root/models/InternVL2-2B',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.hooks.DatasetInfoHook'),
]
data_path = '/root/code/datasets/ex_cn.json'
data_root = '/root/code/datasets/'
dataloader_num_workers = 4
default_hooks = dict(
    checkpoint=dict(
        by_epoch=False,
        interval=1000,
        max_keep_ckpts=1,
        save_optimizer=False,
        type='mmengine.hooks.CheckpointHook'),
    logger=dict(
        interval=10,
        log_metric_by_epoch=False,
        type='mmengine.hooks.LoggerHook'),
    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
    timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
image_folder = '/root/code/datasets/'
launcher = 'none'
llava_dataset = dict(
    data_paths='/root/code/datasets/ex_cn.json',
    image_folders='/root/code/datasets/',
    max_length=6656,
    model_path='/root/models/InternVL2-2B',
    template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
    type='xtuner.dataset.InternVL_V1_5_Dataset')
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
lr = 2e-05
max_epochs = 4
max_length = 6656
max_norm = 1
model = dict(
    freeze_llm=True,
    freeze_visual_encoder=True,
    llm_lora=dict(
        lora_alpha=256,
        lora_dropout=0.05,
        r=128,
        target_modules=None,
        task_type='CAUSAL_LM',
        type='peft.LoraConfig'),
    model_path='/root/models/InternVL2-2B',
    quantization_llm=True,
    quantization_vit=False,
    type='xtuner.model.InternVL_V1_5')
optim_type = 'torch.optim.AdamW'
optim_wrapper = dict(
    optimizer=dict(
        betas=(
            0.9,
            0.999,
        ),
        lr=2e-05,
        type='torch.optim.AdamW',
        weight_decay=0.05),
    type='DeepSpeedOptimWrapper')
param_scheduler = [
    dict(
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=0.12,
        start_factor=1e-05,
        type='mmengine.optim.LinearLR'),
    dict(
        begin=0.12,
        by_epoch=True,
        convert_to_iter_based=True,
        end=4,
        eta_min=0.0,
        type='mmengine.optim.CosineAnnealingLR'),
]
path = '/root/models/InternVL2-2B'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm2_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
save_steps = 1000
save_total_limit = 1
strategy = dict(
    config=dict(
        bf16=dict(enabled=True),
        fp16=dict(enabled=False, initial_scale_power=16),
        gradient_accumulation_steps='auto',
        gradient_clipping='auto',
        train_micro_batch_size_per_gpu='auto',
        zero_allow_untested_optimizer=True,
        zero_force_ds_cpu_optimizer=False,
        zero_optimization=dict(overlap_comm=True, stage=1)),
    exclude_frozen_parameters=True,
    gradient_accumulation_steps=4,
    gradient_clipping=1,
    sequence_parallel_size=1,
    train_micro_batch_size_per_gpu=4,
    type='xtuner.engine.DeepSpeedStrategy')
tokenizer = dict(
    pretrained_model_name_or_path='/root/models/InternVL2-2B',
    trust_remote_code=True,
    type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(max_epochs=4, type='xtuner.engine.runner.TrainLoop')
train_dataloader = dict(
    batch_size=4,
    collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
    dataset=dict(
        data_paths='/root/code/datasets/ex_cn.json',
        image_folders='/root/code/datasets/',
        max_length=6656,
        model_path='/root/models/InternVL2-2B',
        template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
        type='xtuner.dataset.InternVL_V1_5_Dataset'),
    num_workers=4,
    sampler=dict(
        length_property='modality_length',
        per_device_batch_size=16,
        type='xtuner.dataset.samplers.LengthGroupedSampler'))
visualizer = None
warmup_ratio = 0.03
weight_decay = 0.05
work_dir = '/root/code/work_dir/internvl_ft_run_8_filter'

10/04 18:09:21 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/04 18:09:21 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DatasetInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) DatasetInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) DatasetInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/04 18:09:21 - mmengine - INFO - Starting to loading data and calc length
10/04 18:09:21 - mmengine - INFO - =======Starting to process /root/code/datasets/ex_cn.json =======
10/04 18:09:21 - mmengine - INFO - =======total 2000 samples of /root/code/datasets/ex_cn.json=======
10/04 18:09:21 - mmengine - INFO - end loading data and calc length
10/04 18:09:21 - mmengine - INFO - =======total 2000 samples=======
10/04 18:09:21 - mmengine - INFO - LengthGroupedSampler is used.
10/04 18:09:21 - mmengine - INFO - LengthGroupedSampler construction is complete, and the selected attribute is modality_length
10/04 18:09:21 - mmengine - WARNING - Dataset InternVL_V1_5_Dataset has no metainfo. ``dataset_meta`` in visualizer will be None.
10/04 18:09:21 - mmengine - INFO - Start to load InternVL_V1_5 model.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash attention is not available, using eager attention instead.
10/04 18:09:32 - mmengine - INFO - InternVL_V1_5(
  (data_preprocessor): BaseDataPreprocessor()
  (model): InternVLChatModel(
    (vision_model): InternVisionModel(
      (embeddings): InternVisionEmbeddings(
        (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
      )
      (encoder): InternVisionEncoder(
        (layers): ModuleList(
          (0-23): 24 x InternVisionEncoderLayer(
            (attn): InternAttention(
              (qkv): Linear(in_features=1024, out_features=3072, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (mlp): InternMLP(
              (act): GELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (drop_path1): Identity()
            (drop_path2): Identity()
          )
        )
      )
    )
    (language_model): PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): InternLM2ForCausalLM(
          (model): InternLM2Model(
            (tok_embeddings): Embedding(92553, 2048, padding_idx=2)
            (layers): ModuleList(
              (0-23): 24 x InternLM2DecoderLayer(
                (attention): InternLM2Attention(
                  (wqkv): lora.Linear(
                    (base_layer): Linear4bit(in_features=2048, out_features=4096, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (wo): lora.Linear(
                    (base_layer): Linear4bit(in_features=2048, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
                )
                (feed_forward): InternLM2MLP(
                  (w1): lora.Linear(
                    (base_layer): Linear4bit(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (w3): lora.Linear(
                    (base_layer): Linear4bit(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (w2): lora.Linear(
                    (base_layer): Linear4bit(in_features=8192, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=8192, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (act_fn): SiLU()
                )
                (attention_norm): InternLM2RMSNorm()
                (ffn_norm): InternLM2RMSNorm()
              )
            )
            (norm): InternLM2RMSNorm()
          )
          (output): lora.Linear(
            (base_layer): Linear4bit(in_features=2048, out_features=92553, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=2048, out_features=128, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=128, out_features=92553, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
        )
      )
    )
    (mlp1): Sequential(
      (0): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (1): Linear(in_features=4096, out_features=2048, bias=True)
      (2): GELU(approximate='none')
      (3): Linear(in_features=2048, out_features=2048, bias=True)
    )
  )
)
10/04 18:09:32 - mmengine - INFO - InternVL_V1_5 construction is complete
[2024-10-04 18:09:32,633] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.15.1, git-hash=unknown, git-branch=unknown
[2024-10-04 18:09:32,633] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-10-04 18:09:32,637] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-10-04 18:09:32,736] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.237.215, master_port=29500
[2024-10-04 18:09:32,736] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-10-04 18:09:32,737] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 1
[2024-10-04 18:09:33,522] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-10-04 18:09:33,525] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-10-04 18:09:33,528] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-10-04 18:09:33,589] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-10-04 18:09:33,589] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2024-10-04 18:09:33,594] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2024-10-04 18:09:33,599] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500000000
[2024-10-04 18:09:33,604] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500000000
[2024-10-04 18:09:33,604] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False
[2024-10-04 18:09:33,608] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: False
[2024-10-04 18:09:34,096] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-10-04 18:09:34,097] [INFO] [utils.py:782:see_memory_usage] MA 2.95 GB         Max_MA 3.23 GB         CA 3.37 GB         Max_CA 3 GB 
[2024-10-04 18:09:34,101] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.4 GB, percent = 2.1%
[2024-10-04 18:09:34,218] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-10-04 18:09:34,218] [INFO] [utils.py:782:see_memory_usage] MA 2.95 GB         Max_MA 3.51 GB         CA 3.93 GB         Max_CA 4 GB 
[2024-10-04 18:09:34,224] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.4 GB, percent = 2.1%
[2024-10-04 18:09:34,229] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized
[2024-10-04 18:09:34,351] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-10-04 18:09:34,352] [INFO] [utils.py:782:see_memory_usage] MA 2.95 GB         Max_MA 2.95 GB         CA 3.93 GB         Max_CA 4 GB 
[2024-10-04 18:09:34,355] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.41 GB, percent = 2.1%
[2024-10-04 18:09:34,363] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-10-04 18:09:34,364] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = None
[2024-10-04 18:09:34,369] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-10-04 18:09:34,374] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05], mom=[(0.9, 0.999)]
[2024-10-04 18:09:34,382] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
[2024-10-04 18:09:34,382] [INFO] [config.py:1003:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2024-10-04 18:09:34,386] [INFO] [config.py:1003:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2024-10-04 18:09:34,390] [INFO] [config.py:1003:print]   amp_enabled .................. False
[2024-10-04 18:09:34,395] [INFO] [config.py:1003:print]   amp_params ................... False
[2024-10-04 18:09:34,399] [INFO] [config.py:1003:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2024-10-04 18:09:34,404] [INFO] [config.py:1003:print]   bfloat16_enabled ............. True
[2024-10-04 18:09:34,404] [INFO] [config.py:1003:print]   bfloat16_immediate_grad_update  False
[2024-10-04 18:09:34,404] [INFO] [config.py:1003:print]   checkpoint_parallel_write_pipeline  False
[2024-10-04 18:09:34,410] [INFO] [config.py:1003:print]   checkpoint_tag_validation_enabled  True
[2024-10-04 18:09:34,410] [INFO] [config.py:1003:print]   checkpoint_tag_validation_fail  False
[2024-10-04 18:09:34,417] [INFO] [config.py:1003:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f2e5c4e4e20>
[2024-10-04 18:09:34,422] [INFO] [config.py:1003:print]   communication_data_type ...... None
[2024-10-04 18:09:34,422] [INFO] [config.py:1003:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-10-04 18:09:34,426] [INFO] [config.py:1003:print]   curriculum_enabled_legacy .... False
[2024-10-04 18:09:34,430] [INFO] [config.py:1003:print]   curriculum_params_legacy ..... False
[2024-10-04 18:09:34,430] [INFO] [config.py:1003:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   data_efficiency_enabled ...... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   dataloader_drop_last ......... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   disable_allgather ............ False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   dump_state ................... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   dynamic_loss_scale_args ...... None
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_enabled ........... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_gas_boundary_resolution  1
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_layer_num ......... 0
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_max_iter .......... 100
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_stability ......... 1e-06
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_tol ............... 0.01
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   eigenvalue_verbose ........... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   elasticity_enabled ........... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   fp16_auto_cast ............... None
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   fp16_enabled ................. False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   fp16_master_weights_and_gradients  False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   global_rank .................. 0
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   grad_accum_dtype ............. None
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   gradient_accumulation_steps .. 4
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   gradient_clipping ............ 1
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   gradient_predivide_factor .... 1.0
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   graph_harvesting ............. False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   initial_dynamic_scale ........ 1
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   load_universal_checkpoint .... False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   loss_scale ................... 1.0
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   memory_breakdown ............. False
[2024-10-04 18:09:34,434] [INFO] [config.py:1003:print]   mics_hierarchial_params_gather  False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   mics_shard_size .............. -1
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   optimizer_legacy_fusion ...... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   optimizer_name ............... None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   optimizer_params ............. None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   pld_enabled .................. False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   pld_params ................... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   prescale_gradients ........... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   scheduler_name ............... None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   scheduler_params ............. None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   seq_parallel_communication_data_type  torch.float32
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   sparse_attention ............. None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   sparse_gradients_enabled ..... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   steps_per_print .............. 10000000000000
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   timers_config ................ enabled=True synchronized=True
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   train_batch_size ............. 16
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   train_micro_batch_size_per_gpu  4
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   use_data_before_expert_parallel_  False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   use_node_local_storage ....... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   wall_clock_breakdown ......... False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   weight_quantization_config ... None
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   world_size ................... 1
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   zero_allow_untested_optimizer  True
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   zero_enabled ................. True
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   zero_force_ds_cpu_optimizer .. False
[2024-10-04 18:09:34,435] [INFO] [config.py:1003:print]   zero_optimization_stage ...... 1
[2024-10-04 18:09:34,435] [INFO] [config.py:989:print_user_config]   json = {
    "gradient_accumulation_steps": 4, 
    "train_micro_batch_size_per_gpu": 4, 
    "gradient_clipping": 1, 
    "zero_allow_untested_optimizer": true, 
    "zero_force_ds_cpu_optimizer": false, 
    "zero_optimization": {
        "stage": 1, 
        "overlap_comm": true
    }, 
    "fp16": {
        "enabled": false, 
        "initial_scale_power": 16
    }, 
    "bf16": {
        "enabled": true
    }, 
    "steps_per_print": 1.000000e+13
}
10/04 18:09:34 - mmengine - INFO - Num train samples 2000
10/04 18:09:34 - mmengine - INFO - train example:
10/04 18:09:34 - mmengine - INFO -  <s><|im_start|> system
You are an AI assistant whose name is InternLM (书生·浦语).<|im_end|><|im_start|>user
 <img> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> 
<IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> 
<IMG_CONTEXT> <IMG_CONTEXT> <IMG_CONTEXT> ... (a long run of repeated <IMG_CONTEXT> placeholder tokens omitted here) ... <IMG_CONTEXT> </img>  
请你根据这张图片,讲一个脑洞大开的梗<|im_end|><|im_start|> assistant
果然!大家都会把鼻屎抹在课桌下面<|im_end|>
10/04 18:09:34 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
10/04 18:09:34 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
10/04 18:09:34 - mmengine - INFO - Checkpoints will be saved to /root/code/work_dir/internvl_ft_run_8_filter.
/root/.conda/envs/xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/root/.conda/envs/xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3417
10/04 18:10:59 - mmengine - INFO - Iter(train) [  10/2000]  lr: 3.0510e-06  eta: 4:41:52  time: 8.4989  data_time: 0.0145  memory: 23047  loss: 5.5083
10/04 18:12:15 - mmengine - INFO - Iter(train) [  20/2000]  lr: 6.4408e-06  eta: 4:24:32  time: 7.5336  data_time: 0.0205  memory: 22953  loss: 5.8013
10/04 18:13:29 - mmengine - INFO - Iter(train) [  30/2000]  lr: 9.8306e-06  eta: 4:16:27  time: 7.3999  data_time: 0.0187  memory: 22977  loss: 6.0257
10/04 18:14:33 - mmengine - INFO - Iter(train) [  40/2000]  lr: 1.3220e-05  eta: 4:04:18  time: 6.4835  data_time: 0.0170  memory: 22968  loss: 5.5061
10/04 18:15:42 - mmengine - INFO - Iter(train) [  50/2000]  lr: 1.6610e-05  eta: 3:58:58  time: 6.8486  data_time: 0.0157  memory: 22901  loss: 5.2521
10/04 18:16:55 - mmengine - INFO - Iter(train) [  60/2000]  lr: 2.0000e-05  eta: 3:57:39  time: 7.3361  data_time: 0.0182  memory: 22968  loss: 5.2148
10/04 18:18:09 - mmengine - INFO - Iter(train) [  70/2000]  lr: 1.9999e-05  eta: 3:56:24  time: 7.3446  data_time: 0.0183  memory: 22850  loss: 5.3141
10/04 18:19:22 - mmengine - INFO - Iter(train) [  80/2000]  lr: 1.9995e-05  eta: 3:55:17  time: 7.3780  data_time: 0.0202  memory: 22960  loss: 5.0322
10/04 18:20:33 - mmengine - INFO - Iter(train) [  90/2000]  lr: 1.9989e-05  eta: 3:52:56  time: 7.0356  data_time: 0.0168  memory: 22884  loss: 5.0520
10/04 18:21:50 - mmengine - INFO - Iter(train) [ 100/2000]  lr: 1.9980e-05  eta: 3:53:04  time: 7.7440  data_time: 0.0216  memory: 22948  loss: 5.2572
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3378
10/04 18:23:04 - mmengine - INFO - Iter(train) [ 110/2000]  lr: 1.9969e-05  eta: 3:51:58  time: 7.4063  data_time: 0.0189  memory: 22874  loss: 5.2845
10/04 18:24:21 - mmengine - INFO - Iter(train) [ 120/2000]  lr: 1.9954e-05  eta: 3:51:31  time: 7.6570  data_time: 0.0192  memory: 22907  loss: 5.0298
10/04 18:25:36 - mmengine - INFO - Iter(train) [ 130/2000]  lr: 1.9938e-05  eta: 3:50:30  time: 7.4846  data_time: 0.0205  memory: 23028  loss: 5.2737
10/04 18:26:52 - mmengine - INFO - Iter(train) [ 140/2000]  lr: 1.9918e-05  eta: 3:49:40  time: 7.5739  data_time: 0.0175  memory: 22949  loss: 5.1425
10/04 18:28:07 - mmengine - INFO - Iter(train) [ 150/2000]  lr: 1.9896e-05  eta: 3:48:40  time: 7.5251  data_time: 0.0203  memory: 22932  loss: 5.3333
10/04 18:29:21 - mmengine - INFO - Iter(train) [ 160/2000]  lr: 1.9872e-05  eta: 3:47:24  time: 7.4023  data_time: 0.0183  memory: 22978  loss: 5.1071
10/04 18:30:38 - mmengine - INFO - Iter(train) [ 170/2000]  lr: 1.9845e-05  eta: 3:46:39  time: 7.6819  data_time: 0.0207  memory: 22919  loss: 5.2182
10/04 18:31:52 - mmengine - INFO - Iter(train) [ 180/2000]  lr: 1.9815e-05  eta: 3:45:21  time: 7.3935  data_time: 0.0199  memory: 22968  loss: 4.7387
10/04 18:33:06 - mmengine - INFO - Iter(train) [ 190/2000]  lr: 1.9783e-05  eta: 3:44:08  time: 7.4441  data_time: 0.0211  memory: 22900  loss: 5.2470
10/04 18:34:20 - mmengine - INFO - Iter(train) [ 200/2000]  lr: 1.9748e-05  eta: 3:42:49  time: 7.3802  data_time: 0.0215  memory: 22917  loss: 5.1297
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378
10/04 18:35:36 - mmengine - INFO - Iter(train) [ 210/2000]  lr: 1.9710e-05  eta: 3:41:51  time: 7.6163  data_time: 0.0225  memory: 22947  loss: 4.8400
10/04 18:36:50 - mmengine - INFO - Iter(train) [ 220/2000]  lr: 1.9670e-05  eta: 3:40:32  time: 7.3749  data_time: 0.0185  memory: 22842  loss: 5.1875
10/04 18:38:06 - mmengine - INFO - Iter(train) [ 230/2000]  lr: 1.9628e-05  eta: 3:39:30  time: 7.6036  data_time: 0.0209  memory: 22945  loss: 5.3917
10/04 18:39:21 - mmengine - INFO - Iter(train) [ 240/2000]  lr: 1.9583e-05  eta: 3:38:24  time: 7.5462  data_time: 0.0286  memory: 22945  loss: 4.8723
10/04 18:40:36 - mmengine - INFO - Iter(train) [ 250/2000]  lr: 1.9535e-05  eta: 3:37:09  time: 7.4470  data_time: 0.0205  memory: 22945  loss: 5.2141
10/04 18:41:52 - mmengine - INFO - Iter(train) [ 260/2000]  lr: 1.9485e-05  eta: 3:36:09  time: 7.6598  data_time: 0.0197  memory: 23016  loss: 5.2090
10/04 18:43:10 - mmengine - INFO - Iter(train) [ 270/2000]  lr: 1.9433e-05  eta: 3:35:17  time: 7.7998  data_time: 0.0222  memory: 22981  loss: 5.1787
10/04 18:44:19 - mmengine - INFO - Iter(train) [ 280/2000]  lr: 1.9378e-05  eta: 3:33:24  time: 6.8397  data_time: 0.0179  memory: 22977  loss: 5.1427
10/04 18:45:34 - mmengine - INFO - Iter(train) [ 290/2000]  lr: 1.9320e-05  eta: 3:32:13  time: 7.5015  data_time: 0.0181  memory: 22916  loss: 5.0729
10/04 18:46:49 - mmengine - INFO - Iter(train) [ 300/2000]  lr: 1.9260e-05  eta: 3:31:05  time: 7.5629  data_time: 0.0187  memory: 22991  loss: 5.2407
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3382
10/04 18:48:06 - mmengine - INFO - Iter(train) [ 310/2000]  lr: 1.9198e-05  eta: 3:30:04  time: 7.6992  data_time: 0.0216  memory: 22926  loss: 5.1557
10/04 18:49:22 - mmengine - INFO - Iter(train) [ 320/2000]  lr: 1.9133e-05  eta: 3:28:55  time: 7.5621  data_time: 0.0205  memory: 22968  loss: 4.8546
10/04 18:50:38 - mmengine - INFO - Iter(train) [ 330/2000]  lr: 1.9066e-05  eta: 3:27:49  time: 7.6329  data_time: 0.0216  memory: 22953  loss: 4.8317
10/04 18:51:54 - mmengine - INFO - Iter(train) [ 340/2000]  lr: 1.8997e-05  eta: 3:26:41  time: 7.6143  data_time: 0.0200  memory: 22891  loss: 5.0185
10/04 18:53:11 - mmengine - INFO - Iter(train) [ 350/2000]  lr: 1.8925e-05  eta: 3:25:35  time: 7.6493  data_time: 0.0217  memory: 22945  loss: 5.2560
10/04 18:54:28 - mmengine - INFO - Iter(train) [ 360/2000]  lr: 1.8851e-05  eta: 3:24:29  time: 7.6684  data_time: 0.0201  memory: 22938  loss: 4.9929
10/04 18:55:37 - mmengine - INFO - Iter(train) [ 370/2000]  lr: 1.8774e-05  eta: 3:22:52  time: 6.9758  data_time: 0.0218  memory: 22926  loss: 5.3571
10/04 18:56:48 - mmengine - INFO - Iter(train) [ 380/2000]  lr: 1.8695e-05  eta: 3:21:20  time: 7.0693  data_time: 0.0192  memory: 23006  loss: 4.8131
10/04 18:58:09 - mmengine - INFO - Iter(train) [ 390/2000]  lr: 1.8614e-05  eta: 3:20:33  time: 8.1170  data_time: 0.0211  memory: 22927  loss: 5.0295
10/04 18:59:20 - mmengine - INFO - Iter(train) [ 400/2000]  lr: 1.8531e-05  eta: 3:19:03  time: 7.1018  data_time: 0.0198  memory: 22893  loss: 4.7538
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3386
10/04 19:00:37 - mmengine - INFO - Iter(train) [ 410/2000]  lr: 1.8445e-05  eta: 3:17:59  time: 7.7217  data_time: 0.0190  memory: 22905  loss: 5.1747
10/04 19:01:51 - mmengine - INFO - Iter(train) [ 420/2000]  lr: 1.8357e-05  eta: 3:16:41  time: 7.4033  data_time: 0.0204  memory: 22914  loss: 5.0257
10/04 19:03:05 - mmengine - INFO - Iter(train) [ 430/2000]  lr: 1.8267e-05  eta: 3:15:23  time: 7.3713  data_time: 0.0208  memory: 22931  loss: 5.0657
10/04 19:04:20 - mmengine - INFO - Iter(train) [ 440/2000]  lr: 1.8175e-05  eta: 3:14:10  time: 7.5162  data_time: 0.0216  memory: 22954  loss: 5.1610
10/04 19:05:43 - mmengine - INFO - Iter(train) [ 450/2000]  lr: 1.8081e-05  eta: 3:13:24  time: 8.3035  data_time: 0.0206  memory: 22958  loss: 4.7556
10/04 19:07:09 - mmengine - INFO - Iter(train) [ 460/2000]  lr: 1.7984e-05  eta: 3:12:45  time: 8.5487  data_time: 0.0193  memory: 22950  loss: 4.9845
10/04 19:08:37 - mmengine - INFO - Iter(train) [ 470/2000]  lr: 1.7886e-05  eta: 3:12:11  time: 8.7813  data_time: 0.0210  memory: 22943  loss: 5.1654
10/04 19:10:03 - mmengine - INFO - Iter(train) [ 480/2000]  lr: 1.7785e-05  eta: 3:11:30  time: 8.6312  data_time: 0.0211  memory: 22940  loss: 5.1257
10/04 19:11:29 - mmengine - INFO - Iter(train) [ 490/2000]  lr: 1.7682e-05  eta: 3:10:47  time: 8.5985  data_time: 0.0198  memory: 22944  loss: 4.7626
10/04 19:12:56 - mmengine - INFO - Exp name: internvl_v2_internlm2_2b_qlora_finetune_20241004_180921
10/04 19:12:56 - mmengine - INFO - Iter(train) [ 500/2000]  lr: 1.7578e-05  eta: 3:10:05  time: 8.7105  data_time: 0.0211  memory: 22932  loss: 5.2822
10/04 19:12:56 - mmengine - WARNING - Reach the end of the dataloader, it will be restarted and continue to iterate. It is recommended to use `mmengine.dataset.InfiniteSampler` to enable the dataloader to iterate infinitely.
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3417
10/04 19:14:29 - mmengine - INFO - Iter(train) [ 510/2000]  lr: 1.7471e-05  eta: 3:09:38  time: 9.2816  data_time: 0.2867  memory: 22970  loss: 4.5358
10/04 19:15:59 - mmengine - INFO - Iter(train) [ 520/2000]  lr: 1.7362e-05  eta: 3:09:00  time: 8.9838  data_time: 0.0194  memory: 22936  loss: 4.2632
10/04 19:17:26 - mmengine - INFO - Iter(train) [ 530/2000]  lr: 1.7252e-05  eta: 3:08:12  time: 8.7122  data_time: 0.0194  memory: 22977  loss: 4.6033
10/04 19:18:53 - mmengine - INFO - Iter(train) [ 540/2000]  lr: 1.7139e-05  eta: 3:07:22  time: 8.6703  data_time: 0.0182  memory: 22877  loss: 4.4570
10/04 19:20:17 - mmengine - INFO - Iter(train) [ 550/2000]  lr: 1.7025e-05  eta: 3:06:26  time: 8.4778  data_time: 0.0214  memory: 22928  loss: 4.3810
10/04 19:21:36 - mmengine - INFO - Iter(train) [ 560/2000]  lr: 1.6909e-05  eta: 3:05:13  time: 7.8703  data_time: 0.0210  memory: 22960  loss: 4.3151
10/04 19:23:02 - mmengine - INFO - Iter(train) [ 570/2000]  lr: 1.6791e-05  eta: 3:04:18  time: 8.6270  data_time: 0.0214  memory: 22954  loss: 4.1707
10/04 19:24:29 - mmengine - INFO - Iter(train) [ 580/2000]  lr: 1.6671e-05  eta: 3:03:24  time: 8.6738  data_time: 0.0218  memory: 22892  loss: 4.1207
10/04 19:25:52 - mmengine - INFO - Iter(train) [ 590/2000]  lr: 1.6550e-05  eta: 3:02:20  time: 8.3036  data_time: 0.0186  memory: 22949  loss: 4.2864
10/04 19:27:17 - mmengine - INFO - Iter(train) [ 600/2000]  lr: 1.6426e-05  eta: 3:01:19  time: 8.4933  data_time: 0.0194  memory: 22949  loss: 4.1973
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3378
10/04 19:28:40 - mmengine - INFO - Iter(train) [ 610/2000]  lr: 1.6302e-05  eta: 3:00:13  time: 8.2698  data_time: 0.0218  memory: 22950  loss: 4.2712
10/04 19:30:05 - mmengine - INFO - Iter(train) [ 620/2000]  lr: 1.6175e-05  eta: 2:59:12  time: 8.5497  data_time: 0.0206  memory: 22934  loss: 4.2687
10/04 19:31:25 - mmengine - INFO - Iter(train) [ 630/2000]  lr: 1.6047e-05  eta: 2:57:58  time: 7.9752  data_time: 0.0192  memory: 22959  loss: 4.0844
10/04 19:32:45 - mmengine - INFO - Iter(train) [ 640/2000]  lr: 1.5917e-05  eta: 2:56:44  time: 7.9558  data_time: 0.0196  memory: 23005  loss: 4.4429
10/04 19:34:03 - mmengine - INFO - Iter(train) [ 650/2000]  lr: 1.5786e-05  eta: 2:55:26  time: 7.8173  data_time: 0.0196  memory: 22977  loss: 4.4057
10/04 19:35:22 - mmengine - INFO - Iter(train) [ 660/2000]  lr: 1.5653e-05  eta: 2:54:11  time: 7.9448  data_time: 0.0180  memory: 22852  loss: 4.2475
10/04 19:36:45 - mmengine - INFO - Iter(train) [ 670/2000]  lr: 1.5519e-05  eta: 2:53:04  time: 8.3177  data_time: 0.0200  memory: 22976  loss: 4.3842
10/04 19:38:11 - mmengine - INFO - Iter(train) [ 680/2000]  lr: 1.5383e-05  eta: 2:52:00  time: 8.5530  data_time: 0.0214  memory: 22962  loss: 4.0834
10/04 19:39:37 - mmengine - INFO - Iter(train) [ 690/2000]  lr: 1.5246e-05  eta: 2:50:57  time: 8.6077  data_time: 0.0227  memory: 22894  loss: 4.2219
10/04 19:40:59 - mmengine - INFO - Iter(train) [ 700/2000]  lr: 1.5107e-05  eta: 2:49:45  time: 8.1722  data_time: 0.0197  memory: 22949  loss: 3.9734
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1844
10/04 19:42:18 - mmengine - INFO - Iter(train) [ 710/2000]  lr: 1.4967e-05  eta: 2:48:28  time: 7.9223  data_time: 0.0193  memory: 22950  loss: 3.9172
10/04 19:43:42 - mmengine - INFO - Iter(train) [ 720/2000]  lr: 1.4826e-05  eta: 2:47:20  time: 8.4123  data_time: 0.0206  memory: 22943  loss: 4.2512
10/04 19:45:06 - mmengine - INFO - Iter(train) [ 730/2000]  lr: 1.4684e-05  eta: 2:46:11  time: 8.4035  data_time: 0.0191  memory: 22939  loss: 4.2932
10/04 19:46:24 - mmengine - INFO - Iter(train) [ 740/2000]  lr: 1.4540e-05  eta: 2:44:52  time: 7.8164  data_time: 0.0181  memory: 22876  loss: 3.9030
10/04 19:47:48 - mmengine - INFO - Iter(train) [ 750/2000]  lr: 1.4395e-05  eta: 2:43:42  time: 8.3626  data_time: 0.0206  memory: 22930  loss: 3.9226
10/04 19:49:12 - mmengine - INFO - Iter(train) [ 760/2000]  lr: 1.4249e-05  eta: 2:42:33  time: 8.4211  data_time: 0.0184  memory: 22866  loss: 4.5162
10/04 19:50:37 - mmengine - INFO - Iter(train) [ 770/2000]  lr: 1.4102e-05  eta: 2:41:24  time: 8.4737  data_time: 0.0210  memory: 22924  loss: 4.4149
10/04 19:51:57 - mmengine - INFO - Iter(train) [ 780/2000]  lr: 1.3954e-05  eta: 2:40:08  time: 8.0413  data_time: 0.0181  memory: 22986  loss: 4.5628
10/04 19:53:21 - mmengine - INFO - Iter(train) [ 790/2000]  lr: 1.3804e-05  eta: 2:38:57  time: 8.3861  data_time: 0.0189  memory: 22878  loss: 4.0859
10/04 19:54:46 - mmengine - INFO - Iter(train) [ 800/2000]  lr: 1.3654e-05  eta: 2:37:47  time: 8.4690  data_time: 0.0193  memory: 22969  loss: 4.2838
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3381
10/04 19:56:08 - mmengine - INFO - Iter(train) [ 810/2000]  lr: 1.3503e-05  eta: 2:36:33  time: 8.2487  data_time: 0.0196  memory: 22931  loss: 4.1978
10/04 19:57:31 - mmengine - INFO - Iter(train) [ 820/2000]  lr: 1.3351e-05  eta: 2:35:19  time: 8.2596  data_time: 0.0189  memory: 22922  loss: 4.0237
10/04 19:58:55 - mmengine - INFO - Iter(train) [ 830/2000]  lr: 1.3198e-05  eta: 2:34:08  time: 8.4619  data_time: 0.0192  memory: 22828  loss: 4.1301
10/04 20:00:25 - mmengine - INFO - Iter(train) [ 840/2000]  lr: 1.3044e-05  eta: 2:33:04  time: 8.9480  data_time: 0.0189  memory: 22952  loss: 3.9222
10/04 20:01:52 - mmengine - INFO - Iter(train) [ 850/2000]  lr: 1.2889e-05  eta: 2:31:56  time: 8.7567  data_time: 0.0230  memory: 22941  loss: 4.0111
10/04 20:03:19 - mmengine - INFO - Iter(train) [ 860/2000]  lr: 1.2734e-05  eta: 2:30:47  time: 8.6898  data_time: 0.0205  memory: 22935  loss: 4.0319
10/04 20:04:50 - mmengine - INFO - Iter(train) [ 870/2000]  lr: 1.2578e-05  eta: 2:29:42  time: 9.0381  data_time: 0.0218  memory: 22944  loss: 4.2178
10/04 20:06:18 - mmengine - INFO - Iter(train) [ 880/2000]  lr: 1.2421e-05  eta: 2:28:34  time: 8.8554  data_time: 0.0205  memory: 23030  loss: 4.2132
10/04 20:07:44 - mmengine - INFO - Iter(train) [ 890/2000]  lr: 1.2264e-05  eta: 2:27:21  time: 8.5443  data_time: 0.0190  memory: 22925  loss: 4.3738
10/04 20:09:01 - mmengine - INFO - Iter(train) [ 900/2000]  lr: 1.2106e-05  eta: 2:25:58  time: 7.6838  data_time: 0.0189  memory: 22989  loss: 4.3468
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3384
10/04 20:10:28 - mmengine - INFO - Iter(train) [ 910/2000]  lr: 1.1947e-05  eta: 2:24:48  time: 8.7327  data_time: 0.0218  memory: 22985  loss: 4.2110
10/04 20:11:53 - mmengine - INFO - Iter(train) [ 920/2000]  lr: 1.1788e-05  eta: 2:23:34  time: 8.4784  data_time: 0.0210  memory: 22989  loss: 4.3073
10/04 20:13:15 - mmengine - INFO - Iter(train) [ 930/2000]  lr: 1.1628e-05  eta: 2:22:18  time: 8.2734  data_time: 0.0198  memory: 22879  loss: 3.7660
10/04 20:14:45 - mmengine - INFO - Iter(train) [ 940/2000]  lr: 1.1468e-05  eta: 2:21:09  time: 8.9320  data_time: 0.0206  memory: 22953  loss: 3.9863
10/04 20:16:13 - mmengine - INFO - Iter(train) [ 950/2000]  lr: 1.1308e-05  eta: 2:19:58  time: 8.8192  data_time: 0.0207  memory: 22932  loss: 4.0266
10/04 20:17:35 - mmengine - INFO - Iter(train) [ 960/2000]  lr: 1.1147e-05  eta: 2:18:40  time: 8.1754  data_time: 0.0189  memory: 22855  loss: 3.9319
10/04 20:19:02 - mmengine - INFO - Iter(train) [ 970/2000]  lr: 1.0986e-05  eta: 2:17:28  time: 8.7313  data_time: 0.0202  memory: 22944  loss: 3.8750
10/04 20:20:29 - mmengine - INFO - Iter(train) [ 980/2000]  lr: 1.0825e-05  eta: 2:16:15  time: 8.6999  data_time: 0.0210  memory: 22940  loss: 4.3885
10/04 20:21:55 - mmengine - INFO - Iter(train) [ 990/2000]  lr: 1.0663e-05  eta: 2:15:00  time: 8.5673  data_time: 0.0207  memory: 22946  loss: 4.0840
10/04 20:23:21 - mmengine - INFO - Exp name: internvl_v2_internlm2_2b_qlora_finetune_20241004_180921
10/04 20:23:21 - mmengine - INFO - Iter(train) [1000/2000]  lr: 1.0502e-05  eta: 2:13:47  time: 8.6746  data_time: 0.0203  memory: 22939  loss: 4.2185
10/04 20:23:21 - mmengine - INFO - Saving checkpoint at 1000 iterations
/root/.conda/envs/xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/root/.conda/envs/xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417
10/04 20:24:57 - mmengine - INFO - Iter(train) [1010/2000]  lr: 1.0340e-05  eta: 2:12:41  time: 9.5129  data_time: 0.4742  memory: 22967  loss: 3.9110
10/04 20:26:25 - mmengine - INFO - Iter(train) [1020/2000]  lr: 1.0178e-05  eta: 2:11:28  time: 8.8382  data_time: 0.0200  memory: 22904  loss: 3.6152
10/04 20:27:42 - mmengine - INFO - Iter(train) [1030/2000]  lr: 1.0016e-05  eta: 2:10:04  time: 7.6816  data_time: 0.0181  memory: 22924  loss: 3.6865
10/04 20:29:10 - mmengine - INFO - Iter(train) [1040/2000]  lr: 9.8543e-06  eta: 2:08:51  time: 8.8354  data_time: 0.0197  memory: 22951  loss: 3.5300
10/04 20:30:40 - mmengine - INFO - Iter(train) [1050/2000]  lr: 9.6924e-06  eta: 2:07:39  time: 8.9816  data_time: 0.0865  memory: 22979  loss: 3.2532
10/04 20:32:05 - mmengine - INFO - Iter(train) [1060/2000]  lr: 9.5306e-06  eta: 2:06:22  time: 8.5304  data_time: 0.0214  memory: 22905  loss: 3.2470
10/04 20:33:34 - mmengine - INFO - Iter(train) [1070/2000]  lr: 9.3689e-06  eta: 2:05:09  time: 8.8580  data_time: 0.0209  memory: 22959  loss: 3.3525
10/04 20:34:58 - mmengine - INFO - Iter(train) [1080/2000]  lr: 9.2073e-06  eta: 2:03:50  time: 8.3720  data_time: 0.0190  memory: 22950  loss: 3.4107
10/04 20:36:15 - mmengine - INFO - Iter(train) [1090/2000]  lr: 9.0460e-06  eta: 2:02:27  time: 7.7196  data_time: 0.0186  memory: 22821  loss: 2.9280
10/04 20:37:43 - mmengine - INFO - Iter(train) [1100/2000]  lr: 8.8850e-06  eta: 2:01:12  time: 8.8270  data_time: 0.0209  memory: 22904  loss: 3.5778
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3378
10/04 20:39:04 - mmengine - INFO - Iter(train) [1110/2000]  lr: 8.7242e-06  eta: 1:59:52  time: 8.1260  data_time: 0.0180  memory: 22881  loss: 3.5382
10/04 20:40:32 - mmengine - INFO - Iter(train) [1120/2000]  lr: 8.5637e-06  eta: 1:58:37  time: 8.8084  data_time: 0.0216  memory: 22950  loss: 3.1394
10/04 20:42:00 - mmengine - INFO - Iter(train) [1130/2000]  lr: 8.4037e-06  eta: 1:57:21  time: 8.7789  data_time: 0.0216  memory: 22934  loss: 3.7029
10/04 20:43:26 - mmengine - INFO - Iter(train) [1140/2000]  lr: 8.2440e-06  eta: 1:56:04  time: 8.6175  data_time: 0.0201  memory: 23018  loss: 3.7436
10/04 20:44:42 - mmengine - INFO - Iter(train) [1150/2000]  lr: 8.0848e-06  eta: 1:54:39  time: 7.5520  data_time: 0.0181  memory: 22978  loss: 3.7774
10/04 20:46:06 - mmengine - INFO - Iter(train) [1160/2000]  lr: 7.9262e-06  eta: 1:53:20  time: 8.4224  data_time: 0.0197  memory: 22973  loss: 3.8176
10/04 20:47:28 - mmengine - INFO - Iter(train) [1170/2000]  lr: 7.7680e-06  eta: 1:52:00  time: 8.2088  data_time: 0.0191  memory: 22985  loss: 3.3876
10/04 20:48:51 - mmengine - INFO - Iter(train) [1180/2000]  lr: 7.6105e-06  eta: 1:50:40  time: 8.2678  data_time: 0.0204  memory: 22963  loss: 3.2532
10/04 20:50:16 - mmengine - INFO - Iter(train) [1190/2000]  lr: 7.4535e-06  eta: 1:49:22  time: 8.5289  data_time: 0.0197  memory: 22963  loss: 3.2147
10/04 20:51:39 - mmengine - INFO - Iter(train) [1200/2000]  lr: 7.2973e-06  eta: 1:48:03  time: 8.3227  data_time: 0.0194  memory: 22889  loss: 2.9976
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3380
10/04 20:53:03 - mmengine - INFO - Iter(train) [1210/2000]  lr: 7.1417e-06  eta: 1:46:44  time: 8.3818  data_time: 0.0199  memory: 22960  loss: 3.1814
10/04 20:54:29 - mmengine - INFO - Iter(train) [1220/2000]  lr: 6.9869e-06  eta: 1:45:26  time: 8.6318  data_time: 0.0220  memory: 22944  loss: 3.1790
10/04 20:55:51 - mmengine - INFO - Iter(train) [1230/2000]  lr: 6.8329e-06  eta: 1:44:05  time: 8.1279  data_time: 0.0193  memory: 22911  loss: 3.1587
10/04 20:57:18 - mmengine - INFO - Iter(train) [1240/2000]  lr: 6.6797e-06  eta: 1:42:47  time: 8.6747  data_time: 0.0200  memory: 22835  loss: 3.0420
10/04 20:58:44 - mmengine - INFO - Iter(train) [1250/2000]  lr: 6.5274e-06  eta: 1:41:30  time: 8.6785  data_time: 0.0216  memory: 22930  loss: 3.6275
10/04 20:59:59 - mmengine - INFO - Iter(train) [1260/2000]  lr: 6.3760e-06  eta: 1:40:04  time: 7.4518  data_time: 0.0210  memory: 23004  loss: 4.0501
10/04 21:01:23 - mmengine - INFO - Iter(train) [1270/2000]  lr: 6.2256e-06  eta: 1:38:45  time: 8.4184  data_time: 0.0196  memory: 22871  loss: 3.5492
10/04 21:02:44 - mmengine - INFO - Iter(train) [1280/2000]  lr: 6.0761e-06  eta: 1:37:24  time: 8.1113  data_time: 0.0203  memory: 22924  loss: 3.5701
10/04 21:04:08 - mmengine - INFO - Iter(train) [1290/2000]  lr: 5.9277e-06  eta: 1:36:04  time: 8.3615  data_time: 0.0196  memory: 22848  loss: 3.7771
10/04 21:05:32 - mmengine - INFO - Iter(train) [1300/2000]  lr: 5.7803e-06  eta: 1:34:44  time: 8.4343  data_time: 0.0194  memory: 22914  loss: 3.4576
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
10/04 21:06:55 - mmengine - INFO - Iter(train) [1310/2000]  lr: 5.6341e-06  eta: 1:33:24  time: 8.2937  data_time: 0.0203  memory: 22949  loss: 3.3055
10/04 21:08:17 - mmengine - INFO - Iter(train) [1320/2000]  lr: 5.4890e-06  eta: 1:32:04  time: 8.2342  data_time: 0.0202  memory: 22959  loss: 3.4384
10/04 21:09:40 - mmengine - INFO - Iter(train) [1330/2000]  lr: 5.3450e-06  eta: 1:30:43  time: 8.2901  data_time: 0.0206  memory: 22960  loss: 3.3032
10/04 21:11:05 - mmengine - INFO - Iter(train) [1340/2000]  lr: 5.2023e-06  eta: 1:29:24  time: 8.4960  data_time: 0.0193  memory: 22954  loss: 3.1710
10/04 21:12:32 - mmengine - INFO - Iter(train) [1350/2000]  lr: 5.0609e-06  eta: 1:28:05  time: 8.7225  data_time: 0.0215  memory: 22944  loss: 3.0601
10/04 21:13:58 - mmengine - INFO - Iter(train) [1360/2000]  lr: 4.9207e-06  eta: 1:26:46  time: 8.5459  data_time: 0.0212  memory: 22944  loss: 3.1155
10/04 21:15:19 - mmengine - INFO - Iter(train) [1370/2000]  lr: 4.7819e-06  eta: 1:25:25  time: 8.1565  data_time: 0.0223  memory: 22945  loss: 3.5367
10/04 21:16:46 - mmengine - INFO - Iter(train) [1380/2000]  lr: 4.6445e-06  eta: 1:24:05  time: 8.6134  data_time: 0.0212  memory: 22998  loss: 3.7689
10/04 21:18:10 - mmengine - INFO - Iter(train) [1390/2000]  lr: 4.5084e-06  eta: 1:22:46  time: 8.4796  data_time: 0.0206  memory: 22953  loss: 3.8861
10/04 21:19:38 - mmengine - INFO - Iter(train) [1400/2000]  lr: 4.3738e-06  eta: 1:21:27  time: 8.7913  data_time: 0.0204  memory: 22942  loss: 3.6471
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
10/04 21:21:05 - mmengine - INFO - Iter(train) [1410/2000]  lr: 4.2407e-06  eta: 1:20:08  time: 8.6783  data_time: 0.0201  memory: 22977  loss: 3.6828
10/04 21:22:30 - mmengine - INFO - Iter(train) [1420/2000]  lr: 4.1091e-06  eta: 1:18:47  time: 8.4557  data_time: 0.0196  memory: 22904  loss: 3.5758
10/04 21:23:53 - mmengine - INFO - Iter(train) [1430/2000]  lr: 3.9790e-06  eta: 1:17:27  time: 8.3265  data_time: 0.0195  memory: 22933  loss: 3.3302
10/04 21:25:18 - mmengine - INFO - Iter(train) [1440/2000]  lr: 3.8505e-06  eta: 1:16:06  time: 8.4857  data_time: 0.0194  memory: 22933  loss: 3.3044
10/04 21:26:44 - mmengine - INFO - Iter(train) [1450/2000]  lr: 3.7236e-06  eta: 1:14:47  time: 8.6112  data_time: 0.0202  memory: 22964  loss: 3.2307
10/04 21:28:00 - mmengine - INFO - Iter(train) [1460/2000]  lr: 3.5984e-06  eta: 1:13:23  time: 7.5750  data_time: 0.0199  memory: 22886  loss: 3.2902
10/04 21:29:19 - mmengine - INFO - Iter(train) [1470/2000]  lr: 3.4748e-06  eta: 1:12:01  time: 7.9660  data_time: 0.0186  memory: 22890  loss: 3.0802
10/04 21:30:45 - mmengine - INFO - Iter(train) [1480/2000]  lr: 3.3529e-06  eta: 1:10:41  time: 8.6100  data_time: 0.0200  memory: 22940  loss: 3.3096
10/04 21:32:10 - mmengine - INFO - Iter(train) [1490/2000]  lr: 3.2328e-06  eta: 1:09:20  time: 8.5060  data_time: 0.0198  memory: 22945  loss: 3.1521
10/04 21:33:23 - mmengine - INFO - Iter(train) [1500/2000]  lr: 3.1145e-06  eta: 1:07:56  time: 7.2291  data_time: 0.0183  memory: 22944  loss: 3.1733
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417
10/04 21:34:52 - mmengine - INFO - Iter(train) [1510/2000]  lr: 2.9980e-06  eta: 1:06:37  time: 8.9726  data_time: 0.2966  memory: 22905  loss: 3.5842
10/04 21:36:19 - mmengine - INFO - Iter(train) [1520/2000]  lr: 2.8833e-06  eta: 1:05:17  time: 8.6213  data_time: 0.0208  memory: 23012  loss: 3.4258
10/04 21:37:42 - mmengine - INFO - Iter(train) [1530/2000]  lr: 2.7705e-06  eta: 1:03:56  time: 8.3738  data_time: 0.0207  memory: 22966  loss: 3.4098
10/04 21:39:06 - mmengine - INFO - Iter(train) [1540/2000]  lr: 2.6595e-06  eta: 1:02:35  time: 8.4013  data_time: 0.0198  memory: 22991  loss: 3.1356
10/04 21:40:30 - mmengine - INFO - Iter(train) [1550/2000]  lr: 2.5505e-06  eta: 1:01:14  time: 8.3521  data_time: 0.0174  memory: 22975  loss: 3.1623
10/04 21:41:56 - mmengine - INFO - Iter(train) [1560/2000]  lr: 2.4435e-06  eta: 0:59:53  time: 8.5880  data_time: 0.0213  memory: 22968  loss: 2.7483
10/04 21:43:21 - mmengine - INFO - Iter(train) [1570/2000]  lr: 2.3384e-06  eta: 0:58:33  time: 8.5371  data_time: 0.0207  memory: 22935  loss: 2.4332
10/04 21:44:42 - mmengine - INFO - Iter(train) [1580/2000]  lr: 2.2353e-06  eta: 0:57:11  time: 8.0390  data_time: 0.0193  memory: 22967  loss: 2.6731
10/04 21:46:05 - mmengine - INFO - Iter(train) [1590/2000]  lr: 2.1343e-06  eta: 0:55:49  time: 8.3256  data_time: 0.0191  memory: 22949  loss: 2.7426
10/04 21:47:32 - mmengine - INFO - Iter(train) [1600/2000]  lr: 2.0354e-06  eta: 0:54:29  time: 8.7261  data_time: 0.0213  memory: 22939  loss: 2.7215
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3376
10/04 21:48:55 - mmengine - INFO - Iter(train) [1610/2000]  lr: 1.9385e-06  eta: 0:53:07  time: 8.2561  data_time: 0.0199  memory: 22879  loss: 2.7919
10/04 21:50:18 - mmengine - INFO - Iter(train) [1620/2000]  lr: 1.8437e-06  eta: 0:51:46  time: 8.3675  data_time: 0.0191  memory: 22934  loss: 2.6617
10/04 21:51:43 - mmengine - INFO - Iter(train) [1630/2000]  lr: 1.7511e-06  eta: 0:50:25  time: 8.4747  data_time: 0.0201  memory: 23025  loss: 3.3477
10/04 21:53:09 - mmengine - INFO - Iter(train) [1640/2000]  lr: 1.6606e-06  eta: 0:49:04  time: 8.5781  data_time: 0.0193  memory: 22911  loss: 3.5590
10/04 21:54:38 - mmengine - INFO - Iter(train) [1650/2000]  lr: 1.5724e-06  eta: 0:47:44  time: 8.8654  data_time: 0.0198  memory: 22987  loss: 3.0301
10/04 21:56:02 - mmengine - INFO - Iter(train) [1660/2000]  lr: 1.4863e-06  eta: 0:46:23  time: 8.4864  data_time: 0.0192  memory: 22872  loss: 3.1503
10/04 21:57:26 - mmengine - INFO - Iter(train) [1670/2000]  lr: 1.4025e-06  eta: 0:45:01  time: 8.3341  data_time: 0.0176  memory: 22968  loss: 3.2793
10/04 21:58:50 - mmengine - INFO - Iter(train) [1680/2000]  lr: 1.3209e-06  eta: 0:43:40  time: 8.4439  data_time: 0.0194  memory: 22964  loss: 2.6808
10/04 22:00:10 - mmengine - INFO - Iter(train) [1690/2000]  lr: 1.2416e-06  eta: 0:42:17  time: 7.9845  data_time: 0.0179  memory: 22874  loss: 2.9352
10/04 22:01:35 - mmengine - INFO - Iter(train) [1700/2000]  lr: 1.1646e-06  eta: 0:40:56  time: 8.4718  data_time: 0.0245  memory: 22958  loss: 2.6870
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
10/04 22:02:54 - mmengine - INFO - Iter(train) [1710/2000]  lr: 1.0899e-06  eta: 0:39:34  time: 7.9074  data_time: 0.0238  memory: 22953  loss: 2.6267
10/04 22:04:18 - mmengine - INFO - Iter(train) [1720/2000]  lr: 1.0176e-06  eta: 0:38:12  time: 8.4329  data_time: 0.0215  memory: 22953  loss: 2.5500
10/04 22:05:46 - mmengine - INFO - Iter(train) [1730/2000]  lr: 9.4760e-07  eta: 0:36:51  time: 8.7581  data_time: 0.0216  memory: 22913  loss: 2.8524
10/04 22:07:11 - mmengine - INFO - Iter(train) [1740/2000]  lr: 8.7998e-07  eta: 0:35:30  time: 8.5283  data_time: 0.0213  memory: 22833  loss: 2.8008
10/04 22:08:37 - mmengine - INFO - Iter(train) [1750/2000]  lr: 8.1475e-07  eta: 0:34:09  time: 8.6292  data_time: 0.0208  memory: 23067  loss: 2.6952
10/04 22:10:00 - mmengine - INFO - Iter(train) [1760/2000]  lr: 7.5194e-07  eta: 0:32:47  time: 8.2276  data_time: 0.0229  memory: 22985  loss: 3.4256
10/04 22:11:21 - mmengine - INFO - Iter(train) [1770/2000]  lr: 6.9154e-07  eta: 0:31:25  time: 8.1453  data_time: 0.0215  memory: 22982  loss: 3.0947
10/04 22:12:40 - mmengine - INFO - Iter(train) [1780/2000]  lr: 6.3359e-07  eta: 0:30:02  time: 7.9328  data_time: 0.0182  memory: 22919  loss: 3.0292
10/04 22:14:06 - mmengine - INFO - Iter(train) [1790/2000]  lr: 5.7810e-07  eta: 0:28:41  time: 8.6139  data_time: 0.0205  memory: 22983  loss: 2.9566
10/04 22:15:31 - mmengine - INFO - Iter(train) [1800/2000]  lr: 5.2507e-07  eta: 0:27:19  time: 8.5016  data_time: 0.0213  memory: 22890  loss: 2.8613
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
10/04 22:16:49 - mmengine - INFO - Iter(train) [1810/2000]  lr: 4.7453e-07  eta: 0:25:57  time: 7.7396  data_time: 0.0195  memory: 22957  loss: 2.8540
10/04 22:18:13 - mmengine - INFO - Iter(train) [1820/2000]  lr: 4.2649e-07  eta: 0:24:35  time: 8.4487  data_time: 0.0623  memory: 22960  loss: 2.8646
10/04 22:19:40 - mmengine - INFO - Iter(train) [1830/2000]  lr: 3.8096e-07  eta: 0:23:13  time: 8.6708  data_time: 0.0204  memory: 22899  loss: 2.7100
10/04 22:21:04 - mmengine - INFO - Iter(train) [1840/2000]  lr: 3.3795e-07  eta: 0:21:52  time: 8.3722  data_time: 0.0195  memory: 22824  loss: 2.6113
10/04 22:22:32 - mmengine - INFO - Iter(train) [1850/2000]  lr: 2.9748e-07  eta: 0:20:30  time: 8.8344  data_time: 0.0228  memory: 22945  loss: 2.7548
10/04 22:23:58 - mmengine - INFO - Iter(train) [1860/2000]  lr: 2.5955e-07  eta: 0:19:08  time: 8.6054  data_time: 0.0218  memory: 22876  loss: 2.5025
10/04 22:25:22 - mmengine - INFO - Iter(train) [1870/2000]  lr: 2.2417e-07  eta: 0:17:46  time: 8.3883  data_time: 0.0207  memory: 22936  loss: 2.9235
10/04 22:26:48 - mmengine - INFO - Iter(train) [1880/2000]  lr: 1.9136e-07  eta: 0:16:25  time: 8.5478  data_time: 0.0200  memory: 22996  loss: 3.4898
10/04 22:28:14 - mmengine - INFO - Iter(train) [1890/2000]  lr: 1.6112e-07  eta: 0:15:03  time: 8.6544  data_time: 0.0213  memory: 22924  loss: 3.4387
10/04 22:29:32 - mmengine - INFO - Iter(train) [1900/2000]  lr: 1.3346e-07  eta: 0:13:40  time: 7.7458  data_time: 0.0180  memory: 22938  loss: 3.0704
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388
10/04 22:30:54 - mmengine - INFO - Iter(train) [1910/2000]  lr: 1.0838e-07  eta: 0:12:18  time: 8.2318  data_time: 0.0190  memory: 22917  loss: 3.1061
10/04 22:32:21 - mmengine - INFO - Iter(train) [1920/2000]  lr: 8.5904e-08  eta: 0:10:56  time: 8.7586  data_time: 0.0210  memory: 22929  loss: 2.7799
10/04 22:33:49 - mmengine - INFO - Iter(train) [1930/2000]  lr: 6.6024e-08  eta: 0:09:35  time: 8.7937  data_time: 0.0221  memory: 22964  loss: 2.9244
10/04 22:35:13 - mmengine - INFO - Iter(train) [1940/2000]  lr: 4.8750e-08  eta: 0:08:12  time: 8.3437  data_time: 0.0217  memory: 22960  loss: 2.8050
10/04 22:36:34 - mmengine - INFO - Iter(train) [1950/2000]  lr: 3.4085e-08  eta: 0:06:50  time: 8.1300  data_time: 0.0207  memory: 22891  loss: 2.6039
10/04 22:37:58 - mmengine - INFO - Iter(train) [1960/2000]  lr: 2.2033e-08  eta: 0:05:28  time: 8.4243  data_time: 0.0200  memory: 22885  loss: 2.8847
10/04 22:39:26 - mmengine - INFO - Iter(train) [1970/2000]  lr: 1.2598e-08  eta: 0:04:06  time: 8.7378  data_time: 0.0205  memory: 22942  loss: 2.5020
10/04 22:40:51 - mmengine - INFO - Iter(train) [1980/2000]  lr: 5.7818e-09  eta: 0:02:44  time: 8.4783  data_time: 0.0225  memory: 22939  loss: 2.7424
10/04 22:42:21 - mmengine - INFO - Iter(train) [1990/2000]  lr: 1.5865e-09  eta: 0:01:22  time: 9.0800  data_time: 0.0222  memory: 22944  loss: 2.7072
10/04 22:43:42 - mmengine - INFO - Exp name: internvl_v2_internlm2_2b_qlora_finetune_20241004_180921
10/04 22:43:42 - mmengine - INFO - Iter(train) [2000/2000]  lr: 1.3112e-11  eta: 0:00:00  time: 8.0960  data_time: 0.0185  memory: 22856  loss: 2.9651
10/04 22:43:42 - mmengine - INFO - Saving checkpoint at 2000 iterations
Warning: The cache directory for DeepSpeed Triton autotune, /root/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.

The loss is still oscillating toward the end, so convergence looks only so-so.
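
To check that impression rather than eyeball the raw log, one quick option is to pull the `loss:` values out of the mmengine log and plot them. The snippet below is only a minimal sketch; the file name `train.log` and the regex are my assumptions about the log format shown above, not part of the official tooling.

```python
import re
import matplotlib.pyplot as plt

# Minimal sketch: pull (iteration, loss) pairs out of mmengine log lines such as
# "10/04 18:23:04 - mmengine - INFO - Iter(train) [ 100/2000]  lr: ...  loss: 5.2572"
# "train.log" is an assumed file name -- point it at wherever the training output was saved.
pattern = re.compile(r"Iter\(train\)\s*\[\s*(\d+)/\d+\].*?loss:\s*([\d.]+)")

iters, losses = [], []
with open("train.log", encoding="utf-8") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            iters.append(int(match.group(1)))
            losses.append(float(match.group(2)))

plt.plot(iters, losses)
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.title("InternVL2-2B QLoRA fine-tune")
plt.savefig("loss_curve.png")
```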

step4: Merge weights & convert the model

This step merges the QLoRA fine-tuning result back into the original model.

Run this command:

python3 xtuner/configs/internvl/v1_5/convert_to_official.py xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py /root/code/work_dir/internvl_ft_run_8_filter/iter_2000.pth /root/share/new_models/OpenGVLab/InternVL2-2B >convert.log 2>&1
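
For intuition about what this merge does: a LoRA/QLoRA adapter stores a low-rank delta B·A (scaled by alpha/r) for each adapted weight, and merging simply folds that delta into the frozen base weight, so the exported model needs no adapter at inference time. The toy sketch below only illustrates that arithmetic with made-up shapes; it is not what `convert_to_official.py` actually executes.

```python
import numpy as np

# Toy illustration of LoRA weight merging (made-up shapes, not the real model):
#     W_merged = W_base + (alpha / r) * B @ A
# After merging, inference uses a single dense weight and no adapter is needed.
d_out, d_in, r, alpha = 8, 8, 2, 4

W_base = np.random.randn(d_out, d_in)      # frozen base weight
A = np.random.randn(r, d_in) * 0.01        # trained LoRA down-projection
B = np.random.randn(d_out, r) * 0.01       # trained LoRA up-projection

W_merged = W_base + (alpha / r) * B @ A
print(W_merged.shape)                       # (8, 8) -- same shape as the base weight
```

Roughly speaking, `convert_to_official.py` applies this idea to every LoRA-adapted layer in the `iter_2000.pth` checkpoint and then saves the result in the official InternVL2 layout, so it can be loaded like a regular InternVL2 model.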

Conversion log:

[2024-10-04 23:28:21,990] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The cache directory for DeepSpeed Triton autotune, /root/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
10/04 23:28:27 - mmengine - INFO - Start to load InternVL_V1_5 model.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Warning: Flash Attention is not available, use_flash_attn is set to False.
(the same "Flash Attention is not available" warning repeats many more times; duplicates omitted)
Warning: Flash attention is not available, using eager attention instead.
Load pretrained weight from /root/code/work_dir/internvl_ft_run_8_filter/iter_2000.pth
10/04 23:29:07 - mmengine - INFO - InternVL_V1_5(
  (data_preprocessor): BaseDataPreprocessor()
  (model): InternVLChatModel(
    (vision_model): InternVisionModel(
      (embeddings): InternVisionEmbeddings(
        (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
      )
      (encoder): InternVisionEncoder(
        (layers): ModuleList(
          (0-23): 24 x InternVisionEncoderLayer(
            (attn): InternAttention(
              (qkv): Linear(in_features=1024, out_features=3072, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (mlp): InternMLP(
              (act): GELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (drop_path1): Identity()
            (drop_path2): Identity()
          )
        )
      )
    )
    (language_model): PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): InternLM2ForCausalLM(
          (model): InternLM2Model(
            (tok_embeddings): Embedding(92553, 2048, padding_idx=2)
            (layers): ModuleList(
              (0-23): 24 x InternLM2DecoderLayer(
                (attention): InternLM2Attention(
                  (wqkv): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=4096, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (wo): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
                )
                (feed_forward): InternLM2MLP(
                  (w1): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (w3): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (w2): lora.Linear(
                    (base_layer): Linear(in_features=8192, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=8192, out_features=128, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=128, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (act_fn): SiLU()
                )
                (attention_norm): InternLM2RMSNorm()
                (ffn_norm): InternLM2RMSNorm()
              )
            )
            (norm): InternLM2RMSNorm()
          )
          (output): lora.Linear(
            (base_layer): Linear(in_features=2048, out_features=92553, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=2048, out_features=128, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=128, out_features=92553, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
        )
      )
    )
    (mlp1): Sequential(
      (0): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (1): Linear(in_features=4096, out_features=2048, bias=True)
      (2): GELU(approximate='none')
      (3): Linear(in_features=2048, out_features=2048, bias=True)
    )
  )
)
10/04 23:29:07 - mmengine - INFO - InternVL_V1_5 construction is complete
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
InternVL_V1_5(
  (data_preprocessor): BaseDataPreprocessor()
  (model): InternVLChatModel(
    (vision_model): InternVisionModel(
      (embeddings): InternVisionEmbeddings(
        (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
      )
      (encoder): InternVisionEncoder(
        (layers): ModuleList(
          (0-23): 24 x InternVisionEncoderLayer(
            (attn): InternAttention(
              (qkv): Linear(in_features=1024, out_features=3072, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (mlp): InternMLP(
              (act): GELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (drop_path1): Identity()
            (drop_path2): Identity()
          )
        )
      )
    )
    (language_model): InternLM2ForCausalLM(
      (model): InternLM2Model(
        (tok_embeddings): Embedding(92553, 2048, padding_idx=2)
        (layers): ModuleList(
          (0-23): 24 x InternLM2DecoderLayer(
            (attention): InternLM2Attention(
              (wqkv): Linear(in_features=2048, out_features=4096, bias=False)
              (wo): Linear(in_features=2048, out_features=2048, bias=False)
              (rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
            )
            (feed_forward): InternLM2MLP(
              (w1): Linear(in_features=2048, out_features=8192, bias=False)
              (w3): Linear(in_features=2048, out_features=8192, bias=False)
              (w2): Linear(in_features=8192, out_features=2048, bias=False)
              (act_fn): SiLU()
            )
            (attention_norm): InternLM2RMSNorm()
            (ffn_norm): InternLM2RMSNorm()
          )
        )
        (norm): InternLM2RMSNorm()
      )
      (output): Linear(in_features=2048, out_features=92553, bias=False)
    )
    (mlp1): Sequential(
      (0): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (1): Linear(in_features=4096, out_features=2048, bias=True)
      (2): GELU(approximate='none')
      (3): Linear(in_features=2048, out_features=2048, bias=True)
    )
  )
)

Verifying the fine-tuned model

The memes it comes up with are a bit lame, but still more fun than the un-fine-tuned model~

在这里插入图片描述

Some of the memes are pretty bad, LOL.
