Hitting "CUDA error: device-side assert triggered" while fine-tuning a pre-trained model

The first time I tried fine-tuning a pre-trained model (GPT2LMHeadModel) with torch, it failed with:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

My code:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim
import numpy as np

# Example input and target waveforms
input_waveforms = np.random.randn(1000, 100)  # Replace with your input waveforms
target_waveforms = np.random.randn(1000, 100)  # Replace with your target waveforms

# Convert input and target waveforms to tensors
input_waveforms_tensor = torch.tensor(input_waveforms, dtype=torch.float32)
target_waveforms_tensor = torch.tensor(target_waveforms, dtype=torch.float32)

# Define the batch size
batch_size = 8

# Create a TensorDataset
dataset = TensorDataset(input_waveforms_tensor, target_waveforms_tensor)

# Create a DataLoader for batch processing
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Load pre-trained GPT model and tokenizer
model_name = 'gpt2-medium'  # Choose the desired GPT model
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Set the device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Define loss function and optimizer
criterion = torch.nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Fine-tuning loop
num_epochs = 10

for epoch in range(num_epochs):
    running_loss = 0.0

    for inputs, targets in dataloader:
        inputs = inputs.to(device)
        targets = targets.to(device)

        # Forward pass
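        # NOTE: GPT2LMHeadModel expects integer token ids (torch.long) as
        # input_ids; feeding float32 waveforms here is what ultimately blows
        # up on the GPU (see the debugging steps below)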
        outputs = model(inputs)[0]

        # Compute loss
        loss = criterion(outputs, targets)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    epoch_loss = running_loss / len(dataloader)
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss:.4f}')

# Evaluation
test_input_waveforms = np.random.randn(100, 100)  # Replace with your test input waveforms
test_input_waveforms_tensor = torch.tensor(test_input_waveforms, dtype=torch.float32).to(device)

# Convert test input waveforms tensor to the appropriate scalar type
test_input_waveforms_tensor = test_input_waveforms_tensor.to(torch.long)
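# NOTE: casting N(0,1) floats to long gives ids like -2, -1, 0, 1; the negative
# ones are not valid GPT-2 token ids, which is the 'index out of range in self'
# uncovered below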

generated_waveforms = model.generate(test_input_waveforms_tensor, max_length=100)

# Convert generated waveforms from tensors to numpy arrays
generated_waveforms = generated_waveforms.cpu().numpy()

# Print the generated waveforms
print(generated_waveforms)

There are quite a few fixes suggested online, so I tried them one by one.

1. Add the snippet below; it is supposed to make the full error message print.

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

I added it, but nothing more detailed was printed.
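One likely reason it had no effect: CUDA_LAUNCH_BLOCKING is only read when the CUDA context is initialized, so it has to be set before the first CUDA call; safest is before importing torch at all. A minimal sketch of the ordering (assuming a single-file script):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before torch touches the GPU

import torch  # import torch only after the variable is set

Setting it from the shell also works, e.g. running the script as CUDA_LAUNCH_BLOCKING=1 python train.py (script name here is just an example).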

2. Change the device to device='cpu' and run on the CPU. There the real cause of the error is reported:

IndexError: index out of range in self
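Running on the CPU turns the asynchronous GPU assert into an ordinary Python exception with a usable stack trace. A minimal sketch of the change, assuming the same variable names as above:

# Debugging: force the model and data onto the CPU
device = torch.device('cpu')
model.to(device)
# rerun the training loop; the failing op now raises a readable IndexError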

Digging a bit further, it turned out the input data fell outside the tokenizer's vocabulary range, so the thing to do is go back and inspect the encoded ids, and add the missing tokens to the vocabulary if necessary.
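For reference, a hedged sketch of both checks: verifying that every id fits inside the embedding table, and growing the vocabulary when new tokens really are needed. add_tokens and resize_token_embeddings are standard transformers calls; the input_ids variable stands in for whatever the tokenizer produced:

vocab_size = model.config.vocab_size  # 50257 for gpt2-medium
assert input_ids.dtype == torch.long, "GPT-2 expects integer token ids"
assert 0 <= input_ids.min() and input_ids.max() < vocab_size, \
    "an id outside the embedding table gives 'index out of range in self'"

# If genuinely new tokens are needed, extend the vocab and the embedding matrix
num_added = tokenizer.add_tokens(["<my_new_token>"])  # hypothetical token, for illustration
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))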

To sum up: a device-side assert triggered means an out-of-bounds index somewhere. The line the CUDA stack trace points at is not necessarily where the real bug lives [1][2], and the overflow can come from an index or dimension exceeding its valid range [3]. Check every indexing operation, make sure the values stay within bounds, and use the two steps above (CUDA_LAUNCH_BLOCKING=1 and a CPU run) to pin down the exact location.

References:
[1][2] RuntimeError: CUDA error: device-side assert triggered Pytorch框架代码运行错误解决方案: https://blog.csdn.net/weixin_42112050/article/details/120455407
[3] 「Bug」问题分析 RuntimeError: CUDA error: device-side assert triggered: https://blog.csdn.net/ViatorSun/article/details/125207465