train_transformer.py error: Assertion `srcIndex < srcSelectDimSize` failed.

Project: https://github.com/SMART-TTS/SMART-Single_Emotional_TTS
Audio dataset: LJSpeech-1.1
The metadata lines look like this:

LJ_NOR_10001.wav|the chronicles of newgate, volume two. by arthur griffiths. section eight: the beginnings of prison reform.
LJ_NOR_10002.wav|newgate prisoners were the victims to another most objectionable practice which obtained all over london.
LJ_NOR_10003.wav|persons committed to a metropolitan jail at that time were taken in gangs, men and women handcuffed together, or linked on to a long chain,
LJ_NOR_10004.wav|unless they could afford to pay for a vehicle out of their own funds.

Error output (Assertion `srcIndex < srcSelectDimSize` failed):

(emo_tts3) D:\workspace_tts\emotion-fs-3>python train_transformer.py
Trainable Parameters: 15.927M
C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\modules\loss.py:94: UserWarning: Using a target size (torch.Size([8])) that is different to the input size (torch.Size([8, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.l1_loss(input, target, reduction=self.reduction)
| Epoch: 0, 0/330th loss : 0.9549 + 1.1547 + 0.0280 + 4.0505 = 1.2376
Validation| loss : 1.0039 + 1.2023 + 0.0212 + 3.8226 = 6.0500
| Epoch: 0, 1/330th loss : 0.9634 + 1.1604 + 0.0290 + 3.7260 = 1.1758
| Epoch: 0, 2/330th loss : 0.9589 + 1.1530 + 0.0286 + 3.8567 = 1.1994
| Epoch: 0, 3/330th loss : 0.9564 + 1.1508 + 0.0285 + 3.7279 = 1.1727
.
.
.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: block: [38,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: block: [38,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
.
.
.
Traceback (most recent call last):
  File "train_transformer.py", line 261, in <module>
    main()
  File "train_transformer.py", line 223, in main
    mel_pred, postnet_pred, attn_probs, decoder_outputs, attns_enc, attns_dec, attns_style, post_linear, duration_predictor_output, duration, weights = m.forward(character, mel_input, pos_text, pos_mel, mel, pos_mel, mel_max_length_array=mel_max_length_array)
  File "C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\workspace_tts\emotion-fs-3\network.py", line 288, in forward
    memory, c_mask, attns_enc, duration_mask = self.encoder(characters, pos=pos_text)
  File "C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\workspace_tts\emotion-fs-3\network.py", line 106, in forward
    x, attn = layer(x, x, mask=mask, query_mask=c_mask)
  File "C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\workspace_tts\emotion-fs-3\module.py", line 289, in forward
    result, attns = self.multihead(key, value, query, mask=mask, query_mask=query_mask, kv_mask=kv_mask)
  File "C:\Users\fangg\Anaconda3\envs\emo_tts3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\workspace_tts\emotion-fs-3\module.py", line 212, in forward
    attn = t.bmm(query, key.transpose(1, 2))    #batch matrix-matrix product
RuntimeError: CUDA error: device-side assert triggered

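Problem description: after running python train_transformer.py in this project, the run sometimes fails with the error above and sometimes just hangs and then exits without printing anything. The underlying problem is the same in both cases: Assertion `srcIndex < srcSelectDimSize` failed.

I first tried changing the main parameters in hyperparams.py, following a number of articles I found online. None of that had any effect.

What finally gave me the idea was an article titled "no cuda capable device", whose author said the cause was a wrong index in the vocabulary. I have no idea what his vocabulary looked like, but it reminded me that my metadata_train.csv contains a lot of punctuation. Since that punctuation is of little use during training, it seemed a likely culprit. So I removed all the punctuation and started over: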
python prepare_data.py
python train_transformer.py
……
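To my surprise, it worked. I had spent half a day on this problem and was ready to open an issue on the original project if this attempt failed too, so it is worth writing down.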
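
A likely explanation, though I did not confirm it in the project's code: this assertion is raised by the CUDA indexing kernel when an index used in a lookup (for example an embedding lookup) is not smaller than the size of the table. If punctuation characters were mapped to ids outside the embedding table, removing them would make the error go away. The sketch below shows how one could check the text before training. The symbol table and all function names here are hypothetical and only illustrate the idea; the real project defines its own table.

```python
import string

# Hypothetical symbol table: a padding entry, lowercase letters, and space.
# This is an assumption for illustration, not the project's actual table.
SYMBOLS = ["<pad>"] + list(string.ascii_lowercase) + [" "]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def unknown_chars(text, symbol_to_id=SYMBOL_TO_ID):
    """Return the sorted list of distinct characters that have no id in the table."""
    return sorted({ch for ch in text.lower() if ch not in symbol_to_id})


def strip_unknown(text, symbol_to_id=SYMBOL_TO_ID):
    """Drop every character the table cannot encode and collapse repeated spaces."""
    kept = "".join(ch for ch in text.lower() if ch in symbol_to_id)
    return " ".join(kept.split())


def check_ids(ids, num_embeddings):
    """Fail with a readable Python error instead of a device-side assert."""
    bad = [i for i in ids if not 0 <= i < num_embeddings]
    if bad:
        raise ValueError(
            f"ids out of range for a table of size {num_embeddings}: {bad}"
        )


# One metadata line in the format shown at the top of this post.
line = "LJ_NOR_10004.wav|unless they could afford to pay for a vehicle out of their own funds."
name, text = line.split("|", 1)

print("characters missing from the table:", unknown_chars(text))
clean = strip_unknown(text)
ids = [SYMBOL_TO_ID[ch] for ch in clean]
check_ids(ids, len(SYMBOLS))  # passes once the text is cleaned
print(name, "->", clean)
```

Running a check like this over every line of metadata_train.csv before python prepare_data.py would show which characters the table cannot encode. Running one training step on the CPU instead of the GPU is another way to get a normal Python IndexError that points at the failing lookup.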
