1. Paper Introduction
For a Chinese-language walkthrough of the paper, see:
https://blog.csdn.net/ggqyh/article/details/136098693
Code:
https://github.com/RetroCirce/HTS-Audio-Transformer
2. Questions on Event Localization
This section collects the author's answers to questions about audio event localization:
2.1 Event localization on AudioSet
https://github.com/RetroCirce/HTS-Audio-Transformer/issues/25
CUDA_VISIBLE_DEVICES=1,2,3,4 python main.py test  # make sure that fl_local=True in config.py
python fl_evaluate.py  # organize and gather the localization results
# open fl_evaluate_f1.ipynb and follow the notebook to produce the results
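For reference, the fl_local switch named in the comment above lives in config.py; the two-line excerpt below is only an assumption about its layout, with the field name taken from the steps above:

# config.py (excerpt; surrounding contents assumed)
fl_local = True  # enable frame-level (localization) output during testing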
Yes, that function is a temporary one. As you may know, AudioSet released a small subset with strong localization labels last year, so I preprocessed that data on my company's server for later use, but I can no longer access it.
I think doing localization on AudioSet is different from DESED; there are two differences for which I would suggest writing your own processing code:
First, if you want to train a new HTS-AT model on localization data (HTS-AT can support this, but I did not write that part), you need to extract a different output of HTS-AT (I believe it is the second-to-last layer's feature-map output) and add a loss function to train on it. This could actually become a new piece of work. One thing to keep in mind is that the interpolated time resolution of the output may differ from the time resolution of the localization labels, so you need to find a way to align them; a sketch of one such alignment follows below.
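Here is a minimal PyTorch sketch of the alignment-plus-loss step described above; it is not code from the repository. framewise_logits and strong_labels are hypothetical tensors, and the choice of linear interpolation plus frame-wise BCE is an assumption, not the author's method:

import torch
import torch.nn.functional as F

def framewise_strong_label_loss(framewise_logits: torch.Tensor,
                                strong_labels: torch.Tensor) -> torch.Tensor:
    # framewise_logits: (B, T_model, C) frame-level logits taken from an
    # intermediate HTS-AT feature map (hypothetical, not a repo API).
    # strong_labels:    (B, T_label, C) 0/1 frame roll of the strong labels.
    # The model's time axis rarely matches the label resolution, so
    # interpolate the logits to T_label before the frame-wise BCE loss.
    logits = framewise_logits.transpose(1, 2)  # (B, C, T_model)
    logits = F.interpolate(logits, size=strong_labels.shape[1],
                           mode="linear", align_corners=False)
    logits = logits.transpose(1, 2)  # (B, T_label, C)
    return F.binary_cross_entropy_with_logits(logits, strong_labels.float())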
Second, if you want to evaluate the model on a localization dataset, fl_evaluate.py can serve as a code base, but you need to
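As a complement, here is a hedged sketch of one post-processing step such an evaluation typically needs: thresholding per-frame probabilities into (onset, offset) event segments. frame_probs, hop_seconds, and threshold are hypothetical names, not the actual interface of fl_evaluate.py:

import numpy as np

def probs_to_events(frame_probs, hop_seconds, threshold=0.5):
    # frame_probs: (T,) per-frame probability for one event class.
    # Returns a list of (onset_sec, offset_sec) segments.
    active = frame_probs >= threshold
    # locate rising (0 -> 1) and falling (1 -> 0) edges of the active mask
    edges = np.diff(active.astype(int), prepend=0, append=0)
    onsets = np.where(edges == 1)[0]
    offsets = np.where(edges == -1)[0]
    return [(int(on) * hop_seconds, int(off) * hop_seconds)
            for on, off in zip(onsets, offsets)]

For example, probs_to_events(np.array([0.0, 0.9, 0.8, 0.1]), hop_seconds=0.02) returns [(0.02, 0.06)], one active segment spanning frames 1 and 2.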