Paper Reading: HTS-AT — A HIERARCHICAL TOKEN-SEMANTIC AUDIO TRANSFORMER FOR SOUND CLASSIFICATION AND DETECTION

1. Paper Introduction

For a Chinese walkthrough of the paper, see:
https://blog.csdn.net/ggqyh/article/details/136098693

Code:
https://github.com/RetroCirce/HTS-Audio-Transformer

2. Questions About Event Localization

This section collects the author's answers to questions about audio event localization:

2.1 Event localization on AudioSet

https://github.com/RetroCirce/HTS-Audio-Transformer/issues/25

CUDA_VISIBLE_DEVICES=1,2,3,4 python main.py test
# make sure that fl_local=True in config.py
python fl_evaluate.py
# organize and gather the localization results
fl_evaluate_f1.ipynb
# follow the notebook to produce the results
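The step "make sure that fl_local=True in config.py" can be sketched as a config fragment. Only the fl_local flag name is taken from the commands above; the second field and its path are hypothetical illustrations, not the repo's actual file:

```python
# Fragment of config.py (sketch): only `fl_local` comes from the author's
# instructions above; `fl_dataset_path` is a hypothetical placeholder.
fl_local = True          # export framewise token-semantic maps for localization
fl_dataset_path = "..."  # hypothetical: path to the strongly-labeled eval set
```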

Yes, this function is a temporary one. As you may know, AudioSet released a small subset with strong localization labels last year. I processed that data on my company's server for later use, but I can no longer access it.

I think doing localization on AudioSet is different from DESED. There are two differences, and I would suggest you write your own code to handle them:

If you want to train a new HTS-AT model on localization data (HTS-AT can support it, but I did not write that code), you need to extract a different output of HTS-AT (I believe it is the second-to-last layer's feature-map output) and add a loss function to train on it. This might actually become a new piece of work. One thing to keep in mind is that the interpolation and resolution of the output may differ from the time resolution of the localization labels, so you need to find a way to align them.
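The alignment problem the author describes — strong labels given in seconds versus a framewise model output on a fixed grid — can be sketched as follows. This is a minimal numpy illustration, not the repo's code: `labels_to_frames`, `bce_loss`, and the frame/class counts are assumptions for the example.

```python
import numpy as np

def labels_to_frames(events, clip_seconds, n_frames, n_classes):
    """Rasterize strong (onset_sec, offset_sec, class) labels onto the
    model's framewise output grid. The actual HTS-AT output resolution
    depends on the spectrogram and patch-merge settings."""
    target = np.zeros((n_frames, n_classes), dtype=np.float32)
    fps = n_frames / clip_seconds  # frames per second on the output grid
    for onset, offset, cls in events:
        start = int(np.floor(onset * fps))
        end = int(np.ceil(offset * fps))
        target[start:end, cls] = 1.0
    return target

def bce_loss(pred, target, eps=1e-7):
    """Frame-level binary cross-entropy between predicted presence
    probabilities and the rasterized targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred)
                           + (1.0 - target) * np.log(1.0 - pred))))

# One 10-second clip, 256 output frames, 3 classes:
# class 0 active from 1.0 s to 2.5 s, class 2 active from 4.0 s to 9.0 s.
events = [(1.0, 2.5, 0), (4.0, 9.0, 2)]
target = labels_to_frames(events, clip_seconds=10.0, n_frames=256, n_classes=3)
```

Once the labels sit on the same grid as the feature-map output, the loss can be computed frame by frame; any remaining resolution mismatch can be handled by interpolating the model output to the label grid (or vice versa) before the loss.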

If you want to evaluate the model on a localization dataset, fl_evaluate.py can serve as a code base, but you need to
