arXiv Daily Picks 3.19: Speech/Audio Daily Paper Digest

Also published on the WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest)

【1】 Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method
Authors: Yuan Gong, Christian Poellabauer
Link: https://arxiv.org/abs/2003.08225

【2】 Multi-Source DOA Estimation through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield
Authors: A. Fahim, T. D. Abhayapala
Link: https://arxiv.org/abs/2003.08050

【3】 Deliberation Model Based Two-Pass End-to-End Speech Recognition
Authors: Ke Hu, Rohit Prabhavalkar
Link: https://arxiv.org/abs/2003.0796

Additions for 3.18

【1】 High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features
Authors: Pierre-Amaury Grumiaux, Alexandre Guérin
Link: https://arxiv.org/abs/2003.07839

【2】 Multi-modal Dense Video Captioning
Authors: Vladimir Iashin, Esa Rahtu
Link: https://arxiv.org/abs/2003.07758

【3】 Hybrid Autoregressive Transducer (hat)
Authors: Ehsan Variani, Michael Riley
Link: https://arxiv.org/abs/2003.07705

【4】 Audio inpainting with generative adversarial network
Authors: P. P. Ebner, A. Eltelt
Link: https://arxiv.org/abs/2003.07704

【5】 ASR Error Correction and Domain Adaptation Using Machine Translation
Authors: Anirudh Mani, Florian Metze
Comments: Accepted for Oral Presentation at ICASSP 2020
Link: https://arxiv.org/abs/2003.07692

【6】 End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification
Authors: Esther Rituerto-González, Carmen Peláez-Moreno
Link: https://arxiv.org/abs/2003.07688

【7】 Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method
Authors: Cunhang Fan, Xuefei Liu
Link: https://arxiv.org/abs/2003.07544

【8】 High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Authors: Jinyu Li, Yifan Gong
Comments: Accepted by ICASSP 2020
Link: https://arxiv.org/abs/2003.07482

【9】 TensorFlow Audio Models in Essentia
Authors: Pablo Alonso-Jiménez, Xavier Serra
Link: https://arxiv.org/abs/2003.0739

Additions for 3.17

【1】 Multi-modal Multi-channel Target Speech Separation
Authors: Rongzhi Gu, Dong Yu
Link: https://arxiv.org/abs/2003.07032

【2】 TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Authors: Zhiheng Huang, Bing Xiang
Link: https://arxiv.org/abs/2003.07000

【3】 Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models
Authors: Natalia Tomashenko, Yannick Esteve
Comments: 36 pages; originally submitted to CSL in February 2017
Link: https://arxiv.org/abs/2003.06894

【4】 A proto-object based audiovisual saliency map
Authors: Sudarshan Ramenahalli
Link: https://arxiv.org/abs/2003.06779

【5】 Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues
Authors: Trisha Mittal, Dinesh Manocha
Link: https://arxiv.org/abs/2003.06711

【6】 Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0
Authors: Zack Hodari, Simon King
Comments: Published to the 10th ISCA International Conference on Speech Prosody (SP2020)
Link: https://arxiv.org/abs/2003.06686

【7】 Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events
Authors: Davide Berghi, Philip J.B. Jackson
Link: https://arxiv.org/abs/2003.0665

Additions for 3.16

【1】 Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Authors: Ting-Yao Hu, Chandra Dhir
Comments: Accepted at ICASSP 2020 (for presentation in a lecture session)
Link: https://arxiv.org/abs/2003.06227

【2】 Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a Style
Authors: Jeff Ens, Philippe Pasquier
Link: https://arxiv.org/abs/2003.06226

【3】 A Wide Dataset of Ear Shapes and Pinna-Related Transfer Functions Generated by Random Ear Drawings
Authors: Corentin Guezenoc (IETR), Renaud Seguier (IETR)
Link: https://arxiv.org/abs/2003.0618

【4】 Speaker Identification using EEG
Authors: Gautam Krishna, Ahmed Tewfik
Link: https://arxiv.org/abs/2003.04733

【5】 Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning
Authors: Amirgaliyev E.N., Baimuratov O
Link: https://arxiv.org/abs/2003.0471
