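This post records the latest lip-reading methods from the past two years. It focuses mainly on Chinese and English word-level datasets such as LRW and LRW-1000, English sentence-level datasets such as LRS2, English phrase-level datasets such as OuluVS2, and a few other datasets.
Each entry is recorded in two ways: (1) a brief translation of the key content; (2) an introduction to related papers.
This record will be updated continuously. Researchers interested in lip reading are welcome to bookmark it or message me privately to discuss.
Below is the list of recorded papers; entries with links have already been written up (continuously updated):
Table of Contents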
- Word-Level
- 1.Time Shift Module
- 2.Multi-Scale Temporal Convolutional Networks
- 3.HP-ResNet18 and MS-TCN+Attention
- 4.Synchronous Bidirectional Learning for Multilingual Lip Reading
- 5.Visual Keyword Spotting with Attention (also in sentence-level)
- x.Effective Lip Reading
- x.LiRA: Learning Visual Speech Representations from Audio through Self-supervision
- x.Lip-reading with Densely Connected Temporal Convolutional Networks
- x.Seeing voices and hearing voices learning discriminative embeddings using cross-modal self-supervision
- x.Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
- x.Mutual Information Maximization for Effective Lip Reading
- x.Deformation Flow Based Two-Stream Network for Lip Reading
- x.Towards Practical Lipreading with Distilled and Efficient Models
- x.Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video
- Sentence-Level
- Phrase-Level
Word-Level
1.Time Shift Module
Source: ICASSP 2021. Author affiliation: Xinjiang University.
1.1. Detailed notes on the paper "How to Use Time Information Effectively Combining with Time Shift Module for Lipreading"
2.Multi-Scale Temporal Convolutional Networks
Source: ICASSP 2020. Author affiliation: Imperial College London.
2.1. Detailed environment setup for "Lipreading using Temporal Convolutional Networks"
2.2. Detailed code-level walkthrough of "Lipreading using Temporal Convolutional Networks" (MS-TCN)
3.HP-ResNet18 and MS-TCN+Attention
Source: INTERSPEECH 2021. Author affiliation: University of Science and Technology of China.
3.1. Detailed notes on the paper "Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention"
4.Synchronous Bidirectional Learning for Multilingual Lip Reading
5.Visual Keyword Spotting with Attention (also in sentence-level)
x.Effective Lip Reading
Source: ICASSP 2020. Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences.
x.LiRA: Learning Visual Speech Representations from Audio through Self-supervision
x.Lip-reading with Densely Connected Temporal Convolutional Networks
x.Seeing voices and hearing voices learning discriminative embeddings using cross-modal self-supervision
x.Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
x.Mutual Information Maximization for Effective Lip Reading
x.Deformation Flow Based Two-Stream Network for Lip Reading
x.Towards Practical Lipreading with Distilled and Efficient Models
x.Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video
Sentence-Level
1.End-to-end Audio-visual Speech Recognition with Conformers
Source: ICASSP 2021. Author affiliation: Imperial College London.