The Most Detailed Record of the Latest Research Progress in Lip Reading

Latest recommended article published on 2024-10-31 11:12:49

Author: 想到好名再改 (pinned post)

Column: The Most Detailed. Tags: knowledge graph, artificial intelligence, lip reading, latest progress, paper notes.

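Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/qq_44697805/article/details/122130360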

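This post records the latest lip-reading methods from the past two years. It focuses on Chinese and English word-level datasets such as LRW and LRW-1000, English sentence-level datasets such as LRS2, English phrase-level datasets such as OuluVS2, and a few other datasets.
Each paper is recorded in two ways: 1. a brief translation of its key content; 2. pointers to related articles.
This record will be updated continuously. Researchers interested in lip reading are welcome to bookmark it or get in touch by private message.

Below is the list of recorded papers. Entries with links have already been written up (continuously updated):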

Table of Contents

Word-Level
Sentence-Level
Phrase-Level
- 1.Multi-Perspective LSTM for Joint Visual Representation Learning

Word-Level

1.Time Shift Module

Source: ICASSP 2021. Author affiliation: Xinjiang University.
1.1. The Most Detailed Notes on "How to Use Time Information Effectively Combining with Time Shift Module for Lipreading"

2.Multi-Scale Temporal Convolutional Networks

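Source: ICASSP 2020. Author affiliation: Imperial College London.
2.1. The Most Detailed Environment Setup for "Lipreading using Temporal Convolutional Networks"
2.2. The Most Detailed Code-Level Walkthrough of "Lipreading using Temporal Convolutional Networks" (MS-TCN)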

3.HP-ResNet18 and MS-TCN+Attention

Source: INTERSPEECH 2021. Author affiliation: University of Science and Technology of China.
3.1. The Most Detailed Notes on "Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention"

4.Synchronous Bidirectional Learning for Multilingual Lip Reading

5.Visual Keyword Spotting with Attention (also in sentence-level)

x.Effective Lip Reading

Source: ICASSP 2020. Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences.

x.LiRA: Learning Visual Speech Representations from Audio through Self-supervision

x.Lip-reading with Densely Connected Temporal Convolutional Networks

x.Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

x.Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

x.Mutual Information Maximization for Effective Lip Reading

x.Deformation Flow Based Two-Stream Network for Lip Reading

x.Towards Practical Lipreading with Distilled and Efficient Models

x.Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video

Sentence-Level

1.End-to-end Audio-visual Speech Recognition with Conformers

Source: ICASSP 2021. Author affiliation: Imperial College London.

2.Visual Keyword Spotting with Attention (also in word-level)

x.Sub-word Level Lip Reading With Visual Attention

Phrase-Level

1.Multi-Perspective LSTM for Joint Visual Representation Learning
