Paper notes: "Lip Reading Sentences in the Wild"

Paper: https://arxiv.org/abs/1611.05358
Original post: http://www.hankcs.com/nlp/cs224n-lip-reading.html

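Lip reading

Videos are processed into image sequences centered on the lips, and the model predicts what is being said, with or without the accompanying audio.

[Figure: screenshot in the original post]

The data may come from live news broadcasts:

[Figure: screenshot in the original post]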

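In the animated demo from the original post, the lip reading, the speech recognition, and the karaoke-style alignment are all done automatically by the model.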

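Architecture

[Figure: model architecture, screenshot in the original post]

The visual and audio modules can be combined or used on their own. The model emits one character at a time.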

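Visual

Take a sliding window over the lip frames in time, feed each window to a CNN and then to an LSTM, which produces an output vector $s$:

[Figure: visual module, screenshot in the original post]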
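The sliding-window step can be sketched in plain Python. This is only an illustration: the window size of 5 and stride of 1 are assumed values, and `toy_cnn` and `toy_recurrent_encoder` are hypothetical stand-ins for the real CNN and LSTM, which the notes do not specify in detail.

```python
def sliding_windows(frames, size=5, stride=1):
    """Return overlapping windows of `size` consecutive frames."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [frames[i:i + size]
            for i in range(0, len(frames) - size + 1, stride)]


def toy_cnn(window):
    """Hypothetical stand-in for the CNN: average the frames in a window."""
    return sum(window) / len(window)


def toy_recurrent_encoder(features, decay=0.5):
    """Hypothetical stand-in for the LSTM: a running state over the features."""
    state = 0.0
    for x in features:
        state = decay * state + (1.0 - decay) * x
    return state


# Each "frame" is a single number here; real frames are mouth-region images.
frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
windows = sliding_windows(frames, size=5, stride=1)
features = [toy_cnn(w) for w in windows]   # one feature per window
s = toy_recurrent_encoder(features)        # final encoder state
```

With 7 frames and a window of 5, the sketch yields 3 windows, so the recurrent encoder sees 3 time steps.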

Audio

Similarly, the audio is sliced into windows:

[Figure: audio module, screenshot in the original post]
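A rough sketch of slicing audio into windows, assuming a 16 kHz signal, 25 ms frames, and a 10 ms hop. These numbers are common speech-processing defaults chosen for illustration, not values taken from the notes, and the log-energy feature is a simple placeholder for whatever features the model actually consumes.

```python
import math


def frame_audio(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a list of samples into overlapping fixed-length frames."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def log_energy(frame):
    """Placeholder per-frame feature: log of the summed squared amplitude."""
    energy = sum(x * x for x in frame)
    return math.log(energy + 1e-10)  # small constant avoids log(0)


sample_rate = 16000
# Toy input: 100 ms of a 440 Hz sine tone.
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
frames = frame_audio(samples, sample_rate)
features = [log_energy(f) for f in frames]  # one value per audio window
```

Each entry of `features` would then be one time step of input to the audio LSTM.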

Attention and Spell

The output states of the two LSTMs above are fed into an LSTM extended with two attention mechanisms:

[Figure: attention and spell module, screenshot in the original post]
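To make the "two attention mechanisms" idea concrete, here is a minimal sketch in which one decoder state attends separately over the visual and the audio encoder outputs, and the two context vectors are concatenated. Dot-product scoring is used as a simplification and is an assumption, as are all the toy vectors; the model's actual scoring function may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_outputs):
    """Return (context, weights) for dot-product attention."""
    scores = [sum(q * o for q, o in zip(query, out))
              for out in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(w * out[d] for w, out in zip(weights, encoder_outputs))
               for d in range(dim)]
    return context, weights


# Toy encoder outputs; the two streams may have different lengths.
video_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_outputs = [[0.5, 0.5], [2.0, 0.0]]
decoder_state = [1.0, 0.0]

video_context, video_weights = attend(decoder_state, video_outputs)
audio_context, audio_weights = attend(decoder_state, audio_outputs)

# The decoder consumes both contexts; here they are simply concatenated.
decoder_input = video_context + audio_context
```

Keeping the two attention mechanisms independent is what lets either stream be dropped: with audio missing, only the video context would be used.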

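Curriculum Learning

[Figure: curriculum learning, screenshot in the original post]

Seq2seq models are usually trained on complete sentences. Curriculum learning instead starts by feeding only a few words at a time and gradually increases the length. This speeds up convergence and reduces overfitting.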
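One possible shape for such a schedule, as a sketch: the word limit grows by one every two epochs. The function names and the schedule parameters (`start`, `step`, `every`, `cap`) are hypothetical and chosen for illustration. Only the transcript is truncated here; real training would also have to crop the aligned video and audio.

```python
def max_words_for_epoch(epoch, start=1, step=1, every=2, cap=None):
    """Word limit for a given epoch: grows by `step` every `every` epochs."""
    limit = start + step * (epoch // every)
    return limit if cap is None else min(limit, cap)


def curriculum_batch(sentences, epoch, **schedule):
    """Truncate each sentence to the word limit allowed at this epoch."""
    limit = max_words_for_epoch(epoch, **schedule)
    return [sentence.split()[:limit] for sentence in sentences]


sentences = ["the weather is fine today", "good evening"]

early = curriculum_batch(sentences, epoch=0)   # limit of 1 word
later = curriculum_batch(sentences, epoch=4)   # limit of 3 words
full = curriculum_batch(sentences, epoch=20)   # limit exceeds both sentences
```

Early epochs see only the first word of each sentence, and by epoch 20 the limit is larger than either sentence, so training has reached full sentences.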

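Scheduled Sampling

[Figure: scheduled sampling, screenshot in the original post]

Recurrent models are usually trained by feeding in the one-hot vector of the ground-truth token from the previous time step. Here the input is instead sampled from the model's own prediction at the previous time step, so that training matches the conditions at test time.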
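The choice between ground truth and prediction can be sketched as follows. This is a simplification under stated assumptions: `predictions` is a precomputed toy list, whereas in real training each prediction is produced step by step by the decoder, and the `sampling_prob` values below are illustrative rather than taken from the notes.

```python
import random


def build_decoder_inputs(targets, predictions, sampling_prob, rng, start="<s>"):
    """Pick each step's input: previous prediction or previous ground truth."""
    inputs = [start]  # the first step has no previous token
    for truth, guess in zip(targets[:-1], predictions[:-1]):
        use_prediction = rng.random() < sampling_prob
        inputs.append(guess if use_prediction else truth)
    return inputs


targets = list("hello")
predictions = list("hallo")  # toy model output with one wrong character
rng = random.Random(0)

teacher_forced = build_decoder_inputs(targets, predictions, 0.0, rng)
free_running = build_decoder_inputs(targets, predictions, 1.0, rng)
mixed = build_decoder_inputs(targets, predictions, 0.25, rng)
```

A probability of 0 reduces to ordinary teacher forcing, and a probability of 1 feeds the model only its own outputs, as happens at test time.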

Dataset

[Figure: dataset, screenshot in the original post]

Five thousand hours of BBC news video with aligned subtitles, preprocessed with steps such as locating the lips.

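Results

[Figure: results, screenshot in the original post]

Interestingly, they compared the model against a company that does professional lip reading and found that the model is more accurate than the professionals, with an error rate 20 percentage points lower. (Surprisingly, there are companies that specialize in this.)

With both audio and lip input, the error rate drops even further.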
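For readers unfamiliar with how such error rates are computed, here is a sketch of a word-level error rate based on edit distance. The notes do not say which metric is meant, so treating it as word error rate is an assumption, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


perfect = word_error_rate("the cat sat", "the cat sat")
one_missing = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A gap of 20 percentage points on this scale means, for example, going from 0.70 to 0.50.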
