[Paper Notes] 2019 - Multimodal Transformer for Unaligned Multimodal Language Sequences (in progress)
Paper Overview
Original paper: Multimodal Transformer for Unaligned Multimodal Language Sequences [1]
Paper: https://arxiv.org/abs/1906.00295
Source code: https://github.com/yaohungt/Multimodal-Transformer
The following are only my notes from reading the paper. My knowledge is limited, so corrections are welcome.
Paper Contents
Abstract
- Human language is often multimodal, which comprehends a mixture of natural language, facial gestures, and acoustic behaviors.
- However, two major challenges in modeling such multimodal human language time-series data exist: