ACL 2016 & 2015 Accepted Papers

1. ACL 2016 Accepted Papers

Long Papers
A CALL system for learning preposition usage
John Lee

A Character-level Decoder without Explicit Segmentation for Neural Machine Translation
Junyoung Chung, Kyunghyun Cho and Yoshua Bengio

A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig and Satoshi Nakamura

A Corpus-Based Analysis of Canonical Word Order of Japanese Double Object Constructions
Ryohei Sasano and Manabu Okumura

A Discriminative Topic Model using Document Network Structure
Weiwei Yang, Jordan Boyd-Graber and Philip Resnik

A Fast Unified Model for Parsing and Sentence Understanding
Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning and Christopher Potts

A Mean-Field Vector Space for Distributional Semantics for Entailment
James Henderson and Diana Popa

A Multi-media Approach to Cross-lingual Entity Knowledge Transfer
Di Lu, Xiaoman Pan, Nima Pourdamghani, Heng Ji, Shih-Fu Chang and Kevin Knight

A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation
Peng Qian, Xipeng Qiu and Xuanjing Huang

A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
Adam Trischler, Zheng Ye, Xingdi Yuan, Jing He and Phillip Bachman

A Persona-Based Neural Conversation Model
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao and Bill Dolan

A Search-Based Dynamic Reranking Model for Dependency Parsing
Hao Zhou, Yue Zhang, Shujian Huang, Junsheng Zhou, Xin-Yu Dai and Jiajun Chen

A Sentence Interaction Network for Modeling Dependence between Sentences
Biao Liu and Minlie Huang

A short proof that O_2 is an MCFL
Mark-Jan Nederhof

A Thorough Examination of the CNN / Daily Mail Reading Comprehension Task
Danqi Chen, Jason Bolton and Christopher D. Manning

A Trainable Spaced Repetition Model for Language Learning
Burr Settles and Brendan Meeder

A Transition-Based System for Joint Lexical and Syntactic Analysis
Matthieu Constant and Joakim Nivre

A Word-based Neural Network Method for Chinese Word Segmentation
Deng Cai and Hai Zhao

Achieving open vocabulary neural machine translation with hybrid word-character models
Minh-Thang Luong and Christopher D. Manning

Active Learning for Dependency Parsing with Partial Annotation
Zhenghua Li, Min Zhang, Yue Zhang, Zhanyi Liu, Wenliang Chen, Hua Wu and Haifeng Wang

Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings
Kazuma Hashimoto and Yoshimasa Tsuruoka

Addressing Limited Data for Textual Entailment Across Domains
Chaitanya Shivade, Preethi Raghavan and Siddharth Patwardhan

Agreement-based Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora
Chunyang Liu, Yang Liu and Maosong Sun

Alleviating Poor Context with Background Knowledge for Named Entity Disambiguation
Ander Barrena, Aitor Soroa and Eneko Agirre

ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling
Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater and Kevin Seppi

Analysing Biases in Human Perception of User Age and Gender from Text
Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar and Daniel Preoţiuc-Pietro

Annotating and Predicting Non-Restrictive Noun Phrase Modifications
Gabriel Stanovsky and Ido Dagan

AraSenTi: Large-Scale Twitter-Specific Arabic Sentiment Lexicons
Nora Al-Twairesh, Hend Al-Khalifa and Abdulmalik AlSalman

Automatic identification of vector-space semantic content in speech to detect Alzheimer’s disease
Maria Yancheva and Frank Rudzicz

Automatic Labeling of Topic Models Using Text Summaries
Tianming Wang and Xiaojun Wan

Automatic Stance Classification of Argumentative Essays
Isaac Persing

Automatic Text Scoring Using Neural Networks
Dimitrios Alikaniotis, Helen Yannakoudakis and Marek Rei

Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Alakananda Vempala and Eduardo Blanco

Bi-Transferring Deep Neural Networks for Domain Adaptation
Guangyou Zhou, Jun Zhao and Xiangji Jimmy Huang

Bidirectional Recurrent Convolutional Neural Network for Relation Classification
Rui Cai, Xiaodong Zhang and Houfeng Wang

Bilingual Segmented Topic Model
Akihiro Tamura and Eiichiro Sumita

Case and Cause in Icelandic: Reconstructing Causal Networks of Cascaded Language Changes
Fermin Moscoso del Prado Martin and Christian Brendel

Causality of Verbs for Grounded Language Understanding
Qiaozi Gao, Malcolm Doering, Shaohua Yang and Joyce Chai

CFO: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases
Zihang Dai, Lei Li and Wei Xu

Chinese Couplet Generation with Neural Network Structures
Rui Yan

Collective Entity Resolution with Multi-Focal Attention
Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringaard and Fernando Pereira

Combining Natural Logic and Shallow Reasoning for Question Answering
Gabor Angeli, Neha Nayak and Christopher D. Manning

Commonsense Knowledge Base Completion
Xiang Li, Aynaz Taheri, Lifu Tu and Kevin Gimpel

Composing Distributed Representations of Relational Patterns
Sho Takase, Naoaki Okazaki and Kentaro Inui

Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text
Kristina Toutanova, Victoria Lin, Wen-tau Yih, Hoifung Poon and Chris Quirk

Compositional Sequence Labeling Models for Error Detection in Learner Writing
Marek Rei and Helen Yannakoudakis

Compressing Neural Language Models by Sparse Word Representations
Yunchuan Chen, Lili Mou, Yan Xu, Ge Li and Zhi Jin

Connotation Frames: A Data-Driven Investigation
Hannah Rashkin, Sameer Singh and Yejin Choi

Constrained Multi-Task Learning for Automated Essay Scoring
Ronan Cummins, Meng Zhang and Ted Briscoe

Context-aware Argumentative Relation Mining
Huy Nguyen and Diane Litman

Continuous Profile Models in ASL Syntactic Facial Expression Synthesis
Hernisa Kacorri and Matt Huenerfauth

Coordination Annotation Extension in the Penn Tree Bank
Jessica Ficler and Yoav Goldberg

Cross-domain Text Classification with Multiple Domains and Disparate Label Sets
Himanshu Sharad Bhatt, Manjira Sinha and Shourya Roy

Cross-Lingual Image Caption Generation
Takashi Miyazaki and Nobuyuki Shimizu

Cross-Lingual Lexico-Semantic Transfer in Language Learning
Ekaterina Kochmar and Ekaterina Shutova

Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer and Dan Roth

Cross-Lingual Morphological Tagging for Low-Resource Languages
Jan Buys and Jan A. Botha

Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
Xinjie Zhou and Xiaojun Wan

CSE: Conceptual Sentence Embeddings based on Attention Model
Yashen Wang, Heyan Huang, Chong Feng, Qiang Zhou, Jiahui Gu and Xiong Gao

Data Recombination for Neural Semantic Parsing
Robin Jia and Percy Liang

Deep Fusion LSTMs for Text Semantic Matching
Pengfei Liu, Xipeng Qiu and Xuanjing Huang

Deep Reinforcement Learning with a Natural Language Action Space
Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng and Mari Ostendorf

Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound
Caio Corro, Joseph Le Roux, Mathieu Lacroix, Antoine Rozenknop and Roberto Wolfler Calvo

Detecting Common Discussion Topics Across Culture From News Reader Comments
Bei Shi, Wai Lam, Lidong Bing and Yinqing Xu

Detecting Events in FrameNet
Shulin Liu, Kang Liu and Jun Zhao

Diachronic word embeddings reveal laws of semantic change
William Hamilton, Jure Leskovec and Dan Jurafsky

Discovery of Treatments from Text Corpora
Christian Fong and Justin Grimmer

Discriminative Deep Random Walk for Network Classification
Juzheng Li, Jun Zhu and Bo Zhang

DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents
Zhao Yan, Nan Duan, Junwei Bao, Peng Chen, Ming Zhou, Zhoujun Li and Jianshe Zhou

Document-level Sentiment Inference with Social, Faction, and Discourse Context
Eunsol Choi, Hannah Rashkin, Luke Zettlemoyer and Yejin Choi

Domain Adaptation for Authorship Attribution: Improved Structural Correspondence Learning
Upendra Sapkota, Thamar Solorio, Manuel Montes and Steven Bethard

Easy Questions First? Curriculum Learning for Question Answering
Mrinmaya Sachan and Eric Xing

Easy Things First: Installments Improve Referring Expression Generation for Objects in Photographs
Sina Zarrieß and David Schlangen

Edge-Linear First-Order Dependency Parsing with Undirected Minimum Spanning Tree Inference
Effi Levi, Roi Reichart and Ari Rappoport

Effects of Text Corpus Properties on Short Text Clustering Performance
Catherine Finegan-Dollak, Reed Coke, Rui Zhang, Xiangyi Ye and Dragomir Radev

Efficient techniques for parsing with tree automata
Jonas Groschwitz, Mark Johnson and Alexander Koller

Embeddings for Word Sense Disambiguation: An Evaluation Study
Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Makoto Miwa and Mohit Bansal

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Xuezhe Ma and Eduard Hovy

Entropy converges between dialogue participants: explanations from an information-theoretic perspective
Yang Xu and David Reitter

Evaluating Sentiment Analysis in the Context of Securities Trading
Siavash Kazemian, Shunan Zhao and Gerald Penn

Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking
Seokhwan Kim, Rafael Banchs and Haizhou Li

Finding Non-Arbitrary Form-Meaning Systematicity Using String-Metric Learning for Kernel Regression
E.D. Gutierrez, Roger Levy and Benjamin Bergen

Finding the Middle Ground - A Model for Planning Satisficing Answers
Sabine Janzen, Wolfgang Maaß and Tobias Kowatsch

Generalized Transition-based Dependency Parsing via Control Parameters
Bernd Bohnet, Ryan McDonald, Emily Pitler and Ji Ma

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville and Yoshua Bengio

Generating Natural Questions About an Image
Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He and Lucy Vanderwende

Generative Topic Embedding: a Continuous Representation of Documents
Shaohua Li, Tat-Seng Chua, Jun Zhu and Chunyan Miao

Globally Normalized Transition-Based Neural Networks
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov and Michael Collins

Grammatical Error Correction: Machine Translation and Classifiers
Alla Rozovskaya and Dan Roth

Graph-based Dependency Parsing with Bidirectional LSTM
Wenhui Wang and Baobao Chang

Graph-Based Translation Via Graph Segmentation
Liangyou Li, Andy Way and Qun Liu

Grapheme-to-Phoneme Models for (Almost) Any Language
Aliya Deri and Kevin Knight

Harnessing Cognitive Features for Sarcasm Detection
Abhijit Mishra, Diptesh Kanojia, Kuntal Dey, Seema Nagar and Pushpak Bhattacharyya

Harnessing Deep Neural Networks with Logic Rules
Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy and Eric Xing

Hidden Softmax Sequence Model for Dialogue Structure Analysis
Zhiyang He, Xien Liu, Ping Lv and Ji Wu

How Much is 131 Million Dollars? Putting Numbers in Perspective with Compositional Descriptions
Arun Chaganty and Percy Liang

How to Train Dependency Parsers with Inexact Search for Joint Sentence Boundary Detection and Parsing of Entire Documents
Anders Björkelund, Agnieszka Faleńska, Wolfgang Seeker and Jonas Kuhn

How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation
Danqing Huang, Shuming Shi, Chin-Yew Lin, Jian Yin and Wei-Ying Ma

Identifying Causal Relations Using Parallel Wikipedia Articles
Christopher Hidey and Kathy McKeown

Idiom Token Classification using Sentential Distributed Semantics
Giancarlo Salton, Robert Ross and John Kelleher

Implicit Discourse Relation Detection via a Deep Architecture with Gated Relevance Network
Jifan Chen, Qi Zhang and Xuanjing Huang

Improved Representation Learning for Question Answer Matching
Ming Tan, Cicero dos Santos, Bing Xiang and Bowen Zhou

Improved Semantic Parsers For If-Then Statements
I. Beltagy and Chris Quirk

Improving Coreference Resolution by Learning Entity-Level Distributed Representations
Kevin Clark and Christopher D. Manning

Improving Hypernymy Detection with an Integrated Path-based and Distributional Method
Vered Shwartz, Yoav Goldberg and Ido Dagan

Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich, Barry Haddow and Alexandra Birch

Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Jiatao Gu, Zhengdong Lu, Hang Li and Victor O.K. Li

Incremental Acquisition of Verb Hypothesis Space towards Physical World Interaction
Lanbo She and Joyce Chai

Inferring Logical Forms From Denotations
Panupong Pasupat and Percy Liang

Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast
Svitlana Volkova and Yoram Bachrach

Inner Attention based Recurrent Neural Network for Answer Selection
Bingning Wang, Kang Liu and Jun Zhao

Intrinsic Subspace Evaluation of Word Embedding Representations
Yadollah Yaghoobzadeh and Hinrich Schütze

Investigating Language Universal and Specific in Word Embedding
Peng Qian, Xipeng Qiu and Xuanjing Huang

Investigating LSTMs for Joint Extraction of Opinion Entities and Relations
Arzoo Katiyar and Claire Cardie

Investigating the Sources of Linguistic Alignment in Conversation
Gabriel Doyle and Michael C. Frank

Jointly Event Extraction and Visualization on Twitter via Probabilistic Modelling
Deyu ZHOU, Tianmeng Gao and Yulan He

Jointly Learning to Embed and Predict with Multiple Languages
Daniel C. Ferreira, André F. T. Martins and Mariana S. C. Almeida

Knowledge Base Completion via Coupled Path Ranking
Quan Wang, Jing Liu, Yuanfei Luo, Bin Wang and Chin-Yew Lin

Knowledge-Based Semantic Embedding for Machine Translation
Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu Sun, and Houfeng Wang

Language to Logical Form with Neural Attention
Li Dong and Mirella Lapata

Language Transfer Learning for Supervised Lexical Substitution
Gerold Hintz and Chris Biemann

Larger-Context Language Modelling with Recurrent Neural Network
Tian Wang and Kyunghyun Cho

Latent Predictor Networks for Code Generation
Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Fumin Wang and Andrew Senior

Learning Concept Taxonomies from Multi-modal Data
Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan and Eric Xing

Learning Language Games through Interaction
Sida I. Wang, Percy Liang and Christopher D. Manning

Learning Precise Partial Semantic Mappings via Linear Algebra
Fereshte Khani, Martin Rinard and Percy Liang

Learning Prototypical Event Structure from Photo Albums
Antoine Bosselut, Jianfu Chen, David Warren, Hannaneh Hajishirzi and Yejin Choi

Learning Semantically and Additively Compositional Distributional Representations
Ran Tian, Naoaki Okazaki and Kentaro Inui

Learning Structured Predictors from Partial Information for Interactive NLP
Artem Sokolov, Julia Kreutzer, Chris Lo and Stefan Riezler

Learning Text Pair Similarity with Context-sensitive Autoencoders
Hadi Amiri, Philip Resnik, Jordan Boyd-Graber and Hal Daumé III

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov, Manaal Faruqui, Wang Ling and Chris Dyer

Learning To Use Formulas To Solve Simple Arithmetic Problems
Arindam Mitra and Chitta Baral

Learning Word Meta-Embeddings
Wenpeng Yin and Hinrich Schütze

Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
Greg Durrett, Taylor Berg-Kirkpatrick and Dan Klein

Leveraging inflection tables for Stemming and Lemmatization via Discriminative String Transduction
Garrett Nicolai and Grzegorz Kondrak

LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning
Andrew Bennett, Timothy Baldwin, Jey Han Lau, Diana McCarthy and Francis Bond

Liberal Event Extraction and Event Schema Induction
Lifu Huang, Taylor Cassidy, Xiaocheng Feng, Heng Ji, Clare R. Voss, Jiawei Han and Avirup Sil

Literal and Metaphorical Senses in Compositional Distributional Semantic Models
E.D. Gutierrez, Ekaterina Shutova, Tyler Marghetis and Benjamin Bergen

Metaphor Detection Using Topic Transition, Emotion and Cognition in Context
Hyeju Jang, Yohan Jo, Qinlan Shen, Michael Miller, Seungwhan Moon and Carolyn Rose

Minimum Risk Training for Neural Machine Translation
Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun and Yang Liu

Mining Paraphrasal Typed Templates from a Plain Text Corpus
Or Biran, Terra Blevins and Kathleen McKeown

Model Architectures for Quotation Detection
Christian Scheible, Roman Klinger and Sebastian Padó

Modeling Concept Dependencies in a Scientific Corpus
Jonathan Gordon, Linhong Zhu, Gully Burns, Aram Galstyan and Prem Natarajan

Modeling Coverage for Neural Machine Translation
Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu and Hang Li

Modeling Simpler Logical Forms via Model Projections
Reginald Long, Panupong Pasupat and Percy Liang

Modeling Social Norms Evolution for Personalized Sentiment Classification
Lin Gong, Mohammad Al Boni and Hongning Wang

Models and Inference for Prefix-Constrained Machine Translation
Joern Wuebker, Spence Green, John DeNero, Sasa Hasan and Minh-Thang Luong

Morphological Smoothing and Extrapolation of Word Embeddings
Ryan Cotterell, Jason Eisner and Hinrich Schütze

Most “babies” are “little” and most “problems” are “huge”: Compositional Entailment in Adjective-Nouns
Ellie Pavlick and Chris Callison-Burch

Multimodal Pivots for Image Caption Translation
Julian Hitschler, Shigehiko Schamoni and Stefan Riezler

MUTT: Metric Unit TesTing for Language Generation Tasks
William Boag, Renan Campos, Anna Rumshisky and Kate Saenko

N-gram language models for massively parallel devices
Nikolay Bogoychev and Adam Lopez

Neural Greedy Constituent Parsing with Dynamic Oracles
Maximin Coavoux and Benoit Crabbé

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow and Alexandra Birch

Neural Network-Based Model for Japanese Predicate Argument Structure Analysis
Tomohide Shibata, Daisuke Kawahara and Sadao Kurohashi

Neural Networks For Negation Scope Detection
Federico Fancellu, Adam Lopez and Bonnie Webber

Neural Relation Extraction with Selective Attention over Instances
Yankai Lin, Shiqi Shen, Zhiyuan Liu and Maosong Sun

Neural Semantic Role Labeling with Dependency Path Embeddings
Michael Roth and Mirella Lapata

Neural Summarization by Extracting Sentences and Words
Jianpeng Cheng and Mirella Lapata

News Citation Recommendation with Implicit and Explicit Semantics
Hao Peng, Jing Liu and Chin-Yew Lin

Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing
James Goodman, Andreas Vlachos and Jason Naradowsky

Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation
Nut Limsopatham and Nigel Collier

Normalized Log-Linear Language Model Interpolation is Efficient
Kenneth Heafield, Chase Geigle, Sean Massung and Lane Schwartz

Off-topic Response Detection for Spontaneous Spoken English Assessment
Andrey Malinin, Rogier van Dalen, Kate Knill, Yu Wang and Mark Gales

On approximately searching similar word embeddings
Kohei Sugawara, Hayato Kobayashi and Masajiro Iwasaki

On the Role of Seed Lexicons in Learning Bilingual Word Embeddings
Ivan Vulić and Anna Korhonen

On the Similarities Between Native, Non-native and Translated Texts
Ella Rabinovich, Sergiu Nisioi, Noam Ordan and Shuly Wintner

On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
Pei-Hao Su, Milica Gasic, Nikola Mrkšić, Lina M. Rojas Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen and Steve Young

One for All: Towards Language Independent Named Entity Linking
Avirup Sil and Radu Florian

Optimizing an approximation of ROUGE - a problem-reduction approach to extractive multi-document summarization
Maxime Peyrard and Judith Eckle-Kohler

Optimizing Spectral Learning for Parsing
Shashi Narayan and Shay B. Cohen

Part-of-Speech Induction from fMRI
Joachim Bingel, Maria Barrett and Anders Søgaard

Phrase Structure Annotation and Parsing for Learner English
Ryo Nagata and Keisuke Sakaguchi

Pointing the Unknown Words
Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou and Yoshua Bengio

Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time
Silvio Cordeiro, Carlos Ramisch, Marco Idiart and Aline Villavicencio

Predicting the Rise and Fall of Scientific Topics from Linguistic Cues to their Scholarly Functions
Vinodkumar Prabhakaran, William Hamilton, Dan McFarland and Dan Jurafsky

Prediction of Prospective User Engagement with Intelligent Assistants
Shumpei Sano, Nobuhiro Kaji and Manabu Sassano

Probabilistic Graph-based Dependency Parsing with Convolutional Neural Network
Zhisong Zhang and Hai Zhao

Query Expansion with Locally-Trained Word Embeddings
Fernando Diaz, Bhaskar Mitra and Nick Craswell

Question Answering on Freebase via Relation Extraction and Textual Evidence
Kun Xu, Yansong Feng, Siva Reddy and Songfang Huang

RBPB: Regularization-Based Pattern Balancing Method for Event Extraction
Lei Sha, Jing Liu, Chin-Yew Lin, Sujian Li, Baobao Chang, Zhifang Sui

Reconstructing Hidden Documents from Observed Text
Matthew Burgess, Eugenia Giraudy and Eytan Adar

Recurrent neural network models for disease name recognition using domain invariant features
Sunil Sahu and Ashish Anand

Relation Classification via Multi-Level Attention CNNs
Linlin Wang, Zhu Cao, Gerard de Melo and Zhiyuan Liu

Resolving References to Objects in Photographs using the Words-As-Classifiers Model
David Schlangen, Sina Zarrieß and Casey Kennington

Scaling a Natural Language Generation System
Jonathan Pfeil and Soumya Ray

Segment-Level Sequence Modeling using Gated Recursive Semi-Markov Conditional Random Fields
Jingwei Zhuo, Jun Zhu, Yong Cao, Zaiqing Nie and Bo Zhang

Semi-Supervised Learning for Neural Machine Translation
Yong Cheng and Yang Liu

Sentence Rewriting for Semantic Parsing
Bo Chen, Le Sun, Xianpei Han and Bo An

Sentiment Domain Adaptation with Multiple Sources
Fangzhao Wu and Yongfeng Huang

Sequence-based Structured Prediction for Semantic Parsing
Chunyang Xiao, Marc Dymetman and Claire Gardent

Set-Theoretic Alignment for Comparable Corpora
Thierry Etchegoyhen and Andoni Azpeitia

Siamese CBOW: Optimizing Word Embeddings for Sentence Representations
Tom Kenter, Alexey Borisov and Maarten de Rijke

Situation entity types: automatic classification of clause-level aspect
Annemarie Friedrich, Alexis Palmer and Manfred Pinkal

Speech Act Modeling of Written Asynchronous Conversations with Task-Specific Embeddings and Conditional Structured Models
Shafiq Joty and Enamul Hoque

Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang and David Weiss

Strategies for Training Large Vocabulary Neural Language Models
Wenlin Chen, David Grangier and Michael Auli

Summarizing Source Code using a Neural Attention Model
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung and Luke Zettlemoyer

Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction and Utilization
Lucie Flekova and Iryna Gurevych

Synthesizing Compound Words for Machine Translation
Austin Matthews, Eva Schlinger, Alon Lavie and Chris Dyer

Tables as Semi-structured Knowledge for Question Answering
Sujay Kumar Jauhar, Peter Turney and Eduard Hovy

Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning
Ekaterina Vylomova, Laura Rimell, Trevor Cohn and Timothy Baldwin

Target-Side Context for Discriminative Models in Statistical Machine Translation
Aleš Tamchyna, Alexander Fraser and Ondřej Bojar

Temporal Anchoring of Events for the TimeBank Corpus
Nils Reimers, Nazanin Dehghani and Iryna Gurevych

Text Understanding with the Attention Sum Reader Network
Rudolf Kadlec, Martin Schmid, Ondřej Bajgar and Jan Kleindienst

The Creation and Analysis of a Website Privacy Policy Corpus
Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N. Cameron Russell, Thomas B. Norton, Eduard Hovy, Joel Reidenberg and Norman Sadeh

The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno, Raffaella Bernardi, Gemma Boleda, Raquel Fernandez, Germán Kruszewski, Angeliki Lazaridou, Ngoc Quan Pham and Marco Baroni

The More Antecedents, the Merrier: Tackling Multiple Antecedents in Anaphor Resolution
Hardik Vala, Andrew Piper and Derek Ruths

The Top 14 Aspects of Editorial Quality Control of Online News
Ioannis Arapakis, Filipa Peleja, Barla Berkant and Joao Magalhaes

Together we stand: Siamese Networks for Similar Question Retrieval
Arpita Das, Harish Yenala, Manoj Chinnakotla and Manish Shrivastava

Topic Extraction from Microblog Posts Using Conversation Structures
Jing Li, Ming Liao, Wei Gao, Yulan He and Kam-Fai Wong

Toward Constructing Sports News from Live Text Commentary
Jianmin Zhang, Jin-ge Yao and Xiaojun Wan

Towards more variation in text generation: Developing and evaluating variation models for choice of referential form
Thiago Castro Ferreira

TransG: A Generative Model for Knowledge Graph Embedding
Han Xiao, Minlie Huang and Xiaoyan Zhu

Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies
Yoshihide Kato and Shigeki Matsubara

Tree-to-Sequence Attentional Neural Machine Translation
Akiko Eriguchi, Kazuma Hashimoto and Yoshimasa Tsuruoka

Two Discourse Driven Language Models for Semantics
Haoruo Peng and Dan Roth

Understanding Discourse on Work and Job-Related Well-Being in Public Social Media
Tong Liu, Christopher Homan, Cecilia Ovesdotter Alm, Megan Lytle, Ann Marie White and Henry Kautz

Universal Dependencies for Learner English
Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza and Boris Katz

Unravelling Names of Fictional Characters
Katerina Papantoniou and Stasinos Konstantopoulos

Unsupervised Multi-Author Document Decomposition Based on Hidden Markov Model
Khaled Aldebei, Xiangjian He, Wenjing Jia and Jie Yang

Unsupervised Person Slot Filling based on Graph Mining
Dian Yu and Heng Ji

User Modeling in Language Learning with Macaronic Texts
Adithya Renduchintala, Rebecca Knowles, Philipp Koehn and Jason Eisner

Using Sentence-Level LSTM Language Models for Script Inference
Karl Pichotta and Raymond J. Mooney

Verbs taking clausal and non-finite arguments as signals of factuality or uncertainty – revisiting the issue of meaning grounded in syntax
Judith Eckle-Kohler

Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM
Ivan Habernal and Iryna Gurevych

Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric
Nafise Sadat Moosavi and Michael Strube

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey and David Berthelot

Word-Based Neural Models for Chinese Segmentation
Meishan Zhang and Yue Zhang

Zero Pronoun Resolution with Low-Dimensional Features
Chen Chen and Vincent Ng

Short Papers

A Domain Adaptation Regularization for Denoising Autoencoders
Stephane Clinchant, Gabriela Csurka and Boris Chidlovskii

A Fast Algorithm for Semantic Short Texts Retrieval
Yanhui Gu

A Language-Independent Neural Network for Event Detection
Xiaocheng Feng, Heng Ji, Duyu Tang, Bing Qin and Ting Liu

A Latent Concept Topic Model for Robust Topic Inference
Weihua Hu and Jun’ichi Tsujii

A Neural Network based Approach to Automatic Post-Editing
Santanu Pal, Sudip Kumar Naskar, Mihaela Vela and Josef van Genabith

A Novel Measure for Coherence in Statistical Topic Models
Fred Morstatter and Huan Liu

An Entity-Focused Approach to Generating Company Descriptions
Gavin Saldanha, Or Biran, Kathleen McKeown and Alfio Gliozzo

An Open Web Platform for Rule-Based Speech-to-Sign Translation
Manny Rayner, Pierrette Bouillon, Sarah Ebling, Johanna Gerlach, Irene Strasly and Nikos Tsourakis

An Unsupervised Method for Automatic Translation Memory Cleaning
Masoud Jalili Sabet, Matteo Negri, Marco Turchi and Eduard Barbu

Annotating Relation Inference in Context via Question Answering
Omer Levy and Ido Dagan

Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification
Peng Zhou

Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features
Maximilian Köper and Sabine Schulte im Walde

Beyond Privacy: The Social Impact of Natural Language Processing
Dirk Hovy and Shannon L. Spruit

Bootstrapped Text-level Named Entity Recognition for Literature
Julian Brooke, Adam Hammond and Timothy Baldwin

Character-based Neural Machine Translation
Marta R. Costa-jussà and José A. R. Fonollosa

Claim Synthesis via Predicate Recycling
Yonatan Bilu and Noam Slonim

Coarse-grained Argumentation Features for Scoring Persuasive Essays
Debanjan Ghosh, Aquila Khanam and Smaranda Muresan

Convergence of Syntactic Complexity in Conversation
Yang Xu and David Reitter

Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol and Keith Hall

Cross-Lingual Word Representations via Spectral Graph Embeddings
Takamasa Oshikiri, Kazuki Fukui and Hidetoshi Shimodaira

Deep multi-task learning with low level tasks supervised at lower layers
Anders Søgaard and Yoav Goldberg

Deep Neural Networks for Syntactic Parsing of Morphologically Rich Languages
Joël Legrand and Ronan Collobert

Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation
Jingjing Xu, Xu Sun and Xiaoyan Cai

Detecting Mild Cognitive Impairment by Exploiting Linguistic Information from Transcripts
Veronika Vincze, Gábor Gosztolya, László Tóth, Ildikó Hoffmann, Gréta Szatlóczki, Zoltán Bánréti, Magdolna Pákáski and János Kálmán

Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model
Jin Wang, Liang-Chih Yu, K. Robert Lai and Xuejie Zhang

Domain Specific Named Entity Recognition Referring to the Real World by Deep Neural Networks
Suzushi Tomori, Takashi Ninomiya and Shinsuke Mori

Don’t Count, Predict! An Automatic Approach to Learning Sentiment Lexicons for Short Text
Duy Tin Vo and Yue Zhang

Empty element recovery by spinal parser operations
Katsuhiko Hayashi and Masaaki Nagata

Event Nugget Detection with Bidirectional Recurrent Neural Networks
Reza Ghaeini, Xiaoli Fern, Liang Huang and Prasad Tadepalli

Exploiting Linguistic Features for Use in Sentence Completion
Aubrie Woods

Exploring Stylistic Variation with Age and Income on Twitter
Lucie Flekova, Daniel Preoţiuc-Pietro and Lyle Ungar

Exponentially Decaying Bag-of-Words Input Features for Feed-Forward Neural Network in Statistical Machine Translation
Jan-Thorsten Peter, Weiyue Wang and Hermann Ney

Finding Optimistic and Pessimistic Users on Twitter
Xianzhi Ruan, Steven Wilson and Rada Mihalcea

Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter
Michal Lukasik, P. K. Srijith, Kalina Bontcheva and Trevor Cohn

How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality
Carlos Ramisch, Silvio Cordeiro, Leonardo Zilio, Marco Idiart and Aline Villavicencio

Hunting for Troll Comments in News Community Forums
Todor Mihaylov and Preslav Nakov

IBC-C: A dataset for armed conflict analysis
Andrej Zukov Gregoric, Zhiyuan Luo and Bartal Veyhe

Implicit Polarity and Implicit Aspect Recognition in Opinion Mining
Huan-Yuan Chen and Hsin-Hsi Chen

Improved Parsing for Argument-Clusters Coordination
Jessica Ficler and Yoav Goldberg

Improving Argument Overlap for Proposition-Based Summarisation
Yimai Fang and Simone Teufel

Improving cross-domain n-gram language modelling with skipgrams
Louis Onrust, Antal van den Bosch and Hugo Van hamme

Improving Statistical Machine Translation Performance by Oracle-BLEU Model Re-estimation
Praveen Dakwale and Christof Monz

Incorporating Relational Knowledge into Word Representations using Subspace Regularization
Abhishek Kumar and Jun Araki

Incremental Parsing with Minimal Features Using Bi-Directional LSTM
James Cross and Liang Huang

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction
Kim Anh Nguyen, Sabine Schulte im Walde and Ngoc Thang Vu

Is “Universal Syntax” Universally Useful for Learning Distributed Word Representations?
Ivan Vulić

Is This Post Persuasive? Ranking Argumentative Comments in Online Forum
Zhongyu Wei and Yang Liu

Joint part-of-speech and dependency projection from multiple sources
Anders Johannsen, Željko Agić and Anders Søgaard

Joint Word Segmentation and Phonetic Category Induction
Micha Elsner, Stephanie Antetomaso and Naomi Feldman

Learning easy to compose word representations via bilingual supervision
Ahmed Elgohary and Marine Carpuat

Learning Multiview Embeddings of Twitter Users
Adrian Benton, Raman Arora and Mark Dredze

Learning Word Segmentation Representations to Improve Named Entity Recognition for Chinese Social Media
Nanyun Peng and Mark Dredze

Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data
Teng Long, Ryan Lowe, Jackie Chi Kit Cheung and Doina Precup

Machine Comprehension using Rich Semantic Representations
Mrinmaya Sachan and Eric Xing

Machine Translation Evaluation Meets Community Question Answering
Francisco Guzmán, Lluís Màrquez and Preslav Nakov

Matrix Factorization using Window Sampling for Improved Word Representations
Alexandre Salle, Aline Villavicencio and Marco Idiart

Metrics for Evaluation of Word-level Machine Translation Quality Estimation
Varvara Logacheva, Michal Lukasik and Lucia Specia

Modelling the Interpretation of Discourse Connectives by Bayesian Pragmatics
Frances Yung, Kevin Duh, Taku Komura and Yuji Matsumoto

Morphological Reinflection with Encoder-Decoder Models and Edit Trees
Katharina Kann and Hinrich Schütze

Multi-Modal Representations for Improved Bilingual Lexicon Learning
Ivan Vulić, Douwe Kiela, Stephen Clark and Marie-Francine Moens

Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models
Barbara Plank, Anders Søgaard and Yoav Goldberg

Multiplicative Representations for Unsupervised Semantic Role Induction
Yi Luan, Yangfeng Ji, Hannaneh Hajishirzi and Boyang Li

Natural Language Generation enhances human decision-making with uncertain information: NLG works for women
Dimitra Gkatzia, Oliver Lemon and Verena Rieser

Natural Language Inference by Tree-Based Convolution and Heuristic Matching
Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan and Zhi Jin

Nonparametric Spherical Topic Modeling with Word Embeddings
Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan and Sam Gershman

On the linearity of semantic change: Investigating meaning variation via dynamic graph models
Steffen Eger and Alexander Mehler

One model, two languages: training bilingual parsers with harmonized treebanks
David Vilares, Carlos Gómez-Rodríguez and Miguel A. Alonso

Part-of-speech induction from eye-tracking data
Maria Barrett, Joachim Bingel and Anders Søgaard

Phrase Table Pruning via Submodular Function Maximization
Masaaki Nishino, Jun Suzuki and Masaaki Nagata

Phrase-Level Combination of SMT and TM Using Constrained Word Lattice
Liangyou Li, Andy Way and Qun Liu

Recognizing Salient Entities in Shopping Queries
Zornitsa Kozareva, Qi Li, Ke Zhai and Weiwei Guo

Reference Bias in Monolingual Machine Translation Evaluation
Marina Fomicheva and Lucia Specia

Scalable Semi-Supervised Query Classification Using Matrix Sketching
Young-Bum Kim, Karl Stratos and Ruhi Sarikaya

Science Question Answering using Instructional Materials
Mrinmaya Sachan, Kumar Dubey and Eric Xing

Semantic classifications for detection of verb metaphors
Beata Beigman Klebanov, Chee Wee Leong, E. Dario Gutierrez, Ekaterina Shutova and Michael Flor

Semantics-Driven Recognition of Collocations Using Word Embeddings
Sara Rodríguez-Fernández, Luis Espinosa Anke, Roberto Carlini and Leo Wanner

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings
Ondřej Dušek and Filip Jurcicek

Simple PPDB: A Paraphrase Database for Simplification
Ellie Pavlick and Chris Callison-Burch

Specifying and Annotating Reduced Argument Span Via QA-SRL
Gabriel Stanovsky, Ido Dagan and Meni Adler

Syntactically Guided Neural Machine Translation
Felix Stahlberg, Eva Hasler, Aurelien Waite and Bill Byrne

Text Simplification as Tree Labeling
Joachim Bingel and Anders Søgaard

The Enemy in Your Own Camp: How Well Can We Detect Statistically-Generated Fake Reviews – An Adversarial Study
Dirk Hovy

The red one! Learning to predict attributes that discriminate a referent in a visual context
Angeliki Lazaridou, Nghia The Pham and Marco Baroni

The Value of Semantic Parse Labeling for Knowledge Base Question Answering
Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang and Jina Suh

Transductive Adaptation of Black Box Predictions
Stephane Clinchant, Boris Chidlovskii and Gabriela Csurka

Transition-based dependency parsing with topological fields
Daniël de Kok and Erhard Hinrichs

Tweet2Vec: Character-Based Distributed Representations for Social Media
Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl and William Cohen

Unsupervised morph segmentation and statistical language models for vocabulary expansion
Matti Varjokallio and Dietrich Klakow

User Embedding for Scholarly Microblog Recommendation
Yang Yu, Xinjie Zhou and Xiaojun Wan

Using mention accessibility to improve coreference resolution
Kellie Webster and James R. Curran

Using Sequence Similarity Networks to Identify Partial Cognates in Multilingual Wordlists
Johann-Mattis List, Philippe Lopez and Eric Bapteste

Very quaffable and great fun: Applying NLP to wine reviews
Iris Hendrickx, Els Lefever, Ija Croijmans, Asifa Majid and Antal van den Bosch

Vocabulary Manipulation for Large Vocabulary Neural Machine Translation
Haitao Mi, Zhiguo Wang and Abe Ittycheriah

Which Tumblr Post Should I Read Next?
Zornitsa Kozareva and Makoto Yamada

Word Alignment without NULL Words
Philip Schulz, Wilker Aziz and Khalil Sima’an

Word Embedding Calculus in Meaningful Ultradense Subspaces
Sascha Rothe and Hinrich Schütze

Word Embedding with Limited Memory
Shaoshi Ling, Yangqiu Song and Dan Roth

System Demonstrations

A WEB-FRAMEWORK FOR ODIN ANNOTATION
Ryan Georgi, Michael Wayne Goodman and Fei Xia

AN ADVANCED PRESS REVIEW SYSTEM COMBINING DEEP NEWS ANALYSIS AND
MACHINE LEARNING ALGORITHMS
Danuta Ploch, Andreas Lommatzsch and Florian Schultze

CCG2LAMBDA: A COMPOSITIONAL SEMANTICS SYSTEM
Pascual Martínez-Gómez, Koji Mineshima, Yusuke Miyao and Daisuke Bekki

CREATING INTERACTIVE MACARONIC INTERFACES FOR LANGUAGE LEARNING
Adithya Renduchintala, Rebecca Knowles, Philipp Koehn and Jason Eisner

DEEPLIFE: AN ENTITY-AWARE SEARCH, ANALYTICS AND EXPLORATION PLATFORM
FOR HEALTH AND LIFE SCIENCES
Patrick Ernst, Amy Siu, Dragan Milchevski, Johannes Hoffart and Gerhard Weikum

GOWVIS: A WEB APPLICATION FOR GRAPH-OF-WORDS-BASED TEXT VISUALIZATION
AND SUMMARIZATION
Antoine Tixier, Konstantinos Skiannis and Michalis Vazirgiannis

JEDI: JOINT ENTITY AND RELATION DETECTION USING TYPE INFERENCE
Johannes Kirschnick, Holmer Hemsen and Volker Markl

JIGG: A FRAMEWORK FOR AN EASY NATURAL LANGUAGE PROCESSING PIPELINE
Hiroshi Noji and Yusuke Miyao

LANGUAGE MUSE: AUTOMATED LINGUISTIC ACTIVITY GENERATION FOR ENGLISH
LANGUAGE LEARNERS
Nitin Madnani, Jill Burstein, John Sabatini, Kietha Biggers and Slava Andreyev

LIMOSINE PIPELINE: MULTILINGUAL UIMA-BASED NLP PLATFORM
Olga Uryupina, Barbara Plank, Gianni Barlacchi, Francisco J Valverde-Albacete, Manos Tsagkias and Alessandro Moschitti

MDSWRITER: ANNOTATION TOOL FOR CREATING HIGH-QUALITY MULTI-DOCUMENT
SUMMARIZATION CORPORA
Christian M. Meyer, Darina Benikova, Margot Mieskes and Iryna Gurevych

MEDIAGIST: A CROSS-LINGUAL ANALYSER OF AGGREGATED NEWS AND COMMENTARIES
Josef Steinberger

META: A UNIFIED TOOLKIT FOR TEXT RETRIEVAL AND ANALYSIS
Sean Massung, Chase Geigle and ChengXiang Zhai

MMFEAT: A TOOLKIT FOR EXTRACTING MULTI-MODAL FEATURES
Douwe Kiela

MUSEEC: A MULTILINGUAL TEXT SUMMARIZATION TOOL
Marina Litvak, Natalia Vanetik, Mark Last and Elena Churkin

MY SCIENCE TUTOR—LEARNING SCIENCE WITH A CONVERSATIONAL VIRTUAL TUTOR
Sameer Pradhan, Ron Cole and Wayne Ward

NEW/S/LEAK – INFORMATION EXTRACTION AND VISUALIZATION FOR
INVESTIGATIVE DATA JOURNALISTS
Seid Muhie Yimam, Heiner Ulrich, Tatiana von Landesberger, Marcel Rosenbach, Michaela Regneri,
Alexander Panchenko, Franziska Lehmann, Uli Fahrer, Chris Biemann and Kathrin Ballweg

ONLINE INFORMATION RETRIEVAL FOR LANGUAGE LEARNING
Maria Chinkina, Madeeswaran Kannan and Detmar Meurers

OPENDIAL: A TOOLKIT FOR DEVELOPING SPOKEN DIALOGUE SYSTEMS WITH
PROBABILISTIC RULES
Pierre Lison and Casey Kennington

PERSONALIZED EXERCISES FOR PREPOSITION LEARNING
John Lee and Mengqi Luo

PIGEO: A Python GEOTAGGING TOOL
Afshin Rahimi, Trevor Cohn and Timothy Baldwin

POLYGLOT: MULTILINGUAL SEMANTIC ROLE LABELING WITH UNIFIED LABELS
Alan Akbik and Yunyao Li

REAL-TIME DISCOVERY AND GEOSPATIAL VISUALIZATION OF MOBILITY AND
INDUSTRY EVENTS FROM LARGE-SCALE, HETEROGENEOUS DATA STREAMS
Leonhard Hennig, Philippe Thomas, Renlong Ai, Johannes Kirschnick, Wang He,
Jakob Pannier, Nora Zimmermann, Sven Schmeier, Feiyu Xu, Jan Ostwald and Hans Uszkoreit

ROLEO: VISUALISING THEMATIC FIT SPACES ON THE WEB
Asad Sayeed, Xudong Hong and Vera Demberg

TERMINOLOGY EXTRACTION WITH TERM VARIANT DETECTION
Damien Cram and Beatrice Daille

TMOP: A TOOL FOR UNSUPERVISED TRANSLATION MEMORY CLEANING
Masoud Jalili Sabet, Matteo Negri, Marco Turchi, José G. C. de Souza and Marcello Federico

TRANSCRATER: A TOOL FOR AUTOMATIC SPEECH RECOGNITION QUALITY ESTIMATION
Shahab Jalalvand, Matteo Negri, Marco Turchi, José G. C. de Souza and Falavigna Daniele

VISUALIZING AND CURATING KNOWLEDGE GRAPHS OVER TIME AND SPACE
Tong Ge, Yafang Wang, Gerard de Melo and Haofeng Li

from: http://acl2016.org/index.php?article_id=13#long_papers

2. Overview of ACL 2015 Conference Papers

  1. Text to 3D Scene Generation with Rich Lexical Grounding

Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, Christopher D. Manning

This paper is quite fancy: it builds 3D scenes from simple text, e.g. rendering, from a description alone, a room with a refrigerator in the corner and a pot of flowers on top of the refrigerator. The work is very thorough, and the corpus is also quite special!

  2. MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity

Wenpeng Yin and Hinrich Schütze

His research focus over the past two years has largely been on representations of text chunks. This series of work (including this paper) stresses that he wants to handle various granularities in sentence representation; in his model this shows up as unigram (word) features, short n-gram features, long n-gram features, and sentence features. As I understand it, "various granularity" is meant to model two advantages at once: 1) different granularities should be compared (between two sentence representations) at the corresponding granular level (do not compare single words with entire sentences); 2) interactions among different granularities should be modeled. On 1), they argue this beats Socher '11's RNN work, and they extended the idea into their ACL '15 paper; for 2), they add an interaction NN to their model.

The paper has two techniques. The first is an unsupervised CNN pretraining scheme, which they claim is especially useful. Roughly, they take the output of the topmost sentence representation layer as one unit, add back the original single-word units of the whole NN input to form a new sequence, and then, combined with NCE (noise-contrastive estimation), turn it into a kind of sentence-enhanced word prediction. The idea behind this technique comes from two papers: one, obviously, is word2vec's unsupervised prediction of the central word; the other is Baroni '14's "Don't count, predict!" paper, which argues that predict-fashioned LMs work better.
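
To make the flavor concrete, here is a minimal numpy sketch of sentence-enhanced word prediction, with plain negative sampling standing in for the full NCE machinery; the word vectors, the CNN-derived sentence vector, and the toy sentence below are made-up placeholders, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 50, 1000

# Hypothetical inputs: word vectors and a CNN-derived sentence vector
# (random stand-ins here).
word_vecs = rng.normal(scale=0.1, size=(vocab, dim))
sent_vec = rng.normal(scale=0.1, size=dim)
sentence = [3, 17, 42, 7, 99]          # word ids of the input sentence

def score(context_vec, target_id):
    """Dot-product score between a context summary and a candidate word."""
    return context_vec @ word_vecs[target_id]

def neg_sampling_loss(center_pos, k=5):
    """Predict the center word from its neighbors PLUS the sentence vector,
    with k sampled negatives standing in for the NCE noise terms."""
    # The 'new sequence' idea: the sentence vector acts like one extra unit.
    neighbors = [w for i, w in enumerate(sentence) if i != center_pos]
    context = (word_vecs[neighbors].sum(axis=0) + sent_vec) / (len(neighbors) + 1)
    target = sentence[center_pos]
    negatives = rng.integers(0, vocab, size=k)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sigmoid(score(context, target)))
    loss -= np.log(sigmoid(-np.array([score(context, n) for n in negatives]))).sum()
    return loss

print(neg_sampling_loss(center_pos=2))
```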

That is the first technique, and he reuses it throughout the follow-up papers in this series. How useful it really is, you are welcome to check for yourselves.

The second technique is another variant of dynamic pooling, following Socher '11's work.

  3. [TACL]* Improving Distributional Similarity with Lessons Learned from Word Embeddings

Omer Levy, Yoav Goldberg, Ido Dagan

Reportedly Levy was in peak fighting form at the oral presentation, saying outright that he had run 5,600 experiment configurations and still could not reproduce the good results reported for certain models.

  4. Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations

Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

First, the paper's language is plain and down-to-earth. If writing reflects the person, you can read off the author's steady attitude toward research: the text has very few fancy or showy adjectives, describes the work solidly, and compares all the relevant aspects carefully. There is math, there is analysis, there is a case study. As an extra recommendation, this paper can even serve as a brief survey: the Introduction, Related Work, and later Discussion sections summarize syntagmatic and paradigmatic models very comprehensively, with fair and apt assessments. Recommended to anyone who wants an overview of this area.

Now to the main content:

Motivation: As we all know, when you go looking for word similarity, two kinds show up: one is really word relatedness, and only the other is word similarity proper. You can think of them as "horizontal" vs. "vertical" similarity. The paper explains with the sentences "The wolf is a fierce animal." and "The tiger is a fierce animal.": (wolf, tiger) is a paradigmatic relation, while (wolf, fierce) and (tiger, fierce) are each syntagmatic relations. Many earlier models capture only one of these two relations; this paper wants to learn both jointly, arguing that joint learning lets the two boost the overall result (with an analysis in the final case study).

Concepts: Around syntagmatic vs. paradigmatic, the paper in fact runs four parallel pairs of concepts. The first is (syntagmatic, paradigmatic); the second, corresponding pair is (representations based on the text region, representations based on similar contexts); the third is (combinatorial relations, substitutional relations); the fourth is (words-by-documents co-occurrence matrix, words-by-words co-occurrence matrix).

Idea: Joint learning is itself one of the more marketable ideas in NLP. What it attacks (to use a word Hanyang likes) is the pipeline framework so common in NLP: joint learning can reduce error propagation and accumulation. Although this paper involves no pipeline, joint learning can indeed provide a mutual boost.

Models: The paper adapts two models based on word2vec's CBOW and Skip-Gram. There are already far too many papers modifying these two models, but the modifications here genuinely feel a little fresh, and they come with rigorous mathematical derivations (and source code!). The exposition is clear, a gift for those of us whose math is shaky (myself included). In short, both variants keep word2vec's contexts (neighboring words) to capture the paradigmatic side, while using whole documents to capture the syntagmatic side. Even more clever than the direct "parallel" modification of CBOW is the modification of Skip-Gram into a "hierarchical" form: the document first predicts (conditions) the central word w_0, and then, as in Skip-Gram, w_0 predicts the context words, so both relations are captured at once.
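
As a concrete illustration, here is a toy numpy sketch of the hierarchical objective as I read it (the document predicts the central word, which in turn predicts its contexts); the full softmax, the toy dimensions, and the random vectors are my simplifications, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, vocab, n_docs = 20, 200, 10

W_in  = rng.normal(scale=0.1, size=(vocab, dim))   # input word vectors
W_out = rng.normal(scale=0.1, size=(vocab, dim))   # output word vectors
D     = rng.normal(scale=0.1, size=(n_docs, dim))  # document vectors

def log_softmax_prob(query, target_id):
    """log p(target | query) under a full softmax over the toy vocabulary."""
    logits = W_out @ query
    logits -= logits.max()                 # numerical stability
    return logits[target_id] - np.log(np.exp(logits).sum())

def hierarchical_ll(doc_id, words, pos, window=2):
    """log p(w0 | doc) + sum_c log p(c | w0): the document first predicts the
    center word, which in turn predicts its context words (Skip-Gram style)."""
    w0 = words[pos]
    ll = log_softmax_prob(D[doc_id], w0)               # syntagmatic part
    for i in range(max(0, pos - window), min(len(words), pos + window + 1)):
        if i != pos:
            ll += log_softmax_prob(W_in[w0], words[i])  # paradigmatic part
    return ll

toy_doc = [5, 12, 7, 33, 2, 12]
print(hierarchical_ll(doc_id=0, words=toy_doc, pos=2))
```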

Experiments: On large public datasets, the paper compares word similarity and word analogy performance across the board (multiple dimensionalities, multiple baseline models) and beats all the baselines.

Case Study: I found this part the most carefully done, and I like it a lot.

  5. Compositional Vector Space Models for Knowledge Base Completion

Arvind Neelakantan, Benjamin Roth, Andrew McCallum

The idea is simple: fill in missing knowledge paths, from which transitional and compositional relations in the KB can be inferred.
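
A minimal sketch of the path-composition idea, under assumptions of my own (random relation vectors, a single tanh RNN step as the composition function, cosine as the score):

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_relations = 16, 30

rel_vecs = rng.normal(scale=0.1, size=(n_relations, dim))  # relation embeddings
W = rng.normal(scale=0.1, size=(dim, 2 * dim))             # RNN composition matrix

def compose_path(path):
    """Fold relation vectors along a path with a simple RNN, yielding one
    vector for the whole path (e.g. born_in -> capital_of)."""
    h = rel_vecs[path[0]]
    for r in path[1:]:
        h = np.tanh(W @ np.concatenate([h, rel_vecs[r]]))
    return h

def path_score(path, target_relation):
    """Score how well the composed path implies the target relation (cosine)."""
    h = compose_path(path)
    t = rel_vecs[target_relation]
    return (h @ t) / (np.linalg.norm(h) * np.linalg.norm(t))

print(path_score(path=[4, 9], target_relation=11))
```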

  6. Learning Answer-Entailing Structures for Machine Comprehension

Mrinmaya Sachan, Kumar Dubey, Eric Xing, Matthew Richardson

From CMU, Eric Xing's group. This is not an NN paper, and the math is fairly light. In my view there are two highlights: one is positing an intermediate hypothesis; the other is bringing multi-task learning into the math and using a feature-map technique to "collapse" the multi-task setting back into the original problem.

On the first point: they use the Question and Answer to learn a hypothesis, which is a kind of latent variable, or, if you take question + answer as jointly describing a fact/truth/event, a kind of embedded fact. Based on this hypothesis they then match the relevant words in the original paragraph/text; see Figure 1. I find this quite interesting because it reminds me of encoding and decoding: the Question + Answer combination is one representation of the document, and the document itself is another. Given two such representations, what is the real thing in between? What is the so-called complete information? Combining them directly into a hypothesis surely loses information. I think the Machine Translation/Conversation people are doing something similar: rather than mapping one-to-one directly, keep an invisible "hypothesis" in the middle.

The second point, multi-task: this relates to another paper they draw on, "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks", which defines 20 kinds of problems AI needs to solve, i.e. the questions are categorized (how/what/which/why/when/who and so on). They use these 20 classes to split the task into 20 subtasks, turning it into a multi-task problem, and then use the feature-map technique (Evgeniou 2004) to convert the multi-task problem back into the original one. I find this rather neat too. Of course many solutions to multi-task learning already exist; this is just one simple, effective option that fits their model.

  7. A Generalisation of Lexical Functions for Composition in Distributional Semantics

Antoine Bride, Tim Van de Cruys, Nicholas Asher

This paper also addresses a hot topic: compositionality. It proposes a fairly general framework to subsume composition, and it additionally analyzes in depth the compositional behavior of adjectives and nouns.

  8. Simple Learning and Compositional Application of Perceptually Grounded Word Meanings for Incremental Reference Resolution

Casey Kennington and David Schlangen

The talk for this paper was extremely cute, with three Tetris pieces in the lower-right corner carrying the animation throughout. The content is fancy too: "grounded word meaning" here means descriptive, factual modifiers, like a "cross", "red", "block", that sort of thing. The dataset is self-built and public. A very nice, fun paper.

  9. Learning to Adapt Credible Knowledge in Cross-lingual Sentiment Analysis

Qiang Chen, Wenjie Li, Yu Lei, Xule Liu, Yanxiang He

In this work the authors use sentiment information to supervise the bilingual transfer. The intuitive assumption is that sentiment polarity should be invariant between the source language and the target language: a sentence cannot be positive before translation and negative after. They use knowledge validation to check this repeatedly.

  10. Event-Driven Headline Generation

Rui Sun, Yue Zhang, Meishan Zhang, Donghong Ji

The paper very naturally uses event structure and information to trade off the pros and cons of extractive and abstractive methods. On these two families of methods, the paper's related work is well written and worth a read (both the Related Work and the Background sections).

The idea is that event structure already covers very informative material for summarization. An event is defined as a tuple; first extract all event tuples, then generate. Both the tuple extraction and the generation are done nicely. The neat part is that the event structure covers almost all of the NP and VP information from the ACL '15 paper above (see Section 3.1.1), and, better still, the second element of the event tuple, the predicate, can be used for deduplication. This is the payoff of the tuple data structure: they take the dependency parse and use the NSUBJ and DOBJ relations to handle NPs and VPs.

Section 3.1.3 is the familiar move from graph-based summarization. Since word-event forms an alignment pair, the classic trick applies: A should be more important if it occurs in more important B, and vice versa. That is how events and words (in the lexical chains) get tied together.
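
The mutual-reinforcement update is easy to sketch. Below is a toy HITS-style iteration over a made-up event-word co-occurrence matrix; it captures the flavor of the idea, not the paper's exact algorithm.

```python
import numpy as np

# Toy bipartite co-occurrence matrix A: A[i, j] = 1 if event i contains word j.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

event_imp = np.ones(A.shape[0])
word_imp = np.ones(A.shape[1])

# "A is more important if it occurs in more important B, and vice versa":
# alternate the two updates until the scores stabilize.
for _ in range(50):
    event_imp = A @ word_imp
    event_imp /= np.linalg.norm(event_imp)
    word_imp = A.T @ event_imp
    word_imp /= np.linalg.norm(word_imp)

print(event_imp, word_imp)
```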

So up to this point it is clear that the tuple structure of events did the heavy lifting, and the authors realize this themselves: they argue that the tuple structure is a good trade-off between extractive and abstractive, carrying more grammatical information than purely phrase-based abstractive methods while easing the sparsity problem of extractive ones (see the Introduction).

  11. How Far are We from Fully Automatic High Quality Grammatical Error Correction?

Christopher Bryant and Hwee Tou Ng

The starting point is good: they used human evaluation to measure agreement and found that even humans cannot get above 90%, so we can hardly demand that automatic correction systems reach it...

  12. Efficient Methods for Inferring Large Sparse Topic Hierarchies

Doug Downey, Chandra Bhagavatula, Yi Yang

I think the selling point is still the hierarchy, plus it appears to solve the efficiency problem of hierarchical models. Even though this paper also works with pre-defined topics/structures, it offers a form of expansion: take an already-trained hierarchical model of theirs as a "seed" and learn a new one from it, for a speed-up. And I think that matches how cognition works.

Next, the two models this paper mainly attacks, and how its own differs from them; the differences show why it is fast. LDA, the most widely used topic model, is undeniably concise and effective, but the biggest problem of LDA and its variants is the probabilistic assumption that topics are independent of each other (the sum-to-one assumption is not the issue). This independence between topics means that when data is scarce and the topic count is set high, you learn many very general, nonsensical topics, the Chinese equivalent of "I, of, we, one, a person, life" and the like. It is also why LDA is not hierarchical (hierarchical LDA does not break this assumption either). So the first key difference: both PAM and this paper's SBT break the assumption and can model correlations between topics. What distinguishes SBT from PAM, then, is its complicatedly and fancily named tree prior. As I see it, the motivation of this prior is to bake the hierarchy in already at the prior stage, so that sampling can be "recursive". In detail, it makes topics more coherent during sampling: the sampler does not draw haphazardly but leans toward drawing related topics.

  13. Jointly optimizing word representations for lexical and sentential tasks with the C-PHRASE model

Nghia The Pham, Germán Kruszewski, Angeliki Lazaridou, Marco Baroni

A model adapted from CBOW. The authors' starting point: since CBOW can predict the central word from word combinations (n-grams) in the contexts, we should be able to find a way to make the contexts not just plain surface combinations, but combinations that follow linguistic rules and respect syntax.

  14. Co-training for Semi-supervised Sentiment Classification Based on Dual-view Bags-of-words Representation

Rui Xia, Cheng Wang, Xin-Yu Dai, Tao Li

The starting point of this paper is fun: manufacture your own negative examples! In sentiment-related tasks, data sparsity may leave positive or negative sentiment words unseen in the training instances, and we can reduce this sparsity by fabricating reversed examples. Concretely, use lexical rules to match and reverse sentiment words, then flip the sentiment label, 0 to 1 and 1 to 0, obtaining the corresponding negative example.

Then the original and reversed examples go into two separate views, and that is the co-training. Talking with the authors, Rui Xia's view is that this method only works for problems like sentiment where the label can be negated.
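
A tiny sketch of the reversed-example trick, with a made-up antonym dictionary standing in for their lexical rules:

```python
# A tiny antonym lexicon; a real system would use sentiment lexical rules.
ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love"}

def make_reversed(text, label):
    """Flip matched sentiment words and invert the label (0<->1) to
    fabricate a reversed training example."""
    flipped = " ".join(ANTONYMS.get(w, w) for w in text.split())
    return flipped, 1 - label

original = ("i love this movie , the plot is good", 1)
reversed_ex = make_reversed(*original)
print(reversed_ex)   # ('i hate this movie , the plot is bad', 0)

# Dual views for co-training: the original bag-of-words and the reversed
# bag-of-words act as two separate views of the same example.
view_original = original[0].split()
view_reversed = reversed_ex[0].split()
```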

  15. A Hierarchical Neural Autoencoder for Paragraphs and Documents

Jiwei Li, Thang Luong, Dan Jurafsky

The authors verify the feasibility of making LSTMs hierarchical and give several intuitive designs; the third is an LSTM that does partial-part alignment via the attention mechanism. A hierarchically modified LSTM can represent text at the sentence, paragraph, and document levels.

  16. A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network

Chenxi Zhu, Xipeng Qiu, Xinchi Chen, Xuanjing Huang

The biggest contribution of this paper is improving the original RNN approach Socher proposed for modeling compositional relations: it is no longer restricted to binary composition but can handle triples or even more. See the paragraph at the beginning of Section 4; it comes down to constituent parsing vs. dependency parsing. That is the paper's biggest contribution: a variant of the RNN that handles more-than-two units of composition.

Also, distance embedding, in Section 3.1: relative positions in [-2, 2] are used directly as features and concatenated into the embedding vector (see Eq. 4). The method comes from the COLING 2014 best paper, "Relation Classification via Convolutional Deep Neural Network".
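
A minimal sketch of such distance embeddings (the embedding tables and dimensions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
word_dim, pos_dim, vocab = 50, 5, 1000

word_emb = rng.normal(scale=0.1, size=(vocab, word_dim))
# One vector per relative position in [-2, 2]; positions outside are clipped.
pos_emb = rng.normal(scale=0.1, size=(5, pos_dim))

def token_features(word_ids, center):
    """Concatenate each word's embedding with the embedding of its
    relative position to the center word, clipped to [-2, 2]."""
    feats = []
    for i, w in enumerate(word_ids):
        rel = int(np.clip(i - center, -2, 2))
        feats.append(np.concatenate([word_emb[w], pos_emb[rel + 2]]))
    return np.stack(feats)          # shape: (len(word_ids), word_dim + pos_dim)

print(token_features([3, 17, 42, 7, 99], center=2).shape)
```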

  17. Cross-lingual Dependency Parsing Based on Distributed Representations

Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, Ting Liu

The authors exploit bilingual correspondence information, folding it into conventional NN-based dependency parsing via an alignment method and via CCA respectively. The alignment method permits one-to-many alignments, whereas CCA is strictly one-to-one.

  18. A Unified Multilingual Semantic Representation of Concepts

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli

The author has been honing word semantic representation / word sense disambiguation for years: http://wwwusers.di.uniroma1.it/~navigli/pubs_by_cat.html. Even this year there is related work at WWW/TACL/NAACL, and the 2013 predecessor of this work was shortlisted for the ACL best paper award.

First, a few works related to this paper:

Socher 2013a, Bilingual Word Embeddings for Phrase-Based Machine Translation,

Guo 2014, Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources

NAACL 2015, Deep Multilingual Correlation for Improved Word Embeddings

NAACL 2015 (same authors as this paper), Simple task-specific bilingual word embeddings

Socher 2013a was, I believe, the first (not entirely sure) to propose mapping two languages into the same space, learning one shared word embedding space. The idea was later carried over to text/image pairs and all sorts of settings. The results were quite good; for a quick introduction see the prodigy's blog post from that time: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

The later work really splits along two lines: using multi-lingual (multi-resource) signals to (1) strengthen word representations, and to (2) push toward finer-grained concept representations (disambiguation).

On (1), besides Socher 2013a, the NAACL '15 "Deep" paper also uses bilingual data to improve representations. There the assumption is CCA/DCCA-based: MT pairs are fed as input to CCA/DCCA (the same CCA as in the model-based ACL '15 and NIPS works discussed elsewhere in these notes). Its main claim is that DCCA, a nonlinear subspace transformation, beats the linear transformation that is CCA.

On (2), take Guo 2014: Table 1 in that paper makes it clear at a glance. Based on an MT alignment model, they step by step prune/select the clusters they want, splitting a polysemous word into multiple clusters.

Now to this paper, A Unified Multilingual Semantic Representation of Concepts. It also serves (2): it learns concept embeddings, essentially treating the different senses of a polysemous word as concepts. What sets it apart is that it uses not only multilingual data but also external information: Wikipedia. And its clever move is that instead of translation pairs it uses a "naturally" multilingual synset database, BabelNet (http://babelnet.org/), which claims to integrate WordNet, Wikipedia, and more, so that each of its concepts comes with synset words in many languages. That gives them a starting point! That is, they take these synset words and concepts and, following certain rules, crawl Wikipedia to grow their semantic corpus.

The work is very linguistic, but one thing is interesting (besides BabelNet): the similarity metric in Section 3.1. It is not the usual cosine or Hamming distance but square-rooted Weighted Overlap (WO); ignorant me had never heard of it, orz. Their paper says it has been shown to beat the traditional cosine. On top of this WO metric (for vector representations of words), the similarity between two words needs one more transformation (Equation 3).
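
Here is my reading of square-rooted Weighted Overlap as a short function; treat the exact normalization as my assumption rather than the paper's definitive formula.

```python
import numpy as np

def weighted_overlap(v1, v2):
    """Square-rooted Weighted Overlap between two (sparse, non-negative)
    vector representations. Dimensions are compared by rank rather than
    by raw weight; this is a sketch of the metric as I understand it."""
    overlap = [d for d in range(len(v1)) if v1[d] > 0 and v2[d] > 0]
    if not overlap:
        return 0.0

    def ranks(v):
        # rank 1 for the highest-weighted dimension, 2 for the next, ...
        order = np.argsort(-np.asarray(v))
        r = np.empty(len(v), dtype=int)
        r[order] = np.arange(1, len(v) + 1)
        return r

    r1, r2 = ranks(v1), ranks(v2)
    num = sum(1.0 / (r1[d] + r2[d]) for d in overlap)
    den = sum(1.0 / (2 * (i + 1)) for i in range(len(overlap)))
    return np.sqrt(num / den)

print(weighted_overlap([0.5, 0.0, 0.2, 0.3], [0.4, 0.1, 0.0, 0.5]))
```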

  19. Dependency-based Convolutional Neural Networks for Sentence Embedding

Mingbo Ma, Liang Huang, Bowen Zhou, Bing Xiang

A paper with Liang Huang as second author, presented by the first-author student. The talk was extremely clear: fast-paced, emphatic, with visual slides that aided understanding. The idea is very straightforward: instead of a plain sequential convolutional NN, convolve along dependency relations. It is a bit like folding dependency relation information into CBOW/Skip-Gram the way other modifications have.
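
A toy sketch of convolving over dependency ancestors instead of sequential windows; the parse, embeddings, and filter shapes are all made up.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, vocab, n_filters, width = 20, 100, 8, 3

emb = rng.normal(scale=0.1, size=(vocab, dim))
filt = rng.normal(scale=0.1, size=(n_filters, width * dim))

def ancestor_window(heads, i, k):
    """Walk k-1 steps up the dependency tree from token i (heads[i] is the
    index of i's head; the root points to itself)."""
    chain = [i]
    while len(chain) < k and heads[chain[-1]] != chain[-1]:
        chain.append(heads[chain[-1]])
    while len(chain) < k:                    # pad at the root
        chain.append(chain[-1])
    return chain

def dep_convolution(word_ids, heads):
    """Convolve each filter over (word, head, grandhead) triples instead of
    sequential trigrams, then max-pool over positions."""
    feats = []
    for i in range(len(word_ids)):
        win = ancestor_window(heads, i, width)
        feats.append(np.concatenate([emb[word_ids[w]] for w in win]))
    conv = np.tanh(np.stack(feats) @ filt.T)   # (n_tokens, n_filters)
    return conv.max(axis=0)                    # max-pooling

# Toy parse: heads[i] gives the head of token i; token 2 is the root.
print(dep_convolution(word_ids=[5, 9, 2, 7], heads=[2, 2, 2, 2]).shape)
```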

  20. A Unified Learning Framework of Skip-Grams and Global Vectors

Jun Suzuki and Masaaki Nagata

A paper that tries, from a mathematical (machine learning) angle, to subsume Skip-Gram (with negative sampling, SGNS) and GloVe under one framework. The controversial point, though, is that the formulas they use for the two models drop the bias terms, so to some degree it is not a fully precise unification.

  21. Distributional Neural Networks for Automatic Resolution of Crossword Puzzles

Aliaksei Severyn, Massimo Nicosia, Gianni Barlacchi, Alessandro Moschitti

A fun task: crossword puzzles, with the dataset released by the authors. During the presentation they played a small game, giving four clues and letting the audience guess a word, which turned out to be Tux the penguin. Crossword puzzles are in fact not as simple as they look. A distinctive point of their model: after computing the similarity of the two input units, they go on to embed input unit x, input unit y, the similarity, and other features together in one layer.
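
A minimal sketch of that joint layer (all vectors and shapes below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
dim, feat_dim, hidden = 30, 4, 16

x = rng.normal(size=dim)          # e.g. the clue representation
y = rng.normal(size=dim)          # e.g. a candidate answer representation
extra = rng.normal(size=feat_dim) # other hand-crafted features

sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Join x, y, their similarity, and the extra features in one hidden layer.
W = rng.normal(scale=0.1, size=(hidden, 2 * dim + 1 + feat_dim))
joint = np.concatenate([x, y, [sim], extra])
h = np.tanh(W @ joint)
print(h.shape)
```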

  22. A Dependency-Based Neural Network for Relation Classification

Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, Houfeng Wang

The paper makes two contributions. First, it proposes a new dependency-relation path, the ADP (Augmented Dependency Path), which contains not only the dependency shortest path of classic relation classification but also the subtrees attached to that path. Second, building on the ADP, it adapts a recursive NN model, called DepNN.

  23. Machine Comprehension with Discourse Relations

Karthik Narasimhan and Regina Barzilay

From MIT CSAIL; open source. A very neat paper, and not an NN one. The selling points: discourse information + less human annotation. Their model uses discourse relations (relations between sentences: learned, not annotated) to boost machine comprehension performance. Concretely, they first use parsing and similar methods to select the single sentence most relevant to the question (Model 1) or several sentences (Models 2 and 3), building relations along the way, and finally predict. The underlying idea is the simplest discriminative-model recipe: find the hidden variable and chain the probabilities. If the paper interests you, I recommend Section 3.1, which discusses the four classes of features they consider potentially relevant to this task.

  1. Model-based Word Embeddings from Decompositions of Count Matrices

Karl Stratos, Michael Collins, Daniel Hsu

First off, I recommend this paper to anyone interested in word embeddings or low-dimensional lexical representations. It mainly seeks to understand word embeddings from a mathematical angle and to propose a template that meets our embedding objective (which is really just dimensionality reduction).

If one can propose a way to reduce the estimation error found in, say, negative-sampling-derived word embeddings (that is, to make the estimation more accurate, while it remains an estimation), then word embedding performance can be improved.

So the paper starts from CCA (which is used to compute the Pearson ranking in word similarity evaluation), stressing that CCA can optimize two vectors so as to maximize their correlation (isn't that exactly the assumption of context-based models? the famous quote: "You shall know a word by the company it keeps"). They then form such vectors from a corpus's central words and their surrounding context words (really vector pairs: with center word c and window size K, there are 2K pairs of vectors) and plug them into the CCA optimization. This is obviously computationally expensive, so via a series of lemmas plus observations they turn to an approximate solution (for large data). The approximate solution then connects to using CCA for parameter estimation, i.e., spectral estimation, from which they propose a spectral template for word embeddings. They also fold previously proposed decompositions for word embeddings (such as Levy's PPMI) into this template (Section 5, Figure 2), and then run experiments. So my take is that, from another mathematical angle, they optimize the whole word-embedding business from the standpoint of estimation error (taking negative-sampling-derived embeddings directly as the target rather than trying to explain them), which goes a step further.
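
For reference, the standard CCA objective in my notation: given paired views X and Y with covariances Sigma_XX, Sigma_YY and cross-covariance Sigma_XY, find projection directions that maximize correlation:

```latex
(a^*, b^*) \;=\; \arg\max_{a,\,b}\;
\frac{a^\top \Sigma_{XY}\, b}
     {\sqrt{a^\top \Sigma_{XX}\, a}\;\sqrt{b^\top \Sigma_{YY}\, b}}
```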

Since this ACL'15 paper also cites the NIPS'11 one, let me first paste the comment it makes when citing it:

Dhillon et al. (2011) and (2012) propose novel modifications of CCA (LRMVL and two-step CCA) to derive word embeddings, but do not establish any explicit connection to learning HMM parameters or justify the squareroot transformation.

Having read the paper, I still find that remark fair. Let me now compare the two papers:

  1. First, the ACL'15 paper covers more than just NIPS'11, so the comparison below highlights only the parts where it continues the NIPS'11 work.

  2. In NIPS'11, what the authors call Multi-View is really: left contexts L, right contexts R, and the current target word W, i.e., three views, plus the previous and future views (the HMM's hidden states), which the authors do not stress much. Understand it in two parts: L, R, W together capture the surrounding context, which is uncontroversial; the previous and future views exploit the HMM's state assumption (during learning the state is iterated roughly 5-7 times).

  3. Folding the HMM assumption into word representations, as NIPS'11 does, is nothing new in itself. Still, I think the hidden state assumed and learned in the HMM differs from our word embedding: both are low-rank/low-dimensional representations, but the hidden state can further be used to learn context-specific word embeddings. In other words, a word embedding is a result, a projected result, while the hidden state is a learning method, a projection. (This is just my understanding.)

  4. In practice, then, NIPS'11 first uses CCA to learn A, a dimensionality-reduced representation of L and R under the hidden-state assumption, and then applies CCA a second time between A and W: two steps, two CCAs. The authors discuss that with an infinite corpus this would be equivalent to a single one-shot CCA, but when the corpus follows Zipf's law, the two-step procedure is the more accurate one.

  5. The ACL'15 paper, one could say, is ACL'15 = NIPS'11 + Stratos (2014) + a strict condition (the square-root transformation). That is, under that strict condition it applies Stratos (2014) to NIPS'11, thereby satisfying its own demand to "establish any explicit connection to learning HMM parameters or justify the squareroot transformation"; this is the content of Section 4 of ACL'15.

  6. Naturally, the two papers therefore differ in angle of attack and narrative order. NIPS'11 tells you: CCA can learn low-rank representations, and to get there, here are the assumptions to satisfy and the tricks to apply. ACL'15 instead says: we know CCA can do this, but CCA can also be understood as parameter estimation for an HMM (the opening of Section 4.1). What does parameter estimation mean here? From the estimation viewpoint, we are really just looking for a matrix O; but this matrix O should ideally satisfy two properties, and each property needs an extra trick.

  7. A concrete example: in NIPS'11, exponential smoothing serves the low-rank representation of L and R and is introduced naturally, from a smoothing angle; in ACL'15, exponential smoothing is introduced from an explicit-proof angle, as something we must do to satisfy O's properties.

  8. NIPS'11 is convex and solved directly, with no local-optimum problem; ACL'15 is non-convex (because Stratos (2014)'s approach is non-convex), so it is a bit more troublesome.

  1. Entity Hierarchy Embedding

Zhiting Hu, Poyao Huang, Yuntian Deng, Yingkai Gao, Eric Xing

  1. The Users Who Say ‘Ni’: Audience Identification in Chinese-language Restaurant Reviews

Rob Voigt and Dan Jurafsky

  1. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification

Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, Chris Callison-Burch

Recommended purely because the poster was so distinctive...

  1. Non-distributional Word Vector Representations

Manaal Faruqui and Chris Dyer

  1. A Hierarchical Knowledge Representation for Expert Finding on Social Media

Yanran Li, Wenjie Li, Sujian Li

Using a hierarchical model, the authors turn the full set of posts of each Sina Weibo user into a hierarchical knowledge structure, then compare it against the knowledge structures of experts in various domains to judge whether the user is an expert in a given domain. Concretely, the knowledge structure is built with the Pachinko Allocation Model which, unlike LDA, relaxes LDA's assumption that topics are independent and thus allows hierarchical modeling. For structure matching, they adapt an approximate tree matching algorithm based on tree edit distance and fold in word-embedding-based semantic matching, which improves the results.
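
A toy sketch of the matching idea as I read it: an edit distance whose substitution cost is driven by embedding similarity. For brevity this version runs on flat label sequences; the actual algorithm works on trees, and all names are my illustration.

```python
import numpy as np

def semantic_cost(u, v, emb):
    """Substitution is cheap when the two labels are semantically close."""
    a, b = emb[u], emb[v]
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos                  # 0 when directions coincide

def semantic_edit_distance(seq1, seq2, emb):
    """Edit distance with an embedding-based substitution cost."""
    n, m = len(seq1), len(seq2)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)
    D[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(
                D[i - 1, j] + 1,      # delete
                D[i, j - 1] + 1,      # insert
                D[i - 1, j - 1] + semantic_cost(seq1[i - 1], seq2[j - 1], emb),
            )
    return D[n, m]
```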

  1. Learning Summary Prior Representation for Extractive Summarization

Ziqiang Cao, Furu Wei, Sujian Li, Wenjie Li, Ming Zhou, Houfeng WANG

The traditional framework is two-step: first a sentence ranking stage, then a sentence selection stage driven by the ranking scores. Both steps are basically feature-based, so most past work competed on features, each showing its own tricks. This paper plugs a CNN into the ranking stage and improves the results.

After ACL 2015 wrapped up, I picked a few dozen papers I found interesting, most of which I had already read, and wrote these overviews. There are surely many mistakes; corrections are welcome. The illustrated version can be viewed on my WeChat public account; in the long-Weibo copy, many images may fail to display.

  1. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Kai Sheng Tai, Richard Socher, Christopher D. Manning

The idea is simple: just as Liang Huang's group (mentioned yesterday) turned the sequential CNN into a dependency-relation-based CNN, this paper turns the sequential LSTM into a Tree-Structured LSTM.
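
For reference, the Child-Sum Tree-LSTM update from the paper, in my rendering: each node j sums its children's hidden states and keeps one forget gate per child k in C(j),

```latex
\begin{aligned}
\tilde{h}_j &= \textstyle\sum_{k \in C(j)} h_k \\
i_j    &= \sigma\big(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\big) \\
f_{jk} &= \sigma\big(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\big) \\
o_j    &= \sigma\big(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\big) \\
u_j    &= \tanh\big(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\big) \\
c_j    &= i_j \odot u_j + \textstyle\sum_{k \in C(j)} f_{jk} \odot c_k \\
h_j    &= o_j \odot \tanh(c_j)
\end{aligned}
```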

  1. genCNN: A Convolutional Architecture for Word Sequence Prediction

Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, Qun Liu

This paper essentially uses several CNNs to simulate an RNN, adding shared-weight / non-shared-weight variants (two feature maps). The results are good.

  1. Abstractive Multi-Document Summarization via Phrase Selection and Merging

Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, Rebecca Passonneau

Their main idea is to ground abstractive summarization in extracting and combining phrases; the basic unit is the phrase. They make two observations: NP phrases mainly express concepts, and VP phrases mainly express facts. So their work focuses on extracting just these two kinds of phrases and building the abstractive summary from them. The framework has three parts: phrase extraction; phrase salience scoring and sentence generation as an optimization problem (handled simultaneously); and postprocessing. I find it quite intuitive: all scoring and selection operate on the phrase unit, and sentence generation is treated as an optimization problem. All three parts involve many heuristics, yet it does not feel dirty. The second part of the evaluation uses the five DUC criteria (grammaticality, non-redundancy, referential clarity, focus, and coherence); I do not know whether that has become the standard setup. Finally, I think the introduction is well written, though pulling compression-based methods out of extractive approaches as a second category of its own may be a bit unconventional.

  1. Deep Unordered Composition Rivals Syntactic Methods for Text Classification

Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, Hal Daumé III

The idea is simple, simple, simple (a bit like the architecture of SIGIR'15's HRM): a deep averaging network, DAN. What is DAN for? The authors' pitch goes: your RecNN (that is what the authors call it, though I recall seeing Socher call it ReNN), the recursive NN, can handle very complex syntactic + ordered composition, folding in negation and all those syntactic phenomena. But so what? It is too complex: to push accuracy you must attach a classifier at every RecNN node for supervision, and every node has its own computation, so training is too slow. Might this be a sledgehammer cracking a nut?

So the authors built this simple but useful architecture. Each sentence comes in word by word, the input unit being each word's embedding, and the embeddings are simply averaged (the authors note that most prior work found averaging to work better than summing). That is the plain neural bag of words, NBOW. Then make it deep; the usual deep FFNN intuition is that each additional layer makes the representation more abstract. Experiments show that this deep averaging network (DAN) is really almost indistinguishable from the RecNN, while training about as fast as a single-layer NBOW. The task is simple (text classification), but the analysis after the experiments is quite good. If interested, just look at the architecture figure and Section 5.
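
A minimal sketch of a DAN forward pass under my reading; the word-dropout rate and all names are illustrative (the paper uses a word-dropout trick, dropping whole words before averaging).

```python
import numpy as np

def dan_forward(word_vecs, layers, dropout_p=0.3, rng=None):
    """Deep Averaging Network forward pass (a sketch of the idea).

    word_vecs: (n_words, d) embedding rows for one sentence.
    layers: list of (W, b) pairs for the deep feed-forward part.
    dropout_p: word dropout; whole words are dropped before averaging
    (pass rng=np.random.default_rng(...) at training time, None at test).
    """
    if rng is not None:
        keep = rng.random(len(word_vecs)) > dropout_p
        if keep.any():                # avoid dropping the entire sentence
            word_vecs = word_vecs[keep]
    h = word_vecs.mean(axis=0)        # NBOW: average the word embeddings
    for W, b in layers:               # deep part: each layer more abstract
        h = np.tanh(W @ h + b)
    return h                          # fed into a softmax classifier
```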

This year's Best Student Paper went to Sascha Rothe of the University of Munich and his advisor Hinrich Schutze, for "AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes".

The paper does not read all that smoothly, mainly because there are a lot of concepts. Let me reorganize them:

The paper studies three data types: word, synset, and lexeme, all common in lexical resources such as WordNet, Freebase, and Wiktionary. The author wants to use their relations within these resources as constraints, so as to learn word, synset, and lexeme embeddings jointly in one space. Moreover, starting from any existing word embeddings and any existing resource, the synset and lexeme embeddings can be obtained without any additional training corpus.

First, the three data types:

Word needs no explanation. A synset is a set of synonyms, made up of several lexemes tied to different words. A lexeme (I do not know the Chinese term) carries both the sense of one word with several meanings and of one word with several (syntactic) forms. For concrete examples, see the second paragraph of Section 2.

Based on the three data types, the author gives two motivations, two observations, and two assumptions (all really the same thing):

A word in WordNet can be viewed as a composition of several lexemes. Lexemes from different words together can form a synset. When a synset is given, it can be decomposed into its lexemes. And these lexemes then join to form words. These observations are the basis for the formalization of the constraints encoded in WordNet that will be presented in the next section: we view words as the sum of their lexemes and, analogously, synsets as the sum of their lexemes.

This then serves as the constraints, Equations (1) and (2), and is also the main flow of the Figure 1 architecture: word->lexeme->synset->lexeme->word.
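
In my shorthand for those sum constraints (the indexing is mine, not the paper's exact Equations (1) and (2)): writing l^(i,j) for the embedding of the lexeme linking word i and synset j (zero if there is none),

```latex
w^{(i)} \;=\; \sum_{j} l^{(i,j)}, \qquad
s^{(j)} \;=\; \sum_{i} l^{(i,j)}
```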

Beyond these two motivations and two constraints, the author has a third motivation and a third constraint:

The third motivation, from Section 1, is the view that:

The next thing to notice is that this does not only work for words that combine several properties, but also for words that combine several senses. The vector of suit can be seen as the sum of a vector representing lawsuit and a vector representing business suit. AutoExtend is designed to take word vectors as input and unravel the word vectors to the vectors of their lexemes. The lexeme vectors will then give us the synset vectors

The third constraint, for its part, is based on a property of the resources; it appears in Section 2.4 and addresses the case where a word has no synset.

By yido (a Jianshu author)
Original link: http://www.jianshu.com/p/73dffce2c23a
Copyright belongs to the author. For reprints, please contact the author for authorization and credit the "Jianshu author".
