Better Word Representations with Recursive Neural Networks for Morphology
Semantic Compositionality through Recursive Matrix-Vector Spaces
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks
Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing
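ChengXiang Zhai's early work on language models was highly influential. In 2009 he wrote a survey monograph, Statistical Language Models for Information Retrieval, which is recommended reading.
@BatmanFly from Peking University (now on the faculty at Renmin University) and colleagues wrote Knowledge Sharing via Social Login: Exploiting Microblogging Service for Warming up Social Question Answering Websites, which builds semantic links between Weibo and Zhihu. A very nice angle as well. http://t.cn/RPOzhh4
COLING 2014 proceedings: http://t.cn/RPpdIIk . Start with this year's best paper, from the team of @刘康_自动化所 and Prof. Jun Zhao at the Institute of Automation, Chinese Academy of Sciences: Relation Classification via Convolutional Deep Neural Network. :)
Stanford's Richard Socher has a new paper at EMNLP 2014, GloVe: Global Vectors for Word Representation. At first glance it folds in ideas from algorithms such as LSA, using global word co-occurrence information to improve word vector learning. Very interesting: accuracy on the word analogy task improves on word2vec by 11%. http://t.cn/RPohHyc
From @张牧宇-哈工大SCIR at Harbin Institute of Technology (HIT): Triple based Background Knowledge Ranking for Document Enrichment represents documents with knowledge triples, very much in the same spirit as Knowledge-based Graph Document Modeling from this year's WSDM.
Also came across this HIT paper, Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources, which uses bilingual data to learn word sense representations. The multilingual angle is interesting.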
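To make "global word co-occurrence information" concrete, here is a minimal sketch of the counting step that co-occurrence-based methods start from. It is not the GloVe training procedure: the function name `cooccurrence_counts`, the window size, the unweighted counts, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count ordered (word, context) pairs that appear within `window` tokens.

    Hypothetical helper for illustration; counts are unweighted.
    """
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a token is not its own context
                    counts[(word, tokens[j])] += 1
    return counts


# Toy corpus (assumed), already tokenized.
corpus = [
    "ice is cold and solid".split(),
    "steam is hot and gas".split(),
]
counts = cooccurrence_counts(corpus, window=2)
```

The counts are aggregated over the whole corpus rather than seen one context window at a time, which is the sense of "global" here. A real pipeline would store them in a sparse matrix and fit the vectors to them.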
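Work by @周光有_CAS and Prof. Jun Zhao on community question answering: Group Non-negative Matrix Factorization with Natural Categories for Question Retrieval in Community Question Answer Archives. Word embeddings and NMF have both started to shine in NLP lately.
IBM has a paper, Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts, whose results on the fine-grained evaluation are a little under 3 percentage points above Socher's RNTN.
MSRA has a paper, A Probabilistic Model for Learning Multi-Prototype Word Embeddings, which builds on skip-gram and uses a probabilistic model with the EM algorithm to represent polysemous words. This problem has real practical value. @陈新雄_THU will also present our group's work in this area at this year's EMNLP: A Unified Model for Word Sense Representation and Disambiguation.
A joint HIT and MSRA paper, Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach, also has an interesting idea: building a sentiment lexicon with word embedding techniques. The author, @唐都钰HIT-SCIR, also has an ACL paper and an EMNLP paper this year, both on sentiment analysis. A rising star of NLP in China. :)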