word2vec in Practice (1): Preliminaries

word2vec is a deep learning tool released by Google. It uses a shallow neural network to map words into a low-dimensional continuous real-valued space; the resulting representations are called word embeddings. The semantic similarity between two words can be measured directly by the cosine of the angle between their embedding vectors, and the vectors can further be mined with algorithms such as k-means or hierarchical clustering. The author, Tomas Mikolov, also observed an interesting phenomenon: once words are given these distributed representations, the vectors preserve certain linguistic regularities that can be expressed as simple vector addition and subtraction.
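The cosine-similarity measure and the additive regularity described above can be sketched with a few toy vectors. The 4-dimensional vectors below are invented for illustration only; real word2vec embeddings typically have 100–300 dimensions and are learned from a large corpus:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors:
    # 1.0 for identical directions, 0.0 for orthogonal ones.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "embeddings" (hypothetical values, not learned from data).
king  = np.array([0.9, 0.8, 0.1, 0.2])
man   = np.array([0.7, 0.3, 0.1, 0.1])
woman = np.array([0.6, 0.3, 0.8, 0.1])
queen = np.array([0.8, 0.8, 0.8, 0.2])

# The additive regularity: king - man + woman should land near queen.
analogy = king - man + woman
print(cosine_similarity(analogy, queen))  # close to 1.0 for these toy vectors
print(cosine_similarity(analogy, man))    # noticeably smaller
```

With real embeddings the analogy vector is usually compared against the whole vocabulary, and the nearest neighbor (excluding the query words) is returned as the answer.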
There are already many practical guides and theoretical analyses of word2vec online. The main ones are listed below:
Theoretical analyses:

Deep Learning in practice: word2vec

Deep Learning in NLP (1): Word Vectors and Language Models

A plain-language dissection of word2vec

word2vec: learning and usage introduction

Practical guides:

Running Google's open-source word2vec project on Chinese data

The ANSJ Chinese word segmentation tool (with examples)

A survey of word2vec for event mining
References:

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
[2] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
[3] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
[4] Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In Proceedings of ICASSP, pages 5528–5531. IEEE, 2011.
[5] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In Proceedings of AISTATS, pages 246–252, 2005.
[6] Andriy Mnih and Geoffrey E. Hinton. A scalable hierarchical distributed language model. Advances in Neural Information Processing Systems, 21:1081–1088, 2009.
[7] Geoffrey E. Hinton. Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986.
[8] Ronald Rosenfeld. Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE, 88(8):1270–1288, 2000.
[9] Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. Large Scale Distributed Deep Networks. In Proceedings of NIPS, 2012.
[10] http://licstar.net/archives/328
[11] http://www.cs.columbia.edu/~mcollins/loglinear.pdf
[12] Andriy Mnih and Geoffrey Hinton. Three new graphical models for statistical language modelling. In Proceedings of the 24th International Conference on Machine Learning, pages 641–648, 2007.