--------------- Part 1 ---------------
1.1 Chinese Lexical Analysis
Chinese lexical analysis tools:
NLPIR (CAS Institute of Computing Technology) http://ictclas.nlpir.org/nlpir/
ansj segmenter https://github.com/NLPchina/ansj_seg
LTP (Harbin Institute of Technology) https://github.com/HIT-SCIR/ltp
THULAC (Tsinghua University) https://github.com/thunlp/THULAC
Stanford Word Segmenter https://nlp.stanford.edu/software/segmenter.shtml
HanLP segmenter https://github.com/hankcs/HanLP
jieba segmenter (cppjieba, C++ port) https://github.com/yanyiwu/cppjieba
KCWS segmenter (character embeddings + Bi-LSTM + CRF) https://github.com/koth/kcws
IKAnalyzer https://github.com/wks/ik-analyzer
ZPar https://github.com/frcchang/zpar/releases
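Most of the tools above combine a word dictionary with statistical or neural disambiguation. The dictionary-matching core can be illustrated with a toy forward-maximum-matching (FMM) segmenter; the vocabulary below is an invented example, not any tool's actual dictionary:

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position, greedily take the
    longest dictionary word; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # try the longest candidate first, down to a single character
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in vocab:
                words.append(cand)
                i += n
                break
    return words

vocab = {"研究", "研究生", "生命", "起源"}  # toy dictionary
print(fmm_segment("研究生命起源", vocab))
```

Note how greedy matching splits 研究生命起源 as 研究生/命/起源 rather than the intended 研究/生命/起源 — exactly the ambiguity that motivates the statistical and neural approaches (e.g. the Bi-LSTM+CRF in KCWS).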
1.2 Chinese Semantic Representation
Related papers:
1. Efficient Estimation of Word Representations in Vector Space, 2013
2. Distributed Representations of Words and Phrases and their Compositionality, 2013
3. Distributed Representations of Sentences and Documents, 2014
4. Enriching Word Vectors with Subword Information, 2016
5. GloVe: Global Vectors for Word Representation, 2014
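The core of the word2vec skip-gram objective (papers 1–2) is predicting the context words within a fixed window around each center word; every (center, context) pair becomes one positive training example. A minimal, library-free sketch of that pair extraction:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs as consumed by the skip-gram objective."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# → [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```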
1.3 Sentence-Level Vector Modeling
Skip-thoughts
Quick-thoughts
ELMo
ULMFiT
GPT
BERT
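Before these dedicated sentence encoders, the standard baseline was simply averaging word vectors; the models above improve on it by modeling word order and context. A sketch of that baseline, using a made-up two-dimensional embedding table:

```python
def avg_sentence_vector(tokens, emb):
    """Average the vectors of known tokens: the bag-of-words baseline
    that Skip-thoughts, ELMo, BERT, etc. all aim to beat."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None  # no known words: no representation
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

emb = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}  # toy embeddings
print(avg_sentence_vector(["good", "movie"], emb))
```

Because averaging discards word order, "not good, but cheap" and "not cheap, but good" get the same vector — the kind of failure contextual encoders fix.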
Related papers:
1. Skip-Thought Vectors, 2015
2. Deep contextualized word representations, 2018
https://github.com/strongio/keras-elmo
https://github.com/allenai/bilm-tf
https://zhuanlan.zhihu.com/p/51132034
3. Universal Language Model Fine-tuning for Text Classification
4. Improving Language Understanding by Generative Pre-Training
https://github.com/openai/finetune-transformer-lm
5. Generating Wikipedia by Summarizing Long Sequences
6. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
--------------- Part 2: NLP Applications on the Internet ---------------
A plain-language introduction to Support Vector Machines (the three levels of understanding SVM): https://blog.csdn.net/v_JULY_v/article/details/7624837
The author's blog (July Online / 七月在线), worth following: https://blog.csdn.net/v_JULY_v
Related papers:
1. Convolutional Neural Networks for Sentence Classification (TextCNN)
2. Character-level Convolutional Networks for Text Classification (CharCNN)
3. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification (Bi-LSTM + Attention)
4. Recurrent Convolutional Neural Networks for Text Classification (RCNN)
5. Adversarial Training Methods for Semi-Supervised Text Classification (Adversarial LSTM)
6. Attention Is All You Need (Transformer)
7. Deep contextualized word representations (ELMo)
8. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
9. Recommended text-classification practice repository: https://github.com/jiangxinyang227/textClassifier
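The essence of TextCNN (paper 1) is a 1-D convolution over the word-embedding sequence followed by max-over-time pooling, so each filter detects its n-gram pattern wherever it occurs in the sentence. A dependency-free sketch of that forward step for a single filter (the embeddings and filter weights are illustrative, not trained):

```python
def conv_maxpool(embeddings, filt):
    """Slide one filter of width len(filt) over the token embeddings,
    then take the max activation (max-over-time pooling)."""
    width = len(filt)
    scores = []
    for i in range(len(embeddings) - width + 1):
        window = embeddings[i:i + width]
        # dot product of the filter with this n-gram window
        s = sum(w * x
                for row_w, row_x in zip(filt, window)
                for w, x in zip(row_w, row_x))
        scores.append(s)
    return max(scores)

# 4 tokens with 2-d embeddings; one filter of width 2
sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
print(conv_maxpool(sent, filt))
```

In the full model, many filters of several widths run in parallel, and their pooled maxima are concatenated into the feature vector fed to the classifier.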
--------------- Part 3 ---------------
1. Sequence Generation
Description: given a text A, generate a text B.
Applications: machine translation / text summarization / generative question answering
Approaches: Encoder-Decoder (RNN, LSTM, Attention), seq2seq, etc.
1.1 SMT (Statistical Machine Translation)
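SMT is classically framed as the noisy-channel model: choose the target sentence e maximizing P(e) · P(f | e), combining a language model with a translation model, with the decoder searching over candidates. A toy decoder over hypothetical probability tables (all numbers are invented for illustration):

```python
def noisy_channel_decode(f, candidates, lm, tm):
    """Pick argmax_e P(e) * P(f|e) over a tiny explicit candidate list."""
    return max(candidates, key=lambda e: lm[e] * tm[(f, e)])

lm = {"the house": 0.6, "house the": 0.1}       # toy language model P(e)
tm = {("das Haus", "the house"): 0.5,           # toy translation model P(f|e)
      ("das Haus", "house the"): 0.5}
print(noisy_channel_decode("das Haus", ["the house", "house the"], lm, tm))
```

Here the translation model cannot distinguish the two word orders; the language model breaks the tie, which is exactly the division of labor in the noisy-channel formulation. Real decoders search a phrase lattice instead of enumerating candidates.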
1.2 NMT (Neural Machine Translation)
TensorFlow sequence-to-sequence tutorial:
https://www.tensorflow.org/tutorials/seq2seq
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/models/rnn/translate
1.3 Transformer-based Machine Translation
SMT: translation model, language model, decoder
NMT: seq2seq, Encoder-Decoder, Attention
Transformer MT: self-attention
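Self-attention, the core operation of the Transformer, scores every position against every other: Attention(Q, K, V) = softmax(QKᵀ/√d)·V. A dependency-free sketch for a tiny sequence (the vectors are illustrative; a real layer also has learned projections and multiple heads):

```python
import math

def self_attention(q, k, v):
    """Scaled dot-product attention over lists of d-dimensional vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)                      # for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]      # softmax over positions
        # weighted sum of the value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]          # 2 tokens, d = 2
print(self_attention(x, x, x))        # Q = K = V = x, as in self-attention
```

Each output row is a convex combination of the value vectors, weighted by how strongly that token attends to every position, including itself.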
Chatbots in practice (related papers):
1. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
2. Attention with Intention for a Neural Network Conversation Model
3. A Persona-Based Neural Conversation Model
4. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
5. A Diversity-Promoting Objective Function for Neural Conversation Models