NLP 1

Article link: https://blog.csdn.net/weixin_39328611/article/details/113808475

1 NLP: an interdisciplinary field that makes natural language accessible to computers

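Communication: input and output
Understanding: acquiring and using the informational and emotional content
Language assistance (checking grammar and coherence)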
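2 Related fields
Computational linguistics: provides architectural inspiration for NLP systems; NLP focuses more on the design and analysis of methods for natural language.
AI: language and concepts, and the abilities to represent and reason, depend on one another; acquiring knowledge requires the ability to extract information from natural language input.
ML: NLP relies on ML, using supervised, semi-supervised, and reinforcement learning.
Text is a discrete signal; ML models must generalize over inputs and outputs of this kind.
Speech processing: not part of NLP, but it supplies input to NLP applications; language modeling is important in both.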
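3 Natural language input varies widely in length
pipeline of modules
Specialized NLP applications are built on top of the elements of this general pipeline, as relatively simple add-ons.
end-to-end
Transform the raw input to the required output without specialized linguistic analyzer modules.
In current practice, many people use some universal analyzer modules for word segmentation or stemming,
and then use ML models that skip some traditional pipeline steps to produce the required output.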
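
To make the "pipeline of modules" idea concrete, here is a minimal sketch in plain Python. Every name in it (`tokenize`, `stem`, `run_pipeline`, `sentiment`, the word lists) is hypothetical and invented for illustration; the suffix stripper is a toy, not a real stemmer. The point is only that the application at the end is a thin add-on over general analysis modules.

```python
import re

def tokenize(text):
    # General module 1: lowercase and split into alphabetic tokens
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Toy suffix stripper (assumption: three suffixes are enough for the demo)
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stem_all(tokens):
    # General module 2: stem every token
    return [stem(t) for t in tokens]

def run_pipeline(text, modules):
    # Feed the output of each module into the next one
    data = text
    for module in modules:
        data = module(data)
    return data

# Application-specific add-on built on top of the general modules
POSITIVE = {"enjoy", "like"}
NEGATIVE = {"fail"}

def sentiment(stems):
    score = sum((s in POSITIVE) - (s in NEGATIVE) for s in stems)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

analysis = [tokenize, stem_all]
print(sentiment(run_pipeline("She enjoyed the talks", analysis)))
print(sentiment(run_pipeline("The demo failed", analysis)))
```

An end-to-end system would replace `analysis` and `sentiment` with a single learned model that maps the raw string directly to the label.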
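4 Transfer learning
End-to-end models are pretrained on unsupervised tasks over very large text collections.
Keep the pretrained weights (with small adjustments) and add a few shallow layers to form a task-specific model.
This resembles the components of a traditional pipeline: some parts learn morphology, others learn semantics.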
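
The "keep the pretrained weights, add a shallow layer" recipe can be sketched as follows. This is only an illustration under strong assumptions: `PRETRAINED` is a tiny hand-written table standing in for weights that would really come from large-scale pretraining, the encoder is frozen entirely (no small adjustments), and the shallow layer is a logistic regression head trained with plain gradient steps.

```python
import math

# Hypothetical stand-in for pretrained weights (2-dimensional word vectors)
PRETRAINED = {
    "good": [1.0, 0.2], "great": [0.9, 0.1],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.3],
    "movie": [0.0, 1.0], "film": [0.1, 0.9],
}

def encode(tokens):
    # Frozen encoder: average the pretrained vectors of the known tokens
    vecs = [PRETRAINED[t] for t in tokens if t in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def predict_proba(head, features):
    # Shallow task-specific layer: logistic regression on the encoder output
    w, b = head
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    # Only the head (w, b) is updated; PRETRAINED is never modified
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            f = encode(tokens)
            err = predict_proba((w, b), f) - label
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

train = [
    (["good", "movie"], 1), (["awful", "film"], 0),
    (["great", "film"], 1), (["bad", "movie"], 0),
]
head = train_head(train)
print(predict_proba(head, encode(["great", "movie"])) > 0.5)
```

Because the encoder already separates positive from negative words, four labeled examples are enough to fit the head in this toy setting.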
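5 Supervised NLP as an optimization problem

Prediction uses a scoring function (the model).
x and y may have complex internal structure; y can be a tree, for example.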
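
One common way to write prediction with a scoring function is shown below. The symbols are assumptions borrowed from the notation in Eisenstein's textbook, not something stated in these notes: $\Psi$ is the scoring function, $\boldsymbol{\theta}$ its parameters, and $\mathcal{Y}(x)$ the set of candidate outputs for input $x$.

```latex
\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \boldsymbol{\theta})
```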
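6 Learning is the process of finding the optimal parameters, using numerical optimization methods on supervised data.
Search is the process of finding the best y for a given x, i.e. outputting the argmax. Y(x) is sometimes very large, for example the set of parse trees, so search requires combinatorial optimization.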
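
The split between learning and search can be illustrated with a small perceptron-style classifier. This is a sketch under assumptions, not the method of any particular source: the label set `LABELS` plays the role of Y(x) and is small enough to enumerate, the features are hypothetical (word, label) pairs, and the perceptron update is swapped in as one simple example of numerical parameter fitting.

```python
from collections import defaultdict

LABELS = ["animal", "vehicle"]  # Y(x): a tiny, enumerable candidate set

def features(x, y):
    # Joint features: each word of x paired with the candidate label y
    return [(word, y) for word in x.split()]

def score(theta, x, y):
    # Linear scoring function: sum of the weights of the active features
    return sum(theta[f] for f in features(x, y))

def search(theta, x):
    # Search: enumerate Y(x) and return the highest-scoring y (the argmax)
    return max(LABELS, key=lambda y: score(theta, x, y))

def learn(data, epochs=5):
    # Learning: adjust theta whenever search returns the wrong label
    theta = defaultdict(float)
    for _ in range(epochs):
        for x, gold in data:
            pred = search(theta, x)
            if pred != gold:
                for f in features(x, gold):
                    theta[f] += 1.0
                for f in features(x, pred):
                    theta[f] -= 1.0
    return theta

data = [
    ("the cat sleeps", "animal"), ("the car stops", "vehicle"),
    ("a dog barks", "animal"), ("a truck stops", "vehicle"),
]
theta = learn(data)
print(search(theta, "a cat sleeps"))
```

When y is a parse tree, `search` can no longer enumerate every candidate, which is where combinatorial optimization such as dynamic programming comes in.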
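7 Relational perspective: the semantic relations between the concepts expressed in an utterance
(for example, how do we know that cat belongs to the category of animals?)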
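
One way to answer the cat question is to store explicit relations and follow them. The sketch below assumes a hand-written is-a table (`HYPERNYMS`, hypothetical and deliberately tiny); real systems would use a curated lexical resource instead.

```python
# Hypothetical is-a relations: each word points to its more general category
HYPERNYMS = {
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "dog": "canine", "canine": "mammal",
    "car": "vehicle",
}

def is_a(word, category):
    # Follow is-a links upward until the category is found or the chain ends
    seen = set()
    current = word
    while current in HYPERNYMS and current not in seen:
        seen.add(current)
        current = HYPERNYMS[current]
        if current == category:
            return True
    return False

print(is_a("cat", "animal"))
print(is_a("car", "animal"))
```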
8 Compositional perspective: analyze the meaning of an expression according to its internal structure
un|bear|able|s
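
The segmentation above can be reproduced with a naive affix stripper. The affix lists and the minimum stem length are assumptions chosen for this one example; the function will over-segment many real words (for instance it would split "under"), so treat it as a sketch of the idea rather than a morphological analyzer.

```python
PREFIXES = ["un", "re"]           # hypothetical, incomplete prefix list
SUFFIXES = ["able", "ness", "s"]  # hypothetical, incomplete suffix list

def segment(word, min_stem=3):
    # Repeatedly peel one prefix and one suffix until nothing more matches
    prefixes, suffixes = [], []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= min_stem:
                prefixes.append(p)
                word = word[len(p):]
                changed = True
                break
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= min_stem:
                suffixes.insert(0, s)  # keep suffixes in surface order
                word = word[: -len(s)]
                changed = True
                break
    return prefixes + [word] + suffixes

print("|".join(segment("unbearables")))  # un|bear|able|s
```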
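9 Distributional perspective: we know neither the meaning of the word nor the meaning of its parts, so we rely on words having similar distributions.
The advantage is that this can be learned automatically from large but unlabeled text collections, without expert knowledge or symbolic annotation.
There are also drawbacks, such as rare words, and the approach cannot explain why similarity can be learned from these similar distributions.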
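
A minimal sketch of the distributional idea: count the words that appear near each word and compare the resulting count vectors with cosine similarity. The five-sentence corpus, the window size, and the function names are all invented for illustration; real distributional models are trained on far larger collections.

```python
import math
from collections import Counter, defaultdict

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

def context_vectors(sentences, window=2):
    # For every word, count the words seen within `window` positions of it
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    # Cosine similarity between two sparse count vectors
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = context_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))
```

No labels or dictionaries are used: "cat" and "dog" come out as similar only because they share contexts. A word that never occurs in the corpus gets an empty vector and a similarity of 0.0 to everything, which is the rare-word weakness in its most extreme form.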

References

Dan Jurafsky and James H. Martin,
Speech and Language Processing, 3rd ed.
https://web.stanford.edu/~jurafsky/slp3
Jacob Eisenstein,
Natural Language Processing.
https://github.com/jacobeisenstein/gt-nlp-class
https://spacy.io/usage/processing-pipelines