Lecture 1: Introduction & Vector Representations of Text
1. Why is NLP challenging?
a. Natural languages are not designed; they evolve.
- new words appear constantly
- syntactic rules are flexible
- ambiguity is inherent
b. World knowledge is necessary for interpretation
c. So many languages
2. NLP vs. ML
- NLP is a confluence of computer science, artificial intelligence and linguistics
- ML provides statistical techniques for solving problems by learning from data
- ML is often used to model NLP tasks
3. NLP vs. Computational Linguistics
- Both mostly use text as data
- In Computational Linguistics (CL), computational/statistical methods are used to support the study of linguistic phenomena and theories
- In NLP, the scope is more general: computational methods are used for translating text, extracting information, answering questions, etc.
4. Vectors and Vector Space
Vector: a one-dimensional array of numbers
Vector space: a collection of vectors; in practice stored as a matrix, with one vector per row
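A minimal sketch of these two definitions in NumPy, using made-up toy numbers:

```python
import numpy as np

# A vector: a one-dimensional array of numbers
v = np.array([1.0, 0.0, 2.0])

# A vector space as used in these notes: a matrix whose rows are
# vectors, e.g. one row per word or per document (toy values)
V = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
])
```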
5. Vector similarity
Dot (inner) product: takes two equal-length sequences of numbers (i.e. vectors) and returns a single value
Cosine similarity: normalises the dot product by dividing by the vectors' lengths (or magnitudes, or norms) |x||y|; for non-negative count vectors the result lies in [0,1]
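Both similarity measures above can be sketched in a few lines of NumPy; the vectors here are arbitrary toy counts, not real data:

```python
import numpy as np

# Two toy count vectors of equal length (made-up values)
x = np.array([1.0, 2.0, 0.0, 3.0])
y = np.array([2.0, 1.0, 0.0, 3.0])

# Dot (inner) product: element-wise products summed into one scalar
dot = np.dot(x, y)

# Cosine similarity: dot product divided by the product of the
# vectors' lengths (norms), |x||y|
cos = dot / (np.linalg.norm(x) * np.linalg.norm(y))
```

Because both vectors are non-negative counts, `cos` falls in [0,1]; for vectors with negative components it would range over [-1,1].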
6. Why do we need vector representations of text?
for semantic similarity
for document retrieval
clustering/classification algorithms operate on vectors
7. How to deal with the high dimensionality and sparsity of count-based matrices
Dimensionality reduction to the rescue:
- find the most important dimensions of the dataset
- Singular Value Decomposition (SVD)
- low-rank approximation
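The SVD-based reduction above can be sketched as follows, using a made-up word-context count matrix (the numbers and the choice of k are illustrative only):

```python
import numpy as np

# Toy word-context count matrix: rows are words, columns are contexts
M = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 0.0, 2.0, 0.0],
])

# Full SVD factorisation: M = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the k largest singular values -> best rank-k approximation
k = 2
M_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Dense k-dimensional word vectors: scaled left singular vectors
word_vecs = U[:, :k] * S[:k]
```

Each row of `word_vecs` is now a dense k-dimensional representation of a word, replacing its sparse high-dimensional count row in `M`.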
8. Limitations of word vectors
polysemy: a single vector conflates all senses of a word
antonyms: words appearing in similar contexts may be synonyms or antonyms, and the two are hard to distinguish
compositionality: hard to obtain the meaning of a sequence of words
9. Evaluation of word vectors
word similarity
improved performance in a downstream task
10. Evaluation of document vectors
Document similarity
information retrieval
text classification
plagiarism detection
11. Limitations of document vectors
word order is ignored