Lecture 1: Introduction & Vector Representations of Text

1. Why is NLP challenging?

a. Natural languages are not designed; they evolve.

  • new words appear constantly
  • syntactic rules are flexible
  • ambiguity is inherent

b. World knowledge is necessary for interpretation.

c. There are many languages.

2. NLP vs. ML

  • NLP is a confluence of computer science, artificial intelligence and
    linguistics
  • ML provides statistical techniques for problem solving by learning from data
  • ML is often used in modelling NLP tasks

3. NLP vs. Computational Linguistics

  • Both mostly use text as data
  • In Computational Linguistics (CL), computational/statistical methods are used to support the study of linguistic phenomena and theories
  • In NLP, the scope is more general: computational methods are used for translating text, extracting information, answering questions, etc.

4. Vectors and Vector Space

Vector: a one-dimensional array of numbers
Vector space: a collection of vectors of the same dimensionality, typically stored as the rows of a matrix

5. Vector similarity

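Dot (inner) product: takes two equal-length sequences of numbers (i.e. vectors) and returns a single value

Cosine similarity: the dot product normalised by dividing by the product of the vectors' lengths (also called magnitude or norm, written |x|). The result lies in [0, 1] for non-negative vectors such as counts, and in [-1, 1] in general.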
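
The two definitions above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the vectors `cat`, `dog` and `car` are made-up count vectors, and returning 0.0 for an all-zero vector is a convention assumed here.

```python
import math

def dot(x, y):
    """Dot (inner) product of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length (magnitude) of a vector: the square root of its dot product with itself."""
    return math.sqrt(dot(x, x))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector lengths."""
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot(x, y) / (nx * ny)

# Hypothetical count vectors over three context dimensions
cat = [2, 1, 0]
dog = [4, 2, 0]   # same direction as cat, twice as long
car = [0, 0, 3]   # shares no context with cat

print(dot(cat, dog))
print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, car))
```

`cat` and `dog` point in the same direction, so their cosine similarity is 1 up to floating-point rounding even though the raw dot product depends on vector length. `cat` and `car` share no non-zero dimension, so their similarity is 0.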

6. Why do we need vector representations of text?

to measure semantic similarity
for document retrieval
because clustering and classification algorithms operate on vectors

7. How to deal with the high dimensionality and sparsity of count-based matrices

Dimensionality reduction to the rescue:

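  • find the most important dimensions of the dataset
  • Singular Value Decomposition (SVD)
  • Approximation: keep only the most important dimensions as a low-rank approximation of the original matrix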
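
For reference, the truncated SVD behind this kind of approximation can be written as below. The symbols are introduced here for illustration and are not from the lecture: X is assumed to be an m × n count matrix, and k is the number of dimensions kept.

```latex
X = U \Sigma V^{\top}, \qquad
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \min(m, n)
```

Here \Sigma holds the singular values in decreasing order, and U_k, \Sigma_k, V_k keep only the first k columns (and singular values). If the rows of X correspond to words, the rows of U_k \Sigma_k can then serve as dense k-dimensional vectors in place of the original sparse rows.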

8. Limitations of word vectors

polysemy: a word with several senses still gets a single vector
antonyms: synonyms and antonyms both appear in similar contexts, so they are hard to tell apart
compositionality: it is hard to obtain the meaning of a sequence of words

9. Evaluation of word-word vectors

word similarity
improved performance in a downstream task

10. Evaluation of document vectors

Document similarity
information retrieval
text classification
plagiarism detection

11. Limitation of Document vectors

word order is ignored
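
A small sketch of why this matters, assuming a simple count-based (bag-of-words) document vector. The helper `bag_of_words` and the two example sentences are hypothetical and used only for illustration.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Two made-up documents with the same words in a different order
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# Shared vocabulary, sorted so that every vector uses the same dimensions
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
print(vec_a == vec_b)
```

The two sentences mean different things, but their count vectors are identical, so any similarity measure over these vectors treats them as the same document.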
