
Week 3-4: Dimensionality reduction


Problems with the simple vector approaches to similarity

(figure)

Dimensionality reduction

  • looking for hidden similarities in data
  • based on matrix decomposition

Matrix decomposition

(figure)

SVD

(figure)

Example

  • Assume that we have 7 documents and a vocabulary of 9 terms; e.g. Document 1 contains term 6 and term 9.

  • The term-document matrix is therefore 9×7: each column represents a document and each row represents a term.

Remark: we have to normalize the matrix before applying the SVD.
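
A minimal sketch of this preprocessing step (the term occurrences below are invented, and unit-length column normalization is only one possible choice, since the notes do not specify which normalization is used):

```python
import numpy as np

# Hypothetical 9x7 term-document incidence matrix: entry (t, d) is 1 if
# term t occurs in document d.  The shape matches the example above; the
# occurrences themselves are invented for illustration.
rng = np.random.default_rng(0)
M = (rng.random((9, 7)) > 0.6).astype(float)

# One common normalization: scale each document (column) to unit Euclidean
# length, so that long documents do not dominate the decomposition.
col_norms = np.linalg.norm(M, axis=0)
col_norms[col_norms == 0] = 1.0          # guard against empty documents
M_normalized = M / col_norms

print(M_normalized.shape)                # (9, 7)
```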

  • Apply the SVD (a runnable sketch follows this list):

    $$M_{9\times 7} = U_{9\times 9}\,\Sigma_{9\times 7}\,V^{T}_{7\times 7}$$

  • Σ is the (rectangular) diagonal matrix of singular values, in decreasing order.

  • Rank-2 Σ: keep only the two largest singular values and set the rest to zero, giving $\Sigma_2$.

    $U\Sigma_2$ is the rank-2 approximation of the TERMS (each term gets 2-dimensional coordinates),
    $\Sigma_2 V^{T}$ is the rank-2 approximation of the DOCUMENTS (each document gets 2-dimensional coordinates).
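
A sketch of the decomposition and the rank-2 truncation, reusing the `M_normalized` matrix from the sketch above (the variable names are illustrative, not from the notes):

```python
import numpy as np

# Full SVD of the 9x7 matrix: U is 9x9, s holds the 7 singular values, Vt is 7x7.
U, s, Vt = np.linalg.svd(M_normalized, full_matrices=True)

# Rank-2 Sigma: keep only the two largest singular values.
k = 2
U_k = U[:, :k]                # 9x2
S_k = np.diag(s[:k])          # 2x2
Vt_k = Vt[:k, :]              # 2x7

term_coords = U_k @ S_k       # one row per term in the 2-D latent space
doc_coords = S_k @ Vt_k       # one column per document in the 2-D latent space

# Rank-2 approximation of the original matrix.
M_rank2 = U_k @ S_k @ Vt_k
print(term_coords.shape, doc_coords.shape, M_rank2.shape)   # (9, 2) (2, 7) (9, 7)
```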


Question

What do $A^{T}A$ and $AA^{T}$ mean if $A$ is the 9×7 term-document matrix above?

  • $A^{T}A$ (7×7) is the document-document similarity matrix.
  • $AA^{T}$ (9×9) is the term-term similarity matrix.
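
As a quick check, both products can be computed directly (a sketch reusing the matrix built earlier as `A`):

```python
import numpy as np

A = M_normalized                  # the 9x7 term-document matrix from above

doc_doc = A.T @ A                 # 7x7: entry (i, j) compares documents i and j
term_term = A @ A.T               # 9x9: entry (i, j) compares terms i and j

print(doc_doc.shape, term_term.shape)    # (7, 7) (9, 9)
```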

Latent semantic indexing (LSI, identical to LSA)

  • Dimensionality reduction = identification of hidden (latent) concepts
  • Query matching is done in the latent space (a sketch follows below)
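
A sketch of query matching in the latent space, using the standard LSI folding-in formula $\hat{q} = \Sigma_k^{-1} U_k^{T} q$; the formula and the example query are additions for illustration, and the code reuses `U_k`, `S_k` and `doc_coords` from the earlier sketch:

```python
import numpy as np

# Hypothetical query containing term 6 and term 9 (0-based indices 5 and 8).
q = np.zeros(9)
q[[5, 8]] = 1.0

# Fold the query into the 2-D latent space: q_hat = Sigma_k^{-1} U_k^T q.
q_hat = np.linalg.inv(S_k) @ U_k.T @ q          # shape (2,)

# Rank documents by cosine similarity against their 2-D coordinates.
docs_2d = doc_coords.T                          # 7x2, one row per document
scores = docs_2d @ q_hat / (
    np.linalg.norm(docs_2d, axis=1) * np.linalg.norm(q_hat) + 1e-12
)
print(np.argsort(-scores))                      # document indices, best match first
```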
