Week 3-4: Dimensionality reduction

Problems with the simple vector approaches to similarity


Dimensionality reduction

  • looking for hidden similarities in data
  • based on matrix decomposition

Matrix decomposition


SVD

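As a standard reminder (notation chosen to match the worked example below), the singular value decomposition factors any $m\times n$ matrix $A$ as

    $A_{m\times n} = U_{m\times m}\Sigma_{m\times n}V^T_{n\times n}$

where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal, with the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ on its diagonal.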

Example

  • Assume that we have 7 documents with 9 terms.
    e.g. Document 1 contains term 6 and term 9.

  • The document-term matrix is then 9×7: each column represents a document and each row represents a term.

Remark: we have to normalize the matrix before applying the SVD.
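A minimal numpy sketch of this setup. The 9×7 incidence matrix below is made up for illustration (only the fact that Document 1 contains terms 6 and 9 is taken from the example), and unit length-normalization of each document column is just one common choice; the lecture may use a different weighting such as tf-idf.

    import numpy as np

    # Hypothetical 9x7 term-document incidence matrix: rows are terms, columns are documents.
    # Entry [i, j] = 1 if term i+1 occurs in document j+1.  The only constraint kept from
    # the example is that Document 1 (column 0) contains exactly term 6 and term 9.
    A = np.array([
        [0, 1, 0, 0, 1, 0, 0],   # term 1
        [0, 0, 1, 0, 0, 0, 1],   # term 2
        [0, 1, 0, 1, 0, 0, 0],   # term 3
        [0, 0, 0, 1, 0, 1, 0],   # term 4
        [0, 0, 1, 0, 0, 1, 0],   # term 5
        [1, 0, 0, 0, 1, 0, 0],   # term 6
        [0, 0, 0, 0, 1, 0, 1],   # term 7
        [0, 1, 1, 0, 0, 0, 0],   # term 8
        [1, 0, 0, 0, 0, 1, 1],   # term 9
    ], dtype=float)

    # Normalize each document (column) to unit Euclidean length before taking the SVD.
    A_norm = A / np.linalg.norm(A, axis=0, keepdims=True)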

  • Apply the SVD decomposition

    $M_{9\times 7} = U_{9\times 9}\Sigma_{9\times 7}V^T$


  • $\Sigma$ is the diagonal matrix of singular values, sorted in decreasing order.

  • Rank-2 $\Sigma_2$: keep only the two largest singular values and set the rest to zero.

    $U\Sigma_2$ is the rank-2 approximation of the TERMs (a 2-dimensional representation of each term),
    $\Sigma_2V^T$ is the rank-2 approximation of the DOCUMENTs (a 2-dimensional representation of each document); see the numpy sketch below.
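A numpy sketch of the decomposition and the rank-2 truncation, continuing with the hypothetical A_norm matrix defined after the normalization remark above (np.linalg.svd already returns the singular values sorted in decreasing order):

    import numpy as np

    # Full SVD of the normalized 9x7 matrix: A_norm = U @ Sigma @ Vt
    U, s, Vt = np.linalg.svd(A_norm, full_matrices=True)   # U: 9x9, s: (7,), Vt: 7x7

    Sigma = np.zeros((9, 7))
    np.fill_diagonal(Sigma, s)          # singular values on the diagonal, largest first

    # Rank-2 Sigma: keep only the two largest singular values, zero out the rest.
    s2 = s.copy()
    s2[2:] = 0.0
    Sigma2 = np.zeros((9, 7))
    np.fill_diagonal(Sigma2, s2)

    term_repr = U @ Sigma2              # U Sigma_2:   rank-2 representation of the terms
    doc_repr  = Sigma2 @ Vt             # Sigma_2 V^T: rank-2 representation of the documents

    # The same information in compact 2-dimensional coordinates:
    terms_2d = U[:, :2] @ np.diag(s[:2])     # one 2-d row per term        (9 x 2)
    docs_2d  = np.diag(s[:2]) @ Vt[:2, :]    # one 2-d column per document (2 x 7)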


Question

What do $A^TA$ and $AA^T$ mean if $A$ is the 9×7 document-term matrix?

  • $A^TA$ (7×7) is the document-document similarity matrix.
  • $AA^T$ (9×9) is the term-term similarity matrix.
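To see why, a standard one-line derivation with the convention above (rows of $A$ index terms, columns index documents):

    $(A^TA)_{ij} = \sum_{k=1}^{9} A_{ki}A_{kj}$ = (term vector of document $i$) · (term vector of document $j$)
    $(AA^T)_{ij} = \sum_{k=1}^{7} A_{ik}A_{jk}$ = (document vector of term $i$) · (document vector of term $j$)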

Latent semantic indexing (LSI, identical to LSA)

  • Dimensionality reduction = identification of hidden (latent) concepts
  • Query matching is done in the latent space
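A minimal sketch of query matching in the latent space, continuing the numpy example above (U, s, Vt come from the SVD block); the query vector and the use of cosine similarity here are illustrative assumptions, not taken from the notes.

    import numpy as np

    k = 2

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Documents in the latent space: the columns of Sigma_k V_k^T (2 x 7).
    docs_k = np.diag(s[:k]) @ Vt[:k, :]

    # Hypothetical query that contains term 6 and term 9 (the same terms as Document 1).
    q = np.zeros(9)
    q[[5, 8]] = 1.0

    # Fold the query into the same latent space: q_k = U_k^T q.
    # (The classic LSI fold-in q_hat = Sigma_k^{-1} U_k^T q differs only by a per-dimension
    #  rescaling and is used when documents are represented by the rows of V_k instead.)
    q_k = U[:, :k].T @ q

    # Rank the documents by cosine similarity to the query in the latent space.
    scores = [cosine(q_k, docs_k[:, j]) for j in range(docs_k.shape[1])]
    ranking = np.argsort(scores)[::-1]
    print("best match: Document", ranking[0] + 1)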