Word Vector Representation

  1. SVD Based Methods

    1.1 Word-Document Matrix

    1.2 Window-Based Co-occurrence Matrix

    In this method we count the number of times each word appears inside a window of a particular size around the word of interest, and we compute this count for every word in the corpus.
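
    As an illustration, a window-based co-occurrence matrix for a small toy corpus might be built along the following lines. This is a minimal NumPy sketch; the toy corpus and the window size of 1 are illustrative assumptions, not values from the text.

```python
import numpy as np

corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]
window = 1  # count co-occurrences within 1 position of the word of interest

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# X[i, j] = number of times word j appears in the window around word i
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                X[index[word], index[sent[j]]] += 1

print(vocab)
print(X)
```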

    1.3 Advantages: Both of these methods give us word vectors that are more than sufficient to encode semantic and syntactic information.

    1.4 Shortcomings:

    • The dimensions of the matrix change very often (new words are added very frequently and the corpus changes in size)

    • The matrix is extremely sparse since most words do not co-occur

    • The matrix is very high dimensional in general

    • Quadratic cost to train (i.e., to perform SVD)
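
    Despite these shortcomings, the dimensionality reduction itself is a single SVD call. A sketch continuing from the co-occurrence matrix built above; keeping k = 2 dimensions is an illustrative assumption.

```python
# Apply SVD to the co-occurrence matrix X from the previous sketch and
# keep only the first k singular dimensions as low-dimensional word vectors.
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # illustrative choice of reduced dimensionality
word_vectors = U_svd[:, :k] * s[:k]   # one k-dimensional row per vocabulary word

for w in vocab:
    print(w, word_vectors[index[w]])
```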


  2. Iteration Based Methods

    2.1 CBOW Model

    Key idea: predicting a center word from the surrounding context

    Unknowns: two matrices, V ∈ R^(n×|V|) and U ∈ R^(|V|×n)

    Notation for CBOW Model:

    • w_i: word i from vocabulary V

    • V ∈ R^(n×|V|): input word matrix

    • v_i: the input vector representation of word w_i

    • U ∈ R^(|V|×n): output word matrix

    • u_i: the output vector representation of word w_i

    Steps:

    • We generate our one-hot word vectors (x^(c-m), …, x^(c-1), x^(c+1), …, x^(c+m)) for the input context of size m.

    • We get our embedded word vectors for the context (v_(c-m) = V x^(c-m), v_(c-m+1) = V x^(c-m+1), …, v_(c+m) = V x^(c+m))

    • Average these vectors to get v̂ = (v_(c-m) + v_(c-m+1) + … + v_(c+m)) / 2m

    • Generate a score vector z = Uv̂

    • Turn the scores into probabilities ŷ = softmax(z)

    • We desire our generated probabilities, ŷ, to match the true probabilities, y, which also happens to be the one-hot vector of the actual center word (a minimal sketch of these steps appears after the figure below)
    Figure: CBOW Model
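
    A minimal NumPy sketch of one CBOW forward pass. The vocabulary size, embedding dimension, context size, and the word indices below are illustrative assumptions, not values from the text.

```python
import numpy as np

V_size, n, m = 10, 4, 2             # |V|, embedding dimension, context size (assumed)
rng = np.random.default_rng(0)
V = rng.normal(size=(n, V_size))    # input word matrix,  V in R^(n x |V|)
U = rng.normal(size=(V_size, n))    # output word matrix, U in R^(|V| x n)

context_ids = [1, 2, 4, 5]          # indices of the 2m context words (assumed)
center_id = 3                       # index of the true center word (assumed)

def one_hot(i, size):
    x = np.zeros(size)
    x[i] = 1.0
    return x

# Look up the context embeddings, average them, score, and apply softmax.
context_vectors = [V @ one_hot(i, V_size) for i in context_ids]
v_hat = sum(context_vectors) / (2 * m)
z = U @ v_hat
y_hat = np.exp(z) / np.exp(z).sum()      # softmax over the vocabulary
loss = -np.log(y_hat[center_id])         # cross-entropy against the one-hot target y
print(y_hat.round(3), loss)
```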

    2.2 Skip-Gram Model

    Key idea: predicting surrounding context words given a center word

    Steps:

    • We generate our one-hot input vector x for the center word

    • We get our embedded word vector for the center word, v_c = Vx

    • Since there is no averaging, just set v̂ = v_c

    • Generate 2m score vectors, u_(c-m), …, u_(c-1), u_(c+1), …, u_(c+m), using u = U v_c

    • Turn each of the scores into probabilities, y = softmax(u)

    • We desire our generated probability vector to match the true probabilities y^(c-m), …, y^(c-1), y^(c+1), …, y^(c+m), the one-hot vectors of the actual output words (a sketch of these steps appears after the figure below)

    Figure: Skip-Gram Model
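
    A minimal NumPy sketch of one Skip-Gram forward pass, reusing the matrices V and U, the one_hot helper, and the context_ids and center_id indices assumed in the CBOW sketch above.

```python
x = one_hot(center_id, V_size)      # one-hot vector of the center word
v_c = V @ x                         # embedded vector of the center word
v_hat = v_c                         # no averaging in Skip-Gram

u = U @ v_hat                       # score vector, shared by all 2m predictions
y = np.exp(u) / np.exp(u).sum()     # softmax probabilities over the vocabulary

# Cross-entropy against each of the 2m one-hot context targets.
loss = -sum(np.log(y[i]) for i in context_ids)
print(y.round(3), loss)
```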
