Paper reading: "A Simple but Tough-to-Beat Baseline for Sentence Embeddings"

https://openreview.net/pdf?id=SyK00v5xx
Original post: http://www.hankcs.com/nlp/cs224n-sentence-embeddings.html

Motivation for sentence embeddings

Although this course has so far focused on how word vectors encode the meaning of words, what natural language processing really cares about is the meaning of whole sentences.

If we can obtain a vector representation of a sentence, the similarity between two sentences can be computed conveniently with an inner product.
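A minimal sketch of the inner-product idea, using made-up 3-dimensional sentence vectors. The vectors and the function names `dot` and `cosine` are illustrative assumptions, not taken from the lecture. Cosine similarity is included as the length-normalized variant of the inner product.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Inner product normalized by the lengths of both vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical sentence vectors, invented for illustration only.
s1 = [1.0, 2.0, 0.0]
s2 = [2.0, 4.0, 0.0]  # points in the same direction as s1
s3 = [0.0, 0.0, 3.0]  # orthogonal to s1

print(dot(s1, s2), cosine(s1, s2))
print(dot(s1, s3), cosine(s1, s3))
```

Vectors pointing in the same direction score high, while orthogonal vectors score zero.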

We can also build a classifier on top of these sentence vectors, for example to do sentiment analysis.

Existing methods

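How exactly do we get from word vectors to a sentence vector? There are many approaches, such as the simple linear operations over word vectors used in bag-of-words models.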
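Averaging is one such linear operation. The sketch below assumes a toy lookup table `toy_vectors` with 2-dimensional vectors. Both the table and the helper name `average_embedding` are hypothetical.

```python
def average_embedding(tokens, word_vectors):
    """Bag-of-words sentence vector: the mean of the known word vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical word vectors, invented for illustration only.
toy_vectors = {
    "the": [0.0, 1.0],
    "cat": [2.0, 0.0],
    "sat": [1.0, 2.0],
}

sentence_vector = average_embedding(["the", "cat", "sat"], toy_vectors)
print(sentence_vector)  # [1.0, 1.0]
```

Word order plays no role here: any permutation of the same tokens gives the same vector, which is the usual limitation of bag-of-words representations.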

Later lectures will use recurrent neural networks, recursive neural networks, and CNNs to do the same thing.

New method

The Princeton University paper introduced today takes an unconventional route instead and uses a simple unsupervised method. It is simple enough to need only two steps.

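Step one: multiply each word vector in the sentence by its own weight. The weight is a constant $\alpha$ divided by the sum of $\alpha$ and the word's frequency, that is, $\alpha/(\alpha+p(w))$ where $p(w)$ is the frequency of word $w$, so high-frequency words are down-weighted. Summing the weighted word vectors gives a provisional sentence vector.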
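The first step might be sketched as follows. The default `alpha=1e-3`, the toy vectors, and the toy word probabilities are assumptions chosen for illustration, and the function names are hypothetical.

```python
def frequency_weight(word_prob, alpha=1e-3):
    """Weight alpha / (alpha + p(w)); frequent words get smaller weights."""
    return alpha / (alpha + word_prob)

def weighted_sum(tokens, word_vectors, word_probs, alpha=1e-3):
    """Provisional sentence vector: the sum of the weighted word vectors."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for t in tokens:
        w = frequency_weight(word_probs[t], alpha)
        for i, x in enumerate(word_vectors[t]):
            total[i] += w * x
    return total

# Hypothetical vectors and unigram probabilities, invented for illustration.
vectors = {"the": [0.0, 1.0], "cat": [2.0, 0.0]}
probs = {"the": 0.05, "cat": 0.001}

print(frequency_weight(probs["the"]))  # small weight for a frequent word
print(frequency_weight(probs["cat"]))  # larger weight for a rare word
provisional = weighted_sum(["the", "cat"], vectors, probs)
print(provisional)
```

With these toy numbers the rare word "cat" dominates the provisional vector, while the frequent word "the" contributes very little.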

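Step two: compute the first principal component $u$ of the matrix formed by all sentence vectors in the corpus, and subtract from each sentence vector its projection onto $u$ (similar to PCA). The projection of a vector $v$ onto another vector $u$ is defined as: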

$$\text{Proj}_u v=\frac{u u^Tv}{\Vert u \Vert^2}$$
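The projection step can be sketched in plain Python. To stay dependency-free, this example approximates the dominant direction with power iteration on $X^TX$, where the rows of $X$ are the sentence vectors, and it does not mean-center the rows. Both choices are assumptions of this sketch rather than something the text above specifies. The toy sentence vectors are invented.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def first_direction(rows, iterations=100):
    """Approximate the dominant direction of the rows by power iteration."""
    dim = len(rows[0])
    u = [1.0] * dim  # arbitrary nonzero starting vector
    for _ in range(iterations):
        scores = [dot(r, u) for r in rows]  # X u
        nxt = [sum(s * r[i] for s, r in zip(scores, rows))
               for i in range(dim)]          # X^T (X u)
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u

def remove_projection(v, u):
    """Return v minus its projection onto u: v - (u u^T v) / ||u||^2."""
    scale = dot(u, v) / dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]

# Hypothetical provisional sentence vectors sharing a large common component.
sentence_vectors = [[3.0, 0.1], [2.0, -0.1], [4.0, 0.2]]
u = first_direction(sentence_vectors)
cleaned = [remove_projection(v, u) for v in sentence_vectors]
print(u)
print(cleaned)
```

After the subtraction every cleaned vector is orthogonal to `u`, so the component shared by all sentences no longer dominates their inner products. On real data a linear algebra library's SVD routine would be the more usual way to obtain `u`.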

Probabilistic interpretation

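The underlying idea is that, given a context vector, the probability of a word appearing is determined by two terms: the word's frequency, which acts as a smoothing term, and the context.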

The second term means that a smoothly varying context emits words at random.
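The two terms can be written out roughly as below. This is a sketch based on my reading of the paper's model, not a verbatim copy of it. The symbol $\lambda$ for the mixing coefficient is my own choice, made to avoid a clash with the weighting constant $\alpha$ of the first step. $c_s$ is the context vector of sentence $s$, $v_w$ the vector of word $w$, $p(w)$ the word frequency, and $Z$ a normalizing constant.

$$\Pr[w \mid c_s]=\lambda\,p(w)+(1-\lambda)\,\frac{\exp(\langle c_s, v_w\rangle)}{Z}$$

The first term lets any word appear in proportion to its frequency regardless of context. The second term favors words whose vectors have a large inner product with the context. As I understand it, the paper obtains the frequency-based weights of the first step by approximately maximizing the likelihood of $c_s$ under this model.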

Results

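On sentence similarity tasks the method performs above average and even beats some more complex models. On sentence classification it is also clearly effective, in some cases achieving the best score.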
