StarSpace系列之一:tagspace

问题类型

TagSpace 单词、标签的嵌入
用途: 学习从短文到相关主题标签的映射,例如,在 这篇文章 中的描述。这是一个典型的分类应用。

模型: 通过学习两者的嵌入,学习的映射从单词集到标签集。 例如,输入“restaurant has great food <\tab> #restaurant <\tab> #yum”将被翻译成下图。(图中的节点是要学习嵌入的实体,图中的边是实体之间的关系。

在这里插入图片描述

训练数据

training:

training data

The AG’s news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.
新闻数据,4大类,12万篇。
World
Sports
Business
Sci/Tech

数据样例

The file classes.txt contains a list of classes corresponding to each label.

__label__2 , garca winds up best in tough going , given what sergio garca has achieved in his career already it is difficult to believe he is only 24 years old . he had a 67 yesterday , four under , to share the volvo masters lead with his fellow spaniard
__label__3 , us shares take a tumble on oil prices , new york , nov 23 ( afp ) - wall street shares slid on tuesday as oil prices surged higher and investors sensed weaknesses in the technology sector .
__label__4 , product review blackberry 7100t smartphone ( newsfactor ) , newsfactor - research in motion ' s ( nasdaq rimm ) quad-band \blackberry 7100t with \pda capabilities is a gsm/gprs ( 850/900/1800/1900 mhz ) cellular handset that can make and receive phone calls in more than 100 countries around the world .

训练

./classification_ag_news.sh
Downloading dataset ag_news
Compiling StarSpace
make: *** No targets specified and no makefile found.  Stop.
Start to train on ag_news data:
Arguments:
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
validationPatience: 10
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: dot
maxNegSamples: 10
negSearchLimit: 5
batchSize: 5
thread: 20
minCount: 1
minCountLabel: 1
label: __label__
label: __label__
ngrams: 1
bucket: 2000000
adagrad: 0
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
useWeight: 0
weightSep: :
Start to initialize starspace model.
Build dict from input file : /tmp/starspace/data/ag_news.train
Read 5M words
Number of words in dictionary:  95811
Number of la
  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值