Tutorial on training skip-thought vectors for sentence feature extraction.

1. Request and download the training dataset.

  The dataset used for skip-thought vectors is [BookCorpus]: http://yknzhu.wixsite.com/mbweb

  First, you should send an email to the authors of the paper to ask for the download link. Then you will download the following files:

  

  Unzip these files into the current folder.
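The extraction step can be scripted; a minimal Python sketch (the actual archive names depend on what the authors send, so every .zip in the folder is handled generically):

```python
import glob
import zipfile

# Extract every .zip archive found in the current folder.
for path in glob.glob("*.zip"):
    with zipfile.ZipFile(path) as zf:
        zf.extractall(".")
        print("extracted", path)
```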

2. Download the TensorFlow implementation.

  Follow the instructions at https://github.com/tensorflow/models/tree/master/research/skip_thoughts

  Then you will see the processing proceed as follows:

         

  [Attention] You need to install Bazel for this step, but do not update it afterwards; otherwise it may show errors in the following operations.

 

3. Install the required packages.
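Before moving on, it is worth checking that the packages import cleanly. The package list below is an assumption based on what the script in step 4 imports; adjust it to match your setup:

```python
import importlib

# Sanity-check that the packages needed by the encoder script are importable.
for pkg in ["numpy", "scipy"]:
    importlib.import_module(pkg)
    print(pkg, "imported OK")
```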

 

4. Encode sentences:

  Run the following Python file.

  

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager
import pdb

print("==>> Skip-Thought Vector ")

# Set paths to the model.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"

# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.

print("==>> loading the pre-trained models ... ")

encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)

print("==>> Done !")

# The sentence(s) to encode.
data = [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']

print("==>> the given sentence is: ", data)

# Generate a skip-thought vector for each sentence in the list.
encodings = encoder.encode(data)

print("==>> the sentence feature is: ", encodings)  # the output feature is 2400-dimensional

# Drop into the debugger to inspect the encoding interactively.
pdb.set_trace()
print("==>> the encodings[0] is: ", encodings[0])

 

wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)

 

As we can see from the terminal output above, the output feature vector is 2400-D.
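The script imports scipy.spatial.distance as sd but never uses it; the usual next step is to compare sentence encodings by cosine similarity. A minimal sketch using random stand-in vectors of the same 2400-D shape (the real vectors would come from encoder.encode):

```python
import numpy as np
import scipy.spatial.distance as sd

# Stand-ins for two 2400-D skip-thought encodings.
rng = np.random.RandomState(0)
a = rng.randn(2400)
b = rng.randn(2400)

# Cosine similarity = 1 - cosine distance; values lie in [-1, 1].
similarity = 1.0 - sd.cosine(a, b)
print("cosine similarity:", similarity)
```

Nearest-neighbour search over a set of encoded sentences then reduces to ranking by this similarity.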


Reposted from: https://www.cnblogs.com/wangxiaocvpr/p/7277025.html
