The environment has been saved on 矩池云 (Matpool).
Converts multiple sentences (list of str) into multiple feature vectors (numpy array, dtype=float32, shape=[n, 4800]).
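Theory
Following the idea of word2vec's skip-gram model and extending it to the sentence level, the middle sentence is used to predict the two context sentences around it. The hidden-layer output inside the dashed box in the figure above is the 4800-d feature of the whole input sentence.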
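The sentence-level skip-gram idea above can be illustrated with a minimal sketch of how training examples might be laid out: each interior sentence is paired with its previous and next sentence. The function name skip_thought_triples and the toy corpus are hypothetical and are not part of the skip-thoughts project; the sketch only shows the data layout, not the encoder or decoders.

```python
def skip_thought_triples(sentences):
    """Return (previous, current, next) triples for every interior sentence."""
    triples = []
    # The first and last sentences lack one neighbour, so they are skipped.
    for i in range(1, len(sentences) - 1):
        triples.append((sentences[i - 1], sentences[i], sentences[i + 1]))
    return triples


# Toy corpus, for illustration only.
corpus = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
]
for prev_s, cur_s, next_s in skip_thought_triples(corpus):
    # The current sentence is encoded; its two neighbours are the targets.
    print("encode: %s | predict: %s / %s" % (cur_s, prev_s, next_s))
```

In this layout the middle element of each triple plays the role of the center word in skip-gram, and the other two elements play the role of the context words.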
Environment setup
conda create -n SkipThoughts python=2.7
conda activate SkipThoughts
pip install theano==1.0.4
pip install scikit-learn==0.20.4
pip install nltk==3.1
pip install keras==2.9.0
pip install gensim==3.8.3
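After installation, it may be worth confirming that the pinned versions are the ones actually importable. The following is a rough sketch, assuming each package exposes a __version__ attribute; the helper names installed_version and check_pins are hypothetical. Note that scikit-learn is imported as sklearn.

```python
import importlib

# Import name -> version pinned by the pip commands above.
PINNED = {
    "theano": "1.0.4",
    "sklearn": "0.20.4",  # installed under the name scikit-learn
    "nltk": "3.1",
    "keras": "2.9.0",
    "gensim": "3.8.3",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins):
    """Return {name: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = installed_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in sorted(check_pins(PINNED).items()):
        print("%s: expected %s, found %s" % (name, expected, found))
```

Run inside the activated SkipThoughts environment, the script prints one line per package whose version differs from the pin (or that fails to import) and prints nothing when everything matches.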
Download the project
Choose one of the two options:
-
Download the 2 archives from the Baidu Netdisk link below, extract them, and enter the skip-thoughts-master directory.
Link: https://pan.baidu.com/s/12x_6hrEX-rWw1TluQO5UdQ?pwd=x555
Extraction code: x555
Test
Run main.py in the data directory.
# main.py
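import skipthoughts

# Load the model, then wrap it in an encoder
model = skipthoughts.load_model()
encoder = skipthoughts.Encoder(model)

# Encode a list of sentences; each sentence yields one 4800-d vector
x = ['Hello world', 'This is a test sentence']
vec = encoder.encode(x)
print vec
print vec.shape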
# Output
# numpy array dtype=float32 shape=[n, 4800]
[[ 0.00759725 -0.02385723 -0.00653511 ... -0.00756364 -0.00080156
0.00343018]
[ 0.01462886 -0.00813354 0.00445716 ... -0.05013087 -0.04203623
0.01157508]]
(2, 4800)
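A common next step with such vectors is to compare two sentences by cosine similarity. The following is a minimal pure-Python sketch; the helper cosine_similarity is hypothetical and not part of skipthoughts, and the toy 3-d vectors only stand in for two rows of the (n, 4800) array.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors, for illustration only.
a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
print(cosine_similarity(a, b))
```

With the real output, the same function could be called as cosine_similarity(vec[0], vec[1]), since iterating over a numpy row yields its elements one by one.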