Training a word2vec model yourself

1. Download the data
http://www.sogou.com/labs/resource/cs.php

2. Clean the data
tar -zxvf news_sohusite_xml.full.tar.gz
cat news_sohusite_xml.dat | iconv -f gb18030 -t utf-8 | grep "<content>" >news_sohusite.txt
sed -i 's/<content>//g' news_sohusite.txt
sed -i 's/<\/content>//g' news_sohusite.txt
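
The same cleaning step can be sketched in Python, which may be handy where iconv and sed are not available. This is a minimal sketch, not the method used above: the names extract_content and clean_file are hypothetical, it assumes each <content> element sits on a single line (as the grep pipeline implies), and unlike the shell commands it drops empty <content></content> entries.

```python
# -*- coding: utf-8 -*-
import io
import re

# Matches the text between <content> and </content> on a single line.
CONTENT_RE = re.compile(r"<content>(.*?)</content>")


def extract_content(lines):
    """Yield the non-empty text found inside <content>...</content> on each line."""
    for line in lines:
        match = CONTENT_RE.search(line)
        if match:
            text = match.group(1).strip()
            if text:
                yield text


def clean_file(src_path, dst_path, src_encoding="gb18030"):
    """Read a GB18030 file and write one UTF-8 line per non-empty <content> element."""
    with io.open(src_path, encoding=src_encoding, errors="ignore") as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for text in extract_content(src):
            dst.write(text + u"\n")
```

Calling clean_file("news_sohusite_xml.dat", "news_sohusite.txt") would then stand in for the iconv, grep and sed commands.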

3. Install python-2.7.13 on CentOS
Download the tarball: wget https://www.python.org/ftp/python/2.7.13/Python-2.7.13.tgz
Write an install script named install_py27.sh (listed in step 5), then run chmod 775 install_py27.sh

4. Chinese word segmentation
pip install jieba
python -m jieba -d ' ' /opt/data/news_sohusite2.txt > /opt/data/news_sohusite_cutword.txt
python -m jieba -d ' ' other_entdesc.txt > other_entdesc_cutword.txt  # your own corpus
# Merge the segmented corpora
cat news_sohusite_cutword.txt news_tensite_cutword.txt other_entdesc_cutword.txt > w2v_chisim_corpus.txt
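
For reference, here is a rough Python equivalent of the merge step, plus a reader that turns the merged file into token lists, which is the input shape word2vec trainers generally consume. It is only a sketch under assumptions: merge_corpora and iter_sentences are hypothetical names, the files are assumed to be UTF-8 with space-separated tokens, and unlike cat it drops blank lines and collapses repeated spaces.

```python
# -*- coding: utf-8 -*-
import io


def merge_corpora(paths, out_path):
    """Concatenate segmented files into out_path; return the number of lines written."""
    count = 0
    with io.open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with io.open(path, encoding="utf-8") as src:
                for line in src:
                    tokens = line.split()
                    if tokens:  # skip blank lines
                        out.write(u" ".join(tokens) + u"\n")
                        count += 1
    return count


def iter_sentences(path):
    """Yield each non-empty line of a segmented corpus as a list of tokens."""
    with io.open(path, encoding="utf-8") as src:
        for line in src:
            tokens = line.split()
            if tokens:
                yield tokens
```

Because iter_sentences is a generator, the merged corpus is streamed line by line and never has to fit in memory at once.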


5. install_py27.sh


#!/bin/sh
# __author__ = 'junxi'

# This script performs a quick install of Python 2.7.
# Written 2017/04/11
# Keep the tarball and this script in the same directory.
echo "##############starting the python2.7 install script############"
yum -y install python-devel openssl openssl-devel gcc sqlite sqlite-devel mysql-devel libxml2-devel libxslt-devel
mkdir -p /software
mv Python-2.7.13.tgz /software
cd /software
tar -zxf Python-2.7.13.tgz
cd Python-2.7.13/
./configure --prefix=/usr/local/python2.7 --with-threads --enable-shared
make
make altinstall
mv /usr/bin/pip /usr/bin/pip_old
mv /usr/bin/easy_install /usr/bin/easy_install_old
mv /usr/bin/python /usr/bin/python_old
ln -s /usr/local/python2.7/lib/libpython2.7.so /usr/lib
ln -s /usr/local/python2.7/lib/libpython2.7.so.1.0 /usr/lib
ln -s /usr/local/python2.7/bin/python2.7 /usr/bin/python
ln -s /usr/local/python2.7/lib/libpython2.7.so /usr/lib64
ln -s /usr/local/python2.7/lib/libpython2.7.so.1.0 /usr/lib64
cd /software
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
ln -s /usr/local/python2.7/bin/pip /usr/bin/pip
echo "############switching pip to a mirror in China (Aliyun)##########"
mkdir -p /root/.pip/
touch /root/.pip/pip.conf
cat >> /root/.pip/pip.conf << EOF
[global]
index-url=http://mirrors.aliyun.com/pypi/simple/ 

[install]
trusted-host=mirrors.aliyun.com
EOF

pip install Pillow
sed -i 's#\/usr/bin/python#\/usr/bin/python2.6#g' /usr/bin/yum
yum -y install python-devel
echo 'the install script has finished......'
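
ModuleNotFoundError: No module named xxx

If you hit ModuleNotFoundError: No module named xxx, or a '__main__' is not a package error:

Add an __init__.py file to the current directory, then create a new .py file one level above the current directory and import the current directory's modules from it.

Then, in that new file, under

if __name__ == '__main__':

call the function from the current file that you want to run.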
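
The layout described above can be sketched as follows. This is an illustrative example only: the names mypkg, helper.py, worker.py and main.py are hypothetical, and the script builds the files in a temporary directory so that it can be run as-is. worker.py uses a relative import, so it is meant to be imported from main.py rather than run directly.

```python
import os
import subprocess
import sys
import tempfile


def build_demo(root):
    """Create root/mypkg/{__init__,helper,worker}.py and root/main.py; return main.py's path."""
    pkg = os.path.join(root, "mypkg")
    os.makedirs(pkg)
    # An (empty) __init__.py marks mypkg as a package.
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("def greet():\n    return 'hello from helper'\n")
    # worker.py relies on a relative import, which needs a parent package.
    with open(os.path.join(pkg, "worker.py"), "w") as f:
        f.write("from . import helper\n\ndef run():\n    return helper.greet()\n")
    # The entry script lives one level above the package and imports it.
    entry = os.path.join(root, "main.py")
    with open(entry, "w") as f:
        f.write("from mypkg import worker\n\n"
                "if __name__ == '__main__':\n"
                "    print(worker.run())\n")
    return entry


def run_entry(root):
    """Build the demo under root, run main.py, and return what it printed."""
    entry = build_demo(root)
    output = subprocess.check_output([sys.executable, entry], cwd=root)
    return output.decode().strip()


if __name__ == "__main__":
    print(run_entry(tempfile.mkdtemp()))
```

Running main.py works because Python puts the script's own directory on sys.path, so mypkg is importable as a package and the relative import inside worker.py resolves.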

pip install tensorflow timeout errors
You can use a mirror inside China instead:

pip install --index-url https://pypi.douban.com/simple tensorflow

or

pip install --index-url http://mirrors.aliyun.com/pypi/simple/ tensorflow
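
If you switch mirrors often, the command line can be assembled programmatically. The sketch below is an assumption-laden helper, not part of pip: pip_install_cmd is a hypothetical name, Python 3 is assumed, and the 60-second timeout is an arbitrary choice. It adds --trusted-host only for plain-http indexes, mirroring the trusted-host entry in the pip.conf from step 5.

```python
from urllib.parse import urlparse


def pip_install_cmd(package, index_url, timeout=60):
    """Build a pip install command that targets the given index URL."""
    cmd = ["pip", "install", "--timeout", str(timeout), "--index-url", index_url]
    parsed = urlparse(index_url)
    if parsed.scheme == "http":
        # pip does not trust plain-http indexes unless the host is whitelisted.
        cmd += ["--trusted-host", parsed.hostname]
    cmd.append(package)
    return cmd
```

The returned list can be passed to subprocess.check_call, for example subprocess.check_call(pip_install_cmd("tensorflow", "https://pypi.douban.com/simple")).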

 
