From Zero to Practice: the Zhihu "Kanshan Cup" (看山杯) First-Place init Team Solution (PyTorch)

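        First, some background: I am a Java developer and know little Python. Recently my work required analyzing and organizing a large amount of text, so I started looking for material online. Through Zhihu I learned about the Kanshan Cup competition that Zhihu hosted, found the solution published by the winning init team, and started trying it out. My approach may well be wrong.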

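Lesson learned: this kind of machine learning needs a Linux system with a GPU and plenty of memory. The system I installed in a virtual machine could not run the computation.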

        First, the Linux system must be 64-bit. I installed Linux in a virtual machine.

        Virtual machine software: VMware-workstation-full-14.1.1-7528167.exe (Workstation 14 Pro)

        Linux version: CentOS-7-x86_64-DVD-1708.iso, about 4 GB, downloaded from the Alibaba Cloud mirror site

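        The purpose of this post is to record the steps I took and how I solved the problems along the way. Because I wrote it down while working, the layout is rough; I will tidy it up once everything is done. I am also something of a perfectionist.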

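        1. After installing the system I set the memory to 2 GB and the disk to 40 GB. My computer is low-spec, so this does not meet the minimum configuration of the init solution, but I went ahead anyway to see how far I could get. At this point I cannot tell whether it will finish. The virtual machine defaults to a NAT network adapter. After booting, I ran ifconfig inside the virtual machine (a minimal install does not include this command) and there was no IP address; baidu.com could not be pinged either. The guest first has to be able to reach the host network, so I did the following: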

[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33 

After opening the file, set ONBOOT to yes (it was no at first). The edited file:

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=763f71d6-ec83-4bee-8322-4c903f6b78ed
DEVICE=ens33
ONBOOT=yes

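        After editing, save and exit, then restart the network service or reboot the system. Configuring a static IP is more work, so I skipped that step.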
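
The file above is a plain list of KEY=VALUE lines, and the only setting changed is ONBOOT. As a minimal sketch of how that setting could be checked from Python: the helper names parse_ifcfg and starts_on_boot and the shortened sample text are illustrative assumptions, not part of any CentOS tool.

```python
from __future__ import print_function


def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def starts_on_boot(settings):
    """True if the interface is configured to come up at boot."""
    return settings.get("ONBOOT", "no").lower() == "yes"


# Shortened copy of the settings shown above (hypothetical sample).
SAMPLE = """\
TYPE=Ethernet
BOOTPROTO=dhcp
NAME=ens33
DEVICE=ens33
ONBOOT=yes
"""

if __name__ == "__main__":
    cfg = parse_ifcfg(SAMPLE)
    print("device:", cfg["DEVICE"])
    print("uses DHCP:", cfg.get("BOOTPROTO") == "dhcp")
    print("comes up at boot:", starts_on_boot(cfg))
```

To check the real file, the text could be read from /etc/sysconfig/network-scripts/ifcfg-ens33 and passed to parse_ifcfg in place of SAMPLE.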

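        After the reboot I connected with XSHELL. Its display is clear and copying text is easy, whereas commands cannot be copied and pasted in the virtual machine console, which is inconvenient. Once connected, the following command shows that the network is reachable.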

[root@localhost ~]# ping baidu.com
PING baidu.com (123.125.115.110) 56(84) bytes of data.
64 bytes from 123.125.115.110 (123.125.115.110): icmp_seq=1 ttl=128 time=130 ms   

        Software will need to be installed next, so check that yum works. The output below shows that yum is usable.

[root@localhost ~]# yum install unzip
Loaded plugins: fastestmirror
base   

        The system ships with Python, and since the solution uses Python 2.7 there is no need to install it again.

[root@localhost ~]# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

        Install pip, wheel, and setuptools

[root@localhost /]# mkdir weblogic
[root@localhost /]# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  weblogic
[root@localhost /]# cd weblogic/
[root@localhost weblogic]# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1603k  100 1603k    0     0   465k      0  0:00:03  0:00:03 --:--:--  465k
[root@localhost weblogic]# python get-pip.py
Collecting pip
  Downloading https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 294kB/s 
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/7f/e1/820d941153923aac1d49d7fc37e17b6e73bfbd2904959fffbad77900cf92/setuptools-39.2.0-py2.py3-none-any.whl (567kB)
    100% |████████████████████████████████| 573kB 406kB/s 
Collecting wheel
  Downloading https://files.pythonhosted.org/packages/81/30/e935244ca6165187ae8be876b6316ae201b71485538ffac1d718843025a9/wheel-0.31.1-py2.py3-none-any.whl (41kB)
    100% |████████████████████████████████| 51kB 729kB/s 
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-10.0.1 setuptools-39.2.0 wheel-0.31.1
[root@localhost weblogic]# 
[root@localhost weblogic]# ls
get-pip.py  ipdb-0.11.tar.gz  pip-10.0.1.tar.gz  setuptools-39.2.0  setuptools-39.2.0.zip  torch-0.1.12.post2-cp27-none-linux_x86_64.whl  wheel-0.31.1  wheel-0.31.1.tar.gz
[root@localhost weblogic]# cd wheel-0.31.1
[root@localhost wheel-0.31.1]# python setup.py install

Install PyTorch

[root@localhost weblogic]# pip install torch-0.1.12.post2-cp27-none-linux_x86_64.whl 
Processing ./torch-0.1.12.post2-cp27-none-linux_x86_64.whl
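
A quick way to confirm that a package installed by pip is actually importable is to import it and read its version. The following is a minimal sketch; the helper name installed_version is an illustrative assumption, and it only catches ImportError, so a package that fails for another reason (for example a missing shared library) would still raise.

```python
from __future__ import print_function

import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Packages that appear in this walkthrough.
    for name in ("torch", "numpy", "Cython"):
        version = installed_version(name)
        if version is None:
            print("%-8s not importable" % name)
        else:
            print("%-8s %s" % (name, version))
```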

Install Git

[root@localhost PyTorchText-master]# yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc perl-ExtUtils-MakeMaker
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

 Download the git source tarball

    wget https://www.kernel.org/pub/software/scm/git/git-2.8.3.tar.gz

  Extract the git tarball

    tar -zxvf git-2.8.3.tar.gz

    cd git-2.8.3

[root@localhost git-2.8.3]# pwd
/weblogic/git-2.8.3
[root@localhost git-2.8.3]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[root@localhost git-2.8.3]# ./configure prefix=/usr/local/git/
configure: Setting lib to 'lib' (the default)
./check_bindir "z$bindir" "z$execdir" "$bindir/git-add"
[root@localhost git-2.8.3]# git --version
git version 1.8.3.1
[root@localhost git-2.8.3]# 
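
The transcript above stops after ./configure, and git --version still reports 1.8.3.1 instead of 2.8.3. That suggests the git found on PATH is still the distribution's package; a source build normally also needs make and make install, and the install prefix's bin directory on PATH. The sketch below only shows how the reported version could be compared with the expected one; parse_git_version and EXPECTED are illustrative names, not part of git.

```python
from __future__ import print_function

import re

# Version of the tarball downloaded above.
EXPECTED = (2, 8, 3)


def parse_git_version(output):
    """Turn the text printed by `git --version` into a tuple of ints."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("unrecognized output: %r" % output)
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    reported = parse_git_version("git version 1.8.3.1")
    print("reported version:", reported)
    print("matches the source build:", reported[:3] == EXPECTED)
```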
[root@localhost PyTorchText-master]# pip install Cython
Collecting Cython
  Downloading https://files.pythonhosted.org/packages/f6/23/ef5521e077e9e7ef8e4603e27713ae95fee69e9c19c7cd036b4299c7ced5/Cython-0.28.3-cp27-cp27mu-manylinux1_x86_64.whl (3.3MB)
    100% |████████████████████████████████| 3.3MB 486kB/s 
Installing collected packages: Cython
Successfully installed Cython-0.28.3
[root@localhost PyTorchText-master]# 

Installing fasttext with pip fails at first with:

ImportError: No module named Cython.Build

The fix is to install Cython first:

pip install Cython

pip install fasttext   --- this install still failed, with the output below

[root@localhost PyTorchText-master]# pip install fasttext
Collecting fasttext
  Downloading https://files.pythonhosted.org/packages/a4/86/ff826211bc9e28d4c371668b30b4b2c38a09127e5e73017b1c0cd52f9dfa/fasttext-0.8.3.tar.gz (73kB)
    100% |████████████████████████████████| 81kB 315kB/s 
Collecting numpy>=1 (from fasttext)
  Downloading https://files.pythonhosted.org/packages/c0/e7/08f059a00367fd613e4f2875a16c70b6237268a1d6d166c6d36acada8301/numpy-1.14.3-cp27-cp27mu-manylinux1_x86_64.whl (12.1MB)
    100% |████████████████████████████████| 12.1MB 392kB/s 
Collecting future (from fasttext)
  Downloading https://files.pythonhosted.org/packages/00/2b/8d082ddfed935f3608cc61140df6dcbf0edea1bc3ab52fb6c29ae3e81e85/future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 441kB/s 
Building wheels for collected packages: fasttext, future
  Running setup.py bdist_wheel for fasttext ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-KodiTL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
  running build_ext
  building 'fasttext.fasttext' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/fasttext
  creating build/temp.linux-x86_64-2.7/fasttext/cpp
  creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
  gcc: error trying to exec 'cc1plus': execvp: No such file or directory
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for fasttext
  Running setup.py clean for fasttext
  Running setup.py bdist_wheel for future ... done
  Stored in directory: /root/.cache/pip/wheels/bf/c9/a3/c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a
Successfully built future
Failed to build fasttext
Installing collected packages: numpy, future, fasttext
  Running setup.py install for fasttext ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
    running build_ext
    building 'fasttext.fasttext' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/fasttext
    creating build/temp.linux-x86_64-2.7/fasttext/cpp
    creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
    gcc: error trying to exec 'cc1plus': execvp: No such file or directory
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-DZjW32/fasttext/
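
In the fasttext log above, gcc fails while trying to exec cc1plus, which is GCC's C++ compiler front end. The earlier yum command installed gcc only; on CentOS 7 the C++ front end normally comes from the separate gcc-c++ package, so installing that package (yum install gcc-c++) is the likely fix, although the transcript shown here does not confirm it. The sketch below checks whether the needed compilers are on PATH before pip is run again; find_on_path and missing_tools are hypothetical helper names.

```python
from __future__ import print_function

import os


def find_on_path(program, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools, path=None):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if find_on_path(tool, path) is None]


if __name__ == "__main__":
    # fasttext compiles C++ sources, so both drivers are needed.
    missing = missing_tools(["gcc", "g++"])
    if missing:
        print("missing build tools:", ", ".join(missing))
    else:
        print("gcc and g++ are both available")
```
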
Install the dependencies listed in requirements.txt

pip install -r requirements.txt

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-E_05vl
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/word2vec/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-0AZ5iL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/__init__.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/_version.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/io.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/scripts_interface.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/utils.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordclusters.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordvectors.py -> build/lib.linux-x86_64-2.7/word2vec
  creating build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/__init__.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/test_word2vec.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  UPDATING build/lib.linux-x86_64-2.7/word2vec/_version.py
  set build/lib.linux-x86_64-2.7/word2vec/_version.py to '0.9.2'
  running build_ext
  building 'word2vec.word2vec_noop' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/word2vec
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c word2vec/word2vec_noop.c -o build/temp.linux-x86_64-2.7/word2vec/word2vec_noop.o
  word2vec/word2vec_noop.c:16:20: fatal error: Python.h: No such file or directory
   #include "Python.h"
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for word2vec
  Running setup.py clean for word2vec
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-nmBFRj/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-YBhBvd --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying scandir.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_scandir' extension
  creating build/temp.linux-x86_64-2.7
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
  _scandir.c:14:20: fatal error: Python.h: No such file or directory
   #include <Python.h>
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for scandir
  Running setup.py clean for scandir
Successfully built torchnet
Failed to build word2vec scandir
Installing collected packages: scandir, pathlib2, ipython-genutils, traitlets, simplegeneric, pickleshare, ipython, ipdb, fire, tqdm, torchfile, pyzmq, backports-abc, singledispatch, tornado, websocket-client, pillow, scipy, certifi, chardet, idna, urllib3, requests, visdom, word2vec, torchnet
  Running setup.py install for scandir ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying scandir.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_scandir' extension
    creating build/temp.linux-x86_64-2.7
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
    _scandir.c:14:20: fatal error: Python.h: No such file or directory
     #include <Python.h>
                        ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-aaHbvs/scandir/
[root@localhost PyTorchText-master]# 

Troubleshooting: installing the Python development headers (python-devel) on CentOS 7

[root@localhost PyTorchText-master]# yum install python-dev
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package python-dev available.
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install Python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package Python-devel available.
  * Maybe you meant: python-devel
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
Resolving Dependencies
--> Running transaction check
---> Package python-devel.x86_64.0.2.7.5-68.el7 will be installed
--> Processing Dependency: python(x86-64) = 2.7.5-68.el7 for package: python-devel-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python.x86_64.0.2.7.5-58.el7 will be updated
---> Package python.x86_64.0.2.7.5-68.el7 will be an update
--> Processing Dependency: python-libs(x86-64) = 2.7.5-68.el7 for package: python-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python-libs.x86_64.0.2.7.5-58.el7 will be updated
---> Package python-libs.x86_64.0.2.7.5-68.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================================
 Package                                       Arch                                    Version                                         Repository                             Size
===================================================================================================================================================================================
Installing:
 python-devel                                  x86_64                                  2.7.5-68.el7                                    base                                  397 k
Updating for dependencies:
 python                                        x86_64                                  2.7.5-68.el7                                    base                                   93 k
 python-libs                                   x86_64                                  2.7.5-68.el7                                    base                                  5.6 M

Transaction Summary
===================================================================================================================================================================================
Install  1 Package
Upgrade           ( 2 Dependent packages)

Total download size: 6.1 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/3): python-2.7.5-68.el7.x86_64.rpm                                                                                                                       |  93 kB  00:00:00     
(2/3): python-devel-2.7.5-68.el7.x86_64.rpm                                                                                                                 | 397 kB  00:00:03     
(3/3): python-libs-2.7.5-68.el7.x86_64.rpm                                                                                                                  | 5.6 MB  00:00:38     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                              160 kB/s | 6.1 MB  00:00:39
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : python-libs-2.7.5-68.el7.x86_64                                                                                                                                 1/5
  Updating   : python-2.7.5-68.el7.x86_64                                                                                                                                      2/5
  Installing : python-devel-2.7.5-68.el7.x86_64                                                                                                                                3/5
  Cleanup    : python-2.7.5-58.el7.x86_64                                                                                                                                      4/5
  Cleanup    : python-libs-2.7.5-58.el7.x86_64                                                                                                                                 5/5
  Verifying  : python-libs-2.7.5-68.el7.x86_64                                                                                                                                 1/5
  Verifying  : python-devel-2.7.5-68.el7.x86_64                                                                                                                                2/5
  Verifying  : python-2.7.5-68.el7.x86_64                                                                                                                                      3/5
  Verifying  : python-libs-2.7.5-58.el7.x86_64                                                                                                                                 4/5
  Verifying  : python-2.7.5-58.el7.x86_64                                                                                                                                      5/5

Installed:
  python-devel.x86_64 0:2.7.5-68.el7

Dependency Updated:
  python.x86_64 0:2.7.5-68.el7                                                          python-libs.x86_64 0:2.7.5-68.el7

Complete!
[root@localhost PyTorchText-master]# 
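
The Python.h errors earlier mean that the CPython development headers were missing, and python-devel is the package that provides them. As a minimal sketch of how to confirm the header is present before rerunning pip, the code below assumes the headers live in the directory reported by sysconfig.get_path("include"); python_header_path and has_python_header are illustrative names.

```python
from __future__ import print_function

import os
import sysconfig


def python_header_path(include_dir=None):
    """Return the path where Python.h is expected for this interpreter."""
    if include_dir is None:
        include_dir = sysconfig.get_path("include")
    return os.path.join(include_dir, "Python.h")


def has_python_header(include_dir=None):
    """True if Python.h exists in the include directory."""
    return os.path.isfile(python_header_path(include_dir))


if __name__ == "__main__":
    print("expected header:", python_header_path())
    if has_python_header():
        print("Python.h found; C extensions should be able to compile")
    else:
        print("Python.h missing; install python-devel first")
```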

After that, the install ran successfully:

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-kyfk8D
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... done
  Stored in directory: /root/.cache/pip/wheels/89/a1/cb/417bcc7143a3e2befcc82da185ce8ad4a340eb82c0bf48969c
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-oQlzp4/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... done
  Stored in directory: /root/.cache/pip/wheels/4a/ca/d7/26c3620234732f2d5b3ca86d7ccb0f59a21bd7712bffbbedc2
Successfully built word2vec torchnet scandir
Installing collected packages: scandir, pathlib2, simplegeneric, pickleshare, ipython-genutils, traitlets, ipython, ipdb, fire, tqdm, torchfile, singledispatch, backports-abc, tornado, scipy, pillow, pyzmq, websocket-client, urllib3, idna, chardet, certifi, requests, visdom, word2vec, torchnet
Successfully installed backports-abc-0.5 certifi-2018.4.16 chardet-3.0.4 fire-0.1.3 idna-2.6 ipdb-0.11 ipython-5.7.0 ipython-genutils-0.2.0 pathlib2-2.3.2 pickleshare-0.7.4 pillow-5.1.0 pyzmq-17.0.0 requests-2.18.4 scandir-1.7 scipy-1.1.0 simplegeneric-0.8.1 singledispatch-3.4.0.3 torchfile-0.1.0 torchnet-0.0.2 tornado-5.0.2 tqdm-4.23.4 traitlets-4.3.2 urllib3-1.22 visdom-0.1.8.3 websocket-client-0.48.0 word2vec-0.9.2
[root@localhost PyTorchText-master]# 
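A quick way to confirm that the install above really worked is to try importing each package. The sketch below is not part of the init repository: the helper name `missing_modules` is made up for illustration, and the import names are assumed to match the package names in the pip log.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is absent or one of its own imports is missing.
            missing.append(name)
    return missing
```

For example, `missing_modules(["ipdb", "fire", "tqdm", "visdom", "torchnet", "word2vec", "torch"])` should return an empty list if everything in requirements.txt installed cleanly.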

After installing the dependencies above, start the visdom visualization server:
```sh
python -m visdom.server
```
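To check from another shell that the server actually came up, a plain TCP probe is enough. This is a minimal sketch, assuming visdom is listening on its default port 8097 on the same machine; `port_is_open` is a hypothetical helper, not a visdom API.

```python
import socket


def port_is_open(host="localhost", port=8097, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, host unreachable, or timed out.
        return False
    conn.close()
    return True
```

If `port_is_open()` returns False, the training script's plotting calls will most likely fail to reach visdom as well.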

Reference: PyTorch Study Notes (8): visdom, the PyTorch visualization tool

At this point the environment is ready. The next step is to prepare the init source code and the data files:

[root@localhost PyTorchText-master]# ll *.txt
-rw-r--r--. 1 root root   29200241 6月   5 16:55 char_embedding.txt
-rw-r--r--. 1 root root  239862273 6月   5 16:53 question_eval_set.txt
-rw-r--r--. 1 root root  204459814 6月   5 16:52 question_topic_train_set.txt
-rw-r--r--. 1 root root 3317236306 6月   5 16:57 question_train_set.txt
-rw-r--r--. 1 root root         77 6月   5 11:45 requirements.txt
-rw-r--r--. 1 root root    1072551 6月   5 16:53 topic_info.txt
-rw-r--r--. 1 root root 1005008916 6月   5 16:55 word_embedding.txt
[root@localhost PyTorchText-master]# 
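Before starting the preprocessing it can save time to verify that every data file is in place. The following sketch simply checks for the file names shown in the listing above; `missing_files` and `REQUIRED_FILES` are illustrative names, not something the repository provides.

```python
import os

# File names taken from the `ll *.txt` listing above.
REQUIRED_FILES = [
    "char_embedding.txt",
    "word_embedding.txt",
    "question_train_set.txt",
    "question_eval_set.txt",
    "question_topic_train_set.txt",
    "topic_info.txt",
]


def missing_files(data_dir, names=REQUIRED_FILES):
    """Return the names in `names` that are not regular files in data_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(data_dir, n))]
```

Calling `missing_files(".")` from the project directory should return an empty list when all six files are present.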


## 2. Data preprocessing

### 2.1 Convert the word embeddings to NumPy arrays


[root@localhost PyTorchText-master]# python scripts/data_process/embedding2matrix.py main char_embedding.txt char_embedding.npz 
[root@localhost PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.txt
[root@localhost PyTorchText-master]# python scripts/data_process/embedding2matrix.py main word_embedding.txt word_embedding.npz 
[root@localhost PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py           word_embedding.txt
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.npz
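To make the conversion less of a black box, here is a rough pure-Python sketch of the parsing that such a script has to do. It is not the code of `embedding2matrix.py`; it assumes the embedding files use the word2vec text format (a header line with count and dimension, then one token followed by its values per line), and `parse_embedding_lines` is a hypothetical name.

```python
def parse_embedding_lines(lines):
    """Parse word2vec-style text lines into (word2id, vectors)."""
    rows = iter(lines)
    # Header line: "<number of tokens> <vector dimension>" (assumed format).
    _count, dim = (int(x) for x in next(rows).split())
    word2id, vectors = {}, []
    for line in rows:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        word2id[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])
    return word2id, vectors


# With NumPy installed, the result could then be stored in one .npz file, e.g.
#   np.savez_compressed(out_path, vector=np.asarray(vectors), word2id=word2id)
# The key "word2id" matches what question2array.py loads; the key "vector"
# is an assumption made for this sketch.
```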

### 2.2 Convert the questions to NumPy arrays

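This step uses a lot of memory; make sure the machine has more than 32 GB of RAM. I only processed the smaller file (question_eval_set.txt).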

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
Traceback (most recent call last):
  File "scripts/data_process/question2array.py", line 85, in <module>
    fire.Fire()
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "scripts/data_process/question2array.py", line 19, in main
    char2id = np.load('/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz')['word2id'].item()
  File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz'
[root@localhost PyTorchText-master]# 

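The script fails because the location of char_embedding.npz is hard-coded. After changing the paths in scripts/data_process/question2array.py to point at my own files, the command runs: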

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
217360it [00:34, 6317.30it/s]
a
b
c
d
[root@localhost PyTorchText-master]# 
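Editing hard-coded paths by hand works, but the same problem comes back on every new machine. One possible alternative, sketched below, is to build the paths from an environment variable. The variable name `ZHIHU_DATA_DIR` and the helper `data_path` are invented for this example and do not exist in the repository.

```python
import os


def data_path(filename, env_var="ZHIHU_DATA_DIR"):
    """Build a path under the directory named by env_var (default: cwd)."""
    base = os.environ.get(env_var, os.getcwd())
    return os.path.join(base, filename)
```

With that helper, the failing line in question2array.py could load `data_path('char_embedding.npz')` instead of the author's absolute `/mnt/7/...` path.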

### 2.3 Process the labels and save them as JSON

[root@localhost PyTorchText-master]# python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(17)main()
     16     import ipdb;ipdb.set_trace()
---> 17     all_labels = { _ for ii,jj in results for _ in jj }
     18     sorted_labels = sorted(all_labels)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(18)main()
     17     all_labels = { _ for ii,jj in results for _ in jj }
---> 18     sorted_labels = sorted(all_labels)
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(19)main()
     18     sorted_labels = sorted(all_labels)
---> 19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
     20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(20)main()
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
---> 20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}
     21 

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(22)main()
     21 
---> 22     d = {ii:[label2id[jj] for jj in labels ]  for ii,labels in results}
     23 

ipdb> n
n> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(24)main()
     23 
---> 24     data = dict(d=d,label2id=label2id,id2label=id2label)
     25     import json

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(25)main()
     24     data = dict(d=d,label2id=label2id,id2label=id2label)
---> 25     import json
     26     with open(outfile,'w') as f:

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(26)main()
     25     import json
---> 26     with open(outfile,'w') as f:
     27         json.dump(data,f)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> n




--Return--
None
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> 
> /usr/lib/python2.7/site-packages/fire/core.py(543)_CallCallable()
    542   result = fn(*varargs, **kwargs)
--> 543   return result, consumed_args, remaining_args, capacity
    544 

ipdb> c
[1]+  已杀死               python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
[root@localhost PyTorchText-master]# 
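The session above stops at a debugger prompt because line 16 of label2id.py still contains `import ipdb;ipdb.set_trace()`. The steps visible in the listing can be reproduced without the debugger as follows. This is a sketch rather than the repository's script: the input format (question id, a tab, then comma-separated topic ids) is an assumption, and the function names are made up.

```python
import json


def build_label_maps(lines):
    """Mirror the steps visible in the ipdb listing above."""
    # Assumed line format: "<question_id>\t<topic_id>,<topic_id>,..."
    results = []
    for line in lines:
        qid, topics = line.rstrip("\n").split("\t")
        results.append((qid, topics.split(",")))
    all_labels = {t for _, labels in results for t in labels}
    sorted_labels = sorted(all_labels)
    label2id = {label: i for i, label in enumerate(sorted_labels)}
    id2label = {i: label for i, label in enumerate(sorted_labels)}
    # Each question id maps to the integer ids of its topics.
    d = {qid: [label2id[t] for t in labels] for qid, labels in results}
    return dict(d=d, label2id=label2id, id2label=id2label)


def save_label_maps(data, outfile):
    """Write the three mappings to a JSON file."""
    with open(outfile, "w") as f:
        json.dump(data, f)
```

Note that `json.dump` writes the integer keys of `id2label` as strings, so they need converting back to `int` after loading.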

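The step that the instructions describe as very memory-hungry also finished for the small file, even though my VM has only 2 GB of RAM. For the full training set, however, no train.npz was produced; the process was most likely killed for lack of memory (`已杀死` is the shell's localized "Killed" message):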

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_train_set.txt train.npz
已杀死
[root@localhost PyTorchText-master]# 
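Since the process is killed without any Python traceback, it helps to check the machine's memory before launching the job. The sketch below parses the `MemTotal` line from `/proc/meminfo`-style text; the function name `mem_total_gb` is hypothetical, and the 32 GB figure to compare against is the one quoted earlier in this section.

```python
def mem_total_gb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024.0 * 1024.0)
    raise ValueError("MemTotal not found")
```

On Linux it can be called as `mem_total_gb(open('/proc/meminfo').read())`; on a 2 GB VM like mine the result is far below what the full training set needs.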

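Next, extract part of the training set to build a validation set. This code was backed up from an ipython session; __remember to change the data paths in the code__.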

[root@localhost PyTorchText-master]# python scripts/data_process/get_val.py 
[root@localhost PyTorchText-master]# 
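The idea behind this step, holding out a random part of the training set, can be illustrated with a few lines of standard-library Python. This is only a sketch of the general technique, not the contents of `get_val.py`; `split_indices` and its parameters are names chosen for this example.

```python
import random


def split_indices(n, val_size, seed=0):
    """Return (train_idx, val_idx): a random, disjoint split of range(n)."""
    if not 0 <= val_size <= n:
        raise ValueError("val_size must be between 0 and n")
    indices = list(range(n))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    return sorted(indices[val_size:]), sorted(indices[:val_size])
```

The two index lists can then be used to slice the arrays stored in train.npz into a training part and a validation part.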

## 3. Train the model

This is where I ran into a fatal error:

[root@localhost PyTorchText-master]#  python main.py main --max_epoch=5 --plot_every=100 --env='MultiCNNText' --weight=1 --model='MultiCNNTextBNDeep'  --batch-size=64  --lr=0.001 --lr2=0.000 --lr_decay=0.8 --decay_every=10000  --title-dim=250 --content-dim=250    --weight-decay=0 --type_='word' --debug-file='/tmp/debug'  --linear-hidden-size=2000 --zhuge=True  --augument=False
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    fire.Fire()  
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 74, in main
    model = getattr(models,opt.model)(opt).cuda()
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 272, in __new__
    _lazy_init()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
[root@localhost PyTorchText-master]#
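The traceback shows that `main.py` calls `.cuda()` unconditionally on line 74, which cannot work in a VM without an NVIDIA GPU and driver. A guard like the one below would at least let the model be constructed on the CPU. This is a sketch under assumptions: `cuda_available` and `maybe_cuda` are hypothetical helpers, the rest of the repository probably contains further `.cuda()` calls that would need the same treatment, and CPU training on a dataset of this size is unlikely to be practical.

```python
def cuda_available():
    """True only if torch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def maybe_cuda(model):
    """Move `model` to the GPU when possible, otherwise leave it on the CPU."""
    return model.cuda() if cuda_available() else model
```

In `main.py` the failing line would then read `model = maybe_cuda(getattr(models, opt.model)(opt))`.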



