CoNLL 2017 - Multi-Model and Crosslingual Dependency Analysis

ref: Multi-Model and Crosslingual Dependency Analysis

code: https://github.com/CoNLL-UD-2017/Orange-Deskin

proceedings: http://universaldependencies.org/conll17/proceedings/

results: http://universaldependencies.org/conll17/results.html

Setting up the runtime environment for the code: VirtualBox + CentOS 7

1. get the source code

git clone https://github.com/CoNLL-UD-2017/Orange-Deskin

2. get cnn-v1

cd Orange-Deskin
git clone https://github.com/clab/cnn-v1

3. get eigen

hg clone https://bitbucket.org/eigen/eigen

4. replace cnn/model.h with the file in cnn-modifs and compile cnn:

cp cnn-modifs/model.h cnn-v1/cnn/
cd cnn-v1
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/root/Orange-Deskin/eigen
Note: EIGEN3_INCLUDE_DIR needs to be an absolute path (here /root/Orange-Deskin/eigen) instead of the relative ../../eigen.

make

If cmake fails and the error is "Undefined reference to pthread_create in Linux", the fix is to install the boost-devel package. On CentOS:

yum install boost-devel

5. modify pycnn/setup.py (directory "../../cnn" should be "../../cnn-v1") and compile the python interface:

cd cnn-v1/pycnn
make install

Train models


1. to run training, set the environment variable LD_LIBRARY_PATH so that the cnn Python library can be found

export LD_LIBRARY_PATH=PATH/TO/cnn-v1/pycnn
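
When the parser is started from a Python script instead of an interactive shell, the same setting can be handed to the child process. The following is only a minimal sketch and not part of Orange-Deskin: the helper name env_with_library_path is hypothetical, and PATH/TO/cnn-v1/pycnn remains a placeholder for the real directory.

```python
import os


def env_with_library_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Prepend, so this directory is searched before any other entry.
    if current:
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + current
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


if __name__ == "__main__":
    # Placeholder path, same as in the export line above.
    env = env_with_library_path("PATH/TO/cnn-v1/pycnn")
    print(env["LD_LIBRARY_PATH"])
    # The dict can then be passed on, e.g. subprocess.run([...], env=env).
```

The dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing os.environ inside an already running interpreter is not enough; the variable has to be set for the process that loads the library.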

2. training is run with the command below; it needs "train-projective.conllu", "dev-projective.conllu", "word2vec.cbow.bin" and "train-words-to-load.txt", which the following steps produce.

python bistparser/barchybrid/src/parser.py \
  --cnn-mem 4000  \
  --outdir /PATH/TO/OUTDIR \
  --train train-projective.conllu \
  --dev dev-projective.conllu \
  --epochs 20 --lstmdims 125 \
  --lstmlayers 2 --bibi-lstm \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  [--hidden 50]

3. get "train-projective.conllu"

py/projectivise.py -c train.conllu > train-projective.conllu

Here, "train.conllu" is a training set picked from the treebank. Treebank: Universal Dependencies 2.0 can be downloaded from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2184"; the file name is ud-test-v2.0-conll2017.tgz. We can choose: ud-test-v2.0-conll2017/input/conll17-ud-trial-2017-03-19/en-udpipe.conllu.
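
For background: a dependency tree is projective when no two arcs cross once the words are laid out in sentence order. The sketch below only detects that property. It is not the algorithm of py/projectivise.py, and the function names heads_from_conllu and is_projective are made up for illustration. It assumes the standard CoNLL-U layout, in which the HEAD is the 7th tab-separated column.

```python
def heads_from_conllu(sentence_text):
    """Collect the HEAD column (7th field) of one CoNLL-U sentence."""
    heads = []
    for line in sentence_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        heads.append(int(cols[6]))
    return heads


def is_projective(heads):
    """heads[i] is the head of token i+1 (1-based ids, 0 is the root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Arcs cross when the second starts strictly inside the
            # first and ends strictly outside of it.
            if l1 < l2 < r1 < r2:
                return False
    return True


if __name__ == "__main__":
    print(is_projective([2, 0, 2]))     # no crossing arcs
    print(is_projective([3, 4, 0, 3]))  # arcs 1-3 and 2-4 cross
```

The quadratic loop is fine for single sentences; it is meant to make the definition concrete, not to be fast.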

4. get "dev-projective.conllu"

py/projectivise.py -c dev.conllu > dev-projective.conllu

Here, "dev.conllu" is a development set picked from the treebank. Treebank: Universal Dependencies 2.0 can be downloaded from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2184"; the file name is ud-test-v2.0-conll2017.tgz. We can choose: ud-test-v2.0-conll2017/input/conll17-ud-development-2017-03-19/en-udpipe.conllu.

5. get "word2vec.cbow.bin"

Pre-trained vectors such as "freebase-vectors-skipgram1000.bin.gz", "freebase-vectors-skipgram1000-en.bin.gz" or "GoogleNews-vectors-negative300.bin.gz" can be downloaded from https://code.google.com/archive/p/word2vec/

However, the word2vec.cbow.bin used here was not trained on GoogleNews!

So embeddings trained on another corpus are needed to obtain the file "word2vec.cbow.bin".

The word embeddings were calculated on corpora taken from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989". Download the file word-embeddings-conll17.tar from there and use word-embeddings-conll17/English/en.vectors from it as word2vec.cbow.bin.
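
To get a feel for how a vectors file and a word list fit together, here is a small sketch under two assumptions that the steps above do not confirm: that the vectors file is in the word2vec text format (a header line "count dim", then one "word v1 v2 ..." line per word), and that filtering simply means keeping the vectors of listed words. The function load_filtered_vectors is hypothetical and is not the loader used by parser.py.

```python
import io


def load_filtered_vectors(lines, wanted):
    """Read word2vec text format and keep only the words in `wanted`."""
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    vectors = {}
    for line in it:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        if parts[0] in wanted:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real vectors file.
    demo = io.StringIO("3 2\nthe 0.1 0.2\ncat 0.3 0.4\ndog 0.5 0.6\n")
    dim, vecs = load_filtered_vectors(demo, {"cat", "dog"})
    print(dim, sorted(vecs))
```

Reading line by line keeps memory use proportional to the word list rather than to the full vocabulary, which matters for large embedding files.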

6. get "train-words-to-load.txt"

cut -f2 train-projective.conllu | sort -u > forms.txt
cut -f3 train-projective.conllu | sort -u > lemmas.txt
cat forms.txt lemmas.txt | perl -CSD -ne 'print lc' | sort -u > train-words-to-load.txt
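
The same word list can be sketched in Python, which may be handy where perl or cut is not available. The function name words_to_load is hypothetical. One deliberate difference: cut passes lines without a tab (such as comment lines) through unchanged, while this sketch skips them. The sort order may also differ from sort -u, which depends on the locale.

```python
def words_to_load(conllu_lines):
    """Collect lower-cased FORM and LEMMA values (columns 2 and 3)."""
    words = set()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank separator lines and comments
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a token line
        words.add(cols[1].lower())  # FORM
        words.add(cols[2].lower())  # LEMMA
    return sorted(words)


if __name__ == "__main__":
    sample = [
        "# text = The cats",
        "1\tThe\tthe\tDET",
        "2\tcats\tcat\tNOUN",
        "",
    ]
    for word in words_to_load(sample):
        print(word)
```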

7. begin training

python bistparser/barchybrid/src/parser.py \
  --cnn-mem 4000  \
  --outdir /PATH/TO/OUTDIR \
  --train train-projective.conllu \
  --dev dev-projective.conllu \
  --epochs 20 --lstmdims 125 \
  --lstmlayers 2 --bibi-lstm \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  [--hidden 50]

Note: /PATH/TO/OUTDIR should be replaced with the actual output directory.


Use Models


python bistparser/barchybrid/src/parser.py \
  --cnn-mem 4000 --predict \
  --outfile result.conllu \
  --test test-projective.conllu \
  --model /PATH/TO/OUTDIR/barchhybrid.model_NNN \
  --params /PATH/TO/OUTDIR/params.pickle \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  --extrnFilterNew test-words-to-load.txt

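Running this command first produced an error. After checking several times, the cause turned out to be that the model file name barchhybrid.model_002 was wrong.

After correcting the name, a second error appeared. That one occurred because cnn-v1/cnn/model.h had not been replaced with the file from cnn-modifs.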
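
Because a mistyped model file name is easy to miss, it can help to list what training actually wrote and copy the name from there. The sketch below is hypothetical and not part of the parser. It assumes model files end in ".model" followed by an optional underscore and a number, as in barchhybrid.model_NNN, and makes no assumption about the prefix.

```python
import os
import re

# Assumed naming: anything ending in ".model<digits>" or ".model_<digits>".
MODEL_RE = re.compile(r"\.model_?(\d+)$")


def model_files(names):
    """Return the model file names among `names`, ordered by their number."""
    found = []
    for name in names:
        match = MODEL_RE.search(name)
        if match:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    outdir = "."  # replace with the real output directory
    for name in model_files(os.listdir(outdir)):
        print(os.path.join(outdir, name))
```

Sorting by the parsed number rather than by the string keeps model_10 after model_9 even when the numbers are not zero-padded.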


Finally, de-projectivise output if you have projectivised:

py/projectivise.py -d result.conllu > result-deprojectivised.conllu
