Reproducing SegPhrase, a tool from a top-conference paper, with Docker

Article link: https://blog.csdn.net/Fitz1318/article/details/108631541


Jialu Liu*, Jingbo Shang*, Chi Wang, Xiang Ren and Jiawei Han, "Mining Quality Phrases from Massive Text Corpora", Proc. of 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’15), Melbourne, Australia, May 2015. (* equally contributed)

1. Setting up the development environment

You can reuse the container described in my earlier post, https://blog.csdn.net/Fitz1318/article/details/108627486. That container already provides Ubuntu, but it still lacks the other dependencies.

First, following the steps in https://blog.csdn.net/Fitz1318/article/details/108627611, switch the package sources to a mirror hosted in China. Downloads will be much faster.
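The mirror switch can be sketched as a small shell function. This is only an illustration, not the procedure from the linked post: the function name `switch_apt_mirror` is hypothetical, and `mirrors.aliyun.com` is an assumed mirror host that you should replace with whichever mirror you prefer. It relies on GNU `sed -i`, which is what Ubuntu ships.

```shell
# switch_apt_mirror [FILE] [MIRROR_HOST]
# Hypothetical helper: point the default Ubuntu hosts in an apt sources file
# at a mirror. Both defaults below are assumptions, not taken from the post.
switch_apt_mirror() {
    sources_file="${1:-/etc/apt/sources.list}"
    mirror_host="${2:-mirrors.aliyun.com}"

    # Keep a backup so the change is easy to undo.
    cp "$sources_file" "$sources_file.bak"

    # Rewrite both the main archive host and the security host.
    sed -i \
        -e "s|archive\.ubuntu\.com|$mirror_host|g" \
        -e "s|security\.ubuntu\.com|$mirror_host|g" \
        "$sources_file"
}
```

Inside the container you would call `switch_apt_mirror` with no arguments and then run `apt-get update` so the new sources take effect.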

The readme.md provided by the authors lists the following dependencies:

g++ 4.8: install with apt-get install g++-4.8
python 2.7: ships with this Ubuntu system, so no separate installation is needed
pip: install with apt-get install python-pip; note that python-pip is the package that provides pip for Python 2
scikit-learn: install with pip install -i https://pypi.doubanio.com/simple/ scikit-learn
nltk: install with pip install -i https://pypi.doubanio.com/simple/ nltk

Note that the minimal Ubuntu system does not ship with make, so install it first with apt-get install make. With that, the development environment is ready.
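The installation steps above can be collected into one script. The sketch below is a convenience wrapper, not something from the authors' readme: the names `run`, `install_deps`, `DRY_RUN` and `PIP_INDEX` are hypothetical. It installs `scikit-learn` under its PyPI name, and it assumes that pip resolves to releases that still support Python 2.7. If it does not, you may need to pin older versions.

```shell
# Hypothetical variable: the PyPI mirror used by the pip commands in this post.
PIP_INDEX="${PIP_INDEX:-https://pypi.doubanio.com/simple/}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install every dependency listed in this section, stopping at the first failure.
install_deps() {
    run apt-get update &&
    run apt-get install -y make g++-4.8 python-pip &&
    run pip install -i "$PIP_INDEX" scikit-learn nltk
}
```

Running `DRY_RUN=1 install_deps` prints the commands without executing them, which is a cheap way to check them before running `install_deps` as root inside the container.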

2. Compiling

The authors' segphrase folder is built with a Makefile, so you only need to cd into the segphrase folder and run make.
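If make fails, the usual cause is a missing tool. The sketch below checks for the tools before building. The helper names `check_tools` and `build_segphrase` are hypothetical, and checking for a binary named `g++-4.8` is an assumption based on the package installed earlier, since the Makefile may call a different compiler name.

```shell
# Report which of the given commands are not on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# build_segphrase DIR: verify the toolchain, then run make inside DIR.
# The subshell keeps the caller's working directory unchanged.
build_segphrase() {
    check_tools make g++-4.8 || return 1
    (cd "$1" && make)
}
```

For example, `build_segphrase segphrase` either prints the missing tools or compiles the project in place.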

3. Default run

 ./train_toy.sh  #train a toy segmenter and output phrase list as results/unified.csv
 ./train_dblp.sh  #train a segmenter and output phrase list for DBLP data
 ./parse.sh  #use the segmenter to parse new documents
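After the toy training finishes, a quick look at the phrase list confirms that the pipeline ran. The helper below is a hypothetical convenience, not part of SegPhrase. It assumes only that the output is the text file results/unified.csv named in the comment above, and makes no assumption about its columns.

```shell
# preview_phrases [FILE] [N]: print the first N lines of the phrase list.
# Defaults: results/unified.csv and 10 lines.
preview_phrases() {
    csv_file="${1:-results/unified.csv}"
    count="${2:-10}"
    if [ ! -f "$csv_file" ]; then
        echo "not found: $csv_file" >&2
        return 1
    fi
    head -n "$count" "$csv_file"
}
```

A typical call is `./train_toy.sh && preview_phrases`, run from the segphrase folder.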