Background
KenLM was chosen because it counts word frequencies quickly and efficiently. For details on KenLM itself, see the source code (https://github.com/kpu/kenlm).
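Installation
The author provides an installation guide: https://kheafield.com/code/kenlm/ . Provided every other dependency is already in place, installation goes as follows: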
wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
mkdir kenlm/build
cd kenlm/build
cmake ..
make -j4
PS: this article installs on CentOS 7 with gcc 5.2.
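Before running the steps above, it can help to confirm that the tools they rely on are actually on the PATH. The following is a minimal sketch, not part of the official guide; the function name missing_tools is hypothetical, and the tool list in the usage note is an assumption based on the commands used above.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools wget tar cmake make g++` prints nothing when everything is installed, and otherwise prints one missing tool per line.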
boost
When the installed Boost version is too old, the cmake step will most likely fail with an error.
Solution:
yum install -y boost boost-devel boost-doc
Running cmake again then reports the following error:
CMake Error at /usr/local/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find Boost (missing: thread) (found suitable version "1.55.0",
minimum required is "1.41.0")
Call Stack (most recent call first):
/usr/local/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
/usr/local/share/cmake-3.15/Modules/FindBoost.cmake:2142 (find_package_handle_standard_args)
CMakeLists.txt:66 (find_package)
CMake Warning (dev) in /usr/local/share/cmake-3.15/Modules/FindBoost.cmake:
Policy CMP0011 is not set: Included scripts do automatic cmake_policy PUSH
and POP. Run "cmake --help-policy CMP0011" for policy details. Use the
cmake_policy command to set the policy and suppress this warning.
The included script
/usr/local/share/cmake-3.15/Modules/FindBoost.cmake
affects policy settings. CMake is implying the NO_POLICY_SCOPE option for
compatibility, so the effects are applied to the including context.
Call Stack (most recent call first):
CMakeLists.txt:66 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring incomplete, errors occurred!
See also "/home/data1/devtools/kenlm/build/CMakeFiles/CMakeOutput.log".
See also "/home/data1/devtools/kenlm/build/CMakeFiles/CMakeError.log".
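This shows that cmake could not find where Boost was installed (the thread component is reported missing). So where did Boost end up?
First, list the installed Boost-related packages: rpm -qa | grep boost
Then query where a specific package put its files, for example boost-thread-1.53.0-27.el7.x86_64: rpm -ql boost-thread-1.53.0-27.el7.x86_64
The same query eventually turns up the include and lib directories installed by boost-devel-1.53.0-27.el7.x86_64.
In summary, Boost's include and lib directories are: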
/usr/include/boost/
/usr/lib64/
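The rpm queries above produce long file lists, so picking out the library directory by eye is tedious. The following is a minimal sketch of one way to filter that output; the function name so_dirs is hypothetical, and it assumes shared libraries are named *.so or *.so.<version>.

```shell
#!/bin/sh
# Read a file list (one path per line) on stdin and print the unique
# directories that contain shared libraries (*.so or *.so.<version>).
so_dirs() {
    sed -n -E 's#/[^/]*\.so(\.[0-9]+)*$##p' | sort -u
}
```

Used as `rpm -ql boost-thread boost-devel | so_dirs`, this would be expected to print the directory holding the Boost libraries; lines for documentation, headers and other files are ignored.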
Add these two directories to CMakeLists.txt:
SET(BOOST_INCLUDEDIR "/usr/include/boost/")
SET(BOOST_LIBRARYDIR "/usr/lib64/")
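BOOST_INCLUDEDIR and BOOST_LIBRARYDIR are hint variables read by CMake's FindBoost module, so they can also be passed on the cmake command line without editing CMakeLists.txt. The following is a minimal sketch of that alternative; the helper name boost_cmake_args is hypothetical, and the output is only safe to expand unquoted when the paths contain no spaces.

```shell
#!/bin/sh
# Build the -D options that hand Boost's include and library
# directories to CMake's FindBoost module.
#   $1: include directory hint
#   $2: library directory hint
boost_cmake_args() {
    printf '%s %s\n' "-DBOOST_INCLUDEDIR=$1" "-DBOOST_LIBRARYDIR=$2"
}
```

From the build directory this could be used as `cmake .. $(boost_cmake_args /usr/include/boost/ /usr/lib64/)`, with the same two directories as above. FindBoost documents BOOST_INCLUDEDIR as the directory that contains the boost/ folder, so /usr/include may be the more precise value; verify this against your CMake version.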
Specifying the compiler
Building again fails with the following error:
CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o: In function `boost::unit_test::make_test_case(boost::unit_test::callback0<boost::unit_test::ut_detail::unused> const&, boost::unit_test::basic_cstring<char const>)':
tokenize_piece_test.cc:(.text._ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE[_ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE]+0x11): undefined reference to `boost::unit_test::ut_detail::normalize_test_case_name[abi:cxx11](boost::unit_test::basic_cstring<char const>)'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/tokenize_piece_test] Error 1
make[1]: *** [util/CMakeFiles/tokenize_piece_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 34%] Linking CXX static library ../../lib/libkenlm_interpolate.a
[ 34%] Built target kenlm_interpolate
[ 35%] Linking CXX executable ../tests/string_stream_test
CMakeFiles/string_stream_test.dir/string_stream_test.cc.o: In function `boost::unit_test::make_test_case(boost::unit_test::callback0<boost::unit_test::ut_detail::unused> const&, boost::unit_test::basic_cstring<char const>)':
string_stream_test.cc:(.text._ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE[_ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE]+0x11): undefined reference to `boost::unit_test::ut_detail::normalize_test_case_name[abi:cxx11](boost::unit_test::basic_cstring<char const>)'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/string_stream_test] Error 1
make[1]: *** [util/CMakeFiles/string_stream_test.dir/all] Error 2
[ 36%] Linking CXX executable ../tests/sorted_uniform_test
CMakeFiles/sorted_uniform_test.dir/sorted_uniform_test.cc.o: In function `boost::unit_test::make_test_case(boost::unit_test::callback0<boost::unit_test::ut_detail::unused> const&, boost::unit_test::basic_cstring<char const>)':
sorted_uniform_test.cc:(.text._ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE[_ZN5boost9unit_test14make_test_caseERKNS0_9callback0INS0_9ut_detail6unusedEEENS0_13basic_cstringIKcEE]+0x11): undefined reference to `boost::unit_test::ut_detail::normalize_test_case_name[abi:cxx11](boost::unit_test::basic_cstring<char const>)'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/sorted_uniform_test] Error 1
make[1]: *** [util/CMakeFiles/sorted_uniform_test.dir/all] Error 2
make: *** [all] Error 2
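Solution:
The [abi:cxx11] tag on the undefined symbol points to an ABI mismatch: gcc 5 defaults to the new C++11 libstdc++ ABI, while the system Boost libraries were most likely built with the old one. Change the C++ compiler flags by adding the following command at the top of CMakeLists.txt: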
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
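To check whether a given library was built with the old or the new libstdc++ ABI, one option is to look for cxx11 markers in its exported symbols. The following is a minimal sketch under that assumption; the function name guess_abi is hypothetical, and the library path in the usage note is an assumption about where CentOS 7 installs the Boost test library.

```shell
#!/bin/sh
# Read a symbol listing on stdin (for example from `nm -DC <library>`)
# and report which libstdc++ ABI its symbols suggest.
# Caveat: a library with no std::string or std::list in its interface
# shows no cxx11 marker under either ABI, so "old" is only a hint.
guess_abi() {
    if grep -q 'cxx11'; then
        echo "new"
    else
        echo "old"
    fi
}
```

For example, `nm -DC /usr/lib64/libboost_unit_test_framework.so | guess_abi` printing "old" would be consistent with the linker error above, since the gcc 5 build is asking for the [abi:cxx11] variant of the symbol.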
Once make succeeds, the bin directory can be added to the PATH environment variable.
Add kenlm's bin directory in ~/.bashrc as follows:
export PATH=$PATH:/usr/local/cuda-9.0/bin:/home/data1/devtools/kenlm/build/bin
source ~/.bashrc
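If this setup is scripted or repeated, appending the export line each time leaves duplicates in ~/.bashrc. The following is a minimal sketch of an append that only happens once; the function name add_to_path_once is hypothetical, and it assumes the line is written in exactly the form shown.

```shell
#!/bin/sh
# Append "export PATH=$PATH:<dir>" to an rc file unless that exact
# line is already present.
#   $1: directory to add to PATH
#   $2: rc file to modify (for example ~/.bashrc)
add_to_path_once() {
    line="export PATH=\$PATH:$1"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}
```

For the build directory used in this article, that would be `add_to_path_once /home/data1/devtools/kenlm/build/bin ~/.bashrc`, followed by `source ~/.bashrc` as above.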
Alternatively, copy the compiled binaries you need straight into the directory where they will be used and run them from there.