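The official site is down, so finding a download and installing Coreseek + Sphinx + the mmseg Chinese word segmenter took a lot of time.
I started with a round of searching on GitHub, but every package I tried failed to install. I eventually found a copy on CSDN, and after some fiddling the installation succeeded.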
Preparation: I already had LNMP installed. Install the build dependencies:
yum -y install make gcc gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
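Before building, it can help to confirm that the build tools are actually on PATH. The following is a minimal sketch; the function name `missing_tools` is hypothetical and not part of Coreseek or yum. It checks commands, not packages, so the header-only packages (mysql-devel, libxml2-devel, expat-devel) are not covered.

```shell
# Print every command from the argument list that cannot be found on PATH.
# missing_tools is an illustrative helper name, not an existing tool.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools make gcc g++ libtoolize autoconf automake` prints nothing when everything is installed, and one name per line otherwise.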
Install the Chinese word segmenter mmseg and Coreseek (Coreseek already bundles Sphinx):
mkdir -p /root/tmp/coreseek/
cd /root/tmp/coreseek/
wget http://files.opstool.com/man/coreseek-4.1-beta.tar.gz
tar -xzvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
If the URL above no longer works, the package can be downloaded from CSDN instead.
Install the Chinese word segmentation package mmseg:
cd /root/tmp/coreseek/coreseek-4.1-beta/mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install
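Various errors can come up along the way. If you see "config.status: error: cannot find input file: src/Makefile.in", regenerate the autotools files with the commands below, then rerun ./configure and make:
aclocal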
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
Test:
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc src/t1.txt
中文/x 分/x 词/x 测试/x
中国人/x 上海市/x
Word Splite took: 0 ms.
Install Coreseek and Sphinx:
cd /root/tmp/coreseek/coreseek-4.1-beta/csft-4.1
sh ./buildconf.sh
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
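If buildconf.sh fails, see https://my.oschina.net/sloan521/blog/816578
Edit src/Makefile (vi src/Makefile) and add -liconv to the LIBS line; without it the link step typically fails with undefined libiconv references:
LIBS = -ldl -lm -lz -lexpat -L/usr/local/lib -lrt -lpthread
# change to
LIBS = -ldl -lm -lz -lexpat -liconv -L/usr/local/lib -lrt -lpthread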
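The LIBS change can also be applied without opening an editor. This is only a sketch: the function name `add_iconv_to_libs` is made up for illustration, and it assumes the LIBS line looks like the one shown above (starting with `LIBS = ` and containing `-lexpat`).

```shell
# Insert -liconv after -lexpat on the LIBS line of the given Makefile.
# Does nothing if -liconv is already present on that line.
add_iconv_to_libs() {
  makefile=$1
  if grep -q '^LIBS = .*-liconv' "$makefile"; then
    return 0
  fi
  # Write to a temporary file first, then replace the original.
  sed 's/^\(LIBS = .*-lexpat\)/\1 -liconv/' "$makefile" > "$makefile.tmp" &&
    mv "$makefile.tmp" "$makefile"
}
```

It would be called as `add_iconv_to_libs src/Makefile` from the csft-4.1 directory after ./configure has generated the Makefile.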
Continue the build:
make && make install
Testing and configuration
Import the data:
$ mysql -uroot -p test
password:
Create csft.conf:
$ cd /usr/local/coreseek/
$ cp etc/sphinx-min.conf.dist etc/csft.conf
Edit etc/csft.conf.
1) Update the database settings:
source src1{
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_port = 3306 # optional, default is 3306
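# set the connection character set
sql_query_pre = SET NAMES utf8
...
}
2) Set the location of the Chinese dictionary:
index test1{
...
# directory containing the Chinese segmentation dictionary file uni.lib
charset_dictpath = /usr/local/mmseg3/etc/
# enable Chinese word segmentation
charset_type = zh_cn.utf-8
}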
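A wrong charset_dictpath is easy to miss, so a quick check that the configured directory really contains uni.lib can save a failed indexing run. The sketch below is an assumption-laden helper (`check_dictpath` is a hypothetical name); it reads only the first charset_dictpath line and does not handle trailing comments on that line.

```shell
# Read charset_dictpath from a csft.conf-style file and verify uni.lib exists there.
check_dictpath() {
  conf=$1
  # Take the value after "=" on the first charset_dictpath line, stripped of whitespace.
  dict_dir=$(awk -F= '/^[[:space:]]*charset_dictpath[[:space:]]*=/ {
    gsub(/[[:space:]]/, "", $2); print $2; exit }' "$conf")
  if [ -z "$dict_dir" ]; then
    echo "charset_dictpath not set in $conf" >&2
    return 1
  fi
  if [ ! -f "${dict_dir%/}/uni.lib" ]; then
    echo "uni.lib not found in $dict_dir" >&2
    return 1
  fi
  echo "dictionary found: ${dict_dir%/}/uni.lib"
}
```

With the settings above, `check_dictpath /usr/local/coreseek/etc/csft.conf` should report /usr/local/mmseg3/etc/uni.lib if mmseg was installed to the prefix used earlier.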
Build the index. indexer looks for etc/csft.conf by default; use -c to point it at a specific config file, e.g. ./bin/indexer -c etc/csft.conf --all
$ ./bin/indexer --all
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.014 sec, 12906 bytes/sec, 267.48 docs/sec
skipping non-plain index 'testrt'...
total 3 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
Search:
$ ./bin/search "this is a test"
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query 'this is a test ': returned 0 matches of 0 total in 0.000 sec
words:
1. 'this': 4 documents, 4 hits
2. 'is': 4 documents, 4 hits
3. 'a': 0 documents, 0 hits
4. 'test': 3 documents, 5 hits
index 'testrt': search error: failed to open /usr/local/coreseek/var/data/testrt.sph: No such file or directory.
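Here 'test': 3 documents, 5 hits means that 3 records contain the word and it occurs 5 times in total. The query still returns 0 matches because search requires every word to match by default, and 'a' is found in no document.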
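When comparing several queries, the per-word statistics are easier to read as plain columns. The following sketch filters the output of search on stdin; `parse_word_stats` is a hypothetical helper name, and the pattern assumes the "N. 'word': X documents, Y hits" layout shown above.

```shell
# Turn lines like "4. 'test': 3 documents, 5 hits" into "test 3 5".
# Lines that do not match the pattern (banner, errors) are dropped.
parse_word_stats() {
  sed -n "s/^[[:space:]]*[0-9][0-9]*\. '\(.*\)': \([0-9][0-9]*\) documents, \([0-9][0-9]*\) hits[[:space:]]*\$/\1 \2 \3/p"
}
```

Usage would be `./bin/search "this is a test" | parse_word_stats`, which for the output above yields one line per word: word, document count, hit count.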
Test Chinese: insert some data (the group_id values here are arbitrary examples):
insert into documents(`group_id`,`group_id2`,`title`, `content`)
values
(1, 5, '锄禾日当午', '锄禾日当午,上班真辛苦,一台破电脑,一坐一下午'),
(1, 5, '你有尺子吗', '昨天看电视说"吸烟导致猝死"吓的我心里哆嗦!一咬牙一跺脚下定决心!"以后不看电视了"');
Rebuild the index:
$ ./bin/indexer --all
Search:
$ ./bin/search "以后不看电脑了"
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query '以后不看电脑了 ': returned 0 matches of 0 total in 0.000 sec
words:
1. '以后': 1 documents, 1 hits
2. '不': 1 documents, 1 hits
3. '看': 1 documents, 2 hits
4. '电脑': 1 documents, 1 hits
5. '了': 1 documents, 1 hits
index 'testrt': search error: failed to open /usr/local/coreseek/var/data/testrt.sph: No such file or directory.
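Implementing incremental indexing
Rebuilding the full index every time costs a lot of time and resources. A common approach is to refresh a small incremental (delta) index at short intervals and rebuild the full index on a schedule.
The configuration below comes from 搜片大师 and can serve as a reference. Its paths use Windows-style separators, so adjust them on Linux:
source main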
{
type= mysql
sql_host= 127.0.0.1 # host name
sql_user= root # account
sql_pass= # password
sql_db= bt # database name
sql_port= 3306 # port
sql_query_pre= SET NAMES utf8
sql_query_pre= REPLACE INTO sph_counter SELECT 1, MAX(id) FROM film
sql_query= SELECT id, name, click, length, ctime FROM film WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
sql_query_info= SELECT * FROM film WHERE id=$id
sql_attr_uint = click
sql_attr_bigint = length
sql_attr_uint = ctime
}
source delta:main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, name, click, length, ctime FROM film WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
sql_attr_uint = click
sql_attr_bigint = length
sql_attr_uint = ctime
}
index main
{
source= main
path= coreseek\var\data\main
docinfo= extern
charset_dictpath = coreseek\etc
charset_type= zh_cn.utf-8
}
index delta:main
{
source = delta
path = coreseek\var\data\delta
}
indexer
{
mem_limit= 64M
}
searchd
{
listen = 127.0.0.1 # listening IP
port= 9312 # listening port
log= coreseek\var\log
query_log= coreseek\var\log\query.log
read_timeout= 5
max_children= 30
pid_file= coreseek\var\log\searchd.pid
max_matches= 1000
seamless_rotate= 0
preopen_indexes= 0
unlink_old= 1
}
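The sql_query_pre line in source main writes to a sph_counter table, which has to exist before the first indexing run. The sketch below prints a table definition following the layout used in the Sphinx documentation for main+delta setups; the function name `sph_counter_ddl` is hypothetical, and the INTEGER column type is an assumption that should be widened if film.id is a BIGINT.

```shell
# Print the DDL for the counter table that stores the highest indexed document id.
sph_counter_ddl() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL
}
```

It could be applied with `sph_counter_ddl | mysql -uroot -p bt`, using the database name from the configuration above.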
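Build the main index:
bin/indexer --config etc/csft.conf main --rotate
Build the incremental index (the index declared as delta:main is named delta; :main only marks inheritance):
bin/indexer --config etc/csft.conf delta --rotate
If you change the configuration file, restart searchd first and then rebuild the indexes.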
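The "delta often, main on a schedule" idea can be wrapped in a small script that cron calls. This is a sketch under assumptions: `reindex`, `CORESEEK_HOME` and `DRY_RUN` are names introduced here for illustration, the install prefix is the /usr/local/coreseek used earlier, and the index names are main and delta as declared in the configuration.

```shell
# Hypothetical helper; CORESEEK_HOME and DRY_RUN are names introduced for this sketch.
CORESEEK_HOME=${CORESEEK_HOME:-/usr/local/coreseek}

# Rebuild one index ("main" or "delta") and ask searchd to rotate it in.
reindex() {
  case "$1" in
    main|delta) ;;
    *) echo "usage: reindex main|delta" >&2; return 2 ;;
  esac
  set -- "$CORESEEK_HOME/bin/indexer" --config "$CORESEEK_HOME/etc/csft.conf" "$1" --rotate
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # only show the command that would run
  else
    "$@"        # run indexer for real
  fi
}
```

A crontab could then call `reindex delta` every few minutes and `reindex main` once a night; the exact schedule depends on how quickly new rows need to become searchable.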
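PHP Sphinx extension
The source of the Sphinx extension for PHP is available at http://pecl.php.net/package/sphinx
Download the latest release: http://pecl.php.net/get/sphinx-1.3.3.tgz
Note: when running phpize and configure you may be asked to install libsphinxclient first. It is located in coreseek-xxx/csft-xx/api/libsphinxclient/; once it is built and installed, configure and make will produce the .so extension file.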
cd libsphinxclient/
./configure --prefix=/usr/local/libsphinxclient
make && make install
cd sphinx
/usr/bin/phpize
./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/libsphinxclient
make && make install
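PHP example:
<?php
$string = '中国平均工资';   // search keywords
$offset = 0;                // index of the first result (example value)
$perpage = 20;              // results per page (example value)
$_CONFIG = array();
$_CONFIG['sphinx'] = array();
$_CONFIG['sphinx']['host'] = '127.0.0.1';
$_CONFIG['sphinx']['port'] = 9312;
$client = new SphinxClient ();
$client->SetServer ( $_CONFIG['sphinx']['host'], $_CONFIG['sphinx']['port']);
$client->SetArrayResult (true);
// return $perpage results starting at $offset, with max_matches set to 360
$client->SetLimits($offset,$perpage,360);
$client->SetMatchMode ( SPH_MATCH_EXTENDED );
//$client->SetMatchMode ( SPH_MATCH_ANY);
// query the main and delta indexes together
$res = $client->Query ( $string, "main,delta" );
var_dump($res);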
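Creating the searchd service
Create the service file searchd.service in /usr/lib/systemd/system/. The paths below assume Coreseek lives under /opt/coreseek-4.1; adjust them to your install prefix (/usr/local/coreseek in the steps above), and make sure PIDFile matches pid_file in csft.conf:
[Unit]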
Description=Core Seek Searchd
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/opt/coreseek-4.1/var/searchd.pid
ExecStart=/opt/coreseek-4.1/bin/searchd
ExecReload=/opt/coreseek-4.1/bin/searchd --stop
ExecStop=/opt/coreseek-4.1/bin/searchd --stop
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Run indexer --all to build the indexes.
Run systemctl enable searchd to start the service automatically at boot.
Run systemctl start searchd to start the service.