Site-wide Chinese full-text search with Sphinx + MySQL + mmseg: installation and configuration guide

1. Sphinx features

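Fast indexing (peak throughput of up to 10 MB/s on a modern CPU);

Fast searching (average query response under 0.1 s on 2-4 GB of text);

Scales to very large data sets (known to handle over 100 GB of text, and 100 M documents on a single-CPU system);

Strong relevance ranking: a composite method that combines phrase proximity with statistical (BM25) ranking;

Distributed searching;

Can provide search to MySQL as a storage engine (SphinxSE);

Multiple query modes: boolean, phrase, word proximity and more;

Multiple full-text fields per document (up to 32);

Multiple additional attributes per document (e.g. group IDs, timestamps);

Stopword support;

Single-byte encodings and UTF-8;

Native MySQL support (both MyISAM and InnoDB);

Native PostgreSQL support.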

2. Installing the mmseg Chinese word segmenter

Download mmseg

wget http://www.coreseek.com/uploads/sources/mmseg-0.7.3.tar.gz

Build and install

tar zxf mmseg-0.7.3.tar.gz && cd mmseg-0.7.3

./configure --prefix=/usr/local/mmseg

make && make install

ln -s /usr/local/mmseg/bin/mmseg /usr/bin/

If the build fails with make[2]: *** [UnigramCorpusReader.lo] Error 1, fix it as follows:

vim src/css/UnigramCorpusReader.cpp

Add #include at line 23, then recompile.

To confirm the installation succeeded, run mmseg; the output should look like this:

[root@iZ28bak61f3Z bin]#mmseg

Coreseek COS(tm) MM Segment 1.0

Copyright By Coreseek.com All Right Reserved.

Usage: ./mmseg

-u Unigram Dictionary

-r Combine with -u, used a plain text build Unigram Dictionary, default Off

-b Synonyms Dictionary

-h print this help and exit

3. Building, installing and configuring Sphinx (MySQL 5.6.24)

Download Sphinx

wget http://pkgs.fedoraproject.org/repo/pkgs/sphinx/sphinx-0.9.9.tar.gz/7b9b618cb9b378f949bb1b91ddcc4f54/sphinx-0.9.9.tar.gz

Build and install Sphinx

tar zxf sphinx-0.9.9.tar.gz

cd sphinx-0.9.9

yum install python-devel

./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql/ --with-mysql-includes=/usr/local/mysql/include/ --with-mysql-libs=/usr/local/mysql/lib/ --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mmseg

make && make install

Configure Sphinx

cd /usr/local/sphinx/etc/

cp sphinx.conf.dist sphinx.conf

Change the following MySQL connection settings:

type = mysql

sql_host = localhost

sql_user = root

sql_pass = ****

sql_db = test

sql_port = 3306
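The same edits can be scripted instead of made by hand in vim. Below is a minimal bash sketch, assuming GNU sed (the default on CentOS) and that each key appears once, uncommented, in sphinx.conf; set_conf is a hypothetical helper, not part of Sphinx.

```bash
#!/bin/bash
# set_conf FILE KEY VALUE  (hypothetical helper, not shipped with Sphinx)
# Rewrites every uncommented "KEY = ..." line in FILE to "KEY = VALUE",
# keeping the original indentation. Assumes GNU sed for in-place editing.
# Returns 1 (and changes nothing) if no such line exists.
set_conf() {
    local file=$1 key=$2 value=$3 escaped
    # Match only lines whose first non-blank text is KEY followed by "=",
    # so commented examples such as "# sql_sock = ..." are skipped.
    local pattern="^[[:space:]]*${key}[[:space:]]*="
    if ! grep -q "$pattern" "$file"; then
        echo "set_conf: no '${key} =' line in ${file}" >&2
        return 1
    fi
    # Escape the characters that are special in a sed replacement
    # ('\' and '&') plus the '|' delimiter used below.
    escaped=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
    sed -i "s|^\([[:space:]]*${key}[[:space:]]*=\).*|\1 ${escaped}|" "$file"
}
```

For example, set_conf /usr/local/sphinx/etc/sphinx.conf sql_pass 'your-password' fills in the password. Because the helper rewrites every uncommented line that starts with the key, it is best avoided for a generic key such as type, which may also appear in other blocks of the file.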

4. Building the index

Create a test database and import example.sql to test with:

mysql -uroot -p < /usr/local/sphinx/etc/example.sql

Build the index:

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1

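Here test1 is the name of an index defined in sphinx.conf. To build every index defined in the config, pass --all instead of an index name. Output like the following means the index was built successfully: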

Sphinx 0.9.9-release (r2117)

Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...

indexing index 'test1'...

collected 4 docs, 0.0 MB

sorted 0.0 Mhits, 100.0% done

total 4 docs, 193 bytes

total 0.030 sec, 6256 bytes/sec, 129.67 docs/sec

total 2 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

total 7 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
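Once searchd is running (section 5), an index it is serving should be rebuilt with indexer's --rotate option, which writes the new index files next to the live ones and then signals searchd to switch over without a restart. A minimal wrapper sketch follows; reindex and SPHINX_HOME are hypothetical names, and the default path assumes the --prefix=/usr/local/sphinx used in section 3.

```bash
#!/bin/bash
# Hypothetical wrapper; SPHINX_HOME defaults to the --prefix from section 3.
SPHINX_HOME=${SPHINX_HOME:-/usr/local/sphinx}

# reindex [INDEX...]
# Rebuilds the given indexes, or every index in sphinx.conf when called with
# no arguments, while searchd keeps answering queries. --rotate makes indexer
# write new files alongside the live ones and then tell searchd to swap them in.
reindex() {
    local conf="$SPHINX_HOME/etc/sphinx.conf"
    if [ $# -eq 0 ]; then
        set -- --all    # no index named: rebuild everything in the config
    fi
    "$SPHINX_HOME/bin/indexer" --config "$conf" --rotate "$@"
}
```

Typical use is reindex test1, or reindex with no arguments to rebuild everything. For the very first build, before searchd has been started, the plain indexer command above is enough.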

5. Starting the search server

Start:

/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf

Stop:

/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf --stop
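To check from a script whether searchd is up, one option is to test the process recorded in its pid file. The sketch below assumes the pid_file setting in sphinx.conf points at /usr/local/sphinx/var/log/searchd.pid, the same path the init script in section 6 uses; PIDFILE and searchd_running are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helper. PIDFILE is assumed to match pid_file in sphinx.conf.
PIDFILE=${PIDFILE:-/usr/local/sphinx/var/log/searchd.pid}

# searchd_running: exit status 0 if the pid file names a live process,
# non-zero otherwise (missing/unreadable file, bad contents, dead process).
searchd_running() {
    local pid
    [ -r "$PIDFILE" ] || return 1
    pid=$(tr -d '[:space:]' < "$PIDFILE")
    # Reject empty or non-numeric contents before handing them to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 delivers no signal; it only reports whether the process exists
    # and may be signalled by the caller, so run this as root or as the
    # user searchd runs under.
    kill -0 "$pid" 2>/dev/null
}
```

For example: if searchd_running; then echo up; else echo down; fi.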

Query with the search command-line tool:

/usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test

Sphinx 0.9.9-release (r2117)

Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...

index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec

displaying matches:

1. document=1, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015

id=1

group_id=1

group_id2=5

date_added=2015-10-14 15:01:39

title=test one

content=this is my test document number one. also checking search within phrases.

2. document=2, weight=2, group_id=1, date_added=Wed Oct 14 15:01:39 2015

id=2

group_id=1

group_id2=6

date_added=2015-10-14 15:01:39

title=test two

content=this is my test document number two

3. document=4, weight=1, group_id=2, date_added=Wed Oct 14 15:01:39 2015

id=4

group_id=2

group_id2=8

date_added=2015-10-14 15:01:39

title=doc number four

content=this is to test groups

words:

1. 'test': 3 documents, 5 hits
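For a scripted smoke test, the match count can be pulled out of the search tool's summary line. A minimal sketch, assuming the output format shown above (index '...': query '...': returned N matches of M total ...); match_count is a hypothetical name, and output with several summary lines yields one number per line.

```bash
#!/bin/bash
# match_count (hypothetical helper): reads the output of the search tool
# on stdin and prints N from every "...: returned N matches of ..." line.
match_count() {
    sed -n 's/.*: returned \([0-9][0-9]*\) matches of .*/\1/p'
}
```

With the sample data above, /usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test | match_count should print 3.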

6. Sphinx init script

vim /etc/init.d/sphinx

#!/bin/bash
# sphinx: Startup script for Sphinx search
#
# chkconfig: 345 86 14
# description: This is a daemon for high performance full text \
#              search of MySQL and PostgreSQL databases. \
#              See http://www.sphinxsearch.com/ for more info.
#
# processname: searchd
# pidfile: $sphinxlocation/var/log/searchd.pid

# Source function library.
. /etc/rc.d/init.d/functions

processname=searchd
servicename=sphinx
# searchd is started as this user; the account must exist and be able to
# write to $sphinxlocation/var
username=sphinx
sphinxlocation=/usr/local/sphinx
config=$sphinxlocation/etc/sphinx.conf
pidfile=$sphinxlocation/var/log/searchd.pid
searchd=$sphinxlocation/bin/searchd
RETVAL=0
PATH=$PATH:$sphinxlocation/bin

start() {
    echo -n $"Starting Sphinx daemon: "
    # full path plus the same config file used when starting searchd by hand
    daemon --user=$username --check $servicename $searchd --config $config
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
}

stop() {
    echo -n $"Stopping Sphinx daemon: "
    $searchd --config $config --stop
    #killproc -p $pidfile $servicename -TERM
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        rm -f /var/lock/subsys/$servicename
        rm -f $pidfile
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $processname
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if [ -f /var/lock/subsys/$servicename ]; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart}"
        ;;
esac

exit $RETVAL
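The script starts searchd as the user named in username=sphinx, and none of the steps above create that account; without it, service sphinx start fails. A minimal sketch of the missing step, to be run as root before the first start, assuming the standard useradd and chown tools and the /usr/local/sphinx/var layout implied by the pidfile line; prepare_sphinx_user is a hypothetical name.

```bash
#!/bin/bash
# prepare_sphinx_user [USER] [SPHINX_HOME]   (hypothetical helper, run as root)
# Creates the system account the init script runs searchd as, if it does not
# exist yet, and hands it ownership of SPHINX_HOME/var, where with the default
# layout searchd keeps its log and pid files.
prepare_sphinx_user() {
    local user=${1:-sphinx} home=${2:-/usr/local/sphinx}
    if ! id -u "$user" >/dev/null 2>&1; then
        # -r: system account; nologin shell since nobody logs in as it
        useradd -r -s /sbin/nologin -d "$home" "$user" || return 1
    fi
    chown -R "$user": "$home/var"
}
```

With the defaults, prepare_sphinx_user creates the sphinx account (if missing) and chowns /usr/local/sphinx/var to it.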

chmod 755 /etc/init.d/sphinx

chkconfig --add sphinx

chkconfig --level 345 sphinx on # start at boot

service sphinx start # start

service sphinx stop # stop

service sphinx restart # restart

service sphinx status # check whether searchd is running
