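Installing Sphinx
About CSFT
CSFT stands for CoreSeek Fulltext Search Server. Sphinx does not support Chinese indexing and search out of the box; CSFT is a full-text search package built on top of Sphinx and released under the GPLv2. Coreseek (http://www.coreseek.com) provides support services for Sphinx users in China.
At the time of writing, the latest CSFT release is 3.1rc1, dated 2009.6.7. The full release and revision history is at: http://www.coreseek.cn/products/ft_changelog/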
Downloading CSFT
Download page: http://www.coreseek.cn/products/ft_down/
Choose the Source packages: Coreseek Fulltext Server (source code) and Coreseek Mmseg (source code).
Installing and configuring CSFT
1. Install mmseg
- tar zxvf mmseg-3.1.tar.gz
- cd mmseg-3.1
- ./configure --prefix=/usr/local/mmseg
- make && make install
2. Install csft
- tar zxvf csft-3.1.tar.gz
- cd csft-3.1
- ./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql --with-mysql-includes=/usr/local/mysql/include/mysql --with-mysql-libs=/usr/local/mysql/lib/mysql --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg
- make && make install
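Note: if make complains that iconv cannot be found, open the Makefile under src after configure has finished, find the LIBS line, and append -liconv to it. Then run make again.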
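The Makefile edit from the note can also be scripted. This is a minimal sketch, assuming the generated src/Makefile defines its libraries on a line that begins with "LIBS ="; the helper name add_iconv is made up for this example and is not part of CSFT.

```shell
# Hypothetical helper: append -liconv to the LIBS line of a Makefile.
# Assumes the line starts with "LIBS ="; does nothing if -liconv is already there.
add_iconv() {
    makefile="$1"
    if grep -q '^LIBS *=.*-liconv' "$makefile"; then
        return 0
    fi
    # Edit in place, keeping a backup copy as <makefile>.bak.
    sed -i.bak 's/^LIBS *=.*$/& -liconv/' "$makefile"
}
```

From the csft-3.1 directory it would be called as add_iconv src/Makefile, after ./configure and before make.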
3. Build the dictionary
Go back to the mmseg source directory:
- cd data
- mmseg -u unigram.txt
- mv unigram.txt.uni uni.lib
- cp uni.lib /usr/local/sphinx/data/
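- Create mmseg.ini with the following content:
[mmseg]
merge_number_and_ascii=1;
number_and_ascii_joint=-;
compress_space=0;
seperate_number_ascii=1;
# merge_number_and_ascii: whether a run of consecutive letters and digits is kept together instead of being split
# number_and_ascii_joint: characters that may join digits and letters, such as '-' or '.'
# compress_space: currently has no effect
# seperate_number_ascii: whether to split numbers into single digits, e.g. 1988 -> 1/x 9/x 8/x 8/x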
4. Set up the indexes in the Sphinx configuration file. If you use a delta (incremental) index, also create the search_counter table.
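The configuration file and the table definition are not reproduced in these notes, so the following is only a sketch of what a main-plus-delta setup could look like, modelled on the counter-table pattern from the Sphinx documentation. Everything specific in it is an assumption: the documents table and its id, title and content columns, the two columns of search_counter, the database credentials, the index paths, and the Coreseek options charset_type and charset_dictpath pointing at the directory that holds uni.lib. The helper name write_delta_example is also made up.

```shell
# Hypothetical helper: write an example counter table and a main+delta
# Sphinx configuration fragment into the given directory.
write_delta_example() {
    out_dir="$1"

    # Assumed layout of search_counter: one row holding the highest
    # document id that the main index has already covered.
    cat > "$out_dir/search_counter.sql" <<'EOF'
CREATE TABLE search_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
EOF

    # Assumed source table: documents(id, title, content).
    cat > "$out_dir/sphinx-delta-example.conf" <<'EOF'
source main
{
    type          = mysql
    sql_host      = localhost
    sql_user      = test
    sql_pass      =
    sql_db        = test
    sql_query_pre = SET NAMES utf8
    # Remember the highest id at the moment the main index is rebuilt.
    sql_query_pre = REPLACE INTO search_counter SELECT 1, MAX(id) FROM documents
    sql_query     = SELECT id, title, content FROM documents \
        WHERE id <= (SELECT max_doc_id FROM search_counter WHERE counter_id = 1)
}

# The delta source picks up only rows added after the last main rebuild.
# It redefines sql_query_pre so that the counter is not updated again.
source delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query     = SELECT id, title, content FROM documents \
        WHERE id > (SELECT max_doc_id FROM search_counter WHERE counter_id = 1)
}

index main
{
    source           = main
    path             = /usr/local/sphinx/var/data/main
    # Coreseek-specific settings (assumed): Chinese segmentation via mmseg.
    charset_type     = zh_cn.utf-8
    charset_dictpath = /usr/local/sphinx/data/
}

index delta : main
{
    source = delta
    path   = /usr/local/sphinx/var/data/delta
}
EOF
}
```

With a setup along these lines, the main index would be rebuilt occasionally and the delta index frequently, for example with bin/indexer delta --rotate while searchd is running.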
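5. Build the indexes: bin/indexer --all (threadsindex)
6. Test a search: bin/search 'keyword'. You can also skip straight to steps 7 and 8 and test there.
7. Start the searchd service: /usr/local/sphinx/bin/searchd &
8. Put sphinxapi.php and sphinx.php in your application directory and test from there.
9. Start searchd at boot: echo "/usr/local/sphinx/bin/searchd" >> /etc/rc.local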
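Running the echo from step 9 more than once leaves duplicate lines in /etc/rc.local. A small sketch of a guarded version follows; the helper name enable_at_boot is made up for this example.

```shell
# Hypothetical helper: add a command to an rc.local-style file only once.
enable_at_boot() {
    rc_file="$1"
    cmd="$2"
    # -x matches whole lines only, -F treats the command as a fixed string.
    if ! grep -qxF -- "$cmd" "$rc_file" 2>/dev/null; then
        echo "$cmd" >> "$rc_file"
    fi
}
```

It would be called as enable_at_boot /etc/rc.local /usr/local/sphinx/bin/searchd. On systems whose rc.local ends with exit 0, an appended line is never reached, so the command has to be placed above that line by hand.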
Sphinx configuration file and distributed search
Distributed search
index test
{
    type                  = distributed
    agent                 = localhost:3312:index1,index2....
    agent                 = 192.168.1.2:3312:index3,index4.....
    agent_connect_timeout = 1000
    agent_query_timeout   = 3000
}