sphinx是一个基于sql的全文检索引擎,他是SQL Phrase Index的缩写,目前提供对python,php,java,ruby等语言的API支持。他对mysql的支持很好,同时也提供了mysql存储引擎插件(SphinxSE),sphinx中文名称是斯芬克斯,官方网站是http://sphinxsearch.com。
这里先来介绍如何在redhat上安装最新版本的sphinx2.2.11。
我的系统及软件:
操作系统:Red Hat Enterprise Linux 6
数据库版本:Server version: 5.7.18 MySQL Community Server (GPL)
sphinx版本:2.2.11
前提是机器已经安装了mysql,并创建了名为test的数据库,后面会用到。
第一步、获取sphinx,并安装,下载地址:http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
$ wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
$ tar -xzvf sphinx-2.2.11-release.tar.gz
$ mv sphinx-2.2.11-release sphinx-2.2.11
$ cd sphinx-2.2.11
$ ./configure --prefix=/usr/local/sphinx --with-mysql
$ make && make install
以上命令都是在root用户权限下执行,前面几个命令可以不用再root下执行,最后make install一定要root权限。其中./configure可以不用带任何参数,默认--prefix就是/usr/local/sphinx,--with-mysql这个属性默认是enabled,我们可以通过./configure --help查看
--with-mysql compile with MySQL support (default is enabled)
如果是通过yum inistall安装的mysql,那么在./configure的时候会出现一个错误:ERROR: cannot find MySQL include files.
解决办法:yum install mysql-devel
通过这种编译安装成功之后,我们可以看到/usr/local/sphinx目录下会有四个文件夹 ,在bin目录下有indexer searchd等可执行文件
[hadoop@master Downloads]$ cd /usr/local/sphinx/
[hadoop@master sphinx]$ ls
bin etc share var
[hadoop@master sphinx]$ cd bin
[hadoop@master bin]$ ls
indexer indextool searchd spelldump wordbreaker
第二步、配置sphinx
进入/usr/local/sphinx/etc目录,然后将sphinx.conf.dist文件重命名为sphinx.conf,然后修改访问mysql的配置。
第三步:利用测试文件生成mysql测试表;
$ mysql -uroot -proot < /usr/local/sphinx/etc/example.sql
上面的语句会生成两个表documents,tags并向表中插入数据。
[root@master etc]# mysql -uroot -proot < example.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@master etc]# mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| documents |
| tags |
+----------------+
2 rows in set (0.00 sec)
我们可以看一下example.sql的内容:
[hadoop@master sphinx-2.2.11]$ cat /usr/local/sphinx/etc/example.sql
DROP TABLE IF EXISTS test.documents;
CREATE TABLE test.documents
(
id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,
group_id INTEGER NOT NULL,
group_id2 INTEGER NOT NULL,
date_added DATETIME NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL
);
REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
( 1, 1, 5, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
( 2, 1, 6, NOW(), 'test two', 'this is my test document number two' ),
( 3, 2, 7, NOW(), 'another doc', 'this is another group' ),
( 4, 2, 8, NOW(), 'doc number four', 'this is to test groups' );
DROP TABLE IF EXISTS test.tags;
CREATE TABLE test.tags
(
docid INTEGER NOT NULL,
tagid INTEGER NOT NULL,
UNIQUE(docid,tagid)
);
INSERT INTO test.tags VALUES
(1,1), (1,3), (1,5), (1,7),
(2,6), (2,4), (2,2),
(3,15),
(4,7), (4,40);
第四步、生成索引
$ cd /usr/local/sphinx
$ bin/indexer --all
[root@master sphinx]# bin/indexer --all
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.029 sec, 6440 bytes/sec, 133.47 docs/sec
indexing index 'test1stemmed'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.019 sec, 9656 bytes/sec, 200.14 docs/sec
skipping non-plain index 'dist1'...
skipping non-plain index 'rt'...
total 8 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 24 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
第五步、启动服务
$ bin/searchd -c etc/sphinx.conf
[root@master sphinx]# bin/searchd -c etc/sphinx.conf
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'etc/sphinx.conf'...
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'test1'
precaching index 'test1stemmed'
precaching index 'rt'
precached 3 indexes in 0.019 sec
查看端口我们发现开启了9306这个端口在监听搜索服务
第六步、我们可以通过mysql命令连接这个端口,执行相关操作
$ mysql -h 127.0.0.1 -P 9306 //这个操作很奇怪,竟然可以进入mysql环境,但是这肯定是另一个不一样的环境。
接下来我们可以运行官网上的示例:
mysql> show tables;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 1
Current database: *** NONE ***
+--------------+-------------+
| Index | Type |
+--------------+-------------+
| dist1 | distributed |
| rt | rt |
| test1 | local |
| test1stemmed | local |
+--------------+-------------+
4 rows in set (0.03 sec)
mysql> select * from test1;
+------+----------+------------+
| id | group_id | date_added |
+------+----------+------------+
| 1 | 1 | 1493367317 |
| 2 | 1 | 1493367317 |
| 3 | 2 | 1493367317 |
| 4 | 2 | 1493367317 |
+------+----------+------------+
4 rows in set (0.00 sec)
mysql> select * from test1 where match('my document');
+------+----------+------------+
| id | group_id | date_added |
+------+----------+------------+
| 1 | 1 | 1493367317 |
| 2 | 1 | 1493367317 |
+------+----------+------------+
2 rows in set (0.01 sec)
这个示例我并没有看出sphinx作为全文检索的功能,后续继续介绍。