Integrating the Sphinx software
http://www.ibm.com/developerworks/cn/opensource/os-php-sphinxsearch/
To apply Sphinx to a problem, you must define one or more data sources (source) and one or more indexes (index).
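You define sources and indexes in the sphinx.conf file. The source for Body Parts is a MySQL database. Listing 5 shows part of the definition of a source named catalog: the snippet that specifies which database to connect to and how to make the connection (host, socket, user, and password).
Listing 5. Settings for accessing the MySQL database
source catalog
{
    type      = mysql

    sql_host  = localhost
    sql_user  = reaper
    sql_pass  = s3cr3t
    sql_db    = body_parts
    sql_sock  = /var/run/mysqld/mysqld.sock
    sql_port  = 3306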
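Next, create a query that produces the rows to be indexed. Typically, you write a SELECT statement, possibly JOINing several tables, to produce those rows. To get the data in the right form here, create a view that consolidates the tables into a single virtual table.
Listing 6. The Catalog view consolidates data into a virtual table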
CREATE OR REPLACE VIEW Catalog AS
SELECT
    Inventory.id,
    Inventory.partno,
    Inventory.description,
    Assembly.id AS assembly,
    Model.id AS model
FROM
    Assembly, Inventory, Model, Schematic
WHERE
    Schematic.partno_id=Inventory.id
    AND Schematic.model_id=Model.id
    AND Schematic.assembly_id=Assembly.id;
If you create a database named body_parts with the tables and data shown earlier, the Catalog view should look like this:
mysql> use body_parts;
Database changed
mysql> select * from Catalog;
+----+---------+---------------------+----------+-------+
| id | partno  | description         | assembly | model |
+----+---------+---------------------+----------+-------+
|  6 | 765432  | Bolt                |        5 |     1 |
|  8 | ENG088  | Cylinder head       |        5 |     1 |
|  1 | WIN408  | Portal window       |        3 |     1 |
|  5 | WIN958  | Windshield, front   |        3 |     1 |
|  4 | ACC5409 | Cigarette lighter   |        7 |     3 |
|  9 | ENG976  | Large cylinder head |        5 |     3 |
|  8 | ENG088  | Cylinder head       |        5 |     7 |
|  6 | 765432  | Bolt                |        5 |     7 |
+----+---------+---------------------+----------+-------+
8 rows in set (0.00 sec)
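In the view, the id field refers back to the part's entry in the Inventory table. The partno and description columns hold the text to be searched, and the assembly and model columns act as groups for filtering the results further. With the view in place, writing the query for the source is straightforward.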
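To experiment with the view without a MySQL server, the join can be reproduced in an in-memory SQLite database. This is only a sketch under stated assumptions: SQLite stands in for MySQL (so the view uses plain CREATE VIEW with explicit column aliases), the table layouts are a guess that contains just the columns the view touches, and the rows are copied from the output shown above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL body_parts database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas: only the columns the Catalog view refers to.
CREATE TABLE Assembly  (id INTEGER PRIMARY KEY);
CREATE TABLE Model     (id INTEGER PRIMARY KEY);
CREATE TABLE Inventory (id INTEGER PRIMARY KEY, partno TEXT, description TEXT);
CREATE TABLE Schematic (partno_id INTEGER, assembly_id INTEGER, model_id INTEGER);

INSERT INTO Assembly VALUES (3), (5), (7);
INSERT INTO Model    VALUES (1), (3), (7);
INSERT INTO Inventory VALUES
    (1, 'WIN408',  'Portal window'),
    (4, 'ACC5409', 'Cigarette lighter'),
    (5, 'WIN958',  'Windshield, front'),
    (6, '765432',  'Bolt'),
    (8, 'ENG088',  'Cylinder head'),
    (9, 'ENG976',  'Large cylinder head');
INSERT INTO Schematic VALUES
    (6, 5, 1), (8, 5, 1), (1, 3, 1), (5, 3, 1),
    (4, 7, 3), (9, 5, 3), (8, 5, 7), (6, 5, 7);

-- Same join as the Catalog view, with aliases added for SQLite.
CREATE VIEW Catalog AS
SELECT
    Inventory.id          AS id,
    Inventory.partno      AS partno,
    Inventory.description AS description,
    Assembly.id           AS assembly,
    Model.id              AS model
FROM Assembly, Inventory, Model, Schematic
WHERE Schematic.partno_id = Inventory.id
  AND Schematic.model_id = Model.id
  AND Schematic.assembly_id = Assembly.id;
""")

# Sort explicitly so the row order is deterministic.
rows = conn.execute(
    "SELECT id, partno, description, assembly, model "
    "FROM Catalog ORDER BY model, id"
).fetchall()

for row in rows:
    print(row)
```

Each Schematic row yields exactly one Catalog row, which is why a part used in two models (such as ENG088) appears twice in the view.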
Listing 7. The query that creates the rows to be indexed
    # indexer query
    # document_id MUST be the very first field
    # document_id MUST be positive (non-zero, non-negative)
    # document_id MUST fit into 32 bits
    # document_id MUST be unique
    sql_query = \
        SELECT \
            id, partno, description, \
            assembly, model \
        FROM \
            Catalog;

    sql_group_column = assembly
    sql_group_column = model

    # document info query
    # ONLY used by search utility to display document information
    # MUST be able to fetch document info by its id, therefore
    # MUST contain '$id' macro
    #
    sql_query_info = SELECT * FROM Inventory WHERE id=$id
}
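The sql_query must return the document ID as its first field, followed by every field you want to index or use as a group. The two sql_group_column entries declare that assembly and model can be used to filter results. The search utility uses sql_query_info to look up and display matching records; in that query, the $id macro is replaced with the ID of each matching document.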
The last configuration step is to build an index. Listing 8 shows an index for the catalog source.
Listing 8. One possible index for the catalog source
index catalog
{
    source         = catalog
    path           = /var/data/sphinx/catalog
    morphology     = stem_en

    min_word_len   = 3
    min_prefix_len = 0
    min_infix_len  = 3
}
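The source setting points to the named source in the sphinx.conf file. The path setting defines where the index data is stored; by convention, Sphinx indexes are kept in /var/data/sphinx. The morphology setting enables English stemming. The last three settings tell the indexer to index only words of three or more characters and, with prefix indexing turned off (min_prefix_len = 0), to create infix indexes for every substring of three or more characters in those words. (For easy reference, Listing 9 shows the complete sample sphinx.conf file for Body Parts.)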
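To make the effect of min_infix_len = 3 concrete, the sketch below enumerates every substring of a word that is at least three characters long. It is purely an illustration of what "infix" means, not a description of how Sphinx builds its index internally; the function name infixes is made up for this example.

```python
def infixes(word, min_infix_len=3):
    """Return all substrings of `word` with at least `min_infix_len` characters.

    Illustration only: this mimics the idea behind infix indexing,
    not Sphinx's actual implementation.
    """
    word = word.lower()
    found = set()
    for start in range(len(word)):
        # The shortest substring starting here ends at start + min_infix_len.
        for end in range(start + min_infix_len, len(word) + 1):
            found.add(word[start:end])
    return sorted(found)


print(infixes("window"))
```

Because "wind" is among the substrings of "window", a query for wind can match a description such as "Portal window", while a two-letter query such as "wi" falls below the minimum length.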
Listing 9. Sample sphinx.conf for Body Parts
source catalog
{
    type      = mysql

    sql_host  = localhost
    sql_user  = reaper
    sql_pass  = s3cr3t
    sql_db    = body_parts
    sql_sock  = /var/run/mysqld/mysqld.sock
    sql_port  = 3306

    # indexer query
    # document_id MUST be the very first field
    # document_id MUST be positive (non-zero, non-negative)
    # document_id MUST fit into 32 bits
    # document_id MUST be unique
    sql_query = \
        SELECT \
            id, partno, description, \
            assembly, model \
        FROM \
            Catalog;

    sql_group_column = assembly
    sql_group_column = model

    # document info query
    # ONLY used by search utility to display document information
    # MUST be able to fetch document info by its id, therefore
    # MUST contain '$id' macro
    #
    sql_query_info = SELECT * FROM Inventory WHERE id=$id
}

index catalog
{
    source         = catalog
    path           = /var/data/sphinx/catalog
    morphology     = stem_en

    min_word_len   = 3
    min_prefix_len = 0
    min_infix_len  = 3
}

searchd
{
    port      = 3312
    log       = /var/log/searchd/searchd.log
    query_log = /var/log/searchd/query.log
    pid_file  = /var/log/searchd/searchd.pid
}
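The searchd section at the bottom configures the searchd daemon itself: the port it listens on and the locations of its log, query log, and PID file.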
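Building and testing the index
You are now ready to build the index for the Body Parts application. To do so, follow these steps:
- Create the directory structure /var/data/sphinx by typing $ sudo mkdir -p /var/data/sphinx.
- Assuming MySQL is running, run the indexer to create the index, using the code shown below.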
Listing 10. Creating the index
$ sudo /usr/local/bin/indexer --config /usr/local/etc/sphinx.conf --all
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
indexing index 'catalog'...
collected 8 docs, 0.0 MB
sorted 0.0 Mhits, 82.8% done
total 8 docs, 149 bytes
total 0.010 sec, 14900.00 bytes/sec, 800.00 docs/sec
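The --all argument rebuilds every index listed in sphinx.conf. If you do not need to rebuild all of them, you can use other arguments to rebuild only some.
- You can now test the index with the search utility, using the code shown below (searchd does not need to be running for you to use search).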
Listing 11. Testing the index with search
$ /usr/local/bin/search --config /usr/local/etc/sphinx.conf ENG
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

index 'catalog': query 'ENG ': returned 2 matches of 2 total in 0.000 sec

displaying matches:
1. document=8, weight=1, assembly=5, model=7
        id=8
        partno=ENG088
        description=Cylinder head
        price=55
2. document=9, weight=1, assembly=5, model=3
        id=9
        partno=ENG976
        description=Large cylinder head
        price=65

words:
1. 'eng': 2 documents, 2 hits

$ /usr/local/bin/search --config /usr/local/etc/sphinx.conf wind
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

index 'catalog': query 'wind ': returned 2 matches of 2 total in 0.000 sec

displaying matches:
1. document=1, weight=1, assembly=3, model=1
        id=1
        partno=WIN408
        description=Portal window
        price=423
2. document=5, weight=1, assembly=3, model=1
        id=5
        partno=WIN958
        description=Windshield, front
        price=500

words:
1. 'wind': 2 documents, 2 hits

$ /usr/local/bin/search \
  --config /usr/local/etc/sphinx.conf --filter model 3 ENG
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

index 'catalog': query 'ENG ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=9, weight=1, assembly=5, model=3
        id=9
        partno=ENG976
        description=Large cylinder head
        price=65

words:
1. 'eng': 2 documents, 2 hits
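The first command, /usr/local/bin/search --config /usr/local/etc/sphinx.conf ENG, found two results containing ENG in the part number. The second command, /usr/local/bin/search --config /usr/local/etc/sphinx.conf wind, found the substring wind in two part descriptions. The third command restricted the results to entries whose model is 3.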
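The three searches can be approximated with a few lines of Python over the Catalog rows. This is a rough model, assuming that matching is a case-insensitive substring test on partno and description and that the filter is an equality test on the model attribute; it ignores stemming, minimum word length, and weighting, and the function name search is hypothetical.

```python
# Rows copied from the Catalog view: (id, partno, description, assembly, model).
CATALOG = [
    (6, "765432", "Bolt", 5, 1),
    (8, "ENG088", "Cylinder head", 5, 1),
    (1, "WIN408", "Portal window", 3, 1),
    (5, "WIN958", "Windshield, front", 3, 1),
    (4, "ACC5409", "Cigarette lighter", 7, 3),
    (9, "ENG976", "Large cylinder head", 5, 3),
    (8, "ENG088", "Cylinder head", 5, 7),
    (6, "765432", "Bolt", 5, 7),
]


def search(term, model=None):
    """Return the sorted, unique document IDs whose text contains `term`.

    If `model` is given, keep only rows whose model attribute equals it.
    """
    term = term.lower()
    hits = set()
    for doc_id, partno, description, _assembly, row_model in CATALOG:
        text_matches = term in partno.lower() or term in description.lower()
        if text_matches and (model is None or row_model == model):
            hits.add(doc_id)
    return sorted(hits)


print(search("ENG"))
print(search("wind"))
print(search("ENG", model=3))
```

The full-text term selects candidate documents, and the attribute filter then narrows that set without affecting how the text is matched.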
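Writing the code
Finally, you can write PHP code to call the Sphinx search engine. The Sphinx PHP API is small and easy to learn. Listing 12 is a small PHP application that calls searchd to get the same result as the last command shown above ("find all parts with 'cylinder' in the name that belong to model 3").
Listing 12. Calling the Sphinx search engine from PHP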
To test the code, create the log directory for Sphinx, start searchd, and then run the PHP application, as shown below:
Listing 13. The PHP application
$ sudo mkdir -p /var/log/searchd
$ sudo /usr/local/bin/searchd --config /usr/local/etc/sphinx.conf
$ php search.php
9
Array
(
    [fields] => Array
        (
            [0] => partno
            [1] => description
        )

    [attrs] => Array
        (
            [assembly] => 1
            [model] => 1
        )

    [matches] => Array
        (
            [9] => Array
                (
                    [weight] => 1
                    [attrs] => Array
                        (
                            [assembly] => 5
                            [model] => 3
                        )

                )

        )

    [total] => 1
    [total_found] => 1
    [time] => 0.000
    [words] => Array
        (
            [cylind] => Array
                (
                    [docs] => 2
                    [hits] => 2
                )

        )

)
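The output is 9, the one row that matched, followed by $result, the results returned by Sphinx, dumped with PHP's print_r() function.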
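The shape of the result is easier to see when it is walked programmatically. The sketch below uses a Python dictionary as a stand-in for the PHP array, with the values copied from the dump above; the helper name summarize is invented for this illustration and is not part of any Sphinx API.

```python
# Python stand-in for the PHP $result array shown in Listing 13.
result = {
    "fields": ["partno", "description"],
    "attrs": {"assembly": 1, "model": 1},
    "matches": {
        9: {"weight": 1, "attrs": {"assembly": 5, "model": 3}},
    },
    "total": 1,
    "total_found": 1,
    "time": "0.000",
    "words": {"cylind": {"docs": 2, "hits": 2}},
}


def summarize(result):
    """Return (document ID, assembly, model) for every match in the result."""
    return [
        (doc_id, match["attrs"]["assembly"], match["attrs"]["model"])
        for doc_id, match in result["matches"].items()
    ]


for doc_id, assembly, model in summarize(result):
    print(f"document {doc_id}: assembly={assembly}, model={model}")
```

The keys of matches are the document IDs, so an application would typically use them to fetch the full records from the database.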
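A caveat: total_found is the total number of matches found in the index, whereas the number of matches actually returned by a call can be smaller. The two can differ because SetLimits() lets you change how many matches are returned per call and which batch of matches to return, which is useful for paginating a long list of results. One example of pagination is to call $cl->SetLimits( ( $page - 1 ) * SPAN, SPAN ), which returns the batch of SPAN matches that belongs to the page being displayed.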
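The arithmetic behind that call can be checked in isolation. This is a minimal sketch, assuming pages are numbered from 1 and that SPAN is a page size chosen by the application (20 here is an arbitrary value); page_limits is a hypothetical helper that only computes the two arguments and does not talk to Sphinx.

```python
SPAN = 20  # hypothetical page size; pick whatever suits the application


def page_limits(page, span=SPAN):
    """Return the (offset, limit) pair for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * span, span


for page in (1, 2, 3):
    offset, limit = page_limits(page)
    print(f"page {page}: offset={offset}, limit={limit}")
```

Page 1 starts at offset 0, so the first SPAN matches are returned; each later page skips the matches already shown on the pages before it.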
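Conclusion
Sphinx has many more features to explore. I have only scratched the surface here, but you now have a working, real-world example to serve as a foundation for expanding your skills.
Study the sample Sphinx configuration file that ships with the distribution, /usr/local/etc/sphinx.conf.dist. Its comments explain what each Sphinx parameter does, show how to create a distributed, redundant configuration, and explain how to inherit base settings to avoid repetition in sources and indexes. The Sphinx README file is also a rich source of information, including how to embed Sphinx directly in MySQL V5, with no daemon required.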