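Over the past few years the Linux+Nginx+PHP+MongoDB (LNPM) stack has become increasingly popular, to the point where it looks set to replace the Linux+Nginx/Apache+PHP+MySQL stack. The reason is that MongoDB is powerful, flexible, easy to scale and, above all, easy to use. There is no table schema to design up front, you can insert documents of any shape, and administration is simple. That has made it a first-choice database for startup teams and a rising star of the mobile Internet.
MongoDB does share some limitations with relational databases, however. For example, its full-text index does not support Chinese. Full-text indexing is enabled by default starting with MongoDB 2.6, but Chinese is still not supported, so if you need search over Chinese text you have to look elsewhere.
Sphinx and Lucene are both good choices for building a search engine. In my view Lucene is better supported from Java and Sphinx from PHP, so I chose Sphinx. Sphinx on its own does not handle Chinese well either: it splits words on whitespace, which works for English but not for Chinese, where words are not separated by spaces. Fortunately there are two Sphinx-based packages that add Chinese support: Coreseek and Sphinx-for-chinese.
Coreseek has complete documentation and currently supports the latest Sphinx release, so I chose Coreseek.
Sphinx-for-chinese, by contrast, is badly short of documentation.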
Installation:
(1) Installing Coreseek.
(2) Installing Sphinx-for-chinese.
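Creating the index:
Coreseek can connect to MySQL directly: put the MySQL connection details in the Coreseek configuration file and Coreseek reads the MySQL data to build the index (provided you have configured index generation or run the indexer command). Sphinx has no direct MongoDB connector, though, so the MongoDB data has to be exposed either as a Python data source or as an xmlpipe2 data source.
I don't know Python, so I wrote an XML pipe in PHP that streams the MongoDB data to Coreseek. Reference code: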
class SphinxXmlpipe {
    private $xmlWriter;
    private $fields = array();
    private $attributes = array();
    private $documents = array();

    public function setFields($fields) {
        $this->fields = $fields;
    }

    public function setAttributes($attributes) {
        $this->attributes = $attributes;
    }

    public function beginOutput() {
        // Create a new XML document in memory.
        $this->xmlWriter = new \XMLWriter();
        $this->xmlWriter->openMemory();
        $this->xmlWriter->setIndent(true);
        $this->xmlWriter->startDocument('1.0', 'UTF-8');
        $this->xmlWriter->startElement('sphinx:docset');
        $this->xmlWriter->startElement('sphinx:schema');
        // Add the full-text fields to the schema.
        foreach ($this->fields as $field) {
            $this->xmlWriter->startElement('sphinx:field');
            $this->xmlWriter->writeAttribute('name', $field);
            $this->xmlWriter->endElement();
        }
        /*
        // Add the attributes to the schema (disabled).
        foreach ($this->attributes as $attributes) {
            $this->xmlWriter->startElement('sphinx:attr');
            foreach ($attributes as $key => $value) {
                $this->xmlWriter->writeAttribute($key, $value);
            }
            $this->xmlWriter->endElement();
        }
        */
        $this->xmlWriter->endElement(); // sphinx:schema
    }

    public function addDocument($doc) {
        $this->xmlWriter->startElement('sphinx:document');
        $this->xmlWriter->writeAttribute('id', (string)$doc['book_id']);
        // Every key of the document becomes a child element.
        foreach ($doc as $key => $value) {
            $this->xmlWriter->startElement($key);
            $this->xmlWriter->text((string)$value);
            $this->xmlWriter->endElement();
        }
        $this->xmlWriter->endElement(); // sphinx:document
    }

    public function endOutput() {
        $this->xmlWriter->endElement(); // sphinx:docset
        $this->xmlWriter->endDocument();
        echo $this->xmlWriter->outputMemory();
    }

    public function xmlpipe2() {
        $this->setFields(array(
            'book_id',
            'book_name',
        ));
        $this->setAttributes(array(
            array(
                'name' => 'book_id',
                'type' => 'int',
                'bits' => '16',
                'default' => '1',
            ),
        ));
        $this->beginOutput();
        // D() and C() are ThinkPHP helpers (model factory and config lookup).
        $mBook = D('book');
        $count = (int)$mBook->count();
        // Guard against a zero or missing batch size, which would loop forever.
        $limit = max(1, (int)C('XMLPIPE_BOOKS_COUNT_PER_TIME'));
        // Read the collection in batches. The offset advances on every pass so
        // that no batch is read twice; this assumes ThinkPHP's
        // limit($offset, $length) form.
        for ($offset = 0; $offset < $count; $offset += $limit) {
            $books = $mBook->field('book_id,book_name','_id=>0')->limit($offset, $limit)->select();
            foreach ($books as $book) {
                $this->addDocument($book);
            }
            unset($books);
        }
        $this->endOutput();
    }
}
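Reading a large collection in batches only works if each batch starts where the previous one ended. The following is a minimal sketch of that offset arithmetic in plain PHP. `batchRanges` is a hypothetical helper name and is not part of ThinkPHP or Coreseek; each pair it yields could be passed to a query's skip and limit.

```php
<?php
// Hypothetical helper: yields [offset, length] pairs that together cover
// $count rows in batches of at most $limit rows.
function batchRanges(int $count, int $limit): Generator
{
    if ($limit < 1) {
        throw new InvalidArgumentException('limit must be at least 1');
    }
    for ($offset = 0; $offset < $count; $offset += $limit) {
        // The last batch may be shorter than $limit.
        yield [$offset, min($limit, $count - $offset)];
    }
}

// Usage with made-up numbers: 7 rows read 3 at a time.
foreach (batchRanges(7, 3) as [$offset, $length]) {
    echo "offset=$offset length=$length\n";
}
// prints:
// offset=0 length=3
// offset=3 length=3
// offset=6 length=1
```

A single loop like this also covers the case where the row count is smaller than the batch size, so no separate branch for the remainder is needed.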
The generated XML has the format shown below:
Figure 1
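To see the shape of an xmlpipe2 stream without a database, here is a small standalone sketch that builds a docset for two made-up books with `XMLWriter`. The sample rows and the `buildDocset` function name are assumptions for illustration only; the element names (`sphinx:docset`, `sphinx:schema`, `sphinx:field`, `sphinx:document`) are the ones used by the class above.

```php
<?php
// Builds a minimal xmlpipe2 docset: a schema with one full-text field,
// followed by one sphinx:document element per book.
function buildDocset(array $books): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('sphinx:docset');

    $w->startElement('sphinx:schema');
    $w->startElement('sphinx:field');
    $w->writeAttribute('name', 'book_name');
    $w->endElement(); // sphinx:field
    $w->endElement(); // sphinx:schema

    foreach ($books as $book) {
        $w->startElement('sphinx:document');
        // The document id must be a unique positive integer.
        $w->writeAttribute('id', (string)$book['book_id']);
        $w->startElement('book_name');
        $w->text($book['book_name']);
        $w->endElement(); // book_name
        $w->endElement(); // sphinx:document
    }

    $w->endElement(); // sphinx:docset
    $w->endDocument();
    return $w->outputMemory();
}

// Hypothetical sample rows; real rows would come from MongoDB.
echo buildDocset([
    ['book_id' => 1, 'book_name' => '三国演义'],
    ['book_id' => 2, 'book_name' => 'Learning PHP'],
]);
```

Running the script from the command line prints the XML to standard output, which is where the indexer reads it from when the script is used as an `xmlpipe_command`.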
The corresponding Coreseek settings, for reference:
source src1
{
    type = xmlpipe2
    xmlpipe_command = cd /var/www/PHPParser && php index.php /Home/SphinxXmlpipe/xmlpipe2
    xmlpipe_field = book_id
    xmlpipe_field = book_name
    # book_id is a numeric id, not a timestamp, so it is declared once, as an unsigned integer attribute.
    xmlpipe_attr_uint = book_id
    xmlpipe_fixup_utf8 = 1
}
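Searching:
(1) PHP provides a Sphinx extension, which also works with Coreseek.
(2) The Sphinx package ships sphinxapi in its api directory.
I used the PHP extension.
Sphinx search code, for reference: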
public function getResultBySearchText($search_text) {
    $sphinxClient = new \SphinxClient();
    $sphinxClient->setServer('localhost', 9312); // searchd host and port
    $sphinxClient->setMatchMode(SPH_MATCH_ANY);  // match any of the query words
    $sphinxClient->setMaxQueryTime(5000);        // cap the query time at 5000 ms (5 seconds)
    $result = $sphinxClient->query($search_text);
    // query() returns false on failure, so check before reading the result.
    if ($result === false) {
        return array('time' => null, 'error' => $sphinxClient->getLastError());
    }
    $rel = array('time' => $result['time']);
    if (isset($result['matches'])) {
        $rel['matches'] = $result['matches'];
    }
    return $rel;
}
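Because the data source is xmlpipe, the search returns document ids rather than full documents, so you still need to fetch the data from MongoDB by id. I won't cover how to read the data from MongoDB here.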
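As a rough sketch of that last step, the helpers below turn a search result into a list of ids, build an `$in` filter for them, and restore the ranking order afterwards, since an `$in` query does not return documents in the order of the id list. All three function names are hypothetical. The code assumes the `matches` array is keyed by document id (the PHP extension's default, as far as I know) and that `book_id` is the lookup field; the actual MongoDB call is left out.

```php
<?php
// Document ids in rank order, or an empty list when nothing matched.
function extractDocIds(array $result): array
{
    return isset($result['matches']) ? array_keys($result['matches']) : [];
}

// Hypothetical filter to pass to a find() on the book collection.
function buildIdFilter(array $ids): array
{
    return ['book_id' => ['$in' => array_values($ids)]];
}

// Puts fetched documents back into the order of $ids; ids with no
// matching document are skipped.
function orderByRank(array $ids, array $docs): array
{
    $byId = [];
    foreach ($docs as $doc) {
        $byId[$doc['book_id']] = $doc;
    }
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Usage with made-up data standing in for Sphinx and MongoDB responses.
$result = ['time' => '0.001', 'matches' => [7 => ['weight' => 2], 3 => ['weight' => 1]]];
$ids = extractDocIds($result);          // [7, 3]
$filter = buildIdFilter($ids);          // ['book_id' => ['$in' => [7, 3]]]
$fetched = [
    ['book_id' => 3, 'book_name' => 'B'],
    ['book_id' => 7, 'book_name' => 'A'],
];
$books = orderByRank($ids, $fetched);   // book 7 first, then book 3
```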