PHP中使用Sphinx

一、简介 (https://www.cnblogs.com/cnsanshao/p/7003960.html)

Sphinx是由俄罗斯人Andrew Aksyonoff开发的一个全文检索引擎。意图为其他应用提供高速、低空间占用、高结果 相关度的全文搜索功能。Sphinx可以非常容易的与SQL数据库和脚本语言集成。当前系统内置MySQL和PostgreSQL 数据库数据源的支持,也支持从标准输入读取特定格式 的XML数据。
Sphinx创建索引的速度为:创建100万条记录的索引只需3~4分钟,创建1000万条记录的索引可以在50分钟内完成,而只包含最新10万条记录的增量索引,重建一次只需几十秒。
Sphinx的特性如下:
a)  高速的建立索引(在当代CPU上,峰值性能可达到10 MB/秒);
b)  高性能的搜索(在2 – 4GB 的文本数据上,平均每次检索响应时间小于0.1秒);
c)  可处理海量数据(目前已知可以处理超过100 GB的文本数据, 在单一CPU的系统上可处理100 M 文档);
d)  提供了优秀的相关度算法,基于短语相似度和统计(BM25)的复合Ranking方法;
e)  支持分布式搜索;
f)  支持短语搜索
g)  提供文档摘要生成
h)  可作为MySQL的存储引擎提供搜索服务;
i)  支持布尔、短语、词语相似度等多种检索模式;
j)  文档支持多个全文检索字段(最大不超过32个);
k)  文档支持多个额外的属性信息(例如:分组信息,时间戳等);
l)  支持断词;
虽然mysql的MYISAM提供全文索引,但是性能却不敢让人恭维

 

Sphinx是不负责文本字段的存储。

假设将数据库的id、date、title、body字段,用Sphinx建立搜索索引。根据关键字、时间、类别、范围等信息查询一下Sphinx,Sphinx只会将查询结果得到的ID号等非文本信息。

如果要显示title、body等信息,我们还需要根据此ID号去查询MySQL数据库,或者从Memcachedb等其他的存储中取得。安装SphinxSE作为MySQL的存储引擎,将MySQL与Sphinx结合起来,是一种便捷的方法。

 

二、 环境搭建一 (http://blog.csdn.net/huang2017/article/details/69665057,https://blog.csdn.net/nuli888/article/details/51892776)

1、Windows环境下:

    1)Sphinx下载地址 http://sphinxsearch.com/downloads/release/ 这里根据自己的情况选择相应的版本下载即可

    2)这里我下载的是sphinx-2.2.11-release-win64.zip,将下载的文件解压。(选择解压到 sphinx-2.2.11-release-win64选项),解压后将sphinx-2.2.11-release-win64文件夹重命名为sphinx(方便后续操作)

    3)将上一步得到的sphinx文件夹剪切或复制到某个磁盘下(方便后续操作),这里我放到F盘下

    4)将F:\sphinx\sphinx.conf.in复制到F:\sphinx\bin\下,并重命名为sphinx.conf;

    5)修改F:\sphinx\bin\sphinx.conf文件的数据库配置,这里根据自己的情况进行配置即可,以下配置项若前面有‘#’注释的,将‘#’去掉,下同

type = mysql
sql_host = localhost
sql_user = root
#笔者数据库用户密码为空,这里就什么都没有填
sql_pass =
sql_db = blog
sql_port = 3306
sql_query_pre = SET NAMES utf-8

    6)修改F:\sphinx\bin\sphinx.conf文件的index test1配置

#存放索引的目录
path = F:/sphinx/data/
#添加utf-8编码表
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
#简单分词,搜索中文必须设置为1(只有0和1两个选项)
ngram_len = 1
# 需要分词的字符,搜索中文时必须
ngram_chars = U+3000..U+2FA1F
#指定以下三个文件路径
pid_file=F:/sphinx/log/searchd.pid
log=F:/sphinx/log/searchd.log
query_log=F:/sphinx/log/query.log

    7)导入数据,控制台下输入命令 mysql -uroot blog<F:/sphinx/example.sql,执行成功后在blog数据中中会新增两个表documents和tags
    说明:
        (1)由于我的数据库名称叫blog,因此这里我写的是blog,blog数据库编码设置为utf8
        (2)example.sql中默认的数据库名为test,在导入前修改为统一替换为blog即可,若本身就是在test数据库下测试,就不用修改了
    8)打开cmd窗口,进入目录F:\sphinx\bin
    9)建立索引,执行indexer.exe test1
    10)搜索关键字test(test是导入的documents表中已经存在的关键字),执行命令searchd.exe --pidfile,出现如下图所示界面表示sphinx服务启动成功,关闭该窗口,sphinx服务就停止运行,若要在cmd执行其他操作,可重新开一个

    0475c2a78c9a56b0c027eab1e0fb5e1174a.jpg

2、PHP中调用Sphinx有两种方式

方式一:使用Sphinx官方提供的api文档,引入即可include 'sphinxapi.php'; 

方式二:

(1)在这里 https://pecl.php.net/package/sphinx/1.3.2/windows 下载Sphinx模块(根据自身环境选择相应版本,可以通过phpinfo进行查看,如下图所示)

2df9f000e5756ac03cb7d0a13e18817b8e9.jpg

(2)下载后解压,并将解压得到的php_sphinx.dll文件放到php的ext目录下

(3)修改php.ini文件(根据上面phpinfo中的Loaded Configuration File一栏找的php.ini的位置),添加一行extension=php_sphinx.dll

(4)再次查看phpinfo,出现sphinx扩展(如下图),表示安装成功,可以使用了

e5fd85d57889f3a29107b165185f7aa15a2.jpg

3、使用,代码如下

<?php
$keyword = $_POST['keyword'];
//$keyword = '航母';
$sphinx = new SphinxClient();
$sphinx->SetServer('localhost',9312);
$sphinx->setMatchMode(SPH_MATCH_ANY);//匹配模式 SPH_MATCH_ALL:完全匹配
$result = $sphinx->query($keyword,'*');//*表示在所有索引里面进行搜索
$ids = implode(',',array_keys($result['matches']));

$conn = mysqli_connect('localhost','root','');
mysqli_query($conn,'set names utf8');
mysqli_select_db($conn,'blog');

$sql = "select * from blog_articles where id in (".$ids.")";
$rst = mysqli_query($conn,$sql);
//给匹配关键字添加样式
$opts = array(
'before_match'=>'<font style="font-weight:bold;color:#f00;">',
'after_match'=>'</font>'
);
echo '<pre>';
while($row = mysqli_fetch_assoc($rst)){
$row2 = $sphinx->buildExcerpts($row,'test1',$keyword,$opts);//test1 配置文件中的主数据源索引
print_r($row2); 
}
echo '</pre>';
?>

 

三、 环境搭建二

1、Centos6.5+php5.6+apache+mysql

    1)安装依赖包

yum -y install make gcc g++ gcc-c++ libtool autoconf automake imake php-devel mysql-devel libxml2-devel expat-devel

    2)安装Sphinx

yum install expat expat-devel
wget -c http://sphinxsearch.com/files/sphinx-2.0.7-release.tar.gz
tar zxvf sphinx-2.0.7-release.tar.gz
cd sphinx-2.0.7-release
./configure --prefix=/usr/local/sphinx  --with-mysql --with-libexpat --enable-id64
make && make install

    3)安装libsphinxclient,php扩展用到

cd api/libsphinxclient
./configure --prefix=/usr/local/sphinx/libsphinxclient
make && make install

    4)安装Sphinx的PHP扩展:我的是5.6需装sphinx-1.3.3.tgz,如果是php5.4以下可sphinx-1.3.0.tgz

wget -c http://pecl.php.net/get/sphinx-1.3.3.tgz
tar zxvf sphinx-1.3.3.tgz
cd sphinx-1.3.3
phpize
./configure --with-sphinx=/usr/local/sphinx/libsphinxclient/ --with-php-config=/usr/bin/php-config
make && make install
成功后会提示:
Installing shared extensions:     /usr/lib64/php/modules/
 
 
echo "[Sphinx]" >> /etc/php.ini
echo "extension = sphinx.so" >> /etc/php.ini
#重启apache
service httpd restart

    5)创建测试数据

CREATE TABLE IF NOT EXISTS `items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `content` text NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COMMENT='全文检索测试的数据表' AUTO_INCREMENT=11 ;
 
INSERT INTO `items` (`id`, `title`, `content`, `created`) VALUES
(1, 'linux mysql集群安装', 'MySQL Cluster 是MySQL 适合于分布式计算环境的高实用、可拓展、高性能、高冗余版本', '2016-09-07 00:00:00'),
(2, 'mysql主从复制', 'mysql主从备份(复制)的基本原理 mysql支持单向、异步复制,复制过程中一个服务器充当主服务器,而一个或多个其它服务器充当从服务器', '2016-09-06 00:00:00'),
(3, 'hello', 'can you search me?', '2016-09-05 00:00:00'),
(4, 'mysql', 'mysql is the best database?', '2016-09-03 00:00:00'),
(5, 'mysql索引', '关于MySQL索引的好处,如果正确合理设计并且使用索引的MySQL是一辆兰博基尼的话,那么没有设计和使用索引的MySQL就是一个人力三轮车', '2016-09-01 00:00:00'),
(6, '集群', '关于MySQL索引的好处,如果正确合理设计并且使用索引的MySQL是一辆兰博基尼的话,那么没有设计和使用索引的MySQL就是一个人力三轮车', '0000-00-00 00:00:00'),
(9, '复制原理', 'redis也有复制', '0000-00-00 00:00:00'),
(10, 'redis集群', '集群技术是构建高性能网站架构的重要手段,试想在网站承受高并发访问压力的同时,还需要从海量数据中查询出满足条件的数据,并快速响应,我们必然想到的是将数据进行切片,把数据根据某种规则放入多个不同的服务器节点,来降低单节点服务器的压力', '0000-00-00 00:00:00');
 
CREATE TABLE IF NOT EXISTS `sph_counter` (
  `counter_id` int(11) NOT NULL,
  `max_doc_id` int(11) NOT NULL,
  PRIMARY KEY (`counter_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='增量索引标示的计数表';

    以下采用"Main + Delta" ("主索引"+"增量索引")的索引策略,使用Sphinx自带的一元分词。

    6)Sphinx配置:注意修改数据源配置信息

vi /usr/local/sphinx/etc/sphinx.conf
source items {
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = 123456
    sql_db = sphinx_items
    
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type = OFF
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM items
    
    sql_query_range = SELECT MIN(id), MAX(id) FROM items \
                            WHERE id<=(SELECT max_doc_id FROM sph_counter WHERE counter_id=1)
    sql_range_step = 1000
    sql_ranged_throttle = 1000
 
    sql_query = SELECT id, title, content, created, 0 as deleted FROM items \
                    WHERE id<=(SELECT max_doc_id FROM sph_counter WHERE counter_id=1) \
                                AND id >= $start AND id <= $end  
    
    sql_attr_timestamp = created
    sql_attr_bool = deleted
}
 
source items_delta : items {
    sql_query_pre = SET NAMES utf8
    sql_query_range = SELECT MIN(id), MAX(id) FROM items \
                            WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id=1)
    sql_query = SELECT id, title, content, created, 0 as deleted FROM items \
                    WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
                        AND id >= $start AND id <= $end
    sql_query_post_index = set @max_doc_id :=(SELECT max_doc_id FROM sph_counter WHERE counter_id=1)
    sql_query_post_index = REPLACE INTO sph_counter SELECT 2, IF($maxid, $maxid, @max_doc_id)
}
#主索引
index items {
    source = items
    path = /usr/local/sphinx/var/data/items
    docinfo = extern
    morphology = none
    min_word_len = 1
    min_prefix_len = 0
    html_strip = 1
    html_remove_elements = style, script
    ngram_len = 1
    ngram_chars = U+3000..U+2FA1F
    charset_type = utf-8
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    preopen = 1   
	min_infix_len = 1
}
 
#增量索引
index items_delta : items {
    source = items_delta
    path = /usr/local/sphinx/var/data/items-delta
}
 
#分布式索引
index master {
    type = distributed
    local = items
    local = items_delta
}
 
indexer {
    mem_limit = 256M
}
 
searchd {
    listen                   = 9312
    listen                   = 9306:mysql41 #Used for SphinxQL
    log                      = /usr/local/sphinx/var/log/searchd.log
    query_log          = /usr/local/sphinx/var/log/query.log
    compat_sphinxql_magics   = 0
    attr_flush_period        	     = 600
    mva_updates_pool   = 16M
    read_timeout           = 5
    max_children           = 0
    dist_threads             = 2 
    pid_file                    = /usr/local/sphinx/var/log/searchd.pid
    max_matches          = 1000
    seamless_rotate       = 1
    preopen_indexes     = 1
    unlink_old               = 1
    workers                  = threads # for RT to work
    binlog_path            = /usr/local/sphinx/var/data
	
}

    保存退出

    7)Sphinx创建索引

#第一次需重建索引:
[root@localhost bin]# ./indexer -c /usr/local/sphinx/etc/sphinx.conf --all
Sphinx 2.0.7-id64-release (r3759)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
 
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'items'...
collected 8 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 8 docs, 1121 bytes
total 1.017 sec, 1101 bytes/sec, 7.86 docs/sec
indexing index 'items_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 1.007 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'master'...
total 4 reads, 0.000 sec, 0.7 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.001 sec, 0.5 kb/call avg, 0.1 msec/call avg
#启动sphinx
[root@localhost bin]# ./searchd -c /usr/local/sphinx/etc/sphinx.conf
Sphinx 2.0.7-id64-release (r3759)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
 
using config file '/usr/local/sphinx/etc/sphinx.conf'...
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'items'
precaching index 'items_delta'
rotating index 'items_delta': success
precached 2 indexes in 0.012 sec
#查看进程
[root@localhost bin]# ps -ef | grep searchd
root     30431     1  0 23:59 ?        00:00:00 ./searchd -c /usr/local/sphinx/etc/sphinx.conf
root     30432 30431  0 23:59 ?        00:00:00 ./searchd -c /usr/local/sphinx/etc/sphinx.conf
root     30437  1490  0 23:59 pts/0    00:00:00 grep searchd
#停止Searchd:
./searchd -c /usr/local/sphinx/etc/sphinx.conf --stop
#查看Searchd状态:
./searchd -c /usr/local/sphinx/etc/sphinx.conf --status

    索引更新及使用说明
    "增量索引"每N分钟更新一次.通常在每天晚上低负载的时进行一次索引合并,同时重新建立"增量索引"。当然"主索引"数据不多的话,也可以直接重新建立"主索引"。
    API搜索的时,同时使用"主索引"和"增量索引",这样可以获得准实时的搜索数据.本文的Sphinx配置将"主索引"和"增量索引"放到分布式索引master中,因此只需查询分布式索引"master"即可获得全部匹配数据(包括最新数据)。

    索引的更新与合并的操作可以放到cron job完成:

crontab -e
*/1 * * * *  /usr/local/sphinx/shell/delta_index_update.sh
0 3 * * *    /usr/local/sphinx/shell/merge_daily_index.sh
crontab -l

    cron job所用的shell脚本例子:
    delta_index_update.sh:

#!/bin/bash
/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate items_delta > /dev/null 2>&1

    merge_daily_index.sh:

#!/bin/bash
indexer=`which indexer`
mysql=`which mysql`
 
QUERY="use sphinx_items;select max_doc_id from sph_counter where counter_id = 2 limit 1;"
index_counter=$($mysql -h192.168.1.198 -uroot -p123456 -sN -e "$QUERY")
 
#merge "main + delta" indexes
$indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate --merge items items_delta --merge-dst-range deleted 0 0 >> /usr/local/sphinx/var/index_merge.log 2>&1
 
if [ "$?" -eq 0 ]; then
    ##update sphinx counter
    if [ ! -z $index_counter ]; then
        $mysql -h192.168.1.198 -uroot -p123456 -Dsphinx_items -e "REPLACE INTO sph_counter VALUES (1, '$index_counter')"
    fi
    ##rebuild delta index to avoid confusion with main index
    $indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate items_delta >> /usr/local/sphinx/var/rebuild_deltaindex.log 2>&1
fi

    8)php中文分词scws安装:注意扩展的版本和php的版本

wget -c http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
tar jxvf scws-1.2.3.tar.bz2
cd scws-1.2.3
./configure --prefix=/usr/local/scws
make && make install

    9)scws的PHP扩展安装

cd ./phpext
phpize 
./configure
make && make install
echo "[scws]" >> /etc/php.ini
echo "extension = scws.so" >> /etc/php.ini
echo "scws.default.charset = utf-8" >> /etc/php.ini
echo "scws.default.fpath = /usr/local/scws/etc/" >> /etc/php.ini

    10)词库安装

wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
tar jxvf scws-dict-chs-utf8.tar.bz2 -C /usr/local/scws/etc/
chown www:www /usr/local/scws/etc/dict.utf8.xdb

    11)php使用Sphinx+scws测试例子
    
在Sphinx源码API中,有好几种语言的API调用.其中有一个是sphinxapi.php。
    不过以下的测试使用的是Sphinx的PHP扩展.具体安装见本文开头的Sphinx安装部分。
    测试用的搜索类Search.php:
注意修改getDBConnection()信息为自己的

<?php
class Search {
    /**
     * @var SphinxClient 
    **/
    protected $client;
    /**
     * @var string
    **/
    protected $keywords;
    /**
     * @var resource
    **/
    private static $dbconnection = null;
    
    /**
     * Constructor
     **/
    public function __construct($options = array()) {
        $defaults = array(
            'query_mode' => SPH_MATCH_EXTENDED2,
            'sort_mode' => SPH_SORT_EXTENDED,
            'ranking_mode' => SPH_RANK_PROXIMITY_BM25,
            'field_weights' => array(),
            'max_matches' => 1000,
            'snippet_enabled' => true,
            'snippet_index' => 'items',
            'snippet_fields' => array(), 
        );
        $this->options = array_merge($defaults, $options);
        $this->client = new SphinxClient();
		//$this->client->setServer("192.168.1.198", 9312);
        $this->client->setMatchMode($this->options['query_mode']);
        if ($this->options['field_weights'] !== array()) {
            $this->client->setFieldWeights($this->options['field_weights']);
        }
        /*
        if ( in_array($this->options['query_mode'], [SPH_MATCH_EXTENDED2,SPH_MATCH_EXTENDED]) ) {
            $this->client->setRankingMode($this->options['ranking_mode']);
        }
        */
    }
   
    /**
     * Query
     *
     * @param string  $keywords
     * @param integer $offset
     * @param integer $limit
     * @param string  $index
     * @return array
     **/
    public function query($keywords, $offset = 0, $limit = 10, $index = '*') {
        $this->keywords = $keywords;
        $max_matches = $limit > $this->options['max_matches'] ? $limit : $this->options['max_matches'];
        $this->client->setLimits($offset, $limit, $max_matches);
        $query_results = $this->client->query($keywords, $index);
 
        if ($query_results === false) {
            $this->log('error:' . $this->client->getLastError());
        }
               
        $res = [];
        if ( empty($query_results['matches']) ) {
            return $res;
        }
        $res['total'] = $query_results['total'];
        $res['total_found'] = $query_results['total_found'];
        $res['time'] = $query_results['time'];
        $doc_ids = array_keys($query_results['matches']);
        unset($query_results);
        $res['data'] = $this->fetch_data($doc_ids);
        if ($this->options['snippet_enabled']) {
            $this->buildExcerptRows($res['data']);
        }
        
        return $res;
    }
    
    /**
     * custom sorting 
     * 
     * @param string $sortBy
     * @param int $mode
     * @return bool
     **/
    public function setSortBy($sortBy = '', $mode = 0) {
        if ($sortBy) {
            $mode = $mode ?: $this->options['sort_mode'];
            $this->client->setSortMode($mode, $sortBy);
        } else {
            $this->client->setSortMode(SPH_SORT_RELEVANCE);
        }
    }
    
    /**
     * fetch data based on matched doc_ids
     * 
     * @param array $doc_ids
     * @return array
     **/    
    protected function fetch_data($doc_ids) {
        $ids = implode(',', $doc_ids);
        $queries = self::getDBConnection()->query("SELECT * FROM items WHERE id in ($ids)", PDO::FETCH_ASSOC);
        return iterator_to_array($queries);
    }
    
    /**
     * build excerpts for data
     * 
     * @param array $rows
     * @return array
     **/ 
    protected function buildExcerptRows(&$rows) {
        $options = array(
            'before_match' => '<b style="color:red">',
            'after_match'  => '</b>',
            'chunk_separator' => '...',
            'limit'    => 256,
            'around'   => 3,
            'exact_phrase' => false,
            'single_passage' => true,
            'limit_words' => 5,
        );
        $scount = count($this->options['snippet_fields']);
        foreach ($rows as &$row) {
            foreach ($row as $fk => $item) {
                if (!is_string($item) || ($scount && !in_array($fk, $this->options['snippet_fields'])) ) continue;
                $item = preg_replace('/[\r\t\n]+/', '', strip_tags($item));
                $res = $this->client->buildExcerpts(array($item), $this->options['snippet_index'], $this->keywords, $options);
                $row[$fk] = $res === false ? $item : $res[0];
            }
        }        
        return $rows;
    }
    
    /**
     * database connection
     *
     * @return resource
     **/ 
    private static function getDBConnection() {
        $dsn = 'mysql:host=192.168.1.198;dbname=sphinx_items';
        $user = 'root';
        $pass = '123456';
        if (!self::$dbconnection) {
            try {
                self::$dbconnection = new PDO($dsn, $user, $pass);
            } catch (PDOException $e) {
                die('Connection failed: ' . $e->getMessage());
            }
        }
        return self::$dbconnection;
    }
 
    /**
     * Chinese words segmentation
     *
     **/
    public function wordSplit($keywords) {
        $fpath = ini_get('scws.default.fpath');
        $so = scws_new();
        $so->set_charset('utf-8');
        $so->add_dict($fpath . '/dict.utf8.xdb');
        //$so->add_dict($fpath .'/custom_dict.txt', SCWS_XDICT_TXT);
        $so->set_rule($fpath . '/rules.utf8.ini');
        $so->set_ignore(true);
        $so->set_multi(false);
        $so->set_duality(false);
        $so->send_text($keywords);
        $words = [];
        $results =  $so->get_result();
        foreach ($results as $res) {
            $words[] = '(' . $res['word'] . ')';
        }
        $words[] = '(' . $keywords . ')';
        return join('|', $words);
    }
    
    /**
     * get current sphinx client
     *
     * @return resource
     **/
    public function getClient() {
        return $this->client;
    }
    /**
     * log error
     **/
    public function log($msg) {
        // log errors here
        //echo $msg;
    }    
    /**
     * magic methods
     **/
    public function __call($method, $args) {
        $rc = new ReflectionClass('SphinxClient');
        if ( !$rc->hasMethod($method) ) {
            throw new Exception('invalid method :' . $method);
        }
        return call_user_func_array(array($this->client, $method), $args);
    }
}

    测试文件test.php:

<?php
require('Search.php');
$s = new Search([
        'snippet_fields' => ['title', 'content'],
        'field_weights' => ['title' => 20, 'content' => 10],
    ]);
$s->setSortMode(SPH_SORT_EXTENDED, 'created desc,@weight desc');
//$s->setSortBy('created desc,@weight desc');
$words = $s->wordSplit("mysql集群");//先分词 结果:(mysql)|(集群)|(mysql集群)
//print_r($words);exit;
$res = $s->query($words, 0, 10, 'master');
echo '<pre/>';print_r($res);

    测试结果:

eb55f2c5aaf3afce2f7207b3e1200aa8e27.jpg

    12、SphinxQL测试

    要使用SphinxQL需要在Searchd的配置里面增加相应的监听端口(参考上文配置)。

[root@localhost bin]# mysql -h127.0.0.1 -P9306 -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.0.7-id64-release (r3759)
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> show global variables;
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| autocommit           | 1       |
| collation_connection | libc_ci |
| query_log_format     | plain   |
| log_level            | info    |
+----------------------+---------+
4 rows in set (0.00 sec)
 
mysql> desc items;
+---------+-----------+
| Field   | Type      |
+---------+-----------+
| id      | bigint    |
| title   | field     |
| content | field     |
| created | timestamp |
| deleted | bool      |
+---------+-----------+
5 rows in set (0.00 sec)
 
mysql> select * from master where match ('mysql集群') limit 10;
+------+---------+---------+
| id   | created | deleted |
+------+---------+---------+
|    1 |    2016 |       0 |
|    6 |       0 |       0 |
+------+---------+---------+
2 rows in set (0.00 sec)
 
mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 2     |
| total_found   | 2     |
| time          | 0.006 |
| keyword[0]    | mysql |
| docs[0]       | 5     |
| hits[0]       | 15    |
| keyword[1]    | 集    |
| docs[1]       | 3     |
| hits[1]       | 4     |
| keyword[2]    | 群    |
| docs[2]       | 3     |
| hits[2]       | 4     |
+---------------+-------+
12 rows in set (0.00 sec)
 
mysql>

 

转载于:https://my.oschina.net/JKOPERA/blog/1839261

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
pocketsphinx是一个开源的语音识别引擎,可以用于嵌入式设备和桌面应用程序。以下是pocketsphinx使用教程: 1. 下载pocketsphinx-0.7.tar.gz文件并解压缩。 2. 下载文语言模型文件tdt_sc_8k和拼音读音字典xxxx.dic,放置于pocketsphinx*** 下载英文语言模型文件hub4wsj_sc_8k和英文读音字典xxxx.dic,放置于pocketsphinx-0.7\model\hmm\en_US和pocketsphinx-0.7\model\lm\en_US目录下。 4. 安装SWIG和Python,并将pocketsphinx-0.7\swig目录添加到系统路径。 5. 在Python导入pocketsphinx模块,并使用以下代码进行语音识别: ``` import pocketsphinx as ps import pyaudio # 创建语音识别器 config = ps.Decoder.default_config() config.set_string('-hmm', 'pocketsphinx-0.7/model/hmm/zh/tdt_sc_8k') config.set_string('-lm', 'pocketsphinx-0.7/model/lm/zh_CN/xxxx.dic') config.set_string('-dict', 'pocketsphinx-0.7/model/lm/zh_CN/xxxx.dic') decoder = ps.Decoder(config) # 开始录音并进行语音识别 p = pyaudio.PyAudio() stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024) stream.start_stream() in_speech_bf = False decoder.start_utt() while True: buf = stream.read(1024) if buf: decoder.process_raw(buf, False, False) if decoder.get_in_speech() != in_speech_bf: in_speech_bf = decoder.get_in_speech() if not in_speech_bf: decoder.end_utt() print('识别结果:', decoder.hyp().hypstr) decoder.start_utt() else: break decoder.end_utt() # 关闭录音流 stream.stop_stream() stream.close() p.terminate() ``` 相关问题: 1. pocketsphinx支持哪些语言? 2. 如何优化pocketsphinx的识别准确率? 3. pocketsphinx适用于哪些场景?
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值