(4) MySQL full-text indexing : Sphinx + Coreseek with Chinese support

Latest recommended article published on 2021-02-07 06:48:21

Author : 依秋无泪

Category : MySQL performance optimization. Tags : optimization, Linux, mysql, full-text index, sphinx+coreseek

Article link : https://blog.csdn.net/u012312203/article/details/77661556

 
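Preface :

Note : the Chinese word-segmentation package ships with Coreseek, and Coreseek bundles Sphinx, so Coreseek is what we need to download.

Unfortunately the official Coreseek site has gone offline, although the software itself still works. If you need to go further, look into xunsearch (讯搜).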

 
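1. Install Coreseek

Unpack the archive. It contains :
- csft-3.2.14 : the Sphinx package
- mmseg-3.2.14 : the word-segmentation package
- README.txt : documentation
- testpack : test package
First install the word-segmentation package, mmseg
- Enter the mmseg-3.2.14 directory and run ./bootstrap to check the build environment
- If the environment check fails, first make sure gcc is installed
- Generate the Makefile : ./configure --prefix=/working/mmseg
- Compile and install : make && make install
Then install the Sphinx package [ csft ]
- Check the environment : bash buildconf.sh or ./buildconf.sh

- Generate the Makefile : configure
  - --prefix : installation directory
  - --with-mysql : build with MySQL support (locates the MySQL files)
  - --with-mmseg : enable the word-segmentation package
  - --with-mmseg-includes : location of the mmseg header files
  - --with-mmseg-libs : location of the mmseg libraries
  - --with-unixodbc : build with unixODBC support
  - ./configure --prefix=/working/sphinx --with-mysql --with-mmseg --with-mmseg-includes=/working/mmseg/include/mmseg --with-mmseg-libs=/working/mmseg/lib/ --with-unixodbc
- make && make install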
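Coreseek configuration : see the next post in this series
- Note : Coreseek keeps the SQL statement that feeds the index in this configuration file, so an incremental index really just means applying a condition check before the indexing SQL runs

- Suppose the configuration file is named mysql.cnf
- Move it into the etc/ directory under the Sphinx installation directory (here /working/sphinx/etc/)

Note : if MySQL was not installed from source, the shared libraries (.so files) in its lib directory have to be made known to the dynamic linker

Note : for that change to take effect immediately, run the Linux command

ldconfig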

 
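Generate the index files

Use the indexer command to build the full-text index
- indexer : located in bin/ under the Sphinx installation directory
indexer -c <absolute path to the configuration file> <index name, as defined in the configuration file>
/working/sphinx/bin/indexer -c /working/sphinx/etc/mysql.cnf mysql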

 
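Start the search service

The service is searchd, in bin/ under the Sphinx installation directory

Parameter : --config or -c : the absolute path to our Coreseek configuration file, which is mysql.cnf in the setup above

/working/sphinx/bin/searchd -c /working/sphinx/etc/mysql.cnf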

 
Test the Chinese index with PHP + the API

1. Locate the API file :

The downloaded package contains testpack/api/sphinxapi.php

 
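1. SPH_MATCH_ALL match mode

Search term : 中国人

Tokens : 中国, 中国人, 人

A document matches as long as all the characters of 中国人 appear in it; they do not have to be adjacent.

For example, each of these matches :
我们中国是有很多人的.
我们中国人是很厉害的
我们这边有很多人是中国国籍的.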

 
2. SPH_MATCH_ANY match mode

Search term : 中国人

Tokens : 中国, 中国人, 人

A document matches as soon as it contains any one of the tokens.

 
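3. SPH_MATCH_PHRASE match mode

Search term : 中国人

Tokens : 中国人

A document matches only if 中国人 appears as a phrase, with the characters adjacent and in order.

For example :
我们中国人是很厉害的. (matches)
我们中国是有很多人的 (does not match : 中国 and 人 are not adjacent)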
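
To make the difference between the first three modes concrete, here is a toy model in plain PHP. It is only a sketch, not how Sphinx implements matching: the function names are made up for illustration, and the token lists are written by hand, whereas Coreseek would obtain them from mmseg (whose real segmentation may differ).

```php
<?php
// Toy model of three match modes over pre-tokenized text.
// Assumption: tokens are supplied by hand; this is not Sphinx's own code.

/** ALL: every query token must occur somewhere in the document. */
function matchAll(array $queryTokens, array $docTokens): bool
{
    return count(array_diff($queryTokens, $docTokens)) === 0;
}

/** ANY: at least one query token must occur in the document. */
function matchAny(array $queryTokens, array $docTokens): bool
{
    return count(array_intersect($queryTokens, $docTokens)) > 0;
}

/** PHRASE: the query tokens must occur adjacent to each other and in order. */
function matchPhrase(array $queryTokens, array $docTokens): bool
{
    $queryTokens = array_values($queryTokens);
    $docTokens = array_values($docTokens);
    $n = count($queryTokens);
    if ($n === 0) {
        return false;
    }
    // Slide a window of $n tokens over the document
    for ($i = 0; $i + $n <= count($docTokens); $i++) {
        if (array_slice($docTokens, $i, $n) === $queryTokens) {
            return true;
        }
    }
    return false;
}

$query = ['中国', '人'];
$apart    = ['我们', '中国', '是', '有', '很多', '人', '的'];
$adjacent = ['我们', '中国', '人', '是', '很', '厉害', '的'];
$onlyOne  = ['这里', '有', '很多', '人'];

var_dump(matchAll($query, $apart));       // bool(true)
var_dump(matchAll($query, $onlyOne));     // bool(false)
var_dump(matchAny($query, $onlyOne));     // bool(true)
var_dump(matchPhrase($query, $apart));    // bool(false)
var_dump(matchPhrase($query, $adjacent)); // bool(true)
```

With the real client the mode is chosen with $sphinx->SetMatchMode(SPH_MATCH_ALL), SPH_MATCH_ANY or SPH_MATCH_PHRASE before calling Query().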

 
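4. SPH_MATCH_BOOLEAN match mode

Search term : 中国人

Tokens : 中国, 中国人, 人

This mode accepts boolean operators : ! (NOT) and | (OR)

For example :

$search = "中国 | 人" matches documents that contain 中国 or 人

$search = "中国 !人" matches documents that contain 中国 but not 人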
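
When the words come from user input, the boolean query string can be assembled by a small helper. The sketch below is one possible way to do it: buildBooleanQuery is a hypothetical name, not something sphinxapi.php provides, and it assumes that a query consisting only of negations is not accepted, so it insists on at least one positive word.

```php
<?php
// Hypothetical helper, not part of sphinxapi.php.
// Composes a boolean-mode query: match any word in $anyOf, exclude every word in $noneOf.
// Words are used as-is; operator characters inside them are not escaped.
function buildBooleanQuery(array $anyOf, array $noneOf = []): string
{
    $anyOf = array_values($anyOf);
    if (count($anyOf) === 0) {
        // Assumption: a query made only of negations cannot be evaluated
        throw new InvalidArgumentException('at least one positive word is required');
    }
    // Group several alternatives in parentheses so the NOT terms apply to the whole group
    $query = count($anyOf) > 1 ? '(' . implode(' | ', $anyOf) . ')' : $anyOf[0];
    foreach ($noneOf as $word) {
        $query .= ' !' . $word;
    }
    return $query;
}

echo buildBooleanQuery(['中国', '人']), "\n";   // (中国 | 人)
echo buildBooleanQuery(['中国'], ['人']), "\n"; // 中国 !人
```

The resulting string would be passed to Query() after $sphinx->SetMatchMode(SPH_MATCH_BOOLEAN).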
    

 
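5. SPH_MATCH_EXTENDED match mode

Search term : 中国人

Tokens : 中国, 中国人, 人

This mode can restrict the search to a specific field :

For example :

@comname restricts the search to the comname field

@comaddress restricts the search to the comaddress field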
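
A field-restricted query is just a string of the form "@field keyword", so it can be built with a few lines of PHP. This is a minimal sketch: buildFieldQuery is a hypothetical helper, and the set of characters it escapes is an assumed subset of the query operators (sphinxapi.php also ships its own EscapeString() method, which may cover a different set).

```php
<?php
// Hypothetical helper, not part of sphinxapi.php.
// Builds an extended-mode query that searches $keyword in a single field.
function buildFieldQuery(string $field, string $keyword): string
{
    // Only allow plain identifiers as field names
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
        throw new InvalidArgumentException("invalid field name: $field");
    }
    // Backslash-escape characters assumed to act as query operators
    $map = [];
    foreach (str_split('\\()|-!@~"&/') as $ch) {
        $map[$ch] = '\\' . $ch;
    }
    return '@' . $field . ' ' . strtr($keyword, $map);
}

echo buildFieldQuery('comname', '纺织'), "\n";    // @comname 纺织
echo buildFieldQuery('comaddress', 'a@b'), "\n"; // @comaddress a\@b
```

The string is then used as the search term after $sphinx->SetMatchMode(SPH_MATCH_EXTENDED).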

 
The result set turns out to be small : only 20 rows. That is the default limit, which SetLimits changes :

public bool SphinxClient::setLimits ( int $offset , int $limit [, int $max_matches = 0 [, int $cutoff = 0 ]] )
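
Because the first argument is an offset rather than a page number, paging needs a small conversion. The helper below is a sketch with a hypothetical name (pageToLimits). It assumes the server keeps at most 1000 matches per query, which I believe is the usual max_matches default, so adjust $maxMatches to your own configuration.

```php
<?php
// Hypothetical helper: converts a 1-based page number into [offset, limit].
// Assumption: offset + limit must stay within $maxMatches (1000 assumed here).
function pageToLimits(int $page, int $perPage = 20, int $maxMatches = 1000): array
{
    if ($page < 1 || $perPage < 1) {
        throw new InvalidArgumentException('page and perPage must be >= 1');
    }
    $offset = ($page - 1) * $perPage;
    if ($offset >= $maxMatches) {
        throw new OutOfRangeException('page lies beyond the max_matches window');
    }
    // Shrink the last page so it does not run past the window
    $limit = min($perPage, $maxMatches - $offset);
    return [$offset, $limit];
}

[$offset, $limit] = pageToLimits(3);
echo "$offset, $limit\n"; // 40, 20
// With a real client: $sphinx->SetLimits($offset, $limit);
```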

 
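Code :

<?php
// Step 1 : include the Sphinx API file
include 'sphinxapi.php';
header('content-type:text/html;charset=utf-8');

// Step 2 : create the Sphinx client and point it at the searchd server
$sphinx = new SphinxClient();
// SetServer() takes the host and the port as separate arguments
$sphinx->SetServer('192.168.241.135', 9312);

// Step 3 : choose the index and the search term
$index = 'mysql';
$search = '纺织';
// SetLimits(offset, limit) : offset 20 skips the first 20 matches, so this fetches matches 21-40
$sphinx->SetLimits(20, 20);
$result = $sphinx->Query($search, $index);

// Step 4 : read the result set; $result['matches'] is what we want, and its keys are the document ids
if ($result === false || empty($result['matches'])) {
    exit('No matches found');
}
// Cast every id to int before building the SQL list
$ids = join(',', array_map('intval', array_keys($result['matches'])));

// Use the ids to fetch the rows from the database
$db = mysqli_connect('192.168.241.135', 'hxy', '123456', 'test');
mysqli_query($db, 'set names utf8');
$rs = mysqli_query($db, "SELECT * FROM address WHERE id in ($ids)");

// Markup used to highlight the search keyword
$opt = [
    'before_match' => '<font color="red">',
    'after_match' => '</font>'
];

// Loop over the rows
while ($row = mysqli_fetch_assoc($rs)) {
    // $row['comname'] = str_replace($search, '<font color="red">' . $search . "</font>", $row['comname']);
    // BuildExcerpts() returns a numerically indexed array, which is awkward to work with here
    // $row = $sphinx->BuildExcerpts($row, $index, $search, $opt);
    echo 'ID ' . $row['id'] . ': company : ' . $row['comname'] . ' address : ' . $row['comaddress'];
    echo '<hr>';
}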
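
One thing the lookup query above does not do is keep the order in which Sphinx ranked the matches, since WHERE id IN (...) returns rows in whatever order MySQL chooses. A possible refinement, sketched here with a hypothetical helper (buildLookupSql), is to cast the ids to integers and add ORDER BY FIELD(...). The table name address is taken from the code above.

```php
<?php
// Hypothetical helper: builds the follow-up SELECT from the Sphinx document ids.
// Returns null when there is nothing to look up.
function buildLookupSql(array $ids): ?string
{
    // Cast to int so only numbers reach the SQL string, and drop duplicates
    $ids = array_values(array_unique(array_map('intval', $ids)));
    if (count($ids) === 0) {
        return null;
    }
    $list = implode(',', $ids);
    // FIELD(id, ...) sorts the rows in the same order as the id list
    return "SELECT * FROM address WHERE id IN ($list) ORDER BY FIELD(id, $list)";
}

echo buildLookupSql([7, '3', 7]), "\n";
// SELECT * FROM address WHERE id IN (7,3) ORDER BY FIELD(id, 7,3)
```

In the script above it would be called as buildLookupSql(array_keys($result['matches'])).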
