The process breaks down into five main steps:
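1) Preparation
- Download Coreseek 4.1 and WAMP
- Download word dictionaries (lists shared by other users, or the specialized dictionaries published by Baidu and Sogou)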
2) Merge the dictionaries
- Step 1: sort the words
- Step 2: merge the dictionaries (recomputing the word frequencies)
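The two merge steps above can be sketched as a small script. This is only an illustration under stated assumptions, not the procedure the author actually used: it assumes each source dictionary is a text file with one word per line, optionally followed by a tab and a frequency; it assumes "recomputing" means summing the frequencies of duplicate words; and it assumes the output should follow the two-line entry layout (word, tab, frequency, then an `x:` line) of the unigram.txt file that ships with Coreseek's mmseg, which should be checked against your own copy. The function names `parse_entries`, `merge_dictionaries` and `to_unigram` are hypothetical.

```python
from collections import Counter


def parse_entries(lines):
    """Yield (word, freq) pairs from 'word' or 'word<TAB>freq' lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the 'x:freq' companion lines of a unigram-style file.
        if not line or line.startswith("x:"):
            continue
        parts = line.split()
        word = parts[0]
        # Assumption: a word listed without a frequency counts as 1.
        freq = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 1
        yield word, freq


def merge_dictionaries(*sources):
    """Merge several dictionaries, summing frequencies of duplicate words.

    Returns a list of (word, freq) pairs sorted by word (step 1 and step 2).
    """
    totals = Counter()
    for lines in sources:
        for word, freq in parse_entries(lines):
            totals[word] += freq
    return sorted(totals.items())


def to_unigram(entries):
    """Render entries in the assumed unigram.txt layout."""
    return "".join(f"{word}\t{freq}\nx:{freq}\n" for word, freq in entries)


if __name__ == "__main__":
    shared = ["搜索\t3", "引擎\t2"]      # e.g. a user-shared list
    sogou = ["引擎", "搜索\t4", "分词"]   # e.g. a converted Sogou list
    print(to_unigram(merge_dictionaries(shared, sogou)), end="")
```

For real files, pass open file objects (opened with `encoding="utf-8"`) instead of the in-memory lists, and write the result back out as UTF-8.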
3) Configure Coreseek
- Set the parameters
source mysql
{
    type = mysql
    ...
    sql_query_pre = SET NAMES utf8
    ...
    sql_query_info_pre = SET NAMES utf8
    ...
}

index mysql
{
    source = mysql
    ...
    html_strip = 0
    ...
    charset_type = zh_cn.utf-8
}

indexer
{
    mem_limit = 512M
}

searchd
{
    listen = 9312
    ...
    max_children = 50
    ...
}
- Build the index
bin\indexer.exe --config etc\csft_mysql.conf --all
- Start the search service
bin\searchd.exe --config etc\csft_mysql.conf
- Run a live query from the command line (here for the keyword qq)
bin\search.exe --config etc\csft_mysql.conf qq
4) Write the search page (based on index-mysql.php)
- Pass parameters via GET
$keys = array();
$values = array();
foreach ( $_GET as $k => $v ) {
    $keys[] = $k;
    $values[] = $v;
}
- Set the excerpt options
$opts = array (
    "before_match" => "<span style='background: yellow'>",
    "after_match" => "</span>",
    "chunk_separator" => " ... ",
    "limit" => 64,
    "around" => 8,
    "exact_phrase" => 1,
);
- Generate the article excerpts
$docs = array($value);
$excerpts = $cl->BuildExcerpts ( $docs, $index, $words, $opts );
- Output the search results
$value = strip_tags($excerpts[0], "<span>");
...
print "$attrname = $value<br/>\n";
5) Test search performance (details omitted)
- Match mode all (SPH_MATCH_ALL)
- Match mode extended2 (SPH_MATCH_EXTENDED2)
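Since the performance test itself is omitted, the following is only a rough sketch of how the two match modes could be compared, not the author's actual test. It times an arbitrary query callable and summarizes the latencies. `time_queries`, `summarize` and `fake_search` are hypothetical names; in a real test the stand-in `fake_search` would be replaced by a function that sends the query to searchd with the match mode set to all or extended2.

```python
import math
import statistics
import time


def time_queries(run_query, queries, repeats=5):
    """Call run_query(q) for every query, repeats times; return latencies in ms."""
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Summarize a non-empty list of latencies (nearest-rank 95th percentile)."""
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    def fake_search(query):
        # Hypothetical stand-in: replace with a real call to searchd.
        return [w for w in ("qq", "wamp", "coreseek") if query in w]

    for mode in ("all", "extended2"):
        # In a real run, each mode would use its own query function.
        stats = summarize(time_queries(fake_search, ["qq", "wamp"], repeats=100))
        print(mode, stats)
```

Running the same query list against both modes, with the service warmed up first, keeps the comparison fair; the summary deliberately reports the median and 95th percentile as well as the mean, because a few slow queries can distort an average.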