Solr: Defining the Business Model and Configuring the Search Engine

Latest recommended article published on 2019-06-05 09:19:12

软件求生

Category: solr. Tags: solr, search engine


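Defining the Business Model

Defining the business model for search means analyzing the search requirements to determine which business objects will be searched and how they are structured:

Identify the business objects to search

Determine what content needs to be searchable, where it comes from, how often it is updated, and so on, and record it in the following structure: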

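Code	Business object	Priority	Content source	Update frequency	Description
T001	Weather information	1	168 platform	1 hour	If a city name is entered, the result is that city's weather; if no place name is entered, the result is the weather for the user's home city.
T002	Restaurants and hotels	1	Internet	Daily	The search term can be a place name, a signature dish, or a restaurant name; the results are restaurants related to the search term.
T003	Flight information	1	Internet	Daily	The search term can be the departure and arrival city names; the results are flights related to the search term.
T004	Train timetables	1	Internet	Daily	The search term can be a train number, a departure station, or an intermediate station; the results are detailed train information related to the search term.
T005	Movie listings	1	12580	Daily	The search term can be the title of a recently released film or a cinema name; the results are current screenings at cinemas related to the search term.
T006	Stock quotes	1	168 platform	1 minute	The search term can be a stock code or a stock name; the results are real-time quotes related to the search term.
T007	Mobile services	1	Manual import	Monthly	The search term can be the name of a mobile service; the results are the service overview and activation instructions related to the search term.
T008	Address book	1	Manual import	Monthly
T009	SMS	1	Manual import	Daily
T010	MMS	1	Manual import	Daily
T011	WAP URLs	1	Manual import	Daily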

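Work out the structure of each business object

Analyze the business objects listed above and work out the structure of each one. The structures of the address book and of WAP sites are defined below:

a. Address book

Field	Description
ID	Content identifier
userName	Name
userMail	User email
mobile	User mobile number
department	Department
sex	Gender
birthday	Birthday
msn	MSN
qq	QQ number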

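b. WAP sites

Field	Description
ID	Content identifier
Title	Content title
URL	URL of the WAP site
Tags	What searches are matched against; multiple keywords are separated by "，".
Summary	Brief description of the WAP site
Site category	Search, entertainment, SNS, news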
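
As a rough illustration, the WAP site structure above could map onto schema.xml fields along the following lines. This is only a sketch: the English field names (id, title, url, tags, summary, category) are hypothetical, the string type is assumed to be the usual solr.StrField type from a stock schema, and using the textComplex type (defined later in this article) for the free-text fields is likewise an assumption rather than the author's configuration.

```xml
<fields>
  <!-- Hypothetical field names; the article only lists the field labels -->
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="textComplex" indexed="true" stored="true" />
  <!-- Assumption: the URL is only returned with results, so it is stored but not indexed -->
  <field name="url" type="string" indexed="false" stored="true" />
  <!-- One value per tag; assumes the client splits the tag string on "，" before indexing -->
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true" />
  <field name="summary" type="textComplex" indexed="true" stored="true" />
  <!-- Untokenized so the category can be used as an exact-match filter -->
  <field name="category" type="string" indexed="true" stored="true" />
</fields>
```

Keeping category as an untokenized string makes it usable as an exact-match filter, while title and summary go through the Chinese tokenizer.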

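Customizing the Index Service

Customizing the index service means indexing the source content in a structured way; in other words, what ends up in the index is structured, meaningful information. Customizing the index mainly involves the following aspects:

Categorization: a multi-core solution.
Fields that need to be searchable.
Fields that need to be stored.
Filter conditions.
Sorting.
Index update frequency.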
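
For the multi-core item in the list above, one possible layout is a separate core per business object. The sketch below assumes the legacy multi-core solr.xml format used by older Solr releases (newer releases discover cores differently), and the core names contacts and wapsites are hypothetical.

```xml
<!-- solr.xml in the legacy multi-core format; core names are hypothetical -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Each core has its own instanceDir with its own schema.xml -->
    <core name="contacts" instanceDir="contacts" />
    <core name="wapsites" instanceDir="wapsites" />
  </cores>
</solr>
```

Because each core carries its own schema, the address book and the WAP sites can be given different fields and different update schedules.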

Customizing the Search Service

Customizing search means deciding which rules the search supports.

Filter conditions
Sorting rules
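
One way to pin such rules down on the server side is a dedicated request handler in solrconfig.xml. The following is a minimal sketch, not the author's configuration: the handler name /contacts, the default sort, and the filter value sales are all hypothetical; the field names username and department are taken from the address book schema shown later in this article.

```xml
<requestHandler name="/contacts" class="solr.SearchHandler">
  <!-- Used only when the request does not supply these parameters itself -->
  <lst name="defaults">
    <str name="sort">username asc</str>
    <int name="rows">10</int>
  </lst>
  <!-- Always added to the request, so every query is filtered this way -->
  <lst name="appends">
    <str name="fq">department:sales</str>
  </lst>
</requestHandler>
```

The same rules can instead be passed per request as the sort and fq query parameters when they should vary from one search to the next.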

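Search Engine Configuration

Solr schema design (how to customize the index structure)

Configuring the index structure mainly means configuring fieldType, field, copyField, and dynamicField in schema.xml. The address book search is used as the example below.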

public String userName = null;   // name
public String userMail = null;   // user email
public String mobile = null;     // user mobile number
public String department = null; // department
public String sex = null;        // gender
public String birthday = null;   // birthday
public String msn = null;        // MSN account
public String qq = null;         // QQ number

Define the required field types (fieldType)

Define the field types you need and configure a suitable tokenizer for each one. The example below uses the mmseg4j Chinese tokenizer:

<types>
  ......
  <fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="/opt/solr-tomcat/solr/dic" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
  <fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="/opt/solr-tomcat/solr/dic" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
  <fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="/opt/solr-tomcat/solr/dic" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
</types>

Note: dicPath="/opt/solr-tomcat/solr/dic" is the path to your dictionary directory.

Define the required fields (field)

<fields>
  ......
  <field name="username" type="string" indexed="true" stored="true" />
  <field name="usermail" type="text" indexed="true" stored="true" />
  <field name="department" type="text" indexed="true" stored="true" />
  <field name="sex" type="text" indexed="true" stored="true" />
  <field name="birthday" type="text" indexed="true" stored="true" />
  <field name="msn" type="text" indexed="true" stored="true" />
  <field name="qq" type="text" indexed="true" stored="true" />
  ......
</fields>
<copyField source="simple" dest="text" />
<copyField source="complex" dest="text" />
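
The introduction to this section lists dynamicField alongside fieldType, field, and copyField, but the address book example does not use one. As a hedged sketch of what such a declaration looks like, the suffix patterns *_s and *_txt below are hypothetical conventions, and the string type is assumed to be the usual solr.StrField type from a stock schema; in schemas of this style the declarations sit inside the fields element.

```xml
<!-- Any field whose name ends in _s is indexed as an untokenized string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<!-- Any field whose name ends in _txt is analyzed with the textComplex type defined above -->
<dynamicField name="*_txt" type="textComplex" indexed="true" stored="true" />
```

With these in place, a document can carry a field such as nickname_s without that exact name being declared in the schema.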

Example: field declarations for a blog application

<field name="keywords" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" />
<field name="creationDate" type="date" indexed="true" stored="true" />
<field name="rating" type="sint" indexed="true" stored="true" />
<field name="published" type="boolean" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<field name="all" type="text" indexed="true" stored="true" multiValued="true" />
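
The multiValued all field in the blog example looks like a catch-all search field. Assuming that is its purpose (the article does not say so), it could be filled with copyField rules such as these:

```xml
<!-- Assumption: "all" aggregates the searchable text of the other fields -->
<copyField source="keywords" dest="all" />
<copyField source="content" dest="all" />
```

A query against all would then match text from either source field, while keywords and content remain individually searchable.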
