1. DSL Search - Data Preparation
1. Data preparation
- Custom dictionary
- 慕课网
- 慕课
- 课网
- 慕
- 课
- 网
- Create the index shop (any name will do)
- Create the mappings manually
POST /shop/_mapping
{
  "properties": {
    "id": { "type": "long" },
    "age": { "type": "integer" },
    "username": { "type": "keyword" },
    "nickname": { "type": "text", "analyzer": "ik_max_word" },
    "money": { "type": "float" },
    "desc": { "type": "text", "analyzer": "ik_max_word" },
    "sex": { "type": "byte" },
    "birthday": { "type": "date" },
    "face": { "type": "text", "index": false }
  }
}
- Load the data
POST /shop/_doc/1001
{ "id": 1001, "age": 18, "username": "imoocAmazing", "nickname": "慕课网", "money": 88.8, "desc": "我在慕课网学习java和前端,学习到了很多的知识", "sex": 0, "birthday": "1992-12-24", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1002
{ "id": 1002, "age": 19, "username": "justbuy", "nickname": "周杰棍", "money": 88.8, "desc": "今天上下班都很堵,车流量很大", "sex": 1, "birthday": "1993-11-24", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1003
{ "id": 1003, "age": 20, "username": "bigFace", "nickname": "飞翔的巨鹰", "money": 66.8, "desc": "慕课网团队和导游坐飞机去海外旅游,去了新马泰和欧洲", "sex": 1, "birthday": "1996-11-20", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1004
{ "id": 1004, "age": 22, "username": "flyfish", "nickname": "水中鱼", "money": 55.8, "desc": "昨天在学校的池塘里,看到了很多鱼在游泳,然后就去慕课网上课里", "sex": 0, "birthday": "1988-02-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1005
{ "id": 1005, "age": 25, "username": "gotoplay", "nickname": "ps游戏机", "money": 155.8, "desc": "今年生日,女友送我一台play station游戏机,非常好玩,非常不错", "sex": 1, "birthday": "1988-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1006
{ "id": 1006, "age": 19, "username": "missmooc", "nickname": "我叫小慕", "money": 155.8, "desc": "我叫凌云彻,今年20岁,是一名律师,我在奇葩星球做演讲", "sex": 1, "birthday": "1988-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1007
{ "id": 1007, "age": 19, "username": "msgame", "nickname": "gamexbox", "money": 1556.8, "desc": "明天去进货,最近微软处理了很多游戏机,还要买xbox游戏卡带", "sex": 1, "birthday": "1988-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1008
{ "id": 1008, "age": 19, "username": "muke", "nickname": "幕学习", "money": 1056.8, "desc": "大学毕业后,可以到imooc.com进修", "sex": 1, "birthday": "1987-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1009
{ "id": 1008, "age": 19, "username": "muke", "nickname": "幕学习", "money": 1056.8, "desc": "大学毕业后,可以到imooc.com进修", "sex": 1, "birthday": "1987-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1010
{ "id": 1008, "age": 19, "username": "muke", "nickname": "幕学习", "money": 1056.8, "desc": "大学毕业后,可以到imooc.com进修", "sex": 1, "birthday": "1987-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
POST /shop/_doc/1011
{ "id": 1011, "age": 31, "username": "speder", "nickname": "皮特帕克", "money": 91.8, "desc": "他是一个超级英雄嗷嗷", "sex": 1, "birthday": "1986-12-14", "face": "https://www.imooc.com/static/img/index/logo.png" }
2. DSL Search - Getting Started Syntax
1. Querying with request parameters (QueryString)
1.1 Find documents whose [field] contains [content]
GET /shop/_doc/_search?q=desc:慕课网
GET /shop/_doc/_search?q=nickname:慕 AND age:25
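1.2 Comparing text and keyword searches (a keyword field is not analyzed; the whole value is indexed as a single term), so a search on username has to match the full value exactly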
GET /shop/_doc/_search?q=nickname:super
GET /shop/_doc/_search?q=username:super
GET /shop/_doc/_search?q=username:super hero
This style is called a QueryString query: all parameters are placed in the URL as request parameters.
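Because the parameters travel in the URL, spaces and non-ASCII text have to be percent-encoded before the request is sent. A minimal sketch of that step follows; the helper name query_string_path is made up for illustration, and only the Python standard library is used.

```python
from urllib.parse import quote


def query_string_path(index: str, field: str, text: str) -> str:
    """Build the path and query part of a QueryString search on one field.

    Hypothetical helper: it only assembles the URL, it does not send it.
    """
    # Percent-encode "field:text" but keep the colon readable.
    q = quote(f"{field}:{text}", safe=":")
    return f"/{index}/_doc/_search?q={q}"


path = query_string_path("shop", "username", "super hero")
print(path)
```

The same helper can be used for the Chinese examples above, for instance query_string_path("shop", "desc", "慕课网"); the text is then sent as UTF-8 percent escapes.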
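2. DSL basic syntax
QueryString is rarely used: once the parameters get complex it becomes hard to build, so most queries are written in the DSL instead.
- Domain Specific Language (DSL)
- JSON-based data queries
- More flexible queries, well suited to complex searches
DSL syntax:
# Query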
POST /shop/_doc/_search
{
"query": {
"match": {
"desc": "慕课网"
}
}
}
# Check whether a field exists
{
"query": {
"exists": {
"field": "desc"
}
}
}
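- The request body is a JSON object made of key-value pairs, and JSON objects can be nested.
- A key can be an es keyword or the name of a field; both kinds appear in the examples that follow.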
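Since a DSL body is just nested key-value pairs, it can be assembled as ordinary dictionaries and serialized to JSON. The sketch below shows this for the two bodies above; the function names match_query and exists_query are illustrative, not part of any es client.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Return a DSL body that runs a match query on one field."""
    return {"query": {"match": {field: text}}}


def exists_query(field: str) -> dict:
    """Return a DSL body that checks whether a field exists."""
    return {"query": {"exists": {"field": field}}}


# ensure_ascii=False keeps the Chinese search text readable in the output.
body = json.dumps(match_query("desc", "慕课网"), ensure_ascii=False)
print(body)
print(json.dumps(exists_query("desc")))
```

Here "query", "match" and "exists" are es keywords, while "desc" is a field name from the shop mapping.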
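3. Locating invalid-search problems
DSL queries often go wrong, and in most cases the cause is JSON that es cannot parse. es reports an exception, much as Java does, and the exception message is the starting point for working out the problem: for example the JSON is malformed, or a keyword does not exist or is not registered. When the message alone does not pinpoint the problem, copying it into Baidu (or another search engine) usually does.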
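Malformed JSON can be caught locally before the request ever reaches es. The following is a small sketch using the standard json module; check_body is a hypothetical helper, and it only validates JSON syntax, not whether the keys are valid es keywords.

```python
import json


def check_body(raw: str) -> str:
    """Return "ok" if raw parses as JSON, otherwise a short error description."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return "ok"


print(check_body('{"query": {"match_all": {}}}'))
print(check_body('{"query": {"match_all": {}}'))  # missing closing brace
```

A body that passes this check can still be rejected by es, for instance when a query keyword is misspelled; in that case the exception message from es is the thing to read.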
3. DSL Search - Match All and Pagination
1. Match all
- match_all
Query all documents in the index
GET /shop/_doc/_search
or
POST /shop/_doc/_search
{
"query": {
"match_all": {}
},
"_source": ["id", "nickname", "age"]
}
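- The same query can be run from the Head visual interface
2. Paginated query
By default a search returns only 10 hits; paging shows the rest. from is the offset of the first hit to return, and size is the number of hits to return.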
POST /shop/_doc/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 10
}
{
"query": {
"match_all": {}
},
"_source": [
"id",
"nickname",
"age"
],
"from": 5,
"size": 5
}
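User interfaces usually think in page numbers rather than offsets. A minimal sketch of the conversion, assuming pages are numbered from 1 (the function name page_params is illustrative):

```python
def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into the from/size pair used above."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with 5 hits per page gives the second body above: from=5, size=5.
body = {"query": {"match_all": {}}, **page_params(2, 5)}
print(body)
```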
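4. DSL Search - term/match
1. term exact search vs. match analyzed search
With term, the text the user searches for, for example “慕课网强大”, is looked up as one whole keyword; it is not analyzed into tokens first.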
POST /shop/_doc/_search
{
"query": {
"term": {
"desc": "慕课网"
}
}
}
Compare with:
{
"query": {
"match": {
"desc": "慕课网"
}
}
}
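- Note: match first analyzes 慕课网 into tokens (this is full-text search) and then queries, whereas term does not; it searches for 慕课网 as one whole term.
- The same comparison can be made in the Head visual interface.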
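To make the difference between term and match concrete, here is a toy inverted index. Everything in it is an assumption for illustration: toy_analyze is a crude stand-in that emits every word of a small hypothetical vocabulary found in the text, and it does not reproduce what ik_max_word actually emits.

```python
VOCAB = ["慕课网", "慕课", "学习", "强大"]  # hypothetical dictionary


def toy_analyze(text):
    """Emit every vocabulary word that occurs in the text (not the real IK analyzer)."""
    return [word for word in VOCAB if word in text]


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in toy_analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_search(index, text):
    """term: look up the search text as one token, without analyzing it."""
    return index.get(text, set())


def match_search(index, text):
    """match: analyze the search text, then look up every resulting token."""
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1001: "我在慕课网学习java和前端", 1002: "今天上下班都很堵"}
index = build_index(docs)
print(term_search(index, "慕课网强大"))   # no token equals the whole string
print(match_search(index, "慕课网强大"))  # tokens of the query are looked up one by one
```

In this toy, the term lookup for “慕课网强大” finds nothing because no single indexed token equals the whole string, while match still finds document 1001 through the token 慕课网.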
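2. terms: matching any of several terms
This works like a tag query. For example, courses on imooc (慕课网) carry tags such as front end / back end / big data / job-ready courses, and terms can be used to query by exact tag values like these.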
POST /shop/_doc/_search
{
"query": {
"terms": {
"desc": ["慕课网", "学习", "骚年"]
}
}
}
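5. DSL Search - match_phrase
match_phrase: phrase matching
match: a document is returned as soon as any analyzed term matches. match_phrase: every analyzed term must appear among the tokens of the text field, in the same order, and consecutively (a stricter search).
- slop: the number of positions that may be skipped between the terms; if the terms may lie further apart, set a larger value.
POST /shop/_doc/_search
{
  "query": {
    "match_phrase": {
      "desc": {
        "query": "大学 毕业 研究生",
        "slop": 2
      }
    }
  }
}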
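The effect of slop can be sketched with token positions. This is a simplified model, not the engine's implementation: each query term's position in the document is shifted back by its offset in the query, and the phrase counts as a match when the shifted positions differ by at most slop. The token list is a hypothetical tokenization of document 1008's desc, not real analyzer output.

```python
from itertools import product


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified phrase match: every term present and position spread <= slop."""
    positions = []
    for term in query_tokens:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return False  # a missing term can never be fixed by slop
        positions.append(found)
    # Try every combination of occurrences of the query terms.
    for combo in product(*positions):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


# Hypothetical tokens for "大学毕业后,可以到imooc.com进修".
doc = ["大学", "毕业", "后", "可以", "到", "imooc.com", "进修"]
print(phrase_matches(doc, ["大学", "毕业"]))
print(phrase_matches(doc, ["大学", "可以"], slop=1))
print(phrase_matches(doc, ["大学", "可以"], slop=2))
print(phrase_matches(doc, ["大学", "毕业", "研究生"], slop=2))
```

In this model 大学 and 可以 are two positions further apart than in the query, so slop 1 is not enough and slop 2 is; the last call fails for any slop because 研究生 does not occur in the document at all.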
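6. DSL Search - match (operator) / ids
match extensions
- operator
  - or: after the search text is analyzed, a document is returned if at least one term matches
  - and: after the search text is analyzed, every term has to match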
POST /shop/_doc/_search
{
"query": {
"match": {
"desc": "xbox游戏机"
}
}
}
# Equivalent to
{
"query": {
"match": {
"desc": {
"query": "xbox游戏机",
"operator": "or"
}
}
}
}
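# Roughly: select * from shop where desc='xbox' or|and desc='游戏机'
- minimum_should_match: the minimum match threshold. With a percentage, the required number of matching terms is [number of terms after analysis] x percentage, rounded down. For example, with the value set to 70%: if the user's query analyzes into 10 terms, 10 x 70% = 7, so desc has to match at least 7 terms for the document to be returned; if it analyzes into 8 terms, 8 x 70% = 5.6, so desc has to match at least 5 terms.
- minimum_should_match can also be set to a plain number, which is then the required count of matching terms.
POST /shop/_doc/_search
{
  "query": {
    "match": {
      "desc": {
        "query": "女友生日送我好玩的xbox游戏机",
        "minimum_should_match": "60%"
      }
    }
  }
}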
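The arithmetic above can be written down directly. This sketch covers only the two forms mentioned here, a non-negative percentage string and a plain count; the function name required_matches is illustrative.

```python
def required_matches(term_count: int, minimum_should_match) -> int:
    """Number of terms that must match, for a percentage string or a plain count."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer arithmetic rounds the result down, as in the 8 x 70% example.
        return term_count * percent // 100
    return int(minimum_should_match)


print(required_matches(10, "70%"))  # 10 x 70% = 7
print(required_matches(8, "70%"))   # 8 x 70% = 5.6, rounded down
print(required_matches(8, 3))       # plain count
```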
Searching by document primary key (ids)
GET /shop/_doc/1001
Querying several ids at once
POST /shop/_doc/_search
{
"query":{
"ids":{
"type": "_doc",
"values": ["1001","1002"]
}
},
"_source": ["id","nickname","desc"],
"from": 0,
"size": 10
}
7. DSL Search - multi_match/boost
multi_match
Covers the case where a match query has to run against several fields.
POST /shop/_doc/_search
{
"query": {
"multi_match": {
"query": "皮特帕克慕课网",
"fields": ["desc", "nickname"]
}
}
}
boost
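A weight. Setting a weight on a field raises the relevance score of documents that match on that field: the higher the weight, the higher the score. Typically a match on the product name should weigh more than a match on the product description.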
POST /shop/_doc/_search
{
"query": {
"multi_match": {
"query": "皮特帕克慕课网",
"fields": ["desc", "nickname^10"]
}
}
}
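nickname^10 boosts the relevance of matches on nickname tenfold. In other words, the search treats nickname as the primary field and desc as the secondary one, so matches on nickname are given a higher weight.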
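A rough sketch of how the field^boost notation shifts the result. The per-field scores below are invented numbers, not real relevance scores, and the model (multiply each field's score by its boost, keep the best field) is only an approximation of what multi_match does by default.

```python
def parse_field(spec: str):
    """Split "nickname^10" into ("nickname", 10.0); no caret means boost 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def best_field_score(raw_scores: dict, fields: list) -> float:
    """Boost each field's (hypothetical) score and keep the highest one."""
    return max(
        raw_scores.get(name, 0.0) * boost
        for name, boost in map(parse_field, fields)
    )


raw = {"desc": 2.0, "nickname": 0.5}  # made-up scores for one document
print(best_field_score(raw, ["desc", "nickname"]))     # desc decides
print(best_field_score(raw, ["desc", "nickname^10"]))  # nickname decides
```

With the boost in place, a weak match on nickname outranks a stronger match on desc, which is the behaviour described above.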
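8. DSL Search - Boolean Queries
Several queries can be combined:
- must: the document has to match these conditions, like and
- should: the document matches at least one of these conditions, like or
- must_not: the document must not match any of these conditions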
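The way the three clauses combine can be sketched with plain predicates. This toy only decides whether a document is returned and ignores scoring; substring tests stand in for match queries. One detail it models: when a must clause is present, should clauses no longer decide whether a document is returned (in es they then only influence the score).

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Toy bool query: each clause is a list of predicates taking a document."""
    if any(pred(doc) for pred in must_not):
        return False  # a single must_not hit excludes the document
    if not all(pred(doc) for pred in must):
        return False  # every must condition has to hold
    if should and not must:
        # Without must, at least one should condition has to hold.
        return any(pred(doc) for pred in should)
    return True


d1 = {"id": 1001, "sex": 0, "desc": "我在慕课网学习java和前端"}
d3 = {"id": 1003, "sex": 1, "desc": "慕课网团队和导游坐飞机去海外旅游"}


def has_mooc(doc):
    return "慕课网" in doc["desc"]


def is_sex_0(doc):
    return doc["sex"] == 0


def is_1001(doc):
    return doc["id"] == 1001


print(bool_matches(d1, must=[has_mooc], must_not=[is_1001]))
print(bool_matches(d3, must=[has_mooc], should=[is_sex_0]))
print(bool_matches(d3, should=[is_sex_0]))
```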
Example 1:
POST /shop/_doc/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "慕课网",
"fields": ["desc", "nickname"]
}
},
{
"term": {
"sex": 1
}
},
{
"term": {
"birthday": "1996-01-14"
}
}
]
}
}
}
// Example 2 (shown with should; replace the key with must_not to exclude these matches instead)
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "学习",
"fields": ["desc", "nickname"]
}
},
{
"match": {
"desc": "游戏"
}
},
{
"term": {
"sex": 0
}
}
]
}
}
}
// Example 3
{
"query":{
"bool":{
"must":[
{
"match":{
"desc":"慕"
}
},
{
"match":{
"nickname":"慕"
}
}
]
}
},
"_source": ["id","nickname","desc","sex","age"],
"from": 0,
"size": 10
}
// Example 4
{
"query":{
"bool":{
"must":[
{
"match":{
"desc":"慕"
}
},
{
"match":{
"nickname":"慕"
}
}
],
"should":[
{
"match":{
"sex":0
}
}
],
"must_not":[
{
"term":{
"id":"1001"
}
}
]
}
},
"_source": ["id","nickname","desc","sex","age"],
"from": 0,
"size": 10
}
// Example 5
{
"query":{
"bool":{
"must":[
{
"match":{
"desc":{
"query":"学习",
"boost":2
}
}
},
{
"match":{
"nickname":{
"query":"慕",
"boost":10
}
}
}
]
}
},
"_source": ["id","nickname","desc","sex","age"],
"from": 0,
"size": 10
}
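9. DSL Search - Filters
A filter narrows down the results that a search has already produced. It does not run another search against es and does not compute document relevance scores, so filtering is relatively fast. Filters can be combined with full-text search.
post_filter is a top-level element that only filters the search results. It does not compute relevance scores and does not sort by score; query is the opposite: it computes scores and sorts by them.
Use cases:
- query: retrieve the records that match the user's search conditions
- post_filter: screen the result data after the query has run
Exercise: find users whose account balance is greater than 80 yuan and less than 160 yuan, and whose birthday is 1998-07-14.
- gte: greater than or equal to
- lte: less than or equal to
- gt: greater than
- lt: less than
(Besides range, other queries such as match can also be used in post_filter.)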
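The four range operators and the "filter after the query" order can be sketched in a few lines. The hits and their _score values are invented for illustration; the point is that filtering removes hits without touching the scores of the ones that remain.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Check a value against any combination of gt / gte / lt / lte bounds."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def post_filter(hits, field, **bounds):
    """Keep only the hits whose field lies inside the bounds; order is unchanged."""
    return [hit for hit in hits if in_range(hit[field], **bounds)]


# Hypothetical query results, already ordered by a made-up _score.
hits = [
    {"id": 1001, "money": 88.8, "_score": 1.9},
    {"id": 1003, "money": 66.8, "_score": 1.2},
    {"id": 1004, "money": 55.8, "_score": 0.8},
]
print(post_filter(hits, "money", gt=40, lt=70))
```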
POST /shop/_doc/_search
// Example 1
{
"query":{
"match":{
"desc":{
"query":"慕课网"
}
}
},
"post_filter":{
"range":{
"money":{
"gt":40,
"lt":70
}
}
},
"_source": ["id","nickname","desc","sex","age","money"],
"from": 0,
"size": 10
}
// Example 2
{
"query": {
"match": {
"desc": "慕课网游戏"
}
},
"post_filter": {
"range": {
"money": {
"gt": 60,
"lt": 1000
}
}
}
}
// Example 3
{
"query":{
"match":{
"desc":{
"query":"慕课网"
}
}
},
"post_filter":{
"term":{
"birthday":"1996-11-20"
}
},
"_source": ["id","nickname","desc","sex","age","money","birthday"],
"from": 0,
"size": 10
}
10. DSL Search - Sorting
Sorting in es works as in SQL: results can be ordered desc or asc, and several sort keys can be combined.
Example:
POST /shop/_doc/_search
{
"query": {
"match": {
"desc": "慕课网游戏"
}
},
"post_filter": {
"range": {
"money": {
"gt": 55.8,
"lte": 155.8
}
}
},
"sort": [
{
"age": "desc"
},
{
"money": "asc"
}
]
}
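The combined sort above (age descending, then money ascending) behaves like a two-key sort. A small sketch on a few of the sample documents, using only the fields that matter here:

```python
def sort_hits(hits):
    """Order by age descending; ties are broken by money ascending."""
    return sorted(hits, key=lambda hit: (-hit["age"], hit["money"]))


hits = [
    {"id": 1001, "age": 18, "money": 88.8},
    {"id": 1002, "age": 19, "money": 88.8},
    {"id": 1003, "age": 20, "money": 66.8},
    {"id": 1006, "age": 19, "money": 155.8},
]
print([hit["id"] for hit in sort_hits(hits)])
```

The second key only matters for documents that tie on the first one, here the two users aged 19.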
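Sorting on text
Because text is analyzed into tokens, trying to sort on it usually raises an error. The usual approach is to give the field an additional sub-field of type keyword and sort on that.
- Create a new index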
POST /shop2/_mapping
{
"properties": {
"id": {
"type": "long"
},
"nickname": {
"type": "text",
"analyzer": "ik_max_word",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
- Add data
POST http://123.57.129.206:9200/shop2/_doc/1001
{
"id":1001,
"nickname":"美丽的风景"
}
POST http://123.57.129.206:9200/shop2/_doc/1002
{
"id":1002,
"nickname":"漂亮的小哥哥"
}
POST http://123.57.129.206:9200/shop2/_doc/1003
{
"id":1003,
"nickname":"漂亮的小哥哥"
}
POST http://123.57.129.206:9200/shop2/_doc/1004
{
"id":1004,
"nickname":"完美的天空·"
}
POST http://123.57.129.206:9200/shop2/_doc/1005
{
"id":1005,
"nickname":"广阔的海域"
}
- Search, sorted on the text field
GET http://123.57.129.206:9200/shop2/_doc/_search
{
"sort":{
"nickname.keyword": "desc"
}
}
11. DSL Search - Highlighting (highlight)
Highlighted display
POST /shop/_doc/_search
{
"query": {
"match": {
"desc": "慕课网"
}
},
"highlight": {
"pre_tags": ["<tag>"],
"post_tags": ["</tag>"],
"fields": {
"desc": {}
}
}
}
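Note: without the "pre_tags": ["<tag>"] and "post_tags": ["</tag>"] properties, the default tag is <em>; adding them lets you choose your own tag. The default <em> tag can be styled with CSS, for example: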
a em {
color: #f73131;
text-decoration: none;
}
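What the highlighter does to a field can be imitated on the client side. This is only an illustration of the output shape, assuming the matched terms are already known; the real highlighter works on the analyzed tokens inside es.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of the given terms in pre_tag / post_tag."""
    if not terms:
        return text
    # Longest terms first, so "慕课网" wins over its prefix "慕课".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in ordered)
    return re.sub(pattern, lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("我在慕课网学习", ["慕课网", "慕课"]))
print(highlight("我在慕课网学习", ["慕课网"], pre_tag="<tag>", post_tag="</tag>"))
```

The first call uses the default <em> tag, which is what the CSS rule above styles; the second mirrors the custom tags from the request.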
12. Further Reading - prefix-fuzzy-wildcard
- prefix
Query by prefix
POST /shop/_doc/_search
{
  "query": {
    "prefix": {
      "desc": "imo"
    }
  }
}
- fuzzy
Fuzzy search. This is not SQL-style fuzzy search; it deals with typos the user makes while searching: the search engine corrects for them automatically and then tries to match the data in the index.
POST /shop/_doc/_search
{
  "query": {
    "fuzzy": {
      "desc": "imoov.coom"
    }
  }
}
# Or search across multiple fields
{
  "query": {
    "multi_match": {
      "fields": ["desc", "nickname"],
      "query": "imcoc supor",
      "fuzziness": "AUTO"
    }
  }
}
{
  "query": {
    "multi_match": {
      "fields": ["desc", "nickname"],
      "query": "演说",
      "fuzziness": "1"
    }
  }
}
Official docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/fuzzy-match-query.html
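Typo tolerance is measured as an edit distance between the search term and an indexed term. The sketch below uses plain Levenshtein distance (insert, delete, substitute) as a simplification; es by default also counts swapping two adjacent characters as a single edit. The auto_fuzziness thresholds follow the documented default for "AUTO" (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that). Whether a term such as imooc.com is actually one token in the index depends on the analyzer and is assumed here.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits under the default AUTO setting, by term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query: str, indexed: str) -> bool:
    return edit_distance(query, indexed) <= auto_fuzziness(query)


print(edit_distance("imoov.coom", "imooc.com"))
print(fuzzy_matches("imcoc", "imooc"))
```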
- wildcard
Placeholder (wildcard) query.
- ?: exactly 1 character
- *: zero or more characters
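The two placeholders behave like shell-style patterns, so Python's fnmatch module gives a quick way to try them out. This is an analogy for experimenting with patterns, not the es implementation: fnmatch additionally treats square brackets specially, so the sketch assumes patterns that contain only ? and *. A prefix check is included for comparison with the prefix query.

```python
from fnmatch import fnmatchcase


def wildcard_matches(term: str, pattern: str) -> bool:
    """? stands for exactly one character, * for zero or more characters."""
    return fnmatchcase(term, pattern)


def prefix_matches(term: str, prefix: str) -> bool:
    """Counterpart of the prefix query: the term starts with the prefix."""
    return term.startswith(prefix)


print(wildcard_matches("imooc", "imo*"))
print(wildcard_matches("imooc", "imo?c"))
print(wildcard_matches("imooc", "imo?"))
print(prefix_matches("imooc", "imo"))
```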