1. Data preparation
- Set up the custom dictionary
- Create the index dsl_search (any name works)
- Define the mappings manually
POST /dsl_search/_mapping
{
"properties": {
"id": {
"type": "long"
},
"age": {
"type": "integer"
},
"username": {
"type": "keyword"
},
"nickname": {
"type": "text",
"analyzer": "ik_max_word"
},
"money": {
"type": "float"
},
"desc": {
"type": "text",
"analyzer": "ik_max_word"
},
"sex": {
"type": "byte"
},
"birthday": {
"type": "date"
},
"face": {
"type": "text",
"index": false
}
}
}
- Insert the data (one request per document)
POST /dsl_search/_doc/1001
{ "id": 1001, "age": 18, "username": "chinanewsAmazing", "nickname": "中国新闻网", "money": 88.8, "desc": "我在中国新闻网到了很多新闻", "sex": 0, "birthday": "2022-09-01", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/527bc4b462d946be81eb900d7c8e63fe.jpg" }
POST /dsl_search/_doc/1002
{ "id": 1002, "age": 19, "username": "justbuy", "nickname": "周杰棍", "money": 77.8, "desc": "今天上下班都很堵,车流量很大", "sex": 1, "birthday": "1993-01-24", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1003
{ "id": 1003, "age": 20, "username": "bigFace", "nickname": "飞翔的巨鹰", "money": 66.8, "desc": "中国新闻网团队和导游坐飞机去海外旅游,去了新马泰和欧洲", "sex": 1, "birthday": "1996-01-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1004
{ "id": 1004, "age": 22, "username": "flyfish", "nickname": "水中鱼", "money": 55.8, "desc": "昨天在学校的池塘里,看到有很多鱼在游泳,然后就去中国新闻网学习了", "sex": 0, "birthday": "1988-02-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1005
{ "id": 1005, "age": 25, "username": "gotoplay", "nickname": "ps游戏机", "money": 155.8, "desc": "今年生日,女友送了我一台play station游戏机,非常好玩,非常不错", "sex": 1, "birthday": "1989-03-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1006
{ "id": 1006, "age": 19, "username": "missimooc", "nickname": "我叫小髦", "money": 156.8, "desc": "我叫髦髦,今年20岁,是一名律师,我在琦䯲星球做演讲", "sex": 1, "birthday": "1993-04-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1007
{ "id": 1007, "age": 19, "username": "msgame", "nickname": "gamexbox", "money": 1056.8, "desc": "明天去进货,最近微软处理很多游戏机,还要买xbox游戏卡带", "sex": 1, "birthday": "1985-05-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1008
{ "id": 1008, "age": 19, "username": "muke", "nickname": "新闻学习", "money": 1056.8, "desc": "大学毕业后,可以到i2.chinanews.com.cn进修", "sex": 1, "birthday": "1995-06-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1009
{ "id": 1009, "age": 22, "username": "shaonian", "nickname": "骚年轮", "money": 96.8, "desc": "骚年在大学毕业后,考研究生去了", "sex": 1, "birthday": "1998-07-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1010
{ "id": 1010, "age": 30, "username": "tata", "nickname": "隔壁老王", "money": 100.8, "desc": "隔壁老外去国外出差,带给我很多好吃的", "sex": 1, "birthday": "1988-07-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1011
{ "id": 1011, "age": 31, "username": "sprder", "nickname": "皮特帕克", "money": 180.8, "desc": "它是一个超级英雄", "sex": 1, "birthday": "1989-08-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
POST /dsl_search/_doc/1012
{ "id": 1012, "age": 31, "username": "super hero", "nickname": "super hero", "money": 188.8, "desc": "BatMan, GreenArrow, SpiderMan, IronMan... are all Super Hero", "sex": 1, "birthday": "1980-08-14", "face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg" }
2. Basic syntax
Querying with request parameters (QueryString)
Find the documents whose [field] contains [text]
GET /dsl_search/_search?q=desc:新闻网
GET /dsl_search/_search?q=nickname:新 AND age:25
Comparing searches on text and keyword fields (a keyword field is not analyzed; its whole value is indexed as one exact term)
GET /dsl_search/_search?q=nickname:super
GET /dsl_search/_search?q=username:super
GET /dsl_search/_search?q=username:super hero
This is called the QueryString approach: every condition is passed as a request parameter in the URL.
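A query such as username:super hero contains a colon and a space, so a client that builds the URL itself usually needs to percent-encode the q parameter. The following is a minimal Python sketch using only the standard library; the helper name build_query_url is hypothetical and not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def build_query_url(index: str, field: str, text: str) -> str:
    """Build a QueryString-style search path with the q parameter encoded."""
    # urlencode percent-encodes ':' and turns spaces into '+'.
    params = urlencode({"q": f"{field}:{text}"})
    return f"/{index}/_search?{params}"


url = build_query_url("dsl_search", "username", "super hero")
print(url)
```

The resulting path can be appended to the cluster address, whatever that is in a given setup.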
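DSL basics
QueryString is rarely used in practice because complex conditions are hard to express in a URL, so most queries are written in the DSL instead.
- DSL stands for Domain Specific Language
- Queries are written as JSON
- More flexible, and better suited to complex queries
DSL syntax:
# Search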
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "新闻网"
}
}
}
# Check whether a field exists
POST /dsl_search/_search
{
"query": {
"exists": {
"field": "desc"
}
}
}
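- The request body is a JSON object made of key-value pairs, and objects can be nested.
- A key can be an Elasticsearch keyword or the name of a field, as later examples show.
Locating invalid searches
DSL queries often fail, usually because Elasticsearch cannot parse the JSON. As with a Java exception, the response carries an error message from which the cause can be inferred, for example malformed JSON or an unknown (unregistered) query keyword. If the message alone is not enough, pasting it into a search engine such as Google usually turns up the cause.
3. Match all and pagination
match_all
Query every document in the index
GET /dsl_search/_search
Result: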
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 13,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "dsl_search",
"_id": "1002",
"_score": 1,
"_source": {
"id": 1002,
"age": 19,
"username": "justbuy",
"nickname": "周杰棍",
"money": 77.8,
"desc": "今天上下班都很堵,车流量很大",
"sex": 1,
"birthday": "1993-01-24",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1003",
"_score": 1,
"_source": {
"id": 1003,
"age": 20,
"username": "bigFace",
"nickname": "飞翔的巨鹰",
"money": 66.8,
"desc": "中国新闻网团队和导游坐飞机去海外旅游,去了新马泰和欧洲",
"sex": 1,
"birthday": "1996-01-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1007",
"_score": 1,
"_source": {
"id": 1007,
"age": 19,
"username": "msgame",
"nickname": "gamexbox",
"money": 1056.8,
"desc": "明天去进货,最近微软处理很多游戏机,还要买xbox游戏卡带",
"sex": 1,
"birthday": "1985-05-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1008",
"_score": 1,
"_source": {
"id": 1008,
"age": 19,
"username": "muke",
"nickname": "新闻学习",
"money": 1056.8,
"desc": "大学毕业后,可以到i2.chinanews.com.cn进修",
"sex": 1,
"birthday": "1995-06-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1011",
"_score": 1,
"_source": {
"id": 1011,
"age": 31,
"username": "sprder",
"nickname": "皮特帕克",
"money": 180.8,
"desc": "它是一个超级英雄",
"sex": 1,
"birthday": "1989-08-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "_search",
"_score": 1,
"_source": {
"query": {
"match": {
"desc": "新闻网"
}
}
}
},
{
"_index": "dsl_search",
"_id": "1001",
"_score": 1,
"_source": {
"id": 1001,
"age": 18,
"username": "chinanewsAmazing",
"nickname": "中国新闻网",
"money": 88.8,
"desc": "我在中国新闻网到了很多新闻",
"sex": 0,
"birthday": "2022-09-01",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/527bc4b462d946be81eb900d7c8e63fe.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1004",
"_score": 1,
"_source": {
"id": 1004,
"age": 22,
"username": "flyfish",
"nickname": "水中鱼",
"money": 55.8,
"desc": "昨天在学校的池塘里,看到有很多鱼在游泳,然后就去中国新闻网学习了",
"sex": 0,
"birthday": "1988-02-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1005",
"_score": 1,
"_source": {
"id": 1005,
"age": 25,
"username": "gotoplay",
"nickname": "ps游戏机",
"money": 155.8,
"desc": "今年生日,女友送了我一台play station游戏机,非常好玩,非常不错",
"sex": 1,
"birthday": "1989-03-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
},
{
"_index": "dsl_search",
"_id": "1006",
"_score": 1,
"_source": {
"id": 1006,
"age": 19,
"username": "missimooc",
"nickname": "我叫小髦",
"money": 156.8,
"desc": "我叫髦髦,今年20岁,是一名律师,我在琦䯲星球做演讲",
"sex": 1,
"birthday": "1993-04-14",
"face": "https://i2.chinanews.com.cn/simg/cmshd/2022/09/01/ea8d5d4bc6c146239201034cf7731dce.jpg"
}
}
]
}
}
Or
POST /dsl_search/_search
{
"query": {
"match_all": {}
},
"_source": ["id", "nickname", "age"]
}
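The same documents are returned, but each _source contains only the id, nickname and age fields.
- Running the query in the Head UI
Paginated queries
A search returns only 10 hits by default; use from and size to page through the rest.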
POST /dsl_search/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 10
}
Custom pagination
POST /dsl_search/_search
{
"query": {
"match_all": {}
},
"_source": [
"id",
"nickname",
"age"
],
"from": 5,
"size": 5
}
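Applications usually think in page numbers, while the search body takes an offset (from) and a page size (size). The sketch below shows one way to convert between the two; the function name page_to_from_size and the 1-based page numbering are assumptions made for illustration.

```python
def page_to_from_size(page: int, page_size: int) -> dict:
    """Convert a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Page 1 starts at offset 0, page 2 at offset page_size, and so on.
    return {"from": (page - 1) * page_size, "size": page_size}


body = {"query": {"match_all": {}}, **page_to_from_size(page=2, page_size=5)}
print(body)
```

With page=2 and page_size=5 this yields the same from and size values as the custom pagination request above.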
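- Running the query in the Head UI
4. term / match / match_phrase
term exact search vs. match analyzed search
A term query takes the user's input, for example "中国新闻网强大", as one whole keyword and searches for it without analyzing it into tokens first.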
POST /dsl_search/_search
{
"query": {
"term": {
"desc": "新闻网"
}
}
}
Compare with match:
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "新闻网"
}
}
}
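- Note: match first analyzes "新闻网" into tokens (this is full-text search) and then searches for them, whereas term does not analyze its input and searches for "新闻网" as a single term.
- Compare the two in the Head UI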
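The difference between term and match can be modeled with a toy inverted index. The sketch below is only an illustration: the token sets and the toy_analyze function are hand-written assumptions, and the real ik_max_word analyzer may split these strings differently.

```python
# Hypothetical tokens per document; real ik_max_word output may differ.
INDEXED_TOKENS = {
    "1001": {"中国", "新闻网", "新闻", "很多"},
    "1002": {"今天", "上下班", "车流量"},
}

# Hypothetical analyzer: a lookup table standing in for real tokenization.
TOY_SPLITS = {
    "新闻网": ["新闻网", "新闻"],
    "中国新闻网强大": ["中国", "新闻网", "新闻", "强大"],
}


def toy_analyze(text: str) -> list:
    """Return the assumed tokens for text, or the text itself if unknown."""
    return TOY_SPLITS.get(text, [text])


def term_query(text: str) -> list:
    """term: look up the input as one un-analyzed term."""
    return sorted(doc for doc, tokens in INDEXED_TOKENS.items() if text in tokens)


def match_query(text: str) -> list:
    """match: analyze the input first, then return docs containing any token."""
    wanted = toy_analyze(text)
    return sorted(
        doc for doc, tokens in INDEXED_TOKENS.items()
        if any(token in tokens for token in wanted)
    )


print(term_query("中国新闻网强大"))   # no indexed term equals the whole string
print(match_query("中国新闻网强大"))  # individual tokens do match
```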
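terms: matching several terms
This works like a tag lookup. For example, news items may carry tags such as international, religion, culture or entertainment, and terms returns the documents that exactly match any of the listed tags.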
POST /dsl_search/_search
{
"query": {
"terms": {
"desc": ["新闻网", "学习", "骚年"]
}
}
}
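match_phrase: phrase matching
match returns a document as soon as any token matches. match_phrase is stricter: every token of the query must appear in the analyzed text field, in the same order, and adjacent to each other.
- slop: the number of positions that may be skipped between tokens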
POST /dsl_search/_search
{
"query": {
"match_phrase": {
"desc": {
"query": "大学 毕业 研究生",
"slop": 3
}
}
}
}
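5. match (operator) / ids
Extending match
- operator
(1) or: after the search text is analyzed, a document is returned if at least one token matches
(2) and: after the search text is analyzed, every token must match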
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "xbox游戏机"
}
}
}
# Equivalent to
POST /dsl_search/_search
{
"query": {
"match": {
"desc": {
"query": "xbox游戏机",
"operator": "or"
}
}
}
}
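# Roughly equivalent to: select * from shop where desc='xbox' or|and desc='游戏机'
- minimum_should_match: the minimum match ratio. The number of tokens in the analyzed query is multiplied by the percentage and the result is rounded down. For example, with a value of 70%, a query that analyzes into 10 tokens gives 10 x 70% = 7, so desc must match at least 7 tokens for the document to be returned; with 8 tokens, 8 x 70% = 5.6, so at least 5 tokens must match.
- minimum_should_match can also be a plain number, meaning an absolute token count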
POST /dsl_search/_search
{
"query": {
"match": {
"desc": {
"query": "女友生日送我好玩的xbox游戏机",
"minimum_should_match": "60%"
}
}
}
}
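The rounding rule for minimum_should_match described above can be checked with a few lines of Python. This is a sketch of the arithmetic only, under the assumption that the value is either a positive percentage string such as "70%" or a positive integer; the function name required_matches is hypothetical, and negative values and combined expressions are not handled.

```python
def required_matches(token_count: int, minimum_should_match) -> int:
    """Return how many query tokens must match, rounding percentages down."""
    value = str(minimum_should_match)
    if value.endswith("%"):
        percent = int(value[:-1])
        # Integer division rounds down and avoids floating-point error.
        return token_count * percent // 100
    return int(value)


print(required_matches(10, "70%"))
print(required_matches(8, "70%"))
print(required_matches(8, 3))
```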
Fetch a single document by its id
GET /dsl_search/_doc/1001
Search by several ids
Official documentation:
IDs | Elasticsearch Guide [8.12] | Elastic
POST /dsl_search/_search
{
"query": {
"ids": {
"values": [
"1001",
"1010",
"1008"
]
}
}
}
6. multi_match / boost
Official documentation: Multi-match query | Elasticsearch Guide [8.12] | Elastic
multi_match
Runs a match query across several fields
POST /dsl_search/_search
{
"query": {
"multi_match": {
"query": "皮特帕克新闻网",
"fields": ["desc", "nickname"]
}
}
}
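boost
A weight assigned to a field: the higher the weight, the higher the relevance score of documents that match on that field. Searches typically weight a product name higher than a product description.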
POST /dsl_search/_search
{
"query": {
"multi_match": {
"query": "皮特帕克新闻网",
"fields": ["desc", "nickname^10"]
}
}
}
nickname^10 boosts the relevance of nickname matches tenfold, so the search treats nickname as the primary field and desc as the secondary one.
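The effect of a field boost on ranking can be sketched as follows. This is a simplified model, not Elasticsearch's scoring code: it assumes the default best_fields behavior of multi_match (the best-scoring field wins) and that a boost simply multiplies a field's score. All per-field scores below are made-up numbers.

```python
def best_fields_score(field_scores: dict, boosts: dict) -> float:
    """Score a document as the highest boosted per-field score."""
    return max(
        score * boosts.get(field, 1.0)
        for field, score in field_scores.items()
    )


# Hypothetical per-field scores for two documents.
doc_a = {"desc": 2.0, "nickname": 0.0}   # matches only in desc
doc_b = {"desc": 0.5, "nickname": 1.0}   # weaker match, but in nickname

no_boost = {}
nickname_boost = {"nickname": 10.0}      # corresponds to "nickname^10"

print(best_fields_score(doc_a, no_boost), best_fields_score(doc_b, no_boost))
print(best_fields_score(doc_a, nickname_boost), best_fields_score(doc_b, nickname_boost))
```

Without the boost doc_a ranks first; with nickname^10 doc_b overtakes it, which is the behavior the paragraph above describes.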
7. Bool query
- must: every clause must match, like and
- should: at least one clause should match, like or
- must_not: none of the clauses may match
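The three clause types can be modeled as predicates over a document. The sketch below covers matching only (no scoring) and is an approximation: it assumes that should clauses are required only when no must clause is present, and that otherwise they are optional. The function name bool_match is hypothetical.

```python
def bool_match(doc: dict, must=(), should=(), must_not=()) -> bool:
    """Decide whether doc satisfies a simplified bool query."""
    if any(clause(doc) for clause in must_not):
        return False                      # any must_not hit excludes the doc
    if not all(clause(doc) for clause in must):
        return False                      # every must clause has to match
    if should and not must:
        return any(clause(doc) for clause in should)
    return True                           # with must present, should is optional


def is_male(doc):
    return doc["sex"] == 1


def is_adult_30(doc):
    return doc["age"] >= 30


doc_1003 = {"id": 1003, "sex": 1, "age": 20}
doc_1010 = {"id": 1010, "sex": 1, "age": 30}

print(bool_match(doc_1003, must=[is_male]))
print(bool_match(doc_1003, should=[is_adult_30]))
print(bool_match(doc_1010, must=[is_male], must_not=[is_adult_30]))
```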
Hands-on 1:
POST /dsl_search/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "新闻网",
"fields": ["desc", "nickname"]
}
},
{
"term": {
"sex": 1
}
},
{
"term": {
"birthday": "1996-01-14"
}
}
]
}
}
}
# Change "should" to "must_not" to exclude the matching documents instead
POST /dsl_search/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "学习",
"fields": ["desc", "nickname"]
}
},
{
"match": {
"desc": "游戏"
}
},
{
"term": {
"sex": 0
}
}
]
}
}
}
Hands-on 2:
POST /dsl_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"desc": "新"
}
},
{
"match": {
"nickname": "新"
}
}
],
"should": [
{
"match": {
"sex": "0"
}
}
],
"must_not": [
{
"term": {
"birthday": "1992-12-24"
}
}
]
}
}
}
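Combined queries in the Head UI
Boosting specific terms
In special cases individual terms can be boosted so that documents matching them rank higher.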
POST /dsl_search/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"desc": {
"query": "律师",
"boost": 18
}
}
},
{
"match": {
"desc": {
"query": "进修",
"boost": 2
}
}
}
]
}
}
}
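8. Filters
A filter narrows down the results of a search. It does not compute relevance scores for documents, so filtering is fast, and it can be combined with full-text search.
post_filter is a top-level element that only filters the hits of a search. It does not compute relevance scores and does not change the ordering by score, whereas query does compute scores and orders hits by them.
Use cases:
- query: retrieve the records that match the user's search terms
- post_filter: narrow down the result data after the query has run
Hands-on: search desc for "新闻网游戏", then keep only the users whose money is greater than 60 and less than 1000.
- gte: greater than or equal to
- lte: less than or equal to
- gt: greater than
- lt: less than
(Other queries such as match can also be used inside post_filter.)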
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "新闻网游戏"
}
},
"post_filter": {
"range": {
"money": {
"gt": 60,
"lt": 1000
}
}
}
}
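Conceptually, post_filter drops hits after the query has scored them, leaving the scores and order of the remaining hits untouched. The Python sketch below models that for a range condition; the function name post_filter_range is hypothetical and the _score values are made-up numbers, not real query output.

```python
def post_filter_range(hits, field, gt=None, gte=None, lt=None, lte=None):
    """Keep only the hits whose field value lies inside the given range."""
    def keep(value):
        if gt is not None and not value > gt:
            return False
        if gte is not None and not value >= gte:
            return False
        if lt is not None and not value < lt:
            return False
        if lte is not None and not value <= lte:
            return False
        return True

    # Order and scores of the surviving hits are left as they were.
    return [hit for hit in hits if keep(hit["_source"][field])]


# Hypothetical scored hits, already ordered by _score.
hits = [
    {"_id": "1007", "_score": 2.1, "_source": {"money": 1056.8}},
    {"_id": "1001", "_score": 1.4, "_source": {"money": 88.8}},
    {"_id": "1004", "_score": 0.9, "_source": {"money": 55.8}},
]

filtered = post_filter_range(hits, "money", gt=60, lt=1000)
print([hit["_id"] for hit in filtered])
```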
9. Sorting
Sorting works as in SQL: use desc or asc, and combine several sort keys when needed.
Hands-on:
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "新闻网游戏"
}
},
"post_filter": {
"range": {
"money": {
"gt": 55.8,
"lte": 155.8
}
}
},
"sort": [
{
"age": "desc"
},
{
"money": "desc"
}
]
}
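A combined sort compares documents by the first key and falls back to the next key only on ties. The sketch below reproduces "age desc, then money desc" in plain Python on four documents from the sample data; it illustrates the ordering rule only and does not call Elasticsearch.

```python
# age and money values taken from the sample documents above.
docs = [
    {"id": 1001, "age": 18, "money": 88.8},
    {"id": 1004, "age": 22, "money": 55.8},
    {"id": 1005, "age": 25, "money": 155.8},
    {"id": 1009, "age": 22, "money": 96.8},
]

# Negating numeric keys gives descending order for both sort levels.
ordered = sorted(docs, key=lambda doc: (-doc["age"], -doc["money"]))
print([doc["id"] for doc in ordered])
```

Documents 1009 and 1004 share the same age, so their relative order is decided by money.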
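Sorting on text
Text fields are analyzed, so sorting on them directly usually fails. The usual fix is to give the field an additional sub-field of type keyword and sort on that.
- Create a new index
PUT /dsl_search2
POST /dsl_search2/_mapping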
{
"properties": {
"id": {
"type": "long"
},
"nickname": {
"type": "text",
"analyzer": "ik_max_word",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
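- Insert the data (one request per document)
POST /dsl_search2/_doc
{
"id": 1001,
"nickname": "美丽的风景"
}
POST /dsl_search2/_doc
{
"id": 1002,
"nickname": "漂亮的小哥哥"
}
POST /dsl_search2/_doc
{
"id": 1003,
"nickname": "飞翔的巨鹰"
}
POST /dsl_search2/_doc
{
"id": 1004,
"nickname": "完美的天空"
}
POST /dsl_search2/_doc
{
"id": 1005,
"nickname": "广阔的海域"
}
- Sort
POST /dsl_search2/_search
{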
"sort": [
{
"nickname.keyword": "desc"
}
]
}
10. Highlighting
Highlighting matched terms
POST /dsl_search/_search
{
"query": {
"match": {
"desc": "新闻网"
}
},
"highlight": {
"pre_tags": ["<tag>"],
"post_tags": ["</tag>"],
"fields": {
"desc": {}
}
}
}
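On the client side, the effect of pre_tags and post_tags can be pictured as wrapping each matched token in the chosen tags. The sketch below is a rough imitation written with regular expressions, not Elasticsearch's highlighter; the token list is an assumption about how "新闻网" might be analyzed.

```python
import re


def highlight(text: str, tokens, pre: str = "<tag>", post: str = "</tag>") -> str:
    """Wrap every occurrence of the given tokens in pre/post tags."""
    if not tokens:
        return text
    # Longest tokens first so that "新闻网" wins over its prefix "新闻".
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = "|".join(re.escape(token) for token in ordered)
    return re.sub(pattern, lambda m: f"{pre}{m.group(0)}{post}", text)


# Assumed tokens for the query "新闻网"; the real analyzer may differ.
result = highlight("我在中国新闻网到了很多新闻", ["新闻", "新闻网"])
print(result)
```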
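11. prefix / fuzzy / wildcard
prefix: search by prefix
Scenario: the user cannot remember a whole English word, only its first few letters.
match does not help here, because it only matches complete tokens.
Use prefix instead: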
POST /dsl_search/_search
{
"query": {
"prefix": {
"desc": "新"
}
}
}
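fuzzy: fuzzy search
Fuzzy search here is not SQL-style pattern matching (LIKE). It deals with typos in the user's input: the search engine tolerates small spelling errors and still tries to match the data in the index.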
POST /dsl_search/_search
{
"query": {
"fuzzy": {
"desc": "i2.chinanews.com.co"
}
}
}
# Or search several fields
POST /dsl_search/_search
{
"query": {
"multi_match": {
"fields": [ "desc", "nickname"],
"query": "i2.chinaneww supor",
"fuzziness": "AUTO"
}
}
}
POST /dsl_search/_search
{
"query": {
"multi_match": {
"fields": [ "desc", "nickname"],
"query": "演说",
"fuzziness": "1"
}
}
}
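Fuzziness is expressed as an edit distance: the number of single-character changes needed to turn the typed term into an indexed term. The sketch below computes a plain Levenshtein distance and the number of edits that "AUTO" allows for a given term length (0 for 1-2 characters, 1 for 3-5, 2 for longer terms). It is a simplified model: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this function does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + (char_a != char_b)  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by "AUTO" for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(edit_distance("i2.chinanews.com.co", "i2.chinanews.com.cn"))
print(edit_distance("supor", "super"), auto_fuzziness("supor"))
```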
Official documentation:
Fuzzy query | Elasticsearch Guide [8.12] | Elastic
wildcard
Wildcard (placeholder) queries
- ?: exactly one character
- *: zero or more characters
POST /dsl_search/_search
{
"query": {
"wildcard": {
"desc": "*chinanews.com.c?"
}
}
}
POST /dsl_search/_search
{
"query": {
"wildcard": {
"desc": "演*"
}
}
}
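The meaning of the two placeholders can be made concrete by translating a wildcard pattern into a regular expression. The sketch below is an illustration of the matching rule against a single term, not how Elasticsearch evaluates wildcard queries internally; the helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate ? (one character) and * (zero or more) into a regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))   # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("*chinanews.com.c?", "i2.chinanews.com.cn"))
print(wildcard_match("演*", "演讲"))
print(wildcard_match("演*", "演"))
```

The last call shows that * also matches an empty remainder, while ? always consumes one character.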
Official documentation: