Common Elasticsearch Syntax (Part 1)

isTrueLoveColour

Tags: elasticsearch, java, big data

Article link: https://blog.csdn.net/persistence_PSH/article/details/125416994

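Index CRUD

Create an index

PUT   index_name?pretty
 The ? and the parameters after it are optional

_search: query the data in an index

GET   index_name/_search  returns the documents in this index (10 hits by default)

_doc: operations on documents

View a document
GET /index_name/_doc/id

_cat: information about the indices themselves (there are many _cat commands)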

GET _cat/indices?v

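Inserting and modifying data

PUT can create a document or modify one. A modification is a full replacement (the whole document is replaced), and the id is required.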

PUT test/_doc/1
{
  "name": "lisi",
  "age":"30",
  "class":"10"
}

POST can insert a document without an id; an id is generated automatically
POST test/_doc
{
  "name": "lisi",
  "age":"30",
  "class":"30"
}


POST test/_doc/kkedMYEBLE3IVvHq7QF0
{
  "name": "lisi",
  "age":"30",
  "class":"10"
}

Updating data (partial update: only the fields listed under doc are changed)

POST test/_update/nkexMYEBLE3IVvHq4AGo
{
  "doc": {
    "name": "zhaoliu",
    "age": "10"
  }
}
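
The difference between a full replacement (PUT) and a partial update (_update with doc) can be sketched in plain Python. This is only an in-memory illustration, not a client call: the `index` dict, the helper names and the value "wangwu" are assumptions made for the example, and the merge is simplified to top-level fields.

```python
# In-memory stand-in for one index: maps document id -> _source dict.
index = {}

def put_doc(doc_id, body):
    """Mimics PUT index/_doc/id: the stored document is replaced as a whole."""
    index[doc_id] = dict(body)

def update_doc(doc_id, partial):
    """Mimics POST index/_update/id with {"doc": ...}: listed fields are merged in."""
    index[doc_id] = {**index[doc_id], **partial}

put_doc("1", {"name": "lisi", "age": "30", "class": "10"})
update_doc("1", {"name": "zhaoliu", "age": "10"})
after_update = dict(index["1"])   # "class" survives the partial update

put_doc("1", {"name": "wangwu"})  # hypothetical new body
after_put = dict(index["1"])      # "age" and "class" are gone after the full replace

print(after_update)
print(after_put)
```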

Delete

Delete a document
DELETE /test/_doc/1

Delete an index
DELETE  index_name

Mapping

View the mapping of an index

GET /test/_mapping

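Deleting a mapping

A mapping cannot be deleted on its own; delete the index and recreate it with the new mapping:

DELETE /test

Elasticsearch supports the following simple field types:

String: text, keyword (these replaced the older string type in 5.0)
Integer: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date

When you index a document that contains a new field (one that has not appeared before), Elasticsearch uses dynamic mapping and tries to guess the field type from the basic JSON data types.

For example, the fields in the following request are mapped automatically: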

PUT test/_doc/1
{
  "name":"lisi",
  "age": 20 ,
  "date":"2022-06-05",
  "des":"这是一条测试数据"
}

The resulting automatic mapping is:

GET /test/_mapping

{
  "test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "date" : {
          "type" : "date"
        },
        "des" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
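
The guessing rules behind the mapping above can be approximated in a few lines of Python. This is a rough sketch, not Elasticsearch's implementation: `guess_field_mapping` is a hypothetical helper, and date detection is reduced to the single yyyy-MM-dd pattern, whereas the real detection accepts more formats.

```python
import re

# Simplifying assumption: only the yyyy-MM-dd form is treated as a date.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def guess_field_mapping(value):
    """Guess a field mapping from a JSON value, roughly like dynamic mapping."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_RE.match(value):
            return {"type": "date"}
        # Other strings become text with a keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")

doc = {"name": "lisi", "age": 20, "date": "2022-06-05", "des": "这是一条测试数据"}
mapping = {"properties": {k: guess_field_mapping(v) for k, v in sorted(doc.items())}}
print(mapping)
```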

Explicit mapping

Create the mapping together with the index:

PUT test
{
  "mappings": {
    "properties": {
      "age": {
        "type": "text"
      },
      "name": {
        "type": "keyword"
      }
    }
  }
}

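Once a mapping has been created, the type of an existing field cannot be changed. For the available data types and mapping parameters, see the official documentation.

Query context (structure of a search response)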

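{
  "took" : 37,  			// time taken, in milliseconds
  "timed_out" : false,		// whether the search timed out
  "_shards" : {			
    "total" : 1,			// number of shards the search ran on
    "successful" : 1,		// shards that succeeded
    "skipped" : 0,			// shards that were skipped
    "failed" : 0			// shards that failed
  },
  "hits" : {				// the returned data
    "total" : {		
      "value" : 4,			// total number of matching documents
      "relation" : "eq"		// "eq": value is exact; "gte": value is a lower bound
    },
    "max_score" : 1.0,		// highest relevance score; without a sort field, hits are ordered by relevance
    "hits" : [						// the matching documents
    		****************
    ]
  }
}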

source

Disabling _source in the response

GET /test/_search
{
  "_source": false, 
  "query": {
     "match_all": {}
  }
}
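With _source disabled, each hit carries only metadata such as _index, _id and _score, not the document body

You can also choose which fields to return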
GET /test/_search
{
  "_source": ["name"], 
  "query": {
     "match_all": {}
  }
}

GET /test/_search
{
  "_source": ["obj.*"], 
  "query": {
     "match_all": {}
  }
}

includes lists the fields to return and excludes lists the fields to drop; if the two conflict, excludes wins
GET /test/_search
{
  "_source": {
    "includes": ["date","age"], 
    "excludes": ["des","tags"]
  } ,
  "query": {
     "match_all": {}
  }
}

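Filtering _source in a search request only trims the response so that unneeded fields are not returned. Disabling _source in the mapping ("_source": {"enabled": false}) does save disk space, but update, reindex and highlighting then stop working. If the only goal is to save disk, raising the index compression level is usually a much better choice than disabling _source.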
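
The includes/excludes rule (excludes wins on conflict) can be sketched as a small Python function. This is an illustration under assumptions: `filter_source` is a hypothetical helper, it handles only flat documents, and wildcard matching is delegated to `fnmatchcase`; the sample document values are made up.

```python
from fnmatch import fnmatchcase

def filter_source(source, includes=None, excludes=None):
    """Keep fields matching includes (all fields if empty), then drop excludes."""
    kept = {}
    for field, value in source.items():
        included = not includes or any(fnmatchcase(field, p) for p in includes)
        excluded = any(fnmatchcase(field, p) for p in (excludes or []))
        if included and not excluded:
            kept[field] = value
    return kept

# Hypothetical sample document.
doc = {"name": "lisi", "age": 20, "date": "2022-06-05", "des": "test", "tags": ["a"]}

# "des" is both included and excluded: the exclusion wins.
filtered = filter_source(doc, includes=["date", "age", "des"], excludes=["des", "tags"])
print(filtered)
```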

DSL queries

Queries with conditions

query

By default a page contains 10 hits

GET /test/_search?q=age:16

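from is the offset of the first hit and size is the page length: start at offset 0, return 4 hits per page, sorted by age ascending (use desc for descending)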
GET /test/_search?from=0&size=4&sort=age:asc
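
Because from is an offset rather than a page number, a zero-based page index has to be multiplied by the page size. A minimal sketch of building the query string, where `page_query` is a hypothetical helper:

```python
from urllib.parse import urlencode

def page_query(page, size, sort=None):
    """Build the URI-search query string for a zero-based page index."""
    params = {"from": page * size, "size": size}  # from = number of hits to skip
    if sort:
        params["sort"] = sort
    return urlencode(params, safe=":")            # keep "age:asc" readable

first_page = page_query(0, 4, "age:asc")
third_page = page_query(2, 4, "age:asc")
print(first_page)
print(third_page)
```

By default from + size may not exceed 10000 (the index.max_result_window setting), so this style of paging is meant for shallow pages.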

Without a field name, the query searches every field that has an inverted index
GET /test/_search?q=测试

These two requests return the same result. Both match every document, like this in MySQL: select * from tableName where 1=1  
GET /test/_search


GET /test/_search
{
  "query": {
     "match_all": {}
  }
}

match: full-text search

Find documents whose des field matches "测试 123" (the search text is analyzed into terms as well)
GET /test/_search
{
  "query": {
     "match": {
      "des": "测试 123"
    }
  }
}

Phrase search: every term of the search text must be present, adjacent and in the same order
GET /test/_search
{
  "query": {
    "match_phrase": {
      "des": "测试 a"
    }
  }
}

Match all documents
GET /test/_search
{
  "query": {
    "match_all": {
    }
  }
}

Find documents where any of the fields ["name","des","tags"] matches "w 233 测试" (the query text is analyzed)
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "w 233 测试",
      "fields": ["name","des","tags"]
    }
  }
}

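Difference between match and source

The includes and excludes of source correspond to

select name ,age from table 

Filtering with match corresponds to:
select * from table  where name=XXX or age=XXX

The first filters which fields are returned; the second filters which documents match

term: exact matching

The search text is treated as a single term and is not analyzed; a document matches only if the field contains exactly that term. Pay attention to whether the field being queried was analyzed at index time: a text field indexed with the standard analyzer stores lowercased tokens, so a term query for "TEST" does not match it.

terms is the equivalent of where in ("",""....)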
GET /test/_search
{
  "query": {
    "term": {
      "tags": {
        "value": "TEST"
      }
    }
  }
}

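match only needs one of the terms to be present. match_phrase is a phrase search: the search text is analyzed, and a hit must contain every term, adjacent and in the same order. term is an exact search: the search text is not analyzed, so it works best against fields that are not analyzed either.

keyword fields are not analyzed. Indexing very long values as single terms is expensive, so the dynamically mapped keyword sub-field sets ignore_above to 256: values longer than 256 characters are not indexed in that sub-field (they are skipped, not truncated). A keyword term must be exactly equal to the search value to match.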
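
The three behaviours can be compared with a toy model in Python. Everything here is an assumption made for illustration: `analyze` is a stand-in analyzer that lowercases and splits on whitespace (real analyzers, and Chinese analyzers such as ik_smart in particular, work differently), and the sample text is invented.

```python
def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()

def match(field_text, query):
    """match: any analyzed query term appears among the field's terms."""
    tokens = analyze(field_text)
    return any(term in tokens for term in analyze(query))

def match_phrase(field_text, query):
    """match_phrase: all query terms appear adjacent and in order."""
    tokens, wanted = analyze(field_text), analyze(query)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def term(indexed_terms, value):
    """term: the value is compared as-is against the indexed terms."""
    return value in indexed_terms

text = "Quick brown fox"
text_field_terms = analyze(text)   # what a text field would index
keyword_field_terms = [text]       # what a keyword field would index

print(match(text, "fox cat"))                          # one term is enough
print(match_phrase(text, "fox brown"))                 # wrong order
print(term(text_field_terms, "Quick"))                 # indexed token is lowercased
print(term(keyword_field_terms, "Quick brown fox"))    # exact whole value
```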

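range: range queries

Find documents whose age is between 10 and 20, bounds included (gte and lte are inclusive; use gt and lt for exclusive bounds)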
GET test/_search
{
  "query" : {
   "range": {
     "age": {
       "gte": 10,
       "lte": 20
     }
   }
  }
}

time_zone is the time zone used to interpret the date bounds
GET test/_search
{
  "query" : {
   "range": {
     "date": {
       "time_zone": "+01:00", 
       "gte": "2021-06-05",
       "lte": "2022-06-05"
     }
   }
  }
}

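filter

filter does not compute relevance scores or order by them, and its results can be cached, so it usually performs much better

constant_score assigns a fixed score, and boost is that score, so every hit gets the same score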

GET test/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "age": {
            "gte": 10,
            "lte": 20
          }
        }
      },
      "boost": 1.2
    }
  }
}

The same filter nested in a bool query

GET test/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 10,
            "lte": 20
          }
        }
      },
      "boost": 1.2
    }
  }
}

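bool

bool is a compound query that combines several query clauses. It has four clause types:

must: every clause must match
filter: every clause must match, without scoring
should: clauses may match
must_not: no clause may match

Only must and should affect the score

must: every clause must match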

GET /test/_search/
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "lisi 233"
          }
        },
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 30
            }
          }
        }
      ]
    }
  }
}

filter: no score is computed
GET /test/_search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "name": "lisi 233"
          }
        },
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 30
            }
          }
        }
      ]
    }
  }
}

must_not: none of the clauses may match

GET /test/_search/
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "name": "lisi 233"
          }
        },
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 15
            }
          }
        }
      ]
    }
  }
}

should: one or more clauses match, similar to OR
GET /test/_search/
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "lisi 233"
          }
        },
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 35
            }
          }
        }
      ]
    }
  }
}

Several clause types together: with must and filter, the clauses are combined with AND

GET /test/_search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "name": "lisi 233"
          }
        }
      ],
      "must": [
        {
          "range": {
            "age": {
              "gte": 20,
              "lte": 35
            }
          }
        }
      ]
    }
  }
}

When should is combined with must or filter, the number of should clauses that have to match can be 0

GET /test/_search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "name": "lisi 233"
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gte": 50,
              "lte": 35
            }
          }
        }
      ]
    }
  }
}



GET /test/_search/
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "name": "lisi 233"
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gte": 50,
              "lte": 35
            }
          }
        }
      ]
      , "minimum_should_match": 1
    }
  }
}

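"minimum_should_match" sets how many should clauses must match, as a number or a percentage. If the bool query contains at least one should clause and no must or filter clause, the default is 1; otherwise the default is 0. Set it explicitly to change this.

A bool query can also be nested inside another bool query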
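
The matching rule of bool, including the default for minimum_should_match, can be sketched in Python. This is a simplified model and not how Elasticsearch evaluates queries: `bool_matches` is a hypothetical helper, clauses are plain predicates, scoring is ignored, and only integer values of minimum_should_match are handled. The sample document is invented.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Return True if doc satisfies the bool query built from predicate clauses."""
    if minimum_should_match is None:
        # Default: 1 if there are only should clauses, otherwise 0.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match

doc = {"name": "lisi 233", "age": 30}            # hypothetical document
name_has_lisi = lambda d: "lisi" in d["name"].split()
impossible_range = lambda d: 50 <= d["age"] <= 35   # gte 50 and lte 35 can never match

with_filter = bool_matches(doc, filters=[name_has_lisi], should=[impossible_range])
with_min = bool_matches(doc, filters=[name_has_lisi], should=[impossible_range],
                        minimum_should_match=1)
should_only = bool_matches(doc, should=[impossible_range])
print(with_filter, with_min, should_only)
```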

Aggregations

Bucket aggregations:

Bucket aggregations are similar to GROUP BY

GET test/_search
{
  "size": 0,						// with size 0 the documents in hits are not returned
  "aggs": {
    "agg_name1": {
      "terms": {
        "field": "age",
        "size": 10 ,				// the default is already 10 buckets
        "order": {
          "_key": "desc"
        }
      }
    },
     "agg_name2": {
      "terms": {
        "field": "name.keyword"
      }
    }
  }
}

Result:

{
******
  "aggregations" : {
    "agg_name2" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "jack",
          "doc_count" : 1
        },
        {
          "key" : "张三",
          "doc_count" : 1
        },
        {
          "key" : "李四",
          "doc_count" : 1
        },
        {
          "key" : "田七",
          "doc_count" : 1
        },
        {
          "key" : "赵六",
          "doc_count" : 1
        }
      ]
    },
    "agg_name1" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 18,
          "doc_count" : 2
        },
        {
          "key" : 19,
          "doc_count" : 1
        },
        {
          "key" : 20,
          "doc_count" : 1
        },
        {
          "key" : 22,
          "doc_count" : 1
        }
      ]
    }
  }
}

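Aggregations use the doc_values structure, a forward (column-oriented) index. text fields do not build doc_values by default, while keyword fields do. To aggregate on text data, either use the ".keyword" sub-field or set fielddata to true on the text field.

When updating a mapping, only a few parameters can be changed; everything else, including the analyzer and the data type, must be restated exactly as it was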
POST test/_mapping
{
  "properties":{
    "des" :{
      "type":"text",
      "analyzer": "ik_smart", 
      "fielddata":true
    }
  }
}

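However, aggregating on the analyzed terms of long text is usually not meaningful, and fielddata uses a lot of memory, so enabling it is not recommended

Metric aggregations

Avg: average

Max: maximum

Min: minimum

Sum: sum

Cardinality: distinct count

Value Count: number of values

Stats: statistics (count, min, max, avg, sum)

Top Hits: top matching documents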

GET test/_search
{
  "size": 0,
  "aggs": {
    "agg_name1": {
      "max": {
        "field": "age"
      }
    },
    "agg_name2": {
      "min": {
        "field": "age"
      }
    },
    "agg_name3": {
      "avg": {
        "field": "age"
      }
    }
  }
}

Maximum, minimum and average:

{
******
  "aggregations" : {
    "agg_name2" : {
      "value" : 18.0
    },
    "agg_name1" : {
      "value" : 22.0
    },
    "agg_name3" : {
      "value" : 19.4
    }
  }
}

stats returns the same values in a single aggregation

GET test/_search
{
  "size": 0,
  "aggs": {
    "agg_name1": {
      "stats": {
        "field": "age"
      }
    }
  }
}

{
 ******
  "aggregations" : {
    "agg_name1" : {
      "count" : 5,
      "min" : 18.0,
      "max" : 22.0,
      "avg" : 19.4,
      "sum" : 97.0
    }
  }
}

cardinality counts distinct values, like deduplicating and then counting (the count is approximate on large data sets)

GET test/_search
{
  "size": 0,
  "aggs": {
    "agg_name1": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}

After deduplication there are 4 distinct values
{
  "aggregations" : {
    "agg_name1" : {
      "value" : 4
    }
  }
}
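
The numbers in the responses above can be checked by hand. The sketch below assumes the five age values implied by the terms buckets shown earlier (18 twice, then 19, 20 and 22) and recomputes stats and cardinality in plain Python.

```python
# Ages implied by the terms aggregation buckets: 18 x2, 19, 20, 22.
ages = [18, 18, 19, 20, 22]

stats = {
    "count": len(ages),
    "min": float(min(ages)),
    "max": float(max(ages)),
    "avg": sum(ages) / len(ages),
    "sum": float(sum(ages)),
}
cardinality = len(set(ages))   # exact distinct count; Elasticsearch approximates it

print(stats)
print(cardinality)
```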

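Pipeline aggregations

Aggregate a second time over the results of other aggregations

buckets_path

# The name with the lowest average age

GET test/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "test01": {
          "avg": {
            "field": "age"
          }
        }
      }
    },
    "test_bucket": {
      "min_bucket": {
        "buckets_path": "test>test01"    // path to the metric inside the terms aggregation; min_bucket must be a sibling of "test"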
      }
    }
  }
}

{
******
  "aggregations" : {
    "test" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "张三",
          "doc_count" : 2,
          "test01" : {
            "value" : 19.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 2,
          "test01" : {
            "value" : 17.5
          }
        },
        {
          "key" : "田七",
          "doc_count" : 1,
          "test01" : {
            "value" : 19.0
          }
        },
        {
          "key" : "赵六",
          "doc_count" : 1,
          "test01" : {
            "value" : 22.0
          }
        }
      ]
    },
    "test_bucket" : {
      "value" : 17.5,
      "keys" : [
        "李四"
      ]
    }
  }
}
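
What min_bucket computes over the buckets above can be reproduced in a few lines. The bucket list is copied from the response; the key name `avg_age` is just a label chosen for this sketch.

```python
# Per-name average ages, copied from the "test" buckets in the response above.
buckets = [
    {"key": "张三", "avg_age": 19.0},
    {"key": "李四", "avg_age": 17.5},
    {"key": "田七", "avg_age": 19.0},
    {"key": "赵六", "avg_age": 22.0},
]

lowest = min(b["avg_age"] for b in buckets)
# Several buckets can share the minimum, hence a list of keys.
min_bucket = {
    "value": lowest,
    "keys": [b["key"] for b in buckets if b["avg_age"] == lowest],
}
print(min_bucket)
```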

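Nested aggregations

Similar to a subquery in SQL

GET test/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "type.keyword",
        "size": 10
      },
      "aggs": {							// the sub-aggregation runs on each bucket of the parent, so aggs sits at the same level as terms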
        "test01": {
          "avg": {
            "field": "age"
          }
        },
        "test02": {
          "stats": {
            "field": "date"
          }
        }
      }
    }
  }
}

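Queries and aggregations

Aggregation based on a query:

The query runs first and the aggregation runs on its results, regardless of the order in the request body (it is not the case that whichever of query or aggs is written first is executed first)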

GET test/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 50
      }
    }
  }, 
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      }
    }
  }
}

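Filtering after aggregation (post_filter):

post_filter is applied to the hits after the aggregations have been calculated, so the aggregation still covers every document matched by the query and only the returned hits are narrowed. (With "size": 0 no hits are returned, so use a non-zero size to see the effect.)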

GET test/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      }
    }
  },
  "post_filter": {
    "range": {
      "age": {
        "gte": 25,
        "lte": 35
      }
    }
  }
}
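
The difference between filtering in query and filtering in post_filter can be illustrated with a small in-memory model. This is only a sketch under assumptions: `search` is a hypothetical function, the terms aggregation is modelled with a `Counter`, and the three sample documents are invented.

```python
from collections import Counter

# Hypothetical sample documents.
docs = [
    {"name": "a", "age": 18},
    {"name": "a", "age": 30},
    {"name": "b", "age": 28},
]

def in_range(doc):
    return 25 <= doc["age"] <= 35

def search(docs, query=None, post_filter=None):
    """query narrows hits and aggregations; post_filter narrows hits only."""
    matched = [d for d in docs if query is None or query(d)]
    aggs = dict(Counter(d["name"] for d in matched))       # terms agg on name
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, aggs

hits_q, aggs_q = search(docs, query=in_range)
hits_pf, aggs_pf = search(docs, post_filter=in_range)
print(aggs_q)    # counts only documents inside the range
print(aggs_pf)   # counts all documents; only the hits are filtered
```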

Sorting aggregations

order sorts the buckets inside an aggregation:

_count: by document count
_key: by key (alphabetical order for strings)

GET /test/_search
{
  "size": 0, 
  "aggs": {
    "test_name": {
      "terms": {
        "field": "name.keyword",
        "size": 10,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

To order by a sub-aggregation, reference it by name; if there is another level, chain the names with >


GET /test/_search
{
  "size": 0,
  "aggs": {
    "test_name": {
      "terms": {
        "field": "name.keyword",
        "order": {
          "agg_state.min": "asc"
        }
      },
      "aggs": {
        "agg_state": {
          "stats": {
            "field": "age"
          }
        }
      }
    }
  }
}

GET /test/_search
{
  "size": 0,
  "aggs": {
    "test_name": {
      "terms": {
        "field": "name.keyword",
        "order": {
          "agg_state>agg_state2.max": "desc"
        }
      },
      "aggs": {
        "agg_state": {
          "filter": {
           "range": {
             "age": {
               "gte": 10,
               "lte": 20
             }
           }
          }, 
          "aggs": {
            "agg_state2": {
              "stats": {
                "field": "age"
              }
            }
          }
        }
      }
    }
  }
}

agg_state>agg_state2.max

Use > between aggregation levels and . to select a metric inside the same aggregation

{
          "key" : "田七",
          "doc_count" : 1,
          "agg_state" : {
            "doc_count" : 1,
            "agg_state2" : {
              "count" : 1,
              "min" : 19.0,
              "max" : 19.0,
              "avg" : 19.0,
              "sum" : 19.0
            }
          }
        },

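Common aggregation functions

Histogram: counts per fixed-size interval. The earlier approaches can achieve this too, but histogram is simpler

GET test/_search
{
  "size": 0, 
  "aggs": {
    "NAME": {
      "histogram": {
        "field": "age",				// bucket on the age field
        "interval": 3,				// width of each bucket
        "keyed": false	,			// whether to return the buckets as key-value pairs
        "min_doc_count": 0	,		// minimum doc_count for a bucket to be returned; the default is already 0
        "missing": 20				// value to use for documents where the field is missing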
      }
    }
  }
}
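
Each value is assigned to the bucket whose key is floor((value - offset) / interval) * interval + offset. A minimal sketch of that rule, assuming the five sample ages used earlier in the article (18, 18, 19, 20, 22); `histogram` is a hypothetical helper and, unlike min_doc_count 0 in Elasticsearch, it does not fill in empty buckets.

```python
import math
from collections import Counter

def histogram(values, interval, offset=0):
    """Count values per bucket; the bucket key is the lower bound of the interval."""
    counts = Counter(
        math.floor((v - offset) / interval) * interval + offset for v in values
    )
    return {float(key): counts[key] for key in sorted(counts)}

ages = [18, 18, 19, 20, 22]   # assumed sample data
buckets = histogram(ages, interval=3)
print(buckets)   # keys are 18.0 (ages 18 to 20) and 21.0 (ages 21 to 23)
```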

date_histogram: a histogram over dates

auto_date_histogram chooses the interval automatically from a target number of buckets

GET test/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "date_histogram": {
        "field": "date",
        "fixed_interval": "10h",			// width of each bucket
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {				// first and last bucket to generate
          "min": "2022-02-21",
          "max": "2022-06-21"
        }
      },
      "aggs": {								// date_histogram splits time (x axis); this sub-aggregation computes the value per bucket (y axis)
        "NAME": {
          "max": {
            "field": "age"
          }
        }
      }
    }
  }
}

GET /test/_search
{
  "size": 0, 
  "aggs": {
    "test_aggs": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "to": 20			// no from: everything below 20 (to is exclusive)
          },
          {
            "from":20,
            "to": 30
          },{
            "from":30			// no to: 30 and above (from is inclusive)
          }
        ]
      }
    }
  }
}
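
In a range aggregation, from is inclusive and to is exclusive, so a value of exactly 20 falls into the second bucket, not the first. A small sketch of that rule, where `range_buckets` is a hypothetical helper and the ages are the sample values assumed from earlier in the article:

```python
def range_buckets(values, ranges):
    """Count values per range: from is inclusive, to is exclusive, either may be absent."""
    result = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        result.append({**r, "doc_count": count})
    return result

ages = [18, 18, 19, 20, 22]   # assumed sample data
buckets = range_buckets(ages, [{"to": 20}, {"from": 20, "to": 30}, {"from": 30}])
print(buckets)
```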

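Percentile statistics: the results are approximations rather than exact values, although on small data sets they are usually very close to exact

percentiles: given percentages, returns the values at those percentiles
percentile_ranks: given values, returns the percentage of documents below each value (below 10, below 15, ...)

GET test/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "percentile_ranks": {				// percentage of documents whose age is below each listed value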
        "field": "age",
        "values": [
          10,
          15,
          20,
          25
        ]
      }
    }
  }
}

{
 
  "aggregations" : {
    "test" : {
      "values" : {
        "10.0" : 0.0,
        "15.0" : 7.142857142857142,
        "20.0" : 47.61904761904762,
        "25.0" : 100.0
      }
    }
  }
}

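Scripting

CRUD with scripting

Painless has been the default scripting language since Elasticsearch 5.0

Its syntax is similar to Java, so it is easy to learn
Highly flexible and programmable
Faster than the other scripting languages, but slower than the plain DSL, so it is not a good fit for simple cases the DSL already covers
More secure than the other scripting languages

ctx._source is the document source in the update context; any field of the document can be read or changed through it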

GET test/_doc/4


POST test/_update/4
{
  "script": {
    "source": "ctx._source.age+=10"				// adds 10 to the current age
  }
}

{
  "_index" : "test",
  "_id" : "4",
  "_version" : 3,
  "_seq_no" : 9,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "age" : 11,
    "date" : *,
    "des" : *,
    "name" : *,
    "type" : *,
    "tags" : [
      *,
      *
    ]
  }
}

Back up the data of an index

POST _reindex
{
  "source": {
    "index": "test"
  }
  , "dest": {
    "index": "test_bak"
  }
}

Modify data with a script

POST test_bak/_update/4
{
  "script": {
    "source": "ctx._source.tags.add('2233')"
  }
}

Delete a document

POST test_bak/_update/4
{
  "script": {
    "source": "ctx.op='delete'"
  }
}

If the document exists, the script is applied to it; if it does not exist, the upsert body is indexed as a new document

POST test_bak/_update/4
{
  "script": {
    "source": "ctx._source.tags.add('2233')"
  },
  "upsert": {
    "age": "22",
    "date": "2022-04-21T15:42:14.000Z",
    "des": "《共产党宣言》。",
    "name": "12345",
    "type": "学生",
    "tags": [
      "睡觉",
      "打游戏",
      "找工作"
    ]
  }
}

In a search, script_fields can compute values from the indexed fields, for example a discounted price

GET test/_search
{
  "script_fields": {
    "FIELD": {
      "script": {
        "source": "doc['age'].value+10"
      }
    }
  }
}

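Parameterized scripts

The approach above does produce the desired result, but a script with hard-coded values has to be compiled again whenever a value changes. Passing the values through params keeps the script source identical, so the compiled script is cached and reused, which is more efficient.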

POST test_bak/_update/4
{
  "script": {
    "lang": "painless", 
    "source": "ctx._source.tags.add(params.tag1)",
    "params": {
      "tag1":"123456",
      "tag2":"123"
    }
  }
}
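
One way to keep the script source constant on the client side is to build the request body from a fixed source string and put only the changing value into params. The sketch below just constructs the JSON body and sends nothing; `SCRIPT_SOURCE`, `add_tag_request` and the parameter name `tag` are names chosen for this example.

```python
import json

# The source never changes, so the compiled script can be reused.
SCRIPT_SOURCE = "ctx._source.tags.add(params.tag)"

def add_tag_request(tag):
    """Build the body for POST <index>/_update/<id>; only params varies."""
    return {
        "script": {
            "lang": "painless",
            "source": SCRIPT_SOURCE,
            "params": {"tag": tag},
        }
    }

first = add_tag_request("123456")
second = add_tag_request("123")
print(json.dumps(first, indent=2))
```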

The script returns an array, so a single request yields the original value together with the values at 80%, 70% and 60%

GET test/_search
{
  "script_fields": {
    "before": {
      "script": {
        "source": "doc['age'].value"
      }
    },
    "after": {
      "script": {
        "source": "[doc['age'].value*params.in1,doc['age'].value*params.in2,doc['age'].value*params.in3]",
        "params": {
          "in1": 0.8,
          "in2": 0.7,
          "in3": 0.6
        }
      }
    }
  }
}

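Elasticsearch can store scripts in the cluster so that they can be reused, which saves a lot of work

/_scripts/{id}

A stored script is referenced by its id

script_id is the id of the stored script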

POST  _scripts/script_id
{
  "script": {
    "lang":"painless",
    "source": "doc.age.value * params.in1"
  }
}


GET  test/_search
{
  "script_fields": {
    "test": {
      "script": {
        "lang": "painless",
        "source": "doc['age'].value"
      }
    },
    "tt":{
       "script": {
        "id" :"script_id",
        "params": {
          "in1": 0.8 
        }
      }
    }
  }
}

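Functions in scripts

Complex logic often needs longer code. A pair of triple quotes (""" ... """, six " characters in total) delimits a multi-line script in the console and plays a role similar to {} in Java; inside it you can define a function, that is, a method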