Getting Started with Elasticsearch in One Article

This article is hosted at https://github.com/zgkaii/CS-Notes-Kz; corrections and feedback are welcome!

I. Elasticsearch Concepts

1. Comparing Elasticsearch and MySQL concepts

Elasticsearch            MySQL
Index                    Database
Type                     Table
Document                 Row
Field                    Column
Mapping                  Schema
Everything is indexed    Index
GET http://…             select * from …
POST http://…            update table set …

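2. Inverted index

Elasticsearch uses a structure called an inverted index, which is designed for fast full-text search. An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents that contain it.

How an inverted index works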
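The idea can be illustrated with a few lines of Python. This is only a minimal sketch with whitespace tokenisation, not how Lucene stores its index; the function names and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenisation; a real analyzer does much more.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every term of the query (AND semantics)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick dogs jump",
}
inverted = build_inverted_index(docs)
```

Looking up a term is a single dictionary access, and a multi-term query is an intersection of the document lists, which is why this layout suits full-text search.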

II. Installing Elasticsearch

Installing Elasticsearch in Docker
(1) Pull the Elasticsearch and Kibana images

docker pull elasticsearch:7.6.2
docker pull kibana:7.6.2

(2) Configuration

mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /mydata/elasticsearch/

(3) Start Elasticsearch

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.6.2

Set Elasticsearch to start automatically on boot

docker update elasticsearch --restart=always

(4) Start Kibana:

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.6.2

Set Kibana to start automatically on boot

docker update kibana  --restart=always

(5) Test
Check the Elasticsearch version information: http://192.168.56.10:9200/

{
  "name": "d30f21ec35fc",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "OidCLBVNSP2su8UiNJl9oA",
  "version": {
    "number": "7.6.2",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date": "2020-03-26T06:34:37.794943Z",
    "build_snapshot": false,
    "lucene_version": "8.4.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

List the Elasticsearch nodes: http://192.168.56.10:9200/_cat/nodes

127.0.0.1 16 86 8 0.00 0.12 0.24 dilm * d30f21ec35fc

Open Kibana: http://192.168.56.10:5601/app/kibana

Kibana

III. Basic Retrieval

1. _cat

(1) GET /_cat/nodes: list all nodes
For example: http://192.168.56.10:9200/_cat/nodes

127.0.0.1 15 88 0 0.06 0.03 0.11 dilm * d30f21ec35fc

Note: * marks the master node of the cluster

(2) GET /_cat/health: check the health of the cluster
For example: http://192.168.56.10:9200/_cat/health

1599568858 12:40:58 elasticsearch green 1 1 3 3 0 0 0 0 - 100.0%

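Note: green means the cluster is healthy (all primary and replica shards are allocated)

(3) GET /_cat/master: show the master node
For example: http://192.168.56.10:9200/_cat/master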

QeMO9rMQSj2UjNvuxaNQqg 127.0.0.1 127.0.0.1 d30f21ec35fc

(4) GET /_cat/indices: list all indices, the equivalent of show databases; in MySQL
For example: http://192.168.56.10:9200/_cat/indices

green open .kibana_task_manager_1   wldcIEtbT3uQfH_copflYw 1 0 2 1 26.8kb 26.8kb
green open .apm-agent-configuration g5iOaK05QrSSmenBQtqupg 1 0 0 0   283b   283b
green open .kibana_1                2kGHFyKBSCmS8lU1loIqVg 1 0 6 0 22.6kb 22.6kb
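The _cat output is plain text, so it is easy to post-process. The sketch below parses the lines shown above into dictionaries; the column names are an assumption based on the default column order of _cat/indices in Elasticsearch 7.x (appending ?v to a _cat URL prints the header row, which is the way to confirm them on a given version). The function name is hypothetical.

```python
# Assumed default column order of GET /_cat/indices in Elasticsearch 7.x.
INDICES_COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices(text):
    """Turn the plain-text _cat/indices output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(INDICES_COLUMNS):
            continue  # skip blank or unexpected lines rather than guess
        rows.append(dict(zip(INDICES_COLUMNS, fields)))
    return rows


sample = """\
green open .kibana_task_manager_1   wldcIEtbT3uQfH_copflYw 1 0 2 1 26.8kb 26.8kb
green open .apm-agent-configuration g5iOaK05QrSSmenBQtqupg 1 0 0 0   283b   283b
green open .kibana_1                2kGHFyKBSCmS8lU1loIqVg 1 0 6 0 22.6kb 22.6kb
"""
indices = parse_cat_indices(sample)
# Names of indices whose health is anything other than green.
not_green = [row["index"] for row in indices if row["health"] != "green"]
```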

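2. Indexing a document

To save a document, specify which index and which type it belongs to, and which unique identifier to use.
PUT customer/external/1 saves document 1 under the external type of the customer index, with the following body:

{
  "name": "John Doe"
}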

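Both PUT and POST can be used.
POST creates a document. If no id is given, one is generated automatically. If an id is given and a document with that id already exists, that document is overwritten and its version number is incremented.
PUT can create or modify a document, but it must specify an id; omitting the id is an error. Because the id is mandatory, PUT is generally used for modifications.
The test below was run in Postman:

When the document is created successfully, the response status is 201 Created.

{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}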

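Meaning of the returned JSON: the fields that begin with an underscore are metadata describing the document.
"_index": "customer" is the index the document is stored in (the counterpart of a database);
"_type": "external" is the type the document is stored under;
"_id": "1" is the id of the saved document;
"_version": 1 is the version of the saved document;
"result": "created" shows that a new document was created. If the same document is PUT again, the result becomes updated and the version number increases.
Next, the POST method:
When a document is added without an id, an id is generated automatically and the result is created:

Sending the same POST again still creates a new document:

When a document is added with an id, that id is used and the result is created:

Sending the same POST again gives the result updated: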
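The rule "PUT needs an id, POST may omit it" can be captured in a small helper. This is a sketch using only the Python standard library; it builds the request without sending it, so it runs without a cluster. The helper name index_request is hypothetical, and BASE_URL is the host used in this article.

```python
import json
from urllib.request import Request

BASE_URL = "http://192.168.56.10:9200"  # host used throughout this article


def index_request(index, doc_type, doc, doc_id=None):
    """Build (but do not send) an indexing request.

    With an id the request is a PUT to /index/type/id; without one it is a
    POST to /index/type so that Elasticsearch generates the id.
    """
    path = f"/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        path += f"/{doc_id}"
        method = "PUT"
    return Request(
        BASE_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = index_request("customer", "external", {"name": "John Doe"}, doc_id=1)
without_id = index_request("customer", "external", {"name": "John Doe"})
# To actually send one: urllib.request.urlopen(with_id) against a running node.
```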

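3. Retrieving a document

GET /customer/external/2
http://192.168.56.10:9200/customer/external/2

{
    "_index": "customer",   // which index the document is in
    "_type": "external",    // which type it is under
    "_id": "2",             // document id
    "_version": 2,          // version number
    "_seq_no": 4,           // concurrency-control field, incremented on every write; used for optimistic locking
    "_primary_term": 1,     // also used for concurrency control; changes when the primary shard is reassigned, e.g. after a restart
    "found": true,
    "_source": {
        // the actual document content
        "name": "John Doe"
    }
}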

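Appending "if_seq_no=1&if_primary_term=1" to a write request makes it conditional: the document is modified only when its current sequence number and primary term match the given values; otherwise it is left unchanged.
Example: update the document with id=2 to name=1, then update it again to name=2. Initially _seq_no=4 and _primary_term=1.
(1) Update name to 1, using seq_no=4
http://192.168.56.10:9200/customer/external/2?if_seq_no=4&if_primary_term=1

(2) Update name to 2, using seq_no=6
http://192.168.56.10:9200/customer/external/2?if_seq_no=6&if_primary_term=1

The update fails with a version conflict (HTTP 409), because after step (1) the document's _seq_no is 5, not 6.

(3) Update name to 2, using seq_no=5
http://192.168.56.10:9200/customer/external/2?if_seq_no=5&if_primary_term=1

The update succeeds.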
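The check behind if_seq_no and if_primary_term is ordinary optimistic locking. The toy class below reproduces the three steps in memory; it is not Elasticsearch code, all names are hypothetical, and it simplifies by advancing the sequence number by exactly one per write (in a real cluster the counter is shared by all documents of a shard).

```python
class VersionConflict(Exception):
    """Raised when the caller's expected sequence number is stale."""


class Document:
    """Toy stand-in for one stored document; not Elasticsearch code."""

    def __init__(self, source, seq_no=0, primary_term=1):
        self.source = source
        self.seq_no = seq_no
        self.primary_term = primary_term

    def write(self, source, if_seq_no, if_primary_term):
        # Apply the write only if the caller saw the latest state.
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            raise VersionConflict(
                f"expected seq_no={if_seq_no}, current seq_no={self.seq_no}"
            )
        self.source = source
        self.seq_no += 1  # simplification: one step per successful write
        return self.seq_no


doc = Document({"name": "John Doe"}, seq_no=4, primary_term=1)
doc.write({"name": "1"}, if_seq_no=4, if_primary_term=1)      # accepted
try:
    doc.write({"name": "2"}, if_seq_no=6, if_primary_term=1)  # does not match
    conflict = False
except VersionConflict:
    conflict = True
doc.write({"name": "2"}, if_seq_no=5, if_primary_term=1)      # accepted
```

A client that receives the conflict would normally re-read the document, take the fresh _seq_no and _primary_term, and retry.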

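4. Updating a document

(1) Update with POST and _update
http://192.168.56.10:9200/customer/external/2/_update

If the same update is executed again, nothing is done and the sequence number does not change:

{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 5,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  },
  "_seq_no": 8,
  "_primary_term": 1
}

An update through _update compares the new data with the stored document; if they are identical, nothing is done and neither _version nor _seq_no changes.
(2) Update with POST, without _update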

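{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 6,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 9,
  "_primary_term": 1
}

Without _update, repeating the same request still succeeds as an update every time: the new data is not compared with the stored document, and _version and _seq_no increase.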
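The difference between the two styles can be modelled in a few lines. In the _update style the changed fields are sent wrapped in a "doc" object and merged into the stored document; in the other style the whole document is replaced. The class below is a toy in-memory model with hypothetical names, and its merge is shallow, which is a simplification.

```python
class StoredDoc:
    """Toy model of one document's _source, _version and _seq_no."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1
        self.seq_no = 0

    def _bump(self):
        self.version += 1
        self.seq_no += 1

    def partial_update(self, doc):
        """Mimic POST .../_update with a {"doc": ...} body: merge, skip if unchanged."""
        merged = {**self.source, **doc}
        if merged == self.source:
            return "noop"
        self.source = merged
        self._bump()
        return "updated"

    def reindex(self, source):
        """Mimic POST/PUT without _update: replace the whole document, no comparison."""
        self.source = dict(source)
        self._bump()
        return "updated"


stored = StoredDoc({"name": "John Doe"})
first = stored.partial_update({"name": "Jane Doe"})
second = stored.partial_update({"name": "Jane Doe"})   # same data again
version_after_noop = stored.version
third = stored.reindex({"name": "Jane Doe"})           # same data, full replace
```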

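5. Deleting a document or an index

DELETE customer/external/1
DELETE customer

Note: Elasticsearch provides no operation for deleting a type; only indices and documents can be deleted.
Example: delete the document with id=1, then query it again

Example: delete the customer index
All indices before the deletion: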

green  open .kibana_task_manager_1   wldcIEtbT3uQfH_copflYw 1 0 2 1 26.8kb 26.8kb
green  open .apm-agent-configuration g5iOaK05QrSSmenBQtqupg 1 0 0 0   283b   283b
green  open .kibana_1                2kGHFyKBSCmS8lU1loIqVg 1 0 6 0 22.6kb 22.6kb
yellow open customer                 KjaEuF2-TpaqKWeidOgJUA 1 1 4 1 17.6kb 17.6kb

Delete the customer index:

{
  "acknowledged": true
}

All indices after the deletion:

green open .kibana_task_manager_1   wldcIEtbT3uQfH_copflYw 1 0 2 1 26.8kb 26.8kb
green open .apm-agent-configuration g5iOaK05QrSSmenBQtqupg 1 0 0 0   283b   283b
green open .kibana_1                2kGHFyKBSCmS8lU1loIqVg 1 0 6 0 22.6kb 22.6kb

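6. Bulk operations in Elasticsearch: _bulk

Syntax:

{action:{metadata}}\n
{request body  }\n
{action:{metadata}}\n
{request body  }\n

The bulk API executes all actions in the order given. The actions are independent of one another: if a single action fails for any reason, the remaining actions are still processed. When the bulk API returns, it reports the status of each action in the same order as the actions were sent, so you can check whether a particular action failed.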
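Because the body is newline-delimited JSON rather than a single JSON document, it is convenient to generate it. The helper below is a minimal sketch with a hypothetical name; it only builds the string and sends nothing. Such a body is typically sent with the header Content-Type: application/x-ndjson.

```python
import json


def build_bulk_body(operations):
    """Serialise (action, metadata, source) triples into a _bulk request body.

    source is None for actions that take no body line, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "1"}, {"name": "John Doe"}),
    ("index", {"_id": "2"}, {"name": "John Doe"}),
    ("delete", {"_id": "3"}, None),
])
```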

Example 1: index several documents

POST customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

Result:

Example 2: bulk operations with no index in the URL (each action names its own index)

POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

Result:

{
  "took" : 289,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "thn6bXQBrGHPZfvxjEyh",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
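Since every item carries its own status, a client usually walks the items list instead of relying only on the HTTP status of the whole request. The sketch below works on an already parsed response; the function names are hypothetical and the sample is a trimmed copy of the response above. It treats an item as failed when its detail contains an "error" object.

```python
def summarise_bulk(response):
    """Return (action, id, status, result) for each item, in request order."""
    rows = []
    for item in response["items"]:
        # Each item is a single-key dict: {"index": {...}}, {"delete": {...}}, ...
        (action, detail), = item.items()
        rows.append((action, detail["_id"], detail["status"], detail.get("result")))
    return rows


def failed_items(response):
    """Items whose detail carries an "error" object."""
    return [item for item in response["items"]
            if "error" in next(iter(item.values()))]


sample = {
    "took": 289,
    "errors": False,
    "items": [
        {"delete": {"_id": "123", "result": "not_found", "status": 404}},
        {"create": {"_id": "123", "result": "created", "status": 201}},
        {"update": {"_id": "123", "result": "updated", "status": 200}},
    ],
}
rows = summarise_bulk(sample)
```

In the response above the delete item reports status 404 with result not_found while the top-level errors flag is still false, so inspecting the individual items gives more detail than the flag alone.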

7. Sample test data

A sample set of fictitious JSON documents describing customer bank accounts is used below. Every document has the following schema.

{
  "account_number": 1,
  "balance": 39225,
  "firstname": "Amber",
  "lastname": "Duke",
  "age": 32,
  "gender": "M",
  "address": "880 Holmes Lane",
  "employer": "Pyrami",
  "email": "amberduke@pyrami.com",
  "city": "Brogan",
  "state": "IL"
}

Import the official test data from https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json with:
POST bank/account/_bulk

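IV. Advanced Retrieval

1. Search API

Elasticsearch supports two basic ways of searching:

  • sending the search parameters in the REST request URI (URI + search parameters);
  • sending them in the REST request body (URI + request body).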
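The two forms can be sketched side by side as follows. The snippet only builds the request path and body and sends nothing; the helper names are hypothetical, and bank is the sample index used in this article.

```python
import json
from urllib.parse import urlencode

SEARCH_PATH = "/bank/_search"  # the bank index is the sample index of this article


def uri_search(query_string, sort):
    """Form 1: everything goes into the query string."""
    return SEARCH_PATH + "?" + urlencode({"q": query_string, "sort": sort})


def body_search(sort_fields):
    """Form 2: a match_all search with its sort order as a JSON request body."""
    return json.dumps({
        "query": {"match_all": {}},
        "sort": [{field: order} for field, order in sort_fields],
    })


uri_form = uri_search("*", "account_number:asc")
body_form = body_search([("account_number", "asc"), ("balance", "desc")])
```

The URI form is handy for quick checks from a browser, while the body form scales better as queries grow, since nested clauses are awkward to express in a query string.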

Searching for information
![](https://img-blog.csdnimg.cn/img_convert/88d8002eea63e29db53e236583f146af.png)
![](https://img-blog.csdnimg.cn/img_convert/2768e71f9406c5d0690bd7c6620d120f.png)
![](https://img-blog.csdnimg.cn/img_convert/78a132f50f8d94798a5f9c1a168c9cc9.png)
Searching with URI + request body

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ]
}

Some HTTP client tools cannot send a request body with a GET request. A search that uses URI parameters looks like this:

GET bank/_search?q=*&sort=account_number:asc

Response:

{
  "took" : 46,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "account_number" : 2,
          "balance" : 28838,
          "firstname" : "Roberta",
          "lastname" : "Bender",
          "age" : 22,
          "gender" : "F",
          "address" : "560 Kingsway Place",
          "employer" : "Chillium",
          "email" : "robertabender@chillium.com",
          "city" : "Bennett",
          "state" : "LA"
        },
        "sort" : [
          2
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "account_number" : 3,
          "balance" : 44947,
          "firstname" : "Levine",
          "lastname" : "Burks",
          "age" : 26,
          "gender" : "F",
          "address" : "328 Wilson Avenue",
          "employer" : "Amtap",
          "email" : "levineburks@amtap.com",
          "city" : "Cochranville",
          "state" : "HI"
        },
        "sort" : [
          3
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "account_number" : 4,
          "balance" : 27658,
          "firstname" : "Rodriquez",
          "lastname" : "Flores",
          "age" : 31,
          "gender" : "F",
          "address" : "986 Wyckoff Avenue",
          "employer" : "Tourmania",
          "email" : "rodriquezflores@tourmania.com",
          "city" : "Eastvale",
          "state" : "HI"
        },
        "sort" : [
          4
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "account_number" : 5,
          "balance" : 29342,
          "firstname" : "Leola",
          "lastname" : "Stewart",
          "age" : 30,
          "gender" : "F",
          "address" : "311 Elm Place",
          "employer" : "Diginetic",
          "email" : "leolastewart@diginetic.com",
          "city" : "Fairview",
          "state" : "NJ"
        },
        "sort" : [
          5
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "7",
        "_score" : null,
        "_source" : {
          "account_number" : 7,
          "balance" : 39121,
          "firstname" : "Levy",
          "lastname" : "Richard",
          "age" : 22,
          "gender" : "M",
          "address" : "820 Logan Street",
          "employer" : "Teraprene",
          "email" : "levyrichard@teraprene.com",
          "city" : "Shrewsbury",
          "state" : "MO"
        },
        "sort" : [
          7
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "8",
        "_score" : null,
        "_source" : {
          "account_number" : 8,
          "balance" : 48868,
          "firstname" : "Jan",
          "lastname" : "Burns",
          "age" : 35,
          "gender" : "M",
          "address" : "699 Visitation Place",
          "employer" : "Glasstep",
          "email" : "janburns@glasstep.com",
          "city" : "Wakulla",
          "state" : "AZ"
        },
        "sort" : [
          8
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "9",
        "_score" : null,
        "_source" : {
          "account_number" : 9,
          "balance" : 24776,
          "firstname" : "Opal",
          "lastname" : "Meadows",
          "age" : 39,
          "gender" : "M",
          "address" : "963 Neptune Avenue",
          "employer" : "Cedward",
          "email" : "opalmeadows@cedward.com",
          "city" : "Olney",
          "state" : "OH"
        },
        "sort" : [
          9
        ]
      }
    ]
  }
}

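(1) Only 10 documents are returned although hits.total.value is 1000, because results are paginated and the default page size is 10 (the from and size parameters select other pages);
(2) For details of the fields, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-search.html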

The response also provides the following information about the search request:

  • took – how long it took Elasticsearch to run the query, in milliseconds
  • timed_out – whether or not the search request timed out
  • _shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped.
  • max_score – the score of the most relevant document found
  • hits.total.value - how many matching documents were found
  • hits.sort - the document’s sort position (when not sorting by relevance score)
  • hits._score - the document’s relevance score (not applicable when using match_all)

2. Query DSL
