Elasticsearch: Getting Started (CRUD)

  • Basic concepts

1 Node and Cluster

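Elasticsearch is essentially a distributed database that lets multiple servers work together, and each server can run multiple Elasticsearch instances. A single instance is called a node. A group of nodes forms a cluster.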

Check the health of the current cluster:

GET _cluster/health

{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 1,
  "active_shards": 1,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50
}
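
The response above reports a yellow status with one active shard and one unassigned shard. A minimal Python sketch of how those numbers relate, assuming the percentage is simply active / (active + initializing + unassigned); the helper names are hypothetical and the dictionary is a trimmed copy of the response:

```python
# Trimmed copy of the health response shown above.
health = {
    "status": "yellow",
    "number_of_nodes": 1,
    "active_shards": 1,
    "initializing_shards": 0,
    "unassigned_shards": 1,
}

# Meaning of the three status colours.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but at least one replica is not",
    "red": "at least one primary shard is not allocated",
}

def active_percent(h):
    """Share of shards that are active, as a percentage (simplified)."""
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    return 100.0 * h["active_shards"] / total if total else 100.0

print(health["status"], "=", STATUS_MEANING[health["status"]])
print(active_percent(health))
```

On a single-node cluster a replica has no second node to live on, which is the usual reason for a yellow status in a setup like this one.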

 

2 Index

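Elasticsearch indexes every field and, after processing, writes it to an inverted index. Searches look up this index directly. For that reason the top-level unit of data management in Elasticsearch is called an Index. It is roughly the equivalent of a single database. The name of every Index (that is, database) must be lowercase.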
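
To make the idea of an inverted index concrete, here is a toy Python sketch. It assumes whitespace tokenization and lowercase terms, which is far simpler than what Elasticsearch does; the function name and sample documents are made up for illustration:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents: id -> text
docs = {1: "database management", 2: "database engineer", 3: "sales management"}
index = build_inverted_index(docs)

# A search for a term is a dictionary lookup, not a scan over all documents.
print(index["database"])
print(index["management"])
```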

  • List all indices on the current node:
GET _cat/indices?v

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana BVGnJPzQSVCDgVY0vtbMmw   1   1          1            0      3.1kb          3.1kb
  • Create an Index
PUT weather
Response:
{
  "acknowledged": true, // the operation succeeded
  "shards_acknowledged": true
}

Specify settings and mappings when creating the index:

PUT test_index
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 1
  }, 
  "mappings": {
    "test_type":{
      "properties": {
        "name":{
          "type": "text"
        }
      }
    }
  }
}
Response:
{
  "acknowledged": true,
  "shards_acknowledged": true
}
  • Modify an index
PUT test_index/_settings
{
  "number_of_replicas": 1
}

number_of_shards cannot be changed after the index has been created.
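
One way to see why the shard count is fixed: a document is routed to a shard by hashing its routing value (by default the _id) modulo the number of primary shards. The Python sketch below is an approximation under stated assumptions: zlib.crc32 stands in for the real hash function, and route is a hypothetical helper, not an Elasticsearch API.

```python
import zlib

def route(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard for a document id (crc32 stands in for the real hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

ids = [str(i) for i in range(1, 21)]

# If the shard count changed from 5 to 6, many ids would map to a different
# shard, so documents already stored could no longer be found where expected.
moved = [i for i in ids if route(i, 5) != route(i, 6)]
print(f"{len(moved)} of {len(ids)} documents would map to a different shard")
```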

  • Delete an Index
DELETE weather

{
  "acknowledged": true
}

GET _cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana BVGnJPzQSVCDgVY0vtbMmw   1   1          1            0      3.1kb          3.1kb

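Notes:

DELETE _all: deletes all indices (if action.destructive_requires_name: true is set in elasticsearch.yml, indices can only be deleted by explicit name, so this form can no longer be used to delete everything)

DELETE index1,index2: deletes index1 and index2

DELETE index*: deletes every index whose name starts with "index"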
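
Before running a destructive request with a list or wildcard, it can help to preview which names the expression would select. The Python sketch below only imitates the matching on the client side with fnmatch; the resolve helper and the list of existing indices are assumptions for illustration, and the server's own resolution rules are authoritative.

```python
from fnmatch import fnmatchcase

def resolve(expression, existing):
    """Return the existing index names selected by a comma-separated expression."""
    patterns = expression.split(",")
    return [name for name in existing
            if any(fnmatchcase(name, p) for p in patterns)]

# Hypothetical set of indices on the cluster.
existing = ["index1", "index2", "weather", ".kibana"]

print(resolve("index*", existing))          # names starting with "index"
print(resolve("index1,weather", existing))  # an explicit list
```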

3 Type

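Documents can be grouped. In the weather Index, for example, they could be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). Such a group is called a Type. It is a virtual, logical grouping used to filter Documents.

Different Types should have similar structures (schemas). For example, the id field cannot be a string in one group and a number in another. This is one difference from tables in a relational database. Data of a completely different nature (such as products and logs) should be stored in two Indices rather than as two Types in one Index (although the latter is possible).

Elasticsearch 6.x allows only one Type per Index; 7.x deprecates Types, and they are removed entirely in 8.x.

List the Types contained in each Index: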

GET _mapping/?pretty=true
Response:
{
  ".kibana": {
    "mappings": {
      "server": {
        "properties": {
          "uuid": {
            "type": "keyword"
          }
        }
      },
      "index-pattern": {
        "properties": {
          "fieldFormatMap": {
            "type": "text"
          },
          "fields": {
            "type": "text"
          },
          "intervalName": {
            "type": "text"
          },
          "notExpandable": {
            "type": "boolean"
          },
          "sourceFilters": {
            "type": "text"
          },
          "timeFieldName": {
            "type": "text"
          },
          "title": {
            "type": "text"
          }
        }
      },
      "config": {
        "properties": {
          "buildNum": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

 

4 Document

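A single record in an Index is called a Document. Many Documents make up an Index. Documents are represented as JSON. Documents in the same Index are not required to share the same structure (schema), but keeping them consistent helps search efficiency.

  • Create a document / full update (replace the document): specify _id 

Full update: the document's _id stays the same, but _version is incremented by 1, result changes from created to updated, and the created field becomes false.

A full update is a read-modify-write cycle: the old document is fetched, the old copy is marked as deleted, and the new document is indexed in its place. The time between fetching the old data and Elasticsearch completing the write is not bounded, which makes concurrent conflicts more likely, so this approach is not recommended for updating documents.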

PUT accounts/person/1
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}
Response:
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
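
The concurrency risk of full replacement can be illustrated with a toy in-memory store. This is a simplified Python sketch, not how Elasticsearch is implemented; get, put, and the store dictionary are hypothetical stand-ins for the GET and PUT requests.

```python
# Toy store: id -> {"_version": int, "_source": dict}
store = {"1": {"_version": 1, "_source": {"user": "张三", "title": "工程师"}}}

def get(doc_id):
    """Return a copy of the stored source, like a GET by id."""
    return dict(store[doc_id]["_source"])

def put(doc_id, source):
    """Full replacement: overwrite the whole source and bump the version."""
    version = store[doc_id]["_version"] + 1 if doc_id in store else 1
    store[doc_id] = {"_version": version, "_source": source}
    return version

# Two clients read the same document, then each writes back a whole document.
a = get("1")
b = get("1")
a["title"] = "architect"
b["user"] = "李四"
put("1", a)
final_version = put("1", b)  # overwrites client A's change to title

print(final_version, store["1"]["_source"])
```

The second write wins and the first client's change is silently lost, which is the kind of conflict the paragraph above warns about.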
  • Create a document: without specifying _id
POST accounts/person
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}
Response:
{
  "_index": "accounts",
  "_type": "person",
  "_id": "AXCz0GZWNekRbwyPL8i2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
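  • Retrieve a document

The HTTP specification does not define any semantics for a request body on GET, and some clients and proxies do not support one. Elasticsearch nevertheless considers GET a better semantic fit for reading data, so it accepts GET with a request body (most servers and browsers support this too). If it is not supported in your environment, change the GET request to a POST request.

GET accounts/person/1/?pretty=true // pretty=true returns the response in a human-readable format

Response: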
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 1,
  "found": true, // the document was found
  "_source": { // the original document
    "user": "张三",
    "title": "工程师",
    "desc": "数据库管理"
  }
}
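  • Delete a document

A delete does not physically remove the document right away. The document is first only marked as deleted (a logical delete); it is physically removed later, when Elasticsearch merges segments in the background.

How to verify:

  1. Create a document with id 1; its _version is 1
  2. Delete this document; its _version becomes 2
  3. Create a document with id 1 again; its _version is now 3
DELETE accounts/person/1
Response: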
{
  "found": true,
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 6,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

GET accounts/person/1
Response:
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "found": false
}
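
The verification steps above (version 1, then 2 after the delete, then 3 after re-creating) can be mimicked with a toy store that keeps a version counter even for deleted ids. This is only an illustrative Python model; ToyStore and its methods are hypothetical names.

```python
class ToyStore:
    """Keeps a version counter per id, even after a delete (a tombstone)."""

    def __init__(self):
        self.docs = {}      # id -> source of live documents
        self.versions = {}  # id -> last version, kept for deleted ids too

    def index(self, doc_id, source):
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        self.docs[doc_id] = source
        return self.versions[doc_id]

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)  # the document is no longer visible
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

store = ToyStore()
v1 = store.index("1", {"user": "张三"})
v2 = store.delete("1")
v3 = store.index("1", {"user": "张三"})
print(v1, v2, v3)  # prints: 1 2 3
```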
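  • Update a document / partial update

A partial update does not need the full JSON document; only the fields being changed have to be sent. _version is incremented by 1 and result becomes updated.

Advantages:

  1. The read, the modification, and the write-back all happen inside Elasticsearch, which saves network transfer and improves performance
  2. The interval between the read and the write is shorter, which effectively reduces concurrent conflicts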
POST /accounts/person/1/_update
{
  "doc": {
    "user": "张三san",
    "title": "工程师",
    "desc": "数据库管理"
  }
}
Response:
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

GET accounts/person/1
Response:
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "user": "张三san",
    "title": "工程师",
    "desc": "数据库管理"
  }
}
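
Conceptually, a partial update merges the fields under "doc" into the existing source. A small Python sketch of such a merge, assuming nested objects are merged recursively and all other values are overwritten; the merge function is a hypothetical illustration, not the server's code:

```python
def merge(source, partial):
    """Recursively merge `partial` into a copy of `source`."""
    result = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # merge nested objects
        else:
            result[key] = value                      # overwrite or add the field
    return result

current = {"user": "张三", "title": "工程师", "desc": "数据库管理"}
updated = merge(current, {"user": "张三san"})
print(updated)
```

Only the user field had to be sent; title and desc are carried over from the existing document.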
  • List all documents
GET accounts/person/_search
{
  "query": {
    "match_all": {}
  }
}
Or:
GET accounts/person/_search
Response:
{
  "took": 1, // time taken, in milliseconds
  "timed_out": false, // whether the search timed out
  "_shards": { // shards involved in the search
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": { // matching documents
    "total": 2, // total number of matching documents
    "max_score": 1, // highest relevance score
    "hits": [ // array of returned documents
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AXCz0GZWNekRbwyPL8i2",
        "_score": 1, // relevance score; results are sorted by this field in descending order by default
        "_source": {
          "user": "张三",
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "张三san",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
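
Once a response like the one above has been parsed into a dictionary, the documents sit under hits.hits. A short Python sketch of pulling them out; the response literal is a trimmed copy of the output above and rows is a hypothetical helper:

```python
# Trimmed copy of the search response shown above.
response = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "AXCz0GZWNekRbwyPL8i2", "_score": 1, "_source": {"user": "张三"}},
            {"_id": "1", "_score": 1, "_source": {"user": "张三san"}},
        ],
    },
}

def rows(resp):
    """Flatten a search response into (id, score, source) tuples."""
    return [(h["_id"], h["_score"], h["_source"]) for h in resp["hits"]["hits"]]

for doc_id, score, source in rows(response):
    print(doc_id, score, source["user"])
```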
 

 

5 Full-text search

The input string is analyzed into terms, and each term is looked up in the inverted index. A document is returned as long as it matches any one of the terms.

GET accounts/person/_search
{
  "from": 0, // offset of the first hit to return
  "size": 1, // number of hits to return
  "query": {
    "match": {
      "user": "san"
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25316024,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 0.25316024,
        "_source": {
          "user": "张三san",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
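
The from and size parameters used above map naturally onto page-based navigation. A minimal Python sketch that builds the same request body from a 1-based page number; search_body and the notion of a "page" are assumptions added for illustration:

```python
def search_body(field, text, page=1, size=10):
    """Build a match query body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "from": (page - 1) * size,  # number of hits to skip
        "size": size,               # number of hits to return
        "query": {"match": {field: text}},
    }

# Reproduces the body of the request above.
print(search_body("user", "san", page=1, size=1))
```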

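6 Phrase search

The specified field must contain the input as one contiguous phrase, with the same terms in the same order, for a document to match and be returned. This is the opposite of full-text search, where matching any single term is enough.

Phrase search:
GET accounts/person/_search
{
  "query": {
    "match_phrase": {
      "desc": "数据管理"
    }
  }
}
Response: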
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

With full-text search:
GET accounts/person/_search
{
  "query": {
    "match": {
      "desc": "数据管理"
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.1299736,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "5",
        "_score": 1.1299736,
        "_source": {
          "user": "lisi",
          "age": 50,
          "salary": 6000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.1299736,
        "_source": {
          "user": "zeng",
          "age": 30,
          "salary": 20000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "3",
        "_score": 1.1299736,
        "_source": {
          "user": "jim",
          "age": 35,
          "salary": 17000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "2",
        "_score": 0.71613276,
        "_source": {
          "user": "sam",
          "age": 20,
          "salary": 15000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "4",
        "_score": 0.71613276,
        "_source": {
          "user": "zhangsan",
          "age": 50,
          "salary": 5000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
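
The difference between the two results above can be reproduced with a toy matcher. The Python sketch below assumes the field is analyzed one character at a time, which is how the default analyzer is generally understood to treat Chinese text; tokens, match_any, and match_phrase are hypothetical helpers rather than Elasticsearch code.

```python
def tokens(text):
    """One token per non-space character (an assumption for Chinese text)."""
    return [ch for ch in text if not ch.isspace()]

def match_any(field_text, query):
    """Full-text match: at least one query token occurs in the field."""
    field = set(tokens(field_text))
    return any(t in field for t in tokens(query))

def match_phrase(field_text, query):
    """Phrase match: all query tokens occur adjacently and in order."""
    f, q = tokens(field_text), tokens(query)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))

print(match_any("数据库管理", "数据管理"))     # True: shares several characters
print(match_phrase("数据库管理", "数据管理"))  # False: "库" breaks the sequence
print(match_phrase("数据库管理", "数据库"))    # True: contiguous and in order
```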

7 Boolean search

Existing records:
GET accounts/person/_search
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AXCz0GZWNekRbwyPL8i2",
        "_score": 1,
        "_source": {
          "user": "张三",
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user": "王五 ",
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "张三李四",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}

AND search:
GET accounts/person/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "user": "四"
          }
        },
        {
          "match": {
            "user": "三"
          }
        }
      ],
      "must_not": [
        {"match": {
          "user": "五"
        }}
      ]
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "user": "张三李四",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}

OR search:
GET accounts/person/_search
{
  "query": {
    "match": {
      "user": "张三 李四"
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.1507283,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.1507283,
        "_source": {
          "user": "张三李四",
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AXCz0GZWNekRbwyPL8i2",
        "_score": 0.51623213,
        "_source": {
          "user": "张三",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
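
The AND and OR results above can be checked against the three sample users with a toy evaluator. This Python sketch assumes single-character tokens and ignores scoring entirely; matches and bool_search are hypothetical helpers.

```python
def tokens(text):
    """Set of non-space characters, standing in for analyzed terms."""
    return {ch for ch in text if not ch.isspace()}

def matches(text, query):
    """True if the text shares at least one token with the query."""
    return bool(tokens(text) & tokens(query))

def bool_search(users, must=(), must_not=()):
    """Keep users that match every `must` clause and no `must_not` clause."""
    return [u for u in users
            if all(matches(u, q) for q in must)
            and not any(matches(u, q) for q in must_not)]

users = ["张三", "王五 ", "张三李四"]

# AND: must contain "四" and "三", must not contain "五".
print(bool_search(users, must=["四", "三"], must_not=["五"]))
# OR: a single match query succeeds if any of its tokens is found.
print([u for u in users if matches(u, "张三 李四")])
```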

8 Highlighted search

GET accounts/person/_search
{
  "query": {
    "match": {
      "user": "sam"
    }
  },
  "highlight": {
    "pre_tags": [ // tag inserted before each highlighted term
        "<a class='highlightClass' href='#'>"
      ],
    "post_tags": [ // tag inserted after each highlighted term
        "</a>"
      ], 
    "fields": { // fields to highlight
      "user":{},
      "desc":{}
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "user": "sam",
          "age": 20,
          "salary": 15000,
          "title": "工程师",
          "desc": "数据库管理"
        },
        "highlight": {
          "user": [
            "<em>sam</em>"
          ]
        }
      }
    ]
  }
}
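
What highlighting does to a field value can be sketched in a few lines of Python: every occurrence of the matched term is wrapped in a pre tag and a post tag, with <em> and </em> as the defaults. The highlight function below is a hypothetical illustration that works on plain substrings rather than analyzed terms.

```python
import re

def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap each case-insensitive occurrence of `term` in the given tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)

print(highlight("sam", "sam"))                        # <em>sam</em>
print(highlight("Sam and sam", "sam", "<b>", "</b>"))  # custom tags
```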

9 Combined example

Existing data:
GET accounts/person/_search
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "5",
        "_score": 1,
        "_source": {
          "user": "lisi",
          "age": 50,
          "salary": 6000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user": "sam",
          "age": 20,
          "salary": 15000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "4",
        "_score": 1,
        "_source": {
          "user": "zhangsan",
          "age": 50,
          "salary": 5000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "zeng",
          "age": 30,
          "salary": 20000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "3",
        "_score": 1,
        "_source": {
          "user": "jim",
          "age": 35,
          "salary": 17000,
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}

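1 Find records whose title contains the character "工" and whose salary is between 16000 and 20000, sort them by age in descending order, and return only the age, title and salary fields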
GET accounts/person/_search
{
  "query": {
    "bool": {
      "must": [ // search conditions
        {"match": {
          "title": "工"
        }}
      ],
      "filter": { // filter conditions
        "range": {
          "salary": {
            "gte": 16000,
            "lte": 20000
          }
        }
      }
    }
  }, 
  "sort": [ // sort order
    {
      "age": "desc"
    }
  ], 
  "_source": ["age","title","salary"] // fields to return
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "工程师",
          "age": 35
        },
        "sort": [
          35
        ]
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "工程师",
          "age": 30
        },
        "sort": [
          30
        ]
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "工程师",
          "age": 20
        },
        "sort": [
          20
        ]
      }
    ]
  }
}

 

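Note: a search can specify a timeout. Whatever hits have been collected when the time elapses are returned, for example: GET _search?timeout=10ms. The unit ms means milliseconds, s means seconds, and m means minutes.

  • Difference between filter and query

  1. filter: only decides whether a document matches the condition. It does not compute or affect the relevance score, and the results of frequently used filters are cached automatically, so it generally performs better
  2. query: computes a relevance score for every document against the search condition and sorts by that score; the results are not cached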

 

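  • Query types

  1. match: full-text matching. The input is analyzed first and the resulting terms are searched; a document is returned if it contains any part of the match condition
  2. term: exact-value lookup for structured fields. It matches a single value, and the input is not analyzed
  3. match_phrase: phrase matching. It looks for the exact phrase; if the field has an analyzer defined, the input is analyzed with it, and the resulting terms must appear in the same order
  4. and so on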
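
The practical difference between term and match shows up when the input differs in case from what was indexed. The Python sketch below assumes a simple analyzer that lowercases and splits on non-word characters; analyze, term_query, and match_query are hypothetical helpers that only imitate the behaviour described in the list above.

```python
import re

def analyze(text):
    """Lowercase and split on non-word characters (an assumed analyzer)."""
    return re.findall(r"\w+", text.lower())

def term_query(indexed_tokens, value):
    """term: compare the raw input against indexed tokens, with no analysis."""
    return value in indexed_tokens

def match_query(indexed_tokens, text):
    """match: analyze the input first, then accept any matching token."""
    return any(t in indexed_tokens for t in analyze(text))

indexed = analyze("Quick Brown Fox")  # tokens stored for a text field

print(term_query(indexed, "Quick"))       # False: "Quick" was indexed as "quick"
print(term_query(indexed, "quick"))       # True: exact token
print(match_query(indexed, "Quick dog"))  # True: "quick" matches after analysis
```

This is why term is usually paired with keyword or numeric fields, where the stored value is not analyzed.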

  • constant_score: filter documents without relevance scoring

GET /accounts/person/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "age": 50
        }
      }
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "5",
        "_score": 1,
        "_source": {
          "user": "lisi",
          "age": 50,
          "salary": 6000,
          "title": "业务员",
          "desc": "数据库管理",
          "tag": "girl"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "9",
        "_score": 1,
        "_source": {
          "user": "lisi",
          "age": 50,
          "salary": 6000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "6",
        "_score": 1,
        "_source": {
          "user": "lisi",
          "age": 50,
          "salary": 6000,
          "title": "业务员",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
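
Every hit in the response above has the same _score of 1, because constant_score assigns a fixed score (its boost, 1.0 by default) to each document that passes the filter. A toy Python sketch of that behaviour, with hypothetical sample documents and a hypothetical constant_score helper:

```python
def constant_score(docs, predicate, boost=1.0):
    """Return (doc, score) pairs where every match gets the same score."""
    return [(doc, boost) for doc in docs if predicate(doc)]

# Hypothetical sample documents.
docs = [
    {"user": "lisi", "age": 50},
    {"user": "sam", "age": 20},
    {"user": "zhangsan", "age": 50},
]

hits = constant_score(docs, lambda d: d["age"] == 50)
for doc, score in hits:
    print(doc["user"], score)
```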

 
