一、Introduction
Elasticsearch is an open-source, distributed search engine built on Lucene. It dramatically lowers the barrier to storing, searching, and analyzing petabyte-scale data. Its main features are:
- Distributed, real-time document storage, search, and analysis
- Zero configuration with automatic cluster discovery
- Automatic index sharding and a replica mechanism
- A RESTful API
- Support for multiple data sources
- Automatic search load balancing
二、Analysis (Tokenization)
Three analyzers are commonly used for tokenization (standard ships with Elasticsearch; ik_smart and ik_max_word are provided by the IK analysis plugin):
(1). standard: splits the text into individual Chinese characters or English words
GET _analyze
{
  "analyzer": "standard",
  "text": "中华人民共和国"
}
GET _analyze
{
  "analyzer": "standard",
  "text": "I love es and kibana"
}
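For reference, standard has no Chinese dictionary, so the first request is split into single characters. An abbreviated sketch of the expected response (the offsets and the <IDEOGRAPHIC> type come from the standard tokenizer and may differ slightly across versions):
{
  "tokens": [
    { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
    { "token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
    ...
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 }
  ]
}
The English request is simply split on whitespace and lowercased into i, love, es, and, kibana.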
(2). ik_smart: coarse-grained segmentation for Chinese text, splitting it according to the IK dictionary
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我喜欢吃苹果"
}
Result:
{
  "tokens": [
    { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
    { "token": "喜欢吃", "start_offset": 1, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "苹果", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 2 }
  ]
}
(3). ik_max_word: fine-grained segmentation for Chinese text, emitting every dictionary word it finds, including overlapping ones
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我喜欢吃苹果"
}
Result:
{
  "tokens": [
    { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
    { "token": "喜欢吃", "start_offset": 1, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "喜欢", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 2 },
    { "token": "吃", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 3 },
    { "token": "苹果", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 4 }
  ]
}
三、Creating an Index
1. Create an index by name only, with no settings; the defaults apply (5 primary shards and 1 replica in Elasticsearch 6.x; 7.x lowered the default to 1 primary shard)
PUT stu
2. Shards and replicas can be set explicitly
PUT student
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
This creates the student index with 3 primary shards and 2 replicas (for all replicas to be assigned, the cluster needs at least number_of_replicas + 1 data nodes).
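To double-check that the settings were applied, they can be read back (a quick verification sketch):
GET student/_settings
The response lists number_of_shards and number_of_replicas under student.settings.index.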
3. The mapping type and field names can also be defined
PUT stu
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name":   { "type": "keyword" },
        "age":    { "type": "keyword" },
        "gender": { "type": "keyword" }
      }
    }
  }
}
Here the index name is stu, the mapping type is _doc, and the fields name, age, and gender are all of type keyword. (If stu was already created in step 1, delete it first with DELETE stu, since PUT against an existing index fails.)
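The resulting mapping can be verified the same way (a quick check, not required):
GET stu/_mapping
which returns the _doc type with the three keyword fields defined above.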
四、Bulk Operations with the _bulk API
1. Create
POST /_bulk
{"create":{"_index":"stu","_type":"_doc","_id":"2"}}
{"name":"xxx","age":"23","gender":"男"}
{"create":{"_index":"stu","_type":"_doc","_id":"3"}}
{"name":"yyy","age":"25","gender":"女"}
2. Update
POST /_bulk
{"update":{"_index":"stu","_type":"_doc","_id":"1"}}
{"doc":{"age":"25"}}
{"update":{"_index":"stu","_type":"_doc","_id":"2"}}
{"doc":{"age":"25"}}
3. Delete
POST /_bulk
{"delete":{"_index":"stu","_type":"_doc","_id":"1"}}
{"delete":{"_index":"stu","_type":"_doc","_id":"2"}}
五、Queries
1. mget
(1) Fetch several documents from one index by id; multiple documents are returned
GET stu/_mget
{
  "ids": [ "1", "2" ]
}
(2) Fetch documents from different indices by specifying each one explicitly
GET /_mget
{
  "docs": [
    { "_index": "stu", "_id": "1" },
    { "_index": "abc", "_id": "2" }
  ]
}
2. match
(1) match: the query text is analyzed, and a document is returned if the field contains at least one of the resulting terms (loose, full-text style matching)
GET stu/_search
{
  "query": {
    "match": {
      "name": "john"
    }
  }
}
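By default match combines the analyzed terms with OR; if every term must be present, the operator parameter can be added (a sketch against the same stu index):
GET stu/_search
{
  "query": {
    "match": {
      "name": {
        "query": "John kery",
        "operator": "and"
      }
    }
  }
}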
(2) match_phrase: phrase matching; all terms must appear in the field, in the same order and in adjacent positions
GET stu/_search
{
  "query": {
    "match_phrase": {
      "name": "John kery"
    }
  }
}
(3) match_phrase_prefix: like match_phrase, but the last term is treated as a prefix
GET stu/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "John ke"
    }
  }
}
(4) multi_match: query several fields at once; documents containing john in name or cooking in interest are all returned
GET stu/_search
{
  "query": {
    "multi_match": {
      "query": "john likes cooking",
      "fields": ["name", "interest"]
    }
  }
}
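Individual fields can also be weighted with the ^ boost syntax so that, for example, a hit in name counts more than a hit in interest (a sketch):
GET stu/_search
{
  "query": {
    "multi_match": {
      "query": "john likes cooking",
      "fields": ["name^2", "interest"]
    }
  }
}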
3. term
(1) term: exact matching against the terms stored in the inverted index (terms produced by the standard analyzer are lowercased, so a capitalized query such as John would not match)
GET def/stu/_search
{
  "query": {
    "term": {
      "name": "john"
    }
  }
}
(2) terms: match any of several exact terms
GET def/stu/_search
{
  "query": {
    "terms": {
      "name": ["john", "da"]
    }
  }
}
4. range
Matches documents whose field value falls within the given bounds, e.g. a gpa between 3.0 and 4.0 inclusive:
GET def/stu/_search
{
  "query": {
    "range": {
      "gpa": {
        "gte": 3.0,
        "lte": 4.0
      }
    }
  }
}
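range also works on date fields; assuming a hypothetical date field named enrolled, the bounds and their format can be given explicitly (a sketch):
GET def/stu/_search
{
  "query": {
    "range": {
      "enrolled": {
        "gte": "2018-01-01",
        "lte": "2018-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}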
5. bool queries
Combines clauses: must clauses are required, and must_not clauses exclude matching documents.
GET def/stu/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "interest": "cooking" }
      },
      "must_not": {
        "range": { "yearOfBorn": { "gte": 1995, "lte": 2000 } }
      }
    }
  }
}
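Besides must and must_not, bool also accepts should (optional clauses that raise the score) and filter (required but not scored); a sketch combining them on the same fields:
GET def/stu/_search
{
  "query": {
    "bool": {
      "must":   { "match": { "interest": "cooking" } },
      "filter": { "range": { "gpa": { "gte": 3.0 } } },
      "should": { "match": { "name": "john" } }
    }
  }
}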
6. Other queries
(1) GET def/stu/1 --fetch the document with id 1 from index def, type stu
(2) GET def/stu/_search?q=interest:soccer --search for students whose interest is soccer
(3) GET def/stu/_search --return all documents
(4) GET def/stu/1/_source --fetch the document without its metadata
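Search results can also be paged and sorted by adding from, size, and sort to the request body (a sketch; sorting assumes age is a keyword or numeric field):
GET def/stu/_search
{
  "from": 0,
  "size": 10,
  "sort": [ { "age": "asc" } ],
  "query": { "match_all": {} }
}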