ElasticSearch part2

H0_0P

Article link: https://blog.csdn.net/H0_0P/article/details/85252912

Installing ES

Download the package and unzip it (it works out of the box).
On Windows, run bin\elasticsearch.bat from cmd.
To check that it is running, open http://localhost:9200/?pretty in a browser:

{
	"name" : "PuFTRWU",  // node name
	"cluster_name" : "elasticsearch",  // cluster name (defaults to elasticsearch)
	"cluster_uuid" : "Fl4hcPzAS9-tyy9TgFPAKw",
	"version" : {  // ES version
		"number" : "6.5.2",
		"build_flavor" : "default",
		"build_type" : "zip",
		"build_hash" : "9434bed",
		"build_date" : "2018-11-29T23:58:20.891072Z",
		"build_snapshot" : false,
		"lucene_version" : "7.5.0",
		"minimum_wire_compatibility_version" : "5.6.0",
		"minimum_index_compatibility_version" : "5.0.0"
	},
	"tagline" : "You Know, for Search"
}

ES settings are changed in config/elasticsearch.yml.

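Kibana

Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch.

It serves as a graphical interface for Elasticsearch.

Download the package and unzip it.
Run bin/kibana.bat.
Open http://localhost:5601/ in a browser.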

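Creating an index

The mapping defines the type of each field and is itself stored as key: value pairs. Its purpose is search: at search time, ES searches each field according to that field's type.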

PUT /index
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    },
    "mappings": {
        // defines the data structure
        "my_type": {
            "properties": {
                "field": {
                    "type": "text"    // e.g. text, keyword, integer, date
                }
            }
        }
    }
}

Simple cluster management operations

Quickly check the cluster's health

GET _cat/health?v

epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1544716861 16:01:01 elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%

status:

green: every primary shard and every replica shard of every index is active

yellow: every primary shard is active, but some replica shards are unavailable

red: some primary shards are unavailable
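The three colors boil down to a simple rule over shard states. Below is a minimal Python sketch of that rule, assuming each shard is modeled as a hypothetical `(is_primary, is_active)` tuple; it is not how Elasticsearch computes health internally, only the same decision spelled out.

```python
# Hypothetical model: each shard is an (is_primary, is_active) tuple.
def cluster_status(shards):
    """Return 'green', 'yellow' or 'red' for a list of (is_primary, is_active)."""
    if any(primary and not active for primary, active in shards):
        return "red"      # at least one primary shard is unavailable
    if any(not active for _, active in shards):
        return "yellow"   # all primaries are active, some replicas are not
    return "green"        # every primary and replica shard is active


# A 5-shard, 1-replica index on a single node: replicas cannot be allocated
single_node = [(True, True)] * 5 + [(False, False)] * 5
print(cluster_status(single_node))  # yellow
```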

Quickly list the indices in the cluster

GET _cat/indices?v

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 XDUVeEKpTX6ylkSX6zDNJA 1 0 3 0 11.9kb 11.9kb

Simple index operations

PUT

DELETE

PUT test_index?pretty   // create an index named test_index

GET _cat/indices   // list the indices
// result (test_index is yellow: on a single node its replica shards cannot be allocated)
yellow open test_index CiUD-eEMTpmZRfo5R9WcuQ 5 1 0 0  1.1kb  1.1kb
green  open .kibana_1  XDUVeEKpTX6ylkSX6zDNJA 1 0 3 0 11.9kb 11.9kb

DELETE test_index?pretty  // delete the index

GET _cat/indices
// result
green open .kibana_1 XDUVeEKpTX6ylkSX6zDNJA 1 0 3 0 11.9kb 11.9kb

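Document operations

Adding a document

With a specified id

PUT /index/type/id      // index/type/document id
{
    "field" : value
}

While adding a document, ES automatically creates the index and type if they do not exist yet.

Letting ES generate the id

POST /index/type
{
    "field" : value
}

"_id" : "wl3OvGcBxixzDxFAyNjD"

The auto-generated id is 20 characters long and comes from a GUID-style algorithm, so collisions are practically impossible.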

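Retrieving a document

GET /index/type/id

Updating a document

Full replacement (PUT with an id that already exists)

Same request as adding a document: use PUT and send every field, otherwise the fields you leave out are lost.

Partial update

POST /index/type/id/_update
{
    "doc" : {
        "field" : value
    }
}

Both work similarly under the hood: the old document is marked as deleted and a new document is written.

With a partial update, fetching, modifying and writing back the document all happen inside one shard, which avoids network round trips and improves performance; the short time between reading and writing also narrows the window for conflicts. (In short: compared with fetching the document, changing it on the client and PUTting it back, POST _update is faster!)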
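What the two requests do to the stored data can be shown with plain dictionaries. This is a minimal Python sketch with hypothetical helper names `full_replace` and `partial_update`; it models only the resulting document, not the shard-level mechanics, and merges top-level fields only.

```python
# Hypothetical helpers modeling the result of each update style.
def full_replace(stored: dict, new_body: dict) -> dict:
    """PUT /index/type/id: the request body becomes the entire document (stored is ignored)."""
    return dict(new_body)


def partial_update(stored: dict, doc: dict) -> dict:
    """POST /index/type/id/_update {"doc": ...}: merge the given fields into the stored document."""
    merged = dict(stored)
    merged.update(doc)  # top-level fields only in this sketch
    return merged


stored = {"studentName": "huping", "studentAge": 20}
print(full_replace(stored, {"studentAge": 21}))    # {'studentAge': 21}  (studentName is gone)
print(partial_update(stored, {"studentAge": 18}))  # {'studentName': 'huping', 'studentAge': 18}
```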

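Deleting a document

DELETE /index/type/id?pretty

The document is not physically deleted right away; it is marked as deleted and removed later, when segments are merged.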

CRUD test

command 1    add a document
PUT /user/student/1
{
  "studentName": "huping",
  "studentAge": 20
}

result 1
{
  "_index" : "user",
  "_type" : "student",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

command 2    get the document
GET /user/student/1

result 2
{
  "_index" : "user",
  "_type" : "student",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "studentName" : "huping",
    "studentAge" : 20
  }
}

command 3    update the document   !!! include every field, otherwise data is lost
PUT /user/student/1
{
  "studentName": "huping",
  "studentAge": 21
}

result 3    the version becomes 2 and result is "updated"
{
  "_index" : "user",
  "_type" : "student",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

command 4    partial update
POST /user/student/1/_update
{
  "doc" : {
    "studentAge": 18    // I don't care, forever 18, hehe
  }
}

result 4    (the document as returned by GET /user/student/1 after the update; the version is now 3)
{
  "_index" : "user",
  "_type" : "student",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "studentName" : "huping",
    "studentAge" : 18
  }
}

command 5    delete the document
DELETE /user/student/1

command 6    get it again
GET /user/student/1

result 6
{
  "_index" : "user",
  "_type" : "student",
  "_id" : "1",
  "found" : false
}

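Search RESTful API (_search)

query string search

Search all documents

GET /user/student/_search

Search with a condition (q=)

GET /user/student/_search?q=studentName:huping&sort=studentAge:desc

_all field search: search every field of each document for the string huping (_all is deprecated in 6.x; without a field name the query runs against all eligible fields)
GET /user/student/_search?q=huping

query DSL (Domain Specific Language)

Roughly the equivalent of an SQL statement.

https://blog.csdn.net/jiaminbao/article/details/80105636 (an excellent write-up)

http request body
query and filter
term (used on non-analyzed fields, exact match) vs match
combining multiple search conditions (bool)

must (must match, similar to = in SQL), must_not (must not match, similar to !=), should (not required to match, similar to OR), filter (must match, but does not affect the score)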
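How the four clause types combine can be captured in a few lines. This is a minimal Python sketch of the matching rules only (scores are ignored), assuming each clause is modeled as a hypothetical predicate over a document dict. It follows the documented default that, when a bool query has no must or filter clause, at least one should clause has to match.

```python
from typing import Callable, Dict, Iterable

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]  # hypothetical stand-in for a query clause


def bool_matches(doc: Doc,
                 must: Iterable[Clause] = (),
                 must_not: Iterable[Clause] = (),
                 should: Iterable[Clause] = (),
                 filter_: Iterable[Clause] = ()) -> bool:
    """Return True if doc satisfies a bool query (matching only, no scoring)."""
    must, must_not, should, filter_ = map(list, (must, must_not, should, filter_))
    # must and filter: every clause has to match (filter just skips scoring)
    if not all(clause(doc) for clause in must + filter_):
        return False
    # must_not: no clause may match
    if any(clause(doc) for clause in must_not):
        return False
    # should: required (at least one) only when there is no must/filter clause
    if should and not must and not filter_:
        return any(clause(doc) for clause in should)
    return True


students = [
    {"studentName": "huping", "studentAge": 18},
    {"studentName": "qaq", "studentAge": 21},
]
name_is_huping = lambda d: d["studentName"] == "huping"   # stand-in for a match clause
age_10_to_20 = lambda d: 10 <= d["studentAge"] <= 20      # stand-in for a range filter

hits = [d for d in students if bool_matches(d, must=[name_is_huping], filter_=[age_10_to_20])]
print(hits)  # [{'studentName': 'huping', 'studentAge': 18}]
```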

Simple examples

Query all documents (match_all), return only the given field (_source), and paginate (from, size)

GET /user/student/_search
{
  "query": {
    "match_all": {}
  },
  "_source": "studentName", 
  "from": 0,
  "size": 2
}

Conditional query (match) with sorting (sort)

GET /user/student/_search
{
  "query": {
    "match": {
      "studentName": "huping"
    }
  },
  "sort": [
    {
      "studentAge": {
        "order": "desc"
      }
    }
  ]
}

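query filter

The difference between studentName.keyword and studentName

studentName.keyword: the whole field value must equal the input exactly (exact query)

studentName: the field is analyzed, so it matches if it contains any of the terms in the input (full-text query)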
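The difference comes from how the values are indexed: the `.keyword` sub-field stores the whole string as one term, while the text field stores analyzed terms. A minimal Python sketch, assuming lowercase-plus-whitespace splitting as a stand-in for the real standard analyzer; `analyze`, `keyword_match` and `text_match` are hypothetical names.

```python
def analyze(text: str) -> list:
    """Hypothetical stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def keyword_match(field_value: str, query: str) -> bool:
    # .keyword sub-field: the whole value is a single term, compared exactly
    return field_value == query


def text_match(field_value: str, query: str) -> bool:
    # text field with match (default operator OR): any shared term is a hit
    return bool(set(analyze(field_value)) & set(analyze(query)))


for name in ["huping", "qaq", "huping ya"]:
    print(name, keyword_match(name, "huping"), text_match(name, "huping"))
# huping True True
# qaq False False
# huping ya False True
```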

GET /user/student/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "studentName": "huping"
          }
        }
      ],
      "filter": {
        "range": {
          "studentAge": {
            "gte": 10,
            "lte": 20
          }
        }
      }
    }
  }
}

full-text search

GET /user/student/_search
{
  "query": {
    "match": {
      "studentName": "huping qaq"
    }
  }
}

result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "student",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "studentName" : "qaq",
          "studentAge" : 21
        }
      },
      {
        "_index" : "user",
        "_type" : "student",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "studentName" : "huping",
          "studentAge" : 18
        }
      },
      {
        "_index" : "user",
        "_type" : "student",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "studentName" : "huping ya",
          "studentAge" : 20
        }
      }
    ]
  }
}

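The input "huping qaq" is split into the terms "huping" and "qaq", and each term is searched for.

That is why huping, qaq and huping ya all match.

How well each hit matches is measured by its _score.

phrase search

match_phrase: the terms are not matched independently; all of them must appear next to each other and in the same order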

GET /user/student/_search
{
  "query": {
    "match_phrase": {
      "studentName": "huping qaq"
    }
  }
}
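Phrase matching can be pictured as looking for the query's term sequence inside the field's term sequence. A minimal Python sketch, using a hypothetical lowercase-plus-whitespace analyzer as a stand-in (the real match_phrase uses term positions stored in the inverted index and supports options such as slop, which are not modeled here).

```python
def analyze(text: str) -> list:
    """Hypothetical stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_match(field_value: str, phrase: str) -> bool:
    """True if the phrase's terms occur consecutively and in order in the field."""
    terms, wanted = analyze(field_value), analyze(phrase)
    n = len(wanted)
    return any(terms[i:i + n] == wanted for i in range(len(terms) - n + 1))


print(phrase_match("huping qaq", "huping qaq"))  # True
print(phrase_match("huping ya", "huping qaq"))   # False
print(phrase_match("qaq huping", "huping qaq"))  # False: wrong order
```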

highlight search

highlight

GET /user/student/_search
{
  "query": {
    "match": {
      "studentName": "huping qaq"
    }
  },
  "highlight": {
    "fields": {
      "studentName": {}
    }
  }
}

result (excerpt):
    "hits" : [
      {
        "_index" : "user",
        "_type" : "student",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "studentName" : "qaq",
          "studentAge" : 21
        },
        "highlight" : {
          "studentName" : [
            "<em>qaq</em>"
          ]
        }
      },

Aggregations

"aggs": {
  "NAME": {
    "AGG_TYPE": {}
  }
}

NAME is a name of your choice; AGG_TYPE can be avg, terms, range, etc.
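As an illustration, here is a minimal Python sketch that builds a request body with an avg and a terms aggregation over the sample students, and computes locally what those two aggregations would report. The aggregation names `avg_age` and `by_name` are hypothetical; terms runs on `studentName.keyword` because aggregating on an analyzed text field is disabled by default.

```python
import json
from collections import Counter

# Body for: GET /user/student/_search  (names "avg_age" and "by_name" are hypothetical)
body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "avg_age": {"avg": {"field": "studentAge"}},
        "by_name": {"terms": {"field": "studentName.keyword"}},
    },
}
print(json.dumps(body, indent=2))

# What the two aggregations compute, done locally on the sample documents
docs = [
    {"studentName": "qaq", "studentAge": 21},
    {"studentName": "huping", "studentAge": 18},
    {"studentName": "huping ya", "studentAge": 20},
]
avg_age = sum(d["studentAge"] for d in docs) / len(docs)  # avg aggregation
by_name = Counter(d["studentName"] for d in docs)         # terms: doc count per value
print(avg_age, by_name.most_common())
```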

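primary shard & replica shard

The number of primary shards is fixed when the index is created (the document routing formula depends on it); the number of replica shards can be changed afterwards.

A replica shard is a copy of a primary shard and is never placed on the same node as its primary; otherwise it could not provide fault tolerance.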

PUT /index_test
{
  "settings": {
    "number_of_shards": 5, 
    "number_of_replicas": 1
  }
}

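Document routing

An index is spread over several shards; deciding which shard a document is stored on is called routing.

Routing formula:
shard = hash(_routing) % number_of_primary_shards

Every create, read, update or delete request carries a routing value. It defaults to the document id and can be set manually (?routing=...).

A request can be sent to any node. The node that receives it forwards it to the right shard: writes go to the primary shard, while reads can be served by the primary or one of its replicas (all nodes are peers, which spreads the load).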
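A minimal Python sketch of the routing formula, which also shows why the primary shard count cannot change after creation: with a different modulus, many documents map to a different shard than the one they were stored on. Assumption: `zlib.crc32` stands in for the hash function (Elasticsearch actually uses Murmur3, and its real formula has extra details not modeled here); `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards (crc32 as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 21)]  # routing defaults to the document id
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

# Documents that would be looked up on the wrong shard if the count changed from 5 to 6
moved = [doc_id for doc_id in doc_ids if with_5[doc_id] != with_6[doc_id]]
print(with_5)
print(f"{len(moved)} of {len(doc_ids)} documents would move")
```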

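Document metadata

_index (index name: lowercase, cannot start with an underscore, cannot contain commas. Put similar data, i.e. data that shares most of its fields, into one index for better query performance)

_type (type name: upper or lower case, cannot start with an underscore, cannot contain commas. Data with clearly different fields should not share a type; note that an index created in 6.x can have only one type)

_id (/index/type/id uniquely identifies a document; the id can be set manually with PUT or generated by ES with POST)

_score (relevance to the search text; the higher, the more relevant)

_source (the document data; in a search you can use it to choose which fields are returned)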

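Elasticsearch concurrency control (optimistic locking)

Optimistic locking (uses the version number to tell whether the data has been modified in the meantime)
Pessimistic locking (exclusive X locks, shared S locks)

When a document is first created its _version is 1; every later PUT, POST or DELETE increments it by 1.

DELETE also increments _version, because the document is not physically removed right away (try it: DELETE a document and PUT it again shortly afterwards; the version keeps counting from the old value).

external version: instead of the internal _version, you can control versioning with your own version numbers

?version=1 (internal)

The given version must equal the current _version for the operation to succeed.

?version=1&version_type=external (external)

The given version must be greater than the current _version for the operation to succeed.

retry_on_conflict (parameter of a partial update: how many times to retry when a version conflict occurs)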
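The two version checks can be modeled in a few lines. This is a minimal Python sketch of the rules described above; `StoredDoc` and `VersionConflict` are hypothetical names, and the class is a toy model, not Elasticsearch's implementation.

```python
class VersionConflict(Exception):
    """Hypothetical: raised when the supplied version fails the check (ES answers HTTP 409)."""


class StoredDoc:
    """Hypothetical toy model of one document and its version number."""

    def __init__(self, source: dict):
        self.source = source
        self.version = 1  # the first write gets _version 1

    def write(self, source: dict, version=None, version_type="internal") -> int:
        if version_type == "external":
            if version is None:
                raise ValueError("external versioning needs an explicit version")
            # external: the supplied version must be greater than the stored one
            if version <= self.version:
                raise VersionConflict(f"{version} is not greater than {self.version}")
            self.version = version  # the caller's version is stored as-is
        else:
            # internal: if a version is supplied it must equal the stored one
            if version is not None and version != self.version:
                raise VersionConflict(f"expected {self.version}, got {version}")
            self.version += 1  # internal versioning just increments
        self.source = source
        return self.version


doc = StoredDoc({"studentAge": 20})
print(doc.write({"studentAge": 21}, version=1))                            # 2
print(doc.write({"studentAge": 18}, version=10, version_type="external"))  # 10
try:
    doc.write({"studentAge": 30}, version=2)  # stale internal version
except VersionConflict as err:
    print("conflict:", err)
```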

Batch operations

They reduce the number of network requests.

mget

GET /user/student/_mget
{
  "docs": [
    {
      "_id": "1"
    },
    {
      "_id": "2"
    }
  ]
}

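bulk

POST /_bulk
{"<op_type>" : {<metadata>}}
{<data>}
{"<op_type>" : {<metadata>}}
{<data>}

op_type: delete (only the metadata line, no data line), create (add a new document; fails if it already exists), index (add or fully replace, like PUT), update (partial update; the data line is {"doc": {...}})
metadata: {"_index": "", "_type": "", "_id": ""}
data: {"field": "value"}

Advantage: the operations can be forwarded to different shards, so no single node has to process all of them (performance!), and with a lot of data the request does not take up the memory of just one node.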
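Because each action and its data must sit on their own lines, the body is easiest to generate programmatically. A minimal Python sketch that builds a bulk body from hypothetical `(op_type, metadata, source)` tuples; the body is newline-delimited JSON and must end with a newline.

```python
import json


def bulk_body(actions) -> str:
    """Hypothetical helper. actions: (op_type, metadata, source) tuples; source is None for delete."""
    lines = []
    for op_type, metadata, source in actions:
        lines.append(json.dumps({op_type: metadata}))  # action and metadata line
        if op_type != "delete":                         # delete has no data line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"                      # the body must end with a newline


meta = {"_index": "user", "_type": "student"}
body = bulk_body([
    ("index", {**meta, "_id": "1"}, {"studentName": "huping", "studentAge": 18}),
    ("update", {**meta, "_id": "2"}, {"doc": {"studentAge": 22}}),
    ("delete", {**meta, "_id": "3"}, None),
])
print(body, end="")
```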

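Distributed consistency

Controlled by the consistency parameter (ES 2.x; it was removed in 5.0 and replaced by wait_for_active_shards)

one (the primary shard is active)

all (all shard copies are active)

quorum (default; a majority of the shard copies is active)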
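How big a "majority" is was defined by a formula in the ES 2.x documentation: int((primary + number_of_replicas) / 2) + 1. A minimal Python sketch of that formula with a hypothetical function name. The 2.x guide also notes that the quorum was only enforced when number_of_replicas was greater than 1, so a single node with one configured replica could still accept writes; that rule is not modeled here.

```python
def quorum(number_of_replicas: int) -> int:
    """Hypothetical helper: shard copies that must be active, int((primary + replicas) / 2) + 1."""
    primary = 1
    return (primary + number_of_replicas) // 2 + 1


for replicas in range(4):
    print(replicas, quorum(replicas))
# 0 1
# 1 2
# 2 2
# 3 3
```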

multi-index and multi-type

Searching across several indices or types

/_search
/index1,index2/_search
/_all/type1,type2/_search
/index*/_search   (wildcards match index names)

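exact value & full text

Exact search and full-text search

Exact search: the value is not split

Full-text search: the query is split into terms and each term is searched
When building the inverted index, besides splitting text into terms, ES also normalizes each term (lowercasing, synonyms, abbreviations, ...)

Analyzers

standard analyzer (default), simple analyzer, whitespace analyzer, language analyzer

Different analyzers handle special characters (hyphens -, parentheses (), ...) differently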
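Two of the built-in analyzers follow rules simple enough to mimic directly: the whitespace analyzer splits on whitespace and keeps case, while the simple analyzer splits on anything that is not a letter and lowercases. A minimal Python sketch of those two rules (hypothetical function names; for the exact output, ask the cluster with GET /_analyze).

```python
def whitespace_analyzer(text: str) -> list:
    # hypothetical mimic: split on whitespace only; case and punctuation are kept
    return text.split()


def simple_analyzer(text: str) -> list:
    # hypothetical mimic: split on every non-letter character (digits included) and lowercase
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


text = "Set-Top (Box) 2018"
print(whitespace_analyzer(text))  # ['Set-Top', '(Box)', '2018']
print(simple_analyzer(text))      # ['set', 'top', 'box']
```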

GET /_analyze
{
  "analyzer": "standard",
  "text": "text to analyze"
}

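mapping

When a document is inserted, the mapping defines the type of each field (ES creates it automatically if needed: dynamic mapping)

Data types

Simple types

string (split into text and keyword since 5.x)

byte, short, integer, long

float, double

boolean

date

Complex types

object

A mapping can also be defined manually, specifying the field types and analyzers.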
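For the student documents used above, a manually defined mapping could look like the body built below. It is a minimal Python sketch assuming ES 6.x syntax (mappings keyed by the type name `student`); the text plus `keyword` sub-field layout with `ignore_above: 256` mirrors what dynamic mapping creates for strings, which is where `studentName.keyword` comes from. The choice of `integer` for studentAge is an assumption (dynamic mapping would pick `long`).

```python
import json

# Body for: PUT /user   (ES 6.x: "mappings" is keyed by the type name)
mapping_body = {
    "mappings": {
        "student": {
            "properties": {
                "studentName": {
                    "type": "text",            # analyzed, for full-text search
                    "analyzer": "standard",
                    "fields": {
                        # exact-value sub-field, used as studentName.keyword
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
                "studentAge": {"type": "integer"},  # assumption: dynamic mapping would use long
            }
        }
    }
}
print(json.dumps(mapping_body, indent=2))
```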

reindex

The Java application accesses the index through an alias

Use _alias to point an alias at the old index

Create a new index and migrate the old index's data into it with /_bulk

Switch the alias to the new index
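The switch in the last step can be done with a single POST /_aliases request whose actions are applied atomically, so the application never sees the alias missing. A minimal Python sketch that builds that body; the helper name, the index names `user_v1` and `user_v2`, and the alias `user` are hypothetical.

```python
import json


def swap_alias_body(alias: str, old_index: str, new_index: str) -> dict:
    """Hypothetical helper: body for POST /_aliases moving alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(swap_alias_body("user", "user_v1", "user_v2"), indent=2))
```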

score algorithm

Relevance score algorithm: TF/IDF (term frequency / inverse document frequency). Since ES 5.0 the default similarity is BM25, a refinement of TF/IDF.
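Both formulas can be written out directly. A minimal Python sketch with hypothetical function names: the TF/IDF part follows the classic formulas described in the ES 2.x guide (tf = sqrt(freq), idf = 1 + ln(docCount / (docFreq + 1))), and the BM25 part follows Lucene 7 (used by ES 6.x) with its default k1 = 1.2 and b = 0.75. As a check, a shard holding a single document that contains the term once gives a BM25 score of ln(4/3) ≈ 0.2876821, the same number as the _score values in the full-text search result; since each of the 5 shards computes its own statistics, that is one plausible explanation of those values.

```python
import math


def classic_tf(freq: int) -> float:
    """TF/IDF term frequency: square root of how often the term occurs in the field."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, doc_count: int) -> float:
    """TF/IDF inverse document frequency: rarer terms weigh more."""
    return 1 + math.log(doc_count / (doc_freq + 1))


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: int, doc_len: float, avg_doc_len: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating term frequency, normalized by field length."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))


def bm25_score(freq, doc_freq, doc_count, doc_len, avg_doc_len) -> float:
    return bm25_idf(doc_freq, doc_count) * bm25_tf_norm(freq, doc_len, avg_doc_len)


# One document on the shard, term occurs once, field length equals the average
print(round(bm25_score(1, 1, 1, 1, 1), 7))  # 0.2876821
```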

Inverted index & forward index

Inverted index (used for search)

doc1: hello, nice to meet you

doc2: hello world

word	doc1	doc2
hello	*	*
nice	*
to	*
meet	*
you	*
world		*

hello -> doc1, doc2

nice -> doc1

world -> doc2
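Building the term-to-documents lists above takes only a dictionary of sets. A minimal Python sketch with a hypothetical function name, assuming lowercasing and word-character splitting as a stand-in for a real analyzer (a real inverted index also stores term frequencies and positions).

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Hypothetical helper: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):  # stand-in analyzer
            index[term].add(doc_id)
    return index


docs = {"doc1": "hello, nice to meet you", "doc2": "hello world"}
index = build_inverted_index(docs)
for term in ["hello", "nice", "world"]:
    print(term, "->", sorted(index[term]))
# hello -> ['doc1', 'doc2']
# nice -> ['doc1']
# world -> ['doc2']
```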

Forward index (used for sorting and filtering)

doc1: {"name": "jack", "age": 22}

doc2: {"name": "rose", "age": 20}

document	name	age
doc1	jack	22
doc2	rose	20
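Sorting and filtering need the opposite lookup: given a document, what is its value for a field. A minimal Python sketch of a column-per-field forward index (conceptually what Elasticsearch calls doc values); `build_columns` is a hypothetical name.

```python
def build_columns(docs: dict) -> dict:
    """Hypothetical helper: forward index with one column per field, field -> {doc_id: value}."""
    columns = {}
    for doc_id, source in docs.items():
        for field, value in source.items():
            columns.setdefault(field, {})[doc_id] = value
    return columns


docs = {"doc1": {"name": "jack", "age": 22}, "doc2": {"name": "rose", "age": 20}}
columns = build_columns(docs)

ages = columns["age"]
by_age = sorted(ages, key=ages.get)                   # sort by age: ['doc2', 'doc1']
adults = [d for d, age in ages.items() if age >= 21]  # filter age >= 21: ['doc1']
print(by_age, adults)
```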

Java API

Emmm, there is quite a lot of it; it maps onto the DSL.

https://juejin.im/post/5b3ac6db6fb9a024fc284e60#heading-28

References: https://www.cnblogs.com/jajian/p/9976900.html

https://blog.csdn.net/jiaminbao/article/category/7314565

Video course: https://www.bilibili.com/video/av29521652?from=search&seid=14185113444924508715