Review of Yesterday
1. Concepts
index
document
type: ES type
field type
mapping
dynamic mapping
custom mapping
cluster
shard
replica
recovery
gateway
transport
2. Analyzers
2.1 The IK analyzer
2.2 Associating the IK analyzer with an index
2.3 Full-text search
I. Introduction to Elasticsearch
1 Full-text search tools
Put plainly, a full-text search tool helps us run fuzzy queries while still keeping query performance high. Early options in the Java world were Lucene and Compass; the popular choices today are Solr and Elasticsearch.
2 Elasticsearch
ES is a full-text search tool built on top of Lucene, written in Java. Typical uses:
full-text search
fuzzy queries
data analytics
3 Installing ES
3.1 Unpack
[root@chancechance software]# tar -zxvf elasticsearch-6.5.3.tar.gz -C /opt/apps/
[root@chancechance elasticsearch-6.5.3]# vi /etc/profile
export ES_HOME=/opt/apps/elasticsearch-6.5.3
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/apps/jdk1.8.0_261/bin:/opt/apps/hadoop-2.8.1/bin:/opt/apps/hadoop-2.8.1/sbin:/opt/apps/hive-1.2.1/bin:$ES_HOME/bin
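After editing /etc/profile, reload it in the current shell and confirm the variable took effect (a quick sanity check, not part of the original steps):
[root@chancechance elasticsearch-6.5.3]# source /etc/profile
[root@chancechance elasticsearch-6.5.3]# echo $ES_HOME
/opt/apps/elasticsearch-6.5.3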
3.2 Configuration: elasticsearch.yml
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: hzbigdata-2005
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: chancechance
node.master: true
node.data: true
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /opt/apps/elasticsearch-6.5.3/data
#
# Path to log files:
#
path.logs: /opt/apps/elasticsearch-6.5.3/logs
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# For a fully distributed cluster, list the IPs of every node here
discovery.zen.ping.unicast.hosts: ["10.206.0.4"]
3.3 ES cannot be started as the root user
##1. Create a regular user to run ES
[root@chancechance home]# useradd chancechance
[root@chancechance home]# passwd chancechance
Changing password for user chancechance.
New password:
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
##2. Grant the user sudo privileges
[root@chancechance home]# vi /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
chancechance ALL=(ALL) ALL
##3. Give the user ownership of the ES directory
[root@chancechance apps]# chown -R chancechance:chancechance elasticsearch-6.5.3/
##4. Switch to that user and start ES
[root@chancechance apps]# su chancechance
[chancechance@chancechance bin]$ ./elasticsearch
3.4 A few startup problems
3.4.1 Problem 1
## max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/sysctl.conf
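The file needs the following line (the value comes straight from the error message); sysctl -p then applies it without a reboot:
vm.max_map_count=262144
[chancechance@chancechance bin]$ sudo sysctl -p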
3.4.2 Problem 2
## max number of threads [xxxx] for chancechance is too low, increase to at least [xxxx]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/security/limits.d/20-nproc.conf
* soft nproc 4096
root soft nproc unlimited
3.4.3 Problem 3
## max file descriptors [xxx] for chancechance is too low, increase to at least [xxxx]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
Reboot the OS (or at least log out and back in) so the new limits take effect.
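After logging back in, the new limits can be verified (a quick check, not in the original notes):
ulimit -n    # max open file descriptors, expect 65536
ulimit -u    # max user processes, expect 4096
sysctl vm.max_map_count    # expect 262144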
3.5 Verify
http://146.56.208.76:9200/
{
"name" : "chancechance",
"cluster_name" : "hzbigdata-2005",
"cluster_uuid" : "RLyHYkvjS_ybK4er_re2xw",
"version" : {
"number" : "6.5.3",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "159a78a",
"build_date" : "2018-12-06T20:11:28.826501Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
3.6 Install the head plugin
##1. Open the extensions page in Chrome
##2. Enable developer mode on the extensions page
##3. Click "Load unpacked" and select the head plugin directory
##4. Enter the ES service address in the head plugin's address box
II. Basic Usage of ES — Quick Start
1 RESTful-style access
1.1 curl
curl www.baidu.com
method: GET/POST/DELETE/PUT
GET: query
POST: update
DELETE: delete
PUT: create
-X : specifies the HTTP method of the request
-d : the request body to send
-H : sets an HTTP header
Syntax:
curl -XPUT http://<ip>:<port>/index_name/type_name/doc_id
1.2 Create an index
e.g.
[root@chancechance ~]# curl -XPUT http://10.206.0.4:9200/hzbigdata2005
{"acknowledged":true,"shards_acknowledged":true,"index":"hzbigdata2005"}
1.3 Index a document
e.g.
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1 -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
{"_index":"hzbigdata2005","_type":"student","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
1.4 Get a document by id
[root@chancechance ~]# curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi",
"age" : "34"
}
}
1.5 Query all documents in the index
[root@chancechance ~]# curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/_search?pretty
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "narudo",
"age" : "35"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "lixi",
"age" : "34"
}
}
]
}
}
1.6 Delete a document
[root@chancechance ~]# curl -XDELETE http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
1.7 Update (overwrite) — POSTing to /index/type/id replaces the whole document, so only the fields in the request body remain
[root@chancechance ~]# curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student/2?pretty -H "Content-Type:application/json" -d '{"name":"lidong"}'
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
2 Built-in endpoints
URL | Description
---|---
/index/type/_search | search all docs of the given type in the index
/_aliases | get or add index aliases
/index/type/_mapping | get or update the type's mapping
/index/_settings | get or update index settings
/index/_open | open an index
/index/_close | close an index
/index/_refresh | refresh: make recent writes visible to search
/index/_flush | flush: commit the underlying Lucene index to disk
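For example, the alias and mapping endpoints can be queried directly (illustrative calls against the index created earlier):
curl -XGET http://10.206.0.4:9200/_aliases?pretty
curl -XGET http://10.206.0.4:9200/hzbigdata2005/_mapping?pretty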
3 Cluster status
red: one or more primary shards are unallocated — some data is unavailable
yellow: all primary shards are available, but one or more replica shards are not
green: all primary and replica shards are available
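The color is easiest to read from the _cat API (a quick check, not in the original notes):
curl -XGET 'http://10.206.0.4:9200/_cat/health?v'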
4 Index operations
4.1 Indexing documents
##1. PUT
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
##2. POST
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student/3?pretty -H "Content-Type:application/json" -d '{"name":"linan", "age":"33"}'
What is the difference between the two?
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student?pretty -H "Content-Type:application/json" -d '{"name":"linbei", "age":"32"}'
POST may omit the doc id, in which case ES generates a random UUID as the id; PUT must always specify the doc id.
Both verbs overwrite an existing document with the same id (bumping _version); POST is also the verb used for partial updates via the _update endpoint (see 4.3).
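A quick way to see the overwrite behavior is to PUT the same id twice and watch _version climb (a sketch with a hypothetical doc id; the exact output depends on the index state):
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/9 -H "Content-Type:application/json" -d '{"name":"a"}'
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/9 -H "Content-Type:application/json" -d '{"name":"b"}'
## the second call returns "result":"updated" with "_version":2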
4.2 Querying
4.2.1 Query by field value
curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?q=name:lidong&pretty"
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"name" : "lidong"
}
}
]
}
}
4.2.2 Return only selected fields (_source filtering)
[root@chancechance ~]# curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?q=name:linbei&_source=name&pretty"
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "8tx5InYBuP4PMfFoIEBb",
"_score" : 0.6931472,
"_source" : {
"name" : "linbei"
}
}
]
}
}
4.2.3 Pagination
[root@chancechance ~]# curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?from=1&size=2&pretty"
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "8tx5InYBuP4PMfFoIEBb",
"_score" : 1.0,
"_source" : {
"name" : "linbei",
"age" : "32"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "lixi1"
}
}
]
}
}
4.3 Partial update (_update)
[root@chancechance ~]# curl -XPOST "http://10.206.0.4:9200/hzbigdata2005/student/3/_update?pretty" -d '{"doc":{"name":"sakura"}}' -H "Content-Type:application/json"
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
4.4 Bulk operations: bulk indexing
4.4.1 Command
curl -XPOST "http://10.206.0.4:9200/hzbigdata2005/student/_bulk?pretty" -H "Content-Type:application/json" --data-binary "@/home/student.json"
4.4.2 student.json
Odd lines: the action metadata
Even lines: the document source
(Note: the bulk body must end with a trailing newline.)
{"index":{"_id":"4"}}
{"name":"hehe", "age":"11"}
{"index":{"_id":"5"}}
{"name":"haha", "age":"12"}
{"index":{"_id":"6"}}
{"name":"xixi", "age":"13"}
III. Kibana — an ES visualization tool (overview)
1 Install
[root@chancechance software]# tar -zxvf kibana-6.5.3-linux-x86_64.tar.gz -C /opt/apps/
[root@chancechance apps]# mv kibana-6.5.3-linux-x86_64/ kibana-6.5.3
[root@chancechance kibana-6.5.3]# vi /etc/profile
export KIBANA_HOME=/opt/apps/kibana-6.5.3
export PATH=$PATH:$KIBANA_HOME/bin
[root@chancechance config]# vi kibana.yml
server.port: 5601
server.host: "0.0.0.0"
server.name: "chancechance"
elasticsearch.url: "http://10.206.0.4:9200"
2 Run Kibana
##1. Start ES in the background (as the non-root ES user)
nohup elasticsearch > $ES_HOME/logs/startup.log 2>&1 &
##2. Start Kibana in the background
nohup kibana serve > $KIBANA_HOME/logs/startup.log 2>&1 &
Once both are running, a quick smoke test (this request can also be issued from Kibana's Dev Tools console):
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -H "Content-Type:application/json" -d '{"name":"lixi", "age":34}'
3 Using Kibana
IV. Basic ES Concepts
1 General concepts
1.1 index
In ES, an index is the logical store for data. Its internal structure (an inverted index) is what makes retrieval so fast. ES can keep an index on one server or spread it across several. Each index consists of one or more shards, and each shard can have multiple replicas.
A cluster may define many indexes, but a single index can only contain one type (see 1.3.1).
1.2 document
Data stored in ES takes the form of documents. Every document belongs to exactly one type, and within one index a given field name can only have a single field type.
1.2.1 Create a document
Syntax:
curl -XPUT http://<ip>:<port>/index/type/{id} \
-H "Content-Type:application/json" \
-d '{"field":"value"}'
## with an explicit doc id
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1 -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
## with an auto-generated id
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
1.2.2 Get documents
##1. Fetch a single document
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
tip:
Appending pretty to any URL pretty-prints the JSON response
##2. Include the response headers (-i)
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -i
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 164
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi",
"age" : 34
}
}
##3. Fetching only part of a document: see the index operations above (4.2.2)
##4. Fetch multiple documents (_mget)
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/_mget?pretty -i \
-H "Content-Type:application/json" \
-d '{
"docs":[
{
"_index":"hzbigdata2005",
"_type":"student",
"_id":1,
"_source":"name"
},
{
"_index":"hzbigdata2005",
"_type":"student",
"_id":2
}
]
}'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 419
{
"docs" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi2",
"age" : 35
}
}
]
}
1.3 Field types
1.3.1 Elasticsearch types
Since ES 6 an index can only have one type; in other words, before version 6 an index could hold multiple types.
1.3.2 Field types
Category | Types
---|---
string | text, keyword
numeric | long, integer, short, byte, float, half_float, scaled_float
date | date
boolean | boolean
binary | binary
range | integer_range, float_range, long_range, double_range, date_range
array | array
object | object
nested | nested
geo | geo_point, geo_shape
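The distinction that matters most in practice is text vs keyword: text is run through an analyzer and split into terms, while keyword is indexed verbatim. The _analyze API makes the difference visible (illustrative calls):
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' -d '{"analyzer":"standard","text":"Hello World"}'
## -> two tokens: "hello", "world"
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' -d '{"analyzer":"keyword","text":"Hello World"}'
## -> one token: "Hello World"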
1.4 Mappings
1.4.1 Create a mapping
curl -XPUT "http://10.206.0.4:9200/hzbigdata2004?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"doc":{
"properties":{
"username":{
"type":"text",
"fields":{
"pinyin":{
"type":"text"
}
}
}
}
}
}
}'
1.4.2 Get a mapping
curl -XGET "http://10.206.0.4:9200/hzbigdata2004/_mapping?pretty" -i
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 294
{
"hzbigdata2004" : {
"mappings" : {
"doc" : {
"properties" : {
"username" : {
"type" : "text",
"fields" : {
"pinyin" : {
"type" : "text"
}
}
}
}
}
}
}
}
1.4.3 Dynamic mapping
ES infers the field type in the mapping from the type of each JSON value:
JSON type | ES type
---|---
null | ignored (no field added)
boolean | boolean
floating point | float
integer | long
object | object
array | determined by the first non-null element
string | text
tip:
Once a field's type is fixed in the mapping, it cannot be changed
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"doc":{
"dynamic":false,
"properties":{
"username":{
"type":"text",
"dynamic":true,
"fields":{
"pinyin":{
"type":"text"
}
}
}
}
}
}
}'
"dynamic":
- true: new fields are added to the mapping automatically (the default)
- false: new fields are not added to the mapping; documents still index normally, but the unmapped fields cannot be queried
- strict: documents containing unmapped fields are rejected outright
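A sketch of the strict behavior (hypothetical index name strict_demo; the second call fails with strict_dynamic_mapping_exception):
curl -XPUT "http://10.206.0.4:9200/strict_demo?pretty" -H "Content-Type:application/json" -d '{"mappings":{"doc":{"dynamic":"strict","properties":{"username":{"type":"text"}}}}}'
curl -XPUT "http://10.206.0.4:9200/strict_demo/doc/1?pretty" -H "Content-Type:application/json" -d '{"username":"lixi","age":34}'
## -> rejected: dynamic introduction of [age] is not allowed by the strict mapping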
1.4.4 Dynamic mapping — date detection
1.4.4.1 Date formats recognized automatically
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986-11-25"
}'
1.4.4.2 Custom date format detection
##1. Configure the date formats
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"user":{
"dynamic_date_formats":["yyyy:MM:dd", "yyyy-MM-dd"]
}
}
}'
##2. Insert data
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986:11:25"
}'
1.4.4.3 Disable date detection
##1. Turn date detection off
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"user":{
"date_detection":false
}
}
}'
##2. Insert data
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986-11-25"
}'
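With detection disabled, birth should now be mapped as text rather than date, which can be confirmed via the mapping endpoint (sanity check):
curl -XGET "http://10.206.0.4:9200/hzbigdata2003/_mapping?pretty"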
2 Core component concepts
2.1 cluster
An ES cluster consists of multiple nodes, one of which is the master, normally chosen by election. The master/worker distinction only exists inside the cluster; from the outside every node looks the same — the cluster is decentralized, and you can read the same data through any node.
The master node is mainly responsible for managing cluster state.
Check the cluster status:
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_cluster/health?pretty'
2.2 shards
The number of shards can be specified when the index is created; shards are to an index roughly what partitions are to a Spark RDD or to a Kafka topic.
Set shards and replicas:
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/hzbigdata2002?pretty' \
-d '{
"settings":{
"number_of_shards": "3",
"number_of_replicas": "1"
}
}'
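How the shards and replicas were allocated can then be inspected with the _cat API (illustrative check):
curl -XGET 'http://10.206.0.4:9200/_cat/shards/hzbigdata2002?v'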
2.3 replicas
Replicas are copies of an index's shards and provide fault tolerance: when a node dies, its data can be recovered from the replicas.
2.4 recovery
Data redistribution: when nodes join or leave the cluster, ES reallocates shards (and their data) across the remaining nodes.
(figure 001.png: shard reallocation during recovery)
2.5 gateway
The gateway is ES's persistence mechanism. ES works on data in memory first and spills to disk as memory fills; on restart, index data is read back from the gateway.
Several gateway types are supported: the default local filesystem, HDFS, and others.
2.6 discovery.zen
The automatic node-discovery mechanism. ES is peer-to-peer: a starting node first looks for existing nodes, after which nodes communicate with each other directly.
Disable multicast discovery (only relevant on older ES versions; 6.x is unicast-only):
discovery.zen.ping.multicast.enabled: true/false
List the hosts a node contacts for discovery at startup:
discovery.zen.ping.unicast.hosts: ["10.206.0.4"]
2.7 transport
How ES nodes talk to each other and how clients talk to the cluster. Node-to-node traffic uses a TCP protocol; HTTP is also supported for clients, along with Thrift, Servlet, NoSQL and MQ integrations via plugins.
V. Analyzers
1 The default analyzer
##1. Tokenizing English
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"text":"Although I am very handsome, but I am very low-key"
}'
##2. Tokenizing Chinese (the standard analyzer splits Chinese into single characters)
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"text":"我虽然很帅,但是我很低调"
}'
2 The IK Chinese analyzer
Because the default analyzer breaks Chinese into single characters, a dedicated Chinese analyzer such as IK is needed for useful Chinese search.
2.1 Install
##1. Install unzip
yum -y install unzip
##2. Upload the IK analyzer archive
##3. Create the plugin directory and move the zip into it
[root@chancechance plugins]# mkdir -p $ES_HOME/plugins/ik && mv /opt/software/elasticsearch-analysis-ik-6.5.3.zip ./ik
##4. Unzip and remove the archive
[root@chancechance ik]# unzip elasticsearch-analysis-ik-6.5.3.zip && rm -f elasticsearch-analysis-ik-6.5.3.zip
##5. In a fully distributed cluster, copy the ik directory to every node
##6. Restart ES
2.2 Test the IK analyzer
##1. Tokenize Chinese with IK
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"analyzer":"ik_max_word",
"text":"我虽然很帅,但是我很低调"
}'
##2. English tokenization still works
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"analyzer":"ik_max_word",
"text":"Although I am very handsome, but I am very low-key"
}'
2.3 Create an index with an IK analysis policy
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese?pretty' \
-d '{
"settings":{
"number_of_shards": "3",
"number_of_replicas": "1",
"analysis":{
"analyzer":{
"ik":{
"tokenizer":"ik_max_word"
}
}
}
},
"mappings":{
"test":{
"properties":{
"content":{
"type":"text",
"analyzer":"ik_max_word",
"search_analyzer":"ik_max_word"
}
}
}
}
}'
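The analyzer settings can be verified after creation (sanity check, not in the original notes):
curl -XGET 'http://10.206.0.4:9200/chinese/_settings?pretty'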
2.4 Insert documents into the IK-enabled index
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/1?pretty' \
-d '{
"content":"麦克乔丹是一名伟大的nba篮球运动员"
}'
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/2?pretty' \
-d '{
"content":"他率领美国篮球队获取到了奥运会和nba的冠军,是所有篮球运动员中的翘楚"
}'
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/3?pretty' \
-d '{
"content":"美国篮球产生了很多伟大的篮球运动员,如科比、詹姆斯等等"
}'
2.5 Full-text (fuzzy) search
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' \
-d '{
"query":{
"match":{
"content":"冠军"
}
}
}'
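Because content is indexed with IK, the match query string is analyzed into terms as well, and a document matches if it contains any of them. For instance, the query below should hit all three sample documents, since each contains terms from "伟大的篮球运动员" (illustrative; scores will differ per document):
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' -d '{"query":{"match":{"content":"伟大的篮球运动员"}}}'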
VI. Java API
1 Dependencies
<!-- es -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.5.3</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.8</version>
</dependency>
<!-- fastjson -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.71</version>
</dependency>
2 Quick start
package cn.qphone.es;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Test;
import java.net.InetAddress;
import java.net.UnknownHostException;
public class Demo1_quickStart {
public static void main(String[] args) throws UnknownHostException {
//1. Obtain the core ES client class: TransportClient
//1.1 Settings — cluster.name must match the value in elasticsearch.yml
Settings settings = Settings.builder()
.put("cluster.name", "hzbigdata-2005")
.build();
//1.2 build the client
TransportClient client = new PreBuiltTransportClient(settings);
//1.3 add the ES cluster address; note the transport port 9300, not the HTTP port 9200
TransportAddress[] transportAddresses = {
new TransportAddress(InetAddress.getByName("chancechance"), 9300)
};
client.addTransportAddresses(transportAddresses);
//1.4 close the client to release its resources
client.close();
}
@Test
public void test() throws UnknownHostException {
System.out.println(InetAddress.getByName("chancechance"));
}
}
3 An ElasticSearchUtils helper
package cn.qphone.utils;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import java.net.InetAddress;
public class ElasticSearchUtils {
private static TransportClient client;
static {
try {
Settings settings = Settings.builder()
.put("cluster.name", "hzbigdata2005")
.build();
client = new PreBuiltTransportClient(settings);
TransportAddress[] transportAddresses = {
new TransportAddress(InetAddress.getByName("chancechance"), 9300)
};
client.addTransportAddresses(transportAddresses);
}catch (Exception e) {
e.printStackTrace();
}
}
public static TransportClient getClient() {
return client;
}
}
4 CRUD in code
package cn.qphone.es;
import cn.qphone.utils.ElasticSearchUtils;
import org.elasticsearch.client.transport.TransportClient;
public class Demo2_CRUD {
private static TransportClient client = ElasticSearchUtils.getClient();
public static void main(String[] args) {
//1. Create (index a document)
/*
* curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/1?pretty' \
-d '{
"content":"麦克乔丹是一名伟大的nba篮球运动员"
}'
{"_index":"hzbigdata2005","_type":"student","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
*/
// String json = "{\"namenode\":\"qphone01\", \"datanode\":\"qphone02\"}";
// IndexResponse indexResponse = client.prepareIndex("hadoop", "hdfs")
// .setSource(json, XContentType.JSON)
// .get();
// System.out.println("version : " + indexResponse.getVersion());
// System.out.println("index : " + indexResponse.getIndex());
// System.out.println("type : " + indexResponse.getType());
//2. Get (fetch by id)
// GetResponse getResponse = client.prepareGet("hadoop", "hdfs", "jPufK3YB6ppBFcv2xD7l").get();
// String json = getResponse.getSourceAsString();
// System.out.println(json);
// System.out.println(getResponse.getSource());
// System.out.println(getResponse.getSourceAsMap());
// System.out.println(getResponse.getIndex());
//3. Delete
// DeleteResponse deleteResponse = client.prepareDelete("hadoop", "hdfs", "jPufK3YB6ppBFcv2xD7l").get();
// System.out.println(deleteResponse.getIndex());
// System.out.println(deleteResponse.getResult());
}
}
5 Full-text search in code
package cn.qphone.es;
import cn.qphone.utils.ElasticSearchUtils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
public class Demo3_Search {
/*
* curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' \
-d '{
"query":{
"match":{
"content":"冠军"
}
}
}'
*/
public static void main(String[] args) {
//1. run the search and obtain the response object
TransportClient client = ElasticSearchUtils.getClient();
SearchResponse searchResponse = client.prepareSearch("chinese")
/*
* QUERY_THEN_FETCH : the default; scores are computed from per-shard term statistics
* DFS_QUERY_THEN_FETCH : first collects global term statistics from all shards, giving more accurate scores at the cost of an extra round-trip
*/
.setSearchType(SearchType.QUERY_THEN_FETCH)
/*
* matchQuery    : analyzed match — roughly: select * from t where xxx like ...
* matchAllQuery : select * from t
* termQuery     : exact (unanalyzed) term match — roughly: select * from t where xxx = ...
*/
.setQuery(QueryBuilders.matchQuery("content", "运动员"))
.get();
//2. print the result data
/*
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "chinese",
"_type" : "test",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"content" : "他率领美国篮球队获取到了奥运会和nba的冠军,是所有篮球运动员中的翘楚"
}
}
]
}
}
*/
SearchHits hits = searchResponse.getHits();
long total = hits.getTotalHits();
float maxScore = hits.getMaxScore();
System.out.println("total hits : " + total);
System.out.println("max score : " + maxScore);
SearchHit[] searchHits = hits.getHits(); // the individual hit records
for (SearchHit searchHit : searchHits) {
System.out.println("index : " + searchHit.getIndex());
System.out.println("type : " + searchHit.getType());
System.out.println("id : " + searchHit.getId());
System.out.println("content : " + searchHit.getSourceAsString());
}
}
}