Review of Yesterday
1. Concepts
index
document
type: ES type
field type
mapping
dynamic mapping
custom mapping
cluster
shard
replica
recovery
gateway
transport
2. Analyzers
2.1 The IK analyzer
2.2 Associating the IK analyzer with an index
2.3 Full-text search
I. Introduction to Elasticsearch
1 Full-text search tools
Put plainly, a full-text search tool helps us run fuzzy queries while still keeping query performance high. Early options in the Java world were Lucene and Compass; the popular choices today are Solr and Elasticsearch.
2 Elasticsearch
ES is a full-text search tool built on top of Lucene, written in Java. Typical uses:
full-text search
fuzzy queries
data analytics
3 Installing ES
3.1 Unpack
[root@chancechance software]# tar -zxvf elasticsearch-6.5.3.tar.gz -C /opt/apps/
[root@chancechance elasticsearch-6.5.3]# vi /etc/profile
export ES_HOME=/opt/apps/elasticsearch-6.5.3
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/apps/jdk1.8.0_261/bin:/opt/apps/hadoop-2.8.1/bin:/opt/apps/hadoop-2.8.1/sbin:/opt/apps/hive-1.2.1/bin:$ES_HOME/bin
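After editing /etc/profile, reload it in the current shell and confirm the variable took effect (a quick sanity check, not part of the original steps):
[root@chancechance elasticsearch-6.5.3]# source /etc/profile
[root@chancechance elasticsearch-6.5.3]# echo $ES_HOME
/opt/apps/elasticsearch-6.5.3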
3.2 Configuration: elasticsearch.yml
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: hzbigdata-2005
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: chancechance
node.master: true
node.data: true
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /opt/apps/elasticsearch-6.5.3/data
#
# Path to log files:
#
path.logs: /opt/apps/elasticsearch-6.5.3/logs
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# For a fully distributed cluster, list the IPs of every node here
discovery.zen.ping.unicast.hosts: ["10.206.0.4"]
3.3 ES cannot be started as the root user
##1. Create a regular user to run ES
[root@chancechance home]# useradd chancechance
[root@chancechance home]# passwd chancechance
Changing password for user chancechance.
New password:
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
##2. Grant the user sudo privileges
[root@chancechance home]# vi /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
chancechance ALL=(ALL) ALL
##3. Give the user ownership of the ES directory
[root@chancechance apps]# chown -R chancechance:chancechance elasticsearch-6.5.3/
##4. Switch to that user and start ES
[root@chancechance apps]# su chancechance
[chancechance@chancechance bin]$ ./elasticsearch
3.4 A few startup problems
3.4.1 Problem 1
## max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/sysctl.conf
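The file needs the following line (the value comes straight from the error message); sysctl -p then applies it without a reboot:
vm.max_map_count=262144
[chancechance@chancechance bin]$ sudo sysctl -p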
3.4.2 Problem 2
## max number of threads [xxxx] for chancechance is too low, increase to at least [xxxx]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/security/limits.d/20-nproc.conf
* soft nproc 4096
root soft nproc unlimited
3.4.3 Problem 3
## max file descriptors [xxx] for chancechance is too low, increase to at least [xxxx]
Fix:
[chancechance@chancechance bin]$ sudo vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
Reboot the OS (or at least log out and back in) so the new limits take effect.
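After logging back in, the new limits can be verified (a quick check, not in the original notes):
ulimit -n    # max open file descriptors, expect 65536
ulimit -u    # max user processes, expect 4096
sysctl vm.max_map_count    # expect 262144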
3.5 Verify
http://146.56.208.76:9200/
{
"name" : "chancechance",
"cluster_name" : "hzbigdata-2005",
"cluster_uuid" : "RLyHYkvjS_ybK4er_re2xw",
"version" : {
"number" : "6.5.3",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "159a78a",
"build_date" : "2018-12-06T20:11:28.826501Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
3.6 Install the head plugin
##1. Open the extensions page in Chrome
##2. Enable developer mode on the extensions page
##3. Click "Load unpacked" and select the head plugin directory
##4. Enter the ES service address in the head plugin's address box
II. Basic Usage of ES — Quick Start
1 RESTful-style access
1.1 curl
curl www.baidu.com
method: GET/POST/DELETE/PUT
GET: query
POST: update
DELETE: delete
PUT: create
-X : specifies the HTTP method of the request
-d : the request body to send
-H : sets an HTTP header
Syntax:
curl -XPUT http://<ip>:<port>/index_name/type_name/doc_id
1.2 Create an index
e.g.
[root@chancechance ~]# curl -XPUT http://10.206.0.4:9200/hzbigdata2005
{"acknowledged":true,"shards_acknowledged":true,"index":"hzbigdata2005"}
1.3 Index a document
e.g.
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1 -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
{"_index":"hzbigdata2005","_type":"student","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
1.4 Get a document by id
[root@chancechance ~]# curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi",
"age" : "34"
}
}
1.5 Query all documents in the index
[root@chancechance ~]# curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/_search?pretty
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "narudo",
"age" : "35"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "lixi",
"age" : "34"
}
}
]
}
}
1.6 Delete a document
[root@chancechance ~]# curl -XDELETE http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
1.7 Update (overwrite) — POSTing to /index/type/id replaces the whole document, so only the fields in the request body remain
[root@chancechance ~]# curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student/2?pretty -H "Content-Type:application/json" -d '{"name":"lidong"}'
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
2 Built-in endpoints
URL | Description
---|---
/index/type/_search | search all docs of the given type in the index
/_aliases | get or add index aliases
/index/type/_mapping | get or update the type's mapping
/index/_settings | get or update index settings
/index/_open | open an index
/index/_close | close an index
/index/_refresh | refresh: make recent writes visible to search
/index/_flush | flush: commit the underlying Lucene index to disk
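For example, the alias and mapping endpoints can be queried directly (illustrative calls against the index created earlier):
curl -XGET http://10.206.0.4:9200/_aliases?pretty
curl -XGET http://10.206.0.4:9200/hzbigdata2005/_mapping?pretty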
3 Cluster status
red: one or more primary shards are unallocated — some data is unavailable
yellow: all primary shards are available, but one or more replica shards are not
green: all primary and replica shards are available
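The color is easiest to read from the _cat API (a quick check, not in the original notes):
curl -XGET 'http://10.206.0.4:9200/_cat/health?v'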
4 Index operations
4.1 Indexing documents
##1. PUT
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
##2. POST
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student/3?pretty -H "Content-Type:application/json" -d '{"name":"linan", "age":"33"}'
What is the difference between the two?
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student?pretty -H "Content-Type:application/json" -d '{"name":"linbei", "age":"32"}'
POST may omit the doc id, in which case ES generates a random UUID as the id; PUT must always specify the doc id.
Both verbs overwrite an existing document with the same id (bumping _version); POST is also the verb used for partial updates via the _update endpoint (see 4.3).
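A quick way to see the overwrite behavior is to PUT the same id twice and watch _version climb (a sketch with a hypothetical doc id; the exact output depends on the index state):
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/9 -H "Content-Type:application/json" -d '{"name":"a"}'
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/9 -H "Content-Type:application/json" -d '{"name":"b"}'
## the second call returns "result":"updated" with "_version":2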
4.2 Querying
4.2.1 Query by field value
curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?q=name:lidong&pretty"
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"name" : "lidong"
}
}
]
}
}
4.2.2 Return only selected fields (_source filtering)
[root@chancechance ~]# curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?q=name:linbei&_source=name&pretty"
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "8tx5InYBuP4PMfFoIEBb",
"_score" : 0.6931472,
"_source" : {
"name" : "linbei"
}
}
]
}
}
4.2.3 Pagination
[root@chancechance ~]# curl -XGET "http://10.206.0.4:9200/hzbigdata2005/student/_search?from=1&size=2&pretty"
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "8tx5InYBuP4PMfFoIEBb",
"_score" : 1.0,
"_source" : {
"name" : "linbei",
"age" : "32"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "lixi1"
}
}
]
}
}
4.3 Partial update (_update)
[root@chancechance ~]# curl -XPOST "http://10.206.0.4:9200/hzbigdata2005/student/3/_update?pretty" -d '{"doc":{"name":"sakura"}}' -H "Content-Type:application/json"
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
4.4 Bulk operations: bulk indexing
4.4.1 Command
curl -XPOST "http://10.206.0.4:9200/hzbigdata2005/student/_bulk?pretty" -H "Content-Type:application/json" --data-binary "@/home/student.json"
4.4.2 student.json
Odd lines: the action metadata
Even lines: the document source
(Note: the bulk body must end with a trailing newline.)
{"index":{"_id":"4"}}
{"name":"hehe", "age":"11"}
{"index":{"_id":"5"}}
{"name":"haha", "age":"12"}
{"index":{"_id":"6"}}
{"name":"xixi", "age":"13"}
III. Kibana — an ES visualization tool (overview)
1 Install
[root@chancechance software]# tar -zxvf kibana-6.5.3-linux-x86_64.tar.gz -C /opt/apps/
[root@chancechance apps]# mv kibana-6.5.3-linux-x86_64/ kibana-6.5.3
[root@chancechance kibana-6.5.3]# vi /etc/profile
export KIBANA_HOME=/opt/apps/kibana-6.5.3
export PATH=$PATH:$KIBANA_HOME/bin
[root@chancechance config]# vi kibana.yml
server.port: 5601
server.host: "0.0.0.0"
server.name: "chancechance"
elasticsearch.url: "http://10.206.0.4:9200"
2 Run Kibana
##1. Start ES in the background (as the non-root ES user)
nohup elasticsearch > $ES_HOME/logs/startup.log 2>&1 &
##2. Start Kibana in the background
nohup kibana serve > $KIBANA_HOME/logs/startup.log 2>&1 &
Once both are running, a quick smoke test (this request can also be issued from Kibana's Dev Tools console):
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -H "Content-Type:application/json" -d '{"name":"lixi", "age":34}'
3 Using Kibana
IV. Basic ES Concepts
1 General concepts
1.1 index
In ES, an index is the logical store for data. Its internal structure (an inverted index) is what makes retrieval so fast. ES can keep an index on one server or spread it across several. Each index consists of one or more shards, and each shard can have multiple replicas.
A cluster may define many indexes, but a single index can only contain one type (see 1.3.1).
1.2 document
Data stored in ES takes the form of documents. Every document belongs to exactly one type, and within one index a given field name can only have a single field type.
1.2.1 Create a document
Syntax:
curl -XPUT http://<ip>:<port>/index/type/{id} \
-H "Content-Type:application/json" \
-d '{"field":"value"}'
## with an explicit doc id
curl -XPUT http://10.206.0.4:9200/hzbigdata2005/student/1 -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
## with an auto-generated id
curl -XPOST http://10.206.0.4:9200/hzbigdata2005/student -H "Content-Type:application/json" -d '{"name":"lixi", "age":"34"}'
1.2.2 Get documents
##1. Fetch a single document
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty
tip:
Appending pretty to any URL pretty-prints the JSON response
##2. Include the response headers (-i)
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/1?pretty -i
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 164
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi",
"age" : 34
}
}
##3. Fetching only part of a document: see the index operations above (4.2.2)
##4. Fetch multiple documents (_mget)
curl -XGET http://10.206.0.4:9200/hzbigdata2005/student/_mget?pretty -i \
-H "Content-Type:application/json" \
-d '{
"docs":[
{
"_index":"hzbigdata2005",
"_type":"student",
"_id":1,
"_source":"name"
},
{
"_index":"hzbigdata2005",
"_type":"student",
"_id":2
}
]
}'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 419
{
"docs" : [
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi"
}
},
{
"_index" : "hzbigdata2005",
"_type" : "student",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "lixi2",
"age" : 35
}
}
]
}
1.3 Field types
1.3.1 Elasticsearch types
Since ES 6 an index can only have one type; in other words, before version 6 an index could hold multiple types.
1.3.2 Field types
Category | Types
---|---
string | text, keyword
numeric | long, integer, short, byte, float, half_float, scaled_float
date | date
boolean | boolean
binary | binary
range | integer_range, float_range, long_range, double_range, date_range
array | array
object | object
nested | nested
geo | geo_point, geo_shape
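The distinction that matters most in practice is text vs keyword: text is run through an analyzer and split into terms, while keyword is indexed verbatim. The _analyze API makes the difference visible (illustrative calls):
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' -d '{"analyzer":"standard","text":"Hello World"}'
## -> two tokens: "hello", "world"
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' -d '{"analyzer":"keyword","text":"Hello World"}'
## -> one token: "Hello World"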
1.4 Mappings
1.4.1 Create a mapping
curl -XPUT "http://10.206.0.4:9200/hzbigdata2004?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"doc":{
"properties":{
"username":{
"type":"text",
"fields":{
"pinyin":{
"type":"text"
}
}
}
}
}
}
}'
1.4.2 Get a mapping
curl -XGET "http://10.206.0.4:9200/hzbigdata2004/_mapping?pretty" -i
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 294
{
"hzbigdata2004" : {
"mappings" : {
"doc" : {
"properties" : {
"username" : {
"type" : "text",
"fields" : {
"pinyin" : {
"type" : "text"
}
}
}
}
}
}
}
}
1.4.3 Dynamic mapping
ES infers the field type in the mapping from the type of each JSON value:
JSON type | ES type
---|---
null | ignored (no field added)
boolean | boolean
floating point | float
integer | long
object | object
array | determined by the first non-null element
string | text
tip:
Once a field's type is fixed in the mapping, it cannot be changed
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"doc":{
"dynamic":false,
"properties":{
"username":{
"type":"text",
"dynamic":true,
"fields":{
"pinyin":{
"type":"text"
}
}
}
}
}
}
}'
"dynamic":
- true: new fields are added to the mapping automatically (the default)
- false: new fields are not added to the mapping; documents still index normally, but the unmapped fields cannot be queried
- strict: documents containing unmapped fields are rejected outright
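A sketch of the strict behavior (hypothetical index name strict_demo; the second call fails with strict_dynamic_mapping_exception):
curl -XPUT "http://10.206.0.4:9200/strict_demo?pretty" -H "Content-Type:application/json" -d '{"mappings":{"doc":{"dynamic":"strict","properties":{"username":{"type":"text"}}}}}'
curl -XPUT "http://10.206.0.4:9200/strict_demo/doc/1?pretty" -H "Content-Type:application/json" -d '{"username":"lixi","age":34}'
## -> rejected: dynamic introduction of [age] is not allowed by the strict mapping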
1.4.4 Dynamic mapping — date detection
1.4.4.1 Date formats recognized automatically
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986-11-25"
}'
1.4.4.2 Custom date format detection
##1. Configure the date formats
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"user":{
"dynamic_date_formats":["yyyy:MM:dd", "yyyy-MM-dd"]
}
}
}'
##2. Insert data
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986:11:25"
}'
1.4.4.3 Disable date detection
##1. Turn date detection off
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"mappings":{
"user":{
"date_detection":false
}
}
}'
##2. Insert data
curl -XPUT "http://10.206.0.4:9200/hzbigdata2003/user/1?pretty" -i \
-H "Content-Type:application/json" \
-d '{
"username":"lixi",
"birth":"1986-11-25"
}'
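With detection disabled, birth should now be mapped as text rather than date, which can be confirmed via the mapping endpoint (sanity check):
curl -XGET "http://10.206.0.4:9200/hzbigdata2003/_mapping?pretty"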
2 Core component concepts
2.1 cluster
An ES cluster consists of multiple nodes, one of which is the master, normally chosen by election. The master/worker distinction only exists inside the cluster; from the outside every node looks the same — the cluster is decentralized, and you can read the same data through any node.
The master node is mainly responsible for managing cluster state.
Check the cluster status:
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_cluster/health?pretty'
2.2 shards
The number of shards can be specified when the index is created; shards are to an index roughly what partitions are to a Spark RDD or to a Kafka topic.
Set shards and replicas:
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/hzbigdata2002?pretty' \
-d '{
"settings":{
"number_of_shards": "3",
"number_of_replicas": "1"
}
}'
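How the shards and replicas were allocated can then be inspected with the _cat API (illustrative check):
curl -XGET 'http://10.206.0.4:9200/_cat/shards/hzbigdata2002?v'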
2.3 replicas
Replicas are copies of an index's shards and provide fault tolerance: when a node dies, its data can be recovered from the replicas.
2.4 recovery
Data redistribution: when nodes join or leave the cluster, ES reallocates shards (and their data) across the remaining nodes.
(figure 001.png: shard reallocation during recovery)
2.5 gateway
The gateway is ES's persistence mechanism. ES works on data in memory first and spills to disk as memory fills; on restart, index data is read back from the gateway.
Several gateway types are supported: the default local filesystem, HDFS, and others.
2.6 discovery.zen
The automatic node-discovery mechanism. ES is peer-to-peer: a starting node first looks for existing nodes, after which nodes communicate with each other directly.
Disable multicast discovery (only relevant on older ES versions; 6.x is unicast-only):
discovery.zen.ping.multicast.enabled: true/false
List the hosts a node contacts for discovery at startup:
discovery.zen.ping.unicast.hosts: ["10.206.0.4"]
2.7 transport
How ES nodes talk to each other and how clients talk to the cluster. Node-to-node traffic uses a TCP protocol; HTTP is also supported for clients, along with Thrift, Servlet, NoSQL and MQ integrations via plugins.
V. Analyzers
1 The default analyzer
##1. Tokenizing English
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"text":"Although I am very handsome, but I am very low-key"
}'
##2. Tokenizing Chinese (the standard analyzer splits Chinese into single characters)
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"text":"我虽然很帅,但是我很低调"
}'
2 The IK Chinese analyzer
Because the default analyzer breaks Chinese into single characters, a dedicated Chinese analyzer such as IK is needed for useful Chinese search.
2.1 Install
##1. Install unzip
yum -y install unzip
##2. Upload the IK analyzer archive
##3. Create the plugin directory and move the zip into it
[root@chancechance plugins]# mkdir -p $ES_HOME/plugins/ik && mv /opt/software/elasticsearch-analysis-ik-6.5.3.zip ./ik
##4. Unzip and remove the archive
[root@chancechance ik]# unzip elasticsearch-analysis-ik-6.5.3.zip && rm -f elasticsearch-analysis-ik-6.5.3.zip
##5. In a fully distributed cluster, copy the ik directory to every node
##6. Restart ES
2.2 Test the IK analyzer
##1. Tokenize Chinese with IK
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"analyzer":"ik_max_word",
"text":"我虽然很帅,但是我很低调"
}'
##2. English tokenization still works
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/_analyze?pretty' \
-d '{
"analyzer":"ik_max_word",
"text":"Although I am very handsome, but I am very low-key"
}'
2.3 Create an index with an IK analysis policy
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese?pretty' \
-d '{
"settings":{
"number_of_shards": "3",
"number_of_replicas": "1",
"analysis":{
"analyzer":{
"ik":{
"tokenizer":"ik_max_word"
}
}
}
},
"mappings":{
"test":{
"properties":{
"content":{
"type":"text",
"analyzer":"ik_max_word",
"search_analyzer":"ik_max_word"
}
}
}
}
}'
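The analyzer settings can be verified after creation (sanity check, not in the original notes):
curl -XGET 'http://10.206.0.4:9200/chinese/_settings?pretty'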
2.4 Insert documents into the IK-enabled index
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/1?pretty' \
-d '{
"content":"麦克乔丹是一名伟大的nba篮球运动员"
}'
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/2?pretty' \
-d '{
"content":"他率领美国篮球队获取到了奥运会和nba的冠军,是所有篮球运动员中的翘楚"
}'
curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/3?pretty' \
-d '{
"content":"美国篮球产生了很多伟大的篮球运动员,如科比、詹姆斯等等"
}'
2.5 Full-text (fuzzy) search
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' \
-d '{
"query":{
"match":{
"content":"冠军"
}
}
}'
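Because content is indexed with IK, the match query string is analyzed into terms as well, and a document matches if it contains any of them. For instance, the query below should hit all three sample documents, since each contains terms from "伟大的篮球运动员" (illustrative; scores will differ per document):
curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' -d '{"query":{"match":{"content":"伟大的篮球运动员"}}}'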
VI. Java API
1 Dependencies
<!-- es -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.5.3</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.8</version>
</dependency>
<!-- fastjson -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.71</version>
</dependency>
2 Quick start
package cn.qphone.es;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Test;
import java.net.InetAddress;
import java.net.UnknownHostException;
public class Demo1_quickStart {
public static void main(String[] args) throws UnknownHostException {
//1. Obtain the core ES client class: TransportClient
//1.1 Settings — cluster.name must match the value in elasticsearch.yml
Settings settings = Settings.builder()
.put("cluster.name", "hzbigdata-2005")
.build();
//1.2 build the client
TransportClient client = new PreBuiltTransportClient(settings);
//1.3 add the ES cluster address; note the transport port 9300, not the HTTP port 9200
TransportAddress[] transportAddresses = {
new TransportAddress(InetAddress.getByName("chancechance"), 9300)
};
client.addTransportAddresses(transportAddresses);
//1.4 close the client to release its resources
client.close();
}
@Test
public void test() throws UnknownHostException {
System.out.println(InetAddress.getByName("chancechance"));
}
}
3 An ElasticSearchUtils helper
package cn.qphone.utils;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import java.net.InetAddress;
public class ElasticSearchUtils {
private static TransportClient client;
static {
try {
Settings settings = Settings.builder()
.put("cluster.name", "hzbigdata2005")
.build();
client = new PreBuiltTransportClient(settings);
TransportAddress[] transportAddresses = {
new TransportAddress(InetAddress.getByName("chancechance"), 9300)
};
client.addTransportAddresses(transportAddresses);
}catch (Exception e) {
e.printStackTrace();
}
}
public static TransportClient getClient() {
return client;
}
}
4 CRUD in code
package cn.qphone.es;
import cn.qphone.utils.ElasticSearchUtils;
import org.elasticsearch.client.transport.TransportClient;
public class Demo2_CRUD {
private static TransportClient client = ElasticSearchUtils.getClient();
public static void main(String[] args) {
//1. Create (index a document)
/*
* curl -XPUT -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/test/1?pretty' \
-d '{
"content":"麦克乔丹是一名伟大的nba篮球运动员"
}'
{"_index":"hzbigdata2005","_type":"student","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
*/
// String json = "{\"namenode\":\"qphone01\", \"datanode\":\"qphone02\"}";
// IndexResponse indexResponse = client.prepareIndex("hadoop", "hdfs")
// .setSource(json, XContentType.JSON)
// .get();
// System.out.println("version : " + indexResponse.getVersion());
// System.out.println("index : " + indexResponse.getIndex());
// System.out.println("type : " + indexResponse.getType());
//2. Get (fetch by id)
// GetResponse getResponse = client.prepareGet("hadoop", "hdfs", "jPufK3YB6ppBFcv2xD7l").get();
// String json = getResponse.getSourceAsString();
// System.out.println(json);
// System.out.println(getResponse.getSource());
// System.out.println(getResponse.getSourceAsMap());
// System.out.println(getResponse.getIndex());
//3. Delete
// DeleteResponse deleteResponse = client.prepareDelete("hadoop", "hdfs", "jPufK3YB6ppBFcv2xD7l").get();
// System.out.println(deleteResponse.getIndex());
// System.out.println(deleteResponse.getResult());
}
}
5 Full-text search in code
package cn.qphone.es;
import cn.qphone.utils.ElasticSearchUtils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
public class Demo3_Search {
/*
* curl -XGET -H "Content-Type:application/json" 'http://10.206.0.4:9200/chinese/_search?pretty' \
-d '{
"query":{
"match":{
"content":"冠军"
}
}
}'
*/
public static void main(String[] args) {
//1. run the search and obtain the response object
TransportClient client = ElasticSearchUtils.getClient();
SearchResponse searchResponse = client.prepareSearch("chinese")
/*
* QUERY_THEN_FETCH : the default; scores are computed from per-shard term statistics
* DFS_QUERY_THEN_FETCH : first collects global term statistics from all shards, giving more accurate scores at the cost of an extra round-trip
*/
.setSearchType(SearchType.QUERY_THEN_FETCH)
/*
* matchQuery    : analyzed match — roughly: select * from t where xxx like ...
* matchAllQuery : select * from t
* termQuery     : exact (unanalyzed) term match — roughly: select * from t where xxx = ...
*/
.setQuery(QueryBuilders.matchQuery("content", "运动员"))
.get();
//2. print the result data
/*
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "chinese",
"_type" : "test",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"content" : "他率领美国篮球队获取到了奥运会和nba的冠军,是所有篮球运动员中的翘楚"
}
}
]
}
}
*/
SearchHits hits = searchResponse.getHits();
long total = hits.getTotalHits();
float maxScore = hits.getMaxScore();
System.out.println("total hits : " + total);
System.out.println("max score : " + maxScore);
SearchHit[] searchHits = hits.getHits(); // the individual hit records
for (SearchHit searchHit : searchHits) {
System.out.println("index : " + searchHit.getIndex());
System.out.println("type : " + searchHit.getType());
System.out.println("id : " + searchHit.getId());
System.out.println("content : " + searchHit.getSourceAsString());
}
}
}