IK Analyzer
The IK analyzer breaks a Chinese sentence into individual terms so that matching operations can be performed on them.
For Chinese text, the IK analyzer is the recommended choice.
- It provides two segmentation algorithms: ik_smart (coarsest-grained, fewest splits) and ik_max_word (finest-grained, most splits)
Testing ik_smart (coarsest-grained segmentation)
GET _analyze
{
"analyzer": "ik_smart",
"text": "我来自宝鸡文理学院"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "来自",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝鸡",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "文理学院",
"start_offset" : 5,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
Testing ik_max_word (finest-grained segmentation)
GET _analyze
{
"analyzer": "ik_max_word",
"text": "我来自宝鸡文理学院"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "来自",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝鸡",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "文理学院",
"start_offset" : 5,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "文理",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "理学院",
"start_offset" : 6,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "理学",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "学院",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 7
}
]
}
REST Style
Basic REST commands
method | url | description |
---|---|---|
PUT | localhost:9200/index_name/type_name/doc_id | create a document (with a specified document id) |
POST | localhost:9200/index_name/type_name | create a document (with a random document id) |
POST | localhost:9200/index_name/type_name/doc_id/_update | update a document |
DELETE | localhost:9200/index_name/type_name/doc_id | delete a document |
GET | localhost:9200/index_name/type_name/doc_id | get a document (by document id) |
POST | localhost:9200/index_name/type_name/_search | query all documents |
Basic index operations
Create an index
PUT /index_name/type_name/doc_id    (the type segment is gradually being deprecated in newer versions)
{
  "field_name": "field_value"
}
PUT /text1/type1/1
{
"name": "李永康",
"age": 18
}
Create an index (the "database") and its fields
PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "field_type"
      }
    }
  }
}
Create a text2 index where the name field is text, the address field is text, and the age field is long.
If a field's type is not specified, ES will assign one for us automatically!
PUT /text2
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
Get index information
GET index_name
{
"text1" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1633788361458",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "Qjhtls6BSh6pBmcw4XvCug",
"version" : {
"created" : "7060199"
},
"provided_name" : "text1"
}
}
}
}
Update a document
Method 1: overwrite
Overwriting simply means PUTting a document with the same id again; the new body replaces the previous one entirely.
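For example, a minimal sketch reusing the text1/type1/1 document created above (the new age value is illustrative); note that any field omitted from the new body is lost:
PUT /text1/type1/1
{
  "name": "李永康",
  "age": 20
}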
Method 2: partial update (_update)
POST /index_name/type_name/doc_id/_update
{
  "doc": {
    "field_to_update": "new value"
  }
}
Change the name field of document 1 (type type1) in the text1 index to "李永康12138":
POST text1/type1/1/_update
{
"doc":{
"name":"李永康12138"
}
}
Delete an index
DELETE /index_name/type_name/doc_id
Depending on how much of the path you supply, DELETE removes either a whole index or a single document.
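For example, a quick sketch using the index created above:
# delete the entire text1 index
DELETE /text1

# delete only document 1 from text1
DELETE /text1/type1/1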
Document operations
Add a document
PUT /index_name/type_name/doc_id    (the type segment is gradually being deprecated in newer versions)
{
  "field_name": "field_value"
}
PUT /text1/type1/1
{
"name": "李永康",
"age": 18
}
Get a document
GET index_name/type_name/doc_id
GET text1/type1/2
Simple query
GET index_name/_search?q=field_name:value
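For example, a quick sketch against the text1 index used above (the q parameter uses query_string syntax):
GET text1/_search?q=name:李永康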
Complex queries
match query
A match query is analyzed: the query text is run through the analyzer first and then matched against the analyzed terms of the documents (which were themselves analyzed at index time).
Find the documents whose name field contains "穿越":
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,//查询数量
"relation" : "eq"//查询条件 eq
},
"max_score" : 1.3097506,//最大分数(权重)
"hits" : [
{
"_index" : "text1",
"_type" : "type1",
"_id" : "2",
"_score" : 1.3097506,
"_source" : {
"name" : "穿越火线",
"age" : 13
}
},
{
"_index" : "text1",
"_type" : "type1",
"_id" : "4",
"_score" : 1.179499,
"_source" : {
"name" : "穿越火线HD",
"age" : 13
}
}
]
}
}
Returning only specified fields with "_source": ["XXX"]
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"_source": ["name"]
}
Only the name field of each hit is returned (to return several fields, just list them all inside the []).
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.3097506,
"hits" : [
{
"_index" : "text1",
"_type" : "type1",
"_id" : "2",
"_score" : 1.3097506,
"_source" : {
"name" : "穿越火线"
}
},
{
"_index" : "text1",
"_type" : "type1",
"_id" : "4",
"_score" : 1.179499,
"_source" : {
"name" : "穿越火线HD"
}
}
]
}
}
sort (sorting the results)
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"sort": [
{
"age": { //根据age字段进行排序
"order": "asc" //asc升序 desc降序
}
}
]
}
Paginated queries
- from: the offset of the first result to return
- size: how many results to return per page
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"sort": [
{
"age": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
Boolean (bool) queries
must query
Every clause must match. Equivalent to an AND query (e.g. where name = XXX and age = 18).
GET text1/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "穿越"
}
},
{
"match": {
"age": "18"
}
}
]
}
}
}
should query
Equivalent to an OR: a document matches if at least one clause matches. The syntax mirrors must, as sketched below.
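A minimal sketch (same text1 data as earlier): documents matching either clause are returned, and matching more clauses raises the score:
GET text1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "穿越"
          }
        },
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}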
must_not query
Equivalent to != (NOT): documents matching these clauses are excluded. The syntax again mirrors must, as shown below.
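A minimal sketch: return the documents whose age is not 18:
GET text1/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}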
filter query
Find documents where age > 15.
- gt / gte: greater than / greater than or equal to
- lt / lte: less than / less than or equal to
GET text1/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gt": 15
}
}
}
}
}
}
Multiple conditions can be combined to form a range:
GET text1/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gte": 5,
"lte": 18
}
}
}
}
}
}
Matching multiple terms
GET text2/_search
{
"query": {
"match": {
"tags": "女 唱"
}
}
}
Note: tags is an array. Separate multiple search terms with spaces; any document that matches at least one term is returned, and the more terms it matches, the higher its score.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0304216,
"hits" : [
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0304216,
"_source" : {
"name" : "万维网索王",
"tags" : [
"为歌",
"和牛",
"女"
]
}
},
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.4589591,
"_source" : {
"name" : "李永康大魔王",
"tags" : [
"唱歌",
"跳舞",
"宅男"
]
}
},
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.4589591,
"_source" : {
"name" : "最终搜索王",
"tags" : [
"唱歌",
"和牛",
"直男"
]
}
}
]
}
}
Exact-match queries (term)
A term query performs an exact lookup of the term in the inverted index.
Two behaviours with respect to analysis:
- term: queries the exact value directly, without analyzing it
- match: runs the query text through the analyzer first, then searches with the resulting terms
Two field types that behave differently:
- keyword
  - this type is not analyzed; the whole value is indexed as a single term
- text
  - this type is analyzed before indexing
Highlighting
POST /text1/_search
{
"query": {
"term": {
"address": "宝"
}
},"highlight": {
"fields": {
"address": {}
}
}
}
{
"took" : 89,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9227538,
"hits" : [
{
"_index" : "text1",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.9227538,
"_source" : {
"name" : "李永康",
"address" : "宝光路44号"
},
"highlight" : {
"address" : [
"<em>宝</em>光路44号"
]
}
}
]
}
}
Custom highlight tags can also be defined:
- pre_tags: the opening (prefix) tag
- post_tags: the closing (suffix) tag
POST /text1/_search
{
"query": {
"term": {
"address": "宝"
}
},"highlight": {
"pre_tags": "<p class='key'>",
"post_tags": "</p>",
"fields": {
"address": {}
}
}
}
term/match vs. keyword/text
- term query on a keyword field.
  term does not analyze the query value, and a keyword field is not analyzed either, so the value must match the stored keyword exactly and completely.
- term query on a text field.
  Because a text field is analyzed while term is not, the term value must exactly equal one of the tokens the text field was split into.
- match query on a keyword field.
  The match query text is analyzed, but the keyword field is not, so the query must still equal the complete keyword value to match.
- match query on a text field.
  Both the query text and the field are analyzed; the document matches as long as the two token sets share at least one term. A small sketch of these cases follows the list.
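A minimal sketch of the term cases, using a hypothetical term_demo index (the index, field names, and values here are illustrative, not from the notes above):
PUT /term_demo
{
  "mappings": {
    "properties": {
      "code": { "type": "keyword" },
      "desc": { "type": "text" }
    }
  }
}

PUT /term_demo/_doc/1
{
  "code": "ABC-001",
  "desc": "hello world"
}

# hits: the keyword value is indexed as the single exact term "ABC-001"
GET term_demo/_search
{
  "query": { "term": { "code": "ABC-001" } }
}

# no hit: the text field was analyzed into "hello" and "world", never "hello world"
GET term_demo/_search
{
  "query": { "term": { "desc": "hello world" } }
}

# hits: "hello" is one of the analyzed tokens
GET term_demo/_search
{
  "query": { "term": { "desc": "hello" } }
}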
Integrating with Spring Boot
Import the dependency
The Elasticsearch client version must match the version of the Elasticsearch server you are running (see the properties sketch after the dependency below).
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Write the client configuration class
@Configuration
public class ElasticSearchClientConfig {
@Bean
public RestHighLevelClient restHighLevelClient() {
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
);
return client;
}
}
Index-related APIs
Create an index
@Test
void createIndex() throws IOException {
CreateIndexRequest kang_index = new CreateIndexRequest("kang_index");
CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(kang_index, RequestOptions.DEFAULT);
System.out.println(createIndexResponse.index());
}
Check whether an index exists
@Test
void ExistIndex() throws IOException {
GetIndexRequest getIndexRequest = new GetIndexRequest("kang_index");
boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
System.out.println(exists);
}
Delete an index
@Test
void delIndex() throws IOException {
DeleteIndexRequest kang_index = new DeleteIndexRequest("kang_index");
AcknowledgedResponse delete = restHighLevelClient.indices().delete(kang_index, RequestOptions.DEFAULT);
System.out.println(delete.isAcknowledged());
}
Document-related APIs
Add a document
@Test
void createdDocument() throws IOException {
User user = new User("李永康", 19);
IndexRequest request = new IndexRequest("kang_index");
request.id("2");
request.timeout("1s");
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
System.out.println(index.status()); //CREATED
}
Check whether a document exists
@Test
void testExistDocument() throws IOException {
// build a GET request for document 1 in kang_index; only its existence is checked
GetRequest request = new GetRequest("kang_index", "1");
boolean exist = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
System.out.println("测试文档是否存在-----" + exist);
}
Get a document
@Test
void testGetDocument() throws IOException {
// fetch the specified document
GetRequest request = new GetRequest("kang_index", "1");
GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
System.out.println(documentFields.getSourceAsString());
}
Update a document
@Test
void testUpdateDocument() throws IOException {
UpdateRequest updateRequest = new UpdateRequest("kang_index", "1");
User user = new User("赵丽颖", 32);
updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
UpdateResponse update = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
System.out.println(update.status());
}
Delete a document
@Test
void testDeleteDocument() throws IOException {
DeleteRequest deleteIndexRequest = new DeleteRequest("kang_index", "2");
DeleteResponse delete = restHighLevelClient.delete(deleteIndexRequest, RequestOptions.DEFAULT);
System.out.println(delete.status());
}
Bulk-add documents
// test bulk insertion
@Test
void testBulkAddRequest() throws IOException {
ArrayList<User> users = new ArrayList<>();
users.add(new User("lyk1", 18));
users.add(new User("lyk2", 18));
users.add(new User("lyk3", 18));
users.add(new User("lyk4", 18));
BulkRequest bulkRequest = new BulkRequest();
for (int i = 0; i < users.size(); i++) {
bulkRequest.add(
new IndexRequest("kang_index").id("" + (i + 1))
.source(JSON.toJSONString(users.get(i)), XContentType.JSON)
);
}
BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
System.out.println(bulk.status());
}
Search for documents
@Test
void testQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("kang_index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // build the search criteria
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "lyk"); // match query on the name field
searchSourceBuilder.query(matchQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit documentFields : search.getHits().getHits()) {
System.out.println("测试查询文档--遍历参数--" + documentFields.getSourceAsMap());
}
}