3. Basic ES operations: querying, aggregation, and analysis

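Suppose we work at a company. The HR department asks us to build an employee directory that meets the following requirements:

  • It can hold several kinds of data, including numbers and plain text.
  • It can be used to look up employee information.
  • It supports structured search, for example finding employees older than 30.
  • It supports simple full-text search as well as more complex phrase search.
  • It highlights the matched keywords in search results.
  • It lets us analyze and manage the data with charts.

1. Inserting data

First, create a table (directory) to store the employee data.
In ES, the directory that stores data is called an index. Each index contains one or more types, and each document (record) belongs to exactly one type. Note that mapping types belong to older ES releases: they were deprecated in 7.x and removed in 8.0, so the requests in this article follow the 5.x/6.x-style API.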

The following table compares ES with a traditional relational database:

MySQL         ES
Database      Index (Indices)
Table         Type
Row           Document
Column        Field

Put simply, each ES cluster/node can contain multiple indices, each index can contain multiple types, each type holds many documents, and each document is made up of multiple fields.

By default, every field in a document is indexed (it gets its own inverted index); only indexed fields are searchable.
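To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a toy model and not how ES (Lucene) actually implements it: the whitespace tokenizer, the `build_inverted_index` helper, and the in-memory dictionary are illustrative assumptions. The sample strings are the `about` texts of the three sample employees used throughout this section.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


docs = {
    1: "I like to build cabinets",
    2: "I love to go rock climbing",
    3: "I like to collect rock albums",
}
index = build_inverted_index(docs)

# Looking up a term is a dictionary access, not a scan over every document.
print(sorted(index["rock"]))      # IDs of documents containing "rock"
print(sorted(index["cabinets"]))  # IDs of documents containing "cabinets"
```

The point of the structure is that the cost of a lookup depends on the term, not on how many documents are stored.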

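For the current requirement (an index that stores employee data), the design is as follows:

  • Create an index named alibaba in ES.
  • Create a type named employee in that index.
  • Under the employee type, each document corresponds to one employee record.

The following command does all of this in one step: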

PUT http://{host}:9200/alibaba/employee/1
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}

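The URL carries three pieces of information:

  1. Index name: alibaba
  2. Type name: employee
  3. Employee ID: 1

The request body holds the employee's details: name, age, interests, and so on.

Ps: In ES there is no need to create the index and the type separately beforehand. We can insert data directly, and if the index or type does not exist yet, it is created automatically.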
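As a rough sketch of how the same PUT request could be issued from a program, the Python snippet below builds it with the standard library only. The helper name `build_index_request` and the host `localhost` are assumptions made for illustration; the request is constructed but not sent, so the snippet runs without an ES node.

```python
import json
import urllib.request


def build_index_request(host, index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"http://{host}:9200/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}
# "localhost" is an assumed host; substitute the address of your own node.
request = build_index_request("localhost", "alibaba", "employee", 1, employee)
print(request.get_method(), request.full_url)

# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything is written to the cluster.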

Next, insert a few more records into the directory:

PUT http://{host}:9200/alibaba/employee/2
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

PUT http://{host}:9200/alibaba/employee/3
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}

2. Retrieving data
2.1 Looking up an employee by ID
GET /alibaba/employee/1

{
    "_index": "alibaba",
    "_type": "employee",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "first_name": "Douglas",
        "last_name": "Fir",
        "age": 35,
        "about": "I like to build cabinets",
        "interests": [
            "forestry"
        ]
    }
}

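The actual record is contained in the _source field of the returned JSON.
Ps: On this endpoint the HTTP method selects the operation: GET retrieves the document, DELETE deletes it, PUT creates or replaces it, and HEAD checks whether it exists.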

2.2 Retrieving all records
GET /alibaba/employee/_search

{
    "took": 78,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [
                        "forestry"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}

The hits array contains all three of our records.
Ps: By default, a search returns only the first 10 results.
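Because only the first 10 hits come back by default, larger result sets are usually read page by page using the `from` and `size` parameters of the search request body. The sketch below shows one way to compute them and to pull the documents out of a response; the helper names `page_body` and `sources` are hypothetical.

```python
def page_body(page, per_page=10):
    """Return a search body selecting the given 1-based page of results."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * per_page,  # number of hits to skip
        "size": per_page,               # number of hits to return
    }


def sources(response):
    """Pull the stored documents out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


print(page_body(1))               # first page: skip 0 hits, return 10
print(page_body(3, per_page=20))  # third page of 20: skip 40 hits
```

The body returned by `page_body` would be sent to the same `_search` endpoint as the request above.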

2.3 Conditional queries
GET /alibaba/employee/_search?q=first_name:Jane

{
    "took": 48,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}

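Besides simple conditions passed in the URL, ES provides a full query language (the Query DSL) that supports far more powerful and complex queries.
The DSL is sent as a JSON request body. For example, the previous search for employees whose first name contains Jane can be written in the DSL as follows: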

POST /alibaba/employee/_search
{
	"query": {
		"match": {
			"first_name": "Jane"
		}
	}
}

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}
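The query-string form and the request-body form of this search can be put side by side in a short Python sketch. The helper names `uri_search_path` and `match_body` are assumptions for illustration; the snippet only builds the path and the body and does not contact a server.

```python
import json
from urllib.parse import urlencode


def uri_search_path(index, doc_type, field, value):
    """Path for the query-string form: /index/type/_search?q=field:value."""
    # safe=":" keeps the colon readable instead of percent-encoding it.
    query = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"/{index}/{doc_type}/_search?{query}"


def match_body(field, value):
    """Request body for the DSL `match` query on a single field."""
    return {"query": {"match": {field: value}}}


print(uri_search_path("alibaba", "employee", "first_name", "Jane"))
print(json.dumps(match_body("first_name", "Jane"), indent=4))
```

For a single-term condition like this one, both forms return the same hit, as the two responses above show; the body form is the one that scales to nested and combined conditions.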
2.4 Complex queries

Find the employees whose last name contains Smith and who are older than 30:

POST /alibaba/employee/_search
{
	"query": {
		"bool": {
			"filter": {
				"range": {
					"age": {
						"gt": 30
					} 
				}
			},
			"must": {
				"match": {
					"last_name": "Smith"
				}
			}
		}
	}
}

{
    "took": 45,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}
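To see why only Jane Smith is returned, the logic of the bool query can be replayed locally over the three sample employees. This is a simplified stand-in written for illustration, not what ES executes: `matches_last_name` and `older_than` are hypothetical helpers, and relevance scoring is ignored.

```python
employees = [
    {"id": 1, "last_name": "Fir", "age": 35},
    {"id": 2, "last_name": "Smith", "age": 25},
    {"id": 3, "last_name": "Smith", "age": 32},
]


def matches_last_name(doc, text):
    """Rough stand-in for `match`: case-insensitive token comparison."""
    return text.lower() in doc["last_name"].lower().split()


def older_than(doc, years):
    """Stand-in for the `range` filter with `gt`: strictly greater than."""
    return doc["age"] > years


# Both the `must` clause and the `filter` clause have to hold for a hit.
# In ES only `must` contributes to _score; `filter` is a pure yes/no test.
hits = [d["id"] for d in employees
        if matches_last_name(d, "Smith") and older_than(d, 30)]
print(hits)  # [3]
```

Employee 1 passes the age filter but fails the name match, and employee 2 matches the name but fails the age filter, which leaves employee 3.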
2.5 Full-text queries
2.5.1 Loose (any-term) matching
POST /alibaba/employee/_search
{
	"query": {
		"match": {
			"about" : "rock climbing"
		}
	}
}

{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}
2.5.2 Exact phrase matching
POST /alibaba/employee/_search
{
	"query": {
		"match_phrase": {
			"about" : "rock climbing"
		}
	}
}

{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            }
        ]
    }
}
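The difference between the two results (two hits for `match`, one for `match_phrase`) comes down to how the query terms are combined. The Python sketch below reproduces that behaviour on the sample `about` texts under simplifying assumptions: whitespace tokenization, no scoring, and the hypothetical helpers `match` and `match_phrase`, which only imitate the ES queries of the same name.

```python
docs = {
    1: "I like to build cabinets",
    2: "I love to go rock climbing",
    3: "I like to collect rock albums",
}


def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def match(text, query):
    """True if the text shares at least one term with the query (OR semantics)."""
    return bool(set(tokens(text)) & set(tokens(query)))


def match_phrase(text, query):
    """True if the query terms appear consecutively and in the same order."""
    words, phrase = tokens(text), tokens(query)
    return any(words[i:i + len(phrase)] == phrase
               for i in range(len(words) - len(phrase) + 1))


query = "rock climbing"
print([i for i, t in docs.items() if match(t, query)])         # [2, 3]
print([i for i, t in docs.items() if match_phrase(t, query)])  # [2]
```

Document 3 contains "rock" but not "climbing", so it satisfies the any-term match (with a lower score in the real response) and fails the phrase match.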
2.5.3 Highlighting
POST /alibaba/employee/_search
{
	"query": {
		"match_phrase": {
			"about" : "rock climbing"
		}
	},
	"highlight": {
		"fields": {
			"about": {}
		}
	}
}

{
    "took": 105,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                },
                "highlight": {
                    "about": [
                        "I love to go <em>rock</em> <em>climbing</em>"
                    ]
                }
            }
        ]
    }
}
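The `highlight` section of the response wraps each matched term in `<em>` tags. A minimal Python sketch of that transformation is shown below; it is an approximation using a regular expression, not the ES highlighter, and the function name `highlight` with its `pre`/`post` parameters is an assumption for illustration.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of a query term in highlight tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```

The printed string is the same fragment that appears under `highlight.about` in the response above.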
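3. Analyzing data

Besides inserting and searching data, ES also provides another important capability: data analysis.
For example, ES can run complex statistical analyses over existing data by means of aggregations. This is similar to GROUP BY in MySQL, but considerably more powerful.

3.1 Aggregations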
POST /alibaba/employee/_search
{
	"aggs": {
		"all_interests": {
			"terms" : { 
				"field": "interests" 
			}
		}
	}
}

{
    "took": 125,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [
                        "forestry"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2
                },
                {
                    "key": "forestry",
                    "doc_count": 1
                },
                {
                    "key": "sports",
                    "doc_count": 1
                }
            ]
        }
    }
}

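Ps: In ES 5.x and later, aggregating on a text field relies on a separate data structure (fielddata) that is cached in memory and is disabled by default, so it must be enabled explicitly. In other words, the following command has to be run first, before the aggregation above will work: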

PUT /alibaba/_mapping/employee
{
    "properties": {
        "interests": {
            "type": "text",
            "fielddata": true
        }
    }
}
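Conceptually, the terms aggregation in 3.1 counts, for each distinct value of `interests`, how many documents contain it. The following Python sketch reproduces the buckets from the response on the three sample employees. It is an in-memory illustration only; the helper `terms_buckets` and its tie-breaking rule are assumptions, not a description of ES internals.

```python
from collections import Counter

employees = [
    {"first_name": "Douglas", "interests": ["forestry"]},
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        # set() so a value repeated inside one document is counted once
        counts.update(set(doc.get(field, [])))
    # Highest count first; ties broken alphabetically here for a stable order
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


for bucket in terms_buckets(employees, "interests"):
    print(bucket)
```

Each printed dictionary corresponds to one entry of the `buckets` array: music with 2 documents, then forestry and sports with 1 each.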
3.2 Aggregations with a filter condition
POST /alibaba/employee/_search
{
	"query": {
		"match": {
			"last_name": "Smith"
		}
	},
	"aggs": {
		"all_interests": {
			"terms" : { 
				"field": "interests" 
			}
		}
	}
}

{
    "took": 13,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2
                },
                {
                    "key": "sports",
                    "doc_count": 1
                }
            ]
        }
    }
}
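When a request carries both `query` and `aggs`, the aggregation is computed only over the documents that the query matched, which is why forestry is missing from the buckets above. A small Python sketch of the two steps follows, using an exact, case-insensitive comparison as a simplified stand-in for the `match` query.

```python
from collections import Counter

employees = [
    {"last_name": "Fir", "interests": ["forestry"]},
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
]

# Step 1: the query narrows the document set (stand-in for `match` on last_name).
matched = [e for e in employees if e["last_name"].lower() == "smith"]

# Step 2: the aggregation runs only over the documents that matched.
counts = Counter(i for e in matched for i in set(e["interests"]))
print(counts.most_common())  # [('music', 2), ('sports', 1)]
```

Douglas Fir is dropped in step 1, so his interest never reaches the counting step.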
3.3 Aggregations with statistics
POST /alibaba/employee/_search
{
	"aggs": {
		"all_interests": {
			"terms" : { 
				"field": "interests" 
			},
			"aggs": {
				"avg_age": {
					"avg": {
						"field": "age"
					}
				}
			}
		}
	}
}

{
    "took": 35,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [
                        "forestry"
                    ]
                }
            },
            {
                "_index": "alibaba",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2,
                    "avg_age": {
                        "value": 28.5
                    }
                },
                {
                    "key": "forestry",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 35
                    }
                },
                {
                    "key": "sports",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 25
                    }
                }
            ]
        }
    }
}
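The nested `avg` aggregation computes a metric inside each bucket of the outer `terms` aggregation. The arithmetic behind the `avg_age` values can be checked with a few lines of Python; `average_age_by_interest` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

employees = [
    {"age": 35, "interests": ["forestry"]},
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
]


def average_age_by_interest(docs):
    """Group documents by interest, then average `age` inside each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values)
            for interest, values in ages.items()}


result = average_age_by_interest(employees)
for interest in sorted(result):
    print(interest, result[interest])
```

For music the two ages are 25 and 32, giving (25 + 32) / 2 = 28.5, which agrees with the `avg_age` value in the response.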

Note: the demo data in this article comes from https://www.missshi.cn/api/view/blog/5aa120f25b925d1040000003
