https://endymecy.gitbooks.io/elasticsearch-guide-chinese/content/java-api/index-api.html
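ES concepts:
(1) A database (DataBase) in a relational database corresponds to an index (Index) in ES.
(2) A database contains N tables (Table); correspondingly, one Index contains N types (Type).
(3) The data in a table (Table) consists of rows (Row) and columns (Column, i.e. attributes); correspondingly, a Type consists of multiple documents (Document), each made up of multiple fields (Field).
(4) In a relational database, the schema defines the tables, the fields of each table, and the relationships between tables and fields. Correspondingly, in ES the Mapping defines how the fields of the types under an index are handled: how the index is built, the index type, whether the original JSON document is stored, whether the original JSON document is compressed, whether the field is analyzed (tokenized), and how it is analyzed.
(5) The database operations insert, delete, update, and search correspond in ES to PUT/POST (create), DELETE (delete), _update (update), and GET (query).
Within an index/type you can store as many documents as you like. Note that a document physically lives in an index, but every document must be indexed under (assigned to) a type of that index.
Shards: comparable to splitting a relational database across multiple databases and tables.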
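The CRUD correspondence in item (5) can be sketched with the JDK's own java.net.http request builder (Java 11+). This is only an illustration under stated assumptions: the host localhost:9200, the index name twitter, the type name tweet, and the document id 1 are all made up, and the requests are built but never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

public class EsCrudRequests {
    // Assumed address of a local node; adjust to your cluster.
    static final String BASE = "http://localhost:9200";

    // Build (but do not send) a request; json may be null when no body is needed.
    static HttpRequest request(String method, String path, String json) {
        HttpRequest.BodyPublisher body =
                json == null ? BodyPublishers.noBody() : BodyPublishers.ofString(json);
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, body)
                .build();
    }

    // insert -> PUT a document under index/type/id (hypothetical names)
    static HttpRequest create() {
        return request("PUT", "/twitter/tweet/1", "{\"title\":\"elasticsearch\"}");
    }

    // update -> POST a partial document to the _update endpoint
    static HttpRequest update() {
        return request("POST", "/twitter/tweet/1/_update", "{\"doc\":{\"title\":\"es\"}}");
    }

    // search/read -> GET the document by id
    static HttpRequest read() {
        return request("GET", "/twitter/tweet/1", null);
    }

    // delete -> DELETE the document by id
    static HttpRequest delete() {
        return request("DELETE", "/twitter/tweet/1", null);
    }

    public static void main(String[] args) {
        for (HttpRequest r : new HttpRequest[] {create(), update(), read(), delete()}) {
            System.out.println(r.method() + " " + r.uri().getPath());
        }
    }
}
```

The paths follow the index/type/id layout used throughout these notes; a cluster that no longer uses types would need different paths.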
// Prepare the search request
SearchRequestBuilder searchRequestBuilder = this.prepareSearch();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// must: the clause has to match, similar to "=" in a SQL WHERE condition
boolQueryBuilder.must(QueryBuilders.termQuery("is_delete", 0));
// mustNot: the clause must not match, similar to "!=" in a SQL WHERE condition
// (an alternative to the must clause above; use one or the other, not both)
boolQueryBuilder.mustNot(QueryBuilders.termQuery("is_delete", 0));
// termsQuery matches any of the given values, similar to SQL IN
boolQueryBuilder.must(QueryBuilders.termsQuery(RecybagReportConstant.COL_SITE_CODE, siteCodes));
// should: optional, OR-style clause
boolQueryBuilder.should(QueryBuilders.termsQuery(RecybagReportConstant.COL_SITE_CODE, siteCodes));
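// Filter on a time range
RangeQueryBuilder timeBuilder = QueryBuilders.rangeQuery("time"); // "time" stands for the name of the ES time field
// Set the start time
timeBuilder.from(DateUtil.format(startTime, DateUtil.FORMAT_DATE_TIME));
// Set the end time
timeBuilder.to(DateUtil.format(endTime, DateUtil.FORMAT_DATE_TIME));
// Attach the range to the bool query so that it takes effect
boolQueryBuilder.filter(timeBuilder);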
// Group by site code (like SQL GROUP BY); size is set to the number of site codes so that every site gets a bucket
AggregationBuilder aggregationBuilder = AggregationBuilders.terms(RecybagReportConstant.COL_SITE_CODE).field(RecybagReportConstant.COL_SITE_CODE)
.size(siteCodes.size());
// Under each site bucket, group by stock status; the doc count of each sub-bucket acts like SQL COUNT
aggregationBuilder.subAggregation(AggregationBuilders.terms(RecybagReportConstant.COL_STORE_STATUS).field(RecybagReportConstant.COL_STORE_STATUS));
// topHits returns the top matching documents of each bucket; size(1) keeps only the first one
TopHitsAggregationBuilder topBuilder =
AggregationBuilders.topHits("top").explain(true).size(1);
// Read the terms buckets from the response (searchResponse is the result of the executed search)
Terms groupTerms = searchResponse.getAggregations().get(typeEnum.getEsField());
// Inside the loop, first fetch the top_hits result of each bucket with bucket.getAggregations().get("top")
for (Terms.Bucket bucket : groupTerms.getBuckets()) {
    OperateReportResDto operateReportResDto = new OperateReportResDto();
    TopHits topHits = bucket.getAggregations().get("top");
    log.info("topHits.getHits().getHits().length:{}", topHits.getHits().getHits().length);
    if (topHits.getHits().getHits().length > 0) {
        SearchHit hit = topHits.getHits().getHits()[0];
        log.info("hit.getSourceAsMap():{}", hit.getSourceAsMap().toString());
        try {
            this.convertToBean(operateReportResDto, hit.getSourceAsMap());
        } catch (Exception e) {
            // Pass the exception as the last argument so the stack trace is logged
            log.error("Failed to convert ES hit to bean", e);
        }
    }
    ReflectionUtils.setFieldValue(operateReportResDto, typeEnum.getEsField(), bucket.getKeyAsString());
}
// Register topBuilder as a sub-aggregation (before the search is executed) so that each bucket carries the "top" result
aggregationBuilder.subAggregation(topBuilder);
// Set the query and the aggregation on the search request
searchRequestBuilder.setQuery(boolQueryBuilder);
searchRequestBuilder.addAggregation(aggregationBuilder);
// Add a sort condition
FieldSortBuilder sortBuilder = SortBuilders.fieldSort("orderTime")
.order(SortOrder.DESC);
searchRequestBuilder.addSort(sortBuilder);
// cardinality returns the number of distinct values of the field, i.e. how many different kinds there are
CardinalityAggregationBuilder skuCount = AggregationBuilders.cardinality("skuCount").field("sku").precisionThreshold(Constants.precisionThreshold);
// Return no hits (size 0) and execute the search
SearchResponse searchResponse = searchRequestBuilder.setSize(0).execute().actionGet();
Aggregations aggregations = searchResponse.getAggregations();
// AggregationBuilders.sum adds up a field over the documents in each group
SumAggregationBuilder expectNumCount = AggregationBuilders.sum("expectNumCount").field("expectNum");
// Aggregation named "orders": counts the number of values of the order_id field
ValueCountAggregationBuilder valueCountAggregationBuilder = AggregationBuilders.count("orders")
.field("order_id");
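ES limits from + size: their sum may not exceed 10,000 (the default value of index.max_result_window). When more than 10,000 results are needed, use a scroll cursor instead of from + size.
1. First send a scroll POST request with the parameter scroll=1m (1m means the scroll context is kept alive for 1 minute)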
POST /twitter/_search?scroll=1m
{
"size": 100,
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}
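This step returns a _scroll_id
2. Use the _scroll_id from step 1 to fetch the next page; keep repeating this request until no more hits come back, and you will have retrieved all the data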
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA3okgWbkYzT1lBcjRUS0NmbkRnclY3bElmUQ=="
}
3. When paging with scrapy, the request is likely to be treated as a duplicate and dropped because the _scroll_id has not changed, so be sure to set dont_filter=True
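The loop behind steps 1 and 2 can be sketched in plain Java. FakeScrollServer below is a hypothetical in-memory stand-in for the two REST calls, not an Elasticsearch client; it deliberately returns the same scroll id on every page, which is the situation step 3 warns about. The point is only the control flow: issue the initial search, then keep scrolling until a page comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollLoopSketch {
    // One page of results plus the cursor to send with the next request.
    static final class Page {
        final String scrollId;
        final List<Integer> hits;

        Page(String scrollId, List<Integer> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Hypothetical in-memory stand-in for the server side of the scroll API.
    static final class FakeScrollServer {
        private final List<Integer> docs = new ArrayList<>();
        private final int size;
        private int offset = 0;

        FakeScrollServer(int totalDocs, int size) {
            for (int i = 0; i < totalDocs; i++) {
                docs.add(i);
            }
            this.size = size;
        }

        // Plays the role of POST /twitter/_search?scroll=1m
        Page search() {
            offset = 0;
            return nextPage();
        }

        // Plays the role of POST /_search/scroll with the given scroll_id
        Page scroll(String scrollId) {
            return nextPage();
        }

        private Page nextPage() {
            int end = Math.min(offset + size, docs.size());
            List<Integer> hits = new ArrayList<>(docs.subList(offset, end));
            offset = end;
            return new Page("fake-scroll-id", hits); // the id does not change between pages
        }
    }

    // Collect every document by scrolling until an empty page is returned.
    static List<Integer> fetchAll(FakeScrollServer server) {
        List<Integer> all = new ArrayList<>();
        Page page = server.search();
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = server.scroll(page.scrollId);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> all = fetchAll(new FakeScrollServer(250, 100));
        System.out.println("fetched " + all.size() + " documents");
    }
}
```

Stopping on an empty page rather than on a changed scroll id keeps the loop correct even when the id stays the same from page to page.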
@Test
public void searchApi2() throws IOException {
SearchRequest searchRequest = new SearchRequest("item");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
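// Required match condition
boolQueryBuilder.must(QueryBuilders.matchQuery("scompCode", "G0000001"));
// Wildcard (fuzzy-style) match; the search term "手机" means "mobile phone"
boolQueryBuilder.filter(QueryBuilders.wildcardQuery("itemDesc", "*手机*"));
// Range queries: from/to are inclusive by default (closed interval); gt is exclusive (>), gte inclusive (>=), lt exclusive (<), lte inclusive (<=)
boolQueryBuilder.filter(QueryBuilders.rangeQuery("itemPrice").from(4500).to(8899));
// Open interval: 4500 < itemPrice < 8899
boolQueryBuilder.filter(QueryBuilders.rangeQuery("itemPrice").gt(4500).lt(8899));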
sourceBuilder.query(boolQueryBuilder);
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
logger.info("Search hits: {}", Arrays.toString(searchResponse.getHits().getHits()));
}
/**
 * Query the cardinality (number of distinct values) of a field
 */
@Test
public void cardinality(){
SearchResponse response = client.prepareSearch(indexName).setTypes(typeName)
.addAggregation(AggregationBuilders.cardinality("userAgg").field("user.keyword"))
.get();
Cardinality userAgg = response.getAggregations().get("userAgg");
System.out.println(userAgg.getValue());
}
CardinalityAggregationBuilder nameCount = AggregationBuilders.cardinality("name").field("name").precisionThreshold(Constants.precisionThreshold);
SumAggregationBuilder expectnum = AggregationBuilders.sum("numCount").field("expectNum");
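When logging, a BoolQueryBuilder can be printed directly via toString(), and a SearchResponse can be printed directly as well.
cardinality counts the distinct values of a field (a de-duplicated count); sum adds the values up.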
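To make the difference concrete, here is a small in-memory sketch of what the two aggregations compute. It uses plain Java streams rather than the Elasticsearch client, and the sample values are invented. Note one assumption baked into it: this count is exact, whereas the cardinality aggregation in Elasticsearch is approximate, with precisionThreshold controlling the trade-off between accuracy and memory.

```java
import java.util.Arrays;
import java.util.List;

public class CardinalityAndSum {
    // What "cardinality" measures: the number of distinct values of a field.
    static long cardinality(List<String> values) {
        return values.stream().distinct().count();
    }

    // What "sum" measures: the total of a numeric field.
    static long sum(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> skus = Arrays.asList("A1", "B2", "A1", "C3"); // sample "sku" values
        List<Integer> expectNums = Arrays.asList(5, 3, 2, 10);     // sample "expectNum" values
        System.out.println("distinct sku values: " + cardinality(skus));
        System.out.println("sum of expectNum: " + sum(expectNums));
    }
}
```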
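must: AND relation, equivalent to and in a relational database.
should: OR relation, equivalent to or in a relational database.
must_not: NOT relation, equivalent to not in a relational database.
filter: filter condition; the clause must match but does not contribute to the score.
range: restricts matches to a range of values.
gt: greater than, equivalent to > in a relational database.
gte: greater than or equal to, equivalent to >= in a relational database.
lt: less than, equivalent to < in a relational database.
lte: less than or equal to, equivalent to <= in a relational database.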
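The AND/OR/NOT reading of the glossary above can be sketched with plain Java predicates. This is a simplified model, not the Elasticsearch implementation: clauses are reduced to exact equality or an inclusive integer range, scoring is ignored, and the field names and sample documents are invented. One caveat worth knowing: in Elasticsearch, should clauses become optional (they only affect scoring) once the bool query also contains must or filter clauses, unless minimum_should_match is set; the sketch makes that explicit with a minimumShouldMatch parameter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BoolQuerySketch {
    /** A single clause: a test applied to one document (field name -> value). */
    interface Clause extends Predicate<Map<String, Object>> {}

    // Simplified "term" clause: exact equality on one field.
    static Clause term(String field, Object value) {
        return doc -> value.equals(doc.get(field));
    }

    // Simplified "range" clause with gte/lte, i.e. both bounds inclusive.
    static Clause rangeClosed(String field, int gte, int lte) {
        return doc -> {
            Object v = doc.get(field);
            return v instanceof Integer && (Integer) v >= gte && (Integer) v <= lte;
        };
    }

    // must = all match (AND), must_not = none match (NOT),
    // should = at least minimumShouldMatch clauses match (OR when it is 1).
    static boolean matches(Map<String, Object> doc, List<Clause> must, List<Clause> should,
                           List<Clause> mustNot, int minimumShouldMatch) {
        boolean allMust = must.stream().allMatch(c -> c.test(doc));
        boolean noMustNot = mustNot.stream().noneMatch(c -> c.test(doc));
        long shouldHits = should.stream().filter(c -> c.test(doc)).count();
        return allMust && noMustNot && shouldHits >= minimumShouldMatch;
    }

    // Example query: age in [20, 25] AND (name = khue OR department = coder) AND NOT gender = male.
    static boolean example(Map<String, Object> doc) {
        List<Clause> must = Collections.singletonList(rangeClosed("age", 20, 25));
        List<Clause> should = Arrays.asList(term("name", "khue"), term("department", "coder"));
        List<Clause> mustNot = Collections.singletonList(term("gender", "male"));
        return matches(doc, must, should, mustNot, 1);
    }

    // Helper that builds a sample document.
    static Map<String, Object> doc(String name, String department, int age, String gender) {
        Map<String, Object> d = new HashMap<>();
        d.put("name", name);
        d.put("department", department);
        d.put("age", age);
        d.put("gender", gender);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(example(doc("khue", "ops", 22, "female")));
        System.out.println(example(doc("khue", "coder", 30, "female")));
    }
}
```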
Online resources
Java API:
https://endymecy.gitbooks.io/elasticsearch-guide-chinese/content/java-api/index-api.html
Batch queries in ES and bulk create/delete/update with _bulk
https://www.jianshu.com/p/5317d11e9f0b
Learning Elasticsearch? Read this one first!
https://blog.csdn.net/laoyang360/article/details/52244917
Changing a mapping field type in ES
https://www.cnblogs.com/royfans/p/11436395.html
Rebuilding an index in ES with Reindex (data migration)
https://www.cnblogs.com/Ace-suiyuan008/p/9985249.html
Writing, viewing, and modifying Elasticsearch index mappings
https://my.oschina.net/LucasZhu/blog/1574964
Mapping: store defaults to false, index defaults to true
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
ES index alias operations
https://blog.csdn.net/u010454030/article/details/79719019
An in-depth guide to running Elasticsearch in production
https://mp.weixin.qq.com/s/j3bvSBeNLBsRENmKxkY5Uw
Difference between query and filter in ES
https://blog.csdn.net/qq_29580525/article/details/80908523
Deep pagination in ES
https://cf.jd.com/pages/viewpage.action?pageId=270683055
ES distributed architecture and underlying principles
https://segmentfault.com/a/1190000015256970
Elasticsearch (ES) cluster architecture (with many diagrams)
https://www.jianshu.com/p/b1724c49d7c9
How ES search works (the inverted-index part is explained especially clearly)
https://www.yuque.com/see-engineering/ygoczq/uuku0d
Elasticsearch: What It Is, How It Works, And What It’s Used For
https://www.knowi.com/blog/what-is-elastic-search/
Spring Data Elasticsearch - Reference Documentation
https://docs.spring.io/spring-data/elasticsearch/docs/4.1.5/reference/html/#preface
ES data write process
https://www.jianshu.com/p/2cd9f1136eda
Paged search with from + size
SearchRequestBuilder searchRequestBuilder = this.prepareSearch();
SearchResponse searchResponse = searchRequestBuilder.setQuery(queryParam).setFrom(offset).setSize(rows).setExplain(false)
.execute().actionGet();
Add fields to the type recybag_report in the index recybag_report
PUT /recybag_report/_mapping/recybag_report
{
"properties": {
"provinceOrgCode":{
"type": "keyword"
},
"provinceOrgName":{
"type": "keyword"
},
"sliceOrgCode":{
"type": "keyword"
},
"sliceOrgName":{
"type": "keyword"
},
"partitionOrgCode":{
"type": "keyword"
},
"partitionOrgName":{
"type": "keyword"
}
}
}
Delete all data under index recybag_report, type recybag_report
POST /recybag_report/recybag_report/_delete_by_query
{
"query": {"match_all": {}}
}
Query the index group_order2021_03
GET /group_order2021_03/_search
{
"query": {
"match_all": {}
}
}
Compound search with multiple conditions
GET test_index/_search
{
"query":{
"bool":{
"should":[
{"match":{
"name":"khue"
}},
{"match":{
"department":"coder"
}}
],
"must":[
{"range":{
"age":{
"gte":20,
"lte":25
}
}}
],
"must_not":[
{"match":{
"gender":"男性"
}}
]
}
}
}
View the index mapping
GET /order/_mapping/type
Aggregation query
GET group_order/_search
{
"size": 20,
"query": {
"bool": {
"must": [
{
"term": {
"test": {
"value": "0"
}
}
}
]
}
}
,
"aggs": {
"siteCode": {
"terms": {
"field": "siteCode",
"size": 10
},
"aggs": {
"belongCityCode": {
"value_count": {
"field": "belongCityCode"
}
}
}
}
}
}
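What the aggregation part of this request computes can be sketched in memory with Java streams: one bucket per siteCode, and inside each bucket a value_count of belongCityCode, that is, the number of documents in the bucket that actually carry a value for that field. The sample rows are invented, and the sketch leaves out the term filter on test and the "size": 10 cap on the number of buckets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TermsValueCountSketch {
    // Each row is {siteCode, belongCityCode}; belongCityCode is null when the field is missing.
    static Map<String, Long> valueCountBySite(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row[0],                                           // terms bucket key: siteCode
                TreeMap::new,                                            // sorted only for stable printing
                Collectors.summingLong(row -> row[1] != null ? 1L : 0L)  // value_count of belongCityCode
        ));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"S001", "C10"},
                new String[] {"S001", "C11"},
                new String[] {"S002", null},
                new String[] {"S002", "C10"});
        valueCountBySite(rows).forEach((site, count) ->
                System.out.println(site + " -> " + count));
    }
}
```

A document without belongCityCode still belongs to its siteCode bucket but adds nothing to the value_count, which is why the sketch sums 0 for it instead of filtering the row out.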