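I. What is ES
Search & Analyze Data in Real Time
The core function of Elasticsearch (ES) is search. It is a full-text search framework and a powerful near-real-time search engine built on Lucene; newly indexed or modified documents become searchable almost immediately.
Advantages:
1. Distributed, horizontally scalable, and highly available
2. Near-real-time search with built-in text analysis (tokenization)
3. A powerful RESTful API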
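II. Use case
With terabytes of data, you need full-text search that returns the matching results in real time.
For example, a user types a combination of keywords into a single search box and immediately gets back a list of the best matches. Suppose the index holds a very large number of products (terabyte scale) and we search the title field with the keyword combination 火锅 辣 ("hotpot", "spicy"). The results look like this:
1. 【通州区】麻合辣重庆九宫格火锅
2. 【平谷城区】北京嗨辣激情火锅
The analyzer splits a title such as 【通州区】麻合辣重庆九宫格火锅 into the tokens [通,州,区,麻,合,辣,重,庆,九,宫,格,火,锅]. ES then matches the query tokens against them, gives each matching document a relevance score, sorts by that score, and returns the best matches.
For more about analyzers, see the official documentation.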
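The flow described above (split the title into tokens, match the query tokens, score, sort) can be sketched in plain Java. This is only a toy model: the class name TokenScoreSketch, the per-character tokenizer, and the score (the number of distinct query tokens found in the title) are assumptions made for illustration. Lucene's real relevance scoring is considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: not how Lucene actually analyzes or scores text.
class TokenScoreSketch {

    // Split text into single-character tokens, dropping punctuation and spaces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Score = number of distinct query tokens that also occur in the title.
    static int score(String query, String title) {
        Set<String> titleTokens = new LinkedHashSet<>(tokenize(title));
        Set<String> queryTokens = new LinkedHashSet<>(tokenize(query));
        int hits = 0;
        for (String token : queryTokens) {
            if (titleTokens.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    // Return the titles with a non-zero score, best match first.
    static List<String> search(String query, List<String> titles) {
        List<String> matched = new ArrayList<>();
        for (String title : titles) {
            if (score(query, title) > 0) {
                matched.add(title);
            }
        }
        matched.sort(Comparator.comparingInt((String t) -> score(query, t)).reversed());
        return matched;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
            "【通州区】麻合辣重庆九宫格火锅",
            "【平谷城区】北京嗨辣激情火锅",
            "标准大床房");
        for (String title : search("火锅 辣", titles)) {
            System.out.println(score("火锅 辣", title) + "  " + title);
        }
    }
}
```

Both hotpot titles contain 火, 锅 and 辣, so in this toy model they score the same, while the room title matches no query token and is dropped from the result list.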
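III. Installation (single node)
https://www.elastic.co/downloads/elasticsearch
After downloading, unpack the archive and change into the bin directory.
Run ./elasticsearch
Output like that in the screenshot above means the node started successfully.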
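IV. ES terminology
ES introduces a number of new terms, such as node, document, index, type and id. Understanding them is the first step.
node: a single server that is part of the cluster
index: a collection of documents that have somewhat similar characteristics
type: within an index you can define one or more types; a type is a logical category of the data in that index
document: the basic unit of information that can be indexed
Compared with MySQL, the concepts line up as follows:
mysql: db - table - row
es: index - type - document (identified by its id)
A MySQL database corresponds to an ES index, a table to a type, and a row to a document, which is identified by its id.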
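To make the comparison concrete: in the ES 2.x line used in this article, a single document is addressed over REST by the triple index/type/id, much like database/table/primary key. The helper below only builds that path. The class and method names are hypothetical, and bank/account/1 is just sample input borrowed from the bulk example in the RESTful API section.

```java
// Hypothetical helper: shows how index, type and id combine into a REST path.
class DocumentPathSketch {

    // Build "/{index}/{type}/{id}", the per-document path in ES 2.x.
    static String documentPath(String index, String type, String id) {
        if (index.isEmpty() || type.isEmpty() || id.isEmpty()) {
            throw new IllegalArgumentException("index, type and id must be non-empty");
        }
        return "/" + index + "/" + type + "/" + id;
    }

    public static void main(String[] args) {
        // Comparable to: database "bank", table "account", row with primary key 1.
        System.out.println(documentPath("bank", "account", "1"));
    }
}
```

A GET request to localhost:9200 with that path would then fetch the document, in the same way a SELECT by primary key fetches one row.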
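V. RESTful API
https://github.com/bly2k/files/blob/master/accounts.zip?raw=true (1,000 sample JSON documents for bulk loading)
Extract the archive, put accounts.json in the directory you run the command from, and enter:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
This loads the 1,000 documents into the account type of the bank index.
_bulk is the batch-processing endpoint.
?pretty asks for a pretty-printed JSON response.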
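For reference, the body that _bulk expects is newline-delimited JSON: one action line followed by one source line per document, with every line (including the last) terminated by a newline. The sketch below assembles such a body in plain Java. The class name, the Account holder and the sample values are made up for illustration, and real code should use a JSON library rather than string concatenation.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: builds a _bulk request body by hand.
class BulkBodySketch {

    // Minimal stand-in for one account record (hypothetical fields).
    static final class Account {
        final int accountNumber;
        final int balance;

        Account(int accountNumber, int balance) {
            this.accountNumber = accountNumber;
            this.balance = balance;
        }
    }

    static String bulkBody(List<Account> accounts) {
        StringBuilder body = new StringBuilder();
        for (Account a : accounts) {
            // Action line: index this document under the given _id.
            body.append("{\"index\":{\"_id\":\"").append(a.accountNumber).append("\"}}\n");
            // Source line: the document itself, on a single line.
            body.append("{\"account_number\":").append(a.accountNumber)
                .append(",\"balance\":").append(a.balance).append("}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(new Account(1, 100), new Account(2, 200));
        System.out.print(bulkBody(accounts));
    }
}
```

The index and type are not repeated in each action line here because they are already given in the URL (bank/account/_bulk).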
The following are examples of the various query syntaxes.
Pagination:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match_all": {} }, "from": 10, "size": 10 }'
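Here from is a zero-based offset into the result list and size is the page length, so the request above returns hits 11 to 20, that is, the second page of ten. A small helper that converts a 1-based page number into these two values might look like the following; the class and method names are hypothetical.

```java
// Illustrative sketch: maps a 1-based page number to the from/size parameters.
class PagingSketch {

    // Offset of the first hit on the given page.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Request body for a match_all search restricted to one page.
    static String matchAllPage(int page, int size) {
        return "{ \"query\": { \"match_all\": {} }, \"from\": " + fromOffset(page, size)
            + ", \"size\": " + size + " }";
    }

    public static void main(String[] args) {
        System.out.println(matchAllPage(2, 10));
    }
}
```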
Sorting:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{ "query": { "match_all": {} }, "sort": { "balance": { "order": "desc" } } }'
Return only selected fields, specified in _source:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": {"account_number":20} }, "_source": ["account_number", "email"] }'
Full-text match on the address field (matches documents whose address contains "mill" or "lane"):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": { "address": "mill lane" } } }'
Combined (bool) query, where the address must contain both "mill" and "lane":
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
Range filter (balance between 20000 and 30000, inclusive):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }'
Aggregation, similar to GROUP BY in SQL:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" } } } }'
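Conceptually, the terms aggregation buckets documents by the value of state and counts each bucket, comparable to SELECT state, COUNT(*) ... GROUP BY state. The in-memory Java sketch below mimics only that counting step on made-up sample values. It does not talk to ES, it does not reproduce how ES orders or truncates the buckets, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative sketch: an in-memory "group by state, count" over sample values.
class GroupByStateSketch {

    // One bucket per distinct state, holding the number of documents in it.
    static Map<String, Long> countByState(List<String> states) {
        return states.stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("TX", "CA", "TX", "NY", "TX");
        countByState(states).forEach((state, count) -> System.out.println(state + ": " + count));
    }
}
```

Setting "size": 0 in the request plays the same role as selecting only the grouped columns: the response carries the buckets without the individual hits.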
For more details on the RESTful API, see the official documentation.
VI. Java client
1. Add the dependency with Maven
<dependency> <groupId>org.elasticsearch</groupId> <artifactId>elasticsearch</artifactId> <version>2.4.0</version> </dependency>
2. Indexing documents
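import java.net.InetAddress;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
public class elasticSearch_local {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_local.class);
    private static Random r = new Random();
    static int[] typeConstant = new int[]{0,1,2,3,4,5,6,7,8,9,10};
    static String[] roomTypeNameConstant = new String[]{"标准大床房","标准小床房","豪华大房","主题情侣房间"};
    public static void main(String[] agre) throws Exception {
        //http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // on startup
        // Initialize the client and connect to the local ES node on port 9300
        TransportClient client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            // Index one document: first argument is the index, second is the type, source is the document body
            IndexResponse response = client.prepareIndex("hotel", "room")
                .setSource(getEsDataString())
                .get();
        }
        logger.info(" run 1000 index consume time : " + (System.currentTimeMillis() - startTime));
        // on shutdown: release the client's resources
        client.close();
    }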
    public static XContentBuilder getEsDataString() throws Exception {
        SimpleDateFormat sp = new SimpleDateFormat("yyyy-MM-dd");
        Date d = new Date();
        int offset = r.nextInt(15);
        // ES's native API builds the JSON document: jsonBuilder().startObject().field(key, value).endObject()
        // The L suffix forces long arithmetic; as plain int, 864000008*offset overflows once offset >= 3
        XContentBuilder object = jsonBuilder()
            .startObject().field("gmtCreate", (System.currentTimeMillis()-(864000008L*offset))+"").field("gmtModified",(System.currentTimeMillis()-(864000008L*offset))+"")
            .field("sourceType",typeConstant[r.nextInt(10)]+"").field("partnerId",r.nextInt(999999999)+"").field("poiId",r.nextInt(999999999)+"")
            .field("roomType",r.nextInt(999999999)+"").field("roomName",roomTypeNameConstant[r.nextInt(4)]).field("bizDay",r.nextInt(999999999)+"")
            .field("status",typeConstant[r.nextInt(10)]+"").field("freeCount",r.nextInt(99999)+"").field("soldPrice",r.nextInt(99999)+"")
            .field("marketPrice",r.nextInt(99999)+"").field("ratePlanId",r.nextInt(99999)+"").field("accessCode",r.nextInt(999999999)+"")
            .field("basePrice",r.nextInt(999999999)+"").field("memPrice",r.nextInt(999999999)+"").field("priceCheck",typeConstant[r.nextInt(10)]+"")
            .field("shardPart",typeConstant[r.nextInt(10)]+"").field("sourceCode",typeConstant[r.nextInt(10)]+"").field("realRoomType",r.nextInt(999999999)+"")
            .field("typeLimitValue",typeConstant[r.nextInt(10)]+"").field("openInventoryByAccessCodeList","").field("closeInventoryByAccessCodeList","")
            .field("openOrClose","1").field("openInventoryByAccessCodeListSize",r.nextInt(999999999)+"").field("openInventoryByAccessCodeListIterator",r.nextInt(999999999)+"")
            .field("closeInventoryByAccessCodeListSize",r.nextInt(999999999)+"").field("closeInventoryByAccessCodeListIterator",r.nextInt(999999999)+"")
            .field("datetime", sp.format(d))
            .endObject();
        return object;
    }
}
3. Query code
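import java.net.InetAddress;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
public class elasticSearch_formeituanSearch {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_formeituanSearch.class);
    public static void main(String[] agre) throws Exception {
        //http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // on startup
        // Connect to the cluster and initialize the client
        TransportClient client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        /*QueryBuilder queryBuilder = QueryBuilders
            .disMaxQuery()
            .add(QueryBuilders.termQuery("roomName", "豪华大床"))
            .add(QueryBuilders.termQuery("status", "0"));*/
        // Query condition: use matchQuery when matching text. termQuery is for exact matches
        // (numbers, long values); a term query does not analyze its input.
        QueryBuilder qb = boolQuery().must(matchQuery("roomName", "豪华大房"));
        /* QueryBuilder qb = boolQuery()
            .must(matchQuery("roomName", "豪华大房"))
            .must(matchQuery("status", "0"))
            .must(matchQuery("sourceCode", "4"))
            .must(matchQuery("typeLimitValue", "5"))
            .must(matchQuery("soldPrice", "11673"));*/
        SearchResponse response = client.prepareSearch("hotel") // the hotel index
            .setTypes("room") // the room type
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH) // search type
            .setQuery(qb) // query
            .setPostFilter(QueryBuilders.rangeQuery("datetime").gte("2016-10-20").lte("2016-10-21").format("yyyy-MM-dd")) // filter the hits by date after the query has run
            .setFrom(0).setSize(10).setExplain(true) // pagination
            .execute() // execute
            .actionGet();
        long count = response.getHits().getTotalHits(); // total number of hits
        System.out.println(count);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSource());
        }
        // on shutdown: release the client's resources
        client.close();
    }
}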
4. Deleting data
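import java.net.InetAddress;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;
public class elasticSearch_fordelete {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_fordelete.class);
    public static void main(String[] agre) throws Exception {
        TransportClient client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        // Match everything and walk the results with Scroll, one batch per request (batch size set by setSize below).
        // Each pass of the while loop pulls the next batch. Scroll is recommended for large data sets.
        QueryBuilder qb = matchAllQuery();
        SearchResponse response = client.prepareSearch("hotelindex")
            .setTypes("poidata")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
            .setScroll(new TimeValue(60000))
            .setQuery(qb)
            .setFrom(0)
            .setSize(50)
            .execute()
            .actionGet();
        long count = response.getHits().getTotalHits();
        while (true) {
            for (SearchHit hit : response.getHits().getHits()) {
                client.prepareDelete(hit.getIndex(), hit.getType(), hit.getId()).get();
            }
            try {
                response = client.prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
                //Break condition: No hits are returned
                if (response.getHits().getHits().length == 0) {
                    break;
                }
            } catch (Exception e) {
                // Stop on failure; otherwise the loop would retry the same batch forever
                logger.error("scroll request failed", e);
                break;
            }
        }
        // on shutdown: release the client's resources
        client.close();
    }
}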
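Differences between the query types:
When matching text, always use matchQuery. termQuery is for exact matching of values such as numbers and longs, because a term query does not analyze its input.
match_query: full-text search; the query text is analyzed (tokenized) first
term_query: exact lookup; the query text is not analyzed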
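The difference is easiest to see on a toy inverted index. In the sketch below, an assumed analyzer lower-cases text and splits it on whitespace (a simplification of what a real analyzer does), and only those tokens are stored. A term lookup compares the raw input against the stored tokens, while a match lookup analyzes the input first and succeeds if any resulting token is stored. All class and method names are hypothetical, and the address is only sample data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy inverted index for a single field; illustration only.
class MatchVsTermSketch {

    private final Set<String> indexedTokens = new HashSet<>();

    // Assumed analyzer: lower-case, then split on whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Indexing stores the analyzed tokens, not the original string.
    void index(String fieldValue) {
        indexedTokens.addAll(analyze(fieldValue));
    }

    // Term lookup: the input is used as-is, without analysis.
    boolean termQuery(String value) {
        return indexedTokens.contains(value);
    }

    // Match lookup: the input is analyzed first; any matching token is a hit.
    boolean matchQuery(String text) {
        for (String token : analyze(text)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MatchVsTermSketch address = new MatchVsTermSketch();
        address.index("880 Holmes Lane");
        System.out.println(address.termQuery("Holmes Lane"));  // false: no stored token equals the raw string
        System.out.println(address.matchQuery("Holmes Lane")); // true: "holmes" and "lane" are stored tokens
    }
}
```

This is why a term query for a whole phrase against an analyzed text field typically finds nothing, while the same input in a match query does.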
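Mappings:
A mapping defines the fields of an index and their data types; many data types are supported.
Note: the mapping of a field that already exists in an index cannot be changed. To change it, create a new index with the desired mapping and reindex the data.
For the ways to set up mappings on an index, see: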
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html