ES basic operations: reference notes
Check instance status
curl http://10.25.164.23:9200
List all indices in the cluster
curl -X GET '10.25.164.23:9200/_cat/indices?v'
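The same listing can be scripted. Below is a minimal Python sketch that uses only the standard library and assumes the cluster at the address above is reachable. It requests `format=json` instead of `?v`, so each row arrives as an object with an `index` key. The function names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"  # host used throughout these notes


def cat_indices_request():
    """Build (without sending) a GET for _cat/indices, asking for JSON rows."""
    return urllib.request.Request(ES_URL + "/_cat/indices?format=json", method="GET")


def index_names(rows):
    """Return the sorted index names from a parsed _cat/indices JSON response."""
    return sorted(row["index"] for row in rows)


def list_indices():
    """Send the request to a live cluster and return the index names."""
    with urllib.request.urlopen(cat_indices_request()) as resp:
        return index_names(json.load(resp))
```

Calling `list_indices()` against a running cluster returns a sorted list of its index names.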
View the mappings (and the Types they contain) of every Index
curl -X GET '10.25.164.23:9200/_mapping?pretty=true'
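The `_mapping` response nests field definitions differently depending on the version. In 5.x/6.x they sit under each type name (such as `person`). In 7.x and later they sit directly under `mappings`. The following sketch flattens either shape into the top-level field names of each index. The helper names are hypothetical, and the host is the one from these notes.

```python
import json
import urllib.request


def fetch_mappings(es_url="http://10.25.164.23:9200"):
    """GET /_mapping from a live cluster and return the parsed JSON."""
    with urllib.request.urlopen(es_url + "/_mapping") as resp:
        return json.load(resp)


def mapped_fields(mapping_response):
    """Map each index name to the sorted names of its top-level fields."""
    result = {}
    for index, body in mapping_response.items():
        mappings = body.get("mappings", {})
        if "properties" in mappings:
            # Typeless layout (7.x+): {"mappings": {"properties": {...}}}
            props = dict(mappings["properties"])
        else:
            # Typed layout (5.x/6.x): {"mappings": {"person": {"properties": {...}}}}
            props = {}
            for type_mapping in mappings.values():
                props.update(type_mapping.get("properties", {}))
        result[index] = sorted(props)
    return result
```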
- Create an Index (the `ik_max_word` analyzer comes from the IK analysis plugin, which must be installed)
curl -X PUT '10.25.164.23:9200/accounts' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text"
        },
        "desc": {
          "type": "text"
        },
        "join_time": {
          "type": "date",
          "format": "date_optional_time"
        }
      }
    }
  }
}'
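- Create an Index with custom settings (this fails if `accounts` already exists, so delete it first or choose another name)
curl -X PUT '10.25.164.23:9200/accounts' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": [
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  }
}'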
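Both create commands above target `accounts`, so whichever runs second returns an error because the index already exists. One workaround is to send `settings` and `mappings` in a single create-index request. The sketch below does this and assumes an ES 5.x/6.x cluster, which still accepts mapping types such as `person`, with the IK plugin installed. The helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"

# Settings and mappings based on the two create commands above.
SETTINGS = {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index": {
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "standard",
                    "filter": ["asciifolding", "lowercase", "ourEnglishFilter"],
                }
            },
            "filter": {"ourEnglishFilter": {"type": "kstem"}},
        }
    },
}

MAPPINGS = {
    "person": {  # mapping type, as used in these notes (ES 6.x and earlier)
        "properties": {
            "user": {"type": "text", "analyzer": "ik_max_word",
                     "search_analyzer": "ik_max_word"},
            "title": {"type": "text"},
            "desc": {"type": "text"},
            "join_time": {"type": "date", "format": "date_optional_time"},
        }
    }
}


def create_index_request(name):
    """Build one PUT that creates the index with both settings and mappings."""
    body = {"settings": SETTINGS, "mappings": MAPPINGS}
    return urllib.request.Request(
        ES_URL + "/" + name,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(req):
    """Send a prepared request to a live cluster and return the parsed JSON."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send(create_index_request("accounts"))` creates the index in one step.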
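-- Template (replace [address] with host:port; the // comments are explanatory and can be removed if the request is rejected)
curl -X PUT '[address]/blog' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 1,      // number of primary shards
    "number_of_replicas": 2,    // number of replicas per primary shard
    // custom default analyzer for the index
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",   // tokenizer
            "filter": [                // token filters
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  }
}'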
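Index mapping properties
`type` is the field's data type. For example, a field of type `string` holds plain text. Elasticsearch 5.0 replaced `string` with `text` (full text) and `keyword` (exact values), so the `string` syntax below applies to 2.x and earlier.
Elasticsearch supports the following data types:
Text: string
Integers: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date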
For `string` fields, the two most important properties are `index` and `analyzer`.
index
The `index` property controls how a string is indexed. It accepts three values:
analyzed: First analyze the string, then index it. In other words, index this field as full text.
not_analyzed: Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.
no: Don’t index this field at all. This field will not be searchable.
For `string` fields, `index` defaults to `analyzed`. For exact-value lookups, set it to `not_analyzed` (in 5.0 and later, map the field as `keyword` instead).
For example:
{ "tag": { "type": "string", "index": "not_analyzed" } }
analyzer
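For `string` fields, the `analyzer` property specifies which analyzer is used at both index time and search time. By default Elasticsearch uses the `standard` analyzer. You can also specify other built-in analyzers, such as `whitespace`, `simple`, or `english`.
For example:
{ "tweet": { "type": "string", "analyzer": "english" } }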
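The `_analyze` API shows which tokens an analyzer produces, which helps when choosing one for a mapping. The following is a minimal Python sketch. It assumes the same host and a version whose `_analyze` endpoint accepts a JSON body with `analyzer` and `text` (5.x and later do). The helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"


def analyze_request(text, analyzer="english"):
    """Build a POST /_analyze that runs `text` through a built-in analyzer."""
    body = {"analyzer": analyzer, "text": text}
    return urllib.request.Request(
        ES_URL + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def tokens(analyze_response):
    """Extract the token strings from a parsed _analyze response."""
    return [t["token"] for t in analyze_response["tokens"]]


def analyze(text, analyzer="english"):
    """Send the request to a live cluster and return the tokens."""
    with urllib.request.urlopen(analyze_request(text, analyzer)) as resp:
        return tokens(json.load(resp))
```

You can compare `analyze(some_text, "standard")` with `analyze(some_text, "english")` to see how stemming and stop-word removal change the indexed terms.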
Delete an Index
curl -X DELETE '10.25.164.23:9200/weather'
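Deleting an index that does not exist returns HTTP 404 (`index_not_found_exception`). Scripts that reset test data often want to ignore that case. The sketch below treats 404 as "already gone". The `opener` parameter is a hypothetical hook that lets the function be exercised without a cluster.

```python
import urllib.error
import urllib.request

ES_URL = "http://10.25.164.23:9200"


def delete_index(name, opener=urllib.request.urlopen):
    """Delete an index; return True if deleted, False if it did not exist."""
    req = urllib.request.Request(ES_URL + "/" + name, method="DELETE")
    try:
        with opener(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # index does not exist: nothing to delete
            return False
        raise  # any other HTTP error is a real failure
```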
- Insert a document
curl -X PUT '10.25.164.23:9200/accounts/person/1' -H 'Content-Type: application/json' -d '
{
"user": "张三",
"title": "工程师",
"desc": "数据库管理"
}'
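A newly indexed document becomes visible to search only after the next refresh, which happens about once per second by default. The `refresh=true` parameter forces a refresh, which is convenient for tests but costly under load. The following is a minimal Python sketch of the insert above. It uses the typed `/{index}/{type}/{id}` URL from these notes; ES 7.x uses `/{index}/_doc/{id}` instead. The helper name is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"


def index_doc_request(index, doc_type, doc_id, doc, refresh=False):
    """Build a PUT /{index}/{type}/{id} that stores `doc` under a fixed id."""
    url = "%s/%s/%s/%s" % (ES_URL, index, doc_type, doc_id)
    if refresh:
        url += "?refresh=true"  # make the document searchable immediately
    return urllib.request.Request(
        url,
        # ensure_ascii=False keeps Chinese text readable; UTF-8 on the wire
        data=json.dumps(doc, ensure_ascii=False).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

You can send the request with `urllib.request.urlopen(index_doc_request("accounts", "person", 1, doc))`.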
- View the inserted document
curl '10.25.164.23:9200/accounts/person/1?pretty=true'
- Query all documents
curl '10.25.164.23:9200/accounts/person/_search?pretty=true'
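Without a body, `_search` returns only the first 10 hits. The `from` and `size` parameters page through the rest, up to `index.max_result_window` (10,000 by default). The sketch below searches the whole index rather than a single type and pulls out each hit's `_source`. The helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"


def search_all_request(index, size=10, from_=0):
    """Build a POST /{index}/_search that pages through every document."""
    body = {"query": {"match_all": {}}, "from": from_, "size": size}
    return urllib.request.Request(
        "%s/%s/_search" % (ES_URL, index),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def hit_sources(search_response):
    """Return the stored _source of each hit in a parsed _search response."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]
```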
- Delete a document
curl -X DELETE '10.25.164.23:9200/accounts/person/1'
- Full-text search
curl '10.25.164.23:9200/accounts/person/_search' -H 'Content-Type: application/json' -d '
{
"query" : { "match" : { "desc" : "软件" }}
}'
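The `match` query above runs the search text through the field's search analyzer before looking it up. A `term` query, by contrast, looks up the value exactly as given, which is what a `not_analyzed` field (or a `keyword` field in 5.0+) such as the `tag` example needs. The sketch below builds both bodies. The value `elasticsearch` and the helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://10.25.164.23:9200"


def match_query(field, text):
    """Full-text query: `text` is analyzed with the field's search analyzer."""
    return {"query": {"match": {field: text}}}


def term_query(field, value):
    """Exact-value query: `value` is used as-is and must equal an indexed term."""
    return {"query": {"term": {field: value}}}


def search_request(index, body):
    """Build a POST /{index}/_search carrying the given query body."""
    return urllib.request.Request(
        "%s/%s/_search" % (ES_URL, index),
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

For example, `search_request("accounts", match_query("desc", "软件"))` reproduces the curl call above, and `term_query("tag", "elasticsearch")` would only hit documents whose indexed `tag` term is exactly `elasticsearch`.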
Managing ES data as a Hive external table
set mapreduce.map.java.opts=-Xmx1024m -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@.hprof;
add jar hdfs://pasc/metadata/libs/es/elasticsearch-hadoop-hive-2.3.1.jar;
set hive.execution.engine=mr;
CREATE EXTERNAL TABLE test_person(
`user` string,
title string,
`desc` string)
ROW FORMAT SERDE
'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY
'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.mapping.names'='user:user,title:title,desc:desc',
'es.nodes'='10.25.164.23:9200',
'es.read.metadata'='true',
'es.resource'='accounts/person');
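es-hadoop matches Hive columns to Elasticsearch fields by name, so `es.mapping.names` only matters when the two differ. One example would be a hypothetical Hive column `joined` backed by the `join_time` field from the mapping above. The tiny hypothetical helper below formats the property value as comma-separated `hive_column:es_field` pairs.

```python
def mapping_names(columns):
    """Format {hive_column: es_field} pairs as an es.mapping.names value."""
    # Python 3.7+ dicts keep insertion order, so pairs appear as written.
    return ",".join("%s:%s" % (col, field) for col, field in columns.items())
```

For example, `mapping_names({"user": "user", "joined": "join_time"})` produces `user:user,joined:join_time`.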