Druid Usage
1. Installation
Druid installation
Use druid-0.10.1 as bundled with HDP.
Configure SQL support:
Add under Custom druid-broker:
druid.sql.enable=true
Components and ports:
- Broker: 8082
- Coordinator: 8081
- Overlord: 8090
- Router: 8888
- Historical: 8083
- MiddleManager: 8091
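Once the services are started, every Druid node exposes an HTTP /status endpoint that can serve as a quick health check; a minimal sketch (host assignments taken from the curl examples later in this document):
- curl http://p5.ambari:8090/status   (Overlord)
- curl http://p6.ambari:8082/status   (Broker)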
Imply installation
The matching Imply version is imply-2.3.9; only its UI is used.
Start imply-ui:
- bin/run-imply-ui-quickstart conf-quickstart
2. Importing Offline Data
Prepare the data file wikipedia_data.csv
Upload this file to hdfs://ns/user/druid/quickstart (a command sketch follows the sample rows below).
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisco,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisc,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancis,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranci,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranc,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFran,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFra,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFr,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanF,57,200,-143
2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,Sa,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisco,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisc,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancis,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranci,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranc,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFran,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFra,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFr,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanF,57,200,-143
2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,Sa,57,200,-143
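The upload can be done with the standard HDFS CLI; a minimal sketch (assuming the target directory /user/druid/quickstart from the path above):
- hdfs dfs -mkdir -p /user/druid/quickstart
- hdfs dfs -put wikipedia_data.csv /user/druid/quickstart/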
Define the data format description (ingestion spec) file for the datasource: wikipedia_index_hadoop_csv_task.json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia2",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "csv",
          "timestampSpec": {
            "column": "timestamp"
          },
          "columns": ["timestamp", "page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city", "added", "deleted", "delta"],
          "dimensionsSpec": {
            "dimensions": ["page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city"]
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "added", "fieldName": "added" },
        { "type": "doubleSum", "name": "deleted", "fieldName": "deleted" },
        { "type": "doubleSum", "name": "delta", "fieldName": "delta" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": ["2013-08-31/2013-09-01"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "quickstart/wikipedia_data.csv"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}
paths: refers to /user/druid/quickstart/wikipedia_data.csv on HDFS
The rollup (aggregation) granularity is controlled by queryGranularity.
Submit the task:
- curl -X POST -H 'Content-Type: application/json' -d @wikipedia_index_hadoop_csv_task.json http://p5.ambari:8090/druid/indexer/v1/task
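The submit call returns a task ID; as a sketch, progress can then be polled from the Overlord (<taskId> below is a placeholder for the returned ID):
- curl http://p5.ambari:8090/druid/indexer/v1/task/<taskId>/status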
SQL query:
- curl -XPOST -H 'Content-Type: application/json' http://p6.ambari:8082/druid/v2/sql/ -d '{"query":"SELECT * FROM wikipedia2"}'
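Because queryGranularity is DAY, the 01:02 and 02:02 sample rows roll up into a single row per unique dimension combination; a sketch of a query to inspect this (the "count" metric must be double-quoted since count is a SQL keyword):
- curl -XPOST -H 'Content-Type: application/json' http://p6.ambari:8082/druid/v2/sql/ -d '{"query":"SELECT __time, city, \"count\", added FROM wikipedia2"}'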
UI options
- Imply
  - Pivot: http://ip:9095/
  - Supports SQL queries from the UI; the available charts are fairly basic.
- Superset
  - Bundled with HDP.
  - Does not support SQL queries from the UI; charts are richer, but it can only query data within one year of the current time.
3. Ingesting Real-Time Data from Kafka
Create a Kafka topic:
- kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --topic metrics
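Optionally, confirm the topic was created; a sketch:
- kafka-topics.sh --describe --zookeeper localhost:2181 --topic metrics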
Define the data format description (ingestion spec) file for the datasource: metrics-kafka.json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics-kafka",
    "parser": {
      "type": "string",
      "parseSpec": {
        "timestampSpec": {
          "column": "time",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["url", "user"]
        },
        "format": "json"
      }
    },
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "hour",
      "queryGranularity": "second"
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "views"
      },
      {
        "name": "latencyMs",
        "type": "doubleSum",
        "fieldName": "latencyMs"
      }
    ]
  },
  "ioConfig": {
    "topic": "metrics",
    "consumerProperties": {
      "bootstrap.servers": "p6.ambari:6667",
      "group.id": "kafka-indexing-service"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": "100000"
  }
}
Submit the task:
- curl -XPOST -H 'Content-Type: application/json' -d @metrics-kafka.json http://p5.ambari:8090/druid/indexer/v1/supervisor
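After submission, the supervisor can be listed and checked on the Overlord; a sketch (the supervisor ID is the dataSource name, metrics-kafka):
- curl http://p5.ambari:8090/druid/indexer/v1/supervisor
- curl http://p5.ambari:8090/druid/indexer/v1/supervisor/metrics-kafka/status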
Write data to the Kafka topic:
- kafka-console-producer.sh --broker-list p6.ambari:6667 --topic metrics
{"time": "2018-03-06T09:58:09.111Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
{"time": "2018-03-06T09:58:09.222Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
{"time": "2018-03-06T09:58:09.333Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
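The rows should become queryable within seconds; a sketch of verifying ingestion via Druid SQL (the datasource name contains a hyphen, so it must be double-quoted):
- curl -XPOST -H 'Content-Type: application/json' http://p6.ambari:8082/druid/v2/sql/ -d '{"query":"SELECT * FROM \"metrics-kafka\""}'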
Stop the task
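A minimal sketch of stopping ingestion by shutting down the supervisor via the Overlord API (supervisor ID metrics-kafka as above):
- curl -XPOST http://p5.ambari:8090/druid/indexer/v1/supervisor/metrics-kafka/shutdown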