Contents
1. Environment and architecture
2. Installing Druid
3. Configuring Druid
4. Ingesting JSON through the overlord
5. Ingesting CSV through the overlord
1. Druid environment and architecture
Environment
CentOS 6.5
5 machines, each with 32 GB RAM and 8 cores
ZooKeeper 3.4.5
Druid 0.9.2
Hadoop 2.6.5
JDK 1.7
Architecture
Host          Druid role           Hadoop / ZooKeeper processes
10.20.23.42   broker, real-time    DataNode, NodeManager, QuorumPeerMain
10.20.23.29   middleManager        DataNode, NodeManager
10.20.23.38   overlord             DataNode, NodeManager, QuorumPeerMain
10.20.23.82   coordinator          NameNode, ResourceManager
10.20.23.41   historical           DataNode, NodeManager, QuorumPeerMain
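2. Installing Druid
Installing Hadoop is not covered here. I first tried Hadoop 2.3.0 and could not get it to work, so I switched to 2.6.5.
The procedure is the same as for a single-machine setup.
1. Unpack the archive.
2. Copy the Hadoop configuration files.
Copy the four Hadoop configuration files, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml, into ${DRUID_HOME}/conf/druid/_common.
3. Create a directory and copy the jars.
Under ${DRUID_HOME}/hadoop-dependencies/hadoop-client, create a folder named 2.6.5 (using the Hadoop version number as the name is recommended) and copy the Hadoop jars into it.
4. Edit the configuration files.
Note: the configuration is fiddly, and a single wrong setting is enough to stop tasks from running. The common settings go in conf/druid/_common/common.runtime.properties: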
# Extensions: load druid-hdfs-storage and mysql-metadata-storage
druid.extensions.loadList=["druid-hdfs-storage","mysql-metadata-storage"]
# ZooKeeper connection
druid.zk.service.host=10.20.23.82:2181
druid.zk.paths.base=/druid/cluster
# Metadata storage (MySQL)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://10.20.23.42:3306/druid
druid.metadata.storage.connector.user=root
druid.metadata.storage.connector.password=123456
# Where segments are stored
# Deep storage
#
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files in the cp):
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Where task logs are stored
# Indexing service logs
#
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files in the cp):
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
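Since one wrong entry in the common properties is enough to break ingestion, a small pre-flight check can help. The following is a minimal sketch, not part of Druid: `parse_properties` and `missing_extensions` are hypothetical helper names, the parser only handles plain `key=value` lines, and the mapping from storage type to extension covers only the combinations used in this article.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Extension that each setting in this article depends on (assumed mapping).
REQUIRED_EXTENSIONS = {
    ("druid.storage.type", "hdfs"): "druid-hdfs-storage",
    ("druid.indexer.logs.type", "hdfs"): "druid-hdfs-storage",
    ("druid.metadata.storage.type", "mysql"): "mysql-metadata-storage",
}


def missing_extensions(props):
    """Return extensions that the settings need but loadList does not name."""
    loaded = set(json.loads(props.get("druid.extensions.loadList", "[]")))
    missing = set()
    for (key, value), extension in REQUIRED_EXTENSIONS.items():
        if props.get(key) == value and extension not in loaded:
            missing.add(extension)
    return sorted(missing)


# A deliberately broken sample: MySQL is selected but its extension is not loaded.
sample = """
druid.extensions.loadList=["druid-hdfs-storage"]
druid.metadata.storage.type=mysql
druid.storage.type=hdfs
"""
print(missing_extensions(parse_properties(sample)))
```

Running the same check against the properties listed above should report nothing missing, because both extensions are in the load list.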
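Broker configuration
For the broker, size the memory settings to suit the machine, add the druid.host parameter, and change -Duser.timezone: Druid's default time zone is Z (UTC), so +0800 is appended here.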
[hadoop@SZB-L0038784 broker]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=4096m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 broker]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/broker
druid.port=8082
# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=25
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7
# Query cache
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
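The broker's -XX:MaxDirectMemorySize=4096m lines up with its processing settings. As a rough sketch of the sizing rule as I understand it (one buffer per processing thread plus one spare, with merge buffers counted as well in versions that have them; treat the exact formula as an assumption and check it against your Druid version), the arithmetic looks like this. `required_direct_memory` is a hypothetical helper name.

```python
def required_direct_memory(buffer_size_bytes, num_threads, num_merge_buffers=0):
    """Direct memory needed for processing buffers, in bytes (assumed rule)."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


# Values from the broker runtime.properties above.
needed = required_direct_memory(buffer_size_bytes=536870912, num_threads=7)
print(needed // (1024 * 1024), "MiB")
```

If druid.processing.numThreads or druid.processing.buffer.sizeBytes is raised, MaxDirectMemorySize in jvm.config has to grow with it.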
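Coordinator configuration
For the coordinator, size the memory settings to suit the machine, add the druid.host parameter, and change -Duser.timezone: Druid's default time zone is Z (UTC), so +0800 is appended here.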
[hadoop@SZB-L0038784 coordinator]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
[hadoop@SZB-L0038784 coordinator]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/coordinator
druid.port=18091
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
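Historical configuration
For the historical node, size the memory settings to suit the machine, add the druid.host parameter, and change -Duser.timezone: Druid's default time zone is Z (UTC), so +0800 is appended here.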
[hadoop@SZB-L0038784 historical]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=4960m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 historical]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/historical
druid.port=8083
# HTTP server threads
druid.server.http.numThreads=25
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7
# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000
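In the historical settings, druid.server.maxSize and the maxSize values inside druid.segmentCache.locations describe the same disk budget, so it is worth checking that they agree. Here is a small sketch under that assumption; `segment_cache_total` is a hypothetical helper, and the `\:` handling reflects the escaped colon that appears in the .properties line above.

```python
import json


def segment_cache_total(locations_value):
    """Sum maxSize over all segment cache locations."""
    # In a .properties file "\:" is an escaped colon; undo it before parsing JSON.
    locations = json.loads(locations_value.replace("\\:", ":"))
    return sum(location["maxSize"] for location in locations)


# Values copied from the historical runtime.properties above.
locations_value = r'[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]'
server_max_size = 130000000000

# The cache locations should be able to hold everything the node advertises.
print(segment_cache_total(locations_value) >= server_max_size)
```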
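MiddleManager configuration
For the middleManager, size the memory settings to suit the machine, add the druid.host parameter, and change -Duser.timezone: Druid's default time zone is Z (UTC), so +0800 is appended here.
The 2.6.5 in hadoop-client:2.6.5 must match the name of the folder created in step 3.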
[hadoop@SZB-L0038784 middleManager]$ cat jvm.config
-server
-Xms64m
-Xmx64m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 middleManager]$ cat runtime.properties
druid.service=druid/middleManager
druid.port=8091
# Number of tasks per middleManager
druid.worker.capacity=3
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC+0800 -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=25
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=2
# Hadoop indexing
druid.host=10.20.23.82
druid.indexer.task.hadoopWorkingPath=/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.6.5"]
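The link between the folder created in step 3 and druid.indexer.task.defaultHadoopCoordinates is easy to get wrong, so here is a small sketch that derives both from one version string. The helper names and the /opt/druid-0.9.2 install path are hypothetical.

```python
from pathlib import PurePosixPath


def hadoop_client_dir(druid_home, version):
    """Folder that holds the Hadoop client jars for this version."""
    return PurePosixPath(druid_home) / "hadoop-dependencies" / "hadoop-client" / version


def hadoop_coordinate(version):
    """Coordinate string for druid.indexer.task.defaultHadoopCoordinates."""
    return f"org.apache.hadoop:hadoop-client:{version}"


def coordinate_matches_dir(coordinate, directory):
    """True when the coordinate's version equals the folder name."""
    return coordinate.rsplit(":", 1)[-1] == PurePosixPath(directory).name


version = "2.6.5"
directory = hadoop_client_dir("/opt/druid-0.9.2", version)  # hypothetical DRUID_HOME
print(directory)
print(hadoop_coordinate(version))
print(coordinate_matches_dir(hadoop_coordinate(version), directory))
```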
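Overlord configuration
For the overlord, size the memory settings to suit the machine, add the druid.host parameter, and change -Duser.timezone: Druid's default time zone is Z (UTC), so +0800 is appended here.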
[hadoop@SZB-L0038784 overlord]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 overlord]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/overlord
druid.port=8090
druid.indexer.queue.startDelay=PT30S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
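5. Copy the Druid directory to the other machines with scp. After copying, set druid.host in each runtime.properties to the address of the machine that process runs on (the listings above all show 10.20.23.82).
6. Start each process on its machine.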
java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker
java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
The coordinator and overlord consoles are then available at the following two URLs:
http://10.20.23.82:18091/#/
http://10.20.23.82:8090/console.html
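The five start commands in step 6 differ only in the service name. As an illustration, this sketch builds the same argument list for any service; `read_jvm_options` and `start_command` are hypothetical helpers, and the JVM options in the example are a shortened stand-in for a real jvm.config.

```python
def read_jvm_options(text):
    """Split jvm.config content on whitespace, like `cat jvm.config | xargs`."""
    return text.split()


def start_command(service, jvm_options):
    """Argument list equivalent to the java command line used above."""
    classpath = f"conf/druid/_common:conf/druid/{service}:lib/*"
    return ["java", *jvm_options, "-cp", classpath,
            "io.druid.cli.Main", "server", service]


SERVICES = ["historical", "broker", "coordinator", "overlord", "middleManager"]

# Shortened example options; in practice read conf/druid/<service>/jvm.config.
jvm_text = "-server\n-Xms1g\n-Xmx1g\n"
for service in SERVICES:
    print(" ".join(start_command(service, read_jvm_options(jvm_text))))
```

The list form can be passed to subprocess.Popen without a shell; the lib/* wildcard is expanded by the JVM itself, which is why the original commands quote the classpath.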
7. Create the corresponding HDFS paths:
/druid/indexing-logs
/druid/segments
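One way to create these paths is `hadoop fs -mkdir -p`. The sketch below wraps that command; it assumes the hadoop client is on the PATH of the machine where it runs, and the function names are hypothetical. The `run` parameter exists only so the commands can be inspected without a cluster.

```python
import subprocess

# The two directories named in common.runtime.properties.
HDFS_DIRS = ["/druid/indexing-logs", "/druid/segments"]


def mkdir_commands(paths):
    """One `hadoop fs -mkdir -p` command per path."""
    return [["hadoop", "fs", "-mkdir", "-p", path] for path in paths]


def create_dirs(paths, run=subprocess.run):
    """Run each command and stop at the first failure."""
    for command in mkdir_commands(paths):
        run(command, check=True)


# Dry run: print the commands instead of executing them.
for command in mkdir_commands(HDFS_DIRS):
    print(" ".join(command))
```

Calling create_dirs(HDFS_DIRS) on a cluster node executes the commands for real.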
4. Ingesting a JSON file through the overlord
[hadoop@SZB-L0038784 hadoop-client]$ hadoop fs -ls /druid
drwxr-xr-x - hadoop supergroup 0 2017-05-30 16:02 /druid/hadoop-tmp
drwxr-xr-x - hadoop supergroup 0 2017-05-30 16:00 /druid/indexing-logs
drwxr-xr-x - hadoop supergroup 0 2017-05-30 15:39 /druid/segments
-rw-r--r-- 3 hadoop supergroup 153 2017-05-29 16:58 /druid/wikipedia_data.csv
-rw-r--r-- 3 hadoop supergroup 17106256 2017-05-29 10:54 /druid/wikiticker-2015-09-12-sampled.json
Submit the indexing task to the overlord:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json 10.20.23.38:8090/druid/indexer/v1/task
When the task shows SUCCESS in the overlord console, the ingestion has completed.
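Instead of watching the console, the same submission and status check can be scripted. This is a sketch only: the task endpoint is the one from the curl command above, while the status path and the response shapes (`{"task": ...}` on submit, `{"status": {"status": ...}}` on status) are what I would expect from the overlord API for this Druid version and should be verified against your cluster. The function names are hypothetical.

```python
import json
import urllib.request

OVERLORD = "http://10.20.23.38:8090"  # overlord address from the curl command


def build_submit_request(spec, overlord=OVERLORD):
    """POST request carrying the task spec as JSON."""
    return urllib.request.Request(
        f"{overlord}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(task_id, overlord=OVERLORD):
    """URL of the status resource for one task (assumed path)."""
    return f"{overlord}/druid/indexer/v1/task/{task_id}/status"


def submit(spec):
    """Submit the spec and return the task id (assumed response shape)."""
    with urllib.request.urlopen(build_submit_request(spec)) as response:
        return json.load(response)["task"]


def task_state(task_id):
    """Return e.g. RUNNING, SUCCESS or FAILED (assumed response shape)."""
    with urllib.request.urlopen(status_url(task_id)) as response:
        return json.load(response)["status"]["status"]
```

A typical use is to load quickstart/wikiticker-index.json with json.load, pass it to submit, and then poll task_state every few seconds until it stops returning RUNNING.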
Query
[hadoop@SZB-L0038787 druid-0.9.2]$ curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://10.20.23.42:8082/druid/v2/?pretty
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"page" : "Wikipedia:Vandalismusmeldung",
"edits" : 20
}, {
"page" : "Jeremy Corbyn",
"edits" : 18
}, {
"page" : "User talk:Dudeperson176123",
"edits" : 17
}, {
"page" : "Utente:Giulio Mainardi/Sandbox",
"edits" : 16
}, {
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage",
"edits" : 15
}, {
"page" : "Wikipédia:Le Bistro/12 septembre 2015",
"edits" : 14
}, {
"page" : "Wikipedia:Administrators' noticeboard/Incidents",
"edits" : 12
}, {
"page" : "Kim Davis (county clerk)",
"edits" : 11
}, {
"page" : "The Naked Brothers Band (TV series)",
"edits" : 10
}, {
"page" : "Гомосексуальный образ жизни",
"edits" : 10
}, {
"page" : "Wikipedia:Administrator intervention against vandalism",
"edits" : 9
}, {
"page" : "Wikipedia:De kroeg",
"edits" : 9
}, {
"page" : "Wikipedia:Files for deletion/2015 September 12",
"edits" : 9
}, {
"page" : "التهاب السحايا",
"edits" : 9
}, {
"page" : "Chess World Cup 2015",
"edits" : 8
}, {
"page" : "The Book of Souls",
"edits" : 8
}, {
"page" : "Wikipedia:Requests for page protection",
"edits" : 8
}, {
"page" : "328-я стрелковая дивизия (2-го формирования)",
"edits" : 7
}, {
"page" : "Campanya dels Balcans (1914-1918)",
"edits" : 7
}, {
"page" : "Homo naledi",
"edits" : 7
}, {
"page" : "List of shipwrecks in August 1944",
"edits" : 7
}, {
"page" : "User:Tokyogirl79/sandbox4",
"edits" : 7
}, {
"page" : "Via Lliure",
"edits" : 7
}, {
"page" : "Vorlage:Revert-Statistik",
"edits" : 7
}, {
"page" : "Wikipedia:Löschkandidaten/12. September 2015",
"edits" : 7
} ]
} ]
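The topN response is a list of time buckets, each with a "result" array of dimension/metric pairs. As a small illustration, this sketch flattens that structure into rows; `topn_rows` is a hypothetical helper and the sample is the first two entries of the output above.

```python
import json


def topn_rows(response, dimension, metric):
    """Flatten a topN response into (dimension value, metric value) tuples."""
    rows = []
    for bucket in response:
        for entry in bucket["result"]:
            rows.append((entry[dimension], entry[metric]))
    return rows


# First two entries of the broker response shown above.
sample = json.loads("""
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [
    { "page" : "Wikipedia:Vandalismusmeldung", "edits" : 20 },
    { "page" : "Jeremy Corbyn", "edits" : 18 }
  ]
} ]
""")

for page, edits in topn_rows(sample, "page", "edits"):
    print(f"{edits:>3}  {page}")
```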
Contents of the JSON files. Note that the jobProperties block must be present; without it the job fails.
Index task spec
[hadoop@SZB-L0038787 druid-0.9.2]$ cat quickstart/wikiticker-index.json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/druid/wikiticker-2015-09-12-sampled.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop."
      }
    }
  }
}
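Because a spec without jobProperties fails, it can be convenient to add the block programmatically before submitting. The sketch below does that for any index_hadoop spec, using the two properties from the file above; `with_job_properties` is a hypothetical helper, and it never overwrites values a spec already sets.

```python
import copy

# The two classloader properties used in the spec above.
CLASSLOADER_PROPS = {
    "mapreduce.job.classloader": "true",
    "mapreduce.job.classloader.system.classes":
        "-javax.validation.,java.,javax.,org.apache.commons.logging.,"
        "org.apache.log4j.,org.apache.hadoop.",
}


def with_job_properties(task, props=CLASSLOADER_PROPS):
    """Return a copy of the task spec with the job properties filled in."""
    task = copy.deepcopy(task)
    tuning = task["spec"].setdefault("tuningConfig", {"type": "hadoop"})
    job_properties = tuning.setdefault("jobProperties", {})
    for key, value in props.items():
        job_properties.setdefault(key, value)  # keep any existing value
    return task


bare = {"type": "index_hadoop", "spec": {"tuningConfig": {"type": "hadoop"}}}
fixed = with_job_properties(bare)
print(sorted(fixed["spec"]["tuningConfig"]["jobProperties"]))
```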
Query JSON file
[hadoop@SZB-L0038787 druid-0.9.2]$ cat quickstart/wikiticker-top-pages.json
{
  "queryType" : "topN",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    {
      "type" : "longSum",
      "name" : "edits",
      "fieldName" : "count"
    }
  ]
}
5. Ingesting a CSV file through the overlord
First prepare some CSV data:
[hadoop@SZB-L0038787 data]$ cat test
2017-08-01T01:02:33Z,10202111900173056925,30202111900037998891,2020211,20202000434,2,1,B18,3,4,J,B,2020003088,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-0910:56:03+08:00,
2017-07-16T01:02:33Z,10202111900164385197,30202111900034745280,2020211,20202000434,2,1,B18,3,4,J,B,2020003454,,,,,,01,,00000655,,,,,-2000.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-1510:42:26+08:00,
2017-05-15T01:02:33Z,13024011900164473005,33024011900035728305,2302401,2302401,2,1,A01,2,1,G,H,2300000212,,,,30240061,,01,309,,,,,,59.25,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-1517:23:31+08:00,
2017-08-01T01:02:33Z,10202111900173999588,30202111900038540746,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1515:41:34+08:00,
2017-08-01T01:02:33Z,10202111900174309914,30202111900038542126,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1710:36:16+08:00,
2017-08-01T01:02:33Z,10202111900176540667,30202111900038893351,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-2913:54:09+08:00,
2017-06-18T01:02:33Z,12078001900174397522,32078001900038476523,22078,22078002835,2,1,A56,2,2,C,A,2200041441,,,,20760002,,01,999,,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1717:36:41+08:00,
2017-12-24T01:02:33Z,11414021900149429403,31414021900036312816,2141402,21414020238,2,1,A01,2,2,8,9,2141400018,,,,14140018,,01,402,,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2014-12-2612:15:31+08:00,
2017-06-01T01:02:33Z,10202111900165839017,30202111900035354013,2020211,20202000434,2,1,B18,3,4,J,B,2020003088,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-2314:32:53+08:00,
Prepare the index task spec for the CSV data:
[hadoop@SZB-L0038787 quickstart]$ cat test-index.json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "test",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format" : "csv",
          "timestampSpec" : {
            "column" : "stat_date"
          },
          "columns" : [
            "stat_date",
            "policy_no",
            "endorse_no",
            "department_code",
            "sale_group_code",
            "business_type",
            "business_mode",
            "plan_code",
            "business_source_code",
            "business_source_detail_code",
            "channel_source_code",
            "channel_source_detail_code",
            "sale_agent_code",
            "primary_introducer_code",
            "renewal_type",
            "purchase_year",
            "agent_code",
            "partner_id",
            "currency_code",
            "parent_company_code",
            "broker_code",
            "dealer_code",
            "auto_series_id",
            "usage_attribute_code",
            "new_channel_ground_mark",
            "ply_prem_day",
            "created_by",
            "date_created",
            "updated_by",
            "date_updated",
            "underwrite_time",
            "partner_worknet_code"
          ],
          "dimensionsSpec" : {
            "dimensions" : [
              "department_code",
              "sale_group_code",
              "business_type",
              "business_mode",
              "plan_code",
              "business_source_code",
              "business_source_detail_code",
              "channel_source_code",
              "channel_source_detail_code",
              "sale_agent_code"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "doubleSum",
          "name": "ply_prem_day",
          "fieldName": "ply_prem_day"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-05-15/2017-12-25"]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/druid/test"
      }
    },
    "tuningConfig" : {
      "type": "hadoop",
      "jobProperties" : {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop."
      }
    }
  }
}
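With 32 columns and many empty fields, it is easy for the column list and the data to drift apart. As a quick sanity check before submitting, this sketch maps the first sample row onto the column names from the spec and fails if the field count differs; `parse_row` is a hypothetical helper, not something Druid provides.

```python
import csv
import io

# Column names copied from the "columns" list in the spec above.
COLUMNS = [
    "stat_date", "policy_no", "endorse_no", "department_code",
    "sale_group_code", "business_type", "business_mode", "plan_code",
    "business_source_code", "business_source_detail_code",
    "channel_source_code", "channel_source_detail_code", "sale_agent_code",
    "primary_introducer_code", "renewal_type", "purchase_year", "agent_code",
    "partner_id", "currency_code", "parent_company_code", "broker_code",
    "dealer_code", "auto_series_id", "usage_attribute_code",
    "new_channel_ground_mark", "ply_prem_day", "created_by", "date_created",
    "updated_by", "date_updated", "underwrite_time", "partner_worknet_code",
]

# First row of the sample data file.
ROW = (
    "2017-08-01T01:02:33Z,10202111900173056925,30202111900037998891,"
    "2020211,20202000434,2,1,B18,3,4,J,B,2020003088,,,,,,01,,00000655,,,,,"
    "0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,"
    "2017-01-0421:16:08+08:00,2015-06-0910:56:03+08:00,"
)


def parse_row(line, columns=COLUMNS):
    """Map one CSV line onto the column names, checking the field count."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


record = parse_row(ROW)
print(record["stat_date"], record["department_code"], record["ply_prem_day"])
```

The trailing comma on each data line matters here: it yields the final, empty partner_worknet_code field.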
Prepare the query JSON file for the CSV data:
[hadoop@SZB-L0038787 quickstart]$ cat test-top-pages.json
{
  "queryType" : "topN",
  "dataSource" : "test",
  "intervals" : ["2017-05-15/2017-12-25"],
  "granularity" : "all",
  "dimension" : "department_code",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    {
      "type" : "longSum",
      "name" : "edits",
      "fieldName" : "count"
    }
  ]
}
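The wikiticker and CSV queries differ only in data source, interval and dimension. The "metric" field refers to the aggregation's "name", which is why the CSV query can keep calling it "edits" even though it sums the "count" metric. A minimal sketch that generates either query follows; `build_topn` is a hypothetical helper.

```python
import json


def build_topn(data_source, interval, dimension, threshold=25):
    """topN query ranking dimension values by the summed row count."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        "dimension": dimension,
        "metric": "edits",  # must match the aggregation name below
        "threshold": threshold,
        "aggregations": [
            {"type": "longSum", "name": "edits", "fieldName": "count"}
        ],
    }


csv_query = build_topn("test", "2017-05-15/2017-12-25", "department_code")
print(json.dumps(csv_query, indent=2))
```

The resulting JSON can be posted to the broker at /druid/v2/?pretty in the same way as the wikiticker query.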