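I. Ingestion parameters explained
### --- Data ingestion spec
~~~ dataSchema: specifies the schema of the incoming data
~~~ ioConfig: specifies where the data comes from and where it goes
~~~ tuningConfig: specifies various ingestion tuning parameters
{
  "type": "kafka",
  "spec": {
    "ioConfig": Object { ...
    },
    "tuningConfig": Object { ...
    },
    "dataSchema": Object { ...
    }
  }
}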
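The three-part layout above can be checked mechanically before a spec is submitted. The following is a minimal sketch, not part of Druid: the helper name `missing_sections` is hypothetical, and it only looks for the three section names listed above under the top-level `spec` key.

```python
import json

# The three sections named in the notes above.
EXPECTED_SECTIONS = ("dataSchema", "ioConfig", "tuningConfig")


def missing_sections(spec_text):
    """Return the expected sections that are absent from a supervisor spec."""
    doc = json.loads(spec_text)
    spec = doc.get("spec", {})
    return [name for name in EXPECTED_SECTIONS if name not in spec]


# A deliberately incomplete skeleton: dataSchema is left out.
skeleton = '{"type": "kafka", "spec": {"ioConfig": {}, "tuningConfig": {}}}'
print(missing_sections(skeleton))  # ['dataSchema']
```

An empty list means all three sections are present; the check says nothing about whether their contents are valid.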
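### --- Definition of dataSchema:
~~~ dataSchema is the core of a Druid ingestion spec; it defines how the input data is parsed
~~~ and how it is stored in Druid.
~~~ dataSource: name of the dataSource created by the ingestion (a dataSource is the table used in queries)
~~~ granularitySpec: how segments are created and how data is rolled up
~~~ timestampSpec: the timestamp column and its format
~~~ dimensionsSpec: the dimension columns of the data
~~~ metricsSpec: the metric columns, and how each metric is computed during rollup
~~~ transformSpec: transform and filter rules for the data; not defined here
~~~ Note: if rollup is not enabled, dimensions and metrics are treated the same at ingestion time
"dataSchema": {
  "dataSource": "yanqitable1",
  "granularitySpec": {
    "type": "uniform",
    "queryGranularity": "MINUTE",
    "segmentGranularity": "DAY",
    "rollup": true
  },
  "timestampSpec": {
    "column": "ts",
    "format": "iso"
  },
  "dimensionsSpec": Object { ...
  },
  "metricsSpec": Array[6]
}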
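The effect of `"rollup": true` combined with a MINUTE `queryGranularity` can be illustrated in plain Python. This is only a sketch of the idea, not how Druid implements it: the `rollup` helper and the sample rows are hypothetical, and only the `count` metric and the three `packets` metrics from the full spec in section II (longSum, longMin, longMax) are modeled.

```python
from datetime import datetime


def rollup(rows, dimensions):
    """Merge rows that share a minute-truncated timestamp and dimension values."""
    merged = {}
    for row in rows:
        # queryGranularity MINUTE: drop seconds and microseconds.
        minute = datetime.fromisoformat(row["ts"]).replace(second=0, microsecond=0)
        key = (minute.isoformat(),) + tuple(row[d] for d in dimensions)
        packets = row["packets"]
        agg = merged.get(key)
        if agg is None:
            merged[key] = {
                "count": 1,
                "sum_packets": packets,
                "min_packets": packets,
                "max_packets": packets,
            }
        else:
            agg["count"] += 1
            agg["sum_packets"] += packets
            agg["min_packets"] = min(agg["min_packets"], packets)
            agg["max_packets"] = max(agg["max_packets"], packets)
    return merged


# Made-up sample rows, for illustration only.
rows = [
    {"ts": "2020-10-01T00:01:35", "srcip": "6.6.6.6", "protocol": "tcp", "packets": 10},
    {"ts": "2020-10-01T00:01:50", "srcip": "6.6.6.6", "protocol": "tcp", "packets": 30},
    {"ts": "2020-10-01T00:02:10", "srcip": "6.6.6.6", "protocol": "tcp", "packets": 5},
]
result = rollup(rows, ["srcip", "protocol"])
for key, agg in result.items():
    print(key, agg)
```

The first two rows fall in the same minute and share the same dimension values, so they collapse into one stored row with `count` 2; the third row stays on its own.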
### --- Definition of ioConfig:
~~~ The source of the input data is specified in ioConfig.
~~~ Each task type has its own ioConfig; here the data is read from Kafka, configured as follows:
"ioConfig": {
  "type": "kafka",
  "consumerProperties": {
    "bootstrap.servers": "hadoop01:9092,hadoop02:9092"
  },
  "topic": "yanqidruid1",
  "inputFormat": {
    "type": "json"
  },
  "useEarliestOffset": true,
  "appendToExisting": true
}
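With a `json` inputFormat and an `iso` timestampSpec on column `ts`, each Kafka message is expected to be a JSON object carrying an ISO 8601 timestamp. The sketch below builds one such payload; `make_event` and all values are hypothetical, the field names are taken from the dimensionsSpec and metricsSpec of the full spec in section II, and actually publishing the message to the topic is not shown.

```python
import json
from datetime import datetime, timezone


def make_event(ts, srcip, dstip, srcport, dst_port, protocol, packets, nbytes, cost):
    """Serialize one network-flow record as a JSON message body."""
    return json.dumps({
        "ts": ts.isoformat(),  # ISO 8601, matching "format": "iso"
        "srcip": srcip,
        "dstip": dstip,
        "srcport": srcport,
        "dstPort": dst_port,
        "protocol": protocol,
        "packets": packets,
        "bytes": nbytes,
        "cost": cost,
    })


# Made-up values, for illustration only.
event = make_event(
    datetime(2020, 10, 1, 0, 1, 35, tzinfo=timezone.utc),
    "6.6.6.6", "8.8.8.8", 2000, 3000, "tcp", 10, 1000, 1.4,
)
print(event)
```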
### --- Definition of tuningConfig
~~~ The tuningConfig spec varies by ingestion task type.
"tuningConfig": {
  "type": "kafka"
}
II. Ingestion spec JSON file
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "hadoop01:9092,hadoop02:9092"
      },
      "topic": "yanqidruid1",
      "inputFormat": {
        "type": "json"
      },
      "useEarliestOffset": true
    },
    "tuningConfig": {
      "type": "kafka"
    },
    "dataSchema": {
      "dataSource": "yanqitable1",
      "timestampSpec": {
        "column": "ts",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          "dstip",
          "protocol",
          "srcip",
          {
            "type": "long",
            "name": "srcport"
          },
          {
            "type": "long",
            "name": "dstPort"
          }
        ]
      },
      "granularitySpec": {
        "queryGranularity": "minute",
        "rollup": true,
        "segmentGranularity": "day"
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "name": "min_bytes",
          "type": "longMin",
          "fieldName": "bytes"
        },
        {
          "name": "sum_cost",
          "type": "doubleSum",
          "fieldName": "cost"
        },
        {
          "name": "max_packets",
          "type": "longMax",
          "fieldName": "packets"
        },
        {
          "name": "min_packets",
          "type": "longMin",
          "fieldName": "packets"
        },
        {
          "name": "sum_packets",
          "type": "longSum",
          "fieldName": "packets"
        }
      ]
    }
  }
}
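A spec like the one above is submitted to Druid by POSTing it to the supervisor endpoint `/druid/indexer/v1/supervisor`. The sketch below only builds the request so it can be inspected without a running cluster. The base URL `http://hadoop01:8888` is an assumption (a Router on its default port on one of the hosts named in the spec), and the helper names are hypothetical.

```python
import json
import urllib.request

SUPERVISOR_PATH = "/druid/indexer/v1/supervisor"


def build_supervisor_request(base_url, spec):
    """Build (but do not send) the POST request that submits a supervisor spec."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + SUPERVISOR_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(base_url, spec):
    """Send the request and return the decoded JSON response (needs a live cluster)."""
    with urllib.request.urlopen(build_supervisor_request(base_url, spec)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Trimmed stand-in for the full spec above; the base URL is an assumed address.
spec = {"type": "kafka", "spec": {"dataSchema": {"dataSource": "yanqitable1"}}}
request = build_supervisor_request("http://hadoop01:8888", spec)
print(request.get_method(), request.full_url)
```

With a reachable cluster, loading the full spec from a file with `json.load` and passing it to `submit` would perform the actual submission; adjust the base URL to wherever the Router or Overlord is listening.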