Notes on creating a Druid table
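Extracting nested JSON
To extract content from nested JSON, define a flattenSpec inside the parser.
Original documentation: https://druid.apache.org/docs/latest/ingestion/flatten-json.html
useFieldDiscovery: defaults to true. When it is true, every top-level field that is not explicitly declared is added to the table automatically. Because we want to control exactly which fields are extracted here, we set "useFieldDiscovery": false.
fields: defines the dimensions.
Each field definition has three properties: type, name, and expr.
type can be root, path, or jq. root refers to a top-level field; for a root field the braces can be omitted and the field declared as a plain string, "xxxxx".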
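As a small sketch of the two ways to declare a root field, the two entries below are intended to be equivalent in form: a bare string, and the long form with an explicit type. The field names dt and hour are borrowed from the complete spec in these notes; the wrapping object is only there to make the fragment valid JSON.

```json
{
  "fields": [
    "dt",
    { "type": "root", "name": "hour" }
  ]
}
```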
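path defines the nested content to extract:
{
"type": "path",
"name": "count",
"expr": "$.content.count"
}
This is a complete path definition: name is the name given to the extracted column, and expr is the JSONPath expression that says where the value comes from.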
jq extracts a value with a jq expression, which is useful when the value of a key is an array. (Not needed here.)
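To make the effect of flattening concrete, here is a rough illustration. It assumes a flattenSpec with "useFieldDiscovery": false whose fields are the root fields type and timestamp plus path fields for $.content.count, $.content.name and $.content.size. All values are made up, and the outer keys input and flattened are only labels for this example, not part of any Druid format.

```json
{
  "input": {
    "type": "segment",
    "timestamp": "2019-01-01T00:00:00Z",
    "content": { "name": "example", "count": 3, "size": 1024 }
  },
  "flattened": {
    "type": "segment",
    "timestamp": "2019-01-01T00:00:00Z",
    "name": "example",
    "count": 3,
    "size": 1024
  }
}
```

Under these assumptions, the nested content object itself should not appear in the flattened row, because field discovery is off and only the declared fields are kept.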
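filter
The filter must be defined outside the parser, inside transformSpec. A filter can be written in several ways: its type property can be in, selector, like, and so on, much like the conditions of a SQL WHERE clause.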
"transformSpec": {
  "filter": { "type": "selector", "dimension": "type", "value": "segment" },
  "transforms": []
},
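For comparison, a filter of type in might look like the sketch below, which would be roughly the equivalent of SQL's type IN ('segment', 'task'). The second value, task, is a hypothetical value added only for illustration, and the outer braces are there just to make the fragment valid JSON. An in filter takes a values array; a like filter takes a pattern string instead (for example "seg%").

```json
{
  "transformSpec": {
    "filter": {
      "type": "in",
      "dimension": "type",
      "values": ["segment", "task"]
    },
    "transforms": []
  }
}
```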
others
dataSource defines the name of the new table:
"dataSource": "t_sheriff_transaction_test",
The topic property in ioConfig specifies the Kafka topic the data is read from.
Complete JSON spec for creating the table
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "t_sheriff_transaction_test",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": false,
          "fields": [
            "type",
            {
              "type": "path",
              "name": "count",
              "expr": "$.content.count"
            },
            {
              "type": "path",
              "name": "maxTime",
              "expr": "$.content.maxTime"
            },
            {
              "type": "path",
              "name": "minTime",
              "expr": "$.content.minTime"
            },
            {
              "type": "path",
              "name": "name",
              "expr": "$.content.name"
            },
            {
              "type": "path",
              "name": "size",
              "expr": "$.content.size"
            },
            "dt",
            "hour",
            "timestamp",
            "minute"
          ]
        }
      }
    },
    "transformSpec": {
      "filter": { "type": "selector", "dimension": "type", "value": "segment" },
      "transforms": []
    },
    "metricsSpec": [
      { "name": "all_count_num", "fieldName": "sumCount", "type": "longSum" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup": true
    }
  },
  "ioConfig": {
    "topic": "sheriff_transaction_info_m",
    "consumerProperties": {
      "bootstrap.servers": "kf01new.adrd.sohuno.com:8092"
    }
  }
}