19.9.4 Work Log: Creating a Druid Table from a Kafka Data Stream


蛇大大


Article link: https://blog.csdn.net/weixin_42372973/article/details/100541814


Notes on creating a Druid table

Extracting nested JSON

To extract content from nested JSON, define a flattenSpec inside the parser's parseSpec.

Original documentation: https://druid.apache.org/docs/latest/ingestion/flatten-json.html

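useFieldDiscovery: defaults to true. When it is true, every top-level (root) field is made available and added to the table automatically, even if it is not declared. Because we only want specific fields extracted here, we set "useFieldDiscovery": false.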
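A minimal Python sketch of the useFieldDiscovery switch, to show which fields the rest of the spec can see. It approximates the documented behavior rather than reproducing Druid's code: the helper names `is_simple` and `available_fields` are hypothetical, and it assumes that only root-level scalars and lists of scalars are discovered, while nested objects such as `content` must always be declared explicitly.

```python
from typing import Any


def is_simple(value: Any) -> bool:
    """Scalar, or a list whose elements are all scalars (assumed discoverable)."""
    scalar = (str, int, float, bool, type(None))
    if isinstance(value, scalar):
        return True
    return isinstance(value, list) and all(isinstance(v, scalar) for v in value)


def available_fields(record: dict, use_field_discovery: bool, declared: list[str]) -> list[str]:
    """Hypothetical helper: names the rest of the spec could reference."""
    names = list(declared)
    if use_field_discovery:
        # Root-level simple fields are added even though they were never declared.
        for key, value in record.items():
            if key not in names and is_simple(value):
                names.append(key)
    return names


record = {"type": "segment", "dt": "2019-09-04", "content": {"count": 3}, "extra": "x"}
print(available_fields(record, True, ["count"]))   # ['count', 'type', 'dt', 'extra']
print(available_fields(record, False, ["count"]))  # ['count']
```

With discovery on, the stray `extra` field leaks into the table; turning it off keeps the schema limited to what `fields` declares.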

The fields property declares the fields (dimensions) to extract.

Each field definition has three properties: type, name, and expr.

type is one of root, path, or jq. A root field is a top-level field; it can be declared as a bare string such as "xxxxx" instead of a braced object.

A path field extracts nested content using a JsonPath expression:

{
  "type": "path",
  "name": "count",
  "expr": "$.content.count"
},

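This is a complete path definition: name is the output column name (effectively a rename), and expr is the JsonPath expression that supplies the value.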
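To check what a list of field definitions would pull out of a sample Kafka message before submitting a spec, the rules can be mimicked in a few lines of Python. This is a sketch under assumptions: `eval_path` and `flatten` are hypothetical helpers, only `root` fields (including the bare-string shorthand) and simple dotted `$.a.b` paths are handled, and real JsonPath features such as array indexes, wildcards, and filters are not supported.

```python
from typing import Any, Union

FieldSpec = Union[str, dict]


def eval_path(record: dict, expr: str) -> Any:
    """Evaluate a simple dotted JsonPath such as '$.content.count'; None if missing."""
    if not expr.startswith("$."):
        raise ValueError(f"unsupported expression: {expr}")
    node: Any = record
    for key in expr[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten(record: dict, fields: list[FieldSpec]) -> dict:
    """Apply field specs in order; a bare string is treated as a root field."""
    row = {}
    for spec in fields:
        if isinstance(spec, str):
            spec = {"type": "root", "name": spec}
        if spec["type"] == "root":
            row[spec["name"]] = record.get(spec["name"])
        elif spec["type"] == "path":
            # name renames the output column, expr locates the value.
            row[spec["name"]] = eval_path(record, spec["expr"])
        else:
            raise ValueError(f"field type not handled in this sketch: {spec['type']}")
    return row


message = {"type": "segment", "dt": "2019-09-04", "content": {"count": 3, "name": "a"}}
fields = ["type", {"type": "path", "name": "count", "expr": "$.content.count"}, "dt"]
print(flatten(message, fields))  # {'type': 'segment', 'count': 3, 'dt': '2019-09-04'}
```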

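jq reads the value with a jq-style expression (Druid uses jackson-jq), which is mainly useful for more complex structures such as array values. It is not needed here.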

Filtering

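The filter must be defined outside the parser, inside transformSpec. There are several filter types, selected by the type property, such as in, selector, and like; they work much like SQL conditions. Note that a selector filter takes a single "value", while an in filter takes a "values" list.

  "transformSpec": {
    "filter": { "type": "selector", "dimension": "type", "value": "segment" },
    "transforms": []
  },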
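The filter types mentioned above map roughly onto SQL `=`, `IN`, and `LIKE`. As an illustration of that analogy only (not Druid's filter engine), this Python sketch evaluates such filter objects against one flattened row. The helpers `like_to_regex` and `matches` are hypothetical; the sketch assumes Druid's native filter field names (`value` for selector, `values` for in, `pattern` for like), compares values directly rather than as strings, and ignores the like filter's optional escape character.

```python
import re
from typing import Any


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(row: dict, flt: dict) -> bool:
    """Evaluate a selector / in / like filter against one flattened row."""
    value: Any = row.get(flt["dimension"])
    kind = flt["type"]
    if kind == "selector":
        return value == flt["value"]
    if kind == "in":
        return value in flt["values"]
    if kind == "like":
        if value is None:
            return False
        return re.fullmatch(like_to_regex(flt["pattern"]), str(value), re.DOTALL) is not None
    raise ValueError(f"filter type not handled in this sketch: {kind}")


rows = [{"type": "segment", "count": 3}, {"type": "query", "count": 5}]
selector = {"type": "selector", "dimension": "type", "value": "segment"}
print([r for r in rows if matches(r, selector)])  # [{'type': 'segment', 'count': 3}]
```

Rows that fail the filter are dropped before ingestion, so only `segment` messages reach the datasource.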

Other settings

dataSource sets the name of the table (datasource) to create:

"dataSource": "t_sheriff_transaction_test",

The topic property in ioConfig specifies the Kafka topic the data is read from.

Complete JSON spec for creating the table

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "t_sheriff_transaction_test",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": false,
          "fields": [
            "type",
            {
              "type": "path",
              "name": "count",
              "expr": "$.content.count"
            },
            {
              "type": "path",
              "name": "maxTime",
              "expr": "$.content.maxTime"
            },
            {
              "type": "path",
              "name": "minTime",
              "expr": "$.content.minTime"
            },
            {
              "type": "path",
              "name": "name",
              "expr": "$.content.name"
            },
            {
              "type": "path",
              "name": "size",
              "expr": "$.content.size"
            },
            "dt",
            "hour",
            "timestamp",
            "minute"
          ]
        }
      }
    },
    "transformSpec": {
      "filter": { "type": "selector", "dimension": "type", "value": "segment" },
      "transforms": []
    },
    "metricsSpec": [
      { "name": "all_count_num", "fieldName": "count", "type": "longSum" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup": true
    }
  },
  "ioConfig": {
    "topic": "sheriff_transaction_info_m",
    "consumerProperties": {
      "bootstrap.servers": "kf01new.adrd.sohuno.com:8092"
    }
  }
}
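A spec like this is normally submitted as a Kafka supervisor through the Overlord's `POST /druid/indexer/v1/supervisor` endpoint. The Python sketch below builds a trimmed version of the spec and posts it using only the standard library. Several things here are assumptions for illustration: the helper names `build_spec` and `submit_supervisor`, the Overlord address `http://localhost:8090`, and the absence of authentication. The sketch uses `value` for the selector filter and points the longSum metric at the extracted `count` field; adjust the remaining sections for your cluster.

```python
import json
import urllib.request


def build_spec(data_source: str, topic: str, bootstrap_servers: str) -> dict:
    """Hypothetical helper: trimmed Kafka supervisor spec with the same layout as above."""
    count_field = {"type": "path", "name": "count", "expr": "$.content.count"}
    return {
        "type": "kafka",
        "dataSchema": {
            "dataSource": data_source,
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "flattenSpec": {"useFieldDiscovery": False, "fields": ["type", count_field]},
                },
            },
            "transformSpec": {
                "filter": {"type": "selector", "dimension": "type", "value": "segment"},
                "transforms": [],
            },
            "metricsSpec": [{"name": "all_count_num", "fieldName": "count", "type": "longSum"}],
            "granularitySpec": {"type": "uniform", "segmentGranularity": "HOUR",
                                "queryGranularity": "NONE", "rollup": True},
        },
        "ioConfig": {"topic": topic, "consumerProperties": {"bootstrap.servers": bootstrap_servers}},
    }


def submit_supervisor(spec: dict, overlord: str = "http://localhost:8090") -> str:
    """POST the spec to the supervisor endpoint; the Overlord address is an assumption."""
    req = urllib.request.Request(
        overlord + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


spec = build_spec("t_sheriff_transaction_test", "sheriff_transaction_info_m",
                  "kf01new.adrd.sohuno.com:8092")
print(json.dumps(spec, indent=2))
# submit_supervisor(spec)  # only works against a reachable Druid cluster
```

Generating the spec in code rather than hand-editing JSON makes typos such as a misnamed filter property easier to catch before the supervisor rejects or silently misreads the spec.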
