Apache Druid 实践及案例分享

数据摄取

Druid 支持流式批量两种方式的数据摄入,针对不同类型的数据,Druid 将外部数据源分为两种形式:

① 流式数据源(流式)

指的是持续不断地生产数据的数据源。例如:消息队列、日志、文件等

② 静态数据源(批量)

指的是数据已经生产完毕,不会有新数据产生的数据源,例如:文件系统的文件

流式数据可以通过两种方式来摄入:本地文件和远程文件

静态数据源数据摄取(本地文件)

案例:将本地文件的数据加载到 Druid 中,然后进行 SQL 操作

① 本地文件摄取,创建本地文件

mkdir -p /opt/server/druid/

# 创建本地文件
vim /opt/server/druid/stock-trade.json

# 添加以下内容到文件中
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"20"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"白云机场","platform":"pc","trade_vol":"10"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"5"}
{"timestamp":"2019-10-01T05:01.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"1"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"白云机场","platform":"pc","trade_vol":"8"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"浦发银行","platform":"mobile","trade_vol":"7"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"6"}
{"timestamp":"2019-10-01T03:01.00Z","sec_code":"东风汽车","platform":"pc","trade_vol":"5"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"7"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"白云机场","platform":"pc","trade_vol":"8"}
{"timestamp":"2019-10-01T05:03.00Z","sec_code":"浦发银行","platform":"mobile","trade_vol":"9"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"东风汽车","platform":"pc","trade_vol":"23"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"5"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"白云机场","platform":"pc","trade_vol":"1"}
{"timestamp":"2019-10-01T03:03.00Z","sec_code":"浦发银行","platform":"mobile","trade_vol":"1"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"东风汽车","platform":"pc","trade_vol":"2"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"3"}
{"timestamp":"2019-10-01T05:01.00Z","sec_code":"浦发银行","platform":"pc","trade_vol":"5"}
{"timestamp":"2019-10-01T01:03.00Z","sec_code":"浦发银行","platform":"mobile","trade_vol":"6"}
{"timestamp":"2019-10-01T01:01.00Z","sec_code":"东风汽车","platform":"pc","trade_vol":"3"}

② 将 stock_trade.json 分发到其他 Druid 节点相同的目录

注意:数据要保持在各个节点都要有,否则会出现任务执行成功,但是没有 DataSource 的情况

# 在 node1 操作
scp /opt/server/druid/stock-trade.json node2:$PWD
scp /opt/server/druid/stock-trade.json node3:$PWD

③ 发送创建 DataSource 的请求

{
  "spec": {
  	  "dataSchema": {
	    "dataSource": "stock_trade",
	    "parser": {
	      "type": "String",
	      "parseSpec": {
	        "format": "json",
	        "dimensionsSpec": {
	          "dimensions": [
	            "sec_code",
	            "platform"
	          ]
	        },
	        "timestampSpec": {
	          "column": "timestamp",
	          "format": "auto"
	        }
	      }
	    },
	    "metricsSpec": [
	      {
	        "type": "count",
	        "name": "count"
	      },
	      {
	        "type": "longSum",
	        "name": "trade_vol",
	        "fieldName": "trade_vol",
	        "expression": null
	      }
	    ],
	    "granularitySpec": {
	      "type": "uniform",
	      "segmentGranularity": "DAY",
	      "queryGranularity": "HOUR",
	      "rollup": true,
	      "intervals": [
	        "2019-10-01T00:00:00.000Z/2019-10-03T00:00:00.000Z"
	      ]
	    },
	    "transformSpec": {
	      "filter": null,
	      "transforms": []
	    }
	  },
	  "ioConfig": {
	    "type": "index",
	    "firehose": {
	      "type": "local",
	      "baseDir": "/opt/server/druid",
	      "filter": "stock-trade.json",
	      "parser": null
	    },
	    "appendToExisting": false
	  },
	  "tuningConfig": {
	    "type": "index",
	    "maxRowsPerSegment": null,
	    "maxRowsInMemory": 1000000,
	    "maxBytesInMemory": 0,
	    "maxTotalRows": null,
	    "numShards": null,
	    "partitionDimensions": [],
	    "indexSpec": {
	      "bitmap": {
	        "type": "concise"
	      },
	      "dimensionCompression": "lz4",
	      "metricCompression": "lz4",
	      "longEncoding": "longs"
	    },
	    "maxPendingPersists": 0,
	    "buildV9Directly": true,
	    "forceGuaranteedRollup": false,
	    "reportParseExceptions": false,
	    "pushTimeout": 0,
	    "segmentWriteOutMediumFactory": null,
	    "logParseExceptions": false,
	    "maxParseExceptions": 2147483647,
	    "maxSavedParseExceptions": 0
	  }
  },
  "type": "index"
}

在这里插入图片描述
④ 查看 DataSource
在这里插入图片描述
在这里插入图片描述
⑤ SQL 查询

-- 查看各个股票在各个平台的成交总量
SELECT
  sec_code,
  platform,
  SUM(trade_vol) AS total_trade_vol
FROM "stock_trade"
WHERE TIME_FORMAT("__time", 'yyyyMMdd') = '20191001'
GROUP BY sec_code, platform

在这里插入图片描述

静态数据源数据摄取(HDFS文件)

案例:加载 HDFS 上的数据到 Druid 中,然后进行 SQL 操作

# 执行原理
1. Druid 使用 HadoopDruidIndexer 加载批量数据,将数据生成 segments 文件,存放在HDFS上
2. 从 HDFS 下载 segments 文件到本地
3. 然后遍可从Druid中查询数据

① 创建文件 wiki-visit-log.json 文件并上传到 HDFS 中

vim wiki-visit-log.json

# 上传到 HDFS 中
hdfs dfs -put wiki-visit-log.json /tmp/logs
{"time":"2015-09-12T00:47:00.496Z","channel":"#ca.wikipedia","cityName":null,"comment":"Robot inserta {{Commonscat}} que enllaça amb [[commons:category:Rallicula]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Rallicula","regionIsoCode":null,"regionName":null,"user":"PereBot","delta":17,"added":17,"deleted":0}
{"time":"2015-09-12T00:47:05.474Z","channel":"#en.wikipedia","cityName":"Auburn","comment":"/* Status of peremptory norms under international law */ fixed spelling of 'Wimbledon'","countryIsoCode":"AU","countryName":"Australia","isAnonymous":true,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Peremptory norm","regionIsoCode":"NSW","regionName":"New South Wales","user":"60.225.66.142","delta":0,"added":0,"deleted":0}
{"time":"2015-09-12T00:47:08.770Z","channel":"#vi.wikipedia","cityName":null,"comment":"fix Lỗi CS1: ngày tháng","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Apamea abruzzorum","regionIsoCode":null,"regionName":null,"user":"Cheers!-bot","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:11.862Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Atractus flammigerus","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:13.987Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Agama mossambica","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:17.009Z","channel":"#ca.wikipedia","cityName":null,"comment":"/* Imperi Austrohongarès */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Campanya dels Balcans (1914-1918)","regionIsoCode":null,"regionName":null,"user":"Jaumellecha","delta":-20,"added":0,"deleted":20}

在这里插入图片描述

流式数据源数据摄取(Kafka)

① 创建 topic

cd /opt/server/kafka_2.12-2.4.1
bin/kafka-topics.sh --create --zookeeper node1:2181 --partitions 1 --replication-factor 1 --topic metrics

② 提交索引任务

③ 通过 WEB-UI 界面创建任务在这里插入图片描述
④ 启动 Kafka 生产者

# 启动
bin/kafka-topics.sh --create --zookeeper node1:2181 --partitions 1 --replication-factor 1 --topic metrics

# 添加 1 条数据
{"time":"2019-07-23T17:57:58Z","url":"/foo/bar","user":"alice","latencyMs":32}

⑤ 配置任务信息

Connect
在这里插入图片描述
Parse Data
在这里插入图片描述

ParseTime

在这里插入图片描述
Transform
在这里插入图片描述
Filter
在这里插入图片描述
Configure Schema
在这里插入图片描述
Partition
在这里插入图片描述
Tune
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
⑥ 任务提交后查看
在这里插入图片描述
在这里插入图片描述
⑦ SQL 查询
在这里插入图片描述

  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值