Real-Time OLAP Analysis with Druid
Development Environment Setup
Starting Druid
- 1. node1 (start the overlord and coordinator, using an external ZooKeeper instead of the one bundled with Imply)
# Start the overlord and coordinator with an external ZooKeeper instead of Imply's bundled one
/export/servers/imply-3.0.4/bin/supervise -c /export/servers/imply-3.0.4/conf/supervise/master-no-zk.conf
- 2. node2 (start the historical and middleManager)
/export/servers/imply-3.0.4/bin/supervise -c /export/servers/imply-3.0.4/conf/supervise/data.conf
- 3. node3 (start the broker and router)
/export/servers/imply-3.0.4/bin/supervise -c /export/servers/imply-3.0.4/conf/supervise/query.conf
Accessing the Web UIs
Component | URL |
---|---|
broker | http://node3:8888 |
coordinator, overlord | http://node1:8081/index.html |
middleManager, historical | http://node1:8090/console.html |
Clickstream Log Metrics
- Steps:
- 1. Open Postman
- 2. Edit the Kafka real-time ingestion spec
  - Open the data file Druid实时数据分析项目_配置文件\index_kafka_dws_click_log.json
  - Update the Kafka cluster address
  - Update the topic name
- 3. Paste the JSON spec into Postman
- 4. Send the request to http://node1:8090/druid/indexer/v1/supervisor
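As an alternative to Postman, the same spec can be submitted from the shell with curl (a sketch; assumes the JSON spec file is in the current directory and node1:8090 is reachable):

```shell
# POST the Kafka ingestion spec to the Overlord's supervisor endpoint
curl -X POST \
  -H 'Content-Type: application/json' \
  -d @index_kafka_dws_click_log.json \
  http://node1:8090/druid/indexer/v1/supervisor
```

A successful submission returns the supervisor id as JSON, and the supervisor then appears in the Overlord console.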
Daily PV Analysis
SELECT
SUM("count") as totalPV
FROM
"dws_click_log"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2010-09-05'
Daily UV Analysis
SELECT
COUNT(DISTINCT "uid") as totalUV
FROM
"dws_click_log"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2010-09-05'
Daily Distinct IP Analysis
SELECT
COUNT(DISTINCT "ip") as totalIP
FROM
"dws_click_log"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2010-09-05'
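The three daily metrics above can also be computed in a single scan (a sketch against the same dws_click_log schema):

```sql
SELECT
SUM("count") as totalPV,
COUNT(DISTINCT "uid") as totalUV,
COUNT(DISTINCT "ip") as totalIP
FROM "dws_click_log"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2010-09-05'
```
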
Daily Traffic Share by Referrer (Baidu, Zhihu, Sina, homepage, …)
SELECT
referDomain,
SUM("count") as total_count
FROM
"dws_click_log"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2010-09-05'
GROUP BY 1
Daily Traffic Share by City
SELECT
province,
city,
SUM("count") as total_count
FROM
"dws_click_log"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2010-09-05'
GROUP BY province,city
Order Count Metrics
- Steps:
- 1. Open Postman
- 2. Edit the Kafka real-time ingestion spec
  - Open the data file Druid实时数据分析项目_配置文件\index_kafka_dws_order.json
  - Update the Kafka cluster address
  - Update the topic name
- 3. Paste the JSON spec into Postman
- 4. Send the request to http://node1:8090/druid/indexer/v1/supervisor
Daily Order Count
SELECT
SUM("count") as total_count
FROM
"dws_order"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06'
Weekly Order Count (Last 7 Days)
SELECT
SUM("count") as total_count
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
Monthly Order Count (Last Month)
SELECT
SUM("count") as total_count
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '1' MONTH
Today's Orders by Region (map)
SELECT
areaId,
SUM("count") as total_count
FROM
"dws_order"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06'
GROUP BY 1
ORDER BY 2 DESC
Weekly Order Trend
- Daily order counts over the past week
SELECT
TIME_FORMAT("__time", 'yyyy-MM-dd') as "date",
SUM("count") as total_count
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY 1
Today's Top 8 Regions by Order Count
- Group by region
- Sort by order count in descending order
- Take the first 8 rows
SELECT
areaId,
SUM("count") as total_count
FROM
"dws_order"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 8
Week-over-Week Sales Comparison
- Get last week's daily sales totals
- Get this week's daily sales totals
-- Last week
SELECT
'last week' as "week",
TIME_FORMAT("__time", 'yyyy-MM-dd') as "date1",
SUM("totalMoney") as total_money
FROM
"dws_order"
WHERE
"__time" BETWEEN (CURRENT_TIMESTAMP - INTERVAL '14' DAY) AND (CURRENT_TIMESTAMP - INTERVAL '7' DAY)
GROUP BY 1,2
UNION ALL
SELECT
'this week' as "week",
TIME_FORMAT("__time", 'yyyy-MM-dd') as "date1",
SUM("totalMoney") as total_money
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY 1,2;
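The two result sets above can also be collapsed into a single week-over-week ratio (a sketch against the same dws_order schema; multiplying by 1.0 forces floating-point division):

```sql
SELECT
SUM(case when "__time" >= CURRENT_TIMESTAMP - INTERVAL '7' DAY then "totalMoney" else 0 end) * 1.0
/ SUM(case when "__time" < CURRENT_TIMESTAMP - INTERVAL '7' DAY then "totalMoney" else 0 end) as wow_ratio
FROM "dws_order"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
```

A wow_ratio above 1 means this week's sales exceed last week's.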
Hourly Sales over the Last 24 Hours
SELECT
TIME_FORMAT("__time", 'HH') as "hour",
SUM("totalMoney") as "totalMoney"
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
Today's Top 4 Regions by Sales
SELECT
areaId,
SUM("totalMoney") as "totalMoney"
FROM
"dws_order"
WHERE
"__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 2 DESC
LIMIT 4
Daily Paying Buyers
SELECT
COUNT(DISTINCT "userId") as "totalCount"
FROM
"dws_order"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06' and isPay=1
Daily Cart-to-Payment Conversion Rate
SELECT
SUM("count") as "totalCount", --total number of paid orders
SUM(case when isFromCart=1 then "count" else 0 end) as "cartTotalCount" --paid orders placed from the cart
FROM
"dws_order"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06' and isPay=1
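The rate itself can be computed in one step (a sketch; assumes isFromCart=1 marks orders placed from the cart, and multiplies by 1.0 to force floating-point division):

```sql
SELECT
SUM(case when isFromCart=1 then "count" else 0 end) * 1.0 / SUM("count") as "cartPayRate"
FROM "dws_order"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-01-06' and isPay=1
```
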
Product Message Metrics
- Steps:
- 1. Open Postman
- 2. Edit the Kafka real-time ingestion spec
  - Open the data file Druid实时数据分析项目_配置文件\index_kafka_dws_goods.json
  - Update the Kafka cluster address
  - Update the topic name
- 3. Paste the JSON spec into Postman
- 4. Send the request to http://node1:8090/druid/indexer/v1/supervisor
Daily Product Count per Shop
SELECT
shopId,
COUNT(DISTINCT "goodsId") as total_count
FROM
"dws_goods"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2019-02-23'
GROUP BY shopId
Daily Brand Count per Shop
SELECT
shopId,
COUNT(DISTINCT "brandId") as total_count
FROM
"dws_goods"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2019-02-23'
GROUP BY shopId
Daily Newly Listed Products per Shop
SELECT
shopId,
COUNT(DISTINCT "goodsId") as total_count
FROM
"dws_goods"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2019-02-23' and isSale=1
GROUP BY shopId
Cart Message Metrics
- Steps:
- 1. Open Postman
- 2. Edit the Kafka real-time ingestion spec
  - Open the data file Druid实时数据分析项目_配置文件\index_kafka_dws_cart.json
  - Update the Kafka cluster address
  - Update the topic name
- 3. Paste the JSON spec into Postman
- 4. Send the request to http://node1:8090/druid/indexer/v1/supervisor
Daily Add-to-Cart Count
SELECT
SUM("count") as total_count
FROM
"dws_cart"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2019-12-16'
Daily Buyers Adding to Cart
SELECT
COUNT(DISTINCT "userId") as total_count
FROM
"dws_cart"
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') ='2019-12-16'
Daily Number of Goods Added to Cart
SELECT
SUM("totalGoods") as "totalCount"
FROM
"dws_cart"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-12-16'
Comment Metrics
- Steps:
- 1. Open Postman
- 2. Edit the Kafka real-time ingestion spec
  - Open the data file Druid实时数据分析项目_配置文件\index_kafka_dws_comments.json
  - Update the Kafka cluster address
  - Update the topic name
- 3. Paste the JSON spec into Postman
- 4. Send the request to http://node1:8090/druid/indexer/v1/supervisor
Daily Review Count per Buyer
SELECT
userId,
SUM("count") as totalCount
FROM dws_comments
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-12-06'
GROUP BY userId
Daily Sellers Reviewed per Buyer
SELECT
userId,
shopId,
SUM("count") as totalCount
FROM dws_comments
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-12-06'
GROUP BY 1,2
Daily Positive-Review Rate per Buyer
SELECT
userId,
SUM("count") as totalCount,
SUM(case when starScore > 3 then "count" else 0 end) as goodCount
FROM dws_comments
WHERE
TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-12-06'
GROUP BY userId
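The rate itself (positive reviews over total reviews) can be computed directly (a sketch; assumes a starScore above 3 counts as positive, and multiplies by 1.0 to force floating-point division):

```sql
SELECT
userId,
SUM(case when starScore > 3 then "count" else 0 end) * 1.0 / SUM("count") as goodRate
FROM dws_comments
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2019-12-06'
GROUP BY userId
```
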