Introduction to Transwarp Slipstream: Practical Applications

Resource Isolation Between Applications

E.g.: when a user works on a stream under a given App, they can only view the StreamJobs under that App; after leaving that App, the StreamJobs of other Apps are still not visible, and only the StreamJobs of the App currently in use can be viewed (see the sketch below).
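A minimal sketch of what this isolation looks like in SQL. CREATE APPLICATION and CREATE STREAMJOB appear later in this article; the USE APPLICATION and SHOW STREAMJOBS statements are assumptions about the Slipstream syntax, not statements verified here:

  -- Two isolated Apps (hypothetical names app_a / app_b)
  CREATE APPLICATION app_a;
  CREATE APPLICATION app_b;
  -- Assumed statement for switching the current App
  USE APPLICATION app_a;
  -- A StreamJob created here belongs to app_a only (job definition borrowed from the access-log example below)
  CREATE STREAMJOB job_in_a AS ("INSERT INTO result SELECT url, COUNT(*) FROM waccesslog GROUP BY url");
  -- After switching to app_b, job_in_a is no longer visible (assumed listing statement)
  USE APPLICATION app_b;
  SHOW STREAMJOBS;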

Statistics on a Stream

Emily receives her first task from her boss: count how many times the website is visited. Assume the source data looks like this:
27.0.1.125,www.transwarp.io/home.html,2016-8-14 20:12:31.132
54.231.66.16,www.transwarp.io/product.html,2016-8-14 20:43:31.213
72.21.203.5,www.transwarp.io/case.html,2016-8-14 20:45:31.132
207.241.224.2,www.transwarp.io/product.html,2016-8-14 20:46:15.540
12.129.206.133,www.transwarp.io/product.html,2016-8-14 20:47:21.332
208.111.148.7,www.transwarp.io/home.html,2016-8-14 20:50:31.876

  • Emily creates a stream and a table:
  CREATE STREAM accesslog(ip STRING, url STRING, time TIMESTAMP)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES("topic"="accesslog", "kafka.zookeeper"="172.16.1.128:2181", "kafka.broker.list"="172.16.1.128:9092");
Count, by system time, how many visits each url receives every 10 seconds:
  CREATE TABLE result(url STRING, count INT);
  CREATE STREAM waccesslog AS SELECT * FROM accesslog STREAMWINDOW (LENGTH '10' SECOND SLIDE '10' SECOND);
  INSERT INTO result SELECT url, COUNT(*) FROM waccesslog GROUP BY url;
  • Count, by message time, how many visits each url receives every 10 seconds:
  CREATE TABLE result2(url STRING, count INT);
  CREATE STREAM accesslog2(ip STRING, url STRING, time TIMESTAMP)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES("topic"="accesslog", "kafka.zookeeper"="172.16.1.128:2181", "kafka.broker.list"="172.16.1.128:9092",
  "timefield"="time", "timeformat"="yyyy-MM-dd HH:mm:ss.SSS");
  CREATE STREAM waccesslog2 AS SELECT * FROM accesslog2 STREAMWINDOW sw AS (SEPARATED BY time LENGTH '10' SECOND SLIDE '10' SECOND);
  INSERT INTO result2 SELECT url, COUNT(*) FROM waccesslog2 GROUP BY url;
  • Multi-stream join split by a time field
    Since building on Slipstream is fast, Emily gets a new task: migrate the existing SQL, splitting two streams and a dimension table by the time field in the data. There are two streams, futures prices (qihuo) and spot prices (xianhuo), each with three fields: ID, time, and price. For example:
    qihuo
    2016-8-14 20:50:00,1,50.20
    2016-8-14 20:50:00,2,65.40
    2016-8-14 20:50:00,3,31.30
    2016-8-14 20:50:01,1,50.80
    2016-8-14 20:50:01,2,65.10
    2016-8-14 20:50:01,3,29.90

    xianhuo
    2016-8-14 20:50:00,1,55.10
    2016-8-14 20:50:00,2,67.20
    2016-8-14 20:50:00,3,33.10
    2016-8-14 20:50:01,1,55.20
    2016-8-14 20:50:01,2,66.70
    2016-8-14 20:50:01,3,30.30
    Emily first creates the two streams:
  CREATE STREAM qihuo2(timestamp STRING, id STRING, price DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES('topic'='qihuo', 'timefield'='timestamp', 'timeformat'='yyyy-MM-dd HH:mm:ss',
  "kafka.zookeeper"="tw-node127:2181", "kafka.broker.list"="172.16.1.128:9092");
  CREATE STREAM xianhuo2(timestamp STRING, id STRING, price DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES('topic'='xianhuo', 'timefield'='timestamp', 'timeformat'='yyyy-MM-dd HH:mm:ss',
  "kafka.zookeeper"="tw-node127:2181", "kafka.broker.list"="172.16.1.128:9092");

Join the two streams on message time and compute the difference between the spot price and the futures price, producing a data stream whose fields are (time, ID, spot-futures difference):

  CREATE STREAM transform_stream2 AS
    SELECT
      qihuo.timestamp,
      qihuo.id,
      (xianhuo.price - qihuo.price) AS diff
    FROM
      qihuo2 qihuo JOIN xianhuo2 xianhuo
        ON
        qihuo.timestamp=xianhuo.timestamp
        AND
        qihuo.id=xianhuo.id;

On the stream above, take a window 3 minutes long every 1 second, turning it into a "last 3 minutes" stream:


 CREATE STREAM window_diff_stream AS
    SELECT * FROM transform_stream2
      STREAMWINDOW w1 AS (LENGTH '3' MINUTE SLIDE '1' SECOND);

Finally, within the last 3 minutes, compute for each ID the maximum and minimum of the spot-futures difference, as well as the gap between them:


  CREATE VIEW max_min_diff_window_stream AS
  SELECT maxTable.id, maxTable.diff AS maxDiff, maxTable.timestamp AS maxTime,
         minTable.diff AS minDiff, minTable.timestamp AS minTime,
         (maxTable.diff - minTable.diff) AS maxMinDiff, minTable.maxtime AS calTime
  FROM
    (SELECT id, timestamp, diff, maxtime
     FROM (SELECT id, timestamp, diff,
                  row_number() OVER (PARTITION BY id ORDER BY diff) AS minDiff,
                  max(timestamp) OVER (PARTITION BY id) AS maxtime
           FROM window_diff_stream) t
     WHERE minDiff=1) minTable
  JOIN
    (SELECT id, timestamp, diff
     FROM (SELECT id, timestamp, diff,
                  row_number() OVER (PARTITION BY id ORDER BY diff DESC) AS maxDiff
           FROM window_diff_stream) t
     WHERE maxDiff=1) maxTable
  ON minTable.id = maxTable.id;

Emily then triggers the computation:

  CREATE TABLE result(maxTableId STRING, maxDiff DOUBLE, maxTime STRING, minDiff DOUBLE,
                      minTime STRING, maxMinDiff DOUBLE, calTime STRING);
  SET streamsql.use.eventtime=true;
  INSERT INTO result SELECT * FROM max_min_diff_window_stream;
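
For a quick sanity check of the output (an illustrative query, not part of the original walkthrough), the result table can be queried like any other Inceptor table:

  -- Illustrative check: show the IDs with the largest max-min spread first
  SELECT maxTableId, maxMinDiff, calTime FROM result ORDER BY maxMinDiff DESC;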

PL/SQL on Streams

When her boss learned that Slipstream also supports PL/SQL, he asked her to investigate whether the existing PL/SQL code could be migrated to Slipstream. Besides the futures and spot streams above, there is now a transaction stream containing ID, time, and transaction volume, and Emily must accumulate, for each ID, the transaction volume from market open up to the current time. A second requirement is that whenever maxmindiff in the max_min_diff_window_stream stream is greater than or equal to 50, an alert is triggered: the top 20 records by accumulated value are selected from day_sum and inserted into the warning table:

  CREATE STREAM transaction_stream(timestamp STRING, id STRING, transaction DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  TBLPROPERTIES('topic'='transaction', "kafka.zookeeper"="tw-node127:2181");
  CREATE TABLE day_sum(id STRING, sd STRING, total DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  CREATE TABLE warm_transaction(id STRING, timestamp STRING, total DOUBLE);
  CREATE VIEW transaction_sum AS SELECT id, timestamp, sum(s.transaction) total
  FROM transaction_stream s GROUP BY id, timestamp;
  SET plsql.show.sqlresults=true;
  SET stream.enabled=true;
  SET stream.tables=qihuo2,xianhuo2,transaction_stream;
DECLARE
  threshold_count int := 0;
BEGIN
  INSERT INTO day_sum
  SELECT id, sd, CASE WHEN isnull(total2) THEN total1 ELSE (total1 + total2) END total
  FROM (SELECT t1.id id, t1.timestamp sd, t1.total total1, t2.total total2
        FROM transaction_sum t1 LEFT JOIN day_sum t2
        ON t1.id=t2.id AND (to_unix_timestamp(t2.sd, 'yyyy-MM-dd HH:mm:ss') + 1)=unix_timestamp(t1.timestamp, 'yyyy-MM-dd HH:mm:ss')) tmp;
  SELECT count(*) INTO threshold_count FROM max_min_diff_window_stream WHERE maxmindiff >= 50;
  IF threshold_count > 0 THEN
    INSERT INTO warm_transaction SELECT id, sd, total FROM day_sum ORDER BY total DESC LIMIT 20;
  END IF;
END;

Persisting StreamJobs

Emily can now use Slipstream proficiently, but one day, after she had configured a number of settings, the application terminated unexpectedly, so she started using StreamJobs for persistence:

  CREATE STREAMJOB access_log_streamjob AS ("INSERT INTO result SELECT url, COUNT(*) FROM waccesslog GROUP BY url");
  START STREAMJOB access_log_streamjob;
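
A persisted job can later be managed by name. The statements below are only a sketch under the assumption that Slipstream offers SHOW STREAMJOBS and STOP STREAMJOB commands; check the Slipstream manual for the exact syntax:

  -- Assumed management statements, referring to the job created above
  SHOW STREAMJOBS;
  STOP STREAMJOB access_log_streamjob;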

Email Alerts

Configure /etc/inceptor1/conf/alert4j.properties on every machine:

  alert4j.service=email
  email.server.host=smtp.exmail.qq.com
  email.server.port=465
  email.server.ssl=true
  email.validate=true
  email.sender.username=prod@xxx.io
  email.sender.password=password
  email.from.address=prod@xxx.io
  email.to.addresses=stream-alert@xxx.io

When a problem occurs in the system, an email containing the key information about the problem is sent to stream-alert@xxx.io.

High Availability

Configure an additional InceptorServer through the Transwarp Manager interface, set its role to Standby, and start it once the configuration is done.
Use checkpointing and the WAL to guarantee that no data is lost:
Step 1: through the Transwarp Manager interface, modify the InceptorServer configuration by adding the "stream.driver.checkpoint.dir" item and setting it to a directory on HDFS. Restart Inceptor after the change.
Step 2: when creating an application, specify the application's checkpoint directory, also a directory on HDFS:

  CREATE APPLICATION app1 WITH APPPROPERTIES("application.checkpoint.dir"="/tmp/app1/", "application.enable.wal"="true");
Or, for an existing application, set the two properties:
  ALTER APPLICATION app1 SET APPPROPERTIES("application.checkpoint.dir"="/tmp/app1/");
  ALTER APPLICATION app1 SET APPPROPERTIES("application.enable.wal"="true");

Holodesk Configuration

Emily's boss tells her that the website visit statistics have proved very useful and that several other teams are using the results. However, because the data currently lives in a Hive table, queries are too slow, and he asks her to find a way to speed them up. Emily remembers that the Holodesk component is designed for fast interactive analysis, so she switches the result table to Holodesk and sets the related parameters:

  CREATE TABLE holo(url STRING, count INT) STORED AS HOLODESK
  TBLPROPERTIES("holodesk.window.length"="100000", "holodesk.window.timeout"="100000");
  CREATE STREAM w_accesslog AS SELECT * FROM accesslog STREAMWINDOW sw AS (LENGTH '10' SECOND SLIDE '10' SECOND)
  TBLPROPERTIES("kafka.zookeeper"="tw-node127:2181");
  INSERT INTO holo SELECT url, COUNT(*) FROM w_accesslog GROUP BY url;
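
For illustration (this query is not part of the original walkthrough), the other teams can then read the aggregated counts directly from the Holodesk table:

  -- Illustrative downstream query against the Holodesk result table
  SELECT url, count FROM holo ORDER BY count DESC;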

Summary of Slipstream Parameters

Parameters are set with SET <parameter>=<value>;
[Table of Slipstream parameters: the original image is not available.]
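
As a reminder, these are the parameters that already appeared in this article (a partial list, not the full table from the original image; the inline notes are brief interpretations of how they are used above):

  SET streamsql.use.eventtime=true;                       -- trigger windows by message (event) time
  SET stream.enabled=true;                                -- run the following PL/SQL statements in stream mode
  SET stream.tables=qihuo2,xianhuo2,transaction_stream;   -- streams involved in the PL/SQL job
  SET plsql.show.sqlresults=true;                         -- show SQL results inside PL/SQL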
