- Blog (113)
Original: HBase Basic Commands
HBase. Create a table: create 'student','info'
List all tables: list
Describe a table's structure: desc 'student'
Alter the table structure: alter 'student',{NAME=>'info',VERSIONS=>'3'}
Add a column family: alter 'student','msg'
Delete a column family: alter 'student',NAME=>'msg...
2020-06-18 19:47:33 258 1
Original: Importing and Exporting Data with Sqoop
Importing data from an RDBMS (a traditional relational database such as MySQL) into HDFS. Full import:
$ bin/sqoop import \
  --connect jdbc:mysql://hadoop102:3306/company \
  --username root \
  --password 000000 \
  --table staff \
  --target-dir /user/company \
  ...
2020-06-18 19:45:24 186
Original: A Custom Flume Source That Connects to MySQL
public class MysqlSource extends AbstractSource implements Configurable, PollableSource {
    private MySQLSourceHelper mySQLSourceHelper;
    private int queryDelay;
    @Override
    public Status process() throws EventDeliveryException {
        tr...
2020-06-18 19:37:41 358
Reposted: A Custom Flume Source That Prints in a Loop
public class MySource extends AbstractSource implements Configurable, PollableSource {
    // prefix parameter
    private String prefix;
    // suffix parameter
    private String suffix;
    // delay parameter
    private Long delay;
    // data-generation count parameter
    private int n;
    public Status proce...
2020-06-18 19:36:31 244
Original: Flume Consuming Kafka Messages (Can Be Used Together with a Consumer), Sinking to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = flumekafka
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /
2020-06-18 19:35:31 173
Original: Flume Tailing Appended Files Under a Directory, Sinking to Kafka
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /opt/module/logs/.*file.*
a1.sources.r1.filegroups.f.
2020-06-18 19:34:11 387
Original: Flume Real-Time Monitoring of Multiple Appended Files in a Directory
a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = TAILDIR
a3.sources.r3.positionFile = /opt/module/flume/tail_dir.json
a3.sources.r3.filegroups = f1 f2
a3.sources.r3.filegroups.f1 = /opt/module/flume/files
2020-06-18 19:32:51 546
Original: Flume Real-Time Monitoring of Multiple New Files in a Directory
a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/module/flume/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore all files ending in .tmp; do not upload them
a3.sou
2020-06-18 19:30:56 366
Original: Flume Real-Time Monitoring of a Single Appended File
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/datas/A.log
a2.sources.r2.shell = /bin/bash -c
# Describe the sink
a2
2020-06-18 19:30:03 236
Original: Flume Configuration for Monitoring Port Data
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use
2020-06-18 19:28:48 210
Original: Processing URL Data, Classified into /Local and /WEB
object Test5 { System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local") .appName("test2") .getOrCreate() ...
2020-06-09 16:47:32 209
Original: Smart Traffic Metrics Analysis (Real-Time Lookup of Monitoring Records for Blacklisted Vehicles)
1. Query the blacklisted vehicles from the database:
// fetch the blacklist data
val df = spark
  .read
  .format("jdbc")
  .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "blackname")
  .option("u...
2020-06-08 20:28:05 570
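The JDBC read above is cut off at the user option; a minimal sketch of how the full option chain typically looks, where the user and password values are placeholders rather than the post's actual credentials:

val blacklist = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "blackname")
  .option("user", "root")        // placeholder credential
  .option("password", "123456")   // placeholder credential
  .load()
blacklist.show()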
Original: Smart Traffic Metrics Analysis (Computing Road Conversion Rates)
Get each vehicle's travel trajectory:
1 1
1 2
1 3
1 1
2 1
2 3
2 4
================================
(1,[1,2,3,1])
(2,[1,3,4])
concat_ws('|',collect_list(road_id))
(1,1|2|3|1)
(2,1|3|4)
val sql1 = "select car,concat_ws(',',collect_list(road_id)) roads from...
2020-06-08 20:26:31 307
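A minimal sketch of the trajectory aggregation described above, using sample data matching the example; the view name flow is hypothetical. Note that collect_list does not guarantee ordering, so the post implicitly assumes the rows arrive already ordered by time:

import org.apache.spark.sql.SparkSession

object RoadTrajectorySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("trajectory").getOrCreate()
    import spark.implicits._

    // sample (car, road_id) records matching the example data above
    val flow = Seq(("1", "1"), ("1", "2"), ("1", "3"), ("1", "1"),
                   ("2", "1"), ("2", "3"), ("2", "4")).toDF("car", "road_id")
    flow.createOrReplaceTempView("flow")

    // collapse each vehicle's roads into one delimited trajectory string
    spark.sql(
      "select car, concat_ws(',', collect_list(road_id)) roads from flow group by car"
    ).show()
  }
}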
Original: Smart Traffic Metrics Analysis (Checkpoint Monitoring)
1. From the monitoring log table, get the number of cameras at each checkpoint. Query the checkpoints with no passing vehicles (set aside for now):
select * from traffic.monitor_camera_info where monitor_id not in (
  select monitor_id from traffic.monitor_flow_action
)
val sql1 = "select monitor_id,count(distinct camera_id) cameraCnt,count(0) carCnt from traffic..
2020-06-08 20:24:58 1088
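The "no passing vehicles" query above can be issued directly through spark.sql; a minimal spark-shell-style sketch, assuming a Hive-enabled session and the traffic tables named in the post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("monitor").enableHiveSupport().getOrCreate()
// checkpoints that appear in the camera table but never in the flow (passing-vehicle) table
spark.sql(
  """select * from traffic.monitor_camera_info
    |where monitor_id not in (
    |  select monitor_id from traffic.monitor_flow_action
    |)""".stripMargin).show()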
Original: Smart Traffic Metrics Analysis (Randomly Sampling Vehicles per Hour per Day)
Traffic volume and flow information for each hour of each day:
val monitorFlowAction = spark.sql("select * from traffic.monitor_flow_action")
// filter the queried data once to drop dirty records
var carFilterRDD = monitorFlowAction.rdd.filter(_.size == 8)
/**
 * Split the timestamp into time buckets: (date hour, car_id)
 */
var carMapRDD = carFilterRDD.map(x =>..
2020-06-08 20:21:55 649
Original: Smart Traffic Metrics Analysis (Top 5 Checkpoints by High-Speed Vehicle Passage)
Group by checkpoint and get the number of vehicles passing through in each speed category:
val sql = "select * from traffic.monitor_flow_action"
val df = spark.sql(sql)
implicit val monitorFlowActionEncoder: Encoder[MonitorFlowAction] = ExpressionEncoder()
implicit val tupleEncoder: Encoder[Tuple2[String, Moni...
2020-06-08 20:19:54 970
Original: Smart Traffic Metrics Analysis (Top 10 Roads by Traffic in Each Area, with Their Traffic and per-Checkpoint Traffic)
Traffic on each road in each area:
SELECT area_id,road_id,COUNT(car) AS carNum1
FROM traffic.monitor_flow_action
GROUP BY area_id,road_id
Top-10 roads by traffic in each area, with their traffic:
SELECT area_id,road_id,carNum1
FROM (
  SELECT area_id,road_id,carNum1,
  ROW_NUMBER() OVER (PARTITION BY area_i...
2020-06-08 20:18:16 2374
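The ROW_NUMBER query is truncated above; a complete version of the same pattern (spark-shell style), where the rank <= 10 filter is an assumption about the post's intent:

val top10 = spark.sql(
  """SELECT area_id, road_id, carNum1
    |FROM (
    |  SELECT area_id, road_id, carNum1,
    |         ROW_NUMBER() OVER (PARTITION BY area_id ORDER BY carNum1 DESC) rn
    |  FROM (
    |    SELECT area_id, road_id, COUNT(car) AS carNum1
    |    FROM traffic.monitor_flow_action
    |    GROUP BY area_id, road_id
    |  ) t1
    |) t2
    |WHERE rn <= 10""".stripMargin)
top10.show()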
Original: Simulated District Road Monitoring (Kafka Consumer for the Data)
object Test2 { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster("local[2]").setAppName("monitor_action_test2") val sc = new SparkContext(conf) sc.setLogLevel("WARN") val ssc = new StreamingContext(sc, Seconds(5...
2020-06-08 20:16:23 212 1
Original: Simulated District Road Monitoring (Filtering the Data with SQL)
object Test1 { def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("test1").getOrCreate() spark.sparkContext.setLogLevel("WARN")// val file = "d://data.txt" val file = args(0) val df = s...
2020-06-08 20:15:29 209
Original: Simulated District Road Monitoring (Data Source)
object MockData {
  /**
   * Get a random number with n digits
   *
   * @param index number of digits
   * @param random
   * @return
   */
  def randomNum(index: Int, random: Random): String = {
    var str = ""
    for (i <- 0 until index) {
      str += random.nextInt(10)
    }
    ...
2020-06-08 20:14:43 206
Original: Spark Streaming, nc, and Flume
Install nc: sudo yum install -y nc
Start nc listening on an arbitrary port: nc -lk 9999
Remove nc: yum erase nc -y
Flume monitoring a directory: spooldir
Flume monitoring a file: exec
Spark Streaming integrated with Flume, pull mode (Poll), config file flume-poll.conf:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
...
2020-06-03 13:09:11 104
Original: Word Count with Spark Streaming Reading from Kafka Using Receivers (Kafka 0.8 API)
//object SparkStreamingKafkaReceiverCheckpoint {
//
//  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
//  def updateFunc(a:Seq[Int], b:Option[Int]) :Option[Int] ={
//    Some(a.sum+b.getOrElse(0))
//  }
//  def main(args: Array[String]...
2020-06-03 13:07:51 236
Original: Spark Streaming Reading Kafka Messages with the Low-Level Direct API (Kafka 0.8 API)
//object SparkStreamingKafka10DirectCheckpoint {
//  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
//  def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = {
//    Some(a.sum + b.getOrElse(0))
//  }
//
//  def main(args: Array[St...
2020-06-03 13:07:20 110
Original: Spark Streaming Reading Kafka Messages with the Low-Level Direct API (Kafka 0.10 API)
object SparkStreamingKafkaDirectCheckpoint { System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2") def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = { Some(a.sum + b.getOrElse(0)) } def main(args: Array[String]): Unit ...
2020-06-03 13:06:49 110
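A minimal runnable sketch of the 0.10 direct API named in the title, in the standard spark-streaming-kafka-0-10 style; the broker address and topic name wordcount are assumptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaDirect10Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("kafka-direct-10")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",   // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "wordcount-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("wordcount"), kafkaParams))

    // classic word count over the message values
    stream.map(_.value()).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}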
Original: Spark UDTF
class MyUDTF extends GenericUDTF {
  override def close(): Unit = {
    // TODO Auto-generated method stub
  }
  // What this method does: 1. validate the input arguments; 2. define the output columns, of which there can be more than one, so multiple rows and columns can be produced
  override def initialize(args: Array[ObjectInspector]): StructObjectInspector = {
    if (a...
2020-06-02 21:03:27 400
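The Hive GenericUDTF above turns one input row into many output rows. As a hedged alternative sketch (not the post's Hive-registered function), plain Spark gets the same one-to-many effect with the built-in explode:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("explode").getOrCreate()
    import spark.implicits._

    val df = Seq("a,b,c", "d,e").toDF("line")
    // one row in, several rows out -- the effect a UDTF provides in Hive
    df.select(explode(split($"line", ","))).show()
  }
}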
Original: Spark UDAF
object Two {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Two").getOrCreate()
    val df = spark.read.json("D:\\Two.json")
    df.createOrReplaceTempView("user")
    /**
     * Register a UDAF function that counts identical...
2020-06-02 21:02:50 124
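A minimal sketch of a UDAF in the Spark 2.x UserDefinedAggregateFunction style used in that era; the counting aggregation is an assumption, since the preview above is truncated before the function body:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// a simple counting UDAF: one Long buffer slot, incremented per input row
class CountUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", StringType) :: Nil)
  override def bufferSchema: StructType = StructType(StructField("count", LongType) :: Nil)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getLong(0) + 1
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}
// usage: spark.udf.register("myCount", new CountUDAF)
//        spark.sql("select myCount(name) from user").show()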
Original: Spark Streaming Integrated with Flume, Push Mode
object SparkStreamingFlumePush {
  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...
2020-06-02 21:00:34 103
Original: Spark Streaming Integrated with Flume, Pull Mode (Poll)
object SparkStreamingFlumePoll {
  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...
2020-06-02 21:00:04 142
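A sketch of the pull-mode receiver side, using the spark-streaming-flume FlumeUtils.createPollingStream API; the host and port are assumptions, and the Flume agent is assumed to use the SparkSink from the flume-poll.conf mentioned earlier:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("flume-poll")
    val ssc = new StreamingContext(conf, Seconds(5))

    // pull events from the Flume agent's Spark sink
    val flumeStream = FlumeUtils.createPollingStream(ssc, "localhost", 8888)

    // each event body is a byte array; decode before counting words
    flumeStream.map(e => new String(e.event.getBody.array()))
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}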
Original: Spark Streaming Window Functions: Counting Hot Words Within a Time Window
object SparkStreamingTCPWindowHotWords {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindowHotWords").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkCon...
2020-06-01 20:46:57 172
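The hot-words count boils down to reduceByKeyAndWindow plus a per-batch sort; a minimal sketch, where the port and the 10s/5s window and slide durations are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HotWordsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("hot-words")
    val ssc = new StreamingContext(conf, Seconds(5))

    val wordCounts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" ")).map((_, 1))
      // count over the last 10 seconds, recomputed every 5 seconds
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(5))

    // sort each batch by count, descending, to surface the hot words
    wordCounts.transform(_.sortBy(_._2, ascending = false)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}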
Original: Spark Streaming Window Functions: Counting Word Occurrences Within a Time Window
object SparkStreamingTCPWindow {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindow").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkContext(sparkConf)...
2020-06-01 20:46:27 207
Original: Spark Streaming: Receiving Socket Data for a Word Count, with Results Accumulated Across Batches
object SparkStreamingTCPTotal {
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  // newValues: the values that just arrived
  // runningCount: the previously saved state value
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    val newC...
2020-06-01 20:45:56 240
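The cross-batch accumulation in the title is updateStateByKey with the updateFunction shown above; a minimal runnable sketch, where port 9999 and the checkpoint path are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateSketch {
  // newValues: the 1s for a word in the current batch; runningCount: the saved total
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(newValues.sum + runningCount.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("total-wordcount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("./checkpoint") // required for stateful operations

    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" ")).map((_, 1))
      .updateStateByKey(updateFunction)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}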
Original: Spark Streaming: Receiving Socket Data for a Word Count
object SparkStreamingTCP {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    // one thread does the computing, one thread receives the data
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCP").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkContex...
2020-06-01 20:45:26 90
Original: Spark SQL with Hive as a Data Source
Option 1: call the metastore directly from IDEA (add the spark-hive dependency and the hive-hcatalog-core dependency):
val spark = SparkSession.builder().master("local").appName("datasource")
  .config("fs.defaultFS", "hdfs://wml.com:9000")
  .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test")...
2020-05-31 19:53:26 224
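Besides the metastore config shown above, the usual switch is enableHiveSupport(); a spark-shell-style sketch reusing the warehouse path from the post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("hive-source")
  .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test") // path from the post
  .enableHiveSupport() // route spark.sql through the Hive metastore
  .getOrCreate()

spark.sql("show tables").show()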
Original: Spark UDF
object UDFDemo { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .master("local[*]") .appName("udfdemo") .getOrCreate() import spark.implicits._ val rdd: RDD[(Int, String, Double)] = spark.spark...
2020-05-29 16:33:04 145
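A minimal sketch of registering and calling a UDF, since the preview above cuts off before the registration; the sample data, view name t, and function name toUpper are hypothetical:

import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udfdemo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice", 95.5), (2, "bob", 87.0)).toDF("id", "name", "score")
    df.createOrReplaceTempView("t")

    // register a scalar UDF and call it from SQL
    spark.udf.register("toUpper", (s: String) => s.toUpperCase)
    spark.sql("select id, toUpper(name) name, score from t").show()
  }
}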
Original: Spark SQL via DataFrames (Concise Version)
object Demo1 { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().appName("demo1").master("local[*]").getOrCreate() val lines = spark.sparkContext.textFil.
2020-05-29 16:32:06 497
Original: Spark SQL via DataFrames
object DataOperation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("data-operation").getOrCreate()
    // 1. read JSON and run SQL on it
    // jsonFile(spark)
    // 2. read text and run SQL on it
    textFile(spark)
    // 3....
2020-05-29 16:31:25 782
Original: Spark: Reading Various Files into DataFrames and Writing Them Out
object DataSource {
  System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2")
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("datasource").getOrCreate()
    // if you use a Hive data source, you need to add ... to the SparkSession...
2020-05-29 16:30:04 1600
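A spark-shell-style sketch of the read/write round trips this post covers; the local paths are assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("datasource").getOrCreate()

// read: the format follows from the method used
val jsonDf = spark.read.json("d:/data/in.json")
val csvDf  = spark.read.option("header", "true").csv("d:/data/in.csv")

// write: choose the output format per call, overwriting any previous output
jsonDf.write.mode("overwrite").parquet("d:/data/out.parquet")
csvDf.write.mode("overwrite").json("d:/data/out.json")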
Original: Converting Among Spark RDD, DataSet, and DataFrame
object Rdd2DataFrame { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("rdd2dataframe").getOrCreate() val lineRdd = spark.spark.
2020-05-29 16:28:31 194
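The three conversions reduce to toDF, as[T], and .rdd; a minimal sketch with a hypothetical case class Person:

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object ConvertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("convert").getOrCreate()
    import spark.implicits._ // brings in toDF/as and the needed encoders

    val rdd = spark.sparkContext.parallelize(Seq(Person("tom", 20), Person("amy", 22)))

    val df = rdd.toDF()      // RDD -> DataFrame
    val ds = df.as[Person]   // DataFrame -> Dataset
    val backToRdd = ds.rdd   // Dataset -> RDD

    ds.show()
    println(backToRdd.count())
  }
}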
Original: Spark Secondary Sort
/**
 * Secondary sort in Spark
 **/
object SparkSecondarySort {
  System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2")
  def main(args: Array[String]): Unit = {
    // number of partitions
    // val partitions: Int = args(0).toInt
    // input file path
    val inputPath: String =...
2020-05-29 08:57:56 100
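Secondary sort means ordering by a composite key, e.g. first field ascending and second descending; a minimal sketch using sortBy with a tuple key, where the sample data and the chosen ordering are assumptions about the post:

import org.apache.spark.{SparkConf, SparkContext}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("secondary-sort"))
    val data = sc.parallelize(Seq((3, 9), (1, 5), (3, 2), (1, 8)))

    // primary key ascending, secondary value descending (negate the value)
    val sorted = data.sortBy { case (k, v) => (k, -v) }
    sorted.collect().foreach(println) // (1,8) (1,5) (3,9) (3,2)
  }
}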
Original: IP Address Lookup with Spark
/**
 * IP address lookup
 */
object IPLocation {
  def main(args: Array[String]): Unit = {
    // todo: create the SparkConf and set parameters
    val sparkConf: SparkConf = new SparkConf().setAppName("IPLocaltion").setMaster("local[2]")
    // todo: create the SparkContext
    val sc = new SparkC...
2020-05-26 20:46:31 309
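The core of an IP lookup is converting the dotted address to a Long so it can be binary-searched against sorted numeric IP ranges; a sketch of the standard conversion helper (not necessarily the post's exact code):

object IpToLongSketch {
  // fold each octet into the accumulator: shift left 8 bits, then add the octet
  def ip2Long(ip: String): Long =
    ip.split("\\.").foldLeft(0L)((acc, part) => (acc << 8) + part.toLong)

  def main(args: Array[String]): Unit = {
    println(ip2Long("192.168.1.1")) // 3232235777
  }
}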