Original: HBase Basic Commands

HBase shell commands. Create a table: create 'student','info'. Show all tables: list. View a table's structure: desc 'student'. Modify a table's structure: alter 'student',{NAME=>'info',VERSIONS=>'3'}. Add a column family: alter 'student','msg'. Delete a column family: alter 'student',NAME=>'msg...

2020-06-18 19:47:33 258 1
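
For reference, a minimal sketch of creating the same 'student' table with an 'info' column family through the HBase 2.x client API from Scala; the ZooKeeper quorum host hadoop102 is an assumption, not taken from the post.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, TableDescriptorBuilder}

object HBaseCreateTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "hadoop102") // assumed ZooKeeper address
    val conn = ConnectionFactory.createConnection(conf)
    val admin = conn.getAdmin
    val table = TableName.valueOf("student")
    if (!admin.tableExists(table)) {
      val desc = TableDescriptorBuilder.newBuilder(table)
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info")) // column family 'info'
        .build()
      admin.createTable(desc)
    }
    admin.close()
    conn.close()
  }
}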

Original: Importing and Exporting Data with Sqoop

Importing data from an RDBMS (a traditional relational database, MySQL) into HDFS. Full import: $ bin/sqoop import \ --connect jdbc:mysql://hadoop102:3306/company \ --username root \ --password 000000 \ --table staff \ --target-dir /user/company \ ...

2020-06-18 19:45:24 186

Original: A Custom Flume Source That Connects to MySQL

public class MysqlSource extends AbstractSource implements Configurable, PollableSource { private MySQLSourceHelper mySQLSourceHelper; private int queryDelay; @Override public Status process() throws EventDeliveryException { tr...

2020-06-18 19:37:41 358

Reposted: A Custom Flume Source That Prints in a Loop

public class MySource extends AbstractSource implements Configurable, PollableSource { // prefix parameter private String prefix; // suffix parameter private String suffix; // delay parameter private Long delay; // number of records to generate private int n; public Status proce...

2020-06-18 19:36:31 244

Original: Flume Reading Messages from Kafka (Can Be Used Alongside a Consumer), Writing to HDFS

a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = flumekafka
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /

2020-06-18 19:35:31 173

Original: Flume Tailing Appended Files in a Directory, Writing to Kafka

a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /opt/module/logs/.*file.*
a1.sources.r1.filegroups.f.

2020-06-18 19:34:11 387

Original: Flume Real-Time Monitoring of Multiple Appended Files in a Directory

a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = TAILDIR
a3.sources.r3.positionFile = /opt/module/flume/tail_dir.json
a3.sources.r3.filegroups = f1 f2
a3.sources.r3.filegroups.f1 = /opt/module/flume/files

2020-06-18 19:32:51 546

Original: Flume Real-Time Monitoring of Multiple New Files in a Directory

a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/module/flume/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore all files ending in .tmp; do not upload them
a3.sou

2020-06-18 19:30:56 366

Original: Flume Real-Time Monitoring of a Single Appended File

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/datas/A.log
a2.sources.r2.shell = /bin/bash -c
# Describe the sink
a2

2020-06-18 19:30:03 236

Original: Flume Configuration for Monitoring Port Data

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use

2020-06-18 19:28:48 210

Original: Processing URL Data, Classified into /Local and /WEB

object Test5 { System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local") .appName("test2") .getOrCreate() ...

2020-06-09 16:47:32 209

Original: Smart Traffic Metrics Analysis (Real-Time Query of Blacklisted Vehicles' Monitoring Records)

1. Query the blacklisted vehicles from the database // fetch the blacklist data val df = spark .read .format("jdbc") .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1") .option("driver", "com.mysql.jdbc.Driver") .option("dbtable", "blackname") .option("u...

2020-06-08 20:28:05 570
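
A minimal self-contained sketch of the JDBC read shown above, completed with a console show; the user and password values are placeholders, not taken from the post.

import org.apache.spark.sql.SparkSession

object BlacklistQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("blacklist-query").getOrCreate()

    // Read the blacklist table over JDBC (URL, driver, and table taken from the excerpt; credentials assumed).
    val blacklist = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "blackname")
      .option("user", "root")        // assumed
      .option("password", "000000")  // assumed
      .load()

    blacklist.show()
    spark.stop()
  }
}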

Original: Smart Traffic Metrics Analysis (Computing Road Conversion Rates)

Get each vehicle's travel trajectory. Sample (car, road_id) rows: 1 1, 1 2, 1 3, 1 1, 2 1, 2 3, 2 4. After grouping: (1,[1,2,3,1]) (2,[1,3,4]). With concat_ws('|',collect_list(road_id)): (1,1|2|3|1) (2,1|3|4). val sql1 = "select car,concat_ws(',',collect_list(road_id)) roads from...

2020-06-08 20:26:31 307
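
A small sketch of the trajectory aggregation described above, assuming a DataFrame with car and road_id columns registered as a temp view; the view name and sample rows are invented for illustration.

import org.apache.spark.sql.SparkSession

object CarTrajectorySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("car-trajectory").getOrCreate()
    import spark.implicits._

    // Hypothetical (car, road_id) records standing in for traffic.monitor_flow_action.
    val flow = Seq((1, 1), (1, 2), (1, 3), (1, 1), (2, 1), (2, 3), (2, 4)).toDF("car", "road_id")
    flow.createOrReplaceTempView("monitor_flow_action")

    // One row per car with its visited roads joined into a single string, as in the post's sql1.
    val trajectories = spark.sql(
      "select car, concat_ws(',', collect_list(cast(road_id as string))) as roads " +
      "from monitor_flow_action group by car")
    trajectories.show() // e.g. (1, "1,2,3,1") and (2, "1,3,4")
    spark.stop()
  }
}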

Original: Smart Traffic Metrics Analysis (Checkpoint Monitoring)

1. Get the number of cameras at each checkpoint from the monitoring log table. Query checkpoints with no passing vehicles (set aside for now): select * from traffic.monitor_camera_info where monitor_id not in ( select monitor_id from traffic.monitor_flow_action) val sql1 = "select monitor_id,count(distinct camera_id) cameraCnt,count(0) carCnt from traffic..

2020-06-08 20:24:58 1088

Original: Smart Traffic Metrics Analysis (Randomly Sampling Vehicles for Each Day and Hour)

Traffic volume and flow information for each day and hour: val monitorFlowAction = spark.sql("select * from traffic.monitor_flow_action") // filter the queried data once to remove dirty records var carFilterRDD = monitorFlowAction.rdd.filter(_.size == 8) /** * Split the time and bucket it into (date hour, car_id) */ var carMapRDD = carFilterRDD.map(x =>..

2020-06-08 20:21:55 649

Original: Smart Traffic Metrics Analysis (Top 5 Checkpoints by Vehicles Passing at High Speed)

Group by checkpoint and get the number of vehicles passing through in each speed category: val sql = "select * from traffic.monitor_flow_action" val df = spark.sql(sql) implicit val monitorFlowActionEncoder: Encoder[MonitorFlowAction] = ExpressionEncoder() implicit val tupleEncoder: Encoder[Tuple2[String, Moni...

2020-06-08 20:19:54 970

Original: Smart Traffic Metrics Analysis (Top 10 Roads by Traffic in Each Area, with Road and Checkpoint Traffic)

Traffic volume per road in each area:
SELECT area_id, road_id, COUNT(car) AS carNum1
FROM traffic.monitor_flow_action
GROUP BY area_id, road_id
Top 10 roads by traffic in each area, with their traffic:
SELECT area_id, road_id, carNum1
FROM (
  SELECT area_id, road_id, carNum1,
    ROW_NUMBER() OVER (PARTITION BY area_i...

2020-06-08 20:18:16 2374

Original: Simulated District Road Monitoring (Kafka Consumer for the Data)

object Test2 { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster("local[2]").setAppName("monitor_action_test2") val sc = new SparkContext(conf) sc.setLogLevel("WARN") val ssc = new StreamingContext(sc, Seconds(5...

2020-06-08 20:16:23 212 1

Original: Simulated District Road Monitoring (Filtering the Data with SQL)

object Test1 { def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("test1").getOrCreate() spark.sparkContext.setLogLevel("WARN")// val file = "d://data.txt" val file = args(0) val df = s...

2020-06-08 20:15:29 209

Original: Simulated District Road Monitoring (Data Source)

object MockData { /** * Get an n-digit random number * * @param index number of digits * @param random * @return */ def randomNum(index: Int, random: Random): String = { var str = "" for (i <- 0 until index) { str += random.nextInt(10) } ...

2020-06-08 20:14:43 206

Original: Spark Streaming, nc, and Flume

Install nc: sudo yum install -y nc. Start nc on an arbitrary port: nc -lk 9999. Remove nc: yum erase nc -y. Flume monitoring a directory: spooldir; monitoring a file: exec. Spark Streaming integrated with Flume in poll (pull) mode, configuration file flume-poll.conf: a1.sources = r1 a1.sinks = k1 a1.channels = c1 ...

2020-06-03 13:09:11 104
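
A minimal sketch of the poll-mode consumer side referenced above, using the spark-streaming-flume FlumeUtils API; the host, port, and batch interval are assumptions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("flume-poll-sketch")
    val ssc = new StreamingContext(conf, Seconds(5)) // assumed 5-second batches

    // Pull events from the Flume spark sink started by flume-poll.conf (host/port assumed).
    val flumeStream = FlumeUtils.createPollingStream(ssc, "hadoop102", 8888)

    // Decode event bodies and run a per-batch word count.
    flumeStream
      .map(e => new String(e.event.getBody.array()))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}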

Original: Word Count over Kafka Data with Spark Streaming, Using the Receiver API (Kafka 0.8)

//object SparkStreamingKafkaReceiverCheckpoint {//// System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")// def updateFunc(a:Seq[Int], b:Option[Int]) :Option[Int] ={// Some(a.sum+b.getOrElse(0))// }// def main(args: Array[String]...

2020-06-03 13:07:51 236

Original: Consuming Kafka Messages with Spark Streaming Using the Low-Level Direct API (Kafka 0.8)

//object SparkStreamingKafka10DirectCheckpoint {// System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")// def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = {// Some(a.sum + b.getOrElse(0))// }//// def main(args: Array[St...

2020-06-03 13:07:20 110

Original: Consuming Kafka Messages with Spark Streaming Using the Low-Level Direct API (Kafka 0.10)

object SparkStreamingKafkaDirectCheckpoint { System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2") def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = { Some(a.sum + b.getOrElse(0)) } def main(args: Array[String]): Unit ...

2020-06-03 13:06:49 110
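
A compact sketch of the Kafka 0.10 direct API named in the title, reduced to a word count; the broker address, group id, and topic name are assumptions.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaDirect010Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("kafka-direct-010-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "hadoop102:9092",           // assumed broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sparkstreaming-demo",               // assumed group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct (receiver-less) stream; offsets are tracked by Spark rather than by a receiver.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("flumekafka"), kafkaParams))

    stream.map(_.value())
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}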

Original: Spark UDTF

class MyUDTF extends GenericUDTF { override def close(): Unit = { // TODO Auto-generated method stub } // This method: 1. validates the input arguments 2. defines the output columns; there can be more than one, so multiple rows and columns of data can be produced override def initialize(args: Array[ObjectInspector]): StructObjectInspector = { if (a...

2020-06-02 21:03:27 400
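
The post wraps a Hive GenericUDTF; as a point of comparison, here is a sketch of the common Spark-native substitute, a UDF that returns an array which is then exploded. The column names and sample data are invented for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, udf}

object ExplodeAsUdtfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-as-udtf").getOrCreate()
    import spark.implicits._

    val df = Seq("a,b,c", "d,e").toDF("line")        // invented sample data
    val splitUdf = udf((s: String) => s.split(","))  // one input row yields many output values

    // explode turns each array element into its own row, mimicking a one-column UDTF.
    df.select(explode(splitUdf($"line")).as("word")).show()
    spark.stop()
  }
}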

Original: Spark UDAF

object Two { def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("Two").getOrCreate() val df = spark.read.json("D:\\Two.json") df.createOrReplaceTempView("user") /** * Register a UDAF function that counts ...

2020-06-02 21:02:50 124
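
The excerpt cuts off before the UDAF itself; below is a self-contained sketch of the older UserDefinedAggregateFunction style, here an average over an assumed age column, which is not necessarily what the post computes.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Average aggregate: the buffer keeps a running (sum, count).
class MyAverage extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0; buffer(1) = 0L }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }
  def evaluate(buffer: Row): Double =
    if (buffer.getLong(1) == 0L) 0.0 else buffer.getDouble(0) / buffer.getLong(1)
}

object UdafSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udaf-sketch").getOrCreate()
    import spark.implicits._
    spark.udf.register("myAvg", new MyAverage)  // registered under an assumed name
    Seq(18.0, 20.0, 25.0).toDF("age").createOrReplaceTempView("user")
    spark.sql("select myAvg(age) as avgAge from user").show()
    spark.stop()
  }
}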

Original: Spark Streaming Integration with Flume in Push Mode

object SparkStreamingFlumePush { System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2") // newValues: all the 1s for the same word from the (word,1) pairs aggregated in the current batch // runningCount: the historical running total for the same key def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...

2020-06-02 21:00:34 103

Original: Spark Streaming Integration with Flume in Poll (Pull) Mode

object SparkStreamingFlumePoll { System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2") // newValues: all the 1s for the same word from the (word,1) pairs aggregated in the current batch // runningCount: the historical running total for the same key def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...

2020-06-02 21:00:04 142

Original: Spark Streaming Window Functions: Counting Hot Words over a Time Window

object SparkStreamingTCPWindowHotWords { def main(args: Array[String]): Unit = { // configure the SparkConf parameters val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindowHotWords").setMaster("local[2]") // build the SparkContext object val sc = new SparkCon...

2020-06-01 20:46:57 172
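
A compact sketch of the hot-words idea: count words over a sliding window with reduceByKeyAndWindow and sort each window's result; the socket source, window length, and slide interval are assumptions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HotWordsWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("hot-words-window")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Text lines from a socket source, e.g. started with `nc -lk 9999` (host/port assumed).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over a 10-second window that slides every 5 seconds.
    val windowedCounts = lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(5))

    // Sort each window's counts descending to surface the hottest words.
    windowedCounts.transform(_.sortBy(_._2, ascending = false)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}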

Original: Spark Streaming Window Functions: Counting Word Occurrences over a Time Window

object SparkStreamingTCPWindow { def main(args: Array[String]): Unit = { // configure the SparkConf parameters val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindow").setMaster("local[2]") // build the SparkContext object val sc = new SparkContext(sparkConf)...

2020-06-01 20:46:27 207

Original: Spark Streaming over Socket Data: Word Count with Results Accumulated Across Batches

object SparkStreamingTCPTotal { // newValues: all the 1s for the same word from the (word,1) pairs aggregated in the current batch // runningCount: the historical running total for the same key // newValues: the newly arrived values // runningCount: the previously saved state value def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = { val newC...

2020-06-01 20:45:56 240
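
A runnable sketch of the accumulating word count: updateStateByKey keeps a running total across batches and needs a checkpoint directory; the host, port, and paths are assumptions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCountSketch {
  // newValues: the 1s for a word in the current batch; runningCount: the saved total so far.
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(newValues.sum + runningCount.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("stateful-wordcount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("./wordcount-checkpoint") // state requires a checkpoint directory (path assumed)

    ssc.socketTextStream("localhost", 9999) // assumed nc -lk 9999 source
      .flatMap(_.split(" "))
      .map((_, 1))
      .updateStateByKey(updateFunction)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}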

Original: Spark Streaming over Socket Data: Word Count

object SparkStreamingTCP { def main(args: Array[String]): Unit = { // configure the SparkConf parameters // one thread for computation, one thread for receiving data val sparkConf = new SparkConf().setAppName("SparkStreamingTCP").setMaster("local[2]") // build the SparkContext object val sc = new SparkContex...

2020-06-01 20:45:26 90

Original: Spark SQL with Hive as a Data Source

Option 1: call the metastore directly from IDEA (add the spark-hive dependency and the hive-hcatalog-core dependency). val spark = SparkSession.builder().master("local").appName("datasource") .config("fs.defaultFS", "hdfs://wml.com:9000") .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test")...

2020-05-31 19:53:26 224
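
A minimal sketch of a Hive-enabled session along the lines of the excerpt; the warehouse path is taken from the excerpt, while the queried table name is an assumption.

import org.apache.spark.sql.SparkSession

object SparkHiveSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport requires the spark-hive dependency and access to a Hive metastore.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("spark-hive-sketch")
      .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test") // from the excerpt
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()
    spark.sql("select * from default.student limit 10").show() // assumed table
    spark.stop()
  }
}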

Original: Spark UDF

object UDFDemo { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .master("local[*]") .appName("udfdemo") .getOrCreate() import spark.implicits._ val rdd: RDD[(Int, String, Double)] = spark.spark...

2020-05-29 16:33:04 145
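
A short sketch of registering a UDF and calling it from SQL; the column names and the uppercase function are invented for illustration.

import org.apache.spark.sql.SparkSession

object UdfRegisterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice", 90.5), (2, "bob", 75.0)).toDF("id", "name", "score")
    df.createOrReplaceTempView("people")

    // Register a UDF usable from SQL queries against the temp view.
    spark.udf.register("toUpper", (s: String) => s.toUpperCase)

    spark.sql("select id, toUpper(name) as name, score from people").show()
    spark.stop()
  }
}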

Original: Spark SQL Processing via DataFrames (Concise Version)

object Demo1 { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().appName("demo1").master("local[*]").getOrCreate() val lines = spark.sparkContext.textFil.

2020-05-29 16:32:06 497

Original: Spark SQL Processing via DataFrames

object DataOperation { def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("data-operation").getOrCreate() // 1. read JSON and process it with SQL // jsonFile(spark) // 2. read text and process it with SQL textFile(spark) // 3....

2020-05-29 16:31:25 782
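
A minimal end-to-end sketch of the pattern these DataFrame-SQL posts use: load data into a DataFrame, register a temp view, and query it with SQL. The JSON path and its schema are assumptions.

import org.apache.spark.sql.SparkSession

object DataFrameSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("df-sql-sketch").getOrCreate()

    // Assumed input: a JSON file with name and age fields.
    val users = spark.read.json("data/users.json")
    users.createOrReplaceTempView("user")

    spark.sql("select name, age from user where age >= 18 order by age desc").show()
    spark.stop()
  }
}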

Original: Spark: Reading Various File Formats into DataFrames and Writing Them Back Out

object DataSource { System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("datasource").getOrCreate() // if using Hive as the data source, the SparkSession needs to include...

2020-05-29 16:30:04 1600
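
A sketch of round-tripping DataFrames through a few built-in sources; all file paths are placeholders.

import org.apache.spark.sql.{SaveMode, SparkSession}

object DataSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("datasource-sketch").getOrCreate()

    // Read JSON, CSV (with header), and Parquet into DataFrames (paths assumed).
    val jsonDf = spark.read.json("data/people.json")
    val csvDf = spark.read.option("header", "true").csv("data/people.csv")
    val parquetDf = spark.read.parquet("data/people.parquet")

    // Write each one back out in a different format, overwriting any previous output.
    jsonDf.write.mode(SaveMode.Overwrite).parquet("out/people_parquet")
    csvDf.write.mode(SaveMode.Overwrite).json("out/people_json")
    parquetDf.write.mode(SaveMode.Overwrite).option("header", "true").csv("out/people_csv")

    spark.stop()
  }
}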

Original: Spark: Converting Between RDD, Dataset, and DataFrame

object Rdd2DataFrame { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("rdd2dataframe").getOrCreate() val lineRdd = spark.spark.

2020-05-29 16:28:31 194
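
A compact sketch of the three-way conversions the post covers, using an invented Person case class.

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int) // invented schema for illustration

object ConversionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("conversions-sketch").getOrCreate()
    import spark.implicits._

    val rdd = spark.sparkContext.parallelize(Seq(Person("alice", 20), Person("bob", 31)))

    val df = rdd.toDF()        // RDD -> DataFrame
    val ds = df.as[Person]     // DataFrame -> Dataset
    val backToRdd = ds.rdd     // Dataset -> RDD
    val dsFromRdd = rdd.toDS() // RDD -> Dataset
    val dfFromDs = ds.toDF()   // Dataset -> DataFrame

    println(backToRdd.count()) // round trip back to an RDD
    dfFromDs.show()
    println(dsFromRdd.count())
    spark.stop()
  }
}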

Original: Spark Secondary Sort

/** * Spark secondary sort **/ object SparkSecondarySort { System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2") def main(args: Array[String]): Unit = { // number of partitions // val partitions: Int = args(0).toInt // input file path val inputPath: String =...

2020-05-29 08:57:56 100
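
A small sketch of one common way to do a secondary sort in Spark: sort by a composite (key, value) tuple ordering. The sample pairs are invented; the post's own implementation may differ.

import org.apache.spark.{SparkConf, SparkContext}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("secondary-sort"))

    // Invented (key, value) pairs; goal: order by key ascending, then value descending.
    val pairs = sc.parallelize(Seq(("a", 3), ("b", 1), ("a", 9), ("b", 7), ("a", 1)))

    val sorted = pairs.sortBy { case (k, v) => (k, -v) } // composite tuple key drives the sort
    sorted.collect().foreach(println)                    // (a,9) (a,3) (a,1) (b,7) (b,1)

    sc.stop()
  }
}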

Original: Spark Implementation of IP Address Lookup

/** * IP address lookup */ object IPLocation { def main(args: Array[String]): Unit = { // todo: create SparkConf and set parameters val sparkConf: SparkConf = new SparkConf().setAppName("IPLocaltion").setMaster("local[2]") // todo: create the SparkContext val sc = new SparkC...

2020-05-26 20:46:31 309
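
A stripped-down sketch of the usual approach: convert dotted IPs to longs, broadcast the sorted (start, end, region) ranges, and binary-search each record's IP. The rule data and log IPs below are invented placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object IpLocationSketch {
  // Convert a dotted IPv4 string to a long for range comparison.
  def ipToLong(ip: String): Long =
    ip.split("\\.").foldLeft(0L)((acc, part) => (acc << 8) + part.toLong)

  // Binary search the sorted ranges for the one containing the IP; -1 if none.
  def search(ip: Long, rules: Array[(Long, Long, String)]): Int = {
    var lo = 0
    var hi = rules.length - 1
    while (lo <= hi) {
      val mid = (lo + hi) / 2
      if (ip < rules(mid)._1) hi = mid - 1
      else if (ip > rules(mid)._2) lo = mid + 1
      else return mid
    }
    -1
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ip-location"))

    // Invented IP ranges and access-log IPs standing in for the real rule and log files.
    val rules = Array((ipToLong("1.0.1.0"), ipToLong("1.0.3.255"), "Fuzhou"),
                      (ipToLong("1.0.8.0"), ipToLong("1.0.15.255"), "Guangzhou"))
    val rulesBc = sc.broadcast(rules)

    val counts = sc.parallelize(Seq("1.0.1.15", "1.0.9.7", "1.0.2.2"))
      .map(ip => search(ipToLong(ip), rulesBc.value))
      .filter(_ >= 0)
      .map(i => (rulesBc.value(i)._3, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}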
