- Blog (113)
Original: HBase Basic Commands
HBase. Create a table: create 'student','info'
List all tables: list
Describe a table's structure: desc 'student'
Alter the table structure: alter 'student',{NAME=>'info',VERSIONS=>'3'}
Add a column family: alter 'student','msg'
Delete a column family: alter 'student',NAME=>'msg...
2020-06-18 19:47:33 258 1
Original: Importing and Exporting Data with Sqoop
Importing data from an RDBMS (a traditional relational database such as MySQL) into HDFS. Full import:
$ bin/sqoop import \
  --connect jdbc:mysql://hadoop102:3306/company \
  --username root \
  --password 000000 \
  --table staff \
  --target-dir /user/company \
  ...
2020-06-18 19:45:24 186
Original: A Custom Flume Source That Connects to MySQL
public class MysqlSource extends AbstractSource implements Configurable, PollableSource {
    private MySQLSourceHelper mySQLSourceHelper;
    private int queryDelay;
    @Override
    public Status process() throws EventDeliveryException {
        tr...
2020-06-18 19:37:41 358
Reposted: A Custom Flume Source That Prints in a Loop
public class MySource extends AbstractSource implements Configurable, PollableSource {
    // prefix parameter
    private String prefix;
    // suffix parameter
    private String suffix;
    // delay parameter
    private Long delay;
    // data-generation count parameter
    private int n;
    public Status proce...
2020-06-18 19:36:31 244
Original: Flume Consuming Kafka Messages (Can Be Used Together with a Consumer), Sinking to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = flumekafka
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /
2020-06-18 19:35:31 173
Original: Flume Tailing Appended Files Under a Directory, Sinking to Kafka
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /opt/module/logs/.*file.*
a1.sources.r1.filegroups.f.
2020-06-18 19:34:11 387
Original: Flume Real-Time Monitoring of Multiple Appended Files in a Directory
a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = TAILDIR
a3.sources.r3.positionFile = /opt/module/flume/tail_dir.json
a3.sources.r3.filegroups = f1 f2
a3.sources.r3.filegroups.f1 = /opt/module/flume/files
2020-06-18 19:32:51 546
Original: Flume Real-Time Monitoring of Multiple New Files in a Directory
a3.sources = r3
a3.sinks = k3
a3.channels = c3
# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/module/flume/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore all files ending in .tmp; do not upload them
a3.sou
2020-06-18 19:30:56 366
Original: Flume Real-Time Monitoring of a Single Appended File
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/datas/A.log
a2.sources.r2.shell = /bin/bash -c
# Describe the sink
a2
2020-06-18 19:30:03 236
Original: Flume Configuration for Monitoring Port Data
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use
2020-06-18 19:28:48 210
Original: Processing URL Data, Classified into /Local and /WEB
object Test5 { System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local") .appName("test2") .getOrCreate() ...
2020-06-09 16:47:32 209
Original: Smart Traffic Metrics Analysis (Real-Time Lookup of Monitoring Records for Blacklisted Vehicles)
1. Query the blacklisted vehicles from the database:
// fetch the blacklist data
val df = spark
  .read
  .format("jdbc")
  .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "blackname")
  .option("u...
2020-06-08 20:28:05 570
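The JDBC read above is cut off at the user option; a minimal sketch of how the full option chain typically looks, where the user and password values are placeholders rather than the post's actual credentials:

val blacklist = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://hadoop-senior.test.com:3306/traffic1")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "blackname")
  .option("user", "root")        // placeholder credential
  .option("password", "123456")   // placeholder credential
  .load()
blacklist.show()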
Original: Smart Traffic Metrics Analysis (Computing Road Conversion Rates)
Get each vehicle's travel trajectory:
1 1
1 2
1 3
1 1
2 1
2 3
2 4
================================
(1,[1,2,3,1])
(2,[1,3,4])
concat_ws('|',collect_list(road_id))
(1,1|2|3|1)
(2,1|3|4)
val sql1 = "select car,concat_ws(',',collect_list(road_id)) roads from...
2020-06-08 20:26:31 307
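A minimal sketch of the trajectory aggregation described above, using sample data matching the example; the view name flow is hypothetical. Note that collect_list does not guarantee ordering, so the post implicitly assumes the rows arrive already ordered by time:

import org.apache.spark.sql.SparkSession

object RoadTrajectorySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("trajectory").getOrCreate()
    import spark.implicits._

    // sample (car, road_id) records matching the example data above
    val flow = Seq(("1", "1"), ("1", "2"), ("1", "3"), ("1", "1"),
                   ("2", "1"), ("2", "3"), ("2", "4")).toDF("car", "road_id")
    flow.createOrReplaceTempView("flow")

    // collapse each vehicle's roads into one delimited trajectory string
    spark.sql(
      "select car, concat_ws(',', collect_list(road_id)) roads from flow group by car"
    ).show()
  }
}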
Original: Smart Traffic Metrics Analysis (Checkpoint Monitoring)
1. From the monitoring log table, get the number of cameras at each checkpoint. Query the checkpoints with no passing vehicles (set aside for now):
select * from traffic.monitor_camera_info where monitor_id not in (
  select monitor_id from traffic.monitor_flow_action
)
val sql1 = "select monitor_id,count(distinct camera_id) cameraCnt,count(0) carCnt from traffic..
2020-06-08 20:24:58 1088
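The "no passing vehicles" query above can be issued directly through spark.sql; a minimal spark-shell-style sketch, assuming a Hive-enabled session and the traffic tables named in the post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("monitor").enableHiveSupport().getOrCreate()
// checkpoints that appear in the camera table but never in the flow (passing-vehicle) table
spark.sql(
  """select * from traffic.monitor_camera_info
    |where monitor_id not in (
    |  select monitor_id from traffic.monitor_flow_action
    |)""".stripMargin).show()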
Original: Smart Traffic Metrics Analysis (Randomly Sampling Vehicles per Hour per Day)
Traffic volume and flow information for each hour of each day:
val monitorFlowAction = spark.sql("select * from traffic.monitor_flow_action")
// filter the queried data once to drop dirty records
var carFilterRDD = monitorFlowAction.rdd.filter(_.size == 8)
/**
 * Split the timestamp into time buckets: (date hour, car_id)
 */
var carMapRDD = carFilterRDD.map(x =>..
2020-06-08 20:21:55 649
Original: Smart Traffic Metrics Analysis (Top 5 Checkpoints by High-Speed Vehicle Passage)
Group by checkpoint and get the number of vehicles passing through in each speed category:
val sql = "select * from traffic.monitor_flow_action"
val df = spark.sql(sql)
implicit val monitorFlowActionEncoder: Encoder[MonitorFlowAction] = ExpressionEncoder()
implicit val tupleEncoder: Encoder[Tuple2[String, Moni...
2020-06-08 20:19:54 970
Original: Smart Traffic Metrics Analysis (Top 10 Roads by Traffic in Each Area, with Their Traffic and per-Checkpoint Traffic)
Traffic on each road in each area:
SELECT area_id,road_id,COUNT(car) AS carNum1
FROM traffic.monitor_flow_action
GROUP BY area_id,road_id
Top-10 roads by traffic in each area, with their traffic:
SELECT area_id,road_id,carNum1
FROM (
  SELECT area_id,road_id,carNum1,
  ROW_NUMBER() OVER (PARTITION BY area_i...
2020-06-08 20:18:16 2374
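The ROW_NUMBER query is truncated above; a complete version of the same pattern (spark-shell style), where the rank <= 10 filter is an assumption about the post's intent:

val top10 = spark.sql(
  """SELECT area_id, road_id, carNum1
    |FROM (
    |  SELECT area_id, road_id, carNum1,
    |         ROW_NUMBER() OVER (PARTITION BY area_id ORDER BY carNum1 DESC) rn
    |  FROM (
    |    SELECT area_id, road_id, COUNT(car) AS carNum1
    |    FROM traffic.monitor_flow_action
    |    GROUP BY area_id, road_id
    |  ) t1
    |) t2
    |WHERE rn <= 10""".stripMargin)
top10.show()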
Original: Simulated District Road Monitoring (Kafka Consumer for the Data)
object Test2 { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster("local[2]").setAppName("monitor_action_test2") val sc = new SparkContext(conf) sc.setLogLevel("WARN") val ssc = new StreamingContext(sc, Seconds(5...
2020-06-08 20:16:23 212 1
Original: Simulated District Road Monitoring (Filtering the Data with SQL)
object Test1 { def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("test1").getOrCreate() spark.sparkContext.setLogLevel("WARN")// val file = "d://data.txt" val file = args(0) val df = s...
2020-06-08 20:15:29 209
Original: Simulated District Road Monitoring (Data Source)
object MockData {
  /**
   * Get a random number with n digits
   *
   * @param index number of digits
   * @param random
   * @return
   */
  def randomNum(index: Int, random: Random): String = {
    var str = ""
    for (i <- 0 until index) {
      str += random.nextInt(10)
    }
    ...
2020-06-08 20:14:43 206
Original: Spark Streaming, nc, and Flume
Install nc: sudo yum install -y nc
Start nc listening on an arbitrary port: nc -lk 9999
Remove nc: yum erase nc -y
Flume monitoring a directory: spooldir
Flume monitoring a file: exec
Spark Streaming integrated with Flume, pull mode (Poll), config file flume-poll.conf:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
...
2020-06-03 13:09:11 104
Original: Word Count with Spark Streaming Reading from Kafka Using Receivers (Kafka 0.8 API)
//object SparkStreamingKafkaReceiverCheckpoint {
//
//  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
//  def updateFunc(a:Seq[Int], b:Option[Int]) :Option[Int] ={
//    Some(a.sum+b.getOrElse(0))
//  }
//  def main(args: Array[String]...
2020-06-03 13:07:51 236
Original: Spark Streaming Reading Kafka Messages with the Low-Level Direct API (Kafka 0.8 API)
//object SparkStreamingKafka10DirectCheckpoint {
//  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
//  def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = {
//    Some(a.sum + b.getOrElse(0))
//  }
//
//  def main(args: Array[St...
2020-06-03 13:07:20 110
Original: Spark Streaming Reading Kafka Messages with the Low-Level Direct API (Kafka 0.10 API)
object SparkStreamingKafkaDirectCheckpoint { System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2") def updateFunc(a: Seq[Int], b: Option[Int]): Option[Int] = { Some(a.sum + b.getOrElse(0)) } def main(args: Array[String]): Unit ...
2020-06-03 13:06:49 110
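A minimal runnable sketch of the 0.10 direct API named in the title, in the standard spark-streaming-kafka-0-10 style; the broker address and topic name wordcount are assumptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaDirect10Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("kafka-direct-10")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",   // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "wordcount-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("wordcount"), kafkaParams))

    // classic word count over the message values
    stream.map(_.value()).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}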
Original: Spark UDTF
class MyUDTF extends GenericUDTF {
  override def close(): Unit = {
    // TODO Auto-generated method stub
  }
  // What this method does: 1. validate the input arguments; 2. define the output columns, of which there can be more than one, so multiple rows and columns can be produced
  override def initialize(args: Array[ObjectInspector]): StructObjectInspector = {
    if (a...
2020-06-02 21:03:27 400
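The Hive GenericUDTF above turns one input row into many output rows. As a hedged alternative sketch (not the post's Hive-registered function), plain Spark gets the same one-to-many effect with the built-in explode:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("explode").getOrCreate()
    import spark.implicits._

    val df = Seq("a,b,c", "d,e").toDF("line")
    // one row in, several rows out -- the effect a UDTF provides in Hive
    df.select(explode(split($"line", ","))).show()
  }
}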
Original: Spark UDAF
object Two {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Two").getOrCreate()
    val df = spark.read.json("D:\\Two.json")
    df.createOrReplaceTempView("user")
    /**
     * Register a UDAF function that counts identical...
2020-06-02 21:02:50 124
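A minimal sketch of a UDAF in the Spark 2.x UserDefinedAggregateFunction style used in that era; the counting aggregation is an assumption, since the preview above is truncated before the function body:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// a simple counting UDAF: one Long buffer slot, incremented per input row
class CountUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", StringType) :: Nil)
  override def bufferSchema: StructType = StructType(StructField("count", LongType) :: Nil)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getLong(0) + 1
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}
// usage: spark.udf.register("myCount", new CountUDAF)
//        spark.sql("select myCount(name) from user").show()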
Original: Spark Streaming Integrated with Flume, Push Mode
object SparkStreamingFlumePush {
  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...
2020-06-02 21:00:34 103
Original: Spark Streaming Integrated with Flume, Pull Mode (Poll)
object SparkStreamingFlumePoll {
  System.setProperty("hadoop.home.dir", "d://software/hadoop-2.9.2")
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Optio...
2020-06-02 21:00:04 142
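A sketch of the pull-mode receiver side, using the spark-streaming-flume FlumeUtils.createPollingStream API; the host and port are assumptions, and the Flume agent is assumed to use the SparkSink from the flume-poll.conf mentioned earlier:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("flume-poll")
    val ssc = new StreamingContext(conf, Seconds(5))

    // pull events from the Flume agent's Spark sink
    val flumeStream = FlumeUtils.createPollingStream(ssc, "localhost", 8888)

    // each event body is a byte array; decode before counting words
    flumeStream.map(e => new String(e.event.getBody.array()))
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}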
Original: Spark Streaming Window Functions: Counting Hot Words Within a Time Window
object SparkStreamingTCPWindowHotWords {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindowHotWords").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkCon...
2020-06-01 20:46:57 172
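The hot-words count boils down to reduceByKeyAndWindow plus a per-batch sort; a minimal sketch, where the port and the 10s/5s window and slide durations are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HotWordsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("hot-words")
    val ssc = new StreamingContext(conf, Seconds(5))

    val wordCounts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" ")).map((_, 1))
      // count over the last 10 seconds, recomputed every 5 seconds
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(5))

    // sort each batch by count, descending, to surface the hot words
    wordCounts.transform(_.sortBy(_._2, ascending = false)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}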
Original: Spark Streaming Window Functions: Counting Word Occurrences Within a Time Window
object SparkStreamingTCPWindow {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCPWindow").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkContext(sparkConf)...
2020-06-01 20:46:27 207
Original: Spark Streaming: Receiving Socket Data for a Word Count, with Results Accumulated Across Batches
object SparkStreamingTCPTotal {
  // newValues: all the 1s for the same word among the (word, 1) pairs aggregated in the current batch
  // runningCount: the historical sum of values for the same key
  // newValues: the values that just arrived
  // runningCount: the previously saved state value
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    val newC...
2020-06-01 20:45:56 240
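The cross-batch accumulation in the title is updateStateByKey with the updateFunction shown above; a minimal runnable sketch, where port 9999 and the checkpoint path are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateSketch {
  // newValues: the 1s for a word in the current batch; runningCount: the saved total
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(newValues.sum + runningCount.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("total-wordcount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("./checkpoint") // required for stateful operations

    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" ")).map((_, 1))
      .updateStateByKey(updateFunction)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}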
Original: Spark Streaming: Receiving Socket Data for a Word Count
object SparkStreamingTCP {
  def main(args: Array[String]): Unit = {
    // configure the SparkConf parameters
    // one thread does the computing, one thread receives the data
    val sparkConf = new SparkConf().setAppName("SparkStreamingTCP").setMaster("local[2]")
    // build the SparkContext
    val sc = new SparkContex...
2020-06-01 20:45:26 90
Original: Spark SQL with Hive as a Data Source
Option 1: call the metastore directly from IDEA (add the spark-hive dependency and the hive-hcatalog-core dependency):
val spark = SparkSession.builder().master("local").appName("datasource")
  .config("fs.defaultFS", "hdfs://wml.com:9000")
  .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test")...
2020-05-31 19:53:26 224
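Besides the metastore config shown above, the usual switch is enableHiveSupport(); a spark-shell-style sketch reusing the warehouse path from the post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("hive-source")
  .config("spark.sql.warehouse.dir", "hdfs://wml.com:9000/test") // path from the post
  .enableHiveSupport() // route spark.sql through the Hive metastore
  .getOrCreate()

spark.sql("show tables").show()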
Original: Spark UDF
object UDFDemo { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .master("local[*]") .appName("udfdemo") .getOrCreate() import spark.implicits._ val rdd: RDD[(Int, String, Double)] = spark.spark...
2020-05-29 16:33:04 145
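A minimal sketch of registering and calling a UDF, since the preview above cuts off before the registration; the sample data, view name t, and function name toUpper are hypothetical:

import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udfdemo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice", 95.5), (2, "bob", 87.0)).toDF("id", "name", "score")
    df.createOrReplaceTempView("t")

    // register a scalar UDF and call it from SQL
    spark.udf.register("toUpper", (s: String) => s.toUpperCase)
    spark.sql("select id, toUpper(name) name, score from t").show()
  }
}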
Original: Spark SQL via DataFrames (Concise Version)
object Demo1 { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().appName("demo1").master("local[*]").getOrCreate() val lines = spark.sparkContext.textFil.
2020-05-29 16:32:06 497
Original: Spark SQL via DataFrames
object DataOperation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("data-operation").getOrCreate()
    // 1. read JSON and run SQL on it
    // jsonFile(spark)
    // 2. read text and run SQL on it
    textFile(spark)
    // 3....
2020-05-29 16:31:25 782
Original: Spark: Reading Various Files into DataFrames and Writing Them Out
object DataSource {
  System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2")
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("datasource").getOrCreate()
    // if you use a Hive data source, you need to add ... to the SparkSession...
2020-05-29 16:30:04 1600
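A spark-shell-style sketch of the read/write round trips this post covers; the local paths are assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("datasource").getOrCreate()

// read: the format follows from the method used
val jsonDf = spark.read.json("d:/data/in.json")
val csvDf  = spark.read.option("header", "true").csv("d:/data/in.csv")

// write: choose the output format per call, overwriting any previous output
jsonDf.write.mode("overwrite").parquet("d:/data/out.parquet")
csvDf.write.mode("overwrite").json("d:/data/out.json")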
Original: Converting Among Spark RDD, DataSet, and DataFrame
object Rdd2DataFrame { System.setProperty("hadoop.home.dir","D:\\hadoop\\hadoop-2.9.2") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appName("rdd2dataframe").getOrCreate() val lineRdd = spark.spark.
2020-05-29 16:28:31 194
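The three conversions reduce to toDF, as[T], and .rdd; a minimal sketch with a hypothetical case class Person:

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object ConvertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("convert").getOrCreate()
    import spark.implicits._ // brings in toDF/as and the needed encoders

    val rdd = spark.sparkContext.parallelize(Seq(Person("tom", 20), Person("amy", 22)))

    val df = rdd.toDF()      // RDD -> DataFrame
    val ds = df.as[Person]   // DataFrame -> Dataset
    val backToRdd = ds.rdd   // Dataset -> RDD

    ds.show()
    println(backToRdd.count())
  }
}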
Original: Spark Secondary Sort
/**
 * Secondary sort in Spark
 **/
object SparkSecondarySort {
  System.setProperty("hadoop.home.dir","d://software/hadoop-2.9.2")
  def main(args: Array[String]): Unit = {
    // number of partitions
    // val partitions: Int = args(0).toInt
    // input file path
    val inputPath: String =...
2020-05-29 08:57:56 100
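Secondary sort means ordering by a composite key, e.g. first field ascending and second descending; a minimal sketch using sortBy with a tuple key, where the sample data and the chosen ordering are assumptions about the post:

import org.apache.spark.{SparkConf, SparkContext}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("secondary-sort"))
    val data = sc.parallelize(Seq((3, 9), (1, 5), (3, 2), (1, 8)))

    // primary key ascending, secondary value descending (negate the value)
    val sorted = data.sortBy { case (k, v) => (k, -v) }
    sorted.collect().foreach(println) // (1,8) (1,5) (3,9) (3,2)
  }
}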
Original: IP Address Lookup with Spark
/**
 * IP address lookup
 */
object IPLocation {
  def main(args: Array[String]): Unit = {
    // todo: create the SparkConf and set parameters
    val sparkConf: SparkConf = new SparkConf().setAppName("IPLocaltion").setMaster("local[2]")
    // todo: create the SparkContext
    val sc = new SparkC...
2020-05-26 20:46:31 309
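The core of an IP lookup is converting the dotted address to a Long so it can be binary-searched against sorted numeric IP ranges; a sketch of the standard conversion helper (not necessarily the post's exact code):

object IpToLongSketch {
  // fold each octet into the accumulator: shift left 8 bits, then add the octet
  def ip2Long(ip: String): Long =
    ip.split("\\.").foldLeft(0L)((acc, part) => (acc << 8) + part.toLong)

  def main(args: Array[String]): Unit = {
    println(ip2Long("192.168.1.1")) // 3232235777
  }
}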