package com.scala.my
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.StreamingContext
/**
*
* @author root
* Test steps:
* 1. Bring up h15/h16/h17/h18; start ZooKeeper, then the Hadoop cluster (start-all.sh), then MySQL.
* 2. On h15, create the directory wordcount_checkpoint (used for checkpointing),
*    and create the table t_word in the dg database of MySQL on h15.
* 3. Start this program from Eclipse and leave it waiting.
* 4. In a terminal on h15, type space-separated words (port 9999 must be opened on h15 first: # nc -lk 9999).
* 5. Check whether the t_word table in the dg database on h15 received the data
*    (a sample query is shown after the DDL below).
*
* Note: table DDL
* mysql> show create table t_word;    // show the CREATE TABLE statement
* CREATE TABLE t_word (
*   id int(11) NOT NULL AUTO_INCREMENT,
*   updated_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
*   word varchar(255) DEFAULT NULL,
*   count int(11) DEFAULT NULL,
*   PRIMARY KEY (id)
* );
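*
* To verify step 5, a query along the following lines can be used (table and
* column names come from the DDL above; the exact query is just an illustration):
* mysql> select word, count, updated_time from t_word order by updated_time desc;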
*
* Test result: passed. Note: the insert step (originally around line 74) fetched no
* data at first because no output action was triggered at the end of the pipeline;
* that has since been fixed (see the foreachRDD output operation below).
*
* sh spark-submit --master spark://de2:7077 --class <fully qualified class name> --driver-class-path /mysql-connector-java-5.1.26.jar sparkstreaming.jar
* sh spark-submit --class com.day6.scala.my.PresistMysqlWordCount --master yarn-cluster --driver-class-path /home/spark-1.5.1-bin-hadoop2.4/lib/mysql-connector-java-5.1.31-bin.jar /home/spark-1.5.1-bin-hadoop2.4/sparkstreaming.jar
* $ bin/hadoop dfsadmin -safemode leave   (i.e., take Hadoop out of safe mode; that resolved the problem)
*/
object PresistMysqlWordCount {

  def main(args: Array[String]): Unit = {

    // Create the StreamingContext and cut a new RDD every 8 seconds
    // val sc = new StreamingContext(new SparkConf().setAppName("mysqlPresist").setMaster("local[2]"), Durations.seconds(8))
    val sc = new StreamingContext(
      new SparkConf().setAppName("mysqlPresist").setMaster("local[2]"),
      Durations.seconds(8))

    /**
     * Set the checkpoint directory. Checkpointing keeps the state of the
     * previous window, so the running (stateful) counts can be updated
     * from one batch to the next.
     */
    sc.checkpoint("hdfs://hh15:8020/wordcount_checkpoint")
    // sc.checkpoint("hdfs://h15:8020/wordcount_checkpoint")

    // Read the words, either from a socket or from HDFS:
    // val lines = sc.textFileStream("hdfs://h15:8020/<directory>")  // watch an HDFS directory for newly added files
    val lines = sc.socketTextStream("hh15", 9999)
    // val lines = sc.socketTextStream("h15", 9999)

    // Flatten each line into individual words
    val words = lines.flatMap { x => x.split(" ") }
    // Map every word to a (word, 1) pair
    val paris = words.map { (_, 1) }

    // State-keeping function: add this batch's counts to the count
    // accumulated from the previous batches
    val addFunc = (currValues: Seq[Int], prevValueState: Option[Int]) => {
      var newValue = prevValueState.getOrElse(0)
      for (value <- currValues) {
        newValue += value
      }
      Option(newValue)
    }
    val wd = paris.updateStateByKey[Int](addFunc)

    // Persist every batch to MySQL. foreachRDD is the output action that
    // actually triggers the job; one connection is opened per partition
    wd.foreachRDD(rdd => {
      rdd.foreachPartition(data => {
        val conn = ConnectPool.getConn("root", "1714004716", "hh15", "dg")
        // val conn = ConnectPool.getConn("root", "1714004716", "h15", "dg")
        // Smoke-test insert:
        // conn.prepareStatement("insert into t_word2(word,num) values('tom',23)").executeUpdate()
        try {
          for (row <- data) {
            conn.prepareStatement(
              "insert into t_word(word, count) values('" + row._1 + "', " + row._2 + ")")
              .executeUpdate()
          }
        } finally {
          conn.close()
        }
      })
    })

    sc.start()
    sc.awaitTermination()
  }
}
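
/*
 * The ConnectPool helper used above is not part of this file. The sketch below
 * is a minimal stand-in, assuming it lives in the same package and simply hands
 * out JDBC connections through DriverManager; a real implementation would pool
 * and reuse connections. Only the object name and the
 * getConn(user, password, host, db) signature are taken from the call site
 * above; the JDBC URL, port 3306, and driver class are assumptions.
 */
object ConnectPool {
  import java.sql.{ Connection, DriverManager }

  // Register the MySQL JDBC driver once; the driver jar is the one passed via
  // --driver-class-path in the spark-submit commands above
  Class.forName("com.mysql.jdbc.Driver")

  /** Open a connection to database `db` on `host` as the given user. */
  def getConn(user: String, password: String, host: String, db: String): Connection =
    DriverManager.getConnection("jdbc:mysql://" + host + ":3306/" + db, user, password)
}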