I. Writing a DStream to files
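Spark Streaming provides several high-level output operations for writing a DStream to external files, including saveAsObjectFiles, saveAsTextFiles, and saveAsHadoopFiles. They write the DStream to serialized object files, text files, and Hadoop files respectively.
The simple word count below writes a DStream to text files.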
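Key steps
1. Build a streaming context and configure the address of our Spark cluster (the example runs in local mode with local[*]).
2. Use textFileStream to read text files from the given path. Note that textFileStream only monitors the specified directory and reads the contents of newly created files, so copy the files to be counted into the input directory once the job is running.
3. Run a word count over the text.
4. Emit the result with three output operations: print, saveAsTextFiles, and saveAsObjectFiles.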
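Step 3 can be tried on an ordinary Scala collection before it is run on a stream. The sketch below is only an illustration: WordCountSketch and wordCount are hypothetical names, groupBy followed by a sum stands in for Spark's reduceByKey, and dropping empty tokens is an added assumption about how repeated spaces should be handled.

```scala
object WordCountSketch {
  // Mirrors flatMap -> map -> reduceByKey on a plain Scala collection.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // split every line into words
      .filter(word => word.nonEmpty)      // assumption: ignore empty tokens
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(pair => pair._1)           // group the pairs by word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("spark streaming", "spark sql")))
}
```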
package doc

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author huangwei
  * @Date 19-10-17
  * @Comments : Write a DStream to files
  **/
object DStreamSaveFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DStreamSaveFile")
      .setMaster("local[*]") // local mode; use the cluster's master URL when deploying
    val ssc = new StreamingContext(conf, Seconds(3)) // 3-second batch interval
    val input = "/home/huangwei/input"
    val output = "/home/huangwei"
    println("read file name:" + input)
    // Monitor the input directory; only newly created files are read
    val textStream = ssc.textFileStream("file://" + input)
    // Word count: split into words, pair each word with 1, sum per word
    val wcStream = textStream.flatMap { line => line.split(" ") }
      .map { word => (word, 1) }
      .reduceByKey(_ + _)
    wcStream.print()
    // Save to the specified directories, one prefix per output format
    wcStream.saveAsTextFiles("file://" + output + "/saveAsTextFiles")
    wcStream.saveAsObjectFiles("file://" + output + "/saveAsObjectFiles")
    ssc.start()
    ssc.awaitTermination()
  }
}
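saveAsTextFiles and saveAsObjectFiles treat their argument as a prefix rather than a single file: for every batch interval Spark creates a directory named prefix-TIME_IN_MS (or prefix-TIME_IN_MS.suffix when a suffix is passed), so the job above produces a new output directory every 3 seconds. A minimal sketch of that naming rule follows; BatchOutputNames and batchDir are hypothetical helpers written for illustration and are not part of the Spark API.

```scala
object BatchOutputNames {
  // Hypothetical helper reproducing the documented "prefix-TIME_IN_MS[.suffix]" rule.
  def batchDir(prefix: String, batchTimeMs: Long, suffix: String = ""): String =
    if (suffix.isEmpty) s"$prefix-$batchTimeMs"
    else s"$prefix-$batchTimeMs.$suffix"

  def main(args: Array[String]): Unit = {
    val prefix = "file:///home/huangwei/saveAsTextFiles"
    // Two consecutive batches of a 3-second interval (timestamps are made up).
    println(batchDir(prefix, 1571300001000L))
    println(batchDir(prefix, 1571300004000L))
  }
}
```

Each of these directories holds the usual part files, and one written by saveAsTextFiles can be read back later with SparkContext.textFile.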
II. Writing a DStream to MySQL
Using the C3P0 connection pool, create a general-purpose class for database connections.
package doc

import java.sql.Connection

import com.mchange.v2.c3p0.ComboPooledDataSource

/**
  * @Author huangwei
  * @Date 19-10-17
  * @Comments
  **/
class MysqlPool extends Serializable {
  private val cpds: ComboPooledDataSource = new ComboPooledDataSource(true)
  // Conf.mysqlConfig is not shown in this post; it is used here like a Map[String, String]
  private val conf = Conf.mysqlConfig
  try {
    // Configure the MySQL connection settings through C3P0
    cpds.setJdbcUrl(conf.get("url").getOrElse("jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8"))
    cpds.setDriverClass("com.mysql.jdbc.Driver")
    cpds.setUser(conf.get("username").getOrElse("root"))
    cpds.setPassword(conf.get("password").getOrElse("Mysql_123"))
    cpds.setMaxPoolSize(200)
    cpds.setMinPoolSize(20)
    cpds.setAcquireIncrement(5)
    cpds.setMaxStatements(180)
  } catch {
    case e: Exception => e.printStackTrace()
  }

  // Get a connection from the pool; returns null if none can be obtained
  def getConnection: Connection = {
    try {
      cpds.getConnection()
    } catch {
      case ex: Exception =>
        ex.printStackTrace()
        null
    }
  }
}

object MysqlManager {
  // Created lazily on first use and then shared
  var mysqlManager: MysqlPool = _
  def getMysqlManager: MysqlPool = {
    synchronized {
      if (mysqlManager == null) {
        mysqlManager = new MysqlPool
      }
    }
    mysqlManager
  }
}
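MysqlPool reads its settings from Conf.mysqlConfig, which the post does not show. One way such an object might look is sketched below, purely as an assumption: the environment variable names (MYSQL_URL, MYSQL_USER, MYSQL_PASSWORD) and the fromEnv helper are hypothetical, and the only thing taken from the code above is that mysqlConfig is looked up by the keys "url", "username", and "password".

```scala
object Conf {
  // Hypothetical mapping from the keys MysqlPool reads to environment variable names.
  private val envNames: Map[String, String] = Map(
    "url"      -> "MYSQL_URL",
    "username" -> "MYSQL_USER",
    "password" -> "MYSQL_PASSWORD"
  )

  // Keep only the keys whose environment variable is set; the getOrElse
  // defaults in MysqlPool cover everything that is missing.
  def fromEnv(env: Map[String, String]): Map[String, String] =
    envNames.collect {
      case (key, envName) if env.contains(envName) => key -> env(envName)
    }

  // The value MysqlPool would read.
  val mysqlConfig: Map[String, String] = fromEnv(sys.env)
}
```

Keeping the password out of the source file is the main reason to read it from the environment; a properties file would serve equally well.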
MySQL output operation
package doc

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author huangwei
  * @Date 19-10-17
  * @Comments Write a DStream to MySQL
  **/
object DStreamMySQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DStreamMySQL")
      .setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(3))
    val input = "/home/huangwei/input"
    println("read file name:" + input)
    val textStream = ssc.textFileStream("file://" + input)
    // Each line is expected to be "name,gender,age,homeaddress"; shorter lines are skipped
    val dsStream = textStream.map { line => line.split(",") }
      .filter(f => f.length >= 4)
      .map(f => (f(0), f(1), f(2), f(3)))
    dsStream.print()
    dsStream.foreachRDD(rdd => {
      if (!rdd.isEmpty()) {
        rdd.foreachPartition(partitionRecords => {
          // Borrow one connection from the pool per partition
          val conn = MysqlManager.getMysqlManager.getConnection
          // A parameterized statement avoids SQL injection and quoting errors
          val statement = conn.prepareStatement(
            "insert into person(name,gender,age,homeaddress) values(?,?,?,?)")
          try {
            conn.setAutoCommit(false)
            partitionRecords.foreach(record => {
              statement.setString(1, record._1)
              statement.setString(2, record._2)
              statement.setInt(3, record._3.toInt)
              statement.setString(4, record._4)
              statement.addBatch() // add the row to the batch
            })
            statement.executeBatch() // execute the batch
            conn.commit() // commit the transaction
          } catch {
            case e: Exception => e.printStackTrace()
          } finally {
            statement.close() // close the statement
            conn.close() // return the connection to the pool
          }
        })
      }
    })
    ssc.start()
    ssc.awaitTermination()
  }
}
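The job above expects every input line to carry four comma-separated fields with an integer age. A small parsing function makes that assumption explicit and can be checked without Spark or MySQL. This is a sketch only: PersonParser and parsePerson are hypothetical names, and trimming whitespace around fields is an added assumption.

```scala
import scala.util.Try

object PersonParser {
  // Returns None when the line does not have exactly four fields
  // or when the age field is not an integer.
  def parsePerson(line: String): Option[(String, String, Int, String)] =
    line.split(",").map(_.trim) match {
      case Array(name, gender, age, address) =>
        Try(age.toInt).toOption.map(a => (name, gender, a, address))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parsePerson("雷军,男,50,湖北-仙桃"))
    println(parsePerson("incomplete,line"))
  }
}
```

In the streaming job such a function could be applied with textStream.flatMap(line => PersonParser.parsePerson(line)), so that malformed lines are skipped instead of failing the whole batch.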
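Troubleshooting: Caused by: java.sql.SQLException: Incorrect string value: '\xE9\x9B\xB7\xE5\x86\x9B' for column …
This is an encoding problem, caused by the database's charset and collation.
Fix: try changing the table's charset to utf8 and its collation to a utf8 collation such as utf8_unicode_ci. Here I dropped the table and created it again (the statement below uses utf8_general_ci):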
mysql> DROP TABLE person;
mysql> CREATE TABLE person ( name varchar(10), gender varchar(5), age int, homeaddress varchar(10)) charset utf8 collate utf8_general_ci;
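The escaped bytes in the error message are the UTF-8 encoding of one of the inserted names, which MySQL rejects when the column's character set cannot represent it. The short check below decodes them; IncorrectStringValue is a hypothetical name used only for this illustration.

```scala
import java.nio.charset.StandardCharsets

object IncorrectStringValue {
  // Bytes copied from the error message: \xE9\x9B\xB7\xE5\x86\x9B
  val bytes: Array[Byte] = Array(0xE9, 0x9B, 0xB7, 0xE5, 0x86, 0x9B).map(_.toByte)

  // Interpret the six bytes as UTF-8 text (two characters of three bytes each).
  val decoded: String = new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit =
    println(decoded) // prints 雷军
}
```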
Result
Querying the database gives:
mysql> select * from person;
+-----------+--------+------+---------------+
| name      | gender | age  | homeaddress   |
+-----------+--------+------+---------------+
| 马云      | 男     |   55 | 浙江-杭州     |
| 马化腾    | 男     |   48 | 广东-深圳     |
| 李彦宏    | 男     |   51 | 山西-阳泉     |
| 刘强东    | 男     |   46 | 江苏-宿迁     |
| 雷军      | 男     |   50 | 湖北-仙桃     |
+-----------+--------+------+---------------+
5 rows in set (0.00 sec)