Common sinks in Flink batch processing
Sink based on a local collection (collection-based sink)
Sink based on files (file-based sink)
Sink based on a local collection
Goal:
Using the data below, print it to stdout, print it to stderr, and retrieve it with collect():
(19, "zhangsan", 178.8),
(17, "lisi", 168.8),
(18, "wangwu", 184.8),
(21, "zhaoliu", 164.8)
package com.ccj.pxj.heima.sink

import org.apache.flink.api.scala._

/**
 * Collection-based sink: print the DataSet to stdout, print it to stderr,
 * and collect it into a local collection on the client.
 */
object BatchSinkCollection {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val datas: DataSet[(Int, String, Double)] = env.fromElements(
      (19, "zhangsan", 178.8),
      (17, "lisi", 168.8),
      (18, "wangwu", 184.8),
      (21, "zhaoliu", 164.8)
    )

    // Print the DataSet to stdout and to stderr.
    datas.print()
    datas.printToErr()
    // collect() pulls the DataSet back to the client as a local Seq.
    print(datas.collect())

    // print(), printToErr() and collect() are eager operations that trigger
    // job execution themselves, so an explicit env.execute() is not needed here.
    // env.execute("pxj")
  }
}
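Since collect() returns the DataSet as an ordinary local Seq on the client, the result can be processed with plain Scala afterwards. A minimal sketch of that (the object name BatchSinkCollectUsage and the maxBy computation are only illustrative, not part of the original example):

package com.ccj.pxj.heima.sink

import org.apache.flink.api.scala._

object BatchSinkCollectUsage {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val datas: DataSet[(Int, String, Double)] = env.fromElements(
      (19, "zhangsan", 178.8),
      (17, "lisi", 168.8),
      (18, "wangwu", 184.8),
      (21, "zhaoliu", 164.8)
    )

    // collect() triggers execution and returns a local Seq[(Int, String, Double)].
    val local: Seq[(Int, String, Double)] = datas.collect()

    // From here on this is plain Scala running on the client.
    val tallest: (Int, String, Double) = local.maxBy(_._3)
    println(s"tallest: $tallest")
  }
}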
File-based sink
Flink supports writing files to multiple storage systems, including the local file system, HDFS, and others.
Flink supports multiple file formats, including text files and CSV files (a CSV sketch follows the text example below).
writeAsText(): TextOutputFormat - writes elements line by line as strings; each string is obtained by calling the element's toString() method.
package com.ccj.pxj.heima.sink

import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

object BatchSinkFile {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val data: DataSet[Map[Int, String]] = env.fromElements(Map(1 -> "spark", 2 -> "flink"))

    // Write each element's toString() as one line; OVERWRITE replaces an existing file.
    // The sink inherits the input's parallelism (1 here), so the result is a single file
    // rather than a directory of part files.
    data.setParallelism(1).writeAsText("./data/d.txt", WriteMode.OVERWRITE)

    // File sinks are lazy, so the job must be triggered explicitly.
    env.execute()
  }
}
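The prose above also lists CSV as a supported format, while the example only shows writeAsText. Here is a minimal sketch of writeAsCsv (the object name BatchSinkCsvFile and the ./data/d.csv path are only illustrative). writeAsCsv works on tuple DataSets and writes one tuple field per column:

package com.ccj.pxj.heima.sink

import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

object BatchSinkCsvFile {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val datas: DataSet[(Int, String, Double)] = env.fromElements(
      (19, "zhangsan", 178.8),
      (17, "lisi", 168.8),
      (18, "wangwu", 184.8),
      (21, "zhaoliu", 164.8)
    )

    // writeAsCsv only works on tuple DataSets; each tuple field becomes one column.
    // Row delimiter "\n", field delimiter ",", overwrite an existing file.
    datas.writeAsCsv("./data/d.csv", "\n", ",", WriteMode.OVERWRITE).setParallelism(1)

    env.execute()
  }
}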
Writing to HDFS
package com.ccj.pxj.heima.sink

import org.apache.flink.api.java.operators.DataSink
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

object BatchSinkHDFSFile {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val datas: DataSet[(Int, String, Double)] = env.fromElements(
      (19, "zhangsan", 178.8),
      (17, "lisi", 168.8),
      (18, "wangwu", 184.8),
      (21, "zhaoliu", 164.8)
    )

    // writeAsText also accepts HDFS paths; it returns a DataSink that can be
    // configured further if needed.
    val sink: DataSink[(Int, String, Double)] =
      datas.setParallelism(1).writeAsText("hdfs://pxj60:9000//input", WriteMode.OVERWRITE)

    // File sinks are lazy, so the job must be triggered explicitly.
    env.execute()
  }
}
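One design detail for the file sinks above: when a file sink runs with parallelism greater than 1, the output path becomes a directory containing one part file per parallel task, whereas parallelism 1 produces a single file. A minimal sketch of making that explicit by setting the parallelism on the DataSink returned by writeAsText (the object name and ./data/single.txt path are only illustrative):

package com.ccj.pxj.heima.sink

import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

object BatchSinkSingleFile {
  def main(args: Array[String]): Unit = {
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    val datas: DataSet[(Int, String, Double)] = env.fromElements(
      (19, "zhangsan", 178.8),
      (17, "lisi", 168.8)
    )

    // Setting the parallelism on the DataSink itself (rather than on the DataSet)
    // forces a single output file; with parallelism > 1 the path would become
    // a directory of numbered part files.
    datas
      .writeAsText("./data/single.txt", WriteMode.OVERWRITE)
      .setParallelism(1)

    env.execute()
  }
}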
Author: pxj
Date: 2021-07-26 23:33:20