2.2.1 File Source
File-based:
readTextFile(path)
Reads a text file following the TextInputFormat rules, returning its contents line by line.
Example:
1. Sample data:
sensor_1,1547718199,35.8
sensor_6,1547718201,15.4
sensor_7,1547718202,6.7
sensor_10,1547718205,38.1
sensor_1,1547718207,36.3
sensor_1,1547718209,32.8
sensor_1,1547718212,37.1
2. Steps
Copy the data file to the D: drive, then create the following class.
As in section 4.2.2 of my earlier distributed-systems blog series, we follow Flink's five basic steps.
Only the source step needs to change here.
As you can see, this API only needs a file path.
3. Final code
package com.edu.neusoft.bigdata.flink.source;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> fileSource = env.readTextFile("your file path");
        SingleOutputStreamOperator<String> stream = fileSource.map(new MapFunction<String, String>() {
            @Override
            public String map(String line) throws Exception {
                String[] words = line.split(",");
                boolean f = Double.valueOf(words[2]) > 30;
                return line + "->" + f;
            }
        });
        stream.print().setParallelism(1);
        env.execute("FileSource");
    }
}
Here I added a simple transform that flags each reading with whether its temperature exceeds 30.
2.2.2 Socket Source
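A socket source can be sketched with Flink's built-in socketTextStream, which reads newline-delimited text from a TCP socket. A minimal sketch, assuming a server on localhost port 9999 (both are placeholder choices; you can start a test server first, e.g. with nc -lk 9999):

```java
package com.edu.neusoft.bigdata.flink.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SocketSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read newline-delimited text from a TCP socket; host and port are assumptions
        DataStreamSource<String> socketSource = env.socketTextStream("localhost", 9999);
        socketSource.print().setParallelism(1);
        env.execute("SocketSource");
    }
}
```

The remaining four of the five steps are unchanged; only the source line differs from the file example.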
2.2.3 Collection Source
Collection-based:
fromCollection(Collection)
Creates a data stream from a Java Collection; all elements in the collection must be of the same type.
Example:
1. Steps
Create the following class.
Again the five steps apply; only the source part changes, and here it takes a collection as its parameter.
2. Final code
package com.edu.neusoft.bigdata.flink.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Arrays;

public class CollectionSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<Integer> source = env.fromCollection(Arrays.asList(1, 3, 5, 7, 9));
        // Option 1
        // SingleOutputStreamOperator<Integer> stream = source.map(new MapFunction<Integer, Integer>() {
        //     @Override
        //     public Integer map(Integer n) throws Exception {
        //         return n * n;
        //     }
        // });
        // Option 2
        SingleOutputStreamOperator<Integer> stream = source.map(n -> n * n);
        stream.print().setParallelism(1);
        env.execute("CollectionSource");
    }
}
Here we apply a simple transform that squares each number, using two approaches: option 1 is the MapFunction we covered earlier, and option 2 is a lambda expression, which is simpler and more convenient (somewhat like Scala syntax).
2.2.4 Kafka Source (primary)
Third-party source integration:
addSource lets you ingest data from third-party data sources.
The system ships with a set of built-in Connectors.
Example:
1. In the VM environment, start ZooKeeper and Kafka:
$ zkServer.sh start
$ cd /usr/local/kafka
$ bin/kafka-server-start.sh config/server.properties
2. Create a topic t1 on Kafka:
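A sketch of the create command, assuming Kafka 0.11-era scripts under /usr/local/kafka and ZooKeeper on localhost:2181 (these flags match older Kafka versions; newer ones use --bootstrap-server instead of --zookeeper):

```shell
$ cd /usr/local/kafka
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic t1
```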
3. Add the following dependency to the realtime project's pom.xml:
<!-- Kafka Connector -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.11_2.11</artifactId>
<version>1.11.3</version>
</dependency>
4. Write the class:
Create the following class.
Again the five steps apply; the key part is the source, with code as follows.
package com.edu.neusoft.bigdata.flink.source;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

import java.util.Properties;

public class KafkaSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Kafka configuration
        String topic = "t1";
        Properties prop = new Properties();
        prop.setProperty("bootstrap.servers", "ubuntu:9092");
        // Consumer group id
        prop.setProperty("group.id", "flink_kafka_group1");
        // Initialize the consumer
        FlinkKafkaConsumer011<String> flinkKafkaConsumer = new FlinkKafkaConsumer011<>(topic, new SimpleStringSchema(), prop);
        // Default consumption strategy: start from the committed group offsets
        flinkKafkaConsumer.setStartFromGroupOffsets();
        DataStreamSource<String> kafkaSource = env.addSource(flinkKafkaConsumer);
        kafkaSource.print().setParallelism(1);
        env.execute("KafkaSource");
    }
}
Now open the Kafka producer side: data appears in our console, and Kafka Tool shows the messages as well.
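The producer side mentioned above can be started with Kafka's console producer (a sketch; the broker address ubuntu:9092 matches the code above, the script path is an assumption, and the typed line reuses a sample record from 2.2.1):

```shell
$ cd /usr/local/kafka
$ bin/kafka-console-producer.sh --broker-list ubuntu:9092 --topic t1
> sensor_1,1547718199,35.8
```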