Flink SQL Case Study in Practice (1.11.0)
1. Background
- With Alibaba's 2020 conferences on Flink's unified batch/stream processing, more people and companies have learned how powerful Flink is and how well it performs in real business scenarios. The number of companies and individuals paying attention to and using Flink keeps growing.
- In the big data field, the mainstream engines that can handle structured data processing, graph computation, machine learning, and stream computing in a single system are still Spark and Flink.
- Their design philosophies differ: Flink was designed for stream processing from the start, while Spark began as a batch engine and treats streams as micro-batches. This makes Flink stronger and more flexible in some areas of stream processing.
- This article is based on an open-source GitHub project, with extra notes on environment setup and other details, because if you follow the original documentation as-is you will find it does not run.
- https://github.com/Chengyanan1008/flink-sql-submit-client and https://github.com/wuchong/flink-sql-submit
2. Environment
- Flink 1.11.0
- Maven 3.6.3
- JDK 8
- IDEA 2020.2
- MySQL 5.6 (installed directly with brew install on macOS; Windows users can refer to my other MySQL posts)
- Kafka 2.11-2.2.0
- ZooKeeper 3.4.9
3. Case Code
- Kafka runs in local mode; follow https://blog.csdn.net/lwf006164/article/details/94143819 and then start it locally.
- ZooKeeper is also installed in local mode: https://blog.csdn.net/lwf006164/article/details/93301273
- Kafka scripts
- Kafka start script start-kafka.sh
#!/usr/bin/env bash
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
source "$(dirname "$0")"/kafka-common.sh
# prepare Kafka
echo "Preparing Kafka..."
start_kafka_cluster
- Kafka stop script
#!/usr/bin/env bash
################################################################################
# Apache License 2.0 header (identical to start-kafka.sh above)
################################################################################
source "$(dirname "$0")"/kafka-common.sh
# stop Kafka and clean up local data
echo "Stopping Kafka..."
rm -rf /tmp/zookeeper
rm -rf /tmp/kafka-logs
stop_kafka_cluster
- Kafka common utility script kafka-common.sh
#!/usr/bin/env bash
################################################################################
# Apache License 2.0 header (identical to start-kafka.sh above)
################################################################################
source "$(dirname "$0")"/env.sh
function create_kafka_json_source {
    topicName="$1"
    create_kafka_topic 1 1 $topicName
    # put JSON data into Kafka
    echo "Sending messages to Kafka..."
    send_messages_to_kafka '{"rowtime": "2018-03-12T08:00:00Z", "user_name": "Alice", "event": { "message_type": "WARNING", "message": "This is a warning."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T08:10:00Z", "user_name": "Alice", "event": { "message_type": "WARNING", "message": "This is a warning."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T09:00:00Z", "user_name": "Bob", "event": { "message_type": "WARNING", "message": "This is another warning."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T09:10:00Z", "user_name": "Alice", "event": { "message_type": "INFO", "message": "This is a info."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T09:20:00Z", "user_name": "Steve", "event": { "message_type": "INFO", "message": "This is another info."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T09:30:00Z", "user_name": "Steve", "event": { "message_type": "INFO", "message": "This is another info."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T09:30:00Z", "user_name": null, "event": { "message_type": "WARNING", "message": "This is a bad message because the user is missing."}}' $topicName
    send_messages_to_kafka '{"rowtime": "2018-03-12T10:40:00Z", "user_name": "Bob", "event": { "message_type": "ERROR", "message": "This is an error."}}' $topicName
}

function create_kafka_topic {
    $KAFKA_DIR/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor $1 --partitions $2 --topic $3
}

function drop_kafka_topic {
    $KAFKA_DIR/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic $1
}

function send_messages_to_kafka {
    echo -e $1 | $KAFKA_DIR/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $2
}

function start_kafka_cluster {
    if [[ -z $KAFKA_DIR ]]; then
        echo "Must run setup kafka dist dir before attempting to start Kafka cluster"
        exit 1
    fi
    $KAFKA_DIR/bin/zookeeper-server-start.sh -daemon $KAFKA_DIR/config/zookeeper.properties
    $KAFKA_DIR/bin/kafka-server-start.sh -daemon $KAFKA_DIR/config/server.properties
    # zookeeper outputs the "Node does not exist" bit to stderr
    while [[ $($KAFKA_DIR/bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0 2>&1) =~ .*Node\ does\ not\ exist.* ]]; do
        echo "Waiting for broker..."
        sleep 1
    done
}

function stop_kafka_cluster {
    $KAFKA_DIR/bin/kafka-server-stop.sh
    $KAFKA_DIR/bin/zookeeper-server-stop.sh

    # Terminate Kafka process if it still exists
    PIDS=$(jps -vl | grep -i 'kafka\.Kafka' | grep java | grep -v grep | awk '{print $1}' || echo "")
    if [ ! -z "$PIDS" ]; then
        kill -s TERM $PIDS || true
    fi

    # Terminate QuorumPeerMain process if it still exists
    PIDS=$(jps -vl | grep java | grep -i QuorumPeerMain | grep -v grep | awk '{print $1}' || echo "")
    if [ ! -z "$PIDS" ]; then
        kill -s TERM $PIDS || true
    fi
}
- Kafka consumer script kafka-consumer.sh. Note that the commands for checking Kafka offsets differ between versions; the commented-out commands below can also be used as references.
#!/usr/bin/env bash
################################################################################
# Apache License 2.0 header (identical to start-kafka.sh above)
################################################################################
source "$(dirname "$0")"/common.sh
$KAFKA_DIR/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic $1 --from-beginning
# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic user_behavior --from-beginning
# bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic user_behavior --group test-consumer-group --broker-info
# bin/kafka-console-consumer.sh --topic __consumer_offsets --bootstrap-server localhost:9092 --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --consumer.config /Users/hulc/developEnv/kafka_2.11-2.2.0/config/consumer.properties --from-beginning
- Kafka data import script source-generator.sh. Note that this actually runs a data-generator class from the project to import data; you can also write your own script or Java program to feed data into Kafka.
#!/usr/bin/env bash
################################################################################
# Apache License 2.0 header (identical to start-kafka.sh above)
################################################################################
source "$(dirname "$0")"/kafka-common.sh
# prepare Kafka
echo "Generating sources..."
create_kafka_topic 1 1 user_behavior
java -cp target/flink-sql-submit.jar com.github.wuchong.sqlsubmit.SourceGenerator 1000 | $KAFKA_DIR/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic user_behavior
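The script above pipes the generator's stdout into kafka-console-producer.sh. If you prefer to write your own importer, as suggested, here is a minimal stand-alone sketch; the class name BehaviorGenerator and the random ID ranges are made up for illustration, but the output matches the user_behavior.log layout shown later, so it can be piped into the same producer command:

```java
import java.time.Instant;
import java.util.Random;

// Minimal stand-in for SourceGenerator: emits random user_behavior
// records as JSON lines on stdout. Pipe the output into
// kafka-console-producer.sh just like source-generator.sh does.
public class BehaviorGenerator {
    private static final Random RANDOM = new Random();

    // Build one JSON line matching the user_behavior.log layout.
    static String formatRecord(long userId, long itemId, long categoryId,
                               String behavior, String ts) {
        return String.format(
            "{\"user_id\": \"%d\", \"item_id\":\"%d\", \"category_id\": \"%d\", "
                + "\"behavior\": \"%s\", \"ts\": \"%s\"}",
            userId, itemId, categoryId, behavior, ts);
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        for (int i = 0; i < count; i++) {
            System.out.println(formatRecord(
                RANDOM.nextInt(1_000_000), RANDOM.nextInt(5_000_000),
                RANDOM.nextInt(5_000_000), "pv", Instant.now().toString()));
        }
    }
}
```

For example: `java BehaviorGenerator 100 | $KAFKA_DIR/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic user_behavior`.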
3.1 Data Generation Script and Code
- For the script, see the source-generator.sh section above.
- The corresponding Java code is as follows:
/*
 * Apache License 2.0 header (identical to the scripts above).
 */
package com.github.wuchong.sqlsubmit;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class SourceGenerator {
    private static final long SPEED = 1000; // default: 1000 records per second

    public static void main(String[] args) {
        long speed = SPEED;
        if (args.length > 0) {
            speed = Long.valueOf(args[0]);
        }
        long delay = 1000_000 / speed; // time budget per record, in microseconds

        try (InputStream inputStream = SourceGenerator.class.getClassLoader()
                .getResourceAsStream("user_behavior.log")) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
            long start = System.nanoTime();
            while (reader.ready()) {
                String line = reader.readLine();
                System.out.println(line);

                // busy-wait until this record's time budget has elapsed
                long end = System.nanoTime();
                long diff = end - start;
                while (diff < (delay * 1000)) { // delay is in µs, nanoTime in ns
                    Thread.sleep(1);
                    end = System.nanoTime();
                    diff = end - start;
                }
                start = end;
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
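The throttling in SourceGenerator is easy to misread: 1000_000 / speed is a per-record budget in microseconds, and the busy-wait compares it (multiplied by 1000) against System.nanoTime() deltas, which are in nanoseconds. A small sketch isolating just that arithmetic (the class name ThrottleMath is mine, not part of the project):

```java
// Isolates the rate-limiting arithmetic used by SourceGenerator:
// at `speed` records per second, each record gets 1_000_000 / speed
// microseconds, converted to nanoseconds for the nanoTime comparison.
public class ThrottleMath {
    // Time budget per record, in microseconds.
    static long delayMicros(long recordsPerSecond) {
        return 1_000_000L / recordsPerSecond;
    }

    // The same budget in nanoseconds, as used in the busy-wait condition.
    static long delayNanos(long recordsPerSecond) {
        return delayMicros(recordsPerSecond) * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(delayMicros(1000)); // 1000 µs per record at 1000 rec/s
        System.out.println(delayNanos(1000));  // 1000000 ns per record
    }
}
```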
- For the corresponding jar, use IDEA's Maven tool and run the package goal; the jar is generated under the target directory, as follows.
- Explanation of the command
As you can see, the command consists of six parts: the jar built from the current project, the fully qualified class name of SourceGenerator, the argument required by that class's main method, the kafka-console-producer command, the broker list address, and the name of the Kafka topic to write to.
- Sample of the Kafka source log data (incomplete; only an excerpt is shown here, the full file can be downloaded from the GitHub project)
{"user_id": "543462", "item_id":"1715", "category_id": "1464116", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "662867", "item_id":"2244074", "category_id": "1575622", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "561558", "item_id":"3611281", "category_id": "965809", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "894923", "item_id":"3076029", "category_id": "1879194", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "834377", "item_id":"4541270", "category_id": "3738615", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "315321", "item_id":"942195", "category_id": "4339722", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "625915", "item_id":"1162383", "category_id": "570735", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "578814", "item_id":"176722", "category_id": "982926", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "873335", "item_id":"1256540", "category_id": "1451783", "behavior": "pv", "ts": "2017-11-26T01:00:00Z"}
{"user_id": "429984", "item_id":"
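When eyeballing sample lines like these, it can help to pull single fields out quickly. A throwaway sketch for that (the class name and the regex approach are my own; a real Flink job would let the Kafka connector's JSON format do the parsing instead):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts one string-valued field from a single-line JSON record,
// e.g. user_id or behavior from a user_behavior.log line. Good enough
// for spot-checking sample data, not a general JSON parser.
public class BehaviorFieldExtractor {
    // Returns the value of `field`, or null if the field is absent.
    static String extract(String line, String field) {
        Matcher m = Pattern
            .compile("\"" + field + "\"\\s*:\\s*\"([^\"]*)\"")
            .matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "{\"user_id\": \"543462\", \"item_id\":\"1715\", "
            + "\"category_id\": \"1464116\", \"behavior\": \"pv\", "
            + "\"ts\": \"2017-11-26T01:00:00Z\"}";
        System.out.println(extract(line, "user_id"));  // 543462
        System.out.println(extract(line, "behavior")); // pv
    }
}
```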