Project Configuration
Configuration is a core concept in StreamX. To deal with scattered parameter settings, StreamX introduces a unified application configuration: every parameter a program needs, from development through deployment, is defined in application.yml in a fixed format, which acts as a general configuration template. When the job starts, this project configuration is passed to the program and the environment is initialized from it, and the startup parameters are picked up automatically as well. This is where the notion of the configuration file comes from.
For Flink SQL jobs, where the SQL is traditionally embedded in code, StreamX provides a higher-level abstraction: the developer defines the SQL in a sql.yaml file following a simple convention and passes that file to the main program at startup, and the SQL is then loaded and executed automatically. This is where the notion of the SQL file comes from.
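As a rough sketch of how this looks from the application side (using the StreamEnvConfig and StreamingContext classes that appear in the examples later in this article; treat it as illustrative rather than a required skeleton), the configuration file is passed as a startup argument such as --conf conf/application.yml and drives the initialization of the Flink environment:
import com.streamxhub.streamx.flink.core.StreamEnvConfig;
import com.streamxhub.streamx.flink.core.scala.StreamingContext;

public class ConfigBootstrap {
    public static void main(String[] args) {
        // args are expected to carry something like: --conf conf/application.yml
        // StreamEnvConfig parses them and loads the project configuration
        StreamEnvConfig envConfig = new StreamEnvConfig(args, null);
        // the StreamingContext is created from that configuration,
        // so the Flink environment is initialized without hard-coded parameters
        StreamingContext ctx = new StreamingContext(envConfig);
        // ... define sources, transformations and sinks here ...
        ctx.start();
    }
}
The later sections of this article show complete, runnable versions of this pattern for Kafka, JDBC and Flink SQL jobs.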
Project Structure
The project layout is shown in the figure below.

Where:
assembly/bin directory
This directory contains the scripts that start the application after deployment; developers normally do not need to touch them. The contents of each file are as follows:
setclasspath.sh
#!/bin/bash
#
# Copyright (c) 2019 The StreamX Project
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -----------------------------------------------------------------------------
# Set JAVA_HOME or JRE_HOME if not already set, ensure any provided settings
# are valid and consistent with the selected start-up options and set up the
# endorsed directory.
# -----------------------------------------------------------------------------
# Make sure prerequisite environment variables are set
if [[ -z "$JAVA_HOME" && -z "$JRE_HOME" ]]; then
# shellcheck disable=SC2154
if ${darwin}; then
# Bugzilla 54390
if [[ -x '/usr/libexec/java_home' ]]; then
# shellcheck disable=SC2006
# shellcheck disable=SC2155
export JAVA_HOME=$(/usr/libexec/java_home)
# Bugzilla 37284 (reviewed).
elif [[ -d "/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home" ]]; then
export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home"
fi
else
# shellcheck disable=SC2006
# shellcheck disable=SC2230
JAVA_PATH=$(which java 2>/dev/null)
if [[ "x$JAVA_PATH" != "x" ]]; then
# shellcheck disable=SC2006
JAVA_PATH=$(dirname "${JAVA_PATH}" 2>/dev/null)
# shellcheck disable=SC2006
JRE_HOME=$(dirname "${JAVA_PATH}" 2>/dev/null)
fi
if [[ "x$JRE_HOME" == "x" ]]; then
# XXX: Should we try other locations?
if [[ -x /usr/bin/java ]]; then
JRE_HOME=/usr
fi
fi
fi
if [[ -z "$JAVA_HOME" && -z "$JRE_HOME" ]]; then
echo "Neither the JAVA_HOME nor the JRE_HOME environment variable is defined"
echo "At least one of these environment variable is needed to run this program"
exit 1
fi
fi
if [[ -z "$JAVA_HOME" && "$1" == "debug" ]]; then
echo "JAVA_HOME should point to a JDK in order to run in debug mode."
exit 1
fi
if [[ -z "$JRE_HOME" ]]; then
JRE_HOME="$JAVA_HOME"
fi
# If we're running under jdb, we need a full jdk.
if [[ "$1" == "debug" ]]; then
# shellcheck disable=SC2154
if [[ "$os400" == "true" ]]; then
if [[ ! -x "$JAVA_HOME"/bin/java || ! -x "$JAVA_HOME"/bin/javac ]]; then
echo "The JAVA_HOME environment variable is not defined correctly"
echo "This environment variable is needed to run this program"
echo "NB: JAVA_HOME should point to a JDK not a JRE"
exit 1
fi
else
if [[ ! -x "$JAVA_HOME"/bin/java || ! -x "$JAVA_HOME"/bin/jdb || ! -x "$JAVA_HOME"/bin/javac ]]; then
echo "The JAVA_HOME environment variable is not defined correctly"
echo "This environment variable is needed to run this program"
echo "NB: JAVA_HOME should point to a JDK not a JRE"
exit 1
fi
fi
fi
# Set standard commands for invoking Java, if not already set.
# shellcheck disable=SC2153
if [[ -z "$_RUNJAVA" ]]; then
# shellcheck disable=SC2034
_RUNJAVA="$JRE_HOME"/bin/java
fi
if [[ "$os400" != "true" ]]; then
if [[ -z "$_RUNJDB" ]]; then
_RUNJDB="$JAVA_HOME"/bin/jdb
fi
fi
shutdown.sh
#!/bin/bash
#
# Copyright (c) 2019 The StreamX Project
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -----------------------------------------------------------------------------
#Stop Script for the StreamX
# -----------------------------------------------------------------------------
#
# Better OS/400 detection: see Bugzilla 31132
os400=false
# shellcheck disable=SC2006
case "$(uname)" in
OS400*) os400=true ;;
esac
# resolve links - $0 may be a softlink
PRG="$0"
while [[ -L "$PRG" ]]; do
# shellcheck disable=SC2006
ls=$(ls -ld "$PRG")
# shellcheck disable=SC2006
link=$(expr "$ls" : '.*-> \(.*\)$')
if expr "$link" : '/.*' >/dev/null; then
PRG="$link"
else
# shellcheck disable=SC2006
PRG=$(dirname "$PRG")/"$link"
fi
done
# shellcheck disable=SC2006
PRGDIR=$(dirname "$PRG")
EXECUTABLE=flink.sh
# Check that target executable exists
if ${os400}; then
# -x will Only work on the os400 if the files are:
# 1. owned by the user
# 2. owned by the PRIMARY group of the user
# this will not work if the user belongs in secondary groups
eval
else
if [[ ! -x "$PRGDIR"/"$EXECUTABLE" ]]; then
echo "Cannot find $PRGDIR/$EXECUTABLE"
echo "The file is absent or does not have execute permission"
echo "This file is needed to run this program"
exit 1
fi
fi
exec "$PRGDIR"/"$EXECUTABLE" stop "$@"
startup.sh
#!/bin/bash
#
# Copyright (c) 2019 The StreamX Project
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -----------------------------------------------------------------------------
#Start Script for the StreamX
# -----------------------------------------------------------------------------
#
# Better OS/400 detection: see Bugzilla 31132
os400=false
# shellcheck disable=SC2006
case "$(uname)" in
OS400*) os400=true ;;
esac
# resolve links - $0 may be a softlink
PRG="$0"
while [[ -L "$PRG" ]]; do
# shellcheck disable=SC2006
ls=$(ls -ld "$PRG")
# shellcheck disable=SC2006
link=$(expr "$ls" : '.*-> \(.*\)$')
if expr "$link" : '/.*' >/dev/null; then
PRG="$link"
else
# shellcheck disable=SC2006
PRG=$(dirname "$PRG")/"$link"
fi
done
# shellcheck disable=SC2006
PRGDIR=$(dirname "$PRG")
EXECUTABLE=streamx.sh
# Check that target executable exists
if ${os400}; then
# -x will Only work on the os400 if the files are:
# 1. owned by the user
# 2. owned by the PRIMARY group of the user
# this will not work if the user belongs in secondary groups
eval
else
if [[ ! -x "$PRGDIR"/"$EXECUTABLE" ]]; then
echo "Cannot find $PRGDIR/$EXECUTABLE"
echo "The file is absent or does not have execute permission"
echo "This file is needed to run this program"
exit 1
fi
fi
exec "$PRGDIR"/"$EXECUTABLE" start "$@"
streamx.sh
#!/bin/bash
#
# Copyright (c) 2019 The StreamX Project
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -----------------------------------------------------------------------------
#Start Script for the StreamX
# -----------------------------------------------------------------------------
#
# Better OS/400 detection: see Bugzilla 31132
#echo color
WHITE_COLOR="\E[1;37m"
RED_COLOR="\E[1;31m"
BLUE_COLOR='\E[1;34m'
GREEN_COLOR="\E[1;32m"
YELLOW_COLOR="\E[1;33m"
RES="\E[0m"
echo_r() {
# Color red: Error, Failed
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[${BLUE_COLOR}Flink${RES}] ${RED_COLOR}$1${RES}\n"
}
echo_g() {
# Color green: Success
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[${BLUE_COLOR}Flink${RES}] ${GREEN_COLOR}$1${RES}\n"
}
echo_y() {
# Color yellow: Warning
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[${BLUE_COLOR}Flink${RES}] ${YELLOW_COLOR}$1${RES}\n"
}
echo_w() {
# Color white: Normal output
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[${BLUE_COLOR}Flink${RES}] ${WHITE_COLOR}$1${RES}\n"
}
# OS specific support. $var _must_ be set to either true or false.
cygwin=false
darwin=false
os400=false
hpux=false
# shellcheck disable=SC2006
case "$(uname)" in
CYGWIN*) cygwin=true ;;
Darwin*) darwin=true ;;
OS400*) os400=true ;;
HP-UX*) hpux=true ;;
esac
# resolve links - $0 may be a softlink
PRG="$0"
while [[ -L "$PRG" ]]; do
# shellcheck disable=SC2006
ls=$(ls -ld "$PRG")
# shellcheck disable=SC2006
link=$(expr "$ls" : '.*-> \(.*\)$')
if expr "$link" : '/.*' >/dev/null; then
PRG="$link"
else
# shellcheck disable=SC2006
PRG=$(dirname "$PRG")/"$link"
fi
done
# Get standard environment variables
# shellcheck disable=SC2006
PRGDIR=$(dirname "$PRG")
# shellcheck disable=SC2124
# shellcheck disable=SC2034
RUN_ARGS="$@"
#global variables....
# shellcheck disable=SC2006
# shellcheck disable=SC2164
APP_HOME=$(
cd "$PRGDIR/.." >/dev/null
pwd
)
APP_BASE="$APP_HOME"
APP_CONF="$APP_BASE"/conf
APP_LOG="$APP_BASE"/logs
APP_LIB="$APP_BASE"/lib
# shellcheck disable=SC2034
APP_BIN="$APP_BASE"/bin
APP_TEMP="$APP_BASE"/temp
[[ ! -d "$APP_LOG" ]] && mkdir "${APP_LOG}" >/dev/null
[[ ! -d "$APP_TEMP" ]] && mkdir "${APP_TEMP}" >/dev/null
# For Cygwin, ensure paths are in UNIX format before anything is touched
if ${cygwin}; then
# shellcheck disable=SC2006
[[ -n "$APP_HOME" ]] && APP_HOME=$(cygpath --unix "$APP_HOME")
# shellcheck disable=SC2006
[[ -n "$APP_BASE" ]] && APP_BASE=$(cygpath --unix "$APP_BASE")
fi
# Ensure that neither APP_HOME nor APP_BASE contains a colon
# as this is used as the separator in the classpath and Java provides no
# mechanism for escaping if the same character appears in the path.
case ${APP_HOME} in
*:*)
echo "Using APP_HOME: $APP_HOME"
echo "Unable to start as APP_HOME contains a colon (:) character"
exit 1
;;
esac
case ${APP_BASE} in
*:*)
echo "Using APP_BASE: $APP_BASE"
echo "Unable to start as APP_BASE contains a colon (:) character"
exit 1
;;
esac
# For OS400
if ${os400}; then
# Set job priority to standard for interactive (interactive - 6) by using
# the interactive priority - 6, the helper threads that respond to requests
# will be running at the same priority as interactive jobs.
COMMAND='chgjob job('${JOBNAME}') runpty(6)'
system "${COMMAND}"
# Enable multi threading
export QIBM_MULTI_THREADED=Y
fi
# (0) export HADOOP_CLASSPATH
# shellcheck disable=SC2006
if [ x"$(hadoop classpath)" == x"" ]; then
echo_r " Please make sure to export the HADOOP_CLASSPATH environment variable or have hadoop in your classpath"
else
# shellcheck disable=SC2155
export HADOOP_CLASSPATH=$(hadoop classpath)
fi
doStart() {
local yaml=""
local sql=""
if [[ $# -eq 0 ]]; then
yaml="application.yml"
echo_w "not input properties-file,use default application.yml"
else
# Normalize the path: whatever path is given, keep only the part after conf/
if [ "$1" == "--conf" ]; then
shift
# shellcheck disable=SC2034
yaml=$(echo "$1" | awk -F 'conf/' '{print $2}')
shift
fi
if [ "$1" == "--sql" ]; then
shift
# shellcheck disable=SC2034
sql=$(echo "$1" | awk -F 'conf/' '{print $2}')
shift
fi
fi
local app_proper=""
if [[ -f "$APP_CONF/$yaml" ]]; then
app_proper="$APP_CONF/$yaml"
fi
local app_sql=""
if [[ -f "$APP_CONF/$sql" ]]; then
app_sql="$APP_CONF/$sql"
fi
# flink main jar...
# shellcheck disable=SC2155
local jarfile="${APP_LIB}/$(basename "${APP_BASE}").jar"
local param_cli="com.streamxhub.streamx.flink.core.conf.ParameterCli"
# shellcheck disable=SC2006
# shellcheck disable=SC2155
local app_name="$(java -cp "${jarfile}" $param_cli --name "${app_proper}")"
local trim="s/^[ \s]\{1,\}//g;s/[ \s]\{1,\}$//g"
# shellcheck disable=SC2006
# shellcheck disable=SC2155
local detached_mode="$(java -cp "${jarfile}" $param_cli --detached "$app_proper")"
# shellcheck disable=SC2006
# trim...
detached_mode="$(echo "$detached_mode" | sed "$trim")"
# shellcheck disable=SC2006
# shellcheck disable=SC2155
local option="$(java -cp "${jarfile}" $param_cli --option "$app_proper") $*"
# shellcheck disable=SC2006
option="$(echo "$option" | sed "$trim")"
# shellcheck disable=SC2006
# shellcheck disable=SC2155
local property_params="$(java -cp "${jarfile}" $param_cli --property "$app_proper")"
# shellcheck disable=SC2006
property_params="$(echo "$property_params" | sed "$trim")"
echo_g "${app_name} Starting by:<${detached_mode}> mode"
# json all params...
local runOption="$option"
if [ x"$property_params" != x"" ]; then
runOption="$runOption $property_params"
fi
local argsOption=""
if [ x"$app_proper" != x"" ]; then
argsOption="--conf $app_proper"
fi
if [ x"$app_sql" != x"" ]; then
argsOption="$argsOption --sql $app_sql"
fi
if [ x"$detached_mode" == x"Detached" ]; then
flink run \
$runOption \
$jarfile \
$argsOption
echo "${app_name}" >"${APP_TEMP}/.running"
else
# shellcheck disable=SC2006
# shellcheck disable=SC2155
local app_log_date=$(date "+%Y%m%d_%H%M%S")
local app_out="${APP_LOG}/${app_name}-${app_log_date}.log"
flink run \
$runOption \
$jarfile \
$argsOption >> "$app_out" 2>&1 &
echo "${app_name}" >"${APP_TEMP}/.running"
echo_g "${app_name} starting,more detail please log:$app_out"
fi
}
doStop() {
# shellcheck disable=SC2155
# shellcheck disable=SC2034
local running_app=$(cat "${APP_TEMP}/.running" 2>/dev/null)
if [ x"${running_app}" == x"" ]; then
echo_w "can not find a running flink job!"
return 1
fi
flink list -r | grep "${running_app}" | awk '{print $4}' | xargs flink cancel
}
case "$1" in
start)
shift
doStart "$@"
exit $?
;;
stop)
doStop
exit $?
;;
*)
echo_g "Unknown command: $1"
echo_g "commands:"
echo_g " start Start"
echo_g " stop Stop"
echo_g " are you running?"
exit 1
;;
esac
logback.xml
This is the log configuration file; its content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<!-- Log file storage path -->
<property name="LOG_HOME" value="logs/"/>
<property name="FILE_SIZE" value="50MB"/>
<property name="MAX_HISTORY" value="100"/>
<timestamp key="DATE_TIME" datePattern="yyyy-MM-dd HH:mm:ss"/>
<property name="log.colorPattern"
value="%d{yyyy-MM-dd HH:mm:ss} | %highlight(%-5level) | %boldYellow(%thread) | %boldGreen(%logger) | %msg%n"/>
<property name="log.pattern"
value="%d{yyyy-MM-dd HH:mm:ss.SSS} %contextName [%thread] %-5level %logger{36} - %msg%n"/>
<!-- Console output -->
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder charset="utf-8">
<pattern>${log.colorPattern}</pattern>
</encoder>
</appender>
<!-- ERROR output to file, rolling by date and file size -->
<appender name="ERROR" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder charset="utf-8">
<pattern>${log.pattern}</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>ERROR</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_HOME}%d/error.%i.log</fileNamePattern>
<maxHistory>${MAX_HISTORY}</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>${FILE_SIZE}</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
</appender>
<!-- WARN output to file, rolling by date and file size -->
<appender name="WARN" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder charset="utf-8">
<pattern>${log.pattern}</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>WARN</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_HOME}%d/warn.%i.log</fileNamePattern>
<maxHistory>${MAX_HISTORY}</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>${FILE_SIZE}</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
</appender>
<!-- INFO output to file, rolling by date and file size -->
<appender name="INFO" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder charset="utf-8">
<pattern>${log.pattern}</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>INFO</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_HOME}%d/info.%i.log</fileNamePattern>
<maxHistory>${MAX_HISTORY}</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>${FILE_SIZE}</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
</appender>
<!-- DEBUG output to file, rolling by date and file size -->
<appender name="DEBUG" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder charset="utf-8">
<pattern>${log.pattern}</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>DEBUG</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_HOME}%d/debug.%i.log</fileNamePattern>
<maxHistory>${MAX_HISTORY}</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy
class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>${FILE_SIZE}</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
</appender>
<!-- TRACE output to file, rolling by date and file size -->
<appender name="TRACE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder charset="utf-8">
<pattern>${log.pattern}</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>TRACE</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy
class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_HOME}%d/trace.%i.log</fileNamePattern>
<maxHistory>${MAX_HISTORY}</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>${FILE_SIZE}</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
</appender>
<!-- Root logger -->
<root level="INFO">
<appender-ref ref="STDOUT"/>
<appender-ref ref="DEBUG"/>
<appender-ref ref="ERROR"/>
<appender-ref ref="WARN"/>
<appender-ref ref="INFO"/>
<appender-ref ref="TRACE"/>
</root>
</configuration>
application.yml
This is the project configuration file; its content is as follows:
flink:
  deployment:
    option:
      target: application
      detached:
      shutdownOnAttachedExit:
      zookeeperNamespace:
      jobmanager:
    property:
      $internal.application.main: StreamKafka # main class
      pipeline.name: streamx_kafka_demo # YARN application name
      taskmanager.numberOfTaskSlots: 1
      parallelism.default: 2
      jobmanager.memory:
        flink.size:
        heap.size:
        jvm-metaspace.size:
        jvm-overhead.max:
        off-heap.size:
        process.size:
      taskmanager.memory:
        flink.size:
        framework.heap.size:
        framework.off-heap.size:
        managed.size:
        process.size:
        task.heap.size:
        task.off-heap.size:
        jvm-metaspace.size:
        jvm-overhead.max:
        jvm-overhead.min:
        managed.fraction: 0.4
  checkpoints:
    enable: false
    interval: 30000
    mode: EXACTLY_ONCE
    timeout: 300000
    unaligned: true
  watermark:
    interval: 10000
  # state backend
  state:
    backend:
      value: hashmap # backend type; Flink 1.13 only offers 'rocksdb' and 'hashmap'
    checkpoints.num-retained: 1
  # restart strategy
  restart-strategy:
    value: fixed-delay # restart strategy, one of fixed-delay | failure-rate | none
    fixed-delay:
      attempts: 3
      delay: 5000
    failure-rate:
      max-failures-per-interval:
      failure-rate-interval:
      delay:
  # table
  table:
    planner: blink # (blink|old|any)
    mode: streaming # (batch|streaming)

# kafka configuration
kafka.source:
  bootstrap.servers: scentos:9092
  topic: s1
  group.id: szc
  auto.offset.reset: latest
assembly.xml
This is the packaging descriptor used for deployment; its content is as follows and normally does not need to be changed:
<assembly>
<id>bin</id>
<formats>
<format>tar.gz</format>
</formats>
<fileSets>
<fileSet>
<directory>assembly/bin</directory>
<outputDirectory>bin</outputDirectory>
<fileMode>0755</fileMode>
</fileSet>
<fileSet>
<directory>${project.build.directory}</directory>
<outputDirectory>lib</outputDirectory>
<fileMode>0755</fileMode>
<includes>
<include>*.jar</include>
</includes>
<excludes>
<exclude>original-*.jar</exclude>
</excludes>
</fileSet>
<fileSet>
<directory>assembly/conf</directory>
<outputDirectory>conf</outputDirectory>
<fileMode>0755</fileMode>
</fileSet>
<fileSet>
<directory>assembly/logs</directory>
<outputDirectory>logs</outputDirectory>
<fileMode>0755</fileMode>
</fileSet>
<fileSet>
<directory>assembly/temp</directory>
<outputDirectory>temp</outputDirectory>
<fileMode>0755</fileMode>
</fileSet>
</fileSets>
</assembly>
Streaming Application Development
pom.xml
The content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>StreamXAPIStreamTest</artifactId>
<version>1.0</version>
<properties>
<flink.version>1.13.6</flink.version>
<scala.binary.version>2.11</scala.binary.version>
<streamx.flink.version>1.13</streamx.flink.version>
<streamx.version>1.2.2</streamx.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.streamxhub.streamx</groupId>
<artifactId>streamx-flink-core</artifactId>
<version>${streamx.version}</version>
</dependency>
<dependency>
<groupId>com.streamxhub.streamx</groupId>
<artifactId>streamx-flink-shims_flink-${streamx.flink.version}</artifactId>
<version>${streamx.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<!--(start) shade-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.4</version>
<executions>
<execution>
<id>shade-flink</id>
<phase>none</phase>
</execution>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
<!--(end) shade -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.1.1</version>
<executions>
<execution>
<id>distro-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<descriptors>
<descriptor>assembly.xml</descriptor>
</descriptors>
</configuration>
</plugin>
<!-- register both java and scala as source directories -->
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
</plugin>
<!--maven-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
</plugin>
<!--scala-->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Using the Kafka Connector
Dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
Basic Data Consumption
Relevant configuration in application.yml:
# kafka configuration
kafka.source:
  bootstrap.servers: scentos:9092
  topic: s1
  group.id: szc
  auto.offset.reset: latest
Java code:
import com.streamxhub.streamx.flink.core.StreamEnvConfig;
import com.streamxhub.streamx.flink.core.java.source.KafkaSource;
import com.streamxhub.streamx.flink.core.scala.StreamingContext;
import com.streamxhub.streamx.flink.core.scala.source.KafkaRecord;
import org.apache.flink.api.common.functions.MapFunction;
public class StreamKafka {
    public static void main(String[] args) {
        // configuration
        StreamEnvConfig javaConfig = new StreamEnvConfig(args, null);
        // create the StreamingContext
        StreamingContext ctx = new StreamingContext(javaConfig);
        // consume data from kafka
        new KafkaSource<String>(ctx)
                .getDataStream()
                .map(new MapFunction<KafkaRecord<String>, String>() {
                    @Override
                    public String map(KafkaRecord<String> value) throws Exception {
                        return value.value();
                    }
                })
                .print();
        // start the job
        ctx.start();
    }
}
Before running it in IDEA, add a program argument in Edit Configurations that points to the absolute path of application.yml:
--conf D:\develop\ideaWorkspace\StreamXAPIStreamTest\assembly\conf\application.yml
Also, under Modify options -> Java, enable adding dependencies with "provided" scope to the classpath:

Then simply run the code; the output looks like this:

Testing with the Kafka console producer gives the following result:

Consuming Multiple Topics
Configuration in application.yml:
# kafka configuration
kafka.source:
  bootstrap.servers: scentos:9092
  topic: s1,s2,s3 # topics to consume
  group.id: szc
  auto.offset.reset: latest
Java code:
// consume only topic s1
new KafkaSource<String>(ctx)
        .topic("s1") // topic(s) to consume
        .getDataStream()
        .map(new MapFunction<KafkaRecord<String>, String>() {
            @Override
            public String map(KafkaRecord<String> value) throws Exception {
                return value.value();
            }
        })
        .print("s1");
// consume topics s1, s2 and s3
new KafkaSource<String>(ctx)
        .topic("s1", "s2", "s3") // topic(s) to consume
        .getDataStream()
        .map(new MapFunction<KafkaRecord<String>, String>() {
            @Override
            public String map(KafkaRecord<String> value) throws Exception {
                return value.value();
            }
        })
        .print("all");
The test result is as follows:

Consuming from Multiple Kafka Clusters
Configuration in application.yml:
kafka.source:
  kafka1: # cluster alias
    bootstrap.servers: scentos:9092
    topic: s1,s2,s3
    group.id: szc
    auto.offset.reset: latest
  kafka2: # cluster alias
    bootstrap.servers: scentos2:9092
    topic: s1,s2,s3
    group.id: szc
    auto.offset.reset: latest
Java code; the alias() method selects which Kafka cluster to consume from:
// consume from cluster kafka1
new KafkaSource<String>(ctx)
        .alias("kafka1")
        .getDataStream()
        .map(new MapFunction<KafkaRecord<String>, String>() {
            @Override
            public String map(KafkaRecord<String> value) throws Exception {
                return value.value();
            }
        })
        .print("s1");
// consume topic s2 from cluster kafka2
new KafkaSource<String>(ctx)
        .topic("s2")
        .alias("kafka2")
        .getDataStream()
        .map(new MapFunction<KafkaRecord<String>, String>() {
            @Override
            public String map(KafkaRecord<String> value) throws Exception {
                return value.value();
            }
        })
        .print("all");
Using JDBC
Goal: read data from Kafka and write it into MySQL.
pom.xml dependencies:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.49</version>
</dependency>
Configuration in application.yml:
jdbc:
  driverClassName: com.mysql.jdbc.Driver
  jdbcUrl: jdbc:mysql://192.168.31.60:3306/test?useSSL=false&allowPublicKeyRetrieval=true
  username: root
  password: root
kafka.source:
  bootstrap.servers: scentos:9092
  topic: s1
  group.id: szc
  auto.offset.reset: latest
Java code; note that the table columns are wrapped in a POJO:
import com.streamxhub.streamx.flink.core.StreamEnvConfig;
import com.streamxhub.streamx.flink.core.java.function.SQLFromFunction;
import com.streamxhub.streamx.flink.core.java.sink.JdbcSink;
import com.streamxhub.streamx.flink.core.java.source.KafkaSource;
import com.streamxhub.streamx.flink.core.scala.StreamingContext;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
public class StreamKafka {
    public static void main(String[] args) {
        // configuration
        StreamEnvConfig javaConfig = new StreamEnvConfig(args, null);
        // create the StreamingContext
        StreamingContext ctx = new StreamingContext(javaConfig);
        // consume data from kafka and map each record to a User
        SingleOutputStreamOperator<User> source = new KafkaSource<String>(ctx)
                .topic("s1")
                .getDataStream()
                .map(record -> {
                    String[] data = record.value().split(",");
                    return new User(Integer.parseInt(data[0]), data[1], data[2], data[3]);
                });
        new JdbcSink<User>(ctx)
                .sql(new SQLFromFunction<User>() {
                    // build the insert statement for each record
                    @Override
                    public String from(User user) {
                        return String.format(
                                "insert into users(id, username, password, email) values('%d', '%s', '%s', '%s')",
                                user.getId(),
                                user.getUsername(),
                                user.getPassword(),
                                user.getEmail());
                    }
                })
                .sink(source);
        // start the job
        ctx.start();
    }
}
class User {
int id;
String username;
String password;
String email;
public User(int id, String username, String password, String email) {
this.id = id;
this.username = username;
this.password = password;
this.email = email;
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getUsername() {
return username;
}
public void setUsername(String username) {
this.username = username;
}
public String getPassword() {
return password;
}
public void setPassword(String password) {
this.password = password;
}
public String getEmail() {
return email;
}
public void setEmail(String email) {
this.email = email;
}
}
The test result is as follows:

StreamX also supports reading data from MySQL, but using Flink CDC is still recommended.
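If you do need to read from MySQL with Flink CDC, a minimal sketch looks roughly like the following. It assumes the flink-cdc-connectors dependency (for example com.ververica:flink-connector-mysql-cdc in a version compatible with Flink 1.13); the builder API comes from that library, and the host, database, table and credentials simply reuse the values from the JDBC configuration above.
import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MySqlCdcDemo {
    public static void main(String[] args) throws Exception {
        // capture changes from the test.users table written to in the JDBC example
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("192.168.31.60")
                .port(3306)
                .databaseList("test")
                .tableList("test.users")
                .username("root")
                .password("root")
                .deserializer(new JsonDebeziumDeserializationSchema()) // emit change events as JSON strings
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // print every change event; replace print() with a real sink in practice
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "mysql-cdc-source")
           .print();
        env.execute("mysql-cdc-demo");
    }
}
Compared with the JdbcSink example above, this reads the binlog continuously instead of polling the table, which is why Flink CDC is the recommended way to pull data out of MySQL.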
Submitting to the StreamX Platform for Execution
Start StreamX on Linux, log in to the platform, create a project and then create an application (see the "Deploying a FlinkStream application" section of "StreamX Notes: Introduction, Installation and Deployment"). The only differences are that the application type should be StreamX Flink rather than Apache Flink, and application.yml should be chosen as the application's conf file:

After the application is submitted and running, open the Kafka console producer, then check the application's output using the method described in the same "Deploying a FlinkStream application" section; the result looks like this:

Flink SQL Application Development
pom dependencies
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-csv</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-json</artifactId>
<version>${flink.version}</version>
</dependency>
application.yml
In application.yml, add a table block under the flink section (if it is already present, there is no need to add it):
flink:
  deployment:
    .....
  # table
  table:
    planner: blink # (blink|old|any)
    mode: streaming # (batch|streaming)
sql.yml
Create a new sql.yml file in the same directory as application.yml, with the following content:
first: |
  create table s1 (
    key string,
    value string
  ) with (
    'connector' = 'kafka',
    'topic' = 's1',
    'properties.bootstrap.servers' = 'scentos:9092',
    'properties.group.id' = 'szc',
    'scan.startup.mode' = 'latest-offset',
    'format' = 'csv'
  );
  create table s2 (
    key string,
    value string
  ) with (
    'connector' = 'print'
  );
  insert into s2 select * from s1;
Here, first on the first line is the alias used when executing the SQL.
Java code
import com.streamxhub.streamx.flink.core.TableContext;
import com.streamxhub.streamx.flink.core.TableEnvConfig;
public class StreamKafka {
    public static void main(String[] args) {
        TableEnvConfig tableEnvConfig = new TableEnvConfig(args, null);
        TableContext ctx = new TableContext(tableEnvConfig);
        ctx.sql("first"); // alias of the SQL block to execute
    }
}
In IDEA's Edit Configurations, add program arguments pointing to the locations of application.yml and sql.yml:
--conf D:\develop\ideaWorkspace\StreamXAPIStreamTest\assembly\conf\application.yml --sql D:\develop\ideaWorkspace\StreamXAPIStreamTest\assembly\conf\sql.yml
Test Result
After running it in IDEA, testing with the Kafka console producer gives the following result:
