I recently built a streaming job with Flink that writes its results into MySQL. Flink does ship an official JDBC connector, but raw JDBC is fairly low-level, and MyBatis + Druid is much more pleasant to work with. After some experimentation and testing, here is the final code along with a few problems I ran into.
Dependencies:
pom.xml
<dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis</artifactId>
    <version>3.3.0</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>6.0.4</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.2.4</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.16.22</version>
    <optional>true</optional>
</dependency>
MyBatis configuration:
mybatis_conf.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD SQL Map Config 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
    <settings>
        <setting name="useGeneratedKeys" value="true"/>
        <setting name="defaultExecutorType" value="REUSE"/>
        <!-- <setting name="logImpl" value="STDOUT_LOGGING"/> logs every executed SQL statement -->
    </settings>
    <environments default="default">
        <environment id="default">
            <transactionManager type="JDBC"/>
            <dataSource type="com.util.DruidDataSourceFactory">
                <property name="driverClassName" value="com.mysql.cj.jdbc.Driver"/>
                <property name="url" value="....."/>
                <property name="username" value="....."/>
                <property name="password" value="....."/>
                <property name="validationQuery" value="select 'x'"/>
                <property name="testOnBorrow" value="false"/>
                <property name="testWhileIdle" value="true"/>
                <!-- keep this below the MySQL server's wait_timeout (MySQL default: 8 hours) -->
                <property name="timeBetweenEvictionRunsMillis" value="3600000"/>
            </dataSource>
        </environment>
    </environments>
    <mappers>
        <mapper resource="mapper.xml"/>
    </mappers>
</configuration>
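The timeBetweenEvictionRunsMillis value above only makes sense relative to the server-side timeout, so it is worth checking what wait_timeout actually is on your MySQL server before picking a number:

```sql
SHOW VARIABLES LIKE 'wait_timeout';
```

If the server returns the default 28800 seconds (8 hours), the 3600000 ms (1 hour) eviction interval above leaves plenty of margin.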
Mapper:
mapper.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.example.mapper">
    <insert id="updateActive" parameterType="com.pojo.IndeH5RtDau">
        insert into .........
    </insert>
</mapper>
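The statement body is elided above. As a purely hypothetical illustration (the table and column names here are invented, not from the original project), a complete upsert-style statement might look like:

```xml
<insert id="updateActive" parameterType="com.pojo.IndeH5RtDau">
    insert into h5_rt_dau (stat_date, dau)
    values (#{statDate}, #{dau})
    on duplicate key update dau = #{dau}
</insert>
```

The `#{...}` placeholders are filled from the getters of the parameterType object passed to `sqlSession.insert`.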
Code:
import com.alibaba.druid.pool.DruidDataSource;
import org.apache.ibatis.datasource.pooled.PooledDataSourceFactory;

// Plugs Druid into MyBatis: the <property> entries in mybatis_conf.xml
// are applied to this DruidDataSource by the inherited setProperties().
public class DruidDataSourceFactory extends PooledDataSourceFactory {
    public DruidDataSourceFactory() {
        this.dataSource = new DruidDataSource();
    }
}
import lombok.extern.log4j.Log4j;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;

import java.io.InputStream;

import static com.util.Utils.getResourceAsStream;

@Log4j
public class MybatisSink<T> extends RichSinkFunction<T> {

    // Static singleton: one SqlSessionFactory (and thus one Druid pool) per JVM,
    // shared by every subtask running in the same TaskManager process.
    private static SqlSessionFactory sqlSessionFactory;

    static {
        try (InputStream inputStream = getResourceAsStream("mybatis_conf.xml")) {
            sqlSessionFactory = new SqlSessionFactoryBuilder().build(inputStream);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        }
    }

    private final String sqlId;

    public MybatisSink(String sqlId) {
        this.sqlId = sqlId;
    }

    @Override
    public void invoke(T value, Context context) {
        // openSession(true) enables auto-commit, so each insert is committed immediately;
        // try-with-resources closes the (non-thread-safe) SqlSession promptly.
        try (SqlSession sqlSession = sqlSessionFactory.openSession(true)) {
            sqlSession.insert(sqlId, value);
        } catch (Exception ex) {
            log.error(ex.getMessage(), ex);
        }
    }
}

@Log4j
public class Utils {
    public static InputStream getResourceAsStream(String s) {
        return Utils.class.getClassLoader().getResourceAsStream(s);
    }
}
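The sink relies on try-with-resources to close each SqlSession even when the insert throws. A minimal stdlib sketch of that guarantee (the Resource class below is a stand-in for SqlSession, not a MyBatis type):

```java
public class TryWithResourcesDemo {
    // Stand-in for SqlSession: records whether close() was called.
    static class Resource implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        try (Resource session = r) {
            // Simulates sqlSession.insert(...) failing mid-flight.
            throw new RuntimeException("insert failed");
        } catch (RuntimeException ex) {
            // Mirrors the catch block in MybatisSink.invoke.
        }
        // close() ran despite the exception.
        System.out.println("closed=" + r.closed);
    }
}
```

This is exactly why the session can be a short-lived local variable: the language guarantees cleanup on every exit path, so no session leaks even under exceptions.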
In the Flink job itself, call addSink(new MybatisSink<>("com.example.mapper.updateActive")) directly on a DataStream<T> to write to the database through MybatisSink.
This code is not complicated, but a few points deserve attention. The main pitfalls in using MyBatis lie in how SqlSessionFactory and SqlSession are created and used: the SqlSessionFactory should be a singleton, while SqlSession is not thread-safe, so it should be a local variable and closed promptly. The code above handles both with the simplest static-singleton pattern plus a try-with-resources statement. My first version looked like this:
public class MybatisSink<T> extends RichSinkFunction<T> {

    private String sql;
    // Per-instance factory: every sink instance builds its own in open().
    private SqlSessionFactory sqlSessionFactory;

    public MybatisSink(String sql) {
        this.sql = sql;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        sqlSessionFactory = getSqlSessionFactory();
    }

    @Override
    public void invoke(T value, Context context) {
        try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
            // ...
        }
    }
}
This version is actually problematic: the SqlSessionFactory may not end up a singleton. Flink executes jobs in a multi-process, multi-thread model. With, say, 3 TaskManagers and 6 task slots, you typically get 3 processes, each running 2 task threads; each such thread is a subtask (SubTask). Every subtask creates its own instance of the sink, so with the code above each sink instance builds its own SqlSessionFactory, and every SqlSessionFactory in turn creates its own DataSource. The code runs, but it is clearly wrong: all threads within one process should share a single connection pool.
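The difference between the two versions can be illustrated with plain Java, no Flink required (the factory objects below are stand-ins for SqlSessionFactory, and the subtask count of 6 is just for illustration): the per-instance version ends up with as many factories as there are sink instances, while the static version shares one per JVM.

```java
import java.util.HashSet;
import java.util.Set;

public class SingletonDemo {
    // Static version: one shared "factory" per JVM, as in the final MybatisSink.
    static final Object SHARED_FACTORY = new Object();

    // Per-instance version: each "sink" builds its own, as in the first attempt's open().
    static class PerInstanceSink {
        final Object factory = new Object();
    }

    public static void main(String[] args) {
        Set<Object> perInstance = new HashSet<>();
        Set<Object> shared = new HashSet<>();
        // Simulate 6 subtasks in one process, each instantiating a sink.
        for (int i = 0; i < 6; i++) {
            perInstance.add(new PerInstanceSink().factory);
            shared.add(SHARED_FACTORY);
        }
        System.out.println("perInstance=" + perInstance.size() + " shared=" + shared.size());
    }
}
```

With 6 simulated subtasks the per-instance approach creates 6 distinct factories (and would create 6 connection pools), while the static approach creates exactly one, which is why the final code moves the factory into a static initializer.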