[Technical Implementation] Capturing real-time PostgreSQL change data from Java via WAL log subscription
1. Preface
In past projects we often needed to capture change data from a PostgreSQL database in real time. The earlier solution was to add an updatetime or inserttime column and periodically poll the relevant tables. That approach makes the business logic complex and the data far from real-time, and when there are many tables, or a single table is large, it can also cause performance problems.
Alibaba's Canal tool implements data synchronization by subscribing to MySQL's binlog, providing subscription/notification for incremental and changed data. Canal, however, only works with MySQL and cannot be used with PostgreSQL. Since PostgreSQL does support real-time synchronization between a primary and its standbys, Canal-like functionality must be achievable there as well; the question is how.
2. WAL Logs
Canal is built on MySQL's binlog, so a Java implementation of change-data subscription for PostgreSQL must likewise be built on the log used for primary/standby synchronization. PostgreSQL has no binlog, but it has the WAL (Write-Ahead Log) for that purpose. A brief introduction:
The WAL records the complete history of modifications to the data since the cluster was created, so replaying it can reconstruct an identical copy of the database. To improve availability, one or more standbys can be set up to receive the primary's WAL in real time and replay it, forming a log-replication-based high-availability cluster.
In terms of timing, PostgreSQL replication is either asynchronous or synchronous; the main difference is whether a transaction commit waits for the standby to confirm that the log has been persisted. In terms of log form, replication is either logical or physical, shipping logical logs or physical logs respectively.
The physical log is binary and describes page-level write operations, while the logical log describes the operations of the current transaction. The logical log is produced by decoding the physical log; since physical log records are generated in the order the writes were executed, the logical log preserves the order of the user's write operations as well.
3. Logical Decoding and Logical Replication
Logical decoding and logical replication are the key to real-time change-data subscription for PostgreSQL. The details are extensive and deserve a study of their own, so they are not covered in depth here; readers interested in the internals should consult the PostgreSQL documentation.
The most important concepts to look up are:
Replication slots
Logical decoding
Snapshot handling
Logical replication
Publish/subscribe
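These concepts can be tried out directly from psql before writing any Java. A minimal sketch, assuming a database where wal_level is already set to logical and the connecting role has the REPLICATION attribute (the slot name demo_slot is arbitrary):

```sql
-- 1) Create a logical replication slot using the built-in test_decoding plugin
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- 2) Make a change somewhere, e.g.
--    INSERT INTO tbl_user VALUES (1, 'Tom', 20);

-- 3) Peek at the decoded changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- 4) Consume the changes, advancing the slot position
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- 5) Drop the slot when done; an abandoned slot retains WAL indefinitely
SELECT pg_drop_replication_slot('demo_slot');
```

The text rows returned by step 3/4 are in the same format the Java code below receives over the replication stream.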
4. Preparation
1) Modify the PostgreSQL configuration
Incremental events are received over the streaming-replication protocol; the WAL is decoded by the built-in output plugins pgoutput and test_decoding.
Edit postgresql.conf and restart the service:
wal_level=logical
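Besides wal_level, each replication connection consumes a replication slot and a WAL sender process, so the capacity settings must cover the planned usage. A minimal sketch of the relevant postgresql.conf entries (the values are illustrative; on PostgreSQL 10+ both capacity settings default to 10, which is usually enough):

```
wal_level = logical         # required for logical decoding
max_replication_slots = 4   # >= number of replication slots you will create
max_wal_senders = 4         # >= number of concurrent replication connections
```

The connecting role also needs LOGIN and REPLICATION privileges, which is exactly what the permission check in the Java code below verifies.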
2) Create a table for verification
DROP TABLE IF EXISTS "public"."tbl_user";
CREATE TABLE "public"."tbl_user" (
"id" int4 NOT NULL,
"name" varchar(255) COLLATE "pg_catalog"."default",
"age" int4
)
;
-- ----------------------------
-- Primary Key structure for table tbl_user
-- ----------------------------
ALTER TABLE "public"."tbl_user" ADD CONSTRAINT "tbl_user_pkey" PRIMARY KEY ("id");
3) Add the dependency to pom.xml
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.2.2</version>
</dependency>
5. Implementation Steps (Code Example)
Without further ado, here is the implementation; the overall project structure is as follows:
1) Subscribing to the PostgreSQL database
All database access goes through JDBC and the postgresql driver jar; the key classes for receiving change data are PGReplicationStream and PGConnection.
package com.sk.service;
import lombok.Data;
import lombok.extern.log4j.Log4j2;
import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;
import org.postgresql.replication.fluent.logical.ChainedLogicalStreamBuilder;
import org.springframework.util.Assert;
import java.sql.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
@Log4j2
@Data
public class SimpleDataSource {
private static String url = "jdbc:postgresql://localhost:5432/test";
private static String username = "postgres";
private static String passwd = "xxxxxx";
private static String driverClassName = "org.postgresql.Driver";
//private static final String GET_SLOT = "select count(1) from pg_replication_slots where database = ? and slot_name = ? and plugin = ?";
//private static final String GET_RESTART_LSN = "select restart_lsn from pg_replication_slots where database = ? and slot_name = ? and plugin = ?";
private static final String GET_ROLE = "SELECT r.rolcanlogin AS login, r.rolreplication AS replication, CAST(array_position(ARRAY(SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid), 'rds_superuser') AS BOOL) IS TRUE AS superuser, CAST(array_position(ARRAY(SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid), 'rdsadmin') AS BOOL) IS TRUE AS admin, CAST(array_position(ARRAY(SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid), 'rdsrepladmin') AS BOOL) IS TRUE AS rep_admin FROM pg_roles r WHERE r.rolname = current_user";
private static final String GET_DATABASE = "SELECT current_database()";
private static final String GET_WAL_LEVEL = "SHOW WAL_LEVEL";
private static final String DEFAULT_WAL_LEVEL = "logical";
private static final String PLUGIN_NAME = "pluginName";
private static final String LSN_POSITION = "position";
private static final String DROP_SLOT_ON_CLOSE = "dropSlotOnClose";
private static Connection connection;
private static PGReplicationStream stream;
private boolean dropSlotOnClose;
private static String database;
private static String schema = "test";
private static String metaId = "isa";
private static LogSequenceNumber startLsn;
protected static Map<String, String> snapshot = new HashMap<>();
private static volatile boolean connected;
public void run() throws SQLException {
//initialize a plain JDBC connection to PostgreSQL
connection = JdbcConnection3();
//check the server's WAL level
String walLevel = (String) query(GET_WAL_LEVEL,connection,"wal_level","String");
if (!DEFAULT_WAL_LEVEL.equals(walLevel)) {
log.error(String.format("Postgres server wal_level property must be \"%s\" but is: %s", DEFAULT_WAL_LEVEL, walLevel));
}
//verify the privileges of the current login user
boolean hasAuth = isHasAuth(GET_ROLE,connection);
if(!hasAuth){
log.error(String.format("Postgres roles LOGIN and REPLICATION are not assigned to user: %s",username));
}
//get the database this connection is attached to
database = (String) query(GET_DATABASE,connection,"current_database","String");
//open the change-subscription (replication) connection
connect();
connected = true;
//handler for the received messages
MessageHandle messageHandle = new MessageHandle(stream,connection,snapshot,LSN_POSITION);
//message-receiving thread
Worker worker = new Worker(stream,startLsn,messageHandle);
Thread workerThread = new Thread(worker);
workerThread.setName(new StringBuilder("wal-parser-").append(url).append("_").append(worker.hashCode()).toString());
workerThread.setDaemon(false);
workerThread.start();
System.out.println("==========2.server start success!!!");
}
private void connect() throws SQLException {
Properties props = new Properties();
PGProperty.USER.set(props, username);
PGProperty.PASSWORD.set(props, passwd);
// logical decoding was introduced in PostgreSQL 9.4
PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "9.4");
PGProperty.REPLICATION.set(props, "database");
PGProperty.PREFER_QUERY_MODE.set(props, "simple");
//establish the replication connection
connection = DriverManager.getConnection(url, props);
Assert.notNull(connection, "Unable to get connection.");
PGConnection pgConnection = connection.unwrap(PGConnection.class);
//create the replication slot
createReplicationSlot(pgConnection);
//create the replication stream
createReplicationStream(pgConnection);
System.out.println("==========1.server connect success!!!");
}
public String getSlotName() {
return String.format("dbs_slot_%s_%s_%s", schema, username, metaId);
}
//create the replication slot if it does not exist yet
private void createReplicationSlot(PGConnection pgConnection) throws SQLException {
String slotName = getSlotName();
String plugin = "test_decoding"; // output plugin; the built-in choices are test_decoding and pgoutput
String GET_SLOT = "select count(1) from pg_replication_slots where database = '"+database+"' and slot_name = '"+slotName+"' and plugin = '"+plugin+"'";
long slot = (Long) query(GET_SLOT,connection,"count","Long");
boolean existSlot = slot > 0;
if (!existSlot) {
pgConnection.getReplicationAPI()
.createReplicationSlot()
.logical()
.withSlotName(slotName)
.withOutputPlugin(plugin)
.make();
// wait for create replication slot to have finished
try {
TimeUnit.MILLISECONDS.sleep(300);
} catch (InterruptedException e) {
log.info(e.getMessage());
}
}
if (!snapshot.containsKey(LSN_POSITION)) {
String GET_RESTART_LSN = "select restart_lsn from pg_replication_slots where database = '"+database+"' and slot_name = '"+slotName+"' and plugin = '"+plugin+"'";
LogSequenceNumber lsn = LogSequenceNumber.valueOf((String) query(GET_RESTART_LSN,connection,"restart_lsn","String"));
if (null == lsn || lsn.asLong() == 0) {
log.error("No maximum LSN recorded in the database");
}
snapshot.put(LSN_POSITION, lsn.asString());
}
startLsn = LogSequenceNumber.valueOf(snapshot.get(LSN_POSITION));
}
//create the replication stream
private void createReplicationStream(PGConnection pgConnection) throws SQLException {
ChainedLogicalStreamBuilder streamBuilder = pgConnection
.getReplicationAPI()
.replicationStream()
.logical()
.withSlotName(getSlotName())
.withStartPosition(startLsn)
.withStatusInterval(10, TimeUnit.SECONDS)
.withSlotOption("include-xids", true)
.withSlotOption("skip-empty-xacts", true);
//messageDecoder.withSlotOption(streamBuilder);
stream = streamBuilder.start();
}
private Connection JdbcConnection3() {
try {
// 1. load the Driver class "org.postgresql.Driver"; it registers itself with DriverManager
Class.forName(driverClassName);
// 2. connect to the database and return the Connection
Connection conn = DriverManager.getConnection(url, username, passwd);
return conn;
} catch (Exception e) {
log.error("Failed to connect to the database, error:{}", e);
}
return null;
}
//run a single-value query over JDBC
private Object query(String sql, Connection connection,String columnLabel,String type) throws SQLException {
try (PreparedStatement statement = connection.prepareStatement(sql)) {
try (ResultSet resultSet = statement.executeQuery()) {
while (resultSet.next()) {
if("Long".equals(type)){
return resultSet.getLong(columnLabel);
}else{
return resultSet.getString(columnLabel);
}
}
}
}
return null;
}
//check whether the current user has the required privileges
private boolean isHasAuth(String sql, Connection connection) throws SQLException {
try (PreparedStatement statement = connection.prepareStatement(sql)) {
try (ResultSet resultSet = statement.executeQuery()) {
while (resultSet.next()) {
boolean login = resultSet.getBoolean("login");
boolean replication = resultSet.getBoolean("replication");
boolean superuser = resultSet.getBoolean("superuser");
boolean admin = resultSet.getBoolean("admin");
boolean repAdmin = resultSet.getBoolean("rep_admin");
return login && (replication || superuser || admin || repAdmin);
}
}
}
return false;
}
}
The pgjdbc PGReplicationStream interface is worth a quick look: its key methods are readPending() for a non-blocking read, read() for a blocking read, getLastReceiveLSN() for the position of the last received message, setAppliedLSN()/setFlushedLSN() for reporting progress back to the server, and forceUpdateStatus() for sending that feedback immediately. See the PostgreSQL JDBC API documentation for the full details.
2) The worker thread that receives notification messages
package com.sk.service;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.extern.log4j.Log4j2;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
@Log4j2
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Worker implements Runnable{
private PGReplicationStream stream;
private LogSequenceNumber startLsn;
private MessageHandle messageHandle;
@Override
public void run() {
while (true) {
try {
// non blocking receive message
ByteBuffer msg = stream.readPending();
if (msg == null) {
sleepInMills(10L);
continue;
}
LogSequenceNumber lsn = stream.getLastReceiveLSN();
if (messageHandle.skipMessage(msg, startLsn, lsn)) {
continue;
}
messageHandle.flushLsn(lsn);
// process decoder
//sendChangedEvent(processMessage(msg));
System.out.println(messageHandle.processMessage(msg));
// feedback
stream.setAppliedLSN(lsn);
stream.setFlushedLSN(lsn);
stream.forceUpdateStatus();
} catch (Exception e) {
log.error(e.getMessage());
//recover();
}
}
}
private void sleepInMills(long timeout) {
try {
TimeUnit.MILLISECONDS.sleep(timeout);
} catch (InterruptedException e) {
log.info(e.getMessage());
}
}
}
3) The message handler class
package com.sk.service;
import com.sk.column.Lexer;
import com.sk.column.PgColumnValue;
import com.sk.config.MessageTypeEnum;
import com.sk.constant.ConnectorConstant;
import com.sk.event.RowChangedEvent;
import com.sk.utils.DatabaseUtil;
import lombok.Data;
import lombok.extern.log4j.Log4j2;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;
import org.postgresql.util.PGmoney;
import java.nio.ByteBuffer;
import java.sql.Connection;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
 * Handler for received notification messages
 */
@Log4j2
@Data
public class MessageHandle {
private PGReplicationStream stream;
private Connection connection;
private Map<String, String> snapshot;
private String LSN_POSITION;
private PgColumnValue value = new PgColumnValue();
public MessageHandle(PGReplicationStream stream, Connection connection, Map<String, String> snapshot, String LSN_POSITION) {
this.stream = stream;
this.connection = connection;
this.snapshot = snapshot;
this.LSN_POSITION = LSN_POSITION;
}
/*private void recover() {
try {
long s = Instant.now().toEpochMilli();
DatabaseUtil.close(stream);
DatabaseUtil.close(connection);
stream = null;
connection = null;
while (connected) {
try {
connect();
break;
} catch (Exception e) {
log.error("Recover streaming occurred error");
DatabaseUtil.close(stream);
DatabaseUtil.close(connection);
sleepInMills(3000L);
}
}
long e = Instant.now().toEpochMilli();
log.info("Recover logical replication success, slot:{}, plugin:{}, cost:{}seconds", getSlotName(), ("test_decoding"), (e - s) / 1000);
} finally {
//connectLock.unlock();
}
}*/
protected void sendChangedEvent(RowChangedEvent event){
changedEvent(event);
}
public void changedEvent(RowChangedEvent event) {
if (null != event) {
switch (event.getEvent()) {
case ConnectorConstant.OPERTION_UPDATE:
// whether UPDATE events should be processed
//processEvent(!listenerConfig.isBanUpdate(), event);
break;
case ConnectorConstant.OPERTION_INSERT:
// whether INSERT events should be processed
//processEvent(!listenerConfig.isBanInsert(), event);
break;
case ConnectorConstant.OPERTION_DELETE:
// whether DELETE events should be processed
//processEvent(!listenerConfig.isBanDelete(), event);
break;
default:
break;
}
}
}
public String processMessage(ByteBuffer buffer) {
if (!buffer.hasArray()) {
throw new IllegalStateException("Invalid buffer received from PG server during streaming replication");
}
MessageTypeEnum type = MessageTypeEnum.getType((char) buffer.get());
if (MessageTypeEnum.TABLE == type) {
int offset = buffer.arrayOffset();
byte[] source = buffer.array();
return new String(source, offset, (source.length - offset));
}
return null;
}
/*public static RowChangedEvent processMessage(ByteBuffer buffer) {
if (!buffer.hasArray()) {
throw new IllegalStateException("Invalid buffer received from PG server during streaming replication");
}
MessageTypeEnum type = MessageTypeEnum.getType((char) buffer.get());
if (MessageTypeEnum.TABLE == type) {
int offset = buffer.arrayOffset();
byte[] source = buffer.array();
return parseMessage(new String(source, offset, (source.length - offset)));
}
return null;
}*/
private RowChangedEvent parseMessage(String message) {
Lexer lexer = new Lexer(message);
// table
lexer.nextToken(' ');
// schemaName
lexer.nextToken('.');
// tableName
lexer.skip(1);
String table = lexer.nextToken('"');
lexer.skip(2);
// eventType
String eventType = lexer.nextToken(':');
lexer.skip(1);
List<Object> data = new ArrayList<>();
while (lexer.hasNext()) {
String name = parseName(lexer);
if ("(no-tuple-data)".equals(name)) {
// a DELETE without a primary key carries no tuple data and cannot be synced
return null;
}
String type = parseType(lexer);
lexer.skip(1);
String value = parseValue(lexer);
data.add(resolveValue(type, value));
}
RowChangedEvent event = null;
switch (eventType) {
case ConnectorConstant.OPERTION_UPDATE:
case ConnectorConstant.OPERTION_INSERT:
case ConnectorConstant.OPERTION_DELETE:
event = new RowChangedEvent(table, eventType, data);
break;
default:
log.info("Type {} not implemented", eventType);
}
return event;
}
protected Object resolveValue(String typeName, String columnValue) {
value.setValue(columnValue);
if (value.isNull()) {
// nulls are null
return null;
}
switch (typeName) {
// include all types from https://www.postgresql.org/docs/current/static/datatype.html#DATATYPE-TABLE
case "boolean":
case "bool":
return value.asBoolean();
case "integer":
case "int":
case "int4":
case "smallint":
case "int2":
case "smallserial":
case "serial":
case "serial2":
case "serial4":
return value.asInteger();
case "bigint":
case "bigserial":
case "int8":
case "oid":
return value.asLong();
case "real":
case "float4":
return value.asFloat();
case "double precision":
case "float8":
return value.asDouble();
case "numeric":
case "decimal":
return value.asBigDecimal();
case "character":
case "char":
case "character varying":
case "varchar":
case "bpchar":
case "text":
case "hstore":
return value.asString();
case "date":
return value.asDate();
case "timestamp with time zone":
case "timestamptz":
return value.asOffsetDateTimeAtUtc();
case "timestamp":
case "timestamp without time zone":
return value.asTimestamp();
case "time":
return value.asTime();
case "time without time zone":
return value.asLocalTime();
case "time with time zone":
case "timetz":
return value.asOffsetTimeUtc();
case "bytea":
return value.asByteArray();
// these are all PG-specific types and we use the JDBC representations
// note that, with the exception of point, no converters for these types are implemented yet,
// i.e. those values won't actually be propagated to the outbound message until that's the case
case "box":
return value.asBox();
case "circle":
return value.asCircle();
case "interval":
return value.asInterval();
case "line":
return value.asLine();
case "lseg":
return value.asLseg();
case "money":
final Object v = value.asMoney();
return (v instanceof PGmoney) ? ((PGmoney) v).val : v;
case "path":
return value.asPath();
case "point":
return value.asPoint();
case "polygon":
return value.asPolygon();
// PostGIS types are HexEWKB strings
// ValueConverter turns them into the correct types
case "geometry":
case "geography":
case "citext":
case "bit":
case "bit varying":
case "varbit":
case "json":
case "jsonb":
case "xml":
case "uuid":
case "tsrange":
case "tstzrange":
case "daterange":
case "inet":
case "cidr":
case "macaddr":
case "macaddr8":
case "int4range":
case "numrange":
case "int8range":
return value.asString();
// catch-all for other known/builtin PG types
case "pg_lsn":
case "tsquery":
case "tsvector":
case "txid_snapshot":
// catch-all for unknown (extension module/custom) types
default:
return null;
}
}
private String parseName(Lexer lexer) {
if (lexer.current() == ' ') {
lexer.skip(1);
}
lexer.nextToken('[');
return lexer.token();
}
private String parseType(Lexer lexer) {
lexer.nextToken(']');
return lexer.token();
}
private String parseValue(Lexer lexer) {
if (lexer.current() == '\'') {
lexer.skip(1);
lexer.nextTokenToQuote();
return lexer.token();
}
lexer.nextToken(' ');
return lexer.token();
}
public void flushLsn(LogSequenceNumber lsn) {
if (null != lsn && lsn.asLong() > 0) {
snapshot.put(LSN_POSITION, lsn.asString());
}
}
public boolean skipMessage(ByteBuffer buffer, LogSequenceNumber startLsn, LogSequenceNumber lastReceiveLsn) {
if (null == lastReceiveLsn || lastReceiveLsn.asLong() == 0 || startLsn.equals(lastReceiveLsn)) {
return true;
}
int position = buffer.position();
try {
MessageTypeEnum type = MessageTypeEnum.getType((char) buffer.get());
switch (type) {
case BEGIN:
case COMMIT:
case RELATION:
case TRUNCATE:
case TYPE:
case ORIGIN:
case NONE:
return true;
default:
// TABLE|INSERT|UPDATE|DELETE
return false;
}
} finally {
buffer.position(position);
}
}
}
4) The main-method entry point
package com.sk.service;
import java.sql.SQLException;
public class Test {
public static void main(String[] args) throws SQLException {
SimpleDataSource simpleDataSource = new SimpleDataSource();
simpleDataSource.run();
}
}
5) The notification-message string parser
Note: the sample demo does not invoke this class directly; it is provided as a reference for parsing the message text.
package com.sk.column;
/**
* @author AE86
* @version 1.0.0
* @date 2022/4/24 18:22
*/
public final class Lexer {
private final char[] array;
private final int length;
private int pos = 0;
private String token;
public Lexer(String input) {
this.array = input.toCharArray();
this.length = this.array.length;
}
public String token() {
return token;
}
public String nextToken(char comma) {
if (pos < length) {
StringBuilder out = new StringBuilder(16);
while (pos < length && array[pos] != comma) {
out.append(array[pos]);
pos++;
}
pos++;
return token = out.toString();
}
return token = null;
}
public String nextTokenToQuote() {
if (pos < length) {
int commaCount = 1;
StringBuilder out = new StringBuilder(16);
while (!((pos == length - 1 || (array[pos + 1] == ' ' && commaCount % 2 == 1)) && array[pos] == '\'')) {
if (array[pos] == '\'') {
commaCount++;
}
out.append(array[pos]);
pos++;
}
pos++;
return token = out.toString();
}
return token = null;
}
public void skip(int skip) {
this.pos += skip;
}
public char current() {
return array[pos];
}
public boolean hasNext() {
return pos < length;
}
}
6) Execution results
Connected to the target VM, address: '127.0.0.1:1124', transport: 'socket'
==========1.server connect success!!!
==========2.server start success!!!
table public.tbl_user: UPDATE: id[integer]:2 name[character varying]:'李四' age[integer]:15
table public.tbl_user: INSERT: id[integer]:4 name[character varying]:'刘能' age[integer]:26
table public.tbl_user: INSERT: id[integer]:5 name[character varying]:'赵四' age[integer]:25
table public.tbl_user: UPDATE: id[integer]:4 name[character varying]:'刘能' age[integer]:23
table public.tbl_user: DELETE: id[integer]:2
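The lines above are test_decoding's raw text format. As a minimal, self-contained sketch of how one such line breaks down into table, operation, and column payload (this is a simplified stand-in for the article's Lexer class, not the real parser; column-level tokenizing, including quoted values and escaped quotes, is what Lexer itself handles):

```java
// TestDecodingParseDemo: splits one test_decoding output line into its three
// top-level parts. The line format is:
//   table <schema>.<table>: <OP>: <col>[<type>]:<value> ...
public class TestDecodingParseDemo {

    static String[] parse(String line) {
        String rest = line.substring("table ".length());
        int colon = rest.indexOf(':');
        String table = rest.substring(0, colon);      // e.g. public.tbl_user
        rest = rest.substring(colon + 2);             // skip ": "
        colon = rest.indexOf(':');
        String op = rest.substring(0, colon);         // INSERT / UPDATE / DELETE
        String columns = rest.substring(colon + 2);   // raw column payload
        return new String[]{table, op, columns};
    }

    public static void main(String[] args) {
        String line = "table public.tbl_user: INSERT: id[integer]:4 name[character varying]:'Tom' age[integer]:26";
        String[] parts = parse(line);
        System.out.println(parts[0]); // prints public.tbl_user
        System.out.println(parts[1]); // prints INSERT
        System.out.println(parts[2]);
    }
}
```

Once split this way, the column payload can be handed to a tokenizer like Lexer to extract each name[type]:value triple.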
6. Summary
In real-world development, weigh polling-based solutions against WAL-based subscription in terms of implementation complexity, maintainability, runtime stability, and reusability.