原因分析
通过ODBC协议访问Clickhouse时,一些服务端可能会因为数据长度问题报错,无法拿到数据集,如下图:
这句报错的意思是,在获取数据集的时候,允许列a的最大数值是4,但实际长度是12。
为了方便探究原因,这里使用了Clikhouse的Mysql引擎(9004端口),并把同样的一句SQL分别发送给Mysql和Clickhouse,然后对比各自返回的报文,看看有什么不同
下一步,对比Clickhouse的MysqlEngine和Mysql5.7返回的结果集有什么差异,如下图:
可以看出,除去clickhouse后面的附加信息外,其余报文大都是相同的,第一个不同的地方是字段类型段,mysql对数据a的描述为-3(utf8),clickhouse对数据a的描述为-2(utf8mb4)。
第二个不同的地方则是用4个字节表示的字段最大长度,mysql算出数据a的长度为4,又因为只有这一行数据,所以最大数据也是4,clickhouse因为俄罗斯人考虑到效率问题就直接省略成0了,这就是造成问题的原因。
还有第三个不同的地方就是mysql认为字段a的类型为固定长度的varchar型数组,clickhouse认为字段a的类型为char类型,这点其实也很重要,一些客户端会根据类型校对长度,可能是sqlserver较为严格的原因,字段最大长度即便改成比真实长度大也会报错
解决办法
找到原因就好办了,通过代理中间件,把报文中的字段信息修正成实际的最大信息,编码统一设置为utf8,格式统一设成varchar,即可。
这里使用了Vertx作为中间件,拦截clickhouse端返回的结果集,并在接收完整后,挑选出每个字段中最大数据的数据长度,并修改报文,重新发送给客户端。
这里提供项目源码:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.gzpgbus</groupId>
<artifactId>ClickhouseMysqlEngineODBCProxy</artifactId>
<version>1.0.0</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>io.vertx</groupId>
<artifactId>vertx-core</artifactId>
<version>4.2.6</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<transformers>
<transformer implementation=
"org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>clickhouse.mysqlengine.odbc.proxy.ClickhouseMysqlEngineODBCProxyServer</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
ClickhouseMysqlEngineODBCProxyServer
package clickhouse.mysqlengine.odbc.proxy;
import io.netty.util.internal.StringUtil;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;
import io.vertx.core.buffer.impl.BufferImpl;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;
import io.vertx.core.net.NetClient;
import io.vertx.core.net.NetServer;
import io.vertx.core.net.NetSocket;
/**
* @author shaoyunchuan
* @date 2022/11/24
*/
public class ClickhouseMysqlEngineODBCProxyServer {
private static final Logger logger = LoggerFactory.getLogger(ClickhouseMysqlEngineODBCProxyServer.class);
public static void main(String[] args) {
Vertx.vertx().deployVerticle(new ClickhouseMysqlEngineODBCProxyServerVerticle());
}
public static class ClickhouseMysqlEngineODBCProxyServerVerticle extends AbstractVerticle {
private final String clickhouseHost;
private final int clickhousePort;
private final int serverPort;
ClickhouseMysqlEngineODBCProxyServerVerticle() {
String clickhouseHost = System.getProperty("clickhouse.host");
if (StringUtil.isNullOrEmpty(clickhouseHost)) {
clickhouseHost = "127.0.0.1";
}
this.clickhouseHost = clickhouseHost;
String clickhousePort = System.getProperty("clickhouse.port");
if (StringUtil.isNullOrEmpty(clickhousePort)) {
clickhousePort = "9004";
}
this.clickhousePort = Integer.parseInt(clickhousePort);
String serverPort = System.getProperty("server.port");
if (StringUtil.isNullOrEmpty(serverPort)) {
serverPort = "19004";
}
this.serverPort = Integer.parseInt(serverPort);
}
@Override
public void start() {
NetServer netServer = vertx.createNetServer();//创建代理服务器
NetClient netClient = vertx.createNetClient();//创建连接clickhouse客户端
netServer.connectHandler(socket -> netClient.connect(clickhousePort, clickhouseHost, result -> {
//响应来自客户端的连接请求,成功之后,在建立一个与目标clickhouse服务器的连接
if (result.succeeded()) {
//与目标clickhouse服务器成功连接连接之后,创造一个ClickhouseMysqlEngineODBCProxyConnection对象,并执行代理方法
new ClickhouseMysqlEngineODBCProxyConnection(socket, result.result()).proxy();
} else {
logger.error(result.cause().getMessage(), result.cause());
socket.close();
}
})).listen(serverPort, listenResult -> {//代理服务器的监听端口
if (listenResult.succeeded()) {
//成功启动代理服务器
logger.info("Clickhouse proxy server start up. clickhouse.host:" + this.clickhouseHost + " clickhouse.port:" + this.clickhousePort+ " server.port:" + this.serverPort);
} else {
//启动代理服务器失败
logger.error("Clickhouse proxy exit. because: " + listenResult.cause().getMessage(), listenResult.cause() + " host:" + this.clickhouseHost + " port:" + this.clickhousePort);
System.exit(1);
}
});
}
}
public static class ClickhouseMysqlEngineODBCProxyConnection {
private final NetSocket clientSocket;
private final NetSocket serverSocket;
private final ClickHouseMysqlEngineODBCHandler engineHandler;
public ClickhouseMysqlEngineODBCProxyConnection(NetSocket clientSocket, NetSocket serverSocket) {
this.clientSocket = clientSocket;
this.serverSocket = serverSocket;
this.engineHandler = new ClickHouseMysqlEngineODBCHandler();
}
private void proxy() {
//当代理与clickhouse服务器连接关闭时,关闭client与代理的连接
serverSocket.closeHandler(v -> clientSocket.close());
//反之亦然
clientSocket.closeHandler(v -> serverSocket.close());
//不管那端的连接出现异常时,关闭两端的连接
serverSocket.exceptionHandler(e -> {
logger.error(e.getMessage(), e);
close();
});
clientSocket.exceptionHandler(e -> {
logger.error(e.getMessage(), e);
close();
});
//当收到来自客户端的数据包时,转发给clickhouse目标服务器
clientSocket.handler(buffer ->
serverSocket.write(buffer)
);
//当收到来自clickhouse目标服务器的数据包时,转发给客户端
serverSocket.handler(buffer -> {
try {
byte[] bytes = buffer.getBytes();
// 排除挥手报文与登录报文
if ((this.engineHandler.isEmpty() && bytes[0] == 1)
|| !this.engineHandler.isEmpty()) {
this.engineHandler.append(bytes);
if (this.engineHandler.hasCompleted()) {
// 确保数据集完整后,开始补充字段最大长度与转换字段类型
engineHandler.handler();
// 推流
clientSocket.write(BufferImpl.buffer(engineHandler.read()));
// 清空handler
this.engineHandler.clear();
}
return;
}
clientSocket.write(buffer);
} catch (Exception e) {
close();
logger.error(e.getMessage());
}
}
);
}
private void close() {
engineHandler.clear();
try {
clientSocket.close();
} catch (Exception ignored) {
}
try {
serverSocket.close();
} catch (Exception ignored) {
}
}
}
}
ClickHouseMysqlEngineODBCHandler
package clickhouse.mysqlengine.odbc.proxy;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;
import java.nio.ByteBuffer;
/**
* @author shaoyunchuan
* @date 2022/11/24
*/
public class ClickHouseMysqlEngineODBCHandler {
private static final Logger logger = LoggerFactory.getLogger(ClickHouseMysqlEngineODBCHandler.class);
private byte[] data;// 数据集
private int remain;
private int type;// 0:普通 1:错误报文
private boolean hasCompleted;
private int rowSetIdx;
private int eofScannerIdx;
public ClickHouseMysqlEngineODBCHandler() {
this.data = new byte[0];
this.hasCompleted = false;
}
public void append(byte[] data) {
if (hasCompleted()) {
logger.error("not full data!");
throw new RuntimeException("not full data!");
}
byte[] oldData = this.data;
byte[] newData = new byte[oldData.length + data.length];
System.arraycopy(oldData,0,newData,0,oldData.length);
System.arraycopy(data,0,newData,oldData.length,data.length);
this.data = newData;
if (hasError()) {
if (this.remain <= 0) {
this.hasCompleted = true;
}
}
if (hasEOF()) {
if (this.remain <= 0) {
this.hasCompleted = true;
}
} else {
if (this.remain > 0) {
this.remain -= data.length;
}
}
}
public boolean isEmpty() {
return this.data.length == 0;
}
public byte[] read() {
return this.data;
}
public void clear() {
this.hasCompleted = false;
this.type = 0;
this.data = new byte[0];
this.rowSetIdx = 0;
this.eofScannerIdx = 0;
}
public boolean hasCompleted() {
return this.hasCompleted;
}
/**
* 判断是否携带eof标记,并记录附带信息长度
* @return
*/
private boolean hasEOF() {
int idx = skipMetaData();
byte[] tmp = this.data;
if (idx < 0) {
return false;
}
idx = Math.max(this.eofScannerIdx, idx);
while (idx + 4 < tmp.length) {
if (tmp[idx + 4] == -2 && tmp[idx] < 16777215) {
this.remain = (tmp[idx] & 0xff) + ((tmp[idx + 1] & 0xff) << 8) + ((tmp[idx + 2] & 0xff) << 16) - (tmp.length - idx - 4);
return true;
}
if((idx = skipPackets(idx, tmp)) < 0) {
return false;
}
}
this.eofScannerIdx = idx;
return false;
}
/**
* 核心方法,为完整的数据集补充字段最大长度信息,并设置字段类型为utf8
*/
public void handler() {
if (!this.hasCompleted) {
throw new RuntimeException("not full data!");
}
if (this.type > 0) {
return;
}
byte[] tmp = this.data;
int columnNum = readColumnNum();
int idx = 0;
idx = skipPackets(idx,tmp);
int idxForSkip = idx;
long[] maxLengths = readColumnMaxLength();
for (int i = 0; i < columnNum; i++) {
idx+=4;
idx = skipPara(idx, tmp);
idx = skipPara(idx, tmp);
idx = skipPara(idx, tmp);
idx = skipPara(idx, tmp);
idx = skipPara(idx, tmp);
idx = skipPara(idx, tmp);
idx++;// 12
tmp[idx] = 45;// 字符集
idx++;
idx++;
byte[] bites = toBites(maxLengths[i]);
tmp[idx] = bites[3];
tmp[idx + 1] = bites[2];
tmp[idx + 2] = bites[1];
tmp[idx + 3] = bites[0];
idx += 4;
tmp[idx] = -3;
idx += 2;
tmp[idx] = 31;
idx = skipPackets(idxForSkip,tmp);// 不使用游标
idxForSkip = idx;
}
}
/**
* 获取数据集字段数,若-1则为错误报文
* @return
*/
private int readColumnNum() {
return this.data[4];
}
/**
* 返回每个字段的最大值
* @return
*/
private long[] readColumnMaxLength() {
byte[] tmp = this.data;
int maxLength = this.data.length;
int columnNum = readColumnNum();
long[] columnMaxLength = new long[columnNum];
if (!this.hasCompleted) {
logger.error("数据不完整!");
return columnMaxLength;
}
int idx = skipMetaData();
if (idx < 0) {
logger.error("获取最大长度失败!");
return columnMaxLength;
}
// 按行跳转
int idxForSkip = idx;
while (idx < maxLength) {
int remaining = tmp[idx]; // 废弃
idx+=4;
int rw = tmp[idx];
// EOF报文
if (rw == -2 && remaining < 16777215) {
break;
}
// 错误码
if (rw == -1) {
break;
}
for (int i = 0; i < columnNum; i++) {
int len = 0;
rw = tmp[idx];
switch (rw) {
case -5: // null
break;
case -4:
len = (tmp[idx + 1] & 0xff) | ((tmp[idx + 2] & 0xff) << 8);
break;
case -3:
len = (tmp[idx + 1] & 0xff) | ((tmp[idx + 2] & 0xff) << 8) | ((tmp[idx + 3] & 0xff) << 16);
break;
case -2:
len = (int) ((tmp[idx + 1] & 0xff) | ((long) (tmp[idx + 2] & 0xff) << 8)
| ((long) (tmp[idx + 3] & 0xff) << 16) | ((long) (tmp[idx + 4] & 0xff) << 24)
| ((long) (tmp[idx + 5] & 0xff) << 32) | ((long) (tmp[idx + 6] & 0xff) << 40)
| ((long) (tmp[idx + 7] & 0xff) << 48) | ((long) (tmp[idx + 8] & 0xff) << 56));
break;
default:
len = rw;
}
// 使用for + if循环替换原版方法 columnMaxLength[i] = Math.max(columnMaxLength[i] ,new String(tmp,idx + 1,len).length())
long num = num(tmp, idx + 1, len);
if (columnMaxLength[i] < num) {
columnMaxLength[i] = num;
}
idx = skipPara(idx,tmp);
}
idx = skipPackets(idxForSkip,tmp);
idxForSkip = idx;
}
for (int i = 0; i < columnMaxLength.length; i++) {
columnMaxLength[i] = columnMaxLength[i] * 4L;
}
return columnMaxLength;
}
/**
* 字节数组转字符个数
* @param chars
* @param off
* @param len
* @return
*/
private long num(byte[] chars,int off,int len) {
long num = 0L;
int idx = off;
int rowLen = idx + len;
while (idx < rowLen) {
if (chars[idx] < 0) {
idx += 3;
} else {
++idx;
}
++num;
}
return num;
}
/**
* 返回跳过字段信息后的指针 -1意味着报文并不完整
* @return
*/
private int skipMetaData() {
if (this.rowSetIdx > 0) {
return this.rowSetIdx;
}
int idx = 0;
int maxLength = this.data.length;
int columnNum = readColumnNum();
// 跳过字段数报文
idx = skipPackets(idx, this.data);
if (idx >= maxLength) {
return -1;
}
for (int i = 0; i < columnNum; i++) {
idx = skipPackets(idx, this.data);
if (idx >= maxLength) {
return -1;
}
}
this.rowSetIdx = idx;
return idx;
}
/**
* 跳过Packets
* @param begin
* @param bytes
* @return
*/
private int skipPackets(int begin, byte[] bytes) {
return begin + 4 + (bytes[begin] & 0xff) + ((bytes[begin + 1] & 0xff) << 8) + ((bytes[begin + 2] & 0xff) << 16);
}
/**
* 跳过字符串
* @param begin
* @param bytes
* @return
*/
private int skipPara(int begin, byte[] bytes) {
return begin + 1 + bytes[begin];
}
/**
* 将long型转换为4字节数组
* @param value
* @return
*/
private byte[] toBites(long value) {
byte[] bites = new byte[4];
System.arraycopy(ByteBuffer
.allocate(Long.SIZE / Byte.SIZE)
.putLong(value)
.array(),4,bites,0,4);
return bites;
}
/**
* 返回报文是否为错误报文,若为错误报文,则记录剩余字段长度即可,无需后续处理
* @return
*/
private boolean hasError() {
byte[] tmp = this.data;
if (readColumnNum() < 0) {
if (this.remain < 1) {
this.type = 1;
this.remain = (tmp[0] & 0xff) + ((tmp[1] & 0xff) << 8) + ((tmp[2] & 0xff) << 16) - (tmp.length - 4);
}
return true;
}
return false;
}
}
提供好clickhouse mysqlengine的地址和端口(默认为9004)并打成Jar包运行即可,也可以放在docker上运行,这里提供Dockerfile的脚本
# 基础镜像使用java
FROM openjdk:8
# 作者
MAINTAINER shaoyunchuan
# VOLUME 指定临时文件目录 /tmp 在主机 /var/lib/docker 目录下创建一个临时文件并连接到容器的 /tmp
VOLUME /tmp
# 将jar包添加到容器中
ADD ClickhouseMysqlEngineODBCProxy-1.0.0-shaded.jar ClickhouseMysqlEngineODBCProxy-1.0.0-shaded.jar
# 环境变量
ENV CLICKHOUSE_HOST '192.168.200.76'
ENV CLICKHOUSE_PORT '9004'
ENV SERVER_PORT '19004'
ENV Xms '64m'
ENV Xmx '1024m'
# 运行jar包
RUN bash -c 'touch /ClickhouseMysqlEngineODBCProxy-1.0.0-shaded.jar'
ENTRYPOINT ["sh","-c","java -XX:+UseG1GC -Xms$Xms -Xmx$Xmx -Dserver.port=$SERVER_PORT -Dclickhouse.host=$CLICKHOUSE_HOST -Dclickhouse.port=$CLICKHOUSE_PORT -jar /ClickhouseMysqlEngineODBCProxy-1.0.0-shaded.jar"]
# 暴露端口
EXPOSE 19004
这个问题不管使用mysql协议还是clickhouse原生的http协议下都会存在,为了方便开发和对比,这里采用了clickhouse mysql engine的mysql协议,也就是说用mysql odbc连接这个代理中间件即可:
这里要和9004区别开来,19004为VertX代理中间件的端口,客户端(sqlserver链接服务器等)通过这个ODBC拿数即可
最后小结
这个代理中间件在查询结果集很大的时候毫无疑问有oom的风险,这里也能看出俄罗斯人为何会省略最大长度等元信息,毕竟在大数据的背景下不可能拿着整个结果集一点一点的比较出到底哪行数据长度最大,最大长度是多少,旦作为一些客户端而言,没有最大长度可能意味着报文信息有误,不符合协议规范,不方便管理内存等问题。