ClickHouse batch insert fails with: Too many partitions for single INSERT block

Most recently recommended article published 2024-07-03 03:50:27

qq_34669699

Tags: clickhouse

Article link: https://blog.csdn.net/qq_34669699/article/details/132193922

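When creating a table in ClickHouse we can specify a partition key. If a query then filters on the partition, only a small amount of data is scanned, which helps query performance. If the partition key is chosen poorly, however, inserts fail with an error.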

Problem 1: insert error

Too many partitions for single INSERT block (more than 150). The limit is controlled by 'max_partitions_per_insert_block' setting. Large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc)

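The cause is an unsuitable partition key chosen at table creation: the values of that key in the inserted data are so widely dispersed that almost every row falls into a different partition, so a single INSERT block touches more partitions than the limit allows.
As the error message itself says, partitioning is not meant to speed up SELECT queries (the ORDER BY key is enough to make range queries fast). Partitions are meant for data manipulation (DROP PARTITION, etc.).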
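To make the cause concrete, the sketch below counts how many distinct partitions one batch would touch under two candidate partition keys. It is only an illustration: the row layout, the `month` field and the helper name `group_by_partition` are assumptions, and the limit of 150 is simply the number quoted in the error message above.

```python
from collections import defaultdict

# Limit quoted in the error message; the real value comes from the
# server's max_partitions_per_insert_block setting.
MAX_PARTITIONS_PER_INSERT_BLOCK = 150


def group_by_partition(rows, partition_of):
    """Group rows by the partition value that partition_of(row) returns."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_of(row)].append(row)
    return dict(groups)


# Hypothetical batch: 1000 rows, each with a distinct user_id.
rows = [{"user_id": f"u{i}", "month": "2023-08"} for i in range(1000)]

by_user = group_by_partition(rows, lambda r: r["user_id"])
by_month = group_by_partition(rows, lambda r: r["month"])

# Partitioning by user_id touches 1000 partitions in one block (over the limit);
# partitioning by month touches a single partition.
print(len(by_user) > MAX_PARTITIONS_PER_INSERT_BLOCK)   # True
print(len(by_month) > MAX_PARTITIONS_PER_INSERT_BLOCK)  # False
```

Running a check like this on a sample batch before choosing a partition key gives a rough idea of whether inserts will hit the limit.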

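Solutions:

1. Remove the partition key from the CREATE TABLE statement so that the table is no longer partitioned.
2. Change the partition key to a more suitable field, one whose values are not too widely dispersed.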

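Note that ClickHouse cannot change the partition key of an existing table directly. The only way is to create a new table, move the data over, and drop the old table.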
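The create, move, drop sequence might look like the statements generated below. This is a sketch under assumptions: the table names, columns, the MergeTree engine and the monthly partition expression are all hypothetical, and the final RENAME step is an optional addition not mentioned in the article.

```python
def migration_statements(old_table, new_table, columns_sql, partition_by, order_by):
    """Return ClickHouse statements for a create/copy/drop/rename migration."""
    return [
        # 1. New table with the desired partition key (engine is an assumption).
        f"CREATE TABLE {new_table} ({columns_sql}) "
        f"ENGINE = MergeTree PARTITION BY {partition_by} ORDER BY {order_by}",
        # 2. Move the data.
        f"INSERT INTO {new_table} SELECT * FROM {old_table}",
        # 3. Drop the old table.
        f"DROP TABLE {old_table}",
        # 4. Optional: give the new table the old name.
        f"RENAME TABLE {new_table} TO {old_table}",
    ]


# Hypothetical table and columns, for illustration only.
for stmt in migration_statements(
    old_table="user_events",
    new_table="user_events_new",
    columns_sql="user_id String, event_date Date",
    partition_by="toYYYYMM(event_date)",
    order_by="user_id",
):
    print(stmt + ";")
```

It is worth verifying row counts in the new table before running the DROP step.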

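Problem 2:
When we need to query a large data set, for example several million user_ids, the user_ids are written to many different partitions. Looking up a particular user_id then scans a large amount of data and is slow.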

Solution

Generate a hash_code from the user_id and partition by hash_code. At query time, compute the hash_code from the user_id again and add it to the query as a condition.

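A hash value taken directly from a built-in hash function is not portable: depending on the language it may depend on the object's memory address or on a per-process seed, so the result is not stable. After some thought we decided to compute the hash as a positionally weighted sum of ASCII codes, which ensures that different technology stacks produce the same value.
Principle: treat the string as an array of N characters, e.g. ['c','z','f'], and weight each character's ASCII code by a power of 2 that decreases from left to right: 99*2*2 + 122*2 + 102*1 = 742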
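Written as a formula, the same computation reads as follows. The symbols $h$, $s_i$ and $N$ are notation introduced here for illustration, with $s_i$ the character at zero-based position $i$ of a string of length $N$.

```latex
h(s) = \sum_{i=0}^{N-1} \operatorname{ord}(s_i)\, 2^{\,N-1-i},
\qquad
h(\texttt{czf}) = 99 \cdot 2^{2} + 122 \cdot 2^{1} + 102 \cdot 2^{0} = 396 + 244 + 102 = 742
```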

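1. Java reference implementation:

static int myHash(String a) {
    int rs = 0;
    int length = a.length();
    for (int i = 0; i < length; i++) {
        // Character code of the i-th character, e.g. 'c' -> 99.
        int b = a.charAt(i);
        // Weight 2^(length - 1 - i): the last character has weight 1.
        double c = Math.pow(2, length - 1 - i);
        // The compound assignment narrows the double sum back to int, so
        // for long strings the result saturates at Integer.MAX_VALUE.
        rs += b * c;
    }
    return rs;
}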

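2. Python implementation:

def hm(s):
    # Weighted sum of character codes: ord(s[i]) * 2**(len(s) - 1 - i)
    rs = 0
    x = len(s)
    for i in range(x):
        a = ord(s[i])         # character code, e.g. 'c' -> 99
        b = 2 ** (x - i - 1)  # weight; the last character has weight 1
        rs += a * b
    return rs

hm('czf')  # 742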

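Notes on creating the table in ClickHouse:

Running the query:

The query log shows that only a small amount of data is scanned, which helps query performance.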
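The table definition and the query are not reproduced in the text, so the following is only a sketch of what the client side could look like. It assumes a table partitioned by a `hash_code` column, and it reduces the weighted sum to a bounded number of buckets with a modulo step. The bucket count, the modulo step, the table name `user_events` and the helper names are assumptions, not taken from the article.

```python
NUM_BUCKETS = 64  # assumed bucket count; not specified in the article


def hm(s):
    """Weighted ASCII sum from the article: sum of ord(ch) * 2**(n - 1 - i)."""
    n = len(s)
    return sum(ord(ch) * 2 ** (n - 1 - i) for i, ch in enumerate(s))


def hash_code(user_id):
    """Map a user_id to a bounded bucket (the modulo step is an assumption)."""
    return hm(user_id) % NUM_BUCKETS


def lookup_query(table, user_id):
    """Build a SELECT that filters on hash_code first, then on user_id."""
    # A real client should pass user_id as a bound parameter instead of
    # formatting it into the SQL text.
    return (
        f"SELECT * FROM {table} "
        f"WHERE hash_code = {hash_code(user_id)} AND user_id = '{user_id}'"
    )


print(hash_code("czf"))  # 742 % 64 = 38
print(lookup_query("user_events", "czf"))
```

Bounding the number of buckets keeps the partition count small, which avoids running back into the "Too many partitions" error from Problem 1. The same `hash_code` function has to be used when writing rows and when querying them.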
