Mycat Sharding Methods for Splitting Databases and Tables

如风之夏

Tags: mycat, sharding methods, database and table splitting

Article link: https://blog.csdn.net/xiarufeng/article/details/127790509

1. Modulo sharding

<tableRule name="mod-long">
    <rule>
        <columns>id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>

<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <property name="count">3</property>
</function>

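Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
count	The number of data nodes; with three nodes the value is 3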
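
As a minimal sketch of the rule above (Python is used here for illustration only; `mod_shard` is a hypothetical helper, not part of Mycat), the data node index is the sharding column value modulo `count`:

```python
def mod_shard(id_value: int, count: int = 3) -> int:
    """Return the data node index for a numeric sharding key (illustrative only)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return id_value % count


# Consecutive ids rotate through the three nodes.
for sample in (9, 10, 11):
    print(sample, "->", mod_shard(sample))
```

Consecutive ids rotate through nodes 0, 1 and 2, which keeps the distribution even; the flip side is that changing `count` later remaps most existing rows.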

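2. Range sharding

The value of the sharding column is compared with the configured ranges, and each range maps to a data node; the range a value falls into decides which shard the row belongs to.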

<tableRule name="auto-sharding-long">
	<rule>
		<columns>id</columns>
		<algorithm>rang-long</algorithm>
	</rule>
</tableRule>

<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong">
	<property name="mapFile">autopartition-long.txt</property>
    <property name="defaultNode">0</property>
</function>

autopartition-long.txt is configured as follows:

# range start-end ,data node index
# K=1000,M=10000.
0-500M=0
500M-1000M=1
1000M-1500M=2

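Meaning: values from 0 to 5 million are stored on data node 0, values from 5 million to 10 million on data node 1, and values from 10 million to 15 million on data node 2. (With M=10000, 500M is 5,000,000.)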

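Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
mapFile	The external configuration file that holds the ranges
defaultNode	The default node. A value that falls outside every configured range is routed to the default node; if no default node is set, such a value causes an error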
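
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not Mycat's implementation: `parse_bound`, `load_ranges` and `range_shard` are hypothetical names, the K and M suffixes follow the comment in the file (K=1000, M=10000), and a value on a shared boundary such as 500M is assumed to go to the first matching range.

```python
K, M = 1000, 10000  # suffix values taken from the comment in autopartition-long.txt


def parse_bound(text: str) -> int:
    """Convert a bound such as '500M' or '20K' to an integer."""
    text = text.strip()
    if text.endswith("M"):
        return int(text[:-1]) * M
    if text.endswith("K"):
        return int(text[:-1]) * K
    return int(text)


def load_ranges(lines):
    """Parse 'start-end=node' lines, skipping blanks and '#' comments."""
    ranges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        span, node = line.split("=")
        start, end = span.split("-")
        ranges.append((parse_bound(start), parse_bound(end), int(node)))
    return ranges


def range_shard(value: int, ranges, default_node: int = 0) -> int:
    """Return the node of the first range containing value, else the default node."""
    for start, end, node in ranges:
        if start <= value <= end:
            return node
    return default_node


RANGES = load_ranges([
    "# range start-end ,data node index",
    "0-500M=0",
    "500M-1000M=1",
    "1000M-1500M=2",
])
print(range_shard(7_000_000, RANGES))
```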

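3. Enumeration sharding

The possible enumeration values are listed in a configuration file, and each value is assigned to a data node. This rule suits business data that is split by province or by status. The configuration is as follows: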

<tableRule name="sharding-by-intfile">
    <rule>
        <columns>status</columns>
        <algorithm>hash-int</algorithm>
    </rule>
</tableRule>

<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap">
    <property name="mapFile">partition-hash-int.txt</property>
    <property name="type">0</property>
    <property name="defaultNode">0</property>
</function>

partition-hash-int.txt has the following content. The left side of each equals sign is a status value, and the right side is the index of the data node:

1=0
2=1
3=2

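Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
mapFile	The external configuration file that holds the enumeration values
type	Default is 0; 0 means the enumeration values are Integer, 1 means String
defaultNode	The default node. In enumeration sharding, an unrecognized enumeration value is routed to the default node; if no default node is set, an unrecognized value causes an error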
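
A minimal sketch of the enumeration lookup, assuming the mapping from partition-hash-int.txt has already been loaded into a dictionary (`enum_shard` and `STATUS_MAP` are hypothetical names used only for illustration):

```python
STATUS_MAP = {1: 0, 2: 1, 3: 2}  # status value -> data node index, as in the file above


def enum_shard(value, mapping, default_node=None):
    """Return the node for a known value; fall back to default_node or raise."""
    if value in mapping:
        return mapping[value]
    if default_node is None:
        raise ValueError(f"unrecognized enumeration value: {value!r}")
    return default_node


print(enum_shard(2, STATUS_MAP, default_node=0))   # known value
print(enum_shard(9, STATUS_MAP, default_node=0))   # unknown value, routed to the default
```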

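4. Range-then-modulo algorithm

This algorithm first applies range sharding to find the shard group, then takes the modulo within the group.
Advantage: it combines the strengths of range sharding and modulo sharding. Modulo inside a group keeps the data in that group evenly distributed, while range sharding between groups keeps the properties of range sharding.
Disadvantage: once the data ranges are fixed, expansion is inconvenient. For example, growing a dataNode group size from 2 to 4 requires data migration.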

<tableRule name="auto-sharding-rang-mod">
	<rule>
		<columns>id</columns>
		<algorithm>rang-mod</algorithm>
	</rule>
</tableRule>

<function name="rang-mod" class="io.mycat.route.function.PartitionByRangeMod">
	<property name="mapFile">autopartition-range-mod.txt</property>
    <property name="defaultNode">0</property>
</function>

autopartition-range-mod.txt has the following format:

#range  start-end , data node group size
0-500M=1
500M1-2000M=2

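In this file, the range before each equals sign defines one shard group, and the number after it is how many shards that group owns.

Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
mapFile	The external configuration file that holds the ranges and group sizes
defaultNode	The default node; data not covered by the rules above is stored on the defaultNode. Node indexes start at 0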
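
The two-step calculation can be sketched as below. It is an illustration, not Mycat's code: `range_mod_shard` is a hypothetical helper, the groups are written out as tuples instead of being parsed from the file, `500M1` is assumed to mean 5,000,001, and node indexes are assumed to be assigned group by group in file order.

```python
# (range start, range end, group size); 500M1 is assumed to mean 5,000,001
GROUPS = [
    (0, 5_000_000, 1),
    (5_000_001, 20_000_000, 2),
]


def range_mod_shard(value: int, groups, default_node: int = 0) -> int:
    """Find the group by range, then pick a node inside it with value % group size."""
    first_node = 0  # index of the first node owned by the current group
    for start, end, size in groups:
        if start <= value <= end:
            return first_node + value % size
        first_node += size
    return default_node


print(range_mod_shard(6_000_001, GROUPS))
```

With these groups, node 0 holds the first range alone, and nodes 1 and 2 share the second range by parity of the id.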

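5. Fixed-shard hash algorithm

Advantage: the strategy is flexible and supports both even and uneven distribution; the share and capacity of each node are determined by the two parameters partitionCount and partitionLength
Disadvantage: like modulo sharding, it is hard to add nodes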

<tableRule name="brand_partition_rule">
		<rule>
			<columns>id</columns>
			<algorithm>brand_partition</algorithm>
		</rule>
</tableRule>
<function name="brand_partition" class="io.mycat.route.function.PartitionByLong">
		<property name="partitionCount">2,1</property>
    	<property name="partitionLength">256,512</property>
</function>

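The strategy configured in this example splits the data horizontally into 3 parts: the first two parts each take 25%, and the third takes 50%.

Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
partitionCount	List of shard counts
partitionLength	List of shard lengths (range sizes)

Constraints:
1. Shard length: the default maximum is 2^10, that is 1024
2. The partitionCount and partitionLength arrays must have the same number of elements
3. How the two groups correspond in this example: partitionCount[0]*partitionLength[0] = partitionCount[1]*partitionLength[1], that is 2*256 = 1*512 = 512, and the two groups together cover the full length of 1024
4. The configuration above yields three partitions: 0-255, 256-511, 512-1023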
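
To make the 25% / 25% / 50% split concrete, here is a sketch that expands partitionCount and partitionLength into a 1024-slot table. The helper names are hypothetical, and the assumption that an id is reduced to a slot by taking its low 10 bits (id modulo 1024) is an illustration rather than a statement about Mycat's code.

```python
def build_segments(counts, lengths, total=1024):
    """Expand (count, length) pairs into a slot -> node table of size total."""
    if len(counts) != len(lengths):
        raise ValueError("partitionCount and partitionLength must have the same size")
    segment = []
    node = 0
    for count, length in zip(counts, lengths):
        for _ in range(count):
            segment.extend([node] * length)  # this node owns `length` consecutive slots
            node += 1
    if len(segment) != total:
        raise ValueError("the partitions must cover exactly %d slots" % total)
    return segment


def fixed_hash_shard(id_value: int, segment) -> int:
    """Map an id to a node through its slot (assumed: the low 10 bits of the id)."""
    return segment[id_value & (len(segment) - 1)]


SEGMENT = build_segments([2, 1], [256, 512])
for sample in (255, 256, 512, 1024):
    print(sample, "->", fixed_hash_shard(sample, SEGMENT))
```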

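6. Modulo range algorithm

This algorithm first takes the modulo, then shards according to the range the modulo result falls into.
Advantage: you decide how the data is distributed across nodes after the modulo
Disadvantage: the dataNode layout is created in advance, so expansion is troublesome.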

<tableRule name="sharding-by-pattern_rule">
		<rule>
			<columns>id</columns>
			<algorithm>sharding-by-pattern</algorithm>
		</rule>
</tableRule>

<function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern">
		<property name="mapFile">partition-pattern.txt</property>
		<property name="defaultNode">0</property>
		<property name="patternValue">96</property>
 </function>

partition-pattern.txt is configured as follows:

0-32=0
33-64=1
65-96=2

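In the mapFile, each range describes where the result of id%96 lands: a result in 0-32 goes to shard 0, 33-64 to shard 1, and 65-96 to shard 2. (id%96 itself only produces values from 0 to 95.)

Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
mapFile	The external configuration file that holds the ranges
defaultNode	The default node; if id is not a number and so cannot be taken modulo, the row is assigned to defaultNode
patternValue	The modulus

Note: the modulo range algorithm only works on numeric values; a string cannot be sharded this way.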
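
A small sketch of this rule, with the ranges from partition-pattern.txt written out as tuples (`pattern_shard` and `PATTERN_RANGES` are hypothetical names; non-numeric input is sent to the default node as the table describes):

```python
PATTERN_RANGES = [(0, 32, 0), (33, 64, 1), (65, 96, 2)]  # (start, end, node)


def pattern_shard(value, ranges, pattern_value=96, default_node=0):
    """Take value % pattern_value and look the remainder up in the ranges."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default_node  # not a number, so no modulo is possible
    remainder = number % pattern_value
    for start, end, node in ranges:
        if start <= remainder <= end:
            return node
    return default_node


for sample in ("100", "40", "95", "abc"):
    print(sample, "->", pattern_shard(sample, PATTERN_RANGES))
```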

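7. String hash modulo range algorithm

Similar to the modulo range algorithm, but it supports digits, symbols and letters. It takes the first prefixLength characters of the value, sums the ASCII codes of those characters, and then takes the sum modulo patternValue (sum%patternValue); the range that result falls into gives the shard index.
Advantage: you decide how the data is distributed across nodes after the modulo
Disadvantage: the dataNode layout is created in advance, so expansion is troublesome.
The configuration is as follows: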

<tableRule name="sharding-by-prefixpattern">
	<rule>
		<columns>id</columns>
		<algorithm>sharding-by-prefixpattern</algorithm>
	</rule>
</tableRule>

<function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern">
	<property name="mapFile">partition-prefixpattern.txt</property>
    <property name="prefixLength">5</property>
    <property name="patternValue">96</property>
</function>

partition-prefixpattern.txt is configured as follows:

# range start-end ,data node index
# ASCII
# 48-57=0-9
# 64、65-90=@、A-Z
# 97-122=a-z
###### first host configuration
0-32=0
33-64=1
65-96=2

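Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
mapFile	The external configuration file that holds the ranges
prefixLength	The number of characters to take. The ASCII codes of the first prefixLength characters of the column value are summed, the sum is taken modulo patternValue (sum%patternValue), and the range that contains the result gives the shard index
patternValue	The modulus

How a string is computed:

String:
	gf89f9a
Take the first 5 characters and add up their ASCII codes:
	g - 103
	f - 102
	8 - 56
	9 - 57
	f - 102
	Sum: 103 + 102 + 56 + 57 + 102 = 420
	Modulo: 420 % 96 = 36
36 falls in the range 33-64, so the row is routed to shard 1.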
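
The same calculation as a short sketch (`prefix_pattern_shard` is a hypothetical helper; raising an error when no range matches is a choice made for this illustration, since the configuration above defines no default node):

```python
PREFIX_RANGES = [(0, 32, 0), (33, 64, 1), (65, 96, 2)]  # (start, end, node)


def prefix_pattern_shard(value: str, ranges, prefix_length=5, pattern_value=96) -> int:
    """Sum the character codes of the prefix, take the modulo, look up the range."""
    total = sum(ord(ch) for ch in value[:prefix_length])
    remainder = total % pattern_value
    for start, end, node in ranges:
        if start <= remainder <= end:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


print(prefix_pattern_shard("gf89f9a", PREFIX_RANGES))
```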

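8. Application-specified algorithm

At run time the application itself decides which shard to route to: the shard number is computed directly from a substring of the column value (the substring must be numeric). The configuration is as follows: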

<tableRule name="sharding-by-substring">
	<rule>
		<columns>id</columns>
		<algorithm>sharding-by-substring</algorithm>
	</rule>
</tableRule>

<function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString">
	<property name="startIndex">0</property> <!-- zero-based -->
	<property name="size">2</property>
	<property name="partitionCount">3</property>
	<property name="defaultPartition">0</property>
</function>

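Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
startIndex	Start index of the substring
size	Length of the substring
partitionCount	Number of partitions (shards)
defaultPartition	The default shard, used when the shard number given by the substring is not within the defined number of shards

Example:
With id=05-100000002, this configuration takes size=2 characters starting at startIndex=0, which gives 05; 05 is the partition number. If no partition number is supplied, the row goes to defaultPartition.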
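
The sketch below follows the behaviour as described in the table above, not Mycat's source, so treat it as an assumption about the rule rather than a reference (`substring_shard` is a hypothetical helper). It also shows that with partitionCount=3, as configured above, the sample id 05-100000002 falls back to the default partition, while a larger partition count would accept 5 directly.

```python
def substring_shard(value: str, start_index=0, size=2,
                    partition_count=3, default_partition=0) -> int:
    """Read the partition number from a substring of the column value."""
    piece = value[start_index:start_index + size]
    try:
        partition = int(piece)
    except ValueError:
        return default_partition  # missing or non-numeric substring
    if not 0 <= partition < partition_count:
        return default_partition  # number outside the defined partitions
    return partition


print(substring_shard("05-100000002"))                      # 5 is outside 0..2
print(substring_shard("05-100000002", partition_count=8))   # 5 is accepted
```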

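9. String hash parsing algorithm

Takes the substring at the specified position of the string, hashes it, and derives the shard from the hash. The configuration is as follows: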

<tableRule name="sharding-by-stringhash">
	<rule>
		<columns>user_id</columns>
		<algorithm>sharding-by-stringhash</algorithm>
	</rule>
</tableRule>

<function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">
	<property name="partitionLength">512</property>
	<property name="partitionCount">2</property>
	<property name="hashSlice">0:2</property>
</function>

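Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
partitionLength	The hash modulus; length*count=1024 (for performance reasons)
partitionCount	Number of partitions
hashSlice	The slice of the string that is hashed, written as start:end. 0 stands for str.length(), -1 stands for str.length()-1, and a value greater than 0 stands for itself. It can be read as substring(start, end); when start is 0 it simply means index 0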
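
A sketch of the idea with hashSlice 0:2, partitionLength 512 and partitionCount 2. The hash function here is a stand-in chosen for illustration (a 31-based polynomial hash); it is an assumption and may differ from the one Mycat uses, and `simple_hash` and `string_hash_shard` are hypothetical names.

```python
def simple_hash(text: str) -> int:
    """Stand-in string hash: h = h * 31 + code, kept within 64 bits."""
    h = 0
    for ch in text:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return h


def string_hash_shard(value: str, start=0, end=2,
                      partition_length=512, partition_count=2) -> int:
    """Hash value[start:end], reduce it to a slot, and map the slot to a partition."""
    total = partition_length * partition_count  # 1024 slots in this configuration
    slot = simple_hash(value[start:end]) % total
    return slot // partition_length


for sample in ("ab1001", "zz1001"):
    print(sample, "->", string_hash_shard(sample))
```

Only the first two characters take part in the hash, so all values sharing a two-character prefix land on the same shard.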

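10. Consistent hash algorithm

Consistent hashing addresses the problem of scaling out distributed data. The same hash key is always assigned to the same partition, and adding a partition node does not reshuffle the existing data. For example, when a cluster grows from 6 nodes to 7, the data on the original 6 nodes largely stays where it is: only the keys whose position on the hash ring is taken over by the new node have to move. It is typically used in data migration and merging scenarios. The configuration is as follows: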

<tableRule name="sharding-by-murmur">
    <rule>
        <columns>id</columns>
        <algorithm>murmur</algorithm>
    </rule>
</tableRule>

<function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash">
    <property name="seed">0</property>
    <property name="count">3</property>
    <property name="virtualBucketTimes">160</property>
    <!-- <property name="weightMapFile">weightMapFile</property> -->
    <!-- <property name="bucketMapPath">/etc/mycat/bucketMapPath</property> -->
</function>

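Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
seed	Seed used to create the murmur_hash object; default 0
count	Number of database nodes to shard across; required, sharding is impossible without it
virtualBucketTimes	Each physical database node is mapped to this many virtual nodes. The default is 160, so the number of virtual nodes is 160 times the number of physical nodes; virtualBucketTimes*count is the total number of virtual nodes
weightMapFile	Node weights; a node without a weight defaults to 1. Written in properties-file format: the key is the node index, an integer from 0 to count-1, and the value is the node weight. Every weight must be a positive integer, otherwise 1 is used
bucketMapPath	Used in testing to observe how virtual nodes are distributed over physical nodes. If set, the mapping from each virtual node's murmur hash value to its physical node is written line by line to this file. There is no default; if it is not set, nothing is written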
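
The effect of adding a node can be observed with a small hash ring. This is a sketch of the general technique only: MD5 from the standard library stands in for murmur hash, weights are ignored, and `stable_hash`, `build_ring` and `ring_shard` are hypothetical names.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    """Stand-in for murmur hash: the first 8 bytes of an MD5 digest as an integer."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(count: int, virtual_bucket_times: int = 160):
    """Place virtual_bucket_times virtual nodes per physical node on a sorted ring."""
    return sorted(
        (stable_hash(f"node-{node}-vn-{replica}"), node)
        for node in range(count)
        for replica in range(virtual_bucket_times)
    )


def ring_shard(key, ring) -> int:
    """Return the physical node of the first virtual node at or after the key's hash."""
    position = bisect.bisect_left(ring, (stable_hash(str(key)), -1))
    if position == len(ring):
        position = 0  # wrap around to the start of the ring
    return ring[position][1]


ring6 = build_ring(6)
ring7 = build_ring(7)
moved = [k for k in range(1000) if ring_shard(k, ring6) != ring_shard(k, ring7)]
print(len(moved), "of 1000 keys changed node after adding a 7th node")
```

In this sketch every key that changes owner moves to the newly added node (index 6); all other keys keep their original node.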

11. Natural-month sharding algorithm

<tableRule name="sharding-by-month">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-month</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-month" class="io.mycat.route.function.PartitionByMonth">
        <property name="dateFormat">yyyy-MM-dd</property>
		<property name="sBeginDate">2022-11-02</property>
		<property name="sEndDate">2023-01-02</property>
</function>

<table name="operation_log" primaryKey="id" autoIncrement="true" dataNode="node$1-3" rule="sharding-by-month" />

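Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
dateFormat	Date format
sBeginDate	Begin date
sEndDate	End date. If an end date is configured, sharding cycles through the shards between the begin and end dates. The operation_log table above has 3 data nodes, matching the 3 months from 2022-11 to 2023-01, so sEndDate must be set correctly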
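
A sketch of the month calculation with the dates configured above. It assumes that the shard index is the number of calendar months since sBeginDate and that, when sEndDate is set, the index wraps around over the months between the two dates; `month_shard` is a hypothetical helper.

```python
from datetime import datetime


def month_shard(value: str, begin: str = "2022-11-02", end: str = "2023-01-02",
                fmt: str = "%Y-%m-%d") -> int:
    """Return the shard index for a date string, cycling when an end date is given."""
    start = datetime.strptime(begin, fmt)
    target = datetime.strptime(value, fmt)
    index = (target.year - start.year) * 12 + (target.month - start.month)
    if end is not None:
        stop = datetime.strptime(end, fmt)
        # Number of calendar months covered by begin..end, inclusive (3 here).
        shard_count = (stop.year - start.year) * 12 + (stop.month - start.month) + 1
        index %= shard_count
    return index


for sample in ("2022-11-15", "2022-12-01", "2023-01-20", "2023-02-01"):
    print(sample, "->", month_shard(sample))
```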

12. Date sharding algorithm

Shards by date.

<tableRule name="sharding-by-date">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-date</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">
	<property name="dateFormat">yyyy-MM-dd</property>
	<property name="sBeginDate">2022-01-01</property>
	<property name="sEndDate">2022-12-31</property>
    <property name="sPartionDay">10</property>
</function>

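Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
dateFormat	Date format
sBeginDate	Begin date
sEndDate	End date. If configured, once the data reaches the shard for this date, inserts wrap around and start again from the first shard
sPartionDay	Days per partition; default 10. Counting from the begin date, every 10 days form one partition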

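Note: the number of dataNode shards configured for the table must match the number of shards the rule produces. For example, from 2022-01-01 to 2022-12-31 with one shard per 10 days, 37 shards are needed in total.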
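
The count of 37 and the wrap-around can be checked with a short sketch. It assumes the shard index is the number of whole days since sBeginDate divided by sPartionDay, taken modulo the shard count when sEndDate is set; `date_shard_count` and `date_shard` are hypothetical helpers.

```python
from datetime import datetime

FMT = "%Y-%m-%d"


def date_shard_count(begin: str, end: str, partition_days: int = 10) -> int:
    """Number of shards needed to cover begin..end with partition_days per shard."""
    span = (datetime.strptime(end, FMT) - datetime.strptime(begin, FMT)).days
    return span // partition_days + 1


def date_shard(value: str, begin: str = "2022-01-01", end: str = "2022-12-31",
               partition_days: int = 10) -> int:
    """Return the shard index for a date, cycling once the end date is passed."""
    days = (datetime.strptime(value, FMT) - datetime.strptime(begin, FMT)).days
    index = days // partition_days
    if end is not None:
        index %= date_shard_count(begin, end, partition_days)
    return index


print(date_shard_count("2022-01-01", "2022-12-31"))
for sample in ("2022-01-01", "2022-01-11", "2022-12-31", "2023-01-06"):
    print(sample, "->", date_shard(sample))
```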
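
13. Hour-within-month algorithm

Splits data by hour within a single month. The smallest granularity is one hour, so a day can have at most 24 shards and at least 1. The next month starts over from the first shard, so the data must be cleaned up manually at the end of each month.
The configuration is as follows: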

<tableRule name="sharding-by-hour">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-hour</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-hour" class="io.mycat.route.function.LatestMonthPartion">
	<property name="splitOneDay">24</property>
</function>

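Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
splitOneDay	Number of shards one day is split into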
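
A sketch of how a shard index could be derived from the day of the month and the hour. Two assumptions are made for illustration and are not confirmed by the article: the column value is a string of the form yyyymmddHH, and shards are numbered day by day starting from 0. `hour_shard` is a hypothetical helper.

```python
def hour_shard(value: str, split_one_day: int = 24) -> int:
    """Return the shard index for a 'yyyymmddHH' value within its month."""
    if split_one_day < 1 or 24 % split_one_day != 0:
        raise ValueError("split_one_day must divide 24")
    hours_per_shard = 24 // split_one_day
    day_of_month = int(value[6:8])
    hour = int(value[8:10])
    return (day_of_month - 1) * split_one_day + hour // hours_per_shard


for sample in ("2022110100", "2022110123", "2022110200", "2022123123"):
    print(sample, "->", hour_shard(sample))
```

Under these assumptions, splitOneDay=24 needs up to 31 * 24 = 744 shards for a full month, which is why the per-day split is usually chosen with the available data nodes in mind.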

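14. Date-range hash algorithm

The idea is the same as range-then-modulo sharding: first shard by date range to find the shard group, then hash on the time so that data within a short period is spread more evenly.
Advantage: it avoids data migration when scaling out, and to some extent avoids the hot-spot problem of range sharding.
Note: the date format should be as precise as possible; otherwise the local evenness is not achieved.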

<tableRule name="range-date-hash">
    <rule>
        <columns>create_time</columns>
        <algorithm>range-date-hash</algorithm>
    </rule>
</tableRule>

<function name="range-date-hash" class="io.mycat.route.function.PartitionByRangeDateHash">
	<property name="dateFormat">yyyy-MM-dd HH:mm:ss</property>
	<property name="sBeginDate">2022-01-01 00:00:00</property>
	<property name="groupPartionSize">6</property>
    <property name="sPartionDay">10</property>
</function>

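Configuration notes:

Property	Description
columns	The table column used as the sharding key
algorithm	The name of the <function> element that implements the rule
class	The class that implements the sharding algorithm
dateFormat	Date format, following the Java standard
sBeginDate	Begin date, written in the format given by dateFormat
groupPartionSize	Number of shards in each group
sPartionDay	Number of days that form one group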
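
The group-then-hash idea can be sketched as follows. The in-group hash is a deliberate simplification (seconds since sBeginDate modulo the group size) standing in for whatever hash Mycat applies, and `range_date_hash_shard` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def range_date_hash_shard(value: str, begin: str = "2022-01-01 00:00:00",
                          group_partition_size: int = 6, partition_days: int = 10) -> int:
    """Pick the group by date range, then a shard inside the group by a time hash."""
    delta = datetime.strptime(value, FMT) - datetime.strptime(begin, FMT)
    group = delta.days // partition_days  # every partition_days days form one group
    # Stand-in hash: spread rows inside the group by their second-level timestamp.
    inner = int(delta.total_seconds()) % group_partition_size
    return group * group_partition_size + inner


for sample in ("2022-01-05 12:00:00", "2022-01-11 00:00:07"):
    print(sample, "->", range_date_hash_shard(sample))
```

Rows from the first 10 days stay on shards 0 to 5 and rows from the next 10 days on shards 6 to 11, so a new date range only needs new shards rather than moving existing rows. A coarse date format would give many rows the same timestamp and therefore the same shard inside the group, which is the reason for the precision note above.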