Mycat Study Notes (6): Horizontal Sharding Across Databases and Tables with Mycat

技术闲聊DD

Article link: https://blog.csdn.net/wu2374633583/article/details/116130845

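1 Approach to horizontal splitting

When a table holds a very large amount of data, it is time to consider splitting it horizontally. Horizontal splitting distributes the rows of a table across several databases according to a rule applied to one of its columns, so that each database holds only part of the data. Splitting therefore requires a sharding rule. Typical rules include:

Modulo on the user ID, which spreads the data over different databases while keeping all rows of the same user in one database;
By date, placing the data of different months, or even different days, in different databases;
Modulo on some other specific column, or assignment to databases by value range.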

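Advantages:

With a well-abstracted sharding rule, most joins can still be done inside a single database;
No single database becomes a performance bottleneck under large data volumes and high concurrency;
The application needs few changes;
System stability and load capacity improve.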

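Disadvantages:

Sharding rules are hard to abstract;
Transaction consistency across shards is hard to guarantee;
Repeated data expansion is difficult and costly to maintain;
Cross-database joins perform poorly.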

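There are currently two main approaches to managing data sources:

Client mode: each application module configures and manages the one or more data sources it needs, accesses the databases directly, and merges the data inside the module;
Proxy mode: an intermediate proxy layer manages all data sources centrally, so the backend database cluster is transparent to the frontend application.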

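Principles:

First principle: do not shard if you can avoid it.
Second principle: if you must shard, choose a suitable sharding rule and plan it in advance.
Third principle: use data redundancy or table groups (Table Group) to reduce the chance of cross-database joins.
Fourth principle: because it is hard to judge how well a database middleware implements joins, and making them fast is very difficult, business queries should use multi-table joins as little as possible.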

2 Configure server.xml

Specify the primary key generation strategy

<!-- How primary keys are generated: 0 = local file, 1 = database, 2 = timestamp sequence -->

<property name="sequnceHandlerType">0</property>

3 Configure schema.xml

Specify the logical schema, the data nodes (shards), the data hosts, and so on

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">
<schema name="mycatdb" checkSQLschema="false" sqlMaxLimit="100">
    <!-- For sharding, the table must be declared under the <schema> tag; the split here is horizontal, and this entry names the table to be split -->
    <table name="orders" primaryKey="id" autoIncrement="true" dataNode="dn1,dn2,dn3,dn4" rule="mod-long" />
</schema>
<!-- The physical databases that actually exist -->
<dataNode name="dn1" dataHost="localhost1" database="test01" />
<dataNode name="dn2" dataHost="localhost1" database="test02" />
<dataNode name="dn3" dataHost="localhost1" database="test03" />
<dataNode name="dn4" dataHost="localhost1" database="test04" />
<dataHost name="localhost1"
          maxCon="1000"
          minCon="10"
          balance="1"
          writeType="0"
          dbType="mysql"
          dbDriver="native"
          switchType="1"
          slaveThreshold="100">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM3308" url="192.168.119.11:3308" user="root" password="123456">
        <readHost host="hostS3309" url="192.168.119.11:3309" user="root" password="123456" />
        <readHost host="hostS3310" url="192.168.119.11:3310" user="root" password="123456" />
    </writeHost>
    <writeHost host="hostM3310" url="192.168.119.11:3310" user="root" password="123456">
        <readHost host="hostS3311" url="192.168.119.11:3311" user="root" password="123456" />
        <readHost host="hostS3308" url="192.168.119.11:3308" user="root" password="123456" />
    </writeHost>
</dataHost>
</mycat:schema>

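4 Configure rule.xml

Set the number of shard nodes; count must match the four dataNodes that the orders table lists in schema.xml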

<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- how many data nodes -->
    <property name="count">4</property>
</function>
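
The effect of count can be sketched in a few lines of Java. This is only an illustration of modulo routing, not Mycat's PartitionByMod source: ModRouter and route are hypothetical names, and the use of floorMod for negative ids is an assumption of the sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: ModRouter is a hypothetical name, not a Mycat class.
final class ModRouter {
    // Returns the zero-based data node index for a sharding-column value.
    static int route(long id, int count) {
        // floorMod keeps the result in [0, count) even for negative ids (sketch assumption).
        return (int) Math.floorMod(id, (long) count);
    }

    public static void main(String[] args) {
        int count = 4; // same value as <property name="count">4</property>
        Map<Integer, Integer> rowsPerNode = new TreeMap<>();
        for (long id = 1; id <= 9; id++) {
            rowsPerNode.merge(route(id, count), 1, Integer::sum);
        }
        // Node index -> number of rows, for ids 1..9
        System.out.println(rowsPerNode); // prints {0=2, 1=3, 2=2, 3=2}
    }
}
```

Node index 0 corresponds to the first entry of the dataNode list (dn1), index 3 to the last (dn4).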

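5 Verification

Run the inserts below from a Mycat client connection; the rows end up spread across the orders tables of the backing databases.
Then run a query through Mycat; it returns all of the rows.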

INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'20');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'21');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'22');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'23');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'24');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'25');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'26');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'27');
INSERT INTO `orders` (`id`, `name`) VALUES(NEXT VALUE FOR MYCATSEQ_GLOBAL,'28');

6 Sharding algorithms in rule.xml

6.1 Enumeration

<tableRule name="sharding-by-intfile">
    <rule>
        <columns>user_id</columns>
        <algorithm>hash-int</algorithm>
    </rule>
</tableRule>
<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap">
    <property name="mapFile">partition-hash-int.txt</property>
    <property name="type">0</property>
    <property name="defaultNode">0</property>
</function>

Contents of partition-hash-int.txt:

10000=0
10010=1

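Here columns names the table column to shard on and algorithm names the sharding function. In the function configuration, mapFile is the name of the mapping file and type defaults to 0, where 0 means Integer and any non-zero value means String. Node indexes always start at 0, so 0 stands for the first node.

defaultNode is the default node: a value below 0 means no default node is set, while a value of 0 or above sets the default node to that index.
The default node exists for unrecognized values: when enumeration sharding meets an enum value it does not know, it routes the row to the default node. If no default node is configured (defaultNode below 0), an unrecognized value raises an error such as: can’t find datanode for sharding column:column_nameval:ffffffff.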
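
The lookup and the default-node fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PartitionByFileMap implementation: EnumRouter is a hypothetical class, the mapping file is passed in as an array of lines, and the exception type and message are choices of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: EnumRouter is a hypothetical name, not a Mycat class.
final class EnumRouter {
    private final Map<String, Integer> nodeByValue = new HashMap<>();
    private final int defaultNode;

    // mapFileLines holds "value=nodeIndex" entries, as in partition-hash-int.txt.
    EnumRouter(String[] mapFileLines, int defaultNode) {
        this.defaultNode = defaultNode;
        for (String line : mapFileLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] kv = trimmed.split("=", 2);
            nodeByValue.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
    }

    // Known value -> mapped node; unknown value -> default node, or an error if none is set.
    int route(String value) {
        Integer node = nodeByValue.get(value);
        if (node != null) {
            return node;
        }
        if (defaultNode >= 0) {
            return defaultNode;
        }
        throw new IllegalArgumentException("no data node for sharding value: " + value);
    }

    public static void main(String[] args) {
        EnumRouter router = new EnumRouter(new String[] { "10000=0", "10010=1" }, 0);
        System.out.println(router.route("10010")); // mapped value
        System.out.println(router.route("99999")); // falls back to defaultNode
    }
}
```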

6.2 Fixed-slot hash

<tableRule name="rule1">
    <rule>
        <columns>user_id</columns>
        <algorithm>func1</algorithm>
    </rule>
</tableRule>
<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <property name="partitionCount">2,1</property>
    <property name="partitionLength">256,512</property>
</function>

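Here columns names the table column to shard on and algorithm names the sharding function; partitionCount is the list of partition counts and partitionLength is the list of partition lengths. The total partition length is fixed at 2^n = 1024, so at most 1024 partitions are supported.
Constraint: the count and length arrays must have the same number of elements, and 1024 = sum(count[i] * length[i]); in other words, the dot product of the count and length vectors always equals 1024.
Example: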

@Test
public void testPartition() {
    // Partition strategy for this example: split the data horizontally into 3 parts,
    // the first two holding 25% each and the third 50% (so the split is not uniform).
    // |<---------------------1024------------------------>|
    // |<----256--->|<----256--->|<----------512---------->|
    // | partition0 | partition1 |        partition2       |
    // | 2 partitions, so count[0]=2 | 1 partition, so count[1]=1 |
    int[] count = new int[] { 2, 1 };
    int[] length = new int[] { 256, 512 };
    PartitionUtil pu = new PartitionUtil(count, length);
    // The code below shows where the offerId and memberId columns land under this strategy
    int DEFAULT_STR_HEAD_LEN = 8; // cobar configures this value by default
    long offerId = 12345;
    String memberId = "qiushuo";

    // Sharding on offerId: partNo1 equals 0, so offerId 12345 is assigned to partition0
    int partNo1 = pu.partition(offerId);

    // Sharding on memberId: partNo2 equals 2, so memberId "qiushuo" is assigned to partition2
    int partNo2 = pu.partition(memberId, 0, DEFAULT_STR_HEAD_LEN);

    Assert.assertEquals(0, partNo1);
    Assert.assertEquals(2, partNo2);
}

To distribute the data evenly, for example over 4 partitions, choose values such that partitionCount * partitionLength = 1024:

<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <property name="partitionCount">4</property>
    <property name="partitionLength">256</property>
</function>
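
One way to picture the 1024-slot constraint is a lookup table with one entry per slot. The sketch below builds such a table from the count and length arrays and maps a numeric value to a slot with a bit mask. It is an illustration under assumptions, not the PartitionUtil source: SlotPartitioner is a hypothetical class, and the mask-based slot selection is the sketch's reading of how a value is reduced to one of the 1024 slots.

```java
// Illustrative only: SlotPartitioner is a hypothetical name, not a Mycat class.
final class SlotPartitioner {
    private static final int SLOTS = 1024;
    private final int[] slotToPartition = new int[SLOTS];

    SlotPartitioner(int[] count, int[] length) {
        if (count.length != length.length) {
            throw new IllegalArgumentException("count and length must have the same size");
        }
        int total = 0;
        for (int i = 0; i < count.length; i++) {
            total += count[i] * length[i];
        }
        if (total != SLOTS) {
            throw new IllegalArgumentException("sum(count[i] * length[i]) must equal " + SLOTS);
        }
        // Fill the table: count[i] consecutive partitions, each covering length[i] slots.
        int slot = 0;
        int partition = 0;
        for (int i = 0; i < count.length; i++) {
            for (int j = 0; j < count[i]; j++) {
                for (int k = 0; k < length[i]; k++) {
                    slotToPartition[slot++] = partition;
                }
                partition++;
            }
        }
    }

    // Reduce the value to a slot in [0, 1023], then look up its partition.
    int partition(long value) {
        return slotToPartition[(int) (value & (SLOTS - 1))];
    }

    public static void main(String[] args) {
        SlotPartitioner uneven = new SlotPartitioner(new int[] { 2, 1 }, new int[] { 256, 512 });
        System.out.println(uneven.partition(12345L)); // slot 57, inside the first 256 slots
        SlotPartitioner even = new SlotPartitioner(new int[] { 4 }, new int[] { 256 });
        System.out.println(even.partition(1023L)); // last slot, last of the 4 partitions
    }
}
```

With the 2,1 and 256,512 configuration, the value 12345 falls in slot 57 and therefore in partition 0, which agrees with the offerId assertion in the test above.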

6.3 Range mapping

<tableRule name="auto-sharding-long">
    <rule>
      <columns>user_id</columns>
      <algorithm>rang-long</algorithm>
    </rule>
</tableRule>
<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong">
    <property name="mapFile">autopartition-long.txt</property>
</function>

The autopartition-long.txt file:

# range start-end ,data node index
# K=1000,M=10000.
0-500M=0
500M-1000M=1
1000M-1500M=2
or
0-10000000=0
10000001-20000000=1

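Here columns names the table column to shard on and algorithm names the sharding function. In the rang-long function, mapFile is the path of the mapping file. Node indexes always start at 0, so 0 stands for the first node. The configuration is very simple: each expected id range is assigned to a shard in advance.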
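
The range file can be read as a list of inclusive intervals, each tied to a node index. The sketch below parses that format, including the K=1000 and M=10000 suffixes from the file's own comment, and routes an id to the first interval that contains it. It is an illustration, not the AutoPartitionByLong source: RangeRouter is a hypothetical class, and both the first-match rule for shared boundaries and the error for uncovered ids are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: RangeRouter is a hypothetical name, not a Mycat class.
final class RangeRouter {
    // Each entry is {start, end, nodeIndex}, with both bounds inclusive.
    private final List<long[]> ranges = new ArrayList<>();

    RangeRouter(String[] mapFileLines) {
        for (String line : mapFileLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] kv = trimmed.split("=", 2);
            String[] bounds = kv[0].trim().split("-", 2);
            ranges.add(new long[] {
                parse(bounds[0]), parse(bounds[1]), Long.parseLong(kv[1].trim())
            });
        }
    }

    // Supports the suffixes named in the file header: K=1000, M=10000.
    private static long parse(String text) {
        String s = text.trim();
        char last = Character.toUpperCase(s.charAt(s.length() - 1));
        if (last == 'K') {
            return Long.parseLong(s.substring(0, s.length() - 1)) * 1_000L;
        }
        if (last == 'M') {
            return Long.parseLong(s.substring(0, s.length() - 1)) * 10_000L;
        }
        return Long.parseLong(s);
    }

    // First matching range wins (sketch assumption for shared boundaries such as 500M).
    int route(long id) {
        for (long[] r : ranges) {
            if (id >= r[0] && id <= r[1]) {
                return (int) r[2];
            }
        }
        throw new IllegalArgumentException("no range configured for id " + id);
    }

    public static void main(String[] args) {
        RangeRouter router = new RangeRouter(new String[] {
            "# K=1000,M=10000.", "0-500M=0", "500M-1000M=1", "1000M-1500M=2"
        });
        System.out.println(router.route(5_000_001L)); // just above 500M
    }
}
```

Note that in the first variant of the file the boundary values 500M and 1000M appear in two ranges each, while the second variant (10000001-20000000) avoids the overlap; writing non-overlapping bounds removes any dependence on match order.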

6.4 Modulo

<tableRule name="mod-long">
    <rule>
      <columns>user_id</columns>
      <algorithm>mod-long</algorithm>
    </rule>
</tableRule>
<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- how many data nodes -->
    <property name="count">3</property>
</function>

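Here columns names the table column to shard on and algorithm names the sharding function. This configuration is straightforward: the id is taken modulo count (your number of nodes). Compared with method 1, batch inserts under this rule have to switch between data sources, and the ids stored on any one node are not consecutive.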

6.5 Partitioning by date column

<tableRule name="sharding-by-date">
      <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-date</algorithm>
      </rule>
</tableRule> 
<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2014-01-01</property>
    <property name="sPartionDay">10</property>
</function>

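Here columns names the table column to shard on and algorithm names the sharding function. The configuration sets a start date and a number of days per partition: counting from the start date, every 10 days form one partition.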
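
The arithmetic behind this rule is a day count divided by the partition width. The sketch below shows it with java.time; it is an illustration rather than the PartitionByDate source. DateRouter is a hypothetical class, and rejecting dates before the start date is an assumption of the sketch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Illustrative only: DateRouter is a hypothetical name, not a Mycat class.
final class DateRouter {
    private final DateTimeFormatter format;
    private final LocalDate beginDate;
    private final int daysPerPartition;

    DateRouter(String dateFormat, String beginDate, int daysPerPartition) {
        this.format = DateTimeFormatter.ofPattern(dateFormat);
        this.beginDate = LocalDate.parse(beginDate, this.format);
        this.daysPerPartition = daysPerPartition;
    }

    // Partition index = whole days since the start date, divided by the partition width.
    int route(String value) {
        long days = ChronoUnit.DAYS.between(beginDate, LocalDate.parse(value, format));
        if (days < 0) {
            throw new IllegalArgumentException("date is before the start date: " + value);
        }
        return (int) (days / daysPerPartition);
    }

    public static void main(String[] args) {
        // Same values as dateFormat, sBeginDate and sPartionDay in the configuration above.
        DateRouter router = new DateRouter("yyyy-MM-dd", "2014-01-01", 10);
        System.out.println(router.route("2014-01-10")); // still in the first 10-day window
        System.out.println(router.route("2014-01-11")); // first day of the second window
    }
}
```

Because the partition index keeps growing with time, the number of data nodes has to cover the whole date range that the table is expected to hold.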

6.6 Pattern modulo

<tableRule name="sharding-by-pattern">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-pattern</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern">
    <property name="patternValue">256</property>
    <property name="defaultNode">2</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>

partition-pattern.txt

# id partition range start-end ,data node index
###### first host configuration
1-32=0
33-64=1
65-96=2
97-128=3
######## second host configuration
129-160=4
161-192=5
193-224=6
225-256=7
0-0=7

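Here columns names the table column to shard on and algorithm names the sharding function. patternValue is the modulus base and defaultNode is the default node; if no default is configured, it defaults to 0, the first node. mapFile is the path of the mapping file. In that file, 1-32 is a range of id % 256 values: an id whose remainder falls in 1-32 goes to the first partition (index 0), and so on for the other ranges. If the id is not numeric, the row goes to the defaultNode. Code example: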

String idVal = "0";
Assert.assertEquals(true, 7 == autoPartition.calculate(idVal));
idVal = "45a";
Assert.assertEquals(true, 2 == autoPartition.calculate(idVal));
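
The two assertions above can be reproduced with a small stand-alone sketch of the rule: numeric ids are reduced modulo patternValue and looked up in the ranges, and anything non-numeric goes to the default node. This is an illustration, not the PartitionByPattern source. PatternModRouter is a hypothetical class, the ranges are passed as arrays instead of being read from partition-pattern.txt, and the digit-only check is the sketch's definition of "numeric".

```java
// Illustrative only: PatternModRouter is a hypothetical name, not a Mycat class.
final class PatternModRouter {
    private final int patternValue;
    private final int defaultNode;
    private final int[][] ranges; // each row is {start, end, nodeIndex}, bounds inclusive

    PatternModRouter(int patternValue, int defaultNode, int[][] ranges) {
        this.patternValue = patternValue;
        this.defaultNode = defaultNode;
        this.ranges = ranges;
    }

    int route(String value) {
        // Non-numeric ids go straight to the default node.
        if (!value.matches("\\d{1,18}")) {
            return defaultNode;
        }
        long remainder = Long.parseLong(value) % patternValue;
        for (int[] r : ranges) {
            if (remainder >= r[0] && remainder <= r[1]) {
                return r[2];
            }
        }
        return defaultNode; // remainder not covered by any range
    }

    public static void main(String[] args) {
        // The ranges from partition-pattern.txt above.
        int[][] ranges = {
            { 1, 32, 0 }, { 33, 64, 1 }, { 65, 96, 2 }, { 97, 128, 3 },
            { 129, 160, 4 }, { 161, 192, 5 }, { 193, 224, 6 }, { 225, 256, 7 }, { 0, 0, 7 }
        };
        PatternModRouter router = new PatternModRouter(256, 2, ranges);
        System.out.println(router.route("0"));   // remainder 0 matches the 0-0 range
        System.out.println(router.route("45a")); // not numeric, default node
    }
}
```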

6.7 ASCII-sum pattern modulo

<tableRule name="sharding-by-prefixpattern">
      <rule>
            <columns>user_id</columns>
            <algorithm>sharding-by-prefixpattern</algorithm>
      </rule>
</tableRule>
<function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern">
    <property name="patternValue">32</property>
    <property name="prefixLength">5</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>

partition-pattern.txt

# range start-end ,data node index
# ASCII
# 48-57=0-9
# 64, 65-90=@, A-Z
# 97-122=a-z
###### first host configuration
1-4=0
5-8=1
9-12=2
13-16=3
###### second host configuration
17-20=4
21-24=5
25-28=6
29-32=7
0-0=7

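Here columns names the table column to shard on and algorithm names the sharding function. patternValue is the modulus base and prefixLength is the number of leading characters whose ASCII codes are used. mapFile is the path of the mapping file; in that file, a range such as 1-4 covers remainders 1 through 4 and maps them to the first partition (index 0), and so on for the other ranges.

This method resembles method 6 (pattern modulo), except that it sums the ASCII codes of the first prefixLength characters of the column value, computes sum % patternValue, and routes to the shard whose range contains the remainder.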

/**
 * ASCII codes:
 * 48-57 = digits 0-9
 * 64, 65-90 = @, A-Z
 * 97-122 = a-z
 */
String idVal="gf89f9a";
Assert.assertEquals(true, 0 == autoPartition.calculate(idVal));

idVal="8df99a";
Assert.assertEquals(true, 4 == autoPartition.calculate(idVal));

idVal="8dhdf99a";
Assert.assertEquals(true, 3 == autoPartition.calculate(idVal));
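
The three results above can be checked by hand with a short sketch of the rule. It is an illustration, not the PartitionByPrefixPattern source: PrefixPatternRouter is a hypothetical class, the ranges are passed as arrays, and the defaultNode parameter is an assumption because the configuration above does not set one. The sketch takes the modulus as a parameter and is called with 32, the value for which the three assertions hold against the 1-32 ranges in the mapping file.

```java
// Illustrative only: PrefixPatternRouter is a hypothetical name, not a Mycat class.
final class PrefixPatternRouter {
    // ranges: each row is {start, end, nodeIndex}, bounds inclusive.
    static int route(String value, int prefixLength, int patternValue,
                     int[][] ranges, int defaultNode) {
        // Sum the character codes of the first prefixLength characters.
        int n = Math.min(prefixLength, value.length());
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += value.charAt(i);
        }
        int remainder = sum % patternValue;
        for (int[] r : ranges) {
            if (remainder >= r[0] && remainder <= r[1]) {
                return r[2];
            }
        }
        return defaultNode; // hypothetical fallback for uncovered remainders
    }

    public static void main(String[] args) {
        int[][] ranges = {
            { 1, 4, 0 }, { 5, 8, 1 }, { 9, 12, 2 }, { 13, 16, 3 },
            { 17, 20, 4 }, { 21, 24, 5 }, { 25, 28, 6 }, { 29, 32, 7 }, { 0, 0, 7 }
        };
        // "gf89f" sums to 420, and 420 % 32 = 4, which lies in the 1-4 range.
        System.out.println(route("gf89f9a", 5, 32, ranges, 0));
    }
}
```

For the other two values, "8df99" sums to 372 (remainder 20, range 17-20) and "8dhdf" sums to 462 (remainder 14, range 13-16).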

6.8 Application-specified partition

<tableRule name="sharding-by-substring">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-substring</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString">
    <property name="startIndex">0</property> <!-- zero-based -->
    <property name="size">2</property>
    <property name="partitionCount">8</property>
    <property name="defaultPartition">0</property>
</function>

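Here columns names the table column to shard on and algorithm names the sharding function. This method computes the partition number directly from a substring of the value, which must be numeric; the application passes the value and thereby specifies the partition explicitly.

For example, with id=05-100000002 this configuration takes size=2 characters starting at startIndex=0, which gives 05, so the row goes to partition 5. If no usable value is passed, the row goes to defaultPartition.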
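
The substring rule amounts to a guarded parse. The sketch below is an illustration, not the PartitionDirectBySubString source: SubstringRouter is a hypothetical class, and sending short, non-numeric, or out-of-range prefixes to defaultPartition is an assumption of the sketch.

```java
// Illustrative only: SubstringRouter is a hypothetical name, not a Mycat class.
final class SubstringRouter {
    static int route(String value, int startIndex, int size,
                     int partitionCount, int defaultPartition) {
        // Too short to contain the partition prefix: use the default.
        if (value == null || value.length() < startIndex + size) {
            return defaultPartition;
        }
        try {
            int partition = Integer.parseInt(value.substring(startIndex, startIndex + size));
            // Accept only indexes that name an existing partition.
            return (partition >= 0 && partition < partitionCount) ? partition : defaultPartition;
        } catch (NumberFormatException e) {
            return defaultPartition; // prefix is not a number
        }
    }

    public static void main(String[] args) {
        // startIndex=0, size=2, partitionCount=8, defaultPartition=0 as in the configuration above.
        System.out.println(route("05-100000002", 0, 2, 8, 0)); // prints 5
    }
}
```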

6.9 String-slice hash

<tableRule name="sharding-by-stringhash">
      <rule>
            <columns>user_id</columns>
            <algorithm>sharding-by-stringhash</algorithm>
      </rule>
</tableRule>
<function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">
    <property name="partitionLength">512</property>
    <property name="partitionCount">2</property>
    <property name="hashSlice">0:2</property>
</function>

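Here columns names the table column to shard on and algorithm names the sharding function. In the function, the length setting is the modulus base for the string hash, the count setting is the number of partitions, and hashSlice selects the characters to hash, so the hash is computed over a substring of the value.

hashSlice: 0 means str.length(), -1 means str.length()-1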

import io.mycat.route.function.PartitionByString;
import org.junit.Assert;
import org.junit.Test;

/**
 * hashSlice strings and the (start, end) pairs they parse to:
 * "2" -> (0,2)<br/>
 * "1:2" -> (1,2)<br/>
 * "1:" -> (1,0)<br/>
 * "-1:" -> (-1,0)<br/>
 * ":-1" -> (0,-1)<br/>
 * ":" -> (0,0)<br/>
 */
public class PartitionByStringTest {

    @Test
    public void test() {
        PartitionByString rule = new PartitionByString();
        String idVal = null;
        rule.setPartitionLength("512");
        rule.setPartitionCount("2");
        rule.init();
        rule.setHashSlice("0:2");
        // idVal = "0";
        // Assert.assertEquals(true, 0 == rule.calculate(idVal));
        // idVal = "45a";
        // Assert.assertEquals(true, 1 == rule.calculate(idVal));

        // hash only the last 4 characters
        rule = new PartitionByString();
        rule.setPartitionLength("512");
        rule.setPartitionCount("2");
        rule.init();
        rule.setHashSlice("-4:0");
        idVal = "aaaabbb0000";
        Assert.assertEquals(true, 0 == rule.calculate(idVal));
        idVal = "aaaabbb2359";
        Assert.assertEquals(true, 0 == rule.calculate(idVal));
    }
}
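
To see why both values in the test land in partition 0, the rule can be approximated in a stand-alone sketch. It is an illustration under assumptions, not the PartitionByString source: StringSliceHash is a hypothetical class, the hash is assumed to be the same base-31 polynomial that String.hashCode uses, the slot is assumed to be the hash masked to 1024 slots, and partitions are assumed to be of equal length.

```java
// Illustrative only: StringSliceHash is a hypothetical name, not a Mycat class.
final class StringSliceHash {
    private static final int SLOTS = 1024;

    // Base-31 polynomial hash over value[start, end), clamped to the string bounds.
    static long hash(String value, int start, int end) {
        int from = Math.max(start, 0);
        int to = Math.min(end, value.length());
        long h = 0;
        for (int i = from; i < to; i++) {
            h = 31 * h + value.charAt(i);
        }
        return h;
    }

    static int partition(String value, int sliceStart, int sliceEnd, int partitionLength) {
        // A negative start counts from the end of the string.
        int start = sliceStart >= 0 ? sliceStart : value.length() + sliceStart;
        // An end of 0 means str.length(), -1 means str.length() - 1.
        int end = sliceEnd > 0 ? sliceEnd : value.length() + sliceEnd;
        int slot = (int) (hash(value, start, end) & (SLOTS - 1));
        return slot / partitionLength; // equal-length partitions assumed
    }

    public static void main(String[] args) {
        // hashSlice "-4:0" hashes the last four characters, here "0000" and "2359".
        System.out.println(partition("aaaabbb0000", -4, 0, 512));
        System.out.println(partition("aaaabbb2359", -4, 0, 512));
    }
}
```

Under these assumptions "0000" hashes to a multiple of 1024 (slot 0) and "2359" lands in slot 165, so both fall in the first 512 slots. With hashSlice 0:2, the value "45a" hashes "45" to slot 641, which is in the second partition.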

6.10 Consistent hashing

<tableRule name="sharding-by-murmur">
      <rule>
        <columns>user_id</columns>
        <algorithm>murmur</algorithm>
      </rule>
</tableRule>
<function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash">
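      <property name="seed">0</property><!-- defaults to 0 -->
      <property name="count">2</property><!-- number of database nodes to shard across; required, otherwise sharding cannot work -->
      <property name="virtualBucketTimes">160</property><!-- each physical database node is mapped to this many virtual nodes; the default is 160, so there are 160 times as many virtual nodes as physical nodes -->
      <!--
      <property name="weightMapFile">weightMapFile</property>
                     Node weights; a node with no weight specified defaults to 1. Written in properties file format, with the node index (an integer from 0 to count-1) as the key and the node weight as the value. Every weight must be a positive integer; otherwise 1 is used instead -->
      <!--
      <property name="bucketMapPath">/etc/mycat/bucketMapPath</property>
                      Used in testing to observe how virtual nodes are distributed over physical nodes. If this property is set, the mapping from each virtual node's murmur hash value to its physical node is written to this file line by line. There is no default; if it is not set, nothing is written -->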
</function>

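Consistent hashing effectively addresses the expansion problem of distributed data: rules 1-9 above all make data expansion difficult to some degree, while rule 10 resolves that difficulty.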
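
Why expansion is cheaper under consistent hashing can be shown with a small hash ring. The sketch below is an illustration, not the PartitionByMurmurHash source: HashRing is a hypothetical class, the virtual node labels are made up, and CRC32 from the JDK stands in for MurmurHash so that the example needs no extra library.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative only: HashRing is a hypothetical name, not a Mycat class.
final class HashRing {
    // Position on the ring -> physical node index.
    private final TreeMap<Long, Integer> ring = new TreeMap<>();

    HashRing(int count, int virtualBucketTimes) {
        for (int node = 0; node < count; node++) {
            for (int v = 0; v < virtualBucketTimes; v++) {
                // Hypothetical label format for virtual nodes.
                ring.put(hash("node-" + node + "-vnode-" + v), node);
            }
        }
    }

    // CRC32 is a stand-in for MurmurHash (sketch assumption).
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Walk clockwise to the first virtual node at or after the key's position.
    int route(String key) {
        Long point = ring.ceilingKey(hash(key));
        return ring.get(point != null ? point : ring.firstKey());
    }

    public static void main(String[] args) {
        HashRing two = new HashRing(2, 160);
        HashRing three = new HashRing(3, 160);
        int moved = 0;
        for (int i = 0; i < 10_000; i++) {
            String key = "user-" + i;
            if (two.route(key) != three.route(key)) {
                moved++;
            }
        }
        System.out.println("keys that change node when a third node is added: " + moved);
    }
}
```

In this construction every key that changes owner moves to the newly added node, and all other keys stay where they were. With modulo sharding, by contrast, changing count from 2 to 3 changes id % count for most ids.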
