ShardingSphere是一款起源于当当网内部的应用框架,2015年在当当网内部诞生,2016年由主要开发人员张亮带入京东数科,在国内经历了当当网、电信翼支付、京东数科等多家大型互联网企业的考验,在2017年开源。
并逐渐由原本只关注于关系型数据库增强工具的ShardingJDBC升级成为一整套以数据分片为基础的数据生态圈,更名为ShardingSphere;在2020年4月,成为Apache软件基金会顶级项目。
ShardingSphere包含三个重要的产品:ShardingJDBC、ShardingProxy、ShardingSidecar.
ShardingJDBC用来做客户端分库分表的产品;ShardingProxy用来做服务端分库分表的产品;ShardingSidecar目前在规划中,针对service mesh定位的分库分表插件;
ShardingJDBC
ShardingJDBC可以理解为一个特殊的JDBC驱动,所有分库分表逻辑均由业务方自己控制,功能灵活,支持的数据库也非常多,但对业务侵入大,需要业务方自己控制所有的分库分表逻辑。
核心功能是数据分片和读写分离,通过ShardingJDBC,应用可以透明的使用JDBC访问已经分库分表、读写分离的多个数据源,而不用关心数据源的数量及数据如何分布。
核心概念:
逻辑表:水平拆分的数据库的相同逻辑和数据结构表的总称
真实表:在分片的数据库中真实存在的物理表
数据节点:数据分片的最小单元,由数据源名称和数据表组成
绑定表:分片规则一致的主表和子表
广播表:也叫公共表,指所有的分片数据源中都存在的表,表结构和表中数据在每个数据库中都完全一致,e.g. 字典表
分片键:用于分片的数据库字段,是将数据库(表)进行水平拆分的关键字段。如果SQL中没有分片字段,将会执行全路由,性能很差。
分片算法:通过分片算法将数据进行分片,支持通过=、BETWEEN和IN分片。分片算法需要由应用开发者自行实现,可实现的灵活度非常高。
分片策略:真正用于进行分片操作的是分片键+分片算法,也就是分片策略。在ShardingJDBC中一般采用基于Groovy表达式的inline分片策略,通过一个包含分片键的算法表达式来制定分片策略,如:t_user_$->{u_id%8} 标识根据u_id模8,分成8张表,表名称为t_user_0到t_user_7。
简单分库分表实战
添加maven依赖
<dependencies>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>sharding-jdbc-spring-boot-starter</artifactId>
<version>4.1.1</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.1.22</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</dependency>
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-boot-starter</artifactId>
<version>3.0.5</version>
</dependency>
</dependencies>
yml配置
spring:
shardingsphere:
#配置数据源
datasource:
names: m1
m1:
type: com.alibaba.druid.pool.DruidDataSource
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://116.62.207.166:10000/coursedb?serverTimezone=GMT%2B8
username: root
password: root
sharding:
tables:
course:
#配置真实表分布
actual-data-nodes: m1.course_$->{1..2}
#主键生成策略
key-generator:
column: cid
type: SNOWFLAKE
props:
worker:
id: 1
tableStrategy:
inline:
sharding-column: cid
algorithm-expression: course_$->{cid%2+1}
#其他运行属性
props:
sql:
show: true
main:
allow-bean-definition-overriding: true
1、首先定义一个数据源m1,并对m1进行实际的JDBC参数配置
2、spring.shardingsphere.sharding.tables.course开头的一系列属性,定义了一个名为course的逻辑表;actual-data-nodes属性定义course逻辑表的实际数据分布情况,他分布在m1.course_1和m1.course_2两个表;key-generator属性配置了他的主键列以及主键生成策略;
ShardingJDBC默认提供了UUID和SNOWFLAK两种分布式主键生成策略;
table-strategy属性即配置他的分库分表策略;分片键为cid属性,分片算法为course_$->{cid%2+1},表示按照cid模2+1的结果,然后加上前面的course_部分作为前缀就是他的实际表结果。如果这个表达式计算出来的结果和实际数据分布对应不上,会报错。
sql.show属性表示要在日志中打印实际SQL
运行测试类
日志如下:
2021-11-08 14:31:26.508 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.508 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@46320c9a), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@46320c9a, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1000, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476262894407680])])
2021-11-08 14:31:26.508 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_1 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1000, 1, 664476262894407680]
2021-11-08 14:31:26.575 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.575 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@2bb62414), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@2bb62414, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1001, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476263435472897])])
2021-11-08 14:31:26.576 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_2 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1001, 1, 664476263435472897]
2021-11-08 14:31:26.610 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.610 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@2443135), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@2443135, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1002, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476263582273536])])
2021-11-08 14:31:26.610 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_1 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1002, 1, 664476263582273536]
2021-11-08 14:31:26.648 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.648 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@3ca14cf4), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@3ca14cf4, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1003, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476263741657089])])
2021-11-08 14:31:26.649 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_2 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1003, 1, 664476263741657089]
2021-11-08 14:31:26.691 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.692 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@1e98b788), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@1e98b788, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1004, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476263922012160])])
2021-11-08 14:31:26.692 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_1 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1004, 1, 664476263922012160]
2021-11-08 14:31:26.730 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.731 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@10f477e2), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@10f477e2, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1005, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476264085590017])])
2021-11-08 14:31:26.731 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_2 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1005, 1, 664476264085590017]
2021-11-08 14:31:26.768 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.768 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@5729b410), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@5729b410, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1006, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476264244973568])])
2021-11-08 14:31:26.768 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_1 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1006, 1, 664476264244973568]
2021-11-08 14:31:26.805 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.805 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@553d2579), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@553d2579, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1007, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476264395968513])])
2021-11-08 14:31:26.805 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_2 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1007, 1, 664476264395968513]
2021-11-08 14:31:26.844 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.844 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@668cc9a2), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@668cc9a2, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1008, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476264559546368])])
2021-11-08 14:31:26.844 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_1 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1008, 1, 664476264559546368]
2021-11-08 14:31:26.890 INFO 33000 --- [ main] ShardingSphere-SQL : Logic SQL: INSERT INTO course ( cname,
user_id,
cstatus ) VALUES ( ?,
?,
? )
2021-11-08 14:31:26.890 INFO 33000 --- [ main] ShardingSphere-SQL : SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@2be818da, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@3c9ef6e9), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@3c9ef6e9, columnNames=[cname, user_id, cstatus], insertValueContexts=[InsertValueContext(parametersCount=3, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=59, stopIndex=59, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=62, stopIndex=62, parameterMarkerIndex=1), ParameterMarkerExpressionSegment(startIndex=65, stopIndex=65, parameterMarkerIndex=2), DerivedParameterMarkerExpressionSegment(super=ParameterMarkerExpressionSegment(startIndex=0, stopIndex=0, parameterMarkerIndex=3))], parameters=[shardingsphere, 1009, 1])], generatedKeyContext=Optional[GeneratedKeyContext(columnName=cid, generated=true, generatedValues=[664476264756678657])])
2021-11-08 14:31:26.890 INFO 33000 --- [ main] ShardingSphere-SQL : Actual SQL: m1 ::: INSERT INTO course_2 ( cname,
user_id,
cstatus , cid) VALUES (?, ?, ?, ?) ::: [shardingsphere, 1009, 1, 664476264756678657]
2021-11-08 14:31:26.933 INFO 33000 --- [extShutdownHook] com.alibaba.druid.pool.DruidDataSource : {dataSource-1} closing ...
2021-11-08 14:31:26.943 INFO 33000 --- [extShutdownHook] com.alibaba.druid.pool.DruidDataSource : {dataSource-1} closed
整个分库分表的核心就是在于配置的分片算法。inline分片算法:提供一个分片键和一个分片表达式来制定分片算法,这种方式配置简单,功能灵活,是分库分表最佳配置方式,满足大多数的分库分表场景。
分库分表带来的问题
1、ShardingSphere的这种分库分表,是希望在软件层面对硬件资源进行管理,从而便于对数据库的横向扩展。
2、一般情况下,如果单机数据库容量支撑不住了,首先考虑缓存技术降低数据库访问压力,其次考虑读写分离,最后才是考虑分库分表;在实际设计系统时,最好用MySQL数据库只存储关系性较强的热点数据,而对海量数据采取另外的一些分布式存储产品,如:PostGreSQL甚至HBase、Hive、ES等这些大数据组件来存储。
3、分库分表,对SQL语句的支持,其实是步步维艰的,很容易造成SQL语句不支持、业务数据混乱。实际使用中,建议分库分表,能不用就尽量不使用。如果确定要分库分表,设计之初将数据库往简单的增删改查的数据存储层放心进行弱化,首先规划垂直拆分的策略,至于水平拆分,应谨慎谨慎再谨慎,一般就日志表、操作记录表等很少的一些边缘场景才偶尔用用。