公司的项目,订单表已经过3000万,并且日均8万+的增速在新增,已经尝试过单表优化,目前如果不全量查询,还没有明显的卡顿,但是分库分表需要提上日程了。
公司使用的是.net core6+mysql的单体架构,目前来说,十分脆弱,做应用集群和网关负载均衡放到后期实现,先把最要紧的分库分表实现一下,计划在2024年年初,完成数据库这部分的架构调整。
经过一段时间的选型和尝试,决定使用shardingsphere-proxy5.x作为中间件。这里吐槽一下,网上的教程真的是一言难尽,十个教程重复九个,还都只是最简单的通过对字段取余进行分库分表,用的还都是shardingproxy4.x,b站上的视频教程也是高度重复。
以我这次需要操作的订单表为例,需要通过对orderTime字段判断,按年度分库,按月份分表,我计划demo里先从2022分到2024年,即3个库,每个库的订单表有12张,orders_1…orders_12,在新增订单时,中间件会根据orderTime字段的值,根据算法自动插入对应的库和表。
摸索了几天,实现了一个简单的案例。希望能够帮助到有需要的朋友。
1.下载安装shardingsphere-proxy5.x
下载安装https://archive.apache.org/dist/shardingsphere/5.0.0/
解压后,可以看到文件夹里有config-sharding.yaml和server.yaml两个yaml文件,个人建议不要用记事本修改,你可能会不注意缩进。
先修改config-sharding.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######################################################################################################
#
# Here you can configure the rules for the proxy.
# This example is configuration of sharding rule.
#
######################################################################################################
#
#databaseName: sharding_db
#
#dataSources:
# ds_0:
# url: jdbc:postgresql://127.0.0.1:5432/demo_ds_0
# username: postgres
# password: postgres
# connectionTimeoutMilliseconds: 30000
# idleTimeoutMilliseconds: 60000
# maxLifetimeMilliseconds: 1800000
# maxPoolSize: 50
# minPoolSize: 1
# ds_1:
# url: jdbc:postgresql://127.0.0.1:5432/demo_ds_1
# username: postgres
# password: postgres
# connectionTimeoutMilliseconds: 30000
# idleTimeoutMilliseconds: 60000
# maxLifetimeMilliseconds: 1800000
# maxPoolSize: 50
# minPoolSize: 1
#
#rules:
#- !SHARDING
# tables:
# t_order:
# actualDataNodes: ds_${0..1}.t_order_${0..1}
# tableStrategy:
# standard:
# shardingColumn: order_id
# shardingAlgorithmName: t_order_inline
# keyGenerateStrategy:
# column: order_id
# keyGeneratorName: snowflake
# auditStrategy:
# auditorNames:
# - sharding_key_required_auditor
# allowHintDisable: true
# t_order_item:
# actualDataNodes: ds_${0..1}.t_order_item_${0..1}
# tableStrategy:
# standard:
# shardingColumn: order_id
# shardingAlgorithmName: t_order_item_inline
# keyGenerateStrategy:
# column: order_item_id
# keyGeneratorName: snowflake
# bindingTables:
# - t_order,t_order_item
# defaultDatabaseStrategy:
# standard:
# shardingColumn: user_id
# shardingAlgorithmName: database_inline
# defaultTableStrategy:
# none:
# defaultAuditStrategy:
# auditorNames:
# - sharding_key_required_auditor
# allowHintDisable: true
#
# shardingAlgorithms:
# database_inline:
# type: INLINE
# props:
# algorithm-expression: ds_${user_id % 2}
# t_order_inline:
# type: INLINE
# props:
# algorithm-expression: t_order_${order_id % 2}
# t_order_item_inline:
# type: INLINE
# props:
# algorithm-expression: t_order_item_${order_id % 2}
#
# keyGenerators:
# snowflake:
# type: SNOWFLAKE
#
# auditors:
# sharding_key_required_auditor:
# type: DML_SHARDING_CONDITIONS
######################################################################################################
#
# If you want to connect to MySQL, you should manually copy MySQL driver to lib directory.
#
######################################################################################################
databaseName: orders_sharding_proxy # 逻辑库
dataSources:
ds2022:
url: jdbc:mysql://127.0.0.1:3306/dev?serverTimezone=UTC&useSSL=false&allowPublicKeyRetrieval=true
username: root
password: root
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 50
minPoolSize: 1
ds2023:
url: jdbc:mysql://127.0.0.1:3306/dev2?serverTimezone=UTC&useSSL=false&allowPublicKeyRetrieval=true
username: root
password: root
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 50
minPoolSize: 1
ds2024:
url: jdbc:mysql://127.0.0.1:3306/dev3?serverTimezone=UTC&useSSL=false&allowPublicKeyRetrieval=true
username: root
password: root
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 50
minPoolSize: 1
rules:
- !SHARDING
tables:
orders:
actualDataNodes: ds${2022..2024}.orders_${1..12}
tableStrategy:
standard:
shardingColumn: orderTime
shardingAlgorithmName: table_mod
databaseStrategy: # 分库规则
standard: # 标准类型分片,目前官方有四种分片类型
shardingColumn: orderTime
shardingAlgorithmName: database_mod # 算法名称
# databaseStrategy: # 分库规则
# standard:
# shardingColumn: orderTime
# shardingAlgorithmName: order_MOD
# keyGenerateStrategy:
# column: id
# keyGeneratorName: snowflake
# auditStrategy:
# auditorNames:
# - sharding_key_required_auditor
# allowHintDisable: true
# t_order_item:
# actualDataNodes: ds_${0..1}.t_order_item_${0..1}
# tableStrategy:
# standard:
# shardingColumn: order_id
# shardingAlgorithmName: t_order_item_inline
# keyGenerateStrategy:
# column: order_item_id
# keyGeneratorName: snowflake
# bindingTables:
# - t_order,t_order_item
# defaultDatabaseStrategy:
# standard:
# shardingColumn: user_id
# shardingAlgorithmName: database_inline
# defaultTableStrategy:
# none:
# defaultAuditStrategy:
# auditorNames:
# - sharding_key_required_auditor
# allowHintDisable: true
#
shardingAlgorithms:
table_mod:
type: CLASS_BASED
props:
strategy: STANDARD
algorithmClassName: com.ecs2uj.shardingspherediy.EcsShardingTableMonthAlgorithm #引用分库类
database_mod:
type: CLASS_BASED
props:
strategy: STANDARD #设置属性
algorithmClassName: com.ecs2uj.shardingspherediy.EcsShardingDatabaseDateYearAlgorithm #引用分库类
# t_order_inline:
# type: INLINE
# props:
# algorithm-expression: t_order_${order_id % 2}
# t_order_item_inline:
# type: INLINE
# props:
# algorithm-expression: t_order_item_${order_id % 2}
#
# keyGenerators:
# snowflake:
# type: SNOWFLAKE
#
# auditors:
# sharding_key_required_auditor:
# type: DML_SHARDING_CONDITIONS
↑↑↑以我的配置为例,我的虚拟库名叫orders_sharding_proxy,我理解为这个库是对外的一个代理,不是真实的库,是单个或多个真实库整合后的映射。名字随便起
dataSources是配置真实库的信息,例如我的ds2022,这也是代称,这个节点下面的信息是真实的数据库连接信息,我配置了ds2022,ds2023,ds2024三个库。
rules节点下是规则的配置,看上去复杂,确实挺烦的。actualDataNodes意思是这张表在这些库里分成这些表。听着绕,其实仔细想象很简单,例如我写的,就是在ds2022到ds2024三个库里分成orders_1到orders_12共计36张表。
tableStrategy是配置分表算法,即根据什么去分,网上千篇一律的是根据id等字段取余后,分到不同的表中,我这里是自定义了一个方法,即通过判断orderTime的月份,去匹配orders_xx的月份。具体方法贴在后面,先讲讲释义。databaseStrategy是配置分库算法,跟分表一个意思。shardingColumn是判断的字段,shardingAlgorithmName是算法的名称,名称在下面的shardingAlgorithms节点里配置。分库分表的配置方式是一样的,算法是我写的自定义方法,实现了StandardShardingAlgorithm类,具体方法看下面。
自定义分表(按月)
创建springboot项目,引入jar包
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
<version>5.2.0</version>
</dependency>
这里我遇到一个问题,在后面打包运行的时候,提示版本高了,是因为我用的是jdk1.8,但是生成的pom里,parent节点下spring-boot-starter-parent的版本太高,我改成2.2.3.RELEASE后,就ok了
接下来新建一个类,实现StandardShardingAlgorithm,重写doSharding方法。下面是分表算法,返回值应当是一个表名,即orders_月份
package com.ecs2uj.shardingspherediy;
import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Collection;
import java.util.Date;
import java.util.Properties;
public class EcsShardingTableMonthAlgorithm implements StandardShardingAlgorithm<String> {
private static final String Table_Name = "orders_"; // 表名公用前缀
private int ModedID = 5;
@Override
public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<String> shardingValue) {
if(shardingValue != null){
String value = shardingValue.getValue();
int month = -1;
try{
month = getMonthFromString(value);
}catch (Exception e){
e.printStackTrace();
}
// 判断表名是否存在
for(String each:availableTargetNames){
// new UnsupportedOperationException(each+"--"+Table_Name + String.valueOf(month));
if (each.endsWith(String.valueOf(month))) {
return each;
}
//throw new UnsupportedOperationException("content_channel没有匹配到可用表");
}
}else{
throw new UnsupportedOperationException("分片列为空");
}
return null;
}
// 获取字段中的月份数字
private int getMonthFromString(String value) throws ParseException{
int month = -1;
try{
if(null == value || "" == value){
Date date = new Date();
return date.getMonth();
}else{
try{
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
try {
Date date = dateFormat.parse(value);
System.out.println(date);
month = date.getMonth();
} catch (ParseException e) {
e.printStackTrace();
}
}catch(Exception e){
Date date = new Date();
return date.getMonth();
}
}
}catch(Exception e){
Date date = new Date();
return date.getMonth();
}
return month+1;
}
@Override
public Properties getProps() {
return null;
}
@Override
public void init(Properties properties) {
}
@Override
public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<String> rangeShardingValue) {
return null;
}
}
自定义分库(按年)
上面完成了分表算法,接下来是分库算法,新增一个类,同样实现了StandardShardingAlgorithm类,重写doSharing方法。和分表不同的,只是一个是取表名的月份,一个是取库名的年份。
package com.ecs2uj.shardingspherediy;
import com.google.common.collect.Range;
import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.Properties;
public class EcsShardingDatabaseDateYearAlgorithm implements StandardShardingAlgorithm<String> {
private int ModedID = 5;
private static final String DataBase_Name = "ds";
//获取传入值的年份
private int getYearFromString(String value) throws ParseException{
int nYear = -1;
try{
if(null == value || ""==value||value.length()<4){
Date date = new Date();
return date.getYear();
}else{
try{
nYear = Integer.valueOf(value.substring(0,4));
}catch (Exception e){
Date date = new Date();
return date.getYear();
}
}
// 如果年份小于2022,就给2022
if(nYear < 2022){
nYear = 2022;
}
}catch(Exception e){
Date date = new Date();
return date.getYear();
}
return nYear;
}
@Override
public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<String> shardingValue) {
if(shardingValue != null){
String value = shardingValue.getValue();// 需要传递的值
int nYear = -1;
try{
nYear = getYearFromString(value);//获取年份
}catch(Exception e){
e.printStackTrace();
}
System.out.println(availableTargetNames.toString());
for (String each : availableTargetNames) {
if (each.equals(DataBase_Name + String.valueOf(nYear))) {
return each;
}
}
//throw new UnsupportedOperationException("content_channel没有匹配到可用数据库节点");
}else{
throw new UnsupportedOperationException("分片列为空");
}
return null;
}
@Override
public Collection<String> doSharding(Collection<String> databaseNamescollection, RangeShardingValue<String> rangeShardingValue) {
Collection<String> collect = new ArrayList<>();
if (rangeShardingValue != null) {
Range<String> valueRange = rangeShardingValue.getValueRange();//获得范围区间值
String slowerEndpointDate = String.valueOf(valueRange.hasLowerBound()?valueRange.lowerEndpoint():""); //获得返回区间值下限
String supperEndpointDate = String.valueOf(valueRange.hasUpperBound()?valueRange.upperEndpoint():"");//获得范围区间值上限
int nStartYear =-1;
int nEndYear=-1;
try {
nStartYear = getYearFromString(slowerEndpointDate) ;//获得下限年份
nEndYear = getYearFromString(supperEndpointDate) ;//获得上限年份
if(nStartYear==-1&&nEndYear!=-1) {//下限年份为空时,上限不为空时,下限=上限-5
nStartYear=nEndYear-ModedID;
} else if(nStartYear!=-1&&nEndYear==-1) {//下限不为空,上限为空,上限=下限+5
nEndYear=nStartYear+ModedID;
}
} catch (ParseException e) {
e.printStackTrace();
}
//根据上下限范围,循环取值判断对应的数据库名称,返回数据库名称集合
for (String each : databaseNamescollection) {
for(int i=nStartYear;i<=nEndYear;i++) {
if (each.equals(DataBase_Name + String.valueOf(i))) {
if(!collect.contains(each)) {
collect.add(each);
}
}
}
}
return collect;
}else {
throw new UnsupportedOperationException("分片列为空");
}
}
@Override
public Properties getProps() {
return null;
}
@Override
public void init(Properties properties) {
}
}
打包成jar文件
记得去掉pom里build节点下springboot自带的打包规则,会让jar包找不到主类的,可以直接删掉build节点。
package成功后,会在target目录下生成一个jar文件,直接放到shardingproxy的lib目录下,然后在config-sharding.yaml的shardingAlgorithms里,把自定义方法的路径名写到algorithmClassName后面。
server.yaml配置
这个文件配置服务和权限的相关信息,基本不要动,改一下authority节点,配置用户信息就行
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######################################################################################################
#
# If you want to configure governance, authorization and proxy properties, please refer to this file.
#
######################################################################################################
#mode:
# type: Cluster
# repository:
# type: ZooKeeper
# props:
# namespace: governance_ds
# server-lists: localhost:2181
# retryIntervalMilliseconds: 500
# timeToLiveSeconds: 60
# maxRetries: 3
# operationTimeoutMilliseconds: 500
#
authority:
users:
- user: root@%
password: root
- user: sharding
password: sharding
privilege:
type: ALL_PERMITTED
#
#transaction:
# defaultType: XA
# providerType: Atomikos
#
#sqlParser:
# sqlCommentParseEnabled: false
# sqlStatementCache:
# initialCapacity: 2000
# maximumSize: 65535
# parseTreeCache:
# initialCapacity: 128
# maximumSize: 1024
#
#cdc:
# enabled: false
# port: 33071
#
props:
max-connections-size-per-query: 1
kernel-executor-size: 16 # Infinite by default.
proxy-frontend-flush-threshold: 128 # The default value is 128.
proxy-hint-enabled: false
sql-show: true
check-table-metadata-enabled: false
# Proxy backend query fetch size. A larger value may increase the memory usage of ShardingSphere Proxy.
# The default value is -1, which means set the minimum value for different JDBC drivers.
proxy-backend-query-fetch-size: -1
proxy-frontend-executor-size: 0 # Proxy frontend executor size. The default value is 0, which means let Netty decide.
# Available options of proxy backend executor suitable: OLAP(default), OLTP. The OLTP option may reduce time cost of writing packets to client, but it may increase the latency of SQL execution
# and block other clients if client connections are more than `proxy-frontend-executor-size`, especially executing slow SQL.
proxy-backend-executor-suitable: OLAP
proxy-frontend-max-connections: 0 # Less than or equal to 0 means no limitation.
# Available sql federation type: NONE (default), ORIGINAL, ADVANCED
sql-federation-type: NONE
proxy-mysql-default-version: 5.7.22 # In the absence of schema name, the default version will be used.
proxy-default-port: 3307 # Proxy default port.
proxy-netty-backlog: 1024 # Proxy netty backlog.
启动测试
点击shardingproxy里bin文件夹下的start.bat,成功的话会出现
ShardingSphere-Proxy Standalone mode started successfully
的提示,不过失败是常态哈哈哈,会不停的有各种各样的错。成功的话,直接跑sql是会自动进入对应的库和表。
常见错误
常见的错误是我凭借主观臆断和有限的经验推测出的原因和解决方案,不代表规范的,正确的,优雅的解决方式,只作为参考,不作为标准答案。
- Insert statement does not support sharding table routing
我遇到过两种,一种是算法不满足实际分出来的表的名称导致的。例如表名是orders_1~orders_12,通过自带算法或自定义算法生成的指向表名是orders_13,就会报这个错。建议想办法看看算法生成的表名是什么,例如自定义方法里,可以看看return的表名是不是正确的。
还有一种是因为orderTime的格式,当时我用的自带分片规则INTERVAL,.net core获取DateTime类型的时间,格式是yyyy/MM/dd HH:mm:ss,但我维护的datetime-pattern格式是yyyy-MM-dd hh:mm:ss,所以有这个错 - No database route info
跟分表算法出错一样,这个错是分库算法提供的库名没有匹配上真实库名
其他错当时解决了就忘了,下次遇到再写吧
写在最后
目前只完成了最简单的搭建,还有很多更深的细节也没有去深挖。因为我从来也没用过,写这个博客也是记录一下,提醒自己,也希望给有需要的朋友提供一点小小的帮助。接下来我计划在.net core中整合这块的操作,并做好记录,大家加油,我也加油。