目标
解决业务数据库中只有少数几个表数据量特大的问题,大部分普通表在百万级别,有几张特别大的接近1亿条数据了。针对这种情况,虽然可以通过增加分库解决,只是库的数量增长会比较快。如果只针对大表做个分表方案,可以有效降低分库的增长速度,也解决了单表数据量过大性能问题。
分表方案
如何实现分表呢?方案有哪些呢?其实不管用什么方案,有一点是肯定要做的,就是对业务层要屏蔽sql的差异,在一个合适的位置对SQL拦截重写,替换成分表的表名。
按目前大部分项目的特点,有如下几种方案可以考虑:
方案1:在ORM框架层(mybaits)进行SQL拦截修改
方案2:在连接池层(druid)进行SQL拦截修改
方案3: 代理中间件(mycat、shardingsphere)中进行SQL的拦截修改,这和方案1,2的原理是一样的,只不过换了个位置
方案的实现
今天老吕谈下在方案2连接池层面分表方案的实现
SQL改写的步骤一般是这样的:
1、将SQL解析为 AST
2、修改AST
3、生成新的SQL(如果2中没有修改则不用生成新SQL)
4、替换原始SQL继续向下走
这里面涉及几个问题:
1、哪些表需要分表?
数据量相差悬殊的那几张表。如果你这个库里的表之间的数据量相差不大,则干脆直接分库来的简单高效,不需要分表。一般分表名单和数量确定后不会随便修改,写到配置文件或者常量即可。
2、分表的后缀如何获取?
一般会选取 用户ID,企业ID等字段,这些字段的值往往登录完成后就确定下来了,也就是说登录完成后 分库分表的后缀就确定下来了,之后用的时候就非常方便了,不需要从SQL里解析获取,直接从登录上下文中就可以得到 分表后缀 值。
如何实现将SQL转换为AST并对AST进行修改?
很庆幸阿里Druid连接池提供的方便的SQL解析器。
1、解析成AST
List<SQLStatement> sqlStatements =
SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
2、编写 visitor类,访问并修改AST
class RewriteTableNameVisitor extends MySqlASTVisitorAdapter {
//分表后缀
private String suffix;
//模拟下需要进行分表操作的表名单
private static Map<String,Boolean> sharding = new HashMap<>();
static {
sharding.put("product",true);
}
/**
* sql是否被重写
*/
private boolean rewrite = false;
/**
* 是否insert语句?
*/
private boolean insert = false;
/**
* 是否delete语句?
*/
private boolean delete = false;
/**
* 是否update语句?
*/
private boolean update = false;
public RewriteTableNameVisitor(String suffix){
this.suffix = suffix;
}
@Override
public void endVisit(MySqlInsertStatement x) {
this.insert = true;
}
@Override
public void endVisit(MySqlDeleteStatement x) {
this.delete = true;
}
@Override
public void endVisit(MySqlUpdateStatement x) {
this.update = true;
}
@Override
public boolean visit(SQLExprTableSource tableSource){
String tableName = tableSource.getTableName();
//确认否需要分表
if (sharding.containsKey(tableName.toLowerCase())) {
//追加别名
if (tableSource.getAlias()==null&&insert) {
tableSource.setAlias(tableName);
}
//追加分表后缀
tableSource.setSimpleName(tableName+this.suffix);
//已经被修改,需要替换SQL
this.rewrite = true;
}
return true;
}
public boolean isRewrite() {
return rewrite;
}
}
3、获取新SQL
newSQL = SQLUtils.toSQLString(sqlStatements, JdbcConstants.MYSQL);
4、替换
public class MyDruidSqlRewriteFilter extends FilterEventAdapter {
private Logger logger = LoggerFactory.getLogger(this.getClass());
@Override
public PreparedStatementProxy connection_prepareStatement(FilterChain chain, ConnectionProxy connection, String sql) throws SQLException {
//解决PrepareStatement执行sql问题
String shardingTableColumnValue = TraceUtil.getShardingTableColumnValue();
String newSQL = getNewSQL(sql,shardingTableColumnValue);
return super.connection_prepareStatement(chain, connection, newSQL);
}
@Override
public boolean statement_execute(FilterChain chain, StatementProxy statement, String sql) throws SQLException{
//解决statement执行sql问题
String shardingTableColumnValue = TraceUtil.getShardingTableColumnValue();
String newSQL = getNewSQL(sql,shardingTableColumnValue);
return super.statement_execute(chain, statement, newSQL);
}
private String getNewSQL(String sql,String suffix) {
boolean rewrite = false;
String newSQL = sql;
List<SQLStatement> sqlStatements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
for (SQLStatement st : sqlStatements) {
RewriteTableNameVisitor visitor = new RewriteTableNameVisitor(suffix);
st.accept(visitor);
if (visitor.isRewrite()) {
rewrite = true;
}
}
if (rewrite) {
newSQL = SQLUtils.toSQLString(sqlStatements, JdbcConstants.MYSQL);
}
System.out.println(newSQL);
return newSQL;
}
}
5、是不是感觉很简单就实现了呢?其实这得益于 Druid 提供的强大的accept方法,这是典型的 访问者设计模式
List<SQLStatement> sqlStatements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
for (SQLStatement st : sqlStatements) {
RewriteTableNameVisitor visitor = new RewriteTableNameVisitor(suffix);
//这一句话相当于我AST树的根节点接受了我的访问者visitor
//SQLStatement 就相当于这棵树的根节点,root
st.accept(visitor);
if (visitor.isRewrite()) {
rewrite = true;
}
}
再看下 SQLStatement 的源码就可以知道,你提供的visitor方法如果返回true,它就会自动遍历下级节点,这个太重要了,它省去了你自己遍历AST的复杂性。
这是基类里核心的accept方法,里面的accept0方法是抽象方法,子类覆盖。preVisit和postVisit也是抽象的,有默认实现,不太重要。这是典型的模板方法设计模式。
public final void accept(SQLASTVisitor visitor) {
if (visitor == null) {
throw new IllegalArgumentException();
}
visitor.preVisit(this);
accept0(visitor);
visitor.postVisit(this);
}
看下子类的accept0方法(其中一个子类,其它子类大体结构相同)
public void accept0(MySqlASTVisitor visitor) {
if (visitor.visit(this)) {
acceptChild(visitor, name);
acceptChild(visitor, from);
}
visitor.endVisit(this);
}
visitor方法返回true,表示要遍历下级节点,所有的节点都实现了 SQLObject 接口,有的节点包括下级节点,有的节点是叶子节点,这明显是 组合设计模式 的运用,深度优先遍历树状结构。
这就是Druid SQL Parse模块为我们提供的便利,它自身的不少过滤功能也是基于此模块原理实现的。
测试结果
原始sql:delete from gl_account a where a.id = 1;
改写后sql:DELETE FROM gl_account_1 a
WHERE a.id = 1;
---------------------------------------------
原始sql:REPLACE INTO gl_collected_log( orgId,id,startTime ) VALUES (1,2,3)
改写后sql:REPLACE INTO gl_collected_log_1 (orgId, id, startTime)
VALUES (1, 2, 3)
---------------------------------------------
原始sql:SELECT id,matchId,matchAccFatherCode,accName,matchName,matchMethod,category,errerMsg,isOrderMode,spare
FROM gl_acc_match_preset
WHERE category in(1) AND matchId IN (1,2)
ORDER BY matchId,isOrderMode ;
改写后sql:SELECT id, matchId, matchAccFatherCode, accName, matchName
, matchMethod, category, errerMsg, isOrderMode, spare
FROM gl_acc_match_preset_1 gl_acc_match_preset
WHERE category IN (1)
AND matchId IN (1, 2)
ORDER BY matchId, isOrderMode;
---------------------------------------------
原始sql: INSERT INTO gl_excel_accountBegin (`orgId`,`id`) VALUES (1,2),(1,3);
改写后sql:INSERT INTO gl_excel_accountBegin_1 gl_excel_accountBegin (`orgId`, `id`)
VALUES (1, 2),
(1, 3);
---------------------------------------------
原始sql: SELECT
acc.isEndNode,acc.isCalc as accIsCalc,ctf.accountId,
acc.isCalcMulti,ctf.id,ctf.orgId,ctf.year,
ctf.origAmountDr,ctf.origAmountCr,ctf.amountDr,ctf.amountCr,
ctf.quantityDr,ctf.quantityCr,ctf.isDetailData,
ctf.beginOrigAmount,ctf.beginAmount,ctf.beginQuantity,
ctf.beginOrigAmountDr,ctf.beginAmountDr,ctf.beginQuantityDr,
ctf.beginOrigAmountCr,ctf.beginAmountCr,ctf.beginQuantityCr
FROM gl_certificate ctf
INNER JOIN gl_account acc ON ctf.accountId = acc.id AND ctf.orgId = acc.orgId
and ctf.isPeriodBegin = 1 and ctf.orgId = #{orgId} and ctf.year = #{year}
and (acc.parentId = #{accountId} or acc.id=#{accountId})
where ctf.orgId = #{orgId};
改写后sql:SELECT acc.isEndNode, acc.isCalc AS accIsCalc, ctf.accountId, acc.isCalcMulti, ctf.id
, ctf.orgId, ctf.year, ctf.origAmountDr, ctf.origAmountCr, ctf.amountDr
, ctf.amountCr, ctf.quantityDr, ctf.quantityCr, ctf.isDetailData, ctf.beginOrigAmount
, ctf.beginAmount, ctf.beginQuantity, ctf.beginOrigAmountDr, ctf.beginAmountDr, ctf.beginQuantityDr
, ctf.beginOrigAmountCr, ctf.beginAmountCr, ctf.beginQuantityCr
FROM gl_certificate_1 ctf
INNER JOIN gl_account_1 acc
ON ctf.accountId = acc.id
AND ctf.orgId = acc.orgId
AND ctf.isPeriodBegin = 1
AND ctf.orgId = #{orgId}
AND ctf.year = #{year}
AND (acc.parentId = #{accountId}
OR acc.id = #{accountId})
WHERE ctf.orgId = #{orgId};
---------------------------------------------
原始sql:
select 1 from gl_certificate ctf
inner join edf_org org on ctf.orgId = org.id and ctf.year = org.enabledYear
where ctf.orgId = #{orgId}
and ctf.isPeriodBegin = 1
LIMIT 1;
改写后sql:SELECT 1
FROM gl_certificate_1 ctf
INNER JOIN edf_org_1 org
ON ctf.orgId = org.id
AND ctf.year = org.enabledYear
WHERE ctf.orgId = #{orgId}
AND ctf.isPeriodBegin = 1
LIMIT 1;
---------------------------------------------
原始sql:select ts from gl_certificate
inner join edf_org org on orgId = org.id and year = org.enabledYear
where orgId = #{orgId}
and isPeriodBegin = 1
order by ts desc
LIMIT 1;
改写后sql:SELECT ts
FROM gl_certificate_1 gl_certificate
INNER JOIN edf_org_1 org
ON orgId = org.id
AND year = org.enabledYear
WHERE orgId = #{orgId}
AND isPeriodBegin = 1
ORDER BY ts DESC
LIMIT 1;
---------------------------------------------
原始sql:SELECT
acc.id,
acc.code,
acc.name,
acc.balanceDirection,
IF(acc.balanceDirection = 0, '借', '贷') AS balanceDirectionName,
1 AS selected
FROM
gl_account acc
WHERE
acc.orgId = 1 AND acc.id = 1
union all
SELECT
acc.id,
acc.code,
acc.name,
acc.balanceDirection,
IF(acc.balanceDirection = 0, '借', '贷') AS balanceDirectionName,
0 AS selected
FROM
gl_account acc
LEFT JOIN
gl_excel_account_relation relation ON acc.orgId = relation.orgId
AND acc.id = relation.accountId
WHERE
acc.orgId = 1
AND relation.accountId IS NULL
ORDER BY code
改写后sql:SELECT acc.id, acc.code, acc.name, acc.balanceDirection
, IF(acc.balanceDirection = 0, '借', '贷') AS balanceDirectionName
, 1 AS selected
FROM gl_account_1 acc
WHERE acc.orgId = 1
AND acc.id = 1
UNION ALL
SELECT acc.id, acc.code, acc.name, acc.balanceDirection
, IF(acc.balanceDirection = 0, '借', '贷') AS balanceDirectionName
, 0 AS selected
FROM gl_account_1 acc
LEFT JOIN gl_excel_account_relation_1 relation
ON acc.orgId = relation.orgId
AND acc.id = relation.accountId
WHERE acc.orgId = 1
AND relation.accountId IS NULL
ORDER BY code
---------------------------------------------
原始sql:select * from product a
join product b
join product c
join orders d
join (select * from orders where id=1) e
on a.id = b.id
where a.id = 1 and a.id in (select id from goods)
改写后sql:SELECT *
FROM product_1 a
JOIN product_1 b
JOIN product_1 c
JOIN orders_1 d
JOIN (
SELECT *
FROM orders_1 orders
WHERE id = 1
) e
ON a.id = b.id
WHERE a.id = 1
AND a.id IN (
SELECT id
FROM goods_1 goods
)
---------------------------------------------
原始sql:update order a set id = 1,name = 2 where id = 1;
改写后sql:UPDATE order_1 a
SET id = 1, name = 2
WHERE id = 1;
---------------------------------------------
原始sql:insert into xxx values (1), (2), (3)
改写后sql:INSERT INTO xxx_1 xxx
VALUES (1),
(2),
(3)
---------------------------------------------
原始sql:update A, B set A.a = a1, B.b = b1 where A.id = B.aid
改写后sql:UPDATE A_1 A, B_1 B
SET A.a = a1, B.b = b1
WHERE A.id = B.aid
---------------------------------------------
原始sql:delete A, B from A, B where A.id = B.aid
改写后sql:DELETE A_1 A, B_1 B
FROM A_1 A, B_1 B
WHERE A.id = B.aid
---------------------------------------------
原始sql:select * from x limit 2
改写后sql:SELECT *
FROM x_1 x
LIMIT 2
---------------------------------------------
原始sql:select A.name, sum(A.num), A.price from A group by A.name
改写后sql:SELECT A.name, sum(A.num), A.price
FROM A_1 A
GROUP BY A.name
---------------------------------------------
原始sql:select * from A order by rand()
改写后sql:SELECT *
FROM A_1 A
ORDER BY rand()
---------------------------------------------
原始sql:replace into x select * from y
改写后sql:REPLACE INTO x_1
SELECT *
FROM y_1 y
---------------------------------------------
总结
1、看似简单的实现背后并不简单,通过阅读Druid源码可以了解到整个实现过程非常繁琐,因为AST的每个节点的下级节点都有多种情况需要考虑,我梳理了一下代码中运用了一下设计模式:
1)访问者设计模式
2)模板方法设计模式
3)组合设计模式(处理树状结构)
4)责任链设计模式(过滤器链)
2、基本上引入任何框架都会导致架构整体复杂度上升一个层级,要考虑你的团队规模、能力水平、阶段主要矛盾是否相适应,要考虑能否有能力解决架构改变带来的负面问题
3、从核心原理看待一个框架,能否用最小的代价实现最核心的功能,价值最大化,风险最小化