使用Druid的sql parser做一个表数据血缘分析工具

前言

大数据场景下,每天可能都要在离线集群,运行大量的任务来支持业务、运营的分析查询。任务越来越多的时候,就会有越来越多的依赖关系,每一个任务都需要等需要的input表生产出来后,再去生产自己的output表。最开始的时候,依赖关系自然是可以通过管理员来管理,随着任务量的加大,就需要一个分析工具来解析任务的inputs、outs,并且自行依赖上生产inputs表的那些任务。本文就介绍一个使用druid parser,来解析SQL的input、output的血缘分析工具。

建议对druid比较陌生的同学可以先看下druid的官方文档。

做一次sql的血缘分析的流程

  • 解析sql,拿到抽象语法树
  • 遍历抽象语法树,得到from、to

使用druid解析sql到语法树

druid提供了简单、快速的SQL解析工具,可以很简单拿到一段SQL的AST(抽象语法树)。而druid对语法树提供了多种的SQLStatement,使遍历语法树更加容易。

 SQLStatementParser parser = SQLParserUtils.createSQLStatementParser(sql, JdbcConstants.HIVE);
 SQLStatement stmt= parser.parseStatementList().get(0);

从语法树中取出from和to

拿到语法树之后,想办法把from、to从语法树中取出来就大功告成。

最初的写法

最开始,就是简单的遍历一下语法树的节点,取出from表和to表的表名。

    /**
     * 根据create或者insert的sql取出from、to
     * @param sql
     * @return
     * @throws ParserException
     */
    private static Map<String, Set<String>> getFromTo(String sql) throws ParserException {
        SQLStatementParser parser = SQLParserUtils.createSQLStatementParser(sql, JdbcConstants.HIVE);
        SQLStatement stmt= parser.parseStatementList().get(0);

        Set<String> from = new HashSet<>();
        Set<String> to = new HashSet<>();
        if (stmt instanceof SQLInsertStatement) {
            SQLInsertStatement istmt = (SQLInsertStatement) stmt;
            to.add(istmt.getTableSource().toString().toUpperCase());

            SQLTableSource sts = istmt.getQuery().getQueryBlock().getFrom();
            from = getFromTableFromTableSource(sts);
        } else if (stmt instanceof SQLCreateTableStatement) {
            SQLCreateTableStatement cstmt = (SQLCreateTableStatement) stmt;
            to.add(cstmt.getTableSource().toString().toUpperCase());

            SQLTableSource sts = cstmt.getSelect().getQueryBlock().getFrom();
            from = getFromTableFromTableSource(sts);
        }

        Map<String, Set<String>> fromTo = new HashMap<>(4);
        fromTo.put("from", from);
        fromTo.put("to", to);
        return fromTo;
    }

    private static Set<String> getFromTableFromTableSource (SQLTableSource sts) {
        Set<String> from = new HashSet<>();
        if (sts instanceof SQLJoinTableSource) {
            from = getFromTableFromJoinSource((SQLJoinTableSource)sts);
        } else {
            from.add(sts.toString().toUpperCase());
        }
        return from;
    }

    private static Set<String> getFromTableFromJoinSource (SQLJoinTableSource sjts) {
        Set<String> result = new HashSet<>();
        getFromTable(result, sjts);
        return result;
    }

    // 递归获取join的表list
    private static void getFromTable (Set<String> fromList, SQLJoinTableSource sjts) {
        SQLTableSource left = sjts.getLeft();
        if (left instanceof SQLJoinTableSource) {
            getFromTable(fromList, (SQLJoinTableSource)left);
        } else {
            fromList.add(left.toString().toUpperCase());
        }
        SQLTableSource right = sjts.getRight();
        if (right instanceof SQLJoinTableSource) {
            getFromTable(fromList, (SQLJoinTableSource)right);
        } else {
            fromList.add(right.toString().toUpperCase());
        }
    }

用druid更好的实现

因为是为了快速完成,所以写的取出from、to表的部分还是存在很大的问题的。只能支持一条sql,只能支持简单的sql语句,比如union all或者子查询就有些无力。于是又看了一下文档,其实druid是提供了visitor方法来遍历语法树的,而且提供了一个简单的SchemaStatVisitor,可以取出Sql中所有用到的表。于是就可以写成这种格式。

public static Map<String, TreeSet<String>> getFromTo (String sql) throws ParserException {
        List<SQLStatement> stmts = SQLUtils.parseStatements(sql, JdbcConstants.HIVE);
        TreeSet<String> fromSet = new TreeSet<>();
        TreeSet<String> toSet = new TreeSet<>();
        if (stmts == null) {
            return null;
        }

        String database="DEFAULT";
        for (SQLStatement stmt : stmts) {
            SchemaStatVisitor statVisitor = SQLUtils.createSchemaStatVisitor(JdbcConstants.HIVE);
            if (stmt instanceof SQLUseStatement) {
                database = ((SQLUseStatement) stmt).getDatabase().getSimpleName().toUpperCase();
            }
            stmt.accept(statVisitor);
            Map<Name, TableStat> tables = statVisitor.getTables();
            if (tables != null) {
                final String db = database;
                tables.forEach((tableName, stat) -> {
                    if (stat.getCreateCount() > 0 || stat.getInsertCount() > 0) {
                        String to = tableName.getName().toUpperCase();
                        if (!to.contains("."))
                            to = db + "." + to;
                        toSet.add(to);
                    } else if (stat.getSelectCount() > 0) {
                        String from = tableName.getName().toUpperCase();
                        if (!from.contains("."))
                            from = db + "." + from;
                        fromSet.add(from);
                    }
                });
            }
        }
  • 5
    点赞
  • 35
    收藏
    觉得还不错? 一键收藏
  • 5
    评论
以下是一个使用Druid解析SQL分析字段血缘关系的Java代码示例: ``` import com.alibaba.druid.sql.SQLUtils; import com.alibaba.druid.sql.ast.SQLExpr; import com.alibaba.druid.sql.ast.SQLStatement; import com.alibaba.druid.sql.ast.expr.*; import com.alibaba.druid.sql.ast.statement.*; import com.alibaba.druid.sql.dialect.mysql.parser.MySqlStatementParser; import com.alibaba.druid.sql.parser.SQLStatementParser; import java.util.*; public class DruidSqlParser { public static void main(String[] args) { String sql = "SELECT t1.id, t2.name FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE t1.age > 18"; List<ColumnInfo> columnInfoList = analyzeSql(sql); for (ColumnInfo columnInfo : columnInfoList) { System.out.println(columnInfo); } } public static List<ColumnInfo> analyzeSql(String sql) { List<ColumnInfo> columnInfoList = new ArrayList<>(); SQLStatementParser parser = new MySqlStatementParser(sql); List<SQLStatement> statementList = parser.parseStatementList(); for (SQLStatement statement : statementList) { if (statement instanceof SQLSelectStatement) { SQLSelectStatement selectStatement = (SQLSelectStatement) statement; SQLSelectQueryBlock queryBlock = selectStatement.getSelect().getQueryBlock(); analyzeSelectItems(queryBlock.getSelectList(), columnInfoList); analyzeWhere(queryBlock.getWhere(), columnInfoList); analyzeJoin(queryBlock.getFrom(), columnInfoList); } } return columnInfoList; } private static void analyzeSelectItems(List<SQLSelectItem> selectItems, List<ColumnInfo> columnInfoList) { for (SQLSelectItem selectItem : selectItems) { SQLExpr expr = selectItem.getExpr(); String columnName = null; String tableName = null; if (expr instanceof SQLPropertyExpr) { SQLPropertyExpr propertyExpr = (SQLPropertyExpr) expr; columnName = propertyExpr.getName(); tableName = propertyExpr.getOwner().getName(); } else if (expr instanceof SQLIdentifierExpr) { SQLIdentifierExpr identifierExpr = (SQLIdentifierExpr) expr; columnName = identifierExpr.getName(); } if (columnName != null) { ColumnInfo columnInfo = new ColumnInfo(columnName, tableName, true); if (!columnInfoList.contains(columnInfo)) { columnInfoList.add(columnInfo); } } } } private static void analyzeWhere(SQLExpr whereExpr, List<ColumnInfo> columnInfoList) { if (whereExpr instanceof SQLBinaryOpExpr) { SQLBinaryOpExpr binaryOpExpr = (SQLBinaryOpExpr) whereExpr; analyzeWhere(binaryOpExpr.getLeft(), columnInfoList); analyzeWhere(binaryOpExpr.getRight(), columnInfoList); } else if (whereExpr instanceof SQLBinaryOpExprGroup) { SQLBinaryOpExprGroup binaryOpExprGroup = (SQLBinaryOpExprGroup) whereExpr; for (SQLExpr sqlExpr : binaryOpExprGroup.getItems()) { analyzeWhere(sqlExpr, columnInfoList); } } else if (whereExpr instanceof SQLIdentifierExpr) { SQLIdentifierExpr identifierExpr = (SQLIdentifierExpr) whereExpr; String columnName = identifierExpr.getName(); ColumnInfo columnInfo = new ColumnInfo(columnName, null, false); if (!columnInfoList.contains(columnInfo)) { columnInfoList.add(columnInfo); } } } private static void analyzeJoin(SQLTableSource tableSource, List<ColumnInfo> columnInfoList) { if (tableSource instanceof SQLJoinTableSource) { SQLJoinTableSource joinTableSource = (SQLJoinTableSource) tableSource; analyzeJoin(joinTableSource.getLeft(), columnInfoList); analyzeJoin(joinTableSource.getRight(), columnInfoList); SQLExpr onExpr = joinTableSource.getCondition(); if (onExpr instanceof SQLBinaryOpExpr) { SQLBinaryOpExpr binaryOpExpr = (SQLBinaryOpExpr) onExpr; SQLExpr leftExpr = binaryOpExpr.getLeft(); SQLExpr rightExpr = binaryOpExpr.getRight(); if (leftExpr instanceof SQLPropertyExpr && rightExpr instanceof SQLPropertyExpr) { SQLPropertyExpr leftPropertyExpr = (SQLPropertyExpr) leftExpr; SQLPropertyExpr rightPropertyExpr = (SQLPropertyExpr) rightExpr; String leftColumnName = leftPropertyExpr.getName(); String leftTableName = leftPropertyExpr.getOwner().getName(); String rightColumnName = rightPropertyExpr.getName(); String rightTableName = rightPropertyExpr.getOwner().getName(); ColumnInfo leftColumnInfo = new ColumnInfo(leftColumnName, leftTableName, false); ColumnInfo rightColumnInfo = new ColumnInfo(rightColumnName, rightTableName, false); if (columnInfoList.contains(leftColumnInfo)) { int index = columnInfoList.indexOf(leftColumnInfo); columnInfoList.get(index).setTableName(leftTableName); } if (columnInfoList.contains(rightColumnInfo)) { int index = columnInfoList.indexOf(rightColumnInfo); columnInfoList.get(index).setTableName(rightTableName); } } } } else if (tableSource instanceof SQLSubqueryTableSource) { SQLSubqueryTableSource subqueryTableSource = (SQLSubqueryTableSource) tableSource; SQLSelect subSelect = subqueryTableSource.getSelect(); analyzeSelectItems(subSelect.getQueryBlock().getSelectList(), columnInfoList); analyzeWhere(subSelect.getQueryBlock().getWhere(), columnInfoList); analyzeJoin(subSelect.getQueryBlock().getFrom(), columnInfoList); } else if (tableSource instanceof SQLExprTableSource) { SQLExprTableSource exprTableSource = (SQLExprTableSource) tableSource; String tableName = null; if (exprTableSource.getAlias() != null) { tableName = exprTableSource.getAlias(); } else if (exprTableSource.getExpr() instanceof SQLIdentifierExpr) { tableName = ((SQLIdentifierExpr) exprTableSource.getExpr()).getName(); } for (ColumnInfo columnInfo : columnInfoList) { if (columnInfo.getTableName() == null) { columnInfo.setTableName(tableName); } } } } } class ColumnInfo { private String columnName; private String tableName; private boolean isSelectItem; public ColumnInfo(String columnName, String tableName, boolean isSelectItem) { this.columnName = columnName; this.tableName = tableName; this.isSelectItem = isSelectItem; } public String getColumnName() { return columnName; } public void setColumnName(String columnName) { this.columnName = columnName; } public String getTableName() { return tableName; } public void setTableName(String tableName) { this.tableName = tableName; } public boolean isSelectItem() { return isSelectItem; } public void setSelectItem(boolean selectItem) { isSelectItem = selectItem; } @Override public String toString() { return "ColumnInfo{" + "columnName='" + columnName + '\'' + ", tableName='" + tableName + '\'' + ", isSelectItem=" + isSelectItem + '}'; } @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; ColumnInfo that = (ColumnInfo) o; return Objects.equals(columnName, that.columnName) && Objects.equals(tableName, that.tableName); } @Override public int hashCode() { return Objects.hash(columnName, tableName); } } ``` 这个代码实现了对SELECT、WHERE和JOIN三个部分的分析,并且能够正确地识别出每个字段所属的和是否是SELECT中的选择项。你可以根据自己的需求对这个代码进行扩展和修改。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值