Implement SQL Lint with Bazel and SQLFluff

f5cdc08fa394b0630ea908808a20c8aa.jpeg

Background

After SQL version control, SQL lint step in CI/CD pipeline is required. The benefit of it is that we are able to find out the issues from SQL before we execute it.

In this blog, we'll combine Bazel and SQLFluff[1] to implement linting SQL.

SQLFluff is a SQL lint tool written in Python and supported a lot of SQL dialects.

The features it have:

  1. 1. Supporting a lot of SQL dialects, like Snowflake、PostgreSQL、ClickHouse, etc. Here is the list: https://docs.sqlfluff.com/en/stable/dialects.html ;

  2. 2. Able to output the correct SQL. It can reduce our effort to fix SQL issue.

  3. 3. At the same time, it supports the use of command lines and API calls.

Put it into CI/CD Pipeline

In my opinion, there are two ways to implement SQL lint in CI/CD pipeline:

  • • way 1:Add a SQL lint step into pipeline;

  • • way 2: Put the SQL lint logic into the testing code. When you run test command, it run automatically.

Way 2 is my favorite. I will explain it at the end of this blog.

Project Structure

.
├── BUILD.bazel
├── WORKSPACE
├── repository-hibernate-impl
│   ├── BUILD.bazel
│   └── src
│       ├── main
│       │   └── sql
│       │       └── V1__runbook_table.sql
│       └── test
│           └── python
│               ├── BUILD.bazel
│               ├── requirements_lock.txt
│               └── sql_test.py

Step 1: Add Python Dependencies in WORKSPACE file

This article is based on Bazel 5.4.0, so we defined external dependencies in WORKSPACE file.

http_archive(  
    name = "rules_python",  
    sha256 = "a644da969b6824cc87f8fe7b18101a8a6c57da5db39caa6566ec6109f37d2141",  
    strip_prefix = "rules_python-0.20.0",  
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.20.0/rules_python-0.20.0.tar.gz",  
)  
  
load("@rules_python//python:repositories.bzl", "py_repositories")  
  
  
py_repositories()  
  
load("@rules_python//python:repositories.bzl", "python_register_toolchains")  
  
python_register_toolchains(  
    name = "python3_11",  
    python_version = "3.11",  
)  
  
load("@python3_11//:defs.bzl", interpreter_3_11 = "interpreter")  
  
load("@rules_python//python:pip.bzl", "pip_parse")  
  
# Create a central repo that knows about the dependencies needed from  
# requirements_lock.txt.  
pip_parse(  
   name = "pip_deps",  
   python_interpreter_target = interpreter_3_11,  
   requirements_lock = "//repository-hibernate-impl/src/test/python:requirements_lock.txt",  
)  
# Load the starlark macro which will define your dependencies.  
load("@pip_deps//:requirements.bzl", "install_deps")  
# Call it to define repos for your requirements.  
install_deps()

Step 2: Define SQLFluff Dependencies

The content of requirements_lock.txt file:

sqlfluff==2.0.5  
Jinja2==3.1.2  
MarkupSafe==2.1.2  
Pygments==2.15.0  
appdirs==1.4.4  
chardet==5.1.0  
click==8.1.3  
colorama==0.4.6  
diff_cover==7.5.0  
iniconfig==2.0.0  
packaging==23.1.0  
pathspec==0.11.1  
pluggy==1.0.0  
pytest==7.3.1  
tomli==2.0.1  
toml==0.10.2  
exceptiongroup==1.1.1  
pyyaml==6.0  
regex===2023.3.23  
tblib==1.7.0  
tqdm==4.65.0  
typing_extensions==4.5.0

Step 3: Define BUILD target

load("@pip_deps//:requirements.bzl", "requirement")  
load("@rules_python//python:defs.bzl", "py_test")  
  
py_test(  
    name = "sql_test",  
    srcs = ["sql_test.py"],  
    # data attribute value is the sql's location
    data = [ "//repository-hibernate-impl:sqlTest",],  
    deps = [  
       requirement("sqlfluff"),  
       requirement("Jinja2"),  
       requirement("MarkupSafe"),  
       requirement("Pygments"),  
       requirement("appdirs"),  
       requirement("chardet"),  
       requirement("click"),  
       requirement("colorama"),  
       requirement("diff_cover"),  
       requirement("iniconfig"),  
       requirement("packaging"),  
       requirement("pathspec"),  
       requirement("pluggy"),  
       requirement("pytest"),  
       requirement("tomli"),  
       requirement("toml"),  
       requirement("exceptiongroup"),  
       requirement("pyyaml"),  
       requirement("regex"),  
       requirement("tblib"),  
       requirement("tqdm"),  
       requirement("typing_extensions"),  
    ],  
)

NOTE: The target of SQL files in repository-hibernate-impl/BUILD.bazel is as followed:

filegroup(  
    name = "sqlTest",  
    testonly = 1,  
    srcs = glob(["src/main/sql/*.sql"]),  
    visibility = ["//visibility:public"],  
)

Step 4: Run SQL Lint

All the SQL lint logics are written in a python unit test.

import unittest  
import sqlfluff  
import os  
import codecs  
  
sqls_path = os.path.join(os.getcwd(), "repository-hibernate-impl/src/main/sql/")  
  
dialect = "postgres"  
  
class TestSum(unittest.TestCase):  
    def test_lint_sql(self):  
        sql_dir_files = os.listdir(sqls_path)  
        # assert that there's a sql file at least in the dir. 
        self.assertTrue(len(sql_dir_files) > 0)  
        for sql_filename in sql_dir_files:  
            if sql_filename.endswith(".sql"):  
                f = codecs.open(os.path.join(sqls_path, sql_filename), "r", "utf-8")  
                sql_content = f.read()  
                lint_result = sqlfluff.lint(sql_content, dialect=dialect)  
                # if there's sql issue then
                if len(lint_result) > 0:  
                    # get the fixed sql content from SQLFluff
                    fix_result = sqlfluff.fix(sql_content, dialect=dialect) 
                    # output the fixed sql content 
                    print("correct sql should be: \n" + fix_result)
                self.assertEqual(len(lint_result), 0)  
  
if __name__ == "__main__":  
    unittest.main()

Here is done that all the codes for SQL lint. Now we can run the command bazel test //..., in our local dev environment or in CI/CD pipeline, to lint our SQL file.

Why I choose Way 2

There're 2 reasons:

  1. 1. Way 1 needs that developers commit the code to git to make it possible that running SQL lint. But way 2, developers are able to run SQL lint on their dev environment.

  2. 2. Way 2 implement build caching (naturally supported by Bazel), which can save a lot of build costs.

引用链接

[1] SQLFluff: https://github.com/sqlfluff/sqlfluff

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值