dbt
在运行时,会按照ref()
和source()
方法,构造出所有模型的DAG
——有向无环图,再按照DAG的顺序执行。此外,dbt
也是通过这两个方法,确定数据血缘关系,并在docs
中展现出来。
ref()
引用的是dbt
中构造的表模型,但是对于数据库中已经有,在dbt
中没有模型的表,虽然可以直接在SQL语句中使用,但这种方法在血源图中就看不到了,需要使用source()
引用。
测试环境
- dbt 1.5
- 数据库:Postgres 14
构造测试表
- 在
Postgres
数据库中直接生成表,做为source
来源。
create table dim_student as (
select 1 as id, '张三' as name, 19 as age, 'M' as gender, '计算机' as major union all
select 2 as id, '李四' as name, 18 as age, 'M' as gender, '物理' as major union all
select 3 as id, '王舞' as name, 20 as age, 'F' as gender, '外语' as major
)
- 在
dbt
中建立事实表模型,使用<dbt-project>/models/fct_score.sql
with source_data as (
select 1 as id, 1 as student_id, 'chinese' as class, 98 as score union all
select 2 as id, 1 as student_id, 'math' as class, 97 as score union all
select 3 as id, 1 as student_id, 'english' as class, 96 as score union all
select 4 as id, 2 as student_id, 'chinese' as class, 88 as score union all
select 5 as id, 2 as student_id, 'math' as class, 87 as score union all
select 6 as id, 2 as student_id, 'english' as class, 86 as score union all
select 4 as id, 3 as student_id, 'chinese' as class, 83 as score union all
select 5 as id, 3 as student_id, 'math' as class, 82 as score union all
select 6 as id, 3 as student_id, 'english' as class, 81 as score
)
select * from source_data
配置source
在 <dbt-project>/models/schema.yml
配置文件中增加 source
配置,如果没有该文件的话,可以自行添加。
- name: src
,可以在一个schema
下面放多个来源表,scr
是这个schema
的名称,也可以自己取名database
,指定来源数据库。schema
,指定来源schema
。tables
,下面是来源表- name: dim_student
,来源表名descriptions
,来源表描述columns
:来源表字段,可以不填
version: 2
sources:
- name: src
database: postgres
schema: test_exp
tables:
- name: dim_student
descriptions: "维度表: 学生信息"
columns:
- name: id
- name: name
- name: age
- name: gender
- name: major
测试
使用dim_student
(已存在的表) 和 fct_score
(在dbt
中建立的模型),建立新表 fct_student_score
with source_data as (
select
fct.student_id ,
dim.name,
dim.gender,
dim.age,
dim.major,
fct.class,
fct.score
from {{ ref('fct_score') }} fct
left join {{ source('test_exp', 'dim_student')}} dim on fct.student_id=dim.id
order by fct.score desc
)
select * from source_data
运行 dbt run -s fct_student_score
后生成的表
在 docs
中的血源关系