3、postgresql中递归\CTE\WITH\INSERT\COPY

ly201552y

已于 2024-07-31 11:47:21 修改

阅读量247

点赞数 8

分类专栏： postgresql 文章标签： postgresql 数据库

于 2024-07-31 00:49:50 首次发布

本文链接：https://blog.csdn.net/lycdf/article/details/140810294

版权

postgresql 专栏收录该内容

8 篇文章 0 订阅

订阅专栏

with查询（common table expressions）

目的：简化sql减少嵌套，可理解为在查询中定义了临时表，递归查询

with查询

postgres=# with test as (select generate_series(1,5)) select * from test;
 generate_series 
-----------------
               1
               2
               3
               4
               5
(5 rows)

在这里插入图片描述

CTE递归查询

创建表并插入测试数据：表 funds 来存储基金和它们持有的子基金信息

假设我们有3个基金，其中基金2和基金3是基金1的子基金，基金3又持有基金2：

CREATE TABLE funds (
    fund_id SERIAL PRIMARY KEY,
    fund_name VARCHAR(255),
    parent_fund_id INTEGER REFERENCES funds(fund_id)
);
INSERT INTO funds (fund_name, parent_fund_id) VALUES
('基金A', NULL), -- 基金A是顶级基金
('基金B', 1),    -- 基金B是基金A的子基金
('基金C', 1);    -- 基金C也是基金A的子基金

UPDATE funds SET parent_fund_id = 3 WHERE fund_id = 2;-- 基金C持有基金B

postgres=# select * from funds;
 fund_id | fund_name | parent_fund_id 
---------+-----------+----------------
       1 | 基金A     |               
       3 | 基金C     |              1
       2 | 基金B     |              3

要求返回所有基金的层次结构，包括直接和间接的持有关系

WITH RECURSIVE fund_hierarchy AS (
    SELECT fund_id, fund_name, parent_fund_id
    FROM funds
    WHERE fund_id = 1 -- 从基金A开始

    UNION ALL

    SELECT f.fund_id, f.fund_name, f.parent_fund_id
    FROM funds f
    INNER JOIN fund_hierarchy fh ON f.parent_fund_id = fh.fund_id
)
SELECT * FROM fund_hierarchy;

查询结果

 fund_id | fund_name | parent_fund_id 
---------+-----------+----------------
       1 | 基金A     |               
       3 | 基金C     |              1
       2 | 基金B     |              3
(3 rows)

UNION ALL：合并两个多个sql集合，不去重，递归查询中，用来计算与递归部分的结合，union会去重

INNER JOIN：连接两个表的语句，只有当两个表中都有匹配行才返回结果，在递归查询中，INNER JOIN 通常用于将递归部分的结果与自身连接，以便能够递归地查询更深层次的数据

初始的 SELECT 语句（锚部分）选择了顶级基金（基金A），这里我们假设基金A的 fund_id 是 1。
递归部分的 SELECT 语句通过 INNER JOIN 将 funds 表与递归别名 fund_hierarchy 连接起来，这里连接的条件是子基金的 parent_fund_id 等于递归查询中当前行的 fund_id。
然后，使用 UNION ALL 将锚部分的结果和递归部分的结果合并起来，形成一个完整的层次结构。
可以这样理解：
从锚部分开始，选择顶级基金。
递归部分从当前结果集中选择子基金，通过 INNER JOIN 连接到自身，找到所有直接子基金。
将这些直接子基金添加到结果集中。
递归地重复步骤2和3，直到没有更多的子基金可以被找到。

insert批量插入

通常方式:insert into 表名 select XXX from YYY;

pg另外的方式：

insert into values (),(),(),

优势：在一个事务中完成？[这个后面验证一下]，减少和数据库交互和wal日志生成，提高插入效率

INSERT INTO funds (fund_name, parent_fund_id) VALUES ('基金D', NULL),('基金E', 2),('基金F', 3);

postgres=# select * from funds;
 fund_id | fund_name | parent_fund_id 
---------+-----------+----------------
       1 | 基金A     |               
       3 | 基金C     |              1
       2 | 基金B     |              3
(3 rows)

postgres=# INSERT INTO funds (fund_name, parent_fund_id) VALUES ('基金D', NULL),('基金E', 2),('基金F', 3);
INSERT 0 3
postgres=# select * from funds;
 fund_id | fund_name | parent_fund_id 
---------+-----------+----------------
       1 | 基金A     |               
       3 | 基金C     |              1
       2 | 基金B     |              3
       4 | 基金D     |               
       5 | 基金E     |              2
       6 | 基金F     |              3
(6 rows)

COPY和\copy命令

速度比insert更快，最好在事务中进行。

两个命令区别：

(1)COPY是SQL命令，\copy 是元命令.
(2)COPY必须具有SUPERUSER超级权限（将数据通过stdin/stdout方式导入导出情况除外），而 \copy 元命令不需要SUPERUSER权限。
(3)COPY读取或写入数据库服务端主机上的文件，而\copy 元命令是从psql客户端主机读取或写入文件。
大数据量导出文件或大文件数据导入数据库，COPY比\copy 性能高，因为\copy命令每行数据都需要交互。

以下是COPY和\copy导入数据性能测试

结论：COPY导入数据性能比\copy好
生成测试数据

CREATE TABLE test_data (
    id BIGINT PRIMARY KEY,
    value TEXT
);
DO $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..20000000 LOOP
        INSERT INTO test_data (id, value) VALUES (
            i,
            'Value' || floor(random() * 100000)::text  -- 随机生成文本
        );
    END LOOP;
END $$;

使用COPY把表拷贝到本机

postgres=# COPY test_data to '/var/lib/pgsql/test_data.txt';
COPY 500000
Time: 80.702 ms
postgres=# \q
-bash-4.2$ ls
10  logfile  test_data.txt
-bash-4.2$ head -10 test_data.txt 
1	Value31714
2	Value49837
3	Value1007
4	Value21399
5	Value72550
6	Value5330
7	Value89267
8	Value14112
9	Value92807
10	Value51580

清理表，分别使用命令拷贝数据，50w数据测试

postgres=# truncate test_data ;
TRUNCATE TABLE
postgres=# COPY test_data from '/var/lib/pgsql/test_data.txt';
COPY 500000
Time: 620.223 ms
postgres=# truncate test_data ;
TRUNCATE TABLE
Time: 11.474 ms
postgres=# \copy test_data from '/var/lib/pgsql/test_data.txt';
COPY 500000
Time: 648.234 ms

插入2kw数据测试：

Time: 132524.928 ms (02:12.525)

导出数据

postgres=# COPY test_data to '/var/lib/pgsql/test_data.txt';
COPY 20000000
Time: 5083.841 ms (00:05.084)
postgres=# truncate test_data ;
TRUNCATE TABLE
Time: 850.747 ms
postgres=# \q
-bash-4.2$ du -sh test_data.txt 
369M	test_data.txt

导入对比测试

postgres=# COPY test_data from '/var/lib/pgsql/test_data.txt';
COPY 20000000
Time: 26875.071 ms (00:26.875)
postgres=# truncate test_data ;
TRUNCATE TABLE
Time: 135.217 ms
postgres=# \copy test_data from '/var/lib/pgsql/test_data.txt';
COPY 20000000
Time: 27416.673 ms (00:27.417)