Speeding Up PostgreSQL Bulk Inserts with COPY

中年如酒

Category: Postgresql. Tags: postgresql, database

Article link: https://blog.csdn.net/weixin_43230594/article/details/134476752

Contents

1. Introduction to the COPY command
2. Advantages of COPY over INSERT
3. Measuring performance
4. Conclusion

1. Introduction to the COPY command

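The COPY command in PostgreSQL is a powerful tool for bulk inserts and data migration. It loads large volumes of data into a table quickly and efficiently.

COPY is also a simpler and more cost-effective option than the usual alternatives for speeding up large loads, such as distributed processing tools or adding more CPU and RAM to the database server.
If a task needs to insert a large number of rows in a short time, consider COPY: it can significantly speed up data migration and loading.
According to the PostgreSQL 16 release announcement, COPY bulk loading improved by up to 300% in some tests.
For the full syntax, see the COPY reference page in the official PostgreSQL documentation.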

2. Advantages of COPY over INSERT

 Aspect      | COPY                                 | INSERT (many statements)
-------------+--------------------------------------+----------------------------------------------------
 Logging     | One log entry for the entire load    | One log entry for each row
 Network     | No per-row latency; data is streamed | Round-trip latency between inserts
 Parsing     | Only one parsing operation           | Parsing overhead for every statement
 Transaction | Single transaction                   | Each INSERT is a separate transaction (autocommit)
 Query plan  | Simpler query execution plan         | A separate query execution plan per statement

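In short, COPY is faster because it reduces logging, network latency, parsing, and transaction-management overhead compared with a series of INSERT statements, and its simpler execution path makes bulk inserts and data migration faster and more efficient. One trade-off is file access: server-side COPY ... FROM 'file' reads a file on the database server and requires superuser rights or membership in the pg_read_server_files role, so it does not fit every scenario. psql's \copy and COPY ... FROM STDIN stream the data from the client instead. Another trade-off is risk: COPY writes little log output and performs the whole load in a single transaction, so by default one bad row aborts the entire load.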
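
To make the parsing and round-trip difference concrete, here is a minimal Python sketch (not part of the original test) that renders the same rows twice: once as one INSERT statement per row, and once as a single COPY ... FROM STDIN statement followed by a data stream in COPY's default text format. The helper names and the table name t2 are illustrative assumptions, and the encoder only covers NULL and the common backslash escapes.

```python
def copy_text_field(value):
    """Encode one value for COPY's default text format (NULL is written as \\N)."""
    if value is None:
        return r"\N"
    text = str(value)
    # Escape the backslash first, then the delimiter and line breaks.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def as_copy_stream(rows):
    """One tab-separated line per row, as sent after COPY ... FROM STDIN."""
    return "".join(
        "\t".join(copy_text_field(v) for v in row) + "\n" for row in rows
    )


def as_insert_statements(rows, table="t2"):
    """One INSERT per row; the server parses and plans each one separately.

    Only suitable for the integer test data used here (no quoting is done).
    """
    statements = []
    for row in rows:
        values = ", ".join(str(int(v)) for v in row)
        statements.append(f"INSERT INTO {table} VALUES ({values});")
    return statements


rows = [(1, 3), (2, 2), (3, 1)]
inserts = as_insert_statements(rows)
copy_statement = "COPY t2 FROM STDIN;"
copy_data = as_copy_stream(rows)

print(len(inserts), "INSERT statements vs 1 COPY statement")
print(sum(len(s) for s in inserts), "bytes of SQL vs",
      len(copy_statement) + len(copy_data), "bytes for COPY plus data")
```

With millions of rows, the INSERT variant grows by one full statement per row, while the COPY variant stays a single statement and only the compact data stream grows.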

3. Measuring performance

Create three test tables:

test=# create table t1 (id1 bigint,id2 bigint);
CREATE TABLE
Time: 7.744 ms
test=# create table t2 (id1 bigint,id2 bigint);
CREATE TABLE
Time: 8.680 ms
test=# create table t3 (id1 bigint,id2 bigint);
CREATE TABLE
Time: 0.924 ms

Insert 10 million test rows into t1, producing a 422 MB test table:

test=# insert into t1 select generate_series(1,10000000),generate_series(10000000,1,-1);
INSERT 0 10000000
Time: 11933.658 ms (00:11.934)
test=# select count(1),pg_size_pretty(pg_relation_size('t1')) from t1;
  count   | pg_size_pretty 
----------+----------------
 10000000 | 422 MB
 (1 row)

Time: 377.028 ms
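
The 422 MB figure can be sanity-checked with a little arithmetic. The sketch below assumes PostgreSQL's default 8 kB block size, a 24-byte page header, a 4-byte line pointer per row, a 23-byte tuple header padded to 24 bytes on a 64-bit build, and a fully packed table with no dead rows. These are standard defaults rather than values stated in the test, and the function name is illustrative.

```python
import math

BLOCK_SIZE = 8192    # default PostgreSQL block size in bytes (assumed)
PAGE_HEADER = 24     # fixed page header
LINE_POINTER = 4     # item identifier stored per tuple
TUPLE_HEADER = 24    # 23-byte heap tuple header padded to 8-byte alignment
ROW_DATA = 2 * 8     # two bigint columns


def estimate_heap_bytes(row_count):
    """Estimate the heap size of a freshly loaded two-bigint table."""
    per_row = LINE_POINTER + TUPLE_HEADER + ROW_DATA
    rows_per_page = (BLOCK_SIZE - PAGE_HEADER) // per_row
    pages = math.ceil(row_count / rows_per_page)
    return pages * BLOCK_SIZE


size = estimate_heap_bytes(10_000_000)
print(f"{size} bytes = {size / 1024**2:.1f} MiB")
```

The estimate comes out at roughly 422 MiB, which agrees with the pg_size_pretty output above (pg_size_pretty uses 1024-based units).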

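Export t1 to a CSV file for later use (for example with \copy t1 to '/var/lib/postgresql/t1.csv'), then load that file into t2 with \copy: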

test=# \copy t2 from '/var/lib/postgresql/t1.csv';
COPY 10000000
Time: 5997.302 ms (00:05.997)
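
psql's \copy reads the file on the client and streams it to the server as COPY ... FROM STDIN, and the same pattern is available from application code. The sketch below assumes the psycopg2 driver, whose cursors provide copy_expert(sql, file). The function names, the connection string, and the choice of psycopg2 itself are assumptions for illustration, not something the original test used.

```python
def load_file_with_copy(cursor, path):
    """Stream a local text-format file into t2 via COPY ... FROM STDIN.

    `cursor` is expected to be a psycopg2 cursor (assumption); any object
    with a compatible copy_expert(sql, file) method works.
    """
    with open(path, "r", encoding="utf-8") as data:
        cursor.copy_expert("COPY t2 FROM STDIN", data)


def main():
    """Run the load against a real database (not called automatically)."""
    import psycopg2  # third-party driver, assumed to be installed

    # Hypothetical connection settings; adjust for your environment.
    with psycopg2.connect("dbname=test") as conn:
        with conn.cursor() as cur:
            load_file_with_copy(cur, "/var/lib/postgresql/t1.csv")
        # Leaving the `with conn` block commits the transaction.
```

Calling main() performs the load. Because the file is read by the client process, this needs no file-system access on the database server.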

Verify the row count and size of the exported CSV file:

postgres@pgd-prod01:~$ cat t1.csv|wc -l
10000000
postgres@pgd-prod01:~$ ls -alh|grep t1;
-rw-rw-r--  1 postgres postgres  151M Nov 18 11:26 t1.csv
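
The 151M size can be cross-checked from the data itself. Because the \copy command used for the load has no format option, the file is in COPY's default text format despite its .csv extension: two tab-separated integers and a newline per row. The sketch below counts the bytes such a file would contain for the rows generated earlier. The function names are illustrative.

```python
def total_digits(n):
    """Total number of decimal digits needed to write every integer 1..n."""
    total, width, start = 0, 1, 1
    while start <= n:
        end = min(n, start * 10 - 1)   # last number with this digit count
        total += (end - start + 1) * width
        start *= 10
        width += 1
    return total


def export_size_bytes(n):
    """Bytes in a text-format export of n rows (id1 = 1..n, id2 = n..1)."""
    # Both columns hold the same set of values, so the digits count twice;
    # each row adds one tab separator and one newline.
    return 2 * total_digits(n) + 2 * n


size = export_size_bytes(10_000_000)
print(f"{size} bytes = {size / 1024**2:.2f} MiB")
```

This gives about 150.5 MiB, in line with the 151M shown by ls -h, which rounds human-readable sizes up.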

For comparison, insert the same 10 million rows into t3 with INSERT ... SELECT:

test=# insert into t3 select * from t1;
INSERT 0 10000000
Time: 9811.316 ms (00:09.811)

4. Conclusion

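The test shows that COPY is noticeably more efficient than INSERT. Loading the same 10 million rows took 5997.302 ms with \copy and 9811.316 ms with INSERT ... SELECT, a saving of about 39%. This test ran on PostgreSQL 10; PostgreSQL 16 reportedly improves COPY performance further.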
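
The saving quoted in the conclusion can be recomputed directly from the two timings measured in section 3. A short sketch, using only those two numbers and the row count:

```python
copy_ms = 5997.302     # \copy into t2, from the psql session above
insert_ms = 9811.316   # INSERT ... SELECT into t3, from the psql session above
rows = 10_000_000

saving = (insert_ms - copy_ms) / insert_ms   # fraction of time saved
speedup = insert_ms / copy_ms                # how many times faster

print(f"time saved: {saving:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"COPY:   {rows / (copy_ms / 1000):,.0f} rows/s")
print(f"INSERT: {rows / (insert_ms / 1000):,.0f} rows/s")
```

This works out to a saving of just under 39% and a speedup of about 1.6x for this single run.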