大数据-完转数据-调用postgres的多种操作_pg的存储过程里面可以调shell吗-CSDN博客

本文链接：https://blog.csdn.net/s_unbo/article/details/125236545

一、shell 脚本调用postgress存储过程及代码块

1、修改pg_hba.conf 设置本地连接无密码

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust

2、shell脚本中调用存储过程

vi pg_porcDatacont.sh

需在$ $ 前加 \ 直接在shell脚本中调用，穿插调用存储过程，正常sql及do $ $ 代码块

#!/bin/bash
echo "BEGINNING OF $0 `date +'%Y-%m-%d %H:%M:%S'`"
echo "plpgsql running"
psql -h ${POSTGIS_IP}  -U ${PGUSER} -d ${POSTGIS_DB_NAME} <<EOF
CALL public.pp();
CALL public.cc();
update aa set bb=1;
do \$$
    declare
        rowcount integer;
    BEGIN
        update aa set bb=1;
        get diagnostics rowcount := row_count;
        raise notice 'update bts count is % ',rowcount;
        IF rowcount < 100 THEN
            COMMIT;
        ELSE
            ROLLBACK;
        END IF;
    END;
\$$ LANGUAGE 'plpgsql';
EOF
echo "END OF $0 ("`date +'%y-%m-%d %H:%M:%S'`")"

vi  linux_schedule_test

#!/bin/bash
#***********************************************
# ** linux_schedule_test
#**********************************************
export PATH=/usr/pgsql-10/bin:/usr/bin;
 
psql "host=127.0.0.1 port=5434 user=admin  password=123456 dbname=linux_schedule_test" << EOF #2>/dev/null
 
\copy test from '/etl-script/test.csv' delimiter ',' csv header encoding 'GBK';
 
create table test$(date +%Y%m%d_%H%m%S) as select * from test;
\q
EOF

3、crontab 定期执行添加执行任务

编辑定时任务:

crontab -e

*/1 * * * * /home/sample/pg_porcDatacont.sh >> /var/log/pg_porcData.log
*/1 * * * * /home/sample/linux_schedule_test.sh >> /var/log/pg_porcData.log

查看定时任务:

crontab -l

查看执行结果:
cat /var/log/pg_porcData.log

二、python操作postgresql

不仅适合pgsql,更适合mysql，思路都是一致的，如果读者学会使用psycopg2操作pgsql，那么使用PyMySQL 操作mysql也是很简单；本段涵盖内容广泛，提供的操作选择性很多，比如多种数据插入操作，防止sql注入方式，异常处理，sql语句打印处理，显示行号等操作，一篇文章真的收益匪浅；

1、安装 psycopg2

#  pip install psycopg2

2、连接数据库

每条完整的sql执行步骤如下：

建立连接获得 connect 对象
获得游标对象，一个游标对象可以对数据库进行执行操作，非线程安全，多个应用会在同一个连接种创建多个光标；
书写sql语句
调用execute()方法执行sql
抓取数据（可选操作）
提交事物
关闭连接

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象
cursor = conn.cursor()
# sql语句
sql = "SELECT VERSION()"
# 执行语句
cursor.execute(sql)
# 获取单条数据.
data = cursor.fetchone()
# 打印
print("database version : %s " % data)
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

输出结果打印出数据库版本说明连接数据库成功：

database version : PostgreSQL 11.3, compiled by Visual C++ build 1914, 64-bit

3、创建表

创建学生表主要有字段id 唯一标识，字段 num 代表学号，字段 name 代表学生姓名；详细的建表默认规则转换见附录

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql = """CREATE TABLE student (
id serial4 PRIMARY KEY, 
num int4,
name varchar(25));"""
# 执行语句
cursor.execute(sql)
print("student table created successfully")
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

4、插入操作

4.1 插入数据姿势一
知识追寻者提供的第一种防止sql注入的插入数据方式（具有占位符的预编译sql），重要程度不言而喻；美中不足是字符串类型必须带上单引号；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="INSERT INTO student (num, name) 
                    VALUES (%s, '%s')" % 
                    (100, 'zszxz')
# 执行语句
cursor.execute(sql)
print("successfully")
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

4.2 插入数据姿势二（参数分离）
知识追寻者认为下面参数与sql语句分离插入的姿势更简便帅气，也是防止sql注入问题；强烈推荐；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""INSERT INTO student (num, name) VALUES (%s, %s)"""
params = (101, 'zszxz')
# 执行语句
cursor.execute(sql,params)
print("successfully")
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

4.3 插入数据姿势三（字典）
第三种姿势也就是是支持字典映射关系插入，使用字典方式的插入数据是根据字典的key进行匹配占位符，强烈推荐；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""INSERT INTO student (num, name) VALUES (%(num)s, %(name)s)"""
params = {'num':102, 'name':'zszxz'}
# 执行语句
cursor.execute(sql,params)
print("successfully")
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

5、查询操作

5.1 查询一条数据
使用fetchone()方法可以抓取一条数据, 返回的是元组；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""SELECT * FROM student;"""
# 执行语句
cursor.execute(sql)
# 抓取
row = cursor.fetchone()
print(row)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

输出结果：

(1, 100, 'zszxz')

5.2 查询多条数据
使用fetchmany([size=cursor.arraysize])方法可以抓取多条数据；
此方法可以多次使用，直到数据库中没有数据，此时会返回空列表；
如果不传参数，会限制查询条数，一般就是返回第一条；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""SELECT * FROM student;"""
# 执行语句
cursor.execute(sql)
# 抓取
#row = cursor.fetchone()
rows = cursor.fetchmany(2)
print(rows)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

输出结果：

[(1, 100, 'zszxz'), (2, 101, 'zszxz')]

5.3 查询全部数据
使用 fetchall() 方法会抓取所有数据；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""SELECT * FROM student;"""
# 执行语句
cursor.execute(sql)
# 抓取
rows = cursor.fetchall()
print(rows)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

输出结果：

[(1, 100, 'zszxz'), (2, 101, 'zszxz'), (3, 102, 'zszxz')]

5.4 按条件查询
带参查询读者应该谨记sql 与参数分离
参数的末尾必须加上逗号
如果知道返回的数据就一条使用fetchone()方法，如果无特殊要求，否则建议使用fetchall()方法

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""SELECT * FROM student where id = %s;"""
params = (1,)
# 执行语句
cursor.execute(sql,params)
# 抓取
rows = cursor.fetchall()
print(rows)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

输出结果：

[(1, 100, 'zszxz')]

6、更新操作

更新操作跟之前的查询，插入类似，参数对应的文章分清楚即可。

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""update student set name = %s where id = %s  """
params = ('知识追寻者',3,)
# 执行语句
cursor.execute(sql,params)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

7、删除操作

删除操作很简单，看如下代码，与之前的代码流程没什么区别；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""delete from  student where id = %s  """
params = (3,)
# 执行语句
cursor.execute(sql,params)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

8、异常处理

处理 sql 的异常非常重要，知识追寻者这边使用psycopg2的 Error 进行异常捕获，能捕获到sql执行时期的所有异常；下面代码中表test是库中不存的表，执行sql后会报异常，经过异常捕获后非常美观，不影响程序运行；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""select * from test"""
params = (3,)
try:
    # 执行语句
    cursor.execute(sql,params)
except psycopg2.Error as e:
    print(e)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

执行结果

错误:  关系 "test" 不存在
LINE 1: select * from test

9、打印sql

使用cursor.query 可以查看执行的sql语句，方便排查；

# -*- coding: utf-8 -*-
import psycopg2
# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""select * from student"""
try:
    # 执行语句
    cursor.execute(sql,)
    que = cursor.query
    print(que)
except psycopg2.Error as e:
    print(e)
# 事物提交
conn.commit()
# 关闭数据库连接
cursor.close()
conn.close()

执行结果：

b'select * from student'

10、获取总条数

使用cursor.rowcount 可以获得表中所有行总数；

# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""select * from student"""
# 执行语句
cursor.execute(sql)
count = cursor.rowcount
print(count)
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

输出

11、显示行号

使用cursor.rownumber 可以显示当前查询sql获得数据的行号，每抓取一次光标的索引就会加1；

# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""select * from student """
# 执行语句
cursor.execute(sql)
row_1 = cursor.fetchone()
print(cursor.rownumber)
row_2 = cursor.fetchone()
print(cursor.rownumber)
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

输出结果：

1
2

12、显示执行参数

使用 mogrify(operation[, parameters]) 能够显示执行语句的参数绑定结果，返回的是字符串形式；

# 获得连接
conn = psycopg2.connect(database="python", user="postgres", password="123456", host="127.0.0.1", port="5432")
# 获得游标对象，一个游标对象可以对数据库进行执行操作
cursor = conn.cursor()
# sql语句 建表
sql ="""INSERT INTO student (num, name) VALUES (%s, %s)"""
params = (102, '知识追寻者')
# 执行语句
result = cursor.mogrify(sql,params)
print(result.decode('UTF-8'))
cursor.execute(sql,params)
# 事物提交
conn.commit()
# 关闭数据库连接
conn.close()

执行结果：

INSERT INTO student (num, name) VALUES (102, '知识追寻者')

13、附录

支持默认的类型转换如下,如果想要使用强制类型转换，详细的可以参照pgsql官网手册；

Python	PostgreSQL
None	NULL
bool	bool
float	real，double
int，long	smallint，integer，bigint
Decimal	numeric
str，unicode	varchar，text
buffer，memoryview，bytearray，bytes，Buffer protocol	bytea
date	date
time	time,timetz
datetime	timestamp,timestamptz
timedelta	interval
list	ARRAY
tuple,namedtuple	Composite typesIN syntax
dict	hstore
Range	range
UUID	uuid
Anything	json
ipaddress	inet

三、pg_cron插件定时任务

pg_cron 是由 citusdata 公司开发的一个 PostgreSQL 定时任务插件（类似于 Oracle 中的 DBMS_SCHEDULER）。pg_cron 作为一个后台工作进程运行，使用类似 cron 的编辑语法，允许直接在数据库中执行定时任务。例如：

-- 每周六 3:30am (GMT) 删除历史记录SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$); schedule----------       42
-- 每天 10:00am (GMT) 执行清理作业SELECT cron.schedule('nightly-vacuum', '0 10 * * *', 'VACUUM'); schedule----------       43
-- 将清理作业修改为 3:00am (GMT)SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM'); schedule----------       43
-- 停止计划中的任务SELECT cron.unschedule('nightly-vacuum' ); unschedule ------------ t(1 row)
SELECT cron.unschedule(42); unschedule------------          t

1、安装 pg_cron

pg_cron 目前只支持 Linux 操作系统。对于 Red Hat、CentOS 以及 Fedora 等操作系统可以使用以下命令进行安装（PostgreSQL 12）：

sudo yum install -y pg_cron_12

对于 Debian 以及 Ubuntu 可以使用以下命令进行安装（PostgreSQL 12）：

sudo apt-get -y install postgresql-12-cron

另外，我们也可以使用源码进行编译安装：

git clone https://github.com/citusdata/pg_cron.git 
cd pg_cron
make
make install

2、配置 pg_cron 参数

为了在启动 PostgreSQL 时运行 pg_cron 后台工作进程，我们需要将 pg_cron 添加到 postgresql.conf 文件中的 shared_preload_libraries配置项。默认情况下，pg_cron 后台进程使用 postgres 数据库获取所需的元数据。但是我们也可以使用 cron.database_name 配置参数进行设置。

shared_preload_libraries = 'pg_cron'
cron.database_name = 'postgres'

另外我们还需要配置pg_hba.conf：

host    all    all   localhost        trust
host    all   all  0.0.0.0/0     trust

或者配置.pgpass文件也可以：

hostname:port:database:user:password

3、重启 PostgreSQL

pg_ctl stop
pg_ctl start

或

sudo service postgresql-12 restart

查询启动情况

ps -ef|grep pg_cron|grep -v grep

然后我们可以使用以下命令创建 pg_cron 函数以及元数据相关的表：

-- 使用 superuser 运行以下命令
CREATE EXTENSION pg_cron;
-- 可选操作，为其他用户授予访问权限
GRANT USAGE ON SCHEMA cron TO username;

4、使用

我们先看下该扩展为我们提供了哪些函数：

postgres=# \dx+ pg_cron 
    Objects in extension "pg_cron"
          Object description          
--------------------------------------
 function cron.job_cache_invalidate()
 function cron.schedule(text,text)
 function cron.unschedule(bigint)
 schema cron
 sequence cron.jobid_seq
 table cron.job
(6 rows)

cron.schedule：创建定时任务，其中第一个参数是执行频率，第二个参数是执行的任务。
cron.unschedule：取消定时任务，参数为jobid。
创建定时任务：

postgres=# SELECT cron.schedule('0 10 * * *', 'VACUUM');
 schedule 
----------
        1
(1 row)

调用存储过程或函数定时任务：

每分钟执行一次：
SELECT cron.schedule('process-new-events','* * * * *', 'select public.proc_t_fm_alarm_number()');
每5分钟执行一次：
SELECT cron.schedule('process-new-events','*/5 * * * *', 'select public.proc_t_fm_alarm_number()');

查看定时任务：

postgres=# select * from cron.job;
 jobid |  schedule  | command | nodename  | nodeport | database | username | active 
-------+------------+---------+-----------+----------+----------+----------+--------
     1 | 0 10 * * * | VACUUM  | localhost |     1921 | bill     | bill     | t
(1 row)

取消定时任务：

postgres=# SELECT cron.unschedule(1); 
 unschedule 
------------
 t
(1 row)

或者我们可以直接删除cron.job表中对应的记录也可以取消定时任务。

5、标准 cron 语法

pg_cron 使用标准的 cron 语法，其中 * 表示“每个该时间运行”，特定数字表示“仅在这个数字时运行”

┌───────────── 分钟 (0 - 59)
│ ┌────────────── 小时 (0 - 23)
│ │ ┌─────────────── 日期 (1 - 31)
│ │ │ ┌──────────────── 月份 (1 - 12)
│ │ │ │ ┌───────────────── 一周中的某一天 (0 - 6) (0 到 6 表示周末到下周六，
│ │ │ │ │ 7 仍然是周末）
│ │ │ │ │
│ │ │ │ │