azkaban历史数据清理

1、手工执行sql清理

SELECT count(1) FROM execution_flows ;
SELECT count(1) FROM execution_flows WHERE update_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;
DELETE  FROM execution_flows WHERE update_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;
 
SELECT count(1) FROM execution_jobs ;
SELECT count(1) FROM execution_jobs WHERE start_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;
DELETE  FROM execution_jobs WHERE start_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;
 
SELECT count(1) FROM `execution_logs` ;
SELECT count(1) FROM `execution_logs` WHERE upload_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;
DELETE  FROM execution_logs WHERE upload_time<=UNIX_TIMESTAMP('2022-03-01 00:00:00')*1000 ;

注意,如果数据量过多执行可能会很慢,所以一次不要删除太多数据

2、脚本定时清理

参考这篇文章:https://www.jianshu.com/p/609a3c72b787

这篇文章的脚本有点小问题,做一些改造:

#!/usr/bin/env python
# coding:utf-8
 
 
import sys
import pymysql
import time
 
az_prop_file = '/data/azkaban-3.81.0/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban.properties'
days = 30
limits = 3000
 
# 清理30天之前的,每次清理3000条
sql_find_max_exec_id = '''select exec_id,from_unixtime(submit_time/1000) from execution_flows
WHERE exec_id = (
select max(exec_id) from execution_flows WHERE submit_time<unix_timestamp(date_sub(CURRENT_DATE(), interval %d day))*1000 order by exec_id asc limit %d
)
'''
 
tables_to_clear = ['execution_logs', 'execution_jobs', 'execution_flows']
 
sql_clear_execution = 'delete from %s where exec_id <= %s'
 
def read_props(filepath):
    d = {}
    with open(filepath) as fp:
        for line in fp:
            line = line.strip()
            if not line or line.startswith('#'): continue
            t = line.split('=')
            d[t[0].strip()] = t[1].strip()
 
    return d
 
 
'''
mysql.port=3306
mysql.host=xxxx
mysql.database=azkaban
mysql.user=azkaban
mysql.password=aaaaaaaaaa
'''
#props = read_props(sys.argv[1])
props = read_props(az_prop_file)
 
def delete_log():
    dbconn = pymysql.connect(host=props['mysql.host'], user=props['mysql.user'],
    password=props['mysql.password'], database=props['mysql.database'],
    port=int(props.get('mysql.port', 3306)))
 
    cursor = dbconn.cursor()
    print('\n\n%s start' % time.strftime('%Y-%m-%d %H:%M'))
    try:
        count = cursor.execute(sql_find_max_exec_id % (days,limits))
        if count==0 :
            print("没有待清理的数据")
            sys.exit(0)
        max_exec_id, submit_time = cursor.fetchone()
 
        print('clear logs, max exec_id %d submit_time %s' % (max_exec_id, submit_time))
 
        for tbl in tables_to_clear:
            sql = sql_clear_execution % (tbl, max_exec_id)
            affected = cursor.execute(sql)
            print('delete from table %20s, %8d records deleted' % (tbl, affected))
        dbconn.commit()
    except Exception as e:
        dbconn.rollback()
        raise e
    finally:
        cursor.close()
        dbconn.close()
 
while True:
    delete_log()

创建/data/script/clean_azkaban_history_log.py文件放到azkaban-web所在的机器上

配置定时任务:crontab -e

#每天凌晨3点清理azkaban历史数据
0 3 * * * python /data/script/clean_azkaban_history_log.py >> /tmp/clean_azkaban_history_log.log

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值