Spark ETL Scheduling with crontab

1. Overview

The pipeline is driven by a crontab scheduled task that invokes a shell script; the shell script in turn launches the Python script that submits the Spark jobs.

2. Configuring the scheduled task in crontab

# Run at 08:00 every day and redirect the run log to a dated file
00 08 * * * /usr/bin/sh /home/work/shell/main_worklog.sh > /home/work/logs/cron_`date +\%Y\%m\%d`.log

A crontab entry must sit on a single line. Also note that % is a special character in crontab (it is treated as a newline), which is why it is escaped as \% inside the date command.
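Installing and checking the entry follows the standard crontab workflow; the log path below just mirrors the example above. In an interactive shell the \% escaping is unnecessary, so a plain % is used here:

crontab -e     # edit the current user's crontab and add the line above
crontab -l     # list installed entries to confirm it was saved
tail -f /home/work/logs/cron_`date +%Y%m%d`.log     # follow today's run log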

3. Shell script: set environment variables, then launch the Python job

#!/bin/sh
# Environment required by spark-submit
export SPARK_HOME=/../spark-2.1.1-bin-hadoop2.6
export JAVA_HOME=/../jdk1.8.0_162
export PATH=$JAVA_HOME/bin:$PATH

# Launch the Python driver; "$@" forwards any arguments (e.g. a backfill date)
/../python /../main_workflow.py "$@"
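With "$@" in place, the wrapper can also be run by hand to backfill a specific day; the date 20190101 below is just an illustrative value in the YYYYMMDD format the Python script expects:

sh /home/work/shell/main_worklog.sh 20190101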

4. Running the Spark jobs in a loop

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import datetime
import os

# Optional date argument (YYYYMMDD) makes backfills easy; default to yesterday
if len(sys.argv) < 2:
    today = datetime.datetime.today()
    oneday = datetime.timedelta(days=1)
    yesterday = today - oneday
    datestr = yesterday.strftime("%Y%m%d")
else:
    datestr = sys.argv[1]

# Set environment variables. Note: os.system("export ...") would run in a
# throwaway subshell and have no effect on later commands; os.environ is
# inherited by every child process this script spawns.
os.environ["PYTHONIOENCODING"] = "utf8"
os.environ["SPARK_HOME"] = "/usr/local/spark-2.1.1-bin-hadoop2.6"
os.environ["JAVA_HOME"] = "/usr/local/jdk1.8.0_162"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

SPARK_SUBMIT = "/usr/local/spark-2.1.1-bin-hadoop2.6/bin/spark-submit"
COMMON_ARGS = ("--master yarn --deploy-mode client "
               "--queue root.production.userprofile "
               "--executor-cores 2 --num-executors 50")

# Tag jobs: (script path, driver memory, executor memory)
jobs = [
    ("/home/userprofile/userprofile_cookieid_gender.py",         "1g", "8g"),
    ("/home/userprofile/userprofile_cookieid_country.py",        "1g", "4g"),
    ("/home/userprofile/userprofile_cookieid_install_days.py",   "1g", "4g"),
    ("/home/userprofile/userprofile_cookieid_last_paid_days.py", "1g", "4g"),
    ("/home/userprofile/userprofile_cookieid_purchase_state.py", "1g", "4g"),
    ("/home/userprofile/userprofile_cookieid_registed_state.py", "1g", "4g"),
    ("/home/userprofile/userprofile_userid_edm.py",              "4g", "8g"),
]

# Monitoring jobs that check the day's output for anomalies
check_jobs = [
    ("/home/userprofile/checkdata_cookieid.py", "4g", "8g"),
    ("/home/userprofile/checkdata_userid.py",   "4g", "8g"),
]

# Submit each Spark job in turn, passing the date as its first argument
for script, driver_mem, executor_mem in jobs + check_jobs:
    os.system("%s %s --driver-memory %s --executor-memory %s %s %s"
              % (SPARK_SUBMIT, COMMON_ARGS, driver_mem, executor_mem, script, datestr))



