Running Hadoop from Python

Date helper function

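from datetime import date, timedelta

def last_n_days(current_date=None, n=0):
    # n in (0, 1): one date string; n > 1: list of the previous n days, oldest first.
    if current_date is None:
        # Resolve at call time; a date.today() default is evaluated only once, at definition.
        current_date = date.today()
    if n in (0, 1):
        return str(current_date - timedelta(days=n))
    return [str(current_date - timedelta(days=x)) for x in range(n, 0, -1)]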
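
To make the return values concrete, here is a small self-contained sketch that repeats the helper and calls it with a fixed date. The date 2014-03-10 is an arbitrary value chosen for illustration, not something from the original job.

```python
from datetime import date, timedelta

def last_n_days(current_date=None, n=0):
    # Mirrors the helper above, repeated so this snippet runs on its own.
    if current_date is None:
        current_date = date.today()
    if n in (0, 1):
        return str(current_date - timedelta(days=n))
    return [str(current_date - timedelta(days=x)) for x in range(n, 0, -1)]

# A fixed, arbitrary date keeps the results reproducible.
fixed = date(2014, 3, 10)
today_str = last_n_days(fixed)         # n=0: the date itself, as a string
yesterday_str = last_n_days(fixed, 1)  # n=1: the previous day, as a string
last_three = last_n_days(fixed, 3)     # n>1: a list, oldest first, excluding the date itself
print(today_str, yesterday_str, last_three)
```

Note that the helper returns a plain string for n of 0 or 1 and a list otherwise, so callers that loop over the result should pass n greater than 1.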

Generating the shell command

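# -*- coding: utf-8 -*-
import shlex
import subprocess

# last_n_days is the helper defined above; n=7 gives the previous seven days.
file_list = last_n_days(n=7)
mapper = "mapper.py"
reducer = "reducer.py"
# One -input option per day; the * glob is left for Hadoop to expand on HDFS.
input_files = " ".join(['-input /dm/qq/userinfo_qq/{date}-*/qq_guid.txt'.format(date=each_date) for each_date in file_list])
output = '/dm/qq/merge'

# The trailing backslashes join the lines, so mr_cmd is a single-line command.
mr_cmd = """hadoop jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.0.jar \
-output {output} \
-mapper  'python {mapper}' \
-reducer 'python {reducer}' \
-file {mapper}  \
-file {reducer}  \
{input_files}""".format(output=output, mapper=mapper,
                        reducer=reducer, input_files=input_files)


if __name__ == "__main__":
    print(mr_cmd)
    # subprocess.call needs an argument list (or shell=True) to run a command string;
    # shlex.split keeps the quoted 'python ...' parts as single arguments.
    subprocess.call(shlex.split(mr_cmd))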
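
Instead of formatting one long command string, the same command can be assembled directly as an argument list, which avoids quoting issues altogether. The sketch below is one possible way to do that, not the original author's method. The names `STREAMING_JAR`, `build_streaming_cmd`, and `run_job` are hypothetical, and the example date is arbitrary; the jar path, HDFS paths, and script names are copied from the command above.

```python
import subprocess
from datetime import date, timedelta

# Path copied from the command above; adjust it for your cluster.
STREAMING_JAR = ("/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/"
                 "hadoop-0.20-mapreduce/contrib/streaming/"
                 "hadoop-streaming-2.0.0-mr1-cdh4.2.0.jar")

def build_streaming_cmd(dates, mapper, reducer, output):
    """Return the hadoop streaming command as a list of arguments (hypothetical helper)."""
    cmd = ["hadoop", "jar", STREAMING_JAR,
           "-output", output,
           "-mapper", "python " + mapper,
           "-reducer", "python " + reducer,
           "-file", mapper,
           "-file", reducer]
    # One -input option per date, using the same path pattern as above.
    for d in dates:
        cmd += ["-input", "/dm/qq/userinfo_qq/{0}-*/qq_guid.txt".format(d)]
    return cmd

def run_job(cmd):
    # check_call raises subprocess.CalledProcessError on a non-zero exit status,
    # so a failed job is not silently ignored.
    subprocess.check_call(cmd)

# Two example dates counted back from an arbitrary fixed day.
example_dates = [str(date(2014, 3, 10) - timedelta(days=x)) for x in range(2, 0, -1)]
example_cmd = build_streaming_cmd(example_dates, "mapper.py", "reducer.py", "/dm/qq/merge")
print(" ".join(example_cmd))
```

Calling `run_job(example_cmd)` would submit the job; it is left out here so the snippet can be run without a Hadoop installation.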