Python利用groupby模块进行Mysql分表数据的汇总统计

Mysql分表数据,一般情况下我们是把他导入到数据仓库中整合到一张表中;由于环境受限,所以考虑使用Python脚本取分表数据在Python中进行计算的形式。
1、取Mysql分表进行汇总计算的Python脚本
/Users/nisj/PycharmProjects/BiDataProc/love/staticFromMysqlSubmeter.py
# -*- coding=utf-8 -*-
import os
import re
from itertools import groupby
from operator import itemgetter

def getMysqlSubmeterData(submeter_cnt, src_tabName):
    allDataList=[]
    for submeterPlus in range(0, submeter_cnt, 1):
        submeterData=os.popen("""source /etc/profile; \
                /usr/local/mysql/bin/mysql -hMysqlHost -PMysqlPort -uMysqlUser -pMysqlPass -Djellyfish_quest \
                -N -e"set names utf8; \
                select uid,\`date\` \
                from {src_tabName}_{submeterPlus} \
                where \`date\` between '2017-11-24' and '2017-11-27' and task_id in(14,15) and state=1;" \
                """.format(src_tabName=src_tabName, submeterPlus=submeterPlus)).readlines();

        submeterDataList = []
        for submeterDataRow in submeterData:
            submeterD = re.split('\t', submeterDataRow.replace('\n', '').replace('`', '').replace('\'', '').replace('"', ''))
            submeterDataList.append(submeterD)

        allDataList.extend(submeterDataList)

    duplicateRemovalList=[]
    for allD in allDataList:
        if allD not in duplicateRemovalList:
            duplicateRemovalList.append(allD)

    groupbyAllData = groupby(sorted(duplicateRemovalList, key=itemgetter(1)), itemgetter(1))

    for key, item in groupbyAllData:
        i = 0
        for jtem in item:
            i += 1
        print key, i

# Batch Test
getMysqlSubmeterData(submeter_cnt=256, src_tabName='user_task_done_list')
2、在Hive进行Sql查询的脚本
select `date`,count(distinct uid) 
from oss_user_task_done_list 
where `date` between '2017-11-24' and '2017-11-27' and task_id in(14,15) and state=1
group by `date`;
3、另一个数据分组汇总的简单示例
/Users/nisj/PycharmProjects/BiDataProc/love/listGroup.py
# -*- coding=utf-8 -*-
from itertools import groupby

lst = [1, 2, 8, 11, 25, 43, 6, 9, 29, 51, 66, 100]

def dataClassification(num):
    if num <= 10:
        return 'less'
    elif num >= 30:
        return 'great'
    else:
        return 'middle'

# Batch Test
print [(k, list(g))for k, g in groupby(sorted(lst), key=dataClassification)]
for key, item in groupby(sorted(lst), key=dataClassification):
    i = 0
    for jtem in item:
        i += 1
    print key, i

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值