Streaming Data Processing

Author: hustlx

Category: Data Mining

Article link: https://blog.csdn.net/HUSTLX/article/details/50850069

1. Log in to the server directly: ssh 2014210***@thumedia.org -p 6349

Create streaming.py with touch streaming.py, then edit it as follows:

#!/usr/bin/python
# Once a minute, read /tmp/hw3.log and report the ten pages with the highest counts.
import logging
import time

# Configure the logger once; adding a handler on every pass would
# write each message to output.log several times.
logger = logging.getLogger()
handler = logging.FileHandler("output.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

pg2count = {}  # page name -> accumulated count
times = ''     # timestamp of the last valid record read

while True:
    # Counts accumulate across passes. If the log is append-only, clear
    # pg2count here so that old lines are not counted again.
    fp = open('/tmp/hw3.log', 'r')
    for line in fp:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        stamp, page, count = fields[:3]
        if count.isdigit() and page.startswith('Page-'):
            pg2count[page] = pg2count.get(page, 0) + int(count)
            times = stamp
    fp.close()

    # Sort by count, descending, and keep at most ten entries.
    top = sorted(pg2count.items(), key=lambda item: item[1], reverse=True)[:10]
    header = 'the page rank at current time %s is:' % times
    print(header)
    logger.info(header)
    for page, count in top:
        row = '%s\t%d' % (page, count)
        print(row)
        logger.info(row)

    time.sleep(60)  # wait one minute before the next pass
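
The script above re-reads /tmp/hw3.log from the beginning on every pass. If that file is append-only (an assumption; the post does not say how the log is produced), an alternative is to remember how far the previous pass got and parse only the lines added since. The sketch below illustrates that idea. The function names read_new_records, update_counts, and top_pages are hypothetical helpers introduced here, and the record layout "timestamp page count" is inferred from the script rather than documented.

```python
from collections import Counter


def read_new_records(path, offset):
    """Return (records, new_offset) for complete lines found after `offset`.

    Assumes the file is append-only and each record is 'timestamp page count'.
    """
    with open(path, 'rb') as fp:
        fp.seek(offset)
        data = fp.read()
    # Consume only up to the last newline, so a half-written final line
    # is left for the next pass instead of being parsed incomplete.
    end = data.rfind(b'\n') + 1
    records = []
    for raw in data[:end].splitlines():
        fields = raw.decode('utf-8', 'replace').split()
        if len(fields) >= 3 and fields[2].isdigit() and fields[1].startswith('Page-'):
            records.append((fields[0], fields[1], int(fields[2])))
    return records, offset + end


def update_counts(counts, records):
    """Add each record's count to the running total for its page."""
    for _stamp, page, count in records:
        counts[page] += count
    return counts


def top_pages(counts, n=10):
    """Return up to n (page, total) pairs, highest total first."""
    return counts.most_common(n)
```

In a polling loop, one would keep counts = Counter() and offset = 0 outside the loop, call read_new_records('/tmp/hw3.log', offset) on each pass, feed the result to update_counts, and store the returned offset for the next pass. This sketch does not handle the log being truncated or rotated.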

2. After writing the code, test it by running python streaming.py. The output is as follows:

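If the message nohup: ignoring input and appending output to `nohup.out' appears (nohup prints it when the script is started through nohup), the script is running in the background, and its output is saved to nohup.out.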