Python: Counting the Source IPs of Website Visits

Please credit the source when reposting: http://blog.csdn.net/l1028386804/article/details/79057671

1. Scenario Description

Given a website's access logs stored on HDFS, we want to count how many requests came from each source IP address. This article implements the count as a MapReduce job written in Python with the mrjob library and runs it on a Hadoop cluster.

2. Implementing the MapReduce Job

【/usr/local/python/source/ipstat.py】

# -*- coding:UTF-8 -*-
'''
Created on 2018-01-14

@author: liuyazhuang
'''
from mrjob.job import MRJob
import re

# Regular expression matching IPv4 addresses
IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

class MRCount(MRJob):

    def mapper(self, key, line):
        # For every IP found in the line, emit a key:value pair where
        # the key is the IP address and the value is an initial count of 1
        for ip in IP_RE.findall(line):
            yield ip, 1

    def reducer(self, ip, occurrences):
        # Sum the counts emitted for each IP
        yield ip, sum(occurrences)

if __name__ == '__main__':
    MRCount.run()
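Note that the loose pattern \d{1,3} also matches strings that are not valid IPv4 addresses (such as 999.999.999.999), and it will pick up any dotted quad in the line, not only the client field. If stricter matching matters, candidates can be validated with the standard library. A sketch, not part of the original job, assuming Python 3.3+ where the ipaddress module is available:

import ipaddress
import re

IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def valid_ips(line):
    # Yield only candidates that parse as real IPv4 addresses
    for candidate in IP_RE.findall(line):
        try:
            ipaddress.ip_address(candidate)  # raises ValueError if invalid
        except ValueError:
            continue
        yield candidate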

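Before submitting to the cluster, the job can be smoke-tested locally with mrjob's default inline runner. A minimal sketch, assuming mrjob 0.6+ (whose sandbox/make_runner/parse_output testing helpers are used here) and that ipstat.py is importable from the current directory; the sample log lines are made up for illustration:

from io import BytesIO
from ipstat import MRCount

sample = (b'10.2.2.2 - - "GET / HTTP/1.1" 200\n'
          b'10.2.2.2 - - "GET /index.html HTTP/1.1" 200\n')

job = MRCount(args=[])              # no -r flag: uses the inline runner
job.sandbox(stdin=BytesIO(sample))  # feed the sample log as stdin
with job.make_runner() as runner:
    runner.run()
    for ip, count in job.parse_output(runner.cat_output()):
        print(ip, count)            # prints: 10.2.2.2 2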
3. Submitting the MapReduce Job

Run the following command:

python ipstat.py -r hadoop --jobconf mapreduce.job.priority=VERY_HIGH --jobconf mapreduce.map.tasks=2 --jobconf mapreduce.reduce.tasks=1 -o hdfs://liuyazhuang121:9000/output/ipstat hdfs://liuyazhuang121:9000/user/root/website.com/20180114

Here -r hadoop selects mrjob's Hadoop runner, the --jobconf options set the job priority and the numbers of map and reduce tasks, -o specifies the HDFS output directory, and the final argument is the HDFS input path. The command prints the following log:

[root@liuyazhuang121 source]# python ipstat.py -r hadoop --jobconf mapreduce.job.priority=VERY_HIGH --jobconf mapreduce.map.tasks=2 --jobconf mapreduce.reduce.tasks=1 -o hdfs://liuyazhuang121:9000/output/ipstat hdfs://liuyazhuang121:9000/user/root/website.com/20180114

No configs found; falling back on auto-configuration

No configs specified for hadoop runner

Looking for hadoop binary in $PATH...

Found hadoop binary: /usr/local/hadoop-2.5.2/bin/hadoop

Using Hadoop version 2.5.2

Looking for Hadoop streaming jar in /usr/local/hadoop-2.5.2...

Found Hadoop streaming jar: /usr/local/hadoop-2.5.2/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar

Creating temp directory /tmp/ipstat.root.20180114.091040.605990

Copying local files to hdfs:///user/root/tmp/mrjob/ipstat.root.20180114.091040.605990/files/...

Running step 1 of 1...

packageJobJar: [/usr/local/hadoop-2.5.2/tmp/hadoop-unjar4828642106994965791/] [] /tmp/streamjob4775985125407933464.jar tmpDir=null

Connecting to ResourceManager at liuyazhuang121/192.168.209.121:8032

Connecting to ResourceManager at liuyazhuang121/192.168.209.121:8032

Total input paths to process : 1

number of splits:2

Submitting tokens for job: job_1515893542122_0010

Submitted application application_1515893542122_0010

The url to track the job: http://liuyazhuang121:8088/proxy/application_1515893542122_0010/

Running job: job_1515893542122_0010

Job job_1515893542122_0010 running in uber mode : false

map 0% reduce 0%

map 100% reduce 0%

map 100% reduce 100%

Job job_1515893542122_0010 completed successfully

Output directory: hdfs://liuyazhuang121:9000/output/ipstat

Counters: 49

File Input Format Counters

Bytes Read=2355499

File Output Format Counters

Bytes Written=303

File System Counters

FILE: Number of bytes read=176261

FILE: Number of bytes written=657303

FILE: Number of large read operations=0

FILE: Number of read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=2355749

HDFS: Number of bytes written=303

HDFS: Number of large read operations=0

HDFS: Number of read operations=9

HDFS: Number of write operations=2

Job Counters

Data-local map tasks=2

Launched map tasks=2

Launched reduce tasks=1

Total megabyte-seconds taken by all map tasks=7339008

Total megabyte-seconds taken by all reduce tasks=3062784

Total time spent by all map tasks (ms)=7167

Total time spent by all maps in occupied slots (ms)=7167

Total time spent by all reduce tasks (ms)=2991

Total time spent by all reduces in occupied slots (ms)=2991

Total vcore-seconds taken by all map tasks=7167

Total vcore-seconds taken by all reduce tasks=2991

Map-Reduce Framework

CPU time spent (ms)=3780

Combine input records=0

Combine output records=0

Failed Shuffles=0

GC time elapsed (ms)=77

Input split bytes=250

Map input records=7555

Map output bytes=154577

Map output materialized bytes=176267

Map output records=10839

Merged Map outputs=2

Physical memory (bytes) snapshot=656932864

Reduce input groups=19

Reduce input records=10839

Reduce output records=19

Reduce shuffle bytes=176267

Shuffled Maps =2

Spilled Records=21678

Total committed heap usage (bytes)=468189184

Virtual memory (bytes) snapshot=2660089856

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

Streaming final output from hdfs://liuyazhuang121:9000/output/ipstat...

"10.2.2.105" 6

"10.2.2.113" 94

"10.2.2.116" 125

"10.2.2.144" 176

"10.2.2.186" 64

"10.2.2.190" 41

"10.2.2.2" 2925

"10.2.2.209" 921

"10.2.2.230" 424

"10.2.2.234" 1889

"10.2.2.24" 733

"10.2.2.250" 2018

"10.2.2.44" 40

"10.2.2.54" 1138

"10.2.2.86" 109

"10.2.2.95" 86

"10.2.2.97" 43

"8.8.3.167" 6

"9.0.6.0" 1

Removing HDFS temp directory hdfs:///user/root/tmp/mrjob/ipstat.root.20180114.091040.605990...

Removing temp directory /tmp/ipstat.root.20180114.091040.605990...

As the end of the log shows, the per-IP counts are streamed back and printed.

4. Verifying the Results

Run:

hadoop fs -ls /output/ipstat

to list the output files:

[root@liuyazhuang121 source]# hadoop fs -ls /output/ipstat

Found 2 items

-rw-r--r-- 1 root supergroup 0 2018-01-14 17:11 /output/ipstat/_SUCCESS

-rw-r--r-- 1 root supergroup 303 2018-01-14 17:11 /output/ipstat/part-00000

Then run:

hadoop fs -cat /output/ipstat/part-00000

to view the result:

[root@liuyazhuang121 source]# hadoop fs -cat /output/ipstat/part-00000

"10.2.2.105" 6

"10.2.2.113" 94

"10.2.2.116" 125

"10.2.2.144" 176

"10.2.2.186" 64

"10.2.2.190" 41

"10.2.2.2" 2925

"10.2.2.209" 921

"10.2.2.230" 424

"10.2.2.234" 1889

"10.2.2.24" 733

"10.2.2.250" 2018

"10.2.2.44" 40

"10.2.2.54" 1138

"10.2.2.86" 109

"10.2.2.95" 86

"10.2.2.97" 43

"8.8.3.167" 6

"9.0.6.0" 1
