MapReduce WordCount on CDH

A walk-through of the classic wordcount example on a CDH 5.13 cluster: generate roughly 380 MB of random words with a Python script, upload the file to HDFS, run the wordcount job from the bundled examples jar, and inspect the input and output with hadoop fsck.

First, the data generator:

[dev@cdh1 python]$ vim genWordCount.py
import random

# Generates 5,000,000 lines of 10 random words each (~380 MB).
# Note: the file is opened in append mode, so remove wordcount.txt
# before re-running.
buf = []
buffered = 0
with open("wordcount.txt", "a") as w:
    for _ in range(5000000):
        # one length per line: all 10 words on a line are equally long
        word_len = random.randint(3, 10)
        line = ''
        for _ in range(10):
            for _ in range(word_len):
                # a letter A-Z (65-90), shifted to lowercase (+32) half the time
                line += chr(random.randint(65, 90) + random.randint(0, 1) * 32)
            line += ' '
        buf.append(line + '\n')
        buffered += 1
        if buffered == 100000:   # flush the buffer every 100,000 lines
            w.writelines(buf)
            buf = []
            buffered = 0
    w.writelines(buf)            # flush whatever is left
print("OK")

Each line holds ten words of the same length, since the length is drawn once per line; a quick look at the file confirms it:

[dev@cdh1 python]$ more wordcount.txt 
OwYVu FZFmf rcTfN hfJPd ktudT kaFSe qnuPb dUBwk LnNyB ccAUP 
Myi xXt eSK ANw bkS VMD Qsm BaK RcN CHs 
vvaDwZJLq XAMdQfThL sGUTdChFo KQdUSauwp nXCWfjTQW xrjtiDVjh rdrUAQFFS fqzhSZjfC DhRrUOnMk HtxsgNXFg 
OhRsZyMAJh qklXriwJCu TgwhmNgbdO EKKYqnAXCK qiPsprOfnf oShHouQBRJ PsvrtsfDfx NJOhGVEtxe RNBVqMrXol RhxgFhXAre 
psR NuZ RwB Ngi NLo KEN rbz MLA YVK MrX 
uUgUEvAEkR uvfMfRHwHy TRFDbrMRVP xYMcmuvlyr dIfwBmMXuE eCNHRWlehQ gphcRCupFa jxVXdCdgCu RTMGMawYUx TSjXsamBAy 
euwT nTOv jLuP wnUh KOpT RLsM TpdN hqHN hsPa VFEx 
JqgVRqI vCTuWLB aHmCreQ zSbYaOq gFBzOXN LCxjYaL bAjWEQM tFOOLSl HgVYXAj SnsDELu 
XQzE gLyB dfpj CuCU KGjZ PyMz Fsox hOUn drZC eNRB 
XszjlJ ObBEqB QCXYAm qoLIWP AZXsuU knUYAd SAhmrH ioHSLu wymRUO NuzeAa 
dikK Kmdb YQNH nifO yVAy rFzz iIUH cWtm PJHF Dcti 
YJLKPr xYgsom oQKlxq dThhgR CYyFqD RYjSHp cfwkpp aypUIy JxBmtZ udQhkW 
psZsnAiA lnSpkbdB sXmepqIh hCOxkcEu LBrDzoDM SZatqPHI tqhPfzyI ZHdyMKuK NGydGlyz bTtWMZnJ 
wqXV iCnN yFSX dEkc XQOs TjGE VbWI AvSq hYxI OrZq 
Jrqevc dRxuQw NBXGPE GaoWhr rsYOZu OMafgd gKKqak XDKKBy tvKiyo SPXRxk 
uEQrSYqiX LJINyYBON NtPnJAeVU FYgcSeOTs kMhyldbte iySsQshxQ JWZbtmPqH MMoUvnalo OKiUctOjH jfpMUAxml 
vQv tkn Txo JaA xJt dUY Wgi rJs GWP jKu 
......

On disk the file comes to about 380 MB:

[dev@cdh1 python]$ ll
-rw-rw-r-- 1 dev dev 380015680 Mar  8 17:35 wordcount.txt


Locate the hadoop-mapreduce directory that CDH ships; the examples jar used below lives under the parcel path:

[dev@cdh1 hadoop-mapreduce]$ find / -name hadoop-mapreduce 2>/dev/null
/home/cdh/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce
/home/log/hadoop-mapreduce
/var/lib/hadoop-mapreduce

Create the HDFS directories and upload the file:

[dev@cdh1 hadoop-mapreduce]$ hadoop fs -mkdir -p /test/wordcount
[dev@cdh1 hadoop-mapreduce]$ hadoop fs -mkdir -p /output


[dev@cdh1 hadoop-mapreduce]$ hadoop fs -put wordcount.txt /test/wordcount

If a previous run left an output directory behind, clear it first (-rmdir only removes an empty directory; for a non-empty one use hdfs dfs -rm -r):

hdfs dfs -rmdir /output/wordcount

Then run the wordcount example from the bundled jar:

[dev@cdh1 python]$ hadoop jar /home/cdh/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar  wordcount  /test/wordcount /output/wordcount
19/03/08 17:49:27 INFO client.RMProxy: Connecting to ResourceManager at cdh1/10.3.1.8:8032
19/03/08 17:49:28 INFO input.FileInputFormat: Total input paths to process : 1
19/03/08 17:49:28 INFO mapreduce.JobSubmitter: number of splits:3
19/03/08 17:49:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1544093812828_13063
19/03/08 17:49:32 INFO impl.YarnClientImpl: Submitted application application_1544093812828_13063
19/03/08 17:49:32 INFO mapreduce.Job: The url to track the job: http://cdh1:8088/proxy/application_1544093812828_13063/
19/03/08 17:49:32 INFO mapreduce.Job: Running job: job_1544093812828_13063
19/03/08 17:49:38 INFO mapreduce.Job: Job job_1544093812828_13063 running in uber mode : false
19/03/08 17:49:38 INFO mapreduce.Job:  map 0% reduce 0%
19/03/08 17:49:55 INFO mapreduce.Job:  map 24% reduce 0%
19/03/08 17:49:57 INFO mapreduce.Job:  map 38% reduce 0%
19/03/08 17:50:06 INFO mapreduce.Job:  map 56% reduce 0%
19/03/08 17:50:10 INFO mapreduce.Job:  map 60% reduce 0%
19/03/08 17:50:13 INFO mapreduce.Job:  map 61% reduce 0%
19/03/08 17:50:16 INFO mapreduce.Job:  map 66% reduce 0%
19/03/08 17:50:25 INFO mapreduce.Job:  map 67% reduce 0%
19/03/08 17:50:28 INFO mapreduce.Job:  map 73% reduce 0%
19/03/08 17:50:31 INFO mapreduce.Job:  map 96% reduce 0%
19/03/08 17:50:37 INFO mapreduce.Job:  map 100% reduce 0%
19/03/08 17:50:53 INFO mapreduce.Job:  map 100% reduce 51%
19/03/08 17:50:59 INFO mapreduce.Job:  map 100% reduce 62%
19/03/08 17:51:05 INFO mapreduce.Job:  map 100% reduce 69%
19/03/08 17:51:11 INFO mapreduce.Job:  map 100% reduce 76%
19/03/08 17:51:18 INFO mapreduce.Job:  map 100% reduce 82%
19/03/08 17:51:24 INFO mapreduce.Job:  map 100% reduce 88%
19/03/08 17:51:30 INFO mapreduce.Job:  map 100% reduce 93%
19/03/08 17:51:36 INFO mapreduce.Job:  map 100% reduce 100%
19/03/08 17:51:38 INFO mapreduce.Job: Job job_1544093812828_13063 completed successfully
19/03/08 17:51:39 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=679531081
		FILE: Number of bytes written=1019514163
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=380147082
		HDFS: Number of bytes written=423790270
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=650992
		Total time spent by all reduces in occupied slots (ms)=230552
		Total time spent by all map tasks (ms)=162748
		Total time spent by all reduce tasks (ms)=57638
		Total vcore-milliseconds taken by all map tasks=162748
		Total vcore-milliseconds taken by all reduce tasks=57638
		Total megabyte-milliseconds taken by all map tasks=1666539520
		Total megabyte-milliseconds taken by all reduce tasks=590213120
	Map-Reduce Framework
		Map input records=5000000
		Map output records=50000000
		Map output bytes=575015680
		Map output materialized bytes=339394577
		Input split bytes=330
		Combine input records=81437418
		Combine output records=75023053
		Reduce input groups=41797229
		Reduce shuffle bytes=339394577
		Reduce input records=43585635
		Reduce output records=41797229
		Spilled Records=131625864
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=2315
		CPU time spent (ms)=305710
		Physical memory (bytes) snapshot=10213470208
		Virtual memory (bytes) snapshot=43881082880
		Total committed heap usage (bytes)=15340142592
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=380146752
	File Output Format Counters 
		Bytes Written=423790270
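
A few of these counters are worth a second look. Map input records=5,000,000 matches the generated line count, and Map output records=50,000,000 matches 10 words per line (5,000,000 × 10). The combiner does much of the aggregation on the map side, so only 43,585,635 records reach the reduce phase instead of 50,000,000, and Reduce input groups=41,797,229 is the number of distinct words, matching Reduce output records.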

With the job finished, list the output:

[dev@cdh1 python]$ hadoop fs -ls /output/wordcount
Found 2 items
-rw-r--r--   3 dev supergroup          0 2019-03-08 17:51 /output/wordcount/_SUCCESS
-rw-r--r--   3 dev supergroup  423790270 2019-03-08 17:51 /output/wordcount/part-r-00000


[dev@cdh1 python]$ hadoop fs -text /output/wordcount/part-r-00000|head -20
AAA	50
AAAAWWm	1
AAAAY	1
AAAAZ	1
AAAAg	1
AAAB	4
AAABbJUK	1
AAABoUvdAd	1
AAAC	2
AAACDYgt	1
AAACEZQ	1
AAACSY	1
AAACxfUj	1
AAACyn	1
AAAD	1
AAADpauX	1
AAADu	1
AAADuVgje	1
AAADwUf	1
AAAE	2
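
As a sanity check, the counts in part-r-00000 should sum back to the 50,000,000 words that were generated (the Map output records counter above), with 41,797,229 distinct words (Reduce output records). A small sketch that tallies the second column from stdin; the script name sumCounts.py is my own:

import sys

total = 0
distinct = 0
for line in sys.stdin:
    # wordcount output lines look like "word<TAB>count"
    word, count = line.rsplit("\t", 1)
    total += int(count)
    distinct += 1
print("distinct: %d  total: %d" % (distinct, total))

Run it with: hadoop fs -text /output/wordcount/part-r-00000 | python sumCounts.py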

Finally, check how HDFS stored the input file across blocks and datanodes:

[dev@cdh1 python]$ hadoop fsck /test/wordcount/wordcount.txt -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://cdh1:50070/fsck?ugi=dev&files=1&blocks=1&locations=1&path=%2Ftest%2Fwordcount%2Fwordcount.txt
FSCK started by dev (auth:SIMPLE) from /10.3.1.8 for path /test/wordcount/wordcount.txt at Fri Mar 08 18:04:57 CST 2019
/test/wordcount/wordcount.txt 380015680 bytes, 3 block(s):  OK
0. BP-1123673671-10.3.1.8-1513763234376:blk_1075796371_2055633 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
1. BP-1123673671-10.3.1.8-1513763234376:blk_1075796377_2055639 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
2. BP-1123673671-10.3.1.8-1513763234376:blk_1075796378_2055640 len=111580224 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
Status: HEALTHY
 Total size:	380015680 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	3 (avg. block size 126671893 B)
 Minimally replicated blocks:	3 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		3
 Number of racks:		1
FSCK ended at Fri Mar 08 18:04:57 CST 2019 in 0 milliseconds
The filesystem under path '/test/wordcount/wordcount.txt' is HEALTHY
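
The block layout lines up with the default 128 MiB block size: 380,015,680 bytes = 2 × 134,217,728 + 111,580,224, i.e. two full blocks plus a tail, which is also why the job reported "number of splits:3" and launched three data-local map tasks.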

And the same check for the result file:

[dev@cdh1 python]$ hadoop fsck /output/wordcount/part-r-00000 -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://cdh1:50070/fsck?ugi=dev&files=1&blocks=1&locations=1&path=%2Foutput%2Fwordcount%2Fpart-r-00000
FSCK started by dev (auth:SIMPLE) from /10.3.1.8 for path /output/wordcount/part-r-00000 at Fri Mar 08 18:03:42 CST 2019
/output/wordcount/part-r-00000 423790270 bytes, 4 block(s):  OK
0. BP-1123673671-10.3.1.8-1513763234376:blk_1075796393_2055655 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
1. BP-1123673671-10.3.1.8-1513763234376:blk_1075796394_2055656 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK]]
2. BP-1123673671-10.3.1.8-1513763234376:blk_1075796395_2055657 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
3. BP-1123673671-10.3.1.8-1513763234376:blk_1075796396_2055658 len=21137086 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK]]
Status: HEALTHY
 Total size:	423790270 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	4 (avg. block size 105947567 B)
 Minimally replicated blocks:	4 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		3
 Number of racks:		1
FSCK ended at Fri Mar 08 18:03:42 CST 2019 in 1 milliseconds
The filesystem under path '/output/wordcount/part-r-00000' is HEALTHY
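
Likewise for the result file: 423,790,270 bytes = 3 × 134,217,728 + 21,137,086, four blocks in all, each with three live replicas spread over the three datanodes.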


