[dev@cdh1 python]$ vim genWordCount.py
import random

lines = []                # output buffer (avoid shadowing the built-in `list`)
line_num = 0
with open("wordcount.txt", "a") as w:     # note: append mode, so rerunning grows the file
    for _ in range(5000000):
        word_len = random.randint(3, 10)  # one length shared by all 10 words on this line
        line_text = ''
        for k in range(10):               # 10 words per line
            for _ in range(word_len):
                # 65-90 is 'A'-'Z'; adding 32 half the time makes the letter lowercase
                code = random.randint(65, 90) + random.randint(0, 1) * 32
                line_text = line_text + chr(code)
            line_text = line_text + ' '
        lines.append("%s\n" % line_text)
        line_num = line_num + 1
        if line_num == 100000:            # flush the buffer every 100000 lines
            w.writelines(lines)
            lines = []
            line_num = 0
    w.writelines(lines)                   # flush whatever is left in the buffer
print("ok")
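Before generating, the expected file size can be sanity-checked with a little arithmetic (assuming the script above: 10 words per line, uniform word length 3..10, one space per word, one newline per line):

```python
# Rough size estimate for the generated wordcount.txt
words_per_line = 10
mean_word_len = (3 + 10) / 2                  # randint(3, 10) is inclusive, so the mean is 6.5
bytes_per_line = words_per_line * (mean_word_len + 1) + 1   # +1 space per word, +1 for '\n'
expected = int(bytes_per_line * 5_000_000)
print(expected)   # 380000000 -- close to the 380015680 bytes `ll` reports below
```

The small difference from the actual size comes from the word length being a random draw per line rather than exactly its mean.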
[dev@cdh1 python]$ more wordcount.txt
OwYVu FZFmf rcTfN hfJPd ktudT kaFSe qnuPb dUBwk LnNyB ccAUP
Myi xXt eSK ANw bkS VMD Qsm BaK RcN CHs
vvaDwZJLq XAMdQfThL sGUTdChFo KQdUSauwp nXCWfjTQW xrjtiDVjh rdrUAQFFS fqzhSZjfC DhRrUOnMk HtxsgNXFg
OhRsZyMAJh qklXriwJCu TgwhmNgbdO EKKYqnAXCK qiPsprOfnf oShHouQBRJ PsvrtsfDfx NJOhGVEtxe RNBVqMrXol RhxgFhXAre
psR NuZ RwB Ngi NLo KEN rbz MLA YVK MrX
uUgUEvAEkR uvfMfRHwHy TRFDbrMRVP xYMcmuvlyr dIfwBmMXuE eCNHRWlehQ gphcRCupFa jxVXdCdgCu RTMGMawYUx TSjXsamBAy
euwT nTOv jLuP wnUh KOpT RLsM TpdN hqHN hsPa VFEx
JqgVRqI vCTuWLB aHmCreQ zSbYaOq gFBzOXN LCxjYaL bAjWEQM tFOOLSl HgVYXAj SnsDELu
XQzE gLyB dfpj CuCU KGjZ PyMz Fsox hOUn drZC eNRB
XszjlJ ObBEqB QCXYAm qoLIWP AZXsuU knUYAd SAhmrH ioHSLu wymRUO NuzeAa
dikK Kmdb YQNH nifO yVAy rFzz iIUH cWtm PJHF Dcti
YJLKPr xYgsom oQKlxq dThhgR CYyFqD RYjSHp cfwkpp aypUIy JxBmtZ udQhkW
psZsnAiA lnSpkbdB sXmepqIh hCOxkcEu LBrDzoDM SZatqPHI tqhPfzyI ZHdyMKuK NGydGlyz bTtWMZnJ
wqXV iCnN yFSX dEkc XQOs TjGE VbWI AvSq hYxI OrZq
Jrqevc dRxuQw NBXGPE GaoWhr rsYOZu OMafgd gKKqak XDKKBy tvKiyo SPXRxk
uEQrSYqiX LJINyYBON NtPnJAeVU FYgcSeOTs kMhyldbte iySsQshxQ JWZbtmPqH MMoUvnalo OKiUctOjH jfpMUAxml
vQv tkn Txo JaA xJt dUY Wgi rJs GWP jKu
......
[dev@cdh1 python]$ ll
-rw-rw-r-- 1 dev dev 380015680 Mar  8 17:35 wordcount.txt
[dev@cdh1 hadoop-mapreduce]$ find / -name hadoop-mapreduce 2>/dev/null
/home/cdh/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce
/home/log/hadoop-mapreduce
/var/lib/hadoop-mapreduce
[dev@cdh1 hadoop-mapreduce]$ hadoop fs -mkdir -p /test/wordcount
[dev@cdh1 hadoop-mapreduce]$ hadoop fs -mkdir -p /output
[dev@cdh1 hadoop-mapreduce]$ hadoop fs -put wordcount.txt /test/wordcount
hdfs dfs -rmdir /output/wordcount
[dev@cdh1 python]$ hadoop jar /home/cdh/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /test/wordcount /output/wordcount
19/03/08 17:49:27 INFO client.RMProxy: Connecting to ResourceManager at cdh1/10.3.1.8:8032
19/03/08 17:49:28 INFO input.FileInputFormat: Total input paths to process : 1
19/03/08 17:49:28 INFO mapreduce.JobSubmitter: number of splits:3
19/03/08 17:49:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1544093812828_13063
19/03/08 17:49:32 INFO impl.YarnClientImpl: Submitted application application_1544093812828_13063
19/03/08 17:49:32 INFO mapreduce.Job: The url to track the job: http://cdh1:8088/proxy/application_1544093812828_13063/
19/03/08 17:49:32 INFO mapreduce.Job: Running job: job_1544093812828_13063
19/03/08 17:49:38 INFO mapreduce.Job: Job job_1544093812828_13063 running in uber mode : false
19/03/08 17:49:38 INFO mapreduce.Job: map 0% reduce 0%
19/03/08 17:49:55 INFO mapreduce.Job: map 24% reduce 0%
19/03/08 17:49:57 INFO mapreduce.Job: map 38% reduce 0%
19/03/08 17:50:06 INFO mapreduce.Job: map 56% reduce 0%
19/03/08 17:50:10 INFO mapreduce.Job: map 60% reduce 0%
19/03/08 17:50:13 INFO mapreduce.Job: map 61% reduce 0%
19/03/08 17:50:16 INFO mapreduce.Job: map 66% reduce 0%
19/03/08 17:50:25 INFO mapreduce.Job: map 67% reduce 0%
19/03/08 17:50:28 INFO mapreduce.Job: map 73% reduce 0%
19/03/08 17:50:31 INFO mapreduce.Job: map 96% reduce 0%
19/03/08 17:50:37 INFO mapreduce.Job: map 100% reduce 0%
19/03/08 17:50:53 INFO mapreduce.Job: map 100% reduce 51%
19/03/08 17:50:59 INFO mapreduce.Job: map 100% reduce 62%
19/03/08 17:51:05 INFO mapreduce.Job: map 100% reduce 69%
19/03/08 17:51:11 INFO mapreduce.Job: map 100% reduce 76%
19/03/08 17:51:18 INFO mapreduce.Job: map 100% reduce 82%
19/03/08 17:51:24 INFO mapreduce.Job: map 100% reduce 88%
19/03/08 17:51:30 INFO mapreduce.Job: map 100% reduce 93%
19/03/08 17:51:36 INFO mapreduce.Job: map 100% reduce 100%
19/03/08 17:51:38 INFO mapreduce.Job: Job job_1544093812828_13063 completed successfully
19/03/08 17:51:39 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=679531081
		FILE: Number of bytes written=1019514163
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=380147082
		HDFS: Number of bytes written=423790270
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=650992
		Total time spent by all reduces in occupied slots (ms)=230552
		Total time spent by all map tasks (ms)=162748
		Total time spent by all reduce tasks (ms)=57638
		Total vcore-milliseconds taken by all map tasks=162748
		Total vcore-milliseconds taken by all reduce tasks=57638
		Total megabyte-milliseconds taken by all map tasks=1666539520
		Total megabyte-milliseconds taken by all reduce tasks=590213120
	Map-Reduce Framework
		Map input records=5000000
		Map output records=50000000
		Map output bytes=575015680
		Map output materialized bytes=339394577
		Input split bytes=330
		Combine input records=81437418
		Combine output records=75023053
		Reduce input groups=41797229
		Reduce shuffle bytes=339394577
		Reduce input records=43585635
		Reduce output records=41797229
		Spilled Records=131625864
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=2315
		CPU time spent (ms)=305710
		Physical memory (bytes) snapshot=10213470208
		Virtual memory (bytes) snapshot=43881082880
		Total committed heap usage (bytes)=15340142592
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=380146752
	File Output Format Counters
		Bytes Written=423790270
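The "number of splits:3" in the log (and the 3 launched map tasks) follows directly from the input size and the cluster's 128 MB HDFS block size:

```python
import math

block = 128 * 1024 * 1024   # dfs.blocksize on this cluster (134217728 bytes)
size = 380015680            # wordcount.txt size reported by `ll`
splits = math.ceil(size / block)
print(splits)               # 3 -- one map task per split
```

With the default FileInputFormat, split size equals block size, so a ~362 MB file yields two full splits plus one partial one.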
[dev@cdh1 python]$ hadoop fs -ls /output/wordcount
Found 2 items
-rw-r--r-- 3 dev supergroup 0 2019-03-08 17:51 /output/wordcount/_SUCCESS
-rw-r--r-- 3 dev supergroup 423790270 2019-03-08 17:51 /output/wordcount/part-r-00000
[dev@cdh1 python]$ hadoop fs -text /output/wordcount/part-r-00000|head -20
AAA 50
AAAAWWm 1
AAAAY 1
AAAAZ 1
AAAAg 1
AAAB 4
AAABbJUK 1
AAABoUvdAd 1
AAAC 2
AAACDYgt 1
AAACEZQ 1
AAACSY 1
AAACxfUj 1
AAACyn 1
AAAD 1
AAADpauX 1
AAADu 1
AAADuVgje 1
AAADwUf 1
AAAE 2
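The actual job is the Java WordCount from the examples jar, but its logic can be sketched in a few lines of Python: map emits (word, 1) per whitespace-separated token, the combiner/reducer sums per key, and the reducer output is sorted by key, which is why part-r-00000 starts with the all-uppercase words:

```python
from collections import Counter

def wordcount(lines):
    # map + combine/reduce: count whitespace-separated tokens per key
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # reducer output is sorted by key, like part-r-00000 above
    return sorted(counts.items())

print(wordcount(["AAA bbb AAA", "bbb ccc"]))
# [('AAA', 2), ('bbb', 2), ('ccc', 1)]
```

Uppercase letters sort before lowercase in ASCII, matching the `AAA...` keys at the head of the output.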
[dev@cdh1 python]$ hadoop fsck /test/wordcount/wordcount.txt -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://cdh1:50070/fsck?ugi=dev&files=1&blocks=1&locations=1&path=%2Ftest%2Fwordcount%2Fwordcount.txt
FSCK started by dev (auth:SIMPLE) from /10.3.1.8 for path /test/wordcount/wordcount.txt at Fri Mar 08 18:04:57 CST 2019
/test/wordcount/wordcount.txt 380015680 bytes, 3 block(s): OK
0. BP-1123673671-10.3.1.8-1513763234376:blk_1075796371_2055633 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
1. BP-1123673671-10.3.1.8-1513763234376:blk_1075796377_2055639 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
2. BP-1123673671-10.3.1.8-1513763234376:blk_1075796378_2055640 len=111580224 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
Status: HEALTHY
Total size: 380015680 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 126671893 B)
Minimally replicated blocks: 3 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Mar 08 18:04:57 CST 2019 in 0 milliseconds
The filesystem under path '/test/wordcount/wordcount.txt' is HEALTHY
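The three block lengths fsck reports for the input file add up exactly to its size, two full 128 MB blocks plus a partial last block:

```python
# Block lengths from the fsck output above
blocks = [134217728, 134217728, 111580224]
print(sum(blocks))   # 380015680 -- matches "Total size" and the `ll` listing
```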
[dev@cdh1 python]$ hadoop fsck /output/wordcount/part-r-00000 -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://cdh1:50070/fsck?ugi=dev&files=1&blocks=1&locations=1&path=%2Foutput%2Fwordcount%2Fpart-r-00000
FSCK started by dev (auth:SIMPLE) from /10.3.1.8 for path /output/wordcount/part-r-00000 at Fri Mar 08 18:03:42 CST 2019
/output/wordcount/part-r-00000 423790270 bytes, 4 block(s): OK
0. BP-1123673671-10.3.1.8-1513763234376:blk_1075796393_2055655 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
1. BP-1123673671-10.3.1.8-1513763234376:blk_1075796394_2055656 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK]]
2. BP-1123673671-10.3.1.8-1513763234376:blk_1075796395_2055657 len=134217728 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK]]
3. BP-1123673671-10.3.1.8-1513763234376:blk_1075796396_2055658 len=21137086 Live_repl=3 [DatanodeInfoWithStorage[10.3.1.14:50010,DS-c2ae84aa-516f-4e1f-b255-76a00a7cc0f2,DISK], DatanodeInfoWithStorage[10.3.1.13:50010,DS-cf291cd2-a872-490d-8c70-52f6ce58399a,DISK], DatanodeInfoWithStorage[10.3.1.9:50010,DS-36a9eb13-f4a3-4ba8-bf84-a53e081ebb89,DISK]]
Status: HEALTHY
Total size: 423790270 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 4 (avg. block size 105947567 B)
Minimally replicated blocks: 4 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Fri Mar 08 18:03:42 CST 2019 in 1 milliseconds
The filesystem under path '/output/wordcount/part-r-00000' is HEALTHY
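The same check holds for the result file: three full blocks plus a small tail block account for every byte the reducer wrote:

```python
# Block lengths for part-r-00000 from the fsck output above
blocks = [134217728, 134217728, 134217728, 21137086]
print(sum(blocks))   # 423790270 -- matches "HDFS: Number of bytes written" in the job counters
```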