Environment
- Google Cloud Platform
- Ubuntu 14.04 LTS
- Instance: 2 cores, 8GB RAM, 50GB storage
- OpenJDK 7 (openjdk-7-jdk, openjdk-7-jre)
- Hadoop 2.9.2
Part a
i: Single Node Hadoop Setup
The 50070 page of vm-master under pseudo-distributed mode can be found in folder 1155114915-HW0/a/i.
ii: Tera Example on Single Node Machine
This example is run on the vm-master instance.
Below is the command and output of the teragen job; it can also be found under 1155114915-HW0/a/ii/teragen.txt:
ierg5730-gcp-key@vm-master:~/hadoop$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/ierg5730-gcp-key/hadoop/logs/hadoop-ierg5730-gcp-key-namenode-vm-master.out
localhost: starting datanode, logging to /home/ierg5730-gcp-key/hadoop/logs/hadoop-ierg5730-gcp-key-datanode-vm-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/ierg5730-gcp-key/hadoop/logs/hadoop-ierg5730-gcp-key-secondarynamenode-vm-master.out
starting yarn daemons
starting resourcemanager, logging to /home/ierg5730-gcp-key/hadoop/logs/yarn-ierg5730-gcp-key-resourcemanager-vm-master.out
localhost: starting nodemanager, logging to /home/ierg5730-gcp-key/hadoop/logs/yarn-ierg5730-gcp-key-nodemanager-vm-master.out
ierg5730-gcp-key@vm-master:~/hadoop$ jps
2928 DataNode
2729 NameNode
3483 NodeManager
3721 Jps
3168 SecondaryNameNode
3321 ResourceManager
ierg5730-gcp-key@vm-master:~/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teragen 100000 terasort/input
19/01/18 04:38:50 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/01/18 04:38:50 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/01/18 04:38:50 INFO terasort.TeraGen: Generating 100000 using 1
19/01/18 04:38:50 INFO mapreduce.JobSubmitter: number of splits:1
19/01/18 04:38:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1291551154_0001
19/01/18 04:38:51 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/01/18 04:38:51 INFO mapreduce.Job: Running job: job_local1291551154_0001
19/01/18 04:38:51 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/01/18 04:38:51 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 04:38:51 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 04:38:51 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/01/18 04:38:51 INFO mapred.LocalJobRunner: Waiting for map tasks
19/01/18 04:38:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1291551154_0001_m_000000_0
19/01/18 04:38:51 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 04:38:51 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 04:38:51 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/01/18 04:38:51 INFO mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@4352d1fc
19/01/18 04:38:52 INFO mapred.LocalJobRunner:
19/01/18 04:38:52 INFO mapreduce.Job: Job job_local1291551154_0001 running in uber mode : false
19/01/18 04:38:52 INFO mapreduce.Job: map 0% reduce 0%
19/01/18 04:38:52 INFO mapred.Task: Task:attempt_local1291551154_0001_m_000000_0 is done. And is in the process of committing
19/01/18 04:38:52 INFO mapred.LocalJobRunner:
19/01/18 04:38:52 INFO mapred.Task: Task attempt_local1291551154_0001_m_000000_0 is allowed to commit now
19/01/18 04:38:52 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1291551154_0001_m_000000_0' to hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/input/_temporary/0/task_local1291551154_0001_m_000000
19/01/18 04:38:52 INFO mapred.LocalJobRunner: map
19/01/18 04:38:52 INFO mapred.Task: Task 'attempt_local1291551154_0001_m_000000_0' done.
19/01/18 04:38:52 INFO mapred.LocalJobRunner: Finishing task: attempt_local1291551154_0001_m_000000_0
19/01/18 04:38:52 INFO mapred.LocalJobRunner: map task executor complete.
19/01/18 04:38:53 INFO mapreduce.Job: map 100% reduce 0%
19/01/18 04:38:53 INFO mapreduce.Job: Job job_local1291551154_0001 completed successfully
19/01/18 04:38:53 INFO mapreduce.Job: Counters: 21
File System Counters
FILE: Number of bytes read=303449
FILE: Number of bytes written=795355
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=10000000
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=100000
Map output records=100000
Input split bytes=82
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=15
Total committed heap usage (bytes)=226492416
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=214574985129000
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10000000
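As a sanity check on the counters above: every TeraGen record is exactly 100 bytes, so the HDFS bytes written should equal the row count times 100. A minimal sketch in Python (my own check, not part of the Hadoop run):

```python
ROW_BYTES = 100  # every TeraGen record is exactly 100 bytes

def teragen_output_bytes(rows: int) -> int:
    """Expected HDFS output size of a teragen job generating `rows` records."""
    return rows * ROW_BYTES

# 100000 rows, as passed to teragen above
print(teragen_output_bytes(100000))  # 10000000, matching "HDFS: Number of bytes written"
```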
Below is the command and output of the terasort job; it can also be found under 1155114915-HW0/a/ii/terasort.txt:
ierg5730-gcp-key@vm-master:~/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar terasort terasort/input terasort/output
19/01/18 05:19:00 INFO terasort.TeraSort: starting
19/01/18 05:19:02 INFO input.FileInputFormat: Total input files to process : 1
Spent 129ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 132ms
Sampling 1 splits of 1
Making 1 from 100000 sampled records
Computing parititions took 868ms
Spent 1004ms computing partitions.
19/01/18 05:19:03 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/01/18 05:19:03 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/01/18 05:19:03 INFO mapreduce.JobSubmitter: number of splits:1
19/01/18 05:19:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1534826650_0001
19/01/18 05:19:03 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-ierg5730-gcp-key/mapred/local/1547788743731/_partition.lst <- /home/ierg5730-gcp-key/hadoop/_partition.lst
19/01/18 05:19:03 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/output/_partition.lst as file:/tmp/hadoop-ierg5730-gcp-key/mapred/local/1547788743731/_partition.lst
19/01/18 05:19:03 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/01/18 05:19:03 INFO mapreduce.Job: Running job: job_local1534826650_0001
19/01/18 05:19:03 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/01/18 05:19:03 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:19:03 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:19:03 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/01/18 05:19:04 INFO mapred.LocalJobRunner: Waiting for map tasks
19/01/18 05:19:04 INFO mapred.LocalJobRunner: Starting task: attempt_local1534826650_0001_m_000000_0
19/01/18 05:19:04 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:19:04 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:19:04 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/01/18 05:19:04 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/input/part-m-00000:0+10000000
19/01/18 05:19:04 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/01/18 05:19:04 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/01/18 05:19:04 INFO mapred.MapTask: soft limit at 83886080
19/01/18 05:19:04 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/01/18 05:19:04 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/01/18 05:19:04 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/01/18 05:19:04 INFO mapred.LocalJobRunner:
19/01/18 05:19:04 INFO mapred.MapTask: Starting flush of map output
19/01/18 05:19:04 INFO mapred.MapTask: Spilling map output
19/01/18 05:19:04 INFO mapred.MapTask: bufstart = 0; bufend = 10200000; bufvoid = 104857600
19/01/18 05:19:04 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25814400(103257600); length = 399997/6553600
19/01/18 05:19:04 INFO mapreduce.Job: Job job_local1534826650_0001 running in uber mode : false
19/01/18 05:19:04 INFO mapreduce.Job: map 0% reduce 0%
19/01/18 05:19:05 INFO mapred.MapTask: Finished spill 0
19/01/18 05:19:05 INFO mapred.Task: Task:attempt_local1534826650_0001_m_000000_0 is done. And is in the process of committing
19/01/18 05:19:05 INFO mapred.LocalJobRunner: map
19/01/18 05:19:05 INFO mapred.Task: Task 'attempt_local1534826650_0001_m_000000_0' done.
19/01/18 05:19:05 INFO mapred.LocalJobRunner: Finishing task: attempt_local1534826650_0001_m_000000_0
19/01/18 05:19:05 INFO mapred.LocalJobRunner: map task executor complete.
19/01/18 05:19:05 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/01/18 05:19:05 INFO mapred.LocalJobRunner: Starting task: attempt_local1534826650_0001_r_000000_0
19/01/18 05:19:05 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:19:05 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:19:05 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/01/18 05:19:05 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2a2a9206
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/01/18 05:19:05 INFO reduce.EventFetcher: attempt_local1534826650_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/01/18 05:19:05 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1534826650_0001_m_000000_0 decomp: 10400002 len: 10400006 to MEMORY
19/01/18 05:19:05 INFO reduce.InMemoryMapOutput: Read 10400002 bytes from map-output for attempt_local1534826650_0001_m_000000_0
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 10400002, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->10400002
19/01/18 05:19:05 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/01/18 05:19:05 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
19/01/18 05:19:05 INFO mapred.Merger: Merging 1 sorted segments
19/01/18 05:19:05 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10399989 bytes
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: Merged 1 segments, 10400002 bytes to disk to satisfy reduce memory limit
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: Merging 1 files, 10400006 bytes from disk
19/01/18 05:19:05 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/01/18 05:19:05 INFO mapred.Merger: Merging 1 sorted segments
19/01/18 05:19:05 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10399989 bytes
19/01/18 05:19:05 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:19:05 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
19/01/18 05:19:05 INFO mapreduce.Job: map 100% reduce 0%
19/01/18 05:19:06 INFO mapred.Task: Task:attempt_local1534826650_0001_r_000000_0 is done. And is in the process of committing
19/01/18 05:19:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:19:06 INFO mapred.Task: Task attempt_local1534826650_0001_r_000000_0 is allowed to commit now
19/01/18 05:19:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1534826650_0001_r_000000_0' to hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/output/_temporary/0/task_local1534826650_0001_r_000000
19/01/18 05:19:06 INFO mapred.LocalJobRunner: reduce > reduce
19/01/18 05:19:06 INFO mapred.Task: Task 'attempt_local1534826650_0001_r_000000_0' done.
19/01/18 05:19:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local1534826650_0001_r_000000_0
19/01/18 05:19:06 INFO mapred.LocalJobRunner: reduce task executor complete.
19/01/18 05:19:06 INFO mapreduce.Job: map 100% reduce 100%
19/01/18 05:19:06 INFO mapreduce.Job: Job job_local1534826650_0001 completed successfully
19/01/18 05:19:07 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=21407160
FILE: Number of bytes written=32796946
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=40000000
HDFS: Number of bytes written=10000000
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=100000
Map output records=100000
Map output bytes=10200000
Map output materialized bytes=10400006
Input split bytes=136
Combine input records=0
Combine output records=0
Reduce input groups=100000
Reduce shuffle bytes=10400006
Reduce input records=100000
Reduce output records=100000
Spilled Records=200000
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=20
Total committed heap usage (bytes)=663748608
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000
File Output Format Counters
Bytes Written=10000000
19/01/18 05:19:07 INFO terasort.TeraSort: done
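From the "starting" and "done" timestamps in the log above, sorting the 100,000 rows took about 7 seconds in local mode. A small helper (my own, not part of Hadoop) that reads elapsed time off two log lines:

```python
from datetime import datetime

def log_elapsed(start_line: str, end_line: str) -> int:
    """Seconds between two Hadoop log lines, parsed from their
    leading 'yy/mm/dd HH:MM:SS' timestamp (first 17 characters)."""
    fmt = "%y/%m/%d %H:%M:%S"
    t0 = datetime.strptime(start_line[:17], fmt)
    t1 = datetime.strptime(end_line[:17], fmt)
    return int((t1 - t0).total_seconds())

# the first and last lines of the terasort log above
print(log_elapsed("19/01/18 05:19:00 INFO terasort.TeraSort: starting",
                  "19/01/18 05:19:07 INFO terasort.TeraSort: done"))  # 7
```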
Below is the command and output of the teravalidate job; it can also be found under 1155114915-HW0/a/ii/teravalidate.txt:
ierg5730-gcp-key@vm-master:~/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teravalidate terasort/output terasort/check
19/01/18 05:22:11 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/01/18 05:22:11 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/01/18 05:22:11 INFO input.FileInputFormat: Total input files to process : 1
Spent 50ms computing base-splits.
Spent 4ms computing TeraScheduler splits.
19/01/18 05:22:11 INFO mapreduce.JobSubmitter: number of splits:1
19/01/18 05:22:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local13918116_0001
19/01/18 05:22:12 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/01/18 05:22:12 INFO mapreduce.Job: Running job: job_local13918116_0001
19/01/18 05:22:12 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/01/18 05:22:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:22:12 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:22:12 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/01/18 05:22:12 INFO mapred.LocalJobRunner: Waiting for map tasks
19/01/18 05:22:12 INFO mapred.LocalJobRunner: Starting task: attempt_local13918116_0001_m_000000_0
19/01/18 05:22:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:22:12 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:22:12 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/01/18 05:22:12 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/output/part-r-00000:0+10000000
19/01/18 05:22:12 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/01/18 05:22:12 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/01/18 05:22:12 INFO mapred.MapTask: soft limit at 83886080
19/01/18 05:22:12 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/01/18 05:22:12 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/01/18 05:22:12 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/01/18 05:22:13 INFO mapreduce.Job: Job job_local13918116_0001 running in uber mode : false
19/01/18 05:22:13 INFO mapreduce.Job: map 0% reduce 0%
19/01/18 05:22:13 INFO mapred.LocalJobRunner:
19/01/18 05:22:13 INFO mapred.MapTask: Starting flush of map output
19/01/18 05:22:13 INFO mapred.MapTask: Spilling map output
19/01/18 05:22:13 INFO mapred.MapTask: bufstart = 0; bufend = 80; bufvoid = 104857600
19/01/18 05:22:13 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
19/01/18 05:22:13 INFO mapred.MapTask: Finished spill 0
19/01/18 05:22:13 INFO mapred.Task: Task:attempt_local13918116_0001_m_000000_0 is done. And is in the process of committing
19/01/18 05:22:13 INFO mapred.LocalJobRunner: map
19/01/18 05:22:13 INFO mapred.Task: Task 'attempt_local13918116_0001_m_000000_0' done.
19/01/18 05:22:13 INFO mapred.LocalJobRunner: Finishing task: attempt_local13918116_0001_m_000000_0
19/01/18 05:22:13 INFO mapred.LocalJobRunner: map task executor complete.
19/01/18 05:22:13 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/01/18 05:22:13 INFO mapred.LocalJobRunner: Starting task: attempt_local13918116_0001_r_000000_0
19/01/18 05:22:13 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/18 05:22:13 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/01/18 05:22:13 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/01/18 05:22:13 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2f3c02d4
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/01/18 05:22:13 INFO reduce.EventFetcher: attempt_local13918116_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/01/18 05:22:13 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local13918116_0001_m_000000_0 decomp: 88 len: 92 to MEMORY
19/01/18 05:22:13 INFO reduce.InMemoryMapOutput: Read 88 bytes from map-output for attempt_local13918116_0001_m_000000_0
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 88, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->88
19/01/18 05:22:13 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/01/18 05:22:13 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
19/01/18 05:22:13 INFO mapred.Merger: Merging 1 sorted segments
19/01/18 05:22:13 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 77 bytes
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: Merged 1 segments, 88 bytes to disk to satisfy reduce memory limit
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: Merging 1 files, 92 bytes from disk
19/01/18 05:22:13 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/01/18 05:22:13 INFO mapred.Merger: Merging 1 sorted segments
19/01/18 05:22:13 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 77 bytes
19/01/18 05:22:13 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:22:13 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
19/01/18 05:22:13 INFO mapred.Task: Task:attempt_local13918116_0001_r_000000_0 is done. And is in the process of committing
19/01/18 05:22:13 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/01/18 05:22:13 INFO mapred.Task: Task attempt_local13918116_0001_r_000000_0 is allowed to commit now
19/01/18 05:22:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local13918116_0001_r_000000_0' to hdfs://localhost:9000/user/ierg5730-gcp-key/terasort/check/_temporary/0/task_local13918116_0001_r_000000
19/01/18 05:22:13 INFO mapred.LocalJobRunner: reduce > reduce
19/01/18 05:22:13 INFO mapred.Task: Task 'attempt_local13918116_0001_r_000000_0' done.
19/01/18 05:22:13 INFO mapred.LocalJobRunner: Finishing task: attempt_local13918116_0001_r_000000_0
19/01/18 05:22:13 INFO mapred.LocalJobRunner: reduce task executor complete.
19/01/18 05:22:14 INFO mapreduce.Job: map 100% reduce 100%
19/01/18 05:22:14 INFO mapreduce.Job: Job job_local13918116_0001 completed successfully
19/01/18 05:22:14 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=607334
FILE: Number of bytes written=1584306
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=20000000
HDFS: Number of bytes written=22
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=100000
Map output records=3
Map output bytes=80
Map output materialized bytes=92
Input split bytes=137
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=92
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=230
Total committed heap usage (bytes)=894435328
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000
File Output Format Counters
Bytes Written=22
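The tiny 22-byte output above signals a successful validation; conceptually, TeraValidate confirms the output keys are globally sorted (it also compares a checksum against TeraGen's). A simplified sketch of that ordering invariant, not the actual Hadoop implementation:

```python
def keys_sorted(keys) -> bool:
    """True if the byte-string keys are in nondecreasing order --
    the core property TeraValidate asserts over the sorted output."""
    return all(a <= b for a, b in zip(keys, keys[1:]))

print(keys_sorted([b"apple", b"banana", b"banana"]))  # True
print(keys_sorted([b"banana", b"apple"]))             # False
```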
This is the end of part a.
Part b
i: Four Node Cluster Setup
The 50070 page of the name-node vm-master can be found under 1155114915-HW0/b/i.
ii: 2G & 20G Tera Data Gen & Sort
Since the 20G tera dataset is very large, I tried to run the sort process three times (it crashed in the middle each time), and Google Cloud Platform spent HK$1000 of my free trial credit on this single process (I only have HK$1000 left in my account).
So, I decided not to run the sort process on the 20G tera data.
Instead, I did the bonus (20 marks) Java WordCount job.
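The core logic of the WordCount job — a map phase that tokenizes each line and a reduce phase that sums the counts per word — can be sketched as follows (in Python for illustration only; the actual bonus job is written in Java against the Hadoop MapReduce API):

```python
from collections import Counter

def word_count(lines):
    """Map: split each line into whitespace-separated words;
    Reduce: sum the occurrences of each word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

print(word_count(["hello hadoop", "hello world"]))
```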
Since one line of tera data is 100 bytes, in order to generate 2G and 20G of data, we need to generate
2G: ceil(2*1024*1024*1024/100) = 21,474,837 lines
20G: ceil(20*1024*1024*1024/100) = 214,748,365 lines
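The line counts above come from rounding the target size up to whole 100-byte records (taking 1G = 1024^3 bytes); a quick check:

```python
ROW_BYTES = 100  # one line of tera data is 100 bytes

def rows_for(target_bytes: int) -> int:
    """Smallest number of TeraGen rows covering at least `target_bytes`."""
    return (target_bytes + ROW_BYTES - 1) // ROW_BYTES  # integer ceiling

print(rows_for(2 * 1024**3))   # 21474837
print(rows_for(20 * 1024**3))  # 214748365
```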
Below is the command and output of the teragen job for the 2G data; it can also be found under 1155114915-HW0/b/ii:
ierg5730-gcp-key@vm-master:~/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teragen 21474837 terasort/2G-input
19/01/20 01:09:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 01:09:51 INFO terasort.TeraGen: Generating 21474837 using 2
19/01/20 01:09:51 INFO mapreduce.JobSubmitter: number of splits:2
19/01/20 01:09:51 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/20 01:09:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547946384889_0001
19/01/20 01:09:52 INFO impl.YarnClientImpl: Submitted application application_1547946384889_0001
19/01/20 01:09:52 INFO mapreduce.Job: The url to track the job: http://vm-master:8088/proxy/application_1547946384889_0001/
19/01/20 01:09:52 INFO mapreduce.Job: Running job: job_1547946384889_0001
19/01/20 01:10:01 INFO mapreduce.Job: Job job_1547946384889_0001 running in uber mode : false
19/01/20 01:10:01 INFO mapreduce.Job: map 0% reduce 0%
19/01/20 01:10:21 INFO mapreduce.Job: map 17% reduce 0%
19/01/20 01:10:22 INFO mapreduce.Job: map 34% reduce 0%
19/01/20 01:10:28 INFO mapreduce.Job: map 39% reduce 0%
19/01/20 01:10:29 INFO mapreduce.Job: map 44% reduce 0%
19/01/20 01:10:34 INFO mapreduce.Job: map 52% reduce 0%
19/01/20 01:10:35 INFO mapreduce.Job: map 65% reduce 0%
19/01/20 01:10:40 INFO mapreduce.Job: map 73% reduce 0%
19/01/20 01:10:41 INFO mapreduce.Job: map 83% reduce 0%
19/01/20 01:10:43 INFO mapreduce.Job: map 88% reduce 0%
19/01/20 01:10:45 INFO mapreduce.Job: map 100% reduce 0%
19/01/20 01:10:45 INFO mapreduce.Job: Job job_1547946384889_0001 completed successfully
19/01/20 01:10:45 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=396850
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=2147483700
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=79581
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=79581
Total vcore-milliseconds taken by all map tasks=79581
Total megabyte-milliseconds taken by all map tasks=81490944
Map-Reduce Framework
Map input records=21474837
Map output records=21474837
Input split bytes=167
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1015
CPU time spent (ms)=42000
Physical memory (bytes) snapshot=400113664
Virtual memory (bytes) snapshot=1678053376
Total committed heap usage (bytes)=304611328
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=46124755272701764
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=2147483700
Below is the command and output of the terasort job for the 2G data; it can also be found under 1155114915-HW0/b/ii:
ierg5730-gcp-key@vm-master:~/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar terasort terasort/2G-input terasort/2G-output
19/01/20 01:11:53 INFO terasort.TeraSort: starting
19/01/20 01:11:54 INFO input.FileInputFormat: Total input files to process : 2
Spent 134ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
Computing input splits took 138ms
Sampling 10 splits of 16
Making 1 from 100000 sampled records
Computing parititions took 1119ms
Spent 1261ms computing partitions.
19/01/20 01:11:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 01:11:56 INFO mapreduce.JobSubmitter: number of splits:16
19/01/20 01:11:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/20 01:11:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547946384889_0004
19/01/20 01:11:56 INFO impl.YarnClientImpl: Submitted application application_1547946384889_0004
19/01/20 01:11:56 INFO mapreduce.Job: The url to track the job: http://vm-master:8088/proxy/application_1547946384889_0004/
19/01/20 01:11:57 INFO mapreduce.Job: Running job: job_1547946384889_0004
19/01/20 01:12:19 INFO mapreduce.Job: Job job_1547946384889_0004 running in uber mode : false
19/01/20 01:12:19 INFO mapreduce.Job: map 0% reduce 0%
19/01/20 01:12:50 INFO mapreduce.Job: map 8% reduce 0%
19/01/20 01:12:51 INFO mapreduce.Job: map 11% reduce 0%
19/01/20 01:12:53 INFO mapreduce.Job: map 14% reduce 0%
19/01/20 01:12:54 INFO mapreduce.Job: map 17% reduce 0%
19/01/20 01:13:02 INFO mapreduce.Job: map 18% reduce 0%
19/01/20 01:13:03 INFO mapreduce.Job: map 19% reduce 0%
19/01/20 01:13:04 INFO mapreduce.Job: map 21% reduce 0%
19/01/20 01:13:05 INFO mapreduce.Job: map 22% reduce 0%
19/01/20 01:13:07 INFO mapreduce.Job: map 24% reduce 0%
19/01/20 01:13:08 INFO mapreduce.Job: map 25% reduce 0%
19/01/20 01:13:14 INFO mapreduce.Job: map 26% reduce 0%
19/01/20 01:13:15 INFO mapreduce.Job: map 27% reduce 0%
19/01/20 01:13:17 INFO mapreduce.Job: map 28% reduce 0%
19/01/20 01:13:19 INFO mapreduce.Job: map 29% reduce 0%
19/01/20 01:13:20 INFO mapreduce.Job: map 30% reduce 0%
19/01/20 01:13:21 INFO mapreduce.Job: map 32% reduce 0%
19/01/20 01:13:22 INFO mapreduce.Job: map 33% reduce 0%
19/01/20 01:13:23 INFO mapreduce.Job: map 34% reduce 0%
19/01/20 01:13:27 INFO mapreduce.Job: map 36% reduce 0%
19/01/20 01:13:28 INFO mapreduce.Job: map 37% reduce 0%
19/01/20 01:13:31 INFO mapreduce.Job: map 38% reduce 0%
19/01/20 01:14:17 INFO mapreduce.Job: map 39% reduce 6%
19/01/20 01:14:19 INFO mapreduce.Job: map 40% reduce 6%
19/01/20 01:14:21 INFO mapreduce.Job: map 42% reduce 6%
19/01/20 01:14:23 INFO mapreduce.Job: map 42% reduce 10%
19/01/20 01:14:24 INFO mapreduce.Job: map 43% reduce 10%
19/01/20 01:14:26 INFO mapreduce.Job: map 46% reduce 10%
19/01/20 01:14:28 INFO mapreduce.Job: map 49% reduce 10%
19/01/20 01:14:29 INFO mapreduce.Job: map 49% reduce 13%
19/01/20 01:14:32 INFO mapreduce.Job: map 50% reduce 13%
19/01/20 01:14:34 INFO mapreduce.Job: map 51% reduce 13%
19/01/20 01:14:49 INFO mapreduce.Job: map 52% reduce 13%
19/01/20 01:14:51 INFO mapreduce.Job: map 55% reduce 13%
19/01/20 01:14:53 INFO mapreduce.Job: map 58% reduce 13%
19/01/20 01:15:10 INFO mapreduce.Job: map 59% reduce 13%
19/01/20 01:15:14 INFO mapreduce.Job: map 60% reduce 13%
19/01/20 01:15:16 INFO mapreduce.Job: map 62% reduce 13%
19/01/20 01:15:17 INFO mapreduce.Job: map 63% reduce 13%
19/01/20 01:15:20 INFO mapreduce.Job: map 64% reduce 13%
19/01/20 01:15:22 INFO mapreduce.Job: map 66% reduce 13%
19/01/20 01:15:24 INFO mapreduce.Job: map 68% reduce 13%
19/01/20 01:15:27 INFO mapreduce.Job: map 69% reduce 13%
19/01/20 01:15:32 INFO mapreduce.Job: map 69% reduce 17%
19/01/20 01:15:38 INFO mapreduce.Job: map 69% reduce 23%
19/01/20 01:16:13 INFO mapreduce.Job: map 71% reduce 23%
19/01/20 01:16:14 INFO mapreduce.Job: map 75% reduce 23%
19/01/20 01:16:15 INFO mapreduce.Job: map 76% reduce 23%
19/01/20 01:16:20 INFO mapreduce.Job: map 78% reduce 23%
19/01/20 01:16:21 INFO mapreduce.Job: map 81% reduce 23%
19/01/20 01:16:22 INFO mapreduce.Job: map 82% reduce 23%
19/01/20 01:16:28 INFO mapreduce.Job: map 83% reduce 23%
19/01/20 01:16:39 INFO mapreduce.Job: map 84% reduce 23%
19/01/20 01:16:45 INFO mapreduce.Job: map 88% reduce 23%
19/01/20 01:16:48 INFO mapreduce.Job: map 89% reduce 23%
19/01/20 01:16:53 INFO mapreduce.Job: map 90% reduce 23%
19/01/20 01:17:04 INFO mapreduce.Job: map 92% reduce 23%
19/01/20 01:17:06 INFO mapreduce.Job: map 93% reduce 23%
19/01/20 01:17:10 INFO mapreduce.Job: map 94% reduce 23%
19/01/20 01:17:11 INFO mapreduce.Job: map 96% reduce 23%
19/01/20 01:17:12 INFO mapreduce.Job: map 97% reduce 23%
19/01/20 01:17:14 INFO mapreduce.Job: map 98% reduce 23%
19/01/20 01:17:16 INFO mapreduce.Job: map 99% reduce 23%
19/01/20 01:17:19 INFO mapreduce.Job: map 100% reduce 23%
19/01/20 01:17:21 INFO mapreduce.Job: map 100% reduce 31%
19/01/20 01:17:28 INFO mapreduce.Job: map 100% reduce 36%
19/01/20 01:17:34 INFO mapreduce.Job: map 100% reduce 44%
19/01/20 01:17:40 INFO mapreduce.Job: map 100% reduce 49%
19/01/20 01:17:46 INFO mapreduce.Job: map 100% reduce 52%
19/01/20 01:17:52 INFO mapreduce.Job: map 100% reduce 56%
19/01/20 01:17:58 INFO mapreduce.Job: map 100% reduce 59%
19/01/20 01:18:04 INFO mapreduce.Job: map 100% reduce 64%
19/01/20 01:18:10 INFO mapreduce.Job: map 100% reduce 67%
19/01/20 01:18:22 INFO mapreduce.Job: map 100% reduce 68%
19/01/20 01:18:28 INFO mapreduce.Job: map 100% reduce 69%
19/01/20 01:18:34 INFO mapreduce.Job: map 100% reduce 71%
19/01/20 01:18:40 INFO mapreduce.Job: map 100% reduce 72%
19/01/20 01:18:46 INFO mapreduce.Job: map 100% reduce 73%
19/01/20 01:18:53 INFO mapreduce.Job: map 100% reduce 75%
19/01/20 01:18:59 INFO mapreduce.Job: map 100% reduce 76%
19/01/20 01:19:05 INFO mapreduce.Job: map 100% reduce 78%
19/01/20 01:19:11 INFO mapreduce.Job: map 100% reduce 80%
19/01/20 01:19:17 INFO mapreduce.Job: map 100% reduce 81%
19/01/20 01:19:23 INFO mapreduce.Job: map 100% reduce 83%
19/01/20 01:19:29 INFO mapreduce.Job: map 100% reduce 85%
19/01/20 01:19:35 INFO mapreduce.Job: map 100% reduce 87%
19/01/20 01:19:41 INFO mapreduce.Job: map 100% reduce 89%
19/01/20 01:19:47 INFO mapreduce.Job: map 100% reduce 91%
19/01/20 01:19:53 INFO mapreduce.Job: map 100% reduce 92%
19/01/20 01:19:59 INFO mapreduce.Job: map 100% reduce 93%
19/01/20 01:20:05 INFO mapreduce.Job: map 100% reduce 94%
19/01/20 01:20:11 INFO mapreduce.Job: map 100% reduce 96%
19/01/20 01:20:17 INFO mapreduce.Job: map 100% reduce 98%
19/01/20 01:20:23 INFO mapreduce.Job: map 100% reduce 100%
19/01/20 01:20:24 INFO mapreduce.Job: Job job_1547946384889_0004 completed successfully
19/01/20 01:20:25 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=5443871246
FILE: Number of bytes written=7680652284
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2147485924
HDFS: Number of bytes written=2147483700
HDFS: Number of read operations=51
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=17
Launched reduce tasks=1
Data-local map tasks=4
Rack-local map tasks=13
Total time spent by all maps in occupied slots (ms)=1520230
Total time spent by all reduces in occupied slots (ms)=415210
Total time spent by all map tasks (ms)=1520230
Total time spent by all reduce tasks (ms)=415210
Total vcore-milliseconds taken by all map tasks=1520230
Total vcore-milliseconds taken by all reduce tasks=415210
Total megabyte-milliseconds taken by all map tasks=1556715520
Total megabyte-milliseconds taken by all reduce tasks=425175040
Map-Reduce Framework
Map input records=21474837
Map output records=21474837
Map output bytes=2190433374
Map output materialized bytes=2233383144
Input split bytes=2224
Combine input records=0
Combine output records=0
Reduce input groups=21474837
Reduce shuffle bytes=2233383144
Reduce input records=21474837
Reduce output records=21474837
Spilled Records=73819750
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=36834
CPU time spent (ms)=291070
Physical memory (bytes) snapshot=5058732032
Virtual memory (bytes) snapshot=14205886464
Total committed heap usage (bytes)=3330801664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2147483700
File Output Format Counters
Bytes Written=2147483700
19/01/20 01:20:25 INFO terasort.TeraSort: done
Below are the command and output of the teragen
job for 20 GB of data; they can also be found under 1155114915-HW0/b/ii:
ierg5730-gcp-key@vm-master:~/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teragen 214748365 terasort/20G-input
19/01/19 14:14:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/19 14:14:48 INFO terasort.TeraGen: Generating 214748365 using 2
19/01/19 14:14:49 INFO mapreduce.JobSubmitter: number of splits:2
19/01/19 14:14:49 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/19 14:14:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547907254538_0001
19/01/19 14:14:50 INFO impl.YarnClientImpl: Submitted application application_1547907254538_0001
19/01/19 14:14:50 INFO mapreduce.Job: The url to track the job: http://vm-master:8088/proxy/application_1547907254538_0001/
19/01/19 14:14:50 INFO mapreduce.Job: Running job: job_1547907254538_0001
19/01/19 14:14:59 INFO mapreduce.Job: Job job_1547907254538_0001 running in uber mode : false
19/01/19 14:14:59 INFO mapreduce.Job: map 0% reduce 0%
19/01/19 14:15:18 INFO mapreduce.Job: map 2% reduce 0%
19/01/19 14:15:21 INFO mapreduce.Job: map 3% reduce 0%
19/01/19 14:15:24 INFO mapreduce.Job: map 4% reduce 0%
19/01/19 14:15:27 INFO mapreduce.Job: map 5% reduce 0%
19/01/19 14:15:33 INFO mapreduce.Job: map 6% reduce 0%
19/01/19 14:15:37 INFO mapreduce.Job: map 7% reduce 0%
19/01/19 14:15:39 INFO mapreduce.Job: map 8% reduce 0%
19/01/19 14:15:43 INFO mapreduce.Job: map 9% reduce 0%
19/01/19 14:15:45 INFO mapreduce.Job: map 10% reduce 0%
19/01/19 14:15:51 INFO mapreduce.Job: map 11% reduce 0%
19/01/19 14:15:55 INFO mapreduce.Job: map 13% reduce 0%
19/01/19 14:15:57 INFO mapreduce.Job: map 14% reduce 0%
19/01/19 14:16:01 INFO mapreduce.Job: map 15% reduce 0%
19/01/19 14:16:07 INFO mapreduce.Job: map 16% reduce 0%
19/01/19 14:16:10 INFO mapreduce.Job: map 17% reduce 0%
19/01/19 14:16:13 INFO mapreduce.Job: map 19% reduce 0%
19/01/19 14:16:19 INFO mapreduce.Job: map 20% reduce 0%
19/01/19 14:16:22 INFO mapreduce.Job: map 21% reduce 0%
19/01/19 14:16:25 INFO mapreduce.Job: map 22% reduce 0%
19/01/19 14:16:31 INFO mapreduce.Job: map 23% reduce 0%
19/01/19 14:16:34 INFO mapreduce.Job: map 24% reduce 0%
19/01/19 14:16:37 INFO mapreduce.Job: map 25% reduce 0%
19/01/19 14:16:40 INFO mapreduce.Job: map 26% reduce 0%
19/01/19 14:16:44 INFO mapreduce.Job: map 27% reduce 0%
19/01/19 14:16:47 INFO mapreduce.Job: map 28% reduce 0%
19/01/19 14:16:50 INFO mapreduce.Job: map 29% reduce 0%
19/01/19 14:16:53 INFO mapreduce.Job: map 30% reduce 0%
19/01/19 14:16:59 INFO mapreduce.Job: map 31% reduce 0%
19/01/19 14:17:02 INFO mapreduce.Job: map 32% reduce 0%
19/01/19 14:17:05 INFO mapreduce.Job: map 33% reduce 0%
19/01/19 14:17:08 INFO mapreduce.Job: map 34% reduce 0%
19/01/19 14:17:11 INFO mapreduce.Job: map 35% reduce 0%
19/01/19 14:17:17 INFO mapreduce.Job: map 36% reduce 0%
19/01/19 14:17:20 INFO mapreduce.Job: map 37% reduce 0%
19/01/19 14:17:23 INFO mapreduce.Job: map 38% reduce 0%
19/01/19 14:17:26 INFO mapreduce.Job: map 39% reduce 0%
19/01/19 14:17:29 INFO mapreduce.Job: map 40% reduce 0%
19/01/19 14:17:32 INFO mapreduce.Job: map 41% reduce 0%
19/01/19 14:17:35 INFO mapreduce.Job: map 42% reduce 0%
19/01/19 14:17:41 INFO mapreduce.Job: map 43% reduce 0%
19/01/19 14:17:46 INFO mapreduce.Job: map 44% reduce 0%
19/01/19 14:17:47 INFO mapreduce.Job: map 45% reduce 0%
19/01/19 14:17:52 INFO mapreduce.Job: map 47% reduce 0%
19/01/19 14:17:53 INFO mapreduce.Job: map 48% reduce 0%
19/01/19 14:17:58 INFO mapreduce.Job: map 49% reduce 0%
19/01/19 14:17:59 INFO mapreduce.Job: map 50% reduce 0%
19/01/19 14:18:04 INFO mapreduce.Job: map 51% reduce 0%
19/01/19 14:18:10 INFO mapreduce.Job: map 52% reduce 0%
19/01/19 14:18:11 INFO mapreduce.Job: map 53% reduce 0%
19/01/19 14:18:16 INFO mapreduce.Job: map 54% reduce 0%
19/01/19 14:18:17 INFO mapreduce.Job: map 55% reduce 0%
19/01/19 14:18:23 INFO mapreduce.Job: map 57% reduce 0%
19/01/19 14:18:28 INFO mapreduce.Job: map 58% reduce 0%
19/01/19 14:18:29 INFO mapreduce.Job: map 59% reduce 0%
19/01/19 14:18:34 INFO mapreduce.Job: map 60% reduce 0%
19/01/19 14:18:36 INFO mapreduce.Job: map 61% reduce 0%
19/01/19 14:18:40 INFO mapreduce.Job: map 62% reduce 0%
19/01/19 14:18:41 INFO mapreduce.Job: map 63% reduce 0%
19/01/19 14:18:46 INFO mapreduce.Job: map 64% reduce 0%
19/01/19 14:18:47 INFO mapreduce.Job: map 65% reduce 0%
19/01/19 14:18:52 INFO mapreduce.Job: map 66% reduce 0%
19/01/19 14:18:55 INFO mapreduce.Job: map 68% reduce 0%
19/01/19 14:18:58 INFO mapreduce.Job: map 69% reduce 0%
19/01/19 14:19:01 INFO mapreduce.Job: map 70% reduce 0%
19/01/19 14:19:05 INFO mapreduce.Job: map 71% reduce 0%
19/01/19 14:19:11 INFO mapreduce.Job: map 72% reduce 0%
19/01/19 14:19:13 INFO mapreduce.Job: map 73% reduce 0%
19/01/19 14:19:17 INFO mapreduce.Job: map 74% reduce 0%
19/01/19 14:19:20 INFO mapreduce.Job: map 75% reduce 0%
19/01/19 14:19:23 INFO mapreduce.Job: map 76% reduce 0%
19/01/19 14:19:26 INFO mapreduce.Job: map 77% reduce 0%
19/01/19 14:19:29 INFO mapreduce.Job: map 78% reduce 0%
19/01/19 14:19:32 INFO mapreduce.Job: map 79% reduce 0%
19/01/19 14:19:36 INFO mapreduce.Job: map 80% reduce 0%
19/01/19 14:19:38 INFO mapreduce.Job: map 81% reduce 0%
19/01/19 14:19:42 INFO mapreduce.Job: map 82% reduce 0%
19/01/19 14:19:44 INFO mapreduce.Job: map 83% reduce 0%
19/01/19 14:19:48 INFO mapreduce.Job: map 84% reduce 0%
19/01/19 14:19:50 INFO mapreduce.Job: map 86% reduce 0%
19/01/19 14:19:56 INFO mapreduce.Job: map 87% reduce 0%
19/01/19 14:20:00 INFO mapreduce.Job: map 88% reduce 0%
19/01/19 14:20:03 INFO mapreduce.Job: map 89% reduce 0%
19/01/19 14:20:06 INFO mapreduce.Job: map 90% reduce 0%
19/01/19 14:20:09 INFO mapreduce.Job: map 91% reduce 0%
19/01/19 14:20:12 INFO mapreduce.Job: map 92% reduce 0%
19/01/19 14:20:15 INFO mapreduce.Job: map 93% reduce 0%
19/01/19 14:20:18 INFO mapreduce.Job: map 94% reduce 0%
19/01/19 14:20:22 INFO mapreduce.Job: map 95% reduce 0%
19/01/19 14:20:24 INFO mapreduce.Job: map 96% reduce 0%
19/01/19 14:20:28 INFO mapreduce.Job: map 97% reduce 0%
19/01/19 14:20:32 INFO mapreduce.Job: map 98% reduce 0%
19/01/19 14:20:34 INFO mapreduce.Job: map 99% reduce 0%
19/01/19 14:20:37 INFO mapreduce.Job: map 100% reduce 0%
19/01/19 14:20:37 INFO mapreduce.Job: Job job_1547907254538_0001 completed successfully
19/01/19 14:20:37 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=396854
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=170
HDFS: Number of bytes written=21474836500
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=664402
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=664402
Total vcore-milliseconds taken by all map tasks=664402
Total megabyte-milliseconds taken by all map tasks=680347648
Map-Reduce Framework
Map input records=214748365
Map output records=214748365
Input split bytes=170
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=5884
CPU time spent (ms)=356520
Physical memory (bytes) snapshot=450916352
Virtual memory (bytes) snapshot=1659351040
Total committed heap usage (bytes)=212336640
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=461200258163748239
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=21474836500
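The row count 214748365 passed to teragen above follows from TeraGen's fixed 100-byte rows (10-byte key plus 90-byte value): a 20 GB target needs the byte count divided by 100, rounded up. A quick Python 3 sanity check:

```python
# TeraGen writes fixed-size 100-byte rows (10-byte key + 90-byte value).
ROW_BYTES = 100
target_bytes = 20 * 2**30  # 20 GB

# Round up so the generated data is at least the target size.
rows = -(-target_bytes // ROW_BYTES)  # ceiling division
print(rows)              # 214748365, the count passed to teragen
print(rows * ROW_BYTES)  # 21474836500, matching "HDFS: Number of bytes written"
```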
This is the end of Part b.
Part c
i: Python 2 Mapper & Reducer
Mapper.py
#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove 'num:' from line
    line = line.split(':')[1]
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word, 1)
Reducer.py
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        # (1) output the finished word, when current_word is not None
        if current_word:
            # write result to STDOUT
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# (2) branch (1) only emits a word once the next word arrives,
# so the last word is still pending here; do not forget it
if current_word == word:
    print '%s\t%s' % (current_word, current_count)
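The map-sort-reduce contract the reducer relies on can be exercised locally without Hadoop. The sketch below (Python 3, on hypothetical sample lines with the same 'num:' prefix) simulates the streaming pipeline: map each line to (word, 1) pairs, sort by key as the shuffle would, then sum adjacent counts exactly like reducer.py does:

```python
# Simulate the Hadoop Streaming pipeline in-process (Python 3 sketch,
# hypothetical sample input): map -> sort -> reduce.

def map_line(line):
    # drop the leading 'num:' prefix, as in mapper.py
    body = line.split(':')[1].strip()
    for word in body.split():
        yield (word, 1)

def reduce_sorted(pairs):
    # reducer.py logic: input is sorted by word, so equal words are adjacent
    result = []
    current_word, current_count = None, 0
    for word, n in pairs:
        if word == current_word:
            current_count += n
        else:
            if current_word is not None:
                result.append((current_word, current_count))
            current_word, current_count = word, n
    if current_word is not None:
        # flush the last pending word, as at the end of reducer.py
        result.append((current_word, current_count))
    return result

sample = ["1:the quick fox", "2:the lazy dog", "3:the fox"]
pairs = sorted(p for line in sample for p in map_line(line))
print(reduce_sorted(pairs))
# [('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The `sorted()` call stands in for Hadoop's shuffle-and-sort phase; without it, equal words would not be adjacent and `reduce_sorted` would emit duplicate entries.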
ii: MapReduce using Hadoop Streaming
Below are the command and output of the Hadoop Streaming MapReduce job; they can also be found under 1155114915-HW0/c/i:
ierg5730-gcp-key@vm-master:~/hadoop$ hadoop jar ./share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar -file py2-word-count/mapper.py -mapper mapper.py -file py2-word-count/reducer.py -reducer reducer.py -input /hw0C/input/Large-Dataset.txt -output /hw0C/output
19/01/20 07:57:04 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [py2-word-count/mapper.py, py2-word-count/reducer.py, /tmp/hadoop-unjar722251595545502703/] [] /tmp/streamjob8862609580622424134.jar tmpDir=null
19/01/20 07:57:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 07:57:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 07:57:08 INFO mapred.FileInputFormat: Total input files to process : 1
19/01/20 07:57:08 INFO mapreduce.JobSubmitter: number of splits:4
19/01/20 07:57:09 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/20 07:57:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547969907570_0017
19/01/20 07:57:10 INFO impl.YarnClientImpl: Submitted application application_1547969907570_0017
19/01/20 07:57:10 INFO mapreduce.Job: The url to track the job: http://vm-master:8088/proxy/application_1547969907570_0017/
19/01/20 07:57:10 INFO mapreduce.Job: Running job: job_1547969907570_0017
19/01/20 08:02:00 INFO mapreduce.Job: Job job_1547969907570_0017 running in uber mode : false
19/01/20 08:02:00 INFO mapreduce.Job: map 0% reduce 0%
19/01/20 08:02:30 INFO mapreduce.Job: map 3% reduce 0%
19/01/20 08:02:31 INFO mapreduce.Job: map 5% reduce 0%
19/01/20 08:02:33 INFO mapreduce.Job: map 24% reduce 0%
19/01/20 08:02:37 INFO mapreduce.Job: map 26% reduce 0%
19/01/20 08:02:40 INFO mapreduce.Job: map 28% reduce 0%
19/01/20 08:02:47 INFO mapreduce.Job: map 36% reduce 0%
19/01/20 08:03:08 INFO mapreduce.Job: map 37% reduce 0%
19/01/20 08:03:15 INFO mapreduce.Job: map 39% reduce 0%
19/01/20 08:03:20 INFO mapreduce.Job: map 39% reduce 8%
19/01/20 08:03:21 INFO mapreduce.Job: map 41% reduce 8%
19/01/20 08:03:22 INFO mapreduce.Job: map 42% reduce 8%
19/01/20 08:03:24 INFO mapreduce.Job: map 44% reduce 8%
19/01/20 08:03:27 INFO mapreduce.Job: map 45% reduce 8%
19/01/20 08:03:30 INFO mapreduce.Job: map 46% reduce 8%
19/01/20 08:03:58 INFO mapreduce.Job: map 47% reduce 8%
19/01/20 08:04:05 INFO mapreduce.Job: map 49% reduce 8%
19/01/20 08:04:08 INFO mapreduce.Job: map 50% reduce 8%
19/01/20 08:04:11 INFO mapreduce.Job: map 51% reduce 8%
19/01/20 08:04:12 INFO mapreduce.Job: map 52% reduce 8%
19/01/20 08:04:15 INFO mapreduce.Job: map 54% reduce 8%
19/01/20 08:04:17 INFO mapreduce.Job: map 55% reduce 8%
19/01/20 08:04:21 INFO mapreduce.Job: map 56% reduce 8%
19/01/20 08:04:55 INFO mapreduce.Job: map 59% reduce 8%
19/01/20 08:04:58 INFO mapreduce.Job: map 60% reduce 8%
19/01/20 08:05:01 INFO mapreduce.Job: map 61% reduce 8%
19/01/20 08:05:02 INFO mapreduce.Job: map 62% reduce 8%
19/01/20 08:05:04 INFO mapreduce.Job: map 63% reduce 8%
19/01/20 08:05:08 INFO mapreduce.Job: map 65% reduce 8%
19/01/20 08:05:14 INFO mapreduce.Job: map 66% reduce 8%
19/01/20 08:05:38 INFO mapreduce.Job: map 68% reduce 8%
19/01/20 08:05:42 INFO mapreduce.Job: map 69% reduce 8%
19/01/20 08:05:45 INFO mapreduce.Job: map 70% reduce 8%
19/01/20 08:05:48 INFO mapreduce.Job: map 71% reduce 8%
19/01/20 08:05:51 INFO mapreduce.Job: map 72% reduce 8%
19/01/20 08:05:58 INFO mapreduce.Job: map 73% reduce 8%
19/01/20 08:06:04 INFO mapreduce.Job: map 74% reduce 8%
19/01/20 08:06:13 INFO mapreduce.Job: map 75% reduce 8%
19/01/20 08:06:25 INFO mapreduce.Job: map 76% reduce 8%
19/01/20 08:06:31 INFO mapreduce.Job: map 77% reduce 8%
19/01/20 08:06:33 INFO mapreduce.Job: map 78% reduce 8%
19/01/20 08:06:40 INFO mapreduce.Job: map 79% reduce 8%
19/01/20 08:06:43 INFO mapreduce.Job: map 80% reduce 8%
19/01/20 08:06:46 INFO mapreduce.Job: map 81% reduce 8%
19/01/20 08:06:49 INFO mapreduce.Job: map 82% reduce 8%
19/01/20 08:06:55 INFO mapreduce.Job: map 83% reduce 8%
19/01/20 08:06:58 INFO mapreduce.Job: map 84% reduce 8%
19/01/20 08:07:04 INFO mapreduce.Job: map 85% reduce 8%
19/01/20 08:07:05 INFO mapreduce.Job: map 86% reduce 8%
19/01/20 08:07:10 INFO mapreduce.Job: map 87% reduce 8%
19/01/20 08:07:14 INFO mapreduce.Job: map 88% reduce 8%
19/01/20 08:07:16 INFO mapreduce.Job: map 89% reduce 8%
19/01/20 08:07:20 INFO mapreduce.Job: map 90% reduce 8%
19/01/20 08:07:22 INFO mapreduce.Job: map 91% reduce 8%
19/01/20 08:07:24 INFO mapreduce.Job: map 91% reduce 17%
19/01/20 08:07:28 INFO mapreduce.Job: map 92% reduce 17%
19/01/20 08:07:35 INFO mapreduce.Job: map 94% reduce 17%
19/01/20 08:07:41 INFO mapreduce.Job: map 95% reduce 17%
19/01/20 08:07:47 INFO mapreduce.Job: map 96% reduce 17%
19/01/20 08:07:51 INFO mapreduce.Job: map 97% reduce 17%
19/01/20 08:07:54 INFO mapreduce.Job: map 98% reduce 17%
19/01/20 08:07:55 INFO mapreduce.Job: map 98% reduce 25%
19/01/20 08:08:06 INFO mapreduce.Job: map 100% reduce 25%
19/01/20 08:08:14 INFO mapreduce.Job: map 100% reduce 67%
19/01/20 08:08:20 INFO mapreduce.Job: map 100% reduce 68%
19/01/20 08:08:26 INFO mapreduce.Job: map 100% reduce 69%
19/01/20 08:08:32 INFO mapreduce.Job: map 100% reduce 71%
19/01/20 08:08:38 INFO mapreduce.Job: map 100% reduce 73%
19/01/20 08:08:44 INFO mapreduce.Job: map 100% reduce 74%
19/01/20 08:08:50 INFO mapreduce.Job: map 100% reduce 76%
19/01/20 08:08:56 INFO mapreduce.Job: map 100% reduce 77%
19/01/20 08:09:02 INFO mapreduce.Job: map 100% reduce 78%
19/01/20 08:09:08 INFO mapreduce.Job: map 100% reduce 80%
19/01/20 08:09:14 INFO mapreduce.Job: map 100% reduce 81%
19/01/20 08:09:20 INFO mapreduce.Job: map 100% reduce 82%
19/01/20 08:09:26 INFO mapreduce.Job: map 100% reduce 84%
19/01/20 08:09:32 INFO mapreduce.Job: map 100% reduce 85%
19/01/20 08:09:38 INFO mapreduce.Job: map 100% reduce 87%
19/01/20 08:09:44 INFO mapreduce.Job: map 100% reduce 88%
19/01/20 08:09:50 INFO mapreduce.Job: map 100% reduce 90%
19/01/20 08:09:56 INFO mapreduce.Job: map 100% reduce 91%
19/01/20 08:10:02 INFO mapreduce.Job: map 100% reduce 92%
19/01/20 08:10:08 INFO mapreduce.Job: map 100% reduce 94%
19/01/20 08:10:14 INFO mapreduce.Job: map 100% reduce 95%
19/01/20 08:10:20 INFO mapreduce.Job: map 100% reduce 96%
19/01/20 08:10:26 INFO mapreduce.Job: map 100% reduce 98%
19/01/20 08:10:32 INFO mapreduce.Job: map 100% reduce 99%
19/01/20 08:10:36 INFO mapreduce.Job: map 100% reduce 100%
19/01/20 08:10:36 INFO mapreduce.Job: Job job_1547969907570_0017 completed successfully
19/01/20 08:10:37 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=1183675238
FILE: Number of bytes written=1786909669
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=420032880
HDFS: Number of bytes written=40736746
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=3
Launched map tasks=6
Launched reduce tasks=1
Rack-local map tasks=6
Total time spent by all maps in occupied slots (ms)=1672971
Total time spent by all reduces in occupied slots (ms)=466287
Total time spent by all map tasks (ms)=1672971
Total time spent by all reduce tasks (ms)=466287
Total vcore-milliseconds taken by all map tasks=1672971
Total vcore-milliseconds taken by all reduce tasks=466287
Total megabyte-milliseconds taken by all map tasks=1713122304
Total megabyte-milliseconds taken by all reduce tasks=477477888
Map-Reduce Framework
Map input records=3474848
Map output records=53108124
Map output bytes=496006084
Map output materialized bytes=602222356
Input split bytes=408
Combine input records=0
Combine output records=0
Reduce input groups=4041135
Reduce shuffle bytes=602222356
Reduce input records=53108124
Reduce output records=4041135
Spilled Records=157593576
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=8200
CPU time spent (ms)=476560
Physical memory (bytes) snapshot=1396105216
Virtual memory (bytes) snapshot=4178202624
Total committed heap usage (bytes)=936378368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=420032472
File Output Format Counters
Bytes Written=40736746
19/01/20 08:10:37 INFO streaming.StreamJob: Output directory: /hw0C/output
The result of the MapReduce job can be found under 1155114915-HW0/c/ii.
Part d
i: Java MapReduce
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // remove "num:" from value
      String line = value.toString();
      line = line.split(":")[1];
      // tokenize line instead of value.toString()
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
ii: MapReduce using a compiled Java program (.jar)
Below are the command and output; they can also be found under 1155114915-HW0/d/ii:
ierg5730-gcp-key@vm-master:~/hadoop$ hadoop jar java-word-count/WordCount.jar WordCount /WordCount /WordCount/output
19/01/20 09:46:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 09:46:08 INFO input.FileInputFormat: Total input files to process : 1
19/01/20 09:46:08 INFO mapreduce.JobSubmitter: number of splits:4
19/01/20 09:46:09 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/20 09:46:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547976927340_0006
19/01/20 09:46:10 INFO impl.YarnClientImpl: Submitted application application_1547976927340_0006
19/01/20 09:46:10 INFO mapreduce.Job: The url to track the job: http://vm-master:8088/proxy/application_1547976927340_0006/
19/01/20 09:46:10 INFO mapreduce.Job: Running job: job_1547976927340_0006
19/01/20 09:46:24 INFO mapreduce.Job: Job job_1547976927340_0006 running in uber mode : false
19/01/20 09:46:24 INFO mapreduce.Job: map 0% reduce 0%
19/01/20 09:46:54 INFO mapreduce.Job: map 27% reduce 0%
19/01/20 09:47:05 INFO mapreduce.Job: map 36% reduce 0%
19/01/20 09:47:25 INFO mapreduce.Job: map 40% reduce 0%
19/01/20 09:47:31 INFO mapreduce.Job: map 44% reduce 0%
19/01/20 09:47:39 INFO mapreduce.Job: map 44% reduce 8%
19/01/20 09:48:01 INFO mapreduce.Job: map 49% reduce 8%
19/01/20 09:48:07 INFO mapreduce.Job: map 53% reduce 8%
19/01/20 09:48:57 INFO mapreduce.Job: map 55% reduce 8%
19/01/20 09:49:03 INFO mapreduce.Job: map 59% reduce 8%
19/01/20 09:49:09 INFO mapreduce.Job: map 62% reduce 8%
19/01/20 09:50:00 INFO mapreduce.Job: map 65% reduce 8%
19/01/20 09:50:05 INFO mapreduce.Job: map 68% reduce 8%
19/01/20 09:50:06 INFO mapreduce.Job: map 71% reduce 8%
19/01/20 09:50:36 INFO mapreduce.Job: map 75% reduce 8%
19/01/20 09:50:42 INFO mapreduce.Job: map 78% reduce 8%
19/01/20 09:50:48 INFO mapreduce.Job: map 80% reduce 8%
19/01/20 09:50:54 INFO mapreduce.Job: map 82% reduce 8%
19/01/20 09:50:58 INFO mapreduce.Job: map 83% reduce 8%
19/01/20 09:51:00 INFO mapreduce.Job: map 83% reduce 17%
19/01/20 09:51:01 INFO mapreduce.Job: map 85% reduce 17%
19/01/20 09:51:06 INFO mapreduce.Job: map 86% reduce 17%
19/01/20 09:51:07 INFO mapreduce.Job: map 89% reduce 17%
19/01/20 09:51:12 INFO mapreduce.Job: map 91% reduce 17%
19/01/20 09:51:13 INFO mapreduce.Job: map 93% reduce 17%
19/01/20 09:51:17 INFO mapreduce.Job: map 95% reduce 17%
19/01/20 09:51:18 INFO mapreduce.Job: map 97% reduce 17%
19/01/20 09:51:23 INFO mapreduce.Job: map 100% reduce 17%
19/01/20 09:51:24 INFO mapreduce.Job: map 100% reduce 67%
19/01/20 09:51:30 INFO mapreduce.Job: map 100% reduce 79%
19/01/20 09:51:36 INFO mapreduce.Job: map 100% reduce 96%
19/01/20 09:51:38 INFO mapreduce.Job: map 100% reduce 100%
19/01/20 09:51:38 INFO mapreduce.Job: Job job_1547976927340_0006 completed successfully
19/01/20 09:51:39 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=354682329
FILE: Number of bytes written=472468510
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=420032928
HDFS: Number of bytes written=40736746
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=3
Launched map tasks=6
Launched reduce tasks=1
Data-local map tasks=6
Total time spent by all maps in occupied slots (ms)=1388495
Total time spent by all reduces in occupied slots (ms)=270804
Total time spent by all map tasks (ms)=1388495
Total time spent by all reduce tasks (ms)=270804
Total vcore-milliseconds taken by all map tasks=1388495
Total vcore-milliseconds taken by all reduce tasks=270804
Total megabyte-milliseconds taken by all map tasks=1421818880
Total megabyte-milliseconds taken by all reduce tasks=277303296
Map-Reduce Framework
Map input records=3474848
Map output records=53108124
Map output bytes=602222332
Map output materialized bytes=116792741
Input split bytes=456
Combine input records=70677631
Combine output records=26100472
Reduce input groups=4041135
Reduce shuffle bytes=116792741
Reduce input records=8530965
Reduce output records=4041135
Spilled Records=34631437
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=9245
CPU time spent (ms)=247070
Physical memory (bytes) snapshot=1396428800
Virtual memory (bytes) snapshot=4191748096
Total committed heap usage (bytes)=977272832
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=420032472
File Output Format Counters
Bytes Written=40736746
The result of this MapReduce program can be found under 1155114915-HW0/d/ii.
Comparison of word-count time between Hadoop Streaming (Python) and Hadoop Java:
The Python result:
Total time spent by all map tasks (ms)=1672971
Total time spent by all reduce tasks (ms)=466287
The Java result:
Total time spent by all map tasks (ms)=1388495
Total time spent by all reduce tasks (ms)=270804
The Java map tasks took about 17% less time than the Python map tasks (1388495 ms vs 1672971 ms), and the Java reduce tasks took about 42% less time (270804 ms vs 466287 ms). Much of the reduce-side gap comes from the combiner: the Java job runs IntSumReducer as a combiner (Combine input records=70677631), shrinking the materialized map output to about 117 MB, while the streaming job uses no combiner (Combine input records=0) and shuffles about 602 MB.
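The exact percentages can be recomputed from the "Total time spent" counters of the two jobs (a small Python 3 check):

```python
# Recompute the map/reduce time savings from the two jobs' counters (ms).
py_map, py_reduce = 1672971, 466287      # Hadoop Streaming (Python) job
java_map, java_reduce = 1388495, 270804  # Java job

map_saving = 1 - java_map / py_map
reduce_saving = 1 - java_reduce / py_reduce
print("map: %.0f%% less" % (100 * map_saving))        # 17% less
print("reduce: %.0f%% less" % (100 * reduce_saving))  # 42% less
```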