1. The WordCount example needs the following three jar packages on the classpath
Add the following lines to /etc/profile:
# classpath for compiling and packaging the WordCount program
export CLASSPATH="${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.7.jar:${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar:${HADOOP_HOME}/share/hadoop/common/lib/commons-cli-1.2.jar"
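Before compiling, it is worth sanity-checking that the classpath really contains the three jars. A minimal sketch, assuming HADOOP_HOME points at a Hadoop 2.7.7 install under /home/hadoop/bigdata/hadoop (the path used in the session below):

```shell
# Assumed install path -- adjust to your own HADOOP_HOME.
HADOOP_HOME=/home/hadoop/bigdata/hadoop
CLASSPATH="${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.7.7.jar:${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar:${HADOOP_HOME}/share/hadoop/common/lib/commons-cli-1.2.jar"

# Print each classpath entry on its own line, then count them (should be 3).
echo "$CLASSPATH" | tr ':' '\n'
echo "$CLASSPATH" | tr ':' '\n' | wc -l
```

If the count is not 3, or any of the listed files does not exist on disk, javac will fail later with "package org.apache.hadoop.* does not exist".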
PS: The WordCount source code and a walkthrough are available here.
2. Compile, package, and run WordCount
First create a java_code working directory:
[hadoop@master ~]$mkdir java
[hadoop@master ~]$ mv java/ java_code
[hadoop@master ~]$ ls
bigdata java_code packages python
[hadoop@master ~]$ cd java_code/
[hadoop@master java_code]$ ls
[hadoop@master java_code]$ vim WordCount.java
[hadoop@master java_code]$ ls
WordCount.java
[hadoop@master java_code]$ javac WordCount.java
Note: WordCount.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
[hadoop@master java_code]$ ls
WordCount.class WordCount$IntSumReducer.class WordCount.java WordCount$TokenizerMapper.class
[hadoop@master java_code]$ jar -cvf WordCount.jar ./WordCount*.class
added manifest
adding: WordCount.class(in = 1820) (out= 987)(deflated 45%)
adding: WordCount$IntSumReducer.class(in = 1739) (out= 737)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1736) (out= 753)(deflated 56%)
[hadoop@master java_code]$ ls
WordCount.class WordCount.jar WordCount$TokenizerMapper.class
WordCount$IntSumReducer.class WordCount.java
[hadoop@master java_code]$ mkdir input
[hadoop@master java_code]$ echo "this is a test" > ./input/file1
[hadoop@master java_code]$ echo "this is a test,too" > ./input/file2
[hadoop@master java_code]$ ls input/
file1 file2
[hadoop@master java_code]$ ls
input WordCount$IntSumReducer.class WordCount.java
WordCount.class WordCount.jar WordCount$TokenizerMapper.class
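Before involving the cluster, the expected result can be previewed with plain shell tools. This sketch mimics the mapper's whitespace-only tokenization (Hadoop's WordCount uses StringTokenizer, so "test,too" stays a single token) followed by per-word summing, i.e. a local stand-in for the TokenizerMapper / IntSumReducer pair:

```shell
# Recreate the two sample input files.
mkdir -p input
echo "this is a test"     > input/file1
echo "this is a test,too" > input/file2

# Split on whitespace, then count occurrences of each token.
cat input/file1 input/file2 | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2 "\t" $1}'
```

The output matches the part-r-00000 produced by the job at the end of this section: a 2, is 2, test 1, test,too 1, this 2.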
[hadoop@master ~]$ ./bigdata/hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
19/04/22 16:30:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-namenode-master.out
node1: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-node1.out
node2: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-node2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
19/04/22 16:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-resourcemanager-master.out
node2: nodemanager running as process 7767. Stop it first.
node1: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-node1.out
[hadoop@master ~]$ cd java_code/
[hadoop@master java_code]$ ls
input WordCount$IntSumReducer.class WordCount.java
WordCount.class WordCount.jar WordCount$TokenizerMapper.class
[hadoop@master java_code]$ hdfs dfs -put input/ /
19/04/22 16:36:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master java_code]$ hadoop fs -ls /input
19/04/22 16:37:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 hadoop supergroup 15 2019-04-22 16:36 /input/file1
-rw-r--r-- 2 hadoop supergroup 19 2019-04-22 16:36 /input/file2
[hadoop@master java_code]$ hadoop jar WordCount.jar WordCount input/ output
19/04/22 16:39:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/22 16:39:08 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.40.160:8032
19/04/22 16:39:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1555921886377_0002
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://master:9000/user/hadoop/input
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
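The failure above is a path-resolution issue, not a bug in the job: hadoop jar was given the relative path input/, and Hadoop resolves relative HDFS paths against the user's home directory /user/&lt;username&gt;, not against /. Since the files were uploaded to /input, the job looked in /user/hadoop/input and found nothing. A small sketch of the resolution rule (the helper name is hypothetical; assumes the default /user/&lt;username&gt; home-directory convention):

```shell
# resolve_hdfs_path: hypothetical helper mimicking how Hadoop expands
# a relative HDFS path against the user's home directory.
resolve_hdfs_path() {
    user="$1"; path="$2"
    case "$path" in
        /*) echo "$path" ;;                      # absolute: used as-is
        *)  echo "/user/${user}/${path%/}" ;;    # relative: under /user/<username>
    esac
}

resolve_hdfs_path hadoop "input/"    # -> /user/hadoop/input
resolve_hdfs_path hadoop "/input"    # -> /input
```

The commands that follow apply exactly this fix: create /user/hadoop/ and move /input underneath it so the relative path resolves correctly.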
[hadoop@master java_code]$ hadoop fs -mkdir -p /user/hadoop/
19/04/22 16:43:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master java_code]$ hadoop fs -mv /input /user/hadoop/
19/04/22 16:44:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master java_code]$ hadoop fs -ls /user/hadoop/input
19/04/22 16:44:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 hadoop supergroup 15 2019-04-22 16:36 /user/hadoop/input/file1
-rw-r--r-- 2 hadoop supergroup 19 2019-04-22 16:36 /user/hadoop/input/file2
[hadoop@master java_code]$ hadoop jar WordCount.jar WordCount input/ output
19/04/22 16:44:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/22 16:44:39 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.40.160:8032
19/04/22 16:44:42 INFO input.FileInputFormat: Total input paths to process : 2
19/04/22 16:44:42 INFO mapreduce.JobSubmitter: number of splits:2
19/04/22 16:44:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1555921886377_0003
19/04/22 16:44:44 INFO impl.YarnClientImpl: Submitted application application_1555921886377_0003
19/04/22 16:44:44 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1555921886377_0003/
19/04/22 16:44:44 INFO mapreduce.Job: Running job: job_1555921886377_0003
19/04/22 16:45:14 INFO mapreduce.Job: Job job_1555921886377_0003 running in uber mode : false
19/04/22 16:45:14 INFO mapreduce.Job: map 0% reduce 0%
19/04/22 16:45:47 INFO mapreduce.Job: map 100% reduce 0%
19/04/22 16:46:04 INFO mapreduce.Job: map 100% reduce 100%
19/04/22 16:46:05 INFO mapreduce.Job: Job job_1555921886377_0003 completed successfully
19/04/22 16:46:05 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=88
FILE: Number of bytes written=367972
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=248
HDFS: Number of bytes written=34
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=60870
Total time spent by all reduces in occupied slots (ms)=13371
Total time spent by all map tasks (ms)=60870
Total time spent by all reduce tasks (ms)=13371
Total vcore-milliseconds taken by all map tasks=60870
Total vcore-milliseconds taken by all reduce tasks=13371
Total megabyte-milliseconds taken by all map tasks=62330880
Total megabyte-milliseconds taken by all reduce tasks=13691904
Map-Reduce Framework
Map input records=2
Map output records=8
Map output bytes=66
Map output materialized bytes=94
Input split bytes=214
Combine input records=8
Combine output records=8
Reduce input groups=5
Reduce shuffle bytes=94
Reduce input records=8
Reduce output records=5
Spilled Records=16
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=2741
CPU time spent (ms)=5760
Physical memory (bytes) snapshot=466411520
Virtual memory (bytes) snapshot=6235168768
Total committed heap usage (bytes)=265158656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=34
File Output Format Counters
Bytes Written=34
[hadoop@master java_code]$ hadoop fs -ls -R /user/hadoop/
19/04/22 16:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - hadoop supergroup 0 2019-04-22 16:36 /user/hadoop/input
-rw-r--r-- 2 hadoop supergroup 15 2019-04-22 16:36 /user/hadoop/input/file1
-rw-r--r-- 2 hadoop supergroup 19 2019-04-22 16:36 /user/hadoop/input/file2
drwxr-xr-x - hadoop supergroup 0 2019-04-22 16:46 /user/hadoop/output
-rw-r--r-- 2 hadoop supergroup 0 2019-04-22 16:46 /user/hadoop/output/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 34 2019-04-22 16:46 /user/hadoop/output/part-r-00000
[hadoop@master java_code]$ hadoop fs -text /user/hadoop/output/part-r-00000
19/04/22 16:47:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
a 2
is 2
test 1
test,too 1
this 2
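As a final consistency check, the 34 bytes the job counters report as written to HDFS (and that hadoop fs -ls shows for part-r-00000) are exactly the size of this tab-separated result:

```shell
# Each line is "<word><TAB><count><NEWLINE>"; the sizes are 4+5+7+11+7 = 34 bytes.
printf 'a\t2\nis\t2\ntest\t1\ntest,too\t1\nthis\t2\n' | wc -c
```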