1. Start Hadoop
start-all.sh
2. Check the cluster status
hdfs dfsadmin -report
Output:
Configured Capacity: 126818975744 (118.11 GB)
Present Capacity: 107885920256 (100.48 GB)
DFS Remaining: 107885756416 (100.48 GB)
DFS Used: 163840 (160 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)
Live datanodes:
Name: 210.31.181.218:50010 (hapslave3)
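A quick health check one might script around the report above: fail loudly if any datanode is dead. This is only a sketch against a saved copy of the report text, since the real command (`hdfs dfsadmin -report`) needs a running cluster; the file path is made up.

```shell
# Parse a saved dfsadmin report and warn if any datanode is dead.
# (Demo path; on a live cluster, pipe `hdfs dfsadmin -report` in directly.)
report=/tmp/dfs_report_demo.txt
cat > "$report" <<'EOF'
Datanodes available: 4 (4 total, 0 dead)
EOF
dead=$(grep -o '[0-9]* dead' "$report" | awk '{print $1}')
if [ "$dead" -eq 0 ]; then
  echo "cluster healthy: no dead datanodes"
else
  echo "WARNING: $dead dead datanode(s)"
fi
```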
3. Create the input and output directories
./bin/hadoop fs -mkdir /input
./bin/hadoop fs -mkdir /output
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -mkdir /input
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -mkdir /output
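One pitfall with the mkdir step: re-running it fails if the directory already exists. The `-p` flag makes it idempotent. This is illustrated with local `mkdir`, whose `-p` behaves like `hadoop fs -mkdir -p` does on HDFS in Hadoop 2.x; the demo path is made up.

```shell
# -p creates missing parents and exits 0 if the directory already exists,
# so the step can be re-run safely. (Local stand-in for hadoop fs -mkdir -p.)
dir=/tmp/wordcount_demo_dirs/input
mkdir -p "$dir"   # first run: creates it (and any parents)
mkdir -p "$dir"   # second run: still succeeds instead of erroring
echo "ok: $dir exists"
```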
4. Upload the input files to the input directory
./bin/hadoop fs -put ~/hadoop/input /input
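The `-put` step copies local files into HDFS. In this sketch a local directory stands in for HDFS, since `./bin/hadoop fs -put` needs a running cluster; the file names and contents are made up for the demo.

```shell
# Simulate uploading local input files to HDFS with a plain copy.
src=/tmp/wc_local_input
dst=/tmp/wc_fake_hdfs_input
mkdir -p "$src" "$dst"
printf 'hello world\n' > "$src/filea.txt"
printf 'hello happy\n' > "$src/fileb.txt"
cp "$src"/*.txt "$dst"/   # stands in for: hadoop fs -put ~/hadoop/input /input
ls "$dst"                 # stands in for: hadoop fs -ls /input
```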
5. List the files
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2014-04-15 10:46 /input
drwxr-xr-x - hadoop supergroup 0 2014-04-15 16:23 /output
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -ls /input
Found 2 items
-rw-r--r-- 3 hadoop supergroup 30 2014-04-15 10:46 /input/filea.txt
-rw-r--r-- 3 hadoop supergroup 42 2014-04-15 10:46 /input/fileb.txt
6. Run WordCount
./bin/hadoop jar /home/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount /input /output
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop jar /home/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount /input /output
7. An error occurred
14/04/15 16:34:32 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/15 16:34:32 WARN security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://210.31.181.211:9000/output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://210.31.181.211:9000/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
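The exception comes from MapReduce's output-spec check: a job refuses to start if its output directory already exists. A common guard is to delete the directory before submitting. Sketched with local paths, since on a real cluster the checks would be `./bin/hadoop fs -test -d /output` and `./bin/hadoop fs -rm -r /output`.

```shell
# Guard against FileAlreadyExistsException: clear the output path first.
out=/tmp/wordcount_demo_output
mkdir -p "$out"          # pretend a stale output dir is left over
if [ -d "$out" ]; then
  rm -r "$out"           # MapReduce requires the output path to NOT exist
fi
echo "output path clear"
```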
8. Delete the output directory
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -rm -r /output
14/04/15 16:39:12 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
9. Run WordCount again
./bin/hadoop jar /home/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount /input /output
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop jar /home/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount /input /output/
14/04/15 16:39:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/15 16:39:20 INFO input.FileInputFormat: Total input paths to process : 2
14/04/15 16:39:20 INFO mapreduce.JobSubmitter: number of splits:2
14/04/15 16:39:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397548796055_0001
14/04/15 16:39:21 INFO impl.YarnClientImpl: Submitted application application_1397548796055_0001
14/04/15 16:39:21 INFO mapreduce.Job: The url to track the job: http://hapmaster:8088/proxy/application_1397548796055_0001/
14/04/15 16:39:21 INFO mapreduce.Job: Running job: job_1397548796055_0001
10. Open the tracking URL
http://hapmaster:8088/cluster/app/application_1397548796055_0001
It displayed something that looked wrong. What was going on?
Fixed by modifying the Hadoop configuration files (the ones under etc/hadoop in the Hadoop install directory), following http://blog.csdn.net/lrongzheni/article/details/23535491
Finally, it works...
hadoop@hapmaster:~/hadoop-2.3.0$ ./bin/hadoop fs -cat /output/part-r-00000
C 1
hapmaster 1
happy 2
hell 1
hello 5
world 2
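The final output can be reproduced locally with a coreutils pipeline that mirrors what the WordCount job computes: tokenize, sort, count. The demo file contents below are only a guess chosen to match the counts above; the real filea.txt/fileb.txt live in HDFS under /input. `LC_ALL=C sort` mirrors Hadoop's byte-order sorting of keys.

```shell
# WordCount rebuilt with coreutils: split on spaces, sort, count duplicates,
# then print "word<TAB>count" like the part-r-00000 file above.
printf 'hello world\nhello happy\n' > /tmp/wc_filea.txt
printf 'hello hapmaster\nhello world happy\nhello hell C\n' > /tmp/wc_fileb.txt
cat /tmp/wc_filea.txt /tmp/wc_fileb.txt \
  | tr -s ' ' '\n' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}' \
  | tee /tmp/wc_demo_out.txt
```

With these guessed inputs the pipeline yields the same six word/count pairs as the job output above, in the same order.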