hadoop 1 - WordCount in standalone (single-machine) mode

1.   References

There are plenty of similar articles online, so please search the web first.

The following articles were the main references:

Setting up a Hadoop environment on Ubuntu (standalone + pseudo-distributed mode) - 狂奔的蜗牛 - CSDN blog

http://blog.csdn.net/hitwengqi/article/details/8008203

 

A detailed walkthrough of running WordCount on Hadoop

http://www.cnblogs.com/madyina/p/3708153.html

2.   Learning environment

Windows 7 Home Premium, 64-bit

VMware Workstation v12.1.1

Guest OS installed in the VM: ubuntu-16.04-desktop-amd64.iso

hadoop-2.7.2.tar.gz

 

3.   Problems encountered

3.1. mkdir hdfsInput fails: localhost:9000 failed on connection exception

 

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput

17/01/08 07:05:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

mkdir: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

hadoop@ubuntu:/usr/local/hadoop$ ll hdfsInput

ls: cannot access 'hdfsInput': No such file or directory

hadoop@ubuntu:/usr/local/hadoop$

Solutions to problems commonly hit while configuring a Hadoop environment:

http://blog.csdn.net/yutianzuijin/article/details/9455319

 

 

When running the WordCount example in a pseudo-distributed environment, the error above means Hadoop has not been started; start it with the start-all.sh script.

 

hadoop@ubuntu:/usr/local/hadoop/sbin$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

17/01/08 07:58:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out

localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out

17/01/08 07:59:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out

localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out

hadoop@ubuntu:/usr/local/hadoop/sbin$ cd ..

hadoop@ubuntu:/usr/local/hadoop$ hadoop fs -mkdir hdfsInput

17/01/08 07:59:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

mkdir: `hdfsInput': No such file or directory

hadoop@ubuntu:/usr/local/hadoop$

 

That did resolve the connection-refused error. (The second mkdir now fails with `hdfsInput': No such file or directory for a different reason: in Hadoop 2.x a relative HDFS path resolves under /user/<username>, which has to exist first; hadoop fs -mkdir -p creates the missing parents.)

The root cause of the original error: a pseudo-distributed environment had been configured on this Linux machine, so the standalone-mode way of running the example no longer worked.
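To sanity-check the fix, a minimal sketch (using the directory names from this article) is to list the running daemons with jps and create the HDFS working directory together with its parents:

# After start-all.sh the HDFS/YARN daemons should all be listed:
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager.
jps

# In Hadoop 2.x a relative path such as hdfsInput resolves to
# /user/<username>/hdfsInput, so create the parent directories as well.
bin/hadoop fs -mkdir -p /user/hadoop/hdfsInput
bin/hadoop fs -ls /user/hadoop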

3.2. hadoop: command not found

hadoop@ubuntu:/usr/local/hadoop$ hadoop

hadoop: command not found

 

It can be invoked with the relative path instead: hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop

Or add the Hadoop bin directory to PATH:

export PATH=$PATH:/usr/local/hadoop/bin

 

Make the environment-variable configuration take effect with source:

source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
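The export above only lasts for the current shell session. A common way to make it persistent (not from the original post, just the usual approach) is to append it to the hadoop user's ~/.bashrc and reload it:

# Add the Hadoop bin and sbin directories to PATH for future shells.
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> ~/.bashrc

# Reload so the current shell picks the change up immediately.
source ~/.bashrc

# Verify the command is now found without the bin/ prefix.
hadoop version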

3.3. Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar

The path was wrong. In Hadoop 1.x the examples JAR (hadoop-examples-1.2.1.jar) sat at the top of the installation directory; in Hadoop 2.7.2 it lives under share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar.
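If in doubt about where the examples JAR lives in a given install, a quick search finds it (sketch):

# Locate the MapReduce examples JAR shipped with this release.
find /usr/local/hadoop -name 'hadoop-mapreduce-examples-*.jar'
# For Hadoop 2.7.2 this should print:
# /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar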

 

3.4. Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

17/02/05 07:27:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/02/05 07:27:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

17/02/05 07:27:12 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1486308315415_0001

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput

 

Cause:

http://blog.csdn.net/wang_zhenwei/article/details/47444335

 

Since for now we only need to verify standalone mode, we can simply delete the file that the pseudo-distributed setup requires:

/usr/local/hadoop/etc/hadoop$ rm core-site.xml
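What triggers the HDFS lookup in the first place is the fs.defaultFS setting from the pseudo-distributed setup. The exact contents of core-site.xml are not shown in the post, but a typical pseudo-distributed version points it at hdfs://localhost:9000; a gentler alternative to deleting the file is to rename it so it can be restored later:

cd /usr/local/hadoop/etc/hadoop

# A typical pseudo-distributed core-site.xml (assumed, not shown in the post):
# <configuration>
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://localhost:9000</value>
#   </property>
# </configuration>
# Without it, Hadoop falls back to the local filesystem (file:///),
# which is exactly what standalone mode needs.

# Back the file up instead of deleting it, so the pseudo-distributed
# configuration can be restored later.
mv core-site.xml core-site.xml.pseudo.bak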

4.   Notes

4.1. Ubuntu does not allow root to log in over SSH by default, so create a new user named hadoop

4.2. Extract Hadoop to /usr/local/hadoop

Some guides extract the downloaded Hadoop archive into /home/hadoop under the name hadoop.

--- That is a bit awkward: the directory name collides with the hadoop user's home directory.

Put it under /usr/local/hadoop instead.
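A sketch of the extraction and ownership steps, assuming the tarball was downloaded to the hadoop user's home directory (the download location is not stated in the post):

# Unpack the release and move it into place.
cd ~
tar -xzf hadoop-2.7.2.tar.gz
sudo mv hadoop-2.7.2 /usr/local/hadoop

# Give the hadoop user ownership of the whole tree.
sudo chown -R hadoop:hadoop /usr/local/hadoop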

 

5.   Run output


Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-22-generic x86_64)

 

 * Documentation:  https://help.ubuntu.com/

 

118 packages can be updated.

18 updates are security updates.

 

Last login: Mon May 30 08:45:33 2016 from 192.168.202.1

hadoop@ubuntu:~$ pwd

/home/hadoop

hadoop@ubuntu:~$ cd /usr/local/hadoop/

hadoop@ubuntu:/usr/local/hadoop$ ll

total 64

drwxr-xr-x 10 hadoop hadoop  4096 May 28 08:59 ./

drwxr-xr-x 12 root   root    4096 May 28 09:15 ../

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 bin/

drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 etc/

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 include/

drwxrwxr-x  2 hadoop hadoop  4096 May 28 08:59 input/

drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 lib/

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 libexec/

-rw-r--r--  1 hadoop hadoop 15429 Jan 25 16:20 LICENSE.txt

-rw-r--r--  1 hadoop hadoop   101 Jan 25 16:20 NOTICE.txt

-rw-r--r--  1 hadoop hadoop  1366 Jan 25 16:20 README.txt

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 sbin/

drwxr-xr-x  4 hadoop hadoop  4096 Jan 25 16:20 share/

hadoop@ubuntu:/usr/local/hadoop$ mkdir file

hadoop@ubuntu:/usr/local/hadoop$ cd file

hadoop@ubuntu:/usr/local/hadoop/file$ vi myTest1.txt

Hello world Hello me!

 

~

~

"myTest2.txt" 2L, 24C written                                

hadoop@ubuntu:/usr/local/hadoop/file$ ll

total 16

drwxrwxr-x  2 hadoop hadoop 4096 May 31 08:26 ./

drwxr-xr-x 11 hadoop hadoop 4096 May 31 08:25 ../

-rw-rw-r--  1 hadoop hadoop   23 May 31 08:26 myTest1.txt

-rw-rw-r--  1 hadoop hadoop   24 May 31 08:26 myTest2.txt

hadoop@ubuntu:/usr/local/hadoop/file$ cd ..

hadoop@ubuntu:/usr/local/hadoop$ hadoop

hadoop: command not found

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]

  CLASSNAME            run the class named CLASSNAME

 or

  where COMMAND is one of:

  fs                   run a generic filesystem user client

  version              print the version

  jar <jar>            run a jar file

                       note: please use "yarn jar" to launch

                             YARN applications, not this command.

  checknative [-a|-h]  check native hadoop and compression libraries availability

  distcp <srcurl> <desturl> copy file or directories recursively

  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

  classpath            prints the class path needed to get the

                       Hadoop jar and the required libraries

  credential           interact with credential providers

  daemonlog            get/set the log level for each daemon

  trace                view and modify Hadoop tracing settings

 

Most commands print help when invoked w/o parameters.

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput

hadoop@ubuntu:/usr/local/hadoop$ cp file/* hdfsInput/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsInput hdfsOutput

Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/

doc/    hadoop/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/

doc/    hadoop/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

16/05/31 08:29:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id

16/05/31 08:29:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

16/05/31 08:29:59 INFO input.FileInputFormat: Total input paths to process : 2

16/05/31 08:30:00 INFO mapreduce.JobSubmitter: number of splits:2

16/05/31 08:30:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local694536620_0001

16/05/31 08:30:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/

16/05/31 08:30:00 INFO mapreduce.Job: Running job: job_local694536620_0001

16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Waiting for map tasks

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000000_0

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest2.txt:0+24

16/05/31 08:30:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)

16/05/31 08:30:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100

16/05/31 08:30:00 INFO mapred.MapTask: soft limit at 83886080

16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600

16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600

16/05/31 08:30:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

16/05/31 08:30:00 INFO mapred.LocalJobRunner:

16/05/31 08:30:00 INFO mapred.MapTask: Starting flush of map output

16/05/31 08:30:00 INFO mapred.MapTask: Spilling map output

16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufend = 39; bufvoid = 104857600

16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600

16/05/31 08:30:00 INFO mapred.MapTask: Finished spill 0

16/05/31 08:30:00 INFO mapred.Task: Task:attempt_local694536620_0001_m_000000_0 is done. And is in the process of committing

16/05/31 08:30:00 INFO mapred.LocalJobRunner: map

16/05/31 08:30:00 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000000_0' done.

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000000_0

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000001_0

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest1.txt:0+23

16/05/31 08:30:01 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)

16/05/31 08:30:01 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100

16/05/31 08:30:01 INFO mapred.MapTask: soft limit at 83886080

16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600

16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396; length = 6553600

16/05/31 08:30:01 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

16/05/31 08:30:01 INFO mapred.LocalJobRunner:

16/05/31 08:30:01 INFO mapred.MapTask: Starting flush of map output

16/05/31 08:30:01 INFO mapred.MapTask: Spilling map output

16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid = 104857600

16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600

16/05/31 08:30:01 INFO mapred.MapTask: Finished spill 0

16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_m_000001_0 is done. And is in the process of committing

16/05/31 08:30:01 INFO mapred.LocalJobRunner: map

16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000001_0' done.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000001_0

16/05/31 08:30:01 INFO mapred.LocalJobRunner: map task executor complete.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Waiting for reduce tasks

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_r_000000_0

16/05/31 08:30:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:01 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:01 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@47130856

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10

16/05/31 08:30:01 INFO reduce.EventFetcher: attempt_local694536620_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events

16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000001_0 decomp: 36 len: 40 to MEMORY

16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 36 bytes from map-output for attempt_local694536620_0001_m_000001_0

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 36, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->36

16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000000_0 decomp: 37 len: 41 to MEMORY

16/05/31 08:30:01 WARN io.ReadaheadPool: Failed readahead on ifile

EBADF: Bad file descriptor

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)

        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 37 bytes from map-output for attempt_local694536620_0001_m_000000_0

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 37, inMemoryMapOutputs.size() -> 2, commitMemory -> 36, usedMemory ->73

16/05/31 08:30:01 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs

16/05/31 08:30:01 INFO mapred.Merger: Merging 2 sorted segments

16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 57 bytes

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merged 2 segments, 73 bytes to disk to satisfy reduce memory limit

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 1 files, 75 bytes from disk

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce

16/05/31 08:30:01 INFO mapred.Merger: Merging 1 sorted segments

16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 63 bytes

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords

16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_r_000000_0 is done. And is in the process of committing

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO mapred.Task: Task attempt_local694536620_0001_r_000000_0 is allowed to commit now

16/05/31 08:30:01 INFO output.FileOutputCommitter: Saved output of task 'attempt_local694536620_0001_r_000000_0' to file:/usr/local/hadoop/hdfsOutput/_temporary/0/task_local694536620_0001_r_000000

16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce > reduce

16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_r_000000_0' done.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_r_000000_0

16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce task executor complete.

16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 running in uber mode : false

16/05/31 08:30:01 INFO mapreduce.Job:  map 100% reduce 100%

16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 completed successfully

16/05/31 08:30:01 INFO mapreduce.Job: Counters: 30

        File System Counters

                FILE: Number of bytes read=821975

                FILE: Number of bytes written=1666952

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

        Map-Reduce Framework

                Map input records=4

                Map output records=8

                Map output bytes=77

                Map output materialized bytes=81

                Input split bytes=218

                Combine input records=8

                Combine output records=6

                Reduce input groups=4

                Reduce shuffle bytes=81

                Reduce input records=6

                Reduce output records=4

                Spilled Records=12

                Shuffled Maps =2

                Failed Shuffles=0

                Merged Map outputs=2

                GC time elapsed (ms)=37

                Total committed heap usage (bytes)=457912320

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters

                Bytes Read=47

        File Output Format Counters

                Bytes Written=41

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -ls hdfsOutput/

Found 2 items

-rw-r--r--   1 hadoop hadoop          0 2016-05-31 08:30 hdfsOutput/_SUCCESS

-rw-r--r--   1 hadoop hadoop         29 2016-05-31 08:30 hdfsOutput/part-r-00000

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -cat hdfsOutput/part-r-00000

Hello   4

You!    1

me!     1

world   2

hadoop@ubuntu:/usr/local/hadoop$
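Distilled from the transcript above, the minimal standalone-mode WordCount sequence is:

cd /usr/local/hadoop

# 1. Prepare local input files (standalone mode reads the local filesystem).
mkdir -p hdfsInput
cp file/* hdfsInput/

# 2. Run the bundled WordCount example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

# 3. Inspect the result.
bin/hadoop fs -cat hdfsOutput/part-r-00000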

Reposted from: https://www.cnblogs.com/truezq/p/6368926.html
