hadoop 1 - WordCount in standalone (single-machine) mode

1.   References

There are plenty of similar articles online, so please search the web first.

The following articles were the main references:

Setting up a Hadoop environment on Ubuntu (standalone + pseudo-distributed mode) - 狂奔的蜗牛 - CSDN blog

http://blog.csdn.net/hitwengqi/article/details/8008203

 

A detailed walkthrough of running WordCount on Hadoop

http://www.cnblogs.com/madyina/p/3708153.html

2.   Learning environment

Windows 7 Home Premium, 64-bit

VMware Workstation v12.1.1

Guest OS installed in the VM: ubuntu-16.04-desktop-amd64.iso

hadoop-2.7.2.tar.gz

 

3.   Problems encountered

3.1. mkdir hdfsInput fails: localhost:9000 failed on connection exception

 

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput

17/01/08 07:05:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

mkdir: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

hadoop@ubuntu:/usr/local/hadoop$ ll hdfsInput

ls: cannot access 'hdfsInput': No such file or directory

hadoop@ubuntu:/usr/local/hadoop$

Solutions to problems commonly hit while configuring a Hadoop environment:

http://blog.csdn.net/yutianzuijin/article/details/9455319

 

 

When running the WordCount example in a pseudo-distributed environment, the error above means Hadoop has not been started; start it with the start-all.sh script.

 

hadoop@ubuntu:/usr/local/hadoop/sbin$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

17/01/08 07:58:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out

localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out

17/01/08 07:59:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out

localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out

hadoop@ubuntu:/usr/local/hadoop/sbin$ cd ..

hadoop@ubuntu:/usr/local/hadoop$ hadoop fs -mkdir hdfsInput

17/01/08 07:59:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

mkdir: `hdfsInput': No such file or directory

hadoop@ubuntu:/usr/local/hadoop$

 

That did resolve the connection-refused error. (The second mkdir now fails with `hdfsInput': No such file or directory for a different reason: in Hadoop 2.x a relative HDFS path resolves under /user/<username>, which has to exist first; hadoop fs -mkdir -p creates the missing parents.)

The root cause of the original error: a pseudo-distributed environment had been configured on this Linux machine, so the standalone-mode way of running the example no longer worked.
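To sanity-check the fix, a minimal sketch (using the directory names from this article) is to list the running daemons with jps and create the HDFS working directory together with its parents:

# After start-all.sh the HDFS/YARN daemons should all be listed:
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager.
jps

# In Hadoop 2.x a relative path such as hdfsInput resolves to
# /user/<username>/hdfsInput, so create the parent directories as well.
bin/hadoop fs -mkdir -p /user/hadoop/hdfsInput
bin/hadoop fs -ls /user/hadoop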

3.2. hadoop: command not found

hadoop@ubuntu:/usr/local/hadoop$ hadoop

hadoop: command not found

 

It can be invoked with the relative path instead: hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop

Or add the Hadoop bin directory to PATH:

export PATH=$PATH:/usr/local/hadoop/bin

 

Make the environment-variable configuration take effect with source:

source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
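The export above only lasts for the current shell session. A common way to make it persistent (not from the original post, just the usual approach) is to append it to the hadoop user's ~/.bashrc and reload it:

# Add the Hadoop bin and sbin directories to PATH for future shells.
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> ~/.bashrc

# Reload so the current shell picks the change up immediately.
source ~/.bashrc

# Verify the command is now found without the bin/ prefix.
hadoop version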

3.3. Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar

The path was wrong. In Hadoop 1.x the examples JAR (hadoop-examples-1.2.1.jar) sat at the top of the installation directory; in Hadoop 2.7.2 it lives under share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar.
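If in doubt about where the examples JAR lives in a given install, a quick search finds it (sketch):

# Locate the MapReduce examples JAR shipped with this release.
find /usr/local/hadoop -name 'hadoop-mapreduce-examples-*.jar'
# For Hadoop 2.7.2 this should print:
# /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar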

 

3.4. Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

17/02/05 07:27:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/02/05 07:27:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

17/02/05 07:27:12 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1486308315415_0001

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput

 

Cause:

http://blog.csdn.net/wang_zhenwei/article/details/47444335

 

Since for now we only need to verify standalone mode, we can simply delete the file that the pseudo-distributed setup requires:

/usr/local/hadoop/etc/hadoop$ rm core-site.xml
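What triggers the HDFS lookup in the first place is the fs.defaultFS setting from the pseudo-distributed setup. The exact contents of core-site.xml are not shown in the post, but a typical pseudo-distributed version points it at hdfs://localhost:9000; a gentler alternative to deleting the file is to rename it so it can be restored later:

cd /usr/local/hadoop/etc/hadoop

# A typical pseudo-distributed core-site.xml (assumed, not shown in the post):
# <configuration>
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://localhost:9000</value>
#   </property>
# </configuration>
# Without it, Hadoop falls back to the local filesystem (file:///),
# which is exactly what standalone mode needs.

# Back the file up instead of deleting it, so the pseudo-distributed
# configuration can be restored later.
mv core-site.xml core-site.xml.pseudo.bak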

4.   Notes

4.1. Ubuntu does not allow root to log in over SSH by default, so create a new user named hadoop

4.2. Extract Hadoop to /usr/local/hadoop

Some guides extract the downloaded Hadoop archive into /home/hadoop under the name hadoop.

--- That is a bit awkward: the directory name collides with the hadoop user's home directory.

Put it under /usr/local/hadoop instead.
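A sketch of the extraction and ownership steps, assuming the tarball was downloaded to the hadoop user's home directory (the download location is not stated in the post):

# Unpack the release and move it into place.
cd ~
tar -xzf hadoop-2.7.2.tar.gz
sudo mv hadoop-2.7.2 /usr/local/hadoop

# Give the hadoop user ownership of the whole tree.
sudo chown -R hadoop:hadoop /usr/local/hadoop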

 

5.   Run output


Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-22-generic x86_64)

 

 * Documentation:  https://help.ubuntu.com/

 

118 packages can be updated.

18 updates are security updates.

 

Last login: Mon May 30 08:45:33 2016 from 192.168.202.1

hadoop@ubuntu:~$ pwd

/home/hadoop

hadoop@ubuntu:~$ cd /usr/local/hadoop/

hadoop@ubuntu:/usr/local/hadoop$ ll

total 64

drwxr-xr-x 10 hadoop hadoop  4096 May 28 08:59 ./

drwxr-xr-x 12 root   root    4096 May 28 09:15 ../

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 bin/

drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 etc/

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 include/

drwxrwxr-x  2 hadoop hadoop  4096 May 28 08:59 input/

drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 lib/

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 libexec/

-rw-r--r--  1 hadoop hadoop 15429 Jan 25 16:20 LICENSE.txt

-rw-r--r--  1 hadoop hadoop   101 Jan 25 16:20 NOTICE.txt

-rw-r--r--  1 hadoop hadoop  1366 Jan 25 16:20 README.txt

drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 sbin/

drwxr-xr-x  4 hadoop hadoop  4096 Jan 25 16:20 share/

hadoop@ubuntu:/usr/local/hadoop$ mkdir file

hadoop@ubuntu:/usr/local/hadoop$ cd file

hadoop@ubuntu:/usr/local/hadoop/file$ vi myTest1.txt

Hello world Hello me!

 

~

~

"myTest2.txt" 2L, 24C written                                

hadoop@ubuntu:/usr/local/hadoop/file$ ll

total 16

drwxrwxr-x  2 hadoop hadoop 4096 May 31 08:26 ./

drwxr-xr-x 11 hadoop hadoop 4096 May 31 08:25 ../

-rw-rw-r--  1 hadoop hadoop   23 May 31 08:26 myTest1.txt

-rw-rw-r--  1 hadoop hadoop   24 May 31 08:26 myTest2.txt

hadoop@ubuntu:/usr/local/hadoop/file$ cd ..

hadoop@ubuntu:/usr/local/hadoop$ hadoop

hadoop: command not found

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]

  CLASSNAME            run the class named CLASSNAME

 or

  where COMMAND is one of:

  fs                   run a generic filesystem user client

  version              print the version

  jar <jar>            run a jar file

                       note: please use "yarn jar" to launch

                             YARN applications, not this command.

  checknative [-a|-h]  check native hadoop and compression libraries availability

  distcp <srcurl> <desturl> copy file or directories recursively

  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

  classpath            prints the class path needed to get the

                       Hadoop jar and the required libraries

  credential           interact with credential providers

  daemonlog            get/set the log level for each daemon

  trace                view and modify Hadoop tracing settings

 

Most commands print help when invoked w/o parameters.

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput

hadoop@ubuntu:/usr/local/hadoop$ cp file/* hdfsInput/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsInput hdfsOutput

Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/

doc/    hadoop/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/

doc/    hadoop/

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

16/05/31 08:29:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id

16/05/31 08:29:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

16/05/31 08:29:59 INFO input.FileInputFormat: Total input paths to process : 2

16/05/31 08:30:00 INFO mapreduce.JobSubmitter: number of splits:2

16/05/31 08:30:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local694536620_0001

16/05/31 08:30:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/

16/05/31 08:30:00 INFO mapreduce.Job: Running job: job_local694536620_0001

16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Waiting for map tasks

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000000_0

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest2.txt:0+24

16/05/31 08:30:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)

16/05/31 08:30:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100

16/05/31 08:30:00 INFO mapred.MapTask: soft limit at 83886080

16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600

16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600

16/05/31 08:30:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

16/05/31 08:30:00 INFO mapred.LocalJobRunner:

16/05/31 08:30:00 INFO mapred.MapTask: Starting flush of map output

16/05/31 08:30:00 INFO mapred.MapTask: Spilling map output

16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufend = 39; bufvoid = 104857600

16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600

16/05/31 08:30:00 INFO mapred.MapTask: Finished spill 0

16/05/31 08:30:00 INFO mapred.Task: Task:attempt_local694536620_0001_m_000000_0 is done. And is in the process of committing

16/05/31 08:30:00 INFO mapred.LocalJobRunner: map

16/05/31 08:30:00 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000000_0' done.

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000000_0

16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000001_0

16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest1.txt:0+23

16/05/31 08:30:01 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)

16/05/31 08:30:01 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100

16/05/31 08:30:01 INFO mapred.MapTask: soft limit at 83886080

16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600

16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396; length = 6553600

16/05/31 08:30:01 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

16/05/31 08:30:01 INFO mapred.LocalJobRunner:

16/05/31 08:30:01 INFO mapred.MapTask: Starting flush of map output

16/05/31 08:30:01 INFO mapred.MapTask: Spilling map output

16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid = 104857600

16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600

16/05/31 08:30:01 INFO mapred.MapTask: Finished spill 0

16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_m_000001_0 is done. And is in the process of committing

16/05/31 08:30:01 INFO mapred.LocalJobRunner: map

16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000001_0' done.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000001_0

16/05/31 08:30:01 INFO mapred.LocalJobRunner: map task executor complete.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Waiting for reduce tasks

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_r_000000_0

16/05/31 08:30:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

16/05/31 08:30:01 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

16/05/31 08:30:01 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@47130856

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10

16/05/31 08:30:01 INFO reduce.EventFetcher: attempt_local694536620_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events

16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000001_0 decomp: 36 len: 40 to MEMORY

16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 36 bytes from map-output for attempt_local694536620_0001_m_000001_0

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 36, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->36

16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000000_0 decomp: 37 len: 41 to MEMORY

16/05/31 08:30:01 WARN io.ReadaheadPool: Failed readahead on ifile

EBADF: Bad file descriptor

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)

        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)

        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 37 bytes from map-output for attempt_local694536620_0001_m_000000_0

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 37, inMemoryMapOutputs.size() -> 2, commitMemory -> 36, usedMemory ->73

16/05/31 08:30:01 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs

16/05/31 08:30:01 INFO mapred.Merger: Merging 2 sorted segments

16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 57 bytes

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merged 2 segments, 73 bytes to disk to satisfy reduce memory limit

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 1 files, 75 bytes from disk

16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce

16/05/31 08:30:01 INFO mapred.Merger: Merging 1 sorted segments

16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 63 bytes

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords

16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_r_000000_0 is done. And is in the process of committing

16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.

16/05/31 08:30:01 INFO mapred.Task: Task attempt_local694536620_0001_r_000000_0 is allowed to commit now

16/05/31 08:30:01 INFO output.FileOutputCommitter: Saved output of task 'attempt_local694536620_0001_r_000000_0' to file:/usr/local/hadoop/hdfsOutput/_temporary/0/task_local694536620_0001_r_000000

16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce > reduce

16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_r_000000_0' done.

16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_r_000000_0

16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce task executor complete.

16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 running in uber mode : false

16/05/31 08:30:01 INFO mapreduce.Job:  map 100% reduce 100%

16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 completed successfully

16/05/31 08:30:01 INFO mapreduce.Job: Counters: 30

        File System Counters

                FILE: Number of bytes read=821975

                FILE: Number of bytes written=1666952

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

        Map-Reduce Framework

                Map input records=4

                Map output records=8

                Map output bytes=77

                Map output materialized bytes=81

                Input split bytes=218

                Combine input records=8

                Combine output records=6

                Reduce input groups=4

                Reduce shuffle bytes=81

                Reduce input records=6

                Reduce output records=4

                Spilled Records=12

                Shuffled Maps =2

                Failed Shuffles=0

                Merged Map outputs=2

                GC time elapsed (ms)=37

                Total committed heap usage (bytes)=457912320

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters

                Bytes Read=47

        File Output Format Counters

                Bytes Written=41

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -ls hdfsOutput/

Found 2 items

-rw-r--r--   1 hadoop hadoop          0 2016-05-31 08:30 hdfsOutput/_SUCCESS

-rw-r--r--   1 hadoop hadoop         29 2016-05-31 08:30 hdfsOutput/part-r-00000

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -cat hdfsOutput/part-r-00000

Hello   4

You!    1

me!     1

world   2

hadoop@ubuntu:/usr/local/hadoop$
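Distilled from the transcript above, the minimal standalone-mode WordCount sequence is:

cd /usr/local/hadoop

# 1. Prepare local input files (standalone mode reads the local filesystem).
mkdir -p hdfsInput
cp file/* hdfsInput/

# 2. Run the bundled WordCount example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput

# 3. Inspect the result.
bin/hadoop fs -cat hdfsOutput/part-r-00000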

Reposted from: https://www.cnblogs.com/truezq/p/6368926.html
