1. References
There are plenty of similar articles, so please search the web first.
The main references for these notes were:
Setting up a Hadoop environment on Ubuntu (standalone mode + pseudo-distributed mode) - 狂奔的蜗牛 - CSDN blog
http://blog.csdn.net/hitwengqi/article/details/8008203
A detailed walkthrough of running WordCount on Hadoop
http://www.cnblogs.com/madyina/p/3708153.html
2. Learning Environment
Windows 7 Home Premium, 64-bit
VMware Workstation v12.1.1
Guest OS installed in the VM: ubuntu-16.04-desktop-amd64.iso
hadoop-2.7.2.tar.gz
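A quick sanity check of the environment inside the VM (a sketch of my own, assuming Hadoop is unpacked to /usr/local/hadoop as described in section 4.2):

java -version                          # the JDK must be installed and on PATH
/usr/local/hadoop/bin/hadoop version   # should report Hadoop 2.7.2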
3. Problems Encountered
3.1. mkdir hdfsInput fails - localhost:9000 failed on connection exception
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput
17/01/08 07:05:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
hadoop@ubuntu:/usr/local/hadoop$ ll hdfsInput
ls: cannot access 'hdfsInput': No such file or directory
hadoop@ubuntu:/usr/local/hadoop$
Solutions to problems you may run into while configuring a Hadoop environment: http://blog.csdn.net/yutianzuijin/article/details/9455319
If this error is reported when running the wordcount example in a pseudo-distributed environment, it means Hadoop has not been started; start the Hadoop daemons with the start-all.sh script (under sbin/).
hadoop@ubuntu:/usr/local/hadoop/sbin$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/01/08 07:58:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out
17/01/08 07:59:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out
hadoop@ubuntu:/usr/local/hadoop/sbin$ cd ..
hadoop@ubuntu:/usr/local/hadoop$ hadoop fs -mkdir hdfsInput
17/01/08 07:59:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: `hdfsInput': No such file or directory
hadoop@ubuntu:/usr/local/hadoop$
That did resolve the connection error. The cause was that a pseudo-distributed environment had been configured on this Linux box, so running the job the standalone-mode way failed. (The follow-up "No such file or directory" from mkdir is a separate issue: the parent directory /user/hadoop did not exist in HDFS yet; hadoop fs -mkdir -p would create it.)
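To confirm that the daemons really came up after start-all.sh, a simple check (not shown in the transcript above) is to list the Java processes:

jps
# in a working pseudo-distributed setup you would expect to see roughly:
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself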
3.2. hadoop: command not found
hadoop@ubuntu:/usr/local/hadoop$ hadoop
hadoop: command not found
Workaround: invoke it with an explicit relative path: hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop
Or add Hadoop's bin directory to PATH:
export PATH=$PATH:/usr/local/hadoop/bin
Then make the environment configuration take effect with source:
source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
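To make the PATH change permanent, the usual approach (a sketch, not taken from the referenced articles) is to append it to the hadoop user's ~/.bashrc and reload the shell configuration:

# persist the PATH change for future shells
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> ~/.bashrc
# apply it to the current shell
source ~/.bashrc
# the bare command now works from any directory
hadoop version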
3.3. Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar
The path was wrong.
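hadoop-examples-1.2.1.jar is the examples jar name from the Hadoop 1.x layout; in the hadoop-2.7.2 tree used here, the examples jar sits under share/hadoop/mapreduce, so the working command is:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput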
3.4. Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput
17/02/05 07:27:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/05 07:27:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/02/05 07:27:12 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1486308315415_0001
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/hdfsInput
Cause:
http://blog.csdn.net/wang_zhenwei/article/details/47444335
Since at this point we only need to verify standalone (local) mode, we can simply delete the file that the pseudo-distributed setup requires:
/usr/local/hadoop/etc/hadoop$ rm core-site.xml
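For reference, core-site.xml is the file that points the default filesystem at HDFS; a minimal pseudo-distributed version of it typically looks like this (a generic sketch, not copied from this machine):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

With the file removed, Hadoop falls back to the local filesystem, so wordcount reads hdfsInput straight from /usr/local/hadoop. If you would rather keep the pseudo-distributed setup, the alternative is to create the path in HDFS and upload the input files, roughly:

bin/hdfs dfs -mkdir -p /user/hadoop/hdfsInput
bin/hdfs dfs -put file/* /user/hadoop/hdfsInput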
4. Notes
4.1. Ubuntu does not allow root to log in over SSH by default, so create a new user named hadoop
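A sketch of the steps this implies (assumed commands, adapt as needed): create the user, give it sudo rights, and set up passwordless SSH to localhost, which the start-dfs.sh/start-yarn.sh scripts rely on:

# create the hadoop user and let it use sudo
sudo adduser hadoop
sudo adduser hadoop sudo
# as the hadoop user: key-based SSH to localhost without a passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost   # should log in without asking for a password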
4.2. Extract Hadoop to /usr/local/hadoop
Extracting the downloaded Hadoop archive into /home/hadoop under the name hadoop is not ideal: the directory name collides with the hadoop user's home directory. Extract it to /usr/local/hadoop instead, as sketched below.
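A sketch of the extraction step under that layout (assuming the archive was downloaded to the hadoop user's home directory):

sudo tar -xzf ~/hadoop-2.7.2.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.7.2 /usr/local/hadoop
# give the hadoop user ownership so it can write logs and job output there
sudo chown -R hadoop:hadoop /usr/local/hadoop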
5. Run Results
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-22-generic x86_64)
* Documentation: https://help.ubuntu.com/
118 packages can be updated. 18 updates are security updates.
Last login: Mon May 30 08:45:33 2016 from 192.168.202.1
hadoop@ubuntu:~$ pwd
/home/hadoop
hadoop@ubuntu:~$ cd /usr/local/hadoop/
hadoop@ubuntu:/usr/local/hadoop$ ll
total 64
drwxr-xr-x 10 hadoop hadoop  4096 May 28 08:59 ./
drwxr-xr-x 12 root   root    4096 May 28 09:15 ../
drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 bin/
drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 etc/
drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 include/
drwxrwxr-x  2 hadoop hadoop  4096 May 28 08:59 input/
drwxr-xr-x  3 hadoop hadoop  4096 Jan 25 16:20 lib/
drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 libexec/
-rw-r--r--  1 hadoop hadoop 15429 Jan 25 16:20 LICENSE.txt
-rw-r--r--  1 hadoop hadoop   101 Jan 25 16:20 NOTICE.txt
-rw-r--r--  1 hadoop hadoop  1366 Jan 25 16:20 README.txt
drwxr-xr-x  2 hadoop hadoop  4096 Jan 25 16:20 sbin/
drwxr-xr-x  4 hadoop hadoop  4096 Jan 25 16:20 share/
hadoop@ubuntu:/usr/local/hadoop$ mkdir file
hadoop@ubuntu:/usr/local/hadoop$ cd file
hadoop@ubuntu:/usr/local/hadoop/file$ vi myTest1.txt
Hello world
Hello me!
~ ~ "myTest2.txt" 2L, 24C written hadoop@ubuntu:/usr/local/hadoop/file$ ll total 16 drwxrwxr-x 2 hadoop hadoop 4096 May 31 08:26 ./ drwxr-xr-x 11 hadoop hadoop 4096 May 31 08:25 ../ -rw-rw-r-- 1 hadoop hadoop 23 May 31 08:26 myTest1.txt -rw-rw-r-- 1 hadoop hadoop 24 May 31 08:26 myTest2.txt hadoop@ubuntu:/usr/local/hadoop/file$ cd .. hadoop@ubuntu:/usr/local/hadoop$ hadoop hadoop: command not found hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop Usage: hadoop [--config confdir] [COMMAND | CLASSNAME] CLASSNAME run the class named CLASSNAME or where COMMAND is one of: fs run a generic filesystem user client version print the version jar <jar> run a jar file note: please use "yarn jar" to launch YARN applications, not this command. checknative [-a|-h] check native hadoop and compression libraries availability distcp <srcurl> <desturl> copy file or directories recursively archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive classpath prints the class path needed to get the credential interact with credential providers Hadoop jar and the required libraries daemonlog get/set the log level for each daemon trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -mkdir hdfsInput
hadoop@ubuntu:/usr/local/hadoop$ cp file/* hdfsInput/
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount hdfsInput hdfsOutput
Not a valid JAR: /usr/local/hadoop/hadoop-examples-1.2.1.jar
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/
doc/     hadoop/
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/
doc/     hadoop/
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfsInput hdfsOutput
16/05/31 08:29:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/05/31 08:29:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/05/31 08:29:59 INFO input.FileInputFormat: Total input paths to process : 2
16/05/31 08:30:00 INFO mapreduce.JobSubmitter: number of splits:2
16/05/31 08:30:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local694536620_0001
16/05/31 08:30:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/05/31 08:30:00 INFO mapreduce.Job: Running job: job_local694536620_0001
16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/05/31 08:30:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/05/31 08:30:00 INFO mapred.LocalJobRunner: Waiting for map tasks
16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000000_0
16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/05/31 08:30:00 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest2.txt:0+24
16/05/31 08:30:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/05/31 08:30:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/05/31 08:30:00 INFO mapred.MapTask: soft limit at 83886080
16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/05/31 08:30:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/05/31 08:30:00 INFO mapred.LocalJobRunner:
16/05/31 08:30:00 INFO mapred.MapTask: Starting flush of map output
16/05/31 08:30:00 INFO mapred.MapTask: Spilling map output
16/05/31 08:30:00 INFO mapred.MapTask: bufstart = 0; bufend = 39; bufvoid = 104857600
16/05/31 08:30:00 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
16/05/31 08:30:00 INFO mapred.MapTask: Finished spill 0
16/05/31 08:30:00 INFO mapred.Task: Task:attempt_local694536620_0001_m_000000_0 is done. And is in the process of committing
16/05/31 08:30:00 INFO mapred.LocalJobRunner: map
16/05/31 08:30:00 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000000_0' done.
16/05/31 08:30:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000000_0
16/05/31 08:30:00 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_m_000001_0
16/05/31 08:30:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/05/31 08:30:00 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/05/31 08:30:00 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/hdfsInput/myTest1.txt:0+23
16/05/31 08:30:01 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/05/31 08:30:01 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/05/31 08:30:01 INFO mapred.MapTask: soft limit at 83886080
16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/05/31 08:30:01 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/05/31 08:30:01 INFO mapred.LocalJobRunner:
16/05/31 08:30:01 INFO mapred.MapTask: Starting flush of map output
16/05/31 08:30:01 INFO mapred.MapTask: Spilling map output
16/05/31 08:30:01 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid = 104857600
16/05/31 08:30:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
16/05/31 08:30:01 INFO mapred.MapTask: Finished spill 0
16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_m_000001_0 is done. And is in the process of committing
16/05/31 08:30:01 INFO mapred.LocalJobRunner: map
16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_m_000001_0' done.
16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_m_000001_0
16/05/31 08:30:01 INFO mapred.LocalJobRunner: map task executor complete.
16/05/31 08:30:01 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/05/31 08:30:01 INFO mapred.LocalJobRunner: Starting task: attempt_local694536620_0001_r_000000_0
16/05/31 08:30:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/05/31 08:30:01 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/05/31 08:30:01 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@47130856
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/05/31 08:30:01 INFO reduce.EventFetcher: attempt_local694536620_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000001_0 decomp: 36 len: 40 to MEMORY
16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 36 bytes from map-output for attempt_local694536620_0001_m_000001_0
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 36, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->36
16/05/31 08:30:01 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local694536620_0001_m_000000_0 decomp: 37 len: 41 to MEMORY
16/05/31 08:30:01 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/05/31 08:30:01 INFO reduce.InMemoryMapOutput: Read 37 bytes from map-output for attempt_local694536620_0001_m_000000_0
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 37, inMemoryMapOutputs.size() -> 2, commitMemory -> 36, usedMemory ->73
16/05/31 08:30:01 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
16/05/31 08:30:01 INFO mapred.Merger: Merging 2 sorted segments
16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 57 bytes
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merged 2 segments, 73 bytes to disk to satisfy reduce memory limit
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 1 files, 75 bytes from disk
16/05/31 08:30:01 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/05/31 08:30:01 INFO mapred.Merger: Merging 1 sorted segments
16/05/31 08:30:01 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 63 bytes
16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.
16/05/31 08:30:01 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/05/31 08:30:01 INFO mapred.Task: Task:attempt_local694536620_0001_r_000000_0 is done. And is in the process of committing
16/05/31 08:30:01 INFO mapred.LocalJobRunner: 2 / 2 copied.
16/05/31 08:30:01 INFO mapred.Task: Task attempt_local694536620_0001_r_000000_0 is allowed to commit now
16/05/31 08:30:01 INFO output.FileOutputCommitter: Saved output of task 'attempt_local694536620_0001_r_000000_0' to file:/usr/local/hadoop/hdfsOutput/_temporary/0/task_local694536620_0001_r_000000
16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce > reduce
16/05/31 08:30:01 INFO mapred.Task: Task 'attempt_local694536620_0001_r_000000_0' done.
16/05/31 08:30:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local694536620_0001_r_000000_0
16/05/31 08:30:01 INFO mapred.LocalJobRunner: reduce task executor complete.
16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 running in uber mode : false
16/05/31 08:30:01 INFO mapreduce.Job: map 100% reduce 100%
16/05/31 08:30:01 INFO mapreduce.Job: Job job_local694536620_0001 completed successfully
16/05/31 08:30:01 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=821975
                FILE: Number of bytes written=1666952
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=4
                Map output records=8
                Map output bytes=77
                Map output materialized bytes=81
                Input split bytes=218
                Combine input records=8
                Combine output records=6
                Reduce input groups=4
                Reduce shuffle bytes=81
                Reduce input records=6
                Reduce output records=4
                Spilled Records=12
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=37
                Total committed heap usage (bytes)=457912320
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=47
        File Output Format Counters
                Bytes Written=41
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -ls hdfsOutput/
Found 2 items
-rw-r--r--   1 hadoop hadoop          0 2016-05-31 08:30 hdfsOutput/_SUCCESS
-rw-r--r--   1 hadoop hadoop         29 2016-05-31 08:30 hdfsOutput/part-r-00000
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop fs -cat hdfsOutput/part-r-00000
Hello   4
You!    1
me!     1
world   2
hadoop@ubuntu:/usr/local/hadoop$