Prerequisites:
1. A working JDK installation (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
2. hadoop-1.2.1.tar.gz downloaded (https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-1.2.1/)
Installing Hadoop
First copy hadoop-1.2.1.tar.gz into /usr/local, then extract it there, as in Figure 1:
drwxr-xr-x. 2 root root 4096 9月 23 2011 bin
drwxr-xr-x. 2 root root 4096 9月 23 2011 etc
drwxr-xr-x. 2 root root 4096 9月 23 2011 games
drwxr-xr-x. 16 root root 4096 9月 14 16:42 hadoop-1.2.1
-rw-r--r--. 1 root root 63851630 9月 14 21:20 hadoop-1.2.1.tar.gz
drwxr-xr-x. 2 root root 4096 9月 23 2011 include
drwxr-xr-x. 8 uucp 143 4096 6月 22 17:50 jdk1.8.0_101
drwxr-xr-x. 2 root root 4096 9月 23 2011 lib
drwxr-xr-x. 2 root root 4096 9月 23 2011 libexec
drwxr-xr-x. 2 root root 4096 9月 23 2011 sbin
drwxr-xr-x. 5 root root 4096 9月 14 2016 share
drwxr-xr-x. 2 root root 4096 9月 23 2011 src
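The copy-and-extract step can be run as follows (a sketch; it assumes the tarball was downloaded to the current directory):

```shell
# Copy the tarball into /usr/local and unpack it there
cp hadoop-1.2.1.tar.gz /usr/local/
cd /usr/local
tar -xzf hadoop-1.2.1.tar.gz
ls -l    # hadoop-1.2.1/ should now appear, as in the listing above
```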
Configuring Hadoop
0. Take a look at what the Hadoop directory contains, as in Figure 13:
drwxr-xr-x. 2 root root 4096 9月 14 16:32 bin
-rw-rw-r--. 1 root root 121130 7月 23 2013 build.xml
drwxr-xr-x. 4 root root 4096 7月 23 2013 c++
-rw-rw-r--. 1 root root 493744 7月 23 2013 CHANGES.txt
drwxr-xr-x. 2 root root 4096 9月 14 21:30 conf
drwxr-xr-x. 10 root root 4096 7月 23 2013 contrib
drwxr-xr-x. 6 root root 4096 9月 14 16:31 docs
-rw-rw-r--. 1 root root 6842 7月 23 2013 hadoop-ant-1.2.1.jar
-rw-rw-r--. 1 root root 414 7月 23 2013 hadoop-client-1.2.1.jar
-rw-rw-r--. 1 root root 4203147 7月 23 2013 hadoop-core-1.2.1.jar
-rw-rw-r--. 1 root root 142726 7月 23 2013 hadoop-examples-1.2.1.jar
-rw-rw-r--. 1 root root 417 7月 23 2013 hadoop-minicluster-1.2.1.jar
-rw-rw-r--. 1 root root 3126576 7月 23 2013 hadoop-test-1.2.1.jar
-rw-rw-r--. 1 root root 385634 7月 23 2013 hadoop-tools-1.2.1.jar
drwxr-xr-x. 2 root root 4096 9月 14 16:31 ivy
-rw-rw-r--. 1 root root 10525 7月 23 2013 ivy.xml
drwxr-xr-x. 5 root root 4096 9月 14 16:31 lib
drwxr-xr-x. 2 root root 4096 9月 14 16:32 libexec
-rw-rw-r--. 1 root root 13366 7月 23 2013 LICENSE.txt
drwxr-xr-x. 4 root root 4096 9月 14 22:03 logs
-rw-rw-r--. 1 root root 101 7月 23 2013 NOTICE.txt
-rw-rw-r--. 1 root root 1366 7月 23 2013 README.txt
drwxr-xr-x. 2 root root 4096 9月 14 16:32 sbin
drwxr-xr-x. 3 root root 4096 7月 23 2013 share
drwxr-xr-x. 16 root root 4096 9月 14 16:32 src
drwxr-xr-x. 9 root root 4096 7月 23 2013 webapps
1. Open conf/hadoop-env.sh, as in Figure 14:
vi /usr/local/hadoop-1.2.1/conf/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:/usr/local/hadoop-1.2.1/bin
The file after editing, as in Figure 15:
---------------------------------------------------------------------------------------------
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:/usr/local/hadoop-1.2.1/bin
--------------------------------------------------------------------------------------
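hadoop-env.sh only affects the Hadoop scripts themselves. To have the `hadoop` command available in every shell session as well, the same exports can go into the shell profile (a sketch; paths as above):

```shell
# Append the exports to ~/.bashrc so new shells pick them up
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
EOF
. ~/.bashrc
```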
2. Open conf/core-site.xml and add the following configuration:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>
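A note on dfs.replication: it is an HDFS property, and its conventional home in Hadoop 1.x is conf/hdfs-site.xml rather than core-site.xml. A minimal hdfs-site.xml for a single-node setup might look like:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```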
3. Open mapred-site.xml in the conf directory and add the following configuration:
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
Running and testing:
1. Format the NameNode, as in Figure 18:
hadoop namenode -format
[root@linux-01 hadoop-1.2.1]# hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
16/09/14 22:10:18 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = linux-01/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_101
************************************************************/
Re-format filesystem in /home/hadoop/tmp/dfs/name ? (Y or N) Y
16/09/14 22:10:23 INFO util.GSet: Computing capacity for map BlocksMap
16/09/14 22:10:23 INFO util.GSet: VM type = 32-bit
16/09/14 22:10:23 INFO util.GSet: 2.0% max memory = 1013645312
16/09/14 22:10:23 INFO util.GSet: capacity = 2^22 = 4194304 entries
16/09/14 22:10:23 INFO util.GSet: recommended=4194304, actual=4194304
16/09/14 22:10:23 INFO namenode.FSNamesystem: fsOwner=root
16/09/14 22:10:23 INFO namenode.FSNamesystem: supergroup=supergroup
16/09/14 22:10:23 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/09/14 22:10:23 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/09/14 22:10:23 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/09/14 22:10:23 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/09/14 22:10:23 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/09/14 22:10:23 INFO common.Storage: Image file /home/hadoop/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/09/14 22:10:23 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/tmp/dfs/name/current/edits
16/09/14 22:10:23 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/tmp/dfs/name/current/edits
16/09/14 22:10:23 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
16/09/14 22:10:23 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux-01/127.0.0.1
************************************************************/
You may run into errors here (it took me several attempts to get through this step); see Figure 19. Apply the fix shown in Figure 20, then re-run the format command from Figure 18.
2. Start Hadoop, as in Figure 21:
./bin/start-all.sh
[root@linux-01 hadoop-1.2.1]# ./bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-namenode-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-datanode-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-secondarynamenode-linux-01.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-jobtracker-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-tasktracker-linux-01.out
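The repeated `root@localhost's password:` prompts above appear because start-all.sh launches each daemon over ssh. They can be avoided by setting up passwordless SSH to localhost, a standard step this guide otherwise skips (a sketch):

```shell
# Create a key pair (skipped if one already exists) and authorize it for localhost
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, `ssh localhost` should log in without a password and start-all.sh stops prompting.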
3. Verify that Hadoop started successfully, using jps, as in Figure 22:
[root@linux-01 hadoop-1.2.1]# jps
25991 DataNode
26361 TaskTracker
24827 FsShell
26428 Jps
26204 JobTracker
26124 SecondaryNameNode
25855 NameNode
Running the bundled wordcount example (exciting!)
1. Prepare a file to run wordcount on, as in Figure 23 (type some text into test.txt, then save and quit):
[root@linux-01 tmp]# touch test.txt
[root@linux-01 tmp]# vi test.txt
hello , welcome hadoop !!!
Save and quit with :wq
-------------------------------------------------------------------------------------------
2. Upload the test file from the previous step into firstTest in the DFS file system, as in Figure 24 (if firstTest does not yet exist in DFS it is created automatically; use bin/hadoop dfs -ls to list what is already there):
bin/hadoop dfs -copyFromLocal /tmp/test.txt firstTest
3. Run wordcount, as in Figure 25 (this runs wordcount over everything under firstTest and writes the counts to the result folder, which is created automatically if it does not exist):
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount firstTest result
[root@linux-01 hadoop-1.2.1]# bin/hadoop jar hadoop-examples-1.2.1.jar wordcount firstTest result
Warning: $HADOOP_HOME is deprecated.
16/09/14 22:41:42 INFO input.FileInputFormat: Total input paths to process : 1
16/09/14 22:41:42 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/09/14 22:41:42 WARN snappy.LoadSnappy: Snappy native library not loaded
16/09/14 22:41:42 INFO mapred.JobClient: Running job: job_201609142236_0004
16/09/14 22:41:43 INFO mapred.JobClient: map 0% reduce 0%
16/09/14 22:41:47 INFO mapred.JobClient: map 100% reduce 0%
16/09/14 22:41:55 INFO mapred.JobClient: map 100% reduce 33%
16/09/14 22:41:56 INFO mapred.JobClient: map 100% reduce 100%
16/09/14 22:41:57 INFO mapred.JobClient: Job complete: job_201609142236_0004
16/09/14 22:41:57 INFO mapred.JobClient: Counters: 29
16/09/14 22:41:57 INFO mapred.JobClient: Map-Reduce Framework
16/09/14 22:41:57 INFO mapred.JobClient: Spilled Records=10
16/09/14 22:41:57 INFO mapred.JobClient: Map output materialized bytes=63
16/09/14 22:41:57 INFO mapred.JobClient: Reduce input records=5
16/09/14 22:41:57 INFO mapred.JobClient: Virtual memory (bytes) snapshot=589844480
16/09/14 22:41:57 INFO mapred.JobClient: Map input records=1
16/09/14 22:41:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=106
16/09/14 22:41:57 INFO mapred.JobClient: Map output bytes=47
16/09/14 22:41:57 INFO mapred.JobClient: Reduce shuffle bytes=63
16/09/14 22:41:57 INFO mapred.JobClient: Physical memory (bytes) snapshot=190750720
16/09/14 22:41:57 INFO mapred.JobClient: Reduce input groups=5
16/09/14 22:41:57 INFO mapred.JobClient: Combine output records=5
16/09/14 22:41:57 INFO mapred.JobClient: Reduce output records=5
16/09/14 22:41:57 INFO mapred.JobClient: Map output records=5
16/09/14 22:41:57 INFO mapred.JobClient: Combine input records=5
16/09/14 22:41:57 INFO mapred.JobClient: CPU time spent (ms)=760
16/09/14 22:41:57 INFO mapred.JobClient: Total committed heap usage (bytes)=177016832
16/09/14 22:41:57 INFO mapred.JobClient: File Input Format Counters
16/09/14 22:41:57 INFO mapred.JobClient: Bytes Read=27
16/09/14 22:41:57 INFO mapred.JobClient: FileSystemCounters
16/09/14 22:41:57 INFO mapred.JobClient: HDFS_BYTES_READ=133
16/09/14 22:41:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=109333
16/09/14 22:41:57 INFO mapred.JobClient: FILE_BYTES_READ=63
16/09/14 22:41:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=37
16/09/14 22:41:57 INFO mapred.JobClient: Job Counters
16/09/14 22:41:57 INFO mapred.JobClient: Launched map tasks=1
16/09/14 22:41:57 INFO mapred.JobClient: Launched reduce tasks=1
16/09/14 22:41:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8646
16/09/14 22:41:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/14 22:41:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4553
16/09/14 22:41:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/09/14 22:41:57 INFO mapred.JobClient: Data-local map tasks=1
16/09/14 22:41:57 INFO mapred.JobClient: File Output Format Counters
16/09/14 22:41:57 INFO mapred.JobClient: Bytes Written=37
4. View the result, as in Figure 26:
bin/hadoop dfs -cat result/part-r-00000
[root@linux-01 hadoop-1.2.1]# bin/hadoop dfs -cat result/part-r-00000
Warning: $HADOOP_HOME is deprecated.
!!! 1
, 1
hadoop 1
hello 1
welcome 1
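As a sanity check, the same counts can be reproduced locally with ordinary shell tools (this is not Hadoop, just the same computation on the same one-line input):

```shell
# Split the line on whitespace, then count occurrences of each token
printf 'hello , welcome hadoop !!!\n' \
  | tr -s '[:space:]' '\n' \
  | sort | uniq -c
```

Each of the five tokens appears once, matching the part-r-00000 output above.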