Installing Hadoop 1.2.1, following 《深入理解大数据：大数据处理与编程实践》

【Part 1】Source code for 《深入理解大数据》 and related links

http://download.csdn.net/detail/heming621/9423291

http://hadoop.apache.org/

https://www.zhihu.com/question/19795366

http://mooc.guokr.com/course/2194/%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%B3%BB%E7%BB%9F%E5%9F%BA%E7%A1%80/

http://download.csdn.net/album/detail/3466/1/1

【Part 2】Installing Hadoop 1.2.1

【1】Install Java
jdk-6u45-linux-i586-rpm.rar unpacks to jdk-6u45-linux-i586-rpm.bin
Install it by running ./jdk-6u45-linux-i586-rpm.bin
After installation the JDK directory is /usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # pwd
/usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # ls
COPYRIGHT  LICENSE  README.html  THIRDPARTYLICENSEREADME.txt  bin  include  jre  lib  man  src.zip

【1.2】Add the Java paths to /etc/profile so the JDK tools can be invoked from anywhere
#set java
export JAVA_HOME=/usr/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

【1.3】Apply the configuration
# source /etc/profile
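A quick sanity check (not from the book) that the variable took effect in the current shell:
# echo $JAVA_HOME
/usr/java/jdk1.6.0_45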

【1.4】Check the Java version to confirm the installation succeeded
A22811459:/usr/java/jdk1.6.0_45 # java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)

【1.5】Optionally, write, compile, and run a trivial Java program to further confirm that Java works
HelloWel.java

public class HelloWel {
       public static void main(String[] args)
       {
          System.out.println("JAVA OK");    
       }    
}

Compile and run:
# javac HelloWel.java
# java HelloWel
JAVA OK
This confirms the Java installation is working; the JDK path (needed later) is /usr/java/jdk1.6.0_45


【2】Install Hadoop 1.2.1 (following 《深入理解大数据》)
【2.1】Create a hadoop user
#groupadd hadoop-user
#useradd -g hadoop-user hadoop
#passwd hadoop
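This walkthrough continues as root, but if you intend to run the daemons as the new hadoop user instead, a quick check of the account (standard Linux commands, not from the book) is:
# id hadoop          (the primary group should be hadoop-user)
# su - hadoop        (switch to the hadoop user)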

【2.2】Configure passwordless SSH
#ssh-keygen -t rsa
# cd /root/.ssh/
#cp id_rsa.pub authorized_keys
#ssh localhost
Check the result:
# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
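If authorized_keys already exists, appending is safer than overwriting, and some sshd configurations require the file to be private; a minimal sketch, assuming the key was generated with an empty passphrase:
# cat id_rsa.pub >> authorized_keys    (append instead of overwrite)
# chmod 600 authorized_keys
# ssh localhost                        (should log in without asking for a password)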

【2.3】Set up the Hadoop environment
Hadoop release: hadoop-1.2.1.tar.gz
After extraction the Hadoop directory is /home/longhui/hadoop/hadoop-1.2.1/

【2.3.1】In conf/hadoop-env.sh, set JAVA_HOME to the JDK path:
export JAVA_HOME=/usr/java/jdk1.6.0_45

【2.3.2】Configure the three XML files
【1】core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://A22811459:9000</value>
  </property>
</configuration>
【Note】
The temporary directory is /tmp/hadoop. Once the cluster is running it contains two subdirectories, dfs and mapred, and several .pid files appear under /tmp:
A22811459:/tmp # ls hadoop
hadoop/                            hadoop-root-jobtracker.pid         hadoop-root-secondarynamenode.pid
hadoop-root-datanode.pid           hadoop-root-namenode.pid           hadoop-root-tasktracker.pid
【2】hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/longhui/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/longhui/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
【Note】
After HDFS is formatted and started, /home/longhui/hadoop/dfs/name contains current, image, in_use.lock and previous.checkpoint,
and /home/longhui/hadoop/dfs/data contains blocksBeingWritten, current, detach, in_use.lock, storage and tmp.
【3】mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>A22811459:9001</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/home/longhui/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/home/longhui/hadoop/mapred/system</value>
  </property>
</configuration>
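Hadoop creates the directories referenced above when the NameNode is formatted and the daemons start, but pre-creating the local ones rules out permission problems; a minimal sketch using the paths configured above (adjust them to your own layout):
# mkdir -p /tmp/hadoop
# mkdir -p /home/longhui/hadoop/dfs/name /home/longhui/hadoop/dfs/data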
【4】Because the hostname is A22811459 rather than localhost, /etc/hosts also has to be updated accordingly:
127.0.0.1       A22811459
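To confirm the name resolves before starting anything (assuming the machine's hostname really is A22811459):
# hostname
A22811459
# ping -c 1 A22811459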


【2.3.3】Add the Hadoop paths to /etc/profile and run source /etc/profile to apply them
#set hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/home/longhui/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
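Once the profile has been sourced again, the hadoop script should be on the PATH; hadoop version is a standard way to confirm which release is picked up (remaining build details omitted here):
# hadoop version
Hadoop 1.2.1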

【2.3.4】Format the HDFS filesystem
Run bin/hadoop namenode -format (or simply hadoop namenode -format, since the bin directory is on the PATH) and answer Y when prompted.
# hadoop namenode -format
16/12/15 12:59:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = A22811459/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
Re-format filesystem in /home/longhui/hadoop/dfs/name ? (Y or N) Y
16/12/15 12:59:52 INFO util.GSet: Computing capacity for map BlocksMap
16/12/15 12:59:52 INFO util.GSet: VM type       = 32-bit
16/12/15 12:59:52 INFO util.GSet: 2.0% max memory = 932118528
16/12/15 12:59:52 INFO util.GSet: capacity      = 2^22 = 4194304 entries
16/12/15 12:59:52 INFO util.GSet: recommended=4194304, actual=4194304
16/12/15 12:59:53 INFO namenode.FSNamesystem: fsOwner=root
16/12/15 12:59:53 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/15 12:59:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/15 12:59:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/12/15 12:59:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/12/15 12:59:53 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/12/15 12:59:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/12/15 12:59:53 INFO common.Storage: Image file /home/longhui/hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/12/15 12:59:53 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO common.Storage: Storage directory /home/longhui/hadoop/dfs/name has been successfully formatted.
16/12/15 12:59:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at A22811459/127.0.0.1
************************************************************/
【Note】If you see the warning "Warning: $HADOOP_HOME is deprecated."
Fix: add the line below to /etc/profile, run source /etc/profile, and bin/hadoop namenode -format will no longer print the warning:
export HADOOP_HOME_WARN_SUPPRESS=1
【2.3.5】Start the Hadoop daemons (use stop-all.sh to stop them later)
# start-all.sh
starting namenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-A22811459.out
localhost: starting datanode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-A22811459.out
localhost: starting secondarynamenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-A22811459.out
starting jobtracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-A22811459.out
localhost: starting tasktracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-A22811459.out

【2.3.6】Use jps to check the cluster state. Apart from the Jps process itself, all five daemons must be present; output like the following means the cluster started normally:
# jps
2352 TaskTracker
1940 DataNode
1802 NameNode
2465 Jps
2211 JobTracker
2106 SecondaryNameNode
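Beyond jps, two standard Hadoop 1.x commands give an HDFS-level sanity check that the NameNode and DataNode are talking to each other:
# hadoop dfsadmin -report    (should report 1 live datanode)
# hadoop fs -ls /            (lists the root of HDFS)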

【3】Run the first bundled example: estimating the value of Pi
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop jar hadoop-examples-1.2.1.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/12/15 14:06:04 INFO mapred.FileInputFormat: Total input paths to process : 2
16/12/15 14:06:04 INFO mapred.JobClient: Running job: job_201612151254_0001
16/12/15 14:06:05 INFO mapred.JobClient:  map 0% reduce 0%
16/12/15 14:06:10 INFO mapred.JobClient:  map 100% reduce 0%
16/12/15 14:06:18 INFO mapred.JobClient:  map 100% reduce 33%
16/12/15 14:06:19 INFO mapred.JobClient:  map 100% reduce 100%
16/12/15 14:06:19 INFO mapred.JobClient: Job complete: job_201612151254_0001
16/12/15 14:06:19 INFO mapred.JobClient: Counters: 30
16/12/15 14:06:19 INFO mapred.JobClient:   Job Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Launched reduce tasks=1
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6864
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Launched map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     Data-local map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8661
16/12/15 14:06:19 INFO mapred.JobClient:   File Input Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Read=236
16/12/15 14:06:19 INFO mapred.JobClient:   File Output Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Written=97
16/12/15 14:06:19 INFO mapred.JobClient:   FileSystemCounters
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_READ=50
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_READ=478
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=160889
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
16/12/15 14:06:19 INFO mapred.JobClient:   Map-Reduce Framework
16/12/15 14:06:19 INFO mapred.JobClient:     Map output materialized bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Map input records=2
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce shuffle bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Spilled Records=8
16/12/15 14:06:19 INFO mapred.JobClient:     Map output bytes=36
16/12/15 14:06:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=377028608
16/12/15 14:06:19 INFO mapred.JobClient:     CPU time spent (ms)=3100
16/12/15 14:06:19 INFO mapred.JobClient:     Map input bytes=48
16/12/15 14:06:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242
16/12/15 14:06:19 INFO mapred.JobClient:     Combine input records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input records=4
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input groups=4
16/12/15 14:06:19 INFO mapred.JobClient:     Combine output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=376963072
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1132392448
16/12/15 14:06:19 INFO mapred.JobClient:     Map output records=4
Job Finished in 15.585 seconds
Estimated value of Pi is 3.60000000000000000000
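The estimate is coarse because only 2 maps x 5 samples were used; rerunning with more samples, e.g. hadoop jar hadoop-examples-1.2.1.jar pi 10 100, gets much closer to 3.14. Another quick end-to-end test uses the wordcount example from the same jar (the input file below is just an illustration, any text file works):
# hadoop fs -mkdir input
# hadoop fs -put /etc/profile input
# hadoop jar hadoop-examples-1.2.1.jar wordcount input output
# hadoop fs -cat output/*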

【4】Web management interfaces

【4.1】Open port 50070 on the server's IP in a browser to see the HDFS management page, for example:
http://10.17.35.xxx:50070/dfshealth.jsp

NameNode 'A22811459:9000'

Started: Thu Dec 15 13:00:10 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Upgrades: There are no upgrades in progress.

Browse the filesystem
Namenode Logs

Cluster Summary

11 files and directories, 13 blocks = 24 total. Heap Size is 57.69 MB / 888.94 MB (6%)
Configured Capacity: 273 GB
DFS Used: 40 KB
Non DFS Used: 260.77 GB
DFS Remaining: 12.23 GB
DFS Used%: 0 %
DFS Remaining%: 4.48 %
Live Nodes: 1
Dead Nodes: 0
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 0


NameNode Storage:

Storage Directory | Type | State
/home/longhui/hadoop/dfs/name | IMAGE_AND_EDITS | Active


This is Apache Hadoop release 1.2.1

【4.2】Port 50030 shows the Map/Reduce administration page:

A22811459 Hadoop Map/Reduce Administration

State: RUNNING
Started: Thu Dec 15 12:54:23 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Identifier: 201612151254
SafeMode: OFF

Cluster Summary (Heap Size is 51.56 MB/888.94 MB)

Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved Map Slots | Reserved Reduce Slots | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node | Blacklisted Nodes | Graylisted Nodes | Excluded Nodes
0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 2 | 4.00 | 0 | 0 | 0


Scheduling Information

Queue Name | State | Scheduling Information
default | running | N/A

Filter (Jobid, Priority, User, Name)
Example: 'user:smith 3200' will filter by 'smith' only in the user field and '3200' in all fields

Running Jobs

none

Completed Jobs

Jobid | Started | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201612151254_0001 | Thu Dec 15 14:06:04 GMT+08:00 2016 | NORMAL | root | PiEstimator | 100.00% | 2 | 2 | 100.00% | 1 | 1 | NA | NA

Retired Jobs

none

Local Logs

Log directory, Job Tracker History
This is Apache Hadoop release 1.2.1