Hadoop Installation and Deployment
Hadoop can be deployed in three modes: local (standalone) mode, pseudo-distributed mode, and fully distributed (cluster) mode.
Configuration files to modify for each mode:
Local mode:         hadoop-env.sh   --JAVA_HOME /usr/java/jdk
Pseudo-distributed: hadoop-env.sh   --JAVA_HOME /usr/java/jdk
                    hdfs-site.xml   --dfs.replication 1
                    core-site.xml   --fs.defaultFS hdfs://localhost:9000
                                    --hadoop.tmp.dir /usr/local/cloud/hadoop/tmp
                    mapred-site.xml --mapreduce.framework.name yarn
                    yarn-site.xml   --yarn.resourcemanager.hostname localhost
                                    --yarn.nodemanager.aux-services mapreduce_shuffle
Fully distributed:  hadoop-env.sh   --JAVA_HOME /usr/java/jdk
                    hdfs-site.xml   --dfs.replication 2
                                    --dfs.permissions false
                    core-site.xml   --fs.defaultFS hdfs://192.168.56.81:9000
                                    --hadoop.tmp.dir /usr/local/cloud/hadoop/tmp
                    slaves          DataNode addresses: 192.168.56.83
                                    192.168.56.84
                    mapred-site.xml --mapreduce.framework.name yarn
                    yarn-site.xml   --yarn.resourcemanager.hostname 192.168.56.81
                                    --yarn.nodemanager.aux-services mapreduce_shuffle
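All of these files share the same Hadoop XML property format. A minimal sketch of a single entry, using the dfs.replication setting from the pseudo-distributed column above:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>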
Example: Installing Pseudo-Distributed Hadoop on CentOS
Install the JDK
1. Check for and uninstall the JDK bundled with Linux
A fresh CentOS install ships with OpenJDK; running java -version prints something like:
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
First list the installed Java packages: rpm -qa | grep java
which shows something like:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Uninstall both:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
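Re-run the query to confirm the bundled packages are gone:
rpm -qa | grep java    # should now print nothing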
2. Install the JDK
Unpack jdk.tar.gz into a directory such as jdk1.8.0_45, then move it under /usr/java and link it as jdk:
mkdir /usr/java
cp -r jdk1.8.0_45 /usr/java
chown -R ty:ty /usr/java
cd /usr/java
ln -s ./jdk1.8.0_45/ jdk
3. Configure the JDK
Append the following to /etc/profile:
export JAVA_HOME=/usr/java/jdk
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
Then register the new JDK with the alternatives system and select it:
update-alternatives --install /usr/bin/java java /usr/java/jdk/bin/java 60
update-alternatives --config java
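Verify that the new JDK is now the active one:
java -version    # should report 1.8.0_45 rather than the old OpenJDK 1.6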
Set Up Passwordless SSH
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
If logging in still prompts for a password, tighten the permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
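To confirm the passwordless login works end to end:
ssh localhost 'hostname'    # should print the hostname without a password prompt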
Install Hadoop
1. As root, create the cloud directory and unpack Hadoop into it:
cd /usr/local
mkdir cloud
tar zxvf hadoop-2.6.0.tar.gz -C cloud
cd cloud
ln -s hadoop-2.6.0 hadoop
2. Configure /etc/profile by appending:
#set hadoop env
export HADOOP_HOME=/usr/local/cloud/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
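With the PATH in place, the Hadoop scripts should now resolve:
hadoop version    # should report Hadoop 2.6.0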
3. Configure etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk
export HADOOP_HOME=/usr/local/cloud/hadoop
4. Configure etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/cloud/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
5. Configure etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/cloud/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/cloud/hadoop/dfs/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:/usr/local/cloud/hadoop/dfs/sname</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:/usr/local/cloud/hadoop/dfs/sname</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Note: the directories where the SecondaryNameNode stores the FSImage and the edits log are set by fs.checkpoint.dir and fs.checkpoint.edits.dir respectively (in Hadoop 2.x these survive as deprecated aliases of dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir); the SecondaryNameNode checks and, if necessary, recovers each of these directories.
6. Configure etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Note: the old (MRv1) framework required the JobTracker to be configured in this file, e.g.:
mapred.job.tracker → 192.168.1.2:9001
In the new framework this is replaced by the ResourceManager and NodeManager settings in yarn-site.xml, and querying historical jobs has been split out of the JobTracker into the separate mapreduce.jobhistory.* settings,
so this property is no longer needed here; configuring the relevant properties in yarn-site.xml is sufficient.
7. Configure etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
(the class property key must match the mapreduce_shuffle service name used since Hadoop 2.2)
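Once the four files are edited, you can sanity-check that Hadoop picks up the values you expect; hdfs getconf reads the effective configuration:
hdfs getconf -confKey fs.defaultFS      # hdfs://localhost:9000
hdfs getconf -confKey dfs.replication   # 1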
8. Create the corresponding directories
cd /usr/local/cloud/hadoop
mkdir -p ./dfs/name
mkdir -p ./dfs/data
mkdir -p ./dfs/sname
9. Format the NameNode
hdfs namenode -format
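On success, the format command logs a confirmation for the name directory, along the lines of:
INFO common.Storage: Storage directory /usr/local/cloud/hadoop/dfs/name has been successfully formatted.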
10. Start the Hadoop services
start-dfs.sh
start-yarn.sh
hduser@ubuntu40:~$ jps
If everything is successful, you should see the following services running:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
11. Monitoring UIs
http://localhost:50070/ (NameNode web UI)
http://localhost:8088/ (YARN ResourceManager web UI)
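As a final smoke test, run one of the example jobs bundled with the distribution (the jar path below assumes the standard layout of the 2.6.0 tarball):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5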
Building the Hadoop 2.6.0 Eclipse Plugin
git clone https://github.com/winghc/hadoop2x-eclipse-plugin.git
cd hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin
In the ant command below, set eclipse.home and hadoop.home to the paths of your own environment:
ant jar -Dversion=2.6.0 -Declipse.home=/usr/local/eclipse -Dhadoop.home=/usr/local/hadoop-2.6.0
Note: this requires an Eclipse that was installed manually (from a tarball); a copy installed through a one-click command-line installer will not work.
The jar is generated at:
Building jar: /home/hunter/hadoop2x-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.0.jar
Installing the Plugin
1. The user running Eclipse should be the Hadoop administrator, i.e. the user under which Hadoop was installed; otherwise you will hit read/write permission errors.
2. Copy the built jar into Eclipse's plugins directory and restart Eclipse (see the sketch below).
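A sketch of the copy, reusing the build output path from above and the eclipse.home passed to ant:
cp /home/hunter/hadoop2x-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.0.jar /usr/local/eclipse/plugins/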
3. Configure the Hadoop installation directory
Window -> Preferences -> Hadoop Map/Reduce -> Hadoop installation directory
4. Configure the Map/Reduce views
Window -> Open Perspective -> Other -> Map/Reduce -> click "OK"
Window -> Show View -> Other -> Map/Reduce Locations -> click "OK"
5. A "Map/Reduce Locations" tab now appears in the console area. In that tab, click the blue-elephant "+" icon, or right-click the empty area and
choose "New Hadoop location...". In the "New Hadoop location..." dialog, configure the following (replace the ha1 user with your own hadoop user):
Location name: any name you like, e.g. hadoop2.6.0
Map/Reduce(V2) Master -- Host: localhost, Port: 9001 (fill in from the dfs.datanode.ipc.address value in hdfs-site.xml)
DFS Master -- Host: localhost, Port: 9000 (the NameNode's IP and port, from the fs.defaultFS value in core-site.xml)