Passwordless SSH Login
Hadoop logs in to each node over SSH, so passwordless SSH must be set up. The root user is used here: each server generates a key pair, and all public keys are merged into authorized_keys. (1) CentOS does not enable passwordless SSH login by default; on every server, uncomment the following two lines in /etc/ssh/sshd_config:
#RSAAuthentication yes
#PubkeyAuthentication yes
(2) Run ssh-keygen -t rsa and press Enter at every prompt, leaving the passphrase empty; this creates a .ssh folder under /root. Do this on every server.
(3) Merge the public keys into the authorized_keys file. On the Master server, enter the /root/.ssh directory and merge them via SSH:
cat id_rsa.pub>> authorized_keys
ssh root@192.168.0.183 cat ~/.ssh/id_rsa.pub>> authorized_keys
ssh root@192.168.0.184 cat ~/.ssh/id_rsa.pub>> authorized_keys
(4) Copy the Master server's authorized_keys and known_hosts files to the /root/.ssh directory on each Slave server.
(5) Done: ssh root@192.168.0.183 and ssh root@192.168.0.184 no longer prompt for a password.
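The merge in step (3) can be sketched locally as follows. This is a self-contained illustration, not the real cluster commands: a temp directory stands in for /root/.ssh, and the fake key lines stand in for the real id_rsa.pub contents that ssh-keygen produces on each node.

```shell
# Sketch of step (3): merging per-node public keys into one authorized_keys.
SSHDIR=$(mktemp -d)
echo "ssh-rsa AAAA...master root@master" > "$SSHDIR/id_rsa.pub"
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"
# On the real cluster the slave keys are pulled over SSH, e.g.:
#   ssh root@192.168.0.183 cat ~/.ssh/id_rsa.pub >> authorized_keys
echo "ssh-rsa AAAA...slave1 root@slave1" >> "$SSHDIR/authorized_keys"
echo "ssh-rsa AAAA...slave2 root@slave2" >> "$SSHDIR/authorized_keys"
chmod 600 "$SSHDIR/authorized_keys"   # sshd ignores overly permissive key files
wc -l < "$SSHDIR/authorized_keys"
```

One line per node should end up in the file; sshd accepts a login if any line matches the client's key.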
Hadoop 2.7.3 Distributed Installation
Configure the Hadoop environment variables
export HADOOP_HOME="/home/hadoop/hadoop-2.7.3"
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
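These exports can be appended to a profile file and verified in one go. A minimal sketch, using a temp file in place of /etc/profile (on the real nodes, append to /etc/profile and run `source /etc/profile` instead):

```shell
# Sketch: collect the Hadoop environment variables in a profile fragment
# and source it, then echo one variable to confirm the expansion.
PROFILE=$(mktemp)
cat >> "$PROFILE" <<'EOF'
export HADOOP_HOME="/home/hadoop/hadoop-2.7.3"
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
. "$PROFILE"
echo "$HADOOP_CONF_DIR"
```

Because the heredoc is quoted, `$HADOOP_HOME` is written literally and only expands when the file is sourced, which matches how /etc/profile behaves at login.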
Configure hadoop-env.sh by adding the following line:
export JAVA_HOME=/usr/java/jdk1.8.0_111
Configure the slaves file by adding the slave hostnames (matching the lowercase node names used elsewhere):
node02
node03
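Writing the slaves file can be done non-interactively with a heredoc. A small sketch, with a temp file standing in for $HADOOP_HOME/etc/hadoop/slaves on the Master node:

```shell
# Sketch: write the slaves file (one hostname per line) in one command.
SLAVES=$(mktemp)
cat > "$SLAVES" <<'EOF'
node02
node03
EOF
cat "$SLAVES"
```

The start scripts read this file line by line and SSH to each listed host, which is why the passwordless SSH setup above must be in place first.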
Configure core-site.xml
<configuration><!-- The HDFS NameNode URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
<!-- Size of read/write buffer used in SequenceFiles. -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Hadoop temporary directory; create it manually -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.3/tmp</value>
</property>
</configuration>
Configure hdfs-site.xml
<configuration><property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.7.3/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.7.3/hdfs/data</value>
</property>
</configuration>
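As the core-site.xml comment notes, the temporary directory must be created manually, and the same is commonly done up front for the name and data directories above. A sketch, where BASE is a temp placeholder for /home/hadoop/hadoop-2.7.3 on the real nodes:

```shell
# Sketch: pre-create the directories referenced by hadoop.tmp.dir,
# dfs.namenode.name.dir, and dfs.datanode.data.dir.
BASE=$(mktemp -d)
mkdir -p "$BASE/tmp" "$BASE/hdfs/name" "$BASE/hdfs/data"
ls "$BASE/hdfs"
```

Run this as the user that will own the Hadoop processes so the daemons can write to these paths.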
Configure yarn-site.xml
<configuration><!-- Site specific YARN configuration properties -->
<!-- Configurations for ResourceManager -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node01:8088</value>
</property>
</configuration>
Configure mapred-site.xml
<configuration><property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node01:19888</value>
</property>
</configuration>
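Hadoop silently lets a later <property> with the same <name> override an earlier one, so duplicate names in a *-site.xml (for example, two mapreduce.jobhistory.address entries where the second should be mapreduce.jobhistory.webapp.address) are easy to miss. A quick grep can flag them; this sketch writes a sample file, but on a real cluster SITE_XML would point at the actual config:

```shell
# Sketch: detect duplicate <name> entries in a Hadoop *-site.xml.
SITE_XML=$(mktemp)
cat > "$SITE_XML" <<'EOF'
<configuration>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>mapreduce.jobhistory.address</name><value>node01:10020</value></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>node01:19888</value></property>
</configuration>
EOF
# Extract every <name>...</name>, then print only names that repeat.
grep -o '<name>[^<]*</name>' "$SITE_XML" | sort | uniq -d
```

Empty output means every property name appears exactly once.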
Format the NameNode
hdfs namenode -format
Start the cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
View the web UI
Add the following to /etc/sysconfig/iptables to open the ResourceManager web port (8088): -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8088 -j ACCEPT
Visit http://192.168.1.106:8088
Compiling the Hadoop 2.7.3 Eclipse Plugin
1. In the file hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml:
1.1. Under the <target name="jar" depends="compile" unless="skip.contrib"> element there is a series of <copy file=...> sub-elements. Replace the sub-element <copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/> with:
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/>
and add two new elements:
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
1.2. In the <attribute> sub-elements of the <jar jarfile="${build.dir}/hadoop-${name}-${hadoop.version}.jar" manifest="${root}/META-INF/MANIFEST.MF"> element, add the following entries to the Bundle-ClassPath value list:
lib/servlet-api-${servlet-api.version}.jar,
lib/commons-io-${commons-io.version}.jar,
and replace lib/htrace-core-${htrace.version}.jar with lib/htrace-core-${htrace.version}-incubating.jar.
2. In the file hadoop2x-eclipse-plugin-master\src\ivy\libraries.properties, update the following properties so their values match Hadoop 2.7.3 and the jar versions in the current environment:
hadoop.version=2.7.3
apacheant.version=1.9.7
commons-collections.version=3.2.2
commons-httpclient.version=3.1
commons-logging.version=1.1.3
commons-io.version=2.4
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
3. In the file hadoop2x-eclipse-plugin-master\ivy\libraries.properties, make the same property changes as in step 2, plus one additional change:
htrace.version=3.1.0
Compile the plugin:
Enter the plugin source directory containing the changes above, hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin, and run the following ant command to compile:
ant jar -Dversion=2.7.3 -Declipse.home=<Eclipse_inst_dir> -Dhadoop.home=<hadoop_inst_dir>
where <Eclipse_inst_dir> is the Eclipse installation directory and <hadoop_inst_dir> is the hadoop-2.7.3 installation directory.
Connecting Eclipse to HDFS
Copy the compiled Hadoop plugin into the eclipse/plugins directory.
After restarting Eclipse, a Hadoop Map/Reduce option appears under Window > Preferences; select it and set the path of the Hadoop directory on Windows.
In Show View, add the Map/Reduce view to the toolbar.
Configure the remote connection: open the Hadoop Location configuration window and set up the Map/Reduce Master and DFS Master; the Host and Port must match core-site.xml.
If the corresponding interface appears, the configuration succeeded.
Fixing the "listing folder content" problem
Connecting from Windows kept failing with a "Listing folder content..." error. There were two main causes:
1. The firewall was not shut down correctly.
2. fs.defaultFS in core-site.xml was set to hdfs://localhost:9000 instead of the server's actual hostname or IP.
Shut down the firewall:
systemctl stop firewalld.service #stop firewalld
systemctl disable firewalld.service #disable firewalld at startup