Hadoop: Getting Started
Written by huchi
Installing the JDK
First, make sure OpenJDK is not installed; for our purposes it is more trouble than it is worth.
In fact, the JDK has shipped with a bundled JRE since version 1.5, so downloading the JDK alone is enough.
wget http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.tar.gz?AuthParam=1528691071_1706915783a8779ece28a7fb1ce5ca6f
Once the download finishes, you can rename the file with mv, since the saved name is far too long. (Note that the AuthParam token in Oracle's link is time-limited, so fetch a fresh link from Oracle's download page if wget fails.)
Then extract it with tar -zxvf; after that, the next step is configuring the environment variables.
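For example (assuming wget kept the ?AuthParam query string in the saved filename, which is typical):

mv 'jdk-8u171-linux-x64.tar.gz?AuthParam=1528691071_1706915783a8779ece28a7fb1ce5ca6f' jdk-8u171-linux-x64.tar.gz
tar -zxvf jdk-8u171-linux-x64.tar.gz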
root@huchi:/home/huchi/install_package/jdk# pwd
/home/huchi/install_package/jdk
root@huchi:/home/huchi/install_package/jdk# sudo vim /etc/profile
Append the following:
export JAVA_HOME=/home/huchi/install_package/jdk/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
That completes the environment variables. Save and quit with :wq, then run source /etc/profile so the changes take effect in the current shell. Verify with java -version and javac -version.
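For example (the exact output depends on the build, but for this JDK I would expect something like):

source /etc/profile
java -version     # should report: java version "1.8.0_171"
javac -version    # should report: javac 1.8.0_171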
Installing Hadoop
wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
Same as before: after downloading, extract with tar -zxvf.
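Concretely (assuming the archive was downloaded into the directory where you want Hadoop installed):

tar -zxvf hadoop-2.9.1.tar.gz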
Then vim /etc/profile again and change it to look like this:
export JAVA_HOME=/home/huchi/install_package/jdk/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/home/huchi/install_package/hadoop/hadoop-2.9.1
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
After quitting the editor, run source /etc/profile for the changes to take effect. Then
hadoop version
prints the installed version; the first line should read Hadoop 2.9.1.
Configuring Hadoop
What I set up here is mainly a pseudo-distributed configuration:
The first thing is to edit hadoop-env.sh inside the Hadoop configuration directory:
root@huchi:~# echo ${JAVA_HOME}
/home/huchi/install_package/jdk/jdk1.8.0_171
root@huchi:~# sudo vim ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
Change the JAVA_HOME line to:
export JAVA_HOME=/home/huchi/install_package/jdk/jdk1.8.0_171
There is a real reason for hard-coding this rather than relying on ${JAVA_HOME}: it comes down to how Linux and bash hand (or fail to hand) environment variables to the Hadoop daemon scripts.
The explanation from Stack Overflow:
The way to debug this is to put an “echo $JAVA_HOME” in start-all.sh. Are you running your hadoop environment under a different username, or as yourself? If the former, it’s very likely that the JAVA_HOME environment variable is not set for that user.
The other potential problem is that you have specified JAVA_HOME incorrectly, and the value that you have provided doesn’t point to a JDK/JRE. Note that “which java” and “java -version” will both work, even if JAVA_HOME is set incorrectly.
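A quick way to confirm what the daemon scripts will actually see (a simple check of my own, not from the original guide) is to read the hard-coded line back out of hadoop-env.sh:

grep '^export JAVA_HOME' ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh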
- Edit core-site.xml to set the NameNode address and the default Hadoop file system (fs.default.name is the deprecated spelling of fs.defaultFS, which Hadoop 2.9.1 still accepts); a note on hadoop.tmp.dir follows the snippet.
root@huchi:~# vim ${HADOOP_HOME}/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/huchi/install_package/hadoop/hadoop-2.9.1/tmp</value>
  </property>
</configuration>
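hadoop.tmp.dir is the base directory for Hadoop's scratch and HDFS data paths. Hadoop creates it on demand, but pre-creating it makes permission problems easier to spot (optional):

mkdir -p /home/huchi/install_package/hadoop/hadoop-2.9.1/tmp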
- Edit hdfs-site.xml to set where HDFS keeps its metadata and blocks, plus the block replication factor; see the note after the snippet.
root@huchi:~# vim ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/huchi/install_package/hadoop/hadoop-2.9.1/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/huchi/install_package/hadoop/hadoop-2.9.1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
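As with the tmp directory, these paths are created automatically (dfs.name.dir by the format step below, dfs.data.dir by the DataNode), but you can pre-create them; a minimal sketch:

mkdir -p /home/huchi/install_package/hadoop/hadoop-2.9.1/hdfs/name
mkdir -p /home/huchi/install_package/hadoop/hadoop-2.9.1/hdfs/data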
- Edit the mapred-site.xml file.
This one does not exist out of the box, but Hadoop ships a template for it, so copy the template and then edit the copy.
root@huchi:/home/huchi/install_package/hadoop/hadoop-2.9.1/etc/hadoop# cp mapred-site.xml.template mapred-site.xml
root@huchi:/home/huchi/install_package/hadoop/hadoop-2.9.1/etc/hadoop# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- Edit yarn-site.xml (also under ${HADOOP_HOME}/etc/hadoop); it holds the configuration YARN needs to launch MapReduce jobs.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Starting Hadoop
Step one: run the format command:
hdfs namenode -format
Step two: start the daemons:
start-dfs.sh
start-yarn.sh
Both scripts actually live in ${HADOOP_HOME}/sbin (already on PATH from /etc/profile); the first brings up the NameNode, DataNode, and SecondaryNameNode, and the second brings up YARN's ResourceManager and NodeManager.
Once both commands finish, run
jps
to list the Java processes. If it shows six, counting jps itself (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps), the setup succeeded. You can also browse to localhost:8088 (the YARN ResourceManager UI) and localhost:50070 (the HDFS NameNode UI).
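As a final sanity check (a minimal sketch; the /user/huchi path is just an example, not part of the original setup):

hdfs dfs -mkdir -p /user/huchi
hdfs dfs -put /etc/profile /user/huchi
hdfs dfs -ls /user/huchi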