1. Prerequisites
- One machine: CentOS 7.8 is used here, with IP 192.168.0.20
- Java 8: Java 8 is required; other Java versions are not supported
- SSH: the machine must be able to SSH into itself without a password
Example Java 8 setup:
mkdir /usr/local/java8
tar zxvf jdk-8u351-linux-x64.tar.gz -C /usr/local/java8/
cd /usr/local/java8/jdk1.8.0_351/
cat >> /etc/profile << 'eof'
## JDK8
export JAVA_HOME=/usr/local/java8/jdk1.8.0_351
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
eof
source /etc/profile
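As a quick sanity check, the PATH entries added by the profile snippet can be expanded by hand. This sketch (assuming the jdk1.8.0_351 install location used above) just resolves the variables and echoes the result:

```shell
# Resolve the profile variables by hand to confirm the expected bin paths
JAVA_HOME=/usr/local/java8/jdk1.8.0_351
JRE_HOME=${JAVA_HOME}/jre
JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
echo "${JAVA_PATH}"
```

After `source /etc/profile`, `java -version` should report version 1.8.0_351.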
2. Installing Hadoop 3.3.5
2.1 Configure HDFS
tar zxvf hadoop-3.3.5.tar.gz -C /usr/local/
cd /usr/local/hadoop-3.3.5/
cat >> /etc/profile << 'eoe'
## Hadoop3.3.5
export HADOOP_HOME=/usr/local/hadoop-3.3.5
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
eoe
source /etc/profile
echo "export JAVA_HOME=/usr/local/java8/jdk1.8.0_351" >> etc/hadoop/hadoop-env.sh
echo "export HADOOP_PID_DIR=${HADOOP_HOME}/pids" >> etc/hadoop/hadoop-env.sh
cat > etc/hadoop/core-site.xml << 'eof'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.20:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
eof
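The fs.defaultFS value is the URI clients use to reach the NameNode RPC port (9000 here, distinct from the 9870 web UI port that comes up later). A small sketch of how the host:port pair falls out of that URI:

```shell
# Strip the scheme from the fs.defaultFS URI to get the NameNode RPC endpoint
uri="hdfs://192.168.0.20:9000"
hostport=${uri#hdfs://}
echo "NameNode RPC endpoint: ${hostport}"
```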
cat > etc/hadoop/hdfs-site.xml << 'eog'
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/mnt/data01/hadoop</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/mnt/data01/hdfs_dndata</value>
</property>
</configuration>
eog
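Note that dfs.blocksize is specified in bytes; quick arithmetic confirms the value above is 256 MiB:

```shell
# 268435456 bytes / 1024 / 1024 = 256 MiB
blocksize_mib=$((268435456 / 1024 / 1024))
echo "${blocksize_mib}"
```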
2.2 Configure hosts
hostnamectl set-hostname bigdata001.local
bash
echo "192.168.0.20 bigdata001.local" >> /etc/hosts
2.3 Set up passwordless SSH
useradd xwp
su - xwp -c "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
su - xwp -c "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
su - xwp -c "chmod 0600 ~/.ssh/authorized_keys"
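sshd is strict about these permissions: `~/.ssh` should be mode 700 and `authorized_keys` mode 600, or key authentication is silently refused. A sketch that recreates the layout in a throwaway directory (the mktemp path is only for illustration) and prints the expected modes:

```shell
# Recreate the required SSH permission layout in a temp dir and show the modes
d=$(mktemp -d)
mkdir -p "$d/.ssh" && chmod 700 "$d/.ssh"
touch "$d/.ssh/authorized_keys" && chmod 600 "$d/.ssh/authorized_keys"
mode_dir=$(stat -c '%a' "$d/.ssh")
mode_key=$(stat -c '%a' "$d/.ssh/authorized_keys")
echo "dir: ${mode_dir}, authorized_keys: ${mode_key}"
rm -rf "$d"
```

A quick end-to-end check is `su - xwp -c "ssh localhost true"`, which should succeed without a password prompt.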
chown -R xwp:xwp /usr/local/hadoop-3.3.5
mkdir /usr/local/hadoop-3.3.5/pids
chown -R xwp:xwp /usr/local/hadoop-3.3.5/pids
mkdir -p /mnt/data01/hadoop
mkdir -p /mnt/data01/hdfs_dndata
mkdir -p /mnt/data01/yarn_nmdata
chown -R xwp:xwp /mnt/data01/hadoop
chown -R xwp:xwp /mnt/data01/hdfs_dndata
chown -R xwp:xwp /mnt/data01/yarn_nmdata
2.4 Start HDFS
This starts one NameNode, one SecondaryNameNode, and one DataNode.
su - xwp
cd /usr/local/hadoop-3.3.5
bin/hdfs namenode -format
sbin/start-dfs.sh
NameNode web UI: http://192.168.0.20:9870
2.4.1 Create the HDFS directories needed for MapReduce jobs
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/xwp
2.4.2 Run a MapReduce example job
hdfs dfs -mkdir input
hdfs dfs -put etc/hadoop/*.xml input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar grep input output 'dfs[a-z.]+'
hdfs dfs -get output output
cat output/*
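The example job extracts every match of the regex 'dfs[a-z.]+' from the uploaded XML files and counts occurrences. The same pattern can be tried locally with grep -Eo to see what it picks out of a config line (a one-line sketch):

```shell
# grep -Eo prints each substring matched by the example job's regex
m=$(printf '%s\n' '<name>dfs.replication</name>' | grep -Eo 'dfs[a-z.]+')
echo "$m"
```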
2.5 Start YARN
Configure mapred-site.xml and yarn-site.xml:
cat > etc/hadoop/mapred-site.xml << 'eof'
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<!--property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.0.20:10020</value>
</property-->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.0.20:19888</value>
</property>
</configuration>
eof
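Notice the relationship between the container sizes and the JVM heaps above: each -Xmx heap is kept below its mapreduce.*.memory.mb container size so the JVM's non-heap overhead still fits inside the container. A quick check of the headroom this config leaves:

```shell
# Container size minus JVM heap = headroom for non-heap memory, in MB
map_headroom=$((1536 - 1024))
reduce_headroom=$((3072 - 2560))
echo "map: ${map_headroom} MB, reduce: ${reduce_headroom} MB"
```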
cat > etc/hadoop/yarn-site.xml << 'eog'
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mnt/data01/yarn_nmdata</value>
</property>
</configuration>
eog
sbin/start-yarn.sh
sbin/start-yarn.sh starts only the ResourceManager and NodeManager. If you also want the JobHistoryServer, run bin/mapred --daemon start historyserver. I did not start it here.
YARN web UI: http://192.168.0.20:8088
Done! The environment is set up, and you can happily start learning Hadoop.