Mirrors in China
// Tsinghua University
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
// Beijing Institute of Technology
http://mirror.bit.edu.cn/apache/hadoop/common/
Installing the JDK
// remove any existing JDK packages first (run as root; quote the globs
// so the shell does not expand them against files in the current directory)
yum remove -y 'java*'
yum remove -y 'jdk*'
wget https://github.com/frekele/oracle-java/releases/download/8u212-b10/jdk-8u212-linux-x64.rpm
rpm -ivh jdk-8u212-linux-x64.rpm
vi /etc/profile.d/java.sh
------
export JAVA_HOME=/usr/java/jdk1.8.0_212-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
======
source /etc/profile
java -version
Single-machine, single-instance installation
Download and install
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
tar -zxf hadoop-3.1.3.tar.gz
mv hadoop-3.1.3 /usr/local/hadoop
// run this after the hadoop user below has been created
chown -R hadoop:hadoop /usr/local/hadoop
Create the hadoop user
groupadd hadoop
useradd -g hadoop -m -s /bin/bash hadoop
// switch to the hadoop user
su - hadoop
Environment variables
vi ~/.bashrc
------
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
======
source ~/.bashrc
hadoop version
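Note that start-dfs.sh and the other control scripts live in $HADOOP_HOME/sbin, which the PATH above does not include (the later steps call them by full path). If you prefer calling them directly, a small addition to ~/.bashrc would do it:

```shell
# optional: put the Hadoop control scripts (start-dfs.sh etc.) on PATH;
# assumes HADOOP_HOME is already exported as in the ~/.bashrc block above
export PATH=$PATH:$HADOOP_HOME/sbin
```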
Set up passwordless SSH
ssh-keygen -t rsa        // press Enter at every prompt
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
chmod 600 ./authorized_keys
ssh localhost            // verify login now works without a password, then exit
Edit the configuration
vi $HADOOP_HOME/etc/hadoop/core-site.xml
------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/tmp/hadoop</value>
<description>Root directory for Hadoop's temporary files</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
======
// note: /tmp is usually cleared on reboot, so data under /tmp/hadoop is not durable
mkdir -p /tmp/hadoop
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/tmp/hadoop/dfs/name</value>
</property>
</configuration>
======
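For a pseudo-distributed setup it is also common to pin the DataNode directory and set the replication factor to 1, since there is only one DataNode. A fragment along these lines (assuming the same /tmp/hadoop root as core-site.xml) would sit inside the `<configuration>` element above:

```xml
<!-- assumption: same /tmp/hadoop root as in core-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/tmp/hadoop/dfs/data</value>
</property>
```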
Start Hadoop
// Initialize the NameNode
// -format creates /tmp/hadoop/dfs/name; the data and namesecondary
// directories appear once the daemons start
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
------
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost.localdomain]
======
jps
------
68048 Jps
67620 NameNode
67880 SecondaryNameNode
67726 DataNode
======
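A quick way to confirm that all three daemons came up is to scan the jps output for their names; a minimal sketch (the check_daemons helper is my own name, not part of Hadoop):

```shell
# check_daemons: read `jps`-style output on stdin and report which
# HDFS daemons are present (hypothetical helper, not a Hadoop tool)
check_daemons() {
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode; do
    if echo "$out" | grep -qw "$d"; then
      echo "$d running"
    else
      echo "$d MISSING"
    fi
  done
}
# in practice: jps | check_daemons
```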
Ports
The daemon logs show what each port is used for
9864: DatanodeHttpServer; DataNode Information
9866: streaming server
9867: IPC server
9868: Web-server for secondary; SecondaryNamenode information
9870: Web-server for hdfs; Namenode information
41409: Jetty (a random ephemeral port; differs per start); Hadoop Administration
9000: NameNode RPC port (fs.defaultFS); clients and DataNodes connect here to register and report blocks
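To check which of these ports are actually listening without netstat or ss, bash's /dev/tcp pseudo-device can be used; a minimal sketch (the probe function name is mine):

```shell
# probe: report whether a local TCP port accepts connections.
# Uses bash's /dev/tcp pseudo-device (bash-only, not POSIX sh).
probe() {
  if timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null; then
    echo "$1 open"
  else
    echo "$1 closed"
  fi
}
for p in 9000 9864 9866 9867 9868 9870; do probe "$p"; done
```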
Test
Take every file in the input folder as input, extract the words matching the regular expression dfs[a-z.]+, count each word's occurrences, and write the result to the output folder.
cd /var/tmp
mkdir input
cp $HADOOP_HOME/etc/hadoop/*.xml input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
cat output/*
Hadoop does not overwrite existing output by default, so running the example again fails; delete the output folder first:
rm -r output
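The regular expression itself can be sanity-checked with plain grep before running the MapReduce job; a small illustration (the sample input lines are invented, but mimic the copied config files):

```shell
# extract words matching dfs[a-z.]+ the way the example job does;
# the sample input is made up for illustration
printf '<name>dfs.replication</name>\n<name>hadoop.tmp.dir</name>\n' \
  | grep -Eo 'dfs[a-z.]+' \
  | sort | uniq -c
```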