1 Download and extract the Hadoop archive
[cwptky@cwptky hadoop]# mkdir hadoop
[cwptky@cwptky hadoop]# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
[cwptky@cwptky hadoop]# tar -zxvf hadoop-3.2.2.tar.gz
2 Configure the Java environment in Hadoop's etc/hadoop/hadoop-env.sh by appending the following at the end of the file:
export JAVA_HOME=/software/jdk1.8.0_201
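If this setup is scripted, the append can be made idempotent so re-running it does not duplicate the export line. A minimal sketch, demonstrated here on a temporary file standing in for etc/hadoop/hadoop-env.sh (the JDK path is the one used above):

```shell
# Guarded append: only add the JAVA_HOME export if the exact line is not already present.
JAVA_HOME_LINE='export JAVA_HOME=/software/jdk1.8.0_201'
HADOOP_ENV=$(mktemp)   # stand-in for etc/hadoop/hadoop-env.sh

append_once() {
    # append $JAVA_HOME_LINE to file $1 unless that exact line already exists
    grep -qxF "$JAVA_HOME_LINE" "$1" || echo "$JAVA_HOME_LINE" >> "$1"
}

append_once "$HADOOP_ENV"   # first run appends the line
append_once "$HADOOP_ENV"   # second run is a no-op
```

On the real node, point HADOOP_ENV at etc/hadoop/hadoop-env.sh under the Hadoop install directory instead of the temporary file.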
3 Test the Hadoop environment
[cwptky@cwptky hadoop]# cd /usr/local/src/hadoop/hadoop-3.2.2
[cwptky@cwptky hadoop-3.2.2]# bin/hadoop version
Hadoop 3.2.2
Source code repository Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932
Compiled by hexiaoqiao on 2021-01-03T09:26Z
Compiled with protoc 2.5.0
From source with checksum 5a8f564f46624254b27f6a33126ff4
This command was run using /usr/local/src/hadoop/hadoop-3.2.2/share/hadoop/common/hadoop-common-3.2.2.jar
This shows that Hadoop is ready to run.
2 Running Hadoop, method 1 (standalone mode)
2.1 Create an input directory
[cwptky@cwptky hadoop-3.2.2]# mkdir input
2.2 Copy all of Hadoop's configuration files into the input directory as test data
[cwptky@cwptky hadoop-3.2.2]# cp etc/hadoop/*.xml input
[cwptky@cwptky hadoop-3.2.2]# ll input/
total 52
-rw-r--r-- 1 cwptky cwptky 9213 capacity-scheduler.xml
-rw-r--r-- 1 cwptky cwptky 837 core-site.xml
-rw-r--r-- 1 cwptky cwptky 11392 hadoop-policy.xml
-rw-r--r-- 1 cwptky cwptky 775 hdfs-site.xml
-rw-r--r-- 1 cwptky cwptky 620 httpfs-site.xml
-rw-r--r-- 1 cwptky cwptky 3518 kms-acls.xml
-rw-r--r-- 1 cwptky cwptky 682 kms-site.xml
-rw-r--r-- 1 cwptky cwptky 758 mapred-site.xml
-rw-r--r-- 1 cwptky cwptky 690 yarn-site.xml
2.3 Run Hadoop: search all files under input, match them against the regex 'dfs[a-z.]+', and write the results to the output directory
[cwptky@cwptky hadoop-3.2.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
2.4 Check the output
[cwptky@cwptky hadoop-3.2.2]# ll output
total 4
-rw-r--r-- 1 cwptky cwptky 11 part-r-00000
-rw-r--r-- 1 cwptky cwptky 0 _SUCCESS
[cwptky@cwptky hadoop-3.2.2]# cat output/part-r-00000
1 dfsadmin
2.5 Delete the input and output directories
[cwptky@cwptky hadoop-3.2.2]# rm -vfr input output
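What the example jar computes here can be sanity-checked with plain shell tools before involving Hadoop at all: extract every regex match, then count occurrences per match. A sketch on throwaway files (the file names and contents below are invented for illustration):

```shell
# Emulate the hadoop-mapreduce-examples "grep" job locally:
# map = extract regex matches, reduce = count per match, sorted by frequency.
workdir=$(mktemp -d)
printf 'dfs.replication\ndfsadmin\n' > "$workdir/a.xml"
printf 'dfsadmin\n'                  > "$workdir/b.xml"

result=$(grep -hoE 'dfs[a-z.]+' "$workdir"/*.xml | sort | uniq -c | sort -rn)
echo "$result"
```

The pipeline's count-then-match lines mirror the shape of the job's part-r-00000 output.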
3 Running Hadoop, method 2 (pseudo-distributed mode)
Run on a single node (one machine) in pseudo-distributed fashion.
1. Edit the core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml files
# enter the configuration directory
cd /usr/local/src/hadoop/hadoop-3.2.2/etc/hadoop/
vim core-site.xml
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/src/hadoop/hadoop-3.2.2/hdfs-tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
- fs.defaultFS: the access address of HDFS.
- hadoop.tmp.dir: base directory for temporary files. If this is not set, the default is /tmp/hadoop-${user.name}, which lives under /tmp and can be cleaned out by the system (for example on reboot).
hdfs-site.xml:
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>localhost:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/src/hadoop/hadoop-3.2.2/file-tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/src/hadoop/hadoop-3.2.2/file-tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
- dfs.namenode.name.dir: directory where the NameNode keeps its metadata.
- dfs.datanode.data.dir: directory where the DataNode keeps its data blocks.
- dfs.replication: the replica count. A distributed file system normally stores data redundantly in several copies for reliability and safety, but pseudo-distributed mode has only one node, so there can only be one replica.
mapred-site.xml:
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop/hadoop-3.2.2</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop/hadoop-3.2.2</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop/hadoop-3.2.2</value>
    </property>
</configuration>
- mapreduce.jobhistory.address: address and port of the JobHistory server.
- mapreduce.jobhistory.webapp.address: web UI address of the JobHistory server.
- yarn.app.mapreduce.am.env / mapreduce.map.env / mapreduce.reduce.env: set HADOOP_MAPRED_HOME to the Hadoop installation directory so MapReduce tasks can locate their jars.
yarn-site.xml:
vim yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
</configuration>
Create the data directories
mkdir hdfs-tmp
mkdir file-tmp
Format HDFS
hdfs namenode -format
WARNING: /usr/local/src/hadoop/hadoop-3.2.2/logs does not exist. Creating.
2021-05-08 11:16:28,710 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cwptky/172.18.159.172
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.2
...
2021-05-08 11:16:29,878 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-05-08 11:16:29,883 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-05-08 11:16:29,883 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cwptky/172.18.159.172
************************************************************/
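One caution worth scripting around: running `hdfs namenode -format` a second time wipes the namespace and generates a new clusterID, after which existing DataNode data no longer matches. A hedged guard sketch that only formats when the metadata directory is still empty, demonstrated against a temporary directory standing in for the dfs.namenode.name.dir configured above:

```shell
# Only format when the NameNode metadata directory has no existing image.
NAME_DIR=$(mktemp -d)   # stand-in for .../file-tmp/dfs/name

if [ -e "$NAME_DIR/current/VERSION" ]; then
    echo "name dir already formatted, skipping"
    formatted=skipped
else
    # on a real node this branch would run: bin/hdfs namenode -format
    mkdir -p "$NAME_DIR/current" && touch "$NAME_DIR/current/VERSION"
    formatted=yes
fi
```

A formatted name directory contains current/VERSION, which is what the guard checks for.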
Start Hadoop's DFS for a test
# start dfs
sbin/start-dfs.sh
It fails with an error.
Cause: the current account does not have passwordless SSH login configured.
Fix:
Check SSH login; it still prompts for a password
# check ssh login
ssh localhost
Configure passwordless SSH login
# generate a key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# append the public key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# restrict permissions to 600
chmod 600 ~/.ssh/authorized_keys
# test ssh login again
ssh localhost
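If `ssh localhost` still asks for a password after this, the usual culprit is permissions: sshd ignores authorized_keys unless ~/.ssh is 0700 and the key file is 0600. A sketch of that check, run here against a temporary directory rather than the real ~/.ssh (`stat -c` assumes GNU coreutils):

```shell
# Verify the permission bits sshd requires for passwordless login.
SSH_DIR=$(mktemp -d)/.ssh    # stand-in for ~/.ssh
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"

chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"

dir_mode=$(stat -c '%a' "$SSH_DIR")
key_mode=$(stat -c '%a' "$SSH_DIR/authorized_keys")
echo "dir=$dir_mode key=$key_mode"
```

On the real host, run the same two stat calls against ~/.ssh and ~/.ssh/authorized_keys.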
Start again
# format again
bin/hdfs namenode -format
# start dfs
sbin/start-dfs.sh
# open in a browser
http://127.0.0.1:9870/
Stop
# stop dfs
sbin/stop-dfs.sh
Start Hadoop
# start everything
./sbin/start-all.sh
Stop Hadoop
./sbin/stop-all.sh
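After start-all.sh, `jps` should list the daemon JVMs (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager). A small helper sketch that reads jps-style output on stdin and prints any expected daemon that is missing; here it is fed a captured sample instead of live `jps` output:

```shell
# Report expected daemons that do not appear in jps-style input.
missing_daemons() {
    input=$(cat)
    for daemon in "$@"; do
        echo "$input" | grep -qw "$daemon" || echo "$daemon"
    done
}

# sample jps output with NodeManager deliberately absent
sample='1201 NameNode
1302 DataNode
1403 SecondaryNameNode
1504 ResourceManager
1605 Jps'

missing=$(printf '%s\n' "$sample" | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager)
echo "missing: $missing"
```

On a live node you would pipe real output instead: `jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager`.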
3.1 Test Hadoop's search (grep) example
3.1.1 First create the user directory
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -mkdir /user
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -mkdir /user/root
3.1.2 List the files under the current user directory
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -ls
# nothing there yet
3.1.3 Prepare the test data (copy all the xml files under etc/hadoop into the input directory)
[hadoop@localhost hadoop-3.2.2]# mkdir input
[hadoop@localhost hadoop-3.2.2]# cp etc/hadoop/*.xml input
[hadoop@localhost hadoop-3.2.2]# ls input
capacity-scheduler.xml hadoop-policy.xml httpfs-site.xml kms-site.xml yarn-site.xml
core-site.xml hdfs-site.xml kms-acls.xml mapred-site.xml
3.1.4 Upload the input directory to HDFS under the name input1
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -put input input1
3.1.5 List the test files now on HDFS
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2019-01-25 15:24 input1
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -ls input1
Found 9 items
-rw-r--r-- 1 supergroup 8260 input1/capacity-scheduler.xml
-rw-r--r-- 1 supergroup 884 input1/core-site.xml
-rw-r--r-- 1 supergroup 11392 input1/hadoop-policy.xml
-rw-r--r-- 1 supergroup 868 input1/hdfs-site.xml
-rw-r--r-- 1 supergroup 620 input1/httpfs-site.xml
-rw-r--r-- 1 supergroup 3518 input1/kms-acls.xml
-rw-r--r-- 1 supergroup 682 input1/kms-site.xml
-rw-r--r-- 1 supergroup 758 input1/mapred-site.xml
-rw-r--r-- 1 supergroup 690 input1/yarn-site.xml
3.1.6 Run Hadoop: search all files under input1, match them against the regex 'dfs[a-z.]+', and write the results to the output directory
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input1 output 'dfs[a-z.]+'
3.1.7 Inspect the output directory generated on HDFS
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2019-01-25 15:24 input1
drwxr-xr-x - root supergroup 0 2019-01-25 15:26 output
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2019-01-25 15:26 output/_SUCCESS
-rw-r--r-- 1 root supergroup 29 2019-01-25 15:26 output/part-r-00000
[hadoop@localhost hadoop-3.2.2]# bin/hdfs dfs -cat output/part-r-00000
1 dfsadmin
1 dfs.replication
Here you can see the search results.
3.1.8 Experiment complete; shut down the pseudo-distributed Hadoop
[hadoop@localhost hadoop-3.2.2]# sbin/stop-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [localhost]
Last login: Fri Jan 25 15:14:57 CST 2019 on pts/0
Stopping datanodes
Last login: Fri Jan 25 15:30:53 CST 2019 on pts/0
Stopping secondary namenodes [localhost.localdomain]
Last login: Fri Jan 25 15:30:54 CST 2019 on pts/0