Hadoop Big Data Platform
Hadoop is a piece of low-level infrastructure, the foundation of an entire ecosystem. It originated from Google's three papers: GFS, MapReduce, and BigTable.
The name Hadoop comes from the stuffed toy elephant of the son of Doug Cutting, Hadoop's creator.
The version we use here is distributed under the Apache license.
Three distributions are common: the Apache release, CDH, and HDP.
In this walkthrough we set up the Hadoop framework and deploy a simple big-data platform:
At the core of Hadoop are HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.
Typical Hadoop application scenarios: online travel (Ctrip, Fliggy, etc., analyzing traveler behavior and demand), e-commerce, energy extraction and energy saving, IT security, and so on.
What we as operations engineers handle is the storage layer: HDFS file storage.
Getting started:
Create an ordinary user and run everything as that user.
[root@server1 ~]# useradd hadoop
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$
Prepare the files:
[hadoop@server1 ~]$ ls
hadoop-3.2.1.tar.gz jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
The JDK is unpacked; now create a symlink to it:
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ ls
hadoop-3.2.1.tar.gz java jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
Unpack Hadoop and create a symlink for it as well:
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.2.1 hadoop-3.2.1.tar.gz java jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-3.2.1 hadoop
[hadoop@server1 ~]$ ls
hadoop hadoop-3.2.1 hadoop-3.2.1.tar.gz java jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
In this directory, we edit the environment script to tell Hadoop where the Java and Hadoop installations live:
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ ls
capacity-scheduler.xml hadoop-user-functions.sh.example kms-log4j.properties ssl-client.xml.example
configuration.xsl hdfs-site.xml kms-site.xml ssl-server.xml.example
container-executor.cfg httpfs-env.sh log4j.properties user_ec_policies.xml.template
core-site.xml httpfs-log4j.properties mapred-env.cmd workers
hadoop-env.cmd httpfs-signature.secret mapred-env.sh yarn-env.cmd
hadoop-env.sh httpfs-site.xml mapred-queues.xml.template yarn-env.sh
hadoop-metrics2.properties kms-acls.xml mapred-site.xml yarnservice-log4j.properties
hadoop-policy.xml kms-env.sh shellprofile.d yarn-site.xml
[hadoop@server1 hadoop]$ vim hadoop-env.sh
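The edits themselves are not shown here; a minimal sketch of the two lines to append, assuming the symlinks created above:
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop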
If this script runs and prints its usage, standalone mode works:
[hadoop@server1 hadoop]$ bin/hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class
OPTIONS is none or any of:
buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
hostnames list[,of,host,names] hosts to use in slave mode
hosts filename list of hosts to use in slave mode
loglevel level set the log4j level for this command
workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
daemonlog get/set the log level for each daemon
Client Commands:
archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath prints the java.library.path
kdiag Diagnose Kerberos Problems
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
s3guard manage metadata on S3
trace view and modify Hadoop tracing settings
version print the version
Daemon Commands:
kms run KMS, the Key Management Server
SUBCOMMAND may print help when invoked w/o parameters or with -h.
1. Standalone mode:
Standalone operation.
By default, Hadoop is configured to run in non-distributed mode as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z]+'
[hadoop@server1 hadoop]$ cat output/*
1 dfsadmin
That is the standalone output: the only match, dfsadmin, appears once.
Standalone Hadoop needs no extra configuration; in this mode Hadoop runs as a single Java process, which makes it convenient for debugging.
Here we ran the grep example program bundled with Hadoop.
When the job finishes, the match counts have been written to the output directory
(the output directory is created automatically)
(they can be viewed directly with cat output/*, as above; in the distributed modes below the results land in HDFS and are viewed with bin/hdfs dfs -cat output/*).
2. Pseudo-distributed mode:
Pseudo-distributed mode simulates the effect of multiple nodes on a single host.
Pseudo-distributed operation.
Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Edit the following two configuration files:
[hadoop@server1 hadoop]$ vim core-site.xml
[hadoop@server1 hadoop]$ vim hdfs-site.xml
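The contents of the two files are not shown above; a minimal sketch following the stock single-node setup (port 9000 matches the listening port seen later; replication is 1 because there is only one DataNode):
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>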
Then set up SSH key authentication to localhost:
[hadoop@server1 hadoop]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:JEfkuqZG84RrlnQJZThBccvdGAQh1rcwkcoBvkpJ+hc hadoop@server1
The key's randomart image is:
+---[RSA 2048]----+
| .o*==B= |
| . .++B=.+ |
| ... *++*.. |
|o ..+ =. |
|.o. Eo..S |
|.o =.+. |
|. .o.Bo |
| .*o. |
| +. |
+----[SHA256]-----+
[hadoop@server1 hadoop]$ ssh-copy-id localhost
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:5Dhga2ILRQEIeEeFhLQ6KwXsVFfM19aEzOlkYB+UGCM.
ECDSA key fingerprint is MD5:bd:a1:b5:dc:26:9c:32:36:26:9d:9c:31:1e:48:b3:86.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.
In pseudo-distributed mode the start scripts still log into a worker node (localhost here) to carry out operations,
so pseudo-distributed mode is started the same way a distributed cluster is.
Before starting Hadoop, the HDFS filesystem has to be formatted:
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
As a final check, confirm that the expected data is in place:
the temporary data directory /tmp should now contain Hadoop's data.
[hadoop@server1 hadoop]$ id
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[hadoop@server1 hadoop]$ ls /tmp/
hadoop hadoop-hadoop-namenode.pid systemd-private-5befa3cbcbaf4987b25ea1d5c292dcc4-mariadb.service-lb0kQy
hadoop-hadoop hsperfdata_hadoop
First start DFS (the distributed filesystem). The NameNode is the master; DataNodes are dedicated to storing data.
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [server1]
server1: Warning: Permanently added 'server1,172.25.254.11' (ECDSA) to the list of known hosts.
jps shows whether Hadoop started successfully.
As you can see, this host is now both the NameNode and a DataNode,
and the corresponding port 9000 is open.
Note that jps is a tool shipped with the JDK, so its directory has to be added to the shell's PATH first:
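The line appended to .bash_profile is not shown; a minimal sketch that matches the which output below, assuming the java symlink created earlier:
export PATH=$PATH:$HOME/java/bin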
[hadoop@server1 ~]$ vim .bash_profile
[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ which jps
~/java/bin/jps
[hadoop@server1 ~]$ jps
17558 Jps
15755 NameNode
16060 SecondaryNameNode
15870 DataNode
The NameNode acts as the master. Despite its name, the SecondaryNameNode is not a hot standby that takes over when the NameNode fails; it periodically merges the NameNode's edit log into a new checkpoint. Checking the ports shows that the Hadoop-related ports are all open as well:
[hadoop@server1 ~]$ netstat -antlp |grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:9864 0.0.0.0:* LISTEN 15870/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 15755/java
tcp 0 0 127.0.0.1:34505 0.0.0.0:* LISTEN 15870/java
tcp 0 0 0.0.0.0:9866 0.0.0.0:* LISTEN 15870/java
tcp 0 0 0.0.0.0:9867 0.0.0.0:* LISTEN 15870/java
tcp 0 0 0.0.0.0:9868 0.0.0.0:* LISTEN 16060/java
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 15755/java
tcp 0 0 127.0.0.1:46448 127.0.0.1:9000 ESTABLISHED 15870/java
tcp 0 0 127.0.0.1:9000 127.0.0.1:46448 ESTABLISHED 15755/java
Basic Hadoop commands and a quick test.
The Hadoop distributed filesystem is empty at this point; create a directory named after the current user:
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
ls: `.': No such file or directory
[hadoop@server1 hadoop]$ id
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Upload the input directory created earlier into this home directory:
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
2021-09-15 17:14:22,007 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:22,687 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:23,144 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:23,574 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:24,009 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:24,038 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:24,508 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:24,956 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-15 17:14:25,390 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
If we delete the local input directory and then browse Hadoop's resources through the web UI, the input directory still exists in HDFS.
Next, run another bundled example program, wordcount.
The examples jar can be found under share/hadoop/mapreduce inside the Hadoop installation (here /home/hadoop/hadoop/share/hadoop/mapreduce); pick the one matching your version.
After the job finishes, the word counts have been written to the output directory in HDFS; view them with bin/hdfs dfs -cat output/*.
[hadoop@server1 hadoop]$ rm -rf input/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2021-09-15 17:14 input
drwxr-xr-x - hadoop supergroup 0 2021-09-15 17:20 output
The distributed filesystem now holds the uploaded files. The computed results can also be downloaded locally for inspection:
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output
2021-09-15 17:28:25,525 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[hadoop@server1 hadoop]$ ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt output README.txt sbin share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000 _SUCCESS
[hadoop@server1 output]$ cat *
"*" 21
"AS 9
"License"); 9
"alice,bob 21
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
Now try deleting output:
[hadoop@server1 hadoop]$ bin/hdfs dfs -rm -r output
Deleted output
This removes the data from Hadoop, with the same effect a delete through the web UI would have; deleting from the web UI is not actually possible here, though, because the web interface is accessed as an anonymous user, which has no delete permission.
That permission can be opened up in the main configuration file.
3. Fully distributed mode:
Bring up two more virtual machines and create the hadoop user, with a password, on each of them:
[root@server3 ~]# useradd hadoop
[root@server3 ~]# echo yume |passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
Stop the processes started earlier in pseudo-distributed mode:
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
10556 Jps
Install NFS on all 3 nodes
(1 NameNode and 2 DataNodes):
[root@server1 ~]# yum install nfs-utils -y
After installing NFS on the NameNode (server1), write its exports file to share /home/hadoop:
[root@server1 ~]# id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[root@server1 ~]# vim /etc/exports
/home/hadoop *(rw,anonuid=1001,anongid=1001)
[root@server1 ~]# systemctl start nfs
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
Mount the shared directory on the DataNodes
(only the superuser can mount).
Why mount at all?
Because when deploying a cluster, every node needs the same configuration changes and the same operations, and editing each node one by one is tedious. Sharing the directory over NFS means that whatever happens to the source directory (server1's /home/hadoop) happens on the other 2 hosts as well.
[root@server2 ~]# showmount -e 172.25.254.11
Export list for 172.25.254.11:
/home/hadoop *
[root@server2 ~]# mount 172.25.254.11:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel-root 13092864 1265992 11826872 10% /
devtmpfs 929016 0 929016 0% /dev
tmpfs 941036 0 941036 0% /dev/shm
tmpfs 941036 16972 924064 2% /run
tmpfs 941036 0 941036 0% /sys/fs/cgroup
/dev/vda1 1038336 148188 890148 15% /boot
tmpfs 188208 0 188208 0% /run/user/0
172.25.254.11:/home/hadoop 13092864 8167680 4925184 63% /home/hadoop
Do the same on server3:
[root@server3 ~]# mount 172.25.254.11:/home/hadoop/ /home/hadoop/
[root@server3 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel-root 13092864 1266312 11826552 10% /
devtmpfs 929016 0 929016 0% /dev
tmpfs 941036 0 941036 0% /dev/shm
tmpfs 941036 16972 924064 2% /run
tmpfs 941036 0 941036 0% /sys/fs/cgroup
/dev/vda1 1038336 148188 890148 15% /boot
tmpfs 188208 0 188208 0% /run/user/0
172.25.254.11:/home/hadoop 13092864 8167680 4925184 63% /home/hadoop
The NameNode and the DataNodes need passwordless SSH between them.
Because the DataNodes have just mounted server1's /home/hadoop, the mounted directory already contains the SSH key pair, so there is no need to generate keys and scp them around;
we only need to verify that the three hosts can in fact SSH to each other without a password.
This way every node stays in sync:
[hadoop@server3 ~]$ ls
hadoop hadoop-3.2.1 hadoop-3.2.1.tar.gz java jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
Hadoop cluster configuration.
Configure Hadoop's core configuration file.
Note: server1 in the configuration is a hostname from the local hosts file (the master's hostname).
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
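For reference, this file sets the default filesystem to the master; the full contents appear again later when the file is inspected on server4:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
    </property>
</configuration>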
Edit the workers file (called slaves in older releases);
server2 and server3 are the slaves (DataNodes):
[hadoop@server1 hadoop]$ vim workers
server2
server3
Edit Hadoop's HDFS configuration; with two slaves, two replicas can be stored:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
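The file contents are not shown; a minimal sketch matching the text (replication factor 2 for the two DataNodes):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>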
Remove the temporary data left behind by the earlier pseudo-distributed run:
[hadoop@server1 hadoop]$ rm -rf /tmp/*
Format the Hadoop filesystem:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
2021-09-16 11:14:05,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server1/172.25.254.11
STARTUP_MSG: args = [-format]
Start Hadoop:
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
23298 Jps
22807 NameNode
23031 SecondaryNameNode
On a DataNode, the running process is visible:
[hadoop@server2 ~]$ jps
4624 Jps
4559 DataNode
Test (storing data).
Generate a 200 MB bigfile.
The file will be split into 2 blocks, because the default HDFS block size is 128 MB (200 MB gives one full 128 MB block plus a 72 MB block).
[hadoop@server1 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.252584 s, 830 MB/s
Check the cluster status with the following command:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 26814185472 (24.97 GB)
Present Capacity: 24220983296 (22.56 GB)
DFS Remaining: 24220966912 (22.56 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 172.25.254.12:9866 (server2)
Hostname: server2
Decommission Status : Normal
Configured Capacity: 13407092736 (12.49 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 1296445440 (1.21 GB)
DFS Remaining: 12110639104 (11.28 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 16 14:15:23 CST 2021
Last Block Report: Thu Sep 16 11:50:03 CST 2021
Num of Blocks: 0
Name: 172.25.254.13:9866 (server3)
Hostname: server3
Decommission Status : Normal
Configured Capacity: 13407092736 (12.49 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 1296756736 (1.21 GB)
DFS Remaining: 12110327808 (11.28 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 16 14:15:23 CST 2021
Last Block Report: Thu Sep 16 11:15:33 CST 2021
Num of Blocks: 0
Upload the bigfile created above:
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put bigfile
2021-09-16 14:19:30,386 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-16 14:19:32,929 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Two replicas are stored automatically, one on server2 and one on server3.
Adding a node online (hot add).
Add one more DataNode.
Remember to keep the whole platform consistent:
(1) add the local hosts entry (a sketch of the file follows the transcript below);
(2) create the user;
(3) install NFS and mount the NameNode's shared directory (because of the mount, there is no need to set a password for the new hadoop user on this host);
(4) run jps to check that the environment is in place.
[root@server4 ~]# vim /etc/hosts
[root@server4 ~]# useradd hadoop
[root@server4 ~]# echo yume | passwd --stdin hadoop
[root@server4 ~]# yum install nfs-utils -y
[root@server4 ~]# mount 172.25.254.11:/home/hadoop/ /home/hadoop/
[hadoop@server4 ~]$ jps
2524 Jps
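For reference, a sketch of the /etc/hosts entries written in step (1); the addresses for server1-server3 come from the outputs above, while server4's own address is an assumption and should be replaced with the host's real IP:
172.25.254.11 server1
172.25.254.12 server2
172.25.254.13 server3
172.25.254.14 server4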
The newly added server4, acting as a DataNode (slave), has to be written into the workers file:
[hadoop@server4 ~]$ vim hadoop/etc/hadoop/workers
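After the edit, the workers file lists all three DataNodes:
server2
server3
server4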
Start the new node.
Note: the startup here differs from before (the existing Hadoop cluster was never stopped, i.e. this is a hot add):
[hadoop@server4 hadoop]$ bin/hdfs --daemon start datanode
[hadoop@server4 hadoop]$ jps
2842 DataNode
2925 Jps
Test (simulating a client upload).
Suppose a client now wants to upload data; the client has already obtained the DataNode information from the NameNode.
The client splits the file demo into 2 blocks and hands them to the designated DataNodes, which replicate and store the blocks.
Upload straight from the user's home directory; thanks to the NFS filesystem, this host is both a data node and the client:
[hadoop@server4 hadoop]$ ls
bigfile bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
[hadoop@server4 hadoop]$ mv bigfile demo
[hadoop@server4 hadoop]$ bin/hdfs dfs -put demo
2021-09-16 14:39:05,502 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-09-16 14:39:07,655 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
On the DataNode, check the core configuration file; the NameNode's address and port are as follows:
[hadoop@server4 hadoop]$ cat etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://server1:9000</value>
</property>
</configuration>
The web UI shows that there are 2 blocks, one on server4 and one on server2.
Fully distributed storage has been balanced across the nodes.