Hadoop_06: HDFS and Hadoop Benchmarking

Latest recommended article published on 2024-07-25 16:38:33

菜小雨

Tags: hadoop hdfs big data

Article link: https://blog.csdn.net/Latterain/article/details/106265103

HDFS and Hadoop Benchmarking

HDFS
Hadoop benchmarking

HDFS

HDFS is short for Hadoop Distributed File System. It is one of Hadoop's core components and serves as the lowest-level distributed storage service.

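Basic features

Master/slave architecture:
HDFS uses a master/slave design.

NameNode:
The master node. It stores the metadata and handles client requests.

DataNode:
A slave node. It stores the actual data; put plainly, it contributes the disks.

Metadata:
Data that describes data: a file's name, location, size, creation time, modification time, and access permissions.

NameNode metadata management:
The NameNode keeps all metadata in one place so that it can be looked up uniformly.

DataNode data storage:
DataNodes contribute disk space and store the file data itself.

Block storage:
A large file is split into smaller blocks; in Hadoop 2 the default block size is 128 MB. For example, with 100 machines that each have 1 TB of disk, a 2 TB file does not fit on any single machine, but it can be cut into many 128 MB blocks that are spread across the cluster.

Unified namespace:
HDFS exposes a single address for file access, for example hdfs://node01:8020.

Replication:
A 1280 MB file is split into 10 blocks, and each block has three replicas, for example blk_00001 on node01, blk_00001 on node02, and blk_00001 on node03.

Write once, read many:
HDFS suits workloads that read files frequently and does not suit workloads that write frequently, because changing a file also means changing its metadata.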
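The block and replica arithmetic above can be checked with a small calculation. The following is only a sketch: the function name plan_storage is made up for illustration, and it assumes binary units (1 MB = 1024 * 1024 bytes), the 128 MB default block size, and three replicas as described in this article.

```python
MB = 1024 * 1024
TB = 1024 * 1024 * MB


def plan_storage(file_size, block_size=128 * MB, replication=3):
    """Return (blocks, block_replicas, raw_bytes) for a single file.

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    # Ceiling division: the last block may be only partly filled.
    blocks = -(-file_size // block_size)
    block_replicas = blocks * replication
    # A partly filled last block only occupies its real size on disk,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    print(plan_storage(1280 * MB))  # the 1280 MB example
    print(plan_storage(2 * TB))     # the 2 TB example
```

With these assumptions the 1280 MB file yields 10 blocks and 30 block replicas, and the 2 TB file yields 16384 blocks.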

Basic commands

ls
Usage: hdfs dfs -ls [-R] <args>
Example: hdfs dfs -ls /user/hadoop/file1
Exit Code: Returns 0 on success and -1 on error.

lsr
Usage: hdfs dfs -lsr
Recursive version of ls.
Note: This command is deprecated. Instead use hdfs dfs -ls -R

mkdir
Usage: hdfs dfs -mkdir [-p] <paths>
The -p option creates parent directories along the path as needed.
Example:

hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.

moveFromLocal: equivalent to a cut (move) operation
Usage: hdfs dfs -moveFromLocal <localsrc> <dst>
Similar to put command, except that the source localsrc is deleted after it’s copied.

mv: can also be used to rename
Usage: hdfs dfs -mv URI [URI …] <dest>
Moves files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across file systems is not permitted.
Example:

hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

Exit Code:
Returns 0 on success and -1 on error.

put: copies files from the local file system to HDFS
Usage: hdfs dfs -put <localsrc> … <dst>
Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
Example:

hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

Exit Code:
Returns 0 on success and -1 on error.

appendToFile: append
Usage: hdfs dfs -appendToFile …
Appends one or more local files to the specified file in HDFS. It can also read input from stdin.

hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

Exit Code:
Returns 0 on success and 1 on error.

cat
Usage: hdfs dfs -cat URI [URI …]
Prints the contents of the given files.
Example:

hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hdfs dfs -cat file:///file3 /user/hadoop/file4

Exit Code:
Returns 0 on success and -1 on error.

cp
Usage: hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI …] <dest>
Copies files or directories; it can overwrite the destination and can preserve the original permission information.
Options:

The -f option will overwrite the destination if it already exists.
The -p option will preserve file attributes [topax] (timestamps, ownership, permission, ACL, XAttr).
Example:
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

Exit Code:
Returns 0 on success and -1 on error.

rm: delete
Usage: hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI …]
Example:
hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir
Use hdfs dfs -rm -r to delete recursively.
Exit Code:
Returns 0 on success and -1 on error.

chmod: change permissions
Usage: hdfs dfs -chmod [-R] <MODE[,MODE]… | OCTALMODE> URI [URI …]
hdfs dfs -chmod -R 777 /xxx
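To make an OCTALMODE such as 777 easier to read, here is a small sketch that expands a three-digit octal mode into the familiar rwx form. The function name octal_to_symbolic is made up for illustration, and the sketch ignores special bits such as the sticky bit.

```python
def octal_to_symbolic(octal_mode):
    """Expand a 3-digit octal mode string such as "777" into "rwxrwxrwx".

    Hypothetical helper for illustration; special bits are ignored.
    """
    bits = int(octal_mode, 8)
    parts = []
    # Owner, group, and other occupy three bits each, from high to low.
    for shift in (6, 3, 0):
        triple = (bits >> shift) & 0b111
        parts.append(
            ("r" if triple & 4 else "-")
            + ("w" if triple & 2 else "-")
            + ("x" if triple & 1 else "-")
        )
    return "".join(parts)


if __name__ == "__main__":
    for mode in ("777", "755", "644"):
        print(mode, octal_to_symbolic(mode))
```

So the 777 used in the example above grants read, write, and execute to the owner, the group, and everyone else.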

chown: change the owner
Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]
hdfs dfs -chown -R hadoop:hadoop /xxx
Recursively changes the owner and/or the group.

expunge: empty the trash
Usage: hdfs dfs -expunge

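Advanced commands

HDFS quota configuration

HDFS quotas let us limit, per directory, either the number of files or the total size of the content that may be uploaded, much like the way services such as Baidu Netdisk cap how much each user may upload.

File count quota

Create an HDFS directory
hdfs dfs -mkdir -p /user/root/lisi

Set a quota of 2 on the directory. When you upload files you will find that only one file can be uploaded, because the directory itself counts toward the quota; with a quota of 3, only 2 files can be uploaded.
hdfs dfsadmin -setQuota 2 /user/root/lisi

Clear the file count quota
hdfs dfsadmin -clrQuota /user/root/lisi

Show the quota on the directory
hdfs dfs -count -q -h /user/root/lisi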
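The "directory counts as one" rule can be modelled in a few lines. This is a toy simulation, not HDFS code: the class and exception names are invented for illustration, and it only tracks a flat directory.

```python
class QuotaExceeded(Exception):
    """Raised when adding a name would exceed the quota (hypothetical)."""


class QuotaDirectory:
    """Toy model of a directory with a name quota (illustration only)."""

    def __init__(self, name_quota):
        self.name_quota = name_quota
        # The directory itself already uses one name.
        self.names_used = 1

    def add_file(self, name):
        if self.names_used + 1 > self.name_quota:
            raise QuotaExceeded(f"cannot add {name}: quota {self.name_quota} reached")
        self.names_used += 1


if __name__ == "__main__":
    lisi = QuotaDirectory(name_quota=2)
    lisi.add_file("a.txt")          # fits: directory + 1 file = 2
    try:
        lisi.add_file("b.txt")      # would make 3 names
    except QuotaExceeded as err:
        print(err)
```

In this model a quota of N leaves room for N - 1 files, which matches the behaviour described above.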

Space quota

Limit the space to 4 KB
hdfs dfsadmin -setSpaceQuota 4k /user/root/lisi

Test: uploading a file larger than 4 KB fails with a message that the quota is exceeded
hdfs dfs -put /export/softwares/zookeeper-3.4.5-cdh5.14.0.tar.gz /user/root/lisi

Clear the space quota:
hdfs dfsadmin -clrSpaceQuota /user/root/lisi
Test again: the upload succeeds
hdfs dfs -put /export/softwares/zookeeper-3.4.5-cdh5.14.0.tar.gz /user/root/lisi
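One detail worth knowing when choosing a space quota: as far as I understand the HDFS quota rules, every replica counts against the quota, and a block allocation is refused unless a full block (times the replication factor) would still fit. The sketch below models that check under those assumptions; the function name is invented, and the 128 MB block size and replication factor of 3 are the defaults mentioned earlier in this article.

```python
KB = 1024
MB = 1024 * KB


def can_allocate_block(space_quota, space_used, block_size=128 * MB, replication=3):
    """Simplified model of the space quota check (illustration only).

    Assumes a full block for every replica must fit in the remaining quota.
    """
    needed = block_size * replication
    return space_used + needed <= space_quota


if __name__ == "__main__":
    # The 4 KB quota from the test above is far below one replicated block.
    print(can_allocate_block(4 * KB, 0))
    # 384 MB is the smallest quota that admits one block at these settings.
    print(can_allocate_block(384 * MB, 0))
```

Under this model the 4 KB quota would reject the upload no matter how small the file is, so a practical quota should be at least block size times replication factor.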

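HDFS safe mode

When the cluster has just started it is in safe mode: the file system is read-only and accepts no changes, and the cluster does just one thing, which is to check itself.
If the self-check finds no problems, the cluster leaves safe mode automatically after thirty seconds and starts serving requests normally.

# Manually enter safe mode (read-only mode)
hdfs dfsadmin -safemode enter
# After entering safe mode manually, you must also leave it manually
hdfs dfsadmin -safemode leave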
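The automatic exit can be pictured as a two-part condition: enough blocks have been reported, and the thirty-second wait has passed. The following is a simplified model, not NameNode code. The function name is invented, and the defaults are assumptions based on the usual dfs.namenode.safemode.threshold-pct (0.999) and dfs.namenode.safemode.extension (30000 ms) settings; check your own hdfs-site.xml for the real values.

```python
def can_leave_safemode(
    reported_blocks,
    total_blocks,
    ms_since_threshold_met,
    threshold_pct=0.999,
    extension_ms=30_000,
):
    """Simplified model of the automatic safe mode exit (illustration only).

    ms_since_threshold_met is how long the reported-block ratio has been
    at or above the threshold.
    """
    if total_blocks == 0:
        ratio = 1.0  # an empty file system has nothing to wait for
    else:
        ratio = reported_blocks / total_blocks
    if ratio < threshold_pct:
        return False
    return ms_since_threshold_met >= extension_ms


if __name__ == "__main__":
    print(can_leave_safemode(990, 1000, 60_000))    # too few blocks reported
    print(can_leave_safemode(1000, 1000, 10_000))   # still inside the wait
    print(can_leave_safemode(1000, 1000, 30_000))   # both conditions hold
```

This model covers only the startup case; safe mode entered by hand with -safemode enter stays on until -safemode leave is run.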

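Hadoop benchmarking

Once the cluster has started successfully, the first thing to do is run a benchmark, that is, a stress test of the network bandwidth, the file read and write speeds, and so on.
Find out the upper limit on how many files the cluster can store.
How is the network bandwidth? Is the network fast enough?
The maximum bandwidth depends on the switch.

Write speed test

Write data to HDFS: 10 files of 10 MB each, stored under /benchmarks/TestDFSIO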
hadoop jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.0.jar TestDFSIO -write -nrFiles 10 -fileSize 10MB

When it finishes, view the write speed results
hdfs dfs -text /benchmarks/TestDFSIO/io_write/part-00000
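The part-00000 file holds raw counters rather than a finished report, so a few lines of Python can turn it into throughput figures. This is a sketch under assumptions: it expects whitespace-separated lines keyed l:size (bytes), l:tasks, l:time (milliseconds) and f:rate (per-file MB/s summed and multiplied by 1000), which is the layout I believe Hadoop 2.x TestDFSIO writes. The sample text below is invented for illustration and is not a measurement from this cluster.

```python
MEGA = 1024 * 1024  # assumption: TestDFSIO reports sizes in units of 2**20 bytes


def parse_dfsio_part(text):
    """Parse 'type:name value' lines into a dict keyed by name."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, raw = parts
        values[key.split(":", 1)[-1]] = float(raw)
    return values


def summarize(values):
    """Derive the headline numbers from the raw counters."""
    tasks = values["tasks"]
    return {
        "files": int(tasks),
        "total_mb": values["size"] / MEGA,
        "throughput_mb_s": values["size"] * 1000.0 / (values["time"] * MEGA),
        "avg_io_rate_mb_s": values["rate"] / 1000 / tasks,
    }


# Invented sample: 10 files of 10 MB, 20 seconds of total task time.
SAMPLE = """f:rate\t55000.0
f:sqrate\t310000.0
l:size\t104857600
l:tasks\t10
l:time\t20000
"""

if __name__ == "__main__":
    print(summarize(parse_dfsio_part(SAMPLE)))
```

To try it on real data, save the output of the hdfs dfs -text command above to a local file and pass its contents to parse_dfsio_part.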

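Write speed on my small Surface:
[Figure: TestDFSIO write results]
On a mechanical disk the head stays in place while the platter spins, so the faster the platter spins, the higher the write speed.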

Read speed test

Test the file read performance of HDFS
Read 10 files of 10 MB each from HDFS
hadoop jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.0.jar TestDFSIO -read -nrFiles 10 -fileSize 10MB
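Since the write and read commands differ only in one flag, they are easy to generate from a script. The sketch below builds the argument lists and prints them; the function name dfsio_command is made up for illustration, and the jar path is the one used in this article, so adjust it for your installation. It only uses the flags shown above (-write, -read, -nrFiles, -fileSize).

```python
import shlex

# Path taken from the commands in this article; change it for your cluster.
JOBCLIENT_JAR = (
    "/export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/"
    "hadoop-mapreduce-client-jobclient-2.6.0-cdh5.14.0.jar"
)


def dfsio_command(mode, nr_files=10, file_size="10MB", jar=JOBCLIENT_JAR):
    """Build the TestDFSIO argument list for mode 'write' or 'read'."""
    if mode not in ("write", "read"):
        raise ValueError(f"unsupported mode: {mode}")
    return [
        "hadoop", "jar", jar, "TestDFSIO",
        f"-{mode}",
        "-nrFiles", str(nr_files),
        "-fileSize", file_size,
    ]


if __name__ == "__main__":
    # Run the write test first: the read test reads the files it created.
    for mode in ("write", "read"):
        print(shlex.join(dfsio_command(mode)))
```

On a machine where the hadoop command is available, each list could be passed to subprocess.run(cmd, check=True) instead of being printed.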

View the read results
hdfs dfs -text /benchmarks/TestDFSIO/io_read/part-00000
[Figure: TestDFSIO read results]