Hadoop Storage and Analysis (2)

NameNode Startup Process

(Figure: NameNode startup process)

The NameNode's SafeMode

On startup, the NameNode enters a special state called Safemode; while in safe mode, HDFS does not replicate data blocks. The NameNode receives Heartbeat and Blockreport messages from the DataNodes, and each DataNode's Blockreport lists every block held on that physical host. During startup the NameNode checks whether each reported block meets the configured minimum replication (default 1); only a block that reaches this minimum is considered safe. Once the proportion of reported blocks that are safe reaches 99.9%, the NameNode waits a further 30 seconds and then exits safe mode automatically. It then checks for any blocks whose replica count is still below the configured replication factor and issues replication commands to copy those blocks to other DataNodes.

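The exit condition described above can be sketched numerically. This is a simplified model, not NameNode source code; the constants correspond to the real HDFS settings `dfs.namenode.replication.min` (default 1) and `dfs.namenode.safemode.threshold-pct` (default 0.999) behind the defaults mentioned in the text:

```java
// Sketch of the safe-mode exit check: a block is "safe" once it has at
// least the minimum replica count, and safe mode can end once the safe
// fraction reaches the threshold (after which the NameNode still waits
// the 30-second extension before actually leaving).
public class SafeModeCheck {
    static final double THRESHOLD_PCT = 0.999; // dfs.namenode.safemode.threshold-pct
    static final int MIN_REPLICAS = 1;         // dfs.namenode.replication.min

    // Count blocks whose reported replica count meets the minimum.
    static long safeBlocks(int[] reportedReplicas) {
        long safe = 0;
        for (int r : reportedReplicas) {
            if (r >= MIN_REPLICAS) safe++;
        }
        return safe;
    }

    // True once the fraction of safe blocks reaches the threshold.
    static boolean canLeaveSafeMode(int[] reportedReplicas) {
        if (reportedReplicas.length == 0) return true; // empty namespace
        return (double) safeBlocks(reportedReplicas) / reportedReplicas.length >= THRESHOLD_PCT;
    }

    public static void main(String[] args) {
        int[] blocks = new int[1000];
        java.util.Arrays.fill(blocks, 1);
        blocks[0] = 0; // one block with no replicas reported yet
        // 999/1000 = 0.999 meets the threshold, so the 30-second
        // extension countdown could start here.
        System.out.println(canLeaveSafeMode(blocks)); // prints "true"
    }
}
```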

Note: HDFS enters and exits safe mode automatically at startup. In production, administrators sometimes also force HDFS into safe mode so that server maintenance can be carried out.

[root@CentOS ~]# hdfs dfsadmin -safemode get
Safe mode is OFF
[root@CentOS ~]# hdfs dfsadmin -safemode enter
Safe mode is ON
[root@CentOS ~]# hdfs dfs -put hadoop-2.9.2.tar.gz /
put: Cannot create file/hadoop-2.9.2.tar.gz._COPYING_. Name node is in safe mode.
[root@CentOS ~]# hdfs dfsadmin -safemode leave
Safe mode is OFF
[root@CentOS ~]# hdfs dfs -put hadoop-2.9.2.tar.gz /

SSH Passwordless Authentication

SSH is a security protocol built on top of the application layer. It is a comparatively reliable protocol designed to secure remote login sessions and other network services, and using it effectively prevents information leakage during remote administration. It provides two login methods:

  • Password-based authentication - a remote host could impersonate the target host and intercept the user's credentials.
  • Key-based authentication - here it is the machine's identity that is authenticated.
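The difference between the two methods can be illustrated with a toy challenge-response in Java: the "server" verifies a signature over a random challenge using the stored public key, so no password ever crosses the wire. This is only a conceptual sketch of public-key authentication, not the actual SSH protocol:

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;

// The client signs the server's random challenge with its private key
// (id_rsa); the server verifies with the stored public key (id_rsa.pub).
public class KeyAuthDemo {
    static boolean authenticate(PublicKey storedPublicKey, PrivateKey clientPrivateKey,
                                byte[] challenge) throws Exception {
        // client side: sign the challenge
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(clientPrivateKey);
        signer.update(challenge);
        byte[] sig = signer.sign();
        // server side: verify against the key from authorized_keys
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(storedPublicKey);
        verifier.update(challenge);
        return verifier.verify(sig);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA"); // what ssh-keygen -t rsa does
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);
        System.out.println(authenticate(pair.getPublic(), pair.getPrivate(), challenge)); // prints "true"
    }
}
```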

(Figure: SSH key-based authentication flow)

① Generate a public/private key pair (either the RSA or the DSA algorithm can be used)

[root@CentOS ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:qWX5zumy1JS1f1uxPb3Gr+5e8F0REVueJew/WYrlxwc root@CentOS
The key's randomart image is:
+---[RSA 2048]----+
|             ..+=|
|              .o*|
|            .. +.|
|         o o .E o|
|        S o .+.*+|
|       + +  ..o=%|
|      . . o   o+@|
|       ..o .   ==|
|        .+=  +*+o|
+----[SHA256]-----+

By default this produces id_rsa (the private key) and id_rsa.pub (the public key) under ~/.ssh

② Add this machine's public key to the target host's trusted-keys file

[root@CentOS ~]# ssh-copy-id root@CentOS
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'centos (192.168.73.130)' can't be established.
ECDSA key fingerprint is SHA256:WnqQLGCjyJjgb9IMEUUhz1RLkpxvZJxzEZjtol7iLac.
ECDSA key fingerprint is MD5:45:05:12:4c:d6:1b:0c:1a:fc:58:00:ec:12:7e:c1:3d.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@centos's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@CentOS'"
and check to make sure that only the key(s) you wanted were added.

By default this appends the local public key to the ~/.ssh/authorized_keys file on the remote target host.
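In essence, ssh-copy-id appends the key line to authorized_keys while skipping keys that are already installed ("to filter out any that are already installed" in the transcript above). A local sketch of that filtering logic; the paths and key text here are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Append a public-key line to an authorized_keys file unless it is
// already present, mirroring ssh-copy-id's duplicate filtering.
public class AuthorizedKeys {
    // Returns true if the key was added, false if it was already installed.
    static boolean install(Path authorizedKeys, String pubKeyLine) throws IOException {
        List<String> existing = Files.exists(authorizedKeys)
                ? Files.readAllLines(authorizedKeys)
                : new ArrayList<String>();
        if (existing.contains(pubKeyLine)) {
            return false; // already installed -- nothing to do
        }
        Files.write(authorizedKeys, (pubKeyLine + System.lineSeparator()).getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

Running it twice with the same key adds exactly one line, which is why repeating ssh-copy-id is harmless.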

The Trash Recycle Bin

To guard against data loss caused by accidental user deletions, HDFS can be configured with a trash (recycle bin) feature when the cluster is set up. With trash enabled, deleting a file does not remove it immediately; the file is merely moved into the trash directory. Once the configured retention time has elapsed, the system deletes the file permanently. To avoid losing a file, the user must move it out of the trash before it expires.

  • To enable the trash, add the following to core-site.xml and restart HDFS
<!-- trash retention, 5 minutes -->
<property>
  <name>fs.trash.interval</name>
  <value>5</value>
</property>
[root@CentOS hadoop-2.9.2]# hdfs dfs -rm -r -f /jdk-8u191-linux-x64.rpm
20/09/25 20:09:24 INFO fs.TrashPolicyDefault: Moved: 'hdfs://CentOS:9000/jdk-8u191-linux-x64.rpm' to trash at: hdfs://CentOS:9000/user/root/.Trash/Current/jdk-8u191-linux-x64.rpm
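The retention rule can be modeled in a few lines. This is a simplified sketch, not the actual TrashPolicyDefault implementation (which purges on checkpoint intervals), but it captures the contract: a file moved to .Trash survives until fs.trash.interval minutes have passed.

```java
import java.time.Duration;
import java.time.Instant;

// A deleted file is only moved to .Trash; it becomes eligible for
// permanent deletion once fs.trash.interval minutes have elapsed.
public class TrashExpiry {
    static boolean isExpired(Instant movedToTrashAt, long trashIntervalMinutes, Instant now) {
        Duration age = Duration.between(movedToTrashAt, now);
        return age.compareTo(Duration.ofMinutes(trashIntervalMinutes)) > 0;
    }

    public static void main(String[] args) {
        Instant deleted = Instant.parse("2020-09-25T20:09:24Z");
        // With fs.trash.interval = 5, the file survives for 5 minutes...
        System.out.println(isExpired(deleted, 5, deleted.plusSeconds(4 * 60))); // prints "false"
        // ...and is eligible for purging after that.
        System.out.println(isExpired(deleted, 5, deleted.plusSeconds(6 * 60))); // prints "true"
    }
}
```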

Directory Structure

[root@CentOS ~]# tree -L 1 /usr/hadoop-2.9.2/
/usr/hadoop-2.9.2/
├── bin  # core scripts: hdfs, hadoop, yarn
├── etc  # configuration directory (XML and text files)
├── include # C header files; safe to ignore
├── lib  # third-party native libraries (C implementations)
├── libexec # scripts that load the configuration when Hadoop runs
├── LICENSE.txt
├── logs # runtime log directory -- the first place to look when troubleshooting!
├── NOTICE.txt
├── README.txt
├── sbin  # admin scripts, typically for starting services, e.g. start|stop-dfs.sh
└── share # jars Hadoop depends on at runtime, plus the embedded webapps

HDFS in Practice

HDFS Shell Commands (frequently used)

√ Print the Hadoop classpath
[root@CentOS ~]# hdfs classpath
√ Format the NameNode
[root@CentOS ~]# hdfs namenode -format
The dfsadmin command

① Use -report with -live or -dead to check the status of the DataNodes in the cluster

[root@CentOS ~]# hdfs dfsadmin -report  -live 

② Use -safemode enter|leave|get etc. to operate safe mode

[root@CentOS ~]# hdfs dfsadmin -safemode get
Safe mode is OFF

③ View the cluster's network topology

[root@CentOS ~]# hdfs dfsadmin -printTopology
Rack: /default-rack
   192.168.73.130:50010 (CentOS)

For more, see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin

Check the status of a directory

[root@CentOS ~]# hdfs fsck /
√ DFS commands
[root@CentOS ~]# hdfs dfs -<command> <options>
or, in the older form,
[root@CentOS ~]# hadoop fs -<command> <options>
-appendToFile

Append anaconda-ks.cfg to aa.log

[root@CentOS ~]# hdfs dfs -appendToFile /root/anaconda-ks.cfg /aa.log
[root@CentOS ~]# hdfs dfs -appendToFile /root/anaconda-ks.cfg /aa.log
[root@CentOS ~]#
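For intuition, the same operation on a local filesystem: the destination is created if absent, and each run appends the full source content again (which is why the transcript above runs the command twice). A sketch with java.nio rather than the HDFS client:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local analogue of hdfs dfs -appendToFile: concatenate src onto the
// end of dst, creating dst if it does not exist yet.
public class AppendDemo {
    static void appendToFile(Path src, Path dst) throws IOException {
        Files.write(dst, Files.readAllBytes(src),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```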
-cat

View file contents

[root@CentOS ~]# hdfs dfs -cat /aa.log
which is equivalent to
[root@CentOS ~]# hdfs dfs -cat hdfs://CentOS:9000/aa.log
-chmod

Modify file permissions

[root@CentOS ~]# hdfs dfs -chmod -R u+x  /aa.log
[root@CentOS ~]# hdfs dfs -chmod -R o+x  /aa.log
[root@CentOS ~]# hdfs dfs -chmod -R a+x  /aa.log
[root@CentOS ~]# hdfs dfs -chmod -R a-x  /aa.log
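The symbolic modes used above (u+x, o+x, a+x, a-x) map to bit operations on the three octal permission digits (user/group/other). A minimal sketch covering only the +x/-x forms shown in the commands, not the full chmod grammar:

```java
// Apply a symbolic x-bit change (e.g. "u+x", "a-x") to an octal mode.
public class ChmodDemo {
    static int apply(int mode, String symbolic) {
        char who = symbolic.charAt(0); // u, g, o, or a
        char op = symbolic.charAt(1);  // + or -
        // x bit within each triplet: user 0100, group 0010, other 0001
        int mask;
        if (who == 'u') mask = 0100;
        else if (who == 'g') mask = 0010;
        else if (who == 'o') mask = 0001;
        else mask = 0111; // 'a' = all three
        return op == '+' ? (mode | mask) : (mode & ~mask);
    }

    public static void main(String[] args) {
        int mode = 0644;                   // rw-r--r--, as in the ls output
        mode = apply(mode, "u+x");         // 0744
        mode = apply(mode, "o+x");         // 0745
        mode = apply(mode, "a+x");         // 0755
        mode = apply(mode, "a-x");         // back to 0644
        System.out.printf("%04o%n", mode); // prints "0644"
    }
}
```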
-copyFromLocal/-copyToLocal

copyFromLocal uploads a local file to HDFS; copyToLocal downloads a file from HDFS

[root@CentOS ~]# hdfs dfs -copyFromLocal jdk-8u191-linux-x64.rpm /
[root@CentOS ~]# rm -rf jdk-8u191-linux-x64.rpm
[root@CentOS ~]# hdfs dfs -copyToLocal /jdk-8u191-linux-x64.rpm /root/
[root@CentOS ~]# ls
anaconda-ks.cfg  hadoop-2.9.2.tar.gz  jdk-8u191-linux-x64.rpm
-moveFromLocal/-moveToLocal

moveFromLocal uploads the file and then deletes the local copy; moveToLocal would download the file and then delete the remote copy (though, as the transcript below shows, it is not yet implemented)

[root@CentOS ~]# hdfs dfs -moveFromLocal jdk-8u191-linux-x64.rpm /dir1
[root@CentOS ~]# ls
anaconda-ks.cfg  hadoop-2.9.2.tar.gz
[root@CentOS ~]# hdfs dfs -moveToLocal /dir1/jdk-8u191-linux-x64.rpm /root
moveToLocal: Option '-moveToLocal' is not implemented yet.
-put/-get

File upload/download

[root@CentOS ~]# hdfs dfs -get /dir1/jdk-8u191-linux-x64.rpm /root
[root@CentOS ~]# ls
anaconda-ks.cfg  hadoop-2.9.2.tar.gz  jdk-8u191-linux-x64.rpm
[root@CentOS ~]# hdfs dfs -put hadoop-2.9.2.tar.gz /dir1

For more commands, use

[root@CentOS ~]# hdfs dfs -help <command>

For example, to find out how to use touchz

[root@CentOS ~]# hdfs dfs -touchz /dir1/Helloworld.java
[root@CentOS ~]# hdfs dfs -ls /dir1/
Found 5 items
-rw-r--r--   1 root supergroup          0 2020-09-25 23:47 /dir1/Helloworld.java
drwxr-xr-x   - root supergroup          0 2020-09-25 23:07 /dir1/d1
drwxr-xr-x   - root supergroup          0 2020-09-25 23:09 /dir1/d2
-rw-r--r--   1 root supergroup  366447449 2020-09-25 23:43 /dir1/hadoop-2.9.2.tar.gz
-rw-r--r--   1 root supergroup  176154027 2020-09-25 23:41 /dir1/jdk-8u191-linux-x64.rpm

Operating HDFS from the Java API (overview)


① Set up the project: create a Maven project (no archetype needed) and add the following dependency to pom.xml

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.9.2</version>
</dependency>

② Configure the Windows development environment (important): unpack hadoop-2.9.2 into a directory on Windows, for example C:/; add a HADOOP_HOME environment variable pointing at it; copy all the files from the bin directory of hadoop-window-master.zip into %HADOOP_HOME%/bin, overwriting what is there; then restart IDEA, otherwise the IDE will not pick up the HADOOP_HOME variable.
③ It is recommended to copy core-site.xml and hdfs-site.xml into the project's resources directory.
④ On Windows, map the hostname to its IP address (omitted).
⑤ Create the FileSystem and Configuration objects.

API operation examples

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.Trash;
import org.apache.hadoop.io.IOUtils;
import org.junit.Test;

public class HDFSOperators {
  // the HDFS client
  public static FileSystem fs = null;
  // the configuration object
  public static Configuration conf = null;
  // a static block initializes both once
  static {
      try {
          // load the config files into the Configuration
          conf = new Configuration();
          conf.addResource("core-site.xml");
          conf.addResource("hdfs-site.xml");
          // obtain the HDFS client
          fs = FileSystem.get(conf);
      } catch (IOException e) {
          e.printStackTrace();
      }
  }
  @Test
  // upload a file
  public void testUploadFile() throws IOException {
      // source (local) and destination (HDFS) paths
      Path src = new Path("file:///D:\\资料\\大数据视频\\day2\\20200927_093823.mp4");
      Path dst = new Path("/");
      fs.copyFromLocalFile(src, dst);
  }
  @Test
  // upload a file via streams
  public void testUploadFile2() throws IOException {
      // open the local source and create the HDFS destination
      FileInputStream in = new FileInputStream("C:\\Users\\k\\Desktop\\idea快捷键.txt");
      Path dst = new Path("/user/idea.text");
      OutputStream os = fs.create(dst);
      IOUtils.copyBytes(in, os, 1024, true);
  }
  @Test
  // download a file
  public void testDownload1() throws IOException {
      final Path dst = new Path("C:\\Users\\k\\Desktop\\");
      final Path src = new Path("/user/idea.text");
      fs.copyToLocalFile(src, dst);
  }
  @Test
  // download a file via streams
  public void testDownload2() throws IOException {
      final Path src = new Path("/user/idea.text");
      InputStream in = fs.open(src);
      final FileOutputStream os = new FileOutputStream("C:\\Users\\k\\Desktop\\a.text");
      IOUtils.copyBytes(in, os, 1024, true);
  }
  // check whether a file or directory exists; returns a boolean
  @Test
  public void testExists() throws IOException {
      final Path dst = new Path("/user/idea.text");
      final boolean exists = fs.exists(dst);
      System.out.println(exists);
  }
  // recursively list files under a path: the path, whether it is a directory, and its length
  @Test
  public void testListFile() throws IOException {
      final Path path = new Path("/");
      final RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(path, true);
      while (iterator.hasNext()) {
          final LocatedFileStatus fileStatus = iterator.next();
          System.out.println(fileStatus.getPath() + "\t" + fileStatus.isDirectory() + "\t" + fileStatus.getLen());
      }
  }
  // list the files and directories directly under a path, and whether each is a directory
  @Test
  public void testListStatus() throws IOException {
      final Path path = new Path("/");
      final FileStatus[] fileStatuses = fs.listStatus(path);
      for (FileStatus fileStatus : fileStatuses) {
          System.out.println(fileStatus.getPath() + "\t" + fileStatus.isDirectory());
      }
  }
  // delete a path by moving it to the trash
  @Test
  public void testDeleteWithTrash() throws IOException {
      final Path path = new Path("/user/root");
      final Trash trash = new Trash(fs, conf);
      trash.moveToTrash(path);
  }
}
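testListFile above walks HDFS recursively with fs.listFiles(path, true), which yields only files, never directories. For comparison, the equivalent traversal on a local filesystem with java.nio (directories filtered out to match that behavior):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Recursive file listing, analogous to FileSystem.listFiles(path, true).
public class LocalListFiles {
    static List<String> listFilesRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile) // skip directories
                       .map(Path::toString)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```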
