[Big Data Course] Hadoop Fundamentals (useful for university final-exam review)

我是如此相信ᯤ⁶⁶ᴳ

Published 2024-03-12 11:35:19

Category: University final-exam review. Tags: hadoop, big data, distributed

Article link: https://blog.csdn.net/weixin_51591826/article/details/136647369

Hadoop

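1. Introduction

In the narrow sense, Hadoop refers to the Hadoop software itself, which is made up of three components:

HDFS: a distributed file system
MapReduce: a distributed computing framework
YARN: a distributed cluster resource manager

In the broad sense, Hadoop refers to the wider big data ecosystem, which includes many other pieces of software.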

Hadoop core: HDFS

HDFS uses a master/slave architecture and is made up of four parts: the HDFS Client, the NameNode, the DataNode, and the Secondary NameNode.

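1. Client: the client side

File splitting. When a file is uploaded to HDFS, the Client splits it into Blocks and then stores them.
Talks to the NameNode to obtain the location information of a file.
Talks to the DataNodes to read or write data.
Provides commands for managing and accessing HDFS, such as starting or shutting down HDFS.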
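The file-splitting step above can be sketched with plain arithmetic. This is only a toy model, not Hadoop code: the class and method names (BlockSplitter, Block, split) are made up for illustration, and the 128 MB block size is an assumption based on the Hadoop 2.x default for dfs.blocksize.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how a client cuts a file into fixed-size blocks.
// BlockSplitter and Block are hypothetical names, not part of the Hadoop API.
class BlockSplitter {
    // Assumed block size: 128 MB (the Hadoop 2.x default for dfs.blocksize).
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // One block: where it starts in the file and how many bytes it holds.
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Block[offset=" + offset + ", length=" + length + "]";
        }
    }

    // Returns the blocks covering a file of fileSize bytes; the last block may be shorter.
    static List<Block> split(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(new Block(offset, Math.min(blockSize, fileSize - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 300 MB file yields two full 128 MB blocks and one 44 MB tail block.
        for (Block block : split(300L * 1024 * 1024, BLOCK_SIZE)) {
            System.out.println(block);
        }
    }
}
```

Note that the tail block only occupies as much space as its data, so a small file does not consume a whole block of disk.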

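2. NameNode: the master, which acts as the supervisor and manager

Manages the HDFS namespace
Manages the data block (Block) mapping information
Configures the replica policy
Handles client read and write requests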

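3. DataNode: the slave. The NameNode issues commands and the DataNode carries out the actual operations

Stores the actual data blocks
Performs read/write operations on data blocks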

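4. Secondary NameNode: not a hot standby for the NameNode. When the NameNode goes down, it cannot immediately take over and serve requests

Assists the NameNode and takes over part of its workload
Periodically merges the fsimage and the edits log, then pushes the result to the NameNode
Can help recover the NameNode in an emergency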
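The periodic merge of fsimage and edits can be pictured as replaying a log on top of a snapshot. The sketch below is a heavily simplified model: the real fsimage and edits files are binary and record far more than paths, and the class name CheckpointSketch and the CREATE/DELETE text format are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of a checkpoint: fsimage is a snapshot of paths, edits is a log of changes.
// CheckpointSketch and the "OP path" text format are hypothetical.
class CheckpointSketch {
    // Replays every edit on a copy of the snapshot and returns the new snapshot.
    static SortedSet<String> checkpoint(SortedSet<String> fsimage, List<String> edits) {
        SortedSet<String> merged = new TreeSet<>(fsimage);
        for (String edit : edits) {
            String[] parts = edit.split(" ", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed edit: " + edit);
            }
            switch (parts[0]) {
                case "CREATE":
                    merged.add(parts[1]);
                    break;
                case "DELETE":
                    merged.remove(parts[1]);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown operation: " + parts[0]);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedSet<String> fsimage = new TreeSet<>(Arrays.asList("/a.txt", "/tmp/old.log"));
        List<String> edits = new ArrayList<>(Arrays.asList("CREATE /hello/b.txt", "DELETE /tmp/old.log"));

        SortedSet<String> newImage = checkpoint(fsimage, edits);
        edits.clear(); // after a checkpoint the replayed edits are no longer needed

        System.out.println(newImage); // prints [/a.txt, /hello/b.txt]
    }
}
```

The point of the design is that the NameNode only has to append small edits during normal operation, while the expensive merge happens elsewhere and keeps the log short.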

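4.1 Role of the NameNode

The NameNode keeps the entire file system namespace and the file-to-block address mapping in memory
The number of files the whole HDFS can store is therefore limited by the NameNode's memory size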
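A rough calculation shows why the NameNode's memory caps the file count. The sketch assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (one per file plus one per block) and a 128 MB block size; both numbers are assumptions, not measurements, and the class name NameNodeMemoryEstimate is hypothetical.

```java
// Back-of-the-envelope estimate of NameNode heap usage.
// The 150-byte figure is an assumed rule of thumb, not a measured value.
class NameNodeMemoryEstimate {
    static final long BYTES_PER_OBJECT = 150;           // assumption
    static final long BLOCK_SIZE = 128L * 1024 * 1024;  // assumption: 128 MB blocks

    // Counts one object per file plus one object per block of that file.
    static long estimateBytes(long fileCount, long bytesPerFile) {
        long blocksPerFile = (bytesPerFile + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
        return fileCount * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024;
        // The same amount of data stored two ways:
        long manySmall = estimateBytes(10_000_000L, oneMb);    // ten million 1 MB files
        long fewLarge = estimateBytes(78_125L, 128L * oneMb);  // 78,125 files of 128 MB
        System.out.println("small files: " + manySmall + " bytes"); // 3000000000
        System.out.println("large files: " + fewLarge + " bytes");  // 23437500
    }
}
```

Under these assumptions the small-file layout needs about 3 GB of heap against roughly 23 MB for the large-file layout, which is why merging small files (covered in 1.4.8) matters.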

HDFS API operations

1.1 Configure the Hadoop environment on Windows

1.2 Import the Maven dependencies

1.3 Access data via URL (for awareness only)

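// Imports needed: java.io.File, java.io.FileOutputStream, java.io.InputStream, java.net.URL,
// org.apache.commons.io.IOUtils, org.apache.hadoop.fs.FsUrlStreamHandlerFactory, org.junit.Test
@Test
public void demo1() throws Exception {
    // Step 1: register the handler for hdfs:// URLs
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    // Open an input stream on the HDFS file
    InputStream inputStream = new URL("hdfs://node01:8020/a.txt").openStream();
    // Open an output stream on the local file
    FileOutputStream outputStream = new FileOutputStream(new File("D:\\hello.txt"));
    // Copy the file contents
    IOUtils.copy(inputStream, outputStream);
    // Close the streams
    IOUtils.closeQuietly(inputStream);
    IOUtils.closeQuietly(outputStream);
}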

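1.4 Access data via the FileSystem API (must know)

1.4.1 Main classes involved

Operating on HDFS from Java mainly involves the following classes:

Configuration

- An object of this class encapsulates the client or server configuration

FileSystem

- An object of this class represents a file system, and its methods are used to operate on files. Obtain one through the static method get of FileSystem:

FileSystem fs = FileSystem.get(conf);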

1.4.2 Ways to obtain a FileSystem

Method 1

@Test
public void getFileSystem1() throws IOException {
    Configuration configuration = new Configuration();
    // Specify which file system to use
    configuration.set("fs.defaultFS", "hdfs://node01:8020/");
    // Obtain that file system
    FileSystem fileSystem = FileSystem.get(configuration);
    System.out.println(fileSystem.toString());
}

Method 2

@Test
public void getFileSystem2() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020/"), new Configuration());
    System.out.println(fileSystem);
}

Method 3

@Test
public void getFileSystem3() throws IOException {
    Configuration configuration = new Configuration();
    configuration.set("fs.defaultFS", "hdfs://node01:8020/");
    FileSystem fileSystem = FileSystem.newInstance(configuration);
    System.out.println(fileSystem.toString());
}

Method 4

@Test
public void getFileSystem4() throws Exception {
    FileSystem fileSystem = FileSystem.newInstance(new URI("hdfs://node01:8020/"), new Configuration());
    System.out.println(fileSystem);
}
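The four methods differ mainly in get versus newInstance: by default FileSystem.get may hand back a cached, shared instance, while FileSystem.newInstance always creates a fresh one. The toy model below illustrates only that idea. It does not use Hadoop; FsCacheSketch and FakeFs are hypothetical names, and the real cache also takes the user into account and evicts closed instances, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a shared-instance cache, loosely mirroring get() vs newInstance().
// FsCacheSketch and FakeFs are hypothetical and are not Hadoop classes.
class FsCacheSketch {
    static final class FakeFs {
        final String uri;
        boolean closed = false;

        FakeFs(String uri) {
            this.uri = uri;
        }

        void close() {
            closed = true;
        }
    }

    private static final Map<String, FakeFs> CACHE = new HashMap<>();

    // Returns the shared instance for this URI, creating it on first use.
    static FakeFs get(String uri) {
        return CACHE.computeIfAbsent(uri, FakeFs::new);
    }

    // Always returns a private instance that nobody else holds.
    static FakeFs newInstance(String uri) {
        return new FakeFs(uri);
    }

    public static void main(String[] args) {
        String uri = "hdfs://node01:8020/";
        FakeFs a = get(uri);
        FakeFs b = get(uri);
        FakeFs c = newInstance(uri);

        a.close(); // closes the shared object, so b is affected too
        System.out.println("a and b are the same object: " + (a == b));
        System.out.println("b is closed: " + b.closed);
        System.out.println("c is closed: " + c.closed);
    }
}
```

In practice this means code that calls close() on an instance obtained from get can break other code in the same JVM that shares it; newInstance avoids that at the cost of an extra connection.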

1.4.3 List all files in HDFS

Traversal using the API

@Test
public void listMyFiles() throws Exception {
    // Obtain the FileSystem
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // listFiles returns a RemoteIterator over files (directories themselves are not returned);
    // the first argument is the path to walk, the second enables recursive traversal
    RemoteIterator<LocatedFileStatus> locatedFileStatusRemoteIterator = fileSystem.listFiles(new Path("/"), true);
    while (locatedFileStatusRemoteIterator.hasNext()) {
        LocatedFileStatus next = locatedFileStatusRemoteIterator.next();
        System.out.println(next.getPath().toString());
    }
    fileSystem.close();
}

1.4.4 Create a directory on HDFS

@Test
public void mkdirs() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // Creates the directory together with any missing parent directories
    boolean mkdirs = fileSystem.mkdirs(new Path("/hello/mydir/test"));
    fileSystem.close();
}

1.4.5 Download a file

@Test
public void getFileToLocal() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // Open the HDFS file for reading
    FSDataInputStream inputStream = fileSystem.open(new Path("/timer.txt"));
    // Open the local destination file for writing
    FileOutputStream outputStream = new FileOutputStream(new File("e:\\timer.txt"));
    IOUtils.copy(inputStream, outputStream);
    IOUtils.closeQuietly(inputStream);
    IOUtils.closeQuietly(outputStream);
    fileSystem.close();
}

1.4.6 Upload a file to HDFS

@Test
public void putData() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // First argument: local source file; second argument: HDFS destination
    fileSystem.copyFromLocalFile(new Path("file:///c:\\install.log"), new Path("/hello/mydir/test"));
    fileSystem.close();
}

1.4.7 HDFS access permission control

Stop the HDFS cluster by running the following commands on node01

cd /export/servers/hadoop-2.7.5
sbin/stop-dfs.sh

Edit the hdfs-site.xml configuration file on node01

cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml

<property>
	<name>dfs.permissions.enabled</name>
	<value>true</value>
</property>

After the change, copy the configuration file to the other machines

scp hdfs-site.xml node02:$PWD
scp hdfs-site.xml node03:$PWD

Restart the HDFS cluster

cd /export/servers/hadoop-2.7.5
sbin/start-dfs.sh

Upload a few files to the Hadoop cluster to use for testing

cd /export/servers/hadoop-2.7.5/etc/hadoop
hdfs dfs -mkdir /config
hdfs dfs -put *.xml /config
hdfs dfs -chmod 600 /config/core-site.xml

Then try to download the file from code

@Test
public void getConfig() throws Exception {
    // The third argument is the user name the client acts as
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration(), "hadoop");
    fileSystem.copyToLocalFile(new Path("/config/core-site.xml"), new Path("file:///c:/core-site.xml"));
    fileSystem.close();
}
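To see what chmod 600 means for the download attempt, the sketch below decodes an octal mode and runs a simplified read check. It is not the HDFS implementation: PermissionSketch is a hypothetical class, the check ignores group membership and the HDFS superuser, and the owner name "root" in the example is an assumption, since the source does not say which user uploaded the files.

```java
// Simplified model of POSIX-style permission bits as used by HDFS.
// PermissionSketch is hypothetical; group and superuser handling are left out.
class PermissionSketch {
    // Renders an octal mode such as 0600 as a nine-character string like "rw-------".
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder symbolic = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            symbolic.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return symbolic.toString();
    }

    // Uses the owner bits when user is the owner, otherwise the "other" bits.
    static boolean canRead(int mode, String owner, String user) {
        int shift = user.equals(owner) ? 6 : 0;
        return ((mode >> shift) & 4) != 0;
    }

    public static void main(String[] args) {
        int mode = 0600; // octal literal, same value as in "hdfs dfs -chmod 600"
        System.out.println(toSymbolic(mode));                // rw-------
        System.out.println(canRead(mode, "root", "root"));   // true
        System.out.println(canRead(mode, "root", "hadoop")); // false
    }
}
```

With dfs.permissions.enabled set to true, a read by a user who is neither the owner nor otherwise permitted would be expected to fail with a permission error, which is what the test above is meant to demonstrate.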

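1.4.8 Merging small files

From the HDFS shell, a single command can merge many HDFS files into one large file and download it to the local machine

cd /export/servers
hdfs dfs -getmerge /config/*.xml ./hello.xml

Since small files can be merged into one large file on download, they can equally be merged into one large file on upload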

@Test
public void mergeFile() throws Exception {
    // Obtain the distributed file system
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.52.250:8020"), new Configuration(), "root");
    FSDataOutputStream outputStream = fileSystem.create(new Path("/bigfile.txt"));
    // Obtain the local file system
    LocalFileSystem local = FileSystem.getLocal(new Configuration());
    // List the files in the local directory
    FileStatus[] fileStatuses = local.listStatus(new Path("file:///E:\\input"));
    for (FileStatus fileStatus : fileStatuses) {
        // Append each local file to the single HDFS output stream
        FSDataInputStream inputStream = local.open(fileStatus.getPath());
        IOUtils.copy(inputStream, outputStream);
        IOUtils.closeQuietly(inputStream);
    }
    IOUtils.closeQuietly(outputStream);
    local.close();
    fileSystem.close();
}
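The same append-everything-to-one-stream pattern can be tried without a cluster. The sketch below uses only the JDK and local temporary files; LocalMergeSketch and its demo method are hypothetical names, Path here is java.nio.file.Path rather than the Hadoop Path class, and sorting the inputs by name is a choice made for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-disk analogue of the merge above, JDK only.
// LocalMergeSketch is a hypothetical name; Path is java.nio.file.Path, not Hadoop's Path.
class LocalMergeSketch {
    // Appends every regular file in inputDir, sorted by name, to target.
    static void merge(Path inputDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> listing = Files.list(inputDir)) {
            parts = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        // try-with-resources closes the stream even if a copy fails
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }

    // Builds two small files in a temp directory, merges them, and returns the result.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("small-files");
            Files.write(dir.resolve("a.txt"), "first\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.txt"), "second\n".getBytes(StandardCharsets.UTF_8));
            Path target = Files.createTempFile("bigfile", ".txt");
            merge(dir, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Using try-with-resources instead of closeQuietly is a design choice here: the streams are still closed on failure, but errors are reported rather than swallowed.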