The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop. If Hadoop is already installed, HDFS is included with it and does not need to be installed separately.
This tutorial introduces the common Shell commands for working with HDFS files on a Linux system, shows how to view and manage the Hadoop file system through the Web interface, and demonstrates basic file operations using the Java API provided by Hadoop.
Before practicing HDFS programming, we need to start Hadoop. Run the following command:
// start-dfs.sh
1. Interacting with HDFS using Shell commands
Hadoop supports many Shell commands, among which fs is the one most commonly used with HDFS. With fs you can browse the HDFS directory structure, upload and download data, create files, and so on.
Type the following command in a terminal to see the full list of subcommands that fs supports:
// hadoop fs
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
    [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
    [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] [-x] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]
Typing the following in the terminal shows what a specific command does. For example, to see how the put command is used:
// hadoop fs -help put
-put [-f] [-p] [-l] [-d] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:
    -p  Preserves access and modification times, ownership and the mode.
    -f  Overwrites the destination if it already exists.
    -l  Allow DataNode to lazily persist the file to disk. Forces
        replication factor of 1. This flag will result in reduced
        durability. Use with care.
    -d  Skip creation of temporary file(<dst>._COPYING_).
Create a directory:
// hdfs dfs -mkdir /input
Delete a directory:
// hdfs dfs -rm -r /input
Upload a file from the local file system to HDFS:
// hdfs dfs -put /home/hadoop/myLocalFile.txt input
Download a file from HDFS to the local file system:
// hdfs dfs -get input/myLocalFile.txt /home/hadoop/下载
List a directory or file:
// hdfs dfs -ls input
View the contents of a file on HDFS:
// hdfs dfs -cat input/myLocalFile.txt
Copy a file within HDFS:
// hdfs dfs -cp input/myLocalFile.txt /input
2. Managing HDFS through the Web interface
With Hadoop running, the NameNode Web interface can be opened in a browser at:
http://localhost:50070
====================================
3. Interacting with HDFS using the Java API
Note: JDK 1.8 is required.
Unzip eclipse2018.zip to a local directory.
Run eclipse.exe.
Create a Java project in Eclipse.
Add the Java development JARs:
(1) hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar from the "/usr/local/hadoop/share/hadoop/common" directory;
(2) all JAR files in the "/usr/local/hadoop/share/hadoop/common/lib" directory;
(3) hadoop-hdfs-2.7.1.jar and hadoop-hdfs-nfs-2.7.1.jar from the "/usr/local/hadoop/share/hadoop/hdfs" directory;
(4) all JAR files in the "/usr/local/hadoop/share/hadoop/hdfs/lib" directory.
Create a class that checks whether a file exists:
package com.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfExist {
    public static void main(String[] args) {
        try {
            String fileName = "/input/input.txt";
            Configuration conf = new Configuration();
            // Address of the HDFS NameNode
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(fileName))) {
                System.out.println("File exists");
            } else {
                System.out.println("File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
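The shell commands -mkdir and -rm -r shown earlier have direct counterparts in the Java API: FileSystem.mkdirs and FileSystem.delete. Below is a minimal sketch under the same NameNode address as above; the class name HDFSMkdirDelete and the path /output are illustrative additions, not part of the original tutorial.

package com.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSMkdirDelete {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            Path dir = new Path("/output");   // hypothetical example path
            fs.mkdirs(dir);                   // like: hdfs dfs -mkdir /output
            System.out.println("Created: " + dir);
            fs.delete(dir, true);             // like: hdfs dfs -rm -r /output
            System.out.println("Deleted: " + dir);
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Passing true as the second argument of delete makes the removal recursive, matching the -r flag of hdfs dfs -rm.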
// Write File
If you run into permission problems when executing the code, run the following command on the Hadoop server: hadoop fs -chmod -R 777 /
package com.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSWriteToFile {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            byte[] buff = "Hello world".getBytes(); // content to write
            String filename = "/testx";             // file to write to
            FSDataOutputStream os = fs.create(new Path(filename));
            os.write(buff, 0, buff.length);
            System.out.println("Create:" + filename);
            os.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
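Writing a byte buffer suits small, generated content; to upload an existing local file (the Java counterpart of the hdfs dfs -put command used earlier), FileSystem also provides copyFromLocalFile. A minimal sketch, assuming the same NameNode address; the class name HDFSUploadFile is illustrative, and the paths reuse the ones from the shell examples.

package com.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSUploadFile {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            // Local source and HDFS destination, as in the shell example
            Path src = new Path("/home/hadoop/myLocalFile.txt");
            Path dst = new Path("/input/myLocalFile.txt");
            fs.copyFromLocalFile(src, dst); // equivalent to: hdfs dfs -put
            System.out.println("Uploaded: " + dst);
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}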
// Read File
package com.testing;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSReadFile {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/testx");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String content = d.readLine(); // read one line of the file
            System.out.println(content);
            d.close();  // close the reader
            fs.close(); // close the HDFS handle
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
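Finally, the hdfs dfs -ls command also has a Java counterpart: FileSystem.listStatus returns the entries of a directory as FileStatus objects. A minimal sketch, again assuming the same NameNode address; the class name HDFSListFiles is illustrative, and /input is the directory created in the shell examples.

package com.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSListFiles {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.224.10:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            // List the entries of /input, like: hdfs dfs -ls /input
            FileStatus[] entries = fs.listStatus(new Path("/input"));
            for (FileStatus entry : entries) {
                System.out.println(entry.getPath() + "  " + entry.getLen() + " bytes");
            }
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}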
===========================================