Commonly Used Hadoop Commands and APIs

清扬_br

Last modified: 2024-07-24 17:49:15


First published: 2024-07-24 17:03:31

Article link: https://blog.csdn.net/qq_15123471/article/details/140667870


1. Common HDFS commands
Check the installation: hadoop version
Start commands (run from the sbin directory, e.g. cd /usr/local/Cellar/hadoop/3.3.3/sbin):
./start-dfs.sh     starts HDFS
./start-yarn.sh    starts YARN
./start-all.sh     starts everything (Windows: start-all.cmd)
mapred --daemon start historyserver    starts the JobHistory server
Windows: cd D:\1.tools\hadoop-3.1.3\bin, then run mapred.cmd historyserver
Web UIs:
HDFS: http://localhost:9870/dfshealth.html#tab-overview
YARN: http://localhost:8088/cluster (if localhost does not work, use the machine's IP instead, e.g. http://192.168.31.174:8088/cluster)
JobHistory: http://localhost:19888/jobhistory/ (if localhost does not work, use http://192.168.31.174:19888/jobhistory/)

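Stop command: ./stop-dfs.sh (also run from /usr/local/Cellar/hadoop/3.3.3/sbin)
Format the NameNode: hdfs namenode -format. Do this only once, at first setup, because reformatting wipes the existing HDFS metadata. The hdfs script lives in Hadoop's bin directory, not sbin, so run it from bin or from the PATH.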
List files: hdfs dfs -ls <path>
Examples:
hadoop fs -ls /              # list the HDFS root directory
hdfs dfs -ls /user/zhang     # list another HDFS directory (/user/zhang)
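Upload (write): hdfs dfs -put <localsrc> <dst> (hadoop fs -put is equivalent; the older hadoop dfs form is deprecated)
Example: hadoop fs -put /usr/local/hadoopTest/test.txt /input   # copy test.txt from the client node into /input on HDFS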
Download (read): hdfs dfs -get <src> [<localdst>]
Example: hdfs dfs -get /user/zhang/test/README.txt.gz   # copy /user/zhang/test/README.txt.gz from HDFS into the client node's current directory
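View content: hdfs dfs -text hdfs://dc2/user/mrecom/hive/warehouse/limengran/push_vivovip_user/p_date=20230719/* | head -n 10   # print the first 10 lines; unlike -cat, -text also decodes compressed files and SequenceFiles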
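The pipeline does two things: `-text` decompresses `.gz` input before printing (which `-cat` does not), and `head -n 10` keeps only the first lines. As a local analogy only, the sketch below performs the same two steps with the JDK. It gunzips a stream and returns its first n lines. `GzipHead` and its methods are hypothetical names for illustration, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipHead {
    /** Decompresses a gzip stream and returns at most n lines, like "-text ... | head -n n". */
    static List<String> head(InputStream gzipped, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            // stop as soon as n lines are collected or the stream ends
            while (lines.size() < n && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Demo helper: gzip-compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("a\nb\nc\n");
        System.out.println(head(new ByteArrayInputStream(data), 2)); // prints [a, b]
    }
}
```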
Create a directory: hadoop fs -mkdir <path> (add -p to also create missing parent directories)
Example: hdfs dfs -mkdir /user/zhang/abc   # create the abc directory
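Delete an empty directory: hdfs dfs -rmdir <path> (fails if the directory is not empty; use -rm -r for that)
Example: hdfs dfs -rmdir /user/zhang/demo   # delete the /user/zhang/demo directory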

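Delete files: hdfs dfs -rm <path>; delete a directory and everything in it: hdfs dfs -rm -r <path>
If trash is enabled (fs.trash.interval > 0), -rm moves items to the trash; add -skipTrash to delete them immediately.
Example: hdfs dfs -rm /user/zhang/NOTICE.txt   # delete the file /user/zhang/NOTICE.txt
Example: hdfs dfs -rm -r /user/zhang/abc       # delete the /user/zhang/abc directory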

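Find files: hadoop fs -find <path> ... <expression> ...
Finds every file under the given paths that matches the expression.
-name pattern: matches the file's base name against the pattern, case-sensitively.
-iname pattern: the same match, but case-insensitive.
-print: prints each match followed by a newline.
-print0: prints each match followed by an ASCII NUL character instead of a newline.
Example: hadoop fs -find /user/dataflair/dir1/ -name sample -print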
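The following plain-Java sketch makes the `-name` / `-iname` difference concrete. It matches base names against a glob, either case-sensitively (like `-name`) or case-insensitively (like `-iname`). It is a simplification under stated assumptions. Only `*` and `?` are supported, whereas Hadoop's globbing also handles `[...]` and `{...}`. `FindByName` is a hypothetical class, not a Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FindByName {
    /** Translates a simple glob (only * and ?) into a regex; every other character matches literally. */
    static Pattern globToRegex(String glob, boolean caseInsensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = caseInsensitive ? (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE) : 0;
        return Pattern.compile(regex.toString(), flags);
    }

    /** Returns the paths whose base name (the text after the last '/') fully matches the glob. */
    static List<String> find(List<String> paths, String glob, boolean caseInsensitive) {
        Pattern pattern = globToRegex(glob, caseInsensitive);
        List<String> hits = new ArrayList<>();
        for (String p : paths) {
            String base = p.substring(p.lastIndexOf('/') + 1);
            if (pattern.matcher(base).matches()) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/user/dataflair/dir1/sample",
                "/user/dataflair/dir1/Sample",
                "/user/dataflair/dir1/sample.txt");
        System.out.println(find(paths, "sample", false)); // like -name sample
        System.out.println(find(paths, "sample", true));  // like -iname sample
    }
}
```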
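Count: hdfs dfs -count [-q] <path>
Counts the directories, files, and bytes under the given path.
Example:
$ hadoop fs -count /testelephant
2 1 108 /testelephant
The first value, 2, is the number of directories under /testelephant, including /testelephant itself.
The second value, 1, is the number of files under it.
The third value, 108, is the total size of those files in bytes, not counting replicas.
$ hadoop fs -count -q /testelephant
1024 1021 10240 10132 2 1 108 /testelephant
The first value, 1024, is the name quota: the maximum total number of files and directories.
The second value, 1021, is the remaining name quota, i.e. how many more files or directories can still be created.
The third value, 10240, is the directory's space quota in bytes.
The fourth value, 10132, is the remaining space quota. Unlike the content size, space quota usage counts every replica.
The last three values are the same as in the plain -count output.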
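The `-q` numbers follow a simple rule. Every directory and file, including the directory itself, uses one unit of name quota. Space quota is charged at content size times replication. The sketch below parses one `-count -q` line and recomputes both remaining values. It assumes quotas are set; otherwise Hadoop prints `none`/`inf` in the first four columns and the parsing fails. It also assumes the example directory used replication 1, which is what 10240 - 108 = 10132 implies. With replication 3, the same 108 bytes would consume 324 bytes of space quota. `CountQuota` is a hypothetical helper.

```java
final class CountQuota {
    /** The eight columns of one "hadoop fs -count -q" output line, in order. */
    static final class Row {
        final long quota, remainingQuota, spaceQuota, remainingSpaceQuota;
        final long dirCount, fileCount, contentSize;
        final String path;

        Row(String line) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 8) {
                throw new IllegalArgumentException("expected 8 columns: " + line);
            }
            quota = Long.parseLong(f[0]);
            remainingQuota = Long.parseLong(f[1]);
            spaceQuota = Long.parseLong(f[2]);
            remainingSpaceQuota = Long.parseLong(f[3]);
            dirCount = Long.parseLong(f[4]);
            fileCount = Long.parseLong(f[5]);
            contentSize = Long.parseLong(f[6]);
            path = f[7];
        }
    }

    /** Each directory and file, the directory itself included, uses one unit of name quota. */
    static long expectedRemainingQuota(Row r) {
        return r.quota - (r.dirCount + r.fileCount);
    }

    /** Space quota is charged for every replica: content size times replication. */
    static long expectedRemainingSpaceQuota(Row r, int replication) {
        return r.spaceQuota - r.contentSize * replication;
    }

    public static void main(String[] args) {
        Row r = new Row("1024 1021 10240 10132 2 1 108 /testelephant");
        System.out.println(expectedRemainingQuota(r));         // 1021
        System.out.println(expectedRemainingSpaceQuota(r, 1)); // 10132
    }
}
```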
Show file status: hdfs dfs -stat [format] <path> (without a format string it prints the modification time)
Example: hdfs dfs -stat /user/dataflair/dir1
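Test a path: hdfs dfs -test -[ezd] <URI>
Examples:
hdfs dfs -test -e sample
hdfs dfs -test -z sample
hdfs dfs -test -d sample
-e: returns 0 if the path exists.
-z: returns 0 if the file is zero bytes long.
-d: returns 0 if the path is a directory.
In every case, a non-zero exit status means the condition does not hold. The command prints nothing, so check the result with echo $?.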
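`-test` reports only through its exit status, so remember that 0 means "true". The plain-Java sketch below is a local-filesystem analogue built on `java.nio.file.Files`, and it follows the same 0/1 convention. `TestFlags` is a hypothetical helper. Treating `-z` as "a regular file of length 0" is a simplifying assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TestFlags {
    /** Mimics the exit status of "-test -e|-z|-d": 0 if the condition holds, 1 otherwise. */
    static int test(char flag, Path p) throws IOException {
        boolean holds;
        switch (flag) {
            case 'e':
                holds = Files.exists(p);
                break;
            case 'z':
                // simplifying assumption: only a regular file of length 0 counts as "zero bytes"
                holds = Files.isRegularFile(p) && Files.size(p) == 0;
                break;
            case 'd':
                holds = Files.isDirectory(p);
                break;
            default:
                throw new IllegalArgumentException("unsupported flag: -" + flag);
        }
        return holds ? 0 : 1;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hdfs-test-demo");
        Path sample = Files.createFile(dir.resolve("sample"));
        System.out.println(test('e', sample)); // 0: exists
        System.out.println(test('z', sample)); // 0: empty file
        System.out.println(test('d', sample)); // 1: not a directory
        Files.delete(sample);
        Files.delete(dir);
    }
}
```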
Reference: https://blog.csdn.net/qq_20042935/article/details/123091721
Check storage usage: hdfs dfsadmin -report (the older hadoop dfsadmin form is deprecated)
2. Common YARN commands
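Starting/stopping individual daemons
Start/stop HDFS daemons one at a time:
hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
(Hadoop 3 equivalent: hdfs --daemon start|stop namenode|datanode|secondarynamenode)
Start/stop YARN daemons:
yarn-daemon.sh start|stop resourcemanager|nodemanager
(Hadoop 3 equivalent: yarn --daemon start|stop resourcemanager|nodemanager)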
3. Common HDFS Java APIs
Initialization:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.31.174:9000"), conf, "root");   // third argument: the user to act as
Upload: fs.copyFromLocalFile(false, true, new Path("/usr/local/hadoopTest/test.txt"), new Path("/input2"));
Parameters:
// 1: whether to delete the local source file
// 2: whether to overwrite the destination
// 3: local source path
// 4: HDFS destination path
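Download: fs.copyToLocalFile(false, new Path("/input"), new Path("/usr/local/hadoopTest2/"));
Parameters:
// 1 boolean delSrc: whether to delete the source file
// 2 Path src: the HDFS path to download
// 3 Path dst: the local destination path
// 4 boolean useRawLocalFileSystem (only in the four-argument overload, default false): write through RawLocalFileSystem, which skips the local .crc checksum files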
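Get metadata: fs.listFiles(path, true) (see the Hadooptest project for details)
Notes:
// Lists all files under a directory (directories themselves are not returned)
// Argument 1: the path
// Argument 2: whether to recurse into subdirectories
RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(path, true);
while (iterator.hasNext()) {
    LocatedFileStatus next = iterator.next();
    // All physical blocks of the current file (a file is stored as one or more blocks)
    BlockLocation[] blockLocations = next.getBlockLocations();
    // File name
    String name = next.getPath().getName();
    // Full file path
    Path path1 = next.getPath();
    // Replication factor of the file
    short replication = next.getReplication();
    // Block size used for this file
    long blockSize = next.getBlockSize();
    // Visit each block of the file
    for (BlockLocation blockLocation : blockLocations) {
        // Hosts holding a cached replica of the block (HDFS centralized cache management)
        String[] cachedHosts = blockLocation.getCachedHosts();
        // DataNode addresses (host:port) holding the block
        String[] names = blockLocation.getNames();
        // Hosts holding the block and its replicas (a block usually has several replicas, 3 by default)
        String[] hosts = blockLocation.getHosts();
        // Length of this block in bytes (actual data, so the last block may be smaller than blockSize)
        long length = blockLocation.getLength();
        // Starting offset of the block within the file: the first block starts at 0, the second at the number of bytes stored in the first, and so on
        long offset = blockLocation.getOffset();
    }
}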
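The offset and length returned by `BlockLocation` follow directly from the block size. The sketch below computes the (offset, length) pairs a file would be split into. It assumes the Hadoop 2/3 default `dfs.blocksize` of 128 MB and a hypothetical 300 MB file. It does not call Hadoop; it only reproduces the arithmetic behind `getOffset()` and `getLength()`. `BlockLayout` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockLayout {
    /** Returns {offset, length} for each block of a file of fileSize bytes. */
    static List<long[]> blocks(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // every block is full except possibly the last one, which holds the remainder
            result.add(new long[] {offset, Math.min(blockSize, fileSize - offset)});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (long[] b : blocks(300 * mb, 128 * mb)) {
            System.out.println("offset=" + b[0] + " length=" + b[1]);
        }
    }
}
```

For the 300 MB example, this yields three blocks starting at 0, 128 MB, and 256 MB. The last block holds only the remaining 44 MB, which matches the comment that `getLength()` reports the actual size rather than the configured block size.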

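Delete a file or directory: boolean suc3 = fs.delete(new Path("/input3"), true);
Note: // The second argument means "recursive": it must be true to delete a non-empty directory. FileSystem.delete() removes data permanently and does not go through the trash, unlike hdfs dfs -rm with trash enabled.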
Check existence: boolean suc2 = fs.exists(new Path("/input3"));
Note: // Returns true if the file or directory exists
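Create a file: FSDataOutputStream out = fs.create(new Path("/input4/test.data"), true);
Note: // Creates test.data on HDFS (missing parent directories are created too) and returns a stream for writing its content; true means overwrite an existing file. To create only a directory, use fs.mkdirs(path).
Examples: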
Read API: FSDataInputStream in = fileSystem.open(path); then in.read(b) reads into a byte buffer b
Read example (see the Hadooptest project for details):
Configuration conf = new Configuration();
FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
FSDataInputStream in = fileSystem.open(path);
byte[] b = new byte[1024];
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    // Process the bytes just read; here they are simply echoed to stdout
    System.out.write(b, 0, numBytes);
}
System.out.flush();
in.close();
fileSystem.close();
Write API: FSDataOutputStream out = fileSystem.create(path); then out.write(b, 0, numBytes) writes the first numBytes bytes of buffer b
Write example:
Configuration conf = new Configuration();
FileSystem fileSystem = FileSystem.get(conf);
// Local file to upload (hypothetical placeholder path)
String source = "/local/path/to/source.ext";
// Check if the destination file already exists
Path path = new Path("/path/to/file.ext");
if (fileSystem.exists(path)) {
    System.out.println("File " + path + " already exists");
    return;
}
// Create a new file and write data to it
FSDataOutputStream out = fileSystem.create(path);
InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
byte[] b = new byte[1024];
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
// Close all the file descriptors
in.close();
out.close();
fileSystem.close();
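The read and write examples share one loop: fill a buffer, then write only the bytes actually read. That loop works on any `InputStream`/`OutputStream` pair, and `FSDataInputStream` and `FSDataOutputStream` are subclasses of those, so the loop can be exercised in plain Java without a cluster. Hadoop also provides `org.apache.hadoop.io.IOUtils.copyBytes(in, out, bufferSize, close)` for the same job. `StreamCopy` below is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamCopy {
    /** Copies everything from in to out through a fixed-size buffer and returns the byte count. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream and may return fewer bytes than the buffer holds
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the n bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4); // tiny buffer forces several reads
        System.out.println(copied + " bytes: " + sink.toString("UTF-8"));
    }
}
```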

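4. Running a MapReduce job from Java
// Load the configuration
Configuration conf = new Configuration();
// Create the job
Job job = Job.getInstance(conf);
// Class used to locate the jar that is shipped to the cluster
job.setJarByClass(TestMapReduce.class);

// Input path; this method can be called several times to add multiple input paths
FileInputFormat.addInputPath(job, new Path(args[0]));

// Mapper class and its output key/value types
job.setMapperClass(MyMap.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);

// Reducer class and the job's final output key/value types
job.setReducerClass(MyReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);

// Output path on HDFS (it must not exist yet, or the job fails)
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// Submit the job, wait for it to finish, and exit with a non-zero status on failure
System.exit(job.waitForCompletion(true) ? 0 : 1);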
See the Hadooptest project for details.
Hadoop practice project: https://github.com/QingYang12/hadooptest
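The in-memory word-count simulation below, in plain Java, shows what the mapper, shuffle, and reducer configured above do with the data. It assumes a word-count style job, because the project's `MyMap` and `MyReduce` are not shown here and may do something else. It uses `String`/`Long` instead of Hadoop's `Writable` types. It only mimics the data flow. Real Hadoop splits the input, runs mappers in parallel, and partitions and sorts map output by key before each reducer sees it. `WordCountSim` is a hypothetical class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSim {
    /** Map phase: emit (word, 1) for every whitespace-separated token of every line. */
    static List<Map.Entry<String, Long>> map(List<String> lines) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1L));
                }
            }
        }
        return out;
    }

    /** Shuffle phase: group values by key; TreeMap keeps keys sorted, as Hadoop sorts map output by key. */
    static TreeMap<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum the values collected for each key. */
    static TreeMap<String, Long> reduce(TreeMap<String, List<Long>> groups) {
        TreeMap<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            long sum = 0;
            for (long v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello hadoop", "hello yarn");
        System.out.println(reduce(shuffle(map(input)))); // prints {hadoop=1, hello=2, yarn=1}
    }
}
```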

References: https://blog.csdn.net/W_X_L11/article/details/109062723 and
https://blog.csdn.net/whandgdh/article/details/110296608
