Is there an hdfs command to list the files in an HDFS directory sorted by timestamp?

By default, the hdfs dfs -ls command returns an unsorted list of files. There is no built-in command for timestamp-sorting an HDFS directory in older releases; the usual approach is to pipe the output through sort. From Hadoop 2.7.x onwards, hdfs dfs -ls -t (or hdfs dfs -ls -t -r) sorts by modification time, and the -u option switches to access time.

Is there an hdfs command to list files in an HDFS directory by timestamp, ascending or descending? By default, the hdfs dfs -ls command gives an unsorted list of files.

When I searched for answers, all I found was a workaround, i.e. hdfs dfs -ls /tmp | sort -k6,7. But is there a better way, built into the hdfs dfs command line?

Solution

No, before Hadoop 2.7 there is no built-in option to sort the files by date and time.

If you are using hadoop version < 2.7, you will have to use sort -k6,7 as you are doing:

hdfs dfs -ls /tmp | sort -k6,7
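To see why -k6,7 works, note that in ls output the date is field 6 and the time is field 7. A minimal sketch on two fabricated ls-style lines (the paths and sizes here are made up for illustration):

```shell
# Two fabricated lines in hdfs dfs -ls output format:
# permissions, replication, owner, group, size, date, time, path.
lines='-rw-r--r--   3 hduser supergroup   1024 2021-08-12 10:19 /tmp/b.log
-rw-r--r--   3 hduser supergroup    512 2021-08-11 09:00 /tmp/a.log'

# Sort on the date (field 6) and time (field 7) columns, oldest first:
printf '%s\n' "$lines" | sort -k6,7

# Add -r for newest first:
printf '%s\n' "$lines" | sort -k6,7 -r
```

The same pipe works unchanged on real hdfs dfs -ls output, since only the column positions matter.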

And for the hadoop 2.7.x ls command, the following options are available:

Usage: hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u]

Options:

-d: Directories are listed as plain files.

-h: Format file sizes in a human-readable fashion (eg 64.0m instead of 67108864).

-R: Recursively list subdirectories encountered.

-t: Sort output by modification time (most recent first).

-S: Sort output by file size.

-r: Reverse the sort order.

-u: Use access time rather than modification time for display and sorting.
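As a quick check of the -h example above, 67108864 bytes is exactly 64 MiB, which ls -h prints as 64.0m:

```shell
# 67108864 bytes / 1048576 bytes-per-MiB = 64, shown by -h as "64.0m".
bytes=67108864
echo "$((bytes / 1048576)).0m"
```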

So you can easily sort the files:

hdfs dfs -ls -t -R /tmp (add -r to reverse the order)

First, create a file named flume-hdfs.conf under the flume/conf directory and copy the following configuration into it:

```
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Describe/configure the source
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /var/log/messages

# Describe the sink
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/data/%Y-%m-%d/%H
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.rollInterval = 0
agent1.sinks.sink1.hdfs.rollSize = 134217728
agent1.sinks.sink1.hdfs.rollCount = 0
agent1.sinks.sink1.hdfs.batchSize = 1000
# The exec source does not add a timestamp header to events, so the
# %Y-%m-%d/%H escapes need the agent's local timestamp:
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = chann1
```

This configuration starts an agent that tails /var/log/messages and writes the log data to HDFS. The %Y-%m-%d/%H in the path is a timestamp pattern: data is partitioned by date and hour, so events arriving at 10:00 on 2021-08-12 would be stored under /flume/data/2021-08-12/10.

Next, start the Flume agent and test it. Open a terminal, change to the flume directory, and run:

```
bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-hdfs.conf --name agent1 -Dflume.root.logger=INFO,console
```

This starts a Flume agent named agent1 and prints its log output to the console.

Now append some new lines to /var/log/messages and check whether the data was written to HDFS:

```
hadoop fs -ls /flume/data/2021-08-12/10
```

If the output looks something like the following, the data was written successfully:

```
-rw-r--r--   3 hduser supergroup        384 2021-08-12 10:19 /flume/data/2021-08-12/10/FlumeData.1628746760578
```
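The %Y-%m-%d/%H escapes in hdfs.path are filled in from the event timestamp. As a rough illustration of the same pattern, rendered here with date(1) (GNU coreutils assumed, and the event time below is made up):

```shell
# Render the partition directory an event timestamped
# 2021-08-12 10:19:20 UTC would map to.
partition=$(date -u -d '2021-08-12 10:19:20' '+%Y-%m-%d/%H')
echo "/flume/data/$partition"   # prints /flume/data/2021-08-12/10
```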