LeetCode Problem: Find Duplicate File in System

Problem description:

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains the paths of all files that have the same content. A file path is a string with the following format:

"directory_path/file_name.txt"

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content contain only letters and digits, and the length of the file content is in the range [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Example:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:  
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Source: Find Duplicate File in System (link: https://leetcode.com/problems/find-duplicate-file-in-system/description/)

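Analysis: The problem statement is long only because it has to spell out the input format. The task itself is simple. Put files with identical content into the same group, and keep only the groups that contain at least two files. A file's content is whatever sits inside its parentheses, so the solution is also simple:

1. Split each directory string into its space-separated pieces.
2. Extract the content from each file piece.
3. Group the files that share the same content.

Which data structure fits? We need a mapping from each content to its files, so use a HashMap. The key is the file content, and the value is the collection of full paths of the files with that content. The collection can be either a Set or a List. Since the output order doesn't matter, both work. The remaining steps are routine: splitting strings, taking substrings, and joining the directory path with the file name. The part most worth explaining is the JDK 8 solution. It processes the groups as a stream and uses filter() to drop every group with fewer than two files. The code shows the details.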
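The parsing step can be sketched as follows. `DirInfoParser` and `parse` are hypothetical names used only for illustration. The sketch splits each directory string on single spaces and treats the first piece as the directory. For every file piece, the name is the part before `(`, and the content is the part between `(` and the final `)`.

```java
import java.util.ArrayList;
import java.util.List;

class DirInfoParser {
    // Hypothetical helper: turns "root/a 1.txt(abcd) 2.txt(efgh)" into
    // pairs {"root/a/1.txt", "abcd"} and {"root/a/2.txt", "efgh"}.
    static List<String[]> parse(String info) {
        String[] parts = info.split(" ");  // parts[0] is the directory, the rest are files
        String dir = parts[0];
        List<String[]> result = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            String token = parts[i];
            int open = token.indexOf('(');  // names contain no '(', so this starts the content
            String name = token.substring(0, open);
            String content = token.substring(open + 1, token.length() - 1);  // strip the trailing ')'
            result.add(new String[] {dir + "/" + name, content});
        }
        return result;
    }
}
```

This relies on note 2 above: names and contents contain only letters and digits, so the first `(` always starts the content.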

Code:

Solution with HashMap + Set:
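Below is a minimal Java sketch of the HashMap + Set approach. It assumes LeetCode's usual `class Solution` with a `findDuplicate(String[] paths)` method. The map key is the file content, and the value is a `HashSet` of full paths. Only sets with two or more entries are copied into the result.

```java
import java.util.*;

class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        // key: file content, value: set of full paths whose file has that content
        Map<String, Set<String>> byContent = new HashMap<>();
        for (String info : paths) {
            String[] parts = info.split(" ");  // parts[0] = directory, rest = files
            String dir = parts[0];
            for (int i = 1; i < parts.length; i++) {
                int open = parts[i].indexOf('(');
                String name = parts[i].substring(0, open);
                String content = parts[i].substring(open + 1, parts[i].length() - 1);
                byContent.computeIfAbsent(content, k -> new HashSet<>()).add(dir + "/" + name);
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (Set<String> group : byContent.values()) {
            if (group.size() > 1) {  // a duplicate group needs at least two files
                result.add(new ArrayList<>(group));
            }
        }
        return result;
    }
}
```

Because output order is not required, the iteration order of `HashMap` and `HashSet` does not matter here.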


An alternative way to build the output:
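One possible version of this variant is sketched below. It assumes a `List` per group instead of a `Set`, which is safe here because notes 4 and 5 guarantee every path is unique. Rather than copying qualifying groups into a new result list, it removes the groups with fewer than two files from the map through an `Iterator`. It then returns the remaining values directly.

```java
import java.util.*;

class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        // key: file content, value: list of full paths whose file has that content
        Map<String, List<String>> byContent = new HashMap<>();
        for (String info : paths) {
            String[] parts = info.split(" ");
            for (int i = 1; i < parts.length; i++) {
                int open = parts[i].indexOf('(');
                String content = parts[i].substring(open + 1, parts[i].length() - 1);
                List<String> group = byContent.get(content);
                if (group == null) {  // first file seen with this content
                    group = new ArrayList<>();
                    byContent.put(content, group);
                }
                group.add(parts[0] + "/" + parts[i].substring(0, open));
            }
        }
        // Remove groups with fewer than two files; removing through values() updates the map.
        Iterator<List<String>> it = byContent.values().iterator();
        while (it.hasNext()) {
            if (it.next().size() < 2) {
                it.remove();
            }
        }
        return new ArrayList<>(byContent.values());
    }
}
```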


JDK 8 solution (really just another way to build the output):
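A sketch of a JDK 8 style version follows. The map is filled with `computeIfAbsent`. The output is built by streaming over the map's values, using `filter()` to keep only groups with more than one file, and collecting them with `Collectors.toList()`. This is one way to write it under those assumptions, not necessarily the original code.

```java
import java.util.*;
import java.util.stream.Collectors;

class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        // key: file content, value: list of full paths whose file has that content
        Map<String, List<String>> byContent = new HashMap<>();
        for (String info : paths) {
            String[] parts = info.split(" ");
            for (int i = 1; i < parts.length; i++) {
                int open = parts[i].indexOf('(');
                String content = parts[i].substring(open + 1, parts[i].length() - 1);
                byContent.computeIfAbsent(content, k -> new ArrayList<>())
                         .add(parts[0] + "/" + parts[i].substring(0, open));
            }
        }
        // Stream over the groups and keep only those with at least two files.
        return byContent.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }
}
```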




