Java directory node output, Groovy / Java: parallel processing of a directory structure where each node is a list of subdirectories/files...

Here's my current problem:

I have a directory structure stored in cloud storage. Under the Root folder there are 1000+ subdirectories, each of which has a single subdirectory under it. Each of those subdirectories contains a single file. A simplified diagram looks something like this:

                      Root
     __________________|__________________
     |         |               |         |
  FolderA   FolderB    ...  FolderY   FolderZ
     |         |               |         |
  Folder1   Folder2         Folder3   Folder4
     |         |               |         |
   FileA     FileB           FileC     FileD

Each node has the properties type ("directory" or "file") and path (for example "/Root/FolderB"). The only way to retrieve these nodes is to call a method named listDirectory(path), which goes to the cloud and gets all the objects within that path. I need to find all the files and process them.

The problem is that, with the way it's structured, if I want to look for FileA I need to call listDirectory() three times (Root -> FolderA -> Folder1), which, as you can imagine, slows the whole thing down significantly.

I want to process this in parallel, but I can't seem to get it to work. I've tried doing it recursively by using GParsPool.withPool with eachParallel(), but I found out that parallel programming with recursion can be a dangerous (and expensive) slope. I've also tried doing it linearly by creating a synchronized list that holds the paths of all the directories the threads have visited. Neither approach worked or gave an efficient solution to this problem.

FYI, I can't change the listDirectory() method. Each call will retrieve all the objects in that path.

TL;DR: I need to find a parallel way to process a cloud-storage file structure where the only way to get the folders/files is through a listDirectory(path) method.

Solution

This assumes that neither of the following is an option: caching the directory structure in memory by using a daemon, or building a one-time mapping of the storage structure in memory, hooking into every add, remove, and update operation on the storage, and changing the database accordingly.

Assuming the storage structure is a tree (it usually is), and given the way listDirectory() works, I think you are better off using breadth-first search to traverse the tree. That way you can search one level at a time using parallel programming.

Your code could look something like this:

SearchElement.java - represents either a directory or a file

    public class SearchElement {

        private String path;
        private String name;

        public SearchElement(String path, String name) {
            this.path = path;
            this.name = name;
        }

        public String getPath() {
            return path;
        }

        public String getName() {
            return name;
        }
    }

ElementFinder.java - a class that searches the storage. You need to replace the listDirectory function with your own implementation.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Optional;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicReference;

    public class ElementFinder {

        private final SearchElement ROOT_DIRECTORY_PATH = new SearchElement("/", "");

        public Optional<SearchElement> find(String elementName) {
            // Elements of the level currently being searched, starting with the root.
            Queue<SearchElement> currentLevelElements = new ConcurrentLinkedQueue<>();
            currentLevelElements.add(ROOT_DIRECTORY_PATH);
            AtomicReference<Optional<SearchElement>> wantedElement = new AtomicReference<>(Optional.empty());

            // Stop when a level is empty or the element has been found.
            // Note: Optional.isEmpty() requires Java 11 or later.
            while (!currentLevelElements.isEmpty() && wantedElement.get().isEmpty()) {
                Queue<SearchElement> nextLevelElements = new ConcurrentLinkedQueue<>();

                // List every element of the current level in parallel.
                currentLevelElements.parallelStream().forEach(currentSearchElement -> {
                    Collection<SearchElement> subDirectoriesAndFiles = listDirectory(currentSearchElement.getPath());
                    subDirectoriesAndFiles.stream()
                            .filter(searchElement -> searchElement.getName().equals(elementName))
                            .findAny()
                            .ifPresent(element -> wantedElement.set(Optional.of(element)));
                    // Everything found here forms part of the next level to search.
                    nextLevelElements.addAll(subDirectoriesAndFiles);
                });
                currentLevelElements = nextLevelElements;
            }
            return wantedElement.get();
        }

        private Collection<SearchElement> listDirectory(String path) {
            return new ArrayList<>(); // replace me!
        }
    }
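
The question asks for all files rather than a single file by name, so here is a minimal sketch of the same level-by-level idea adapted to that case. It is a variant, not the code above: it swaps parallelStream() for a fixed thread pool, on the assumption that listDirectory() is a blocking network call and that a pool sized independently of the CPU count may suit it better than the common pool that parallel streams use by default. The names FileCollector, Node, collectFiles and the threads parameter are hypothetical, and listDirectory is passed in as a function standing in for the real cloud call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class FileCollector {

    // Hypothetical node type mirroring the question's "type" and "path" properties.
    public static class Node {
        private final String type;
        private final String path;

        public Node(String type, String path) {
            this.type = type;
            this.path = path;
        }

        public String getType() { return type; }
        public String getPath() { return path; }
        public boolean isDirectory() { return "directory".equals(type); }
    }

    // Walks the tree one level at a time; all directories of a level are listed concurrently.
    public static List<Node> collectFiles(String root,
                                          Function<String, List<Node>> listDirectory,
                                          int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Node> files = new ArrayList<>();
            List<String> currentLevel = List.of(root);
            while (!currentLevel.isEmpty()) {
                // One listing task per directory in the current level.
                List<Callable<List<Node>>> tasks = new ArrayList<>();
                for (String dir : currentLevel) {
                    tasks.add(() -> listDirectory.apply(dir));
                }
                List<String> nextLevel = new ArrayList<>();
                // invokeAll blocks until every directory of this level has been listed.
                for (Future<List<Node>> result : pool.invokeAll(tasks)) {
                    for (Node node : result.get()) {
                        if (node.isDirectory()) {
                            nextLevel.add(node.getPath()); // descend in the next round
                        } else {
                            files.add(node);               // a file: collect it
                        }
                    }
                }
                currentLevel = nextLevel;
            }
            return files;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the structure from the question, this makes three rounds of calls (Root, then all FolderX in parallel, then all FolderN in parallel), so the wall-clock cost is driven mostly by the tree depth and the chosen thread count rather than by the number of directories. Results are merged on the calling thread, so no synchronized list is needed.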
