Java output directory nodes; Groovy / Java: parallel processing of a directory structure where each node is a list of subdirectories/files...

Latest recommended article published 2023-10-06 20:56:39

James-bean


Here's my current problem:

I have a directory structure stored in cloud storage. Under the Root folder there are 1000+ subdirectories, and each of those has a single subdirectory under it. Within each of those subdirectories, a single file exists. A simplified diagram looks something like this:

```
                 Root
   ________________|________________
   |        |             |        |
FolderA  FolderB  ...  FolderY  FolderZ
   |        |             |        |
Folder1  Folder2        Folder3  Folder4
   |        |             |        |
 FileA    FileB          FileC    FileD
```

Each node has two properties: type ("directory" or "file") and path (e.g. "/Root/FolderB"). The only way to retrieve these nodes is to call a method, listDirectory(path), which goes to the cloud and returns all the objects directly under that path. I need to find all the files and process them.
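To try out traversal strategies without touching the cloud, it can help to model this contract locally. The following is a minimal sketch of a hypothetical in-memory stand-in: the Node record, the FakeStorage class, and the FolderN/SubN/FileN naming are assumptions for illustration, not the real storage API. Like the real method, its listDirectory(path) returns only the immediate children of path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one storage object: type is "directory" or "file"
record Node(String type, String path) {
    boolean isDirectory() {
        return "directory".equals(type);
    }
}

// Hypothetical in-memory stand-in for the cloud storage (not the real API)
class FakeStorage {
    // parent path -> immediate children
    private final Map<String, List<Node>> children = new TreeMap<>();

    // Builds the question's shape: /Root/FolderN/SubN/FileN for N = 0 .. topLevelCount-1
    FakeStorage(int topLevelCount) {
        for (int n = 0; n < topLevelCount; n++) {
            String folder = "/Root/Folder" + n;
            String sub = folder + "/Sub" + n;
            add("/Root", new Node("directory", folder));
            add(folder, new Node("directory", sub));
            add(sub, new Node("file", sub + "/File" + n));
        }
    }

    private void add(String parent, Node child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Like the real method: one call returns only the immediate children of path
    List<Node> listDirectory(String path) {
        return children.getOrDefault(path, List.of());
    }
}
```

Finding FileN with this stand-in takes the same three calls as in the real storage: /Root, then /Root/FolderN, then /Root/FolderN/SubN.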

The problem is that with the way that it's structured, if I want to look for FileA, I need to call listDirectory() three times (Root -> FolderA -> Folder1) which you can imagine slows the whole thing down significantly.
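To make the cost concrete: a sequential walk makes one listDirectory() call per directory, so with N top-level folders it needs 1 + N + N round trips (one for Root, one for each top-level folder, one for each second-level folder). The sketch below is a hypothetical sequential baseline, an iterative depth-first walk with a call counter; DirEntry, SequentialWalk, and passing listDirectory in as a Function are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical storage entry: a path plus whether it is a directory
record DirEntry(String path, boolean directory) {}

// Sequential baseline: one blocking listDirectory() call at a time
class SequentialWalk {
    private final Function<String, List<DirEntry>> listDirectory;
    private int calls; // number of listDirectory() round trips made so far

    SequentialWalk(Function<String, List<DirEntry>> listDirectory) {
        this.listDirectory = listDirectory;
    }

    int calls() {
        return calls;
    }

    // Iterative depth-first walk using an explicit stack instead of recursion
    List<String> findFiles(String root) {
        List<String> files = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String dir = stack.pop();
            calls++;
            for (DirEntry entry : listDirectory.apply(dir)) {
                if (entry.directory()) {
                    stack.push(entry.path());
                } else {
                    files.add(entry.path());
                }
            }
        }
        return files;
    }
}
```

With 1,000 top-level folders this performs 2,001 calls one after another, so the total time is roughly 2,001 times the latency of a single call.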

I want to process this in parallel, but I can't get it to work. I've tried doing it recursively using GParsPool.withPool with eachParallel(), but I found out that combining parallel programming with recursion can be a dangerous (and expensive) slope. I've also tried doing it linearly by creating a synchronized list that holds the paths of all directories each thread has visited. None of these approaches seems to work or to provide an efficient solution.
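One alternative to recursive parallelism, sketched here under stated assumptions and not taken from the question or the answer, is to make sure no thread ever waits on its children: each task lists one directory, records the files it finds, and submits a new task for every subdirectory to a fixed-size ExecutorService. An AtomicInteger counts outstanding tasks; because a task submits its children before decrementing the counter, the counter reaches zero only when the whole tree has been walked. In a tree, no visited list is needed, since every directory is reached exactly once. Item, ParallelWalk, and the thread count are hypothetical.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical storage object: a path plus whether it is a directory
record Item(String path, boolean directory) {}

class ParallelWalk {
    private final Function<String, List<Item>> listDirectory;
    private final int threads;

    ParallelWalk(Function<String, List<Item>> listDirectory, int threads) {
        this.listDirectory = listDirectory;
        this.threads = threads;
    }

    Queue<String> findFiles(String root) {
        Queue<String> files = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger();  // tasks submitted but not yet finished
        CountDownLatch done = new CountDownLatch(1);  // released when pending drops to zero
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            submit(pool, root, files, pending, done);
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while walking the tree", e);
        } finally {
            pool.shutdown();
        }
        return files;
    }

    // Queues a listing of dir and returns immediately; no thread blocks waiting on a child
    private void submit(ExecutorService pool, String dir, Queue<String> files,
                        AtomicInteger pending, CountDownLatch done) {
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                for (Item item : listDirectory.apply(dir)) {
                    if (item.directory()) {
                        // The child is counted before this task finishes, so pending cannot hit zero early
                        submit(pool, item.path(), files, pending, done);
                    } else {
                        files.add(item.path());
                    }
                }
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }
}
```

Because listDirectory() is network-bound, the pool can be much larger than the number of CPU cores; the right size depends on what the storage service tolerates. In this sketch an exception thrown by listDirectory() is not propagated to the caller, so a real version should record failures.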

FYI, I can't change the listDirectory() method. Each call will retrieve all the objects in that path.

TL;DR: I need to find a parallel way to process through a cloud-storage file structure where the only way to get the folders/files are through a listDirectory(path) method.

Solution

Two ways to avoid repeated cloud calls altogether are caching the directory structure in memory using a daemon, or creating a one-time in-memory mapping of the storage structure and hooking into every add, remove, and update operation on the storage to keep that mapping up to date.
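A minimal sketch of the second option, assuming you control every code path that writes to the storage: a hypothetical DirectoryIndex keyed by parent path, which the write paths keep in sync through onAdd and onRemove (an update such as a rename can be modeled as a remove followed by an add). Listings are then served from memory instead of the cloud. The class and method names are illustrative, and the index would still need one initial full listing to populate it.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory index: parent path -> paths of its immediate children
class DirectoryIndex {
    private final ConcurrentHashMap<String, Set<String>> children = new ConcurrentHashMap<>();

    // Call from every code path that creates an object in the storage
    void onAdd(String path) {
        children.computeIfAbsent(parentOf(path), k -> ConcurrentHashMap.newKeySet()).add(path);
    }

    // Call from every code path that deletes an object; also forgets its whole subtree
    void onRemove(String path) {
        Set<String> siblings = children.get(parentOf(path));
        if (siblings != null) {
            siblings.remove(path);
        }
        children.keySet().removeIf(key -> key.equals(path) || key.startsWith(path + "/"));
    }

    // Served from memory instead of a cloud round trip
    List<String> list(String path) {
        return List.copyOf(children.getOrDefault(path, Set.of()));
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```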

If neither is an option, and assuming the storage structure is a tree (it usually is), I think you are better off using breadth-first search (BFS) because of the way listDirectory() works: each call returns one level of children, so you can search one level at a time and make all of that level's listDirectory() calls in parallel.

Your code could look something like this:

SearchElement.java: represents either a directory or a file.

```java
public class SearchElement {

    private final String path;
    private final String name;
    // true for a directory, false for a file (the node's "type" in the question)
    private final boolean directory;

    public SearchElement(String path, String name, boolean directory) {
        this.path = path;
        this.name = name;
        this.directory = directory;
    }

    public String getPath() {
        return path;
    }

    public String getName() {
        return name;
    }

    public boolean isDirectory() {
        return directory;
    }
}
```

ElementFinder.java: searches the storage one level at a time. Replace the listDirectory() stub with your own implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

public class ElementFinder {

    private static final SearchElement ROOT_DIRECTORY_PATH = new SearchElement("/", "", true);

    // Breadth-first search: every directory on the current level is listed in parallel,
    // and the search stops after the first level that contains an element named elementName
    public Optional<SearchElement> find(String elementName) {
        Queue<SearchElement> currentLevelElements = new ConcurrentLinkedQueue<>();
        currentLevelElements.add(ROOT_DIRECTORY_PATH);
        AtomicReference<Optional<SearchElement>> wantedElement = new AtomicReference<>(Optional.empty());

        // Optional.isEmpty() requires Java 11+
        while (!currentLevelElements.isEmpty() && wantedElement.get().isEmpty()) {
            // Concurrent queue, because several parallel tasks add to it at the same time
            Queue<SearchElement> nextLevelElements = new ConcurrentLinkedQueue<>();

            currentLevelElements.parallelStream().forEach(currentSearchElement -> {
                Collection<SearchElement> subDirectoriesAndFiles = listDirectory(currentSearchElement.getPath());

                subDirectoriesAndFiles.stream()
                        .filter(searchElement -> searchElement.getName().equals(elementName))
                        .findAny()
                        .ifPresent(element -> wantedElement.set(Optional.of(element)));

                // Only directories go to the next level; listing a file would waste a cloud call
                subDirectoriesAndFiles.stream()
                        .filter(SearchElement::isDirectory)
                        .forEach(nextLevelElements::add);
            });

            currentLevelElements = nextLevelElements;
        }

        return wantedElement.get();
    }

    private Collection<SearchElement> listDirectory(String path) {
        return new ArrayList<>(); // replace me!
    }
}
```
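The question asks for every file rather than a single element by name, and listDirectory() is a blocking network call. parallelStream() runs on the common ForkJoinPool, whose default parallelism is one less than the number of available processors, which limits how many listings are in flight at once to roughly the number of cores. The sketch below is a hedged variant of the same level-by-level BFS that collects all files and runs each level's listings on a dedicated thread pool sized for I/O; StorageObject, LevelByLevelFileCollector, and the concurrency parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical storage object: a path plus whether it is a directory
record StorageObject(String path, boolean directory) {}

class LevelByLevelFileCollector {
    private final Function<String, List<StorageObject>> listDirectory;
    private final int concurrency; // threads for blocking listDirectory() calls

    LevelByLevelFileCollector(Function<String, List<StorageObject>> listDirectory, int concurrency) {
        this.listDirectory = listDirectory;
        this.concurrency = concurrency;
    }

    List<String> collectFiles(String root) {
        List<String> files = new ArrayList<>();
        List<String> currentLevel = List.of(root);
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            while (!currentLevel.isEmpty()) {
                // Start a listing for every directory on this level at once
                List<Future<List<StorageObject>>> listings = new ArrayList<>();
                for (String dir : currentLevel) {
                    listings.add(pool.submit(() -> listDirectory.apply(dir)));
                }
                // Gather results on the calling thread: files are kept,
                // directories become the next level
                List<String> nextLevel = new ArrayList<>();
                for (Future<List<StorageObject>> listing : listings) {
                    for (StorageObject obj : listing.get()) {
                        if (obj.directory()) {
                            nextLevel.add(obj.path());
                        } else {
                            files.add(obj.path());
                        }
                    }
                }
                currentLevel = nextLevel;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while listing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("listDirectory failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return files;
    }
}
```

Each level acts as a barrier: the next level starts only after the slowest listing of the current level returns. For the question's three-level shape that costs little, but for deep or uneven trees a design in which each directory's listing is submitted as soon as its parent returns may keep more requests in flight.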
