Storing a tree in MySQL: persisting a deeply nested directory tree in a database

I am working on a desktop application that is much like WinDirStat or voidtools' Everything - it maps hard drives, i.e. creates a deeply nested dictionary out of the directory tree.

The desktop application should then store the directory trees in some kind of database, so that a web application can be used to browse them from root, depth level by depth level.

Assume both applications run locally on the same machine for the time being.

The question that comes to mind is how the data should be structured and what database should be utilized, considering:

1) RAM consumption should be reasonable

2) The time it takes for a directory to be ready for viewing in the web application should be minimal

P.S -

My initial approach was serializing each file system node to JSON separately and inserting each into Mongo, with object references linking them to their children. That way the web application could easily load the data based on user demand.

However, I am worried that making so many (a million, on average) independent inserts into Mongo will take a long time; if I batch them into bulk inserts, that means I have to keep each batch in memory.
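For what it's worth, batching does not have to mean holding the whole tree in memory: nodes can be streamed through fixed-size batches so only one batch exists at a time. A minimal sketch (`walk_nodes` and `collection` are assumed names, not part of any real API here):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items, so only one
    batch ever sits in memory at a time."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage with pymongo (walk_nodes is an assumed generator
# that lazily yields one document per filesystem node):
#   for batch in chunked(walk_nodes(root), 1000):
#       collection.insert_many(batch)   # bulk insert, bounded memory

print(list(chunked(range(10), 4)))  # three batches: 4 + 4 + 2 items
```

This keeps peak memory proportional to the batch size rather than to the tree, at the cost of one round trip per batch instead of per node.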

I also considered dumping the entire tree as one deeply nested JSON document, but the data is too large for a single Mongo document. GridFS could be used to store it, but then I would have to load the entire tree in the web application even though the deep nodes may not be of interest.

Solution

Given your requirements of:

A) Low RAM usage

B) Meeting file size limitations in Mongo

C) A responsive UI

I'd consider something along the lines of the following.

Take this example directory tree:

C:\
C:\X\
C:\X\B\
C:\X\file.txt
C:\Y\
C:\Y\file.pdf
C:\Y\R\
C:\Y\R\file.js

In JSON it could possibly be represented as:

{
  "C:" : {
    "X" : {
      "B" : { },
      "file.txt" : "C:\\X\\file.txt"
    },
    "Y" : {
      "file.pdf" : "C:\\Y\\file.pdf",
      "R" : {
        "file.js" : "C:\\Y\\R\\file.js"
      }
    }
  }
}

The latter (one big nested document), as you pointed out, does not scale well with large directory structures: I can tell you first hand that browsers will not appreciate a JSON blob representing even a modest directory with a few thousand files/folders. The former (a flat list of paths), though akin to how some actual filesystems work and efficient in the right context, is a pain to convert to and from JSON.

My proposal is to break each directory into a separate JSON document. This addresses all three issues, but nothing is free: it increases code complexity, the number of requests per session, and so on.

The above structure could be broken into the following documents:

{
  "id" : "CCCCCCCC",
  "type" : "p",
  "name" : "C:",
  "children" : [
    { "name" : "X", "type" : "p", "id" : "XXXXXXXX" },
    { "name" : "Y", "type" : "p", "id" : "YYYYYYYY" }
  ]
}

{
  "id" : "XXXXXXXX",
  "type" : "p",
  "name" : "X",
  "children" : [
    { "name" : "B", "type" : "p", "id" : "BBBBBBBB" },
    { "name" : "file.txt", "type" : "f", "path" : "C:\\X\\file.txt", "size" : 1024 }
  ]
}

{
  "id" : "YYYYYYYY",
  "type" : "p",
  "name" : "Y",
  "children" : [
    { "name" : "R", "type" : "p", "id" : "RRRRRRRR" },
    { "name" : "file.pdf", "type" : "f", "path" : "C:\\Y\\file.pdf", "size" : 2048 }
  ]
}

{
  "id" : "BBBBBBBB",
  "type" : "p",
  "name" : "B",
  "children" : [ ]
}

{
  "id" : "RRRRRRRR",
  "type" : "p",
  "name" : "R",
  "children" : [
    { "name" : "file.js", "type" : "f", "path" : "C:\\Y\\R\\file.js", "size" : 2048 }
  ]
}

Here each document represents a folder and its immediate children only. Child folders can be lazy-loaded using their ids and appended to their parent in the UI. Well-implemented lazy loading can pre-load child nodes to a desired depth, creating a very responsive UI. RAM usage is minimal, as your server only has to handle small payloads per request. The number of requests does go up considerably versus a single-document approach, but again, some clever lazy loading can batch requests and reduce the total number.
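To make this concrete, here is a sketch of how a scanner could turn flat scan results into per-folder documents like the ones above. Using forward-slash paths, a `(path, is_dir)` input shape, and the folder path as its own id are all simplifications I am assuming for illustration, not part of the proposal:

```python
def build_documents(entries):
    """Turn a flat list of (path, is_dir) scan results into one
    document per folder, each listing only its immediate children.
    The folder's path doubles as its id here for simplicity; any
    unique id would do."""
    docs = {}

    def folder_doc(path):
        # Create the folder's document on first sight, then reuse it.
        if path not in docs:
            docs[path] = {"id": path, "type": "p",
                          "name": path.rsplit("/", 1)[-1], "children": []}
        return docs[path]

    for path, is_dir in entries:
        parent, _, name = path.rpartition("/")
        if is_dir:
            folder_doc(path)
            if parent:  # link the folder into its parent as a stub
                folder_doc(parent)["children"].append(
                    {"name": name, "type": "p", "id": path})
        else:           # files are embedded directly in the parent
            folder_doc(parent)["children"].append(
                {"name": name, "type": "f", "path": path})
    return list(docs.values())
```

The resulting list can then be bulk-inserted, and the web application only ever fetches one small document per folder the user expands.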

UPDATE 1: I somehow overlooked your second-to-last paragraph before answering, so this is probably more or less what you had in mind. To address the issue of too many documents, some level of clustering nodes within documents may be in order. I have to head off now, but I'll give it some thought.
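The clustering idea can be sketched as embedding each folder's subtree up to a fixed depth inside one document, with anything deeper cut off into documents of its own. The `CLUSTER_DEPTH` constant, the in-memory node shape, and the stub format below are my assumptions, not a definitive scheme:

```python
CLUSTER_DEPTH = 2  # folder levels embedded per document (assumed tunable)

def cluster(node, budget=CLUSTER_DEPTH):
    """Split a nested folder tree {"id", "name", "type", "children"}
    into documents that each embed at most `budget` folder levels.
    Returns (this_document, list_of_extra_documents)."""
    if budget == 0:
        # Cut here: the subtree becomes its own document, and the
        # parent keeps only a stub pointing at it by id.
        root, extra = cluster(node)
        return ({"name": node["name"], "type": "p", "id": node["id"]},
                [root] + extra)
    children, extra = [], []
    for child in node.get("children", []):
        if child.get("type") == "p":          # sub-folder: recurse
            doc, more = cluster(child, budget - 1)
            children.append(doc)
            extra.extend(more)
        else:                                 # file: embed as-is
            children.append(child)
    return ({"id": node["id"], "name": node["name"], "type": "p",
             "children": children}, extra)
```

For a chain of five nested folders with a depth of 2 this yields three documents instead of five, trading request count against document size.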

UPDATE 2: I've created a gist of a simplified version of the clustering concept I mentioned. It doesn't take files into account, just folders, and it doesn't include any code to update the documents. Hopefully it'll give you some ideas; I'll continue to update it for my own purposes.
