Storing a tree in MySQL: persisting a deeply nested directory tree in a database

I am working on a desktop application that is much like WinDirStat or voidtools' Everything - it maps hard drives, i.e. creates a deeply nested dictionary out of the directory tree.

The desktop application should then store the directory trees in some kind of database, so that a web application can be used to browse them from root, depth level by depth level.

Assume both applications run locally on the same machine for the time being.

The question that comes to mind is how the data should be structured and what database should be utilized, considering:

1) RAM consumption should be reasonable

2) The time it takes for a directory to be ready for viewing in the web application should be minimal

P.S -

My initial approach was serializing each file system node to JSON separately and inserting each into Mongo, with object references linking them to their children. That way the web application could easily load the data based on user demand.

However, I am worried that making so many (a million, on average) independent inserts into Mongo will take a long time; if I batch them into bulk inserts, that means I have to keep each batch in memory.
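For what it's worth, batching does not have to mean holding the whole tree in memory: nodes can be streamed through fixed-size batches so only one batch exists at a time. A minimal sketch (`walk_nodes` and `collection` are assumed names, not part of any real API here):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items, so only one
    batch ever sits in memory at a time."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage with pymongo (walk_nodes is an assumed generator
# that lazily yields one document per filesystem node):
#   for batch in chunked(walk_nodes(root), 1000):
#       collection.insert_many(batch)   # bulk insert, bounded memory

print(list(chunked(range(10), 4)))  # three batches: 4 + 4 + 2 items
```

This keeps peak memory proportional to the batch size rather than to the tree, at the cost of one round trip per batch instead of per node.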

I also considered dumping the entire tree as one deeply nested JSON document, but the data is too large for a single Mongo document. GridFS could be used to store it, but then I would have to load the entire tree in the web application even though the deep nodes may not be of interest.

Solution

Given your requirements of:

A) Low RAM usage

B) Meeting file size limitations in Mongo

C) A responsive UI

I'd consider something along the lines of the following.

Take this example directory tree:

C:\
C:\X\
C:\X\B\
C:\X\file.txt
C:\Y\
C:\Y\file.pdf
C:\Y\R\
C:\Y\R\file.js

In JSON it could possibly be represented as:

{
  "C:" : {
    "X" : {
      "B" : { },
      "file.txt" : "C:\\X\\file.txt"
    },
    "Y" : {
      "file.pdf" : "C:\\Y\\file.pdf",
      "R" : {
        "file.js" : "C:\\Y\\R\\file.js"
      }
    }
  }
}

The latter (one big nested document), as you pointed out, does not scale well with large directory structures: I can tell you first hand that browsers will not appreciate a JSON blob representing even a modest directory with a few thousand files/folders. The former (a flat list of paths), though akin to how some actual filesystems work and efficient in the right context, is a pain to convert to and from JSON.

My proposal is to break each directory into a separate JSON document. This addresses all three issues, but nothing is free: it increases code complexity, the number of requests per session, and so on.

The above structure could be broken into the following documents:

{
  "id" : "CCCCCCCC",
  "type" : "p",
  "name" : "C:",
  "children" : [
    { "name" : "X", "type" : "p", "id" : "XXXXXXXX" },
    { "name" : "Y", "type" : "p", "id" : "YYYYYYYY" }
  ]
}

{
  "id" : "XXXXXXXX",
  "type" : "p",
  "name" : "X",
  "children" : [
    { "name" : "B", "type" : "p", "id" : "BBBBBBBB" },
    { "name" : "file.txt", "type" : "f", "path" : "C:\\X\\file.txt", "size" : 1024 }
  ]
}

{
  "id" : "YYYYYYYY",
  "type" : "p",
  "name" : "Y",
  "children" : [
    { "name" : "R", "type" : "p", "id" : "RRRRRRRR" },
    { "name" : "file.pdf", "type" : "f", "path" : "C:\\Y\\file.pdf", "size" : 2048 }
  ]
}

{
  "id" : "BBBBBBBB",
  "type" : "p",
  "name" : "B",
  "children" : [ ]
}

{
  "id" : "RRRRRRRR",
  "type" : "p",
  "name" : "R",
  "children" : [
    { "name" : "file.js", "type" : "f", "path" : "C:\\Y\\R\\file.js", "size" : 2048 }
  ]
}

Here each document represents a folder and its immediate children only. Child folders can be lazy-loaded using their ids and appended to their parent in the UI. Well-implemented lazy loading can pre-load child nodes to a desired depth, creating a very responsive UI. RAM usage is minimal, as your server only has to handle small payloads per request. The number of requests does go up considerably versus a single-document approach, but again, some clever lazy loading can batch requests and reduce the total number.
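To make this concrete, here is a sketch of how a scanner could turn flat scan results into per-folder documents like the ones above. Using forward-slash paths, a `(path, is_dir)` input shape, and the folder path as its own id are all simplifications I am assuming for illustration, not part of the proposal:

```python
def build_documents(entries):
    """Turn a flat list of (path, is_dir) scan results into one
    document per folder, each listing only its immediate children.
    The folder's path doubles as its id here for simplicity; any
    unique id would do."""
    docs = {}

    def folder_doc(path):
        # Create the folder's document on first sight, then reuse it.
        if path not in docs:
            docs[path] = {"id": path, "type": "p",
                          "name": path.rsplit("/", 1)[-1], "children": []}
        return docs[path]

    for path, is_dir in entries:
        parent, _, name = path.rpartition("/")
        if is_dir:
            folder_doc(path)
            if parent:  # link the folder into its parent as a stub
                folder_doc(parent)["children"].append(
                    {"name": name, "type": "p", "id": path})
        else:           # files are embedded directly in the parent
            folder_doc(parent)["children"].append(
                {"name": name, "type": "f", "path": path})
    return list(docs.values())
```

The resulting list can then be bulk-inserted, and the web application only ever fetches one small document per folder the user expands.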

UPDATE 1: I somehow overlooked your second-to-last paragraph before answering, so this is probably more or less what you had in mind. To address the issue of too many documents, some level of clustering nodes within documents may be in order. I have to head off now, but I'll give it some thought.
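The clustering idea can be sketched as embedding each folder's subtree up to a fixed depth inside one document, with anything deeper cut off into documents of its own. The `CLUSTER_DEPTH` constant, the in-memory node shape, and the stub format below are my assumptions, not a definitive scheme:

```python
CLUSTER_DEPTH = 2  # folder levels embedded per document (assumed tunable)

def cluster(node, budget=CLUSTER_DEPTH):
    """Split a nested folder tree {"id", "name", "type", "children"}
    into documents that each embed at most `budget` folder levels.
    Returns (this_document, list_of_extra_documents)."""
    if budget == 0:
        # Cut here: the subtree becomes its own document, and the
        # parent keeps only a stub pointing at it by id.
        root, extra = cluster(node)
        return ({"name": node["name"], "type": "p", "id": node["id"]},
                [root] + extra)
    children, extra = [], []
    for child in node.get("children", []):
        if child.get("type") == "p":          # sub-folder: recurse
            doc, more = cluster(child, budget - 1)
            children.append(doc)
            extra.extend(more)
        else:                                 # file: embed as-is
            children.append(child)
    return ({"id": node["id"], "name": node["name"], "type": "p",
             "children": children}, extra)
```

For a chain of five nested folders with a depth of 2 this yields three documents instead of five, trading request count against document size.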

UPDATE 2: I've created a gist of a simplified version of the clustering concept I mentioned. It doesn't take files into account, just folders, and it doesn't include any code to update the documents. Hopefully it'll give you some ideas; I'll continue to update it for my own purposes.
