java将文件内的数据排序,Java中大型数据集的基于文件的合并排序

given large datasets that don't fit in memory, is there any library or api to perform sort in Java?

the implementation would possibly be similar to linux utility sort.

解决方案

Java provides a general-purpose sorting routine which can be used as part of the larger solution to your problem. A common approach to sort data that's too large to all fit in memory is this:

1) Read as much data as will fit into main memory, let's say it's 1 Gb

2) Quicksort that 1 Gb (here's where you'd use Java's built-in sort from the Collections framework)

3) Write that sorted 1 Gb to disk as "chunk-1"

4) Repeat steps 1-3 until you've gone through all the data, saving each data chunk in a separate file. So if your original data was 9 Gb, you will now have 9 sorted chunks of data labeled "chunk-1" thru "chunk-9"

5) You now just need a final merge sort to merge the 9 sorted chunks into a single fully sorted data set. The merge sort will work very efficiently against these pre-sorted chunks. It will essentially open 9 file readers (one for each chunk), plus one file writer (for output). It then compares the first data element in each read file and selects the smallest value, which is written to the output file. The reader from which that selected value came advances to its next data element, and the 9-way comparison process to find the smallest value is repeated, again writing the answer to the output file. This process repeats until all data has been read from all the chunk files.

6) Once step 5 has finished reading all the data you are done -- your output file now contains a fully sorted data set

With this approach you could easily write a generic "megasort" utility of your own that takes a filename and maxMemory parameter and efficiently sorts the file by using temp files. I'd bet you could find at least a few implementations out there for this, but if not you can just roll your own as described above.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值