disk排序操作的过程与分类
1 disk sort过程
When the WORKAREA_SIZE_POLICY parameter is set to MANUAL, the maximum amount of memory allocated for a sort is defined by the parameter SORT_AREA_SIZE. If the sort operation is not able to completely fit into SORT_AREA_SIZE memory, then the sort is separated into phases.The temporary output of each phase is stored in temporary segments on disk. The tablespace in which these sort segments are created is the users temporary tablespace.When Oracle writes sort operations to disk, it writes out partially sorted data in sorted runs. After all the data has been received by the sort, Oracle merges the runs to produce the final sorted output. If the sort area is not large enough to merge all the runs at once, then subsets of the runs are merged in several merge passes. If the sort area is larger, then there are fewer, longer runs produced. A larger sort area also means that the sort can merge more runs in one merge pass.
从上面doc看来,排序始终是在内存里完成的,如果要排序的数据量很大,在内存里不能完成,oracle会分阶段来排序,每次先排一部分,并且把排好序的数据临时存放在用户default temporary tablespace中的temp segment上,而临时表空间对应的tempfile属于disk文件,这就是disk sort的由来。
2 Sort的三种情况:
optimail(in memory sort),one pass sort,mutli-pass sort
TOM 大师的解析:
in memory sort Everything fits in memory, no disk is needed, everything done in ram.
one pass sort We read some amount of data and sort it, the sort workarea fills up, we write that to disk. We do it again (read more, sort it) - this too spills to disk. We do this over and over until we've read the entire set of data and sorted each of the chunks. Now we need read a bit of each of the chunks we've sorted on disk and start returning data from there. In the one pass sort, we can read a bit of each chunk and start returning data. If we cannot read a bit of EVERY chunk then....
we are in the realm of the multi-pass sort. We read and merge the first few chunks - resulting in a new set. We read and merge another set of chunks - resulting in a new set. And so on - until we've processed all of the output from the first pass and created a "second pass" of more merged (more sorted) data. Then we merge those in the sort area and start returning data.
sort的模式可以分为三类
1. cache sort,即排序都在内存中完成,不需要写道temp表空间中,此时所需的sort area size 为 input size/0.9 (大约只有90%的排序区用于实际的排序)
2. One pass sort. 当排序操作不能完全在内存中完成,就需要暂时存储到temp表空间物理文件上。一次内存排序后的结果集称之为一次sort run,个数大约等于input size/(sort_area_size*0.9)。每个单独的sort run都是排好序的,这时候将多个sort run合并成一个有序的数列就要进行merge操作,merge操作就是从各个sort run中从头开始依次读入block,例如一开始每个sort run读入4个block,这样如果读入的这些block(4*n)可以存放在sort area 中就称为one pass sort,也就是一次merge就完成了,如果不能存放入sort area中,就需要做多次merge,这时性能会大大降低。
3. mutil-pass sort就是上面所说的要做多次merge的情况
3 需要一个例子来理解。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/10248702/viewspace-621043/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/10248702/viewspace-621043/