Looking at HBase Through MergeSort, Part 2


Continuing from the previous post, Looking at HBase Through MergeSort, Part 1 (从MergeSort看HBase 一): http://blog.csdn.net/mrtitan/article/details/8457080

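In HBase, scan is the most central algorithm. From the direct read operations get and scan to heavyweight operations such as compact, split, and flush, everything is implemented with one kind of scanner or another; in effect, every read a region performs is a scan. For how scan is implemented, see: http://blog.csdn.net/mrtitan/article/details/8284569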

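Scan itself is built on the merge sort idea. Below we imitate HBase's scanners to read, as a stream and in order, from several files that are each sorted on their own.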
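Before the file-based version, here is the same idea as a minimal in-memory sketch: each sorted array gets a cursor (standing in for a scanner), and a PriorityQueue ordered by the cursors' current values always yields the global minimum. The names KWayMerge, Cursor, and merge are hypothetical and do not come from HBase.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class KWayMerge {

    // A cursor over one sorted array: this sketch's stand-in for a scanner.
    static final class Cursor {
        final int[] data;
        int pos; // index of the current, not yet emitted, element

        Cursor(int[] data) {
            this.data = data;
        }

        int peek() {
            return data[pos];
        }
    }

    // Merge individually sorted arrays into one sorted array.
    public static int[] merge(int[][] sortedArrays) {
        // min-heap ordered by each cursor's current value
        PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>(
                Math.max(1, sortedArrays.length),
                Comparator.comparingInt((Cursor cur) -> cur.peek()));
        int total = 0;
        for (int[] arr : sortedArrays) {
            total += arr.length;
            if (arr.length > 0) {
                heap.add(new Cursor(arr)); // empty inputs contribute nothing
            }
        }
        int[] out = new int[total];
        int i = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // cursor holding the global minimum
            out[i++] = c.peek();
            c.pos++;                // advance only after it has left the heap
            if (c.pos < c.data.length) {
                heap.add(c);        // re-insert under its new current value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] runs = { {1, 4, 7}, {2, 5, 8}, {3, 6, 9} };
        System.out.println(Arrays.toString(merge(runs))); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With k inputs the heap never holds more than k cursors, so each emitted value costs O(log k) heap work.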

 

The SortedFile class (each public class below goes in its own .java file in the same package):

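import java.io.IOException;

public class SortedFile {

    private final String filePath;

    public SortedFile(String path) {
        this.filePath = path;
    }

    // open a Reader over this file
    public Reader getReader() throws IOException {
        return new Reader(this.filePath);
    }

    // open a Writer over this file (creates the file if it does not exist)
    public Writer getWriter() throws IOException {
        return new Writer(this.filePath);
    }
}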

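SortedFile's Writer class. Its initFileRandom method generates FILE_SIZE / INT_SIZE (here 128 / 4 = 32) random numbers, sorts them, and writes them to the file in order through a memory-mapped buffer: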

import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Writer {

    private static final int FILE_SIZE = 128; // bytes per file
    private static final int INT_SIZE = 4;    // bytes per int

    private RandomAccessFile writeFile;
    private FileChannel writeChannel;
    private MappedByteBuffer writeMbb;

    public Writer(String filePath) throws IOException {
        writeFile = new RandomAccessFile(filePath, "rw");
        writeChannel = writeFile.getChannel();
        // a READ_WRITE mapping extends the file to FILE_SIZE bytes if needed
        writeMbb = writeChannel.map(FileChannel.MapMode.READ_WRITE, 0, FILE_SIZE);
    }

    // fill the file with FILE_SIZE / INT_SIZE random ints in ascending order
    public void initFileRandom() {
        Random random = new Random();
        int writeTimes = FILE_SIZE / INT_SIZE;
        List<Integer> valueList = new ArrayList<Integer>(writeTimes);
        for (int i = 0; i < writeTimes; i++) {
            valueList.add(random.nextInt(100000)); // already in [0, 100000), no abs needed
        }
        Collections.sort(valueList);

        for (Integer value : valueList) {
            writeMbb.putInt(value); // relative put: advances the position by 4
        }
    }

    public void close() throws IOException {
        if (writeMbb != null) {
            writeMbb.force(); // flush the mapped pages to disk
        }
        // The original unmapped the buffer via sun.misc.Cleaner and DirectBuffer,
        // which only compiles on JDK 8; here the mapping is released when the
        // buffer is garbage-collected.
        closeResource(writeChannel);
        closeResource(writeFile);
    }

    private void closeResource(Closeable c) throws IOException {
        if (c != null) {
            c.close();
        }
    }
}

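SortedFile's Reader class, which provides a Scanner that reads the numbers in the file one by one: peek() returns the current number without consuming it, and next() moves to the following one.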

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Reader {

    private static final int FILE_SIZE = 128; // must match the Writer
    private static final int INT_SIZE = 4;

    private final RandomAccessFile readFile;
    private final String filePath;

    public Reader(String filePath) throws IOException {
        // the reader never modifies the file, so open it read-only
        readFile = new RandomAccessFile(filePath, "r");
        this.filePath = filePath;
    }

    public Scanner getScanner() throws IOException {
        return new Scanner(readFile);
    }

    class Scanner {

        private final MappedByteBuffer readMbb;
        private int readIndex; // byte offset of the current, not yet consumed, int

        public Scanner(RandomAccessFile readFile) throws IOException {
            FileChannel readChannel = readFile.getChannel();
            readMbb = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
            readIndex = 0;
        }

        public String getFilePath() {
            return filePath;
        }

        // current value; an absolute read, so nothing is consumed
        public Integer peek() {
            return readMbb.getInt(readIndex);
        }

        // true if another value follows the current one
        public boolean hasNext() {
            return readIndex + INT_SIZE < FILE_SIZE;
        }

        // move to the following value and return it, or null if there is none
        public Integer next() {
            if (!hasNext()) {
                return null;
            }
            readIndex += INT_SIZE;
            return peek();
        }
    }
}
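Scanner.peek() has to read the current int without consuming it. A ByteBuffer (MappedByteBuffer is a subclass) offers two reads: relative getInt(), which reads at position and advances it by 4, and absolute getInt(int index), which leaves position alone. Keeping your own byte offset and using the absolute form avoids resetting position after every peek. A small in-memory demo, using ByteBuffer.allocate in place of a mapped file; the class PeekDemo and method scanAll are hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PeekDemo {

    static final int INT_SIZE = 4;

    // Walk a buffer of ints with our own byte offset and absolute getInt(index),
    // so buf.position() is never touched.
    static int[] scanAll(ByteBuffer buf) {
        int[] out = new int[buf.limit() / INT_SIZE];
        for (int readIndex = 0, i = 0; readIndex + INT_SIZE <= buf.limit(); readIndex += INT_SIZE, i++) {
            out[i] = buf.getInt(readIndex); // peek: absolute read
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * INT_SIZE);
        buf.putInt(10).putInt(20).putInt(30); // relative writes, as in initFileRandom
        buf.flip(); // position = 0, limit = 12, ready for reading

        System.out.println(Arrays.toString(scanAll(buf)) + " position=" + buf.position());
        // prints [10, 20, 30] position=0

        System.out.println(buf.getInt() + " position=" + buf.position());
        // a relative read advances: prints 10 position=4
    }
}
```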

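MergeHandler is the core class: it merges multiple scanners into a single ordered sequence by keeping them in a PriorityQueue ordered by their current values.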

import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.Queue;

public class MergeHandler {

    // min-heap of scanners, ordered by each scanner's current value
    private final Queue<Reader.Scanner> valueHeap;

    public MergeHandler(SortedFile[] sortedFiles) throws IOException {
        IntegerScannerComparator comparator = new IntegerScannerComparator();
        // PriorityQueue rejects an initial capacity below 1
        valueHeap = new PriorityQueue<Reader.Scanner>(Math.max(1, sortedFiles.length), comparator);
        for (SortedFile sortedFile : sortedFiles) {
            // each file holds FILE_SIZE / INT_SIZE > 0 values, so every scanner starts
            // on a valid value; hasNext() only says whether ANOTHER value follows
            valueHeap.add(sortedFile.getReader().getScanner());
        }
    }

    public ResultScanner getResultScanner() {
        return new ResultScanner();
    }

    // all iterators share one heap, so the result can be iterated only once
    class ResultScanner implements Iterable<Integer> {

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                @Override
                public boolean hasNext() {
                    return !valueHeap.isEmpty();
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    // the scanner at the head of the heap holds the global minimum
                    Reader.Scanner current = valueHeap.poll();
                    int currentValue = current.peek();
                    // advance it only after it has left the heap, then re-insert it
                    if (current.hasNext()) {
                        current.next();
                        valueHeap.add(current);
                    }
                    return currentValue;
                }

                @Override
                public void remove() {
                    throw new UnsupportedOperationException("remove");
                }
            };
        }
    }
}

The Scanner comparator. With it, the PriorityQueue can order scanners by their current (peeked) values:

import java.util.Comparator;

public class IntegerScannerComparator implements Comparator<Reader.Scanner> {

    // compare scanners by the value each one currently points at
    @Override
    public int compare(Reader.Scanner o1, Reader.Scanner o2) {
        return o1.peek().compareTo(o2.peek());
    }
}
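The comparator reads each scanner's current value through peek(), and that value changes as the scanner advances. A PriorityQueue only positions an element when it is inserted, so a scanner must not be advanced while it is still inside the queue; that is why MergeHandler polls a scanner, advances it, and only then adds it back. A minimal sketch, with a hypothetical mutable Box standing in for a scanner:

```java
import java.util.PriorityQueue;

public class HeapKeyDemo {

    // Mutable holder standing in for a scanner's current value (hypothetical).
    static final class Box {
        int value;

        Box(int value) {
            this.value = value;
        }
    }

    // Wrong: change an element's key while it sits inside the heap.
    static int minAfterMutatingInPlace() {
        PriorityQueue<Box> heap = new PriorityQueue<Box>((x, y) -> Integer.compare(x.value, y.value));
        Box first = new Box(1);
        heap.add(first);
        heap.add(new Box(5));
        first.value = 10;         // the heap is not told, so first stays at the head
        return heap.peek().value; // 10, although the real minimum is 5
    }

    // Right (what MergeHandler does): poll, change the key, then re-add.
    static int minAfterPollAndReadd() {
        PriorityQueue<Box> heap = new PriorityQueue<Box>((x, y) -> Integer.compare(x.value, y.value));
        heap.add(new Box(1));
        heap.add(new Box(5));
        Box head = heap.poll();   // remove before changing the key
        head.value = 10;
        heap.add(head);           // re-insert so the heap re-positions it
        return heap.peek().value; // 5
    }

    public static void main(String[] args) {
        System.out.println(minAfterMutatingInPlace()); // 10
        System.out.println(minAfterPollAndReadd());    // 5
    }
}
```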

Test class:

import java.io.File;

public class MergeSortTest {

    public static void main(String[] args) throws Exception {
        int fileNum = 10;
        String dir = "D:/queue/";
        // RandomAccessFile does not create missing parent directories
        new File(dir).mkdirs();

        // write fileNum files, each sorted on its own
        // (they hold raw binary ints despite the .txt extension)
        SortedFile[] files = new SortedFile[fileNum];
        for (int i = 0; i < fileNum; i++) {
            SortedFile sortedFile = new SortedFile(dir + i + ".txt");
            files[i] = sortedFile;
            Writer writer = sortedFile.getWriter();
            writer.initFileRandom();
            writer.close();
        }

        // merge all files into one ascending stream
        MergeHandler mergeHandler = new MergeHandler(files);
        MergeHandler.ResultScanner resultScanner = mergeHandler.getResultScanner();

        int num = 0;
        for (int value : resultScanner) {
            num++;
            System.out.println("value :" + value + " num :" + num);
        }
    }
}

After running the test program, resultScanner yields all 320 values from the 10 files (32 per file) in ascending order.

Summary:

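1. This program is the merge half of the classic interview question "sort one billion numbers": split the numbers into chunks that fit in memory, sort each chunk into its own file, then merge the files with the code above.

2. The precondition for this merge is that every input sequence is already sorted on its own.

3. The key to the implementation is a PriorityQueue that always yields the smallest current value among all scanners; with k scanners and N values in total, the merge costs O(N log k).

4. This scanner idea is also widely borrowed elsewhere, for example in Lucene and in distributed computing, and is well worth learning.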
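Point 1 needs a caveat: MergeSortTest only performs the merge phase. To sort data too large for memory, the usual approach (external merge sort) first cuts the input into chunks that fit in memory, sorts each chunk and writes it out as a sorted run, then k-way merges the runs. The sketch below assumes int data, uses a small array as a stand-in for the huge input, treats a hypothetical chunkSize as the memory budget, and writes runs with DataOutputStream rather than the article's mapped files; all class and method names are hypothetical.

```java
import java.io.*;
import java.util.*;

public class ExternalSortSketch {

    // Sequential reader over one sorted run file: a minimal "scanner".
    static final class RunScanner {
        final DataInputStream in;
        int current;
        boolean exhausted;

        RunScanner(File f) throws IOException {
            in = new DataInputStream(new BufferedInputStream(new FileInputStream(f)));
            advance(); // load the first value
        }

        void advance() throws IOException {
            try {
                current = in.readInt();
            } catch (EOFException e) {
                exhausted = true;
                in.close();
            }
        }
    }

    // Phase 1: sort chunks of at most chunkSize values in memory and write
    // each one to its own temp file (a sorted "run").
    static List<File> writeRuns(int[] input, int chunkSize) throws IOException {
        List<File> runs = new ArrayList<File>();
        for (int start = 0; start < input.length; start += chunkSize) {
            int[] chunk = Arrays.copyOfRange(input, start, Math.min(input.length, start + chunkSize));
            Arrays.sort(chunk);
            File f = File.createTempFile("run", ".bin");
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f)))) {
                for (int v : chunk) {
                    out.writeInt(v);
                }
            }
            runs.add(f);
        }
        return runs;
    }

    // Phase 2: k-way merge of the runs through a min-heap of scanners.
    static int[] mergeRuns(List<File> runs, int total) throws IOException {
        PriorityQueue<RunScanner> heap = new PriorityQueue<RunScanner>(
                Math.max(1, runs.size()), Comparator.comparingInt((RunScanner r) -> r.current));
        for (File f : runs) {
            RunScanner s = new RunScanner(f);
            if (!s.exhausted) {
                heap.add(s);
            }
        }
        // collected into an array only so the sketch is easy to check; a real
        // external sort would stream these values to an output file instead
        int[] out = new int[total];
        int i = 0;
        while (!heap.isEmpty()) {
            RunScanner s = heap.poll();
            out[i++] = s.current;
            s.advance(); // change the key only while s is outside the heap
            if (!s.exhausted) {
                heap.add(s);
            }
        }
        return out;
    }

    public static int[] externalSort(int[] input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try {
            return mergeRuns(writeRuns(input, chunkSize), input.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 25, 61, 0, 14, 50};
        System.out.println(Arrays.toString(externalSort(data, 3)));
        // prints [0, 3, 7, 14, 19, 25, 42, 50, 61, 88]
    }
}
```

Only one chunk plus one buffered value per run needs to be in memory at a time, which is what makes the approach scale to inputs larger than RAM.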






 

 

 

 
