Revisiting MIT 6.830 Lab 1: A Detailed Walkthrough (Complete)

wwxy261

Link to this article: https://blog.csdn.net/wwxy1995/article/details/116076240

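The project is implemented in Java and leans heavily on object-oriented design.

Part 1 abstracts the table structure of a relational database. Implement the following classes: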

src/java/simpledb/storage/TupleDesc.java
src/java/simpledb/storage/Tuple.java

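This project is a little unusual to write. Normally, when we build something ourselves, we design the data first and then let IDEA generate the getters and setters. Here the design runs the other way: the getters and setters already exist, and we fill in the data members behind them. The goal is to understand the overall architecture through the exercise.

While implementing these classes, note that equals, hashCode, and toString can all be generated automatically by IDEA, which is one of Java's conveniences.

There are a few things to watch for with Java's ArrayList and String.join, but I won't go into them here.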
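To make the equals / hashCode / toString point concrete, here is a minimal sketch. `MiniTupleDesc` is a hypothetical, simplified stand-in (field types are plain strings), not the lab's TupleDesc. Comparing types only and ignoring names is an assumption of this sketch, so check it against the Javadoc in your copy of the lab.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for TupleDesc: field types are plain strings here.
class MiniTupleDesc {
    private final List<String> types = new ArrayList<>();
    private final List<String> names = new ArrayList<>();

    MiniTupleDesc(String[] typeAr, String[] fieldAr) {
        for (int i = 0; i < typeAr.length; i++) {
            types.add(typeAr[i]);
            names.add(fieldAr[i]);
        }
    }

    int numFields() {
        return types.size();
    }

    // Assumption of this sketch: two descriptors are equal when their field
    // types match position by position; field names are ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MiniTupleDesc)) return false;
        return types.equals(((MiniTupleDesc) o).types);
    }

    // Must agree with equals, so it hashes the types only.
    @Override
    public int hashCode() {
        return Objects.hash(types);
    }

    // Renders as "type(name), type(name), ..." using String.join.
    @Override
    public String toString() {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < types.size(); i++) {
            parts.add(types.get(i) + "(" + names.get(i) + ")");
        }
        return String.join(", ", parts);
    }
}
```

Whatever equality rule you pick, hashCode has to be derived from the same fields, otherwise the descriptor misbehaves as a HashMap key.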

Part 2 implements the Catalog, the class that manages all table schemas. It is a singleton.

src/java/simpledb/common/Catalog.java

No Table definition is provided here, so we need to write our own Table class:

    class Table{
        public Table(DbFile file, String name, String pkeyField){
            this.file = file;
            this.name = name;
            this.pkeyField = pkeyField;
            this.tupleDesc = file.getTupleDesc();
        }
        public DbFile file;
        public String name;
        public String pkeyField;
        public TupleDesc tupleDesc;
    }

    HashMap<Integer, Table> tables = new HashMap<>();
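The map above is keyed by table id, so lookups by id are direct while lookups by name need a scan. A rough sketch of how such a catalog might behave follows. `MiniCatalog` and its methods are hypothetical, the file is reduced to an int id, and the rule that a later table with the same name replaces the earlier one is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical miniature of a catalog: the DbFile is reduced to an int id.
class MiniCatalog {
    static class Table {
        final int fileId;
        final String name;
        final String pkeyField;

        Table(int fileId, String name, String pkeyField) {
            this.fileId = fileId;
            this.name = name;
            this.pkeyField = pkeyField;
        }
    }

    private final Map<Integer, Table> tables = new HashMap<>();

    void addTable(int fileId, String name, String pkeyField) {
        // Assumption: on a name conflict, the table added last wins.
        tables.values().removeIf(t -> t.name.equals(name));
        tables.put(fileId, new Table(fileId, name, pkeyField));
    }

    // Lookup by name scans the values because the map is keyed by id.
    int getTableId(String name) {
        for (Table t : tables.values()) {
            if (t.name.equals(name)) return t.fileId;
        }
        throw new NoSuchElementException("no table named " + name);
    }

    String getTableName(int id) {
        Table t = tables.get(id);
        if (t == null) throw new NoSuchElementException("no table with id " + id);
        return t.name;
    }
}
```

If name lookups turn out to be frequent, a second map from name to id would avoid the scan at the cost of keeping two maps in sync.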

Part 3 implements a BufferPool

src/java/simpledb/storage/BufferPool.java

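Only the constructor and getPage need to be implemented.

Again the task is to choose the data members. A HashMap is the obvious choice, but with concurrency in mind I use a thread-safe map (ConcurrentHashMap) together with a read-write lock. This will be revised later.

When getPage misses, the page has to be read from disk. That handling can be implemented right away: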

    private ConcurrentHashMap<PageId, Page> pageTable;

    private ReadWriteLock rwLock;

    public  Page getPage(TransactionId tid, PageId pid, Permissions perm)
        throws TransactionAbortedException, DbException {
        // some code goes here
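        rwLock.readLock().lock();
        try {
            Page page = pageTable.get(pid);
            if(page == null){
                // page fault: read the page from disk and cache it for later calls
                DbFile file = Database.getCatalog().getDatabaseFile(pid.getTableId());
                page = file.readPage(pid);
                if(page != null){
                    pageTable.put(pid, page);
                }
            }
            return page;
        } finally {
            // release the lock even if the lookup or the disk read throws
            rwLock.readLock().unlock();
        }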
    }
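The cache-miss path can also be written with `ConcurrentHashMap.computeIfAbsent`, which looks up the key and runs the loader only when no entry exists. The sketch below is a toy illustration, not the lab's BufferPool: `MiniBufferPool`, the int page numbers, and the `diskReader` function are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical miniature of a buffer pool: pages are byte arrays keyed by page number.
class MiniBufferPool {
    private final ConcurrentHashMap<Integer, byte[]> pageTable = new ConcurrentHashMap<>();
    private final Function<Integer, byte[]> diskReader; // stands in for DbFile.readPage

    MiniBufferPool(Function<Integer, byte[]> diskReader) {
        this.diskReader = diskReader;
    }

    byte[] getPage(int pageNo) {
        // On a hit the cached page is returned; on a miss the loader runs
        // and its result is stored before being returned.
        return pageTable.computeIfAbsent(pageNo, diskReader);
    }

    int cachedPages() {
        return pageTable.size();
    }
}
```

This sketch has no eviction and no capacity limit, both of which the real buffer pool needs in later labs.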

Part 4

Implement the following three classes. HeapPage is an implementation of the Page interface.

src/java/simpledb/storage/HeapPageId.java
src/java/simpledb/storage/RecordId.java
src/java/simpledb/storage/HeapPage.java

Most of the work is in HeapPage.

HeapPage is a structure that can be serialized to binary for storage. The key quantities are

_tuples per page_ = floor((_page size_ * 8) / (_tuple size_ * 8 + 1))

and headerBytes = ceiling(tupsPerPage / 8)

because the header is a bit array in which each bit marks whether one of the tupsPerPage slots is in use.

With these two formulas, the two related methods are easy to implement.
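As a worked example of the two formulas, the sketch below plugs in a 4096-byte page and a 12-byte tuple (three 4-byte ints). Both numbers are assumptions chosen for illustration, and `HeapPageMath` is a hypothetical helper, not part of the lab code.

```java
// Hypothetical helper that evaluates the two formulas with integer arithmetic.
class HeapPageMath {
    // floor((pageSize * 8) / (tupleSize * 8 + 1)): each tuple costs its
    // bits plus one header bit. Integer division floors for positive values.
    static int tuplesPerPage(int pageSizeBytes, int tupleSizeBytes) {
        return (pageSizeBytes * 8) / (tupleSizeBytes * 8 + 1);
    }

    // ceiling(tuplesPerPage / 8) without floating point.
    static int headerBytes(int tuplesPerPage) {
        return (tuplesPerPage + 7) / 8;
    }

    public static void main(String[] args) {
        int slots = tuplesPerPage(4096, 12);
        // prints "337 slots, 43 header bytes"
        System.out.println(slots + " slots, " + headerBytes(slots) + " header bytes");
    }
}
```

A quick sanity check: 337 * 12 + 43 = 4087 bytes, which fits in the 4096-byte page, while 338 tuples would need 338 * 97 = 32786 bits, more than the 32768 available.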

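The other piece is the bitmap handling.

We need to count how many 0 bits the bitmap contains.

We turn that into counting how many 1 bits it contains,

and then use the lowbit trick: b &= (b - 1) clears the lowest set bit, so the inner loop runs once per 1 bit.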

    /**
     * Returns the number of empty slots on this page.
     */
    public int getNumEmptySlots() {
        // some code goes here
        int num = 0;
        for(byte b : header){
            while(b != 0){
                num++;
                b &= (b-1);
            }
        }
        return getNumTuples() - num;
    }
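An alternative to the hand-written loop is the standard library's `Integer.bitCount`. The sketch below is one possible way to use it. `SlotCounter` is a hypothetical helper, and the `& 0xFF` mask is there because a Java byte is signed and would otherwise be sign-extended to 32 bits.

```java
// Hypothetical helper: counts used and empty slots in a header bitmap.
class SlotCounter {
    static int usedSlots(byte[] header) {
        int used = 0;
        for (byte b : header) {
            // Mask to the low 8 bits so negative bytes are not sign-extended.
            used += Integer.bitCount(b & 0xFF);
        }
        return used;
    }

    // Assumes padding bits beyond numSlots in the last header byte are 0.
    static int emptySlots(byte[] header, int numSlots) {
        return numSlots - usedSlots(header);
    }
}
```

For example, a header of {0xFF, 0x05} with 12 slots has 10 used slots and 2 empty ones.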

Next, use the bitmap to check whether a slot is in use:

    /**
     * Returns true if associated slot on this page is filled.
     */
    public boolean isSlotUsed(int i) {
        // some code goes here
        byte b = header[i / 8];
        return (b & (1 << (i % 8))) != 0;
    }
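The same indexing (byte i / 8, bit i % 8, counting from the least significant bit) also works for setting and clearing a slot. Lab 1 does not require this. The sketch below only illustrates the bit layout, and `SlotBits` is a hypothetical helper.

```java
// Hypothetical helper showing the bit layout used by isSlotUsed.
class SlotBits {
    static boolean isUsed(byte[] header, int i) {
        return (header[i / 8] & (1 << (i % 8))) != 0;
    }

    static void mark(byte[] header, int i, boolean used) {
        if (used) {
            header[i / 8] |= 1 << (i % 8);    // set the slot's bit
        } else {
            header[i / 8] &= ~(1 << (i % 8)); // clear the slot's bit
        }
    }
}
```

The compound assignments narrow the int result back to byte implicitly, which is why no explicit cast is needed.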

Then combine the bitmap with an iterator, which is also routine:

    /**
     * @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
     * (note that this iterator shouldn't return tuples in empty slots!)
     */
    public Iterator<Tuple> iterator() {
        // some code goes here
        return new Iterator<Tuple>(){

            int i = 0; // index of the next slot to examine

            @Override
            public boolean hasNext() {
                // skip empty slots so that i rests on the next used slot, if any
                while(i < getNumTuples() && !isSlotUsed(i)){
                    i++;
                }
                return i < getNumTuples();
            }

            @Override
            public Tuple next() {
                if(!hasNext()){
                    throw new NoSuchElementException();
                }
                return tuples[i++];
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

}
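A page iterator has to cope with used slots that are not contiguous, for example when only slots 3 and 5 are filled. The standalone sketch below shows one way to handle that by treating the cursor as a slot index and skipping empty slots. `UsedSlotIterator` is hypothetical and uses strings in place of tuples.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the page iterator: tuples are plain strings here.
class UsedSlotIterator implements Iterator<String> {
    private final byte[] header;   // one bit per slot, least significant bit first
    private final String[] slots;  // slot contents; entries in empty slots are ignored
    private int i = 0;             // next slot index to examine

    UsedSlotIterator(byte[] header, String[] slots) {
        this.header = header;
        this.slots = slots;
    }

    private boolean isSlotUsed(int slot) {
        return (header[slot / 8] & (1 << (slot % 8))) != 0;
    }

    @Override
    public boolean hasNext() {
        // Advance past empty slots; stop at the first used slot or the end.
        while (i < slots.length && !isSlotUsed(i)) {
            i++;
        }
        return i < slots.length;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return slots[i++];
    }
}
```

Since Java 8 the default `Iterator.remove()` already throws UnsupportedOperationException, so the sketch does not override it.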

Part 5 implements HeapFile, which is an implementation of the DbFile interface. The number of pages in the file is computed as follows:

    /**
     * Returns the number of pages in this HeapFile.
     */
    public int numPages() {
        // some code goes here
        // divide as long first, then narrow; casting f.length() alone would overflow on files over 2 GB
        return (int) (f.length() / BufferPool.getPageSize());
    }

Reading a given Page from the DbFile then uses the standard file-offset approach:

    // see DbFile.java for javadocs
    public Page readPage(PageId pid) {
        // some code goes here
        // read the requested page of the file
        // try-with-resources closes the file handle even if reading fails
        try (RandomAccessFile rfile = new RandomAccessFile(f, "r")) {
            int pageSize = BufferPool.getPageSize();
            byte[] buffer = new byte[pageSize];
            rfile.seek((long) pid.getPageNumber() * pageSize);
            if(rfile.read(buffer) == -1){
                return null;
            }
            return new HeapPage(new HeapPageId(pid.getTableId(), pid.getPageNumber()), buffer);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
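The offset arithmetic can be tried in isolation on any file whose length is a multiple of the page size. The sketch below is a self-contained illustration under that assumption. `PageFileReader` is a hypothetical helper, and throwing IllegalArgumentException for an out-of-range page is a choice made here, not something taken from the code above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: reads fixed-size pages from a file by offset.
class PageFileReader {
    static int numPages(File f, int pageSize) {
        // Divide as long first, then narrow to int.
        return (int) (f.length() / pageSize);
    }

    static byte[] readPage(File f, int pageNo, int pageSize) throws IOException {
        if (pageNo < 0 || pageNo >= numPages(f, pageSize)) {
            throw new IllegalArgumentException("page " + pageNo + " is out of range");
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            byte[] buffer = new byte[pageSize];
            // Page n starts at byte n * pageSize; the long cast avoids int overflow.
            raf.seek((long) pageNo * pageSize);
            // readFully either fills the whole buffer or throws EOFException.
            raf.readFully(buffer);
            return buffer;
        }
    }
}
```

Unlike a single `read` call, `readFully` never returns a partially filled buffer, which makes a short read visible as an exception.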

Part 7 implements a command-line test and then runs it against a small data set:

1,1,1
2,2,2 
3,4,4

public class test {

    public static void main(String[] argv) {

        // construct a 3-column table schema
        Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE };
        String names[] = new String[]{ "field0", "field1", "field2" };
        TupleDesc descriptor = new TupleDesc(types, names);

        // create the table, associate it with some_data_file.dat
        // and tell the catalog about the schema of this table.
        HeapFile table1 = new HeapFile(new File("some_data_file.dat"), descriptor);
        Database.getCatalog().addTable(table1, "test");

        // construct the query: we use a simple SeqScan, which spoonfeeds
        // tuples via its iterator.
        TransactionId tid = new TransactionId();
        SeqScan f = new SeqScan(tid, table1.getId());
        try {
            // and run it
            f.open();
            //System.out.println(1);
            while (f.hasNext()) {
                Tuple tup = f.next();
                System.out.println(tup);
            }
            f.close();
            Database.getBufferPool().transactionComplete(tid);
        } catch (Exception e) {
            System.out.println ("Exception : " + e);
        }
    }

}