MIT 6.830 Lab 1 Notes: Low-Level Storage Units

马走日_ZouR-Ma

Category: MIT 6.830. Tags: database, java, sql

Article link: https://blog.csdn.net/Fitzzzz/article/details/118583725

Lab 1 Low-Level Storage Units

Exercise 1 TupleDesc and Tuple

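TupleDesc describes the schema of a table, that is, its header row: the type and name of every field in the table. TupleDesc.java keeps an array TDItem[] that represents these columns. TDItem is a static nested class defined inside TupleDesc, and each TDItem holds the type and name of one field.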

public static class TDItem implements Serializable {
 
        private static final long serialVersionUID = 1L;
 
        /**
         * The type of the field
         * */
        public final Type fieldType;
        
        /**
         * The name of the field
         * */
        public final String fieldName;
 
        public TDItem(Type t, String n) {
            this.fieldName = n;
            this.fieldType = t;
        }
 
        public String toString() {
            return fieldName + "(" + fieldType + ")";
        }
 
    }

The constructors of TupleDesc then fill in the name and type of each field. A field name may be empty.

public TupleDesc(Type[] typeAr, String[] fieldAr) {
        // some code goes here
        tdItems = new TDItem[typeAr.length];
        for(int i=0;i<typeAr.length;++i){
            tdItems[i] = new TDItem(typeAr[i],fieldAr[i]);
        }
    }

 public TupleDesc(Type[] typeAr) {
        // some code goes here
        tdItems = new TDItem[typeAr.length];
        for(int i=0;i<typeAr.length;++i){
            tdItems[i] = new TDItem(typeAr[i],"");
        }
    }

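TupleDesc.java also implements an iterator over the TDItem entries, a function returning the length of the TDItem[] array, functions returning the type and the name of the field at a given index, a function mapping a field name to its index, a function returning the size in bytes of a tuple with this schema, and a function that merges two TupleDesc objects.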

public Iterator<TDItem> iterator();
public int numFields();
public Type getFieldType(int i);   
public String getFieldName(int i);
public int fieldNameToIndex(String name);
public int getSize();
public static TupleDesc merge(TupleDesc td1, TupleDesc td2);
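To make the less obvious methods concrete, here is a minimal, self-contained sketch of getSize, fieldNameToIndex and merge. It is not the author's implementation. FieldType and SchemaSketch are hypothetical stand-ins for SimpleDB's Type and TupleDesc, and the byte lengths (4 for an int, 4 + 128 for a string) are assumptions made for the example.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for SimpleDB's Type enum.
// Assumed sizes: INT is 4 bytes, STRING is a 4-byte length plus 128 bytes.
enum FieldType {
    INT(4), STRING(132);

    final int len;

    FieldType(int len) {
        this.len = len;
    }
}

// Hypothetical stand-in for TupleDesc, using two parallel arrays.
final class SchemaSketch {
    final FieldType[] types;
    final String[] names;

    SchemaSketch(FieldType[] types, String[] names) {
        this.types = types.clone();
        this.names = names.clone();
    }

    int numFields() {
        return types.length;
    }

    // Size in bytes of one tuple with this schema: the sum of the field lengths.
    int getSize() {
        int size = 0;
        for (FieldType t : types) {
            size += t.len;
        }
        return size;
    }

    // Index of the first field with the given name.
    int fieldNameToIndex(String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i] != null && names[i].equals(name)) {
                return i;
            }
        }
        throw new NoSuchElementException("no field named " + name);
    }

    // Fields of a come first, followed by the fields of b.
    static SchemaSketch merge(SchemaSketch a, SchemaSketch b) {
        int n = a.numFields() + b.numFields();
        FieldType[] types = new FieldType[n];
        String[] names = new String[n];
        System.arraycopy(a.types, 0, types, 0, a.numFields());
        System.arraycopy(a.names, 0, names, 0, a.numFields());
        System.arraycopy(b.types, 0, types, a.numFields(), b.numFields());
        System.arraycopy(b.names, 0, names, a.numFields(), b.numFields());
        return new SchemaSketch(types, names);
    }
}
```

Merging a schema (INT id, STRING name) with (INT age) under these assumptions gives three fields, with age at index 2.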

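A Tuple is a concrete row of data whose layout is described by a TupleDesc. It stores the actual values of one row, and a set of get and set methods configure its TupleDesc, its RecordId and its Field values.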
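As a rough illustration of that shape, the sketch here stores field values in an array and a record id next to them. TupleSketch is a hypothetical class: it uses Object in place of SimpleDB's Field and RecordId types, and the tab-separated toString format is an assumption.

```java
// Hypothetical stand-in for SimpleDB's Tuple.
final class TupleSketch {
    private final Object[] fields; // stand-in for Field[]
    private Object recordId;       // stand-in for RecordId; null until placed on a page

    TupleSketch(int numFields) {
        if (numFields < 1) {
            throw new IllegalArgumentException("a tuple needs at least one field");
        }
        this.fields = new Object[numFields];
    }

    void setField(int i, Object value) {
        fields[i] = value;
    }

    Object getField(int i) {
        return fields[i];
    }

    void setRecordId(Object rid) {
        this.recordId = rid;
    }

    Object getRecordId() {
        return recordId;
    }

    // Field values separated by tabs (assumed format).
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }
}
```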

Exercise 2 Catalog

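The Catalog class describes a database instance. It keeps track of the tables that currently exist in the database and of their schemas. We need to implement adding a new table and retrieving information about a given table. The TupleDesc associated with each table determines the types and the number of fields that operations work on.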

Catalog stores its tables in a hash map, a ConcurrentHashMap<Integer,Table>, and defines a nested class Table that represents one table of the database.

private static class Table{
        private static final long serialVersionUID = 1L;
 
        public final DbFile dbFile;
        public final String tableName;
        public final String pk;
 
        public Table(DbFile file,String name,String pkeyField){
            dbFile = file;
            tableName = name;
            pk = pkeyField;
        }
 
        public String toString(){
            return tableName + "(" + dbFile.getId() + ":" + pk +")";
        }
    }

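Catalog.java implements three overloads of addTable, a function that looks up a table's id by name (the id is the Integer key of the ConcurrentHashMap<Integer,Table>), functions that return the TupleDesc, the DbFile, the primary key and the table name for a given table id, and the loadSchema function that came with the skeleton.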

public void addTable(DbFile file, String name, String pkeyField);
public void addTable(DbFile file, String name);
public void addTable(DbFile file);
public int getTableId(String name);
public TupleDesc getTupleDesc(int tableid);
public DbFile getDatabaseFile(int tableid);
public String getPrimaryKey(int tableid);
public String getTableName(int id);
public void loadSchema(String catalogFile);
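One detail that the signatures hide is what happens when a table is added twice. The following is a minimal sketch, assuming the newest table wins on a name or id conflict. CatalogSketch and Entry are hypothetical names, an int fileId stands in for DbFile.getId(), and the second name-to-id map is an addition for the example rather than something taken from the author's code.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Catalog.
final class CatalogSketch {
    // Stand-in for the Table entry; fileId plays the role of DbFile.getId().
    static final class Entry {
        final int fileId;
        final String name;
        final String pkeyField;

        Entry(int fileId, String name, String pkeyField) {
            this.fileId = fileId;
            this.name = name;
            this.pkeyField = pkeyField;
        }
    }

    private final Map<Integer, Entry> tables = new ConcurrentHashMap<>();
    private final Map<String, Integer> nameToId = new ConcurrentHashMap<>();

    synchronized void addTable(int fileId, String name, String pkeyField) {
        if (name == null) {
            throw new IllegalArgumentException("table name must not be null");
        }
        // Same name, different id: drop the table that used to own the name.
        Integer oldId = nameToId.put(name, fileId);
        if (oldId != null && oldId != fileId) {
            tables.remove(oldId);
        }
        // Same id, different name: drop the stale name.
        Entry old = tables.put(fileId, new Entry(fileId, name, pkeyField));
        if (old != null && !old.name.equals(name)) {
            nameToId.remove(old.name);
        }
    }

    int getTableId(String name) {
        Integer id = (name == null) ? null : nameToId.get(name);
        if (id == null) {
            throw new NoSuchElementException("no table named " + name);
        }
        return id;
    }

    String getTableName(int id) {
        Entry e = tables.get(id);
        if (e == null) {
            throw new NoSuchElementException("no table with id " + id);
        }
        return e.name;
    }
}
```

Keeping a separate name index makes getTableId a direct lookup instead of a scan over all tables, at the cost of keeping two maps consistent in addTable.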

Exercise 3 Partial BufferPool implementation

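The buffer pool (the BufferPool class in SimpleDB) caches in memory the pages that were most recently read from disk. All reads and writes to the various files on disk go through the buffer pool. The numPages parameter of BufferPool fixes the maximum number of pages it may cache, and later labs add an eviction policy. In this lab only the constructor and BufferPool.getPage() are required. BufferPool should hold at most numPages pages; if more pages than that are requested in this lab, do not implement eviction yet and throw a DbException instead.
In this exercise the real work is getPage(). Get the catalog through Database.getCatalog(), look up the DbFile of the table with getDatabaseFile, read the requested page with readPage, and store it in pageStore, which is a ConcurrentHashMap<Integer,Page>.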
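The lookup-then-load flow can be sketched independently of SimpleDB. In this hedged example PageCacheSketch is a hypothetical class, the loader function stands in for DbFile.readPage, and IllegalStateException stands in for DbException, which is a checked exception in the lab code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for BufferPool: a bounded page cache without eviction.
final class PageCacheSketch<K, P> {
    private final int capacity;          // plays the role of numPages
    private final Function<K, P> loader; // stands in for DbFile.readPage
    private final Map<K, P> pages = new ConcurrentHashMap<>();

    PageCacheSketch(int capacity, Function<K, P> loader) {
        this.capacity = capacity;
        this.loader = loader;
    }

    // synchronized so the check and the insert happen as one step.
    synchronized P getPage(K pid) {
        P cached = pages.get(pid);
        if (cached != null) {
            return cached; // cache hit: no disk access
        }
        if (pages.size() >= capacity) {
            // No eviction policy yet, so a full pool is an error.
            throw new IllegalStateException("buffer pool is full");
        }
        P loaded = loader.apply(pid);
        pages.put(pid, loaded);
        return loaded;
    }
}
```

A second request for the same page id is served from the map, so the loader runs once per distinct page.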

Exercise 4 HeapPage implementation

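HeapPageId.java stores the pairing of a table id and a page number, and implements functions to read the table id, read the page number, return the hashCode, and so on.
RecordId.java stores the pairing of a page id and a tuple number, and implements functions to read the tuple number, read the page id, return the hashCode, and so on.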
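These id classes end up as map keys (or as the source of an Integer key, as the pageStore type suggests), so equals and hashCode have to agree with each other. A minimal sketch of that contract follows. PageKeySketch is a hypothetical stand-in for HeapPageId, and combining the two ints with Objects.hash is one possible choice, not necessarily the one the author used.

```java
import java.util.Objects;

// Hypothetical stand-in for HeapPageId: an immutable (tableId, pageNumber) pair.
final class PageKeySketch {
    private final int tableId;
    private final int pageNumber;

    PageKeySketch(int tableId, int pageNumber) {
        this.tableId = tableId;
        this.pageNumber = pageNumber;
    }

    int getTableId() {
        return tableId;
    }

    int getPageNumber() {
        return pageNumber;
    }

    // Two keys are equal exactly when both components match.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PageKeySketch)) {
            return false;
        }
        PageKeySketch other = (PageKeySketch) o;
        return tableId == other.tableId && pageNumber == other.pageNumber;
    }

    // Equal keys must produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(tableId, pageNumber);
    }
}
```

Note that distinct pages can still share a hash code, so a map keyed by the hash code alone can confuse two pages; keying by the id object itself avoids that.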

HeapPage.java turns the bytes read from disk into a HeapPage object. In effect, a HeapFile is materialized as a series of HeapPage instances.

The HeapPage constructor initializes the HeapPageId, the TupleDesc, numSlots (the number of tuples one HeapPage can hold), header[] (a bitmap recording whether each tuple slot is in use) and the Tuple[] array.

public HeapPage(HeapPageId id, byte[] data) throws IOException {
        this.pid = id;
        this.td = Database.getCatalog().getTupleDesc(id.getTableId());
        this.numSlots = getNumTuples();
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
 
        // allocate and read the header slots of this page
        header = new byte[getHeaderSize()];
        for (int i=0; i<header.length; i++)
            header[i] = dis.readByte();
        
        tuples = new Tuple[numSlots];
        try{
            // allocate and read the actual records of this page
            for (int i=0; i<tuples.length; i++)
                tuples[i] = readNextTuple(dis,i);
        }catch(NoSuchElementException e){
            e.printStackTrace();
        }
        dis.close();
 
        setBeforeImage();
    }

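HeapPage.java also implements a function that reads the next tuple, a function that counts the empty slots, and a function that checks whether a given slot is in use.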

private Tuple readNextTuple(DataInputStream dis, int slotId);
public int getNumEmptySlots();
public boolean isSlotUsed(int i);
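The slot arithmetic behind these functions is worth spelling out. The sketch assumes the page layout from the lab handout as I understand it: each tuple costs its size in bits plus one header bit, the header is rounded up to whole bytes, and the lowest bit of each header byte belongs to the earliest slot. SlotMathSketch and its method names are hypothetical.

```java
// Hypothetical helper collecting the HeapPage slot arithmetic.
final class SlotMathSketch {
    // Each tuple needs tupleSizeBytes * 8 data bits plus 1 header bit.
    static int numTuples(int pageSizeBytes, int tupleSizeBytes) {
        return (pageSizeBytes * 8) / (tupleSizeBytes * 8 + 1);
    }

    // One header bit per slot, rounded up to whole bytes.
    static int headerSize(int numTuples) {
        return (numTuples + 7) / 8;
    }

    // Slot i lives in byte i / 8, at bit position i % 8 (lowest bit first).
    static boolean isSlotUsed(byte[] header, int i) {
        return ((header[i / 8] >> (i % 8)) & 1) == 1;
    }

    // Count the slots whose header bit is 0.
    static int numEmptySlots(byte[] header, int numSlots) {
        int empty = 0;
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(header, i)) {
                empty++;
            }
        }
        return empty;
    }
}
```

For a 4096-byte page and 8-byte tuples (two int fields), this gives 32768 / 65 = 504 slots and a 63-byte header.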

Exercise 5 HeapFile implementation

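HeapFile is an implementation of DbFile that stores a collection of tuples in no particular order. Tuples are stored on pages, every page has the same fixed size, and the file is simply a collection of those pages. HeapFile works closely with HeapPage; the format of a HeapPage is described in the HeapPage constructor.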

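The HeapFile constructor initializes two member variables, the File and the TupleDesc, and the class provides getters for them. It also provides readPage, which takes a PageId and returns the Page. The point to watch is that you must compute the offset of the requested page within the file, move the file pointer there with seek, and only then read the bytes.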

public Page readPage(PageId pid) {
        int tableId = pid.getTableId();
        int pgNo = pid.getPageNumber();
        int pageSize = BufferPool.getPageSize();
        // Use long arithmetic so that large page numbers do not overflow int.
        long offset = (long) pgNo * pageSize;

        // try-with-resources closes the file exactly once on every path.
        try (RandomAccessFile f = new RandomAccessFile(file, "r")) {
            if (offset + pageSize > f.length()) {
                throw new IllegalArgumentException(String.format("table %d page %d is invalid", tableId, pgNo));
            }
            byte[] bytes = new byte[pageSize];
            f.seek(offset);
            // readFully either fills the whole buffer or throws an IOException.
            f.readFully(bytes);
            return new HeapPage(new HeapPageId(tableId, pgNo), bytes);
        } catch (IOException e) {
            throw new IllegalArgumentException(String.format("table %d page %d could not be read", tableId, pgNo), e);
        }
    }
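The offset computation, and the page count that the iterator relies on through numPages(), can be sketched as two small functions. PageOffsetSketch is a hypothetical helper, and rounding the page count down is an assumption about how a partially written last page should be treated.

```java
// Hypothetical helper for HeapFile page arithmetic.
final class PageOffsetSketch {
    // Byte offset of a page; the cast to long happens before the multiplication
    // so that large files do not overflow int.
    static long pageOffset(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize;
    }

    // Number of complete pages in a file of the given length (rounded down).
    static int numPages(long fileLength, int pageSize) {
        return (int) (fileLength / pageSize);
    }
}
```

With 4096-byte pages, page 1,000,000 starts at byte 4,096,000,000, which no longer fits in an int.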

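Next comes the hardest function in HeapFile.java: the one that returns a DbFileIterator. We first create a class HeapFileIterator that implements DbFileIterator. Its constructor stores the HeapFile and the TransactionId, open() calls getPageTuples to obtain an Iterator<Tuple> over the first page, and the class also overrides hasNext, next, rewind and close.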

    private static final class HeapFileIterator implements DbFileIterator{
        private final HeapFile heapFile;
        private final TransactionId tid;
        private Iterator<Tuple> it;
        private int whichPage;

        public HeapFileIterator(HeapFile file,TransactionId tid){
            this.heapFile = file;
            this.tid = tid;
        }
        @Override
        public void open() throws DbException, TransactionAbortedException {
            // start at the first page of the file
            whichPage = 0;
            it = getPageTuples(whichPage);
        }

        // Fetch the page through the buffer pool and return an iterator over its tuples.
        private Iterator<Tuple> getPageTuples(int pageNumber) throws TransactionAbortedException, DbException{
            if(pageNumber >= 0 && pageNumber < heapFile.numPages()){
                HeapPageId pid = new HeapPageId(heapFile.getId(),pageNumber);
                HeapPage page = (HeapPage)Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
                return page.iterator();
            }else{
                // arguments follow the order of the placeholders: file id first, then page number
                throw new DbException(String.format("heapfile %d does not contain page %d!", heapFile.getId(), pageNumber));
            }
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            if(it == null){
                // not opened yet, or already closed
                return false;
            }
            // Skip exhausted or empty pages until a tuple is found
            // or the last page has been checked.
            while(!it.hasNext() && whichPage < heapFile.numPages()-1){
                whichPage++;
                it = getPageTuples(whichPage);
            }
            return it.hasNext();
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            // Call this class's hasNext() so that the iterator can move on to the next page.
            if(!hasNext()){
                throw new NoSuchElementException();
            }
            return it.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            // restart from the first page
            close();
            open();
        }

        @Override
        public void close() {
            // a null iterator marks the closed state
            it = null;
        }

    }
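The page-hopping logic is easier to test once it is separated from the buffer pool. In this sketch a list of lists stands in for the pages of a heap file, and PagedIteratorSketch is a hypothetical class. The point it illustrates is that hasNext should keep advancing while the current page is exhausted, so that an empty page in the middle of the file does not end the scan early.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for HeapFileIterator: each inner list plays the role of one page.
final class PagedIteratorSketch<T> implements Iterator<T> {
    private final List<List<T>> pages;
    private int whichPage = 0;
    private Iterator<T> it;

    PagedIteratorSketch(List<List<T>> pages) {
        this.pages = pages;
        this.it = pages.isEmpty()
                ? Collections.<T>emptyIterator()
                : pages.get(0).iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted or empty pages until an element is found
        // or the last page has been checked.
        while (!it.hasNext() && whichPage < pages.size() - 1) {
            whichPage++;
            it = pages.get(whichPage).iterator();
        }
        return it.hasNext();
    }

    @Override
    public T next() {
        // Going through hasNext() lets next() cross page boundaries by itself.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return it.next();
    }
}
```

Iterating over the pages [1], [], [2, 3] with this class visits 1, 2 and 3 in order.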

Exercise 6 Operators implementation

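Database operators are responsible for actually executing a query. In SimpleDB, operators are iterator based: each one implements the DbIterator interface. SeqScan is the sequential scan operator, which iterates over all the data in a table.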

The SeqScan constructor initializes four member variables: the TransactionId, the tableId, the tableAlias and the DbFileIterator.

public SeqScan(TransactionId tid, int tableid, String tableAlias) {
        // some code goes here
        this.tid = tid;
        this.tableId = tableid;
        this.tableAlias = tableAlias;
        it = Database.getCatalog().getDatabaseFile(tableId).iterator(tid);
    }

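SeqScan also implements the getters, reset, and similar methods. The key functions are getTupleDesc, hasNext and next; hasNext and next simply delegate to the wrapped DbFileIterator.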

public TupleDesc getTupleDesc() {
        // some code goes here
        TupleDesc tupleDesc = Database.getCatalog().getTupleDesc(tableId);
        int numFields = tupleDesc.numFields();
        Type[] types = new Type[numFields];
        String[] fieldNames = new String[numFields];
        String prefix = "null.";
        if(getAlias() != null) {
            prefix = getAlias() + ".";
        }
        for(int i = 0; i < numFields; i++) {
            types[i] = tupleDesc.getFieldType(i);
            fieldNames[i] = prefix + tupleDesc.getFieldName(i);
        }
        return new TupleDesc(types, fieldNames);
    }

    public boolean hasNext() throws TransactionAbortedException, DbException {
        // some code goes here
        if(it == null){
            return false;
        }
        return it.hasNext();
    }

    public Tuple next() throws NoSuchElementException,
            TransactionAbortedException, DbException {
        // some code goes here
        if(it == null){
            throw new NoSuchElementException("no next tuple");
        }
        Tuple t = it.next();
        if(t == null){
            throw new NoSuchElementException("no next tuple");
        }
        return t;

    }
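The alias handling in getTupleDesc boils down to prefixing every field name. A minimal sketch, with AliasPrefixSketch as a hypothetical helper: because Java string concatenation renders a null reference as the text "null", a null alias produces the same "null." prefix that the code above sets explicitly.

```java
// Hypothetical helper mirroring the field-name prefixing in SeqScan.getTupleDesc.
final class AliasPrefixSketch {
    static String[] prefixFieldNames(String alias, String[] fieldNames) {
        String[] out = new String[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            // alias + "." yields "null." when alias is null
            out[i] = alias + "." + fieldNames[i];
        }
        return out;
    }
}
```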

References:
https://blog.csdn.net/hjw199666/category_9588041.html Special thanks to hjw199666, who gave me a lot of guidance on the code while I worked through 6.830; much of my code is adapted from his.
https://www.zhihu.com/people/zhi-yue-zhang-42/posts
