6.830 / 6.814: Syllabus 2021 - MIT Lab 2 - SimpleDB Operators

最新推荐文章于 2023-07-15 17:29:17 发布

weixin_Akian

最新推荐文章于 2023-07-15 17:29:17 发布

阅读量648

点赞数

分类专栏： DB-Class 6.830 文章标签： mysql

本文链接：https://blog.csdn.net/weixin_44337475/article/details/118606499

版权

DB-Class 6.830 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

文章目录

在此实验中，您将为SimpleDB编写一组运算符，以实现表修改（例如，插入和删除记录），选择，连接和聚合。这些将构建在实验1中写入的基础之上，为您提供一个可以在多个表上执行简单查询的数据库系统。

此外，我们忽略了实验中的缓冲池管理问题：我们没有处理引用多个页面时出现的问题。在Lab 2中，您将设计一个驱动策略来从缓冲池中刷新陈旧页面。
您无需在此实验中实现事务或锁定。

1.参考链接：

1.1 课程链接：http://db.lcs.mit.edu/6.830/assign.php
1.2 lab2链接：https://github.com/MIT-DB-Class/simple-db-hw-2021/blob/master/lab2.md
1.3 HearmingBear Lab2链接：https://blog.csdn.net/hjw199666/article/details/103590963
1.4 我的仓库地址：https://gitee.com/zhou-zhiqiang/db-class

2.SimpleDB Architecture and Implementation Guide

2.1Filter and Join

Filter: This operator only returns tuples that satisfy a Predicate that is specified as part of its constructor. Hence, it filters out any tuples that do not match the predicate.
Filter：此操作符只返回满足Predicate的tuples，该tuples被指定为其构造函数的一部分。因此，它过滤出与Predicate不匹配的任何tuples。

Join: This operator joins tuples from its two children according to a JoinPredicate that is passed in as part of its constructor.
Join：此操作符根据JoinPredicate连接两个tuples。

Exercise1.

src/java/simpledb/execution/Predicate.java
src/java/simpledb/execution/JoinPredicate.java
src/java/simpledb/execution/Filter.java
src/java/simpledb/execution/Join.java

src/java/simpledb/execution/Predicate.java

package simpledb.execution;

import simpledb.storage.Field;
import simpledb.storage.Tuple;

import java.io.Serializable;

/**
 * Predicate compares tuples to a specified Field value.
 */
public class Predicate implements Serializable {

    private static final long serialVersionUID = 1L;

    /** Constants used for return codes in Field.compare */
    public enum Op implements Serializable {
        EQUALS, GREATER_THAN, LESS_THAN, LESS_THAN_OR_EQ, GREATER_THAN_OR_EQ, LIKE, NOT_EQUALS;

        /**
         * Interface to access operations by integer value for command-line
         * convenience.
         * 
         * @param i
         *            a valid integer Op index
         */
        public static Op getOp(int i) {
            return values()[i];
        }

        public String toString() {
            if (this == EQUALS)
                return "=";
            if (this == GREATER_THAN)
                return ">";
            if (this == LESS_THAN)
                return "<";
            if (this == LESS_THAN_OR_EQ)
                return "<=";
            if (this == GREATER_THAN_OR_EQ)
                return ">=";
            if (this == LIKE)
                return "LIKE";
            if (this == NOT_EQUALS)
                return "<>";
            throw new IllegalStateException("impossible to reach here");
        }

    }

    private final int field;
    private final Op op;
    private final Field operand;

    /**
     * Constructor.
     * 
     * @param field
     *            field number of passed in tuples to compare against.
     * @param op
     *            operation to use for comparison
     * @param operand
     *            field value to compare passed in tuples to
     */
    public Predicate(int field, Op op, Field operand) {
        // some code goes here
        this.field = field;
        this.op = op;
        this.operand = operand;
    }

    /**
     * @return the field number
     */
    public int getField()
    {
        // some code goes here
        return field;
    }

    /**
     * @return the operator
     */
    public Op getOp()
    {
        // some code goes here
        return op;
    }
    
    /**
     * @return the operand
     */
    public Field getOperand()
    {
        // some code goes here
        return operand;
    }
    
    /**
     * Compares the field number of t specified in the constructor to the
     * operand field specified in the constructor using the operator specific in
     * the constructor. The comparison can be made through Field's compare
     * method.
     * 
     * @param t
     *            The tuple to compare against
     * @return true if the comparison is true, false otherwise.
     */
    public boolean filter(Tuple t) {
        // some code goes here
        return t.getField(field).compare(op,operand);
    }

    /**
     * Returns something useful, like "f = field_id op = op_string operand =
     * operand_string"
     */
    public String toString() {
        // some code goes here
        String s = String.format("f = %d op = %s operand = %s",field,op.toString(),operand.toString());
        return "";
    }
}

字段及含义

type	name	description
int	field	field id
Op	op	操作符
Field	oprand	比较的field

Op：

实现了Serializable接口的枚举类型。
| EQUALS | GREATER_THAN | LESS_THAN | LESS_THAN_OR_EQ | GREATER_THAN_OR_EQ | LIKE | NOT_EQUALS |
| :----: | :----------: | :-------: | :-------------: | :----------------: | ---- | :--------: |
|   =    |      >       |     <     |       <=        |         >=         | like |     !=     |
提供了getOp、toString方法。

Predicate：

构造器，字段id、操作、比较字段的数。

filter：

获取需要比较元组field对应的值，与当前构造器的oprand比较，通过op操作符。

getField、getOp、getOperand、toString：

略

src/java/simpledb/execution/JoinPredicate.java

package simpledb.execution;

import simpledb.storage.Tuple;

import java.io.Serializable;

/**
 * JoinPredicate compares fields of two tuples using a predicate. JoinPredicate
 * is most likely used by the Join operator.
 */
public class JoinPredicate implements Serializable {

    private static final long serialVersionUID = 1L;

    private final int field1;
    private final int field2;
    private final Predicate.Op op;
    /**
     * Constructor -- create a new predicate over two fields of two tuples.
     * 
     * @param field1
     *            The field index into the first tuple in the predicate
     * @param field2
     *            The field index into the second tuple in the predicate
     * @param op
     *            The operation to apply (as defined in Predicate.Op); either
     *            Predicate.Op.GREATER_THAN, Predicate.Op.LESS_THAN,
     *            Predicate.Op.EQUAL, Predicate.Op.GREATER_THAN_OR_EQ, or
     *            Predicate.Op.LESS_THAN_OR_EQ
     * @see Predicate
     */
    public JoinPredicate(int field1, Predicate.Op op, int field2) {
        // some code goes here
        this.field1 = field1;
        this.op = op;
        this.field2 = field2;
    }

    /**
     * Apply the predicate to the two specified tuples. The comparison can be
     * made through Field's compare method.
     * 
     * @return true if the tuples satisfy the predicate.
     */
    public boolean filter(Tuple t1, Tuple t2) {
        // some code goes here
        return t1.getField(field1).compare(op, t2.getField(field2));
    }
    
    public int getField1()
    {
        // some code goes here
        return field1;
    }
    
    public int getField2()
    {
        // some code goes here
        return field2;
    }
    
    public Predicate.Op getOperator()
    {
        // some code goes here
        return op;
    }
}

属性定义：
field1是第一个tuple的field id，field2是第二个tuple的field id，它们直接通过op操作过滤不符合条件的元组。
关键方法：
filter：过滤掉不符合条件的元组。
类似mysql的join…on后的on
在这里插入图片描述
src/java/simpledb/execution/Filter.java

package simpledb.execution;

import simpledb.transaction.TransactionAbortedException;
import simpledb.common.DbException;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;

import java.util.*;

/**
 * Filter is an operator that implements a relational select.
 */
public class Filter extends Operator {

    private static final long serialVersionUID = 1L;

    private final Predicate p;
    private OpIterator child;

    /**
     * Constructor accepts a predicate to apply and a child operator to read
     * tuples to filter from.
     * 
     * @param p
     *            The predicate to filter tuples with
     * @param child
     *            The child operator
     */
    public Filter(Predicate p, OpIterator child) {
        // some code goes here
        this.p = p;
        this.child = child;
    }

    public Predicate getPredicate() {
        // some code goes here
        return p;
    }

    public TupleDesc getTupleDesc() {
        // some code goes here
        return child.getTupleDesc();
    }

    public void open() throws DbException, NoSuchElementException,
            TransactionAbortedException {
        // some code goes here
        child.open();
        super.open();
    }

    public void close() {
        // some code goes here
        super.close();
        child.close();
    }

    public void rewind() throws DbException, TransactionAbortedException {
        // some code goes here
        child.rewind();
    }

    /**
     * AbstractDbIterator.readNext implementation. Iterates over tuples from the
     * child operator, applying the predicate to them and returning those that
     * pass the predicate (i.e. for which the Predicate.filter() returns true.)
     * 
     * @return The next tuple that passes the filter, or null if there are no
     *         more tuples
     * @see Predicate#filter
     */
    protected Tuple fetchNext() throws NoSuchElementException,
            TransactionAbortedException, DbException {
        // some code goes here
        while (child.hasNext()){
            Tuple t = child.next();
            if (p.filter(t)){
                return t;
            }
        }
        return null;
    }

    @Override
    public OpIterator[] getChildren() {
        // some code goes here
        return new OpIterator[] {this.child};
    }

    @Override
    public void setChildren(OpIterator[] children) {
        // some code goes here
        child = children[0];
    }

}

属性定义：
定义了一个predicate过滤tuple，以及一个child表示要过滤元组的迭代器对象。
关键方法：
fetchNext：如果child含有符合predicate过滤条件的tuple，则返回，否则返回null。
例如满足过滤条件的元组，将会参与join操作。
在这里插入图片描述
src/java/simpledb/execution/Join.java

package simpledb.execution;

import simpledb.transaction.TransactionAbortedException;
import simpledb.common.DbException;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;

import java.util.*;

/**
 * The Join operator implements the relational join operation.
 */
public class Join extends Operator {

    private static final long serialVersionUID = 1L;

    private JoinPredicate p;
    private OpIterator child1;
    private OpIterator child2;
    private Tuple t;
    /**
     * Constructor. Accepts two children to join and the predicate to join them
     * on
     * 
     * @param p
     *            The predicate to use to join the children
     * @param child1
     *            Iterator for the left(outer) relation to join
     * @param child2
     *            Iterator for the right(inner) relation to join
     */
    public Join(JoinPredicate p, OpIterator child1, OpIterator child2) {
        // some code goes here
        this.p = p;
        this.child1 = child1;
        this.child2 = child2;
        t = null;
    }

    public JoinPredicate getJoinPredicate() {
        // some code goes here
        return p;
    }

    /**
     * @return
     *       the field name of join field1. Should be quantified by
     *       alias or table name.
     * */
    public String getJoinField1Name() {
        // some code goes here
        return child1.getTupleDesc().getFieldName(p.getField1());
    }

    /**
     * @return
     *       the field name of join field2. Should be quantified by
     *       alias or table name.
     * */
    public String getJoinField2Name() {
        // some code goes here
        return child2.getTupleDesc().getFieldName(p.getField2());
    }

    /**
     * @see TupleDesc#merge(TupleDesc, TupleDesc) for possible
     *      implementation logic.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return TupleDesc.merge(child1.getTupleDesc(),child2.getTupleDesc());
    }

    public void open() throws DbException, NoSuchElementException,
            TransactionAbortedException {
        // some code goes here
        child1.open();
        child2.open();
        super.open();
    }

    public void close() {
        // some code goes here
        super.close();
        child2.close();
        child1.close();
    }

    public void rewind() throws DbException, TransactionAbortedException {
        // some code goes here
        child1.rewind();
        child2.rewind();
    }

    /**
     * Returns the next tuple generated by the join, or null if there are no
     * more tuples. Logically, this is the next tuple in r1 cross r2 that
     * satisfies the join predicate. There are many possible implementations;
     * the simplest is a nested loops join.
     * <p>
     * Note that the tuples returned from this particular implementation of Join
     * are simply the concatenation of joining tuples from the left and right
     * relation. Therefore, if an equality predicate is used there will be two
     * copies of the join attribute in the results. (Removing such duplicate
     * columns can be done with an additional projection operator if needed.)
     * <p>
     * For example, if one tuple is {1,2,3} and the other tuple is {1,5,6},
     * joined on equality of the first column, then this returns {1,2,3,1,5,6}.
     * 
     * @return The next matching tuple.
     * @see JoinPredicate#filter
     */
    protected Tuple fetchNext() throws TransactionAbortedException, DbException {
        // some code goes here
        while (this.child1.hasNext() || this.t != null) {
            if (this.child1.hasNext() && this.t == null){
                t = child1.next();
            }
            while (child2.hasNext()){
                Tuple t2 = child2.next();
                if (p.filter(t,t2)){
                    TupleDesc td1 = t.getTupleDesc();
                    TupleDesc td2 = t2.getTupleDesc();
                    TupleDesc newTd = TupleDesc.merge(td1,td2);
                    Tuple newTuple = new Tuple(newTd);
                    newTuple.setRecordId(t.getRecordId());
                    int i = 0;
                    for (;i < td1.numFields();i++){
                        newTuple.setField(i,t.getField(i));
                    }
                    for (int j = 0; j < td2.numFields();j++){
                        newTuple.setField(i+j,t2.getField(j));
                    }
                    if (!child2.hasNext()){
                        child2.rewind();
                        t = null;
                    }
                    return newTuple;
                }
            }
            child2.rewind();
            t = null;
        }
        return null;
    }

    @Override
    public OpIterator[] getChildren() {
        // some code goes here
        return new OpIterator[] {child1,child2};
    }

    @Override
    public void setChildren(OpIterator[] children) {
        // some code goes here
        child1 = children[0];
        child2 = children[1];
    }

}

属性定义：
定义了JoinPredicate对象p用于过滤tuple，child1作为第一个表的迭代器对象，child2作为第二个表的迭代器对象，以及临时变量t保存第一个表的tuple。
关键方法：
fetchNext：遍历第二个表的tuple，查看符合filter条件的tuple，拼接成为一个新的tuple并返回，第二个回到起点。
类似mysql
在这里插入图片描述

ant runtest -Dtest=PredicateTest
ant runtest -Dtest=JoinPredicateTest
ant runtest -Dtest=FilterTest
ant runtest -Dtest=JoinTest
这些单元测试都BUILD SUCCESSFUL。
ant runsystest -Dtest=FilterTest
ant runsystest -Dtest=JoinTest
这些系统测试都BUILD SUCCESSFUL。

2.2 Aggregates

src/java/simpledb/execution/IntegerAggregator.java
src/java/simpledb/execution/StringAggregator.java
src/java/simpledb/execution/Aggregate.java

src/java/simpledb/execution/IntegerAggregator.java

package simpledb.execution;

import simpledb.common.DbException;
import simpledb.common.Type;
import simpledb.storage.Field;
import simpledb.storage.IntField;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionAbortedException;

import java.util.*;

/**
 * Knows how to compute some aggregate over a set of IntFields.
 */
public class IntegerAggregator implements Aggregator {

    private static final long serialVersionUID = 1L;

    private int gbfield;// group-by field
    private Type gbfieldtype;
    private int afield; //agregate field
    private Op what;

    /**
     * @return SUM,MIN,MAX,COUNT
     * **/
    private Map<Field,Integer> groupMap;
    private Map<Field,Integer> countMap;
    private Map<Field, List<Integer>> avgMap;
    /**
     * Aggregate constructor
     * 
     * @param gbfield
     *            the 0-based index of the group-by field in the tuple, or
     *            NO_GROUPING if there is no grouping
     * @param gbfieldtype
     *            the type of the group by field (e.g., Type.INT_TYPE), or null
     *            if there is no grouping
     * @param afield
     *            the 0-based index of the aggregate field in the tuple
     * @param what
     *            the aggregation operator
     */

    public IntegerAggregator(int gbfield, Type gbfieldtype, int afield, Op what) {
        // some code goes here
        this.gbfield = gbfield;
        this.gbfieldtype = gbfieldtype;
        this.afield = afield;
        this.what = what;
        this.groupMap = new HashMap<>();
        this.countMap = new HashMap<>();
        this.avgMap = new HashMap<>();
    }

    /**
     * Merge a new tuple into the aggregate, grouping as indicated in the
     * constructor
     * 
     * @param tup
     *            the Tuple containing an aggregate field and a group-by field
     */
    public void mergeTupleIntoGroup(Tuple tup) {
        // some code goes here
        IntField afield = (IntField) tup.getField(this.afield);
        Field gbfield = this.gbfield == NO_GROUPING ? null : tup.getField(this.gbfield);
        int newVaule = afield.getValue();//IntField.getValue()
        if (gbfield != null && gbfield.getType()!=this.gbfieldtype){  //compare
            throw new IllegalArgumentException("Given tuple has wrong type");
        }
        switch (this.what){
            case MIN:
                /**
                 * Places the current new tuple in the aggregate tuple
                 * */
                if (!this.groupMap.containsKey(gbfield)){
                    this.groupMap.put(gbfield,newVaule);
                }
                else {
                    this.groupMap.put(gbfield,Math.min(this.groupMap.get(gbfield),newVaule));
                }
                break;

            case MAX:
                if (!this.groupMap.containsKey(gbfield)){
                    this.groupMap.put(gbfield,newVaule);
                }
                else {
                    this.groupMap.put(gbfield,Math.max(this.groupMap.get(gbfield),newVaule));
                }
                break;

            case SUM:
                if (!this.groupMap.containsKey(gbfield)){
                    this.groupMap.put(gbfield,newVaule);
                }
                else {
                    this.groupMap.put(gbfield,this.groupMap.get(gbfield)+newVaule);
                }
                break;

            case COUNT:
                if (!this.groupMap.containsKey(gbfield)){
                    this.groupMap.put(gbfield,1);
                }
                else {
                    this.groupMap.put(gbfield,this.groupMap.get(gbfield)+1);
                }
                break;

            case SC_AVG:
                IntField countFiled = null;
                if (gbfield == null){
                    countFiled = (IntField) tup.getField(1);
                } else {
                    countFiled = (IntField) tup.getField(2);
                }
                int countValue = countFiled.getValue();
                if (!this.groupMap.containsKey(gbfield)){
                    this.groupMap.put(gbfield,newVaule);
                    this.countMap.put(gbfield,countValue);
                } else {
                    this.groupMap.put(gbfield,this.groupMap.get(gbfield) + newVaule);
                    this.countMap.put(gbfield,this.countMap.get(gbfield) + countValue);
                }
                break;

            case AVG:
                if (!this.avgMap.containsKey(gbfield)){
                    List<Integer> l = new ArrayList<>();
                    l.add(newVaule);
                    this.avgMap.put(gbfield,l);
                }
                else {
                    List<Integer> l = this.avgMap.get(gbfield);
                    l.add(newVaule);
                }
                break;
            default:
                throw new IllegalArgumentException("Aggregate not supported!");
        }
    }

    /**
     * Create a OpIterator over group aggregate results.
     * 
     * @return a OpIterator whose tuples are the pair (groupVal, aggregateVal)
     *         if using group, or a single (aggregateVal) if no grouping. The
     *         aggregateVal is determined by the type of aggregate specified in
     *         the constructor.
     */
    public OpIterator iterator() {
        // some code goes here
        return new IntAggIterator();
    }

    private class IntAggIterator extends AggregateIterator{

        private Iterator<Map.Entry<Field,List<Integer>>> avgIt;
        private boolean isAvg;
        private boolean isSCAvg;
        private boolean isSumCount;

        public IntAggIterator() {
            super(groupMap, gbfieldtype);
            this.isAvg = what.equals(Op.AVG);
            this.isSCAvg = what.equals(Op.SC_AVG);
            this.isSumCount = what.equals(Op.SUM_COUNT);
            if (isSumCount){
                this.td = new TupleDesc(new Type[] {this.itgbfieldtype,Type.INT_TYPE,Type.INT_TYPE},
                        new String[]{"groupVal","sumVal","countVal"});
            }
        }

        @Override
        public void open() throws DbException, TransactionAbortedException {
            super.open();
            if (this.isAvg || this.isSumCount){
                this.avgIt = avgMap.entrySet().iterator();
            } else {
                this.avgIt = null;
            }
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            if (this.isAvg || this.isSumCount){
                return avgIt.hasNext();
            }
            return super.hasNext();
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            Tuple rtn = new Tuple(td);
            if (this.isAvg || this.isSumCount){
                Map.Entry<Field,List<Integer>> avgOrSumCountEntry = this.avgIt.next();
                Field avgOrSumCountField = avgOrSumCountEntry.getKey();
                List<Integer> avgOrSumCountList = avgOrSumCountEntry.getValue();
                if (this.isAvg){
                    int value = this.sumList(avgOrSumCountList) / avgOrSumCountList.size();
                    this.setFields(rtn,value,avgOrSumCountField);
                    return rtn;
                } else {
                    this.setFields(rtn,sumList(avgOrSumCountList),avgOrSumCountField);
                    if (avgOrSumCountField != null){
                        rtn.setField(2,new IntField(avgOrSumCountList.size()));
                    } else {
                        rtn.setField(1,new IntField(avgOrSumCountList.size()));
                    }
                }
            } else if (this.isSCAvg){
                Map.Entry<Field,Integer> entry = this.it.next();
                Field f = entry.getKey();
                this.setFields(rtn,entry.getValue() / countMap.get(f), f);
                return rtn;
            }
            return super.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            super.rewind();
            if (this.isAvg || this.isSumCount){
                this.avgIt = avgMap.entrySet().iterator();
            }
        }

        @Override
        public TupleDesc getTupleDesc() {
            return super.getTupleDesc();
        }

        @Override
        public void close() {
            super.close();
            this.avgIt = null;
        }

        private int sumList(List<Integer> l){
            int sum = 0;
            for (int i:l) {
                sum += i;
            }
            return sum;
        }
    }

}

属性定义：
定义了gbfield表示group-by field id，gbfieldtype表示相应类型，afield表示agregate field id，what表示聚合类型有MIN, MAX, SUM, AVG, COUNT。
groupMap保存聚合的大部分情况，countMap保存每个分组的数目，avgMap保存需要统计平均值的value。
关键方法：
mergeTupleIntoGroup：合并tuple到group中，包括MIN, MAX, SUM, AVG, COUNT,SC_AVG操作。
iterator：返回一个IntAggIterator对象，包含聚合结果的迭代器。

src/java/simpledb/execution/StringAggregator.java

package simpledb.execution;

import simpledb.common.Type;
import simpledb.storage.Field;
import simpledb.storage.StringField;
import simpledb.storage.Tuple;

import java.util.HashMap;
import java.util.Map;

/**
 * Knows how to compute some aggregate over a set of StringFields.
 */
public class StringAggregator implements Aggregator {

    private static final long serialVersionUID = 1L;

    private int gbfield;
    private Type gbfieldtype;
    private int afield;
    private Op what;

    private Map<Field,Integer> groupMap;
    /**
     * Aggregate constructor
     * @param gbfield the 0-based index of the group-by field in the tuple, or NO_GROUPING if there is no grouping
     * @param gbfieldtype the type of the group by field (e.g., Type.INT_TYPE), or null if there is no grouping
     * @param afield the 0-based index of the aggregate field in the tuple
     * @param what aggregation operator to use -- only supports COUNT
     * @throws IllegalArgumentException if what != COUNT
     */

    public StringAggregator(int gbfield, Type gbfieldtype, int afield, Op what) {
        // some code goes here
        if (!what.equals(Op.COUNT)){
            throw new IllegalArgumentException("Only COUNT is supported for String fields!");
        }
        this.gbfield = gbfield;
        this.gbfieldtype = gbfieldtype;
        this.afield = afield;
        this.what = what;
        this.groupMap = new HashMap<>();
    }

    /**
     * Merge a new tuple into the aggregate, grouping as indicated in the constructor
     * @param tup the Tuple containing an aggregate field and a group-by field
     */
    public void mergeTupleIntoGroup(Tuple tup) {
        // some code goes here
        StringField afield = (StringField) tup.getField(this.afield);
        Field gbfield = this.gbfield == NO_GROUPING ? null : tup.getField(this.gbfield);//group by field
        String newValue = afield.getValue();//aggregate field
        if (gbfield != null && gbfield.getType() != this.gbfieldtype){
            throw new IllegalArgumentException("Given tuple has wrong type");
        }
        if (!this.groupMap.containsKey(gbfield)){
            this.groupMap.put(gbfield,1);
        } else {
            this.groupMap.put(gbfield,this.groupMap.get(gbfield) + 1);
        }
    }

    /**
     * Create a OpIterator over group aggregate results.
     *
     * @return a OpIterator whose tuples are the pair (groupVal,
     *   aggregateVal) if using group, or a single (aggregateVal) if no
     *   grouping. The aggregateVal is determined by the type of
     *   aggregate specified in the constructor.
     */
    public OpIterator iterator() {
        // some code goes here
        return new AggregateIterator(this.groupMap,this.gbfieldtype);
    }

}

属性定义：
定义了gbfield表示group-by field id，gbfieldtype表示相应类型，afield表示agregate field id，what表示聚合类型有COUNT。
groupMap保存聚合的COUNT情况
关键方法：
mergeTupleIntoGroup：合并tuple到group中，包括COUNT操作。
iterator：返回一个AggregateIterator对象，包含聚合结果的迭代器。

src/java/simpledb/execution/Aggregate.java

package simpledb.execution;

import simpledb.common.DbException;
import simpledb.common.Type;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionAbortedException;

import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;


/**
 * The Aggregation operator that computes an aggregate (e.g., sum, avg, max,
 * min). Note that we only support aggregates over a single column, grouped by a
 * single column.
 */
public class Aggregate extends Operator {

    private static final long serialVersionUID = 1L;

    private OpIterator child;
    private int afield;
    private int gfield;
    private Aggregator.Op aop;
    private Aggregator aggregator;
    private OpIterator it;
    private TupleDesc td;

    /**
     * Constructor.
     * <p>
     * Implementation hint: depending on the type of afield, you will want to
     * construct an {@link IntegerAggregator} or {@link StringAggregator} to help
     * you with your implementation of readNext().
     *
     * @param child  The OpIterator that is feeding us tuples.
     * @param afield The column over which we are computing an aggregate.
     * @param gfield The column over which we are grouping the result, or -1 if
     *               there is no grouping
     * @param aop    The aggregation operator to use
     */
    public Aggregate(OpIterator child, int afield, int gfield, Aggregator.Op aop) {
        // some code goes here
        this.child = child;
        this.afield = afield;
        this.gfield = gfield;
        this.aop = aop;
        Type gfieldtype = gfield == -1 ? null : this.child.getTupleDesc().getFieldType(this.gfield);//group by field type
        if (this.child.getTupleDesc().getFieldType(this.afield) == (Type.STRING_TYPE)){
            this.aggregator = new StringAggregator(this.gfield,gfieldtype,this.afield,this.aop);
        } else {
            this.aggregator = new IntegerAggregator(this.gfield,gfieldtype,this.afield,this.aop);
        }
        this.it = this.aggregator.iterator();
        List<Type> types = new ArrayList<>();
        List<String> names = new ArrayList<>();
        if (gfieldtype != null){
            types.add(gfieldtype);//group by field type
            names.add(this.child.getTupleDesc().getFieldName(this.afield));//field name
        }
        types.add(this.child.getTupleDesc().getFieldType(this.afield));
        names.add(this.child.getTupleDesc().getFieldName(this.afield));
        if (aop.equals(Aggregator.Op.SUM_COUNT)){
            types.add(Type.INT_TYPE);
            names.add("COUNT");
        }
        this.td = new TupleDesc(types.toArray(new Type[types.size()]),names.toArray(new String[names.size()]));
    }

    /**
     * @return If this aggregate is accompanied by a groupby, return the groupby
     * field index in the <b>INPUT</b> tuples. If not, return
     * {@link Aggregator#NO_GROUPING}
     */
    public int groupField() {
        // some code goes here
        return this.gfield;
    }

    /**
     * @return If this aggregate is accompanied by a group by, return the name
     * of the groupby field in the <b>OUTPUT</b> tuples. If not, return
     * null;
     */
    public String groupFieldName() {
        // some code goes here
        return this.td.getFieldName(0);
    }

    /**
     * @return the aggregate field
     */
    public int aggregateField() {
        // some code goes here
        return this.afield;
    }

    /**
     * @return return the name of the aggregate field in the <b>OUTPUT</b>
     * tuples
     */
    public String aggregateFieldName() {
        // some code goes here
        if (this.gfield == -1){
            return this.td.getFieldName(0);
        } else {
            return this.td.getFieldName(1);
        }
    }

    /**
     * @return return the aggregate operator
     */
    public Aggregator.Op aggregateOp() {
        // some code goes here
        return this.aop;
    }

    public static String nameOfAggregatorOp(Aggregator.Op aop) {
        return aop.toString();
    }

    public void open() throws NoSuchElementException, DbException,
            TransactionAbortedException {
        // some code goes here
        this.child.open();
        while(this.child.hasNext()){
            this.aggregator.mergeTupleIntoGroup(this.child.next());
        }
        this.it.open();
        super.open();
    }

    /**
     * Returns the next tuple. If there is a group by field, then the first
     * field is the field by which we are grouping, and the second field is the
     * result of computing the aggregate. If there is no group by field, then
     * the result tuple should contain one field representing the result of the
     * aggregate. Should return null if there are no more tuples.
     */
    protected Tuple fetchNext() throws TransactionAbortedException, DbException {
        // some code goes here
        while(this.it.hasNext()){
            return this.it.next();
        }
        return null;
    }

    public void rewind() throws DbException, TransactionAbortedException {
        // some code goes here
        this.child.rewind();
        this.it.rewind();
    }

    /**
     * Returns the TupleDesc of this Aggregate. If there is no group by field,
     * this will have one field - the aggregate column. If there is a group by
     * field, the first field will be the group by field, and the second will be
     * the aggregate value column.
     * <p>
     * The name of an aggregate column should be informative. For example:
     * "aggName(aop) (child_td.getFieldName(afield))" where aop and afield are
     * given in the constructor, and child_td is the TupleDesc of the child
     * iterator.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return this.td;
    }

    public void close() {
        // some code goes here
        super.close();
        this.child.close();
        this.it.close();
    }

    @Override
    public OpIterator[] getChildren() {
        // some code goes here
        return new OpIterator[]{this.child};
    }

    @Override
    public void setChildren(OpIterator[] children) {
        // some code goes here
        this.child = children[0];
        List<Type> types = new ArrayList<>();
        List<String> names = new ArrayList<>();
        Type gfieldtype = gfield == -1 ? null : this.child.getTupleDesc().getFieldType(gfield);
        if (gfieldtype != null){
            types.add(gfieldtype);
            names.add(this.child.getTupleDesc().getFieldName(gfield));
        }
        types.add(this.child.getTupleDesc().getFieldType(afield));
        names.add(this.child.getTupleDesc().getFieldName(afield));
        if (aop.equals(Aggregator.Op.SUM_COUNT)){
            types.add(Type.INT_TYPE);
            names.add("COUNT");
        }
        this.td = new TupleDesc(types.toArray(new Type[types.size()]),names.toArray(new String[names.size()]));
    }
}

属性定义：
private OpIterator child; //提供tuple的迭代器
private int afield;//被聚合字段
private int gfield;//聚合字段
private Aggregator.Op aop;//聚合操作，如count
private Aggregator aggregator;//聚合器
private OpIterator it;//聚合器的迭代器
private TupleDesc td;//保存聚合、被聚合字段的元组描述
关键方法：
open：聚合每个结果到aggregator
fetchNext：依次取结果。

ant runtest -Dtest=IntegerAggregatorTest
ant runtest -Dtest=StringAggregatorTest
ant runtest -Dtest=FilterTest 和 ant runtest -Dtest=AggregateTest
这些单元测试都BUILD SUCCESSFUL。

ant runsystest -Dtest=AggregateTest
这些系统测试BUILD SUCCESSFUL。

2.3HeapFile Mutability

src/java/simpledb/storage/HeapPage.java

package simpledb.storage;

import simpledb.common.Catalog;
import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.transaction.TransactionId;

import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Each instance of HeapPage stores data for one page of HeapFiles and
 * implements the Page interface that is used by BufferPool.
 *
 * @see HeapFile
 * @see BufferPool
 *
 */
public class HeapPage implements Page {

    final HeapPageId pid;
    final TupleDesc td;
    final byte[] header;
    final Tuple[] tuples;
    final int numSlots;

    byte[] oldData;
    private final Byte oldDataLock= (byte) 0;

    /**
     * the transaction id which changed the page to dirty
     * */
    private TransactionId dirtyId;
    private boolean dirty;

    /**
     * Create a HeapPage from a set of bytes of data read from disk.
     * The format of a HeapPage is a set of header bytes indicating
     * the slots of the page that are in use, some number of tuple slots.
     *  Specifically, the number of tuples is equal to: <p>
     *          floor((BufferPool.getPageSize()*8) / (tuple size * 8 + 1))
     * <p> where tuple size is the size of tuples in this
     * database table, which can be determined via {@link Catalog#getTupleDesc}.
     * The number of 8-bit header words is equal to:
     * <p>
     *      ceiling(no. tuple slots / 8)
     * <p>
     * @see Database#getCatalog
     * @see Catalog#getTupleDesc
     * @see BufferPool#getPageSize()
     */
    public HeapPage(HeapPageId id, byte[] data) throws IOException {
        this.pid = id;
        this.td = Database.getCatalog().getTupleDesc(id.getTableId());
        this.numSlots = getNumTuples();
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));

        // allocate and read the header slots of this page
        header = new byte[getHeaderSize()];
        for (int i=0; i<header.length; i++)
            header[i] = dis.readByte();

        tuples = new Tuple[numSlots];
        try{
            // allocate and read the actual records of this page
            for (int i=0; i<tuples.length; i++)
                tuples[i] = readNextTuple(dis,i);
        }catch(NoSuchElementException e){
            e.printStackTrace();
        }
        dis.close();

        setBeforeImage();
    }

    /** Retrieve the number of tuples on this page.
        @return the number of tuples on this page
    */
    private int getNumTuples() {
        // some code goes here
        int num = (int) Math.floor((BufferPool.getPageSize()*8*1.0)/(td.getSize()*8 + 1));
        return num;
    }

    /**
     * Computes the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     * @return the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     */
    private int getHeaderSize() {
        // some code goes here
        return (int) Math.ceil(getNumTuples()*1.0/8);

    }

    /** Return a view of this page before it was modified
        -- used by recovery */
    public HeapPage getBeforeImage(){
        try {
            byte[] oldDataRef = null;
            synchronized(oldDataLock)
            {
                oldDataRef = oldData;
            }
            return new HeapPage(pid,oldDataRef);
        } catch (IOException e) {
            e.printStackTrace();
            //should never happen -- we parsed it OK before!
            System.exit(1);
        }
        return null;
    }

    public void setBeforeImage() {
        synchronized(oldDataLock)
        {
        oldData = getPageData().clone();
        }
    }

    /**
     * @return the PageId associated with this page.
     */
    public HeapPageId getId() {
    // some code goes here
        return pid;
    }

    /**
     * Suck up tuples from the source file.
     */
    private Tuple readNextTuple(DataInputStream dis, int slotId) throws NoSuchElementException {
        // if associated bit is not set, read forward to the next tuple, and
        // return null.
        if (!isSlotUsed(slotId)) {
            for (int i=0; i<td.getSize(); i++) {
                try {
                    dis.readByte();
                } catch (IOException e) {
                    throw new NoSuchElementException("error reading empty tuple");
                }
            }
            return null;
        }

        // read fields in the tuple
        Tuple t = new Tuple(td);
        RecordId rid = new RecordId(pid, slotId);
        t.setRecordId(rid);
        try {
            for (int j=0; j<td.numFields(); j++) {
                Field f = td.getFieldType(j).parse(dis);
                t.setField(j, f);
            }
        } catch (java.text.ParseException e) {
            e.printStackTrace();
            throw new NoSuchElementException("parsing error!");
        }

        return t;
    }

    /**
     * Generates a byte array representing the contents of this page.
     * Used to serialize this page to disk.
     * <p>
     * The invariant here is that it should be possible to pass the byte
     * array generated by getPageData to the HeapPage constructor and
     * have it produce an identical HeapPage object.
     *
     * @see #HeapPage
     * @return A byte array correspond to the bytes of this page.
     */
    public byte[] getPageData() {
        int len = BufferPool.getPageSize();
        ByteArrayOutputStream baos = new ByteArrayOutputStream(len);
        DataOutputStream dos = new DataOutputStream(baos);

        // create the header of the page
        for (byte b : header) {
            try {
                dos.writeByte(b);
            } catch (IOException e) {
                // this really shouldn't happen
                e.printStackTrace();
            }
        }

        // create the tuples
        for (int i=0; i<tuples.length; i++) {

            // empty slot
            if (!isSlotUsed(i)) {
                for (int j=0; j<td.getSize(); j++) {
                    try {
                        dos.writeByte(0);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }

                }
                continue;
            }

            // non-empty slot
            for (int j=0; j<td.numFields(); j++) {
                Field f = tuples[i].getField(j);
                try {
                    f.serialize(dos);

                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        // padding
        int zerolen = BufferPool.getPageSize() - (header.length + td.getSize() * tuples.length); //- numSlots * td.getSize();
        byte[] zeroes = new byte[zerolen];
        try {
            dos.write(zeroes, 0, zerolen);
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            dos.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return baos.toByteArray();
    }

    /**
     * Static method to generate a byte array corresponding to an empty
     * HeapPage.
     * Used to add new, empty pages to the file. Passing the results of
     * this method to the HeapPage constructor will create a HeapPage with
     * no valid tuples in it.
     *
     * @return The returned ByteArray.
     */
    public static byte[] createEmptyPageData() {
        int len = BufferPool.getPageSize();
        return new byte[len]; //all 0
    }

    /**
     * Delete the specified tuple from the page; the corresponding header bit should be updated to reflect
     *   that it is no longer stored on any page.
     * @throws DbException if this tuple is not on this page, or tuple slot is
     *         already empty.
     * @param t The tuple to delete
     */
    public void deleteTuple(Tuple t) throws DbException {
        // some code goes here
        // not necessary for lab1
        if (t.getRecordId() == null){
            throw new DbException("tuple does not exist");
        }
        int tid = t.getRecordId().getTupleNumber();
        if (tid >= numSlots || tuples[tid] == null){
            throw new DbException("tuple does not exist");
        }
        if (!isSlotUsed(tid)){
            throw new DbException("the slot is already empty");
        }
//        else {
//            markSlotUsed(tid,false);
//            tuples[tid] = null;
//        }
        if (tuples[tid].equals(t)){
            tuples[tid] = null;
            markSlotUsed(tid,false);
            return;
        }
        throw new DbException("tuple does not exist");
    }

    /**
     * Adds the specified tuple to the page;  the tuple should be updated to reflect
     *  that it is now stored on this page.
     * @throws DbException if the page is full (no empty slots) or tupledesc
     *         is mismatch.
     * @param t The tuple to add.
     */
    public void insertTuple(Tuple t) throws DbException {
        // some code goes here
        // not necessary for lab1
        if (getNumEmptySlots() == 0 || !t.getTupleDesc().equals(td)){
            throw new DbException("page is full or tuple descriptor does not match");
        }
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(i)){
                markSlotUsed(i,true);
                t.setRecordId(new RecordId(pid,i));
                tuples[i] = t;
                break;
            }
        }
    }

    /**
     * Marks this page as dirty/not dirty and record that transaction
     * that did the dirtying
     */
    public void markDirty(boolean dirty, TransactionId tid) {
        // some code goes here
	// not necessary for lab1
        this.dirty = dirty;
        this.dirtyId = tid;
    }

    /**
     * Returns the tid of the transaction that last dirtied this page, or null if the page is not dirty
     */
    public TransactionId isDirty() {
        // some code goes here
	// Not necessary for lab1
        return this.dirty ? this.dirtyId : null;
    }

    /**
     * Returns the number of empty slots on this page.
     */
    public int getNumEmptySlots() {
        // some code goes here
        int cnt = 0;
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(i)){
                ++cnt;
            }
        }
        return cnt;
    }

    /**
     * Returns true if associated slot on this page is filled.
     */
    public boolean isSlotUsed(int i) {
        // some code goes here
        int quot = i / 8;
        int remainder = i % 8;
        int bitidx = header[quot];
        int bit = (bitidx >> remainder) & 1;
        return bit == 1;
    }

    /**
     * Abstraction to fill or clear a slot on this page.
     */
    private void markSlotUsed(int i, boolean value) {
        // some code goes here
        // not necessary for lab1
        byte b = header[Math.floorDiv(i,8)];
        byte mask = (byte)(1<<(i%8));
        if (value){
            header[Math.floorDiv(i,8)] = (byte) (b | mask);
        } else {
            header[Math.floorDiv(i,8)] = (byte) (b & (~mask));
        }
    }

    /**
     * @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
     * (note that this iterator shouldn't return tuples in empty slots!)
     */
    public Iterator<Tuple> iterator() {
        // some code goes here
        ArrayList<Tuple> filleTuples = new ArrayList<>();
        for (int i = 0; i < numSlots; i++) {
            if (isSlotUsed(i)){
                filleTuples.add(tuples[i]);
            }
        }
        return filleTuples.iterator();
    }

}

变量定义：
final HeapPageId pid;//页面ID值
final TupleDesc td;//元组描述
final byte[] header;
final Tuple[] tuples;
final int numSlots;
byte[] oldData;
private final Byte oldDataLock= (byte) 0;
private TransactionId dirtyId;
private boolean dirty;
关键方法：
readNextTuple：从DataInputStream中读取某个槽的元组
getPageData：生成表示此页面内容的字节数组。用于将此页面序列化到磁盘。
deleteTuple：tuples相应位置为null，且设置槽为false。
insertTuple：设置tuples相应位置为t，且槽为true
markDirty：标记页面是否为脏，并记录事务id。
iterator：迭代整个页的元组。

src/java/simpledb/storage/HeapFile.java

package simpledb.storage;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Permissions;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * HeapFile is an implementation of a DbFile that stores a collection of tuples
 * in no particular order. Tuples are stored on pages, each of which is a fixed
 * size, and the file is simply a collection of those pages. HeapFile works
 * closely with HeapPage. The format of HeapPages is described in the HeapPage
 * constructor.
 * 
 * @see HeapPage#HeapPage
 * @author Sam Madden
 */
public class HeapFile implements DbFile {

    private final File file;
    private final TupleDesc td;
    /**
     * Constructs a heap file backed by the specified file.
     * 
     * @param f
     *            the file that stores the on-disk backing store for this heap
     *            file.
     */
    public HeapFile(File f, TupleDesc td) {
        // some code goes here
        this.file = f;
        this.td = td;
    }

    /**
     * Returns the File backing this HeapFile on disk.
     * 
     * @return the File backing this HeapFile on disk.
     */
    public File getFile() {
        // some code goes here
        return file;
    }

    /**
     * Returns an ID uniquely identifying this HeapFile. Implementation note:
     * you will need to generate this tableid somewhere to ensure that each
     * HeapFile has a "unique id," and that you always return the same value for
     * a particular HeapFile. We suggest hashing the absolute file name of the
     * file underlying the heapfile, i.e. f.getAbsoluteFile().hashCode().
     * 
     * @return an ID uniquely identifying this HeapFile.
     */
    public int getId() {
        // some code goes here
        return file.getAbsoluteFile().hashCode();
    }

    /**
     * Returns the TupleDesc of the table stored in this DbFile.
     * 
     * @return TupleDesc of this DbFile.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return td;
    }

    // see DbFile.java for javadocs
    public Page readPage(PageId pid) {
        // some code goes here
        int tableId = pid.getTableId();
        int pgNo = pid.getPageNumber();

        RandomAccessFile f = null;
        try {
            f = new RandomAccessFile(file,"r");
            if ((pgNo + 1) * BufferPool.getPageSize() > f.length()) {
                f.close();
                throw new IllegalArgumentException(String.format("table %d page %d is invalid", tableId, pgNo));
            }
            byte[] bytes = new byte[BufferPool.getPageSize()];
            f.seek(pgNo * BufferPool.getPageSize());
            int read = f.read(bytes,0,BufferPool.getPageSize());
            if (read != BufferPool.getPageSize()){
                throw new IllegalArgumentException(String.format("table %d page %d read %d bytes", tableId, pgNo, read));
            }
            HeapPageId id = new HeapPageId(pid.getTableId(), pid.getPageNumber());
            return new HeapPage(id,bytes);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                f.close();
            } catch (Exception e){
                e.printStackTrace();
            }
        }
        throw new IllegalArgumentException(String.format("table %d page %d is invalid", tableId, pgNo));
    }

    // see DbFile.java for javadocs
    public void writePage(Page page) throws IOException {
        // some code goes here
        // not necessary for lab1
        int pgNo = page.getId().getPageNumber();
        if (pgNo > numPages()){
            throw new IllegalArgumentException();
        }
        int pgSize = BufferPool.getPageSize();
        RandomAccessFile f = new RandomAccessFile(file,"rw");
        f.seek(pgNo*pgSize);
        byte[] data = page.getPageData();
        f.write(data);
        f.close();

    }

    /**
     * Returns the number of pages in this HeapFile.
     */
    public int numPages() {
        // some code goes here
        int num = (int) Math.floor(file.length()*1.0 / BufferPool.getPageSize());
        return num;
    }

    // see DbFile.java for javadocs
    public List<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for lab1
        ArrayList<Page> pageList = new ArrayList<>();
        for (int i = 0; i < numPages(); i++) {
            HeapPage p = (HeapPage) Database.getBufferPool().getPage(tid,
                    new HeapPageId(this.getId(),i),Permissions.READ_WRITE);
            if (p.getNumEmptySlots() == 0){
                continue;
            }
            p.insertTuple(t);
            pageList.add(p);
            return pageList;
        }

        BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(file,true));
        byte[] emptyData = HeapPage.createEmptyPageData();
        bw.write(emptyData);
        bw.close();
        HeapPage p = (HeapPage) Database.getBufferPool().getPage(tid,
                new HeapPageId(getId(),numPages() - 1),Permissions.READ_WRITE);
        p.insertTuple(t);
        pageList.add(p);
        return pageList;
    }

    // see DbFile.java for javadocs
    public ArrayList<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
            TransactionAbortedException {
        // some code goes here

        // not necessary for lab1
        ArrayList<Page> pageList = new ArrayList<>();
        HeapPage p = (HeapPage) Database.getBufferPool().getPage(tid,
                t.getRecordId().getPageId(),Permissions.READ_WRITE);
        p.deleteTuple(t);
        pageList.add(p);
        return pageList;
    }

    // see DbFile.java for javadocs
    public DbFileIterator iterator(TransactionId tid) {
        // some code goes here
        return new HeapFileIterator(this,tid);
    }

    private static final class HeapFileIterator implements DbFileIterator{
        private final HeapFile heapFile;
        private final TransactionId tid;
        private Iterator<Tuple> it;
        private int whichPage;

        public HeapFileIterator(HeapFile file,TransactionId tid){
            this.heapFile = file;
            this.tid = tid;
        }

        @Override
        public void open() throws DbException, TransactionAbortedException {
            whichPage = 0;
            it = getPageTuples(whichPage);
        }

        private Iterator<Tuple> getPageTuples(int pageNumber) throws DbException, TransactionAbortedException {
            if (pageNumber >= 0 && pageNumber < heapFile.numPages()){
                HeapPageId pid = new HeapPageId(heapFile.getId(),pageNumber);
                HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid,pid,Permissions.READ_ONLY);
                return page.iterator();
            } else {
                throw new DbException(String.format("heapfile %d does not contain page %d!", pageNumber,heapFile.getId()));
            }
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            if (it == null) {
                return false;
            }
            if (!it.hasNext()){
                while (whichPage < (heapFile.numPages() - 1)){
                    whichPage++;
                    it = getPageTuples(whichPage);
                    if (it.hasNext()) {
                        return it.hasNext();
                    }
                }
                if (whichPage >= (heapFile.numPages() - 1)) {
                  return false;
                }
            } else {
                return true;
            }
            return false;
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            if (it == null || !it.hasNext()){
                throw new NoSuchElementException();
            }
            return it.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            close();
            open();
        }

        @Override
        public void close() {
            it = null;
        }
    }

}

变量定义：
private final File file;//保存tuple的文件
private final TupleDesc td;
关键方法：
readPage、writePage：读或写页面到文件
insertTuple、deleteTuple：插入tuple或删除tuple到page，其中page从bufferpool获取

src/java/simpledb/storage/BufferPool.java

    public void insertTuple(TransactionId tid, int tableId, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for lab1
        DbFile f = Database.getCatalog().getDatabaseFile(tableId);
        updateBufferPool(f.insertTuple(tid,t),tid);
    }
 
    /**
     * Remove the specified tuple from the buffer pool.
     * Will acquire a write lock on the page the tuple is removed from and any
     * other pages that are updated. May block if the lock(s) cannot be acquired.
     *
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have 
     * been dirtied to the cache (replacing any existing versions of those pages) so 
     * that future requests see up-to-date pages. 
     *
     * @param tid the transaction deleting the tuple.
     * @param t the tuple to delete
     */
    public  void deleteTuple(TransactionId tid, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for lab1
        DbFile f = Database.getCatalog().getDatabaseFile(t.getRecordId().getPageId().getTableId());
        updateBufferPool(f.deleteTuple(tid,t),tid);
    }
 
    private void updateBufferPool(ArrayList<Page> pagelist,TransactionId tid) throws DbException{
        for(Page p:pagelist){
            p.markDirty(true,tid);
            // update bufferpool
            if(pageStore.size() > numPages)
                evictPage();
            pageStore.put(p.getId(),p);
        }
    }

ant runtest -Dtest=HeapPageWriteTest
ant runtest -Dtest=HeapFileWriteTest
ant runtest -Dtest=BufferPoolWriteTest
都BUILD SUCCESSFUL。

2.4. Insertion and deletion

src/java/simpledb/execution/Insert.java

package simpledb.execution;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Type;
import simpledb.storage.BufferPool;
import simpledb.storage.IntField;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.IOException;

/**
 * Inserts tuples read from the child operator into the tableId specified in the
 * constructor
 */
public class Insert extends Operator {

    private static final long serialVersionUID = 1L;

    private TransactionId tid;
    private OpIterator child;
    private int tableId;
    private final TupleDesc td;

    /**
     * helper for fetchNext
     * */
    private int counter;
    private boolean called;
    /**
     * Constructor.
     *
     * @param t
     *            The transaction running the insert.
     * @param child
     *            The child operator from which to read tuples to be inserted.
     * @param tableId
     *            The table in which to insert tuples.
     * @throws DbException
     *             if TupleDesc of child differs from table into which we are to
     *             insert.
     */
    public Insert(TransactionId t, OpIterator child, int tableId)
            throws DbException {
        // some code goes here
        if (!child.getTupleDesc().equals(Database.getCatalog().getTupleDesc(tableId))){
            throw new DbException("TupleDesc does not match!");
        }
        this.tid = t;
        this.child = child;
        this.tableId = tableId;
        this.td = new TupleDesc(new Type[]{Type.INT_TYPE},new String[]{"number of inserted tuples"});
        this.counter = -1;
        this.called = false;
    }

    public TupleDesc getTupleDesc() {
        // some code goes here
        return this.td;
    }

    public void open() throws DbException, TransactionAbortedException {
        // some code goes here
        this.counter = 0;
        this.child.open();
        super.open();
    }

    public void close() {
        // some code goes here
        super.close();
        this.child.close();
        this.counter = -1;
        this.called = false;
    }

    public void rewind() throws DbException, TransactionAbortedException {
        // some code goes here
        this.child.rewind();
        this.counter = 0;
        this.called = false;
    }

    /**
     * Inserts tuples read from child into the tableId specified by the
     * constructor. It returns a one field tuple containing the number of
     * inserted records. Inserts should be passed through BufferPool. An
     * instances of BufferPool is available via Database.getBufferPool(). Note
     * that insert DOES NOT need check to see if a particular tuple is a
     * duplicate before inserting it.
     *
     * @return A 1-field tuple containing the number of inserted records, or
     *         null if called more than once.
     * @see Database#getBufferPool
     * @see BufferPool#insertTuple
     */
    protected Tuple fetchNext() throws TransactionAbortedException, DbException {
        // some code goes here
        if (this.called){
            return null;
        }
        this.called = true;
        while (this.child.hasNext()){
            Tuple t = this.child.next();
            try {
                Database.getBufferPool().insertTuple(this.tid,this.tableId,t);
                this.counter++;
            } catch (IOException e){
                e.printStackTrace();
                break;
            }
        }
        Tuple tu = new Tuple(this.td);
        tu.setField(0,new IntField(this.counter));
        return tu;
    }

    @Override
    public OpIterator[] getChildren() {
        // some code goes here
        return new OpIterator[]{this.child};
    }

    @Override
    public void setChildren(OpIterator[] children) {
        // some code goes here
        this.child = children[0];
    }
}

方法定义：
private TransactionId tid;//事务id
private OpIterator child;//要插入tuple迭代器
private int tableId;//表id
private final TupleDesc td;//元组描述
private int counter;//计数器
private boolean called;//
关键方法：
fetchNext：调用bufferpool去insert元组。

在这里插入图片描述

src/java/simpledb/execution/Delete.java

package simpledb.execution;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Type;
import simpledb.storage.BufferPool;
import simpledb.storage.IntField;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.IOException;

/**
 * The delete operator. Delete reads tuples from its child operator and removes
 * them from the table they belong to.
 */
public class Delete extends Operator {

    private static final long serialVersionUID = 1L;

    private TransactionId tid;
    private OpIterator child;
    private final TupleDesc td;
    private int counter;
    private boolean called;
    /**
     * Constructor specifying the transaction that this delete belongs to as
     * well as the child to read from.
     * 
     * @param t
     *            The transaction this delete runs in
     * @param child
     *            The child operator from which to read tuples for deletion
     */
    public Delete(TransactionId t, OpIterator child) {
        // some code goes here
        this.tid = t;
        this.child = child;
        this.td = new TupleDesc(new Type[]{Type.INT_TYPE},new String[]{"number of deleted tuples"});
        this.counter = -1;
        this.called = false;
    }

    public TupleDesc getTupleDesc() {
        // some code goes here
        return this.td;
    }

    public void open() throws DbException, TransactionAbortedException {
        // some code goes here
        this.child.open();
        super.open();
        this.counter = 0;
    }

    public void close() {
        // some code goes here
        super.close();
        this.child.close();
        this.counter = -1;
    }

    public void rewind() throws DbException, TransactionAbortedException {
        // some code goes here
        this.child.rewind();
        this.counter = 0;
    }

    /**
     * Deletes tuples as they are read from the child operator. Deletes are
     * processed via the buffer pool (which can be accessed via the
     * Database.getBufferPool() method.
     * 
     * @return A 1-field tuple containing the number of deleted records.
     * @see Database#getBufferPool
     * @see BufferPool#deleteTuple
     */
    protected Tuple fetchNext() throws TransactionAbortedException, DbException {
        // some code goes here
        if (this.called){
            return null;
        }
        this.called = true;
        while (this.child.hasNext()){
            Tuple t = this.child.next();
            try{
                Database.getBufferPool().deleteTuple(this.tid,t);
                this.counter++;
            } catch (IOException e){
                e.printStackTrace();
                break;
            }
        }
        Tuple tu = new Tuple(this.td);
        tu.setField(0,new IntField(this.counter));
        return tu;
    }

    @Override
    public OpIterator[] getChildren() {
        // some code goes here
        return new OpIterator[]{this.child};
    }

    @Override
    public void setChildren(OpIterator[] children) {
        // some code goes here
        this.child = children[0];
    }

}

变量定义：
private TransactionId tid;//事务id
private OpIterator child;//要插入tuple迭代器
private int tableId;//表id
private final TupleDesc td;//元组描述
private int counter;//计数器
private boolean called;//
关键方法：
fetchNext：调用bufferpool去delete元组。

ant runtest -Dtest=InsertTest
单元测试 BUILD SUCCESSFUL。
ant runsystest -Dtest=InsertTest
ant runsystest -Dtest=DeleteTest
系统测试BUILD SUCCESSFUL。

2.5. Page eviction

src/java/simpledb/storage/BufferPool.java

package simpledb.storage;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Permissions;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/**
 * BufferPool manages the reading and writing of pages into memory from
 * disk. Access methods call into it to retrieve pages, and it fetches
 * pages from the appropriate location.
 * <p>
 * The BufferPool is also responsible for locking;  when a transaction fetches
 * a page, BufferPool checks that the transaction has the appropriate
 * locks to read/write the page.
 * 
 * @Threadsafe, all fields are final
 */
public class BufferPool {
    /** Bytes per page, including header. */
    private static final int DEFAULT_PAGE_SIZE = 4096;

    private static int pageSize = DEFAULT_PAGE_SIZE;
    
    /** Default number of pages passed to the constructor. This is used by
    other classes. BufferPool should use the numPages argument to the
    constructor instead. */
    public static final int DEFAULT_PAGES = 50;

    private final int numPages;
    private final ConcurrentHashMap<Integer,Page> pageStore;

    /**
     * Creates a BufferPool that caches up to numPages pages.
     *
     * @param numPages maximum number of pages in this buffer pool.
     */
    public BufferPool(int numPages) {
        // some code goes here
        this.numPages = numPages;
        pageStore = new ConcurrentHashMap<>();
    }
    
    public static int getPageSize() {
      return pageSize;
    }
    
    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void setPageSize(int pageSize) {
    	BufferPool.pageSize = pageSize;
    }
    
    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void resetPageSize() {
    	BufferPool.pageSize = DEFAULT_PAGE_SIZE;
    }

    /**
     * Retrieve the specified page with the associated permissions.
     * Will acquire a lock and may block if that lock is held by another
     * transaction.
     * <p>
     * The retrieved page should be looked up in the buffer pool.  If it
     * is present, it should be returned.  If it is not present, it should
     * be added to the buffer pool and returned.  If there is insufficient
     * space in the buffer pool, a page should be evicted and the new page
     * should be added in its place.
     *
     * @param tid the ID of the transaction requesting the page
     * @param pid the ID of the requested page
     * @param perm the requested permissions on the page
     */
    public  Page getPage(TransactionId tid, PageId pid, Permissions perm)
        throws TransactionAbortedException, DbException {
        // some code goes here
        if (!pageStore.containsKey(pid.hashCode())){
            if (pageStore.size() > numPages){
                evictPage();
            }
            DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
            Page page = dbFile.readPage(pid);
            pageStore.put(pid.hashCode(),page);
        }
        return pageStore.get(pid.hashCode());
    }

    /**
     * Releases the lock on a page.
     * Calling this is very risky, and may result in wrong behavior. Think hard
     * about who needs to call this and why, and why they can run the risk of
     * calling it.
     *
     * @param tid the ID of the transaction requesting the unlock
     * @param pid the ID of the page to unlock
     */
    public  void unsafeReleasePage(TransactionId tid, PageId pid) {
        // some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Release all locks associated with a given transaction.
     *
     * @param tid the ID of the transaction requesting the unlock
     */
    public void transactionComplete(TransactionId tid) {
        // some code goes here
        // not necessary for lab1|lab2
    }

    /** Return true if the specified transaction has a lock on the specified page */
    public boolean holdsLock(TransactionId tid, PageId p) {
        // some code goes here
        // not necessary for lab1|lab2
        return false;
    }

    /**
     * Commit or abort a given transaction; release all locks associated to
     * the transaction.
     *
     * @param tid the ID of the transaction requesting the unlock
     * @param commit a flag indicating whether we should commit or abort
     */
    public void transactionComplete(TransactionId tid, boolean commit) {
        // some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Add a tuple to the specified table on behalf of transaction tid.  Will
     * acquire a write lock on the page the tuple is added to and any other 
     * pages that are updated (Lock acquisition is not needed for lab2). 
     * May block if the lock(s) cannot be acquired.
     * 
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have 
     * been dirtied to the cache (replacing any existing versions of those pages) so 
     * that future requests see up-to-date pages. 
     *
     * @param tid the transaction adding the tuple
     * @param tableId the table to add the tuple to
     * @param t the tuple to add
     */
    public void insertTuple(TransactionId tid, int tableId, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for lab1
        DbFile f = Database.getCatalog().getDatabaseFile(tableId);
        updateBufferPool(f.insertTuple(tid,t),tid);
    }



    /**
     * Remove the specified tuple from the buffer pool.
     * Will acquire a write lock on the page the tuple is removed from and any
     * other pages that are updated. May block if the lock(s) cannot be acquired.
     *
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have 
     * been dirtied to the cache (replacing any existing versions of those pages) so 
     * that future requests see up-to-date pages. 
     *
     * @param tid the transaction deleting the tuple.
     * @param t the tuple to delete
     */
    public  void deleteTuple(TransactionId tid, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for lab1
        DbFile f = Database.getCatalog().getDatabaseFile(t.getRecordId().getPageId().getTableId());
        updateBufferPool(f.deleteTuple(tid,t),tid);
    }

    private void updateBufferPool(List<Page> pageList, TransactionId tid) throws DbException {
        for (Page p:pageList) {
            p.markDirty(true,tid);
            if (pageStore.size() > numPages){
                evictPage();
            }
            pageStore.put(p.getId().getPageNumber(),p);
        }

    }

    /**
     * Flush all dirty pages to disk.
     * NB: Be careful using this routine -- it writes dirty data to disk so will
     *     break simpledb if running in NO STEAL mode.
     */
    public synchronized void flushAllPages() throws IOException {
        // some code goes here
        // not necessary for lab1
        for (Page p:pageStore.values()) {
            flushPage(p.getId());
        }
    }

    /** Remove the specific page id from the buffer pool.
        Needed by the recovery manager to ensure that the
        buffer pool doesn't keep a rolled back page in its
        cache.
        
        Also used by B+ tree files to ensure that deleted pages
        are removed from the cache so they can be reused safely
    */
    public synchronized void discardPage(PageId pid) {
        // some code goes here
        // not necessary for lab1
        pageStore.remove(pid.hashCode());
    }

    /**
     * Flushes a certain page to disk
     * @param pid an ID indicating the page to flush
     */
    private synchronized  void flushPage(PageId pid) throws IOException {
        // some code goes here
        // not necessary for lab1
        Page p = pageStore.get(pid.hashCode());
        TransactionId tid = null;
        if ((tid = p.isDirty()) != null){
            Database.getLogFile().logWrite(tid,p.getBeforeImage(),p);
            Database.getLogFile().force();
            Database.getCatalog().getDatabaseFile(pid.getTableId()).writePage(p);
            p.markDirty(false,null);
        }
    }

    /** Write all pages of the specified transaction to disk.
     */
    public synchronized  void flushPages(TransactionId tid) throws IOException {
        // some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Discards a page from the buffer pool.
     * Flushes the page to disk to ensure dirty pages are updated on disk.
     */
    private synchronized  void evictPage() throws DbException {
        // some code goes here
        // not necessary for lab1
        PageId pid = new ArrayList<>(pageStore.values()).get(0).getId();
        try {
            flushPage(pid);
        } catch (IOException e){
            e.printStackTrace();
        }
        discardPage(pid);
    }

}

变量定义：
private static final int DEFAULT_PAGE_SIZE = 4096;//默认页大小
private static int pageSize = DEFAULT_PAGE_SIZE;//默认页大小
public static final int DEFAULT_PAGES = 50;//默认页数目
private final int numPages;//实际页数目
private final ConcurrentHashMap<Integer,Page> pageStore;//页map
关键方法：
flushPage：刷新页面到磁盘，把脏页刷新。
evictPage：淘汰缓冲池里的页面。

ant runsystest -Dtest=EvictionTest
系统测试BUILD SUCCESSFUL。

weixin_Akian

关注

0
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
6.830 / 6.814: Syllabus 2021 - MIT Lab 2 - SimpleDB Operators

在此实验中，您将为SimpleDB编写一组运算符，以实现表修改（例如，插入和删除记录），选择，连接和聚合。这些将构建在实验1中写入的基础之上，为您提供一个可以在多个表上执行简单查询的数据库系统。此外，我们忽略了实验室中的缓冲池管理问题：我们没有处理我们引用更多页面时出现的问题，而不是在数据库的生命周期中符合内存。在Lab 2中，您将设计一个驱动策略来从缓冲池中刷新陈旧页面。您无需在此实验室中实施事务或锁定。本文档的其余部分给出了一些关于如何开始编码的建议，描述了一组练习，以帮助您通过实验室工作，并
复制链接

扫一扫