Lab 2 Implementing Table Operators
Exercise 1 Implementing Filter and Join
-
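Predicate.java is simple: it compares one field of a tuple against a supplied value. The constructor stores three member variables (the field index, the comparison operator, and the operand to compare against); after that, only the getters and the filter function (which returns true or false) need to be filled in.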
public Predicate(int field, Op op, Field operand) {
    this.field = field;
    this.op = op;
    this.operand = operand;
}

public boolean filter(Tuple t) {
    // Compare the tuple's field against the stored operand.
    return t.getField(field).compare(op, operand);
}
-
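JoinPredicate.java does the same job as Predicate.java, except that it compares a field of one tuple with a field of another tuple instead of with a constant. Both rely on the getField method.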
public JoinPredicate(int field1, Predicate.Op op, int field2) {
    this.field1 = field1;
    this.field2 = field2;
    this.op = op;
}

public boolean filter(Tuple t1, Tuple t2) {
    // Compare field1 of the first tuple with field2 of the second.
    return t1.getField(field1).compare(op, t2.getField(field2));
}
-
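Filter.java puts Predicate.java to use. The constructor stores the Predicate and the child OpIterator; fetchNext reads tuples from the OpIterator one at a time, tests each against the Predicate, and returns the first tuple for which the test is true.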
private final Predicate p;
private OpIterator child;

public Filter(Predicate p, OpIterator child) {
    this.p = p;
    this.child = child;
}

protected Tuple fetchNext() throws NoSuchElementException,
        TransactionAbortedException, DbException {
    while (child.hasNext()) {
        Tuple t = child.next();
        if (p.filter(t)) {
            return t;
        }
    }
    return null; // child exhausted, no more matching tuples
}
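The same pull-and-test loop can be tried outside SimpleDB. What follows is only a minimal sketch, assuming a plain java.util.Iterator in place of OpIterator and java.util.function.Predicate in place of SimpleDB's Predicate; the class FilterSketch and its collect helper are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for Filter: wraps a child iterator and yields only matching rows.
class FilterSketch<T> {
    private final Predicate<T> p;
    private final Iterator<T> child;

    FilterSketch(Predicate<T> p, Iterator<T> child) {
        this.p = p;
        this.child = child;
    }

    // Mirrors fetchNext: returns the next matching row, or null once the child is exhausted.
    T fetchNext() {
        while (child.hasNext()) {
            T t = child.next();
            if (p.test(t)) {
                return t;
            }
        }
        return null;
    }

    // Convenience helper: drains the operator into a list.
    static <T> List<T> collect(Predicate<T> p, Iterator<T> child) {
        FilterSketch<T> f = new FilterSketch<>(p, child);
        List<T> out = new ArrayList<>();
        for (T t = f.fetchNext(); t != null; t = f.fetchNext()) {
            out.add(t);
        }
        return out;
    }
}
```

Returning null as the end-of-stream signal follows the fetchNext convention above, so the sketch only works for inputs that contain no null rows.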
-
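Join.java puts JoinPredicate.java to use. The constructor stores the JoinPredicate and the two OpIterators. The class also implements a series of getters and the iterator functions such as open and close, and finally fetchNext, which finds tuples from the two iterators that can be joined and joins them.
fetchNext traverses with two nested while loops until the outer iterator is exhausted. Each tuple taken from child1 is tested with filter against the tuples of child2 until one matches. A new TupleDesc is then created, the fields of the child1 tuple and the child2 tuple are copied into newTuple, and newTuple is returned. Whenever child2 runs out, it is rewound to the start and the next tuple is taken from child1.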
protected Tuple fetchNext() throws TransactionAbortedException, DbException {
    // this.t holds the current outer tuple between calls.
    while (this.child1.hasNext() || this.t != null) {
        if (this.child1.hasNext() && this.t == null) {
            t = child1.next();
        }
        while (child2.hasNext()) {
            Tuple t2 = child2.next();
            if (p.filter(t, t2)) {
                TupleDesc td1 = t.getTupleDesc();
                TupleDesc td2 = t2.getTupleDesc();
                TupleDesc newTd = TupleDesc.merge(td1, td2);
                Tuple newTuple = new Tuple(newTd);
                newTuple.setRecordId(t.getRecordId());
                int i = 0;
                for (; i < td1.numFields(); ++i)
                    newTuple.setField(i, t.getField(i));
                for (int j = 0; j < td2.numFields(); ++j)
                    newTuple.setField(i + j, t2.getField(j));
                // Inner side exhausted: rewind it and move on to the next outer tuple.
                if (!child2.hasNext()) {
                    child2.rewind();
                    t = null;
                }
                return newTuple;
            }
        }
        // No further match for this outer tuple.
        child2.rewind();
        t = null;
    }
    return null;
}
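To see the nested-loop logic without the state that fetchNext has to carry between calls, here is a minimal sketch. It assumes tuples are plain int[] arrays, both inputs fit in memory as lists, and the join condition is equality; NestedLoopJoinSketch and its parameters are hypothetical names, not part of SimpleDB. Unlike fetchNext, it materialises every result at once.

```java
import java.util.ArrayList;
import java.util.List;

class NestedLoopJoinSketch {
    // Joins rows where left[leftField] == right[rightField] (equality join only).
    static List<int[]> join(List<int[]> left, int leftField,
                            List<int[]> right, int rightField) {
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {          // outer loop: one pass over the first input
            for (int[] r : right) {     // inner loop: full rescan of the second input
                if (l[leftField] == r[rightField]) {
                    // Concatenate the fields of both rows, left fields first.
                    int[] merged = new int[l.length + r.length];
                    System.arraycopy(l, 0, merged, 0, l.length);
                    System.arraycopy(r, 0, merged, l.length, r.length);
                    out.add(merged);
                }
            }
        }
        return out;
    }
}
```

The inner for loop restarting for every outer row plays the role of child2.rewind() in the iterator version.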
Exercise 2 Implementing Aggregates
-
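IntegerAggregator.java handles the case where the values being aggregated are integers. It first stores the index of the group-by field, the index of the field to aggregate, the aggregation operator, and the type of the group-by field. Only String and Integer group-by types are supported for now.
Depending on gbfieldtype, it extracts the group value and the aggregate value from the tuple as shown below and puts them in a HashMap (keyed by the group value, with a List as the value); the aggregation result is kept up to date by operating on this map. For MIN, MAX, SUM and COUNT the list holds a single running value, while for AVG it holds every value seen so that the average can be computed at the end.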
public void mergeTupleIntoGroup(Tuple tup) {
    if (gbfield == Aggregator.NO_GROUPING) {
        // No grouping: a single list holds the running aggregate.
        List<Integer> temp = (List<Integer>) groupValAndAggregateVals;
        int value = ((IntField) tup.getField(afield)).getValue();
        update(temp, value);
    } else if (gbfieldtype == Type.INT_TYPE) {
        HashMap<Integer, List<Integer>> map =
                (HashMap<Integer, List<Integer>>) groupValAndAggregateVals;
        int group = ((IntField) tup.getField(gbfield)).getValue();
        int value = ((IntField) tup.getField(afield)).getValue();
        List<Integer> temp;
        if (map.containsKey(group)) {
            temp = map.get(group);
        } else {
            temp = new ArrayList<>();
            map.put(group, temp);
        }
        update(temp, value);
    } else if (gbfieldtype == Type.STRING_TYPE) {
        HashMap<String, List<Integer>> map =
                (HashMap<String, List<Integer>>) groupValAndAggregateVals;
        String group = ((StringField) tup.getField(gbfield)).getValue();
        int value = ((IntField) tup.getField(afield)).getValue();
        List<Integer> temp;
        if (map.containsKey(group)) {
            temp = map.get(group);
        } else {
            temp = new ArrayList<>();
            map.put(group, temp);
        }
        update(temp, value);
    }
}

// Folds one value into the list; slot 0 is the running result except for AVG.
private void update(List<Integer> l, int value) {
    if (op == Op.MIN) {
        if (l.size() == 0) l.add(value);
        else l.set(0, Math.min(l.get(0), value));
    } else if (op == Op.MAX) {
        if (l.size() == 0) l.add(value);
        else l.set(0, Math.max(l.get(0), value));
    } else if (op == Op.COUNT) {
        if (l.size() == 0) l.add(1);
        else l.set(0, l.get(0) + 1);
    } else if (op == Op.SUM) {
        if (l.size() == 0) l.add(value);
        else l.set(0, l.get(0) + value);
    } else if (op == Op.AVG) {
        // Keep every value; the average is computed when iterating.
        l.add(value);
    }
}
Next comes the iterator; once the mergeTupleIntoGroup function above is clear, this part is too. One line deserves attention:
List<Integer> temp = (List<Integer>) groupValAndAggregateVals;
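At first I could not see why this was done: I assumed that working on temp would leave groupValAndAggregateVals untouched. After some searching I learned that this is simply how references work in Java. groupValAndAggregateVals is declared as Object, and the cast produces a second reference to the same object rather than a copy, so changes made through temp are visible through groupValAndAggregateVals (this is a part of the JVM I still need to read up on).
The iterator then fills in the output fields according to the aggregation operator and finally returns an iterator over the result tuples.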
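The aliasing behaviour is easy to check in isolation. The following is a small sketch under the assumption that the state is an ArrayList stored in a variable of type Object; CastAliasSketch and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CastAliasSketch {
    @SuppressWarnings("unchecked")
    static boolean demo() {
        // Declared as Object, like groupValAndAggregateVals in the text.
        Object state = new ArrayList<Integer>();

        // The cast yields another reference to the same list, not a copy.
        List<Integer> temp = (List<Integer>) state;
        temp.add(42);

        // Reading through the original variable shows the element added via temp.
        List<Integer> viaState = (List<Integer>) state;
        return temp == state && viaState.size() == 1 && viaState.get(0) == 42;
    }
}
```

A cast only changes the static type the compiler sees; no new object is created, which is why updating temp is enough.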
private List<Tuple> tuples;
private Iterator<Tuple> tupleIterator;

public IntegerAggregatorIterator() {
    tuples = new ArrayList<>();
    if (gbfield == Aggregator.NO_GROUPING) {
        // Single result tuple with one field: the aggregate value.
        List<Integer> temp = (List<Integer>) groupValAndAggregateVals;
        Tuple t = new Tuple(getTupleDesc());
        int aggregateVal = 0;
        if (op == Op.AVG) {
            for (int v : temp) {
                aggregateVal = aggregateVal + v;
            }
            aggregateVal = aggregateVal / temp.size();
            t.setField(0, new IntField(aggregateVal));
        } else {
            t.setField(0, new IntField(temp.get(0)));
        }
        tuples.add(t);
    } else if (gbfieldtype == Type.INT_TYPE) {
        // One (groupVal, aggregateVal) tuple per integer group.
        HashMap<Integer, List<Integer>> temp =
                (HashMap<Integer, List<Integer>>) groupValAndAggregateVals;
        for (int key : temp.keySet()) {
            Tuple t = new Tuple(getTupleDesc());
            int groupVal = key;
            int aggregateVal = 0;
            List<Integer> l = temp.get(groupVal);
            if (op == Op.AVG) {
                for (int v : l)
                    aggregateVal = aggregateVal + v;
                aggregateVal = aggregateVal / l.size();
                t.setField(0, new IntField(groupVal));
                t.setField(1, new IntField(aggregateVal));
            } else {
                t.setField(0, new IntField(groupVal));
                t.setField(1, new IntField(l.get(0)));
            }
            tuples.add(t);
        }
    } else if (gbfieldtype == Type.STRING_TYPE) {
        // One (groupVal, aggregateVal) tuple per string group.
        HashMap<String, List<Integer>> temp =
                (HashMap<String, List<Integer>>) groupValAndAggregateVals;
        for (String key : temp.keySet()) {
            Tuple t = new Tuple(getTupleDesc());
            String groupVal = key;
            int aggregateVal = 0;
            List<Integer> l = temp.get(groupVal);
            if (op == Op.AVG) {
                for (int v : l)
                    aggregateVal = aggregateVal + v;
                aggregateVal = aggregateVal / l.size();
                t.setField(0, new StringField(groupVal, groupVal.length()));
                t.setField(1, new IntField(aggregateVal));
            } else {
                t.setField(0, new StringField(groupVal, groupVal.length()));
                t.setField(1, new IntField(l.get(0)));
            }
            tuples.add(t);
        }
    }
}
-
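StringAggregator.java is simpler: because it aggregates String values, COUNT is the only supported operation. As before, it stores the index of the group-by field, the index of the field to aggregate, the aggregation operator, and the type of the group-by field, plus a groupMap that holds the groups.
Without grouping it returns just the total count; with grouping it returns each group value together with the number of tuples in that group.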
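The counting itself can be sketched in a few lines. This is only an illustration, assuming group values are plain Strings and that a single class tracks both the grouped and the ungrouped count; CountAggregatorSketch and its methods are hypothetical names, not the StringAggregator API.

```java
import java.util.HashMap;
import java.util.Map;

class CountAggregatorSketch {
    // Group value -> number of rows seen for that group.
    private final Map<String, Integer> groupMap = new HashMap<>();
    // Number of rows seen overall (the result when there is no grouping).
    private int total = 0;

    // Analogous to mergeTupleIntoGroup: account for one more row in the given group.
    void merge(String group) {
        groupMap.merge(group, 1, Integer::sum);
        total++;
    }

    int countAll() {
        return total;
    }

    Map<String, Integer> countsByGroup() {
        return groupMap;
    }
}
```

Map.merge inserts 1 for a new group and otherwise adds 1 to the stored count, which replaces the containsKey / put pattern used in the integer aggregator.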
-
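Aggregate.java ties IntegerAggregator.java and StringAggregator.java together. Given a stream of input tuples, it inspects the field type (gfieldType in the code) to decide between the integer aggregator and the string aggregator; since IntegerAggregator reads its values as IntField, the choice has to follow the type of the field being aggregated. Once the aggregator object is created, the tuples are passed in one by one through mergeTupleIntoGroup, and an iterator over the results is then produced.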
Exercise 3 HeapFile Mutability (modifying tables)
-
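HeapPage.java represents a single page of a single table. Two methods are implemented in it: deleteTuple and insertTuple.
deleteTuple first reads the page ID and slot number from the tuple's RecordId. If the tuple does not belong to this page, or its slot is already empty, an exception is thrown; otherwise the entry at that slot is set to null and the slot is marked as unused.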
public void deleteTuple(Tuple t) throws DbException {
    if (t == null) {
        throw new DbException("tuple is null!");
    }
    HeapPageId heapPageId = (HeapPageId) t.getRecordId().getPageId();
    int tupleNum = t.getRecordId().getTupleNumber();
    if (!heapPageId.equals(pid) || !isSlotUsed(tupleNum)) {
        throw new DbException("this tuple is not on this page, or tuple slot is already empty");
    }
    // Clear the slot and mark it free in the header.
    tuples[tupleNum] = null;
    markSlotUsed(tupleNum, false);
}
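insertTuple first checks that the TupleDesc of the tuple being inserted matches that of the table, then checks that the page still has an empty slot. If both hold, it finds the first empty slot and stores the tuple there.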
public void insertTuple(Tuple t) throws DbException {
    if (!t.getTupleDesc().equals(td)) {
        throw new DbException("Can't insert! TupleDesc is not equal!");
    }
    if (getNumEmptySlots() == 0) {
        throw new DbException("page is full!");
    }
    for (int i = 0; i < getNumTuples(); i++) {
        if (!isSlotUsed(i)) {
            tuples[i] = t;
            // Update the tuple's RecordId to show that it now lives on this page.
            t.setRecordId(new RecordId(pid, i));
            markSlotUsed(i, true);
            return;
        }
    }
}
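Both methods lean on isSlotUsed and markSlotUsed, which are not shown here. As a rough sketch of how such a slot header can work, the class below keeps one bit per slot in a byte array, assuming the least significant bit of each byte stands for the lowest-numbered slot. SlotBitmapSketch is a hypothetical class that tracks slot occupancy only and stores no tuples.

```java
class SlotBitmapSketch {
    private final byte[] header;   // one bit per slot; bit set = slot in use
    private final int numSlots;

    SlotBitmapSketch(int numSlots) {
        this.numSlots = numSlots;
        this.header = new byte[(numSlots + 7) / 8]; // round up to whole bytes
    }

    boolean isSlotUsed(int i) {
        return ((header[i / 8] >> (i % 8)) & 1) == 1;
    }

    void markSlotUsed(int i, boolean used) {
        if (used) {
            header[i / 8] |= (byte) (1 << (i % 8));
        } else {
            header[i / 8] &= (byte) ~(1 << (i % 8));
        }
    }

    // Claims the first free slot and returns its index, or -1 if the page is full.
    int insert() {
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(i)) {
                markSlotUsed(i, true);
                return i;
            }
        }
        return -1;
    }

    // Frees a slot; refuses to free a slot that is already empty.
    void delete(int i) {
        if (!isSlotUsed(i)) {
            throw new IllegalStateException("slot is already empty");
        }
        markSlotUsed(i, false);
    }
}
```

Because insert always scans from slot 0, a slot freed by delete is the first one to be reused, which matches the first-empty-slot search in insertTuple above.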
-
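HeapFile.java is the low-level representation of the data on disk; pages are read from disk into the BufferPool so that later operations can work on them. This exercise requires two functions: insertTuple and deleteTuple.
insertTuple has to return the pages it modified. It walks through every page of the current HeapFile, fetching each one through the BufferPool, and checks for an empty slot; if one is found, the tuple is inserted and that page is returned. If no page has room, a new HeapPage is created and written to disk, then fetched through the BufferPool and used for the insert, and that HeapPage is returned.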
public List<Page> insertTuple(TransactionId tid, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
    List<Page> affectedPages = new ArrayList<>();
    // Look for an existing page with a free slot.
    for (int i = 0; i < numPages(); i++) {
        PageId pageId = new HeapPageId(getId(), i);
        HeapPage heapPage = (HeapPage) Database.getBufferPool()
                .getPage(tid, pageId, Permissions.READ_WRITE);
        if (heapPage.getNumEmptySlots() > 0) {
            heapPage.insertTuple(t);
            heapPage.markDirty(true, tid);
            affectedPages.add(heapPage);
            break;
        }
    }
    if (affectedPages.size() == 0) {
        HeapPageId heapPageId = new HeapPageId(getId(), numPages());
        HeapPage blankPage = new HeapPage(heapPageId, HeapPage.createEmptyPageData());
        // Write the empty page to disk so that the file grows by one page.
        writePage(blankPage);
        // Access the new page through the BufferPool.
        HeapPage newPage = (HeapPage) Database.getBufferPool()
                .getPage(tid, heapPageId, Permissions.READ_WRITE);
        newPage.insertTuple(t);
        newPage.markDirty(true, tid);
        affectedPages.add(newPage);
    }
    return affectedPages;
}
deleteTuple is simpler: fetch the page identified by the tuple's PageId through the BufferPool and delete the tuple from it.
public List<Page> deleteTuple(TransactionId tid, Tuple t)
        throws DbException, TransactionAbortedException {
    List<Page> affectedPages = new ArrayList<>();
    PageId pageId = t.getRecordId().getPageId();
    for (int i = 0; i < numPages(); i++) {
        if (i == pageId.getPageNumber()) {
            HeapPage heapPage = (HeapPage) Database.getBufferPool()
                    .getPage(tid, pageId, Permissions.READ_WRITE);
            heapPage.deleteTuple(t);
            heapPage.markDirty(true, tid);
            affectedPages.add(heapPage);
        }
    }
    return affectedPages;
}
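BufferPool.java needs a few changes as well. Building on the HeapFile methods above, insertTuple and deleteTuple both use the tableId to look up the DbFile to operate on through Database, and then put the pages it returns back into pageStore.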
public void insertTuple(TransactionId tid, int tableId, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
    DbFile f = Database.getCatalog().getDatabaseFile(tableId);
    updateBufferPool(f.insertTuple(tid, t), tid);
}

public void deleteTuple(TransactionId tid, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
    DbFile f = Database.getCatalog()
            .getDatabaseFile(t.getRecordId().getPageId().getTableId());
    updateBufferPool(f.deleteTuple(tid, t), tid);
}

private void updateBufferPool(List<Page> pagelist, TransactionId tid) throws DbException {
    for (Page p : pagelist) {
        p.markDirty(true, tid);
        // Make room only when adding a new page would exceed the pool's capacity.
        if (!pageStore.containsKey(p.getId()) && pageStore.size() >= numPages)
            evictPage();
        pageStore.put(p.getId(), p);
    }
}
Exercise 4 Implementing the Insertion and Deletion Operators
-
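Insert.java inserts tuples into a table, which is straightforward with the insertTuple method of BufferPool shown above. The constructor stores the transaction ID, the child tuple iterator, and the ID of the table to insert into.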
public Insert(TransactionId t, OpIterator child, int tableId) throws DbException {
    if (!child.getTupleDesc().equals(Database.getCatalog().getTupleDesc(tableId))) {
        throw new DbException("TupleDesc does not match!");
    }
    this.tid = t;
    this.child = child;
    this.tableId = tableId;
    // The result is a single one-column tuple holding the insert count.
    this.td = new TupleDesc(new Type[]{Type.INT_TYPE},
            new String[]{"number of inserted tuples"});
    // Placeholder: must be reset to 0 (e.g. in open(), not shown) before
    // fetchNext runs, or the reported count is one too low.
    this.counter = -1;
    this.called = false;
}
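Next comes fetchNext. It walks the child iterator and inserts each tuple through the insertTuple method of BufferPool, counting as it goes. It then returns a single tuple whose TupleDesc is the one-column INT_TYPE descriptor built in the constructor and whose field 0 is set to counter. Any later call returns null.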
protected Tuple fetchNext() throws TransactionAbortedException, DbException {
    // The insert runs only once; later calls return null.
    if (this.called)
        return null;
    this.called = true;
    while (this.child.hasNext()) {
        Tuple t = this.child.next();
        try {
            Database.getBufferPool().insertTuple(this.tid, this.tableId, t);
            this.counter++;
        } catch (IOException e) {
            e.printStackTrace();
            break;
        }
    }
    Tuple tu = new Tuple(this.td);
    tu.setField(0, new IntField(this.counter));
    return tu;
}
-
Delete.java is much like Insert.java: the constructor and fetchNext are nearly identical.
public Delete(TransactionId t, OpIterator child) {
    this.tid = t;
    this.child = child;
    this.td = new TupleDesc(new Type[]{Type.INT_TYPE},
            new String[]{"number of deleted tuples"});
    // Placeholder: must be reset to 0 (e.g. in open(), not shown) before
    // fetchNext runs, or the reported count is one too low.
    this.counter = -1;
    this.called = false;
}

protected Tuple fetchNext() throws TransactionAbortedException, DbException {
    // The delete runs only once; later calls return null.
    if (this.called)
        return null;
    this.called = true;
    while (this.child.hasNext()) {
        Tuple t = this.child.next();
        try {
            Database.getBufferPool().deleteTuple(this.tid, t);
            this.counter++;
        } catch (IOException e) {
            e.printStackTrace();
            break;
        }
    }
    Tuple tu = new Tuple(this.td);
    tu.setField(0, new IntField(this.counter));
    return tu;
}
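Insert and Delete share one shape: consume the whole child on the first call, apply an action to every row, report the number of rows processed, then return null. Here is a minimal sketch of that shape, assuming a java.util.Iterator as the child and a java.util.function.Consumer as the action; CountingSinkSketch is a hypothetical name, and the count is returned as an Integer rather than wrapped in a tuple.

```java
import java.util.Iterator;
import java.util.function.Consumer;

class CountingSinkSketch<T> {
    private final Iterator<T> child;
    private final Consumer<T> action;   // e.g. "insert this row" or "delete this row"
    private boolean called = false;

    CountingSinkSketch(Iterator<T> child, Consumer<T> action) {
        this.child = child;
        this.action = action;
    }

    // First call: processes every row and returns how many were processed.
    // Every later call returns null, signalling that the operator is done.
    Integer fetchNext() {
        if (called) {
            return null;
        }
        called = true;
        int count = 0;                  // starts at 0 so the result equals the row count
        while (child.hasNext()) {
            action.accept(child.next());
            count++;
        }
        return count;
    }
}
```

With an empty child the first call returns 0 rather than null, so a caller can tell "nothing to do" apart from "already done".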
Exercise 5 Implementing Page Eviction
-
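The eviction policy in BufferPool.java is simple because it is entirely up to you: always evicting the same position works, and so does first-in first-out. The only things to remember are to evict when the pool has too many pages, to flush the victim page to disk before discarding it, and then to put the incoming page into the BufferPool.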
private void updateBufferPool(List<Page> pagelist, TransactionId tid) throws DbException {
    for (Page p : pagelist) {
        p.markDirty(true, tid);
        // Make room only when adding a new page would exceed the pool's capacity.
        if (!pageStore.containsKey(p.getId()) && pageStore.size() >= numPages)
            evictPage();
        pageStore.put(p.getId(), p);
    }
}

private synchronized void evictPage() throws DbException {
    // Pick the first key the map returns as the victim.
    PageId pid = new ArrayList<>(pageStore.keySet()).get(0);
    try {
        flushPage(pid);
    } catch (IOException e) {
        e.printStackTrace();
    }
    discardPage(pid);
}

private synchronized void flushPage(PageId pid) throws IOException {
    Page p = pageStore.get(pid);
    TransactionId tid = null;
    // Flush the page only if it is dirty.
    if ((tid = p.isDirty()) != null) {
        Database.getLogFile().logWrite(tid, p.getBeforeImage(), p);
        Database.getLogFile().force();
        // Write to disk.
        Database.getCatalog().getDatabaseFile(pid.getTableId()).writePage(p);
        p.markDirty(false, null);
    }
}

public synchronized void discardPage(PageId pid) {
    pageStore.remove(pid);
}
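For the first-in first-out option mentioned above, one possible sketch uses a LinkedHashMap, which iterates its keys in insertion order, so the first key is always the oldest page. Everything here is a simplification for illustration: FifoPoolSketch is a hypothetical class, pages are plain Strings keyed by an int ID, and the "disk" is just another map.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class FifoPoolSketch {
    private final int capacity;
    // LinkedHashMap keeps insertion order, so its first key is the oldest page.
    private final LinkedHashMap<Integer, String> pool = new LinkedHashMap<>();
    private final Map<Integer, Boolean> dirty = new HashMap<>();
    // Stand-in for the disk: receives dirty pages when they are evicted.
    final Map<Integer, String> disk = new HashMap<>();

    FifoPoolSketch(int capacity) {
        this.capacity = capacity;
    }

    void put(int pageId, String data, boolean isDirty) {
        // Evict only if a new page would push the pool past its capacity.
        if (!pool.containsKey(pageId) && pool.size() >= capacity) {
            evict();
        }
        pool.put(pageId, data);
        dirty.put(pageId, isDirty);
    }

    private void evict() {
        Integer victim = pool.keySet().iterator().next();  // oldest inserted page
        if (dirty.get(victim)) {
            disk.put(victim, pool.get(victim));             // flush before discarding
        }
        pool.remove(victim);
        dirty.remove(victim);
    }

    boolean contains(int pageId) {
        return pool.containsKey(pageId);
    }

    int size() {
        return pool.size();
    }
}
```

Updating a page that is already cached does not change its position in a LinkedHashMap, so this stays strictly first-in first-out rather than drifting towards least-recently-used.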
References:
https://blog.csdn.net/hjw199666/category_9588041.html Special thanks to hjw199666, who gave me a great deal of code guidance on my way through 6.830; much of my code is adapted from his.
https://www.zhihu.com/people/zhi-yue-zhang-42/posts