Lab 2 consists of 5 exercises:
exercise1: implement filtering and joining, i.e. the Filter and Join operators.
exercise2: implement aggregation, i.e. the Aggregate operator. Aggregating Integer fields must support MAX, MIN, COUNT, SUM, and AVG; aggregating String fields only requires COUNT.
exercise3: implement the methods that modify tables, adding and removing tuples at the level of a single page and of a file.
exercise4: implement the Insert and Delete operators, built on the methods from exercise3.
exercise5: implement the page-eviction policy in BufferPool.
exercise 1
Implement Predicate, JoinPredicate, Filter, and Join.
Predicate class
A helper class for Filter that decides whether a single tuple satisfies a condition: it compares a field of the tuple against a specified operand field. The supported comparisons are ==, >=, <=, >, <, != and LIKE (mainly for strings).
Fields:
Field opField: the operand field to compare against; passed as an argument to Field.compare()
int fieldNo: the index of the tuple field that is compared with the operand
Op op: the comparison operator to apply; passed as an argument to Field.compare()
public enum Op implements Serializable {
    // ==  >  <  <=  >=  LIKE (mainly for strings)  !=
    EQUALS, GREATER_THAN, LESS_THAN, LESS_THAN_OR_EQ, GREATER_THAN_OR_EQ, LIKE, NOT_EQUALS;

    public static Op getOp(int i) {
        return values()[i];
    }

    public String toString() {
        if (this == EQUALS)
            return "=";
        if (this == GREATER_THAN)
            return ">";
        if (this == LESS_THAN)
            return "<";
        if (this == LESS_THAN_OR_EQ)
            return "<=";
        if (this == GREATER_THAN_OR_EQ)
            return ">=";
        if (this == LIKE)
            return "LIKE";
        if (this == NOT_EQUALS)
            return "<>";
        throw new IllegalStateException("impossible to reach here");
    }
}
Methods:
public Predicate(int field, Op op, Field operand): constructor
public int getField(): returns fieldNo
public Op getOp(): returns op
public Field getOperand(): returns opField
public boolean filter(Tuple t): compares tuple t against the operand

public boolean filter(Tuple t) {
    // some code goes here
    Field field = t.getField(fieldNo);
    return field.compare(op, opField);
}

field.compare(op, opField) compares field with opField using operator op.
Filter class
Filter is a subclass of Operator. Using the result of a Predicate check, it keeps only the tuples that satisfy the condition, implementing clauses such as where age > 18.
Fields:
Predicate predicate: wraps the Predicate used to filter each individual tuple
OpIterator child: the iterator over the tuples to be filtered
Methods:
public Filter(Predicate p, OpIterator child): constructor
public Predicate getPredicate(): returns predicate
public TupleDesc getTupleDesc(): returns the schema of the tuples being filtered; child.getTupleDesc() suffices
public void open(): since Filter is a subclass of the project's Operator class, it must call super.open()
public void close(): closes both child and super
protected Tuple fetchNext(): returns the next tuple that passes the filter
protected Tuple fetchNext() throws NoSuchElementException,
TransactionAbortedException, DbException {
// some code goes here
while (child.hasNext()){
Tuple tuple = child.next();
if (predicate.filter(tuple)){
return tuple;
}
}
return null;
}
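The fetchNext loop above can be tried outside SimpleDB with a minimal standalone sketch. FilterSketch, with Integer standing in for Tuple and java.util.function.IntPredicate standing in for Predicate, is an invented illustration, not SimpleDB's actual classes:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.IntPredicate;

// Standalone sketch of Filter.fetchNext(): pull "tuples" from the child
// iterator and return the first one the predicate accepts, or null when
// the child is exhausted.
public class FilterSketch {
    private final Iterator<Integer> child;
    private final IntPredicate predicate;

    public FilterSketch(Iterator<Integer> child, IntPredicate predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public Integer fetchNext() {
        while (child.hasNext()) {
            int t = child.next();
            if (predicate.test(t)) {
                return t;      // matching "tuple"
            }
        }
        return null;           // no more matches
    }

    public static void main(String[] args) {
        // WHERE age > 18 over a list of ages
        FilterSketch f = new FilterSketch(
                Arrays.asList(15, 22, 17, 30).iterator(), age -> age > 18);
        System.out.println(f.fetchNext()); // 22
        System.out.println(f.fetchNext()); // 30
        System.out.println(f.fetchNext()); // null
    }
}
```

Note the operator never materializes the whole filtered result; each call advances the child just far enough to produce one tuple, which is the same pull-based model every Operator in the project follows.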
public OpIterator[] getChildren(): returns the iterator over the tuples being filtered
public OpIterator[] getChildren() {
// some code goes here
return new OpIterator[] {child};
}
public void setChildren(OpIterator[] children): resets the iterator over the tuples to be filtered
public void setChildren(OpIterator[] children) {
child = children[0];
// some code goes here
}
JoinPredicate class
Encodes the join condition. Similar to Predicate, it is a helper class for Join: it compares one field of one tuple against one field of another.
Fields:
int field1No: index of the field of tuple1 used in the comparison
int field2No: index of the field of tuple2 used in the comparison
Predicate.Op op: the comparison operator
Methods:
public JoinPredicate(int field1, Predicate.Op op, int field2): constructor
public int getField1(): returns field1No
public int getField2(): returns field2No
public Predicate.Op getOperator(): returns op
public boolean filter(Tuple t1, Tuple t2): compares field field1No of t1 with field field2No of t2
public boolean filter(Tuple t1, Tuple t2) {
// some code goes here
Field field1 = t1.getField(field1No);
Field field2 = t2.getField(field2No);
return field1.compare(op,field2);
}
Join class
Also a subclass of Operator. It implements the join operation: given two streams of tuples, it concatenates every pair of tuples that satisfies the JoinPredicate. The join strategy is a nested-loop join.
Nested-loop join pseudocode:
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test whether the pair (tr, ts) satisfies the join condition;
        if it does, add tr·ts to the result
    end
end
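The pseudocode above can be sketched as a self-contained Java example. NestedLoopJoinDemo is invented for illustration; int[] stands in for Tuple, and the join condition here is simply equality on column 0 of each side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch of a nested-loop join: for every tuple tr in r, scan
// all of s and emit the concatenated pair whenever the condition holds.
public class NestedLoopJoinDemo {
    public static List<int[]> join(List<int[]> r, List<int[]> s) {
        List<int[]> result = new ArrayList<>();
        for (int[] tr : r) {              // outer relation
            for (int[] ts : s) {          // inner relation, rescanned for every tr
                if (tr[0] == ts[0]) {     // join predicate
                    int[] joined = new int[tr.length + ts.length];
                    System.arraycopy(tr, 0, joined, 0, tr.length);
                    System.arraycopy(ts, 0, joined, tr.length, ts.length);
                    result.add(joined);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> r = Arrays.asList(new int[]{1, 10}, new int[]{2, 20});
        List<int[]> s = Arrays.asList(new int[]{2, 200}, new int[]{3, 300});
        for (int[] row : join(r, s)) {
            System.out.println(Arrays.toString(row)); // [2, 20, 2, 200]
        }
    }
}
```

The inner relation is rescanned once per outer tuple, which is exactly why Join's fetchNext must call child2.rewind() after exhausting child2 for the current left tuple.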
Fields:
JoinPredicate joinPredicate;
OpIterator child1: the left tuples to be joined
OpIterator child2: the right tuples to be joined
Tuple left: fetchNext() produces one joined tuple per call; the nested loop holds one left tuple from child1 and pairs it with each matching right tuple from child2. Once all of child2 has been scanned, advance with left = child1.next() and child2.rewind().
Methods:
public Join(JoinPredicate p, OpIterator child1, OpIterator child2): constructor
public JoinPredicate getJoinPredicate(): returns joinPredicate
public String getJoinField1Name(): name of the compared field in the left tuple
public String getJoinField2Name(): name of the compared field in the right tuple
public TupleDesc getTupleDesc(): returns the schema of the joined tuples, built with TupleDesc's merge operation
protected Tuple fetchNext(): takes one tuple from child1 into left, compares it against the tuples of child2 in turn, and for each right tuple that satisfies the join condition returns the concatenated tuple; after child2 is exhausted, it sets left = child1.next() and calls child2.rewind()
private Tuple left;
protected Tuple fetchNext() throws TransactionAbortedException, DbException {
    // some code goes here
    while (child1.hasNext() || left != null) {
        if (child1.hasNext() && left == null) {
            left = child1.next();
        }
        Tuple right;
        while (child2.hasNext()) {
            right = child2.next();
            if (joinPredicate.filter(left, right)) {
                int len1 = left.getTupleDesc().numFields();
                int len2 = right.getTupleDesc().numFields();
                Tuple tuple = new Tuple(getTupleDesc()); // what should the joined tuple's RecordId be set to?
                for (int i = 0; i < len1; i++) {
                    tuple.setField(i, left.getField(i));
                }
                for (int i = 0; i < len2; i++) {
                    tuple.setField(i + len1, right.getField(i));
                }
                return tuple;
            }
        }
        left = null;
        child2.rewind();
    }
    return null;
}
public OpIterator[] getChildren(): returns child1 and child2
public OpIterator[] getChildren() {
// some code goes here
return new OpIterator[] {child1,child2};
}
public void setChildren(OpIterator[] children): resets child1 and child2
public void setChildren(OpIterator[] children) {
// some code goes here
child1 = children[0];
child2 = children[1];
}
public void open()
public void close()
public void rewind()
exercise 2
exercise2 implements the five SQL aggregates (COUNT, SUM, AVG, MIN, MAX), with support for grouping. To compute an aggregate we use the Aggregator interface, which merges each new tuple into an existing aggregate via Aggregator.mergeTupleIntoGroup(). After all tuples have been merged, an iterator over the aggregate results can be obtained. Each tuple in the result holds a group field and an aggregate field (groupValue, aggregateValue); when the group-by field is Aggregator.NO_GROUPING, each result tuple has the single field (aggregateValue).
Implement the IntegerAggregator, StringAggregator, and Aggregate classes.
IntegerAggregator class
Performs grouped aggregation over integer fields.
Fields:
int gbFieldIndex: index of the group-by field
Type gbFieldType: type of the group-by field
int aggFieldIndex: index of the field being aggregated
AggHandler aggHandler: the aggregation to perform. AggHandler is a custom abstract class; its subclasses are CountHandler, SumHandler, MaxHandler, MinHandler, and AvgHandler.
AggHandler's aggResult map stores the aggregation results: the Field key is the group-by value gbField (null when gbFieldIndex == NO_GROUPING) and the Integer value is the aggregate. The abstract method handle performs the aggregation; each subclass implements it differently.
private abstract class AggHandler {
    HashMap<Field, Integer> aggResult;
    // key: the group-by Field (null when gbFieldIndex == NO_GROUPING); value: the aggregate result
    abstract void handle(Field gbField, IntField aggField);
    public AggHandler() {
        aggResult = new HashMap<>();
    }
    public HashMap<Field, Integer> getAggResult() {
        return aggResult;
    }
}
private class CountHandler extends AggHandler{
@Override
void handle(Field gbField, IntField aggField) {
if(aggResult.containsKey(gbField)){
aggResult.put(gbField, aggResult.get(gbField) + 1);
} else {
aggResult.put(gbField, 1);
}
}
}
private class SumHandler extends AggHandler{
@Override
void handle(Field gbField, IntField aggField) {
int value = aggField.getValue();
if(aggResult.containsKey(gbField)){
aggResult.put(gbField, aggResult.get(gbField) + value);
} else {
aggResult.put(gbField, value);
}
}
}
private class MaxHandler extends AggHandler{
@Override
void handle(Field gbField, IntField aggField) {
int value = aggField.getValue();
if(aggResult.containsKey(gbField)){
aggResult.put(gbField, Math.max(aggResult.get(gbField) , value));
} else {
aggResult.put(gbField, value);
}
}
}
private class MinHandler extends AggHandler{
@Override
void handle(Field gbField, IntField aggField) {
int value = aggField.getValue();
if(aggResult.containsKey(gbField)){
aggResult.put(gbField, Math.min(aggResult.get(gbField) , value));
} else {
aggResult.put(gbField, value);
}
}
}
private class AvgHandler extends AggHandler{
HashMap<Field, Integer> sum;
HashMap<Field, Integer> count;
private AvgHandler(){
sum = new HashMap<>();
count = new HashMap<>();
}
@Override
void handle(Field gbField, IntField aggField) {
int value = aggField.getValue();
if(sum.containsKey(gbField) && count.containsKey(gbField)){
sum.put(gbField, sum.get(gbField) + value);
count.put(gbField, count.get(gbField) + 1);
} else {
sum.put(gbField, value);
count.put(gbField, 1);
}
int avg = sum.get(gbField) / count.get(gbField);
aggResult.put(gbField, avg);
}
}
Methods:
public IntegerAggregator(int gbfield, Type gbfieldtype, int afield, Op what): constructor; selects the AggHandler subclass matching the aggregation operator
public IntegerAggregator(int gbfield, Type gbfieldtype, int afield, Op what) {
// some code goes here
gbFieldIndex = gbfield;
gbFieldType = gbfieldtype;
aggFieldIndex = afield;
switch (what) {
case MIN:
aggHandler = new MinHandler();
break;
case MAX:
aggHandler = new MaxHandler();
break;
case SUM:
aggHandler = new SumHandler();
break;
case COUNT:
aggHandler = new CountHandler();
break;
case AVG:
aggHandler = new AvgHandler();
break;
default:
throw new UnsupportedOperationException("Unsupported aggregation operator ");
}
}
public void mergeTupleIntoGroup(Tuple tup): aggregation proceeds incrementally: the first tuple read produces an aggregate over a single tuple, and each subsequent tuple is merged into the existing aggregate, which is then recomputed
public void mergeTupleIntoGroup(Tuple tup) {
// some code goes here
Field gbField;
IntField aggField = (IntField) tup.getField(aggFieldIndex);
if(gbFieldIndex == NO_GROUPING){
gbField = null;
} else {
gbField = tup.getField(gbFieldIndex);
}
aggHandler.handle(gbField,aggField);
}
public OpIterator iterator(): returns an iterator over the aggregate results. Each result tuple has the two fields (groupValue, aggregateValue); when the group-by field is Aggregator.NO_GROUPING, each result tuple has only the single field (aggregateValue). Since the return type is OpIterator, the results can be wrapped in a TupleIterator.
public OpIterator iterator() {
// some code goes here
HashMap<Field,Integer> result = aggHandler.getAggResult();
Type[] fieldTypes;
String[] fieldNames;
TupleDesc tupleDesc;
List<Tuple> tuples = new ArrayList<>();
if(gbFieldIndex == NO_GROUPING){
fieldTypes = new Type[]{Type.INT_TYPE};
fieldNames = new String[]{"aggregateValue"};
tupleDesc = new TupleDesc(fieldTypes,fieldNames);
Tuple tuple = new Tuple(tupleDesc);
IntField resultField = new IntField(result.get(null)); // NO_GROUPING: the single result is stored under the null key
tuple.setField(0,resultField);
tuples.add(tuple);
} else {
fieldTypes = new Type[]{gbFieldType,Type.INT_TYPE};
fieldNames = new String[]{"groupByValue" , "aggregateValue"};
tupleDesc = new TupleDesc(fieldTypes,fieldNames);
for(Field field : result.keySet()){
Tuple tuple = new Tuple(tupleDesc);
if(gbFieldType == Type.INT_TYPE){
IntField gbField = (IntField)field;
tuple.setField(0,gbField);
} else {
StringField gbField = (StringField) field;
tuple.setField(0,gbField);
}
IntField resultField = new IntField(result.get(field));
tuple.setField(1,resultField);
tuples.add(tuple);
}
}
return new TupleIterator(tupleDesc,tuples);
}
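The mergeTupleIntoGroup/iterator flow can be sketched outside SimpleDB. GroupByAvgDemo is an invented illustration (String group keys, int values); it shows why AvgHandler keeps separate sum and count maps: tuples arrive one at a time, and averaging an existing average with a new value would compound integer rounding error, so the division is done from the running sum and count each time:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of grouped AVG aggregation, merged one tuple at a time.
public class GroupByAvgDemo {
    private final Map<String, Integer> sum = new HashMap<>();
    private final Map<String, Integer> count = new HashMap<>();

    // analogue of mergeTupleIntoGroup: fold one value into its group
    public void merge(String groupValue, int aggValue) {
        sum.merge(groupValue, aggValue, Integer::sum);
        count.merge(groupValue, 1, Integer::sum);
    }

    // analogue of iterator(): one (groupValue, aggregateValue) pair per group
    public Map<String, Integer> result() {
        Map<String, Integer> avg = new HashMap<>();
        for (String g : sum.keySet()) {
            avg.put(g, sum.get(g) / count.get(g)); // integer division, as in the lab
        }
        return avg;
    }

    public static void main(String[] args) {
        GroupByAvgDemo demo = new GroupByAvgDemo();
        demo.merge("a", 1);
        demo.merge("a", 2);
        demo.merge("b", 10);
        System.out.println(demo.result()); // AVG("a") = (1+2)/2 = 1 by integer division
    }
}
```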
StringAggregator class
Performs grouped aggregation over String fields. The implementation mirrors IntegerAggregator, except only COUNT is required.
Aggregate class
Wraps IntegerAggregator and StringAggregator.
Fields:
OpIterator child: the tuples to aggregate
int aggFieldIndex: index of the field being aggregated
int gbFieldIndex: index of the group-by field
Aggregator.Op aop: the aggregation operator
Aggregator aggregator: the object that performs the aggregation
OpIterator aggIterator: iterator over the aggregate results
TupleDesc aggTupleDesc: schema of the aggregate results
Methods:
public Aggregate(OpIterator child, int afield, int gfield, Aggregator.Op aop): constructor; it builds the TupleDesc of the aggregate result up front for later use
public Aggregate(OpIterator child, int afield, int gfield, Aggregator.Op aop) {
// some code goes here
this.child = child;
aggFieldIndex = afield;
gbFieldIndex = gfield;
this.aop = aop;
Type gbFieldType = gfield == Aggregator.NO_GROUPING ? null : child.getTupleDesc().getFieldType(gfield);
Type aggFieldType = child.getTupleDesc().getFieldType(afield);
Type[] fieldTypes ;
String[] fieldNames;
String aggFieldName = String.format("%s(%s)",aop.toString(), child.getTupleDesc().getFieldName(afield));
if(gbFieldType == null){
fieldTypes = new Type[]{aggFieldType};
fieldNames = new String[]{aggFieldName};
} else {
fieldTypes = new Type[]{gbFieldType,aggFieldType};
String gbFieldName = child.getTupleDesc().getFieldName(gfield);
fieldNames = new String[]{gbFieldName,aggFieldName};
}
aggTupleDesc = new TupleDesc(fieldTypes,fieldNames);
}
public int groupField(): returns the index of the group-by field, or Aggregator.NO_GROUPING if there is none
public String groupFieldName(): returns the name of the group-by field
public int aggregateField(): returns the index of the aggregated field
public String aggregateFieldName(): returns the name of the aggregated field
public Aggregator.Op aggregateOp(): returns aop
public void open(): open() selects the appropriate Aggregator based on aggFieldType, runs the aggregation through it, and keeps the result in aggIterator
public void open() throws NoSuchElementException, DbException,
TransactionAbortedException {
// some code goes here
super.open();
child.open();
Type gbFieldType = gbFieldIndex == -1 ? null : child.getTupleDesc().getFieldType(gbFieldIndex);
Type aggFieldType = child.getTupleDesc().getFieldType(aggFieldIndex);
if(aggFieldType == Type.INT_TYPE){
aggregator = new IntegerAggregator(gbFieldIndex,gbFieldType,aggFieldIndex,aop);
} else {
aggregator = new StringAggregator(gbFieldIndex,gbFieldType,aggFieldIndex,aop);
}
while (child.hasNext()){
aggregator.mergeTupleIntoGroup(child.next());
}
aggIterator = aggregator.iterator();
aggIterator.open();
}
public void close()
protected Tuple fetchNext(): returns the next tuple from aggIterator
public void rewind()
public TupleDesc getTupleDesc(): returns the TupleDesc of the aggregated result set
public OpIterator[] getChildren()
public void setChildren(OpIterator[] children)
exercise 3
Implement insertTuple and deleteTuple in HeapPage, HeapFile, and BufferPool.
This exercise implements adding to and deleting from tables, starting at the level of individual pages and files. There are two main sets of operations: adding tuples and removing tuples.
Removing tuples: to remove a tuple, implement deleteTuple. Tuples contain RecordIDs which locate the page a tuple is stored on, so the rough idea is to find the tuple's page and update the page header correctly.
Adding tuples: the insertTuple method in HeapFile.java is responsible for adding a tuple to a heap file. To add a new tuple to a HeapFile, find a page with an empty slot. If no such page exists in the HeapFile, create a new page and append it to the physical file on disk. Make sure the tuple's RecordID is updated correctly.
HeapFile: the Lab 1 notes treated it as the on-disk file itself for ease of understanding; strictly speaking it is the interface to the on-disk file: readPage() fetches a page from disk and writePage() writes a page back to disk.
Suppose HeapFile is the interface to table table1 on disk.
To insert a tuple, BufferPool first calls HeapFile's insertTuple() method. While executing insertTuple(), HeapFile iterates over all of table1's pages via BufferPool's getPage() method, because the database may only read pages through the BufferPool. (If a page of table1 is already in the BufferPool it is returned to HeapFile directly; otherwise the BufferPool calls HeapFile's readPage() to read it from disk, caches it, and then returns it to HeapFile; HeapFile and BufferPool call into each other.) Once a page with an empty slot is found, the tuple is inserted into that page and the modified page is returned to the BufferPool. If the BufferPool is not full it caches the page; otherwise it runs the eviction policy (see exercise5) to remove one page and then caches the modified page, so that future requests see the latest version of the page.
At this point HeapFile has returned the modified page to the BufferPool without calling writePage(), so the page cached in the BufferPool and the page on disk disagree: for the same pageId, the cached page has one more tuple than the disk page. This is a dirty page. We therefore give each page a boolean dirty flag: when the eviction policy removes a page from the BufferPool, it checks whether the page is dirty and, if so, calls HeapFile's writePage() to write it back to disk.
Deleting a tuple works analogously.
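The dirty-page bookkeeping described above can be sketched in isolation. DirtyPageDemo is an invented illustration in which a HashMap plays the role of the disk: modifying a cached page only sets its dirty flag, and the bytes reach "disk" when the page is evicted and flushed:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: pages are flushed to disk lazily, on eviction, and
// only when their dirty flag is set.
public class DirtyPageDemo {
    static class Page {
        final int pid;
        int data;
        boolean dirty;
        Page(int pid, int data) { this.pid = pid; this.data = data; }
    }

    private final Map<Integer, Integer> disk = new HashMap<>(); // pid -> bytes
    private final Map<Integer, Page> pool = new HashMap<>();    // the "BufferPool"

    public DirtyPageDemo() { disk.put(1, 100); }

    public Page getPage(int pid) {
        // cache miss reads from "disk"; hit returns the cached copy
        return pool.computeIfAbsent(pid, id -> new Page(id, disk.get(id)));
    }

    public void update(int pid, int newData) {
        Page p = getPage(pid);
        p.data = newData;
        p.dirty = true;              // cached copy now differs from disk
    }

    public void evict(int pid) {
        Page p = pool.remove(pid);
        if (p != null && p.dirty) {
            disk.put(p.pid, p.data); // flush the dirty page back to disk
        }
    }

    public int diskData(int pid) { return disk.get(pid); }
}
```

Between update and evict, a read through the pool sees the new data while the disk still holds the old data, which is exactly the inconsistency the dirty flag tracks.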
HeapPage
Fields:
boolean dirty: dirty-page flag
TransactionId dirtyId: id of the transaction that dirtied the page
Methods:
public void deleteTuple(Tuple t): deletes a tuple from the page and clears the corresponding bit in the slot bitmap (via markSlotUsed) to mark the slot empty
public void deleteTuple(Tuple t) throws DbException {
    // some code goes here
    // not necessary for lab1
    RecordId recordId = t.getRecordId();
    if (recordId == null || !recordId.getPageId().equals(pid)) {
        throw new DbException("tuple is not on this page");
    }
    int slot = recordId.getTupleNumber();
    if (!isSlotUsed(slot)) {
        throw new DbException("tuple slot is already empty");
    }
    markSlotUsed(slot, false);
    tuples[slot] = null;
}
public void insertTuple(Tuple t): inserts the tuple into an empty slot and sets the corresponding bit in the slot bitmap to mark the slot occupied
public void insertTuple(Tuple t) throws DbException {
    // some code goes here
    // not necessary for lab1
    if (getNumEmptySlots() == 0) {
        throw new DbException("this page is full");
    }
    if (!t.getTupleDesc().equals(td)) {
        throw new DbException("tupledesc mismatch");
    }
    for (int i = 0; i < numSlots; i++) {
        if (!isSlotUsed(i)) {
            markSlotUsed(i, true);
            t.setRecordId(new RecordId(pid, i));
            tuples[i] = t;
            return;
        }
    }
}
public void markDirty(boolean dirty, TransactionId tid): sets the dirty flag
public TransactionId isDirty(): if the page is dirty, returns the id of the transaction that dirtied it; otherwise returns null
private void markSlotUsed(int i, boolean value): updates the page header; value == true sets bit i, value == false clears bit i
private void markSlotUsed(int i, boolean value) {
// some code goes here
// not necessary for lab1
if(i < numSlots){
int index = i / 8;
int offset = i % 8;
byte mask = (byte)(0x1 << offset);
if(value){
header[index] |= mask;
} else {
header[index] &= ~mask;
}
}
}
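The header bitmap manipulated by markSlotUsed can be exercised on its own. SlotBitmapDemo is an invented, self-contained version of the same idea: slot i lives at bit (i % 8) of header byte (i / 8), a set bit means the slot is occupied:

```java
// Standalone sketch of a HeapPage-style slot bitmap.
public class SlotBitmapDemo {
    private final byte[] header;
    private final int numSlots;

    public SlotBitmapDemo(int numSlots) {
        this.numSlots = numSlots;
        this.header = new byte[(numSlots + 7) / 8]; // ceil(numSlots / 8) bytes
    }

    public void markSlotUsed(int i, boolean value) {
        byte mask = (byte) (1 << (i % 8));
        if (value) {
            header[i / 8] |= mask;   // set bit: slot occupied
        } else {
            header[i / 8] &= ~mask;  // clear bit: slot free
        }
    }

    public boolean isSlotUsed(int i) {
        return (header[i / 8] & (1 << (i % 8))) != 0;
    }

    public int getNumEmptySlots() {
        int empty = 0;
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(i)) {
                empty++;
            }
        }
        return empty;
    }
}
```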
HeapFile class
Methods:
public void writePage(Page page): writes a page to disk
public void writePage(Page page) throws IOException {
    // some code goes here
    // not necessary for lab1
    PageId heapPageId = page.getId();
    int pgNo = heapPageId.getPageNumber();
    final int pageSize = Database.getBufferPool().getPageSize();
    byte[] pgData = page.getPageData();
    RandomAccessFile dbfile = new RandomAccessFile(file, "rws");
    dbfile.skipBytes(pgNo * pageSize);
    dbfile.write(pgData);
    dbfile.close();
}
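The page-granular file layout behind writePage can be demonstrated on its own: page pgNo occupies bytes [pgNo * PAGE_SIZE, (pgNo + 1) * PAGE_SIZE) of the file, so writing means seeking to that offset first. PageFileDemo is an invented sketch with a shrunken page size (SimpleDB's default is 4096):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Standalone sketch of page-granular file I/O with RandomAccessFile.
public class PageFileDemo {
    static final int PAGE_SIZE = 16; // tiny page for the demo

    static void writePage(File f, int pgNo, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek((long) pgNo * PAGE_SIZE); // jump to the page's offset
            raf.write(data);
        }
    }

    static byte[] readPage(File f, int pgNo) throws IOException {
        byte[] data = new byte[PAGE_SIZE];
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) pgNo * PAGE_SIZE);
            raf.readFully(data);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("pagefile", ".dat");
        f.deleteOnExit();
        byte[] page1 = new byte[PAGE_SIZE];
        page1[0] = 42;
        writePage(f, 1, page1);                // file grows to hold 2 pages
        System.out.println(readPage(f, 1)[0]); // 42
    }
}
```

Writing at an offset past the current end of file extends the file, which is how appending a new empty page grows the heap file by exactly one page.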
public List<Page> insertTuple(TransactionId tid, Tuple t): inserts the tuple into one of the HeapFile's pages; if every page is full, appends a new page to the HeapFile
public List<Page> insertTuple(TransactionId tid, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// some code goes here
// not necessary for lab1
ArrayList<Page> arrayList = new ArrayList<>();
for(int pgNo=0; pgNo<numPages(); pgNo++){
HeapPageId pageId = new HeapPageId(getId(),pgNo);
HeapPage heapPage = (HeapPage) Database.getBufferPool().getPage(tid,pageId,Permissions.READ_WRITE);
if(heapPage.getNumEmptySlots() != 0){
heapPage.insertTuple(t);
arrayList.add(heapPage);
return arrayList;
}
}
BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(file,true));
byte[] emptyPage = HeapPage.createEmptyPageData();
bw.write(emptyPage);
bw.close();
HeapPageId newPageId = new HeapPageId(getId(),numPages()-1);
HeapPage newPage = (HeapPage) Database.getBufferPool().getPage(tid,newPageId,Permissions.READ_WRITE);
newPage.insertTuple(t);
arrayList.add(newPage);
return arrayList;
}
BufferedInputStream and BufferedOutputStream are input/output streams with an internal buffer; buffered I/O is more efficient and faster.
Usage steps:
1. Create a FileOutputStream bound to the destination file.
2. Wrap it in a BufferedOutputStream.
3. Call write on the BufferedOutputStream to put data into the internal buffer.
4. Call flush to push the buffered data out to the file.
5. Release the resource with close (which flushes first, so the explicit flush can be omitted).
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("bos.txt"));
// write data
bos.write("hello".getBytes());
// release the resource
bos.close();
public ArrayList<Page> deleteTuple(TransactionId tid, Tuple t): deletes the given tuple from the page of the HeapFile it lives on
public ArrayList<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
TransactionAbortedException {
// some code goes here
// not necessary for lab1
ArrayList<Page> arrayList = new ArrayList<>();
PageId pageId = t.getRecordId().getPageId();
HeapPage heapPage = (HeapPage) Database.getBufferPool().getPage(tid,pageId,Permissions.READ_WRITE);
heapPage.deleteTuple(t);
arrayList.add(heapPage);
return arrayList;
}
BufferPool class
The moveToHead and addToHead calls in insertTuple move the node for the pageId to the front of the linked list; see exercise5's eviction policy for details.
Methods:
public void insertTuple(TransactionId tid, int tableId, Tuple t): inserts the tuple into the given table (HeapFile) by calling HeapFile's insertTuple() method, then caches the returned pages in the BufferPool
public void insertTuple(TransactionId tid, int tableId, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// some code goes here
// not necessary for lab1
DbFile heapFile = Database.getCatalog().getDatabaseFile(tableId);
List<Page> arrayList = heapFile.insertTuple(tid,t);
for(Page page : arrayList){
PageId pid = page.getId();
page.markDirty(true,tid);
if(!bufferPool.containsKey(pid)){
LinkedNode node = new LinkedNode(pid,page);
if(getBufferPoolSize() < numPages){
addToHead(node);
bufferPool.put(pid,node);
} else {
evictPage();
addToHead(node);
bufferPool.put(pid,node);
}
} else {
LinkedNode node = bufferPool.get(pid);
moveToHead(node);
node.page = page;
bufferPool.put(pid,node);
}
}
}
public void deleteTuple(TransactionId tid, Tuple t): deletes the tuple from its table by calling HeapFile's deleteTuple method
public void deleteTuple(TransactionId tid, Tuple t)
throws DbException, IOException, TransactionAbortedException {
// some code goes here
// not necessary for lab1
DbFile heapFile = Database.getCatalog().getDatabaseFile(t.getRecordId().getPageId().getTableId());
List<Page> arrayList = heapFile.deleteTuple(tid,t);
for(Page page : arrayList){
page.markDirty(true,tid);
LinkedNode node = bufferPool.get(page.getId());
if(node == null){
if(getBufferPoolSize() < numPages){
LinkedNode temp = new LinkedNode(page.getId(),page);
addToHead(temp);
bufferPool.put(page.getId(),temp);
} else {
evictPage();
LinkedNode temp = new LinkedNode(page.getId(),page);
addToHead(temp);
bufferPool.put(page.getId(),temp);
}
} else {
LinkedNode temp = bufferPool.get(page.getId());
moveToHead(temp);
bufferPool.put(page.getId(),temp);
}
}
}
exercise 4
Implement the Insert and Delete operators. These topmost operators modify the pages on disk and return the number of affected tuples.
Insert: adds the tuples read from its child operator to the table identified by tableid, using BufferPool.insertTuple().
Delete: removes the tuples read from its child operator from the table they belong to, using BufferPool.deleteTuple().
Implementing the Insert and Delete classes is just a matter of wrapping exercise3.
Insert class
A subclass of Operator; it calls BufferPool's insertTuple() method to insert tuples into the given table.
Fields:
TransactionId transactionId: id of the transaction performing the insert
OpIterator child: iterator over the tuples to insert
int tableId: id of the table to insert into
boolean isInserted: flag that stops fetchNext from producing results more than once
TupleDesc tupleDesc: fetchNext() returns a single one-field tuple holding the number of inserted tuples; tupleDesc is that tuple's schema, with fieldTypes == {Type.INT_TYPE} and fieldNames == {"number of inserted tuples"}
Methods:
public Insert(TransactionId t, OpIterator child, int tableId): constructor
public Insert(TransactionId t, OpIterator child, int tableId)
throws DbException {
// some code goes here
if(!Database.getCatalog().getTupleDesc(tableId).equals(child.getTupleDesc())){
throw new DbException("TupleDesc does not match!");
}
this.transactionId = t;
this.child = child;
this.tableId = tableId;
this.isInserted = false;
Type[] types = {Type.INT_TYPE};
String[] fieldNames = {"number of inserted tuples"};
this.tupleDesc = new TupleDesc(types,fieldNames);
}
protected Tuple fetchNext(): performs the insert and returns a single tuple containing the number of tuples inserted
protected Tuple fetchNext() throws TransactionAbortedException, DbException {
// some code goes here
if(!isInserted){
isInserted = true;
int count = 0;
while (child.hasNext()){
Tuple tuple = child.next();
try {
Database.getBufferPool().insertTuple(transactionId,tableId,tuple);
count++;
}
catch (IOException e) {
e.printStackTrace();
}
}
Tuple tuple = new Tuple(tupleDesc);
tuple.setField(0,new IntField(count));
return tuple;
}
else{
return null;
}
}
public TupleDesc getTupleDesc(): returns tupleDesc
exercise 5
Implement the page-eviction policy in BufferPool.
In Lab 1 the BufferPool simply threw an exception when it was full. This exercise implements a real eviction policy: LRU (least recently used). Lab 1 stored the pageId-to-Page mapping in a HashMap<PageId, Page> bufferPool; with eviction added, the mapping becomes HashMap<PageId, LinkedNode> bufferPool, where LinkedNode is a custom doubly-linked-list node holding the pageId, the page, and prev/next pointers. The LinkedNodes form a linked list: whenever a page in the BufferPool is accessed, its node is moved to the head of the list; when a page must be cached but the BufferPool is full, the least recently used page, i.e. the last node of the list, is evicted, and the new page is then cached.
Fields:
HashMap<PageId, LinkedNode> bufferPool: the adjusted mapping structure
LinkedNode head, tail: dummy head and tail nodes of the list, which make it easy to move a node to the front and to remove the last node
class LinkedNode {
PageId pageId;
Page page;
LinkedNode prev;
LinkedNode next;
public LinkedNode() {}
public LinkedNode(PageId _pageId, Page _page) {pageId = _pageId; page = _page;}
}
private void addToHead(LinkedNode node) {
node.prev = head;
node.next = head.next;
head.next.prev = node;
head.next = node;
}
private void removeNode(LinkedNode node) {
node.prev.next = node.next;
node.next.prev = node.prev;
}
private void moveToHead(LinkedNode node) {
removeNode(node);
addToHead(node);
}
private LinkedNode removeTail() {
LinkedNode res = tail.prev;
removeNode(res);
return res;
}
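The list helpers above combine with the HashMap into a complete LRU cache. LruCacheDemo is an invented, self-contained version of the same structure, with int keys/values in place of PageId/Page; note head and tail must be dummy sentinel nodes linked to each other, or addToHead's head.next dereference fails:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone LRU cache: HashMap for O(1) lookup plus a doubly linked list
// ordered by recency. Every access moves the node to the front; eviction
// removes the node just before the tail sentinel.
public class LruCacheDemo {
    static class Node {
        int key, value;
        Node prev, next;
        Node(int k, int v) { key = k; value = v; }
    }

    private final int capacity;
    private final Map<Integer, Node> map = new HashMap<>();
    private final Node head = new Node(-1, -1); // dummy sentinel
    private final Node tail = new Node(-1, -1); // dummy sentinel

    public LruCacheDemo(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    private void addToHead(Node n) {
        n.prev = head;
        n.next = head.next;
        head.next.prev = n;
        head.next = n;
    }

    private void removeNode(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    public Integer get(int key) {
        Node n = map.get(key);
        if (n == null) {
            return null;
        }
        removeNode(n);
        addToHead(n);              // mark as most recently used
        return n.value;
    }

    public void put(int key, int value) {
        Node n = map.get(key);
        if (n != null) {
            n.value = value;
            removeNode(n);
            addToHead(n);
            return;
        }
        if (map.size() == capacity) {
            Node lru = tail.prev;  // least recently used
            removeNode(lru);
            map.remove(lru.key);
        }
        n = new Node(key, value);
        addToHead(n);
        map.put(key, n);
    }
}
```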
Methods:
public Page getPage(TransactionId tid, PageId pid, Permissions perm): on a BufferPool read, if the requested page is cached, move its LinkedNode to the head of the list; otherwise read the page from disk, wrap it in a LinkedNode, and put it at the head, calling evictPage() first if the BufferPool is full
public Page getPage(TransactionId tid, PageId pid, Permissions perm)
throws TransactionAbortedException, DbException {
// some code goes here
if(!bufferPool.containsKey(pid)){
DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
Page page = dbFile.readPage(pid);
LinkedNode node = new LinkedNode(pid,page);
if(numPages > bufferPool.size()){
addToHead(node);
bufferPool.put(pid,node);
return node.page;
}
else{
//LRU
evictPage();
addToHead(node);
bufferPool.put(page.getId(), node);
return page;
}
}
else{
LinkedNode node = bufferPool.get(pid);
moveToHead(node);
return node.page;
}
}
private synchronized void evictPage(): performs the eviction
private synchronized void evictPage() throws DbException {
// some code goes here
// not necessary for lab1
LinkedNode tail = removeTail();
PageId evictPageId = tail.pageId;
try {
flushPage(evictPageId);
} catch (IOException e){
e.printStackTrace();
}
discardPage(evictPageId);
}
private synchronized void flushPage(PageId pid): if the evicted page is dirty, writes it back to disk
private synchronized void flushPage(PageId pid) throws IOException {
// some code goes here
// not necessary for lab1
Page page = bufferPool.get(pid).page;
if(page.isDirty() != null){
Database.getCatalog().getDatabaseFile(pid.getTableId()).writePage(page);
page.markDirty(false,null);
}
}
public synchronized void discardPage(PageId pid): removes the page for pid from the mapping structure
public synchronized void discardPage(PageId pid) {
// some code goes here
// not necessary for lab1
bufferPool.remove(pid);
}