2.5. HeapFile access method
Access methods provide a way to read or write data from disk that is arranged in a specific way. Common access methods include heap files (unsorted files of tuples) and B-trees; for this assignment, you will only implement a heap file access method, and we have written some of the code for you.
A HeapFile object is arranged into a set of pages, each of which consists of a fixed number of bytes for storing tuples (defined by the constant BufferPool.DEFAULT_PAGE_SIZE), including a header. In SimpleDB, there is one HeapFile object for each table in the database. Each page in a HeapFile is arranged as a set of slots, each of which can hold one tuple (tuples for a given table in SimpleDB are all of the same size). In addition to these slots, each page has a header that consists of a bitmap with one bit per tuple slot. If the bit corresponding to a particular tuple is 1, it indicates that the tuple is valid; if it is 0, the tuple is invalid (e.g., it has been deleted or was never initialized). Pages of HeapFile objects are of type HeapPage, which implements the Page interface. Pages are stored in the buffer pool but are read and written by the HeapFile class.
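Because every page has the same fixed size and, as described next, pages sit back to back in the file, page n should start at byte offset n * page size. The following is only a rough sketch of that idea, not SimpleDB's HeapFile code: the class PageReadSketch, its methods, and the 16-byte page size in the demo are all made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical standalone sketch; the names here are not SimpleDB's API.
final class PageReadSketch {
    /** Reads page pageNumber, assuming pages of pageSize bytes stored back to back. */
    static byte[] readPage(File file, int pageNumber, int pageSize) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long offset = (long) pageNumber * pageSize; // page n starts at n * pageSize
            if (pageNumber < 0 || offset + pageSize > raf.length()) {
                throw new IllegalArgumentException("no such page: " + pageNumber);
            }
            byte[] data = new byte[pageSize];
            raf.seek(offset);
            raf.readFully(data);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a temp file of numPages pages, page n filled with the byte value n. */
    static File writeDemoFile(int numPages, int pageSize) {
        try {
            File file = File.createTempFile("heapfile-sketch", ".dat");
            file.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                for (int n = 0; n < numPages; n++) {
                    byte[] page = new byte[pageSize];
                    Arrays.fill(page, (byte) n);
                    raf.write(page);
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = writeDemoFile(3, 16); // tiny assumed page size, for the demo only
        byte[] page = readPage(file, 1, 16);
        System.out.println(page.length + " bytes, first byte " + page[0]);
    }
}
```

Random access matters here because a heap file is unsorted: any page may be requested by number, so seeking directly to its offset avoids scanning the pages before it.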
SimpleDB stores heap files on disk in more or less the same format they are stored in memory. Each file consists of page data arranged consecutively on disk. Each page consists of one or more bytes representing the header, followed by the tuple slots; together they occupy _page size_ bytes. Each tuple requires _tuple size_ * 8 bits for its content and 1 bit for the header. Thus, the number of tuples that can fit in a single page is:
_tuples per page_ = floor((_page size_ * 8) / (_tuple size_ * 8 + 1))
Exercise 4
Implement the skeleton methods in:
- src/java/simpledb/storage/HeapPageId.java
- src/java/simpledb/storage/RecordId.java
- src/java/simpledb/storage/HeapPage.java
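For HeapPageId and RecordId, the work is just implementing the getter and setter methods.
In HeapPage, the header stores a bitmap; from this bitmap we can tell whether each tuple slot is in use.
The two key bitmap methods are: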
/**
 * Returns true if the associated slot on this page is filled.
 */
public boolean isSlotUsed(int i) {
    // Slot i lives in byte i / 8, at bit i % 8 counting from the least significant bit.
    byte b = this.header[i / 8];
    return (b & (1 << (i % 8))) != 0;
}
/**
 * Returns the number of empty slots on this page.
 */
public int getNumEmptySlots() {
    // Instead of counting the 0 bits, count the 1 bits (used slots) and subtract.
    // b &= (b - 1) clears the lowest set bit, so the loop runs once per set bit.
    int numUsedSlots = 0;
    for (byte b : header) {
        while (b != 0) {
            numUsedSlots++;
            b &= (b - 1);
        }
    }
    return getNumTuples() - numUsedSlots;
}
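To make the bit layout concrete, here is a small standalone sketch that runs the same two operations on a hand-built header. The class SlotBitmapSketch and the example header bytes are invented for illustration; Integer.bitCount is a standard-library alternative to the loop, shown only for comparison.

```java
// Hypothetical standalone sketch; it mirrors the bitmap logic above on a plain byte[].
final class SlotBitmapSketch {
    /** True if bit i is set; bit 0 is the least significant bit of header[0]. */
    static boolean isSlotUsed(byte[] header, int i) {
        return (header[i / 8] & (1 << (i % 8))) != 0;
    }

    /** Counts set bits with the clear-lowest-set-bit loop. */
    static int countUsedLoop(byte[] header) {
        int used = 0;
        for (byte b : header) {
            while (b != 0) {
                used++;
                b &= (b - 1); // clears the lowest set bit
            }
        }
        return used;
    }

    /** Same count via the standard library; the mask avoids sign extension. */
    static int countUsedBitCount(byte[] header) {
        int used = 0;
        for (byte b : header) {
            used += Integer.bitCount(b & 0xFF);
        }
        return used;
    }

    public static void main(String[] args) {
        // Assumed example: slots 0, 2, 7 (first byte) and 9 (second byte) are used.
        byte[] header = {(byte) 0b1000_0101, (byte) 0b0000_0010};
        System.out.println(isSlotUsed(header, 7));      // true
        System.out.println(isSlotUsed(header, 8));      // false
        System.out.println(countUsedLoop(header));      // 4
        System.out.println(countUsedBitCount(header));  // 4
    }
}
```

Both counting methods agree even when the high bit of a byte is set, which is the case where Java's signed byte type is most likely to cause surprises.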
The size of the bitmap (the header) is computed as follows:
/**
 * Computes the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
 * @return the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
 */
private int getHeaderSize() {
    return (int) Math.ceil(getNumTuples() / 8.0);
}
Another key piece is the number of tuples that fit on each page:
/**
 * Retrieve the number of tuples on this page.
 * tuples per page = floor((page size * 8) / (tuple size * 8 + 1))
 * @return the number of tuples on this page
 */
private int getNumTuples() {
    // Number of tuples this page can hold: each tuple takes the TupleDesc size in bytes plus 1 header bit.
    return (BufferPool.getPageSize() * 8) / (this.td.getSize() * 8 + 1);
}
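A quick worked example helps check that the two formulas fit together. The sketch below is standalone and hypothetical (PageLayoutSketch is not a SimpleDB class); it assumes a 4096-byte page and an 8-byte tuple, such as two 4-byte ints.

```java
// Hypothetical standalone helper; not part of SimpleDB.
final class PageLayoutSketch {
    /** Slots per page: each tuple costs tupleSize * 8 data bits plus 1 header bit. */
    static int tuplesPerPage(int pageSize, int tupleSize) {
        return (pageSize * 8) / (tupleSize * 8 + 1); // integer division acts as floor
    }

    /** Header bytes: one bit per slot, rounded up to whole bytes. */
    static int headerBytes(int numTuples) {
        return (numTuples + 7) / 8; // same result as ceil(numTuples / 8.0)
    }

    public static void main(String[] args) {
        int pageSize = 4096; // assumed page size in bytes
        int tupleSize = 8;   // assumed: a tuple of two 4-byte ints
        int slots = tuplesPerPage(pageSize, tupleSize);
        int header = headerBytes(slots);
        // prints "504 slots, 63 header bytes, 4095 of 4096 bytes used"
        System.out.println(slots + " slots, " + header + " header bytes, "
                + (header + slots * tupleSize) + " of " + pageSize + " bytes used");
    }
}
```

With these numbers, 32768 bits divided by 65 bits per tuple gives 504 slots, and 504 bits of header round up to 63 bytes, so header and slots together use 4095 of the 4096 bytes.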
We also need an iterator over the tuples that are in use; the approach is to check each slot's header bit while walking the tuple array.
/**
 * @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
 * (note that this iterator shouldn't return tuples in empty slots!)
 */
public Iterator<Tuple> iterator() {
    return new Iterator<Tuple>() {
        private int nextSlot = 0;

        @Override
        public boolean hasNext() {
            // Skip empty slots so nextSlot points at a used slot or past the last slot.
            // Comparing nextSlot with the count of used slots would stop early
            // whenever the used slots do not start at index 0.
            while (nextSlot < getNumTuples() && !isSlotUsed(nextSlot)) {
                nextSlot++;
            }
            return nextSlot < getNumTuples();
        }

        @Override
        public Tuple next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return tuples[nextSlot++];
        }
    };
}
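One subtlety worth testing is that used slots need not start at index 0: a check that compares the slot index with the number of used tuples would stop early on a page where, say, only slots 3 and 4 are filled. The sketch below is a standalone, hypothetical version of the iterator (UsedSlotIteratorSketch and its String "tuples" are stand-ins, not SimpleDB types) that skips empty slots inside hasNext.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical standalone sketch; Strings stand in for SimpleDB tuples.
final class UsedSlotIteratorSketch {
    /** Iterates over slots[i] for every i whose bit is set in header. */
    static <T> Iterator<T> usedSlots(byte[] header, T[] slots) {
        return new Iterator<T>() {
            private int nextSlot = 0;

            @Override
            public boolean hasNext() {
                // Advance past empty slots; stop at a used slot or the end of the array.
                while (nextSlot < slots.length
                        && (header[nextSlot / 8] & (1 << (nextSlot % 8))) == 0) {
                    nextSlot++;
                }
                return nextSlot < slots.length;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return slots[nextSlot++];
            }
        };
    }

    /** Collects everything the iterator yields, for easy inspection. */
    static List<String> collect(byte[] header, String[] slots) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = usedSlots(header, slots);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] slots = {"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"};
        byte[] header = {(byte) 0b0001_1000}; // only slots 3 and 4 are used
        System.out.println(collect(header, slots)); // [t3, t4]
    }
}
```

Doing the skipping in hasNext keeps next trivial, and calling hasNext repeatedly is harmless because it only moves forward to the next used slot.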