SimpleDB lab1 EX5~EX7
1 EX5 HeapFile
2020-10-25
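As the hints indicate, this exercise implements reading the file from disk, so the first step is to compute the correct offset. To support random access into the file, I use RandomAccessFile.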
DbFile.java gives the offset calculation: $\text{pageNo} \times \text{pageSize}$, where pageNo is the index of the page within the file (counting from 0) and pageSize is the size of a page in bytes.
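RandomAccessFile's seek() method jumps directly to the start of the requested page, after which one page of content is read. The HeapPage constructor then turns the raw bytes into a HeapPage instance.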
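The seek-then-read step can be tried outside SimpleDB. The following is a minimal sketch, not the lab code: the class name PageReadSketch and the fixed PAGE_SIZE of 4096 are assumptions made for the demo (SimpleDB obtains the page size from the BufferPool), and the page is returned as a plain byte array instead of a HeapPage.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;

class PageReadSketch {
    // Hypothetical page size for this demo only.
    static final int PAGE_SIZE = 4096;

    /** Reads page number pageNo by seeking to pageNo * PAGE_SIZE. */
    static byte[] readPage(File file, int pageNo) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            // Widen to long before multiplying so large files do not overflow int.
            long offset = (long) pageNo * PAGE_SIZE;
            byte[] buffer = new byte[PAGE_SIZE];
            raf.seek(offset);
            raf.readFully(buffer); // throws EOFException if the page is incomplete
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a three-page file in which page p is filled with the byte p + 1.
        File tmp = File.createTempFile("pages", ".dat");
        tmp.deleteOnExit();
        byte[] data = new byte[3 * PAGE_SIZE];
        for (int p = 0; p < 3; p++) {
            Arrays.fill(data, p * PAGE_SIZE, (p + 1) * PAGE_SIZE, (byte) (p + 1));
        }
        Files.write(tmp.toPath(), data);

        byte[] page1 = readPage(tmp, 1);
        System.out.println(page1[0] + " " + page1[PAGE_SIZE - 1]); // prints "2 2"
    }
}
```

Unlike read(), readFully() fails loudly when fewer than PAGE_SIZE bytes are available, which makes an out-of-range page number easier to notice in a sketch like this.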
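The other task is building the iterator, which is comparatively simple. The iterator walks over every Tuple in the HeapFile. HeapPage already provides a per-page iterator, so we can fetch each HeapPage and return Tuples through its iterator. The lab document hints that pages should be fetched through BufferPool's getPage() method. It also advises against loading every page into memory in open(), which gives better performance. Instead, once a page has been fully read and the file still has unread pages, we load the next page and release the previous one.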
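The page-at-a-time idea can be sketched without any SimpleDB classes. In this simplified illustration, a list of integer lists stands in for the pages of a heap file, and the class name LazyPageIteratorSketch and the loads counter are hypothetical names used only to show that a page is touched when it is first needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LazyPageIteratorSketch implements Iterator<Integer> {
    private final List<List<Integer>> pages; // stand-in for the pages of a heap file
    private int current = -1;                // index of the page currently loaded
    private Iterator<Integer> it = null;     // iterator over the loaded page
    int loads = 0;                           // number of pages loaded so far

    LazyPageIteratorSketch(List<List<Integer>> pages) {
        this.pages = pages;
    }

    @Override
    public boolean hasNext() {
        // Move to the next page only when the current one is exhausted.
        // Empty pages are skipped instead of being treated as end of file.
        while (it == null || !it.hasNext()) {
            if (current + 1 >= pages.size()) {
                return false;
            }
            current++;
            it = pages.get(current).iterator(); // "load" exactly one page
            loads++;
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more tuples");
        }
        return it.next();
    }

    public static void main(String[] args) {
        List<List<Integer>> pages = Arrays.asList(
                Arrays.asList(1, 2), new ArrayList<Integer>(), Arrays.asList(3));
        LazyPageIteratorSketch scan = new LazyPageIteratorSketch(pages);
        List<Integer> out = new ArrayList<>();
        out.add(scan.next());
        System.out.println("pages loaded after first tuple: " + scan.loads); // 1
        while (scan.hasNext()) {
            out.add(scan.next());
        }
        System.out.println(out); // [1, 2, 3]
    }
}
```

The bound check current + 1 >= pages.size() is the important detail: the iterator never asks for a page index equal to the page count.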
The remaining methods are routine and need no further explanation.
Putting it together, the code for this part is:
package simpledb;

import java.io.*;
import java.util.*;

/**
 * HeapFile is an implementation of a DbFile that stores a collection of tuples
 * in no particular order. Tuples are stored on pages, each of which is a fixed
 * size, and the file is simply a collection of those pages. HeapFile works
 * closely with HeapPage. The format of HeapPages is described in the HeapPage
 * constructor.
 *
 * @see simpledb.HeapPage#HeapPage
 * @author Sam Madden
 */
public class HeapFile implements DbFile {

    private File file;
    private TupleDesc td;

    /**
     * Constructs a heap file backed by the specified file.
     *
     * @param f
     *            the file that stores the on-disk backing store for this heap
     *            file.
     */
    public HeapFile(File f, TupleDesc td) {
        // some code goes here
        this.file = f;
        this.td = td;
    }

    /**
     * Returns the File backing this HeapFile on disk.
     *
     * @return the File backing this HeapFile on disk.
     */
    public File getFile() {
        // some code goes here
        return this.file;
    }

    /**
     * Returns an ID uniquely identifying this HeapFile. Implementation note:
     * you will need to generate this tableid somewhere to ensure that each
     * HeapFile has a "unique id," and that you always return the same value for
     * a particular HeapFile. We suggest hashing the absolute file name of the
     * file underlying the heapfile, i.e. f.getAbsoluteFile().hashCode().
     *
     * @return an ID uniquely identifying this HeapFile.
     */
    public int getId() {
        // some code goes here
        return file.getAbsoluteFile().hashCode();
    }

    /**
     * Returns the TupleDesc of the table stored in this DbFile.
     *
     * @return TupleDesc of this DbFile.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return this.td;
    }

    // see DbFile.java for javadocs
    public Page readPage(PageId pid) {
        // some code goes here
        // Compute the offset from pid, then read one page.
        // try-with-resources closes the file handle on every path.
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            // Compute the offset; widen to long so large files do not overflow int.
            int pgno = pid.getPageNumber();
            int pageSize = Database.getBufferPool().getPageSize();
            long offset = (long) pgno * pageSize;
            // Read one page worth of bytes.
            byte[] buffer = new byte[pageSize];
            randomAccessFile.seek(offset);
            randomAccessFile.read(buffer);
            return new HeapPage(new HeapPageId(pid.getTableId(), pgno), buffer);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    // see DbFile.java for javadocs
    public void writePage(Page page) throws IOException {
        // some code goes here
        // not necessary for lab1
    }

    /**
     * Returns the number of pages in this HeapFile.
     */
    public int numPages() {
        // some code goes here
        // Integer division already rounds down to whole pages.
        long fileLen = file.length();
        return (int) (fileLen / Database.getBufferPool().getPageSize());
    }

    // see DbFile.java for javadocs
    public ArrayList<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        return null;
        // not necessary for lab1
    }

    // see DbFile.java for javadocs
    public ArrayList<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
            TransactionAbortedException {
        // some code goes here
        return null;
        // not necessary for lab1
    }

    // see DbFile.java for javadocs
    public DbFileIterator iterator(TransactionId tid) {
        // some code goes here
        return new DbFileIterator() {
            int pages = numPages();
            int readingPage;
            PageId readingPid;
            Page page;
            Iterator<Tuple> it;

            @Override
            public void open() throws DbException, TransactionAbortedException {
                // Load only the first page; later pages are fetched on demand.
                readingPage = 0;
                readingPid = new HeapPageId(getId(), readingPage);
                page = Database.getBufferPool().getPage(tid, readingPid, Permissions.READ_ONLY);
                it = ((HeapPage) page).iterator();
            }

            @Override
            public boolean hasNext() throws DbException, TransactionAbortedException {
                if (it == null)
                    return false;
                // The lab2 DeleteTest showed that a page can be empty, so the page
                // count alone cannot tell us whether a tuple remains. Instead, keep
                // advancing to the next page until one with a tuple is found.
                boolean hasNextTupleInPage = it.hasNext();
                while (!hasNextTupleInPage) {
                    // Valid page numbers are 0..pages-1, so stop before the last one.
                    if (readingPage < pages - 1) {
                        Database.getBufferPool().releasePage(tid, readingPid);
                        readingPage++;
                        readingPid = new HeapPageId(getId(), readingPage);
                        page = Database.getBufferPool().getPage(tid, readingPid, Permissions.READ_ONLY);
                        it = ((HeapPage) page).iterator();
                        hasNextTupleInPage = it.hasNext();
                    } else {
                        return false;
                    }
                }
                return true;
            }

            @Override
            public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
                if (it == null)
                    throw new NoSuchElementException("iterator is not open");
                // Calling hasNext() here also moves past exhausted or empty pages.
                if (!hasNext())
                    throw new NoSuchElementException("no more tuples");
                return it.next();
            }

            @Override
            public void rewind() throws DbException, TransactionAbortedException {
                // Restart from the first page, exactly as open() does.
                open();
            }

            @Override
            public void close() {
                readingPage = pages + 1;
                it = null;
                // readingPid is null if close() is called before open().
                if (readingPid != null)
                    Database.getBufferPool().releasePage(tid, readingPid);
            }
        };
    }
}
Run result
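EX6 Operator
Only SeqScan needs to be implemented here, which is straightforward.
SeqScan implements the statement select * from table table_alias.
We can use the iterator implemented in EX5 to traverse the data in the table.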
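The alias in that statement affects field naming: according to the SeqScan javadoc, getTupleDesc() should return field names of the form alias.fieldName, and a null alias or field name may simply show up as the text "null". A minimal sketch of that naming rule on plain strings follows; the class AliasPrefixSketch and its prefix method are hypothetical helpers for illustration, not part of SimpleDB.

```java
import java.util.Arrays;

class AliasPrefixSketch {
    /** Returns a copy of fieldNames with "alias." prepended to each entry. */
    static String[] prefix(String alias, String[] fieldNames) {
        String[] out = new String[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            // String concatenation renders a null alias or name as "null",
            // which matches the behavior the javadoc tolerates.
            out[i] = alias + "." + fieldNames[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"field0", "field1"};
        System.out.println(Arrays.toString(prefix("t", names))); // [t.field0, t.field1]
    }
}
```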
The code is as follows:
package simpledb;

import java.util.*;

/**
 * SeqScan is an implementation of a sequential scan access method that reads
 * each tuple of a table in no particular order (e.g., as they are laid out on
 * disk).
 */
public class SeqScan implements OpIterator {

    private static final long serialVersionUID = 1L;
    private TransactionId tid;
    private int tableId;
    private String tableAlias;
    private DbFileIterator iterator;

    /**
     * Creates a sequential scan over the specified table as a part of the
     * specified transaction.
     *
     * @param tid
     *            The transaction this scan is running as a part of.
     * @param tableid
     *            the table to scan.
     * @param tableAlias
     *            the alias of this table (needed by the parser); the returned
     *            tupleDesc should have fields with name tableAlias.fieldName
     *            (note: this class is not responsible for handling a case where
     *            tableAlias or fieldName are null. It shouldn't crash if they
     *            are, but the resulting name can be null.fieldName,
     *            tableAlias.null, or null.null).
     */
    public SeqScan(TransactionId tid, int tableid, String tableAlias) {
        // some code goes here
        this.tid = tid;
        this.tableId = tableid;
        this.tableAlias = tableAlias;
        this.iterator = Database.getCatalog().getDatabaseFile(tableid).iterator(tid);
    }

    /**
     * @return
     *       return the table name of the table the operator scans. This should
     *       be the actual name of the table in the catalog of the database
     * */
    public String getTableName() {
        return Database.getCatalog().getTableName(tableId);
    }

    /**
     * @return Return the alias of the table this operator scans.
     * */
    public String getAlias()
    {
        // some code goes here
        return this.tableAlias;
    }

    /**
     * Reset the tableid, and tableAlias of this operator.
     * @param tableid
     *            the table to scan.
     * @param tableAlias
     *            the alias of this table (needed by the parser); the returned
     *            tupleDesc should have fields with name tableAlias.fieldName
     *            (note: this class is not responsible for handling a case where
     *            tableAlias or fieldName are null. It shouldn't crash if they
     *            are, but the resulting name can be null.fieldName,
     *            tableAlias.null, or null.null).
     */
    public void reset(int tableid, String tableAlias) {
        // some code goes here
        this.tableId = tableid;
        this.tableAlias = tableAlias;
        // Point the scan at the new table; otherwise it would keep reading the old one.
        this.iterator = Database.getCatalog().getDatabaseFile(tableid).iterator(tid);
    }

    public SeqScan(TransactionId tid, int tableId) {
        this(tid, tableId, Database.getCatalog().getTableName(tableId));
    }

    public void open() throws DbException, TransactionAbortedException {
        // some code goes here
        iterator.open();
    }

    /**
     * Returns the TupleDesc with field names from the underlying HeapFile,
     * prefixed with the tableAlias string from the constructor. This prefix
     * becomes useful when joining tables containing a field(s) with the same
     * name. The alias and name should be separated with a "." character
     * (e.g., "alias.fieldName").
     *
     * @return the TupleDesc with field names from the underlying HeapFile,
     *         prefixed with the tableAlias string from the constructor.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        // Prefix every field name with the alias, as the javadoc above requires.
        TupleDesc base = Database.getCatalog().getTupleDesc(tableId);
        int n = base.numFields();
        Type[] types = new Type[n];
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            types[i] = base.getFieldType(i);
            names[i] = tableAlias + "." + base.getFieldName(i);
        }
        return new TupleDesc(types, names);
    }

    public boolean hasNext() throws TransactionAbortedException, DbException {
        // some code goes here
        return this.iterator.hasNext();
    }

    public Tuple next() throws NoSuchElementException,
            TransactionAbortedException, DbException {
        // some code goes here
        return this.iterator.next();
    }

    public void close() {
        // some code goes here
        this.iterator.close();
    }

    public void rewind() throws DbException, NoSuchElementException,
            TransactionAbortedException {
        // some code goes here
        this.iterator.rewind();
    }
}
Test
EX7 (sp)
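At this point, everything required by lab1 has been implemented. The lab document notes that we can try mapping a SQL statement onto the operator calls by hand; I will come back and update this part later.
Following the hints, I created a test class: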
package simpledb.client;

import simpledb.*;
import java.io.File;

public class Test {

    public static void main(String[] argv) {
        // construct a 3-column table schema
        Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE };
        String names[] = new String[]{ "field0", "field1", "field2" };
        TupleDesc descriptor = new TupleDesc(types, names);

        // create the table, associate it with datafile.dat
        // and tell the catalog about the schema of this table.
        HeapFile table1 = new HeapFile(new File("datafile.dat"), descriptor);
        Database.getCatalog().addTable(table1, "test");

        // construct the query: we use a simple SeqScan, which spoonfeeds
        // tuples via its iterator.
        TransactionId tid = new TransactionId();
        SeqScan f = new SeqScan(tid, table1.getId());

        // print the field names
        System.out.println(names[0] + " " + names[1] + " " + names[2]);
        try {
            // and run it
            f.open();
            while (f.hasNext()) {
                Tuple tup = f.next();
                System.out.println(tup);
            }
            f.close();
            Database.getBufferPool().transactionComplete(tid);
        } catch (Exception e) {
            System.out.println("Exception : " + e);
        }
    }
}
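As shown, this class calls the SeqScan written earlier, iterates over all the data in the table, and prints it.
Before running it, invoke the SimpleDB main class with the arguments (args) convert filename.txt n, where n is the number of columns. This generates filename.dat, a binary database file readable by SimpleDB that stores the data from filename.txt. Note that the Test class above opens datafile.dat, so the generated file must have that name. After that, running the Test class prints the contents of the table.
Run result:
This completes lab1.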