MIT 6.830: SimpleDB Lab 1

Sources

Course page: 6.830/6.814: Database Systems (mit.edu)

Source code + handouts: github.com

Clone the repository locally:

git clone https://github.com/MIT-DB-Class/simple-db-hw-2021.git 

Exercise 1

Tuple

Fields
  • TupleDesc: the schema (header) of the tuple
    • for a relation, this corresponds to the column headers
  • RecordId: the location of the tuple on disk
  • Field[]: the stored data
Methods
  • Tuple(TupleDesc)
    • construct a tuple from the given schema
Full code:
package simpledb.storage;

import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;

/**
 * Tuple maintains information about the contents of a tuple. Tuples have a
 * specified schema specified by a TupleDesc object and contain Field objects
 * with the data for each field.
 */
public class Tuple implements Serializable {

    private static final long serialVersionUID = 1L;

    private TupleDesc tupleDesc;
    // location of this tuple on disk
    private RecordId rid;
    private final Field[] fields;

    /**
     * Create a new tuple with the specified schema (type).
     *
     * @param td
     *            the schema of this tuple. It must be a valid TupleDesc
     *            instance with at least one field.
     */
    public Tuple(TupleDesc td) {
        // some code goes here
        this.tupleDesc = td;
        fields = new Field[td.numFields()];
    }

    /**
     * @return The TupleDesc representing the schema of this tuple.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return tupleDesc;
    }

    /**
     * @return The RecordId representing the location of this tuple on disk. May
     *         be null.
     */
    public RecordId getRecordId() {
        // some code goes here
        return rid;
    }

    /**
     * Set the RecordId information for this tuple.
     *
     * @param rid
     *            the new RecordId for this tuple.
     */
    public void setRecordId(RecordId rid) {
        // some code goes here
        this.rid = rid;
    }

    /**
     * Change the value of the ith field of this tuple.
     *
     * @param i
     *            index of the field to change. It must be a valid index.
     * @param f
     *            new value for the field.
     */
    public void setField(int i, Field f) {
        // some code goes here
        fields[i] = f;
    }

    /**
     * @return the value of the ith field, or null if it has not been set.
     *
     * @param i
     *            field index to return. Must be a valid index.
     */
    public Field getField(int i) {
        // some code goes here
        return fields[i];
    }

    /**
     * Returns the contents of this Tuple as a string. Note that to pass the
     * system tests, the format needs to be as follows:
     *
     * column1\tcolumn2\tcolumn3\t...\tcolumnN
     *
     * where \t is any whitespace (except a newline)
     */
    public String toString() {
        // some code goes here
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length - 1; i++) {
            sb.append(fields[i]).append(" ");
        }
        sb.append(fields[fields.length - 1]);
        return sb.toString();
    }

    /**
     * @return
     *        An iterator which iterates over all the fields of this tuple
     * */
    public Iterator<Field> fields()
    {
        // some code goes here
        return Arrays.asList(fields).iterator();
    }

    /**
     * reset the TupleDesc of this tuple (only affecting the TupleDesc)
     * */
    public void resetTupleDesc(TupleDesc td)
    {
        // some code goes here
        tupleDesc = td;
    }
}

TupleDesc

Fields:
  • TDItem[] tdItems: the type and name of each field
    • inner class TDItem
      • Type fieldType: the field's type
      • String fieldName: the field's name
Methods
  • TupleDesc(Type[] typeAr, String[] fieldAr)

  • TupleDesc(Type[] typeAr): field names default to the empty string ("")

  • iterator()

    • simply wrap the array

      public Iterator<TDItem> iterator() {
      	return Arrays.asList(tdItems).iterator();
      }
      
  • getSize(): total size in bytes of all the field types

  • static TupleDesc merge(): walks the fields of both TupleDescs and concatenates them into a new TupleDesc

  • equals

    • three steps: compare classes -> compare field counts -> compare each field type

      public boolean equals(Object o) {
          // some code goes here
          // compare classes
          if(this.getClass().isInstance(o)){
              // safe to cast now
              TupleDesc tupleDesc = (TupleDesc) o;
              // same number of fields?
              if(numFields() == tupleDesc.numFields()){
                  // compare each field
                  for (int i = 0; i < numFields(); i++) {
                      // types differ
                      if(!tdItems[i].fieldType.equals(tupleDesc.tdItems[i].fieldType)){
                          return false;
                      }
                  }
                  // all types match
                  return true;
              }
          }
          return false;
      }
      
      
    • On comparing classes

      • clazz.isInstance(obj) is the dynamic equivalent of instanceof: it returns true if obj is non-null and assignable to clazz, and the class is resolved at run time
      • obj instanceof SomeClass performs the same check, but SomeClass must be named at compile time
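The difference shows up once subclasses are involved. A minimal standalone sketch (the class names are invented for illustration):

```java
public class InstanceCheckDemo {
    static class Base {}
    static class Derived extends Base {}

    public static void main(String[] args) {
        Base b = new Base();
        Derived d = new Derived();

        // instanceof: the target type is fixed at compile time
        System.out.println(d instanceof Base);          // true: a Derived is a Base

        // Class.isInstance: the target class is a runtime value
        System.out.println(Base.class.isInstance(d));   // true: same check, resolved at run time

        // getClass().isInstance(other) asks: is other an instance of MY runtime class?
        System.out.println(d.getClass().isInstance(b)); // false: a Base is not a Derived
        System.out.println(b.getClass().isInstance(d)); // true: a Derived is a Base
    }
}
```

This is why `this.getClass().isInstance(o)` in equals is stricter for subclasses than `o instanceof TupleDesc` would be.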
Full code:
package simpledb.storage;

import simpledb.common.Type;

import java.io.Serializable;
import java.util.*;

/**
 * TupleDesc describes the schema of a tuple.
 */
public class TupleDesc implements Serializable {

    private final TDItem[] tdItems;

    /**
     * A help class to facilitate organizing the information of each field
     * */
    public static class TDItem implements Serializable {

        private static final long serialVersionUID = 1L;

        /**
         * The type of the field
         * */
        public final Type fieldType;
        
        /**
         * The name of the field
         * */
        public final String fieldName;

        public TDItem(Type t, String n) {
            this.fieldName = n;
            this.fieldType = t;
        }

        public String toString() {
            return fieldName + "(" + fieldType + ")";
        }
    }

    /**
     * @return
     *        An iterator which iterates over all the field TDItems
     *        that are included in this TupleDesc
     * */
    public Iterator<TDItem> iterator() {
        // some code goes here
        return Arrays.asList(tdItems).iterator();
    }

    private static final long serialVersionUID = 1L;

    /**
     * Create a new TupleDesc with typeAr.length fields with fields of the
     * specified types, with associated named fields.
     * 
     * @param typeAr
     *            array specifying the number of and types of fields in this
     *            TupleDesc. It must contain at least one entry.
     * @param fieldAr
     *            array specifying the names of the fields. Note that names may
     *            be null.
     */
    public TupleDesc(Type[] typeAr, String[] fieldAr) {
        // some code goes here
        tdItems = new TDItem[typeAr.length];
        for (int i = 0; i < typeAr.length; i++) {
            tdItems[i] = new TDItem(typeAr[i], fieldAr[i]);
        }
    }

    /**
     * Constructor. Create a new tuple desc with typeAr.length fields with
     * fields of the specified types, with anonymous (unnamed) fields.
     * 
     * @param typeAr
     *            array specifying the number of and types of fields in this
     *            TupleDesc. It must contain at least one entry.
     */
    public TupleDesc(Type[] typeAr) {
        // some code goes here
        tdItems = new TDItem[typeAr.length];
        for (int i = 0; i < typeAr.length; i++) {
            tdItems[i] = new TDItem(typeAr[i], "");
        }
    }

    /**
     * @return the number of fields in this TupleDesc
     */
    public int numFields() {
        // some code goes here
        return tdItems.length;
    }

    /**
     * Gets the (possibly null) field name of the ith field of this TupleDesc.
     * 
     * @param i
     *            index of the field name to return. It must be a valid index.
     * @return the name of the ith field
     * @throws NoSuchElementException
     *             if i is not a valid field reference.
     */
    public String getFieldName(int i) throws NoSuchElementException {
        // some code goes here
        return tdItems[i].fieldName;
    }

    /**
     * Gets the type of the ith field of this TupleDesc.
     * 
     * @param i
     *            The index of the field to get the type of. It must be a valid
     *            index.
     * @return the type of the ith field
     * @throws NoSuchElementException
     *             if i is not a valid field reference.
     */
    public Type getFieldType(int i) throws NoSuchElementException {
        // some code goes here
        return tdItems[i].fieldType;
    }

    /**
     * Find the index of the field with a given name.
     * 
     * @param name
     *            name of the field.
     * @return the index of the field that is first to have the given name.
     * @throws NoSuchElementException
     *             if no field with a matching name is found.
     */
    public int fieldNameToIndex(String name) throws NoSuchElementException {
        // some code goes here
        for (int i = 0; i < tdItems.length; i++) {
            if(tdItems[i].fieldName.equals(name)){
                return i;
            }
        }
        throw new NoSuchElementException("not found fieldName " + name);
    }

    /**
     * @return The size (in bytes) of tuples corresponding to this TupleDesc.
     *         Note that tuples from a given TupleDesc are of a fixed size.
     */
    public int getSize() {
        // some code goes here
        int size = 0;
        for (int i = 0; i < tdItems.length; i++) {
            size += tdItems[i].fieldType.getLen();
        }
        return size;
    }

    /**
     * Merge two TupleDescs into one, with td1.numFields + td2.numFields fields,
     * with the first td1.numFields coming from td1 and the remaining from td2.
     * 
     * @param td1
     *            The TupleDesc with the first fields of the new TupleDesc
     * @param td2
     *            The TupleDesc with the last fields of the TupleDesc
     * @return the new TupleDesc
     */
    public static TupleDesc merge(TupleDesc td1, TupleDesc td2) {
        // some code goes here
        Type[] types = new Type[td1.numFields() + td2.numFields()];
        String[] name = new String[td1.numFields() + td2.numFields()];
        for (int i = 0; i < td1.numFields(); i++) {
            types[i] = td1.tdItems[i].fieldType;
            name[i] = td1.tdItems[i].fieldName;
        }
        int len1 = td1.numFields();
        for (int i = 0; i < td2.numFields(); i++) {
            types[i + len1] = td2.tdItems[i].fieldType;
            name[i + len1] = td2.tdItems[i].fieldName;
        }
        return new TupleDesc(types, name);
    }

    /**
     * Compares the specified object with this TupleDesc for equality. Two
     * TupleDescs are considered equal if they have the same number of items
     * and if the i-th type in this TupleDesc is equal to the i-th type in o
     * for every i.
     * 
     * @param o
     *            the Object to be compared for equality with this TupleDesc.
     * @return true if the object is equal to this TupleDesc.
     */

    public boolean equals(Object o) {
        // some code goes here
        // compare classes
        if(this.getClass().isInstance(o)){
            // safe to cast now
            TupleDesc tupleDesc = (TupleDesc) o;
            // same number of fields?
            if(numFields() == tupleDesc.numFields()){
                // compare each field
                for (int i = 0; i < numFields(); i++) {
                    // types differ
                    if(!tdItems[i].fieldType.equals(tupleDesc.tdItems[i].fieldType)){
                        return false;
                    }
                }
                // all types match
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        // If you want to use TupleDesc as keys for HashMap, implement this so
        // that equal objects have equals hashCode() results
        throw new UnsupportedOperationException("unimplemented");
    }

    /**
     * Returns a String describing this descriptor. It should be of the form
     * "fieldType[0](fieldName[0]), ..., fieldType[M](fieldName[M])", although
     * the exact format does not matter.
     * 
     * @return String describing this descriptor.
     */
    public String toString() {
        // some code goes here
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tdItems.length - 1; i++) {
            sb.append(tdItems[i].fieldType).append("(").append(tdItems[i].fieldName).append("),");
        }
        // last item, without a trailing comma
        sb.append(tdItems[tdItems.length - 1].fieldType).append("(").append(tdItems[tdItems.length - 1].fieldName).append(")");
        return sb.toString();
    }
}

Field

  • IntField
    • int value
  • StringField
    • String value
    • int maxSize: maximum length; strings longer than maxSize are truncated
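The truncation rule can be sketched standalone; this mirrors what the skeleton's StringField constructor does (an assumption about the skeleton, not its exact source):

```java
public class TruncateDemo {
    // Cap a string at maxSize characters, the way a fixed-size
    // string field would before storing its value.
    static String truncate(String s, int maxSize) {
        return s.length() > maxSize ? s.substring(0, maxSize) : s;
    }

    public static void main(String[] args) {
        System.out.println(truncate("hello world", 5)); // prints "hello"
        System.out.println(truncate("hi", 5));          // prints "hi"
    }
}
```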

Tests:

TupleTest and TupleDescTest

Exercise 2

Catalog

Stores the records of all tables.

Fields:
  • ConcurrentHashMap<Integer, Table> hashTable: maps each tableId to its table record
Methods:
  • Catalog(): creates the hashTable

  • addTable(DbFile file, String name, String pkeyField): puts a new entry into the map

  • getTableId(String name): looks up a tableId by table name

    public int getTableId(String name) throws NoSuchElementException {
        // some code goes here
        // scan the values for a matching name
        Integer res = hashTable.searchValues(1, value ->{
            if(value.tableName.equals(name)){
                return value.file.getId();
            }
            return null;
        });
        if(res != null){
            return res;
        }
        throw new NoSuchElementException("not found id for table " + name);
    }
    
  • TupleDesc getTupleDesc(int tableid): returns the schema for the given table id

    public TupleDesc getTupleDesc(int tableid) throws NoSuchElementException {
        // some code goes here
        Table t = hashTable.getOrDefault(tableid, null);
        if(t != null){
            return t.file.getTupleDesc();
        }
        throw new NoSuchElementException("not found tupleDesc for table " + tableid);
    }
    
  • tableIdIterator

    public Iterator<Integer> tableIdIterator() {
        // some code goes here
        return hashTable.keySet().iterator();
    }
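getTableId leans on ConcurrentHashMap.searchValues, which scans the map's values (in parallel once the element count exceeds the given threshold) and returns the first non-null result the function produces. A standalone sketch of that API:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SearchValuesDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> names = new ConcurrentHashMap<>();
        names.put(1, "students");
        names.put(2, "courses");

        // searchValues(parallelismThreshold, searchFunction):
        // returns the first non-null value produced, or null if none matches.
        String hit = names.searchValues(1, v -> v.equals("courses") ? v : null);
        String miss = names.searchValues(1, v -> v.equals("grades") ? v : null);

        System.out.println(hit);  // prints "courses"
        System.out.println(miss); // prints "null"
    }
}
```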
    
Full code
package simpledb.common;

import simpledb.common.Type;
import simpledb.storage.DbFile;
import simpledb.storage.HeapFile;
import simpledb.storage.TupleDesc;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

/**
 * The Catalog keeps track of all available tables in the database and their
 * associated schemas.
 * For now, this is a stub catalog that must be populated with tables by a
 * user program before it can be used -- eventually, this should be converted
 * to a catalog that reads a catalog table from disk.
 * 
 * @Threadsafe
 */
public class Catalog {

    // all tables
    private final ConcurrentHashMap<Integer, Table> hashTable;

    /**
     * A table entry
     * */
    private static class Table{
        private DbFile file;
        private String tableName;
        private String pkeyField;

        public Table(DbFile file, String tableName, String pkeyField){
            this.file = file;
            this.tableName = tableName;
            this.pkeyField = pkeyField;
        }

        public String toString(){
            return tableName + "(" + file.getId() + ", " + pkeyField + ")";
        }
    }


    /**
     * Constructor.
     * Creates a new, empty catalog.
     */
    public Catalog() {
        // some code goes here
        hashTable = new ConcurrentHashMap<>();
    }

    /**
     * Add a new table to the catalog.
     * This table's contents are stored in the specified DbFile.
     * @param file the contents of the table to add;  file.getId() is the identfier of
     *    this file/tupledesc param for the calls getTupleDesc and getFile
     * @param name the name of the table -- may be an empty string.  May not be null.  If a name
     * conflict exists, use the last table to be added as the table for a given name.
     * @param pkeyField the name of the primary key field
     */
    public void addTable(DbFile file, String name, String pkeyField) {
        // some code goes here
        hashTable.put(file.getId(), new Table(file, name, pkeyField));
    }

    public void addTable(DbFile file, String name) {
        addTable(file, name, "");
    }

    /**
     * Add a new table to the catalog.
     * This table has tuples formatted using the specified TupleDesc and its
     * contents are stored in the specified DbFile.
     * @param file the contents of the table to add;  file.getId() is the identfier of
     *    this file/tupledesc param for the calls getTupleDesc and getFile
     */
    public void addTable(DbFile file) {
        addTable(file, (UUID.randomUUID()).toString());
    }

    /**
     * Return the id of the table with a specified name,
     * @throws NoSuchElementException if the table doesn't exist
     */
    public int getTableId(String name) throws NoSuchElementException {
        // some code goes here
        // scan the values for a matching name
        Integer res = hashTable.searchValues(1, value ->{
            if(value.tableName.equals(name)){
                return value.file.getId();
            }
            return null;
        });
        if(res != null){
            return res;
        }
        throw new NoSuchElementException("not found id for table " + name);
    }

    /**
     * Returns the tuple descriptor (schema) of the specified table
     * @param tableid The id of the table, as specified by the DbFile.getId()
     *     function passed to addTable
     * @throws NoSuchElementException if the table doesn't exist
     */
    public TupleDesc getTupleDesc(int tableid) throws NoSuchElementException {
        // some code goes here
        Table t = hashTable.getOrDefault(tableid, null);
        if(t != null){
            return t.file.getTupleDesc();
        }
        throw new NoSuchElementException("not found tupleDesc for table " + tableid);
    }

    /**
     * Returns the DbFile that can be used to read the contents of the
     * specified table.
     * @param tableid The id of the table, as specified by the DbFile.getId()
     *     function passed to addTable`
     */
    public DbFile getDatabaseFile(int tableid) throws NoSuchElementException {
        // some code goes here
        Table t = hashTable.getOrDefault(tableid, null);
        if(t != null){
            return t.file;
        }
        throw new NoSuchElementException("not found dbFile for table " + tableid);
    }

    public String getPrimaryKey(int tableid) {
        // some code goes here
        Table t = hashTable.getOrDefault(tableid, null);
        if(t != null){
            return t.pkeyField;
        }
        throw new NoSuchElementException("not found primaryKey for table " + tableid);
    }

    public Iterator<Integer> tableIdIterator() {
        // some code goes here
        return hashTable.keySet().iterator();
    }

    public String getTableName(int id) {
        // some code goes here
        Table t = hashTable.getOrDefault(id, null);
        if(t != null){
            return t.tableName;
        }
        throw new NoSuchElementException("not found tableName for table " + id);
    }
    
    /** Delete all tables from the catalog */
    public void clear() {
        // some code goes here
        hashTable.clear();
    }
    
    /**
     * Reads the schema from a file and creates the appropriate tables in the database.
     * @param catalogFile
     */
    public void loadSchema(String catalogFile) {
        String line = "";
        // base folder of the catalog file
        String baseFolder=new File(new File(catalogFile).getAbsolutePath()).getParent();
        try {
            // read the catalog file line by line
            BufferedReader br = new BufferedReader(new FileReader(catalogFile));
            while ((line = br.readLine()) != null) {
                //assume line is of the format name (field type, field type, ...)
                String name = line.substring(0, line.indexOf("(")).trim();
                //System.out.println("TABLE NAME: " + name);
                String fields = line.substring(line.indexOf("(") + 1, line.indexOf(")")).trim();
                String[] els = fields.split(",");
                ArrayList<String> names = new ArrayList<>();
                ArrayList<Type> types = new ArrayList<>();
                String primaryKey = "";
                for (String e : els) {
                    String[] els2 = e.trim().split(" ");
                    names.add(els2[0].trim());
                    if (els2[1].trim().equalsIgnoreCase("int"))
                        types.add(Type.INT_TYPE);
                    else if (els2[1].trim().equalsIgnoreCase("string"))
                        types.add(Type.STRING_TYPE);
                    else {
                        System.out.println("Unknown type " + els2[1]);
                        System.exit(0);
                    }
                    if (els2.length == 3) {
                        if (els2[2].trim().equals("pk"))
                            primaryKey = els2[0].trim();
                        else {
                            System.out.println("Unknown annotation " + els2[2]);
                            System.exit(0);
                        }
                    }
                }
                Type[] typeAr = types.toArray(new Type[0]);
                String[] namesAr = names.toArray(new String[0]);
                TupleDesc t = new TupleDesc(typeAr, namesAr);
                HeapFile tabHf = new HeapFile(new File(baseFolder+"/"+name + ".dat"), t);
                addTable(tabHf,name,primaryKey);
                System.out.println("Added table : " + name + " with schema " + t);
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println ("Invalid catalog entry : " + line);
            System.exit(0);
        }
    }
}

Tests:

CatalogTest

Exercise 3

BufferPool

Fields
  • private static final int DEFAULT_PAGE_SIZE = 4096;
  • private static int pageSize = DEFAULT_PAGE_SIZE; // maximum bytes per page
  • public static final int DEFAULT_PAGES = 50: default number of pages
  • private final int numPages: maximum number of pages in the pool
  • private final ConcurrentHashMap<Integer, Page> pageStore: the cached pages
Methods
  • getPage(TransactionId tid, PageId pid, Permissions perm)

    public Page getPage(TransactionId tid, PageId pid, Permissions perm)
        throws TransactionAbortedException, DbException {
        // some code goes here
        // cache miss
        if(!pageStore.containsKey(pid.hashCode())){
            // read the page from its DbFile
            DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
            Page page = dbFile.readPage(pid);
            // is the pool full?
            if(pageStore.size() >= numPages){
                // eviction is implemented in a later exercise
                throw new DbException("buffer pool is full");
            }
            // cache the page
            pageStore.put(pid.hashCode(), page);
        }
        // serve from the cache
        return pageStore.get(pid.hashCode());
    }
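The shape of getPage is a classic bounded cache-miss pattern: look up, load on miss, refuse (later: evict) when full. A standalone sketch with plain types (the names here are invented):

```java
import java.util.concurrent.ConcurrentHashMap;

public class MiniPool {
    private final int capacity;
    private final ConcurrentHashMap<Integer, String> store = new ConcurrentHashMap<>();

    MiniPool(int capacity) { this.capacity = capacity; }

    // On a miss, load the value; refuse if the pool is full (no eviction yet).
    String get(int key) {
        if (!store.containsKey(key)) {
            if (store.size() >= capacity) {
                throw new IllegalStateException("pool is full");
            }
            store.put(key, load(key)); // stands in for DbFile.readPage
        }
        return store.get(key);
    }

    private String load(int key) { return "page-" + key; }

    public static void main(String[] args) {
        MiniPool pool = new MiniPool(2);
        System.out.println(pool.get(1)); // loads, prints "page-1"
        System.out.println(pool.get(1)); // cached, prints "page-1"
        System.out.println(pool.get(2)); // loads, prints "page-2"
        // pool.get(3) would throw: capacity reached and no eviction yet
    }
}
```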
    
Full code
package simpledb.storage;

import simpledb.common.Database;
import simpledb.common.Permissions;
import simpledb.common.DbException;
import simpledb.common.DeadlockException;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.*;

import java.util.concurrent.ConcurrentHashMap;

/**
 * BufferPool manages the reading and writing of pages into memory from
 * disk. Access methods call into it to retrieve pages, and it fetches
 * pages from the appropriate location.
 * <p>
 * The BufferPool is also responsible for locking;  when a transaction fetches
 * a page, BufferPool checks that the transaction has the appropriate
 * locks to read/write the page.
 * 
 * @Threadsafe, all fields are final
 */
public class BufferPool {
    /** Bytes per page, including header. */
    private static final int DEFAULT_PAGE_SIZE = 4096;

    private static int pageSize = DEFAULT_PAGE_SIZE;
    
    /** Default number of pages passed to the constructor. This is used by
    other classes. BufferPool should use the numPages argument to the
    constructor instead. */
    public static final int DEFAULT_PAGES = 50;

    // maximum number of pages in the pool
    private final int numPages;
    // the cached pages
    private final ConcurrentHashMap<Integer, Page> pageStore;

    /**
     * Creates a BufferPool that caches up to numPages pages.
     *
     * @param numPages maximum number of pages in this buffer pool.
     */
    public BufferPool(int numPages) {
        // some code goes here
        this.numPages = numPages;
        pageStore = new ConcurrentHashMap<>();
    }
    
    public static int getPageSize() {
      return pageSize;
    }
    
    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void setPageSize(int pageSize) {
    	BufferPool.pageSize = pageSize;
    }
    
    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void resetPageSize() {
    	BufferPool.pageSize = DEFAULT_PAGE_SIZE;
    }

    /**
     * Retrieve the specified page with the associated permissions.
     * Will acquire a lock and may block if that lock is held by another
     * transaction.
     * <p>
     * The retrieved page should be looked up in the buffer pool.  If it
     * is present, it should be returned.  If it is not present, it should
     * be added to the buffer pool and returned.  If there is insufficient
     * space in the buffer pool, a page should be evicted and the new page
     * should be added in its place.
     *
     * @param tid the ID of the transaction requesting the page
     * @param pid the ID of the requested page
     * @param perm the requested permissions on the page
     */
    public Page getPage(TransactionId tid, PageId pid, Permissions perm)
        throws TransactionAbortedException, DbException {
        // some code goes here
        // cache miss
        if(!pageStore.containsKey(pid.hashCode())){
            // read the page from its DbFile
            DbFile dbFile = Database.getCatalog().getDatabaseFile(pid.getTableId());
            Page page = dbFile.readPage(pid);
            // is the pool full?
            if(pageStore.size() >= numPages){
                // eviction is implemented in a later exercise
                throw new DbException("buffer pool is full");
            }
            // cache the page
            pageStore.put(pid.hashCode(), page);
        }
        // serve from the cache
        return pageStore.get(pid.hashCode());
    }

    /**
     * Releases the lock on a page.
     * Calling this is very risky, and may result in wrong behavior. Think hard
     * about who needs to call this and why, and why they can run the risk of
     * calling it.
     *
     * @param tid the ID of the transaction requesting the unlock
     * @param pid the ID of the page to unlock
     */
    public  void unsafeReleasePage(TransactionId tid, PageId pid) {
        // some code goes here
        // not necessary for Exercise1|Exercise2
    }

    /**
     * Release all locks associated with a given transaction.
     *
     * @param tid the ID of the transaction requesting the unlock
     */
    public void transactionComplete(TransactionId tid) {
        // some code goes here
        // not necessary for Exercise1|Exercise2
    }

    /** Return true if the specified transaction has a lock on the specified page */
    public boolean holdsLock(TransactionId tid, PageId p) {
        // some code goes here
        // not necessary for Exercise1|Exercise2
        return false;
    }

    /**
     * Commit or abort a given transaction; release all locks associated to
     * the transaction.
     *
     * @param tid the ID of the transaction requesting the unlock
     * @param commit a flag indicating whether we should commit or abort
     */
    public void transactionComplete(TransactionId tid, boolean commit) {
        // some code goes here
        // not necessary for Exercise1|Exercise2
    }

    /**
     * Add a tuple to the specified table on behalf of transaction tid.  Will
     * acquire a write lock on the page the tuple is added to and any other 
     * pages that are updated (Lock acquisition is not needed for Exercise2). 
     * May block if the lock(s) cannot be acquired.
     * 
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have 
     * been dirtied to the cache (replacing any existing versions of those pages) so 
     * that future requests see up-to-date pages. 
     *
     * @param tid the transaction adding the tuple
     * @param tableId the table to add the tuple to
     * @param t the tuple to add
     */
    public void insertTuple(TransactionId tid, int tableId, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Remove the specified tuple from the buffer pool.
     * Will acquire a write lock on the page the tuple is removed from and any
     * other pages that are updated. May block if the lock(s) cannot be acquired.
     *
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have 
     * been dirtied to the cache (replacing any existing versions of those pages) so 
     * that future requests see up-to-date pages. 
     *
     * @param tid the transaction deleting the tuple.
     * @param t the tuple to delete
     */
    public  void deleteTuple(TransactionId tid, Tuple t)
        throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Flush all dirty pages to disk.
     * NB: Be careful using this routine -- it writes dirty data to disk so will
     *     break simpledb if running in NO STEAL mode.
     */
    public synchronized void flushAllPages() throws IOException {
        // some code goes here
        // not necessary for Exercise1

    }

    /** Remove the specific page id from the buffer pool.
        Needed by the recovery manager to ensure that the
        buffer pool doesn't keep a rolled back page in its
        cache.
        
        Also used by B+ tree files to ensure that deleted pages
        are removed from the cache so they can be reused safely
    */
    public synchronized void discardPage(PageId pid) {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Flushes a certain page to disk
     * @param pid an ID indicating the page to flush
     */
    private synchronized  void flushPage(PageId pid) throws IOException {
        // some code goes here
        // not necessary for Exercise1
    }

    /** Write all pages of the specified transaction to disk.
     */
    public synchronized  void flushPages(TransactionId tid) throws IOException {
        // some code goes here
        // not necessary for Exercise1|Exercise2
    }

    /**
     * Discards a page from the buffer pool.
     * Flushes the page to disk to ensure dirty pages are updated on disk.
     */
    private synchronized  void evictPage() throws DbException {
        // some code goes here
        // not necessary for Exercise1
    }
}

Tests:

No tests for this section yet; readPage will be exercised by the HeapPage tests in the next exercise.

Exercise 4

HeapPage

Fields
  • final HeapPageId pid;
  • final TupleDesc td; // the table's schema, which is also each tuple's schema
    • used to compute the size of each row; every page of a given table shares the same TupleDesc
  • final byte[] header; // header bytes: a bitmap of used slots
  • final Tuple[] tuples; // the tuple data
  • final int numSlots; // number of slots, i.e. number of rows
  • byte[] oldData;
  • private final Byte oldDataLock = (byte) 0;
Methods
  • getNumTuples(): how many tuples fit on one page

    Formula: tuples_per_page = floor((page_size * 8) / (tuple_size * 8 + 1))

    • round down: leftover space smaller than one tuple cannot hold anything and is wasted
    • 1 byte = 8 bits
    • tuple_size * 8 + 1: besides the tuple's data, one extra header bit is needed to mark whether the slot is in use
    private int getNumTuples() {        
        // some code goes here
        // number of tuples that fit on this page
        return (int) Math.floor((BufferPool.getPageSize() * 8 * 1.0) / (td.getSize() * 8 + 1));
    }
    
  • HeapPage(HeapPageId id, byte[] data): construct a page by deserializing its raw on-disk bytes

        public HeapPage(HeapPageId id, byte[] data) throws IOException {
            this.pid = id;
            this.td = Database.getCatalog().getTupleDesc(id.getTableId());
            this.numSlots = getNumTuples();
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
    
            // allocate and read the header slots of this page
            // parse the header
            header = new byte[getHeaderSize()];
            for (int i=0; i<header.length; i++)
                header[i] = dis.readByte();
            // parse each row
            tuples = new Tuple[numSlots];
            try{
                // allocate and read the actual records of this page
                for (int i=0; i<tuples.length; i++)
                    tuples[i] = readNextTuple(dis,i);
            }catch(NoSuchElementException e){
                e.printStackTrace();
            }
            dis.close();
    
            setBeforeImage();
        }
    
  • getHeaderSize(): returns the length of the header in bytes

    • headerBytes = ceiling(tuplesPerPage / 8);
      • Round up: even unused slots need header bits, so pad the bitmap to a whole number of bytes
    private int getHeaderSize() {
        // some code goes here
        // headerBytes = ceiling(tuplePerPage / 8);
        return (int) Math.ceil(getNumTuples() * 1.0 / 8);
    }
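
With the same assumed numbers (504 slots from the previous formula), the header size comes out as follows (standalone sketch, made-up class name; `(n + 7) / 8` is the integer form of `ceil(n / 8)`):

```java
public class HeaderSizeDemo {
    // headerBytes = ceiling(tuplesPerPage / 8)
    static int headerSize(int tuplesPerPage) {
        return (tuplesPerPage + 7) / 8; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(headerSize(504)); // 63: 504 bits fill exactly 63 bytes
        System.out.println(headerSize(505)); // 64: one more slot needs one more byte
    }
}
```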
    
  • isSlotUsed(int i): whether slot i is in use

    public boolean isSlotUsed(int i) {
        // some code goes here
        // which header byte holds this slot's bit
        int quot = i / 8;
        // bit offset within that byte
        int move = i % 8;
        // fetch the header byte
        int bitidx = header[quot];
        // shift right by move bits and check whether the lowest bit is 1
        return ((bitidx >> move) & 1) == 1;
    }
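
The bit arithmetic can be traced on a one-byte header (standalone sketch; the bitmap value is made up). Slot i lives at bit i % 8 of byte i / 8, least-significant bit first:

```java
public class BitmapDemo {
    // Same logic as isSlotUsed: byte i / 8, bit i % 8 (low bit first);
    // masking with & 1 makes the sign extension of byte harmless
    static boolean isSlotUsed(byte[] header, int i) {
        return ((header[i / 8] >> (i % 8)) & 1) == 1;
    }

    public static void main(String[] args) {
        byte[] header = { (byte) 0b00000101 }; // slots 0 and 2 marked used
        System.out.println(isSlotUsed(header, 0)); // true
        System.out.println(isSlotUsed(header, 1)); // false
        System.out.println(isSlotUsed(header, 2)); // true
    }
}
```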
    
  • iterator()

    public Iterator<Tuple> iterator() {
        // some code goes here
        // collect the tuples stored in used slots
        ArrayList<Tuple> res = new ArrayList<>();
        for (int i = 0; i < numSlots; i++) {
            if(isSlotUsed(i)){
                res.add(tuples[i]);
            }
        }
        return res.iterator();
    }
    
Full code
package simpledb.storage;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Debug;
import simpledb.common.Catalog;
import simpledb.transaction.TransactionId;

import java.util.*;
import java.io.*;

/**
 * Each instance of HeapPage stores data for one page of HeapFiles and 
 * implements the Page interface that is used by BufferPool.
 *
 * @see HeapFile
 * @see BufferPool
 *
 */
public class HeapPage implements Page {

    final HeapPageId pid;
    final TupleDesc td;
    // slot bitmap (header)
    final byte[] header;
    // the tuple data
    final Tuple[] tuples;
    // number of slots
    final int numSlots;

    byte[] oldData;
    private final Byte oldDataLock= (byte) 0;

    /**
     * Create a HeapPage from a set of bytes of data read from disk.
     * The format of a HeapPage is a set of header bytes indicating
     * the slots of the page that are in use, some number of tuple slots.
     *  Specifically, the number of tuples is equal to: <p>
     *          floor((BufferPool.getPageSize()*8) / (tuple size * 8 + 1))
     * <p> where tuple size is the size of tuples in this
     * database table, which can be determined via {@link Catalog#getTupleDesc}.
     * The number of 8-bit header words is equal to:
     * <p>
     *      ceiling(no. tuple slots / 8)
     * <p>
     * @see Database#getCatalog
     * @see Catalog#getTupleDesc
     * @see BufferPool#getPageSize()
     */
    public HeapPage(HeapPageId id, byte[] data) throws IOException {
        this.pid = id;
        this.td = Database.getCatalog().getTupleDesc(id.getTableId());
        this.numSlots = getNumTuples();
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));

        // allocate and read the header slots of this page
        // parse the header
        header = new byte[getHeaderSize()];
        for (int i=0; i<header.length; i++)
            header[i] = dis.readByte();
        // parse each row
        tuples = new Tuple[numSlots];
        try{
            // allocate and read the actual records of this page
            for (int i=0; i<tuples.length; i++)
                tuples[i] = readNextTuple(dis,i);
        }catch(NoSuchElementException e){
            e.printStackTrace();
        }
        dis.close();

        setBeforeImage();
    }

    /** Retrieve the number of tuples on this page.
        @return the number of tuples on this page
    */
    private int getNumTuples() {        
        // some code goes here
        // compute how many tuples fit on this page
        // tuple_nums = floor((page_size * 8) / (tuple_size * 8 + 1))
        return (int) Math.floor((BufferPool.getPageSize() * 8 * 1.0) / (td.getSize() * 8 + 1));

    }

    /**
     * Computes the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     * @return the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     */
    private int getHeaderSize() {
        // some code goes here
        // headerBytes = ceiling(tuplePerPage / 8);
        return (int) Math.ceil(getNumTuples() * 1.0 / 8);
    }
    
    /** Return a view of this page before it was modified
        -- used by recovery */
    public HeapPage getBeforeImage(){
        try {
            byte[] oldDataRef = null;
            synchronized(oldDataLock)
            {
                oldDataRef = oldData;
            }
            return new HeapPage(pid,oldDataRef);
        } catch (IOException e) {
            e.printStackTrace();
            //should never happen -- we parsed it OK before!
            System.exit(1);
        }
        return null;
    }
    
    public void setBeforeImage() {
        synchronized(oldDataLock)
        {
        oldData = getPageData().clone();
        }
    }

    /**
     * @return the PageId associated with this page.
     */
    public HeapPageId getId() {
        // some code goes here
        return pid;
    }

    /**
     * Suck up tuples from the source file.
     */
    private Tuple readNextTuple(DataInputStream dis, int slotId) throws NoSuchElementException {
        // if associated bit is not set, read forward to the next tuple, and
        // return null.
        if (!isSlotUsed(slotId)) {
            for (int i=0; i<td.getSize(); i++) {
                try {
                    dis.readByte();
                } catch (IOException e) {
                    throw new NoSuchElementException("error reading empty tuple");
                }
            }
            return null;
        }

        // read fields in the tuple
        Tuple t = new Tuple(td);
        RecordId rid = new RecordId(pid, slotId);
        t.setRecordId(rid);
        try {
            for (int j=0; j<td.numFields(); j++) {
                Field f = td.getFieldType(j).parse(dis);
                t.setField(j, f);
            }
        } catch (java.text.ParseException e) {
            e.printStackTrace();
            throw new NoSuchElementException("parsing error!");
        }

        return t;
    }

    /**
     * Generates a byte array representing the contents of this page.
     * Used to serialize this page to disk.
     * <p>
     * The invariant here is that it should be possible to pass the byte
     * array generated by getPageData to the HeapPage constructor and
     * have it produce an identical HeapPage object.
     *
     * @see #HeapPage
     * @return A byte array correspond to the bytes of this page.
     */
    public byte[] getPageData() {
        int len = BufferPool.getPageSize();
        ByteArrayOutputStream baos = new ByteArrayOutputStream(len);
        DataOutputStream dos = new DataOutputStream(baos);

        // create the header of the page
        for (byte b : header) {
            try {
                dos.writeByte(b);
            } catch (IOException e) {
                // this really shouldn't happen
                e.printStackTrace();
            }
        }

        // create the tuples
        for (int i=0; i<tuples.length; i++) {

            // empty slot
            if (!isSlotUsed(i)) {
                for (int j=0; j<td.getSize(); j++) {
                    try {
                        dos.writeByte(0);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }

                }
                continue;
            }

            // non-empty slot
            for (int j=0; j<td.numFields(); j++) {
                Field f = tuples[i].getField(j);
                try {
                    f.serialize(dos);
                
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        // padding
        int zerolen = BufferPool.getPageSize() - (header.length + td.getSize() * tuples.length); //- numSlots * td.getSize();
        byte[] zeroes = new byte[zerolen];
        try {
            dos.write(zeroes, 0, zerolen);
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            dos.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return baos.toByteArray();
    }

    /**
     * Static method to generate a byte array corresponding to an empty
     * HeapPage.
     * Used to add new, empty pages to the file. Passing the results of
     * this method to the HeapPage constructor will create a HeapPage with
     * no valid tuples in it.
     *
     * @return The returned ByteArray.
     */
    public static byte[] createEmptyPageData() {
        int len = BufferPool.getPageSize();
        return new byte[len]; //all 0
    }

    /**
     * Delete the specified tuple from the page; the corresponding header bit should be updated to reflect
     *   that it is no longer stored on any page.
     * @throws DbException if this tuple is not on this page, or tuple slot is
     *         already empty.
     * @param t The tuple to delete
     */
    public void deleteTuple(Tuple t) throws DbException {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Adds the specified tuple to the page;  the tuple should be updated to reflect
     *  that it is now stored on this page.
     * @throws DbException if the page is full (no empty slots) or tupledesc
     *         is mismatch.
     * @param t The tuple to add.
     */
    public void insertTuple(Tuple t) throws DbException {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Marks this page as dirty/not dirty and record that transaction
     * that did the dirtying
     */
    public void markDirty(boolean dirty, TransactionId tid) {
        // some code goes here
	// not necessary for Exercise1
    }

    /**
     * Returns the tid of the transaction that last dirtied this page, or null if the page is not dirty
     */
    public TransactionId isDirty() {
        // some code goes here
	// Not necessary for Exercise1
        return null;      
    }

    /**
     * Returns the number of empty slots on this page.
     */
    public int getNumEmptySlots() {
        // some code goes here
        int count = 0;
        for (int i = 0; i < numSlots; i++) {
            if(!isSlotUsed(i)){
                count++;
            }
        }
        return count;
    }

    /**
     * Returns true if associated slot on this page is filled.
     */
    public boolean isSlotUsed(int i) {
        // some code goes here
        // which header byte holds this slot's bit
        int quot = i / 8;
        // bit offset within that byte
        int move = i % 8;
        // fetch the header byte
        int bitidx = header[quot];
        // shift right by move bits and check whether the lowest bit is 1
        return ((bitidx >> move) & 1) == 1;
    }

    /**
     * Abstraction to fill or clear a slot on this page.
     */
    private void markSlotUsed(int i, boolean value) {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
     * (note that this iterator shouldn't return tuples in empty slots!)
     */
    public Iterator<Tuple> iterator() {
        // some code goes here
        // collect the tuples stored in used slots
        ArrayList<Tuple> res = new ArrayList<>();
        for (int i = 0; i < numSlots; i++) {
            if(isSlotUsed(i)){
                res.add(tuples[i]);
            }
        }
        return res.iterator();
    }

}

RecordId (record location)

Fields
  • private PageId pid; // page id
  • private int tupleno; // tuple (row) number within the page
Full code
package simpledb.storage;

import java.io.Serializable;

/**
 * A RecordId is a reference to a specific tuple on a specific page of a
 * specific table.
 */
public class RecordId implements Serializable {

    private static final long serialVersionUID = 1L;

    private PageId pid;
    private int tupleno;

    /**
     * Creates a new RecordId referring to the specified PageId and tuple
     * number.
     * 
     * @param pid
     *            the pageid of the page on which the tuple resides
     * @param tupleno
     *            the tuple number within the page.
     */
    public RecordId(PageId pid, int tupleno) {
        // some code goes here
        this.pid = pid;
        this.tupleno = tupleno;
    }

    /**
     * @return the tuple number this RecordId references.
     */
    public int getTupleNumber() {
        // some code goes here
        return tupleno;
    }

    /**
     * @return the page id this RecordId references.
     */
    public PageId getPageId() {
        // some code goes here
        return pid;
    }

    /**
     * Two RecordId objects are considered equal if they represent the same
     * tuple.
     * 
     * @return True if this and o represent the same tuple
     */
    @Override
    public boolean equals(Object o) {
        // some code goes here
        if(o instanceof RecordId){
            RecordId recordId = (RecordId) o;
            if (pid.equals(recordId.pid) && tupleno == recordId.tupleno){
                return true;
            }
        }
        return false;
    }

    /**
     * You should implement the hashCode() so that two equal RecordId instances
     * (with respect to equals()) have the same hashCode().
     * 
     * @return An int that is the same for equal RecordId objects.
     */
    @Override
    public int hashCode() {
        // some code goes here
        String hash = "" + pid.getTableId() + pid.getPageNumber() + tupleno;
        return hash.hashCode();
    }

}

HeapPageId (page id)

Fields
  • private final int tableId; // id of the table this page belongs to
  • private final int pgNo; // page number within that table
Full code
package simpledb.storage;

/** Unique identifier for HeapPage objects. */
public class HeapPageId implements PageId {
    private final int tableId;
    private final int pgNo;

    /**
     * Constructor. Create a page id structure for a specific page of a
     * specific table.
     *
     * @param tableId The table that is being referenced
     * @param pgNo The page number in that table.
     */
    public HeapPageId(int tableId, int pgNo) {
        // some code goes here
        this.tableId = tableId;
        this.pgNo = pgNo;
    }

    /** @return the table associated with this PageId */
    public int getTableId() {
        // some code goes here
        return tableId;
    }

    /**
     * @return the page number in the table getTableId() associated with
     *   this PageId
     */
    public int getPageNumber() {
        // some code goes here
        return pgNo;
    }

    /**
     * @return a hash code for this page, represented by a combination of
     *   the table number and the page number (needed if a PageId is used as a
     *   key in a hash table in the BufferPool, for example.)
     * @see BufferPool
     */
    public int hashCode() {
        // some code goes here
        String hash = "" + tableId + pgNo;
        return hash.hashCode();
    }

    /**
     * Compares one PageId to another.
     *
     * @param o The object to compare against (must be a PageId)
     * @return true if the objects are equal (e.g., page numbers and table
     *   ids are the same)
     */
    public boolean equals(Object o) {
        // some code goes here
        if(o instanceof PageId){
            PageId page = (PageId) o;
            if(page.getTableId() == tableId && page.getPageNumber() == pgNo){
                return true;
            }
        }
        return false;
    }

    /**
     *  Return a representation of this object as an array of
     *  integers, for writing to disk.  Size of returned array must contain
     *  number of integers that corresponds to number of args to one of the
     *  constructors.
     */
    public int[] serialize() {
        int[] data = new int[2];

        data[0] = getTableId();
        data[1] = getPageNumber();

        return data;
    }
}

Tests:

HeapPageIdTest 、HeapPageReadTest、RecordIdTest

Exercise5

HeapFile (on-disk storage file)

Fields
  • private final File file; // the backing file
  • private final TupleDesc tupleDesc; // the table schema
Methods
  • getId(): generates an id from the file's absolute path

    public int getId() {
        // some code goes here
        // hash the absolute path of the file: a unique id
        return file.getAbsoluteFile().hashCode();
    }
    
  • numPages(): computes how many pages the file contains

    public int numPages() {
        // some code goes here
        // file length / bytes per page
        int res = (int) Math.floor(file.length() * 1.0 / BufferPool.getPageSize());
        return res;
    }
    
  • readPage(): reads the given page from disk

        public Page readPage(PageId pid) {
            // some code goes here
            // table id
            int tableId = pid.getTableId();
            // page number within that table
            int pgNo = pid.getPageNumber();
            // random access: read by seeking to a byte offset
            RandomAccessFile f = null;
            try{
                // open the backing file for reading
                f = new RandomAccessFile(file, "r");
                // check whether (page number + 1) * page size exceeds the file length
                if((pgNo + 1) * BufferPool.getPageSize() > f.length()){
                    f.close();
                    throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
                }
                // buffer for the page bytes
                byte[] bytes = new byte[BufferPool.getPageSize()];
                // seek to the page's byte offset
                f.seek(pgNo * BufferPool.getPageSize());
                // read the page (returns the number of bytes actually read)
                int read = f.read(bytes, 0, BufferPool.getPageSize());
                // a short read means the page does not exist
                if(read != BufferPool.getPageSize()){
                    throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
                }
                return new HeapPage(new HeapPageId(pid.getTableId(), pid.getPageNumber()), bytes);
            }catch (IOException e){
                e.printStackTrace();
            }finally {
                try{
                    // close the file
                    f.close();
                }catch (IOException e){
                    e.printStackTrace();
                }
            }
            throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
        }
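
The byte arithmetic used by numPages and readPage can be checked in isolation (standalone sketch; the class and method names are made up, and 4096 is the assumed default page size):

```java
public class PageOffsetDemo {
    static final int PAGE_SIZE = 4096; // assumed default page size

    // numPages: file length / bytes per page, rounded down
    static int numPages(long fileLength) {
        return (int) (fileLength / PAGE_SIZE);
    }

    // readPage seeks to pgNo * PAGE_SIZE before reading one full page
    static long pageOffset(int pgNo) {
        return (long) pgNo * PAGE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(numPages(3 * 4096));       // 3
        System.out.println(numPages(3 * 4096 + 100)); // 3: a trailing partial page is ignored
        System.out.println(pageOffset(2));            // 8192
    }
}
```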
    
  • DbFileIterator iterator(TransactionId tid): the iterator, implemented as an inner class

    The inner class implements DbFileIterator

    public DbFileIterator iterator(TransactionId tid) {
            // some code goes here
            return new HeapFileIterator(this, tid);
        }
    
        private static final class HeapFileIterator implements DbFileIterator{
            private final HeapFile heapFile;
            private final TransactionId tid;
            // iterator over the tuples of the current page
            private Iterator<Tuple> iterator;
            private int whichPage;
    
            public HeapFileIterator(HeapFile heapFile, TransactionId tid) {
                this.heapFile = heapFile;
                this.tid = tid;
            }
    
            @Override
            public void open() throws DbException, TransactionAbortedException {
                // fetch all tuples on the first page
                whichPage = 0;
                iterator = getPageTuple(whichPage);
            }
    
            // fetch all rows of the given page
            private Iterator<Tuple> getPageTuple(int pageNumber) throws TransactionAbortedException, DbException {
                // page number must be within the file
                if(pageNumber >= 0 && pageNumber < heapFile.numPages()){
                    HeapPageId pid = new HeapPageId(heapFile.getId(), pageNumber);
                    // fetch the page from the buffer pool with READ_ONLY permission
                    HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
                    return page.iterator();
                }
                throw new DbException(String.format("heapFile %d does not contain page %d", heapFile.getId(), pageNumber));
            }
    
            @Override
            public boolean hasNext() throws DbException, TransactionAbortedException {
                // the iterator has not been opened
                if(iterator == null){
                    return false;
                }
                // the current page is exhausted
                if(!iterator.hasNext()){
                    // advance while there is a next page in the file
                    while(whichPage < (heapFile.numPages() - 1)){
                        whichPage++;
                        // fetch the next page
                        iterator = getPageTuple(whichPage);
                        if(iterator.hasNext()){
                            return iterator.hasNext();
                        }
                    }
                    // all tuples have been consumed
                    return false;
                }
                return true;
            }
    
            @Override
            public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
                // no tuples left: throw
                if(iterator == null || !iterator.hasNext()){
                    throw new NoSuchElementException();
                }
                // return the next tuple
                return iterator.next();
            }
    
            @Override
            public void rewind() throws DbException, TransactionAbortedException {
                // discard the current iterator
                close();
                // start over
                open();
            }
    
            @Override
            public void close() {
                iterator = null;
            }
        }
    
Full code
package simpledb.storage;

import simpledb.common.Database;
import simpledb.common.DbException;
import simpledb.common.Permissions;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;

import java.io.*;
import java.util.*;

/**
 * HeapFile is an implementation of a DbFile that stores a collection of tuples
 * in no particular order. Tuples are stored on pages, each of which is a fixed
 * size, and the file is simply a collection of those pages. HeapFile works
 * closely with HeapPage. The format of HeapPages is described in the HeapPage
 * constructor.
 * 
 * @see HeapPage#HeapPage
 * @author Sam Madden
 */
public class HeapFile implements DbFile {
    private final File file;
    private final TupleDesc tupleDesc;

    /**
     * Constructs a heap file backed by the specified file.
     * 
     * @param f
     *            the file that stores the on-disk backing store for this heap
     *            file.
     */
    public HeapFile(File f, TupleDesc td) {
        // some code goes here
        this.file = f;
        this.tupleDesc = td;
    }

    /**
     * Returns the File backing this HeapFile on disk.
     * 
     * @return the File backing this HeapFile on disk.
     */
    public File getFile() {
        // some code goes here
        return file;
    }

    /**
     * Returns an ID uniquely identifying this HeapFile. Implementation note:
     * you will need to generate this tableid somewhere to ensure that each
     * HeapFile has a "unique id," and that you always return the same value for
     * a particular HeapFile. We suggest hashing the absolute file name of the
     * file underlying the heapfile, i.e. f.getAbsoluteFile().hashCode().
     * 
     * @return an ID uniquely identifying this HeapFile.
     */
    public int getId() {
        // some code goes here
        // hash the absolute path of the file: a unique id
        return file.getAbsoluteFile().hashCode();
    }

    /**
     * Returns the TupleDesc of the table stored in this DbFile.
     * 
     * @return TupleDesc of this DbFile.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return tupleDesc;
    }

    // see DbFile.java for javadocs
    public Page readPage(PageId pid) {
        // some code goes here
        // table id
        int tableId = pid.getTableId();
        // page number within that table
        int pgNo = pid.getPageNumber();
        // random access: read by seeking to a byte offset
        RandomAccessFile f = null;
        try{
            // open the backing file for reading
            f = new RandomAccessFile(file, "r");
            // check whether (page number + 1) * page size exceeds the file length
            if((pgNo + 1) * BufferPool.getPageSize() > f.length()){
                f.close();
                throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
            }
            // buffer for the page bytes
            byte[] bytes = new byte[BufferPool.getPageSize()];
            // seek to the page's byte offset
            f.seek(pgNo * BufferPool.getPageSize());
            // read the page (returns the number of bytes actually read)
            int read = f.read(bytes, 0, BufferPool.getPageSize());
            // a short read means the page does not exist
            if(read != BufferPool.getPageSize()){
                throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
            }
            return new HeapPage(new HeapPageId(pid.getTableId(), pid.getPageNumber()), bytes);
        }catch (IOException e){
            e.printStackTrace();
        }finally {
            try{
                // close the file
                f.close();
            }catch (IOException e){
                e.printStackTrace();
            }
        }
        throw new IllegalArgumentException(String.format("table %d page %d does not exist", tableId, pgNo));
    }

    // see DbFile.java for javadocs
    public void writePage(Page page) throws IOException {
        // some code goes here
        // not necessary for Exercise1
    }

    /**
     * Returns the number of pages in this HeapFile.
     */
    public int numPages() {
        // some code goes here
        // file length / bytes per page
        int res = (int) Math.floor(file.length() * 1.0 / BufferPool.getPageSize());
        return res;
    }

    // see DbFile.java for javadocs
    public List<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // some code goes here
        return null;
        // not necessary for Exercise1
    }

    // see DbFile.java for javadocs
    public ArrayList<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
            TransactionAbortedException {
        // some code goes here
        return null;
        // not necessary for Exercise1
    }

    // see DbFile.java for javadocs
    public DbFileIterator iterator(TransactionId tid) {
        // some code goes here
        return new HeapFileIterator(this, tid);
    }

    private static final class HeapFileIterator implements DbFileIterator{
        private final HeapFile heapFile;
        private final TransactionId tid;
        // iterator over the tuples of the current page
        private Iterator<Tuple> iterator;
        private int whichPage;

        public HeapFileIterator(HeapFile heapFile, TransactionId tid) {
            this.heapFile = heapFile;
            this.tid = tid;
        }

        @Override
        public void open() throws DbException, TransactionAbortedException {
            // fetch all tuples on the first page
            whichPage = 0;
            iterator = getPageTuple(whichPage);
        }

        // fetch all rows of the given page
        private Iterator<Tuple> getPageTuple(int pageNumber) throws TransactionAbortedException, DbException {
            // page number must be within the file
            if(pageNumber >= 0 && pageNumber < heapFile.numPages()){
                HeapPageId pid = new HeapPageId(heapFile.getId(), pageNumber);
                // fetch the page from the buffer pool with READ_ONLY permission
                HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
                return page.iterator();
            }
            throw new DbException(String.format("heapFile %d does not contain page %d", heapFile.getId(), pageNumber));
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            // the iterator has not been opened
            if(iterator == null){
                return false;
            }
            // the current page is exhausted
            if(!iterator.hasNext()){
                // advance while there is a next page in the file
                while(whichPage < (heapFile.numPages() - 1)){
                    whichPage++;
                    // fetch the next page
                    iterator = getPageTuple(whichPage);
                    if(iterator.hasNext()){
                        return iterator.hasNext();
                    }
                }
                // all tuples have been consumed
                return false;
            }
            return true;
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            // no tuples left: throw
            if(iterator == null || !iterator.hasNext()){
                throw new NoSuchElementException();
            }
            // return the next tuple
            return iterator.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            // discard the current iterator
            close();
            // start over
            open();
        }

        @Override
        public void close() {
            iterator = null;
        }
    }
}

Tests:

HeapFileReadTest

Exercise6

SeqScan (sequential scan)

Fields
  • private final TransactionId tid // transaction id
  • private int tableid // id of the table to scan
  • private String tableAlias // table alias
  • private DbFileIterator iterator // iterator used for the scan
Methods
  • getTableName(): returns the table's name

    Looked up through the Catalog

    public String getTableName() {
        return Database.getCatalog().getTableName(tableid);
    }
    
  • open(): opens the iterator, i.e. obtains it

    • The iterator here is just the HeapFileIterator written earlier, so the real logic lives in that inner class; this method only delegates to it
    public void open() throws DbException, TransactionAbortedException {
        // some code goes here
        // query the catalog -> look up the DbFile by table id -> obtain its iterator
        iterator = Database.getCatalog().getDatabaseFile(tableid).iterator(tid);
        iterator.open();
    }
    
  • getTupleDesc(): fetches the TupleDesc through the Catalog and rewrites field names with the alias prefix

    public TupleDesc getTupleDesc() {
        // some code goes here
        TupleDesc tupleDesc = Database.getCatalog().getTupleDesc(tableid);
        String prefix = tableAlias != null? tableAlias : "null";
        // walk the fields, adding the alias prefix
        int len = tupleDesc.numFields();
        Type[] types = new Type[len];
        String[] fieldNames = new String[len];
        for (int i = 0; i < len; i++) {
            types[i] = tupleDesc.getFieldType(i);
            fieldNames[i] = prefix + "." + tupleDesc.getFieldName(i);
        }
        return new TupleDesc(types, fieldNames);
    }
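
The renaming step can be illustrated on plain strings (standalone sketch; the class and field names are made up):

```java
import java.util.Arrays;

public class AliasPrefixDemo {
    // Mirrors the loop in getTupleDesc(): prefix every field name with "alias."
    static String[] prefixFields(String tableAlias, String[] fieldNames) {
        String prefix = tableAlias != null ? tableAlias : "null";
        String[] out = new String[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            out[i] = prefix + "." + fieldNames[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixFields("t", new String[]{"id", "name"})));
        // [t.id, t.name]
        System.out.println(Arrays.toString(prefixFields(null, new String[]{"id"})));
        // [null.id] -- matches the "null.fieldName" case described in the javadoc
    }
}
```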
    
Full code
package simpledb.execution;

import simpledb.common.Database;
import simpledb.transaction.TransactionAbortedException;
import simpledb.transaction.TransactionId;
import simpledb.common.Type;
import simpledb.common.DbException;
import simpledb.storage.DbFileIterator;
import simpledb.storage.Tuple;
import simpledb.storage.TupleDesc;

import java.util.*;

/**
 * SeqScan is an implementation of a sequential scan access method that reads
 * each tuple of a table in no particular order (e.g., as they are laid out on
 * disk).
 */
public class SeqScan implements OpIterator {

    private static final long serialVersionUID = 1L;
    private final TransactionId tid;
    private int tableid;
    private String tableAlias;
    private DbFileIterator iterator;

    /**
     * Creates a sequential scan over the specified table as a part of the
     * specified transaction.
     *
     * @param tid
     *            The transaction this scan is running as a part of.
     * @param tableid
     *            the table to scan.
     * @param tableAlias
     *            the alias of this table (needed by the parser); the returned
     *            tupleDesc should have fields with name tableAlias.fieldName
     *            (note: this class is not responsible for handling a case where
     *            tableAlias or fieldName are null. It shouldn't crash if they
     *            are, but the resulting name can be null.fieldName,
     *            tableAlias.null, or null.null).
     */
    public SeqScan(TransactionId tid, int tableid, String tableAlias) {
        // some code goes here
        this.tid = tid;
        this.tableid = tableid;
        this.tableAlias = tableAlias;
    }

    /**
     * @return
     *       return the table name of the table the operator scans. This should
     *       be the actual name of the table in the catalog of the database
     * */
    public String getTableName() {
        return Database.getCatalog().getTableName(tableid);
    }

    /**
     * @return Return the alias of the table this operator scans.
     * */
    public String getAlias()
    {
        // some code goes here
        return tableAlias;
    }

    /**
     * Reset the tableid, and tableAlias of this operator.
     * @param tableid
     *            the table to scan.
     * @param tableAlias
     *            the alias of this table (needed by the parser); the returned
     *            tupleDesc should have fields with name tableAlias.fieldName
     *            (note: this class is not responsible for handling a case where
     *            tableAlias or fieldName are null. It shouldn't crash if they
     *            are, but the resulting name can be null.fieldName,
     *            tableAlias.null, or null.null).
     */
    public void reset(int tableid, String tableAlias) {
        // some code goes here
        this.tableid = tableid;
        this.tableAlias = tableAlias;
    }

    public SeqScan(TransactionId tid, int tableId) {
        this(tid, tableId, Database.getCatalog().getTableName(tableId));
    }

    public void open() throws DbException, TransactionAbortedException {
        // some code goes here
        // look up the catalog -> get the DbFile for this table id -> get its iterator
        iterator = Database.getCatalog().getDatabaseFile(tableid).iterator(tid);
        iterator.open();
    }

    /**
     * Returns the TupleDesc with field names from the underlying HeapFile,
     * prefixed with the tableAlias string from the constructor. This prefix
     * becomes useful when joining tables containing a field(s) with the same
     * name.  The alias and name should be separated with a "." character
     * (e.g., "alias.fieldName").
     *
     * @return the TupleDesc with field names from the underlying HeapFile,
     *         prefixed with the tableAlias string from the constructor.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        TupleDesc tupleDesc = Database.getCatalog().getTupleDesc(tableid);
        String prefix = tableAlias != null? tableAlias : "null";
        // iterate over the fields, adding the alias prefix
        int len = tupleDesc.numFields();
        Type[] types = new Type[len];
        String[] fieldNames = new String[len];
        for (int i = 0; i < len; i++) {
            types[i] = tupleDesc.getFieldType(i);
            fieldNames[i] = prefix + "." + tupleDesc.getFieldName(i);
        }
        return new TupleDesc(types, fieldNames);
    }

    public boolean hasNext() throws TransactionAbortedException, DbException {
        // some code goes here
        if(iterator == null){
            return false;
        }
        return iterator.hasNext();
    }

    public Tuple next() throws NoSuchElementException,
            TransactionAbortedException, DbException {
        // some code goes here
        if(iterator == null){
            throw new NoSuchElementException("No Next Tuple");
        }
        Tuple tuple = iterator.next();
        if(tuple == null){
            throw new NoSuchElementException("No Next Tuple");
        }
        return tuple;
    }

    public void close() {
        // some code goes here
        iterator = null;
    }

    public void rewind() throws DbException, NoSuchElementException,
            TransactionAbortedException {
        // some code goes here
        iterator.rewind();
    }
}

Test:

Exercise7

A simple query

Preparation

Under src/java/simpledb, create a test class.

Create a data file test.txt.

Fill it with arbitrary data, for example:

1,1,1,1
2,2,2,2
3,4,4,5

Then convert test.txt into the binary heap-file format

by calling HeapFileEncoder.convert().

HeapFileEncoder
package simpledb.storage;

import simpledb.common.Type;
import simpledb.common.Utility;

import java.io.*;
import java.util.Arrays;
import java.util.List;

/**
 * HeapFileEncoder reads a comma delimited text file or accepts
 * an array of tuples and converts it to
 * pages of binary data in the appropriate format for simpledb heap pages
 * Pages are padded out to a specified length, and written consecutive in a
 * data file.
 */

public class HeapFileEncoder {

  /** Convert the specified tuple list (with only integer fields) into a binary
   * page file. <br>
   *
   * The format of the output file will be as specified in HeapPage and
   * HeapFile.
   *
   * @see HeapPage
   * @see HeapFile
   * @param tuples the tuples - a list of tuples, each represented by a list of integers that are
   *        the field values for that tuple.
   * @param outFile The output file to write data to
   * @param npagebytes The number of bytes per page in the output file
   * @param numFields the number of fields in each input tuple
   * @throws IOException if the temporary/output file can't be opened
   */
  public static void convert(List<List<Integer>> tuples, File outFile, int npagebytes, int numFields) throws IOException {
      // create a temporary text file
      File tempInput = File.createTempFile("tempTable", ".txt");
      tempInput.deleteOnExit();
      // writer for the temp file
      BufferedWriter bw = new BufferedWriter(new FileWriter(tempInput));
      // write each tuple as one comma-separated line
      for (List<Integer> tuple : tuples) {
          int writtenFields = 0;
          for (Integer field : tuple) {
              writtenFields++;
              // throw if this tuple has more fields than numFields
              if (writtenFields > numFields) {
                  throw new RuntimeException("Tuple has more than " + numFields + " fields: (" +
                          Utility.listToString(tuple) + ")");
              }
              // write the current field value
              bw.write(String.valueOf(field));
              // separate fields with the delimiter
              if (writtenFields < numFields) {
                  bw.write(',');
              }
          }
          // end the tuple with a newline
          bw.write('\n');
      }
      bw.close();
      // feed the temp file into the file-based convert below
      convert(tempInput, outFile, npagebytes, numFields);
  }

  public static void convert(File inFile, File outFile, int npagebytes,
                 int numFields) throws IOException {
      // default every field type to INT
      Type[] ts = new Type[numFields];
      Arrays.fill(ts, Type.INT_TYPE);
      convert(inFile, outFile, npagebytes, numFields, ts);
  }

  public static void convert(File inFile, File outFile, int npagebytes,
                 int numFields, Type[] typeAr)
      throws IOException {
      convert(inFile,outFile,npagebytes,numFields,typeAr,',');
  }

   /** Convert the specified input text file into a binary
    * page file. <br>
    * Assume format of the input file is (note that only integer fields are
    * supported):<br>
    * int,...,int\n<br>
    * int,...,int\n<br>
    * ...<br>
    * where each row represents a tuple.<br>
    * <p>
    * The format of the output file will be as specified in HeapPage and
    * HeapFile.
    *
    * @see HeapPage
    * @see HeapFile
    * @param inFile The input file to read data from
    * @param outFile The output file to write data to
    * @param npagebytes The number of bytes per page in the output file
    * @param numFields the number of fields in each input line/output tuple
    * @throws IOException if the input/output file can't be opened or a
    *   malformed input line is encountered
    */
  public static void convert(File inFile, File outFile, int npagebytes,
                 int numFields, Type[] typeAr, char fieldSeparator)
      throws IOException {
      // total bytes per record
      int nrecbytes = 0;
      for (int i = 0; i < numFields ; i++) {
          nrecbytes += typeAr[i].getLen();
      }
      // records per page: each record costs nrecbytes*8 data bits plus 1 header bit
      int nrecords = (npagebytes * 8) /  (nrecbytes * 8 + 1);  //floor comes for free

    //  per record, we need one bit; there are nrecords per page, so we need
    // nrecords bits, i.e., ((nrecords/32)+1) integers.
      // header size in bytes: 8 slot bits per byte, rounded up
    int nheaderbytes = (nrecords / 8);
    if (nheaderbytes * 8 < nrecords)
        nheaderbytes++;  //ceiling
      // header size back in bits
    int nheaderbits = nheaderbytes * 8;

    BufferedReader br = new BufferedReader(new FileReader(inFile));
    FileOutputStream os = new FileOutputStream(outFile);

    // our numbers probably won't be much larger than 1024 digits
    char[] buf = new char[1024];

    // write position within buf
    int curpos = 0;
    // records written to the current page
    int recordcount = 0;
    int npages = 0;
    // index of the field currently being parsed
    int fieldNo = 0;

    ByteArrayOutputStream headerBAOS = new ByteArrayOutputStream(nheaderbytes);
    DataOutputStream headerStream = new DataOutputStream(headerBAOS);
    ByteArrayOutputStream pageBAOS = new ByteArrayOutputStream(npagebytes);
    DataOutputStream pageStream = new DataOutputStream(pageBAOS);

    boolean done = false;
    boolean first = true;
    while (!done) {
        // read the next character
        int c = br.read();

        // Ignore Windows/Notepad special line endings
        if (c == '\r')
            continue;

        // newline: finish the current record
        if (c == '\n') {
            if (first)
                continue;
            // one more record
            recordcount++;
            first = true;
        } else
            first = false;


        // field separator or end of line: flush the field accumulated in buf
        if (c == fieldSeparator || c == '\n' || c == '\r') {
            // the text of the current field
            String s = new String(buf, 0, curpos);
            // dispatch on the field type
            if (typeAr[fieldNo] == Type.INT_TYPE) {
                try {
                    // trim whitespace, parse as int, and write it
                    pageStream.writeInt(Integer.parseInt(s.trim()));
                } catch (NumberFormatException e) {
                    System.out.println ("BAD LINE : " + s);
                }
            }
            else if (typeAr[fieldNo] == Type.STRING_TYPE) {
                s = s.trim();
                // check whether the string exceeds the maximum length
                int overflow = Type.STRING_LEN - s.length();
                // truncate if it is longer than STRING_LEN
                if (overflow < 0) {
                    s = s.substring(0, Type.STRING_LEN);
                }
                // write the string length
                pageStream.writeInt(s.length());
                // write the (possibly truncated) string
                pageStream.writeBytes(s);
                // pad the remainder with zero bytes
                while (overflow-- > 0)
                    pageStream.write((byte)0);
            }
            // reset the field buffer position
            curpos = 0;
            // end of line
            if (c == '\n')
                // reset the field index for the next record
                fieldNo = 0;
            else
                // advance to the next field
                fieldNo++;

            // end of file
        } else if (c == -1) {
            done = true;

        }
        // ordinary character: append it to the field buffer
        else {
            buf[curpos++] = (char)c;
            continue;
        }
        
        // if we wrote a full page of records, or if we're done altogether,
        // write out the header of the page.
        //
        // in the header, write a 1 for bits that correspond to records we've
        // written and 0 for empty slots.
        //
        // when we're done, also flush the page to disk, but only if it has
        // records on it.  however, if this file is empty, do flush an empty
        // page to disk.

        // flush when the page is full, or at EOF with pending records,
        // or at EOF for an entirely empty file (write one empty page)
        if (recordcount >= nrecords
            || done && recordcount > 0
            || done && npages == 0) {
            int i = 0;
            byte headerbyte = 0;
            // set a 1 bit for every slot that holds a record
            for (i=0; i<nheaderbits; i++) {
                // one header bit per written record
                if (i < recordcount)
                    headerbyte |= (1 << (i % 8));

                // flush the completed header byte
                if (((i+1) % 8) == 0) {
                    headerStream.writeByte(headerbyte);
                    // start the next header byte
                    headerbyte = 0;
                }
            }
            // flush any partially filled header byte
            if (i % 8 > 0)
                headerStream.writeByte(headerbyte);

            // pad the rest of the page with zeroes
            for (i=0; i<(npagebytes - (recordcount * nrecbytes + nheaderbytes)); i++)
                pageStream.writeByte(0);
            
            // write header and body to file
            headerStream.flush();
            headerBAOS.writeTo(os);
            pageStream.flush();
            pageBAOS.writeTo(os);
            
            // reset header and body for next page
            headerBAOS = new ByteArrayOutputStream(nheaderbytes);
            headerStream = new DataOutputStream(headerBAOS);
            pageBAOS = new ByteArrayOutputStream(npagebytes);
            pageStream = new DataOutputStream(pageBAOS);

            // reset for the next page
            recordcount = 0;
            npages++;
        }
    }
    br.close();
    os.close();
  }
}
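To make the page-layout arithmetic above concrete, here is a small stand-alone sketch (assuming SimpleDB's default 4096-byte page and the 4-INT schema used in this lab, 4 bytes per INT) that reproduces the nrecords/nheaderbytes computation:

```java
public class PageLayoutDemo {
    public static void main(String[] args) {
        int npagebytes = 4096;  // default BufferPool page size
        int nrecbytes = 4 * 4;  // 4 INT fields, 4 bytes each

        // each record costs nrecbytes*8 data bits plus 1 header bit
        int nrecords = (npagebytes * 8) / (nrecbytes * 8 + 1);

        // header bytes: one slot bit per record, rounded up to a whole byte
        int nheaderbytes = nrecords / 8;
        if (nheaderbytes * 8 < nrecords)
            nheaderbytes++;

        System.out.println("records per page: " + nrecords);  // 254
        System.out.println("header bytes: " + nheaderbytes);  // 32
        // 32 + 254 * 16 = 4096: a completely full page needs no padding
        System.out.println("bytes used: " + (nheaderbytes + nrecords * nrecbytes));
    }
}
```

So for this schema a 4096-byte page holds exactly 254 tuples plus a 32-byte header, which is why a full page happens to need no zero padding at all.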
Code:
package simpledb;

import simpledb.common.Database;
import simpledb.common.Type;
import simpledb.execution.SeqScan;
import simpledb.storage.*;
import simpledb.transaction.TransactionId;

import java.io.File;
import java.io.IOException;

public class test {
    public static void main(String[] args) throws IOException {
        // build the schema
        Type[] types = new Type[]{Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE};
        String[] name = new String[]{"field0", "field1", "field2", "field3"};
        TupleDesc tupleDesc = new TupleDesc(types, name);
        // convert the text file from the preparation step into a binary heap file
        HeapFileEncoder.convert(new File("test.txt"), new File("some_date_file.dat"), BufferPool.getPageSize(), 4, types);
        // create the table file
        HeapFile heapFile = new HeapFile(new File("some_date_file.dat"), tupleDesc);

        // register the table in the catalog under the name "test"
        Database.getCatalog().addTable(heapFile, "test");

        // create a transaction id
        TransactionId transactionId = new TransactionId();
        // sequential scan over the table
        SeqScan scan = new SeqScan(transactionId, heapFile.getId());

        try{
            scan.open();
            while (scan.hasNext()){
                Tuple tuple = scan.next();
                System.out.println(tuple);
            }
            scan.close();
            Database.getBufferPool().transactionComplete(transactionId);
        }catch (Exception e){
            e.printStackTrace();
        }

    }
}

Problems encountered along the way

During preparation, if the last data line has no trailing newline, the last row is lost and some extra bytes end up in the file.

For example, with the input:

1,1,1,1
2,2,2,2
3,4,4,5

the scan outputs only:

1,1,1,1
2,2,2,2
Why the last row is lost

Cause

In the HeapFileEncoder code there is a counter called recordcount that tracks the number of rows written so far:

// newline: finish the current record
if (c == '\n') {
    if (first)
        continue;
    // one more record
    recordcount++;		// recordcount is only updated here
    first = true;
} else
    first = false;

Without a trailing newline, this update never happens for the last row. So when the header slot bits are written, one row is missing:

// set a 1 bit for every slot that holds a record
for (i=0; i<nheaderbits; i++) {
    // one header bit per written record
    if (i < recordcount)   // <-- one row short here
        headerbyte |= (1 << (i % 8));
    // flush the completed header byte
    if (((i+1) % 8) == 0) {
        headerStream.writeByte(headerbyte);
        // start the next header byte
        headerbyte = 0;
    }
}
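The effect is easy to reproduce in isolation. The sketch below (a hypothetical stand-alone demo, not part of the lab code) copies just the counting logic from the read loop and runs it on input with and without a trailing newline:

```java
import java.io.*;

public class NewlineCountDemo {
    // replicates HeapFileEncoder's record counting: a record is only
    // counted when a '\n' is read after at least one ordinary character
    static int countRecords(String data) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader(data));
        int recordcount = 0;
        boolean first = true;
        int c;
        while ((c = br.read()) != -1) {
            if (c == '\n') {
                if (!first) recordcount++;
                first = true;
            } else {
                first = false;
            }
        }
        return recordcount;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countRecords("1,1\n2,2\n3,3\n"));  // 3
        System.out.println(countRecords("1,1\n2,2\n3,3"));    // 2: last row never counted
    }
}
```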
Why the file size is wrong

Looking at the flush condition, a field is only written out when a separator or newline is read:

// field separator or end of line: flush the field accumulated in buf
if (c == fieldSeparator || c == '\n' || c == '\r') {
    // the text of the current field
    String s = new String(buf, 0, curpos);
    // dispatch on the field type

So if the file does not end with a newline, the final field is never written at all.

You can print the output file size to compare a run with a trailing newline against one without:

Output:
    with trailing newline:    4096
    without trailing newline: 4108

Cause

The test data is 3 rows of 4 columns, each value stored as a 4-byte int. When the final newline is missing, the first fields of the last row have already been written to the page stream, but recordcount was never updated, so the padding step cannot account for them:

// pad the rest of the page with zeroes
for (i=0; i<(npagebytes - (recordcount * nrecbytes + nheaderbytes)); i++)
    pageStream.writeByte(0);

npagebytes - (recordcount * nrecbytes + nheaderbytes) is

page size in bytes - (row count × bytes per row + header bytes)

One row was lost, so the encoder believes there are only 2 rows. Meanwhile the first three values of the last row had already been written before its final value was dropped, i.e. 3 × 4 = 12 extra bytes.
So after zero-padding the file should be 4096 bytes, but the 12 extra bytes make it 4096 + 12 = 4108 bytes.
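The arithmetic can be double-checked with a short sketch (values assumed from the 3-row, 4-INT test table and a 4096-byte page):

```java
public class SizeCheckDemo {
    public static void main(String[] args) {
        int npagebytes = 4096;
        int nrecbytes = 4 * 4;                                  // 4 INT fields per row
        int nrecords = (npagebytes * 8) / (nrecbytes * 8 + 1);  // 254 slots
        int nheaderbytes = (nrecords + 7) / 8;                  // 32 header bytes

        // with a trailing newline: 3 rows counted, page padded exactly
        int recordcount = 3;
        int padding = npagebytes - (recordcount * nrecbytes + nheaderbytes);
        System.out.println(nheaderbytes + recordcount * nrecbytes + padding);          // 4096

        // without: only 2 rows counted, but 3 extra ints (12 bytes) were written
        recordcount = 2;
        int extra = 3 * 4;
        padding = npagebytes - (recordcount * nrecbytes + nheaderbytes);
        System.out.println(nheaderbytes + recordcount * nrecbytes + extra + padding);  // 4108
    }
}
```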