File Management
- File operations provided by the operating system:
fd = open(fileName, mode)
// open a named file for reading/writing/appending
close(fd)
// close an open file, via its descriptor
nread = read(fd, buf, nbytes)
// attempt to read data from file into buffer
nwritten = write(fd, buf, nbytes)
// attempt to write data from buffer to file
lseek(fd, offset, seek_type)
// move file pointer to relative/absolute file offset
fsync(fd)
// flush contents of file buffers to disk
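- Different DBMSs arrange their data on disk in different ways:
(1) Bypass the file system and use a raw disk partition (older Oracle)
(2) One very large file containing all DB data (e.g. SQLite)
(3) Several large files, with tables spread across them (current Oracle)
(4) Multiple data files, one per table (PostgreSQL)
(5) Multiple files per table (including one main file)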
…
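The system calls listed at the top of this section can be combined into a small round trip: write a buffer, flush it, seek back, and read it again. This is a minimal sketch assuming a POSIX system; `roundtrip` and its parameters are illustrative names, not part of any DBMS.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes to `path`, flush them, seek back to the start,
 * and read them back into `out`.
 * Returns the number of bytes read, or -1 on error.
 * roundtrip() is a hypothetical helper used only for illustration. */
ssize_t roundtrip(const char *path, const char *data, size_t len, char *out)
{
    assert(path != NULL && data != NULL && out != NULL);

    /* open (create/truncate) a named file for reading and writing */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;

    /* attempt to write data from buffer to file */
    ssize_t nwritten = write(fd, data, len);
    if (nwritten != (ssize_t)len) { close(fd); return -1; }

    /* flush contents of file buffers to disk */
    if (fsync(fd) < 0) { close(fd); return -1; }

    /* move file pointer back to absolute offset 0 */
    if (lseek(fd, 0, SEEK_SET) < 0) { close(fd); return -1; }

    /* attempt to read data from file into buffer */
    ssize_t nread = read(fd, out, len);

    close(fd);
    return nread;
}
```

Calling `roundtrip("/tmp/demo.bin", "hello", 5, out)` should leave the five bytes in `out`. A real storage manager would also loop on short reads and writes, which this sketch treats as errors.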
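Single-file Storage Manager (single-file DBMS layer)
Note: as above, if the Employee data grows large, either enlarge the Employee Data Pages region, or append the remaining Employee data after the Project Data Pages (and add a link to it).
SpaceMap: for each chunk, its offset, how much space it uses, and its status, e.g. [(0,10,U), …]
NameMap: for a given table name, where its data lies in the database file and how much space it uses, e.g. [("employee",20,350), …]
Each file segment (chunk) consists of a fixed number of blocks.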
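One possible in-memory shape for the two maps is sketched below. The notes only give the tuple forms, so the field names, the use of pages as the unit, the 'F' status code for free chunks, and the helper functions `findTable` and `findFreeChunk` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One SpaceMap entry: (offset, length, status), e.g. (0, 10, 'U'). */
typedef struct {
    int  offset;   /* where the chunk starts (assumed unit: pages) */
    int  length;   /* how much space the chunk covers */
    char status;   /* 'U' = used; 'F' = free is an assumed encoding */
} SpaceEntry;

/* One NameMap entry: (name, start, size), e.g. ("employee", 20, 350). */
typedef struct {
    const char *name;  /* table name */
    int start;         /* where the table's data starts */
    int size;          /* how much space it uses */
} NameEntry;

/* Linear search of the name map; returns NULL if the table is unknown. */
const NameEntry *findTable(const NameEntry *names, size_t n, const char *name)
{
    assert(names != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i].name, name) == 0) return &names[i];
    }
    return NULL;
}

/* First-fit search for a free chunk of at least `needed` units.
 * Returns the chunk offset, or -1 if no free chunk is large enough. */
int findFreeChunk(const SpaceEntry *map, size_t n, int needed)
{
    assert(map != NULL);
    for (size_t i = 0; i < n; i++) {
        if (map[i].status == 'F' && map[i].length >= needed) return map[i].offset;
    }
    return -1;
}
```

Linear search keeps the sketch short; a real catalog would more likely use a hash table or a sorted structure for the name lookup.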
Data definitions
#define PAGESIZE 4096 //bytes per page
typedef long PageId; //PageId is block index
// pageOffset = PageId*PAGESIZE
typedef char *Page; // pointer to page/block buffer
Typical PAGESIZE values: 1024, 2048, 4096, 8192
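The relation pageOffset = PageId*PAGESIZE is enough to read or write any page with `lseek` followed by `read` or `write`. The sketch below assumes a POSIX system and repeats the definitions above so that it compiles on its own; `pageOffset`, `getPage` and `putPage` are illustrative names rather than a fixed API.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGESIZE 4096      /* bytes per page */
typedef long PageId;       /* PageId is a block index */
typedef char *Page;        /* pointer to page/block buffer */

/* Byte offset of page `pid` within the database file. */
off_t pageOffset(PageId pid)
{
    assert(pid >= 0);
    return (off_t)pid * PAGESIZE;
}

/* Read page `pid` from `fd` into `buf`; returns 0 on success, -1 on failure.
 * A short read (fewer than PAGESIZE bytes) is treated as a failure. */
int getPage(int fd, PageId pid, Page buf)
{
    if (lseek(fd, pageOffset(pid), SEEK_SET) < 0) return -1;
    return read(fd, buf, PAGESIZE) == PAGESIZE ? 0 : -1;
}

/* Write `buf` to page `pid` of `fd`; returns 0 on success, -1 on failure. */
int putPage(int fd, PageId pid, Page buf)
{
    if (lseek(fd, pageOffset(pid), SEEK_SET) < 0) return -1;
    return write(fd, buf, PAGESIZE) == PAGESIZE ? 0 : -1;
}
```

With PAGESIZE at 4096, page 3 starts at byte 12288. Casting to `off_t` before multiplying avoids overflow on platforms where `long` is narrower than the file offset type.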
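Storage Manager data structures for the DB:
typedef struct DBrec {
char *dbname; // copy of database name
int fd; // file descriptor of the database file
SpaceMap map; // map of chunk usage
NameMap names; // map from table names to their location
} *DB;
// DB: handle for an open database
typedef struct Relrec {
char *relname; // copy of table name
int start; // page index of start of table data
int npages; // number of pages of table data
...
} *Rel;
// Rel: handle for an open table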
E.g. scanning a table
DB db = openDatabase("myDB");
Rel r = openRelation(db,"Employee");
// buffer holding the current page
Page buffer = malloc(PAGESIZE*sizeof(char));
// assumes the table's pages are stored contiguously
for (int i = 0; i < r->npages; i++) {
// compute the PageId from the table's start offset
PageId pid = r->