COMP9315-week2-lecture1

COMP9315 19T2 Week 2 Lecture 1_哔哩哔哩_bilibili

C:\python\COMP9315-master\19T2\Lectures\weel02.pdf

COMP9315 24T1 - Course Notes (unsw.edu.au)

前面三分之一时间讲week1的练习题,是一个存储过程

COMP9315-master\19T2\Lecture Exercises\week01\ex05\schema.sql 这是一个存储过程

Week 02 Lectures

Storage Manager
Storage Management 2/68
Levels of DBMS related to storage management

... Storage Management 3/68
Aims of storage management in DBMS:

  • provide view of data as collection of pages/tuples
  • map from database objects (e.g. tables) to disk files
  • manage transfer of data to/from disk storage
  • use buffers to minimise disk/memory transfers
  • interpret loaded data as tuples/records
  • basis for file structures used by access methods

Views of Data in Query Evaluation

... Views of Data in Query Evaluation 5/68
Representing database objects during query execution:

  • DB (handle on an authorised/opened database)
  • Rel (handle on an opened relation)
  • Page (memory buffer to hold contents of disk block)
  • Tuple (memory holding data values from one tuple)

Addressing in DBMSs:

  • PageID = FileID+Offset ... identifies a block of data
    • where Offset gives location of block within file
  • TupleID = PageID+Index ... identifies a single tuple
    • where Index gives location of tuple within page

Storage Management 6/68
Topics in storage management ...

  • Disks and Files
    • performance issues and organisation of disk files
  • Buffer Management
    • using caching to improve DBMS system throughput
  • Tuple/Page Management
    • how tuples are represented within disk pages
  • DB Object Management (Catalog)
    • how tables/views/functions/types, etc. are represented

Storage Technology
Storage Technology 8/68
Persistent storage is

  • large, cheap, relatively slow, accessed in blocks
  • used for long-term storage of data

Computational storage is

  • small, expensive, fast, accessed by byte/word
  • used for all analysis of data

Access cost HDD:RAM ≅ 100000:1, e.g.

  • 10ms to read block containing two tuples
  • 1μs to compare fields in two tuples

... Storage Technology 9/68
Hard disks are well-established, cheap, high-volume, ...
Alternative bulk storage: SSD

  • faster than HDDs, no latency
  • can read single items
  • update requires block erase then write
  • over time, writes "wear out" blocks
  • require controllers that spread write load

Feasible for long-term, high-update environments?

... Storage Technology 10/68
Comparison of HDD and SSD properties:
                  HDD          SDD
Cost/byte ~ 4c / GB ~ 13c / GB
Read latency ~ 10ms ~ 50μs
Write latency ~ 10ms ~ 900μs
Read unit block (e.g. 1KB) byte
Writing write a block write on empty block
Will SSDs ever replace HDDs?

Cost Models 11/68
Throughout this course, we compare costs of DB operations
Important aspects in determining cost:

  • data is always transferred to/from disk as whole blocks (pages)
  • cost of manipulating tuples in memory is negligible
  • overall cost determined primarily by #data-blocks read/written

Complicating factors in determining costs:

  • not all page accesses require disk access (buffer pool)
  • tuples typically have variable size (tuples/page ?)

More details later ...

File Management 12/68
Aims of file management subsystem:

  • organise layout of data within the filesystem
  • handle mapping from database ID to file address
  • transfer blocks of data between buffer pool and filesystem
  • also attempts to handle file access error problems (retry)

Builds higher-level operations on top of OS file operations

... File Management 13/68
Typical file operations provided by the operating system:
fd = open(fileName,mode)
// open a named file for reading/writing/appending
close(fd)
// close an open file, via its descriptor
nread = read(fd, buf, nbytes)
// attempt to read data from file into buffer
nwritten = write(fd, buf, nbytes)
// attempt to write data from buffer to file
lseek(fd, offset, seek_type)
// move file pointer to relative/absolute file offset
fsync(fd)
// flush contents of file buffers to disk

DBMS File Organisation 14/68
How is data for DB objects arranged in the file system?
Different DBMSs make different choices, e.g.

  • by-pass the file system and use a raw disk partition
  • have a single very large file containing all DB data
  • have several large files, with tables spread across them
  • have multiple data files, one for each table
  • have multiple files for each table

etc.

Single-file DBMS 15/68
Consider a single file for the entire database (e.g. SQLite)
Objects are allocated to regions (segments) of the file.

If an object grows too large for allocated segment, allocate an extension.
What happens to allocated space when objects are removed?

... Single-file DBMS 16/68
Allocating space in Unix files is easy:

  • simply seek to the place you want and write the data
  • if nothing there already, data is appended to the file
  • if something there already, it gets overwritten

If the seek goes way beyond the end of the file:

  • Unix does not (yet) allocate disk space for the "hole"
  • allocates disk storage only when data is written there

With the above, a disk/file manager is easy to implement

Single-file Storage Manager 17/68
Consider the following simple single-file DBMS layout:

E.g.
SpaceMap = [ (0,10,U), (10,10,U), (20,600,U), (620,100,U), (720,20,F) ]
TableMap = [ ("employee",20,500), ("project",620,40) ]

... Single-file Storage Manager 18/68
Each file segment consists of a number fixed-size blocks
The following data/constant definitions are useful
#define PAGESIZE 2048 // bytes per page
typedef long PageId; // PageId is block index
                             // pageOffset=PageId*PAGESIZE
typedef char *Page; // pointer to page/block buffer
Typical PAGESIZE values: 1024, 2048, 4096, 8192

... Single-file Storage Manager 19/68
Storage Manager data structures for opened DBs & Tables
typedef struct DBrec {
  char *dbname; // copy of database name
  int fd; // the database file
  SpaceMap map; // map of free/used areas
  NameTable names; // map names to areas + sizes
} *DB;
typedef struct Relrec {
  char *relname; // copy of table name
  int start; // page index of start of table data
  int npages; // number of pages of table data
...
} *Rel;

Example: Scanning a Relation

With the above disk manager, our example:
select name from Employee
might be implemented as something like
DB db = openDatabase("myDB");
Rel r = openRelation(db,"Employee");
Page buffer = malloc(PAGESIZE*sizeof(char));
for (int i = 0; i < r->npages; i++) {
  PageId pid = r->start+i;
  get_page(db, pid, buffer);
  for each tuple in buffer {
    get tuple data and extract name
    add (name) to result tuples
  }
}

Single-File Storage Manager 21/68
// start using DB, buffer meta-data
DB openDatabase(char *name) {
DB db = new(struct DBrec);
db->dbname = strdup(name);
db->fd = open(name,O_RDWR);
db->map = readSpaceTable(db->fd);
db->names = readNameTable(db->fd);
return db;
}
// stop using DB and update all meta-data
void closeDatabase(DB db) {
writeSpaceTable(db->fd,db->map);
writeNameTable(db->fd,db->map);
fsync(db->fd);
close(db->fd);
free(db->dbname);
free(db);
}
... Single-File Storage Manager 22/68
// set up struct describing relation
Rel openRelation(DB db, char *rname) {
Rel r = new(struct Relrec);
r->relname = strdup(rname);
// get relation data from map tables
r->start = ...;
r->npages = ...;
return r;
}
// stop using a relation
void closeRelation(Rel r) {
free(r->relname);
free(r);
}
... Single-File Storage Manager 23/68
// assume that Page = byte[PageSize]
// assume that PageId = block number in file
// read page from file into memory buffer
void get_page(DB db, PageId p, Page buf) {

lseek(db->fd, p*PAGESIZE, SEEK_SET);
read(db->fd, buf, PAGESIZE);
}
// write page from memory buffer to file
void put_page(Db db, PageId p, Page buf) {
lseek(db->fd, p*PAGESIZE, SEEK_SET);
write(db->fd, buf, PAGESIZE);
}
... Single-File Storage Manager 24/68
Managing contents of space mapping table can be complex:
// assume an array of (offset,length,status) records
// allocate n new pages
PageId allocate_pages(int n) {
if (no existing free chunks are large enough) {
int endfile = lseek(db->fd, 0, SEEK_END);
addNewEntry(db->map, endfile, n);
} else {
grab "worst fit" chunk
split off unused section as new chunk
}
// note that file itself is not changed
}
... Single-File Storage Manager 25/68
Similar complexity for freeing chunks
// drop n pages starting from p
void deallocate_pages(PageId p, int n) {
if (no adjacent free chunks) {
markUnused(db->map, p, n);
} else {
merge adjacent free chunks
compress mapping table
}
// note that file itself is not changed
}
Changes take effect when closeDatabase() executed.

Exercise 1: Relation Scan Cost 26/68
Consider a table R(x,y,z) with 105 tuples, implemented as
number of tuples r = 10,000
average size of tuples R = 200 bytes
size of data pages B = 4096 bytes
time to read one data page Tr = 10msec
time to check one tuple 1 usec
time to form one result tuple 1 usec
time to write one result page Tr = 10msec
Calculate the total time-cost for answering the query:
insert into S select * from R where x > 10;
if 50% of the tuples satisfy the condition.

-------------------------------------------------------

DBMS Parameters 27/68
Our view of relations in DBMSs:

  • a relation is a set of r tuples, with average size R bytes
  • the tuples are stored in b data pages on disk
  • each page has size B bytes and contains up to c tuples
  • data is transferred disk􃲗memory in whole pages
  • cost of disk􃲗memory transfer Tr , Tw dominates other costs

... DBMS Parameters 28/68
Typical DBMS/table parameter values:
Quantity Symbol E.g. Value
total # tuples r 106
record size R 128 bytes
total # pages b 105
page size B 8192 bytes
# tuples per page c 60
page read/write time Tr ,Tw 10 msec
cost to process
one page in memory
- ≅ 0

Multiple-file Disk Manager 29/68
Most DBMSs don't use a single large file for all data.
They typically provide:

  • multiple files partitioned physically or logically
  • mapping from DB-level objects to files (e.g. via meta-data)

Precise file structure varies between individual DBMSs.
Using multiple files (one file per relation) can be easier, e.g.

  • adding a new relation
  • extending the size of a relation
  • computing page offsets within a relation

... Multiple-file Disk Manager 30/68
Example of single-file vs multiple-file:

... Multiple-file Disk Manager 31/68
Structure of PageId for data pages in such systems ...
If system uses one file per table, PageId contains:
relation indentifier (which can be mapped to filename)
page number (to identify page within the file)
If system uses several files per table, PageId contains:
relation identifier
file identifier (combined with relid, gives filename)
page number (to identify page within the file)

  • 21
    点赞
  • 14
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
### 回答1: COBOL中,COMP和COMP-3都是数据类型,但它们有一些区别。 COMP是一种二进制数据类型,它占用一个字(4个字节)的存储空间。它可以存储整数、小数和浮点数等数据类型。但是,它不能存储带有小数点的数值。 COMP-3是一种压缩的二进制数据类型,它占用不定长的存储空间。它可以存储整数、小数和浮点数等数据类型,并且可以存储带有小数点的数值。但是,它的存储方式比较特殊,需要进行压缩和解压缩操作,所以在处理时需要特别注意。 因此,如果需要存储带有小数点的数值,建议使用COMP-3类型。如果只需要存储整数、小数和浮点数等数据类型,可以使用COMP类型。 ### 回答2: COBOL(通用商务语言,英文全称:Common Business-Oriented Language)是一种广泛用于企业应用的编程语言。在COBOL中,主要有两种数据类型:二进制和十进制。 在二进制数据类型中,COBOL中有两个最主要的数据类型── COMP和COMP-3。这两个数据类型的区别可以从编程语言的使用和存储方面来阐述。 首先,COMP(有时称为COMP-1)是一种二进制数据类型,它存储的是32位或64位的二进制数。它最初是用来存储小数的数值,但它还可以用于存储整数。在运算时,二进制数被转换为十进制数,在计算结束时再转换回二进制数。这种数据类型通常用于存储非常大的数字,因为二进制位数越多,数值就越大。如果您的应用程序需要处理非常大的数字,那么COMP类型是非常有用的。 其次,COMP-3(也称为Packed-Decimal)是一种可压缩的二进制数据类型,它可以存储数字和数值。它被广泛用于金融应用程序中,因为它可以存储小数点位置并且可以通过使用压缩算法来减少存储空间。该算法通过将数字加上13(十进制),然后将结果除以10,并将余数加入表示数字长度的字段中,来将数字压缩成不同的字节数。例如,一个俩位的数字在COMP-3中只需要1个字节,而一个四位数字需要2个字节,并且这个数字只能使用10进制表示。 总之,COBOL中的COMP和COMP-3数据类型的区别在于它们存储数据的方式,以及数据类型的存储大小。COMP主要用于存储非常大的数字,而COMP-3则用于存储金融应用程序中的数字和数值,通过压缩算法来节约存储空间。需要根据具体的应用场景来选择使用哪种数据类型。 ### 回答3: COBOL是一种面向商业应用的高级编程语言。在使用COBOL时,我们常会听到两个概念:comp和comp-3。这两者有什么区别?下面就来一一讲解: 1. comp(二进制) comp是COBOL中的一种数据类型,它表示二进制数。comp所占据的存储空间是由所存储数值的位数决定的,例如:pic S9(9) comp 表示一个9位带符号的二进制数。 使用comp类型变量的好处是它占用的存储空间少,节省了系统资源。它也更适合于整数计算,被广泛用于所有的计算机中(从小到大,64位计算机除外)。 2. comp-3(压缩型二进制) COBOL语言中,comp-3也是一种数据类型,它表示一种压缩型二进制数。它将数字压缩成BCD码表现,占据的存储空间比comp要少得多。 同时,它也更适用于金额计算或者其他金融计算。尤其是在大型银行和金融机构中,comp-3被广泛使用。 总结: - comp是一个二进制数据类型,它占用比较小的存储空间,适用于整数计算。 - comp-3是一种压缩型二进制数据类型,它将数字压缩成BCD码表现,占据的存储空间比comp要少得多,适用于金额等金融计算。 根据实际需求来选择不同的存储方式是非常重要的,能够提高应用程序的执行效率和节省系统资源。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值