Why are Oracle’s Read Events “Named Backwards”?

There are two events that appear prominently in almost any Oracle trace file: db file sequential read and db file scattered read. These events denote the Oracle kernel’s requests for database blocks from disk. 


在oracle跟踪文件中,经常会出现两个事件:db file sequential read 和 db file sctter read。这两个事件表明oracle内核发生了读写磁盘数据的请求。


 When we think about disks, we are accustomed to thinking about sequential accesses, where a process reads long runs of contiguous data from disk. When Oracle executes a full-table scan, reading several Oracle blocks’ worth of data in a single large I/O call, it is a sequential read. And of course when we think about disks, we are also accustomed to thinking about random accesses. A SQL state-ment typically uses random accesses to fulfill the single-block read calls that a well-indexed query will call for. 


当设计到磁盘时,我们通常会想到顺序读(进程从磁盘读取连续数据)和随机读。当执行全表扫描时,oracle会从磁盘中一次读取多个数据块,因此我们认为其实顺序读。当基于索引读取表数据是,oracle会随机访问某个数据块,因此我们认为为随机读。


 However, if you’ve read anything about Oracle’s wait events, you’ve seen db file sequential read described as denoting random reads (e.g., indexed scans), and db file scattered read described as denoting sequential reads (e.g., full-table scans). The terminology that Oracle uses is certainly unintuitive. It has led some people to speculate that the Oracle events were accidentally named backwards. Well, it’s a fun theory, but it’s not true. There is a good reason that the events are named the way they are.


但是,当我们阅读与oracle event相关的资料时,我们发现 db file sequential read事件代表随机读(例如索引读取),db file scattered read 事件表示顺序读(例如全表扫描)。oracle的命名方法与我们的直觉是相违背的,因此有些人认为oracle在这两个事件是的命名是混淆的。但事实并非如此,oracle 之所以采用这样的命名方式是有充分的理由的。


   The event names db file sequential read and db file scattered read describe how blocks are stored in memory, not how they are read from disk. The two Oracle read event names are analogous to two classes of UNIX read calls, repre-sented by the functions read() and readv(). The UNIX read() function reads a contiguous portion of a file and stores it contiguously in a single memory area. The readv() function reads a contiguous portion of a file and stores pieces of that data in different areas of memory dictated by an array of memory references. Oracle diskreads executed through calls like read() are recorded as db file sequential read events, and disk-reads executed through calls like readv() are recorded as db file scattered read events. 


db file sequential read 和db file scattered read事件名称是用来描述数据块是如何在内存中存储的,而不是用来描述数据块是如何在磁盘中读取的。这与unix系统中的两类读调用非常相似,分别由read()函数和readv()函数来表示。在unix中,read函数读取连续的文件内容并将其存放在单一连续的内存空间中,而readv函数读取文件中的连续内容并将其存放在内存的不同片段中。oracle中通过read函数执行的读操作记录为db file sequential read events,而通过readv函数执行的读操作记录为db file scattered read events。




   • Oracle generates db file sequential read events for all single-block reads. Oracle always stores a single block in a single cache buffer, so single-block reads never generate db file scattered read events. 


对于单块读,oracle总是产生db file sequential read 事件。因为oracle总是会将单个数据库存储在连续内存中,因此单块读永远不会产生db file scattered read 事件。


  • Oracle7.3 and onward generates multi-block db file sequential read events when a dedicated server process reads temporary segment data from disk.Older releases of Oracle would read temporary segment data into the database buffer cache using db file scattered reads. Newer releases exploit the heuristic that temporary segment data isn’t likely to be sharable or revisited, so it reads directly into a server process’ program global area (PGA). 


在oracle7.3 及后续版本中,但专用服务进程读取临时段时,oracle会产生 multi-block db file sequential read 事件。在早版本的数据库中,临时段是读入缓存区的,因此会产生 db file scattered read 事件。但是在后续版本中,意识到临时段不会被其他进程共享,因此oracle直接将其读入pga中。


 A db file scattered read event occurs when the memory receiving the contents of a disk read is not guaranteed to be contiguous. The Oracle kernel could have been designed to spend more time finding a group of contiguous free buffers that could be filled with a read(). Doing it this way would have provided a net benefit only if the additional time consumed were offset by a sufficient speed advantage of read() over readv(). The cost of searching for contiguous buffers probably overwhelmed any advantage read() might have over readv(), motivating Oracle’s kernel developers to choose the algorithm they did. 


db file scatter read事件发生在oracle读取数据到内存,但是无法保证数据在内存内是连续存储时。oracle内核可能已经被设计为花费较多的时间来寻找一段连续内存,从而可以保证实施read操作。但是这样做仅仅只会在read相对readv带来的优势要高于寻找内存所花费时间时才会有所收益。花费在寻找连续内存的时间可能要高于read相对readv所带来的优势,这也就促使了开发人员选择了他们所选择的的算法。


  • In most circumstances, full-table scans and fast full-index scans generate one or more db file scattered reads. However, there are times when these scans generate only db file sequential read events. 


在多数情况下,全部扫描和索引快速扫描会产生db file scatter read事件,但是也有可能仅仅产生 db file sequential read 事件。


Example: The following excerpt from an Oracle trace file shows a full-table scan executed using only db file sequential read events. The table is stored contiguously in file 2 between blocks 2 and 20, with its high-water mark set at block 16. The database block size is 8 KB. The reads are all single-block reads because all of the even-numbered blocks were resident in the database buffer cache prior to this query’s execution [Holt (2000)]. 


例如:在下面的oracle跟踪文件片段中,发生了全表扫描,但是仅仅只有db file sequential read 事件发生。这个表在文件2的2号块和20号块之间连续的存储数据,其高水位现为16,。数据块大小为8k。之所以会发生单块读,是因为所有的偶数块在查询发生之前都已经缓存在buffer cache中。


  PARSING IN CURSOR #1 len=31 dep=0 uid=18 oct=3 lid=18 tim=26532087 hv=2854987545 ad='82061e04' 
  select count(1) from test_tab  
  END OF STMT 
  PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=26532087 
  EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=26532087 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=3 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=5 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=7 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=9 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=11 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=13 p3=1 
  WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=15 p3=1 
  FETCH #1:c=1,e=1,p=7,cr=14,cu=3,mis=0,r=1,dep=0,og=4,tim=26532088 

 The following output from the Solaris truss command shows the system calls executed for the query. Lines 23 through 27 show how the data file is opened and referenced as file handle 409. Line 31 shows a call to the pread() system function that is exactly the same as read() except that it does not advance the file pointer like read() does, and it accepts the file offset as its fourth argument. The return value is the number of bytes successfully read. The times() calls provide timing data since the timed_statistics instance parameter is set to true. 


从下面solaris truss命令的输出可以看出在查询执行期间所有的系统调用。在23~27行,可以看出数据文件被打开并作为409handle引用。31行调用的pread函数与read函数功能相近。times函数提供了计时数据,因为timed_statistics实例参数被设置为true。


     *** SUID: ruid/euid/suid = 4076 / 108 / 108  *** 
  read(10, 0x010ACACE, 2064) (sleeping...) 
  read(10, "\097\0\006\0\0\0\0\003 ^".., 2064) = 151 
  times(0xEFFFE360)    = 162380015 
  times(0xEFFFDA80)    = 162380015 
  times(0xEFFFD9C0)    = 162380015 
  times(0xEFFFD9C0)    = 162380015 
  time()      = 948902570 
  brk(0x010E9A60)    = 0 
  brk(0x010EBA60)    = 0 
  brk(0x010EBA60)    = 0 
  brk(0x010EDA60)    = 0 
  times(0xEFFFD9C0)    = 162380015 
  times(0xEFFFD9C0)    = 162380015 
  times(0xEFFFC4A0)    = 162380015 
  times(0xEFFFC4A0)    = 162380015 
  time()      = 948902570 
  times(0xEFFFD7A8)    = 162380015 
  times(0xEFFFD760)    = 162380015 
  times(0xEFFFC5D8)    = 162380015 
  times(0xEFFFC578)    = 162380015 
  times(0xEFFFBE90)    = 162380015 
  open64("/home/jeffh/test/data/tools01.dbf", O_RDWR|O_DSYNC) = 15 
  getrlimit(RLIMIT_NOFILE, 0xEFFFBE98)  = 0 
  fstat64(409, 0xEFFFBE00)   Err#9 EBADF 
  fcntl(15, F_DUP2FD, 0x00000199)  = 409 
  close(15)     = 0 
  fcntl(409, F_SETFD, 0x00000001)  = 0 
  ioctl(409, 0x0403, 0xEFFFBE5C)  Err#25 ENOTTY 
  times(0xEFFFBE48)    = 162380016 
  pread(409, "0602\0\0\080\003\0\0 + B".., 8192, 24576) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC578)    = 162380016 
  pread(409, "0602\0\0\080\005\0\0 + V".., 8192, 40960) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC578)    = 162380016 
  pread(409, "0602\0\0\080\007\0\0 + V".., 8192, 57344) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC578)    = 162380016 
  pread(409, "0602\0\0\080\0\t\0\0 + V".., 8192, 73728) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC578)    = 162380016 
  pread(409, "0602\0\0\080\0\v\0\0 + V".., 8192, 90112) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC578)    = 162380016 
  pread(409, "0602\0\0\080\0\r\0\0 + V".., 8192, 106496) = 8192 
  times(0xEFFFC530)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380016 
  times(0xEFFFC5D8)    = 162380017 
  times(0xEFFFC578)    = 162380017 
  pread(409, "0602\0\0\080\00F\0\0 + V".., 8192, 122880) = 8192 
  times(0xEFFFC530)    = 162380017 
  times(0xEFFFC5D8)    = 162380017 
  times(0xEFFFC5D8)    = 162380017 
  times(0xEFFFDB40)    = 162380017 
  times(0xEFFFE3A8)    = 162380017 
  write(13, "\0B0\0\006\0\0\0\0\010\0".., 176) = 176 
  read(10, "\0 y\0\006\0\0\0\0\003 ^".., 2064) = 121 
  times(0xEFFFE360)    = 162380017 
  times(0xEFFFDA80)    = 162380017 
  times(0xEFFFDB40)    = 162380017 
  times(0xEFFFE3A8)    = 162380017 
  times(0xEFFFE360)    = 162380017 
  lseek(14, 37376, SEEK_SET)   = 37376 
  read(14, "\0\v04E2\0\0\0 J04E3\0\0".., 512) = 512 
  times(0xEFFFE3A8)    = 162380017 
  write(13, "\0 U\0\006\0\0\0\0\004\0".., 85) = 85 
  read(10, 0x010ACACE, 2064) (sleeping...) 

 The following table describes circumstances that generate Oracle’s db file sequential read and db file scattered read events. 


下图是对产生db file sequential read 和 db file scattered read 事件情景的总结。








  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值