[20130705] The gettimeofday() system call.txt
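Link: http://space.itpub.net/267265/viewspace-755535
Today I wanted to look into gettimeofday and came across this article:
http://www.scaleabilities.co.uk/2012/12/18/who-stole-gettimeofday-from-oracle-straces/
I ran the tests myself on a machine with CentOS 6.2 installed.
According to the article, gettimeofday calls may not show up when tracing with strace (RHEL 5.5 and later).
Excerpt: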
System Call Interface
Historically, the method for making a system call was via a software interrupt. This interrupt signalled the kernel to
switch into the kernel context and to process the system call on behalf of the user process. There is lots of detail
surrounding this such as how parameters are passed, security checks (because the kernel operates with full access to
hardware, including all system memory), invalidating TLB entries, and so forth, but we won't go into that here. The
important part is that this interface became increasingly unscalable, particularly across multiple SMP CPUs. Intel
addressed the scalability issue in the Pentium II generation through the creation of a Fast System Call interface using
the SYSENTER and SYSEXIT instructions instead of interrupts. As this method did not exist in previous processor
generations, the Linux kernel needed a way to determine which method of invoking system calls was optimal for all
processors that were/are supported by the Operating System. This implementation needed to insulate the user code against
any kind of porting between different processor types, and the result was the implementation of the Virtual Dynamic
Shared Object library. This is a special library that the kernel maps into the address space of any user process and
implements the code for the Fast System Call interface. Using ldd we can see that it is not actually represented by any
underlying file, just a memory address:
$ ldd /u01/app/oracle/product/10.2.0/db_1/bin/oracle
linux-vdso.so.1 => (0x00007fff86dfc000)
libskgxp10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so (0x00007fb32b12e000)
libhasgen10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so (0x00007fb32af37000)
libskgxn2.so => /u01/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so (0x00007fb32ae35000)
libocr10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libocr10.so (0x00007fb32acc8000)
libocrb10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libocrb10.so (0x00007fb32ab86000)
libocrutl10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so (0x00007fb32aa11000)
libjox10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libjox10.so (0x00007fb329f3f000)
libclsra10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libclsra10.so (0x00007fb329e35000)
libdbcfg10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so (0x00007fb329d17000)
libnnz10.so => /u01/app/oracle/product/10.2.0/db_1/lib/libnnz10.so (0x00007fb329877000)
libaio.so.1 => /lib64/libaio.so.1 (0x0000003f5d200000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003f5da00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003f5e200000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003f5de00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003f6e600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003f5d600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f5ce00000)
The linux-vdso.so.1 entry at the top there is the VDSO library, which used to be referred to as linux-gate in previous
versions. This article has some further information on how this is all structured, including disassembly of the actual
library showing the call to SYSENTER.
Anyway, that's one of the purposes of the VDSO library, but it is more than that in concept; it is an abstraction of
what a system call actually is. With this abstraction in place it becomes possible to perform other magic tricks under
the hood without any change to the user code. The particular trick that we are interested in for this blog entry is the
speedup of the gettimeofday(2) call.
Vsyscall64
As documented here, the VDSO also contains some mapping directly to kernel code. Specifically, it allows the user
process some shortcuts to the system call interface by directly executing some/all of the code in user mode rather than
kernel mode. This is controlled via a kernel flag which can be viewed and modified via the /proc/sys/kernel/vsyscall64
special file. The possible values are as follows:
0 - Provides the most accurate time intervals at μs (microsecond) resolution, but also produces the highest call
overhead, as it uses a regular system call
1 - Slightly less accurate, although still at μs resolution, with a lower call overhead
2 - The least accurate, with time intervals at the ms (millisecond) level, but offers the lowest call overhead
As of RHEL 5.5, the default value has become "1", which by implication means that it does not "use a
regular system call" as it did when the default was "0". I have not fully investigated exactly what is executed in the
various modes from a kernel code perspective, but I can make a fair guess from the descriptions:
0 – Regular system call, via Fast System Call interface if available
1 – Mostly user mode, probably just reading from a page of kernel-updated memory which is mapped into the user address
space. Some kernel code must be necessary to ensure the value in memory reflects the current timestamp
2 – Just user mode. Read the mapped memory page and rely on the kernel to update the value every millisecond.
Like I said, that's just an educated guess. If anyone's looked into this, please do comment and I'll update
accordingly.
Let's run a few tests. To check this out I wrote a small C program to essentially just call gettimeofday(2) over and
over again:
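#include <stdio.h>
#include <sys/time.h>

int
main(void) {
    int i;
    struct timeval tv;

    /* Call gettimeofday(2) one million times, reporting any failure. */
    for (i = 1000000; i > 0; i--)
        if (gettimeofday(&tv, 0))
            perror("gtod");
    return 0;
}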
-- I ran the test in my own environment: OS = CentOS 6.2, kernel version:
# uname -a
Linux xxxx 2.6.32-220.17.1.el6.x86_64 #1 SMP Wed May 16 00:01:37 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
# echo 0 >| /proc/sys/kernel/vsyscall64
# time strace -c ./gtod
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.84 0.095268 0 1000000 gettimeofday
0.16 0.000152 19 8 mmap
0.00 0.000000 0 1 read
0.00 0.000000 0 2 open
0.00 0.000000 0 2 close
0.00 0.000000 0 2 fstat
0.00 0.000000 0 3 mprotect
0.00 0.000000 0 1 munmap
0.00 0.000000 0 1 brk
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.095420 1000023 1 total
real 0m33.072s
user 0m4.665s
sys 0m32.098s
# echo 1 >| /proc/sys/kernel/vsyscall64
# time strace -c ./gtod
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
-nan 0.000000 0 1 read
-nan 0.000000 0 2 open
-nan 0.000000 0 2 close
-nan 0.000000 0 2 fstat
-nan 0.000000 0 8 mmap
-nan 0.000000 0 3 mprotect
-nan 0.000000 0 1 munmap
-nan 0.000000 0 1 brk
-nan 0.000000 0 1 1 access
-nan 0.000000 0 1 execve
-nan 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 23 1 total
real 0m0.065s
user 0m0.062s
sys 0m0.003s
# echo 2 >| /proc/sys/kernel/vsyscall64
# cat /proc/sys/kernel/vsyscall64
2
# time strace -c ./gtod
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
-nan 0.000000 0 1 read
-nan 0.000000 0 2 open
-nan 0.000000 0 2 close
-nan 0.000000 0 2 fstat
-nan 0.000000 0 8 mmap
-nan 0.000000 0 3 mprotect
-nan 0.000000 0 1 munmap
-nan 0.000000 0 1 brk
-nan 0.000000 0 1 1 access
-nan 0.000000 0 1 execve
-nan 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 23 1 total
real 0m0.066s
user 0m0.064s
sys 0m0.001s
-- Repeating the test from the linked article:
SQL> @ver
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
create table t1 ( c1 number ,c2 clob);
insert into t1 select rownum, lpad('a',32,'a') from dual connect by level<=10000;
commit ;
exec dbms_stats.gather_table_stats(user, 't1',method_opt=>'for all columns size 1');
SQL> select spid from v$process where addr in (select paddr from v$session where sid in (select sid from v$mystat where rownum=1));
SPID
------------
15664
$ cat /tmp/a.sql
set timing on
set autot traceonly;
select * from t1;
quit
-- Test results for the three settings:
# echo 0 >| /proc/sys/kernel/vsyscall64
$ time strace -f -c sqlplus system/xxxx @/tmp/a.sql
...
Elapsed: 00:00:22.18
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
65.63 0.106459 2 60292 read
16.42 0.026644 0 251228 gettimeofday
7.22 0.011710 0 111292 getrusage
7.12 0.011548 0 60117 write
3.54 0.005747 0 60220 times
0.02 0.000031 1 34 munmap
0.01 0.000021 0 223 113 open
0.01 0.000020 0 159 mmap
0.01 0.000019 0 53 rt_sigaction
0.01 0.000014 0 125 close
0.01 0.000010 0 96 lseek
0.00 0.000000 0 49 36 stat
0.00 0.000000 0 55 fstat
0.00 0.000000 0 65 mprotect
0.00 0.000000 0 16 brk
0.00 0.000000 0 18 rt_sigprocmask
0.00 0.000000 0 3 ioctl
0.00 0.000000 0 21 14 access
0.00 0.000000 0 2 pipe
0.00 0.000000 0 16 14 shmget
0.00 0.000000 0 2 shmat
0.00 0.000000 0 1 shmctl
0.00 0.000000 0 2 dup
0.00 0.000000 0 9 socket
0.00 0.000000 0 8 8 connect
0.00 0.000000 0 1 bind
0.00 0.000000 0 1 getsockname
0.00 0.000000 0 1 1 getpeername
0.00 0.000000 0 2 getsockopt
0.00 0.000000 0 1 clone
0.00 0.000000 0 2 execve
0.00 0.000000 0 7 uname
0.00 0.000000 0 2 semctl
0.00 0.000000 0 2 shmdt
0.00 0.000000 0 29 fcntl
0.00 0.000000 0 6 getdents
0.00 0.000000 0 5 getcwd
0.00 0.000000 0 1 chdir
0.00 0.000000 0 3 readlink
0.00 0.000000 0 2 umask
0.00 0.000000 0 9 getrlimit
0.00 0.000000 0 7 getuid
0.00 0.000000 0 2 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 3 getppid
0.00 0.000000 0 2 1 setsid
0.00 0.000000 0 1 sigaltstack
0.00 0.000000 0 3 statfs
0.00 0.000000 0 2 arch_prctl
0.00 0.000000 0 3 setrlimit
0.00 0.000000 0 8 2 futex
0.00 0.000000 0 1 sched_getaffinity
0.00 0.000000 0 2 io_setup
0.00 0.000000 0 1 io_destroy
0.00 0.000000 0 2 set_tid_address
0.00 0.000000 0 1 semtimedop
0.00 0.000000 0 2 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.162223 544221 189 total
real 0m22.412s
user 0m3.541s
sys 0m12.842s
# echo 1 >| /proc/sys/kernel/vsyscall64
$ time strace -f -c sqlplus system/xxxx @/tmp/a.sql
....
Elapsed: 00:00:10.66
.....
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
66.44 0.044495 1 60292 read
14.52 0.009727 0 116045 getrusage
12.88 0.008626 0 60117 write
5.59 0.003745 0 60655 times
0.21 0.000138 4 34 munmap
0.10 0.000066 33 2 pipe
0.10 0.000065 33 2 set_tid_address
0.06 0.000040 0 225 113 open
0.04 0.000025 1 21 14 access
0.03 0.000018 0 96 lseek
0.02 0.000015 0 65 mprotect
0.02 0.000011 0 170 mmap
0.00 0.000000 0 127 close
0.00 0.000000 0 49 36 stat
0.00 0.000000 0 55 fstat
0.00 0.000000 0 16 brk
0.00 0.000000 0 53 rt_sigaction
0.00 0.000000 0 18 rt_sigprocmask
0.00 0.000000 0 3 ioctl
0.00 0.000000 0 6 pread
0.00 0.000000 0 16 14 shmget
0.00 0.000000 0 2 shmat
0.00 0.000000 0 1 shmctl
0.00 0.000000 0 2 dup
0.00 0.000000 0 9 socket
0.00 0.000000 0 8 8 connect
0.00 0.000000 0 1 bind
0.00 0.000000 0 1 getsockname
0.00 0.000000 0 1 1 getpeername
0.00 0.000000 0 2 getsockopt
0.00 0.000000 0 1 clone
0.00 0.000000 0 2 execve
0.00 0.000000 0 7 uname
0.00 0.000000 0 5 semctl
0.00 0.000000 0 2 shmdt
0.00 0.000000 0 31 fcntl
0.00 0.000000 0 6 getdents
0.00 0.000000 0 5 getcwd
0.00 0.000000 0 1 chdir
0.00 0.000000 0 3 readlink
0.00 0.000000 0 2 umask
0.00 0.000000 0 9 getrlimit
0.00 0.000000 0 7 getuid
0.00 0.000000 0 2 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 3 getppid
0.00 0.000000 0 2 1 setsid
0.00 0.000000 0 1 sigaltstack
0.00 0.000000 0 5 statfs
0.00 0.000000 0 2 arch_prctl
0.00 0.000000 0 3 setrlimit
0.00 0.000000 0 8 2 futex
0.00 0.000000 0 1 sched_getaffinity
0.00 0.000000 0 2 io_setup
0.00 0.000000 0 1 io_destroy
0.00 0.000000 0 2 semtimedop
0.00 0.000000 0 2 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.066971 298210 189 total
real 0m11.241s
user 0m2.249s
sys 0m6.253s
# echo 2 >| /proc/sys/kernel/vsyscall64
$ time strace -f -c sqlplus system/xxxx @/tmp/a.sql
....
Elapsed: 00:00:12.85
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
69.65 0.069590 1 60292 read
13.24 0.013234 0 111684 getrusage
11.39 0.011378 0 60117 write
5.51 0.005502 0 60262 times
0.14 0.000142 8 18 rt_sigprocmask
0.03 0.000028 0 65 mprotect
0.03 0.000027 0 223 113 open
0.02 0.000019 1 34 munmap
0.00 0.000000 0 125 close
0.00 0.000000 0 49 36 stat
0.00 0.000000 0 55 fstat
0.00 0.000000 0 96 lseek
0.00 0.000000 0 165 mmap
0.00 0.000000 0 16 brk
0.00 0.000000 0 53 rt_sigaction
0.00 0.000000 0 3 ioctl
0.00 0.000000 0 21 14 access
0.00 0.000000 0 2 pipe
0.00 0.000000 0 16 14 shmget
0.00 0.000000 0 2 shmat
0.00 0.000000 0 1 shmctl
0.00 0.000000 0 2 dup
0.00 0.000000 0 9 socket
0.00 0.000000 0 8 8 connect
0.00 0.000000 0 1 bind
0.00 0.000000 0 1 getsockname
0.00 0.000000 0 1 1 getpeername
0.00 0.000000 0 2 getsockopt
0.00 0.000000 0 1 clone
0.00 0.000000 0 2 execve
0.00 0.000000 0 7 uname
0.00 0.000000 0 2 semctl
0.00 0.000000 0 2 shmdt
0.00 0.000000 0 29 fcntl
0.00 0.000000 0 6 getdents
0.00 0.000000 0 5 getcwd
0.00 0.000000 0 1 chdir
0.00 0.000000 0 3 readlink
0.00 0.000000 0 2 umask
0.00 0.000000 0 9 getrlimit
0.00 0.000000 0 7 getuid
0.00 0.000000 0 2 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 3 getppid
0.00 0.000000 0 2 1 setsid
0.00 0.000000 0 1 sigaltstack
0.00 0.000000 0 3 statfs
0.00 0.000000 0 2 arch_prctl
0.00 0.000000 0 3 setrlimit
0.00 0.000000 0 8 2 futex
0.00 0.000000 0 1 sched_getaffinity
0.00 0.000000 0 2 io_setup
0.00 0.000000 0 1 io_destroy
0.00 0.000000 0 2 set_tid_address
0.00 0.000000 0 2 semtimedop
0.00 0.000000 0 2 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.099920 293434 189 total
real 0m13.330s
user 0m2.442s
sys 0m7.564s
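-- gettimeofday appears in the strace output only with /proc/sys/kernel/vsyscall64=0, and that run took 22 seconds.
-- Clearly, setting /proc/sys/kernel/vsyscall64=1 (which is also the default) gives the best result.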
From the ITPUB blog, link: http://blog.itpub.net/267265/viewspace-765611/. If you repost this article, please credit the source; otherwise legal action will be pursued.