linux HZ
HZ
Linux核心每隔固定周期会发出timer interrupt (IRQ 0),HZ是用来定义每一秒有几次timer interrupts。举例来说,HZ为1000,代表每秒有1000次timer interrupts。 HZ可在编译核心时设定,如下所示(以核心版本2.6.20-15为例):adrian@adrian-desktop:~$ cd /usr/src/linux
adrian@adrian-desktop:/usr/src/linux$ make menuconfig
Processor type and features ---> Timer frequency (250 HZ) --->
其中HZ可设定100、250、300或1000。以小弟的核心版本预设值为250。
小实验
观察/proc/interrupt的 timer中断次数,并于一秒后再次观察其值。理论上,两者应该相差250左右。
adrian@adrian-desktop:~$ cat /proc/interrupts | grep timer && sleep 1 && cat /proc/interrupts | grep timer
0: 9309306 IO-APIC-edge timer
0: 9309562 IO-APIC-edge timer
上面四个栏位分别为中断号码、CPU中断次数、PIC与装置名称。
要检查系统上HZ的值是什么,就执行命令
cat /boot/config-`uname -r` | grep '^CONFIG_HZ='
还可以直接更改文件
os/linux/include/asm-cris/param.h?
Tick
Tick是HZ的倒数,意即timer interrupt每发生一次中断的时间。如HZ为250时,tick为4毫秒(millisecond)。Jiffies
Jiffies为Linux核心变数(32位元变数,unsigned long),它被用来纪录系统自开几以来,已经过多少的tick。每发生一次timer interrupt,Jiffies变数会被加一。值得注意的是,Jiffies于系统开机时,并非初始化成零,而是被设为-300*HZ (arch/i386/kernel/time.c),即代表系统于开机五分钟后,jiffies便会溢位。那溢位怎么办?事实上,Linux核心定义几个macro(timer_after、time_after_eq、time_before与time_before_eq),即便是溢位,也能藉由这几个macro正确地取得jiffies的内容。另外,80x86架构定义一个与jiffies相关的变数jiffies_64 ,此变数64位元,要等到此变数溢位可能要好几百万年。因此要等到溢位这刻发生应该很难吧。那如何经由jiffies_64取得jiffies资讯呢?事实上,jiffies被对应至jiffies_64最低的32位元。因此,经由jiffies_64可以完全不理会溢位的问题便能取得jiffies。
#make menuconfig
processor type and features--->Timer frequency (250 HZ)--->
全局变量jiffies用来记录自系统启动以来产生的节拍的总数。启动时,内核将该变量初始化为0,此后,每次时钟中断处理程序都会增加该变量的值。一秒内时钟中断的次数等于Hz,所以jiffies一秒内增加的值也就是Hz。
系统运行时间以秒为单位,等于jiffies/Hz。
注意,jiffies类型为无符号长整型(unsigned long),其他任何类型存放它都不正确。
将以秒为单位的时间转化为jiffies:
seconds * Hz
将jiffies转化为以秒为单位的时间:
jiffies / Hz
相比之下,内核中将秒转换为jiffies用的多些。
- jiffies的内部表示
jiffies定义于文件<linux\Jiffies.h>中:
- /*
- * The 64-bit value is not atomic - you MUST NOT read it
- * without sampling the sequence number in xtime_lock.
- * get_jiffies_64() will do this for you as appropriate.
- */
- extern u64 __jiffy_data jiffies_64;
- extern unsigned long volatile __jiffy_data jiffies;
ld(1)脚本用于连接主内核映像(在x86上位于arch/i386/kernel/vmlinux.lds.S中),然后用jiffies_64变量的初值覆盖jiffies变量。因此jiffies取整个jiffies_64变量的低32位。
访问jiffies的代码只会读取jiffies_64的低32位,通过get_jiffies_64()函数就可以读取整个64位的值。在64位体系结构上,jiffies_64和jiffies指的是同一个变量。
- #if (BITS_PER_LONG < 64)
- u64 get_jiffies_64(void);
- #else
- static inline u64 get_jiffies_64(void)
- {
- return (u64)jiffies;
- }
- #endif
- 在<Time.c(kernel)>中
- #if (BITS_PER_LONG < 64)
- u64 get_jiffies_64(void)
- {
- unsigned long seq;
- u64 ret;
- do {
- seq = read_seqbegin(&xtime_lock);
- ret = jiffies_64;
- } while (read_seqretry(&xtime_lock, seq));
- return ret;
- }
- jiffies的回绕wrap around
当jiffies的值超过它的最大存放范围后就会发生溢出。对于32位无符号长整型,最大取值为(2^32)-1,即429496795。如果节拍计数达到了最大值后还要继续增加,它的值就会回绕到0。
内核提供了四个宏来帮助比较节拍计数,它们能正确的处理节拍计数回绕的问题:
- /*
- * These inlines deal with timer wrapping correctly. You are
- * strongly encouraged to use them
- * 1. Because people otherwise forget
- * 2. Because if the timer wrap changes in future you won't have to
- * alter your driver code.
- *
- * time_after(a,b) returns true if the time a is after time b.
- *
- * Do this with "<0" and ">=0" to only test the sign of the result. A
- * good compiler would generate better code (and a really good compiler
- * wouldn't care). Gcc is currently neither.
- */
- #define time_after(a,b) \
- (typecheck(unsigned long, a) && \
- typecheck(unsigned long, b) && \
- ((long)(b) - (long)(a) < 0))
- #define time_before(a,b) time_after(b,a)
- #define time_after_eq(a,b) \
- (typecheck(unsigned long, a) && \
- typecheck(unsigned long, b) && \
- ((long)(a) - (long)(b) >= 0))
- #define time_before_eq(a,b) time_after_eq(b,a)
- /* Same as above, but does so with platform independent 64bit types.
- * These must be used when utilizing jiffies_64 (i.e. return value of
- * get_jiffies_64() */
- #define time_after64(a,b) \
- (typecheck(__u64, a) && \
- typecheck(__u64, b) && \
- ((__s64)(b) - (__s64)(a) < 0))
- #define time_before64(a,b) time_after64(b,a)
- #define time_after_eq64(a,b) \
- (typecheck(__u64, a) && \
- typecheck(__u64, b) && \
- ((__s64)(a) - (__s64)(b) >= 0))
- #define time_before_eq64(a,b) time_after_eq64(b,a)
- 用户空间和HZ
问题提出:
在2.6以前的内核中,如果改变内核中的HZ值会给用户空间中某些程序造成异常结果。因为内核是以节拍数/秒的形式给用户空间导出这个值的,应用程序便依赖这个特定的HZ值。如果在内核中改变了HZ的定义值,就打破了用户空间的常量关系---用户空间并不知道新的HZ值。
解决方法:
内核更改所有导出的jiffies值。内核定义了USER_HZ来代表用户空间看到的HZ值。在x86体系结构上,由于HZ值原来一直是100,所以USER_HZ值就定义为100。内核可以使用宏jiffies_to_clock_t()将一个有HZ表示的节拍计数转换为一个由USER_HZ表示的节拍计数。
- 在<Time.c(kernel)>中
- /*
- * Convert jiffies/jiffies_64 to clock_t and back.
- */
- clock_t jiffies_to_clock_t(long x)
- {
- #if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
- return x / (HZ / USER_HZ);
- #else
- u64 tmp = (u64)x * TICK_NSEC;
- do_div(tmp, (NSEC_PER_SEC / USER_HZ));
- return (long)tmp;
- #endif
- }
- unsigned long clock_t_to_jiffies(unsigned long x)
- {
- #if (HZ % USER_HZ)==0
- if (x >= ~0UL / (HZ / USER_HZ))
- return ~0UL;
- return x * (HZ / USER_HZ);
- #else
- u64 jif;
- /* Don't worry about loss of precision here .. */
- if (x >= ~0UL / HZ * USER_HZ)
- return ~0UL;
- /* .. but do try to contain it here */
- jif = x * (u64) HZ;
- do_div(jif, USER_HZ);
- return jif;
- #endif
- }
- u64 jiffies_64_to_clock_t(u64 x)
- {
- #if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
- do_div(x, HZ / USER_HZ);
- #else
- /*
- * There are better ways that don't overflow early,
- * but even this doesn't overflow in hundreds of years
- * in 64 bits, so..
- */
- x *= TICK_NSEC;
- do_div(x, (NSEC_PER_SEC / USER_HZ));
- #endif
- return x;
- }
- 在<Div64.h(include\asm-i385)>中
- /*
- * do_div() is NOT a C function. It wants to return
- * two values (the quotient and the remainder), but
- * since that doesn't work very well in C, what it
- * does is:
- *
- * - modifies the 64-bit dividend _in_place_
- * - returns the 32-bit remainder
- *
- * This ends up being the most efficient "calling
- * convention" on x86.
- */
- #define do_div(n,base) ({ \
- unsigned long __upper, __low, __high, __mod, __base; \
- __base = (base); \
- asm("":"=a" (__low), "=d" (__high):"A" (n)); \
- __upper = __high; \
- if (__high) { \
- __upper = __high % (__base); \
- __high = __high / (__base); \
- } \
- asm("divl %2":"=a" (__low), "=d" (__mod):"rm" (__base), "0" (__low), "1" (__upper)); \
- asm("":"=A" (n):"a" (__low),"d" (__high)); \
- __mod; \
- })
用户空间期望HZ=USER_HZ,但是如果它们不相等,则由宏完成转换。