Reading notes about Generic Time Subsystem implementation on linux.
Author: Honggang Yang(Joseph) <ganggexiongqi@gmail.com>
Kernel Version: Linux 3.1.1
==================================================================
REF:
http://kerneldox.com
Professional Linux Kernel Architecture
------------------------------------------------------------------------------------------------------------------------
Contents:
0. header
1. Overview
2. Working with Clock Sources
3. Clock Event Device
=============================================================
0. header
An overview of the building blocks employed to implement the timing
subsystem is given in the figure below. It gives a quick glance at what is involved in timekeeping, and how the components interact with each other.
Figure: Overview of the components that build up the timing subsystem.
As you can see, the raw hardware sits at the very bottom. Every typical
system has several devices, usually implemented by clock chips, that
provide timing functionality and can serve as clocks. IA-32 and AMD64 have
a programmable interrupt timer (PIT, implemented by the 8253 chip) as a classical
clock source that has only a modest resolution and stability. CPU-local APICs
(advanced programmable interrupt controllers) provide much better
resolution and stability. They are suitable as high-resolution time sources,
whereas the PIT is only good enough for low-resolution timers.
Hardware naturally needs to be programmed by architecture-specific code,
but the clock source abstraction provides a generic interface to all hardware
clock chips. Essentially, read access to the current value of the running
counter provided by a clock chip is granted. Periodic events do not fit
a free-running counter very well, thus another abstraction is required.
Clock events are the foundation of periodic events. However, clock events
can be more powerful. Some time devices can provide events at arbitrary,
irregular time points. In contrast to periodic event devices, they are called
one-shot devices.
1. Overview
Figure: Overview of the generic time subsystem
Three mechanisms form the foundation of any time-related task in the kernel:
1> Clock Source (struct clocksource) -- Each clock source provides a monotonically
increasing counter with read-only access for the generic parts. The
accuracy of the clocksource varies depending on the capabilities of the
underlying hardware.
2> Clock Event Devices (struct clock_event_device) -- Add the possibility of
equipping clocks with events that occur at a certain time in the future.
We also refer to such devices as clock event sources for historical reasons.
3> Tick Devices (struct tick_device) -- Extend clock event sources to provide
a continuous stream of tick events that happen at regular time intervals.
The kernel distinguishes between two types of clocks:
1> Global Clock -- It is responsible for providing the periodic tick that is mainly
used to update the @jiffies value. In former versions of the kernel, this
type of clock was realized by the PIT on IA-32 systems.
2> Local Clock -- one local clock per CPU allows for performing process
accounting, profiling, and last but not least, high-resolution timers.
Objects for Time Management
Clock Sources:
129 /**
130 * struct clocksource - hardware abstraction for a free running counter
131 * Provides mostly state-free accessors to the underlying hardware.
132 * This is the structure used for system time.
133 *
134 * @name: ptr to clocksource name
135 * @list: list head for registration
136 * @rating: rating value for selection (higher is better)
137 * To avoid rating inflation the following
138 * list should give you a guide as to how
139 * to assign your clocksource a rating
140 * 1-99: Unfit for real use
141 * Only available for bootup and testing purposes.
142 * 100-199: Base level usability.
143 * Functional for real use, but not desired.
144 * 200-299: Good.
145 * A correct and usable clocksource.
146 * 300-399: Desired.
147 * A reasonably fast and accurate clocksource.
148 * 400-499: Perfect
149 * The ideal clocksource. A must-use where
150 * available.
151 * @read: returns a cycle value, passes clocksource as argument
152 * @enable: optional function to enable the clocksource
153 * @disable: optional function to disable the clocksource
154 * @mask: bitmask for two's complement
155 * subtraction of non 64 bit counters
156 * @mult: cycle to nanosecond multiplier
157 * @shift: cycle to nanosecond divisor (power of two)
158 * @max_idle_ns: max idle time permitted by the clocksource (nsecs)
159 * @flags: flags describing special properties
160 * @archdata: arch-specific data
161 * @suspend: suspend function for the clocksource, if necessary
162 * @resume: resume function for the clocksource, if necessary
163 */
164 struct clocksource {
165 /*
166 * Hotpath data, fits in a single cache line when the
167 * clocksource itself is cacheline aligned.
168 */
/*
* Used to read the current cycle value of the clock. Note that the value
* returned does not use any fixed timing basis for all clocks, but needs
 * to be converted into a nanosecond value individually. Assuming that
 * @ret is the value returned from @cs->read, the nanosecond value
 * can be calculated as ((ret * mult) >> shift).
*/
169 cycle_t (*read)(struct clocksource *cs);
170 cycle_t cycle_last;
/*
* If a clock does not provide time values with 64bits, then @mask specifies
* a bitmask to select the appropriate bits.
*/
171 cycle_t mask;
172 u32 mult;
173 u32 shift;
174 u64 max_idle_ns;
175
176 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
177 struct arch_clocksource_data archdata;
178 #endif
179
180 const char *name;
181 struct list_head list;
182 int rating;
183 int (*enable)(struct clocksource *cs);
184 void (*disable)(struct clocksource *cs);
/*
 * CLOCK_SOURCE_IS_CONTINUOUS marks a continuous clock: if set, the clock is
 * free-running and thus cannot skip. If it is not set, some cycles may be lost;
 * that is, if the last cycle value was n, then the next value does not necessarily
 * need to be (n + 1) even if it was read at the next possible moment.
 * CLOCK_SOURCE_MUST_VERIFY: a clocksource to be watched. //ref: clocksource_check_watchdog
 */
185 unsigned long flags;
186 void (*suspend)(struct clocksource *cs);
187 void (*resume)(struct clocksource *cs);
188
189 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
190 /* Watchdog related data, used by the framework */
191 struct list_head wd_list;
192 cycle_t cs_last;
193 cycle_t wd_last;
194 #endif
195 } ____cacheline_aligned;
164 /*[Clocksource internal variables]---------
165 * curr_clocksource:
166 * currently selected clocksource.
167 * clocksource_list:
168 * linked list with the registered clocksources
169 * clocksource_mutex:
170 * protects manipulations to curr_clocksource and the clocksource_list
171 * override_name:
172 * Name of the user-specified clocksource.
173 */
174 static struct clocksource *curr_clocksource;
175 static LIST_HEAD(clocksource_list);
176 static DEFINE_MUTEX(clocksource_mutex);
177 static char override_name[32];
178 static int finished_booting;
2. Working with Clock Sources
1> Register a clock source.
Before using a clock, you must register a clock source with the kernel.
The function clocksource_register() is responsible for this. The source
is only added to the global @clocksource_list, which sorts all available
clock sources by their rating.
2> Read the clock.
To read the clock, the kernel provides the following functions:
getnstimeofday() // in kernel/time/timekeeping.c
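As a quick illustration (a minimal sketch, not taken from the notes; the helper function and the printk format are only examples), kernel code could read the current wall-clock time like this:

#include <linux/time.h>
#include <linux/kernel.h>

static void show_wall_time(void)
{
        struct timespec ts;

        getnstimeofday(&ts);    /* fills ts with seconds + nanoseconds */
        printk(KERN_INFO "now: %ld.%09ld\n", ts.tv_sec, ts.tv_nsec);
}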
Details of the functions mentioned above:
Call Tree:
clocksource_register
clocksource_max_deferment
clocksource_enqueue
clocksource_enqueue_watchdog
clocksource_select
timekeeping_notify // we do not follow these further; they are outside our topic here
stop_machine(change_clocksource...)
tick_clock_notify
timekeeping_get_ns
clock->read(clock)
698 /**
699 * clocksource_register - Used to install new clocksources
700 * @t: clocksource to be registered
701 *
702 * Returns -EBUSY if registration fails, zero otherwise.
703 */
704 int clocksource_register(struct clocksource *cs)
705 {
706 /* calculate max idle time permitted for this clocksource */
/* Returns max time in nanosecond the clocksource can be deferred */
707 cs->max_idle_ns = clocksource_max_deferment(cs);
708
709 mutex_lock(&clocksource_mutex);
/* Enqueue the clocksource sorted by rating */
710 clocksource_enqueue(cs);
/* For simplicity, we omit this function now. */
711 clocksource_enqueue_watchdog(cs);
712 clocksource_select();
713 mutex_unlock(&clocksource_mutex);
714 return 0;
715 }
496 /**
497 * clocksource_max_deferment - Returns max time in nanosecond the clocksource can be deferred
498 * @cs: Pointer to clocksource
499 *
500 */
501 static u64 clocksource_max_deferment(struct clocksource *cs)
502 {
503 u64 max_nsecs, max_cycles;
504
505 /*
506 * Calculate the maximum number of cycles that we can pass to the
507 * cyc2ns function without overflowing a 64-bit signed result. The
508 * maximum number of cycles is equal to ULLONG_MAX/cs->mult which
509 * is equivalent to the below.
510 * max_cycles < (2^63)/cs->mult
511 * max_cycles < 2^(log2((2^63)/cs->mult))
512 * max_cycles < 2^(log2(2^63) - log2(cs->mult))
513 * max_cycles < 2^(63 - log2(cs->mult))
514 * max_cycles < 1 << (63 - log2(cs->mult))
515 * Please note that we add 1 to the result of the log2 to account for
516 * any rounding errors, ensure the above inequality is satisfied and
517 * no overflow will occur.
518 */
519 max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
520
521 /*
522 * The actual maximum number of cycles we can defer the clocksource is
523 * determined by the minimum of max_cycles and cs->mask.
524 */
525 max_cycles = min_t(u64, max_cycles, (u64) cs->mask);
526 max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);
527
528 /*
529 * To ensure that the clocksource does not wrap whilst we are idle,
530 * limit the time the clocksource can be deferred by 12.5%. Please
531 * note a margin of 12.5% is used because this can be computed with
532 * a shift, versus say 10% which would require division.
533 */
534 return max_nsecs - ( max_nsecs >> 3 /*5*/);
535 }
536
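To make the arithmetic concrete, here is a small standalone sketch with assumed values (a 1 GHz counter where mult = 1 << 24 and shift = 24, i.e. one cycle equals one nanosecond); the numbers are illustrative, not taken from a real clocksource:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t mult  = 1u << 24;      /* 1 cycle == 1 ns, scaled by 2^24 */
        uint32_t shift = 24;
        int log2_mult  = 24;            /* ilog2(mult) */
        uint64_t max_cycles = 1ULL << (63 - (log2_mult + 1));   /* 2^38 cycles */
        uint64_t max_nsecs  = (max_cycles * mult) >> shift;     /* 2^38 ns, ~275 s */

        /* keep the 12.5% safety margin, as the kernel comment explains */
        printf("max idle ~ %llu s\n",
               (unsigned long long)((max_nsecs - (max_nsecs >> 3)) / 1000000000ULL));
        return 0;
}

This prints roughly 240 seconds, so such a clocksource would have to be read at least every four minutes or so to avoid missing a counter wrap.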
261 /**
262 * clocksource_cyc2ns - converts clocksource cycles to nanoseconds
263 *
264 * Converts cycles to nanoseconds, using the given mult and shift.
265 *
266 * XXX - This could use some mult_lxl_ll() asm optimization
267 */
268 static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
269 {
270 return ((u64) cycles * mult) >> shift;
271 }
616 /*
617 * Enqueue the clocksource sorted by rating
618 */
619 static void clocksource_enqueue(struct clocksource *cs)
620 {
621 struct list_head *entry = &clocksource_list;
622 struct clocksource *tmp;
623
624 list_for_each_entry(tmp, &clocksource_list, list)
625 /* Keep track of the place, where to insert */
626 if (tmp->rating >= cs->rating)
627 entry = &tmp->list;
628 list_add(&cs->list, entry);
629 }
366 static void clocksource_enqueue_watchdog(struct clocksource *cs)
367 {
368 unsigned long flags;
369
370 spin_lock_irqsave(&watchdog_lock, flags);
371 if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
372 /* cs is a clocksource to be watched. */
373 list_add(&cs->wd_list, &watchdog_list);
374 cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
375 } else {
376 /* cs is a watchdog. */
377 if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
378 cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
379 /* Pick the best watchdog. */
380 if (!watchdog || cs->rating > watchdog->rating) {
381 watchdog = cs;
382 /* Reset watchdog cycles */
383 clocksource_reset_watchdog();
384 }
385 }
386 /* Check if the watchdog timer needs to be started. */
387 clocksource_start_watchdog();
388 spin_unlock_irqrestore(&watchdog_lock, flags);
389 }
539 /**
540 * clocksource_select - Select the best clocksource available
541 *
542 * Private function. Must hold clocksource_mutex when called.
543 *
544 * Select the clocksource with the best rating, or the clocksource,
545 * which is selected by userspace override.
546 */
547 static void clocksource_select(void)
548 {
549 struct clocksource *best, *cs;
550
551 if (!finished_booting || list_empty(&clocksource_list))
552 return;
553 /* First clocksource on the list has the best rating. */
554 best = list_first_entry(&clocksource_list, struct clocksource, list);
555 /* Check for the override clocksource. */
556 list_for_each_entry(cs, &clocksource_list, list) {
557 if (strcmp(cs->name, override_name) != 0)
558 continue;
559 /*
560 * Check to make sure we don't switch to a non-highres
561 * capable clocksource if the tick code is in oneshot
562 * mode (highres or nohz)
563 */
564 if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) &&
565 tick_oneshot_mode_active()) {
566 /* Override clocksource cannot be used. */
567 printk(KERN_WARNING "Override clocksource %s is not "
568 "HRT compatible. Cannot switch while in "
569 "HRT/NOHZ mode\n", cs->name);
570 override_name[0] = 0;
571 } else
572 /* Override clocksource can be used. */
573 best = cs;
574 break;
575 }
576 if (curr_clocksource != best) {
577 printk(KERN_INFO "Switching to clocksource %s\n", best->name);
578 curr_clocksource = best;
/* Install a new clock source, async notification about clocksource changes */
579 timekeeping_notify(curr_clocksource);
580 }
581 }
447 /**
448 * timekeeping_notify - Install a new clock source, Async notification about clocksource changes
449 * @clock: pointer to the clock source
450 *
451 * This function is called from clocksource.c after a new, better clock
452 * source has been registered. The caller holds the clocksource_mutex.
453 */
454 void timekeeping_notify(struct clocksource *clock)
455 {
456 if (timekeeper.clock == clock)
457 return;
/* Swaps clocksources if a new one is available */
458 stop_machine(change_clocksource, clock, NULL);
/* Async notification about clocksource changes */
459 tick_clock_notify();
460 }
211 /**
212 * getnstimeofday - Returns the time of day in a timespec
213 * @ts: pointer to the timespec to be set
214 *
215 * Returns the time of day in a timespec.
216 */
217 void getnstimeofday(struct timespec *ts)
218 {
219 unsigned long seq;
220 s64 nsecs;
221
222 WARN_ON(timekeeping_suspended);
223
224 do {
225 seq = read_seqbegin(&xtime_lock);
226
227 *ts = xtime;
228 nsecs = timekeeping_get_ns();
229
230 /* If arch requires, add in gettimeoffset() */
231 nsecs += arch_gettimeoffset();
232
233 } while (read_seqretry(&xtime_lock, seq));
234
235 timespec_add_ns(ts, nsecs);
236 }
237
238 EXPORT_SYMBOL(getnstimeofday);
104 /* Timekeeper helper functions. */
105 static inline s64 timekeeping_get_ns(void)
106 {
107 cycle_t cycle_now, cycle_delta;
108 struct clocksource *clock;
109
110 /* read clocksource: */
111 clock = timekeeper.clock;
112 cycle_now = clock->read(clock);
113
114 /* calculate the delta since the last update_wall_time: */
115 cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
116
117 /* return delta convert to nanoseconds using ntp adjusted mult. */
118 return clocksource_cyc2ns(cycle_delta, timekeeper.mult,
119 timekeeper.shift);
120 }
The following are the clocksource APIs you can use in your modules:
EXPORT_SYMBOL_GPL(timecounter_init); +
EXPORT_SYMBOL_GPL(timecounter_read); +
EXPORT_SYMBOL_GPL(timecounter_cyc2time); +
EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale); +
EXPORT_SYMBOL_GPL(__clocksource_register_scale);
EXPORT_SYMBOL(clocksource_register); +
EXPORT_SYMBOL(clocksource_change_rating); +
EXPORT_SYMBOL(clocksource_unregister); +
Now let us get into the details of them.
clocksource_register() has already been discussed above.
clocksource_unregister:
737 /**
738 * clocksource_unregister - remove a registered clocksource and
select the best clocksource available
739 */
740 void clocksource_unregister(struct clocksource *cs)
741 {
742 mutex_lock(&clocksource_mutex);
743 clocksource_dequeue_watchdog(cs);
/* delete @cs from the global @clocksource_list */
744 list_del(&cs->list);
/* Select the best clocksource available */
745 clocksource_select();
746 mutex_unlock(&clocksource_mutex);
747 }
748 EXPORT_SYMBOL(clocksource_unregister);
timecounter_init:
29 /**
30 * struct cyclecounter - hardware abstraction for a free running counter
31 * Provides completely state-free accessors to the underlying hardware.
32 * Depending on which hardware it reads, the cycle counter may wrap
33 * around quickly. Locking rules (if necessary) have to be defined
34 * by the implementor and user of specific instances of this API.
35 *
36 * @read: returns the current cycle value
37 * @mask: bitmask for two's complement
38 * subtraction of non 64 bit counters,
39 * see CLOCKSOURCE_MASK() helper macro
40 * @mult: cycle to nanosecond multiplier
41 * @shift: cycle to nanosecond divisor (power of two)
42 */
43 struct cyclecounter {
44 cycle_t (*read)(const struct cyclecounter *cc);
45 cycle_t mask;
46 u32 mult;
47 u32 shift;
48 };
50 /**
51 * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
52 * Contains the state needed by timecounter_read() to detect
53 * cycle counter wrap around. Initialize with
54 * timecounter_init(). Also used to convert cycle counts into the
55 * corresponding nanosecond counts with timecounter_cyc2time(). Users
56 * of this code are responsible for initializing the underlying
57 * cycle counter hardware, locking issues and reading the time
58 * more often than the cycle counter wraps around. The nanosecond
59 * counter will only wrap around after ~585 years.
60 *
61 * @cc: the cycle counter used by this instance
62 * @cycle_last: most recent cycle counter value seen by
63 * timecounter_read()
64 * @nsec: continuously increasing count
65 */
66 struct timecounter {
67 const struct cyclecounter *cc;
68 cycle_t cycle_last;
69 u64 nsec; ///?????
70 };
/*
* Init the timecounter @tc
*/
34 void timecounter_init(struct timecounter *tc,
35 const struct cyclecounter *cc,
36 u64 start_tstamp)
37 {
38 tc->cc = cc;
39 tc->cycle_last = cc->read(cc);
40 tc->nsec = start_tstamp;
41 }
42 EXPORT_SYMBOL_GPL(timecounter_init);
/*
 * Increment @tc->nsec by the nanoseconds elapsed since the last call and
 * return the new value.
*/
75 u64 timecounter_read(struct timecounter *tc)
76 {
77 u64 nsec;
78
79 /* increment time by nanoseconds since last call */
80 nsec = timecounter_read_delta(tc);
81 nsec += tc->nsec;
82 tc->nsec = nsec;
83
84 return nsec;
85 }
86 EXPORT_SYMBOL_GPL(timecounter_read);
44 /**
45 * timecounter_read_delta - get nanoseconds since last call of this function
46 * @tc: Pointer to time counter
47 *
48 * When the underlying cycle counter runs over, this will be handled
49 * correctly as long as it does not run over more than once between
50 * calls.
51 *
52 * The first call to this function for a new time counter initializes
53 * the time tracking and returns an undefined result.
54 */
55 static u64 timecounter_read_delta(struct timecounter *tc)
56 {
57 cycle_t cycle_now, cycle_delta;
58 u64 ns_offset;
59
60 /* read cycle counter: */
61 cycle_now = tc->cc->read(tc->cc);
62
63 /* calculate the delta since the last timecounter_read_delta(): */
64 cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask;
65
66 /* convert to nanoseconds: */
67 ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta);
68
69 /* update time stamp of timecounter_read_delta() call: */
70 tc->cycle_last = cycle_now;
71
72 return ns_offset;
73 }
/*
* timecounter_cyc2time - convert a cycle counter to same time base as
* values returned by timecounter_read()
*
* @tc: Pointer to time counter
* @cycle_tstamp: a value returned by @tc->cc->read()
*
* Cycle counts that are converted correctly as long as they fall into the
* interval [-1/2 max cycle count, +1/2 max cycle count], with
* "max cycle count" == cs->mask+1.This allows conversion of cycle counter
* values which were generated in the past.
* REF: http://kerneldox.com/dd/dca/clocksource_8h.html
*/
88 u64 timecounter_cyc2time(struct timecounter *tc,
89 cycle_t cycle_tstamp)
90 {
91 u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask;
92 u64 nsec;
93
94 /*
95 * Instead of always treating cycle_tstamp as more recent
96 * than tc->cycle_last, detect when it is too far in the
97 * future and treat it as old time stamp instead.
98 */
99 if (cycle_delta > tc->cc->mask / 2) {
100 cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask;
101 nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta);
102 } else {
103 nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec;
104 }
105
106 return nsec;
107 }
108 EXPORT_SYMBOL_GPL(timecounter_cyc2time);
109
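A minimal usage sketch of the cyclecounter/timecounter API (the hardware read function and the mult/shift values below are assumptions for the example, not real kernel symbols):

#include <linux/clocksource.h>

static u32 my_hw_read_counter(void)
{
        return 0;               /* stand-in for reading a 32-bit hardware register */
}

static cycle_t my_cc_read(const struct cyclecounter *cc)
{
        return (cycle_t)my_hw_read_counter();
}

static struct cyclecounter my_cc = {
        .read  = my_cc_read,
        .mask  = CLOCKSOURCE_MASK(32),  /* 32-bit free running counter */
        .mult  = 1 << 10,               /* assumed: 1 cycle == 1 ns, scaled by 2^10 */
        .shift = 10,
};

static struct timecounter my_tc;

static void my_timecounter_setup(void)
{
        timecounter_init(&my_tc, &my_cc, 0);    /* start the nanosecond count at 0 */
}

static u64 my_timecounter_now(void)
{
        return timecounter_read(&my_tc);        /* accumulated ns since setup */
}

As long as my_timecounter_now() is called more often than the underlying 32-bit counter wraps, the accumulated nanosecond value stays correct.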
Call Tree:
clocksource_change_rating
__clocksource_change_rating
/* delete @cs from the global list*/
/* Change @cs's rating */
clocksource_enqueue
clocksource_select
726 /**
727 * clocksource_change_rating - Change the rating of a registered clocksource
728 */
729 void clocksource_change_rating(struct clocksource *cs, int rating)
730 {
731 mutex_lock(&clocksource_mutex);
732 __clocksource_change_rating(cs, rating);
733 mutex_unlock(&clocksource_mutex);
734 }
735 EXPORT_SYMBOL(clocksource_change_rating);
631 /**
632 * __clocksource_updatefreq_scale - Used update clocksource with new freq
633 * @t: clocksource to be registered
634 * @scale: Scale factor multiplied against freq to get clocksource hz
635 * @freq: clocksource frequency (cycles per second) divided by scale
636 *
637 * This should only be called from the clocksource->enable() method.
638 *
639 * This *SHOULD NOT* be called directly! Please use the
640 * clocksource_updatefreq_hz() or clocksource_updatefreq_khz helper functions.
641 */
/* HOW TO USE: __clocksource_updatefreq_scale(cs, 1000, khz);
* __clocksource_updatefreq_scale(cs, 1, hz);
*/
642 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
643 {
644 u64 sec;
645
646 /*
647 * Calc the maximum number of seconds which we can run before
648 * wrapping around. For clocksources which have a mask > 32bit
649 * we need to limit the max sleep time to have a good
650 * conversion precision. 10 minutes is still a reasonable
651 * amount. That results in a shift value of 24 for a
652 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
653 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
654 * margin as we do in clocksource_max_deferment()
655 */
656 sec = (cs->mask - (cs->mask >> 3 /*5*/)); /* Joseph: I think it should be >> 3 in order to leave a margin of 12.5%*/
657 do_div(sec, freq);
658 do_div(sec, scale);
659 if (!sec)
660 sec = 1;
661 else if (sec > 600 && cs->mask > UINT_MAX)
662 sec = 600;
663
/*
* Generate the proper mult/shift pair
*/
664 clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
665 NSEC_PER_SEC / scale, sec * scale);
666 cs->max_idle_ns = clocksource_max_deferment(cs);
667 }
/**
* __clocksource_register_scale - Used to install new clocksources
* @t: clocksource to be registered
* @scale: Scale factor multiplied against freq to get clocksource hz
* @freq: clocksource frequency (cycles per second) divided by scale
*
* Returns -EBUSY if registration fails, zero otherwise.
*
* This *SHOULD NOT* be called directly! Please use the
* clocksource_register_hz() or clocksource_register_khz helper functions.
*/
int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
{
/* Initialize mult/shift and max_idle_ns */
__clocksource_updatefreq_scale(cs, scale, freq);
/* Add clocksource to the clocksource list */
mutex_lock(&clocksource_mutex);
clocksource_enqueue(cs);
clocksource_enqueue_watchdog(cs);
clocksource_select();
mutex_unlock(&clocksource_mutex);
return 0;
}
EXPORT_SYMBOL_GPL(__clocksource_register_scale);
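For reference, the clocksource_register_hz()/clocksource_register_khz() helpers mentioned in the comment are thin inline wrappers in include/linux/clocksource.h; reproduced here from memory, so treat the exact form as approximate:

static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
{
        return __clocksource_register_scale(cs, 1, hz);
}

static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
{
        return __clocksource_register_scale(cs, 1000, khz);
}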
---------------------------------------
Initialization of clocksource
Call Tree:
start_kernel
timekeeping_init
read_persistent_clock
clocksource_default_clock
time_init /* late_time_init = x86_late_time_init */
...
late_time_init /* x86_late_time_init() called */
hpet_time_init
hpet_enable
...
hpet_clocksource_register
hpet_restart_counter
clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq)
__clocksource_register_scale(cs, 1, hz)
hpet_legacy_clockevent_register
clockevents_register_device
...
//setup_pit_timer
clockevent_i8253_init
clockevents_config_and_register(&i8253_clockevent ...)
clockevents_config
clockevents_register_device
/* global_clock_event = &i8253_clockevent */
setup_default_timer_irq
559 /*
560 * timekeeping_init - Initializes the clocksource and common timekeeping values
561 */
562 void __init timekeeping_init(void)
563 {
564 struct clocksource *clock;
565 unsigned long flags;
566 struct timespec now, boot;
567
/* read the current time of the system from the RTC */
568 read_persistent_clock(&now);
569 read_boot_clock(&boot);
570
571 write_seqlock_irqsave(&xtime_lock, flags);
572
573 ntp_init();
574
/* choose &clocksource_jiffies as the current clock
*
* Because when we call timekeeping_init(), the clocksources in the
* system have not yet finished their initialization. Only the jiffies
* clocksource is available at this point.
*/
575 clock = clocksource_default_clock();
576 if (clock->enable)
577 clock->enable(clock);
578 timekeeper_setup_internals(clock);
579
580 xtime.tv_sec = now.tv_sec;
581 xtime.tv_nsec = now.tv_nsec;
582 raw_time.tv_sec = 0;
583 raw_time.tv_nsec = 0;
584 if (boot.tv_sec == 0 && boot.tv_nsec == 0) {
585 boot.tv_sec = xtime.tv_sec;
586 boot.tv_nsec = xtime.tv_nsec;
587 }
588 set_normalized_timespec(&wall_to_monotonic,
589 -boot.tv_sec, -boot.tv_nsec);
590 total_sleep_time.tv_sec = 0;
591 total_sleep_time.tv_nsec = 0;
592 write_sequnlock_irqrestore(&xtime_lock, flags);
593 }
594
//arch/x86/kernel/x86_init.c
98 struct x86_platform_ops x86_platform = {
99 .calibrate_tsc = native_calibrate_tsc,
100 .get_wallclock = mach_get_cmos_time,
101 .set_wallclock = mach_set_rtc_mmss,
102 .iommu_shutdown = iommu_shutdown_noop,
103 .is_untracked_pat_range = is_ISA_range,
104 .nmi_init = default_nmi_init,
105 .i8042_detect = default_i8042_detect
106 };
185 /* not static: needed by APM */
/* read the current time of the system from the RTC */
186 void read_persistent_clock(struct timespec *ts)
187 {
188 unsigned long retval;
189
/* From the definition of x86_platform above, we know that
 * mach_get_cmos_time() is actually called.
 * It reads the current time of the system from the RTC.
*/
190 retval = x86_platform.get_wallclock();
191
192 ts->tv_sec = retval;
193 ts->tv_nsec = 0;
194 }
94 struct clocksource * __init __weak clocksource_default_clock(void)
95 {
96 return &clocksource_jiffies;
97 }
30 /* The Jiffies based clocksource is the lowest common
31 * denominator clock source which should function on
32 * all systems. It has the same coarse resolution as
33 * the timer interrupt frequency HZ and it suffers
34 * inaccuracies caused by missed or lost timer
35 * interrupts and the inability for the timer
36 * interrupt hardware to accuratly tick at the
37 * requested HZ value. It is also not recommended
38 * for "tick-less" systems.
39 */
40 #define NSEC_PER_JIFFY ((u32)((((u64)NSEC_PER_SEC)<<8)/ACTHZ))
41
42 /* Since jiffies uses a simple NSEC_PER_JIFFY multiplier
43 * conversion, the .shift value could be zero. However
44 * this would make NTP adjustments impossible as they are
45 * in units of 1/2^.shift. Thus we use JIFFIES_SHIFT to
46 * shift both the nominator and denominator the same
47 * amount, and give ntp adjustments in units of 1/2^8
48 *
49 * The value 8 is somewhat carefully chosen, as anything
50 * larger can result in overflows. NSEC_PER_JIFFY grows as
51 * HZ shrinks, so values greater than 8 overflow 32bits when
52 * HZ=100.
53 */
54 #define JIFFIES_SHIFT 8
55
56 static cycle_t jiffies_read(struct clocksource *cs)
57 {
58 return (cycle_t) jiffies;
59 }
60
61 struct clocksource clocksource_jiffies = {
62 .name = "jiffies",
63 .rating = 1, /* lowest valid rating*/
64 .read = jiffies_read,
65 .mask = 0xffffffff, /*32bits*/
66 .mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
67 .shift = JIFFIES_SHIFT,
68 };
/* 'Auto' registration of the jiffies clocksource */
87 static int __init init_jiffies_clocksource(void)
88 {
89 return clocksource_register(&clocksource_jiffies);
90 }
91
92 core_initcall(init_jiffies_clocksource);
The @clocksource_jiffies above is a simple example of how to implement
your own clocksource; a minimal module-based sketch follows.
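A minimal sketch of such a module (the hardware counter read and the 1000 kHz frequency are assumptions for illustration, not a real driver):

#include <linux/module.h>
#include <linux/clocksource.h>

static u32 my_counter_read_hw(void)
{
        return 0;               /* stand-in for reading a 32-bit hardware counter */
}

static cycle_t my_cs_read(struct clocksource *cs)
{
        return (cycle_t)my_counter_read_hw();
}

static struct clocksource my_cs = {
        .name   = "my_counter",
        .rating = 200,                          /* "Good" according to the rating guide */
        .read   = my_cs_read,
        .mask   = CLOCKSOURCE_MASK(32),
        .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
};

static int __init my_cs_init(void)
{
        /* the core derives mult/shift and max_idle_ns from the frequency */
        return clocksource_register_khz(&my_cs, 1000);
}
module_init(my_cs_init);
MODULE_LICENSE("GPL");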
71 static struct irqaction irq0 = {
72 .handler = timer_interrupt,
73 .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
74 .name = "timer"
75 };
76
77 void __init setup_default_timer_irq(void)
78 {
79 setup_irq(0, &irq0);
80 }
81
82 /* Default timer init function */
83 void __init hpet_time_init(void)
84 {
85 if (!hpet_enable())
86 setup_pit_timer();
87 setup_default_timer_irq();
88 }
89
90 static __init void x86_late_time_init(void)
91 {
/* hpet_time_init() called */
92 x86_init.timers.timer_init();
93 tsc_init();
94 }
95
96 /*
97 * Initialize TSC and delay the periodic timer init to
98 * late x86_late_time_init() so ioremap works.
99 */
100 void __init time_init(void)
101 {
102 late_time_init = x86_late_time_init;
103 }
31 /*
32 * The platform setup functions are preset with the default functions
33 * for standard PC hardware.
34 */
35 struct x86_init_ops x86_init __initdata = {//arch/x86/kernel/x86_init.c
...
73 .timers = {
74 .setup_percpu_clockev = setup_boot_APIC_clock,
75 .tsc_pre_init = x86_init_noop,
76 .timer_init = hpet_time_init,
77 .wallclock_init = x86_init_noop,
78 },
...
};
787 /**
788 * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
789 */
790 int __init hpet_enable(void)
791 {
792 unsigned long hpet_period;
793 unsigned int id;
794 u64 freq;
795 int i;
796
797 if (!is_hpet_capable())
798 return 0;
799
800 hpet_set_mapping();
801
802 /*
803 * Read the period and check for a sane value:
804 */
805 hpet_period = hpet_readl(HPET_PERIOD);
806
807 /*
808 * AMD SB700 based systems with spread spectrum enabled use a
809 * SMM based HPET emulation to provide proper frequency
810 * setting. The SMM code is initialized with the first HPET
811 * register access and takes some time to complete. During
812 * this time the config register reads 0xffffffff. We check
813 * for max. 1000 loops whether the config register reads a non
814 * 0xffffffff value to make sure that HPET is up and running
815 * before we go further. A counting loop is safe, as the HPET
816 * access takes thousands of CPU cycles. On non SB700 based
817 * machines this check is only done once and has no side
818 * effects.
819 */
820 for (i = 0; hpet_readl(HPET_CFG) == 0xFFFFFFFF; i++) {
821 if (i == 1000) {
822 printk(KERN_WARNING
823 "HPET config register value = 0xFFFFFFFF. "
824 "Disabling HPET\n");
825 goto out_nohpet;
826 }
827 }
828
829 if (hpet_period < HPET_MIN_PERIOD || hpet_period > HPET_MAX_PERIOD)
830 goto out_nohpet;
831
832 /*
833 * The period is a femto seconds value. Convert it to a
834 * frequency.
835 */
836 freq = FSEC_PER_SEC;
837 do_div(freq, hpet_period);
838 hpet_freq = freq;
839
840 /*
841 * Read the HPET ID register to retrieve the IRQ routing
842 * information and the number of channels
843 */
844 id = hpet_readl(HPET_ID);
845 hpet_print_config();
846
847 #ifdef CONFIG_HPET_EMULATE_RTC
848 /*
849 * The legacy routing mode needs at least two channels, tick timer
850 * and the rtc emulation channel.
851 */
852 if (!(id & HPET_ID_NUMBER))
853 goto out_nohpet;
854 #endif
855
/* The following is the most important part */
856 if (hpet_clocksource_register())
857 goto out_nohpet;
858
859 if (id & HPET_ID_LEGSUP) {
860 hpet_legacy_clockevent_register();
861 return 1;
862 }
863 return 0;
864
865 out_nohpet:
866 hpet_clear_mapping();
867 hpet_address = 0;
868 return 0;
869 }
870
754 static int hpet_clocksource_register(void)
755 {
756 u64 start, now;
757 cycle_t t1;
758
759 /* Start the counter */
760 hpet_restart_counter();
761
762 /* Verify whether hpet counter works */
763 t1 = hpet_readl(HPET_COUNTER);
764 rdtscll(start);
765
766 /*
767 * We don't know the TSC frequency yet, but waiting for
768 * 200000 TSC cycles is safe:
769 * 4 GHz == 50us
770 * 1 GHz == 200us
771 */
772 do {
773 rep_nop();
774 rdtscll(now);
775 } while ((now - start) < 200000UL);
776
777 if (t1 == hpet_readl(HPET_COUNTER)) {
778 printk(KERN_WARNING
779 "HPET counter not counting. HPET disabled\n");
780 return -ENODEV;
781 }
782
783 clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);
784 return 0;
785 }
20 void __init setup_pit_timer(void)
21 {
22 clockevent_i8253_init(true);
23 global_clock_event = &i8253_clockevent;
24 }
--------------
3. Clock Event Device
-------------
Clock event devices allow for registering an event that is going to happen
at a defined point of time in the future. In comparison to a full-blown timer
implementation, however, only a single event can be stored.
The key elements of every clock_event_device are set_next_event, which
allows for setting the time at which the event is going to take place, and
event_handler, which is called when the event actually happens.
57 /**
58 * struct clock_event_device - clock event device descriptor
59 * @event_handler: Assigned by the framework to be called by the low
60 * level handler of the event source
61 * @set_next_event: set next event function
62 * @next_event: local storage for the next event in oneshot mode
63 * @max_delta_ns: maximum delta value in ns
64 * @min_delta_ns: minimum delta value in ns
65 * @mult: nanosecond to cycles multiplier
66 * @shift: nanoseconds to cycles divisor (power of two)
67 * @mode: operating mode assigned by the management code
68 * @features: features
69 * @retries: number of forced programming retries
70 * @set_mode: set mode function
71 * @broadcast: function to broadcast events
72 * @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
73 * @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
74 * @name: ptr to clock event name
75 * @rating: variable to rate clock event devices
76 * @irq: IRQ number (only for non CPU local devices)
77 * @cpumask: cpumask to indicate for which CPUs this device works
78 * @list: list head for the management code
79 */
80 struct clock_event_device {
81 void (*event_handler)(struct clock_event_device *);
/*
* Generic code does not need to call set_next_event directly because the kernel
 * provides an auxiliary function, clockevents_program_event(), for this.
*/
82 int (*set_next_event)(unsigned long evt,
83 struct clock_event_device *);
84 ktime_t next_event;
/*
 * max_delta_ns and min_delta_ns specify, respectively, the maximum and minimum
 * difference between the current time and the time of the next event.
* Consider, for instance, that the current time is 20, min_delta_ns is 2, and
* max_delta_ns is 40. Then the next event can take place during the time
* interval [22, 60] where the boundaries are included.
*/
85 u64 max_delta_ns;
86 u64 min_delta_ns;
/*
* mult and shift are a multiplier and divider, respectively, used to convert
* between clock cycles and nanosecond values.
 * For instance, converting 10 ns into device cycles yields
 * ((10 * mult) >> shift).
*/
87 u32 mult;
88 u32 shift;
89 enum clock_event_mode mode;
/*
* Clock event devices that support periodic events are identified by CLOCK_EVT_FEAT_PERIODIC
* CLOCK_EVT_FEAT_ONESHOT marks a clock capable of issuing one-shot events that happen
 * exactly once. Basically, this is the opposite of periodic events.
*/
90 unsigned int features;
91 unsigned long retries;
92
/*
 * broadcast is required for the broadcasting implementation that provides a
* workaround for nonfunctional local APICs on IA-32 and AMD64 in power-saving
* mode.
*/
93 void (*broadcast)(const struct cpumask *mask);
/*
 * set_mode points to a function that toggles the desired mode of operation
 * between periodic and one-shot mode. A device may support both modes, but
 * only one of them can be active at a time.
*/
94 void (*set_mode)(enum clock_event_mode mode,
95 struct clock_event_device *);
96 unsigned long min_delta_ticks;
97 unsigned long max_delta_ticks;
98
99 const char *name;
100 int rating;
/*
* irq specifies the number of the IRQ that is used by the event device. Note
 * that this is only required for global devices. Per-CPU local devices use different
* hardware mechanisms to emit signals and set irq to -1.
*/
101 int irq;
102 const struct cpumask *cpumask;
103 struct list_head list;
104 } ____cacheline_aligned;
24 /* The registered clock event devices */
25 static LIST_HEAD(clockevent_devices);
26 static LIST_HEAD(clockevents_released);
27
28 /* Notification for clock events */
29 static RAW_NOTIFIER_HEAD(clockevents_chain);
30
31 /* Protection for the above */
32 static DEFINE_RAW_SPINLOCK(clockevents_lock);
APIs:
EXPORT_SYMBOL_GPL(clockevent_delta2ns);
EXPORT_SYMBOL_GPL(clockevents_register_device);
EXPORT_SYMBOL_GPL(clockevents_notify);
34 /**
35 * clockevents_delta2ns - Convert a latch value (device ticks) to nanoseconds
36 * @latch: value to convert
37 * @evt: pointer to clock event device descriptor
38 *
39 * Math helper, returns latch value converted to nanoseconds (bound checked)
40 */
41 u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt)
42 {
43 u64 clc = (u64) latch << evt->shift;
44
45 if (unlikely(!evt->mult)) {
46 evt->mult = 1;
47 WARN_ON(1);
48 }
49
50 do_div(clc, evt->mult);
51 if (clc < 1000)
52 clc = 1000;
53 if (clc > KTIME_MAX)
54 clc = KTIME_MAX;
55
56 return clc;
57 }
58 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
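As a sanity check on the mult/shift math, here is a standalone worked example with assumed numbers (an event device ticking at 1 MHz, i.e. 1000 ns per device tick, and shift = 20); it simply mirrors the computation above outside the kernel:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t shift = 20;
        uint32_t mult  = (1u << shift) / 1000;          /* ns -> device ticks, ~1048 */
        uint64_t latch = 1193;                          /* device ticks to convert */
        uint64_t ns    = ((uint64_t)latch << shift) / mult;

        /* prints roughly latch * 1000 ns, as expected for a 1 MHz device */
        printf("%llu ns\n", (unsigned long long)ns);
        return 0;
}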
Call Tree:
clockevents_register_device
/*add clock event device to global clockevent_devices list */
clockevents_do_notify
raw_notifier_call_chain
clockevents_notify_released
/*
* All clockevent devices in global list @clockevents_released are added to
* global clockevent devices list @clockevent_devices
*/
clockevents_do_notify
raw_notifier_call_chain
176 /**
177 * clockevents_register_device - register a clock event device
178 * @dev: device to register
179 */
180 void clockevents_register_device(struct clock_event_device *dev)
181 {
182 unsigned long flags;
183
184 BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
185 if (!dev->cpumask) {
186 WARN_ON(num_possible_cpus() > 1);
187 dev->cpumask = cpumask_of(smp_processor_id());
188 }
189
190 raw_spin_lock_irqsave(&clockevents_lock, flags);
191
192 list_add(&dev->list, &clockevent_devices);
193 clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
194 clockevents_notify_released();
195
196 raw_spin_unlock_irqrestore(&clockevents_lock, flags);
197 }
198 EXPORT_SYMBOL_GPL(clockevents_register_device);
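A sketch of how a driver might register a one-shot clock event device (the hardware helper and the 1 MHz frequency are assumptions; clockevents_config_and_register() is the helper already seen in the i8253 initialization call tree earlier):

#include <linux/clockchips.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

static void my_timer_arm(unsigned long ticks)
{
        /* stand-in for programming the hardware to fire after 'ticks' device ticks */
}

static int my_set_next_event(unsigned long delta, struct clock_event_device *evt)
{
        my_timer_arm(delta);
        return 0;
}

static void my_set_mode(enum clock_event_mode mode, struct clock_event_device *evt)
{
        /* a real driver would start/stop the hardware here */
}

static struct clock_event_device my_evt = {
        .name           = "my_timer",
        .features       = CLOCK_EVT_FEAT_ONESHOT,
        .rating         = 300,
        .set_next_event = my_set_next_event,
        .set_mode       = my_set_mode,
};

static void my_evt_register(void)
{
        my_evt.cpumask = cpumask_of(smp_processor_id());
        /* computes mult/shift and min/max delta, then registers the device */
        clockevents_config_and_register(&my_evt, 1000000, 2, 0xffffffff);
}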
/*
* All clockevent devices in global list @clockevents_released are added to
* global clockevent devices list @clockevent_devices
*/
163 static void clockevents_notify_released(void)
164 {
165 struct clock_event_device *dev;
166
167 while (!list_empty(&clockevents_released)) {
168 dev = list_entry(clockevents_released.next,
169 struct clock_event_device, list);
170 list_del(&dev->list);
171 list_add(&dev->list, &clockevent_devices);
172 clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
173 }
174 }
302 /**
303 * clockevents_notify - notification about relevant events
304 */
305 void clockevents_notify(unsigned long reason, void *arg)
306 {
307 struct clock_event_device *dev, *tmp;
308 unsigned long flags;
309 int cpu;
310
311 raw_spin_lock_irqsave(&clockevents_lock, flags);
312 clockevents_do_notify(reason, arg);
313
314 switch (reason) {
315 case CLOCK_EVT_NOTIFY_CPU_DEAD:
316 /*
317 * Unregister the clock event devices which were
318 * released from the users in the notify chain.
319 */
320 list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
321 list_del(&dev->list);
322 /*
323 * Now check whether the CPU has left unused per cpu devices
324 */
325 cpu = *((int *)arg);
326 list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
327 if (cpumask_test_cpu(cpu, dev->cpumask) &&
328 cpumask_weight(dev->cpumask) == 1 &&
329 !tick_is_broadcast_device(dev)) {
330 BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
331 list_del(&dev->list);
332 }
333 }
334 break;
335 default:
336 break;
337 }
338 raw_spin_unlock_irqrestore(&clockevents_lock, flags);
339 }
340 EXPORT_SYMBOL_GPL(clockevents_notify);
Notification requests issued via clockevents_do_notify() are finally handled by
tick_notify() in kernel/time/tick-common.c.
Call Tree:
start_kernel
tick_init
clockevents_register_notifier(&tick_notifier)
407 static struct notifier_block tick_notifier = {
408 .notifier_call = tick_notify,
409 };
410
411 /**
412 * tick_init - initialize the tick control
413 *
414 * Register the notifier with the clockevents framework
415 */
416 void __init tick_init(void)
417 {
418 clockevents_register_notifier(&tick_notifier);
419 }
359 /*
360 * Notification about clock event devices
361 */
362 static int tick_notify(struct notifier_block *nb, unsigned long reason,
363 void *dev)
364 {
365 switch (reason) {
366
367 case CLOCK_EVT_NOTIFY_ADD:
368 return tick_check_new_device(dev);
369
370 case CLOCK_EVT_NOTIFY_BROADCAST_ON:
371 case CLOCK_EVT_NOTIFY_BROADCAST_OFF:
372 case CLOCK_EVT_NOTIFY_BROADCAST_FORCE:
373 tick_broadcast_on_off(reason, dev);
374 break;
375
376 case CLOCK_EVT_NOTIFY_BROADCAST_ENTER:
377 case CLOCK_EVT_NOTIFY_BROADCAST_EXIT:
378 tick_broadcast_oneshot_control(reason);
379 break;
380
381 case CLOCK_EVT_NOTIFY_CPU_DYING:
382 tick_handover_do_timer(dev);
383 break;
384
385 case CLOCK_EVT_NOTIFY_CPU_DEAD:
386 tick_shutdown_broadcast_oneshot(dev);
387 tick_shutdown_broadcast(dev);
388 tick_shutdown(dev);
389 break;
390
391 case CLOCK_EVT_NOTIFY_SUSPEND:
392 tick_suspend();
393 tick_suspend_broadcast();
394 break;
395
396 case CLOCK_EVT_NOTIFY_RESUME:
397 tick_resume();
398 break;
399
400 default:
401 break;
402 }
403
404 return NOTIFY_OK;
405 }
406
Before going further, a new structure is introduced here.
13 enum tick_device_mode {
14 TICKDEV_MODE_PERIODIC,
15 TICKDEV_MODE_ONESHOT,
16 };
17
18 struct tick_device {
19 struct clock_event_device *evtdev;
20 enum tick_device_mode mode;
21 };
A tick_device is just a wrapper around struct clock_event_device with an
additional field that specifies which mode the device is in. This can either be
periodic or one-shot. The distinction will become important when tickless systems
are considered. A tick device can be seen as a mechanism that provides a
continuous stream of tick events. These form the basis for the scheduler,
the classical timer wheel, and related components of the kernel.
Note that the kernel automatically creates a tick device when a new clock
event device is registered (see tick_check_new_device() for details).
Some global variables are defined.
- tick_cpu_device is a per-CPU list containing one instance of struct tick_device
for each CPU in the system.
- tick_next_period specifies the time (in nanoseconds) when the next global tick event will happen.
- tick_do_timer_cpu contains the CPU number whose tick device assumes the role of the
global tick device.
- tick_period stores the interval between ticks in nanoseconds.
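Their definitions live in kernel/time/tick-common.c and look roughly like this (quoted from memory, so treat the exact form as approximate):

DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
ktime_t tick_next_period;
ktime_t tick_period;
int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
static DEFINE_RAW_SPINLOCK(tick_device_lock);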
Let us get back to the analysis of tick_notify().
Call Tree:
tick_check_new_device // called by tick_notify() when a new clock event device is added
205 /*
206 * Check, if the new registered device should be used.
207 */
208 static int tick_check_new_device(struct clock_event_device *newdev)
209 {
210 struct clock_event_device *curdev;
211 struct tick_device *td;
212 int cpu, ret = NOTIFY_OK;
213 unsigned long flags;
214
215 raw_spin_lock_irqsave(&tick_device_lock, flags);
216
217 cpu = smp_processor_id();
218 if (!cpumask_test_cpu(cpu, newdev->cpumask))
219 goto out_bc;
220
/*
* tick_cpu_device is a per-CPU list containing one instance of struct tick_device
* for each CPU in the system.
*/
221 td = &per_cpu(tick_cpu_device, cpu);
222 curdev = td->evtdev;
223
224 /* cpu local device ? */
225 if (!cpumask_equal(newdev->cpumask, cpumask_of(cpu))) {
226
227 /*
228 * If the cpu affinity of the device interrupt can not
229 * be set, ignore it.
230 */
/* If the new device is not a CPU-local device, first check whether its interrupt
 * can be routed to the current CPU. If not, keep the original device.
*/
231 if (!irq_can_set_affinity(newdev->irq))
232 goto out_bc;
233
234 /*
235 * If we have a cpu local device already, do not replace it
236 * by a non cpu local device
237 */
238 if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
239 goto out_bc;
240 }
241
242 /*
243 * If we have an active device, then check the rating and the oneshot
244 * feature.
245 */
246 if (curdev) {
247 /*
248 * Prefer one shot capable devices !
249 */
250 if ((curdev->features & CLOCK_EVT_FEAT_ONESHOT) &&
251 !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
252 goto out_bc;
253 /*
254 * Check the rating
255 */
256 if (curdev->rating >= newdev->rating)
257 goto out_bc;
258 }
259
260 /*
261 * Replace the eventually existing device by the new
262 * device. If the current device is the broadcast device, do
263 * not give it back to the clockevents layer !
264 */
265 if (tick_is_broadcast_device(curdev)) {
/* Close the current device */
266 clockevents_shutdown(curdev);
267 curdev = NULL;
268 }
269 clockevents_exchange_device(curdev, newdev);
270 tick_setup_device(td, newdev, cpu, cpumask_of(cpu));
271 if (newdev->features & CLOCK_EVT_FEAT_ONESHOT)
272 tick_oneshot_notify();
273
274 raw_spin_unlock_irqrestore(&tick_device_lock, flags);
275 return NOTIFY_STOP;
276
277 out_bc:
278 /*
279 * Can the new device be used as a broadcast device ?
280 */
281 if (tick_check_broadcast_device(newdev))
282 ret = NOTIFY_STOP;
283
284 raw_spin_unlock_irqrestore(&tick_device_lock, flags);
285
286 return ret;
287 }
288
64 /*
65 * Check, if the device can be utilized as broadcast device:
66 */
67 int tick_check_broadcast_device(struct clock_event_device *dev)
68 {
69 if ((tick_broadcast_device.evtdev &&
70 tick_broadcast_device.evtdev->rating >= dev->rating) ||
71 (dev->features & CLOCK_EVT_FEAT_C3STOP))
72 return 0;
73
74 clockevents_exchange_device(NULL, dev);
75 tick_broadcast_device.evtdev = dev;
76 if (!cpumask_empty(tick_get_broadcast_mask()))
77 tick_broadcast_start_periodic(dev);
78 return 1;
79 }
271 /**
272 * clockevents_exchange_device - release and request clock devices
273 * @old: device to release (can be NULL)
274 * @new: device to request (can be NULL)
275 *
276 * Called from the notifier chain. clockevents_lock is held already
277 */
278 void clockevents_exchange_device(struct clock_event_device *old,
279 struct clock_event_device *new)
280 {
281 unsigned long flags;
282
283 local_irq_save(flags);
284 /*
285 * Caller releases a clock event device. We queue it into the
286 * released list and do a notify add later.
287 */
288 if (old) {
289 clockevents_set_mode(old, CLOCK_EVT_MODE_UNUSED);
290 list_del(&old->list);
291 list_add(&old->list, &clockevents_released);
292 }
293
294 if (new) {
295 BUG_ON(new->mode != CLOCK_EVT_MODE_UNUSED);
296 clockevents_shutdown(new);
297 }
298 local_irq_restore(flags);
299 }
Call Tree:
tick_notify
 tick_check_new_device
  tick_setup_device
147 /*
148 * Setup the tick device
149 */
150 static void tick_setup_device(struct tick_device *td,
151 struct clock_event_device *newdev, int cpu,
152 const struct cpumask *cpumask)
153 {
154 ktime_t next_event;
155 void (*handler)(struct clock_event_device *) = NULL;
156
157 /*
158 * First device setup ?
159 */
160 if (!td->evtdev) {
161 /*
162 * If no cpu took the do_timer update, assign it to
163 * this cpu:
164 */
165 if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
166 tick_do_timer_cpu = cpu;
167 tick_next_period = ktime_get();
168 tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
169 }
170
171 /*
172 * Startup in periodic mode first.
173 */
174 td->mode = TICKDEV_MODE_PERIODIC;
175 } else {
/* The CPU already had a clock event device */
176 handler = td->evtdev->event_handler;
177 next_event = td->evtdev->next_event;
178 td->evtdev->event_handler = clockevents_handle_noop;
179 }
180
/* Current CPU will use the new clock event device */
181 td->evtdev = newdev;
182
183 /*
184 * When the device is not per cpu, pin the interrupt to the
185 * current cpu:
186 */
187 if (!cpumask_equal(newdev->cpumask, cpumask))
188 irq_set_affinity(newdev->irq, cpumask);
189
190 /*
191 * When global broadcasting is active, check if the current
192 * device is registered as a placeholder for broadcast mode.
193 * This allows us to handle this x86 misfeature in a generic
194 * way.
195 */
196 if (tick_device_uses_broadcast(newdev, cpu))
197 return;
198
/* If the new device's work mode is periodic, ...*/
199 if (td->mode == TICKDEV_MODE_PERIODIC)
200 tick_setup_periodic(newdev, 0);
201 else
202 tick_setup_oneshot(newdev, handler, next_event);
203 }
204
Call Tree:
tick_setup_periodic // called by tick_setup_device()
tick_set_periodic_handler
tick_handle_periodic
or tick_handle_periodic_broadcast
/* Event device supports periodic events? */--y--> /* set event device to periodic mode */
|
| no
|---> /* Set event device to one-shot mode */
|---> /* Program next clock event */
114 /*
115 * Setup the device for a periodic tick
116 */
117 void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
118 {
119 tick_set_periodic_handler(dev, broadcast);
120
121 /* Broadcast setup ? */
122 if (!tick_device_is_functional(dev))
123 return;
124
125 if ((dev->features & CLOCK_EVT_FEAT_PERIODIC) &&
126 !tick_broadcast_oneshot_active()) {
127 clockevents_set_mode(dev, CLOCK_EVT_MODE_PERIODIC);
128 } else {
129 unsigned long seq;
130 ktime_t next;
131
132 do {
133 seq = read_seqbegin(&xtime_lock);
134 next = tick_next_period;
135 } while (read_seqretry(&xtime_lock, seq));
136
137 clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
138
139 for (;;) {
140 if (!clockevents_program_event(dev, next, ktime_get()))
141 return;
142 next = ktime_add(next, tick_period);
143 }
144 }
145 }
281 /*
282 * Set the periodic handler depending on broadcast on/off
283 */
284 void tick_set_periodic_handler(struct clock_event_device *dev, int broadcast)
285 {
286 if (!broadcast)
287 dev->event_handler = tick_handle_periodic;
288 else
289 dev->event_handler = tick_handle_periodic_broadcast;
290 }
Call Tree:
tick_handle_periodic
tick_periodic
/* CPU responsible for global tick? */ --> do_timer
update_process_times
profile_tick
/*device don't have periodic mode*/ --> clockevents_program_event
79 /*
80 * Event handler for periodic ticks
81 */
82 void tick_handle_periodic(struct clock_event_device *dev)
83 {
84 int cpu = smp_processor_id();
85 ktime_t next;
86
87 tick_periodic(cpu);
88
89 if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
90 return;
91 /*
92 * Setup the next period for devices, which do not have
93 * periodic mode:
94 */
95 next = ktime_add(dev->next_event, tick_period);
96 for (;;) {
97 if (!clockevents_program_event(dev, next, ktime_get()))
98 return;
99 /*
100 * Have to be careful here. If we're in oneshot mode,
101 * before we call tick_periodic() in a loop, we need
102 * to be sure we're using a real hardware clocksource.
103 * Otherwise we could get trapped in an infinite
104 * loop, as the tick_periodic() increments jiffies,
105 * when then will increment time, posibly causing
106 * the loop to trigger again and again.
107 */
108 if (timekeeping_valid_for_hres())
109 tick_periodic(cpu);
110 next = ktime_add(next, tick_period);
111 }
112 }
Call Tree:
tick_handle_periodic_broadcast
tick_do_periodic_broadcast
tick_do_broadcast
/*device don't have periodic mode*/ --> clockevents_program_event
172 /*
173 * Event handler for periodic broadcast ticks
174 */
175 static void tick_handle_periodic_broadcast(struct clock_event_device *dev)
176 {
177 ktime_t next;
178
179 tick_do_periodic_broadcast();
180
181 /*
182 * The device is in periodic mode. No reprogramming necessary:
183 */
184 if (dev->mode == CLOCK_EVT_MODE_PERIODIC)
185 return;
186
187 /*
188 * Setup the next period for devices, which do not have
189 * periodic mode. We read dev->next_event first and add to it
190 * when the event already expired. clockevents_program_event()
191 * sets dev->next_event only when the event is really
192 * programmed to the device.
193 */
194 for (next = dev->next_event; ;) {
195 next = ktime_add(next, tick_period);
196
197 if (!clockevents_program_event(dev, next, ktime_get()))
198 return;
199 tick_do_periodic_broadcast();
200 }
201 }
157 /*
158 * Periodic broadcast:
159 * - invoke the broadcast handlers
160 */
161 static void tick_do_periodic_broadcast(void)
162 {
163 raw_spin_lock(&tick_broadcast_lock);
164
165 cpumask_and(to_cpumask(tmpmask),
166 cpu_online_mask, tick_get_broadcast_mask());
167 tick_do_broadcast(to_cpumask(tmpmask));
168
169 raw_spin_unlock(&tick_broadcast_lock);
170 }
128 /*
129 * Broadcast the event to the cpus, which are set in the mask (mangled).
130 */
131 static void tick_do_broadcast(struct cpumask *mask)
132 {
133 int cpu = smp_processor_id();
134 struct tick_device *td;
135
136 /*
137 * Check, if the current cpu is in the mask
138 */
139 if (cpumask_test_cpu(cpu, mask)) {
140 cpumask_clear_cpu(cpu, mask);
141 td = &per_cpu(tick_cpu_device, cpu);
142 td->evtdev->event_handler(td->evtdev);
143 }
144
145 if (!cpumask_empty(mask)) {
146 /*
147 * It might be necessary to actually check whether the devices
148 * have different broadcast functions. For now, just use the
149 * one of the first device. This works as long as we have this
150 * misfeature only on x86 (lapic)
151 */
152 td = &per_cpu(tick_cpu_device, cpumask_first(mask));
153 td->evtdev->broadcast(mask);
154 }
155 }
Call Tree:
tick_setup_oneshot
114 /**
115 * tick_setup_oneshot - setup the event device for oneshot mode (hres or nohz)
116 */
117 void tick_setup_oneshot(struct clock_event_device *newdev,
118 void (*handler)(struct clock_event_device *),
119 ktime_t next_event)
120 {
121 newdev->event_handler = handler;
122 clockevents_set_mode(newdev, CLOCK_EVT_MODE_ONESHOT);
123 tick_dev_program_event(newdev, next_event, 1);
124 }