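This article does not cover the basic theory of processes. It focuses on the data structure the kernel uses to describe a process: task_struct.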
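Definition of a process
- A program in execution
- An instance of a program running on a computer
- An entity that can be assigned to a processor and executed by it
- A unit of activity characterized by the execution of a sequence of instructions, a current state, and an associated set of system resources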
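While a process executes, at any given moment it can be uniquely characterized by the following elements:
- Identifier: a unique identifier associated with the process, used to distinguish it from all other processes.
- State: the current state of the process (for example waiting, running, stopped).
- Priority: the priority of the process relative to other processes.
- Program counter: the address of the next instruction in the program to be executed.
- Memory pointers: pointers to the program code and the data associated with the process, plus pointers to any memory blocks shared with other processes.
- Context data: the data present in the processor registers while the process is executing.
- I/O status information: outstanding I/O requests, the I/O devices assigned to the process, and the list of files in use by the process.
- Accounting information: may include the total processor time and clock time used, time limits, account numbers, and so on.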
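All of this information is stored in a per-process data structure called the process control block (PCB). The kernel maintains one PCB for every process; in the Linux kernel the PCB is the task_struct structure. It resides in RAM for as long as the process exists.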
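Some of the fields listed above can be observed from user space. The following is a minimal sketch, assuming a Linux system with /proc mounted; `read_self_stat` is a hypothetical helper name, not a kernel or libc API. It reads the identifier and the one-letter state that the kernel exports for the calling process in /proc/self/stat.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the pid and the state letter of the calling
 * process from /proc/self/stat. Returns 0 on success, -1 on failure. */
int read_self_stat(int *pid, char *state)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line has the form "pid (comm) state ...". The command name may
     * itself contain spaces or ')', so search for the last ')'. */
    char *rparen = strrchr(buf, ')');
    if (rparen == NULL)
        return -1;
    if (sscanf(buf, "%d", pid) != 1)
        return -1;
    if (sscanf(rparen + 1, " %c", state) != 1)
        return -1;
    return 0;
}
```

A process that reads its own entry is by definition on a CPU at that moment, so the state letter it sees is `R` (running).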
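Process creation
In UNIX, a process is created through the fork() system call. When a process issues a fork() request, the operating system performs the following steps:
- Allocate a free slot in the process table for the new process;
- Assign a unique process identifier to the child;
- Allocate space for the child and make a logical copy of the parent's context, excluding shared memory regions;
- Initialize the process control block;
- Increment the reference counters of all files owned by the parent, since the child now uses them as well;
- Put the child into the ready state;
- Return the child's process ID to the parent, and return 0 to the child;
- Set up the appropriate links and place the new process on the ready or ready/suspended list.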
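The two return values of fork() described in the steps above can be seen in a few lines of user-space C. This is a minimal sketch; `fork_and_reap` is a hypothetical helper name introduced only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that immediately exits with `code`. Returns the exit
 * status collected by the parent, or -1 on error. */
int fork_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed, no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    /* parent: fork() returned the child's pid */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `fork_and_reap(7)` from a program's `main` should return 7, because the parent collects the status that the child passed to `_exit`.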
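The traditional UNIX system call for duplicating a process is fork. It is not the only call Linux implements for this purpose, however; Linux actually implements three:
- fork is the heavyweight call, because it creates a complete copy of the parent that then runs as the child. To reduce the work involved, Linux uses the copy-on-write technique.
- vfork is similar to fork but does not create a copy of the parent's data. Instead, parent and child share the data, and the parent is suspended until the child exits or calls exec. This saves a great deal of CPU time (if one process manipulates the shared data, the other automatically sees the change).
- clone creates threads and allows precise control over what is shared and what is copied between parent and child.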
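Copy-on-write: the kernel uses the copy-on-write (COW) technique to avoid copying all of the parent's data to the child when fork executes. Without COW, the kernel would have to create an identical copy of every memory page of the parent for the child at fork time. With COW, only the page tables are copied and the pages themselves are shared read-only; a page is duplicated only when one of the two processes writes to it.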
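From user space, copy-on-write is invisible except for its result: after fork, a write in one process never shows up in the other. The sketch below illustrates only that observable behaviour, not the kernel mechanism itself; `value_after_child_write` and `cow_demo_value` are hypothetical names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cow_demo_value = 1;

/* The child overwrites its copy of cow_demo_value. Returns the value the
 * parent sees afterwards, or -1 on error. */
int value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The write faults on the shared read-only page, and the kernel
         * gives the child a private copy before the store completes. */
        cow_demo_value = 42;
        _exit(cow_demo_value == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return cow_demo_value;      /* the parent's page was never modified */
}
```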
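Executing the system calls: the entry points for the fork, vfork, and clone system calls are the sys_fork, sys_vfork, and sys_clone functions respectively. Their definitions are architecture-dependent, because the way parameters are passed between user space and kernel space differs from one architecture to another.
The do_fork implementation: all three fork mechanisms eventually call do_fork in kernel/fork.c (an architecture-independent function). Its code flow is shown in the figure: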
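Process exit
A process must terminate with the exit system call. This gives the kernel the opportunity to release the resources used by the process back to the system. See do_exit in kernel/exit.c. In short, the function decrements the relevant reference counters; when a counter drops to 0 because no process uses the corresponding structure any longer, the associated memory is returned to the memory management subsystem.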
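The task_struct structure
All algorithms in the Linux kernel that deal with processes and programs are built around a data structure named task_struct, which is defined in include/linux/sched.h. It is one of the central structures of the system: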
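- Process state

    volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */

Possible values of state:

    #define TASK_RUNNING            0   // the process is either executing or ready to execute
    #define TASK_INTERRUPTIBLE      1   // interruptible sleep; can be woken by a signal
    #define TASK_UNINTERRUPTIBLE    2   // uninterruptible sleep; cannot be woken by a signal
    #define __TASK_STOPPED          4   // execution has been stopped
    #define __TASK_TRACED           8   // the process is being traced
    /* in tsk->exit_state */
    #define EXIT_ZOMBIE             16  // zombie: the process has terminated, but its parent has not yet collected its termination information
    #define EXIT_DEAD               32  // final state: the process is dead
    /* in tsk->state again */
    #define TASK_DEAD               64  // dead
    #define TASK_WAKEKILL           128 // the sleeping process may be woken by fatal signals
    #define TASK_WAKING             256 // the process is being woken up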
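As a small illustration of how these values are used, the sketch below maps a state value to its name. The constants are copied from the listing above (they differ between kernel versions), the combination `TASK_WAKEKILL | TASK_UNINTERRUPTIBLE` is what the kernel calls TASK_KILLABLE, and `task_state_name` is a hypothetical helper, not a kernel function.

```c
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
#define TASK_DEAD               64
#define TASK_WAKEKILL           128
#define TASK_WAKING             256
#define TASK_KILLABLE           (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical helper: returns a printable name for a tsk->state value. */
const char *task_state_name(long state)
{
    switch (state) {
    case TASK_RUNNING:          return "TASK_RUNNING";
    case TASK_INTERRUPTIBLE:    return "TASK_INTERRUPTIBLE";
    case TASK_UNINTERRUPTIBLE:  return "TASK_UNINTERRUPTIBLE";
    case __TASK_STOPPED:        return "__TASK_STOPPED";
    case __TASK_TRACED:         return "__TASK_TRACED";
    case TASK_DEAD:             return "TASK_DEAD";
    case TASK_KILLABLE:         return "TASK_KILLABLE";
    case TASK_WAKING:           return "TASK_WAKING";
    default:                    return "unknown or combined state";
    }
}
```

Note that TASK_RUNNING is 0, so "runnable" is the absence of any sleep or stop bit rather than a bit of its own.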
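- Unique process identifier (pid)

    pid_t pid;   // unique identifier of the task
    pid_t tgid;  // pid of the leader of the thread group

In Linux, all threads in a thread group share the PID of the thread group leader (the first lightweight process in the group); this value is stored in the tgid member. Only for the thread group leader is the pid member equal to tgid. Note that the getpid() system call returns the tgid of the current task, not its pid. (A thread is the smallest unit of scheduling; a process is the basic unit of resource allocation.)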
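The kernel exports both values in /proc/self/status as the `Tgid:` and `Pid:` lines. The following sketch assumes a Linux system with /proc mounted; `read_status_field` is a hypothetical helper. Because /proc/self always refers to the thread group leader, the two numbers match there; per-thread values live under /proc/self/task/<tid>/status.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the numeric value of the line
 * "<name>:\t<value>" in /proc/self/status, or -1 if it is not found. */
long read_status_field(const char *name)
{
    char line[256];
    long value = -1;
    size_t len = strlen(name);
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Match the exact field name followed by a colon. */
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

`read_status_field("Tgid")` is the number that getpid() reports for the calling process.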
- Process flags:

    unsigned int flags; // possible values of the flags member:

    #define PF_ALIGNWARN    0x00000001  /* Print alignment warning msgs */
                                        /* Not implemented yet, only for 486 */
    #define PF_STARTING     0x00000002  /* being created */
    #define PF_EXITING      0x00000004  /* getting shut down */
    #define PF_EXITPIDONE   0x00000008  /* pi exit done on shut down */
    #define PF_VCPU         0x00000010  /* I'm a virtual CPU */
    #define PF_FORKNOEXEC   0x00000040  /* forked but didn't exec */
    #define PF_MCE_PROCESS  0x00000080  /* process policy on mce errors */
    #define PF_SUPERPRIV    0x00000100  /* used super-user privileges */
    #define PF_DUMPCORE     0x00000200  /* dumped core */
    #define PF_SIGNALED     0x00000400  /* killed by a signal */
    #define PF_MEMALLOC     0x00000800  /* Allocating memory */
    #define PF_FLUSHER      0x00001000  /* responsible for disk writeback */
    #define PF_USED_MATH    0x00002000  /* if unset the fpu must be initialized before use */
    #define PF_FREEZING     0x00004000  /* freeze in progress. do not account to load */
    #define PF_NOFREEZE     0x00008000  /* this thread should not be frozen */
    #define PF_FROZEN       0x00010000  /* frozen for system suspend */
    #define PF_FSTRANS      0x00020000  /* inside a filesystem transaction */
    #define PF_KSWAPD       0x00040000  /* I am kswapd */
    #define PF_OOM_ORIGIN   0x00080000  /* Allocating much memory to others */
    #define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
    #define PF_KTHREAD      0x00200000  /* I am a kernel thread */
    #define PF_RANDOMIZE    0x00400000  /* randomize virtual address space */
    #define PF_SWAPWRITE    0x00800000  /* Allowed to write to swap */
    #define PF_SPREAD_PAGE  0x01000000  /* Spread page cache over cpuset */
    #define PF_SPREAD_SLAB  0x02000000  /* Spread some slab caches over cpuset */
    #define PF_THREAD_BOUND 0x04000000  /* Thread bound to specific cpu */
    #define PF_MCE_EARLY    0x08000000  /* Early kill for mce process policy */
    #define PF_MEMPOLICY    0x10000000  /* Non-default NUMA mempolicy */
    #define PF_MUTEX_TESTER 0x20000000  /* Thread belongs to the rt mutex tester */
    #define PF_FREEZER_SKIP 0x40000000  /* Freezer should not count it as freezeable */
    #define PF_FREEZER_NOSIG 0x80000000 /* Freezer won't send signals to it */
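Each PF_* value is a single bit, so flags are combined with bitwise OR and tested with bitwise AND. The sketch below shows this with two of the constants from the listing above; the helper names are hypothetical and only illustrate the bit operations.

```c
#include <stdbool.h>

#define PF_EXITING  0x00000004  /* getting shut down */
#define PF_KTHREAD  0x00200000  /* I am a kernel thread */

/* Returns true if every bit of `flag` is set in `flags`. */
bool task_flag_test(unsigned int flags, unsigned int flag)
{
    return (flags & flag) == flag;
}

/* Returns `flags` with `flag` set. */
unsigned int task_flag_set(unsigned int flags, unsigned int flag)
{
    return flags | flag;
}

/* Returns `flags` with `flag` cleared. */
unsigned int task_flag_clear(unsigned int flags, unsigned int flag)
{
    return flags & ~flag;
}
```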
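- Family relationships between processes:

    struct task_struct *real_parent;  /* real parent process */
    struct task_struct *parent;       /* recipient of SIGCHLD, wait4() reports */
    struct list_head children;        /* list of my children */
    struct list_head sibling;         /* linkage in my parent's children list */
    struct task_struct *group_leader; /* threadgroup leader */

In Linux, all processes are directly or indirectly related to each other. Every process has a parent and may have zero or more children. Processes that share the same parent are siblings.
real_parent points to the parent process; if the parent that created the process no longer exists, it points to the init process with PID 1.
parent points to the process that must be signalled (SIGCHLD) when this process terminates. Its value is usually the same as real_parent; it differs, for example, while the process is being traced with ptrace.
children is the head of a list whose elements are all the children of the process.
sibling links the current process into its parent's list of children (the sibling list).
group_leader points to the leader of the thread group the process belongs to.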
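The way children and sibling cooperate is easier to see in code: the parent owns the list head (children), and every child is linked into that list through its own sibling node. The following is a user-space sketch under that assumption; `struct mock_task` and all `mock_*` helpers are hypothetical, and `struct list_head` is re-declared here in simplified form rather than taken from the kernel headers.

```c
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on the kernel's. */
struct list_head { struct list_head *next, *prev; };

struct mock_task {
    int pid;
    struct mock_task *real_parent;
    struct list_head children;   /* head of the list of my children */
    struct list_head sibling;    /* my node in my parent's children list */
};

#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void mock_task_init(struct mock_task *t, int pid)
{
    t->pid = pid;
    t->real_parent = NULL;
    t->children.next = t->children.prev = &t->children;
    t->sibling.next = t->sibling.prev = &t->sibling;
}

/* Appends `child` to the end of `parent`'s children list. */
void mock_add_child(struct mock_task *parent, struct mock_task *child)
{
    struct list_head *head = &parent->children;
    child->real_parent = parent;
    child->sibling.prev = head->prev;
    child->sibling.next = head;
    head->prev->next = &child->sibling;
    head->prev = &child->sibling;
}

int mock_count_children(const struct mock_task *parent)
{
    int n = 0;
    const struct list_head *pos;
    for (pos = parent->children.next; pos != &parent->children; pos = pos->next)
        n++;
    return n;
}

/* Returns the oldest child, or NULL if there is none. */
struct mock_task *mock_first_child(struct mock_task *parent)
{
    if (parent->children.next == &parent->children)
        return NULL;
    return mock_container_of(parent->children.next, struct mock_task, sibling);
}
```

The list nodes are embedded in the task itself, which is why recovering the task from a node needs the `container_of` pointer arithmetic.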
- Scheduling information:

    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;
    struct sched_entity se;
    struct sched_rt_entity rt;
    unsigned int policy;
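Real-time priorities range from 0 to MAX_RT_PRIO-1 (that is, 0 to 99), while the static priorities of normal processes range from MAX_RT_PRIO to MAX_PRIO-1 (that is, 100 to 139). The larger the value, the lower the static priority.
static_prio holds the static priority, which can be changed with the nice system call.
rt_priority holds the real-time priority.
normal_prio depends on the static priority and on the scheduling policy. (Textbook scheduling algorithms include first-come-first-served, shortest-job-first, round-robin, and highest-response-ratio-next; the policies Linux actually offers are the ones stored in the policy member.)
prio holds the dynamic priority.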
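The relation between a nice value and the static priority is simple arithmetic: the nice range -20..19 is mapped onto 100..139. The sketch below mirrors the kernel's NICE_TO_PRIO and PRIO_TO_NICE macros as plain functions; the function names are chosen here for illustration.

```c
#define MAX_RT_PRIO   100
#define NICE_WIDTH    40
#define MAX_PRIO      (MAX_RT_PRIO + NICE_WIDTH)      /* 140 */
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)  /* 120, i.e. nice 0 */

/* Converts a nice value (-20..19) to a static priority (100..139). */
int nice_to_static_prio(int nice)
{
    return nice + DEFAULT_PRIO;
}

/* Converts a static priority (100..139) back to a nice value. */
int static_prio_to_nice(int prio)
{
    return prio - DEFAULT_PRIO;
}
```

A process with the default nice value of 0 therefore has a static priority of 120.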
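policy holds the scheduling policy of the process. There are currently five main policies:

    #define SCHED_NORMAL    0   // normal time-sharing policy, handled by the CFS scheduler
    #define SCHED_FIFO      1   // first-in, first-out; real-time
    #define SCHED_RR        2   // round-robin with time slices; real-time
    #define SCHED_BATCH     3   // for non-interactive, CPU-bound processes
    #define SCHED_IDLE      5   // used when the system load is very low
    #define SCHED_RESET_ON_FORK 0x40000000

SCHED_NORMAL is used for normal processes and is implemented by the CFS scheduler;
SCHED_BATCH is used for non-interactive, CPU-bound processes;
SCHED_IDLE is used when the system load is very low;
SCHED_FIFO (first-in, first-out scheduling) and SCHED_RR (round-robin scheduling) are both real-time scheduling policies.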
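From user space the policy of a process can be queried with the POSIX call sched_getscheduler(). The following sketch maps the numeric values from the listing above to their names; `policy_name` and `current_policy_name` are hypothetical helpers. On Linux the returned value may have the SCHED_RESET_ON_FORK bit ORed in, so it is masked off before the lookup.

```c
#include <sched.h>

#define DEMO_RESET_ON_FORK 0x40000000  /* same value as SCHED_RESET_ON_FORK */

/* Maps a policy number (as listed above) to its name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case 0:  return "SCHED_NORMAL";
    case 1:  return "SCHED_FIFO";
    case 2:  return "SCHED_RR";
    case 3:  return "SCHED_BATCH";
    case 5:  return "SCHED_IDLE";
    default: return "unknown";
    }
}

/* Returns the policy name of the calling process, or "error". */
const char *current_policy_name(void)
{
    int policy = sched_getscheduler(0);   /* 0 means the calling process */
    if (policy < 0)
        return "error";
    return policy_name(policy & ~DEMO_RESET_ON_FORK);
}
```

User-space headers call policy 0 SCHED_OTHER; SCHED_NORMAL is the kernel's name for the same value.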
- The ptrace system call

    unsigned int ptrace;
    struct list_head ptraced;
    struct list_head ptrace_entry;
    unsigned long ptrace_message;
    siginfo_t *last_siginfo; /* For ptrace use. */
    #ifdef CONFIG_HAVE_HW_BREAKPOINT
    atomic_t ptrace_bp_refcnt;
    #endif

The ptrace member is 0 when the process is not being traced. Its possible values are:

    /* linux-4.4.4/include/linux/ptrace.h */
    /*
     * Ptrace flags
     *
     * The owner ship rules for task->ptrace which holds the ptrace
     * flags is simple. When a task is running it owns it's task->ptrace
     * flags. When the a task is stopped the ptracer owns task->ptrace.
     */
    #define PT_SEIZED       0x00010000  /* SEIZE used, enable new behavior */
    #define PT_PTRACED      0x00000001
    #define PT_DTRACE       0x00000002  /* delayed trace (used on m68k, i386) */
    #define PT_PTRACE_CAP   0x00000004  /* ptracer can follow suid-exec */

    #define PT_OPT_FLAG_SHIFT   3
    /* PT_TRACE_* event enable flags */
    #define PT_EVENT_FLAG(event)    (1 << (PT_OPT_FLAG_SHIFT + (event)))
    #define PT_TRACESYSGOOD     PT_EVENT_FLAG(0)
    #define PT_TRACE_FORK       PT_EVENT_FLAG(PTRACE_EVENT_FORK)
    #define PT_TRACE_VFORK      PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
    #define PT_TRACE_CLONE      PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
    #define PT_TRACE_EXEC       PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
    #define PT_TRACE_VFORK_DONE PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
    #define PT_TRACE_EXIT       PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
    #define PT_TRACE_SECCOMP    PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)

    #define PT_EXITKILL         (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
    #define PT_SUSPEND_SECCOMP  (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
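The PT_EVENT_FLAG macro turns an event number into a bit position above the three low flag bits. The worked sketch below recomputes a few of the resulting values; the event numbers (fork = 1, exec = 4, exit = 6) are the ones defined in the Linux UAPI ptrace header, and `pt_event_flag` is a hypothetical wrapper used only to make the arithmetic testable.

```c
#define PT_OPT_FLAG_SHIFT   3
#define PT_EVENT_FLAG(event)    (1 << (PT_OPT_FLAG_SHIFT + (event)))

/* Event numbers as defined in the Linux UAPI header linux/ptrace.h. */
enum {
    DEMO_PTRACE_EVENT_FORK = 1,
    DEMO_PTRACE_EVENT_EXEC = 4,
    DEMO_PTRACE_EVENT_EXIT = 6
};

/* Returns the task->ptrace bit that enables reporting of `event`. */
unsigned int pt_event_flag(int event)
{
    return (unsigned int)PT_EVENT_FLAG(event);
}
```

For example, PT_TRACE_FORK is `1 << (3 + 1)`, which is 0x10, and PT_TRACESYSGOOD is `1 << 3`, which is 0x8, so neither collides with PT_PTRACED, PT_DTRACE, or PT_PTRACE_CAP in the low three bits.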
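- Time-related members

The period from the creation of a process to its termination is called its lifetime. The kernel has to record the CPU time a process consumes during its lifetime. This time is split into two parts: time spent in user mode and time spent in system (kernel) mode.

    cputime_t utime, stime, utimescaled, stimescaled;
    cputime_t gtime;
    cputime_t prev_utime, prev_stime;    // previously recorded run times (user mode and kernel mode)
    unsigned long nvcsw, nivcsw;         // voluntary / involuntary context switch counts
    struct timespec start_time;          // time at which the process started
    struct timespec real_start_time;     // real start time of the process
    unsigned long min_flt, maj_flt;
    struct task_cputime cputime_expires; // CPU time expiry values
    struct list_head cpu_timers[3];      // CPU timers tracking the process or thread group
    struct list_head run_list;
    unsigned long timeout;               // time used so far (difference from the start time)
    unsigned int time_slice;             // size of the process's time slice
    int nr_cpus_allowed;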
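Several of these counters are visible to a process through the POSIX getrusage() call: ru_utime and ru_stime correspond to utime and stime, ru_nvcsw and ru_nivcsw to nvcsw and nivcsw, and ru_minflt and ru_majflt to min_flt and maj_flt. The sketch below collects them into a small struct; `struct usage_snapshot` and `snapshot_usage` are hypothetical names.

```c
#include <sys/time.h>
#include <sys/resource.h>

struct usage_snapshot {
    double user_sec;        /* time spent in user mode   (utime) */
    double sys_sec;         /* time spent in kernel mode (stime) */
    long voluntary_cs;      /* voluntary context switches   (nvcsw)  */
    long involuntary_cs;    /* involuntary context switches (nivcsw) */
    long minor_faults;      /* page faults without I/O (min_flt) */
    long major_faults;      /* page faults with I/O    (maj_flt) */
};

/* Fills `out` with the accounting data of the calling process.
 * Returns 0 on success, -1 on failure. */
int snapshot_usage(struct usage_snapshot *out)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->user_sec = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    out->sys_sec = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    out->voluntary_cs = ru.ru_nvcsw;
    out->involuntary_cs = ru.ru_nivcsw;
    out->minor_faults = ru.ru_minflt;
    out->major_faults = ru.ru_majflt;
    return 0;
}
```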
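- Signal handling information

    struct signal_struct *signal;   // points to the signal descriptor of the process
    struct sighand_struct *sighand; // points to the signal handler descriptor of the process
    sigset_t blocked, real_blocked; // masks of blocked signals
    sigset_t saved_sigmask;         /* restored if set_restore_sigmask() was used */
    struct sigpending pending;      // signals still waiting to be handled by the process
    unsigned long sas_ss_sp;        // address of the alternate signal handler stack
    size_t sas_ss_size;             // size of the alternate signal handler stack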
- File system information

    /* filesystem information */
    struct fs_struct *fs;       // pointer to the file system information
    /* open file information */
    struct files_struct *files; // pointer to the open file information
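The descriptor table that files points to is exported per process under /proc/self/fd, with one entry per open descriptor. The sketch below counts those entries; it assumes a Linux system with /proc mounted, and `count_open_fds` is a hypothetical helper.

```c
#include <dirent.h>
#include <string.h>

/* Counts the entries of /proc/self/fd, i.e. the descriptors currently
 * open in the calling process. The count includes the descriptor that
 * opendir() itself uses. Returns -1 on error. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Opening a pipe between two calls should raise the result by two, one for each end of the pipe.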
Full definition of task_struct

    /* linux-4.4.4/include/linux/sched.h */
    struct task_struct {
        volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
        void *stack;            // pointer to the kernel stack
        atomic_t usage;         // reference count of this structure
        unsigned int flags;     /* per process flags, defined below */
        unsigned int ptrace;    // ptrace flags, used to trace the process

    #ifdef CONFIG_SMP           // only compiled on multiprocessor (SMP) kernels
        struct llist_node wake_entry;
        int on_cpu;
        unsigned int wakee_flips;
        unsigned long wakee_flip_decay_ts;
        struct task_struct *last_wakee;

        int wake_cpu;
    #endif
        int on_rq;              // run queue and scheduling related members

        int prio, static_prio, normal_prio;
        unsigned int rt_priority;   // priorities
        const struct sched_class *sched_class;
        struct sched_entity se;
        struct sched_rt_entity rt;
    #ifdef CONFIG_CGROUP_SCHED  // cgroup scheduling group
        struct task_group *sched_task_group;
    #endif
        struct sched_dl_entity dl;

    #ifdef CONFIG_PREEMPT_NOTIFIERS
        /* list of struct preempt_notifier: */
        struct hlist_head preempt_notifiers;
    #endif

    #ifdef CONFIG_BLK_DEV_IO_TRACE  // block device I/O tracing
        unsigned int btrace_seq;
    #endif

        unsigned int policy;
        int nr_cpus_allowed;
        cpumask_t cpus_allowed;

    #ifdef CONFIG_PREEMPT_RCU   // RCU synchronization primitive
        int rcu_read_lock_nesting;
        union rcu_special rcu_read_unlock_special;
        struct list_head rcu_node_entry;
        struct rcu_node *rcu_blocked_node;
    #endif /* #ifdef CONFIG_PREEMPT_RCU */
    #ifdef CONFIG_TASKS_RCU
        unsigned long rcu_tasks_nvcsw;
        bool rcu_tasks_holdout;
        struct list_head rcu_tasks_holdout_list;
        int rcu_tasks_idle_cpu;
    #endif /* #ifdef CONFIG_TASKS_RCU */

    #ifdef CONFIG_SCHED_INFO
        struct sched_info sched_info;
    #endif

        struct list_head tasks;
    #ifdef CONFIG_SMP
        struct plist_node pushable_tasks;
        struct rb_node pushable_dl_tasks;
    #endif

        // manages the address space of the process; on 32-bit x86 every
        // process has its own 4 GB address space
        struct mm_struct *mm, *active_mm;
        /* per-thread vma caching */
        u32 vmacache_seqnum;
        struct vm_area_struct *vmacache[VMACACHE_SIZE];
    #if defined(SPLIT_RSS_COUNTING)
        struct task_rss_stat rss_stat;
    #endif
    /* task state */
        int exit_state;
        int exit_code, exit_signal;
        int pdeath_signal;  /* The signal sent when the parent dies */
        unsigned long jobctl;   /* JOBCTL_*, siglock protected */

        /* Used for emulating ABI behavior of previous Linux versions */
        unsigned int personality;

        /* scheduler bits, serialized by scheduler locks */
        unsigned sched_reset_on_fork:1;
        unsigned sched_contributes_to_load:1;
        unsigned sched_migrated:1;
        unsigned :0; /* force alignment to the next boundary */

        /* unserialized, strictly 'current' */
        unsigned in_execve:1; /* bit to tell LSMs we're in execve */
        unsigned in_iowait:1;
    #ifdef CONFIG_MEMCG
        unsigned memcg_may_oom:1;
    #endif
    #ifdef CONFIG_MEMCG_KMEM
        unsigned memcg_kmem_skip_account:1;
    #endif
    #ifdef CONFIG_COMPAT_BRK
        unsigned brk_randomized:1;
    #endif

        unsigned long atomic_flags; /* Flags needing atomic access. */

        struct restart_block restart_block;

        pid_t pid;
        pid_t tgid;

    #ifdef CONFIG_CC_STACKPROTECTOR // guards against kernel stack overflow
        /* Canary value for the -fstack-protector gcc feature */
        unsigned long stack_canary;
    #endif
        /*
         * pointers to (original) parent process, youngest child, younger sibling,
         * older sibling, respectively.  (p->father can be replaced with
         * p->real_parent->pid)
         */
        struct task_struct __rcu *real_parent; /* real parent process */
        struct task_struct __rcu *parent; /* recipient of SIGCHLD, wait4() reports */
        /*
         * children/sibling forms the list of my natural children
         */
        struct list_head children;  /* list of my children */
        struct list_head sibling;   /* linkage in my parent's children list */
        struct task_struct *group_leader;   /* threadgroup leader */

        /*
         * ptraced is the list of tasks this task is using ptrace on.
         * This includes both natural children and PTRACE_ATTACH targets.
         * p->ptrace_entry is p's link on the p->parent->ptraced list.
         */
        struct list_head ptraced;
        struct list_head ptrace_entry;

        /* PID/PID hash table linkage. */
        struct pid_link pids[PIDTYPE_MAX];
        struct list_head thread_group;
        struct list_head thread_node;

        // members used by do_fork()
        struct completion *vfork_done;      /* for vfork() */
        int __user *set_child_tid;      /* CLONE_CHILD_SETTID */
        int __user *clear_child_tid;        /* CLONE_CHILD_CLEARTID */

        // utime: time spent in user mode; stime: time spent in kernel mode
        cputime_t utime, stime, utimescaled, stimescaled;
        cputime_t gtime;
        struct prev_cputime prev_cputime;
    #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
        seqlock_t vtime_seqlock;
        unsigned long long vtime_snap;
        enum {
            VTIME_SLEEPING = 0,
            VTIME_USER,
            VTIME_SYS,
        } vtime_snap_whence;
    #endif
        unsigned long nvcsw, nivcsw; /* context switch counts */
        u64 start_time;     /* monotonic time in nsec */
        u64 real_start_time;    /* boot based time in nsec */
    /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
        unsigned long min_flt, maj_flt;

        struct task_cputime cputime_expires;
        struct list_head cpu_timers[3];

    /* process credentials */
        const struct cred __rcu *real_cred; /* objective and real subjective task
                         * credentials (COW) */
        const struct cred __rcu *cred;  /* effective (overridable) subjective task
                         * credentials (COW) */
        char comm[TASK_COMM_LEN]; /* executable name excluding path
                         - access with [gs]et_task_comm (which lock
                           it with task_lock())
                         - initialized normally by setup_new_exec */
    /* file system info */
        struct nameidata *nameidata;
    #ifdef CONFIG_SYSVIPC
    /* ipc stuff */
        struct sysv_sem sysvsem;
        struct sysv_shm sysvshm;
    #endif
    #ifdef CONFIG_DETECT_HUNG_TASK
    /* hung task detection */
        unsigned long last_switch_count;
    #endif
    /* filesystem information */
        struct fs_struct *fs;
    /* open file information */
        struct files_struct *files;
    /* namespaces */
        struct nsproxy *nsproxy;
    /* signal handlers */
        struct signal_struct *signal;   // signals
        struct sighand_struct *sighand;

        sigset_t blocked, real_blocked;
        sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
        struct sigpending pending;

        unsigned long sas_ss_sp;
        size_t sas_ss_size;

        struct callback_head *task_works;

        struct audit_context *audit_context;
    #ifdef CONFIG_AUDITSYSCALL
        kuid_t loginuid;
        unsigned int sessionid;
    #endif
        struct seccomp seccomp;

    /* Thread group tracking */
        u32 parent_exec_id;
        u32 self_exec_id;
    /* Protection of (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed,
     * mempolicy */
        spinlock_t alloc_lock;

        /* Protection of the PI data structures: */
        raw_spinlock_t pi_lock;

        struct wake_q_node wake_q;

    #ifdef CONFIG_RT_MUTEXES
        /* PI waiters blocked on a rt_mutex held by this task */
        struct rb_root pi_waiters;
        struct rb_node *pi_waiters_leftmost;
        /* Deadlock detection and priority inheritance handling */
        struct rt_mutex_waiter *pi_blocked_on;
    #endif

    #ifdef CONFIG_DEBUG_MUTEXES
        /* mutex deadlock detection */
        struct mutex_waiter *blocked_on;
    #endif
    #ifdef CONFIG_TRACE_IRQFLAGS
        unsigned int irq_events;
        unsigned long hardirq_enable_ip;
        unsigned long hardirq_disable_ip;
        unsigned int hardirq_enable_event;
        unsigned int hardirq_disable_event;
        int hardirqs_enabled;
        int hardirq_context;
        unsigned long softirq_disable_ip;
        unsigned long softirq_enable_ip;
        unsigned int softirq_disable_event;
        unsigned int softirq_enable_event;
        int softirqs_enabled;
        int softirq_context;
    #endif
    #ifdef CONFIG_LOCKDEP
    # define MAX_LOCK_DEPTH 48UL
        u64 curr_chain_key;
        int lockdep_depth;
        unsigned int lockdep_recursion;
        struct held_lock held_locks[MAX_LOCK_DEPTH];
        gfp_t lockdep_reclaim_gfp;
    #endif

    /* journalling filesystem info */
        void *journal_info;     // journalling filesystem information

    /* stacked block device info */
        struct bio_list *bio_list;

    #ifdef CONFIG_BLOCK
    /* stack plugging */
        struct blk_plug *plug;
    #endif

    /* VM state */
        struct reclaim_state *reclaim_state;

        struct backing_dev_info *backing_dev_info;

        struct io_context *io_context;  // information used by the I/O scheduler

        unsigned long ptrace_message;
        siginfo_t *last_siginfo; /* For ptrace use.  */
        struct task_io_accounting ioac;
    #if defined(CONFIG_TASK_XACCT)
        u64 acct_rss_mem1;  /* accumulated rss usage */
        u64 acct_vm_mem1;   /* accumulated virtual memory usage */
        cputime_t acct_timexpd; /* stime + utime since last update */
    #endif
    #ifdef CONFIG_CPUSETS
        nodemask_t mems_allowed;    /* Protected by alloc_lock */
        seqcount_t mems_allowed_seq;    /* Seqence no to catch updates */
        int cpuset_mem_spread_rotor;
        int cpuset_slab_spread_rotor;
    #endif
    #ifdef CONFIG_CGROUPS
        /* Control Group info protected by css_set_lock */
        struct css_set __rcu *cgroups;
        /* cg_list protected by css_set_lock and tsk->alloc_lock */
        struct list_head cg_list;
    #endif
    #ifdef CONFIG_FUTEX         // futex synchronization mechanism
        struct robust_list_head __user *robust_list;
    #ifdef CONFIG_COMPAT
        struct compat_robust_list_head __user *compat_robust_list;
    #endif
        struct list_head pi_state_list;
        struct futex_pi_state *pi_state_cache;
    #endif
    #ifdef CONFIG_PERF_EVENTS
        struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
        struct mutex perf_event_mutex;
        struct list_head perf_event_list;
    #endif
    #ifdef CONFIG_DEBUG_PREEMPT
        unsigned long preempt_disable_ip;
    #endif
    #ifdef CONFIG_NUMA
        struct mempolicy *mempolicy;    /* Protected by alloc_lock */
        short il_next;
        short pref_node_fork;
    #endif
    #ifdef CONFIG_NUMA_BALANCING
        int numa_scan_seq;
        unsigned int numa_scan_period;
        unsigned int numa_scan_period_max;
        int numa_preferred_nid;
        unsigned long numa_migrate_retry;
        u64 node_stamp;         /* migration stamp  */
        u64 last_task_numa_placement;
        u64 last_sum_exec_runtime;
        struct callback_head numa_work;

        struct list_head numa_entry;
        struct numa_group *numa_group;

        /*
         * numa_faults is an array split into four regions:
         * faults_memory, faults_cpu, faults_memory_buffer, faults_cpu_buffer
         * in this precise order.
         *
         * faults_memory: Exponential decaying average of faults on a per-node
         * basis. Scheduling placement decisions are made based on these
         * counts. The values remain static for the duration of a PTE scan.
         * faults_cpu: Track the nodes the process was running on when a NUMA
         * hinting fault was incurred.
         * faults_memory_buffer and faults_cpu_buffer: Record faults per node
         * during the current scan window. When the scan completes, the counts
         * in faults_memory and faults_cpu decay and these values are copied.
         */
        unsigned long *numa_faults;
        unsigned long total_numa_faults;

        /*
         * numa_faults_locality tracks if faults recorded during the last
         * scan window were remote/local or failed to migrate. The task scan
         * period is adapted based on the locality of the faults with different
         * weights depending on whether they were shared or private faults
         */
        unsigned long numa_faults_locality[3];

        unsigned long numa_pages_migrated;
    #endif /* CONFIG_NUMA_BALANCING */

    #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
        struct tlbflush_unmap_batch tlb_ubc;
    #endif

        struct rcu_head rcu;    // RCU list head

        /*
         * cache last used pipe for splice
         */
        struct pipe_inode_info *splice_pipe;

        struct page_frag task_frag;

    #ifdef CONFIG_TASK_DELAY_ACCT
        struct task_delay_info *delays;
    #endif
    #ifdef CONFIG_FAULT_INJECTION
        int make_it_fail;
    #endif
        /*
         * when (nr_dirtied >= nr_dirtied_pause), it's time to call
         * balance_dirty_pages() for some dirty throttling pause
         */
        int nr_dirtied;
        int nr_dirtied_pause;
        unsigned long dirty_paused_when; /* start of a write-and-pause period */

    #ifdef CONFIG_LATENCYTOP
        int latency_record_count;
        struct latency_record latency_record[LT_SAVECOUNT];
    #endif
        /*
         * time slack values; these are used to round up poll() and
         * select() etc timeout values. These are in nanoseconds.
         */
        unsigned long timer_slack_ns;
        unsigned long default_timer_slack_ns;

    #ifdef CONFIG_KASAN
        unsigned int kasan_depth;
    #endif
    #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        /* Index of current stored address in ret_stack */
        int curr_ret_stack;
        /* Stack of return addresses for return function tracing */
        struct ftrace_ret_stack *ret_stack;
        /* time stamp for last schedule */
        unsigned long long ftrace_timestamp;
        /*
         * Number of functions that haven't been traced
         * because of depth overrun.
         */
        atomic_t trace_overrun;
        /* Pause for the tracing */
        atomic_t tracing_graph_pause;
    #endif
    #ifdef CONFIG_TRACING
        /* state flags for use by tracers */
        unsigned long trace;
        /* bitmask and counter of trace recursion */
        unsigned long trace_recursion;
    #endif /* CONFIG_TRACING */
    #ifdef CONFIG_MEMCG
        struct mem_cgroup *memcg_in_oom;
        gfp_t memcg_oom_gfp_mask;
        int memcg_oom_order;

        /* number of pages to reclaim on returning to userland */
        unsigned int memcg_nr_pages_over_high;
    #endif
    #ifdef CONFIG_UPROBES
        struct uprobe_task *utask;
    #endif
    #if defined(CONFIG_BCACHE) || defined(CONFIG_BCACHE_MODULE)
        unsigned int sequential_io;
        unsigned int sequential_io_avg;
    #endif
    #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
        unsigned long task_state_change;
    #endif
        int pagefault_disabled;
    /* CPU-specific state of this task */
        struct thread_struct thread;
    /*
     * WARNING: on x86, 'thread_struct' contains a variable-sized
     * structure.  It *MUST* be at the end of 'task_struct'.
     *
     * Do not put anything below here!
     */
    };