sched_setup(9) perform round-robin scheduling of runnable processes

Other Alias

curpriority_cmp, maybe_resched, resetpriority, roundrobin, roundrobin_interval, schedclock, schedcpu, setrunnable, updatepri

SYNOPSIS

#include <sys/param.h>
#include <sys/proc.h>

int curpriority_cmp(struct proc *p);
void maybe_resched(struct thread *td);
void propagate_priority(struct proc *p);
void resetpriority(struct ksegrp *kg);
void roundrobin(void *arg);
int roundrobin_interval(void);
void sched_setup(void *dummy);
void schedclock(struct thread *td);
void schedcpu(void *arg);
void setrunnable(struct thread *td);
void updatepri(struct thread *td);

DESCRIPTION

Each process has three different priorities stored in struct proc: p_usrpri, p_nativepri, and p_priority.

The p_usrpri member is the user priority of the process, calculated from the process's estimated CPU time and nice level.

The p_nativepri member is the saved priority used by propagate_priority(). When a process obtains a mutex, its priority is saved in p_nativepri. While it holds the mutex, the process's priority may be bumped by another process that blocks on the mutex. When the process releases the mutex, its priority is restored to the value saved in p_nativepri.
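The save, bump, and restore cycle described above can be sketched in user space as follows. This is an illustration only: struct lend_proc and the three helper functions are hypothetical names, not kernel interfaces, and the sketch assumes the traditional BSD convention that a numerically lower value means a higher priority.

```c
/* Hypothetical stand-in for the two struct proc members involved. */
struct lend_proc {
	int p_priority;		/* actual priority (assumed: lower = higher) */
	int p_nativepri;	/* priority saved when the mutex was obtained */
};

/* On obtaining the mutex: remember the priority the process came in with. */
static void
lend_mtx_obtained(struct lend_proc *p)
{
	p->p_nativepri = p->p_priority;
}

/* A process with priority blocked_pri blocks on the mutex held by p. */
static void
lend_bump(struct lend_proc *p, int blocked_pri)
{
	if (blocked_pri < p->p_priority)
		p->p_priority = blocked_pri;
}

/* On releasing the mutex: drop back to the saved priority. */
static void
lend_mtx_released(struct lend_proc *p)
{
	p->p_priority = p->p_nativepri;
}
```

In this sketch a bump by a lower-priority waiter is ignored, which is one reading of "may be bumped".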

The p_priority member is the actual priority of the process and is used, for example, to determine which runqueue(9) it is placed on.

The curpriority_cmp() function compares the cached priority of the currently running process with that of process p. If the currently running process has a higher priority, it returns a value less than zero. If the current process has a lower priority, it returns a value greater than zero. If the current process has the same priority as p, then curpriority_cmp() returns zero. The cached priority of the currently running process is updated when a process resumes from tsleep(9) or returns to userland in userret(), and is stored in the private variable curpriority.
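The return-value convention can be illustrated with a minimal user-space sketch. It assumes the traditional BSD convention that a numerically lower value means a higher priority, which this page does not state. The names struct cmp_proc, cmp_curpriority, and cmp_sketch() are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for the one struct proc member used here. */
struct cmp_proc {
	int p_priority;		/* assumed: lower value = higher priority */
};

/* Stand-in for the private cached priority of the running process. */
static int cmp_curpriority;

/*
 * Returns < 0 if the running process has the higher priority,
 * > 0 if p has the higher priority, and 0 if the two are equal.
 */
static int
cmp_sketch(const struct cmp_proc *p)
{
	return (cmp_curpriority - p->p_priority);
}
```

A caller would treat a result greater than zero as a sign that p should preempt the running process.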

The maybe_resched() function compares the priorities of the current thread and td. If td has a higher priority than the current thread, then a context switch is needed, and KEF_NEEDRESCHED is set.
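A minimal sketch of this check is shown below. It is a user-space model: the current thread is passed explicitly instead of being looked up, a boolean stands in for the KEF_NEEDRESCHED flag, and struct resched_thread and the other names are hypothetical. It again assumes that a lower value means a higher priority.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the thread fields involved. */
struct resched_thread {
	int td_priority;	/* assumed: lower value = higher priority */
};

/* Stand-in for KEF_NEEDRESCHED: true once a context switch is needed. */
static bool resched_needed;

/* Request a reschedule only if td outranks the current thread. */
static void
resched_sketch(const struct resched_thread *cur,
    const struct resched_thread *td)
{
	if (td->td_priority < cur->td_priority)
		resched_needed = true;
}
```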

The propagate_priority() function looks at the process that owns the mutex that p is blocked on. That process's priority is bumped to the priority of p if needed. If the process is currently running, then the function returns. If the process is on a runqueue(9), then the process is moved to the appropriate runqueue(9) for its new priority. If the process is blocked on a mutex, its position in the list of processes blocked on that mutex is updated to reflect its new priority. The function then repeats the procedure using the process that owns the mutex just encountered. Note that a process's priority is only bumped to the priority of the original process p, not to the priority of the previously encountered process.
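The chain walk can be sketched as follows. This is a simplified user-space model, not the kernel code: struct chain_proc, struct chain_mtx, and chain_propagate() are hypothetical, requeueing and list repositioning are reduced to comments, and the early exits for an owner that already has a sufficient priority and for an owner on a run queue are assumptions. A lower value is assumed to mean a higher priority.

```c
#include <stddef.h>

struct chain_mtx;

enum chain_state { CHAIN_RUNNING, CHAIN_ON_RUNQ, CHAIN_BLOCKED };

/* Hypothetical stand-in for the process fields involved. */
struct chain_proc {
	int			 p_priority;	/* assumed: lower = higher */
	enum chain_state	 p_state;
	struct chain_mtx	*p_blocked;	/* mutex blocked on, or NULL */
};

struct chain_mtx {
	struct chain_proc	*mtx_owner;	/* NULL if unowned */
};

static void
chain_propagate(struct chain_proc *p)
{
	/* Every owner in the chain is bumped to the ORIGINAL priority. */
	const int pri = p->p_priority;
	struct chain_mtx *m = p->p_blocked;

	while (m != NULL && m->mtx_owner != NULL) {
		struct chain_proc *owner = m->mtx_owner;

		/* Assumption: no bump needed, so nothing further to do. */
		if (owner->p_priority <= pri)
			return;
		owner->p_priority = pri;

		if (owner->p_state == CHAIN_RUNNING)
			return;
		if (owner->p_state == CHAIN_ON_RUNQ)
			return;	/* the kernel would requeue the owner here */

		/*
		 * Owner is itself blocked: the kernel would reposition it
		 * in that mutex's list of blocked processes; then follow
		 * the chain.
		 */
		m = owner->p_blocked;
	}
}
```

Capturing pri once, before the loop, is what ensures that later owners receive the priority of the original process and not that of an intermediate one.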

The resetpriority() function recomputes the user priority of the ksegrp kg (stored in kg_user_pri) and calls maybe_resched() to force a reschedule of each thread in the group if needed.

The roundrobin() function is used as a timeout(9) function to force a reschedule every sched_quantum ticks.

The roundrobin_interval() function returns the number of clock ticks between reschedules triggered by roundrobin(). That is, it simply returns the current value of sched_quantum.

The sched_setup() function is a SYSINIT(9) that is called to start the callout-driven scheduler functions. It just calls the roundrobin() and schedcpu() functions for the first time. After the initial call, each of the two functions keeps itself running by registering its callout event again just before it returns.
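The self-rearming pattern can be modelled in user space with a toy tick loop, as in the sketch below. Nothing here is a kernel interface: the two-slot callout table, setup_arm(), setup_run_ticks(), and the values chosen for the quantum (10 ticks) and for hz (100) are all assumptions made for the illustration.

```c
#include <stddef.h>

/* Toy callout slot: one pending event per handler. */
struct setup_callout {
	void	(*c_func)(void *);
	void	 *c_arg;
	int	  c_time;	/* absolute tick at which to fire */
	int	  c_pending;
};

static struct setup_callout setup_callouts[2];
static int setup_ticks;			/* toy clock */
static int setup_rr_runs, setup_cpu_runs;
static const int setup_quantum = 10;	/* assumed sched_quantum */
static const int setup_hz = 100;	/* assumed hz */

static void
setup_arm(int slot, void (*func)(void *), void *arg, int delay)
{
	setup_callouts[slot].c_func = func;
	setup_callouts[slot].c_arg = arg;
	setup_callouts[slot].c_time = setup_ticks + delay;
	setup_callouts[slot].c_pending = 1;
}

/* Each handler does its work, then registers itself again. */
static void
setup_roundrobin(void *arg)
{
	setup_rr_runs++;
	setup_arm(0, setup_roundrobin, arg, setup_quantum);
}

static void
setup_schedcpu(void *arg)
{
	setup_cpu_runs++;
	setup_arm(1, setup_schedcpu, arg, setup_hz);
}

/* The initial calls start both chains. */
static void
setup_sketch(void *dummy)
{
	(void)dummy;
	setup_roundrobin(NULL);
	setup_schedcpu(NULL);
}

/* Advance the toy clock by n ticks, firing callouts that come due. */
static void
setup_run_ticks(int n)
{
	for (int t = 0; t < n; t++) {
		setup_ticks++;
		for (int i = 0; i < 2; i++) {
			struct setup_callout *c = &setup_callouts[i];

			if (c->c_pending && c->c_time == setup_ticks) {
				c->c_pending = 0;
				c->c_func(c->c_arg);
			}
		}
	}
}
```

Calling setup_sketch(NULL) once and then advancing the clock is enough: no further external calls are needed to keep either handler running.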

The schedclock() function is called by statclock() to adjust the priority of the currently running thread's ksegrp. It updates the group's estimated CPU time and then adjusts the priority via resetpriority().

The schedcpu() function updates all process priorities. First, it updates statistics that track how long processes have been in various process states. Secondly, it updates the estimated CPU time of the process being examined such that about 90% of the CPU usage is forgotten in 5 * load average seconds. For example, if the load average is 2.00, then about 90% of the estimated CPU time for the process should be based on the amount of CPU time the process has had in the last 10 seconds. It then recomputes the priority of the process and moves it to the appropriate runqueue(9) if necessary. Thirdly, it updates the %CPU estimate used by utilities such as ps(1) and top(1) so that 95% of the CPU usage is forgotten in 60 seconds. Once all process priorities have been updated, schedcpu() calls vmmeter() to update various other statistics, including the load average. Finally, it schedules itself to run again in hz clock ticks.
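The two decay rates can be checked numerically with the sketch below. It assumes the classic 4.4BSD digital filter, in which the estimated CPU time is multiplied by (2 * load) / (2 * load + 1) on each once-per-second pass, and a per-second %CPU factor of exp(-1/20). Neither formula is given on this page, and the function names are hypothetical.

```c
/* Assumed per-pass decay factor of the estimated CPU time. */
static double
decay_estcpu_factor(double loadavg)
{
	return ((2.0 * loadavg) / (2.0 * loadavg + 1.0));
}

/* Fraction of the estimated CPU time left after `seconds` passes. */
static double
decay_estcpu_left(double loadavg, int seconds)
{
	double left = 1.0;

	for (int i = 0; i < seconds; i++)
		left *= decay_estcpu_factor(loadavg);
	return (left);
}

/* Assumed per-second %CPU decay factor, exp(-1/20). */
#define DECAY_CCPU	0.951229424500714

/* Fraction of the %CPU estimate left after `seconds` passes. */
static double
decay_pctcpu_left(int seconds)
{
	double left = 1.0;

	for (int i = 0; i < seconds; i++)
		left *= DECAY_CCPU;
	return (left);
}
```

Under these assumptions, decay_estcpu_left(2.0, 10) is 0.8 raised to the tenth power, roughly 0.107, so close to 90% of the old usage has been forgotten after 10 seconds. Likewise decay_pctcpu_left(60) is exp(-3), just under 0.05, which matches the 95% figure.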

The setrunnable() function is used to change a process's state to runnable. The process is placed on a runqueue(9) if needed, and the swapper process is woken up and told to swap the process in if the process is swapped out. If the process has been asleep for at least one run of schedcpu(), then updatepri() is used to adjust the priority of the process.

The updatepri() function is used to adjust the priority of a process that has been asleep. It retroactively decays the estimated CPU time of the process once for each schedcpu() event that occurred while the process was asleep. Finally, it calls resetpriority() to adjust the priority of the process.
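The retroactive decay amounts to applying the per-pass decay once for every missed schedcpu() run, as in the following sketch. The field names, the integer load factor (taken here to be twice the load average), and the decay formula estcpu * loadfac / (loadfac + 1) are assumptions based on the classic 4.4BSD filter, not details taken from this page.

```c
/* Hypothetical stand-in for the scheduling state that is decayed. */
struct sleep_state {
	unsigned int estcpu;	/* estimated CPU time */
	unsigned int slptime;	/* schedcpu() runs missed while asleep */
};

/* One decay step, assuming loadfac = 2 * load average. */
static unsigned int
sleep_decay_cpu(unsigned int loadfac, unsigned int estcpu)
{
	return (estcpu * loadfac / (loadfac + 1));
}

/* Apply one decay step per missed schedcpu() run. */
static void
sleep_updatepri_sketch(struct sleep_state *s, unsigned int loadfac)
{
	unsigned int missed = s->slptime;

	while (missed-- > 0 && s->estcpu != 0)
		s->estcpu = sleep_decay_cpu(loadfac, s->estcpu);
	/* The kernel would now call resetpriority() to recompute the priority. */
}
```

For example, with a load factor of 4, an estimate of 100 decays to 80, 64, and then 51 over three missed runs.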

BUGS

The curpriority variable really should be per-CPU. In addition, maybe_resched() should compare the priority of td with that of each CPU, and then send an IPI to the processor with the lowest priority to trigger a reschedule if needed.

Priority propagation is broken and is thus disabled by default. The p_nativepri variable is only updated if a process does not obtain a sleep mutex on the first try. Also, if a process obtains more than one sleep mutex in this manner and has its priority bumped in between, then p_nativepri will be clobbered.