WARN!!! This page is out of date. The work is now named the Open-Extension Container (OXC) scheduling framework. See my ongoing master's thesis for more information.
The Open-Extension Container (OXC) scheduler is a real-time scheduler that controls the CPU bandwidth distribution of a Linux system on multi-processor platforms. Oxc scheduling introduces the concept of the "open-extension container" for the first time. The oxc scheduler can reserve bandwidth from a CPU for an ox-container, inside which a per-container scheduling system is built up. This per-container scheduling system behaves like a per-CPU scheduling system in Linux, and multiple ox-containers can cooperate to work as a pseudo Linux scheduling system. The bandwidth reserved in an ox-container can be further distributed into sub ox-containers.
Open-Extension Container Structure
The name of the oxc scheduler comes from the Open-Extension Container structure inside Linux. I proposed the oxc concept and will introduce it in detail in a separate article. Here, I only describe it briefly, to help people understand how the oxc scheduler works.
The core element in Linux scheduling is the runqueue structure, "struct rq". Each CPU in the system is associated with one such runqueue, on which a scheduling system is built. The different per-CPU (or per-runqueue) scheduling systems together compose the system-level scheduling inside Linux.
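To make this concrete, here is a heavily simplified sketch of the runqueue; the real struct rq in kernel/sched/sched.h has many more fields, but the per-CPU declaration at the end is how the kernel actually instantiates it:

```c
/* Heavily simplified sketch; the real struct rq (kernel/sched/sched.h)
 * carries many more fields. */
struct rq {
	unsigned int nr_running;	/* number of runnable tasks */
	struct cfs_rq cfs;		/* sub-runqueue for cfs (fair) tasks */
	struct rt_rq rt;		/* sub-runqueue for rt tasks */
	struct task_struct *curr;	/* task currently running here */
	int cpu;			/* the CPU this runqueue belongs to */
};

/* One runqueue per CPU, as declared in kernel/sched/core.c. */
DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
```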
The oxc is a quite abstract concept: any data structure that contains a runqueue field can be called an oxc structure. Based on such an extra runqueue, a new per-oxc scheduling system, independent of the original ones inside Linux, can be built up. This per-oxc scheduling system behaves just like a per-CPU scheduling system and is inherently compatible with scheduling components already inside, or yet to be merged into, the kernel. There are further benefits: whenever an independent or special-purpose scheduling system is needed, the oxc idea may be useful.
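As a minimal sketch of the idea (the structure name here is made up for illustration):

```c
/* Illustrative sketch: any structure embedding a runqueue qualifies
 * as an open-extension container; a complete scheduling system can
 * then be built on the embedded rq. */
struct some_oxc {
	struct rq rq;	/* embedded runqueue hosting a per-oxc scheduling system */
	/* ... container-specific bookkeeping (bandwidth, timers, ...) ... */
};
```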
How does the oxc scheduler work?
In the oxc scheduler, there is a new kind of runqueue, "struct oxc_rq", which is an open-extension container; that is, it contains a "struct rq" field. Each ox-container can reserve some amount of CPU bandwidth from a CPU according to CBS (Constant Bandwidth Server) rules. CBS is a deadline-based real-time scheduling algorithm. In CBS, reserving bandwidth for an ox-container requires a pair of parameters (Q, T), where Q is the maximum budget and T is the period. Specifying Q and T for an ox-container means that in every period of T time units, the ox-container may occupy the CPU for up to Q time units.
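A plausible sketch of this structure and of the CBS-style budget refill, assuming hypothetical field and helper names (the actual patches may differ):

```c
/* Hypothetical sketch of the oxc runqueue; field names are illustrative. */
struct oxc_rq {
	struct rq rq;			/* the embedded per-container runqueue */
	u64 oxc_runtime;		/* Q: maximum budget per period */
	u64 oxc_period;			/* T: replenishment period */
	s64 runtime_remaining;		/* budget left in the current period */
	struct hrtimer period_timer;	/* fires every T to refill the budget */
};

/* Sketch of the periodic refill: every T, the budget is reset to Q and a
 * throttled container may run again. oxc_unthrottle() is a hypothetical
 * helper that re-enqueues the container on its CPU. */
static enum hrtimer_restart oxc_period_timer_fn(struct hrtimer *timer)
{
	struct oxc_rq *oxc_rq = container_of(timer, struct oxc_rq, period_timer);

	oxc_rq->runtime_remaining = oxc_rq->oxc_runtime;
	oxc_unthrottle(oxc_rq);
	hrtimer_forward_now(timer, ns_to_ktime(oxc_rq->oxc_period));
	return HRTIMER_RESTART;
}
```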
So an oxc_rq now gets Q/T of a CPU's computation power; since an oxc_rq is an ox-container, it can basically work as a per-CPU scheduling system running on a less powerful CPU. In a Linux system there are two important kinds of tasks, cfs tasks and rt tasks, each scheduled by its own scheduling policy, and those policies still work inside the per-container scheduling system. In principle, from the point of view of tasks, including new kinds of tasks that may be introduced into the kernel in the future, there is no difference between a per-CPU scheduling system and a per-container one. Moreover, some scheduling algorithms may not be appropriate to apply to the whole system; one choice is to apply them at the ox-container level.
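To see why existing and future scheduling classes need no changes, consider this sketch of how a task could be enqueued into an ox-container; oxc_enqueue_task is a made-up name, but the per-class enqueue_task hook it dispatches through is the kernel's real interface:

```c
/* Sketch: a task entering an ox-container goes through the standard
 * sched_class hook, just with the embedded rq as its target, so cfs,
 * rt, and any future scheduling class work unchanged. */
static void oxc_enqueue_task(struct oxc_rq *oxc_rq,
			     struct task_struct *p, int flags)
{
	p->sched_class->enqueue_task(&oxc_rq->rq, p, flags);
}
```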
Furthermore, in oxc scheduling, in order to reserve bandwidth across multiple CPUs as a whole, there is a "struct hyper_oxc_rq" structure, which contains several oxc_rq structures. Each oxc_rq independently reserves CPU bandwidth on one CPU. In effect, a pseudo system-level scheduling system is built upon this hyper oxc.
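A possible shape for this structure, again with illustrative field names:

```c
/* Hypothetical sketch: a hyper container bundles one per-CPU
 * reservation for every CPU it spans, and a pseudo system-level
 * scheduler is built on top of them. */
struct hyper_oxc_rq {
	cpumask_var_t cpus;		/* CPUs this hyper container spans */
	struct oxc_rq *oxc_rq[NR_CPUS];	/* one reservation per CPU */
};
```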
How to use the oxc scheduler?
Currently, the user interface of the oxc scheduler is built upon the cgroups interface in Linux, in particular the "cpu" subsystem. After mounting the "cpu" subsystem, you can see a file named "cpu.oxc_control", through which you can reserve bandwidth on a set of CPUs for the tasks inside this cgroup and its descendant cgroups (let's call this a cgroup hierarchy). For instance,
echo 0 10000/100000 1 200000/1000000 > cpu.oxc_control
reserves 10% and 20% of the CPU computation time on CPUs 0 and 1, respectively. Users can configure the reservation parameters for each CPU individually. Continuing the same example, if the user later decides to decrease the reservation on CPU 0 to 5%, he can write:
echo 0 5000/100000 > cpu.oxc_control
By default, oxc control is not enabled when the "cpu" subsystem is first mounted. After enabling it, you can move tasks into and out of this hyper ox-container (here I use "hyper oxc" and "a cgroup hierarchy" with the same meaning).
Inside the hyper ox-container, there is a top-level cgroup and its descendants. People can further apply oxc control in these descendant cgroups through the cpu.oxc_control file in the corresponding directories.
Compatibility issues
The defining feature of the ox-container is compatibility. Yet, as the oxc scheduler is still under development, some issues should be mentioned.
- cpuset subsystem: a cgroup can be associated with a set of CPUs using the cpuset subsystem. In oxc scheduling, the "hyper_oxc_rq" structure also deals with a set of CPUs, so cooperation between cpuset and the oxc scheduler should not be too difficult. For now, however, no work has been done here, so if the user enables CONFIG_CPUSETS, I cannot predict what will happen.
- cfs grouping and rt grouping: cfs and rt grouping are the non-real-time CPU bandwidth control mechanisms that Linux applies to these two kinds of tasks. They are in principle the same technology. They are not supported under oxc scheduling for the following reasons:
* oxc scheduling itself can achieve the same effect, in a real-time way.
* It is not attractive to merge them naively. Take rt throttling as an example; for cfs bandwidth control the argument is the same. Suppose we enable rt throttling inside a hyper ox-container and set its parameters to (Q_r, T_r). The logic of rt throttling is that in every T_r time units, rt tasks may not run for more than Q_r time units; this is achievable under oxc scheduling. However, rt throttling and oxc control use different timers. Because the two are not synchronized, all we know is that rt tasks will not exceed Q_r in every T_r; beyond that, the behavior is quite complex, and some computation power would be lost. A better way is to merge rt and cfs grouping into oxc scheduling using a single shared timer. This is future work.