A kernel debugger must be tied in some way to the CPU scheduler, since it needs to suspend execution of the kernel and inspect its state while suspended. This leads me to consider, in a very abstract sense, how to write a CPU scheduler in general.
```c
void scheduler() {
  while (1) {
    task_t *task = next();  // OS sched. alg.
    if (!task) {
      halt();  // CPU specific.
      continue;
    }
    run(task);  // CPU specific.
    put(task);  // OS sched. alg.
  }
}
```

The scheduler is really an event-driven loop using some CPU specific subroutines and OS scheduler algorithm functions.
The CPU specific functions are:
- halt() idles the CPU, possibly entering a power-saving mode, until an interrupt occurs.
- run() context switches into the task and returns when an interrupt occurs.
To make things simple, assume that everything in the kernel and the user processes can be structured as a task. The run() function would have to discern what type of context switch is appropriate for what type of task.
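Here is a rough sketch of how run() might discern the task type. The `kind` field and the two `switch_to_*` helpers are names I made up for illustration; on real hardware they would be CPU specific code (a kernel task presumably only needs its registers and stack swapped, while a user task also needs an address space and privilege change). Here they are stubs that only count how often each path is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task kinds; the real set depends on the kernel design. */
typedef enum { TASK_KERNEL, TASK_USER } task_kind_t;

typedef struct task {
  task_kind_t kind;  /* decides which context switch run() performs */
} task_t;

/* Stand-ins for the CPU specific switch routines. They only count calls. */
static int kernel_switches = 0;
static int user_switches = 0;

static void switch_to_kernel_task(task_t *task) {
  (void)task;
  kernel_switches++;  /* real code: swap registers and kernel stack */
}

static void switch_to_user_task(task_t *task) {
  (void)task;
  user_switches++;  /* real code: also switch address space and privilege */
}

/* Pick the context switch that matches the task type. */
void run(task_t *task) {
  assert(task != NULL);
  switch (task->kind) {
    case TASK_KERNEL:
      switch_to_kernel_task(task);
      break;
    case TASK_USER:
      switch_to_user_task(task);
      break;
  }
}
```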
The OS scheduling functions are:
- next() to fetch the next task in the scheduling queue that is ready to run.
- put() places the task back into the scheduling queue.
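As a minimal sketch of these two functions, assume the simplest possible policy: a single FIFO queue, which gives round-robin scheduling. The `link` and `id` fields and the `head`/`tail` variables are my own assumptions, and a real kernel would also need locking or disabled interrupts around the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor: only the queue link matters here. */
typedef struct task {
  struct task *link;  /* next task in the run queue, or NULL */
  int id;             /* illustrative identifier */
} task_t;

/* One run queue, kept as a singly linked FIFO. */
static task_t *head = NULL;
static task_t *tail = NULL;

/* Place a runnable task at the tail of the queue. */
void put(task_t *task) {
  assert(task != NULL);
  task->link = NULL;
  if (tail) {
    tail->link = task;
  } else {
    head = task;
  }
  tail = task;
}

/* Remove and return the task at the head, or NULL if nothing is ready. */
task_t *next(void) {
  task_t *task = head;
  if (!task) {
    return NULL;
  }
  head = task->link;
  if (!head) {
    tail = NULL;
  }
  task->link = NULL;
  return task;
}
```

With this queue, the scheduler loop cycles through the ready tasks in order, and next() returning NULL is what sends the CPU into halt().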
In a multi-processor system, each CPU will run its own scheduler and have its own task queue. The next() function might attempt to steal a task from another CPU's queue if its own queue is empty. A queue takes ownership of a task, but the task can migrate from one queue to another. Work stealing across distributed queues is a good load balancing strategy, and I think it's the only such strategy that is provably scalable.
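The stealing idea can be sketched as follows. This is a single-threaded simulation under my own assumptions: `NCPU`, the `cpu` parameter, and the `run_queue_t` type are not part of the scheduler above, and there is no locking at all. A real implementation would need a lock per queue or a lock-free deque, and would often steal from the opposite end of the victim's queue.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4  /* assumed number of CPUs */

typedef struct task {
  struct task *link;  /* next task in whichever queue owns this task */
  int id;             /* illustrative identifier */
} task_t;

typedef struct {
  task_t *head;
  task_t *tail;
} run_queue_t;

/* One queue per CPU. A task sits in at most one queue at a time. */
static run_queue_t queues[NCPU];

/* Hand the task to the given CPU's queue, which now owns it. */
void put(int cpu, task_t *task) {
  run_queue_t *q = &queues[cpu];
  assert(task != NULL);
  task->link = NULL;
  if (q->tail) {
    q->tail->link = task;
  } else {
    q->head = task;
  }
  q->tail = task;
}

/* Remove the head of one queue, or return NULL if it is empty. */
static task_t *pop(run_queue_t *q) {
  task_t *task = q->head;
  if (!task) {
    return NULL;
  }
  q->head = task->link;
  if (!q->head) {
    q->tail = NULL;
  }
  task->link = NULL;
  return task;
}

/* Try the local queue first, then scan the other CPUs for work to steal. */
task_t *next(int cpu) {
  task_t *task = pop(&queues[cpu]);
  for (int i = 1; !task && i < NCPU; i++) {
    task = pop(&queues[(cpu + i) % NCPU]);
  }
  return task;
}
```

Migration falls out naturally: a stolen task ends up in the thief's queue when that CPU calls put() after running it.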
Within this scheduler, one possibility is to represent the REPL as a task and schedule it like all other kernel tasks. This is the simplest approach, but the REPL would run simultaneously with all other tasks. It cannot be used to debug the scheduler or examine a frozen state of the CPU. On the other hand, having a REPL that can look at the live state of the kernel is pretty cool. Suspending the kernel is quite easy: simply allow uninterruptible kernel tasks in run(), then the REPL can switch between live and frozen states by toggling the interruptible flag of its own task. You can also suspend the other CPUs by using a boolean variable to force their next() to return the nil task, which makes each of those CPUs halt. You don't need to disable their interrupts. To resume, just reset the variable.
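A minimal sketch of the freeze switch, with names I invented for illustration (`frozen`, `repl_freeze`, `repl_thaw`, and a per-CPU `ready` list). I use a C11 atomic for the shared flag since several CPUs read it. The CPU running the REPL never reaches next() while the REPL task is uninterruptible, so the flag only affects the other CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 2  /* assumed number of CPUs */

typedef struct task {
  struct task *link;
  bool interruptible;  /* if false, run() keeps the task on its CPU */
} task_t;

static task_t *ready[NCPU];  /* per-CPU ready lists, kept minimal */
static atomic_bool frozen;   /* true while the kernel is suspended */

/* Push a task onto a CPU's ready list. */
void put(int cpu, task_t *task) {
  assert(task != NULL);
  task->link = ready[cpu];
  ready[cpu] = task;
}

/* While frozen, report no work so the calling CPU falls through to halt(). */
task_t *next(int cpu) {
  if (atomic_load(&frozen)) {
    return NULL;
  }
  task_t *task = ready[cpu];
  if (task) {
    ready[cpu] = task->link;
    task->link = NULL;
  }
  return task;
}

/* Enter the frozen state: pin the REPL to its CPU and starve the others. */
void repl_freeze(task_t *repl) {
  repl->interruptible = false;
  atomic_store(&frozen, true);
}

/* Return to the live state: release the other CPUs, then unpin the REPL. */
void repl_thaw(task_t *repl) {
  atomic_store(&frozen, false);
  repl->interruptible = true;
}
```

Interrupts still wake the halted CPUs, but each wakeup just loops back to next(), sees the flag, and halts again until the flag is reset.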
Even so, such a REPL won't allow you to step into or over the scheduler or interrupt handlers. A true SoftICE-style debugger is only possible if you hijack the CPU specific implementation of run(). The good news is that it should be able to coexist with our REPL, which is already pretty useful.