Syscalls & interrupts, Part II - ghaerr/elks GitHub Wiki

System Calls and Interrupts, Part II

In the first part of this article, we analyzed the execution of system calls or hardware interrupts from their inception through the entry into _irqit, the kernel funnel routine that all software and hardware interrupts pass through. In this section, we'll trace the same path for a typical interrupt, the hardware timer.

Here's the full code at the start of _irqit:

_irqit:
//
//      Make room
//
        push    %ds
        push    %si
        push    %di
//
//      Recover data segment
//
//      seg     cs
        mov     %cs:ds_kernel,%ds
//
//      Determine which stack to use
//
        cmpw    $1,_gint_count
        jc      utask           // We were in user mode
        jz      itask           // Using a process's kernel stack
ktask:                          // Already using interrupt stack
//
//      Already using interrupt stack, keep using it
//
        mov     %sp,%si
        sub     $8,%si          // 14 offsets less 6 already on stack
        jmp     save_regs
//
//      Using a process's kernel stack, switch to interrupt stack
//
itask:
        mov     $_intstack-14,%si // 14 offsets 0-13 of SI below
        jmp     save_regs

Execution is identical to a system call, until the general interrupt counter (_gint_count) is tested. Since this is a hardware interrupt this time, the system could be executing application code, or perhaps be in the middle of a kernel routine.

If _gint_count is 0, then the path for the timer interrupt would be the same as for a system call: the save_regs routine would run and the registers would be saved within the current task structure. However, since we know that each pass through _irqit increments _gint_count, we'll look at each case specifically this time.
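The three-way branch on _gint_count can be modeled in C. This is only a sketch: the enum and function names are made up for illustration and do not appear in the ELKS source. It relies on how the 8086 sets flags for "cmpw $1,_gint_count": carry is set when the count is below 1 (that is, 0), and zero is set when the count equals 1.

```c
/* Hypothetical names; a model of the branch at the top of _irqit. */
enum stack_choice {
    STACK_UTASK,   /* jc utask: we were in user mode            */
    STACK_ITASK,   /* jz itask: switch to the interrupt stack   */
    STACK_KTASK    /* fall through: already on interrupt stack  */
};

static enum stack_choice choose_stack(unsigned gint_count)
{
    if (gint_count < 1)        /* carry set by cmpw $1,_gint_count */
        return STACK_UTASK;
    if (gint_count == 1)       /* zero set by the same compare     */
        return STACK_ITASK;
    return STACK_KTASK;        /* 2 or more: nested interrupt      */
}
```

Calling choose_stack with 0, 1 and 2 walks through the utask, itask and ktask cases in turn.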

The Interrupt Stack

If _gint_count is 1, an application program is already in the kernel, and the timer interrupt is interrupting normal kernel code. This case is handled by the "jz itask" (jump if zero) instruction. Here, rather than SI being set to point into the current task structure, it is set to _intstack-14. This is 14 bytes (7 words) below the top of a global kernel array, _intstack, which is used as a special interrupt stack in place of the normal kernel stack used for application system calls or hardware interrupts from user mode. SI is set 14 bytes lower because that is the amount of space this routine needs for saved registers before it switches to the new stack. The registers are saved in the same order as in the current task struct, but onto the interrupt stack instead:

/* ordering of saved registers on kernel stack after syscall/interrupt entry*/
struct _registers {
    /* SI offset                 0   2        4   6   8  10  12*/
    __u16       ax, bx, cx, dx, di, si, orig_ax, es, ds, sp, ss;
};

Notice that the offsets are 0-12 (14 bytes in total), covering the DI through SS registers, as discussed in Part I.
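The SI offsets in that comment can be checked with offsetof. The sketch below copies the field order from the struct above but uses uint16_t in place of __u16 (an assumption that __u16 is a 16-bit unsigned type); the names regs_sketch, SI_OFFSET and si_window_bytes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as struct _registers, with uint16_t standing in for __u16. */
struct regs_sketch {
    uint16_t ax, bx, cx, dx, di, si, orig_ax, es, ds, sp, ss;
};

/* Offset of a field relative to di, which is where SI points. */
#define SI_OFFSET(field) \
    (offsetof(struct regs_sketch, field) - offsetof(struct regs_sketch, di))

/* Bytes from di to the end of the struct: the area addressed through SI. */
static size_t si_window_bytes(void)
{
    return sizeof(struct regs_sketch) - offsetof(struct regs_sketch, di);
}
```

SI_OFFSET(di), SI_OFFSET(si) and SI_OFFSET(ds) give the 0, 2 and 8 used by the pop instructions in save_regs, and the window from di to ss is the 14 bytes subtracted from _intstack.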

Interrupts interrupting interrupts

If _gint_count is 2 or more, this means that the current hardware interrupt is interrupting another interrupt. Although that sounds complicated, it's not really a big deal, since interrupts are prioritized by the 8259 interrupt controller and only allowed when interrupts are re-enabled after saving all the registers in this same routine.

In this case, the conditional jumps to utask and itask are not taken, and code execution falls through to ktask, where the comment says "Already using interrupt stack, keep using it". Since the first three instructions of this routine have already pushed the DS, SI and DI registers onto the interrupt stack, we only need to copy SP to SI and subtract 8, because those three pushes already account for the other 6 of the 14 bytes.

This is somewhat tricky, but the important point is that in the case of interrupting kernel code, or interrupting an interrupt, the system doesn't switch to a normal kernel process stack saved in the task structure, but instead saves all the registers on a 512-byte fixed interrupt stack with the following declaration:

       .skip 512,0             // 512 byte interrupt stack
_intstack:

The reason a normal kernel process stack can't be used is that it is already in use by the first interrupt, the one that occurred when _gint_count == 0. The saved registers would not be in the places the kernel expects, and ultimately this would mean that the task couldn't be scheduled in or out. But we'll return to that in another discussion.
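The SI arithmetic for the two interrupt-stack cases can be written out as plain integer math. This is a sketch with hypothetical names, treating stack addresses as unsigned offsets; it models only the two instructions that load SI, not the rest of _irqit.

```c
#define INTSTACK_BYTES   512  /* size given by ".skip 512,0"            */
#define SAVE_AREA_BYTES  14   /* SI offsets 0-13                        */
#define PUSHED_BYTES     6    /* DS, SI and DI pushed on entry          */

/* itask: mov $_intstack-14,%si
 * Start a fresh save area just below the top of the interrupt stack. */
static unsigned itask_si(unsigned intstack_top)
{
    return intstack_top - SAVE_AREA_BYTES;
}

/* ktask: mov %sp,%si ; sub $8,%si
 * The three pushed words already occupy 6 of the 14 bytes, so only
 * 8 more bytes are needed below the current SP. */
static unsigned ktask_si(unsigned sp)
{
    return sp - (SAVE_AREA_BYTES - PUSHED_BYTES);
}

/* First byte above the save area that starts at si. */
static unsigned save_area_end(unsigned si)
{
    return si + SAVE_AREA_BYTES;
}
```

For example, with SP at offset 400 on the interrupt stack, ktask_si gives 392, and the 14-byte area ends at 406, which is exactly SP plus the 6 bytes already pushed.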

Saving the registers

save_regs:
        incw    _gint_count
        pop     (%si)           // DI
        pop     2(%si)          // SI
        pop     8(%si)          // DS
        ...

Here we are, in the exact same routine as handling a system call, incrementing _gint_count as always, but this time SI points to a special interrupt stack to save the registers.

        ...
        movb    %cs:(%di),%al
        cmpb    $0x80,%al
        jne     updct

Now we get to the part where the interrupt number is checked. The timer is interrupt 0, so the test for interrupt 80h will fail and we will "jne updct" (jump not equal). This takes us away from the syscall processing and instead into an update count routine.

Calling the interrupt routine

/*
!
!       ----------PROCESS INTERRUPT----------
!
!       Update intr_count
!
*/
updct:
        incw    intr_count      // only needed for schedule during interrupt warning
//
//      Call the C code
//
        sti                     // Reenable interrupts
        mov     %sp,%bx         // Get pointer to pt_regs
        cbw
        push    %ax             // IRQ for later

        push    %bx             // Register base
        push    %ax             // IRQ number
        call    do_IRQ          // Do the work
        pop     %ax             // Clean parameters
        pop     %bx

        pop     %ax             // Saved IRQ

We increment another global variable, _intr_count, whose purpose is to let the kernel report an error if a reschedule (sleep/wait) is attempted while an interrupt is being handled, and then re-enable interrupts.

The current SP, which points to a _registers struct on the interrupt stack, is copied to BX, and the interrupt number, which happens to be in AL, is sign-extended into AX and saved. The _registers struct address and IRQ number are then pushed and do_IRQ is called, which uses the interrupt number to find the previously registered driver routine to call. Device drivers register that routine with the kernel routine request_irq.
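The register-then-dispatch pattern described here can be sketched as a small table of function pointers indexed by IRQ number. Everything below is hypothetical: the names, signatures and return values are made up for illustration and are not the real request_irq or do_IRQ interfaces in ELKS. Only the argument order (IRQ number first, register pointer second) follows the push order in the assembly.

```c
#include <stddef.h>

#define NR_IRQS_SKETCH 16   /* assumption: one slot per 8259 IRQ line */

typedef void (*irq_handler_sketch)(int irq, void *regs);

static irq_handler_sketch handlers[NR_IRQS_SKETCH];

/* Stand-in for request_irq: 0 on success, -1 if the slot is invalid or taken. */
static int request_irq_sketch(int irq, irq_handler_sketch h)
{
    if (irq < 0 || irq >= NR_IRQS_SKETCH || h == NULL)
        return -1;
    if (handlers[irq] != NULL)
        return -1;
    handlers[irq] = h;
    return 0;
}

/* Stand-in for do_IRQ: returns 1 if a registered handler was called. */
static int do_irq_sketch(int irq, void *regs)
{
    if (irq < 0 || irq >= NR_IRQS_SKETCH || handlers[irq] == NULL)
        return 0;
    handlers[irq](irq, regs);
    return 1;
}

/* Example driver routine: count timer ticks. */
static int timer_ticks;

static void timer_tick_sketch(int irq, void *regs)
{
    (void)irq;
    (void)regs;
    timer_ticks++;
}
```

A driver would call request_irq_sketch(0, timer_tick_sketch) once at initialization; each later do_irq_sketch(0, regs) then runs the handler.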

Notice that the kernel-registered interrupt routine is called with interrupts enabled.

After the interrupt routine returns, the stack parameters are removed, and the saved IRQ popped into AX.

The kernel can be interrupted at almost any time

So we've seen that any code, including the kernel, can be interrupted at any time, except for portions of routines like this that disable interrupts. But in general, any application or kernel code must be aware that a registered interrupt handler could be called.

HOWEVER, as we will see shortly, all interrupts processed on the interrupt stack (not kernel stack) are guaranteed to return to the previously interrupted code. This means that an interrupt driver routine, which is always called through do_IRQ using this method, cannot call any kernel sleep routine. In general, it shouldn't do much, since it is interrupting application or kernel code, although it is definitely allowed to "wake up" tasks that are not running (called sleeping).
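The rule that a handler may wake tasks but may not sleep can be illustrated with a counter check. This is a sketch under stated assumptions: the function names, the single task_runnable flag, and the choice to refuse the sleep and count an error are all hypothetical; the source only says that the interrupt counter lets the kernel flag a reschedule attempted during an interrupt.

```c
static int intr_count_sketch;   /* models intr_count                      */
static int task_runnable = 1;   /* hypothetical state of a single task    */
static int sleep_errors;        /* illegal sleep attempts seen so far     */

/* Hypothetical sleep: refused (and counted) while a handler is running. */
static int sleep_sketch(void)
{
    if (intr_count_sketch > 0) {
        sleep_errors++;
        return -1;
    }
    task_runnable = 0;
    return 0;
}

/* Waking a sleeping task is always allowed, even from a handler. */
static void wake_up_sketch(void)
{
    task_runnable = 1;
}

/* Bracket a handler the way updct and was_trap bracket do_IRQ. */
static void run_handler_sketch(void (*handler)(void))
{
    intr_count_sketch++;        /* updct: incw intr_count    */
    handler();
    intr_count_sketch--;        /* was_trap: decw intr_count */
}

static void good_handler(void) { wake_up_sketch(); }
static void bad_handler(void)  { (void)sleep_sketch(); }
```

Running good_handler through run_handler_sketch wakes the task; running bad_handler leaves the task state untouched and bumps sleep_errors instead.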

After the interrupt routine, reset the IRQ controller

//
//      Send EOI to interrupt controller
//
        cli                     // Disable interrupts to avoid reentering ISR
        cmp     $16,%ax
        jge     was_trap        // Traps need no reset
        or      %ax,%ax         // Is int #0?
        jnz     a4
//  
//      IRQ 0 (timer) has to go on to the bios for some systems
//
        decw    _bios_call_cnt_l // Will call bios int?
        jne     a4
        movw    $5,_bios_call_cnt_l
        pushf
        lcall   *_stashed_irq0_l 
        jmp     was_trap        // EOI already sent by bios int
a4:
        cmp     $8,%ax
        mov     $0x20,%al       // EOI
        jb      a6              // IRQ on low chip
/*
!
!       Reset secondary 8259 if we have taken an AT rather
!       than XT irq. We also have to prod the primary
!       controller EOI..
!   
*/  
        out     %al,$0xA0
        jmp     a5
a5:     jmp     a6
a6:     out     %al,$0x20               // Ack on primary controller

After the interrupt service routine returns, interrupts are disabled again and the interrupt number is tested to see whether it falls in the range 0-15.

If 16 or greater, this section of code is skipped by jumping to was_trap.

If the interrupt was number 0 (the timer), then on every fifth call the BIOS timer interrupt entry point is called as well, to support BIOSes that need a timer interrupt (since all interrupt vectors now point to ELKS). In that case the BIOS handler sends the EOI itself, so the code jumps straight to was_trap.

Then, at label a4, the interrupt number is compared to 8 to determine whether to send an EOI (end of interrupt) to the slave as well as the master 8259 interrupt controller. Note that even though interrupts were re-enabled before calling do_IRQ, the 8259 interrupt controller won't allow any equal-or-lower priority interrupts to occur until it has had an EOI sent to it. Since interrupts are now disabled, resetting the 8259 won't cause an immediate interrupt (yet).
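The EOI decision can be modeled in C without touching real I/O ports by recording which ports would be written. This is a sketch: the struct and function names are hypothetical, and the starting value of 5 for the BIOS counter is an assumption (the listing only shows it being reloaded with 5).

```c
struct eoi_result {
    int chained_to_bios;   /* 1 if the stashed BIOS IRQ 0 handler was called */
    int nports;            /* number of EOI port writes                      */
    unsigned ports[2];     /* ports written, in order                        */
};

static int bios_call_cnt = 5;   /* assumed initial value of _bios_call_cnt_l */

static struct eoi_result end_of_interrupt(int irq)
{
    struct eoi_result r = { 0, 0, { 0, 0 } };

    if (irq >= 16)                      /* jge was_trap: traps need no reset */
        return r;

    if (irq == 0 && --bios_call_cnt == 0) {
        bios_call_cnt = 5;              /* movw $5,_bios_call_cnt_l          */
        r.chained_to_bios = 1;          /* the BIOS handler sends the EOI    */
        return r;
    }

    if (irq >= 8)                       /* IRQ on the secondary 8259         */
        r.ports[r.nports++] = 0xA0;
    r.ports[r.nports++] = 0x20;         /* always ack the primary 8259       */
    return r;
}
```

IRQ 3 produces a single write to port 0x20, IRQ 10 produces writes to 0xA0 and then 0x20, and under the assumed starting count every fifth IRQ 0 is handed to the BIOS with no EOI sent by this code.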

The end of interrupt processing - the interesting part

was_trap:
//
//      Restore intr_count
//  
        decw    intr_count
//  
//      Now look at rescheduling
//
        cmpw    $1,_gint_count
        jne     restore_regs    // No
//  
// This path will return directly to user space
//
        sti                     // Enable interrupts to help fast devices
        call    schedule        // Task switch
        call    do_signal       // Check signals
        cli
//
//      Restore registers and return
//
restore_regs:

We now get to the interesting part of handling hardware interrupts. Now that the interrupt handling routine has finished, and the interrupt controller reset with an EOI, the global _intr_count is decremented, possibly allowing task-switching in the scheduler.

The global _gint_count is then compared to 1. If it is not equal to 1, the registers are restored, as on every exit from _irqit, and the interrupted routine, whatever it was, is resumed. Remember that the only time _gint_count can be 1 at this point is when the routine is handling an interrupt from user code. For our timer interrupt, a value of 1 therefore means that the timer interrupted user code (NOT kernel code).

Thus, in this special case only, where a hardware interrupt interrupted user code, interrupts are re-enabled and schedule is called.

If there is another task ready to run (possibly woken up by the interrupt handler just executed), the scheduler will pick the first one and switch stacks.
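The effect of the "cmpw $1,_gint_count" test on nested entries can be traced with two counters. This is a sketch with hypothetical names that models only the hardware-interrupt exit path shown above; schedule_calls stands in for the calls to schedule and do_signal.

```c
static unsigned gint_count_sketch;   /* models _gint_count                  */
static unsigned schedule_calls;      /* times the reschedule path was taken */

/* save_regs: incw _gint_count */
static void irqit_enter(void)
{
    gint_count_sketch++;
}

/* was_trap followed by restore_regs */
static void irqit_exit(void)
{
    if (gint_count_sketch == 1)      /* cmpw $1,_gint_count ; jne restore_regs */
        schedule_calls++;            /* stands in for schedule and do_signal   */
    gint_count_sketch--;             /* restore_regs: decw _gint_count         */
}
```

Entering twice (an interrupt arriving while already in the kernel) and then exiting twice shows the rule: the inner exit sees a count of 2 and skips the scheduler, while the outer exit sees 1 and takes the reschedule path.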

Switching kernel stacks

We won't go into exactly how this works now, but since _gint_count is 1 at this time (just before decrementing it and restoring registers), the current kernel stack is sitting with all registers saved in the _registers struct in the task struct. All that is needed is to switch the kernel SP to a different task struct entry, and then the return from schedule will be to this very same routine, but with a different kernel stack. In effect, that stack could belong to the current, or any other, ready-to-run task.

After the return from schedule, the current (newly woken, or round-robin scheduled) task continues executing: it calls do_signal for signal processing, then falls into restore_regs, which decrements _gint_count, restores the registers, and returns from the interrupt:

//
//      Restore registers and return
//
restore_regs:
        decw    _gint_count
        pop     %ax
        pop     %bx
        pop     %cx
        pop     %dx
        ...

This concludes this writeup on system calls and interrupt processing. The heart of ELKS is contained within the _irqit routine, the scheduler, and the task structure. Hopefully this illuminates the very controlled execution path ELKS takes to keep the system in a known state at all times.