Linux Device Drivers Notes - yszheda/wiki GitHub Wiki

Chapter 1: An Introduction to Device Drivers

The Role of the Device Driver

The role of a device driver is to provide mechanism (“what capabilities are to be provided”), not policy (“how those capabilities can be used”).

When writing drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware, but don’t force particular policies on the user, since different users have different needs. The driver should deal with making the hardware available, leaving all the issues about how to use the hardware to the applications. A driver, then, is flexible if it offers access to the hardware capabilities without adding constraints.

Classes of Devices and Modules

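Character devices: a device that can be accessed as a stream of bytes (like a file)
- usually implements at least the open, close, read, and write system calls
- The only relevant difference between a char device and a regular file is that you can always move back and forth in the regular file, whereas most char devices are just data channels, which you can only access sequentially.
Block devices: a device (e.g., a disk) that can host a filesystem.
- Linux lets applications read and write a block device like a char device, transferring any number of bytes at a time. As a result, block and char devices differ only in the way data is managed internally by the kernel, and thus in the kernel/driver software interface.
Network interfaces: a device that is able to exchange data with other hosts.
- Not being stream-oriented, a network interface has no node in /dev (it gets a name such as eth0 instead), and instead of read and write the kernel calls functions related to packet transmission.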

Chapter 2: Building and Running Modules

Chapter 3: Char Drivers

Some Important Data Structures

open and release

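readv and writev

These file_operations methods implement the readv and writev system calls. The prototypes below are from the LDD3 era; later kernels removed these methods, routing vectored I/O first through aio_read/aio_write and now through read_iter/write_iter.

ssize_t (*readv) (struct file *filp, const struct iovec *iov,
                  unsigned long count, loff_t *ppos);
ssize_t (*writev) (struct file *filp, const struct iovec *iov,
                   unsigned long count, loff_t *ppos);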
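From user space, these methods are reached through the readv(2) and writev(2) system calls, which move several separate buffers (an array of struct iovec) in a single call. The sketch below is a user-space illustration, not driver code: it gathers two strings into one pipe write and scatters the result into two differently sized buffers. The helper name gather_roundtrip is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write two buffers with one writev() call, then read
 * them back into two caller-supplied buffers with one readv() call.
 * Returns the number of bytes read, or -1 on error. */
ssize_t gather_roundtrip(char *head, size_t head_len, char *tail, size_t tail_len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    char part1[] = "hello, ";
    char part2[] = "world";
    struct iovec out[2] = {
        { .iov_base = part1, .iov_len = strlen(part1) },
        { .iov_base = part2, .iov_len = strlen(part2) },
    };
    /* One 12-byte write made of two segments (gather). */
    ssize_t written = writev(fds[1], out, 2);

    struct iovec in[2] = {
        { .iov_base = head, .iov_len = head_len },
        { .iov_base = tail, .iov_len = tail_len },
    };
    /* readv fills each segment completely before moving on (scatter). */
    ssize_t nread = (written < 0) ? -1 : readv(fds[0], in, 2);

    close(fds[0]);
    close(fds[1]);
    return nread;
}
```

With a 5-byte first buffer, the call returns 12, leaving "hello" in the first buffer and ", world" in the second.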

Chapter 5: Concurrency and Race Conditions

Concurrency and Its Management

Semaphores and Mutexes

A semaphore is a single integer value combined with a pair of functions that are typically called P and V. A process wishing to enter a critical section calls P on the relevant semaphore; if the semaphore’s value is greater than zero, that value is decremented by one and the process continues. If, instead, the semaphore’s value is 0 (or less), the process must wait until somebody else releases the semaphore. Unlocking a semaphore is accomplished by calling V; this function increments the value of the semaphore and, if necessary, wakes up processes that are waiting.
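The P/V description maps directly onto a mutex plus a condition variable. The following is a user-space model built on that assumption (POSIX threads, not the kernel's struct semaphore); struct csem and the csem_* functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* User-space model of a counting semaphore. */
struct csem {
    pthread_mutex_t lock;     /* protects value */
    pthread_cond_t  nonzero;  /* signalled when value may have become > 0 */
    int value;
};

void csem_init(struct csem *s, int value)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = value;
}

/* P: wait until the value is positive, then decrement it. */
void csem_P(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)     /* re-check after every wakeup */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V: increment the value and wake one waiter, if any. */
void csem_V(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```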

The Linux Semaphore Implementation

In Linux, the P function is called down (or some variation of that name): it decrements the value of the semaphore and, perhaps after putting the caller to sleep for a while to wait for the semaphore to become available, grants access to the protected resources. There are three versions:

void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_trylock(struct semaphore *sem);

Using down_interruptible requires some extra care, however: if the operation is interrupted, the function returns a nonzero value and the caller does not hold the semaphore. Proper use of down_interruptible requires always checking the return value and responding accordingly (typically by returning -ERESTARTSYS).
down_trylock never sleeps; if the semaphore is not available at the time of the call, down_trylock returns immediately with a nonzero return value.
A semaphore is released with up:

void up(struct semaphore *sem);

Once up has been called, the caller no longer holds the semaphore.
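POSIX semaphores offer rough user-space counterparts to these calls: sem_wait can fail with EINTR when a signal interrupts it (compare down_interruptible), sem_trywait fails with EAGAIN instead of sleeping (compare down_trylock), and sem_post releases (compare up). A minimal sketch, with hypothetical helper names try_take and take_interruptible:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Like down_trylock: returns 0 if the semaphore was taken, -1 if busy. */
int try_take(sem_t *s)
{
    if (sem_trywait(s) == 0)
        return 0;
    assert(errno == EAGAIN);  /* busy: the caller does NOT hold it */
    return -1;
}

/* Like down_interruptible: blocks, but reports an interrupted wait.
 * Returns 0 on success or a negative errno (e.g. -EINTR); on failure
 * the semaphore is NOT held and the caller must not release it. */
int take_interruptible(sem_t *s)
{
    if (sem_wait(s) == 0)
        return 0;
    return -errno;
}
```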

Reader/Writer Semaphores

The kernel provides a special semaphore type, the rwsem (“reader/writer semaphore”), declared in <linux/rwsem.h>:

void down_read(struct rw_semaphore *sem);
int down_read_trylock(struct rw_semaphore *sem);
void up_read(struct rw_semaphore *sem);

void down_write(struct rw_semaphore *sem);
int down_write_trylock(struct rw_semaphore *sem);
void up_write(struct rw_semaphore *sem);
void downgrade_write(struct rw_semaphore *sem);

An rwsem allows either one writer or an unlimited number of readers to hold the semaphore. Writers get priority; as soon as a writer tries to enter the critical section, no readers will be allowed in until all writers have completed their work. This implementation can lead to reader starvation—where readers are denied access for a long time—if you have a large number of writers contending for the semaphore. For this reason, rwsems are best used when write access is required only rarely, and writer access is held for short periods of time.

Completions

Completions (declared in <linux/completion.h>) are a lightweight mechanism with one task: allowing one thread to tell another that the job is done.

#include <linux/completion.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/sched.h>   /* current */

static DECLARE_COMPLETION(comp);

ssize_t complete_read(struct file *filp, char __user *buf, size_t count,
                      loff_t *pos)
{
  printk(KERN_DEBUG "process %i (%s) going to sleep\n",
      current->pid, current->comm);
  wait_for_completion(&comp);
  printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
  return 0; /* EOF */
}

ssize_t complete_write(struct file *filp, const char __user *buf, size_t count,
                       loff_t *pos)
{
  printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
      current->pid, current->comm);
  complete(&comp); /* wakes one waiting reader; complete_all() would wake all */
  return count; /* succeed, to avoid retrial */
}

A typical use of the completion mechanism is with kernel thread termination at module exit time. In the prototypical case, some of the driver's internal work is performed by a kernel thread in a while (1) loop. When the module is ready to be cleaned up, the exit function tells the thread to exit and then waits for completion. For this purpose, the kernel includes a specific function to be used by the thread:

void complete_and_exit(struct completion *c, long retval);

(Recent kernels, from 5.17 on, rename this function kthread_complete_and_exit.)
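The thread-exit handshake can be modeled in user space to show the ordering involved. This sketch assumes POSIX threads and a hand-rolled completion (u_completion, a counter protected by a mutex and a condition variable); it mirrors the semantics of complete and wait_for_completion but is not the kernel implementation. The worker type, worker_fn, and run_and_stop_worker are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of a completion. */
struct u_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;   /* complete() calls not yet consumed by a waiter */
};

void u_init_completion(struct u_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

void u_complete(struct u_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->cond);   /* wake one waiter, like complete() */
    pthread_mutex_unlock(&c->lock);
}

void u_wait_for_completion(struct u_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical worker: loops until told to stop, then signals completion
 * as its last action (the role complete_and_exit plays in the kernel). */
struct worker {
    atomic_bool stop;
    unsigned long iterations;
    struct u_completion exited;
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;
    while (!atomic_load(&w->stop))
        w->iterations++;           /* stands in for real work */
    u_complete(&w->exited);
    return NULL;
}

/* Start the worker, then shut it down the way an exit function would. */
int run_and_stop_worker(struct worker *w)
{
    pthread_t tid;
    atomic_init(&w->stop, false);
    w->iterations = 0;
    u_init_completion(&w->exited);
    if (pthread_create(&tid, NULL, worker_fn, w) != 0)
        return -1;
    atomic_store(&w->stop, true);       /* tell the thread to exit */
    u_wait_for_completion(&w->exited);  /* wait until it says it is done */
    pthread_join(tid, NULL);            /* user space must also reap it */
    return 0;
}
```

In user space, pthread_join alone would already wait for the thread; the completion is kept only to mirror the kernel pattern, where the exit function waits on it before the module goes away.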

Spinlocks

Unlike semaphores, spinlocks may be used in code that cannot sleep, such as interrupt handlers. When properly used, spinlocks offer higher performance than semaphores in general.
A spinlock is a mutual exclusion device that can have only two values: “locked” and “unlocked.”

Introduction to the Spinlock API

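Spinlocks are declared in <linux/spinlock.h>. A spinlock can be initialized at compile time:

DEFINE_SPINLOCK(my_lock);

(older code writes spinlock_t my_lock = SPIN_LOCK_UNLOCKED;, an initializer that has since been removed from the kernel) or at run time:

void spin_lock_init(spinlock_t *lock);

Before entering a critical section, code obtains the lock with spin_lock and releases it with spin_unlock:

void spin_lock(spinlock_t *lock);

void spin_unlock(spinlock_t *lock);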
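The lock/unlock pair can be modeled in user space with a C11 atomic_flag, whose test-and-set is the atomic operation a spinlock needs. This is a sketch for illustration, not the kernel's spinlock_t (which, among other things, also disables preemption); the u_spin_* names are hypothetical, with u_spin_trylock loosely following the kernel's spin_trylock, which returns nonzero when it gets the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space spinlock model: it only ever spins, it never sleeps. */
typedef struct { atomic_flag locked; } u_spinlock_t;

#define U_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void u_spin_lock(u_spinlock_t *l)
{
    /* test_and_set returns the previous value: keep trying while it was set */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* busy-wait */
}

/* Returns true if the lock was acquired, false if it was already held. */
bool u_spin_trylock(u_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

void u_spin_unlock(u_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```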

Spinlocks and Atomic Context

The core rule that applies to spinlocks is that any code must, while holding a spinlock, be atomic. It cannot sleep; in fact, it cannot relinquish the processor for any reason except to service interrupts (and sometimes not even then).
Any time kernel code holds a spinlock, preemption is disabled on the relevant processor.
The last important rule for spinlock usage is that spinlocks must always be held for the minimum time possible.

The Spinlock Functions

spin_lock_irqsave disables interrupts (on the local processor only) before taking the spinlock; the previous interrupt state is stored in flags. It is a macro, so flags (an unsigned long) is passed directly, not by pointer, and the saved state is put back with spin_unlock_irqrestore(lock, flags).
If you are absolutely sure nothing else might have already disabled interrupts on your processor (in other words, you are sure that you should enable interrupts when you release your spinlock), you can use spin_lock_irq and spin_unlock_irq instead and not have to keep track of the flags.
spin_lock_bh disables software interrupts before taking the lock, but leaves hardware interrupts enabled; it is paired with spin_unlock_bh.

Reader/Writer Spinlocks

Locking Traps

Lock Ordering Rules

When multiple locks must be acquired, they should always be acquired in the same order.
If you must obtain a lock that is local to your code (a device lock, say) along with a lock belonging to a more central part of the kernel, take your lock first. If you have a combination of semaphores and spinlocks, you must, of course, obtain the semaphore(s) first; calling down (which can sleep) while holding a spinlock is a serious error.
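When the locks are peers (two instances of the same structure, say), a fixed order can be derived from something every caller agrees on, such as the objects' addresses. A user-space sketch of that technique with POSIX mutexes; struct account, lock_pair, and transfer are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical object with its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Always take the lower-addressed lock first, so two transfers in
 * opposite directions cannot each hold one lock and wait for the other. */
static void lock_pair(struct account *a, struct account *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(struct account *a, struct account *b)
{
    /* Release order does not matter for deadlock avoidance. */
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Returns 0 on success, -1 if from has an insufficient balance. */
int transfer(struct account *from, struct account *to, long amount)
{
    int ret = -1;
    assert(from != to);   /* locking the same mutex twice would deadlock */
    lock_pair(from, to);
    if (from->balance >= amount) {
        from->balance -= amount;
        to->balance += amount;
        ret = 0;
    }
    unlock_pair(from, to);
    return ret;
}
```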

Fine- Versus Coarse-Grained Locking

Alternatives to Locking

Lock-Free Algorithms

Atomic Variables

Bit Operations

/* try to set lock */
while (test_and_set_bit(nr, addr) != 0) {
  wait_for_a_while(); /* placeholder: back off before retrying */
}
/* do your work */
/* release lock, and check... */
if (test_and_clear_bit(nr, addr) == 0) {
  something_went_wrong(); /* placeholder: already released, an error */
}
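The snippet above relies on the kernel's bit operations and on placeholder helpers. To show the same bit-lock idea in a form that runs outside the kernel, the sketch below implements test-and-set and test-and-clear with C11 atomic_fetch_or and atomic_fetch_and; the u_ functions, bit_lock, and bit_unlock are hypothetical names, and nr is assumed to be smaller than the number of bits in an unsigned long.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-ins for test_and_set_bit/test_and_clear_bit.
 * Each atomically updates one bit and returns its old value. */
static bool u_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static bool u_test_and_clear_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Bit lock: spin while another holder already owns the bit. */
void bit_lock(unsigned nr, atomic_ulong *addr)
{
    while (u_test_and_set_bit(nr, addr))
        ;   /* plain spin in place of wait_for_a_while() */
}

/* Returns false if the bit was already clear (the error case). */
bool bit_unlock(unsigned nr, atomic_ulong *addr)
{
    return u_test_and_clear_bit(nr, addr);
}
```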

seqlocks

Seqlocks work in situations where the resource to be protected is small, simple, and frequently accessed, and where write access is rare but must be fast.
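The kernel interface is seqlock_t, with write_seqlock/write_sequnlock for writers and a read_seqbegin/read_seqretry loop for readers. The following user-space sketch shows the underlying sequence-counter idea with C11 atomics, assuming a single writer (or writers serialized by some other lock); u_seq_pair and its functions are hypothetical. Every access is a sequentially consistent atomic, which keeps the sketch free of data races at some cost in speed.

```c
#include <assert.h>
#include <stdatomic.h>

/* A small, frequently read pair of values guarded by a sequence counter. */
struct u_seq_pair {
    atomic_uint seq;    /* odd while a write is in progress */
    atomic_long a, b;
};

/* Writer: make the counter odd, update, then make it even again. */
void u_seq_write(struct u_seq_pair *p, long a, long b)
{
    atomic_fetch_add(&p->seq, 1);   /* odd: readers will retry */
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    atomic_fetch_add(&p->seq, 1);   /* even: snapshot is stable */
}

/* Reader: never blocks the writer; retries if a write overlapped. */
void u_seq_read(struct u_seq_pair *p, long *a, long *b)
{
    unsigned start;
    do {
        start = atomic_load(&p->seq);
        if (start & 1)
            continue;               /* writer active: jumps to the retry test */
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
    } while ((start & 1) || atomic_load(&p->seq) != start);
}
```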

Read-Copy-Update

Read-copy-update (RCU) is an advanced mutual exclusion scheme that can yield high performance in the right conditions.

http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
Kernel Korner - Using RCU in the Linux 2.5 Kernel
http://www2.rdrop.com/users/paulmck/rclock/
RCU is optimized for situations where reads are common and writes are rare. The resources being protected should be accessed via pointers, and all references to those resources must be held only by atomic code. When the data structure needs to be changed, the writing thread makes a copy, changes the copy, then aims the relevant pointer at the new version, hence the name of the algorithm.
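The copy, update, and publish steps can be sketched with C11 atomics. This is not RCU itself: the kernel's version (rcu_read_lock, rcu_dereference, rcu_assign_pointer, and synchronize_rcu or call_rcu) also determines when every reader is finished with the old copy so that it can be freed. In the sketch, struct config, current_config, read_timeout, and update_timeout are hypothetical, a single writer is assumed, and the old copy is simply returned to the caller, who may free it only once no reader can still hold it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int timeout;
    int retries;
};

/* The published pointer; must be set before readers run. */
static _Atomic(struct config *) current_config;

/* Reader: one acquire load, then use the snapshot; never blocks. */
int read_timeout(void)
{
    struct config *c = atomic_load_explicit(&current_config, memory_order_acquire);
    return c->timeout;
}

/* Writer: copy, modify the copy, then publish it with a release store.
 * Returns the old version, or NULL if allocation failed. */
struct config *update_timeout(int timeout)
{
    struct config *old = atomic_load_explicit(&current_config, memory_order_acquire);
    struct config *new_cfg = malloc(sizeof(*new_cfg));
    if (!new_cfg)
        return NULL;
    *new_cfg = *old;              /* copy */
    new_cfg->timeout = timeout;   /* update the copy */
    atomic_store_explicit(&current_config, new_cfg, memory_order_release);
    return old;                   /* free only after a "grace period" */
}
```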

Chapter 6: Advanced Char Driver Operations

Blocking I/O

Introduction to Sleeping

Never sleep when you are running in an atomic context.

Your driver cannot sleep while holding a spinlock, seqlock, or RCU lock.
You also cannot sleep if you have disabled interrupts.
It is legal to sleep while holding a semaphore, but you should look very carefully at any code that does so.

When you wake up, you never know how long your process may have been out of the CPU or what may have changed in the meantime. The end result is that you can make no assumptions about the state of the system after you wake up, and you must check to ensure that the condition you were waiting for is, indeed, true.
Your process cannot sleep unless it is assured that somebody else, somewhere, will wake it up.

Simple Sleeping

wait_event(queue, condition)
wait_event_interruptible(queue, condition)
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)

void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);
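The pattern behind wait_event (sleep until a condition is true, and re-test it after every wakeup) has a close user-space counterpart in POSIX condition variables. A sketch assuming pthreads; producer, consume, and the data_ready flag are hypothetical names, and unlike the kernel macros this version protects the condition with a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_wait = PTHREAD_COND_INITIALIZER;  /* the "wait queue" */
static bool data_ready;
static int  data;

static void *producer(void *arg)
{
    pthread_mutex_lock(&q_lock);
    data = *(int *)arg;
    data_ready = true;                /* make the condition true first... */
    pthread_cond_broadcast(&q_wait);  /* ...then wake up the sleepers */
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Like wait_event(queue, data_ready): sleep until the condition holds.
 * Returns the produced value, or -1 if the thread could not be started. */
int consume(int value_to_send)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, producer, &value_to_send) != 0)
        return -1;

    pthread_mutex_lock(&q_lock);
    while (!data_ready)               /* a wakeup alone proves nothing */
        pthread_cond_wait(&q_wait, &q_lock);
    int v = data;
    data_ready = false;
    pthread_mutex_unlock(&q_lock);

    pthread_join(tid, NULL);
    return v;
}
```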

Blocking and Nonblocking Operations

The behavior of read and write is different if O_NONBLOCK is specified. In this case, the calls simply return -EAGAIN (“try it again”) if a process calls read when no data is available or if it calls write when there’s no space in the buffer.

Sometimes, however, opening the device requires a long initialization, and you may choose to support O_NONBLOCK in your open method by returning immediately with -EAGAIN if the flag is set, after starting the device initialization process. The driver may also implement a blocking open to support access policies in a way similar to file locks.

Only the read, write, and open file operations are affected by the nonblocking flag.
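Seen from user space, the nonblocking contract is easy to observe on a pipe, whose read implementation follows the rule above. A minimal sketch; nonblocking_read_errno is a hypothetical helper that returns the errno from one read on an empty, nonblocking pipe (EAGAIN is expected).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns the errno of a read() on an empty O_NONBLOCK pipe,
 * 0 if the read unexpectedly succeeded, or -1 on setup failure. */
int nonblocking_read_errno(void)
{
    int fds[2];
    char c;
    if (pipe(fds) != 0)
        return -1;
    int flags = fcntl(fds[0], F_GETFL);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    ssize_t n = read(fds[0], &c, 1);   /* nothing has been written yet */
    int err = (n < 0) ? errno : 0;     /* save errno before close() */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```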

Advanced Sleeping

poll and select

The driver's poll method, unsigned int (*poll) (struct file *filp, poll_table *wait), is in charge of two steps:
- Call poll_wait on one or more wait queues that could indicate a change in the poll status. If no file descriptors are currently available for I/O, the kernel causes the process to wait on the wait queues for all file descriptors passed to the system call.
- Return a bit mask describing the operations (if any) that could be immediately performed without blocking.
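The bit mask returned by a poll method is what user space sees in revents. The sketch below observes it for a pipe from the user side, with a zero timeout so that poll never sleeps; pipe_poll_demo is a hypothetical helper.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Records revents for a pipe's read end before and after one byte is
 * written. Returns 0 on success, -1 on failure. */
int pipe_poll_demo(short *before, short *after)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    struct pollfd p = { .fd = fds[0], .events = POLLIN };

    poll(&p, 1, 0);
    *before = p.revents;           /* no data yet: POLLIN not set */

    int ok = (write(fds[1], "x", 1) == 1);
    poll(&p, 1, 0);
    *after = p.revents;            /* data available: POLLIN set */

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```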

Interaction with read and write

Flushing pending output

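The fsync method backs the fsync system call; a driver implements it when applications need to be sure that pending output has actually reached the device:

int (*fsync) (struct file *file, struct dentry *dentry, int datasync);

(This is the LDD3-era prototype; current kernels drop the dentry argument and pass a byte range instead: int (*fsync) (struct file *file, loff_t start, loff_t end, int datasync).)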

The Underlying Data Structure

Asynchronous Notification

User programs have to execute two steps to enable asynchronous notification from an input file.

First, they specify a process as the “owner” of the file. When a process invokes the F_SETOWN command using the fcntl system call, the process ID of the owner process is saved in filp->f_owner for later use. This step is necessary for the kernel to know just whom to notify.
In order to actually enable asynchronous notification, the user programs must set the FASYNC flag in the device by means of the F_SETFL fcntl command.
After these two calls have been executed, the input file can request delivery of a SIGIO signal whenever new data arrives. The signal is sent to the process (or process group, if the value is negative) stored in filp->f_owner.

For example:

int oflags;
signal(SIGIO, &input_handler); /* input_handler: the program's handler; sigaction() is better */
fcntl(STDIN_FILENO, F_SETOWN, getpid());
oflags = fcntl(STDIN_FILENO, F_GETFL);
fcntl(STDIN_FILENO, F_SETFL, oflags | FASYNC);
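A complete, runnable version of the two steps, using sigaction rather than signal, and a pipe instead of standard input so that the program can produce its own input (Linux pipes implement the fasync method). O_ASYNC is the same flag as FASYNC under another name; glibc defines FASYNC as O_ASYNC. sigio_demo and on_sigio are hypothetical names.

```c
#define _GNU_SOURCE   /* for F_SETOWN, O_ASYNC and usleep in glibc */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio;

static void on_sigio(int sig)
{
    (void)sig;
    got_sigio = 1;
}

/* Returns 1 if SIGIO arrived after data was written, 0 if it did not,
 * or -1 on setup failure. */
int sigio_demo(void)
{
    int fds[2];
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    got_sigio = 0;
    fcntl(fds[0], F_SETOWN, getpid());                         /* step 1: owner */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_ASYNC);  /* step 2: FASYNC */

    int ok = (write(fds[1], "x", 1) == 1);   /* new data arrives */
    for (int i = 0; ok && i < 1000 && !got_sigio; i++)
        usleep(1000);                        /* allow time for delivery */

    close(fds[0]);
    close(fds[1]);
    return ok ? got_sigio : -1;
}
```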

The Driver’s Point of View

When F_SETOWN is invoked, nothing happens, except that a value is assigned to filp->f_owner.
When F_SETFL is executed to turn on FASYNC, the driver’s fasync method is called. This method is called whenever the value of FASYNC is changed in filp->f_flags to notify the driver of the change, so it can respond properly. The flag is cleared by default when the file is opened.
When data arrives, all the processes registered for asynchronous notification must be sent a SIGIO signal.

Access Control on a Device File

// TODO