Linux Device Drivers Notes - yszheda/wiki GitHub Wiki
Chapter 1: An Introduction to Device Drivers
The Role of the Device Driver
- the role of a device driver is providing mechanism (“what capabilities are to be provided”), not policy (“how those capabilities can be used”)
When writing drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware, but don’t force particular policies on the user, since different users have different needs. The driver should deal with making the hardware available, leaving all the issues about how to use the hardware to the applications. A driver, then, is flexible if it offers access to the hardware capabilities without adding constraints.
Classes of Devices and Modules
- Character devices: one that can be accessed as a stream of bytes (like a file)
  - usually implements at least the open, close, read, and write system calls
  - The only relevant difference between a char device and a regular file is that you can always move back and forth in the regular file, whereas most char devices are just data channels, which you can only access sequentially.
- Block devices: a device (e.g., a disk) that can host a filesystem.
  - block and char devices differ only in the way data is managed internally by the kernel, and thus in the kernel/driver software interface.
- Network interfaces
Chapter 2: Building and Running Modules
Chapter 3: Char Drivers
Some Important Data Structures
open and release
readv and writev
ssize_t (*readv) (struct file *filp, const struct iovec *iov,
unsigned long count, loff_t *ppos);
ssize_t (*writev) (struct file *filp, const struct iovec *iov,
unsigned long count, loff_t *ppos);
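The driver methods above back the user-space readv and writev system calls. As a rough illustration of what a caller gains from vectored I/O, here is a minimal user-space sketch; the helper names write_two and read_two are made up for this note, and a pipe stands in for a real device.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather two separate buffers into a single writev() call on fd.
 * Returns the total number of bytes written, or -1 on error. */
static ssize_t write_two(int fd, const char *a, const char *b)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)a;   /* writev() does not modify the data */
    iov[0].iov_len  = strlen(a);
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = strlen(b);
    return writev(fd, iov, 2);
}

/* Scatter incoming bytes into two buffers with a single readv() call:
 * the first len_a bytes land in a, the following bytes (up to len_b) in b. */
static ssize_t read_two(int fd, char *a, size_t len_a, char *b, size_t len_b)
{
    struct iovec iov[2];

    iov[0].iov_base = a;
    iov[0].iov_len  = len_a;
    iov[1].iov_base = b;
    iov[1].iov_len  = len_b;
    return readv(fd, iov, 2);
}
```

Each call crosses into the kernel once, no matter how many buffers the iovec array describes.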
Chapter 5: Concurrency and Race Conditions
Concurrency and Its Management
Semaphores and Mutexes
A semaphore is a single integer value combined with a pair of functions that are typically called P and V. A process wishing to enter a critical section will call P on the relevant semaphore; if the semaphore’s value is greater than zero, that value is decremented by one and the process continues. If, instead, the semaphore’s value is 0 (or less), the process must wait until somebody else releases the semaphore. Unlocking a semaphore is accomplished by calling V; this function increments the value of the semaphore and, if necessary, wakes up processes that are waiting.
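The P/V description can be sketched in user space with a mutex and a condition variable. This is only an illustration of the semantics, not the kernel implementation; struct usem and the usem_* functions are hypothetical names, and usem_try_p is an extra non-sleeping variant added to make the "must wait" case observable.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space counting semaphore; not the kernel's struct semaphore. */
struct usem {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             value;
};

static void usem_init(struct usem *s, int value)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = value;
}

/* P: wait until the value is greater than zero, then decrement it. */
static void usem_p(struct usem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-sleeping variant of P: returns false where P would have to wait. */
static bool usem_try_p(struct usem *s)
{
    bool got = false;

    pthread_mutex_lock(&s->lock);
    if (s->value > 0) {
        s->value--;
        got = true;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* V: increment the value and wake one waiter, if there is any. */
static void usem_v(struct usem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Initialized to 1, such a semaphore behaves as a mutex: only one holder can be inside the critical section at a time.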
The Linux Semaphore Implementation
The Linux version of P is called down; the name refers to the fact that the function decrements the value of the semaphore and, perhaps after putting the caller to sleep for a while to wait for the semaphore to become available, grants access to the protected resources.
void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_trylock(struct semaphore *sem);
- Using down_interruptible requires some extra care: if the operation is interrupted, the function returns a nonzero value, and the caller does not hold the semaphore. Proper use of down_interruptible requires always checking the return value and responding accordingly.
- down_trylock never sleeps; if the semaphore is not available at the time of the call, down_trylock returns immediately with a nonzero return value.
- Once up has been called, the caller no longer holds the semaphore.
void up(struct semaphore *sem);
Reader/Writer Semaphores
rwsem (or “reader/writer semaphore”), declared in <linux/rwsem.h>
void down_read(struct rw_semaphore *sem);
int down_read_trylock(struct rw_semaphore *sem);
void up_read(struct rw_semaphore *sem);
void down_write(struct rw_semaphore *sem);
int down_write_trylock(struct rw_semaphore *sem);
void up_write(struct rw_semaphore *sem);
void downgrade_write(struct rw_semaphore *sem);
An rwsem allows either one writer or an unlimited number of readers to hold the semaphore. Writers get priority; as soon as a writer tries to enter the critical section, no readers will be allowed in until all writers have completed their work. This implementation can lead to reader starvation—where readers are denied access for a long time—if you have a large number of writers contending for the semaphore. For this reason, rwsems are best used when write access is required only rarely, and writer access is held for short periods of time.
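The admission rule (many readers or one writer, with writers given priority) can be modeled with a few counters. The sketch below is bookkeeping only: it never sleeps, is not itself thread-safe, and every name in it (struct rw_model, model_*) is invented for this note rather than taken from the kernel.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping model of an rwsem: it records who is inside
 * and who is queued, but it never sleeps. */
struct rw_model {
    int  readers;          /* readers currently inside */
    bool writer;           /* true while a writer is inside */
    int  waiting_writers;  /* writers that have asked and are still queued */
};

/* Readers are admitted only if no writer is inside or queued. */
static bool model_read_enter(struct rw_model *m)
{
    if (m->writer || m->waiting_writers > 0)
        return false;
    m->readers++;
    return true;
}

static void model_read_exit(struct rw_model *m)
{
    m->readers--;
}

/* A writer announces itself; from now on new readers are refused. */
static void model_write_request(struct rw_model *m)
{
    m->waiting_writers++;
}

/* A queued writer gets in once the lock is completely idle. */
static bool model_write_enter(struct rw_model *m)
{
    if (m->writer || m->readers > 0)
        return false;
    m->waiting_writers--;
    m->writer = true;
    return true;
}

static void model_write_exit(struct rw_model *m)
{
    m->writer = false;
}
```

With a steady stream of model_write_request calls, model_read_enter keeps returning false, which is the reader starvation the paragraph warns about.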
Completions
Completions are a lightweight mechanism with one task: allowing one thread to tell another that the job is done. (<linux/completion.h>)
DECLARE_COMPLETION(comp);

/* read: sleep until a writer signals the completion */
ssize_t complete_read(struct file *filp, char __user *buf, size_t count,
                      loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    wait_for_completion(&comp);
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

/* write: wake up one process sleeping in complete_read */
ssize_t complete_write(struct file *filp, const char __user *buf, size_t count,
                       loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    complete(&comp);
    return count; /* succeed, to avoid retrial */
}
A typical use of the completion mechanism is with kernel thread termination at module exit time. In the prototypical case, some of the driver’s internal work is performed by a kernel thread in a while (1) loop. When the module is ready to be cleaned up, the exit function tells the thread to exit and then waits for completion. To this aim, the kernel includes a specific function to be used by the thread:
void complete_and_exit(struct completion *c, long retval);
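To make the "one thread tells another the job is done" idea concrete, here is a user-space sketch built on a mutex and a condition variable. It is an assumption-laden stand-in, not the kernel's code: struct ucompletion, UCOMPLETION_INIT, u_wait_for_completion, and u_complete are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for struct completion. */
struct ucompletion {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    unsigned int    done;   /* completions posted but not yet consumed */
};

#define UCOMPLETION_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Block until at least one completion has been posted, then consume it. */
static void u_wait_for_completion(struct ucompletion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)            /* re-check after every wakeup */
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

/* Post one completion and wake a single waiter. */
static void u_complete(struct ucompletion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}
```

Because the event is counted rather than just signalled, a waiter that arrives after u_complete has already run returns immediately instead of sleeping forever.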
Spinlocks
- Unlike semaphores, spinlocks may be used in code that cannot sleep, such as interrupt handlers. When properly used, spinlocks offer higher performance than semaphores in general.
- A spinlock is a mutual exclusion device that can have only two values: “locked” and “unlocked.”
Introduction to the Spinlock API
<linux/spinlock.h>
spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
void spin_lock_init(spinlock_t *lock);
void spin_lock(spinlock_t *lock);
void spin_unlock(spinlock_t *lock);
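The two-state nature of a spinlock maps directly onto an atomic test-and-set flag. The following user-space sketch uses C11 atomics to show the idea; struct uspinlock and the uspin_* functions are hypothetical, and unlike the kernel versions they do nothing about preemption or interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock built on a C11 atomic flag. */
struct uspinlock {
    atomic_flag locked;
};

#define USPINLOCK_UNLOCKED { ATOMIC_FLAG_INIT }

/* Busy-wait until the flag was previously clear, i.e. until we took the lock. */
static void uspin_lock(struct uspinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin: the holder is expected to release very soon */
}

/* Single attempt: returns true if the lock was taken. */
static bool uspin_trylock(struct uspinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void uspin_unlock(struct uspinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The waiting side burns CPU instead of sleeping, which is why the holder must keep the critical section short.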
Spinlocks and Atomic Context
- the core rule that applies to spinlocks is that any code must, while holding a spinlock, be atomic. It cannot sleep; in fact, it cannot relinquish the processor for any reason except to service interrupts (and sometimes not even then).
- Any time kernel code holds a spinlock, preemption is disabled on the relevant processor.
- The last important rule for spinlock usage is that spinlocks must always be held for the minimum time possible.
The Spinlock Functions
- spin_lock_irqsave disables interrupts (on the local processor only) before taking the spinlock; the previous interrupt state is stored in flags.
- If you are absolutely sure nothing else might have already disabled interrupts on your processor (or, in other words, you are sure that you should enable interrupts when you release your spinlock), you can use spin_lock_irq instead and not have to keep track of the flags.
- spin_lock_bh disables software interrupts before taking the lock, but leaves hardware interrupts enabled.
Reader/Writer Spinlocks
Locking Traps
Lock Ordering Rules
- when multiple locks must be acquired, they should always be acquired in the same order.
- If you must obtain a lock that is local to your code (a device lock, say) along with a lock belonging to a more central part of the kernel, take your lock first. If you have a combination of semaphores and spinlocks, you must, of course, obtain the semaphore(s) first; calling down (which can sleep) while holding a spinlock is a serious error.
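One way to guarantee "always the same order" is to derive the order from something fixed, so callers cannot get it wrong. The user-space sketch below orders two pthread mutexes by address; that choice is an assumption made for illustration (kernel code usually documents a lock hierarchy instead), and lock_pair and unlock_pair are hypothetical helpers.

```c
#include <pthread.h>
#include <stdint.h>

/* Lock two mutexes in a fixed global order (here: ascending address),
 * regardless of the order in which the caller names them.
 * Returns the mutex that was taken first. Assumes a != b. */
static pthread_mutex_t *lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    pthread_mutex_t *second = (first == a) ? b : a;

    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    return first;
}

/* Release both mutexes; the unlock order does not matter for deadlock. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

Two threads calling lock_pair(&x, &y) and lock_pair(&y, &x) now contend for the same first lock, so neither can end up holding one lock while waiting for the other.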
Fine- Versus Coarse-Grained Locking
Alternatives to Locking
Lock-Free Algorithms
Atomic Variables
<asm/atomic.h>
Bit Operations
<asm/bitops.h>
/* try to set lock */
while (test_and_set_bit(nr, addr) != 0) {
    wait_for_a_while();
}
/* do your work */

/* release lock, and check... */
if (test_and_clear_bit(nr, addr) == 0) {
    something_went_wrong(); /* already released: error */
}
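The same test-and-set pattern can be tried outside the kernel with C11 atomics. This is a sketch under the assumption that atomic_fetch_or and atomic_fetch_and are acceptable stand-ins for the kernel primitives; u_test_and_set_bit and u_test_and_clear_bit are hypothetical names, not the functions from <asm/bitops.h>.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap at addr; return its previous value. */
static bool u_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old  = atomic_fetch_or(&addr[nr / BITS_PER_WORD], mask);

    return (old & mask) != 0;
}

/* Atomically clear bit nr in the bitmap at addr; return its previous value. */
static bool u_test_and_clear_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old  = atomic_fetch_and(&addr[nr / BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}
```

A return value of false from u_test_and_set_bit means the caller is the one that flipped the bit and therefore owns the "lock"; true means somebody else got there first.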
Seqlocks
Seqlocks work in situations where the resource to be protected is small, simple, and frequently accessed, and where write access is rare but must be fast.
<linux/seqlock.h>
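The reader side of a seqlock is a retry loop around a sequence counter. The user-space sketch below shows that loop with C11 atomics; it assumes a single writer (the kernel version also serializes writers with a lock), uses the default sequentially consistent operations for simplicity, and all useq_* names are invented for this note.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space seqlock protecting a two-field value.
 * Assumes a single writer at a time. */
struct useq_pair {
    atomic_uint seq;      /* odd while a write is in progress */
    atomic_long a, b;     /* the protected data */
};

/* Start of a read: remember the sequence number. */
static unsigned useq_read_begin(struct useq_pair *p)
{
    return atomic_load(&p->seq);
}

/* True if a snapshot taken since 'start' may be inconsistent. */
static bool useq_read_retry(struct useq_pair *p, unsigned start)
{
    return (start & 1u) || atomic_load(&p->seq) != start;
}

static void useq_write(struct useq_pair *p, long a, long b)
{
    atomic_fetch_add(&p->seq, 1);   /* now odd: readers will retry */
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    atomic_fetch_add(&p->seq, 1);   /* even again: data is stable */
}

/* Lock-free read: loop until a snapshot was taken with no write in between. */
static void useq_read(struct useq_pair *p, long *a, long *b)
{
    unsigned start;

    do {
        start = useq_read_begin(p);
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
    } while (useq_read_retry(p, start));
}
```

Readers never block the writer; they just pay for a retry when a write overlaps, which is why the scheme suits small data with rare writes.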
Read-Copy-Update
<linux/rcupdate.h>
Read-copy-update (RCU) is an advanced mutual exclusion scheme that can yield high performance in the right conditions.
- http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
- It is optimized for situations where reads are common and writes are rare. The resources being protected should be accessed via pointers, and all references to those resources must be held only by atomic code. When the data structure needs to be changed, the writing thread makes a copy, changes the copy, then aims the relevant pointer at the new version; hence the name of the algorithm.
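The copy, change, and re-aim steps can be sketched in user space with an atomic pointer. This only models the update pattern: waiting until no reader still uses the old version (the grace period) is deliberately left to the caller, a single writer is assumed, and struct config, cfg_read, and cfg_set_speed are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config {            /* hypothetical read-mostly data */
    int speed;
    int mode;
};

static _Atomic(struct config *) current_cfg;   /* starts out NULL */

/* Reader: grab the current pointer and use that snapshot; no lock is taken. */
static const struct config *cfg_read(void)
{
    return atomic_load(&current_cfg);
}

/* Writer: copy the current version, change the copy, then publish it.
 * On success returns 0 and stores the replaced version in *old_out; the
 * caller may free it only once no reader can still be using it.
 * Returns -1 if memory allocation fails. */
static int cfg_set_speed(int speed, struct config **old_out)
{
    struct config *old  = atomic_load(&current_cfg);
    struct config *copy = calloc(1, sizeof(*copy));

    if (!copy)
        return -1;
    if (old)
        *copy = *old;                   /* make a copy */
    copy->speed = speed;                /* change the copy */
    atomic_store(&current_cfg, copy);   /* aim the pointer at the new version */
    *old_out = old;
    return 0;
}
```

A reader that fetched the pointer before the update keeps seeing a complete, unchanged old version, which is why the old copy cannot be freed right away.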
Chapter 6: Advanced Char Driver Operations
Blocking I/O
Introduction to Sleeping
- never sleep when you are running in an atomic context.
- your driver cannot sleep while holding a spinlock, seqlock, or RCU lock.
- You also cannot sleep if you have disabled interrupts.
- It is legal to sleep while holding a semaphore, but you should look very carefully at any code that does so.
- when you wake up, you never know how long your process may have been out of the CPU or what may have changed in the meantime. The end result is that you can make no assumptions about the state of the system after you wake up, and you must check to ensure that the condition you were waiting for is, indeed, true.
- your process cannot sleep unless it is assured that somebody else, somewhere, will wake it up.
Simple Sleeping
wait_event(queue, condition)
wait_event_interruptible(queue, condition)
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)
void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);
Blocking and Nonblocking Operations
The behavior of read and write is different if O_NONBLOCK is specified. In this case, the calls simply return -EAGAIN (“try it again”) if a process calls read when no data is available or if it calls write when there’s no space in the buffer.
Sometimes, however, opening the device requires a long initialization, and you may choose to support O_NONBLOCK in your open method by returning immediately with -EAGAIN if the flag is set, after starting the device initialization process. The driver may also implement a blocking open to support access policies in a way similar to file locks.
Only the read, write, and open file operations are affected by the nonblocking flag.
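From the application side, the driver's -EAGAIN shows up as a return value of -1 with errno set to EAGAIN. The user-space sketch below demonstrates this on a pipe rather than on a real device; set_nonblocking and read_now are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to nonblocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read without waiting.
 * Returns the byte count, 0 at end of file, -EAGAIN if no data is ready
 * yet, or another negative errno value on error. */
static ssize_t read_now(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    return n >= 0 ? n : -errno;
}
```

A caller that gets -EAGAIN is expected to do something else and try again later, typically after poll or select reports the descriptor as readable.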
Advanced Sleeping
poll and select
- Call poll_wait on one or more wait queues that could indicate a change in the poll status. If no file descriptors are currently available for I/O, the kernel causes the process to wait on the wait queues for all file descriptors passed to the system call.
- Return a bit mask describing the operations (if any) that could be immediately performed without blocking.
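The bit mask returned by the driver is what a user program sees in the revents field after calling poll. As an illustration of the user side only (a pipe stands in for the device, and readable_now and read_when_ready are hypothetical helpers), a minimal sketch:

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Ask the kernel, without waiting (timeout 0), whether fd can be read
 * right now. The answer comes from the bit mask that the file's poll
 * implementation returns. */
static bool readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}

/* Wait up to timeout_ms for fd to become ready, then read from it.
 * Returns the bytes read, 0 on timeout or end of file, -1 on error. */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready <= 0)
        return ready;            /* 0: timed out, -1: poll failed */
    return read(fd, buf, len);
}
```

With a positive timeout, the calling process sleeps on the wait queues that the driver handed to poll_wait until one of them is woken.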
Interaction with read and write
Flushing pending output
int (*fsync) (struct file *file, struct dentry *dentry, int datasync);
The Underlying Data Structure
Asynchronous Notification
User programs have to execute two steps to enable asynchronous notification from an input file.
- First, they specify a process as the “owner” of the file. When a process invokes the F_SETOWN command using the fcntl system call, the process ID of the owner process is saved in filp->f_owner for later use. This step is necessary for the kernel to know just whom to notify.
- In order to actually enable asynchronous notification, the user programs must set the FASYNC flag in the device by means of the F_SETFL fcntl command.
- After these two calls have been executed, the input file can request delivery of a SIGIO signal whenever new data arrives. The signal is sent to the process (or process group, if the value is negative) stored in filp->f_owner.
e.g.
signal(SIGIO, &input_handler); /* dummy sample; sigaction() is better */
fcntl(STDIN_FILENO, F_SETOWN, getpid());
oflags = fcntl(STDIN_FILENO, F_GETFL);
fcntl(STDIN_FILENO, F_SETFL, oflags | FASYNC);
The Driver’s Point of View
- When F_SETOWN is invoked, nothing happens, except that a value is assigned to filp->f_owner.
- When F_SETFL is executed to turn on FASYNC, the driver’s fasync method is called. This method is called whenever the value of FASYNC is changed in filp->f_flags to notify the driver of the change, so it can respond properly. The flag is cleared by default when the file is opened.
- When data arrives, all the processes registered for asynchronous notification must be sent a SIGIO signal.
Access Control on a Device File
// TODO