Linux Device Drivers Notes - yszheda/wiki GitHub Wiki
Chapter 1: An Introduction to Device Drivers
The Role of the Device Driver
- the role of a device driver is providing mechanism (“what capabilities are to be provided”), not policy (“how those capabilities can be used”)
When writing drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware, but don’t force particular policies on the user, since different users have different needs. The driver should deal with making the hardware available, leaving all the issues about how to use the hardware to the applications. A driver, then, is flexible if it offers access to the hardware capabilities without adding constraints.
Classes of Devices and Modules
- Character devices: a char device is one that can be accessed as a stream of bytes (like a file)
- usually implements at least the open, close, read, and write system calls
- The only relevant difference between a char device and a regular file is that you can always move back and forth in the regular file, whereas most char devices are just data channels, which you can only access sequentially.
- Block devices: a device (e.g., a disk) that can host a filesystem.
- block and char devices differ only in the way data is managed internally by the kernel, and thus in the kernel/driver software interface.
- Network interfaces
Chapter 2: Building and Running Modules
Chapter 3: Char Drivers
Some Important Data Structures
open and release
readv and writev
ssize_t (*readv) (struct file *filp, const struct iovec *iov,
unsigned long count, loff_t *ppos);
ssize_t (*writev) (struct file *filp, const struct iovec *iov,
unsigned long count, loff_t *ppos);
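The two driver methods above back the readv and writev system calls. As a rough userspace illustration of what those calls do (not driver code), the sketch below gathers two buffers into a pipe with a single writev and scatters the bytes back into two buffers with a single readv. The function name vector_round_trip and the sample strings are assumptions made for this example.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather two separate buffers into a pipe with one writev() call, then
 * scatter the bytes back into two caller-supplied buffers with one readv().
 * Returns the number of bytes read back, or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t vector_round_trip(char *head_out, size_t head_len,
                          char *tail_out, size_t tail_len)
{
    int fds[2];
    char head[] = "hello, ";
    char tail[] = "world";
    struct iovec out[2] = {
        { .iov_base = head, .iov_len = strlen(head) },
        { .iov_base = tail, .iov_len = strlen(tail) },
    };
    struct iovec in[2] = {
        { .iov_base = head_out, .iov_len = head_len },
        { .iov_base = tail_out, .iov_len = tail_len },
    };
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    n = writev(fds[1], out, 2);       /* one call, two source buffers */
    if (n >= 0)
        n = readv(fds[0], in, 2);     /* one call, two destination buffers */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Each struct iovec entry describes one buffer (base address and length); the count argument in the driver prototypes corresponds to the number of such entries.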
Chapter 5: Concurrency and Race Conditions
Concurrency and Its Management
Semaphores and Mutexes
a semaphore is a single integer value combined with a pair of functions that are typically called P and V. A process wishing to enter a critical section will call P on the relevant semaphore; if the semaphore’s value is greater than zero, that value is decremented by one and the process continues. If, instead, the semaphore’s value is 0 (or less), the process must wait until somebody else releases the semaphore. Unlocking a semaphore is accomplished by calling V; this function increments the value of the semaphore and, if necessary, wakes up processes that are waiting.
The Linux Semaphore Implementation
In Linux, the P function is called down (or some variation of that name): it decrements the value of the semaphore and, perhaps after putting the caller to sleep for a while to wait for the semaphore to become available, grants access to the protected resources.
void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_trylock(struct semaphore *sem);
- Using down_interruptible requires some extra care, however: if the operation is interrupted, the function returns a nonzero value, and the caller does not hold the semaphore. Proper use of down_interruptible requires always checking the return value and responding accordingly.
- down_trylock never sleeps; if the semaphore is not available at the time of the call, down_trylock returns immediately with a nonzero return value.
- Once up has been called, the caller no longer holds the semaphore.
void up(struct semaphore *sem);
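The rule about checking the return value of down_interruptible can be sketched as follows. So that the fragment compiles outside the kernel, struct semaphore, down_interruptible, and up are replaced here by trivial userspace stand-ins; those stubs, the struct my_dev type, and my_dev_set are assumptions for illustration, not kernel code. Only the shape of my_dev_set reflects the pattern a driver would follow.

```c
#include <errno.h>

/* --- Userspace stand-ins (assumptions, not the kernel implementation) --- */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512   /* kernel-internal code, absent from userspace <errno.h> */
#endif

struct semaphore { int count; };

/* Stub: succeeds if the semaphore is free; otherwise pretends that a signal
 * interrupted the wait and returns nonzero, without taking the semaphore. */
static int down_interruptible(struct semaphore *sem)
{
    if (sem->count > 0) {
        sem->count--;
        return 0;
    }
    return -EINTR;
}

static void up(struct semaphore *sem)
{
    sem->count++;
}

/* --- The pattern itself --- */
struct my_dev {               /* hypothetical device structure */
    struct semaphore sem;
    int value;
};

int my_dev_set(struct my_dev *dev, int value)
{
    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;  /* interrupted: we do NOT hold the semaphore */

    dev->value = value;       /* critical section */

    up(&dev->sem);            /* every successful down is paired with an up */
    return 0;
}
```

The important detail is the early return: on a nonzero result the code must neither touch the protected data nor call up.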
Reader/Writer Semaphores
rwsem (or “reader/writer semaphore”) (<linux/rwsem.h>)
void down_read(struct rw_semaphore *sem);
int down_read_trylock(struct rw_semaphore *sem);
void up_read(struct rw_semaphore *sem);
void down_write(struct rw_semaphore *sem);
int down_write_trylock(struct rw_semaphore *sem);
void up_write(struct rw_semaphore *sem);
void downgrade_write(struct rw_semaphore *sem);
An rwsem allows either one writer or an unlimited number of readers to hold the semaphore. Writers get priority; as soon as a writer tries to enter the critical section, no readers will be allowed in until all writers have completed their work. This implementation can lead to reader starvation—where readers are denied access for a long time—if you have a large number of writers contending for the semaphore. For this reason, rwsems are best used when write access is required only rarely, and writer access is held for short periods of time.
Completions
Completions are a lightweight mechanism with one task: allowing one thread to tell another that the job is done. (<linux/completion.h>)
DECLARE_COMPLETION(comp);
ssize_t complete_read(struct file *filp, char __user *buf, size_t count,
                      loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    wait_for_completion(&comp);
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}
ssize_t complete_write(struct file *filp, const char __user *buf, size_t count,
                       loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    complete(&comp);
    return count; /* succeed, to avoid retrial */
}
A typical use of the completion mechanism is with kernel thread termination at module exit time. In the prototypical case, some of the driver's internal work is performed by a kernel thread in a while (1) loop. When the module is ready to be cleaned up, the exit function tells the thread to exit and then waits for completion. To this end, the kernel includes a specific function to be used by the thread:
void complete_and_exit(struct completion *c, long retval);
Spinlocks
- Unlike semaphores, spinlocks may be used in code that cannot sleep, such as interrupt handlers. When properly used, spinlocks offer higher performance than semaphores in general.
- A spinlock is a mutual exclusion device that can have only two values: “locked” and “unlocked.”
Introduction to the Spinlock API
<linux/spinlock.h>
spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
void spin_lock_init(spinlock_t *lock);
void spin_lock(spinlock_t *lock);
void spin_unlock(spinlock_t *lock);
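To make the "locked"/"unlocked" idea concrete, here is a minimal userspace model of a test-and-set spinlock built on C11 atomics. It is only a sketch of the concept: the kernel's spinlock_t is implemented differently (and also disables preemption), and every toy_ name below is an assumption for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;   /* set = "locked", clear = "unlocked" */
} toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. test-and-set returns the previous value, so "false" means
 * the lock was free and now belongs to us. */
bool toy_spin_trylock(toy_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is obtained; the caller never sleeps. */
void toy_spin_lock(toy_spinlock *lock)
{
    while (!toy_spin_trylock(lock))
        ;
}

void toy_spin_unlock(toy_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Because a waiter burns CPU cycles in the loop, the model also shows why the time spent holding such a lock should be as short as possible.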
Spinlocks and Atomic Context
- the core rule that applies to spinlocks is that any code must, while holding a spinlock, be atomic. It cannot sleep; in fact, it cannot relinquish the processor for any reason except to service interrupts (and sometimes not even then).
- Any time kernel code holds a spinlock, preemption is disabled on the relevant processor.
- The last important rule for spinlock usage is that spinlocks must always be held for the minimum time possible.
The Spinlock Functions
- spin_lock_irqsave disables interrupts (on the local processor only) before taking the spinlock; the previous interrupt state is stored in flags.
- If you are absolutely sure nothing else might have already disabled interrupts on your processor (or, in other words, you are sure that you should enable interrupts when you release your spinlock), you can use spin_lock_irq instead and not have to keep track of the flags.
- spin_lock_bh disables software interrupts before taking the lock, but leaves hardware interrupts enabled.
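The difference between the irqsave and irq variants shows up when interrupts were already disabled before the lock was taken. The sketch below is a userspace model under loud assumptions: one CPU whose interrupt-enable state is a plain flag, and model_ functions that only mimic the bookkeeping of the kernel calls (the real spin_lock_irqsave is a macro that takes flags by name, not by pointer). None of these names exist in the kernel.

```c
#include <stdbool.h>

/* Model (assumption): the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

typedef struct { bool held; } model_lock;   /* stand-in for spinlock_t */

/* Like spin_lock_irqsave: remember the old state, then disable. */
void model_lock_irqsave(model_lock *l, unsigned long *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
    l->held = true;
}

/* Like spin_unlock_irqrestore: put back whatever state we found. */
void model_unlock_irqrestore(model_lock *l, unsigned long flags)
{
    l->held = false;
    irqs_enabled = (flags != 0);
}

/* Like spin_lock_irq / spin_unlock_irq: nothing is saved, so the unlock
 * enables interrupts unconditionally. */
void model_lock_irq(model_lock *l)
{
    irqs_enabled = false;
    l->held = true;
}

void model_unlock_irq(model_lock *l)
{
    l->held = false;
    irqs_enabled = true;
}

bool model_irqs_enabled(void) { return irqs_enabled; }
void model_set_irqs(bool on)  { irqs_enabled = on; }
```

In this model, if interrupts are off on entry, the irqsave/irqrestore pair leaves them off, while the irq pair turns them on at unlock time, which is exactly the case the second bullet warns about.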
Reader/Writer Spinlocks
Locking Traps
Lock Ordering Rules
- when multiple locks must be acquired, they should always be acquired in the same order.
- If you must obtain a lock that is local to your code (a device lock, say) along with a lock belonging to a more central part of the kernel, take your lock first. If you have a combination of semaphores and spinlocks, you must, of course, obtain the semaphore(s) first; calling down (which can sleep) while holding a spinlock is a serious error.
Fine- Versus Coarse-Grained Locking
Alternatives to Locking
Lock-Free Algorithms
Atomic Variables
<asm/atomic.h>
Bit Operations
<asm/bitops.h>
/* try to set lock */
while (test_and_set_bit(nr, addr) != 0) {
    wait_for_a_while();
}
/* do your work */
/* release lock, and check... */
if (test_and_clear_bit(nr, addr) == 0) {
    something_went_wrong(); /* already released: error */
}
seqlocks
Seqlocks work in situations where the resource to be protected is small, simple, and frequently accessed, and where write access is rare but must be fast.
<linux/seqlock.h>
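The idea behind a seqlock can be sketched with a sequence counter: the writer bumps the counter before and after each update, and a reader retries whenever the counter was odd or changed during its read. The following is a simplified userspace model using C11 atomics, not the kernel's seqlock_t; all toy_ names are assumptions, and it presumes that writers are already serialized against each other by some other lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_seq_pair {
    atomic_uint seq;   /* even: stable, odd: a write is in progress */
    atomic_int a, b;   /* the small protected payload */
};

void toy_seq_init(struct toy_seq_pair *p)
{
    atomic_init(&p->seq, 0);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
}

void toy_write_begin(struct toy_seq_pair *p) { atomic_fetch_add(&p->seq, 1); }
void toy_write_end(struct toy_seq_pair *p)   { atomic_fetch_add(&p->seq, 1); }

/* Writer side: mark the update window, change the data, close the window. */
void toy_write(struct toy_seq_pair *p, int a, int b)
{
    toy_write_begin(p);
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    toy_write_end(p);
}

/* One read attempt. Returns true only if a and b form a consistent snapshot;
 * a real reader would loop until this succeeds. */
bool toy_read_once(struct toy_seq_pair *p, int *a, int *b)
{
    unsigned start = atomic_load(&p->seq);

    if (start & 1)
        return false;                      /* writer is mid-update */
    *a = atomic_load(&p->a);
    *b = atomic_load(&p->b);
    return atomic_load(&p->seq) == start;  /* false if a writer slipped in */
}
```

Readers never block the writer, which is why the scheme suits data that is read often and written rarely.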
Read-Copy-Update
<linux/rcupdate.h>
Read-copy-update (RCU) is an advanced mutual exclusion scheme that can yield high performance in the right conditions.
- http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
- It is optimized for situations where reads are common and writes are rare. The resources being protected should be accessed via pointers, and all references to those resources must be held only by atomic code. When the data structure needs to be changed, the writing thread makes a copy, changes the copy, then aims the relevant pointer at the new version; thus, the name of the algorithm.
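The copy, change, and publish steps can be sketched in userspace with an atomic pointer. This covers only the pointer swap: it assumes a single writer, and it leaves out the part that makes real RCU hard, namely waiting until no reader can still hold the old version before freeing it. The struct config type and both function names are assumptions for this example, not kernel API.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config {            /* hypothetical protected data */
    int rate;
    int depth;
};

static _Atomic(struct config *) current_config;   /* starts out NULL */

/* Reader: reach the data only through the published pointer. */
const struct config *config_read(void)
{
    return atomic_load(&current_config);
}

/*
 * Writer: copy the current version, change the copy, then aim the pointer
 * at the new version. The previous version is handed back through old_out;
 * the caller may free it only once no reader can still be using it.
 * Returns 0 on success, -1 if allocation fails.
 */
int config_set_rate(int rate, struct config **old_out)
{
    struct config *old = atomic_load(&current_config);
    struct config *new_cfg = malloc(sizeof(*new_cfg));

    if (!new_cfg)
        return -1;

    if (old) {
        *new_cfg = *old;           /* copy */
    } else {
        new_cfg->rate = 0;
        new_cfg->depth = 0;
    }
    new_cfg->rate = rate;          /* update the copy */

    atomic_store(&current_config, new_cfg);   /* publish */
    *old_out = old;
    return 0;
}
```

A reader that fetched the pointer before the update keeps seeing a complete, unchanged old version, while later readers see the new one.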
Chapter 6: Advanced Char Driver Operations
Blocking I/O
Introduction to Sleeping
- never sleep when you are running in an atomic context.
- your driver cannot sleep while holding a spinlock, seqlock, or RCU lock.
- You also cannot sleep if you have disabled interrupts.
- It is legal to sleep while holding a semaphore, but you should look very carefully at any code that does so.
- when you wake up, you never know how long your process may have been out of the CPU or what may have changed in the meantime. The end result is that you can make no assumptions about the state of the system after you wake up, and you must check to ensure that the condition you were waiting for is, indeed, true.
- your process cannot sleep unless it is assured that somebody else, somewhere, will wake it up.
Simple Sleeping
wait_event(queue, condition)
wait_event_interruptible(queue, condition)
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)
void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);
Blocking and Nonblocking Operations
The behavior of read and write is different if O_NONBLOCK is specified. In this case, the calls simply return -EAGAIN (“try it again”) if a process calls read when no data is available or if it calls write when there’s no space in the buffer.
Sometimes, however, opening the device requires a long initialization, and you may choose to support O_NONBLOCK in your open method by returning immediately with -EAGAIN if the flag is set, after starting the device initialization process. The driver may also implement a blocking open to support access policies in a way similar to file locks.
Only the read, write, and open file operations are affected by the nonblocking flag.
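From the application's side, the -EAGAIN returned by a driver method surfaces as a read or write call that returns -1 with errno set to EAGAIN. The sketch below shows this with an ordinary pipe standing in for a device; read_nonblocking is a hypothetical helper name, and the pipe is an assumption chosen so the example runs without any special hardware.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Switch fd to nonblocking mode and attempt one read.
 * Returns the byte count, 0 at end of file, or a negative errno value
 * (-EAGAIN when no data is available yet).
 */
ssize_t read_nonblocking(int fd, void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL);
    ssize_t n;

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;

    n = read(fd, buf, len);
    return n < 0 ? -errno : n;
}
```

A caller that sees -EAGAIN is expected to try again later, typically after poll or select reports that the descriptor has become readable.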
Advanced Sleeping
poll and select
- Call poll_wait on one or more wait queues that could indicate a change in the poll status. If no file descriptors are currently available for I/O, the kernel causes the process to wait on the wait queues for all file descriptors passed to the system call.
- Return a bit mask describing the operations (if any) that could be immediately performed without blocking.
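The bit mask returned by the driver's poll method is what a userspace caller eventually sees in revents. As a small illustration of that user-visible side (not of the driver method itself), the sketch below asks poll whether a descriptor is readable right now; readable_now is a hypothetical helper name.

```c
#include <poll.h>

/* Returns 1 if fd can be read without blocking, 0 if not, -1 on error. */
int readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 0);   /* timeout 0: report status, never sleep */

    if (ready < 0)
        return -1;
    return (ready > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

With a nonzero timeout, the same call would put the process to sleep on the wait queues the driver registered through poll_wait until the status changes.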
Interaction with read and write
Flushing pending output
int (*fsync) (struct file *file, struct dentry *dentry, int datasync);
The Underlying Data Structure
Asynchronous Notification
User programs have to execute two steps to enable asynchronous notification from an input file.
- First, they specify a process as the “owner” of the file. When a process invokes the F_SETOWN command using the fcntl system call, the process ID of the owner process is saved in filp->f_owner for later use. This step is necessary for the kernel to know just whom to notify.
- In order to actually enable asynchronous notification, the user programs must set the FASYNC flag in the device by means of the F_SETFL fcntl command.
- After these two calls have been executed, the input file can request delivery of a SIGIO signal whenever new data arrives. The signal is sent to the process (or process group, if the value is negative) stored in filp->f_owner.
e.g.
signal(SIGIO, &input_handler); /* dummy sample; sigaction( ) is better */
fcntl(STDIN_FILENO, F_SETOWN, getpid( ));
oflags = fcntl(STDIN_FILENO, F_GETFL);
fcntl(STDIN_FILENO, F_SETFL, oflags | FASYNC);
The Driver’s Point of View
- When F_SETOWN is invoked, nothing happens, except that a value is assigned to filp->f_owner.
- When F_SETFL is executed to turn on FASYNC, the driver’s fasync method is called. This method is called whenever the value of FASYNC is changed in filp->f_flags to notify the driver of the change, so it can respond properly. The flag is cleared by default when the file is opened.
- When data arrives, all the processes registered for asynchronous notification must be sent a SIGIO signal.
Access Control on a Device File
// TODO