PThread - jwyx/ForFun GitHub Wiki
目的
B. Nichols, D. Buttlar, and J. P. Farrell,
Pthreads Programming: A POSIX Standard for Better Multiprocessing, O'Rielly, 1996
读书笔记:记录重点和难点,方便以后回忆
FAQ
UNIX的fork生成child process, 不是thread
Thread的生成是由PThread library来实现
exit()退出process; pthread_exit()退出thread
Pthread
a lightweight, easy-to-use, and portable mechanism for speeding up applications
Why Threads?
Thread model takes a process and divides it into two parts:
1. contains resources used across the whole program, such as program instructions and global data
=> process
2. contains information related to the execution state, such as a program counter and a stack
=> a thread
Layout of program in virtual address space:
1. Readonly text area
2. Read-write area for global data
3. Heap area for dynamically allocated memory
4. Stack area on which the automatic variables of the current procedure are kept
below similar information for the procedure that called it
E.g. function arguments, information needed to link it to the procedure that called it
Each of these procedure-specific areas is known as a stack frame,
and one exists for each procedure in the programs that remains active
5. Machine registers (one copy for each thread)
A program counter [PC] to currently executing instruction
A stack pointer [SP] to the current stack frame
6. Process-specific include tables, maintained by the operating system,
to track system-supplied resources
Open files (file descriptor)
Sockets
Locks
Signals
Identity e.g. PID, UID, GID
Potential parallelism:
the property of a program - that statements can be executed in any order without changing the result
Threads:
a means to identify and utilize potential parallelism in a program.
enhance its performance and to efficiently structure programs that do more than one thing at a time
Why exploit potential parallelism:
1. Make our program run faster on a multiprocessor
2. Overlap I/O (split I/O-intensive with CPU-intensive)
3. Asynchronous events (indeterminate occurrence of events, such as network, input, signal)
4. Real-time scheduling (independent scheduling priorities and policies)
Concurrent programming interface:
1. Multiple processes:
Supported by UNIX
Allow user programs to create multiple processes
Provide services the processes can use to communicate with each other
Fork(): create a new process
The fork call creates a child process that is identical toits parent process
at the time the paret called fork with the following differences:
1. The child has its own process identifier, or PID
2. The fork call provides PID of child to parent process and 0 to child process
Parent-child relationship:
The child begins executing as if it were returning from the fork call issued by its parent.
Because it starts out as a nearly identical copy of its parent. The initial values of all of
its variables and the state of its system resources are the same as those of its parent.
After the program forks into two different processes, the parent and child execute
independently unless you add explicit synchronization.
2. Multi-threads
The program starts in a single thread, referred as the main thread.
But for the most part, the operating system does not recognize any thread as being a parent
or master thread - from its viewpoint, all threads in a process are equal.
Using Pthread function call, the creator thread spawns threads. The main thread waits for
both threads to finish.
pthread_create(): create a new thread
Arguments:
1. A pointer to a buffer to which pthread_create returns a value
that identifies the newly created thread.
2. A pointer to a structure known as a thread attribute object.
NULL means default characteristics for the new thread.
3. A pointer to the routine at which the new thread will start executing.
4. A pointer to a parameter to be passed to the routine at which the new thread starts.
Return value:
A zero value represents success
A nonzero value indicates and identifies an error
Peer-to-peer relationship:
The creator thread and spawned thread have exactly same properties for Pthreads.
The executing order depends on the default scheduling policies of the underlying Pthreads.
Why multi-threads?
The OS performs less work on behalf of a multithreaded program than it does for a multiprocess
program. [performance gain for the multithreaded program]
Parallel vs. Concurrent programming:
concurrent programming: the tasks can occur in any order
parallel programming: the simultaneous execution of concurrent tasks on different processes
All parallel programming is concurrent, but not all concurrent programming is parallel.
Pthreads standard specifies concurrency;
it allows parallelism to be at the option of system implementation
As a programmer, all you can do is define those tasks, or threads, that can occur concurrently.
Synchronization:
Inter-processes:
waitpid(): prevent the parent process from exit before all children processes exit
By suspending its caller until a child process exit
Resource sharing:
The parent initializes a region of shared memory from the system using shmget() and shmat()
Communication:
Use any of UNIX Interprocess Communication (IPC) mechanism:
sockets, shared memory, and messages
However, all , all types of IPC involve a call into the OS
- to initialize shared memory or a message structure
so this makes communication between processes more expensive than communication
between threads
Inter-threads:
pthread_join(): provide synchronization for threads; can between any two threads
By suspending its caller until another thread exit
Or, pthread_mutex_lock
Resource sharing:
Share memory using global variable
Communication:
Primitive mechanism
Scheduling:
The scheduler synchronizes the task's access to a shared resource: the system's CPUs.
POSIX defines some scheduling calls as an optional part of its Pthreads package,
allowing you to select scheduling policies and priorities for threads
Who am I? Who are you?
Use pthread_t to determine a thread's identity using pthread_self() and pthread_equal()
The Pthreads standard leaves the exact definition of the pthread_t type up to system implementors.
E.g. pthread_t thread = pthread_self();
pthread_equal(io_thread, thread);
Terminate thread execution:
A process terminates when it comes to the end of main(). At that time, the OS reclaims the process's
resources and stores its exit status. Similarly, a thread exits when it comes to the end of the
routine in which it was started. (By the way, all threads expire when the process in which they
run exits.) When a thread terminates, the Pthreads library reclaims any process or system resources
the thread was using and stores its exit status.
pthread_exit(): a thread can explicitly exit with a call of this function
pthread_cancel(): terminate another thread
In any of these cases, the Pthreads library runs any routines in its cleanup stack
and any destructors in keys in which it has store values.
Exit status and Return values:
The Pthreads library may or may not save the exit status of a thread when the thread exits,
depending upon whether the thread is joinable or detached.
- joinable thread: default state; does have its exit status saved.
- detached thread: does not.
Detaching a thread gives the library a break and lets it immediately reclaim the resources
associated with the thread (??). Because the library will not have an exit status for a detached
thread, cannot use a pthread_join() to join it.
What is the exit status f a thread?
- terminate explicitly with a call to pthread_exit(), the argument to the call becomes exit status
- if not call pthread_exit(), the return value of the routine becomes its exit status
With Pthread standard, the thread-start routine (specified in the pthread_create call) return
a (void *) type. Just remember to cast the return value as a (void *) type and avoid using a
value that conflicts with PTHREAD_CANCELED, the only status value that the Pthreads library itself
may return. (Because Pthread implementations cannot define PTHREAD_CANCELED) as a valid address or
as NULL, so you're always safest when returning an address.
Note: pthread_join() is in order. Its purpose is to allow a single thread to wait on another's
termination. The result of having multiple threads concurrently call pthread_join is undefined
in the Pthreads standard.
Pthreads library calls and errors:
Most Pthreads library calls return zero on success and an error number otherwise.
- Errors numbers are defined in the errno.h; The Pthreads standard doesn't require library calls
to set errno, the global variable traditionally used by UNIX and POSIX.1 calls to deliver an error
to their callers.
- The two Pthread library calls that don't return an error code upon failure
1. pthread_getspecific() return NULL if it's unsuccessful
2. pthread_self() can always succeeds
If your platform supports a routine to convert error numbers to a readable string,
such as the XPG4 call, strerror().
Why use threads over processes?
For process:
create a new process can be expensive
- take time (A system call + maybe context-switch)
- take memory (The entire process must be replicated)
the cost of interprocess communication and synchronization of shared data
- take time (need system call)
For thread:
create a new thread
- some, if not all, of the work of creating a thread is done in user space
- without replicating an entire process
easy synchronize
- simply monitoring a variable in user space
Techniques used to obtain concurrency:
- potential parallelism, overlapping I/O, asynchronous events, and real-time scheduling
UNIX offers many disjointed mechanisms to accomplish them between processes:
- select()
- signal
- nonblocking I/O
- setjmp()/longjmp(),
plus many call for real time (such as aio_read and aio_write) and parallel processing
Pthread offers a clean, consistent way to address all of these motivations.
Choose which application to thread:
multi-threaded program can concurrently execute tasks,
but also introduce a certain amount of overhead.
What makes concurrency possible?
1. consist of some independent tasks - tasks that do not depend on the completion of
other tasks to proceed.
2. be confident that concurrent execution of these tasks would be faster than their serial
execution.
- For a uni-processing system, the concurrent execution of independent tasks will be faster
than their serial execution if at least one of these tasks issues a lot of I/O request;
=> look at overlapping I/O and asynchronous events
- For a multi-processing system, even CPU-bound tasks can benefit from concurrency;
Design threaded programs
How to identify a task that is suitable for threading:
- It is independent of other tasks.
Does the task use separate resources from other tasks?
Does its execution depend on the results of other tasks?
Do other tasks depend on its results?
=> maximize concurrency and minimize the need for synchronization
- It can become blocked in potentially long waits.
- It can use a lot of CPU cycles.
- It must respond to asynchronous events.
- Its work has greater or lesser importance than other work in the application.
Models
Boss/worker model
- thread pool
- dynamically create thread
Peer model (a.k.a Workcrew model)
- suitable for application that have a fixed or well-defined set of inputs
Pipeline model
- a long stream of input
- a series of suboperations (known as stages or filters) through wich every unit of input
must be processed
- each processing stage can handle a different unit of input at a time
=> should balance the work to be performed across all stages
Buffer data between threads
Producer: the thread that passes the data to another
Consumer: the thread that receives the data
producer/consumer relationship:
Buffer: in the process's global data region
Lock: mutex
Suspend/resume mechanism: condition variable
State information: indicate how much data is in the buffer
Note: the code that consumer wakes up to grab resources in buffer should wrapped with
while(/* buffer is not empty */) {}
Producer() {
...
lock shared buffer
place results in buffer
unlock buffer
wake up any consumer threads
...
}
Consumer() {
...
lock shared buffer
while state is not full { /* Here should be while !! */
release lock and sleep
awake and reacquire lock
}
remove contents
unlock buffer
...
}
Double buffering: both threads act as producer and consumer to each other
Common problems:
The basic rule for managing shared resoureces:
- obtain a lock before accessing the resource
- release the lock when you are finished with the resource
Performance:
Some of costs:
- the memory and CPU cycles required to manage each thread, including the structures the OS
uses to manage them, plus the overhead for the Pthreads library and any special code in the
OS that supports the library
- the CPU cycles spent fro synchronization calls that enforce orderly access to shared data
- the time during which the application is inactive while one thread is waiting on another
thread
Synchronize Pthreads
Synchronization
Race condition
Synchronization tools
pthread_join():
allows one thread to suspend execution until another has terminated
mutex variable:
mutually exclusive lock
only one thread at a time can hold the lock and access the data it protects
condition variable:
provide a way of naming an event in which threads have a general interest
pthread_once():
ensure that initialization routines get executed once and only once when called by
multiple threads
some of the common synchronization mechanisms:
reader/writer exclusion:
allow multiple threads to read data concurrently
but ensure that any thread writing to the data has exclusive access
thread-safe data structure:
build synchronization primitives into a complex data structure
so that each time you access it you do not need to make a separate call
to synchronize concurrent access.
semaphores:
have semaphore if the platform supports POSIX real-time extensions (POSIX.1b)
Mutex variable:
mutual exclusion / mutex
critical region:
the piece of code that must be executed atomically
The single statement in C may be no longer atomic at the hardware level.
1. create & initialize a mutex for each resource to be protected,
pthread_mutex_t
pthread_mutex_attr_t
pthread_mutexattr_init
pthread_mutexattr_destroy
pthread_mutexattr_setshared // PTHREAD_PROCESS_SHARED | PTHREAD_PROCESS_PRIVATE
E.g.
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
// Or
pthread_mutex_t *mutexPtr;
mutexPtr = (pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
pthread_mutex_init(mutexPtr, NULL);
2. pthread_mutex_lock(pthread_mutex_t*)
Block the calling thread until it's granted the lock
If the mutex is unlocked at the time of the call, the lock's granted immediately;
otherwise, it's granted after it's released by the thread that's holding it.
pthread_mutex_trylock()
It does not suspend its caller if another thread already holds the mutex
It returns immediately, indicating that the mutex is currently locked
Represent a kind of polling for a resource - repeatedly trying and backing off until
the resource is obtained.
This polling leads to some overhead and, worse, potential resource starvation.
Acceptable situations:
- real-time programmers to poll for state changes
- detect and avoid deadlock in locking hierarchies and priority inversion situations ??
3. pthread_mutex_unlock(pthread_mutex_t*)
The thread that holds a lock on a mutex can assume that:
- no other thread will write to the data
- no other thread will read the data while it is in some sort of intermediate state
Mutex is best used for controlling direct access to data and resources.
But for event synchronization, condition variable is better.
Shortcomings:
1. Mutex is the most restrictive type of access control.
But sometimes, reader/writer locks provides better access control:
- the writer should be allowed access whenever any readers are using the data
- when a writer is using it, neither readers nor other writers are allowed in.
=> can "roll your own" using mutex and condition variable
2. Recursive lock
A lock that can be relocked any number of times by its current holder.
- Imagine the Pthreads library associating an internal counter with a recursive mutex
to count the number of times its current holder has called pthread_mutex_lock
- Each time the current holder calls pthread_mutex_unlock, the library would decrement
this counter. The lock would not be released until the call that brings the count down
to zero is issued.
Useful for a thread that makes a number of nested calls to a routine that locks and
manipulates a resource
Contention for a mutex
If more than one threads is waiting for a locked mutex, which thread is the first to be
granted the lock once its released?
=> The thread with the highest priority gets the lock.
The use of priority in a multithreaded program can lead to a classic multiprocessing problem:
Priority Inversion:
Involve a low priority thread that holds a lock that a higher priority thread wants.
Because the higher priority thread cannot continue until the lower priority thread
release the lock, each thread is actually treated as if it had the inverse of its intend
priority.
=> The best way to avoid inversion is to minimize the degree to which threads of
different priority share the same locks. But this may not always be possible.
=> Eliminate the risk of priority inversion by using mutex attributes.
When designing the synchronization for our data structures:
- Eliminate all race conditions
- Do not introduce deadlock
Access Patterns and Granularity:
Your choices for optimizing performance in a multithreaded program are tied to how its
threads access the shared data on the list.
Lock Granularity:
The level at which we apply locks to our shared data, (and, thus, the number of locks
we use to protect the data)
Coarse-grain locking vs. Fine-grain locking
- Fine-grain improves concurrency, but more coding and overhead in synchronization calls
Locking Hierarchies
If your shared data has some hierarchical structure - for instance, it's a tree with some
depth to it - you may want to allow threads to lock separate subtrees of the structure
simultaneously. This assumes a fine-grain lock design.
If we allow threads to acquire these locks in any order they please, the kind of deadlock
known as a "deadly embrace" can occur.
=> Enforce a fixed locking hierarchy. To access data at any given level, all threads must
obtain the lock at each lower level in exactly the same order.
=> Pthread don't have built-in support for locking hierarchies.
Sharing a mutex among processes
A mutex has a single attributes that determines whether or not it can be seen by threads
in other process: process-shared. (A mutex object also has two attributes that assit in
scheduling threads.) If your platform allows you to set the process-shared attribute, the
compile-time constant _POSIX_THREAD_PROCESS_SHARED will be TRUE.
E.g. Chapter3 Mutex Variable Example 3-6
- Once a process has multiple threads, forking has many pitfalls. So we took care to
initialize the mutex before forking and to fork before we created multiple threads from
multiple processes.
- For strict, process-to-process synchronization, use System V or POSIX.1b semaphores. For
thread-to-thread synchronization across processes, you can use POSIX.1b semaphores as an
alternative to process-shared mutexes.
Condition Variable
Mutex lets threads synchronize by controlling their access to data.
Condition variable lets threads synchronize on the value of data.
Cooperating threads wait until data reaches a particular stat or until a certain event occurs.
CV provides a kind of notification system among threads.
If Pthreads didn't offer CV, but only provided Mutexes:
Threads would need to poll the variables to determine when it reached a certain state.