PThread - jwyx/ForFun GitHub Wiki

Purpose

B. Nichols, D. Buttlar, and J. P. Farrell,
Pthreads Programming: A POSIX Standard for Better Multiprocessing, O'Reilly, 1996
Reading notes: a record of the key and difficult points, for later review

FAQ

On UNIX, fork() creates a child process, not a thread.
Threads are created through the Pthreads library.

exit() terminates the whole process; pthread_exit() terminates only the calling thread.

Pthread

a lightweight, easy-to-use, and portable mechanism for speeding up applications

Why Threads?

Thread model takes a process and divides it into two parts:
    1. contains resources used across the whole program, such as program instructions and global data
       => process
    2. contains information related to the execution state, such as a program counter and a stack
       => a thread

Layout of program in virtual address space:
    1. Readonly text area
    2. Read-write area for global data
    3. Heap area for dynamically allocated memory
    4. Stack area, on which the automatic variables of the current procedure are kept
       below similar information for the procedure that called it
       E.g. function arguments, and the information needed to link it to the procedure that called it

       Each of these procedure-specific areas is known as a stack frame,
       and one exists for each procedure in the program that remains active
    5. Machine registers (one copy for each thread)
           A program counter [PC] to currently executing instruction
           A stack pointer [SP] to the current stack frame
    6. Process-specific tables, maintained by the operating system,
       to track system-supplied resources:
           Open files (file descriptor)
           Sockets
           Locks
           Signals
           Identity e.g. PID, UID, GID

Potential parallelism:
    the property of a program whereby statements can be executed in any order without changing the result

Threads:
    a means to identify and exploit the potential parallelism in a program,
    both to enhance its performance and to efficiently structure programs that do more than one thing at a time

Why exploit potential parallelism:
    1. Make our program run faster on a multiprocessor
    2. Overlap I/O          (split I/O-intensive with CPU-intensive)
    3. Asynchronous events  (indeterminate occurrence of events, such as network, input, signal)
    4. Real-time scheduling (independent scheduling priorities and policies)

Concurrent programming interface:
    1. Multiple processes:
        Supported by UNIX
        Allow user programs to create multiple processes
        Provide services the processes can use to communicate with each other
        fork(): create a new process
            The fork call creates a child process that is identical to its parent process
            at the time the parent called fork, with the following differences:
                1. The child has its own process identifier, or PID
                2. fork returns the child's PID to the parent process and 0 to the child process
        Parent-child relationship:
            The child begins executing as if it were returning from the fork call issued by its parent,
            because it starts out as a nearly identical copy of its parent: the initial values of all of
            its variables and the state of its system resources are the same as those of its parent.
            After the program forks into two different processes, the parent and child execute
            independently unless you add explicit synchronization.
    2. Multi-threads
        The program starts in a single thread, referred to as the main thread.
        But for the most part, the operating system does not recognize any thread as being a parent
        or master thread; from its viewpoint, all threads in a process are equal.
        The creator thread spawns new threads with the Pthreads call pthread_create(); in the book's
        example, the main thread spawns two threads and then waits for both of them to finish.
        pthread_create(): create a new thread
            Arguments:
                1. A pointer to a buffer to which pthread_create returns a value
                that identifies the newly created thread.
                2. A pointer to a structure known as a thread attribute object.
                NULL means default characteristics for the new thread.
                3. A pointer to the routine at which the new thread will start executing.
                4. A pointer to a parameter to be passed to the routine at which the new thread starts.
            Return value:
                A zero value represents success
                A nonzero value indicates and identifies an error
        Peer-to-peer relationship:
            The creator thread and the spawned threads have exactly the same standing as far as Pthreads is concerned.
            Their execution order depends on the default scheduling policy of the underlying Pthreads implementation.
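    A minimal sketch of the four pthread_create() arguments and of the creator waiting with pthread_join().
    The names `square`, `struct square_job`, and `run_two_squares` are hypothetical, chosen only for
    illustration. The job records live on the creator's stack, which is safe here only because the
    creator joins both threads before returning.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument/result record handed to each thread (argument 4). */
struct square_job {
    int input;
    int output;
};

/* Start routine (argument 3): takes and returns void *. */
static void *square(void *arg) {
    struct square_job *job = arg;
    job->output = job->input * job->input;
    return NULL;
}

/* Spawns two peer threads, then waits for both. Returns 0 on success,
   otherwise the error number reported by pthread_create(). */
int run_two_squares(int a, int b, int *out_a, int *out_b) {
    pthread_t t1, t2;                       /* argument 1: receives each new thread's ID */
    struct square_job j1 = { a, 0 }, j2 = { b, 0 };
    int rc;

    /* argument 2: NULL attribute object => default thread characteristics */
    if ((rc = pthread_create(&t1, NULL, square, &j1)) != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return rc;                          /* nonzero return identifies the error */
    }
    if ((rc = pthread_create(&t2, NULL, square, &j2)) != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        pthread_join(t1, NULL);             /* don't leave the first thread behind */
        return rc;
    }
    /* The creator waits for both; which finishes first is up to the scheduler. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *out_a = j1.output;
    *out_b = j2.output;
    return 0;
}
```

    Because the two peers are scheduled independently, either may finish first; only after both joins
    are their results safe to read.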

Why multi-threads?
    The OS performs less work on behalf of a multithreaded program than it does for a multiprocess
    program, which is where the multithreaded program's performance gain comes from.

Parallel vs. Concurrent programming:
    concurrent programming: the tasks can occur in any order
    parallel programming:   the simultaneous execution of concurrent tasks on different processors

    All parallel programming is concurrent, but not all concurrent programming is parallel.
    
    The Pthreads standard specifies concurrency;
    whether threads actually run in parallel is left to the system implementation.

    As a programmer, all you can do is define those tasks, or threads, that can occur concurrently.

Synchronization:
    Inter-processes:
        waitpid():      suspends its caller until a child process exits,
                        which keeps the parent from exiting before its children do
        Resource sharing:
            The parent obtains a region of shared memory from the system with shmget() and attaches it with shmat()
        Communication:
            Use any of the UNIX Interprocess Communication (IPC) mechanisms:
                sockets, shared memory, and messages
            However, all types of IPC involve a call into the OS
                - to initialize shared memory or a message structure,
                  which makes communication between processes more expensive than communication
                  between threads

    Inter-threads:
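        pthread_join(): suspends its caller until another thread exits;
                        it can be used between any two threads, not only a creator and the thread it created
        Or, mutexes (pthread_mutex_lock()/pthread_mutex_unlock())
        Resource sharing:
            Threads share memory simply through global variables
            (all threads in a process share its address space)
        Communication:
            Primitive mechanisms: shared data guarded by Pthreads synchronization calls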
        Scheduling:
            The scheduler synchronizes the task's access to a shared resource: the system's CPUs.
            POSIX defines some scheduling calls as an optional part of its Pthreads package,
            allowing you to select scheduling policies and priorities for threads
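    The inter-process side can be sketched with fork() and waitpid(), assuming a POSIX system. The helper
    `fork_demo` and the global `copy_me` are hypothetical. The point is that the child changes only its own
    copy of the global (threads would instead share one copy), and that the parent's waitpid() call blocks
    until the child has exited.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int copy_me = 10;   /* after fork(), parent and child each own a copy */

/* Forks a child that changes its copy of copy_me and exits with it; the
   parent blocks in waitpid() until the child has exited.
   Returns 0 on success, -1 on error. */
int fork_demo(int *child_view, int *parent_view) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                       /* fork failed */
    if (pid == 0) {                      /* child: fork() returned 0 */
        copy_me += 5;                    /* changes only the child's copy */
        _exit(copy_me);                  /* _exit: skip stdio flushing and atexit handlers */
    }
    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *child_view = WEXITSTATUS(status);   /* the child's value: 15 */
    *parent_view = copy_me;              /* the parent's value: still 10 */
    return 0;
}
```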

Who am I? Who are you?
    A thread's identity is a pthread_t value: a thread gets its own ID with pthread_self()
    and compares two IDs with pthread_equal().
    The Pthreads standard leaves the exact definition of the pthread_t type up to system implementors,
    so IDs should not be compared with ==.
    E.g. pthread_t thread = pthread_self();
         pthread_equal(io_thread, thread);
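    A minimal sketch of pthread_self() and pthread_equal(), in the spirit of the io_thread example above.
    `main_tid`, `check_identity`, and `identity_demo` are hypothetical names.

```c
#include <pthread.h>

static pthread_t main_tid;   /* hypothetical: the creator's ID, like io_thread above */

static void *check_identity(void *arg) {
    int *result = arg;
    pthread_t me = pthread_self();
    /* pthread_t is opaque, so IDs are compared with pthread_equal(), not == */
    result[0] = pthread_equal(me, main_tid) != 0;        /* am I the creator? */
    result[1] = pthread_equal(me, pthread_self()) != 0;  /* am I myself? */
    return NULL;
}

/* Fills result[0] and result[1] from inside a new thread. Returns 0 on success. */
int identity_demo(int result[2]) {
    pthread_t worker;
    main_tid = pthread_self();
    if (pthread_create(&worker, NULL, check_identity, result) != 0)
        return -1;
    pthread_join(worker, NULL);
    return 0;
}
```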

Terminate thread execution:
    A process terminates when it comes to the end of main(). At that time, the OS reclaims the process's
    resources and stores its exit status. Similarly, a thread exits when it comes to the end of the
    routine in which it was started. (By the way, all threads expire when the process in which they
    run exits.) When a thread terminates, the Pthreads library reclaims any process or system resources
    the thread was using and stores its exit status.
    pthread_exit():   a thread can explicitly exit with a call of this function
    pthread_cancel(): terminate another thread
    In any of these cases, the Pthreads library runs any routines on the thread's cleanup stack
    and the destructors of any keys for which the thread has stored values.
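    A sketch of pthread_cancel() together with a cleanup-stack routine, assuming the default deferred
    cancellation type, under which the target acts on the request only at a cancellation point such as
    sleep(). The names `cancel_demo`, `long_task`, `on_cleanup`, and `cleanup_ran` are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

static int cleanup_ran = 0;            /* hypothetical flag read by the joiner */

/* Cleanup routine: runs if the thread is cancelled while it is pushed. */
static void on_cleanup(void *arg) {
    *(int *)arg = 1;
}

static void *long_task(void *arg) {
    (void)arg;
    pthread_cleanup_push(on_cleanup, &cleanup_ran);
    for (;;)
        sleep(1);                      /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);            /* never reached, but must pair with the push */
    return NULL;
}

/* Cancels a running thread; returns 1 if it ended with PTHREAD_CANCELED
   and its cleanup routine ran. */
int cancel_demo(void) {
    pthread_t t;
    void *status;
    cleanup_ran = 0;
    if (pthread_create(&t, NULL, long_task, NULL) != 0)
        return 0;
    pthread_cancel(t);                 /* ask another thread to terminate */
    pthread_join(t, &status);          /* wait until it has acted on the request */
    return status == PTHREAD_CANCELED && cleanup_ran == 1;
}
```

    pthread_cleanup_push() and pthread_cleanup_pop() may be implemented as macros that open and close a
    block, so they must appear as a pair in the same lexical scope, even when the pop is never reached.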

Exit status and Return values:
    The Pthreads library may or may not save the exit status of a thread when the thread exits,
    depending upon whether the thread is joinable or detached.
    - joinable thread: default state; does have its exit status saved.
    - detached thread: does not.
    Detaching a thread gives the library a break: it can reclaim the thread's resources as soon as
    the thread terminates, instead of keeping them until another thread joins it. Because the
    library will not have an exit status for a detached thread, you cannot use pthread_join() to join it.

    What is the exit status of a thread?
    - if it terminates explicitly with a call to pthread_exit(), the argument to that call becomes its exit status
    - if it does not call pthread_exit(), the return value of its start routine becomes its exit status

    Under the Pthreads standard, the thread-start routine (specified in the pthread_create call) returns
    a (void *) value. Just remember to cast the return value to (void *) and avoid using a
    value that conflicts with PTHREAD_CANCELED, the only status value that the Pthreads library itself
    may return. Because Pthreads implementations cannot define PTHREAD_CANCELED as a valid address or
    as NULL, you're always safest when returning an address.

    Note: pthread_join() is one-to-one. Its purpose is to allow a single thread to wait on another's
    termination. The result of having multiple threads concurrently call pthread_join() on the same
    thread is undefined in the Pthreads standard.
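    A sketch of the two ways a thread's exit status is set, assuming joinable (default) threads.
    `compute_sum`, `finish`, `exit_early`, `joined_sum`, and `exit_status_demo` are hypothetical names.
    The first thread returns the address of heap memory (an address on its own stack would be invalid once
    it ends); the second calls pthread_exit() from a nested routine, ending the thread without returning
    through its callers.

```c
#include <pthread.h>
#include <stdlib.h>

/* Status via return: the address of heap memory, never of the thread's stack. */
static void *compute_sum(void *arg) {
    int n = *(int *)arg;
    long *sum = malloc(sizeof *sum);
    if (sum == NULL)
        return NULL;
    *sum = 0;
    for (int i = 1; i <= n; i++)
        *sum += i;
    return sum;                        /* becomes the thread's exit status */
}

/* Status via pthread_exit(): its argument becomes the exit status. */
static void finish(void *status) {
    pthread_exit(status);              /* ends the thread from a nested call */
}

static void *exit_early(void *arg) {
    finish(arg);
    return NULL;                       /* never reached */
}

/* Returns 1 + 2 + ... + n as computed by a joined thread, or -1 on error. */
long joined_sum(int n) {
    pthread_t t;
    void *status;
    long result;
    if (pthread_create(&t, NULL, compute_sum, &n) != 0)
        return -1;
    pthread_join(t, &status);          /* joinable: the status was saved */
    if (status == NULL || status == PTHREAD_CANCELED)
        return -1;
    result = *(long *)status;
    free(status);
    return result;
}

/* Returns 1 if the joiner sees exactly the pointer given to pthread_exit(). */
int exit_status_demo(void) {
    static int marker = 42;
    pthread_t t;
    void *status;
    if (pthread_create(&t, NULL, exit_early, &marker) != 0)
        return 0;
    pthread_join(t, &status);
    return status == (void *)&marker;
}
```

    A detached thread (pthread_detach(), or the PTHREAD_CREATE_DETACHED attribute) could not be joined
    this way, since the library keeps no status for it.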

Pthreads library calls and errors:
    Most Pthreads library calls return zero on success and an error number otherwise.
    - Error numbers are defined in errno.h. The Pthreads standard doesn't require library calls
      to set errno, the global variable traditionally used by UNIX and POSIX.1 calls to deliver an error
      to their callers.
    - Two Pthreads library calls don't return an error code:
      1. pthread_getspecific() returns NULL if it's unsuccessful
      2. pthread_self() always succeeds

    To convert an error number into a readable string, use a routine such as the XPG4 call strerror(),
    if your platform supports one.
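    A sketch showing that a Pthreads call reports failure through its return value rather than errno:
    pthread_mutex_trylock() on a mutex that is already locked returns EBUSY, which strerror() turns into
    text. `trylock_error_demo` is a hypothetical name.

```c
#include <errno.h>     /* EBUSY and the other error numbers */
#include <pthread.h>
#include <stdio.h>
#include <string.h>    /* strerror() */

/* Returns the error number from a trylock on an already-locked mutex. */
int trylock_error_demo(void) {
    pthread_mutex_t m;
    int rc;

    if ((rc = pthread_mutex_init(&m, NULL)) != 0)
        return rc;
    pthread_mutex_lock(&m);
    rc = pthread_mutex_trylock(&m);    /* already locked: fails with EBUSY */
    if (rc != 0)                       /* the error comes back as the return value */
        fprintf(stderr, "pthread_mutex_trylock: %s\n", strerror(rc));
    else
        pthread_mutex_unlock(&m);      /* not expected: undo the extra lock */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```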

Why use threads over processes?
    For processes:
        creating a new process can be expensive
        - it takes time   (a system call, and maybe a context switch)
        - it takes memory (the entire process must be replicated)
        interprocess communication and synchronization of shared data are also costly
        - they take time  (each needs a system call)
    For threads:
        creating a new thread is cheaper
        - some, if not all, of the work of creating a thread is done in user space
        - there is no need to replicate an entire process
        synchronization is easy
        - a thread can simply monitor a variable in user space

Techniques used to obtain concurrency:
    - potential parallelism, overlapping I/O, asynchronous events, and real-time scheduling
    UNIX offers many disjoint mechanisms to accomplish them between processes:
        - select()
        - signals
        - nonblocking I/O
        - setjmp()/longjmp(),
          plus many calls for real-time (such as aio_read and aio_write) and parallel processing
    Pthreads offers a clean, consistent way to address all of these motivations.

Choose which application to thread:
    A multithreaded program can execute tasks concurrently,
    but threads also introduce a certain amount of overhead.

    What makes concurrency possible?
        1. The program must consist of some independent tasks: tasks that do not depend on the
           completion of other tasks to proceed.
        2. You must be confident that concurrent execution of these tasks would be faster than their
           serial execution.
           - On a uniprocessor system, the concurrent execution of independent tasks will be faster
             than their serial execution if at least one of these tasks issues a lot of I/O requests;
             => look at overlapping I/O and asynchronous events
           - On a multiprocessor system, even CPU-bound tasks can benefit from concurrency.

Design threaded programs

How to identify a task that is suitable for threading:
    - It is independent of other tasks.
        Does the task use separate resources from other tasks?
        Does its execution depend on the results of other tasks?
        Do other tasks depend on its results?
        => maximize concurrency and minimize the need for synchronization
    - It can become blocked in potentially long waits.
    - It can use a lot of CPU cycles.
    - It must respond to asynchronous events.
    - Its work has greater or lesser importance than other work in the application.

Models

    Boss/worker model
        - workers from a thread pool, or
        - workers created dynamically

    Peer model (a.k.a. workcrew model)
        - suitable for applications that have a fixed or well-defined set of inputs

    Pipeline model
        - a long stream of input
        - a series of suboperations (known as stages or filters) through which every unit of input
          must be processed
        - each processing stage can handle a different unit of input at a time
        => should balance the work to be performed across all stages
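    A sketch of the peer (workcrew) model on a fixed input, summing an array: one thread creates the other
    peers and then processes a slice itself, like any other peer. `NPEERS`, `struct slice`, `sum_slice`,
    and `peer_sum` are hypothetical; if a peer cannot be created, this sketch simply does that slice in the
    creating thread.

```c
#include <pthread.h>
#include <stddef.h>

#define NPEERS 4   /* hypothetical crew size */

struct slice {
    const int *data;
    size_t begin, end;
    long partial;          /* each peer writes only its own slot: no lock needed */
};

static void *sum_slice(void *arg) {
    struct slice *s = arg;
    s->partial = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->partial += s->data[i];
    return NULL;
}

/* Every peer runs the same routine on its own part of a fixed input. */
long peer_sum(const int *data, size_t n) {
    pthread_t tid[NPEERS];
    struct slice work[NPEERS];
    int started[NPEERS] = { 0 };
    size_t chunk = (n + NPEERS - 1) / NPEERS;

    for (int i = 0; i < NPEERS; i++) {         /* split the input up front */
        size_t b = (size_t)i * chunk, e = b + chunk;
        work[i].data = data;
        work[i].begin = b < n ? b : n;
        work[i].end = e < n ? e : n;
        work[i].partial = 0;
    }
    for (int i = 1; i < NPEERS; i++)
        started[i] = pthread_create(&tid[i], NULL, sum_slice, &work[i]) == 0;

    sum_slice(&work[0]);                       /* the creator acts as just another peer */
    long total = work[0].partial;
    for (int i = 1; i < NPEERS; i++) {
        if (started[i])
            pthread_join(tid[i], NULL);
        else
            sum_slice(&work[i]);               /* fallback: do the slice inline */
        total += work[i].partial;
    }
    return total;
}
```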

Buffer data between threads

    Producer: the thread that passes the data to another
    Consumer: the thread that receives the data

    producer/consumer relationship:

        Buffer:                     in the process's global data region
        Lock:                       mutex
        Suspend/resume mechanism:   condition variable
        State information:          how much data is in the buffer

        Note: after waking up, the consumer must re-check the buffer state in a while loop,
              e.g. while (/* buffer is empty */) { wait }, not in an if, because being woken
              does not guarantee that data is still there.

    Producer() {
        ...
        lock shared buffer
        place results in buffer
        unlock buffer
        wake up any consumer threads
        ...
    }

    Consumer() {
        ...
        lock shared buffer
        while buffer is empty {    /* must be while, not if: re-check after waking */
            release lock and sleep
            awake and reacquire lock
        }
        remove contents
        unlock buffer
        ...
    }

    Double buffering: both threads act as producer and consumer to each other
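    A runnable sketch of the producer/consumer pseudocode above, as a bounded circular buffer guarded by
    one mutex and two condition variables. The buffer size `BUF_SLOTS` and the names `buf_put`, `buf_get`,
    `producer`, and `produce_consume` are hypothetical. Unlike the pseudocode, the producer here also waits
    when the buffer is full, and every wait sits in a while loop that re-checks the state after waking.

```c
#include <pthread.h>

#define BUF_SLOTS 4   /* hypothetical buffer size */

/* Shared state: buffer, lock, suspend/resume mechanism, state information. */
static int buf[BUF_SLOTS];
static int buf_count = 0, buf_head = 0, buf_tail = 0;
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buf_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t buf_not_full = PTHREAD_COND_INITIALIZER;

static void buf_put(int item) {
    pthread_mutex_lock(&buf_lock);
    while (buf_count == BUF_SLOTS)                   /* full: wait for space */
        pthread_cond_wait(&buf_not_full, &buf_lock); /* unlocks while asleep, relocks on wakeup */
    buf[buf_tail] = item;
    buf_tail = (buf_tail + 1) % BUF_SLOTS;
    buf_count++;
    pthread_mutex_unlock(&buf_lock);
    pthread_cond_signal(&buf_not_empty);             /* wake up a consumer */
}

static int buf_get(void) {
    int item;
    pthread_mutex_lock(&buf_lock);
    while (buf_count == 0)                           /* empty: wait for data */
        pthread_cond_wait(&buf_not_empty, &buf_lock);
    item = buf[buf_head];
    buf_head = (buf_head + 1) % BUF_SLOTS;
    buf_count--;
    pthread_mutex_unlock(&buf_lock);
    pthread_cond_signal(&buf_not_full);              /* wake up a producer */
    return item;
}

static void *producer(void *arg) {
    int n = *(int *)arg;
    for (int i = 1; i <= n; i++)
        buf_put(i);
    return NULL;
}

/* The calling thread consumes n items made by a producer thread; returns their sum. */
long produce_consume(int n) {
    pthread_t prod;
    long sum = 0;
    if (pthread_create(&prod, NULL, producer, &n) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        sum += buf_get();
    pthread_join(prod, NULL);
    return sum;
}
```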

Common problems:

    The basic rule for managing shared resources:
        - obtain a lock before accessing the resource
        - release the lock when you are finished with the resource

Performance:

    Some of costs:
        - the memory and CPU cycles required to manage each thread, including the structures the OS
          uses to manage them, plus the overhead for the Pthreads library and any special code in the
          OS that supports the library
        - the CPU cycles spent for synchronization calls that enforce orderly access to shared data
        - the time during which the application is inactive while one thread is waiting on another 
          thread

Synchronize Pthreads

Synchronization
Race condition

Synchronization tools
    pthread_join():
        allows one thread to suspend execution until another has terminated
    mutex variable:
        mutually exclusive lock
        only one thread at a time can hold the lock and access the data it protects
    condition variable:
        provide a way of naming an event in which threads have a general interest
    pthread_once():
        ensure that initialization routines get executed once and only once when called by
        multiple threads
    
    some of the common synchronization mechanisms:
        reader/writer exclusion:
            allow multiple threads to read data concurrently 
            but ensure that any thread writing to the data has exclusive access
        thread-safe data structure:
            build synchronization primitives into a complex data structure
            so that each time you access it you do not need to make a separate call
            to synchronize concurrent access.
        semaphores:
            available if the platform supports the POSIX real-time extensions (POSIX.1b)
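    A minimal pthread_once() sketch: several threads all ask for initialization, but the init routine runs
    exactly once. `init_once`, `init_resource`, `init_calls`, `once_worker`, and `once_demo` are
    hypothetical names.

```c
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;             /* how many times init_resource ran */

static void init_resource(void) {
    init_calls++;                      /* pthread_once() makes this run only once */
}

static void *once_worker(void *arg) {
    (void)arg;
    pthread_once(&init_once, init_resource);   /* every thread requests init */
    return NULL;
}

/* Starts up to 16 threads that all request initialization; returns how many
   times the init routine actually ran. */
int once_demo(int n) {
    pthread_t tid[16];
    int started = 0;
    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tid[started], NULL, once_worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return init_calls;
}
```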

    Mutex variable:
        mutual exclusion / mutex
        critical region:
            the piece of code that must be executed atomically
        Even a single C statement (e.g. i++) may not be atomic at the hardware level.
        
        1. create & initialize a mutex for each resource to be protected,
           pthread_mutex_t
           pthread_mutexattr_t
           pthread_mutexattr_init
           pthread_mutexattr_destroy
           pthread_mutexattr_setpshared // PTHREAD_PROCESS_SHARED | PTHREAD_PROCESS_PRIVATE
           E.g.
               // static initialization with default attributes
               pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
               // Or dynamic initialization (check malloc's result before use)
               pthread_mutex_t *mutexPtr;
               mutexPtr = (pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
               pthread_mutex_init(mutexPtr, NULL);   // NULL => default attributes
        2. pthread_mutex_lock(pthread_mutex_t*)
               Block the calling thread until it's granted the lock
               If the mutex is unlocked at the time of the call, the lock's granted immediately;
               otherwise, it's granted after it's released by the thread that's holding it.
           pthread_mutex_trylock()
               It does not suspend its caller if another thread already holds the mutex
               It returns immediately, indicating that the mutex is currently locked
               Represent a kind of polling for a resource - repeatedly trying and backing off until
               the resource is obtained.
               This polling leads to some overhead and, worse, potential resource starvation.
               Acceptable situations:
               - real-time programs polling for state changes
               - detecting and avoiding deadlock in locking hierarchies and priority inversion situations
        3. pthread_mutex_unlock(pthread_mutex_t*)
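    A sketch of steps 1-3: a statically initialized mutex turning the non-atomic `counter++` into a
    critical region. `counter`, `counter_lock`, `increment_many`, and `locked_count` are hypothetical;
    without the lock/unlock pair, concurrent increments could be lost.

```c
#include <pthread.h>

static long counter = 0;                                  /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_many(void *arg) {
    int times = *(int *)arg;
    for (int i = 0; i < times; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter the critical region */
        counter++;                            /* read-modify-write, not atomic by itself */
        pthread_mutex_unlock(&counter_lock);  /* leave the critical region */
    }
    return NULL;
}

/* Runs up to 8 threads that each increment the counter `times` times;
   returns the final count. */
long locked_count(int nthreads, int times) {
    pthread_t tid[8];
    int started = 0;
    if (nthreads > 8)
        nthreads = 8;
    counter = 0;                              /* no other thread is running yet */
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, increment_many, &times) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```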

        The thread that holds a lock on a mutex can assume that:
            - no other thread will write to the data
            - no other thread will read the data while it is in some sort of intermediate state

        A mutex is best used for controlling direct access to data and resources;
        for event synchronization, a condition variable is the better tool.

        Shortcomings:
            1. A mutex is the most restrictive type of access control.
               Sometimes reader/writer locks provide better access control:
                - a reader should be allowed access whenever other readers (but no writer) are using the data
                - when a writer is using it, neither readers nor other writers are allowed in
                => you can "roll your own" using a mutex and a condition variable
            2. Recursive lock
               A lock that can be relocked any number of times by its current holder.
                - Imagine the Pthreads library associating an internal counter with a recursive mutex
                  to count the number of times its current holder has called pthread_mutex_lock
                - Each time the current holder calls pthread_mutex_unlock, the library would decrement
                  this counter. The lock would not be released until the call that brings the count down
                  to zero is issued.
               Useful for a thread that makes a number of nested calls to a routine that locks and 
               manipulates a resource
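    On current systems the recursive lock that the notes above describe only hypothetically is available as a
    mutex type, chosen through a mutex attribute object (PTHREAD_MUTEX_RECURSIVE with
    pthread_mutexattr_settype(), part of POSIX.1-2008). A sketch, assuming such a platform; `rlock`,
    `depth_sum`, and `recursive_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L      /* expose pthread_mutexattr_settype() */
#include <pthread.h>

static pthread_mutex_t rlock;

/* Each level locks the mutex that its caller already holds. */
static int depth_sum(int depth) {
    int result;
    pthread_mutex_lock(&rlock);      /* the holder's lock count goes up by one */
    result = (depth == 0) ? 0 : depth + depth_sum(depth - 1);
    pthread_mutex_unlock(&rlock);    /* released only when the count reaches zero */
    return result;
}

/* Returns depth + (depth - 1) + ... + 0, or -1 if the mutex cannot be set up. */
int recursive_demo(int depth) {
    pthread_mutexattr_t attr;
    int rc, result;
    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&rlock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;
    result = depth_sum(depth);       /* a normal (non-recursive) mutex would deadlock here */
    pthread_mutex_destroy(&rlock);
    return result;
}
```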

        Contention for a mutex
            If more than one thread is waiting for a locked mutex, which thread is the first to be
            granted the lock once it's released?
            => Under priority scheduling, the thread with the highest priority gets the lock.
            The use of priority in a multithreaded program can lead to a classic multiprocessing problem:
            Priority Inversion:
                Involves a low priority thread that holds a lock that a higher priority thread wants.
                Because the higher priority thread cannot continue until the lower priority thread
                releases the lock, each thread is actually treated as if it had the inverse of its intended
                priority.
                => The best way to avoid inversion is to minimize the degree to which threads of
                   different priority share the same locks. But this may not always be possible.
                => Eliminate the risk of priority inversion by using mutex attributes
                   (the priority inheritance and priority ceiling protocols, where supported).

        When designing the synchronization for our data structures:
            - Eliminate all race conditions
            - Do not introduce deadlock
        Access Patterns and Granularity:
            Your choices for optimizing performance in a multithreaded program are tied to how its
            threads access the shared data (in the book's example, a linked list).
            Lock Granularity:
                The level at which we apply locks to our shared data, (and, thus, the number of locks
                we use to protect the data)
                Coarse-grain locking vs. Fine-grain locking
                    - Fine-grain locking improves concurrency, but requires more coding and adds the overhead of more synchronization calls
        Locking Hierarchies
            If your shared data has some hierarchical structure - for instance, it's a tree with some 
            depth to it - you may want to allow threads to lock separate subtrees of the structure
            simultaneously. This assumes a fine-grain lock design.
            
            If we allow threads to acquire these locks in any order they please, the kind of deadlock
            known as a "deadly embrace" can occur.
            
            => Enforce a fixed locking hierarchy. To access data at any given level, all threads must
               obtain the lock at each lower level in exactly the same order.
            => Pthreads has no built-in support for locking hierarchies; enforcing the order is up to you.
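    A sketch of a fixed lock order, simplified to two levels: every thread locks the lower-level account
    first, so two threads transferring in opposite directions cannot end up each holding one lock while
    waiting for the other. `struct account`, `transfer`, and `hierarchy_demo` are hypothetical, and the
    sketch assumes the two accounts have distinct levels.

```c
#include <pthread.h>

/* Hypothetical record; `level` fixes its place in the locking hierarchy. */
struct account {
    int level;                 /* lower level is always locked first */
    long balance;
    pthread_mutex_t lock;
};

static struct account acct_a = { 1, 1000, PTHREAD_MUTEX_INITIALIZER };
static struct account acct_b = { 2, 1000, PTHREAD_MUTEX_INITIALIZER };

/* Locks both accounts in level order, whichever direction the money moves.
   Assumes from and to are distinct accounts with distinct levels. */
static void transfer(struct account *from, struct account *to, long amount) {
    struct account *first  = from->level < to->level ? from : to;
    struct account *second = from->level < to->level ? to : from;
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

static void *a_to_b(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Runs opposing transfers concurrently; returns the combined balance. */
long hierarchy_demo(void) {
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, a_to_b, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, b_to_a, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```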
        
        Sharing a mutex among processes
            A mutex has a single attribute that determines whether or not it can be seen by threads
            in other processes: process-shared. (A mutex object also has two attributes that assist in
            scheduling threads.) If your platform allows you to set the process-shared attribute, the
            compile-time constant _POSIX_THREAD_PROCESS_SHARED will be defined.
            E.g. Chapter 3, Mutex Variable Example 3-6

            - Once a process has multiple threads, forking has many pitfalls. So the example initializes
              the mutex before forking, and forks before either process creates multiple threads.
            - For strict, process-to-process synchronization, use System V or POSIX.1b semaphores. For
              thread-to-thread synchronization across processes, you can use POSIX.1b semaphores as an
              alternative to process-shared mutexes.
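    A sketch in the spirit of (not a copy of) the book's process-shared example, assuming a platform where
    process-shared mutexes and anonymous shared mappings (`MAP_ANONYMOUS`) are available, as on Linux. The
    mutex is initialized in shared memory before fork(), and the fork happens while each process is still
    single-threaded. `struct shared` and `pshared_demo` are hypothetical names.

```c
#define _DEFAULT_SOURCE           /* glibc: exposes MAP_ANONYMOUS */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared-memory region. */
struct shared {
    pthread_mutex_t lock;
    long counter;
};

/* Parent and child each add `times` to a counter kept in shared memory and
   guarded by a process-shared mutex. Returns the final count, or -1 on error. */
long pshared_demo(int times) {
    struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sh == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&sh->lock, &attr);  /* initialize before fork() */
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(sh, sizeof *sh);
        return -1;
    }
    sh->counter = 0;

    pid_t pid = fork();                   /* fork while still single-threaded */
    if (pid < 0) {
        pthread_mutex_destroy(&sh->lock);
        munmap(sh, sizeof *sh);
        return -1;
    }
    for (int i = 0; i < times; i++) {     /* both processes run this loop */
        pthread_mutex_lock(&sh->lock);
        sh->counter++;
        pthread_mutex_unlock(&sh->lock);
    }
    if (pid == 0)
        _exit(0);                         /* the child is done */

    waitpid(pid, NULL, 0);                /* the parent waits for the child */
    long result = sh->counter;
    pthread_mutex_destroy(&sh->lock);
    munmap(sh, sizeof *sh);
    return result;
}
```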

    Condition Variable
        A mutex lets threads synchronize by controlling their access to data.
        A condition variable lets threads synchronize on the value of data.
        Cooperating threads wait until data reaches a particular state or until a certain event occurs.
        Condition variables (CVs) provide a kind of notification system among threads.
        If Pthreads didn't offer CVs, but only provided mutexes:
            Threads would need to poll the variable to determine when it reached a certain state.
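    A sketch of synchronizing on the value of data instead of polling it: one thread sleeps in
    pthread_cond_wait() until a shared counter reaches a target, and the thread that changes the counter
    broadcasts after each change. `level`, `level_lock`, `level_changed`, `raise_level`, and
    `wait_for_level` are hypothetical names.

```c
#include <pthread.h>

static int level = 0;                                   /* the data being watched */
static pthread_mutex_t level_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t level_changed = PTHREAD_COND_INITIALIZER;

static void *raise_level(void *arg) {
    int steps = *(int *)arg;
    for (int i = 0; i < steps; i++) {
        pthread_mutex_lock(&level_lock);
        level++;
        pthread_cond_broadcast(&level_changed);         /* notify waiters of the change */
        pthread_mutex_unlock(&level_lock);
    }
    return NULL;
}

/* Blocks, without polling, until `level` reaches target; returns the value seen. */
int wait_for_level(int target) {
    pthread_t t;
    int seen;
    level = 0;                                          /* no other thread running yet */
    if (pthread_create(&t, NULL, raise_level, &target) != 0)
        return -1;
    pthread_mutex_lock(&level_lock);
    while (level < target)                              /* re-check the value on every wakeup */
        pthread_cond_wait(&level_changed, &level_lock);
    seen = level;
    pthread_mutex_unlock(&level_lock);
    pthread_join(t, NULL);
    return seen;
}
```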