Python Notes‐2 - LogeshVel/learning_resources GitHub Wiki

GIL

Python Wiki

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety.

The Global Interpreter Lock (GIL) is a mechanism in Python that allows only one thread to execute at a time in the interpreter. This lock is essential for memory safety in multi-threaded Python programs. However, it limits the effectiveness of multithreading, especially in CPU-intensive tasks, as it prevents multiple threads from running in parallel on multiple cores.

Why Do Python Developers Use the GIL?

What Problem Did the GIL Solve?

The Global Interpreter Lock (GIL) in Python addresses a critical issue in concurrent programming: the race condition. A race condition occurs when multiple threads simultaneously access and modify shared data, leading to unpredictable and incorrect outcomes.

Consider a shared variable n:

n = 5

Suppose two threads, T1 and T2, modify n as follows:

# T1
n = n + 20

# T2
n = n * 10

The final value of n depends on the execution order of T1 and T2. If T1 runs before T2, the result is (5 + 20) * 10 = 250; if T2 runs first, it is 5 * 10 + 20 = 70. In the worst case, both threads read n as 5 before either one writes its result back, so one update silently overwrites the other.
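The orderings described for T1 and T2 can be reproduced without real threads. The following is a minimal sketch that simulates each interleaving by hand, so the results are deterministic; the function names are made up for illustration.

```python
# Simulate the orderings of T1 (n = n + 20) and T2 (n = n * 10)
# starting from n = 5. No real threads are involved.

def t1_then_t2(n):
    n = n + 20      # T1 runs to completion first
    n = n * 10      # then T2
    return n

def t2_then_t1(n):
    n = n * 10      # T2 runs to completion first
    n = n + 20      # then T1
    return n

def lost_update(n):
    # Both threads read n before either one writes it back.
    read_by_t1 = n
    read_by_t2 = n
    n = read_by_t1 + 20   # T1 writes 25
    n = read_by_t2 * 10   # T2 overwrites it with 50; T1's update is lost
    return n

print(t1_then_t2(5))   # 250
print(t2_then_t1(5))   # 70
print(lost_update(5))  # 50
```

Three different results from the same two statements is exactly what makes a race condition hard to debug.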

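The GIL prevents this kind of corruption for the interpreter's own internal state (for example, object reference counts) by ensuring that only one thread executes Python bytecode at a time, which makes CPython's memory management simpler and safer. It does not make compound operations in your own code atomic: a statement like n = n + 20 still consists of several bytecode steps, and a thread switch can happen between them, so shared application data still needs explicit synchronization. The GIL also comes at the cost of reduced efficiency on multi-core systems, as threads cannot run Python code in parallel.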

Why GIL is Chosen as the Solution?

The Global Interpreter Lock (GIL) in Python was chosen as a solution primarily for its simplicity and effectiveness in thread-safe memory management. Many Python libraries are built using C or C++ at the backend, where managing concurrent access to memory is crucial. GIL provides an efficient way to ensure thread safety, allowing only one thread to execute at a time, thereby avoiding complex lock management and potential deadlock scenarios.

Moreover, implementing the GIL is straightforward and integrates easily with Python's design. While it does slow down multi-threaded programs by preventing true parallel execution on multi-core processors, the GIL keeps single-threaded programs fast: the interpreter manages only one lock instead of acquiring and releasing many fine-grained locks. Thus, the GIL trades multi-core parallelism for ease of implementation and good single-threaded performance.

The Impact on Multi-Threaded Python Programs

The presence of the Global Interpreter Lock (GIL) in Python significantly influences the performance of multi-threaded programs, which can be broadly categorized as CPU-bound or I/O-bound.

  • CPU Bound: These programs, such as image processing or complex mathematical computations, push the CPU to its limits. They do not heavily rely on I/O operations.

  • I/O Bound: These involve significant I/O operations, like file or network access, often waiting for external input/output.

For I/O-bound programs, the GIL's impact is minimal. This is because the lock is released while a thread waits for I/O, allowing other threads to execute.
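A small sketch of this effect, assuming time.sleep as a stand-in for a blocking network or disk call (like real blocking I/O, it releases the GIL while waiting). The helper names fake_io and run_threaded are hypothetical.

```python
import threading
import time

def fake_io(delay, results, index):
    # time.sleep stands in for a blocking I/O call.
    time.sleep(delay)
    results[index] = f"task {index} done"

def run_threaded(num_tasks=4, delay=0.2):
    results = [None] * num_tasks
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(num_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = run_threaded()
print(results)
print(f"elapsed: {elapsed:.2f}s (running the waits one after another takes about 0.80s)")
```

Because the four waits overlap, the total time stays close to a single wait rather than the sum of all four.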

However, the scenario is different for CPU-bound programs. This is illustrated with the following examples:

Single-Threaded Program:

# Execution time calculation for a single-threaded Python program

import time

def calculateSum(n):
    sum = 0
    for i in range(n):
        sum += i 

n = 100000000
startTime = time.time()
calculateSum(n)
endTime = time.time()

print("Time taken in single thread = ", endTime - startTime)
Output:

Time taken in single thread = 3.406636953353882

Multi-Threaded Program:

# Execution time calculation for a multi-threaded Python program

import time
from threading import Thread

def calculateSum(n):
    sum = 0
    for i in range(n):
        sum += i 

n = 100000000
T1 = Thread(target=calculateSum, args=(n//2,))
T2 = Thread(target=calculateSum, args=(n//2,))
startTime = time.time()
T1.start()
T2.start()
T1.join()
T2.join()
endTime = time.time()

print("Time taken in multi-thread = ", endTime - startTime)
Output:

Time taken in multi-thread = 5.456082820892334

These examples demonstrate that a single-threaded CPU-bound program can be faster than its multi-threaded counterpart due to GIL's restrictions. Consequently, GIL can introduce delays in multi-threaded CPU-bound Python programs.

Why hasn't the GIL been Removed Yet?

Despite its limitations, the Global Interpreter Lock (GIL) remains in the default build of CPython due to concerns about backward compatibility and the stability of existing libraries, many of which are written in C/C++. Removing the GIL risks breaking this foundational code. Earlier attempts to replace it led to issues like degraded performance in single-threaded and I/O-bound multi-threaded programs. The Python community prioritizes the reliability and efficiency of the language, so any alternative that significantly slows down existing applications is not favourable. An optional free-threaded build without the GIL was introduced as an experimental feature in CPython 3.13 (PEP 703), but the standard build still includes the GIL.

How to Deal with Python's GIL?

Addressing the challenges posed by Python's Global Interpreter Lock (GIL) can be approached in a couple of effective ways:

  • Using Multiprocessing: A popular method to circumvent GIL limitations is through multiprocessing. By creating multiple processes instead of threads, each process operates with its own Python interpreter and memory space, thereby sidestepping the GIL. However, this approach should be used judiciously as processes are more resource-intensive than threads, and managing multiple processes can introduce significant overhead.

  • Choosing a Different Interpreter: Python supports various interpreters, each with its own characteristics:

    • CPython: The standard interpreter, written in C. It includes a GIL.

    • Jython: A Java-based interpreter.

    • IronPython: Developed in C#, primarily for .NET environments.

    • PyPy: An alternative Python interpreter, written in RPython (a restricted subset of Python), focusing on performance and efficiency.

Not every interpreter has a GIL: Jython and IronPython do not, while PyPy, like CPython, does. Compatibility with existing libraries is a critical factor to consider when switching interpreters, as not all Python libraries are available or behave identically across different interpreters.

Conclusion

The GIL in CPython is a mutex that gives control of the Python interpreter to a single thread at a time. It remains in place largely because many existing Python libraries written in C/C++ depend on it. There are several ways to work around the GIL, such as using multiple processes or a different interpreter.

Python concurrency

Concurrency is working on multiple things at the same time. In Python, concurrency can be achieved in several ways:

  • With threading, by letting multiple threads take turns.
  • With multiprocessing, we’re using multiple processes. This way we can truly do more than one thing at a time using multiple processor cores. This is called parallelism.
  • Using asynchronous IO, firing off a task and continuing to do other stuff, instead of waiting for an answer from the network or disk.
  • By using distributed computing, which basically means using multiple computers at the same time

I/O bound vs CPU bound problems

[Figure: I/O bound vs CPU bound problems]

Python concurrency with threads

While waiting for answers from the network or disk, you can keep other parts running using multiple threads. A thread is an independent sequence of execution. Your Python program has, by default, one main thread. But you can create more of them and let Python switch between them. This switching happens so fast that it appears like they are running side by side at the same time.

Unlike threads in many other languages, however, Python threads don’t run at the same time; they take turns instead. The reason for this is a mechanism in Python called the Global Interpreter Lock (GIL). In short, the GIL ensures that only one thread runs Python code at any moment, so threads can’t corrupt the interpreter’s internal state.

The takeaway is that threads make a big difference for I/O bound software but are useless for CPU bound software. Why is that? It’s simple. While one thread is waiting for a reply from the network, other threads can continue running. If you make a lot of network requests, threads can make a tremendous difference. If your threads are doing heavy calculations instead, they are just waiting for their turn to continue. Threading would only introduce more overhead: there’s some administration involved in constantly switching between threads.

Python concurrency with asyncio

Asyncio article

Asyncio is a relatively new core library in Python. It solves the same problem as threading: it speeds up I/O bound software, but it does so differently.
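As a rough illustration, the sketch below runs three simulated requests concurrently on a single thread. asyncio.sleep stands in for awaiting a network response, and fake_request is a hypothetical name.

```python
import asyncio
import time

async def fake_request(name, delay):
    # asyncio.sleep stands in for awaiting a network response.
    await asyncio.sleep(delay)
    return f"{name} finished"

async def main():
    # gather() runs the coroutines concurrently on one thread and
    # returns their results in the order they were passed in.
    return await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

The three waits overlap, so the total time is close to 0.2 s rather than 0.6 s, without creating any extra threads.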

Python concurrency with multiprocessing

If your software is CPU-bound, you can often rewrite your code in such a way that you can use more processors at the same time. This way, you can linearly scale the execution speed. This type of concurrency is what we call parallelism, and you can use it in Python too, despite the GIL.

Not all algorithms can be made to run in parallel. An algorithm in which every step depends on the result of the previous step is hard to parallelize, for example. But there’s often an alternative algorithm that can work in parallel just fine.

There are two ways of using more processors:

  • Using multiple processors and/or cores within the same machine. In Python, we can do so with the multiprocessing library.
  • Using a network of computers to use many processors, spread over multiple machines. We call this distributed computing.

Python’s multiprocessing library, unlike the Python threading library, bypasses the Python Global Interpreter Lock. It does so by actually spawning multiple instances of Python. Instead of threads taking turns within a single Python process, you now have multiple Python processes running multiple copies of your code at once.

The multiprocessing library, in terms of usage, is very similar to the threading library. A question that might arise is: why should you even consider threading? You can guess the answer. Threading is ‘lighter’: it requires less memory since it only requires one running Python interpreter. Spawning new processes has its overhead as well. So if your code is I/O bound, threading is likely good enough and more efficient and lightweight.
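A minimal sketch of a CPU-bound sum split across worker processes with multiprocessing.Pool. The helpers partial_sum, chunk_bounds, and parallel_sum are hypothetical names, and the chunking scheme is just one simple option.

```python
from multiprocessing import Pool

def partial_sum(bounds):
    # Sum the integers in [start, stop); runs inside a worker process.
    start, stop = bounds
    return sum(range(start, stop))

def chunk_bounds(n, parts):
    # Split range(n) into `parts` contiguous (start, stop) pairs.
    step = n // parts
    bounds = [(i * step, (i + 1) * step) for i in range(parts)]
    bounds[-1] = (bounds[-1][0], n)  # last chunk absorbs the remainder
    return bounds

def parallel_sum(n, workers=2):
    # Each chunk is summed in a separate process, each with its own GIL.
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(n, workers)))

if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, child processes
    # re-import this module and must not start a pool of their own.
    print(parallel_sum(1_000_000, workers=2))
```

For small inputs the cost of starting processes and passing data between them can outweigh the gain, so this pays off only when each chunk involves substantial work.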

PyPy

You are probably using the reference implementation of Python, CPython. Most people do. It’s called CPython because it’s written in C. If you are sure your code is CPU bound, meaning it’s doing lots of calculations, you should look into PyPy, an alternative to CPython. It’s potentially a quick fix that doesn’t require you to change a single line of code.

PyPy claims that, on average, it is 4.4 times faster than CPython. It does so by using a technique called just-in-time compilation (JIT). Java and the .NET framework are other notable examples of JIT compilation. In contrast, CPython uses interpretation to execute your code. Although this offers a lot of flexibility, it’s also very slow.

With JIT, your code is compiled while running the program. It combines the speed advantage of ahead-of-time compilation (used by languages like C and C++) with the flexibility of interpretation. Another advantage is that the JIT compiler can keep optimizing your code while it is running. The longer your code runs, the more optimized it will become.

Threading

import threading
import time

def print_numbers(s):
    for i in range(10):
        time.sleep(0.5)
        print(f"{s}: {i}")

thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
thread1.start()
time.sleep(2)
thread2.start()

thread1.join()
thread2.join()
Output:

Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 2: 0
Thread 1: 4
Thread 2: 1
Thread 1: 5
Thread 2: 2
Thread 1: 6
Thread 2: 3
Thread 1: 7
Thread 2: 4
Thread 1: 8
Thread 2: 5
Thread 1: 9
Thread 2: 6
Thread 2: 7
Thread 2: 8
Thread 2: 9

Get thread info

To get thread name and thread id

threading.current_thread().name
threading.current_thread().ident
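A short sketch of reading both attributes from inside a running thread. The thread name "worker-1" and the info dictionary are arbitrary choices for illustration.

```python
import threading

info = {}

def record_identity():
    current = threading.current_thread()
    info["name"] = current.name    # the name given at construction
    info["ident"] = current.ident  # an integer assigned once the thread starts

worker = threading.Thread(target=record_identity, name="worker-1")
worker.start()
worker.join()

print(info["name"], info["ident"])
```

If no name is passed, Python generates one automatically, as seen in the "Thread-2 (print_numbers)" label in the traceback later on this page.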

In Python's threading module, a Lock is a synchronization primitive used to prevent multiple threads from accessing shared resources simultaneously. This is crucial in multi-threaded programs to avoid race conditions and ensure data consistency.

What is a Lock?

A Lock is a low-level mechanism that can be in one of two states:

  1. Locked (Acquired): When a thread acquires the lock, other threads that try to acquire it will block (wait) until the lock is released.
  2. Unlocked (Released): When the lock is released, one of the waiting threads can acquire it.

The primary purpose of a lock is to ensure mutual exclusion: only one thread can hold the lock at a time.


How Locks Work

  1. A thread acquires the lock using the acquire() method.
  2. While the lock is held, no other thread can acquire it. If a thread attempts to acquire an already-held lock, it will block until the lock is released.
  3. The thread releases the lock using the release() method.

Using Locks in Python

Here’s a simple example:

import threading

# Shared resource
counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(1000000):
        with lock:  # Acquiring the lock
            counter += 1  # Critical section

# Create threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")

Key Points in the Example:

  1. Shared Resource (counter): The counter variable is accessed and modified by multiple threads.
  2. Lock Usage:
    • The with lock: statement acquires the lock at the beginning of the block and releases it at the end, ensuring that only one thread modifies counter at a time.
    • This prevents race conditions where multiple threads modify counter simultaneously.
  3. Without the Lock: If the lock were not used, the final value of counter would likely be incorrect due to thread interference.

Methods of a Lock

  1. acquire(blocking=True, timeout=-1)

    • Acquires the lock.
    • Parameters:
      • blocking (default True): If True, the thread will block until the lock is available. If False, the thread will return immediately if the lock is unavailable.
      • timeout: Specifies a timeout period (in seconds) to wait for the lock.
    • Returns True if the lock was acquired, False otherwise (only if blocking=False or timeout expires).

    Example:

    if lock.acquire(blocking=False):  # Non-blocking
        try:
            print("Lock acquired!")
        finally:
            lock.release()
    else:
        print("Could not acquire lock.")
    
  2. release()

    • Releases the lock.
    • A thread that has acquired the lock must release it; otherwise, other threads cannot proceed.

    Example:

    lock.acquire()  # Acquire lock
    # Critical section
    lock.release()  # Release lock
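The timeout parameter can be sketched in the same way. In this example a helper thread holds the lock while the main thread tries to acquire it with a short timeout; the two Event objects are only there to make the timing deterministic, and all names are illustrative.

```python
import threading

lock = threading.Lock()
holder_ready = threading.Event()
release_now = threading.Event()

def holder():
    with lock:
        holder_ready.set()   # signal that the lock is now held
        release_now.wait()   # keep holding until told to release

t = threading.Thread(target=holder)
t.start()
holder_ready.wait()

# The lock is held by the other thread, so this gives up after 0.1 s.
got_it = lock.acquire(timeout=0.1)
print(got_it)  # False

release_now.set()
t.join()

# Now the lock is free, so the same call succeeds immediately.
got_it_later = lock.acquire(timeout=0.1)
print(got_it_later)  # True
lock.release()
```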
    

ReentrantLock (RLock)

Python provides another lock type called RLock (Reentrant Lock), which allows a thread to acquire the same lock multiple times. This is useful when the same thread needs to re-enter a critical section.

Example:

import threading

lock = threading.RLock()

def recursive_function(n):
    with lock:
        if n > 0:
            print(f"Lock acquired for n={n}")
            recursive_function(n - 1)
        print(f"Releasing lock for n={n}")

recursive_function(3)

Difference Between Lock and RLock:

  • Lock: A thread cannot re-acquire a lock it already holds without releasing it first.
  • RLock: A thread can acquire the same lock multiple times but must release it the same number of times.
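The difference can be observed from a single thread. This sketch probes with blocking=False so that the plain Lock reports failure instead of hanging forever; the variable names are illustrative.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
# A second blocking acquire() here would hang forever, so probe with
# blocking=False instead: it reports failure immediately.
second_plain = plain.acquire(blocking=False)
plain.release()

reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)  # same thread: allowed
reentrant.release()  # one release per successful acquire
reentrant.release()

print(second_plain)      # False
print(second_reentrant)  # True
```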

Common Pitfalls

  1. Deadlocks:

    • Occurs when two or more threads are waiting for each other to release locks.
    • Example of a deadlock:
      import threading
      
      lock1 = threading.Lock()
      lock2 = threading.Lock()
      
      def thread1():
          with lock1:
              with lock2:
                  print("Thread 1")
      
      def thread2():
          with lock2:
              with lock1:
                  print("Thread 2")
      
      t1 = threading.Thread(target=thread1)
      t2 = threading.Thread(target=thread2)
      
      t1.start()
      t2.start()
      
      t1.join()
      t2.join()
      
  2. Neglecting to Release a Lock:

    • If a thread fails to release a lock (e.g., due to an exception), it can block other threads indefinitely.
    • Use the with statement to ensure proper release.

Best Practices

  1. Use with for lock acquisition and release to avoid errors:
    with lock:
        # Critical section
    
  2. Use RLock if reentrant locking is required.
  3. Avoid nested locks unless necessary, and always acquire locks in a consistent order to prevent deadlocks.
  4. Consider higher-level synchronization mechanisms (e.g., Semaphore, Condition, Queue) if locks alone are not sufficient for your use case.
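Consistent lock ordering can be sketched as follows. Both workers take lock_a before lock_b, so the circular wait needed for a deadlock cannot form. The lock and worker names are made up for this example.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
finished = []

def worker(name):
    # Both workers take lock_a first and lock_b second, so neither can
    # hold one lock while waiting for a thread that holds the other.
    with lock_a:
        with lock_b:
            finished.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(finished))  # ['T1', 'T2']
```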

In summary, locks are a fundamental tool for thread synchronization in Python, ensuring that shared resources are accessed safely. However, they require careful handling to avoid common issues like deadlocks and improper usage.

Thread lock - blocking parameter

The example below raises an error at the end because it ignores the return value of lock.acquire(blocking=False).

import threading
from threading import Lock
import time

lock = Lock()

def print_numbers(s):
    lock.acquire(blocking=False)
    for i in range(10):
        time.sleep(0.5)
        print(f"{s}: {i}")
    lock.release()

thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
thread1.start()
time.sleep(2)
thread2.start()

thread1.join()
thread2.join()

Output:

Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 2: 0
Thread 1: 4
Thread 2: 1
Thread 1: 5
Thread 2: 2
Thread 1: 6
Thread 2: 3
Thread 1: 7
Thread 2: 4
Thread 1: 8
Thread 2: 5
Thread 1: 9
Thread 2: 6
Thread 2: 7
Thread 2: 8
Thread 2: 9
Exception in thread Thread-2 (print_numbers):
Traceback (most recent call last):
  File "C:\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "e:\Documents\python_threading.py", line 12, in print_numbers
    lock.release()
RuntimeError: release unlocked lock

The error comes from a bug in the code. Thread 1 starts first, acquires the lock, and begins printing. Two seconds later Thread 2 starts and calls lock.acquire(blocking=False); because the lock is already held, the call returns False immediately. The code ignores that return value, so Thread 2 runs the protected loop anyway and both threads execute it at the same time. Thread 1 finishes first and releases the lock. When Thread 2 then calls lock.release(), it never acquired the lock and the lock is already unlocked, so Python raises RuntimeError: release unlocked lock.

Good code - for blocking=False

import threading
from threading import Lock
import time

lock = Lock()

def print_numbers(s):
    if lock.acquire(blocking=False):
        try:
            for i in range(10):
                time.sleep(0.5)
                print(f"{s}: {i}")
        finally:
            lock.release()  # release only if this thread acquired the lock
    else:
        print("Failed to acquire lock")

thread1 = threading.Thread(target=print_numbers, args=("Thread 1",), name='T1')
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",), name='T2')
thread1.start()
time.sleep(2)
thread2.start()

thread1.join()
thread2.join()

Output

Thread 1: 0
Thread 1: 1
Thread 1: 2
Failed to acquire lock
Thread 1: 3
Thread 1: 4
Thread 1: 5
Thread 1: 6
Thread 1: 7
Thread 1: 8
Thread 1: 9

Daemon thread

A daemon thread in Python is a thread that runs in the background and automatically terminates when the main program (or the main thread) finishes. Unlike regular (non-daemon) threads, daemon threads do not prevent the program from exiting when the main thread completes its execution.

Daemon threads are typically used for background tasks that don't need to complete before the program ends, such as logging, monitoring, or cleanup tasks.


Key Characteristics of Daemon Threads

  1. Background Execution:

    • Daemon threads run in the background and are often used for tasks that run continuously or periodically while the program is active.
  2. Termination Behavior:

    • When the main program exits, all daemon threads are abruptly stopped, regardless of whether they have completed their task.
    • They are not given a chance to clean up or release resources.
  3. Setting as Daemon:

    • A thread can be set as a daemon by setting its daemon attribute to True before calling the start() method.
    • Example: thread.daemon = True.
  4. Default Behavior:

    • By default, threads are non-daemon (i.e., they block program termination until they finish execution).

How to Create a Daemon Thread

You can create a daemon thread by either:

  1. Setting daemon=True in the thread constructor.
  2. Setting the daemon attribute of the thread object before starting the thread.
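Both options, plus what happens when the attribute is changed too late, can be sketched like this. The background function is a placeholder task.

```python
import threading
import time

def background():
    time.sleep(0.2)

# Option 1: pass daemon=True to the constructor.
t1 = threading.Thread(target=background, daemon=True)

# Option 2: set the attribute before start().
t2 = threading.Thread(target=background)
t2.daemon = True

print(t1.daemon, t2.daemon)  # True True

t1.start()
try:
    t1.daemon = False   # too late: the thread has already been started
    changed = True
except RuntimeError:
    changed = False
print(changed)  # False
t1.join()
```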

Example: Creating and Using Daemon Threads

import threading
import time

def daemon_task():
    while True:
        print("Daemon thread running...")
        time.sleep(1)

def non_daemon_task():
    print("Non-daemon thread starting...")
    time.sleep(5)
    print("Non-daemon thread finished.")

# Create a daemon thread
daemon_thread = threading.Thread(target=daemon_task, daemon=True)

# Create a non-daemon thread
non_daemon_thread = threading.Thread(target=non_daemon_task)

# Start both threads
daemon_thread.start()
non_daemon_thread.start()

# Wait for the non-daemon thread to finish
non_daemon_thread.join()

print("Main program finished.")

Output (typical; the exact interleaving can vary between runs):

Daemon thread running...
Non-daemon thread starting...
Daemon thread running...
Daemon thread running...
Daemon thread running...
Daemon thread running...
Non-daemon thread finished.
Main program finished.

Explanation:

  • The daemon thread (daemon_task) keeps running in the background, printing "Daemon thread running...".
  • The non-daemon thread (non_daemon_task) starts, performs its work, and finishes after 5 seconds.
  • Once the non-daemon thread finishes, the main program terminates, and the daemon thread is abruptly stopped.

Use Cases for Daemon Threads

  1. Background Services:

    • Performing continuous background tasks like monitoring, logging, or data collection.
  2. Periodic Cleanup Tasks:

    • Clearing temporary files, purging caches, or removing unused resources.
  3. Non-Essential Operations:

    • Tasks that can be interrupted without impacting the main program.

Important Notes:

  1. Abrupt Termination:

    • Daemon threads are stopped without cleanup when the main program exits. If the thread holds any resources (e.g., files, network connections), they may not be released properly.
  2. Should Not Be Used for Critical Tasks:

    • Avoid using daemon threads for tasks that require completion, such as saving data or writing logs, because they may be terminated unexpectedly.
  3. Alternatives for Graceful Shutdown:

    • Use non-daemon threads if you need to ensure tasks finish before the program ends.
    • For daemon-like behavior with cleanup, consider combining a non-daemon thread with proper signal handling or a shutdown mechanism.
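One possible shutdown mechanism is a threading.Event that the worker checks between units of work. This is a sketch under that assumption, not the only way to do it; stop_requested and the log list are illustrative names.

```python
import threading
import time

stop_requested = threading.Event()
log = []

def worker():
    try:
        # wait() doubles as an interruptible sleep: it returns True as
        # soon as the event is set, or False after the timeout.
        while not stop_requested.wait(timeout=0.05):
            log.append("tick")
    finally:
        log.append("cleanup done")  # runs before the thread exits

t = threading.Thread(target=worker)  # non-daemon by default
t.start()

time.sleep(0.2)        # let the worker run briefly
stop_requested.set()   # ask the worker to stop
t.join()               # wait for it to finish its cleanup

print(log[-1])  # cleanup done
```

Unlike a daemon thread, this worker always gets the chance to run its cleanup code before the program exits.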

Comparison Between Daemon and Non-Daemon Threads

| Feature | Daemon Thread | Non-Daemon Thread |
|---|---|---|
| Purpose | Background tasks that don't block program exit. | Essential tasks that must complete before exit. |
| Termination | Abruptly stops when the main program exits. | Keeps the program alive until it completes. |
| Default | No; must be set explicitly. | Yes; threads started from the main thread are non-daemon unless changed. |
| Use Cases | Logging, monitoring, periodic background tasks. | Critical operations like saving data or results. |

Daemon threads are a powerful feature for background processing in Python, but they should be used with caution due to their abrupt termination behavior. Proper understanding ensures they are applied in the right context without unintended side effects.

Semaphores

A semaphore in Python threading is a synchronization primitive that allows a fixed number of threads to access a shared resource at the same time. It is used to control access to resources that have a limited capacity, such as a pool of database connections, file handles, or network connections.


How Semaphore Works

  • A semaphore maintains an internal counter, which represents the number of threads that can simultaneously access the shared resource.
  • When a thread wants to use the resource, it acquires the semaphore, which decreases the counter by 1.
  • If the counter reaches 0, additional threads trying to acquire the semaphore will block until one of the threads holding it releases it, increasing the counter.

Types of Semaphores

  1. Bounded Semaphore (threading.BoundedSemaphore)

    • Similar to a regular semaphore but with an upper limit on the counter value.
    • Prevents accidental over-release, which could lead to incorrect behavior.
  2. Unbounded Semaphore (threading.Semaphore)

    • No restriction on the maximum value the counter can reach.
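The over-release difference between the two types can be sketched as follows, using an initial value of 1 for both. The variable names are illustrative.

```python
import threading

plain = threading.Semaphore(1)
bounded = threading.BoundedSemaphore(1)

# An extra release() on a plain semaphore silently raises its counter,
# so two threads could now enter a section meant for one.
plain.release()
first = plain.acquire(blocking=False)
second = plain.acquire(blocking=False)
print(first, second)  # True True

# The same mistake on a bounded semaphore is reported as an error.
try:
    bounded.release()
    over_release_detected = False
except ValueError:
    over_release_detected = True
print(over_release_detected)  # True
```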

Methods in a Semaphore

  1. acquire(blocking=True, timeout=None)

    • Decreases the counter by 1.
    • If blocking=True (default), the thread waits if the counter is 0.
    • If blocking=False, the thread does not wait and immediately returns False if the counter is 0.
    • timeout specifies the maximum time to wait for the semaphore.
  2. release()

    • Increases the counter by 1, allowing other threads to acquire the semaphore.
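The counter behaviour of acquire() and release() can be traced from a single thread. This sketch uses blocking=False and a short timeout so that nothing hangs; the variable names are illustrative.

```python
import threading

slots = threading.Semaphore(2)

a = slots.acquire(blocking=False)  # counter 2 -> 1
b = slots.acquire(blocking=False)  # counter 1 -> 0
c = slots.acquire(blocking=False)  # counter is 0: returns False at once
d = slots.acquire(timeout=0.1)     # waits up to 0.1 s, then returns False
print(a, b, c, d)  # True True False False

slots.release()                    # counter 0 -> 1
e = slots.acquire(blocking=False)  # succeeds again
print(e)  # True
```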

Example: Semaphore in Python

Scenario: Limiting Access to a Shared Resource

Let's say a shared resource (e.g., a server) can only handle 3 threads at a time.

import threading
import time

# Create a semaphore with a maximum of 3 threads
semaphore = threading.Semaphore(3)

def access_resource(thread_id):
    print(f"Thread {thread_id} is trying to acquire the semaphore...")
    with semaphore:  # Acquires the semaphore
        print(f"Thread {thread_id} acquired the semaphore. Accessing resource...")
        time.sleep(2)  # Simulate resource usage
        print(f"Thread {thread_id} is releasing the semaphore.")
    # Semaphore is released automatically when exiting the `with` block

# Create multiple threads
threads = []
for i in range(6):  # More threads than the semaphore limit
    t = threading.Thread(target=access_resource, args=(i,))
    threads.append(t)
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

print("All threads completed.")

Output:

Thread 0 is trying to acquire the semaphore...
Thread 0 acquired the semaphore. Accessing resource...
Thread 1 is trying to acquire the semaphore...
Thread 1 acquired the semaphore. Accessing resource...
Thread 2 is trying to acquire the semaphore...
Thread 2 acquired the semaphore. Accessing resource...
Thread 3 is trying to acquire the semaphore...
Thread 4 is trying to acquire the semaphore...
Thread 5 is trying to acquire the semaphore...
Thread 0 is releasing the semaphore.
Thread 3 acquired the semaphore. Accessing resource...
Thread 1 is releasing the semaphore.
Thread 4 acquired the semaphore. Accessing resource...
Thread 2 is releasing the semaphore.
Thread 5 acquired the semaphore. Accessing resource...
Thread 3 is releasing the semaphore.
Thread 4 is releasing the semaphore.
Thread 5 is releasing the semaphore.
All threads completed.

Explanation:

  1. Semaphore Initialization:

    • semaphore = threading.Semaphore(3) allows up to 3 threads to access the critical section simultaneously.
  2. Thread Behavior:

    • Threads 0, 1, and 2 acquire the semaphore first since its initial value is 3.
    • Threads 3, 4, and 5 wait until a slot becomes available (when one of the first three threads releases the semaphore).
  3. Releasing the Semaphore:

    • When a thread finishes using the resource, it releases the semaphore, allowing another waiting thread to acquire it.

Use Cases for Semaphores

  1. Resource Pool Management:

    • Controlling access to limited resources like database connections, file handles, or sockets.
  2. Rate Limiting:

    • Restricting the number of concurrent threads or processes performing a specific task.
  3. Producer-Consumer Problems:

    • Managing shared buffers where producers add items and consumers remove items.
  4. Traffic Control:

    • Managing thread traffic to avoid overwhelming a shared resource.

Comparison with Other Synchronization Primitives

| Primitive | Purpose | Key Difference |
|---|---|---|
| Lock | Ensures mutual exclusion (one thread at a time). | Only one thread can access the resource at any time. |
| RLock | A reentrant lock for the same thread to acquire multiple times. | Same thread can repeatedly acquire the lock. |
| Semaphore | Allows a specific number of threads to access a resource simultaneously. | Counter determines how many threads can proceed. |
| Condition | Thread coordination based on signaling between threads. | Requires explicit signaling (e.g., notify()). |

Semaphores are versatile and powerful synchronization tools, ideal for managing shared resources with a limited capacity. They provide greater flexibility than simple locks by allowing multiple threads to proceed concurrently up to a specified limit.

Race condition

What is a Race Condition?

A race condition occurs in a multi-threaded program when multiple threads access shared resources (e.g., variables, files) simultaneously and at least one thread modifies the resource. The final outcome depends on the timing or sequence of thread execution, leading to unpredictable results.

Why Race Conditions Happen

  • In multi-threading, threads execute independently.
  • If two or more threads access a shared resource without proper synchronization, their operations can overlap in unintended ways.
  • For example, if one thread is writing to a variable while another is reading from or updating it, the result can be inconsistent or incorrect.

Example of a Race Condition

Let's look at an example where multiple threads increment a shared counter:

import threading

# Shared resource
counter = 0

def increment():
    global counter
    for _ in range(100000):
        counter += 1  # Critical section: race condition occurs here

# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

# Start both threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")

Expected vs. Actual Output

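  • Expected Output: 200000 (since each thread increments counter 100,000 times).
  • Actual Output: Possibly a value less than 200000 (due to race conditions). The result varies between runs and between Python versions; some runs may still print 200000, but nothing guarantees it.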

The issue arises because the counter += 1 operation is not atomic; it involves multiple steps:

  1. Reading the value of counter.
  2. Incrementing it by 1.
  3. Writing the updated value back to counter.

If two threads perform these steps simultaneously, they can overwrite each other’s updates.
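On CPython, the separate steps are visible in the bytecode. The sketch below uses the standard dis module to list the instructions for counter += 1; the exact instruction names differ between CPython versions, so it only checks that the load and the store are distinct instructions.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the names of the bytecode instructions for increment().
# The load of `counter` and the store back to it are separate
# instructions, so a thread switch can happen between them.
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)

load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print(load_index < store_index)  # True
```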


How to Prevent Race Conditions

Race conditions can be resolved by using synchronization mechanisms to ensure that only one thread accesses the critical section at a time.


Solution: Using Locks

A Lock ensures mutual exclusion by allowing only one thread to execute the critical section at any given time.

Modified Example with a Lock

import threading

# Shared resource
counter = 0
lock = threading.Lock()  # Create a lock

def increment():
    global counter
    for _ in range(100000):
        with lock:  # Acquire the lock
            counter += 1  # Critical section

# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

# Start both threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")

Output:

  • Final Counter Value: 200000
  • The lock ensures that only one thread modifies counter at a time.

Other Synchronization Techniques

  1. Reentrant Lock (RLock):

    • Useful when a thread needs to acquire the same lock multiple times.
    • Example:
      lock = threading.RLock()
      
  2. Semaphore:

    • Limits the number of threads that can access a resource simultaneously.
    • Example:
      semaphore = threading.Semaphore(1)  # Similar to a lock
      
  3. Condition Variables:

    • Used for more complex synchronization, such as signaling between threads.
    • Example:
      condition = threading.Condition()
      
  4. Thread-safe Data Structures:

    • Use thread-safe alternatives like queue.Queue or collections.deque.
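
As a rough illustration of the semaphore item above, the sketch below lets at most two threads into a section at once and records the highest number seen inside it. The names MAX_CONCURRENT, active, and peak are made up for this example.

```python
import threading
import time

MAX_CONCURRENT = 2  # illustrative limit on simultaneous workers
semaphore = threading.Semaphore(MAX_CONCURRENT)
state_lock = threading.Lock()  # protects the two counters below
active = 0
peak = 0

def worker():
    global active, peak
    with semaphore:  # blocks while MAX_CONCURRENT threads are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate some work
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Peak concurrent workers: {peak}")
```

The peak never exceeds the semaphore's initial value, even though six threads were started.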

Summary

  • Race Condition: A situation where the outcome of a program depends on the timing or order of thread execution, leading to inconsistent results.
  • Solution: Use synchronization mechanisms like locks, semaphores, or thread-safe data structures to coordinate thread access to shared resources.
  • Key Tip: Minimize shared state and use proper synchronization to avoid unpredictable behavior in multi-threaded programs.

Multiprocessing

Multiprocessing in Python refers to a parallel execution framework that allows programs to execute multiple processes concurrently, leveraging multiple CPU cores. This is especially useful for CPU-bound tasks, where the Global Interpreter Lock (GIL) in Python can limit the performance of multithreaded programs.

Python’s multiprocessing module provides a high-level interface to create and manage processes. It enables parallelism by creating separate processes for different tasks, bypassing the GIL and fully utilizing multicore processors.


Key Features of the multiprocessing Module:

  1. Process-Based Parallelism:
    • Each process has its own memory space, avoiding data corruption issues caused by shared memory.
  2. Support for Multiple CPUs:
    • Takes advantage of multiple cores, making it ideal for CPU-intensive tasks.
  3. Interprocess Communication (IPC):
    • Provides tools like Queue, Pipe, and shared memory to enable communication between processes.
  4. Process Synchronization:
    • Includes synchronization primitives like Lock, Event, Semaphore, and Barrier.

  1. Creating Processes: You can create a process using the Process class.
     import multiprocessing as mp
     import os
     import time
    
     def do_work():
         time.sleep(2)
         for i in range(1, 6):
             print(f'{mp.current_process().name}:{os.getpid()} -> {i}')
    
     def main():
         p1 = mp.Process(name='Process 1', target=do_work)
         p2 = mp.Process(name='Process 2', target=do_work)
         p1.start()
         p2.start()
         p1.join()
         p2.join()
         
     if __name__ == '__main__':
         s = time.perf_counter()
         main()
         print(f'It takes {time.perf_counter()-s} secs')
    
    
    Output:
     Process 1:6324 -> 1
     Process 1:6324 -> 2
     Process 1:6324 -> 3
     Process 1:6324 -> 4
     Process 1:6324 -> 5
     Process 2:6920 -> 1
     Process 2:6920 -> 2
     Process 2:6920 -> 3
     Process 2:6920 -> 4
     Process 2:6920 -> 5
     It takes 2.2620684000003166 secs
    

Advantages:

  • Overcomes the GIL, enabling true parallelism for CPU-bound tasks.
  • Simplifies writing parallel programs compared to managing threads directly.
  • Ensures process isolation, reducing risks of shared memory corruption.

Disadvantages:

  • Higher memory usage compared to threading since each process has its own memory space.
  • Overhead of inter-process communication.
  • Debugging can be harder due to separate memory spaces.

When to Use Multiprocessing:

  1. CPU-Bound Tasks:
    • Tasks that involve heavy computations, such as matrix operations, scientific simulations, or image processing.
  2. When GIL Becomes a Bottleneck:
    • Multiprocessing avoids the GIL, unlike threading.

For tasks that are I/O-bound, consider using threading or asynchronous programming, as the GIL does not significantly impact I/O operations.


To find out how many CPU cores the machine has:

import multiprocessing as mp
print(mp.cpu_count())
# Output
# 8
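
Note that mp.cpu_count() reports the CPUs in the machine, which is not always the number the current process is allowed to use. A small sketch of checking both follows; os.sched_getaffinity exists only on some platforms (such as Linux), so the fallback branch is an assumption for everything else.

```python
import multiprocessing as mp
import os

total = mp.cpu_count()  # logical CPUs in the machine

# os.sched_getaffinity is not available everywhere,
# so fall back to the total count when it is missing.
if hasattr(os, "sched_getaffinity"):
    usable = len(os.sched_getaffinity(0))
else:
    usable = total

print(f"total={total}, usable by this process={usable}")
```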

How to Use multiprocessing.Pool

The multiprocessing.Pool in Python is a convenient way to parallelize tasks using multiple processes. It manages a pool of worker processes, allowing you to execute tasks concurrently.

1. Import the Library

from multiprocessing import Pool

2. Define a Task Function

def square(x):
    return x * x

3. Create a Pool and Apply Tasks

  • map(): Applies a function to all items in an iterable.
  • apply(): Applies a function to a single argument.
  • apply_async(): Same as apply(), but returns a result asynchronously.
  • starmap(): Like map(), but supports multiple arguments.

def multiply(x, y):
    # Defined at module level so worker processes can find it
    return x * y

if __name__ == "__main__":
    with Pool(4) as p:  # Create a pool with 4 processes

        # Using map
        numbers = [1, 2, 3, 4, 5]
        results = p.map(square, numbers)
        print("Results using map:", results)

        # Using apply_async
        result = p.apply_async(square, (10,))
        print("Result using apply_async:", result.get())

        # Using starmap (for functions with multiple arguments)
        pairs = [(1, 2), (3, 4), (5, 6)]
        results = p.starmap(multiply, pairs)
        print("Results using starmap:", results)

Key Points:

  1. Initialization: Use Pool(processes=n) where n is the number of worker processes. If not specified, it defaults to the number of CPU cores.
  2. Context Management: Use with Pool(...) as p to clean up the pool automatically. Leaving the block calls terminate(), so collect all results inside it.
  3. Blocking vs Non-Blocking:
    • map(), starmap(), and apply() are blocking.
    • apply_async(), map_async(), and starmap_async() are non-blocking, returning a result object that you can query later using .get().

Considerations:

  • Use if __name__ == "__main__": to prevent recursive process creation on platforms that start workers with the spawn method (Windows, and macOS by default).
  • Ensure tasks are CPU-bound for maximum benefit, as I/O-bound tasks work better with concurrent.futures.ThreadPoolExecutor.
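
The non-blocking calls can be sketched as follows. This is a minimal example, assuming a pool of two workers and a 10-second timeout chosen only for illustration; main() is a helper name introduced here.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def main():
    with Pool(2) as pool:
        # Both calls return immediately with AsyncResult handles.
        single = pool.apply_async(square, (10,))
        many = pool.map_async(square, [1, 2, 3])
        # .get() blocks until the result is ready; timeout is in seconds.
        # Results are collected inside the with block because leaving
        # the block terminates the pool.
        return single.get(timeout=10), many.get(timeout=10)

if __name__ == "__main__":
    print(main())  # (100, [1, 4, 9])
```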

Running multiple functions concurrently in a Pool

To run multiple functions concurrently using multiprocessing.Pool, you can use any of the following:

  1. Submitting tasks with apply_async()
  2. Combining starmap() or map() with a helper function
  3. Using functools.partial() with map() (the call itself blocks, but the tasks run concurrently)

Example 1: Using apply_async() for Multiple Functions

from multiprocessing import Pool
import time

def a(x):
    time.sleep(2)
    return x

def b(x):
    time.sleep(2)
    return x

if __name__ == "__main__":
    s = time.perf_counter()
    
    with Pool(4) as pool:
        res1 = pool.apply_async( a, args=('A', ))
        res2 = pool.apply_async( b, args=('B', ))
        print(res1.get())
        print(res2.get())
        
    print(f"Time: {time.perf_counter()-s} sec")

Output

A
B
Time: 2.416869699999552 sec

Example 2: Using a Helper Function with starmap()

from multiprocessing import Pool
import time

# Define functions
def process_function(func, x):
    return func.__name__, func(x)

def a(x):
    time.sleep(2)
    return x

def b(x):
    time.sleep(2)
    return x

if __name__ == "__main__":
    s = time.perf_counter()
    tasks = [(a, 4), (b, 2), (a, 3), (b, 5)]
    
    with Pool(4) as pool:
        results = pool.starmap(process_function, tasks)
        
    for func_name, result in results:
        print(f"{func_name} result: {result}")
    print(f"Time: {time.perf_counter()-s} sec")

Output

a result: 4
b result: 2
a result: 3
b result: 5
Time: 2.424494899999445 sec

Example 3: Using functools.partial() with map()

from multiprocessing import Pool
from functools import partial
import time


def func_a(a):
    time.sleep(2)
    return a

def func_b(a, b):
    time.sleep(2)
    return a, b

def map_func(func):
    return func()

if __name__ == "__main__":
    s = time.perf_counter()
    with Pool() as pool:
        a = partial(func_a, "A")
        b = partial(func_b, "A", "B")

        res = pool.map(map_func, [a, b])
        print(res)
    print(f"Time: {time.perf_counter()-s} sec")

Output

['A', ('A', 'B')]
Time: 2.662100499999724 sec

Data sharing issue in multiprocessing

In multiprocessing, each process has its own memory space; processes do not share memory by default. A change made to a global object in a child process is therefore not visible in the parent, so the following code does not print what you might expect.

from multiprocessing import Process

l = [0]

def func():
    global l
    l.extend([1,2,3])
    print('Data inside func: ', l)

if __name__ == "__main__":
    p1 = Process(target=func)
    p1.start()
    p1.join()
    print('Data in main', l)

Output

Data inside func:  [0, 1, 2, 3]
Data in main [0]

To overcome this issue, processes must exchange data explicitly. Two common options are:

  1. Pipes
  2. Queues
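
Besides message passing, the shared-memory tools mentioned earlier can hold simple values that every process sees. The sketch below is one possible approach using multiprocessing.Value; the helper names add_many and main and the counts are assumptions for illustration.

```python
from multiprocessing import Process, Value

def add_many(total, n):
    for _ in range(n):
        # get_lock() returns the lock that guards this shared value.
        with total.get_lock():
            total.value += 1

def main():
    total = Value("i", 0)  # shared C int, initial value 0
    workers = [Process(target=add_many, args=(total, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total.value

if __name__ == "__main__":
    print(main())  # 4000
```

Unlike the list example above, the parent sees the children's updates because the integer lives in shared memory rather than in each process's private copy.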

Pipes

In multiprocessing in Python, a Pipe is a method for two-way communication between processes. It provides a way to send and receive data between processes in a multiprocessing environment.

How It Works

  • A pipe is created using multiprocessing.Pipe(), which returns two connection objects: conn1 and conn2.
  • Data sent through one connection can be received from the other.

Example: Using Pipe in Multiprocessing

from multiprocessing import Process, Pipe

# Function to run in a child process
def worker(conn):
    conn.send("Hello from child process!")
    conn.close()

if __name__ == "__main__":
    # Create a Pipe
    parent_conn, child_conn = Pipe()

    # Start a process
    p = Process(target=worker, args=(child_conn,))
    p.start()

    # Receive data from the child process
    print("Received:", parent_conn.recv())

    p.join()

Key Points

  1. Bidirectional Communication: By default, Pipe() creates a bidirectional pipe.
  2. Duplex Control: Use Pipe(duplex=False) for one-way communication.
  3. Data Types: Pipes can send and receive picklable objects (e.g., strings, numbers, lists).
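
The one-way mode can be tried without starting a second process. In this small sketch the names recv_end and send_end are illustrative; with duplex=False the first connection returned can only receive and the second can only send.

```python
from multiprocessing import Pipe

# With duplex=False the first connection is receive-only
# and the second is send-only.
recv_end, send_end = Pipe(duplex=False)

send_end.send({"task": 1})  # any picklable object
message = recv_end.recv()
print(message)

try:
    recv_end.send("wrong direction")
except OSError as exc:
    error = exc  # keep a reference; exc is cleared after the except block
    print("Sending on the receive-only end failed:", exc)
```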

When to Use Pipe vs Queue

  • Pipe: Simple and suitable for two processes.
  • Queue: Better for multiple processes and complex data sharing. It's thread- and process-safe.

A sentinel value is a special value used to signal the end of a process, data stream, or loop. It acts as a marker that indicates no more data is expected or that a special condition has been reached.

Why Use a Sentinel Value?

  • End of Data: To mark the end of data in a stream or loop.
  • Termination Signal: To signal processes or threads to stop working.
  • Default Indicator: To represent a value that cannot be confused with valid data.

In multiprocessing, a sentinel value can be used to signal when a process should stop processing data. This is particularly useful when using Pipe for inter-process communication.

from multiprocessing import Process, Pipe

# Define a sentinel value
SENTINEL = "STOP"  # Can be anything, as long as it cannot be confused with valid data.

# Worker function
def worker(conn):
    while True:
        data = conn.recv()  # Receive data from the main process
        if data == SENTINEL:
            print("Worker received the sentinel value, terminating.")
            break
        print(f"Worker received: {data}")
    conn.close()

if __name__ == "__main__":
    # Create a Pipe
    parent_conn, child_conn = Pipe()

    # Start a worker process
    p = Process(target=worker, args=(child_conn,))
    p.start()

    # Send some data
    parent_conn.send("Task 1")
    parent_conn.send("Task 2")
    parent_conn.send("Task 3")

    # Send the sentinel value to terminate the worker
    parent_conn.send(SENTINEL)

    # Wait for the process to finish
    p.join()
    print("Main process finished.")

Key Considerations

  • Unique Choice: Ensure the sentinel value is distinct and cannot be confused with valid data.
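  • Type Selection: Common sentinel values include None, -1, "", float('inf'), or custom objects. Data sent between processes is pickled, so a custom object arrives as a copy and identity checks (is) against it fail; None or a simple value such as a string is the safer choice across processes.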
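
Objects sent through a Pipe or Queue are pickled on the way, which affects how a sentinel can be compared on the receiving side. The sketch below simulates that round trip with the pickle module in a single process; the variable names are illustrative.

```python
import pickle

marker = object()  # a custom sentinel that works within one process

# Simulate what happens to an object sent to another process.
received = pickle.loads(pickle.dumps(marker))
print(received is marker)  # False: the receiver gets a new object

received_none = pickle.loads(pickle.dumps(None))
print(received_none is None)  # True: None survives the round trip
```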

Queues

A Multiprocessing Queue in Python is a thread and process-safe queue provided by the multiprocessing module. It allows multiple processes to communicate by sharing data efficiently.

How to Use Multiprocessing Queue

  1. Import the Module:

    from multiprocessing import Process, Queue
    
  2. Creating a Queue:

    q = Queue(maxsize=5)  # Optional maxsize argument
    
  3. Adding Items to the Queue (Put):

    q.put("Hello")
    q.put(42)
    
  4. Retrieving Items from the Queue (Get):

    item = q.get()
    print(item)
    

Example: Producer-Consumer Model

from multiprocessing import Process, Queue
import time

# Function for adding items to the queue
def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced: {i}")
        time.sleep(1)

# Function for consuming items from the queue
def consumer(q):
    while True:
        item = q.get()
        print(f"Consumed: {item}")
        if item == 4:  # Stop after consuming the last item
            break

if __name__ == "__main__":
    q = Queue()

    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

Key Methods of Multiprocessing Queue

  • put(item): Adds an item to the queue.
  • get(): Removes and returns an item from the queue.
  • empty(): Returns True if the queue is empty.
  • full(): Returns True if the queue is full.
  • qsize(): Returns the approximate size of the queue (not guaranteed to be accurate; may raise NotImplementedError on platforms such as macOS).

Considerations

  • Avoid using empty(), full(), and qsize() in critical sections as the values can change unexpectedly due to process scheduling.
  • Always use process-safe techniques to avoid deadlocks or race conditions.
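
Stopping the consumer on a hard-coded last item only works when the consumer knows the data in advance. A sketch of the same producer-consumer model with a sentinel follows; the second queue (results) and the main() helper are additions made for this example.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # assumed never to be a valid work item

def producer(q, n):
    for i in range(n):
        q.put(i)
    q.put(SENTINEL)  # tell the consumer that no more items will arrive

def consumer(q, results):
    total = 0
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        total += item
    results.put(total)  # hand the result back to the parent

def main():
    q, results = Queue(), Queue()
    p1 = Process(target=producer, args=(q, 5))
    p2 = Process(target=consumer, args=(q, results))
    p1.start()
    p2.start()
    total = results.get(timeout=10)  # read the result before joining
    p1.join()
    p2.join()
    return total

if __name__ == "__main__":
    print(main())  # 10
```

The parent reads from results before calling join() so that no process is joined while it still has unread items sitting in a queue.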

Like recv() on a Pipe connection, q.get() blocks by default and waits indefinitely for an item. If nothing is ever put on the queue, the call never returns.

The timeout parameter in Python's multiprocessing.Queue is used with the put() and get() methods to limit the time a process waits when interacting with the queue. If the specified time is exceeded, a queue.Full or queue.Empty exception is raised, depending on the operation.

How timeout Works

  1. put(item, block=True, timeout=None):
    • block=True (default): The process waits until space is available or timeout is reached.
    • block=False: Raises queue.Full immediately if the queue is full.
    • timeout=None (default): Waits indefinitely if block=True.

  2. get(block=True, timeout=None):
    • block=True (default): The process waits until an item is available or timeout is reached.
    • block=False: Raises queue.Empty immediately if the queue is empty.
    • timeout=None (default): Waits indefinitely if block=True.

Examples Using put() with timeout

import queue  # provides the Full exception
from multiprocessing import Process, Queue
import time

def producer(q):
    for i in range(5):
        try:
            q.put(i, block=True, timeout=2)
            print(f"Produced: {i}")
        except queue.Full:
            print(f"Queue full! Could not produce {i}")
        time.sleep(1)

if __name__ == "__main__":
    q = Queue(maxsize=2)
    p = Process(target=producer, args=(q,))
    p.start()
    p.join()

Using get() with timeout

import queue  # provides the Empty exception
from multiprocessing import Process, Queue
import time

def consumer(q):
    while True:
        try:
            item = q.get(block=True, timeout=2)
            print(f"Consumed: {item}")
        except queue.Empty:
            print("Queue empty! Exiting...")
            break

if __name__ == "__main__":
    q = Queue()
    q.put(1)
    q.put(2)

    p = Process(target=consumer, args=(q,))
    p.start()
    p.join()

Key Points

  • Default Behavior: If timeout is not specified, it waits indefinitely.
  • When to Use: Use timeout to avoid processes getting stuck while waiting for items or space in the queue.
  • Exceptions Raised:
    • queue.Full: When put() fails due to a full queue.
    • queue.Empty: When get() fails due to an empty queue.
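
Both exceptions live in the standard queue module, not in multiprocessing. The sketch below triggers each one in a single process; the timeout values are arbitrary choices for illustration.

```python
import queue  # provides the Full and Empty exception classes
from multiprocessing import Queue

q = Queue(maxsize=1)
q.put("first")

try:
    q.put("second", timeout=0.1)  # the queue is already full
except queue.Full:
    put_failed = True
else:
    put_failed = False

item = q.get(timeout=5)  # retrieves "first"

try:
    q.get(timeout=0.1)  # nothing left to read
except queue.Empty:
    get_failed = True
else:
    get_failed = False

print(put_failed, item, get_failed)
```

Catching these two classes specifically, rather than a bare Exception, keeps unrelated errors from being mistaken for a full or empty queue.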

Lock and Semaphore

Python's multiprocessing module provides synchronization primitives like Lock and Semaphore to manage access to shared resources among multiple processes.

1. Lock

A Lock is a synchronization primitive that allows only one process to access a critical section at a time. It prevents race conditions by ensuring exclusive access.

Key Methods

  • acquire(block=True, timeout=None): Locks the lock, optionally with a timeout.
  • release(): Releases the lock.
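
The return value of acquire() and the behavior of release() can be checked in a single process. This is a small sketch with an arbitrary 0.1-second timeout; the variable names are illustrative.

```python
from multiprocessing import Lock

lock = Lock()

first = lock.acquire()              # True: the lock was free
second = lock.acquire(timeout=0.1)  # False: already held, so this times out
lock.release()

try:
    lock.release()  # releasing a multiprocessing.Lock that is not held
except ValueError:
    double_release_error = True
else:
    double_release_error = False

print(first, second, double_release_error)
```

Checking the return value matters whenever a timeout is used, because a False result means the critical section must not be entered.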

Example: Using Lock

from multiprocessing import Process, Lock
import time

def worker(lock, i):
    lock.acquire()
    try:
        print(f"Process {i} is working")
        time.sleep(1)
    finally:
        print(f"Process {i} is done")
        lock.release()

if __name__ == "__main__":
    lock = Lock()

    processes = [Process(target=worker, args=(lock, i)) for i in range(5)]
    
    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

Using with statement

from multiprocessing import Process, Lock
import time

def worker(lock, i):
    with lock:  # Automatically acquires and releases the lock
        print(f"Process {i} is working")
        time.sleep(1)
        print(f"Process {i} is done")

if __name__ == "__main__":
    lock = Lock()

    processes = [Process(target=worker, args=(lock, i)) for i in range(5)]
    
    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

2. Semaphore

A Semaphore is a more flexible synchronization primitive that allows a limited number of processes to access a resource simultaneously. It manages a counter, decrementing it when a process acquires the semaphore and incrementing it when released.

Key Methods

  • acquire(block=True, timeout=None): Decrements the semaphore counter, blocking if needed.
  • release(): Increments the semaphore counter.

Example: Using Semaphore

from multiprocessing import Process, Semaphore
import time

def worker(sem, i):
    sem.acquire()
    try:
        print(f"Process {i} is working")
        time.sleep(2)
    finally:
        print(f"Process {i} is done")
        sem.release()

if __name__ == "__main__":
    # Allow 2 processes to work simultaneously
    sem = Semaphore(2)
    processes = [Process(target=worker, args=(sem, i)) for i in range(5)]
    
    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

Using with statement

from multiprocessing import Process, Semaphore
import time

def worker(sem, i):
    with sem:  # Automatically acquires and releases the semaphore
        print(f"Process {i} is working")
        time.sleep(2)
        print(f"Process {i} is done")

if __name__ == "__main__":
    # Allow 2 processes to work simultaneously
    sem = Semaphore(2)
    processes = [Process(target=worker, args=(sem, i)) for i in range(5)]
    
    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

Key Differences Between Lock and Semaphore

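| Feature | Lock | Semaphore |
| --- | --- | --- |
| Access Control | One process at a time | Multiple processes allowed |
| Counter | No counter | Maintains a counter |
| Use Case | Mutual exclusion | Resource pooling |
| Exceptions Raised | ValueError when releasing an unlocked multiprocessing.Lock (threading.Lock raises RuntimeError) | None for a plain Semaphore; BoundedSemaphore raises ValueError if released more times than it was acquired |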

When to Use

  • Lock: When only one process should access a resource at a time.
  • Semaphore: When a limited number of processes can access a resource simultaneously, like a database connection pool or API rate limiter.

concurrent.futures

concurrent.futures Overview

The concurrent.futures module in Python provides a high-level interface for asynchronously executing tasks using threads or processes. It simplifies the implementation of concurrent programming by abstracting much of the complexity involved in managing threads or processes.


Core Concepts

  1. Executor Classes:

    • ThreadPoolExecutor: Manages a pool of threads for executing tasks concurrently.
    • ProcessPoolExecutor: Manages a pool of processes for executing tasks concurrently.
  2. Future Objects:

    • A Future represents the result of an asynchronous computation.
    • You can query a Future object to check if the computation is done, wait for it to complete, or retrieve its result.
  3. Key Methods:

    • submit(fn, *args, **kwargs): Submits a callable (function/method) to the executor for execution and returns a Future object.
    • map(func, *iterables, timeout=None, chunksize=1): A parallelized version of the built-in map function.
    • shutdown(wait=True): Signals the executor to clean up resources when all tasks are done.
  4. Features:

    • Easy to use with Python's with statement, ensuring proper resource management.
    • Uniform API for both threads and processes.

Key Differences from threading and multiprocessing

| Feature | concurrent.futures | threading | multiprocessing |
| --- | --- | --- | --- |
| Ease of Use | High-level abstraction | Low-level, manual | Low-level, manual |
| Concurrency Model | Thread-based or process-based | Thread-based | Process-based |
| Resource Management | Built-in (with statement) | Manual (e.g., join()) | Manual (e.g., join()) |
| Overhead | Slightly higher (abstraction) | Lower | Higher (inter-process communication) |
| Global Interpreter Lock (GIL) | Affects ThreadPoolExecutor | Affects all threads | Not affected |
| Parallelism | True parallelism with ProcessPoolExecutor | Limited by GIL | True parallelism |

How concurrent.futures Works

Using ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

numbers = [1, 2, 3, 4, 5]

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(square, numbers)

print(list(results))  # Output: [1, 4, 9, 16, 25]

Using ProcessPoolExecutor

from concurrent.futures import ProcessPoolExecutor

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

if __name__ == "__main__":  # Required when worker processes are spawned (e.g., Windows, macOS)
    numbers = [5, 10, 15]

    with ProcessPoolExecutor(max_workers=2) as executor:
        results = executor.map(factorial, numbers)

    print(list(results))  # Output: [120, 3628800, 1307674368000]

Working with Future Objects

from concurrent.futures import ThreadPoolExecutor
import time

def sleep_and_return(n):
    time.sleep(n)
    return f"Slept for {n} seconds"

with ThreadPoolExecutor(max_workers=2) as executor:
    future1 = executor.submit(sleep_and_return, 2)
    future2 = executor.submit(sleep_and_return, 3)

    print(future1.result())  # Waits for the first task to complete
    print(future2.result())  # Waits for the second task to complete

Advantages of concurrent.futures

  1. Simplified API for both threads and processes.
  2. Readable and maintainable code using with for resource management.
  3. Easily switch between threads and processes by changing the executor class.
  4. Supports parallelism with ProcessPoolExecutor.
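
Switching between threads and processes can look roughly like this. The run helper is a name introduced for the sketch; it takes the executor class as a parameter, so the calling code stays the same.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(n):
    return n * n

def run(executor_cls, values):
    # The body is identical for both executor classes.
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(square, values))

if __name__ == "__main__":
    print(run(ThreadPoolExecutor, [1, 2, 3]))   # [1, 4, 9]
    print(run(ProcessPoolExecutor, [1, 2, 3]))  # [1, 4, 9]
```

With ProcessPoolExecutor the function and its arguments must be picklable, which is why square is defined at module level.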

Limitations of concurrent.futures

  1. Slightly higher abstraction overhead compared to threading and multiprocessing.
  2. ThreadPoolExecutor is still constrained by the GIL, limiting its effectiveness for CPU-bound tasks.

When to Use concurrent.futures

  • Use ThreadPoolExecutor for I/O-bound tasks like file operations, web scraping, or database queries.
  • Use ProcessPoolExecutor for CPU-bound tasks like numerical computations or data processing.

Comparison Example

Using threading:

import threading

def task(n):
    print(f"Processing {n}")

threads = []
for i in range(5):
    t = threading.Thread(target=task, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Using concurrent.futures:

from concurrent.futures import ThreadPoolExecutor

def task(n):
    print(f"Processing {n}")

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(task, range(5))

The concurrent.futures version is more concise and manages resources automatically.
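
One caveat with the concise version: an exception raised inside a task only surfaces when the iterator returned by executor.map() is consumed. The sketch below uses a task that fails for one input (the failure condition is invented for illustration).

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    if n == 3:
        raise ValueError(f"cannot process {n}")
    return n * 2

with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(task, range(5))  # no exception is raised here

collected = []
failure = None
try:
    for value in results:  # results are yielded in input order
        collected.append(value)
except ValueError as exc:
    failure = str(exc)

print(collected, failure)
```

If the results are never iterated, as in the short example above, a failing task goes unnoticed.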


Conclusion

  • concurrent.futures provides a high-level interface for concurrent programming with threads and processes.
  • It abstracts away the complexity of threading and multiprocessing while still offering flexibility.
  • The choice between concurrent.futures, threading, and multiprocessing depends on the task's complexity, scalability, and specific requirements.