Concurrency and Parallelism - CameronAuler/python-devops GitHub Wiki

Python provides multiple ways to handle concurrent and parallel execution using:

  • Threading (for concurrency with shared memory)
  • Multiprocessing (for true parallel execution using multiple CPU cores)
  • Asyncio (for asynchronous, non-blocking operations)

| Technique | Use Case |
| --- | --- |
| threading | Best for I/O-bound tasks where multiple operations wait for external resources (e.g., network requests, file I/O, database queries). Uses multiple threads, but in CPython the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound code does not run in parallel. |
| multiprocessing | Ideal for CPU-bound tasks that require intensive computations (e.g., machine learning, image processing, large data analysis). Runs separate processes, each with its own interpreter and memory, allowing true parallel execution across multiple CPU cores. |
| asyncio | Designed for high-volume asynchronous I/O where tasks spend most of their time waiting (e.g., web scraping, handling thousands of API requests, real-time applications). Uses cooperative multitasking on a single thread with async and await. |

Threading (Concurrency with Shared Memory)

The threading module allows concurrent execution of tasks within the same process. In CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound tasks. They work well for I/O-bound operations, because a thread releases the GIL while it waits.

Creating and Running Threads

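Create a Thread with a target function, call start() to begin running it, and call join() to wait for it to finish. Threads take turns rather than running truly in parallel, so threading is mainly used for I/O-bound tasks (e.g., reading files, network requests).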

import threading

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")

# Creating threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_numbers)

# Starting threads
thread1.start()
thread2.start()

# Waiting for threads to finish
thread1.join()
thread2.join()
# Output (Order may vary):
Number: 0
Number: 1
Number: 2
Number: 3
Number: 4
Number: 0
Number: 1
Number: 2
Number: 3
Number: 4
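
To make the I/O-bound claim concrete, here is a minimal sketch that uses time.sleep() as a stand-in for a blocking I/O call. The names fake_io and run_threaded are hypothetical and exist only for this illustration. Because a sleeping thread releases the GIL, the four waits overlap and the total time should be close to one delay rather than four.

```python
import threading
import time

def fake_io(delay, results, index):
    # time.sleep() stands in for a blocking I/O call (hypothetical workload);
    # the thread releases the GIL while it waits
    time.sleep(delay)
    results[index] = f"task {index} done"

def run_threaded(num_tasks=4, delay=0.2):
    results = [None] * num_tasks
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(num_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = run_threaded()
    print(results)
    # Running the same four waits one after another would take about 0.8s
    print(f"Elapsed: {elapsed:.2f}s")
```

Each thread writes to its own slot in the results list, so no lock is needed in this particular sketch.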

Using Locks to Prevent Race Conditions

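Because threads share memory, unsynchronized updates to the same variable can interleave and overwrite each other (a race condition). A Lock lets only one thread at a time run the protected block, which prevents data corruption in shared variables.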

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:  # Ensures only one thread updates `counter` at a time
            counter += 1

thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print("Final Counter:", counter)  # Expected: 200000
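
The sketch below spells out why the lock matters. An increment is a read followed by a write, and another thread can run in between. The helper run_counter and the time.sleep(0) call are assumptions added for illustration: the sleep only widens the gap between the read and the write so lost updates become likely without the lock. The unlocked result varies from run to run, so no specific value is shown for it.

```python
import threading
import time

def run_counter(use_lock, num_threads=4, increments=50):
    counter = 0
    lock = threading.Lock()

    def read_then_write():
        nonlocal counter
        current = counter      # read the shared value
        time.sleep(0)          # give other threads a chance to run (illustration only)
        counter = current + 1  # write back; may overwrite another thread's update

    def worker():
        for _ in range(increments):
            if use_lock:
                with lock:  # the read and the write happen as one protected step
                    read_then_write()
            else:
                read_then_write()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print("With lock:   ", run_counter(use_lock=True))   # always 200
    print("Without lock:", run_counter(use_lock=False))  # at most 200, often less
```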

Multiprocessing (True Parallelism)

Unlike threading, the multiprocessing module creates separate processes, allowing true parallel execution across CPU cores.

Running Processes in Parallel

multiprocessing.Process() creates separate processes, utilizing multiple CPU cores. This is ideal for CPU-bound tasks (e.g., heavy computations, data processing).

import multiprocessing

def worker():
    print(f"Process {multiprocessing.current_process().name} is running")

if __name__ == "__main__":
    process1 = multiprocessing.Process(target=worker)
    process2 = multiprocessing.Process(target=worker)

    process1.start()
    process2.start()

    process1.join()
    process2.join()
# Output (processes run in parallel, so the order may vary):
Process Process-1 is running
Process Process-2 is running
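
For CPU-bound work, a pool of worker processes is often more convenient than managing Process objects by hand. The following is a minimal sketch using multiprocessing.Pool; the function sum_of_squares and the input sizes are hypothetical, chosen only to give each worker some computation to do.

```python
import multiprocessing

def sum_of_squares(n):
    # Pure-Python CPU-bound work (hypothetical workload)
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [200_000, 400_000, 600_000, 800_000]
    # Each input is handed to one of the worker processes
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(sum_of_squares, inputs)
    for n, total in zip(inputs, results):
        print(f"sum_of_squares({n}) = {total}")
```

pool.map() returns results in the same order as the inputs, regardless of which worker finishes first. The function passed to the pool must be defined at module level so that worker processes can import it.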

Sharing Data Between Processes

Each process has its own memory, so ordinary variables are not shared between processes. To exchange data safely, pass messages through a Queue or use shared memory objects.

import multiprocessing

def worker(queue):
    queue.put("Hello from process")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(q,))
    process.start()
    print(q.get())  # Retrieve data first; blocks until the worker has put an item
    process.join()  # Join after the queue is drained to avoid a possible deadlock
# Output:
Hello from process
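
The queue example covers message passing. For the shared memory option, a minimal sketch with multiprocessing.Value might look like the following. The function name add_many and the counts are assumptions for illustration.

```python
import multiprocessing

def add_many(shared_counter, times):
    for _ in range(times):
        # get_lock() returns the lock that guards this shared Value
        with shared_counter.get_lock():
            shared_counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # "i" = signed C int, initial value 0
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, 1000))
        for _ in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Final value:", counter.value)  # 4000
```

As with threads, the increment is a read followed by a write, so it is wrapped in the lock. Without it, updates from different processes could be lost.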

Asyncio (Asynchronous Programming)

asyncio enables non-blocking, cooperative multitasking using async and await.

Asynchronous Functions (async & await)

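A coroutine is defined with async def. Inside it, await suspends the coroutine until the awaited operation finishes, which lets the event loop run other tasks in the meantime. Coroutines are mainly used for network requests, database queries, and real-time applications.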

import asyncio

async def say_hello():
    print("Hello")
    await asyncio.sleep(1)  # Non-blocking sleep
    print("World")

asyncio.run(say_hello())  # Runs the async function
# Output:
Hello
World  # (After 1 second)

Running Multiple Tasks Concurrently

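asyncio.gather() runs several awaitables concurrently on a single thread and waits until all of them finish. While one task is waiting, the event loop runs the others. This is mainly used for I/O-bound operations like web scraping and API requests.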

import asyncio

async def task(name, delay):
    print(f"Task {name} starting")
    await asyncio.sleep(delay)  # Simulates non-blocking work
    print(f"Task {name} done")

async def main():
    await asyncio.gather(task("A", 2), task("B", 1))  # Run both tasks concurrently

asyncio.run(main())
# Output:
Task A starting
Task B starting
Task B done  # (After 1 second)
Task A done  # (After 2 seconds)
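
asyncio.gather() also collects return values. The sketch below assumes a hypothetical coroutine named fetch that simulates a request with asyncio.sleep(). It shows that results come back in the order the awaitables were passed, not the order they finished, and that the total time is close to the longest delay rather than the sum.

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep() stands in for a real network call (hypothetical workload)
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main():
    start = time.perf_counter()
    # B finishes first, but results keep the argument order: A, then B
    results = await asyncio.gather(fetch("A", 0.4), fetch("B", 0.3))
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)
    print(f"Elapsed: {elapsed:.2f}s")
```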

Asyncio with Threading

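asyncio.run() can be called inside a worker thread, which gives that thread its own event loop. This is mainly useful for adding async I/O to an application that is otherwise synchronous or thread-based. Threads do not speed up CPU-bound work because of the GIL, so heavy computation is better moved to separate processes.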

import asyncio
import threading

def run_asyncio():
    asyncio.run(async_task())

async def async_task():
    print(f"Running in thread: {threading.current_thread().name}")
    await asyncio.sleep(1)
    print("Async task done")

thread = threading.Thread(target=run_asyncio)
thread.start()
thread.join()
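
The reverse direction is also common: calling blocking code from async code without stalling the event loop. A minimal sketch, assuming Python 3.9 or later for asyncio.to_thread(), is shown below. The names blocking_read and heartbeat and the label "report.txt" are hypothetical.

```python
import asyncio
import threading
import time

def blocking_read(label):
    # Stand-in for a blocking call such as a legacy file or database API
    time.sleep(0.2)
    return f"{label} read in {threading.current_thread().name}"

async def heartbeat(ticks):
    # Keeps running while the blocking call is in progress,
    # showing that the event loop is not stalled
    for _ in range(3):
        ticks.append("tick")
        await asyncio.sleep(0.05)

async def main():
    ticks = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read, "report.txt"),  # runs in a worker thread
        heartbeat(ticks),
    )
    return result, ticks

if __name__ == "__main__":
    result, ticks = asyncio.run(main())
    print(result)
    print(ticks)
```

asyncio.to_thread() suits blocking I/O. It does not help with CPU-bound Python code, since the worker thread still competes for the GIL.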