Processes and Threading


| Concept | Definition | Typical Use Case | Efficiency |
| --- | --- | --- | --- |
| Process | Independent execution unit with its own memory | Running separate applications | High overhead, isolated |
| Thread | Lightweight unit of execution, shares memory within a process | Tasks within an app (e.g., handling multiple tabs in a browser) | Lightweight, requires synchronization |
| Synchronous | Tasks run one after the other, blocking | Linear operations, easy to reason about | Less efficient for I/O tasks |
| Asynchronous | Tasks can start without waiting for others to complete | Network calls, file I/O, event loops | More efficient, non-blocking |
| Concurrency | Multiple tasks in progress (may not run simultaneously) | I/O-bound tasks (web servers, user interfaces) | Time-slicing on a single core |
| Parallelism | Multiple tasks executed at the same time on different cores/machines | CPU-bound tasks (matrix computations, simulations) | Requires multiple cores or machines |
| Multithreading | Multiple threads within the same process, sharing memory | I/O-heavy tasks (e.g., network services) | Requires locks for data safety |
| Multiprocessing | Multiple processes with separate memory spaces | CPU-heavy tasks (e.g., video rendering) | Avoids shared resource issues |
| Coroutines | Functions that can pause and resume (non-blocking) | Async operations (network calls, I/O) | Efficient for async tasks |
| Subroutines | Functions that pass control to another function until complete | Simple sequential execution | Simpler control flow |

General

  • A single-core CPU can run one process at a time
    • Each core inside a multi-core CPU can run one process/thread at a time
    • With hyperthreading, a single physical core presents itself to the OS as two logical cores, letting it interleave two threads per core (see the snippet below for checking the logical CPU count)
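
A quick way to see this from Python (a minimal sketch using only the standard library): `os.cpu_count()` reports the number of *logical* CPUs the OS exposes, which on a hyperthreaded machine is typically double the physical core count.

```python
import os

# Logical CPUs visible to the OS (hyperthreads count individually).
# Physical core count needs a third-party library, e.g. psutil.cpu_count(logical=False).
print("Logical CPUs:", os.cpu_count())
```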

Process vs Program vs Thread vs Job

Process: a program executing on your computer

  • A process is just a construct / abstraction the OS uses to manage a running program and its resources (memory, open files, CPU time)

Program: stored instructions

  • A program is a static entity; a process is an active entity

Thread: a task associated with a process

  • A process has at least one thread and often has several

Job: collection of tasks (processes)

  • A job typically refers to a unit of work submitted to a job scheduler, often in batch processing or distributed systems
  • Jobs can consist of multiple tasks or processes that run together to accomplish a larger piece of work
  • Jobs are typically used in scheduling systems (like cron jobs, batch jobs) to describe work that needs to be completed, and they may involve one or more processes

Multiprocessing vs Multithreading

Multiprocessing

  • Multiple separate processes, each with its own memory space
  • Processes can run in parallel if there are multiple cores/CPUs
  • Heavier than threads due to memory isolation and the need for inter-process communication (IPC)
  • Example: Performing complex mathematical operations on large datasets using multiple CPU cores (see the sketch below)
  • Best for: CPU-bound tasks where processes can run independently and true parallelism is required
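
A minimal Python sketch of the CPU-bound case using the standard library's multiprocessing.Pool; the `cpu_heavy` function and its inputs are made up purely for illustration:

```python
from multiprocessing import Pool, cpu_count

def cpu_heavy(n):
    # Stand-in for a CPU-bound computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":        # needed on platforms that spawn new processes
    inputs = [2_000_000] * 8      # eight independent chunks of work
    with Pool(processes=cpu_count()) as pool:
        # Each chunk runs in its own process, so chunks can use separate cores
        results = pool.map(cpu_heavy, inputs)
    print(results)
```

Each process gets its own copy of the data, so there is no shared state to protect, but arguments and results are serialized and sent over IPC, which is part of the overhead mentioned above.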

Multithreading

  • Multiple threads can run within the same process, sharing memory and other resources
  • Threads can run concurrently (on one core) or in parallel (on multiple cores)
  • Lightweight, but complexity arises in managing shared resources
  • Because threads share memory, synchronization (e.g., locks) is required to avoid data corruption (e.g., race conditions)
  • Example: Downloading multiple files simultaneously in a web browser (see the sketch below)
  • Best for: I/O-bound operations (e.g., network communication, file I/O) that benefit from shared memory and concurrent execution
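
A minimal Python sketch of the I/O-bound case using ThreadPoolExecutor; the URLs are placeholders. (In CPython the GIL prevents threads from running Python bytecode in parallel, but that barely matters here because the threads spend most of their time waiting on the network.)

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["https://example.com", "https://example.org"]   # placeholder URLs

def fetch(url):
    # I/O-bound work: the thread mostly waits on the network
    with urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

# All threads share the same process memory; map() collects their results
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, size in pool.map(fetch, URLS):
        print(f"{url}: {size} bytes")
```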

Concepts: Concurrency vs Parallelism

  • Concurrency and parallelism are not mutually exclusive; a program can use both at the same time
  • For example, a program can use concurrency to manage multiple tasks and parallelism to execute those tasks at the same time on separate cores

Concurrency (single or multi-core):

  • The general concept of managing and making progress on multiple tasks (processes, programs, threads) in overlapping time periods, though not necessarily running them at the same instant
  • Can be achieved via multithreading, multiprocessing, or asynchronous programming
  • Focuses on task / context switching rather than parallel execution
  • Implementations:
    • Multithreading / Multiprocessing: concurrent execution of threads / processes
    • Asynchronous: non-blocking tasks, typically for I/O

Parallelism (multi-core):

  • Multiple tasks are executed at the same time on different CPU cores or machines
  • For true parallel execution, multiple CPUs or multiple cores on a single CPU are required
  • Best for CPU-heavy operations that can be divided into independent tasks
  • Implementations:
    • Multiprocessing: true parallelism with multiple cores/CPUs
    • Multithreading: threads can run in parallel on multiple cores

Synchronous vs Asynchronous

Synchronous

  • Tasks are executed one after another, and each task must finish before the next one begins
  • Easier to reason about but can be inefficient, especially when dealing with I/O or waiting tasks
  • Typically built from plain subroutines (ordinary function calls); see Subroutines vs Coroutines below
  • Example: Reading a file line by line; each line is processed in sequence

Asynchronous

  • Non-blocking operations; uses event loops and promises / callbacks, often via async / await syntax
  • Often single-threaded, but lets tasks pause while waiting for I/O, freeing the CPU to handle other tasks
  • Commonly implemented with coroutines
  • While it's non-blocking, a "greedy" task that is CPU-bound and runs for too long without yielding can monopolize the event loop
    • Solutions:
      • Splitting large tasks: break the greedy task into smaller chunks that yield control back using mechanisms like await or setTimeout
      • Multi-threading / multi-processing: for CPU-bound tasks, consider using worker threads or separate processes to offload the intensive work so it cannot block the event loop (both approaches are sketched below)
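
A minimal asyncio sketch of both workarounds (the cpu_heavy function is made up): the chunked task yields back to the event loop with await asyncio.sleep(0), and the CPU-bound work is offloaded to a worker thread with asyncio.to_thread (Python 3.9+) so it cannot starve the other coroutines.

```python
import asyncio

def cpu_heavy():
    # Made-up CPU-bound work; running it directly in a coroutine would block the event loop
    return sum(i * i for i in range(5_000_000))

async def chunked_work(items):
    for _ in items:
        # ... process one small chunk here ...
        await asyncio.sleep(0)   # yield control so other tasks can run

async def main():
    heavy = asyncio.create_task(asyncio.to_thread(cpu_heavy))  # offload to a worker thread
    await asyncio.gather(
        chunked_work(range(10)),
        asyncio.sleep(0.1),      # stand-in for a non-blocking I/O wait
        heavy,
    )
    print("heavy result:", heavy.result())

asyncio.run(main())
```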

Race Conditions, Locks, and Deadlocks

Race conditions

  • Occur when the outcome of a program depends on the sequence or timing of uncontrollable events, such as the order in which threads are scheduled
  • Typically happen in multithreaded code when threads access and modify shared resources without synchronization
  • Example: Two threads attempting to increment a shared counter without proper locking (sketched below)
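
A minimal Python sketch of that counter race: the read-modify-write is not atomic, so one thread can overwrite an update the other just made. The loss is timing-dependent, so you may need a few runs (or more iterations) to see a total below 200,000.

```python
import threading

counter = 0

def worker():
    global counter
    for _ in range(100_000):
        # Unsynchronized read-modify-write: another thread can change
        # `counter` between the read and the write, losing that update
        current = counter
        counter = current + 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("expected 200000, got", counter)   # often less than 200000
```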

Locks

  • A mechanism to control access to shared resources in multithreading environments to avoid race conditions
  • Only one thread can hold the lock at any time, ensuring no two threads modify a shared resource simultaneously (see the sketch below)
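
The same counter made safe with threading.Lock (a minimal sketch): the with block guarantees that only one thread performs the read-modify-write at a time and that the lock is released even if an exception is raised.

```python
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:        # only one thread at a time can enter this block
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # always 200000
```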

Deadlocks

  • A situation where two or more threads/processes are waiting for each other to release resources, causing them to be stuck indefinitely
  • Example: Thread A locks Resource 1 and waits for Resource 2, while Thread B locks Resource 2 and waits for Resource 1 (illustrated below)
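
A minimal Python illustration of that scenario: each worker takes one lock and then tries to take the other. The acquire(timeout=...) calls are only there so the script terminates and reports the problem; without them the two threads would wait on each other forever. Acquiring locks in a consistent global order is the usual way to avoid this.

```python
import threading
import time

lock_a = threading.Lock()   # "Resource 1"
lock_b = threading.Lock()   # "Resource 2"

def worker_1():
    with lock_a:                             # Thread A holds Resource 1
        time.sleep(0.1)                      # give the other thread time to grab Resource 2
        if not lock_b.acquire(timeout=1):    # ...then waits for Resource 2
            print("worker_1 stuck: would deadlock without the timeout")
        else:
            lock_b.release()

def worker_2():
    with lock_b:                             # Thread B holds Resource 2
        time.sleep(0.1)
        if not lock_a.acquire(timeout=1):    # ...then waits for Resource 1
            print("worker_2 stuck: would deadlock without the timeout")
        else:
            lock_a.release()

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()
t1.join()
t2.join()
```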

Subroutines vs Coroutines

Subroutines

  • Traditional functions or methods where control is passed from one function to another
  • Once the subroutine completes, control is returned to the caller
  • Example: Function A calls Function B, then waits for B to complete before continuing execution

Coroutines

  • Generalized version of subroutines that can pause execution and yield control back to the caller without terminating
  • Useful in asynchronous programming to handle non-blocking operations (e.g., file I/O or network calls)
  • Example: A coroutine performing an I/O operation can yield while waiting for the response, allowing the CPU to do other work (see the sketch below)
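
A minimal Python sketch of the pause/resume idea using a generator-based coroutine: each yield suspends the function and hands control back to the caller, and the next call to next() resumes it right where it left off, with local state intact.

```python
def countdown(n):
    while n > 0:
        yield n      # pause here and hand `n` back to the caller
        n -= 1       # execution resumes here on the next next() call

gen = countdown(3)
print(next(gen))   # 3 (runs until the first yield, then pauses)
print(next(gen))   # 2 (resumes after the yield, state preserved)
print(next(gen))   # 1
```

async def coroutines (used with await) build on the same idea: suspend at a wait point, let something else run, and resume later.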

Other comparisons

I/O task handling

  • Asynchronous programming and coroutines optimize I/O handling on a single thread
  • Concurrency and multithreading involve managing multiple tasks, either by sharing time or running in parallel

Concurrency vs Asynchronous

  • Asynchronous programming is one method of achieving concurrency, focused on non-blocking I/O operations
  • Concurrency is the broader concept that covers any way of making progress on multiple tasks (multithreading, multiprocessing, asynchronous programming)