Processes and Threading
- Comparison with visuals (Jupyter nb)
- Linus Torvalds' view on processes vs threads
- Linux processes overview (with good diagrams)
| Concept | Definition | Typical Use Case | Efficiency / Notes |
|---|---|---|---|
| Process | Independent execution unit with its own memory | Running separate applications | High overhead, isolated |
| Thread | Lightweight unit of execution that shares memory within a process | Tasks within an app (e.g., handling multiple tabs in a browser) | Lightweight, requires synchronization |
| Synchronous | Tasks run one after the other, blocking | Linear operations, easy to reason about | Less efficient for I/O tasks |
| Asynchronous | Tasks can start without waiting for others to complete | Network calls, file I/O, event loops | More efficient, non-blocking |
| Concurrency | Multiple tasks in progress (may not run simultaneously) | I/O-bound tasks (web servers, user interfaces) | Time-slicing on a single core |
| Parallelism | Multiple tasks executed at the same time on different cores/machines | CPU-bound tasks (matrix computations, simulations) | Requires multiple cores or machines |
| Multithreading | Multiple threads within the same process, sharing memory | I/O-heavy tasks (e.g., network services) | Requires locks for data safety |
| Multiprocessing | Multiple processes with separate memory spaces | CPU-heavy tasks (e.g., video rendering) | Avoids shared-resource issues |
| Coroutines | Functions that can pause and resume (non-blocking) | Async operations (network calls, I/O) | Efficient for async tasks |
| Subroutines | Functions that pass control to another function until it completes | Simple sequential execution | Simpler control flow |
- A single-core CPU can run one process at a time
- Each core inside a multi-core CPU can run one process/thread at a time
- With hyperthreading (SMT), a single physical core appears to the OS as two logical cores, so it can run two hardware threads (see the quick check below)
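As a quick check, Python can report how many logical cores the OS sees; with hyperthreading this is usually double the physical count. The `psutil` call is only an assumption that the third-party package is installed:

```python
import os

# Logical cores the OS schedules onto; with hyperthreading/SMT this is
# typically 2x the number of physical cores
print("Logical cores:", os.cpu_count())

try:
    import psutil  # third-party package; only used if available
    print("Physical cores:", psutil.cpu_count(logical=False))
except ImportError:
    pass
```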
Process: a program executing on your computer
- A process is just a construct / abstraction an OS uses
Program: stored instructions
- A program is a static entity, while a process is an active entity
Thread: a task associated with a process
- A process has at least one thread and often has several
Job: collection of tasks (processes)
- A job typically refers to a unit of work submitted to a job scheduler, often in batch processing or distributed systems
- Jobs can consist of multiple tasks or processes that run together to accomplish a task
- Jobs are typically used in scheduling systems (like cron jobs, batch jobs) to describe work that needs to be completed, and they may involve one or more processes
Multiprocessing:
- Multiple separate processes, each with its own memory space
- Processes can run in parallel if there are multiple cores/CPUs
- Heavier than threads due to memory isolation and the need for inter-process communication (IPC)
- Example: Performing complex mathematical operations on large datasets using multiple CPU cores
- Best for: CPU-bound tasks where processes can run independently and true parallelism is required
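A minimal Python sketch of this idea using the standard-library `multiprocessing` module; the `square` function and the input range are made-up placeholders for real CPU-bound work:

```python
from multiprocessing import Pool

def square(n: int) -> int:
    """Stand-in for CPU-bound work (hypothetical example function)."""
    return n * n

if __name__ == "__main__":
    numbers = range(10)
    # Each worker is a separate process with its own memory space,
    # so chunks of the input can run truly in parallel on multiple cores
    with Pool(processes=4) as pool:
        results = pool.map(square, numbers)
    print(results)  # [0, 1, 4, 9, ...]
```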
Multithreading:
- Multiple threads can run within the same process, sharing memory and other resources
- Threads can run concurrently (on one core) or in parallel (on multiple cores)
- Lightweight, but the complexity arises in managing shared resources
- However, since they share memory, synchronization (e.g., locks) is required to avoid data corruption (e.g., race conditions)
- Example: Downloading multiple files simultaneously in a web browser
- Best for: tasks like I/O-bound operations (e.g., network communication, file I/O) that benefit from shared memory and concurrent execution
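A minimal Python sketch of the download example using the standard `threading` module; the URLs and the `fetch_url` helper are hypothetical placeholders:

```python
import threading
import urllib.request

def fetch_url(url: str) -> None:
    # I/O-bound work: the thread spends most of its time waiting on the network,
    # so other threads can run while this one is blocked
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()), "bytes")

urls = ["https://example.com", "https://example.org"]  # placeholder URLs

threads = [threading.Thread(target=fetch_url, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all downloads to finish
```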
- Concurrency and parallelism can be used together
- For example, a program can use concurrency to manage multiple tasks and parallelism to execute those tasks at the same time on separate cores
Concurrency:
- General concept of handling and making progress on multiple tasks (processes, programs, threads) at once, though not necessarily simultaneously
- Running and managing two or more tasks (processes, programs, threads) in overlapping time periods
- Can be achieved via multithreading, multiprocessing, or asynchronous programming
- Focuses on task / context switching rather than parallel execution
- Implementations:
- Multithreading / Multiprocessing: concurrent execution of threads / processes
- Asynchronous: non-blocking tasks, typically for I/O
Parallelism:
- Multiple tasks are executed at the same time on different CPU cores or machines
- For true parallel execution, multiple CPUs or multiple cores on a single CPU are required
- Best for CPU-heavy operations that can be divided into independent tasks
- Implementations:
- Multiprocessing: true parallelism with multiple cores/CPUs
- Multithreading: threads can run in parallel on multiple cores
Synchronous:
- Tasks are executed one after another, and each task must finish before the next one begins
- Easier to reason about but can be inefficient, especially when dealing with I/O or waiting tasks
- Can also use subroutines
- Example: Reading a file line by line; each line is processed in sequence
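A tiny synchronous sketch of that file example; `data.txt` and the `process` function are placeholders, and each line blocks the program until it has been handled:

```python
def process(line: str) -> None:
    print(line.strip())

# Each line is read and processed in order; nothing else runs in between
with open("data.txt") as f:  # placeholder file name
    for line in f:
        process(line)
```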
Asynchronous:
- Non-blocking operations; uses event loops or promises / callbacks (e.g. `async` / `await`)
- Single-threaded, but allows tasks to pause while waiting for I/O, freeing up the CPU to handle other tasks
- Can also use coroutines
- While it's non-blocking, a "greedy" task that is CPU-bound and runs for too long without yielding can monopolize the event loop
    - Solutions:
        - Splitting large tasks: break the greedy task into smaller chunks that yield control back using mechanisms like `await` or `setTimeout`
        - Multithreading / multiprocessing: for CPU-bound tasks, consider using worker threads or separate processes to offload the intensive work, preventing them from blocking the event loop
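A small `asyncio` sketch of the task-splitting solution above: a CPU-heavy loop periodically awaits so the event loop stays responsive. The function names and loop sizes are made up for illustration:

```python
import asyncio

async def greedy_but_polite(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i              # stand-in for CPU-bound work
        if i % 50_000 == 0:
            await asyncio.sleep(0)  # yield to the event loop so other tasks can run
    return total

async def heartbeat() -> None:
    for _ in range(5):
        print("event loop is still responsive")
        await asyncio.sleep(0.01)

async def main() -> None:
    # Both coroutines make progress because the CPU-bound one keeps yielding
    await asyncio.gather(greedy_but_polite(2_000_000), heartbeat())

asyncio.run(main())
```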
Race conditions:
- Occur when the outcome of a program depends on the sequence or timing of uncontrollable events like thread execution
- Typically happens in multithreading when threads access and modify shared resources without synchronization
- Example: Two threads attempting to increment a shared counter without proper locking
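A Python sketch of that counter example using the standard `threading` module; because `counter += 1` is a non-atomic read-modify-write, the final value often comes out lower than expected:

```python
import threading

counter = 0  # shared resource

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: NOT atomic, so updates can be lost

threads = [threading.Thread(target=increment, args=(1_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Without synchronization the result may be less than 2,000,000
print("expected 2000000, got", counter)
```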
Locks:
- A mechanism to control access to shared resources in multithreading environments to avoid race conditions
- Only one thread can hold the lock at any time, ensuring no two threads modify a shared resource simultaneously
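The same counter made safe with `threading.Lock`: only one thread at a time can execute the guarded increment:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:       # only one thread can hold the lock at a time
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(1_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 2000000
```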
Deadlock:
- A situation where two or more threads/processes are waiting for each other to release resources, causing them to be stuck indefinitely
- Example: Thread A locks Resource 1 and waits for Resource 2, while Thread B locks Resource 2 and waits for Resource 1
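A sketch of that Thread A / Thread B scenario in Python; acquire timeouts are used only so the demo terminates instead of hanging. The usual fix is to make every thread acquire locks in the same order:

```python
import threading
import time

resource_1 = threading.Lock()
resource_2 = threading.Lock()

def worker(name: str, first: threading.Lock, second: threading.Lock) -> None:
    with first:
        time.sleep(0.1)  # give the other thread time to grab its first lock
        # Both threads now wait on a lock the other one holds -> deadlock.
        # A timeout is used here only so the example terminates.
        if second.acquire(timeout=1):
            second.release()
            print(name, "finished")
        else:
            print(name, "would deadlock waiting for the second lock")

# Thread A takes resource_1 then resource_2; Thread B takes them in the opposite order
a = threading.Thread(target=worker, args=("Thread A", resource_1, resource_2))
b = threading.Thread(target=worker, args=("Thread B", resource_2, resource_1))
a.start()
b.start()
a.join()
b.join()
```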
Subroutines:
- Traditional functions or methods where control is passed from one function to another
- Once the subroutine completes, control is returned to the caller
- Example: Function A calls Function B, then waits for B to complete before continuing execution
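The Function A / Function B example as plain Python subroutines: control passes to B, and A simply waits for it to return:

```python
def function_b() -> str:
    # B runs to completion before returning control to its caller
    return "result from B"

def function_a() -> None:
    result = function_b()  # A blocks here until B finishes
    print("A continues with:", result)

function_a()
```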
Coroutines:
- Generalized version of subroutines that can pause execution and yield control back to the caller without terminating
- Useful in asynchronous programming to handle non-blocking operations (e.g., file I/O or network calls)
- Example: A coroutine function performing an I/O operation can yield while waiting for the response, allowing the CPU to do other work
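A minimal coroutine sketch using Python's `asyncio`; the network call is simulated with `asyncio.sleep`, and the function names are placeholders:

```python
import asyncio

async def fetch_data() -> str:
    print("request sent, waiting on I/O...")
    await asyncio.sleep(1)  # pauses here without blocking the event loop
    return "response"

async def other_work() -> None:
    print("doing other work while fetch_data is suspended")

async def main() -> None:
    data, _ = await asyncio.gather(fetch_data(), other_work())
    print("got", data)

asyncio.run(main())
```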
Key differences:
- Asynchronous and coroutines optimize single-threaded I/O
- Concurrency and multithreading involve managing multiple tasks, either by sharing time or running in parallel
- Asynchronous programming is one way to achieve concurrency, focused on non-blocking I/O operations, while concurrency is the broader concept that covers multiple ways of making progress on tasks