
Below is a set of hierarchical notes summarizing the lecture transcript. Each section is framed as a question to prompt deeper reflection and learning. All examples and details mentioned in the transcript are incorporated and explained intuitively.

---

# Lecture Notes on Process Execution

---

## 1. Introduction to Process Execution  
**Q: What is a simple process execution and why is it important for understanding how programs run?**  

- **Overview:**  
  - The lecture begins by demonstrating a *step-by-step process execution*, focusing on how the CPU “walks” through a process.
  - Emphasis is placed on the **program counter (PC)**—a register that directs the CPU to the next instruction.
  
- **Key Points:**  
  - **Step-by-Step Execution:**  
    - The process execution is demonstrated by stepping through instructions one by one.
    - Understanding this helps in grasping how each instruction is fetched and executed, and how data flows within the CPU.
  
  - **Practical Demonstration:**  
    - The execution process is visualized with animations (though later slides drop the animations, since producing them became tedious).
    - This tangible step-through makes abstract concepts (like the PC) more accessible.

---

## 2. Compilation Process Overview  
**Q: How is high-level C code transformed into a machine-executable process, and what are the steps involved?**  

- **Compilation Steps:**  
  - **C Code → Assembly:**  
    - In the previous lecture, C code was written and then compiled into assembly language.
    - A demo showed this transformation and how the linker resolves references to external libraries.
  
  - **Assembly → Machine Code:**  
    - The assembly code is then assembled into machine code (binary), which the CPU can execute.
    - When printed, the machine code is not human-readable, since it is raw binary; the sketch below shows how to inspect each stage.
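
To make the chain concrete, here is a minimal C file together with the standard `gcc`/`objdump` commands that expose each stage (file names are illustrative; this is a sketch, not the lecture's exact demo):

```c
/* hello.c - a minimal program for inspecting each compilation stage.
 *
 * C -> assembly:        gcc -S hello.c        (writes hello.s)
 * assembly -> machine:  gcc -c hello.s        (writes hello.o)
 * inspect the binary:   objdump -d hello.o    (disassembles the machine code)
 * link:                 gcc hello.o -o hello  (resolves puts via the C library)
 */
#include <stdio.h>

int main(void) {
    puts("hello");  /* this call is resolved by the linker */
    return 0;
}
```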

- **Why It Matters:**  
  - Understanding the compilation chain is crucial as it influences both program performance (e.g., caching and memory access patterns) and security (for example, keeping the code section read-only).

---

## 3. Structure of Process Memory  
**Q: Why is it critical to understand the layout of a process in memory?**  

- **Memory Layout:**  
  - **Processes in Memory:**  
    - Multiple processes reside in system memory. Each process occupies a dedicated memory space that is divided into sections (e.g., text/code, data, stack).
  
  - **The Text (Code) Area:**  
    - The portion of memory containing the actual code is called the *text area* (or simply *code*).
    - Once a process is loaded and running, its text area is **read-only** for both security and caching reasons.
    - An exception is just-in-time (JIT) compilation, used by languages like JavaScript and Java, which must write newly generated machine code into memory at run time.
  
  - **Memory Pages and Swapping:**  
    - Memory is subdivided into *pages* (fixed-size blocks of contiguous memory, commonly 4 KB).
    - The OS (kernel) monitors these pages; if a page isn’t used for a long time, it might be swapped out to disk to free memory resources.
    - **Example:** A long-running process may have idle pages “kicked out” to disk; the next access to such a page triggers a slow reload from disk (a “cold start”).
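
A small C program makes this layout visible by printing where the code, data, and stack live (a sketch; exact addresses vary per run because of address-space layout randomization):

```c
#include <stdio.h>

int global_counter = 42;     /* lives in the data section          */

void some_function(void) {}  /* lives in the read-only text section */

int main(void) {
    int local = 0;           /* lives on the stack                  */
    printf("text  (code):  %p\n", (void *)some_function);
    printf("data (global): %p\n", (void *)&global_counter);
    printf("stack (local): %p\n", (void *)&local);
    /* Writing into the text area, e.g. *(char *)some_function = 0,
     * would be rejected with a segmentation fault: the page is read-only. */
    return 0;
}
```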

---

## 4. CPU Registers and Their Roles  
**Q: What roles do the program counter and instruction register play in process execution?**  

- **Key CPU Registers:**  
  - **Program Counter (PC):**  
    - Holds the address of the current or next instruction.
    - It is incremented after each fetch by the size of the instruction (e.g., four bytes for a fixed-width 32-bit encoding) so that it points to the subsequent one.
    - **Example:** After executing an instruction, the PC is explicitly incremented by the instruction’s size.
  
  - **Instruction Register (IR):**  
    - Once the CPU fetches an instruction from memory, it is loaded into the IR.
    - The instruction is then decoded and executed.
    - **Note:** Even though we see assembly-like instructions in the lecture for clarity, they represent underlying machine code operations.

- **CPU Execution Flow:**  
  - The process starts with an empty CPU state.
  - The OS schedules a process, loading its code and updating the PC.
  - Each instruction is fetched (from memory or cache), loaded into the IR, decoded, and executed; any result is written back to registers or memory.
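
The fetch-decode-execute cycle can be sketched as a toy interpreter in C. The 4-byte instruction encoding below is hypothetical (not a real ISA), but the loop mirrors the flow described above:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 4-byte instruction encoding (not a real ISA):
 * top byte = opcode, next byte = destination register, low 16 bits = immediate. */
enum { OP_HALT = 0, OP_MOVE = 1, OP_ADD = 2 };

int main(void) {
    uint32_t memory[] = {                  /* the process's "text area" */
        (OP_MOVE << 24) | (0 << 16) | 7,   /* move 7 into R0 */
        (OP_ADD  << 24) | (0 << 16) | 5,   /* add 5 to R0    */
        (OP_HALT << 24),
    };
    uint32_t regs[4] = {0};
    uint32_t pc = 0;                       /* program counter, in bytes */

    for (;;) {
        uint32_t ir = memory[pc / 4];      /* fetch into the instruction register */
        pc += 4;                           /* point the PC at the next instruction */
        uint8_t  op  = ir >> 24;           /* decode */
        uint8_t  r   = (ir >> 16) & 0xFF;
        uint16_t imm = ir & 0xFFFF;
        if (op == OP_HALT) break;
        if (op == OP_MOVE) regs[r] = imm;  /* execute */
        if (op == OP_ADD)  regs[r] += imm;
    }
    printf("R0 = %u\n", regs[0]);          /* prints R0 = 12 */
    return 0;
}
```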

---

## 5. Process Scheduling and Core Utilization  
**Q: How does the kernel schedule processes onto CPU cores, and why is a one process-per-core model standard?**  

- **Scheduling on a Core:**  
  - The CPU (or more precisely, a single core) executes one process at a time.
  - Even with hyperthreading (Intel's simultaneous multithreading technology), a physical core is presented to the OS as two logical cores:
    - Each virtual core gets its own set of registers.
    - However, some resources like cache may be shared, leading to possible performance trade-offs.
  
- **Real-World Considerations:**  
  - **Example:** With hyperthreading, two processes may appear to run on separate cores while actually sharing the physical core's caches (L1/L2), which affects overall performance.
  - Performance can sometimes even be lower with hyperthreading, because the two logical cores compete for the same caches and execution resources.
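
Hyperthreading is visible from user space: the OS counts logical cores, so a four-core hyperthreaded chip typically reports eight. A minimal sketch using `sysconf` (a POSIX extension available on Linux and macOS):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* _SC_NPROCESSORS_ONLN counts logical cores: with hyperthreading,
     * each physical core contributes two entries. */
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical cores online: %ld\n", logical);
    return 0;
}
```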

---

## 6. Detailed Instruction Execution  
**Q: What are the steps involved in fetching and executing an instruction, and why is each step significant?**  

- **The Execution Cycle:**  
  - **Fetch:**  
    - The CPU uses the PC to fetch an instruction from memory.
    - **Example Detail:**  
      - The first instruction is fetched, taking about 100 nanoseconds if not immediately available in cache.
      - Often, **bursts** of data (e.g., 64 bytes, one cache line) are fetched and stored in the cache to optimize subsequent accesses.
  
  - **Decode:**  
    - Although not elaborated in deep detail, decoding occurs where the fetched instruction is interpreted.
    - This step is handled by the CPU's decode/control logic, which then drives units such as the ALU (Arithmetic Logic Unit).
  
  - **Execute:**  
    - The CPU performs the operation indicated by the instruction.
    - **Example Instruction:**  
      - A demo “move” instruction transfers a value into register R0.
  
  - **Increment the PC:**  
    - After execution, the PC is updated by adding four bytes (for a 32-bit instruction) to point to the next instruction.
    - **Discussion Point:**  
      - The lecture included a quiz-like remark on how many “add” operations occurred during the process (hinting that it is more than one simple add).

- **Memory Store:**  
  - After executing an operation, if the result must be available to other parts of the program, the CPU writes the result back to memory.
  - **Writing Through Cache:**  
    - Write operations travel through cache layers (L1, L2, L3) before reaching main memory.
    - This ensures that subsequent reads from the same memory address are fast.
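
The 64-byte burst behavior can be observed with a toy benchmark (a sketch; absolute timings vary by machine). Walking an array byte by byte reuses each fetched cache line 64 times, while striding by 64 bytes touches a fresh line on every access, so the strided loop often takes a comparable total time despite doing 1/64th of the increments:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 64 MiB, far larger than any CPU cache */

static double touch(volatile unsigned char *buf, size_t step) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i += step)
        buf[i]++;             /* one memory access per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    unsigned char *buf = calloc(N, 1);
    if (!buf) return 1;
    /* stride 1: 64 consecutive accesses share each 64-byte cache line */
    printf("stride  1: %.3f s\n", touch(buf, 1));
    /* stride 64: every access pulls in a brand-new cache line */
    printf("stride 64: %.3f s\n", touch(buf, 64));
    free(buf);
    return 0;
}
```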

---

## 7. Understanding Memory Access Latency and CPU Caches  
**Q: Why is it important to consider different memory access times, and how do caches help mitigate memory latency?**  

- **Memory Hierarchy and Latency Numbers** (as quoted in the lecture):

  | Level | Approximate latency |
  | --- | --- |
  | Registers | ~0.5–1 ns |
  | L1 cache | ~1 ns |
  | L2 cache | ~2 ns |
  | L3 cache | ~7–15 ns (higher when the cache is shared across cores) |
  | Main memory (RAM) | ~100 ns |
  | SSD | ~150 µs |
  | Hard drive | ~10 ms or more |

- **Implications for Performance:**  
  - Fetching an instruction that is not in the cache leads to a delay (e.g., 100 nanoseconds) compared to a cache hit.
  - Organizing code so that nearby instructions are stored together can lead to cache hits, greatly reducing access time.
  - **Example:**  
    - The lecture notes that after fetching an instruction burst (64 bytes), subsequent instructions are likely to hit the fast L1 cache.
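
These figures combine into a handy back-of-envelope formula (standard in computer architecture, not stated in the lecture itself): average memory access time = hit time + miss rate × miss penalty. With the numbers above, a load that hits L1 95% of the time and otherwise goes to main memory costs on average about 1 ns + 0.05 × 100 ns = 6 ns, a sixfold slowdown over a pure L1 hit caused by just a 5% miss rate. This is why locality-friendly code pays off.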

---

## 8. Putting It All Together: A Walkthrough Example  
**Q: Can we combine all the concepts into a complete walkthrough of a simple process execution?**  

- **Execution Flow Recap:**  
  1. **Process Scheduling:**  
     - The OS schedules the process onto a CPU core; initially the core's registers (including the PC) hold no state for it.
  
  2. **PC Initialization:**  
     - The loader reads the starting address from the executable's header (e.g., the ELF header's entry point) and sets the PC accordingly; see the sketch after this list.
  
  3. **Instruction Fetch:**  
     - The first instruction is fetched from memory (or cache) and loaded into the Instruction Register.
     - **Cost Example:**  
       - A fetch that goes all the way to main memory takes ~100 nanoseconds; with a cache hit the latency is much lower.
  
  4. **Executing the Instruction:**  
     - The CPU decodes the instruction (e.g., move a value into R0).
     - Execution is performed, updating registers as necessary.
  
  5. **Updating the PC:**  
     - After the instruction executes, the PC is incremented (by four bytes, for a 32-bit instruction) to point to the next instruction.
     - This increment operation (or series of adds) is done entirely within the CPU registers, not in main memory.
  
  6. **Storing the Result:**  
     - When required, results (e.g., the output of an add operation) are written back to memory.
     - The data passes through the CPU's caches in a write-through manner (many modern caches are write-back instead, but the lecture describes write-through).
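
Step 2 can be made concrete on Linux: the starting address lives in the ELF header's `e_entry` field (the same value `readelf -h` reports). A minimal sketch using `<elf.h>`:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Print the entry point (the address the PC is initialized from) of a
 * 64-bit ELF executable. Usage: ./entry /bin/ls */
int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr hdr;
    if (fread(&hdr, sizeof hdr, 1, f) != 1 ||
        memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0) {
        fprintf(stderr, "not a readable ELF file\n");
        fclose(f);
        return 1;
    }
    fclose(f);
    /* For position-independent executables this is an offset that the
     * loader relocates before setting the PC. */
    printf("entry point: 0x%lx\n", (unsigned long)hdr.e_entry);
    return 0;
}
```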

- **Quiz Element:**  
  - The lecture pokes fun at a quiz asking: “How many 'add' operations did we perform?”  
    - Although it might seem like one, it could be several (updating data, PC increments, and the arithmetic operation itself).

---

## 9. Final Reflections and Key Takeaways  
**Q: Why is it beneficial to understand the detailed inner workings of process execution, and how does it influence programming and system design?**  

- **Integration of Concepts:**  
  - **Efficient Process Execution:**  
    - Understanding registers, memory layout, and the execution cycle allows for more efficient code.
  
  - **Cache Optimization:**  
    - Writing code with locality in mind (keeping related instructions together) optimizes cache usage—a critical factor in performance.
  
  - **Real-World System Design:**  
    - The discussion on process scheduling, swapping (from memory to disk), and hyperthreading provides insight into designing systems that are both fast and resource-aware.
  
  - **Interactive and Iterative Learning:**  
    - The lecture’s inclusion of quizzes and live slide additions helps reinforce learning, encouraging students to question and engage with material actively.

---

By breaking the lecture down into these segments and asking questions at each heading, you create a structured overview that combines theoretical knowledge with practical examples—making it more conducive to learning and review.