Implementing an Instruction Cache for Magic-1: Critical Technical Details

Adding an instruction cache to Magic-1 would be one of the most impactful architectural improvements. Based on the verified architecture documentation, here's a detailed technical implementation approach:

Current Architecture Constraints

The Magic-1 currently has:

  • No hardware cache
  • 16-bit address space (64KB total)
  • 2KB page size
  • Memory-mapped I/O at 0xF800-0xFFFF
  • Three general-purpose registers (a, b, c)
  • Limited microcode space

Instruction Cache Design

1. Cache Structure

// Simplified Verilog sketch of the proposed cache
module instruction_cache (
    input  [15:0] address,         // Instruction address from the fetch unit
    input         clock, reset,
    input         fill,            // Assert to load fill_data into the missed line
    input  [15:0] fill_data,       // Instruction word fetched from memory
    output [15:0] instruction,     // Cached instruction
    output        hit              // Cache hit indicator
);
    // Direct-mapped cache with 256 entries (512 bytes of instruction storage)
    // Each entry holds a 16-bit instruction word, an 8-bit tag, and a valid bit
    reg [15:0] cache_data[255:0];  // Instruction storage
    reg [7:0]  cache_tag[255:0];   // Tag bits (upper 8 bits of address)
    reg        cache_valid[255:0]; // Valid bits

    // Address breakdown
    wire [7:0] index = address[7:0];   // Lower 8 bits select the cache line
    wire [7:0] tag   = address[15:8];  // Upper 8 bits form the tag

    // Cache hit detection
    assign hit = cache_valid[index] && (cache_tag[index] == tag);

    // Output data on a hit; the fetch unit ignores this value on a miss
    assign instruction = hit ? cache_data[index] : 16'h0000;

    // Fill and invalidation logic
    integer i;
    always @(posedge clock) begin
        if (reset) begin
            // Invalidate every entry on reset
            for (i = 0; i < 256; i = i + 1)
                cache_valid[i] <= 1'b0;
        end else if (fill) begin
            // Load the missed line with the word fetched from memory
            cache_data[index]  <= fill_data;
            cache_tag[index]   <= tag;
            cache_valid[index] <= 1'b1;
        end
    end
endmodule

2. Integration with Instruction Fetch

// Integration with the existing fetch logic
module fetch_unit (
    input         clock, reset,    // System clock and reset
    input  [15:0] pc,              // Program counter
    output [15:0] instruction      // Instruction to execute
);
    wire        cache_hit;
    wire [15:0] cache_instruction;
    wire [15:0] memory_instruction;

    // Try the cache first; on a miss, the word returned by memory
    // is written back into the missed line through the fill port
    instruction_cache icache (
        .address(pc),
        .clock(clock),
        .reset(reset),
        .fill(!cache_hit),
        .fill_data(memory_instruction),
        .instruction(cache_instruction),
        .hit(cache_hit)
    );

    // Only access memory on a cache miss
    // (memory_access stands for the existing memory interface; for simplicity
    // this sketch assumes it returns the instruction word in the same cycle)
    memory_access mem_access (
        .address(pc),
        .data_out(memory_instruction),
        .enable(!cache_hit)        // Only enable on a miss
    );

    // Select the instruction source based on hit/miss
    assign instruction = cache_hit ? cache_instruction : memory_instruction;
endmodule

3. Cache Management

Cache Control Registers

// New memory-mapped control registers for the cache
// (proposed addresses within the device page at 0xF800-0xFFFF)
#include <stdint.h>

#define ICACHE_CTRL_REG    (*(volatile uint16_t*)0xFF50)
#define ICACHE_STAT_REG    (*(volatile uint16_t*)0xFF52)
#define ICACHE_FLUSH_ADDR  (*(volatile uint16_t*)0xFF54)

// Control register bits
#define ICACHE_ENABLE      0x0001  // Enable cache
#define ICACHE_FLUSH       0x0002  // Flush entire cache
#define ICACHE_STATS       0x0004  // Enable hit/miss statistics
#define ICACHE_LINE_INV    0x0008  // Invalidate single line

// Status register bits
#define ICACHE_READY       0x0001  // Cache ready for operation
#define ICACHE_MISS_CNT    0xFF00  // High byte: miss counter
#define ICACHE_HIT_CNT     0x00FF  // Low byte: hit counter

Software Control Functions

// Enable instruction cache
void icache_enable(void) {
    ICACHE_CTRL_REG |= ICACHE_ENABLE;
}

// Disable instruction cache
void icache_disable(void) {
    ICACHE_CTRL_REG &= ~ICACHE_ENABLE;
}

// Flush entire cache
void icache_flush(void) {
    ICACHE_CTRL_REG |= ICACHE_FLUSH;
    while (!(ICACHE_STAT_REG & ICACHE_READY)) { }  // Wait for completion
    ICACHE_CTRL_REG &= ~ICACHE_FLUSH;
}

// Invalidate the single cache line that covers the given address
void icache_invalidate_line(uint16_t address) {
    ICACHE_FLUSH_ADDR = address;
    ICACHE_CTRL_REG |= ICACHE_LINE_INV;
    ICACHE_CTRL_REG &= ~ICACHE_LINE_INV;
}
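
The hit/miss statistics packed into the status register can be read with small helpers such as the ones below. This is a sketch against the proposed register layout above; the helper names (icache_stats_enable, icache_hit_count, icache_miss_count) are hypothetical, and the 8-bit counters wrap quickly, so they are only suitable for coarse profiling.

// Enable hit/miss statistics collection (sketch; uses the proposed ICACHE_STATS bit)
void icache_stats_enable(void) {
    ICACHE_CTRL_REG |= ICACHE_STATS;
}

// Read the 8-bit hit counter from the low byte of the status register
uint8_t icache_hit_count(void) {
    return (uint8_t)(ICACHE_STAT_REG & ICACHE_HIT_CNT);
}

// Read the 8-bit miss counter from the high byte of the status register
uint8_t icache_miss_count(void) {
    return (uint8_t)((ICACHE_STAT_REG & ICACHE_MISS_CNT) >> 8);
}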

4. Hardware Implementation Requirements

Cache Parameters

  • Size: 512 bytes (256 entries × 16 bits per instruction)
    • Small enough to fit on FPGA without excessive resources
    • Large enough to capture common loops and function bodies
  • Organization: Direct-mapped (simplest to implement)
    • Each address maps to exactly one cache location
    • Index = address[7:0], Tag = address[15:8] (see the address-split sketch after this list)
  • Policy: Stores bypass the cache entirely (simplest way to maintain coherency)
    • All writes go directly to memory
    • The cache is used only for instruction fetches; stale lines are handled by explicit invalidation
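
As a concrete illustration of the direct-mapped address split, the same index/tag computation performed by the Verilog above can be written in C (the macro names are hypothetical and assume <stdint.h> from the previous section):

// Address split for the 256-entry direct-mapped cache (mirrors the Verilog above)
#define ICACHE_INDEX(addr)  ((uint8_t)((addr) & 0x00FFu))         // Lower 8 bits: line index
#define ICACHE_TAG(addr)    ((uint8_t)(((addr) >> 8) & 0x00FFu))  // Upper 8 bits: tag

// Example: fetch address 0x1234 -> index 0x34, tag 0x12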

Critical Hardware Constraints

  1. Device Page Handling:

    // Device space (0xF800-0xFFFF) must bypass cache
    wire device_space = (address[15:11] == 5'b11111);
    wire use_cache    = !device_space;
    
  2. Paging Interaction:

    // Cache must respect page boundaries:
    // invalidate every entry whenever the page tables are written
    integer i;
    always @(posedge clock) begin
        if (page_table_write) begin
            for (i = 0; i < 256; i = i + 1)
                cache_valid[i] <= 1'b0;
        end
    end
    
  3. Timing Constraints:

    • Cache lookup must complete within one clock cycle
    • Cache fill can take multiple cycles
    • Critical path: tag comparison → hit determination → instruction selection

5. Microcode Changes

The Magic-1's fetch microcode would need relatively small changes, sketched schematically here:

// Add cache control sequence
cache_fill:
    mem_read         // Read memory
    cache_write      // Write to cache
    next_instr       // Continue
    
// Modify fetch sequence to check cache
fetch:
    cache_check      // Check cache first
    br.hit next      // If hit, continue
    goto cache_fill  // If miss, fill cache
next:
    decode           // Decode instruction
    execute          // Execute instruction

6. Performance Implications

Based on typical program behavior, rough estimates are:

  • Expected Hit Rate: ~85-90% for most code
  • Performance Improvement:
    • ~40-60% overall speedup for computation-heavy code
    • ~10-20% for I/O-bound code
  • Memory Traffic Reduction: ~75-80% fewer instruction fetches going to main memory
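
As a rough worked example (the cycle counts are illustrative assumptions, not measured Magic-1 timings): with a single-cycle cache hit, a four-cycle memory fetch, and an 85% hit rate, the average instruction fetch takes about 0.85 × 1 + 0.15 × 4 ≈ 1.5 cycles instead of 4, nearly a 3× reduction in fetch time. The overall speedup is smaller because decode, execute, and data accesses are unaffected.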

7. Technical Implementation Challenges

  1. Cache Coherency:

    • Self-modifying code needs special handling
    • Solution: Invalidate or flush the affected cache lines whenever code pages are written (see the sketch after this list)
  2. Page Transitions:

    • Cache must be flushed when page tables change
    • Add a signal from the paging unit to trigger cache flush
  3. Context Switching:

    • Cache could be shared across contexts or flushed on context switch
    • Recommended: share cache but flush on major context switches
  4. Resource Constraints:

    • FPGA resource usage must remain reasonable
    • Time-multiplexed tag comparison can reduce logic requirements
  5. Magic-1 Microcode Integration:

    • Limited microcode space requires efficient implementation
    • May need to repurpose existing microcode operations
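
For the self-modifying-code case in item 1, the software side can reuse icache_invalidate_line() from section 3. The sketch below assumes a byte-granular patch; patch_code_byte() is a hypothetical helper used only for illustration.

// Patch one byte of code, then invalidate the cache line covering it so the
// next fetch sees the new instruction (hypothetical helper; pointer values
// fit in 16 bits on Magic-1)
void patch_code_byte(uint8_t *code_addr, uint8_t new_byte) {
    *code_addr = new_byte;                         // Store goes straight to memory; the cache is fetch-only
    icache_invalidate_line((uint16_t)code_addr);   // Drop any stale cached copy of this location
}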

8. Integration with Bootloader

The bootloader would need modifications to initialize the cache:

// During system initialization
void setup_cache() {
    // Initial flush to clear any undefined state
    icache_flush();
    
    // Enable cache after memory system is initialized
    icache_enable();
    
    // Register cache flush handler for page table changes
    register_pt_change_handler(icache_flush);
}

9. Critical Technical Constraints

  1. Word Alignment:

    • The cache must maintain Magic-1's requirement for 16-bit word alignment
    • All cached instructions must be aligned to even addresses
  2. Device Page Access:

    • Instructions in device space (0xF800-0xFFFF) must never be cached
    • Hardware must detect device address ranges and bypass cache
  3. Paging Interaction:

    • The cache is indexed and tagged with virtual (pre-translation) addresses
    • It must therefore be flushed whenever the page table base (PTB) register changes
  4. Interrupt Handling:

    • Interrupt vector fetches must still work with caching
    • Consider pre-loading interrupt vectors into cache

This implementation approach respects Magic-1's existing architecture while providing a significant performance boost through efficient instruction caching, particularly for code with loops and frequently executed functions.