Implementing an Instruction Cache for Magic‐1 - retrotruestory/M1DEV GitHub Wiki
Implementing an Instruction Cache for Magic-1: Critical Technical Details
Adding an instruction cache to Magic-1 would be one of the most impactful architectural improvements. Based on the verified architecture documentation, here's a detailed technical implementation approach:
Current Architecture Constraints
The Magic-1 currently has:
- No hardware cache
- 16-bit address space (64KB total)
- 2KB page size
- Memory-mapped I/O at 0xF800-0xFFFF
- Three general-purpose registers (a, b, c)
- Limited microcode space
Instruction Cache Design
1. Cache Structure
// Simplified Verilog description of proposed cache
module instruction_cache (
input [15:0] address, // Instruction address
input clock, reset,
output [15:0] instruction, // Fetched instruction
output hit // Cache hit indicator
);
// Direct-mapped cache with 256 entries (512 bytes)
// Each entry contains a 16-bit instruction and tag
reg [15:0] cache_data[255:0]; // Instruction storage
reg [7:0] cache_tag[255:0]; // Tag bits (upper 8 bits of address)
reg cache_valid[255:0]; // Valid bits
// Internal signals
wire [7:0] index; // Cache line index
wire [7:0] tag; // Address tag
// Address breakdown
assign index = address[7:0]; // Lower 8 bits form the index
assign tag = address[15:8]; // Upper 8 bits form the tag
// Cache hit detection
assign hit = cache_valid[index] && (cache_tag[index] == tag);
// Output data on hit
assign instruction = hit ? cache_data[index] : 16'hZZZZ;
// Cache fill logic would be implemented here
endmodule
2. Integration with Instruction Fetch
// Integration with existing fetch logic
module fetch_unit (
input [15:0] pc, // Program counter
output [15:0] instruction // Instruction to execute
);
wire cache_hit;
wire [15:0] cache_instruction;
wire [15:0] memory_instruction;
// Try cache first
instruction_cache icache (
.address(pc),
.clock(system_clock),
.reset(system_reset),
.instruction(cache_instruction),
.hit(cache_hit)
);
// Only access memory on cache miss
memory_access mem_access (
.address(pc),
.data_out(memory_instruction),
.enable(!cache_hit) // Only enable on miss
);
// Select instruction source based on hit/miss
assign instruction = cache_hit ? cache_instruction : memory_instruction;
// Cache fill logic on miss
always @(posedge system_clock) begin
if (!cache_hit) begin
// Fill cache with fetched instruction
// (Actual implementation logic here)
end
end
endmodule
3. Cache Management
Cache Control Registers
// New memory-mapped control registers for cache
#define ICACHE_CTRL_REG (*(volatile uint16_t*)0xFF50)
#define ICACHE_STAT_REG (*(volatile uint16_t*)0xFF52)
#define ICACHE_FLUSH_ADDR (*(volatile uint16_t*)0xFF54)
// Control register bits
#define ICACHE_ENABLE 0x0001 // Enable cache
#define ICACHE_FLUSH 0x0002 // Flush entire cache
#define ICACHE_STATS 0x0004 // Enable hit/miss statistics
#define ICACHE_LINE_INV 0x0008 // Invalidate single line
// Status register bits
#define ICACHE_READY 0x0001 // Cache ready for operation
#define ICACHE_MISS_CNT 0xFF00 // High byte: miss counter
#define ICACHE_HIT_CNT 0x00FF // Low byte: hit counter
Software Control Functions
// Enable instruction cache
void icache_enable() {
ICACHE_CTRL_REG |= ICACHE_ENABLE;
}
// Disable instruction cache
void icache_disable() {
ICACHE_CTRL_REG &= ~ICACHE_ENABLE;
}
// Flush entire cache
void icache_flush() {
ICACHE_CTRL_REG |= ICACHE_FLUSH;
while (!(ICACHE_STAT_REG & ICACHE_READY)) { } // Wait for completion
ICACHE_CTRL_REG &= ~ICACHE_FLUSH;
}
// Invalidate specific cache line
void icache_invalidate_line(uint16_t address) {
ICACHE_FLUSH_ADDR = address;
ICACHE_CTRL_REG |= ICACHE_LINE_INV;
ICACHE_CTRL_REG &= ~ICACHE_LINE_INV;
}
4. Hardware Implementation Requirements
Cache Parameters
- Size: 512 bytes (256 entries × 16 bits per instruction)
- Small enough to fit on FPGA without excessive resources
- Large enough to capture common loops and function bodies
- Organization: Direct-mapped (simplest to implement)
- Each address maps to exactly one cache location
- Index = address[7:0], Tag = address[15:8]
- Policy: Write-through (simplest to maintain coherency)
- All writes go directly to memory
- Cache only used for instruction fetches
Critical Hardware Constraints
-
Device Page Handling:
// Device space (0xF800-0xFFFF) must bypass cache wire device_space = address[15:11] == 5'b11111; assign use_cache = !device_space;
-
Paging Interaction:
// Cache must respect page boundaries // Flush on page table changes always @(posedge page_table_write) begin // Invalidate all entries for (int i = 0; i < 256; i++) begin cache_valid[i] <= 1'b0; end end
-
Timing Constraints:
- Cache lookup must complete within one clock cycle
- Cache fill can take multiple cycles
- Critical path: tag comparison → hit determination → instruction selection
5. Microcode Changes
The Magic-1's microcode would need minimal changes:
// Add cache control sequence
cache_fill:
mem_read // Read memory
cache_write // Write to cache
next_instr // Continue
// Modify fetch sequence to check cache
fetch:
cache_check // Check cache first
br.hit next // If hit, continue
goto cache_fill // If miss, fill cache
next:
decode // Decode instruction
execute // Execute instruction
6. Performance Implications
Based on typical program behavior:
- Expected Hit Rate: ~85-90% for most code
- Performance Improvement:
- ~40-60% overall speedup for computation-heavy code
- ~10-20% for I/O bound code
- Memory Traffic Reduction: ~75-80% fewer instruction fetches
7. Technical Implementation Challenges
-
Cache Coherency:
- Self-modifying code needs special handling
- Solution: Flush cache lines on memory writes to code pages
-
Page Transitions:
- Cache must be flushed when page tables change
- Add a signal from the paging unit to trigger cache flush
-
Context Switching:
- Cache could be shared across contexts or flushed on context switch
- Recommended: share cache but flush on major context switches
-
Resource Constraints:
- FPGA resource usage must remain reasonable
- Time-multiplexed tag comparison can reduce logic requirements
-
Magic-1 Microcode Integration:
- Limited microcode space requires efficient implementation
- May need to repurpose existing microcode operations
8. Integration with Bootloader
The bootloader would need modifications to initialize the cache:
// During system initialization
void setup_cache() {
// Initial flush to clear any undefined state
icache_flush();
// Enable cache after memory system is initialized
icache_enable();
// Register cache flush handler for page table changes
register_pt_change_handler(icache_flush);
}
9. Critical Technical Constraints
-
Word Alignment:
- The cache must maintain Magic-1's requirement for 16-bit word alignment
- All cached instructions must be aligned to even addresses
-
Device Page Access:
- Instructions in device space (0xF800-0xFFFF) must never be cached
- Hardware must detect device address ranges and bypass cache
-
Paging Interaction:
- Cache uses virtual addresses (post-translation)
- Cache must be flushed on page table base (PTB) register changes
-
Interrupt Handling:
- Interrupt vector fetches must still work with caching
- Consider pre-loading interrupt vectors into cache
This implementation approach respects Magic-1's existing architecture while providing a significant performance boost through efficient instruction caching, particularly for code with loops and frequently executed functions.