CPU and clocking modifications - nealcrook/multicomp6809 GitHub Wiki

I made some edits to the VHDL for the 6809 processor to make it more friendly to an FPGA flow:

  • Changed reset to active-low, consistently asynchronous
  • Changed clock to rising-edge
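
As an illustration of those two conventions, here is a minimal generic sketch of a register with an asynchronous active-low reset and a rising-edge clock. It is not an excerpt from cpu09p.vhd: the entity name reset_style_sketch and the ports d and q are made up for the example; only clk and rst_n match names used on this page.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: one flip-flop in the style described above
entity reset_style_sketch is
    port (
        clk   : in  std_logic;
        rst_n : in  std_logic;  -- active-low, asynchronous
        d     : in  std_logic;
        q     : out std_logic
    );
end entity;

architecture rtl of reset_style_sketch is
begin
    -- rst_n is in the sensitivity list because the reset is asynchronous
    process (clk, rst_n)
    begin
        if rst_n = '0' then
            q <= '0';               -- reset takes priority over the clock
        elsif rising_edge(clk) then
            q <= d;                 -- all state changes on the rising edge
        end if;
    end process;
end architecture;
```

Using one reset polarity and one clock edge throughout lets the FPGA tools map every register directly onto a flip-flop with a dedicated clear input.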

The modified version is in multicomp/Components/MC6809p/cpu09p.vhd. To use it, copy the MC6809p folder to the Components folder of your project tree, and add it to the file list in the Altera tools.

Another clocking change that I made was to clock the processor on the input (50MHz) clock rather than a divided clock, and to use the HOLD input to control the clock rate.

The 6809 does not perform a memory access on every cycle; it spends some clocks performing internal operations. The VMA output indicates when the 6809 is performing a memory access. For example, the data sheet shows that the instruction ORA with extended addressing takes 3 bytes and 5 cycles. Of those 5 clock cycles, three fetch the instruction bytes with a valid address (VMA=1): (1) opcode, (2) address high, (3) address low. A fourth VMA=1 cycle reads the operand from memory, and the remaining cycle is an internal operation of the CPU (VMA=0).

The clock control logic tracks VMA and runs the 6809 fast when it is performing internal operations (VMA=0) and slow when it is performing bus accesses (accesses to I/O or to on-FPGA RAM/ROM or to off-FPGA memory).

From a timing point of view, it is only necessary to run the 6809 slow for off-FPGA memory accesses. The clock control logic could track both VMA and the address in order to achieve this. However, the additional complexity is not justified: I/O cycles are not performance-critical, on-FPGA RAM is only used in trivial systems, and on-FPGA ROM is disabled after boot except for a FORTH-only environment.

Running the 6809 "fast" involves asserting HOLD one clock in two, to give an effective clock rate of 50/2=25MHz. Running it "slow" involves asserting HOLD for 4 clocks in 5, to give an effective clock rate of 50/5=10MHz. The net effective clock rate is somewhere between the two frequencies, depending upon the code stream being executed.
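
As a rough estimate of that net rate (my own arithmetic from the figures above, not a measurement): if a code stream executes n_int internal cycles at 2 clocks each and n_bus bus cycles at 5 clocks each, the average CPU cycle rate is as follows. The symbols n_int, n_bus and f_eff are introduced here for illustration only.

```latex
f_{\text{eff}} = 50\,\text{MHz} \times
  \frac{n_{\text{int}} + n_{\text{bus}}}{2\,n_{\text{int}} + 5\,n_{\text{bus}}}
% Limits: n_bus = 0 gives 25 MHz, n_int = 0 gives 10 MHz.
% Hypothetical mix with equal numbers of internal and bus cycles:
% f_eff = 50 MHz * 2/7, approximately 14.3 MHz.
```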

The final changes I made in this area of the design were to create control signals for the external RAM straight from the clocking state machine (to get better timing than could be achieved by gating them with clocks), and to generate a read strobe and write strobe for use on all of the internal devices.

Here are code snippets to show how this all goes together. First, modified declarations:

    signal n_WR                   : std_logic;
    signal n_RD                   : std_logic;
    signal n_cpuWr                : std_logic;
    signal hold                   : std_logic;
    signal vma                    : std_logic;
    signal state                  : std_logic_vector(2 downto 0);
    signal n_WR_uart              : std_logic := '1';
    signal n_RD_uart              : std_logic := '1';

    signal n_WR_sd                : std_logic := '1';
    signal n_RD_sd                : std_logic := '1';

    signal n_WR_vdu               : std_logic := '1';
    signal n_RD_vdu               : std_logic := '1';

    signal wren_Ram1              : std_logic := '1';

    signal romInhib               : std_logic := '0'; -- from the memmapper
    signal ramWrInhib             : std_logic := '0'; -- from the memmapper

(I might have missed a couple but it should be obvious when you try to compile)

The instantiation of the modified CPU entity looks like this:

    cpu1 : entity work.cpu09p
    port map(
                        clk => clk,
                        rst_n => n_reset,
                        rw => n_cpuWr,
                        vma => vma,
                        addr => cpuAddress,
                        data_in => cpuDataIn,
                        data_out => cpuDataOut,
                        halt => '0',
                        hold => hold,
                        irq => '0',
                        firq => '0',
                        nmi => '0');

And the clocking control looks like this:

process (clk) begin
    if rising_edge(clk) then
        -- Enable for baud rate generator
        serialClkCount <= serialClkCount_d;
        if serialClkCount(15) = '0' and serialClkCount_d(15) = '1' then
            serialClkEn <= '1';
        else
            serialClkEn <= '0';
        end if;

        -- state control - counter influenced by VMA
        if state = 0 and vma = '0' then
            state <= "100";
        else
            if state < 4 then
                state <= state + 1;
            else
                -- this gives the 4->0 transition and also provides
                -- synchronous reset.
                state <= (others=>'0');
            end if;
        end if;

        -- decode HOLD from state and VMA
        if state = 3 or (state = 0 and vma = '0') then
            hold <= '0'; -- run the clock
        else
            hold <= '1'; -- pause the clock
        end if;

        -- decode memory and RW control from state etc.
        if (state = 1 or state = 2 or state = 3) then
            if n_cpuWr = '0' then
                n_WR <= '0';
                n_sRamWE <= (n_sRamCSHi_i and n_sRamCSLo_i) or ramWrInhib ; -- synchronous and glitch-free
            else
                n_RD <= '0';
                n_sRamOE <= n_sRamCSHi_i and n_sRamCSLo_i; -- synchronous and glitch-free
            end if;
        else
            n_WR <= '1';
            n_RD <= '1';
            n_sRamWE <= '1';
            n_sRamOE <= '1';
        end if;
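        -- Serial clock DDS. This is the last assignment to serialClkCount in
        -- the process, so it overrides the assignment near the top.
        serialClkCount <= serialClkCount + 2416;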
    end if;
end process;

Each of the on-chip peripherals is connected up like this:

    n_WR_uart <= n_interface2CS or n_WR;
    n_RD_uart <= n_interface2CS or n_RD;
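
The same pattern would apply to the other strobes declared earlier (n_WR_sd, n_RD_sd, n_WR_vdu, n_RD_vdu). The sketch below wraps it in a small stand-alone entity so that it compiles on its own. The chip-select names n_sdCardCS and n_interface1CS are assumptions, as is the entity name; substitute whatever your address decoder actually produces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: in the real design these would be concurrent
-- assignments in the top-level architecture, not a separate entity.
entity strobe_gate_sketch is
    port (
        n_WR           : in  std_logic;  -- shared write strobe from the clock control
        n_RD           : in  std_logic;  -- shared read strobe from the clock control
        n_sdCardCS     : in  std_logic;  -- assumed chip-select name
        n_interface1CS : in  std_logic;  -- assumed chip-select name
        n_WR_sd        : out std_logic;
        n_RD_sd        : out std_logic;
        n_WR_vdu       : out std_logic;
        n_RD_vdu       : out std_logic
    );
end entity;

architecture rtl of strobe_gate_sketch is
begin
    -- Active-low signals: the OR output is low only when both the
    -- chip select and the shared strobe are low.
    n_WR_sd  <= n_sdCardCS or n_WR;
    n_RD_sd  <= n_sdCardCS or n_RD;
    n_WR_vdu <= n_interface1CS or n_WR;
    n_RD_vdu <= n_interface1CS or n_RD;
end architecture;
```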

Operation of the clock control

The CPU input clock is 50MHz and the HOLD input acts as a clock enable. When the CPU is executing internal cycles (indicated by VMA=0), HOLD asserts on alternate cycles so that the effective clock rate is 25MHz. When the CPU is performing memory accesses (VMA=1), HOLD asserts for 4 cycles in 5 so that the effective clock rate is 10MHz. The slower cycle time is calculated to meet the access time for the external RAM.

The n_WR, n_RD signals (and the SRAM WE/OE signals) are registered from states 1, 2 and 3 in the code above, so they are asserted for the last 3 cycles of the 5-cycle access (while state is 2, 3 and 4); these are not the critical path for the access: the critical path is the address and chip select, which are nominally valid for all 5 cycles.

The clock control is implemented by a counter, which tracks VMA. The HOLD and n_WR, n_RD controls are a synchronous decode from the counter:

  • When VMA=0, state transitions 0,4,0,4,0,4...
  • When VMA=1, state transitions 0,1,2,3,4,0,1,2,3,4...

In both cases, HOLD is negated (clock runs) when state=4 and so the CPU address (and VMA) transitions when state goes 4->0.
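
To watch these two sequences without building the whole design, a small simulation-only testbench along the following lines could be used. This is a sketch under assumptions: the counter and HOLD decode are re-typed from the process above using numeric_std, the memory strobes and baud-rate logic are left out, and VMA is driven by a made-up stimulus rather than by the CPU. The names clk_ctl_tb and done are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clk_ctl_tb is
end entity;

architecture sim of clk_ctl_tb is
    signal clk   : std_logic := '0';
    signal vma   : std_logic := '0';
    signal state : unsigned(2 downto 0) := (others => '0');
    signal hold  : std_logic := '1';
    signal done  : boolean := false;
begin
    -- 50MHz clock (20 ns period); stops toggling once the stimulus is done
    clk <= not clk after 10 ns when not done else '0';

    -- Counter and HOLD decode, following the clocking process in the text
    ctl : process (clk)
    begin
        if rising_edge(clk) then
            if state = 0 and vma = '0' then
                state <= to_unsigned(4, 3);
            elsif state < 4 then
                state <= state + 1;
            else
                state <= (others => '0');
            end if;

            if state = 3 or (state = 0 and vma = '0') then
                hold <= '0';  -- run the clock
            else
                hold <= '1';  -- pause the clock
            end if;
        end if;
    end process;

    -- Assumed stimulus: internal cycles first, then bus cycles
    stim : process
    begin
        vma <= '0';
        wait for 200 ns;
        vma <= '1';
        wait for 400 ns;
        done <= true;
        wait;
    end process;

    -- Print the values sampled at each rising clock edge
    monitor : process (clk)
    begin
        if rising_edge(clk) then
            report "state=" & integer'image(to_integer(state))
                 & " hold=" & std_logic'image(hold)
                 & " vma=" & std_logic'image(vma);
        end if;
    end process;
end architecture;
```

The stimulus changes VMA while state is 0, which mimics the real behaviour where VMA only changes on the 4->0 transition.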

Here is a simulation waveform showing the operation of the clock control logic:

{{https://github.com/nealcrook/multicomp6809/blob/master/photos/clk_ctl.png|Clock control sim waves}}

You can also browse to the image directly at https://github.com/nealcrook/multicomp6809/blob/master/photos/clk_ctl.png (since the wiki does not allow the image to be zoomed).

Clock speed-up options (if your RAM can take it)

You can easily take 1 or 2 cycles out of this timing (e.g. to remove 1 cycle, change 3 to 2 and 4 to 3 in the state logic).
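
One way to read that suggestion is to make the last state of the counter a parameter. The sketch below is my interpretation, not code from the repository: the generic LAST_STATE and the entity name are hypothetical, the SRAM WE/OE outputs are omitted, and it has not been tried on hardware. LAST_STATE = 4 reproduces the 5-clock access described above; 3 and 2 give 4-clock and 3-clock accesses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clk_ctl_param_sketch is
    generic (
        -- hypothetical parameter: bus access takes LAST_STATE + 1 clocks
        LAST_STATE : natural range 2 to 4 := 4
    );
    port (
        clk     : in  std_logic;
        vma     : in  std_logic;
        n_cpuWr : in  std_logic;
        hold    : out std_logic := '1';
        n_WR    : out std_logic := '1';
        n_RD    : out std_logic := '1'
    );
end entity;

architecture rtl of clk_ctl_param_sketch is
    signal state : unsigned(2 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- internal cycle: jump straight to the last state
            if state = 0 and vma = '0' then
                state <= to_unsigned(LAST_STATE, 3);
            elsif state < LAST_STATE then
                state <= state + 1;
            else
                state <= (others => '0');
            end if;

            -- HOLD is negated during the last state
            if state = LAST_STATE - 1 or (state = 0 and vma = '0') then
                hold <= '0';
            else
                hold <= '1';
            end if;

            -- strobes decoded from states 1 .. LAST_STATE-1
            if state >= 1 and state < LAST_STATE then
                if n_cpuWr = '0' then
                    n_WR <= '0';
                else
                    n_RD <= '0';
                end if;
            else
                n_WR <= '1';
                n_RD <= '1';
            end if;
        end if;
    end process;
end architecture;
```

Note that shortening the access also shortens the strobes, so check both the access time and the write pulse width of your RAM before reducing LAST_STATE.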

Theoretically, since the 6809 timing-closes at 50MHz, you can eliminate the wait state from the VMA=0 cycles. However, that would mean generating HOLD combinatorially from VMA which might introduce a timing loop. I have not tried it.

(If you try and fail to get it all compiling based on these instructions, use the "issues" (!) button on the right-hand side to contact me for help).
