Using BRAMs - red-bote/VHDL_Demos GitHub Wiki

Xilinx Block RAMs

This page discusses the use of BRAM resources on Xilinx FPGAs with VHDL. It is based mostly on information from the following Xilinx manuals:

You can clone a GitHub repo to obtain the code files for the Xilinx XST examples.

The archive of the code samples for UG901 is a registered download; there is no direct link.

Block Memory on the FPGA

A RAM is typically described in HDL as a table (an array), but it can just as well be described with a case statement. When described as a table, the array is indexed by the RAM address, which must be an integer. Therefore, if the address enters the RAM as a std_logic_vector (SLV), it must be converted to an integer.
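The two description styles can be compared side by side. The following is a minimal sketch, not taken from the Xilinx examples: the entity name `rom_styles_sketch`, the port names, and the four data values are all made up for illustration. It uses `ieee.numeric_std` for the address conversion.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-word ROM, used only to contrast the two styles.
entity rom_styles_sketch is
    port (
        clk        : in  std_logic;
        addr       : in  std_logic_vector(1 downto 0);
        data_table : out std_logic_vector(7 downto 0);
        data_case  : out std_logic_vector(7 downto 0)
    );
end rom_styles_sketch;

architecture rtl of rom_styles_sketch is
    type rom_type is array (0 to 3) of std_logic_vector(7 downto 0);
    constant ROM : rom_type := (X"10", X"21", X"32", X"43");
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Table style: the SLV address is converted to an integer index.
            data_table <= ROM(to_integer(unsigned(addr)));

            -- Case style: each address is matched literally, no conversion.
            case addr is
                when "00"   => data_case <= X"10";
                when "01"   => data_case <= X"21";
                when "10"   => data_case <= X"32";
                when others => data_case <= X"43";
            end case;
        end if;
    end process;
end rtl;
```

Both outputs carry the same value one clock after the address is applied. The table form scales to large memories far more conveniently than the case form.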

Bulk memory storage is important for many applications. Blocks of memory can be constructed on the FPGA from the basic building elements of flip-flops and lookup-tables. An FPGA device is likely to offer some amount of dedicated bulk memory, typically referred to as block-RAM or BRAM storage.

HDL memory descriptions aren't completely analogous to discrete "physical" component RAMs and ROMs.

Asynchronous or Synchronous Memory

In order to synthesize a BRAM, the HDL must describe a synchronous memory, where either the data or the address, or both, are synchronized to the clock. One important consideration for the application is that there may need to be some compensation in the design to account for the clock-cycle consumed to read data from the block RAM. This is encountered in video image generation where the pixel data coming from a video-RAM has to be synchronized exactly to the raster scan timing in order for the image to display properly on the screen at the intended location.
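One possible form of that compensation is to delay the raster control signals by the same number of clocks as the memory read. The sketch below assumes a read latency of exactly one clock; the entity `sync_delay_sketch` and its port names are hypothetical and do not come from the Xilinx examples.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays raster control signals by one clock so they
-- line up with pixel data read from a synchronous (block) RAM.
entity sync_delay_sketch is
    port (
        clk        : in  std_logic;
        hsync_in   : in  std_logic;
        vsync_in   : in  std_logic;
        active_in  : in  std_logic;
        hsync_out  : out std_logic;
        vsync_out  : out std_logic;
        active_out : out std_logic
    );
end sync_delay_sketch;

architecture rtl of sync_delay_sketch is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- One register stage matches an assumed one-clock RAM read.
            hsync_out  <= hsync_in;
            vsync_out  <= vsync_in;
            active_out <= active_in;
        end if;
    end process;
end rtl;
```

If the memory path contains additional registers, the same number of stages would be needed on the control signals.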

If the HDL does not describe a synchronous memory, a distributed RAM will be synthesized on the FPGA. An asynchronous memory can be advantageous because read data is available without the clock-cycle delay, but the size of the memory will be limited compared to that which is typically available from block-RAM resources.

Inferred RAM Primitives

"Infer", as used in this context, refers to the ability of the synthesis tool to take a "generic" HDL description (no vendor-specific attributes) and realize the implementation that is best optimized for the underlying FPGA technology. The code examples discussed on this page are intended to demonstrate the Xilinx templates for writing vendor-neutral HDL code from which the tool is most likely to infer the intended implementation without explicitly referencing any underlying vendor IP.

ROMs Using Block RAM Resources

  • rams_21c is a ROM with registered address
  • rams_21a is a ROM with registered output (template 1)
  • roms_1 is nearly identical to rams_21a, it was imported from ug901-vivado-synthesis-examples and infers a BRAM.

Only the circuit using roms_1 inferred a BRAM in this test. roms_1 has the rom_style attribute (the ROM counterpart of the RAM_STYLE attribute discussed in Chapter 4: HDL Coding Techniques of the Vivado Design Suite User Guide: Synthesis, UG901), which rams_21a does not have:

  attribute rom_style : string;
  attribute rom_style of ROM : signal is "block";

Complete roms_1.vhdl listing (cleaned up on VHDL Beautifier, Formatter):

-- ROM Inference on array
-- File: roms_1.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity roms_1 is
    port (
        clk : in std_logic;
        en : in std_logic;
        addr : in std_logic_vector(5 downto 0);
        data : out std_logic_vector(19 downto 0)
    );
end roms_1;

architecture behavioral of roms_1 is
    type rom_type is array (63 downto 0) of std_logic_vector(19 downto 0);
    signal ROM : rom_type := (
        X"0200A", X"00300", X"08101", X"04000", X"08601", X"0233A", 
        X"00300", X"08602", X"02310", X"0203B", X"08300", X"04002", 
        X"08201", X"00500", X"04001", X"02500", X"00340", X"00241", 
        X"04002", X"08300", X"08201", X"00500", X"08101", X"00602", 
        X"04003", X"0241E", X"00301", X"00102", X"02122", X"02021", 
        X"00301", X"00102", X"02222", X"04001", X"00342", X"0232B", 
        X"00900", X"00302", X"00102", X"04002", X"00900", X"08201", 
        X"02023", X"00303", X"02433", X"00301", X"04004", X"00301", 
        X"00102", X"02137", X"02036", X"00301", X"00102", X"02237", 
        X"04004", X"00304", X"04040", X"02500", X"02500", 
        X"02500", X"0030D", X"02341", X"08201", X"0400D"
    );
    attribute rom_style : string;
    attribute rom_style of ROM : signal is "BLOCK";

begin
    process (clk)
    begin
        if rising_edge(clk) then
            if (en = '1') then
                data <= ROM(conv_integer(addr));
            end if;
        end if;
    end process;

end behavioral;

Run the synthesis (or implementation) tool and check the results under the Design Runs tab, or run Utilization for more detailed information: images/rams/SQCrpV.png

Multiple Architectures in VHDL Module

The ram_comp entity shows an example of multiple alternative architectures that can be instantiated from the component.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_comp is
    Port ( clk : in STD_LOGIC;
           addr : in STD_LOGIC_VECTOR(5 downto 0);
           data : out STD_LOGIC_VECTOR(19 downto 0));
end ram_comp;

architecture arch_rams_21c of ram_comp is              -- arch_rams_21c
    signal ram_addr : std_logic_vector(5 downto 0);
    signal ram_data : std_logic_vector(19 downto 0);
begin
    ram_addr <= addr;

    u_rom : entity work.rams_21c
    port map (
        clk => clk,
        en => '1',
        addr => ram_addr,
        data => ram_data
    );
    data <= ram_data;
end arch_rams_21c;

architecture arch_roms_1 of ram_comp is              -- arch_roms_1
    signal ram_addr : std_logic_vector(5 downto 0);
    signal ram_data : std_logic_vector(19 downto 0);
begin
    ram_addr <= addr;

    u_rom : entity work.roms_1
    port map (
        clk => clk,
        en => '1',
        addr => ram_addr,
        data => ram_data
    );
    data <= ram_data;
end arch_roms_1;

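The following code illustrates entity instantiation with the architecture name designated explicitly. The two instantiations are alternatives (they share the same label and drive the same signal), so only one of them is used at a time: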

    u_ram : entity work.ram_comp(arch_roms_1)
    Port map(
        clk => clk,
        addr => ram_addr,
        data => ram_data
        );

    u_ram : entity work.ram_comp(arch_rams_21c)
    Port map(
        clk => clk,
        addr => ram_addr,
        data => ram_data
        );
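If the choice between architectures should be switchable without editing the instantiation, one option is an if-generate controlled by a generic. This is only a sketch: the wrapper entity `rom_select_sketch` and the generic `USE_ROMS_1` are hypothetical names, and the block assumes that `ram_comp` with both architectures shown above has been compiled into library `work`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that selects one ram_comp architecture at elaboration.
entity rom_select_sketch is
    generic (USE_ROMS_1 : boolean := true);  -- hypothetical selector
    port (
        clk  : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        data : out std_logic_vector(19 downto 0)
    );
end rom_select_sketch;

architecture rtl of rom_select_sketch is
begin
    -- Only one of the two generate blocks is elaborated, so the repeated
    -- label u_ram and the shared output do not conflict.
    gen_roms_1 : if USE_ROMS_1 generate
        u_ram : entity work.ram_comp(arch_roms_1)
        port map (clk => clk, addr => addr, data => data);
    end generate gen_roms_1;

    gen_rams_21c : if not USE_ROMS_1 generate
        u_ram : entity work.ram_comp(arch_rams_21c)
        port map (clk => clk, addr => addr, data => data);
    end generate gen_rams_21c;
end rtl;
```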

Inferring BRAM Examples

Single Port and Dual Port RAMs

Single-port RAMs are introduced first. Simple dual-port RAMs, where one port is read-write and the other port is read-only, are considered as well (this arrangement is typical for video-RAM applications).
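A dual-port arrangement of that kind might be described as follows. This is a minimal single-clock sketch under assumed dimensions (64 words of 16 bits); the entity name `dpram_sketch` and all port names are hypothetical. Whether the tool maps it to block RAM or distributed RAM depends on the memory size and the tool settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical dual-port RAM: port A reads and writes, port B only reads.
entity dpram_sketch is
    port (
        clk    : in  std_logic;
        -- Port A: read-write (for example, the CPU side)
        we_a   : in  std_logic;
        addr_a : in  std_logic_vector(5 downto 0);
        di_a   : in  std_logic_vector(15 downto 0);
        do_a   : out std_logic_vector(15 downto 0);
        -- Port B: read-only (for example, the video side)
        addr_b : in  std_logic_vector(5 downto 0);
        do_b   : out std_logic_vector(15 downto 0)
    );
end dpram_sketch;

architecture rtl of dpram_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM : ram_type := (others => (others => '0'));
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if we_a = '1' then
                RAM(to_integer(unsigned(addr_a))) <= di_a;
            end if;
            -- Both reads are registered; on a write, port A returns the
            -- previously stored word (read-first behavior).
            do_a <= RAM(to_integer(unsigned(addr_a)));
            do_b <= RAM(to_integer(unsigned(addr_b)));
        end if;
    end process;
end rtl;
```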

The table below, reproduced from the Xilinx XST User Guide UG687, describes the minimum sizes of BRAMs in the Xilinx ISE tool suite, which preceded Vivado.

images/bram/4jeOhB.png

BRAM Synchronization Modes

BRAMs must be described in VHDL in such a way that the address, or the data out, or both, are registered. The typical RAM synchronizing topologies in Xilinx are:

  • write first (aka read-through)
  • read first
  • no change

The following top-level VHDL code was used to simulate the BRAM synchronization modes in Vivado. The accumulator implements an unsigned up-counter with an increment of 1. The accumulator has a synchronous reset, which is preferred for use with BRAMs. The counter provides the data input and the address for the RAM, as well as an arbitrary periodic write-enable signal to test writing to the RAM.

----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity rtl_top is
    Generic (constant COUNTER_BITS : integer := 16);
    Port ( clk : in STD_LOGIC;
           reset : in STD_LOGIC;
           led : out STD_LOGIC_VECTOR (15 downto 0));
end rtl_top;

architecture test_arch of rtl_top is
    signal count : std_logic_vector (COUNTER_BITS-1 downto 0);
    signal ram_wre : std_logic;
    signal ram_addr : std_logic_vector (5 downto 0);
    signal ram_din : std_logic_vector (COUNTER_BITS-1 downto 0);
    signal ram_dout : std_logic_vector (COUNTER_BITS-1 downto 0);
begin
    u_accum : entity work.accumulators_2
    generic map (
        WIDTH => COUNTER_BITS)
    port map (
        clk => clk,
        rst => reset,
        D => std_logic_vector(to_unsigned(1, COUNTER_BITS)),
        Q => count
    );
    ram_din <= count;
    ram_addr <= count(8 downto 3); -- 6-bit address is held for 8 clock cycles
    ram_wre <= (not count(9))      -- allows all 64 words of RAM to be initialized at startup
                or (count(3) and not count(2) and count(1) and count(0))
                or (count(2) and not count(1));

    -- modify section to instantiate the RAM to be tested
    u_rams_02a : entity work.rams_02a
    port map (
        clk => clk,
        we => ram_wre,
        en => '1',
        addr => ram_addr,
        di => ram_din,
        do => ram_dout
    );

    led <= ram_dout;
end test_arch;
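To run the top level in a simulator, a small testbench that supplies the clock and reset is needed. The sketch below is one way to do that and is not part of the original sources: the name `rtl_top_tb`, the 10 ns clock period, and the active-high reset polarity are assumptions. It expects `rtl_top` and the entities it instantiates to be compiled into library `work`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench for rtl_top; it has no ports of its own.
entity rtl_top_tb is
end rtl_top_tb;

architecture sim of rtl_top_tb is
    constant CLK_PERIOD : time := 10 ns;  -- assumed 100 MHz clock
    signal clk   : std_logic := '0';
    signal reset : std_logic := '1';      -- assumed active-high
    signal led   : std_logic_vector(15 downto 0);
begin
    -- Free-running clock.
    clk <= not clk after CLK_PERIOD / 2;

    -- Hold reset for a few clocks, then release it and let the counter run.
    stim : process
    begin
        wait for 5 * CLK_PERIOD;
        reset <= '0';
        wait;
    end process;

    uut : entity work.rtl_top
    port map (
        clk   => clk,
        reset => reset,
        led   => led
    );
end sim;
```

The simulation has no stop condition, so the run time has to be set in the simulator. A little over 512 clocks covers the initial pass in which every address is written.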

RAM with Asynchronous Read

Single-Port RAM with Asynchronous Read (Distributed RAM)

images/Using-BRAMs/vCrk0Z.png

The write cycle is synchronized to the clock, but the data output is driven outside the clocked process. Therefore new data appears on the output port as soon as it has been written.

On a read cycle, data from the addressed location is available as soon as the RAM address changes. The data output is not assigned in the clocked process (although it still changes with the clock edge here, because the RAM address is driven by a counter derived from the clock).
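A description with this behavior can be sketched as follows. The entity name `ram_async_read_sketch` is hypothetical. The port names follow the testbench above, except that no `en` port is included, so the `en => '1'` association would have to be removed when instantiating it there.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM: synchronous write, asynchronous read.
entity ram_async_read_sketch is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        di   : in  std_logic_vector(15 downto 0);
        do   : out std_logic_vector(15 downto 0)
    );
end ram_async_read_sketch;

architecture rtl of ram_async_read_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM : ram_type := (others => (others => '0'));
begin
    -- Only the write is clocked.
    process (clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                RAM(to_integer(unsigned(addr))) <= di;
            end if;
        end if;
    end process;

    -- The read is a concurrent assignment, so the output follows the
    -- address (and any newly written word) without waiting for a clock.
    do <= RAM(to_integer(unsigned(addr)));
end rtl;
```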

BRAM Write-First Mode

Single-Port BRAM Write-First Mode (recommended template)

images/Using-BRAMs/gBwQ2z.png

In a write cycle, new data is stored to the addressed location and simultaneously copied to an output register in the same clock period.

Single-Port BRAM Write-First Mode (registered read address template) also infers a BRAM in write-first mode.

Single-Port RAM with Synchronous Read (Read Through) is very similar. A description of the terminology "Synchronous Read (Read Through)" is found in the older documentation, XST User Guide for Virtex-4, Virtex-5, Spartan-3 (UG627): "A true synchronous read is the synchronization mechanism available in Virtex block RAMs, where the read address is registered on the RAM clock edge."
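The two write-first descriptions mentioned in this section can be sketched as two architectures of one entity. This is written in the style of the Xilinx templates rather than copied from them: the entity name `ram_wf_sketch` and the architecture names are hypothetical, and the 64 x 16 size is chosen to match the testbench above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_wf_sketch is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        en   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        di   : in  std_logic_vector(15 downto 0);
        do   : out std_logic_vector(15 downto 0)
    );
end ram_wf_sketch;

-- Variant 1: the output register is loaded explicitly.
architecture out_reg of ram_wf_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM : ram_type := (others => (others => '0'));
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if en = '1' then
                if we = '1' then
                    RAM(to_integer(unsigned(addr))) <= di;
                    do <= di;  -- the written word goes straight to the output
                else
                    do <= RAM(to_integer(unsigned(addr)));
                end if;
            end if;
        end if;
    end process;
end out_reg;

-- Variant 2: the read address is registered instead of the data.
architecture addr_reg of ram_wf_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM       : ram_type := (others => (others => '0'));
    signal read_addr : std_logic_vector(5 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if en = '1' then
                if we = '1' then
                    RAM(to_integer(unsigned(addr))) <= di;
                end if;
                read_addr <= addr;
            end if;
        end if;
    end process;

    -- Reading through the registered address returns the word just written.
    do <= RAM(to_integer(unsigned(read_addr)));
end addr_reg;
```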

BRAM Read-First Mode

Single-Port BRAM Read-First Mode

images/Using-BRAMs/NwYQKW.png

In a write cycle, the previously stored data is "read first" and appears on the output; the newly written data appears in the following clock period, provided the address is held.
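Read-first behavior follows from reading the array unconditionally in the same clocked process as the write. The sketch below uses the hypothetical entity name `ram_rf_sketch`, with ports and size assumed to match the testbench above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_rf_sketch is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        en   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        di   : in  std_logic_vector(15 downto 0);
        do   : out std_logic_vector(15 downto 0)
    );
end ram_rf_sketch;

architecture rtl of ram_rf_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM : ram_type := (others => (others => '0'));
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if en = '1' then
                if we = '1' then
                    RAM(to_integer(unsigned(addr))) <= di;
                end if;
                -- RAM is a signal, so this read still sees the old contents
                -- on the clock edge that performs the write.
                do <= RAM(to_integer(unsigned(addr)));
            end if;
        end if;
    end process;
end rtl;
```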

BRAM No-Change Mode

Single-Port BRAM No-Change Mode

images/Using-BRAMs/0VoxxI.png

The data output register is not updated during the write cycle, so new data cannot appear until the next read occurs at that address.
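No-change behavior can be described by updating the output register only when no write takes place. As with the other sketches on this page, the entity name `ram_nc_sketch` is hypothetical, and the ports and size are assumed to match the testbench above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_nc_sketch is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        en   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        di   : in  std_logic_vector(15 downto 0);
        do   : out std_logic_vector(15 downto 0)
    );
end ram_nc_sketch;

architecture rtl of ram_nc_sketch is
    type ram_type is array (0 to 63) of std_logic_vector(15 downto 0);
    signal RAM : ram_type := (others => (others => '0'));
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if en = '1' then
                if we = '1' then
                    -- Write cycle: the output register keeps its old value.
                    RAM(to_integer(unsigned(addr))) <= di;
                else
                    -- Read cycle: the output register is updated.
                    do <= RAM(to_integer(unsigned(addr)));
                end if;
            end if;
        end if;
    end process;
end rtl;
```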
