Ramulator Wiki - i7mist/ramulator GitHub Wiki

Quick start with example simulator

Ramulator ships two example simulator instantiations in src/Main.cpp, one driven by a DRAM trace and one by a CPU trace. You can compile and run them directly, or configure them by slightly modifying these two examples. For guidance on running the examples without modifying the source code, please refer to CMU-SAFARI/ramulator.

Configure the simulator

Ramulator supports many configurable attributes, including DRAM standards and scheduling mechanisms. This section introduces how to configure Ramulator by slightly modifying src/Main.cpp.

Standalone simulation

DRAM standards

To specify the DRAM standard, specialize the channels and controllers with the correct DRAM standard name, and use appropriate parameters for the other DRAM specifications.

e.g. If you want to simulate DDR4, replace all '<DDR3>' with '<DDR4>'

Other DRAM specifications

  • Channel/Rank number

This can be configured without modifying the source code. It is implemented by instantiating the specified number of channels and inserting the specified number of ranks.

  • DRAM organization/speed/type/number of subarray/...

Specify the organization, speed, and other parameters when initializing the DRAM spec. Please refer to the constructor definition of the target standard.

e.g.

// DRAM organization, speed
DDR3* ddr3 = new DDR3(DDR3::Org::DDR3_2Gb_x8, DDR3::Speed::DDR3_1600K);
// DRAM organization, speed, type, number of subarray
SALP* salp_1 = new SALP(SALP::Org::SALP_2Gb_x8, SALP::Speed::SALP_1600K, SALP::Type::SALP_1, 8);

Controller specifications

The scheduling mechanism and row policy can be configured by changing the default type of the classes Scheduler and RowPolicy (in src/Scheduler.h).

Further explanation can be found in the next section about simulator organization.

Integrated with Gem5

Compile Gem5 integrated with Ramulator

$ hg clone http://repo.gem5.org/gem5-stable
$ cd gem5-stable
$ patch -Np1 --ignore-whitespace < /path/to/ramulator/gem5-2adc17b55ed4-ramulator.patch
$ cd ext/ramulator
$ mkdir Ramulator
$ cp -r /path/to/ramulator/src Ramulator
$ scons build/<arch>/gem5.<binary> -j8 # Compile gem5

Requirements for Ubuntu:

  • Ramulator needs to be compiled with clang, so before compiling the patched gem5, please set the environment variables CC and CXX as below:
export CC=clang
export CXX=clang++

Run Gem5 integrated with Ramulator

The gem5 command line consists of four parts, as shown below. To run it with Ramulator, add --mem-type=ramulator and --ramulator-config=<ramulator-config-filename> to the [script options] part.

$ <gem5 binary> [gem5 options] <simulation script> [script options]

Here is the command to run the Hello World example with gem5 integrated with Ramulator.

$ ./build/<arch>/gem5.opt ./configs/example/se.py --mem-type=ramulator --ramulator-config=gem5-config.cfg --cpu-type=timing --caches --cmd=tests/test-progs/hello/bin/<arch>/linux/hello

Configure Gem5 with Ramulator

A config file is used to configure Ramulator when it is integrated with gem5. Users can specify the DRAM standard, the number of channels and ranks (and subarrays for SALP), the speed parameters, and the DRAM organization.

The options for DRAM standards available in Gem5 are listed in src/Gem5Wrapper.cpp. For other options, please refer to the next part.

static map<string, function<MemoryBase *(map<string, string>&, int)> > name_to_func = {
    {"DDR3", &MemoryFactory<DDR3>::create}, {"DDR4", &MemoryFactory<DDR4>::create},
    {"LPDDR3", &MemoryFactory<LPDDR3>::create}, {"LPDDR4", &MemoryFactory<LPDDR4>::create},
    {"GDDR5", &MemoryFactory<GDDR5>::create}, 
    {"WideIO", &MemoryFactory<WideIO>::create}, {"WideIO2", &MemoryFactory<WideIO2>::create},
    {"HBM", &MemoryFactory<HBM>::create},
    {"SALP-1", &MemoryFactory<SALP>::create}, {"SALP-2", &MemoryFactory<SALP>::create}, {"SALP-MASA", &MemoryFactory<SALP>::create},
};
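The name-to-factory table above can be pictured with the following minimal sketch. It is not the code from src/Gem5Wrapper.cpp: MemoryBase is reduced to a single virtual function, DDR3Memory, DDR4Memory, Factory, and make_memory are hypothetical names, and the "standard" option key is an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, heavily reduced stand-ins for the real memory classes.
struct MemoryBase {
    virtual ~MemoryBase() = default;
    virtual std::string name() const = 0;
};
struct DDR3Memory : MemoryBase { std::string name() const override { return "DDR3"; } };
struct DDR4Memory : MemoryBase { std::string name() const override { return "DDR4"; } };

using Options = std::map<std::string, std::string>;
using Creator = std::function<std::unique_ptr<MemoryBase>(const Options&)>;

// One static create() per memory type, mirroring the shape of the table above.
template <typename M>
struct Factory {
    static std::unique_ptr<MemoryBase> create(const Options&) {
        return std::unique_ptr<MemoryBase>(new M());
    }
};

// Pick the factory by the standard name found in the options.
// The key "standard" is an assumption of this sketch.
std::unique_ptr<MemoryBase> make_memory(const Options& opts) {
    static const std::map<std::string, Creator> name_to_func = {
        {"DDR3", &Factory<DDR3Memory>::create},
        {"DDR4", &Factory<DDR4Memory>::create},
    };
    auto standard = opts.find("standard");
    if (standard == opts.end())
        throw std::invalid_argument("missing 'standard' option");
    auto it = name_to_func.find(standard->second);
    if (it == name_to_func.end())
        throw std::invalid_argument("unknown standard: " + standard->second);
    std::unique_ptr<MemoryBase> mem = it->second(opts);
    assert(mem != nullptr);  // every factory is expected to return an object
    return mem;
}
```

With this layout, supporting another standard only requires one more entry in the table; the lookup code stays unchanged.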

Option parameters for DRAM organization and speed

The options for speed parameters and organization are listed in src/T.cpp (T stands for the DRAM standard name). Taking src/DDR3.cpp as an example:

  • Organization:
map<string, enum DDR3::Org> DDR3::org_map = {
    {"DDR3_512Mb_x4", DDR3::Org::DDR3_512Mb_x4}, {"DDR3_512Mb_x8", DDR3::Org::DDR3_512Mb_x8}, {"DDR3_512Mb_x16", DDR3::Org::DDR3_512Mb_x16},
    {"DDR3_1Gb_x4", DDR3::Org::DDR3_1Gb_x4}, {"DDR3_1Gb_x8", DDR3::Org::DDR3_1Gb_x8}, {"DDR3_1Gb_x16", DDR3::Org::DDR3_1Gb_x16},
    {"DDR3_2Gb_x4", DDR3::Org::DDR3_2Gb_x4}, {"DDR3_2Gb_x8", DDR3::Org::DDR3_2Gb_x8}, {"DDR3_2Gb_x16", DDR3::Org::DDR3_2Gb_x16},
    {"DDR3_4Gb_x4", DDR3::Org::DDR3_4Gb_x4}, {"DDR3_4Gb_x8", DDR3::Org::DDR3_4Gb_x8}, {"DDR3_4Gb_x16", DDR3::Org::DDR3_4Gb_x16},
    {"DDR3_8Gb_x4", DDR3::Org::DDR3_8Gb_x4}, {"DDR3_8Gb_x8", DDR3::Org::DDR3_8Gb_x8}, {"DDR3_8Gb_x16", DDR3::Org::DDR3_8Gb_x16},
};
  • Speed:
map<string, enum DDR3::Speed> DDR3::speed_map = {
    {"DDR3_800D", DDR3::Speed::DDR3_800D}, {"DDR3_800E", DDR3::Speed::DDR3_800E},
    {"DDR3_1066E", DDR3::Speed::DDR3_1066E}, {"DDR3_1066F", DDR3::Speed::DDR3_1066F}, {"DDR3_1066G", DDR3::Speed::DDR3_1066G},
    {"DDR3_1333G", DDR3::Speed::DDR3_1333G}, {"DDR3_1333H", DDR3::Speed::DDR3_1333H},
    {"DDR3_1600H", DDR3::Speed::DDR3_1600H}, {"DDR3_1600J", DDR3::Speed::DDR3_1600J}, {"DDR3_1600K", DDR3::Speed::DDR3_1600K},
    {"DDR3_1866K", DDR3::Speed::DDR3_1866K}, {"DDR3_1866L", DDR3::Speed::DDR3_1866L},
    {"DDR3_2133L", DDR3::Speed::DDR3_2133L}, {"DDR3_2133M", DDR3::Speed::DDR3_2133M},
};
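As a rough illustration of how such string-to-enum tables can be consumed, the sketch below resolves option strings to enum values and rejects unknown names. The enums and maps are shortened copies of the ones above, and lookup is a hypothetical helper, not a Ramulator function.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Shortened, hypothetical versions of the DDR3 enums and lookup tables.
enum class Org { DDR3_2Gb_x8, DDR3_4Gb_x8 };
enum class Speed { DDR3_1600K, DDR3_1866L };

const std::map<std::string, Org> org_map = {
    {"DDR3_2Gb_x8", Org::DDR3_2Gb_x8}, {"DDR3_4Gb_x8", Org::DDR3_4Gb_x8},
};
const std::map<std::string, Speed> speed_map = {
    {"DDR3_1600K", Speed::DDR3_1600K}, {"DDR3_1866L", Speed::DDR3_1866L},
};

// Resolve an option string to its enum value; unknown names are an error.
template <typename E>
E lookup(const std::map<std::string, E>& table, const std::string& name) {
    assert(!table.empty());  // an empty table would be a programming error
    auto it = table.find(name);
    if (it == table.end())
        throw std::invalid_argument("unknown option: " + name);
    return it->second;
}
```

The names are case-sensitive in this sketch, so a misspelling such as "DDR3_2GB_x8" is reported instead of silently falling back to a default.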

Simulator Organization

This section introduces the simulator organization. It explains the function of each class and how to customize each part of a DRAM system.

Overview

[Figure: Simulator Organization]

Memory

Memory is formed by a vector of controllers, and each controller simulates the control of one channel. This is the top level used to instantiate a DRAM system. It receives requests from a frontend and dispatches them to the correct controller. Address mapping is done here: the address of a request is decomposed into a list of ids, one for each level in the DRAM hierarchy. This vector is generally named addr_vec.
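The decomposition into addr_vec can be sketched as follows. This is only one possible bit layout, assumed for illustration: the hierarchy levels are listed from channel down to column, the lowest address bits go to the last level, and decompose, slice_lower_bits, level_bits, and tx_bits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Remove the lowest `bits` bits from addr and return them.
inline int slice_lower_bits(std::uint64_t& addr, int bits) {
    assert(bits >= 0 && bits < 32);
    int value = static_cast<int>(addr & ((std::uint64_t{1} << bits) - 1));
    addr >>= bits;
    return value;
}

// Decompose an address into one id per hierarchy level.
// level_bits[i] is the bit width of level i, ordered from the top level
// (channel) to the bottom level (column). tx_bits is the width of the
// offset inside one transaction, which carries no level information.
std::vector<int> decompose(std::uint64_t addr,
                           const std::vector<int>& level_bits,
                           int tx_bits) {
    addr >>= tx_bits;
    std::vector<int> addr_vec(level_bits.size());
    // Walk from the bottom level upwards, consuming low bits first.
    for (int lev = static_cast<int>(level_bits.size()) - 1; lev >= 0; --lev)
        addr_vec[lev] = slice_lower_bits(addr, level_bits[lev]);
    return addr_vec;
}
```

Other mappings only change the order in which the levels consume bits; the resulting addr_vec is indexed by level in the same way.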

Controller

The controller combines several parts of a DRAM system: a DRAM tree (starting from the channel level), a scheduler, a rowtable that stores the metadata of activated rows, a rowpolicy that selects the rows to which speculative commands are applied, and a refresh component that schedules refresh commands separately from the scheduler used for other commands. It also stores the requests sent to this controller and separates them into three categories: read requests, write requests, and other requests (e.g. refresh).

The controller's pointer (named ctrl) is stored in all of the members mentioned above. All functions registered in those members access the structures that record the state of the system through ctrl.

The function tick simulates the next cycle: it updates the state and timing of the DRAM nodes and advances clk.

DRAM:

Ramulator models the different levels of DRAM as a hierarchy (i.e., a tree) of state machines. The DRAM class is Ramulator's generalized template for building the hierarchy, and an instance of the DRAM class is a node of the tree. The controller interacts with the tree only through the root node, the channel. All commands are served recursively, starting from the channel.

The workflow of serving a request is built from member functions of DRAM. Requests are sent to the root node (the channel) and dispatched to its children recursively.
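The recursive dispatch can be illustrated with a small tree of nodes. This is a simplified sketch rather than the DRAM class itself: Node, fanout, and visits are hypothetical names, and the only "work" a node does is count the commands that pass through it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified tree node. Level 0 is the channel (root).
struct Node {
    int level;
    int visits = 0;  // number of commands that have passed through this node
    std::vector<std::unique_ptr<Node>> children;

    // fanout[l] is the number of children of every node at level l.
    Node(int lvl, const std::vector<int>& fanout) : level(lvl) {
        if (lvl < static_cast<int>(fanout.size()))
            for (int i = 0; i < fanout[lvl]; ++i)
                children.emplace_back(new Node(lvl + 1, fanout));
    }

    // Handle a command at this node, then recurse into the child that
    // addr_vec selects for the next level.
    void update(const std::vector<int>& addr_vec) {
        ++visits;
        if (children.empty()) return;
        assert(static_cast<std::size_t>(level + 1) < addr_vec.size());
        int child = addr_vec[level + 1];
        assert(child >= 0 && static_cast<std::size_t>(child) < children.size());
        children[child]->update(addr_vec);
    }
};
```

For example, `Node channel(0, {2, 4})` builds one channel with two ranks of four banks each, and `channel.update({0, 1, 3})` touches the channel, rank 1, and bank 3 of that rank.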

DRAM specification:

These specifications are assigned from a DRAM standard instance. Their functions will be further explained later.

  • lambda (lookup table for lambda functions): Update state.

  • prereq (lookup table for lambda functions): Return prerequisite command.

  • timing (vector of T::TimingEntry): Memory timing parameters.

DRAM state:

  • state: State of this node. (e.g., Opened, Closed)

  • row_state: State of rows. (There are too many rows for them to be initialized individually, so rows are not initialized as nodes, and their bank (or an equivalent entity) tracks their state for them.)

  • next: For commands, the earliest time that it could be ready. Which is

Behavior of State-Machines

  • decode: Get prerequisite command.

  • check: Check whether this command is ready.

  • update: Update the timing/state of the tree after a command has been issued.

The memory controller relies on these three functions to serve a memory request. First, it uses decode (via get_first_cmd) to get the first command that must be issued. Then it uses check to verify whether that command is ready. If the check passes, it calls issue_cmd, which calls update to update the state and timing of each DRAM node.
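The decode, check, update cycle can be sketched for a single bank. This is a toy model under stated assumptions, not Ramulator's implementation: it has only three commands, the timing values nRCD, nRP, nRAS, and nRTP are illustrative, and serve_read is a hypothetical helper that plays the role of the controller.

```cpp
#include <algorithm>
#include <cassert>

enum class Command { ACT, PRE, RD, MAX };
enum class State { Closed, Opened };

// Illustrative timing values in cycles (assumed for this sketch).
const long nRCD = 11, nRP = 11, nRAS = 28, nRTP = 6;

inline int idx(Command c) { return static_cast<int>(c); }

struct Bank {
    State state = State::Closed;
    int open_row = -1;
    long next[static_cast<int>(Command::MAX)] = {0, 0, 0};  // earliest issue cycle per command

    // decode: which command must be issued first so that cmd can be served?
    Command decode(Command cmd, int row) const {
        if (cmd != Command::RD) return cmd;
        if (state == State::Closed) return Command::ACT;
        return open_row == row ? Command::RD : Command::PRE;
    }

    // check: may cmd be issued at cycle clk?
    bool check(Command cmd, long clk) const { return clk >= next[idx(cmd)]; }

    // update: apply the state change and timing constraints of cmd issued at clk.
    void update(Command cmd, int row, long clk) {
        switch (cmd) {
        case Command::ACT:
            state = State::Opened;
            open_row = row;
            next[idx(Command::RD)] = std::max(next[idx(Command::RD)], clk + nRCD);
            next[idx(Command::PRE)] = std::max(next[idx(Command::PRE)], clk + nRAS);
            break;
        case Command::PRE:
            state = State::Closed;
            open_row = -1;
            next[idx(Command::ACT)] = std::max(next[idx(Command::ACT)], clk + nRP);
            break;
        case Command::RD:
            next[idx(Command::PRE)] = std::max(next[idx(Command::PRE)], clk + nRTP);
            break;
        case Command::MAX:
            assert(false && "MAX is not a real command");
            break;
        }
    }
};

// Serve one read to `row`, starting at cycle clk.
// Returns the cycle in which the RD command itself is issued.
long serve_read(Bank& bank, int row, long clk) {
    for (;;) {
        Command cmd = bank.decode(Command::RD, row);
        while (!bank.check(cmd, clk)) ++clk;  // wait until the command is ready
        bank.update(cmd, row, clk);
        if (cmd == Command::RD) return clk;
        ++clk;  // at most one command per cycle in this sketch
    }
}
```

A read to a closed bank issues ACT and then RD, a read to a different open row issues PRE, ACT, and RD, and a read to the already open row issues RD directly.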

Scheduler:

The class Scheduler simulates the memory scheduler. Three scheduling mechanisms are implemented in the baseline design.

enum class Type {
    FCFS, FRFCFS, FRFCFS_Cap, MAX
} type = Type::FRFCFS;

The mechanism is specified by type. The default mechanism is FRFCFS (First Ready, First Come First Served).

list<Request>::iterator get_head(list<Request>& q)

This interface extracts the request with the highest priority under the current scheduling mechanism.

typedef list<Request>::iterator ReqIter;
function<ReqIter(ReqIter, ReqIter)> compare[int(Type::MAX)]

To customize your own scheduler, add a compare function for the new scheduling mechanism.
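The relationship between type, compare, and get_head can be sketched as below. This is a reduced model, not the code in src/Scheduler.h: Request carries only an arrival time and a precomputed ready flag (an assumption of this sketch), and only two mechanisms are shown.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>

// Hypothetical minimal request: arrival time plus a precomputed "ready" flag.
struct Request {
    long arrive;
    bool ready;
};

using ReqIter = std::list<Request>::iterator;

enum class Type { FCFS, FRFCFS, MAX };

struct Scheduler {
    Type type = Type::FRFCFS;

    // One compare function per mechanism; each returns the preferred request.
    std::function<ReqIter(ReqIter, ReqIter)> compare[static_cast<int>(Type::MAX)] = {
        // FCFS: the older request wins.
        [](ReqIter a, ReqIter b) -> ReqIter {
            return a->arrive <= b->arrive ? a : b;
        },
        // FRFCFS: a ready request beats a non-ready one; ties go to the older.
        [](ReqIter a, ReqIter b) -> ReqIter {
            if (a->ready != b->ready) return a->ready ? a : b;
            return a->arrive <= b->arrive ? a : b;
        },
    };

    // Fold the active compare function over the queue to find the head.
    ReqIter get_head(std::list<Request>& q) {
        if (q.empty()) return q.end();
        const auto& better = compare[static_cast<int>(type)];
        assert(better && "no compare function for this mechanism");
        ReqIter head = q.begin();
        for (ReqIter it = std::next(q.begin()); it != q.end(); ++it)
            head = better(head, it);
        return head;
    }
};
```

In this layout a new mechanism needs one extra enumerator before MAX and one extra lambda in the same position of compare; get_head does not change.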

RowPolicy:

In general, this class is used to get activated rows in the DRAM that meet a certain condition. When the controller cannot find a command to schedule, it does something speculative (e.g. precharging a row). RowPolicy decides which row the speculative command should be applied to. It includes functions to get one row under a certain condition from the rowtable.

enum class Type {
    Closed, Opened, Timeout, MAX
} type = Type::Opened;

Three types of row policy are supported. The selected type is stored in the member variable type. The default value is Type::Opened.

  • Closed: Return the first row in the rowtable that is ready for the command.
  • Opened: Return an empty indexing vector. (This is the default value, which causes no row to be precharged.)
  • Timeout: Return the first row in the rowtable that is ready for the command and has timed out.
int timeout = 50;

Specify the timeout length. (For Type::Timeout)

vector<int> get_victim(typename T::Command cmd)

Get the victim row for command cmd.

function<vector<int>(typename T::Command)> policy[int(Type::MAX)]

policy is a lambda table that handles the different row policies. To add a row policy, add a new type to enum class Type and add the corresponding handler to policy.
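The three policies can be sketched with a flattened row table. This is a simplified model and differs from the real interface: OpenRow is a hypothetical record whose ready flag is precomputed, and get_victim here takes the table and the current cycle as arguments instead of a command.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical flattened view of one rowtable entry.
struct OpenRow {
    std::vector<int> addr_vec;  // index of the activated row
    long timestamp;             // cycle of the last access to the row
    bool ready;                 // whether the speculative command could issue now
};

enum class Type { Closed, Opened, Timeout, MAX };

struct RowPolicy {
    using Table = std::vector<OpenRow>;
    using Handler = std::function<std::vector<int>(const Table&, long)>;

    Type type = Type::Opened;
    long timeout = 50;
    Handler policy[static_cast<int>(Type::MAX)];

    RowPolicy() {
        // Closed: first row that is ready for the command.
        policy[static_cast<int>(Type::Closed)] =
            [](const Table& table, long) -> std::vector<int> {
                for (const OpenRow& r : table)
                    if (r.ready) return r.addr_vec;
                return {};
            };
        // Opened: never pick a victim, so rows stay open.
        policy[static_cast<int>(Type::Opened)] =
            [](const Table&, long) -> std::vector<int> { return {}; };
        // Timeout: first ready row that has been idle for at least `timeout`.
        policy[static_cast<int>(Type::Timeout)] =
            [this](const Table& table, long clk) -> std::vector<int> {
                for (const OpenRow& r : table)
                    if (r.ready && clk - r.timestamp >= timeout) return r.addr_vec;
                return {};
            };
    }
    RowPolicy(const RowPolicy&) = delete;  // the Timeout handler captures `this`

    std::vector<int> get_victim(const Table& table, long clk) const {
        const Handler& handler = policy[static_cast<int>(type)];
        assert(handler && "no handler registered for this policy type");
        return handler(table, clk);
    }
};
```

An empty result means that no speculative command should be issued in this cycle.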

RowTable:

RowTable maintains the currently activated rows and their metadata. The metadata is kept in table using the following structure:

struct Entry {
    int row;
    int hits;
    long timestamp;
};

Member functions:

void update(typename T::Command cmd, const vector<int>& addr_vec, long clk)
  • Update the table according to Command cmd. addr_vec is the index of the target row, and clk is the time at which the command is served, which is used to update timestamp. The type of cmd is given by spec in ctrl.
int get_hits(vector<int>& addr_vec)
  • Return the number of hits of a row. addr_vec is the index that specifies a row in the DRAM tree.
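The bookkeeping behind update and get_hits can be sketched as follows. This is a minimal model under assumptions, not the actual RowTable: addr_vec is taken to be {channel, rank, bank, row}, the command set is fixed instead of coming from the spec, and the rule that ACT resets hits while RD and WR increment them is chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

enum class Command { ACT, PRE, RD, WR };

struct RowTable {
    struct Entry {
        int row;
        int hits;
        long timestamp;
    };

    // Key: addr_vec without its last element, i.e. the bank that owns the row.
    std::map<std::vector<int>, Entry> table;

    // Assumed layout of addr_vec in this sketch: {channel, rank, bank, row}.
    void update(Command cmd, const std::vector<int>& addr_vec, long clk) {
        assert(!addr_vec.empty());
        std::vector<int> bank(addr_vec.begin(), addr_vec.end() - 1);
        int row = addr_vec.back();
        switch (cmd) {
        case Command::ACT:  // a newly opened row starts with zero hits
            table[bank] = Entry{row, 0, clk};
            break;
        case Command::RD:
        case Command::WR: {  // an access to the open row counts as a hit
            auto it = table.find(bank);
            if (it != table.end() && it->second.row == row) {
                ++it->second.hits;
                it->second.timestamp = clk;
            }
            break;
        }
        case Command::PRE:  // closing the row drops its metadata
            table.erase(bank);
            break;
        }
    }

    // Hits of the row named by addr_vec, or 0 if that row is not open.
    int get_hits(const std::vector<int>& addr_vec) const {
        assert(!addr_vec.empty());
        std::vector<int> bank(addr_vec.begin(), addr_vec.end() - 1);
        auto it = table.find(bank);
        if (it == table.end() || it->second.row != addr_vec.back()) return 0;
        return it->second.hits;
    }
};
```

The timestamp stored on each access is what a timeout-based row policy would compare against the current cycle.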

Refresh:

This is a refresh scheduler. Currently it supports all-bank refresh and per-bank refresh.

Standard Specification

Ramulator extracts the full specification of a standard's hierarchy and behavior and consolidates it into a single class. The following instructions take DDR3.h/cpp as an example.

In class DDR3, DRAM organization, speed, and timing parameters are specified in org_map, speed_map and timing respectively.

// in DDR3.h
struct TimingEntry
{
    Command cmd;
    int dist; // The history buffer size for this timing parameter
    int val; // The value of the timing parameter
    bool sibling; // Whether the timing parameter is used by target node or its sibling nodes
};
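One way to read the fields of TimingEntry is sketched below: after a command is issued, each entry pushes back the earliest time of a later command, measured from the dist-th most recent issue of the current command. This is an interpretation written for illustration, not the code in DRAM.h; Node, prev, and update_timing are hypothetical here, sibling entries are skipped, and the history is never trimmed.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

enum class Command { ACT, PRE, RD, MAX };

struct TimingEntry {
    Command cmd;   // the later command that is constrained
    int dist;      // which past issue counts: 1 = most recent, 2 = the one before, ...
    int val;       // minimum distance in cycles
    bool sibling;  // constraint applies to sibling nodes (ignored in this sketch)
};

struct Node {
    long next[static_cast<int>(Command::MAX)] = {};              // earliest issue cycle per command
    std::deque<long> prev[static_cast<int>(Command::MAX)];       // issue history, newest first
    std::vector<TimingEntry> timing[static_cast<int>(Command::MAX)];

    // Record that `issued` happened at clk and push back every command
    // that the entries in timing[issued] constrain.
    void update_timing(Command issued, long clk) {
        assert(issued != Command::MAX);
        std::deque<long>& hist = prev[static_cast<int>(issued)];
        hist.push_front(clk);
        for (const TimingEntry& t : timing[static_cast<int>(issued)]) {
            if (t.sibling) continue;                                  // same-node entries only
            if (static_cast<int>(hist.size()) < t.dist) continue;     // not enough history yet
            long past = hist[t.dist - 1];
            long& n = next[static_cast<int>(t.cmd)];
            n = std::max(n, past + t.val);
        }
    }
};
```

With dist set to 1 the entry is an ordinary command-to-command delay, while a larger dist expresses a window constraint such as "at most four activations within a given number of cycles".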

Lookup table lambda contains lambda functions for updating the state after a command in a certain level.

/* Lambda */
// in DDR3.h
function<void(DRAM<DDR3>*, int)> lambda[int(Level::MAX)][int(Command::MAX)];
// in DDR3.cpp
lambda[int(Level::Bank)][int(Command::ACT)] = [] (DRAM<DDR3>* node, int id) {
    node->state = State::Opened;
    node->row_state[id] = State::Opened;};

Lookup table prereq contains lambda functions to get prerequisite commands for a command in a certain level.

// in DDR3.h
function<Command(DRAM<DDR3>*, Command cmd, int)> prereq[int(Level::MAX)][int(Command::MAX)];
// in DDR3.cpp
prereq[int(Level::Rank)][int(Command::RD)] = [] (DRAM<DDR3>* node, Command cmd, int id) {
    switch (int(node->state)) {
        case int(State::PowerUp): return Command::MAX;
        case int(State::ActPowerDown): return Command::PDX;
        case int(State::PrePowerDown): return Command::PDX;
        case int(State::SelfRefresh): return Command::SRX;
        default: assert(false);
    }};

state specifies the current state of this node.
