Example: I2S Audio - JulianKemmerer/PipelineC Wiki

Original URL: https://github.com/JulianKemmerer/PipelineC/wiki/Example:-I2S-Audio

This page describes using an Arty board and a PMOD I2S DAC+ADC adapter with standard 1/8 in (3.5mm) stereo audio jacks to do basic audio signal processing for distortion and digital delay effects.

This example is from a series of examples designed for the Arty Board. See that page for instructions on using the Arty board with PipelineC generated files.

Setup

Digilent provides reference files: Here is the .xdc file describing the PMOD ports for the I2S adapter. The top level definition of the PMOD port in pmod.c is mostly unchanging. From those top level ports, i2s_pmod.c maps the PMOD pins to I2S signals.

Following Digilent's instructions, the first step was to confirm that a basic I2S passthrough example worked. The code for the I2S passthrough, including top level ports etc., can be seen here; a snippet follows:

#define SCLK_PERIOD_MCLKS 8
#define LR_PERIOD_SCLKS 64
#pragma MAIN_MHZ app 22.579
void app()
{
  // Registers
  static uint3_t sclk_counter;
  static uint1_t sclk;
  static uint6_t lr_counter;
  static uint1_t lr;
  
  // Read the incoming I2S signals
  i2s_to_app_t from_i2s = read_i2s_pmod();
  
  // Outgoing I2S signals
  app_to_i2s_t to_i2s;
  
  // Basic loopback:
  // Only input is data, connected to output data
  to_i2s.tx_data = from_i2s.rx_data;
  // Outputs clks from registers
  to_i2s.tx_sclk = sclk;
  to_i2s.rx_sclk = sclk;
  to_i2s.tx_lrck = lr;
  to_i2s.rx_lrck = lr;
  
  // Drive I2S clocking derived from current MCLK domain
  
  // SCLK toggling at half period count
  uint1_t sclk_half_toggle = sclk_counter==((SCLK_PERIOD_MCLKS/2)-1);
  // 0->1 SCLK once per period rising edge
  uint1_t sclk_period_toggle = sclk_half_toggle & (sclk==0); 
  if(sclk_half_toggle)
  {
    // Do toggle and reset counter
    sclk = !sclk;
    sclk_counter = 0;
  }
  else
  {
    // No toggle yet, keep counting
    sclk_counter += 1;
  }
  
  // LR toggling happens per SCLK period 
  if(sclk_period_toggle)
  {
    // LR toggling at half period count
    if(lr_counter==((LR_PERIOD_SCLKS/2)-1))
    {
      // Do toggle and reset counter
      lr = !lr;
      lr_counter = 0;
    }
    else
    {
      // No toggle yet, keep counting
      lr_counter += 1;
    }
  }
  
  // Drive the outgoing I2S signals
  write_i2s_pmod(to_i2s);
}
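
As a sanity check on the divider constants, the same counter logic can be modelled in plain C on a desktop. This is a minimal sketch and not part of the PipelineC sources: `clk_model_t`, `clk_model_step`, `count_lr_periods` and `sample_rate_hz` are hypothetical names, and the model only counts edges. With SCLK_PERIOD_MCLKS=8 and LR_PERIOD_SCLKS=64, one LR period is 8*64 = 512 MCLK cycles.

```c
#include <assert.h>
#include <stdint.h>

#define SCLK_PERIOD_MCLKS 8
#define LR_PERIOD_SCLKS 64

// Software copy of the four registers in app() (hypothetical struct name)
typedef struct clk_model_t
{
  uint8_t sclk_counter;
  uint8_t sclk;
  uint8_t lr_counter;
  uint8_t lr;
} clk_model_t;

// Advance the model by one MCLK cycle, mirroring the counter logic above.
// Returns 1 on the cycle where LR goes 0->1 (once per stereo sample period).
int clk_model_step(clk_model_t* m)
{
  // Half period counting assumes even period constants
  assert((SCLK_PERIOD_MCLKS % 2 == 0) && (LR_PERIOD_SCLKS % 2 == 0));
  int lr_rose = 0;
  int sclk_half_toggle = (m->sclk_counter == (SCLK_PERIOD_MCLKS/2) - 1);
  // SCLK rising edge: toggling while SCLK is currently 0
  int sclk_period_toggle = sclk_half_toggle && (m->sclk == 0);
  if(sclk_half_toggle)
  {
    m->sclk = !m->sclk;
    m->sclk_counter = 0;
  }
  else
  {
    m->sclk_counter += 1;
  }
  if(sclk_period_toggle)
  {
    if(m->lr_counter == (LR_PERIOD_SCLKS/2) - 1)
    {
      m->lr = !m->lr;
      m->lr_counter = 0;
      lr_rose = (m->lr == 1);
    }
    else
    {
      m->lr_counter += 1;
    }
  }
  return lr_rose;
}

// Count LR rising edges seen in the first mclk_cycles cycles after reset
uint32_t count_lr_periods(uint32_t mclk_cycles)
{
  clk_model_t m = {0, 0, 0, 0};
  uint32_t n = 0;
  for(uint32_t i = 0; i < mclk_cycles; i++)
  {
    n += (uint32_t)clk_model_step(&m);
  }
  return n;
}

// Sample rate in Hz (integer division) for a given MCLK frequency in Hz
uint32_t sample_rate_hz(uint32_t mclk_hz)
{
  return mclk_hz / (SCLK_PERIOD_MCLKS * LR_PERIOD_SCLKS);
}
```

For example, `sample_rate_hz(22579000)` evaluates to 44099, i.e. the 22.579 MHz MCLK gives a sample rate of roughly 44.1 kHz, and `count_lr_periods(5120)` counts 10 LR periods.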

I2S Stereo Samples Stream

Wrapping the above basic clocking logic for I2S receive and transmit, the i2s_mac.c module exposes an AXIS-like streaming interface for RX and TX.

#define sample_t q0_23_t

// I2S stereo sample types
typedef struct i2s_samples_t
{
  sample_t l_data;
  sample_t r_data;
}i2s_samples_t;
// _s 'stream' of the above data w/ valid flag
typedef struct i2s_samples_s
{
  i2s_samples_t samples;
  uint1_t valid;
}i2s_samples_s;

// RX function def w/ flow control
typedef struct i2s_rx_t
{
  i2s_samples_s samples;
  uint1_t overflow;
}i2s_rx_t;
i2s_rx_t i2s_rx(uint1_t data, uint1_t lr, uint1_t sclk_rising_edge, uint1_t samples_ready, uint1_t reset_n);

// TX function def w/ flow control
typedef struct i2s_tx_t
{
  uint1_t samples_ready;
  uint1_t data;
}i2s_tx_t;
i2s_tx_t i2s_tx(i2s_samples_s samples, uint1_t lr, uint1_t sclk_falling_edge, uint1_t reset_n);

// Single MAC module with both RX and TX
typedef struct i2s_mac_t
{
  i2s_rx_t rx;
  i2s_tx_t tx;
}i2s_mac_t;
i2s_mac_t i2s_mac(uint1_t reset_n, uint1_t rx_samples_ready, i2s_samples_s tx_samples);

i2s_mac_passthrough_app.c shows implementing an audio passthrough using this i2s_mac module.
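
The valid/ready flow control in the declarations above can be illustrated with a small desktop C model. This is only a sketch of the general handshake idea under stated assumptions: it does not claim to match the internals of i2s_mac.c. The one-entry holding register, the rule that a new sample is dropped when the register is still occupied, and all names ending in `_model` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Plain C stand-ins for the stream types (int32_t holds a 24-bit sample)
typedef struct samples_model_t
{
  int32_t l_data;
  int32_t r_data;
} samples_model_t;
typedef struct stream_model_t
{
  samples_model_t samples;
  bool valid;
} stream_model_t;

// Hypothetical one-entry RX holding register with sticky overflow flag
typedef struct rx_model_t
{
  stream_model_t held;
  bool overflow;
} rx_model_t;

// One clock step.
// 'incoming' is a newly received sample pair (valid=1 when present),
// 'ready' is the downstream samples_ready flag.
// Returns what the downstream sees on this cycle.
stream_model_t rx_model_step(rx_model_t* rx, stream_model_t incoming, bool ready)
{
  assert(rx != 0);
  stream_model_t out = rx->held;
  // A transfer happens only when valid and ready are both high
  if(rx->held.valid && ready)
  {
    rx->held.valid = false;
  }
  if(incoming.valid)
  {
    if(rx->held.valid)
    {
      // Previous sample not yet accepted: assumed to drop the new one
      rx->overflow = true;
    }
    else
    {
      rx->held = incoming;
    }
  }
  return out;
}
```

In this model a downstream that always signals ready, as the final app does with rx_samples_ready = 1, never triggers the overflow flag.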

Additionally, to aid in making user code easier to write + autopipeline, i2s_mac.c exposes an i2s_mac instance wired to globally visible wires as defined below:

// RX
typedef struct i2s_mac_rx_to_app_t
{
  i2s_samples_s samples;
  uint1_t overflow; 
}i2s_mac_rx_to_app_t;
typedef struct app_to_i2s_mac_rx_t
{
  uint1_t samples_ready;
  uint1_t reset_n;
}app_to_i2s_mac_rx_t;
// Globally visible ports/wires
i2s_mac_rx_to_app_t i2s_mac_rx_to_app;
app_to_i2s_mac_rx_t app_to_i2s_mac_rx;

// TX
typedef struct i2s_mac_tx_to_app_t
{
  uint1_t samples_ready;
}i2s_mac_tx_to_app_t;
typedef struct app_to_i2s_mac_tx_t
{
  i2s_samples_s samples;
  uint1_t reset_n;
}app_to_i2s_mac_tx_t;
// Globally visible ports/wires
i2s_mac_tx_to_app_t i2s_mac_tx_to_app;
app_to_i2s_mac_tx_t app_to_i2s_mac_tx;

The final i2s_app.c file reads+writes these globally visible wires.

Digital Delay Effect

The delay.c file contains the logic to implement a half second digital delay. A half-second-deep FIFO of samples is filled first, which delays the samples. From then on the delayed stream is read continuously, and the FIFO output is combined with the current passthrough samples, creating a single slapback echo.

i2s_samples_s delay(uint1_t reset_n, i2s_samples_s in_samples)
{
  // Passthrough samples by default
  i2s_samples_s out_samples = in_samples; 
  
  // Buffer up rx_samples into FIFO
  static uint1_t buffer_reached_full;
  i2s_samples_t fifo_data_in = in_samples.samples;
  // Data written into FIFO as passing through
  uint1_t fifo_wr = in_samples.valid; 
  // Read from fifo as passing through, and after delay reaching full buffer
  uint1_t fifo_rd = in_samples.valid & buffer_reached_full;
  samples_fifo_t fifo = samples_fifo(fifo_rd, fifo_data_in, fifo_wr);
  
  // Combine FIFO output delayed samples with current samples
  // if enough samples buffered up / delayed
  if(fifo.count >= DELAY_SAMPLES)
  {
    buffer_reached_full = 1;
  }
  if(fifo_rd & fifo.data_out_valid)
  {
    out_samples.samples.l_data = q0_23_add(out_samples.samples.l_data, fifo.data_out.l_data);
    out_samples.samples.r_data = q0_23_add(out_samples.samples.r_data, fifo.data_out.r_data);
  }
  
  if(!reset_n)
  {
    buffer_reached_full = 0;
  }
    
  return out_samples;
}
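
The behaviour of the delay effect can be sketched as a mono software model in plain C. Everything here is an assumption for illustration: `delay_model_t`, `delay_model_step` and `sat24_add` are hypothetical names, a ring buffer stands in for samples_fifo, and the add saturates at the 24-bit limits, which may or may not be what q0_23_add does. The exact cycle at which the hardware FIFO starts draining may also differ by a sample.

```c
#include <assert.h>
#include <stdint.h>

#define Q23_MAX 8388607    // 2^23 - 1
#define Q23_MIN (-8388608) // -2^23
#define MODEL_MAX_DELAY 32768

// Hypothetical saturating add of two 24-bit signed values
int32_t sat24_add(int32_t a, int32_t b)
{
  int32_t sum = a + b; // cannot overflow int32_t for 24-bit inputs
  if(sum > Q23_MAX) sum = Q23_MAX;
  if(sum < Q23_MIN) sum = Q23_MIN;
  return sum;
}

typedef struct delay_model_t
{
  int32_t buf[MODEL_MAX_DELAY];
  uint32_t delay_samples; // delay length in samples
  uint32_t wr_idx;
  uint32_t count;
} delay_model_t;

void delay_model_init(delay_model_t* d, uint32_t delay_samples)
{
  assert(delay_samples > 0 && delay_samples <= MODEL_MAX_DELAY);
  d->delay_samples = delay_samples;
  d->wr_idx = 0;
  d->count = 0;
}

// Process one mono sample
int32_t delay_model_step(delay_model_t* d, int32_t in)
{
  int32_t out = in; // passthrough by default
  if(d->count == d->delay_samples)
  {
    // Buffer full: wr_idx points at the sample written delay_samples steps ago
    out = sat24_add(in, d->buf[d->wr_idx]);
  }
  else
  {
    // Still filling: no delayed sample to mix in yet
    d->count += 1;
  }
  d->buf[d->wr_idx] = in;
  d->wr_idx = (d->wr_idx + 1) % d->delay_samples;
  return out;
}
```

At a sample rate of roughly 44.1 kHz, half a second corresponds to about 22050 samples, so `delay_model_init(&d, 22050)` would approximate the effect described above. With a delay of 3, an input of 100 followed by zeros comes back as an echo three samples later.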

Distortion Effect

Per kind internet folk of the past, sgn(x)*(1-e^(-G*|x|)) was suggested as a distortion function to implement (with gain G=15.0). A lookup table of 256 function points, followed by linear interpolation between those LUT points, was judged sufficient to approximate the function. The file interp_lut_gen.py is used to generate most of distortion.c, which uses Q0.23 fixed point numbers. A snippet:

q0_23_t distortion_mono(q0_23_t x)
{
  // Get lookup addr from top bits of value
  uint8_t lut_addr = int24_23_16(x.qmn);
  // And interpolation bits from lsbs
  uint16_t interp_point = int24_15_0(x.qmn);

  // Generated lookup values:
  q0_23_t Y_VALUES[256];
  Y_VALUES[0].qmn = 0x0;
  Y_VALUES[1].qmn = 0xe2789;
  ...
  // M Scaled down by 2^4
  q0_23_t M_VALUES[256];
  M_VALUES[0].qmn = 0x713c4c;
  M_VALUES[1].qmn = 0x64b6ba;
  ...

  // Do lookup
  q0_23_t y = Y_VALUES[lut_addr];
  q0_23_t m = M_VALUES[lut_addr];

  // Do linear interp, dy = dx * m
  // Not using fixed point mult funcs since
  // need intermediates to do different scaling than normal
  q0_23_t dxi; // Fractional bits of input x
  dxi.qmn = interp_point;
  int48_t temp = dxi.qmn * m.qmn;
  // NOTE: temp_rounded is computed here but the unrounded temp is what gets shifted below
  int48_t temp_rounded = temp + (1 << (23 - 1));
  // Shift right by 23 for normal Q mult, then shift left by 4 to account for slope scaling
  q0_23_t dy;
  dy.qmn = temp >> 19;
  // Interpolate
  q0_23_t yi = q0_23_add(y, dy);
  return yi;
}
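
The scaling in the interpolation step can be checked with ordinary integer arithmetic. The following is a minimal sketch in standard C, with `interp_model` as a hypothetical name. It uses a plain add where the snippet uses q0_23_add, and it relies on `>>` being an arithmetic shift for negative values, which is implementation-defined in ISO C but the norm on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

// y:        Q0.23 LUT value for the segment
// m_scaled: segment slope, already scaled down by 2^4 as in M_VALUES
// dx:       low 16 bits of the input sample (position within the segment)
int32_t interp_model(int32_t y, int32_t m_scaled, uint16_t dx)
{
  // Both LUT entries are expected to fit in 24 signed bits
  assert(m_scaled >= -8388608 && m_scaled <= 8388607);
  assert(y >= -8388608 && y <= 8388607);
  // Full width product, 16 bits * 24 bits fits easily in 64 bits
  int64_t temp = (int64_t)dx * (int64_t)m_scaled;
  // >> 23 for a normal Q0.23 multiply, << 4 to undo the slope scaling: net >> 19
  int32_t dy = (int32_t)(temp >> 19);
  return y + dy;
}
```

Using the first table entries shown above, `interp_model(0x0, 0x713c4c, 0x8000)` gives 463812, the midpoint of segment 0. A full 2^16 step would add 0x713c4c >> 3 = 0xe2789, which matches Y_VALUES[1], so the slope scaling and the shift by 19 are consistent with each other.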

Automatically Pipelined Audio Effects

By putting the stateful overflow flag in a separate, isolated app_status function (not shown below), the remaining app and effects_chain logic can be arbitrarily pipelined to meet timing.

// Autopipelineable stateless audio stream effects processing pipeline
i2s_samples_s effects_chain(uint1_t reset_n, i2s_samples_s in_samples)
{
  // Delay effect
  i2s_samples_s samples_w_delay = delay(reset_n, in_samples);
  
  // Distortion effect
  i2s_samples_s samples_w_distortion = distortion(samples_w_delay); //in_samples);
  
  // "Volume effect", cut effects volume in half w/ switches
  i2s_samples_s samples_w_effects = samples_w_distortion;
  uint4_t sw;
  WIRE_READ(uint4_t, sw, switches)
  if(uint4_1_1(sw))
  {
    samples_w_effects.samples.l_data.qmn = samples_w_effects.samples.l_data.qmn >> 1;
    samples_w_effects.samples.r_data.qmn = samples_w_effects.samples.r_data.qmn >> 1;
  }
  if(uint4_2_2(sw))
  {
    samples_w_effects.samples.l_data.qmn = samples_w_effects.samples.l_data.qmn >> 1;
    samples_w_effects.samples.r_data.qmn = samples_w_effects.samples.r_data.qmn >> 1;
  }
  if(uint4_3_3(sw))
  {
    samples_w_effects.samples.l_data.qmn = samples_w_effects.samples.l_data.qmn >> 1;
    samples_w_effects.samples.r_data.qmn = samples_w_effects.samples.r_data.qmn >> 1;
  }
  
  // Use switch0 to control, 1=effects on
  // Connect output
  i2s_samples_s out_samples;
  if(uint4_0_0(sw))
  {
    out_samples = samples_w_effects;
  }
  else
  {
    out_samples = in_samples;
  }
  
  return out_samples;
}

// Send audio through an effects chain
#pragma MAIN app
void app(uint1_t reset_n)
{
  // Read wires from I2S mac
  i2s_mac_rx_to_app_t from_rx_mac;
  WIRE_READ(i2s_mac_rx_to_app_t, from_rx_mac, i2s_mac_rx_to_app)
  i2s_mac_tx_to_app_t from_tx_mac;
  WIRE_READ(i2s_mac_tx_to_app_t, from_tx_mac, i2s_mac_tx_to_app)
  
  // Received samples
  i2s_samples_s rx_samples = from_rx_mac.samples;
  // Signal always ready draining through effects chain, checking overflow in status
  uint1_t rx_samples_ready = 1;
  
  // Send through effects chain
  i2s_samples_s samples_w_effects = effects_chain(reset_n, rx_samples);
  
  // Samples to transmit
  i2s_samples_s tx_samples = samples_w_effects;
  
  // Write wires to I2S mac
  app_to_i2s_mac_tx_t to_tx_mac;
  to_tx_mac.samples = tx_samples;
  to_tx_mac.reset_n = reset_n;
  WIRE_WRITE(app_to_i2s_mac_tx_t, app_to_i2s_mac_tx, to_tx_mac)
  app_to_i2s_mac_rx_t to_rx_mac;
  to_rx_mac.samples_ready = rx_samples_ready;
  to_rx_mac.reset_n = reset_n;
  WIRE_WRITE(app_to_i2s_mac_rx_t, app_to_i2s_mac_rx, to_rx_mac)

  // Control+status logic is stateful (ex. overflow bit)
  // and is kept separate from this stateless autopipelineable function
  app_status(reset_n, to_tx_mac.samples.valid, from_tx_mac.samples_ready);
}
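
The switch handling in effects_chain amounts to one enable bit plus up to three halvings of the effects volume. A minimal sketch of that mapping in standard C follows; `volume_model` is a hypothetical name, a single int32_t stands in for one channel's 24-bit sample, and right shifting negative values is implementation-defined in ISO C (arithmetic on mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>

// dry: unprocessed input sample, wet: sample after delay + distortion
// sw:  4 switch bits. Bit 0 enables effects, bits 1..3 each halve the volume.
int32_t volume_model(int32_t dry, int32_t wet, uint8_t sw)
{
  assert(sw < 16);
  // Switch 0 off: bypass the effects chain entirely
  if((sw & 1) == 0)
  {
    return dry;
  }
  // Each of switches 1..3 that is on contributes one shift right by 1
  int shift = ((sw >> 1) & 1) + ((sw >> 2) & 1) + ((sw >> 3) & 1);
  return wet >> shift;
}
```

With all four switches on, the effects output is scaled by 1/8, for example `volume_model(1000, 800, 0xF)` returns 100. The volume switches only affect the effects path, so with switch 0 off the input passes through at full level.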

Top level instantiation of the PipelineC entity inside VHDL file board.vhd:

-- The PipelineC generated entity
top_inst : entity work.top port map (   
    -- Main function clocks
    clk_22p579 => clk_22p579,
            
    -- Each main function's inputs and outputs
    app_reset_n(0) => i2s_rst_n,
    
    -- LEDs
    led0_module_return_output(0) => leds_wire(0),
    led1_module_return_output(0) => leds_wire(1),
    led2_module_return_output(0) => leds_wire(2),
    led3_module_return_output(0) => leds_wire(3),
    
    -- Switches
    switches_module_sw => switches_wire,

    -- PMOD
    --pmod_ja_return_output.ja0(0) => ja(0),
    pmod_ja_return_output.ja1(0) => ja(1),
    pmod_ja_return_output.ja2(0) => ja(2),
    pmod_ja_return_output.ja3(0) => ja(3),
    --pmod_ja_return_output.ja4(0) => ja(4),
    pmod_ja_return_output.ja5(0) => ja(5),
    pmod_ja_return_output.ja6(0) => ja(6),
    pmod_ja_inputs.ja7(0) => ja(7)
);

As written, targeting the Artix 7 device on the Arty board, the PipelineC tool reports the following:

██████╗ ██╗██████╗ ███████╗██╗     ██╗███╗   ██╗███████╗ ██████╗
██╔══██╗██║██╔══██╗██╔════╝██║     ██║████╗  ██║██╔════╝██╔════╝
██████╔╝██║██████╔╝█████╗  ██║     ██║██╔██╗ ██║█████╗  ██║     
██╔═══╝ ██║██╔═══╝ ██╔══╝  ██║     ██║██║╚██╗██║██╔══╝  ██║     
██║     ██║██║     ███████╗███████╗██║██║ ╚████║███████╗╚██████╗
╚═╝     ╚═╝╚═╝     ╚══════╝╚══════╝╚═╝╚═╝  ╚═══╝╚══════╝ ╚═════╝

Output directory: /home/julian/pipelinec_syn_output
================== Parsing C Code to Logical Hierarchy ================================
Parsing: /media/1TB/Dropbox/PipelineC/git/PipelineC/main.c
================== Writing Resulting Logic to File ================================
Building map of combinatorial logic...
Using VIVADO synthesizing for part: xc7a35ticsg324-1l
Writing VHDL files for all functions (as combinatorial logic)...
Writing multi main top level files...
Writing the constant struct+enum definitions as defined from C code...
Writing clock cross definitions as parsed from C code...
Writing finalized comb. logic synthesis tool files...
Output VHDL files: /home/julian/pipelinec_syn_output/read_vhdl.tcl
================== Adding Timing Information from Synthesis Tool ================================
Synthesizing as combinatorial logic to get total logic delay...
...
app_status Path delay (ns): 1.558 = 641.8485237483953 MHz
i2s_rx Path delay (ns): 3.019 = 331.23550844650543 MHz
i2s_tx Path delay (ns): 2.947 = 339.328130302002 MHz
i2s_mac_ports Path delay (ns): 3.581 = 279.2516056967328 MHz
samples_fifo Path delay (ns): 8.248000000000001 = 121.2415130940834 MHz
delay Path delay (ns): 8.248000000000001 = 121.2415130940834 MHz
distortion Path delay (ns): 9.811 = 101.92640913260625 MHz
effects_chain Path delay (ns): 14.335 = 69.75933031042902 MHz
app Path delay (ns): 14.335 = 69.75933031042902 MHz
================== Beginning Throughput Sweep ================================
Function: led0_module Target MHz: 22.579
Function: led1_module Target MHz: 22.579
Function: led2_module Target MHz: 22.579
Function: led3_module Target MHz: 22.579
Function: leds_module Target MHz: 22.579
Function: switches_module Target MHz: 22.579
Function: pmod_ja Target MHz: 22.579
Function: i2s_mac_ports Target MHz: 22.579
Function: app Target MHz: 22.579
Setting all instances to comb. logic to start...
Starting with blank sweep state...
Starting middle out sweep...
Starting from zero clk timing params...
Collecting modules to pipeline...
Pipelining modules...
Updating output files...
Running syn w timing params...
led0_module : 0 clocks latency...
led1_module : 0 clocks latency...
led2_module : 0 clocks latency...
led3_module : 0 clocks latency...
leds_module : 0 clocks latency...
switches_module : 0 clocks latency...
pmod_ja : 0 clocks latency...
i2s_mac_ports : 0 clocks latency...
app : 0 clocks latency...
Running: /media/1TB/Programs/Linux/Xilinx/Vivado/2019.2/bin/vivado ...
switches_module Clock Goal: 22.58 (MHz) Current: 69.15 (MHz)(14.46 ns) 0 clks
app Clock Goal: 22.58 (MHz) Current: 69.15 (MHz)(14.46 ns) 0 clks
i2s_mac_ports Clock Goal: 22.58 (MHz) Current: 69.15 (MHz)(14.46 ns) 0 clks
Met timing...
================== Writing Results of Throughput Sweep ================================
Output VHDL files: /home/julian/pipelinec_syn_output/read_vhdl.tcl
Done.

Note the "0 clocks latency" lines that appear several times above. They indicate that no modules required pipelining (0 clocks of latency added). The clock goal of ~22.58MHz is an easy target to meet.
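
The path delay and MHz figures in the log are related by MHz = 1000 / ns. The following small C sketch shows that conversion and the resulting timing slack; the function names are hypothetical and are not part of the PipelineC tool.

```c
#include <assert.h>

// Convert a path delay in ns to the maximum clock rate in MHz
double path_delay_to_mhz(double path_delay_ns)
{
  assert(path_delay_ns > 0.0);
  return 1000.0 / path_delay_ns;
}

// Timing slack in ns for a target clock: positive means timing is met
double slack_ns(double path_delay_ns, double target_mhz)
{
  assert(target_mhz > 0.0);
  double period_ns = 1000.0 / target_mhz;
  return period_ns - path_delay_ns;
}

// 1 if a purely combinatorial path of this delay fits in one clock period
int meets_timing(double path_delay_ns, double target_mhz)
{
  return slack_ns(path_delay_ns, target_mhz) >= 0.0;
}
```

With the 14.46 ns path from the log, the 22.579 MHz goal (a period of about 44.29 ns) leaves roughly 29.8 ns of slack. The same unpipelined path would not fit in the 10 ns period of a 100 MHz clock.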

[image: util]

Note the use of exactly two DSPs in the distortion function: a multiplier for each of the stereo left+right channels. Also note the use of block rams for the delay effect (FIFO of samples).

Once loaded onto the board, switches control whether the effects chain is active and at what volume. Give it a try!

As a demo of PipelineC's autopipelining functionality (which would become necessary for longer effects chains or higher audio sample rates), the same design can be configured to run ~4x faster at a 100MHz clock rate:

...
Best guess sweep multiplier: 2.88
Starting from zero clk timing params...
Collecting modules to pipeline...
Pipelining modules...
Best guess slicing: app , mult = 1.0 [0.2, 0.4, 0.6000000000000001, 0.8]
Updating output files...
Running syn w timing params...
led0_module : 0 clocks latency...
led1_module : 0 clocks latency...
led2_module : 0 clocks latency...
led3_module : 0 clocks latency...
leds_module : 0 clocks latency...
switches_module : 0 clocks latency...
pmod_ja : 0 clocks latency...
i2s_mac_ports : 0 clocks latency...
app : 3 clocks latency...
Running: /media/1TB/Programs/Linux/Xilinx/Vivado/2019.2/bin/vivado ...
app Clock Goal: 100.00 (MHz) Current: 121.24 (MHz)(8.25 ns) 3 clks
Met timing...
================== Writing Results of Throughput Sweep ================================
Output VHDL files: /home/julian/pipelinec_syn_output/read_vhdl.tcl
Done.

app : 3 clocks latency indicates that the tool pipelined the app() function to a pipeline depth of 4 stages ('3' clocks of pipeline register latency). The pipelining divided up the effects_chain logic into a pipeline as such:

delay() function: No pipelining, can't be pipelined
distortion() function: 3 pipeline stages (2 for multiply, 1 for add)
+1 pipeline stage for muxing related to switches
= 4 stages

Post implementation timing report showing that timing has been met: [image: timing]

Utilization for this pipelined version: [image: utilpipelined]

If you've been paying attention so far, you'll know that autopipelining won't help much further: the remaining 8.25 ns critical path matches the samples_fifo path delay reported earlier, so the problem lives in my lazy FIFO implementation used for the delay effect :). Take care folks! Please reach out if you want to give this a go or have any questions at all.