Coarse_Delay_Block_ _A_Green_Block - david-macmahon/wiki_convert_test GitHub Wiki
Tutorial 7: Implementation of a Green Block
Authors: Mekhala Muley (Version 1)
Expected completion time: 3 hours
In this tutorial, you will design a “green block” with a dynamic internal structure. This is done using system generator blocks, blocks specific to ROACH board and a matlab script.
The first step will be to design a block using system generator block and do the basic functionality check by simulating the design. Once the successful simulation is done, the design can be tested on actual hardware for which you will need to compile the design. The compilation will generate a .bof file which will be used for programming the board. The output from the actual hardware can be read and plotted by a python script. This ensures that the block is working as expected on the hardware.
The final step will be to make it a parameterised block and write a matlab script for dynamically changing internal structure of the block. At the end of this tutorial you will have a parameterised and scripted green block.
Various blocks in the CASPER library are parameterised blocks with dynamic internal structure. For eg. consider the FFT block where the number of stages changes with the number of FFT points to be performed. For such blocks replicating the internal structure every time is not the efficient method. Hence a matlab script is written for redrawing such blocks where the internal structure is replicated depending upon the parameter value.
In this tutorial Coarse delay block will be designed. This delay block will be designed for simultaneous data inputs. The functionality of the coarse delay block is to delay the simultaneous input data stream by specified number of clock cycles. This is achieved by writing the input data samples in memory and reading them with an offset equal to the amount of delay required. This block has dynamic internal structure which can be constructed using a matlab script. The internal structure is replicated with increase in number of simultaneous input ports.
Click here for Complete information of Turorial Lab & Setup at Casper Workshop 2011, Pune, India. The lab at the workshop is pre-configured with the CASPER libraries, Matlab and Xilinx tools. If you want to try this tutorial in your lab, please refer the topics : H/W & S/W requirements and SET-UP of Tutorials at WORKSHOP 2011 ,Compiling the Design & Programming on ROACH FPGA and Locations & Files information for the WORKSHOP 2011 wherever applicable in the link Complete information of Turorial Lab & Setup.
In this tutorial you will design a coarse delay block. The function of the block is to delay the input data stream by specified number of clock cycles. This is achieved by writing the input data samples in memory and reading them with an offset equal to the amount of delay required.
The block will have n-number of simultaneous input as well as output ports. The amount of delay is provided dynamically by the user through software register. Counter data is given as the input to the block (This is used for checking the functionality of the block ). The basic functionality can be checked by simulating the design and checking the input as well as output data on the scope.The input data and the delayed output data can be stored in a BRAM using a snap block. This BRAM is read via python script to check the functionality of the block on actual hardware.
In the present case you will design a delay block with 4 simultaneous input port. The delay block will be designed to compensate maximum of 2048 clock cycles and this value is externally provided by you at the time of testing. As the simultaneous inputs are four, hence 4-BRAMs with 512 (i.e. 2048/4) locations are required to store the data. The internal logic is designed in such a way that the delays greater than 4 clock cycles are addressed by certain logic and delays less than 4 clock cycles are achieved via barrel shifter. A sync is provided to synchronize the delay block with rest of the logic in the design.
Start Matlab and open Simulink (either by typing simulink on the Matlab command line, or by clicking the Simulink icon in the taskbar). Create a new model file.
Add all the blocks from Simulink library browser and arrange them as shown in the figure below:
By now you should have used these blocks a number of times. Add System Generator block from Xilinx Blockset -> Basic Elements -> System Generator libray path. The settings can be left on default.
Also add XSG core config from BEE_XPS System Blockset -> XSG core config. Set its “Hardware Platform” to ROACH sx95t with “User IP Clock Source” as sys_clk which is an onboard 100MHz crystal. This block will configure the System Generator block which you have added previously. All hardware related blocks are yellow and can be found in the BEE_XPS library.
You need to add these two blocks for all the CASPER designs.
The inputs required to the coarse delay block are
a) simultaneous n input data_in.
b) a sync : for synchronisation of the coarse delay block with rest of the design
c) a delay : for the amount of delay to be compensated
d) an enable : for latching the delay value provided dynamically by the user at the sync boundary only.
Add two software registers from BEE_XPS System Blockset -> Software Register. Set the I/O direction to “From Processor”. Rename those software registers as sync and delay.
Sync pulse will ensure the proper synchronization of the coarse delay block with rest of the logic in the design.
Delay is used to provide the delay values from the processor dynamically to the block.
Add Pulse Generator from Simulink->Sources->Pulse Generator. Set Pulse type to “Sample based” , Period as “100” , Pulse Width as “1”. This is used for simulation. It simulates a sync pulse with a period of 100 clock cycle and pulse width of 1 clock cycle.
Add Constant from Simulink->Sources->Constant. Set Constant value to “2”. This is used for simulation. It will delay the input data stream by 2 clock cycles.
Add two slice blocks from Xilinx Blockset-> Basic Elements->Slice. Rename them as Slice1 and Slice2. Set properties of Slice1 to “Boolean Output”, as the data type of Sync input is boolean. Set the properties of Slice2 as shown in the figure below :
The maximum delay that can be compensated is 2048 i.e. 2^11 hence the bitwidth of the slice is set to 11.
Add four Counter blocks from Xilinx Blockset-> Basic Elements->Counter. Rename them as Counter1, Counter2, Counter3, Counter4. Set the properties of Counter1 as shown in the figure below:
Set all the properties for rest of the counters similar to that of Counter1 except for “Initial value” . Set 2, 3 and 4 as the “Initial Value” for Counter2, Counter3 and Counter4 respectively.
Add a register from Xilinx Blockset-> Basic Elements->Register and rename it to delay_reg. Select Provide enable port.
This register is used to latch the delay only when enable (which is connected to sync pulse) goes from low to high.
Add two slice blocks from Xilinx Blockset-> Basic Elements->Slice. Rename them to bram_rd_addrs and bs_sel and set the properties as shown in the figure below:
As the data is coming in four parallel stream, 4 BRAMs are required to store the data. The upper MSBs of delay(software register) sliced by bram_rd_addrs are used as the address of the BRAM for data read out. The lower bits of delay(software register) sliced by shift_sel are used as the select input for the barrel shifter.
Add three constant blocks from Xilinx Blockset-> Basic Elements->Constant. Rename them to Constant1, Constant2 and Constant3. Set Type to “Unsigned” and Number of bits to “2” for all of them. Set Constant value to “2” for Constant1, Constant value to “1” for Constant2 and Constant value to “0” for Constant3.
Add three relational block from Xilinx Blockset-> Basic Elements->Relational. Set Comparison to “a>b” for all the blocks.
Add three convert blocks from Xilinx Blockset-> Basic Elements->Converter. Set Type to “Unsigned”, Number of bits to “1” for all the blocks.
Add three adder block from Xilinx Blockset->Math->AddSub. Go to Output Type tab of the block. Select Precision to “Full”.
Storing of the data in memory can be implemented using single port RAM or dual port RAM. Depending upon the need the user can select the type of RAM available.
Add counter block from Xilinx Blockset-> Basic Elements->Counter. Rename it to wr_addr,/i>. Set Counter type to “Free running”, Count Direction to “Up”, Initial Vaule to “0”, Step to “1”, Output Type to “Unsigned”, Number of bits to “9”, Binary point to “0”. This counter is used to generate the address for writing the data into dual port RAM of 512 i.e. 2^9 locations.
Add a substractor block from Xilinx Blockset->Math->AddSub. Go to Basic Tab , set Operation to “Subtraction”. Goto Output Type tab set Precision to “User Defined”, Output Type to “Unsigned”, Number of bits to “9”, binary point to “0”. It basically generates the read address from the write address offsetting it by delay value.
Add a delay block from Xilinx Blockset-> Basic Elements->Delay. Set the Latency to “1”.
Add two constant blocks from
Xilinx Blockset-> Basic Elements->Constant
. Rename it to
Constant1
and set Type to “Boolean”, Constant value to “1”. This block is used to keep the write enable of port A always high for the dual port RAM. Rename the another block to
Constant2
and set Type to “Boolean”, Constant value to “0”. This block is used to keep port B always high i.e. port B will be configured as the reading port of dual port RAM.
Add a Dual Port RAM from Xilinx Blockset->Memory->Dual Port RAM. Goto Basic tab and set Depth to “512”, Memory Type to “Block RAM”, Latency to “3”. This is used for storing the input data stream.
Add a constant block from Xilinx Blockset-> Basic Elements-> Constant. Set Type to “Unsigned”, Constant value to “4” and Number of bits to “11”.
Add a substractor block from Xilinx Blockset->Math->AddSub. Go to Basic Tab , set Operation to “Subtraction”. Goto Output Type tab set Precision to “User Defined”, Output Type to “Unsigned”, Number of bits to “9”, binary point to “0”.
Add relational block from Xilinx Blockset-> Basic Elements-> Relational.Set Comparison to “a>=b”. All the above logic is used to generate the read and write address for the RAM.
Add a constant block from Xilinx Blockset-> Basic Elements-> Constant. Set Type to “Boolean” and constant value to “1”. This indicates that the data is written into RAM at every clock cycle.
Add a single port RAM from Xilinx Blockset->Memory->Single Port RAM. Set Depth to 512, Memory Type to “Block RAM”, Write Mode to “Read before write” and latency to “3”.
Add a barrel shifter block from CASPER DSP Blockset -> Reorders-> barrel_switcher. Set the Number of inputs to “2”. Barrel shifter basically shifts the data by a specified number in one clock cycle. This is used to achieve the delay compensation less than 4 clock cycles.
Add a delay block from Xilinx Blockset-> Basic Elements->Delay and rename it to sync_delay. Set its latency to “4”. This is used to delay the sync pulse in order to match with the pipeline latency encounter by the data within the block.
The outputs from the coarse delay block are sync_out and simultaneous n number of data_out.
Add 10 signal routing goto block from Simulink->Signal Routing-> GoTo. Rename them to sync_in,data_in1,data_in2,data_in3,data_in4,sync_out,data_out1,data_out2,data_out3,data_out4. These are used for signal routing.
Add 10 signal routing From blocks from Simulink->Signal Routing-> From. Rename them to sync_in,data_in1,data_in2,data_in3,data_in4,sync_out,data_out1,data_out2,data_out3,data_out4. These are used for signal routing.
Once all the blocks are added you need to check the functionality of design. This done by simulating the design. The simulated output from the design can be viewed using a scope. Add Scope1 from Simulink->Sinks->Scope.
Add 5 delay blocks from Xilinx Blockset-> Basic Elements->Delay. Set the latency of the four blocks to 7 and connect them to data_in signal routing blocks. Set the latency of the fifth delay block to 6 and connect it to sync_in signal routing block.
Add 10 gatewayout blocks from Xilinx Blockset -> Basic Elements -> Gateway Out. Uncheck the “Translate into output port”.
Add snap block from CASPER DSP Blockset -> Scopes ->snap64. Snap block is used to capture the data from the FPGA and make it accessible to the user. The block stores the 64 bit data when the ctrl signal goes from low to high. Hence this block is used to check the behaviour of the delay block on actual hardware.
Add 4 delay blocks from Xilinx Blockset-> Basic Elements->Delay. Set the latency to 7 and connect them to data_in signal routing blocks. These are used to match with the pipeline latency encounter by the data stream in the block.
Add a concat block from Xilinx Blockset-> Basic Elements-> Concat. Set number of inputs to 8. This is used to give four 8-bit data_in and four 8-bit data_out concatenated together and give it to the snap blocks datain input port.
Add a constant block from Xilinx Blockset-> Basic Elements-> Constant. Set Type to “Boolean” and Constant value to “1”.
Once the basic design has been created the next step is to simulate it for checking the functionality of the coarse delay block. Simulation results shows whether the output is as expected or not. If not suitable corrections can be made before moving for the next step.
Goto Simulation->Configuration paramters. Set the stop time to 20. This will give simulated output for 20 time stamps.
Goto Simulation->Start. This will simulate the design for 20 timestamps. And the results can be viewd in the Scope 1.
The simulation result shows that the data_in is delayed by 2 clock cycles. Hence data_in1 i.e port1 (connected to 3rd Axes labled as Data_in1) is available at the output port3 (which is connected to the 7th Axes labled as Data_out3) hence showing that the input is delayed by 2 clock cycles.
Compile the design to test it on the hardware. Goto matlab command window and run the command bee_xps. It will open the following window
Make sure the file displayed in the pop-up is correct and then click the Run XPS button. After compilation, it creates a directory named after the model file name without the .mdl extension. There is a sub directory named bit_files. In this bit_files directory there are .bit and .bof file. We need the .bof file to program the FPGA.
You need to copy this .bof file at location [FPGA_PROG_BOF_DIR] after changing the permissions of the file.
eg. for the bof file [TUT7_BOF_FILE] in the area [STD_BOF_DIR]
$ chmod a+x [STD_BOF_DIR]/[TUT7_BOF_FILE]
$ cp [STD_BOF_DIR]/[TUT7_BOF_FILE] [FPGA_PROG_BOF_DIR]
A script is written in python to communicate with the ROACH board.This script programs the board, sets the delay from command line and plots the input and delayed output data stream from the BRAM. Run the python script file “[TUT7_PYSCRIPT_FILE]” saved in the following location : “[STD_PYSCRIPT_DIR]”
Usage : [STD_PYSCRIPT_DIR]/[TUT7_PYSCRIPT_FILE] roach_name -b [bof_file] -d Enter the corresponding Location/File names and roach name/IP. This will program the Roach board with the above BORPH file as well as set the delay of 5 clock cycles.
eg.
$ [STD_PYSCRIPT_DIR]/[TUT7_PYSCRIPT_FILE] roach030172 -b [tut7_coarse_delay_blk_2011_Sep_23_1151.bof] -d 5
The plot shows the data_in and data_out. Hence there are four Axes representing 4 simultaneous data streams. The first axis shows the zoomed data points of data_in1 and data_out1. The Red Dot is for data_in1 and the Blue Plus sign is for data_out1. The difference between the data in and data out is exactly the same as provided i.e of 5 clock cycle.
After the functionality has been checked and verified now you can write a matlab script for the block.
Select all the blocks connected between input and output ports of the coarse delay block as shown in the figure below. Right clcik and select “Create Subsystem”. Rename the block as <your_initials>_coarse_delay_wideband_prog.
Rename the input ports as sync, delay, en, data_in1, data_in2, data_in3 and data_in4. Similarly rename the output ports as sync_out, data_out1,data_out2, data_out3 and data_out4.
Right click and select “Edit mask”. Following window will open. Go to Parameters tab and add various parameters and corresponding paramter value to it as shown in the figure below:
Go to initialization tab and from the initialization commands call the matlab function. The function name should be <block_name>_init in our case it will be <your_initials>_coarse_delay_wideband_prog_init. Pass the mask parameters and parameter value to the initialisation function in ('parameter_name' , parameter_value) pairs where 'parameter_name' is a text string containing the name of the paramter and parameter_value is the associated value as shown in the figure below:
Go to documentation tab and write the block name i.e. <your_intials>_coarse_delay_wideband_prog in the Mask Type section.
The above steps will pop-up a window when the user double-clicks the block as shown in the figure below:
Add this coarse delay block to the CASPER DSP Blockset -> Delays library. First Open the library then gotoSimulink library browser Edit-> Unlock Library. Now paste the block in that library.
Refer to the delay_wideband_prog.m file at the location “[CASPER_LIB_DIR]”.
Step 1 : Open the new matlab file(.m file). The file name should be <block_name>_init.m i.e. <your_initials>coarse_delay_wideband_prog_init.m and save it to the “[CASPER_LIB_DIR]”.The initialization function should be <your_name><block_name>_init.
Step 2 : Write the Comment describing the purpose of the function and how to use it.
Step 3 : Following are the various functions used to write the script for the block :-
“same_state” : It checks whether the parameters are changed or not. If they are changed then only it will redraw otherwise it will exit.
“check_mask_type” : This is a sanity check . This is used to check that the initialisation script is being called to operate on the appropriate block. This name is mentioned under the Documentation tab in Mask editor for the block.
“munge_block” : The call to this function causes link to the original library block to be disabled if the block is not to be instantiated in a library. This causes changes to local and not to propagate to the library.
“get_var” : This is used to extract the various parameter values passed to the function.
“delete_lines” : It removes all the connections in the existing system.
“reuse_block” : This searches for an existing block of the same type and sets the necessary parameters. Use get_param(gcb,'ObjectParameters') and get_param(gcb,ParameterName) function to get all the necessary paramters of the block. get_param(gcb,'ObjectParameters') function will give name of all the parameter where “gcb” gets the current active block. get_param(gcb,ParamterName) function will return the value corresponding to the ParameterName.
“add_line” : This connects the output ports of one block to the input port of another block.
“clean_blocks” : A call to clean_blocks after all the blocks are instantiated and connected, ensures unconnected blocks are removed.
“save_state” : Each Simulink block has a parameter called “UserData” for storage of data by the useful. The call to save_state causes a data structure to be saved to this parameter. This structure consists of two fields viz state and parameter. State contains a hash of the input arguments (used by the same_state to determine if these have changed). Parameters contains a data structure with all the mask parameters passed to the initialisation function. save_state also update the mask parameters and overwrites variable references with actual values. This forces mask values to change to values passed into the script. If mask values do not change, Simulink does not execute the initialisation script so this also forces a change in mask values as variables are expanded so that future change in these variables cause execution of the initialisation script. Note : For more detail description on script writing visit following link https://casper.berkeley.edu/wiki/DSP_Block_Standardisation
Call the function from the Matlab Command Window. It will list the error(s) if any , in the command window. This helps in debugging the script before actually using it with the block.
For eg. Execute the function from matlab command window delay_wideband_prog_init(gcb,'max_delay',1024,'n_inputs_bits',3,'bram_latency',4,'bram_type','Dual Port')
Pick and place the block from the library to the new model file. Now change few paramters of it. It will give a call to the initialisation script and redraw the block according to the input parameters. You can see that the internal structure of the block changes with the change in input parameter.
You have completed the tutorial on designing of the green block. You can now design any block whose internal structure varies dynamically depending upon the masking paramters.