Maximum Useful Latency
Overview
Adding latency is a good way to improve the timing of your designs, but there is a limit to the amount of latency that is useful in each block. If more latency is selected than the maximum listed here, the extra latency will be implemented as registers or SRL16 resources in the FPGA and will not help meet timing requirements.
This information was extracted from the Xilinx documentation, mostly the Coregen documentation. It has not been fully verified experimentally.
Don't forget to enable pipelining where possible; otherwise, adding latency will not help.
Maximum Latency List
- Cast/Convert: latency >3 doesn’t help.
- (Embedded) Multiplier: latency >3 is unlikely to help for multiplies up to 18x18 when the output precision is full. With saturation or similar output options, additional latency might help, but this is not confirmed.
- BRAM: latency >3 is unlikely to help.
- MUX: for fewer than 8 inputs, latency >1 is unlikely to help.
- Add/Sub: no info. I’ve seen latency up to 6 help timing.
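The list above can be captured as a small lookup table for sanity-checking latency settings in a design script. This is only a sketch: the dictionary keys and the `excess_latency` helper are hypothetical names chosen for illustration, not part of any Xilinx or CASPER tool, and the numbers are the same unverified rules of thumb listed above.

```python
# Rule-of-thumb ceilings copied from the list above (not experimentally verified).
# None means the memo gives no ceiling for that block type.
MAX_USEFUL_LATENCY = {
    "cast_convert": 3,
    "multiplier_18x18_full_precision": 3,
    "bram": 3,
    "mux_under_8_inputs": 1,
    "add_sub": None,
}


def excess_latency(block, latency):
    """Return how many requested cycles exceed the rule-of-thumb ceiling.

    A non-zero result suggests those cycles would become plain registers
    or SRL16s and are unlikely to improve timing.
    """
    limit = MAX_USEFUL_LATENCY[block]
    if limit is None:
        return 0  # no ceiling known, so nothing is flagged
    return max(0, latency - limit)
```

For example, `excess_latency("bram", 5)` returns 2, meaning two of the five requested cycles are probably not helping the BRAM itself.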
Empirical Results
The following behaviors have been observed, and may be applied to trick the tools into giving better timing results. YMMV.
Bit-sliced bus pipelining
Under 10.1, pipelining a bit-sliced bus after the slicing gave better timing results, even with the same total latency.
The following implementation did not meet a 350 MHz timing constraint on ROACH with SX95T-1:
The following (supposedly equivalent) implementation did meet the same timing constraint:
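As a behavioral illustration only (a Python model, not the Simulink design itself), the sketch below shows why the two arrangements are expected to be functionally equivalent: delaying the whole bus and then slicing it produces the same sample streams as slicing first and delaying each slice. All function names are hypothetical, and the model assumes the delay registers reset to zero. It says nothing about timing closure, which depends on how the tools place and route the registers.

```python
from collections import deque


def delay(samples, latency, fill=0):
    """Model a register chain: the output is the input delayed by `latency` cycles."""
    pipe = deque([fill] * latency)
    out = []
    for s in samples:
        pipe.append(s)
        out.append(pipe.popleft())
    return out


def bit_slice(value, lsb, width):
    """Extract `width` bits starting at bit position `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)


def delay_then_slice(samples, latency, slices):
    """Pipeline the full bus first, then slice it."""
    delayed = delay(samples, latency)
    return [[bit_slice(v, lsb, w) for v in delayed] for lsb, w in slices]


def slice_then_delay(samples, latency, slices):
    """Slice the bus first, then pipeline each slice with the same latency."""
    return [
        delay([bit_slice(v, lsb, w) for v in samples], latency)
        for lsb, w in slices
    ]
```

Calling both functions with the same samples, latency, and list of `(lsb, width)` slices yields identical outputs, so any difference in timing results comes from the implementation tools rather than from the logic.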