Timer clock enable divider tricks - barawn/verilog-library-barawn GitHub Wiki

Cheap timer/clock count methods

A lot of Xilinx tools suggest using SRLs for frequency division, and for small dividers, they're definitely right. However, the problem is that for large dividers, they suggest tricks like this: http://www.markharvey.info/art/srldiv_04.10.2015/srldiv_04.10.2015.html

There are two downsides to this design. First, it has 4 independent control sets since the CEs are all different, which means packing it into a single slice is difficult. Second, a design like this is not synchronizable/resettable: it starts up when it starts up, and resetting it would require adding more logic to the clock enable of each stage. Getting it to synchronize once is cheap: you add an FF to the initial SRL and set it when you want to start. But because the timers are used to control the logic of later stages and SRLs are not resettable, there's no easy trick for a later reset/synchronize.

It also turns out that for very large n, this isn't even the right way to do it! The reason is that the registers in the SRLs are not being used efficiently: to count to n, you need ~log2(n) registers, not n. But you can't use the SRLs as binary-type counters because you don't have access to all of the values. So you need to be clever.

Coprime timer

Consider wanting to implement a 1024-clock delay. One option is to use 2 SRLs, one of maximum length plus an FF (32 + 1, so 33 stages) and one of length 31, each looped back on itself and both started by your start signal. Because 31 and 33 are coprime, their outputs will not both assert again until 31 x 33 = 1023 clocks later. A little bit of extra logic is needed to handle stopping the timer and adding the extra clock, but it's not that bad.

This is basically the cheapest way to handle delays up to 1024-ish clocks or so. Past that you need to get more clever.
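
As a sanity check on the arithmetic, the two loops can be modeled in a few lines of Python. This is only a behavioral sketch: the function name `ring_coincidence` and the one-hot list model are illustrative assumptions rather than code from this library, and it leaves out the start/stop logic and the extra clock mentioned above.

```python
def ring_coincidence(len_a: int, len_b: int) -> int:
    """Clocks until two looped-back one-hot shift registers assert together again.

    Each ring starts with a single 1 at its output (index 0), as if both
    were loaded by the same start signal.
    """
    ring_a = [1] + [0] * (len_a - 1)
    ring_b = [1] + [0] * (len_b - 1)
    clocks = 0
    while True:
        # One clock: every bit moves one stage, the last stage loops back.
        ring_a = ring_a[-1:] + ring_a[:-1]
        ring_b = ring_b[-1:] + ring_b[:-1]
        clocks += 1
        if ring_a[0] and ring_b[0]:
            return clocks


print(ring_coincidence(33, 31))  # 31 * 33 = 1023
```

Lengths that share a factor coincide earlier (at their least common multiple), which is why the lengths need to be coprime to get the full product.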

LFSR timer

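Probably the cheapest way to create a 2^n timer for large n is to create the cheapest n-bit maximal-length LFSR you can using SRLs, start it at 1, and then use a counter to detect a run of about n zeros (that is, log2 of the timer length), which indicates the LFSR has completed a cycle. The longest run of zeros in a maximal-length sequence is n-1 bits and it occurs exactly once per period, so it marks a unique point in the cycle. There's a bit of fiddling you'd need to do to extend the period by 1 (since the LFSR's period is 2^n-1), but this isn't much.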
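
The zero-run detection can be checked with a small behavioral model. The sketch below is an assumption-laden illustration, not the library's implementation: it uses an XOR-feedback (Fibonacci) LFSR, 1-based tap numbering, and hypothetical helper names `lfsr_step` and `zero_run_fire_times`. It watches the last stage and records the clocks at which a run of n-1 zeros completes.

```python
def lfsr_step(state, taps):
    """Advance an XOR-feedback LFSR by one clock.

    state[i] holds stage i+1; taps are 1-based stage numbers.
    """
    feedback = 0
    for tap in taps:
        feedback ^= state[tap - 1]
    return [feedback] + state[:-1]


def zero_run_fire_times(n, taps, clocks):
    """Return the clock numbers at which a run of n-1 output zeros completes."""
    state = [1] + [0] * (n - 1)  # "start it at 1"
    run = 0
    fires = []
    for clock in range(1, clocks + 1):
        state = lfsr_step(state, taps)
        out = state[-1]
        run = run + 1 if out == 0 else 0  # small counter, reset by any 1
        if run == n - 1:
            fires.append(clock)
    return fires


fires = zero_run_fire_times(4, (4, 3), 60)
intervals = [b - a for a, b in zip(fires, fires[1:])]
print(intervals)  # each interval should be 2**4 - 1 = 15
```

The run counter only has to count to n-1, which is where the saving over a full binary counter comes from. The same function can be used to sanity-check a candidate tap set from the lists below before committing it to hardware.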

Max length LFSR feedbacks can be found here:

https://users.ece.cmu.edu/~koopman/lfsr/

or for a trimmed list generally including the most 'useful' LFSRs:

https://docs.xilinx.com/v/u/en-US/xapp052

although some of Xilinx's choices aren't the best. 16 bits, for instance, is probably best implemented with (16,5,4,3), which can still be done with 2 SRL16s + 2 FFs (srl->ff->ff->srl), and for 19 bits, (19,6,2,1) is a poorer choice than (19,18,8,7) (srl->ff->srl->ff). The easiest way to identify "easy" LFSRs from the full list is to look for the fewest taps and the fewest gaps between taps; adjacent taps at the end and in the middle are the easiest to implement.
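
To compare candidates from the full list, it can help to turn a tap set into the srl/ff chain notation used above. The helper below is a rough sketch under stated assumptions: every tap has to be the output of a primitive, a one-stage segment is treated as an FF and anything longer as an SRL, and the SRL depth limit (16 or 32) and slice packing are ignored. The names `tap_segments` and `describe` are hypothetical.

```python
def tap_segments(taps):
    """Split an LFSR into shift segments that each end at a tap.

    taps: 1-based stage numbers, including the register length itself.
    Returns (kind, length) pairs ordered from stage 1 to the last stage.
    """
    segments = []
    previous = 0
    for tap in sorted(taps):
        length = tap - previous
        # Assumption: a single stage is an FF, a longer stretch is an SRL.
        kind = "ff" if length == 1 else "srl"
        segments.append((kind, length))
        previous = tap
    return segments


def describe(taps):
    """Render the segment chain in srl->ff notation."""
    return "->".join(kind for kind, _ in tap_segments(taps))


print(describe((16, 5, 4, 3)))    # srl->ff->ff->srl
print(describe((19, 18, 8, 7)))   # srl->ff->srl->ff
print(describe((16, 15, 13, 4)))  # srl->srl->srl->ff
```

This only counts primitives; it does not capture how well a given chain packs into a slice, so treat the output as a first filter rather than a ranking.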

The reason why this ends up being cheaper is that, for a 2^n timer, the zero-run detector requires only a counter of about log2(n) bits rather than the n-bit binary counter you would otherwise need. Once n gets up into the 20-30 bit range this starts to dramatically win.