03 Clock Domain Crossing (CDC), Asynchrony, IO Synchronization. - alex-aleyan/xilinx GitHub Wiki
STOPPED AT 5.6.1 MCP Formulation using a synchronized enable pulse
https://wavedrom.com/editor.html
Source:
- See 01-Sources: clock-domain-crossings
- See the code in appendix of the source!
Overview:
- Multi-bit CDC techniques:
- Multi-bit signal consolidation
- Consolidation.
- Consolidation and an extra flip-flop.
- Multi-Cycle Path (MCP) formulation (settling the bus and adding enable signal)
- MCP formulation using a synchronized enable pulse
- Closed-loop - MCP formulation with feedback
- Closed-loop - MCP formulation with acknowledge feedback
- Multi-bit CDC signal passing using asynchronous FIFOS
- Multi-bit CDC signal passing using 1-deep/2-register FIFO synchronizer
Notes:
- Synchronization failure is caused by an output going metastable and not converging to a legal stable state by the time the output must be sampled again (Source: Cummings). Metastability is caused by the data input to a flip-flop changing during its setup and hold window.
- Use a clock manager (MMCM/PLL) to divide clocks. Do not divide clocks in fabric logic.
- Always register data leaving the "sending" domain so that glitches from combinatorial logic never cross the boundary; do NOT let a signal leave the domain straight out of combinatorial logic.
- Use the ODDR output primitive when forwarding an FPGA-internal clock to an external pin.
- When pin planning, remember that a data input and its corresponding clock should enter the FPGA in the same clock region.
- When bridging data between clock domains - use a SINGLE Control Signal.
- Always double-register the control signals in the new clock domain and never use metastable signals as Write Enable or Address to RAM.
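- The two synchronizer registers should be placed as close together as possible: a short route between them leaves the first register's output more time to settle, which limits the propagation of a metastable signal.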
- Pulse Synchronization:
- Two Flip-Flop Synchronizer (Open-Loop Synchronizer):
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Double Register Synchronization Logic:
    Strobe_Fast_Clk      <= Strobe_Slow_Clk;  -- Strobe_Fast_Clk is meta-stable signal
    Strobe_Fast_Clk_dly1 <= Strobe_Fast_Clk;  -- Strobe_Fast_Clk_dly1 is stable after double reg
  end if;
end process;
```
- Three Flip-Flop Synchronizer (Open-Loop Synchronizer):
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Triple Register Synchronization Logic:
    Strobe_Fast_Clk      <= Strobe_Slow_Clk;       -- Strobe_Fast_Clk is meta-stable signal
    Strobe_Fast_Clk_dly1 <= Strobe_Fast_Clk;       -- Strobe_Fast_Clk_dly1 is stable after double reg
    Strobe_Fast_Clk_dly2 <= Strobe_Fast_Clk_dly1;  -- Strobe_Fast_Clk_dly2 is even more stable after triple reg
  end if;
end process;
```
- Two Flip-Flop Synchronizer with Edge Detector (Open-Loop Synchronizer): in addition to double-register, add the edge detector logic consisting of 1 more register and an AND gate with one of two inputs inverted:
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Double Register Synchronization Logic:
    q1 <= sig_from_slow_clk;  -- q1 is meta-stable signal
    q2 <= q1;                 -- q2 is stable after double reg
    -- Rising Edge Detect Logic:
    q3 <= q2;                       -- previous value of q2
    re_detect <= q2 AND (NOT q3);   -- registered: high for one cycle after a rising edge of q2
  end if;
end process;
```
- Note: when re_detect is used as a data_valid signal to pass data across clock domains, we might want to add a flop that delays re_detect toward the middle of the valid-data window, so that the data has settled by the time the strobe goes high to indicate a valid data crossing.
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Double Register Synchronization Logic:
    q1 <= sig_from_slow_clk;  -- q1 is meta-stable signal
    q2 <= q1;                 -- q2 is stable after double reg
    -- Rising Edge Detect Logic:
    q3 <= q2;
    re_detect <= q2 AND (NOT q3);
    -- Extra flop: delays the strobe by one more Clock_Fast cycle
    re_detect_strobe <= re_detect;
  end if;
end process;
```
- Bus Synchronization:
- Bus Synchronizer (Open-Loop Synchronizer): Because each signal on a data bus experiences a different propagation delay, we have to make sure every line of the bus is stable before using the value:
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Double Register Synchronization Logic:
    Metastable_Data_Clk_O <= Data_Clk_I;             -- single register
    Stable_Data_Clk_O     <= Metastable_Data_Clk_O;  -- double register
    -- Comparison to make sure each line of the bus has stabilized:
    Stable_Data_Clk_O_dly1 <= Stable_Data_Clk_O;
    -- Hold the previous value by default (a clocked process infers no latch either way)
    Output_Data_Clk_O <= Output_Data_Clk_O;
    -- Update the output only when two consecutive samples agree
    if (Stable_Data_Clk_O = Stable_Data_Clk_O_dly1) then
      Output_Data_Clk_O <= Stable_Data_Clk_O_dly1;
    end if;
  end if;
end process;
```
- Dual Clock FIFOs (Closed-Loop Synchronizer) - yield the best results.
- MTBF (Mean Time Between Synchronization Failures; see Dally and Poulton) depends on clock frequencies of the two domains
- A larger MTBF is desired, since it indicates a longer time between failures.
- MTBF=1/(Fsynch * Fdata * X)
- Fsynch - Frequency of the Synchronizing clock (slower clocks yield higher MTBF).
- Fdata - Data changing frequency (slower data yield higher MTBF).
- X - other factors.
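As a quick worked illustration of the MTBF formula above (a sketch only: it assumes the lumped factor X stays constant when the clock changes, which is a simplification), halving the synchronizing clock frequency doubles the MTBF:

```latex
\[
\mathrm{MTBF} = \frac{1}{F_{\mathrm{synch}} \cdot F_{\mathrm{data}} \cdot X}
\]
\[
\frac{\mathrm{MTBF}'}{\mathrm{MTBF}}
  = \frac{F_{\mathrm{synch}} \cdot F_{\mathrm{data}} \cdot X}
         {\tfrac{1}{2} F_{\mathrm{synch}} \cdot F_{\mathrm{data}} \cdot X}
  = 2
\]
```

The same ratio applies to Fdata: halving the rate at which the data changes also doubles the MTBF under this assumption.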
Fast to Slow clock domain crossing:
- Note if we don't utilize Fast to Slow synchronization techniques, it's possible for the sending signal to change twice before the slow clock domain samples the signal.
- Two approaches are used:
- Open-Loop (without ACK) synchronization.
- pro: can be used when relative clock frequencies are fixed and understood (harmonic, phase shifted, etc) by the designer.
- pro: cheap, fast, and easy to implement.
- con: if design requirements change, each Open-Loop Synchronizer must be revisited by the engineer. This risk can be reduced with a SystemVerilog assertion that detects an input pulse failing to meet the "three edges" requirement.
- Closed-Loop (with ACK) synchronization.
- The sender adds an enable signal; the enable signal is synchronized into the receiver clock domain and then passed back through a synchronizer in the sending clock domain before it reaches the sender as an acknowledge.
- pro: synchronizing a feedback signal is a safe way to assure the sender that the control signal has successfully made it to the new clock domain.
- con: synchronizing the control signals from sender to receiver domains, and then from receiver to sender clock domains introduces a delay.
- Synchronizing from a slower to a faster clock domain is not a problem as long as the faster (receiving) clock is at least 1.5x faster than the slower (sending) clock. We want the slower signal to span at least 3 edges of the faster clock so that it is reliably captured.
- In simulation, it is easier to use assertions to check the 3-clock-edge requirement than to measure the fractional width of a CDC signal.
- The 3-edge requirement applies to both Closed-Loop and Open-Loop synchronizers. The Closed-Loop synchronizer automatically ensures that at least 3 edges are detected for all CDC signals.
- At minimum, the signal sent by the sending clock domain must last for 1 full clock period + (2 x setup time) + (2 x hold time).
- In a Fast to Slow domain crossing, if the CDC signal is pulsed for only one fast-clock cycle, it could go high and low between rising edges of the slower clock and never be captured into the slower clock domain (Cummings).
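The Closed-Loop (with ACK) approach described in this section can be sketched as a four-phase request/acknowledge handshake. This is a minimal sketch rather than code from Cummings: the entity name `closed_loop_sync` and all port and signal names are hypothetical, and it assumes the registers power up at their declared initial values (as on Xilinx FPGAs) instead of using an explicit reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical names throughout; an illustrative sketch only.
entity closed_loop_sync is
  port (
    clk_tx   : in  std_logic;  -- sending clock domain
    clk_rx   : in  std_logic;  -- receiving clock domain
    send     : in  std_logic;  -- clk_tx domain: request to pass one event
    busy     : out std_logic;  -- clk_tx domain: high until the handshake completes
    rx_pulse : out std_logic   -- clk_rx domain: one-cycle pulse per event
  );
end entity closed_loop_sync;

architecture rtl of closed_loop_sync is
  signal req                  : std_logic := '0';  -- registered level sent to clk_rx
  signal req_meta, req_sync   : std_logic := '0';  -- clk_rx two-flop synchronizer
  signal req_sync_d           : std_logic := '0';  -- previous req_sync, for edge detect
  signal ack_meta, ack_sync   : std_logic := '0';  -- clk_tx two-flop synchronizer
begin
  -- Sender: raise req when idle, drop it once the acknowledge has returned.
  tx_proc : process (clk_tx)
  begin
    if rising_edge(clk_tx) then
      ack_meta <= req_sync;  -- the synchronized request is fed back as the ack
      ack_sync <= ack_meta;
      if req = '0' and ack_sync = '0' and send = '1' then
        req <= '1';
      elsif req = '1' and ack_sync = '1' then
        req <= '0';
      end if;
    end if;
  end process tx_proc;

  -- A new event is accepted only after both req and ack have returned low.
  busy <= req or ack_sync;

  -- Receiver: double-register the request, then detect its rising edge.
  rx_proc : process (clk_rx)
  begin
    if rising_edge(clk_rx) then
      req_meta   <= req;
      req_sync   <= req_meta;
      req_sync_d <= req_sync;
    end if;
  end process rx_proc;

  rx_pulse <= req_sync and not req_sync_d;
end architecture rtl;
```

Because `req` stays high until the acknowledge comes back, the request always spans enough receiving-clock edges regardless of the clock ratio; the cost is the round-trip latency noted in the con above.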
Multi-bit CDC:
- Cummings handles multi-bit CDC with the following 3 strategies:
- Multi-bit Signal Consolidation.
- Multi-cycle Path Formulation.
- Pass multiple CDC Bits Using Gray Codes.
- Multi-Bit Signal Consolidation
- Consolidate multiple CDC signals into a single 1-bit CDC signal.
- Example: we can consolidate ENABLE and LOAD signals into a single LOAD_ENABLE signal.
- Multi-Cycle Path Formulation (MCP Formulation):
- common technique for safely passing multiple CDC signals.
- MCP Formulation technique:
- sending unsynchronized data to a receiving clock domain paired with a synchronized control signal.
- The data and control signals are sent simultaneously allowing the data to setup on the inputs of the destination register while the control signal is synchronized for two receiving clock cycles before it arrives at the load input of the destination register.
- Advantages:
- The sending clock domain is not required to calculate the appropriate pulse width to send between clock domains.
- The sending clock domain is only required to toggle an enable into the receiving clock domain to indicate that data has been passed and is ready to be loaded. The enable signal is not required to return to its initial logic level (Open-Loop).
- Problem : two or more encoded signals are being passed between clock domains. Each may arrive slightly skewed producing erroneous output.
- Solution: both MCP Formulation and FIFO techniques can solve this problem:
- Closed-loop MCP with feedback.
- Closed-loop MCP with acknowledge feedback.
- Async FIFO.
- 1-deep/2-register FIFO synchronizer.
- MCP Formulation with Synchronized Enable Pulse.
- The most common method to pass a synchronized enable signal between clock domains is by implementing a toggling enable signal that is passed to a synchronized pulse generator to indicate that the unsynchronized multi-cycle data word can be captured on the next receiving clock edge (Cummings).
- The key feature - the polarity of the input signal does not matter (image sources).
```vhdl
process(Clock_Fast)
begin
  if (rising_edge(Clock_Fast)) then
    -- Fast Clock: Double Register Synchronization Logic:
    q1 <= sig_from_slow_clk;  -- is meta-stable signal
    q2 <= q1;                 -- is stable after double reg
    -- Fast Clock: Any Edge Detect Logic:
    q3 <= q2;
    load_en <= q3 XOR q2;
  end if;
end process;
```
- Pass Multiple CDC Bits Using Gray Codes.
- Pass Multiple CDC Bits Using Asynchronous FIFOs.
- Crossing clock domains with an Asynchronous FIFO: https://zipcpu.com/blog/2018/07/06/afifo.html
- Pass Multiple CDC Bits Using 1-deep/2-register FIFO synchronizer.
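The MCP formulation with a toggling enable described above can be sketched end to end, sender and receiver together. This is a minimal open-loop sketch, not Cummings's code: the entity name `mcp_toggle_cdc`, its ports, and the `WIDTH` generic are hypothetical. It assumes the sender does not assert `tx_valid` again until the receiver has had several `clk_rx` cycles to capture the word, since there is no acknowledge path here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical names throughout; an illustrative sketch only.
entity mcp_toggle_cdc is
  generic (WIDTH : positive := 8);
  port (
    clk_tx   : in  std_logic;                           -- sending clock
    tx_valid : in  std_logic;                           -- one clk_tx cycle per new word
    tx_data  : in  std_logic_vector(WIDTH-1 downto 0);
    clk_rx   : in  std_logic;                           -- receiving clock
    rx_valid : out std_logic;                           -- one clk_rx cycle per word
    rx_data  : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity mcp_toggle_cdc;

architecture rtl of mcp_toggle_cdc is
  signal data_hold  : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal en_toggle  : std_logic := '0';  -- changes level once per word
  signal q1, q2, q3 : std_logic := '0';  -- clk_rx synchronizer and edge-detect flops
  signal load_en    : std_logic;
begin
  -- Sender: register the word and toggle the enable on the same clock edge,
  -- so the unsynchronized data is already stable while the enable is synchronized.
  tx_proc : process (clk_tx)
  begin
    if rising_edge(clk_tx) then
      if tx_valid = '1' then
        data_hold <= tx_data;
        en_toggle <= not en_toggle;
      end if;
    end if;
  end process tx_proc;

  -- Any edge of the synchronized toggle produces a one-cycle load pulse.
  load_en <= q3 xor q2;

  -- Receiver: synchronize the toggle, then load the held word.
  rx_proc : process (clk_rx)
  begin
    if rising_edge(clk_rx) then
      q1 <= en_toggle;  -- may go metastable
      q2 <= q1;         -- stable after double register
      q3 <= q2;         -- previous value of q2
      rx_valid <= load_en;
      if load_en = '1' then
        rx_data <= data_hold;
      end if;
    end if;
  end process rx_proc;
end architecture rtl;
```

Only the single-bit `en_toggle` passes through a synchronizer; the data bus crosses unsynchronized and is sampled at least two `clk_rx` cycles after it was registered, which is why the path can be treated as multi-cycle.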
Other
- CDC FIFO min depth:
- Xilinx LUTRAM is configurable as a 2x32 or 1x64 RAM, so the minimum FIFO depth the tool can infer is 32; inferring a FIFO shallower than 32 entries therefore brings no benefit.
- Because async clocks experience jitter, an 8-entry FIFO can and will overflow, depending on the average clock skew between read and write domains with the same nominal frequency, unless you insert a dead cycle on the write side once in a while. Inserting a single dead cycle every few million clocks is sufficient for such a FIFO to avoid overflow.