Using The Cycle Counter - TeensyUser/doc GitHub Wiki

The cycle counter is a hardware counter that increments at the CPU frequency (F_CPU) and is available on all ARM Teensy boards except the Teensy LC. It can be used for high-precision time measurements, sub-microsecond delays, performance checks, etc.

Using the cycle counter

The current value of the cycle counter is stored in the 32-bit register ARM_DWT_CYCCNT. The following snippet shows how to read its value and convert it to nanoseconds.

void setup()
{
    // The following 2 lines are only necessary for T3.0, T3.1 and T3.2
    ARM_DEMCR    |= ARM_DEMCR_TRCENA;         // enable debug/trace
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // enable cycle counter
}

void loop()
{
    uint32_t cnt = ARM_DWT_CYCCNT;
    float     ns = cnt * 1E9f/F_CPU;
    Serial.printf("CYCCNT: %10u (%0.6g ns)\n",cnt, ns);

    delay(1000);
}

On a T4.0 this generates the following output:

CYCCNT:   84777964 (1.41297e+08 ns)
CYCCNT:  684781584 (1.1413e+09 ns)
CYCCNT: 1284785178 (2.14131e+09 ns)
CYCCNT: 1884788776 (3.14131e+09 ns)
CYCCNT: 2484792336 (4.14132e+09 ns)
CYCCNT: 3084795937 (5.14133e+09 ns)
CYCCNT: 3684799542 (6.14133e+09 ns)
CYCCNT: 4284803183 (7.14134e+09 ns)
CYCCNT:  589839454 (9.83066e+08 ns)
CYCCNT: 1189843051 (1.98307e+09 ns)

Especially on the faster boards (T3.5 to T4.0) the cycle counter increments so quickly that the 32-bit register overflows quite frequently. The numbers above show an overflow time of about 7 s for a Teensy 4.0: the counter wraps around between the eighth and the ninth reading.
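For intervals shorter than the overflow time, a wrap-around between two readings does not have to be handled explicitly, because unsigned 32-bit subtraction works modulo 2^32. The following is a minimal sketch of that idea. The helper names `elapsedCycles` and `cyclesToNs` are hypothetical and not part of the Teensy core, and the CPU frequency is passed in as a parameter so the code also compiles off-target.

```cpp
#include <cstdint>

// Hypothetical helper: cycles elapsed between two counter readings.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct across
// one overflow, provided the interval is shorter than the overflow time.
constexpr uint32_t elapsedCycles(uint32_t start, uint32_t end)
{
    return static_cast<uint32_t>(end - start);
}

// Hypothetical helper: convert a cycle count to nanoseconds for a CPU
// frequency given in Hz (on a Teensy this would typically be F_CPU).
constexpr double cyclesToNs(uint32_t cycles, uint32_t cpuHz)
{
    return cycles * 1e9 / cpuHz;
}

// Possible use inside a sketch (assumes the cycle counter is enabled):
//   uint32_t start  = ARM_DWT_CYCCNT;
//   /* ... code to be timed ... */
//   uint32_t cycles = elapsedCycles(start, ARM_DWT_CYCCNT);
//   float    ns     = cyclesToNs(cycles, F_CPU);
```

Applied to the two readings on either side of the overflow in the output above (4284803183 and 589839454), the difference is 600003567 cycles, which is about one second at 600 MHz and matches the `delay(1000)` in the sketch.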

The table below shows the CPU frequency (standard setting), the counter period and the overflow time for all boards supporting the cycle counter:

| Board | F_CPU | Counter period | Overflow time |
|---|---|---|---|
| T3.0, T3.1, T3.2 | 96 MHz | 10.4 ns | 44.7 s |
| T3.5 | 120 MHz | 8.33 ns | 35.8 s |
| T3.6 | 180 MHz | 5.56 ns | 23.9 s |
| T4.0, T4.1, MM | 600 MHz | 1.67 ns | 7.16 s |
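The table values follow directly from the CPU frequency: the counter period is 1/F_CPU and the overflow time is 2^32/F_CPU. A small sketch of that arithmetic, useful if a board runs at a non-standard clock setting, could look like this. The function names are hypothetical, and the frequency is passed as a parameter in Hz instead of using F_CPU directly.

```cpp
#include <cstdint>

// Hypothetical helper: duration of one counter tick in nanoseconds.
constexpr double counterPeriodNs(uint32_t cpuHz)
{
    return 1e9 / cpuHz;
}

// Hypothetical helper: time in seconds until the 32-bit counter wraps around.
// 4294967296.0 is 2^32, the number of distinct values of the counter.
constexpr double overflowSeconds(uint32_t cpuHz)
{
    return 4294967296.0 / cpuHz;
}
```

For example, `overflowSeconds(600000000)` gives roughly 7.16 s and `counterPeriodNs(96000000)` roughly 10.4 ns, in agreement with the table.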

If you need longer overflow times you can easily extend the 32-bit counter to 64 bits as shown in the sketch below. Whenever you call cycles64() it first checks whether the counter has wrapped around since the last call, which it detects by the new reading being smaller than the previous one. If so, it increments the upper 32 bits of the 64-bit result. It then returns the bitwise OR of the upper 32 bits (shifted left by 32) and the current cycle counter value in the lower 32 bits.

uint64_t cycles64()
{
    static uint32_t oldCycles = ARM_DWT_CYCCNT;
    static uint32_t highDWORD = 0;

    uint32_t newCycles = ARM_DWT_CYCCNT;
    if (newCycles < oldCycles)
    {
        ++highDWORD;
    }
    oldCycles = newCycles;
    return (((uint64_t)highDWORD << 32) | newCycles);
}

void setup()
{
    // The following 2 lines are only necessary for T3.0, T3.1 and T3.2
    ARM_DEMCR    |= ARM_DEMCR_TRCENA;         // enable debug/trace
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // enable cycle counter
}

void loop()
{
    uint64_t cnt = cycles64();
    double   sec = cnt * (double)1.0/F_CPU;
    Serial.printf("CYCCNT:%11" PRIu64" ->%13.9f s\n",cnt, sec);

    delay(1000);
}

Here is the output, which no longer wraps around at 2^32:

CYCCNT:  779717541 ->  1.299529235 s
CYCCNT: 1379721136 ->  2.299535227 s
CYCCNT: 1979725376 ->  3.299542293 s
CYCCNT: 2579729579 ->  4.299549298 s
CYCCNT: 3179733732 ->  5.299556220 s
CYCCNT: 3779737940 ->  6.299563233 s
CYCCNT: 4379742131 ->  7.299570218 s
CYCCNT: 4979746380 ->  8.299577300 s
CYCCNT: 5579750548 ->  9.299584247 s
CYCCNT: 6179754769 -> 10.299591282 s
CYCCNT: 6779758955 -> 11.299598258 s
CYCCNT: 7379763147 -> 12.299605245 s

Please note: This algorithm only works if you call cycles64() at least once per overflow time (see table above). Otherwise a wrap-around can go unnoticed and the upper 32 bits are not incremented. As long as your sketch doesn't block for several seconds, placing a dummy call to cycles64() in loop() should be enough.
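The effect of calling too rarely can be illustrated off-target by feeding the same overflow-detection logic with simulated counter readings. The sketch below is only an illustration under stated assumptions: the struct `Cycles64Extender` and its `update` method are hypothetical names, the reading is passed in as a parameter instead of being taken from ARM_DWT_CYCCNT, and the previous reading starts at 0 instead of at the first counter value.

```cpp
#include <cstdint>

// Hypothetical stand-in for cycles64(): same logic, but the 32-bit reading
// is supplied by the caller so the behavior can be checked without hardware.
struct Cycles64Extender
{
    uint32_t oldCycles = 0;   // previous 32-bit reading
    uint32_t highDWORD = 0;   // number of detected wrap-arounds

    uint64_t update(uint32_t newCycles)
    {
        if (newCycles < oldCycles)   // reading went backwards: counter wrapped
        {
            ++highDWORD;
        }
        oldCycles = newCycles;
        return (static_cast<uint64_t>(highDWORD) << 32) | newCycles;
    }
};
```

If the readings are 4000000000 and then 100, the second call detects the wrap-around and returns 2^32 + 100. If instead the readings are 1000 and then 2000 while the hardware counter has in fact gone through a full 2^32 cycles in between, the new reading is not smaller than the old one, so the wrap-around is missed and the call returns just 2000.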

(See here how to use the 1s RTC interrupt to update the 64bit cycle counter in the background https://github.com/TeensyUser/doc/wiki/implementing-a-high-resolution-teensy-clock#the-periodic-timer-of-the-real-time-clock)
