RT60 - QuantAsylum/QA40x GitHub Wiki
The RT60 tool provides a fast, repeatable way to measure room decay using a Farina exponential sweep and Schroeder integration. It is designed for practical room-treatment work (before/after comparisons, placement experiments, and "is this better?" decisions), not for exhaustive, standards-compliant ISO 3382 workflows with many user-adjustable knobs.
The emphasis is on:
- Simple setup and fast iteration
- Calibrated SPL readout (via mic sensitivity and preamp gain)
- Broad, treatment-relevant frequency bands
- Clear diagnostics for noise, distortion, and timing
- Defaults that will cover a range of common room sizes, from a small office to small clubs
The input to the MISC RT60 plug-in is straightforward. You specify the required analyzer output amplitude, the maximum measurement time you require, your measurement mic sensitivity, and any mic pre-amp gain.
The maximum window you need depends on the room size. In the plot below, you can see a range of room sizes (in cubic meters) and the expected reverberation time given the materials in the room. A home office might be 25 to 50 m³, a home theater 50 to 100 m³, and a small club 300 to 1000 m³. For a 50 m³ home theater with hard walls and floors, we might expect the RT60 time to be nearly two seconds. With some light wall treatment (25% coverage and carpet on the floor) that might fall to half a second. Give yourself some margin. That is, if you think your room is at 0.5 seconds, it doesn't hurt to specify 1 or even 1.5 seconds. It will take longer to measure, but once you get a first measurement you can reduce the maximum time as needed.
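If you don't have the plot handy, a rough first guess can be made with Sabine's formula. This is only a sketch: whether the plot was generated this way is an assumption, and the room dimensions and average absorption coefficients below are hypothetical values picked to resemble a 50 m³ room.

```python
def sabine_rt60(volume_m3, surface_m2, mean_absorption):
    """Sabine estimate: RT60 = 0.161 * V / (S * alpha), metric units."""
    return 0.161 * volume_m3 / (surface_m2 * mean_absorption)

# Hypothetical 5 m x 4 m x 2.5 m room (50 m^3)
length, width, height = 5.0, 4.0, 2.5
volume = length * width * height
surface = 2 * (length * width + length * height + width * height)

# Assumed average absorption coefficients, for illustration only
for label, alpha in [("hard surfaces", 0.05), ("light treatment", 0.20)]:
    print(f"{label}: RT60 ~ {sabine_rt60(volume, surface, alpha):.2f} s")
```

Whatever estimate you land on, pad it generously when choosing the maximum measurement time.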
One more note: The bandwidth of the measurement is limited to 80 Hz to 10 kHz. Use a 48k sample rate to keep the amount of data reasonable.
A block diagram of a typical setup appears as follows:
In the diagram above, the QA403 output is connected to an amp + speaker, which is placed inside the room. A measurement microphone is placed near the listening point, >1m away from the speaker. The QA472 provides 10 dB of low-noise gain, and that output is routed into the left channel of the QA403. The right channel QA403 output is routed directly to the right channel input. This direct connection provides a "reference" allowing timing information to be gathered, and also providing an amplitude reference. With the mic sensitivity and pre-amp gain, we can then learn the absolute amplitudes in dBSPL.
The RT60 options shown above require you to specify an output amplitude. You will want to start with a very small number here, perhaps -50 dBV or so, as you set up the system for first-time use. The final value will depend on several factors, including amp gain, speaker sensitivity, mic sensitivity, etc. But it's easy to "sneak up" on it by increasing the value in small steps and using the feedback from the RT60 measurement to guide your way. After familiarizing yourself with the operation and related equipment, it should be very quick to take your equipment to a new room and arrive at very accurate measurements without any calibration.
Sample output from a run is shown below.
From the plot above, we can see a family of curves that show the decay of energy in the room. The stimulus is an exponential sweep, which when deconvolved gives us a room impulse response. The impulse response is then separated into bands in the frequency domain, and the band-limited impulse responses (IR) are again computed. A Schroeder integration is performed, allowing us to see how the energy decays both across the full bandwidth (80 Hz to 10 kHz) and within each band.
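To make the Schroeder step concrete, here is a minimal sketch of backward integration followed by a T30-style line fit. It is not the plug-in's code: the function names are made up for illustration, and the input is a synthetic, noise-free exponential decay rather than a deconvolved room IR.

```python
import math

def schroeder_db(ir):
    """Backward-integrate the squared IR; return the decay curve in dB (0 dB at t=0)."""
    edc = [0.0] * len(ir)
    running = 0.0
    for i in range(len(ir) - 1, -1, -1):
        running += ir[i] * ir[i]
        edc[i] = running
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0 else -math.inf for e in edc]

def t30_rt60(edc_db, fs, start_db=-5.0, stop_db=-35.0):
    """Least-squares line through the -5 to -35 dB span, extrapolated to 60 dB."""
    pts = [(i / fs, d) for i, d in enumerate(edc_db) if stop_db <= d <= start_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    num = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope_db_per_s = num / den
    return -60.0 / slope_db_per_s

fs = 8000          # low rate just to keep the example quick
true_rt60 = 0.4    # synthetic decay: amplitude falls 60 dB in 0.4 s
ir = [10 ** (-3.0 * (i / fs) / true_rt60) for i in range(2 * fs)]
estimate = t30_rt60(schroeder_db(ir), fs)
print(f"Estimated RT60: {estimate:.3f} s")
```

For the per-band curves, the same two functions would be applied to each band-limited IR in turn.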
The plot includes decay curves for:
- Low (80–300 Hz): typically influenced by bass traps, corner trapping, soffits, and large absorbers
- Mid (300–4000 Hz): typically influenced by wall panels, reflection control, and general absorption
- High (4000–8000 Hz): typically influenced by lighter absorption, surface finishes, and flutter control
- Full: overall broadband decay behavior
These bands are intentionally broad and aligned with common treatment decisions:
- If Low is long, add bass trapping
- If Mid is long, add wall panels / reflection control
- If High is long, consider lighter absorption or diffusion and address flutter
Below the graph you'll see a text summary of the data. Let's initially focus on the first four lines:
Full 0.39s (T30) | C50: 9.8 dB (91%)
Low 0.43s | C50: 11.0 dB
Mid 0.41s | C50: 8.7 dB
High 0.34s | C50: 11.6 dB
The first line ("Full") shows that a 60 dB decay estimate (based on the first 30 dB, from -5 dB to -35 dB) took 0.39 seconds. In the graph, we can see the colored traces. And you'll see "fat" traces that help you understand the region where the extrapolation was made. We can also see the decay times for the various bands (low, mid and high). We know that across the full spectrum it took 0.39 seconds, but we can also see that the low band took 0.43 seconds, and the high-band took 0.34 seconds. This is common, as it can be hard to tame the energy at lower bands.
Each of the four bands also has an associated C50 ("clarity"). The C50 number is the ratio of the energy in the first 50 ms of the decay to the energy arriving after 50 ms. This matters because to maximize clarity (that is, your ability to understand spoken words) you want most of the energy to be present in the first 50 ms of the IR. A positive number above 5 dB or so is where you want your C50 to be. For the full spectrum, we can see the figure is 9.8 dB. Next to that is a figure of 91%. This is the D50 number, called "definition". It is derived from the same early/late energy split as C50, but expressed as the percentage of total energy that arrives in the first 50 ms rather than as a ratio in dB. So, 10 dB = 90.9%.
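The relationship between the two numbers is easy to check by hand. The sketch below uses the standard definitions (C50 as early/late energy in dB, D50 as early/total energy); the function names are illustrative, and it assumes the IR passed in starts at the direct arrival.

```python
import math

def c50_db(ir, fs):
    """Early (first 50 ms) to late (after 50 ms) energy ratio, in dB."""
    split = int(round(0.050 * fs))
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10.0 * math.log10(early / late)

def d50_from_c50(c50):
    """Fraction of total energy in the first 50 ms, given C50 in dB."""
    return 1.0 / (1.0 + 10 ** (-c50 / 10.0))

print(f"C50 = 10.0 dB -> D50 = {100 * d50_from_c50(10.0):.1f}%")
print(f"C50 =  9.8 dB -> D50 = {100 * d50_from_c50(9.8):.1f}%")
```

The second line reproduces the 91% shown next to the 9.8 dB figure in the summary.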
Next, we have the following:
TD Peak: 107.3 dB SPL | Silence: 35.8 dB SPL | DR: 72 dB
IR Peak: 19.3 dB | Silence: -55.4 dB | DR: 75 dB
The first line is a measurement on the time-domain (TD) capture. Just before the chirp starts, there's a 50 ms period of silence in which the room noise floor is captured. This is converted to dBSPL via the mic cal factor and pre-amp gain, and displayed. Additionally, the RMS of the sweep is measured in a sliding 50 ms window, and the largest value gives the peak SPL. The difference between the peak RMS and the RMS during silence is an estimate of the best-case dynamic range. Here, it's 72 dB.
Next, we make similar measurements on the impulse response. That is, we measure the RMS at the very end of the IR tail (usually around the last 200 ms). That is then compared to the peak RMS of a 50 ms sliding window taken over the IR, and those two are expressed as another dynamic range measurement.
The purpose of both of these DR measurements is to ensure you have the dynamic range needed to make a useful RT60 measurement. And we can see from the summary that we do.
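As a rough illustration of the time-domain numbers, the sketch below converts an RMS voltage at the analyzer input into dB SPL and finds the peak of a sliding RMS window. The function names are hypothetical, and expressing mic sensitivity in mV/Pa is an assumption; the plug-in's internal units and exact windowing may differ.

```python
import math

def dbspl_from_volts(v_rms, mic_mv_per_pa, preamp_gain_db):
    """Undo the pre-amp gain, convert volts to pascals, reference to 20 uPa."""
    v_at_mic = v_rms / (10 ** (preamp_gain_db / 20.0))
    pressure_pa = v_at_mic / (mic_mv_per_pa / 1000.0)
    return 20.0 * math.log10(pressure_pa / 20e-6)

def sliding_rms_peak(samples, window):
    """Largest RMS value found by a window sliding one sample at a time."""
    acc = sum(x * x for x in samples[:window])
    best = acc
    for i in range(window, len(samples)):
        acc += samples[i] ** 2 - samples[i - window] ** 2
        best = max(best, acc)
    return math.sqrt(best / window)

# Hypothetical values: 10 mV/Pa mic, 20 dB of pre-amp gain
peak_spl = dbspl_from_volts(0.5, 10.0, 20.0)
noise_spl = dbspl_from_volts(0.0002, 10.0, 20.0)
print(f"Peak {peak_spl:.1f} dB SPL, silence {noise_spl:.1f} dB SPL, "
      f"DR {peak_spl - noise_spl:.0f} dB")
```

At 48 kHz, a 50 ms window corresponds to 2400 samples.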
Next, we have the following:
Path Delay: 7.82 ms (375.3 samp) GCCpk: 53.291
This line tells us the path delay. Recall the right channel input is connected directly to the right channel output, which gives a reference. The delay of 7.82 ms reflects the processing in the powered speaker, along with the distance between the speaker and microphone. The GCCpk represents the correlation peak from the GCC-PHAT algorithm. Usually, this should come in above 30 to give good confidence in the estimate.
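As a sanity check on the path delay, it can help to convert it into samples and into an upper bound on the acoustic distance. This is just arithmetic under an assumed speed of sound of 343 m/s; because the figure also includes any latency in the speaker, the true speaker-to-mic distance will be shorter than the bound.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C

def delay_to_samples(delay_ms, fs):
    """Path delay expressed in samples at sample rate fs."""
    return delay_ms * 1e-3 * fs

def delay_to_max_distance_m(delay_ms):
    """Distance sound travels in delay_ms; an upper bound on speaker-to-mic spacing."""
    return delay_ms * 1e-3 * SPEED_OF_SOUND_M_S

print(f"{delay_to_samples(7.82, 48000):.1f} samples")
print(f"at most {delay_to_max_distance_m(7.82):.2f} m")
```

If the bound comes out smaller than the distance you measured with a tape, something in the reference path is likely miswired.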
Finally, we have the line:
Max noise energy (0s to IR Direct IR): 2L: -52 dB 3L: -36 dB 4L: -56 dB 5L: -52 dB
This gives you a concise summary of the energy of each of the Farina harmonics. A useful byproduct of Farina deconvolution is a series of additional IRs that occur before the primary IR. These are offset in time, and represent the energy contained in the harmonic regions. With a perfect amp, speaker and microphone, these would be vanishingly small since there was no harmonic distortion present. But in reality, distortion is present. And the amount of distortion will be representative of how hard your system is working.
In the summary above, we can see the harmonic energy associated with the 3rd harmonic is about 36 dB below that of the primary impulse response. This figure isn't THD. Instead, it's a figure of merit that may or may not be helpful in your analysis.
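For an exponential sweep, how far each harmonic IR lands ahead of the main IR follows from the sweep length and frequency range: the order-N harmonic leads by T * ln(N) / ln(f2 / f1). The sketch below just evaluates that expression. The sweep duration and the 80 Hz to 10 kHz limits are placeholders, since the plug-in's actual sweep limits aren't stated here.

```python
import math

def harmonic_ir_lead_s(sweep_s, f_start_hz, f_stop_hz, order):
    """Seconds by which the order-N harmonic IR precedes the linear IR."""
    return sweep_s * math.log(order) / math.log(f_stop_hz / f_start_hz)

sweep_s = 1.0  # hypothetical sweep duration
for order in range(2, 6):
    lead = harmonic_ir_lead_s(sweep_s, 80.0, 10_000.0, order)
    print(f"{order}H IR sits {lead * 1000:.0f} ms ahead of the main IR")
```

Because the spacing grows only with the logarithm of the order, higher harmonics crowd closer together the further you move away from the main IR.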
The measurement captures a single buffer of length N samples at sample rate Fs. Total capture time is:
- Total capture duration (seconds) = N / Fs
A portion of that buffer is used for the exponential sweep, controlled by SweepPct:
- Sweep duration (seconds) = SweepPct * (N / Fs)
The remainder of the buffer is available for:
- Pre-silence (before the sweep starts)
- A small margin
- Post-sweep decay capture (the room decay window)
In other words, SweepPct trades off:
- Longer sweep (more excitation energy, often better SNR at low frequencies)
- Versus more time left for decay capture (longer measurable RT window)
A common choice like SweepPct = 30% provides a reasonably long sweep while leaving most of the buffer for room decay.
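The buffer arithmetic above can be sketched in a few lines. The function name is made up for illustration, and the pre-silence and margin are lumped into the remainder rather than modeled separately.

```python
def buffer_timing(n_samples, fs, sweep_pct):
    """Return (total, sweep, remainder) durations in seconds."""
    total_s = n_samples / fs
    sweep_s = sweep_pct * total_s
    return total_s, sweep_s, total_s - sweep_s

# 256K-sample buffer at 48 kHz with SweepPct = 30%
total_s, sweep_s, rest_s = buffer_timing(262144, 48000, 0.30)
print(f"total {total_s:.2f} s, sweep {sweep_s:.2f} s, "
      f"left for silence/margin/decay {rest_s:.2f} s")
```

The remainder needs to comfortably exceed the maximum reverberation time you asked for.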
Below, you can see some debug data for a plot with a 1 second RT (Reverberation Time). Note the pink lines marking the region from 40 ms to 90 ms. This is the region used for the TD silence calc. This will be your room noise floor.
Next is some debug data of the IR obtained by deconvolving the captured sweep. Here we can see the blue window function. Pink lines to the left of the IR indicate the Farina harmonic IR locations. The pink line at 50 ms denotes the crossover for the C50 calculations, and you can see a final pink line at t = 1 sec, which is the user-specified max reverberation time.
Connect the setup as shown above. Use your specified mic sensitivity and a cal file (loaded as User Weighting) if appropriate. For a first run, pick a 2 second maximum reverberation time and a -50 dBV output. If using a mic-pre, set it to 0 dB. Set the full scale input on the QA403 to 0 dBV and use a 256K FFT size. Position your mic at your designated listening point. The speaker should be pointing at the mic, with the mic generally set to be 90 degree incidence (for most measurement mics). The aim here is to ensure the mic is seeing the direct field from the speaker.
When the run completes, there's a single data point you are looking for, and that is the measured peak dB SPL from the time domain data. In the case below, we can see that is 67.4 dB SPL. We can also see the dynamic range on that same line is very limited at 22 dB.
For the next run, we want to get to around 100 dBSPL, which suggests we want to add about 30 dB to our previous run. So, we'll use -20 dBV for the next run:
We increased the output by 30 dB, and the measured dB SPL rose from 67.4 to 97.1 dB:
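The level step used above is simple dB arithmetic, sketched here with a hypothetical helper. It assumes the amp and speaker are still linear, so a 1 dB change in output gives a 1 dB change in SPL.

```python
def next_output_dbv(current_dbv, measured_spl, target_spl):
    """Output level needed to move the measured peak SPL to the target."""
    return current_dbv + (target_spl - measured_spl)

suggested = next_output_dbv(-50.0, 67.4, 100.0)
print(f"Suggested output: {suggested:.1f} dBV")
```

The exact answer is a little above the -20 dBV used here; rounding the step down to 30 dB keeps the first loud run on the conservative side.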
Now let's take a look at the main display of the QA403 at the left channel:
We can see the peaks here are around -80 dBV, suggesting we have a lot of room before we hit our 0 dBV full scale input. Let's increase the mic gain on the QA472 to +20 dB and run again after adjusting the mic pre-amp gain in the options dialog:
We now have a plot showing reasonable dynamic range, and it's pretty clear the room we're dealing with has an RT60 under 500 ms.
Given the RT60 of around 400 ms, we can change the max RT to 0.5 sec to improve the speed of the test. With this reduced time, you can change your FFT size to 64k. At a 48k sample rate, this means the entire buffer will take 65536/48000 ≈ 1.37 seconds to play out. The resulting plot is shown below, with the key numbers largely unchanged.
Next, we move the mic about 1m closer to the speaker and run again, noting the results are largely unchanged (the path delay has changed, though).
Finally, let's add another 10 dB to the output. This should push our peak dBSPL to around 105 dB.
This time, we saw an error (below) and the QA403 input relays clicked, indicating an overload.
Now, we change the mic gain from 20 down to 10 dB and run again:
And the resulting plot is as follows:
What is of note above is that the 2H harmonic energy has risen about 5 dB. Let's bump the pre-amp gain down to 0 dB, add another 5 dB to the output and run again:
And the result is below. Now we're at 114 dB SPL peak, and we're seeing 74 dB of dynamic range. The 2H is still elevated, but the fact that it hasn't risen further means the powered speaker is probably still happy.