Fundamentals of Music and Audio Processing: Fourier Transform

Author: Ian Wells


Introduction

Audio processing is an essential element of music and sound production, enabling engineers to shape recordings through post-production techniques. In music production, audio processing is instrumental in manipulating sounds, adjusting equalization (EQ), and creating enjoyable songs or scores. Sound engineers accomplish this by utilizing a combination of filters and transforms to balance frequencies and reduce noise in signals. To introduce the fundamentals of audio processing, this article will explore the technicalities of sound, outline the core principles of the Fourier Transform, and provide example Python code implementing common processing techniques such as frequency-domain analysis and low-pass filtering.


Understanding Sound

By definition, sound is a longitudinal wave that travels through a medium (gas, liquid, or solid). In most practical cases, the noises humans hear propagate through air. Created by vibrating objects, these waves can be thought of as compressions and rarefactions within a medium. When a person hears sound, longitudinal waves vibrate the eardrum to create auditory sensations in the brain. The waves can then be represented in the time domain by plotting amplitude over time, where amplitude can be defined as the amount of work required to generate the energy that sets a medium’s particles in motion. [2]

Figure 1: Amplitude vs Time Plot for a Sound Wave [2]

The frequency of a wave is the number of vibrations per second, which determines the pitch of a particular sound wave. Figure 2 below is a table listing the 12 Western musical notes alongside their corresponding frequencies; the note “A”, for example, has a characteristic sound wave with a frequency of 440 Hz. Inversely, the period of a sinusoid is the time in seconds it takes the sinusoid to complete one cycle. Mathematically, the relation between period and frequency is f = 1/T: frequency is the inverse of the period.

In a wave, wavelength denotes the distance from trough to trough or from crest to crest within a given cycle; it is related to frequency through the wave’s propagation speed v by λ = v/f. A closely related quantity is the angular frequency ω, which relates to frequency and period by ω = 2πf = 2π/T.

Figure 2: List of the 12 Western musical notes and their corresponding frequencies (Note: A at 880 Hz has the same note name as A at 440 Hz but is an “octave up”) [1]
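The frequencies in Figure 2 follow the equal-temperament tuning used in most Western music: each of the 12 notes is a factor of 2^(1/12) above the previous one, so 12 steps double the frequency (one octave). A minimal sketch of this relation, assuming the standard A = 440 Hz reference (the note ordering here is our own illustration):

notes = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

for n, name in enumerate(notes):
    # Each semitone multiplies the frequency by 2^(1/12)
    frequency = 440.0 * 2 ** (n / 12)
    print(f"{name}: {frequency:.2f} Hz")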


Audio Storage

In the real world, sound is a continuous analog signal. Computers, however, operate on binary data and can only record signals in the discrete realm. To record sound, an Analog-to-Digital Converter (ADC) transforms real-world audio into a discrete signal by sampling the amplitude of the waveform at a fixed rate. For audio, the standard commercial sampling rate is 44.1 kHz, meaning that 44,100 samples of the signal’s amplitude are recorded every second. Increasing the sampling rate allows a computer to reconstruct an audio signal more accurately at the cost of increased storage. In most common audio formats, sound is stored discretely and in the time domain.

Two of the most common storage formats for audio are WAV (Waveform Audio File Format) and MP3 (MPEG Audio Layer III). In these digital formats, sound is represented as a sequence of numerical values corresponding to the amplitude of the sound wave at different points in time. The coding segment of this article focuses on WAV files. [3]
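To make these storage parameters concrete, the short sketch below opens a WAV file and reports its sample rate, length, and duration; 'example.wav' is a hypothetical local file, and soundfile is the same library used later in this article:

import soundfile as sf

data, fs = sf.read('example.wav')   # amplitude samples and sample rate
duration = len(data) / fs           # seconds of audio represented

print(f"Sample rate: {fs} Hz")      # e.g. 44100 for commercial audio
print(f"Samples: {len(data)}")
print(f"Duration: {duration:.2f} s")
print(f"Channels: {1 if data.ndim == 1 else data.shape[1]}")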


The Fourier Transform and Fast Fourier Transform

The Fourier Transform is a mathematical operation that decomposes a signal into its constituent frequencies along with the magnitude of each frequency. In audio processing, the Fourier Transform is a critical tool that serves as the basis for many of the technologies responsible for recording, modifying, and outputting sound, enabling analysis and modulation of an audio signal’s frequency spectrum. Formally, the Fourier Transform is defined as:

X(f) = ∫ x(t)·e^(−j2πft) dt, with the integral taken over all time (−∞ to ∞)

Figure 3: Mathematical Representation of the Fourier Transform [4]

If an audio signal x(t) is modeled as amplitude over time, X(f) denotes the frequency representation of the same signal. As seen in the formula, the transform integrates the signal x(t) over all time, modulated by a complex exponential (e^(−j2πft)). Using Euler’s formula, this exponential can be broken down into a combination of sine and cosine waves:

e^(−j2πft) = cos(2πft) − j·sin(2πft)

Figure 4: Euler's Formula

X(f) is a complex value describing how strongly the frequency “f” is present within the signal and with what phase. Each frequency component can itself be broken down into sine and cosine waves, with sine representing the imaginary component and cosine representing the real component.
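To see the definition in action, the sketch below (our own illustration, not from this article’s appendix) numerically approximates the Fourier integral for a pure 440 Hz sine at a few test frequencies; the magnitude is nonzero only at the frequency actually present in the signal:

import numpy as np

fs = 44100                              # samples per second
t = np.arange(fs) / fs                  # one second of time stamps
x = np.sin(2 * np.pi * 440 * t)         # a pure 440 Hz tone ("A")

for f in [220, 440, 880]:
    # X(f) is approximated by summing x(t)·e^(−j2πft)·Δt, with Δt = 1/fs
    X_f = np.sum(x * np.exp(-2j * np.pi * f * t)) / fs
    print(f"|X({f} Hz)| = {abs(X_f):.4f}")  # ≈ 0.5 at 440 Hz, ≈ 0 elsewhere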

The Fourier Transform as written above applies only to continuous signals; for discrete signals, the integral cannot be taken. Instead, an alternative transform, known as the Discrete Fourier Transform (DFT), is used. The DFT operates on samples of a signal taken over a finite period of time rather than all time, replacing the integral with a summation.

Since all music is constructed from different notes, one can apply an FFT (Fast Fourier Transform) to a discrete time-domain signal to obtain a frequency representation of an entire song. The FFT is a fast algorithm for computing the traditionally computation-heavy DFT: it produces exactly the same coefficients, but exploits symmetries in the complex exponentials to reduce the computational cost from O(N²) to O(N log N). The transform it computes is defined mathematically as:

X[k] = Σ x[n]·e^(−j2πkn/N), summed over n = 0 … N−1, for each bin k = 0 … N−1

Figure 5: Mathematical Representation of the Fast Fourier Transform [4]
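To make the relationship concrete, this sketch (our own, with illustrative names) implements the DFT summation directly and checks that NumPy's FFT returns the same coefficients; the loop costs O(N²) operations, while np.fft.fft computes the identical result in O(N log N):

import numpy as np

def naive_dft(x):
    N = len(x)
    n = np.arange(N)
    # X[k] = sum over n of x[n]·e^(−j2πkn/N), computed bin by bin
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

x = np.random.randn(512)
print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True: same coefficients, FFT is just faster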

Generally, the majority of audio signals are recorded in the time domain: when sound is captured digitally, amplitudes and time stamps are stored discretely by sampling the amplitude at specific points in time. For analysis, however, plotting an audio signal in the time domain reveals very little about the sound’s frequency composition.

Figure 6: An amplitude/time-domain representation of a recorded song. Frequency information is limited, and the prominent features are areas where the signal strength is largest.

To begin plotting a signal, we first need to install the dependent modules required to run our code: soundfile, sounddevice, numpy, and matplotlib. To do so we can run:

pip install soundfile sounddevice numpy matplotlib

We can then import these modules in our code by writing:

import soundfile as sf
import sounddevice as sd
import numpy as np
import matplotlib.pyplot as plt

Next, to plot a graph of a signal, we begin by writing code to read an input audio file:

data, fs = sf.read('IPAGIRL.wav') # Load the wave file and its sample rate
time = [n / fs for n in range(len(data))] # Time stamp (in seconds) of each sample
plt.plot(time, data)

The above lines provide the primary functionality of reading an audio file and producing the plot seen in Figure 6. data and fs are two variables storing the audio samples and the sample rate returned by sf.read; plt.plot then takes time (a list of the time values at which the signal was sampled) and data as the x and y values, respectively. The full implementation of the code is located in the appendix, marked by the comment "Code to visualize sound on a plot." [6] By applying the FFT to a song, we can identify its most common frequencies (usually corresponding to a particular musical key) from the largest bins in a frequency plot. The following code is a basic example of an FFT used to display the frequencies within a recorded song.

Figure 7: The above figure is the FFT of the same song, illustrating that lower frequencies are more common. These low frequencies are typical in Western music and match the sub-500 Hz frequency range of the electric guitar and human voice used in the song.

fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(len(fft_result), 1/fs)

The above lines of code are the most critical in plotting the frequency spectrum of an audio signal. np.fft.fft(data) takes the FFT of the data set and returns an array of complex Fourier coefficients representing amplitude and phase. np.fft.fftfreq(len(fft_result), 1/fs) then derives an array of the corresponding frequency bins: the argument len(fft_result) is the length of the FFT output, and the argument 1/fs is the time interval between samples. The full implementation of this code is marked "Code to Visualize Frequency" in the appendix section. [7]
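Continuing from the fft_result and frequencies computed above, one simple way (our own sketch, not from the appendix) to read off the most prominent frequency is to find the largest-magnitude bin in the positive half of the spectrum:

# Only the first half of the FFT output corresponds to positive frequencies
half = len(fft_result) // 2
peak_bin = np.argmax(np.abs(fft_result[:half]))
print(f"Dominant frequency: {frequencies[peak_bin]:.1f} Hz")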


Applications of FFT

Equalization, also known as EQ, is a fundamental audio processing tool for adjusting the frequency balance of audio signals. Low frequencies correspond to bass sounds, while higher frequencies sound sharper (like the higher strings of a guitar or a singer hitting a ‘high note’). Adjusting an audio signal’s EQ is equivalent to applying a low-, high-, or band-pass filter to the signal’s frequency data. The following Python code is an example of using an FFT to create a low-pass filter, which keeps only the low frequencies present in a song. [5]

Figure 8: The above figure illustrates the FFT of the sample song, with the second plot illustrating the same song after low-pass filtering.

# Define a cutoff frequency for the low-pass filter
cutoff_frequency = 200  # Adjust this value based on your desired cutoff frequency
# Create a frequency mask for the low-pass filter
low_pass_mask = (np.abs(frequencies) <= cutoff_frequency)
# Apply the low-pass filter to the FFT result
fft_result_low_pass = fft_result * low_pass_mask
# Take the inverse Fourier transform and extract the real part
low_pass_result = np.fft.ifft(fft_result_low_pass).real

The above coding segment implements the low-pass filtering shown in Figure 8. A low-pass filter mask (a boolean array marking the range of desired frequencies) is multiplied element-wise with the FFT result. In this case, we set the cutoff frequency at 200 Hz; the mask is created by taking the absolute value of each frequency bin and zeroing out any component above the cutoff. Finally, to convert the filtered spectrum back into a time-domain signal, np.fft.ifft(fft_result_low_pass).real applies an inverse FFT and stores only the real components of the result in low_pass_result.
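The same masking idea extends directly to the other filter types mentioned above. A sketch, reusing frequencies and fft_result from the code above (the cutoff values here are illustrative choices):

# High-pass: keep only components above 200 Hz
high_pass_mask = (np.abs(frequencies) >= 200)
high_pass_result = np.fft.ifft(fft_result * high_pass_mask).real

# Band-pass: keep only components between 200 Hz and 2 kHz
band_pass_mask = (np.abs(frequencies) >= 200) & (np.abs(frequencies) <= 2000)
band_pass_result = np.fft.ifft(fft_result * band_pass_mask).real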

To listen to the newly filtered version of the audio, we can write:

# Play the low-pass filtered audio
sd.play(low_pass_result, fs)
sd.wait()

This code uses the imported sounddevice module to play low_pass_result through the computer’s default speakers at the original sample rate. The full implementation of this code can be found in the appendix section marked "Low Pass Filter." [8]

Another critical sound processing tool is compression. Compression reduces the dynamic range of an audio signal, the difference between its loudest and softest parts, making it easier to mix tracks together without any one element overpowering the others. This is achieved by automatically lowering the volume of the loudest parts of a signal, ensuring a more consistent overall level. Compressors are fundamental in both live sound and studio recordings, allowing engineers to manage signals with wide dynamic ranges and make them sound more polished and professional. A Fourier Transform makes visible the frequency ranges that carry the important content of a signal; in regions where little information is present, compression can be applied to tame overly dynamic levels and improve the consistency of a recording without losing sound quality.
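The appendix does not include compressor code, but a minimal time-domain sketch of the idea might look like the following; the threshold and ratio values are illustrative, and a real compressor would also smooth its gain changes with attack and release times:

import numpy as np

def compress(signal, threshold=0.5, ratio=4.0):
    # Samples whose magnitude exceeds the threshold keep only 1/ratio of the excess
    out = signal.copy()
    over = np.abs(signal) > threshold
    out[over] = np.sign(signal[over]) * (threshold + (np.abs(signal[over]) - threshold) / ratio)
    return out

compressed = compress(low_pass_result)        # reusing the filtered signal from above
# sf.write('compressed.wav', compressed, fs)  # optionally save the result to disk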

Having the correct mix of frequencies on each instrument in a song is crucial for achieving a balanced and enjoyable sound. This article aims to explain the basics of sound processing and how Fourier Transforms play a pivotal role in shaping the overall sonic landscape of musical composition.


Conclusion

In the realm of sound engineering and music production, Fourier analysis serves as a crucial tool for linking the complexity of sound waves in time to their frequency components. Using the Fast Fourier Transform (FFT), a practical algorithm for computing the Discrete Fourier Transform (DFT), audio signals are broken down into their frequency elements. This process facilitates spectral analysis, filtering, equalization, and compression. By mastering the Fourier Transform within the context of audio production, one gains a deeper understanding of sound behavior and its digital interactions. Fourier analysis demystifies the intricate nature of sound waves, empowering audio engineers, producers, and musicians to create soundscapes with greater clarity, balance, and impact. Through this article, we have provided a foundational explanation of audio processing and a brief walkthrough of implementation details for common signal processing techniques. Listed below are more detailed code examples alongside a list of references for further exploration of the subject.


Links and References

  1. https://ptolemy.berkeley.edu/eecs20/week8/scale.html

  2. https://www.cs.toronto.edu/~gpenn/csc401/soundASR.pdf

  3. https://www.britannica.com/technology/sound-recording

  4. https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520

  5. https://www.izotope.com/en/learn/principles-of-equalization.html

#Code to visualize sound on a plot

import matplotlib.pyplot as plt
#Code for reading in our sound file. I am using a local file on my desktop for this.
import soundfile as sf # Import our soundfile dependency
data, fs = sf.read('IPAGIRL.wav') # Load the wave file and its sample rate

#Here we are rescaling our time axis to display in seconds
# Each sample index divided by the sample rate gives its time stamp in seconds
time = [value / fs for value in range(len(data))]
plt.plot(time,data)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Girl from Ipanema Cover Audio File Plot")
plt.show()

The following code is a basic example of an FFT used to display the frequencies of this same recorded song:

#Code to visualize frequency on a plot

import matplotlib.pyplot as plt
import numpy as np
#Code for reading in our sound file. I am using a local file on my desktop for this.
import soundfile as sf # Import our soundfile dependency
data, fs = sf.read('IPAGIRL.wav') # Load the wave file and its sample rate

# If the file is stereo, mix it down to mono so the FFT runs over time, not channels
if data.ndim > 1:
    data = data.mean(axis=1)

fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(len(fft_result), 1/fs)

#Creating a freq_range_mask to take a look at only the frequencies we would like to examine
freq_range_mask = (frequencies >= 0) & (frequencies <= 5000)
# Plot the magnitude spectrum (only positive frequencies up to 5 kHz)
plt.plot(frequencies[freq_range_mask], np.abs(fft_result[freq_range_mask]))

plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.title("FFT of Audio Signal")

plt.tight_layout()
plt.show()

# Our findings show that the most common frequencies in the recording correspond to G2, A2, F2, and A3.
# We can further explore this calibration by recording single notes to reduce noise.

#Low Pass Filter Full Code
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf
import sounddevice as sd

# Load a wave file
data, fs = sf.read('IPAGIRL.wav')

# If the file is stereo, mix it down to mono so the FFT runs over time, not channels
if data.ndim > 1:
    data = data.mean(axis=1)

# Perform Fourier transform
fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(len(fft_result), 1/fs)

# Define a cutoff frequency for the low-pass filter
cutoff_frequency = 200  # Adjust this value based on your desired cutoff frequency

# Create a frequency mask for the low-pass filter
low_pass_mask = (np.abs(frequencies) <= cutoff_frequency)

# Apply the low-pass filter to the FFT result
fft_result_low_pass = fft_result * low_pass_mask

# Take the inverse Fourier transform and extract the real part
low_pass_result = np.fft.ifft(fft_result_low_pass).real

# Normalize the audio data to ensure it is within the valid range
low_pass_result /= np.max(np.abs(low_pass_result))

# Plot the magnitude spectrum (only positive frequencies) for the original signal

plt.subplot(2, 1, 1)
plt.plot(frequencies, np.abs(fft_result))
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.title("FFT of Original Audio Signal")

# Plot the magnitude spectrum for the low-pass filtered signal
plt.subplot(2, 1, 2)
plt.plot(frequencies, np.abs(fft_result_low_pass))
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.title("FFT of Low-Pass Filtered Audio Signal")

plt.tight_layout()
plt.show()

# Play the low-pass filtered audio
sd.play(low_pass_result, fs)
sd.wait()