Spatialization - davidpanderson/Numula GitHub Wiki

Numula's spatialize module has basic functions for manipulating audio signals. The goal is to let you "spatialize" music; that is, to

  • divide a piece into separate voices;
  • have each voice seem to emanate from a different point in space;
  • move these points over time, algorithmically.

Currently only stereo position is supported: voices can be located along a line between the left and right speakers. In the future we hope to support 3D binaural sound.

A complete example is given below. Listen with headphones to hear the effect.

Accessing WAV files

Software synthesizers (such as Pianoteq) let you render MIDI files to .wav files. This module lets you read .wav files into memory, combine and modify them, write the results back to .wav files, and play them.

from numula.spatialize import *
show_info(fname)

Print the parameters (frame rate, sample size, etc.) of the given .wav file.

n = nframes(fname)

Return the number of frames in the given .wav file.

graph_wav(fname, nframes)

Draw a graph of the first nframes frames of the given .wav file.

samples = read_wav(fname)

Read the given .wav file into memory. Audio signals are stored in memory as lists of floating-point sample pairs. Each sample pair is called a 'frame'.
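For illustration, here is how a one-second 440 Hz tone could be built in this representation, in pure Python (no Numula calls; the container type used for each pair is an assumption):

import math

framerate = 44100
tone = []
for i in range(framerate):   # one second of audio
    s = math.sin(2 * math.pi * 440 * i / framerate)
    tone.append([s, s])      # one frame: [left, right]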

write_wav(fname, samples)

Write the samples to the given file.

zero_signal_ns(nsamples)

Return a zero signal with the given number of samples.

zero_signal_t(nsecs, framerate)

Return a zero signal with the given duration.

scale(samples, gain)

Multiply each sample by the given gain.
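Putting these together, here is a minimal round trip that halves a file's volume. 'in.wav' and 'out.wav' are placeholder names, and whether scale() modifies its argument in place or returns a new list is an assumption here:

from numula.spatialize import *

show_info('in.wav')            # print frame rate, sample size, etc.
samples = read_wav('in.wav')   # load the file into memory
scale(samples, 0.5)            # halve the volume (assumed in-place)
write_wav('out.wav', samples)  # write the result
play_wav('out.wav')            # listen to it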

pan_signal(isig, framerate, ang, pos_array, osig)

Add the input signal isig to the output signal osig, spatializing it according to pos_array, a per-frame list of stereo positions. Each value ranges from -1 (left) to +1 (right). ang is the separation (0..1) between the left and right input channels; if 0, the channels are summed (i.e. treated as mono). Panning is done using the constant-power formula.

If pos_array is shorter than isig, its last value is used for remaining frames. osig must be at least as long as isig.
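For reference, here is a sketch of the standard constant-power pan law applied to one mono sample; Numula's exact formula, and how it uses ang, may differ:

import math

def pan_frame(sample, pos):
    # pos is -1 (hard left) .. +1 (hard right); map it to 0..pi/2
    theta = (pos + 1) * math.pi / 4
    # constant power: left**2 + right**2 == sample**2 for any pos
    return [sample * math.cos(theta), sample * math.sin(theta)]

Constant-power panning keeps perceived loudness roughly even as a voice moves across the stereo field, avoiding the "hole in the middle" that simple linear crossfading produces. The rule for a short pos_array amounts to indexing it with min(i, len(pos_array)-1).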

Specifying stereo position in score time

The pos_array argument to pan_signal() is indexed by frame, i.e. in terms of performance time. But to be musically useful we need two things:

  • To specify position as a function of score time, e.g. a voice should pan from left to right over 4 measures.
  • To specify changing position as a piecewise function of time, in the same way as volume and tempo changes.

These capabilities are provided by the function

Score.get_pos_array(pos_pft: PFT, framerate: float) -> list[float]

This takes a PFT describing a stereo position trajectory (over score time). It returns a per-frame list of stereo positions as required by pan_signal(). Call this after all timing adjustments have been done.

get_pos_array() works by constructing a monotonic linear map from performance time to score time, then looping over frames and evaluating the PFT at the corresponding score time.
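In outline, it does something like the following (a sketch with hypothetical names: perf_duration(), perf_to_score(), and pft_value() stand in for whatever the implementation actually uses):

def get_pos_array(self, pos_pft, framerate):
    pos = []
    nframes = int(self.perf_duration() * framerate)  # hypothetical duration accessor
    for i in range(nframes):
        t = i / framerate                    # performance time of this frame
        st = self.perf_to_score(t)           # map performance time to score time
        pos.append(pft_value(pos_pft, st))   # evaluate the position PFT there
    return pos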

Example

To spatialize a piece with Numula:

  • Create a Score, tagging each note with its voice name
  • Write each voice to a MIDI file
  • Render each of these MIDI files to a WAV file
  • Mix these WAV files, with the desired panning, into an output file

An example is in poly_pan.py. Some excerpts:

ns = Score()
ns.insert_ns(v0, tag='v0')
ns.insert_ns(v1, tag='v1')
ns.insert_ns(v2, tag='v2')
ns.done()

Create the Score, tagging the three voices.

max_nframes = 1
for i in range(3):
    ns.write_midi('data/v%d.mid'%i, lambda n: 'v%d'%i in n.tags)
    pianoteq.midi_to_wav('data/v%d.mid'%i, 'data/v%d.wav'%i)
    nf = spatialize.nframes('data/v%d.wav'%i)
    if nf > max_nframes:
        max_nframes = nf

Render the voices to WAV files. Keep track of the maximum length.

pos_pft = [
    [
        Linear(.5, 1, 4*6/8),
        Linear(1, 1, 28*6/8)
    ],
    ...

Define (as PFTs) the position trajectory of each voice. In this case, the first voice starts in the middle, moves to the right over 4 measures, and remains there.
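As a variation (not from poly_pan.py), a voice could sweep from hard left to hard right over all 32 measures, using the -1..+1 range that pan_signal() expects:

sweep_pft = [
    Linear(-1, 1, 32*6/8)
]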

signal = spatialize.zero_signal_ns(max_nframes*2)

Make the output buffer, sized to hold the longest voice (two samples per frame).

for i in range(3):
    pos_array = ns.get_pos_array(pos_pft[i], 44100)
    spatialize.pan_signal(
        spatialize.read_wav('data/v%d.wav'%i),
        44100, .1,
        pos_array,
        signal
    )

For each voice, convert its position trajectory into a per-frame position list. Then use pan_signal() to apply this to the voice's signal, adding the result to the output buffer.

spatialize.write_wav('data/pan_test.wav', signal)

Write the output buffer to a WAV file.

Playing .wav files

spatialize.play_wav('data/pan_test.wav')

Play the given .wav file. On Windows, this uses Windows Media Player.