IO Guide - Quefumas/gensound GitHub Wiki

There are three main I/O operations, which are further described in the sections below:

  • Signal.play() - playback to speakers.
  • Signal.export(filename) - export to file.
  • WAV(filename) - load Signal from audio file (Wave and AIFF always supported; for other formats, see below).

The supported file formats for exporting and loading, as well as the actual device used for playback, depend on which libraries and software are available to the user. To see which options are supported, skip to the end of the page.

Gensound uses lazy evaluation, meaning that Signal objects are not actually computed until required for playback or file export. This means that mixing, concatenating and applying Transforms is very fast, but calling play and export could take time, as they recompute the entire Signal tree. If you wish to use both on the same Signal object, it is advisable to call Signal.realise(sample_rate), and save the returned Audio object, which supports both operations with the same interface.
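The cost model described above can be illustrated with a toy stand-in. The LazyTone class below is hypothetical and is not part of Gensound; it only mimics the idea that building the object is cheap while each realisation computes every sample, so realising once and reusing the result avoids duplicate work.

```python
import math

class LazyTone:
    """Hypothetical stand-in for a lazily evaluated signal (not Gensound's Signal)."""

    realise_calls = 0  # counts how many times samples were actually computed

    def __init__(self, frequency, duration_ms):
        # Construction only stores parameters; no samples are computed here.
        self.frequency = frequency
        self.duration_ms = duration_ms

    def realise(self, sample_rate):
        # The expensive step: compute every sample for the given sample rate.
        LazyTone.realise_calls += 1
        num_samples = int(self.duration_ms * sample_rate / 1000)
        return [math.sin(2 * math.pi * self.frequency * n / sample_rate)
                for n in range(num_samples)]

tone = LazyTone(440, 500)              # cheap: nothing is computed yet
samples = tone.realise(44100)          # computed once
peak = max(abs(s) for s in samples)    # reuse the realised samples...
length = len(samples)                  # ...as often as needed, without recomputing
```

In Gensound itself, the object returned by Signal.realise(sample_rate) plays the role of the samples list here.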

Signal Playback

For playback, use the method Signal.play(). It has the following interface:

  • sample_rate - defaults to 44100. Valid values are 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000, 96000. Note that a Signal object doesn't know/care what sample rate it is supposed to be played in, even when it was directly loaded from a Wave file, so you will have to be explicit about sample rates other than 44.1 kHz.

  • byte_width - defaults to 2 (16-bit). Supported values are 1, 2 and 4 (unsigned 8-bit, signed 16-bit and 32-bit integers, respectively); support for other encodings, such as floating point, is partial.

  • max_amplitude - defaults to None. This argument adjusts the output volume and behaves as follows (updated in version 0.4):

    • If max_amplitude is a positive number, Gensound will shrink/stretch the amplitudes so that they peak (in absolute value) at exactly max_amplitude. Numbers greater than 1 are allowed, but will result in clipping. Note that in this case, any gain or amplitude modification applied to the master Signal will have no effect.

    • If max_amplitude == 0, Gensound will not perform any kind of amplitude adjustment.

    • If max_amplitude is None (the default), Gensound will ensure the amplitudes fit within the range [-1,1], so as to avoid clipping. If there was no clipping in the first place, the audio will remain the same.
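The three max_amplitude modes can be modelled on a plain list of floats. This is a minimal sketch of the behaviour as described above, not Gensound's actual code; the function name adjust_amplitude is made up for illustration, and it assumes max_amplitude is either None, 0 or a positive number.

```python
def adjust_amplitude(samples, max_amplitude=None):
    """Illustrative model of the three max_amplitude modes (not Gensound's code)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if max_amplitude is None:
        # Default: only scale down when the signal would otherwise clip.
        factor = 1.0 / peak if peak > 1.0 else 1.0
    elif max_amplitude == 0:
        # Zero: no adjustment at all.
        factor = 1.0
    else:
        # Positive number: rescale so the peak is exactly max_amplitude.
        factor = max_amplitude / peak if peak > 0 else 1.0
    return [s * factor for s in samples]

loud = [0.5, -2.0, 1.0]    # would clip, since the peak is 2.0
quiet = [0.1, -0.2]        # well within [-1, 1]

fitted = adjust_amplitude(loud)          # scaled down to peak at 1.0
unchanged = adjust_amplitude(quiet)      # left alone, no clipping to avoid
stretched = adjust_amplitude(quiet, 1)   # stretched up to peak at 1.0
```

Note how a positive max_amplitude overrides any gain applied earlier: quiet and a louder copy of quiet would both come out peaking at exactly 1.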

Export to File

Use Signal.export(filename). After the filename argument, the user may also specify the three arguments available for play, as above. As of 0.4, Gensound supports exporting to WAV/AIFF/AIFC files.

Gensound signals are theoretical constructs that have no knowledge of the sample rate and byte width they are intended to have. Depending on whether we specify durations as milliseconds or number of samples, we may get very different results when exporting.
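The arithmetic behind this difference is short. The helper names below are hypothetical and independent of Gensound; they only show how a duration given in milliseconds adapts to the sample rate, while a fixed number of samples does not.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Number of samples needed to fill duration_ms at the given sample rate."""
    return round(duration_ms * sample_rate / 1000)

def samples_to_ms(num_samples, sample_rate):
    """How long num_samples last when played back at the given sample rate."""
    return num_samples * 1000 / sample_rate

# A duration in milliseconds yields a different sample count per sample rate.
one_second_at_44k = ms_to_samples(1000, 44100)
one_second_at_32k = ms_to_samples(1000, 32000)

# A fixed sample count lasts longer when the sample rate is lower.
fixed_count_at_32k = samples_to_ms(44100, 32000)
```

So a Signal whose duration was given as 44100 samples lasts one second when exported at 44.1 kHz, but roughly 1.38 seconds at 32 kHz.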

Signal to samples

Sometimes we wish to evaluate the Signal without passing it to playback or saving as a file. One such case is to pass the data to another module. To obtain the amplitudes directly, use the following code:

someSignal.realise(sample_rate).audio # this is an instance of numpy.ndarray

This retrieves a numpy.ndarray with dtype=numpy.float64, with shape (num_channels, num_samples).

Signal to byte stream

Another use case is when we want to obtain a properly encoded byte stream of the audio. This is similar to the previous case, but with two crucial differences:

  • The samples are no longer 64-bit floats, and are instead converted to a specified encoding. Currently supported are unsigned 8-bit integers (when byte_width is 1), signed 16-bit integers (byte width 2), and 32-bit integers (byte width 4), with partial and upcoming support for other encodings. The default should be good enough for most cases; if it isn't, then you probably already know enough to find a solution.
  • The samples are interleaved in memory. This means that the samples will be arranged chronologically, with simultaneous samples from the various channels grouped together (as opposed to writing all samples from the entire first channel; then from the second, etc.).

Example:

bytestream = someSignal.to_bytes() # available from 0.4

The to_bytes function may also receive the same arguments as for play().
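To make the two differences above concrete (integer encoding and interleaving), here is a rough stdlib-only sketch. It is not Gensound's implementation: the function name encode_interleaved, the little-endian byte order and the exact scaling and rounding are assumptions chosen for illustration, and input samples are assumed to lie within [-1, 1].

```python
import struct

def encode_interleaved(channels, byte_width=2):
    """Interleave channels and pack them as little-endian integers.

    Sketch only: the scaling and rounding here are assumptions, not Gensound's.
    """
    # byte width -> (struct format code, scale, offset)
    formats = {1: ("B", 127, 128), 2: ("h", 32767, 0), 4: ("i", 2147483647, 0)}
    code, scale, offset = formats[byte_width]
    # zip(*channels) groups simultaneous samples from every channel together.
    frames = zip(*channels)
    values = [int(round(s * scale)) + offset for frame in frames for s in frame]
    return struct.pack("<%d%s" % (len(values), code), *values)

left = [0.0, 1.0]
right = [-1.0, 0.0]
stream = encode_interleaved([left, right])   # sample order: L0, R0, L1, R1
```

With two channels of two samples each and a byte width of 2, the resulting stream is 8 bytes long.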

Loading From File

Use the WAV class to load audio from file.

Changing sample rate of Raw audio

For a WAV or Raw Signal, use the resample method, which recomputes the associated audio stream so that it can be played at the desired sample rate without affecting pitch.

from gensound import WAV, test_wav

WAV(test_wav).play(sample_rate=32000) # will sound lower in pitch, since test_wav is in 44.1 kHz

WAV(test_wav).resample(32000).play(sample_rate=32000) # playback will now be at proper pitch

Or in order to save the file using the new sample rate:

WAV(test_wav).resample(32000).export("new_file.wav", sample_rate=32000)
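The size of the pitch drop in the first example follows from the ratio of the two sample rates. The helpers below are hypothetical and do not use Gensound; they only work out the arithmetic.

```python
import math

def pitch_shift_semitones(original_rate, playback_rate):
    """Pitch change, in semitones, when audio recorded at original_rate
    is played back at playback_rate without resampling."""
    return 12 * math.log2(playback_rate / original_rate)

def resampled_length(num_samples, original_rate, new_rate):
    """Approximate sample count after resampling while keeping the duration."""
    return round(num_samples * new_rate / original_rate)

shift = pitch_shift_semitones(44100, 32000)          # negative: sounds lower
new_length = resampled_length(44100, 44100, 32000)   # one second stays one second
```

Playing 44.1 kHz audio at 32 kHz without resampling lowers it by about five and a half semitones, whereas resampling first changes the number of samples so that duration and pitch are preserved.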

Since WAV/Raw data is cached, calling resample once will affect all other objects containing the same WAV file. We may choose to implement an option to keep both the original and the resampled copies, but if that bothers you, you're most likely looking for the Stretch transform instead. Either way, one workaround is to load several unique copies of the original wave file, and another is to use Raw(WAV(filename).mixdown(original_sample_rate)), which effectively makes a unique copy in memory.

resample is a method of Raw, inherited by WAV. Note that it is not a Transform, for good reasons. If you thought about using it to stretch the audio while altering the pitch, you are probably looking for the Stretch transform, which works very much the same.

Hardware support and file formats

Support status

Running IO.status() (with IO available at gensound.io) will print to console the current system's support for playback and file read/write. Supplying the argument True will also print out alternative options which are available for each operation.

For playback, Gensound knows how to interact with the following alternatives:

  • pygame
  • simpleaudio
  • playsound
  • winsound (0.5.1) - always available on Windows systems; relies on exporting to a temporary file, but does not open an external player (as os does).
  • os - this one is always available, and involves exporting to a temporary file and playing with the system's default player.
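IO.status() is the authoritative way to see what Gensound will use. As a rough independent check, the sketch below tests which of the modules listed above can be imported on the current system. The function name is hypothetical, and the order of the result says nothing about the priority Gensound itself applies.

```python
import importlib.util

# Module names taken from the list above; "os" is the always-available fallback.
CANDIDATES = ["pygame", "simpleaudio", "playsound", "winsound"]

def importable_playback_modules():
    """Return the candidate modules that can be imported on this system."""
    found = [name for name in CANDIDATES
             if importlib.util.find_spec(name) is not None]
    return found + ["os"]

available = importable_playback_modules()
```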

Gensound will try to pass any additional keyword arguments from play() to the appropriate function on the hardware end. For instance, when using pygame, it will accept the additional keyword argument loops, as specified in the PyGame documentation. In addition, it will also return a pygame.mixer.Sound object, which can be stopped.

Changing I/O interfaces

When several options are available for some I/O operation (as can be seen with IO.status(True)), it is possible to tell Gensound to use a particular one, rather than letting it make up its own mind. Use IO.set_io(action, interface_name, format=None):

  • action should be one of play, load or export, and indicates which operation should use the specified interface.
  • interface_name is the name of the interface (e.g. pygame, playsound) to be used for this operation, exactly as it appears in the output of IO.status(True).
  • format (optional) is relevant only when action is load or export, and indicates for which file formats this interface should be used. This defaults to *, meaning all formats.

File formats

For both reading and writing, Gensound relies on Python's built-in support for *.wave, *.wav, *.aiff, *.aifc and *.aif files. In addition, Gensound automatically detects whether FFMPEG is installed (make sure it appears in the PATH environment variable), and if so, read/write capabilities are available for pretty much any format supported by FFMPEG.
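If other formats appear to be unavailable, it can help to confirm that FFMPEG is actually reachable through PATH. The check below uses only the standard library; it assumes the executable is named ffmpeg and is not necessarily how Gensound performs its own detection.

```python
import shutil

def ffmpeg_on_path():
    """True if an executable named 'ffmpeg' can be found via PATH."""
    return shutil.which("ffmpeg") is not None

has_ffmpeg = ffmpeg_on_path()
```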