Sound ~ Research Implementation and User Stories - uchicago-cs/chiventure GitHub Wiki

Welcome to the Sound Wiki

The following wiki will document the progress of implementing sound.

User Stories

As a player, I want audio that will enhance the story-telling experience by having appropriate soundtracks and sound effects for various elements of the game. (JIG)
As a player, I’d like to listen to music while on the chiventure start screen, before I build a game, so that sound marks the beginning of chiventure.
As a player, I want background music to play while in different rooms of a game in order to create ambience that supports the narrative experience.
As a player, I want a sound effect upon room changes in order to receive auditory feedback of movement throughout chiventure.
As a player, I want a sound effect upon action or object selection so that I may receive feedback when I successfully perform an action or use an object.
As a player, I want an option to locate and listen to all music, with titles and composers, found in chiventure.
As a player, I want to change the volume of sound effects, background music, and ambient sounds independently, so that I have customizable sound settings.
As a player, I want to have control of a master volume setting that can change the volume of sound effects, background music, and ambient sounds all at once.
As a player, I want to toggle on or off sound effects, background music, ambient sounds, and the master volume, so that I can conveniently, and quickly, customize my sound settings.
As a player, I want music that plays upon finishing a game to convey accomplishment and remain consistent with music found at the start of chiventure.
As a player, I want different sound effects or music to play when new characters are introduced which would allow sound to support each character’s development and their relative differences.
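
The volume and toggle stories above (independent volumes per category, a master volume, and on/off switches) could be modeled with a small settings struct. The following is only a minimal sketch: every name in it (sound_category_t, sound_settings_t, and the sound_settings_* functions) is hypothetical and does not exist in chiventure, and expressing volume as a percentage from 0 to 100 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories, taken from the user stories. */
typedef enum {
    SOUND_EFFECTS,
    SOUND_BACKGROUND_MUSIC,
    SOUND_AMBIENT,
    SOUND_CATEGORY_COUNT
} sound_category_t;

/* Hypothetical settings struct; volumes are percentages in [0, 100]. */
typedef struct {
    int master_volume;
    bool master_enabled;
    int volume[SOUND_CATEGORY_COUNT];
    bool enabled[SOUND_CATEGORY_COUNT];
} sound_settings_t;

/* Clamp a requested volume into the valid percentage range. */
static int clamp_percent(int v)
{
    if (v < 0) return 0;
    if (v > 100) return 100;
    return v;
}

/* Start with everything enabled at full volume. */
void sound_settings_init(sound_settings_t *s)
{
    assert(s != NULL);
    s->master_volume = 100;
    s->master_enabled = true;
    for (int i = 0; i < SOUND_CATEGORY_COUNT; i++) {
        s->volume[i] = 100;
        s->enabled[i] = true;
    }
}

/* Change one category without touching the others. */
void sound_settings_set_volume(sound_settings_t *s, sound_category_t c, int percent)
{
    assert(s != NULL && c >= 0 && c < SOUND_CATEGORY_COUNT);
    s->volume[c] = clamp_percent(percent);
}

/* Change the master volume, which scales every category at once. */
void sound_settings_set_master(sound_settings_t *s, int percent)
{
    assert(s != NULL);
    s->master_volume = clamp_percent(percent);
}

/* Volume actually applied to a category: 0 if the category or the
 * master switch is toggled off, otherwise master scaled by category. */
int sound_effective_volume(const sound_settings_t *s, sound_category_t c)
{
    assert(s != NULL && c >= 0 && c < SOUND_CATEGORY_COUNT);
    if (!s->master_enabled || !s->enabled[c]) {
        return 0;
    }
    return (s->master_volume * s->volume[c]) / 100;
}
```

Keeping the toggles separate from the volume values means that switching a category off and on again restores its previous volume, which seems to match the intent of the toggle story.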

Text to Speech Library Research

1. Festival

We ran into issues setting up Festival. Not only is it written in C++, but we think the version of C++ it was written in is now outdated. The sample hello-world program provided as a compiler check produces errors that, according to Stack Overflow, are due to incompatibilities between pre- and post-2000s C++ compilers.

2. GNU speech

When reading through its features, we were struck by the range of inputs it can handle. Aside from a 70,000-word dictionary with 6,000 names, it can parse non-word items (dates, numbers, etc.) into words and speak them. It can also pronounce words not in its dictionary, just with less accuracy.

However, there seems to be little recent documentation on how to set it up or how to build a demo. I believe the website dates from around 2003, so I'm not sure it's a good idea for us to pursue it, given the possibility that it's simply too old.

3. espeak

We tried to run the demo as described in the README, but restrictions on the Linux system prevented us from downloading the required libraries. Espeak still seems like a really good option for chiventure because of its seemingly straightforward usage, variety of options to choose from, versatility, and the fact that it's written in C.

Sound Library Research

1. PortAudio

PortAudio is an open-source API that supports the development of audio programs across many platforms such as Windows, macOS, and Linux. PortAudio also offers other functionality that may prove useful beyond audio playback, but that is beyond the scope of this initial research. Compiling the PortAudio library produces a dynamically linked library that gives us access to its functions, structs, and definitions. The repository also contains many sample programs that can be built into executables that play back sound.

The initial difficulties began with the unclear tutorial instructions; visual or video aids would have been a huge help. Another drawback is that Windows users need Visual Studio to build the PortAudio library. Visual Studio is a very large program (~20GB), so downloading it may not be ideal. However, PortAudio is currently maintained by a community of developers, and the PortAudio API is thoroughly documented. The next step would be to familiarize ourselves with the structs and functions within PortAudio to develop a greater understanding of PortAudio's capabilities.

Repository: https://github.com/PortAudio/portaudio

Tutorial: http://files.portaudio.com/docs/v19-doxydocs/tutorial_start.html

Documentation: http://files.portaudio.com/docs/v19-doxydocs/portaudio_8h.html

Sample Programs: http://files.portaudio.com/docs/v19-doxydocs/group__examples__src.html

2. Libsoundio

Libsoundio is an open source API that provides cross-platform audio input and output. The library linked to the GitHub is an abstraction with an emphasis on performance for heavy, real-time software like digital audio workstations and consumer software like music players. (VW)

I think the library just standardizes all the different sound drivers into the same API. A sound driver converts ‘raw signals’ into a format that can be understood by speakers, and the library enables you to program using the same API regardless of the sound driver (e.g. macOS, Windows, and Linux can have different sound drivers). A backend is a sound server that can be installed to serve as 'middleware' between software applications and hardware. Libsoundio supports the following backends:

JACK: A low-latency audio server, used to connect several client applications to an audio device, and allow them to share audio with each other. There is no difference in how an application sends or receives data regardless of whether it comes from/goes to another application or an audio interface.
PulseAudio: A modular, general-purpose sound server. Its main purpose is to ease audio configuration, and its modular design allows more advanced users to configure the daemon precisely to best suit their needs.
ALSA: The Advanced Linux Sound Architecture (ALSA) provides kernel driven sound card drivers. It replaces the original Open Sound System (OSS). Besides the sound device drivers, ALSA also bundles a user space driven library for application developers. They can then use those ALSA drivers for high level API development.
CoreAudio: Core Audio is the digital audio infrastructure of iOS and OS X. It includes a set of software frameworks designed to handle the audio needs in your applications.
WASAPI: The Windows Audio Session API (WASAPI) enables client applications to manage the flow of audio data between the application and an audio endpoint device. Every audio stream is a member of an audio session. Through the session abstraction, a WASAPI client can identify an audio stream as a member of a group of related audio streams. The system can manage all of the streams in the session as a single unit.
Dummy (silence)

(VW)

Repository: https://github.com/andrewrk/libsoundio

3. SDL

SDL is a widely used library that provides access to many features such as video and audio. The SDL wiki provided useful documentation, installation instructions, and tutorials, which aided in building the library in Visual Studio. The library was tested by making a simple project that called SDL_Init() with the SDL_INIT_EVERYTHING flag, which initializes the timer, audio, video, joystick, haptic, game controller, and events subsystems.

SDL is a potential sound library we may want to use because of the ease of building the SDL library, the abundance of information on SDL, and previous research on SDL implementation for chiventure. Moving forward, we may want to look at the source code and identify how we can add audio to the code currently used by chiventure.

Repository: https://github.com/libsdl-org/SDL

Tutorial: https://wiki.libsdl.org/Tutorials

Documentation: https://wiki.libsdl.org/CategoryAPI

Residual Backlog Issues from Sound Team Spring 2021:

Implement play/load sound function: Playing sound from src/sound/sound.c appears to work, but the sound_init() function does not incorporate the given variables correctly. This doesn’t seem to affect playback, so perhaps sound_init() is unnecessary. To play a sound, we are able to use SDL_OpenAudioDevice(), SDL_QueueAudio(), and SDL_Delay(), and then end playback and release the sound file with SDL_CloseAudioDevice(), SDL_FreeWAV(), and SDL_Quit().