Writing an Application-Defined Audio Device (ADAD) for Engage

Introduction

Engage supports platform-specific, low-latency audio interfaces for capturing audio from microphones and playing audio to speakers. However, there are cases where the platform has unique audio characteristics, features, or limitations that prevent Engage from interfacing with the audio subsystem. Furthermore, some applications need to process the raw audio directly rather than relying on Engage to do so.

In these cases, Engage allows for Application-Defined Audio Devices, or "ADADs" for short.

How an ADAD works

Simply put, an ADAD is a software construct that is wholly managed by the application using the Engage Engine.

This software construct may connect to actual audio hardware, convey the audio data over a network, record the audio for archival purposes, mix or forward the audio into other streams, and so on. As far as Engage is concerned, however, the ADAD is a virtual audio device capable of either capturing audio (as a "microphone") or rendering it (as a "speaker") using raw PCM (16-bit, signed, little-endian).

To get things going, you first need to register the device with Engage by calling engageAudioDeviceRegister(), passing in a JSON configuration that describes the device. The C++ definition of this configuration is AudioDeviceDescriptor and can be found in the ConfigurationObjects.hpp file. Serialized, the JSON that matters from the application's perspective is as follows:

{
   "samplingRate":16000,
   "channels":2,
   "direction":2,
}
  • samplingRate indicates the rate at which PCM is sampled by the device. Valid values are 8000 and 16000 - representing 8 kHz (narrowband) or 16 kHz (wideband) sampling.
  • channels indicates whether the device operates in mono (1 channel) or stereo (2 channels) mode. At this time, only values of 1 and 2 are allowed. Furthermore, it is recommended that input devices (i.e. virtual microphones) represent their input as 1 - i.e. mono.
  • direction indicates whether the device is an input device - i.e. Engage views it as a microphone - or an output (i.e. speaker) device. For input devices, set this value to 1. For output devices, set it to 2.

An example speaker configuration

{
   "samplingRate":16000,
   "channels":2,
   "direction":2,
}

An example microphone configuration

{
   "samplingRate":16000,
   "channels":1,
   "direction":1,
}

The device callback function

Along with the JSON, what's also passed into the call to engageAudioDeviceRegister() is a pointer to a callback function in the application that Engage calls to carry out operations for the device. (This is not unlike the ioctl() calls you're likely already familiar with.)

This function has the following C-language prototype:

typedef int (*PFN_ENGAGE_AUDIO_DEVICE_CTL)(int deviceId, 
                                           int instanceId, 
                                           EngageAudioDeviceCtlOp_t op, 
                                           uintptr_t param);

As with ioctl()-style calls, the parameter list for the callback function consists of some identifying parameters, an operation code to be carried out, and an additional parameter that may or may not be meaningful depending on the operation.

  • deviceId is the ID of the device assigned by Engage
  • instanceId is the ID of an instance of the device as assigned by the application
  • op is the operation to be carried out, such as starting, stopping, pausing, and resuming instances
  • param is defined based on the value of op

Example code

The best way to explain how this all hangs together is to write some code. Let's do that! For our purposes, we're going to create two audio devices - one a virtual speaker, the other a virtual microphone - and register them with Engage. Furthermore, when we create an Engage group in the call to engageCreateGroup(), we're going to tell Engage to use our ADADs for the group's microphone and speaker respectively.

Before we get going, though, we need to understand what a device ID is, and what an instance ID is...

Device IDs and Instance IDs

When you register a new device with Engage, it will assign a 16-bit numeric identifier which is used inside the Engine. This number is unique for every hardware and virtual audio device used by Engage and will always be a positive integer. In other words, if you have a device identifier of zero, or a negative number, there was a problem and the identifier is not valid.

Now, when Engage uses an audio device (such as the speaker output on a computer), it creates a connection to that device. If it uses the audio device a second time while the first connection is already in use, it may create another connection. These connections are known as instances within the Engage Engine. ADADs need to provide a similar mechanism - i.e. a single audio device needs to be able to support multiple connections to it or, more abstractly, virtual instances of the device.

So, once Engage knows about a device, and that device has been associated with a group for input or output, the device needs to be able to create and destroy instances at the behest of the Engage Engine. These creation and destruction operations may happen at any time, so it's best not to assume that a particular instance of a device will be kept alive for any particular length of time.

Also, and very importantly, we say Engage "may" create additional connections to a device. But that may not always be the case, because the Engine might very well decide to use the same device instance for multiple groups to reduce resource usage. Therefore, your ADAD must be capable of dealing with multiple simultaneous streams. As of this writing, Engage will create a unique speaker/output instance per group but will create only a single microphone/input instance that is shared across groups. What this means is that (again, as of this writing) the stream from your input ADAD - i.e. microphone - will be propagated to all groups that are currently transmitting and that are using the same device ID. In other words, microphone ADADs for the same device ID will have only a single instance created by Engage. Speaker ADADs, on the other hand, will have multiple instances for the same device ID - one per group. If you want to force multiple instances of an input, you need to register multiple input ADADs and assign each one to a unique group in Engage, as sketched below.
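For example, here's a minimal sketch of forcing two separate microphone instances. It uses the AudioDeviceDescriptor helper and the MyAudioDeviceCallback function that are covered later in this article; the group names are purely illustrative.

// A sketch of registering two distinct microphone ADADs so that two groups
// each get their own input instance
ConfigurationObjects::AudioDeviceDescriptor mic;
mic.direction = ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirInput;
mic.samplingRate = 16000;
mic.channels = 1;

std::string json = mic.serialize();

// Each registration yields a unique device ID ...
int16_t micForGroupAlpha = engageAudioDeviceRegister(json.c_str(), MyAudioDeviceCallback);
int16_t micForGroupBravo = engageAudioDeviceRegister(json.c_str(), MyAudioDeviceCallback);

// ... so setting group Alpha's "audio.inputId" to micForGroupAlpha and group
// Bravo's to micForGroupBravo gives each group its own input instance.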

Instance Operations

Creation

Instances are created inside your ADAD by Engage calling the callback control function defined in the call to engageAudioDeviceRegister() with an op value of eadCreateInstance. In response, your ADAD needs to set up whatever constructs are necessary for it to function, and return a positive integer value that identifies the instance.

Destruction

When it comes time to destroy an instance of an ADAD, Engage will call the callback with eadDestroyInstance. You should return ENGAGE_AUDIO_DEVICE_RESULT_OK for this operation.

Starting the instance

Engage will start the instance only when necessary - such as starting a speaker only when inbound audio is ready for rendering, or starting a microphone only when audio needs to be captured for transmission. Once the device is started, it needs to remain in a running state until such time that Engage stops the instance or destroys it. The operation to start the instance via the callback is eadStart. In response, your callback function should return ENGAGE_AUDIO_DEVICE_RESULT_OK to indicate that the instance has started, or one of the other ENGAGE_AUDIO_DEVICE_xxxxx constants.

Stopping the instance

Similar to starting the instance, Engage will stop the instance when necessary. It will do so as soon as possible so as to minimize CPU and memory usage. The operation to stop the instance via the callback is eadStop. Again, return ENGAGE_AUDIO_DEVICE_RESULT_OK if everything is fine.

Pausing, resuming, resetting, and restarting the instance

Finally, to pause, resume, reset, and restart the instance, Engage will call your ADAD's callback with eadPause, eadResume, eadReset, and eadRestart respectively. As with starting and stopping, return ENGAGE_AUDIO_DEVICE_RESULT_OK if everything is fine.

Let's get coding

For our example, we're going to make a little class called ADADInstance which will represent an instance of an ADAD - whether it be an input ADAD (a microphone) or an output ADAD (a speaker). Each time Engage asks for an instance to be created, we're going to create an ADADInstance object and store it in an STL map for quick and easy retrieval during later operations. Each object is going to encompass all the operations our ADAD instance can carry out, including a simple loop running on a thread that will either feed Engage PCM audio buffers as they "become available" (in other words, a microphone), or pull PCM audio buffers at intervals that are appropriate for the device. Our simplistic little device is going to run its loop in 10 millisecond increments, either delivering or pulling 160 PCM samples each time the loop goes around. (We're doing 160 samples per buffer because we're going to tell Engage that our device samples in 16 kHz wideband format - i.e. 16 PCM samples per millisecond.)

Pushing and pulling audio

Notice in the code below how the ADAD is wholly responsible for "pushing" audio to Engage as well as "pulling" audio from Engage by calling engageAudioDeviceWriteBuffer() and engageAudioDeviceReadBuffer() respectively. Engage will not ask for audio in the case of input, nor will it send audio to the ADAD in the case of output. Rather, your ADAD is responsible for the timing and streaming of audio in the manner that makes most sense for your audio subsystem. The declaration of these functions is as follows:

ENGAGE_API int16_t engageAudioDeviceWriteBuffer(int16_t deviceId, int16_t instanceId, const int16_t *buffer, size_t samples);
ENGAGE_API int16_t engageAudioDeviceReadBuffer(int16_t deviceId, int16_t instanceId, int16_t *buffer, size_t samples);

For both functions, the deviceId and instanceId parameters form a unique combination of the registered device and the instance of that device.
For engageAudioDeviceWriteBuffer(), the buffer parameter points to the location of the PCM samples that the ADAD wants to deliver to Engage; while for engageAudioDeviceReadBuffer(), buffer points to the location where the ADAD wants Engage to place inbound audio samples.

The samples parameter works in the same way. For engageAudioDeviceWriteBuffer(), samples indicates how many PCM samples (not bytes!) the ADAD wishes to write into Engage from buffer; while, for engageAudioDeviceReadBuffer(), samples indicates the number of samples the ADAD wishes to read from Engage.

The return value from these functions is either a negative number to indicate that an error occurred, or the value specified by samples.

Outbound audio

Outbound audio can be sent to Engage in any block size - up to 16000 samples at a time. However, smaller block sizes are recommended so as to minimize audio latency for receivers. Also, it is recommended that these block sizes be in multiples of 10ms for increased efficiency.
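For example, here's a minimal sketch of writing a single 20ms block at 16 kHz; fillFromAudioSource() is a hypothetical stand-in for however your application obtains its PCM samples.

// A sketch of writing one 20ms block of 16 kHz audio to Engage.
// 16 samples per millisecond * 20ms = 320 samples per block.
void writeOneBlock(int16_t deviceId, int16_t instanceId)
{
    const size_t SAMPLES_PER_MS = 16;
    const size_t BLOCK_MS = 20;
    const size_t BLOCK_SAMPLES = (SAMPLES_PER_MS * BLOCK_MS);

    int16_t block[BLOCK_SAMPLES];

    // Hypothetical helper that fills "block" with PCM from your audio source
    fillFromAudioSource(block, BLOCK_SAMPLES);

    if(engageAudioDeviceWriteBuffer(deviceId, instanceId, block, BLOCK_SAMPLES) < 0)
    {
        // An error occurred - handle it as appropriate for your application
    }
}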

Inbound audio

In the case of Engage receiving audio, that inbound audio is queued upon receipt and the ADAD extracts samples from the head of the queue every time it calls engageAudioDeviceReadBuffer(). If the queue becomes too long, Engage automatically trims the oldest samples. If insufficient queued samples are available, Engage will fill the remaining space in the buffer with zeroes. Therefore, the ADAD should request a small number of samples on every call rather than an overly large amount, which would result in stretches of zero-filled silence in the resulting audio playout stream.
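For example, here's a minimal sketch of a single modest read; renderToAudioSink() is a hypothetical stand-in for however your application plays out or forwards the received PCM.

// A sketch of pulling a 10ms (160-sample) block of inbound 16 kHz audio from
// Engage.  If fewer than 160 queued samples are available, Engage zero-fills
// the remainder of the buffer.
void readOneBlock(int16_t deviceId, int16_t instanceId)
{
    const size_t READ_SAMPLES = 160;

    int16_t block[READ_SAMPLES];

    if(engageAudioDeviceReadBuffer(deviceId, instanceId, block, READ_SAMPLES) > 0)
    {
        // Hypothetical helper that renders the received samples
        renderToAudioSink(block, READ_SAMPLES);
    }
}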

With that covered, here's our ADADInstance class. (The headers below are what the example code in this article needs; adjust the Engage header names/paths to match your setup.)

#include <atomic>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <thread>

#include "EngageInterface.h"
#include "EngageAudioDevice.h"
#include "ConfigurationObjects.hpp"

class ADADInstance
{
public:
    ADADInstance(int16_t deviceId, 
                 int16_t instanceId, 
                 ConfigurationObjects::AudioDeviceDescriptor::Direction_t direction)
    {
        _deviceId = deviceId;
        _instanceId = instanceId;
        _direction = direction;
        _running = false;
        _paused = false;
    }

    int start()
    {
        if(!_running)
        {
            _running = true;
            _paused = false;
            _threadHandle = std::thread(&ADADInstance::thread, this);
        }

        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

    int stop()
    {
        _running = false;
        if(_threadHandle.joinable())
        {
            _threadHandle.join();
        }

        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

    int pause()
    {
        _paused = true;
        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

    int resume()
    {
        _paused = false;
        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

    int reset()
    {
        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

    int restart()
    {
        return ENGAGE_AUDIO_DEVICE_RESULT_OK;
    }

private:
    ADADInstance()
    {
        // Not to be used
    }

    void thread()
    {
        // Our "device" will work in 10ms intervals
        const size_t  MY_AUDIO_DEVICE_INTERVAL_MS = 10;

        // The number of samples we produce/consume is 16 samples per millisecond - i.e. this is a wideband device
        const size_t  MY_AUDIO_DEVICE_BUFFER_SAMPLE_COUNT = (MY_AUDIO_DEVICE_INTERVAL_MS * 16);

        int16_t buffer[MY_AUDIO_DEVICE_BUFFER_SAMPLE_COUNT];
        int     rc;
        int     x;

        // These are used to generate the sine wave for our "microphone"
        const float amplitude = 1000.0f;
        const float pi = 3.14159265f;
        const float freq = 1024.0f;                 // tone frequency in Hz
        const float samplesPerSecond = 16000.0f;    // matches our 16 kHz sampling rate
        float       sampleIndex = 0.0f;
        float       phaseShift = 0.0f;

        while( _running )
        {
            if(!_paused)
            {
                if(_direction == ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirOutput)
                {
                    rc = engageAudioDeviceReadBuffer(_deviceId, _instanceId, buffer, MY_AUDIO_DEVICE_BUFFER_SAMPLE_COUNT);

                    if(rc > 0)
                    {
                        // At this point we have rc number of audio samples from the Engine.  These now need to be sent 
                        // onward to where they're needed.  For purposes of this demonstration, we'll simply calculate the
                        // average sample level and display that.

                        float total = 0.0;
                        for(x = 0; x < rc; x++)
                        {
                            total += buffer[x];
                        }

                        std::cout << "ADADInstance received " << rc << "samples with an average sample level of " << (total / rc) << std::endl;
                    }
                }
                else if(_direction == ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirInput)
                {
                    // For purposes of this demo, we'll fill our buffer with an ongoing sine wave.  In your app you will want to pull in these
                    // samples from your actual audio source

                    for(x = 0; x < (int)MY_AUDIO_DEVICE_BUFFER_SAMPLE_COUNT; x++)
                    {
                        float sampleVal = amplitude * sin((2.0f * pi * freq * (sampleIndex / samplesPerSecond)) + phaseShift);

                        // Advance (and wrap) our position in the waveform.  1024 Hz divides evenly into
                        // 16000 samples per second, so wrapping once per second is phase-continuous.
                        sampleIndex += 1.0f;
                        if(sampleIndex >= samplesPerSecond)
                        {
                            sampleIndex = 0.0f;
                        }

                        // Clamp to the valid range for 16-bit signed PCM
                        if(sampleVal < -32768.0)
                        {
                            sampleVal = -32768.0;
                        }
                        else if(sampleVal > 32767.0)
                        {
                            sampleVal = 32767.0;
                        }

                        buffer[x] = (int16_t)sampleVal;
                    }

                    rc = engageAudioDeviceWriteBuffer(_deviceId, _instanceId, buffer, MY_AUDIO_DEVICE_BUFFER_SAMPLE_COUNT);
                    if(rc < 0)
                    {
                        // An error occurred while writing - handle it as appropriate for your application
                    }
                }
                else
                {
                    assert(0);
                }
            }

            // Sleep for our device's "interval"
            std::this_thread::sleep_for(std::chrono::milliseconds(MY_AUDIO_DEVICE_INTERVAL_MS));
        }
    }

    int16_t                                                     _deviceId;
    int16_t                                                     _instanceId;
    ConfigurationObjects::AudioDeviceDescriptor::Direction_t    _direction;
    std::thread                                                 _threadHandle;
    std::atomic<bool>                                           _running;
    std::atomic<bool>                                           _paused;
};

We also need to define the STL map where we'll store pointers to our device instances. It looks as follows and is stored globally in our application:

std::map<int16_t, ADADInstance*>    g_audioDeviceInstances;

We also need some other global variables:

int16_t                             g_speakerDeviceId = 0;
int16_t                             g_microphoneDeviceId = 0;
int16_t                             g_nextAudioDeviceInstanceId = 0;

Next, we're going to need that callback function that Engage will call to carry out the operations described above. Its responsibility is to create and destroy device instances - inserting and removing those instances in the map as appropriate. It is also responsible for delegating instance-level operations (such as starting and stopping) to instances in the map as needed.

int MyAudioDeviceCallback(int16_t deviceId, int16_t instanceId, EngageAudioDeviceCtlOp_t op, uintptr_t p1)
{    
    int rc = ENGAGE_AUDIO_DEVICE_RESULT_OK;

    ADADInstance *instance = nullptr;

    // Instance creation is a little different from other operations
    if( op == EngageAudioDeviceCtlOp_t::eadCreateInstance)
    {
        g_nextAudioDeviceInstanceId++;

        if(deviceId == g_speakerDeviceId)
        {
            // Create an instance of a speaker
            instance = new ADADInstance(deviceId, g_nextAudioDeviceInstanceId, ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirOutput);
        }
        else if(deviceId == g_microphoneDeviceId)
        {
            // Create an instance of a microphone
            instance = new ADADInstance(deviceId, g_nextAudioDeviceInstanceId, ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirInput);
        }
        else
        {
            // Something went terribly wrong!
            assert(0);
        }     

        // Only store the instance (and hand its ID back to Engage) if creation succeeded
        if(instance != nullptr)
        {
            g_audioDeviceInstances[g_nextAudioDeviceInstanceId] = instance;

            rc = g_nextAudioDeviceInstanceId;
        }
    }
    else
    {
        // Track down the instance object
        std::map<int16_t, ADADInstance*>::iterator itr = g_audioDeviceInstances.find(instanceId);
        if(itr != g_audioDeviceInstances.end())
        {
            instance = itr->second;

            switch( op )
            {
                // We should never fall into this case because the "if" above catered for it.  But, some compilers
                // will warn about the switch not catering for all enum values from EngageAudioDeviceCtlOp_t.  So we'll
                // put in this case to keep them happy.
                case EngageAudioDeviceCtlOp_t::eadCreateInstance:
                    assert(0);
                    break;

                // Otherwise, Engage wants us to ...

                // ... destroy an instance
                case EngageAudioDeviceCtlOp_t::eadDestroyInstance:
                    instance->stop();
                    delete instance;
                    g_audioDeviceInstances.erase(itr);
                    break;

                // ... start an instance
                case EngageAudioDeviceCtlOp_t::eadStart:
                    instance->start();
                    break;

                // ... stop an instance
                case EngageAudioDeviceCtlOp_t::eadStop:
                    instance->stop();
                    break;

                // ... pause an instance
                case EngageAudioDeviceCtlOp_t::eadPause:
                    instance->pause();
                    break;

                // ... resume an instance
                case EngageAudioDeviceCtlOp_t::eadResume:
                    instance->resume();
                    break;

                // ... reset an instance
                case EngageAudioDeviceCtlOp_t::eadReset:
                    instance->reset();
                    break;

                // ... restart an instance
                case EngageAudioDeviceCtlOp_t::eadRestart:
                    instance->restart();
                    break;

                // The compiler should catch this.  But, just in case ...
                default:
                    assert(false);
                    rc = ENGAGE_AUDIO_DEVICE_INVALID_OPERATION;
                    break;
            }            
        }
        else
        {
            std::cout << "MyAudioDeviceCallback for an unknown instance id of " << instanceId << std::endl;
            rc = ENGAGE_AUDIO_DEVICE_INVALID_INSTANCE_ID;
        }        
    }

    return rc;
}

Finally, we need some housekeeping code to plumb all of this together. First, we need to register our devices with Engage. We do so by calling engageAudioDeviceRegister(). For our demo purposes, we're going to use the handy-dandy AudioDeviceDescriptor class found in ConfigurationObjects.hpp, which makes short work of creating the serialized JSON. But you can use any method you'd like to create the JSON needed. We'll be registering a speaker device and a microphone device.

void registerADAD()
{
    // Setup the speaker
    {
        ConfigurationObjects::AudioDeviceDescriptor speakerDevice;
        speakerDevice.direction = ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirOutput;
        speakerDevice.deviceId = 0;
        speakerDevice.samplingRate = 16000;
        speakerDevice.channels = 2;
        speakerDevice.boostPercentage = 0;
        std::string json = speakerDevice.serialize();
        g_speakerDeviceId = engageAudioDeviceRegister(json.c_str(), MyAudioDeviceCallback);
        if(g_speakerDeviceId < 0)
        {
            g_speakerDeviceId = 0;
        }
    }

    // Setup the microphone
    {
        ConfigurationObjects::AudioDeviceDescriptor microphoneDevice;
        microphoneDevice.direction = ConfigurationObjects::AudioDeviceDescriptor::Direction_t::dirInput;
        microphoneDevice.deviceId = 0;
        microphoneDevice.samplingRate = 16000;
        microphoneDevice.channels = 1;
        microphoneDevice.boostPercentage = 0;
        std::string json = microphoneDevice.serialize();
        g_microphoneDeviceId = engageAudioDeviceRegister(json.c_str(), MyAudioDeviceCallback);
        if(g_microphoneDeviceId < 0)
        {
            g_microphoneDeviceId = 0;
        }
    }
}

Of course, if we register devices, we need to unregister them when we're all done.

void unregisterADAD()
{
    if(g_speakerDeviceId > 0)
    {
        engageAudioDeviceUnregister(g_speakerDeviceId);
        g_speakerDeviceId = 0;
    }
    
    if(g_microphoneDeviceId > 0)
    {
        engageAudioDeviceUnregister(g_microphoneDeviceId);
        g_microphoneDeviceId = 0;
    }
}

And ... we should be sure to clean up properly when all is complete:

void cleanupADADInstances()
{
    for(std::map<int16_t, ADADInstance*>::iterator itr = g_audioDeviceInstances.begin();
        itr != g_audioDeviceInstances.end();
        itr++)
    {
        itr->second->stop();
        delete itr->second;
    }

    g_audioDeviceInstances.clear();
}

Hopefully you're still following and not running for the hills by now. We're pretty much done, so hang in for just a little bit more.

The only thing remaining for us to do is, when we call engageCreateGroup(), to modify the group definition JSON to tell Engage to use our ADADs instead of the default audio devices. For the group's audio.outputId, plug in the value of g_speakerDeviceId; and for the group's audio.inputId, use the value of g_microphoneDeviceId. For example, assuming Engage gave us a device ID of 3 when we registered our speaker, and a device ID of 7 when we registered our microphone, the JSON for our group would look (at least) as follows:

{  
   "type":1,
   "id":"{114c3587-fac6-455c-83ab-19c6769c1228}",
   "name":"Alpha",
   "alias":"UNIT12345678",

   "audio":{  
      "outputId":3,   <---------
      "inputId":7     <---------
   },

   "rx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "tx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "txAudio":{  
      "encoder":10,
      "framingMs":20
   }
}
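If you build your group definitions programmatically, a minimal sketch of plugging the device IDs in - here using the nlohmann::json library that ConfigurationObjects.hpp itself builds upon, and assuming groupDefinitionJson holds the rest of the group definition - could look like this:

// A sketch of injecting our registered ADAD device IDs into a group
// definition before handing it to engageCreateGroup()
// (requires nlohmann/json.hpp)
nlohmann::json groupDef = nlohmann::json::parse(groupDefinitionJson);

groupDef["audio"]["outputId"] = g_speakerDeviceId;
groupDef["audio"]["inputId"] = g_microphoneDeviceId;

engageCreateGroup(groupDef.dump().c_str());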

By comparison, if we weren't using an ADAD for the group, our JSON would simply be:

{  
   "type":1,
   "id":"{114c3587-fac6-455c-83ab-19c6769c1228}",
   "name":"Alpha",
   "alias":"UNIT12345678",

   "rx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "tx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "txAudio":{  
      "encoder":10,
      "framingMs":20
   }
}

If we used just our microphone for the group but the default speaker, the JSON would be:

{  
   "type":1,
   "id":"{114c3587-fac6-455c-83ab-19c6769c1228}",
   "name":"Alpha",
   "alias":"UNIT12345678",

   "audio":{  
      "inputId":7   <---------
   },

   "rx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "tx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "txAudio":{  
      "encoder":10,
      "framingMs":20
   }
}

And if we used just our speaker for the group but the default microphone, the JSON would be:

{  
   "type":1,
   "id":"{114c3587-fac6-455c-83ab-19c6769c1228}",
   "name":"Alpha",
   "alias":"UNIT12345678",

   "audio":{  
      "outputId":3   <---------
   },

   "rx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "tx":{  
      "address":"234.42.42.2",
      "port":18002
   },

   "txAudio":{  
      "encoder":10,
      "framingMs":20
   }
}
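To recap, the overall flow using the helpers we built above is roughly as follows (a minimal sketch; group creation and the rest of your application's Engage lifecycle are elided):

registerADAD();                 // register our speaker and microphone with Engage

// ... create and join groups whose "audio" objects reference
//     g_speakerDeviceId and g_microphoneDeviceId, then run as usual ...

unregisterADAD();               // when done, unregister the devices ...
cleanupADADInstances();         // ... and stop/delete any remaining instances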

Easy peasy!