Overview of the changes introduced in the atomic effects fork of obs studio - HoneyHazard/obs-studio-atomic-effects GitHub Wiki

General technology changes in the obs-atomic-effects fork of OBS

  • Introduce results; they are tied to respective params
  • Results are passed in a lot like other params before draw; they are also fetched back as results after draw
  • Only one type of result is introduced, which is atomic_uint. This type will be used with atomicCounterIncrement(...) effect statements to allow atomic increments of a counter, which enabled us to build the Pixel Match Switcher plugin.
  • D3D11: change from ps_4_0 to ps_5_0 shader model to support UAV variables
  • OpenGL: change #version 330 to #version 460 to support atomic counters

Outline of the changes

  1. Introducing the atomic counter results
  2. Effects
  3. Parsing the intermediate shaders
  4. Graphics API
  5. libobs-opengl
  6. libobs-d3d11

Introducing the atomic counter results

  • Atomic counters are special variables that allow safe increments (or decrements) of a counter in the parallel shader environment. The Pixel Match Switcher plugin for OBS uses this feature of the graphics systems to count matching pixels as video data passes through the plugin's filter. Because the counting is done in the shader, the atomic counters perform well on any reasonably modern video card (it must support ps_5_0 for D3D11 or version 460 for OpenGL).
  • Atomic counters require special treatment by D3D11 and special treatment by OpenGL. They need to reside in specially configured blocks of graphics memory and require highly specialized API to be initialized and used.
  • We have selected atomic_uint keyword to represent atomic counters being introduced into the OBS effect system. This keyword is also how the counters are represented in GLSL. Like other global variables of the effect, it must be preceded by uniform.
  • We have selected atomicCounterIncrement(...) function, also borrowed from GLSL, to perform an atomic increment on a counter.

Effects

Effects provide cross-platform wrappers for GLSL and HLSL shader use by OBS. In the OBS effect system you specify both vertex and pixel shader behavior in the same effect file.

Roughly speaking, the effect initialization is as follows:

  1. Effect file is parsed by the effect-parser module, generating data structures for all building blocks of the shaders it needs to generate.
  2. These building blocks are then used to write strings that represent intermediate versions of the vertex and pixel shaders for the effect.
  3. The intermediate versions of shaders are then parsed by shader-parser module to generate data structures representing the shader functionality.
  4. Finally, the data structures generated by the shader-parser are passed down to either the gl-shaderparser or the d3d11-shaderprocessor module to write the actual shaders that will do the effect's work.

Additions to the effect language

We introduce two new syntax elements to support use of the atomic counters in the OBS effect language. The two are borrowed from GLSL, which, arguably, has a more user-friendly syntax for implementing atomic counters than HLSL.

  • atomic_uint is the variable type that will be used for the atomic counters. Like other global variables of the effect, it must be preceded by uniform keyword.
    • Example: uniform atomic_uint myCounter;
  • atomicCounterIncrement(...) statement increments an atomic counter.
    • Example: atomicCounterIncrement(myCounter);
  • Note: unlike actual GLSL, we will not require and will not support the layout(...) qualifier in the effect code, but the OpenGL graphics subsystem will have to generate the qualifier whenever an atomic counter is used.

Effect data structures

The data structures built for each effect are used to dynamically generate intermediate shader code, and provide mapping of high-level abstractions for effect variables to lower-level abstractions of shader variables. While modifying the effect data structures, we mirror existing abstractions for effect parameters and their pathways to the Graphics API, as we introduce new abstractions and new pathways for interfacing with program and shader results.

gs_effect [new members]

This struct provides a high-level interface to an effect. It has an array of parameter data structures called params, of type gs_effect_param. We add a new member results, which is an array of gs_effect_result.

gs_effect_result [new struct]

High level interface for working with a result. Members are:

  • name result name
  • type variable type; currently only GS_SHADER_PARAM_ATOMIC_UINT is supported
  • cur_val value that was last retrieved from result.

Analogous to gs_effect_param.

pass_shaderresult [new struct]

Serves as mapping of higher level effect result interface gs_effect_result to a lower-level shader result interface handle gs_sresult_t.

Analogous to pass_shaderparam.

gs_effect_pass [added members]

Has handle pointers for vertex and pixel shaders, and an array for mapping effect params to shader params. These are used later to make passing in of parameter values possible.

So we add program_results, which is an array of pass_shaderresult, to also do the mapping of effect results to shader results, and make fetching of the results possible.

Parsing the effect code

effect-parser module converts the effect code into data structures and uses them to generate intermediate shader code. Input effect string is tokenized by whitespace, and is then processed, token by token, to build data structures representing individual behavior units of the effect. These are then used to generate intermediate versions of vertex and pixel shaders, so they can be passed down to shader-parser.

Here the goal was to introduce support for results and atomic_uint type.

ep_param [new members]

This data structure maintains information about individual effect parser parameters as they are being parsed from the effect code. Since some parameters are now also results, we add a new field to be used with those results:

  • is_result gets set to true for any parameter that is also a result. This will later lead to a gs_effect_result being instantiated for every param marked with this flag.

ep_param_init(...) [modified function]

Assigns fields to ep_param. Modified to receive and assign the new field is_result.

ep_parse_param(...) [modified function]

Calls ep_param_init(...) with values received from ep_parse_other(...) and makes sure no erroneous symbols follow the param declaration.

Our modifications here are limited to propagating is_result value to ep_param_init(...).

ep_parse_other(...) [modified function]

This function is for parsing anything in the effect code that is not whitespace, struct, technique, or sampler_state. The function has variables for reacting to property, const, and type keywords, and activates functions for parsing functions and params. Our modifications are for special handling of the atomic_uint results.

  • is_result variable is added, and is set to true whenever atomic_uint type token is encountered.
  • call to ep_parse_param(...) also passes is_result variable to the function
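The flag logic described above can be sketched in isolation. This is a minimal sketch, not the fork's code: sketch_type_is_result is a hypothetical helper, and it assumes the type token is available as a plain C string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the fork): returns true when a type token
 * names the atomic counter type, which marks the param as a result too. */
bool sketch_type_is_result(const char *type_token)
{
    assert(type_token != NULL);
    return strcmp(type_token, "atomic_uint") == 0;
}
```

The returned value plays the role of is_result and would be forwarded together with the other param properties.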

Writing intermediate shaders

After the effect code is parsed into data structures, these data structures are used to generate the intermediate shader code (so it can be processed later by the shader-parser). During this building of the intermediate shader code for a vertex/pixel shader, a mapping is constructed so that higher level effect params can be linked with corresponding lower-level shader params.

The additions here are for constructing analogous mapping between effect results and shader results.

ep_write_param(...) [modified function]

This function receives a pointer to ep_param as input and writes the param's declarations in the intermediate shader code. It also appends a name of every used param to the array used_params.

We teach the function to also record params that are results. It now receives a pointer used_results, an array of strings holding the names of the results used in the effect. If the ep_param passed into the function is marked with the is_result flag, its name is appended to used_results.
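A minimal sketch of this bookkeeping follows. The types sketch_names and sketch_ep_param and all function names are hypothetical stand-ins; the real code uses the libobs dynamic array helpers and the full ep_param struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string arrays used_params / used_results. */
struct sketch_names {
    char **items;
    size_t count;
};

/* Hypothetical, reduced ep_param: only the fields needed here. */
struct sketch_ep_param {
    const char *name;
    bool is_result;
};

/* Appends an owned copy of name; returns false on allocation failure. */
bool sketch_names_push(struct sketch_names *list, const char *name)
{
    assert(list != NULL && name != NULL);

    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (!copy)
        return false;
    memcpy(copy, name, len);

    char **grown = realloc(list->items, (list->count + 1) * sizeof(char *));
    if (!grown) {
        free(copy);
        return false;
    }
    grown[list->count] = copy;
    list->items = grown;
    list->count++;
    return true;
}

void sketch_names_free(struct sketch_names *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
}

/* Every written param is recorded in used_params; params flagged
 * is_result are recorded in used_results as well. */
bool sketch_record_param(const struct sketch_ep_param *param,
                         struct sketch_names *used_params,
                         struct sketch_names *used_results)
{
    if (!sketch_names_push(used_params, param->name))
        return false;
    if (param->is_result)
        return sketch_names_push(used_results, param->name);
    return true;
}
```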

ep_write_func_param_deps(...), ep_write_func_func_deps(...), ep_write_func(...), ep_makeshaderstring(...) [modified functions]

These functions call each other and other lower level functions as the intermediate shader is being built from the data structures representing it. They maintain used_params - an array of strings representing params that were encountered, to be eventually passed down to ep_write_param(...), where a name is appended to the array for every used param.

They are all modified so that used_results, an array of strings representing names of the results used in the effect, can also make its way to ep_write_param(...) function, where it can be updated for every used result.

ep_compile_result(...) [new function]

  • Takes as input a name for a result and a pointer to an allocated gs_effect_result so its fields can be initialized.
  • Finds a corresponding gs_effect_param of the same name in the effect's array of params, and retains the pointer in gs_effect_result being initialized. Because this is all called (and has to be called) after the params are "compiled", the array of params is stable and a pointer into that array is safe.
  • A string of param/result name is also duplicated into gs_effect_result.
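The lookup and linking steps above might look roughly like this. It is a sketch under assumptions: sketch_param and sketch_result are reduced, hypothetical stand-ins for gs_effect_param and gs_effect_result, and the params are held in a plain C array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for gs_effect_param / gs_effect_result. */
struct sketch_param {
    const char *name;
};

struct sketch_result {
    char *name;                 /* owned copy of the shared name */
    struct sketch_param *param; /* points into the stable params array */
};

/* Links a result to the param of the same name. Returns false when no
 * param matches or when the name cannot be duplicated. */
bool sketch_compile_result(struct sketch_param *params, size_t num_params,
                           const char *name, struct sketch_result *result)
{
    assert(params != NULL && name != NULL && result != NULL);

    for (size_t i = 0; i < num_params; i++) {
        if (strcmp(params[i].name, name) != 0)
            continue;

        size_t len = strlen(name) + 1;
        char *copy = malloc(len);
        if (!copy)
            return false;
        memcpy(copy, name, len);

        result->name = copy;
        result->param = &params[i];
        return true;
    }
    return false;
}
```

Keeping a raw pointer into the params array is only safe because the array no longer grows at this point, which is the ordering constraint the text describes.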

ep_compilepass_shaderresults(...) [new function]

This function is responsible for building mapping between higher level handles of effect results and lower level handles of shader results, and also invokes mapping between effect params and effect results.

For every result name in the input array of strings used_results:

  • Fetches a pointer to gs_effect_result from the effect_parser that corresponds to the result name.
  • Calls ep_compile_result(...) so gs_effect_result receives a pointer to corresponding gs_effect_param.
  • Finds a pointer/handle of type gs_sresult_t from gs_shader_t that corresponds to the result name.
  • The mapped pair of result handles, represented by pass_shaderresult, is appended to the pass_results array, which is passed in by pointer.

Analogous to ep_compile_pass_shaderparams(...).

ep_compile_pass_shader [modified function]

This function invokes generation of shader strings for either vertex or pixel shader. While doing so, it also generates mapping of effect parameters to shader parameters, which is used later to make setting of params possible.

We need to provide a mapping of effect results to shader results, very similar to how it is done for params, so the result retrieval works. Mapping between effect results and corresponding effect params will also be indirectly invoked.

  • Add variable used_results, which is an array of strings representing names of results used in the shader. used_results is updated during the calls to ep_makeshaderstring(...).
    • This is analogous to how used_params is worked with.
  • Obtain a pointer variable pass_results of type pass_shaderresult. It is pointing to the data member program_results of ep_pass pointer, and will be modified in-place to retain the results mapping in ep_pass data structure
    • Similar to vertshader_params and pixelshader_params members of ep_pass.
  • Add a call to ep_compilepass_shaderresults(...). used_results and pass_results variables are passed in. pass_results will be updated.
    • Analogous to how ep_compile_pass_shaderparams(...) is called.
    • This will also result in effect results receiving pointers to the corresponding effect params.

Interacting with the effects

In a typical scenario the effect user obtains handles to an effect's parameters, and uses those handles to pass the parameters to the effect, so the values can be passed down to the lower level layers and eventually end up in the actual graphics system and shader machinery.

We have to expand the usage to include the results. Similarly to params, the user will be able to lookup result handles by name. The user will then be able to retrieve result values from the result handles.

The result variables are also connected to param variables of the same name. atomic_uint will require a new type of param that is not int, and also requires some special care by either graphics system when using it as a parameter.

gs_effect_get_result_by_name(...) [new function]

Obtains a gs_eresult_t pointer by name. This handle can then be used by gs_effect_get_atomic_uint_result(...) to obtain an effect result.

Analogous to gs_effect_get_param_by_name(...).

gs_effect_set_atomic_uint(...) [new function]

This allows you to pass a value to an unsigned integer atomic counter in the shader like you would set any other param. Nothing new here, except the new parameter type that will be used with the atomic_uint variable in the effect code. All lower-level details are in other functions.

Analogous to gs_effect_set_int(...).

gs_effect_get_atomic_uint_result(...) [new function]

Given a gs_eresult_t pointer/handle, retrieves the value of the atomic_uint result after drawing with the effect has finished.

Again, no super low-level stuff here; just some memory copies. Very similar to gs_effect_set_xyz(...) functions (where xyz is a data type) except we get instead of set since the results are coming back after the draw, instead of being passed in before the draw.

effect_setval_inline(...) [modified function]

This is a wrapper for updating effect parameters with new values. In addition to some error-checking, the function skips updating uniform data in the shaders when the given value is no different from the previously assigned value, since no update should be necessary in that case.

We customize the behavior to always force updates any time the parameter type is GS_SHADER_PARAM_ATOMIC_UINT. Our atomic counters are both a param and a result, and are treated a little differently from other uniforms by the graphics systems. So, we must ensure the value is always passed in to the shaders before every draw, even if we keep sending the same value.
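The decision can be sketched as a small predicate. The enum and function below are hypothetical and reduced; the real function works on gs_effect_param and its internal value arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-value stand-in for gs_shader_param_type. */
enum sketch_param_type {
    SKETCH_PARAM_INT,
    SKETCH_PARAM_ATOMIC_UINT,
};

/* Decides whether a new value must be sent to the shader. Regular params
 * are skipped when the bytes are unchanged; atomic counters are always sent. */
bool sketch_needs_upload(enum sketch_param_type type,
                         const void *cur, size_t cur_size,
                         const void *val, size_t val_size)
{
    assert(val != NULL);

    if (type == SKETCH_PARAM_ATOMIC_UINT)
        return true;
    if (cur == NULL || cur_size != val_size)
        return true;
    return memcmp(cur, val, val_size) != 0;
}
```

A plausible reading of why the cache check cannot be trusted for counters: the shader changes the counter during the draw, so the last value sent from the CPU side no longer describes what is in graphics memory.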

effect_pass_free(...) [modified function]

This is a cleanup function for gs_effect_pass. We modify it to also cleanup the newly introduced program_results array.

effect_free(...) [modified function]

This is a cleanup function for gs_effect. We modify it to also cleanup the newly introduced results array.

Parsing the intermediate shaders

shader-parser processes the intermediate vertex and pixel shader code generated by the effect-parser. Data structures are generated to represent the shader behavior. This allows reformatting the intermediate shader into GLSL or HLSL code, depending on which graphics subsystem is used.

Our changes here are for supporting results, and for assigning a unique index for each atomic counter variable, so either graphics system will be ready to allocate resources for its specialized handling of the counters.

shader_parser [new members]

This structure holds an instance of a cf_parser which is a c-style parser, and has arrays of data structures for params, structs, samplers, and funcs.

We need an incrementing counter to assign unique, increasing index for each atomic counter we encounter in the intermediate shader code. This struct seems to be a fine place to keep the next index to be assigned, so we add atomic_counter_next_index integer field.

shader_var [new members]

This struct represents a variable in a shader code. We want results to be special kind of variables, so we add is_result flag to the fields. We also add atomic_counter_index so that each atomic_uint variable can be uniquely identified in preparation to be handled by either graphics system. Increasing values will be assigned to indices of counters in the order of the counter's declaration in the intermediate shader.

shader_parser_init(...) [modified function]

This initializes members of shader_parser. We modify it to also initialize atomic_counter_next_index to 0, so the counter enumeration index will begin with 0.

shader_var_init_param(...) [modified function]

This function initializes a shader_var that was previously allocated, assigning its fields based on function parameters.

New function parameters are added to support results and atomic counters; specifically:

  • is_result set to true when the variable is also a result
  • atomic_counter_next_index is an integer that is passed by pointer. Whenever the variable is of type atomic_uint this integer is copied to the field atomic_counter_index in shader_var, and then the next index to be assigned is incremented.
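The index assignment can be sketched as follows. sketch_shader_var is a hypothetical, reduced stand-in for shader_var, and using -1 for variables that are not counters is an assumption of this sketch, not something the fork is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced stand-in for shader_var. */
struct sketch_shader_var {
    const char *type;
    const char *name;
    bool is_result;
    int atomic_counter_index; /* assumption: -1 when not a counter */
};

/* Initializes var and, for atomic_uint variables only, hands out the next
 * free counter index and advances the shared next-index value. */
void sketch_var_init_param(struct sketch_shader_var *var, const char *type,
                           const char *name, bool is_result,
                           int *atomic_counter_next_index)
{
    assert(var != NULL && type != NULL);
    assert(atomic_counter_next_index != NULL);

    var->type = type;
    var->name = name;
    var->is_result = is_result;
    var->atomic_counter_index = -1;

    if (strcmp(type, "atomic_uint") == 0) {
        var->atomic_counter_index = *atomic_counter_next_index;
        (*atomic_counter_next_index)++;
    }
}
```

Because the next index lives outside the variable and is passed by pointer, counters receive 0, 1, 2, ... in declaration order regardless of how many other variables are declared between them.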

sp_parse_param(...) [modified function]

Very similarly to ep_parse_param(...), this calls shader_var_init_param(...) to initialize a shader_var and do some error-checking.

We modify the call to shader_var_init_param(...) so the flag is_result is passed down to it, and also pass the pointer to atomic_counter_next_index of shader_parser as the other new argument.

sp_parse_other(...) [modified function]

Very similar to ep_parse_other(...) and is responsible for parsing anything in the intermediate shader code that is not whitespace, struct or sampler_state. We add special handling for variables of type atomic_uint.

  • Local boolean is_result is added, and is set to true when a variable of atomic_uint type is encountered.
  • is_result is also passed down to sp_parse_param(...).

Graphics API

graphics is "an API-independent graphics subsystem wrapper". Many data types are defined. Among other things, it has function pointers that need to be assigned to functions specific to OpenGL vs Direct3D11 operation.

Here we integrate some new functionality needed to make results work.

New data types

gs_shader_result [new struct]

This is actually defined by either the D3D11 or the OpenGL subsystem. It will contain data that either system will need to interact with an actual shader variable associated with the result. However, the graphics code will pass this data around using the gs_sresult_t typedef wrapper.

See OpenGL and D3D11 implementations of gs_shader_result.

gs_eresult_t [new typedef]

This is a typedef for gs_effect_result so a pointer handle to an effect result can be used by modules that don't need knowledge of the graphics internals.

Analogous to gs_eparam_t.

gs_sresult_t [new typedef]

This is a typedef for gs_shader_result so a pointer handle to a shader result can be used by modules that don't need knowledge of the shader internals.

Analogous to gs_sparam_t.

gs_shader_param_type [extended enum]

This enum for shader param data types is extended to include GS_SHADER_PARAM_ATOMIC_UINT.

New function signatures

The following new function signatures will be declared so they can be defined by either OpenGL or D3D11 subsystems. This will allow platform-agnostic effect abstractions to interact with the actual shader machinery of either subsystem.

gs_shader_get_result_by_name(...) [new function signature]

gs_sresult_t *(*gs_shader_get_result_by_name)(gs_shader_t *program, const char *name);

Fetches a pointer/handle gs_sresult_t by name from a shader program.

See OpenGL and D3D11 implementations.

Analogous to gs_shader_get_param_by_name(...).

gs_shader_set_atomic_uint(...) [new function signature]

void (*gs_shader_set_atomic_uint)(gs_sparam_t *param, unsigned int val);

Passes an unsigned integer value to the atomic counter variable in the shader represented by the gs_sparam_t pointer/handle.

Expands on the existing gs_shader_set_xyz(...) function declarations, where xyz is a data type.

See OpenGL and D3D11 implementations.

gs_shader_get_result(...) [new function signature]

void (*gs_shader_get_result)(gs_sresult_t *result, struct darray *dst);

Copies new result data from the shader into the dst, provided a gs_sresult_t pointer/handle.

See OpenGL and D3D11 implementations.

Naming is analogous to gs_shader_set_val(...) declaration.

New and changed functions

download_results(...) [new function]

This uses the array of mapping pairs pass_shaderresult to transfer data from shader result handles gs_sresult_t (where new data is available after a technique draw is finished) into the associated gs_effect_result, which makes the data available to effect users. It calls gs_shader_get_result(...).

Naming is analogous to upload_parameters(...).
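A reduced sketch of this transfer loop follows. All types and functions here are hypothetical stand-ins: the real code goes through gs_shader_get_result(...) and a darray destination, while this sketch copies a single unsigned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a shader result. A real one would be filled by
 * the graphics subsystem after the draw; here it is just a stored value. */
struct sketch_shader_result {
    unsigned int gpu_value;
};

/* Hypothetical, reduced stand-in for gs_effect_result. */
struct sketch_effect_result {
    unsigned int cur_val;
};

/* Mirrors the idea of pass_shaderresult: one effect result paired with
 * one shader result. */
struct sketch_pass_result {
    struct sketch_effect_result *eresult;
    struct sketch_shader_result *sresult;
};

/* Stand-in for gs_shader_get_result(...): copies the value out. */
void sketch_shader_get_result(const struct sketch_shader_result *sresult,
                              unsigned int *dst)
{
    *dst = sresult->gpu_value;
}

/* Walks the mapped pairs and copies each shader-side value into the
 * effect-side result, where effect users can read it. */
void sketch_download_results(struct sketch_pass_result *pairs, size_t count)
{
    assert(pairs != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        sketch_shader_get_result(pairs[i].sresult,
                                 &pairs[i].eresult->cur_val);
}
```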

gs_technique_end_pass(...) [modified function]

This function is doing some cleanup after a technique draw has finished, and it was a convenient place for us to insert a call to download_results(...).

libobs-opengl

This module provides implementation of the Graphics API for the OpenGL graphics system.

Roughly speaking, most of the changes fall into the categories of writing low-level code to implement the atomic counters with features available in OpenGL, and adding the structure and logic necessary to make the results work.

gl-subsystem

gl-subsystem.h/.cpp does many GL-specific implementations of the Graphics API, including GL-specific implementations of the data structure for shader and program params.

We will modify and add new data structures here to add support for results, with some additions being specific to support atomic_uint variables.

gs_shader_param [new members in libobs-opengl implementation]

This is the GL-specific implementation for the shader_param, that has some low-level details for interacting with the params. In platform-agnostic sections of the code the pointers to gs_shader_param are passed around using gs_sparam_t pointer handles.

Our concept of a result implies being linked with a param of the same name, and using atomic_uint variable as a shader param requires some special handling too. So, we choose this struct to contain low-level data necessary for interacting with the atomic counter params AND results, as well as flags that were similarly added to other structures to support results.

New members are:

  • is_result when true indicates this param also has an associated result
  • buffer_id is the ID of the OpenGL Buffer Object that will be used to interact with the atomic counter. This will be received by calling init_atomic_buffer(...)
  • layout_binding in the GL subsystem this represents the index into the indexed target GL_ATOMIC_COUNTER_BUFFER that represents graphics data for our counters.
  • layout_offset in the GL subsystem we can have multiple atomic counters share the same layout binding but have different offsets.
  • 🚧 TODO: Currently, all our counters get assigned a unique binding and the offset is always 0. A good optimization would be to reuse the same binding and have several counter variables with different offsets into the shared memory block of that binding.

gs_shader_result [libobs-opengl implementation]

This will be our implementation of the shader result for the GL subsystem.

Members are:

  • name: name of the result
  • param: pointer to gs_shader_param, which has fields with low-level details for interacting with atomic counter params/results.
  • cur_value data array where retrieved result data will be stored

gs_shader [new members in libobs-opengl implementation]

This is the GL-specific definition for representing an active shader. One of the members is params, which is an array of gs_shader_param.

We add results, which is an array of gs_shader_result.

program_result [new struct]

This just has a pointer gs_shader_result, so a program result can be associated with a shader result.

Analogous to program_param.

gs_program [new members]

This represents an OpenGL shader program. It has gs_shader pointers to a vertex and pixel shader, as well as an array of program_params.

We add results, which is an array of program_results.

gl-shaderparser

After shader-parser has parsed the intermediate shader code into data structures, gl-shaderparser code will generate the final GLSL shader code from these data structures.

Our changes here are for generating code that utilizes the atomic counters feature of OpenGL/GLSL. atomic_uint type is native to GLSL, but we need to generate the layout(...) qualifier block that is required for atomic_uint variable declarations, which we have omitted from the effect language.

Notably, atomicCounterIncrement(...) (or decrement) effect statements, that we aim to support, are borrowed from GLSL, and require no special translation when regenerated from intermediate shader code into GLSL.

gl_write_var [modified function]

This function writes variable declarations into GLSL, taking into account various qualifiers.

To support atomic_uints in GLSL, the declaration must be preceded by a layout(...) qualifier block. So, our declarations for atomic counters will need to look like this:

layout(binding = 1, offset = 0) uniform atomic_uint myCounter;

We modify the function to insert the layout qualifiers for variable declarations that are of type atomic_uint. We will use atomic_counter_index of shader_var as the source for the binding index.
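Generating such a declaration can be sketched with snprintf. The function name and buffer handling are hypothetical; the real gl_write_var appends to the shader parser's string builder rather than to a fixed buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a GLSL atomic counter declaration into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int sketch_write_atomic_decl(char *buf, size_t size, int binding, int offset,
                             const char *name)
{
    assert(buf != NULL && name != NULL);

    int written = snprintf(
        buf, size,
        "layout(binding = %d, offset = %d) uniform atomic_uint %s;",
        binding, offset, name);

    if (written < 0 || (size_t)written >= size)
        return -1;
    return written;
}
```

In this sketch the binding argument would come from atomic_counter_index, and the offset would stay 0 as long as every counter gets its own binding.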

gl_shader_buildstring [modified function]

Dispatches lower-level functions for putting together the final GLSL code.

We change #version 330 preprocessor declaration to #version 460 to support atomic counters.

🚧 TODO An alternative to bumping the shader version all the way to 460 is activating the GL_ARB_shader_atomic_counters extension. Unfortunately, this seems to be breaking effect compilation for some of the effects that don't use the atomic counters. This alternative can still be explored but the cause of effect breakage will need to be further investigated, or a system for activating the extension only for the effects that require it may need to be introduced.

gl-shader

Defines many functions for activating and interacting with a GLSL shader and an OpenGL shader program.

Our changes are for integrating the flow of data for results, and introducing the low-level code for activation and interaction specific to atomic counters (atomic_uint).

gl_add_result(...) [new function]

Instantiates and stores a new result in the results array of gs_shader. New instance of gs_shader_result gets a copy of the param/result name, so the corresponding param and the result can be linked once the intermediate shader parsing has finished and the array of params is stable.

gl_add_param(...) [modified function]

Initializes a gs_shader_param given a shader_var as input. Textures get some special treatment here. Once initialized, new instance of gs_shader_param is pushed back to the params array of gs_shader. We modify the function as follows:

  • When param is of type GS_SHADER_PARAM_ATOMIC_UINT its layout_binding gets set to the value of atomic_counter_index of shader_var, and its layout_offset is 0.
  • is_result field of shader_var propagates to the new gs_shader_param.
  • in the event is_result is true, gl_add_result(...) is called to instantiate and store a gs_shader_result corresponding to the param's name.

init_atomic_buffer(...) [new function]

Generates a new OpenGL Buffer Object of type GL_ATOMIC_COUNTER_BUFFER, which will be used for writing to and reading from an atomic counter variable. Buffer ID is retained to be stored in gs_shader_param.

We use glGenBuffers(...), glBindBuffer(...), glBufferData(...), glBindBuffer(...), and glBindBufferBase(...) to get an atomic counter buffer initialized.

gs_shader_set_atomic_uint(...) [libobs-opengl implementation]

Implements gs_shader_set_atomic_uint(...) function signature for the OpenGL subsystem.

Very similar to most of the other gs_shader_set_xyz(...) implementations (where xyz is a data type) that just copy data into cur_value data array of gs_shader_param.

gs_shader_get_result_by_name(...) [libobs-opengl implementation]

Implements gs_shader_get_result_by_name(...) function signature for the OpenGL subsystem.

Analogous to gs_shader_get_param_by_name(...) and just searches for the right result with name field that matches.

gs_shader_get_result(...) [libobs-opengl implementation]

Implements gs_shader_get_result(...) function signature for the OpenGL subsystem.

Just copies data into destination pointer from the cur_value data array of gs_shader_result.

gs_shader_destroy(...) [modified function]

This is a cleanup function for gs_shader and we modify it to also cleanup the results array of gs_shader_result.

program_set_param_data(...) [modified function]

This function works on getting param values/data into uniform variables of an active GLSL shader. For most supported data types this means calling glUniformXyz(...) function, with some specialized work needed for textures.

The atomic counter variables are also special and we cannot use glUniformXyz(...) style functions to set values before draw. When param type is GS_SHADER_PARAM_ATOMIC_UINT we add another specialization to call glBindBuffer(...), glBindBufferBase(...) and glBufferSubData(...) functions so the value can make its way to atomic_uint variable of interest in the shader, before the draw.

assign_program_param(...) [modified function]

This function finds the uniform locations of a given gs_shader_param by calling glGetUniformLocation(...) with the param's name, and then pushes the given shader param to the params array of gs_program.

Because the mechanisms used for initializing and using the atomic counters are different from regular uniforms, we modify the function to skip finding and assigning a uniform location any time a param is of type GS_SHADER_PARAM_ATOMIC_UINT.

assign_program_shader_results(...) [new function]

This function works on connecting each constructed gs_shader_result with the respective gs_shader_param of the same name. It also constructs a new program_result linked with the shader result, and pushes it to results array of gs_program.

Naming is analogous to assign_program_shader_params(...).

assign_program_results(...) [new function]

This function calls assign_program_shader_results(...) for both vertex and pixel shaders of gs_program.

Analogous to assign_program_params(...).

gs_program_create(...) [modified function]

After GLSL shaders were compiled this function creates a new shader program, attaches the shaders to the program, and links it. After linking it calls assign_program_params(...) and assign_program_attribs(...), so we also add a call to assign_program_results(...).

gs_program_destroy(...) [modified function]

This is a cleanup function for gs_program and we modify it to also destroy the results array containing program_results.

libobs-d3d11

This module provides implementation of the Graphics API for the Direct3D11 graphics system.

Roughly speaking, most of the changes fall into the categories of implementing atomic counters using Direct3D's UAV variables system, and adding the structure and logic necessary to make the results work.

d3d11-subsystem

d3d11-subsystem.h/.cpp does many D3D11-specific implementations of the Graphics API, including D3D11-specific implementations of the data structure for shader and program params.

We will modify and add new data structures here to add support for results, with some additions being specific to implement atomic_uint variables using the UAV system.

gs_shader_param [new members in libobs-d3d11 implementation]

This is the D3D11-specific implementation for the shader_param, that has some low-level details for interacting with the params. In platform-agnostic sections of the code the pointers to gs_shader_param are passed around using gs_sparam_t handle.

Our concept of a result implies being linked with a param of the same name, and using a UAV variable as a shader param requires some special handling too. So, we choose this struct to contain low-level data necessary for interacting with the atomic counter params AND results, as well as flags that were similarly added to other structures to support results.

New members are:

  • is_result when true indicates this param also has an associated result
  • atomicCounterIndex in the D3D11 subsystem represents the index into the UAV buffer of unsigned integers where the variable resides. The value is copied from atomic_counter_index of shader_var.

Situational repurposing of an existing member:

  • The pos member is being used to store the byte address of the variable in the memory chunk used to set const/uniform variables. For UAV counter variables we will reuse the same variable as the byte address into the UAV memory chunks that we send and receive from the shader. Since each counter variable will be 4 bytes, anything that is a counter will have its pos assigned to atomicCounterIndex * 4.
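The index-to-offset relationship described above can be sketched as follows. This is a minimal illustration, not the fork's code: CounterParamSketch, make_counter_param and counter_index_from_pos are hypothetical names, and the sketch assumes every counter param is also flagged as a result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the D3D11 gs_shader_param;
// only the members discussed above are modeled.
struct CounterParamSketch {
    bool is_result;                   // param also has an associated result
    std::uint32_t atomicCounterIndex; // index into the UAV buffer of uints
    std::size_t pos;                  // byte offset into the UAV memory chunk
};

// Each counter is one 32-bit unsigned integer, so pos = index * 4.
inline CounterParamSketch make_counter_param(std::uint32_t atomic_counter_index)
{
    CounterParamSketch p;
    p.is_result = true; // assumption: counters are always params AND results
    p.atomicCounterIndex = atomic_counter_index;
    p.pos = static_cast<std::size_t>(atomic_counter_index) * sizeof(std::uint32_t);
    return p;
}

// Inverse mapping: recover the counter index from a byte offset.
inline std::uint32_t counter_index_from_pos(std::size_t pos)
{
    assert(pos % sizeof(std::uint32_t) == 0 && "counter offsets are 4-byte aligned");
    return static_cast<std::uint32_t>(pos / sizeof(std::uint32_t));
}
```

For example, a counter with index 3 would get pos 12 under this scheme.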

gs_shader_result [libobs-d3d11 implementation]

This will be our implementation of the shader result for the D3D11 subsystem.

Members are:

  • name: name of the result
  • param: pointer to gs_shader_param, which has fields with low-level details for interacting with UAV counter params/results.
  • curValue data array where retrieved result data will be stored

gs_shader [new members in libobs-d3d11 implementation]

This is the D3D11-specific definition for representing an active shader. Among other things, it has params member, which is a vector of gs_shader_param, which partakes in passing the param values to the shader before the draw. So, we add results, which is a vector of gs_shader_result, and will partake in fetching the results after the draw.

The class also holds const data size and descriptor used for initializing the const/uniform buffer. So, we introduce UAV data size and several descriptor variables used to initialize the UAV buffer and its use. The descriptors are initialized in the gs_shader::BuildUavBuffer(...) (called from the vertex and pixel shader constructors) and are reused in gs_vertex/pixel_shader::Rebuild(...).

  • uavBd is a D3D11_BUFFER_DESC structure for describing the UAV buffer. Gets passed down to ID3D11Device::CreateBuffer(...) to create the UAV buffer in graphics memory.
  • uavTxfrBd is a D3D11_BUFFER_DESC structure for describing the UAV transfer buffer. Gets passed down to ID3D11Device::CreateBuffer(...) to create a transfer buffer for transferring data back and forth between the system memory and the UAV graphics memory.
    • analogous to bd which is a D3D11_BUFFER_DESC used in initializing the constants buffer that transfers uniforms' data to the graphics memory.
  • uavViewDesc is a D3D11_UNORDERED_ACCESS_VIEW_DESC structure for describing the UAV view. Gets passed down to ID3D11Device::CreateUnorderedAccessView(...) to create the UAV view for sending the counter data into the shader.

Finally, the class holds pointer to a D3D11 interface for the buffer used in setting const/uniform data. So, we add pointers to the two buffers and a UAV view used in setting and receiving UAV data. These are created in the gs_shader::BuildUavBuffer(...) (called from the vertex and pixel shader constructors) and have to be reinitialized in the event of gs_vertex/pixel_shader::Rebuild(...).

  • uavBuffer is a pointer to ID3D11Buffer representing the UAV buffer in graphics memory. Obtained by calling ID3D11Device::CreateBuffer(...) with uavBd as one of the parameters.
  • uavTxfrBuffer is a pointer to ID3D11Buffer for sending and receiving the UAV data. Obtained by calling ID3D11Device::CreateBuffer(...) with uavTxfrBd as one of the parameters.
    • analogous to constants buffer pointer for setting uniforms
  • uavView is a pointer to ID3D11UnorderedAccessView and is used to send data to the UAV region in graphics memory. Obtained by calling ID3D11Device::CreateUnorderedAccessView(...) with uavBuffer and uavViewDesc as arguments.

device_draw(...) [modified function in libobs-d3d11 implementation]

This function encapsulates much of a typical draw activity, including loading vertex buffer, updating blend, raster, Z-stencil states, view+proj matrix, invoking UploadParams(...) on shader parameters to vertex and pixel shaders, and finally drawing primitives. It is called from gs_draw(...).

As we are introducing the concept of results that become available after the draw, we add a call to gs_shader::DownloadResults(...) on the vertex and pixel shaders, after the primitive draw has finished.

d3d11-shaderprocessor

After shader-parser has parsed the intermediate shader code into data structures, d3d11-shaderprocessor code will generate the final HLSL shader code from these data structures.

Our changes here are for generating code that parses the added effect syntax for atomic counters, and utilizes the UAV variables of D3D to implement atomic counters. Unfortunately, in HLSL there is no direct analogue to atomic_uint type and no atomicCounterIncrement(...) function, so we translate our intermediate shader code to become other things in HLSL that needs to be generated:

  • RWStructuredBuffer<uint> __uavBuffer : register(u1); is used to declare a UAV memory chunk in the shader that we will use for storing the atomic counter variables. Such a buffer is added when UAV counters are encountered in the effect code, and the uavBuffer handle of gs_shader is connected to this buffer when gs_shader::BuildUavBuffer(...) or gs_vertex/pixel_shader::Rebuild(...) is called.
  • We cannot assign and access variables inside our __uavBuffer by name, but we can index into it like an array of unsigned 32-bit integers. We will use the atomicCounterIndex member of gs_shader_param as an index into the buffer (which was inherited from atomic_counter_index of shader_var).
  • InterlockedAdd(...) will be used to replace atomicCounterIncrement(...) added to the effect language.
  • So, an effect statement atomicCounterIncrement(varName) will need to be translated into InterlockedAdd(__uavBuffer[0], 1), where 0 happens to be the atomic counter index for varName, and 1 is because "increment" is equivalent to "add 1 and assign".
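The translation rule in the list above can be illustrated with a small string-level sketch. It is not the fork's ShaderProcessor code: translate_increment and the name-to-index map are hypothetical, and the real implementation works on parser tokens rather than whole strings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: maps an effect-level counter name to the HLSL
// statement that increments the matching slot of __uavBuffer.
// counterIndices stands in for the atomic_counter_index values that the
// real code looks up on shader_var.
inline std::string translate_increment(
    const std::string &varName,
    const std::map<std::string, unsigned> &counterIndices)
{
    auto it = counterIndices.find(varName);
    assert(it != counterIndices.end() && "unknown atomic counter name");
    // "increment" becomes "add 1" on the indexed 32-bit slot.
    return "InterlockedAdd(__uavBuffer[" + std::to_string(it->second) + "], 1)";
}
```

With varName mapped to index 0, the helper yields InterlockedAdd(__uavBuffer[0], 1), matching the example in the text.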

ShaderProcessor::SeekUntil(...) [new function]

This protected utility function is added for iterating through cf_tokens of ShaderParser until a token matching a string is found. Used by ShaderProcessor::PeekAndSkipAtomicUint(...) and ShaderProcessor::ReplaceAtomicIncrement(...) functions.

ShaderProcessor::SeekWhile(...) [new function]

This protected utility function is added for iterating through cf_tokens of ShaderParser for as long as tokens match a string. Used by ShaderProcessor::PeekAndSkipAtomicUint(...) and ShaderProcessor::ReplaceAtomicIncrement(...) functions.
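A rough sketch of the two seeking helpers, assuming tokens are plain strings and positions are indices. The real functions walk cf_token entries of the parser, so the Tokens alias, the signatures and the return convention here are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: tokens are plain strings in this sketch.
using Tokens = std::vector<std::string>;

// Advance from `start` until a token equal to `str` is found.
// Returns the index of that token, or tokens.size() if none matches.
inline std::size_t seek_until(const Tokens &tokens, std::size_t start,
                              const std::string &str)
{
    assert(start <= tokens.size());
    std::size_t i = start;
    while (i < tokens.size() && tokens[i] != str)
        ++i;
    return i;
}

// Advance from `start` for as long as tokens equal `str`.
// Returns the index of the first token that does not match.
inline std::size_t seek_while(const Tokens &tokens, std::size_t start,
                              const std::string &str)
{
    assert(start <= tokens.size());
    std::size_t i = start;
    while (i < tokens.size() && tokens[i] == str)
        ++i;
    return i;
}
```

Under these assumptions, skipping a declaration amounts to seeking until the terminating ";" token, and skipping whitespace amounts to seeking while tokens equal " ".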

ShaderProcessor::PeekAndSkipAtomicUint(...) [new function]

This function is added to eat up all tokens that are part of uniform atomic_uint myVar; declarations. As mentioned before, we will be using a numeric index into the __uavBuffer of the shader code instead of counter variable names, so all tokens that are part of declarations for atomic_uints will be completely ignored - no output will be produced. Returns true when one such declaration was encountered and swallowed up.

ShaderProcessor::ReplaceAtomicIncrement(...) [new function]

This works on translating the atomicCounterIncrement(...) statement added to the effect language into InterlockedAdd(...) statements of HLSL. In order to translate variable name of intermediate shader code into a numeric index that can be used with __uavBuffer - the params array of shader_parser is scanned for shader_var with the matching name and the atomic_counter_index of that shader variable is used.

ShaderProcessor::BuildString(...) [modified function]

This one obtains tokens from the parser of the intermediate shader code, and replaces keywords of the effect language with things that actually exist in HLSL, so the final HLSL string is constructed. Our additions of the atomic counter syntax are no exception to these needs, and even require some additional handling. Changes are:

  • ShaderProcessor::ReplaceAtomicIncrement(...) is called whenever the atomicCounterIncrement keyword is encountered, and will navigate tokens to replace the entire increment statement.
  • We also add a call to ShaderProcessor::PeekAndSkipAtomicUint(...), which is called after all other keywords are checked for a need of conversion. If the function returns true - this means an atomic_uint declaration was encountered, and all tokens of the declaration statement will be skipped in the HLSL.
  • As mentioned before, we need to add RWStructuredBuffer<uint> __uavBuffer : register(u1); into output to declare the UAV memory block for the counters, but only for the shaders that have atomic counter variables. So, instead of stringstream output function variable there are now string streams tempOutput and finalOutput. tempOutput is being written to as tokens are being processed, just like output was before. During processing of the tokens, we will learn if the UAV block will be needed or not. And if it is needed, in the finalOutput we will insert RWStructuredBuffer<uint> __uavBuffer : register(u1); statement after static const bool obs_glsl_compile = false but before the rest of the code, which has been constructed in tempOutput. Now finalOutput stream has the final HLSL code to be copied into outputString, which is the function argument that is passed by reference. Phew.
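The two-stream arrangement can be sketched as below. This is a simplified illustration under stated assumptions: assemble_hlsl is a hypothetical name, the body string plays the role of tempOutput, and needsUav stands for whatever the token pass learned about atomic counters.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: `body` is the code accumulated while processing
// tokens (the role of tempOutput); the return value is what would end
// up in finalOutput.
inline std::string assemble_hlsl(const std::string &body, bool needsUav)
{
    // Sanity check: a body that indexes __uavBuffer needs the declaration.
    assert(needsUav || body.find("__uavBuffer") == std::string::npos);

    std::stringstream finalOutput;
    finalOutput << "static const bool obs_glsl_compile = false;\n\n";
    if (needsUav) {
        // Declared only for shaders that actually use atomic counters.
        finalOutput << "RWStructuredBuffer<uint> __uavBuffer : register(u1);\n\n";
    }
    finalOutput << body;
    return finalOutput.str();
}
```

Deferring the header until the token pass is complete is what makes the second stream necessary: the decision to emit the declaration is only known after the body has been built.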

d3d11-shader

Defines much of the functionality for initializing and interacting with an active HLSL shader.

Changes here will be for adding the needed structure and logic for results, initializing all the D3D11 device and context handles needed for interacting with the UAV buffer containing atomic counter variables, and using them.

gs_shader_get_result_by_name(...) [libobs-d3d11 implementation]

Implements gs_shader_get_result_by_name(...) function signature for the D3D11 subsystem.

Analogous to gs_shader_get_param_by_name(...) and just searches for the right result with name field that matches.

gs_shader_set_atomic_uint(...) [libobs-d3d11 implementation]

Implements gs_shader_set_atomic_uint(...) function signature for the D3D11 subsystem.

Copies data from a const void* data pointer into curValue vector of gs_shader_param, resizing the vector when necessary.

gs_shader_get_result(...) [libobs-d3d11 implementation]

Implements gs_shader_get_result(...) function signature for the D3D11 subsystem.

Just copies data into destination pointer from the curValue data array of gs_shader_result.

gs_shader::BuildUavBuffer(...) [new function]

Iterates through the vector of gs_shader_param. For each param that is also a result, uses pos member to determine the largest value of the mapping index, so the required size of the UAV block is known. Once this size, uavSize, is known, and is not zero, initializes the uavBd, uavTxfrBd, and uavViewDesc descriptors, and uses them to create the uavBuffer and uavTxfrBuffer, and uavView of the gs_shader, which are used for sending data to and from the UAV graphics memory.

Analogous to the behavior of gs_shader::BuildConstantBuffer(...), except there is some additional complexity due to UAV use and support for bi-directional data flow.
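The size calculation at the start of BuildUavBuffer might look roughly like this. The struct and function names are hypothetical, and the exact formula (largest byte offset plus one 4-byte counter) is an assumption inferred from the description, not taken from the fork's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced view of gs_shader_param for this sketch.
struct UavParamSketch {
    bool is_result;  // only params that are also results live in the UAV block
    std::size_t pos; // byte offset into the UAV memory chunk
};

// Returns the number of bytes needed to hold every counter, or 0 when
// the shader has no counters (in which case no UAV buffer is created).
inline std::size_t compute_uav_size(const std::vector<UavParamSketch> &params)
{
    std::size_t uavSize = 0;
    for (const UavParamSketch &p : params) {
        if (!p.is_result)
            continue; // ordinary uniforms go to the constants buffer
        assert(p.pos % sizeof(std::uint32_t) == 0);
        uavSize = std::max(uavSize, p.pos + sizeof(std::uint32_t));
    }
    return uavSize;
}
```

A shader with counters at byte offsets 0 and 8 would need 12 bytes under this assumption, while a shader without counters yields 0 and skips UAV setup.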

gs_vertex/pixel_shader::Rebuild(...) [modified functions]

These functions reconstruct a vertex or pixel shader, maintaining the descriptors for const/uniform buffer but reinstantiating the const buffer and making sure all params work in the new instance of the shader.

Similarly to how const data is handled, we maintain the uavBd, uavTxfrBd, and uavViewDesc descriptors, but we create new instances of the uavBuffer and uavTxfrBuffer, and uavView of gs_shader, which are required for sending data to and from the UAV graphics memory.

gs_shader::UpdateParam(...) [modified function]

This function adjusts constData data array, passed by reference, in response to an input gs_shader_param. The pos member of each param dictates where in the const data (uniform) memory chunk the param's data will go. When new or modified param data is due to be inserted into the chunk of const data, a pass-by-reference boolean uploadConst is set to true, to flag that the const data will need to be uploaded to refresh the uniform variable(s).

We mirror this arrangement as we add support for sending UAV memory chunks to the shaders, triggered by the counter params+results that need it:

  • We add uavData data array that is passed into the function by reference. It will be adjusted in response to the input param when it is of type GS_SHADER_PARAM_ATOMIC_UINT. Once again, pos member of each param is used as a mapping index, except this time it's indexing into the UAV data instead of the const/uniform data.
  • We add uploadUav, a boolean passed by reference, that will need to be set to true whenever an input gs_shader_param is of type GS_SHADER_PARAM_ATOMIC_UINT. We always force UAV data refresh anytime a UAV variable is encountered - even if the UAV memory chunk appears unchanged. We want UAV counters to be set to predictable values as we begin drawing with the effect.
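A minimal sketch of the UAV branch described in the list above, assuming param values are stored as raw bytes. The type enum, struct and function names are hypothetical; only the behavior (write at pos, always set the upload flag) follows the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the param type enum and gs_shader_param.
enum SketchParamType { SKETCH_FLOAT, SKETCH_ATOMIC_UINT };

struct UpdateParamSketch {
    SketchParamType type;
    std::size_t pos;                    // byte offset into the UAV chunk
    std::vector<std::uint8_t> curValue; // raw bytes of the current value
};

// Mirrors the const-data path: place the counter's bytes at `pos` in
// uavData and flag that the UAV chunk must be uploaded.
inline void update_param_uav(const UpdateParamSketch &param,
                             std::vector<std::uint8_t> &uavData,
                             bool &uploadUav)
{
    if (param.type != SKETCH_ATOMIC_UINT)
        return; // non-counter params are handled by the const path
    assert(param.curValue.size() == sizeof(std::uint32_t));
    const std::size_t end = param.pos + param.curValue.size();
    if (uavData.size() < end)
        uavData.resize(end);
    std::memcpy(uavData.data() + param.pos, param.curValue.data(),
                param.curValue.size());
    // Always refresh, even if the bytes are unchanged, so counters start
    // from predictable values on every draw.
    uploadUav = true;
}
```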

gs_shader::UploadParams(...) [modified function]

This function initializes a local variable constData, which is data vector, and then calls gs_shader::UpdateParam(...) on every param of gs_shader until constData contains all the values to be assigned to the uniforms. If new or modified param data was encountered during the updates - the uploadConst flag gets set, and the function uses ID3D11DeviceContext::Map(...) to map the constants buffer, so the constructed data block in constData is copied and ends up in this const/uniform buffer.

We mirror this structure as we introduce UAV counter results. We add uavData data array and uploadUav boolean, and we also process these by gs_shader::UpdateParam(...). If any of the params were atomic counter results, the uploadUav flag is set and:

gs_shader::DownloadResults(...) [new function]

After the draw we need to download the UAV data back:

  • Add a local variable, the data vector resultsData, to receive the UAV memory chunk.
  • As we work in the opposite direction, we call ID3D11DeviceContext::CopyResource(...) first to copy data from uavBuffer in the graphics memory to the uavTxfrBuffer in the system memory.
  • Then use ID3D11DeviceContext::Map(...) to copy data from uavTxfrBuffer to the resultsData vector.
  • Finally, copy data from the resultsData vector to individual gs_shader_result, using the pos member of each result as a mapping index.
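The last step, distributing the downloaded chunk to individual results, can be sketched on the CPU side as follows. The D3D11 calls (CopyResource, Map) are omitted; ResultSketch, scatter_results and result_as_uint are hypothetical names, and pos is stored directly on the result here although the text reaches it through the linked param.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, reduced view of gs_shader_result for this sketch.
struct ResultSketch {
    std::string name;
    std::size_t pos;                    // byte offset of the counter
    std::vector<std::uint8_t> curValue; // retrieved result bytes
};

// Copy each counter's 4 bytes out of the downloaded chunk into the
// matching result, using pos as the mapping index.
inline void scatter_results(const std::vector<std::uint8_t> &resultsData,
                            std::vector<ResultSketch> &results)
{
    for (ResultSketch &r : results) {
        assert(r.pos + sizeof(std::uint32_t) <= resultsData.size());
        const auto first =
            resultsData.begin() + static_cast<std::ptrdiff_t>(r.pos);
        r.curValue.assign(
            first, first + static_cast<std::ptrdiff_t>(sizeof(std::uint32_t)));
    }
}

// Convenience: reinterpret a result's bytes as an unsigned counter value.
inline std::uint32_t result_as_uint(const ResultSketch &r)
{
    assert(r.curValue.size() == sizeof(std::uint32_t));
    std::uint32_t value = 0;
    std::memcpy(&value, r.curValue.data(), sizeof(value));
    return value;
}
```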

gs_pixel_shader::gs_pixel_shader(...) [modified constructor]

This is a constructor for a class that represents an active pixel shader, and is derived from the gs_shader base class. It instantiates and makes calls to d3d11-shaderprocessor, so the previously generated intermediate shader code can be parsed again into useful data structures for interacting with the shader. The final HLSL code is assembled and ID3D11Device::CreatePixelShader(...) is called to create the shader and keep a handle to it inside gs_pixel_shader.

Here, the only modification was changing shader model from ps_4_0 to ps_5_0 in order to support UAV buffer.

TODO: Should gs_vertex_shader constructor also be changed to use vs_5_0 so atomic counters can be supported in vertex shaders? Besides consistency, are there benefits to doing so? Is there a potential useful application for vertex shaders? Are there benefits to not doing it? Should we change it anyway since we are using higher pixel shader model and it's unlikely one will be supported by the hardware/drivers and not the other?
