Using host functions with graphs and streams - rshipley160/learn-cuda GitHub Wiki

Previous: Creating graphs explicitly

Introduction and Problem Setup

Sometimes there are things that just don't make sense to run on the GPU, but that nonetheless need to be run in the correct order of completing other GPU tasks, such that they would need to be triggered by either stream or graph communication, depending on the structure you have used for your GPU tasks.

To illustrate this, we are going to modify the quadratic equation solver that was introduced in the tutorial on Events so that it prints out "Discriminant has been calculated!" as soon as the sum under the square root has been calculated.

Technically you can print to the screen using the GPU, but doing so requires extra setup and wastes GPU resources, so for our purposes this is enough motivation to use a host function instead.

In the event-based version, this means calling the host function from one of the streams after the sum has completed

    elementwiseProduct<<<gridSize, BLOCK_SIZE, 0, bMinus>>>(b, b, sol1, numElements);
    elementScalarProduct<<<gridSize, BLOCK_SIZE, 0, bPlus>>>(a, c, -4, sol2, numElements);

    // Use events to ensure completion
    cudaEventRecord(bMinusComplete, bMinus);
    cudaStreamWaitEvent(bPlus, bMinusComplete);

    elementwiseSum<<<gridSize, BLOCK_SIZE, 0, bPlus>>>(sol1, sol2, sol1, numElements);
            
    // Trigger host function here

    elementwiseSqrt<<<gridSize, BLOCK_SIZE, 0, bPlus>>>(sol1, tmp, numElements);

In the graph version this means adding a node that calls the host after the node containing the sum kernel.

    cudaGraphAddKernelNode(&determinantSumNode, quadraticGraph, nodeDependencies.data(), 2, &determinantSumParams);

    nodeDependencies.clear();
    nodeDependencies.push_back(determinantSumNode);

    // Add host node in here and update dependencies

    cudaGraphAddKernelNode(&determinantSqrtNode, quadraticGraph, nodeDependencies.data(), 1, &determinantSqrtParams);

    nodeDependencies.clear();
    nodeDependencies.push_back(determinantSqrtNode);

Host Functions in Streams

To launch a host function on a stream we use

cudaLaunchHostFunc(cudaStream_t stream, cudaHostFn_t fn, void* userData)

As you can see, this takes a stream, which is the stream to launch the task on, a function, and a void pointer for user data, which is used to pass the parameters to the host function once it is ready to run.

The function parameter is simply the name of the function we want the stream to use, however, we have to define the function a little bit differently than usual in order for it to be useable from a stream context.

To show this, here is our discriminantComplete function that we will be calling to signal that the discriminant of the quadratic equation has been calculated

CUDART_CB void discriminantComplete(void *userData) {
    printf("Discriminant complete!\n");
}

There are three main properties that any function needing to be triggered by the GPU needs to satisfy:

  • It must have the CUDART_CB specifier
  • It must be of type void
  • It must take single void pointer parameter

In our case we can just discard whatever parameters are passed to the function since ours will print the same message regardless, but for functions that do need parameters, how do we change them to accept all of the parameters from a single pointer?

For parameters that are all of the same type, you could allocate an array of that type, fill it with the parameters and then cast it as a void pointer before it is passed to the host function, but the easier and more general purpose solution is to use a struct.

For example, if we wanted to run a host function which prints out the integer and float it is passed, we could set it up like so

struct myStruct {
    float myFloat;
    int myInt;
};
 
void CUDART_CB printValues (void *userData) {
    myStruct values = *((myStruct *)userData);

    printf("%d\n",values.myInt);
    printf("%f\n",values.myFloat);
}

Then all one has to do to run the function is assign the values of the struct members and cast the reference &values as a void pointer before passing it to the host function.

With the setup complete, we can insert a call to cudaLaunchHostFunc on the bPlus stream right before the square root function executes, and declare that the discriminant has been completed as you would expect it to.

    elementwiseSum<<<gridSize, BLOCK_SIZE, 0, bPlus>>>(sol1, sol2, sol1, NUM_ELEMENTS);

    cudaLaunchHostFunc(bPlus, discriminantComplete, NULL);

    elementwiseSqrt<<<gridSize, BLOCK_SIZE, 0, bPlus>>>(sol1, tmp, NUM_ELEMENTS);

The full code for the stream host function launch is available here.

Host Functions in Graphs

To launch a host function from a graph, the first step is the same as if we were going to launch the function from a stream: we have to redefine the host function in the format shown in the last section.

Since we've already done that for our discriminant function, let's add the host node to our quadratic solver graph from the explicit graph creation tutorial.

To actually create the host node in the graph, we create a params object that holds the function and the data, and then we can use that information to add a host node to the graph, right between the discriminant sum and square root nodes.

    cudaGraphAddKernelNode(&determinantSumNode, quadraticGraph, nodeDependencies.data(), 2, &determinantSumParams);

    nodeDependencies.clear();
    nodeDependencies.push_back(determinantSumNode);

    cudaGraphNode_t hostNode;
    cudaHostNodeParams hostParams = {0};
    hostParams.fn = discriminantComplete;
    hostParams.userData = NULL;

    cudaGraphAddHostNode(&hostNode, quadraticGraph, nodeDependencies.data(), 1, &hostParams);

    nodeDependencies.clear();
    nodeDependencies.push_back(hostNode);

    cudaGraphAddKernelNode(&determinantSqrtNode, quadraticGraph, nodeDependencies.data(), 1, &determinantSqrtParams);

As with the usage of parameters from a stream-triggered host function, you can also use multiple parameters in host functions called from graphs using the same three steps.

  • Define a struct for your parameters
  • Create a new instance of it and fill with values
  • Set the .userData of the node's parameters to be the void pointer reference of your struct - hostParams.userData = (void*)&myStruct;

Next: Graph node glossary