OpenGL on Nvidia K20 - shawfdong/hyades GitHub Wiki

Each of the 8 GPU nodes in Hyades contains an Nvidia Tesla K20 GPU Accelerator, a server-class GPU. For Tesla K20X and K20 devices, OpenGL support is an opt-in feature via the GPU Operation Mode (GOM). On Tesla K40 and newer devices, OpenGL capabilities are always enabled[1].

Enabling hardware-accelerated OpenGL

We'll use the NVIDIA System Management Interface (nvidia-smi) to enable hardware-accelerated OpenGL on Tesla K20 devices.

# nvidia-smi --help

Query the GPU Operation Mode (GOM) currently in use:

# nvidia-smi --query-gpu=gom.current --format=csv,noheader
Compute

By default, only compute tasks are allowed on Tesla K20 devices. Graphics operations are not allowed.

Enable hardware-accelerated OpenGL by putting the GPU into the All On mode:

# nvidia-smi --gom=0
GOM changed to "All On" for GPU 0000:03:00.0.
All done.
Reboot required.

Reboot the GPU node.

GOM is now All On:

# nvidia-smi --query-gpu=gom.current --format=csv,noheader
All On

Installing X server

OpenGL doesn't exist until we create an OpenGL context, and creating an OpenGL context is platform-specific. On Linux, it is typical to use GLX. GLX provides the interface connecting OpenGL and the X Window System: it enables programs wishing to use OpenGL to do so within a window provided by the X Window System. Hardware acceleration for indirect GLX is provided by AIGLX (Accelerated Indirect GLX), which loads the Mesa DRI driver inside the X server.

Install X Window System:

# yum grouplist
# yum groupinstall "X Window System"

Install GNOME Desktop Environment:

# yum groupinstall "GNOME Desktop Environment"

Because there was no X Window System on the GPU nodes, the original installation of the Nvidia driver didn't include the X Window drivers. Let's re-install the Nvidia driver:

# /pfs/sw/NVIDIA/NVIDIA-Linux-x86_64-340.58.run

Now there are /usr/lib64/xorg/modules/drivers/nvidia_drv.so and /usr/lib64/xorg/modules/libnvidia-wfb.so.340.58.

Configuring X server

There is no display connector on Tesla K20, so we'll set up a headless X server to manage the OpenGL contexts. Nvidia provides an X Configuration Tool (nvidia-xconfig) to manipulate X configuration files. We'll use it to configure the X server for headless operation.

# nvidia-xconfig -h

Query GPU info:

# nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
  Name      : Tesla K20m
  UUID      : GPU-ad6ee1ac-074f-6950-ae49-8d84ad7600df
  PCI BusID : PCI:3:0:0

  Number of Display Devices: 0

Create the X configuration file /etc/X11/xorg.conf:

# nvidia-xconfig --busid=PCI:3:0:0 --use-display-device=none

By default, only the console owner can start the X server on display 0. We'll allow any user to do so, by replacing the following line in /etc/pam.d/xserver:

auth       required	pam_console.so

with:

auth       sufficient	pam_permit.so

Run:

# sed -i 's/required\tpam_console.so/sufficient\tpam_permit.so/' /etc/pam.d/xserver

Test the headless X server:

$ Xorg :0 2> /dev/null &
$ export DISPLAY=:0

$ xrandr 
Screen 0: minimum 8 x 8, current 1280 x 1024, maximum 16384 x 16384

$ glxinfo
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
...
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
...
GLX version: 1.4
...
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Tesla K20m/PCIe/SSE2
OpenGL version string: 4.4.0 NVIDIA 340.58
OpenGL shading language version string: 4.40 NVIDIA via Cg compiler
...

$ glxgears 
101360 frames in 5.0 seconds = 20271.846 FPS
105496 frames in 5.0 seconds = 21099.188 FPS
105475 frames in 5.0 seconds = 21094.883 FPS
...

OpenGL libraries

The following OpenGL libraries are installed on the GPU nodes.

GLU

GLU (OpenGL Utility Library) consists of a number of functions that use the base OpenGL library to provide higher-level drawing routines from the more primitive routines that OpenGL provides. GLU is provided by the mesa-libGLU & mesa-libGLU-devel packages of RHEL/CentOS 6.

GLEW

GLEW (OpenGL Extension Wrangler Library) is a cross-platform C/C++ library that helps in querying and loading OpenGL extensions. GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform. All OpenGL extensions are exposed in a single header file, which is machine-generated from the official extension list. GLEW is provided by the glew & glew-devel packages in the EPEL repository.

freeglut

freeglut is a full replacement for GLUT (OpenGL Utility Toolkit). GLUT (and hence freeglut) allows the user to create and manage windows containing OpenGL contexts on a wide range of platforms, and to read mouse, keyboard and joystick input. freeglut is provided by the freeglut & freeglut-devel packages of RHEL/CentOS 6.

To link with freeglut, e.g.:

$ gcc glut.c -o glut.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm

GLFW

GLFW is a newer alternative to GLUT. It is an open-source, multi-platform library for creating windows with OpenGL contexts and receiving input and events.

Download GLFW 3.1:

$ cd /scratch/
$ wget http://downloads.sourceforge.net/project/glfw/glfw/3.1/glfw-3.1.zip
$ unzip glfw-3.1.zip

Build and install GLFW 3.1[2]:

$ cd glfw-3.1
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=/pfs/sw/serial/gcc/glfw-3.1 ..
$ make all install

GLFW 3.1 is installed at /pfs/sw/serial/gcc/glfw-3.1. It has a lot of dependencies:

$ export PKG_CONFIG_PATH=/pfs/sw/serial/gcc/glfw-3.1/lib/pkgconfig
$ pkg-config --static --libs glfw3
-L/pfs/sw/serial/gcc/glfw-3.1/lib -lglfw3 -lrt -lXrandr -lXinerama -lXi -lXcursor -lGL -lm -ldl -lXrender -ldrm -lXdamage -lX11-xcb -lxcb-glx -lXxf86vm -lXfixes -lXext -lX11 -lpthread -lxcb -lXau

So we are better off using the pkg-config tool to build GLFW applications[3]. For instance, to compile the example GLFW code:

$ gcc -o GLFW_example.x GLFW_example.c \
  `pkg-config --cflags glfw3` \
  `pkg-config --static --libs glfw3`

OpenGL Compute Shader

Nvidia K20 supports OpenGL 4.4. Since version 4.3, the core specification of OpenGL includes support for Compute Shader, an OpenGL shader stage that is used entirely for computing arbitrary information[4].
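A compute shader is launched as a grid of work groups, each containing a fixed number of invocations (the local size). In one dimension, each invocation's global index is gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x. The following C++ sketch mimics that indexing on the CPU for an AXPY update. It is only an illustration of the execution model, not how the driver schedules work; the function names num_work_groups and axpy_dispatch_cpu are hypothetical, and y is assumed to be at least as long as x.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of work groups needed to cover n elements
// when each work group holds local_size invocations (rounds up).
std::size_t num_work_groups(std::size_t n, std::size_t local_size)
{
    return (n + local_size - 1) / local_size;
}

// Hypothetical CPU stand-in for a one-dimensional compute dispatch.
// The outer loop plays the role of gl_WorkGroupID.x and the inner loop
// the role of gl_LocalInvocationID.x. Assumes y.size() >= x.size().
void axpy_dispatch_cpu(float a, const std::vector<float>& x,
                       std::vector<float>& y, std::size_t local_size)
{
    const std::size_t n = x.size();
    const std::size_t groups = num_work_groups(n, local_size);
    for (std::size_t group = 0; group < groups; ++group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            // same formula as gl_GlobalInvocationID.x
            const std::size_t index = group * local_size + local;
            if (index < n)  // guard the partially filled last group
                y[index] += a * x[index];
        }
    }
}
```

When the element count is a multiple of the local size, the bounds check never triggers and the group count is simply the element count divided by the local size. For other sizes, rounding up the group count and adding the same bounds check inside the shader keeps the last group from writing past the end of the buffers.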

In the wiki article GPU QuickStart Guide, I demonstrate several different ways to implement a simple AXPY (A·X Plus Y) computation on the CUDA platform. Here is yet another implementation, using OpenGL Compute Shader (error handling removed for code clarity):



#define GL_GLEXT_PROTOTYPES
#include <GL/glut.h>
#include <GL/glext.h>
#include <cstdlib>
#include <cstdio>
#include <ctime>
#include <cmath>
#define N 20480

int main (int argc, char** argv)
{

    glutInit (&argc, argv);
    glutInitDisplayMode (GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize (1, 1);
    glutInitWindowPosition (100, 100);
    glutCreateWindow ("Compute Shader");
 
    // get OpenGL version info
    const GLubyte* renderer;
    const GLubyte* version;
 
    renderer = glGetString (GL_RENDERER);
    version = glGetString (GL_VERSION);
    printf ("Renderer: %s\n", renderer);
    printf ("OpenGL version supported: %s\n", version);
    
    const char* compute_shader =
        "#version 430\n"
        "uniform float a;"
        "layout (local_size_x = 32) in;"
        "layout (std430, binding=0) buffer xblock { float x[]; };"
        "layout (std430, binding=1) buffer yblock { float y[]; };"
        "void main () {"
        "    int index = int(gl_GlobalInvocationID);"
        "    y[index] += a * x[index];"
        "}";

    GLuint cs = glCreateShader (GL_COMPUTE_SHADER); 
    glShaderSource (cs, 1, &compute_shader, NULL);
    glCompileShader (cs);

    GLuint cs_program = glCreateProgram ();
    glAttachShader (cs_program, cs);
    glLinkProgram (cs_program);

    // initialize x and y
    float *x, *y;
    size_t size = N * sizeof (float);
    x = (float *) malloc (size);
    y = (float *) malloc (size);

    srandom (time (NULL));  // seed random(); srand() would only seed rand()
    int i;
    for (i=0; i<N; i++)
        x[i] = (float) random () / RAND_MAX;
    for (i=0; i<N; i++)
        y[i] = (float) random () / RAND_MAX;

    float a = (float) random () / RAND_MAX;

    // buffer objects
    GLuint bo[2];
    glGenBuffers(2, bo);

    // bind buffer object bo[0] to indexed buffer target with binding=0
    // (xblock in compute_shader)
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, bo[0]);
    glBufferData(GL_SHADER_STORAGE_BUFFER, size, x, GL_STATIC_READ);

    // bind buffer object bo[1] to indexed buffer target with binding=1
    // (yblock in compute_shader)
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, bo[1]);
    glBufferData(GL_SHADER_STORAGE_BUFFER, size, y, GL_DYNAMIC_READ);

    glUseProgram (cs_program);
    glUniform1f (glGetUniformLocation (cs_program, "a"), a);

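    glDispatchCompute(N/32, 1, 1);

    // make the shader's buffer writes visible to the read-back below
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);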

    // read from buffer object to host memory
    float *d;
    d = (float *) malloc (size);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, size, d);

    // verify the results
    float m = -1.0f;
    float tmp;
    for (i=0; i<N; i++) {
        y[i] += a * x[i];
        tmp = fabsf ( (d[i]-y[i])/y[i] );
        if ( tmp > m ) m = tmp;
    }

    // clean up
    glDeleteBuffers(2, bo);
    glDetachShader(cs_program, cs);
    glDeleteShader(cs);
    glDeleteProgram(cs_program);

    free(x);
    free(y);
    free(d);

    if ( m < 1E-6 ) {
        printf("Success!\n");
        return 0;
    }
    else {
        printf("Failure!\n");
        return 1;
    }
}

Note:

  1. Even though we don't do any rendering in this program, we still have to create an OpenGL context in order to use Compute Shader.
  2. For simplicity, we use GLUT to start the OpenGL context.
  3. The Nvidia driver installation updated the OpenGL libraries, but didn't update the header files, on the GPU nodes.
  4. The OpenGL header files are provided by the package mesa-libGL-devel-9.0-0.7.el6.x86_64, which is a bit outdated, and only supports OpenGL 4.3 APIs.
  5. To use OpenGL 4.4 APIs, we could update to mesa-libGL-devel-10.1.2-2.el6.x86_64 (CentOS 6.6), or download the headers directly from the OpenGL Registry.

To compile and run the sample code, ssh to one of the GPU nodes and run:

$ g++ saxpy.cs.cpp -o saxpy.cs.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm

$ export DISPLAY=:0
$ ./saxpy.cs.x 
Renderer: Tesla K20m/PCIe/SSE2
OpenGL version supported: 4.4.0 NVIDIA 340.58
Success!

Note: to use the Nvidia K20 for your OpenGL applications, you must set DISPLAY=:0 so that your applications run on the X server attached to the GPU.
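The check at the end of saxpy.cs.cpp computes the largest relative difference between the values read back from the GPU and the values recomputed on the host. A reusable form of that check might look like the sketch below. The function name max_relative_error is hypothetical, and the fallback to an absolute difference when the reference value is exactly zero is an added assumption that the sample code does not make.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest relative difference between result and
// reference, compared element by element over their common length.
// Where the reference is exactly zero, the absolute difference is used
// instead, to avoid dividing by zero.
float max_relative_error(const std::vector<float>& result,
                         const std::vector<float>& reference)
{
    float worst = 0.0f;
    const std::size_t n = std::min(result.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        const float diff = std::fabs(result[i] - reference[i]);
        const float scale = std::fabs(reference[i]);
        const float err = (scale > 0.0f) ? diff / scale : diff;
        worst = std::max(worst, err);
    }
    return worst;
}
```

The result can then be compared against a tolerance such as the 1E-6 used in the sample code. A slightly looser tolerance may be appropriate if the GPU fuses the multiply and add while the host does not.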

References

  1. Remote Visualization on server-class Tesla GPUs
  2. Compiling GLFW
  3. GLFW - Building applications
  4. Compute Shader