OpenGL on Nvidia K20 - shawfdong/hyades GitHub Wiki
Each of the 8 GPU nodes in Hyades contains an Nvidia Tesla K20 GPU Accelerator, a server-class GPU. For Tesla K20X and K20 devices, OpenGL support is an opt-in feature via the GPU Operation Mode (GOM). On Tesla K40 and newer devices, OpenGL capabilities are always enabled[1].
We'll use NVIDIA System Management Interface (nvidia-smi) to enable hardware-accelerated OpenGL on Tesla K20 devices.
# nvidia-smi --help
Query the GPU Operation Mode (GOM) currently in use:
# nvidia-smi --query-gpu=gom.current --format=csv,noheader
Compute
By default, only compute tasks are allowed on Tesla K20 devices. Graphics operations are not allowed.
Enable hardware-accelerated OpenGL by putting the GPU into the All On mode:
# nvidia-smi --gom=0
GOM changed to "All On" for GPU 0000:03:00.0.
All done.
Reboot required.
Reboot the GPU node.
GOM is now All On:
# nvidia-smi --query-gpu=gom.current --format=csv,noheader
All On
OpenGL doesn't exist until we create an OpenGL context. Creation of an OpenGL context is platform-specific. On Linux, it is typical to use GLX. GLX provides the interface connecting OpenGL and the X Window System: it enables programs wishing to use OpenGL to do so within a window provided by the X Window System. Hardware acceleration for indirect GLX is brought by AIGLX (Accelerated Indirect GLX), which loads the Mesa DRI driver inside the X server.
Install X Window System:
# yum grouplist
# yum groupinstall "X Window System"
Install GNOME Desktop Environment:
# yum groupinstall "GNOME Desktop Environment"
Because there was no X Window System on the GPU nodes, the original installation of the Nvidia driver didn't include the X Window drivers. Let's re-install the Nvidia driver:
# /pfs/sw/NVIDIA/NVIDIA-Linux-x86_64-340.58.run
Now there are /usr/lib64/xorg/modules/drivers/nvidia_drv.so and /usr/lib64/xorg/modules/libnvidia-wfb.so.340.58.
There is no display connector on Tesla K20, so we'll set up a headless X server to manage the OpenGL contexts. Nvidia provides an X Configuration Tool (nvidia-xconfig) to manipulate X configuration files. We'll use it to configure the X server for headless operation.
# nvidia-xconfig -h
Query GPU info:
# nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
  Name      : Tesla K20m
  UUID      : GPU-ad6ee1ac-074f-6950-ae49-8d84ad7600df
  PCI BusID : PCI:3:0:0

  Number of Display Devices: 0
Create the X configuration file /etc/X11/xorg.conf:
# nvidia-xconfig --busid=PCI:3:0:0 --use-display-device=none
By default, only the console owner can start the X server on display 0. We'll allow any user to do so, by replacing the following line in /etc/pam.d/xserver:
auth required pam_console.so
with:
auth sufficient pam_permit.so
Run:
# sed -i 's/required\tpam_console.so/sufficient\tpam_permit.so/' /etc/pam.d/xserver
Test the headless X server:
$ Xorg :0 2> /dev/null &
$ export DISPLAY=:0
$ xrandr
Screen 0: minimum 8 x 8, current 1280 x 1024, maximum 16384 x 16384
$ glxinfo
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
...
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
...
GLX version: 1.4
...
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Tesla K20m/PCIe/SSE2
OpenGL version string: 4.4.0 NVIDIA 340.58
OpenGL shading language version string: 4.40 NVIDIA via Cg compiler
...
$ glxgears
101360 frames in 5.0 seconds = 20271.846 FPS
105496 frames in 5.0 seconds = 21099.188 FPS
105475 frames in 5.0 seconds = 21094.883 FPS
...
The following OpenGL libraries are installed on the GPU nodes.
GLU (OpenGL Utility Library) consists of a number of functions that use the base OpenGL library to provide higher-level drawing routines from the more primitive routines that OpenGL provides. GLU is provided by the mesa-libGLU & mesa-libGLU-devel packages of RHEL/CentOS 6.
GLEW (OpenGL Extension Wrangler Library) is a cross-platform C/C++ library that helps in querying and loading OpenGL extensions. GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform. All OpenGL extensions are exposed in a single header file, which is machine-generated from the official extension list. GLEW is provided by the glew & glew-devel packages in the EPEL repository.
freeglut is a full replacement for GLUT (OpenGL Utility Toolkit). GLUT (and hence freeglut) allows the user to create and manage windows containing OpenGL contexts on a wide range of platforms, and to read mouse, keyboard and joystick input. freeglut is provided by the freeglut & freeglut-devel packages of RHEL/CentOS 6.
To link with freeglut, e.g.:
$ gcc glut.c -o glut.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm
GLFW is a newer alternative to GLUT. It is an Open Source, multi-platform library for creating windows with OpenGL contexts and receiving input and events.
Download GLFW 3.1:
$ cd /scratch/
$ wget http://downloads.sourceforge.net/project/glfw/glfw/3.1/glfw-3.1.zip
$ unzip glfw-3.1.zip
Build and install GLFW 3.1[2]:
$ cd glfw-3.1
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=/pfs/sw/serial/gcc/glfw-3.1 ..
$ make all install
GLFW 3.1 is installed at /pfs/sw/serial/gcc/glfw-3.1. It has a lot of dependencies:
$ export PKG_CONFIG_PATH=/pfs/sw/serial/gcc/glfw-3.1/lib/pkgconfig
$ pkg-config --static --libs glfw3
-L/pfs/sw/serial/gcc/glfw-3.1/lib -lglfw3 -lrt -lXrandr -lXinerama -lXi -lXcursor -lGL -lm -ldl -lXrender -ldrm -lXdamage -lX11-xcb -lxcb-glx -lXxf86vm -lXfixes -lXext -lX11 -lpthread -lxcb -lXau
So we are better off using the pkg-config tool to build GLFW applications[3]. For instance, to compile the example GLFW code:
$ gcc -o GLFW_example.x GLFW_example.c \
    `pkg-config --cflags glfw3` \
    `pkg-config --static --libs glfw3`
Nvidia K20 supports OpenGL 4.4. Since version 4.3, the core specification of OpenGL has included support for Compute Shader, an OpenGL shader stage that is used entirely for computing arbitrary information[4].
In the wiki article GPU QuickStart Guide, I demonstrate several different ways to implement a simple AXPY (A·X Plus Y) computation on the CUDA platform. Here I present you yet another implementation, using OpenGL Compute Shader (error handling removed for code clarity):
#define GL_GLEXT_PROTOTYPES
#include <GL/glut.h>
#include <GL/glext.h>
#include <cstdlib>
#include <cstdio>
#include <ctime>
#include <cmath>

#define N 20480

int main (int argc, char** argv)
{
    // create a tiny window; we only need it to obtain an OpenGL context
    glutInit (&argc, argv);
    glutInitDisplayMode (GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize (1, 1);
    glutInitWindowPosition (100, 100);
    glutCreateWindow ("Compute Shader");

    // get OpenGL version info
    const GLubyte* renderer;
    const GLubyte* version;
    renderer = glGetString (GL_RENDERER);
    version = glGetString (GL_VERSION);
    printf ("Renderer: %s\n", renderer);
    printf ("OpenGL version supported: %s\n", version);

    // compute shader: each invocation updates one element of y
    const char* compute_shader =
        "#version 430\n"
        "uniform float a;"
        "layout (local_size_x = 32) in;"
        "layout (std430, binding=0) buffer xblock { float x[]; };"
        "layout (std430, binding=1) buffer yblock { float y[]; };"
        "void main () {"
        "  int index = int(gl_GlobalInvocationID.x);"
        "  y[index] += a * x[index];"
        "}";

    // compile the shader and link it into a program
    GLuint cs = glCreateShader (GL_COMPUTE_SHADER);
    glShaderSource (cs, 1, &compute_shader, NULL);
    glCompileShader (cs);
    GLuint cs_program = glCreateProgram ();
    glAttachShader (cs_program, cs);
    glLinkProgram (cs_program);

    // initialize x and y
    float *x, *y;
    size_t size = N * sizeof (float);
    x = (float *) malloc (size);
    y = (float *) malloc (size);
    srandom (time (NULL));    // random() is seeded by srandom(), not srand()
    int i;
    for (i=0; i<N; i++)
        x[i] = (float) random () / RAND_MAX;
    for (i=0; i<N; i++)
        y[i] = (float) random () / RAND_MAX;
    float a = (float) random () / RAND_MAX;

    // buffer objects
    GLuint bo[2];
    glGenBuffers(2, bo);
    // bind buffer object bo[0] to indexed buffer target with binding=0
    // (xblock in compute_shader)
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, bo[0]);
    glBufferData(GL_SHADER_STORAGE_BUFFER, size, x, GL_STATIC_READ);
    // bind buffer object bo[1] to indexed buffer target with binding=1
    // (yblock in compute_shader)
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, bo[1]);
    glBufferData(GL_SHADER_STORAGE_BUFFER, size, y, GL_DYNAMIC_READ);

    // launch N/32 work groups of 32 invocations each (N is a multiple of 32)
    glUseProgram (cs_program);
    glUniform1f (glGetUniformLocation (cs_program, "a"), a);
    glDispatchCompute(N/32, 1, 1);
    // make the shader's writes visible to the buffer read-back below
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

    // read from buffer object to host memory
    // (bo[1] is still bound to GL_SHADER_STORAGE_BUFFER)
    float *d;
    d = (float *) malloc (size);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, size, d);

    // verify the results: m is the largest relative error
    float m = -1.0f;
    float tmp;
    for (i=0; i<N; i++) {
        y[i] += a * x[i];
        tmp = fabsf ( (d[i]-y[i])/y[i] );
        if ( tmp > m ) m = tmp;
    }

    // clean up
    glDeleteBuffers(2, bo);
    glDetachShader(cs_program, cs);
    glDeleteShader(cs);
    glDeleteProgram(cs_program);
    free(x);
    free(y);
    free(d);

    if ( m < 1E-6 ) {
        printf("Success!\n");
        return 0;
    }
    else {
        printf("Failure!\n");
        return 1;
    }
}
Note:
- Even though we don't do any rendering in this program, we still have to create an OpenGL context in order to use Compute Shader.
- For simplicity, we use GLUT to start the OpenGL context.
- The Nvidia driver installation updated the OpenGL libraries, but didn't update the header files, on the GPU nodes.
- The OpenGL header files are provided by the package mesa-libGL-devel-9.0-0.7.el6.x86_64, which is a bit outdated, and only supports OpenGL 4.3 APIs.
- To use OpenGL 4.4 APIs, we could update to mesa-libGL-devel-10.1.2-2.el6.x86_64 (CentOS 6.6); or download the headers directly from OpenGL Registry.
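The call glDispatchCompute(N/32, 1, 1) in the program above works because N = 20480 is an exact multiple of the local size 32. The following is a minimal CPU-side sketch of how a one-dimensional dispatch maps invocations to array indices, and of how a ceiling division plus a bounds check could cover sizes that are not a multiple of the local size. The helper names num_work_groups and emulate_saxpy_dispatch are hypothetical and are not part of OpenGL or of the original program; in a real shader the bounds check would need the array length, for example passed in as a uniform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of work groups needed to cover n elements
// when each work group has local_size invocations (ceiling division).
std::size_t num_work_groups(std::size_t n, std::size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size;
}

// Hypothetical CPU emulation of a 1-D compute dispatch. Each (group, local)
// pair plays the role of one shader invocation, and
// index = group * local_size + local corresponds to gl_GlobalInvocationID.x.
void emulate_saxpy_dispatch(float a, const std::vector<float>& x,
                            std::vector<float>& y, std::size_t local_size)
{
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t groups = num_work_groups(n, local_size);
    for (std::size_t group = 0; group < groups; ++group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            const std::size_t index = group * local_size + local;
            if (index >= n) continue;  // guard for a partially filled last group
            y[index] += a * x[index];
        }
    }
}
```

With n = 20480 and a local size of 32, num_work_groups returns 640, which matches N/32 in the program; with n = 100 it returns 4, and the guard skips the 28 unused invocations of the last group.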
$ g++ saxpy.cs.cpp -o saxpy.cs.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm
$ export DISPLAY=:0
$ ./saxpy.cs.x
Renderer: Tesla K20m/PCIe/SSE2
OpenGL version supported: 4.4.0 NVIDIA 340.58
Success!
Note: to use Nvidia K20 for your OpenGL applications, you must set DISPLAY=:0 so that your applications will run on the X server attached to the GPU.
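The "Success!" line printed by saxpy.cs.x means that the largest relative difference between the GPU result and a CPU reference stayed below 1E-6. As a rough sketch of that check, the helper below computes the same metric on two arrays. The name max_relative_error is hypothetical, and the fallback to absolute error for a zero reference value is an added assumption; the original program divides by y[i] unconditionally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest relative difference between a computed
// result and a reference, element by element.
float max_relative_error(const std::vector<float>& result,
                         const std::vector<float>& reference)
{
    assert(result.size() == reference.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float diff = std::fabs(result[i] - reference[i]);
        // Assumption: use the absolute error when the reference is exactly
        // zero, to avoid dividing by zero.
        const float err = (reference[i] != 0.0f)
                              ? diff / std::fabs(reference[i])
                              : diff;
        if (err > worst) worst = err;
    }
    return worst;
}
```

A caller would compare the returned value against a tolerance chosen for single precision, such as the 1E-6 used in the program.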