OpenGL on Nvidia K20 - shawfdong/hyades GitHub Wiki
Each of the 8 GPU nodes in Hyades contains an Nvidia Tesla K20 GPU Accelerator, a server-class GPU. For Tesla K20X and K20 devices, OpenGL support is an opt-in feature via the GPU Operation Mode (GOM). On Tesla K40 and newer devices, OpenGL capabilities are always enabled[1].
We'll use NVIDIA System Management Interface (nvidia-smi) to enable hardware-accelerated OpenGL on Tesla K20 devices.
# nvidia-smi --help
Query the GPU Operation Mode (GOM) currently in use:
# nvidia-smi --query-gpu=gom.current --format=csv,noheader
Compute
By default, only compute tasks are allowed on Tesla K20 devices. Graphics operations are not allowed.
Enable hardware-accelerated OpenGL by putting the GPU into the All On mode:
# nvidia-smi --gom=0
GOM changed to "All On" for GPU 0000:03:00.0.
All done.
Reboot required.
Reboot the GPU node.
GOM is now All On:
# nvidia-smi --query-gpu=gom.current --format=csv,noheader
All On
OpenGL doesn't exist until we create an OpenGL context, and the creation of an OpenGL context is platform-specific. On Linux, it is typical to use GLX. GLX provides the interface connecting OpenGL and the X Window System: it enables programs wishing to use OpenGL to do so within a window provided by the X Window System. Hardware acceleration for indirect GLX is brought by AIGLX (Accelerated Indirect GLX), which loads the Mesa DRI driver inside the X server.
Install X Window System:
# yum grouplist
# yum groupinstall "X Window System"
Install GNOME Desktop Environment:
# yum groupinstall "GNOME Desktop Environment"
Because there was no X Window System on the GPU nodes, the installation of Nvidia driver didn't include the X Window drivers. Let's re-install the Nvidia driver:
# /pfs/sw/NVIDIA/NVIDIA-Linux-x86_64-340.58.run
Now there are /usr/lib64/xorg/modules/drivers/nvidia_drv.so and /usr/lib64/xorg/modules/libnvidia-wfb.so.340.58.
There is no display connector on Tesla K20, so we'll set up a headless X server to manage the OpenGL contexts. Nvidia provides an X Configuration Tool (nvidia-xconfig) to manipulate X configuration files. We'll use it to configure the X server for headless operation.
# nvidia-xconfig -h
Query GPU info:
# nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
  Name      : Tesla K20m
  UUID      : GPU-ad6ee1ac-074f-6950-ae49-8d84ad7600df
  PCI BusID : PCI:3:0:0

  Number of Display Devices: 0
Create the X configuration file /etc/X11/xorg.conf:
# nvidia-xconfig --busid=PCI:3:0:0 --use-display-device=none
By default, only the console owner can start the X server on display 0. We'll allow any user to do so, by replacing the following line in /etc/pam.d/xserver:
auth required pam_console.so
with:
auth sufficient pam_permit.so
Run:
# sed -i 's/required\tpam_console.so/sufficient\tpam_permit.so/' /etc/pam.d/xserver
Test the headless X server:
$ Xorg :0 2> /dev/null &
$ export DISPLAY=:0
$ xrandr
Screen 0: minimum 8 x 8, current 1280 x 1024, maximum 16384 x 16384
$ glxinfo
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
...
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
...
GLX version: 1.4
...
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Tesla K20m/PCIe/SSE2
OpenGL version string: 4.4.0 NVIDIA 340.58
OpenGL shading language version string: 4.40 NVIDIA via Cg compiler
...
$ glxgears
101360 frames in 5.0 seconds = 20271.846 FPS
105496 frames in 5.0 seconds = 21099.188 FPS
105475 frames in 5.0 seconds = 21094.883 FPS
...
The following OpenGL libraries are installed on the GPU nodes.
GLU (OpenGL Utility Library) consists of a number of functions that use the base OpenGL library to provide higher-level drawing routines from the more primitive routines that OpenGL provides. GLU is provided by the mesa-libGLU & mesa-libGLU-devel packages of RHEL/CentOS 6.
GLEW (OpenGL Extension Wrangler Library) is a cross-platform C/C++ library that helps in querying and loading OpenGL extensions. GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform. All OpenGL extensions are exposed in a single header file, which is machine-generated from the official extension list. GLEW is provided by the glew & glew-devel packages in the EPEL repository.
freeglut is a full replacement for GLUT (OpenGL Utility Toolkit). GLUT (and hence freeglut) allows the user to create and manage windows containing OpenGL contexts on a wide range of platforms, and to read input from the mouse, keyboard and joystick. freeglut is provided by the freeglut & freeglut-devel packages of RHEL/CentOS 6.
For example, to link with freeglut:
$ gcc glut.c -o glut.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm
GLFW is a newer alternative to GLUT. It is an Open Source, multi-platform library for creating windows with OpenGL contexts and receiving input and events.
Download GLFW 3.1:
$ cd /scratch/
$ wget http://downloads.sourceforge.net/project/glfw/glfw/3.1/glfw-3.1.zip
$ unzip glfw-3.1.zip
Build and install GLFW 3.1[2]:
$ cd glfw-3.1
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=/pfs/sw/serial/gcc/glfw-3.1 ..
$ make all install
GLFW 3.1 is installed at /pfs/sw/serial/gcc/glfw-3.1. It has a lot of dependencies:
$ export PKG_CONFIG_PATH=/pfs/sw/serial/gcc/glfw-3.1/lib/pkgconfig
$ pkg-config --static --libs glfw3
-L/pfs/sw/serial/gcc/glfw-3.1/lib -lglfw3 -lrt -lXrandr -lXinerama -lXi -lXcursor -lGL -lm -ldl -lXrender -ldrm -lXdamage -lX11-xcb -lxcb-glx -lXxf86vm -lXfixes -lXext -lX11 -lpthread -lxcb -lXau
So we are better off using the pkg-config tool to build GLFW applications[3]. For instance, to compile the example GLFW code:
$ gcc -o GLFW_example.x GLFW_example.c \
    `pkg-config --cflags glfw3` \
    `pkg-config --static --libs glfw3`
Nvidia K20 supports OpenGL 4.4. Since version 4.3, the core specification of OpenGL includes support for Compute Shader, an OpenGL shader stage that is used entirely for computing arbitrary information[4].
In the wiki article GPU QuickStart Guide, I demonstrate several different ways to implement a simple AXPY (A·X Plus Y) computation on the CUDA platform. Here I present yet another implementation, using OpenGL Compute Shader (error handling removed for code clarity):
#define GL_GLEXT_PROTOTYPES
#include <GL/glut.h>
#include <GL/glext.h>
#include <cstdlib>
#include <cstdio>
#include <ctime>
#include <cmath>

#define N 20480

int main (int argc, char** argv) {
  // GLUT is used only to create an OpenGL context; nothing is rendered
  glutInit (&argc, argv);
  glutInitDisplayMode (GLUT_DOUBLE | GLUT_RGBA);
  glutInitWindowSize (1, 1);
  glutInitWindowPosition (100, 100);
  glutCreateWindow ("Compute Shader");

  // get OpenGL version info
  const GLubyte* renderer;
  const GLubyte* version;
  renderer = glGetString (GL_RENDERER);
  version = glGetString (GL_VERSION);
  printf ("Renderer: %s\n", renderer);
  printf ("OpenGL version supported: %s\n", version);

  // one invocation per element; 32 invocations per work group
  const char* compute_shader =
    "#version 430\n"
    "uniform float a;"
    "layout (local_size_x = 32) in;"
    "layout (std430, binding=0) buffer xblock { float x[]; };"
    "layout (std430, binding=1) buffer yblock { float y[]; };"
    "void main () {"
    "  int index = int(gl_GlobalInvocationID.x);"
    "  y[index] += a * x[index];"
    "}";

  GLuint cs = glCreateShader (GL_COMPUTE_SHADER);
  glShaderSource (cs, 1, &compute_shader, NULL);
  glCompileShader (cs);

  GLuint cs_program = glCreateProgram ();
  glAttachShader (cs_program, cs);
  glLinkProgram (cs_program);

  // initialize x and y
  float *x, *y;
  size_t size = N * sizeof (float);
  x = (float *) malloc (size);
  y = (float *) malloc (size);
  srand (time (NULL));
  int i;
  for (i=0; i<N; i++) x[i] = (float) random () / RAND_MAX;
  for (i=0; i<N; i++) y[i] = (float) random () / RAND_MAX;
  float a = (float) random () / RAND_MAX;

  // buffer objects
  GLuint bo[2];
  glGenBuffers(2, bo);
  // bind buffer object bo[0] to indexed buffer target with binding=0
  // (xblock in compute_shader)
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, bo[0]);
  glBufferData(GL_SHADER_STORAGE_BUFFER, size, x, GL_STATIC_READ);
  // bind buffer object bo[1] to indexed buffer target with binding=1
  // (yblock in compute_shader)
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, bo[1]);
  glBufferData(GL_SHADER_STORAGE_BUFFER, size, y, GL_DYNAMIC_READ);

  glUseProgram (cs_program);
  glUniform1f (glGetUniformLocation (cs_program, "a"), a);
  // N is a multiple of the local size (32), so N/32 groups cover all elements
  glDispatchCompute(N/32, 1, 1);
  // make the shader's writes visible to the buffer read below
  glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

  // read from buffer object (bo[1], the last one bound) to host memory
  float *d;
  d = (float *) malloc (size);
  glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, size, d);

  // verify the results against a CPU computation (maximum relative error)
  float m = -1.0f;
  float tmp;
  for (i=0; i<N; i++) {
    y[i] += a * x[i];
    tmp = fabsf ( (d[i]-y[i])/y[i] );
    if ( tmp > m ) m = tmp;
  }

  // clean up
  glDeleteBuffers(2, bo);
  glDetachShader(cs_program, cs);
  glDeleteShader(cs);
  glDeleteProgram(cs_program);
  free(x);
  free(y);
  free(d);

  if ( m < 1E-6 ) {
    printf("Success!\n");
    return 0;
  } else {
    printf("Failure!\n");
    return 1;
  }
}
Note:
- Even though we don't do any rendering in this program, we still have to create an OpenGL context in order to use Compute Shader.
- For simplicity, we use GLUT to start the OpenGL context.
- The Nvidia driver installation updated the OpenGL libraries, but didn't update the header files, on the GPU nodes.
- The OpenGL header files are provided by the package mesa-libGL-devel-9.0-0.7.el6.x86_64, which is a bit outdated, and only supports OpenGL 4.3 APIs.
- To use OpenGL 4.4 APIs, we could update to mesa-libGL-devel-10.1.2-2.el6.x86_64 (CentOS 6.6), or download the headers directly from the OpenGL Registry.
$ g++ saxpy.cs.cpp -o saxpy.cs.x -lglut -lGLU -lGL -lXmu -lXext -lX11 -lm
$ export DISPLAY=:0
$ ./saxpy.cs.x
Renderer: Tesla K20m/PCIe/SSE2
OpenGL version supported: 4.4.0 NVIDIA 340.58
Success!
Note: to use the Nvidia K20 for your OpenGL applications, you must set DISPLAY=:0 so that your applications run on the X server attached to the GPU.