Hello World - tugrul512bit/Cekirdekler Wiki

In keeping with programming tradition, here is a hello world program that does nothing but write "hello world" to the console 1000 times:

            ClNumberCruncher cr = new ClNumberCruncher(
                AcceleratorType.GPU, @"
                    __kernel void hello(__global char * arr)
                    {
                        int workitem_id=get_global_id(0); // contiguous between devices, easy parallelism
                        printf(""hello world"");
                    }
                ");

            ClArray<byte> array = new ClArray<byte>(1000);
            // compute id 1, kernel "hello", 1000 workitems in total, 100 workitems per workgroup
            array.compute(cr, 1, "hello", 1000, 100);

This code creates a context for each device and handles the necessary buffer copies automatically, although, as the kernel shows, the buffer is never actually used. It creates 10 workgroups of 100 workitems each, 1000 workitems in total, distributes them across all GPUs found in the system, and then simply prints "hello worldhello worldhello worldhello worldhello worldhello world...".

The parameter "1" is the compute id, which the load balancer and the performance counter use to tell this hello world computation apart from other computations. Each time the same id is used again, a fairer work balance is calculated for multiple GPUs.
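The workgroup arithmetic behind the 1000 and 100 arguments can be checked with a small plain C sketch. This is not Cekirdekler or OpenCL code: the helper names `num_groups`, `group_id_of` and `local_id_of` are hypothetical, and they only mirror what `get_group_id(0)` and `get_local_id(0)` would report inside a kernel, assuming a single device, no global offset, and a global range that is a multiple of the local range.

```c
#include <stddef.h>

/* Number of workgroups for a given global and local range.
   Assumes global_range is a multiple of local_range (1000 / 100 here). */
size_t num_groups(size_t global_range, size_t local_range)
{
    return global_range / local_range;
}

/* Index of the workgroup that a workitem belongs to (no global offset). */
size_t group_id_of(size_t global_id, size_t local_range)
{
    return global_id / local_range;
}

/* Position of a workitem inside its own workgroup (no global offset). */
size_t local_id_of(size_t global_id, size_t local_range)
{
    return global_id % local_range;
}
```

For the call in the example, `num_groups(1000, 100)` gives 10, and the workitem with global id 250 sits at local id 50 in workgroup 2.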

(Please note that a byte in C# is unsigned, while char in OpenCL C (a constrained C99) is signed, so you may need uchar in your real kernel.)
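To see why the signedness matters, here is a minimal plain C sketch that reads the same 8 bits both ways. It is not OpenCL or Cekirdekler code: `as_signed_char` and `as_unsigned_char` are hypothetical helpers, and `int8_t` and `uint8_t` are used as stand-ins for the kernel-side `char` and `uchar`.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret one raw byte as a signed 8-bit value, the way a kernel
   argument declared as `char` would see a byte written from C#. */
int as_signed_char(uint8_t raw)
{
    int8_t s;
    memcpy(&s, &raw, sizeof s);   /* same bits, signed interpretation */
    return s;
}

/* Read the same byte as an unsigned 8-bit value, matching `uchar`. */
int as_unsigned_char(uint8_t raw)
{
    return raw;
}
```

A byte holding 200 reads as -56 through the signed view but stays 200 through the unsigned one, while values up to 127 are the same in both.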

Some beta GPU drivers also have problems with printf, so make sure your system setup is fully OpenCL-compatible. You can test and benchmark your system with CompuBench.