OpenCL is a new technology that lets programmers write small programs (“kernels”) that are executed in a massively parallel way on modern graphics card processors (GPUs), which have become dormant powerhouses for most non-gamers. It also allows these kernels to run on (multicore) CPUs, thus effectively presenting a unified way to tap into most of the processing power of a modern computer.
OpenCL comes with a runtime (integrated into the recent Mac OS X 10.6 and soon to be provided by all major graphics card vendors on other operating systems), a C API for the host programs and a dedicated C-derived language to write the parallel kernels being hosted (see OpenCL’s specification).
So how can we use that promising technology from Java?
The odd news is that for once, it’ll be an easier Java affair on Mac OS X than on other operating systems, because Apple’s OpenCL implementation is the only officially released one in the wild so far (who said that Java on the Mac was in bad shape?).
So first, we need OpenCL bindings for Java. This is trivial to create thanks to JNAerator (see the “Building” section on NativeLibs4Java’s OpenCL page).
These auto-JNAerated C-style wrappings are indeed easy to create, but they are a bit painful to use, for a few reasons:
- JNAerator creates multiple overloads for each bound function: it might not be obvious which one is best suited for a given use.
- The OpenCL API is very C-oriented and plain unfriendly to OO-brainwashed Java developers.
- One has to know JNA’s runtime classes a bit to know how to call the methods (do you all know what an IntByReference is?)
- There are few OpenCL examples on the net
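To make the C-oriented flavour a bit more concrete, here is a minimal plain-Java sketch of the out-parameter convention that such raw bindings tend to expose. Everything in it is a hypothetical stand-in: `IntRef` only imitates the role that JNA’s IntByReference plays, and `getDeviceCount` is a made-up function, not part of OpenCL or of the generated bindings.

```java
class OutParamSketch {
    // Hypothetical stand-in for a by-reference int
    // (JNA's IntByReference plays this role in real bindings).
    static final class IntRef {
        int value;
    }

    // Hypothetical C-style call: returns an error code and writes
    // its actual result through the reference argument.
    static int getDeviceCount(IntRef countOut) {
        countOut.value = 2; // made-up value, for illustration only
        return 0;           // 0 means success (as CL_SUCCESS does in OpenCL)
    }

    // What an OO wrapper can do instead: hide the reference and
    // turn error codes into exceptions.
    static int deviceCount() {
        IntRef count = new IntRef();
        int err = getDeviceCount(count);
        if (err != 0) {
            throw new IllegalStateException("Call failed with error code " + err);
        }
        return count.value;
    }

    public static void main(String[] args) {
        System.out.println("Devices: " + deviceCount());
    }
}
```

The caller of `deviceCount()` never sees the reference object or the error code, which is roughly the kind of comfort an OO layer is meant to bring.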
OK, so we just need a simple OO layer around that C API, right? Here it is!
If this still looks cryptic to you, here’s the long-awaited simple Java example that uses the OpenCL bindings to run simple parallel operations (and runs fine on Mac OS X 10.6):
CLPlatform platform = OpenCL4Java.listPlatforms()[0]; // take first platform available
CLDevice[] devices = platform.listAllDevices();
//CLDevice[] devices = platform.listCPUDevices();
//CLDevice[] devices = platform.listGPUDevices();
CLContext context = CLContext.createContext(devices);

int dataSize = 10000;

String src = "\n" +
    "__kernel void aSinB( \n" +
    "    __global const float* a, \n" +
    "    __global const float* b, \n" +
    "    __global float* output) \n" +
    "{ \n" +
    "    int i = get_global_id(0); \n" +
    "    output[i] = a[i] * sin(b[i]) + 1; \n" +
    "} \n";

CLProgram program = context.createProgram(src).build();
CLKernel kernel = program.createKernel("aSinB");
CLQueue queue = context.createDefaultQueue();

// Allocate OpenCL-hosted memory for inputs and output
CLMem memIn1 = context.createInput(dataSize * 4);
CLMem memIn2 = context.createInput(dataSize * 4);
CLMem memOut = context.createOutput(dataSize * 4);

// Bind these memory objects to the arguments of the kernel
kernel.setArgs(memIn1, memIn2, memOut);

// Map input buffers to populate them with some data
FloatBuffer a = memIn1.blockingMapWrite(queue).asFloatBuffer();
FloatBuffer b = memIn2.blockingMapWrite(queue).asFloatBuffer();

// Fill the mapped input buffers with data
for (int i = 0; i < dataSize; i++) {
    a.put(i, i);
    b.put(i, i);
}

// Unmap input buffers
memIn1.unmap(queue, a);
memIn2.unmap(queue, b);

// Ask for execution of the kernel with global size = dataSize
// and workgroup size = 1
kernel.enqueueNDRange(queue, new int[]{dataSize}, new int[]{1});

// Wait for all operations to be performed
queue.finish();

// Copy the OpenCL-hosted array back to RAM
FloatBuffer output = NIOUtils.directFloats(dataSize);
memOut.read(output, queue, true);

// Compute absolute and relative average errors wrt Java implem
double totalAbsoluteError = 0, totalRelativeError = 0;
for (int i = 0; i < dataSize; i++) {
    float expected = i * (float)Math.sin(i) + 1;
    float result = output.get(i);

    double d = result - expected;
    if (expected != 0)
        totalRelativeError += d / expected;
    totalAbsoluteError += d < 0 ? -d : d;
}
double avgAbsoluteError = totalAbsoluteError / dataSize;
double avgRelativeError = totalRelativeError / dataSize;
System.out.println("Average absolute error = " + avgAbsoluteError);
System.out.println("Average relative error = " + avgRelativeError);
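For readers who have never seen a kernel before, the aSinB kernel above can be read as the body of a loop whose iterations all run independently: each work-item gets its own index from get_global_id(0) and writes one element of the output. The sketch below mimics that in plain Java, with a parallel stream standing in for the OpenCL runtime. It uses no OpenCL at all; the class and method names (`ASinBReference`, `workItem`, `run`) are made up for this illustration and are not part of OpenCL4Java.

```java
import java.util.stream.IntStream;

class ASinBReference {
    // Plain-Java stand-in for the kernel body: one call per work-item,
    // where i plays the role of get_global_id(0).
    public static void workItem(int i, float[] a, float[] b, float[] output) {
        output[i] = (float) (a[i] * Math.sin(b[i]) + 1);
    }

    // Stand-in for enqueueing the kernel over a 1D global size of a.length:
    // every index is processed independently, in no particular order.
    public static float[] run(float[] a, float[] b) {
        float[] output = new float[a.length];
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(i -> workItem(i, a, b, output));
        return output;
    }

    public static void main(String[] args) {
        int dataSize = 10000;
        float[] a = new float[dataSize];
        float[] b = new float[dataSize];
        for (int i = 0; i < dataSize; i++) {
            a[i] = i;
            b[i] = i;
        }
        float[] output = run(a, b);
        System.out.println("output[0] = " + output[0]); // 0 * sin(0) + 1
    }
}
```

Because no work-item reads what another one writes, the iterations can run in any order, which is what lets OpenCL spread them over many GPU or CPU cores.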
This is reasonably easy to read, but it is possible to make OpenCL even more accessible to casual programmers who don’t want to be bothered with learning OpenCL. For that, please wait for my next post: ScalaCL: Reap OpenCL’s benefits without learning its syntax (Scala DSL for transparently parallel computations) (edited to match actual title)
In the meantime, you can find all the sources and binaries to play with on OpenCL4Java’s homepage.
This has only been tested on Mac OS X, but you might succeed in adapting it to Windows using the ATI Stream 2 beta or the early-access OpenCL drivers from NVIDIA (please report any success in the comments).
As usual, comments are highly welcome.
Edit (Sept 30th 2009): Updated code snippets to match recent changes in API










Sounds cool. I’ve been thinking that something like ScalaCL should exist (although I’m not familiar with OpenCL yet).
@Erkki Thanks
Here’s a preview of the syntax that will be available in ScalaCL :
{
val a = FloatsVar
val b = FloatsVar
val result = FloatsVar
content = result := a * sin(b) + 1
//edit Sept. 17th 2009: syntax changed, the old one was :
// override var content: Stat = result ~ (a * sin(b) + 1)
// and : Program(Context.BEST, i)
}
var prog = new Prog1(Dim(10000))
prog.a.write(someInputBufferForA) // can be direct or array-backed
prog.b.write(someInputBufferForB)
prog !
println prog.result.read() // returns a direct buffer (read(buf) also exists)
Here,
is equivalent to
, but ScalaCL is able to infer the use of dimensions as implicit array indexes.
Should you want to interleave the results, say
, ScalaCL will guess that the size of the result buffer should be twice that of a and b.
Support for multi-dimensions, local id, inference of input and output buffer sizes and extensive OpenCL functions support is also in the works or planned.
That’s it, I’ve blogged about ScalaCL
Using the newly released OpenCL drivers from NVIDIA (under Ubuntu) it complains about ‘a kernel function must return void’ in the build step. This is easily solved by changing the line
"__kernel aSinB(" +
to
"__kernel void aSinB(" +
Hi Florian,
You’re right, thanks !
I’ve updated the examples in SVN + edited this post accordingly
Cheers
Hi,
I followed this guide, and after a lot of work
got things running but I get an error :
Exception in thread "main" java.lang.IllegalArgumentException: Null OpenCL CLPlatform !
I use WinXP with ATI Stream SDK Beta 2.0,
and I can run OpenCL in C without problem,
but I am unsuccessful in Java. Any ideas?