In my previous post I introduced OpenCL4Java, a new library I wrote to make it relatively easy to use OpenCL from Java.

Today I’ll present ScalaCL, which goes a step further in Scala (if you haven’t jumped on the Scala train yet, now is the time to do it!)

ScalaCL is a library that makes it trivial to evaluate simple parallel expressions in Scala.

*<Useless Technical Jargon On>*
Technically, ScalaCL is an [internal DSL](http://fragmental.tw/research-on-dsls/domain-specific-languages-dsls/internal-dsls/) (see [Domain Specific Language](http://en.wikipedia.org/wiki/Domain-specific_language)) that creates OpenCL kernels from its internal AST-like representation and executes them through the [OpenCL4Java](http://code.google.com/p/nativelibs4java/wiki/OpenCL) OO bindings (which in turn use [JNA](https://jna.dev.java.net/) + [JNAerator](http://code.google.com/p/jnaerator/) to call OpenCL's C API).
*</Useless Technical Jargon Off>*

A common OpenCL example you can find on the web is the “vector add” (see the ATI documentation, Python::OpenCL…).
Here’s how it looks in ScalaCL:

import scalacl._
import scalacl.ScalaCL._

class VectAdd(i: Dim) extends Program(i) {
  val a = FloatsVar // array of floats
  val b = FloatsVar
  val output = FloatsVar
  content = output := a + b
  // Or some more useful content:
  // content = output := a * sin(b) + 1
}

val n = 1000
val prog = new VectAdd(n)
prog.a.write(1 to n)
prog.b.write(prog.a)
prog ! // run the operation
println(prog.output) // print the output

Here we simply have two arrays of float values, “a” and “b”, and for each position i in the arrays we want to compute “output[i] = a[i] + b[i]”.

The cool thing is that this gets executed transparently in parallel on the graphics card (or on your CPU, depending on your OpenCL environment). Yes, that’s parallelization for free!
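For comparison, here is the same element-wise addition in plain, sequential Scala (no ScalaCL or OpenCL involved), just to make explicit what the VectAdd program computes:

```scala
// Plain sequential Scala doing what VectAdd does on the device:
// element-wise addition of two float arrays.
val n = 1000
val a = (1 to n).map(_.toFloat)
val b = a
val output = a.zip(b).map { case (x, y) => x + y }
```

The difference, of course, is that this version runs one element at a time on a single core, while ScalaCL runs one work-item per index on the OpenCL device.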

Notice the “:=” operator, which expresses assignment of an expression to a variable.

Note that:

content = output := a * sin(b) + 1

would be strictly equivalent to

content = output(i) := a(i) * sin(b(i)) + 1

The size and array index of the a and b array variables are automatically inferred by ScalaCL from the only declared execution dimension of the program (an OpenCL kernel can execute over an arbitrary number of dimensions, which is useful for processing 2D or 3D images).
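In plain-Scala terms, the implicit indexing above amounts to evaluating the right-hand side once per index i of the single execution dimension. A small sketch (the array contents here are made-up example data, not part of the original post):

```scala
// Hypothetical sequential reading of `output := a * sin(b) + 1`:
// the expression body is evaluated once per index i.
val a = Array(1.0f, 2.0f, 3.0f, 4.0f)
val b = Array(0.0f, 0.0f, 0.0f, 0.0f)
val output = Array.tabulate(a.length)(i => a(i) * math.sin(b(i)).toFloat + 1)
```

With b all zeros, sin(b(i)) is 0 everywhere, so every output slot evaluates to 1.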

Of course, the sizes of the arrays can also be specified by hand, and this is actually needed when there is more than one execution dimension:

class SomeProg(x: Dim, y: Dim) extends Program(x, y) {
  val someBuf = IntsVar(10)
  val someOtherBuf = IntsVar(y) // buffer as large as the y dimension
  content = ...
}

In this last example, the program executes on a two-dimensional basis (equivalent to evaluating its content inside a for (int x ...) for (int y ...) nested loop).
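That nested-loop picture can be sketched in plain Scala, assuming made-up dimension sizes; the point is only that the program body runs once per (x, y) pair:

```scala
// Conceptual picture of 2D execution: the body of the program is
// evaluated once for every (x, y) pair in the two declared dimensions.
val (dimX, dimY) = (3, 4)
val pairs = for {
  x <- 0 until dimX
  y <- 0 until dimY
} yield (x, y)
```

On an OpenCL device, these dimX * dimY evaluations are what get scheduled in parallel instead of looped over.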

However, the preferred way is to let ScalaCL guess the sizes of input and output arrays, even in non-trivial cases:

class VectSinCos(i: Dim) extends Program(i) {
  val x = FloatsVar
  val sincosx = FloatsVar
  content = List(
    sincosx(i * 2) := sin(x),
    sincosx(i * 2 + 1) := cos(x)
  )
}

Here, the resulting “sincosx” array will contain interleaved sin and cos values for each x input, so its size is automatically inferred to be twice the size of x. The size of x is itself implicitly bound to the only dimension declared in the program.
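A sequential Scala sketch of the same interleaving makes the size inference concrete (the input values here are arbitrary example data):

```scala
// Sequential equivalent of VectSinCos: even slots get sin(x(i)),
// odd slots get cos(x(i)), so the output is twice as long as the input.
val x = Array(0.0f, 1.0f, 2.0f)
val sincosx = new Array[Float](x.length * 2)
for (i <- x.indices) {
  sincosx(i * 2)     = math.sin(x(i)).toFloat
  sincosx(i * 2 + 1) = math.cos(x(i)).toFloat
}
```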

Behind the scenes, ScalaCL generates the following OpenCL kernel source code for the previous program:

__kernel void function(
__global const float* in,
__global float* out)
{
  int dim1 = get_global_id(0);
  out[dim1 * 2] = sin(in[dim1]);
  out[(dim1 * 2) + 1] = cos(in[dim1]);
}

ScalaCL guesses that there is one input (read-only) array and one output (read-write) array, and creates the source code corresponding to the declared content of the program. The variable names are unknown to it, though (a technical limitation of the DSL approach), but sensible names are chosen so that the generated kernels remain readable in the debug output (“x” became “in”, “sincosx” became “out”). The typical user will never need to look at the generated OpenCL source code anyway…

ScalaCL is still in the very early stages of development, but It Already Works (TM) and might soon relieve you from the need to learn too much about OpenCL.

You have a few options to try these examples, in preferred order:

Here are some of the features planned for the near future (besides bugfixes):

  • Add ImageVar and corresponding functions. Right now, only scalar variables (FloatVar, IntVar…) and array variables (FloatsVar, IntsVar…) are supported
  • Add LocalDim and syntax to deal with workgroup size. For now, the workgroup size is always 1
  • Provide automatic support for reductions, through the use of +=, -=, /=, *= operators.
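The planned reduction support corresponds to a fold in plain Scala. As a sketch, a hypothetical `total += values` (syntax invented here for illustration, not part of ScalaCL's current API) would behave like:

```scala
// Plain-Scala picture of a sum reduction:
// many values are combined into a single scalar result.
val values = Array(1.0f, 2.0f, 3.0f, 4.0f)
val total = values.foldLeft(0.0f)(_ + _)
```

The interesting part on an OpenCL device is doing this combination in parallel (typically as a tree of partial sums) rather than left to right.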

In a soon-to-be-published post I’ll talk about OpenCL4Java performance with some benchmark results for simple parallel operations… Stay tuned 🙂

Any feedback is highly welcome, as usual…