January Update : JNAerator, BridJ, JavaCL, ScalaCL and… Scalaxy !

Happy New Year !

First, I’d like to thank all the users of JNAerator, BridJ, JavaCL and ScalaCL for their continued support and help.

The progress over this year wouldn’t have been possible without the contributions of Rémi Émonet (biggest patcher), Kazó Csaba (who even created javaclpp) and Atsushi Eno (Android support).

I’d also like to give special thanks to some of the most prolific and friendliest bug-reporters of this year : Andrei Sochirca, Raphael Cohn, Andrea Aime (but there are so many others !).

Now here is some news about the projects, for those who don’t follow the commits on GitHub :-)

Maven Central

I’m in the process of getting all NativeLibs4Java artifacts on the Maven Central repository (using Sonatype OSS). What this means is that you won’t need to add repository and/or pluginRepository tags to your Maven project files to get the releases, and that the snapshots will now be available from Sonatype’s OSS Snapshots Repository (see updated usage instructions).

There are still a few blocking points (like, deploying Android’s dex tool to Maven Central), but I’m optimistic about finishing this task in the next few weeks.

In the meantime, should you want to build from sources or even just depend on a snapshot version, you might need to install dx to your local repository manually using the script BridJ/admin/android-dx-package.

BridJ

I’ll be releasing version 0.6.1 of BridJ with tons of fixes as soon as the Maven Central work is completed.

Please note that passing or returning structs by value will still not be supported in that release, although some experimental work has been shoved in (for very large structs on select platforms).

For more details please see BridJ’s CHANGELOG.

JNAerator

This year I’ve been criminally lazy with JNAerator release notes : I haven’t announced versions 0.9.8 and 0.9.9 publicly… maybe because I wasn’t that proud of the stability of these two versions :-S

Anyway, 2012 should be much better since the upcoming version 0.9.10 is bound to become the best version of JNAerator ever, with a few quantum leaps in features and stability (I’ll release it within a few weeks).

For more details please see JNAerator’s CHANGELOG.

Oh, and please note a big change here : JNAerator artifact’s groupId was changed to com.nativelibs4java (was previously com.jnaerator), and snapshots are available on Sonatype’s OSS Snapshots Repository.

JavaCL

Version 1.0-RC2 of JavaCL will ship soon, with quite a few bug fixes.

For more details please see JavaCL’s CHANGELOG.

Oh, and last but not least : JavaCL is now covered in a chapter of OpenCL in Action (clearly the best OpenCL reference and hands-on guide out there, packed with amazing real-world examples !)

ScalaCL vs. Scalaxy

Since last summer’s Scalathon, progress on ScalaCL hasn’t been very visible because it’s been a bit slow : the new stream rewrite pipeline broke all the tests (especially those that compare the compiled bytecode to some reference hand-rewritten version), so it’s become hard and unrewarding to work on the project. But now the tests are almost back to normal, and it’s time to split the project in two parts :

  • Scalaxy Compiler Plugin for generalistic loop rewrites, performance warnings and compile-time coding guidelines enforcement
  • ScalaCL Collections & Compiler Plugin for GPGPU computation (the plugin obviously depends on Scalaxy since it needs to rewrite loops found in Scala functions converted to OpenCL kernels)

If you have any idea of a better name for Scalaxy, please contribute it on the list or on twitter : previous codename was FaSca (for Faster Scala), but I’m not sure I like the new one much more :-S

It’s hard to say when a release will be ready for either ScalaCL or Scalaxy, since the latter still requires quite a bit of stabilization, but snapshots should already be usable right now.

That’s it for today, happy hacking to everyone !

Posted in BridJ, In English, JavaCL, JNAerator, NativeLibs4Java, ScalaCL, Uncategorized | Comments Off

BridJ 0.6 released : new protected mode, more stable (especially on Windows), Objective-C delegates & blocks

BridJ is an innovative native bindings library that lets Java programmers use native libraries (written in C, C++, ObjectiveC and more) in a very natural way (inspired by the great JNA, with better performance, C++ and generics added).

Here’s a summary of the changes between version 0.5 and this new version 0.6 (see full change log here) :

  • Added errno/GetLastError() mechanism : declare methods to throw org.bridj.LastError and it’s all handled automatically (issue #74)
  • Added protected mode (-Dbridj.protected=true / BRIDJ_PROTECTED=1), to prevent native crashes (makes BridJ bindings slower + disables optimized raw calls).
  • Added proxy-based Objective-C delegates support (forwards unknown methods to a Java instance) (issue #188)
  • Added Objective-C 2.0 blocks support (similar to callbacks, inherit from ObjCBlock instead of Callback) (issue #192)
  • Added Pointer.asList() and .asList(ListType) to get a List view of the pointed memory
    • depending on the ListType, the view can be mutable / resizeable
    • removed the List interface from Pointer (which is now just an Iterable)
    • added Pointer.allocateList(type, capacity) to create a NativeList from scratch (has a .getPointer method to grab the resulting memory at the end)
  • Added Pointer.moveBytesTo(Pointer)
  • Added support for embedded libraries extraction from “lib/arch” paths (along with “org/bridj/lib/arch”, where arch uses BridJ’s convention)
  • Added TimeT (time_t), timeval classes (issue #72)
  • Added Platform.getMachine() (same result as `uname -m`)
  • Added support for multiarch Linux distributions (issue #2)
  • Added support for versioned library file names (issue #72)
  • Added global allocated memory alignment setting (BRIDJ_DEFAULT_ALIGNMENT env. var. & bridj.defaultAlignment property), + Pointer.allocateAlignedArray
  • Added basic calls log mechanism (disables direct mode) : -Dbridj.logCalls=true or BRIDJ_LOG_CALLS=1 (only logs the method name & signature, not the arguments or returned values)
  • Added BridJ.setMinLogLevel(Level) (issue #190)
  • Added Platform.addEmbeddedLibraryResourceRoot(root) to use & customize the embedded library extraction feature in user projects
  • Added support for packed structs (@Struct(pack = 1), or any other pack value)
  • Added check of BridJ environment variables and Java properties : if any BRIDJ_* env. var. or bridj.* property that is set does not match a known option, BridJ logs warnings + the full list of valid options
  • Added @JNIBound annotation to mark native methods that should not be bound by BridJ but by plain old JNI
  • Fixed Pointer.next/.offset methods (used to throw errors at the end of iteration)
  • Fixed Pointer.getNativeObjectAtOffset(long byteOffset, Type type)
  • Fixed struct fields implemented as Java fields
  • Fixed resolution of MacOS X “ApplicationServices” framework binaries, such as CoreGraphics
  • Fixed some COM bugs with IUnknown
  • Fixed demangling/matching of CLong & SizeT
  • Fixed CLong & SizeT arguments
  • Fixed Objective-C runtime (basic features), added NSString constructor & NSDictionary (with conversion to/from Map<NSString, NSObject>)
  • Fixed crashes on Win32 (when using Pointer class in bound function arguments)
  • Fixed crash during deallocation of Callbacks + fixed leak of Callbacks (now need to retain a reference to callbacks or use BridJ.protectFromGC / unprotectFromGC)
  • Made the StructIO customization mechanism more flexible
  • Made JawtLibrary public
  • Various Javadoc tweaks
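
The check of BRIDJ_* variables mentioned in the list above can be pictured with a few lines of plain Java. This is only a sketch of the idea, not BridJ’s actual code: the class `OptionCheck` and its method are made up for illustration, and the set of known names is limited to the variables quoted in these release notes (the real list is longer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OptionCheck {
    // Names quoted in these release notes only (assumption: BridJ's own list is longer).
    static final Set<String> KNOWN = new TreeSet<>(Arrays.asList(
            "BRIDJ_PROTECTED", "BRIDJ_LOG_CALLS", "BRIDJ_DEFAULT_ALIGNMENT",
            "BRIDJ_DEBUG", "BRIDJ_DEBUG_POINTERS", "BRIDJ_DIRECT", "BRIDJ_LIBRARY"));

    /** Returns the BRIDJ_* variable names of the given environment that are not recognized. */
    public static List<String> unknownOptions(Map<String, String> env) {
        List<String> unknown = new ArrayList<>();
        for (String name : env.keySet()) {
            if (!name.startsWith("BRIDJ_")) continue;   // not ours
            if (KNOWN.contains(name)) continue;         // valid option
            if (name.endsWith("_LIBRARY")) continue;    // BRIDJ_<LIBNAME>_LIBRARY path overrides
            unknown.add(name);
        }
        Collections.sort(unknown);
        return unknown;
    }

    public static void main(String[] args) {
        for (String name : unknownOptions(System.getenv())) {
            System.err.println("Warning: unknown option " + name + " (valid options : " + KNOWN + ")");
        }
    }
}
```

The point of such a check is simply that a typo like BRIDJ_PROTECTD gets reported instead of being silently ignored.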

Special thanks to the users, contributors and bug reporters that helped getting this version out !

You can contribute to the project by reporting bugs here and joining the NativeLibs4Java Community.

Wait no longer : download and use BridJ now !


JavaCL 1.0.0-RC1 released : switched to BridJ, added FFT utils and UJMP matrices, bugfixes, API updates…

JavaCL v1.0.0 RC1 is available !

Feedback needed : last-minute bugs will be fixed in next release candidates within the coming weeks, then a final 1.0 version will be released.

Download / Install | Browse JavaDoc | Getting Started (Tutorial) | Discuss

Launch the new JavaCL Interactive Image Transform Editor.

Release Notes

Here are the main changes from this new version (see full change log) :

  • The BridJ-powered version now becomes the default !
    The JNA version is still maintained and available with all Maven artifact ids suffixed with “-jna”. See migration notes below…
  • Added simple Fourier-analysis classes (package com.nativelibs4java.opencl.util.fft), with double and float variants, usable with primitive arrays or OpenCL buffers
  • Added some math-related compiler options to CLProgram :
    • setFastRelaxedMath() (triggers all the others !)
    • setFiniteMathOnly()
    • setUnsafeMathOptimizations()
    • setMadEnable()
    • setNoSignedZero()
  • Added CLContext.createBuffer(Usage, Buffer)
  • Added CLBuffer.copyTo(CLQueue, CLMem destination, CLEvent…) and CLBuffer.emptyClone(Usage)
  • Added NIOUtils.indirectBuffer(size, bufferClass)
  • Added CLContext.toString
  • Deprecated CLXXXBuffer in favor of CLBuffer<xxx> (CLIntBuffer becomes CLBuffer<Integer>, etc…)
  • Changed CLContext.createBuffer(Usage, length, class) to createBuffer(Usage, class, length) to match the JavaCL/BridJ API (and provoke migration issues : people should now use a primitive class rather than an NIO buffer class !!!)
  • Complete rewrite of UJMP Matrix implementation, using a design borrowed from ScalaCL (uses the BridJ port of JavaCL)
  • Fixed issue #66 (create temp files in ~/.javacl subdirectories instead of /tmp)
  • Fixed OpenGL sharing on MacOS X
  • Fixed CLProgram.getBinaries() in some cases
  • Fixed CLBuffer.read on indirect buffers
  • Fixed NPE that happens with null varargs CLEvent[] array
  • Fixed length = 1 case in reduction utility
  • Fixed ATI detection (“ATI Stream” now replaced by “AMD Accelerated Parallel Processing”, cf. Csaba’s comment in issue #39)
  • Fixed issue #55 : applied Kazo Csaba’s patch to fix the bounds of CLBuffer.map’s returned buffer
  • Fixed inheritance of CLBuildException (now derives from CLException)

Migration notes : from JNA to BridJ

The “javacl” Maven artifact now refers to the BridJ version of JavaCL, while “javacl-jna” corresponds to the JNA version (which was the default version up to 1.0 beta 6).

While the JNA version is still maintained (and will get patches at least for the 1.0.x versions), it is advised that all users migrate to the BridJ port at some point. The main selling points are :

  • Smaller call overhead (BridJ beats out JNA in interop. performance)
  • BSD-licensed (as opposed to the LGPL license of the JNA version)
  • Nicer API, with generic CLBuffer<T> instead of typed buffer classes (CLIntBuffer is now CLBuffer<Integer>)
  • All the new features will be reserved to the BridJ version

This quick migration guide on the wiki will help you perform the move in minutes.

Getting started

You can read the Getting Started (Tutorial) page on the wiki to get started very quickly !

Please join the NativeLibs4Java Google Group to discuss JavaCL / ScalaCL, get the latest news and ask for support from the growing JavaCL community.


BridJ 0.5 released : Android support, dynamic callbacks, prepackaged subsets, support for old Java and MacOS X versions…

BridJ is an innovative native bindings library that lets Java programmers use native libraries (written in C, C++, ObjectiveC and more) in a very natural way (inspired by the great JNA, with better performance, C++ and generics added).

Here’s a summary of the changes between version 0.4.1 and this new version 0.5 (see full change log here) :

  • Added support for the Android / arm platform (issue #69)
  • Added Pointer.clone() that duplicates the pointed memory (requires a pointer with bounds information)
  • Added various pre-packaged specialized subsets of BridJ : c-only, windows-only, macosx-only, unix-only, linux-only, ios-only, android (see details on the wiki)
  • Added Pointer.allocateDynamicCallback(DynamicCallback, callingConv, returnType, paramTypes…)
  • Added BridJ native library path override : one can set the BRIDJ_LIBRARY environment variable or the “bridj.library” property to the full path of libbridj.so/.dylib/.dll
  • Fixed behaviour in environments with a null default classloader (as in Scala 2.9.0)
  • Added support for Java 1.5 (issue #57)
  • Added support for MacOS X 10.4, 10.5 (was previously restricted to 10.6)

Special thanks to Atsushi Eno, whose testing and many patch proposals were instrumental in getting BridJ to work on Android.

By the way : anyone can contribute to the project by reporting bugs here.

Wait no longer : download and use BridJ and join the NativeLibs4Java Community !


BridJ 0.4.1 released (r1990): many callbacks fixes, better windows APIs support, enhanced C++ templates…

BridJ is an innovative native bindings library that lets Java programmers use native libraries (written in C, C++, ObjectiveC and more) in a very natural way (inspired by the great JNA, with better performance, C++ and generics added).

Here are the changes since version 0.4 (see full change log here) :

  • Fixed callbacks on Windows x86
  • Fixed multithreaded callbacks ! (callbacks called in a different thread than the one that created them used to hang indefinitely)
  • Fixed Pointer and ValuedEnum arguments and return values in callbacks
  • Fixed loading of libraries that depend on other libraries in the same directory on Windows (issue #65)
  • Fixed BridJ.sizeOf(Pointer.class), sizeOf(SizeT.class), sizeOf(CLong.class)…
  • Enhanced C++ templates support
  • Added support for Windows APIs Unicode vs. ANSI functions renaming (e.g. SendMessage being one of SendMessageW or SendMessageA, depending on Platform.useUnicodeVersionOfWindowsAPIs)
  • Added deprecated support for struct fields implemented as Java fields, to ease up migration from JNA (needs manual calls to BridJ.writeToNative(struct) and BridJ.readFromNative(struct)) (issue #54)
  • Added preliminary read-only support for STL’s std::vector C++ type
  • Added BridJ.describe(Type) to describe structs layouts (automatically logged for each struct type when BRIDJ_DEBUG=1 or -Dbridj.debug=true)
  • Added BridJ.describe(NativeObject).
  • Added StructObject.toString() (calls BridJ.describe(this))
  • Added BRIDJ_DEBUG_POINTERS=1 (or -Dbridj.debug.pointers=true) to display extended pointer allocation / deallocation debug information
  • Reorganized Windows COM packages (moved out DirectX code to its own top-level project : com.nativelibs4java:directx4java)
  • Implemented FlagSet.equals

Special thanks to Andrei Sochirca for this release : couldn’t have made it without his patient and tireless testing and bug reporting.

Anyone can contribute to the project by reporting bugs here.

Now what ?

  1. Download BridJ’s binaries : works on Windows (x86/x64), Linux (x86/x64), MacOS X (universal) and Solaris (x86)
  2. Generate BridJ wrappers for your library using JNAerator (just select “BridJ” in the “Runtime” combobox).
  3. Read some documentation :
  4. FYI Pointer is probably the most important class to look at, the only other classes you need to know about are the ones created by JNAerator.

  5. Join the NativeLibs4Java Google Group to share your questions and remarks with the community !

BridJ 0.4 released (r1869): subclass C++ from Java, better Javadoc and many fixes

BridJ is an innovative native bindings library that lets Java programmers use native libraries (written in C, C++, ObjectiveC and more) in a very natural way (inspired by the great JNA, with better performance, C++ and generics added).

In this new version (0.4), there are a lot of bug fixes and new features :

  • Added parsing of GNU LD scripts (issue #61)
  • Fixed demangling of size_t / long C types with GCC
  • Fixed Linux x86 symbols extraction
  • Added experimental C++ virtual overrides : it is now possible to subclass C++ classes from Java, even with anonymous inner classes ! (no support for multiple inheritance yet)
  • Fixed crash in C++ destructors at the JVM shutdown (issue #60)
  • Fixed callbacks with float args
  • Added support for varargs functions
  • Introduced basic C++ templates support (binding of compiled template classes, not template methods / functions yet)
  • Added dynamic functions support : Pointer.asDynamicFunction(callConv, returnType, argTypes…)
  • Added support for arbitrary C++ constructors
  • Added support for __stdcall callbacks
  • Added COM VARIANT class with very basic data conversion support
  • Added many COM UUID definitions (from uuids.h, codecapi.h, ksuuids.h)
  • Added Solaris x86 support
  • Added @DisableDirect annotation to force-disable raw assembly optimizations (also see BRIDJ_DIRECT=0 or -Dbridj.direct=false for global disable)
  • Fixed long return values (issue #47)
  • Fixed ‘@Ptr long’ return values on 32 bits platforms
  • Fixed structs sub-structs and array fields
  • Fixed unions :
    • pure unions can be created with the @Union annotation on the union class (+ fields annotated with @Field(value = uniqueId))
    • structs with unioned fields can be defined with fields annotated with @Field(value = uniqueId, unionWith = indexOfTheFirstFieldOfTheUnion)
  • Fixed size computation of unions & structs (issue #51, issue #64)
  • Fixed JAWTUtils on win32 (issue #52)
  • Fixed Pointer.pointerToAddress(long, Class, Releaser) (issue #48)
  • Fixed incomplete refactoring (issue #58)
  • Moved all the is64Bits(), isWindows()… methods and SIZE_T_SIZE constants out of JNI class into new Platform class
  • Moved the C++ symbols demanglers to package org.bridj.demangling
  • Renamed Pointer.asPointerTo(Type) to Pointer.as(Type)
  • Enhanced FlagSet (added toString(), toEnum(), fromValue(ValuedEnum))
  • Enhanced Pointer (added allocate(Type), allocateArray(Type, long))
  • Greatly enhanced the API Javadoc : stable version, development version

Many thanks to the early users and bug reporters (you can report bugs here).

Now what ?

  1. Download BridJ’s binaries : works on Windows (x86/x64), Linux (x86/x64), MacOS X (universal) and Solaris (x86)
  2. Generate BridJ wrappers for your library using JNAerator (just select “BridJ” in the “Runtime” combobox).
  3. Read some documentation :
  4. FYI Pointer is probably the most important class to look at, the only other classes you need to know about are the ones created by JNAerator.

  5. Join the NativeLibs4Java Google Group to share your questions and remarks with the community !

JNAerator 0.9.7 released (r1817, 20110329) : better pointer types, better BridJ generation and C++ parsing

More than ever, JNAerator lets Java programmers access native libraries transparently, using a runtime such as BridJ (C / C++), JNA (C only) or Rococoa (Objective-C).

As version 0.9.6 was a kind of silent release (not advertised), here are the main changes since version 0.9.5 :

  • Fixed generation of typed pointers (+ introduced undefined types for BridJ runtime)
  • Added generation of globals for BridJ
  • Added -parseInOneChunk option to control parsing more finely (forces regular, non isolated-mode parsing, which is more correct but also more fragile)
  • Fixed issue 82 (enum renaming)
  • Fixed issue 79 (unescaped quotes in string defines)
  • Fixed issue 84 (some exotic function pointer syntax)
  • Fixed binding of Windows’ BOOL type for BridJ: it’s int, not boolean !!!
  • Added explicit call to setFieldOrder in JNAerated JNA structures (not needed with BridJ, which has its @Field annotations to guarantee proper order)
  • Enhanced parsing and AST generation of C/C++ expressions
  • Fixed BridJ inherited struct fields indexes computation

As usual, there’s more than one way to use JNAerator :

Many thanks to the bug reporters ! (you can help too : join the NativeLibs4Java Google Group)


BridJ 0.3 released (r1638) : fixes, fixes and (tiny) stylistic changes

BridJ received a lot of fixes and enhancements in this new version, notably thanks to the fact I’m using it in ScalaCL (and that I have more time for my hobby projects now).

Here are the main changes :

  • Fixed binding of “c” library on Unix
  • Fixed iteration on unbound native-allocated pointers (issue 37).
  • Fixed Visual C++ demangling (issue 36 : bad handling of back-references).
  • Added Pointer.getBuffer(), getSizeTs(), getCLongs() and other missing methods.
  • Fixed byteOffset-related issues in CLong and SizeT pointer read/write methods.
  • Renamed most pointer byteOffset methods with an -AtOffset suffix (for instance, Pointer.getInt(long) becomes getIntAtOffset(long))
  • Inverted charset and StringType arguments in Pointer.getString / .setString methods
  • Renamed Pointer.withIO(PointerIO<U>) to Pointer.as(PointerIO<U>)
  • Added Pointer.asUntyped() (equiv. to Pointer.as((Class<?>)null))
  • Allow pointerToBuffer on a non-direct buffer (and added Pointer.updateBuffer to copy data back to non-direct buffer if needed)
  • Assume @Runtime(CRuntime.class) by default
  • Autodetect calling convention on Windows (based on name mangling), unless convention is explicitly specified with @Convention(Style.X)
  • Added BRIDJ_<LIBNAME>_LIBRARY environment variables to hard-code the shared library path in priority
  • Added library alias mechanism : BridJ.setNativeLibraryActualName, .addNativeLibraryAlias
  • Fixed callbacks-related Win32 crashes
  • Fixed super-critical bug on Windows 32 bits with size_t arguments !
  • Fixed some Pointer endianness bugs
Now what ?

(French) Slides of JavaCL / ScalaCL presentation @ ISC’s feb 2011 GPGPU workshop

Yesterday was a very pleasant day at the ISC-PIF premises, where I took part in their first themed day on GPGPU computing methods.

The other speakers, visibly very knowledgeable, gave fascinating talks, all of them making use of GPU acceleration :

  • distributed genetic algorithms on GPUs
  • morphogenesis computations on tiger fish embryos (truly stunning videos !)
  • organ simulation
  • acceleration of C / Fortran code

Here are the slides of my presentation on JavaCL / ScalaCL.

[gview file="http://ochafik.com/blog/wp-content/uploads/2011/02/JavaCL_ScalaCL_Olivier_Chafik_ISC_febr-2010.pdf"]

Thanks to Julian Blicke and Romain Reuillon for inviting me to present my projects !


Write your first OpenCL Discrete Fourier Transform with JavaCL in 15 minutes

This article is intended for OpenCL beginners who want to use Java to tap into the vast power of their GPUs.

It will show you how to write a simple Discrete Fourier Transform (DFT), walking you through the various steps needed (installation of an OpenCL implementation, writing of OpenCL and Java / JavaCL hosting code, compilation) and will show you a unique feature of the JavaCL library : the JavaCL Generator.

You can type the example developed here by yourself, or simply download JavaCLTutorial.zip, unpack it and just run Maven (or open the project with any Maven-aware IDE, such as Netbeans, IntelliJ IDEA Community Edition or Eclipse+Maven).

Prerequisites

First, you’ll need to make sure you have a working OpenCL implementation on your computer.

You’re all set if you’re on MacOS X 10.6 Snow Leopard (which ships with OpenCL), otherwise JavaCL’s Get / Install page will guide you through this.

To build the project (available for download as JavaCLTutorial.zip), you’ll probably be better off installing Maven.

Discrete Fourier Transform

Discrete Fourier Transforms (DFTs) are extremely useful as they reveal periodicities in input signals as well as the relative strengths of any periodic components. In addition, the squared complex modulus of a properly scaled DFT is commonly known as the power spectrum of the input data.
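
As a small illustration of the power spectrum, here is a sketch in plain Java (the class `PowerSpectrum` is made up for this example and is not part of JavaCL). It assumes the DFT coefficients are stored as packed (real, imaginary) pairs, takes the power at frequency k to be the squared modulus of the k-th coefficient, and leaves out any normalization since scaling conventions vary.

```java
import java.util.Arrays;

public class PowerSpectrum {
    /**
     * Takes DFT coefficients packed as (real, imaginary) pairs
     * and returns the squared modulus of each coefficient.
     * No normalization is applied (conventions differ).
     */
    public static double[] of(double[] packed) {
        double[] power = new double[packed.length / 2];
        for (int k = 0; k < power.length; k++) {
            double re = packed[2 * k], im = packed[2 * k + 1];
            power[k] = re * re + im * im;
        }
        return power;
    }

    public static void main(String[] args) {
        double[] coefficients = {3, 4, 0, -2}; // X_0 = 3 + 4i, X_1 = -2i
        System.out.println(Arrays.toString(of(coefficients))); // prints [25.0, 4.0]
    }
}
```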

Think of a DFT as the slow and easy twin of the Fast Fourier Transform (FFT). Both compute the same transform, but the former is dumb-easy while the latter is wicked-fast (O(n^2) for the DFT vs. O(n \log(n)) for the FFT).

You can skip the rest of this paragraph without harm if you don’t care about the maths behind this :-)

Consider a sequence of $N$ complex numbers denoted by $x_0,\dots,x_{N-1}$. Its dual representation $X_0,\dots,X_{N-1}$ in the Fourier orthonormal basis (exponentials) is obtained using the following Discrete Fourier Transform formula :

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2i\pi\frac{kn}{N}}, \qquad k=0,\dots,N-1 \qquad (1)$$

where $i$ is the imaginary unit. Then, the Inverse Discrete Fourier Transform (IDFT) is given by

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k e^{2i\pi\frac{kn}{N}}, \qquad n=0,\dots,N-1 \qquad (2)$$

These formulations aren’t unique but remain the most widespread.

In particular, if $x_n$ is a real signal, then $X_k$ and $X_{N-k}$ satisfy $\bar{X}_k = X_{N-k}$, where $\bar{z}$ denotes the complex conjugate of $z$. Therefore, the DFT output for real inputs is half redundant, and it is sufficient to analyze the signal by roughly looking at half of the DFT coefficients.
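
Why this redundancy holds can be seen in one line from formula (1). The derivation below only assumes that every $x_n$ is real (so $\bar{x}_n = x_n$) and uses $e^{-2i\pi n} = 1$ for any integer $n$ :

```latex
X_{N-k}
  = \sum_{n=0}^{N-1} x_n e^{-2i\pi\frac{(N-k)n}{N}}
  = \sum_{n=0}^{N-1} x_n e^{-2i\pi n} e^{2i\pi\frac{kn}{N}}
  = \sum_{n=0}^{N-1} \overline{x_n e^{-2i\pi\frac{kn}{N}}}
  = \bar{X}_k
```

For instance, with $N = 8$ the coefficients $X_5$, $X_6$ and $X_7$ are just the conjugates of $X_3$, $X_2$ and $X_1$, so only $X_0$ to $X_4$ carry independent information.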

The original C code

The DFT is pretty straightforward to code in C :

#include <math.h>

void dft(
    const double *in, // complex values input (packed real and imaginary)
    double *out,      // complex values output
    int length,       // number of input and output values
    int sign)         // sign modifier in the exponential :
                      // 1 for forward transform, -1 for backward.
{
    for (int i = 0; i < length; i++)
    {
        // Initialize sum and inner arguments
        double totReal = 0, totImag = 0;
        double param = (-2 * sign * i) * M_PI / (double)length;

        for (int k = 0; k < length; k++) {
            double valueReal = in[k * 2], valueImag = in[k * 2 + 1];
            double arg = k * param;
            double c = cos(arg), s = sin(arg);

            // Complex multiplication of the input value by (c + i * s)
            totReal += valueReal * c - valueImag * s;
            totImag += valueReal * s + valueImag * c;
        }

        if (sign == 1) {
            // forward transform (space -> frequential)
            out[i * 2] = totReal;
            out[i * 2 + 1] = totImag;
        } else {
            // backward transform (frequential -> space)
            out[i * 2] = totReal / (double)length;
            out[i * 2 + 1] = totImag / (double)length;
        }
    }
}

The OpenCL code

OpenCL makes it easy to call the same function many times in parallel.

As our original DFT C code simply consists of a double loop, we can make the outer loop parallel and keep the inner loop as is. The following OpenCL function (kernel) corresponds to the outer loop body; the OpenCL runtime will call it many times in parallel with a varying parameter (here retrieved with the get_global_id function and stored in the ‘i’ variable) :

// Enable double-precision floating point numbers support.
// Not all platforms / devices support this, so you may have to switch to floats.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void dft(
    __global const double2 *in, // complex values input
    __global double2 *out,      // complex values output
    int length,                 // number of input and output values
    int sign)                   // sign modifier in the exponential :
                                // 1 for forward transform, -1 for backward.
{
    // Get the varying parameter of the parallel execution :
    int i = get_global_id(0);

    // In case we're executed "too much", check bounds :
    if (i >= length)
        return;

    // Initialize sum and inner arguments
    double2 tot = 0;
    double param = (-2 * sign * i) * M_PI / (double)length;

    for (int k = 0; k < length; k++) {
        double2 value = in[k];

        // Compute sin and cos in a single call :
        double c;
        double s = sincos(k * param, &c);

        // This adds (value.x * c - value.y * s, value.x * s + value.y * c) to the sum :
        tot += (double2)(
            dot(value, (double2)(c, -s)),
            dot(value, (double2)(s, c))
        );
    }

    if (sign == 1) {
        // forward transform (space -> frequential)
        out[i] = tot;
    } else {
        // backward transform (frequential -> space)
        out[i] = tot / (double)length;
    }
}

Here are a few remarks concerning this code :

  • OpenCL has vector data types, such as the double2 that we’ve used. It supports efficient vector-vector and vector-scalar operations (+, -, *, /…), with a few extras for vector-vector operations (dot product…).
  • We’re using the sincos function, which is presumably faster than using sin and cos separately
  • Double-precision numbers are not supported by default in OpenCL (some GPUs just don’t support them), so we have to enable the OpenCL double extension at the first line with the #pragma OPENCL EXTENSION mechanism. The exercise of using floats instead (so that it works on all devices) is left to the reader (you just need to dumb-replace ‘double’ by ‘float’ and ‘Double’ by ‘Float’ ;-) )
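
If no OpenCL device is at hand, the way the kernel splits the work can be mimicked on the CPU. The sketch below is only an analogy in plain Java (the class `ParallelOuterLoop` and its `kernelBody` method are invented for this illustration, and complex values are packed as (real, imaginary) pairs instead of double2) : a parallel IntStream plays the role of the OpenCL runtime, handing each index i to the loop body much like get_global_id(0) does. It deliberately launches more "work items" than there are values, to show why the bounds check is there.

```java
import java.util.stream.IntStream;

public class ParallelOuterLoop {
    /** Body of the outer loop for one index i, mirroring the kernel. */
    static void kernelBody(int i, double[] in, double[] out, int length, int sign) {
        if (i >= length) return; // same bounds check as in the kernel

        double totReal = 0, totImag = 0;
        double param = (-2 * sign * i) * Math.PI / length;
        for (int k = 0; k < length; k++) {
            double c = Math.cos(k * param), s = Math.sin(k * param);
            totReal += in[2 * k] * c - in[2 * k + 1] * s;
            totImag += in[2 * k] * s + in[2 * k + 1] * c;
        }
        // Only the backward transform is divided by the length.
        double scale = (sign == 1) ? 1.0 : 1.0 / length;
        out[2 * i] = totReal * scale;
        out[2 * i + 1] = totImag * scale;
    }

    public static double[] dft(double[] in, int sign) {
        int length = in.length / 2;
        double[] out = new double[in.length];
        // Round the number of executions up to a multiple of 64 (arbitrary choice for the demo).
        int globalSize = ((length + 63) / 64) * 64;
        // Each index writes to its own two slots of 'out', so no synchronization is needed.
        IntStream.range(0, globalSize).parallel()
                 .forEach(i -> kernelBody(i, in, out, length, sign));
        return out;
    }

    public static void main(String[] args) {
        int n = 100; // 100 complex values, fewer than the 128 executions launched
        double[] in = new double[2 * n];
        for (int j = 0; j < n; j++) {
            in[2 * j] = Math.cos(2 * Math.PI * 5 * j / n); // 5 periods over the window
        }
        double[] out = dft(in, 1);
        // The energy of this cosine concentrates in bins 5 and 95.
        System.out.println("|X_5| = " + Math.hypot(out[10], out[11]));
        System.out.println("|X_3| = " + Math.hypot(out[6], out[7]));
    }
}
```

Since each execution only reads the shared input and writes its own output slot, the executions are independent of each other, which is exactly what makes the outer loop a good candidate for a kernel.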

The JavaCL host code

The previous OpenCL source code needs to be presented to the OpenCL API so that it can compile it and call it with the correct parameters.

In general, the job of an OpenCL host program (which can be written in C / C++, Java or Python) is to :

An important feature of OpenCL is its deferred execution model : most operations are asynchronous, yielding a completion event which can be waited for (in a blocking way) or given as a reference to other asynchronous tasks, so that they can wait for the initial task to complete before proceeding. As a result, almost every OpenCL operation must be bound to a command queue (as of this writing, most command queues execute commands sequentially or in order, but this will hopefully change over time !).
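
For Java programmers, this event chaining feels a lot like composing futures. The following sketch is an analogy only : it uses the JDK’s CompletableFuture rather than any OpenCL or JavaCL API, and the class `EventChaining` is made up for the illustration. Each future stands for the completion event of an asynchronous task, and the second task is handed the first one as a dependency.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

public class EventChaining {
    public static double[] run(double[] input) {
        // "Enqueue" an asynchronous copy; the future plays the role of its completion event.
        CompletableFuture<double[]> uploaded =
                CompletableFuture.supplyAsync(() -> input.clone());

        // This task depends on the previous one, so it only starts once the copy is done.
        CompletableFuture<double[]> computed = uploaded.thenApplyAsync(data -> {
            double[] out = new double[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = 2 * data[i];
            }
            return out;
        });

        // Blocking wait, comparable to waiting for the last event before reading the results.
        return computed.join();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new double[]{1, 2, 3}))); // prints [2.0, 4.0, 6.0]
    }
}
```

Nothing blocks until join() is called : the caller is free to schedule more work in the meantime, which is the main benefit of the deferred execution model.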

OpenCL can be summarized as a cross-platform build and run infrastructure that uses reflection-like APIs and is focused on asynchronous and parallel programs.

Now that you’ve got a general idea of what OpenCL is about, here’s the JavaCL host code that corresponds to the OpenCL source code seen previously :

package tutorial;

import com.nativelibs4java.opencl.*;
import com.nativelibs4java.opencl.CLPlatform.DeviceFeature;
import com.nativelibs4java.util.*;
import java.io.IOException;
import java.nio.DoubleBuffer;

public class DFT {

    final CLQueue queue;
    final CLContext context;
    final CLProgram program;
    final CLKernel kernel;

    public DFT(CLQueue queue) throws IOException, CLBuildException {
        this.queue = queue;
        this.context = queue.getContext();

        String source = IOUtils.readText(DFT.class.getResource("DiscreteFourierTransformProgram.cl"));
        program = context.createProgram(source);
        kernel = program.createKernel("dft");
    }

    /**
     * Method that takes complex values in input (sequence of pairs of real and imaginary values) and
     * returns the Discrete Fourier Transform of these values if forward == true or the inverse
     * transform if forward == false.
     */
    public synchronized DoubleBuffer dft(DoubleBuffer in, boolean forward) {
        assert in.capacity() % 2 == 0;
        int length = in.capacity() / 2;

        // Create an input CLBuffer that will be a copy of the NIO buffer :
        CLDoubleBuffer inBuf = context.createDoubleBuffer(CLMem.Usage.Input, in, true); // true = copy
        
        // Create an output CLBuffer :
        CLDoubleBuffer outBuf = context.createDoubleBuffer(CLMem.Usage.Output, length * 2);

        // Set the args of the kernel :
        kernel.setArgs(inBuf, outBuf, length, forward ? 1 : -1);
        
        // Ask for `length` parallel executions of the kernel in 1 dimension :
        CLEvent dftEvt = kernel.enqueueNDRange(queue, new int[]{ length });

        // Return an NIO buffer read from the output CLBuffer :
        return outBuf.read(queue, dftEvt);
    }

    /// Wrapper method that takes and returns double arrays
    public double[] dft(double[] complexValues, boolean forward) {
        DoubleBuffer outBuffer = dft(DoubleBuffer.wrap(complexValues), forward);
        double[] out = new double[complexValues.length];
        outBuffer.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException, CLBuildException {
        // Create a context with the best double numbers support possible :
        // (try using DeviceFeature.GPU, DeviceFeature.CPU...)
        CLContext context = JavaCL.createBestContext(DeviceFeature.DoubleSupport);
        
        // Create a command queue, if possible able to execute multiple jobs in parallel
        // (out-of-order queues will still respect the CLEvent chaining)
        CLQueue queue = context.createDefaultOutOfOrderQueueIfPossible();

        DFT dft = new DFT(queue);
        //DFT2 dft = new DFT2(queue);

        // Create some fake test data :
        double[] in = createTestDoubleData();

        // Transform the data (spatial -> frequency transform) :
        double[] transformed = dft.dft(in, true);
        
        for (int i = 0; i < transformed.length / 2; i++) {
            // Print the transformed complex values (real + i * imaginary)
            System.out.println(transformed[i * 2] + "\t + \ti * " + transformed[i * 2 + 1]);
        }
        
        // Reverse-transform the transformed data (frequency -> spatial transform) :
        double[] backTransformed = dft.dft(transformed, false);

        // Check the transform + inverse transform give the original data back :
        double precision = 1e-5;
        for (int i = 0; i < in.length; i++) {
            if (Math.abs(in[i] - backTransformed[i]) > precision)
                throw new RuntimeException("Different values in back-transformed array than in original array !");
        }
    }

    static double[] createTestDoubleData() {
        int n = 32;
        double[] in = new double[2 * n];

        for (int i = 0; i < n; i++) {
            in[i * 2] = 1 / (double) (i + 1);
            in[i * 2 + 1] = 0;
        }
        return in;
    }
}
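
When debugging a kernel, it can help to compare its output against a plain-Java reference running on the CPU. Here is a minimal sketch of one. It is not part of JavaCL: the class name ReferenceDFT is made up for this example, and it assumes the usual convention where the forward transform uses exp(-2iπjk/n) and the inverse transform divides by n (the kernel's exact convention might differ, so take this as an assumption rather than a description of the OpenCL code).

```java
/** Plain-Java reference DFT on interleaved (real, imaginary) pairs. Illustrative only, not part of JavaCL. */
final class ReferenceDFT {

    /**
     * Naive O(n^2) transform.
     * Assumption: the inverse transform (forward == false) divides by n,
     * so that a forward transform followed by an inverse one gives the input back.
     */
    static double[] dft(double[] in, boolean forward) {
        if (in.length % 2 != 0)
            throw new IllegalArgumentException("Expected interleaved (real, imaginary) pairs");
        int n = in.length / 2;
        double sign = forward ? 1 : -1; // same encoding as `forward ? 1 : -1` in the DFT class
        double[] out = new double[in.length];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int j = 0; j < n; j++) {
                double angle = -sign * 2 * Math.PI * k * j / n;
                double c = Math.cos(angle), s = Math.sin(angle);
                // Complex multiplication of in[j] by (c + i * s) :
                re += in[2 * j] * c - in[2 * j + 1] * s;
                im += in[2 * j] * s + in[2 * j + 1] * c;
            }
            if (!forward) {
                re /= n;
                im /= n;
            }
            out[2 * k] = re;
            out[2 * k + 1] = im;
        }
        return out;
    }

    /** Largest absolute difference between two arrays of the same length. */
    static double maxAbsDifference(double[] a, double[] b) {
        double max = 0;
        for (int i = 0; i < a.length; i++)
            max = Math.max(max, Math.abs(a[i] - b[i]));
        return max;
    }

    public static void main(String[] args) {
        // Same kind of test data as createTestDoubleData() :
        int n = 32;
        double[] in = new double[2 * n];
        for (int i = 0; i < n; i++)
            in[i * 2] = 1 / (double) (i + 1);

        double[] back = dft(dft(in, true), false);
        System.out.println("Max round-trip error : " + maxAbsDifference(in, back));
    }
}
```

Comparing the output of dft.dft(in, true) with ReferenceDFT.dft(in, true) using maxAbsDifference and a small tolerance is then a quick way to spot an indexing or sign mistake in a kernel.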

Compiling and running

Assuming the OpenCL and Java source files are in the same directory as javacl-1.0-beta-6-shaded.jar, you can build your code with the following command :

javac -cp javacl-1.0-beta-6-shaded.jar tutorial/DFT.java

And run it with the following one (replace the ‘;’ with a ‘:’ on Unix systems) :

java -cp javacl-1.0-beta-6-shaded.jar;. tutorial.DFT

Alternatively, you can use the following Maven pom.xml :

<project xmlns="http://maven.apache.org/POM/4.0.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
  <groupId>com.nativelibs4java</groupId>
  <artifactId>javacl-tutorial</artifactId>
  <name>JavaCL Tutorial</name>
  <url>http://code.google.com/p/javacl/</url>
  <version>1.0-beta-6</version>
  <packaging>jar</packaging>
    
  <properties>
   <scala.version>2.8.1</scala.version>
  </properties>

  <repositories>
    <repository>
      <id>nativelibs4java</id>
      <name>nativelibs4java Maven2 Repository</name>
      <url>http://nativelibs4java.sourceforge.net/maven</url>
    </repository>
  </repositories>
  
  <dependencies>
    <dependency>
      <groupId>com.nativelibs4java</groupId>
      <artifactId>javacl</artifactId>
      <version>1.0-beta-6</version>
      <scope>compile</scope>
    </dependency>
    <!--dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency-->
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>com.nativelibs4java</groupId>
        <artifactId>javacl-generator</artifactId>
        <version>1.0-beta-6</version>
        <!--configuration>
          <javaOutputDirectory>${project.build.directory}/../src/main/java</javaOutputDirectory>
        </configuration-->
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.1</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
      <!--plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>1.3.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>META-INF/maven/**</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>



Of course, it will be easier to just download the .zip of this tutorial and run Maven in its root directory (if it’s the first time you use Maven, go have a coffee while it downloads the myriad of dependencies it needs to build stuff).

mvn package
java -cp target/javacl-tutorial-1.0-beta-6-shaded.jar tutorial.DFT

Maven Magic : the JavaCL Generator

To any Java programmer, setting the OpenCL kernel arguments with untyped interfaces must seem a bit odd. If you forget an argument or give an argument of the wrong type, you’ll only discover it during your program’s execution, not at compile time.
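
To make the difference concrete, here is a small sketch in plain Java. UntypedKernel is a made-up stand-in for an untyped varargs interface (it is not JavaCL's CLKernel, and its checks are only an assumption about how such an interface might fail), and setDftArgs is a hypothetical typed wrapper around it :

```java
/** Illustration of runtime-checked vs compiler-checked arguments. None of these names exist in JavaCL. */
final class TypedArgsSketch {

    /** Hypothetical untyped kernel : argument count and types are only verified when setArgs runs. */
    static final class UntypedKernel {
        private final Class<?>[] expected;

        UntypedKernel(Class<?>... expected) {
            this.expected = expected;
        }

        void setArgs(Object... args) {
            if (args.length != expected.length)
                throw new IllegalArgumentException(
                    "Expected " + expected.length + " arguments, got " + args.length);
            for (int i = 0; i < args.length; i++)
                if (!expected[i].isInstance(args[i]))
                    throw new IllegalArgumentException(
                        "Argument " + i + " should be a " + expected[i].getSimpleName());
        }
    }

    /** Hypothetical typed wrapper : a missing or mistyped argument no longer compiles. */
    static void setDftArgs(UntypedKernel kernel, double[] in, double[] out, int length, int sign) {
        kernel.setArgs(in, out, length, sign);
    }

    public static void main(String[] args) {
        UntypedKernel kernel =
            new UntypedKernel(double[].class, double[].class, Integer.class, Integer.class);
        double[] in = new double[4], out = new double[4];

        try {
            // Forgot the `sign` argument : this compiles fine and only fails here, at runtime.
            kernel.setArgs(in, out, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("Runtime failure : " + e.getMessage());
        }

        // The same mistake with the typed wrapper would be rejected by javac :
        // setDftArgs(kernel, in, out, 2);
        setDftArgs(kernel, in, out, 2, 1);
    }
}
```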

To avoid such issues, JavaCL has a very nice toy : the JavaCL Generator.

The JavaCL Generator simply parses your OpenCL headers and outputs Java code with one typed method per kernel, with the correct Javadoc and argument names (seems familiar ? it’s just a derivative of JNAerator !)

Given the previous OpenCL kernel code, the JavaCL Generator creates the following Java code :

package tutorial;
import com.nativelibs4java.opencl.*;
import java.io.IOException;

/// Auto-generated wrapper around the OpenCL program DiscreteFourierTransformProgram.cl
public class DiscreteFourierTransformProgram extends CLAbstractUserProgram {
    public DiscreteFourierTransformProgram(CLContext context) throws IOException {
        super(context, readRawSourceForClass(DiscreteFourierTransformProgram.class));
    }
    public DiscreteFourierTransformProgram(CLProgram program) throws IOException {
        super(program, readRawSourceForClass(DiscreteFourierTransformProgram.class));
    }
    CLKernel dft_kernel;
    public synchronized CLEvent dft(CLQueue commandQueue, CLDoubleBuffer in, CLDoubleBuffer out, int length, int sign, int globalWorkSizes[], int localWorkSizes[], CLEvent... eventsToWaitFor) throws CLBuildException {
        if (dft_kernel == null)
            dft_kernel = createKernel("dft");
        dft_kernel.setArgs(in, out, length, sign);
        return dft_kernel.enqueueNDRange(commandQueue, globalWorkSizes, localWorkSizes, eventsToWaitFor);
    }
}

Which lets us rewrite the DFT class as follows, in a type-safe way (arguments count and types are checked by the compiler) :

package tutorial;

import com.nativelibs4java.opencl.*;
import java.io.IOException;
import java.nio.DoubleBuffer;

public class DFT2 {

    final CLQueue queue;
    final CLContext context;
    final DiscreteFourierTransformProgram program;

    public DFT2(CLQueue queue) throws IOException, CLBuildException {
        this.queue = queue;
        this.context = queue.getContext();
        this.program = new DiscreteFourierTransformProgram(context);
    }

    public synchronized DoubleBuffer dft(DoubleBuffer in, boolean forward) throws CLBuildException {
        assert in.capacity() % 2 == 0;
        int length = in.capacity() / 2;

        CLDoubleBuffer inBuf = context.createDoubleBuffer(CLMem.Usage.Input, in, true); // true = copy
        CLDoubleBuffer outBuf = context.createDoubleBuffer(CLMem.Usage.Output, length * 2);

        // The following call is type-safe, thanks to the JavaCL Maven generator :
        // (if the OpenCL function signature changes, the generated Java definition will be updated and compilation will fail)
        CLEvent dftEvt = program.dft(queue, inBuf, outBuf, length, forward ? 1 : -1, new int[]{length}, null);
        return outBuf.read(queue, dftEvt);
    }
    
    public double[] dft(double[] complexValues, boolean forward) throws CLBuildException {
        DoubleBuffer outBuffer = dft(DoubleBuffer.wrap(complexValues), forward);
        double[] out = new double[complexValues.length];
        outBuffer.get(out);
        return out;
    }
}

To make this work, you currently need to use Maven with the pom.xml file shown above. The JavaCL Generator will convert each .cl file in src/main/opencl into a wrapper class in target/generated-sources/java (and will do the same for test OpenCL files : src/test/opencl/…/*.cl -> target/generated-test-sources/java/…/*.java). The opencl directory will be added to the classpath so that resources are resolved and packaged by Maven correctly :-)
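
As a rough illustration of that directory convention, the following sketch computes where the wrapper for a given main .cl file would be expected to land. The helper wrapperPathFor is hypothetical (it is not something the generator exposes), and it assumes that the sub-directory (i.e. the package) of the .cl file is preserved under the output directory, as the src/test example above suggests :

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch of the .cl -> .java path mapping described above. Hypothetical helper, not part of JavaCL. */
final class GeneratedPathSketch {

    static final Path SOURCE_ROOT = Paths.get("src", "main", "opencl");
    static final Path OUTPUT_ROOT = Paths.get("target", "generated-sources", "java");

    /** Maps src/main/opencl/some/pkg/Name.cl to target/generated-sources/java/some/pkg/Name.java. */
    static Path wrapperPathFor(Path clFile) {
        if (!clFile.startsWith(SOURCE_ROOT))
            throw new IllegalArgumentException("Not under " + SOURCE_ROOT + " : " + clFile);

        Path relative = SOURCE_ROOT.relativize(clFile);
        String fileName = relative.getFileName().toString();
        if (!fileName.endsWith(".cl"))
            throw new IllegalArgumentException("Not an OpenCL source file : " + clFile);

        // Swap the extension, keep the package directories :
        String javaName = fileName.substring(0, fileName.length() - ".cl".length()) + ".java";
        return OUTPUT_ROOT.resolve(relative).resolveSibling(javaName);
    }

    public static void main(String[] args) {
        Path cl = Paths.get("src", "main", "opencl", "tutorial", "DiscreteFourierTransformProgram.cl");
        System.out.println(cl + " -> " + wrapperPathFor(cl));
    }
}
```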

Going further…

JavaCL has a few unique features that make it the best choice amongst OpenCL bindings :

  • automatic and transparent caching of program binaries (to skip compilation times !)
  • kernels can include files from the Java classpath (!)
  • Maven generator that provides safe Java interfaces for your OpenCL kernels (as demonstrated in this article)
  • goodies such as parallel random number generator, reduction utilities…
  • image kernel transform editor
  • Scala library with parallel collections and automatic translation from Scala to OpenCL (!)

Go try JavaCL’s demos, read its wiki and join its active user community.
