Scalaxy/MacroExtensions: a DSL / compiler plugin to write macro-based DSLs (enrichments without runtime dependency!)

Quick link: Scalaxy/MacroExtensions on GitHub

Scala’s enrich-my-library pattern allows Scala developers to add methods (and more) to existing classes.

Scala 2.10 facilitates this pattern by providing implicit classes:
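As a minimal sketch of the pattern (the `StringEnrichments` object and the `quoted` method are illustrative names, not part of Scalaxy or of the standard library), an implicit value class can add a method to `String` like this:

```scala
object StringEnrichments {
  // Implicit class: the compiler wraps a String in QuotedString whenever
  // `quoted` is called on it. Extending AnyVal makes it a value class,
  // which lets the compiler avoid allocating the wrapper in most cases.
  implicit class QuotedString(val self: String) extends AnyVal {
    def quoted: String = "\"" + self + "\""
  }
}

object EnrichmentDemo {
  import StringEnrichments._

  // Reads as if String had a `quoted` method of its own.
  val result: String = "hello".quoted
}
```

Note that client code still needs `StringEnrichments` on its classpath at runtime.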

This works great, but…

  • It usually implies some object creation at runtime (lightweight though it might be, especially if the extension is a value class),
  • Your extension library is a new runtime dependency (which is not such a bad thing, but it can be avoided, please read on).

The other day, Eric Christiansen rightly complained on NativeLibs4Java’s mailing list that Scalaxy/Compilets were quite constrained by the typer, and that he wished he could define extension methods more easily.

I gave it some serious thought, and came up with a compiler plugin that runs before the typer / namer and performs the following expansion:

Quite exciting syntax twisting, but overall not a huge line-saver.

Then I realized there’s another pattern that could benefit from such rewrites: macro enrichments.

A macro enrichment?

A macro enrichment just extends the “enrich-my-library” pattern by implementing the extension method using a macro. As a result, the enrichment is “inlined” at compilation time, with neither runtime dependency nor overhead. See my recent experiments for examples of such macro enrichments:

Scalaxy/MacroExtensions just uses the exact same syntax as above to create all the implicit class / macro wiring necessary to implement the extension as a macro:

Update (Feb 22nd, 2013): Updated the syntax to use @scalaxy.extend instead of @extend.

The plugin supports two kinds of body for extension methods: regular code, and macro.

  • If the body looks like a regular method implementation, as above, the plugin wraps it in a reify call and wraps all references to self and to the method parameters in splice calls.
  • If the body is a macro implementation, the plugin puts it verbatim in the resulting macro:

If you find this cool, just give it a try! (and follow me on Twitter if you want to hear about my next experiments :-)).

Scalaxy/Debug: macros that make Predef.assert, require and assume a joy to use!

Quick link: jump to Scalaxy/Debug straight away!

Throughout languages, assertions have proved to be a handy tool for programmers who want “light” error checks for very unlikely error cases (these checks are typically only performed in Debug builds so as not to slow down Release code).

In C / C++, assert macros usually print out the exact source code of the expression that failed the assertion check, which is often more than enough to understand the meaning of the failure.

In Java and Scala though, you have to provide your own message if you don’t want to be presented with an anonymous failure trace:

How often do you find yourself writing the following code?

Or, more recently (string interpolation FTW!):
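For illustration, hand-written assertion messages typically look like the sketch below. The `failureMessage` helper and the values `a` and `b` are hypothetical, introduced only so that the resulting messages can be inspected; only `Predef.assert` and string interpolation come from the standard library.

```scala
object ManualAssertMessages {
  // Runs `check` and returns the message of the AssertionError it throws, if any.
  def failureMessage(check: => Unit): Option[String] =
    try { check; None }
    catch { case e: AssertionError => Some(e.getMessage) }

  val a = 10
  val b = 12

  // Hand-written message: it repeats the condition as text.
  val plain: Option[String] =
    failureMessage(assert(a == b, "a != b"))

  // With string interpolation the message can also carry the values,
  // but it still has to be kept in sync with the condition by hand.
  val interpolated: Option[String] =
    failureMessage(assert(a == b, s"a ($a) != b ($b)"))
}
```

In both variants the message duplicates what the condition already says, which is exactly the kind of boilerplate a macro can generate from the expression itself.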

Well, thanks to Scalaxy/Debug macros, you can now forget about this kind of grungy code:

You just need to add the following to your build.sbt file to get going:

Oh, and this works with assert, assume and require.

Enjoy! (please file any bug on Scalaxy/Debug’s GitHub and follow me on Twitter if you find this useful :-))

Scalaxy/Loops: Optimized foreach loops for Scala 2.10.0 have landed (+ recap on ScalaCL)

In a hurry? Skip this lengthy post and go straight to Scalaxy/Loops!

Update: edited description of what “-inline” does in the last-but-one section.

Recap on an interesting performance issue

A couple of years back, I created a startup with a friend and got to write all our code in Scala, which included some Monte Carlo simulations for logistics planning optimization.

I was horrified to see that my seemingly barebones code didn’t run nearly as fast as I expected:

The reason why this code is much slower than its Java equivalent for loop is that it is desugared as…

… which creates a couple of objects at runtime and performs a truckload of method calls on them (each of these closures is an object, so the outer closure is created once and the inner closure is created once per iteration, and of course there’s a cost for calling each of them in each inner or outer loop iteration).
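To make the desugaring concrete, here is a small sketch with hypothetical method names (`sumWithFor`, `sumDesugared`) and an arbitrary loop body. The second method is what the compiler effectively sees for the first one:

```scala
object LoopDesugaring {
  // Nested loops over ranges, written the natural way.
  def sumWithFor(n: Int, m: Int): Long = {
    var total = 0L
    for (i <- 0 until n; j <- 0 until m) total += i * j
    total
  }

  // The same loops after desugaring: each `for` becomes a foreach call
  // taking a closure. The outer closure is created once, the inner one
  // is created again on every outer iteration.
  def sumDesugared(n: Int, m: Int): Long = {
    var total = 0L
    (0 until n).foreach { i =>
      (0 until m).foreach { j =>
        total += i * j
      }
    }
    total
  }
}
```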

When I discovered this, I found myself starting to rewrite my code using while loops, like this:
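As a sketch of that kind of manual rewrite (the `sumWithWhile` name and the loop body are illustrative, not the original simulation code), two nested range loops turn into this:

```scala
object WhileLoops {
  // Hand-written equivalent of two nested `for` loops over 0 until n and
  // 0 until m: no Range objects, no closures, just counters.
  def sumWithWhile(n: Int, m: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      var j = 0
      while (j < m) {
        total += i * j
        j += 1
      }
      i += 1
    }
    total
  }
}
```

The counters, bounds checks and increments are now spread over six extra lines, which is where the readability cost comes from.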

But then my code became harder to read (I actually had nested loops on arrays, which look even worse than ranges), and I wondered why I was doing this in Scala in the first place. Something had to be done.

Update: also see this great post on Loop Performance and Local Variables in Scala by Mark Lewis.

A first solution: compiler plugins (and ScalaCL was born!)

When I decided I couldn’t keep writing while loops by hand, I quite naturally turned to Scala compiler plugins.

Steep learning curve, quite a few rough edges, but I soon released an optimizing compiler plugin that covered all of my needs: Range, Array, List with foreach, map, reduce, fold, scan… you name it!

Despite my repeated red warnings that this was experimental-ware, people started using ScalaCL for production purposes, I went to present ScalaCL at Scalathon 2011 and Devoxx France 2012, and the sky was blue (oh, and it was also capable of running Scala on GPUs in a very experimental way, incidentally, which may have confused a few…).

In the last months where I had time to work on that project, I started optimizing chains of collection calls, ending up with quite a complex piece of engineering that I eventually didn’t stabilize (version 0.3 was almost ready… and now it’s quite stale :-( ).

The catch with ScalaCL was:

  • Compiler plugins are a pain to write and debug.
  • The internal APIs can be broken at every new release, making maintenance quite risky (and since I already had a few open-source projects to maintain, I had hardly enough hobby time left: JNAerator, BridJ and JavaCL).
  • Using the compiler plugin requires some special arguments for the compiler, and this scares some people off.

A new approach: macros

Now with Scala 2.10, we’ve got a shiny new API that exposes the internals of the compiler in a clean and (hopefully) future-proof way (although it’s still an experimental feature), which doesn’t require any setup of a compiler plugin.

Moreover, it brings an extremely powerful “reification” mechanism which allows for painless AST construction from macros.

Playing around with Scalaxy/Beans and Scalaxy/Fx last week just wasn’t enough: I had to restart my work on loops with macros:


This concludes a year of catching up with Scala 2.10 milestones, with various rewrites of ScalaCL and Scalaxy (+ an experiment on Compilets), and the scope is deliberately smaller than before: Scalaxy/Loops optimizes simple loops (and does it well), period.

How to make your loops faster in 30 seconds

If you’ve started using Scala 2.10 and use Sbt, you’re in luck: just edit your build.sbt file as follows:

Now you just need to import scalaxy.loops._ and append optimized to all the ranges whose foreach loops you want to optimize:

That’s it!

(so far it’s limited to Range.foreach, but and / .foreach might follow soon)

Cool, but (why) is it (still) faster?

Oh yeah, it’s faster!

I’m uncomfortable announcing speedups because it depends on your JVM, on your code, etc… But see for yourself (expect x3+ speedups for tight loops):

Scalaxy/Loops naive Benchmark

Now for the “why”, it’s a bit more complicated. My understanding is that the Scala team didn’t want to invest resources into such “specific” optimizations and instead chose to focus on more general ones, like inlining.

Looking at code compiled with “-optimise -Yinline -Yclosure-elim” in 2.10, you can see that Range’s foreach method is indeed inlined… But not its closure (the call of which is what takes all the time :-( ).

Update: there’s some amazing work on a new Scalac optimizer going on, hope it magically solves all our issues in a future version!

Anyway, hope it’s useful to some people, and stay tuned for a stable release soon.

Keep in touch

Whether you file bugs in Scalaxy’s tracker, join the NativeLibs4Java mailing-list or ping me on Twitter (@ochafik), any feedback or help will be greatly appreciated!

Type-safe dynamic beans factory with Scala 2.10 macros + Dynamic

With Scala 2.10.0 just out, we’ve now got a couple of shiny new toys:

  • Macros provide a clean (but experimental) way to plug into the Scala compiler, making AST rewrites very easy and mostly type-safe.
  • Dynamic is a type that lets you hijack method calls to predefined “fallback” methods (applyDynamic et al.).

The very interesting thing here is that despite its name, Dynamic only involves compile-time routing of unknown methods… and it is possible to implement applyDynamic using a macro!
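The compile-time routing can be seen with a plain (non-macro) Dynamic subclass. In this sketch the `Recorder` class and the `greet` call are hypothetical; only `scala.Dynamic` and `applyDynamic` come from the standard library:

```scala
import scala.language.dynamics

object DynamicRouting {
  // Any method that Recorder does not define is routed to applyDynamic,
  // with the method name passed as a string.
  class Recorder extends Dynamic {
    def applyDynamic(name: String)(args: Any*): String =
      name + args.mkString("(", ", ", ")")
  }

  val recorder = new Recorder

  // The compiler rewrites this call to recorder.applyDynamic("greet")("world", 42).
  val routed: String = recorder.greet("world", 42)

  // Writing the rewritten form by hand gives the same result.
  val explicit: String = recorder.applyDynamic("greet")("world", 42)
}
```

Since the rewrite happens in the compiler, nothing prevents `applyDynamic` from being a macro that inspects its arguments at compile time.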

This means that with Dynamic you can create DSLs whose code is loosely typed (in Scala type system terms), while with macros you can still add your own custom compile-time type checks, and even rewrite the code so that it no longer refers to Dynamic in the first place!

Some practice: a type-safe dynamic beans factory

Java Beans are a bit tedious to create from Scala:
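As a stand-in example of that setter-heavy style, the sketch below configures a `java.lang.Thread`. It is used here only because it ships with the JDK and exposes bean-style getters and setters; the `BeanSetup` object and the chosen values are illustrative:

```scala
object BeanSetup {
  // One statement to create the object, then one setter call per property.
  val worker = new Thread()
  worker.setName("worker-1")
  worker.setDaemon(true)
  worker.setPriority(Thread.MIN_PRIORITY)
}
```

Every property costs a full statement that repeats the variable name, and the object cannot be built and configured in a single expression.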

Well, with a Dynamic subclass that implements applyDynamicNamed using a macro, we can reach the following syntax:

And of course, this is all type-checked and rewritten to the exact same setter-intensive code as above, with the exact same runtime performance (nifty, isn’t it?).

Enjoy! (see newest sources and tests in Scalaxy’s repository)

JavaCL 1.0.0-RC3 released: massive performance improvements, bugfixes, OSGi

JavaCL v1.0.0 RC3 is available!

Download / Install | Browse JavaDoc | Getting Started (Tutorial) | Discuss

JavaCL is a BSD-licensed library that lets you use OpenCL from Java, in order to run massively-parallel computations on your graphic card (and/or CPU).

Release Notes

Here are the main changes between 1.0.0-RC2 and 1.0.0-RC3 (see full change log) :

  • Fixed byte order hack for ATI platforms
  • Fixed / optimized event callbacks (but broke API: CLEvent.EventCallback now only takes the completion status as argument, not the event anymore)
  • Fixed library probe
  • Fixed handling of image2d_t and image3d_t in Maven plugin (contrib. from Remi Emonet, request #308 and issue #307)
  • Fixed OpenGL interop on Windows (issue #312)
  • Fixed error about mismatching byte order for byte buffers, and replaced mentions of getKernelsDefaultByteOrder() with getByteOrder() (issue #336)
  • Fixed AMD App 2.7 Linux library loading code for
  • Fixed AMD download link in demos.
  • Added CLEvent.FIRE_AND_FORGET to avoid returning events from all the methods that accept a vararg eventsToWaitFor.
  • Added naive OSGi support to the main JAR.
  • Added list of devices in program errors.
  • Added CLBuffer.allocateCompatibleMemory(CLDevice)
  • Added client properties to CLContext (lazy + concurrent)
  • Optimized low-level bindings on OpenCL 1.1+ platforms, with dynamic runtime switch (removed synchronized keyword from all native calls), and made OpenCL 1.0 synchronization a warning.
  • Enhanced CLDevice.toString (include platform name)
  • Deprecated CLKernel.enqueueNDRange with int[] parameters
  • CLContext.createUserEvent() now returns a CLUserEvent

Getting started

You can read the Getting Started (Tutorial) page on the wiki to get started very quickly !

Please join the NativeLibs4Java Google Group to discuss JavaCL / ScalaCL, get the latest news and ask for support from the growing JavaCL community.

JNAerator 0.11 released: ultra-fast raw bindings for BridJ, tons of critical fixes

JNAerator (licensed under LGPL 3.0) lets Java programmers access native libraries transparently, using a runtime such as BridJ (C / C++, BSD-license), JNA (C only, LGPL) or Rococoa (Objective-C).

As often, this new release contains tons of critical fixes, so all JNAerator users are strongly encouraged to migrate to this new version.

Here’s a summary of the changes between version 0.10 and 0.11 (see full change log here) :

  • Fixed infinite loops in simple typedefs (issue #288)
  • Fixed some -beautifyNames cases (issue #315)
  • Fixed parsing of some C++ templates (including template constructors)
  • Fixed “long long” regression
  • Fixed JNAeratorMojo.config documentation (issue #330)
  • Fixed long / long long / short pointer function return types
  • Fixed generation of BridJ C++ constructors
  • Fixed enum names that collide with Java identifiers (issue #334)
  • Added a type definition override switch, useful to force mismatching 32/64-bit types to some predefined types (for instance, -TmyVal=intptr_t)
  • Added raw bindings generation for BridJ
  • Added parsing of ‘using’ C++ statements
  • Added TypeRef.resolvedJavaIdentifier
  • Added parser support for `complex double` (cf. complex.h)
  • Added test for BridJ raw signatures
  • Moved to ECJ 3.7.2
  • Moved to JNA 3.4.0
  • Refactored type resolution and conversion
  • Rationalized CompilerUtils classpath + bootclasspath

Special thanks to the users and bug reporters who helped get this version out!

You can contribute to the project by reporting bugs here and joining the NativeLibs4Java Community.

Wait no longer : try JNAerator through Java Web Start, or download it now !

BridJ 0.6.2 released: many fixes, OSGi, dependent libraries…

BridJ (BSD-licensed) is an innovative native bindings library that lets Java programmers use native libraries (written in C, C++, ObjectiveC and more) in a very natural way (inspired by the great JNA, with better performance, C++ and generics added).

Here’s a summary of the changes between version 0.6.1 and this bugfix-release version 0.6.2 (see full change log here) :

  • Fixed serious crashes on Win64 in assembler optimizations
  • Fixed BridJ.protectFromGC !
  • Fixed raw assembler optimization for floats & doubles, finally! (+ updated Win binaries)
  • Fixed handling of classloaders in some use-cases (issue #283)
  • Fixed (issue #306)
  • Fixed Pointer.copyTo(dest, elementCount) (issue #317)
  • Fixed alignment of struct array fields (issue #319)
  • Fixed alignment of double fields on Linux 32 bits (issue #320)
  • Fixed dlopen log for non existing absolute library paths
  • Added experimental Linux/arm support (issue #327)
  • Added @Library.dependencies + test library
  • Added BridJ.getNativeLibraryName back
  • Added ComplexDouble struct for C99’s `_Complex double` type.
  • Added GCC shortcut case for demangling of C++ constructors
  • Added Pointer.getIntAtIndex(long) / .setIntAtIndex(long, int) (and with all primitive variants)
  • Added quiet mode (BRIDJ_QUIET=1 / bridj.quiet=true) (issue #328)
  • Added parsing of Mach-O compressed symbols tries (LC_DYLD_INFO command) to dyncall (issue #311)
  • Added assembler optimizations for functions with up to 16 arguments on Win64 !
  • Added Pointer.withoutValidityInformation() (yields faster, unsafe pointer)
  • Added BridJ.subclassWithSynchronizedNativeMethods(Class) to create a subclass where all native methods are overridden
  • Added naive OSGi support to the main JAR.
  • Rationalized Java logs (issue #328)
  • Changed library extraction mechanism to allow extraction of dependencies (see @Library.dependencies); removed DeleteOldBinaries option
  • Added special aliases for “c” and “m” libraries on Windows (-> msvcrt)
  • Sped up assembler optimization on Win64 (movsb -> movsq)
  • Removed ios-package (binaries for iOS/arm)

Special thanks to the users, contributors and bug reporters who helped get this version out!

You can contribute to the project by reporting bugs here and joining the NativeLibs4Java Community.

Wait no longer : download and use BridJ now !

JavaCL 1.0.0-RC2 released : bugfixes, Maven Central repository

JavaCL v1.0.0 RC2 is available !

Download / Install | Browse JavaDoc | Getting Started (Tutorial) | Discuss

Launch the new JavaCL Interactive Image Transform Editor.

Release Notes

Here are the main changes between 1.0.0-RC1 and 1.0.0-RC2 (see full change log) :

  • Release artifacts are available in Maven Central
  • Added support for sub-images reading/writing from/to CLImage2D (slower than with full images)
  • Fixed endianness issues with CLBuffer (issue #80)
  • Fixed migration of cached binaries to newer versions of OS (e.g. upgrading from Snow Leopard to Lion) (issue #81)
  • Fixed handling compiler options containing spaces (issue #274)
  • Fixed tutorial artifact pom repositories (issue #279)
  • Fixed support of Intel’s OpenCL 1.5 Windows runtime (issue #297)
  • Fixed many Javadoc typos
  • Enhanced LocalSize API (added static factory methods for all primitive types)
  • Deprecated CLContext.getKernelsDefaultByteOrder() and CLDevice.getKernelsDefaultByteOrder()
  • Added more informative exceptions when passing null pointers to CLBuffer.writeBytes (issue #257)
  • Updated to OpenCL 1.2 headers
  • Added -cl-nv-verbose, -cl-nv-maxrregcount, -cl-nv-opt-level + proper log even without error when nv-verbose is set
  • Enhanced handling of endianness : warn when creating contexts with devices that have mismatching endianness, throw when creating buffer out of Buffer / Pointer with bad endianness
  • Changed signature of CLPlatform.listDevices (now takes a single CLDevice.Type, including All, instead of an EnumSet thereof)

Getting started

You can read the Getting Started (Tutorial) page on the wiki to get started very quickly !

Please join the NativeLibs4Java Google Group to discuss JavaCL / ScalaCL, get the latest news and ask for support from the growing JavaCL community.

JNAerator 0.10 released : bugfixes, added Maven project output modes

JNAerator (licensed under LGPL 3.0) lets Java programmers access native libraries transparently, using a runtime such as BridJ (C / C++, BSD-license), JNA (C only, LGPL) or Rococoa (Objective-C).

This new release contains tons of critical fixes, so all JNAerator users are strongly encouraged to migrate to this new version.

Here’s a summary of the changes between version 0.9.7 and 0.10 (see full change log here) :

  • Fixed generation of large long values
  • Fixed conditional parsing of __in modifier (and COM modifiers in general)
  • Fixed generation of globals and variables included more than once
  • Fixed parsing of unary ‘-’ operator
  • Fixed parsing of C++ constructors and class inheritance
  • Fixed parsing of default values for type name template arguments
  • Fixed parsing of const type mutator (fixes `void f(struct x * const);`) (issue #205)
  • Fixed parsing of null char escape ‘\0’ (issue #214)
  • Fixed conversion of `int a; f(&a);`
  • Fixed handling of “long int” and “short int” (issue #267)
  • Fixed parsing of __declspec, __attribute__ and some modifiers-related regressions
  • Fixed conversion of __inline functions when -convertBodies is on
  • Fixed NPE in JNAeratorUtils.findBestPlainStorageName (issue #258)
  • Fixed parsing of empty strings (spotted by @ENargit in issue #255)
  • Fixed generation of typedefs (issue #273)
  • Fixed generation of casted constants (issue #96)
  • Fixed generation of unnamed structs and unions (issue #94)
  • Fixed multidimensional array sizes for JNA target (issue #165)
  • Fixed handling of hexadecimal constants (issue #296)
  • Fixed conversion of comments for BridJ target runtime
  • Fixed generation of BridJ calling conventions (issue #282)
  • Fixed handling of __stdcall function pointers and functions (issue #282)
  • Fixed mapping of bool for JNA(erator) target runtime (issue #289)
  • Fixed parsing of malloc, free and many potential modifiers (issue #278 and issue #280)
  • Fixed handling of unicode library paths (issue #276)
  • Fixed parsing of friend members in C++ classes, and of assignment operators (operator+=, …)
  • Fixed generation of very simple edge cases “long f();”, “short f();”, “f();” (issue #270)
  • Changed naming of anonymous function pointer types : `void f(void (*arg)());` now yields callback `f_arg_callback`
  • Enhanced handling of parsing failures : faster failover to “sliced” parsing, reduced verbosity of errors
  • Added Maven output modes and -mode switch to choose between Jar, StandaloneJar, Directory, Maven, AutoGeneratedMaven (deprecated -noJar and -noComp)
  • Added support for MSVC asm soups + added -removeInlineAsm hack switch that tries to regex-remove __asm soups that still cannot be parsed out
  • Added support for BridJ’s bundled libraries mechanism and paths (+ enforce them using an enum param)
  • Added parsing of expression initializer blocks `v = { 1, 2, 3};`
  • Added preservation of original textual representation of constants
  • Added tr1 to default C++ includes
  • Generate symbols of all files in directories where any file was explicitly listed
  • Added support for command-line definition of macros with arguments : -Df(x)=whatever
  • Added conversion of `malloc(x * y * sizeof(whatever))`
  • Removed C++ name mangling feature for JNA target runtime (was simplistic anyway)
  • Release artifacts are available in Maven Central

Special thanks to the users and bug reporters who helped get this version out!

You can contribute to the project by reporting bugs here and joining the NativeLibs4Java Community.

Wait no longer : try JNAerator through Java Web Start, or download it now !