Saturday, May 17, 2014

Boodew: a very small functional language in 250 lines of c++

I just published a very small language called "boodew" that you can find here:

The basic idea of it and its implementation is to have something as powerful and small as possible. As said in the readme, this is a variant of the scripting language used in sauerbraten ( and tesseract (

The basic idea is to rely only on string evaluation and string substitution prior to its evaluation. This leads to very good and simple patterns very similar to what Lisp can offer with lists.

Simple imperative example

Let's look at a very simple imperative example:

Here we simply write a loop that iterates 16 times and print something for the first 8 iterations. This line is one expression.
  • The first argument always defines the function or the builtin to call here "loop".
  • The next argument is the loop variable and the next one is the maximum loop counter. This loop goes from 0 to 15.
The last element enclosed in brackets in the expression to execute. Brackets mean that the expression evaluation is not done when parsing the loop expression but when loop builtin will decide to execute it. Let's break it into pieces:
  • "?" defines a condition statement made of the condition which is executed before "?" is run, the "then" expression and an optional "else" expression
  • "(< ($ i) 8)" is comparison. Here we use "$" that looks up a variable and replaces it by its value
  • The two last statements are the "then" expression and the "else" one. Note that loops support classical break and continue

Function definition and function call

Consider this example:

As you see, you define a function and use it. The function by itself is no more than a string i.e. a "[...]" expression. Calling it consists in looking up the function variable i.e. its string and to evaluate it. This example also shows a simple usage of dynamic scoping where the function can access the local variables of its caller(s). 

Function arguments are easy since they are just regular variables created on the fly by the interpreter. Consider this example:
We define a function called "fn" that implicitly has one argument. This argument can fetched by its name "0" using the regular "$" builtin. Extra arguments will be called "1", "2"...
This example will therefore print "boum"

String substitution

Look at the other example:

Here we use operator "@". This operator forces the evaluation inside a nested "[]" expression. Therefore "int ($ i)" will be evaluated just before echo is executed. Since i equals 4, echo will output "hop4"

Note that "@" also works as an escape character and can therefore be used to delay the string expansion. Consider this example:
It will print "[hop@($ i)]" since the first "@" will prevent the evaluation of the second "@". After the expansion, the "@" expression can then be evaluated as above.

Higher order programming

Function declaration and string substitution are enough to define a real functional language.

For example, consider this small code:

It defines a function currying function i.e. a function that partially applies arguments to another function, similar to what ML languages do or what you can do with std::bind in c++.
Let's examine it:

  • first, we define our higher-order bind function. As the rest in boodew, this is just a string. The basic idea of it is given a function (its first argument) and the argument to apply (its second argument), we build a new string that will define a function where the first argument is applied. To do so, we use operator "@". Note that we use the "double" version of it here, since we need to delay its evaluation once.
  • secondly, we use operator "^"  which is the identity function that simply returns its argument. That makes sense since the argument is a string i.e. this is the function we want to create. 
  • Both "^" and "@" can define a function of functions since "^" will return a string modified by the later application of "@"
  • We then apply this operator on builtin "+" applying the argument "2" and we create a new function (i.e. still a string) out of it
  • We finally call this function. Nothing special here. This is a regular function
Note that we could be even more brutal with this concept by making partial evaluation lazy. Using more operator "@", we can make the application both partial and lazy by letting the final evaluation being done at call time. Anything is possible since we just assemble strings.

What next

Boodew is a very simple but pretty powerful language at least compared to its size (250 line of c++ code). Right now, the number of builtins is limited but it is really easy to add more. Also, there is no real boodew context: everything is global. The first implementation aimed at being really small and straight to the point. Interesting addition (apart from more builtins and a stronger implementation) would be the addition of user data similarly to what other scripting languages do. This could offer more powerful constructs in particular if they embed a "to_string" method :-)

Monday, April 29, 2013

Some ray tracing performance on Javascript engines

I am playing with ideas to represent world geometries. Playing with deformable voxels as "surface nets", I finally implemented a deformable recursive grid to store the voxels.

I guess the idea is really close to brick maps as for example, exposed by Cyril Crassin in his nice Ph.D. here.

The idea is super simple: this is just a grid of grid of .... of voxels.

Anyway, the code is here for those who want to look at it:

(look for brick something)

One of the interestings parts is obviously to raytrace this thing. Ray tracing a grid is super simple as exposed by Amanatides here. Extending it to recursive grid is as simple as some c++ template horror and tracking both tmin and tmax while recursing (see "intersect" function).

This led to benchmarking the javascript engines. Here is some very simple level:

with the depth map as it is raytraced:

Native C has been compiled with gcc 4.7 and the JS versions with emscripten.

Here are the performance:

Native C: 919,000 ray/s
Firefox nightly: 148,000 ray/s
Firefox nightly + asm.js: 508,000 ray/s
Chromium 25: 111,000 ray/s

asm.js proves to be super efficient and as advertised, I got 50% of native performance.

Obviously, native code is not using any SIMDified code so, in practice, the gap is still much larger. Also, for JS versions, multi-threaded code will only work with webworkers which will only support message passing. This may bring some extra hit compared to a regular shared memory implementation.

Anyway, this is still really encouraging!

Tuesday, April 16, 2013

Beignet (OpenCL for IVB on Linux) got its first release!


The OpenCL code base I initiated last year is now officially supported by Intel. The guys there added plenty of stuffs. It is pretty cool.

The official announce is here:

There is a short news on Phoronix:

Even something on Slashdot!

I am pretty eager to see further developments. OpenCL, though far from perfect, is a nice API easy to use and that can give decent performance.

This is anyway pretty cool since most of the initial code I did was a lonely project and I had never been sure it will be officially supported at some point.

Sunday, March 31, 2013

Cube ported to JS

Hello all,

following the hype, I spent some time porting Cube (1), the minimalistic FPS, to javascript with Emscripten.

Here is a screenshot:

Emscripten is amazing. The guys did a really good job to wraps useful libraries like SDL, SDL_mixer and so on.

On my side, most of the effort was therefore basically on adapting the old OGL 1.x code to a more recent webGL like version of it.

Emscripten developers put a large effort on making OGL 1.x directly running in the browser but I wanted to have a cleaner and faster implementation only using webgl features

Cube is an interesting code base compiler wise. It is indeed pretty CPU bound since a large time is spent to build the vertex arrays for every frame instead of caching them into VBOs as every game does today.

On my machine (Intel NUC with i3 IVB), the code is mostly 4 times slower than the native version. I tried asm.js with the latest nightly build but with no performance difference. Not sure why.

Code is slow on Chrome.

Code is here:

You  need to download cube data yourself here:


Wednesday, January 16, 2013

Bits of OpenCL on Linux for IvyBridge are published

Some OpenCL code I was working on at Intel has been recently publically pushed and is now available for download.

A phoronix news about this is here:

The code is here:

The code basically contains both the run-time code which is basically the OpenCL host code (clKernel*, clProgram*) and a compiler back-end which is responsible to take bits of LLVM IR code and to output IvyBridge ISA from it.

Small problem is that I am not working for Intel anymore so, I will not work that much on it.

The code is decent even if some parts are a bit over-the-top, c++ style wise. However, the most important parts are already here (instruction scheduling, instruction selection infrastructure, register allocation) even if a serious amount of work is still to be done.

Since I am not working for Intel anymore, I cannot speak for Keith Packard and the others but I guess any patch will be welcomed. For those interested in low-level hacking, do not forget that the complete IvuBridge documentation is still here:

Thursday, December 27, 2012

Playing with oprofile on Linux

I just spent some time using oprofile on Linux. oprofile allows basically to profile everything running on your system with a rather low overhead.
Lots of details here:

A quick overview:

1. make oprofile use your kernel (root). Ignore it if you do not care about kernel symbols
$ opcontrol --vmlinux=/usr/src/linux-3.2.13-1-ARCH/vmlinux

2. make oprofile measure time spent in libraries (root)
$ opcontrol --separate=lib

3. start oprofile (root) 
$ opcontrol --start

4. measure time  spent in functions for "cube_client" :-)
$ opreport --demangle=smart  --symbols ~/src/cube/src/cube_client

5. You get this:
CPU: AMD64 family12h, speed 1497.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
68078004 72.8798             /usr/lib/dri/
3984600   4.2657  cube_client              world::render_seg_new(float, float, float, int, int, int, int, int)
3060858   3.2768  cube_client              world::isoccluded(float, float, float, float, float)
2838442   3.0386  cube_client              rdr::render_flat(int, int, int, int, int, sqr*, sqr*, sqr*, sqr*, bool)
2696379   2.8866             __mcount_internal
1777893   1.9033  cube_client              world::render_wall(sqr*, sqr*, int, int, int, int, int, sqr*, sqr*, bool)
1664943   1.7824             mcount
1450401   1.5527             /lib/
794027    0.8500             _wordcopy_fwd_aligned
787522    0.8431  cube_client              world::computeraytable(float, float)
687461    0.7360  cube_client              rdr::render_square(int, float, float, float, float, int, int, int, int, int, sqr*, sqr*, bool)
669011    0.7162  cube_client              rdr::ogl::lookuptex(int, int&, int&)
640268    0.6854       /usr/lib/fglrx/
603660    0.6462  cube_client              rdr::render_flatdelta(int, int, int, int, float, float, float, float, sqr*, sqr*, sqr*, sqr*, bool)
486056    0.5203  cube_client              rdr::ogl::drawframe(int, int, float)
441795    0.4730  cube_client              rdr::ogl::addstrip(int, int, int)
164852    0.1765             __memmove_sse2
160559    0.1719  cube_client              _ZN7physics7collideEP6dynentbff.constprop.6


You will find lot of information on the net like how to capture other perf counters. Look at:
$ opcontrol --list-events

Thursday, August 2, 2012

IvyBridge GPU documentation and code on the web

Hello all,

Just to remind that IVB spec is online. I mean:
  • The complete state setting is documented
  • The complete ISA for the "shader cores" (we call them Execution Units or EUs) is also here
  • The documentation for the interesting shared functions (sampler, loads/stores) is also here
The documentation is here:

It may be a bit rough to start with but fortunately, we also have a complete MIT licensed open source OpenGL stack called "Mesa". It is here:

Mesa is a big piece of code that supports many targets but you may see the Intel GPU specific part here: