Monday, April 29, 2013

Some ray tracing performance on Javascript engines

I am playing with ideas to represent world geometries. Playing with deformable voxels as "surface nets", I finally implemented a deformable recursive grid to store the voxels.

I guess the idea is really close to brick maps as for example, exposed by Cyril Crassin in his nice Ph.D. here.

The idea is super simple: this is just a grid of grid of .... of voxels.

Anyway, the code is here for those who want to look at it:

(look for brick something)

One of the interestings parts is obviously to raytrace this thing. Ray tracing a grid is super simple as exposed by Amanatides here. Extending it to recursive grid is as simple as some c++ template horror and tracking both tmin and tmax while recursing (see "intersect" function).

This led to benchmarking the javascript engines. Here is some very simple level:

with the depth map as it is raytraced:

Native C has been compiled with gcc 4.7 and the JS versions with emscripten.

Here are the performance:

Native C: 919,000 ray/s
Firefox nightly: 148,000 ray/s
Firefox nightly + asm.js: 508,000 ray/s
Chromium 25: 111,000 ray/s

asm.js proves to be super efficient and as advertised, I got 50% of native performance.

Obviously, native code is not using any SIMDified code so, in practice, the gap is still much larger. Also, for JS versions, multi-threaded code will only work with webworkers which will only support message passing. This may bring some extra hit compared to a regular shared memory implementation.

Anyway, this is still really encouraging!

Tuesday, April 16, 2013

Beignet (OpenCL for IVB on Linux) got its first release!


The OpenCL code base I initiated last year is now officially supported by Intel. The guys there added plenty of stuffs. It is pretty cool.

The official announce is here:

There is a short news on Phoronix:

Even something on Slashdot!

I am pretty eager to see further developments. OpenCL, though far from perfect, is a nice API easy to use and that can give decent performance.

This is anyway pretty cool since most of the initial code I did was a lonely project and I had never been sure it will be officially supported at some point.

Sunday, March 31, 2013

Cube ported to JS

Hello all,

following the hype, I spent some time porting Cube (1), the minimalistic FPS, to javascript with Emscripten.

Here is a screenshot:

Emscripten is amazing. The guys did a really good job to wraps useful libraries like SDL, SDL_mixer and so on.

On my side, most of the effort was therefore basically on adapting the old OGL 1.x code to a more recent webGL like version of it.

Emscripten developers put a large effort on making OGL 1.x directly running in the browser but I wanted to have a cleaner and faster implementation only using webgl features

Cube is an interesting code base compiler wise. It is indeed pretty CPU bound since a large time is spent to build the vertex arrays for every frame instead of caching them into VBOs as every game does today.

On my machine (Intel NUC with i3 IVB), the code is mostly 4 times slower than the native version. I tried asm.js with the latest nightly build but with no performance difference. Not sure why.

Code is slow on Chrome.

Code is here:

You  need to download cube data yourself here:


Wednesday, January 16, 2013

Bits of OpenCL on Linux for IvyBridge are published

Some OpenCL code I was working on at Intel has been recently publically pushed and is now available for download.

A phoronix news about this is here:

The code is here:

The code basically contains both the run-time code which is basically the OpenCL host code (clKernel*, clProgram*) and a compiler back-end which is responsible to take bits of LLVM IR code and to output IvyBridge ISA from it.

Small problem is that I am not working for Intel anymore so, I will not work that much on it.

The code is decent even if some parts are a bit over-the-top, c++ style wise. However, the most important parts are already here (instruction scheduling, instruction selection infrastructure, register allocation) even if a serious amount of work is still to be done.

Since I am not working for Intel anymore, I cannot speak for Keith Packard and the others but I guess any patch will be welcomed. For those interested in low-level hacking, do not forget that the complete IvuBridge documentation is still here:

Thursday, December 27, 2012

Playing with oprofile on Linux

I just spent some time using oprofile on Linux. oprofile allows basically to profile everything running on your system with a rather low overhead.
Lots of details here:

A quick overview:

1. make oprofile use your kernel (root). Ignore it if you do not care about kernel symbols
$ opcontrol --vmlinux=/usr/src/linux-3.2.13-1-ARCH/vmlinux

2. make oprofile measure time spent in libraries (root)
$ opcontrol --separate=lib

3. start oprofile (root) 
$ opcontrol --start

4. measure time  spent in functions for "cube_client" :-)
$ opreport --demangle=smart  --symbols ~/src/cube/src/cube_client

5. You get this:
CPU: AMD64 family12h, speed 1497.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
68078004 72.8798             /usr/lib/dri/
3984600   4.2657  cube_client              world::render_seg_new(float, float, float, int, int, int, int, int)
3060858   3.2768  cube_client              world::isoccluded(float, float, float, float, float)
2838442   3.0386  cube_client              rdr::render_flat(int, int, int, int, int, sqr*, sqr*, sqr*, sqr*, bool)
2696379   2.8866             __mcount_internal
1777893   1.9033  cube_client              world::render_wall(sqr*, sqr*, int, int, int, int, int, sqr*, sqr*, bool)
1664943   1.7824             mcount
1450401   1.5527             /lib/
794027    0.8500             _wordcopy_fwd_aligned
787522    0.8431  cube_client              world::computeraytable(float, float)
687461    0.7360  cube_client              rdr::render_square(int, float, float, float, float, int, int, int, int, int, sqr*, sqr*, bool)
669011    0.7162  cube_client              rdr::ogl::lookuptex(int, int&, int&)
640268    0.6854       /usr/lib/fglrx/
603660    0.6462  cube_client              rdr::render_flatdelta(int, int, int, int, float, float, float, float, sqr*, sqr*, sqr*, sqr*, bool)
486056    0.5203  cube_client              rdr::ogl::drawframe(int, int, float)
441795    0.4730  cube_client              rdr::ogl::addstrip(int, int, int)
164852    0.1765             __memmove_sse2
160559    0.1719  cube_client              _ZN7physics7collideEP6dynentbff.constprop.6


You will find lot of information on the net like how to capture other perf counters. Look at:
$ opcontrol --list-events

Thursday, August 2, 2012

IvyBridge GPU documentation and code on the web

Hello all,

Just to remind that IVB spec is online. I mean:
  • The complete state setting is documented
  • The complete ISA for the "shader cores" (we call them Execution Units or EUs) is also here
  • The documentation for the interesting shared functions (sampler, loads/stores) is also here
The documentation is here:

It may be a bit rough to start with but fortunately, we also have a complete MIT licensed open source OpenGL stack called "Mesa". It is here:

Mesa is a big piece of code that supports many targets but you may see the Intel GPU specific part here:

Friday, July 20, 2012

Various code bases and cube (the game) pushed on github

I decided to follow the hype and I pushed everything on github:

Note that I also cleaned up cube (the first cube game, the one before Sauerbraten) to make it compile with no complaint on gcc 4.6 and VS2010.

Did I already say that cube is amazing? The complete engine (cube itself + its network layer aka the 2005 version of enet) takes 10,000 LoC.

I may write some post reviewing the code. However, just to have fun, really look at command.cpp which basically implements an insanely powerful mini-scripting language in 300 LoC. Really cool.

Obviously, for more features, Sauerbraten and its next incarnation Tesseract are also really impressive.

Cube however remains unique by its size.