I am playing with ideas to represent world geometries. While experimenting with deformable voxels rendered as "surface nets", I finally implemented a deformable recursive grid to store the voxels.
I guess the idea is really close to brick maps as presented, for example, by Cyril Crassin in his nice Ph.D. here.
The idea is super simple: this is just a grid of grids of ... of voxels.
Anyway, the code is here for those who want to look at it:
https://github.com/bsegovia/cube-template/blob/master/src/ogl.cpp
(look for brick something)
One of the interesting parts is obviously to raytrace this thing. Ray tracing a grid is super simple, as described by Amanatides here. Extending it to a recursive grid is as simple as some C++ template horror and tracking both tmin and tmax while recursing (see the "intersect" function).
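To make the traversal concrete, here is a minimal C++ sketch of a single-level grid walk in the spirit of Amanatides' algorithm. It is not the "intersect" function from the repository: the names traverse and CellHit, and the assumption of unit cells covering [0,n]^3, are made up for this illustration. Each visited cell comes back with its own [tEnter, tExit] interval, which is what a recursive version would hand down as tmin and tmax to the child grid stored in that cell.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// One visited cell plus the ray interval [tEnter, tExit] spent inside it.
struct CellHit { std::array<int,3> cell; float tEnter, tExit; };

// Walk a ray through an n*n*n grid of unit cells covering [0,n]^3,
// restricted to the ray interval [tmin, tmax]. Hypothetical helper.
std::vector<CellHit> traverse(const std::array<float,3>& org,
                              const std::array<float,3>& dir,
                              int n, float tmin, float tmax) {
  assert(n > 0);
  const float inf = std::numeric_limits<float>::infinity();
  // Clip [tmin, tmax] against the grid bounding box (slab test).
  for (int a = 0; a < 3; ++a) {
    if (dir[a] == 0.f) {
      if (org[a] < 0.f || org[a] > float(n)) return {};
      continue;
    }
    float t0 = (0.f - org[a]) / dir[a];
    float t1 = (float(n) - org[a]) / dir[a];
    if (t0 > t1) std::swap(t0, t1);
    tmin = std::max(tmin, t0);
    tmax = std::min(tmax, t1);
  }
  if (tmin >= tmax) return {};

  // Per axis: current cell, step direction, t of the next cell boundary,
  // and the t increment needed to cross one whole cell.
  std::array<int,3> cell, step;
  std::array<float,3> tNext, tDelta;
  for (int a = 0; a < 3; ++a) {
    float p = org[a] + tmin * dir[a];
    cell[a] = std::min(std::max(int(std::floor(p)), 0), n - 1);
    if (dir[a] > 0.f) {
      step[a] = 1;  tDelta[a] = 1.f / dir[a];
      tNext[a] = (float(cell[a] + 1) - org[a]) / dir[a];
    } else if (dir[a] < 0.f) {
      step[a] = -1; tDelta[a] = -1.f / dir[a];
      tNext[a] = (float(cell[a]) - org[a]) / dir[a];
    } else {
      step[a] = 0;  tDelta[a] = inf; tNext[a] = inf;
    }
  }

  std::vector<CellHit> out;
  float t = tmin;
  while (t < tmax) {
    // The axis whose boundary is hit first decides where we leave the cell.
    int a = 0;
    if (tNext[1] < tNext[a]) a = 1;
    if (tNext[2] < tNext[a]) a = 2;
    float tExit = std::min(tNext[a], tmax);
    out.push_back({cell, t, tExit});  // a recursive grid would descend here
    t = tExit;
    cell[a] += step[a];
    if (cell[a] < 0 || cell[a] >= n) break;
    tNext[a] += tDelta[a];
  }
  return out;
}
```

In a recursive grid, the push_back line would instead call the same routine on the child grid with the ray transformed into the child's local coordinates, passing tEnter and tExit as the new tmin and tmax.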
This led to benchmarking the JavaScript engines. Here is a very simple level:
with the depth map as it is raytraced:
The native C version was compiled with gcc 4.7 and the JS versions with Emscripten.
Here are the performance numbers:
Native C: 919,000 ray/s
Firefox nightly: 148,000 ray/s
Firefox nightly + asm.js: 508,000 ray/s
Chromium 25: 111,000 ray/s
asm.js proves to be super efficient and, as advertised, I got roughly half of native performance (508,000 vs. 919,000 ray/s, about 55%).
Obviously, the native code does not use any SIMD, so, in practice, the gap to well-optimized native code would be much larger. Also, for the JS versions, multi-threaded code will only work with Web Workers, which only support message passing. This may cost some extra performance compared to a regular shared-memory implementation.
Anyway, this is still really encouraging!
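As a quick check of the ratios behind the numbers in this post, here is a small C++ sketch. The constants are the measurements listed above; the helper name relativeSpeed is made up for this illustration.

```cpp
#include <cassert>

// Rays per second reported in the post.
constexpr double kNativeC      = 919000.0;
constexpr double kFirefox      = 148000.0;
constexpr double kFirefoxAsmJs = 508000.0;
constexpr double kChromium25   = 111000.0;

// Throughput of `measured` relative to `baseline` (1.0 means equal speed).
double relativeSpeed(double measured, double baseline) {
  assert(baseline > 0.0);
  return measured / baseline;
}
```

With these numbers, relativeSpeed(kFirefoxAsmJs, kNativeC) is about 0.55 and relativeSpeed(kFirefoxAsmJs, kFirefox) is about 3.4, while plain Firefox and Chromium 25 land at about 0.16 and 0.12 of native.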
Monday, April 29, 2013
Tuesday, April 16, 2013
Beignet (OpenCL for IVB on Linux) got its first release!
Youhou!
The OpenCL code base I initiated last year is now officially supported by Intel. The guys there added plenty of stuff. It is pretty cool.
The official announcement is here:
http://lists.freedesktop.org/archives/intel-gfx/2013-April/026747.html
There is a short news item on Phoronix:
http://www.phoronix.com/scan.php?page=news_item&px=MTM1Mjc
Even something on Slashdot!
I am pretty eager to see further developments. OpenCL, though far from perfect, is a nice, easy-to-use API that can give decent performance.
This is anyway pretty cool since most of the initial code was a solo project and I was never sure it would be officially supported at some point.
Sunday, March 31, 2013
Cube ported to JS
Hello all,
Following the hype, I spent some time porting Cube (1), the minimalistic FPS, to JavaScript with Emscripten.
Here is a screenshot:
Emscripten is amazing. The guys did a really good job of wrapping useful libraries like SDL, SDL_mixer and so on.
On my side, most of the effort therefore went into adapting the old OGL 1.x code to a more recent, WebGL-like version of it.
The Emscripten developers put a large effort into making OGL 1.x run directly in the browser, but I wanted a cleaner and faster implementation that only uses WebGL features.
Cube is an interesting code base compiler-wise. It is indeed pretty CPU bound, since a large share of the time is spent building the vertex arrays every frame instead of caching them in VBOs as every game does today.
On my machine (Intel NUC with i3 IVB), the code is roughly 4 times slower than the native version. I tried asm.js with the latest nightly build but saw no performance difference. Not sure why.
Code is slow on Chrome.
Code is here:
https://github.com/bsegovia/cube-gles
You need to download cube data yourself here:
http://sourceforge.net/projects/cube/
Enjoy!
Wednesday, January 16, 2013
Bits of OpenCL on Linux for IvyBridge are published
Some OpenCL code I was working on at Intel was recently pushed publicly and is now available for download.
A Phoronix news item about this is here: http://www.phoronix.com/scan.php?page=news_item&px=MTI3MTU
The code is here: http://cgit.freedesktop.org/beignet/
The code contains both the run-time, which is basically the OpenCL host code (clKernel*, clProgram*), and a compiler back-end, which takes bits of LLVM IR code and outputs IvyBridge ISA from them.
A small problem is that I am not working for Intel anymore, so I will not work that much on it.
The code is decent even if some parts are a bit over-the-top, C++ style-wise. However, the most important parts are already here (instruction scheduling, instruction selection infrastructure, register allocation) even if a serious amount of work is still to be done.
Since I am not working for Intel anymore, I cannot speak for Keith Packard and the others, but I guess any patch will be welcome. For those interested in low-level hacking, do not forget that the complete IvyBridge documentation is still here: https://01.org/linuxgraphics/documentation/2012-intel-core-processor-family
Thursday, December 27, 2012
Playing with oprofile on Linux
I just spent some time using oprofile on Linux. oprofile basically lets you profile everything running on your system with rather low overhead.
Lots of details here: http://oprofile.sourceforge.net/about/
A quick overview:
1. make oprofile use your kernel (root). Ignore it if you do not care about kernel symbols
$ opcontrol --vmlinux=/usr/src/linux-3.2.13-1-ARCH/vmlinux
2. make oprofile measure time spent in libraries (root)
$ opcontrol --separate=lib
3. start oprofile (root)
$ opcontrol --start
4. measure time spent in functions for "cube_client" :-)
$ opreport --demangle=smart --symbols ~/src/cube/src/cube_client
5. You get this:
CPU: AMD64 family12h, speed 1497.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
68078004 72.8798 fglrx_dri.so /usr/lib/dri/fglrx_dri.so
3984600 4.2657 cube_client world::render_seg_new(float, float, float, int, int, int, int, int)
3060858 3.2768 cube_client world::isoccluded(float, float, float, float, float)
2838442 3.0386 cube_client rdr::render_flat(int, int, int, int, int, sqr*, sqr*, sqr*, sqr*, bool)
2696379 2.8866 libc-2.15.so __mcount_internal
1777893 1.9033 cube_client world::render_wall(sqr*, sqr*, int, int, int, int, int, sqr*, sqr*, bool)
1664943 1.7824 libc-2.15.so mcount
1450401 1.5527 libm-2.15.so /lib/libm-2.15.so
794027 0.8500 libc-2.15.so _wordcopy_fwd_aligned
787522 0.8431 cube_client world::computeraytable(float, float)
687461 0.7360 cube_client rdr::render_square(int, float, float, float, float, int, int, int, int, int, sqr*, sqr*, bool)
669011 0.7162 cube_client rdr::ogl::lookuptex(int, int&, int&)
640268 0.6854 fglrx-libGL.so.1.2 /usr/lib/fglrx/fglrx-libGL.so.1.2
603660 0.6462 cube_client rdr::render_flatdelta(int, int, int, int, float, float, float, float, sqr*, sqr*, sqr*, sqr*, bool)
486056 0.5203 cube_client rdr::ogl::drawframe(int, int, float)
441795 0.4730 cube_client rdr::ogl::addstrip(int, int, int)
164852 0.1765 libc-2.15.so __memmove_sse2
160559 0.1719 cube_client _ZN7physics7collideEP6dynentbff.constprop.6
....
You will find lots of information on the net, such as how to capture other perf counters. To list the available events, look at:
$ opcontrol --list-events
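The per-symbol listing can also be folded into one figure per image (driver, game binary, libc, ...) to see where the time goes overall. The C++ sketch below does that for text shaped like the opreport output above. The function name percentByImage is made up for this illustration, and it assumes each data line starts with "samples percent image" and that image names contain no spaces, which holds for the listing shown here.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Sum the "%" column per image name for lines shaped like:
//   <samples> <percent> <image name> <symbol name...>
// Header and banner lines do not start with a number and are skipped.
std::map<std::string, double> percentByImage(const std::string& report) {
  std::map<std::string, double> total;
  std::istringstream lines(report);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    long long samples = 0;
    double percent = 0.0;
    std::string image;
    if (fields >> samples >> percent >> image) {
      // Sanity check on the parsed column: a percentage lies in [0, 100].
      assert(percent >= 0.0 && percent <= 100.0);
      total[image] += percent;
    }
  }
  return total;
}
```

Feeding it the listing above gives a single entry for fglrx_dri.so at 72.8798, next to the summed shares of cube_client and the libc and libm entries.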
Thursday, August 2, 2012
IvyBridge GPU documentation and code on the web
Hello all,
Just a reminder that the IVB spec is online. I mean:
- The complete state setting is documented
- The complete ISA for the "shader cores" (we call them Execution Units or EUs) is also there
- The documentation for the interesting shared functions (sampler, loads/stores) is also there
http://intellinuxgraphics.org/documentation.html
It may be a bit rough to start with but, fortunately, we also have a complete MIT-licensed, open-source OpenGL stack called "Mesa". It is here:
http://cgit.freedesktop.org/mesa/mesa/
Mesa is a big piece of code that supports many targets, but you can see the Intel GPU specific part here:
http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965
Friday, July 20, 2012
Various code bases and cube (the game) pushed on github
I decided to follow the hype and pushed everything to GitHub:
https://github.com/bsegovia
Note that I also cleaned up cube (the first cube game, the one before Sauerbraten) to make it compile with no complaint on gcc 4.6 and VS2010.
Did I already say that cube is amazing? The complete engine (cube itself + its network layer aka the 2005 version of enet) takes 10,000 LoC.
I may write a post reviewing the code. In the meantime, just for fun, do look at command.cpp, which basically implements an insanely powerful mini-scripting language in 300 LoC. Really cool.
Obviously, for more features, Sauerbraten and its next incarnation Tesseract are also really impressive.
Cube, however, remains unique for its size.
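To give a feel for how little code a command interpreter needs, here is a hypothetical C++ sketch of the core idea: a table that maps command names to callbacks taking string arguments. This is not Cube's command.cpp. The class CommandTable and its methods are invented for this illustration, and it leaves out everything that makes the real thing powerful (variables, nesting, aliases).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini command interpreter: names map to callbacks that
// receive the whitespace-separated arguments as strings.
class CommandTable {
public:
  using Args = std::vector<std::string>;
  using Handler = std::function<std::string(const Args&)>;

  // Register (or replace) the handler for a command name.
  void add(const std::string& name, Handler handler) {
    assert(handler != nullptr);
    handlers_[name] = std::move(handler);
  }

  // Split a line into words, look up the first one and pass it the rest.
  std::string execute(const std::string& line) const {
    std::istringstream in(line);
    Args words;
    for (std::string w; in >> w;) words.push_back(w);
    if (words.empty()) return "";
    auto it = handlers_.find(words[0]);
    if (it == handlers_.end()) return "unknown command: " + words[0];
    return it->second(Args(words.begin() + 1, words.end()));
  }

private:
  std::map<std::string, Handler> handlers_;
};
```

A caller would register handlers once with add and then feed each console line or config-file line to execute.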



