Sunday, November 6, 2011

Point-frag: a distributed game engine

For the past few weeks, I have been developing a small game engine in my spare time. I spent 18 months in the video game industry. It was pretty nice, and video games are very interesting overall because the number of technical challenges is huge: rendering of course, streaming, networking, AI, a lot of system programming and so on.

So, after thinking about cool personal projects I could do, I decided to play with a video game engine. The current code is here:
(do not try to run it right now, it is changing quickly and I am the only developer so it is unstable)

One of the first goals is to make it completely distributed: there are no heavy threads, no main thread and no renderer thread. Basically, *everything* executes uniformly through the same tasking system (in this case, yaTS).

In all the game engines I have seen, several heavy threads usually exist. They are typically:
- a main thread, which is basically a loop
- a renderer thread, which runs the OGL/DX/Gcm/whatever draw calls
- streaming threads
- an audio thread
Usually, many of them communicate with the main thread through a dedicated command buffer, in a unidirectional or bidirectional way.

Besides that, all other threads are worker threads, usually driven through a task interface (C functions, C++ classes or whatever). They are a resource that can be used at any time.

This makes the whole thing somewhat cumbersome, since we have to deal with several levels of parallelism in the system.

So, the main idea with point frag is to remove that and to have only worker threads:
- there is no main thread
- there is no game loop

Of course, you may typically need to pin some tasks to some threads (for example, a task that issues OGL draw calls). The system is flexible enough to run arbitrary tasks on arbitrary threads.
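To give a rough idea of what pinning can look like, here is a minimal C++ sketch. It is not the yaTS API: `PinnedPool`, `spawn`, `ANY` and `runPinned` are hypothetical names, and a single mutex protects every queue to keep the code short. Each worker owns a private queue for the tasks pinned to it and otherwise picks from a shared queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// A task receives the index of the worker that runs it.
using Task = std::function<void(unsigned workerID)>;

// Hypothetical pool, for illustration only (this is NOT yaTS).
class PinnedPool {
public:
  enum { ANY = -1 };  // no affinity: any worker may run the task

  explicit PinnedPool(unsigned workerNum) : pinned(workerNum) {
    for (unsigned i = 0; i < workerNum; ++i)
      workers.emplace_back([this, i] { run(i); });
  }

  // Drains what is already queued, then joins the workers.
  ~PinnedPool() {
    { std::lock_guard<std::mutex> lock(mutex); done = true; }
    cond.notify_all();
    for (auto &w : workers) w.join();
  }

  void spawn(Task task, int affinity = ANY) {
    assert(affinity == ANY || affinity < int(pinned.size()));
    {
      std::lock_guard<std::mutex> lock(mutex);
      (affinity == ANY ? shared : pinned[affinity]).push_back(std::move(task));
    }
    cond.notify_all();  // simple way to be sure the pinned worker wakes up
  }

private:
  void run(unsigned id) {
    for (;;) {
      Task task;
      {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [&] {
          return done || !pinned[id].empty() || !shared.empty();
        });
        // Pinned tasks first: nobody else can run them.
        std::deque<Task> &q = !pinned[id].empty() ? pinned[id] : shared;
        if (q.empty()) return;  // shutting down and nothing left for us
        task = std::move(q.front());
        q.pop_front();
      }
      task(id);
    }
  }

  std::mutex mutex;
  std::condition_variable cond;
  std::deque<Task> shared;
  std::vector<std::deque<Task>> pinned;
  std::vector<std::thread> workers;
  bool done = false;
};

// Spawns taskNum tasks pinned to `target` (think "the thread owning the OGL
// context") plus taskNum unpinned ones, and returns which worker ran each
// pinned task.
std::vector<unsigned> runPinned(unsigned workerNum, unsigned target, int taskNum) {
  std::vector<unsigned> seen;  // only written by the `target` worker
  {
    PinnedPool pool(workerNum);
    for (int i = 0; i < taskNum; ++i) {
      pool.spawn([&seen](unsigned id) { seen.push_back(id); }, int(target));
      pool.spawn([](unsigned) { /* unpinned work goes anywhere */ });
    }
  }  // ~PinnedPool drains the queues and joins
  return seen;
}
```

Calling `runPinned(4, 2, 8)` should return eight entries, all equal to 2. A real system would use per-worker locks or lock-free queues instead of one global mutex.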

In the code you can see today, I simply display a model and use the tasking system to load the textures (asynchronously) and compress them into DXT1... Nothing spectacular but I must start somewhere.

However, the cool thing you may already see is that the game loop is emulated by making frame_n spawn frame_{n+1}. This lets the system keep going in a continuation-passing style.
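Here is a minimal C++ sketch of that idea, under the assumption of a very simple shared-queue pool. `TaskPool`, `spawn` and `runFrames` are hypothetical names and not the yaTS interface. There is no loop over frames anywhere: the task for frame n does its work and then spawns the task for frame n+1.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool, for illustration only (this is NOT yaTS).
class TaskPool {
public:
  explicit TaskPool(unsigned workerNum) {
    assert(workerNum > 0);
    for (unsigned i = 0; i < workerNum; ++i)
      workers.emplace_back([this] { run(); });
  }

  // Lets the workers finish everything that is queued or still being
  // spawned by running tasks, then joins them.
  ~TaskPool() {
    { std::lock_guard<std::mutex> lock(mutex); done = true; }
    cond.notify_all();
    for (auto &w : workers) w.join();
  }

  void spawn(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mutex); queue.push_back(std::move(task)); }
    cond.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [this] { return done || !queue.empty(); });
        if (queue.empty()) return;  // shutting down and nothing left
        task = std::move(queue.front());
        queue.pop_front();
      }
      task();
    }
  }

  std::mutex mutex;
  std::condition_variable cond;
  std::deque<std::function<void()>> queue;
  std::vector<std::thread> workers;
  bool done = false;
};

// Runs frameNum frames without any game loop and returns how many ran.
int runFrames(int frameNum, unsigned workerNum) {
  if (frameNum <= 0) return 0;
  std::atomic<int> rendered(0);
  std::function<void(int)> frame;  // declared first so it outlives the pool
  {
    TaskPool pool(workerNum);
    frame = [&](int n) {
      ++rendered;  // stands in for the real work of frame n
      if (n + 1 < frameNum)  // frame_n spawns frame_{n+1}
        pool.spawn([&frame, n] { frame(n + 1); });
    };
    pool.spawn([&frame] { frame(0); });
  }  // ~TaskPool waits for the chain of frames to end
  return rendered.load();
}
```

`runFrames(5, 4)` runs five frames on four workers. In a real engine the frame task would spawn its sub-tasks first, and the next frame would depend on (some of) them.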

The good thing is that since there is no game loop, you can design your complete game as a pipeline. The idea is to subdivide your frame into sub-tasks and see each sub-task as a stage of a pipeline. This allows partial frame overlapping or, if you duplicate the pipeline (so that two or three frames are in flight at once), even more parallelism (at the cost of more latency).

Since everything is a task, it becomes easier to reduce, for example, the burden on the "renderer thread" (basically the one that owns the OGL context). Just subdivide the work so that only the OGL calls live in specialized OGL tasks. Everything else can run somewhere else.

Unfortunately, nothing is perfect and IO is a problem. By IO, I mean any asynchronous request handled by the system. When you read a file in a non-blocking way, you must do something else to cover the latency. In yaTS, you can ask the tasking system to run anything in the meantime. It is convenient.

However, the other task you run in the meantime can itself perform an IO that takes much longer to complete than the first one. The calling task, which performs the shorter IO, is then blocked until the longer one finishes.

Fortunately, you can handle this problem with a tasking system a la yaTS / TBB. Basically, it is possible to emulate co-routines with tasks and so get yield capabilities.

This is for another post (and it does not require assembly).
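In the meantime, here is a rough C++ sketch of one common way to get this effect, which is not necessarily the approach meant above: instead of waiting inside the task, split it in two and make the second half a continuation that only becomes a ready task when the IO completes. Everything here is hypothetical (`Scheduler`, `asyncRead`, `demo`, the file names), the IO is faked with a sleeping helper thread, and neither yaTS nor TBB is used.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

using Task = std::function<void()>;

// Hypothetical scheduler: one worker (whoever calls run()) plus one helper
// thread per simulated IO. For illustration only.
class Scheduler {
public:
  void spawn(Task task) {
    { std::lock_guard<std::mutex> lock(mutex); ready.push_back(std::move(task)); }
    cond.notify_one();
  }

  // Fake non-blocking read: returns at once. `then` is queued as a normal
  // task once the latency has elapsed. Call it before run() or from a task.
  void asyncRead(std::string name, int latencyMs,
                 std::function<void(std::string)> then) {
    { std::lock_guard<std::mutex> lock(mutex); ++inFlight; }
    ioThreads.emplace_back([this, name, latencyMs, then] {
      std::this_thread::sleep_for(std::chrono::milliseconds(latencyMs));
      {
        std::lock_guard<std::mutex> lock(mutex);
        ready.push_back([then, name] { then("data of " + name); });
        --inFlight;
      }
      cond.notify_one();
    });
  }

  // Runs ready tasks until nothing is ready and no IO is in flight.
  void run() {
    for (;;) {
      Task task;
      {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [this] { return !ready.empty() || inFlight == 0; });
        if (ready.empty()) break;
        task = std::move(ready.front());
        ready.pop_front();
      }
      task();
    }
    for (auto &t : ioThreads) t.join();
  }

private:
  std::mutex mutex;
  std::condition_variable cond;
  std::deque<Task> ready;
  std::vector<std::thread> ioThreads;
  int inFlight = 0;
};

// Issues a long and a short read plus some unrelated work, and returns the
// order in which things completed.
std::vector<std::string> demo() {
  Scheduler sched;
  std::vector<std::string> log;  // only touched by the thread calling run()
  sched.spawn([&] {
    // First half of the task: issue the reads and return immediately.
    // The second halves are the continuations passed to asyncRead.
    sched.asyncRead("big.tex", 100, [&](std::string d) { log.push_back(d); });
    sched.asyncRead("small.cfg", 10, [&](std::string d) { log.push_back(d); });
    sched.spawn([&] { log.push_back("compute"); });
  });
  sched.run();
  return log;
}
```

With these latencies the log will typically show the unrelated work first, then the short read, then the long one: the short IO never sits behind the long one because no task ever waits inside another task's stack. The price is that the code after the IO has to be written as a separate function.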

So, the no-heavy-threads approach may sound good, but scheduling remains a damn problem. This is the good thing about thread over-subscription + blocking IO: the operating system really helps you make progress in a relatively fair, time-shared environment. When you bypass the system and do everything yourself, you are on your own.

Anyway, point frag is my laboratory for experiments with tasking, rendering, procedural generation... Let's have some fun.


cbloom said...

Cool. I definitely think this is the right way for the future.

The way I deal with IO is by having a separate thread to run IO tasks, and other tasks are able to "pause" on an IO wait like a coroutine (the pause causes them to actually be pushed on a waiting stack and the worker thread picks up another task).

I've been wanting to do an experiment like this but also with semi-transactional game object model :

bouliiii said...

I mostly implemented the same idea for the IO, except that I am just using the task queues to wait. Having a separate queue may be a better idea. I have to think about it.

The problem, however, is to know when to re-pick the tasks and when to yield the worker thread if there are only IO tasks. This is, by the way, a problem with distributed queues: you never know for sure that there is nothing to do, and for power reasons or on a machine with SMT it can clearly be better to yield the thread completely. Busy waiting is bad.

Anyway, I read your post and it is really interesting. It really gave me food for thought about the game objects.
Right now, I just use a mutex :-)