Sunday, September 12, 2010

Experimenting with Lightmap Updates

Quake lightmap updating, how do I love thee? Not very much at all, really. After my previous mountains of fun figuring out what the hell was going on with glTexSubImage2D, it was recently time to experiment a little with the timing of updates.

I have for a while been working on the basis that updating lightmaps at the end of the frame was the best thing to do, as it would give the glTexSubImage2D call some more time to complete (i.e. while the server and client are running for the next frame) before the texture is needed again. Of course that means that the light updates you see on-screen would be one frame behind, but you won't notice (I would defy you to), and it's only when things are inconsistently out of sync that they become jarring. Logically this should be faster as the call (and its associated data) should just go into OpenGL's command queue, but it seems as though things are not so clear.

It turns out that - on some drivers, at least - if OpenGL hasn't finished drawing with a texture when you update it (which is more likely than not to be the case, as OpenGL - and D3D - is an asynchronous API) it will stall until it finishes before the texture can be updated. These stalls can be quite brutal, requiring glTexSubImage2D to pause for 27 milliseconds in one case. That dropped framerates down to 30 on one driver I tested on.

So the best time to update a lightmap is almost as late as possible: just before you actually draw with it. This means that all updating of the in-memory copy of the data should be complete, with nothing left to do but the glTexSubImage2D call itself, before you use the lightmap, and you should not call glTexSubImage2D on it again for the remainder of that frame.
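A rough sketch of that pattern in C, not DirectQ's actual code: every change during the frame goes into the in-memory copy and only widens a dirty row range, and a single flush right before the first draw issues the one upload. The names here (lightmap_t, lm_write_block, lm_flush, LM_SIZE, the upload callback) are all made up for illustration; in a real engine the callback would wrap glTexSubImage2D, and the stub below only records what it was asked to do.

```c
#include <assert.h>
#include <string.h>

#define LM_SIZE 128 /* assumed lightmap width and height, one byte per texel */

/* Hypothetical upload hook. In a real engine this would call something like
   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, y, LM_SIZE, h, format, type, data). */
typedef void (*lm_upload_fn)(int texnum, int y, int h, const unsigned char *data);

typedef struct {
    int texnum;
    int dirty;  /* non-zero if the in-memory copy has changed */
    int y0, y1; /* dirty rows, y1 exclusive */
    unsigned char data[LM_SIZE * LM_SIZE];
} lightmap_t;

void lm_init(lightmap_t *lm, int texnum)
{
    memset(lm, 0, sizeof(*lm));
    lm->texnum = texnum;
}

/* CPU-side only: change the in-memory copy and widen the dirty row range.
   No GL call is made here, so nothing can stall. */
void lm_write_block(lightmap_t *lm, int x, int y, int w, int h, unsigned char value)
{
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    assert(x + w <= LM_SIZE && y + h <= LM_SIZE);

    for (int row = y; row < y + h; row++)
        memset(&lm->data[row * LM_SIZE + x], value, (size_t)w);

    if (!lm->dirty) {
        lm->y0 = y;
        lm->y1 = y + h;
        lm->dirty = 1;
    } else {
        if (y < lm->y0) lm->y0 = y;
        if (y + h > lm->y1) lm->y1 = y + h;
    }
}

/* Call once per lightmap per frame, immediately before the first draw that
   uses it. Uploads full-width rows so the source data is contiguous.
   Returns the number of uploads issued (0 or 1). */
int lm_flush(lightmap_t *lm, lm_upload_fn upload)
{
    if (!lm->dirty)
        return 0;
    upload(lm->texnum, lm->y0, lm->y1 - lm->y0, &lm->data[lm->y0 * LM_SIZE]);
    lm->dirty = 0;
    return 1;
}

/* Stand-in upload that just records the request, for trying the sketch out. */
static int stub_calls, stub_last_y, stub_last_h;

void lm_stub_upload(int texnum, int y, int h, const unsigned char *data)
{
    (void)texnum;
    (void)data;
    stub_calls++;
    stub_last_y = y;
    stub_last_h = h;
}
```

Uploading whole rows sends more texels than strictly necessary, but it keeps the update down to one call per lightmap and avoids having to describe a sub-rectangle's row stride to GL.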

This is one reason why Quake's default multitextured renderer gets incredibly jerky and slow on certain hardware, why modified engines that don't change that part of it can grind to a halt, and why it's sometimes faster to disable multitexturing with those engines. The default will only update a small region of the lightmap, then draw, then update another small region of what is potentially the same lightmap, then draw again, and so on. So the sequence is update/draw/update with massive stall/draw/update with massive stall/draw/etc.

Oddly enough, this is exactly what I already knew to be the case with updating dynamic vertex buffers, so with hindsight it should have been very obvious to me.

It may still be the case that this behaviour is only exhibited on certain drivers and not on others, and that I might need to do a dummy run through an update on startup to figure out which is fastest. I haven't yet done a head-to-head comparison of the two strategies on a wide enough variety of hardware, so I don't know yet how widely applicable this is.
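One possible shape for such a startup dummy run, sketched in C under some loud assumptions: the two strategies are represented by a made-up enum, the clock and the "run one dummy frame" routine are passed in as callbacks, and the fake versions at the bottom exist only to exercise the selection logic. None of these names come from any real engine. A real trial frame would also need to wait for the GPU (for example with glFinish) before the clock is read, otherwise the queued work is not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two update strategies being compared. */
typedef enum {
    LM_UPDATE_END_OF_FRAME,
    LM_UPDATE_BEFORE_DRAW
} lm_strategy_t;

typedef double (*lm_clock_fn)(void);        /* current time in seconds */
typedef void (*lm_trial_fn)(lm_strategy_t); /* runs one dummy frame */

/* Time a number of dummy frames using one strategy. */
static double lm_time_strategy(lm_strategy_t s, int frames,
                               lm_clock_fn now, lm_trial_fn run_frame)
{
    double start = now();
    for (int i = 0; i < frames; i++)
        run_frame(s);
    return now() - start;
}

/* Run both strategies and keep the faster one. Ties go to updating just
   before the draw, which is the safer choice on drivers that stall. */
lm_strategy_t lm_pick_strategy(int frames, lm_clock_fn now, lm_trial_fn run_frame)
{
    assert(frames > 0 && now != NULL && run_frame != NULL);

    double end_of_frame = lm_time_strategy(LM_UPDATE_END_OF_FRAME, frames, now, run_frame);
    double before_draw = lm_time_strategy(LM_UPDATE_BEFORE_DRAW, frames, now, run_frame);

    return (end_of_frame < before_draw) ? LM_UPDATE_END_OF_FRAME
                                        : LM_UPDATE_BEFORE_DRAW;
}

/* Fake clock and fake frame, for illustration only: they pretend the
   end-of-frame update costs 27 ms per frame and the other costs 2 ms. */
static double fake_time;

double fake_now(void)
{
    return fake_time;
}

void fake_frame(lm_strategy_t s)
{
    fake_time += (s == LM_UPDATE_END_OF_FRAME) ? 0.027 : 0.002;
}
```

With the fake callbacks, lm_pick_strategy(10, fake_now, fake_frame) picks the update-before-draw strategy; on a driver that does not stall, a real measurement could just as well go the other way.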

There may also be some benefit in having a short delay between the update and the use - say by drawing something else (like sky) or by sorting some objects - but this will need more testing to be certain.

My overall feeling though is that when it comes down to the final analysis, a strategy that works most acceptably on the widest variety of hardware is always preferable to one that works excellently on only a selective subset.

Every lesson learned, no matter how frustrating at the time, is a good lesson in the longer term, and this was certainly one of those.

(DirectQ 1.9, by the way, is probably going to evaluate lightmaps fully dynamically on the GPU in a pixel shader so none of this will be relevant to that.)
