Wednesday, January 13, 2010

Driving geometry submission even lower

I'm currently putting together a plan to further reduce the overhead of vertex submission for the world (alias models are already optimal here). It occurs to me that in any given scene each x/y/z position is shared by roughly 3-4 vertexes, and of those perhaps 75% will also have the same texcoords (and textures). That's an awful lot of duplicate data going to the GPU. Now, I already index on a per-poly basis, and PIX counters show that I can submit over 2000 vertexes in a single batch in some real-world examples, but I think I can drive that down to about the 1000 mark.

The idea is to store a framecount with each vertex. If the vertex was already used this frame, we just need to submit the index with which it was used (this would also need to be stored with the vertex). Otherwise we submit the full vertex, record its new index, and update its framecount.
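Here's a minimal sketch of how that might look. None of this is actual DirectQ code; `WorldVertex`, `Batch` and `EmitVertex` are names I've made up for illustration, and the batch just holds pointers as a stand-in for a real vertex buffer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical vertex record; the last two fields are the extra
// per-vertex bookkeeping the scheme needs.
struct WorldVertex {
    float xyz[3] = {0.0f, 0.0f, 0.0f};
    float st[2] = {0.0f, 0.0f};
    int lastFrame = -1;          // frame in which this vertex was last submitted
    std::uint16_t lastIndex = 0; // index it was given in that frame
};

// Hypothetical batch; 'vertexes' stands in for the real vertex buffer.
struct Batch {
    std::vector<const WorldVertex*> vertexes;
    std::vector<std::uint16_t> indexes;
};

// Submit the full vertex only the first time it is seen in a frame;
// every later use in the same frame adds just an index.
void EmitVertex(Batch& batch, WorldVertex& v, int frameCount) {
    if (v.lastFrame != frameCount) {
        v.lastFrame = frameCount;
        v.lastIndex = static_cast<std::uint16_t>(batch.vertexes.size());
        batch.vertexes.push_back(&v);
    }
    batch.indexes.push_back(v.lastIndex);
}
```

One thing the sketch glosses over: a stored index is only valid for as long as the batch it was issued in. If the buffer is flushed part-way through a frame, the stamp would need to identify the batch rather than the frame.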

I haven't fully factored lightmaps into this yet, however, and right now it seems as though they would completely scupper the idea. Even if they do, I could probably still obtain some useful benefit in the area of overall memory usage for the map (for example, I already have the dreaded warpc down to 29MB, and could probably get that down further to 20MB). Even so... I've already implemented a simpler version of the DirectQ renderer on stock ID GLQuake using the non-multitextured path, and the end result was almost on a performance par with DirectQ (both without VBOs, although bearing in mind that DirectQ has some heavier operations in some areas - particularly lightmaps - that drove it down here). It may well be the case that loss of multitexturing is a fair trade-off if the saving on submission turns out to be in excess of 50%. (In an ideal world APIs would let us index vertexes and texcoords independently, but neither D3D nor OpenGL supports that.)

This would also open DirectQ up to 3D cards with fewer than 3 texture units. I don't anticipate that there are too many of these around any more; even the original GeForce 256 from over 10 years ago has 4.

Before I do any of this I need to establish some better sanity in my VBO lock/unlock operations. Right now I do an unlock/draw/lock with each and every texture change (I still get good performance despite this) and I need to batch these up so that I only unlock when the VBO is full, then follow that with multiple draws.
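As a rough sketch of the batching I have in mind (again, not DirectQ code): `BatchedBuffer`, `DrawRange`, `Add` and `Flush` are hypothetical names, and the lock, unlock and draw calls are replaced by counters so that the control flow stands alone without any D3D headers.

```cpp
#include <cstddef>
#include <vector>

// One queued draw: a run of vertexes that share a texture.
struct DrawRange {
    int texture;
    std::size_t firstVertex;
    std::size_t numVertexes;
};

class BatchedBuffer {
public:
    explicit BatchedBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Queue vertexes for a texture. A texture change only starts a new
    // draw range; it does not unlock the buffer.
    // Assumes numVertexes never exceeds the buffer capacity.
    void Add(int texture, std::size_t numVertexes) {
        if (used_ + numVertexes > capacity_) Flush();
        if (!locked_) {
            locked_ = true; // a real renderer would lock the VBO here
            ++lockCount;
        }
        if (!ranges_.empty() && ranges_.back().texture == texture) {
            ranges_.back().numVertexes += numVertexes;
        } else {
            ranges_.push_back({texture, used_, numVertexes});
        }
        used_ += numVertexes;
    }

    // Unlock once, then issue one draw per queued range.
    void Flush() {
        if (!locked_) return;
        locked_ = false;             // the single unlock would go here
        drawCount += ranges_.size(); // stand-in for the draw calls
        ranges_.clear();
        used_ = 0;
    }

    int lockCount = 0;
    std::size_t drawCount = 0;

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    bool locked_ = false;
    std::vector<DrawRange> ranges_;
};
```

The point is that the number of locks now depends on how often the buffer fills up rather than on how often the texture changes, while the number of draws stays the same.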

Generally when I do things like this I pick on the alias models renderer first. Everything is nicely self-contained in a single function, and overall it's simpler to work with. It's also brought me good luck with advances in the past; every advance I've successfully made here has also been of great benefit elsewhere.

Let's see if it happens again this time.
