Saturday, June 26, 2010

Dammit, my code is awesome!

Bet that got your attention. ;)

Anyway, this is by way of saying that the Top Secret plan I had for making the world draw faster has been tried and rejected. It didn't work out.

The idea was (obviously) to put all of the world in one (or more, if needed) big static Vertex Buffer, then render from that rather than from the dynamic buffers I currently use. In theory it means that a good number of vertexes don't need to be uploaded to your 3D hardware at runtime, so the bandwidth saving should make things faster.

And it didn't.

It's worth examining why it didn't. The first and most obvious reason is that in Quake the world is actually not a static thing. You move around in it, things come into and go out of view, pieces of it can move, and so on. Using static storage to represent dynamic objects means that you've got to do an awful lot of hopping around in that static storage in order to find the ones you want. Jump a few hundred bytes forward here, a few thousand backward there, that kind of thing. You could potentially pass back and forth over the same region of memory several hundred times per frame. Slow. On the other hand dynamic storage allows only the objects you actually need to be streamed in, and in the order they're needed too. Linear access, moving forward all the time. Fast.
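To make the access pattern concrete, here's a rough CPU-side model of the two approaches. It is not DirectQ's code and makes no Direct3D calls; Vertex, Surface, StaticSeekDistance and StreamVisible are all names invented for this sketch. The first function adds up how far you have to jump around a static array to reach the visible surfaces in draw order; the second copies just those surfaces, in that order, into a fresh stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout, for illustration only.
struct Vertex { float x, y, z, s, t; };

// Hypothetical surface record: a run of vertexes inside one big array.
struct Surface {
    std::size_t firstVertex;
    std::size_t numVertexes;
};

// Static storage: the vertexes stay where they are, so drawing the visible
// surfaces in order means jumping between their locations. Returns the total
// distance (counted in vertexes) between the end of one surface and the
// start of the next.
std::size_t StaticSeekDistance(const std::vector<Surface>& surfaces,
                               const std::vector<std::size_t>& visibleOrder)
{
    std::size_t total = 0;
    std::size_t cursor = 0;
    for (std::size_t idx : visibleOrder) {
        const Surface& s = surfaces[idx];
        total += (s.firstVertex > cursor) ? s.firstVertex - cursor
                                          : cursor - s.firstVertex;
        cursor = s.firstVertex + s.numVertexes;
    }
    return total;
}

// Dynamic storage: copy only the visible surfaces, in draw order, into a
// stream. Whatever consumes the stream then reads it strictly front to back.
std::vector<Vertex> StreamVisible(const std::vector<Vertex>& worldVerts,
                                  const std::vector<Surface>& surfaces,
                                  const std::vector<std::size_t>& visibleOrder)
{
    std::vector<Vertex> stream;
    for (std::size_t idx : visibleOrder) {
        const Surface& s = surfaces[idx];
        assert(s.firstVertex + s.numVertexes <= worldVerts.size());
        auto first = worldVerts.begin() + static_cast<std::ptrdiff_t>(s.firstVertex);
        auto last = first + static_cast<std::ptrdiff_t>(s.numVertexes);
        stream.insert(stream.end(), first, last);
    }
    return stream;
}
```

The copy in StreamVisible is exactly the per-frame cost the static buffer was meant to avoid, so this only illustrates the access pattern; it says nothing about which one wins on real hardware.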

The second reason is that it broke the batching. Imagine a situation where you have a group of surfaces that all share the same texture. Some of them belong to the world, some of them to moving objects like doors, platforms, etc. DirectQ can draw all of these together in one draw call. Fast. Breaking the batching means it needs two or more draw calls to handle them, which slows things down. That's just one texture; now imagine that you have 5, 10 or even 50 textures. Slow.
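A quick way to see the cost is to count batches. The sketch below is a simplified model, not the actual DirectQ renderer: Surf, DrawCallsMerged and DrawCallsSplit are made-up names, and it assumes one draw call per batch. Merged batching keys on texture alone; split batching keys on texture plus whether the surface came from the world or from a moving object.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical surface record: which texture it uses and where it came from.
struct Surf {
    int textureId;
    bool fromWorld;   // false = belongs to a door, platform, etc.
};

// World and moving-object surfaces share batches: one draw call per texture.
std::size_t DrawCallsMerged(const std::vector<Surf>& surfs)
{
    std::set<int> batches;
    for (const Surf& s : surfs) {
        batches.insert(s.textureId);
    }
    return batches.size();
}

// World surfaces live in their own buffer, so they can't share a batch with
// moving objects: one draw call per (texture, source) pair.
std::size_t DrawCallsSplit(const std::vector<Surf>& surfs)
{
    std::set<std::pair<int, bool>> batches;
    for (const Surf& s : surfs) {
        batches.insert(std::make_pair(s.textureId, s.fromWorld));
    }
    return batches.size();
}
```

In this model, every texture that appears on both the world and a moving object costs one extra draw call when the batches are split, which is where the "5, 10 or even 50 textures" case starts to hurt.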

The third reason is that the current renderer has had the benefit of a lot of micro-optimizations over the past half-year or so. True, it's recently been rewritten, but I was able to carry a lot of them forward. Individually they don't amount to much, but taken together they do a great deal to reduce the overhead of dynamically streaming stuff in.

Overall I'm not displeased. I needed to satisfy my own curiosity here, and I've now done that, so from that perspective alone it was worthwhile. Also, the code was starting to turn a mite ugly. I don't like ugly code. Aside from making Baby Jesus cry, it's something that I just know I'm going to need to come back to in the future and try to understand, and risk upsetting things (and myself) by trying to add new functionality to what would have been an already quite disgusting setup.

I'd already been there with 1.7 when trying to add in alpha surfaces, so it's not somewhere I want to go again.

So after that detour it's back to occlusions. I'll update on that shortly.
