Tuesday, May 11, 2010

When hardware T&L is slower...

If you've read portions of DirectQ's source code you might have noticed that I perform my own transforms in software rather than using hardware transforms. You could be forgiven for thinking I'm silly to be doing this, but in DirectQ hardware transforms - even in heavy scenes with up to 100,000 vertexes in them - are in fact somewhat slower.

This isn't the normal way of things, so there must be a reason, and that reason is batching. Let's take one of the most familiar scenes in Quake.



There are 4 zombies (2 hidden behind a wall) and 3 torches (1 hidden behind a wall) that must be sent to the GPU in this scene. The batching that DirectQ does enables all 4 zombies to be sent in a single operation, and likewise all 3 torches in a single operation. With hardware T&L in use it would be necessary to reset the modelview matrix (or D3D equivalent) for each entity, which breaks the batches into smaller chunks and makes less efficient use of your GPU. True, you do gain speed in one area by avoiding the need to transform in software, but you lose far more speed by breaking the batches than you gain, so on balance it's slower.
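To make the trade-off concrete, here is a minimal sketch of the two approaches. It is not DirectQ's actual code: `FakeDevice`, `Entity`, `Mat3x4`, `DrawHardwareTnL` and `DrawSoftwareBatched` are all hypothetical names invented for illustration, and the "device" only counts state changes and draw calls rather than talking to Direct3D.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation column.
struct Mat3x4 { float m[3][4]; };

// One model instance in the scene (e.g. one zombie). All instances of the
// same model share the same model-space vertexes.
struct Entity {
    Mat3x4 transform;
    const std::vector<Vec3>* model;
};

Mat3x4 MakeTranslation(float x, float y, float z) {
    return Mat3x4{{{1, 0, 0, x}, {0, 1, 0, y}, {0, 0, 1, z}}};
}

Vec3 TransformPoint(const Mat3x4& t, const Vec3& v) {
    return Vec3{
        t.m[0][0] * v.x + t.m[0][1] * v.y + t.m[0][2] * v.z + t.m[0][3],
        t.m[1][0] * v.x + t.m[1][1] * v.y + t.m[1][2] * v.z + t.m[1][3],
        t.m[2][0] * v.x + t.m[2][1] * v.y + t.m[2][2] * v.z + t.m[2][3]};
}

// Hypothetical stand-in for the GPU API: it just counts what it is asked to do.
struct FakeDevice {
    int matrixSets = 0;
    int drawCalls = 0;
    std::size_t vertsSubmitted = 0;

    void SetWorldMatrix(const Mat3x4&) { ++matrixSets; }
    void Draw(const Vec3*, std::size_t count) {
        ++drawCalls;
        vertsSubmitted += count;
    }
};

// Hardware T&L path: the matrix changes per entity, so every entity
// needs its own state change and its own draw call.
void DrawHardwareTnL(FakeDevice& dev, const std::vector<Entity>& ents) {
    for (const Entity& e : ents) {
        dev.SetWorldMatrix(e.transform);
        dev.Draw(e.model->data(), e.model->size());
    }
}

// Software transform path: the CPU moves every vertex into world space,
// so all entities can be appended to one buffer and drawn in one call.
void DrawSoftwareBatched(FakeDevice& dev, const std::vector<Entity>& ents) {
    std::vector<Vec3> batch;
    for (const Entity& e : ents)
        for (const Vec3& v : *e.model)
            batch.push_back(TransformPoint(e.transform, v));
    if (!batch.empty())
        dev.Draw(batch.data(), batch.size());
}
```

With 4 entities sharing one model, the hardware path in this sketch issues 4 matrix sets and 4 draw calls, while the software path issues no matrix sets and a single draw call for the same number of vertexes. The CPU pays for the extra multiplies, but the GPU gets one large batch instead of several small ones.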

For 1.8.4 I've added an alternate code path to demonstrate this. Right now it's there but completely disabled (if (0)), so anyone who wants to use it would need to recompile the engine. I might add a cvar to control it, but then again I might not: I really don't want to encourage anybody who is under the impression that hardware T&L must always be faster in every situation to use this code path. What I do want to do is show that - yes - I have thought about these things, considered the options, tested different approaches, and settled on what works best.

I guess this post demonstrates another area where things may not always be as you expect, and it goes hand-in-hand with the post I made some time ago about disabling multitexture in GLQuake (and why it's sometimes a performance advantage) versus disabling multitexture in DirectQ (and why it's never a performance advantage).

Next time I'll talk about hardware instancing and why it's not appropriate for DirectQ.

2 comments:

Nyarlathotep said...

Weird, random thought that just came from nowhere: Hexen 2 doesn't enjoy the widespread popularity of Quake, but it's no less deserving of a quality modern port. Hammer of Thyrion does nicely in that regard, but is there any chance of a DirectH2 in the future?

Also, have there been any reports of how well DirectQ gets along under Wine?

mhquake said...

I'd love to do a DirectH2 (love the game) but the problem that I have with it is that the released code needs the mission pack. Not a problem for me (I have the mission pack) but potentially a problem for anyone who downloads it.

I know that earlier versions of DirectQ worked just fine under Wine, but I'm not certain about the more recent ones and haven't heard any reports in a long time. I'd expect that the answer would be "not very well" as a lot of the code I use nowadays is very D3D-specific and having to go through a translation layer to OpenGL would hurt it.