Sunday, March 16, 2008

Premature Optimization

One of the golden rules of programming is to avoid premature optimization, and yet with my Z Buffer capture I have not only optimized prematurely, but have deliberately set out to do so. This was for a number of reasons.

Occlusion queries are a fairly tried and trusted technique, but the current implementation requires a pipeline commit before you can read back the data. For multiple entities, where you only want each entity tested against world geometry (i.e. not against other entities), that translates into multiple pipeline commits. On one of my test machines, the net result is that implementing hardware occlusion queries causes a drop in FPS from 230 to 170. In many cases this is acceptable, as you can win the cost back by skipping subsequent rendering. In Quake, the win-back is not sufficient to justify the loss.
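For reference, the per-entity query pattern looks roughly like this (OpenGL 1.5+ API; `draw_bounding_box` and `draw_entity` are placeholder names, not functions from the engine). The blocking readback of GL_QUERY_RESULT is the pipeline commit described above, and it happens once per entity:

```c
/* Sketch of one hardware occlusion query, assuming a valid GL context
   and that the world geometry has already been rendered so the depth
   buffer is populated. */
GLuint query;
GLuint samples = 0;

glGenQueries(1, &query);

glBeginQuery(GL_SAMPLES_PASSED, query);
draw_bounding_box(entity);          /* cheap proxy geometry, depth-tested */
glEndQuery(GL_SAMPLES_PASSED);

/* This readback blocks until the GPU has finished the query --
   the per-entity pipeline commit that costs the frame rate. */
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);

if (samples > 0)
    draw_entity(entity);            /* at least one fragment passed: visible */

glDeleteQueries(1, &query);
```

In practice you would also mask color and depth writes while drawing the proxy box so the query itself does not touch the framebuffer.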

The primary goal of this technique was to implement occlusion. The secondary goal was to do so without any appreciable performance loss. Achieving the primary goal is relatively easy: any first-year student could write the code, and even a non-programmer could sketch out the basics. Pythagoras knew how to do it. However, doing it in an acceptable and feasible real-time system is not easy.

As soon as I knew that I wasn't going to be able to get an all-software implementation working, and as soon as I made the decision to walk away from even attempting it, I knew that any solution would have to be optimized like crazy. At that point in time I had fully intended to abandon it completely, but something about glReadPixels and GL_DEPTH_COMPONENT kept nagging at the back of my brain. The performance loss from glReadPixels comes from two areas: the pipeline commit and the actual readback. If the impact of these could be minimized, then I would have a viable solution.

From then on it was a case of 2 + 2 = 4. Since a highly optimized solution was part of the basic requirement, it followed that any initial prototype would have to be optimized from the outset. Otherwise there was no point in even continuing beyond the depth buffer capture stage. So in order to meet the basic requirement, I broke the rules.

I suppose that the moral of this story is that the old rule of "premature optimization == bad" still stands as a good general rule, but it's important to realize that it's not universally applicable. When cases arise where optimization is required even at the proof-of-concept stage, you need to sit down and consider whether or not it actually is premature.
