Wednesday, March 12, 2008

glReadPixels benchmarks

I've been benchmarking various implementations of glReadPixels to get a feel for what kind of performance hit I'm going to take by using it to read back the depth buffer. It's not that bad at all. Here are some findings and observations:

  • glReadPixels performance is comparable to using occlusion queries. The performance hit is almost identical. As I'm going to be coding this path anyway, I now have no reason to code a second path that uses occlusion queries. I'll be going glReadPixels all the way.
  • There is no difference between placing the glReadPixels call before the main render or after it. Since I'm not calling glFinish anywhere, I deduce that glReadPixels itself is forcing a full pipeline flush in both cases.
  • Placing a few glFlush calls throughout the main render (e.g. after each texture chain and alias model that is rendered) can dramatically reduce the performance impact, as there is less of a pipeline stall when the time comes to do the glReadPixels. This is the single most effective thing that helps performance: without glFlush I lose 23% FPS, with it I only lose 5%. It's well worth investigating further to find the optimal number of calls (and places to put them).
  • I'm using Jay Dolan's recursion avoidance technique, so I only do a glReadPixels on frames where I do a full recursion; otherwise I assume that the previous frame's depth buffer is still good to work with. This doesn't help performance as much as the glFlush technique (about a 1% to 2% gain).
  • There is no difference between reading a 10 x 10 chunk and a 64 x 64 chunk; the performance impact comes more from the pipeline flush than from the size of the buffer that is read back.
  • I only do a glReadPixels every other frame, the rationale being that there's not going to be much in the way of difference between the depth buffers for 2 consecutive frames (at least for the purposes of this exercise). This virtually eliminates the performance hit.
  • Benchmarks were all done using a timedemo, so the benefits of recursion avoidance and glFlush are likely to be somewhat smaller here than in real gameplay.
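The readback scheduling described in the list above can be sketched in C roughly as follows. This is a minimal sketch, not the actual engine code: the names depth_cache_t, should_read_depth, update_depth_cache and stub_depth_reader are hypothetical, and combining the two rules as "a full recursion happened and this is an even frame" is an assumption. The real glReadPixels call is hidden behind a callback so the policy can be exercised without a GL context.

```c
#include <stdbool.h>

/* Hypothetical size of the captured depth chunk (10 x 10 and 64 x 64 cost the same). */
#define DEPTH_W 64
#define DEPTH_H 64

typedef struct {
    float depth[DEPTH_W * DEPTH_H]; /* last captured depth values */
    bool valid;                     /* false until the first readback */
    unsigned long reads;            /* number of readbacks actually issued */
} depth_cache_t;

/* Callback that fills buf with w x h depth values. In a real renderer this
 * would wrap something like:
 *   glReadPixels(x, y, w, h, GL_DEPTH_COMPONENT, GL_FLOAT, buf);
 * It is a parameter here so the policy can run without OpenGL. */
typedef void (*depth_reader_fn)(float *buf, int w, int h);

/* Assumed policy: always read if nothing has been captured yet; otherwise
 * read only on frames that did a full recursion, and only every other frame. */
static bool should_read_depth(const depth_cache_t *cache,
                              unsigned long frame, bool full_recursion)
{
    if (!cache->valid)
        return true;
    return full_recursion && (frame % 2 == 0);
}

/* Called once per frame after the main render (which may already contain
 * glFlush calls after each texture chain and alias model). */
static void update_depth_cache(depth_cache_t *cache, unsigned long frame,
                               bool full_recursion, depth_reader_fn read)
{
    if (!should_read_depth(cache, frame, full_recursion))
        return; /* keep using the previously captured depth values */
    read(cache->depth, DEPTH_W, DEPTH_H);
    cache->valid = true;
    cache->reads++;
}

/* Hypothetical stub reader for testing without GL: fills the buffer
 * with the far-plane depth value 1.0. */
static void stub_depth_reader(float *buf, int w, int h)
{
    for (int i = 0; i < w * h; i++)
        buf[i] = 1.0f;
}
```

Keeping the GL call behind a callback also makes it cheap to try different chunk sizes or readback positions without touching the scheduling logic.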
Overall, this is far more positive than I had hoped. Being able to capture a smaller version of the depth buffer for comparison using a hardware method means that I can keep the code a whole lot simpler. I can now totally write off all plans to do a full software version.
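Assuming the captured chunk is used for occlusion testing (the job the list above compares against occlusion queries), the comparison step might look something like this sketch. The names depth_grid_t and is_occluded are hypothetical, and it assumes the default depth setup (glDepthFunc(GL_LESS), glDepthRange(0, 1)), so larger values are farther away. The test is conservative: an object is treated as hidden only if every cell its screen rectangle covers already holds something nearer than the object's closest point. If each cell stands in for several framebuffer pixels, it would need to store the farthest depth of those pixels for the test to stay conservative.

```c
#include <stdbool.h>

/* Captured depth values, row-major, w x h cells, window-space depth in [0, 1]. */
typedef struct {
    const float *depth;
    int w, h;
} depth_grid_t;

/* Returns true only if every cell in [x0, x1] x [y0, y1] is nearer than zmin,
 * the depth of the object's closest point. */
static bool is_occluded(const depth_grid_t *g, int x0, int y0,
                        int x1, int y1, float zmin)
{
    /* Clamp the rectangle to the grid. */
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 >= g->w) x1 = g->w - 1;
    if (y1 >= g->h) y1 = g->h - 1;
    if (x0 > x1 || y0 > y1)
        return false; /* no overlap with the grid: don't cull */

    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            if (g->depth[y * g->w + x] >= zmin)
                return false; /* this cell could still show the object */
    return true;
}
```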

The next part will involve actually putting something into the depth buffer that is captured!
