Monday, January 11, 2010

More on Vertex Buffers

I switched alias models over to my proposed new vertex buffer approach and, even without the cycling pool idea, got a nice performance boost (about 10 to 15 FPS) in the infamous ne_tower scene. Vertex and index buffers in D3D actually seem to be very intelligently designed for reuse multiple times per frame. Note that this is without the occlusion queries I was talking about earlier, so I think that on the appropriate hardware, and once I implement both the cycling pool and occlusion queries, I might be able to get this scene close to locking at the 72 FPS mark. Typical ID1 scenes won't see much benefit, however, as they don't stress the hardware enough.

Thankfully the rendering design I had set up made the changeover very painless: I just replaced my memory buffers with the equivalent D3D buffer objects and it worked perfectly first time.

Phew!

The next big question is how large the cycling pool should be. Right now my buffers hit the 65536 objects (either vertex or index) per buffer mark, so with a pool of 8 of them we're looking at maybe 13MB of video RAM. That is nowhere near the last word on the subject though, as the buffer size does NOT need to be that large. I can easily go for maybe one tenth of the size and still retain full performance (my personal feeling is that one hundredth might even be adequate), so increasing the pool size to something like 32 or even 64 would still give less video RAM overhead (about 5MB and 10MB respectively at a buffer size of 6000) and enable more optimal throughput. Just because D3D can reuse the same buffer multiple times per frame doesn't mean that you should (I think!)
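To sanity check the memory figures above, here is a rough sketch of the arithmetic. The 26 bytes per slot is purely my assumption, back-derived from the "8 buffers of 65536 is about 13MB" number (it would fit something like a 24 byte vertex plus a 16-bit index); the names PoolBytes and PoolMegabytes are made up for illustration and are not from the engine.

```cpp
#include <cstddef>

// ASSUMPTION: bytes consumed per buffer slot by one vertex plus one 16-bit
// index. 26 is back-derived from the "8 x 65536 is about 13MB" figure; the
// real vertex layout is not stated anywhere in the post.
constexpr std::size_t kAssumedBytesPerSlot = 26;

// Total video RAM in bytes for a pool of paired vertex/index buffers.
constexpr std::size_t PoolBytes(std::size_t poolSize, std::size_t slotsPerBuffer)
{
    return poolSize * slotsPerBuffer * kAssumedBytesPerSlot;
}

// The same figure in megabytes (2^20 bytes), rounded to the nearest whole MB.
constexpr std::size_t PoolMegabytes(std::size_t poolSize, std::size_t slotsPerBuffer)
{
    return (PoolBytes(poolSize, slotsPerBuffer) + (1u << 19)) / (1u << 20);
}

// Compile-time check against the current setup: 8 buffers of 65536 slots.
static_assert(PoolMegabytes(8, 65536) == 13, "current pool should be about 13MB");
```

Under that assumption, PoolMegabytes(32, 6000) and PoolMegabytes(64, 6000) come out at 5 and 10, which lines up with the estimates above. A different vertex size would scale all three numbers by the same factor.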

Currently I've got maybe 8 DrawPrimitive calls for alias models in a typical ID1 scene, topping out at perhaps 40-50 in the most OTT huge maps. I don't see those figures changing much with a buffer size of 6000, so 32 seems a nice pool size. Although I eventually intend to add the world in there too, with that kind of pool size there should be more than ample time for drawing from any given buffer to complete before its turn comes around next.
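The cycling idea itself is simple enough to sketch without touching D3D at all. This is only an illustration under my own assumptions, not the engine's code: BufferCycler and MaxReusesPerFrame are hypothetical names, the cycler tracks slot indices only (the real buffer objects would sit in a parallel array), and I'm assuming one buffer gets consumed per DrawPrimitive call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical round-robin selector over a fixed pool of buffers.
class BufferCycler
{
public:
    explicit BufferCycler(std::size_t poolSize) : poolSize_(poolSize), next_(0)
    {
        assert(poolSize > 0);
    }

    // Returns the slot to fill for the next draw call, then advances the
    // cursor, wrapping back to slot 0 after the last buffer in the pool.
    std::size_t Acquire()
    {
        const std::size_t slot = next_;
        next_ = (next_ + 1) % poolSize_;
        return slot;
    }

private:
    std::size_t poolSize_;
    std::size_t next_;
};

// How many times the busiest buffer is written in a frame that issues
// drawCalls draws, ASSUMING one buffer per draw (ceiling division).
inline std::size_t MaxReusesPerFrame(std::size_t drawCalls, std::size_t poolSize)
{
    assert(poolSize > 0);
    return (drawCalls + poolSize - 1) / poolSize;
}
```

With a pool of 32, MaxReusesPerFrame(8, 32) is 1 and MaxReusesPerFrame(50, 32) is 2, so even the huge maps would only touch any one buffer twice in a frame, compared with 7 times for a pool of 8.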

Of course these are all figures that I'll play around with and see what gives the best balance of performance vs storage efficiency. It may even turn out to be the case that I don't actually need the cycling pool at all and I'm already at my performance peak, but we'll see.

Thankfully the previous release was an alpha, which means I have more freedom to play around with the way things work and to add and remove features as I see fit. There will be some minor consequences of this approach: the ability to switch hardware T&L on or off will be removed, as will control over the variable max submission batch size.

Anyway, overall what we have here is a neat and efficient setup that will boost performance in difficult scenes even further, so it's quite a fair tradeoff, don't you think?
