After my benchmarking of R_RecursiveWorldNode it seemed valid to attempt a few things to address performance losses. The first thing I did was to confirm that what I found was consistent and not just a one-off freak result, so I re-checked and established that we're spending roughly 20% of our CPU time in there on bigger scenes.
This is all CPU time of course, so if you're GPU bound there won't be much we can do about it.
Before trying out a non-recursive solution I optimized R_RecursiveWorldNode until it screamed for mercy, reducing the amount of local state that needs to be saved on the stack and cutting the number of recursions almost in half. Then I did the same to R_CullBox. I gained a few frames from that, which indicates to me that I'm personally either GPU or fillrate bound. Still, it should be good for anyone on slower/older CPUs so it was worthwhile.
Next I tried the non-recursive solution. The trick here was to use a recursive walk of the BSP tree to build a front-to-back array of nodes (without frustum culling) only when the PVS changes, then scoot through the arrays non-recursively every frame (using frustum culling this time).
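To make the two passes concrete, here is a minimal sketch of the idea in C. It is not the actual engine code: `node_t`, `plane_t`, `BuildNodeArray`, `CullBox`, `WalkNodeArray` and `MAX_NODE_ARRAY` are simplified, hypothetical stand-ins, and a leaf is just a node with no children.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODE_ARRAY 4096 /* hypothetical cap for this sketch */

typedef struct { float normal[3]; float dist; } plane_t;

/* Simplified stand-in for the engine's node and leaf structures. */
typedef struct node_s {
    plane_t plane;              /* splitting plane (unused for leaves) */
    struct node_s *children[2]; /* both NULL marks a leaf in this sketch */
    float mins[3], maxs[3];     /* bounding box */
    int visframe;               /* stamped when the PVS marks the node visible */
} node_t;

static node_t *nodearray[MAX_NODE_ARRAY];
static int numnodearray;

static float Dot(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Recursive build: run only when the PVS changes, with no frustum culling. */
static void BuildNodeArray(node_t *node, const float *vieworg, int visframe)
{
    int side;

    if (!node || node->visframe != visframe)
        return; /* not in the current PVS */

    if (!node->children[0] && !node->children[1]) {
        assert(numnodearray < MAX_NODE_ARRAY);
        nodearray[numnodearray++] = node;
        return;
    }

    /* visit the child on the viewer's side of the plane first */
    side = (Dot(vieworg, node->plane.normal) - node->plane.dist) < 0.0f;
    BuildNodeArray(node->children[side], vieworg, visframe);
    assert(numnodearray < MAX_NODE_ARRAY);
    nodearray[numnodearray++] = node;
    BuildNodeArray(node->children[!side], vieworg, visframe);
}

/* Returns 1 if the box lies entirely behind any one frustum plane. */
static int CullBox(const float *mins, const float *maxs,
                   const plane_t *frustum, int numplanes)
{
    int i, j;

    for (i = 0; i < numplanes; i++) {
        float corner[3];

        /* pick the box corner farthest along the plane normal */
        for (j = 0; j < 3; j++)
            corner[j] = (frustum[i].normal[j] >= 0.0f) ? maxs[j] : mins[j];
        if (Dot(corner, frustum[i].normal) - frustum[i].dist < 0.0f)
            return 1;
    }
    return 0;
}

/* Per-frame pass: a flat loop that culls every entry individually.
   Survivors are written to out; the return value is how many there are. */
static int WalkNodeArray(const plane_t *frustum, int numplanes, node_t **out)
{
    int i, numvisible = 0;

    for (i = 0; i < numnodearray; i++) {
        node_t *node = nodearray[i];

        if (CullBox(node->mins, node->maxs, frustum, numplanes))
            continue;
        out[numvisible++] = node; /* a real renderer would emit surfaces here */
    }
    return numvisible;
}
```

In this sketch `numnodearray` would be reset to 0 and `BuildNodeArray` called from the root whenever the PVS changes, and `WalkNodeArray` would be called once per frame with the current frustum planes.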
The end result is that it was much the same speed in smaller (ID1) scenes but quite a bit slower in bigger scenes. This can be attributed to the recursive version allowing for faster accepts and rejects of nodes (if a node is culled then all of its children are also culled, and likewise if a node is fully onscreen then all of its children are too). There were also issues such as software backface culling of surfaces not working correctly.
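The fast accept/reject behaviour of the recursive walk can be sketched as follows. Again this is only an illustration under simplified assumptions, not the engine's code: `cullnode_t`, `cullplane_t`, `ClassifyBox`, `CountVisibleLeaves` and the `numculltests` counter are hypothetical names, and the walk just counts leaves instead of drawing anything.

```c
#include <assert.h>
#include <stddef.h>

enum { CULL_OUTSIDE, CULL_INSIDE, CULL_CLIPPED };

typedef struct { float normal[3]; float dist; } cullplane_t;

typedef struct cullnode_s {
    struct cullnode_s *children[2]; /* both NULL marks a leaf */
    float mins[3], maxs[3];         /* bounding box of the whole subtree */
} cullnode_t;

static int numculltests; /* instrumentation: how many boxes were tested */

static float PlaneDist(const cullplane_t *p, const float *v)
{
    return v[0] * p->normal[0] + v[1] * p->normal[1]
         + v[2] * p->normal[2] - p->dist;
}

/* Classify a box as fully outside, fully inside, or clipped by the frustum. */
static int ClassifyBox(const float *mins, const float *maxs,
                       const cullplane_t *frustum, int numplanes)
{
    int i, j, result = CULL_INSIDE;

    numculltests++;
    for (i = 0; i < numplanes; i++) {
        float nearc[3], farc[3];

        /* farc is the corner farthest along the normal, nearc the opposite */
        for (j = 0; j < 3; j++) {
            if (frustum[i].normal[j] >= 0.0f) {
                farc[j] = maxs[j];
                nearc[j] = mins[j];
            } else {
                farc[j] = mins[j];
                nearc[j] = maxs[j];
            }
        }
        if (PlaneDist(&frustum[i], farc) < 0.0f)
            return CULL_OUTSIDE; /* whole box is behind this plane */
        if (PlaneDist(&frustum[i], nearc) < 0.0f)
            result = CULL_CLIPPED; /* box straddles this plane */
    }
    return result;
}

/* Recursive walk. Once a node is fully inside, its children skip the test;
   once a node is fully outside, its children are never visited at all. */
static int CountVisibleLeaves(const cullnode_t *node,
                              const cullplane_t *frustum, int numplanes,
                              int parentinside)
{
    int inside = parentinside;

    if (!node)
        return 0;

    if (!inside) {
        int c = ClassifyBox(node->mins, node->maxs, frustum, numplanes);

        if (c == CULL_OUTSIDE)
            return 0;                /* rejects the entire subtree */
        inside = (c == CULL_INSIDE); /* accepts the entire subtree */
    }

    if (!node->children[0] && !node->children[1])
        return 1; /* visible leaf */

    return CountVisibleLeaves(node->children[0], frustum, numplanes, inside)
         + CountVisibleLeaves(node->children[1], frustum, numplanes, inside);
}
```

A flat array walk has no parent/child relationship to exploit, so every entry pays for its own box test, which is one plausible reason for the gap widening as scenes get bigger.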
The speed comparisons were made without drawing being done, thereby removing the GPU as a contributing factor, but the observation on backface culling was from a run with drawing to verify correctness. In both cases the tests were made standing still in a single spot, so as to keep the cost of the recursive visible node building out of the non-recursive version's numbers.
The next step seems to be to see what I can do to tune the performance of the non-recursive version, but I honestly think that equalling the recursive version is a best-case end result here. I'll update this post when I have more info.
I will likely leave both versions in the code, with a cvar (r_usenodearrays) to select between them, and with the recursive version being the default.
Monday, April 26, 2010
Fun with R_RecursiveWorldNode
Posted by mhquake at 8:14 PM