Thursday, July 1, 2010

1.8.4 versus 1.8.5 performance

Performance optimizations that benefit one type of map don't always translate into optimizations that benefit another. Some are no-brainers, for sure: the improved batching, reduced particle fillrate, chopping the status bar area out of per-frame updates, and the scaled-down rendertarget for bonus flashes.

Others are different. With ID1 maps, using a common vertex size and format for everything made sense: the vertex buffer only needed to be set up once per frame, and so on. With other maps it was a drain. The sheer amount of data going in not only wiped out the advantage of setting up the buffer once, but cost further performance on top of that.
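
To put rough numbers on the "sheer amount of data" point, here's a minimal C sketch. Everything in it is hypothetical: the two vertex structs, their attribute sets, and the helpers frame_bytes() and wasted_bytes() are made up for illustration and are not DirectQ's actual code. It only shows how the unused attributes in a one-size-fits-all vertex format scale with vertex count.

```c
#include <stddef.h>

/* HYPOTHETICAL vertex layouts, for illustration only: these are NOT
 * DirectQ's real formats, just plausible Quake-style attribute sets. */

/* "One size fits all": every attribute any surface type might need. */
typedef struct {
    float xyz[3];            /* position */
    float st[2];             /* diffuse texture coords */
    float lm[2];             /* lightmap texture coords */
    unsigned char rgba[4];   /* per-vertex colour */
} common_vertex_t;           /* typically 32 bytes */

/* A leaner format for surfaces that need no lightmap or colour. */
typedef struct {
    float xyz[3];            /* position */
    float st[2];             /* diffuse texture coords */
} lean_vertex_t;             /* typically 20 bytes */

/* Bytes submitted per frame for a batch of vertices of a given stride. */
size_t frame_bytes(size_t vertex_count, size_t stride)
{
    return vertex_count * stride;
}

/* Extra bytes per frame caused by forcing the common format onto
 * vertices that could have used a smaller one. */
size_t wasted_bytes(size_t vertex_count, size_t common_stride, size_t own_stride)
{
    return frame_bytes(vertex_count, common_stride) -
           frame_bytes(vertex_count, own_stride);
}
```

With these made-up layouts, a hypothetical scene of 200,000 vertices that could use the lean format pushes 200,000 × 12 = 2,400,000 extra bytes every frame. In a small map that overhead is noise; in a big one it grows linearly with the geometry.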

I could have cvar-ized this one and let the player decide, but in the end I made the call myself. ID1 maps are going to run fast anyway, so any further gains there, while certainly nice, are of dubious value at best. When they come at the expense of performance where you really need it most, they're more trouble than they're worth.

So I've reverted 1.8.5 to the old 1.8.4 approach of using different vertex sizes and formats. This means a drop of a few percent in ID1 maps (which doesn't matter; 1.8.5 already wipes the floor with 1.8.4 there) but a worthwhile, and sometimes colossal, gain in big, complex scenes that really stress the GPU. I've seen 30% increases there.

FPS scores are meaningless in absolute terms; all they indicate is the difference on my own machine. But for what it's worth, 1.8.4 was nudging 260 FPS in timedemo demo1, whereas 1.8.5 gets over 310. A good chunk of this comes from the reduced-size rendertarget (80% of the viewport size, which I've found keeps underwater framerates consistent with above-water framerates), but even without that it was getting over 280.
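
Two bits of arithmetic help read those numbers. First, assuming the 80% applies to both width and height, the rendertarget has 0.8 × 0.8 = 64% of the viewport's pixels. Second, FPS converts to frame time as 1000 / FPS milliseconds: 260 FPS is about 3.85 ms per frame and 310 FPS about 3.23 ms, so roughly 0.6 ms saved per frame on this machine. The helpers below, scaled_rt_size() and frame_ms(), are a hypothetical C sketch of those two calculations, not engine code.

```c
/* HYPOTHETICAL helpers for illustration; not taken from any engine. */

typedef struct {
    int width;
    int height;
} rt_size_t;

/* Rendertarget dimensions scaled to a percentage of the viewport.
 * Assumes the percentage applies to each dimension separately, so
 * 80% per side gives about 64% of the pixels. Integer division
 * truncates any fractional pixel. */
rt_size_t scaled_rt_size(int vp_width, int vp_height, int percent)
{
    rt_size_t rt;
    rt.width  = vp_width  * percent / 100;
    rt.height = vp_height * percent / 100;
    return rt;
}

/* Frame time in milliseconds for a frames-per-second figure. */
double frame_ms(double fps)
{
    return 1000.0 / fps;
}
```

For example, scaled_rt_size(1280, 1024, 80) gives a 1024 × 819 rendertarget, with 819.2 truncated down to 819.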

Others may scoff, but I've found the ID1 demos to be good tests of different parts of the renderer, each stressing the engine in a different way. demo1 is the particle (and therefore fillrate) monster, demo3 is the dynamic lighting monster, and demo2 gives the underwater warp code a good workout.

Something like bigass1 (170 FPS, by the way) may be quite brutal, but it just stresses everything, so there's no useful information there. For testing purposes you need to be able to home in on more specific items, and demo1 and demo3 in particular are really useful for measuring performance in two parts of the renderer that every engine is going to need to tackle and optimize.

1 comment:

Pottenham said...

As always you're making the right decisions, like "losing a few fps in ID1 maps to gain performance where it really matters".

Can't wait to test the new version, with r_ambient, custom charsets and all those other nice optimizations, on my favourite maps.