Wednesday, June 16, 2010

OK, this was annoying

Just spent a few hours trying to troubleshoot why my scrap texture allocation was causing textures to render quite distorted and weird. It's not fun when your code is pretty much the same as the GLQuake equivalent, but GLQuake works fine and yours doesn't.

I'm not certain if my final understanding of this is correct, but in the end it seems as though there are more problems with D3D's floating point precision. The solution that worked was to align everything on 16-texel boundaries (some of Quake's textures aligned on 8 or 24 texel boundaries). This causes the texcoords generated by the allocator to have no more than 5 decimal places, which seems to preserve the texture correctly during rendering.
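To make the alignment idea concrete, here is a minimal sketch in C, not DirectQ's actual allocator. The names `scrap_align_up` and `scrap_texcoord` are made up for illustration, and it assumes the atlas size is a power of two (256 is just an example value). With those assumptions, every texcoord for a 16-aligned position is a multiple of 16/256 = 0.0625, which a float stores exactly.

```c
#include <assert.h>

/* Hypothetical helper: round a texel position or size up to the next
   multiple of `align`. `align` must be a power of two (16 in the post). */
static int scrap_align_up(int value, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: convert a texel position into a normalized
   texcoord for an atlas that is `atlas_size` texels wide or high.
   Assumption: atlas_size is a power of two, so aligned positions
   divide into short binary fractions. */
static float scrap_texcoord(int texel, int atlas_size)
{
    assert(atlas_size > 0);
    return (float)texel / (float)atlas_size;
}
```

So an 8-texel texture would be padded out to 16 texels and a 24-texel one to 32, which is where the wasted atlas space comes from.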

I'm going to experiment a little more with aligning on strict power of 2 boundaries or on 32-texel boundaries and see if that improves matters further, although to be honest doing so would be wasteful of atlas space (and fillrate) and may not have any really measurable improvement.

Another minor performance improvement was to revert to GLQuake's (and possibly WinQuake's, although I haven't checked) trick of updating the status bar only when it needs to be changed. I need to double-check this against different drivers, as it may cause the old flashing status bar problem depending on the page-flipping policy, but I think I've solved the root cause of that at the API level (D3D gives you really nice control of the hardware like that). Overall it saves about 5% to 10% on fillrate every frame (assuming the default full status bar), which can really help in situations where you're fillrate-bound (as you're most likely to be with DirectQ).
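A rough sketch of the "only redraw when changed" idea, in C. This is loosely modeled on the general approach rather than copied from GLQuake or DirectQ; `sbar_state_t`, `sbar_changed` and `sbar_draw_if_needed` are hypothetical names. The key assumption is that with page flipping each buffer in the chain holds its own copy of the bar, so after a change the bar has to be redrawn once per buffer. Redrawing it fewer times than that is one way to get the flashing problem.

```c
#include <assert.h>

/* Hypothetical state for a status bar that is redrawn only on change. */
typedef struct {
    int updates;   /* redraws done since the last change */
    int num_pages; /* buffers in the flip chain (assumed known) */
} sbar_state_t;

/* Call whenever health, ammo, etc. changes: forces fresh redraws. */
static void sbar_changed(sbar_state_t *s)
{
    s->updates = 0;
}

/* Call once per frame. Returns 1 if the bar was (re)drawn, 0 if skipped. */
static int sbar_draw_if_needed(sbar_state_t *s)
{
    assert(s->num_pages > 0);

    /* Every buffer already holds an up-to-date bar: skip the fill. */
    if (s->updates >= s->num_pages)
        return 0;

    /* ... the actual status bar drawing would go here ... */

    s->updates++;
    return 1;
}
```

With two buffers, the bar is drawn on the two frames following a change and skipped on every frame after that until `sbar_changed` is called again.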

This only works with the classic status bar; the various overlay and heads-up bars/HUDs have to be redrawn in full each frame, as geometry is drawn under them and can be seen through them.

I've also removed color buffer clearing except when absolutely essential. Color buffer clearing can be extremely fast these days, but not doing it at all is even faster.

Overall 1.8.5 is now running at about 1.5 times the speed in timedemo demo1 that 1.8.0 did. Once I finalize this chunk of code I can go back to porting the rest of the stuff to my new renderer (sky just went in recently) and see how much extra juice I can get out of it.

1 comment:

Nyarlathotep said...

Hang in there! We're all eager to see this, and I'm pretty keen to see how well this works on the whopping great i865G video in my HTPC...