The interesting thing about some of my current work is that it throws up all kinds of subtle and curious bugs. I'm going to save the punchline on this one till the end; for now let me describe what was happening.
I knew that I was getting some strange rendering stats. I could see that easily in some maps; epoly counts (which - remember - are now mostly irrelevant for performance, but still handy to keep around as we'll soon see) were see-sawing wildly between large (80k ish) figures and relatively smaller (20k ish) ones. Stuff I was doing that should have given a good speed boost wasn't quite having the effect I expected. Plus I had this weird thing happening with entity alpha. Stick with me because it's all related.
Let's talk about the latter. I was running some tests and noticed that certain models which should have had alpha were behaving strangely. Instead of a nice steady translucency they were flickering wildly between the alpha they should have had and full solid. Quite funky stuff, and I was initially blaming the Nehahra code (which uses its own rather weird standards, and none of what's coming up is going to stop me doing the cleanup it badly needs).
Some digging around followed; a lot of Con_Printf'ing and hard-coding alpha values into entities to see what happened. All I could determine was that it was as if something was switching alpha blending off when it shouldn't be.
I removed redundant state filtering, ran with the debug runtimes and under PIX; no luck. The only useful observation was that it seemed to go to full-on or full-off under odd circumstances too - such as when the console was down or when certain menu screens (like the Quit confirm) were up.
Definitely a mismatched state problem, I still thought (hint - it wasn't), but I was damned if I could find it. Forced alpha blending on all surfaces - no luck. Hard-coded alpha into my shaders - no luck.
One thing I did notice while comparing with other maps was that it only seemed to happen on static entities. Now we're getting somewhere, but what the hell was going on?
Even more confusingly - this only happened when host_maxfps was set above 72. Set it to 72 or lower and it didn't happen. This should have been the giveaway but by then my brain was frazzled enough to completely miss it.
So I traced back through the lifetime of static entities compared to server entities, and something that I should have been aware of a long time ago came up.
I'm running with decoupled timers.
Server entities are added to the visedicts list on every pass through the main loop when the server runs.
Static entities are added to the visedicts list on every pass through the main loop when the client runs.
The client typically runs between 5 and 35 times faster than the server in this kind of setup.
So we had an initial visedicts list, then static entities get added to it and stuff gets drawn. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. And so on for 5 to 35 frames until the list is cleared again when a server frame runs. OUCH!
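To put rough numbers on that, here is a minimal sketch of the buggy behaviour. It is not the engine's code: the entity counts, the capacity and every function name are made up for illustration, and the list is reduced to a simple counter.

```c
#include <assert.h>

/* All of these values are hypothetical, chosen only for illustration. */
#define MAX_VISEDICTS   4096
#define NUM_SERVER_ENTS 10
#define NUM_STATIC_ENTS 20

static int num_visedicts;   /* stand-in for the size of the visedicts list */

/* Server frame: the list is cleared and the server entities are relinked. */
static void server_frame(void)
{
    num_visedicts = 0;
    num_visedicts += NUM_SERVER_ENTS;
}

/* Client frame (buggy): static entities are appended unconditionally,
   even though the list has not been cleared since the last client frame. */
static void client_frame_buggy(void)
{
    assert(num_visedicts + NUM_STATIC_ENTS <= MAX_VISEDICTS);
    num_visedicts += NUM_STATIC_ENTS;
}

/* One server frame followed by `client_frames` client frames; returns the
   number of entities the renderer is asked to draw on the last of them. */
static int visedicts_after(int client_frames)
{
    server_frame();
    for (int i = 0; i < client_frames; i++)
        client_frame_buggy();
    return num_visedicts;
}
```

With these made-up counts, visedicts_after(1) is the 30 entities that should be drawn, while visedicts_after(35) hands the renderer 710, almost all of them duplicates.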
And the reason why it didn't happen when the console was down? When you bring down the console in a singleplayer game I throttle framerates back to 72. Simple! And what about the menu screens? In there I run a few screen updates to flush pending stuff before popping up the notification dialog and listening for keystrokes, during which no screen updates happen and both the client and the server pause. It doesn't take long for drawing the same entity over and over again to fill to solid, and with the visedicts list never being cleared during that time it never went back to translucent.
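How quickly does it fill to solid? A rough sketch, assuming the usual src * alpha + dst * (1 - alpha) blend and that every duplicate draw lands on the same pixels (both are assumptions on my part here, and effective_opacity is a name invented for this example):

```c
#include <assert.h>

/* Effective opacity after drawing the same translucent surface `draws`
   times, assuming src * alpha + dst * (1 - alpha) blending. Each draw lets
   only (1 - alpha) of what was behind it show through. */
static double effective_opacity(double alpha, int draws)
{
    assert(alpha >= 0.0 && alpha <= 1.0 && draws >= 0);

    double background = 1.0;   /* fraction of the background still visible */

    for (int i = 0; i < draws; i++)
        background *= 1.0 - alpha;

    return 1.0 - background;
}
```

Under those assumptions an entity at alpha 0.5 is already about 97% opaque after five draws (effective_opacity(0.5, 5) is 0.96875), so it takes only a handful of duplicates to look solid.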
Fortunately the fix was easy enough - just prevent static entities from being added if the list hasn't been cleared this frame. But wow; the torture I was inflicting on the renderer (and on the visedicts list) was something to behold.
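In sketch form the fix looks something like this. Again this is not the real code: the flag, the counts and the function names are all invented for the example, and the list is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* All of these values are hypothetical, chosen only for illustration. */
#define MAX_VISEDICTS   4096
#define NUM_SERVER_ENTS 10
#define NUM_STATIC_ENTS 20

static int  num_visedicts;   /* stand-in for the size of the visedicts list */
static bool statics_added;   /* hypothetical flag: statics already in the list? */

/* Server frame: clear the list, relink server entities and note that the
   static entities have not been added to this fresh list yet. */
static void server_frame_with_flag(void)
{
    num_visedicts = NUM_SERVER_ENTS;
    statics_added = false;
}

/* Client frame (fixed): only add static entities if the list has been
   cleared since the last time they were added. */
static void client_frame_fixed(void)
{
    if (statics_added)
        return;

    assert(num_visedicts + NUM_STATIC_ENTS <= MAX_VISEDICTS);
    num_visedicts += NUM_STATIC_ENTS;
    statics_added = true;
}

/* One server frame followed by `client_frames` client frames; returns the
   number of entities in the list on the last of them. */
static int visedicts_after_fixed(int client_frames)
{
    server_frame_with_flag();
    for (int i = 0; i < client_frames; i++)
        client_frame_fixed();
    return num_visedicts;
}
```

However many client frames run between server frames, the list now stays at one copy of each entity: visedicts_after_fixed(35) gives the same 30 as visedicts_after_fixed(1).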
So as a bonus I've got something like a doubling of framerates in situations where this occurs heavily, as well as a certain amount of sanity checking that I need to do elsewhere. In particular I need to check if running CL_RelinkEntities every client frame wouldn't be a better idea. But that's all for later. Right now I'm just glad that I didn't release with that one in.
Wednesday, April 6, 2011
Bug Hunting!
Posted by mhquake at 2:51 AM