Tuesday, November 17, 2009

Relieving the bottleneck on slow Gfx cards

This one's going to be a little bit difficult to explain, so bear with me. If it seems bad at the start, remember that it's not and try to get through to the end.

By way of introduction, all versions of Quake by default limit your FPS during play to 72. This is overridden in timedemos, and can also be overridden by some engines that provide a cvar for it (DirectQ's is called "host_maxfps" and defaults to 72).

Now, there are a lot of things going on in a frame. Key events and other input, command processing, network transmissions, screen updates, sound, and so on. Of these, the screen update is actually the major bottleneck on a lot of hardware. Even a 4 year old machine is perfectly capable of running everything else at a coupla thousand frames per second (try setting r_norefresh 1 then doing a timedemo and you'll see what I mean).

To relieve some of this bottleneck I've added an optional cap (off by default) on how many times per second the screen update portion of each frame (and only the screen update portion) can run. The cap is equal to your refresh rate multiplied by a user-configurable factor.
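The mechanism can be sketched in a few lines of C. This is purely illustrative, assuming a per-frame timestamp in seconds; the names (`rendercap_t`, `RenderCap_ShouldDraw`) are mine, not DirectQ's actual code. The key point is that the check only gates the draw, so everything else in the frame still runs:

```c
#include <stdbool.h>

typedef struct {
    double refresh_hz;   /* monitor refresh rate, e.g. 60.0 */
    double factor;       /* user-configurable multiplier, e.g. 1.0 or 0.5 */
    double last_draw;    /* time the last permitted draw was due, seconds */
} rendercap_t;

/* Decide whether the screen update portion of this frame should run.
   Called once per frame; all other frame work runs unconditionally. */
static bool RenderCap_ShouldDraw(rendercap_t *cap, double now)
{
    double interval;

    if (cap->factor <= 0.0)
        return true;                /* cap disabled: draw every frame */

    interval = 1.0 / (cap->refresh_hz * cap->factor);

    if (now - cap->last_draw < interval)
        return false;               /* too soon: skip drawing this frame */

    /* advance by a whole interval rather than snapping to 'now', so the
       average draw rate stays at refresh_hz * factor even when frame
       times don't divide evenly into the interval */
    cap->last_draw += interval;

    if (now - cap->last_draw >= interval)
        cap->last_draw = now;       /* fell too far behind: resync */

    return true;
}
```

With a 60 Hz refresh rate and a factor of 1.0 this permits roughly 60 draws per second no matter how fast the rest of the frame loop spins; a factor of 0.5 gives roughly 30.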

The next bit is very important.

This is not the bad thing it might sound like to some.

The maximum number of times per second that your screen can actually display a new image is equal to your refresh rate anyway, so any frames drawn beyond that are simply wasted work that nobody ever sees.

A second effect is that it relieves pressure on your graphics card. If you're running a 60 Hz refresh rate and you don't have enough fillrate to do the Quake default 72 FPS, you'll get better performance all round (yes, there are still cards like that, even in modern machines).

A third effect is that it allows other frame events to run at something more like their natural speed, meaning that even if you're getting poor graphics performance you're less likely to suffer from framerate dependent physics and less likely to choke input and sound. Even those with low framerates can get a more enjoyable game.

And finally remember that it's off by default, so unless you actually enable it you will notice nothing different at all.

This is not the same thing as vsync. Vsync (which will also be available in the next release) blocks everything until the next refresh interval; this just skips drawing but lets everything else run normally and unaffected.

A side effect is that timedemos are stupidly fast, and are in fact largely meaningless.

If you're not happy with the idea of this, you can always forget that it ever existed; it's off by default, remember. If on the other hand you're getting uneven input and/or choppy sound, or if your CPU meter in Task Manager hovers very low despite DirectQ running flat out, you're heavily GPU-bound and this might be the solution you need. You can even experiment with different values; set it to 0.5 to refresh at 30 FPS (assuming a 60 Hz refresh rate, and remembering that everything else runs at normal speed), for example.

It's not mandatory that you use it (chant: "off by default, off by default, off by default"), but for the primary target audience of this engine (those with slow Intel cards) it does provide benefit.

6 comments:

=peg= said...

Sounds like a very good thing to have to me!

By the way, is this the equivalent of "independent physics" in QuakeWorld clients? I was never really sure what that meant, but reading this post, it starts to make sense, unless it's totally unrelated ;)

LordHavoc said...

I don't really understand the purpose of this. If going by the assumption that rendering takes time and does not return until it is done, then you simply end up with a series of short frames in between very long frames, giving an odd stutter to the timing.

If going by the assumption that the GPU is asynchronous and that you are waiting on the GPU to render things, and that it will continue doing so while you do other things, then I can see the purpose (a series of short frames in between semi-short frames).

If going by the assumption that the game is CPU bound in rendering code and the GPU is waiting on the CPU, this does no good (series of short frames in between long frames).

I don't really grasp the intent...

@peg no, that just means it lets your rendering fps go higher than the network rate :)

mhquake said...

Direct3D is somewhat asynchronous in that it buffers up to 3 frames worth of data and commands. The problems occur when you have a fast CPU but a slow GPU; you're feeding data to the GPU faster than it can handle, and that's causing semi-random stalls (which in Pix show up as being quite significant in length) while currently buffered data is force-flushed to make room for the new stuff. By deliberately slowing down the rate at which data is fed to the GPU you avoid this and everything is smooth, you get much better parallelism.

Of course, if you don't have a slow GPU then none of this applies and you don't need this feature.

=peg= said...

Ah ok, thanks for clarifying that LH ;)

LordHavoc said...

Okay, I think I understand the reasoning then.

The next question is: how universally could this feature be enabled? What drawbacks are there to clueless users? How can it be automatically tuned (or eliminate the need to tune it)?

I always aim for the "clueless users" being able to enjoy the game as intended, if a feature is beneficial, it should be automatic.

mhquake said...

Interesting question. I'm not certain whether it could be automatically tuned short of running some dummy frames and comparing the results against expected benchmarks, but I'm massively suspicious of such an approach (what do you set the benchmarks to for starters, and how do you know another process on the machine didn't adversely affect them?)

Doing it in real time might be another option. Assuming that vsync is off, comparing the average time it took to render the previous 100 or so frames (the number would likely have to be in that region to be statistically valid, if not even higher) against an expected time, and dynamically adjusting based on that, might work. But I'm talking off the top of my head here and don't have any real-world figures just yet. Transient conditions associated with a pre-emptively multi-tasking OS could potentially cause invalid results here too. That's also suspect, although since it would adjust in real time, a transient condition would soon enough be recovered from.
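The real-time idea above can be sketched roughly as follows. Everything here is hypothetical (the struct, the thresholds, the 0.25 step size are all my own illustrative choices, not anything from DirectQ): keep a rolling window of the last 100 frame times, and once the window is full, nudge the cap factor down when rendering is consistently slower than the expected frame budget, or back up when it is comfortably faster:

```c
#define FRAME_WINDOW 100

typedef struct {
    double samples[FRAME_WINDOW]; /* last N frame times, seconds */
    int    count;                 /* samples collected in current window */
    int    head;                  /* next slot to overwrite */
    double factor;                /* current cap factor, 0.25 .. 1.0 */
} autotune_t;

static double AutoTune_Average(const autotune_t *t)
{
    double sum = 0.0;
    for (int i = 0; i < t->count; i++)
        sum += t->samples[i];
    return t->count ? sum / t->count : 0.0;
}

/* Record one frame time; adjust the factor only when a full window of
   samples is available, then start a fresh window after each change so
   a single adjustment can't cascade frame after frame. */
static void AutoTune_Frame(autotune_t *t, double frametime, double expected)
{
    t->samples[t->head] = frametime;
    t->head = (t->head + 1) % FRAME_WINDOW;

    if (t->count < FRAME_WINDOW) {
        t->count++;
        return;                   /* not statistically valid yet */
    }

    double avg = AutoTune_Average(t);

    if (avg > expected * 1.25 && t->factor > 0.25) {
        t->factor -= 0.25;        /* consistently slow: throttle drawing */
        t->count = 0;
    } else if (avg < expected * 0.75 && t->factor < 1.0) {
        t->factor += 0.25;        /* comfortably fast: relax the cap */
        t->count = 0;
    }
}
```

The averaging window is exactly the statistical-validity concern raised above: a one-frame spike from another process barely moves a 100-frame average, and even a sustained transient only costs one 0.25 step before a fresh window corrects it.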

Right now I'm thinking the best way is to have it as a "try this and see if it makes a difference" option. It's not perfect, I know and accept that, but it seems better than running the risk of an automatic mechanism setting it too low when it needn't be (or too high when it should be low).

This opens the question of whether such an option should be exposed to the user at all. In some ways it can seem counter-intuitive to deliberately slow down rendering in order to gain (or at least level) performance by having headroom and space to breathe elsewhere, and it's also exposing some behind-the-scenes stuff that the user should - quite correctly - not have to be concerned with at all. I'm willing to try it as an experiment in my next release, but I'm not anticipating oceans of stats or feedback on it.