Thursday, December 31, 2009

WDDM Drivers and Multithreading

I've just been putting some work into making DirectQ behave itself better with WDDM drivers. This is the driver model used by Windows Vista and 7, and is restricted to cards that are capable of running the Aero desktop. These drivers bring a few considerations which don't apply otherwise, but in practice the workarounds can be implemented for all drivers anyway.

Firstly, it seems as though textures don't preload to the GPU properly. It's necessary to do a single full screen update at the end of R_NewMap to get textures loaded from system RAM into video RAM; otherwise, in some circumstances, the first frame of a map will look very strange indeed (partially drawn, with lots of grey holes in it) and the engine will stall while textures are being demand-loaded. I'm going to extend this concept by rendering each texture to an offscreen buffer, so as to ensure that all textures needed are kept properly "hot".

Secondly, and I'm of the opinion that this is at least semi-evil, WDDM drivers have a "stall detection" feature. If the runtime thinks that a graphics driver has hung, it will reset the device automatically. This is all fine and dandy, but the stall timeout is a mere 2 seconds. It's a matter of debate whether this is Quake's fault or not, but Quake does not update the screen while loading a map. With larger maps, and in general when loading the first map after Quake starts, the period during which the driver is idle may be longer than 2 seconds. This will trigger the stall timeout (in fact, Microsoft advises that anything running at less than 10 screen updates per second might trigger it).

To work around this I've devised a function that checks how long it's been since the last screen update and triggers one if more than a set time period has passed (if we're already in the middle of a screen update it silently does nothing). Right now I have it being called from various strategic places in the code, but I'm going to put it into its own thread so that it's constantly running. The thread will be started suspended, kicked off by the first model load, and suspended again after all loads are completed so that it doesn't interfere with normal operation. I'm looking for a reasonably benign operation I can perform on the device that will go through to the hardware layer but won't affect what's on-screen.
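
As a rough illustration of the polling function described above (not DirectQ's actual code), a minimal C++ sketch might look like the following. The class name UpdateWatchdog, its method names, and the callback that performs the screen update are all hypothetical, and the idle threshold is an assumed value that just needs to sit comfortably below the 2 second timeout.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: forces a screen update if too long has passed since
// the last one. All names here are illustrative assumptions.
class UpdateWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    UpdateWatchdog(std::chrono::milliseconds maxIdle,
                   std::function<void()> updateScreen,
                   Clock::time_point start = Clock::now())
        : maxIdle_(maxIdle),
          updateScreen_(std::move(updateScreen)),
          lastUpdate_(start) {
        assert(maxIdle_.count() > 0);
    }

    // Bracket every normal screen update with these two calls.
    void BeginUpdate() { inUpdate_ = true; }
    void EndUpdate(Clock::time_point now = Clock::now()) {
        inUpdate_ = false;
        lastUpdate_ = now;
    }

    // Call from strategic places during loading.
    // Returns true if it forced a screen update.
    bool Poll(Clock::time_point now = Clock::now()) {
        if (inUpdate_) return false;                      // already updating: do nothing
        if (now - lastUpdate_ < maxIdle_) return false;   // updated recently enough
        inUpdate_ = true;                                 // guards against re-entrant polls
        updateScreen_();
        inUpdate_ = false;
        lastUpdate_ = now;  // measured from the start of the forced update (conservative)
        return true;
    }

private:
    std::chrono::milliseconds maxIdle_;
    std::function<void()> updateScreen_;
    Clock::time_point lastUpdate_;
    bool inUpdate_ = false;
};
```

With an assumed threshold of, say, 500 milliseconds, the loader would construct one UpdateWatchdog and call Poll() between model and texture loads. Note that this sketch is single-threaded: moving it into its own thread, as planned, would additionally need the flag and timestamp to be protected (for example with a mutex or atomics).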

Speaking of threads, I've recently been giving thought to a multithreaded DirectQ. Right now, on a machine with 2 or more cores, DirectQ just uses a single core, which is obviously not taking full advantage of the hardware's capabilities. Thinking about the logical separation points for threads, it occurs to me that the most obvious is the client/server split. Others include sound, and possibly graphics and input (to ensure that input resolution is not affected by framerate). I'd consider running the server thread at a steady 72 FPS (or whatever your value of host_maxfps is), so that we would get framerate-independent physics.
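
To make the "steady 72 FPS" idea concrete, here is a minimal sketch of a fixed-timestep accumulator, which is one common way to run simulation ticks at a constant rate regardless of how fast frames are rendered. This is an assumption about how it could be done, not code from DirectQ: the FixedStepper class and its Advance method are hypothetical names, and the tick callback stands in for whatever would run one server frame.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical fixed-timestep driver. Integer nanoseconds are used so that
// the number of ticks is deterministic and free of floating-point drift.
class FixedStepper {
public:
    explicit FixedStepper(int ticksPerSecond)
        : step_(std::chrono::nanoseconds(std::chrono::seconds(1)) / ticksPerSecond) {
        assert(ticksPerSecond > 0);
    }

    // Feed in the real time elapsed since the previous call. Runs the tick
    // callback once per whole step that has accumulated, passing the fixed
    // step length in seconds. Returns the number of ticks that were run.
    template <typename Tick>
    int Advance(std::chrono::nanoseconds elapsed, Tick&& tick) {
        accumulator_ += elapsed;
        const double dt = std::chrono::duration<double>(step_).count();
        int ran = 0;
        while (accumulator_ >= step_) {
            tick(dt);               // every tick sees the same dt
            accumulator_ -= step_;  // leftover time carries to the next call
            ++ran;
        }
        return ran;
    }

private:
    std::chrono::nanoseconds step_;
    std::chrono::nanoseconds accumulator_{0};
};
```

A server thread could loop on this with FixedStepper server(72), measuring elapsed time with std::chrono::steady_clock and sleeping briefly when Advance runs no ticks. Because each tick always receives the same time step, physics results no longer depend on the rendering framerate.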

I haven't even written any exploratory code for this yet, so don't expect major updates in that direction for a while. It would be a fairly large task, and would involve identifying and finding satisfactory solutions to any client/server crossover points (the server would definitely need its own copy of the models, although they could be stripped back a lot from what the client has).
