Thursday, May 26, 2011

Updates - 26th May 2011

One of the really neat things about working on two different engines at once is that I get to try out many ideas in both, and cross-check how something works in one with the other. The net result is that something that's potentially new and dangerous becomes a lot safer and more robust.

So the whole FPS-independence thing is coming to a very satisfactory close. There are just a few edge-cases to work out, but overall it's working extremely well and completely glitch-free. Both RMQ and DirectQ now have it, with DirectQ being used as the trailblazer and RMQ being used to consolidate, confirm and cross-check.

One really good thing about it is that the code took a very unexpected turn and ended up being far far simpler than I had ever anticipated. In fact, on a first read it seems more or less identical to ID Quake; a few functions have been moved around, a few got an extra "frametime" parameter, and a few were split into two functions. But nothing totally earth-shattering came out of it.

MDL movement interpolation has also taken a strange turn - again in both. It turns out that the old QER code is - fundamentally - total bollocks for a lot of cases. I'm still using it for multiplayer games, but for single player I ended up scavenging some code from an early DarkPlaces that does movement interpolation on the server. This works a LOT cleaner and neater, and doesn't suffer from occasional timing glitches.

The by now mandatory "other news".

RMQ is getting IQM as an optional replacement for MDLs. Things have gotten to the stage where the limitations of MDL are becoming a serious factor, and something better is just flat-out needed. Thankfully a sane and sensible option is available, rather than a horrible monstrosity with everyone's favourite feature bolted on. Now if only similar was available for BSP...

DirectQ has gone through another evolution of its particle system. Previously it used one of two modes - either with or without hardware geometry instancing. That's all been ripped apart and replaced by a much cleaner and simpler (and less CPU/bandwidth intensive) system. If you're familiar with the DirectX SDK Instancing Sample, it's option 2 - "Shader Instancing with Draw Call Batching". This gets particle submission down to a single vertex per particle (instead of 4), runs on SM 2.0 hardware, and - in all of my tests - is a good deal faster than anything else when under load.

I've used a similar technique for controlling the view model interpolation, so it's no longer necessary to refill a dynamic vertex buffer with blendweights each frame. All small stuff in the performance stakes, but small stuff can add up.

Occlusion queries have come and gone again in DirectQ. I was considering them for RMQ as well, but they won't be used in either now. The simple fact is that I was optimizing for freak extreme conditions which ended up being slower in 99% of more common cases - sometimes much slower. Occlusion queries are really only of value when the effort required to just draw the thing is higher than the overhead from issuing queries, drawing bounding boxes and collecting results. You also need to have most of the objects in your scene actually occluded for them to work right - otherwise you're expending extra effort just to get back a "yeah, draw this object anyway" result.

I'm still slightly intrigued by the possibility of using them as a replacement for PVS, but that of course is only valid on the client. The server still needs its own PVS and you can't replace that with renderer code.

Finally, both engines have now got ultra-smooth player movement. There's always been some low-level grittiness or jerkiness in Quake's player movement, which has been completely eradicated. It's now even possible to run at the standard 72 FPS and not get even a single jerk, but combined with the FPS-independent stuff, if you want to go faster for whatever reason (to match your monitor's refresh rate, say) you can.

Over the next short while, implementing IQM is going to be my primary thing, so I'll probably write some on that soon-ish.

Thursday, May 19, 2011

Would anybody mind...

...if I got rid of the view pitch drifting code?

This was old stuff from the days of keyboard looking, or having to hold down a key to mouselook, which automatically recentered your view after a short while as soon as you stopped looking.

I think everybody plays with mouselooking on these days, so it would help resolve a few complexities if I could just delete it.

Tuesday, May 17, 2011

Running at over 1000 FPS

There are a lot of subtleties involved when framerates go over 1000, and a lot of strange things suddenly start happening. I guess this is partially reflected in the "don't allow really short or really long frames" comment in the ID Quake source, but it would have been nice if that had been followed up with "...because this, that and the other happen". All the same, I don't expect that Quake was ever tested at this kind of framerate back in 1996 so I'll let it pass without further comment.

One item of concern is dynamic lights. There are some dynamic lights in the engine which are given a die time of 0.001 seconds after they are spawned, with the obvious intention being that they will last for one frame and be respawned again immediately afterwards. At over 1000 FPS we suddenly have them lasting for more than one frame, with the end result being that we could have multiple such dynamic lights on the go at one time. If we were running at 5000 FPS we would actually have 5 dynamic lights being thrown around a player who has the Quad Damage - bad for performance indeed (although at that kind of speed performance isn't something you worry about).
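The arithmetic behind that can be sketched as follows. This is only an illustration: overlapping_dlights is a made-up helper, not engine code, and it assumes exactly one such light is spawned per rendered frame with the 0.001 second (1000 microsecond) die time mentioned above.

```c
/* Hypothetical helper, not engine code: counts how many once-per-frame
   dynamic lights are alive at the same moment for a given framerate.
   Integer microseconds are used to avoid floating-point rounding. */
static int overlapping_dlights(int fps, int lifetime_us)
{
    /* Number of frames that fit inside one light's lifetime, rounded up. */
    long long frames = ((long long)lifetime_us * fps + 999999LL) / 1000000LL;

    /* A light always survives at least the frame it was spawned in. */
    return frames < 1 ? 1 : (int)frames;
}
```

With a 1000 microsecond lifetime this gives 1 light at 72 FPS and at exactly 1000 FPS, and 5 lights at 5000 FPS.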

A really weird and subtle one emerged in conjunction with timer decoupling. In order to get smooth and responsive input it's necessary to accumulate input events every frame, then gather them up and send them to the server at 72 FPS. This only becomes an issue at this kind of framerate (when movement suddenly becomes ultra-jittery otherwise); run any slower and you can ignore it.

Anyway, when I simulated a framerate of 10,000 FPS (via host_framerate) I discovered a very strange effect - pressing the forward key would cause me to move backwards, and vice-versa. Some digging around revealed the answer - Quake transmits the forward movement to the server as a short (16-bit) integer. This has a range of roughly -32,768 to 32,767, and the accumulated input was causing it to overflow and wrap.
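A minimal sketch of the wrap, under loudly stated assumptions: the per-frame forward value of 400 used in the note below, the idea that accumulation is a plain sum of per-frame values, and both function names are illustrative guesses rather than anything taken from the engine.

```c
#include <stdint.h>

/* Emulates storing a value in a signed 16-bit field with two's-complement
   wrap, without relying on implementation-defined narrowing conversions. */
static int wrap_to_short(long value)
{
    uint32_t bits = ((uint32_t)value + 32768u) & 0xFFFFu;
    return (int)bits - 32768;
}

/* Assumed model: forward movement summed over every render frame that
   falls between two sends to the server, then written as a short. */
static int accumulated_forwardmove(int render_fps, int send_fps, int per_frame)
{
    long frames_per_send = render_fps / send_fps;
    return wrap_to_short(frames_per_send * per_frame);
}
```

Under these assumptions, accumulated_forwardmove(10000, 72, 400) comes out negative (forward becomes backward), while accumulated_forwardmove(72, 72, 400) is still 400. With the same assumed numbers the wrap first appears a little under 6,000 FPS.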

I guess that's a definite case of a protocol limit on how fast Quake can run. I'm not too certain of where the precise cutoff point is; I measured it around the 5,800 FPS mark but can't be any more precise.

Particle effects are another interesting one, and this affects framerates below 1,000 too. I mentioned this one earlier, but the technical details are that it's necessary to split particle spawning off from CL_RelinkEntities. CL_RelinkEntities is called by CL_ReadFromServer, which must be called every frame (otherwise movement goes totally to herky-jerky land), but if you spawn particles behind a rocket every frame you'll be spawning maybe 10 times as much as you should. So particle spawning needs to also run at a slowed-down rate otherwise the faster we go the more we'll hurt framerate. At maybe 1000 FPS it translates into a halving of framerates every time a smoke trail is spawned.
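One common way to run something at a fixed rate regardless of framerate is a time accumulator. The sketch below shows that general technique only; spawn_timer_t and spawn_ticks are hypothetical names and this is not necessarily how either engine does it.

```c
/* Hypothetical fixed-rate timer: accumulates frame time and reports how
   many spawn intervals have elapsed. */
typedef struct {
    double accum;     /* time gathered since the last spawn tick */
    double interval;  /* seconds between spawn ticks, e.g. 1.0 / 72.0 */
} spawn_timer_t;

/* Call once per rendered frame; returns how many times trail particles
   should be spawned this frame (usually 0 at very high framerates). */
static int spawn_ticks(spawn_timer_t *timer, double frametime)
{
    int ticks = 0;

    timer->accum += frametime;
    while (timer->accum >= timer->interval) {
        timer->accum -= timer->interval;
        ticks++;
    }
    return ticks;
}
```

The entity relinking can still run every frame; only the particle spawning is gated on spawn_ticks returning non-zero, so the number of particles per second stays constant as the framerate climbs.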

All rather curious stuff indeed.

Monday, May 16, 2011

DirectQ Update - 16th May 2011

I've taken another pass through the whole timer and timer decoupling code, and now have a much better, more stable and more flexible solution. Instead of being hacked together based on a best-guess, this one is actually based on the proper documented way of doing this stuff, which is nice. There are still lots of subtleties with Quake's timing (I'll be mentioning one later) so it's still somewhat in the experimental/disabled-by-default bracket, but I can confidently say that I'm now at the stage where I can see it becoming the hard-coded enforced behaviour at some time in the future.

Particles have been worked over some more, and a lot of what I wrote about a coupla days back is now totally overturned. There is no longer any particle texture in the engine (and therefore no quality cvar to control it): instead it's entirely generated on the GPU which gives extremely high quality up-close but scales back beautifully when particles are further away. It's a mite slower in some circumstances but faster in others, and overall I think the quality tradeoff is well worth it.

The only thing relating to particles left on the CPU is now velocity and position updates; everything else is GPU-side.

Speaking of particles, that timer subtlety I mentioned rears its ugly head here. It turns out that when running at a high framerate and connected to a remote server (or playing a demo) a lot more particles are being generated than when running at a lower framerate (which is the cause of a performance problem too). You can see this yourself by checking out the lava ball trail in the Start map at different values of host_maxfps. One solution is to use timer decoupling to scale back the rate at which particles are generated to a constant rate irrespective of framerate.

My proposed breaking change to the video code is likely going to be deferred for a while; I've reviewed the code and made a first attempt (which I very quickly reverted from), and it's quite obvious that the whole startup code is a mess that probably needs to be gutted and rewritten more than anything else. A huge part of the reason for that is that much of it dates back to my original D3D port and has been hacked around to make things work rather than being properly implemented.

All in all an interesting batch of updates.

Saturday, May 14, 2011

DirectQ Update - 14th May 2011

Been a while since the last update but I have been working away behind the scenes on a few things, just tuning and optimizing more parts of the engine.

There is a change in the visual appearance of the particle dot, which has now moved from being a resource embedded in the engine to being a procedurally generated texture. It's not generated in the shader, but rather in engine code; all the same it does mean that you can now control its quality level (although in practice there's no performance gain from using a lower level, so it's for visual preference only) via the r_particlequality cvar.

Particle transforms have moved entirely to the GPU (which was very nice to do) and performance is up overall.

Likewise the underwater warp texture has made a similar move to a procedurally generated texture and can be controlled with r_waterquality. This texture is also used for controlling regular water warps, so I felt justified in using the same name as FitzQuake uses. I've also chosen the same default value for this as Fitz for better engine cohabitation. In this case there is actually no quality gain from choosing a higher value, and I recommend that you just leave it at the default.

Dynamic light updates have been improved with better lighting falloff and faster updating in general; the lighting model looks a little different to GLQuake now, but overall I think it's better.

Brush surfaces have also been changed a little with better vertex buffer locking/unlocking behaviour; there's still huge room for performance improvements with this code, but it is getting better all the time. I'm currently debating what to do with the old gl_keeptjunctions cvar; this was present but did nothing in previous versions, but I've now restored the behaviour. In case you don't know what it does, setting it to 0 (GLQuake's default) will remove some vertexes from surfaces, which gives higher performance but at the expense of the occasional blinking pixel onscreen. I've tested a few heavy scenes and can confirm that it can make things go up to 20% faster (typical scenes will be lower, of course), but is the visual tradeoff worth it? So what should the default be? You decide.

Finally, and now that it seems that on-the-fly window resizing is stable, I've added two cvars - d3d_width and d3d_height - to save out the new sizes. The reason why these are "d3d_" and not "vid_" is the same reason why I chose "d3d_mode" instead of "vid_mode" - these cvars with Direct3D don't coexist peacefully with OpenGL engines. I'll probably also add cvars for saving out the window position too - right now all I do is just center the window onscreen whenever a mode change happens.

So with all of that I'm considering a breaking change to video startup. The current mode list is divided in two, with approx the first half being windowed modes and the second half being fullscreen. Under the new way I'm planning, mode 0 will be the only windowed mode and the d3d_width and d3d_height cvars will control its size; modes 1 and above will be the fullscreen modes. This will actually roughly realign DirectQ with the way GLQuake handled it, and it's also attractive because I can provide arbitrarily sized windowed modes in the menu. So it's fairly definite that I'm going to do it; the only question is: do I do it now or do I wait until after the next release?

Monday, May 9, 2011

FOV Fixing Up

Currently working on fixing up DirectQ's FOV support for widescreen resolutions; the old code I had was fairly cruddy and hacked around with over time, so it's good to go back and clean things out.

At this point I think it's reasonable to throw a question out in the open: how do people want FOV to work? There are a number of options and things to consider here:

  • A cvar to switch DirectQ's FOV handling back to the way GLQuake did it seems reasonable and sensible.  This can also serve as a last-resort panic button: if things get terminally screwed up for you then at least it'll get you back to something that works.  It may not be great, but at least it will work.  This is currently present (and has been for the past coupla years) and is called "fov_compatible".
  • In relation to this, what should the default handling be?  This is one clear case where I think "the way GLQuake did it" is not a good default; the new method should be the default.  Everyone agree?
  • Correcting FOV for widescreen aspect ratios requires a baseline aspect ratio to derive the values from.  Should this be software Quake's 320x(200-48) or GLQuake's 640x(480-48)?  (The -48 is for the default status bar size).  I favour GLQuake as the baseline here; it might not be the absolutely "correct" baseline derived from the original Quake engine, but people are so used to it that going back to the original just looks weird.
  • Handling of the gun.  Previously I've (except when I've done it wrong) drawn the gun as if FOV was 90 when FOV is >= 90, but drawn it at the reduced FOV otherwise (with the new handling) or just drawn it the way GLQuake did (with the old handling).  Is there any requirement at all to draw it the way GLQuake did for FOV > 90?  I'm thinking this is another one of those cases where "the way GLQuake did it" is actually crap and - this time - should not only not be a default, but should not even happen at all.
All opinions welcome.

Sunday, May 8, 2011

DirectQ Update - 8th May 2011

As is usual when I do an extensive update with lots of things changed, a patch release is looking like it's going to be necessary sometime soon. This is just to cover a few problems that are manifesting on some people's machines, and to fix up a few things that I'd left out.

That doesn't mean that new features won't be forthcoming. Two for you so far; first one is the return of gl_picmip for you QuakeWorld-look fanatics. Now you can set it to 10 billion and get flat shaded everything in DirectQ too!

The second is optional software Quake mipmapping. What this means is that DirectQ can generate mipmaps in a similar manner to software Quake - only 4 miplevels get generated, and liquid textures are not mipmapped at all. You can toggle this at runtime by setting gl_softwarequakemipmaps to 1; there's no need to restart the map or the renderer to make the change. Combine it with point filtering and square particles for the full effect.

Saturday, May 7, 2011

Release 1.8.7 (2011-05-07) is Out

Here you go.

Friday, May 6, 2011

DirectQ Update - 6th May 2011

Been digging into the timer functions again; I've reverted some of them back to the way ID Quake did things because I started running into some serious precision issues at stupidly high framerates. Overall it became a case of "if you find yourself adding more and more complexity to something, then you're probably doing it wrong and need to row back and rethink", so it's good to have this sorted out before releasing.

Brush surface draw call batching behaviour is now user-configurable (via r_surfacebatchcutoff and a menu option); default behaviour is to batch as aggressively as possible (unless there is only one surface to draw, in which case batching incurs unnecessary overhead). On some hardware it may be slightly faster to tweak this parameter.

Dynamic light updates have been reworked for some small extra performance. There are still a few more frames to be pulled out of this, but overall it's already very fast anyway so it's not too high a priority.

...and we're getting closer...

Wednesday, May 4, 2011

MAY THE FOURTH BE WITH YOU

Just some small sanity checking and fine-tuning to be done before releasing; think we're almost there.

Marcher Fortress performance on the Intel 945 is now up to 170 FPS; close to a doubling as a result of recent work on reducing CPU load.

As well as raw performance, there are a few extra features coming through in this version which are worth mentioning.

Gamma adjusted lightmaps, via the lm_gamma cvar. This defaults to 1.0 (no adjustment), and may be useful for tweaking brightness in cases where you don't want to adjust your global gamma (e.g. if running in a windowed mode). Note that it only affects lit objects (solid surfaces and MDLs); sky and liquids are not gamma adjusted using this method, and nor are 2D GUI textures. Try it and see.

Mousewheel support in the menus and console. Pretty rudimentary, but all the same. The mousewheel now scrolls the console properly (a long-outstanding request) and will also scroll through menu options.

FitzQuake-compatible menu/console/status bar scaling. (Updated) - this is now more or less fully compatible with Fitz for peaceful engine coexistence (which is very important). DirectQ's old gl_conscale cvar still exists and still maintains the old behaviour if you prefer that. For layout reasons DirectQ doesn't allow a virtual size below 640x480 using either method.

Crappy edges around textboxes have been fixed. ;)

More news as it happens.