Tuesday, June 30, 2009

More Features Gone

I've removed both centerprint logging and fading from the engine. The latter is just eye-candy, but the former is something that a lot of people like to have, and that may be viewed as an essential feature.

However, both features interfere with mods that are already out there; those that use centerprints from QC to display persistent stats or custom menus on-screen. In the case of logging, your console just fills up with logged centerprints, rendering it essentially useless. In the case of fading, persistent centerprints fade to a dull grey.

I'm hoping that this is temporary, and that I can find a workable and stable solution, but it is an interesting case. Fading I'm not too worried about, but logging is one where a genuinely useful feature that people want breaks prior art.


EDIT:

I think I've found a viable solution for logging, which is to only log a new centerprint if the previous one has already been cleared, so that's gone back in. There are still certain cases where it will miss logging a centerprint - two different centerprints in quick succession, for example - but rare missed centerprints are preferable to none at all.
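The condition above can be sketched in a few lines. This is a minimal illustration, not DirectQ's actual code; the struct and function names are hypothetical, and "cleared" is modelled as the centerprint's display time having expired:

```cpp
#include <string>

// Hypothetical sketch: log a new centerprint to the console only if the
// previous one has already been cleared from the screen.
struct CenterPrintState {
    std::string current;     // centerprint currently on-screen
    float expireTime = 0.0f; // game time at which it clears
};

// returns true if a newly arriving centerprint should also go to the log;
// a still-visible previous centerprint (e.g. a persistent QC stats display
// being refreshed every frame) suppresses logging and keeps the console clean
bool ShouldLogCenterPrint(const CenterPrintState &state, float gameTime) {
    return gameTime >= state.expireTime;
}
```

The quick-succession miss falls out naturally: the second of two rapid centerprints arrives while the first is still on-screen, so it is displayed but never logged.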

Monday, June 29, 2009

Oh sRGB I Hardly Knew Ye

I've just removed sRGB support from the engine.

Ye haven't an arm, ye haven't a leg, hurroo, hurroo
Ye haven't an arm, ye haven't a leg, hurroo, hurroo
Ye haven't an arm, ye haven't a leg
Ye're an armless, boneless, chickenless egg
Ye'll be having to put a bowl to beg
Oh Johnny I hardly knew ye.
I'm deeply saddened to see it go, but in the end the deciding factor was that it just did not play nice with alpha blending. The underwater polyblend was one example, but there were others.

The D3D SDK says it all:
Ideally, hardware should perform the frame buffer blending operations in the linear space, but hardware is allowed to perform it after the pixel shader but before the frame buffer blender. This means that the frame buffer blending operations that take place in sRGB space will produce incorrect results.
In other words, the end results can vary from driver to driver, and D3D allows this to happen as "correct operation".

There are a number of possible options now:
  1. Switch between sRGB and linear colour space as required, i.e. switch back to linear before performing any blending operations, then back to sRGB when finished. I've already tried this and the impact on framerate is horrendous.
  2. Bake sRGB translation into the textures at load time, probably the easiest method, but switching sRGB on or off would then require a vid_restart. The visual quality would be inferior too, but I suspect not by much.
  3. Add a lookup to every pixel shader to do the translation. Not so certain that I like the idea of this one, as once again performance would suffer and anyone who wanted to use their own custom shaders would also have to support it.
  4. Accept the frame rate loss and do the state switching as required. This is just included for completeness' sake: it's not even close to the table for consideration.
I'm pretty certain that I'll want to bring it back at some time, and of the 4 options above, the second is probably the best compromise, so unless something else comes to mind that will likely be the route chosen. But it won't make it to 1.6.3 as it would hold up the release and I've other things to be working on.
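Option 2 amounts to applying the standard sRGB transfer function to each texel once, at load time. A minimal sketch, assuming 8-bit texel components and using a 256-entry lookup table (the function names are illustrative, not DirectQ's):

```cpp
#include <cmath>
#include <cstdint>

// Standard sRGB-to-linear transfer function (IEC 61966-2-1),
// operating on a normalised [0,1] component.
float SRGBToLinear(float c) {
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Build a byte-to-byte table once, then remap every texel as it's loaded.
// This bakes the translation into the texture data, so no per-pixel or
// per-blend work is needed at render time - hence no framerate cost.
void BuildSRGBTable(uint8_t table[256]) {
    for (int i = 0; i < 256; i++) {
        float lin = SRGBToLinear(i / 255.0f);
        table[i] = (uint8_t)(lin * 255.0f + 0.5f);
    }
}
```

The quality loss mentioned above comes from quantising the result back to 8 bits: the curve compresses the dark end, so several input values can map to the same output byte.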

Of Indexes and Index Buffers (Slight Return)

This is what we get when we load The Marcher Fortress with indexed rendering enabled.



Been a long time since I've seen one of those.

Anyway, two choices were available: fixing it or removing indexed rendering. I chose the latter, but I think a bit of explanation is required for why.

  • Some cards have a maximum index value lower than that required for a Quake map.
  • Using indexes means having to decompose world geometry from trifans to lists, in order to get any advantage at all from geometry batching.
  • Any change of model, texture or lightmap during the render will break geometry batching; in the worst case only a single surface is rendered per call, and even then it's worse than not using indexes owing to the fan to list decomposition.
  • Quake geometry meant that the index buffer had to be composed in software per-frame, further removing any advantage from using it.
  • In order to avoid having two world render functions I had rolled the entire thing into one complex function with a lot of condition testing per surf, which introduced its own overhead.
  • The end result was not really much faster than not using indexes, and was actually slower on some cards.
In summary, indexed rendering was far from being all it was cracked up to be. In fact, I've always been of the opinion that unless your geometry naturally falls into really large tristrips that share properties for each entire strip, and the exact same strips are drawn each frame, you're not going to be able to sensibly use index buffers. Drawing teapots on flat surfaces is fine; Quake is not. I had originally included this as an option for the render as it was faster in some cases, but on balance - and taking the BSOD into account - I'm happier to revert to the cleaner and simpler codepath.

1.6.3 Progress Update

Been a while since I posted a progress update list, so here's what you can expect from 1.6.3 so far:

  • Fixed wrong version number in Window title.
  • Amalgamated readme docs for previous versions.
  • Partially implemented fixed path on world and alias models.
  • Removed d3d_Device->SetTransform calls (I feel silly for having left them in so long...)
  • Added improved state management.
  • Removed TimeRefresh command.
  • Improved effect pass switching; should no longer switch to a pass if nothing is drawn in that pass.
  • Improved BeginScene/EndScene calls to conform with specification and documentation.
  • Removed silly translucency check.
  • Modified vis loading so that decompression is done at load time.
  • Restored anti-wallhack code and added cvar to toggle it (sv_antiwallhack, default 1).
  • Fixed bug where D3D lighting was not being disabled.
  • Enforced correct default states in the state manager and in the renderer.
  • Added enhanced smoke trail particles (r_newparticles 1, default 0).
  • Removed framerate dependency from particle system (and anything else using r_frametime for updates).
  • Removed framerate dependency from dynamic light system.
  • Fixed bug where water surfs, particles and sprites were potentially rendered with an invalid depth bias.
  • Added sv_pvsfat cvar to control how fat the fatpvs is; default 8, set to higher values to fix disappearing models in some maps.
  • Fixed bug where an entity that touches MAX_ENT_LEAFS, none of which are in the PVS, but the entity should be visible, is not transmitted.
  • Added improved entity leaf touch detection as a consequence of the above (functionally identical to old way).
  • Removed restriction on number of leafs an entity may be in.
  • Adjusted brushmodel loader so that leafs and nodes are in contiguous memory.
  • Optimised surface index loading to make map loading substantially faster.
  • Substantially optimised other areas of map loading; maps should now load about 10x faster than 1.6.2
  • Optimized memory allocations in a more general manner for faster map load times.
  • Reworked Cvar_Set (cvar_t *, float) and Cvar_Set (char *, float) to prevent value round-tripping through a char *.
  • Added gl_clearcolor command (and _gl_clearcolor cvar, not intended to be used directly) to control background clear colour; default 0 (black).
  • Added r_lightscale cvar to help rebalance really bright or really dark maps (default 1).
  • Fixed hard crash when you try to run a mod that uses rogue/hipnotic/quoth content without having r/h/q in your gamedirs.
  • Added support for DXT1, DXT3 and DXT5 texture compression.
  • Moved hud_overlay, hud_sbaralpha, hud_drawsbar and hud_drawibar to CVAR_ARCHIVE for performance boost.
  • Changed sv_antiwallhack default to 0.
  • Allowed TAB key in demos to display current scores (this is hard-bound, even if you have TAB bound to something else).
  • Restored automap code (toggleautomap command, bind to "m" or something) - not very robust or performant right now.
There are two bugs I want to kill off before releasing, one is a hard crash in CL_NewTranslation when the game changes (I suspect nearly all of this function may be a WinQuake relic - the latter half certainly is - and therefore not needed, but I need to confirm), the other is a fairly bad excessive darkening in the polyblend overlays.

Sunday, June 28, 2009

Fast Engine Is Fast

I've just hit the mythical 120 FPS/timedemo demo1/sucky Intel graphics mark I was talking about a while back, blown right through it, and am now cruising towards 130.

The secret was compressed textures. I'd known for a while that I had a HUGE bottleneck in my world render, but this has totally loosened things up.

The downside is that I now need to rework a lot of my texture loading routines, but oh well, I'll get there. Small price to pay for the added speed.

Friday, June 26, 2009

DirectQ Memory Allocations

DirectQ doesn't use the classic Quake "Hunk", but instead has its own memory subsystem. I wrote this because I wanted to remove the dependency on setting a specific value of -heapsize; in fact the -heapsize command-line option has been completely removed from the engine.

The advantage of this approach is that the engine can load any map or mod irrespective of memory requirements and with no special handling needed. It only allocates as much memory as is needed, so if a map that requires 60MB is followed up by a map that requires 8MB, things don't suffer for the 8MB map.

One disadvantage is that certain parts of the game code may assume that all allocated memory is contiguous. This doesn't apply to items that used to go in the cache, like alias models, but anything that went through the old Hunk_AllocName function might need it. I'm pretty certain that leafs and nodes need to be in contiguous memory, for example, although somehow I've gotten away with it so far.

Another disadvantage is that it's prone to the "lots of small allocations" syndrome. This is a great convenience for development - just allocate the memory you need when you need it without having to worry about anything. It's not so good for production, however; lots of small allocations are SLOW.

So far, as I indicated a few posts earlier, I've batched up some of these allocations, but that's something that I've consciously been doing outside of the memory subsystem. This is a wrong approach - the memory subsystem should be able to cope with this internally and in a totally transparent manner.

Where all of this is leading is that I'm going to do a minor rewrite. The interface will remain the same, but internally I'm thinking that I'm going to take a step both backwards and sideways, and move it to a "multiple mini-Hunks" approach. This will have the advantage of retaining the lack of a hard limit, but also of being able to allocate memory faster owing to having initial allocations of large blocks that can fill up.
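The "multiple mini-Hunks" idea can be sketched as a simple block allocator: grab large blocks up front and hand out small allocations from the current one, starting a new block when it fills. This is an illustrative sketch under assumptions (the class name and 1 MB block size are mine, not DirectQ's), but it shows how the approach keeps the no-hard-limit property while collapsing thousands of small malloc calls into a handful of big ones:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

class MiniHunk {
    static const size_t BLOCKSIZE = 1 << 20; // 1 MB per block (an assumption)
    std::vector<unsigned char *> blocks;
    size_t used = BLOCKSIZE; // forces a fresh block on the first allocation
public:
    // requests larger than BLOCKSIZE would need their own dedicated block;
    // that case is omitted here for brevity
    void *Alloc(size_t size) {
        size = (size + 7) & ~size_t(7); // keep allocations 8-byte aligned
        if (used + size > BLOCKSIZE) {
            blocks.push_back((unsigned char *) malloc(BLOCKSIZE));
            used = 0;
        }
        void *ptr = blocks.back() + used;
        used += size;
        return ptr;
    }
    ~MiniHunk() { for (auto *b : blocks) free(b); }
};
```

A side benefit relevant to the contiguity concern above: consecutive small allocations from the same block *are* contiguous, which is exactly what leaf/node loading wants.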

Disappearing Posts

Apologies if you can't see this, but it seems as though recent posts don't show up here until after I sign on. Not certain what's happening, but I'm keeping an eye on it...

Thursday, June 25, 2009

Map Loading Optimisations

Anyone who's ever loaded a really huge map in DirectQ will know that it's SLOW. That's now fixed and maps will load substantially faster. The speed increase goes up as the map size increases; ID1 maps are noticeably faster but not so much so, maps like Marcher can be about 10 x faster, and the real monsters are even faster again.

There's no tricksy/unreliable voodoo here, just applying some common sense to my memory allocations.

Other news: I've removed the old 1.6.1 download, both from the links to the right and the CodePlex site. When 1.6.3 comes out, I'll likewise be removing 1.6.2 about a week afterwards.

Next Release Status

I've decided that the next release will be 1.6.3 rather than 1.7, and will likely happen sometime next week. There are now enough fixes and optimisations in the current codebase to warrant this.

One thing that will remain unfinished is the particle system. Classic particles are still there, and they rock (I likes me some classic particles), but there will be a partial implementation of "enhanced" particles. It's currently only on smoke trails, but I'd like to also get blood in. If I don't, I'm not going to worry too much or delay the release on account of it.

Wednesday, June 24, 2009

Disappearing Brushmodels

I've had a bug reported where brush models can occasionally disappear. It was originally in APSP1, but I've also observed it myself in the Marcher Fortress. In both cases these are extremely large brushmodels (such as an extremely tall elevator).

Going back to the original 1.0 of DirectQ reveals that the bug also manifests there; I would assume that it's likewise present in other engines, but I haven't fully tested.

The bug is PVS related; sending all entities to the client irrespective of PVS resolves it. Expanding out the fatness of the fatpvs also resolves it, although I don't view this as a solution, as the amount varies per map, per entity, and depending on the distance from the client POV. However, this did get me to add an "sv_pvsfat" cvar to enable control over this factor, which may be a handy feature for mappers.

Further digging reveals that the actual cause is the entity being in too many leafs; if we hit the value of MAX_ENT_LEAFS before we hit any leafs that are in the current PVS, we'll see the symptoms. This is one internal maximum that I hadn't yet addressed in DirectQ, so it's now becoming a priority. The default of 16 is normally fine for most models, but when one goes beyond basic ID1 functionality in a map, aberrant circumstances such as this occur. Nonetheless, it's good to push boundaries sometimes, and the end result will be a more capable DirectQ.

Simply bumping the value of MAX_ENT_LEAFS is one method of handling this, but I think I'm going to go for something a little more flexible (and less wasteful of RAM).
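The flexible approach can be sketched as replacing the fixed per-entity leaf array with a growable list. This is an illustrative sketch, not DirectQ's actual code (the struct and field names are mine); the point is that once there is no cap, the visibility check gets to consider every leaf the entity touches:

```cpp
#include <vector>

struct EntityLeafs {
    std::vector<int> leafnums; // every leaf this entity touches, no cap
    void Clear() { leafnums.clear(); }
    void Touch(int leafnum) { leafnums.push_back(leafnum); }
};

// With no cap, a huge brushmodel whose first 16 touched leafs happen to be
// outside the PVS is no longer dropped from the client's entity list;
// the pvs parameter is the usual decompressed bit-per-leaf array.
bool InPVS(const EntityLeafs &ent, const unsigned char *pvs) {
    for (int leaf : ent.leafnums)
        if (pvs[leaf >> 3] & (1 << (leaf & 7)))
            return true;
    return false;
}
```

This also saves RAM on the common case: most entities touch only a handful of leafs, so they no longer carry a 16-entry array around regardless.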


It's fixed.

As a side effect I was able to remove the MAX_ENT_LEAFS hard limit entirely, and make the entity-leaf-touch routine a lot more optimised.

I love bugs like this that result in good stuff all round.

Tuesday, June 23, 2009

The Sized-Down Refresh Window

To keep with the look and feel of classic Quake, DirectQ fully supports the original status bar (as default, although in a much rewritten state to allow for greater flexibility) and the ability to reduce or increase the size of the 3D refresh window. This was a performance gain in the software days: sizing down the 3D refresh window meant fewer pixels on the screen to update per frame, meaning faster framerates. However, hardware acceleration changes all of that.



In order to use a sized down 3D refresh window, like in the screenshot above, it is necessary to construct a 3D viewport around the dimensions of the 3D refresh window size. This is smaller than the full framebuffer, and as a result it is slow. Not horrendously slow, but if we change our HUD layout slightly we'll get the picture.



Here I've gone into the HUD menu and set "Draw As Overlay" on (or hud_overlay 1) and the alpha of the status and inventory bar components to about 0.6, thus mimicking the DarkPlaces HUD. Now the 3D refresh window is the full size of the framebuffer, which is also the size of the 2D refresh window.

The end result is that things run quite significantly faster. Despite a larger 3D refresh area, the driver is able to optimize better, and we get something in the order of a 5 percent speedup - even by comparison to the default viewsize 100. 5 percent might not seem like much when you're running at high framerates, but when running at low framerates it can be the difference between a choppy game and a reasonably smooth game.

With all that in mind, DirectQ will still be retaining the classic look as the default, but don't be surprised if you see some of the HUD options moving into your config.cfg in the next release.

Monday, June 22, 2009

Framerate Dependencies

Out-of-the-box Quake has a number of things in it that are dependent on the framerate; perhaps most notoriously the physics code. Less well known however are effects such as particles, and anything else that you may have added that's calculated from the same frametime.

The next release will fix some of these. I'm in two minds about fixing physics; on the one hand it seems obvious and logical (why should someone with a high or low FPS have an unfair advantage?), on the other hand it is a gameplay change. Perhaps something for a server-side cvar?
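The fix for effects like particles is the usual one: scale each update by the real elapsed frame time instead of applying a fixed step per frame. A minimal sketch (the names are illustrative) showing both forms:

```cpp
struct Particle {
    float pos = 0.0f;
    float vel = 100.0f; // units per second
};

// framerate-dependent: the same distance is added every frame, so a machine
// running at twice the framerate moves the particle twice as far per second
void UpdateBad(Particle &p) { p.pos += p.vel * 0.01f; }

// framerate-independent: distance depends only on elapsed real time,
// so machines at any framerate agree on where the particle is
void UpdateGood(Particle &p, float frametime) { p.pos += p.vel * frametime; }
```

Run the good version for one second of game time at 10 FPS and at 100 FPS and the particle ends up in the same place either way; the bad version moves it ten times as far in the second case.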

Speaking of the next release, I'm not too certain yet when it will happen. My original plan was to get 1.6 out, then go back and finish off some things on MHColour. However, DirectQ is where my itch is right now, and I'd much rather focus on something I want to do (I leave the "need to do" stuff for the day job).

To summarise the current position: yes, coding on the next release is underway; yes, a good block of changes have already been made; yes, it could be released sooner rather than later as a 1.6.3 (rather than a 1.7).

However - and it is a big however - I think 1.6.2 got me where I wanted to be with the previous block of changes. It's stable, and can stand as-is as the final 1.6 release; no further update is required there. The frequent release policy worked out well, but now I'm back to thinking that I want something more substantial next time around. Something like the long promised but never delivered new particle system.

Internally I'm calling my current codebase "1.6.3", but I'm holding fire on any final versioning or release dates until I get a better idea myself of where things are headed.

Sunday, June 21, 2009

An OpenGL DirectQ?

This seems to be a time when crazy ideas come out, but it's just occurred to me that - given the degree to which I have now abstracted out a lot of the rendering in DirectQ - porting it back to OpenGL should not really be that difficult. I'm not going to do it, of course, but if anyone really wanted to, all of the rendering code now lives in the Effect class in d3d_hlsl.cpp; aside from that it's really just state changes, setup and texture loading elsewhere.

What would this give you? A port back to OpenGL wouldn't give you multiple platform support; there is just too much Windows API in DirectQ, well beyond what was in the original Quake. The sound code is exclusively DirectSound, the XInput and DirectInput code would need to go, and there have been some changes to the network code (to make it more happy with the new cvar system, primarily), meaning that you just can't drop in the original code files and expect it to work. The issue of porting to C++ would need to be addressed also; I don't think there was a single code file I didn't touch when doing this; if nothing else, switching from qboolean to bool would disrupt a lot.

But it could be an interesting thing for someone to do, and if anyone wants to take my code and port it back, or use it as the basis for a D3D implementation of their own Q1 engine, they have my blessing.

Meanwhile, I've come across a very good paper discussing differences between the two APIs in a pragmatic real-world setting; it's more focussed on CAD applications than on games, but a lot of it connects perfectly with my original reasons for porting to D3D. This extract sums the whole situation up very nicely.

When we use OpenGL, we have found over the past many years (and still today) that we need to invest in a large, significant amount of QA that simply verifies that the OpenGL graphics driver supports the OpenGL API on the level that we use (which is actually rather dated, to be consistent with OpenGL GDI Generic, from circa 1997). In spite of the fact that we do not use any new fancy OpenGL extensions and use OpenGL almost on the level of 1997 graphics HW technology, we routinely encounter OpenGL graphics drivers that do not work correctly and as a result, we have our extensive OpenGL graphics HW certification process which involves a serious amount of testing effort on our part that does not actually test Inventor, it merely tests the OpenGL graphics driver. In fact, we currently have 44 (and counting) OpenGL "Workarounds" that we can invoke in our OpenGL graphics layer to "workaround" various OpenGL graphics driver problems.

The opposite situation exists for Direct3D. If a Direct3D graphics driver is Microsoft WHQL (Windows Hardware Quality Lab) certified, then it works correctly as far as Direct3D itself is concerned. This is the purpose of the WHQL testing and certification suite at Microsoft, to enforce compliance with the Direct3D specification. No such process exists in the graphics community for OpenGL and the quality of OpenGL graphics drivers suffers greatly as a result. With Direct3D, our QA team can focus on testing _our_ code and finding defects in _our_ graphics code, instead of having to spend all their time just verifying that the graphics HW vendors have done their job correctly to produce an OpenGL graphics driver that actually works.
I'm not so sure about the "no such process exists in the graphics community for OpenGL" remark: there is conformance testing, after all; but I have personally felt for some time that OpenGL conformance testing is not as tight as it should be, perhaps to avoid antagonising a certain major GPU vendor who is known to produce poor OpenGL implementations? Whatever, I'm going to let it stand without further argument or comment.

You can grab the full thing here.

Thursday, June 18, 2009

Implementing a fixed pipeline path

Yesterday I started on implementing the fixed pipeline path in DirectQ. The good news is that - so far as I can gather - Direct3D is very forgiving in terms of attempting to call programmable pipeline functions where it's not supported, and that the behaviour is to silently fail rather than kick and scream. This means that I can reuse a lot of the codebase without modification, in particular the major part I was concerned about, which is the Effect loader and compiler.

However, it's looking as though I may need to restructure some of the code in order to get it done without creating a mess along the way. So the next few releases will likely contain successive evolutions of the fixed pipeline code, but it won't be operational, and will in fact be hard-disabled.

One aspect I'm not too happy about here is that I had expected some significant speed gains running through the fixed path, especially with respect to the Intel chip I'm testing on. However, while there was a gain, it was quite small (a coupla frames) and may be down to transient conditions arising elsewhere. On the bright side however it means that my previous intention of providing an option to switch to fixed if you wanted to (say if you have a card that supports pixel shaders but runs them poorly) need not be followed through on. This makes things a lot easier for me.

So far I have most of the world and alias models implemented, but I need to go back to the 1.0 code and see how I'd set up the texture blends there. The fixed pipeline in Direct3D is a very complex beast, with a lot of options available, and even the simple ones requiring considerable setup. It is however much more flexible than OpenGL's.

The vertex handling will remain in the programmable pipeline; it's just so much simpler than using fixed, and Direct3D will handle it very well in software emulation, with approximately equal performance (it may even be faster than fixed).

Wednesday, June 17, 2009

Software T&L Support

DirectQ supports cards that don't have hardware T&L, and runs quite well on them, but it does go through Direct3D's Vertex Buffer interface for them. I'm starting to implement a pure software alternative for such situations, so that we can get rendering without the overhead on these cards. Part of the eventual goal here will be to also provide a non-HLSL option for situations where a card may not have the required Pixel Shaders support (any card - even a TNT2 - will run Vertex Shaders very efficiently in Direct3D).

So far I've implemented removal of the VBO interface on the world, and it gets a few extra FPS (about 1-2, to be honest). The downside is that I now have 4 rendering paths on the world (VBO with indexes, VBO without indexes, SW with indexes and SW without indexes), but I do believe it's worth it.

This is something I've been conscious of ever since I first started moving everything to HLSL - it's all well and good, but there are cases where the solution isn't optimal. The culmination of this current effort will be a DirectQ that will run on just about anything, which will be a nice return to the state of things as they were in 1.0 and 1.1, but with the added muscle of HLSL for cards that do support it.

The real bonus here is that the rendering interface I wrote for 1.6.1 will easily and cleanly support this, and without too many code changes. All I need is to set up the correct texture stage states and disable the pixel shader. There might be a bit of grief in loading and compiling the effect files though; I'm not certain if an effect file that contains a pixel shader can be compiled on a machine that doesn't support them.

The only cloud on the horizon of this is that I currently don't have a machine that doesn't have the required PS support (i.e. none), so I can't test it too well.

Tuesday, June 16, 2009

Release 1.6.2 Is Out

The steam train continues with release 1.6.2; this update fixes some bugs and adds some new functionality (some of which is - admittedly - at the experimental stage, but you can't move forward by walking in reverse, can you?)

Get it from the right!



Full list of changes:
  • Fixed r_64bitlightmaps 0 on a device that supports 64 bit textures gives 4 x brightness.
  • Fixed crash in timedemo demo3 (also potential crash in a number of other scenarios).
  • Added sRGB Color Space option (default off) (r_sRGBgamma cvar).
  • Removed case sensitivity from cvar tab autocompletion.
  • Restored alias models to correct place in draw order.
  • Added sRGB support to linear colour operations.
  • Added skyalpha support (r_skyalpha).
  • Removed loading disc code.
  • Switched world surface rendering to front-to-back.
  • Resolved various bugs with render states being lost following a device reset.
  • Resolved MAJOR cause of slowdown on cards that support hardware T&L.
  • Fixed hipnotic particle field was missing.
  • Reverted alias model interpolation back to the vertex shader.
  • Removed state blocks.
  • Added index buffer for world surfs (r_useindexbuffer, default 0), may be faster on some cards.
  • Tightened up on video startup by adding a check for fullscreen mode support and texture creation to each format type.

Of Indexes and Index Buffers

If you were to read the MS documentation, you would be forgiven for thinking that there is a single One True Way to write Direct3D code. This involves using Vertex Buffers and Index Buffers for everything, and submitting huge batches of (perfectly depth-sorted) primitives per call.

All well and good, but here in the Real World things don't always work out like that, and sometimes one has to make compromises, or forego what is supposed to be a performance optimising technique. It's all about give and take: for example, we already know from years of GLQuake work that sorting surfaces by texture is more important than sorting them by depth, as texture changes are more expensive than overdraw (at least with a normally vis'ed map).

Likewise with Index Buffers. The first lesson you learn is that Quake surfaces fall naturally into triangle fans, so in order to submit more than one surface per call you need to decompose them into individual triangles. Then you learn that for each state change you need to draw all surfaces that have been accumulated so far. Around about here you also learn that you can never achieve optimal vertex reuse as even in cases where the vertexes are the same, the texcoords may be different, the textures may be different, the lightmaps may be different, and they may belong to different models; and that the cost of setting it up at load time is prohibitive. Finally you learn that you need to fill it from scratch each frame, so the potential gains from storing it in hardware are effectively lost.
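The fan-to-list decomposition mentioned above is mechanical: a fan of N vertexes becomes N-2 triangles, so 3*(N-2) indexes. A sketch of the per-surface step (the function name and int index type are illustrative; this is the work that has to be redone in software every frame):

```cpp
#include <vector>

// Decompose one surface's triangle fan into an indexed triangle list so it
// can be batched with other surfaces in a single draw call.  firstvertex is
// the surface's offset into the shared vertex buffer.
void FanToTriList(int firstvertex, int numvertexes, std::vector<int> &indexes) {
    for (int i = 2; i < numvertexes; i++) {
        indexes.push_back(firstvertex);          // fan centre vertex
        indexes.push_back(firstvertex + i - 1);
        indexes.push_back(firstvertex + i);
    }
}
```

Note the cost: the fan form needs only N vertexes per surface, while the decomposed list needs 3*(N-2) indexes on top of them, and the centre vertex is referenced N-2 times - which is why the win depends entirely on how much batching actually survives the state changes.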

DirectQ now supports the optional use of an Index Buffer for rendering world surfaces. I say optional because it's actually slower in my tests so far than rendering direct from the Vertex Buffer, so therefore it's disabled by default. You can enable it, and it might run faster on your card, but then again it might not. View it as a feature that's there and available for now, but that might be removed in any future release.

For what it's worth, the current rendering bottlenecks in DirectQ lie in the world render (in particular the main texture changes) and in some state changes between render paths. While I could implement optimisations for other items (such as alias models), the current cost of drawing them is so small (in the case of alias models it's about 5 FPS) that it's not worth the effort.

Random Direct3D Notes - Part of an ongoing series

Just making these notes to consolidate a lot of the things I wish I knew, or I wish were documented in a more appropriate and/or accessible place, and which I got burned by during the course of developing this engine.

D3DCREATE_FPUPRESERVE is needed for everything aside from kiddies games that don't depend on high resolution timers.

32 bit index buffers are not available on all hardware, and even if available don't always have the full 32 bits of resolution available (try finding that in the documentation... it's there all right, but just try finding it...)

Some cards run faster with anisotropic filtering enabled; try around 4 x and adjust up or down from there.

Don't use the Direct3D gamma interfaces: use GDI instead; they're well known, documented better, and work in both windowed and fullscreen modes.

There's virtually no performance difference between DrawPrimitive and DrawPrimitiveUP when drawing strips or fans.

Using a viewport smaller than the full resolution can hurt performance.

Direct3D supports the full vertex shader interface in software very efficiently.

There must be a good reason somewhere, somehow for supporting every common (and a few uncommon) image format aside from TGA.

Sunday, June 14, 2009

The Need for Speed - Part 2

Version 1.6.2 will run up to twice as fast on a card that supports hardware T&L. If you have a card that doesn't (mostly Intel laptop cards from a coupla generations back, these days) you'll still get a small few extra frames but nothing special.

Further updates: the NVIDIA "black screen" bug seems to have been fixed; I'm not sure because (1) I never got it on my NVIDIA machine, and (2) even with the "fix" in I still don't see any difference. I've replicated the steps to reproduce it exactly, but nothing happened. The sole remaining difference is that it happened on Windows 7.

Another bug fix: I had moved a lot of render states to being dynamically set with a check per frame, so on a game or mode change, the states ended up getting lost. This included texture filtering and a few other things. That's all fixed now.

sRGB Support is Finished

And quite beautiful it is too. I've ended up doubling the colour intensity again if sRGB is enabled as it tends to darken things quite a bit, but as I'm using shaders for everything that's easy enough to do.

The only thing that might cause trouble is that Direct3D specifies that sRGB scaling is not applied to the alpha channel, but I've found there are times that it's needed. I've written a transformation lookup for linear colours so right now I just pass alpha through that where required.

I know that there are bugs I have to fix, but it's also nice to have something else new and improved in each release, and this is genuinely worthwhile, rather than being gratuitous eye-candy for the sake of it.

Saturday, June 13, 2009

Updates for 1.6.2

I'm going to continue with the policy of releasing more often, so here's what's happening with 1.6.2 (hopefully to be released next week) so far:

  • Fixed r_64bitlightmaps 0 giving 4x brightness on a device that supports 64 bit textures.
  • Fixed crash in timedemo demo3 (also potential crash in a number of other scenarios).
  • Added sRGB Color Space option (default off).
  • Removed case sensitivity from cvar tab autocompletion.
  • Restored alias models to correct place in draw order.
The sRGB colour space is particularly nice to have; the old 8 bit textures really come alive using it. Of course, it's a change from the original Quake look so it defaults to off.



It looks a little darker here but that's just down to the use of hardware gamma in-game. One thing that still needs to be finished off is cases where a texture is modulated by a colour: the colour is retained in linear space, so the result turns out really bright.

Friday, June 12, 2009

1.6.1 Bugs

Quite a few bugs are arising in 1.6.1, more so in my own testing than in anything being reported to me.

To summarise: I had started working on 1.7, and crash bugs were beginning to come in more frequently. There always comes a point where you need to consider whether you should continue down the path you're on or pull back and reconsider. It turns out that the cause was the std::vector class, which I had been using to manage a few items in the game. In a lot of cases this is fine, but where a pointer to an element is held elsewhere, a subsequent allocation may reallocate the vector's storage and leave that pointer invalid. My own fault really for not reading the documentation fully.
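The failure mode is easy to reproduce. Here's a minimal sketch of the bug class (and the pre-reserving workaround); this is illustrative, not DirectQ's actual code:

```cpp
#include <cstddef>
#include <vector>

struct Entity { int frame; };

// Returns true if growing the vector moved the first element - i.e. a
// pointer taken before the growth would now be dangling.
bool PointerInvalidated(std::size_t reserveCount)
{
    std::vector<Entity> ents;
    ents.reserve(reserveCount);   // pre-size the storage
    ents.push_back({0});

    Entity *first = &ents[0];     // the pointer "referenced elsewhere"

    // Grow the vector; once capacity is exceeded it reallocates its
    // storage and every outstanding pointer into it goes stale.
    for (int i = 1; i < 100; i++)
        ents.push_back({i});

    return first != &ents[0];
}
```

With a reserve of 100 or more no reallocation ever happens and the pointer stays valid; the safer general fix is to store indices rather than pointers and look elements up through the vector each time.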

So release 1.6.2 will come sooner rather than later.

With hindsight one of the causes of this was the long delay between 1.5 and 1.6, which coupled with the major backend changes in 1.6 caused a situation where a lot of new code went into the wild all at once. A better approach would have been to adopt a more incremental release pattern, with - say - 1.5.1, 1.5.2, 1.5.3 and so on for each batch of changes. This would have made it easier to track down bugs as well, as there would have been a smaller list of diffs between each version.

Lesson learned.

Another bug reported is a black screen when going direct from 1.5 to 1.6, which seems to be rectified by issuing a vid_restart command. Unfortunately in a lot of cases I'm not able to reproduce reported bugs, as they don't happen on the machines I test on. In this case, going direct from 1.5 to 1.6 on one of my test machines caused no problems at all.

Overall, and in spite of that, I remain pleased with the 1.6 codebase.

Thursday, June 11, 2009

Release 1.6.1 Is OUT!

Release 1.6.1 with the promised Draw_TileClear fix has just been released.

I've moved the entire project from SourceForge to CodePlex. It's been no secret that SF has been frustrating me immensely over the past while, and it's now at the stage where I feel it's just too unreliable and arcane for my liking. I'll be leaving the old SF page up, but all new releases will be on http://directq.codeplex.com/.

I've also posted copies of most of the previous releases to the CodePlex page, and I'll be updating the links list to point to these. The only ones missing are 1.1 Emergency Refresh 1 (it was superseded almost immediately by an Emergency Refresh 2, which I've named 1.1.2 here) and 1.6 (as 1.6.1 supersedes it).

Draw_TileClear problem is fixed

Release 1.6.1 will be available as soon as somebody puts SF's rattle back in its pram.

I might make a few more minor changes while I'm waiting on that. Once it is released I'll be removing the 1.6 link.

Despite saying that I'll make more changes, there will be no need to download 1.6.1 if you already have 1.6 and the Draw_TileClear problem doesn't bother you.

Release 1.6 is Ready

It's compiled, it's packaged, it's ready to roll.

Unfortunately, SourceForge's file upload interface is currently suffering from a terminal case of being utterly crap again. Basically it's broken broken broken, so I can't release it there just yet.

While I'm waiting for them to de-crapify things, I've just stuck it onto RapidShare here: http://rapidshare.com/files/243413048/DirectQ_1.6_2009-06-11.zip

LATE BREAKING NEWS

I've just become aware that I released this with a semi-non-functional Draw_TileClear function. This is the function that's responsible for filling the area around the status bar, as in the shot below:



This was unintentional, but I'm going to leave it be for the RapidShare release. When SF gets its act together I'll release a fixed version then.

Wednesday, June 10, 2009

DirectQ Update

A feature I had been holding back for release 1.7 just made it into 1.6: a fairly heavily restructured rendering backend with more states moved over to HLSL. Texture objects are entirely HLSL now, so you're no longer limited by the number of fixed pipeline TMUs you might have. Certain texture states (clamp or wrap) are also HLSL managed.

So 1.6 has been delayed even further, but I seriously did want to get this in, as some of my code (instanced brush and liquid rendering, primarily) had become incredibly messy. It's now nice and clean, so that's a result.

So as for the release, I'm thinking that I'm getting to the stage where I'm just going to release what I have shortly. There's a lot more work I have lined up, and if I continue on into something else, it'll never be released. It's been long enough since 1.5, and all of the recent issues have now been resolved.

I won't give a more definite timeframe (I've been caught out by that before), but it will be soon.

Tuesday, June 9, 2009

The Need for Speed

Brief summary: DirectQ has been developed on Intel hardware, primarily because it's a good lowest common denominator so far as performance benchmarking is concerned. If it runs well on Intel, it will likely run well on anything; seems to make sense.

The benchmark of choice is the "classic" timedemo demo1. Why? It's the first thing you see when you run Quake, and it's a good real world performance test; not too much of everything, but just enough to push the system a little.

Now, I'm currently getting about 106 FPS on it. Pretty good all things considered, but the problem is that I know the engine is capable of hitting 120 FPS without too much modification and with no visual degradation.

Why push the speed more? Simple reason is that extra speed in normal operation will translate into headroom for sexy effects for those who want them.

So the real problem is (and once again we're back to my old favourite complaint) that while I could set things up to hit closer to that 120 FPS figure, and while I know exactly what needs to be done to get there, Microsoft's documentation is sadly lacking in providing the single last crucial bit of info I need in order to do it. And so we enter the trial and error cycle, and run the risk of making a mess in the process.

One of Microsoft's developers (something old, something new, etc) likes to harp on a bit about people doing the wrong thing in Windows programming. However - and this isn't just D3D, it's endemic across their documentation - doing the right thing involves pulling together info from multiple disparate sources, some of which have no apparent logical connections between them, then standing back and taking in the bigger picture. You have to do all of this yourself, including making guesses at what those sources might be, and heaven help you if you miss one.

Would it really hurt to provide that one last extra sentence sometimes?

Monday, June 8, 2009

Custom Name Generator



This took about half an hour to put together.

As you'll see if you look at the full sized image, I'm having linear interpolation issues with my textbox graphics. A coupla solutions are bubbling under in my mind: the obvious one seems to be to use edge clamping, but in D3D that's a sampler state set per texture stage rather than a per-texture parameter. In any event it would probably not work with the blend into the adjoining textbox body texture.

The next solution is to carry out some jiggery pokery with the vertexes and texcoords if an alpha edge is detected on a non-mipmapped texture.

The final solution - which I did in the old GL engine - is to revert back to using scrap textures and expand the edges in the scrap block, but I'm concerned this may not work too well with external textures. An alternative derived from that is to not use a scrap but still expand the edges, but then we're getting into non-power-of-two territory. However, I feel that one will work if we increase the size to which we resample, up to a certain threshold.
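The texcoord version of that jiggery pokery can be as simple as insetting the coordinates by half a texel, so bilinear filtering never reads past the edge of the image. A sketch (hypothetical helper, not the engine's actual code):

```cpp
// Inset a full-image texcoord rect by half a texel on each side so that
// bilinear filtering can't sample the border texels' outside neighbours.
struct TexRect { float s0, t0, s1, t1; };

TexRect InsetHalfTexel(int texWidth, int texHeight)
{
    float hs = 0.5f / (float) texWidth;
    float ht = 0.5f / (float) texHeight;
    return { hs, ht, 1.0f - hs, 1.0f - ht };
}
```

This only works cleanly for non-mipmapped textures drawn 1:1 or larger, which is exactly the 2D interface case above.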

This is another thing to be fixed pre-release, as it's a fairly ugly blotch on what is otherwise a very smooth and clean interface.

Sunday, June 7, 2009

Problems Resolved

OK, two problems appear to be resolved.

Firstly, I did yet another clean build of XP and this time my crash bugs went away on it. I'm assuming now that the problem was caused by something in the DirectQ build I was testing on, but I'm going to dig a little more. If it turns out to be the case that everything is now resolved just fine, I'm going to stick to statically linking the CRT - as I said, the advantage of not requiring the user to do a CRT update outweighs everything else.

Secondly, the NVIDIA water problem is resolved; or at least the cause of it. I had been using partial precision floats in my shaders, which were fine on other hardware, but it transpires that whereas other manufacturers use a 24 bit float at partial precision, NVIDIA use a 16 bit float.

I have yet to decide what the best approach here is, but I'm thinking that what I might do is (1) cvar-ize the ability to switch between full and partial precision, (2) detect NVIDIA hardware at startup, and (3) default to full precision on NVIDIA hardware and partial precision on others. This means that NVIDIA is going to run quite a bit slower than anything else, so a possible (4) is to have an option for applying full precision to warp surfaces only.
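To see why 16 bits hurts, here's a quick round trip through half precision. This is a deliberately simplified conversion (truncation, normalised in-range values only) just to show the effect, not how the hardware rounds:

```cpp
#include <cstdint>
#include <cstring>

// Truncate a 32-bit float to a 16-bit half and expand it back. Only
// normalised values within half range are handled - enough for texcoords.
float RoundTripHalf(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));

    std::uint32_t sign = (bits >> 16) & 0x8000;
    std::int32_t exponent = (std::int32_t) ((bits >> 23) & 0xFF) - 127 + 15;
    std::uint32_t mantissa = bits & 0x007FFFFF;

    // Pack: 1 sign bit, 5 exponent bits, 10 mantissa bits.
    std::uint16_t half = (std::uint16_t)
        (sign | ((std::uint32_t) exponent << 10) | (mantissa >> 13));

    // Expand back to a 32-bit float.
    std::uint32_t out = ((std::uint32_t) (half & 0x8000) << 16) |
                        ((std::uint32_t) (((half >> 10) & 0x1F) - 15 + 127) << 23) |
                        ((std::uint32_t) (half & 0x03FF) << 13);

    float result;
    std::memcpy(&result, &out, sizeof(result));
    return result;
}
```

A half has only 11 significand bits, so a texcoord around 100 quantises in steps of 1/16 - exactly the kind of error that shows up as pixellated, jerky warp surfaces, whereas a 24 bit float keeps plenty of precision at that magnitude.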

Dynamic vs Static Linking to the C Runtime

Historically I've always statically linked to the C runtime, the intention being that I didn't want to force users of the engine to have to download an updated runtime in order to run it. However, recent events (and thanks to Spirit for the suggestion) have uncovered an interesting fact: the engine will always attempt to dynamically link anyway, even if I've told it to statically link.

Anyway, with that in mind I'm starting to wonder if I should revise my preference here. So far as I can see, the sole advantage of statically linking (and it was a big enough one to outweigh the advantages of dynamically linking) is gone, so I'm thinking of switching over. Advantages of dynamically linking seem to me to currently include:

  • Being able to use updated versions of the runtime, which may contain important bugfixes.
  • Being able to mix and match managed and native code in a single executable.
  • Reducing the executable size.
  • Better compatibility between the core engine code and any third-party code that dynamically links to the runtime.
So overall, this seems like being something I might do sooner rather than later.

New Links Sections

I've added two new "Links" sections on the right-hand side. The first one is a list of required stuff you'll need before you can run DirectQ successfully, while the second is a list of optional extras. These will be added to as time goes by, and as I discover things.

I'm going to write a bit about the "Required Stuff" links here, as some of them need some explanation. If you run DirectQ and it crashes, this is the first place you should look. Now read on...

DirectX Update
Even if you have DirectX 10 available, you will need updates to DirectX 9 in order to run DirectQ. Microsoft have shipped an older DirectX 9 runtime with both Vista and Windows 7, and as a result there are certain DLLs missing from it. I could have linked to an older version of DirectX 9, which would have avoided this, but I wanted to use the extra features in the shader compiler from later versions, so that's just the way it is; sorry. In any event, you will probably need this for other DirectX 9 games too.

Quake
I've linked to id's "Quake" page here, but that just provides the old DOS shareware Quake, which is awkward to install on a modern OS. As soon as I find a link for a more reasonable updated version, I'll change this link.

Visual C++ 2008 Runtimes
In theory you shouldn't need this, as I statically link to the runtimes, but in practice it appears to be required. DirectQ might run successfully without it, and in fact you might be able to go quite a long way without it, but at some stage you're going to crash.

NVIDIA Users

OK, I've established that the DirectQ liquid shader looks ugly on NVIDIA cards under XP too, so it's not just a Windows 7 driver bug causing this.

If you have an NVIDIA card please accept my apologies; it shouldn't look pixellated, grainy and jerky. Here's a shot of it on Intel for comparison:



This is something I'm going to try to fix before releasing.

Tuesday, June 2, 2009

Crash Bug Disaster City!

As I feared, as soon as I installed my dev tools the bugs went away. Sigh.

It's obvious that some system files got updated during this process, so it's not really gonna be possible to test this properly (don't fancy doing another XP build either).

Got Crash Bugs If You Want Them

Just blew away my Windows 7 install and reverted it back to XP SP3. First thing I did was install Quake and test out DirectQ, and boy do I have some pretty bad crash bugs.

Now, hopefully it's not that serious. This was a fresh Quake install without my normal small army of extra content and so on, so I'm reckoning that some of the new code (I'm suspecting the menu population code for maps, demos, etc) is misbehaving a little.

Anyway, I'm bringing Visual C++ 2008 Express and the DirectX SDK onto the new XP build to try some debugging. Hopefully it won't be one of those weird cases where the bugs go away once the devtools are installed (bleagh).

Bottom line though is that it's going to end up being an even more stable and solid engine as a result of all this, so it's good news in the end, but the matter of releases is currently on hold.

Monday, June 1, 2009

Release 1.6 is Imminent

Unless I get any totally unforeseen major bugs, 1.6 will be released this week.

I'm already looking ahead to 1.7; a few things have come up, related to code structure and the like, that are becoming necessary but don't affect the general usability of the engine. Including them in 1.6 would unnecessarily delay the release, but they're needed in order to move forward, so they're going into 1.7.

I might also include the alternate particle system, but that almost happened for 1.6 and ended up being removed, so I'm not making any promises.