Thursday, April 29, 2010

Release 1.8.3c is Out

  • Fixed bug where mouse cursor was visible when entering a map and not using DirectInput.
  • Fixed bug where you would be pointing at the sky and wildly spinning when not using DirectInput.
  • Fixed bug where key up events were not being sent for mouse buttons when not using DirectInput.
  • Added vertex cache optimization for MDL files.
  • Fixed lagged keyboard when DirectQ is not the active application.
  • Fixed bug where MP3 music may occasionally loop after 1 or 2 seconds instead of playing the full track.
  • Added capability to stream media container files from the internet.
  • Fixed bug where a TGA texture may occasionally fail to load correctly (e.g. top skybox in 5 Rivers).
http://directq.codeplex.com/releases/view/44519

I haven't included a "Windows 98 build" this time, but I may add one at a later date if there is sufficient demand.

Some of the 1.8.4 work is included in this, but only a very very small part. I like giving these advance previews of what's to come, but I can't go too far with it as some of the underlying systems have changed quite radically and the code I've included here is already starting to diverge from the main 1.8.4 codebase.

Two Stupid Bugs

It's been a long time since I've looked at the input code, what with other things taking my attention. I had however been aware that on my Windows 2000 test VM I had some issues with the non-DirectInput code, but I was assuming that it was the fault of the VM mouse driver as last time I had run through it things worked fine.

Recently however I've started getting reports from Actual Real Other People Out There that the same things were happening to them. Firstly you point at the sky and spin in circles, and secondly mouse buttons don't send key-up events. This is only if DirectInput fails to start or if you have it disabled (using m_directinput 0).

So I switched off DirectInput on my main Windows 7 machine. Bang. Fully reproducible.

OK, it's my fault, I screwed up the input code and you weren't going mad.

The spinning in circles bug was easily fixed - I had a pair of minus signs where I should have had plus signs.

I haven't yet figured out why key-up events don't get through for the mouse buttons, but I'm working on it. (Update: got it. They weren't getting through because I wasn't sending them through. Of course. Now, we start off at a weird angle and the cursor is visible when you enter a map. Time to fix those too...)

Once I have this fixed I'm going to release an update (1.8.3c) containing these fixes as well as a few others that are needed, but not badly enough to warrant a patch on their own. I may also roll some of the 1.8.4 code into it; I think either of the non-HLSL warp code or the MDL vertex optimization would be a nice present (so long as backporting doesn't create a mess elsewhere).

Unfortunately I haven't yet been able to track down why DirectInput may not be starting up on some machines; this one is not reproducible (so far). The best advice I can give is the usual - you may have a downlevel version of the DirectInput DLLs (especially if you have Vista or Windows 7) and you should update your DirectX.

Other updates for today

FPS of scenes with high epoly counts have gone up by 5% to 10% on most hardware (i.e. anything from the last 10 years). This is due to reordering of model indexes for better vertex cache efficiency. I had assumed that my previous experiments had indicated that the ID MDL layout was already optimal, but I had just been using the D3D optimization functions wrong. Of course if the documentation was of any use in the first place it would have helped...
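To give a feel for why index order matters, here is a minimal sketch that counts misses against a simulated FIFO vertex cache. It is not the D3D optimization routine itself; the cache size of 16 and the name count_cache_misses are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 16 /* assumed FIFO cache size; real hardware varies */

/* Count how many index lookups miss a simulated FIFO vertex cache.
   Fewer misses means fewer vertexes get transformed more than once. */
size_t count_cache_misses(const unsigned short *indexes, size_t num_indexes)
{
    unsigned short cache[CACHE_SIZE];
    size_t used = 0, next = 0, misses = 0;

    assert(indexes != NULL || num_indexes == 0);

    for (size_t i = 0; i < num_indexes; i++) {
        int hit = 0;

        for (size_t j = 0; j < used; j++) {
            if (cache[j] == indexes[i]) { hit = 1; break; }
        }

        if (!hit) {
            misses++;
            cache[next] = indexes[i];        /* overwrite the oldest entry */
            next = (next + 1) % CACHE_SIZE;
            if (used < CACHE_SIZE) used++;
        }
    }

    return misses;
}
```

Running the same model through this before and after reordering its indexes gives a rough measure of how much redundant vertex work the reordering removes.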

Max unique sounds has gone to unlimited. This threw out a few more bugs in the sound code that I'll need to work through.

All of the different types of memory storage in the game have now been implemented as reusable objects. This makes my coding job for anything that needs memory (i.e. everything) a lot easier, meaning that you get results faster.

I've implemented a custom TGA loader. In theory D3D can load a TGA file itself, but in practice it sometimes fails. If you've ever run 5 Rivers you'll know what I mean. The trick is that if a TGA fails via D3D I run it through my own loader, output another TGA in memory (just an RGBA - well, BGRA really - array with a TGA header in front of it), then send that through D3D. Works every time.
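The "RGBA array with a TGA header in front of it" step can be sketched roughly as below. This assumes the decoded pixels are already top-down BGRA, and the function name wrap_bgra_as_tga is hypothetical; the 18-byte header layout is the standard uncompressed true-colour TGA one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TGA_HEADER_SIZE 18

/* Wrap a top-down BGRA pixel array in an uncompressed 32-bit TGA image.
   Returns a malloc'ed buffer (caller frees) or NULL on failure. */
unsigned char *wrap_bgra_as_tga(const unsigned char *bgra, int width, int height, size_t *out_size)
{
    assert(bgra != NULL && out_size != NULL);

    /* TGA stores dimensions as 16-bit values */
    if (width <= 0 || height <= 0 || width > 65535 || height > 65535) return NULL;

    size_t pixel_bytes = (size_t)width * (size_t)height * 4;
    unsigned char *tga = malloc(TGA_HEADER_SIZE + pixel_bytes);

    if (!tga) return NULL;

    memset(tga, 0, TGA_HEADER_SIZE);            /* no ID field, no colour map, origin 0,0 */
    tga[2]  = 2;                                /* image type 2: uncompressed true-colour */
    tga[12] = (unsigned char)(width & 0xff);    /* width, little-endian */
    tga[13] = (unsigned char)((width >> 8) & 0xff);
    tga[14] = (unsigned char)(height & 0xff);   /* height, little-endian */
    tga[15] = (unsigned char)((height >> 8) & 0xff);
    tga[16] = 32;                               /* bits per pixel */
    tga[17] = 0x28;                             /* 8 alpha bits, rows stored top to bottom */

    memcpy(tga + TGA_HEADER_SIZE, bgra, pixel_bytes);
    *out_size = TGA_HEADER_SIZE + pixel_bytes;
    return tga;
}
```

The resulting buffer is what would then be handed to the in-memory texture loading call.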

Looking at the 5 Rivers skyboxes I note that some of them are strange sizes (770k?) and also that if I open one in a Paint program, then save it without making any changes, it works then. I guess that the D3D loader is throwing a false error at some stage in the process.

Wednesday, April 28, 2010

Updates to the MP3 Player

I've been making the whole MP3 player a lot more robust than it used to be. What I had was based on code I had originally written maybe 6 or 7 years ago, and included in an MHQuake from that time. It worked fine so I just restructured some of it a little, but otherwise it's probably the oldest actual lines of code in DirectQ.

First off is that it now picks up music from your gamedir correctly, using the correct fallback order of gamedir/expansion pack(s)/ID1 for searching.

Secondly is that I've removed the ability to set a music dir via a cvar. Now it's just music/ and sound/cdtracks/ that get searched.

Thirdly is that I've modified the looping code a little. It no longer waits for DirectShow messaging but instead polls the current position every few frames. We all know that polling is BAD and EVIL and MUST BE DESTROYED but nonetheless in a game engine which is going to hog your CPU anyway, normal rules don't apply. DirectShow messaging proved unreliable, polling will work. I do it every few frames so that it won't impact on performance, but in practice even doing it every frame is OK.

Fourthly is that it's now able to stream an m3u or any other type of container from the web.

What it doesn't do yet is play tracks from PAK/PK3/ZIP files. This is just a matter of extracting the file to your temp folder and playing it from there (and remembering to clean up by deleting it when finished).

Monday, April 26, 2010

Fun with gl_subdivide_size

Yes, I really do jump around from one thing to another like that. ;)

The big upcoming news is that the non-HLSL water warp is back and you're going to be able to change gl_subdivide_size at runtime without reloading the map. This only applies if you have HLSL switched off (r_hlsl 0 or deselect "Use Pixel Shaders" in the Video Options menu): HLSL doesn't subdivide at all.

This requires regenerating the vertexes at runtime, but it's only done if the value changes, otherwise the previously generated vertexes are used. It's quite fast to do anyway, so even if you change values as fast as you can type you won't notice any stalls.

A slightly earlier cut of it was messy and ugly, and had some onscreen glitches during the change, but I've managed to bring everything together very nicely so now we can change values really cleanly. There's still a slight shimmer while the vertexes are regenerated if you switch r_hlsl on/off, but it's no big deal and hurts nothing.

It's cool to be able to play around with the value and get a good balance between performance and quality while seeing the results immediately.

Another great outcome of this is that I've now re-integrated huge chunks of my renderer. Currently I have alias models and most brush surfaces using the exact same code for a lot of their stuff. This is really getting me to a place where I want to be and I'll be able to pull off some neat things when I do get there.

Fun with R_RecursiveWorldNode

After my benchmarking of R_RecursiveWorldNode it seemed valid to attempt a few things to address performance losses. The first thing I did was to confirm that what I found was consistent and not just a one-off freak result, so I re-checked and established that we're spending roughly 20% of our CPU time in there on bigger scenes.

This is all CPU time of course, so if you're GPU bound there won't be much we can do about it.

Before trying out a non-recursive solution I optimized R_RecursiveWorldNode until it screamed for mercy, reducing the amount of local state that needs to be saved on the stack and cutting the number of recursions almost in half. Then I did the same to R_CullBox. I gained a few frames from that, which indicates to me that I'm personally either GPU or fillrate bound. Still, it should be good for anyone on slower/older CPUs so it was worthwhile.

Next I tried the non-recursive solution. The trick here was to use a recursive walk of the BSP tree to build a front-to-back array of nodes (without frustum culling) only when the PVS changes, then scoot through the arrays non-recursively every frame (using frustum culling this time).
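The array-building half of that can be sketched as follows. The node layout is heavily simplified (a plane, two children, and a "visible" flag standing in for the PVS check) and all names are hypothetical, not the engine's own structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node_s {
    float normal[3], dist;        /* splitting plane */
    struct node_s *children[2];   /* [0] front, [1] back; NULL where a leaf would be */
    int visible;                  /* nonzero if the node is in the current PVS */
} node_t;

typedef struct {
    node_t **nodes;               /* caller-provided storage */
    size_t count, capacity;
} node_array_t;

/* Recursive walk: near side first, then this node, then the far side. */
static void build_node_array_r(node_t *n, const float *vieworg, node_array_t *out)
{
    if (!n || !n->visible) return;

    float d = vieworg[0] * n->normal[0] + vieworg[1] * n->normal[1] +
              vieworg[2] * n->normal[2] - n->dist;
    int side = d < 0.0f;          /* 0 = viewer in front of the plane, 1 = behind */

    build_node_array_r(n->children[side], vieworg, out);

    assert(out->count < out->capacity);
    out->nodes[out->count++] = n;

    build_node_array_r(n->children[!side], vieworg, out);
}

/* Rebuild the front-to-back array; meant to run only when the PVS changes. */
void build_node_array(node_t *root, const float vieworg[3], node_array_t *out)
{
    out->count = 0;
    build_node_array_r(root, vieworg, out);
}
```

The per-frame pass would then be a plain loop over out->nodes with a frustum test on each entry, which is where the recursive version's ability to accept or reject a whole subtree at once is lost.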

The end result is that it was much the same speed in smaller (ID1) scenes but quite a bit slower in bigger scenes. This can be attributed to the recursive version allowing for faster accepts and rejects of nodes (if a node is culled then all of its children are also culled, and likewise if a node is fully onscreen). There were also issues such as software backface culling of surfaces not working correctly.

The speed comparisons were made without drawing being done, thereby removing the GPU as a contributing factor, but the observation on backface culling was from a run with drawing to verify correctness. In both cases the tests were made standing still in a single spot so as to rule out the recursive visible node building from the non-recursive version.

The next step seems to be to see what I can do to tune the performance of the non-recursive version, but I honestly think that equalling the recursive version is a best-case end result here. I'll update this post when I have more info.

I will likely leave both versions in the code, with a cvar (r_usenodearrays) to select between them, and with the recursive version being the default.

Sunday, April 25, 2010

More status updates

Everything is nice and stable and clean with the memory system now. There is only one remaining item which has an upper memory restriction on it, and that's memory used by the current map, which is 128 MB. Bear in mind that even warpc.bsp only needs about 30 MB of this, so you'd need a fairly extreme set of circumstances to come even close to hitting it.

The "advanced menus" aren't going to be too well supported going forward. They had fallen by the wayside over the past few releases, and to be honest the standard menus as they are seem perfectly good enough.

I've achieved one long-standing personal goal in changing the edicts from an edict_t array to an edict_t * array. There is some seriously ugly code in the ID progs interpreter, and I had been holding back from doing it, but I finally knuckled down and after 4 attempts blowing up in my face got it nice and tight. It will now be possible to simplify a lot of the code in there, meaning it will be much easier to find and fix bugs, as well as to add new features to that part of the engine.

I've also been doing some profiling and have found that my previous suspicions about R_RecursiveWorldNode were correct. In large scenes DirectQ is spending up to 20% of its time in that function. That's quite a serious bottleneck, and while I haven't yet done anything, now that I know I can start taking steps to address it.

Saturday, April 24, 2010

Current Status of Several Items

I've decided to roll back the memory code to 1.8.3 level as I was ending up creating something of a mess with too much handling of special situations going on. What I'm building up from there is looking a lot cleaner and clearer, so it was obviously the Correct Decision.

I'm also withdrawing the 1 TMU renderer from the current codebase. I've been reading the Q3A renderer (for the first time) a lot lately, and I'm quite shocked by the extent to which I've independently reinvented huge chunks of it in DirectQ. This leads me to the conclusion that it's not going to be a huge leap forward to generalise my current code better, in which case I'm going to be able to produce a better 1 TMU renderer. Trying to integrate the code I wrote into any conceivable work moving forward would not be a pleasant experience, and when a feature is holding things back to that degree the best thing is for it to either be removed or deferred.

Several long standing bugs have been fixed. One I'm particularly happy with is a heap-corruption bug in the progs.dat loader that goes all the way back to the days of MHQuake. I had worked around this by putting huge walls around the memory used by this, but now that it's gone I've been able to revert to the simpler, nicer way of ID Quake.

I've introduced an artificial limit of 65536 surfaces visible at any one time. This is not so bad as it sounds; even with r_novis 1 marcher.bsp (a good candidate for a big scene) hits maybe 11000 or so; view frustum and backface culling really does help a LOT here.

I'm also thinking that with my current code it should be possible to merge a lot of surfaces in each node together - they don't even need to produce a convex outline when done, just share the same texture and lightmap. I could easily and quickly eliminate duplicate vertices from the resulting triangle soup, although in practice I'm not certain how beneficial that would be. Preliminary tests indicate a saving of 33% best case, which needs to be weighed against the cost of additional load time (although I have code written to do the duplicate removal in linear-scaling time; takes about 100ms per 25000 verts).
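One way to get duplicate removal that scales roughly linearly is a hash table keyed on the vertex bytes. The sketch below is an assumption about how that could look, not the code referred to above: vert_t carries only a position, comparison is exact and bitwise (so 0.0 and -0.0 count as different), and weld_verts is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define EMPTY_SLOT UINT_MAX

typedef struct { float x, y, z; } vert_t;

/* FNV-1a hash over the raw bytes of a vertex */
static unsigned hash_vert(const vert_t *v)
{
    const unsigned char *p = (const unsigned char *)v;
    unsigned h = 2166136261u;

    for (size_t i = 0; i < sizeof(*v); i++) h = (h ^ p[i]) * 16777619u;
    return h;
}

/* Copies each distinct vertex once into out_verts and writes, for every input
   vertex, its index in out_verts to out_indexes. Both outputs need room for n
   entries. Returns the number of unique vertexes (0 if allocation fails). */
size_t weld_verts(const vert_t *in, size_t n, vert_t *out_verts, unsigned *out_indexes)
{
    size_t table_size = 1, unique = 0;

    assert(n == 0 || (in != NULL && out_verts != NULL && out_indexes != NULL));

    while (table_size < n * 2) table_size <<= 1;   /* power of two, at most half full */

    unsigned *table = malloc(table_size * sizeof(*table));
    if (!table) return 0;
    memset(table, 0xff, table_size * sizeof(*table));   /* every slot = EMPTY_SLOT */

    for (size_t i = 0; i < n; i++) {
        size_t slot = hash_vert(&in[i]) & (table_size - 1);

        /* linear probing until we find this vertex or an empty slot */
        while (table[slot] != EMPTY_SLOT &&
               memcmp(&out_verts[table[slot]], &in[i], sizeof(vert_t)) != 0)
            slot = (slot + 1) & (table_size - 1);

        if (table[slot] == EMPTY_SLOT) {
            out_verts[unique] = in[i];
            table[slot] = (unsigned)unique++;
        }

        out_indexes[i] = table[slot];
    }

    free(table);
    return unique;
}
```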

Right now a major timewaster with big maps appears to be the need to run R_RecursiveWorldNode every frame. I know that FitzQuake uses a trick to only run it when the PVS changes, so I'm going to see what needs to be done to plagiarise that.

Finally, I'm interested in hearing from any of the 5 people (to date) who have downloaded the Windows 98 build. Did you download it for use on Windows 98, or did you download it just because? If the former, did it even work? If it didn't, what happened? Did you try compiling and testing the code yourself?

It's no major difficulty to produce such a build, but all the same if I'm going to continue doing so I'd like to know if it would be worth my while.

Thursday, April 22, 2010

Possible DirectQII Update?

I'm still in a semi-experimental mode with some other engines, and one of those is DirectQII. It's a nice engine to fool around on as its D3D renderer formed the basis for a large part of DirectQ 1.7, and some of the code survives in 1.8 even today. There are a few things I want to experiment with and it's good to do so away from the main codebase.

I've also made some enhancements to the basic code as it was released last year. I'm not a fan of the renderer for the Quake 1/II model formats, so I switched this one over to something similar to what's in DirectQ at the moment (although with a memory pointer instead of a vertex buffer, and without the boosted limits) and was quite pleased on one hand (but appalled on the other) to discover that Quake II's triangle model format is roughly twice as inefficient as Quake 1's (I guess the "II" has to stand for something). Using my vertex/mesh optimization trick I've reduced the vertex submission for these to about one fifth of what it was.

The neat thing about this trick is that it passes its data through the D3DXOptimizeFaces/D3DXOptimizeVertices routines completely unchanged; in other words it produces an already optimal index buffer layout.

The odd thing about it is that it uses the software Quake II data to generate the final mesh; the GL commands are untouched.

I've also completely generalized the process of converting individual polygons (represented as D3DPT_TRIANGLEFAN) into grouped indexed triangle lists. It did help that there was a core group of 4 or 5 functions that everything passes through which I could check for state changes in and submit batches of polygons from.
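The fan-to-list conversion itself is small: a fan of n vertexes becomes n - 2 triangles that all share the first vertex. A minimal sketch, with fan_to_list as a hypothetical name and 16-bit indexes assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Append the indexes for one polygon, stored as a fan starting at first_vertex
   in a shared vertex buffer, to an indexed triangle list. out_indexes needs room
   for (num_verts - 2) * 3 entries. Returns the number of indexes written. */
size_t fan_to_list(unsigned short first_vertex, unsigned num_verts, unsigned short *out_indexes)
{
    size_t n = 0;

    assert(out_indexes != NULL);

    if (num_verts < 3) return 0;   /* not a polygon */

    for (unsigned i = 1; i + 1 < num_verts; i++) {
        out_indexes[n++] = first_vertex;                          /* shared fan centre */
        out_indexes[n++] = (unsigned short)(first_vertex + i);
        out_indexes[n++] = (unsigned short)(first_vertex + i + 1);
    }

    return n;
}
```

Calling this for each polygon in turn, and advancing the output pointer by the return value, builds up one index list that can be submitted in a single draw call for as long as the texture and other state stay the same.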

Overall I reckon it's maybe doubled the speed of the engine at certain times, and made using a low subdivide size for water surfaces a realistic prospect.

I intend checking for other areas where I can feed improvements back to DirectQ too. The Quake II engine is generally good to do some work with, as it's a relatively minor evolution of Quake rather than a different engine, and a lot of the DOS heritage and general ugliness has been removed from it.

So this is where I get to the "possible release" bit. All in all it really depends on whether or not I make a mess during my experiments. The whole purpose of these experiments is to give myself the freedom to make a mess without needing to be overly concerned about the consequences, so there's a chance that might be the outcome (it happened with my WinQuake experiments). It also gives me freedom to abandon things in a half completed state if I feel there's nothing to be gained from continuing. So stay tuned and find out.

More fun with Memory Management

Sometimes when I'm evolving an idea I'll take it through 3 or 4 different implementations (occasionally at the same time) to see which works out best. Sometimes I even go very far with one version of it before realising that it's not a good solution.

Quake's memory management system is a disgrace. This is because it was written for a DOS game (and actually evolved from the Doom code) where it was the only application running on the system and could just grab all available memory. Three different types of allocation needed to be handled in that single block of memory, as well as free operations and moving memory around. The code as it stands in the released source is still largely the old DOS code, and a lot of pain and suffering is inflicted as a result. The fact that you still have to specify -heapsize is only the beginning.

DirectQ has for the last while been dealing with this by reserving a large chunk of address space in virtual memory and drawing down from that as required. This was functionally equivalent to specifying a large -heapsize but only the memory that was actually required was allocated. However it did have the disadvantage that there was still a hard upper limit on memory, and if that ran out you were EFF-YOU-SEA-KAY'ed.

It had to be kept large enough to accommodate all reasonable requirements, but at the same time small enough so that there was headroom for other memory allocations. I had found that 128 MB was a "good enough" size for maps; even warpc.bsp only needed about 30 MB of that, so the other 100-odd was available for other per-map allocations such as particles, entities, static entities, beams, and so on. The remaining 1920 MB of the address space was then available for other allocations, many of which were also squeezed into similar hard-upper-limited pools.

This is all despite the fact that DirectQ has actually removed most limits and can keep allocating memory until it runs out. It's just that it may run out sooner than it should.

For 1.8.4 I've been playing around with various evolutions of the memory system in an attempt to get the best of all worlds. The ideal Quake memory system should be able to create an arbitrary amount of independent buffers, each of which may be indefinitely expanded, and each of which may be easily discarded as required, and preferably in a single operation that releases all memory for all allocations from that buffer. Common objects should be kept together, and allocations should not fall foul of the "lots of small being worse than very few big" trap.

Problems are compounded by the fact that you often don't know how much memory is required in advance, and also that for certain objects the memory must be in a single contiguous block.

Right now I have what looks a lot like a very good solution up and running in a standalone application. I've been hitting it with various random patterns of allocations, and it can quite easily handle 100,000 randomly sized really tiny allocations in a fraction of a second. It's capable of the contiguous memory trick if required, and is also indefinitely expandable (although if it expands the contiguity is broken).
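To make the shape of such a system concrete, here is a minimal sketch of an expandable buffer that hands out small allocations from larger blocks and releases everything in one operation. It uses plain malloc rather than reserved virtual address space, and every name in it is hypothetical; it is an illustration of the general idea, not the standalone application described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Union used only to get a conservative alignment for returned pointers */
typedef union { long double ld; long long ll; void *p; } arena_align_t;

typedef struct arena_block_s {
    struct arena_block_s *next;
    size_t used, size;
    arena_align_t data[];          /* block storage follows the header */
} arena_block_t;

typedef struct {
    arena_block_t *blocks;         /* most recently added block first */
    size_t block_size;             /* default size of each new block */
} arena_t;

/* Hand out bytes from the current block, adding a new block when it is full.
   Each allocation is contiguous; separate allocations may land in different blocks. */
void *arena_alloc(arena_t *a, size_t bytes)
{
    assert(a != NULL && a->block_size > 0);

    size_t align = sizeof(arena_align_t);
    bytes = (bytes + align - 1) / align * align;

    arena_block_t *b = a->blocks;

    if (!b || b->size - b->used < bytes) {
        size_t size = bytes > a->block_size ? bytes : a->block_size;

        b = malloc(sizeof(*b) + size);
        if (!b) return NULL;

        b->next = a->blocks;
        b->used = 0;
        b->size = size;
        a->blocks = b;
    }

    void *p = (unsigned char *)b->data + b->used;
    b->used += bytes;
    return p;
}

/* Release every allocation made from this arena in one go. */
void arena_release(arena_t *a)
{
    while (a->blocks) {
        arena_block_t *next = a->blocks->next;
        free(a->blocks);
        a->blocks = next;
    }
}
```

One arena per kind of object (map data, particles, and so on) keeps common objects together, and lots of tiny allocations cost only a pointer bump each rather than a trip to the system allocator.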

Later on I'm going to port it to the engine and move some allocations over to it to see how it performs in the real world. It should be interesting.

Wednesday, April 21, 2010

1.8.3b Release

Some 1.8.3 bugs have come through, so I've released a 1.8.3b to address them and also to fast-forward some of the more important 1.8.4 changes.

The first bug is in relation to player skins, and the screenshot tells it all:



While it might be a quite effective Gimp Suit, it's not the way Quake should be.

Secondly, if you use chase_active 1 and the camera goes outside a wall the game would crash hard.

Other changes are the removal of beams and temp entities limits, the removal of alias model triangles and vertexes limits, and some more minor fixes. It really should be 1.8.4 on account of these alone, but calling it that would only confuse the version numbering I've already announced, so 1.8.3b it is.

I've also included a Windows 98 build with this. This build is VERY EXPERIMENTAL. It might work, it might not, strange things might happen with it, you might need to install prerequisites, and so on. I really don't know as I don't have a Windows 98 machine to test it on.

If you're running Windows 2000, XP, Vista or 7 you most definitely should NOT use it, and I won't be entertaining ANY bug reports from people who do. This build is for anyone who has Windows 98 and is willing to try an experimental build out, and maybe do some legwork themselves with respect to getting it running.

Get it here: http://directq.codeplex.com/releases/view/44083.

Change to the QuakeWorld HUD

OK, the QuakeWorld HUD is going to get another change from the version as it appears in QW itself. There, the weapon icons are towards the bottom-right of the screen, but I'm moving them to top-right.

The reason why is the Hipnotic extra weapons. QW does not support these, and its code has nothing to indicate how they should be handled.

Moving to top-right just makes the whole layout task a lot cleaner. Instead of having to detect if Hipnotic is active and shunt the positioning up about 32 pixels (it would actually be 36 because I space them out a little) you basically do absolutely nothing instead and It Just Works.

I had thought about adding an option for top-right vs bottom-right, but there is only so far you can go with the options, and if I start opening that door then everyone will want an option for their own favourite layout and we'll be right back with the mess we started with. "Can I have keys on the left, horizontally spaced by 2, stats on the bottom with no spacing, and ammo in middle-right, please?" The more options you add for little tweaks like that, the more untidy the code gets, and the more likely it is that the options will start to fight each other (I've already had a bad experience with the left-aligned classic status bar). That's in nobody's interest; I get annoyed working on the code and the engine starts doing strange things when you run it. Eventually we all agree that it was more trouble than it was worth, and it gets removed.

So let's try a strange experiment, bypass the painful stages, and start out with not doing it instead.

The same applies to the left-aligned inventory with the QW HUD; I'm not doing it for pretty much the same reason. I guess most NetQuake players aren't familiar with the QW HUD anyway, and many may not even know that it even exists in the first place, so maybe none of this will be that big a deal after all.

HUD Updates

OK, the various HUD layouts have now been done, including all the frag counts and other widgets, for ID1, Hipnotic and Rogue. That Hipnotic weapon layout code is a thing of true evil. Here's what they look like:




Controllable with values of cl_sbar: top-left is 0, top-right is 1, bottom-left is 2 and bottom-right is 3.

What I haven't done yet is the left-sided inventory and the left-aligned status bars, but they'll come soon enough.

In other news, while looking for something else I found an old DirectX 6/7 SDK (yes, it includes both). This is a neat thing to have, as it contains lots of handy documentation on things that didn't make it into the 8 or 9 SDKs. That perverse part of me is itching to get me some Execute Buffer action, just to see if they're really as bad as they were made out to be.

Oh, and I didn't find that something else either. If anyone knows a legal way to recover a lost Doom 3 CD key I'd be interested in knowing. (Update: found it! Phew!)

Tuesday, April 20, 2010

More on the HUD

I've decided to add the Quake 64 HUD as well. It was present in an officially released and approved version of Quake, so we must regard it as an official version of a Quake HUD that should be available, even if it doesn't get much love.

Both the QuakeWorld and the Quake 64 HUDs will be very slightly changed from the originals. I mentioned that I'm moving the armor/health/ammo icons to the right of the values, so that it's easier to visually associate the correct icon with its value. I'm also spacing them out a little more. There seems to be a tendency with Quake to have everything rammed tightly up against one edge of the screen (and tightly up against each other), which I guess was required if you're running at 320x200. Being faithful to the original is great, but not when the original was crap, OK?

The classic status bar will stay completely unchanged, of course.

The way it's panning out it looks as though all of the HUD items may actually end up being user-configurable after all. I'm storing positions/etc in global arrays, but there's no reason why the contents of these couldn't be replaced with e.g. something read from a config file (COM_Parse format, natch). I'll talk more about that later, if I get to do it. It's definitely a cleaner proposition than the old cvar-based method.

Work on the HUD Code

The next step is giving the HUD code a good working over. I've completely removed the customization system, as well as temporarily removed the functionality of viewsize, pending an outcome of this. While it may seem a shame to see it go, it was quite a complex and fiddly thing, and I suspect that for most people it was a case of too many options. Apologies to anyone who loved it, but ultimately it was one of those features that was holding back progress.

The intended result is that there will be 4 basic HUD styles available: None, Classic, Overlay or QuakeWorld. There will also be the ability to select whether or not an inventory is shown (and for QuakeWorld, whether it's on the left or the right). The Overlay HUD will have selectable alpha, and both it and the Classic HUD will be able to be either centered or left-aligned. I'll likely change the QW HUD so that the pictures appear to the right of the values, as when you have a 2 digit stat the standard layout looks like the values are associated with the wrong pictures.

That's a total of 4 options to choose from (down from about 25,000) which will make it more accessible.

Finally I'm going to be using the behaviour of viewsize to work with these. The default of 100 will give you a full HUD (whichever type you use), 110 will hide the inventory (again for whichever type you choose), and 120 will be no HUD. Other values won't be settable. Almost the very same as before, in other words.
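As a quick illustration of that mapping, a sketch in C. The enum and function names are hypothetical, and snapping in-between values to the nearest lower step is an assumption, since other values aren't meant to be settable in the first place.

```c
#include <assert.h>

typedef enum { HUD_FULL, HUD_NO_INVENTORY, HUD_NONE } hud_level_t;

/* Map a viewsize value to how much of the HUD is drawn:
   100 = full HUD, 110 = inventory hidden, 120 = no HUD. */
hud_level_t hud_level_for_viewsize(int viewsize)
{
    assert(viewsize >= 0);

    if (viewsize >= 120) return HUD_NONE;
    if (viewsize >= 110) return HUD_NO_INVENTORY;
    return HUD_FULL;
}
```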

This will be interesting to work on, and I'm looking forward to seeing the finished article.

Monday, April 19, 2010

Single TMU Path and 2D Drawing

I've decided to make the single TMU path fully functional, so you'll get dual sky layers, fullbright colours and even a bit of VBO/primitive batching optimization with it. It's been interesting to bring it on, as I had always wondered how performance would compare given the VBO/batching optimizations, versus there being fewer texture changes, and whether in these times even continuing to have a multitexture path was worthwhile.

Well the answer is in, and the result is that the single TMU path is capable of rendering big complex scenes quite smoothly, but even there the multitexture path utterly decimates it - we're talking about roughly double speed with multitexturing enabled (it's less in simpler scenes, but still faster).

My current reference "engine killer" scene is this one, from ne_tower: about 3,000 wpoly and close on 14,000 epoly:



On my test machine it pulls FitzQuake down to about 7 FPS. DirectQ with -nomtex gets 55 or so, but DirectQ using the multitexture path scales the giddy heights of 100.

Having said that, it may be the case that your hardware is different and that you have a genuine need for running with -nomtex; if so, then it will be there for you, but you really should try the full path first and only use -nomtex if you've proven that you need it.

I mentioned 2D drawing. One minor change here is that I've set the default texture filter mode when the console/etc are at the full scale to no filtering. This completely eliminates blurry and faded 2D graphics, and restores the vibrant colours of software Quake. Because the graphics are being drawn actual size they don't need any filtering so there is no visual degradation from this.

Of course everything is smaller in that mode, but because the graphics (and especially the text) are crisp, clean and legible, they're actually easier to see (and easier on the eyes). It's almost like putting on a pair of glasses for the first time: suddenly everything snaps into focus.

Scaling up the graphics will revert to bilinear filtering, as it's necessary to prevent everything becoming a worse mess, but will also revert to blurry and faded graphics. Nothing I can do about that; contact your local 3D card manufacturer and complain to them about the need for better filtering modes.

This needs some further work to integrate properly with external textures (as I've also needed to change the texture addressing a little), but it's been one of the most worthwhile changes.

It's funny how things change. When GLQuake first came out bilinear filtering was DA BOMB DOOD and absolutely everything had to have it, even to the extent that some of the glaringly obvious failings of GLQuake were overlooked because - oooh! - look at the pretty bilinear filtering! It's only now with the benefit of experience (and that the novelty has worn off) that we can see that it's not always all that it's cracked up to be, and that there are even certain cases where it could be reasonably claimed to be actively evil.

Working on a new DirectFitz

I'm not certain what happened, I just got the inclination to do so.

Anyway, the new version is currently in a state where it will at least run, but there are a few things that need cleaning up here and there with it. The two big changes are that it should run on Windows 98, and that it is extremely fast. Faster than FitzQuake itself on my machine, thanks to some of the primitive batching optimizations being ported over from DirectQ. It still uses D3D8, of course.

It still can't handle big scenes too well, mostly on account of needing to go through a translation layer, but overall performance is quite impressive and it really shows the benefit of switching from glBegin/glEnd pairs to proper indexed triangle lists.

There are a few of what I would call "FitzQuake annoyances" that I've taken the opportunity to "fix". Normally when I do a D3D8 port I try to keep the end result as close to the original as possible, but with FitzQuake a small handful of items have been too much for me, and I find the engine quite irritating to use as a result.

Gun angle will be revertible to the same as in ID Quake via a v_gunangle cvar (a value of about 1.5 works well), mouselooking will be on by default and controllable by a freelook cvar, gl_flashblend is changing to 0 by default, and scr_conspeed is changing to 3000 by default. Nothing too dramatic, but I suppose that arch-traditionalists would foam at the mouth over even those.

I've also fixed the timer for multicore and 64-bit CPUs.

As soon as I release it I'll post a notification here.

Disabling Multitexture in DirectQ

In current versions of DirectQ you can't disable multitexture, and furthermore a minimum of 2 texture units are required to run. From 1.8.4 onwards you will be able to do this, and DirectQ will support 1 TMU devices, but it will be a somewhat downgraded mode.

Let's discuss this a little. Received wisdom is that disabling multitexture can make some maps run faster. The reason why is that GLQuake's multitexture implementation is somewhat botched, and the non-multitexture paths may be preferable as they do better state batching. (There's also GLQuake's horrible old mirror hack, but we don't talk about that round my way.)

This isn't actually necessary in DirectQ. I've tuned the engine to do state and primitive batching, and it's capable of extracting full performance from maps and scenes where you would normally expect Quake to be brought to its knees. Try it and see. Run some maps that are known performance killers and see how it goes.

So the only reason you would have to disable multitexture is if you know you have a bug in your driver and you know that a driver update doesn't fix it. Or possibly you're just damn well going to disable it and nothing will stand in your way.

Anyway, about that downgraded mode. Not using multitexture in DirectQ will also disable fullbright colours, alpha on brush models and dual sky layers. Bottom line is that supporting these without multitexture adds complexity to the code, thus making it harder for me to test, debug and evolve. The mode is also intended for low performance cards, and as such these effects would only drag down something that is already running slow.

It also completely bypasses the batched primitive/vertex buffers path for world surfaces, meaning that if you use it to try and make a big, complex scene run faster, it will actually run quite a bit slower instead; currently about one-sixth the speed (i.e. back to DirectQ 1.7 and earlier levels). Yes, this is intentional. Firstly, as I said above, you don't need to disable multitexture to make these run faster. Secondly, the 1 TMU path is intended for low end cards that will derive no benefit from primitive batching or VBOs.

So the deal is, if you want to run DirectQ on a 1 TMU device you will be able to. If you want to disable multitexture you will also be able to, but beware that the end results are the opposite of what you would expect from GLQuake.

Sunday, April 18, 2010

Changes for 1.8.4

OK, I've pretty much resolved the old PF_VarString issue, so here's what's happening to date for 1.8.4:

  • Several limits have been removed. Beams and Temporary Entities have gone to unlimited; these were previously set at about 20,000. I'd like to also remove the limit on Visible Edicts but unfortunately it needs to be in a linear list for sorting, so it stays at 65536.
  • The previous limits of 21,845 on Alias Model Triangles and 65,534 on Alias Model Vertexes have been removed; the Alias Model Format is now unlimited in all respects aside from Frames and Skins (which are Protocol dependent).
  • Some efficiencies and optimizations in the Alias Model loader have resulted in fewer temporary memory buffers being created and (hopefully) faster loading.
  • The limit of 256 MB for cached models has been removed (cached sounds were previously unlimited).
  • FPS-independent gravity has been implemented. I'm not 100% convinced that there won't be unwanted side-effects on entities, so I might release with it either disabled or cvar-ized, but I'll probably leave it on particles.
  • A potential rendering bottleneck has been removed; DirectQ previously used FVF Codes for most purposes. Suspecting that the runtime created Vertex Declarations on the fly for these, I've transitioned to using native Vertex Declarations.
  • A few long-standing bugs have been fixed; the mouse should no longer go outside the window while playing in Windowed modes, Zone memory will now be correctly created with unlimited size all of the time, and efrag clearing has been corrected.
  • The old non-HLSL waterwarp is coming back, and it will be possible to change the value of gl_subdivide_size at runtime. Changing it to a lower value will leak a little memory until the next map load, but higher values will be OK.
More as it happens.
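
The post doesn't say how the FPS-independent gravity is done, so the following is only a sketch of one common approach and an assumption on my part: apply half of the frame's gravity before moving and half after. With constant gravity this gives the same position whatever the framerate, whereas a plain per-frame step drifts as the frame time changes. The `Particle` struct and all function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical one-dimensional particle: height and vertical velocity.
struct Particle { double z; double vz; };

// Plain per-frame step: the result depends on the frame time.
void StepEuler(Particle& p, double gravity, double dt) {
    p.vz -= gravity * dt;
    p.z += p.vz * dt;
}

// Half-step: apply half the gravity before the move and half after it.
// For constant gravity the position is exact regardless of dt.
void StepHalfGravity(Particle& p, double gravity, double dt) {
    assert(dt >= 0.0);
    p.vz -= 0.5 * gravity * dt;
    p.z += p.vz * dt;
    p.vz -= 0.5 * gravity * dt;
}

// Runs 'frames' equal steps covering 'seconds' of game time, starting at
// height 0 with an upward velocity of 270 units per second.
Particle Simulate(void (*step)(Particle&, double, double),
                  int frames, double seconds, double gravity) {
    assert(frames > 0);
    Particle p{0.0, 270.0};
    const double dt = seconds / frames;
    for (int i = 0; i < frames; ++i) step(p, gravity, dt);
    return p;
}
```

Running `Simulate` for one second at 10 frames and again at 100 frames gives the same height with `StepHalfGravity`, but heights that differ by tens of units with `StepEuler`.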

Saturday, April 17, 2010

DirectQ is probably going to VCPP 2010

In the end the clinching factor was the fixed intellisense support. I can live with downsides like the funky groovy colour scheme and the occasional strange behaviour of find. When you get to a certain age short term memory is not what it used to be, and I've come to rely heavily on intellisense for hinting about parameter order and types. In this regard using 2008 has been primarily an exercise in frustration, as I bash away at the "." key shouting "come on! you were there a minute ago!" at my machine.

I'm still going to be in a position to make a final build on 2008 (or even 2003) for downlevel OSs (anything pre-XP) if required, but there comes a point when continuing to support a legacy platform is counter-productive.

In other news, I've had a pretty hellish time with my old friend PF_VarString. This is the function that puts the "2" into "you get 2 rockets", and seems shockingly delicate. I've run up against it before with ports of other engines to VCPP 2008, and have fixed various problems with it already, but sooner or later it seems quite determined to misbehave. Even a straight port of an unmodified engine to VCPP 2008 has I would guess a 25% chance of blowing up in this function, and there is no apparent cause.

As a result I've had to backtrack some parts of the code to the old 1.8.3a level, so some of the memory work I've done is now lost. It seems a shame but correct behaviour is more important than elegant code.

It would be tempting to blame the memory work for this, but my opinion is naturally coloured by my previous experiences with this function.

Anyway, I'm going to bring on elements of the new stuff and see if I can pinpoint the spot at which it breaks. Overall it would be quite a coup to finally resolve what has been a problem for a good 2 years now.

Friday, April 16, 2010

Improvements to Memory Usage

In a lot of cases with DirectQ I've completely removed the hard limit imposed by the engine on the number of objects you can have. There is still however a more practical limit as a result of the amount of memory available.

On 32-bit Windows each process typically has 2 GB of address space to play with (the other 2 GB are reserved for the system). I had partitioned off chunks of that for use by various object types; for example cached models could use up to 256 MB. This gave a total of somewhat under 1 GB; the remainder was free for use by any other allocation (this was the equivalent of the "Zone" in ID Quake).

The big problem here was that the number of partitioned-off chunks started to grow, and as I had assigned fairly conservative (i.e. high) limits, space was in danger of becoming tight.

I haven't quite ripped it up and started again, but what I have done is re-architect elements somewhat. I now have a new "scratch memory" buffer that's usable for any short-lived chunk of memory I might need (typically within the scope of a single function). This enabled me to get rid of a lot of the memory partitions I formerly had. Cached models have moved into that 1 GB of free space (which is quite a bit more than 1 GB), as have file loading and most permanent allocations which are made one-time only when the game starts.
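
A minimal sketch of what a "scratch memory" buffer of this kind can look like, assuming a simple bump allocator with mark/release. The class and method names are hypothetical and this is not DirectQ's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One shared buffer for short-lived allocations. Callers take a mark on
// entry to a function and release back to it on exit, so nothing is freed
// individually.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Hands out the next chunk, with sizes rounded up to 16 bytes.
    // Returns nullptr if the request does not fit.
    void* Alloc(std::size_t bytes) {
        if (bytes > storage_.size()) return nullptr;
        const std::size_t rounded = (bytes + 15) & ~static_cast<std::size_t>(15);
        if (rounded > storage_.size() - used_) return nullptr;
        void* p = storage_.data() + used_;
        used_ += rounded;
        return p;
    }

    std::size_t Mark() const { return used_; }

    // Discards everything allocated since the matching Mark().
    void Release(std::size_t mark) {
        assert(mark <= used_);
        used_ = mark;
    }

    std::size_t Used() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

A function that needs temporary space would call `Mark()` on entry, `Alloc()` as needed, and `Release(mark)` before returning, which is what lets one buffer stand in for many separate partitions.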

The only real restricted buffer remaining is for allocations made per-map. This was 128 MB but I've increased its allowance to 256 MB. In practice this wasn't really needed as I can fit even warpc.bsp in ~30 MB, but the extra headroom does give a sense of well-being and contentment. This is a maximum size by the way; the amount of memory actually allocated will be less.

Ideally everything would be in free-for-all space and have the full room of that 2 GB to grow as big as it needs, but there are some parts of the engine that require different objects to be in contiguous memory blocks. Some of these were inherited from ID Quake and some - unfortunately - were introduced by me. I've been dealing with these as I come across them, and will continue to do so, so that some day we can achieve the ideal situation.

Would anybody complain if...

...I removed the ability to run on Windows 2000 from DirectQ?

Put it this way: 1.8.3 was the first version in a long time that could run on 2000. Nobody complained during that time. Nobody.

Now, I fully understand that from an idealistic perspective running on Windows 2000 would be nice to have. But let's take off the idealistic hats for a minute or two and look at it from a purely practical perspective.

Does removing the ability to run on 2000 actually affect you? Or do you just think it might affect someone else? And do you know who that someone else is?

The OS we're talking about here is a business OS that was never really available for the domestic user and that was very quickly replaced (within 1.5 years) by XP. It's over 10 years old, which makes it almost as old as Windows 98.

What I'm looking for is one person who has Windows 2000 on their only machine and who regularly runs DirectQ on it. If I can find that one person I'll keep it.

Thursday, April 15, 2010

Changes of Plan

I had originally intended to be away for a week, but unfortunately there was the small matter of a volcano in Iceland, and as a result my trip has had to be cancelled. Having nothing else planned and some days off work, the result is that I might get some DirectQ updates done.

I have some interesting ideas brewing up, so let's see what happens.

Updates.

Idea 1: Memory efficiencies. There are quite a few places in the engine where a large-ish memory buffer is needed to store copies of (or pointers to) items. Some in the server, some in the network transport, and some in the renderer. I've previously allocated separate buffers, but since these are completely isolated and renewed each frame I can just use a single buffer for them all.

Idea 2: Complete removal of alias model limits. DirectQ has a limit of about 20,000 triangles in an alias model, because each triangle needs 3 vertexes and 65536 vertexes is the maximum you can address with a 16 bit index buffer (32 bit index buffers aren't supported on all hardware). So split the model into separate parts. The maximum applies to each part, but there need be no limit on the number of parts.
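
The arithmetic behind Idea 2 can be sketched like this, assuming (as the paragraph does) 3 unshared vertexes per triangle. `ModelPart` and `SplitIntoParts` are hypothetical names; the real loader isn't shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous run of triangles that fits within one 16-bit index range.
struct ModelPart {
    std::size_t firstTriangle;
    std::size_t numTriangles;
};

// 16-bit indexes can address 65536 vertexes; at 3 vertexes per triangle
// that allows 21845 triangles per part.
constexpr std::size_t kMaxVertexes = 65536;
constexpr std::size_t kMaxTrianglesPerPart = kMaxVertexes / 3;
static_assert(kMaxTrianglesPerPart == 21845, "matches the old per-model limit");

// Splits a model of any size into as many parts as needed.
std::vector<ModelPart> SplitIntoParts(std::size_t totalTriangles) {
    std::vector<ModelPart> parts;
    for (std::size_t first = 0; first < totalTriangles; first += kMaxTrianglesPerPart) {
        std::size_t n = totalTriangles - first;
        if (n > kMaxTrianglesPerPart) n = kMaxTrianglesPerPart;
        assert(n > 0);
        parts.push_back(ModelPart{first, n});
    }
    return parts;
}
```

A 50,000 triangle model, for example, comes out as three parts of 21,845, 21,845 and 6,310 triangles, each drawn with its own vertex and index range.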

Idea 3. The Zone becomes King. A lot of DirectQ's memory management is via small memory buffers, of which there are many. Because VirtualAlloc needs to specify a maximum size, I've specified conservatively high sizes, but this reduces the amount of address space available for the Big Stuff. So use the Zone (which is now unlimited) for all of these little allocations instead.

Wednesday, April 14, 2010

Release 1.8.3a Emergency Update

Apologies if you have already downloaded 1.8.3, but two late breaking bugs slipped through my testing.

The first was that crosshair colour wasn't working. Not that big a deal, but it should be done right all the same.

The second was a doozy and only came to light when I recompiled the source to commence on a potential 1.8.4 version. My cvar and command compatibility layer was misbehaving extremely badly, in that it was accidentally stomping over original cvars and commands.

That one doesn't seem to have affected the original 1.8.3 release, but all the same I prefer not to take the risk given that I'm going to be away for a while. Let's not have me coming back to problems.

The new link is: http://directq.codeplex.com/releases/view/43581.

Release 1.8.3 is now available

Clicky.

Two points to note.

No fog. Built-in fog is no longer available on Shader Model 3+ hardware, so if you want to have fog that works on all cards you need to write your own fog code in shaders. Yes, that sucks mightily, but I am not the one making these decisions. In the end I decided to just remove it rather than have it working on some and not working on others.

"But OpenGL fog still works!" I hear you cry. Yes, but that's because every OpenGL implementation is required to implement the full specification, and fog is still in the full specification. An OpenGL implementation however has complete freedom with regard to how it implements the specification. So what you are seeing is most likely a software-based implementation of fog. Direct3D is a lot more unforgiving - it exposes the bare bones of what's supported on the hardware and nothing more. No software fallbacks (except one very odd one in supporting vertex shaders in software), no get out of jail free clause.

Yes, it still sucks. For 1.9.0 I'm intending to use vertex shaders on everything (using that software implementation if you don't have the hardware) and we'll get fog back then. Promise. For now rather than make a mess, and rather than explain why a feature works for some but not for others, I decided to not have that feature at all.

Second point is that if you get any errors or crashes with this release your first thing to do is upgrade your DirectX. Always always always. Even Windows 7 comes with a downlevel version of D3D9, so upgrade it. If it still crashes then tell me how to reproduce the crash.

This has been tested on a variety of Intel and NVIDIA hardware, on Windows 2000, XP and 7, and with various different versions of D3D9, but the objective is to remove unknown factors so that we can focus in on known ones. So upgrade.

Final Updates for 1.8.3

The final handful of new items have just been added. These include a minor bugfix to make disabled menu items draw correctly with the new faster conchars drawing routines, vertex stream filtering, and one big important one that needs some more discussion.

Early in the development of DirectQ I was in the habit of removing cvars and commands that I was no longer using: stuff like gl_ztrick, gl_flashblend and so on. This was fine but tended to spew a lot of "unknown command" errors to the console on loading a config file. Some mod authors also have included QC that sets cvars to what they think they should be (as if your own personal preferences didn't matter), which in the worst case could cause the same "unknown command" errors to appear every frame.

Needless to say that I've since learned my lesson from this.

I've added a compatibility layer and have included all missing cvars and commands from both WinQuake and GLQuake into it. These don't show up in the autocomplete lists, don't get written to your directq.cfg, but just exist to silently soak up this kind of abuse. No more rash of errors that the old modem cvars don't exist when running default.cfg, for example.
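
As a rough sketch of how such a compatibility layer can behave (the `CvarRegistry` class and its methods are hypothetical, not DirectQ's real cvar code): compatibility cvars accept any value without complaint, but are filtered out of autocomplete and config writing, and never replace a real cvar of the same name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Cvar {
    std::string value;
    bool compatOnly; // true for cvars that exist only to soak up old configs
};

class CvarRegistry {
public:
    void Register(const std::string& name, const std::string& defaultValue) {
        assert(!name.empty());
        cvars_[name] = Cvar{defaultValue, false};
    }

    // map::insert leaves an existing entry alone, so a real cvar wins.
    void RegisterCompat(const std::string& name) {
        cvars_.insert({name, Cvar{"0", true}});
    }

    // Returns false for a name that would produce "unknown command".
    bool Set(const std::string& name, const std::string& value) {
        auto it = cvars_.find(name);
        if (it == cvars_.end()) return false;
        it->second.value = value;
        return true;
    }

    // Autocomplete lists never show compatibility cvars.
    std::vector<std::string> Autocomplete(const std::string& prefix) const {
        std::vector<std::string> out;
        for (const auto& kv : cvars_)
            if (!kv.second.compatOnly && kv.first.compare(0, prefix.size(), prefix) == 0)
                out.push_back(kv.first);
        return out;
    }

    // Config text never includes compatibility cvars either.
    std::string WriteConfig() const {
        std::string out;
        for (const auto& kv : cvars_)
            if (!kv.second.compatOnly)
                out += kv.first + " \"" + kv.second.value + "\"\n";
        return out;
    }

private:
    std::map<std::string, Cvar> cvars_;
};
```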

One final bugfix that seems to be needed is that the +map command doesn't seem to work when included on the command-line. I suspect that the reason for this is that I added code to support map names with spaces in them a while back, so it's gobbling the next command and generating an invalid name (and causing the next command to also not work).
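
One way to support names with spaces without gobbling the next command is sketched below. It assumes the rule that a "+" token starts a new command, a "-" token starts an engine switch, and plain tokens belong to the command currently being built. `SplitPlusCommands` is a hypothetical helper, not the fix that went into DirectQ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Groups command-line tokens into console commands.
//   "+map my big map -window +skill 2"  ->  "map my big map", "skill 2"
// Tokens that follow a "-" switch are treated as that switch's parameters
// and are skipped here.
std::vector<std::string> SplitPlusCommands(const std::vector<std::string>& args) {
    std::vector<std::string> commands;
    bool inCommand = false;

    for (const std::string& arg : args) {
        if (arg.empty()) continue;

        if (arg[0] == '+') {
            commands.push_back(arg.substr(1)); // start a new command
            inCommand = true;
        } else if (arg[0] == '-') {
            inCommand = false;                 // a switch ends the command
        } else if (inCommand) {
            assert(!commands.empty());
            commands.back() += " " + arg;      // extend the current command
        }
    }
    return commands;
}
```

The obvious weakness of this rule is a command argument that itself begins with "-", such as a negative number, which would need quoting or special handling.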

This was - kind of - reported by someone who just initially came out and asked me to "make command-line options work" without providing any more info.

A tip for anyone reporting a bug. If I could have one, just one tiny wish in the whole entire world, it is this.

Tell me what the problem is, not what you think the solution is.

You see, the actual cause of the problem might be a long way from what you think the solution is. In this case, command-line options actually do work in DirectQ, but one specific one had a very subtle bug in it that only occurs under very specific circumstances.

If I had been told "I ran DirectQ with the following command-line: '+blah -boo -yadda yadda +berkles 27' and the +berkles part didn't work, but I got the following error in my console: 'could not deflarbulate the convexotron'" it would have been at least 25,000 times as helpful (and 25,000 times more likely to result in a fix).

If I may be so bold as to ask for another wish: if you get an error message, tell me what the error message is. What it says. Error messages aren't just there to spew gobbledygook at people that they can ignore; they provide information to help them with reporting problems.

If I was made Evil Overlord of the Universe tomorrow I would make it mandatory that everyone reporting problems spend at least a year working on a Helpdesk. We would soon see an end to this kind of thing, bwahahahahaha.

Rant over. The next post should be the Release!

1.8.3 Update

I've just made the Release build of 1.8.3; this is something that the actual tests need to be run with as there can occasionally be subtle differences between Release builds and Debug builds.

The executable size is down to 888 KB. Dynamic linking of the DirectX DLLs has been tested and works fine on a number of different machines with a number of different versions of DirectX installed.

Performance has gone up dramatically over 1.8.2; the only performance measurement that makes any real sense to me is on my own machine. This of course is going to be nowhere near relevant to your machine, but it's a useful rough yardstick nonetheless. 1.8.2 had hit the magic 230 FPS in timedemo demo1 on my machine; 1.8.3 fluctuated a bit during development but has now settled at about 255 FPS. The best thing is that there is more to come as suboptimal rendering paths get tweaked and straightened out.

There may be some last minute changes but otherwise things are looking complete and ready, pending the test runs.

Update: some minor last-minute required tweaks have been identified. Nothing earth shattering, and nothing that will delay release, but stuff that's important to get done nonetheless.

Tuesday, April 13, 2010

Entity Occlusions and Lightmaps

It's been a very productive evening for DirectQ with a number of long-standing matters having finally been cracked. The big one is the entity occlusion flip/flopping that I mentioned earlier. This was coming from code that I had written to reuse query objects as soon as they go idle. You may recall from a couple of months back that I had mentioned I was getting hundreds of thousands of queries in some maps. However, I was solving the symptom and not the cause, which was that lightning bolts were generating a huge amount of query objects which would never be picked up on again. The simpler and far more elegant solution was to just not generate queries for lightning bolts at all.

Because of this I have been able to remove a lot of formerly ugly code and revert back to a simpler way of doing things in certain parts of the system. The flip/flopping problem is completely resolved and I have gained quite a substantial performance boost in scenes with high entity poly counts.

I've also been digging some more at lightmaps and have finally uncovered what seems to be the "correct" way of handling lightmap updates. Another large framerate gain came from that. There's not much to be said about this that isn't deeply technical, but it does highlight a serious hole in the D3D documentation. There is a need for a section on "common scenarios" with advice on the correct way to do them. This is the third or fourth such scenario this year where I've had to piece together information from multiple different sections, put 2 and 2 together, take a leap of faith, upset elements of the renderer, and finally come out the other side.

These are all problems that everyone needs to solve all over again for themselves every time, and it's Just Not Good Enough.

The good news however is that tomorrow's planned release is virtually certain. It will be late-ish in the evening (GMT) as there are some small things left to finish up, and I also need to give the engine a good shake-down by playing some Quake.

Traditionally I play through e1 on Easy to make certain that everything works as it should, but this time I think I'm going to do e3, just for a change.

Visual C++ 2010 Express

2010 Express just became available, so I've downloaded and installed it. An initial test upgrade of the DirectQ project went cleanly; it needed a few tweaks regarding the version of the DirectX SDK I had on this machine, after which it compiled well and the first quick run went without any crashes.

There are lots of things to like about 2010. Intellisense is finally fixed for C++ (yes, this time it really does work), autocomplete! (I didn't expect that), the UI seems faster and more responsive, and compile times appear to have been reduced.

Some things not to like. Include and lib directories now have to be specified per-project, the funky colour scheme (I'm of the opinion that Serious Tools should just use the native widgets and colours) and - most damning of all - Windows 2000 is no longer a supported platform for applications built with it.

Because of that last point, DirectQ will for at least the medium term remain on 2008. I'll definitely be making use of 2010 for all of the various little test apps I work with, and also for some of the experimental work I do from time to time.

Release 1.8.3 is Imminent

It won't happen tomorrow (Tuesday) but it should happen Wednesday.

I've implemented a Q3A-alike shader system like I've been threatening to. OK, it doesn't parse .shader files (the parameters are hard-coded into the engine) and it doesn't support multiple passes, and is quite primitive really, but the guts of the support is there and running well (and fast). It's quite a bit more flexible than my old VBO framework too. Right now it's got solid surfaces, alias models and shadows in it, but it needs a little massaging before I can start putting other stuff in.

I've also heavily optimized the vertex submission. This isn't actually a bottleneck in Quake (and especially not in DirectQ), but I view this optimization as providing headroom for possible future work that may take it back down again.

A nice result with shadows - switching them on hardly even slows things down at all. There's an awful lot of good things to be said for batching states and primitives. This performance boost has started me thinking about shadow volumes, and I've done some preliminary reading. I'm not certain if I'll even code them yet but it would be interesting to get a demo app up and running. If I do they would likely be a HLSL-only feature. I've not too much interest in fooling around with multiple render paths.

Up next I want to see if I can fine-tune occlusion queries. Some tests indicate that they're flip-flopping between rendering everything and working correctly every few frames, so I need to see if I can figure what's causing that. This is a pre-1.8.3 job and is the main reason why it won't be released until Wednesday.

I also need to test alpha support with my shaders and tidy up a few small things, but otherwise it's pretty much there.

In other news, I've been fooling around with a version of Aguirre's Engine (mentioned one post back, link there). This is just distracting experimental work, and is purely as a test bed for some ideas - no release here. The objective is to see if I can get it into a state that makes it a more useful engine for actually playing Quake with, and the work mostly involves moving a lot of the big static arrays in it to fully dynamic allocation. I'm also seeing if I can bring a limited OpenGL version of DirectQ's renderer onto it. It's a learning exercise for me rather than anything else, and good fun to just be able to hack and slash at code without worrying about consequences.

Some of what I do here might make it into DirectQ. Some code from my WinQuake experiments has already done so (in a very roundabout way and in a completely different part of the engine), so it's not impossible.

Monday, April 12, 2010

A Hack to Fix a Hack

For some unknown benighted reason Aguirre Quake is popular with some players. I don't know why, it's not intended as an engine for playing with. It's a mapping tool.

From the site:

Enhanced versions of Win/GLQuake for mappers.
And from the readme:
This makes it ideal for mappers who at one time or another experience problems with their maps; there are leaks, entity overflow or other issues that prevent the map from loading in a normal engine. These problems should of course be fixed so the map can be loaded in any engine, but during development this engine can be used to help finding and fixing the problems.
Anyway, attempting to load game saves generated in DirectQ causes this engine to go into an infinite loop. The reason why is that it has code in it to search for a carriage return in the save game comment, which terminates when it reaches the "kills" part of the comment. Only DirectQ has - for the past number of releases - not written the "kills" part.

This has always worked fine with ID Quake, which can load a DirectQ save without any issues. As ID Quake is the gold standard, my opinion is that if something works in ID Quake but not in Engine X, then Engine X is broken. (OK, so DirectQ wrote a non-standard savegame comment so I have to accept some blame myself too...)

So I find myself in the position where I need to put a hack in DirectQ to fix a hack in another engine that was never intended to be used to play Quake in the first place. On the other hand, it does make DirectQ somewhat more robust against evil character sequences in the save game comment (some mappers seem to like to put a multi-line essay in there!) which is after all a good thing.

Final analysis is that I'm a bit annoyed that it happened in the first place but nonetheless happy to have made my engine somewhat better as a result of it.
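
For illustration only, here is a sketch of the kind of hardening described above. It assumes the fixed-width comment layout of the original Quake source (39 characters with the "kills" field starting at column 22); `BuildSaveComment` is a hypothetical name and this is not the code that went into DirectQ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

constexpr std::size_t kCommentLength = 39; // assumed, per the original Quake source
constexpr std::size_t kKillsColumn = 22;   // assumed start of the "kills" field

// Builds a fixed-width, single-line save comment. Control characters such as
// carriage returns and newlines in the level name are replaced with spaces,
// and the "kills" field is always written.
std::string BuildSaveComment(const std::string& levelName, int kills, int total) {
    std::string text(kCommentLength, ' ');

    for (std::size_t i = 0; i < levelName.size() && i < kKillsColumn; ++i) {
        const unsigned char c = static_cast<unsigned char>(levelName[i]);
        text[i] = (c < 32 || c == 127) ? ' ' : static_cast<char>(c);
    }

    // Clamp so the field always occupies exactly 13 characters.
    if (kills < 0) kills = 0;
    if (kills > 999) kills = 999;
    if (total < 0) total = 0;
    if (total > 999) total = 999;

    char buf[32];
    std::snprintf(buf, sizeof(buf), "kills:%3i/%3i", kills, total);
    const std::string killsText(buf);
    text.replace(kKillsColumn, killsText.size(), killsText);

    assert(text.size() == kCommentLength);
    return text;
}
```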

Sunday, April 11, 2010

DirectQ Speeds Up (Yet Again)

Is this getting boring yet? I'm after pulling another quite significant speed increase out of DirectQ by reorganising the way vertex buffers are managed. This was originally something I had planned for 1.9.0, but the more I thought about ways to do it, the more I started itching to do it. Right now it's just on world surfaces, excluding sky and water, so there is quite a bit more to come from it.

Overall it's really heading in the direction of a Q3A-alike shader system now, only that Q1 won't use .shader files so it'll just be restricted to the surface types available in a Q1 map.

What does this mean for next week's planned release of 1.8.3? I don't honestly know just yet. Obviously I'm going to continue with the current work and move everything else across to the new setup, but it's far too early to guess at how long that will take. The basic world surfaces took me about 1 hour to do, but integrating the other data types is going to be a bigger job.

Next week also sees the release of Visual Studio 2010, and - depending on how its legacy OS support pans out (and when the Express versions become available) - I may make the switch. I'll probably write some about that when the time comes.

Update: Change of plan. Firstly, the new setup was relying on a capability that all cards may not have, and secondly it was turning into a lot of work. Basically it's given me some good pointers for what needs to be done in revisiting the refresh, but it won't happen for 1.8.3, and next week's release is still on.

The whole refresh does still need to be revisited; in many respects it's quite clunky and has some seriously ugly code in there (if you've ever looked at d3d_warp.cpp you'll know what I mean). I may yet get something minor done between now and next week, but I won't hold things up for it.

Update 2: Another change of plan. I got it. ;) We're now running 10% faster than 1.8.2 was.

Saturday, April 10, 2010

Game changing in WinQuake

I've just implemented game changing in my WinQuake engine, as well as beginning support for huge maps. In fact the huge map support is done, with the exception of protocol changes. Ugh.

Here's the Marcher Fortress to celebrate:

Friday, April 9, 2010

Strange Performance for 1.8.3

I've experimented with putting the entire world into a static vertex buffer, which should in theory make things really really fast by cutting down quite dramatically on the amount of data sent to the GPU during a frame. Oddly enough however it doesn't. In some cases it even drags performance totally down.

Maybe I'm doing something wrong and need to revisit it, but until I determine what the cause may be I've added a cvar (r_staticworldvbo) to control whether it's used or not. If I do get a resolution I'll likely remove the cvar.

Thursday, April 8, 2010

Optimum Lightmap Sizing

In other news I've been working on tuning the lightmap size for DirectQ. This wasn't a problem in GLQuake as it could mostly target single-texture cards and didn't support huge maps. Older, more innocent times.

It's a very fine balancing act, as the objective of a multitextured/texturechained renderer is to try and ensure that as many surfaces which share the same texture as possible also share the same lightmap. Otherwise the overhead of texture changes (potentially as many as one change per surface) means that a single-textured renderer may well be faster.

You've also got to balance performance in ID1 maps with performance in huge maps. I've found that using tall-but-narrow lightmaps gives good results but the question is how tall and how narrow?

It's further complicated by the fact that there seem to be a number of sweet spots where things appear to balance out well, so you've got a lot of testing to do. Also, there will almost always be aberrant cases where you're basically fighting the map design and eventually just have to accept what you've got as the best you can do for now.

Previous versions of DirectQ (in the 1.8.x series) used a 32 texel wide lightmap and set the height at the max texture height supported by the card. This worked well but had the disadvantage that if even one surface overflowed into a new lightmap you wasted a whole heap of texture memory. It also seemed slightly prone to texture thrashing on Debug builds running in the debugger under WDDM drivers (but so is everything; just that it was especially bad).

For 1.8.3 I've discovered that 64x512 lightmaps also represent another of those sweet spots. There's a saving on texture memory on the overflow case, the lightmaps seem to bind to a TMU faster, and you can still pack a good many surfaces sharing the same texture into a single lightmap. Overall performance is up quite a bit in both ID1 maps and huge maps, but it's still possible to construct a scene that's designed to make it suffer (that will always be the case).
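
To make the packing problem concrete, here is a sketch of a column-height ("skyline") block allocator in the style of GLQuake's AllocBlock, sized for one 64x512 lightmap. The `LightmapAllocator` class is hypothetical, and whether DirectQ packs its lightmaps exactly this way is an assumption.

```cpp
#include <cassert>
#include <vector>

// Tracks, for each texel column, how far down the lightmap is already used.
class LightmapAllocator {
public:
    LightmapAllocator(int width, int height)
        : width_(width), height_(height), columns_(width, 0) {}

    // Finds the position with the lowest top edge where a w x h block fits.
    // Returns false when the block needs a new lightmap.
    bool AllocBlock(int w, int h, int& x, int& y) {
        assert(w > 0 && h > 0);
        if (w > width_) return false;

        int best = height_;
        for (int i = 0; i + w <= width_; ++i) {
            int top = 0;
            int j = 0;
            for (; j < w; ++j) {
                if (columns_[i + j] >= best) break; // no better than current best
                if (columns_[i + j] > top) top = columns_[i + j];
            }
            if (j == w) { // every column in this span beats the current best
                x = i;
                y = best = top;
            }
        }

        if (best + h > height_) return false;
        for (int j = 0; j < w; ++j) columns_[x + j] = best + h;
        return true;
    }

private:
    int width_;
    int height_;
    std::vector<int> columns_;
};
```

Every `false` return here starts a new lightmap, so the cost of an overflow is one whole texture. Assuming 4 bytes per texel (not stated in the post), a 64x512 lightmap is 128 KB.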

No doubt there are other lightmap sizes which offer other tradeoffs and balances, and no doubt at some point in the future I'll come back to those and work some on them, but for now what I've got is a good enough improvement.

Today in DirectQ...

I've been implementing a system to reduce the memory overhead of surface vertexes. I've already done the position element (it's how I can squeeze warpc.bsp into under 30 MB) but I saw some possibility in doing texcoords too.

Well I did it, and - despite trying some trickery to speed things up - it came in with prohibitively slow loading times. ID1 maps weren't too bad, just a second or two slower, but warpc was horrendous. I ended up manually killing the engine after about 10 minutes. It's a shame as this could be quite a memory saving for larger maps, so I'm going to try another idea and see how I get on.

Status Bar and HUD styles in WinQuake

After what seems a long fight I finally got variable HUD styles working in my WinQuake build. All I need to do is finish up the Rogue and Hipnotic stuff and it's done.

This is quite neat and may be ported to DirectQ, although doing so would mean removing DirectQ's customizable HUD. On the other hand this is a lot simpler for the user to get to grips with.

There are basically 7 styles available, selectable via the "viewsize" cvar (it seemed to make sense to use that) or a slider option in the menu:

  • Full Status Bar and Inventory. This is the classic Quake status bar, corresponding to a viewsize of 90, empty regions at the bottom of the screen and all.
  • Full Status Bar without Inventory. This is the smaller Quake status bar showing Armour, Health and Ammo only. Empty regions here too.
  • Alpha overlay Status Bar and Inventory. This is similar to the DarkPlaces bar, with the back pictures getting a 50% alpha stipple (to keep things simpler for software rendering). No empty regions. This is the default with viewsize 100.
  • Alpha overlay Status Bar without Inventory. As above but with Armour, Health and Ammo only. No empty regions.
  • QuakeWorld style HUD and Inventory. This is the HUD from QuakeWorld with the items spaced out a little (just so they're not all bunched together at the edges of the screen). No empty regions. Viewsize 110.
  • QuakeWorld style HUD without Inventory. As above but with Armour, Health and Ammo only. No empty regions.
  • No Status Bar or HUD. No inventory. No empty regions. The very same as the old viewsize 120.
It's OK to change the behaviour of viewsize for an experimental engine like this one, but DirectQ will need something different, so I'm thinking a new cvar will be required if I do decide to use it.

Wednesday, April 7, 2010

Donations/Etc.

I occasionally get asked if I would accept money for what I do on DirectQ. I'm actually incredibly uncomfortable with the idea, and think that it would make it feel like a job.

However, if you do insist on showing some financial appreciation (and I am in no way insisting that you should), my charity/etc of choice is PAWS Animal Rescue. I'm in no way affiliated with them and nothing I say or do is endorsed by them, but I do think they do a wonderful job.

If you don't, or if you disagree, that's cool too.

Pop Quiz Time!

Which one is faster for copying 8 bit source data to a 32 bit locked DirectDraw surface via a palette? This:

    dst[0] = ddpal[src[0]];
    dst[1] = ddpal[src[1]];
    dst[2] = ddpal[src[2]];
    dst[3] = ddpal[src[3]];

Or this:

    *dst++ = ddpal[*src++];
    *dst++ = ddpal[*src++];
    *dst++ = ddpal[*src++];
    *dst++ = ddpal[*src++];

It surprised me a little to learn that it was actually the first. This is essentially the code that my WinQuake engine uses to put the final pixels onto the screen, and - in the course of tidying things up - I had switched a loop from the first of these to the second. Dropped from about 158 FPS to about 125. Switch it back and it goes up again. That kind of difference is no joke.

Of course that's not always the case - putting something similar on lightmap uploads in DirectQ, for example, made things slightly slower - but all the same, it's a good idea not to assume that pointer arithmetic is always better. Sometimes you need to try both and see which works best, and if it's a critical enough area of the code it could be something of an eye-opener.
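If you want to try both on your own machine, here's a minimal sketch that wraps the two snippets in complete functions so they can be timed against each other. The function names and the "count is a multiple of 4" restriction are assumptions made for the example, not what the WinQuake build actually does, and which one wins will depend on your compiler and CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant 1: indexed access, pointers advanced once per group of four. */
void expand_indexed(uint32_t *dst, const uint8_t *src, size_t count,
                    const uint32_t *ddpal)
{
    size_t i;

    assert(count % 4 == 0);

    for (i = 0; i < count; i += 4, dst += 4, src += 4) {
        dst[0] = ddpal[src[0]];
        dst[1] = ddpal[src[1]];
        dst[2] = ddpal[src[2]];
        dst[3] = ddpal[src[3]];
    }
}

/* Variant 2: post-increment on every pixel. */
void expand_postinc(uint32_t *dst, const uint8_t *src, size_t count,
                    const uint32_t *ddpal)
{
    size_t i;

    assert(count % 4 == 0);

    for (i = 0; i < count; i += 4) {
        *dst++ = ddpal[*src++];
        *dst++ = ddpal[*src++];
        *dst++ = ddpal[*src++];
        *dst++ = ddpal[*src++];
    }
}
```

Both functions write exactly the same pixels, so any difference you measure comes down to how the compiler schedules the pointer updates.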

Tuesday, April 6, 2010

The Direct3D driver is very fast

Very fast indeed. Even with the overhead of locking the backbuffer and expanding pixels from 8 bit to 32 bit it still manages to come in faster than anything else I've tried so far (aside from the Null driver, which doesn't draw to screen so it doesn't count).

This isn't traditional D3D as in vertexes, textures, matrixes, etc; oh no. The scene is still fully composed in software and D3D is only used for the very last step, which is putting the composed scene from a memory buffer onto the screen. Same as using D3D to display a bitmap in an image viewing program, really.

I'm going to do an OpenGL driver too just to see how things compare, but I honestly have no idea whatsoever if the code I write here is going to be in any way suitable for this job. But I think nonetheless for completeness' sake it needs to be done.

So far the pecking order is D3D > GDI > DDraw > GDI+. It will be interesting to see where OpenGL fits in here.

Update: I unwound some loops in the DirectDraw driver and it shot ahead. Just goes to show that an API is not enough, and sometimes you do have to do dirty work yourself.

The prototype OpenGL driver turned out to be hellishly slow. The lack of direct backbuffer access totally went against it: strategy #1 using glDrawPixels got 50 FPS (expect that to drop to 30 when/if it goes in the engine), and strategy #2 using a texture fared no better. OpenGL is just far too high level and abstract for this kind of work.

OK, experiments over, the way it looks is this. Two drivers will be included: DirectDraw and GDI. DirectDraw will only be available in 32 BPP modes but will be available both windowed and fullscreen. It's going to be preference number 1, and WinQuake will attempt to start up DirectDraw. If that fails, or if the mode is not supported (16 BPP), we'll use GDI instead. There will be a vid_ddraw cvar, defaulting to 1, which is the behaviour described above. A value of 0 will just skip the attempt to use DirectDraw and go straight to GDI.
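The selection logic described here boils down to something like the following sketch. Everything in it apart from the vid_ddraw value and the 32 BPP rule is hypothetical: the enum, the `init_fn` hooks and the two stub functions are stand-ins for whatever the real startup code does.

```c
#include <assert.h>

typedef enum { DRIVER_NONE, DRIVER_DDRAW, DRIVER_GDI } driver_t;

/* Hypothetical init hook: returns nonzero if the driver started up. */
typedef int (*init_fn)(int bpp);

/* Stand-ins so the logic can be exercised without any real video code. */
int stub_init_ok(int bpp)   { (void) bpp; return 1; }
int stub_init_fail(int bpp) { (void) bpp; return 0; }

driver_t pick_driver(int vid_ddraw, int bpp, init_fn init_ddraw, init_fn init_gdi)
{
    assert(init_ddraw && init_gdi);

    /* DirectDraw is preference number 1, but only in 32 BPP modes
       and only if vid_ddraw has not been set to 0. */
    if (vid_ddraw && bpp == 32 && init_ddraw(bpp))
        return DRIVER_DDRAW;

    /* Either DirectDraw was skipped or it failed: fall back to GDI. */
    if (init_gdi(bpp))
        return DRIVER_GDI;

    /* Nothing worked; the caller treats this as fatal. */
    return DRIVER_NONE;
}
```

For example, `pick_driver(1, 16, stub_init_ok, stub_init_ok)` lands on GDI because a 16 BPP mode never reaches the DirectDraw attempt.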

If GDI fails then something on your machine is terminally f--ked and a crash will be in order.

The other video drivers I wrote will be removed. There's no need for the D3D driver as DirectDraw is now faster, and the GDI+ driver is just a joke.

More WinQuake

Insomnia struck so I fooled around a bit more with my WinQuake engine and implemented switchable video drivers for it. They're really cool and really easy to do; just 4 (but sometimes 5) functions need to be written and you have a new video driver.
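To give a flavour of what a switchable driver might look like, here's a sketch of a driver table with a Null driver plugged into it. The post doesn't name the actual functions, so the five entries below (init, shutdown, lock, present, and an optional set_palette) are purely a guess at a plausible split, and every identifier is made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical driver table: four required functions and one optional. */
typedef struct viddriver_s {
    const char *name;
    int  (*init)(int width, int height);            /* nonzero on success */
    void (*shutdown)(void);
    unsigned char *(*lock)(int *rowbytes);          /* 8 bit buffer to draw into */
    void (*present)(void);                          /* put the buffer on screen */
    void (*set_palette)(const unsigned char *rgb);  /* optional, may be NULL */
} viddriver_t;

/* Null driver: just a memory buffer, nothing ever reaches the screen. */
static unsigned char *null_buffer;
static int null_width, null_height;

static int null_init(int width, int height)
{
    assert(width > 0 && height > 0);

    null_buffer = malloc((size_t) width * (size_t) height);
    if (!null_buffer) return 0;

    null_width = width;
    null_height = height;
    return 1;
}

static void null_shutdown(void)
{
    free(null_buffer);
    null_buffer = NULL;
    null_width = null_height = 0;
}

static unsigned char *null_lock(int *rowbytes)
{
    *rowbytes = null_width;
    return null_buffer;
}

static void null_present(void)
{
    /* intentionally empty */
}

const viddriver_t null_driver = {
    "null", null_init, null_shutdown, null_lock, null_present, NULL
};
```

With a table like this, switching drivers at runtime is a matter of calling shutdown on the old table and init on the new one, then pointing the renderer at the new lock and present functions.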

Right now there are 4 drivers implemented: Null, GDI, GDI+ and DirectDraw. The Null driver just uses a memory buffer and draws nothing to screen. GDI is probably the fastest driver. GDI+ is deathly slow but is likely not even remotely optimized. DirectDraw is almost as fast as GDI but is slightly hampered by having to use a 32 bit back buffer, so it needs to write the screen to memory, then expand that to 32 bit via a palette into the backbuffer before blitting it to screen. Nasty. If I can find a way to make DirectDraw use an 8 bit back buffer in windowed modes and without having to change the display mode it should get even faster.

I'm going to go for broke and add a Direct3D driver to it later on. It will be just D3D using surfaces in the same way as DirectDraw does, but it will be nice to do a performance comparison. OpenGL should also be possible too, but the setup will be ugly. If I can get an 8 bit backbuffer with one of these it will be nice.

I mentioned in passing that you can switch between drivers while the game is running and see the end results immediately. This is via a vid_driver cvar at the moment, but it's not incredibly user friendly. I'm going to see if I can do anything about that, but first I need me a coffee.

Last WinQuake for Today

Can you feel the awesomeness? Can you? Can you feel it?



OK, it's just cheesy stippling to get that alpha effect on the status bar, but all the same, it brings WinQuake into line with the expected look these days. I could, I suppose, go with a 32 bit backbuffer and do proper alpha, but it would mean having to rewrite all the drawing functions (and losing a lot of performance), so cheesy stippled alpha it is.
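For reference, a 50% stipple on an 8 bit buffer can be as simple as skipping every other pixel in a checkerboard pattern. This is only a sketch of the general idea, with a made-up function name and no clipping (the caller is assumed to keep the picture inside the screen); the real drawing code may well differ.

```c
#include <assert.h>

/* Copy a w x h 8 bit picture to the screen buffer at (x, y), writing only
   the pixels on one colour of a checkerboard so the scene shows through
   the other half. */
void draw_pic_stippled(unsigned char *screen, int rowbytes, int x, int y,
                       const unsigned char *pic, int w, int h)
{
    int u, v;

    assert(screen && pic && rowbytes > 0 && x >= 0 && y >= 0);

    for (v = 0; v < h; v++) {
        unsigned char *dst = screen + (y + v) * rowbytes + x;
        const unsigned char *src = pic + v * w;

        for (u = 0; u < w; u++) {
            /* keyed on screen position so neighbouring pictures line up */
            if (((x + u) ^ (y + v)) & 1) continue;

            dst[u] = src[u];
        }
    }
}
```

No blending arithmetic is needed, which is why it stays cheap in a palettized renderer where averaging two colour indexes is meaningless.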

Note the better looking crosshair as well. No reason why other crosshair options couldn't also be made available here.

Monday, April 5, 2010

WinQuake Fun

I almost released my WinQuake build earlier on today, but it's not quite ready so I held back. Good thing too as I hadn't done the muzzleflash interpolation fix from DirectQ yet. Here it is for those of you who are terminally curious:





As you can see from the second shot, the flash moves between barrels. This doesn't happen with DirectQ and is straightforward enough to fix, but just needs some time to port the code changes over.

Non-Power of 2 Textures in DirectQ

Quaddicted has just posted another article in its ongoing series on differences between Software Quake and GLQuake. This one focusses on GLQuake's (rather vile) handling of non-power-of-2 textures, and has motivated me to address the problem a little better in DirectQ. I recommend that you read that article (if you haven't already) before continuing here.

I'll begin by owning up - DirectQ has never handled these too well. A straightforward bilinear interpolation (which is at least better than GLQuake) was as far as I had gone here. Now I've switched it to a dithered triangle filter and have also enabled support for non-power-of-2 texture uploads.

There are 3 types of support available for non-power-of-2 textures. First is the way OpenGL does it, which is not relevant but worth discussing. OpenGL advertises support by means of an extension, but it may or may not be hardware accelerated and you really have no way of knowing. This is the wrong way to do it, for reasons which should be obvious. Direct3D does better by letting you know exactly what the hardware supports (although on the other hand you need to do more work, but it's a fair tradeoff).

Next there is conditional support. This allows hardware accelerated non-power-of-2 textures provided they're not mipmapped, not compressed and not wrapped. DirectQ will detect this and enable it for HUD and menu graphics, which can meet those conditions.

Finally there is unconditional support which takes priority over conditional, and can be applied to any texture. DirectQ will also detect this and enable it.
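The detection can be sketched roughly as below. In real D3D9 code the two bits come from `D3DCAPS9.TextureCaps` as `D3DPTEXTURECAPS_POW2` and `D3DPTEXTURECAPS_NONPOW2CONDITIONAL`; to keep the sketch self-contained it uses local stand-in constants instead, and the enum and function names are invented. Treating the odd "conditional bit set without the POW2 bit" combination as unsupported is a conservative assumption on my part.

```c
#include <assert.h>

/* Local stand-ins for the two texture caps bits (NOT the real values). */
#define CAPS_POW2               0x1u
#define CAPS_NONPOW2CONDITIONAL 0x2u

typedef enum { NPOT_NONE, NPOT_CONDITIONAL, NPOT_UNCONDITIONAL } npot_support_t;

npot_support_t classify_npot(unsigned texture_caps)
{
    int pow2 = (texture_caps & CAPS_POW2) != 0;
    int cond = (texture_caps & CAPS_NONPOW2CONDITIONAL) != 0;

    if (!pow2 && !cond) return NPOT_UNCONDITIONAL; /* no restriction at all */
    if (pow2 && cond)   return NPOT_CONDITIONAL;   /* allowed with strings attached */

    return NPOT_NONE;                              /* power-of-2 sizes only */
}

/* Can this particular texture be uploaded at its native size? */
int npot_allowed(npot_support_t support, int mipmapped, int compressed, int wrapped)
{
    assert(support == NPOT_NONE || support == NPOT_CONDITIONAL ||
           support == NPOT_UNCONDITIONAL);

    if (support == NPOT_UNCONDITIONAL) return 1;

    /* conditional support: not mipmapped, not compressed, not wrapped */
    if (support == NPOT_CONDITIONAL)
        return !mipmapped && !compressed && !wrapped;

    return 0;
}
```

HUD and menu graphics pass the conditional test because they are drawn once at a fixed size with clamped texcoords; world textures generally don't, since they are both mipmapped and wrapped.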

The final approach I guess is to pad textures (like FitzQuake does) and I intend doing something along those lines later on.

Interpolation in Software Quake

Just finishing up interpolation in Software Quake; I think the only thing left to do is my muzzleflashes hack for the view entity. I used the old QER tutorial as a baseline, but did my usual thing of putting the position/orientation interpolation into CL_RelinkEntities (so that the viewmodel doesn't need special hacks).

I found quite a few more bugs in the QER tutorial that just weren't present in GLQuake but which caused things to blow up badly in Software Quake. I'll quite likely write these up at Inside3D at some point in the future, but for now I need to road test the code a little more (and fix those pesky muzzleflashes!). For the most part the bugs relate to entities switching models and previous pose numbers being invalid for the new model. Debugging ASM code sure is hella fun.
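The model-switch problem is easy to picture with a small sketch. The struct layout and names below are hypothetical and much simpler than the real entity code, but the idea is the same: if the model has changed since the last update, the stored previous pose belongs to a different model and has to be thrown away rather than blended from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types for illustration only. */
typedef struct { int numposes; } model_t;

typedef struct {
    const model_t *model;      /* model the entity uses right now */
    const model_t *lastmodel;  /* model the stored poses belong to */
    int   pose1, pose2;        /* previous and current pose */
    float lerpstart;           /* time the current blend started */
} lerp_entity_t;

void lerp_set_pose(lerp_entity_t *e, int newpose, float now)
{
    assert(e && e->model);
    assert(newpose >= 0 && newpose < e->model->numposes);

    if (e->model != e->lastmodel) {
        /* Model switched: the old pose numbers mean nothing for the new
           model (and may be out of range), so restart without blending. */
        e->lastmodel = e->model;
        e->pose1 = e->pose2 = newpose;
        e->lerpstart = now;
    } else if (newpose != e->pose2) {
        /* Same model, new pose: blend from the old current pose. */
        e->pose1 = e->pose2;
        e->pose2 = newpose;
        e->lerpstart = now;
    }
}
```

Without the first branch, an entity that swaps from a model with many poses to one with only a few would blend from a pose index that doesn't exist in the new model, which is the kind of thing that blows up in a software renderer.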

I must say that Software Quake looks weird - neat, but weird - with interpolation. There are times when you really do have to look twice at the scene just to remind yourself that it is software rendering.

I think this engine will be released. It's quite fun to use and a good alternative performer for anyone who can't use hardware acceleration. There are a few more things I want to do with it beforehand, and the new DirectQ 1.8.3 will almost certainly come first, but this one is a nice thing all the same. At some point in the future I may even merge the code into DirectQ, but that's just crazy ambitious talk right now.

Sunday, April 4, 2010

Even more fun with Software Quake

It's interesting to play around with Software Quake. The renderer is really really solid, and you do learn a few things about how Quake is meant to be (things that don't always work right in GLQuake). There's also ample room for improvement in the engine, and a lot of the fun for me comes from implementing some of those improvements.

Right now I've fully ripped out the SciTech MGL library and switched it over to pure GDI. This was a preliminary to doing DirectDraw, but GDI gives ample performance as it is. It seems as though the advantages of DirectDraw over GDI really only apply if you're doing a sprite-based 2D scroller; for Quake where you write pixels into a buffer and then blast that buffer on screen in a single operation there doesn't seem to be much in the way of difference (GDI is cleaner too; DirectDraw dates from a time when DirectX in general was a horrible mess).

The Quake II code came in very handy here (it doesn't use MGL) and I simplified things even further by not bothering with palettized modes. Overall it gets about 120 FPS at 800x600, which was roundabout where I was aiming for and is a great result for WinQuake.

The next step that's tickling my fancy is interpolation. I've been studying the relevant parts of the code and it looks very achievable. It means switching one function from ASM to C but the performance difference is quite minuscule so I think we're on.

No commitments however as to whether or not I'm going to release anything here, but if the end result is solid enough I just might.

All of this means that no DirectQ work got done, but there's no cause to worry there as the current state of DirectQ is nicely stable and I'm happy at the moment to let it mature until such a time as inspiration for the next steps there occurs. I have fairly good ideas as to what those steps are, but implementing them will mean some drastic overhauls of parts of the code so it's not something to approach lightly or recklessly. As soon as a plan of action for them comes together I'll be back to it with a vengeance. Meanwhile I'm having enormous fun.

Saturday, April 3, 2010

More fun with Software Quake

Every now and then I like to go off and do some experimental work. This generally happens after a release milestone and when I need to try out new code in a build that won't pollute my main codebase, or that is sufficiently distant from my main codebase that I can forget about assumptions made in it. Sometimes I do it because I need to, other times I do it because it's fun. Sometimes the end result gets released (the original DirectQ was actually one such experiment) and other times it doesn't.

I mentioned a while back that I was doing some work on a native DirectDraw backend for software Quake but that I was seeing pretty abysmal performance. It did however tickle my fancy to see if I could do the same with a Direct3D backend. There's really nothing much in the difference and the core concepts are the same: you create a device, set up your surfaces, lock your backbuffer, write to it, unlock it, and present your scene. Only the terminology is different. (Note that this is not the way a real DirectDraw app works, but it's the way you need to work for Quake).

So I dug out my test application (that just writes random garbage to screen) and ported it to D3D9. A few optimizations later I was running at almost 200 FPS, a big increase from the 6 FPS I got with DirectDraw.

The neat thing is that now - in the more familiar environment of D3D9 - I've identified a few bottlenecks that really made the DirectDraw version suffer, and they are easily portable. So the next step is - of course - to port them and see what happens.

None of this has any bearing on DirectQ itself of course, but where that stands at the moment is that I'm still adding a few layers of polish in preparation for a 1.8.3 release. I'm off gallivanting around the world for a short while in about 2 weeks' time so I intend having something out before then.

The latest change there is that I've implemented the ID3DXRenderToSurface interface to handle my render to texture requirements for the underwater warp. I had always had some concerns that while the code I had written certainly worked, I had no idea how correct it actually was. Some study of an SDK sample and now I've got something that's more in tune with the way the runtime expects things to be done, which is all to the good in the longer term.

Friday, April 2, 2010

Of course the screenshot WAS from Tenebrae

And everything else was a load of nonsense too.....

Hope you all had a good April Fools day nonetheless. Some genuine stuff now. Here's the full change log for 1.8.3 so far:

  • Added test and failure cases to web download code for several less likely scenarios.
  • Added more robust OS versioning.
  • Allowed a PS version that's downlevel from the VS version and enabled shader optimizations.
  • Optimized surface bbox culling by only checking on surfs where both the leaf and the node intersect the frustum.
  • Removed "hard-coding" of release version from the splash screen.
  • Removed all static linking dependencies from DirectX DLLs.
  • Added d3dx_version cvar to switch between different versions of the D3DX DLLs (default 42).
  • Resolved massive speed drain from drawing the console and strings on some platforms.
  • Tidied up interpolation somewhat and switched it to cubic interpolation.
  • Removed fixed pipeline fog (shader model 3 compatibility).
  • Added HLSL path for underwater warp and optimized by incorporating polyblend with the post-process.
  • Improved timer to only wrap if DirectQ itself is running for > ~49 days.
  • Reworked FPS counter more to my liking.
  • Optimized vertex buffers a little better.
Regarding that d3dx_version cvar: it's coded so that if version 42 is not available it will test for successive downlevel versions until it finds one that is available. The lowest version has now been fixed at 32, as it's the lowest version of D3DX that my shaders will compile against. I haven't yet tested it with version 42 when the full app is compiled against an earlier version of the SDK, but that will be done.
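The fallback search is simple enough to sketch. The D3DX DLLs are named d3dx9_NN.dll, so the loop just counts down from the requested version to 32 and stops at the first one that can be found. The function names and the probe callback are assumptions for the example; on Windows the probe would typically try LoadLibrary on the name, and here a stand-in that pretends only version 36 is installed takes its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define D3DX_MIN_VERSION 32  /* lowest version the shaders compile against */

/* Hypothetical probe: returns nonzero if the named DLL can be loaded. */
typedef int (*dll_probe_fn)(const char *dllname);

/* Stand-in probe for testing: pretends only d3dx9_36.dll is installed. */
int probe_only_36(const char *dllname)
{
    return strcmp(dllname, "d3dx9_36.dll") == 0;
}

/* Search downwards from the wanted version. Returns the version found and
   leaves its DLL name in 'name', or returns 0 with an empty name. */
int find_d3dx_version(int wanted, dll_probe_fn probe, char *name, size_t namelen)
{
    int v;

    assert(probe && name && namelen >= 16);

    for (v = wanted; v >= D3DX_MIN_VERSION; v--) {
        snprintf(name, namelen, "d3dx9_%d.dll", v);

        if (probe(name)) return v;
    }

    name[0] = '\0';
    return 0;
}
```

Called with the default of 42 and the stand-in probe, the search tries 42, 41 and so on until it reaches 36; a request that starts below the only installed version falls off the bottom at 32 and reports failure.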

Overall performance is up a few percent on 1.8.2 owing to some sneaky optimizations. I'm moving towards being able to put everything into vertex buffers and fully decouple the actual render from the client/server processing, but it might take a few more revisions of the code to actually get there. But in general it's looking nice.

Thursday, April 1, 2010

Direct3D 11 Features

I've decided to implement some D3D11 features in DirectQ. There is some nifty stuff you can do which overall gives a cleaner code path and allows me to integrate some things a lot better. This will probably become the primary path for 1.9.0 and onwards, with the old D3D9 path becoming somewhat deprecated, and eventually being removed entirely. It really is a legacy path in this day and age, and there doesn't seem to be much point in continuing to move forward with it.

Right now I've been bringing on a few things in an experimental build over the past few days, and so far I like what I see a lot. It just nicely solves a lot of the problems that exist with D3D9.

Here's the first screenshot from my experimental build. This probably won't reflect what things will ultimately turn out like, but should give an indication of how much can be done in such a short time:



With that in mind it's quite likely that further development of the renderer in the 1.8.x line will cease, as it seems pointless to be working on code that will be eventually removed. 1.8.3 will still happen of course, but will contain enhancements in other areas.

Till next time.

sv_novis 1

One of the problems with the cheesy r_novis 1 translucent water hack is that you can't see entities through a water surface. To address this I've complemented it with an sv_novis cvar which will also instruct DirectQ to ignore visibility on the server-side.

Because this is a server cvar it only takes effect on the server on which it is issued. It will have no effect in multiplayer games. Its sole use is to add some gratuitous eye-candy for single player.

There may be side effects, like monsters that previously couldn't see you suddenly being able to, and it does slow things down even further. DirectQ's larger message size should help prevent packet overflows in most normal sized maps, but they can probably still happen in large maps or maps with lots of entities.

To be honest I don't know how robust it really is, so if you're playing a game with sv_novis 1 and it crashes, you will have to try seeing if it also crashes with sv_novis 0 first. It's not intended as a serious feature, more of a "fun" thing to do; a cheesy hack to fix one specific issue with another cheesy hack.