Wednesday, June 30, 2010

Updates for 30th June 2010

More fine-tuning of the occlusion queries code. I think that "blinky" on brush model entities has now been completely abolished but I'm being cautious and leaving the default at MDL only. There was a really rare and quite subtle crash bug in there that I caught once while not running in the debugger, and was damn fortunate to catch a second time while in the debugger.

Viewport scaling and interaction with the status bar now seems to be rock solid for all combinations of possible configurations.

Another bug with anisotropic filtering has been caught. Sheesh! Why can't D3D have texparams like OpenGL? (OK, it can if I wrap IDirect3DTexture9, but I shouldn't have to.)

One nasty performance killer has been identified where surface batching gets broken. DirectQ renders in texture order, with each texture being sorted front to back, then in lightmap order within that (lightmap building is done in texture order to preserve batching). However that means that the draw order of Texture1/Front/Lightmap1, Texture1/Front/Lightmap2, Texture1/Back/Lightmap1 is possible. Yeah, front to back is good, but batching gives more performance, so the order needs to change to texture, lightmap, f2b. Hanging a surface chain off each lightmap seems to be the way to go, and should further boost a lot of really big and complex scenes. I'll be keeping a copy of the old code on standby just in case it doesn't.
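To make the batching argument concrete, here is a minimal sketch that compares the two orderings. It uses a plain sort rather than surface chains hung off each lightmap, and the `Surface` struct, its fields and the function names are all hypothetical, not DirectQ's actual code. A new draw call is assumed to start whenever the texture or the lightmap changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical surface record; field names are illustrative only.
struct Surface {
    int texture;     // texture id
    int lightmap;    // lightmap id
    float distance;  // distance from the viewer, smaller is nearer
};

// Old order: texture, then front to back, then lightmap.
inline void SortFrontToBackFirst(std::vector<Surface>& surfs) {
    std::sort(surfs.begin(), surfs.end(), [](const Surface& a, const Surface& b) {
        return std::tie(a.texture, a.distance, a.lightmap) <
               std::tie(b.texture, b.distance, b.lightmap);
    });
}

// New order: texture, then lightmap, then front to back within each pair.
inline void SortForBatching(std::vector<Surface>& surfs) {
    std::sort(surfs.begin(), surfs.end(), [](const Surface& a, const Surface& b) {
        return std::tie(a.texture, a.lightmap, a.distance) <
               std::tie(b.texture, b.lightmap, b.distance);
    });
}

// Count draw calls, assuming a new call starts whenever the texture
// or the lightmap differs from the previous surface.
inline int CountBatches(const std::vector<Surface>& surfs) {
    int batches = 0;
    for (std::size_t i = 0; i < surfs.size(); ++i) {
        if (i == 0 || surfs[i].texture != surfs[i - 1].texture ||
            surfs[i].lightmap != surfs[i - 1].lightmap) {
            ++batches;
        }
    }
    return batches;
}
```

With one texture and surfaces at Front/Lightmap1, Front/Lightmap2 and Back/Lightmap1, the old order needs three draw calls and the new order needs two.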

The timer has been smoothed out better. Remember the integer division thing I mentioned where at 72 FPS some frames need to be 13ms and some need to be 14ms? Well now they are.

Overall items are starting to fall off my "to do" list quite nicely, and I haven't been caught by any frustrating time-killing bugs yet, but then again I haven't even touched the sound code problem yet. I'll know for definite when a likely release date will be as soon as I have that done.

Tuesday, June 29, 2010

Updates for 29th June 2010

Today has mostly been bugfixing and code refinement. I'm currently wrapping things up, but am still not too confident about announcing any definite release date yet.

r_occlusionqueries can now take values of 0, 1, 2 or 3. 0 is no occlusions, 1 is MDL only, 2 is brush only and 3 is both. The default is now back to 1, which is consistent with the behaviour of 1.8.4. This has also been added to the Video Options menu. Use r_showocclusions 1 to display stats about your occlusion queries.
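The four values read naturally as two bit flags (1 for MDL, 2 for brush, 3 for both). That reading is an assumption on the part of this sketch; the enum and function names are hypothetical and not taken from DirectQ.

```cpp
#include <cassert>

// Bit values matching the description above: 1 = MDL, 2 = brush, 3 = both.
enum OcclusionFlags { OCCLUDE_NONE = 0, OCCLUDE_MDL = 1, OCCLUDE_BRUSH = 2 };

enum ModelType { MODEL_MDL, MODEL_BRUSH };

// Returns true if a model of this type should get an occlusion query
// for the given r_occlusionqueries value. Bits outside 0..3 are ignored.
inline bool UseOcclusionQuery(int cvarValue, ModelType type) {
    int flags = cvarValue & (OCCLUDE_MDL | OCCLUDE_BRUSH);
    if (type == MODEL_MDL) {
        return (flags & OCCLUDE_MDL) != 0;
    }
    return (flags & OCCLUDE_BRUSH) != 0;
}
```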

There's a nasty crash bug somewhere in the sound code that gets triggered when you enter the sound options menu, or - possibly - when you issue a snd_restart command. I may either remove the sound restart functionality or make it a startup time only option if I don't get a fix. Longer term however I think I'll be looking to port the Quake II (or even Q3A) sound code.

Render to texture scaling down is now going to be locked to a fixed size, and won't be available as an option for the regular view. The status bar is becoming unhappy if its size is not a multiple of 8 screen pixels (as is the main viewport) so I need to tweak this more.

There are about 10 other priority items on my list that I need to fix before I can release. Some of them - like making the viewport sizes multiples of 8 - will take a few minutes; others - like hunting down pointer bugs in the sound restart code - could take seconds, days or even weeks. If a problem is too serious, and if I feel that we can live without (or with reduced functionality of) the feature that depends on it, I'll do what's required to fast-track things. I want this out in July, basically. But when in July? We'll see...

I have a new Maps Menu!

OK, I know that I said I was calling a halt to new features, but this one was important, and 95% of the code was reused from elsewhere too, so here we go.

There were a number of things wrong with the old one. The scrollbox didn't peacefully co-exist with other controls, you couldn't select skill or decide whether you wanted to use the "map" or "changelevel" commands, you could only select a map by BSP name, long map names overwrote parts of the scrollbox, the positioning was all messed up at times. This one takes inspiration from the Demos menu and looks something like this:



All problems fixed; the only thing is that it doesn't show you all of the maps you have installed on the one screen, but right now it's something I'm willing to sacrifice in exchange for the other improvements.

I guess this doesn't matter too much to most people reading this, but for someone new to the game it's important. Plus it's not only friendly to them, it actually teaches them the console commands to use. Double win.

This is a part of the dichotomy in Quake culture. There are two main types of player here, the hardcore experienced type and the casual gamer. Now, DirectQ has had almost 5000 downloads since I moved it to CodePlex (I don't know the old SourceForge stats) and one of the most common ways that people arrive at the site is through Google searches. It's a pretty safe bet that we're not talking about hardcore experienced types here, so - if you're thinking "so what, this is useless to me" - remember that your requirements are not universal.

Nuff said!

Monday, June 28, 2010

Updates for 28th June 2010

Non-VBO fallback implemented in the unlikely event that DirectQ fails to create a Vertex Buffer or an Index Buffer. This is no longer user-selectable.

Changed FPS counter to give an average of times for the past 16 frames and display as integer; benchmarked against FRAPS - yeah, it's good now.
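A rough sketch of that kind of counter, assuming frame times arrive as whole milliseconds and the result is rounded to the nearest integer. The `FpsCounter` class is hypothetical and only illustrates the 16-frame averaging idea.

```cpp
#include <cassert>

// Hypothetical FPS counter: averages the last 16 frame times.
class FpsCounter {
public:
    enum { kWindow = 16 };

    // Record one frame's duration in whole milliseconds.
    void AddFrame(int frameMs) {
        assert(frameMs >= 0);
        total_ -= samples_[next_];   // drop the oldest sample
        samples_[next_] = frameMs;
        total_ += frameMs;
        next_ = (next_ + 1) % kWindow;
        if (count_ < kWindow) {
            ++count_;
        }
    }

    // Average FPS over the recorded frames, rounded to the nearest integer.
    // Returns 0 while no time has been recorded.
    int Fps() const {
        if (total_ <= 0) {
            return 0;
        }
        return static_cast<int>((1000LL * count_ + total_ / 2) / total_);
    }

private:
    int samples_[kWindow] = {};
    int next_ = 0;
    int count_ = 0;
    long long total_ = 0;
};
```

Averaging the times and converting once at the end keeps the display steady even when individual frames flip between neighbouring millisecond values.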

Implemented final drawing function on new renderer.

Added support for external charsets; gl_consolefont (0 = default, classic charset), use "charset-1", "charset-2", etc as names, put the images anywhere you like.

I guess this is one people have been waiting for...

Crap Software Alert!

During the day I have the dubious pleasure of working with Lotus Notes. I reckon I was bad in a previous life and given a choice between coming back as a cockroach or working with Lotus Notes. Next time I'm picking the cockroach.

Everybody should work with Lotus Notes at least once. It's a perfect example of how really and truly shit software evolves and is somehow embraced, while all the time those responsible for it stick their heads in the sand singing "la la la can't hear you" every time someone points out a glaring defect. "It'll be better in version 6!" "It'll be better in version 7!" "It'll be better in version 8!" "Wait till you see 8.5!" And still it's crap. You get the idea.

Sometimes I wonder if it's really just a sick joke and IBM are perfectly aware of exactly what they're doing. The sheer amount of sales bumpf and "give this leaflet to your CEO" waffle on their main Notes page suggests that may be the case.

A good suggested approach for when you're designing software is to ask yourself "how would Lotus Notes do this?" and then do the opposite. It may not work out all of the time, but there's a pretty damn good chance you'll hit the target often enough.

DirectQ update for today is next, with one long-awaited feature putting in an appearance. Stay tuned.

Updates for 27th June 2010

I've decided to call a halt to any new features, so right now we're adding polish and finishing up some things.

Texture filtering modes have been cleaned up and made behave consistently with the "gl_texturemode" command. I've also added them to the video options menu. Anisotropic filtering actually works properly now.

Occlusion queries have had a veritable metric f-ck-load of changes, and are really neat now. At the same time, there are still a few situations where you might get blinky for a frame or two, so I've decided to disable them by default. You can enable them (r_occlusionqueries 1 or the menu) if you need them.

Some cvar defaults have changed. The crosshair is now on by default (value 3), lightstyle interpolation is off by default, and fast lightmaps are on for dynamic lights but off for lightstyles. Of course, if you have other values in your cfg they will take priority.

Video startup has been slightly tweaked. DirectQ will now by default (if you have nothing set in your cfg or if you don't use -width and friends) start up in the best of either an 800x600 or a 640x480 windowed mode. If you have an invalid mode number in your cfg it will do the same. Now, some people may be pissed at me for picking a windowed mode as the default, but they're safer, OK? A windowed mode doesn't have exclusive ownership of your screen, so in the unlikely event of anything going wrong during startup you can still have your desktop. And it is your desktop, not mine, so I believe you're entitled to it.

There's a pile of other stuff to finish off right, so that's what further updates before release time will mostly take the shape of. As soon as I get a better idea of how it's shaping up I should be able to give a provisional release date! Wow!

Sunday, June 27, 2010

Fun with Timers

I've been cleaning up the timer code (I may have mentioned that recently) and have now gotten rid of a lot of the nasty hacks I had done (I may have also mentioned that too), and the whole thing has got me thinking about DirectQ's timer in general.

The key problem with timers is that ever since hardware stopped being shit some years back we're in a position where the traditional recommended way of getting high-precision timing info on Windows (QueryPerformanceFrequency and QueryPerformanceCounter) no longer cuts it. There are a few things wrong with it; firstly an overflow problem (that ID to their credit anticipated and worked around), secondly a problem with multi-core machines (where your app may be switched from one core to another, and incur subtle speed variations as a result) and thirdly a problem with power-saving modes (where the CPU may be now running faster than it was when the app started, and so the info you're basing your timing on is no longer valid).

The recommended way to resolve these is to use timeBeginPeriod and timeGetTime instead, which return time in one millisecond increments. However, there is an even more insidious problem with this, and it goes something like:

"72 FPS was good enough for my grandfather, it was good enough for my father, it's good enough for me, and it will be dang-well good enough for my children too! Now gettouttahere with your fancy new ideas afore I sets the Sabre-Tooth Cat on ye!"

You see, the interesting thing about working with integers instead of floats is that sometimes numbers don't divide evenly. In this case, 1000 milliseconds at 72 FPS gives you 13.888888888888888888888888888888 milliseconds per frame. But because we're working with integers, sometimes it will be 13, other times it will be 14.

So now that we know why Quake II switched to 83.33333 FPS (a nice even 12 milliseconds per frame), what are the consequences for DirectQ? Probably none. That scary man with the Sabre-Tooth Cat is rather big, has a very loud voice and can wield a lot of influence (most of which one would hope he uses for good). However, it does make the problem of obtaining smooth even timing somewhat more complex than it should be, and sometimes when I'm sleeping in my cave at night, after a hard day's work banging stones together to make fire, I really do wish he would go away.

Update:

The current solution looks something like this:
  • cl_maxfps (or host_maxfps, whichever you prefer) is now clamped to 500. At values above 500, frame times will drop to 1 millisecond and rounding errors will be enough to cause accuracy to fall off. I'd actually make this 250 if I thought I could get away with it. Welcome to integer land!
  • Instead of cl_maxfps (or host_maxfps - you get the idea) being an absolute upper limit, timings will average around that value. Sometimes they will be slightly higher, sometimes they will be slightly lower.
  • The net effect is that gameplay is smoothed out. It doesn't feel a single bit rough or raggy in practice. Rounding errors now average themselves out instead of accumulating.
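One way to get whole-millisecond frames that average out to the target is to carry the rounding remainder from frame to frame. The sketch below shows that technique under stated assumptions; it is not necessarily how DirectQ does it, and `FramePacer` is a hypothetical name. The clamp to 500 follows the cl_maxfps limit described above.

```cpp
#include <cassert>

// Hypothetical frame pacer: hands out whole-millisecond frame lengths
// whose long-run average is exactly 1000 / fps.
class FramePacer {
public:
    // fps is clamped to the range 1..500.
    explicit FramePacer(int fps)
        : fps_(fps < 1 ? 1 : (fps > 500 ? 500 : fps)), remainder_(0) {}

    // Length of the next frame in whole milliseconds.
    int NextFrameMs() {
        remainder_ += 1000;          // time owed, in units of 1/fps ms
        int ms = remainder_ / fps_;  // whole milliseconds payable now
        remainder_ -= ms * fps_;     // carry the fraction to the next frame
        return ms;
    }

private:
    int fps_;
    int remainder_;
};
```

At 72 FPS this produces a mix of 13 ms and 14 ms frames that sums to exactly 1000 ms over any 72 consecutive frames, so the error never accumulates.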

Last updates for today

Occlusion queries on the world didn't work out either. The overhead was just too much in the end, so I ended up losing anything I may have gained.

I seem to be touching on close to a practical maximum number of occlusion queries that it's sensible to have running through the pipeline each frame. Bottom line is that each such query imposes some extra workload, and it gets to the point where results may be potentially outstanding for too long. I need to decide on a policy for this, and right now I'm thinking that if a result is outstanding for more than one frame we just draw it anyway - better to draw too much than too little (and have holes in your view and blinking models).
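That policy can be sketched as a small decision function. The `OcclusionState` struct and its fields are hypothetical, and falling back to the last known result while a query is still within its one-frame allowance is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical per-entity query state; not DirectQ's actual structure.
struct OcclusionState {
    bool pending;      // a query was issued and no result has arrived yet
    int issuedFrame;   // frame number on which that query was issued
    bool lastVisible;  // most recent result that did arrive
};

// Decide whether to draw the entity this frame.
inline bool ShouldDraw(const OcclusionState& s, int currentFrame) {
    if (!s.pending) {
        return s.lastVisible;
    }
    // Result outstanding for more than one frame: draw anyway,
    // since drawing too much beats holes and blinking models.
    if (currentFrame - s.issuedFrame > 1) {
        return true;
    }
    return s.lastVisible;  // still within the tolerated latency
}
```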

With this in mind I've also cvar-ized the use of occlusion queries with an r_occlusionqueries cvar. It's also in the Video Options menu. So now if use of occlusion queries causes you any trouble you can just turn them off. In practice you don't even need them 99% of the time, and the slower, older hardware that would benefit the most from them doesn't even support them. Sigh.

The possibility of replacing all of this with an alternative approach does occur to me too. Something like doing software rendering into a small in-memory rectangle in a separate thread is one option. In principle I'm opposed to the general notion of depending too much on anything that requires a certain level of hardware support and needs some special and careful handling. We'll see.

Speaking of menus, I'm going to remove the "go to console" option from the main Options menu. The rationale for this is that if you wanted to go to the console you would have gone to the console, not the menu.

That's it for now.

Saturday, June 26, 2010

Dammit, my code is awesome!

Bet that got your attention. ;)

Anyway, this is by way of saying that the Top Secret plan I had for making the world draw faster has been tried and rejected. It didn't work out.

The idea was (obviously) to put all of the world in one (or more, if needed) big static Vertex Buffer, then use that for rendering from rather than the dynamic buffers I currently use. In theory it means that a good number of vertexes don't need to be downloaded to your 3D hardware at runtime, so the bandwidth saving should make things faster.

And it didn't.

It's worth examining why it didn't. The first and most obvious reason is that in Quake the world is actually not a static thing. You move around in it, things come into and go out of view, pieces of it can move, and so on. Using static storage to represent dynamic objects means that you've got to do an awful lot of hopping around in that static storage in order to find the ones you want. Jump a few hundred bytes forward here, a few thousand backward there, that kind of thing. You could potentially pass back and forth over the same region of memory several hundred times per frame. Slow. On the other hand dynamic storage allows only the objects you actually need to be streamed in, and in the order they're needed too. Linear access, moving forward all the time. Fast.

Second reason is that it broke the batching. Imagine a situation where you have a group of surfaces that all share the same texture. Some of them belong to the world, some of them to moving objects like doors, platforms, etc. DirectQ can draw all of these together in one draw call. Fast. Breaking the batching means it needs 2 or more draw calls to handle them, which slows things down. That's just one texture; now imagine that you have 5, 10 or even 50 textures. Slow.

The third reason is that the current renderer has had the benefit of a lot of micro-optimizations over the past half-year or so. True, it's recently been rewritten, but I was able to carry a lot of them forward. Individually they don't amount to much, but taken together they do a great deal to reduce the overhead of dynamically streaming stuff in.

Overall I'm not displeased. I needed to satisfy my own curiosity here, and I've now done that, so from that perspective alone it was worthwhile. Also, the code was starting to turn a mite ugly. I don't like ugly code. Aside from making Baby Jesus cry, it's something that I just know I'm going to need to come back to in the future and try to understand, and risk upsetting things (and myself) by trying to add new functionality to what would have been an already quite disgusting setup.

I'd already been there with 1.7 when trying to add in alpha surfaces, so it's not somewhere to go again.

So after that detour it's back to occlusions. I'll update on that shortly.

More Occlusion Queries

Occlusion queries rock and DirectQ has been set up to be able to read back the results without stalling the pipeline. The tradeoff is that there is a frame or two of latency in the results, but in practice that's not a problem, especially if you set things up to be reasonably conservative to begin with.

Previously I've used them on MDLs only, but now I've extended them to also include brushmodels. Furthermore, all MDLs and brushmodels in the scene are also potential occluding objects, as well as potential occludees. A slight difficulty with this is that objects may now potentially occlude themselves, but I was able to resolve that by expanding the bounding box slightly. This should also help with cases where a model is just barely occluded but is about to come into view shortly.
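Expanding the box is simple enough to show in a few lines. The `Bounds` struct and the pad parameter here are hypothetical; the post doesn't say how much the box is actually grown by.

```cpp
#include <cassert>

// Axis-aligned bounding box, Quake-style mins/maxs.
struct Bounds {
    float mins[3];
    float maxs[3];
};

// Grow a bounding box by `pad` units on every side. The pad amount
// is an assumption; pick whatever stops the object hiding itself.
inline Bounds ExpandBounds(const Bounds& in, float pad) {
    assert(pad >= 0.0f);
    Bounds out = in;
    for (int i = 0; i < 3; ++i) {
        out.mins[i] -= pad;
        out.maxs[i] += pad;
    }
    return out;
}
```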

Is it fast? You bet! The "difficult" ne_tower scene that I frequently mention has now jumped by about 20 FPS. I think that's a good result.

There is now also potential to run them on the world; possibly at node or leaf level. It's all in the name of lopping off as much overdraw as possible so that things can run ultra-smooth in even the most complex scenes.

There is some small overhead from running occlusion queries in that they need to draw a bounding box (with colour and depth writing disabled) in order to test if the object is going to appear in the final scene, but overall this balances against the performance gain from not drawing objects that don't appear.

They can also bring in some good performance levelling. It's no use running simple scenes at 47 billion FPS if complex (or even moderately complex) ones drag you down to less than 20. It's worthwhile to sacrifice some of that 47 billion in order to bring up the speeds of the slower scenes, don't you think?

Anyway, off to try them out on the world; I'll let you know how that goes shortly. And before you ask, no, that's actually not the other thing I have planned for further accelerating world rendering; there's potentially still more to come.

Update:

Unfortunately that one frame of latency is proving to be a killer when crossing a water line. The current option seems to be to not include water surfaces as potential occluders. Let's see how it goes...

Update 2:

Yup, that did it.

Problems Solved!

This was absolutely horrible stuff. I've said it before, but PIX is an absolutely wonderful tool; the way it lets you drill down into all of the functions that get called, examine their parameters, watch your scene build up on-screen, and so on is so valuable for debugging work.

Anyway, I ran a bad scene through it and fairly quickly noticed that my sky setup and takedown code wasn't getting run. A pop back to Visual C++, examine the conditions under which it does and doesn't get run, set some breakpoints, check when those conditions and values change, and we get to the source quickly enough.

So, the new renderer divides its functionality in two. There's setup code and actual rendering. The setup code builds a list of callback functions and vertex offsets, then writes the stuff that would be rendered into my vertex buffers. The rendering code then replays through the list drawing stuff. The setup can be reset and the list replayed at any point during the frame when things change radically enough that the previous state is no longer valid; this generally only happens when the vertex size changes or when there are too many vertexes and the buffers would overflow, but it can also be called on-demand if I need to.

So what I was doing was running some setup code in a callback function. Setup code that determined what gets buffered up in terms of vertexes and other callbacks. And because that callback hadn't run yet, the determination was not valid which caused things to go quite spectacular on me.
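A stripped-down sketch of that two-phase design shows why the bug bites. Everything here (`DeferredRenderer`, `DrawItem`, the counting stand-in for a draw call) is hypothetical and far simpler than the real renderer; the point is only that callbacks don't run until replay, so nothing in the setup phase can rely on state they set.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Vertex { float x, y, z; };

// One recorded step: a state-setting callback (may be empty) plus
// the range of buffered vertices it applies to.
struct DrawItem {
    std::function<void()> callback;
    std::size_t firstVertex;
    std::size_t numVertices;
};

class DeferredRenderer {
public:
    // Setup phase: buffer the vertices and remember what to do with them.
    // The callback is stored here, NOT run.
    void Submit(const std::vector<Vertex>& verts, std::function<void()> callback) {
        DrawItem item;
        item.callback = std::move(callback);
        item.firstVertex = vertices_.size();
        item.numVertices = verts.size();
        vertices_.insert(vertices_.end(), verts.begin(), verts.end());
        items_.push_back(std::move(item));
    }

    // Render phase: replay the list in submission order, then reset.
    // Returns the number of vertices "drawn" (stand-in for real draw calls).
    std::size_t Replay() {
        std::size_t drawn = 0;
        for (const DrawItem& item : items_) {
            if (item.callback) {
                item.callback();
            }
            drawn += item.numVertices;
        }
        items_.clear();
        vertices_.clear();
        return drawn;
    }

private:
    std::vector<Vertex> vertices_;
    std::vector<DrawItem> items_;
};
```

If a callback flips a flag that later `Submit` calls check, those checks see the stale value because the callback only fires inside `Replay`. Any decision that affects what gets buffered has to be made during setup itself.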

There's a lesson to be learned, but I'm not going to beat up on myself too much over it; the important thing is that it's now working, and that I also know one other thing to watch out for. I also need to go through those callbacks and pull any other setup code out of them in case anything else blows up in the future.

Phew! Result.

Where things are currently at...

The astute (or simply plain fast!) of you will have noticed that I recently made 2 updates and then deleted them quickly afterwards. In case anyone saw that and thought it a little weird, and just for the general info of everyone, here's where things are currently at.

After completing the render so far, I started working on the variable vertex stride matter. This isn't as straightforward as it might seem, and I fairly quickly got into a mess with state changes happening out of order and so on. So from there I backtracked and brought it up again, and while it worked in about 75% of cases, there was a small problem with static entities (only, which was odd) where they were flickering on and off every other frame, and a rather more serious problem where occasionally everything would turn into a weird inside-out tunnel effect that looked like you either had too many drugs or too little sleep.

These would be cool effects if they were intentional, but they weren't. Even more annoying was the fact that this was something I got working fine way back in 1.8.0 or thereabouts.

So now I've ripped things back to basics again and have been building up one step at a time - what I should have done first time around, I guess. This isn't so bad as it seems - the main renderer itself consists of only 3 short functions; everything else is just buffering up data and setting state.

Right now I'm at a stage where I have everything back to vertex buffers, but with the buffers re-initialized for every state change. This is a little wasteful; they only need to be reinitialized when the vertex format changes, so that's the next step. For that I'll be going back to 1.8.4 and checking out how I did it back then.

The good thing though is that I have the first mess I had gotten into definitively sorted now, as back then the very act of reinitializing the buffers caused things to go bad. So things are looking a lot better than they had been these past few days, and I'm hoping to get a result with it soon.

That's about it till next time.

Friday, June 25, 2010

I think the current version of the renderer is now complete...

Translucent water surfaces are a total PITA to get done right. Once again no real standard way of handling them evolved over time, leaving us in a situation where we have to deal with a combination of r_wateralpha and explicit entity alpha on world surfaces, inline brush model surfaces and instanced brush model surfaces, and decide how things interact in SP versus MP games in maps that have been VISed for translucent water versus maps that haven't. Yuck.

My current decision here is that explicit entity alpha will always override r_wateralpha. Everything works seamlessly on all the different types of brush model (so you can use translucent water on ammo boxes if you want), and I'm not going to do anything about r_wateralpha in MP games - mostly because there is no way I have ever found of reliably detecting if a map has been VISed for translucent water.
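The override rule boils down to a one-liner. This sketch assumes an entity alpha of 0 means "not set", which is a convention some Quake engines use but which the post doesn't confirm for DirectQ; the function name is hypothetical.

```cpp
#include <cassert>

// Pick the alpha for a water surface. Explicit entity alpha always
// wins over r_wateralpha. Assumption: entityAlpha <= 0 means "not set".
inline float ResolveWaterAlpha(float entityAlpha, float r_wateralpha) {
    float alpha = (entityAlpha > 0.0f) ? entityAlpha : r_wateralpha;
    if (alpha < 0.0f) alpha = 0.0f;  // keep the result in 0..1
    if (alpha > 1.0f) alpha = 1.0f;
    return alpha;
}
```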

Sigh - why didn't we just decide to use the Q2 SURF_TRANS33 and SURF_TRANS66 flags and be done with it all those years ago?

What else? I've removed a lot of complexity from the renderer by just deciding for now that all vertexes used by everything are padded up to the same size. This is a purely temporary measure until I get deferred vertex stride buffering working in a manner that's pleasing both to me and to older 3D hardware. So the D3D9-specific stream offset I was using is no longer there.

In retrospect I should have been happy to start out that way. There's a lot to be said for keeping things primitive and simple (sometimes so primitive that they're saying "og! og!" and banging stones together to make fire) at early stages of code redevelopment. On the other hand the downside of that is that if you go too primitive you'll probably have to rearchitect the whole thing all over again later on. There's a knack to finding the sweet spot.

So where we're at now is that I need to exhaustively go over all of the various components here and make certain that there's nothing obvious I'm overlooking. Index transfers need optimizing. Then I bring on variable vertex strides one step at a time. Start with sky because it's the first thing drawn so it will be immediately obvious if things aren't working and go from there.

Then there's that idea I had for higher performance again, but this time I'm keeping my damn fool mouth shut until such a time as I know it's going to work.

Till then.

Thursday, June 24, 2010

Updates for 24th June 2010

I've decided to drop the render-to-texture scaling down for the time being; there are occasional glitches with it where it draws on-screen at actual size, rather than stretching back up. It seems as though I'm not quite syncing some states up properly yet, so rather than complicate matters further while everything is still fairly fluid, it's easier to just temporarily remove it.

I've decided to clamp skybox textures to an absolute maximum of 512x512. I think that given the kind of machine that DirectQ is likely to be used on, that's not entirely unreasonable. Like I said, using a set of 1024x1024 textures for a skybox eats 24 MB of video RAM (less if you support compressed textures, of course) and takes an absolute age to load.
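For anyone who wants to check the 24 MB figure, here's the arithmetic as a sketch. It assumes 4 bytes per texel, six square faces and no mipmaps; the function names are hypothetical.

```cpp
#include <cassert>

// Clamp a skybox face dimension to the 512 maximum.
inline int ClampSkyboxSize(int size) {
    return size > 512 ? 512 : size;
}

// Bytes needed for six uncompressed square faces at 4 bytes per texel,
// ignoring mipmaps (an assumption of this sketch).
inline long long SkyboxBytes(int size) {
    assert(size > 0);
    return 6LL * size * size * 4;
}
```

`SkyboxBytes(1024)` comes to 25,165,824 bytes, which is 24 MB, while `SkyboxBytes(512)` is a quarter of that at 6 MB.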

Another 20 FPS have been gained in ID1 timedemos due to some optimizations in transfer of vertex data to the GPU. Unfortunately this doesn't translate into a gain in the big maps where you're geometry-bound, but I have an idea building up that should resolve a good proportion of that.

It's always weird when I'm at this stage, and right now it's reminding me of the time I was bringing the original 1.8.0 up; I know very clearly what needs to be done, but working it around all of the wild and wacky special cases involved with the Quake data formats in a reasonably clean manner is quite challenging, and with plenty of opportunities to take wrong turnings.

Example: I'm currently on the 4th restructuring of the 1.8.5 renderer, and a fifth one is definitely in the works. Right now it uses a hardware capability that's not available on pre-D3D9 hardware, so that has just got to go. Plus there's that idea I mentioned, of which more later...

Tuesday, June 22, 2010

Those Wacky Modders!

An old Quake TC called Shrak has thrown up an interesting crash bug. There is one place in the first map where you get to pick up some armour; as soon as you do that your weapon changes (WTF!) and DirectQ goes down in flames. The reason why is that the weapon model has only one frame, and I have to confess that the possibility of that happening had never occurred to me when I wrote my code to handle interpolation of muzzle flashes (which reads the first 2 frames simultaneously).
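The obvious guard is to never index past the last frame. A minimal sketch, with a hypothetical `FramePair` struct and function name; a single-frame model simply blends frame 0 with itself.

```cpp
#include <cassert>

// The two frame indices to blend between.
struct FramePair {
    int current;
    int next;
};

// Pick the first two frames for the muzzle flash blend without ever
// reading past the end of a model that has only one frame.
inline FramePair SelectFlashFrames(int numFrames) {
    assert(numFrames >= 1);
    FramePair pair;
    pair.current = 0;
    pair.next = (numFrames > 1) ? 1 : 0;
    return pair;
}
```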

If you've ever loaded a map with a skybox in DirectQ you may have noticed that it's sloooooow. Skyboxes require 6 textures (one for each face) and some mods provide 1024 x 1024 skyboxes, for a grand total of 24 MB of uncompressed data to go down to your Video RAM. Ouch! (The fact that DirectQ loaded each skybox twice didn't help either but I've fixed that now).

One popular external texture pack contains some textures with weird resolutions like 503 x 617. These need to be resampled to powers of two before they can be uploaded to your graphics hardware. At such a high resolution we're talking about taking a process that's already none too speedy (uploading a large texture) and making it substantially slower.
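As a quick sketch of the size calculation, assuming the resample rounds each dimension up to the next power of two (rounding down or to nearest are equally plausible choices, and the post doesn't say which is used). The function name is hypothetical.

```cpp
#include <cassert>

// Smallest power of two that is >= n. Rounding up is an assumption.
inline int NextPowerOfTwo(int n) {
    assert(n > 0 && n <= (1 << 30));
    int p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

Under that assumption a 503 x 617 texture becomes 512 x 1024, which is roughly 1.7 times as many texels to resample and upload.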

PNG files contained in PK3s need to be uncompressed twice. Once for the PK3 uncompression and once for the PNG uncompression. Just walk away. Make a cup of tea. Go down the shop and buy a newspaper. Take the dog out chasing rabbits. It might be finished by the time you come back.

Another popular external content pack (and you can Quote me on this one!) has a torch model with almost as many triangles in it as the player model. One map makes very heavy use of this torch model, stuffing maybe 50-60 of them into one part of the map. None of which are culled by visibility from certain angles. It's a minor miracle I've got this performing well, but in the normal run of things you can wave your framerates goodbye. (The ironic thing here is that most of those triangles are concentrated in a very very small part of the model that you can't even see unless you're up really really close to it!)

The saga continues... will it ever end?

Monday, June 21, 2010

And on a more positive note

OK, I promised some more positive news, so here it is. I've now fully resolved the problem with jaggies when using a scaled-down rendertarget surface, so I'm able to provide a rendertarget of half the resolution of your screen (or window) when underwater or otherwise when you have a view blend in effect. This not only completely eliminates the fillrate overhead of using a rendertarget in the first place, but it actually runs somewhat faster than using an un-blended full-sized screen. The really great thing is that it's almost (I hesitate to say fully, although I've certainly noticed no difference) completely identical to how it would look if you had used a full-sized screen.

It's all well and good me saying that, but you need to be certain that I'm not caught up in some wild rush of enthusiasm and am actually reporting the truth. So here you go (click on the pic for a bigger version):



With that kind of result in mind I've also provided an option to apply the same kind of scaling down to the main render. In practice, and at lower resolutions, it doesn't do much. However, if you need to run at your full native resolution (because - for example - your driver gives you black bars around a lower resolution) but the fillrate is killing you, this may come in useful.

When underwater you'll be able to change a lot of factors relating to how the warp is carried out. The speed of the warp, the scale of it, as well as the tessellation factor are all under your control. Just look for cvars starting with r_waterwarp.

There are a few bugs and quirks with regard to how it interoperates with other parts of the code left to work through, but it's worth taking the time and it will be a good end result.

The evil that mods do (part 666)

I've just spent quite some time chasing down a visual weirdness and performance bug, only to discover that the cause of it was that a mod had stuffcmd'ed a change to gl_texturemode behind my back. The mode was invalid for the currently available selections in DirectQ (subtle differences between the way OpenGL and D3D handle texture filtering modes), which caused it to revert to some bizarre combination of I don't know what kind of filtering.

Global message to mod authors everywhere: don't do this kind of thing. You may prefer a certain setting for gl_texturemode, but chances are that the player prefers a different one. Even worse, the setting you prefer might not work on the player's hardware. It might just cause things to run slowly, or it might even crash.

In other words, settings like gl_texturemode belong to the PLAYER, not to the mod. Behaving as if it was otherwise is just showing disrespect to the person who took the time (and possibly expense) to download and run your mod.

As a rule of thumb, if whether or not a setting works depends on an unknown factor that you have no control over, it's probably a damn good sign that you should keep well away from it.

Anyway, the moral of the story is that checking your current settings is always a good idea if weird things suddenly start happening for no apparent reason. Arising from this I'm probably going to expose gl_texturemode via the video options menu. It seems to me that as well as being a player-friendly way to expose settings, a menu is also a heck of a good way for everybody to check the current values of several settings all in one place.
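One defensive measure on the engine side is to refuse any mode name that isn't in a known list, and keep whatever the player already had. A minimal sketch, assuming the six mode names GLQuake accepts for gl_texturemode; the function names are made up for the example, and the set of modes DirectQ really accepts may well be different:

```cpp
#include <cassert>
#include <cstring>

// The filter mode names that GLQuake accepts for gl_texturemode.
static const char *const kTextureModes[] = {
    "GL_NEAREST",
    "GL_LINEAR",
    "GL_NEAREST_MIPMAP_NEAREST",
    "GL_LINEAR_MIPMAP_NEAREST",
    "GL_NEAREST_MIPMAP_LINEAR",
    "GL_LINEAR_MIPMAP_LINEAR"
};

// Hypothetical helper: true if the name is one of the known modes.
bool IsValidTextureMode(const char *name)
{
    for (const char *mode : kTextureModes)
        if (std::strcmp(name, mode) == 0)
            return true;

    return false;
}

// Hypothetical helper: accept the requested mode only if it is valid,
// otherwise keep the current (player-chosen) one.
const char *ResolveTextureMode(const char *requested, const char *current)
{
    return IsValidTextureMode(requested) ? requested : current;
}
```

That way a stuffcmd with a bogus value leaves the player's setting alone instead of dropping into some undefined filtering state.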

For the record, no, I don't know which mod it was. I frequently test with a very wide range of mods, downloading virtually everything as it becomes available, and switching between 4 or 5 and back again in the course of a single testing session. It could have been anything.

Next time we'll have an update on a much more positive note (as soon as I stop chewing bullets over this).

More on Underwater Warps


OK, confirming that I made the right decision, the non-HLSL warp slotted in extremely cleanly, with about half as much code as before. I've also been able to get underwater polyblends for free by blending the texture with the polyblend colour (this was already done for the HLSL warp but never for this), and have made several overall improvements to the warping effect.

It now no longer sizes up slightly when you go underwater, there's no more dramatic heaving and buckling to the sides as you move quickly underwater, and the effect is a little more pronounced than before. You'll need to see it moving to get the full idea of what I mean, but just be assured for now that it's hugely improved without being flashy or ostentatious.

It also scales properly for all of the different variable factors too, as you'll be able to see from the above screenshot.

The next thing I'm going to do is handle some of the other view blends using the same setup. One particular speedup I had discovered before was using a smaller sized rendertarget for bonus flashes. The theory is that the bonus flash is only on screen for a fraction of a second and obscures the view somewhat, so the loss of a small bit of detail for that time is completely unnoticeable.

OK, so it's cheating, but it's also a way to tackle one of the major slowdowns inherited from GLQuake. Plus I have this render to texture framework looking mighty clean now, so I may as well use it rather than have it lying idle but using up video RAM during the majority of gameplay.

Blogger's new editor blows by the way. I want the old one back. :(

Sunday, June 20, 2010

I'm thinking of removing the underwater warp...

OK, before you panic, I will be replacing it with something like what FitzQuake/etc does, so you will still get a swaying underwater distortion effect.

Right, now down to the reasons why. Firstly is that at the present moment it's complicating my whole setup too much. Far too much. Secondly is the fact that it's not going to play nice when I come to do light blooms in the future. Thirdly is the additional video RAM, processing and fillrate overhead of handling render to texture. In some cases framerates can drop to almost half when you go underwater. Fourthly is the fact that it's by now serious legacy code in dire need of a good working over and cleaning up.

One thing I've certainly learned is that when you're spending too much time wrestling with a feature to make it play nice with the rest of your code, then something is wrong. In this particular case, what seems to be wrong is the underwater warp code, as it's the only thing that's causing trouble.

None of this means that it won't be coming back in the future of course. Who knows - it may even come back before 1.8.5 is released, but right now it's holding me back from moving forward.

Opinions?

Update:

OK, I've decided to keep it. It wasn't a decision made lightly, but in the end the proper software Quake style underwater warp is one of the main standout points of DirectQ, and it would have been a real shame to lose it. It's also good to not give up as soon as you start hitting trouble with something.

What I am going to do is remove the HLSL version of the underwater warp. In retrospect, this was the part that was causing the real trouble, and I was having awful difficulties syncing it up with the non-HLSL version, variable screen sizes, variable rendertarget sizes, variable status bar sizes and variable console scales. It was certainly possible alright, but there comes a point when hacking and slashing at existing code (that was never written with that in mind) becomes counterproductive and makes a bigger mess.

In other words, sometimes you need to take a step back before you can start moving forward again. In this case I'm happy that the non-HLSL code is the right way to go. It's a feature that works for everybody too, which weighs in its favour.

Saturday, June 19, 2010

Submit, Batch!

It's always a good sign when you're writing a new renderer if you're able to replace 50+ lines of code from the old one with 3 new lines, and still accomplish exactly the same thing. That's exactly what happened when I just ported over the code for drawing alpha brush model surfaces. It was quite pleasing and definitely indicates that I'm on the right track with things.

The only major remaining item from the main 3D render is alpha liquid surfaces. I'm kind of dreading these as the original code I had written for 1.8.0 turned into quite a tangled mess. I had cleaned it up a little over subsequent releases, but the core horribleness remains. I suppose it's an opportunity to finally gut it and do things right.

Another pleasing outcome is there's now going to be a grand total of one drawing routine in the entire engine that handles everything. It's definitely safe to say that at this point in time, as we're at such an advanced stage in the porting job that there is no question whatsoever of it being otherwise.

Finally for now, I may be experimenting with removing multitexturing from the surface refresh. The reason why is better batching. I think it's going to be 50/50 whether or not we get improved framerates overall from this; on the one hand multitexturing has the advantage that you can do everything in a single pass, whereas on the other hand not multitexturing enables you to batch your vertexes more efficiently.

Now, DirectQ does certain things (some of which are unspeakable) to enable reasonably good batching with a multitextured renderer, but with the way Quake texture/lightmap interaction needs to be set up, it will never be as good as a non-multitextured render. The question is if the gain from better batching offsets the loss from drawing all surfaces twice. In GLQuake it's frequently "yes", but the answer is not so clear-cut with DirectQ. We'll have to find out in other words.
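The trade-off can be put in rough numbers. The sketch below is a simplified cost model rather than anything from the engine: it assumes perfect sorting, no limit on batch size, and that a new draw call is needed only when the bound texture state changes. All of the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct Surf { int texture; int lightmap; };
struct Cost { std::size_t drawCalls; std::size_t surfsSubmitted; };

// Multitextured: one pass, but every distinct (texture, lightmap) pair is
// a separate state and so a separate draw call.
Cost MultitexturedCost(const std::vector<Surf> &surfs)
{
    std::set<std::pair<int, int> > states;

    for (const Surf &s : surfs)
        states.insert(std::make_pair(s.texture, s.lightmap));

    return {states.size(), surfs.size()};
}

// Two passes: one draw call per texture, then one per lightmap, but every
// surface is submitted twice.
Cost TwoPassCost(const std::vector<Surf> &surfs)
{
    std::set<int> textures, lightmaps;

    for (const Surf &s : surfs) {
        textures.insert(s.texture);
        lightmaps.insert(s.lightmap);
    }

    return {textures.size() + lightmaps.size(), surfs.size() * 2};
}
```

With 16 surfaces covering every combination of 4 textures and 4 lightmaps, this model gives 16 draw calls for 16 surfaces multitextured, against 8 draw calls for 32 submitted surfaces in two passes. Which of those wins depends on where the bottleneck is, which is exactly the open question.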

I'll update on how that goes when I get around to it.

More Thoughts for the Future

Just a few thoughts rather than words.

Firstly, I'm not going to get lightmap updates for free with the current architecture. This was a bit ambitious of me, but I still believe that it was a worthwhile goal and I'm glad that I shot for it.

I know how to do animated lightstyles entirely on the GPU, at least for single-component traditional Quake white lighting (RGBA is possible as well but requires beefier hardware). I'm going to consider this for a future release, but it's not on the list for 1.8.5. This will give updates more-or-less for free.

Dynamic lights are trickier. The problem is that we don't have the information sufficiently far in advance, and key elements change on a frame by frame basis. I have some experimental thoughts regarding it that will work on any hardware, but that involves a trade-off of sending a new lightmap to the graphics hardware versus accepting some extra overdraw. I think that the overdraw will be substantially cheaper in the common case, but I know that in the worst case (every visible surface modified by a large number of dynamic lights) it will be substantially more expensive.

There is another solution that requires shaders which I might also consider.

Hardware instancing. I might do it for certain types of object. It needs shader model 3 and requires the per-instance data to be smaller than the cost of just sending the full data directly (which is why it's not suitable for traditional Quake MDLs). Torches could certainly benefit from it though as their animations are largely in lockstep, so the vertexes and texcoords are identical per-instance, with just a matrix and a colour differing. Being client-side static entities means that there's nothing from the server that could upset things either.
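The size condition is easy to express as arithmetic. A minimal sketch, with hypothetical function names and made-up vertex counts; the only point is comparing the bytes streamed per frame with and without instancing:

```cpp
#include <cassert>
#include <cstddef>

// Bytes sent per frame when every instance uploads its full vertex data.
std::size_t FullUploadBytes(std::size_t instances, std::size_t vertsPerInstance,
                            std::size_t vertexBytes)
{
    return instances * vertsPerInstance * vertexBytes;
}

// Bytes sent per frame when the shared mesh is already on the GPU and only
// the per-instance data is streamed.
std::size_t InstancedUploadBytes(std::size_t instances, std::size_t perInstanceBytes)
{
    return instances * perInstanceBytes;
}

// Instancing only makes sense if it sends less data than the direct route.
bool InstancingPaysOff(std::size_t instances, std::size_t vertsPerInstance,
                       std::size_t vertexBytes, std::size_t perInstanceBytes)
{
    return InstancedUploadBytes(instances, perInstanceBytes) <
           FullUploadBytes(instances, vertsPerInstance, vertexBytes);
}
```

For a torch, the per-instance data would be something like a 4x4 float matrix plus a packed colour, so 68 bytes, against (say) 60 vertexes of 20 bytes each. For a general MDL, where every instance can be on a different animation frame, the per-instance data is effectively the whole vertex set and nothing is saved.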

Particles would need a position and a colour, so the only saving per-vertex is two floats (unless I find a clever way of reusing vertexes as texcoords); it's certainly do-able but is it worthwhile? I'll certainly be looking at putting some of the CPU work into a vertex shader for the general (no hardware instancing) case anyway.

Occlusion queries as a means of supplementing PVS? Interesting idea, but you're doubling the raw xyz position submission overhead for the world model. In scenes with a high wpoly count where everything submitted actually should be visible it's going to be bad. Changing the renderer so that every second frame we submit occlusion queries but present nothing to the screen would be a solution in this case. Need to think.

Slow machine performance - I occasionally test on a VMWare session running Windows XP and with VMWare's display driver. This is great for identifying potential bottlenecks in the code; it's how I identified that my console character drawing was a serious slow down. I recommend it to everyone. Right now it's telling me that I've still got room for improvement in my handling of bigger scenes, possibly by reusing vertex buffer contents across multiple frames if things haven't changed enough to require a full rebuild of the world.

That's all for now; these are just some theoretical possibilities for the future. Some, none or all of them might happen, and those that do happen might be wildly different in implementation from how I've described above - things always change radically when you get down to writing code.

Even more Particles! And other stuff!

Particles seem to be one of those things I'm destined to eternally struggle with. While debugging something else I happened to notice that my particle batching was almost totally non-existent. This dates back to the original release of 1.8.0, and was a major cause of slowdowns when rocket and grenade explosions went off (and there I was thinking it was dynamic light updates all this time).

Anyway, this is now fully resolved, so I can render thousands of particles on screen with minimal performance impact. I've also added bounding box culling to particle batches so that anything off screen never even gets to the GPU.
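For reference, this kind of culling usually boils down to growing a box as particles are added to a batch, then testing that box against the frustum planes before submitting anything. A minimal sketch under those assumptions; the struct layouts and names are illustrative and not taken from DirectQ:

```cpp
#include <cassert>

struct Plane { float normal[3]; float dist; };   // inside when dot(normal, p) >= dist
struct Box   { float mins[3]; float maxs[3]; };

// Grow a batch's bounding box to include one particle position.
void AddPointToBox(Box &box, const float p[3])
{
    for (int i = 0; i < 3; i++) {
        if (p[i] < box.mins[i]) box.mins[i] = p[i];
        if (p[i] > box.maxs[i]) box.maxs[i] = p[i];
    }
}

// Take the box corner furthest along the plane normal; if even that corner
// is behind the plane then the whole box is.
bool BoxBehindPlane(const Box &box, const Plane &plane)
{
    float d = -plane.dist;

    for (int i = 0; i < 3; i++)
        d += plane.normal[i] * (plane.normal[i] >= 0.0f ? box.maxs[i] : box.mins[i]);

    return d < 0.0f;
}

// A batch is culled if its box is fully behind any one frustum plane.
bool CullBox(const Box &box, const Plane *frustum, int numplanes)
{
    for (int i = 0; i < numplanes; i++)
        if (BoxBehindPlane(box, frustum[i]))
            return true;

    return false;
}
```

The test is conservative: a box that straddles a plane is kept, so a batch is only ever dropped when nothing in it could be on screen.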

I've also resolved issues with use of larger vertex types than were needed for certain objects, so the slight slowdown when drawing MDLs is now removed.

Finally, I'm intending to experiment with use of hardware occlusion queries for nodes and leafs in the world, as a means of assisting Quake's PVS. DirectQ can do the readback part of hardware occlusion queries effectively for free (without needing to stall the pipeline) so this will be a balancing act between whether the overhead of drawing the surfs and issuing a query is acceptable given the gains from not drawing surfs for real.

My feeling is that it will penalise less complex (ID1) maps but will massively accelerate more complex (ne_tower) ones, so on balance it definitely seems worthwhile investing the time. How much it penalises the common case will be the determining factor.

Gameplay is insanely smooth with this version by the way. It really feels as though the engine isn't even running at all. One thing I want to do is decouple the client input from the main game loop which should mean that in most cases you'll get really good gameplay even if you're running slow. DirectInput buffered mode pretty much gives you that, but I'm not using it 100% properly yet as I don't take any account of the timing info it sends. When I get that in place it will rock.

More next time.

Friday, June 18, 2010

Another update

This will be the last one for today, but here's the full view of absolutely everything (so far) being rendered through the new framework.



The important thing to note is the performance counters in the top-right. This required a grand total of 28 draw calls. That's including sky, liquids, solid surfaces, models (including those hidden behind walls), particles, as well as all of the status bar graphics and console characters (even including the "wrote screenshot" message).

The "stream" count is the number of times that the vertex buffers need to be switched to a new vertex type. For the most part everything on screen uses the same vertex type, but there are some items that use a smaller or larger type.

The "lock" count is the number of times the vertex buffer needs to be flushed to video RAM per frame.

In other words it's very fast and efficient.

Anyway, there are a few more minor items to finish up, then it's on to the other 90% of the job, but overall it's looking quite good.

DirectQ License Decision

OK, I've made up my mind on the way things are going to be.

There are substantial chunks of this engine that I've effectively completely rewritten (or written from scratch) myself. These include:

  • The web download code.
  • The Quake cvar compatibility layer.
  • The alias model loader and renderer, and the new in-memory alias model format.
  • The Direct3D enumeration to string converter.
  • The render to texture framework.
  • The vertex buffer framework.
  • The MP3 player.
  • The memory manager.
  • The unicode converter.

From release 1.8.5 onwards these are going to be public domain works.

There are other parts of it, including the texture manager, the menu framework and the surface refresh, that have been given a quite severe makeover, but nonetheless are still identifiably derived from the original Quake source. These are future candidates for moving to public domain as I continue to rework code.

Other parts still were taken by me from other sources. These would include the unzip code (from zlib by way of Q3A), the MD5 generator (from publicly released RSA code) and maybe a few others. These are going to remain under their original licenses, with the exception of the Q3A stuff which I've exercised my option to upgrade to GPL version 3 with.

For the rest of the code I've also upgraded it to GPL version 3. This includes original ID code as well as code I have obtained from other Quake source ports.

Until such a time as 1.8.5 is released, DirectQ is remaining under its previous licenses, and all previously released versions will remain as they always were.

I think that's about it!

More Progress

Alias models and particles moved over today. Alias models are running slightly slower than before (owing to the larger common vertex size) but I've resolved a major bottleneck in particles with more efficient batching and adding an alpha test to not write anything to the framebuffer with an alpha of 0. I think I'm also going to be able to get away with not indexing them, meaning that I'll have both reduced fillrate overhead and reduced vertexes overhead.

If that isn't neat I don't know what is.

I've also been thinking about ways to make the render more amenable to older hardware, specifically that which doesn't have hardware vertex processing available. It seems to me that in such a scenario copying data to a vertex buffer is just another memory copy we can do without; we might as well transfer it straight to the 3D hardware instead.

Of course I could be wrong and it could be the case that D3D is able to transfer more efficiently from even a software vertex buffer than any code I might write would be. It seems worth experimenting with all the same.

I've also set up vertex streaming so that we can partially stream submitted vertexes to hardware while the CPU is busy setting things up and the GPU is otherwise idle. In theory it should be a win, but in practice it seems to not make much difference.

Anyway, about the only thing left is handling alpha brush and alias models, then I need to go over the code and clean things up, pick up on anything minor I may have omitted (there are a few small things), tweak, polish and add some functionality (I have a few ideas), see how much more speed I can get out of it, and then that part of 1.8.5 will be done. It's the other 90% that needs to be completed after the first 90% is finished, in other words.

Till next time.

Wednesday, June 16, 2010

New Renderer Update

Solid surfaces have now moved across to the new renderer. I haven't bothered with the hyper-optimisations of the DMA transfer to the vertex buffer yet, and I think I'm going to leave it until such a time as I get everything done and then see if I need to do it at all. It's always nice to hold some future speed gains in reserve too.

The setup of solid surfaces rocks hard. It's much much much simpler than it was before, the rendering portion of the code is something like half the size, and the flexibility is really coming through strong. The whole thing is based around callback functions to update state, and uses a common vertex format that means I completely avoid the extra vertex buffer locks and stream source setup that plagued previous versions. That means fast, clean and flexible code.

As I said, 1.8.5 is likely to not be bleeding-edge optimised, but should still run quite a bit faster than 1.8.4 did in most situations.

I've also been working quite a bit on cleaning up the render-to-texture underwater warp update. This was pretty cruddy and fragile code that dates back over a year so it's overdue a sprinkling of freshness. Render-to-texture can hurt quite a lot in fillrate-bound situations, so I've added options to scale down the render target size dynamically at runtime. I think I'll be leaving these as just cvars, as the defaults I've chosen offer a good balance of performance and quality, and I don't want people changing them in menus and then coming to me complaining that it runs slow or looks ugly.

In general whenever I do that you should take it as a signal that this option is one that I don't want you playing with unless you're sure that you know what you're doing, and are prepared to accept the consequences yourself.

Lightmaps are the last item I've been looking at. I'm planning on finding a way to run lightmap updates somehow in parallel with other stuff, as they do represent one of the last real bottlenecks in the engine. When an explosion goes off we see framerates being cut to about a third of what they could be (this depends on hardware) so I want to do something about that. The real problem is that if we're updating a lightmap we need to stall the pipeline until such a time as the update completes before that lightmap can be used. One possible solution seems to be to double-buffer the lightmaps, but we'll see.

Update:

Lightmap updates now run considerably faster than before, with more speed to come. The secret was to shift the update to as early as possible in the frame, before the vertex buffers are built and well before any rendering is done. This means that the lightmap update can largely run parallel with the vertex buffer building, and therefore has a better chance of being finished by the time it's needed to be used.

This will get even faster as I move more rendering to the new setup, as the actual rendering itself is deferred to as late as possible in the frame and then all done in bulk. We might even get to the optimal situation where lightmap updates are entirely parallel and therefore don't need to stall the pipeline at all; in other words we'll be getting them effectively for free.

If that happens I'm going to see how we go with removal of the r_fastlightmaps cvar; after all if updates are free we don't need any hacks to make them fast, do we? A fully dynamically lit map would then become scarily possible. No idea yet what I might do in such a situation.

Water and other liquid surfs just moved over; alias models are next and then the bulk of what we see on-screen is done. After that I need to go over it and pick up anything I've left out, such as alpha surfaces, sprites and particles.

I'm probably going to move particles from triangles to quads. I need this for a custom particle system in the future, and I also need to use Quads here as I need indexed primitives to go into the new renderer. There will be additional overhead on vertexes from this, but a consequent saving on fillrate. Hopefully the way things are going we're not going to be bandwidth-bound, so it should translate into more speed overall.
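Drawing quads as indexed primitives means 4 vertexes and 6 indexes per particle, with the two triangles sharing a diagonal. A minimal sketch of generating such an index buffer on the CPU; the function name is hypothetical, and 16-bit indexes are assumed, which caps a single batch at 16384 quads:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds indexes for numquads quads, each drawn as two triangles
// (0, 1, 2) and (0, 2, 3) over four consecutive vertexes.
std::vector<std::uint16_t> BuildQuadIndexes(int numquads)
{
    static const int order[6] = {0, 1, 2, 0, 2, 3};
    std::vector<std::uint16_t> indexes;

    indexes.reserve(static_cast<std::size_t>(numquads) * 6);

    for (int q = 0; q < numquads; q++) {
        const int base = q * 4;

        for (int i = 0; i < 6; i++)
            indexes.push_back(static_cast<std::uint16_t>(base + order[i]));
    }

    return indexes;
}
```

Since the pattern never changes, an index buffer like this can be filled once at startup and reused for every batch.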

Update 2:

Had a really neat idea last thing yesterday and couldn't resist the temptation to implement it. We've now got the fast updates to video RAM happening automatically irrespective of data format, and have quite good CPU/GPU parallelism, including free lightmap updates. The next step is to construct a way of streaming vertexes to the GPU efficiently, but I have some ideas for that too, so later on today should see a result with that.

Another possible change of license

A while ago I was playing around with the idea of switching my specific changes to a public domain license. PD is compatible with the GPL, and I believe that the bulk of the stuff I've rewritten is sufficiently different from ID's original code to be validly reclassified as an original work by this point in time.

I may yet do that, but for the time being I'm giving some serious consideration to moving up to version 3 of the GPL. This is allowed by the terms under which ID released the Quake source; with the following being included in the header of every Quake source file (my emphasis):

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

The reasoning behind this is a debate on Inside3D over what exactly you are or are not allowed to link to in a GPL application. The specific problem is that the GPL 2 text on this is quite fuzzy, could be read in a permissive manner, or could be read in a quite stringent manner.

Now, my reading of it is that the intention was to prevent people from putting huge chunks of their source code changes into non-Free static libraries and linking to them from a GPL program that just contains a "main" function and anything else they're obliged to abide by the GPL with. Some of the more stringent readings would say that you can't even link to something like the DirectX SDK. There's a line being crossed somewhere here between the spirit of the law and the letter of the law.

GPL 3 explicitly clarifies the whole matter, and makes linking to other non-Free libraries allowable (within certain reasonable bounds; you probably still can't do what I've said above but you can link to non-Free 3rd-party libs). In the same spirit as that which motivated me to switch from OpenGL to Direct3D, it seems as though one way to avoid having to deal with any potential problems is to just take those problems completely out of the picture.

I'll confirm what's happening later on after I've thought about it more.

OK, this was annoying

Just been spending a few hours trying to troubleshoot potential reasons why my scrap texture allocation was causing textures to render quite distorted and weird. It's not fun when your code is pretty much the same as the GLQuake equivalent, but GLQuake works fine and yours doesn't.

I'm not certain if my final understanding of this is correct, but in the end it seems as though there are more problems with D3D's floating point precision. The solution that worked was to align everything on 16-texel boundaries (some of Quake's textures aligned on 8 or 24 texel boundaries). This causes the texcoords generated by the allocator to have no more than 5 decimal places, which seems to preserve the texture correctly during rendering.
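The alignment itself is a one-liner. A minimal sketch, with hypothetical helper names; the scrap size is passed in as a parameter because the real atlas dimensions aren't stated here:

```cpp
#include <cassert>

// Round a texel offset or size up to the next multiple of 16.
int AlignUp16(int texels)
{
    return (texels + 15) & ~15;
}

// Convert a texel position within the scrap to a 0..1 texcoord.
float ScrapTexCoord(int texel, int scrapsize)
{
    return (float) texel / (float) scrapsize;
}
```

So an 8-texel icon occupies a 16-texel slot and a 24-texel one occupies 32, and with (for example) a 256-texel scrap every slot boundary lands on a multiple of 0.0625.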

I'm going to experiment a little more with aligning on strict power of 2 boundaries or on 32-texel boundaries and see if that improves matters further, although to be honest doing so would be wasteful of atlas space (and fillrate) and may not have any really measurable improvement.

Another minor performance improvement was to revert to GLQuake's (and possibly WinQuake's, although I haven't checked) trick of updating the status bar only when it needs to be changed. I need to double-check this against different drivers as it may cause the old flashing status bar problem depending on the page-flipping policy, but I think I've solved the root cause of that at API level (D3D gives you really nice control of the hardware like that). Overall it saves about 5% to 10% on fillrate every frame (assuming the default full status bar) which can really help in situations where you're fillrate-bound (as you're most likely to be with DirectQ).

This only works with the classic status bar; the various overlay and headsup bars/HUDs need to overwrite fully as geometry is drawn under them and can be seen through them in the Z-order each frame.

I've also removed color buffer clearing except when absolutely essential. Color buffer clearing can be extremely fast these days, but not doing it at all is even faster.

Overall 1.8.5 is now running at about 1.5 times the speed in timedemo demo1 that 1.8.0 did. Once I finalize this chunk of code I can go back to porting the rest of the stuff to my new renderer (sky just went in recently) and see how much extra juice I can get out of it.

Tuesday, June 15, 2010

Speeds up (yet) Again!

I'd always known that drawing the status bar was a slow operation in DirectQ, and that running the game fullscreen with no status bar resulted in a significant framerate jump. This was caused by a number of factors, one of which was that I'd removed the "scrap" allocation system (for status bar icons) from the engine.

This is a form of Texture Atlas which was originally written to save on texture changes. On any non-prehistoric (post Voodoo 2) hardware texture changes are cheap, so it might seem irrelevant these days, but there is another unintended advantage of it, which is batching.

The key to high performance in a game engine is the ability to batch multiple primitives in a single draw call. DirectQ is capable of doing this for just about anything you see on-screen (I think that translucent brush models and the underwater warp update texture are the only things that aren't batched), which is where it gets most of its speed from when handling big maps and complex scenes, but any state change requires breaking the batching.

By removing the scrap system I had each status bar icon in its own texture, which meant that each icon needed a texture change, which broke the batching. It doesn't seem like a big deal (it's just the status bar, right?), but depending on factors like what you have in your inventory, there could be over 30 individual textures here. That's 30 unbatched quads being drawn, which will suck the performance out of any engine to quite a surprising degree.

So I'm currently working on restoring the texture atlas system. At the moment it's functional in quite a raw state, but sufficient to confirm that the performance gain is worthwhile. It's not a few tenths of a percent, but more like a 5% to 10% increase. I'm hoping to be able to extend it to support external textures, as well as to have it be capable of coping with any size textures at load time but reduce it to the size that's actually needed at run time.
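For anyone unfamiliar with how a scrap allocator works, here is a minimal sketch modelled on the general approach of GLQuake's Scrap_AllocBlock: keep track of the filled height of every column, and place each new block at the lowest spot where it fits. The class and its names are illustrative only and are not DirectQ's code.

```cpp
#include <cassert>
#include <vector>

class Scrap {
public:
    Scrap(int width, int height)
        : width_(width), height_(height), allocated_(width, 0) {}

    // Finds the lowest position that fits a w x h block and reserves it.
    // Returns false (leaving x and y untouched) if the block doesn't fit.
    bool AllocBlock(int w, int h, int &x, int &y)
    {
        int best = height_;
        int bestx = 0;

        for (int i = 0; i + w <= width_; i++) {
            int top = 0;
            int j;

            // find the highest filled column under this candidate span
            for (j = 0; j < w; j++) {
                if (allocated_[i + j] >= best) break;
                if (allocated_[i + j] > top) top = allocated_[i + j];
            }

            // the whole span sits lower than anything found so far
            if (j == w) {
                bestx = i;
                best = top;
            }
        }

        if (best + h > height_) return false;

        for (int i = 0; i < w; i++)
            allocated_[bestx + i] = best + h;

        x = bestx;
        y = best;
        return true;
    }

private:
    int width_, height_;
    std::vector<int> allocated_;   // filled height of each column
};
```

Every icon placed this way lives in the same texture, so the whole status bar can go out without a single texture change in between.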

There are further gains to be made in other areas of the status bar code, arising from bad decisions I had made when originally porting GLQuake to Direct3D, and I'll talk more about them when the time comes.

Monday, June 14, 2010

Updates for 14th June 2010

After yesterday's fun with skyboxes, today I have bitten the bullet and begun the famous renderer reconstruction that I've been talking about for months. This was originally supposed to happen around 1.8.2 or 1.8.3, then got pushed back to 1.9.0, but now I'm bringing it forward.

The objective here is to put everything in the scene through a single unified render path. This render path will be flexible enough to cope with factors such as toggling between HLSL and fixed modes for all object types, and will also incur far less VBO locking/unlocking/switching overhead than what was previously there.

The longer term objective is to enable HLSL on all 3D objects (at least) for the purposes of more speed, more flexibility, and even something as radical and dangerous as bringing back fog! That won't happen until 1.9.0; there's enough to be getting on with at the moment.

All of this sounds ambitious and like a lot of work, but it's not really. I've already ported all of the 2D console, HUD and menu stuff in about one hour, and I'm about to start tackling 3D objects (think I'll start with particles). So far it's running at about the same speed as before, but there are some serious bottlenecks in the 3D render I'm planning on getting rid of.

I'll update further over the next few days on progress as it gets along.

And it's done!

Cube mapped skyboxes that is, not 1.8.5 (very big grin though).

More Updates

Cubemapped skyboxes! As you can see there's still some work left in adjusting the texture orientations to match those expected by a D3D cubemap, but otherwise the thing is working fine.



The major advantage of this is that I can now use the raw geometry from the BSP for rendering the skybox, so I don't need an initial depth-only pass, I don't need the heavy skybox clipping functions, I completely avoid the overdraw associated with the old way, and I get a rather good saving on fillrate and state changes.
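The reason the raw BSP geometry can be used directly is that a cubemap is looked up by direction, and the direction for any sky vertex is just its position relative to the viewer. The hardware picks the face itself; the sketch below only illustrates the major-axis rule it follows, using the same face order as D3D's cubemap face enumeration. The function names are hypothetical, and the sketch ignores any axis swapping needed between Quake's coordinate system and D3D's.

```cpp
#include <cassert>
#include <cmath>

// Same order as D3D's cubemap faces: +X, -X, +Y, -Y, +Z, -Z.
enum CubeFace {
    FACE_POS_X, FACE_NEG_X,
    FACE_POS_Y, FACE_NEG_Y,
    FACE_POS_Z, FACE_NEG_Z
};

// The lookup direction for a sky vertex is its position relative to the view origin.
void SkyLookupDir(const float vert[3], const float vieworg[3], float dir[3])
{
    for (int i = 0; i < 3; i++)
        dir[i] = vert[i] - vieworg[i];
}

// The face sampled is the one on the axis with the largest absolute component.
CubeFace SelectCubeFace(const float dir[3])
{
    const float ax = std::fabs(dir[0]);
    const float ay = std::fabs(dir[1]);
    const float az = std::fabs(dir[2]);

    if (ax >= ay && ax >= az) return dir[0] >= 0.0f ? FACE_POS_X : FACE_NEG_X;
    if (ay >= az) return dir[1] >= 0.0f ? FACE_POS_Y : FACE_NEG_Y;
    return dir[2] >= 0.0f ? FACE_POS_Z : FACE_NEG_Z;
}
```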

I've also added "sky" and "skybox" alternatives for the "loadsky" command for consistency with other engines. They all work the exact same way of course, so there's no reason to prefer one over the other aside from using what you're used to.

One new usability improvement has gone in. Previously if you used the "game" command it always launched the menu, irrespective of whether or not you changed game from the menu or from the console. Now it only launches the menu if you changed from the menu; changing from the console will leave you at the console. I assume that since you were at the console that's where you want to be.

Sunday, June 13, 2010

Fun with Vertex Buffers

Since the original release of 1.8.0 DirectQ has almost exclusively used hardware vertex buffer objects for rendering most 3D objects on screen. Direct3D is actually pretty good with these, as it possesses nice software fallbacks. Also, even though the geometry data is being updated every frame, D3D provides an efficient means of using dynamic VBOs in a programmer-friendly manner. This gives improved performance in most cases by offering a more direct route to the 3D hardware.

However, your 3D hardware may not like them. It may offer them but run them slowly; they may require too much video RAM for you, or you may just want to render without using VBOs to see what the difference is like. Finally, some maps may be set up such that they are quite inefficient with VBOs (unlikely, but it could happen).

From 1.8.5 I will be offering a means to switch off VBOs via a gl_vertexbuffer cvar (and a menu option, of course). VBO-less mode will also be triggered if DirectQ fails to create a vertex buffer, which should widen the supported hardware base a little.



I wouldn't recommend switching VBOs off as something that people should do out of a sense of caution (or an unreasonable attraction to crusty old hat technology) but if you ever need to, now you can.

Friday, June 11, 2010

Updates for 10th June 2010

Last set of updates for a few days here.

I've been finishing off the luma blend and the palette restructuring. One thing that happened and that you need to know about was that I ended up removing support for compressed textures. This means dropping a few FPS, but the harsh truth is that texture compression was turning out to be far too much trouble in other areas, and that the loss of quality from it had really started becoming noticeable (especially with Quake's low resolution textures). It was also slowing down loading times a bit too much, but that would have been acceptable if it wasn't for the quality loss.

The images here give a good explanation of the kind of thing that was happening. Not nice.

No need to worry too much about the lost framerate; 1.8.5 still has bags of spare speed in reserve, is still faster than 1.8.4 was in regular gameplay, and will still massively outperform anything else in the big complex scenes.

Additionally, there are some vertex buffer optimizations in the pipeline that should restore the lost frames; not certain if they're going to make it for 1.8.5 but they will come.

Update:

I've been able to work around some of this by selectively enabling/disabling compression based on texture types. A better result.

Wednesday, June 9, 2010

Updates for 9th June 2010

Not much today, but I have been working a little more on the surface refresh. The current plan is to go for the adjusted blending modes for external texture lumas after all, but to generalise it so that the new modes are used for everything. This involves removing fullbright texels from the standard texture image, then using a lightmap * texture + luma mode globally.

This will also massively simplify the 2 TMU path as well as make it easier to disable fullbrights properly.

I've also integrated a lot of the palettes that DirectQ uses, which are pretty much a GLQuake hangover. As well as video dimensions, palettes are something that GLQuake kept multiple copies of scattered all over the place. I didn't help matters by making some more of my own, so I'm consolidating them all into a single set of linked palettes (standard, no fullbrights and fullbrights only) that are just used everywhere.
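As a rough illustration of what a set of linked palettes could look like, here is a minimal sketch. It assumes the usual Quake convention that the last 32 palette entries (indices 224 to 255) are the fullbright colours; the type and function names are hypothetical and not taken from the DirectQ source.

```cpp
#include <array>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };
using Palette = std::array<Rgb, 256>;

// The three linked palettes described in the text.
struct LinkedPalettes {
    Palette standard;         // unmodified colours
    Palette noFullbrights;    // fullbright entries blacked out
    Palette fullbrightsOnly;  // everything except fullbrights blacked out
};

// Assumption: fullbright colours occupy the last 32 palette entries.
constexpr int kFirstFullbright = 224;

inline LinkedPalettes buildLinkedPalettes(const Palette &base)
{
    const Rgb black{0, 0, 0};
    LinkedPalettes out{};

    for (int i = 0; i < 256; ++i) {
        const bool fullbright = (i >= kFirstFullbright);
        out.standard[i] = base[i];
        out.noFullbrights[i] = fullbright ? black : base[i];
        out.fullbrightsOnly[i] = fullbright ? base[i] : black;
    }
    return out;
}
```

Black is used for the masked-out entries because it contributes nothing in either half of a lightmap * texture + luma blend.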

Both of these are in a state of partial completion and work is ongoing.

Final job was doing a sanity check on my render to texture setup. It occurs to me that I'll likely be able to use a single rendertarget texture with different viewport sizes here, so that should give us something of a video RAM saving.

I've also been thinking a little about eye-candy features in DirectQ, and it looks as though the first one is going to be bloom. Between a combination of render to texture and the new luma blending (which preserves the full luma colour - a prerequisite) this should be very quickly do-able. It will be HLSL only so fair warning. Of course it will also be disabled by default.

Tuesday, June 8, 2010

Speeds up again

Just been doing some work on my render to texture setup, and have gained about 20% performance in most cases. This was achieved by reducing the dimensions of the rendertarget texture to a power-of-2 below the display mode dimensions. The cost is a little jagginess around the edges of some viewmodels during underwater warps (at lower resolutions), but I'm going to see if there is anything I can do to alleviate that.

On balance it's a question of performance versus quality, and for now performance is winning out. Depending on feedback I get this might change, of course. Also, if I can get it smoother at the edges it will help a lot and I may not need to change it.
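For a sense of the sizes involved, a minimal sketch of the size calculation follows. Whether a dimension that is already an exact power of two gets kept or halved isn't stated above; this version assumes "strictly below", and the function name is made up for the example.

```cpp
// Largest power of two strictly below n.
// Intended for display dimensions (a few thousand at most); returns 1 for n <= 2.
inline unsigned powerOfTwoBelow(unsigned n)
{
    unsigned p = 1;
    while (p * 2 < n)
        p *= 2;
    return p;
}
```

Under that assumption an 800x600 mode would get a 512x512 rendertarget, which covers well under half the pixels of the display mode.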

The main speed gain is in r_hlsl 1 mode, as it depends on the use of a post-processing shader for pickup/damage/etc flashes. r_hlsl 0 won't get the performance increase.

It's also possible to use this to do anti-aliasing as a post-processing effect, which is quite nice. Haven't thought through the full details yet, but it's certainly something that can be easily done (it will knock out the speed gains - and then some - of course).

Update:

All resolved by using the full-sized rendertarget for underwater warps and the scaled down one for bonus flashes.

For reference:

On my main test machine (Intel Graphics FTW!) with timedemo demo1, d3dpro400 gets 90 FPS, my performance optimized DirectFitz gets about 200, as does GLQuake. Release 1.8.4 got about 250-260 FPS; this one easily goes over 300.

Particles!

Last one for today. Up to now particles have always been rendered from system memory. For 1.8.5 you'll have the option to render them from a hardware vertex buffer instead, by setting r_drawparticles 2. This might be faster than using system memory, but then again it might be slower (it is in all tests so far) so you're going to need to try it and find out for yourself.

I've left the default as system memory because the vertex buffer path is slower in all of my tests, and at least this way I've got a default that's consistent with previous releases.

r_drawparticles 0 will completely disable all particle drawing, in case you were wondering. You'll only need this if you're looking for the ultimate in speed and you don't care about drawing particles.

Monday, June 7, 2010

Other updates for today (7th June 2010)

It's looking as though the legacy blend mode for external textures lumas isn't going to make it in this release; further work on it hit a stumbling block that's going to require some more renderer restructuring to get around, so that's most likely deferred.

Thanks to Kempie I now have a functional Hexen 2 build that doesn't require the mission pack files. This is just vanilla Hexen 2 based off the source released by Raven, but it's another good step towards a long term goal which is to enable DirectQ to run Hexen 2 content. This isn't going to happen for the next release, or even for the next few, but it is something I'm determined to attempt.

Texture filtering modes have been changed slightly; the console characters now use nearest neighbour sampling by default. This gives better consistency with GLQuake and makes them easier to see (and read) for values of gl_conscale greater than 0.

Sound

Today it's mostly been the sound subsystem getting some tweaks. Sound is just as important as visuals for giving a sense of immersion, but the worlds of Quake are such quiet places. You get the occasional buzzing computer or crackling torch, but that's really all. This makes it even more critical to have decent sound handling.

Now, Quake's original sound system was designed for a P60 with 8MB of RAM running DOS. This means that there are quite a few constraints and artificial limitations in it. These go beyond obvious factors such as sampling rates and quality, and include things like the channel mixer, sound allocations, reuse and combination of sounds, and so on. Which is where today's work has been focussed.

DirectQ is now able to mix from up to 65536 sound channels. These are dynamically allocated as required and cleared every map load to ease memory pressure. It's also far less gung-ho about reusing channels than the original sound engine was, which altogether gives a much higher quality, crisper and cleaner end result.
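One way to picture on-demand channel allocation is the sketch below. It is not the DirectQ implementation: the `Channel` fields and all names are hypothetical, and only the 65536 limit and the clear-on-map-load behaviour come from the text.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical channel record; a real mixer would track far more state.
struct Channel {
    int entity = 0;
    bool active = false;
};

class ChannelPool {
public:
    static constexpr std::size_t kMaxChannels = 65536;

    // Reuse an idle channel if there is one, otherwise allocate a new one.
    // Returns nullptr only when the hard limit has been reached.
    Channel *acquire()
    {
        for (auto &c : channels_) {
            if (!c->active) {
                c->active = true;
                return c.get();
            }
        }
        if (channels_.size() >= kMaxChannels)
            return nullptr;

        channels_.push_back(std::make_unique<Channel>());
        channels_.back()->active = true;
        return channels_.back().get();
    }

    // Mark a channel as idle once its sound has finished.
    void release(Channel *c)
    {
        if (c)
            c->active = false;
    }

    // Called on every map load to hand the memory back.
    void clearForNewMap() { channels_.clear(); }

    std::size_t allocated() const { return channels_.size(); }

private:
    // unique_ptr keeps channel addresses stable while the vector grows.
    std::vector<std::unique_ptr<Channel>> channels_;
};
```

Because only idle channels are ever handed out again, a sound that is still playing is never cut off to make room for a new one until the limit is hit.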

Another nice result.

Sunday, June 6, 2010

Changes to Textures

OK, with the netcode fully ported (including rcon) it's time to tackle another item that has been on the to-do list for quite some time, and that's elements of how DirectQ handles external textures.

The first thing is that external textures now get the same gamma scaling as native Quake textures do. This is very important for consistency as in the past, if you had a mixture of both texture types in a map, things just looked bad.

The gamma scaling, by the way, is similar to what GLQuake did (which most source ports have removed) and is - so far as I'm concerned - a non-negotiable feature. To see why, you need to do a side-by-side comparison of DirectQ with another engine, like so:



On the left we have the iconic wbrick1_5 texture in a typical Quake engine, whereas on the right you can see how it looks in DirectQ. If you click on the link for the full-sized version you'll see even better how the DirectQ version just looks so much more vibrant, colourful and detailed than the other. This is something that's just totally lost without the scaling, so the scaling stays.
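For anyone curious what this kind of scaling amounts to, here is a minimal sketch of a gamma lookup table. The formula is modelled on the way GLQuake adjusts its palette, but treat the exact constants and rounding as assumptions rather than DirectQ's actual code; the function name is made up.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Build a 256-entry lookup table that maps each colour component
// through a power curve. A gamma below 1.0 brightens the midtones.
inline std::array<std::uint8_t, 256> buildGammaTable(double gamma)
{
    std::array<std::uint8_t, 256> table{};

    for (int i = 0; i < 256; ++i) {
        const double f = std::pow((i + 1) / 256.0, gamma);
        int v = static_cast<int>(f * 255.0 + 0.5);

        // Clamp to the valid byte range.
        if (v < 0) v = 0;
        if (v > 255) v = 255;

        table[i] = static_cast<std::uint8_t>(v);
    }
    return table;
}
```

Running every texel's red, green and blue through `buildGammaTable(0.7)` at load time lifts the midtones while leaving full white untouched.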

Anyway, the second item on the list is handling of fullbrights from external textures. The blend mode DirectQ uses for fullbrights is (mostly) correct so far as native Quake textures are concerned ((lightmap + fullbright) * texture), but doesn't work well with external textures, which were designed with a different blend mode in mind ((lightmap * texture) + fullbright).

Sometimes the end result is that fullbrights just look darker than they should, other times weird things can happen; for example if the base texture has the texels that would have been fullbright removed from it (to prevent over-saturation).
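The difference between the two blend modes is easy to see with a couple of numbers. The sketch below works on a single colour channel in the 0 to 1 range; the function names are invented for the example and the clamping is an assumption about how the final colour is stored.

```cpp
#include <algorithm>

inline float clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Blend used for native Quake textures: (lightmap + fullbright) * texture
inline float blendNative(float lightmap, float texture, float fullbright)
{
    return clamp01((lightmap + fullbright) * texture);
}

// Blend that external texture packs were designed for: (lightmap * texture) + fullbright
inline float blendExternal(float lightmap, float texture, float fullbright)
{
    return clamp01(lightmap * texture + fullbright);
}
```

With a base texel that has had its fullbright removed (texture = 0) and a luma of 1, `blendNative` returns 0 while `blendExternal` returns 1, which is the multiply by zero case. With no luma at all the two modes give identical results.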

I'm currently about half way through fixing this, and the chosen solution is to just switch the blend mode if an external fullbright was loaded. I might go back and rework some of the native texture code so that I can use a consistent render path, but for now that's where things are at.

Until then, it seems reasonable to finish with a shot of DirectQ in all of its external texture glory (yes, including sky).

ProQuake netcode features

In among all of the other small enhancements for 1.8.5, there is going to be a full port of the ProQuake netcode. Whether or not it's going to be fully functional is another matter entirely, but it will be there.

The primary problem is that the ProQuake code contains a collection of quite evil hacks piggybacked on top of protocol 15. These involve a concept of "mods", stuffing data into strings, and generally doing a lot of nasty stuff that would have been better implemented in a new protocol. It also puts lots of tentacles out into other parts of the engine, which can frequently be difficult to find.

Porting the code to DirectQ is non-trivial:

  • Convert it to C++ and fix any conversion errors.
  • Rework cvars and cmds to my new systems.
  • Reintegrate any other changes I had made.
  • Ensure that it doesn't break anything else (hint - this is the hard part).

If DirectQ was a simple engine that just supported protocol 15 and didn't need to care about compatibility elsewhere this would be easier, but that's not the case. Code evolves over time, and it's not always as straightforward as one would like to take existing code that has been crafted over a number of years (and was originally written at a time when fiddly stuff like protocol compatibility wasn't cared about as much) from one engine and just drop it into another engine that has a completely different structure in a number of critical areas.

Example: I've already broken the ability of FitzQuake to play back a demo recorded with protocol 666 in DirectQ just as a result of the recent NAT fix. Now this may not be a big deal for somebody who's only concerned with the multiplayer capabilities of DirectQ, but you can bet that it will be a big deal for somebody else somewhere.

Depending on the feature that gets broken it may even be a deal-breaker, something that's enough to prevent somebody from using DirectQ. That puts me in a "damned if I do, damned if I don't" situation. Porting the code is only 20% of the work; finding and fixing all of these little incompatibilities is the other 80%.

When it comes to prioritizing one feature over another, it needs to be realised that the hardcore multiplayer community is only a very small part of the user base. Releases of DirectQ typically get 400-700 downloads; most of those I suspect are just casual gamers, people on a nostalgia trip who fancy a bit of Quake and just put "DirectX Quake" into Google to see what comes out.

I do however also like to give something to the hardcore crew (otherwise I wouldn't be writing about this!), so the whole setup is by way of explanation that while better multiplayer capabilities are coming, you should not expect them to all come at once and in a relatively short timescale.

Stay tuned.

Friday, June 4, 2010

1.8.5 is now definite

I was kind of hoping that 1.8.4 would mark the end of the 1.8 line, but a problem has come up with certain routers and/or certain servers that needs fixing. The cause is a partial NAT fix that has been implemented in DirectQ, but in some cases it seems as though it's not enough.

Right now I'm porting over huge chunks of the ProQuake netcode to get this done, so in addition to more robust NAT support you're going to get rcon and maybe one or two other things. I'm not certain how robust the bonus stuff is going to be with this next release, so I'd advise you that while it will be there you shouldn't rely on it too much. ;)

The full NAT fix is absolute top priority in other words.

My current intention is that 1.8.5 will definitely be the last, so we're on to 1.9 after that. Wish lists are closed for 1.8.5, but still open for 1.9 (of course).

Thursday, June 3, 2010

More hypothetical "stuff to expect in the future"

Following on from my last post, here's a few other things that I'm planning for 1.9:

Fog

I got rid of fog in DirectQ because of limitations with newer gfx cards (they no longer support the old hardware fog so you basically have to do your own in shaders now). For 1.9 I'm going to be bringing it back, and - yes - doing my own in shaders. I want to do a really good HLSL fog mode, with nice simple features such as fullbright colours glowing through fog.

I'm not certain what the status of non-HLSL fog is going to be, fundamentally because I'm in a position where I'm unable to test it properly. It might remain absent, it might be there but be very basic, and it might become more functional in subsequent releases.

Renderer

I did a small amount of experimental work on the "renderer-as-protocol" idea today and it's looking like a viable way forward. This is needed because I want to move (almost) everything to HLSL for r_hlsl 1 mode; partially to support fog properly, but also for extra performance and flexibility.

Some existing bugs and glitches will be cleaned up.

Multiplayer Features

I want to build a server browser that downloads the server list from QuakeOne.com and lets you select a server to connect to from that via a menu.

Some more ProQuake features will definitely be there. I had hoped to also include the team scores features in 1.8.4 but the need to just get it released ended up being too great. They're virtually a definite.

External Textures and other Content Replacement

Charsets are long overdue being added to the list of items supported for external textures. There's still a sufficient amount of evil and ugliness in my code here that I want to clean out.

Half-Life BSP support fell quite a bit backwards in recent releases. I don't view it as being a critical feature owing to the lack of current mods that actually use Half-Life format BSPs, but I'd like to fix it up for the sake of completeness.

MD3 support is almost a core requirement these days. I've also written MD2 support for my Q2 engine that should be easily "drop-n-go"ed into DirectQ. The way I have my model renderer set up should mean that all I really need for these is loaders.

I'm still intending to do a particle system. Optional and disabled by default, of course, but I want one to be there. It's quite low on the priority list though.

Eye Candy

Effects such as bloom and heat haze might be added. These will be r_hlsl 1 only, and be optional and disabled by default (of course). The post-processing code that I currently use for underwater warps was written to be quite flexible in this regard, and the renderer-as-protocol setup will make things much easier to do.

I have an idea for shadow-mapping that seems like it might work well and with minimum Pain and Suffering.

No commitments to any of these, of course.

Multithreading

Some ideas here. I had an earlier version that ran sound updates in a separate thread but it proved unstable. It's something I'm continuing to read up on, and will hopefully be able to get some good results with.

That's all folks (for now)

Basically the 1.8 line has a good, solid and performant renderer that's moved in the right direction but still needs a bit of a working over in order to get more flexibility in. Having achieved that it's time to seriously start considering the addition of eye-candy and other frivolous things. The core of DirectQ is (and always will be) classic and faithful, but these kinds of features are also good things to have.

Holding off on them until the guts of the engine were in place properly was definitely the Right Thing to do; too many times in the past I've created a mess by jumping for the sexy features too soon. 50% of experience is knowing when not to do things.

On the other hand protocol and multiplayer features are things that don't immediately jump in your face, but that are of benefit to most people. There is always room to improve these from the rather basic Q1 implementation, but I'm a firm believer that changes should be slower and more incremental here.

I'll continue to post updates as more changes come to mind, and I'm always interested in feedback on what I've posted so far, as well as new ideas.

Wednesday, June 2, 2010

Some fun for the next release

I'm holding off on a lot of the bigger plans until I see what falls out of the 1.8.4 release, but I just thought that I'd like to give an indication of some kind as to where you can expect to see things going in future.

Some of these are potential candidates for a hypothetical 1.8.4b (or 1.8.5).

  • Luma handling. DirectQ uses a different luma blend mode to most other engines, which in some cases - depending on the textures you use - can result in a multiply by 0 for the final colour. The trouble is that there are a LOT of released texture packs that assume that the "standard" (but actually incorrect for native Q1 textures) luma blending mode is used, which can cause things to look bad. In the spirit of not crusading off a cliff I'm going to do something about this; most likely switch to the "standard" mode for external textures.
  • Gamma handling. DirectQ currently applies the same 0.7 gamma as GLQuake does to native textures, but does not apply it to external textures. This has needed to be fixed for quite some time now, so it will be soon.
  • Window creation. Since I have now unified the splash screen with the main app window, a bug of sorts has emerged. I use CW_DEFAULT as the initial (on creation) window size and position, but that's wrong - it should be CW_USEDEFAULT (the "cw" in CW_DEFAULT refers to the floating point control word, not CreateWindow). This isn't currently causing any problems - I suspect that MS anticipated that people might do this and have code to check for it and correct it - but it's still wrong. My oops.
  • Renderer restructuring. Those of you who have downloaded the 1.8.4 code will see that there are remnants of an initial attempt at integrating the HLSL VBO paths into the fixed pipeline VBO paths, but that it didn't get finished. I now have a much better idea for this whole thing which involves treating interaction between the renderer setup and the actual rendering as a protocol, and using Quake's well-defined protocol functions for building the lists of items to be rendered. They can still be very lightweight lists, but the gain in flexibility will be quite good (it should get faster too).
  • Performance. Believe it or not there is still ample extra performance to be extracted out of this. One plan is to put the map into a static VBO (instead of building it dynamically per frame), the render restructure will give a few more frames, and there is colossal room for improvement with both MDLs and particles. Dynamic light uploads are also a bigger bottleneck than they need be.

Of these, the first few items are more likely to appear sooner than the last ones.
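To make the renderer-as-protocol idea from the list above a little more concrete, here is a very small sketch of a render list written and read as a byte stream, in the spirit of Quake's MSG_Write and MSG_Read helpers. The opcodes, class names and message layout are all invented for illustration and are not what DirectQ actually does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a lightweight render list.
enum RenderOp : std::uint8_t { ROP_END = 0, ROP_SET_TEXTURE = 1, ROP_DRAW_SURFACE = 2 };

// Setup side: writes commands into a byte buffer.
class RenderList {
public:
    void writeByte(std::uint8_t v) { data_.push_back(v); }
    void writeShort(std::uint16_t v)
    {
        data_.push_back(static_cast<std::uint8_t>(v & 0xff));         // low byte first
        data_.push_back(static_cast<std::uint8_t>((v >> 8) & 0xff));
    }
    void clear() { data_.clear(); }
    const std::vector<std::uint8_t> &bytes() const { return data_; }

private:
    std::vector<std::uint8_t> data_;
};

// Rendering side: reads the commands back in order.
class RenderListReader {
public:
    explicit RenderListReader(const std::vector<std::uint8_t> &data) : data_(data) {}

    bool atEnd() const { return pos_ >= data_.size(); }

    // Reading past the end yields 0, which doubles as ROP_END.
    std::uint8_t readByte() { return atEnd() ? 0 : data_[pos_++]; }

    std::uint16_t readShort()
    {
        const std::uint16_t lo = readByte();
        const std::uint16_t hi = readByte();
        return static_cast<std::uint16_t>(lo | (hi << 8));
    }

private:
    const std::vector<std::uint8_t> &data_;
    std::size_t pos_ = 0;
};

struct RenderStats { int textureChanges = 0; int surfacesDrawn = 0; };

// Walk the list and count what a real backend would have executed.
inline RenderStats execute(const RenderList &list)
{
    RenderListReader reader(list.bytes());
    RenderStats stats;

    while (!reader.atEnd()) {
        switch (reader.readByte()) {
        case ROP_SET_TEXTURE:  reader.readShort(); ++stats.textureChanges; break;
        case ROP_DRAW_SURFACE: reader.readShort(); ++stats.surfacesDrawn;  break;
        default: return stats;  // ROP_END or an unknown opcode stops playback
        }
    }
    return stats;
}
```

The appeal of this shape is that the code deciding what to draw never touches the code that draws it, so an HLSL backend and a fixed pipeline backend could both consume the same list.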

If you have anything on your own wishlist for 1.9, now would be a good time to mention it too. It's fair to give advance warning that requests involving per-pixel lights, bumpmapping and/or real time lights will be quite politely ignored. :)

Tuesday, June 1, 2010

1.8.4 is out

Have fun.

Like I said, there may be some bugs in it that I've missed. I think a reply to this post is probably the best place to report them.

Still no Windows 98 build by the way, as a grand total of zero people requested one for 1.8.3c; this leads me to suspect that my theory about the few who downloaded my previous Windows 98 build might be correct...

UPDATE

A player skin bug got through, so I've put up a patched version of the executable, together with the source code change (in the same archive) needed for the patch. 3 lucky people now have a unique limited edition of release 1.8.4; everyone else gets the patched version.