Sunday, October 31, 2010

Updates for Sunday 31st October 2010

No DirectQ for a short while as I've been Having Fun In Real Life, and have also gone back to some things I've been semi-neglecting in the RMQ engine. It's interesting (if a little schizophrenic!) to ping-pong back and forth between the two like this, as ideas and techniques have more of a chance to cross-pollinate. Also good to pick up one again when I need distraction from the other.

There are some RMQ features that need to come over to DirectQ before I can release 1.8.7; the most significant of these are improved handling of alpha surfaces (sorting by model is not enough) and "fence textures". The latter will be a feature of the upcoming RMQ demo release, so You Have Been Warned.

I also need to port some of the recent protocol ideas I've been testing in DirectQ over to RMQ.

Speaking of RMQ, I've raised the possibility of an earlier "beta" release of the engine with the rest of the team, and have agreement to do so. The thinking behind this is that it represents quite a major rewrite of huge chunks of QuakeSpasm, and - while we do have a pretty good mixture of software and hardware platforms to test on - we obviously don't have everything. It's important therefore to get it out and get as many people bashing on it as possible to uncover any bugs or iron out any wrinkles that we may have missed (or that we may have gotten used to and learned to ignore!)

There's still some current WIP stuff ongoing, but as soon as this comes to a close we'll see what will happen.

Friday, October 29, 2010

More Performance Notes

Things are building up nicely again; I just got maybe 25% extra from it. It's still not as fast as the modified 1.0 I keep for testing and experiments, never mind being as fast as it should be, but it's definitely getting there.

Specific trouble spots.

The sound code I've been using for the past few releases was EFF-YOU-SEA-KAY'ed. I ended up rolling that back to the 1.7.666 level, losing some of the extra functionality I'd added but gaining a whole heap of speed in exchange. I'm going to need to add back the ability to restart the sound system on-demand, as well as player-selectable sampling rates. It's been my stated intention for a while to port the Quake II sound code, so now I have added incentive to do so.

Obviously there were portions of that code still running even when I had thought I'd disabled sound in previous tests. I'm happier with having rolled back though as some of the changes I'd made since were quite fragile.

There is a call to "rand ()" at the start of Host_Frame that I'd always been dubious about (it was in ID's original code) but I'd left in all the same, because I never saw any reason to remove it. That was sucking away a few percent CPU. I've commented it out for now, and am going to run with it like that for a while to see what happens.

Host_FilterTime was brutal; just pushing the stack each run through the main loop was taking about 8% CPU. I've inlined that one and we're in better shape now.

Where things are at now is that the heaviest functions are more or less the ones I would have expected - dynamic lightmap building, alias model vertex interpolation and R_RecursiveWorldNode. There are still some anomalies - a heavy stack push in my Sys_Milliseconds function and the setjmp in Host_Frame being two of the more important ones. There's some runtime checking I've added to the Debug build which is more or less irrelevant (it won't be in Release builds), and drawing the console characters is heavier than I'd like, but not too dramatically bad overall.

I'm now suspecting that the biggest cause of some of these performance drains lies in project options I have selected, so I'm going to play around with reverting some of these to the same as my test/experimental codebase and see what comes out of it. On balance though I'm a lot happier with where things currently stand than I was even only yesterday.

Phew! Onwards!

_________________________


Update.

That worked! It seems to be back up to full speed now, and is going about 55% faster than it was yesterday, or over twice as fast as 1.8.666b was.

What a horrible nightmare of an experience...

Tuesday, October 26, 2010

Where the Wild Things are

I've managed to claw back some of the performance loss that happened with 1.8.666b, but I'm still not all the way there. Currently it's running at about 75% the speed of earlier versions, which translates to maybe 1.5 times the speed of 1.8.666b. It still compares unfavourably to recent OpenGL-based builds I've been testing, as well as to my semi-souped-up DirectQ 1.0 (with added coloured light support and a few other experimental bits in it).

Unfortunately I remain uncertain as to where the rest of the performance has gone. The theory I favour at the moment is that it's timer-related, and that there's something that used to be done every (for example) 5 seconds that is now happening every (same example) 5 milliseconds instead. This could be totally wide of the mark of course. What I have ruled out is anything related to the server, the sound code, or the renderer, so whatever it is, it's almost definitely happening in the client code.

Quite annoying stuff, to say the least.

I've added a fade to the intermission screens (sorta like the menu fade when you bring it up in-game, but not as dark), the main reason being that I feel that the intermission text might be a little hard to see in many scenes. We'll see how that one goes over; I think it looks kinda cool but some might think "it's different! change it back!" without even considering it on its own merits. If anyone can come up with a valid gameplay related reason for changing it back I'll be more inclined to do so, in other words. (One that's more important than the reason why I added it wouldn't hurt either.)

Been playing some of the original Half Life lately too. 12 years after the hype it's interesting to re-evaluate the game, and a lot of the things I found wrong or annoying about it first time round still hold up today; specifically that I find most of the environments quite bland and repetitive. Some of the sparkly effects and environment mapping that Valve added to it over time really look badly out of place among the rest of the late-90s graphics too. It's still a fun game if you can ignore the vast expanses of featureless military grey though. Shame the source code will probably never be released; it would be fun to have a poke at that (especially the software renderer).

Monday, October 25, 2010

Improved Overbright Lighting

Most Quake engines have improved the lighting from what GLQuake offers, by supporting 2x overbright lighting. There is however one small problem with this:



All those black marks are where even 2x overbright light isn't enough and the lighting range gets clamped (I just removed the clamp and let it wrap down to a low number for demonstration purposes). Quake's lighting range evidently needs a little more than that. It's interesting that this even happens with the standard lightmaps in some places.

So 1.8.7 is going to support 4x overbright light as an option. This gives a fuller lighting range so that coloured dynamic lights will no longer saturate to white (very effective, this) and so that really bright highlights in standard lightmapping come out better.

This will be selectable through the gl_overbright cvar (0 = none, 1 = 2x, 2 = 4x) or a menu option.

There is a slight cost in that the lightmap resolution may visibly degrade a little, but in most cases you shouldn't notice it. This can be resolved by going to 64-bit lightmaps, which earlier versions did support, and which I may consider again, but there is a speed penalty there.

Because of this the default remains at 2x overbrighting.
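For the shader crowd, this is roughly the shape of what the pixel shader ends up doing - a sketch only, with made-up sampler and constant names rather than the actual DirectQ code:

sampler diffuseSampler : register(s0);
sampler lightmapSampler : register(s1);
float overbrightScale;	// 1.0, 2.0 or 4.0, driven by gl_overbright

float4 PSWorld (float2 texcoord : TEXCOORD0, float2 lmcoord : TEXCOORD1) : COLOR0
{
	float4 diff = tex2D (diffuseSampler, texcoord);
	float3 light = tex2D (lightmapSampler, lmcoord).rgb;

	// the lightmap is stored scaled down so that values above 1.0 survive the texture stage;
	// multiplying back up here restores the full range
	return float4 (diff.rgb * light * overbrightScale, diff.a);
}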

Saturday, October 23, 2010

Some more fixes and tidying up

Console backscrolling has been fixed so that it no longer shoots you back to the end of the console buffer each time a new message comes in. Because DirectQ can use the Home and End keys for command editing (going to the start or the end of the command you're typing), getting these to interact with backscrolling is an interesting problem. I eventually decided that if you're already at the start of the line and you press Home, it goes to the start of the console buffer, and likewise if you're already at the end of the line and you press End it goes to the end of the console buffer.
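In pseudo-Quake terms the logic works out something like this (the variable names follow the stock Quake console code rather than DirectQ's exactly, and the backscroll value would still get clamped the usual way elsewhere):

if (key == K_HOME)
{
	if (key_linepos > 1)
		key_linepos = 1;	// first press: go to the start of the typed command (position 0 is the ']' prompt)
	else
		con_backscroll = con_totallines;	// already at the start: jump to the top of the console buffer
}
else if (key == K_END)
{
	if (key_lines[edit_line][key_linepos])
		key_linepos = strlen (key_lines[edit_line]);	// first press: go to the end of the typed command
	else
		con_backscroll = 0;	// already at the end: jump back to the bottom of the buffer
}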

There may be some usability glitches to be worked over as successive versions evolve. My objective is to have what DirectQ does when the player presses a key be what the player expects to happen. It needs to be remembered here that this includes new and inexperienced players as well as Quake old-timers. The Home/End behaviour for console line editing is what people expect from exposure to Windows edit controls, so changes to that are really non-negotiable.

I've added IP log support. This is really just for the multiplayer crew and is considered a semi-essential feature by them. In the end it proved quite painless, but there are a number of things to be unhappy about in the default (ProQuake) implementation, not least of which is that it grabs 2 MB of memory to itself at startup, and needs a command-line option to configure a different amount. This goes completely against the ethos of DirectQ where there are no memory limits, only the memory that is needed gets used, and the player never has to specify anything on the command-line. Unfortunately it also uses a rather messy tree structure with some special case checking involved, whereas a linear array with qsort and bsearch would have been a lot cleaner and more maintainable. It'll take quite a bit of work to fix, in other words.
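To illustrate what I mean by cleaner, the linear array version would be little more than this (the iplog_t layout here is invented for the example):

typedef struct iplog_s
{
	unsigned int addr;	// the logged (masked) address
	char name[16];		// most recent name seen from it
} iplog_t;

static iplog_t *iplogs = NULL;	// grown on demand - no 2 MB grabbed up-front, no command-line option
static int num_iplogs = 0;

static int IPLog_Compare (const void *a, const void *b)
{
	unsigned int ia = ((const iplog_t *) a)->addr;
	unsigned int ib = ((const iplog_t *) b)->addr;
	return (ia > ib) - (ia < ib);
}

// qsort (iplogs, num_iplogs, sizeof (iplog_t), IPLog_Compare) after adding entries, then:
iplog_t *IPLog_Find (unsigned int addr)
{
	iplog_t key = {addr, ""};
	return (iplog_t *) bsearch (&key, iplogs, num_iplogs, sizeof (iplog_t), IPLog_Compare);
}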

I'm also implementing more of the ProQuake messaging system for the multiplayer folks. It'll probably never be really "complete" (just ditching protocol 15 and replacing it with a new extensible protocol based on that would have been preferable; the messaging system as it stands is a mess) but it will build up a little more than it currently is. Unfortunately it puts its tentacles into all kinds of other places in the code, so implementing any part of it is a lot more work than it should be.

Speaking of protocols, I have some more enhancements to the RMQ protocol coming up. I want full floating point range throughout the engine for all coords and angles; there's no excuse for retaining the quantized versions in a modern protocol (these were especially evil for rotating brush models). Client angles are now the last to move over, and this gives a subtle but noticeable improvement in smoothness. I also want time transmissions as milliseconds so as to avoid accumulated floating point precision loss. For the last few versions DirectQ has been a test bed for these changes, and this is going to continue.
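The difference is easiest to see at the message-writing level; something along these lines (the flag names are illustrative, not necessarily what the final RMQ protocol will use):

void MSG_WriteCoord (sizebuf_t *sb, float f, unsigned int flags)
{
	if (flags & PRFL_FLOATCOORD)
		MSG_WriteFloat (sb, f);	// full floating point range, no quantization
	else
		MSG_WriteShort (sb, (int) (f * 8));	// stock protocol 15: 1/8th unit resolution
}

void MSG_WriteAngle (sizebuf_t *sb, float f, unsigned int flags)
{
	if (flags & PRFL_FLOATANGLE)
		MSG_WriteFloat (sb, f);
	else
		MSG_WriteByte (sb, ((int) (f * 256 / 360)) & 255);	// stock: 256 steps around the full circle
}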

I'm coming to more decisions regarding fog handling, but I haven't coded anything yet. I am satisfied however that the visual discontinuity when moving from an empty leaf to a liquid leaf is sufficient to justify changing fog colour. The specific scenario I'm thinking of is when you're in a map with red fog and you go underwater (or when you're in a map with blue fog and you go into lava). There is some interaction with the "v_cshift" command to be considered. In an ideal world this command either wouldn't exist (not so great as there are useful things a modder can do with it) or would be more flexibly implemented. More on that later, perhaps. A way to let mappers specify their own fog colours for liquid would also be useful, but I have no thoughts on that just yet.

How to handle maps where no "fog" key is set is an interesting dilemma. There is sufficient ambiguity as to what the mapper's intention is in this situation, not to mention a large enough pre-existing body of work, to ensure that no perfect solution can exist. I'm reasonably happy with my current ideas on this one, which I've outlined earlier, but no doubt there are going to be some who feel this approach isn't satisfactory.

The decision of whether or not to provide a default fog colour for maps depending on worldtype is made: I won't. The reason is that the mapper's intention might have been to use the default grey fog, and I have no way of knowing. This is a case where the existence of previous work turns things the other way. The fact that it's possible to stuffcmd a "fog" from QC without specifying a colour strengthens the case for this decision, as it's clear in that situation that the intention is default grey.

I guess all of these underline how poorly thought through the implementation of fog in GLQuake always was, but hindsight is definitely 20/20 here.

Wednesday, October 20, 2010

Fair Warning

I'm making some changes to the video mode enumeration at startup. The reason for this is that some drivers weren't getting modes which they should get (like an 800 x 600 fullscreen mode was available but an 800 x 600 windowed mode wasn't) and some were getting modes which they shouldn't (like 2560 x 1780 on a screen that only went up to 1280 x 800).

Historically whenever I do this DirectQ goes berserk and everyone has to wipe their configs. This time I think I have a solution, which is to store a "VIDDRIVER_VERSION" in your config. If that stored value isn't what the current version of DirectQ expects (or if it isn't there at all) you'll just get a reasonably safe windowed mode instead.

The reason for that is that the value of vid_mode in your config may not be valid in the new modelist. And as I said, with the addition and removal of modes on some drivers, a new modelist can't be avoided. Having to go into the Video Options menu and set it back the way you want is a one-time only operation, and will only be needed whenever I make potentially breaking changes to the video startup code, which shouldn't be too often.
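The check itself is trivial - something like this, with the names being illustrative:

#define VIDDRIVER_VERSION	2	// bump whenever the mode enumeration changes

cvar_t vid_driverversion = {"vid_driverversion", "0", CVAR_ARCHIVE};

void VID_ValidateStoredMode (void)
{
	if ((int) vid_driverversion.value != VIDDRIVER_VERSION)
	{
		// the stored vid_mode was chosen against a different (or missing) modelist,
		// so don't trust it - fall back to a safe windowed mode instead
		Cvar_SetValue ("vid_mode", MODE_WINDOWED);
		Cvar_SetValue ("vid_driverversion", VIDDRIVER_VERSION);
	}
}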

Tuesday, October 19, 2010

First Fog Decision

If the new map is the same as the previous one, let's not reset fog. We leave it as it was before. If the new map is different, we wipe the settings. In both cases we still let the settings intended for the map (if present) override everything else.
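Or in code form, roughly (the names are mine):

static char fog_lastmap[MAX_QPATH] = "";

void Fog_NewMap (const char *mapname)
{
	// different map: wipe any player-set fog; same map (respawn/restart): leave it alone
	if (strcmp (mapname, fog_lastmap))
		Fog_SetDefaults ();

	strcpy (fog_lastmap, mapname);

	// either way, a "fog" key in worldspawn (if present) still overrides everything
	Fog_ParseWorldspawn ();
}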

This should handle the case in MP games where you die and respawn, and get your fog settings wiped. It's also relevant in SP. And it still respects the wishes of the mapper. The only tricky bit is where the omission of a "fog" key signifies that the mapper's intention is no fog, but I lean towards the opinion that if you want anything specific you should be saying it explicitly. Anyway, there's nothing to stop the player from bringing down the console and changing it themselves. Since they've already done so once it seems safe to assume that they would do so again.

Bottom line: if the player wants to run with specific fog (or any other) settings, they will, irrespective of what the mapper wants or doesn't want. There comes a point where you have to start getting pragmatic and accept that's going to happen.

I think it's a reasonable enough compromise overall and I can't really see anyone having too much cause to disagree with it.

I'm also interested to see if use can be made of the old Fugly Fog cvar "gl_fogenable" here. I'm thinking maybe 3 settings: "always do what the map wants irrespective of what I say", "do what I want unless the map says otherwise" and "always do what I want irrespective of what the map says", with perhaps the middle one (the behaviour I've just described above) being a sensible default.

Questions about Fog

I've just been optimizing the fog code a little. Previously it had gone through the full fog calculation even when fog was disabled (just with values of 0), which I was never totally happy about, but it was something I had done just to get the damn thing working, always promising myself "I'll do it right next time!"

Well "next time" came, and thanks to the magic of #ifdef in HLSL I now have two copies of my shaders, one version with the fog calculations and one without. I just select which one to run at the start of each frame and have everything nice and fast. A previous incarnation recompiled the shader when fog needed to be enabled or disabled, which stalled a little, but having two copies is minimal overhead and means that all I need to do is set a pointer and it automatically kicks in. No runtime branching either, which - since I'm still targetting Shader Model 2 - would have been quite evil.

So this raises a few questions about how fog should behave. For 1.8.666b I just copied what Fitz did and reset the fog at the start of each map, but should it persist across maps? Should I set a different default fog colour for each worldtype and let the map (or the player) override it if desired? What about underwater fog? Should that be the same colour as the world or should it be its own colour? Should I disable the view blend if we're underwater and fog is on? I kept the old hacky GLQuake Fugly Fog cvars in the engine but right now they do nothing - should they also take effect? Should they override the "fog" command or not? Anything else?

For all of these I have my own ideas about what should be done, but I'd definitely be interested in hearing other people's opinions too.

Slow Engine Startups

One of the reasons DirectQ starts up (and changes game directories) slowly is because it builds a cache of "interesting things" in your Quake folder; things like maps, demos, external textures, available games, save games, skyboxes and so on. This cache is then used to populate the menus and to enable TAB autocompletion for most commands that take a file name (I think "exec" is the only one it doesn't do). It also enables faster checking for external textures so that map loads are faster.

With hindsight the way it builds this cache is kinda wrong. To take the example of maps, on startup (and game change) it opens the BSP file, has a look in the entities lump, extracts the name of the map (so that it can display "the Dismal Oubliette" instead of just "e2m6" - being player-friendly is the name of the game here folks!) and does some other checks (like looking for "info_player_" entities to ensure that it's really a map and not an ammo box).
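Stripped of error handling, that last check boils down to something like this (using the standard Quake BSP header layout; and, as noted below, it's only a heuristic):

typedef struct { int fileofs, filelen; } lump_t;
typedef struct { int version; lump_t lumps[15]; } dheader_t;	// BSP29: 15 lumps, entities are lump 0

#define LUMP_ENTITIES	0

qboolean Mod_LooksLikeARealMap (FILE *f)
{
	dheader_t header;
	char *ents;
	qboolean found;

	fread (&header, sizeof (header), 1, f);

	ents = (char *) malloc (header.lumps[LUMP_ENTITIES].filelen + 1);
	fseek (f, header.lumps[LUMP_ENTITIES].fileofs, SEEK_SET);
	fread (ents, header.lumps[LUMP_ENTITIES].filelen, 1, f);
	ents[header.lumps[LUMP_ENTITIES].filelen] = 0;

	// an ammo box or other standalone bmodel has no player starts in its entities lump
	found = (strstr (ents, "info_player_") != NULL);

	free (ents);
	return found;
}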

A lot of this info is really not needed until such a time as you actually go navigate to the appropriate menu. TAB autocompletion just needs the name of the file, and the menu can "lazy cache" the data, i.e. wait until it's actually needed before extracting it and saving it out, then use the saved copy in future.

In fact, the only really useful information it extracts on startup is whether or not the map is really a map and not an ammo box (we don't want anyone typing "map b_batt0" in the console, do we?) And even that's slightly dubious because there is nothing to say that QC has to name a player start spot "info_player_anything"! It could be called "elephant" in someone's mod and the map would work perfectly, just so long as the actual QC code was correct.

The same largely applies to demos and save games. The information is different, of course, but the basic principle is the same (and in those cases we have the bonus of not needing to worry about whether a demo or a save file is really an ammo box).

So this is up for change. It's not going to happen immediately (in particular because there is quite a lot of it and it all needs to be reworked and tested) but these are all things that should be getting incrementally faster over time.

Monday, October 18, 2010

Updates for Sunday 17th October 2010

Map load times are now substantially faster owing to improvements and efficiencies in a lot of the loading code. They might get faster still if I ever get around to investigating D3D automatic mipmap generation.

Engine startup time is also a lot faster. I removed the splash screen (sorry, I know it was cute, but...) and also removed a lot of legacy crap that was slowing it down. There's still a vid_restart happening during startup that shouldn't be there, but it's not too big a deal.

I've removed dynamic linking to the D3DX library. What this means is that we're back in the situation where you may need to update your D3D. The version of choice is June 2008 so everybody should be more or less OK, but fair warning all the same. I was never 100% happy with dynamic linking anyway, as there was always a chance that something in D3D might change and break it. This way is just better longer-term.

I've fully deprecated the non-Vertex Buffers mode; it's gone. Bear in mind that D3D will very happily create a Vertex Buffer in software if you don't support them in hardware (early DirectQs used them) so this is OK. Things are just a lot cleaner and clearer - not to mention less buggy - this way. I have some tricks for improving the performance of a lot of the renderer, but they might have to wait until the next one as they will need restructuring elsewhere.

Some limits which were previously set at "infinite" have been clawed back to a finite number. MDL verts is 65536 (that's the most the format can support anyway!) and lightmaps to 256. DirectQ uses bigger lightmaps than ID Quake so you should never run out, even on the most extreme maps.

One slightly controversial one is that I've changed the underwater warp. This is something else that I was never happy with; the old code did a lot of shuffling things around and messing with multiple render targets, and it was never really set up right for it (nor was it totally robust). It also offered a choice between running at half the speed or at lower quality. The new code is full speed, full quality, and still gives quite a good warping effect, but not quite the same as software Quake did it. Wait till you see it before complaining, OK.

I've completely overhauled the 2D drawing code, and it's now totally different to what it was before, and potentially a LOT more efficient. A massive bottleneck exists in the polyblend code, which is done during 2D drawing. I've found with RMQ that shifting it back to the 3D drawing (as ID Quake did) was actually quite a bit faster. API differences may be a factor preventing this, but it's definitely worth trying.

I've also finally bitten the bullet and run DirectQ through a source code formatter. Bye bye disgusting unreadable code. Of course I had to spend some time cleaning up some of what that did (no formatter is perfect), but at least all my tabs, indents and braces are the way I prefer them now.

Sunday, October 17, 2010

A Decision To Be Made

Previous versions of DirectQ had an FPS lock of 500 FPS in regular game play (it was unlocked in timedemos), but I removed it for 1.8.666b.

I'm thinking of putting it back in. You see, I'm working on separating the timers in the various components, so that the renderer - for example - can run independently of the client and the server, and at its own speed. This brings some serious problems to light that didn't really exist before, most particularly that things start behaving a little oddly when you exceed 500 FPS, and they behave really oddly when you exceed 1000 FPS.

What kind of things? Well, above 1000 FPS, and because DirectQ uses an integer timer (it's rock-solid and completely immune to accumulated floating point precision errors) we're getting frametimes of 0 most of the time. This means that anything that's time-dependent in the engine (which is pretty much everything) won't actually happen! The biggest culprit so far is the console. Above 1000 FPS it won't come down at all during gameplay! I suspect that there are more, but I haven't seen them yet.

Above 500 FPS we're in a situation where integer division comes into play. Say you're running at 501 FPS; a frame really lasts 1000/501 ≈ 1.996 milliseconds, but truncation to an integer means it gets counted as only 1 millisecond - nearly half of the frame's real duration is thrown away. It's not as pronounced as when you're above 1000 FPS, but it still does lead to some jerkiness.

The proposed FPS lock is going to be 250 FPS. At that rate a frame is a full 4 milliseconds, so integer truncation can only ever shave off a fraction of a millisecond rather than nearly half the frame, and the error becomes much less dramatic.
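The cap itself would look something like this in the main loop (Sys_Milliseconds being my integer timer):

DWORD newtime = Sys_Milliseconds ();
DWORD frametime = newtime - oldtime;

// 250 FPS cap: don't run a frame until at least 1000 / 250 = 4 ms has passed,
// so frametime can never truncate down to 0 or 1
if (frametime < 4) return;

oldtime = newtime;
Host_Frame (frametime);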

We're clearly in a trade-off situation here. The choice is between a floating point-happy timer that has humungous problems of its own (accumulated precision loss, based on an unreliable source that may change during gameplay) or an integer timer that has these problems I've just described. I lean towards favouring keeping the timings rock-solid and adding code to prevent the integer timer problems from happening (i.e. an FPS lock). With the floating point timer it's completely out of my control.

Of course, if I can find another solution that doesn't require the FPS lock I'll do it for sure. I already have some ideas for that, but I can't promise anything.

This is going to start becoming a problem for all clients as GPUs get faster, especially in Quake where people like to run at uber-elite FPS, so everyone's going to face this at some time or another.

Saturday, October 16, 2010

On Wasting Memory

I occasionally read about people suggesting use of suboptimal texture formats like GL_RGB so as to not "waste memory". That's an interesting concept, and it doesn't quite fit in with the definition of the word "waste". Let's look at a dictionary, shall we?

waste
1. to consume, spend, or employ uselessly or without adequate return; use to no avail or profit; squander: to waste money; to waste words.
2. to fail or neglect to use: to waste an opportunity.
3. to destroy or consume gradually; wear away: The waves waste the rock of the shore.
4. to wear down or reduce in bodily substance, health, or strength; emaciate; enfeeble: to be wasted by disease or hunger.
5. to destroy, devastate, or ruin: a country wasted by a long and futile war.
6. Slang . to kill or murder.
Right, some of those aren't really applicable here (no prizes for guessing which) but some offer an interesting and different perspective on the word "waste".

Let's take the example of 2 Quake engines. In one of them the programmer has frugally fine-tuned every single resource allocation and it only needs 8 MB of video RAM. In the other one the programmer has not been so meticulous and it needs 64 MB of video RAM. Which one is "wasting memory"? A lot of people would say the second one, but is it?

Back to the dictionary. Definition 1. Is the extra 56 MB being consumed, spent or employed uselessly? Or is it being used for something? Like faster (but larger) texture formats, vertex buffers or something else? Not so clear now, is it?

On to definition 2. This is where things turn totally upside down. On a typical low-end machine from a few years ago (with 128 MB of video RAM), the first engine is neglecting to use a total of 120 MB of video RAM, while the second is only neglecting to use 64 MB. By definition 2, the engine that only needs 8 MB is actually wasting almost twice as much video RAM as the one that needs 64 MB.

You see, video RAM is not one of those resources that needs to have every drop of it preciously conserved. It's there to be used, and if you're not using it, then it's lying idle. (In a funny twist on this, the programmer of the first engine has actually wasted time and energy - by definition 1 - in a lot of effort that was not actually needed!)

So by all means talk about "saving memory" where appropriate, and avoid needless use of it for sure, but everybody should reconsider what they mean by "waste" before using that word, because it might not mean what they think it does!

Friday, October 15, 2010

Updates for Friday 15th October 2010

Input has now become incredibly smooth. I've multithreaded the DirectInput code and cleaned up a lot of subtle little problems in it, some of which I was aware of and some of which I wasn't. I'm debating whether or not to provide an option to run non-multithreaded; right now it attempts to create the extra thread but falls back on non-multithreaded if it can't. Still fine-tuning aspects of it, in other words.

One of the problems with ID's original DirectInput code that I guess people should know about is that it also fires the regular Windows mouse messaging events even if DirectInput is enabled. You can verify this by commenting out the Key_Event calls in IN_MouseMove and running with -dinput.

I've also totally gutted the joystick and XBox 360 controller code. Decision isn't final, so let's see how many people howl to have them back. I guess something like enabling force feedback on a joystick would be a cool novelty effect, but is it actually an essential feature or is it chrome? Is it something you'd play with for 5 minutes, think "that's neat" and then turn off, or would you use it all the time?

I'm slowly working my way through the code doing something of an overhaul of the timers. I've posted on Inside 3D about the problems with floating point timers, specifically problems caused by loss of floating point precision. Switching absolutely everything to integer millisecond timers means that there is no precision loss, and timing totally tightens up and becomes rock solid, even at ultra-high framerates where precision errors accumulate really fast with floating point.
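To put a number on it: a 32-bit float only has a 24-bit mantissa, so once a millisecond counter climbs past 2^24 (16,777,216 ms, or about 4 hours 40 minutes of uptime) it can no longer even represent a 1 ms step, whereas a 32-bit integer counts exact milliseconds for roughly 49.7 days:

float f = 16777216.0f;	// 2^24 milliseconds - about 4 hours 40 minutes
f = f + 1.0f;		// still 16777216.0; the 1 ms tick is silently lost

DWORD i = 16777216;
i = i + 1;		// 16777217 - exact, and stays exact until the DWORD wraps at ~49.7 days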

There are a handful of areas where the interface needs to be kept at floating point for legacy compatibility. One of them might be QC, although I suspect that it may be possible to change the relevant struct members to unions and fool the progs. The critical one is the svc_time message - with (most?) current protocols there is absolutely no way whatsoever around that one so you do need to degrade things there. The new protocol I'm working on for RMQ is going to use integer milliseconds for svc_time instead of floats, so it won't be an issue there.

clc_move uses floats too, but that's only used for sending ping times back to the server. I'll probably change it anyway as it seems to make sense and will give an accurate time back at the server in case it ever needs to use it.

Timing is a bitch.

There are also some problems in cases where r_wateralpha is set to less than 1; framerates can become quite jerky and uneven. I'm aware of it and it's on my list, but I haven't investigated in any real detail yet.

About to dive into MDL code soon. There are ways and means of accelerating this quite a bit by offloading some calculations from the CPU to the GPU. I've already got this working in a test engine and have confirmed that the full implementation, incorporating static vertex buffers for MDL data that doesn't change per frame (like indexes and texcoords) can go at about 4 times the speed of the current code. That's not going to happen now though as it doesn't play nice with the structure of the current code, and would actually slow things down instead pending a rewrite of some other things. I'll see how much extra performance I can get out of it though.

Overall things are looking a whole heap better than they did in 1.8.666b. Stay tuned.

Thursday, October 14, 2010

DirectQ Updates

There will likely be a Release 1.8.7 of DirectQ at some time relatively soon. How soon, I don't know yet, but soon enough all the same.

What's motivating this is the short break from RMQ that I took to get 1.8.666b out. Having gotten used to the ultra-smoothness of the RMQ input code, I was fairly disgusted with DirectQ's, and there are also some MDL and particle rendering tricks from RMQ that I want to port over too.

I'll post updates on how things go, of course.

Being able to do comparisons between both engines is very educational (not to mention beneficial for both, as ideas from one can feed back into the other) and it's something I'll likely be doing a little more of in the future. Week on/week off seems about fine.

Wednesday, October 13, 2010

A Nasty Bottleneck and its Scandalous Secret Uncovered

DirectQ 1.8.666b has a bottleneck that 1.8.666a didn't have. A week ago it wasn't there in 1.8.666b either, so it's obviously something that happened in that time. What could it possibly be?

The answer lies in the timer. Because Quake's timers use floating point they are subject to precision loss. For some time now DirectQ has used integers on a millisecond scale all the way through to avoid this precision loss, only converting back to floating point at the very last possible moment and only where it is needed (typically by the progs.dat). However, for 1.8.666b I had made the mistake of going back and switching all the timers back to floating point for the sake of "keeping the code clean".

BIG MISTAKE.

The primary cause of the problem seems to have been in the main loop, where I had two DWORD timers that I had switched back to FP. Reverting these to the way that they were resolves things to a large extent. Reverting elsewhere throughout the code fixes everything else up nicely, and we're back to enjoying super-high framerates. Double-checking before and after with FRAPS confirms it.

________________________

Because of all this my warning about 1.8.666b now holds doubly-true. It should be considered as not much more than a techdemo of fog in shaders, with 1.9 going to be the Real Thing.

Tuesday, October 12, 2010

1.8.666b is Out

Get it here.

I've dropped it back to Beta status and haven't made it the recommended release because I'm not totally happy with the code. As I've hinted before, my recent work elsewhere has made me more aware of problems with this code that I would otherwise have been, and I'm also moving towards getting rid of legacy paths that are contributing to a lack of cleanliness in it.

If you have problems with it, my response is probably going to be something along the lines of "use 1.8.666a instead". One that I am aware of is that it runs slower.

Monday, October 11, 2010

Of Particles and Sprites

I've just been having some fun with particles in D3D vertex buffers. I've already done this code in OpenGL for RMQ, so once again it was a case of porting it to the other API. I've already made my comparisons of the 2 APIs in the past, so it's enough to say that the D3D version is a lot nicer, has fewer lines, and is much clearer overall.

There are a number of interesting options in D3D for doing particles. I had tried point sprites in the older versions of DirectQ, and they worked well enough, but word is that they're no longer guaranteed to be hardware accelerated on newer GPUs (at least in D3D). Interestingly enough, they don't suffer from the same problems as OpenGL point sprites do.

There's also a rather nice ID3DXSprite interface that automatically handles the necessary states, batching, etc for you. Neat. I did get this working but ultimately - while it was nice - it wasn't flexible enough for Quake's particle requirements and had to go. I'll be considering it as an option for the 2D interface stuff though.

After that we get into silly territory; running them entirely on the GPU. There are practical benefits to this, but I'm of the opinion that they would show their worth in a more complex particle system than Quake's. That's the odd thing about Quake - it can be so primitive sometimes that you really struggle to get modern stuff working well with it. It's still fun though.

There is a certain amount of calculation that could definitely be moved GPU side however; I'm thinking of the whole scale/up vector/right vector thing. But overall I believe that GPUs are more suitable for drawing polygons and stuff like that.

Geometry shaders are right out in D3D9. As a move to D3D11 is looking more and more unlikely, this is something that probably won't stand a chance of happening. I did write a particle geometry shader in OpenGL for RMQ, but once again Quake particles stress on fillrate more than on vertexes or shader ops, so there was no benefit to it and I removed it.

So it looks as though vertex buffers are the way to go, using the same setup as for MDLs: 2 vertex buffers, one dynamic (for position and colour) and one static (for texcoords), and one index buffer.
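Setup-wise that's about as simple as it sounds; something like the following, with the struct and variable names being illustrative:

LPDIRECT3DVERTEXBUFFER9 d3d_PartVerts = NULL;	// dynamic: position and colour, rewritten every frame
LPDIRECT3DVERTEXBUFFER9 d3d_PartTexCoords = NULL;	// static: the same 4 corner texcoords per particle
LPDIRECT3DINDEXBUFFER9 d3d_PartIndexes = NULL;	// static: 2 triangles (6 indexes) per particle quad

d3d_Device->CreateVertexBuffer (MAX_PARTICLES * 4 * sizeof (partvert_t),
	D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, 0, D3DPOOL_DEFAULT, &d3d_PartVerts, NULL);

d3d_Device->CreateVertexBuffer (MAX_PARTICLES * 4 * sizeof (partst_t),
	D3DUSAGE_WRITEONLY, 0, D3DPOOL_MANAGED, &d3d_PartTexCoords, NULL);

d3d_Device->CreateIndexBuffer (MAX_PARTICLES * 6 * sizeof (unsigned short),
	D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &d3d_PartIndexes, NULL);

The dynamic one then just gets locked with D3DLOCK_DISCARD and refilled each frame.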

____________________

I'll probably put DirectQ 1.8.666b out tomorrow. At this point in time I just need to get it off my back so that I can start moving forward with other stuff again. I'm also conscious that I've been neglecting RMQ a little while doing these experiments, but the change has been good for me and I have been learning quite a bit.

You'll need to be aware that 1.8.666b has lost some speed by comparison to earlier versions. I'm not too certain where or how that happened, and right now I just want shot of the thing so I'm not putting much effort into bugfixing it. All the signs are that 1.9 will go like a proverbial rocket, and my focus is on what's coming up rather than what's gone before.

I guess the advice then is to use 1.8.666a unless you're specifically affected by one of the bugs that 1.8.666b fixes, or unless you want to see the fog.

Thoughts on D3D11

About a month ago I drew my first triangle with D3D11. My initial reaction to the API was kind of puzzling, and to understand it, you need to know a little about the history of D3D.

In the beginning it was a mess, and all the horror stories about it were definitely true. But as time went on it evolved, and by the time of D3D8 - when the requirement to use DirectDraw interfaces in conjunction with it was removed - it was a very clean and elegant API. D3D9 built on that, making some very minor changes and adding HLSL.

So on to D3D10 and 11. I'm not sure if it's correct to say that "D3D11 is to D3D10 as D3D9 was to D3D8", but the similarities are there and it works for me. In both 10 and 11 the interface is through generic buffer objects. You pass them arrays of structs (sometimes nested 2 or 3 levels deep) specifying content types, usages, properties, binary data, etc, and they do their thing. All the simple, direct (pun intended) elements of the API are gone.

This seems kind of familiar so I had to do a slight double-take on it, but I'm quite convinced by now - it's just like our old friends the dreaded Execute Buffers all over again!

Who knows - maybe this is now a rational and sensible design paradigm for a modern 3D API on modern hardware, but it's hard to avoid the feeling that whatever else it may be, it's one hell of a huge step backwards.

I haven't yet worked up the courage to look at textures, so that's where things currently stand with those plans. It's still something I'd like to do, as there are advantages to using D3D11, and if they outweigh the disadvantages (and potential World Of Pain And Suffering inflicted by its design) then it remains viable. Time will tell.

Attack!

I've been playing around with some experimental new MDL rendering code in an older version of DirectQ, and the good news is that I can now get the 400 Knights map exceeding 200 FPS.

The bad news is that it needs hardware vertex buffers (with multiple streams - a form of instancing, really) and shaders, so it's not going to happen for a while yet. Another reason to drop the legacy paths from DirectQ!

RMQ already has a variation on this, but it's not as fast (and the code is nowhere near as clean - OpenGL VBOs suck for messy code). The main speed difference is that the D3D version can use dynamic vertex buffers, which I've never been able to get working satisfactorily in OpenGL.

So compare this with the heap of gl*Pointer/glBindBuffer/glEnableClientState/glClientActiveTexture/etc that OpenGL needs (how times change):

// the vertex declaration describes the two-stream layout
d3d_Device->SetVertexDeclaration (d3d_MdlVertDecl);

// stream 0 is the per-frame vertex data, stream 1 is the static texcoords; indexes are shared too
d3d_Device->SetStreamSource (0, d3d_MdlVertBuffers[hdr->buffernum], 0, sizeof (mdlvert_t));
d3d_Device->SetStreamSource (1, d3d_MdlStBuffers[hdr->buffernum], 0, sizeof (mdlst_t));
d3d_Device->SetIndices (d3d_MdlIndexBuffers[hdr->buffernum]);

// one call draws the entire model
d3d_Device->DrawIndexedPrimitive (D3DPT_TRIANGLELIST, 0, 0, hdr->numverts, 0, hdr->numindexes / 3);

A funny thing with this code is that it's actually slower under light loads - some overhead from vertex buffers (can't imagine what, but it's the only theory I have) must outweigh the gains from using them in those cases. Put it under pressure though, and it sings.

There are other nice things that RMQ does with vertex buffers that are going to be candidates for migration back to D3D for DirectQ, but the RMQ codebase does also support legacy codepaths if the newer stuff isn't available, which is a necessary requirement, but does compromise its performance (and code cleanliness) a little.

Anyway, I also got some work done on lightmaps in DirectQ, but lost a whole heap of performance from it, which was quite odd; especially as a previous version of the same (also D3D) was a huge gain in another test. I'm going to need to revert that, but overall the release is coming closer.

Thoughts on Hexen II

The code is really really ugly. I thought the released Quake 1 code was in bad shape, but this is another level beyond.

It fully respecifies the texture object for dynamic lights! That's not clever; the old insistence on using GL_RGBA (or worse, GL_RGB) is already dramatically slower than it should be, but this is just silly. No wonder there are so few dynamic lights in the game; that's something to change.

I suppose the earlier version of GLQuake that it's based on just worked that way.

No multitexture. That's fixed now; we have multitexture and almost everything using vertex arrays. The intention is to complete the move to vertex arrays, then try some of my VBO tricks I've used in RMQ. The longer term goal is to make this a platform I can try out new ideas for the RMQ engine on, but bloody hell, there's a lot of work involved in getting it there.

The various surface types, with all the abslight stuff, are horrible. I've put them all in a shader. RMQ also optionally uses shaders (if they're available) for sky and water polys only; unlike DirectQ they're not user-selectable, but that's OK (I regret making them user-selectable in DirectQ). I might move more stuff in it to shaders too; the ability to set up any texture blends you want exactly how you want them, and with no interminable lines of glTexEnv, is something that it's damn easy to get used to.

Speaking of surfaces; I don't know if Hexen II is supposed to have fullbrights and overbrights like Quake is, but from looking at the colormap I guess it is; they're in now. These are handled in a shader too, and I've used the alpha channel of the texture to stash the fullbright info so that we don't need to load a second texture. This isn't an option for RMQ as it needs the alpha channel for its "fence textures".
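In GLSL terms the fullbright handling is no more than this (a sketch with made-up uniform names, not the actual shader):

uniform sampler2D diffuse;
uniform sampler2D lightmap;

void main ()
{
	vec4 diff = texture2D (diffuse, gl_TexCoord[0].st);
	vec3 light = texture2D (lightmap, gl_TexCoord[1].st).rgb;

	// diff.a holds the fullbright mask: 1.0 on luma texels, 0.0 everywhere else,
	// so fullbright areas skip the lightmap without needing a second texture or pass
	gl_FragColor = vec4 (mix (diff.rgb * light, diff.rgb, diff.a), 1.0);
}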

2 x overbrighting does not provide enough lighting range for most Quake maps. They really need 4 x overbrighting but that would give bad stair-step lightmaps. I might look at using 64-bit lightmaps; I've already tried them out in the first release of DirectQ and they work very nicely indeed. These are also an option for RMQ. (Thought: using a shader allows for any arbitrary overbright scale; you're not restricted to 1x, 2x or 4x like with fixed).

I've changed the protocol to allow for floating point coords and angles; this is now incompatible with the original protocol, but it's only an experimental engine so I can get away with that kind of thing. It does smooth out rotations quite nicely though.

I'm going to do interpolation. I'm not happy with the interpolation code in almost any Quake engine I've ever seen (the software Quake code I wrote comes closest to pleasing me) and I want to try my hand at something that works well and seems good to me.

Part of the problem is that the Quake models (especially the v_ models) are not interpolation-friendly. That's probably not an issue with Hexen II, but it is something that should be fixed. My muzzleflash hack in DirectQ is a decent enough solution, but I live in fear that it's not going to work as a general solution, and that someday someone is going to make a model that it doesn't work on. It's held up well so far though. Fitz also has a workable solution, but none of these can replace a model that's properly designed to begin with.

Note to content creators: if you're creating a new v_ model, please please please make it interpolation-friendly.

Think that's all for now.

Sunday, October 10, 2010

Fog in DirectQ

I recently got GLSL fog working in the RMQ engine, and since the RMQ engine has received shader code from DirectQ (ported to GLSL, and optional, of course) what better way to return the favour than doing fog in DirectQ?



OK, support is fairly limited at the moment as I've just implemented it with a static colour, static density and on world surfaces only as a test, but now that I've cracked the problem it's going to happen.

The bad news is that it's in the HLSL path only. As you might have guessed from previous posts, I don't have much interest in maintaining the non-HLSL path going forward, so consider this the first indication that it's becoming deprecated.

Direct3D doesn't emulate the old fixed-functionality fog on Shader Model 3 and upwards hardware (unlike OpenGL, which does), so there are practical reasons for doing it this way too.

Like a wise man once said:

...the cost of adding a feature isn't just the time it takes to code it. The cost also includes the addition of an obstacle to future expansion.

Sure, any given feature list can be implemented, given enough coding time. But in addition to coming out late, you will usually wind up with a codebase that is so fragile that new ideas that should be dead-simple wind up taking longer and longer to work into the tangled existing web.

The trick is to pick the features that don't fight each other. The problem is that the feature that you pass on will always be SOMEONE's pet feature, and they will think you are cruel and uncaring, and say nasty things about you.

I'll definitely get 1.8.666b out after I've got this done. It's been a long time coming, and the only reason I've been putting it off is pure laziness at this stage. I'm due a short break from RMQ anyway, so it's nice to spend some time on something else.

_____________________________


Update

Fog is now done. It looks good and doesn't slow things down at all.

Another problem - the timer has gone all jerky on me. This is most likely a result of my having (ab)used DirectQ for some experimental work before I wised up and got a dedicated experimental engine. I'll likely need to end up rolling back code to the previous version and then bringing on the important changes. Oh well, not that big a deal and it will mean things are more solid overall.

It's a pity though as getting this out is becoming important. I really really really want to start gutting the old legacy code and reworking some parts that recent experience has taught me a lot about. There's potential for this engine to go considerably faster and this old crap is genuinely holding things back right now.

_____________________________


Update 2

I replaced the screenshot.

Another bug; things get seriously jerky when r_wateralpha is set below 1. This one actually dates back to 1.8.666a; I just never caught it at the time (I rarely use r_wateralpha).

I need to run some profiling I think.

Saturday, October 9, 2010

Faster Map Loading

I can now get ID1 map loading times down to virtually instantaneous. OK, they're already fast, so what's the point in beating on something that doesn't need optimization? The answer is big maps - big maps load slowly. Really big maps load really slowly. Removal of bottlenecks from map loading code means that big maps can load faster, meaning that the player is no longer sitting there twiddling their thumbs and looking at the "Loading" screen for what seems like forever. This is a good thing.

There are a few places in the original code which are candidates for attacking here.

First one is texture loading. Instead of allocating each texture structure individually, I allocate a single big array and just pull structs off that as required. GLQuake doesn't need to keep the original texture pixels either so there's a memory saving here too.

Second one is polygon creation. This also suffers from "lots of small allocations" syndrome - especially so in big maps. It's quite easy to use the same technique, allocate a big array of polygons, and pull from that instead.
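The technique is the same for both the texture structs and the polys; in outline (names illustrative):

texture_t *loadtextures = NULL;
int numloadtextures = 0;

void Mod_AllocTextures (int count)
{
	// one allocation for the entire map instead of one small allocation per texture
	loadtextures = (texture_t *) Hunk_Alloc (count * sizeof (texture_t));
	numloadtextures = 0;
}

texture_t *Mod_NewTexture (void)
{
	return &loadtextures[numloadtextures++];
}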

Texture resampling is another one. GLQuake resamples non-power-of-two textures to powers of two, and this is now done at the same time as the upsampling of 8-bit textures to 32-bit. Another memory saving is just a bonus.

Lightmaps - clearly they're all the same size and format, so if a texture object already exists from the previous map, just reuse that (via glTexSubImage2D) instead of fully respecifying the new texture. There's also scope for applying this principle to other textures, but I haven't done so yet.
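Which amounts to the following at upload time (lm_textures[] and the block size names are illustrative):

if (lm_textures[i])
{
	// a texture object of the right size and format already exists from the previous map - just refill it
	glBindTexture (GL_TEXTURE_2D, lm_textures[i]);
	glTexSubImage2D (GL_TEXTURE_2D, 0, 0, 0, LM_BLOCK_WIDTH, LM_BLOCK_HEIGHT,
		GL_RGBA, GL_UNSIGNED_BYTE, lightmap_data[i]);
}
else
{
	// first time through: create and fully specify it
	glGenTextures (1, &lm_textures[i]);
	glBindTexture (GL_TEXTURE_2D, lm_textures[i]);
	glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, LM_BLOCK_WIDTH, LM_BLOCK_HEIGHT,
		0, GL_RGBA, GL_UNSIGNED_BYTE, lightmap_data[i]);
}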

There are other bottlenecks in the RMQ engine that don't exist in ID Quake and that I haven't yet identified; it's still a little slower at loading regular ID1 maps (the transition should be almost totally seamless) so I need to do some profiling and find out what's happening.

Friday, October 8, 2010

More on the Experimental Engine

I'm trying to see if I can find a way to make this experimental engine releasable some time, and might well remove the new timer code after all.

Meanwhile here's a screenshot of it:



Well, I did say that it was a Quake engine, but I didn't really say which one, did I?

It's quite an interesting engine to work on, being an obviously earlier incarnation of GLQuake than the one we have the code for (no FOV, no multitexture, etc). It's also a good valid testing codebase as many of its features are quite similar to those which will be used in RMQ, even though some of them are implemented in a rather strange manner.

Onwards!

Thursday, October 7, 2010

Another GLQuake Problem

I'm not sure how well known this one is, but I've been aware of it for a while (at least since I wrote my software Quake interpolation code) - MDL frame group animations are not timed correctly in GLQuake. In software Quake there is a separate list of "intervals" which each frame in the group references to find how much time must pass before proceeding to the next one. In GLQuake frames in the group are just evenly spaced instead.
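For reference, this is roughly what the software renderer does in R_AliasSetupFrame (paraphrased; the intervals[] array stores the absolute time at which each frame of the group ends):

paliasgroup = (maliasgroup_t *) ((byte *) paliashdr + paliashdr->frames[frame].frame);
pintervals = (float *) ((byte *) paliashdr + paliasgroup->intervals);
numframes = paliasgroup->numframes;
fullinterval = pintervals[numframes - 1];

// wrap the entity's time into the group's full cycle
targettime = time - ((int) (time / fullinterval)) * fullinterval;

// then walk the intervals to find the frame covering targettime
for (i = 0; i < numframes - 1; i++)
	if (pintervals[i] > targettime)
		break;

// GLQuake, by contrast, just does pose += (int) (cl.time / interval) % numposes,
// which assumes every frame in the group lasts the same length of time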

I'm not entirely certain why this happened but I more than half suspect that it had something to do with the issues discussed at this link (scroll down a little to the "anatomy of a mis-feature" heading).

So I'm currently working on fixing it for my experimental GLQuake engine, then migrating this fix to my other codebases. One item worth mentioning is that all of my current codebases use an in-memory MDL format that is quite radically different from GLQuake's (compare especially DirectQ's d3d_rmesh.cpp with GLQuake's original), so what code I get out of this will likely be completely unusable for tutorial purposes. That's an unfortunate consequence of progress and code simplification elsewhere.

I'll probably at least do a code-dump of the relevant parts on Inside3D and leave others to work the rest out for themselves, but short of bringing up yet another pure GLQuake build and fixing it in that, there's not really much else I can do there.

More Experimental Work

I've mentioned in passing once or twice that I have an experimental GLQuake engine which I use for testing out ideas on before moving them into a main codebase, and that this engine completely obliterates everything else for performance.

The reason for this is that I've abandoned any support for legacy hardware in it. It absolutely requires multitexture, shaders, vertex buffers, pixel buffers, all the good, modern stuff. (This, incidentally, is a direction I also intend moving in with DirectQ).

Unfortunately it's completely unreleasable for a number of reasons.

Firstly, the somewhat - shall we say - reactionary conservative - nature of some parts of the Quake community would prohibit it from being released. People will want to try it, and it won't work for many. This is for OpenGL 2.1 or higher hardware only, and makes extensive use of OpenGL 2.1 or higher features, so those who are still clinging to older kit won't be able to use it.

Secondly, it's very much a bare bones engine. No wateralpha, no fog, no shadows, no extended limits, no interpolation, no external textures, no multiplayer features, no mapper or modder features, you get the idea. It's just a very basic, very pure, ID1 Quake engine that exists for trying out ideas (like doing fullbrights on the GPU without needing an extra texture), without anything fancy to distract from its purpose (and - more importantly - without polluting any main codebase with experimental stuff).

Thirdly, and most decisively - it contains non-GPL code in a critical area - its timer. This is code obtained from Microsoft's research website (search for KUSER_SHARED_DATA and ignore the rootkit sites) for a Windows (NT only - see first point) timer that has better resolution than timeGetTime but without the quirks of QueryPerformanceCounter.

I could go off on a rant here about how the GPL restricts creativity and sharing of useful code, all in the name of gene-pool purity, but I've done that before and I won't do it now. (Besides, Microsoft's attitude towards the GPL could be raised as a valid counter-argument).

Over time code and ideas from this will start emerging, but what of the engine itself? Life at over 1000 FPS is nice for sure, and I might circulate it privately some time, but I really don't want to remove the timer (doing so would prevent it from running at over 1000 FPS) or compromise it by adding compatibility with older hardware (or with features I consider inessential for the purposes of this engine) so a public release is out of the question.

Monday, October 4, 2010

Experimental Work

To ease myself back in I've brought on yet another GLQuake build to use as a baseline for experiments. This is a handy thing to have as it means that I can try out new stuff without polluting the main RMQ or DirectQ codebases, and I can determine if a problem is a code problem or an RMQ/DirectQ problem if something works in one but not in the other.

Some things I've done with it so far include:

  • I've ported my water and sky shaders from HLSL to GLSL. The water shader is unlikely to make it into RMQ owing to weird interactions with wateralpha and fog, but the sky shader has already been brought over.
  • A new replacement timer from some code found on the Microsoft Research site. This has 100 nanosecond resolution and suffers from none of the problems that QueryPerformanceCounter has. Obviously it's Windows only so it's not for RMQ, but is a candidate for DirectQ.
  • I've successfully decoupled the client timer from the server timer and now have them both advancing at different rates. They're still in a single thread, but otherwise it means that the server can now be locked to a max of 72 FPS while the client can run as fast as it wants (there's a sketch of the idea after this list). This is a candidate for both engines.
  • Some significant speed and memory usage improvements when loading maps. I'm not certain if this code would be useful for either engine just yet.
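The decoupled-timer item above boils down to a fixed-step accumulator; a very rough single-threaded sketch, with illustrative function names:

void Host_Frame (DWORD frametime)
{
	static DWORD sv_accumtime = 0;

	// the client side (input, prediction, rendering) runs every pass, as fast as it likes
	CL_RunFrame (frametime);

	// the server only ticks when at least one 72 FPS slice (about 14 ms) has built up
	sv_accumtime += frametime;

	if (sv_accumtime >= 14)
	{
		SV_RunFrame (sv_accumtime);
		sv_accumtime = 0;
	}
}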
Longer term I intend using this same codebase as a platform for some CSQC work, so we'll see what happens when I stare into the abyss...