Monday, September 27, 2010

I'm back!

I've been away for a while exploring around some ancient ruins in the Middle East, so there have been no updates and I obviously didn't get to put the finishing touches on DirectQ 1.8.666b and get it released.

Just got back yesterday and I'm still in the wrong time zone and recovering from exhaustion caused by a week of walking in semi-desert heat, but fun was had and I'm already looking forward to the next one.

Bear with me for another few days while I'm catching up on stuff, and hopefully things will start happening again soon.

Thursday, September 16, 2010

No further comment necessary



400 Knights. 86 FPS. Nuff said.

Tuesday, September 14, 2010

RMQ Engine Performance

The RMQ Engine is now faster than DirectQ in just about every situation. My 400 Knights stress test shows the biggest speed increase, with RMQ running at over twice DirectQ's speed (just over 100 FPS vs just below 50 FPS).

This is a very good result and gives me some useful pointers for moving forward with DirectQ 1.9; it's now obvious that removing the legacy code paths is very much The Right Thing to do, and should see DirectQ jumping ahead again, especially as MDLs in DirectQ are going to be completely on the GPU.

I might fast-track some of the GPU MDL code to DirectQ over the next few days, as I'm hugely interested in doing a head-to-head comparison of the two engines and the two APIs. If I do, I'll be certain to post an update.

I'll also do my best to get that long-awaited release out while I'm at it.

Update: a quick experiment with DirectQ indicates that it's highly unlikely that going fully on the GPU for MDLs is going to be feasible. The fundamental problem is that the data sets are just so huge - an ID1 map needs about 44MB for its vertexes!

Of course today's GPUs can take the strain, but at the same time - while I will be raising the entry level for 1.9 - I do want to keep it at a 128MB D3D9 card.

It's not totally the end of the world; I can still do something similar to what I've done with RMQ and keep indexes and texcoords in a static VBO but have vertexes in a dynamic one. Even that alone is sufficient to remove a colossal part of the overhead of rendering MDLs. Fully on the GPU certainly would have been nice though - oh well.

Sunday, September 12, 2010

Experimenting with Lightmap Updates

Quake lightmap updating, how do I love thee? Not very much at all, really. After my previous mountains of fun figuring out what the hell was going on with glTexSubImage2D it was recently time to experiment a little with the timing of updates.

I have for a while been working on the basis that updating lightmaps at the end of the frame was the best thing to do, as it would give the glTexSubImage2D call some more time to complete (i.e. while the server and client are running for the next frame) before the texture is needed again. Of course that means the light updates you see on-screen are one frame behind, but you won't notice (I would defy you to), and it's only when things are inconsistently out of sync that they become jarring. Logically this should be faster, as the call (and its associated data) should just go into OpenGL's command queue, but it seems things are not so clear.

It turns out that - on some drivers, at least - if OpenGL hasn't finished drawing with a texture when you update it (which is more likely than not to be the case, as OpenGL - and D3D - are asynchronous APIs) it will stall until drawing finishes before the texture can be updated. These stalls can be quite brutal, requiring glTexSubImage2D to pause for 27 milliseconds in one case. That dropped framerates to 30 on one driver I tested on.

So the best time to update a lightmap is as late as possible before you actually draw with it. This means that all updating of the in-memory copy of the data should be completed and the lightmap should be ready for the glTexSubImage2D call - and nothing else, so far as updating is concerned - before you use it, and you should not call glTexSubImage2D on it again for the remainder of that frame.

This is one reason why Quake's default multitextured renderer gets incredibly jerky and slow on certain hardware, why modified engines that don't change that part of it can grind to a halt, and why it's sometimes faster to disable multitexturing with those engines. The default will only update a small region of the lightmap, then draw, then update another small region of what is potentially the same lightmap, then draw again, and so on. So the sequence is update/draw/update with massive stall/draw/update with massive stall/draw/etc.

Oddly enough, this is exactly what I already knew to be the case with updating dynamic vertex buffers, so with hindsight it should have been very obvious to me.

It may still be the case that this behaviour only shows up on certain drivers and not on others, in which case I might need to do an on-startup dummy run through an update to figure out which strategy is fastest. I haven't yet done a head-to-head comparison of the two strategies on a wide enough variety of hardware, so I don't know how widely applicable this is.

There may also be some benefit in having a short delay between the update and the use - say by drawing something else (like sky) or by sorting some objects - but this will need more testing to be certain.

My overall feeling though is that when it comes down to the final analysis, a strategy that works most acceptably on the widest variety of hardware is always preferable to one that works excellently on only a selective subset.

Every lesson learned, no matter how frustrating at the time, is a good lesson in the longer term, and this was certainly one of those.

(DirectQ 1.9, by the way, is probably going to evaluate lightmaps fully dynamically on the GPU in a pixel shader so none of this will be relevant to that.)

Saturday, September 11, 2010

More fun with timers

Recently I've been experimenting a little more with Quake's timers. One of the primary reasons is that I recently bought a new machine to replace the old PC that had died on me, and I was interested in seeing how the old QueryPerformanceCounter problems fared on it. This all arises from dissatisfaction with timeGetTime having a resolution of 1 millisecond, mainly because Quake's default framerate (72) doesn't divide into 1000 evenly.

The end result was certainly interesting. With the first test I tried (the original release of DirectQ, back when I was calling it "D3DQuake") I got a really bizarre waltz, as timing speed seemed to see-saw back and forth alarmingly. Quickly switching it over to timeGetTime resolved this immediately, so obviously QPC is still a no-go area on at least one bang-up-to-date, 8-core 64-bit Windows 7 machine.

DirectQ itself uses timeGetTime, but it also contains (disabled) code allowing it to use QPC, so I enabled that and tested, with no ill-effects. Likewise the other engines I've tested that also use QPC worked fine. There's obviously something else at work there, but in the absence of fuller knowledge of what it was I'm inclined to say "steer away from QPC".

While I had the QPC code enabled in DirectQ I also played around a little with max framerates. DirectQ has historically limited its max FPS to 500, because with an integer millisecond timer any framerate above 500 gives a frametime of less than 2 milliseconds, which truncates down to 1. So even at 501 FPS the timer reads 1 millisecond per frame, and you get odd behaviour.

So I removed this cap, and things went totally haywire! The reason for this one was that there is code in Host_FilterTime (inherited from original ID Quake) to lower-bound frametimes to 0.001 seconds. When you're running at over 1,000 FPS this is obviously not a desirable thing to have.

In the end I saw a maximum speed of about 14,400 FPS under idealised conditions. That was fun. At that kind of speed you're obviously in no-man's land with Quake, and all bets are off so far as physics behaviour is concerned (all bets are also off so far as accuracy of the FPS counter is concerned, by the way, because we're in frametime territory where even a minor rounding error becomes significant).

Interestingly, I was also giving my server throttling code (from here) a bit of a shakedown in DirectQ, and even at a few thousand FPS it still managed to maintain correct physics in Quake, so it looks good and has passed an important test.

More on RMQ progress next time!

Tuesday, September 7, 2010

Updates for Tuesday 7th September 2010

The RMQ engine renderer is getting near to completion now. It's actually on its third rewrite, but this was all a valuable process and allowed me to experiment with and test out various rendering techniques that both perform and behave differently in OpenGL from what my most recent experience with D3D had led me to believe.

There is possibly a fourth rewrite in the pipeline, and the reason for this is the difference between how vertex arrays (and vertex buffers) work in OpenGL and D3D.

The D3D method is clearly superior. It involves far less code and gives much more flexibility in what you can do, primarily because it decouples layout specification from data specification. As a specific example, in D3D setting up to render a complex vertex layout (something involving position, colour, normals, lots of multitextured texcoords and maybe some blending factors to go into a vertex shader) requires just two lines of code (SetVertexDeclaration and SetStreamSource); in OpenGL it's multiple pages of glEnableClientState/gl*Pointer (and possibly some glClientActiveTextures for the texcoords).

The example I quoted above isn't really relevant to the RMQ engine, although a hypothetical similar scenario could be for MDLs, so it's not too far off being realistic.

Now, say you're in a position where you have a layout that contains a position and may also contain either of two colours (each stored as a byte[4]) or a texcoord. The vertexes are all stored in a single data array, and you can switch back and forth between each layout at arbitrary points in that array. This one isn't at all far-fetched - the second rewrite of RMQ did just that. You can either add redundant data to the layout or go through a mess of glEnable/glDisable each time you need to change. Yuck.

Hence the third rewrite, and hence the fact that performance had started to suffer somewhat recently.

As an experiment I have maybe half of a D3D-like interface to OpenGL's vertex arrays API written. I'm uncertain whether I'll continue with it, but it's interesting to see the code needed and how API calls that accomplish much the same thing differ between the two. I don't think there's going to be much practical advantage to completing it, but if a requirement does come up I'll probably investigate further.

Performance-wise, particles are now a massive bottleneck. I don't really understand why at the moment, but when even something as simple as the lava ball trail in the hard hall of start.bsp goes off I lose about a third of my framerate, whereas DirectQ doesn't even bat an eyelid at it. This is on recent NVIDIA hardware, which one would reasonably expect to be in a different class where OpenGL is concerned.


Speaking of DirectQ, I've been thinking further about the roadmap, and while the eventual objective remains unchanged, there will most likely be an interim D3D9 release that raises the hardware requirements to D3D9 class hardware, removes all of the legacy rendering paths, and adds some features to the renderer that I flat out have not been able to do owing to having kept the legacy paths. So that's going to be 1.9, with the D3D11 version being 2.0 (which will line up that jump nicely with a new major version number).

The D3D9 interim version will do everything through shaders, which will prep things nicely for the eventual move up. In fact, r_hlsl 1 in DirectQ already does everything through shaders right now, so there's no work involved here aside from removing the old code paths. There will however likely be some restructuring: in places I've had to do things a certain way to support both paths as options, which may be suboptimal for HLSL-only, and there will be other features added to take advantage of being HLSL-only.

I have some ideas for the kind of stuff I want to add, but I'm not going to make any premature announcements, so you'll just have to wait.

Before then I also intend putting out 1.8.666b, which contains some more bugfixes (no, I haven't had the chance to test out the other ones I mentioned yet) and will likely end up being the last DirectQ for older hardware. This is important for anyone who wants or needs to keep using it, although there are other D3D engines available here, and the mighty Baker from QuakeOne.com is doing some impressive work with both ProQuake and EngineX, both of which have D3D builds available.

With those kinds of alternatives there is no longer a requirement for DirectQ to remain a legacy engine, so it's a nice time to be moving forward, and good that I can do so with no regrets and no looking back.

Till next time!