Wednesday, June 16, 2010

New Renderer Update

Solid surfaces have now moved across to the new renderer. I haven't bothered with the hyper-optimisations of the DMA transfer to the vertex buffer yet, and I think I'm going to leave it until such a time as I get everything done and then see if I need to do it at all. It's always nice to hold some future speed gains in reserve too.

The setup of solid surfaces rocks hard. It's much much much simpler than it was before, the rendering portion of the code is something like half the size, and the flexibility is really coming through strong. The whole thing is based around callback functions to update state, and uses a common vertex format that means I completely avoid the extra vertex buffer locks and stream source setup that plagued previous versions. That means fast, clean and flexible code.
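
To give a rough feel for the callback idea, here is a minimal sketch, not the engine's actual code. Every name in it (r_vertex_t, r_batch_t, R_SubmitBatches, R_CountingState) is made up for illustration, and the draw call itself is left as a comment. The assumption is that each batch carries a state callback plus the vertex range it covers, all in one shared vertex layout, and that the callback is only invoked when it differs from the previous batch's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common vertex format shared by every surface type. */
typedef struct {
    float xyz[3];   /* position */
    float st[2];    /* diffuse texture coordinates */
    float lm[2];    /* lightmap coordinates */
} r_vertex_t;

/* A state callback sets whatever render state a batch needs. */
typedef void (*r_statefunc_t)(void *data);

typedef struct {
    r_statefunc_t setstate;  /* how to set state for this batch */
    void *statedata;         /* argument passed to the callback */
    int firstvertex;         /* range in the shared vertex buffer */
    int numvertexes;
} r_batch_t;

/* Example callback for illustration: counts how often it was invoked. */
void R_CountingState(void *data)
{
    (*(int *)data)++;
}

/* Walks the batches in order and invokes a state callback only when the
   callback or its data differs from the previous batch.
   Returns the number of state changes that were made. */
int R_SubmitBatches(const r_batch_t *batches, int numbatches)
{
    r_statefunc_t lastfunc = NULL;
    void *lastdata = NULL;
    int changes = 0;

    assert(batches != NULL || numbatches == 0);

    for (int i = 0; i < numbatches; i++) {
        const r_batch_t *b = &batches[i];

        if (b->setstate != lastfunc || b->statedata != lastdata) {
            if (b->setstate) b->setstate(b->statedata);
            lastfunc = b->setstate;
            lastdata = b->statedata;
            changes++;
        }

        /* a real renderer would issue the draw call for
           b->firstvertex / b->numvertexes here */
    }

    return changes;
}
```

Because every batch uses the same vertex layout, nothing in the loop needs to touch the stream source or vertex declaration; only the callback changes between surface types.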

As I said, 1.8.5 is likely to not be bleeding-edge optimised, but should still run quite a bit faster than 1.8.4 did in most situations.

I've also been working quite a bit on cleaning up the render-to-texture underwater warp update. This was pretty cruddy and fragile code that dates back over a year so it's overdue a sprinkling of freshness. Render-to-texture can hurt quite a lot in fillrate-bound situations, so I've added options to scale down the render target size dynamically at runtime. I think I'll be leaving these as just cvars, as the defaults I've chosen offer a good balance of performance and quality, and I don't want people changing them in menus and then coming to me complaining that it runs slow or looks ugly.
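
As an illustration of the scaling, a sketch along these lines would turn a scale factor (as it might arrive from a cvar) into a render target dimension. The function name and the 0.25 to 1.0 clamp range are assumptions picked for the example, not the engine's real values.

```c
#include <assert.h>

/* Hypothetical helper: scales one dimension of the render target.
   The scale is clamped to [0.25, 1.0] (an assumed range) and the
   result is never allowed to drop below 1 texel. */
int R_ScaleDimension(int fullsize, float scale)
{
    assert(fullsize > 0);

    if (scale < 0.25f) scale = 0.25f;
    if (scale > 1.0f) scale = 1.0f;

    int scaled = (int)(fullsize * scale);

    return scaled < 1 ? 1 : scaled;
}
```

Halving both dimensions cuts the number of pixels the warp pass has to fill to a quarter, which is where the fillrate saving comes from.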

In general, whenever I leave an option as a cvar only and keep it out of the menus, you should take it as a signal that I don't want you playing with it unless you're sure you know what you're doing and are prepared to accept the consequences yourself.

Lightmaps are the last item I've been looking at. I'm planning on finding a way to run lightmap updates somehow in parallel with other stuff, as they do represent one of the last real bottlenecks in the engine. When an explosion goes off we see framerates being cut to about a third of what they could be (this depends on hardware) so I want to do something about that. The real problem is that if we're updating a lightmap we need to stall the pipeline until such a time as the update completes before that lightmap can be used. One possible solution seems to be to double-buffer the lightmaps, but we'll see.
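
For what it's worth, double-buffering could look something like the sketch below. This is only one way the idea might be laid out, with plain byte arrays standing in for the two textures; the type and function names and the tiny LM_SIZE are all invented for the example. The update always writes to the copy the renderer is not reading, and a flip at a safe point in the frame swaps them.

```c
#include <assert.h>
#include <string.h>

#define LM_SIZE 16  /* tiny on purpose; a stand-in for a real lightmap */

typedef struct {
    unsigned char texels[2][LM_SIZE]; /* stand-ins for two textures */
    int front;                        /* copy the renderer samples from */
    int dirty;                        /* back copy holds a newer update */
} lightmap_db_t;

/* Writes new lighting into the back copy; the front copy is untouched,
   so anything still drawing with it does not have to wait. */
void LM_Update(lightmap_db_t *lm, const unsigned char *data)
{
    assert(lm != NULL && data != NULL);

    memcpy(lm->texels[lm->front ^ 1], data, LM_SIZE);
    lm->dirty = 1;
}

/* Called once per frame at a safe point: swaps copies if there is
   a pending update, otherwise does nothing. */
void LM_Flip(lightmap_db_t *lm)
{
    assert(lm != NULL);

    if (lm->dirty) {
        lm->front ^= 1;
        lm->dirty = 0;
    }
}

/* Returns the copy that should be used for rendering this frame. */
const unsigned char *LM_Current(const lightmap_db_t *lm)
{
    assert(lm != NULL);

    return lm->texels[lm->front];
}
```

The obvious costs are double the lightmap memory and lighting that can lag a frame behind, which is part of why it's only a possible solution.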

Update:

Lightmap updates now run considerably faster than before, with more speed to come. The secret was to shift the update to as early as possible in the frame, before the vertex buffers are built and well before any rendering is done. This means that the lightmap update can largely run in parallel with the vertex buffer building, and therefore has a better chance of being finished by the time it's needed.
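
The ordering described above can be sketched as a frame function that kicks off the lightmap work first. The stage functions here are empty placeholders that just log when they ran, and all of the names are hypothetical; the point is only the order of the calls.

```c
#include <assert.h>

enum { STAGE_LIGHTMAPS, STAGE_VERTEXES, STAGE_DRAW };

#define MAX_STAGE_LOG 8

int r_stagelog[MAX_STAGE_LOG];  /* records the order stages ran in */
int r_numlogged = 0;

void R_LogStage(int stage)
{
    assert(r_numlogged < MAX_STAGE_LOG);
    r_stagelog[r_numlogged++] = stage;
}

/* Placeholder stages; a real engine would do the actual work here. */
void R_UpdateLightmaps(void)    { R_LogStage(STAGE_LIGHTMAPS); }
void R_BuildVertexBuffers(void) { R_LogStage(STAGE_VERTEXES); }
void R_DrawEverything(void)     { R_LogStage(STAGE_DRAW); }

void R_RenderFrame(void)
{
    r_numlogged = 0;

    /* Lightmaps go first, so the upload has the whole of the
       vertex buffer build in which to complete. */
    R_UpdateLightmaps();

    /* CPU-side work that does not touch the lightmaps. */
    R_BuildVertexBuffers();

    /* Drawing is the first point where the lightmaps are needed. */
    R_DrawEverything();
}
```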

This will get even faster as I move more rendering to the new setup, as the actual rendering itself is deferred to as late as possible in the frame and then all done in bulk. We might even get to the optimal situation where lightmap updates are entirely parallel and therefore don't need to stall the pipeline at all; in other words we'll be getting them effectively for free.

If that happens I'm going to see how we go with removal of the r_fastlightmaps cvar; after all if updates are free we don't need any hacks to make them fast, do we? A fully dynamically lit map would then become scarily possible. No idea yet what I might do in such a situation.

Water and other liquid surfs just moved over; alias models are next and then the bulk of what we see on-screen is done. After that I need to go over it and pick up anything I've left out, such as alpha surfaces, sprites and particles.

I'm probably going to move particles from triangles to quads. I need this for a custom particle system in the future, and I also need quads here because the new renderer takes indexed primitives. There will be additional overhead on vertexes from this, but a consequent saving on fillrate. Hopefully, the way things are going, we're not going to be bandwidth-bound, so it should translate into more speed overall.

Update 2:

Had a really neat idea last thing yesterday and couldn't resist the temptation to implement it. We've now got the fast updates to video RAM happening automatically irrespective of data format, and have quite good CPU/GPU parallelism, including free lightmap updates. The next step is to construct a way of streaming vertexes to the GPU efficiently, but I have some ideas for that too, so later on today should see a result with that.
