I'm taking a short break from timer and input work to explore the potential of hardware instancing a little more. I've seen potential for it to accelerate MDL drawing a little, so today I wrote some test code that simply counts how much of a saving I can get from instancing (particularly in combination with my current MDL code, which can do massive VBO reuse).
The results were quite surprising, to say the least. Even in the most common cases - like the Necropolis demo - maybe 90% of the time there are enough copies of the exact same MDL, using the exact same frames, vertexes and textures, being drawn to make instancing worthwhile. I expected some for sure, but I didn't expect that much. In extreme cases - like my 400 knights map - batches of up to 150 or so MDLs can be instanced. I expect that something like the Quoth longtrch model in ne_tower (where there could be up to 80 or so of them in the PVS and view frustum at any one time) would also benefit hugely from this.
So there is potential benefit to be had here and it's an avenue worth exploring.
Of course this will only work if your hardware actually supports instancing. Hardware that doesn't support it has to go through the slower path, but at least that path can still benefit somewhat from the VBO reuse I have going, so it's not all bad news.
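As a rough illustration of the kind of counting involved, here is a minimal sketch that compares the two paths for a given set of batch sizes. The names (DrawCost, EstimateCost) are made up for this example and are not taken from DirectQ, and the cost model is an assumption: one VBO bind per batch in both paths, one draw call per batch when instancing is available, and one draw call per entity otherwise.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cost summary; not a DirectQ structure.
struct DrawCost {
    std::size_t drawCalls;  // number of draw calls issued
    std::size_t vboBinds;   // number of times a VBO has to be set up
};

// Estimates the cost of drawing the given batches. Each entry in batchSizes
// is the number of entities sharing the same model/frames/textures.
DrawCost EstimateCost(const std::vector<std::size_t> &batchSizes, bool hasInstancing) {
    DrawCost cost = {0, 0};
    for (std::size_t n : batchSizes) {
        // Assumption: the VBO is shared by the whole batch in both paths.
        cost.vboBinds += 1;
        // Instanced path: one call per batch. Fallback path: one call per entity.
        cost.drawCalls += hasInstancing ? 1 : n;
    }
    return cost;
}
```

With batches of 150, 80 and 1, this gives 3 draw calls on the instanced path against 231 on the fallback path, with 3 VBO binds in both.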
Thursday, March 24, 2011
DirectQ Update - 24th March 2011 - Fun with Hardware Instancing
Posted by mhquake at 8:46 PM
5 comments:
Does this work if frame interpolation is enabled? It seems that it wouldn't, because monsters are not all in lock step. Isn't it true that entity thinks are staggered across a 100ms span?
Yes, it works. I do interpolation in the vertex shader so the VBOs only need to be updated 10 times per second, with interpolation factors either being sent as uniforms or as per-instance data.
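To make that concrete, here is a CPU-side sketch of the arithmetic the vertex shader would perform. It is only an illustration: Vec3, BlendFactor and LerpVertex are hypothetical names, and the 0.1 second frame interval is an assumption based on the 10 updates per second mentioned above.

```cpp
#include <algorithm>

// Hypothetical vertex position type for this example.
struct Vec3 { float x, y, z; };

// Blend factor from the time elapsed since the entity last changed frame,
// clamped to [0, 1]. The 0.1 s default interval is an assumption.
float BlendFactor(float now, float frameStart, float interval = 0.1f) {
    float t = (now - frameStart) / interval;
    return std::min(1.0f, std::max(0.0f, t));
}

// Linear interpolation between the previous and current frame positions.
// In the real renderer this would run per vertex in the shader, with
// blend supplied as a uniform or as per-instance data.
Vec3 LerpVertex(const Vec3 &prev, const Vec3 &cur, float blend) {
    return { prev.x + (cur.x - prev.x) * blend,
             prev.y + (cur.y - prev.y) * blend,
             prev.z + (cur.z - prev.z) * blend };
}
```

Because only the blend factor changes from one rendered frame to the next, the two frame positions can stay in the VBO untouched until the entity moves to a new animation frame.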
Entities get sorted by model, then by frames (current and previous), so we get to instance everything that uses the same model/frames/textures. With higher numbers of entities on-screen the probability that the number of instances will be high is good.
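The sort-then-batch step can be sketched roughly like this. DrawKey and BatchSizes are hypothetical names for this example, and the field layout is an assumption rather than the actual DirectQ entity structure.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical per-entity draw key; field names are illustrative only.
struct DrawKey {
    int model;      // which MDL is drawn
    int curFrame;   // frame being interpolated towards
    int prevFrame;  // frame being interpolated from
    int texture;    // skin/texture index

    // Order by model first, then frames, then texture.
    bool operator<(const DrawKey &o) const {
        return std::tie(model, curFrame, prevFrame, texture) <
               std::tie(o.model, o.curFrame, o.prevFrame, o.texture);
    }
    bool operator==(const DrawKey &o) const {
        return std::tie(model, curFrame, prevFrame, texture) ==
               std::tie(o.model, o.curFrame, o.prevFrame, o.texture);
    }
};

// Sorts the keys and returns the length of each run of identical keys.
// Each run is one group of entities that could share an instanced draw.
std::vector<std::size_t> BatchSizes(std::vector<DrawKey> keys) {
    std::sort(keys.begin(), keys.end());
    std::vector<std::size_t> sizes;
    for (std::size_t i = 0; i < keys.size();) {
        std::size_t j = i + 1;
        while (j < keys.size() && keys[j] == keys[i]) ++j;
        sizes.push_back(j - i);
        i = j;
    }
    return sizes;
}
```

Sorting first means identical keys end up adjacent, so a single linear pass is enough to find every batch.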
Even with only 4-5 entities using the same model there is still a decent probability that instancing will be able to kick in about 50% of the time (I haven't done a detailed analysis, just Con_Printf).
A neat trick is that if the current frame has a lower number than the previous one you can swap the frames (and invert the blend factor) which can increase the number of entities that can be instanced too. If entity A is interpolating between frames 4 and 5, and entity B between frames 5 and 4 (typical for a standing still animation) then they're effectively both the same. Just swap the frames and invert the blend for entity B. This has benefit even in the non-instanced case as the VBOs won't need to be touched.
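A minimal sketch of the swap, assuming the blend factor runs from 0 (fully the previous frame) to 1 (fully the current frame). LerpState and Canonicalize are hypothetical names for this example.

```cpp
#include <utility>

// Hypothetical interpolation state for one entity.
struct LerpState {
    int prevFrame;  // frame being blended from
    int curFrame;   // frame being blended towards
    float blend;    // 0 = fully prevFrame, 1 = fully curFrame
};

// Puts the lower-numbered frame first so that (4 -> 5) and (5 -> 4) end up
// with the same frame pair. The blended pose is unchanged, because
// prev*(1-b) + cur*b is the same expression as cur*(1-(1-b)) + prev*(1-b).
LerpState Canonicalize(LerpState s) {
    if (s.curFrame < s.prevFrame) {
        std::swap(s.prevFrame, s.curFrame);
        s.blend = 1.0f - s.blend;
    }
    return s;
}
```

For example, an entity blending from frame 5 to frame 4 with a factor of 0.25 becomes one blending from frame 4 to frame 5 with a factor of 0.75, so it sorts alongside entities already going from 4 to 5.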
With my 400 knights map we actually get instance sizes of 200, 130, 50 and downwards from there.
What Direct3D 9-class hardware doesn't support geometry instancing, aside from the GeForce FX series? I know the Radeon X700 and X800 (and higher) support it through an extension...
Probably none. Otherwise some lower-end Intels perhaps? I've enabled the ATI extension in this code, by the way.
http://www.google.de/search?hl=en&client=opera&hs=uhT&rls=en&channel=suggest&q=dx10+skinned+instancing&aq=f&aqi=&aql=f&oq=
I haven't read it though.