Which one is faster for copying 8-bit source data to a 32-bit locked DirectDraw surface via a palette? This:

dst[0] = ddpal[src[0]];
dst[1] = ddpal[src[1]];
dst[2] = ddpal[src[2]];
dst[3] = ddpal[src[3]];

Or this:

*dst++ = ddpal[*src++];
*dst++ = ddpal[*src++];
*dst++ = ddpal[*src++];
*dst++ = ddpal[*src++];

It surprised me a little to learn that it was actually the first. This is essentially the code my WinQuake engine uses to put the final pixels onto the screen, and in the course of tidying things up I had switched a loop from the first form to the second. The frame rate dropped from about 158 FPS to about 125. Switch it back and it goes up again. That kind of difference is no joke.
Of course that's not always the case. Putting something similar on lightmap uploads in DirectQ, for example, made things slightly slower. All the same, it's a good idea not to assume that pointer arithmetic is better in every case. Sometimes you need to try both and see which works best, and if it's a critical enough area of the code the result could be something of an eye-opener.
5 comments:
I take it that for (int i = 0; i < 4; ++i) dst[i] = ddpal[src[i]]; is even slower, then...
woah woah woah woah woah, you have a WinQuake engine too? I didn't know this.
Oh, now that I look back I see I missed some posts [:S]. I was about to do a big search to find it.
Yeah, I'm experimenting with some stuff on a WinQuake engine cos it's fun to try out new ideas on something different. I'll probably release it at some point, but I've no commitment to do so.
There are some things in it that I can use in DirectQ as well (which is always nice).
This might be because of compiler optimisations: the first might compile to a 32-bit mov instruction, versus four 8-bit mov instructions plus eight add instructions for the second...
Can't be 100% sure of it, but it might be something like that.