Thursday, April 22, 2010

More fun with Memory Management

Sometimes when I'm evolving an idea I'll take it through 3 or 4 different implementations (occasionally at the same time) to see which works out best. Sometimes I even go very far with one version of it before realising that it's not a good solution.

Quake's memory management system is a disgrace. This is because it was written for a DOS game (and actually evolved from the Doom code) where it was the only application running on the system and could just grab all available memory. Three different types of allocation needed to be handled in that single block of memory, as well as free operations and moving memory around. The code as it stands in the released source is still largely the old DOS code, and a lot of pain and suffering is inflicted as a result. The fact that you still have to specify -heapsize is only the beginning.
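
For reference, the three flavours of allocation sit behind interfaces roughly like these (paraphrased from memory of zone.h in the released source, so treat the exact signatures as approximate):

    void *Hunk_AllocName (int size, char *name);                  // level data; freed en masse by resetting a mark
    void *Z_Malloc (int size);                                    // small, long-lived dynamic allocations (the "zone")
    void *Cache_Alloc (cache_user_t *c, int size, char *name);    // discardable cache (models, sounds)

All three carve their space out of that single block grabbed at startup, which is why they end up fighting each other for room.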

DirectQ has for the last while been dealing with this by reserving a large chunk of address space in virtual memory and drawing down from that as required. This was functionally equivalent to specifying a large -heapsize but only the memory that was actually required was allocated. However it did have the disadvantage that there was still a hard upper limit on memory, and if that ran out you were EFF-YOU-SEA-KAY'ed.
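
To make that concrete, the general reserve-then-commit pattern looks something like this (a minimal sketch only, not DirectQ's actual code; the class and member names are mine and error checking is stripped out):

    #include <windows.h>

    // reserve a big slab of address space up front; commit pages only as
    // allocations actually arrive.  ReserveCommitHeap is an illustrative
    // name, not anything from the engine.
    class ReserveCommitHeap
    {
    public:
        explicit ReserveCommitHeap (SIZE_T reserveSize)
        {
            // reserve address space only - no physical memory or pagefile space yet
            this->base = (unsigned char *) VirtualAlloc (NULL, reserveSize, MEM_RESERVE, PAGE_NOACCESS);
            this->reserved = reserveSize;
            this->committed = 0;
            this->used = 0;
        }

        void *Alloc (SIZE_T size)
        {
            size = (size + 7) & ~(SIZE_T) 7;   // keep everything 8-byte aligned

            // the hard upper limit the post talks about
            if (this->used + size > this->reserved) return NULL;

            // commit more pages if this allocation runs past what's committed so far
            if (this->used + size > this->committed)
            {
                SIZE_T newCommit = (this->used + size + 0xFFFF) & ~(SIZE_T) 0xFFFF;   // 64k granularity
                VirtualAlloc (this->base + this->committed, newCommit - this->committed, MEM_COMMIT, PAGE_READWRITE);
                this->committed = newCommit;
            }

            void *ret = this->base + this->used;
            this->used += size;
            return ret;
        }

        void FreeAll (void)
        {
            // throw away every allocation in one go: give the pages back to the
            // OS but keep the address space reservation for reuse
            VirtualFree (this->base, this->committed, MEM_DECOMMIT);
            this->committed = 0;
            this->used = 0;
        }

    private:
        unsigned char *base;
        SIZE_T reserved, committed, used;
    };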

It had to be kept large enough to accommodate all reasonable requirements, but at the same time small enough to leave headroom for other memory allocations. I had found that 128 MB was a "good enough" size for maps; even warpc.bsp only needed about 30 MB of that, so the other 100-odd MB was available for other per-map allocations such as particles, entities, static entities, beams, and so on. The remaining 1920 MB of the address space was then available for other allocations, many of which were also squeezed into similar hard-upper-limited pools.

This is all despite the fact that DirectQ has actually removed most limits and can keep allocating memory until it runs out. It's just that it may run out sooner than it should.

For 1.8.4 I've been playing around with various evolutions of the memory system in an attempt to get the best of all worlds. The ideal Quake memory system should be able to create an arbitrary number of independent buffers, each of which may be indefinitely expanded and easily discarded as required, preferably in a single operation that releases the memory for every allocation made from that buffer. Common objects should be kept together, and allocations should not fall foul of the "lots of small allocations being worse than a few big ones" trap.

Problems are compounded by the fact that you often don't know how much memory is required in advance, and also that for certain objects the memory must be in a single contiguous block.
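
Something along these general lines fits the bill (a simplified sketch under those requirements, not the actual implementation; all of the names here are made up):

    #include <cstdlib>

    // an independent, indefinitely expandable buffer: allocations come out of
    // the current block, a new block is chained on when the current one fills,
    // and Discard releases everything ever allocated from it in one operation.
    class QuakePool
    {
        struct Block
        {
            Block *next;
            size_t size, used;
            // allocation space follows this header in the same malloc'ed chunk
        };

    public:
        explicit QuakePool (size_t blockSize = 0x100000) : blocks (NULL), blockSize (blockSize) {}
        ~QuakePool (void) { this->Discard (); }

        void *Alloc (size_t size)
        {
            size = (size + 7) & ~(size_t) 7;

            // expand with a new block if needed (contiguity with previous
            // allocations is broken at this point)
            if (!this->blocks || this->blocks->used + size > this->blocks->size)
            {
                size_t newsize = (size > this->blockSize) ? size : this->blockSize;
                Block *b = (Block *) malloc (sizeof (Block) + newsize);

                b->next = this->blocks;
                b->size = newsize;
                b->used = 0;
                this->blocks = b;
            }

            void *ret = (unsigned char *) (this->blocks + 1) + this->blocks->used;
            this->blocks->used += size;
            return ret;
        }

        void Discard (void)
        {
            // release every allocation ever made from this pool in one go
            for (Block *b = this->blocks; b; )
            {
                Block *next = b->next;
                free (b);
                b = next;
            }

            this->blocks = NULL;
        }

    private:
        Block *blocks;
        size_t blockSize;
    };

The one-operation Discard is what makes per-map teardown trivial: nothing allocated from the buffer ever needs to be freed individually.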

Right now I have what looks a lot like a very good solution up and running in a standalone application. I've been hitting it with various random patterns of allocations, and it can quite easily handle 100,000 randomly sized really tiny allocations in a fraction of a second. It's capable of the contiguous memory trick if required, and is also indefinitely expandable (although if it expands the contiguity is broken).
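
By way of illustration, the kind of hammering I mean looks roughly like this (using the hypothetical QuakePool from the sketch above):

    #include <cstdlib>
    #include <cstring>

    int main (void)
    {
        // 100,000 randomly sized tiny allocations, then throw the whole lot
        // away in a single operation
        QuakePool pool;

        for (int i = 0; i < 100000; i++)
        {
            size_t size = 1 + (rand () % 64);
            memset (pool.Alloc (size), 0, size);   // touch the memory so it's really used
        }

        pool.Discard ();
        return 0;
    }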

Later on I'm going to port it to the engine and move some allocations over to it to see how it performs in the real world. It should be interesting.

2 comments:

James said...

Hi, I found this post to be very interesting. I was wondering, aside from superior software scalability, can we expect performance improvements when this is used in a real-world scenario?

In my spare time I work on my own 2D game engine project... I'm at the point where I'm considering a new memory management system. I currently just use new/delete on most of my objects but this is starting to cause slowdowns, especially when allocating a large number of small-ish objects.

Do you have any recommended reading material for this sort of thing, specifically in the context of game development?

mhquake said...

No recommended reading I'm afraid, but I do recommend looking at the documentation on VirtualAlloc and friends. They really are the best API for this type of allocation, as you can reserve huge blocks, commit large ones, but actually draw down many small ones. If you're regularly allocating and freeing memory, consider grouping your objects into multiple memory pools so that you can allocate a large number of them in one go, and also free a large number in one go.

With linked lists I like to keep an active list and a free list. The free list is allocated in groups of 64-512 objects in a single memory block, then the pointers are fixed up by hand. Items can be drawn from or returned to the free list as required, and if it runs out another 64-512 get allocated. Memory is cheap so better to over-estimate than to run out.
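
A rough illustration of that free list pattern, with made-up names and a placeholder payload (the batch size and bookkeeping will obviously vary):

    #include <cstdlib>

    // a generic active list/free list pair: free objects are allocated in
    // batches of 256 in a single block and linked together by hand, then
    // handed out and taken back one at a time
    struct item_t
    {
        item_t *next;
        // ... whatever payload the object actually carries ...
    };

    static item_t *active_items = NULL;
    static item_t *free_items = NULL;

    static void AllocBatch (int count)
    {
        // one malloc for the whole batch, then fix up the next pointers by hand
        item_t *batch = (item_t *) malloc (count * sizeof (item_t));

        for (int i = 0; i < count; i++)
        {
            batch[i].next = free_items;
            free_items = &batch[i];
        }
    }

    item_t *NewItem (void)
    {
        // draw from the free list, topping it up with another batch if it runs out
        if (!free_items) AllocBatch (256);

        item_t *it = free_items;
        free_items = it->next;

        it->next = active_items;
        active_items = it;

        return it;
    }

    void FreeItem (item_t *it)
    {
        // the caller unlinks it from the active list first; in practice the
        // active list tends to be walked and rebuilt each frame anyway
        it->next = free_items;
        free_items = it;
    }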

The only real performance improvement is going to be in terms of loading time, and that was never much of a bottleneck to begin with. Big gains in flexibility and economy though.