Scripting: Question: Allocation of Component Data

(Zicklag) #1

When you create a new component type in specs you specify the storage type that it will use to store the component data, such as VecStorage. Because the component data is stored in a vector, it is allocated on the heap right?

I’m trying to understand how to best store scripted components. For scripts we are planning on storing the component data as the byte representation of a repr(C) struct. At first I was thinking about storing those bytes as a Vec<u8>, but that would allocate the bytes on the heap, which I imagined would be less efficient than storing them in a byte array (i.e. [u8; n]) on the stack.
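To make the idea concrete, here is a minimal sketch of the "repr(C) struct as bytes" approach. It uses const generics rather than the generic-array crate, and the `RawComponent` wrapper and `Position` component are hypothetical names invented for illustration, not anything in specs or Amethyst:

```rust
// Sketch: store a repr(C) component as a fixed-size, stack-allocated
// byte array. RawComponent and Position are hypothetical examples.

#[repr(C)]
#[derive(Clone, Copy)]
struct Position {
    x: f32,
    y: f32,
}

// Holds the raw bytes of any component whose size is N.
#[derive(Clone, Copy)]
struct RawComponent<const N: usize> {
    bytes: [u8; N],
}

impl<const N: usize> RawComponent<N> {
    // Copy a repr(C) value into its byte representation.
    fn from_value<T: Copy>(value: T) -> Self {
        assert_eq!(std::mem::size_of::<T>(), N);
        let mut bytes = [0u8; N];
        // SAFETY: T is Copy and exactly N bytes long (checked above).
        unsafe {
            std::ptr::copy_nonoverlapping(
                &value as *const T as *const u8,
                bytes.as_mut_ptr(),
                N,
            );
        }
        RawComponent { bytes }
    }

    // View the component uniformly as a byte slice.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

fn main() {
    let raw = RawComponent::<8>::from_value(Position { x: 1.0, y: 2.0 });
    assert_eq!(raw.as_bytes().len(), 8);
}
```

The wrapper stores the bytes inline (no heap allocation of its own), which is the property being weighed against heap-backed Vec<u8> here.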

Obviously different components would have different byte lengths, so I planned on using generic-array to register different-length components as different-sized byte arrays, all of which could be accessed as byte slices through the GenericArray type. Some brainstorming code can be found here:

After thinking about that, though, I realized that if we were just going to be storing components on the heap anyway inside of storages like VecStorage, is there any point to being able to allocate the component’s byte data on the stack if we are just putting it in a vector anyway?

(Théo Degioanni) #2

In a VecStorage, while the memory is indeed on the heap, components are stored in contiguous space. This is what makes the storage cache-efficient and fast.

Heap allocated memory has two main drawbacks:

  • Allocating is slow: finding where to put your block of memory takes a while, so allocations should be avoided where possible.
  • It tends to be fragmented: if you are unlucky, the order in which you free and allocate memory blocks can leave your heap far from compact. For example, if you allocate two 8-byte buffers, then free the first one and allocate a 12-byte buffer, the 12 bytes cannot fit in the first buffer’s slot. They have to go at “the end” of the heap, leaving 8 bytes of wasted space (until something of 8 bytes or less is allocated there).
  • Fragmentation causes cache misses: RAM is still much slower than CPU clocks, so to save time on memory access, when the CPU needs a specific memory address, it actually loads a whole block of RAM into the CPU cache for quick access. This means that if you access two memory addresses close to one another, the second is already in the cache (you avoided a cache miss). But if your memory is heavily fragmented, as it tends to be in object-oriented video game paradigms, related data is not stored close together, and you pay for a lot of slow cache misses.

In specs, we try to reduce cache misses and heap sparsity by using storages like VecStorage, which allocate contiguous blocks of memory and place components of the same type next to each other (since they are likely to be read and written at the same time). Note that there are even better strategies; in fact, they are being studied for integration into Amethyst (but they have no influence on your problem).
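The contiguity point can be checked directly: a plain std Vec (which is what VecStorage builds on) places its elements back to back, so iterating components of one type walks memory linearly. A small sketch, using a std Vec rather than actual VecStorage internals:

```rust
fn main() {
    // Elements of a Vec are contiguous: each one sits exactly
    // size_of::<T>() bytes after the previous one.
    let components: Vec<u64> = vec![10, 20, 30, 40];
    let base = components.as_ptr() as usize;
    for (i, c) in components.iter().enumerate() {
        let addr = c as *const u64 as usize;
        assert_eq!(addr, base + i * std::mem::size_of::<u64>());
    }
}
```

Because consecutive components share cache lines, a system that iterates them in order hits the cache almost every access.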

So to answer your last question: yes, there is a point, because depending on the allocation strategy, heap memory can have very different access costs.

So to solve your specific problem, I think you are on the right track. In the context of dynamic components, you can consider them by their size and not their actual meaning (as long as you keep them in different storages, of course). However, if you want to implement VecStorage for dynamic components, I think std vectors are not quite flexible enough for your use case: you want to store byte arrays of arbitrary sizes, which I believe they cannot do. Maybe you can look for a crate implementing that, or make one of your own. If you choose the second path, I recommend actually copying the std’s code and modifying it so that instead of using the size of the type for allocation, it uses a dynamic parameter you specify. (Keep in mind the std is licensed under MIT/Apache-2.0, but I don’t think that would be an issue in our case.)
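The suggested "Vec with a runtime element size" could look roughly like this. `DynVec` is a hypothetical sketch, not a real crate API: a flat Vec<u8> plus a stride chosen when the scripted component type is registered, keeping components contiguous like a VecStorage would:

```rust
// Hypothetical sketch: a "vector" whose element size is chosen at
// runtime instead of via size_of::<T>(). Components stay contiguous,
// like in a VecStorage, but the stride is dynamic.
struct DynVec {
    data: Vec<u8>,
    stride: usize, // byte size of one component, set at registration time
}

impl DynVec {
    fn new(stride: usize) -> Self {
        DynVec { data: Vec::new(), stride }
    }

    // Append one component's bytes; the slice must match the stride.
    fn push(&mut self, bytes: &[u8]) {
        assert_eq!(bytes.len(), self.stride);
        self.data.extend_from_slice(bytes);
    }

    // Borrow the i-th component as a byte slice.
    fn get(&self, i: usize) -> Option<&[u8]> {
        let start = i.checked_mul(self.stride)?;
        self.data.get(start..start + self.stride)
    }

    fn len(&self) -> usize {
        if self.stride == 0 { 0 } else { self.data.len() / self.stride }
    }
}

fn main() {
    let mut v = DynVec::new(8); // e.g. a scripted component of 8 bytes
    v.push(&[1, 2, 3, 4, 5, 6, 7, 8]);
    v.push(&[9, 10, 11, 12, 13, 14, 15, 16]);
    assert_eq!(v.len(), 2);
    assert_eq!(v.get(1).unwrap()[0], 9);
}
```

One backing Vec<u8> per scripted component type preserves the contiguity described above, and there is no per-component heap allocation and no wasted padding beyond the stride.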

(Kae) #3

You may get some inspiration from my fork of legion for this.
Check out the Archetype and Chunk types.

I wrote a C API for storing component types defined in other languages, tested with C#.

(Zicklag) #4

@Moxinilian I was considering providing a reasonable set of different array sizes and selecting the closest available size for each component, but I didn’t think about the fact that I couldn’t store differently sized arrays inside the same vector; also, that would waste the memory that the type doesn’t use up in the closest-sized array.

I just did a quick search and found heaparray, which looks promising, but I’ll have to look into it and think more about what exactly I need to be able to do.

@kabergstrom Cool, thanks for the reference. It sounds like you did exactly what I’m trying to do just in Legion. I’ll definitely look at it.

(Théo Degioanni) #5

Why would you need that?

(Zicklag) #6

Oh, wait, you’re right, you only store components of the same type in the same vector. So other than the wasted memory, that would work.

You could implement component storages for arrays whose lengths are powers of 2 up to a certain limit with a macro, and then each component would be stored in one of those storages.
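Picking the size class described above amounts to rounding each component size up to the next power of two, which the standard library can do directly. A small sketch (the `size_class` helper is a hypothetical name):

```rust
fn main() {
    // Pick the storage size class for a scripted component:
    // the smallest power of two that fits its byte size.
    fn size_class(component_size: usize) -> usize {
        component_size.next_power_of_two()
    }

    assert_eq!(size_class(1), 1);
    assert_eq!(size_class(12), 16); // a 12-byte component wastes 4 bytes
    assert_eq!(size_class(16), 16); // exact fits waste nothing
    assert_eq!(size_class(33), 64); // worst case wastes almost half
}
```

The worst-case waste of this scheme is just under 50% of each slot, which is the trade-off Théo raises in the next post.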

(Théo Degioanni) #7

Well, that would work, I suppose. But I really fear the wasted space.
@kabergstrom More specifically, could you briefly explain how you handle this problem?