Legion ECS Discussion

(Paweł Grabarz) #44

I also need to point out, you are testing a known worst case of legion ecs in it’s unoptimized form. As @jaynus mentioned before, the job scheduler is currently being developed, and batching the structural operations will go with it. There is an inherent cost of moving data around and it’s always better than not moving it at all, but in current state we are far from the theoretical maximum performance here. The data layout is optimized for even more common operation - querying. It’s pretty much equivalent to implicit grouping, but instead of laying group boundries on component types, it slices the entities list directly.

You can’t replicate such thing in specs. Currently every specs component type must be stored separately, as it has it’s own separate storage. This is a very central point of the design and would be extremaly hard to change.

1 Like
(OvermindDL1) #45

Ah, I like that name a lot more! :slight_smile:

It’s a very common pattern among all of my past ECS sandboxes though, components being added and removed is an extremely common occurrence.

Though a new storage type would be able to allocate the data together, it would require some kind of change in specs to be able to pull that data out of that same storage across them I would think though. Still, I think legion’s API style might be the better way to go especially with how it fits with scripting systems far easier. The central design point of specs was very odd to me when it was first being designed. Back in C++ my ‘world’ (Context in C++) didn’t know or understand storages, they were external data stores, and could be anything, the most the Context did was hold a reference to storages for ease of looking them up in a scripting context (the storages also fulfilled the Node system my engine used to allow for dynamic lookup in them from scripting systems). This was also why I didn’t have an allocation group style though, sometimes a single storage would hold multiple components, but it was always rather manually performed (though it was very fast).

If curious, my query system was not really the best. Creating a new query could be a bit slow, but once created it could be infinitely stored and re-used for as long as the Context existed (although creating another query that was identical in all ways to an existing one had no real setup time, it just looked up the existing cache entry), and iteration required no bitsets or tests of any sort, rather whenever a component was added or removed on an entity it told the ‘Aspect’ system about it, which updated the internal cache, which is what the query’s directly accessed. It exchanged a bit of memory (multiple megs) and a couple of array lookups and memory sets for having linear-array-speed fast iteration of entities of whatever filtering you wanted (and looking up the storage data for an entity was the job of the ‘systems’, though a system in mine was just any function that was registered with the parallel dispatcher with data about what it accessed to allow for safe parallelism). I was about to replace my 2-decade-long homegrown ECS with the C++ EnTT library before I ended up falling in to Rust though. ^.^

1 Like
(Thomas Gillen) #46

@Frizi is right, entity_data was renamed to component quite some time ago. I did not realise that the readme had not been updated (and the docs need to be updated too).

Queries are really the focus of optimisation in legion, so not using that feature is not going to be representitive of its performance.

As for flag-type components that need to be toggled on and off, there is little scope to optimise that beyond the performance you would get from simply using Option<Component> as your component type and branching inside the loop. We must either pay the cost of moving the entity into a new chunk, or we accept that not all of the entities in a chunk will match the query and must be filtered out. If you go for further filtering entities within a chunk, it will be difficult to beat the performance of simply checking each component for None as your walk through the chunk, as most candidate entities have already been broad-phase culled by the archetype/chunk filtering portion of the query. At that point, I don’t think complicating the API and implementation to support this is worth it over having the user doing the branch themselves (as it breaks a lot of assumptions around which the API was designed). The performance of component addition/removal likely will be improved to perform direct copies between the source and destination chunks in the future, but it will always be a slower operation than in an ecs like specs.

(OvermindDL1) #47

The point of it is to not require branching. Like as a specific example from one of my sandboxes take a “Machine” that does work in a game, when it is given a set of items to work on then it gets a ‘MachineWorkingOn’ component, which holds the state of what it is doing and when it will complete, output, what resources it is drawing, etc… Systems that work on such things as <ElectricalSink, MachineWorkingOn> will be processing these entities for the split second or 5 seconds or 5 minutes or whatever it works on per game tick to deduct power from the electrical network and add that to the processing time on the MachineWorkingOn component (among other things). Most entities that will potentially have MachineWorkingOn will be idle most of the time, thus not having that component (think about 1-10 ticks to process fast recipes, another couple of ticks to remove the output item and load the new inputs, then the components are added on again).

The great majority of ElectricalSink components will not have a MachineWorkingOn, could be transformers, motors, etc… etc… There will be literally hundreds of thousands of entities with ElectricalSink components. There will be large swaths that flicker on and off (and large swaths that are idle due to insufficient resources or insufficient output). This has been no processing issue for me in the past, even with quite literally a million entities.

Queries are still very valuable, and should absolutely be a focus of optimisation.

However, adding and removing components is also extremely common in ECS games, to the tune of, depending on the size of the game, hundreds to tens of thousands of additions/removals per tick (if not more in some cases).

Thus having to move all the memory of all the components that an entity holds for those operations, of which many/most of those entities will not be trivially sized, or having to use an Option and test for the components status when most of them would very well be inactive most of the time (output backup and resource starvation is not at all uncommon) is a significant amount of work needing to be done for no reason at all. Combine that with the fact that the active and inactive ones will be very well intermixed and that blows the branch prediction of the CPU away.

This is not an uncommon action, just like queries are focused on optimization, adding/removing components do as well.

A library optimized for where you do not often add or remove components is a dataflow library, not an ECS library, and is for different purposes than an ECS.

The big thing is that legion could optimize this well by just implementing the group pattern. It would not complicate the API (it would keep it near identical other than specifying which group a component should be in, and that’s a compile-time definition).

1 Like
(Thomas Gillen) #48

If a state change is going to persist for more than a couple of frames, then it will be worthwhile to move it into a new archetype. If you have transient events that occur within a frame, then you are best served by creating a new entity to represent that event, and then have a system batch process all events of that type and delete the entity. If you have high occupancy of a particular component, then wrapping it in Option is an entirely feasible alternative.

Having literally hundreds of components on an entity and performing thousands of component addition/removals per frame is absolutely not common or typical in an ECS based game. A typical game’s frame time is dominated by work that takes place inside loops over entities, and only a small percentage of total entities are moved into a new state significant enough to justify a composition change each frame, even if those composition changes were relatively cheap.

If you have a large array of entity data, with a small subset of those entities containing a component you are interested in, there simply is no way to find those entities that is going to be significantly faster than a linear search. At least not without doing some pre-processing work (like sorting the data or building and maintaining an index) - pre-processing work that is not unlike the cost of sorting the entities into archetypes. If fast filtering an iteration performance is important, then pay the cost of that sorting operation. If it is not, then it is difficult to beat the performance of using Option if you can’t pull the event out into a new temp entity.

In practise, even in the case where you have designed your game around entities with hundreds of potential components, the vast majority of those components will not change within a single frame and each entity likely only has a small number of them at any one time. The majority of these components could be reasonably dynamically added and removed. Many of them can be better represented as transient events as their own entities. The set that you end up keeping permenantly on the entity as options need not be huge.

If legion were to support first-class optional components, by essentially implementing the specs storage and filtering archetecture alongside its current storage system, that would display performance characteristics very similar to Option<Component> (it would be able to check 64 entities at a time with bitfield logical ands, but is still otherwise ultimately a linear search), with the only major difference being possible memory savings from being able to use a sparse storage implementation. The cost to that would be an incredible increase in internal complexity to legion and a likely performance overhead throughout due to having to frequently check whether these additional storages have been attached to a chunk.

1 Like
(OvermindDL1) #49

It has been in the ones I’ve worked on in the past. Especially in modding contexts it an get wildly extreme.

Yeah this is what my engine did that I described above, that was the memory trade-off.

Then the state change would effect the wrong entity, or you start jumping around in memory by storing the ‘parent’ entity within this and accessing it out of order for each iterable.

That is why it should not do what specs is, but rather implement ECS Groups.

Essentially this (using what I’ve seen of Legion’s terminology):

Given these components (keeping it simple for the purpose of this example, a name is a typename and a group id is an integer:

Component Name Group ID
A0 0
B0 0
C0 0
D1 1
E1 1
F1 1
G1 1
H2 2
I3 3
J4 4
  • Instead of a single ‘Chunk’ system, you instead have multiple Chunk systems, the number of which is defined by the components, in the above example there would be 5 Chunk storage systems.
  • Any time components of the same Group ID are stored, legion works as it does now, they get stored together in the chunk, adding/removing one of those components involves copying the whole entity’s chunk worth of data to the new area.
  • Any time components of different Group ID’s are stored, legion will put them in separate chunk storage systems.
  • Creating a query for components that are all the same Group ID then legion would work as it does now.
  • Creating a query for components that are not all of the same Group ID would essentially do two queries and then join those to get the final tuple of values.
  • Doing a chunk iteration seems like it would be as it is, the ‘other’ chunk storages of differing group ID’s would just be returned in that same chunk iterator.
  • The internals of course would change, but the user API would stay relatively identical.

Now how the group ID is specified could come a few different ways, the most simple might just be a trait with a single function for callback on the type, which could become part of a deriver if so wished. If it doesn’t exist it could fall back to, say, group 0, thus by default everything gets batched/chunked together, there-fore separating things out would need to be done explicitly.

No needing to special case optional components or anything of the sort. The complexity increase would come from having an entity existing across chunks, which essentially would work as it does now but with joined queries of a count of queries equaling the amount of unique groups.

A complete separate thing, but my old engine also allowed for scripting systems to state how much memory a component took and the C++ side managed all that, the scripting system layer was able to use that memory however it saw fit (for luajit I had a simple offset access system for a limited set of types for example, I tried to keep everything memcpy'able for serialization speed reasons, but if necessary other data could be stored, but then serialization functionality had to be written for it). This would make allowing modders via scripting systems a much easier and efficient way to handle tightly packed components (more rare ones can of course be done entirely in-scripting system via entity ID of course, if the scripting system received live/death events of any entity it registered on, which in my engine was just another component to say what wanted to listen to events from a given entity). This would be a useful feature for legion to support as well.

(Thomas Gillen) #50

That is essentially the same as specs, where the user has used fat component structs that group together data that they think will be frequently used together. It has the same pros and cons as specs does in comparison to legion.

As soon as you have a query that pulls from two different groups, you lose all of the performance benefits of archetypes, because you are now skipping through multiple slices rather than iterating through them linearly. You also now need to do per-entity filtering, which is substantially slower than per archetype and chunk filtering as you can have a few hundred entities in each chunk.

That case is far from rare, too. Components such as position have no sensible group ID that could be assigned, as a huge portion of all entities will have that component, and it will be read by a great variety of systems together with many varied other components. Either you put position in its own group, giving poor performance everywhere, or you put it together with some other components to get good perf in perhaps one system, and even worse everywhere else (or waste memory, depending on implementation). Meaning that as soon as a system needs to read position together with some other set of components, it will now be making very sparse jumps through the “position group” chunks, likely hitting a cache miss on most per-entity accesses which totally wipes out any performance gains you might have hoped to have gained elsewhere.

(OvermindDL1) #51

Exactly, thus why ‘most’ queries would still be full speed as they would access the same group, it’s only the cross-group ones that would change performance.

You don’t just put them in a group based on what else is accessed with it, but rather with how often it may be added/removed. Something like a Position component is not likely to be added/removed often, and thus it belongs with others that also don’t get added/removed often.

(Thomas Gillen) #52

If you allocate groups by acess, then you can only optimise a component for one system and it will be (very) slow everywhere else as you are now joining across groups and that is substantially slower than an archetype model.

If you allocate groups by change frequency, then everything is slow everywhere. There is now no correleation between data access (which is ultimately what you want to optimise for) and data layout.

In both cases, by not allocating groups by which components entities use together, you either waste massive amounts of memory, or you use sparse arrays with an indirection table per component. Which, again, massively slows down queries (even non-joined). Either way, you also now require per-entity filtering (and either irregular or random acesses depending on whether you keep your entities sorted or not) even in non-join queries.

At the very least, by assigning components to static groups you can no longer assume that all components in a “chunk” are actually resident. That means your queries now need to inspect per-entity and per-component whether that component exists for an entity (and potentially then a per-entity per-component indirection to lookup the component in a sparse array). That is a workload that is quite literally hundreds to potentially thousands of times larger than the data legion needs to inspect during its queries. That is not to mention that this expanded data workload is also much more difficult to order linearly and is likely highly randomised, nor to consider the additional cost when a query needs to join over multiple groups.

The structural modification performance benefits you describe do not come from this group ID, but rather from the fact that this ecs design supports partially resistent components within a group (because it must do to function at all). By assentially allocating a “group ID” on a per-entity basis to match that entity’s layout, legion can assume that all components in a chunk are resident and therefore it needs to perform no filtering within a chunk, and that is a primary source of its query performance. It also allows the library to return data slices to the user, which allows their code to perform highly linear memory accesses.

Legion’s data layout does fragment when you have a very large number of unique entity layouts, as each layout is placed into its own chunk (and thus each component in that layout is in its own slice). However, it is still minimally fragmented. Other designs may not explictly fragment in this manor, but they still impliciltly fragment their accesses when they try and query such data. I have not seen an ecs design that would handle such workloads more efficiently.

(OvermindDL1) #53

Hence why prior I stated that generally they are grouped based on how they are added/removed. In general anything that will always exist once added and is likely never removed, such as say a Position component, will all be in the same group as all the others. This will indeed fulfill ‘most’ of the components in a game and ‘most’ systems as well.

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not per component, but rather per group of those (which for rapidly changing ones may very well indeed by as they currently are); these would in, for example, in a worst case have specs-like speed, which is not at all slow by any stretch.

This is indeed something that happens a lot. One would generally want to keep it out of the hot-paths/systems, but it does always eventually happen unless you have something ‘outside’ managing a lot of that data instead (which then again ends up being an index table or often just a map of some sort).

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Yes, these are similar problems that specs and others also deals with, and there are a multitude of ways to optimize these, most commonly are indexes, which would work very well for legion as well as it would even only need to touch an index if cross-group joins are performed so even that fairly minimal cost would be reduced. Now yes memory will jump around more, but that can be minimized by encoding the chunk alignments into the index as well to allow faster jumps directly to the correct one, which becomes even faster if it is an ordered index.

These costs are only incurred when cross joins are performed or when bookkeeping is done to add/remove components (generally a lookup and bit flip in an index). The standard in-group query remains as fast as legion does now, and in the worst case of many cross-group joins then it just goes “down to” specs speed, which is still not bad.

And this would entirely remain the case when a system/query will iterate over components within the same group, nothing would change here, and if your game fits this pattern then more power to you, but not all can.

And this is a case where the ones I tend to create would hit, there is an exceptionally large number of ‘unique entity layouts’. A quick test on one of my old sandboxes is showing over 800k unique different component combinations over the life of one of the simulation runs (~1 million active entities at the highest, 417 component types, serialized out the state multiple times and read the save header mapping to get the list of component variations, uniqued them and counted the variations).

From a quick guess from what I know about how this code works, I’d say that making about 200 of the components ungrouped and grouping the rest into a single legion-style system would reduce the variations of the grouped part down to ~200 or so, and this engine does use an index for queries, various style of storage patterns depending on how a component is most often accessed and stored, and it hits 60fps even at the worst times, and right now it doesn’t do any grouping (beyond a few manually grouped things, mostly related to physics and AI calculations).

(Thomas Gillen) #54

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not quite. legion essentially optimises by entity layout, but it does so perfectly such that there are no “sparse” storages at all and it never needs to make any per-entity filtering decisions. This is a significant assumption that it can only make because it stores every unique layout in its own chunk. That one decision is also enirely responsible for the relative cost of adding/removing components. You cannot have one without the other.

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

In which case you are back to paying for moving entities into another chunk when their layout changes. Except you have additional boundries between chunks (group IDs) to create additional fragmentation and still have additional overhead for joining multiple chunks for queries that need components stored across multiple chunks due to this partition.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Specs is twice as slow as legion at queries in trivial cases where there is only one entity type with perfect occupancy and filtering logic is simple. That is the best case comparison for specs. Its performance rapidly degrades as component occupany falls and as entities are added/removed, as accesses become increasing fragmented and the hibitset loses its ability to cull segments. No matter how much you try and optimise the work, it will never be as fast as the literal no work that legion does per-entity, and memory accesses will always be less linear.

(OvermindDL1) #55

Indeed, I know, however you can have both. You keep it doing what it does not for grouped components, secondary components others can access via whatever else they want. Even if that is as simple as a hashmap on the entity ID you know it will be done, best to bake it in more efficiently.

A single entity type with perfect occupancy is never what happens in any game. You will have variety.

Yeah I do find the hibitset pattern a bit odd, that style is never what I’ve used in the past. I’ve always used the pattern of storages storing components internally (which can easily be multiple) in however the pattern they wish, and they register to a central index registry to keep liveness and so forth synched via direct bit manipulation in the storages themselves on the average case (or callbacks on slow cases, such as with scripting systems). I then used what you could call a ‘query’ to access the Index storage, which on first access would have to build the index, which could be slow with a lot of entities (I always registered indexes at load time, before entities were created, and just used the indexes after that, though scripting systems would occasionally need to create a new index but even then generally only once and early). All accesses on my aspect handlers then just used the index directly, as fast as an array iteration to get all entity ID’s (as they were stored in the index directly). Accessing components in my old engine involved direct access, such as via someComponentStorage.getPosition(e);. Indexes could be unordered (slightly faster on add/remove changes), or ordered (linear memory access could be quite useful in some cases for speed reasons), and a ‘system’ just iterated an N amount of these indexes and accessed whatever they needed when they needed. The hot-path systems stored the data very efficiently for whatever purpose was best for the component (physics in sparse array’s, machining in dense array’s that didn’t even need to access the indexes at all, etc…) so it was a linear read down memory (how else do I keep a million entities with literal hundreds of components updated).

Essentially with a groups pattern you keep your current chunk style iteration per group (most of the time you don’t need to know entity ID’s at all after all). For accessing across groups you just iterate with ID’s one of the groups (in general the one with the fewest of the matches, which an index can hold the information of) and just cross join it with the other groups. This is the same optimizations a database performs, the query is the ‘planner’ and you execute the plan optimized for the data load.

(Zicklag) #56

I just saw that The Great Refactor got merged into Legion!

Does anybody mind giving an update on where that puts us now? It looks ( to me ) like it was a set of general improvements throughout.

I haven’t had time to start experimenting with porting amethyst_core to Legion yet, but I was wondering if it makes more sense to wait until there was a scheduler/dispatcher setup for Legion.

(Alec Thilenius) #57

Is anyone currently working on moving core to Legion? I need a hierarchical transform component for Legion, so willing to get started on that if no one else is.

1 Like
(Zicklag) #58

@Xercsess I was going to look into porting core to Legion, but I haven’t gotten started yet and I’ve got other stuff that I’m working on right now, so it would be great if you could do that!

(Alec Thilenius) #59

@zicklag I will take a crack at it.

@Ayfid Any recommendations on how to implements a hierarchy with Legion? This seems like a good use case for a Parent(Entity) Tag?

My 30 min of thoughts on how to implement a hierarchical Transform update system (recomputing world matrices) is to build a forest of Transform trees that need to be recomputed. Once built, this would allow for parallel update of the minimal set of Transforms, without locks. Two things can affect a world matrix needing to be re-computed: somewhere in the parent chain a Parent Tag changed, or a Transform itself changes. The steps are then are as follows:

  • Create an empty forest (HashMap) keyed on Entity with children stored as a recursive forest, along with a visited Set.
  • Iterate all Entities with Transform and a modified Parent Tag (is this possible in Legion?):
    • Skip if already visited
    • Add it to the root of forest and mark visited
    • Recursively search the entities children (via a Tag filter)
      • If the child is a root in the forest, then rotate that entire tree to be under this one and stop searching
      • Else continue adding children to this tree and marking them visited
  • Iterate all modified Transform:
    • [Same algorithm as previous iteration]

Then take that forest and par_iter over each node recursively. There should be enough work in each matrix computation to make this worth multithreading, especially with Rayon’s work-stealing.


(Zicklag) #60

Kae had pointed out that this might be useful reference:


(Alec Thilenius) #61

Wow, that’s a lot of stuff :stuck_out_tongue: I believe this would be equivalent to: https://docs.unity3d.com/Packages/com.unity.entities@0.0/manual/transform_system.html#section-5-hierarchical-transforms-advanced though. The local_to_parent matrix would be directly updated by mutations to the Transform, and the system would compute local_to_wold matrix (as-needed) once per frame. The zillion different ways to handle scale I don’t plan on adding for V1.

1 Like
(Alec Thilenius) #62

Draft of a Transform system implemented in Legion: https://github.com/AThilenius/legion_transform

Would very much appreciate comments / issues!

Mechanism to sub-dispatch group of systems
(Zicklag) #63

@Xercsess That’s awesome! Not sure when I will be able to, but I can’t wait to try it out. :smiley:

@dakom This was something you were needing in Legion, too.