Legion ECS Discussion

(Thomas Gillen) #48

If a state change is going to persist for more than a couple of frames, then it is worthwhile to move the entity into a new archetype. If you have transient events that occur within a frame, then you are best served by creating a new entity to represent each event, having a system batch-process all events of that type, and then deleting those entities. If a particular component has high occupancy, then wrapping it in Option is an entirely feasible alternative.
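The event-as-entity pattern in that middle case can be sketched in plain Rust; the `DamageEvent` type and `apply_damage` function are hypothetical names for illustration, not legion API:

```rust
use std::collections::HashMap;

// A hypothetical transient event: one value per occurrence, carrying the
// id of the entity it applies to.
struct DamageEvent {
    target: u32,
    amount: i32,
}

// One system batch-processes every event of this type in a single pass.
fn apply_damage(events: Vec<DamageEvent>, health: &mut HashMap<u32, i32>) {
    for ev in events {
        if let Some(hp) = health.get_mut(&ev.target) {
            *hp -= ev.amount;
        }
    }
    // `events` is consumed and dropped here, mirroring deletion of the
    // transient event entities at the end of the frame.
}
```

The point of the pattern is that the long-lived entity's layout never changes; only cheap, short-lived event values are created and destroyed.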

Having literally hundreds of components on an entity and performing thousands of component addition/removals per frame is absolutely not common or typical in an ECS based game. A typical game’s frame time is dominated by work that takes place inside loops over entities, and only a small percentage of total entities are moved into a new state significant enough to justify a composition change each frame, even if those composition changes were relatively cheap.

If you have a large array of entity data, with a small subset of those entities containing a component you are interested in, there simply is no way to find those entities that is going to be significantly faster than a linear search. At least not without doing some pre-processing work (like sorting the data or building and maintaining an index) - pre-processing work that is not unlike the cost of sorting the entities into archetypes. If fast filtering and iteration performance is important, then pay the cost of that sorting operation. If it is not, then it is difficult to beat the performance of using Option if you can’t pull the event out into a new temp entity.
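For illustration, the Option-based linear scan looks like this; the storage layout here is a generic sketch, not legion's actual internals:

```rust
// Finding the sparse subset via a linear scan over Option<T>: without
// sorting or an index, every slot must be inspected once.
fn with_component<T>(storage: &[Option<T>]) -> Vec<(usize, &T)> {
    storage
        .iter()
        .enumerate()
        .filter_map(|(i, slot)| slot.as_ref().map(|c| (i, c)))
        .collect()
}
```

The scan is O(n) in the total entity count, but the access pattern is perfectly linear, which is why it is hard to beat without pre-processing.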

In practice, even in the case where you have designed your game around entities with hundreds of potential components, the vast majority of those components will not change within a single frame, and each entity likely only has a small number of them at any one time. The majority of these components could reasonably be dynamically added and removed. Many of them can be better represented as transient events in their own entities. The set that you end up keeping permanently on the entity as options need not be huge.

If legion were to support first-class optional components, by essentially implementing the specs storage and filtering architecture alongside its current storage system, that would display performance characteristics very similar to Option<Component> (it would be able to check 64 entities at a time with bitfield logical ANDs, but is still otherwise ultimately a linear search), with the only major difference being possible memory savings from being able to use a sparse storage implementation. The cost of that would be an incredible increase in internal complexity in legion and a likely performance overhead throughout, due to having to frequently check whether these additional storages have been attached to a chunk.
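The "64 entities at a time" check mentioned above can be sketched as follows; this is a generic illustration of bitset-based filtering in the style specs uses, not code from either library:

```rust
// Each u64 word covers 64 entities; a bit is set if the entity has the
// component. One logical AND tests 64 entities at once, but the scan over
// the words is still ultimately linear.
fn matching_indices(mask_a: &[u64], mask_b: &[u64]) -> Vec<usize> {
    let mut out = Vec::new();
    for (word_idx, (&a, &b)) in mask_a.iter().zip(mask_b).enumerate() {
        let mut bits = a & b; // 64 entities tested with a single AND
        while bits != 0 {
            let bit = bits.trailing_zeros() as usize;
            out.push(word_idx * 64 + bit);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    out
}
```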

(OvermindDL1) #49

It has been in the ones I’ve worked on in the past. Especially in modding contexts it can get wildly extreme.

Yeah this is what my engine did that I described above, that was the memory trade-off.

Then the state change would affect the wrong entity, or you start jumping around in memory by storing the ‘parent’ entity within this and accessing it out of order for each iterable.

That is why it should not do what specs is, but rather implement ECS Groups.

Essentially this (using what I’ve seen of Legion’s terminology):

Given these components (keeping it simple for the purpose of this example: a name is a type name and a group ID is an integer):

Component Name    Group ID
A0                0
B0                0
C0                0
D1                1
E1                1
F1                1
G1                1
H2                2
I3                3
J4                4
  • Instead of a single ‘Chunk’ system, you have multiple Chunk systems, the number of which is defined by the components; in the above example there would be 5 Chunk storage systems.
  • Any time components of the same Group ID are stored, legion works as it does now: they get stored together in the chunk, and adding/removing one of those components involves copying the whole entity’s chunk worth of data to the new area.
  • Any time components of different Group IDs are stored, legion will put them in separate chunk storage systems.
  • For a query over components that are all of the same Group ID, legion would work as it does now.
  • Creating a query for components that are not all of the same Group ID would essentially do two queries and then join those to get the final tuple of values.
  • Chunk iteration seems like it would stay as it is; the ‘other’ chunk storages of differing Group IDs would just be returned in that same chunk iterator.
  • The internals of course would change, but the user API would stay relatively identical.

Now, how the group ID is specified could come in a few different ways; the simplest might just be a trait with a single function on the type, which could become part of a derive macro if so wished. If it doesn’t exist, it could fall back to, say, group 0; thus by default everything gets batched/chunked together, and therefore separating things out would need to be done explicitly.
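A minimal sketch of such a trait, with all names hypothetical rather than part of legion's actual API:

```rust
// Hypothetical trait: each component type reports which storage group it
// belongs to. The default of group 0 means everything chunks together
// unless a type opts out explicitly.
trait GroupId {
    fn group_id() -> u32 {
        0
    }
}

struct Position; // stable component: stays in the default group
struct Burning;  // frequently added/removed: gets its own group

impl GroupId for Position {}

impl GroupId for Burning {
    fn group_id() -> u32 {
        1
    }
}
```

A derive macro could generate the non-default impls from an attribute, keeping the opt-out explicit at the definition site.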

No need to special-case optional components or anything of the sort. The complexity increase would come from having an entity exist across chunks, which essentially would work as it does now but with joined queries, the number of queries equaling the number of unique groups.

A completely separate thing, but my old engine also allowed scripting systems to state how much memory a component took, and the C++ side managed all of that. The scripting-system layer was able to use that memory however it saw fit (for LuaJIT I had a simple offset access system for a limited set of types, for example; I tried to keep everything memcpy-able for serialization speed reasons, but if necessary other data could be stored, at the cost of having to write serialization functionality for it). This would make it much easier and more efficient for modders' scripting systems to handle tightly packed components (rarer ones can of course be done entirely in-script via entity ID, provided the scripting system receives live/death events for any entity it registered on, which in my engine was just another component saying what wanted to listen to events from a given entity). This would be a useful feature for legion to support as well.

(Thomas Gillen) #50

That is essentially the same as specs, where the user has used fat component structs that group together data that they think will be frequently used together. It has the same pros and cons as specs does in comparison to legion.

As soon as you have a query that pulls from two different groups, you lose all of the performance benefits of archetypes, because you are now skipping through multiple slices rather than iterating through them linearly. You also now need to do per-entity filtering, which is substantially slower than per archetype and chunk filtering as you can have a few hundred entities in each chunk.

That case is far from rare, too. Components such as position have no sensible group ID that could be assigned, as a huge portion of all entities will have that component, and it will be read by a great variety of systems together with many varied other components. Either you put position in its own group, giving poor performance everywhere, or you put it together with some other components to get good perf in perhaps one system, and even worse everywhere else (or waste memory, depending on implementation). Meaning that as soon as a system needs to read position together with some other set of components, it will now be making very sparse jumps through the “position group” chunks, likely hitting a cache miss on most per-entity accesses which totally wipes out any performance gains you might have hoped to have gained elsewhere.

(OvermindDL1) #51

Exactly, thus why ‘most’ queries would still be full speed as they would access the same group, it’s only the cross-group ones that would change performance.

You don’t just put them in a group based on what else is accessed with it, but rather with how often it may be added/removed. Something like a Position component is not likely to be added/removed often, and thus it belongs with others that also don’t get added/removed often.

(Thomas Gillen) #52

If you allocate groups by access, then you can only optimise a component for one system, and it will be (very) slow everywhere else, as you are now joining across groups and that is substantially slower than an archetype model.

If you allocate groups by change frequency, then everything is slow everywhere. There is now no correlation between data access (which is ultimately what you want to optimise for) and data layout.

In both cases, by not allocating groups by which components entities use together, you either waste massive amounts of memory, or you use sparse arrays with an indirection table per component. Which, again, massively slows down queries (even non-joined ones). Either way, you also now require per-entity filtering (and either irregular or random accesses, depending on whether you keep your entities sorted or not), even in non-join queries.

At the very least, by assigning components to static groups you can no longer assume that all components in a “chunk” are actually resident. That means your queries now need to inspect per-entity and per-component whether that component exists for an entity (and potentially then a per-entity per-component indirection to lookup the component in a sparse array). That is a workload that is quite literally hundreds to potentially thousands of times larger than the data legion needs to inspect during its queries. That is not to mention that this expanded data workload is also much more difficult to order linearly and is likely highly randomised, nor to consider the additional cost when a query needs to join over multiple groups.
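For reference, a sparse array with a per-component indirection table of the kind described might look like this; it is a generic sketch, not any particular library's implementation:

```rust
// Dense component data plus a sparse entity -> dense-slot map. Every access
// pays one extra indirection, and a per-entity existence check is built in.
struct SparseStorage<T> {
    sparse: Vec<Option<usize>>, // indexed by entity id
    dense: Vec<T>,              // tightly packed component values
}

impl<T> SparseStorage<T> {
    fn get(&self, entity: usize) -> Option<&T> {
        self.sparse
            .get(entity)
            .copied()
            .flatten()
            .map(|slot| &self.dense[slot])
    }
}
```

The `Option` check and the `slot` lookup are exactly the per-entity, per-component work that an all-resident chunk never has to do.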

The structural modification performance benefits you describe do not come from this group ID, but rather from the fact that this ECS design supports partially resident components within a group (because it must in order to function at all). By essentially allocating a “group ID” on a per-entity basis to match that entity’s layout, legion can assume that all components in a chunk are resident and therefore needs to perform no filtering within a chunk, and that is a primary source of its query performance. It also allows the library to return data slices to the user, which allows their code to perform highly linear memory accesses.

Legion’s data layout does fragment when you have a very large number of unique entity layouts, as each layout is placed into its own chunk (and thus each component in that layout is in its own slice). However, it is still minimally fragmented. Other designs may not explicitly fragment in this manner, but they still implicitly fragment their accesses when they try to query such data. I have not seen an ECS design that would handle such workloads more efficiently.

(OvermindDL1) #53

Hence why prior I stated that generally they are grouped based on how they are added/removed. In general anything that will always exist once added and is likely never removed, such as say a Position component, will all be in the same group as all the others. This will indeed fulfill ‘most’ of the components in a game and ‘most’ systems as well.

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not per component, but rather per group of components (which for rapidly changing ones may very well be as they currently are); these would, for example, in the worst case have specs-like speed, which is not at all slow by any stretch.

This is indeed something that happens a lot. One would generally want to keep it out of the hot-paths/systems, but it does always eventually happen unless you have something ‘outside’ managing a lot of that data instead (which then again ends up being an index table or often just a map of some sort).

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Yes, these are similar problems that specs and others also deal with, and there are a multitude of ways to optimize them, most commonly indexes, which would work very well for legion too; it would only need to touch an index when cross-group joins are performed, so even that fairly minimal cost would be reduced. Now, yes, memory will jump around more, but that can be minimized by encoding the chunk alignments into the index as well to allow faster jumps directly to the correct one, which becomes even faster if it is an ordered index.

These costs are only incurred when cross joins are performed or when bookkeeping is done to add/remove components (generally a lookup and bit flip in an index). The standard in-group query remains as fast as legion does now, and in the worst case of many cross-group joins then it just goes “down to” specs speed, which is still not bad.

And this would entirely remain the case when a system/query will iterate over components within the same group, nothing would change here, and if your game fits this pattern then more power to you, but not all can.

And this is a case the ones I tend to create would hit; there is an exceptionally large number of ‘unique entity layouts’. A quick test on one of my old sandboxes shows over 800k unique component combinations over the life of one of the simulation runs (~1 million active entities at the peak, 417 component types; I serialized out the state multiple times, read the save header mapping to get the list of component variations, then deduplicated and counted them).

From a quick guess, based on what I know about how this code works, I’d say that making about 200 of the components ungrouped and grouping the rest into a single legion-style system would reduce the variations of the grouped part down to ~200 or so. This engine does use an index for queries and various styles of storage patterns depending on how a component is most often accessed and stored, and it hits 60fps even at the worst times, and right now it doesn’t do any grouping (beyond a few manually grouped things, mostly related to physics and AI calculations).

(Thomas Gillen) #54

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not quite. legion essentially optimises by entity layout, but it does so perfectly, such that there are no “sparse” storages at all and it never needs to make any per-entity filtering decisions. This is a significant assumption that it can only make because it stores every unique layout in its own chunk. That one decision is also entirely responsible for the relative cost of adding/removing components. You cannot have one without the other.

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

In which case you are back to paying for moving entities into another chunk when their layout changes. Except you now have additional boundaries between chunks (group IDs) to create additional fragmentation, and you still have additional overhead for joining multiple chunks for queries that need components stored across multiple chunks due to this partition.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Specs is twice as slow as legion at queries in trivial cases where there is only one entity type with perfect occupancy and filtering logic is simple. That is the best case comparison for specs. Its performance rapidly degrades as component occupancy falls and as entities are added/removed, as accesses become increasingly fragmented and the hibitset loses its ability to cull segments. No matter how much you try to optimise the work, it will never be as fast as the literal no work that legion does per entity, and memory accesses will always be less linear.

(OvermindDL1) #55

Indeed, I know; however, you can have both. You keep it doing what it does now for grouped components, with secondary components that others can access via whatever else they want. Even if that is as simple as a hashmap on the entity ID, you know it will be done anyway, so best to bake it in more efficiently.

A single entity type with perfect occupancy is never what happens in any game. You will have variety.

Yeah, I do find the hibitset pattern a bit odd; that style is never what I’ve used in the past. I’ve always used the pattern of storages storing components internally (which can easily be multiple) in whatever pattern they wish, registering with a central index registry to keep liveness and so forth synced via direct bit manipulation in the storages themselves in the average case (or callbacks in slow cases, such as with scripting systems). I then used what you could call a ‘query’ to access the index storage, which on first access would have to build the index, which could be slow with a lot of entities (I always registered indexes at load time, before entities were created, and just used the indexes after that, though scripting systems would occasionally need to create a new index, but even then generally only once and early). All accesses in my aspect handlers then just used the index directly, as fast as an array iteration, to get all entity IDs (as they were stored in the index directly). Accessing components in my old engine involved direct access, such as via someComponentStorage.getPosition(e);. Indexes could be unordered (slightly faster on add/remove changes) or ordered (linear memory access could be quite useful in some cases for speed reasons), and a ‘system’ just iterated over N of these indexes and accessed whatever it needed when it needed it. The hot-path systems stored the data very efficiently for whatever purpose was best for the component (physics in sparse arrays, machining in dense arrays that didn’t even need to access the indexes at all, etc.), so it was a linear read down memory (how else do I keep a million entities with literally hundreds of components updated?).

Essentially, with a groups pattern you keep your current chunk-style iteration per group (most of the time you don’t need to know entity IDs at all, after all). For accessing across groups you just iterate one of the groups by ID (in general the one with the fewest matches, which an index can tell you) and cross-join it with the other groups. This is the same optimization a database performs: the query is the ‘planner’, and you execute the plan optimized for the data load.
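The cross-group join described here can be sketched as a hash join, iterating the smaller group and probing the other; the names and storage shapes are illustrative, not a real legion API:

```rust
use std::collections::HashMap;

// Iterate the smaller group's dense (entity id, component) pairs and probe
// the larger group by entity id, keeping only entities present in both --
// the same strategy a database planner picks for this shape of data.
fn cross_group_join<'a, A, B>(
    small: &'a [(u32, A)],
    large: &'a HashMap<u32, B>,
) -> Vec<(u32, &'a A, &'a B)> {
    small
        .iter()
        .filter_map(|(id, a)| large.get(id).map(|b| (*id, a, b)))
        .collect()
}
```

Driving the join from the group with the fewest matches keeps the probe count, and therefore the scattered accesses, as small as possible.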

(Zicklag) #56

I just saw that The Great Refactor got merged into Legion!

Does anybody mind giving an update on where that puts us now? It looks (to me) like it was a set of general improvements throughout.

I haven’t had time to start experimenting with porting amethyst_core to Legion yet, but I was wondering if it makes more sense to wait until there is a scheduler/dispatcher set up for Legion.

(Alec Thilenius) #57

Is anyone currently working on moving core to Legion? I need a hierarchical transform component for Legion, so willing to get started on that if no one else is.

(Zicklag) #58

@Xercsess I was going to look into porting core to Legion, but I haven’t gotten started yet and I’ve got other stuff that I’m working on right now, so it would be great if you could do that!

(Alec Thilenius) #59

@zicklag I will take a crack at it.

@Ayfid Any recommendations on how to implement a hierarchy with Legion? This seems like a good use case for a Parent(Entity) Tag?

My 30 min of thoughts on how to implement a hierarchical Transform update system (recomputing world matrices) is to build a forest of Transform trees that need to be recomputed. Once built, this would allow for parallel update of the minimal set of Transforms, without locks. Two things can cause a world matrix to need recomputing: somewhere in the parent chain a Parent Tag changed, or the Transform itself changed. The steps are then as follows:

  • Create an empty forest (HashMap) keyed on Entity with children stored as a recursive forest, along with a visited Set.
  • Iterate all Entities with Transform and a modified Parent Tag (is this possible in Legion?):
    • Skip if already visited
    • Add it to the root of forest and mark visited
    • Recursively search the entity’s children (via a Tag filter)
      • If the child is a root in the forest, then rotate that entire tree to be under this one and stop searching
      • Else continue adding children to this tree and marking them visited
  • Iterate all modified Transforms:
    • [Same algorithm as previous iteration]

Then take that forest and par_iter over each node recursively. There should be enough work in each matrix computation to make this worth multithreading, especially with Rayon’s work-stealing.
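Once the forest is built, the recursive world-transform computation could be sketched like this, using a 2D translation in place of a full local_to_world matrix; all names here are hypothetical:

```rust
use std::collections::HashMap;

// `parent` maps entity id -> parent id; `local` holds each entity's local
// offset (standing in for a local_to_parent matrix). World transforms are
// computed by composing up the parent chain, memoized so each entity's
// world value is computed exactly once.
fn world_positions(
    parent: &HashMap<u32, u32>,
    local: &HashMap<u32, (f32, f32)>,
) -> HashMap<u32, (f32, f32)> {
    fn world(
        e: u32,
        parent: &HashMap<u32, u32>,
        local: &HashMap<u32, (f32, f32)>,
        cache: &mut HashMap<u32, (f32, f32)>,
    ) -> (f32, f32) {
        if let Some(&w) = cache.get(&e) {
            return w; // already computed for this frame
        }
        let l = local[&e];
        let w = match parent.get(&e) {
            // compose local offset with the parent's world transform
            Some(&p) => {
                let pw = world(p, parent, local, cache);
                (pw.0 + l.0, pw.1 + l.1)
            }
            // roots: world == local
            None => l,
        };
        cache.insert(e, w);
        w
    }
    let mut cache = HashMap::new();
    for &e in local.keys() {
        world(e, parent, local, &mut cache);
    }
    cache
}
```

In the parallel version, each disjoint tree in the forest can be handed to a separate Rayon task, since no two trees share a parent chain.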


(Zicklag) #60

Kae had pointed out that this might be useful reference:


(Alec Thilenius) #61

Wow, that’s a lot of stuff :stuck_out_tongue: I believe this would be equivalent to: https://docs.unity3d.com/Packages/com.unity.entities@0.0/manual/transform_system.html#section-5-hierarchical-transforms-advanced though. The local_to_parent matrix would be directly updated by mutations to the Transform, and the system would compute the local_to_world matrix (as needed) once per frame. The zillion different ways to handle scale I don’t plan on adding for V1.

(Alec Thilenius) #62

Draft of a Transform system implemented in Legion: https://github.com/AThilenius/legion_transform

Would very much appreciate comments / issues!

(Zicklag) #63

@Xercsess That’s awesome! Not sure when I will be able to, but I can’t wait to try it out. :smiley:

@dakom This was something you were needing in Legion, too.

(BenzzZX) #64

The DepthTag can cause fragmentation.
Unity implemented its transform system with it but then refactored their code to get rid of it.

(Alec Thilenius) #65

@BenzzzX Noted, ty! :slight_smile: That’s an implementation detail, so for now I’m going to leave it in as it’s already part of the prototype; it’s an easy thing to change later. I’ll take a peek at the Unity ECS source some time to see if I can figure out what they replaced it with. I think parallelism gets a bit harder without the tags (excluding specialized Legion support), but fragmentation would for sure be an issue, which might negate the parallelism support in the end.

(Jaynus) #66

@zicklag and all others interested.

I have the working systems and resources implementation of legion in my branch:

This branch is being updated as I need it, but updates have slowed down so I consider it close to API-complete.

This is being used to integrate into my …drum roll…port of amethyst to legion.

A few caveats:

  • This implements a shim syncing system between specs and legion to allow for easier porting. This moves all resources back and forth between specs and legion on the legion execution boundary. Additionally, all entities are cloned between the worlds.
    • This means there is some overhead for now, but at least I can much more quickly and fluidly work on porting
  • I am in the process of porting systems. You can see this in amethyst_rendy at the moment.

You can see my current active work at porting amethyst_rendy here: https://github.com/jaynus/amethyst/tree/legion/amethyst_rendy/src/legion

Obviously, help at this point would be great, as it’s a slow slog through systems.


(Zicklag) #67

Whaat!! That’s awesome! Good job @jaynus and everybody else who helped! That is very exciting to see, and good idea about the interim syncing solution. :tada:

I’m not sure if I will be able to, but if I do get the time I will definitely look into helping out.