Legion ECS Discussion

(Thomas Gillen) #52

If you allocate groups by access, then you can only optimise a component for one system, and it will be (very) slow everywhere else, as you are now joining across groups, which is substantially slower than an archetype model.

If you allocate groups by change frequency, then everything is slow everywhere. There is now no correlation between data access (which is ultimately what you want to optimise for) and data layout.

In both cases, by not allocating groups by which components entities use together, you either waste massive amounts of memory or you use sparse arrays with an indirection table per component, which, again, massively slows down queries (even non-joined ones). Either way, you also now require per-entity filtering (and either irregular or random accesses, depending on whether you keep your entities sorted) even in non-join queries.

At the very least, by assigning components to static groups you can no longer assume that all components in a “chunk” are actually resident. That means your queries now need to inspect, per entity and per component, whether that component exists for the entity (and potentially then follow a per-entity, per-component indirection to look up the component in a sparse array). That is a workload quite literally hundreds to potentially thousands of times larger than the data legion needs to inspect during its queries. This expanded workload is also much more difficult to order linearly and is likely highly randomised, and that is before considering the additional cost when a query needs to join over multiple groups.

The structural modification performance benefits you describe do not come from this group ID, but rather from the fact that this ECS design supports partially resident components within a group (because it must in order to function at all). By essentially allocating a “group ID” on a per-entity basis to match that entity’s layout, legion can assume that all components in a chunk are resident and therefore needs to perform no filtering within a chunk, and that is a primary source of its query performance. It also allows the library to return data slices to the user, which allows their code to perform highly linear memory accesses.
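To illustrate the point about fully resident chunks and data slices, here is a minimal sketch (illustrative types only, not Legion's actual API): because every entity in a chunk is known to have every component in the chunk's layout, a query is a branch-free zip over parallel slices, with no per-entity existence checks or indirections.

```rust
// Hypothetical chunk layout: one dense slice per component type,
// all the same length, one element per entity in the chunk.
struct Chunk {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
}

impl Chunk {
    // Every entity in the chunk is known to have both components,
    // so the "query" is a linear walk over two parallel slices
    // with no per-entity filtering whatsoever.
    fn integrate(&mut self, dt: f32) {
        for (p, v) in self.positions.iter_mut().zip(&self.velocities) {
            for i in 0..3 {
                p[i] += v[i] * dt;
            }
        }
    }
}
```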

Legion’s data layout does fragment when you have a very large number of unique entity layouts, as each layout is placed into its own chunk (and thus each component in that layout is in its own slice). However, it is still minimally fragmented. Other designs may not explicitly fragment in this manner, but they still implicitly fragment their accesses when they try to query such data. I have not seen an ECS design that would handle such workloads more efficiently.

(OvermindDL1) #53

Hence why I stated earlier that generally they are grouped based on how they are added/removed. In general, anything that will always exist once added and is likely never removed, such as a Position component, will be in the same group as all the others. This will indeed cover ‘most’ of the components in a game, and ‘most’ systems as well.

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not per component, but rather per group of those (which for rapidly changing ones may very well indeed be as they currently are); these would, for example, in a worst case have specs-like speed, which is not at all slow by any stretch.

This is indeed something that happens a lot. One would generally want to keep it out of the hot-paths/systems, but it does always eventually happen unless you have something ‘outside’ managing a lot of that data instead (which then again ends up being an index table or often just a map of some sort).

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Yes, these are similar problems that specs and others also deal with, and there are a multitude of ways to optimize them, most commonly indexes, which would work very well for legion: it would only need to touch an index when cross-group joins are performed, so even that fairly minimal cost would be reduced. Now yes, memory will jump around more, but that can be minimized by encoding the chunk alignments into the index as well, allowing faster jumps directly to the correct chunk, which becomes even faster if it is an ordered index.
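As a rough sketch of the ordered-index idea (hypothetical code, not any existing library's API): if each group keeps a sorted list of the entity IDs it contains, a cross-group join reduces to a linear merge-intersection of the two sorted indexes, with no per-entity hashing or random probing.

```rust
use std::cmp::Ordering;

// Merge-intersection of two sorted entity-ID indexes.
// Both inputs must be sorted ascending; the output is the set of
// entities present in both groups, produced in a single linear pass.
fn join_sorted(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```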

These costs are only incurred when cross joins are performed or when bookkeeping is done to add/remove components (generally a lookup and bit flip in an index). The standard in-group query remains as fast as legion does now, and in the worst case of many cross-group joins then it just goes “down to” specs speed, which is still not bad.

And this would entirely remain the case when a system/query will iterate over components within the same group, nothing would change here, and if your game fits this pattern then more power to you, but not all can.

And this is a case the ones I tend to create would hit: there is an exceptionally large number of ‘unique entity layouts’. A quick test on one of my old sandboxes shows over 800k unique component combinations over the life of one of the simulation runs (~1 million active entities at the highest, 417 component types; I serialized out the state multiple times, read the save header mapping to get the list of component variations, uniqued them, and counted the variations).

From a quick guess based on what I know about how this code works, I’d say that making about 200 of the components ungrouped and grouping the rest into a single legion-style system would reduce the variations of the grouped part down to ~200 or so. This engine does use an index for queries, and various styles of storage patterns depending on how a component is most often accessed and stored, and it hits 60fps even at the worst times; right now it doesn’t do any grouping (beyond a few manually grouped things, mostly related to physics and AI calculations).

(Thomas Gillen) #54

This follows how legion already is working. Not every system will be accessing the ‘sparse’ components, these tend to be comparatively few.

Not quite. Legion essentially optimises by entity layout, but it does so perfectly, such that there are no “sparse” storages at all and it never needs to make any per-entity filtering decisions. This is a significant assumption that it can only make because it stores every unique layout in its own chunk. That one decision is also entirely responsible for the relative cost of adding/removing components. You cannot have one without the other.

All components assigned to that group in a given chunk would be resident for a given entity, otherwise it would be in a different chunk, this is how legion works now.

In which case you are back to paying for moving entities into another chunk when their layout changes. Except you now have additional boundaries between chunks (group IDs) creating additional fragmentation, and you still have additional overhead for joining multiple chunks in queries that need components stored across multiple chunks due to this partition.

Indeed, like specs has to do now, with plenty of optimizations that can be done as well.

Specs is twice as slow as legion at queries even in trivial cases where there is only one entity type with perfect occupancy and the filtering logic is simple. That is the best-case comparison for specs. Its performance rapidly degrades as component occupancy falls and as entities are added/removed, as accesses become increasingly fragmented and the hibitset loses its ability to cull segments. No matter how much you try to optimise that work, it will never be as fast as the literal no work that legion does per entity, and memory accesses will always be less linear.

(OvermindDL1) #55

Indeed, I know, but you can have both. You keep it doing what it does now for grouped components, with secondary components accessed via whatever else you want. Even if that is as simple as a hashmap on the entity ID, you know it will be done anyway, so it is best to bake it in more efficiently.

A single entity type with perfect occupancy is never what happens in any game. You will have variety.

Yeah, I do find the hibitset pattern a bit odd; that style is never what I’ve used in the past. I’ve always used the pattern of storages storing components internally (which can easily be multiple) in whatever pattern they wish, registering with a central index registry to keep liveness and so forth synced via direct bit manipulation in the storages themselves in the average case (or callbacks in slow cases, such as with scripting systems). I then used what you could call a ‘query’ to access the index storage, which on first access would have to build the index; this could be slow with a lot of entities (I always registered indexes at load time, before entities were created, and just used the indexes after that, though scripting systems would occasionally need to create a new index, and even then generally only once and early). All accesses in my aspect handlers then just used the index directly, as fast as an array iteration, to get all entity IDs (as they were stored in the index directly). Accessing components in my old engine involved direct access, such as via someComponentStorage.getPosition(e);. Indexes could be unordered (slightly faster on add/remove changes) or ordered (linear memory access could be quite useful in some cases for speed reasons), and a ‘system’ just iterated over N of these indexes and accessed whatever it needed when it needed it. The hot-path systems stored the data very efficiently for whatever purpose was best for the component (physics in sparse arrays, machining in dense arrays that didn’t even need to access the indexes at all, etc.), so it was a linear read down memory (how else do I keep a million entities with literal hundreds of components updated?).

Essentially, with a groups pattern you keep your current chunk-style iteration per group (most of the time you don’t need to know entity IDs at all, after all). For accessing across groups, you iterate one of the groups by entity ID (in general the one with the fewest matches, which an index can tell you) and just cross-join it with the other groups. These are the same optimizations a database performs: the query is the ‘planner’, and you execute the plan optimized for the data load.
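The "drive from the smallest side" plan described above could be sketched like this (illustrative code, not a real query planner): iterate the group with the fewest matching entities and probe the larger one, the same choice a database join planner makes.

```rust
use std::collections::HashSet;

// Pick the smaller group as the outer (driving) side of the join and
// probe the larger one with a set lookup; the result is every entity
// present in both groups, in the driving side's order.
fn planned_join(group_a: &[u32], group_b: &[u32]) -> Vec<u32> {
    let (outer, inner) = if group_a.len() <= group_b.len() {
        (group_a, group_b)
    } else {
        (group_b, group_a)
    };
    let probe: HashSet<u32> = inner.iter().copied().collect();
    outer.iter().copied().filter(|id| probe.contains(id)).collect()
}
```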

(Zicklag) #56

I just saw that The Great Refactor got merged into Legion!

Does anybody mind giving an update on where that puts us now? It looks ( to me ) like it was a set of general improvements throughout.

I haven’t had time to start experimenting with porting amethyst_core to Legion yet, but I was wondering if it makes more sense to wait until there was a scheduler/dispatcher setup for Legion.

(Alec Thilenius) #57

Is anyone currently working on moving core to Legion? I need a hierarchical transform component for Legion, so I'm willing to get started on that if no one else is.

(Zicklag) #58

@Xercsess I was going to look into porting core to Legion, but I haven’t gotten started yet and I’ve got other stuff that I’m working on right now, so it would be great if you could do that!

(Alec Thilenius) #59

@zicklag I will take a crack at it.

@Ayfid Any recommendations on how to implement a hierarchy with Legion? This seems like a good use case for a Parent(Entity) Tag?

My 30 min of thoughts on how to implement a hierarchical Transform update system (recomputing world matrices) is to build a forest of Transform trees that need to be recomputed. Once built, this would allow for parallel update of the minimal set of Transforms, without locks. Two things can cause a world matrix to need re-computing: somewhere in the parent chain a Parent Tag changed, or the Transform itself changed. The steps are then as follows:

  • Create an empty forest (HashMap) keyed on Entity with children stored as a recursive forest, along with a visited Set.
  • Iterate all Entities with Transform and a modified Parent Tag (is this possible in Legion?):
    • Skip if already visited
    • Add it to the root of forest and mark visited
    • Recursively search the entity’s children (via a Tag filter)
      • If the child is a root in the forest, then rotate that entire tree to be under this one and stop searching
      • Else continue adding children to this tree and marking them visited
  • Iterate all modified Transform:
    • [Same algorithm as previous iteration]

Then take that forest and par_iter over each node recursively. There should be enough work in each matrix computation to make this worth multithreading, especially with Rayon’s work-stealing.
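A simplified sketch of part of the algorithm above (the names are illustrative, not Legion API, and it only computes the set of stale transforms rather than building the full forest with tree rotation): an entity's world matrix needs recomputing if it, or any ancestor in its parent chain, is marked dirty.

```rust
use std::collections::{HashMap, HashSet};

// Given each entity's parent (roots are absent from the map) and the
// set of entities whose Transform or Parent changed this frame, return
// every entity whose world matrix is stale: the dirty entities plus
// all of their descendants. Assumes the hierarchy is acyclic.
fn needs_recompute(
    parents: &HashMap<u32, u32>,
    dirty: &HashSet<u32>,
    all_entities: &[u32],
) -> HashSet<u32> {
    let mut stale = HashSet::new();
    for &e in all_entities {
        // Walk the parent chain; if any ancestor (or e itself) is
        // dirty, e's world matrix must be recomputed.
        let mut cur = e;
        loop {
            if dirty.contains(&cur) {
                stale.insert(e);
                break;
            }
            match parents.get(&cur) {
                Some(&p) => cur = p,
                None => break,
            }
        }
    }
    stale
}
```

The forest construction in the post goes further: it groups these stale entities into disjoint trees so each tree can be handed to a Rayon task and updated top-down without locks.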


(Zicklag) #60

Kae had pointed out that this might be a useful reference:


(Alec Thilenius) #61

Wow, that’s a lot of stuff :stuck_out_tongue: I believe this would be equivalent to: https://docs.unity3d.com/Packages/com.unity.entities@0.0/manual/transform_system.html#section-5-hierarchical-transforms-advanced though. The local_to_parent matrix would be directly updated by mutations to the Transform, and the system would compute the local_to_world matrix (as needed) once per frame. The zillion different ways to handle scale I don’t plan on adding for V1.

(Alec Thilenius) #62

Draft of a Transform system implemented in Legion: https://github.com/AThilenius/legion_transform

Would very much appreciate comments / issues!

(Zicklag) #63

@Xercsess That’s awesome! Not sure when I will be able to, but I can’t wait to try it out. :smiley:

@dakom This was something you were needing in Legion, too.

(BenzzZX) #64

The DepthTag can cause fragmentation.
Unity implemented their transform system with it but then refactored their code to get rid of it.

(Alec Thilenius) #65

@BenzzzX Noted, ty! :slight_smile: That’s an implementation detail, so for now I’m going to leave it in as it’s already part of the prototype; it’s an easy thing to change later. I’ll take a peek at the Unity ECS source some time to see if I can figure out what they replaced it with. I think parallelism gets a bit harder without the tags (excluding specialized Legion support), but fragmentation would for sure be an issue, which might negate the parallelism benefit in the end.

(Jaynus) #66

@zicklag and all others interested.

I have the working systems and resources implementation of legion in my branch:

This branch is being updated as I need it, but updates have slowed down so I consider it close to API-complete.

This is being used to integrate into my …drum roll…port of amethyst to legion.

A few caveats:

  • This implements a shim syncing system between specs and legion to allow for easier porting. It moves all resources back and forth between specs and legion at the legion execution boundary. Additionally, all entities are cloned between the worlds.
    • This means there is some overhead for now, but at least I can much more quickly and fluidly work on porting
  • I am in the process of porting systems. You can see this in amethyst_rendy at the moment.

You can see my current active work at porting amethyst_rendy here: https://github.com/jaynus/amethyst/tree/legion/amethyst_rendy/src/legion

Obviously, help at this point would be great, as it’s a slow slog through the systems.


(Zicklag) #67

Whaat!! That’s awesome! Good job @jaynus and everybody else who helped! That is very exciting to see, and good idea about the interim syncing solution. :tada:

I’m not sure if I will be able to, but if I do get the time I will definitely look into helping out.

(Alec Thilenius) #68

Maybe not the right spot for this but…

Unity is pulling some seriously impressive performance out of their new ECS. This is done with a chunked archetypal ECS, just like Legion. Side-by-side with the ECS is the ‘Burst compiler’. Burst is essentially an LLVM front-end for a subset of .NET IL, but where I think it has great potential is its ability to force vectorization on loops. Combined with a chunked ECS, you can get very significant speedups on SIMD hardware (which is all hardware these days).

Now Unity has it hard because they have to interop with their old, messy ‘Behavior’ layout, and they are stuck with C#, which is a great language, but you can absolutely alias memory in it, so they have to restrict things to prevent that. Rust solves most (all?) of the aliasing issue out of the gate; afaik safe Rust cannot alias memory (apart from read-only aliases) because of the borrow checker.

My question is: how much of the advantages ‘burst’ brings to the table do we get for free with the borrow checker + LLVM, and how much could be improved upon? Can Rust guarantee no aliasing and can it improve on something like C++ which is very hit-or-miss on if LLVM can vectorize it?
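As a small illustration of the aliasing point (a sketch, not a benchmark): in the loop below, `out` is a `&mut` slice while `a` and `b` are shared slices, so the compiler knows statically that they cannot overlap. That is exactly the guarantee LLVM needs to vectorize the loop without emitting runtime alias checks, the part that is hit-or-miss for an equivalent C++ pointer loop without `restrict`.

```rust
// Chunked, slice-based iteration that LLVM can auto-vectorize:
// `out` cannot alias `a` or `b` (exclusive vs shared borrows), and
// the zip bounds the loop so per-element bounds checks are elided.
fn add_scaled(out: &mut [f32], a: &[f32], b: &[f32], k: f32) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y * k;
    }
}
```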

(Swarnim Arun) #69

I am not good enough to answer the question on topic, but here are links to the Burst Compiler presentation and details (might save someone else the time):


(Jaynus) #70

Linking the RFC here: https://github.com/amethyst/rfcs/issues/22

(Erlend Sogge Heggen) split this topic #71

5 posts were split to a new topic: Mechanism to sub-dispatch group of systems