Legion ECS Discussion

(Kel) #1

This is a continuation of a Discord discussion about the Legion ECS design and, more generally, allocation strategies in Specs.

The previous discussion can be found below. Beware, it’s lengthy!

Summary

Bombfuse

Is amethyst looking at using Legion over Specs? I’ve seen a good amount of discussion here about Legion
I think Legion’s query model is really nice and usable
From a scripting perspective, it’d be cool to be passed the entire world and just write queries

Khionu

Too much of our internals are locked to specs, and making them ECS agnostic would be very painful. As has been discussed here, Legion and Specs both have their strongpoints and weakpoints. So, in short, we’re certainly not changing our ECS core in the foreseeable future, though that’s not to say it will never change

Bombfuse

Sounds good to me, was just curious whether there were any changes planned, thank you

Kae

I would say, if anyone’s interested in pushing legion into Amethyst, start by achieving feature parity
Personally I also think legion’s storage model has some really nice benefits over specs
but it’s far from an equal comparison atm
I implemented some of the stuff required for a “dispatcher” and “System” equivalent feature for legion here: https://github.com/kabergstrom/legion/blob/master/src/schedule.rs It’s just a first pass, but it provides a pretty simple API for implementing a dispatcher. It works like this:

let jobs = generate_test_jobs();
let graph = generate_job_graph(&jobs);
let mut dispatch_state = build_dispatch_state(&graph);
loop {
    match dispatch_state.next_job() {
        // All jobs have been completed.
        ScheduleResult::Done => break,
        // No jobs can be scheduled right now due to conflicting resource
        // accesses or dependencies; more jobs need to complete first.
        ScheduleResult::WaitingForJob => {}
        // A job should be dispatched, e.g. query for chunks and dispatch
        // it to a thread pool.
        ScheduleResult::Schedule(job, idx) => {}
    }
}

if someone wants to build on that, or put it into specs, feel free. I’ll probably work more on it later but right now I want to get the asset system into amethyst
it’s basically this feature: https://github.com/slide-rs/system-graph/issues/2

Moxinilian

Kae if you indeed work on that more, maybe try to also design it so it is possible to do some of the things we can’t do in specs and are locked out of by design
My roommate thought of something that could be of interest but I don’t remember it, I’ll ask him again tomorrow

(A day passes)

Norman784

I have been away from this project for a long time; now I read something about legion (the API feels clean, and somewhat resembles nitric):

  1. What does this mean?
  2. Is nitric stalled?
  3. What’s the goal behind legion development over nitric?

Also I think that transitioning systems from structs to simple functions (I read that in the nitric readme :smile: and it seems legion is going in that direction) is much more scripting friendly.

Khionu

Legion is not ours, nor are there plans to pull Legion into Amethyst. There was discussion on the viability, but it was ultimately decided that we cannot make Amethyst ECS agnostic and that there were ups and downs with Legion vs Specs. Nitric is benched for the foreseeable future

Kae

I think it would be short sighted to dismiss legion’s storage model outright
I think it would be sad if Amethyst couldn’t offer this chunk-oriented storage model in the future

not saying it’s a priority but

Jaynus

Out of curiosity Kae

chunks, each approximately containing 64KiB of data.

Models are a lot bigger than 64KB (the example used in the explanation of shared data). So is this a hard limit on all data for a single entity?

What about the case of 55KB entities? Is that one per chunk?

Kae

jaynus chunks contain the component structs. Mesh data like vertices are stored in the heap

Jaynus

Ah ok

As all entities in a chunk are guaranteed to contain the same set of entity data and shared data values,

So this isn’t necessarily true. Or shared data is just a ptr

Kae

Shared data = data stored once per chunk

In the context of legion. Also called “tags”

Jaynus

Ah ok. Sorry, I’m not meaning to nitpick just curious.

What about component locality vs chunk locality? This is obviously optimized for many matching entities, not disparate components. Any benchmarks of the 2 different cases?

Kae

Components are stored sequentially within the chunk, grouped by type

There’s a benchmark result in the legion README. But basically, specs does not guarantee linear iteration (in terms of memory addresses) with join, but legion’s storage model does

Jaynus

Yah. Hm. Didn’t know specs doesn’t guarantee that with DenseVecStorage.

Kae

Specs isolates each component storage, so when filtering for entities with a certain set of components, specs basically yields a set of indices for entities that are then used to index into component storages, while legion yields a set of chunks

Specs guarantees linear access when iterating over a single component type with DenseVecStorage
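The two layouts Kae contrasts can be sketched roughly like this; these are illustrative types, not the actual specs or legion internals:

```rust
// Specs-style: one storage per component type, indexed per entity, so a
// join yields entity indices used to look up each storage separately.
struct SpecsStyleStorages {
    positions: Vec<Option<[f32; 3]>>,
    velocities: Vec<Option<[f32; 3]>>,
}

// Legion-style: entities with the same component set share a chunk, and
// each component type is one contiguous array inside it (SoA), so a
// query yields whole chunks that iterate linearly in memory.
struct Chunk {
    entities: Vec<u32>,
    positions: Vec<[f32; 3]>,  // parallel arrays, same length
    velocities: Vec<[f32; 3]>,
}

fn integrate(chunk: &mut Chunk, dt: f32) -> usize {
    // Linear walk over two parallel arrays: no per-entity indirection.
    for (p, v) in chunk.positions.iter_mut().zip(&chunk.velocities) {
        for i in 0..3 {
            p[i] += v[i] * dt;
        }
    }
    chunk.entities.len()
}
```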

Khionu

Sorry, it was my understanding that Legion had as many drawbacks and benefits over Specs, and that without the ability to abstract ECS systems and let the user pick, it wasn’t worth considering switching up our ECS base

Jaynus

Granted abstracting away the ECS is about the worst choice that could be made - as it makes optimization hard if not impossible. I’m interested to find that discussion though

Ah found it.

It doesn’t support component insertion/removal

That defeats the whole purpose of an ecs

Maybe that changed?

I can definitely see area for improvement in specs though, in regard to data and cache locality
But there’s always room for improvement there :couch:

Frizi

Abstracting the ECS would be disastrous, I think. While it’s true that both models have pros and cons, I am leaning towards the legion model. Its biggest drawback is more complicated scheduling and managing component adds and deletes, but it’s definitely a solvable problem. Not that specs does that without issues, btw.

The biggest gain I see is that chunks add another level of batching. You can store metadata per chunk - e.g. a per-chunk, per-component “version” that is incremented every time we modify a component in that chunk. This basically does what FlaggedStorage tried to do in specs, except without any required synchronization and queuing (you just need a per-component-type global atomic counter). That is a bit less granular, but that’s actually a good thing: reacting to a whole lot of changes of individual components (like FlaggedStorage does) is very costly and makes things like selective GPU buffer writes totally impractical. Per-chunk updates, on the other hand, add another layer of batching to that operation, making it worthwhile and in fact almost free (tracking those versions is very low overhead). Additionally, because chunks are separated based on an entity’s component composition, we are also likely to get many “dynamic” chunks where all or almost all entities change, and many “static” ones that almost never change a single component.
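The writer side of Frizi’s per-chunk change tracking could look roughly like this: one global atomic counter per component type, and a version stamped onto the chunk on every mutable borrow of that component’s array. Hypothetical code, not legion’s actual implementation:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One global counter per component type (here, just "positions").
static POSITION_COUNTER: AtomicU64 = AtomicU64::new(0);

struct Chunk {
    positions: Vec<[f32; 3]>,
    position_version: u64, // "positions in this chunk last changed at tick N"
}

impl Chunk {
    fn positions_mut(&mut self) -> &mut [[f32; 3]] {
        // Lock-free: a single fetch_add, no event queues to synchronize.
        self.position_version = POSITION_COUNTER.fetch_add(1, Ordering::Relaxed) + 1;
        &mut self.positions
    }
}
```

Readers then only need to compare a chunk’s stamped version against the last version they processed.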

Now the interesting part - it is likely that we can actually implement specs api without any changes on top of legion.

(minus storages ofc, just remove the associated type i guess)

Ayfid

Jaynus It does support [component insertion/removal] now, but it is not fast and is not how you are supposed to use the API except where you absolutely must. The legion API is designed such that you create an entity with its components up-front, and ideally as many entities as possible in a single call (the entity insertion function accepts an iterator of component tuples).

It is optimised for entity creation, deletion and iteration speed. Dynamically adding and removing components from existing entities is supported but slow. You can generally design game code around avoiding the need to do component addition/removal outside of entity creation, but you cannot avoid creating, deleting or iterating through entities; those are the core functions of an ECS.

Frizi

I believe we actually have a version of that where entity structure modifications are buffered and performed in very tight code on demand at sync points (once all necessary chunks are no longer referenced)

the key observation is that you can delay that until the next system tries to read any of the components you just inserted/deleted

Kae is working on that quite heavily

Kae

here is the scheduler stuff so far https://github.com/kabergstrom/legion/blob/master/src/schedule.rs

Ayfid

Yes, I had quite a lengthy discussion with Kae about how to do this in legion, but the actual mutation operation will still be comparatively slow when it does finally get committed, as it essentially requires a full removal/insertion for the modified entity

Frizi

yeah, but i believe that modifications in-place are actually way more common

Ayfid

You also can’t effectively batch the operation, because the chunks touched depend on the composition of each entity, which you do not statically know ahead of time and could vary from one entity to the next in the same “add component x” operation

Kae

it’s just swap-remove + push for each component once you have mutable access to the relevant chunks, should be pretty fast?
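The swap-remove + push move Kae mentions can be sketched as migrating one entity’s row from a source chunk to a destination chunk; illustrative types only, not legion’s code:

```rust
struct Chunk {
    entities: Vec<u32>,
    positions: Vec<[f32; 3]>,
}

fn migrate(src: &mut Chunk, index: usize, dst: &mut Chunk) {
    // swap_remove is O(1): it fills the hole with the last row, which is
    // fine because ordering inside a chunk carries no meaning.
    let entity = src.entities.swap_remove(index);
    let position = src.positions.swap_remove(index);
    dst.entities.push(entity);
    dst.positions.push(position);
}
```

In the real case this repeats per component array in the chunk, plus any new component being added.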

Frizi

and benefits of fast iteration and pretty much free coarse change tracking are just gold

Ayfid

Yea exactly, I do not think adding/removing components to existing entities is nearly as high a priority as creation and iteration speed

Yes it is Kae, but you need to do the entire thing for every entity

Locate the source chunk, compute what the target chunk will look like, fetch such a chunk (or create it), and then move the data over, performing the modification at the same time

There is little opportunity to share work between entities

And the per-entity cost is still larger than doing the same in specs, as specs does not need to mess around with all of the existing data (iirc?)

But I think this cost is easily worth the benefits of the archetype/chunk design

Frizi

yep, specs is pretty much optimized for this case
but it’s a wrong case to optimize for imo :smile:

Kae

you can optimize the batching by hashing the structure and then sorting entities by the hash of their structure
instant 1000x speedup

Ayfid

You could, I’m not sure how much of a speedup you will get

Frizi

note that in specs, removal of a whole entity is also kinda tricky; we actually delay that until “maintain”

Kae i believe you would have to cache the source and destination structure and use both as a compound key

that way you can actually do “entity migrations” in a batch per structure pair
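The batching Frizi describes amounts to keying each pending structural change by a (source archetype, destination archetype) pair, so every group of entity migrations touches only one pair of chunk layouts. A minimal sketch, with archetypes as plain hypothetical indices:

```rust
use std::collections::HashMap;

type ArchetypeId = usize;
type Entity = u32;

fn group_migrations(
    pending: &[(Entity, ArchetypeId, ArchetypeId)], // (entity, from, to)
) -> HashMap<(ArchetypeId, ArchetypeId), Vec<Entity>> {
    let mut groups: HashMap<(ArchetypeId, ArchetypeId), Vec<Entity>> = HashMap::new();
    for &(entity, from, to) in pending {
        // All entities in one bucket can be moved in a single tight loop
        // over the same source and destination chunks.
        groups.entry((from, to)).or_default().push(entity);
    }
    groups
}
```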

Kae

Archetype has an index in legion
you can probably just use that?

Frizi

you should, yep

Kae

and yeah I agree

Frizi

also you always know the “source”, because you’ve got the entity from a chunk with it :smile:

Kae

having the pair would be good

Ayfid probably depends on the workload, but for a large number of changes I think it’ll be significantly faster, just based on my experience with similar batching things

Kel

Hey since you’re in here, I was curious if there are any open source applications using legion you could link? I was curious to see how legion feels api wise in a larger app
seen it before and design seemed cool

Ayfid

Not as far as I’m aware, I am fairly surprised that it has gotten the attention that it has tbh

Kae

Ayfid I think it’s because you implemented a design with a lot of weight behind it (Unity) :smiley:

Ayfid

Yea legion is pretty heavily based on how Unity does its ECS

Jaynus

Catching up on reading back

This whole discussion just makes me think we need some much better amethyst benchmarks. Maybe take the rendy example and add some functionality.

Nothing complicated, but some real world uses to start benchmarking
Mainly this all stems from me poking around for easy wins in amethyst, and it seems Transform is one of those big easy wins.
Currently, in most cases we basically re-compute every single transform every frame if anything modifies it. It’s pretty heavyweight.
And then that sends you down the rabbit hole of optimizing FlaggedStorage, and then multithreading it… and you realize you can’t really optimize it well
And it turns out Transform basically recomputes large amounts of global matrices every frame in any actual game scenario, single threaded

Khionu

I thought the inability to distinguish a mutable access from an actual edit would bite
Was one of the first issues I raised with Specs

Jaynus

Well even that would be okay (though it sucks) if we could parallelize it, but we can’t because of FlaggedStorage.
Or maybe throw some SIMD instructions in there too (hint: nalgebra doesn’t, and their storage method makes it epically complicated)
Transforms should be the one place we can vectorize and parallelize and we can’t :frowning: (also, Legion would actually make this worse, not better. It’s a scenario where we do not want chunked entity storage)

Ayfid

How would legion make this worse?
Chunks give an easy unit of work for data parallelism, it is even built into legion’s API
and each chunk will operate over a single contiguous array, which is generally easier to SIMD too
and unchanged chunks can be skipped

Jaynus

For example, global transform application is, in the majority of uses, simply multiplying many 4x4 matrices by a single 4x4. In our case, this could be a single loop over a single array of matrices. That’s very very easy to parallelize (even with specs, if FlaggedStorage didn’t exist) and even to later apply SIMD, as that’s very easy over a contiguous flat array
This is all far future optimization for amethyst of course (or maybe sooner, who knows) but imo worth considering
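The flat-array transform pass Jaynus describes could be sketched like this, with plain arrays standing in for nalgebra types; a contiguous slice like `locals` is exactly what makes the loop easy to split across threads or auto-vectorize:

```rust
type Mat4 = [[f32; 4]; 4];

fn mat_mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

fn apply_parent(parent: &Mat4, locals: &mut [Mat4]) {
    // One tight loop over a flat slice: trivially splittable into chunks
    // for a thread pool, and SIMD-friendly since the data is contiguous.
    for m in locals.iter_mut() {
        *m = mat_mul(parent, m);
    }
}
```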

(Continued below)

(Kel) #2

Part 2:

Summary

Viral

I wonder how random access is slower/faster in legion than in specs

Kae

Chunks are just many contiguous arrays, and they are all guaranteed to be aligned

Viral

Yeah, but there is at least one additional level of indirection

Kae

yes i was mainly replying to jaynus
most importantly, a system can work with chunks instead of componentstorages
this enables chunk-level parallelism automatically
no need for par_join

Viral

So the system will be run a few times, once with each chunk that has everything the system needs?
But system may want to access different sets of components during execution

Kae

it obviously doesn’t work for aggregations

Khionu

Could both approaches, specs and legion, exist at the same time, same dataset? We’re already managing the lifetimes of mutable references manually, maybe have references pre collected in both fashions?

Could incur a substantial amount of memory overhead, having duplicates of all pointers but as long as we don’t borrow from both implementations at the same time…

Legion uses this chunks approach and Specs uses its storage approach, I’m thinking both might be able to be implemented at the same time. I’m wondering if that would solve the problem of getting the advantages of Legion and Specs

Frizi

I’m afraid that would prevent us from taking advantage of either one of them
there is really no point in having both. I’m not sure why @jaynus thinks that legion is bad for SIMD. It’s literally the best-case scenario: a guaranteed linear array of raw values
specs doesn’t actually give you that guarantee

Khionu

Do we have a compiled list of the pros/cons of Specs and Legion?
That would be a helpful reference

Frizi

We could make one, let me write a few points down here for start
the most obvious ones:

specs pros:

  • insertion and deletion of components is faster
  • it’s easy to schedule systems completely “offline” without superfluous barriers

legion pros:

  • iteration over entities with specific components is way faster and easier to vectorize
  • coarse lockfree change detection is pretty much free (ideal case for GPU buffer updates)
  • easy chunk-based parallelization of operations

Viral

Change detection in specs used to be fast

Khionu

Complete transparency in what I’m thinking: we can pick the model that has the most pros, and try to fix it to lessen its cons by drawing on the pros of the other.

Frizi

it was decently fast, but also had issues with being too granular and not correct
The incorrect thing about it was that the “change” was not relative to the consumer of data. If you had two systems running at different rates, one would miss many updates

in legion, we can actually solve that easily with atomic counters on chunk level

Viral

Then it will not be very granular

Frizi

that’s a good thing. You have far fewer buffer writes to perform at the end of the day. The size doesn’t matter that much here
also components are very likely to be all modified in one specific chunk, and all untouched in others
entities are separated into chunks based on their component composition
so basically your operations are usually very local to just a few chunks
which dramatically improves cache coherency again :smile:

Viral

Maybe many chunks, but you traverse them linearly anyway

Frizi

specs is pretty much random access with DenseVec, you have additional pointer chase and it also wastes a whole lot of memory

Khionu

And there’s guaranteed linear access… doesn’t that mean having to move around entities in a chunk if one is moved?

Frizi

yes, this is why adding/removing entities is harder there
you have to move them between chunks
which also means you can’t really apply those changes immediately, because other systems are executing in parallel

Viral

Adding removing components
That’s where cons come from

Frizi

the solution is to buffer those operations together, and apply all at once in “sync points” between system runs. This makes synchronization trickier
but IMO it’s totally a cost worth paying

Khionu

This might be laughable, but hear me out: what if we used the style of storage that Specs has right now as a backing storage, then use chunks of pointers?
Vs actually moving around entities?

Viral

that will kill performance

Frizi

it doesn’t buy you absolutely anything
it is actually worst of both worlds
you have random memory access, indirection table, and chunks to synchronize
moving entities around is itself a very easy operation
most components are copy types
it’s the synchronization part that’s tricky

Viral

Moving in rust is easy
Even drop types

Khionu

In the chunk model, aren’t entities also the components they have attached?

Frizi

yes this is pretty much it
additionally, the entity id itself is a component :smiley:
it’s not special

Viral

No. It is special Frizi

Frizi

you can actually have multiple “key” entities that you index by
well, it is special in that it is a key
but you can have multiple
it makes it possible to implement stuff like 2d indexes

Viral

Yes, because it’s just an index to a table entry
That stores component locations

Khionu

So, is there no way to reduce the overhead of ensuring linear access?

Viral

None I’m aware of

Frizi

the overhead only shows on operations that are usually quite uncommon and batched together already
which is entity creation/deletion and component addition/removal
you are much more likely to edit data in-place

Viral

Actually those are encouraged in specs
So amethyst uses them freely
ZST components for entity flagging etc

Frizi

well yes. changing that would change our “optimal operations” set

Khionu

Could we have flags compiled into a single component?
Use a bitset?

Frizi

thing is, you will immediately lose querying speed with that

Viral

Yes, but you can’t join over bitset in legion

Khionu

I see

Frizi

queries are fast because you always know which chunks hold your components

Viral

But legion supports tags, which are more powerful
Yet adding/removing a tag is slower

Frizi

but for cases where you would do zst_components.maybe() in specs, you can just create a component with a boolean inside instead

Khionu

That was my other idea
Alright
What are the other practical level concerns?
What could a user do that would make Specs preferable?
And how could we address that in the chunks model?

Viral

Custom storage types

Frizi

in the chunk model, you don’t have storage types at all
:stuck_out_tongue:

Viral

Exactly

Frizi

i think that’s preferable :smile:
it’s less complexity on user side

Khionu

There is a concern, actually
There are usecases for saving and restoring worlds, including synchronization in a P2P setting

Frizi

yes, you can do all that
you don’t do that with storages

Viral

Yes, but you can’t, for example, insert code execution on component addition, access, or removal

Khionu

That’s counter paradigm, isn’t it?

Viral

I have, for example, a storage wrapper that sends removed components over the channel, even if the component gets removed implicitly
I use Drop now, instead of that wrapper
But there could be valid cases for custom storages

Khionu

“could be”, do we have any?

Frizi

we could probably make that a wrapper on component type instead

and make legion call some methods when performing those ops

not sure if we want though

Viral

I did, but now I need to copy the channel to all component instances

Frizi

i’m sorry, but this usecase kinda reminds me of https://xkcd.com/1172/
i think that this can be accomplished in a different way
why do you need to respond to component insertions in the first place?

Khionu

Only thing I can think of is logging?

Viral

Frizi I described when I have to respond to removals

Khionu

The rest could be handled when you are doing removals

Frizi

i’m not sure where exactly. What are you doing with this event stream on the other side?

Viral

I destroy them

Frizi

I think that instead of storages, we can give components an associated type ChangeObserver that determines a resource which gets notified of inserts/deletes/modifications
and most components would just have type ChangeObserver = ();
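Frizi’s ChangeObserver idea could be sketched roughly as below. All trait and type names here are hypothetical, not an existing specs or legion API; `()` is the do-nothing default, and Viral’s removal-channel use case becomes an observer type:

```rust
trait ChangeObserver<C>: Default {
    fn on_insert(&mut self, _component: &C) {}
    fn on_remove(&mut self, _component: &C) {}
}

// The common case: `()` makes every notification a no-op that the
// compiler can optimize away entirely.
impl<C> ChangeObserver<C> for () {}

trait Component: Sized {
    type Observer: ChangeObserver<Self>;
}

struct Health(u32);
impl Component for Health {
    type Observer = (); // most components opt out entirely
}

// The use case Viral mentions: collect removed components somewhere.
#[derive(Default)]
struct RemovalLog(Vec<u32>);

struct Lifetime(u32);
impl ChangeObserver<Lifetime> for RemovalLog {
    fn on_remove(&mut self, component: &Lifetime) {
        self.0.push(component.0);
    }
}
impl Component for Lifetime {
    type Observer = RemovalLog;
}
```

The world would fetch the observer as a resource and call these hooks while performing insert/remove operations.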

Viral

I’d prefer !

Frizi

! means never constructed, which isn’t necessarily true here

Viral

Why would you want to set up () as a constructor?

Frizi

as the code that does the insertion inside legion would have to fetch that as a resource and do operations on it
so, having () makes that basically a no-op
while ! makes that a code that won’t compile :stuck_out_tongue:

Viral

It makes world.setup_observer::<T> uncallable

Frizi

anyway, details :smile:
the point is, components can have associated resource that gets notified of things

Viral

Good idea
I like it

Qthree

Viral “Change detection in specs used to be fast” - it’s not if you’re changing most of your components every frame, like with entity positions and transforms. Then you’re just flooding the update channel with useless data. With the legion approach it’s just one atomic counter per chunk vs double the size of your storage in specs.

“Yeah, but there is at least one additional level of indirection” - actually it’s kind of one less: now you have a bitset, a Vec with indexes, and the actual component (in dense vec storage); with legion you just need the chunk address and an index.

Viral

originally it was a bit per entity and one atomic op per mutable entity access
In specs

Frizi

A single bitset per storage is not enough information
you need a single bitset per storage READER, and you also need to propagate the changes to all readers
when you have chunks, you instead have a “version” counter and a “last visited version” per reader (a system). Then you can query only the chunks that are above that version.
it’s actually a counter per component type in a chunk
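The reader side of that scheme could look roughly like this: each system remembers the last version it processed and only visits chunks stamped with something newer. Illustrative types only:

```rust
struct ChunkMeta {
    position_version: u64, // bumped whenever this chunk's positions change
}

struct ReaderSystem {
    last_seen: u64, // this reader's "last visited version"
}

impl ReaderSystem {
    // Returns the indices of chunks this reader still needs to process.
    fn dirty_chunks(&mut self, chunks: &[ChunkMeta], global_counter: u64) -> Vec<usize> {
        let dirty = chunks
            .iter()
            .enumerate()
            .filter(|(_, c)| c.position_version > self.last_seen)
            .map(|(i, _)| i)
            .collect();
        // Everything up to the current global counter is now handled,
        // so a second reader running at a different rate keeps its own
        // independent last_seen and misses nothing.
        self.last_seen = global_counter;
        dirty
    }
}
```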

Jaynus

Hi, I’m awake now. I’d like to add that EventChannel and BitSet both have significant allocation overhead too, as they are both naively backed by Vecs
So just about anything is better

Frizi

uh, if you actually have to allocate stuff, i’d say that Vec is pretty good at that :smile:

Jaynus

RE: SIMD - how is Legion ideal? If I have a system performing a SIMD matrix multiplication, legion actually guarantees my matrices are not adjacent, but stored equidistantly in a contiguous buffer, no?

Frizi

still, it’s better to not have it at all
no. Chunks are still stored in SoA manner

Jaynus

Ah!
Small detail not documented (I haven’t looked at code)

Frizi

i think* :smile:
let’s verify

Jaynus

@Frizi oh don’t be pedantic, we all know allocation overhead in amethyst and specs is super gross atm. There are much better ways to reduce malloc syscalls

Frizi

just… don’t allocate what you don’t need?
and add some arenas
we are kinda blocked on rust stdlib with this
but once custom allocators are in, we can do whatever we want
we will most probably end up with a per-frame arena allocator
so all your temp buffers are pretty much zero-cost
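A toy sketch of that per-frame arena: allocation is a pointer bump into one preallocated buffer, and everything is “freed” at once by resetting the offset at the end of the frame. Deliberately simplified (byte slices only, no alignment or Drop handling):

```rust
struct FrameArena {
    buf: Vec<u8>,
    offset: usize,
}

impl FrameArena {
    fn with_capacity(capacity: usize) -> Self {
        FrameArena { buf: vec![0; capacity], offset: 0 }
    }

    // O(1) and allocation-free after startup: just bump an offset.
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if self.offset + size > self.buf.len() {
            return None; // out of arena space for this frame
        }
        let start = self.offset;
        self.offset += size;
        Some(&mut self.buf[start..start + size])
    }

    // Called once per frame: all temp buffers vanish together.
    fn reset(&mut self) {
        self.offset = 0;
    }
}
```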

Jaynus

I’ve been researching allocation strategies for us
We probably want something more specific than global allocators for scenarios such as a frame allocator
Since we really want frame/pool/generic allocators

Frizi

ok, maybe, that’s fine :stuck_out_tongue:
also, why is specs allocation overhead gross?
the memory consumption is kinda gross, especially for sparse components with vec storages :smile:
but if we move away from that, it goes down dramatically

Jaynus

Memory is free tho, it’s 2019 and we should never be optimizing for memory first

Frizi

yes
but mallocing that memory isn’t free ;D

Jaynus

In specs, growing is expensive both for storages and hibitsets
Since it involves moves, copies and other grossness

Frizi

yep
maaany separate buffers grow all at once
and all data is moved away from the old ones

Jaynus

And everything in specs is designed to grow; there are no sane preallocation strategies

Frizi

it’s just a big glitch if you have enough objects

Jaynus

Yes

Frizi

legion removes that pretty much completely
chunks are limited to reasonable sizes, backed by constant size buffers

Jaynus

This is why I was looking at allocators for us, it’s what got me started with that large allocation problem yesterday

Frizi

I think it’s not worth doing since we want to move away from that anyway, right?

Jaynus

A global bump allocator, a per-frame double bump allocator, and an object pool would almost guaranteed be a huge win

But yes, they are treating a symptom not necessarily the problem ( in regards to specs)

Kae

What’s a global bump allocator for?

Frizi

bump allocation without a limited region lifetime sounds bad, as deallocation pretty much never happens
(except maybe when you drop the thing that’s already on top of stack)

Jaynus

“global” in the sense that it’s not bound to a frame. I’m thinking per state/level/something
A higher level bump arena than a frame
(haven’t thought that far ahead) but a higher context preallocated garbage pool of memory, basically, that exists for longer lengths of time

Frizi

sure, but bump allocation doesn’t fit there i think
you might need a more general solution than that

Kae

Ah you mean arena allocator basically

Jaynus

Perhaps. A prime example would be assets loaded for a given duration of time. They are contiguous and don’t require the dynamism of a true heap

Kae

https://en.m.wikipedia.org/wiki/Region-based_memory_management

Jaynus

Sure, that’s assuming you need to support individual deallocation
All details :stuck_out_tongue:

Back to Legion!
So how could we, theoretically, implement legion without (pardon language) shitting all over our users and forcing them to rewrite everything? Or do we care?

Frizi

I’m not sure how legion deals with resources that aren’t just components. I assume this is kinda outside of its scope
but specs deals with this problem through shred

Kae

Specs abstractions could probably be implemented on top of legion

Frizi

not exactly; also we probably want a query-oriented API instead of a list of storages and joins

Kae

Why not?

Frizi

no component storages in legion

Kae

The world is a component storage for all component types

Frizi

yes, but there are things like being able to get notified about individual inserts/deletes
in specs you can do that already
also i believe that the way we do joins today is kinda not gonna work in terms of scheduling
because you want to determine which chunks must be owned by the scheduler before running the system, so you can run it possibly earlier than the worst-case “nobody else touches those components at all”
we can probably treat the whole legion “world” as a shred resource though

Jaynus

At least that means we could play with it for now
(easily)
But imo we should probably avoid rewriting another core system and expand/fix what we have for a while LOL
I do kinda really want Legion though

Frizi

yeah, we still have to get rid of explicit dependencies though
so if we decide to live with specs for a little longer, we should at least fix that there

Jaynus

Well I mean, swapping to Legion would be more jarring than render + transform refactor combined. It basically changes the entire engine

Frizi

also actually the pace of this new ecs experiment is really quick, it might be totally worth switching if we plan that properly
another reason to do that sooner rather than later :stuck_out_tongue_winking_eye:
also it might be worth to implement specs api on top of legion
at least 90% of it
:smile:
if what Kae suggested is ok, i.e. component-access oriented systems (as in specs), then that’s perfectly doable
also note that we really need to do some benchmarking before blindly accepting “obviously faster approach” :stuck_out_tongue:
basically, we need to prove that it’s better than what we have, understand the consequences of switch and decide based on that

Jaynus

I agree

I’ve been using rendy example as my benchmark atm because it’s the best sample of all features we have

Kel

Would it not be possible to adapt legion concepts to specs?
There’s just quite a bit of work already in specs is all

Jaynus

You mean rewrite them, under the specs API?
Rather, in specs

Frizi

it’s possible, but i’m not sure it’s worth starting from scratch
we might end up fusing the projects together though :smiley:

Kae

You cannot get all the benefits of legion if you use specs abstractions

Kel

I’m not saying keep the api the same tho
I understand the api needs to be different for the benefits

Frizi

what needs to change @Kae?
i am lost at exactly which part of api we have to change

Kae

Then yeah, that should be possible Kel; the read/write accessors should be sufficient for the existing dispatcher to provide the guarantees needed for Systems, even when accessing chunk components

frizi specs does not expose chunk-level dispatching, which increases concurrency, and does not expose tags as a concept

Frizi

so, what do we need to change in systems api to support that?
for chunk level dispatching and tags

Kae

Everything :wink:
Systems are structs and can hold state, which is difficult to keep when the same function can be called concurrently for different chunks

Kel

systems absolutely should not have state imo and this is something that the current specs 1.0 stuff was working away from
if you need for example a reader handle make a shred resource specifically for that system

Frizi

So you want to modify the systems to define only “in the loop” code?
this kinda breaks everything imo

Torkleyy

Systems shouldn’t have state in an ideal world, but the price for that is less convenience for users.

Frizi

i’d expect that the loop over queried entities is still inside the system; you can just parallelize it based on chunks
this whole thing would be abstracted away as an iterator that accepts a closure
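The “iterator that accepts a closure” shape could be sketched like this, with each chunk modeled as a Vec and scoped threads standing in for a real thread pool. Hypothetical API, not legion’s real one; note the closure must be Send + Sync, which is exactly why mutable system state is awkward inside it:

```rust
use std::thread;

fn par_for_each_chunk<T: Send, F>(chunks: &mut [Vec<T>], f: F)
where
    F: Fn(&mut [T]) + Send + Sync,
{
    let f = &f; // shared reference, one copy per spawned task
    thread::scope(|s| {
        for chunk in chunks.iter_mut() {
            // One task per chunk; the scope joins all of them before
            // returning, so the mutable borrows stay sound.
            s.spawn(move || f(chunk.as_mut_slice()));
        }
    });
}
```

A system would then call something like `par_for_each_chunk(&mut chunks, |c| { /* per-entity loop */ })`.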

Kae

Could have an immutable reference to self in chunk query handlers, and mutable ref in pre/post functions

Rybek

Two questions about legion though, is it more painful to use than specs? Are there any benchmarks comparing both?

Frizi

there are benchmarks outside of context of amethyst already

Kel

You know, the ergonomics cost might be possible to offload, torkleyy
with an associated “state” type that simply creates a resource on that system
with a default of ()

Frizi

i’m not sure why being stateful is a problem again
it’s really convenient and easy to work with that systems have local state

Kel

kae: “which is difficult to keep when the same function can be called concurrently for different chunks”

what could be done is system “states” could be handled with interior mutability and shared with Read

Frizi

it isn’t difficult. Your concurrent iterator just accepts only Send + Sync closures
which means you have to synchronize your local state if you want to use it inside that loop concurrently
but you can just as well use a single-threaded iterator
Also note that we aren’t the first to think about all this. See “JobComponentSystem” in Unity
There is IJobForEach that is basically a “just run the loop over a set of components” kind of system
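
Frizi’s point can be sketched with standard library scoped threads (the function name `par_for_each` is illustrative): because the closure bound is `Fn + Send + Sync`, any state shared across chunks has to be synchronized, for example with an atomic.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Sketch of a chunk-parallel for_each. The Send + Sync bound on F is
// the key point: it forces shared state to be synchronized.
fn par_for_each<F>(chunks: &[Vec<u32>], f: F)
where
    F: Fn(&[u32]) + Send + Sync,
{
    let f = &f;
    thread::scope(|s| {
        for chunk in chunks {
            // Each chunk runs on its own thread; f is shared by reference.
            s.spawn(move || f(chunk));
        }
    });
}

fn main() {
    // Local state used inside the loop must be synchronized (an atomic
    // here), exactly as described above.
    let total = AtomicU32::new(0);
    par_for_each(&[vec![1, 2], vec![3, 4]], |chunk| {
        total.fetch_add(chunk.iter().sum::<u32>(), Ordering::Relaxed);
    });
    assert_eq!(total.load(Ordering::Relaxed), 10);
}
```

A single-threaded variant of the same iterator would drop the `Send + Sync` bound and let the closure borrow unsynchronized state directly.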

Jaynus

Unity ECS isn’t the end all tho, nor even a true example of a pure ECS. Just one reference point
With a lot of tech debt, I’d like to add :stuck_out_tongue:

Frizi

but there are others too
yes it is a single reference point, but a valid reference point
we can definitely improve on that, but it’s nice to look at those solutions to at least not be worse :stuck_out_tongue_winking_eye:
Btw

All structural changes have hard sync points. CreateEntity, Instantiate, Destroy, AddComponent, RemoveComponent, SetSharedComponentData all have a hard sync point. Meaning all jobs scheduled through JobComponentSystem will be completed before creating the entity, for example. This happens automatically. So for instance: calling EntityManager.CreateEntity in the middle of the frame might result in a large stall waiting for all previously scheduled jobs in the World to complete.

Same limitations as we have
Also they support multiple worlds that don’t need synchronization between each other
might be useful for stuff like UI

Rybek

System’s state can be treated the same as “World resource that is only accessed by said system and nothing else ever”

Jaynus

That’s just much more annoying programmatic overhead than “struct with members”

Frizi

yes it can, but it’s problematic in terms of synchronization. If the system is allowed to run multiple times at the same instant through multithreading, then it must be Send + Sync
and that’s terribly annoying to require on every system :stuck_out_tongue:

Khionu

I mean, if the system’s state is a struct that the creator of the system defines, there are natural flow things hindering other things from accessing it
Like, knowing to access the struct
It’s a non issue, really
Oh, problem with that, what if multiple systems use the same State struct?
That could have complications

Frizi

or if single system uses it in multiple threads at the same time

Rybek

the alternative would be to make systems macros? the biggest issue with macros so far seems to be code completion support
I have actually read the proposal from a few months ago that seems to be abandoned by now; it had a few good ideas for end-user ergonomics
yes, i remember there was something about macros not playing nice with generics

Jaynus

So I had a strange thought in the shower
If systems are moved towards parallel units of execution with no state, and we are dispatching them on our own scheduler
Systems are futures. And the dispatcher is an executor.
So we are basically reimplementing async/await and std::future
Why not just use them at that point?

Lucio

hello
i have been called
yes I think this has been discussed before
udoprog did some work around this at one point

Udoprog

All designs I sketched up relied on a mechanism to prevent borrows from living across awaits which is not a thing right now, but could be implemented if we make sure systems are Send and wrap every borrow in something !Send .
The ergonomics around that is… pretty bad though unfortunately.
Example: https://gist.github.com/udoprog/dc075c6b03929d17d088327fda6c1b75#file-non-send-future-txt
That is assuming you want to have code like:

loop {
   scheduler.wait_for_tick().await;

   let _ = (&entities, &mut thingies).join();
}

Jaynus

This is specific to specs though, right?
Not legion as we are discussing
Frizi, Ayfid how does this apply to legion?

Khionu

How much work would it be to make all the modules not oriented around ECS, and instead have a module that glues all our modules together with the ECS?
Seems like that would be better for a number of things going forward, including testing
And it would also make it so that the core module could be backed by legion or specs without touching other modules
And if for some reason someone wanted to use what Amethyst provides in its modules, without ECS, they could do so
(not that I think we should work on solutions that are non-ECS, but this would enable them)

Jaynus

I mean, sure it’s possible, but I think it would limit the user experience (and our development experience) too much. You’d basically have to write absolutely everything as unitary worker items. At least that’s the only design I can think of. I don’t think abstraction will solve the problem here.
I think with some benchmarking, we can start asking the question “did we code ourselves into a performance hole using specs?”
And while I agree specs has many limitations/issues, I don’t personally know if we can “fix” them (port legion concepts over?) or if it’s less work just to switch.
But we don’t know, definitively, what those issues are. Just intuitively
Example: I know, intuitively, that the allocations in hibitset are bad. And that the storages of specs are not good for caching. But I can’t definitively say Legion does better.

Khionu

Yeah, we definitely need some solid benchmarks

Jaynus

I think a big problem in this is we don’t really have a good baseline use case.
We have the showcase game, and the rendy example. I don’t know if anyone’s personal project is to the point where it’d be a useful benchmark.

Khionu

We don’t need a full game to benchmark this
Just a bunch of moving objects would suffice

Jaynus

Well, or we need to sit down and spec out all the different common iteration scenarios.

Khionu

That too

Jaynus

Which is doable, but just more work than “Hey, there’s a project that already has this real-world use for us” :stuck_out_tongue:
I’m always a fan of less work

Udoprog

From looking over the code, legion doesn’t do anything fundamentally different other than not having trait-based systems?

Jaynus

The storage method is inherently different, udoprog
Legion provides chunk-based storage, so all entities with the same components are stored in a “chunk”, with contiguous arrays for each component within a chunk.
Specs doesn’t provide any of that, so it thrashes caches and is basically random access
With specs we are also inherently limited on parallelism right now (the Transform system, for example, cannot thread because of FlaggedStorage)
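
The chunk layout Jaynus describes can be sketched roughly like this (illustrative types, not legion’s actual ones): entities sharing a component set live together in a chunk, with one contiguous array per component, so a multi-component iteration is a linear walk.

```rust
// Rough sketch of chunk-based storage: parallel, contiguous arrays,
// where index i in each array belongs to the same entity.
struct Chunk {
    positions: Vec<[f32; 2]>,
    velocities: Vec<[f32; 2]>,
}

fn main() {
    let mut chunk = Chunk {
        positions: vec![[0.0, 0.0], [1.0, 1.0]],
        velocities: vec![[1.0, 0.0], [0.0, 2.0]],
    };
    // Iterating two components is a dense, cache-friendly zip over two
    // arrays, rather than the per-component random access specs does.
    for (pos, vel) in chunk.positions.iter_mut().zip(&chunk.velocities) {
        pos[0] += vel[0];
        pos[1] += vel[1];
    }
    assert_eq!(chunk.positions, vec![[1.0, 0.0], [1.0, 3.0]]);
}
```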

Udoprog

There’s nothing on a high level in specs preventing implementing such storage imo.
Things like FlaggedStorage would need a substitute in a legion-based solution.

Jaynus

Read way up, there’s discussion about that with atomic counters
That sounded terse. Sorry. Just this is rehashing a lot of what was previously discussed :smiley:

Udoprog

Discussion about FlaggedStorage?

Jaynus

FlaggedStorage, and implementing such a system in Legion
but yes, even further up are my 2 failed attempts at rewriting FlaggedStorage :stuck_out_tongue:

Udoprog

The design question is open-ended. FlaggedStorage seems orthogonal to both specs and legion. Either could use an improvement.
Both have the capability of specifying the same granularity on read/write queries.
While open-ended design isn’t inherently bad, we certainly won’t have encountered the potential pitfalls until it has addressed all the requirements needed for something like Amethyst to work.
That’s my 2 cents at least.

Frizi

We need to move these discussions to the forums to be productive. We are repeating ourselves

…And here we are now.

(Jaynus) #3

As a preface, I’ve been digging into allocation strategies we could implement, and where our hot allocation paths are. This has really just been playing around at this point, but this is an opportunity to dump all that. Why? Because I think the main discussion here is that we need to determine what direction to go in, and what our goals are.

  1. Does specs meet a target:
    a. memory allocation story
    b. concurrency story
    c. iteration story

Now, as I know many people come from not-low-level languages, here are a few high-level TL;DRs just so we are all on the same page.

  • Memory allocation Story
    Goal: Using our ECS should not dynamically allocate under the hood, ever, and should always be explicit (or use custom allocators)
    • Memory allocation is bad. A regular old allocation is a syscall, which is expensive. Even once we mitigate this with our own allocators (Soon™?), it’s a good general practice to limit dynamic allocations at all times
    • Reallocation is even worse than allocation. Any Vec growing dynamically in our code means another allocation + a move of the entire vec to another region in memory. As you can guess, that’s expensive.
  • Concurrency Story
    Goal: Systems should be able to concurrently run, including multiple executions of the same system, with minimal user intervention
    • Configurability of concurrency (this system is thread local, this system can parallelize to all hell, etc)
    • Granular data access probably needs to occur, so we don’t force lock-step systems out of naivety
  • Iteration Story
    Goal: End users should be able to iterate over different filters/queries of component combinations and entities (Maybe? This is beyond my pay grade)
    • Our ECS should optimize and provide the mechanisms to further optimize iteration across our data
    • Cache coherency and data locality should be the primary goals of storage and thus, iteration
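
The reallocation cost called out in the memory allocation story can be observed directly: a `Vec` that outgrows its capacity allocates a new buffer and moves every existing element into it.

```rust
// Demonstrates Vec reallocation behaviour: no reallocation while
// within capacity, then allocate-and-move once capacity is exceeded.
fn main() {
    let mut v: Vec<u64> = Vec::with_capacity(4);
    let cap = v.capacity();
    let before = v.as_ptr();

    // Filling up to capacity reuses the original buffer.
    v.extend(0..cap as u64);
    assert_eq!(v.as_ptr(), before);

    // One more push exceeds capacity: a new, larger buffer is allocated
    // and every existing element is moved into it.
    v.push(99);
    assert!(v.capacity() > cap);
}
```

This is why the goal above calls for explicit preallocation (or custom allocators) on hot paths instead of letting containers grow on their own.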

I’m going to begin with a random brain dump of thoughts and problems I’ve had since using Amethyst in regards to specs only:

  • hibitset
    Although drastically optimized for iteration speed, its internal use of Vec causes issues

    • It has no sane preallocation strategy, except with_capacity, which no one uses
    • It has no sane growth strategy, so reallocations happen.
    • Its keyspace is limited to 0-16m. 16m entities is a perfectly sane limit; however, a max ID value of 16m is not, as it makes any type of space filling/hashing/encoding methods hard or impossible to use (I ran into this limit doing morton encoding)
    • It is optimized for the use case of join (fast AND/NOT operations), but not for straight iteration
    • Implementing custom BitSetLike’s is hard
    • I don’t think its optimizations currently meet our most common use cases (correct me if I’m wrong please)
  • specs

    • Memory allocation Story
      • None of the underlying storages really perform any kind of optimization for allocations except DenseVecStorage. Even then, we don’t get alignment or caching guarantees. DenseVecStorage also does no sorting, so we are almost guaranteed to be doing sparse reads.
      • All underlying storages have exponential growth overhead: as they grow larger, it takes longer to reallocate.
      • Abstracting away storage prevents optimization
      • The dynamic use of hibitset
    • Concurrency Story
      • Fetch/FetchMut, Read/Write, etc. are currently our instruments to meet this goal.
        • IMO, this is not granular enough; it basically locks entire classes of data to lock-step execution
      • Things like FlaggedStorage don’t have valid solutions yet, making our most critical hotpath system single-threaded (Transform updates)
    • Iteration Story
      • I think the external facing API of specs meets this requirement and is actually very nice
      • Under the hood however, specs fails this goal.
        • Underlying storages abstracts away possible optimizations
          • Can I SIMD a storage? who knows!
        • Entire storage design is optimized for single-component iteration only

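As a concrete aside on the hibitset keyspace point: Morton (Z-order) encoding interleaves coordinate bits, so even modest coordinates produce keys past hibitset’s roughly 16m (2^24) maximum. A quick sketch:

```rust
// Standard 32-bit -> 64-bit Morton (Z-order) interleave: spread each
// coordinate's bits apart, then interleave x and y.
fn morton_2d(x: u32, y: u32) -> u64 {
    fn spread(mut v: u64) -> u64 {
        v &= 0xFFFF_FFFF;
        v = (v | (v << 16)) & 0x0000_FFFF_0000_FFFF;
        v = (v | (v << 8)) & 0x00FF_00FF_00FF_00FF;
        v = (v | (v << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
        v = (v | (v << 2)) & 0x3333_3333_3333_3333;
        v = (v | (v << 1)) & 0x5555_5555_5555_5555;
        v
    }
    spread(x as u64) | (spread(y as u64) << 1)
}

fn main() {
    // x bit 0 lands on key bit 0, y bit 0 on key bit 1.
    assert_eq!(morton_2d(1, 1), 3);
    // A 4096 x 4096 position already produces a key past 2^24, i.e.
    // beyond a ~16m-limited ID space like hibitset's.
    let id = morton_2d(4096, 4096);
    assert!(id > (1u64 << 24));
}
```
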
Those are just my random thoughts and a brain dump I have been working on over the past few weeks. (Told you @fletcher).

Now the question is:

  1. Am I even right on my story goal assumptions?
  2. How do we measure these things?
  3. Does or can specs meet these goals?
  4. How does Legion already meet them?
4 Likes
(Zicklag) #4

I just read through the whole conversation, and while I don’t understand all of the lower-level points, here are my thoughts:

If we are going to switch the ECS that Amethyst is built on, the sooner that we do it the better. We don’t want to keep working on something that we are going to end up needing to throw away later. We need to do the investigation necessary to adequately compare the two ECS’s so that we can make an educated decision on what to do.

While rewriting the engine to handle a different ECS would be a lot of work, my usual stance on things like this is that I would rather put in the work now and make it great than go with a sub-optimal solution that might come back to bite me later. If Legion really is going to bring Amethyst performance that it wouldn’t be able to get otherwise, then I think it might be worth it. We’re going to have to do the testing to find out for sure.

The change would require updates to pretty much every portion of the engine, though, and that would probably make it difficult to do development on anything else in Amethyst while that is in-progress. For example, I’m intending on working with @Moxinilian to get some work done on scripting in Amethyst, but that is very closely tied to the ECS. Does it make sense to start new development on a scripting system if the ECS is going to change and change the way that we have to build a lot of the scripting system?

This whole thing is a big deal because of both the potential gain and the potential cost of switching or not switching. I think it deserves serious consideration, though.

It seems like a good benchmark or set of benchmarks is the first step.

(Kae) #5

“A regular old allocation is a syscall”: this seems absolute. No implementation of GlobalAlloc that I know of invokes a syscall on every allocation. On Linux with libc, the default GlobalAlloc calls malloc, which only invokes syscalls in certain situations. On Windows, the default GlobalAlloc calls HeapAlloc. They all have their own platform-dependent behaviours, though I won’t pretend they are Good Allocators.

Syscalls are only necessary when changing the virtual memory mapping, which is usually done for larger allocations only.

And you will need to dynamically allocate. Perhaps you mean it should aim to re-use allocated memory instead of delegating this to the global allocator?

Seems ok :slight_smile:

On how we measure: it takes a lot of resources to implement and test reasonable use-cases and related benchmarks, but yeah - real use-cases would be best. An ECS doesn’t have that many possible operations, so just showing the pathological and best-case behaviours for both would also be acceptable IMO.

No, it currently does not meet the allocation goals you set up with any of its storage implementations.

Specs cannot meet the concurrency goals with the current API because it dispatches on system-level and not on a component or chunk level.

Specs cannot guarantee linear iteration for multi-component queries due to its isolated ComponentStorage design.

Can specs meet the goals? Not without extensively changing its API.

For simple add/remove, Legion does well with allocation behaviour (at least on my fork :wink: ) but it does not use a custom allocator for its fixed-size blocks. It could do better with custom allocator APIs though, as hashmaps and vecs for chunk metadata still use GlobalAlloc. Also, the blocks are not actually guaranteed to be of an exact byte size. And structural mutations for entities are not optimally implemented currently, requiring multiple individual allocations.

Legion does not have a concurrency story, so it does not meet any of those goals.

Legion has optimal cache coherency and data locality for multi-component queries. It goes beyond optimal in the classical ECS sense with the Tags system that shares certain data across all entities in a Chunk as well. Single-component queries have linear behaviour within a Chunk, but may need to touch many chunks depending on the existing archetypes. I’d say Legion shines most on the iteration goals, which is the most important aspect for high performance considering modern CPU architecture details.
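
The Tags point can be sketched like this (illustrative types, not legion’s): a tag value is stored once per chunk and shared by every entity in it, rather than duplicated per entity, and tag-filtered queries only touch matching chunks.

```rust
// Sketch of per-chunk shared data ("tags"): the tag is stored once for
// the whole chunk, per-entity data stays contiguous.
#[derive(PartialEq, Debug, Clone, Copy)]
struct ModelId(u32); // e.g. which mesh these entities render with

struct Chunk {
    tag: ModelId,             // stored once, shared by the whole chunk
    positions: Vec<[f32; 2]>, // per-entity data
}

fn main() {
    let chunks = vec![
        Chunk { tag: ModelId(7), positions: vec![[0.0, 0.0]; 100] },
        Chunk { tag: ModelId(9), positions: vec![[1.0, 1.0]; 50] },
    ];
    // A tag-filtered query skips whole chunks without inspecting
    // individual entities.
    let count: usize = chunks
        .iter()
        .filter(|c| c.tag == ModelId(7))
        .map(|c| c.positions.len())
        .sum();
    assert_eq!(count, 100);
}
```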

3 Likes
(Kae) #6

Another thought: providing multiple levels of granularity for data storage and parallelism seems quite appealing. I can imagine a dispatcher that is aware of data at the following levels

  • Global
  • Per world
  • Per entity
(Erlend Sogge Heggen) #7

Earlier this week I discussed with @jaynus how Evoli might be extended to serve as a real-world-application benchmark for the ECS features we’re most interested in measuring.

  1. More moving entities
  2. Syncing from specs-nphysics
  3. 3D culling
  4. Userless simulation mode

All can be tracked in the ECS Benchmark milestone: https://github.com/amethyst/evoli/milestone/3

1 Like
(Justin LeFebvre) #8

I know I don’t have a ton of skin in the game on this subject since I haven’t spent much time digging into the internals of the ECS (specs/shred) but I would say that there should be no world where we replace Specs with Legion or any other ECS. However, we can and should borrow (steal) the best ideas from another ECS and implement them in Specs whenever possible in order to get our library to where we would like it to be.

1 Like
(Kel) #9

Just to be clear, the primary question in discussion isn’t “should we switch out ECS”. In any case, Legion is in comparison very young and missing lots of features that would be required. This discussion is first and foremost about the decisions in Legion’s design, and in Specs’, and measuring what we could do better and where.

3 Likes