Legion Transform Design Discussion

(Joël Lupien) #1

Hello!

Since we are in the middle of the port to legion, we need to have a good idea of the direction in which legion_transform should go. Porting code to the new legion_transform and then changing it to something else completely would be a waste of effort.

Here are the two main options we are considering going forward:

  • Having multiple small components: Position, Rotation, Scale, GlobalTransform
  • Having a single big component: Transform (what we have in amethyst currently)
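
To make the two options concrete, here is a rough sketch of the component shapes (nalgebra types assumed; the exact fields are illustrative, not a final API):

use nalgebra::{Matrix4, UnitQuaternion, Vector3};

// Option 1: several small components, attached to an entity independently.
struct Position(Vector3<f32>);
struct Rotation(UnitQuaternion<f32>);
struct Scale(f32);
struct GlobalTransform(Matrix4<f32>);

// Option 2: one big component, roughly the shape of amethyst's current Transform.
struct Transform {
    position: Vector3<f32>,
    rotation: UnitQuaternion<f32>,
    scale: Vector3<f32>,
    global_matrix: Matrix4<f32>,
}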

Let’s have a discussion!

1 Like
(Thomas Gillen) #3

The motivation behind splitting the transform into separate position, rotation and scale components is mostly performance - the vast majority of code which needs position doesn’t care about rotation and scale, etc. So you go from reading 12 bytes for a vec3 position component to a minimum of 32 bytes (vec3 pos, quaternion rotation, f32 uniform scale) or 64 bytes if you are using a 4x4 matrix. The latter case is an entire cache line per entity when you only actually care about less than a quarter of it most of the time.
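
To put rough numbers on that (plain f32 fields so there are no padding surprises; this is only meant to illustrate the cache-line point):

// 12 bytes: just a position.
struct Position([f32; 3]);

// 32 bytes: position + quaternion rotation + uniform scale.
struct Decomposed {
    position: [f32; 3],
    rotation: [f32; 4],
    scale: f32,
}

// 64 bytes: a full 4x4 matrix - an entire cache line per entity.
struct Matrix([[f32; 4]; 4]);

fn main() {
    assert_eq!(std::mem::size_of::<Position>(), 12);
    assert_eq!(std::mem::size_of::<Decomposed>(), 32);
    assert_eq!(std::mem::size_of::<Matrix>(), 64);
}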

Unity have had two main issues with their transform system.

The first is deciding when to calculate the full local-to-world transformation matrix. This is mostly only used by rendering, so putting it at the end of the update stage of the loop and before rendering is typically where it goes. Even so, sometimes you do want this information earlier in the frame. Putting components together into a larger transform component, however, doesn’t really help this issue at all. Even if you store it internally as a mat4x4 (rather than more compact decomposed fields inside the component), this transform will still usually be the local transform not the global transform. You still need to calculate the global transform at some point, and that data won’t be available and up to date within the frame until that calculation is performed.

Which brings us to the larger issue that Unity have had, and that is how to handle hierarchy transformation updates. Prior to their ECS, in the Unity engine, you could write to a gameobject’s transform and the engine would internally immediately update both the gameobject’s local and world positions, and that of the entire child hierarchy below it. This was only possible because Unity did not allow you to write to this data from anything but the main thread, so there were no shared/mutable data access issues. The write to the local position property could freely jump all over the place in memory modifying properties on other entities.

When they introduced the ECS, however, that also involved a move to a much stricter memory access model which can be far more robustly multi-threaded. Ultimately, this is the source of the difficulty here. You cannot have a write to a component on one entity also immediately perform what is essentially a scattered random write into components in other entities, due to restrictions/assumptions that the threading model imposes to allow the code to be multithreaded.

Combining position/rotation/scale into a single larger component does nothing to solve this issue.

5 Likes
(Kae) #4

I think the main problem with any sort of transform system in legion’s concurrency model is that you don’t get exclusive access to all transform components just because you have access to one… so how do you do point lookups to update parent/child relationships from within a system? Feels like you need &mut World for that. Which is not terrible, honestly - most heavy systems probably just need Local2World.

1 Like
(Thomas Gillen) #5

It is possible to schedule a legion system with declared access to all components of a given type. However, there is no easy way to intercept a write into a component and have it then update the rest of the hierarchy (and whatever mechanism we came up with for doing that would also have to disallow direct &mut [T] access to the components).

It is also worth noting that even before moving to their ECS, Unity had been moving away from their immediate hierarchy model for a few years. They started with the option of deferring the update to a certain point in the frame, and then (in one of the 2019 releases, iirc) they made that behaviour the default. They did this because the immediate hierarchy updates were one of the most common sources of poor performance in Unity games, and deferring the updates and doing them all together can be a fair bit more efficient.

However the biggest piece of performance advice they used to give on transforms was to put as many gameobjects as possible at the root, and to keep the hierarchy as shallow as possible. That is why the gist from their CTO about proposed changes to their ECS transform code spends so much time detailing how their automated hierarchy flattening should work (going from offline scene in the editor to runtime scene).

That gist also tries to solve some of the issues people have been having with their new transform code, although only partially and much of what is described there doesn’t make a lot of sense.

They propose in there that entities should have decomposed local and global transformation components (separate translation, rotation and scale, and separate local and global). However, those components are “hidden” from the user. Instead, the user interacts with them through a TransformAspect component which internally holds pointers to the actual data components and presents an API for manipulating them. It looks like the motivation here is that they want to maintain the densely packed data layout of having separate components, but with some of the convenience of having them in one component; namely that users only need to think about one “transform” component and that adjustments to an entity’s local position can immediately update its global position (and vice versa). However, hierarchy updates are still deferred to a later system update, so children of that entity will not move immediately.

There are quite a few things which don’t look right about this. The most obvious is that a component holding pointers to other components causes quite a few problems (breaking Unity’s own rules being one), such as it now being impossible to move components around in memory. It also means that you need to read this struct (which is a similar size to the actual full transform data all by itself) to read the pointer you need to then access the position… which seems to somewhat defeat the purpose of storing those decomposed fields in their own components in the first place. There are also mentions in there of their compiler optimising out the dynamic branching that would otherwise be needed (and even entire fields from the struct) based upon what data the entity actually needs. I have no idea how that would work, as it would be asking the compiler to essentially generate separate types for each variant, which you can’t then just pretend are all the same component type when you store them (you can’t have an array of types that are different sizes, you would need a vtable on function calls to know which code to run, etc.).

2 Likes
#6

In terms of many-small vs single-big components, I like that many-small offers better performance and granularity for a very low usability tradeoff. However, if a GlobalTransform component must be manually inserted, could it be accidentally forgotten?

The refactor could also hopefully give us a chance to address this: Future of nalgebra and math in Amethyst.

The nalgebra types are very unfriendly and confusing, and this seems like a good opportunity to take a look at it.

The replacement crate that was suggested in the post is this one: https://github.com/termhn/ultraviolet.

2 Likes
#7

In https://github.com/amethyst/legion_transform we use LocalToWorld to represent the global transformation. We can have systems that automatically insert it (and LocalToParent) if any of Position, Rotation, or Scale are present.

It’s true that replacing nalgebra now would be a good opportunity, but this is getting to the point where a migration guide from 0.15 (specs) to 0.16 (legion) is becoming impossible. I don’t have any big projects on amethyst, so I would be all in for a replacement, but this decision involves a lot of people.

3 Likes
#8

Do you think it’d be possible to use the deferred hierarchy update strategy in a flexible way?

Let’s say my game has a hierarchy of a player holding a shield, would it be possible to:

  1. [Run some systems] Move the player
  2. [First call to hierarchy update system] Call the hierarchy update system to move the shield
  3. [Run some systems] Calculate collisions and adjust player position because the shield hit a wall
  4. [Second call to hierarchy update system] Call the hierarchy update system to move the shield again

In this way, the developer could add as many calls to this as their game demands, leaving the choice of optimization vs flexibility in their hands?

#9

You can already do this unconditionally by adding the same system multiple times.
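
As a sketch of what that might look like (the system names are hypothetical, and the exact schedule-building API depends on the legion/amethyst version in use):

// Hypothetical frame schedule: the hierarchy update system is simply added twice,
// once after movement and once after collision response.
let schedule = Schedule::builder()
    .add_system(move_player_system())        // hypothetical
    .add_system(hierarchy_update_system())   // first propagation: the shield follows the player
    .add_system(collision_response_system()) // hypothetical: may move the player again
    .add_system(hierarchy_update_system())   // second propagation: the shield catches up
    .build();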

Conditional execution would add a lot of complexity to the scheduler, and its usefulness is questionable. However, I can see that this would also be useful for implementing pausable systems. The current legion scheduler (amethyst version) is still not finalized and is open for discussion, but probably not in this thread.

1 Like
#10

Sorry, I meant having those systems run every frame; the flexibility part was about being able to run the system in multiple places. If it’s possible to add the same system multiple times, then I think we have a good workaround for the immediate hierarchy update model.

To answer the original question, I cast my vote for many small components - I think the performance / granularity gains outweigh the (convenience?) of having one large component.

(Duncan) #11

It’s true that replacing nalgebra now would be a good opportunity, but this is getting to the point where a migration guide from 0.15 (specs) to 0.16 (legion) is becoming impossible. I don’t have any big projects on amethyst, so I would be all in for a replacement, but this decision involves a lot of people.

One data point here. I have a large-ish amethyst project that makes heavy use of ncollide3d (which depends on nalgebra). I would definitely prefer to do the legion transition separately, as I expect things to break, and I don’t want to be debugging the ECS and the math library at the same time.

We should weigh the tradeoffs for deferring the switch to a new math library, since I suspect it wouldn’t be so bad to change how the amethyst transforms work while sticking with nalgebra. What are the benefits of switching to a new math library at the same time?

7 Likes
#12

I think the idea is that, if we’re going to change the way Transform works, we should do it once instead of twice.

This also raises the question of whether the legion port is the right moment to change the way Transform works? Maybe it’d be better to rework Transform and possibly change the math library together in a separate update to avoid breaking too many things at once?

(Duncan) #13

Maybe I just don’t understand the full ramifications of switching the math library as it pertains to Transform. But I suspect you could change how Transform works w.r.t legion as one step. Then in a separate step, replace the nalgebra types with those of a new math library. To me, those two smaller steps seem better than one big step.

(Nathan) #14

I propose that talk of a math library transition be tabled for now. I believe opening a debate on that topic at the current moment could be harmful to the morale of the already-stretched-thin effort to ship the stop-the-world transition to legion. The math library topic could potentially be revisited later as an experiment, as per my rationale in the middle of this comment.


I would like to hear more reasons for/against having transforms as multiple-small vs single-big components. I am not an expert in this area, so I don’t have any arguments to provide myself.

I do have a strong opinion on what I want to see: usability, simplicity, straightforwardness, and understandability. I don’t think max theoretical performance is valuable until we have real-world use cases that are bottlenecked on it. If the single-big component strategy is more usable, simple, straightforward, and/or understandable, I vote for that – but once again: can folks with more knowledge please weigh in on more of the trade-offs between the two options? There’s been a fair amount of talk about issues which neither option solves, so what issues do the two options solve? Let’s make a decision and move forward so we don’t block the legion effort.

4 Likes
(Joël Lupien) #15

Changing the math library: Not happening. At all. Ever. Thank you x)

Small VS Big:
Having a single big component is much much easier for both the end users and the engine developers.
As for the question of performance, I have a benchmark I made around a year ago using specs, and the performance of a single big component was generally the same as or better than that of the small components. I assume that legion could make the small components more performant, but to me the performance gain will never be worth the loss in usability.

Also, we had a separation between Transform and GlobalTransform before, and we merged them over weeks of effort. Separating them into 5(?) different components sounds like something I would absolutely despise using (because it was already hell with two components).

If we go with the big component approach, we can copy paste amethyst’s Transform struct into legion_transform and update the code to make everything work.
If we go with the small component approach, we need to update amethyst’s code to work with the new legion_transform structure.

How do we want to go forward?

2 Likes
#16

First, changes need to be kept reasonably scoped. Legion needs to be integrated without other changes before considering other refactors. The current transform system/component should be dropped into legion in the most straightforward way possible before considering changing it.

Transform Representation

Second, I went through this exact design decision last week on my own engine. I went with combining position/rotation/scale into a single component and it made for much cleaner code in many places. In a non-trivial game world (>100 objects), most of them will be stationary. In huge worlds, almost all of them will be stationary.

I believe most commercial engines DO NOT separate position/rotation/scale. To repeat, I think step 1 should be to drop the current transform component into legion as-is.

For future design, I recommend a separate design-time and run-time representation (TransformComponent?)

  • TransformDef at design time, with a Vec3 position, Vec3 rotation (in Euler angles), float uniform scale, and Vec3 non-uniform scale. Support for other design-time controls, like the order in which to apply rotations, could be added as well. Blender, Unity, and UE4 UX could be referenced here to settle on a design-time representation.
  • For runtime I suggest a 4x4 matrix. Position is cheap to update. Rotation is slightly more expensive, but most things don’t move.

In my own engine, I do this baking step (TransformDef -> Transform) at runtime when inserting objects into the simulation, but it could happen offline once atelier-assets has support for processing assets. This is already supported in legion and is described in more detail here: Atelier/Legion Integration Demo
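
A rough sketch of what such a baking step might look like (nalgebra math; TransformDef mirrors the description above and the bake function itself is hypothetical):

use nalgebra::{Matrix4, Translation3, UnitQuaternion, Vector3};

// Design-time representation: human-friendly, editable fields.
struct TransformDef {
    position: Vector3<f32>,
    rotation_euler: Vector3<f32>, // roll, pitch, yaw in radians
    uniform_scale: f32,
    non_uniform_scale: Vector3<f32>,
}

// Bake the design-time data into the runtime 4x4 matrix.
fn bake(def: &TransformDef) -> Matrix4<f32> {
    let translation =
        Translation3::new(def.position.x, def.position.y, def.position.z).to_homogeneous();
    let rotation = UnitQuaternion::from_euler_angles(
        def.rotation_euler.x,
        def.rotation_euler.y,
        def.rotation_euler.z,
    )
    .to_homogeneous();
    let scale =
        Matrix4::new_nonuniform_scaling(&(def.non_uniform_scale * def.uniform_scale));
    translation * rotation * scale
}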

Hierarchies

For hierarchies, again consider:

  • Most things do not move. If something doesn’t move, hierarchy can be baked out.
  • Even among things that move, most things won’t be in hierarchies.

I think the goal here is pay-for-what-you-use behavior that has predictable, non-spikey performance. Further, if the primitives in the engine are simple, end users can build on top of them to make solutions that are better optimized for their particular game. Given this, I would favor simplicity over performance.

The plan I was going to take for my engine (I haven’t actually tried this yet!) is to embed hierarchy information in the root entity of that hierarchy as a component (HierarchyRootComponent?). I would also add a local transform matrix and a reference to the root to the TransformComponent. There would be two separate codepaths, sketched after this list, and it would be up to the end-user to call the correct one:

  • Fast path: Directly set values on TransformComponent. As long as the entity doesn’t have children, it is possible to update the local transform and world transform without accessing any other data. Using the fast path when children are attached will invalidate their state.
  • Slow path: Fetch tree data from the HierarchyRootComponent. Cached data structures can be kept there to quickly find children to ensure that they are also updated.
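
A very rough sketch of the data layout described above (all names come from this post’s proposal and are purely hypothetical; the actual child bookkeeping is omitted):

use nalgebra::Matrix4;

type Entity = u64; // stand-in for legion's entity handle in this sketch

// On every entity that has a transform.
struct TransformComponent {
    world: Matrix4<f32>,  // world transform, read by rendering etc.
    local: Matrix4<f32>,  // local transform relative to the hierarchy root
    root: Option<Entity>, // reference to the hierarchy root, if any
}

// Only on the root entity of a hierarchy.
struct HierarchyRootComponent {
    children: Vec<Entity>, // cached tree data used by the slow path to find children
}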

Another thought: Hierarchies could be considered an “animation” problem (i.e. attachment to bones) rather than a “transform” problem. It might be possible to treat hierarchies/attachments separately from transforms.

I think this approach (or any other raised for consideration) needs to be dogfooded before committing to it. I also want to reiterate that the legion switch needs to happen before adding anything like this.

5 Likes
#17

I’ve been reading through some of this as I am potentially going to use legion_transform for my own project. Some basic thoughts I had:

1 Like
(Zicklag) #18

For any math library discussion we can use the previously opened topic:

3 Likes
#19

After some thinking, I realised that we don’t have to pick either of those options. As it stands now in legion_transform, the Position, Rotation and Scale components all basically get converted into LocalToParent or LocalToWorld (depending on hierarchy), which are 4x4 matrices, and you can still perform all the operations on that single component if you wish. Separate pos, rot, scale components are more like additional functionality on top of what we had before.

Previously, the two LocalToParent and LocalToWorld components were combined into one Transform, but in reality only the local part is ever modified. So we could just rename LocalToParent to Transform if that’s easier to understand.
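
For reference, that conversion essentially boils down to composing the decomposed components into one homogeneous matrix, roughly like this (nalgebra math; the function is only a sketch, and components missing on an entity are treated as identity):

use nalgebra::{Matrix4, Translation3, UnitQuaternion};

// What LocalToWorld (or LocalToParent) works out to for an entity that has
// position, rotation and uniform scale components attached.
fn compose(
    position: &Translation3<f32>,
    rotation: &UnitQuaternion<f32>,
    scale: f32,
) -> Matrix4<f32> {
    position.to_homogeneous() * rotation.to_homogeneous() * Matrix4::new_scaling(scale)
}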

#20

When I first saw the model of using optional, opt-in components to extend transform handling, it seemed very attractive. It appeared to isolate complexity well and avoid the engine having to pick a particular model for transforms.

Unfortunately, in a very subtle way, practically I don’t think this is the case. As soon as you have upstream components feeding data into the Mat4x4 transform component, you have to figure out when you’re going to flush upstream changes into it. Only do it at particular point(s) in the frame? Somehow detect writes to upstream components and reads of downstream components and dynamically insert a flush? Push the requirement onto the end user to trigger a flush?

As soon as someone wants to share code with someone else which might be relying on the details of how deferred updates is solved, they have to deal with the complexity too. It’s really a tricky call.

  • If amethyst goes with a simple solution (single transform component), end users can pick their own solution for feeding the Mat4x4 transform component (including using it directly and not doing any deferred flushes). If they do choose to have upstream components and defer flushing the Mat4x4 transform, solving this for a particular game is much easier than solving it at an engine level in a way that makes everyone happy. However, as mentioned, it may be more difficult to share code between people/projects that rely on this extra layer of behavior.
  • If amethyst does provide its own solution for upstream components flushing to the Mat4x4 transform, even if it’s optional, I think it’s likely some complexity of it (deferred updates mostly) will leak out. There may be a good solution here, but it may take significant iteration and time to find it.

I ended up picking the simple approach for my engine because after reviewing prior art, I found it is a widely used approach and carries the least risk. (I’d rather spend my “risk” budget on other things.) However, I think both approaches are possible solutions and have merit.

(Thomas Gillen) #21

There are really two design issues being debated here, which are separate but not entirely independent. The first is how to represent transforms, and the second is how to handle transform hierarchies. For both of these, we need to consider our priorities between simplicity, flexibility and performance.

For transform representation, I can see three options: decomposed fields (position, rotation, scale) stored either in one component (option 1) or as three separate components (option 2), or alternatively the transform stored as a matrix (option 3).

There are pros and cons to each of these options.

Option 1 (position+rotation+scale in one component)
Pros:

  • Easy & and &mut access to each field.
  • Ideal data representation for each field, e.g. a quaternion for rotation.
  • Only one component for users to manipulate.

Cons:

  • Requires conversion into matrix at some point. Separate matrix component will be out of date until (and potentially after) then.
  • Transformation operations (e.g. “where is this position/direction relative to the entity?”) require expensive conversion into a matrix at each use-site.

Option 2 (position+rotation+scale as three components)
Same as option 1, except:

Pros:

  • Fastest when only accessing one field (e.g. only position).

Cons:

  • Users need to declare access to each field when they need more than one.

Option 3 (matrix)
Pros:

  • Easy access to basis vectors (up, forward, side).
  • Transformation operations relatively cheap and always available.
  • Rendering transform never out of date.
  • Option that requires the engine make the fewest assumptions or prescriptions.

Cons:

  • Most expensive to access in general.
  • Reading and writing rotation is expensive.
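
As a small illustration of the basis-vector point under the pros above (a sketch only; which column maps to “forward” depends on the engine’s conventions):

use nalgebra::{Matrix4, Vector3};

// The basis vectors of a matrix transform are just its first three columns.
fn basis(m: &Matrix4<f32>) -> (Vector3<f32>, Vector3<f32>, Vector3<f32>) {
    let col = |i: usize| Vector3::new(m[(0, i)], m[(1, i)], m[(2, i)]);
    (col(0), col(1), col(2)) // side, up, forward (convention-dependent)
}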

However, I don’t think the choice between the options above is all that important. The real challenges come when we try to add transform hierarchies. To try to lay out the problem:

  • Entities have a transform which describes their position in global space.
  • Some entities also have a local transform which describes their position relative to another “parent” entity.
  • Local and global transforms are two representations of the same data (the entity exists in one location), so modifying one should logically also modify the other.
  • Doing the above requires mut access to both transforms at the same time any time one of them is modified, and also either access to the parent transform or additional matrix calculations to derive it from the difference between the existing local and global transforms.
  • Moving a parent also moves all of its children (recursively) - which we cannot access without violating the ECS threading model (and generally rust’s borrowing rules).

The trouble comes from the fact that moving entities which exist in a hierarchy requires writing to components across multiple entities in order to keep the world state consistent. The ECS does not allow this, for a variety of reasons. I can only really see two options here:

Option 1: We do not allow mutation of transforms or the hierarchy (adding or removing a child) via mutation of components at all. The transform component is read-only. All modification of any transform must happen on the main thread via functions which require &mut World and which will immediately perform all updates to the entity’s local and global transforms, and to all of its children.

Option 2: We make an entity’s global transform read-only. The local transform is the canonical transform (entities without a parent are relative to the origin). Global transforms are updated by the transformation system, which the user explicitly schedules at certain points in their frame. We provide functions on the local transform which handle the math needed to make global-space adjustments.
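
For example, such a helper might look roughly like this (a sketch only; it assumes the parent’s world matrix is cached somewhere accessible, and uses 4x4 matrices for simplicity):

use nalgebra::{Matrix4, Vector3};

// Apply a world-space translation to a local transform by converting the
// world-space delta into the parent's space first (affine matrices assumed).
fn translate_in_world_space(
    local: &mut Matrix4<f32>,
    parent_world: &Matrix4<f32>,
    world_delta: &Vector3<f32>,
) {
    if let Some(parent_inv) = parent_world.try_inverse() {
        let local_delta = parent_inv.transform_vector(world_delta);
        // Add the converted delta to the translation column of the local matrix.
        local[(0, 3)] += local_delta.x;
        local[(1, 3)] += local_delta.y;
        local[(2, 3)] += local_delta.z;
    }
}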

I think both of these are somewhat painful, but option 1 especially so.


From a pure usability perspective, I think the best option would be to have two components:

use legion::prelude::Entity; // legion's entity handle (import path may differ by version)
use nalgebra::Matrix3x4;

struct Parent {
    parent: Entity,
}

struct Transform {
    local: Matrix3x4<f32>,
    global: Matrix3x4<f32>, // only updated when the transform system runs
    parent: Matrix3x4<f32>, // identity for root entities
}

This Transform struct would provide getters and setters to interpret both the local and global transform as decomposed fields. It would allow adjustments to the local transform (and would provide functions which can make those adjustments in world-space via its knowledge of the parent transform), but the global transform won’t change until the transform system runs and updates them - by transforming child transforms by their parent transforms, and by copying root entity local transforms into their global transforms (with component changed filters to skip most unmoved entities).

This is pretty bad from a performance perspective, however. I have for some time been thinking about ways in which legion might be able to decompose structs like this into internal “sub-components” which would make this quite a performant layout (if you split the matrices into a vec3 position and mat3x3 rotation/scale), mostly in the context of simd-friendly data layouts… but I’m still not sure if it will be possible, whether or not it might require nightly rust, and if it ever does arrive it won’t be for quite some time.

6 Likes