Mechanism to sub-dispatch group of systems

(Andrea Catania) #1

Continuing the discussion from Legion ECS Discussion:

This is a really interesting article, thanks!

Regarding dispatching, I noticed that a mechanism to sub-dispatch a group of systems (similar to the Batch in shred) hasn't been considered yet.

This feature is very useful for physics, and for any situation where you need a custom dispatching pipeline inside the main pipeline.

Check this for example: The Clocks Thread (Time Keeps On Ticking)


Regarding the transform component, one of the most useful things that is missing is the global transformation computed by the hierarchy system. Storing the matrix instead of the resulting Isometry is not a great solution, because it is useful only for rendering, not for physics or gameplay (I'm happy to provide some examples to explain this).

Dispatching the rendering

The current Amethyst-SPECS has a big limitation by design that I would like Legion to avoid.

The rendering system is dispatched in a secondary pipeline where everything runs in sequence, preventing any kind of parallelization with physics or with other systems that don't deal with rendering (audio? force feedback? other external things…).

I haven't yet looked at how Legion works, but if the rendering system is dispatched as in SPECS, I hope you will consider addressing this now, while Legion is still WIP.

Here is more info:

Legion ECS Discussion
(Alec Thilenius) #2

With respect to the Transform stuff, indeed an example would be very useful. Rigid-body physics in a hierarchy doesn't make much sense beyond compound collider construction, at which point (just like Unity) the Translation and Rotation components can be manipulated directly by physics solvers.

(Andrea Catania) #3

Let’s consider this scene, where each object of each color is a static rigid body.

The ground is the main entity, since if you move the ground you want to move everything attached to it. Then we have the following children: the Tree, the Road, and the Building. The Light Pole is a child of the Road, and the Roof is a child of the Building.

- Ground
    - Tree
    - Road
        - Light Pole
    - Building
        - Roof

The physics engine expects you to pass the global transform of each object, but since the TransformSystem converts the computed global transform into a matrix, retrieving this information is not easy. You either have to convert it back to an Isometry or recompute the transformation.

Another example: when a rigid body has an area as a child, each frame you have to resolve the transformation for both physics and rendering.

I think the TransformSystem should be responsible for resolving the global transform, and the RenderingSystem should convert the isometry to a Matrix before using it. By decoupling these two actions, you solve the problem.
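To make the decoupling concrete, here is a minimal self-contained sketch, assuming a plain (unit quaternion, translation) representation; the `Iso` type and `iso_to_matrix` function are illustrative names, not actual Amethyst/Legion API. The transform chain stays an isometry, and only the renderer builds a homogeneous matrix at the point of use.

```rust
// Illustrative sketch, NOT engine API: the transform chain keeps a
// rotation-plus-translation isometry; the renderer converts it to a
// column-major 4x4 matrix only when it needs one.

#[derive(Clone, Copy, Debug)]
struct Iso {
    rot: [f32; 4],   // unit quaternion (w, x, y, z)
    trans: [f32; 3], // translation
}

/// Expand an isometry into a column-major homogeneous matrix
/// (m[c] is column c), as a RenderingSystem would before drawing.
fn iso_to_matrix(iso: &Iso) -> [[f32; 4]; 4] {
    let [w, x, y, z] = iso.rot;
    let [tx, ty, tz] = iso.trans;
    [
        [1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y + w * z), 2.0 * (x * z - w * y), 0.0],
        [2.0 * (x * y - w * z), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z + w * x), 0.0],
        [2.0 * (x * z + w * y), 2.0 * (y * z - w * x), 1.0 - 2.0 * (x * x + y * y), 0.0],
        [tx, ty, tz, 1.0],
    ]
}

fn main() {
    // Identity rotation with translation (1, 2, 3): the matrix is the
    // identity with the translation in the right-most column.
    let iso = Iso { rot: [1.0, 0.0, 0.0, 0.0], trans: [1.0, 2.0, 3.0] };
    let m = iso_to_matrix(&iso);
    assert_eq!(m[0], [1.0, 0.0, 0.0, 0.0]);
    assert_eq!(m[3], [1.0, 2.0, 3.0, 1.0]);
}
```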

(Alec Thilenius) #4

Apologies if this sounds harsh, but that makes very little sense to me.

Firstly, a 'static rigid body' doesn't make logical sense, as a "rigid body" is defined as following kinematics. Perhaps you meant a static collider? The global matrix is already being computed and stored in the LocalToWorld component. Decomposing a matrix into a Similarity (which is what you want for physics, not an Isometry) is very easy: the translation is free to extract (it's just the right-most column), and the uniform scale and rotation quaternion are also easy to extract but require some division. Needing to do that is also rare; it would only be decomposed if the entity is non-static and was being used for inverse kinematics. Moreover, rigid bodies do not belong in hierarchies, nor does it make sense for them to have child transforms (see the link I posted). The exception, as I noted, would be to pre-bake all child colliders into a compound collider and re-compute the COM for the entire rigid body, which could be useful, and Unity does exactly this.
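The decomposition claim above can be sketched in a few lines. This is a hedged, self-contained illustration assuming a column-major TRS matrix with uniform scale; `decompose_trs` is a hypothetical helper name, not Amethyst/Legion API. Translation is read straight from the right-most column; scale is the length of a basis column; rotation needs the division.

```rust
// Illustrative sketch, NOT engine API: decompose a column-major 4x4 TRS
// matrix (uniform scale assumed, m[c] is column c) into
// (translation, uniform scale, pure-rotation 3x3 basis).

type Mat4 = [[f32; 4]; 4];

fn decompose_trs(m: &Mat4) -> ([f32; 3], f32, [[f32; 3]; 3]) {
    // Translation is free to extract: it is the right-most column.
    let t = [m[3][0], m[3][1], m[3][2]];
    // Uniform scale is the length of any basis column.
    let s = (m[0][0] * m[0][0] + m[0][1] * m[0][1] + m[0][2] * m[0][2]).sqrt();
    // Rotation requires some division: normalize the 3x3 basis columns by s.
    let mut r = [[0.0f32; 3]; 3];
    for c in 0..3 {
        for row in 0..3 {
            r[c][row] = m[c][row] / s;
        }
    }
    (t, s, r)
}

fn main() {
    // Compose a TRS matrix: 90-degree rotation around Z, scale 2,
    // translation (1, 2, 3); then recover all three parts.
    let (sin, cos) = (1.0f32, 0.0f32);
    let s = 2.0f32;
    let m: Mat4 = [
        [s * cos, s * sin, 0.0, 0.0],  // column 0: rotated + scaled X axis
        [-s * sin, s * cos, 0.0, 0.0], // column 1: rotated + scaled Y axis
        [0.0, 0.0, s, 0.0],            // column 2: scaled Z axis
        [1.0, 2.0, 3.0, 1.0],          // column 3: translation
    ];
    let (t, scale, r) = decompose_trs(&m);
    assert_eq!(t, [1.0, 2.0, 3.0]);
    assert!((scale - 2.0).abs() < 1e-6);
    // The recovered basis is unit-length: the rotated X axis points along +Y.
    assert!((r[0][1] - 1.0).abs() < 1e-6);
}
```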

Being able to organize things into hierarchies like your example, however, would be super useful for the developer just as an organizational tool. But that would be an editor feature, and no actual spatial transformations would be involved.

Again, don’t mean this to be rude, but your examples seem fundamentally incorrect to me. Then again I don’t write physics engines…

Also I fear this is straying way off topic.

(Andrea Catania) #5

Well, it would be nice to understand the difference between an Isometry and a Similarity in this context. However, a Static Rigid Body is a Static Collider, and again the differences are completely negligible here.

I'm talking about Static Rigid Bodies (not kinematic or dynamic) stacked one on top of the other, which can already be done (even without an editor) using the parent component, to compose the environment in the smartest way.

If I'm not mistaken, I said that the transformation chain is computed and then converted to a Matrix, and as I said:

You either have to convert it back to an Isometry or recompute the transformation.

so I know that it’s possible to extract the information from the Matrix, but:

retrieving this information is not easy

Nor is it free; it doesn't make sense to resolve the transform chain (using Isometries), convert the result to a Matrix, and then convert it back to an Isometry.

Rather, it is much better to compute the transform chain while leaving it as an Isometry, and let every other system convert it as needed.

Especially since you never know whether the global transform of an object will be needed for gameplay purposes, and in those cases you never want to deal with Matrix conversions. Even worse, the user will be confused by the fact that the local transform is stored as an Isometry while the global transform is stored as a Matrix. Where is the sense in that?

I never said that you have to put a Dynamic Rigid Body as a child of anything; I said you might want to put an Area as a child of a Dynamic Rigid Body. In the link you posted, they are talking about putting a Dynamic Rigid Body as a child of a Dynamic Rigid Body, which is another story entirely.

However, it's not clear to me whether your answer is meant to demonstrate that the Transform mechanism in Amethyst is already perfect as is. If so, I don't see a strong reason for that; could you explain your point?

(Alec Thilenius) #6

Ahh, you're talking about static in the sense of at-rest rather than always-stationary, I see. They are indeed kinematic/kinetic and dynamic though; they are just at (the physics sense of the word) static equilibrium.

I’ll leave this here because I don’t want to argue your first point any longer: “The dynamics of a rigid body system is described by the laws of kinematics and by the application of Newton’s second law (kinetics) or their derivative form Lagrangian mechanics.” - Wiki.

Rather, it is much better to compute the transform chain while leaving it as an Isometry, and let every other system convert it as needed.

The Similarity/Isometry is already left alone and not changed; any system can freely access it. It's stored in the Translation, Rotation, and Scale/NonUniformScale components, which together make up a Similarity/Isometry respectively.

I'm unsure if you're asking for transform components to be stored in world space (which I strongly disagree with), or if you're asking for a separate LocalToWorld component that stores an Isometry and is computed without first going through a matrix (which is far less efficient than just decomposing the current LocalToWorld), or if you're asking for something else. Rigid bodies do not belong in run-time spatial hierarchies. There is no such thing as a "dynamic" or "static" rigid body; they are all dynamic, otherwise they are not rigid bodies. Colliders, sure, those can be in a hierarchy. And like I said, for organization, having a hierarchy is useful, but it would be flattened at bake/start-time and all rigid bodies would be in world space.

Regardless, I’m going to politely tap out of this argument. I would recommend a fork or pull-request if you wish to see the changes you describe.

(Andrea Catania) #7

Well, I think we just call things by different names. For me, a Static Rigid Body is a stationary non-deformable body; a Dynamic Rigid Body is a simulated non-deformable body; a Kinematic Rigid Body is a non-deformable body moved by the user.

Why name them this way? Because we don't only have Rigid Bodies; we also have Soft Bodies, which are deformable. So for me the name "Rigid Body" refers only to the deformability of the body, not its mode.

I don't know why the wiki link is relevant, but ok.

What I'm proposing is: instead of computing the transform chain with matrices, do it with Isometries. Which means:

This doesn't make performance worse, because the currently performed actions are simply decoupled, that's it.

This has some advantages:

  1. The types of the transformation variables are the same.
  2. The Transform component uses less memory.

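A minimal sketch of what "compute the transform chain with Isometries" could look like, assuming a plain (unit quaternion, translation) pair; `Iso`, `quat_mul`, `rotate`, and `compose` are illustrative names, not engine API. Each global isometry is the parent's global composed with the child's local: one quaternion product plus one rotated translation, with no matrix anywhere in the loop.

```rust
// Illustrative sketch, NOT engine API: resolving a parent/child transform
// chain entirely in isometry form.
//   global.rot   = parent.rot * child.rot
//   global.trans = parent.trans + parent.rot.rotate(child.trans)

#[derive(Clone, Copy, Debug)]
struct Iso {
    rot: [f32; 4],   // unit quaternion (w, x, y, z)
    trans: [f32; 3], // translation
}

// Hamilton product of two quaternions.
fn quat_mul(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let [aw, ax, ay, az] = a;
    let [bw, bx, by, bz] = b;
    [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]
}

// Rotate a vector by a unit quaternion: v' = v + 2w(q x v) + 2 q x (q x v).
fn rotate(q: [f32; 4], v: [f32; 3]) -> [f32; 3] {
    let (w, qv) = (q[0], [q[1], q[2], q[3]]);
    let cross = |a: [f32; 3], b: [f32; 3]| {
        [
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        ]
    };
    let c1 = cross(qv, v);
    let c2 = cross(qv, c1);
    [
        v[0] + 2.0 * (w * c1[0] + c2[0]),
        v[1] + 2.0 * (w * c1[1] + c2[1]),
        v[2] + 2.0 * (w * c1[2] + c2[2]),
    ]
}

// Child's global isometry from the parent's global and the child's local.
fn compose(parent: &Iso, child: &Iso) -> Iso {
    let r = rotate(parent.rot, child.trans);
    Iso {
        rot: quat_mul(parent.rot, child.rot),
        trans: [
            parent.trans[0] + r[0],
            parent.trans[1] + r[1],
            parent.trans[2] + r[2],
        ],
    }
}

fn main() {
    // Ground at (1, 0, 0), Road child at local (0, 2, 0): with no rotation
    // on the Ground, the Road's global translation is (1, 2, 0).
    let ground = Iso { rot: [1.0, 0.0, 0.0, 0.0], trans: [1.0, 0.0, 0.0] };
    let road_local = Iso { rot: [1.0, 0.0, 0.0, 0.0], trans: [0.0, 2.0, 0.0] };
    let road_global = compose(&ground, &road_local);
    assert_eq!(road_global.trans, [1.0, 2.0, 0.0]);
    assert_eq!(road_global.rot, [1.0, 0.0, 0.0, 0.0]);
}
```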
(Alec Thilenius) #8

Not arguing terminology any more, sorry. Otherwise:

The original discussion this one was moved from was talking about the new transform system, not the old one. I didn’t author the old one and it has some fundamental flaws IMO.

This doesn't make performance worse

Yes it does…

#![feature(test)]
extern crate test;

use nalgebra::{Isometry3, Matrix4, Vector3};
use test::Bencher;

const COUNT: usize = 100_000;

#[bench]
fn matrix_multiply(bencher: &mut Bencher) {
    let a = vec![Matrix4::new_rotation(Vector3::new(1.0, 2.0, 3.0)); COUNT];
    let b = vec![Matrix4::new_rotation(Vector3::new(1.0, 2.0, 3.0)); COUNT];
    let mut result = vec![Matrix4::<f32>::identity(); COUNT];
    bencher.iter(|| {
        for i in 0..COUNT {
            result[i] = a[i] * b[i];
        }
    });
}

#[bench]
fn isometry_multiply(bencher: &mut Bencher) {
    let a = vec![Isometry3::rotation(Vector3::new(1.0, 2.0, 3.0)); COUNT];
    let b = vec![Isometry3::rotation(Vector3::new(1.0, 2.0, 3.0)); COUNT];
    let mut result = vec![Isometry3::<f32>::identity(); COUNT];
    bencher.iter(|| {
        for i in 0..COUNT {
            result[i] = a[i] * b[i];
        }
    });
}

running 2 tests
test isometry_multiply ... bench:   3,044,180 ns/iter (+/- 34,585)
test matrix_multiply   ... bench:   1,702,190 ns/iter (+/- 233,014)

(Andrea Catania) #9

Oh, this is surprising to see. I thought that an Isometry consumed less (or equal) computation power and less memory…

I wonder whether this is a matter of nalgebra optimization; however, the question is now inverted… why not always use a Matrix?

Unifying the local and global transformation types would be a benefit.

(Jaynus) #10

This is, as always, rooted in the issue that nalgebra is a mathematicians library and not a graphics library.

Many of the types and computations make sense from a mathematical perspective, but are cumbersome or antithetical to a generation of graphics-computation optimizations.


All of the “base” types in nalgebra, such as Isometry, are stored as deconstructions of their representation. When they are actually applied to anything, they are transformed first into a Matrix, and then computed with. So you basically have the overhead of creating a matrix every single time you do a Vector * Isometry. This type of behavior is evident all over nalgebra, and has been analyzed and expressed many times and will not be changed; we would have to move math libraries, or do it ourselves, if we ever want to optimize these cases (such as SIMD).

Unify the local and global transformation type

Why? There are many cases where you want to perform local or global transforms. If global were only for rendering, what would we do for bone transformations? We need both of those in that case. It is far more efficient to store many precomputed variants of values than to worry about memory consumption at all. In 2019, memory is cheap; our memory usage considerations should be about cache lines and SIMD goodness, not just "storing less stuff".

(Andrea Catania) #11

Unify the local and global transformation type

I mean unify the type, so instead of having:

local: Isometry<f32>
global: Matrix4<f32>

it would be better to have:

local: Matrix4<f32>
global: Matrix4<f32>

Well, you already said that both are useful, and for this reason a single type would make it easy to perform transformation operations between entities that are not directly connected.

But why not? It would make it easy, for example, to obtain a position from which to cast a ray from an entity's global position (a case where the global transform is useful even for gameplay). Etc…

I never said that we have to be concerned about memory.

(Gray Olson) #12

This is not true. If you look into the implementation of Mul<Point3> for Isometry3, you'll see that it actually ends up being implemented as essentially (self.quaternion.rotate_point(point)) + self.translate_vector, which should be more efficient than a full Mat4 * Point4 multiplication.
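That rotate-then-translate path can be sketched self-contained, assuming a unit quaternion in (w, x, y, z) order; `iso_mul_point` is an illustrative name, not the nalgebra implementation itself. The rotation uses the standard identity v' = v + 2w(q x v) + 2 q x (q x v), followed by a plain vector add; no 4x4 matrix is ever constructed.

```rust
// Illustrative sketch, NOT nalgebra's code: apply an isometry to a point as
// "rotate by the unit quaternion, then add the translation".

fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn iso_mul_point(rot: [f32; 4], trans: [f32; 3], p: [f32; 3]) -> [f32; 3] {
    // rot is a unit quaternion (w, x, y, z); p' = rotate(p) + trans.
    let (w, q) = (rot[0], [rot[1], rot[2], rot[3]]);
    let c1 = cross(q, p);
    let c2 = cross(q, c1);
    [
        p[0] + 2.0 * (w * c1[0] + c2[0]) + trans[0],
        p[1] + 2.0 * (w * c1[1] + c2[1]) + trans[1],
        p[2] + 2.0 * (w * c1[2] + c2[2]) + trans[2],
    ]
}

fn main() {
    // 90-degree rotation around Z: q = (sqrt(1/2), 0, 0, sqrt(1/2)),
    // so (1, 0, 0) rotates to (0, 1, 0); then translate by (10, 0, 0).
    let q = [
        std::f32::consts::FRAC_1_SQRT_2,
        0.0,
        0.0,
        std::f32::consts::FRAC_1_SQRT_2,
    ];
    let out = iso_mul_point(q, [10.0, 0.0, 0.0], [1.0, 0.0, 0.0]);
    assert!((out[0] - 10.0).abs() < 1e-5);
    assert!((out[1] - 1.0).abs() < 1e-5);
    assert!(out[2].abs() < 1e-5);
}
```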

(Gray Olson) #13

I re-ran this test using my lib ultraviolet and got different results:

#[bench]
fn matrix_multiply(bencher: &mut Bencher) {
    let a = vec![Mat4::from_euler_angles(1.0, 2.0, 3.0); COUNT];
    let b = vec![Mat4::from_euler_angles(1.0, 2.0, 3.0); COUNT];
    let mut result = vec![Mat4::identity(); COUNT];
    bencher.iter(|| {
        for i in 0..COUNT {
            result[i] = a[i] * b[i];
        }
    });
}

#[bench]
fn isometry_multiply(bencher: &mut Bencher) {
    let a = vec![
        (
            Rotor3::from_euler_angles(1.0, 2.0, 3.0),
            Vec3::new(1.0, 2.0, 3.0)
        );
        COUNT
    ];
    let b = vec![
        (
            Rotor3::from_euler_angles(1.0, 2.0, 3.0),
            Vec3::new(1.0, 2.0, 3.0)
        );
        COUNT
    ];
    let mut result = vec![(Rotor3::identity(), Vec3::zero()); COUNT];
    bencher.iter(|| {
        for i in 0..COUNT {
            result[i].0 = a[i].0 * b[i].0;
            result[i].1 = a[i].1 + b[i].1;
        }
    });
}

test isometry_multiply      ... bench:  10,513,130 ns/iter (+/- 842,214)
test matrix_multiply        ... bench:  19,973,670 ns/iter (+/- 3,205,758)

(Alec Thilenius) #14

Wooo, really impressive benchmarks for ultraviolet! And really sad metrics for nalgebra :slightly_frowning_face: When can we switch? :stuck_out_tongue: I was also mulling over the idea of vectorization (which Unity is crushing right now), so I'm curious whether ultraviolet helps along that path?

Anyway, I'm not a math person and will very happily hand over transform stuff to someone who is. I'm struggling to understand something and would very much appreciate some help: if you were composing a hierarchy chain of Isometries, could you compute that chain by just multiplying the rotors and adding the un-rotated translations (which seems to be what your benchmark does, as far as I can tell)? Also, what happens to those numbers when you switch to a Similarity or Affine?

(Gray Olson) #15

Tbh I'm a bit surprised here; I don't see why na would perform so poorly in this respect.

I think this will be discussed more in the thread Future of nalgebra and math in Amethyst, but yes, ultraviolet would help significantly in vectorizing things, as that is what ultraviolet is designed to do well, more so than regular scalar operations (though it aims to do those well too).

Ah, brain fart… re-did it with the translations rotated properly, i.e.

    bencher.iter(|| {
        for i in 0..COUNT {
            result[i].0 = a[i].0 * b[i].0;
            result[i].1 = a[i].1 + a[i].0 * b[i].1;
        }
    });

and it’s still faster, though less so.

test isometry_multiply ... bench:  15,727,250 ns/iter (+/- 1,340,198)
test matrix_multiply   ... bench:  19,936,750 ns/iter (+/- 1,926,339)

I also made a test for isometry * vector vs mat4 * vector, which resulted in

test isometry_multiply_vector       ... bench:  10,219,610 ns/iter (+/- 867,558)
test matrix_multiply_vector         ... bench:  11,507,740 ns/iter (+/- 2,888,986)

Unfortunately Rotor3(/Quaternion) * Vector3 is less efficient than Mat3 * Vector3, but Rotor * Rotor is faster than Mat3 * Mat3, so there’s not necessarily a win-win… Let’s see what the benches say (isometrymatrix is an isometry but using Mat3 instead of Rotor3)

test isometry_multiply              ... bench:  16,176,850 ns/iter (+/- 1,272,929)
test isometrymatrix_multiply        ... bench:  20,541,570 ns/iter (+/- 1,202,180)
test matrix_multiply                ... bench:  20,032,180 ns/iter (+/- 1,814,484)
test isometry_multiply_vector       ... bench:  10,219,610 ns/iter (+/- 867,558)
test isometrymatrix_multiply_vector ... bench:   9,239,120 ns/iter (+/- 2,497,937)
test matrix_multiply_vector         ... bench:  11,507,740 ns/iter (+/- 2,888,986)

This made less of a difference than I was expecting, one possible reason being that these benchmarks are already memory-bound rather than compute-bound, and using a matrix would exacerbate the memory-read difference.

(Alec Thilenius) #16

I would throw in my support for switching from nalgebra, especially to something performance-focused from the start like ultraviolet. NA seems awesome for mathematicians and physicists (assuming they ever switch off Python :wink:) but less so for games.

Unfortunately the isometry benchmarks still seem academic, as plenty of transforms will be a Similarity (or worse, an Affine). So we would have mixed target types for the LocalToParent and LocalToWorld components. If they were stored as an enum, the cache-locality benefit goes out the window (plus a now-mandatory branch every time you access them). If you store them as separate types, then everyone needs to individually query all of them. The hierarchy system would also have to add/remove components to accommodate membership in an Isometry/Similarity/Affine hierarchy, so lots of memcpy. The only logical target is a homogeneous mat4x4, but I would very much love to be proven wrong.