[Discussion] EventChannel Performance Pitfalls

(Jaynus) #1

This is a separate discussion thread opened in tandem with https://github.com/amethyst/amethyst/issues/1898 to discuss why EventChannel is currently bad.

I’ve copied the details from that ticket here.

Feature

We need to implement a new, immediate-mode eventing system so that performance-critical events can propagate within a single frame.

Possible Solutions Are

  • [ ] Implement a new same-frame eventing mechanism
  • [ ] Refactor usage of EventChannel across all systems to mitigate delays
  • [ ] Any other suggestions?

Reason

EventChannel from shrev currently has major design implications and performance pitfalls which were not considered in the initial design and usage. The system itself inherently introduces a single frame (16ms) delay on any event which hits the channel, as it is guaranteed to not be read until the next frame. Because of this, its usage across the engine is incurring 1-3 frame delays as certain critical events propagate across the engine.

For example, the Input system chain currently looks like this:
winit EventChannel (1 frame delay) -> InputSystem event channel (1 frame delay) -> Renderer pre-render frames (up to 3 frames delay!) -> Finally, stuff is updated (maybe another frame delay).

This is giving us, at a minimum, a 32ms delay for input to actually register. In practice, we are seeing 3-5 frame delays in input, adding up to 100ms input delays. This is unacceptable, and is just the best example of why this design is inherently flawed for its current use in our engine. It is still applicable as a general purpose event channel, but for many critical uses it needs to be phased out.
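To make the per-hop delay concrete, here is a toy model of the first two hops of that chain. It is only a sketch: plain `VecDeque`s stand in for the channels (this is not shrev's API), the `Stage` enum and `frames_until_applied` are made-up names, and it assumes the worst case where each consumer happens to run before its producer within a frame.

```rust
use std::collections::VecDeque;

/// Hypothetical stages of the input chain; the names are illustrative only.
enum Stage {
    Window,   // writes the raw event (frame 0 only)
    Input,    // turns raw events into action events
    Gameplay, // consumes action events
}

/// Runs the stages in `order` once per frame and returns the frame in which
/// gameplay first sees the input that was written in frame 0.
fn frames_until_applied(order: [Stage; 3], max_frames: u32) -> Option<u32> {
    let mut raw: VecDeque<&str> = VecDeque::new();
    let mut actions: VecDeque<&str> = VecDeque::new();

    for frame in 0..max_frames {
        for stage in order.iter() {
            match stage {
                Stage::Window => {
                    if frame == 0 {
                        raw.push_back("key_down");
                    }
                }
                Stage::Input => {
                    // Forward everything that is visible right now.
                    while let Some(event) = raw.pop_front() {
                        actions.push_back(event);
                    }
                }
                Stage::Gameplay => {
                    if actions.pop_front().is_some() {
                        return Some(frame);
                    }
                }
            }
        }
    }
    None
}
```

With the order `[Gameplay, Input, Window]` the input written in frame 0 is applied in frame 2, which at 16ms per frame is the 32ms minimum mentioned above. With `[Window, Input, Gameplay]` it is applied in frame 0.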

(Kae) #2

Rough proposal: Use a scheduler-aware channel implementation that, when run in a System, buffers events in a thread-local chunk-oriented scratch buffer. When the System finishes execution, let the scheduler thread flush the System’s buffer into the globally visible channel before running dependent jobs, preferably by simply transferring ownership of the buffer chunks. This should avoid contention entirely.

Challenges:

  • par_join makes it possible to interact with an EventChannel bound to a system from other threads
  • Needs to handle being used outside of a System context, probably.
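
A minimal single-threaded sketch of the buffer-then-flush idea, to show what "transferring ownership of the buffer chunks" could look like. `ScratchBuffer`, `SharedChannel` and `flush_from` are hypothetical names, not part of shrev or specs, and the sketch ignores both challenges listed above (par_join and use outside a System).

```rust
/// Hypothetical per-System scratch buffer: events go into fixed-capacity
/// chunks that only the owning System touches while it runs.
struct ScratchBuffer<T> {
    chunks: Vec<Vec<T>>,
    chunk_capacity: usize,
}

impl<T> ScratchBuffer<T> {
    fn new(chunk_capacity: usize) -> Self {
        assert!(chunk_capacity > 0, "chunks must hold at least one event");
        ScratchBuffer { chunks: Vec::new(), chunk_capacity }
    }

    fn push(&mut self, event: T) {
        // Start a new chunk when there is none yet or the last one is full.
        let needs_chunk = self
            .chunks
            .last()
            .map_or(true, |chunk| chunk.len() == self.chunk_capacity);
        if needs_chunk {
            self.chunks.push(Vec::with_capacity(self.chunk_capacity));
        }
        self.chunks.last_mut().unwrap().push(event);
    }
}

/// Hypothetical globally visible channel that dependent jobs read from.
struct SharedChannel<T> {
    chunks: Vec<Vec<T>>,
}

impl<T> SharedChannel<T> {
    fn new() -> Self {
        SharedChannel { chunks: Vec::new() }
    }

    /// Called by the scheduler after the System finishes. Whole chunks are
    /// moved over; the events themselves are not copied one by one.
    fn flush_from(&mut self, scratch: &mut ScratchBuffer<T>) {
        self.chunks.append(&mut scratch.chunks);
    }

    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.chunks.iter().flatten()
    }
}
```

Because the System writes only to its own buffer and the flush happens on the scheduler side between jobs, nothing here needs a lock in the single-writer case.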
(Jasper) #3

When things really are performance critical, how does a regular channel compare? There is no guarantee that the sender and receiver will be on the same thread, so why not just skip specs, shred, and anything else in the middle entirely and use a time-tested and proven method?

Reference: Crossbeam Benchmarks
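
For comparison, a sketch of the plain-channel approach. It uses `std::sync::mpsc` as a stand-in for crossbeam so it needs no dependencies; `same_frame_roundtrip` is a made-up helper, and joining the producer before draining is just a simple way to model "the producer finished earlier in the same frame".

```rust
use std::sync::mpsc;
use std::thread;

/// Sends `inputs` from another thread, then drains everything that has
/// arrived without blocking, as a consumer would later in the same frame.
fn same_frame_roundtrip(inputs: Vec<u32>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for input in inputs {
            tx.send(input).expect("receiver should still be alive");
        }
        // `tx` is dropped here, which disconnects the channel.
    });
    producer.join().expect("producer thread panicked");

    // `try_iter` yields the pending messages and stops once the queue is empty.
    rx.try_iter().collect()
}
```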

(Tatsuyuki Ishi) #4

I think you two are completely missing the point of the post. It’s not a throughput issue but a design issue.

The system itself inherently introduces a single frame (16ms) delay on any event which hits the channel, as it is guaranteed to not be read until the next frame.

The question is, do we need to have this generation mechanism at all? Specifying dependencies is the user’s responsibility, and given that, we don’t need to consider the inconsistency that might happen when systems run out of order.

(Jaynus) #5

So you are correct @ishitatsuyuki, with 100% perfect system execution ordering, this would not be an issue with EventChannel as it stands. However, it still does have additional overhead (beyond this discussion) because of its guarantees and assumptions.

EventChannel is designed for the following:

  • guarantees delivery to all registered readers
    • This is accomplished by storing a message until all readers read it.
    • This adds overhead, because allocation is inconsistent and growable
  • Read and Write exclusivity of shred currently means reads will always occur outside of, and after, any given write. (Makes sense and good)
  • Fire-and-forget semantics, leaving it up to the user to determine any type of handling or returning
  • All Reader systems must always iterate over all messages in an EventChannel

Although I think it is great for its specifically designed purpose, it has some inherent flaws in the way we use it here. There are plenty of reactive scenarios (especially in game development) where we would either like to mutate an event, filter it, or respond with an immediate new event. There also exists the overhead of any system having to iterate over all event messages in certain channels; this isn’t obviously bad until you deal with Action Input events: imagine every single system which expects a certain kind of Action event (even just 1) must now read all input events ever. This is an inherently unoptimizable design and something we should consider.
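
A small sketch of that scan cost. `InputEvent` and `scan_for_action` are hypothetical and much simpler than the engine's real input types; the point is only that a reader interested in one action still has to look at every event in the channel.

```rust
/// Hypothetical, simplified input event type (not the engine's real one).
#[allow(dead_code)] // payloads of the non-action variants are never read here
#[derive(Debug)]
enum InputEvent {
    Action(&'static str),
    MouseMoved(i32, i32),
    KeyPressed(char),
}

/// What one reader system has to do to find a single kind of action.
/// Returns (matching events, events examined).
fn scan_for_action(events: &[InputEvent], wanted: &str) -> (usize, usize) {
    let mut matched = 0;
    let mut examined = 0;
    for event in events {
        examined += 1; // every event is visited, relevant or not
        if let InputEvent::Action(name) = event {
            if *name == wanted {
                matched += 1;
            }
        }
    }
    (matched, examined)
}
```

With N such systems and M events per frame, the total work is N × M event visits, even if each system only cares about a single event.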

As @kabergstrom stated, I think the solution here is to build eventing directly into a System dispatcher. This is partially implemented in my current working version of Legion, in which I’m implementing a dispatcher for our purposes. The basic premise in Legion, at least, is that anything that needs to access the world is a Job, which has read/write/before/after considerations to dispatch. Once it can be dispatched, it’s thrown into a Rayon pool and executed. Now, eventing comes into play with being able to dispatch more Job tasks from a system. In that manner, a new Event would just be a new System Job that gets one-shot dispatched in the current frame; usually (but not always!) immediately after its System has executed.
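
A rough single-threaded sketch of the "event is just a one-shot Job" idea. This is not Legion's API: `World`, `Job`, `JobQueue`, `dispatch` and `dispatch_now` are invented for illustration, and the read/write/before/after checks and the Rayon pool are left out entirely.

```rust
use std::collections::VecDeque;

/// Stand-in for the ECS world; it only records which jobs ran, in order.
struct World {
    log: Vec<String>,
}

/// A one-shot job: it may touch the world and enqueue follow-up jobs.
struct Job(Box<dyn FnOnce(&mut World, &mut JobQueue)>);

struct JobQueue {
    pending: VecDeque<Job>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { pending: VecDeque::new() }
    }

    /// Queue a regular system job at the back of this frame's work.
    fn dispatch<F>(&mut self, job: F)
    where
        F: FnOnce(&mut World, &mut JobQueue) + 'static,
    {
        self.pending.push_back(Job(Box::new(job)));
    }

    /// Queue an "event" job at the front, so it runs right after the job
    /// that raised it and still within the same frame.
    fn dispatch_now<F>(&mut self, job: F)
    where
        F: FnOnce(&mut World, &mut JobQueue) + 'static,
    {
        self.pending.push_front(Job(Box::new(job)));
    }

    /// Run until the queue is empty, including jobs added while running.
    fn run_frame(&mut self, world: &mut World) {
        while let Some(Job(job)) = self.pending.pop_front() {
            job(&mut *world, &mut *self);
        }
    }
}

fn run_demo() -> Vec<String> {
    let mut world = World { log: Vec::new() };
    let mut queue = JobQueue::new();

    queue.dispatch(|world, queue| {
        world.log.push("input_system".to_string());
        // Raising the event means dispatching another job for this frame.
        queue.dispatch_now(|world, _| world.log.push("on_jump_event".to_string()));
    });
    queue.dispatch(|world, _| world.log.push("render_system".to_string()));

    queue.run_frame(&mut world);
    world.log
}
```

In `run_demo` the event handler runs between the input and render jobs of the same frame, so there is no wait for the next frame. A real dispatcher would also have to check the new job's access requirements before running it, which is where the "usually (but not always!)" comes from.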

This is one solution, for world-modifying events at least. There’s also the circumstance of immediate-mode events. These, I need to do more thinking about. But I imagine something like a Future (maybe not a real one) which would block execution of the system until complete (a function call from the perspective of the System). However, from the dispatcher’s perspective, it would actually defer execution of that system (and its event) until that Event’s AND the System’s dispatch requirements could be met. I haven’t really thought this one through all the way yet.
