Rendering Runtime API Types Rethink

(Kae) #7

Great points!

I think a lot of this “baking down to rendering primitives” can and should be done in the asset pipeline. I agree we (or users) can build opinionated high-level constructs with custom editors/tooling as long as the engine can accept individual pieces of data from the asset pipeline and turn them into (parts of) a render command.

Talking about ConstBuffer though: how do you imagine the struct should look? As I understand it, the actual data layout will depend on the shader it is used with, so if we want to have a “composable” ConstBuffer it would need to contain named constant buffer values as I described for Materials, then “compiled” for a specific shader.


Most things said here only consider the main render pass.
There can be no Mesh or Material outside of it.

I’d like to add that rendy has its own Mesh and Texture for static vertex data and images that are typically loaded from assets.
rendy::Mesh is designed in a way that allows gluing it to the shader without a man in the middle, using only shader reflection.

I totally agree that the main render-pass must be data-driven. The pass implementation should be able to read shader code and understand what data it should get from the World, what to get from other render-nodes, etc.
But the hardest thing here is how.

My initial idea was to register functions into some registry to allow fetching data from the World by name.
Consider the following simple vertex shader.

layout(location = 0) in vec3 position;
layout(location = 1) in vec4 color;

layout(location = 0) out vec4 out_color;

layout(set = 0, binding = 0) uniform VertexArgs {
  mat4 transform;
  mat4 camera_view;
};

void main() {
  out_color = color;
  gl_Position = vec4(position, 1.0) * transform * camera_view;
}

Now the render-pass sees that this shader must be applied to the Entity (its handle is somehow associated with it).
It reads that the vertex attributes are position and color. The Mesh stores its format with named attributes. The render-pass just matches the names and sizes of the attributes, checks whether they occupy one buffer or are stored separately, and generates the VkPipelineVertexInputStateCreateInfo (part of VkGraphicsPipelineCreateInfo) that glues the Mesh and shader together.
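As a rough sketch of that matching step (the Attribute type here is a hypothetical stand-in for rendy’s real mesh-format and reflection types):

```rust
/// Hypothetical stand-in for rendy's mesh format / SPIR-V reflection output.
#[derive(Debug)]
struct Attribute {
    name: &'static str,
    size: u32, // size in bytes
}

/// Match shader inputs against the mesh's named attributes, in shader order.
/// Returns None if any shader input is missing or has a mismatched size.
fn match_attributes(
    shader_inputs: &[Attribute],
    mesh_format: &[Attribute],
) -> Option<Vec<usize>> {
    shader_inputs
        .iter()
        .map(|input| {
            mesh_format
                .iter()
                .position(|attr| attr.name == input.name && attr.size == input.size)
        })
        .collect()
}

fn main() {
    let shader = [
        Attribute { name: "position", size: 12 }, // vec3
        Attribute { name: "color", size: 16 },    // vec4
    ];
    let mesh = [
        Attribute { name: "color", size: 16 },
        Attribute { name: "position", size: 12 },
    ];
    // Indices into the mesh format, one per shader input.
    assert_eq!(match_attributes(&shader, &mesh), Some(vec![1, 0]));
    println!("attributes matched");
}
```

The resulting indices are what a real implementation would turn into vertex input bindings in VkPipelineVertexInputStateCreateInfo.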

Next, the render-pass should deal with the uniform buffer definition.
First it must be decided whether the buffer exists on its own or should be created and filled with data.
We can use the uniform name for that purpose (I haven’t checked how this data is reflected in SPIR-V though).
Maybe some metadata attached to the Entity could guide the render-pass in this decision.
Ok. “VertexArgs” means the render-pass should create and fill the buffer. This means the render-pass reads the fields to see what they are and where to get the actual data. The first field is “transform”. Nice. Let’s use Transform to fill this one.
But one does not simply walk into Mordor and hardcode every possible field name; that’s the exact opposite of a data-driven architecture.
Let’s make the render-pass go into a Registry and do fetch("transform"). This will return a function fn(&World, Entity) -> &[u8]. How is this better than hardcoding? For one, it can be augmented from user code.
I asked @torkleyy if he sees an opportunity in nitric to solve this with even less code-writing and more data-driving :slight_smile:
Also, the render-pass creates a pipeline layout with a uniform buffer at set = 0, binding = 0 and remembers to write into that set a buffer filled with the data fetched from the World.
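A minimal sketch of such a registry, assuming a simplified fn(&World, Entity) -> Vec<u8> fetcher signature (the &[u8] return above would need a lifetime story) and stub World/Entity types:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: a real implementation would use specs' World and Entity.
struct World {}
type Entity = u32;
type FetchFn = fn(&World, Entity) -> Vec<u8>;

/// Registry mapping uniform field names to data-fetching functions.
/// User code can register its own entries, so nothing is hardcoded in the pass.
#[derive(Default)]
struct Registry {
    fetchers: HashMap<String, FetchFn>,
}

impl Registry {
    fn register(&mut self, name: &str, f: FetchFn) {
        self.fetchers.insert(name.to_owned(), f);
    }
    fn fetch(&self, name: &str) -> Option<FetchFn> {
        self.fetchers.get(name).copied()
    }
}

fn main() {
    let mut registry = Registry::default();
    // A fetcher for the "transform" field; here it just returns a dummy
    // identity matrix instead of reading a real Transform component.
    registry.register("transform", |_world, _entity| {
        let identity: [f32; 16] = [
            1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0,
        ];
        identity.iter().flat_map(|f| f.to_le_bytes()).collect()
    });

    let world = World {};
    let fetch = registry.fetch("transform").expect("registered above");
    let bytes = fetch(&world, 0);
    assert_eq!(bytes.len(), 64); // mat4 = 16 floats * 4 bytes
    println!("fetched {} bytes for 'transform'", bytes.len());
}
```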

Long story short, the render-pass reads all shaders and creates the appropriate graphics pipeline.
Or it reuses an already-created one for the same set of pipeline-defining data.

To implement this we need to declare which components define a pipeline for a renderable object.
We may settle on the Mesh + Material set. To be more precise: the vertex format of the Mesh, plus the shaders, blending op, stencil op, etc. (not the textures) from the Material, define the pipeline.
If an Entity has at least one of these, then it is a renderable object.
Some renderable objects can be rendered without a Mesh - sprites, billboards, particles, etc.
Maybe a Mesh without a Material can make sense too.

If you think this idea is worth it, we can discuss it further :wink:
Ask if something is confusing.
Any feedback is appreciated.

(Gray Olson) #9

But, this is what I want to avoid. Since not everything makes sense as a combination of Material + Mesh, it seems like there should be a more generic, lower-level wrapping of “vertex format + shaders + blending op etc.”, with Material and Mesh simply wrapping these for convenience as a default (for use with the default passes, where it makes sense for an entity to have a Material + Mesh in order to be drawn). But it seems to me that sprites, UI, etc. shouldn’t have to be “special cases”: they are relatively common things that tie together very basic building blocks, and those blocks would be very useful to expose as more easily usable, composable parts.
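To illustrate, here is a hypothetical sketch of that lower-level wrapping; all names (PipelineDesc, Blend) are made up for the example and are not amethyst/rendy API:

```rust
/// Hypothetical "generic lower level wrapping": everything that defines a
/// pipeline lives in one descriptor, and Mesh + Material are just convenience
/// constructors for the common case.
#[derive(Debug, Clone, PartialEq)]
enum Blend { Opaque, Alpha }

#[derive(Debug, Clone, PartialEq)]
struct PipelineDesc {
    vertex_format: Vec<&'static str>, // named attributes, sizes omitted
    shaders: Vec<&'static str>,       // e.g. asset ids or paths
    blend: Blend,
}

impl PipelineDesc {
    /// The "Mesh + Material" default is just one way to build a descriptor.
    fn from_mesh_and_material(
        attrs: Vec<&'static str>,
        shaders: Vec<&'static str>,
    ) -> Self {
        PipelineDesc { vertex_format: attrs, shaders, blend: Blend::Opaque }
    }

    /// Sprites need no Mesh at all: a quad is implied, and only a shader
    /// and alpha blending are required.
    fn sprite(shader: &'static str) -> Self {
        PipelineDesc { vertex_format: vec![], shaders: vec![shader], blend: Blend::Alpha }
    }
}

fn main() {
    let mesh_obj = PipelineDesc::from_mesh_and_material(
        vec!["position", "color"],
        vec!["flat.vert", "flat.frag"],
    );
    let sprite = PipelineDesc::sprite("sprite.frag");
    assert_eq!(mesh_obj.blend, Blend::Opaque);
    assert!(sprite.vertex_format.is_empty());
    println!("both descriptors built from the same primitive");
}
```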


I didn’t think too much about how to store this in the World, actually :slight_smile:
Mesh + Material is just what first came to my head.

(Paweł Grabarz) #11

I’m wondering if using spirv-reflect at runtime might actually be too generic and impose too much of a performance penalty to be useful. Doing hashmap lookups in a tight rendering loop seems costly. Those lookups could potentially be hoisted out of the loop and done only once, but I’m still not sure that a specific data shape should always translate to the same world queries. There are many possibilities for what a constant byte buffer might mean to a particular render pass. Relying on a binding name also seems like a stretch. I’m wondering if we could instead statically generate code based on shader reflection, to be included in the pass, and then require an “encoder” trait to be implemented by the user, which would connect a shader input shape to the actual render pass - that is, query the world for whatever it needs and return the shader input data.


In this case the Encoder will basically be the hardcoded render-pass, and the render-pass will not be data-driven at all.
BTW, no map lookups are required. SPIR-V reflection is a few vectors :slight_smile:
And the process will be done only once for each shader.


I agree that a name alone usually is not a reliable thing. But in code we rely on string identifiers all the time. Every type, field, or variable is named, and you use the name to get data and to know what it is.
Although in a programming language you rely on types more :slight_smile:
But in shaders you often have to use primitive types, so you put the semantics into attribute and field names.
Not to mention that in OpenGL, attributes and uniforms were bound by name because there was no way to do it otherwise.

(Joël Lupien) #14

@kabergstrom I talked about Material in my “render unification” RFC

(Kae) #15

I’ve thought some more about this, and Frizi has recently implemented an “Encoder” concept that takes parts of the World and extracts data into buffers, ready for rendering Passes to process.

Here is an example of Encoders:
Here is the greatly simplified DrawFlat2D pass:

I think this is a great way of extracting data from the world, and it is composable in the sense that a Pass does not need to know about all the components it may get data from, just the buffer format. Ordering of encoded commands for the pass is handled by “post-encoders” that sort the data for the pass.

Advantages:

  • Optimized (linear cache behaviour, very few branches)
  • Easily parallelizable (par_join, multiple encoders running in parallel)
  • Decoupled from graphics API specifics (encoders are not coupled to a specific Pass, and a Pass is not coupled to specific components or a specific Encoder)

This pattern makes it very straightforward to implement a Node for the rendy graph: read the appropriate encoded buffer and emit commands into the command buffer. The Node will be entirely responsible for setting up the graphics pipeline, etc.

In terms of getting data from other render nodes: This should be 100% expressed within the node graph.

I think we should provide a ConstBuffer with name:value pairs that assets/users can put arbitrary values in, and make it easy to bind these to a shader, but let Node handle ConstBuffers the way that is appropriate for the pass.
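A rough sketch of what such a name:value ConstBuffer could look like, with the “bind to a shader” step modeled as compiling against a reflected (name, offset, size) layout; everything here is a simplified assumption, not amethyst/rendy API:

```rust
use std::collections::HashMap;

/// Hypothetical ConstBuffer: named values supplied by assets/users, which a
/// Node then "compiles" against a specific shader's reflected layout.
#[derive(Default)]
struct ConstBuffer {
    values: HashMap<String, Vec<u8>>,
}

impl ConstBuffer {
    fn set_f32(&mut self, name: &str, v: f32) {
        self.values.insert(name.to_owned(), v.to_le_bytes().to_vec());
    }

    /// Lay the named values out according to the shader's reflected
    /// (name, byte offset, byte size) layout. Returns None if a field is
    /// missing or has the wrong size.
    fn compile(&self, layout: &[(&str, usize, usize)]) -> Option<Vec<u8>> {
        let total = layout.iter().map(|&(_, off, size)| off + size).max().unwrap_or(0);
        let mut out = vec![0u8; total];
        for &(name, offset, size) in layout {
            let bytes = self.values.get(name)?;
            if bytes.len() != size {
                return None;
            }
            out[offset..offset + size].copy_from_slice(bytes);
        }
        Some(out)
    }
}

fn main() {
    let mut cb = ConstBuffer::default();
    cb.set_f32("roughness", 0.5);
    cb.set_f32("metallic", 1.0);
    // Reflected layout for one particular shader.
    let layout = [("metallic", 0, 4), ("roughness", 4, 4)];
    let buffer = cb.compile(&layout).expect("all fields present");
    assert_eq!(buffer.len(), 8);
    println!("compiled constant buffer: {:?}", buffer);
}
```

The same ConstBuffer asset can be compiled against different shaders, which matches the earlier point that the actual data layout depends on the shader it is used with.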

On a different topic, I think it will be possible to make rendy node graph composable at runtime too, with an editor similar to Shader Forge:

This would allow users to modify the entire frame graph as an asset with hot reloading support. We’ll have to flesh out the details, but from scanning the docs it looks like it’ll be possible.


That’s not uniform data but actual constants :slight_smile:

The problem I see is that the Flat2DData format is not extensible by the user.
If a user wants to pass more data into the shader, how would they do that? Could you provide pseudo-code showing how this could be done with the suggested approach?

(Kae) #17

The user would extend the Flat2DData format to extract more data from the World.

I meant that this type of name:value pairs could be supplied as an asset and passed by reference in the Encoder-supplied data, like Flat2DData, to supply arbitrary data to the shader.


So each user extends Flat2DData until it grows so large that Vec<Flat2DData> doesn’t fit into memory :smile:

(Kae) #19

Each Pass would define its own input data structure from the World. The input should be kept only to what is needed by the Pass to keep it optimized. If a user wants to create a new pass, it would have a new data format.


We are mostly talking about the main render-pass here. Other passes are very different and require their own designs.
All objects would be rendered in the main render-pass, regardless of what shaders they use.
Each object may need its own set of data from the World.

(Kae) #21

Is there a specific reason all world objects should be rendered by a single “main render-pass” node?

(Paweł Grabarz) #22

I think the current encoder code should be interpreted as a refactor that allows us to approach the problem at all, rather than as a more-or-less final solution.

I see that the general idea here is to be able to declare what the pass needs and, based on that, run only the encoders that provide that data. We could split the existing encoders into multiple parallel ones, but that could easily explode into systems so fine-grained that they would potentially block each other. Also, as Viral noted, there might not be enough memory to fit it all at once. This might warrant some queuing, but IMHO we can ignore that problem for now (you surely have more system RAM than video RAM, right? :stuck_out_tongue: ). I tried to implement those encoders as a layer above systems, so combining them would be possible, but failed due to some type-system trickery around Join. I’d be glad to try again with you as another iteration on top of what we have.

I have some ideas about what the data flow could be, but the existing shred implementation might be too limited to handle it. Specifically, encoding could indeed be separated per “data kind” - colors, positions, or any uniform/varying/const data. There could be an encoder for “2d position”, “tint color” or “albedo texture”. I think the engine could have many such predefined “slots”, possibly allowing a Custom(&'static str) enum variant for user-defined things or something like that. Once the required layout is determined from the current render graph node, the encoders would be scheduled to run with a specific buffer destination and stride. The tricky part: multiple encoders mutating the same buffer without locking (possible only thanks to the stride). Next, the buffer would be post-processed for things like sorting (which might also be done based on types - e.g. depth sorting only affecting buffers with position), and then the pass would just be handed final buffers that can be copied straight into the corresponding rendy objects. Because the buffers are cleared only once per frame, the same buffers could potentially be reused across many different passes.
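A tiny single-threaded sketch of the strided-buffer idea; the encoder functions and the 12-byte layout are invented for the example, and a real version would hand each parallel encoder disjoint offsets within every stride:

```rust
/// Sketch of "multiple encoders mutating the same buffer without locking":
/// each encoder owns a disjoint byte range within every per-entity stride,
/// so writes never overlap. Here we fake it single-threaded with chunks_mut.
fn encode_positions(stride: &mut [u8], pos: [f32; 2]) {
    for (i, v) in pos.iter().enumerate() {
        stride[i * 4..i * 4 + 4].copy_from_slice(&v.to_le_bytes());
    }
}

fn encode_tint(stride: &mut [u8], tint: u32) {
    // Tint lives after the 8 position bytes within the stride.
    stride[8..12].copy_from_slice(&tint.to_le_bytes());
}

fn main() {
    const STRIDE: usize = 12; // vec2 position + u32 tint per entity
    let entities = [([1.0f32, 2.0], 0xffffffffu32), ([3.0, 4.0], 0x00ff00ffu32)];

    let mut buffer = vec![0u8; entities.len() * STRIDE];
    for (chunk, (pos, tint)) in buffer.chunks_mut(STRIDE).zip(entities.iter()) {
        // In the proposed design these two calls would be separate encoder
        // systems running in parallel over disjoint offsets.
        encode_positions(chunk, *pos);
        encode_tint(chunk, *tint);
    }
    assert_eq!(buffer[8..12], 0xffffffffu32.to_le_bytes());
    println!("encoded {} entities into one interleaved buffer", entities.len());
}
```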

It all might for sure provoke some changes in ECS.


I like the idea of parallel fine-grained encoders but I am unable to see how they can cooperate.
They can’t just write data into a Vec linearly, as data from the same entity would be spread across different indices.

Encoders writing data directly into GPU buffers and descriptor sets would not work either, for the same reason.
Imagine two types of objects, rendered by two different pipelines with different layouts. Both need Transform data, but the first needs data from component Foo and the second needs data from component Bar.
The objects can be interleaved, so the Transform encoder would visit both objects with Foo and objects with Bar. How then would a pass that knows nothing of the Transform, Foo, and Bar types and encoders know which object’s data is at which offset?

I imagine encoders not as systems but as special handlers that fetch data from the World on demand.
The pass iterates through entities with a Renderable component, decides which encoders it needs, and allocates ranges in buffers and descriptor sets based on the data in the Renderable component (most of that data it should get directly from the shaders attached to the Renderable). Then the pass uses the encoders to populate the buffers and descriptor sets and records a draw call into a command buffer with the correct pipeline attached. In the next frame the pass will not update the buffers and descriptors for this Entity if the relevant components are unchanged.
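A sketch of the range-allocation part of that flow, with a made-up Renderable that just lists the field names its shaders want (the field sizes are arbitrary example values):

```rust
/// Hypothetical Renderable: what the attached shaders want, per reflection.
struct Renderable {
    fields: Vec<&'static str>,
}

/// Byte size of each known field kind (example values only).
fn field_size(name: &str) -> usize {
    match name {
        "transform" => 64, // mat4
        "tint" => 16,      // vec4
        _ => 0,
    }
}

fn main() {
    let renderables = [
        Renderable { fields: vec!["transform", "tint"] },
        Renderable { fields: vec!["transform"] },
    ];

    // The pass allocates an (offset, len) range per entity; the encoders
    // it selected would then fill those ranges on demand.
    let mut cursor = 0usize;
    let ranges: Vec<(usize, usize)> = renderables
        .iter()
        .map(|r| {
            let len: usize = r.fields.iter().map(|f| field_size(f)).sum();
            let range = (cursor, len);
            cursor += len;
            range
        })
        .collect();

    assert_eq!(ranges, vec![(0, 80), (80, 64)]);
    println!("allocated {} bytes total", cursor);
}
```

Caching per-entity ranges like this is also what would let the pass skip re-encoding an Entity whose relevant components are unchanged.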

But maybe you have another solution in mind.
I understand how important it is to iterate toward the big goal, but we need this big goal to be well defined, and we should ideally all be on the same page. Otherwise we risk going in the wrong direction.

(Kae) #24

I’m going to try and summarize the discussions in this GitHub issue about Encoders

Summary of raised issues/concerns

omni-viral: Frizi’s Encoder in the PR is too specific and not data-driven enough. It’s not extensible: if users want to change the shader and add some data in World components, they would have to copy the entire built-in Pass and Encoder, then add their fields to their custom implementation.

I think this is addressed with the design I propose below.

omni-viral: O(n) is not good enough. We should support sending only changed data to the GPU every frame - O(k), where k is the number of changes, which is usually an order of magnitude smaller than n.

It’s still unclear how to do this generally while maintaining good per-element performance. I think it’s somewhat possible if we use specs modification events, but there would still be a level of indirection to map an entity to an offset in the GPU buffer, which is not necessary when using an O(n) approach that rewrites the buffers every frame with the current entity set.

In my opinion we should start by making the O(n) approach fast, with very low per-element overhead. This plays to the strengths of current CPUs, where linear access patterns give obvious performance wins, and it has consistent performance regardless of how the game modifies entities. It is probably also easier to implement.

Data-driven Encoder

I’ve given a go at attempting to design a somewhat data-driven approach to World data extraction.

A few observations about the problem:

  1. We know the layout of Components at compile-time
  2. We don’t know, at compile-time, the layout of the Pipeline buffers that Component data will be written to (without the current Pass type for a specific object kind + a known shader struct)
  3. We want to move fields/data from Components to Pipeline buffers

I wrote the following example for a simple Encoder implementation that is implemented for a set of component types.
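The linked example isn’t reproduced here, so the following is only a hypothetical reconstruction of its shape, borrowing the names run_encoder and EncodeTarget that come up later in the thread; none of this is the actual proposed API:

```rust
/// Stand-in for the proposed write target; the real one reportedly uses a
/// bit of unsafe code and can point at GPU memory or the heap.
struct EncodeTarget {
    buffer: Vec<u8>,
}

impl EncodeTarget {
    fn write(&mut self, bytes: &[u8]) {
        self.buffer.extend_from_slice(bytes);
    }
}

/// Stand-in components.
struct Transform([f32; 16]);
struct Tint([f32; 4]);

/// An encoder implemented for a set of component types: it knows which
/// components it reads and how to serialize them, but not which Pass
/// ultimately consumes the bytes.
trait Encoder {
    type Components;
    fn run_encoder(components: &Self::Components, target: &mut EncodeTarget);
}

struct FlatEncoder;

impl Encoder for FlatEncoder {
    type Components = (Transform, Tint);
    fn run_encoder(components: &Self::Components, target: &mut EncodeTarget) {
        let (transform, tint) = components;
        for v in transform.0.iter().chain(tint.0.iter()) {
            target.write(&v.to_le_bytes());
        }
    }
}

fn main() {
    let mut target = EncodeTarget { buffer: Vec::new() };
    // In the real design this would run inside a specs join loop per entity.
    let components = (Transform([0.0; 16]), Tint([1.0; 4]));
    FlatEncoder::run_encoder(&components, &mut target);
    assert_eq!(target.buffer.len(), (16 + 4) * 4);
    println!("encoded {} bytes", target.buffer.len());
}
```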

Let me know what you think.


This function just returns the pipeline from the renderable. So how does the renderer know that this pipeline and the data this encoder encodes are compatible?

Could Encoders be combined with this approach? If yes, then how?
Do you try every known encoder on every entity?
Some components contain resources, not copyable data that can be put into a buffer.

(Kae) #26

This can only be known after running pipeline_outputs. This function links the fields with shader metadata to know where to write the data. It would probably return a Result instead.

I suppose? My thinking is that the run_encoder function should run inside a specs join loop, so you can run any number of these encoders. Is that what you mean?

Every registered encoder would run a join loop based on the component set.

EncodeTarget supports any type, just a bit of “unsafe” code, but you can also write inner handles (the u32) with it. So resources would work, the ptr would just not point to a GPU buffer, instead some other place on the heap.