April news + Detailed look into shader handling (Patreon)
Hi Patrons,
here is a preview of what to expect in Cemu 1.7.6. Keep in mind that development only started a week ago and everything mentioned in this post is subject to change. There is currently no set release date, but we plan to spend about 3-4 weeks working on this release.
General plans
For 1.7.6 we will be looking into the following:
- Separable shaders (details below)
- Graphic fixes
- Performance improvements
- Fixing the remaining rare gameplay bugs related to FPU emulation (e.g. Splatoon)
- And as usual, various other bugs and minor problems
If time permits we will also look into audio and compatibility improvements.
Separable shaders
This part will be quite technical and if you just want to read a summary of what to expect from this change, you can scroll to the end.
Let's start with a bit of background knowledge. Here is an image of a modern render pipeline:
[Image: render pipeline diagram, taken from the Khronos OpenGL wiki]
The image depicts the steps on the GPU to go from some input geometry data to a pixel on the screen. For the purposes of this post we don't have to worry about all the steps and we can boil it down to only these shader stages:
- Vertex shader
- Geometry shader (Optional)
- Pixel shader (Optional), also called fragment shader
The above applies to OpenGL, DirectX, Vulkan or any other rendering API on PC. For Wii U it's not exactly identical, but similar enough that we don't need to dive into details.
The problem
For historical reasons Cemu has always looked at the shader pipeline as a whole, and shaders were generated in sets. That means that for each combination of shaders a game used, Cemu would generate a new shader set upon first use. This set would contain its own copy of the vertex, geometry and pixel shaders. If individual shaders were reused but other stages differed, Cemu would create a new set regardless.
It's easier to explain this with an example:
Assume we have 2 vertex shaders (vsA, vsB), 3 pixel shaders (psA, psB, psC) and no geometry shaders.
The game does the following draw operations:
Draw 1: vsA + psA
Draw 2: vsA + psB
Draw 3: vsA + psC
Draw 4: vsB + psA
Draw 5: vsB + psB
Draw 6: vsB + psC
Question: How many shader sets do we need to create?
The correct answer is 6. Each of the draw operations uses a unique configuration of shader types.
This is where we start to see the inefficiency emerge: There are only 5 shaders (2 vertex + 3 pixel), but Cemu generates 6 sets. With only a small number of shaders this isn't that big of a deal, but since most games actually ship with hundreds of shaders, the number of possible permutations is huge. Even worse, each shader set contains its own copy of both the vertex and the pixel shader. So in total we are generating 12 shaders (6 sets x 2 shaders each) from only 5 shaders as input.
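To make the numbers from the example concrete, here is a small Python sketch that counts both approaches. It is only an illustration of the arithmetic, not Cemu code; the `worst_case_counts` helper and the figure of 100 vertex and 100 pixel shaders are made-up assumptions to show how the gap grows.

```python
# Draw calls from the example above, as (vertex shader, pixel shader) pairs.
draws = [
    ("vsA", "psA"), ("vsA", "psB"), ("vsA", "psC"),
    ("vsB", "psA"), ("vsB", "psB"), ("vsB", "psC"),
]

# Set-based model: one set per unique combination, and every set carries
# its own copy of each stage it uses.
shader_sets = set(draws)
shaders_generated_in_sets = sum(len(combo) for combo in shader_sets)

# Individual model: each shader is generated once, no matter how many
# combinations it appears in.
individual_shaders = {shader for combo in draws for shader in combo}

print(len(shader_sets))           # 6 sets
print(shaders_generated_in_sets)  # 12 generated shaders
print(len(individual_shaders))    # 5 generated shaders


def worst_case_counts(num_vertex, num_pixel):
    """Hypothetical helper: upper bounds if every vertex shader is paired
    with every pixel shader (no geometry shaders)."""
    sets = num_vertex * num_pixel
    return sets, 2 * sets, num_vertex + num_pixel


# Assumed example sizes, not measurements from any real game.
print(worst_case_counts(100, 100))  # (10000, 20000, 200)
```

In the worst case the set-based count grows with the product of the shader counts, while the individual count only grows with their sum.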
The obvious but difficult solution
Naturally, we should try to avoid creating shaders whenever possible. In theory the solution is easy: Scrap the concept of shader sets and handle shaders individually. Modern OpenGL allows us to do that, so why not make use of it?
However, this is easier said than done. There are a few problems that need to be solved. Not all shaders translate well, and sometimes the pixel shader affects the vertex shader and vice versa. In such a case we are forced to create and use those shaders as a set because they cannot exist independently. Another problem is that the current render state can affect the generated shaders. For example, if alpha transparency is enabled, it modifies the pixel shader that Cemu generates. This means we are forced to generate two versions of that pixel shader: one without alpha transparency and one with it.
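The render state problem can be pictured as a cache whose key contains more than just the shader itself. The following Python sketch is an assumption about how such a lookup could be structured, not how Cemu actually implements it; `PixelShaderKey`, `PixelShaderCache`, `alpha_enabled` and the string returned by `_generate` are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelShaderKey:
    shader_hash: int     # identifies the game's original pixel shader
    alpha_enabled: bool  # render state that changes the generated code


class PixelShaderCache:
    def __init__(self):
        self._cache = {}
        self.generate_count = 0

    def get(self, shader_hash, alpha_enabled):
        key = PixelShaderKey(shader_hash, alpha_enabled)
        if key not in self._cache:
            # Only generate when this shader/state pair was never seen.
            self.generate_count += 1
            self._cache[key] = self._generate(key)
        return self._cache[key]

    def _generate(self, key):
        # Stand-in for the real shader translation step.
        suffix = "+alpha" if key.alpha_enabled else ""
        return f"ps_{key.shader_hash:08x}{suffix}"


cache = PixelShaderCache()
cache.get(0x1234, alpha_enabled=False)
cache.get(0x1234, alpha_enabled=True)   # same shader, different state
cache.get(0x1234, alpha_enabled=False)  # already cached, no new work
print(cache.generate_count)  # 2
```

The same input shader ends up as two generated variants, which is exactly the duplication described above, but each variant is still only generated once.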
But ultimately we strive for perfection. Thus, for 1.7.6 we decided to scrap the shader set system in its entirety and replace it with a solution that can handle individual shaders. The idea is that the new solution is as adaptive as possible. Ideally, it should be able to handle shaders individually but also be able to deal with cross-shader dependencies.
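One way to picture "individual where possible, paired where necessary" is a cache key that only includes the other stage when there is a dependency. This is a rough Python sketch under our own assumptions, not the actual design; `vertex_cache_key` and the `dependent` set are hypothetical, and real dependency detection would be far more involved.

```python
def vertex_cache_key(vs_id, ps_id, depends_on_pixel_shader):
    """Return the cache key for a generated vertex shader.

    Independent vertex shaders are keyed by themselves only, so they are
    shared across all pixel shaders. Dependent ones fall back to a key
    that includes the paired pixel shader, like the old sets did.
    """
    if depends_on_pixel_shader:
        return (vs_id, ps_id)
    return (vs_id, None)


draws = [("vsA", "psA"), ("vsA", "psB"), ("vsB", "psA"), ("vsB", "psB")]

# Assumption for this example: only vsB is affected by its pixel shader.
dependent = {"vsB"}

keys = {vertex_cache_key(vs, ps, vs in dependent) for vs, ps in draws}
print(len(keys))  # 3: vsA once, vsB once per pixel shader
```

Only the shaders that really have a cross-shader dependency pay the cost of being generated per combination.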
The only downside:
The massive change in how we handle shaders in Cemu will break compatibility with already existing shader caches and graphic packs (if they use custom shaders).
Separable shaders (summary)
Cemu 1.7.6 will handle vertex, pixel and geometry shaders individually instead of pairing them up into sets. This allows individual shader types to be reused.
The pros:
- Faster shader compilation overall
- Less RAM usage by shaders
- Better performance
It's too early to give exact numbers.
The cons:
- Breaks compatibility with shader caches from 1.7.5 or below
- Breaks compatibility with graphic packs if they use custom shaders