Hi Everyone,

This is a wrap-up of a debugging session that took place yesterday evening on the MiSTer Discord server.

It began two days ago with a user report about slowdowns in one particular area of Final Fantasy IX on disc 4. I didn't pay too much attention at first, because slowdowns in PSX games are not that uncommon to be honest, but the user investigated further and found that this slowdown doesn't happen in DuckStation.

It didn't take long until others jumped on it and the scene was tested on a real PSX. It turned out that the real PSX doesn't have any slowdown there, while the PSX core does.

So what is going on?

You can see the scene in question above. It runs at a stable 20 frames per second in the core, while all nearby areas run at a stable 30 fps. Now why is that?

This area features a full-screen rain effect. I quickly identified that this effect is done with transparent lines drawn on top of the fully rendered image.

This was no surprise to me, as transparent lines have been identified as the only weakness of the PSX core's GPU design, due to the kind of memory that is used. I made a whole video about it last year, which you can find here:

https://www.youtube.com/watch?v=EMZVp48SMps

You can see the summary of the investigation from last year in this table:

The important part is that only transparent lines are drawn slightly slower on the PSX core than on a real PSX. In combination with a typical scene, however, the core should always come out ahead.

At least that is how it should be on the most common PSX models. The first revision of the console used a different kind of memory for video RAM, and that one is always slower in the benchmark.

What is so special about transparent lines? They reveal an issue with the MiSTer's DDR3 RAM, which is used for the VRAM: it has a high latency.

Lines that are drawn from top to bottom (or bottom to top) only have 1 pixel drawn per scanline. If these lines are transparent, the GPU has to read back the old color value for each pixel from VRAM and therefore suffers the latency penalty of the DDR3 on every pixel. Larger objects profit from the higher throughput of the DDR3, which averages out that effect.
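To make the latency argument more concrete, here is a toy cost model in Python. It is only a sketch: the latency and burst size are made-up numbers, not measured values from the MiSTer, and it assumes that read requests are not pipelined, which a real memory controller may well do.

```python
import math

# Hypothetical numbers, chosen only to illustrate the effect.
LATENCY_CYCLES = 20   # assumed cost of one VRAM read request
BURST_PIXELS = 16     # assumed number of adjacent pixels one request returns


def readback_cycles(pixels_per_row: int, rows: int) -> int:
    """Cycles spent reading back old pixel colors for a transparent object.

    Each touched scanline needs its own read requests, because only pixels
    that are adjacent within one row can share a burst.
    """
    requests_per_row = math.ceil(pixels_per_row / BURST_PIXELS)
    return rows * requests_per_row * LATENCY_CYCLES


# 64 transparent pixels as a vertical line: 1 pixel on each of 64 scanlines.
vertical_line = readback_cycles(pixels_per_row=1, rows=64)

# The same 64 transparent pixels as one horizontal span on a single scanline.
horizontal_span = readback_cycles(pixels_per_row=64, rows=1)

print(vertical_line, horizontal_span)  # 1280 80
```

With these assumed numbers the vertical line pays the full latency for every single pixel, while the horizontal span spreads it over 16 pixels per request.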

So while all this could explain the slowdown in this area of the game, there is still something strange: the PlayStation where this scene was found to be running at 30 fps is an old model with the "slow" GPU. 

Maybe we should dig deeper? 

So I loaded up a savestate of that area in the VHDL simulation to see the timing of the draw process:

You can see that the backgrounds and 3D models cost only a small portion of the time, while drawing the rain costs most of the time.

The rendering of one image takes 57 milliseconds, with about 14ms for the scene itself and 43ms for the rain. To have the game render at 30 fps, we would need to bring this rendering time down to at most 33ms.
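The numbers can be checked with a few lines of Python. The timings are the ones from the simulation; the variable names are just for illustration.

```python
TARGET_FPS = 30
scene_ms = 14   # backgrounds and 3D models, from the simulation
rain_ms = 43    # transparent rain lines, from the simulation

budget_ms = 1000 / TARGET_FPS          # time available per frame at 30 fps
total_ms = scene_ms + rain_ms          # time the core currently needs
rain_budget_ms = budget_ms - scene_ms  # what the rain may cost at most

print(f"frame budget:  {budget_ms:.1f} ms")
print(f"current total: {total_ms} ms")
print(f"rain may cost: {rain_budget_ms:.1f} ms (currently {rain_ms} ms)")
```

So the rain would have to become more than twice as fast to fit into a 30 fps frame.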

That doesn't really seem possible, so I thought this was a dead end and that I would have to accept this shortcoming. I did however investigate a little bit further to see if some caching of the VRAM might help with this issue.

To see how that could work, I modified the software emulator, which I wrote before the core, to give me some more detailed information.

I let it draw the rain as non-transparent to see what it looks like and also counted the lines that are drawn: 444 lines. And they are all over the place. Not nicely sorted from left to right, but fully random.

This is pretty much the worst case for any caching scenario, as I would have to cache the whole framebuffer area for it to really help, which is just too much to be feasible, given how full the FPGA already is.

Well, it's 444 lines and the resolution is only 320 pixels wide. If you look at the picture, it seems that a lot of these lines must overlap each other, otherwise those numbers wouldn't make any sense, right?

While debugging further in my emulator to see if there was anything that could be done, I stepped through the rendering of the first line and instantly found something very interesting:

The rendering of the first line was skipped! But why?

The framebuffer that gets rendered is placed at a position in VRAM. In this case it's from X=320,Y=0 to X=639,Y=223, as you can see in the image below, which shows the whole VRAM content with the scene marked by a red border:

But the X position that the line would be drawn to is 267, so it's outside of the current drawing area.

The PSX GPU allows the game to define a drawing area where drawing is allowed, while all drawing outside of it is simply dropped. This drawing area is the red rectangle shown in the image, and the line that is to be drawn is fully outside of it:


Lines that are completely outside of the drawing area don't have to be drawn at all. Even more important, they don't need to fetch data from VRAM for transparency effects.

But how many lines are outside of the drawing area? The game will likely not try to draw lots of useless lines, right?
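The check itself is simple. Below is a minimal Python sketch of such a test, not the actual VHDL from the core: it compares the bounding box of a line with the drawing area. The drawing area and X=267 are taken from the scene above, while the class and function names, the Y coordinates and the second line are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawingArea:
    """Rectangle in VRAM where drawing is allowed (all edges inclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def line_fully_outside(area: DrawingArea, x0: int, y0: int, x1: int, y1: int) -> bool:
    """True if the bounding box of the line does not touch the drawing area.

    This is a conservative test: a line whose bounding box overlaps the
    area is never skipped, even if it ends up drawing no visible pixel.
    """
    return (
        max(x0, x1) < area.left
        or min(x0, x1) > area.right
        or max(y0, y1) < area.top
        or min(y0, y1) > area.bottom
    )


# Drawing area of the scene: X=320..639, Y=0..223.
area = DrawingArea(left=320, top=0, right=639, bottom=223)

# Rain line at X=267 (the Y values are hypothetical): left of the area.
print(line_fully_outside(area, 267, 10, 267, 40))   # True, can be skipped

# Hypothetical line inside the framebuffer: has to be drawn.
print(line_fully_outside(area, 400, 10, 400, 40))   # False
```

A line that passes this test can be dropped before any pixel is processed, so it never triggers a read from VRAM.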

There is only one way to find out: I implemented skipping of lines that are fully outside of the drawing area in the line drawer of the GPU and ran the simulation again to see the result:

The result is really impressive! The rain drawing comes down to 15ms in the simulation, leading to a total draw time of 29ms, which fits nicely into 30 fps. The game does indeed try to draw more lines fully offscreen than onscreen.

So will this change fix the issue? The draw time in simulation is only an estimate, because I don't know the exact DDR3 latency. The latency of the DDR3 depends on several other things, like how much Linux is using it or what the scaler is doing. Typically I work with a rather "worst-case" timing in simulation to be on the safe side, so the real performance should be even better, but let's see what the core will tell us after building it:
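As a rough cross-check, the two rain timings can be turned into an estimate of how many lines were offscreen. This is only a back-of-the-envelope sketch: it assumes that every drawn line costs about the same time and that a skipped line costs nothing, neither of which is exactly true.

```python
total_lines = 444      # counted in the software emulator
rain_before_ms = 43    # simulation, every line drawn
rain_after_ms = 15     # simulation, offscreen lines skipped
scene_ms = 14          # backgrounds and 3D models

# Assumption: draw time scales with the number of lines actually drawn.
visible_share = rain_after_ms / rain_before_ms
visible_lines = round(total_lines * visible_share)
skipped_lines = total_lines - visible_lines

total_after_ms = scene_ms + rain_after_ms
budget_ms = 1000 / 30  # frame time available at 30 fps

print(f"estimated visible lines: {visible_lines}")
print(f"estimated skipped lines: {skipped_lines}")
print(f"total after fix: {total_after_ms} ms, budget {budget_ms:.1f} ms")
```

Under that assumption, only around a third of the 444 lines land inside the drawing area, which matches the observation that more lines are offscreen than onscreen.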

Success! The area now runs at a stable 30 frames per second.

This was a wild ride yesterday evening, lasting 3 hours from the report until the issue was fixed. I tried to summarize it as well as possible, while going into the technical details. Please tell me if you like this kind of report and if the length was good.

The fix itself will be in the unstable version later today and a new release build will be coming soon.

Have fun!

Comments

Anonymous

Great post, thanks for the detailed explanation!

Anonymous

I enjoyed your write-up!