FPGAzumSpass

My most wanted bugfix (Patreon)

Published:

2023-09-24 07:52:18

Imported:

2023-12

Downloads

Download N64_20230923.zip

Content

Hi Everyone!

Today we will look at the technical details of a bugfix that I made 2 days ago and that made so many games playable. It is kind of a bug hunt journey, as the path to the fix has been like a maze with many dead ends. Even the solution alone holds so many details that it could fill a whole article. Please forgive me if the details go too deep, but I'm excited about it and want to share it.

But before we start: there have been plenty of other fixes and updates in the last few days and I will cover them in a separate post, so don't worry, you will be up to date soon.

About core updates: I have again attached a "stable" version to this update post. However, if you want a more comfortable update process with all the latest builds, you might want to add some lines to your downloader ini so that the newest core is fetched automatically with update all. Zakk has set up the process and it's working great.

All you have to do is to add these lines to the file "/media/fat/downloader.ini"

[n64_dev]
db_url = https://raw.githubusercontent.com/RobertPeip/Mister64/db/db.json.zip

With that being said, let's start!

Probably all of you who tried the test cores will have run into issues with games like Mario 64, Wave Race or Harvest Moon, where the audio would pop and the game would freeze randomly.

If you enabled the error overlay, a common result could be what you see in the picture:

Error 000100. The number is hexadecimal with one error flag per bit, so in this case it's bit 8.

The readme of the core says: "Bit 8 - RSP Instruction not implemented"
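As a minimal sketch of how such an overlay value can be read, the following Python snippet lists which bits are set in a hexadecimal error code. Only the meaning of bit 8 is taken from the readme quote above; the function name and the lookup table are assumptions made for illustration and are not part of the core.

```python
# Known flag descriptions; only bit 8 is quoted from the core's readme above.
KNOWN_FLAGS = {8: "RSP Instruction not implemented"}

def decode_error_flags(code_hex):
    """Return the bit numbers that are set in a hexadecimal error code."""
    value = int(code_hex, 16)
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]

for bit in decode_error_flags("000100"):
    # Bits without a known description are left for the readme to explain.
    print(f"Bit {bit} - {KNOWN_FLAGS.get(bit, 'see readme')}")
```

For the code from the picture, this prints the single line "Bit 8 - RSP Instruction not implemented".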

We need to step deep into the RSP right away to get further: the RSP is just a 32-bit MIPS processor, basically like a PSX CPU with double the clock speed. Additionally, it has an 8x16-bit vector coprocessor that is very tightly coupled with it, just like the PSX had the GTE as coprocessor.

That means it can execute any code it is given. However, not every instruction you can let it execute has a defined function. That is a very normal situation for a CPU. If, for example, the instruction has 8 bits, there are 256 possible actions that could result from it, like ADD, SUB, MUL or JUMP. But if the CPU only uses 56 of them, 200 are undefined and do nothing, or at least nothing defined and documented.

What this error code now tells you is that one of these undefined instructions was executed. The RSP in the core will not do anything when this happens and just skips the instruction, but why did it even end up there?
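To make the "skip and flag" behavior concrete, here is a small Python sketch of a made-up CPU with 8-bit opcodes where only a few are defined. The instruction set, the opcode numbers and the `run` function are purely hypothetical and have nothing to do with the real RSP encoding; the only detail borrowed from above is that an undefined instruction is skipped and reported through bit 8.

```python
ERROR_BIT_UNIMPLEMENTED = 1 << 8  # bit 8, as in the readme quote

# Hypothetical toy instruction set: 8-bit opcodes, only three of 256 defined.
DEFINED = {
    0x01: lambda acc, arg: acc + arg,  # ADD
    0x02: lambda acc, arg: acc - arg,  # SUB
    0x03: lambda acc, arg: acc * arg,  # MUL
}

def run(program):
    """Execute (opcode, operand) pairs; undefined opcodes are skipped and flagged."""
    acc, errors = 0, 0
    for opcode, arg in program:
        handler = DEFINED.get(opcode)
        if handler is None:
            errors |= ERROR_BIT_UNIMPLEMENTED  # remember that something was wrong
            continue                           # ...but simply skip the instruction
        acc = handler(acc, arg)
    return acc, errors

acc, errors = run([(0x01, 5), (0xF0, 0), (0x03, 3)])
print(acc, f"{errors:06X}")  # 15 000100
```

The program keeps running after the unknown opcode 0xF0, which is why such an error can go unnoticed without the overlay.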

To understand that we need to look at what the RSP usually executes. We just have seen it could be any code, but is that really the case when playing a particular game?

The reality is that programming of the RSP is easy .... and very hard!

It's easy because everyone can start and just execute any kind of code on it. Setting up a simple assembler program can be done in minutes.

But it's also very hard for some reasons:

- The RSP has very little memory. Only 1024 instructions fit into its super fast memory. Whenever the code to be executed is larger, it must be preloaded via Direct Memory Access (DMA). While that is also fast, it can be complicated to exchange code at runtime. Data memory has the same restrictions.

- The RSP has the normal Scalar part and the Vector part. Unlike PSX CPU and GTE, where each is waiting on the other, the Vector part in the RSP can run in parallel to the Scalar part, but only under certain conditions. To fulfill these, the code has to be hand-optimized in assembler if you want the highest possible throughput.

- The RSP runs in parallel to the CPU. When offloading work to the RSP, you need to make sure to keep both in sync and handle the exchange both efficiently and safely.

Due to all these things, the RSP was for a long time a closed component, and game developers only got access to it through the official library. This library contained different microcodes, which you have probably heard of before.

I don't know all the details here, but from analyzing what games do, there are different microcodes loaded for e.g. audio decompression and 3D rendering, depending on what modes (e.g. fog, transparency) they use. So when a developer wanted to decode audio for playback, they most likely used the official library's microcode for that.

The RSP documentation was later opened up and some developers like Rare made their own more flexible or efficient microcode. The conclusion for us however is: we can rely on many games not using a lot of different RSP code in all kinds of situations. Instead, they mostly execute the same code over and over again.

That brings us back to Mario 64 and why it throws this error code: it's not that the core simply didn't implement an RSP instruction, because the microcode is always the same, so the issue would then always come up. Instead, the reason must be that either the data it operates on makes it crash or the code was not loaded properly into the RSP's instruction memory.

Unfortunately that doesn't make it much easier to search for the issue, as it still leaves many possibilities open. I tried to debug Mario 64 and came to the result that it processes more audio data than it should, which overwrites the memory that holds the return address of its current function. But I had no clue why this would happen, so I was stuck at this point for quite a while.

Some days ago I received a hint from wark91 on discord about a homebrew game that often crashes and prints some error screen:

The good thing about homebrew games is that the code is often available. In this case it wasn't even the code of the game itself that is interesting, but the library "Libdragon".

Yes, the RSP is very hard to program efficiently, so homebrew developers also use library microcode.

To understand what happens here, we need to look at some details in the crash report:

- the HALT and BROKE signals are set. That means the RSP doesn't run anymore, it has stopped

- SIG5 is set. Libdragon's source code says: "Signal used by RSP to notify that has finished one of the two buffers of the highpri queue"

- rspq_next_buffer has timed out. This function waits for either SIG5 or SIG6 to be set. SIG6 is similar to SIG5: "Signal used by RSP to notify that has finished one of the two buffers of the lowpri queue"

So what we learn from it: the RSP has been asked to execute some code, and when the RSP is finished, Libdragon expects it to set SIG6 so that the CPU can continue. This has not happened. The RSP stopped without setting SIG6, so the execution hung.

Now we should look in detail at what these "SIG" are:

The RSP has different memory mapped registers that can be read and written by both the CPU and the RSP. One of these registers is the SP_STATUS, which can be used to get information about the state of the RSP.

We already touched on HALT and BROKE, which tell us about the overall execution status of the RSP, whether it's still working or not. Then there is information about a DMA being active and some other debug signals.

But what we want to look at are the 8 SIG bits. These are bits for exchanging trigger information between the CPU and the RSP. Libdragon uses them like this:

- CPU writes SIG7 to request the RSP to do some work

- depending on the priority of the work, the RSP works on it and sets SIG5 or SIG6

- when no more data is requested to be processed, the RSP will stop working and put itself to sleep

The natural idea that comes to my mind when I see that is: what happens if for some reason SIG6 is never set when the RSP is finished? Well, exactly what we see here. But what if SIG7 was not set properly by the CPU? Then the RSP would not start working and SIG6 would also never be set.
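The handshake and the failure case described above can be sketched in a few lines of Python. This is a toy model and not Libdragon code: the class, the function names and the polling limit are made up for illustration, and the bit positions are simply "SIG n = bit n", which is not how the bits are laid out in the real SP_STATUS register.

```python
SIG5 = 1 << 5  # RSP: finished a buffer of the highpri queue
SIG6 = 1 << 6  # RSP: finished a buffer of the lowpri queue
SIG7 = 1 << 7  # CPU: request the RSP to do some work

class StatusRegister:
    """Toy stand-in for the SIG bits; bit positions are illustrative only."""
    def __init__(self):
        self.sig = 0
    def set(self, mask):
        self.sig |= mask
    def clear(self, mask):
        self.sig &= ~mask
    def test(self, mask):
        return bool(self.sig & mask)

def rsp_step(status, high_priority=False):
    """RSP side: if work was requested, do it and signal completion."""
    if not status.test(SIG7):
        return False                      # nothing requested, RSP stays asleep
    status.clear(SIG7)
    status.set(SIG5 if high_priority else SIG6)
    return True

def cpu_wait_for_buffer(status, run_rsp, max_polls=1000):
    """CPU side: poll until SIG5 or SIG6 shows up, or give up (timeout)."""
    for _ in range(max_polls):
        run_rsp(status)
        if status.test(SIG5 | SIG6):
            return True
    return False

ok = StatusRegister()
ok.set(SIG7)                              # request arrives as intended
print(cpu_wait_for_buffer(ok, rsp_step))  # True

lost = StatusRegister()                   # the SIG7 write never arrived
print(cpu_wait_for_buffer(lost, rsp_step))  # False, i.e. a timeout
```

From the CPU side both failure cases look identical: the wait simply times out, which is what makes it so hard to tell which processor is at fault.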

This is a serious problem now, because it leaves us in a situation where either the RSP or the CPU could execute wrong code or have any kind of issue that leads to the problem, and we have no chance to rule one out.

After some hours of debugging both the CPU and the RSP with no result at all, I took a look at the register description in the VHDL code that allows writing these SIG bits. Both CPU and RSP can read and write this register, so of course simultaneous access would be an issue. That's why I already added logic for it some months ago: the RSP has priority and can always read and write a register. The CPU reads and writes them only when the RSP currently doesn't:

If you want to try it yourself: stop reading here and find the mistake in it. In this form it's possible to find it. The key is to assume simultaneous access.

To explain this more: the CPU read and write request is buffered. That means it will not be executed directly, but instead kept until the RSP doesn't access the RSP registers. Once there is a free slot, which will always happen as it's impossible for the RSP to access the registers every clock cycle, the CPU will do its read or write and clear the buffer as it is done.

What will happen if there is a CPU read from a register while the RSP writes it? Even that is not an issue. The CPU will still read the old data, but that is allowed behavior, as those two processors are not synced in any way, so their programs must be stable enough to allow that, for example with repeated reads until a SIG is set.

But there is one important issue with the concept I made: while the read and write paths are separated, so they could work in parallel, the address path is not. What happens now if the CPU wants to write register A while the RSP reads register B? The CPU can do its work, as there is no write from the RSP, and the RSP can always do its work anyway. However, there is only one address. So instead of writing register A, the CPU writes to register B, as the RSP part has priority.

This is very serious for two reasons:

- the write lands in register B, which could be anything else in the RSP, potentially starting a DMA that overwrites the RSP code.

- the write to register A could have been important, e.g. the SIG7 request we have seen above. When this write is lost, the whole communication will hang forever.

The solution to this issue is relatively simple: don't let the CPU access the registers while the RSP is reading OR writing, instead of allowing a CPU write during an RSP read and a CPU read during an RSP write. This way the access is always safe.
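The bug and the fix can be reproduced with a small Python model. This is not the VHDL from the core, just a sketch of the behavior described above under a few assumptions: two registers named A and B, a single buffered CPU write, one shared address that the RSP drives whenever it accesses a register, and a `fixed` switch that selects between the old and the new arbitration rule. All names are made up.

```python
class RspRegisterFile:
    """Toy model: separate read/write paths, but only ONE address path."""
    def __init__(self, fixed):
        self.fixed = fixed
        self.regs = {"A": 0, "B": 0}
        self.pending_cpu_write = None     # buffered CPU request: (addr, value)

    def cpu_write(self, addr, value):
        # CPU requests are never executed directly, only buffered.
        self.pending_cpu_write = (addr, value)

    def clock(self, rsp_read=None, rsp_write=None):
        """One clock cycle. rsp_read is an address, rsp_write is (addr, value)."""
        # The RSP has priority, so its address wins on the shared address path.
        rsp_addr = rsp_write[0] if rsp_write else rsp_read
        if rsp_write:
            self.regs[rsp_write[0]] = rsp_write[1]
        if self.pending_cpu_write is None:
            return
        cpu_addr, value = self.pending_cpu_write
        if self.fixed:
            # New rule: any RSP access, read OR write, keeps the CPU waiting.
            rsp_busy = rsp_read is not None or rsp_write is not None
        else:
            # Old rule: only an RSP write blocks a CPU write.
            rsp_busy = rsp_write is not None
        if rsp_busy:
            return                        # request stays in the buffer
        effective_addr = rsp_addr if rsp_addr is not None else cpu_addr
        self.regs[effective_addr] = value
        self.pending_cpu_write = None

buggy = RspRegisterFile(fixed=False)
buggy.cpu_write("A", 1)
buggy.clock(rsp_read="B")                 # CPU write collides with an RSP read
print(buggy.regs)                         # {'A': 0, 'B': 1}

fixed = RspRegisterFile(fixed=True)
fixed.cpu_write("A", 1)
fixed.clock(rsp_read="B")                 # CPU has to wait this cycle
fixed.clock()                             # free slot, write goes through
print(fixed.regs)                         # {'A': 1, 'B': 0}
```

With the old rule the value meant for A ends up in B and the write to A is lost. With the new rule the CPU write is only delayed by one cycle.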

Now that this bug is fixed, the Libdragon homebrew games should work, right?

Unfortunately that is not the case. Despite all the good theory about how all this works, it didn't help. Maybe they crash less often, but they still crash randomly.

So only a nice fix for some edge case that doesn't matter?

Not at all!

- Booting up Mario 64 and the sound pops were gone. Running around for 5 minutes in Bob-omb Battlefield and still no audio pop, crash or error flag.

- Running Wave Race attract mode for 20 minutes, still working.

- Harvest Moon intro sequence plays fully without a hang.

- Majora's Mask music doesn't stop anymore when the framerate falls below the 20fps target.

I was blown away. This was one of the most important bugs in the core at that point, if not THE most important. Mostly because of its random behavior, which made it nearly impossible to research and debug. I feared it could take me a very long time to find the reason, and then it kind of falls into my lap with a homebrew game that is not even fixed afterwards.

I really hope you enjoy that as much as I do. Stable running games are one of the most impressive results when I play on any of the cores. After some time I sometimes ask myself how many billions of instructions have passed through the core while it's still working and hasn't crashed. A human cannot grasp anymore what these systems do underneath when they "just work", but it makes me happy to even try to imagine it.

Have fun!