
Hi Everyone,

I spent most of the 10 days since the last post on the TLB. Implementing it in hardware, where everything happens in parallel, is more difficult than in software, but I made good progress:

- several new games are now playable, for example Goldeneye, Perfect Dark, Re-Volt, Shadowman and Turok 2.

- more features are working now, most notably the instruction cache, which now works together with TLB-translated addresses.

- over 10 TLB-related bugs in the CPU have been fixed.

To make it interesting, I will not just list all the fixes here, but instead present one of them in detail: the last fix, which made Goldeneye run stably.


Stop ERET from resetting the error/exception level bits when exceptions is raised in same cycle

This is the GitHub commit message for the fix. It is probably not self-explanatory, so let's try to make it clearer.

Exceptions in the MIPS CPU are very complex in detail. When one triggers, the program flow is stopped and the exception handler takes over, no matter what the CPU was doing at that point in time. To be able to continue the program flow later on, all important information has to be saved immediately and in parallel. It is stored in the Coprocessor 0 (COP0) registers, which the CPU can read out and analyze.
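
To make this more concrete, here is a minimal Python sketch of what happens to COP0 when an exception is taken. It is a simplified model, not the core's VHDL: the class and function names are made up for illustration, branch delay slots are ignored, and only three pieces of state are modeled (the Status.EXL bit, EPC and the ExcCode field of the Cause register). In hardware all of these updates happen in the same clock cycle; the sequential Python code only describes the end result.

```python
from dataclasses import dataclass, field


@dataclass
class Cop0:
    status_exl: int = 0      # Status.EXL: 1 while the CPU runs in exception mode
    epc: int = 0             # EPC: address where the program flow resumes later
    cause_exc_code: int = 0  # Cause.ExcCode: which exception happened


@dataclass
class Cpu:
    pc: int
    cop0: Cop0 = field(default_factory=Cop0)


def take_exception(cpu: Cpu, exc_code: int, vector: int) -> None:
    """Save the state needed to resume later and jump to the exception handler."""
    if not cpu.cop0.status_exl:
        # EPC is only written when not already in exception mode,
        # otherwise the original return address would be lost
        cpu.cop0.epc = cpu.pc
    cpu.cop0.cause_exc_code = exc_code
    cpu.cop0.status_exl = 1
    cpu.pc = vector


# Example values: ExcCode 8 is SYSCALL, 0x80000180 is the general exception vector
cpu = Cpu(pc=0x80001000)
take_exception(cpu, exc_code=8, vector=0x80000180)
print(hex(cpu.cop0.epc), cpu.cop0.status_exl, hex(cpu.pc))
```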

ERET is a CPU instruction and short for "Exception Return". Typical CPU instructions you might have heard of are ADD, SUB or JUMP. Each instruction can work on some registers, change the program flow or do other special things. On the surface, ERET is not that special: it jumps back from an exception handler to the normal program flow and changes the exception status in COP0 to signal that the program is no longer in the exception handler.
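
A minimal Python sketch of ERET, again a simplified model with hypothetical names rather than the core's code. On MIPS III CPUs such as the N64's VR4300, ERET returns to ErrorEPC and clears the error level bit (ERL) if it is set; otherwise it returns to EPC and clears the exception level bit (EXL). These are the two bits named in the commit text. Side effects such as clearing the LL bit are left out.

```python
from dataclasses import dataclass


@dataclass
class Cop0:
    status_exl: int = 0  # exception level
    status_erl: int = 0  # error level (reset, NMI, cache error)
    epc: int = 0
    error_epc: int = 0


def eret(cop0: Cop0) -> int:
    """Leave the handler: clear the active level bit and return the resume address."""
    if cop0.status_erl:
        cop0.status_erl = 0
        return cop0.error_epc
    cop0.status_exl = 0
    return cop0.epc


cop0 = Cop0(status_exl=1, epc=0x80001000)
resume_pc = eret(cop0)
print(hex(resume_pc), cop0.status_exl)
```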

So we now have one instruction that switches exception mode off, while exceptions switch it on. What if both happen in the same clock cycle? The answer is simple: the exception wins, because exceptions always take priority over the normal program flow. The exception mode must therefore remain set in the COP0 registers.

That is what was not working in the CPU. But why?


Parallel processing and VHDL code

Let's compare how code is written for an FPGA and for software. We start with something very simple in software:

a = a + 1

a = a + 2

The processor executes both instructions one after the other. The first increases a by 1 and the second by another 2, so a is increased by 3 in total.

In a clocked process in VHDL this is different:

a <= a + 1;

a <= a + 2;

Note that the "=" is replaced by a "<=" signal assignment. The first assignment could be translated as:

"in the next clock cycle, a will have the value of a + 1"

The second assignment could be translated as:

"in the next clock cycle, a will have the value of a + 2"

However, the two are not combined: a is not updated during the current clock cycle, so both assignments read the same old value and only one of them can take effect. This is well defined: the last assignment inside a process wins, so the first statement is ignored and a will be a + 2 in the next clock cycle.
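
The difference can be modeled in a few lines of Python. The `Signal` class below is a hypothetical, heavily simplified stand-in for a VHDL signal inside one clocked process: assignments only schedule a value, reads always see the value from the start of the cycle, and the last scheduled value is applied at the clock edge. Real VHDL simulators do much more (delta cycles, multiple drivers), which this sketch ignores.

```python
class Signal:
    """Minimal model of a VHDL signal: '<=' schedules a value for the next clock edge."""

    def __init__(self, value: int):
        self.value = value  # current value, visible during the whole cycle
        self._next = None   # value scheduled by the last '<=' in this cycle

    def assign(self, new_value: int) -> None:
        # A later assignment in the same process simply overwrites this one
        self._next = new_value

    def clock_edge(self) -> None:
        # Without any assignment the signal keeps its value
        if self._next is not None:
            self.value = self._next
            self._next = None


def software(a: int) -> int:
    a = a + 1  # executed first
    a = a + 2  # sees the already updated value
    return a


def clocked_process(a: Signal) -> None:
    a.assign(a.value + 1)  # reads the old value
    a.assign(a.value + 2)  # still reads the old value and overrides the line above


a = Signal(0)
clocked_process(a)
a.clock_edge()
print(software(0), a.value)
```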


In the core's CPU, there were two conditions that modify the COP0 exception bit:

When some exception occurs, set it to 1

When ERET occurs, set it to 0

Because the ERET condition was placed after the exception handling, it got priority whenever both happened in the same cycle. The lesson: actions with higher priority have to be placed later in the code.

This doesn't sound TLB-related, so why didn't it occur before?
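
The bug and the fix can be sketched in Python as two versions of the next-state logic for the exception bit. This is not the core's actual VHDL: the function names are hypothetical, and each `if` stands for one assignment in the clocked process, where the last one that fires wins.

```python
def next_exl_buggy(exl: int, exception: bool, eret: bool) -> int:
    """Original order: the ERET assignment comes last and wins."""
    new_exl = exl  # without any assignment, the bit keeps its value
    if exception:
        new_exl = 1
    if eret:
        new_exl = 0
    return new_exl


def next_exl_fixed(exl: int, exception: bool, eret: bool) -> int:
    """Fixed order: the exception has higher priority, so its assignment comes last."""
    new_exl = exl
    if eret:
        new_exl = 0
    if exception:
        new_exl = 1
    return new_exl


# ERET and an exception in the same cycle, while already in exception mode
print(next_exl_buggy(1, exception=True, eret=True))  # 0: CPU wrongly leaves exception mode
print(next_exl_fixed(1, exception=True, eret=True))  # 1: exception mode stays set
```

When only one of the two events occurs, both versions behave identically, which is why such a bug can stay hidden until both events finally coincide.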


CPU pipelining

The MIPS CPU has a 5-stage pipeline.

This means that 5 instructions are "in flight" at any time. While one instruction is being calculated in stage 3, the next one is already being decoded in stage 2 and the one after that is being fetched in stage 1.

It also means that while an instruction is executed, the following ones must already be read from memory or cache, and this is where the TLB comes into play.
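
The overlap can be visualized with a small Python sketch that lists which instruction sits in which stage during a given cycle. It is an idealized model that assumes one instruction enters the pipeline per clock cycle and that there are no stalls; stages are simply numbered 1 (fetch) to 5, and the function name is hypothetical.

```python
from typing import List, Optional

STAGES = 5


def pipeline_snapshot(instructions: List[str], cycle: int) -> List[Optional[str]]:
    """Return the instruction in each stage during `cycle` (index 0 = stage 1, fetch)."""
    snapshot: List[Optional[str]] = []
    for stage in range(STAGES):
        index = cycle - stage  # an instruction advances one stage per cycle
        snapshot.append(instructions[index] if 0 <= index < len(instructions) else None)
    return snapshot


program = ["i0", "i1", "i2", "i3", "i4", "i5"]
for cycle in range(len(program) + STAGES - 1):
    print(cycle, pipeline_snapshot(program, cycle))
```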

Let's assume a very simple TLB mapping that maps a 4 KByte page of memory from the address area 0x0000..0x0FFF to the address area 0x10000..0x10FFF. So if the game wants to execute code from address 0x40, the code is fetched from memory at address 0x10040. That is simple and there is nothing special to consider here.
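
The translation from this example can be written as a short Python sketch. It assumes a hypothetical TLB with a single entry and 4 KByte pages, so the lower 12 bits of an address are the offset inside the page and the upper bits select the page. Real MIPS TLB entries are more involved (each entry maps a pair of pages and also carries ASID and permission bits), which is left out here.

```python
PAGE_SHIFT = 12                    # 4 KByte pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical TLB with one entry: virtual page 0x0 -> physical page 0x10
tlb = {0x0: 0x10}


class TlbMiss(Exception):
    """Raised when no TLB entry covers the virtual address."""


def translate(vaddr: int) -> int:
    vpn = vaddr >> PAGE_SHIFT      # virtual page number
    offset = vaddr & PAGE_MASK     # offset inside the page is not translated
    if vpn not in tlb:
        raise TlbMiss(hex(vaddr))
    return (tlb[vpn] << PAGE_SHIFT) | offset


print(hex(translate(0x40)))  # 0x10040
```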

Now we write a super-efficient exception handler and want it to fit exactly into the 4 KByte area, so that the whole handler needs only one TLB entry. We place the ERET instruction at the last possible address, 0xFFC..0xFFF, because every instruction is 4 bytes in size. Then we execute that code.

What happens is this: while the ERET instruction itself lies inside the TLB-mapped page, the instruction following it is fetched automatically. When ERET is in the instruction decode stage, the next instruction must already be fetched to keep the pipeline filled. Because ERET is still being decoded, the CPU does not know yet that the next instruction will never be needed.

So the next address, 0x1000, has to be translated, and since there is no TLB entry for this page, a TLB miss exception is triggered. As a result, ERET and a TLB miss exception occur in the same clock cycle.
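
The problematic cycle can be sketched in Python as well. The model is hypothetical and simplified: it only checks whether the prefetched address falls into a page without a TLB entry, with virtual page 0 as the only mapped page as in the example mapping. ERET has no branch delay slot on MIPS III, so the prefetched instruction would be discarded anyway, but its address is already being translated in the fetch stage.

```python
PAGE_SHIFT = 12        # 4 KByte pages
MAPPED_PAGES = {0x0}   # hypothetical: only 0x0000..0x0FFF has a TLB entry
INSTRUCTION_SIZE = 4


def fetch_misses_tlb(vaddr: int) -> bool:
    return (vaddr >> PAGE_SHIFT) not in MAPPED_PAGES


def events_in_cycle(decode_is_eret: bool, fetch_addr: int) -> dict:
    """Events raised in one clock cycle by the decode and fetch stages."""
    return {"eret": decode_is_eret, "tlb_miss": fetch_misses_tlb(fetch_addr)}


eret_addr = 0xFFC                              # last instruction slot in the page
prefetch_addr = eret_addr + INSTRUCTION_SIZE   # 0x1000, already in the next page
print(events_in_cycle(decode_is_eret=True, fetch_addr=prefetch_addr))
```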

Why didn't that happen with other exceptions before?


Exceptions have been rare in the past

First of all, exceptions are, as the name says, exceptional. They should not happen all the time. In fact, most of them should never happen at all, because the program doesn't even know how to handle them. When they do happen, the program often ends up crashing.

Some games, like Paper Mario, have a handler that at least gives the developers some hints about what happened and where, but most games simply hang.

Until now, only 2 kinds of exceptions occurred in normal program flow, and in neither case is another instruction executed in parallel:

- interrupts replace the instruction that would normally execute with an interrupt request

- the SYSCALL instruction, the opposite of ERET, moves the CPU into the exception handler

When they happened, it was impossible for ERET or any other instruction to be executed in parallel, so the issue never occurred and the bug could lie dormant in the VHDL code for months without being detected.


With all those bugs resolved, is the TLB done? Unfortunately not yet:

- TLB-translated memory addresses still cannot use the CPU's data cache

- there are some bugs left that make Star Wars Battle for Naboo crash early and Conker crash randomly

The first point is mostly routine work, and I will do it next. The second is (hopefully) the final TLB debugging step after that.


A new core with all the recent fixes is attached as always.

Have fun!

Comments

Anonymous

Great work and thank you!

Anonymous

I must say I'm really impressed. Mario Tennis 64 is nearly perfect (just a few visual glitches with the ball's drag), but that's nothing compared to the overall experience, which has never been this good under emulation. Seriously, well done! I really did not expect the core to reach this level of quality so early. Thanks again!