FPGAzumSpass

100% TLB and 100% booting (Patreon)

Published:

2023-12-09 13:39:12

Downloads

Download N64_20231208.zip

Hi everyone,

The core reached two major milestones before the end of the year, so let's take a look at how they were reached and where the core stands now.

100% booting

In the last week, all of the games that previously didn't boot could be fixed and now show signs of life. These include Battletanx, Indiana Jones, Star Wars: Battle for Naboo, Fifa and WaveRace Shindou.

100% booting means that every official game shows at least something. A few games might still crash in-game, but all should be at least somewhat playable, with the exception of Jet Force Gemini, which doesn't get through the intro cutscene yet. This is a huge step for the core, even though it doesn't mean the core is close to completion.

The improvements are partly due to the newly added TLB functionality and partly due to handling two anti-piracy methods.

Anti-piracy - the simple method

A little disclaimer beforehand: I don't know the source code of these games or the developers' true intentions, but the following is what makes the most sense to me given what happens.

Some games, like Battletanx and Indiana Jones, check that the CPU's internal counter is within a specific value range; otherwise they don't boot.

To understand this better, we need to look at the boot process, at least in simplified form:

- The N64 CPU boots from the PIF ROM, which is a small BIOS-like memory

- Because the PIF ROM is very slow, the CPU copies the boot program to RSP memory and runs it from there

- Several hardware parts are now initialized, e.g. the CPU counter is reset to zero and the RDRAM access is calibrated

- The boot program then checks the ROM header and copies the first megabyte of the ROM into RDRAM

- The boot program jumps into RDRAM and the game begins its own, non-BIOS boot process

As you can see, these first stages take a certain amount of time. So when the game takes over and checks the CPU's internal counter, the counter has already ticked up quite a bit. On a real console with a real game cartridge, this counter value after booting is relatively stable.

Now consider what would happen if you boot the same game from a fake cartridge: if the cartridge doesn't have about the same timing, the copy process takes a different amount of time, the counter ends up at a different value, and the game doesn't work.

What if you use a flash cart? Without countermeasures, the flash cart menu would already be running from RDRAM, and loading a game would not go through the boot process again, leading to completely wrong counter values.

So it makes perfect sense that checking the counter value right after booting, and freezing the game without any message if the value is off, was meant as a method against piracy rather than being a required feature.
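To make the check concrete, here is a minimal Python model of such a game-side test. It assumes the counter advances once every two CPU cycles (93.75 MHz CPU clock), and the window bounds `COUNT_MIN`/`COUNT_MAX` as well as the boot durations are made-up placeholders, not values taken from any game.

```python
# Assumption: the counter advances once every two CPU cycles
# (93.75 MHz CPU clock -> 46.875 MHz counter rate).
CPU_CLOCK_HZ = 93_750_000
COUNT_RATE_HZ = CPU_CLOCK_HZ // 2

# Hypothetical acceptance window; real games use their own bounds.
COUNT_MIN = 4_000_000
COUNT_MAX = 6_000_000


def count_after(elapsed_s: float, start: int = 0) -> int:
    """Counter value `elapsed_s` seconds after it was set to `start` (32-bit wrap)."""
    return (start + int(elapsed_s * COUNT_RATE_HZ)) & 0xFFFF_FFFF


def game_accepts(count: int) -> bool:
    """The game's check: outside the window it just freezes without a message."""
    return COUNT_MIN <= count <= COUNT_MAX


# Hypothetical boot durations measured from the counter reset.
print(game_accepts(count_after(0.100)))  # True:  original cartridge, normal boot
print(game_accepts(count_after(0.050)))  # False: boot finished too fast
print(game_accepts(count_after(30.0)))   # False: flash cart menu ran for 30 s
```

The numbers only serve to show that the window is narrow compared to how far the counter moves when the boot timing changes.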

Anti-piracy - fixed

Now you might assume that if the core were just accurate enough in its ROM timing and other timings, the counter would land in the required range after booting. Unfortunately, there are parts that we cannot really emulate well.

The RDRAM calibration mentioned above is the key issue here: we don't have RDRAM on the DE10-Nano. Instead, we use the DDR3 memory to behave like RDRAM. But that means we cannot go through the hardware calibration phase. For us it's plain and simple: the emulated RDRAM is already calibrated, so calibration takes no time at all.

To fake this time for the games that require it, we would have to advance the CPU counter by the amount of time the RDRAM calibration would take. But I went a cheaper way here:

A few instructions before the RDRAM would be calibrated, the boot process resets the CPU counter to zero. This is also the only time in the boot process that the CPU counter is touched. Therefore the trick goes as follows:

After a reset, the first write to the counter does not set it to zero but to a different default value instead. That's it. With a different starting value, handpicked to match real hardware, the counter ends up in a good range for the games, and that is all we really need here.
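The trick can be sketched as a small Python model of a counter register. `CountRegister` and `BOOT_COUNT_DEFAULT` are hypothetical names, and the default value is a placeholder, since the handpicked value used in the core is not stated here.

```python
class CountRegister:
    """Sketch of a CPU counter whose first write after reset is overridden.

    BOOT_COUNT_DEFAULT is a hypothetical placeholder, not the core's real value.
    """

    BOOT_COUNT_DEFAULT = 0x0010_0000  # hypothetical

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.value = 0
        self.first_write_pending = True

    def write(self, new_value: int) -> None:
        if self.first_write_pending:
            # The boot program's "reset counter to zero" ends up here: load the
            # default instead, standing in for the skipped RDRAM calibration time.
            self.value = self.BOOT_COUNT_DEFAULT
            self.first_write_pending = False
        else:
            # Every later write behaves normally (32-bit register).
            self.value = new_value & 0xFFFF_FFFF

    def tick(self, steps: int = 1) -> None:
        self.value = (self.value + steps) & 0xFFFF_FFFF


count = CountRegister()
count.write(0)             # boot code clears the counter...
print(hex(count.value))    # 0x100000
count.write(0)             # ...but a later write from the game works as usual
print(count.value)         # 0
```

Keying the override on the first write, rather than on the value written, keeps it invisible to anything the game does afterwards.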

Anti-piracy - the hardware method

N64 cartridges have CIC chips that are used for anti-piracy. The typical, simple ones are only checked during the boot stage. They are a problem for flash carts, because flash carts run on the original console and must meet the requirements to pass these checks. For us they are no big deal, as our PIF implementation will never tell the CPU that a cartridge (or ROM) didn't pass the check... why should it?

Then there is one special CIC chip, the 6105. This one can be used to check at runtime whether a cartridge is original. But that's not even the worst part. This check is not handled by the PIF simply returning pass or fail; instead, data is exchanged between the CPU and the CIC, which means we cannot simply fake a pass. The core needs to implement the CIC's algorithm to provide a valid response to the CPU.

The method works as follows:

- The CPU writes 30 4-bit values to PIF RAM and requests the PIF to initiate the CIC check

- The PIF sends these 4-bit "nibbles" to the CIC, where they are processed

- The CIC sends back the same number of transformed response nibbles, and they are saved to PIF RAM

- The CPU reads the values and checks them against its own expectations

The transformation itself is relatively simple... at least today, when it's well documented in basically every N64 emulator. This also meant the implementation in the core was relatively straightforward. Since the logic is only 4 bits wide, it is also extremely small: it costs only 15 ALMs, or 0.04% of the available FPGA logic.
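For reference, here is a Python port of the 6105 challenge/response routine as it circulates in open-source emulators (e.g. the `n64_cic_nus_6105` function in mupen64plus). It is an illustration, not the core's HDL, and it only covers the nibble transformation, not how the nibbles are packed into PIF RAM; the name `cic_6105_response` is illustrative. Each step needs only 4-bit arithmetic and two 16-entry tables, which is why the logic is so small.

```python
# Lookup tables from the publicly documented 6105 routine.
LUT0 = (0x4, 0x7, 0xA, 0x7, 0xE, 0x5, 0xE, 0x1,
        0xC, 0xF, 0x8, 0xF, 0x6, 0x3, 0x6, 0x9)
LUT1 = (0x4, 0x1, 0xA, 0x7, 0xE, 0x5, 0xE, 0x1,
        0xC, 0x9, 0x8, 0x5, 0x6, 0x3, 0xC, 0x9)


def cic_6105_response(challenge: list[int]) -> list[int]:
    """Transform challenge nibbles into response nibbles, one nibble at a time."""
    key = 0xB
    lut = LUT0
    response = []
    for nibble in challenge:
        r = (key + 5 * nibble) & 0xF
        response.append(r)
        key = lut[r]                      # next key comes from the current table
        sgn = (r >> 3) & 0x1
        mag = (~r if sgn else r) & 0x7
        use_lut1 = sgn if mag % 3 == 1 else 1 - sgn
        if lut is LUT1 and r in (0x1, 0x9):
            use_lut1 = 1
        if lut is LUT1 and r in (0xB, 0xE):
            use_lut1 = 0
        lut = LUT1 if use_lut1 else LUT0  # table for the next nibble
    return response


print(cic_6105_response([0, 0]))  # [11, 15]
```

The CPU side would pass all 30 challenge nibbles in one call and compare the 30 returned nibbles with its own expectation.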

The last TLB subfeature

I kept the most difficult one for last: hooking up the TLB-translated addresses to the CPU data cache. Until yesterday, every data read or write using virtual addresses had to go directly to RAM instead of being able to use the data cache. Why was that so difficult? Let's take a look at how the data cache works:

The data cache consists of 512 lines with 128 bits in each line. Each of these lines holds additional information in the form of a tag and status bits. The tag tells you which part of the RAM is stored in this cache line.
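In binary, this geometry determines how an address is split up. The following Python sketch assumes a direct-mapped layout (one line per index), as the simplified example implies; `split_address` is a hypothetical helper, and the real tag stored by the hardware may be wider than shown.

```python
# Assumed geometry: 512 lines x 128 bits (16 bytes), direct-mapped -> 8 KiB total.
LINE_BYTES = 16
NUM_LINES = 512
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4
INDEX_BITS = NUM_LINES.bit_length() - 1     # 9


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, line index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(split_address(0x0001_2345))  # (9, 52, 5)
```

The lowest 4 bits pick a byte inside the 16-byte line, the next 9 bits pick one of the 512 lines, and everything above is what has to match the stored tag.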

A simplified example, using decimal digits where the last two digits select the cache line and the first two form the tag: we want to read from address 1234.

- the cache will check what tag is saved in cache line 34

- if it is 12, it can deliver us the value saved in the cache

- if it is something else, it has to fetch the line from RAM, update the tag and deliver the new data
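The three steps above map directly onto a toy direct-mapped cache. This Python sketch keeps the decimal split of the example (last two digits select the line, the rest is the tag) and stores one value per line instead of 128 bits; `ToyCache` is a hypothetical name.

```python
from typing import Dict, List, Optional


class ToyCache:
    """Decimal toy version of a direct-mapped cache."""

    def __init__(self, ram: Dict[int, str], num_lines: int = 100) -> None:
        self.ram = ram
        self.num_lines = num_lines
        self.tags: List[Optional[int]] = [None] * num_lines
        self.data: List[Optional[str]] = [None] * num_lines
        self.misses = 0

    def read(self, addr: int) -> str:
        index = addr % self.num_lines   # 1234 -> line 34
        tag = addr // self.num_lines    # 1234 -> tag 12
        if self.tags[index] != tag:
            # Miss: fetch from RAM, update the tag, deliver the new data.
            self.misses += 1
            self.data[index] = self.ram[addr]
            self.tags[index] = tag
        return self.data[index]


ram = {1234: "A", 5634: "B"}
cache = ToyCache(ram)
cache.read(1234)
cache.read(1234)      # hit: line 34 already holds tag 12
print(cache.misses)   # 1
cache.read(5634)      # same line 34, different tag -> replaces address 1234
print(cache.misses)   # 2
```

Here a single address drives everything: the line selection, the tag comparison and the RAM access.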

This method is straightforward and there is not much to consider, because the same address is used to select the cache line, to check the tag and to access memory. With TLB, it gets more complicated:

Let's assume our address is still 1234, but now it's a virtual address used only inside the CPU, which translates to the physical address 5634 used with the RAM.

- we would now read the cache line using the virtual address 1234

- but we need to check if the tag is correct using the physical address 5634

This concept is called:

Virtually indexed, physically tagged

Why would you do that? Wouldn't it be much easier if we just translate the virtual address to a physical address and use only that for the cache?

Yes, indeed, this would simplify the logic quite a bit, but it has a major drawback: the CPU could not be clocked as high as before. The reason this more complex method is used is to do two things in parallel: while we read the cache using the virtual address, we translate that address to a physical address at the same time. With a purely physically indexed and tagged cache, both steps would have to happen one after the other.
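A toy Python version of the virtually indexed, physically tagged lookup, using the 1234 to 5634 example: the line is selected with the virtual address, while the tag check and the RAM access use the translated physical address. `ToyVIPTCache` and the one-entry `translate` mapping are hypothetical, and Python runs the steps sequentially where the hardware runs the first two in parallel.

```python
from typing import Callable, Dict, List, Optional


class ToyVIPTCache:
    """Toy virtually indexed, physically tagged cache (decimal, one value per line)."""

    def __init__(self, ram: Dict[int, str], num_lines: int = 100) -> None:
        self.ram = ram
        self.num_lines = num_lines
        self.tags: List[Optional[int]] = [None] * num_lines
        self.data: List[Optional[str]] = [None] * num_lines
        self.misses = 0

    def read(self, vaddr: int, translate: Callable[[int], int]) -> str:
        index = vaddr % self.num_lines     # select the line with the virtual address
        paddr = translate(vaddr)           # hardware: done in parallel with the line read
        ptag = paddr // self.num_lines     # the tag check uses the physical address
        if self.tags[index] != ptag:
            self.misses += 1
            self.data[index] = self.ram[paddr]  # RAM only ever sees physical addresses
            self.tags[index] = ptag
        return self.data[index]


def translate(vaddr: int) -> int:
    """Hypothetical one-entry mapping: virtual page 12 -> physical page 56.
    The last two digits (the page offset) pass through untranslated."""
    page_map = {12: 56}
    return page_map[vaddr // 100] * 100 + vaddr % 100


cache = ToyVIPTCache({5634: "B"})
print(cache.read(1234, translate))  # B
print(cache.tags[34])               # 56
```

Because the translation in this toy leaves the last two digits untouched, the line chosen by the virtual address is the same line the physical address would choose, which is what allows the line read to start before the translation has finished.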

Implementing it this way required some rework of the data cache itself, because the original design didn't account for it. That was a deliberate choice, not a mistake. For critical parts of the system, I prefer to keep them simple at first and redesign them later instead of implementing the complicated version directly. This reduces the effort to get things running, and the whole system can be validated without having to debug fully featured, complex modules.

TLB done or not?

In theory, this completes the TLB. All planned and needed functionality is implemented. The only missing part is the TLB in 64-bit addressing mode, which is not used by any game. The only known application for it is a Linux port for the N64.

However, some games like Conker or Gauntlet still freeze randomly. As this hits games using the TLB more often than other games, it might be related, and I need to investigate further. So while the TLB is complete, that probably doesn't mean it's bug-free.

That's it for today. A new build is attached.

Next week I will give you a full overview of the project progress and plans for the next year.

Have fun!