FPGAzumSpass

First RSP steps done (Patreon)

Published:

2023-07-22 11:40:25

Imported:

2023-12

Content

Hi Everyone,

Let's look together at the progress since the last post and the plan for the next weeks. This time we will also look at the RSP in detail, to understand what the progress even means.

First let's sum up the changes, then go into detail:

- EEPROM handling was implemented. That means games like Mario64 or Mario Kart can now initialize the EEPROM at boot time and read/write data. Without that, these games would never boot, so it was important to have that ready. The EEPROM content is not yet saved to the SD card; that is a step for later on, once the games work.

- RSP DMA was implemented. This is used to load instructions/data into the RSP to prepare its processing. At a high level, that means that games can now load microcode and, for example, polygon data that will be processed by the RSP.

- The scalar RSP processing pipeline is fully working. This is the normal CPU-like part of the RSP that can do what a typical 32-bit CPU could do: execute code, logical operations, jumps, memory load/store, ...

You can see an overview of the RSP parts in the image at the top. We will go into detail on each of the blocks and its status in the core.

But first: what even is the RSP?

RSP stands for Reality Signal Processor and it acts as a coprocessor in the N64. Coprocessor sounds like it would only support the CPU, and that is what it really does. However, it is not comparable to a typical coprocessor such as the GTE math coprocessor in the PSX.

Instead, the RSP is a full 32-bit MIPS CPU running at 62.5 MHz with its own instruction and data memory. Or in short: it's like a PSX CPU running at about double the clock speed.

Furthermore, it has its own coprocessor-like vector unit, which can apply the same operation to 8 different data sets in the same clock cycle. For example, it can do 8 ADDs in a single clock cycle, in addition to the operation the base processor (called the scalar unit) can do.

Yes, the vector and scalar units can run in parallel, making it super fast. Even more so because the instruction and data memories have no delays. The RSP does not have to stall at all when accessing them, so it behaves as if all instructions and data were always cached.

Let's look in detail:

- IMEM:

The CPU can write code to be executed by the RSP into the IMEM (Instruction Memory). As there is some access delay when the CPU writes into IMEM, there is also a DMA available that can be triggered to load data from the main RAM (RDRAM) at full speed (64 bit per cycle).

Status: The IMEM is fully implemented.
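To make the "64 bit per cycle" number a bit more tangible, here is a minimal Python sketch of such a DMA transfer. It is only a model for illustration, not the code of the core: the function name `dma_to_imem`, the 4 KB IMEM size and the rule that the length must be a multiple of 8 bytes are assumptions of this sketch.

```python
IMEM_SIZE = 4096        # assumption of this sketch: 4 KB of instruction memory
BYTES_PER_CYCLE = 8     # 64 bit per cycle, as described above


def dma_to_imem(rdram, src, imem, dst, length):
    """Copy `length` bytes from RDRAM into IMEM and return the cycles used."""
    # Assumption: transfers are done in whole 64-bit words.
    if length % BYTES_PER_CYCLE != 0:
        raise ValueError("length must be a multiple of 8 bytes")
    if src < 0 or src + length > len(rdram):
        raise ValueError("source range is outside of RDRAM")
    if dst < 0 or dst + length > len(imem):
        raise ValueError("destination range is outside of IMEM")

    cycles = 0
    for offset in range(0, length, BYTES_PER_CYCLE):
        # One 64-bit word is moved per cycle.
        word = rdram[src + offset:src + offset + BYTES_PER_CYCLE]
        imem[dst + offset:dst + offset + BYTES_PER_CYCLE] = word
        cycles += 1
    return cycles


# Hypothetical usage: load 64 bytes of "microcode" into an empty IMEM.
rdram = bytes(range(64))
imem = bytearray(IMEM_SIZE)
cycles = dma_to_imem(rdram, 0, imem, 0, 64)
```

In this model the transfer time depends only on the length, which is why the DMA is so much more attractive than single CPU writes with their access delay.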

- DMEM:

The same applies to data, which goes into the DMEM (Data Memory). Furthermore, this memory is also connected to the RDP, so processed data can be directly transferred to be rendered as e.g. triangles.

Status: The DMEM is mostly implemented, but the RDP connection is missing

- Processing:

While the RSP could work in an endless loop on some program, that's not how it is typically used. Instead, it is halted until program and data have been prepared by the CPU, and is then triggered to start working. Once the task is complete, it halts itself. When the RSP starts running, it decodes instructions from the IMEM. Depending on the instruction type, the next instruction is run in either the scalar or the vector part. If the next 2 instructions are mixed, i.e. one scalar and one vector type, both can run at the same time.

Status: processing is halfway implemented, but vector operation decoding and dual instruction handling are missing
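The pairing rule described above can be sketched in a few lines of Python. This is a simplified model and not the decoder of the core: it assumes that an instruction counts as "vector" when it uses the MIPS COP2 opcode (0x12) with bit 25 set, and it ignores any further restrictions the real hardware has. The names `is_vector` and `issue_cycles` and the two example instruction words are made up for this illustration.

```python
def is_vector(instr):
    """Assumption of this sketch: COP2 opcode with bit 25 set means vector."""
    opcode = (instr >> 26) & 0x3F
    return opcode == 0x12 and ((instr >> 25) & 1) == 1


def issue_cycles(program):
    """Count issue cycles when a mixed scalar/vector pair can run together."""
    cycles = 0
    i = 0
    while i < len(program):
        has_next = i + 1 < len(program)
        if has_next and is_vector(program[i]) != is_vector(program[i + 1]):
            i += 2      # one scalar + one vector instruction: both issue
        else:
            i += 1      # same type twice (or last instruction): issue one
        cycles += 1
    return cycles


SCALAR = 0x00221821     # a SPECIAL-opcode word, treated as scalar here
VECTOR = 0x4A000010     # a COP2 word with bit 25 set, treated as vector here

program = [SCALAR, VECTOR, SCALAR, SCALAR, VECTOR, VECTOR]
cycles = issue_cycles(program)
```

With this example program, the first two instructions pair up, the third runs alone because a scalar instruction follows it, the fourth and fifth pair up again, and the last one runs alone. That is 4 issue cycles for 6 instructions in this model.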

- Scalar Unit

This is your typical processor part that can execute ADD, SUB, shifts, branches and so on.

Status: mostly implemented. Exchange with the vector unit is missing

- Vector unit

Runs the same instruction on 8 data elements at once, e.g. 8 adds of 16 + 16 bit in 1 clock cycle. It has about 40 different operations, way more than the scalar unit, so it will be a lot of work.

Status: not started
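As a rough picture of what "8 adds with 16 + 16 bit" means, here is a small Python sketch. It is only a model with a plain wrapping add per lane; details of the real vector instructions such as carry, saturation or accumulator handling are left out on purpose. The name `vector_add` is made up for this example.

```python
LANES = 8           # 8 data elements per vector register
MASK16 = 0xFFFF     # each lane is 16 bit wide


def vector_add(a, b):
    """Add two 8-lane vectors lane by lane, wrapping each lane at 16 bit."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("both vectors need exactly 8 lanes")
    # In hardware all 8 additions happen in the same clock cycle;
    # here they are simply computed one after another.
    return [(x + y) & MASK16 for x, y in zip(a, b)]


result = vector_add([1, 2, 3, 4, 5, 6, 7, 0xFFFF], [1] * LANES)
```

The last lane shows the wraparound of this simplified model: 0xFFFF + 1 ends up as 0 because only 16 bit are kept per lane.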

- RSP/RDP Register access

The RSP is very special in the register access it has. It is obvious that it can control the RDP to some degree to start or modify rendering, but it is not so clear at first glance why it can also read and modify its own config registers, as this would allow it to do some stupid things like deactivating itself.

But the main idea behind it is that it can reload code and data from RDRAM to IMEM/DMEM via DMA. This way it can handle the complete processing, for example of a list of polygons, on its own without CPU interaction.

Status: RSP and RDP registers are implemented, but currently only the CPU can interact with them
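The idea of the RSP reloading its own data can be sketched like this in Python. It is a conceptual model only: the function `process_in_chunks`, the `work` callback and the 4 KB DMEM size are assumptions of this sketch, and a plain slice copy stands in for the DMA.

```python
DMEM_SIZE = 4096    # assumption of this sketch: 4 KB of data memory


def process_in_chunks(rdram, start, length, chunk_size, work):
    """Work through `length` bytes of RDRAM in DMEM-sized pieces."""
    if chunk_size <= 0 or chunk_size > DMEM_SIZE:
        raise ValueError("chunk must fit into DMEM")
    dmem = bytearray(DMEM_SIZE)
    results = []
    offset = 0
    while offset < length:
        n = min(chunk_size, length - offset)
        # The RSP triggers its own "DMA" here, no CPU interaction needed.
        dmem[0:n] = rdram[start + offset:start + offset + n]
        # Process whatever is in DMEM right now.
        results.append(work(bytes(dmem[:n])))
        offset += n
    return results


# Hypothetical usage: 10000 bytes of input data, far more than fits in DMEM.
rdram = bytes(range(256)) * 64
chunk_sizes = process_in_chunks(rdram, 0, 10000, DMEM_SIZE, len)
```

The CPU only has to start the task once; the loop that fetches the next piece of data runs entirely on the RSP side in this model.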

So that's the current status, and if you made it this far, it's clear what the next tasks will be:

- implement Scalar Unit access to the RSP/RDP registers

- implement Vector Unit (decoding, memory access, processing)

- implement RDP access to DMEM so processed data can be rendered

Most likely in this order.

This will easily keep me busy for the next weeks and I'm already excited for when that's complete, as this would mean that games could start working afterwards.

Rendering is still missing too many features to show much, but at least something could be visible and sound might also work.

Have fun!
