
Hi everyone,

In today's post we will look at one of the features of the PSX's MIPS CPU, called either "write buffer", "write queue" or "write FIFO", depending on which documentation you read.

It's a part of the CPU that every PSX emulator needs to have, otherwise you cannot play games with it, right?

Well, it's more complicated than that. By the end you will see why this is not true and why I haven't yet found any emulator that implements it.


So what is the write buffer?


It's a FIFO that logically sits on the data path between the CPU and the memory or registers. Whenever RAM or registers are written, the data goes through this FIFO. FIFO stands for "first in, first out": the first value written by the CPU is also the first value written to RAM, so the order of the writes is preserved.

A FIFO can be used to decouple the speeds of both sides, and that is exactly what is happening here: the CPU can write data every clock cycle, while writing to RAM can be much slower.

Let's look at a timing diagram of the CPU processing to understand what happens:

Here you can see a program that first stores 3 words into memory and then executes other things. The CPU doesn't have to wait for the writes to complete, but can instead execute other instructions.

The write buffer is filled with the 3 written words and the data is slowly written into RAM: 3 cycles for the first access, 2 for every following access to the same RAM page.
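
To make the timing above concrete, here is a small sketch in Python. It is not taken from the core; the function name and the model are made up for illustration. It assumes the CPU issues one store per cycle starting at cycle 0, that the buffer never fills, and that all stores hit the same RAM page, so the first write costs 3 cycles and each following one costs 2.

```python
def drain_schedule(num_writes, first_cost=3, next_cost=2):
    """Return (issue_cycle, ram_done_cycle) for each store of a burst.

    Assumptions for this sketch (not from a hardware spec): one store is
    issued per CPU cycle, the buffer never fills, and every store goes to
    the same RAM page.
    """
    schedule = []
    ram_free_at = 0  # cycle at which RAM can start the next write
    for i in range(num_writes):
        issue = i  # the CPU pushes store i into the FIFO at cycle i
        cost = first_cost if i == 0 else next_cost
        start = max(issue, ram_free_at)  # FIFO order: wait for the previous write
        ram_free_at = start + cost
        schedule.append((issue, ram_free_at))
    return schedule


for n, (issue, done) in enumerate(drain_schedule(3)):
    print(f"store {n}: issued at cycle {issue}, in RAM at cycle {done}")
```

In this model the CPU is free again after 3 cycles, while the last word only reaches RAM at cycle 7 (3 + 2 + 2).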


As you can already see, the program flow isn't altered; the only thing the write buffer changes is the timing of the CPU. So implementing it is not mandatory for the CPU to function. You can very well build an emulator without it and it will work fine in most situations. That's what the PSX core did until a few days ago: it had no write buffer, and writes went directly to RAM.

But there is a big difference between the FPGA core and a software emulator. The FPGA core has to deal with actual memory latencies, while a software emulator can simply act as if a memory access costs nothing.

That's exactly what common emulators like DuckStation do: they assume a write to RAM has no overhead, because the write buffer is there.

On the FPGA core, however, I had to deal with the SDRAM latency, and even with some tricks I could only bring the writes down to 2 cycles, making them take twice as long in most cases.

There are also situations where the write buffer fills up, namely when writing many times in a row. If your emulator doesn't implement the write buffer, those writes will be too fast, because the write timing is always the same. On real hardware, the CPU slows down, because it has to wait for the write buffer to have a free slot again.
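
The stall behaviour can be sketched with a slightly larger model. Again, this is only an illustration with made-up names: the buffer depth of 4 entries is an assumption (the post does not state the size), and so is the rule that a slot becomes free only once its write has completed in RAM. All stores are assumed to hit the same RAM page (3 cycles for the first write, 2 for each following one).

```python
def cpu_cycles_for_burst(num_writes, depth=4, first_cost=3, next_cost=2):
    """CPU cycles needed to issue `num_writes` back-to-back stores.

    Assumptions for this sketch: the buffer holds `depth` entries (4 is a
    guess), a slot is freed when its write has completed in RAM, and all
    stores go to the same RAM page.
    """
    finish = []  # cycle at which each write has landed in RAM
    issue = -1   # cycle at which the previous store was issued
    for i in range(num_writes):
        earliest = issue + 1  # at most one store per CPU cycle
        if i >= depth:
            # Buffer full: wait until the entry `depth` stores back is done.
            earliest = max(earliest, finish[i - depth])
        issue = earliest
        cost = first_cost if i == 0 else next_cost
        start = max(issue, finish[-1]) if finish else issue
        finish.append(start + cost)
    return issue + 1


for n in (1, 4, 7, 10):
    print(f"{n} stores: {cpu_cycles_for_burst(n)} CPU cycles with the buffer, "
          f"{n} with a fixed 1-cycle write")
```

Under these assumptions, bursts of up to 4 stores cost one cycle per store, exactly like the fixed-cost approach. Longer bursts start to stall, and the cost drifts towards the 2 cycles per store that RAM can sustain.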


You can see the comparison of the old PSX core release against the current status here:

You can see that before, most writes (SH) were too slow overall, while now the speed matches that of the real PS1.

What would the "emulator solution" look like? Even 7 writes in a row (7SH) would still cost the same time as 1-4 writes, so the execution is too fast.

How important is that overall?

Not very, you would assume. A program will rarely write that many times in a row. The situation changes, however, when the writes go to different RAM pages: there a write to RAM actually costs 7 cycles and the write buffer drains much more slowly.

Seeing how many games have been fixed by all the recent timing improvements, it seems PSX games depend on correct timing more often than you might think.
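
To illustrate why the page matters, here is a hedged sketch of a program that does not write back-to-back at all: a hypothetical loop with one store followed by three other instructions. The names, the buffer depth of 4 and the rule that a slot is freed when its write completes are assumptions for this model, and for simplicity every write is charged the same cost (2 cycles for the same page, 7 cycles for a different page each time).

```python
def stall_cycles(num_stores, gap, write_cost, depth=4):
    """Cycles the CPU loses waiting for a free write buffer slot.

    The hypothetical program issues one store, then `gap` non-memory
    instructions, and repeats this `num_stores` times. Assumptions: the
    buffer holds `depth` entries, a slot is freed when its write completes,
    and every write costs `write_cost` cycles.
    """
    finish = []  # cycle at which each write has landed in RAM
    cycle = 0    # current CPU cycle
    stalled = 0
    for i in range(num_stores):
        if i >= depth and finish[i - depth] > cycle:
            # Buffer full: the CPU waits for the oldest entry to drain.
            stalled += finish[i - depth] - cycle
            cycle = finish[i - depth]
        start = max(cycle, finish[-1]) if finish else cycle
        finish.append(start + write_cost)
        cycle += 1 + gap  # the store itself plus the instructions after it
    return stalled


print("same page (2 cycles):      ", stall_cycles(16, gap=3, write_cost=2))
print("different pages (7 cycles):", stall_cycles(16, gap=3, write_cost=7))
```

In this model the same-page case never stalls, while the 7-cycle case starts stalling once the buffer has filled up, even though only every fourth instruction is a store.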


Have fun!

Comments

giom

Thanks for the write up. I thoroughly enjoy those.