Hi Everyone,

One of the missing CPU features was implemented for the N64 core last week, and we will take a look at it today. But before the technical details, let's look at the other changes since the last post:

- fix Z-buffer regression in 32-bit color mode

- fix interlace video for analog out

- shift video centering to match original hardware

- pass through all PIF commands for NSAC (e.g. VRU microphone)


Two other new features I want to explain a little more:

Implement mouse as Analog Stick

You can now use a USB mouse as the analog stick in the core. This can be useful for playing RTS games like StarCraft or shooters like Quake 2. The OSD lets you select which N64 pad button each mouse button (left, middle/wheel, right) acts as, e.g. left click as A or Z.

Since you can also configure a keyboard as the input device for player 1, you can, for example, play shooters with WASD + mouse. It might not be as accurate and responsive as on a PC, due to the analog stick emulation, but it is still really playable.
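To give an idea of what "analog stick emulation" involves, here is a minimal sketch of one possible mapping from mouse movement to stick deflection. It is not the core's actual implementation: the function name, the `sensitivity` factor and the `limit` of 80 are assumptions chosen for illustration. The only hardware fact used is that the N64 pad reports each stick axis as a signed 8-bit value.

```python
def mouse_to_stick(dx, dy, sensitivity=4, limit=80):
    """Map one mouse movement sample to an N64 stick position (illustrative only).

    dx, dy      -- mouse movement since the last poll (dy grows when moving down)
    sensitivity -- assumed scale factor, not taken from the core
    limit       -- assumed maximum deflection, inside the signed 8-bit range
    """
    def clamp(value):
        return max(-limit, min(limit, value))

    # The stick's Y axis is positive when pushed up, so the mouse Y is inverted.
    return clamp(dx * sensitivity), clamp(-dy * sensitivity)


small_move = mouse_to_stick(5, -3)    # slow movement to the right and up
fast_move = mouse_to_stick(100, 0)    # fast movement saturates at the limit
```

A mouse reports movement while a stick reports position, so fast movements saturate at the limit. That is one way to picture why the result feels less precise than mouse aiming on a PC.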


Forced Dedither

Some games don't enable dedither in the VI, but still apply dithering to the image. The result is that you clearly see the dither pattern all over the image. You could disable the dithering, but then you get color banding.

I don't know why the original developers made this choice, but now you can decide for yourself: the dedither option in the OSD now also offers a "Forced" setting, which applies dedither even if the game usually wouldn't. It makes a huge difference in F-Zero X and Star Wars Episode 1 Racer, but unfortunately it cannot be active by default, because in some 2D games it just makes the image blurry.

Right image is with forced dedither:



Write Fifo

If that feature sounds familiar to you, you are right: I wrote a similar article for the PSX about 15 months ago. The N64 write fifo behaves more or less the same as in the PSX core, so I will reuse some assets from the old article.

The write fifo is a part of the CPU that spreads the required memory bandwidth over time, so the CPU can keep working while data is written in the background.

It's a FIFO that is placed in the data path between the CPU and the memory or registers. Whenever RAM or registers are written, the data goes through this FIFO. FIFO stands for "first in, first out" and means that the first value written by the CPU will also be the first value written to RAM, so the order is preserved.

A FIFO can be used to decouple the speeds of both sides, and that is exactly what is happening here: the CPU can write data every clock cycle, while writing to RAM can be much slower.
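The decoupling can be pictured with a small software model. This is only a sketch: the class name `WriteFifo`, its methods and the addresses used are made up for illustration and are not taken from the core.

```python
from collections import deque


class WriteFifo:
    """Toy model of a bounded first-in, first-out write buffer (hypothetical names)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.depth

    def push(self, address, value):
        """CPU side: queue a write. Returns False if the CPU would have to stall."""
        if self.is_full():
            return False
        self.entries.append((address, value))
        return True

    def pop(self):
        """Memory side: take the oldest pending write, or None if nothing is pending."""
        return self.entries.popleft() if self.entries else None


fifo = WriteFifo()
# The fast side queues three writes back to back...
for i, value in enumerate([0x11, 0x22, 0x33]):
    fifo.push(0x1000 + 4 * i, value)
# ...and the slow side drains them later, in the same order.
drained = [fifo.pop() for _ in range(3)]
```

The CPU only ever calls `push` and the memory interface only ever calls `pop`, so each side can run at its own pace as long as the buffer is neither full nor empty.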

Let's look at a timing diagram of the CPU processing to understand what happens:

Here you can see a program that first stores 3 words into memory and then executes other things. The CPU doesn't have to wait for the writes to complete, but can instead execute other instructions. The write fifo is filled with the 3 written words and slowly the data is written into RAM. 

You see that the program flow isn't altered, just the timing of the CPU. So implementing it is not mandatory for the CPU to function. You can very well build an emulator without it and it will work fine in most situations. That's what the N64 core did until a few days ago: it had no write fifo. Writes just went directly to RAM.

But there is a big difference between the FPGA core and a software emulator. The FPGA core has to deal with actual memory latencies, while the software emulator can simply pretend that a memory access costs nothing. Therefore, to reach the real CPU speed, we need this technique.


Let's take a look at what that means for the CPU write speed. I have written a small assembler test for it, which allows comparing the timing of the core against real hardware. This was the result before the write fifo was implemented:

The test writes 1 to 9 words back to back to memory and counts the number of clock cycles it takes. The values in the red box are the min and max measurements over multiple tries on a real N64 console. The values in the white box are from the core.

We need to discuss several things here:

The time values grow with the amount of data written, for obvious reasons. But sometimes the value is the same in the next line. The reason is that the timer of the N64 CPU only ticks every second clock cycle, so all values are in reality twice as big and we lose that last cycle of accuracy in the measurement, because we have no timer that is accurate enough. In other tests I worked around that to get exact measurements, but here it's not important, because...

The values are horribly inaccurate! What is that? Writing 9 values to RAM costs between 42 and 66 timer ticks, which in reality are 84 to 132 clock cycles. Over 50% slower sometimes?

Indeed, the N64 itself is very unreliable in terms of how long a memory access takes, for several reasons:

- RAM refresh

- The RAM is shared, so an access might be interrupted for the VI to output to the screen

- Which RAM page is open when the next write comes is random, depending on the action in between
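The conversion from timer ticks to clock cycles used above is simple arithmetic. A minimal sketch, assuming only what is stated above (the timer advances once every two CPU clock cycles); the function and constant names are made up for illustration:

```python
TIMER_DIVIDER = 2  # the timer ticks once every second CPU clock cycle


def ticks_to_cycles(ticks):
    """Convert a timer reading to CPU clock cycles; the odd cycle is lost."""
    return ticks * TIMER_DIVIDER


# Measured range for 9 word writes on a real console, in timer ticks.
min_cycles = ticks_to_cycles(42)
max_cycles = ticks_to_cycles(66)

# Ratio between the slowest and the fastest run.
spread = max_cycles / min_cycles
```

The ratio is the same whether it is computed in ticks or in cycles, so the jitter on real hardware is visible even with the coarse timer.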


Now that this is clear, let's look at what really happens on the N64: due to the write fifo having 4 entries, we can write 5 values without any additional cost in the usual case. Starting with the 6th write, the fifo is still full and the CPU needs to wait until more space is free.

Why 5 writes and not 4? Imagine the first value being written to the fifo. It will instantly be passed on to RAM when the memory interface is free, leaving the fifo empty again. The second to fifth values cannot be written to RAM yet, as the memory interface is still busy with the first write, so they fill the 4 entries.
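This behaviour can be reproduced with a small cycle-by-cycle model. It is a sketch under stated assumptions, not the core's logic: one write attempt per CPU cycle, a fixed `ram_cycles` of 9 per RAM write (an assumed value, no refresh or page effects) and no extra latency on the first write. All names are hypothetical.

```python
from collections import deque


def cpu_cycles_for_writes(n_writes, fifo_depth=4, ram_cycles=9):
    """Cycles until the CPU has handed off n_writes back-to-back writes."""
    fifo = deque()
    ram_busy = 0   # remaining cycles of the RAM write currently in flight
    issued = 0
    cycle = 0
    while issued < n_writes:
        # CPU side: push one write per cycle if there is room, else stall.
        if len(fifo) < fifo_depth:
            fifo.append(issued)
            issued += 1
        # Memory side: progress the write in flight, then start the next one.
        if ram_busy > 0:
            ram_busy -= 1
        if ram_busy == 0 and fifo:
            fifo.popleft()
            ram_busy = ram_cycles
        cycle += 1
    return cycle


def cpu_cycles_without_fifo(n_writes, ram_cycles=9):
    """Without a write fifo the CPU waits for every single RAM write."""
    return n_writes * ram_cycles


with_fifo = [cpu_cycles_for_writes(n) for n in range(1, 7)]
without_fifo = [cpu_cycles_without_fifo(n) for n in range(1, 7)]
```

In this model the first five writes cost one cycle each, and the sixth is the first one that has to wait for the memory interface. The absolute numbers depend entirely on the assumed `ram_cycles`.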

On the MiSTer core side (white box) you can see that each write still costs 4-5 timer ticks = 8-10 clock cycles, starting with the very first write. This can make the CPU significantly slower. Writing 3 values back to back was about 5 times as fast on the real N64.


After the write fifo was implemented, it looks much better already: 

The min values match the real N64 up to 5 word writes, which is again explained by the fifo size. Beyond that the core is currently too fast. The reason is that the first write on the real N64 has some extra latency which is not implemented in the core yet. Overall the core is around 5-7 cycles too fast once the fifo is full. Emulating this latency is still needed for the core and will be part of the general memory latency and bandwidth fine-tuning as soon as all major features are in.

That completes this feature. A build with all these changes is attached.

I will work on more CPU topics in the coming days.

Have fun!

Comments

Anonymous

Just curious mate, what keeping us from playing with Mario's face in Mario 64?

Anonymous

Really enjoying your technical insights in to building your N64 core. Thank you for sharing!