
Hi Everyone!

Today I want to look back at what happened this year, where we came from and how everything has turned out.


January

On the 2nd of January I created an N64 test folder on my computer. The purpose was to write some assembler tests in the style of the PSX timing tests I had written before, to find out whether an N64 core would be feasible on MiSTer at all.

While the PSX2X core had shown that a 32-bit MIPS CPU could run at about 66 MHz, more than just the CPU speed matters for the N64. So I wrote some very basic tests to check latency and bandwidth for CPU IO, the CPU cache, RDRAM and the cartridge interface.

You can see one example here: the test measures the latency of accessing all the components of the system from the CPU. The current core matches nearly all of these values; only RDRAM is sometimes 1 cycle faster and MI sometimes 1 cycle slower.

The core is actually more consistent in its latency than the real hardware, and that is one of the reasons I felt more confident it would work:

If games see varying timing on the real console, they cannot depend on exact timing in their code, so they are much less likely to break if we needed to apply tricks to get the N64 core running. Keep in mind that in January I still thought several tricks would be required to get the core working at all, which turned out not to be the case.

The second important result is the RDRAM latency. The measurement shows 18-22 cycles of latency; subtracting the 2 cycles of overhead measured in the EMPTY test leaves 16-20 cycles. But those are not real CPU cycles: the timer in the N64 CPU only ticks up every second clock cycle, so in reality it's much worse. A random read from RDRAM takes 32-40 cycles at 93.75 MHz, or at least 341 ns.

Seeing that was such a relief, because that is insanely slow. Compare it to the PSX, which can read from RAM in 8 cycles at 33 MHz: that is only about 240 ns. So despite the N64 using such fast RAM and a higher-clocked system bus, the RAM latency requirement is not an issue at all.
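The latency numbers above can be reproduced with a few lines of Python. This sketch assumes the timer is the CP0 COUNT register, which advances once every two CPU cycles, and uses 33.8688 MHz as the exact PSX CPU clock; the 18-22 tick measurement and the 2-tick EMPTY overhead are the values from the test.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency given in clock cycles into nanoseconds."""
    return cycles * 1000.0 / clock_mhz

N64_CPU_MHZ = 93.75     # VR4300 pipeline clock
PSX_CPU_MHZ = 33.8688   # R3000A clock ("33 MHz" in the text)

# Measured 18-22 timer ticks, minus 2 ticks of EMPTY-test overhead.
ticks_min, ticks_max = 18 - 2, 22 - 2
# The timer ticks once every two CPU cycles, so double the tick count.
n64_min = cycles_to_ns(ticks_min * 2, N64_CPU_MHZ)
n64_max = cycles_to_ns(ticks_max * 2, N64_CPU_MHZ)
psx = cycles_to_ns(8, PSX_CPU_MHZ)

print(f"N64 random RDRAM read: {n64_min:.0f}-{n64_max:.0f} ns")  # 341-427 ns
print(f"PSX RAM read:          {psx:.0f} ns")                    # 236 ns
```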

RAM bandwidth is quite high with 64 bit at 62.5 MHz (500 MB/s), and tests showed that it really reaches that bandwidth minus some minimal overhead. But that was business as usual, since the PSX core's VRAM was already accessed with 64 bit at 66 MHz just fine. So would there be a chance?

I started to write my software emulator, and the journey began.


February

A lot of work went into the most critical part: the CPU. The reason is that software emulators are not all alike; they have very different goals. For most software emulators of "modern" systems, a major goal is being able to play games, so the emulator needs to be fast enough. That is not the goal here.

The whole purpose of the emulated CPU was to find out what could be optimized, precalculated or skipped in order to keep the maximum clock frequency of the CPU high. In FPGAs, the maximum clock frequency depends mostly on how much logic sits in one calculation path within a single clock cycle. So in our Cyclone 5 FPGA you could:

- copy a register value to another register at about 300 MHz

- add two small values at 250 MHz

- run the scaler logic at 150-200 MHz

- run a 32 bit CPU at only 66 MHz?

Well, this cannot be the whole truth. There is one special 32-bit CPU for the Cyclone 5 that runs much faster:

The NIOS II is a CPU designed by Altera for FPGAs and heavily optimized to run as well as possible. You can see that under such circumstances much higher clock rates are possible.

Unfortunately, we cannot design a custom CPU and have it run the N64, but maybe there is at least a chance to improve our CPU further?

So I spent a lot of time on the CPU logic here, always with the future FPGA design in mind. This was worth it in the end, even though it made the project take longer.
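To make the "calculation path" point a bit more concrete: the maximum clock frequency is limited by the slowest register-to-register path (the critical path), roughly fmax = 1 / (longest path delay). The following Python sketch finds that path in a small, made-up network of logic blocks. The block names and delays are hypothetical and only illustrate the idea; they are not Cyclone 5 timing data.

```python
from graphlib import TopologicalSorter

# Hypothetical combinational logic: node -> (delay in ns, predecessors).
# Illustrative numbers only, not real FPGA timing.
LOGIC = {
    "reg_a":       (0.0, []),
    "reg_b":       (0.0, []),
    "decode":      (1.2, ["reg_a"]),
    "operand_mux": (1.0, ["decode", "reg_b"]),
    "adder32":     (3.5, ["operand_mux"]),
    "forward_mux": (1.1, ["adder32"]),
    "reg_out":     (0.4, ["forward_mux"]),  # setup time of the target register
}

def critical_path(logic):
    """Return (total delay, path) of the slowest path through the network."""
    graph = {node: set(preds) for node, (_, preds) in logic.items()}
    arrival, best_pred = {}, {}
    for node in TopologicalSorter(graph).static_order():  # predecessors first
        delay, preds = logic[node]
        start = max((arrival[p] for p in preds), default=0.0)
        arrival[node] = start + delay
        best_pred[node] = max(preds, key=lambda p: arrival[p]) if preds else None
    end = max(arrival, key=arrival.get)
    path, node = [], end
    while node is not None:          # walk back along the slowest inputs
        path.append(node)
        node = best_pred[node]
    return arrival[end], path[::-1]

delay_ns, path = critical_path(LOGIC)
print(" -> ".join(path))  # reg_a -> decode -> operand_mux -> adder32 -> forward_mux -> reg_out
print(f"{delay_ns:.1f} ns -> fmax = {1000 / delay_ns:.0f} MHz")  # 7.2 ns -> fmax = 139 MHz
```

Inserting an extra pipeline register after adder32 would split this path into two shorter ones, which is the usual way to raise fmax, at the cost of an extra cycle of latency.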


March

After these crucial steps were done, I worked on the RSP, audio, PIF/gamepad and other parts more or less at random: implementing whatever was required to pass one test, then moving on to the next test. Most critical for the development of the emulator was the n64-systemtest by Lemmy, which has several thousand subtests.

This is the current status of the core, and it was also the status of the emulator at the end of March: all non-TLB tests fully passed.

With that, games should be working soon, right? Well, it took some more fixes for issues that are not covered by the systemtest, but not many overall, and Mario Kart 64 was working. Unfortunately, I don't have any old screenshots of it, but it was not looking very good.


April

This changed fast, because with the first game running, my motivation was so high that I just HAD to get it rendering properly.

By mid-April the status was already so good that I felt having this on MiSTer would be such a great result that there was no way back, so I publicly announced that I was working on the system.

This was not an easy step, as failure was still possible and I didn't want to deliver something that was only a slideshow with inaccuracies all over it. The main reason I made it public was that I didn't plan to give up and had multiple backup strategies in mind if things got rough. They sound silly today, but at that time it was still not clear what could really work. Some examples:

- keep the CPU at 32 bit and emulate all 64-bit instructions

- translate the MIPS instructions to NIOS II instructions and use that as the CPU

- run the CPU at a lower clock rate and add caches and other tricks to bring the speed back up
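To give an idea of what the first fallback would have meant: the VR4300 is a 64-bit MIPS III CPU, so an instruction like DADDU (64-bit unsigned add) would have to be split into several 32-bit operations. A minimal Python sketch of that split, with each register held as two 32-bit halves, as an illustration of the general technique and not code from the core:

```python
MASK32 = 0xFFFF_FFFF

def daddu_32bit(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Emulate a 64-bit wrapping add (like MIPS DADDU) with 32-bit operations.

    MIPS has no carry flag, so the carry out of the low half is detected
    the usual way: the wrapped low sum is smaller than one of its operands.
    """
    lo = (a_lo + b_lo) & MASK32          # 32-bit add, wraps around
    carry = 1 if lo < a_lo else 0        # unsigned compare detects the wrap
    hi = (a_hi + b_hi + carry) & MASK32  # high half plus carry
    return hi, lo

print(daddu_32bit(0, 0xFFFF_FFFF, 0, 1))  # (1, 0)
```

One 64-bit add turns into two adds, a compare and a carry add, which shows why this would have cost performance and was only meant as a fallback.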

In any case, this month was really busy. I implemented large parts of the RDP for the emulator, the full TLB, PAL mode, savestates and save memories.


May

The last couple of bugfixes for my emulator filled the first 2 weeks of May. At this point I had all but 8 games booting in my emulator and felt confident about starting the FPGA work.

There was one important design choice I still had to make: what to do with the RDRAM. As we have seen above, its bandwidth is 64 bit at 62.5 MHz, and since 62.5 MHz is the system bus speed in the N64 and used by the majority of components, it would have made sense to just run the DDR3 on the DE10-Nano at this speed.

After writing the DDR3 test core, it turned out that at this clock rate we could easily fulfill the needs of the N64. So should we keep it there?

The test also showed that increasing the clock rate not only increases the bandwidth, but also lowers the latency. Because there was still no N64 core and it was completely unclear which path it would take, I decided to go one step further and clock the DDR3 interface at double the clock speed, 125 MHz.

That way we would have more headroom in case we needed tricks for the CPU, and if not, we could at least use it for some kind of turbo mode later on. Using different clocks would make the design slightly more complicated, but if that only made the project take slightly longer, I could live with that. Was it the right decision in the end? Even now, that isn't clear yet. Time will tell. Overall it still makes the core more flexible on the DE10-Nano, but porting to a different board later on might be more effort.

With that out of the way, work on the CPU could start. I benefited heavily from having done the PSX core before, because it also uses a MIPS CPU, which gave me a head start. With that, I was able to get the first homebrew working at the end of May.
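Both the PSX's R3000A and the N64's VR4300 use the classic MIPS instruction formats (R-type, I-type and J-type), with the VR4300 adding 64-bit instructions on top. As an illustration only (Python, not code from either core), this is how a 32-bit instruction word splits into its fields:

```python
from typing import NamedTuple

class MipsFields(NamedTuple):
    opcode: int  # bits 31..26
    rs: int      # bits 25..21
    rt: int      # bits 20..16
    rd: int      # bits 15..11 (R-type)
    shamt: int   # bits 10..6  (R-type shift amount)
    funct: int   # bits 5..0   (R-type function code)
    imm: int     # bits 15..0, sign-extended (I-type)
    target: int  # bits 25..0  (J-type jump target)

def decode(word: int) -> MipsFields:
    """Split a 32-bit MIPS instruction word into all possible fields."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return MipsFields(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        shamt=(word >> 6) & 0x1F,
        funct=word & 0x3F,
        imm=imm,
        target=word & 0x03FF_FFFF,
    )
```

For example, 0x00851021 decodes as an R-type instruction (opcode 0, funct 0x21, i.e. addu $v0, $a0, $a1), and 0x2408FFFF as addiu $t0, $zero, -1 with the immediate sign-extended to -1.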



June

Two major milestones were reached.

First, the core got a basic RDP implementation, which made it possible to see live 3D rendering.

For me this is always such a huge step, as the main reason I worked on the PSX and now the N64 is 3D.

On the one hand, I started gaming on the Game Boy and C64 and later got an N64. The jump in graphics was incredible, and in my head the two will always be linked: 2D = old and 3D = new.

On the other hand, 3D always felt so advanced and complicated, and doing it in an FPGA was like the holy grail for me. So seeing something like the NICCC demo running at 60 fps, and thinking about what the FPGA must calculate in the background to make that possible, is still incredible.


The second milestone was the soft FPU I implemented in my emulator, which now passed all the tests. You might ask what is so special about it, when the emulator was already working fine. Well, it's the big difference between making games run and understanding what really happens.

To make it short: even with the FPU working in the emulator before, I had nearly zero understanding of how floating point calculations really work. That is easy to get away with, because modern programming tools take all of that off your hands.

You can just write down whatever calculation you want to do, and as long as you declare your variables as float, single, double or whatever you need, it all just works. The tools even take away the difficulty of rounding modes, exceptions and conversion to and from integer values, so there is no need to understand how it really works in the background.

For implementing it in the FPGA, however, you either depend on existing libraries or you need to understand it. As the existing libraries would have been way too large in logic size, were not optimized for the needs of the N64's MIPS CPU and didn't even cover all the instructions needed anyway, I had to learn it.

The first baby steps were very hard, and I did everything wrong that I could have: I thought I should start with understanding the ADD instruction, because... it's ADD, how hard can adding two numbers be?

What I learned later is that ADD might be the most complicated of all the supported FPU instructions. It took me about 2 weeks in total to understand it, and I feared that if every instruction took that long, it would be insane for the project.

Thankfully, it turned out that once you understand one of the hardest instructions, the others are really easy to grasp, and I could often understand more than one per day.
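Why ADD is so hard becomes clearer when you write it out. A floating point add has to align the two mantissas by their exponent difference, add them, renormalize the result and then round it, tracking guard, round and sticky bits along the way. The Python sketch below does this for float32 in round-to-nearest-even mode, but only for positive, normal inputs. It illustrates the general IEEE 754 technique and is not the core's FPU, which also has to handle operands of mixed sign (effective subtraction), zeros, special values, the other rounding modes and exceptions.

```python
import struct

def f32_bits(x: float) -> int:
    """Round a Python float to float32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def add_positive_normals(a_bits: int, b_bits: int) -> int:
    """Add two positive, normal float32 numbers, rounding to nearest-even.

    Sketch only: negative operands, zeros, subnormals, NaN/infinity inputs,
    other rounding modes and exception flags are NOT handled.
    """
    ea, ma = (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | (1 << 23)  # restore hidden 1
    eb, mb = (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | (1 << 23)
    if eb > ea:                      # let a be the operand with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma

    # Step 1: align. Keep 3 extra bits (guard, round, sticky) below the LSB.
    ma <<= 3
    mb <<= 3
    shift = ea - eb
    lost = mb & ((1 << shift) - 1)   # bits shifted out of the smaller operand
    mb = (mb >> shift) | (1 if lost else 0)  # OR them into the sticky bit

    # Step 2: add. Two values in [1, 2) sum to a value in [1, 4).
    m, e = ma + mb, ea

    # Step 3: normalize back to [1, 2), keeping the sticky information.
    if m >> 27:
        m = (m >> 1) | (m & 1)
        e += 1

    # Step 4: round to nearest, ties to even.
    grs = m & 0b111
    m >>= 3
    if grs > 0b100 or (grs == 0b100 and m & 1):
        m += 1
        if m >> 24:                  # rounding overflowed, e.g. 1.11...1 -> 10.0
            m >>= 1
            e += 1

    if e >= 0xFF:
        return 0x7F800000            # overflow -> +infinity
    return (e << 23) | (m & 0x7FFFFF)
```

Even this reduced case already needs the alignment shift, the sticky bit and a second normalization after rounding; the subtraction path, where large cancellations can shift the result by many bits, adds more on top.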


July

Now it was time to bring that low-level soft FPU knowledge from the emulator to the FPGA core. As has often happened in the past with new things: once you start to understand it, it loses its magic.

So I was able to implement the FPU in about 2 weeks, to the point where it passed all tests. Sure, some bugs were revealed later in the year, but I would NEVER have expected that to be possible in the first place.

With that working and the controller interface implemented, there was a big surprise:

The first retail game was booting up on the core and to some degree it was even playable.

This was completely unplanned and such a nice showcase that the core is indeed real. I took the chance and added the sound interface so we could also hear some of the music and sound effects of this game.

Namco Museum is very special in that it only needs the CPU and some basic system modules; it never touches the RSP or RDP. To get any other game to work, several more things had to happen first. But after such a big step, the motivation was high of course.


August

I spent most of July on the RSP and got it more or less finished early in August, passing all of the n64-systemtest cases for the RSP.

With that out of the way, there was only one major module left before the "true" games could also work. It took another 10 days of RDP work, and the big moment had finally come.

The flaws made it barely recognizable as Mario Kart 64, but that didn't matter to me at all. Getting this game to boot, run through the menus and go ingame all of a sudden was just amazing.

I just couldn't let it stay at this point, so a lot of graphical improvements followed in this month:

- Z-buffer

- perspective correction

- different texture modes

- tons of bugfixes

The result already looked so much better.
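Perspective correction fixes the warping you get when texture coordinates are interpolated linearly in screen space. The standard technique is to interpolate u/w and 1/w linearly across the screen and divide per pixel. A minimal Python sketch of that general technique follows; the RDP does its own fixed-point version, which this does not model.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation in screen space, t in [0, 1]."""
    return a + (b - a) * t

def perspective_u(u0: float, w0: float, u1: float, w1: float, t: float) -> float:
    """Perspective-correct texture coordinate between two vertices.

    w is the clip-space w (proportional to view depth). u/w and 1/w are
    linear in screen space, so interpolate those and divide at the end.
    """
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1 / w0, 1 / w1, t)
    return u_over_w / one_over_w

# An edge from a near vertex (w=1, u=0) to a far vertex (w=4, u=1).
# Halfway across the screen the correct u is well below 0.5, because the
# near part of the edge covers more of the screen.
print(lerp(0.0, 1.0, 0.5))                      # 0.5 (affine, wrong)
print(perspective_u(0.0, 1.0, 1.0, 4.0, 0.5))   # 0.2
```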


September

It gets kind of hard to track all the changes from here on, since a lot of small fixes had a huge impact.

One important change was the fog implementation, which fixed wrong colors in several games, for example in Super Mario 64's Bob-omb Battlefield.

Also, a texture bug was fixed, which finally gave Mario his face.

The most important fix, however, was for the random crash that affected so many games.


In terms of new features, the core got the data cache, which brought most games to their original speed and framerate.
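With a random RDRAM read costing 32-40 CPU cycles, a data cache pays off wherever the same memory is touched more than once. The sketch below is a simple hit/miss model of a direct-mapped cache in Python. The defaults (8 KB, 16-byte lines) are assumed from the commonly documented VR4300 data cache; data storage and write-back are left out, and the class is a hypothetical illustration, not the core's design.

```python
class DirectMappedCache:
    """Tag-only hit/miss model of a direct-mapped cache."""

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 16):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, 'fill' the line by storing its tag."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines   # each address maps to exactly one line
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag               # line fill from RDRAM would happen here
        self.misses += 1
        return False

# Walk a 4 KB buffer of 32-bit words twice.
cache = DirectMappedCache()
for _ in range(2):
    for addr in range(0, 4096, 4):
        cache.access(addr)
print(cache.hits, cache.misses)  # 1792 256

# Two addresses exactly 8 KB apart map to the same line and evict each other.
conflict = DirectMappedCache()
print(conflict.access(0), conflict.access(8192), conflict.access(0))  # False False False
```

Walking the 4 KB buffer twice costs 256 line fills instead of 2048 RDRAM reads, while the second part shows the downside of a direct-mapped design.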

Saving now worked for all the different save methods.

Dual pass rendering and dual texturing have been implemented.

I could show you tons of images of the progress this month, but I'll stick with my personal favorite:

Yes, it's empty and there is not much to do there, but just let me ride through this big world and explore it, and I will be happy.


October

Apart from many, many bugfixes, we got 3 big features this month, and for all of them I got a lot of help.

The first is the VI implementation:

Markun has helped me so much with the VI implementation. Not only did he do the full implementation in my software emulator, we also discussed for endless hours how best to implement it in the FPGA.

It was such a big help to discuss this with someone who also understands the logic at this low level and was interested in finding a good way to implement it with as few resources as possible.

Together we came up with a solution that is really small for what it can do and it's one of the reasons the core still fits into the FPGA even with most of the features now implemented.


The second feature was SNAC:

Blue1 spent so many hours getting the SNAC implementation working together with my PIF module in the core, and in the end I did a whole rewrite of the gamepad communication and the emulated gamepad module to make this as close as possible to the real hardware.

At this point we got all the accessories working too, and I can't thank Blue1 enough for everything he did along the way.

Getting SNAC working together with the Transfer Pak was so much fun. After all those years, running Pokemon Stadium with a Transfer Pak isn't easy, either in emulation or on real hardware, because Transfer Paks often fail at the connectors as they age, potentially leading to save game loss.

I just had to also implement an emulated Transfer Pak to make this available for everyone. Without having SNAC working to compare against, it would have been impossible.
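From the console's side, a controller (real via SNAC, or emulated) answers short Joybus commands that the PIF forwards to it. The following Python sketch shows a heavily simplified emulated controller. The command numbers (0x00/0xFF info, 0x01 read buttons), the 0x05 0x00 controller ID, the pak status values and the button bit order follow common N64 homebrew documentation and should be treated as assumptions here; pak reads and writes, where the Transfer Pak lives, are left out, and the function names are hypothetical.

```python
# Button bits of the 16-bit button word, most significant bit first
# (assumed layout; "_" marks the unused bit).
BUTTONS = ["A", "B", "Z", "START", "D_UP", "D_DOWN", "D_LEFT", "D_RIGHT",
           "RESET", "_", "L", "R", "C_UP", "C_DOWN", "C_LEFT", "C_RIGHT"]

def pack_buttons(pressed: set[str]) -> int:
    """Build the 16-bit button word from a set of pressed button names."""
    word = 0
    for i, name in enumerate(BUTTONS):
        if name in pressed:
            word |= 1 << (15 - i)
    return word

def handle_command(cmd: int, pressed: set[str], stick_x: int, stick_y: int,
                   pak_inserted: bool) -> bytes:
    """Reply of a simplified emulated controller to one Joybus command."""
    if cmd in (0x00, 0xFF):   # info / reset + info: ID (2 bytes) + pak status
        return bytes([0x05, 0x00, 0x01 if pak_inserted else 0x02])
    if cmd == 0x01:           # read buttons: 2 button bytes + signed stick X/Y
        word = pack_buttons(pressed)
        return bytes([word >> 8, word & 0xFF, stick_x & 0xFF, stick_y & 0xFF])
    raise NotImplementedError(f"command {cmd:#04x} (e.g. pak read/write) not modelled")

print(handle_command(0x01, {"A", "START"}, 10, -5, True).hex())  # 90000afb
```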


The third feature was the asynchronous VI:

The reason you see no image here is also the reason I needed help: this change was very important for CRTs, and as I don't have any and also don't want any, I got a lot of help from MikeS.

Not only did he test all my attempts to get the video output working correctly, he also measured against a real N64 with an oscilloscope, made suggestions on how to change things, wrote code and iterated over it again and again until the output timing matched.

So if you enjoy the core on your CRT, this is mostly due to his support.


With the async VI in place, we now also had fixed video timing and could get the correct aspect ratio.

Kitrinx, who already wrote code to generate the aspect ratio tables for the PSX core, helped here again and provided the correct values for the different video modes of the N64 core with NTSC, PAL and cropped outputs.


Many others also supported the core by working on MiSTer Main support, the database, testing and other things, so please forgive me if your name is not here. Thank you all so much for moving this core forward!


November

It started with some bugfixes and accuracy changes, but I already know which part is the most important for everyone: the TLB.

The virtual address mapping was the last big missing feature, and it prevented a bunch of games from working.

I had always been worried about implementing it because of the potential impact on CPU performance, so I wanted to have most other things in place first. My worries turned out to be justified, but it was not as bad as I initially thought.

While I sometimes need more than one compile run to get a build I can share, we can still run the CPU at full clock speed, and that alone is amazing.
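The TLB maps virtual addresses to physical ones. On the VR4300 it has 32 fully associative entries, each mapping an even/odd pair of pages, and every entry has to be compared against the address, which is one plausible reason a TLB puts pressure on the CPU's clock speed. The Python sketch below models a lookup in simplified form: 4 KB pages only (PageMask is ignored), no cache attributes or dirty bit, a single exception type, and hypothetical field names.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: fixed 4 KB pages (real PageMask handling ignored)

@dataclass
class TlbEntry:
    vpn2: int                 # virtual page-pair number: vaddr >> (PAGE_SHIFT + 1)
    asid: int                 # address space ID
    global_: bool             # global entries match any ASID
    pfn: tuple[int, int]      # physical frame numbers for the even and odd page
    valid: tuple[bool, bool]  # valid bits for the even and odd page

class TlbMiss(Exception):
    """Stands in for the real CPU's TLB refill/invalid exceptions."""

def translate(tlb: list[TlbEntry], vaddr: int, asid: int) -> int:
    vpn2 = vaddr >> (PAGE_SHIFT + 1)
    odd = (vaddr >> PAGE_SHIFT) & 1          # selects the even or odd page of the pair
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    # Hardware compares all entries at the same time; this loop is the
    # sequential software equivalent of that fully associative search.
    for e in tlb:
        if e.vpn2 == vpn2 and (e.global_ or e.asid == asid):
            if not e.valid[odd]:
                raise TlbMiss(f"invalid page at {vaddr:#x}")
            return (e.pfn[odd] << PAGE_SHIFT) | offset
    raise TlbMiss(f"no entry for {vaddr:#x}")

tlb = [TlbEntry(vpn2=0x200, asid=1, global_=False,
                pfn=(0x100, 0x101), valid=(True, True))]
print(hex(translate(tlb, 0x00400123, asid=1)))  # 0x100123 (even page)
print(hex(translate(tlb, 0x00401010, asid=1)))  # 0x101010 (odd page)
```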


December

The TLB implementation was completed just recently, and the last non-booting game also started working on the core. What else could I have hoped for this year?

I can hardly imagine how things could have turned out better than they did: both the overall progress and the capability of the DE10-Nano have been great.

We got some more fixes and improvements recently that I will likely cover in a separate article, but you will find a core attached if you want to try them already. Be aware that you must also replace your MiSTer file and reboot afterwards, because the core depends on these changes. It is also inside the zip file.


That's it for this year. I will continue this journey and I wish you all the best for the next year.

Have fun and thank you so much!

Comments

Anonymous

1 year ago everybody was so adamant this was just impossible. Then it changed to "well, it'll be good to have the core ready for future hardware" and then suddenly, playable games, on the very hardware this wasn't supposed to happen. You are a god damn wizard. Thank you so much for your passion and you hard work.

David Filskov

Just impressive effort!! :)

David Filskov

Earlier Paper Mario stopped just as Mario or Luigi was about to exit the house in the intro but now it won't go ingame or save the player slot. It's been like this for quite a few weeks. Do you know why? - here's a video of this persistent issue: https://www.dropbox.com/scl/fi/v6l8dp7c4sshne912n7b6/Paper-Mario-won-t-go-ingame-anymore.mp4?rlkey=dx1pgkr433fqspo5tfrzq2hgv&dl=0

Trifle

Really enjoyed following your progress with this core Robert, thanks for the development detail posts!