
Hi everyone!

Great news this time: I was able to finish the FPU implementation much faster than I thought. Furthermore, it even costs fewer resources than expected.

But let's do things step by step. I have seen a lot of responses over the last weeks from people not knowing what an FPU really is and what it can do, so maybe it's time for a technical article again.

This time I want to try something new: instead of going into technical details, I want to try a very simple explanation that everyone can follow, to fully understand the idea behind floating point numbers and why they are useful.

Those who already know everything about it might be disappointed, but it's always a trade-off to write an article for so many people, and I really want to try that and see if it can be successful.

So let's start!

First step for us today is to imagine a number with up to four digits. It can be anything between 0 and 9999.

That's right, we are going with normal decimal numbers today and not binary, because that is a lot easier to grasp.

In the example you can already see that we need 4 memory slots for each number: D1, D2, D3, D4, named after the 4 digits that have to be saved for each number. With these 4 slots, we can save numbers between 0 and 9999.

As you learned in school, you can do calculations with these, like adding two values together. The same happens in a CPU when normal integer logic is used: there is a fixed number of bits, and depending on that amount, a fixed range of numbers can be added, for example.

For an 8-bit processor these are the numbers from 0 to 255. For our 4-digit decimal computer it's 0 to 9999.

But what can we do if we want to work with larger numbers? Of course we could glue together 2 numbers and get a total of 8 digits, but this requires additional work for splitting operations into two parts and combining them again, and it also costs more memory.

How about saving larger numbers in only 4 digits? What about negative numbers also? Let's try it!

First, we will split up a number into its parts. As an example, let's take the number 50.

- 50 is a positive number

- 50 has 1 zero at the end

- 5 is the number without zeros

Those are simple properties, but it's really all we need for now.

Let's look at it:

You can see that we still use 4 digits for saving the number. 

- The first digit is the sign, which is a plus, because 50 is a positive number.

- The second digit counts the number of trailing zeros, which is 1.

- The third and fourth digits are the body of the number, which is what is left over when all trailing zeros are removed.

This is already a floating point number. So the natural 50 in our newly invented floating point format is +105.

Let's look at some more numbers:

It doesn't get much more complicated here, does it? For 600, we have 2 zeros at the end, so the second digit will be a 2, and -7000 has 3 zeros and is also negative, so the sign is a minus.

1200 uses both digits of the body, so it really needs all four digits of the floating point representation.

What can we do with that format already? Our largest number is +999, which is a 99 with 9 zeros, so 99000000000. Not bad to save such a large number with only 4 digits, right? The format can represent -99000000000 as well.

There must be some drawbacks, otherwise this would be far superior to storing normal numbers, right?

Let's take a look at the numbers 121 and 122:

Did I make a mistake there? Unfortunately, no.

The number 121 cannot be represented in this format, as only 2 significant digits can be saved. The best we can do is save it as 120 and cut off the 1 at the end. That means 122 will be the same as 121 in our format.

It even gets worse than that: the larger the number gets, the more digits will be cut off to be able to store it.

Whenever this happens, we should mark this number as inexact, because the representation cannot be converted back without losing information.

But... that's it! By understanding the format, we just learned our first two floating point operations: IntegerToFloat and FloatToInteger.
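The two conversions can be sketched in a few lines of Python. This is a minimal model of our toy decimal format, not the actual FPGA implementation; I'm assuming a `(sign, zeros, body)` triple to stand in for the four digits.

```python
# Toy 4-digit decimal floating point: (sign, zeros, body)
# sign is +1/-1, zeros counts trailing zeros (one digit, 0-9),
# body holds the up-to-2 significant digits (0-99).

def int_to_float(n):
    """IntegerToFloat: returns (sign, zeros, body, exact)."""
    sign = 1 if n >= 0 else -1
    body = abs(n)
    zeros = 0
    exact = True
    # Strip trailing zeros into the zeros counter.
    while body >= 10 and body % 10 == 0:
        body //= 10
        zeros += 1
    # If more than 2 significant digits remain, cut digits off (inexact!).
    while body > 99:
        if body % 10 != 0:
            exact = False
        body //= 10
        zeros += 1
    return sign, zeros, body, exact

def float_to_int(sign, zeros, body):
    """FloatToInteger: body followed by 'zeros' zeros, with the sign applied."""
    return sign * body * 10 ** zeros

print(int_to_float(50))        # (1, 1, 5, True)    -> "+105"
print(int_to_float(-7000))     # (-1, 3, 7, True)   -> "-307"
print(int_to_float(121))       # (1, 1, 12, False)  -> stored as 120, inexact
print(float_to_int(1, 2, 6))   # 600
```

Note how 121 comes back with `exact = False`: converting it to the format and back gives 120, exactly the information loss described above.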


Shall we do another step? Let's add two floating point numbers!

How about adding 50 and 600? We already have their floating point representations from above, so we know they are +105 and +206, and we know that the integer result when converted back should be 650.

That is really helpful for checking our result, because we can also convert 650 to our floating point format and get +165: 1 for the zero at the end and 65 for the significant digits.

How would we add +105 and +206 together now to achieve that result?

We can already see that to get +165, the 6 has to be shifted into the body. How could we do that?

600 can also be written in this format as +160, because 60 with one additional zero is also 600, right?

It is easy to understand that numbers with the same number of trailing zeros can be added without much effort.

E.g. you can easily calculate 10+20, because it's the same as 1 * 10 + 2 * 10.

For our example the calculation would be 5 * 10 + 60 * 10 = 65 * 10.

So adding two numbers is easy if you can make them have the same number of trailing zeros first, because then the significant digits of the body can simply be added.


As a second example, let's look at 90+11:

To add these floating point numbers we must make the zero count of the first and second number equal again. Let's try both possible ways:

While 90 can be represented as +109, it could also be represented as +090. Then we could add both numbers directly and get 90+11=101 for the significant part. But 101 cannot be stored in just 2 digits, so we have to remove one digit by dividing by 10 and adding one to the zero count. So we lose information and the result is inexact.

Instead, we could also let the 11 have 1 trailing zero. How? We throw away the 1 at the end and make it a 10 instead. We already lost accuracy here, because 10 is not 11, but now we can add the significant digits and also get 100.

So the result is the same in both cases. Unfortunately, the format again doesn't allow for full accuracy.
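The alignment trick can also be sketched in code. This is a simplified model, assuming signed `(body, zeros)` pairs; it expands the number with more trailing zeros first, adds, and then squeezes the result back into 2 digits, flagging when digits get cut off.

```python
def normalize(body, zeros):
    """Fit the body back into 2 digits, dropping digits if needed (inexact)."""
    exact = True
    while abs(body) > 99:
        if body % 10 != 0:
            exact = False
        body = int(body / 10)  # cut off the last digit
        zeros += 1
    return body, zeros, exact

def float_add(a, b):
    """Add two toy floats given as (body, zeros); returns (body, zeros, exact)."""
    ba, za = a
    bb, zb = b
    # Align: bring both numbers to the smaller zero count by
    # shifting the other body up (e.g. +206 becomes +160).
    common = min(za, zb)
    ba *= 10 ** (za - common)
    bb *= 10 ** (zb - common)
    return normalize(ba + bb, common)

# 50 + 600: 5*10 + 60*10 = 65*10 = 650, exact
print(float_add((5, 1), (6, 2)))   # (65, 1, True)
# 90 + 11: the exact sum 101 needs 3 digits, so one is cut off
print(float_add((9, 1), (11, 0)))  # (10, 1, False) -> 100, inexact
```

The second call shows exactly the 90+11 case: the true sum 101 doesn't fit, so the stored result is 100 with the inexact flag set.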


That completes the ADD functionality, and while this last part might have been more complicated: congratulations, because this was the most difficult it will get. Other operations like multiplication or division sound more difficult, but they are not, as the two numbers can be worked on directly; they don't have to get their zero counts adjusted first.

Let's take a very simple example: 20 * 300 = 6000

All that is required for a multiplication is to multiply the significant digits and add together the zero counts. It's really that easy.
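As a sketch, again assuming the `(body, zeros)` pairs from before (a full version would renormalize the body back into 2 digits afterwards, just like with addition):

```python
def float_mul(a, b):
    """Multiply two toy floats: multiply bodies, add the zero counts."""
    ba, za = a
    bb, zb = b
    return ba * bb, za + zb

# 20 * 300: bodies 2 * 3 = 6, zero counts 1 + 2 = 3 -> 6000
print(float_mul((2, 1), (3, 2)))  # (6, 3)
```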


With that we have already learned a lot of the basics of floating point numbers, how they can be used and what disadvantages they have.

Unfortunately the full IEEE standard for floating point is a little more complicated, but in the end all those special cases with infinity, non-numbers (NaN) or subnormals don't matter for understanding how floating point itself works.


One concept, however, we still need to touch, but if you made it this far, it will be easy for you. Until now we could represent small and also very large numbers with our floating point format, but we left out one important thing about floating point: the point.

It's named that way because you can also use it with real numbers like 0.5.

Let's look at some examples of how that could work:

If we look at what the zeros column did previously: the significant digits were multiplied with a number that has a defined amount of zeros:

- 0 zeros: body multiplied with 1

- 1 zero: body multiplied with 10

What happens now is that the zeros column can become negative. When that is the case, the significant digits are no longer multiplied by 1, 10, 100, ... but instead divided by them.

0.5 is the same as 5 divided by 10 and 0.02 is the same as 2 divided by 100.

This also works for 1.7, as it's the same as 17 divided by 10.
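The signed zeros column can be sketched with one small helper, again using the assumed `(body, zeros)` pairs:

```python
def float_to_value(body, zeros):
    """Expand (body, zeros); a negative zero count divides instead of multiplies."""
    if zeros >= 0:
        return body * 10 ** zeros
    return body / 10 ** (-zeros)

print(float_to_value(65, 1))   # 650
print(float_to_value(5, -1))   # 0.5  -> 5 divided by 10
print(float_to_value(17, -1))  # 1.7  -> 17 divided by 10
```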


With this little trick of one additional sign for the zeros column, we can use our format to represent not only numbers as big as 99000000000 but also as small as 0.000000001.


That's it. If you now do all of that in binary instead of decimal, you can write your own FPU :)

I hope this little lesson was easy enough for everyone and gave you some idea of what my tasks have been.

If it was not easy enough to understand, I have to apologize. I'm likely not the best teacher, but I really tried.

If it was not deep enough for you, you have your chance now: ask anything you like to know about this topic, the FPU FPGA implementation, weird floating point edge cases or whatever is related. If there are questions I can answer with a second, deeper article, I will do it!


Oh, and in case that is all you wanted to know: yes, the FPU in the MiSTer core is done. It's small and passes all tests I found. You can also run demos with it, for example Mandelbrot from Krom's archive.


Thank you for your support and have fun!

Comments

Llammissar

That's a surprisingly elegant explanation of how FP representation and operations work, great job!

Anonymous

Great explanation! Thanks for the effort