FPGAzumSpass

SPU complete (Patreon)

Published:

2022-01-22 13:22:40

Imported:

2023-12

Content

Hello everyone.

I apologize for not writing much here over the last few weeks. Now that the SPU is complete, and a first version with single RAM support is even available, I want to take some time to tell you some details about the SPU and give you some insight into what happened during the first weeks of this year.

First of all, let's look at how the PlayStation produces any audio at all.

There is the SPU, which can play 24 independent channels of sampled music or sound. Unlike many older consoles, it has no sound generator channels (e.g. square or triangle waves). Everything comes from the SPU RAM as ADPCM samples. The only exceptions are the single noise channel, which can be connected to any of the 24 channels, and the reverb function.

Besides that, the SPU can receive audio from the CD unit and mix it with its own output. CD audio comes in two forms: CD DA and CD XA (compressed).

These CD features are not available in the core yet, but they are planned next.

So how does the SPU actually work?

Each channel decodes ADPCM samples from SPU RAM. These samples have a fixed compression from 16 bit to 4 bit plus a small header: every 16-byte block (2 header bytes plus 14 data bytes) holds 28 samples that would take 56 bytes uncompressed, so the compressed data is about 28.6% of the original size. As the SPU RAM can hold 512 KB, quite a lot of sound and music fits there. But decoding ADPCM for all 24 channels is just the first step.
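To make the block format concrete, here is a minimal Python sketch of decoding one 16-byte ADPCM block into 28 signed 16-bit samples. The header layout (shift in the low nibble, filter in bits 4 to 6) and the filter coefficients follow the publicly documented PSX format; the handling of reserved shift and filter values copies what some emulators do and is an assumption, the exact rounding of the filter term varies between emulators, and the loop flags in the second header byte are ignored. The function names are made up for illustration and are not taken from the core.

```python
# Filter coefficients (in 1/64 units) for filters 0..4.
POS_COEF = (0, 60, 115, 98, 122)
NEG_COEF = (0, 0, -52, -55, -60)


def clamp16(x):
    """Saturate to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))


def decode_block(block, old=0, older=0):
    """Decode one 16-byte ADPCM block into 28 samples.

    Returns (samples, old, older) so the filter history can carry
    over into the next block of the same channel.
    """
    assert len(block) == 16
    shift = block[0] & 0x0F
    if shift > 12:                          # reserved values, treated like 9 (assumption)
        shift = 9
    filt = min((block[0] >> 4) & 0x07, 4)   # reserved filters clamped to 4 (assumption)
    f0, f1 = POS_COEF[filt], NEG_COEF[filt]
    samples = []
    for byte in block[2:]:                  # byte 1 holds loop flags, ignored here
        for nibble in (byte & 0x0F, byte >> 4):        # low nibble first
            t = nibble - 16 if nibble >= 8 else nibble  # sign-extend 4 bits
            s = ((t << 12) >> shift) + ((old * f0 + older * f1 + 32) >> 6)
            s = clamp16(s)
            older, old = old, s
            samples.append(s)
    return samples, old, older


samples, _, _ = decode_block(bytes([0x00, 0x00] + [0x21] * 14))
print(samples[:4])  # [4096, 8192, 4096, 8192]
```

The returned `old`/`older` pair is what lets the prediction filter continue smoothly across block boundaries.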

The decoded samples are then filtered and interpolated, and each channel can play them back at a freely configurable speed between 0.0x and 4.0x. Furthermore, the output of the previous channel can also be used to modulate the playback speed of the next channel.
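A rough sketch of the playback speed logic, assuming the documented 4.12 fixed-point pitch format where 0x1000 means 1.0x (44.1 kHz) and steps are capped at 0x4000 (4.0x). `pitch_step` shows how the previous channel's output can scale the step, and `resample` walks through the samples with that step. Linear interpolation is used only to keep it short; the real SPU interpolates between 4 samples with a Gaussian table, which is not reproduced here. All names are hypothetical.

```python
PITCH_MAX = 0x4000  # 4.0x in 4.12 fixed point (0x1000 = 1.0x)


def pitch_step(pitch, prev_out=None):
    """Per-sample counter increment for one channel.

    If prev_out (the previous channel's signed 16-bit output) is given,
    it modulates the step: -0x8000 stops playback, 0 leaves it unchanged,
    +0x7FFF nearly doubles it.
    """
    step = pitch & 0xFFFF
    if prev_out is not None:
        factor = prev_out + 0x8000               # map -0x8000..0x7FFF to 0..0xFFFF
        step = ((step * factor) >> 15) & 0xFFFF
    return min(step, PITCH_MAX)


def resample(samples, pitch, count):
    """Play `samples` back at `pitch`, producing up to `count` outputs."""
    out = []
    counter = 0                                  # upper bits: index, low 12 bits: fraction
    step = pitch_step(pitch)
    for _ in range(count):
        idx, frac = counter >> 12, counter & 0xFFF
        if idx + 1 >= len(samples):
            break
        a, b = samples[idx], samples[idx + 1]
        out.append(a + (((b - a) * frac) >> 12))  # linear stand-in for Gaussian
        counter += step
    return out


print(resample([0, 4096, 8192], 0x800, 4))  # [0, 2048, 4096, 6144]
print(hex(pitch_step(0x1000, 0x4000)))      # 0x1800
```

At 0x800 (0.5x) every input sample is used for two outputs, with the in-between values interpolated.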

The SPU also has an ADSR feature for the sounds. ADSR stands for Attack, Decay, Sustain, Release and describes the 4 phases each channel goes through when playing a sample.

When a sample is started, the channel runs through the attack phase, increasing its volume. When a certain level is reached, it switches to the decay phase and loses some volume until another level is reached, where it switches to sustain and holds its volume until the end of the note. Then the channel goes into the release phase and decreases the volume to zero.

But these are just typical use cases; a game can do whatever it likes, as all these stages are configurable in duration (linear or exponential), target volume and direction.
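The phase sequence described above can be modeled as a small state machine. This is a simplified sketch: every phase moves linearly by a fixed, hypothetical step per sample, while the real SPU derives its rates from the ADSR registers and also supports exponential curves and different directions, none of which is modeled here.

```python
from enum import Enum


class Phase(Enum):
    ATTACK = 0
    DECAY = 1
    SUSTAIN = 2
    RELEASE = 3
    OFF = 4


MAX_LEVEL = 0x7FFF


class Envelope:
    """Simplified linear ADSR envelope; call tick() once per output sample."""

    def __init__(self, attack_step, decay_step, sustain_level, release_step):
        self.attack_step = attack_step
        self.decay_step = decay_step
        self.sustain_level = sustain_level
        self.release_step = release_step
        self.level = 0
        self.phase = Phase.ATTACK

    def key_off(self):
        """End of the note: jump to release from whatever phase is active."""
        if self.phase is not Phase.OFF:
            self.phase = Phase.RELEASE

    def tick(self):
        if self.phase is Phase.ATTACK:
            self.level = min(self.level + self.attack_step, MAX_LEVEL)
            if self.level == MAX_LEVEL:
                self.phase = Phase.DECAY
        elif self.phase is Phase.DECAY:
            self.level = max(self.level - self.decay_step, self.sustain_level)
            if self.level == self.sustain_level:
                self.phase = Phase.SUSTAIN
        elif self.phase is Phase.RELEASE:
            self.level = max(self.level - self.release_step, 0)
            if self.level == 0:
                self.phase = Phase.OFF
        # SUSTAIN and OFF: the level stays where it is
        return self.level


env = Envelope(0x4000, 0x1000, 0x6000, 0x3000)
print([hex(env.tick()) for _ in range(5)])
# ['0x4000', '0x7fff', '0x6fff', '0x6000', '0x6000']
env.key_off()
print([env.tick() for _ in range(2)], env.phase.name)  # [12288, 0] OFF
```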

As if that were not enough, every channel can additionally have its volume adjusted over time using the envelope function, individually for left and right (stereo), as well as a static volume multiplier for each side.

All 24 stereo outputs are then mixed together into a single left and right output, which is fed into the reverb function.
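A sketch of the per-channel volume stage and the final mix, assuming all volumes are treated as signed 1.15 fixed point (0x7FFF ≈ 1.0). The change of volume over time is left out; only the multiplications and a saturating sum are shown. Summing at full precision and saturating once at the end is a simplification of this sketch.

```python
def clamp16(x):
    """Saturate to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))


def voice_output(sample, env_level, vol_left, vol_right):
    """Scale one channel's sample by its envelope, then by its left/right volume."""
    s = (sample * env_level) >> 15
    return (s * vol_left) >> 15, (s * vol_right) >> 15


def mix(voices):
    """Sum all (left, right) channel outputs and saturate to 16 bits."""
    left = sum(l for l, _ in voices)
    right = sum(r for _, r in voices)
    return clamp16(left), clamp16(right)


print(voice_output(0x4000, 0x7FFF, 0x4000, 0))       # (8191, 0)
print(mix([(0x7000, 0x100), (0x7000, -0x100)]))      # (32767, 0)
```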

For reverb, the audio is stored in SPU RAM and, in addition to the current sample, data that was stored earlier is played back from an older position in SPU RAM, creating an echo effect.
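The store-now, read-back-later idea can be shown with a single-tap echo on a circular buffer. This is deliberately not the PlayStation reverb algorithm, which combines several delayed reads and filters inside its work area in SPU RAM; the buffer size, delay, feedback and wet level are hypothetical parameters in 1.15 fixed point.

```python
def clamp16(x):
    """Saturate to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))


class EchoBuffer:
    """Single-tap echo over a circular buffer (a stand-in for a reverb work area)."""

    def __init__(self, size, delay, feedback, wet):
        assert 0 < delay < size
        self.buf = [0] * size
        self.pos = 0
        self.delay = delay
        self.feedback = feedback  # share of the echo written back (1.15)
        self.wet = wet            # share of the echo in the output (1.15)

    def process(self, x):
        size = len(self.buf)
        delayed = self.buf[(self.pos - self.delay) % size]  # stored `delay` samples ago
        # store the current input (plus some feedback) for later playback
        self.buf[self.pos] = clamp16(x + ((delayed * self.feedback) >> 15))
        self.pos = (self.pos + 1) % size
        # output = dry input + attenuated echo
        return clamp16(x + ((delayed * self.wet) >> 15))


echo = EchoBuffer(8, 2, 0x4000, 0x4000)
print([echo.process(x) for x in [1000, 0, 0, 0, 0]])  # [1000, 0, 500, 0, 250]
```

A single impulse comes back every 2 samples at half the previous level, which is the basic echo behavior the paragraph describes.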

CD audio is also mixed in, before another envelope function is applied to the left and right results.

A game can also capture the generated samples of either the CD audio or two of the SPU channels (voices 1 and 3), which are written back into SPU RAM.

Well, this is just a short summary and I left out quite a few details, but it may give you a feeling for how much work the SPU actually has to do.

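Why is this important? All of that has to be done for every sample within 768 clock cycles, or about 0.023 milliseconds: the 33.8688 MHz system clock divided by the 44.1 kHz sample rate gives exactly 768 cycles per sample.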
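The cycle budget follows directly from the clock and the output rate; the constant names are just labels for this quick check.

```python
CPU_CLOCK_HZ = 33_868_800  # PlayStation system clock
SAMPLE_RATE_HZ = 44_100    # SPU output rate

cycles_per_sample = CPU_CLOCK_HZ // SAMPLE_RATE_HZ
sample_period_us = 1e6 / SAMPLE_RATE_HZ

print(cycles_per_sample)           # 768
print(round(sample_period_us, 2))  # 22.68
```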

The original SPU is very well designed to complete this job on time, without exception, with the help of its fixed memory latency. The same is true when a second SDRAM module is used with the MiSTer.

But how can this work with the DDR3, with its long and unpredictable latency?

There are two crucial points in the design:

The first is the SPU design itself. On a real PlayStation the SPU has 768 cycles to calculate everything, but with DDR3 it helps a lot if the calculation is faster, leaving additional time for RAM accesses. So I made an SPU that can calculate everything in about 350 cycles. When SDRAM is used as SPU RAM, it is slowed down to match the original timing, while with DDR3 it gets to use this extra horsepower.

The second point is intelligent caches, which are not fully finished yet. There are independent cache blocks for each channel, which allow prefetching without the risk of losing the cached values, and every kind of access (24 channels, reverb, data transfer) gets the special handling it needs.
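To illustrate why separate per-channel caches help, here is a hypothetical sketch in which each channel owns a tiny cache holding its current ADPCM block plus a prefetched next block. It is not the core's actual cache design: the prefetch happens synchronously here, whereas in hardware it would run in the background, so the miss counter simply stands for accesses that would have to wait for DDR3.

```python
BLOCK_BYTES = 16  # one ADPCM block: 2 header bytes + 28 4-bit samples


class ChannelCache:
    """Hypothetical per-channel cache holding the current and the next block.

    Because each channel owns its cache, a prefetch for one channel can
    never evict data another channel still needs.
    """

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}   # block address -> block data
        self.hits = 0
        self.misses = 0

    def _fetch(self, addr):
        # stands in for a slow DDR3 read
        return bytes(self.ram[addr:addr + BLOCK_BYTES])

    def read_block(self, addr):
        if addr in self.lines:
            self.hits += 1
            data = self.lines[addr]
        else:
            self.misses += 1
            data = self._fetch(addr)
        nxt = addr + BLOCK_BYTES
        # keep the current block and prefetch the sequential next one
        nxt_data = self.lines[nxt] if nxt in self.lines else self._fetch(nxt)
        self.lines = {addr: data, nxt: nxt_data}
        return data


ram = bytearray(range(256)) * 4
caches = [ChannelCache(ram) for _ in range(24)]
caches[0].read_block(0)
caches[0].read_block(16)
caches[1].read_block(512)   # another channel does not disturb channel 0
caches[0].read_block(32)
print(caches[0].hits, caches[0].misses)  # 2 1
```

Sequential playback only misses on the first block; a loop jump to a different address would miss again.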

This brings the average DDR3 memory access delay down from roughly 20 cycles to less than 6. The better this gets, the higher the chance that the 768-cycle limit can always be met.
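A back-of-the-envelope view of why the lower average delay matters, under the simplifying assumption that every RAM access stalls the SPU for the full average delay and nothing overlaps with the roughly 350 cycles of calculation. The actual number of accesses per sample is not given here, so the sketch only computes how many such accesses would fit into the remaining time.

```python
CYCLES_PER_SAMPLE = 768
COMPUTE_CYCLES = 350  # approximate calculation time of the faster SPU


def max_stalling_accesses(avg_latency):
    """Fully serialized RAM waits of `avg_latency` cycles that fit into the slack."""
    return (CYCLES_PER_SAMPLE - COMPUTE_CYCLES) // avg_latency


print(max_stalling_accesses(20))  # 20
print(max_stalling_accesses(6))   # 69
```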

What happens if it cannot? The old sample is played slightly longer and the new one slightly shorter, so the total playtime stays the same. As long as this happens only very rarely, you will hopefully not notice it.
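The timing effect can be illustrated with a small calculation. It assumes that each sample's calculation still starts on the fixed 768-cycle grid, so a late sample does not push back the following ones; the `lateness` values are made up.

```python
def hold_times(lateness, period=768):
    """Cycles each output sample is actually held.

    lateness[i] is how many cycles sample i arrives after its nominal
    slot i * period. The sample after the last one is assumed on time.
    """
    starts = [i * period + late for i, late in enumerate(lateness)]
    ends = starts[1:] + [len(lateness) * period]
    return [end - start for start, end in zip(starts, ends)]


durations = hold_times([0, 40, 0, 0])
print(durations)       # [808, 728, 768, 768]
print(sum(durations))  # 3072 (= 4 * 768)
```

The 40 cycles the first sample gains are exactly the 40 cycles the late sample loses, so the total stays unchanged.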

Furthermore, we could add a small output buffer with less than 1 ms of delay, which would remove these length differences.
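One way to size such a buffer: given the cycle at which each sample is finished, find how many samples must be buffered in advance so that every sample is ready before its output slot. The ready times in the usage example are invented for illustration, and the helper names are hypothetical.

```python
PERIOD = 768          # cycles per output sample
SAMPLE_RATE_HZ = 44_100


def min_prefill(ready_times, period=PERIOD):
    """Smallest number of buffered samples so that no output slot is missed.

    ready_times[i] is the absolute cycle at which sample i is finished;
    with k samples of prefill, sample i is output at cycle (i + k) * period.
    """
    k = 0
    for i, t in enumerate(ready_times):
        slots_needed = (t + period - 1) // period  # ceil(t / period)
        k = max(k, slots_needed - i)
    return k


def buffer_delay_ms(samples):
    """Extra output delay caused by buffering `samples` samples."""
    return samples * 1000 / SAMPLE_RATE_HZ


k = min_prefill([350, 1118, 2336, 2654])  # sample 2 needed 800 cycles
print(k)                                  # 2
print(round(buffer_delay_ms(44), 3))      # 0.998
```

Up to 44 samples of buffering stay below 1 ms, which leaves plenty of room compared to the occasional late sample.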

So that wraps up my work on the core over the last 3 weeks. I will continue working on the single SDRAM version for a few more days before I start on the CD XA implementation.

Have fun!