Interaction preparations - Memory & load optimization (Patreon)
A major concern I've had for quite some time is regarding the size the characters occupy in VRAM (memory on GPU). The great majority of this space is occupied by textures.
In order to draw on textures using the GPU (the coloration system in FurryVNE), we're using something called a "render texture". While these are versatile and can be drawn into, they hold raw, uncompressed data and take up vast amounts of space. For simple characters, it's not that big of a deal, but for more complex ones with lots of parts and multiple layers of clothing, the memory usage can grow significantly (up to gigabytes). With only a single character on screen, it's fine I suppose, but with multiple characters on screen at the same time, the memory usage would simply be unacceptable. Just having 8 of these in a scene would essentially exhaust the memory of many high-end systems. This is not ideal.
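To get a feel for how uncompressed textures add up, here is a rough back-of-the-envelope sketch in Python. The texture count, the 4096x4096 resolution and the two formats are assumptions chosen for illustration, not FurryVNE's actual numbers: 4 bytes per pixel is what uncompressed RGBA8 costs, and 1 byte per pixel is what a block-compressed format such as BC7 costs (16 bytes per 4x4 block).

```python
# Bytes per pixel are assumptions for illustration: 4 for uncompressed RGBA8,
# 1 for a block-compressed format such as BC7 (16 bytes per 4x4 pixel block).
BYTES_PER_PIXEL = {"rgba8_uncompressed": 4.0, "bc7_block_compressed": 1.0}


def texture_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size of the top mip level only; a full mip chain adds roughly one third."""
    return int(width * height * bytes_per_pixel)


def character_mib(texture_count: int, size: int, fmt: str) -> float:
    """Total size in MiB for `texture_count` square textures with side `size`."""
    total = texture_count * texture_bytes(size, size, BYTES_PER_PIXEL[fmt])
    return total / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical character: 20 textures at 4096x4096.
    for fmt in BYTES_PER_PIXEL:
        print(f"{fmt}: {character_mib(20, 4096, fmt):.0f} MiB")
```

Under these assumed numbers a single character already sits above a gigabyte uncompressed, which is in line with the "up to gigabytes" figure above.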
Unity (the game engine we're using) does offer some functions for compressing textures to reduce their size, but my experience with them has been underwhelming. I'm not sure why, but even when using the highest quality settings for the compression, the resulting textures look horrible.
Here's an example of what a normal map looks like when using the built-in compression functions in Unity:
(Comparison of raw vs compressed data. The texture compression systems in Unity produce horrific results.)
This led us to spend time looking into alternatives. Luckily, we found a free, standalone CLI application that is able to compress textures and retain high quality. The biggest challenge was preparing raw data from Unity to work with this system, but we managed to get it working. So now we have designed a system that exports the prepared data, compresses it using this third-party CLI application, and then reads it back in again. It works great and the results are much better:
(Comparison of raw vs compressed. The image is actually animating. The compressed result looks very similar to the uncompressed one but takes up much less space in memory.)
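The export, compress, read-back round trip described above could look roughly like this in Python. This is only a sketch of the general pattern, not our actual implementation (which lives in Unity): the function name, the file names and the "{src}"/"{dst}" placeholders are all made up for illustration, and the command line of the real CLI application is deliberately not shown.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def compress_round_trip(raw: bytes, command: Sequence[str]) -> bytes:
    """Write raw data to disk, run an external compressor, read the result back.

    `command` is a hypothetical argument template: "{src}" and "{dst}" are
    replaced with the temporary input and output file paths.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "texture.raw"
        dst = Path(tmp) / "texture.compressed"
        src.write_bytes(raw)  # step 1: export the prepared data
        args = [
            arg.replace("{src}", str(src)).replace("{dst}", str(dst))
            for arg in command
        ]
        subprocess.run(args, check=True)  # step 2: run the CLI, raise on failure
        return dst.read_bytes()  # step 3: read the compressed result back in
```

Passing the command as a template keeps the sketch independent of any particular tool, so whichever compressor is used only has to accept an input path and an output path.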
Load optimization
The time it takes to load characters is another concern I've had for a while. If you're going to load an interaction with 8 characters in it, and each one takes 8 seconds to load on average, that would be roughly 1 minute of waiting. Not ideal!
So once we had this texture compression system in place, we were wondering how much we could speed up character load times by simply loading final results directly rather than generating them. Not only loading final textures directly, but also other parts, such as loading clothing and its final skinning data rather than doing heavy calculations on it.
One of the big challenges with this, we noticed, was finding a nice balance between data generation and data loading, because as it turns out, some data is actually faster to generate than to load (characters in FurryVNE consist of vast amounts of data). Through intense benchmarking, we have been experimenting with making one system at a time serializable and cutting it out of generation (loading it directly from the serialized final data instead). We believe we have found a somewhat OK balance between generation and data loading.
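The per-system decision described above boils down to timing both paths and keeping whichever is faster. Here is a minimal Python sketch of that idea. The system names and the generate/load callables are placeholders, and our real benchmarking is done inside Unity, so treat this as an illustration rather than the actual code.

```python
import time
from typing import Callable, Dict, Tuple


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Best wall-clock time in seconds over `repeats` runs of `fn`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def choose_strategies(
    systems: Dict[str, Tuple[Callable[[], object], Callable[[], object]]],
    repeats: int = 5,
) -> Dict[str, str]:
    """Map each system name to "generate" or "load", whichever measured faster.

    Each value in `systems` is a (generate, load) pair of zero-argument callables.
    """
    plan = {}
    for name, (generate, load) in systems.items():
        generate_time = time_call(generate, repeats)
        load_time = time_call(load, repeats)
        plan[name] = "load" if load_time < generate_time else "generate"
    return plan
```

Taking the best of several runs is one simple way to reduce noise from disk caches and background activity; a real benchmark would also want to run on more than one machine.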
While there (hopefully) is room for more improvements, we decided to load a few characters using our WIP fast load system and compare the results. You can see them below. Each percentage is the new value relative to the old one, so lower is better.
Peppy by Blacky
https://gyazo.com/5b9c99df6756c90980011b0d0ee7c261
Time - 37% (2.7x improvement)
VRAM - 30% (3.3x improvement)
Ankha by joni2k
https://gyazo.com/26521b7945bf11bb446418eed4338ce7
Time - 38% (2.6x improvement)
VRAM - 37% (2.7x improvement)
Maya by Dogson
https://gyazo.com/1be00005b53208b446e14a6490dc66ce
Time - 44% (2.27x improvement)
VRAM - 38% (2.6x improvement)
Krystal by Blacky based on work of kittysasha
https://gyazo.com/b192d61040cf8d96d5b7d9345ac9f242
Time - 22% (4.55x improvement)
VRAM - 30% (3.3x improvement)
----
It seems like the more complicated the character is, the bigger the time gains are. I reckon that on systems slower than mine, the difference may be even bigger (as generation would take longer).
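For anyone wondering how the percentages and the "x improvement" numbers above relate: the factor is just the reciprocal of the percentage, assuming the percentage means "new value as a share of the old value". A tiny Python sketch (the function name is made up for illustration):

```python
def improvement_factor(percent_of_original: float) -> float:
    """Convert "new cost as a percentage of the old cost" into an "N times" factor."""
    if percent_of_original <= 0:
        raise ValueError("percentage must be positive")
    return 100.0 / percent_of_original


if __name__ == "__main__":
    for percent in (37, 38, 44, 22, 30):
        print(f"{percent}% of original is a {improvement_factor(percent):.2f}x improvement")
```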
Future challenges
While characters are faster to load this way, actually compressing the textures and storing them in the fast load format takes a significant amount of time (something like 30s). Obviously, having to wait 8s for character load and then an additional 30s for compressing the character (FOR EACH ONE) is a totally unacceptable user experience. Therefore, what I'm thinking of is making a server that compresses characters automatically as they're uploaded. Then, when somebody wants to use a character in an interaction, it will load the fast load version of the character, which is not only faster to load but also uses far less memory.
There are some concerns about how exactly that is going to work, though, as a fairly powerful system (in server terms) is required to set something like that up. I've been looking into cloud providers that have the required specs to run FurryVNE and generate the fast load data, and it would be very expensive to run. So expensive, in fact, that it would be cheaper to simply buy a new computer solely for the purpose of being this server. So the way I'm feeling about this right now is to either set up my existing laptop as this server, or simply build a server and have it co-located in a datacenter. We'll see what happens!
Also, another concern is the load time. While significantly shorter than generation, it still isn't quite as fast as I would hope. Around 2.5s times 8 characters is still 20s. Having to wait 20s for loading an interaction (not including environment load!) is still not a good experience for the user.
Furthermore, so far I've only tested this on my own system, and it's a fairly powerful system with an NVMe SSD. Perhaps the gains would not be as good on slower systems. Needless to say, more time testing this would be optimal, but time is one of those things we don't have a lot of if we're supposed to get interactions out this year!
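The "compress once on upload, serve the fast load version afterwards" idea could be sketched like this in Python. Nothing here exists yet: the class, its method names and the in-memory dictionary are all stand-ins for illustration, and `compress` represents the slow (roughly 30s) step, whatever it ends up being.

```python
from typing import Callable, Dict, Optional


class FastLoadStore:
    """In-memory stand-in for server-side storage of fast load data."""

    def __init__(self, compress: Callable[[bytes], bytes]) -> None:
        self._compress = compress
        self._fast: Dict[str, bytes] = {}

    def on_upload(self, character_id: str, raw: bytes) -> None:
        # Pay the slow compression cost once, at upload time.
        self._fast[character_id] = self._compress(raw)

    def fetch(self, character_id: str) -> Optional[bytes]:
        # Returns None when no fast load version has been produced yet.
        return self._fast.get(character_id)
```

The point of the design is that the person loading an interaction never waits for compression; only the upload path does.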
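As a quick sanity check of the waiting times mentioned in this post, here is the arithmetic as a small Python sketch. It assumes characters load one after another, which is what the numbers above imply; the function name is made up for illustration.

```python
def total_load_seconds(per_character_seconds: float, character_count: int) -> float:
    """Total wait when characters are loaded one after another."""
    return per_character_seconds * character_count


if __name__ == "__main__":
    # 8s per character with generation, about 2.5s with the fast load system.
    for label, seconds in (("generation", 8.0), ("fast load", 2.5)):
        print(f"{label}: {total_load_seconds(seconds, 8):.0f}s for 8 characters")
```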
Summary
We've implemented texture compression into FurryVNE, making characters take far less space in VRAM than before. We've also created a system that instead of generating characters just loads them directly, which speeds up load times significantly.
There are still some concerns about how best to integrate these things to give the best user experience.
All the best.
- odes