
As I mentioned in previous posts, one of the nice things about rewriting all of Wabbajack in modern C# is that we can now run about 90% of the code on Linux, and since the code is modular it can be split up into multiple smaller services. That same modularity also lets us make surgical edits to the code to remove some of the hardware requirements these services used to have.

Over the past few months I've taken the time to work through the few remaining bottlenecks that required me to maintain a server for list validation. The last hurdle was disk space. GitHub workers (which we use for many different tasks) have ephemeral storage that is cleared once a task completes. Our modlists are up to 2GB in size, we have about 40 of them, and we need to validate them about once an hour. If we had to download the full lists just to get their definitions, that would be 2GB * 40 lists * 24 hours a day, which is well over a terabyte of data every day. This is a bit of a problem.
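The arithmetic works out like this (using the round figures from above):

```python
# Back-of-the-envelope bandwidth cost of naively downloading full lists.
# Numbers are the rough figures from the post, not exact measurements.
list_size_gb = 2    # each modlist is up to 2GB
list_count = 40     # about 40 lists
runs_per_day = 24   # validated roughly once an hour

naive_gb_per_day = list_size_gb * list_count * runs_per_day
print(naive_gb_per_day)  # -> 1920, i.e. nearly 2 TB per day
```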

I was recently listening to a court case about a forensic investigation of a computer, and the government agents mentioned that because they were able to recover a table of contents from a .zip, they could verify several of the files in that archive. That got me thinking: both the WJ CDN and the HTTP protocol support "range reads", meaning you can request only a certain part of a file. Zip files end with a table of contents (the central directory), so it seemed we could read just the end of a modlist (modlists are zip files), find the one file we want (the modlist definition), read just that segment, and be off to the races!
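Wabbajack itself is C#, but the trick is easy to sketch in Python: grab the last kilobyte or so of the archive (the part an HTTP range request would return) and parse the End of Central Directory record out of it. The helper name and the 1 KiB tail size here are my own illustrative choices, not the project's:

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory signature

def entries_from_tail(tail: bytes) -> int:
    """Parse the zip End of Central Directory record from the last
    bytes of an archive and return how many entries it lists."""
    pos = tail.rfind(EOCD_SIG)
    if pos == -1:
        raise ValueError("EOCD not found; fetch a larger tail")
    # In the EOCD record the total entry count is a little-endian
    # uint16 located 10 bytes past the signature.
    (total_entries,) = struct.unpack_from("<H", tail, pos + 10)
    return total_entries

# Build a small archive in memory to stand in for a remote modlist.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("modlist", b"{}")
    zf.writestr("readme.txt", b"hello")

data = buf.getvalue()
tail = data[-1024:]  # in production this would be an HTTP Range request
print(entries_from_tail(tail))  # -> 2
```

The same record also holds the central directory's size and offset, which is what lets you issue a second range request for just the directory, locate the one entry you care about, and fetch only its compressed bytes.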

Fast-forward three days or so and I've written code that allows chunked, cached reads of a data source. In essence you can say "call this function when you need a block, blocks are size X, cache no more than Y blocks at a time", and that interface acts as if you had a normal stream to a file on disk. For now we only use this to find the `modlist` entry and parse it, but in the future it could be used for any number of other tasks.
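The real implementation is C#; here is a minimal Python sketch of that interface (class and parameter names are mine, not Wabbajack's), with a small LRU cache of fetched blocks:

```python
from collections import OrderedDict
from typing import Callable

class BlockCachedReader:
    """Random-access reads over a remote source, fetched in fixed-size
    blocks and kept in a bounded LRU cache: "call this function when you
    need a block, blocks are size X, cache no more than Y blocks"."""

    def __init__(self, fetch_block: Callable[[int, int], bytes],
                 block_size: int, max_blocks: int):
        self._fetch = fetch_block   # fetch_block(offset, length) -> bytes
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._cache: OrderedDict[int, bytes] = OrderedDict()

    def _block(self, index: int) -> bytes:
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as recently used
        else:
            self._cache[index] = self._fetch(index * self._block_size,
                                             self._block_size)
            if len(self._cache) > self._max_blocks:
                self._cache.popitem(last=False)  # evict least recent
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Read like a normal stream, pulling blocks as needed."""
        out = bytearray()
        while length > 0:
            index, start = divmod(offset, self._block_size)
            chunk = self._block(index)[start:start + length]
            if not chunk:
                break  # past the end of the source
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

# Demo: a fake "remote" source that counts how often it is hit.
data = bytes(range(256)) * 4  # 1 KiB of data
calls = []
def fetch(offset, length):
    calls.append(offset)
    return data[offset:offset + length]

reader = BlockCachedReader(fetch, block_size=128, max_blocks=2)
reader.read(0, 10)   # fetches block 0 from the source
reader.read(5, 10)   # overlaps block 0: served from cache
print(len(calls))    # -> 1
```

Layering a zip parser on top of a reader like this is what lets the validator pull only the central directory and the `modlist` entry instead of the whole 2GB archive.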

This then gets us to the title of this post: finally, after 2.5 years, none of the WJ servers run in my garage. We moved most of the heavy file-server work into the "cloud" last year, but until today I was running a custom GitHub Actions runner for validating lists. Now, because we never need to keep full copies of the lists on disk, the entire system runs in an ephemeral job in the cloud.

With that, all we really need to do now is get the WJ UI ported over to 3.0. We have some new designers and coders working on the UI, and hopefully we'll have some good beta releases to show off soon.

As always, thank you for the support; this project continues to hum along, and it wouldn't do so without you all.
