
The Wabbajack server-side code consists of two main parts: the CDN (essentially an FTP+HTTP server, with a 1Gig pipe) and a backend server that does list validation, Nexus caching, list healing and a host of other small tasks (but doesn't require a fat upload pipe). In addition, this same backend server also runs a VM which handles our CI test runs from GitHub.

For the past year or so that backend server has run off a consumer Windows 10 box. Due to how it's built we have a limited amount of SSD space and so the OS runs off of a RAID of 3x WD RED 2TB drives. If you're cringing right now, I agree, and yes it needs to change. A few of the problems with the existing setup:

* WD RED drives are meant for writing large, infrequently changing files, and they don't really work well as a main OS drive.
* The space on this system is slowly running out, and yet in a perfect world we could keep a copy of every WJ modlist ever made.
* The CI tests must run in a VM for security reasons, but that also means they have limited resources, so our CI tests often take 40+ minutes to run.
* Since we're running a normal Windows 10 copy, OS updates will randomly reboot the system. When they do, the system takes a long time to restart because the HDDs are so slow.
* The case I'm using for this system is lacking in expansion space, so adding more storage isn't really an option.

Thankfully all this is fixable. Thanks to the donations over the past several months, I've been able to secure a 2U Dell R720 at a good price. Key features here are enterprise-grade RAID hardware, 16x drive slots, and faster processors than what we're running right now. Server hardware like this often runs practically forever. Hardware failures tend to follow something known as a bathtub curve: a part that is going to fail usually fails right away, and otherwise it tends to last for many years. Even then, the two most failure-prone parts of a server are the HDDs and the PSUs. This system has redundant 750W PSUs and I've purchased new drives for it, so we're all set there.
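For anyone who hasn't seen a bathtub curve before, here is a rough Python sketch of its shape. The three terms and every constant in them are made up purely to draw the curve; they are not measured failure rates for this server or for any real drive or PSU.

```python
import math


def hazard_rate(age_years: float) -> float:
    """Illustrative failure rate at a given age. All constants are invented."""
    # Early defects: high at first, fading within the first few months.
    infant_mortality = 0.20 * math.exp(-age_years / 0.25)
    # Flat background rate of random failures during normal service life.
    random_failures = 0.02
    # Wear-out: negligible when new, climbing as the part gets old.
    wear_out = 0.02 * math.exp((age_years - 8.0) / 1.5)
    return infant_mortality + random_failures + wear_out


for age in (0, 1, 3, 6, 9, 12):
    print(f"{age:>2} years: {hazard_rate(age):.3f}")
```

The printed values start high, drop to a low plateau, and rise again at the end, which is the "bathtub". Used gear that has already survived its first years of service sits on the flat bottom of that curve.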

Once the drives arrive in the next few days, I'll be setting the system up to boot off of a RAID of 1TB SSDs. The HDD storage is CMR based (all you can get these days) but that space is almost always "write once" in our usage, and I'll be adding some other SSDs as a tiered cache. I've used Windows tiered caching in this sort of setup before with good results; it essentially allows you to burst-write data to the SSDs, and then Windows slowly moves that data to the HDDs as needed.
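To make the burst-write idea concrete, here is a toy Python model of a two-tier store. It is only a sketch of the general concept, not how Windows implements tiering; the `TieredStore` class, its methods, the file names, and the sizes are all hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Toy write-back tier: writes land on a small fast tier, then drain to a big slow one."""

    def __init__(self, fast_capacity_gb: float):
        self.fast_capacity_gb = fast_capacity_gb
        self.fast = OrderedDict()  # name -> size in GB, oldest first (the "SSD" tier)
        self.slow = {}             # name -> size in GB (the "HDD" tier)

    def fast_used_gb(self) -> float:
        return sum(self.fast.values())

    def write(self, name: str, size_gb: float) -> str:
        """Store a file and return which tier absorbed the write."""
        if self.fast_used_gb() + size_gb <= self.fast_capacity_gb:
            self.fast[name] = size_gb
            return "fast"
        # Fast tier is full, so this write goes straight to the slow disks.
        self.slow[name] = size_gb
        return "slow"

    def drain(self, budget_gb: float) -> float:
        """Background step: move the oldest files to the slow tier, up to budget_gb."""
        moved = 0.0
        while self.fast:
            name, size = next(iter(self.fast.items()))
            if moved + size > budget_gb:
                break
            self.fast.popitem(last=False)
            self.slow[name] = size
            moved += size
        return moved


store = TieredStore(fast_capacity_gb=100)
print(store.write("list-a.wabbajack", 60))  # fast
print(store.write("list-b.wabbajack", 60))  # slow (the burst outran the cache)
print(store.drain(budget_gb=80))            # 60.0
print(store.write("list-c.wabbajack", 60))  # fast
```

The point of the model is that the cache only needs to be big enough to absorb a burst of uploads, as long as there is quiet time afterwards to drain it. That fits a workload where files are written once and then mostly just read.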

I'll be using Windows Server (or Windows 10 LTS if our code doesn't work well on Server), so the random restarts should no longer be a problem. And with 5TB drives, even with several slots used for the cache, we should end up with around 44TB of usable long-term storage. That should last for a while.
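For a feel of how the slot budget turns into usable space, here is a small Python sketch. Only the 16 slots and the 5TB drive size come from the numbers above; the split between boot, cache, and parity drives is a hypothetical layout for illustration, so it lands in the same ballpark as the figure above rather than reproducing it exactly.

```python
def usable_capacity_tb(total_slots: int, boot_slots: int, cache_slots: int,
                       drive_tb: float, parity_drives: int) -> float:
    """Decimal terabytes left for data after boot, cache, and parity drives are set aside."""
    data_slots = total_slots - boot_slots - cache_slots
    if data_slots <= parity_drives:
        raise ValueError("not enough slots left for a redundant array")
    return (data_slots - parity_drives) * drive_tb


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal TB (10**12 bytes) to binary TiB (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


# Hypothetical layout: 16 slots, 2 boot SSDs, 2 cache SSDs, double parity.
raw = usable_capacity_tb(total_slots=16, boot_slots=2, cache_slots=2,
                         drive_tb=5.0, parity_drives=2)
print(raw)                       # 50.0
print(round(tb_to_tib(raw), 1))  # 45.5
```

The second number is lower because Windows reports sizes in binary units while drive vendors use decimal ones. Every extra slot given to the cache costs one 5TB drive of long-term storage in this model.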

But why Windows? Well, the entire backend server stack uses a lot of core components from the Wabbajack project. Someday perhaps we'll look at making the build server run Linux, but so far it's been simpler to lean into Wabbajack's single-OS support instead of dividing development time across multiple OSes. So for now the server has to run Windows, and I'm going to try to run it on Windows Server (fewer mandatory restarts and more controls I can tweak), but if there's some Windows 10-only feature we depend on, we may have to use Windows 10 LTS.

Once the new build server is configured, I'll move the Wabbajack backend services to this machine, then rebuild the existing box as a pure CI server, which should speed up our tests dramatically since we'll be able to run the CI code natively on the machine.

Unfortunately, most of this won't be directly noticeable by users, but in the end it will result in better stability and fewer random server outages.
