Jon Baker, Graphics Programming


Meet Susan, the SuperMicro SYS-6028U-TR4T+ SuperServer

  I've recently acquired an old dual socket server machine. Her name is Susan. We met at a place near my old apartment that sells off-lease refurbished server hardware for very reasonable rates and she followed me home. This is a system set up with 12 drive bays, 4x 10-gigabit ethernet jacks, a dedicated IPMI jack, 2x 18c/36t Xeon parts, 100MB of cache, 128 gigs of RAM, redundant 1000W PSUs, a grand total of two (2) USB ports, and it consumes about 300W under load. The fans are very loud, and using it for development work feels a bit like running a woodchipper - you absolutely need hearing protection for any kind of chronic exposure. It is a machine designed for headless work: no GPU, only a VGA port on the motherboard. There is a SAS controller for the drive backplane in one of the PCIe slots, and two more slots open for "normal size" two-slot GPUs.

  The adaptations that have been made for serviceability and reliability under load are very interesting - one of the most unique is that the redundant power supplies enable hot-plugging between circuits. I can unplug one PSU - it alarms to signal the power failure, but there's no functional impact thanks to the redundant unit - plug it into another circuit, unplug the second PSU, replug it somewhere else, all while never losing power to the machine. I don't have any practical purpose for this right now, but I have demonstrated it, and I think it's an extremely interesting capability: you can think about things like battery backups, transporting it while running, even taking it along in a vehicle.

  A piece of hardware like this, new, is on the order of 10k USD. The numbers involved in these systems are fairly eye-watering from a consumer PC standpoint, on most fronts. This machine was under 600 USD, including the chassis, the RAM, and two big server CPUs - it's really unlike anything I've ever had access to before. You'll spend that much on a DDR5 RAM kit, easy. With this many drive bays, you could do very well setting it up as network attached storage, but the CPUs are totally overkill for an application like that. If you're not hard up for floating point compute (this is "only" 4 TFLOPS), it's a cheaper toy than most meaningful consumer GPUs, now. In particular for me, it's a very economical way to force myself to adapt to a new system architecture and do some remedial work on CPU multithreading and sync primitives - something I hadn't gotten a chance to study before. I learn best through applications like this, and I found it a great exercise in the management of threads and in the use of atomics and mutexes for correct operation between threads, where you would otherwise create unstable race conditions. I've had a lot of fun "playing supercomputer" these past few weeks, learning how to make use of its capabilities.

Adapting to a New System

  Ubuntu is not the default choice that it once was. My favorite Linux distro from college hasn't been maintained since 2014, but it spawned a community project to recreate it. It is a bare metal install (to a Windows sensibility) built on top of Debian Linux with a beautifully simple desktop config using OpenBox, Tint2, and Conky. This is more or less all that runs during a user session, plus the compositor and a few other little odds and ends - this should be an expected default, but that's becoming less and less the case with whatever Microsoft seems to think they're allowed to do on your hardware. Windows 11 is getting scary bad, in increasingly user-facing ways. I write system software in C++, and take as given that doing so in JavaScript is unacceptable - they have no shame, they literally put ads in the OS. It feels like a sign of pending collapse. When I do use it, I'm encountering bugs in Windows Explorer a couple of times a week. Right-click menu options populate for several frames after the menu appears. This shouldn't happen; it shouldn't be possible for it to happen. It shatters the UI paradigm - you can easily click "The Wrong Thing" as it populates under your mouse. This async bullshit is happening in mobile operating systems, too: options populating with your finger a millimeter off the glass, mid-tap. This introduces a monster class of user-facing vulnerabilities in a simple UI element, which the OS will regard as correct behavior. It is a total failure of the software development process - it is profoundly unethical to ship software like this, an increasingly common condition for which there is no accountability. A testing framework does not replace user testing. This is not a game. Rapidly approaching zero trust. This doesn't work. But I digress.

  Because I have no GPU in the system, I avoided setting up a graphics API for the small codebase I spun up for Crystal. I decided to look at options for textmode UIs, and found a nice one called FTXUI that provides a familiar UI paradigm right in the terminal. One of the sample applications sets up a familiar window analogy, draggable just like OS windows, inside the terminal. The Debian repos have libftxui-dev for the headers and ftxui-examples for a set of little demo applications that show the UI functionality, with corresponding docs and source code here (see examples). The library puts the terminal in an interactive mode where it receives mouse events and can logically treat the terminal as a framebuffer to render into. It also supports keyboard interaction. I think I may be able to do the plumbing to run this inside the text renderer in my own engine, but that would be further down the line - I can render all the terminal UI characters, but I need to figure out passing input events in. The library provides a nice functional-style interface that somewhat resembles a builder pattern. With some coaxing, you can set up dynamic UIs pretty easily, showing and hiding elements with the Maybe() component, which takes a pointer to a bool enable flag. This is nice because it enables central management of those flags for several elements, if you need it.

  I have had my frustrations with ImGUI, which became the de-facto drop-in cheap-and-easy UI solution everywhere. We used it at id, and had endless frustrations with event passing, DPI issues, issues when running it inside a QT viewport in idStudio (setting aside the fact that we were doing that with any regularity in the first place), etc., mostly arising from a custom backend that got hacked in. Nobody had a good time with it if they were doing anything beyond "click the button". I am appreciating quite a few aspects of using a textmode UI like this. First of all, who needs 60 Hz on a UI? What are you doing, low latency 3D tasks? No. You're clicking buttons and sliders - 10 Hz is already more than you need. Do you even need to open a separate window from the terminal where you launch the program? When I'm really loading down a system, I'm dealing with latencies north of 5 seconds for single inputs. Why make this user interaction additionally contingent on render work?

  And so I think using a terminal UI like this is actually quite compelling. You can run a thread for this terminal UI at 10 Hz. No need for more than that, and really no point tying it to the render thread, as it is fundamentally unrelated. It encourages good practice in synchronizing resources between several threads. If the UI needs to do something, it can spawn a thread to do so. In doing so, we hand our latency constraint to the system scheduler. I've got however many threads doing work, and here's another one in a loop doing relatively light work - managing inputs, updating the display and terminal output, any required messaging to worker threads - then sleeping for 100ms.

Important C++ Utilities for Multithreading

  My coverage of this material has been spotty, and I learned a lot studying up these past few weeks. A lot of university staff are behind the times on new standards and things of this nature, so in my undergrad C++ classes, C++11 was considered shiny and new, and C++14 and C++17 weren't really even discussed. I had some spotty exposure to bits and pieces of more modern C++ over the years, and it's nothing fundamentally surprising. It's good to get into the practice of correctly managing lambda capture lists and synchronization primitives.

  Three pieces of functionality I've had real application for in undertaking this project:

Building a CPU Job System

 The problem statement is relatively simple: I have 72 independent threads of execution that I want to keep busy. How do you do that? What does it look like? How do you monitor it?

 The program starts by creating several threads: one service thread, one UI thread, and a pool of 72 worker threads. Initially, a Crystal was a first-class entity, the primary class for the application. I've since moved to encapsulating that in its own class and managing several of them at a time, to be able to pipeline the work and make use of the huge memory buffer on this machine (I can now easily saturate 128GB and spill into swap - this chassis can be kitted out with 3TB, and I think moving to 512GB would enable some pretty cool stuff). I'll describe the program before that point, because it is more pertinent to this discussion.

Future Directions

  This machine has more network bandwidth than you can shake a stick at. I don't have the infrastructure to take advantage of even 10% of it - I've got a couple big gigabit switches, but Susan has 4x 10-gig ports. I don't have anything that I can connect to it in a way that would operate at full capacity - interesting, because this is a machine from 2014. However, this would be quite interesting to explore, connecting it to other devices like itself. I'm not sure if you have to go through a switch, or if you can ideally go directly port-to-port. I've recently been talking to a couple friends about networking libraries, something a bit higher level than dealing with sockets: one called ENet, which was apparently originally made for a game I played as a kid, Cube 2: Sauerbraten, and one called rpclib, which provides more of a function-call interface. I'll need to pass a significant amount of data between a couple machines to sync scene data and rendered frames. I'd like to move towards building a "graphics supercomputer" and experimenting with different architectures to take advantage of larger scale hardware. With a GPU, I can offload expensive float operations to the device, and do control-flow-heavy jobs on the CPU - maybe work like sorting and organizing results that come back from the GPU. One current idea I'm toying with is to set up a ring network between several machines like this, and pass around progressively refined branching tree structures representing rays for a given pixel. Ring networks are very interesting: the worst-case hop count between machines is bounded by the number of elements in the ring, and you get very high bandwidth and zero contention along links. There are redundancy issues, though: if any machine goes down, the network stops functioning correctly. Running another line from each machine to a switch acting as a central hub means you can at least detect this kind of failure condition from whatever head unit acts as "master".

  I'd also like to get away from using the VGA output on this machine. It wasn't really designed to be used this way, and I'd like to figure out how to SSH into it over the network and do X server forwarding for a virtual desktop. This is probably one of the next skillsets I'll focus on developing: at least some amount of basic networking. If I have several machines like this, I can have each one report its system monitoring data back to a master machine and do central monitoring, which I think is a very cool opportunity for blinkenlights. I also need to figure out the correct cables to get from the GPU_PWR headers to 8-pin PCIe connectors, so I can run a GPU in it. A significant GPU would massively expand its capabilities - I have a 7900xtx that I am planning on using for this.


Last updated 11/29/2025