This project is one of the most multi-layered things I’ve ever done in s&box. I mean, there are levels upon levels of neat stuff going on here; you could write a whole book on it and still miss a few details.
Three months ago I decided to start following a hunch.
A little backstory: way back in the days before WebAssembly, Emscripten (the most popular C++ to WASM compiler) used to be this wacky fork of LLVM, which, instead of translating C++ into bytecode, was capable of printing out… pure JavaScript???
Anyone who’s ever used JavaScript immediately knows how wtf bonkers that sounds, but Emscripten wasn’t really about the “why,” it was about the “how,” and they proved it could be done. It didn’t really matter how fast/slow the module was, the cool thing was that all of a sudden you could have DOOM running in your browser despite the fact that it was a completely sandboxed web environment.
Long story short, Emscripten has somewhat moved on since then (though the pure JavaScript support lives on through WASM2JS I believe) but the core principles remain the same: whatever you’re actually allowed to do with the hardware, do it, and emulate everything else (libc, filesystem, fibers, etc). Which is awesome, because again, the end result is completely sandboxed. You can do whatever the hell you want with memory because your “memory” is really just a massive JavaScript byte array with the stack at the beginning and the heap taking up the rest.
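To make the “memory is just a byte array” part concrete, here’s a minimal C++ sketch of a flat linear memory following the layout described above: every “pointer” is an integer offset into one buffer, the stack sits at the bottom and grows down, and the heap starts right after it. The `LinearMemory` class, the sizes, and the toy bump allocator are all made up for illustration. This isn’t Emscripten’s or DEX’s actual code, and a real runtime would put a proper malloc on top of something like `sbrk`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flat "linear memory": one big byte array where addresses are
// plain integer offsets. The stack occupies [0, stack_size) and grows down
// from stack_size; the heap starts at stack_size and grows up.
class LinearMemory {
public:
    LinearMemory(uint32_t total_size, uint32_t stack_size)
        : bytes_(total_size, 0), stack_ptr_(stack_size), heap_top_(stack_size) {
        assert(stack_size <= total_size);
    }

    // Loads and stores copy through memcpy, so unaligned access is fine and
    // no real pointer ever escapes the array (that's the sandbox).
    template <typename T>
    T load(uint32_t addr) const {
        assert(addr + sizeof(T) <= bytes_.size());
        T value;
        std::memcpy(&value, bytes_.data() + addr, sizeof(T));
        return value;
    }

    template <typename T>
    void store(uint32_t addr, T value) {
        assert(addr + sizeof(T) <= bytes_.size());
        std::memcpy(bytes_.data() + addr, &value, sizeof(T));
    }

    // Reserve a stack frame by moving the stack pointer down.
    uint32_t push_frame(uint32_t size) {
        assert(size <= stack_ptr_);
        stack_ptr_ -= size;
        return stack_ptr_;
    }
    void pop_frame(uint32_t size) { stack_ptr_ += size; }

    // Toy bump allocator for the heap; it never frees anything.
    uint32_t sbrk(uint32_t size) {
        assert(heap_top_ + size <= bytes_.size());
        uint32_t addr = heap_top_;
        heap_top_ += size;
        return addr;
    }

private:
    std::vector<uint8_t> bytes_;
    uint32_t stack_ptr_;
    uint32_t heap_top_;
};
```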
About a year ago I came to the realization that “hey, C# is actually LESS restrictive than JavaScript in many ways, even if it’s whitelisted out the ass… if they could do it for the web, why can’t we do it for s&box?”
And then I remembered that compiler development is a major-league pain in the ass and filed that idea away.
After all, C++/CLI kinda exists. It isn’t whitelist friendly, but it’s close enough that if someone wanted to touch the assembly up, they could port a C++ app to s&box with a little elbow grease. Way easier than actually transpiling. I made a proof of concept of this ages ago: an editor tool that patched the C++/CLI assembly in with your C# assembly, and the result generally passed ILVerify without complaints. But you were limited to the “pure” mode, which meant no pointers, and you had to use .NET types.
Then Facepunch removed ILVerify and switched to just… compiling code on the fly. Well shit, writing your game in boring goody-two-shoes C# is now mandatory. Bye-bye assemblies.
I adore C#, but I’ve also grown to kind of fucking hate it at the same time because of all the restrictions the whitelist places on you. Chiefly, it’s damn near impossible to use any external C# libraries without needing to rewrite at least some of the code to get rid of stackallocs, etc. Plus, you have to include them in your codebase, and s&box’s compile steps are really rigid, so you can’t just modify the csproj to add an external folder, or add build steps to do any kind of code transformers etc. And look, don’t get me wrong, this is probably the perfect ecosystem to be developing from-scratch fun little games in. But it’s a little restrictive for the kind of world shattering shit-your-pants on the spot “how the fuck is this possible” type of project. You know the ones I mean. Like DOOM running in a web browser.
I’m so sold on the name DExTr that I don’t really care that it doesn’t make much sense, or that the acronym doesn’t really work. It’s just cool to say and cool to look at. Which is fitting for a project with this much wizardry going on. Who knows what the fuck ‘emscripten’ means either.
Anyway, when I started working on DExTr my goal was pretty simple compared to what it can do now. All I wanted to do was make a simple C++ command line app that would take LLVM’s IR and go instruction by instruction, printing out C# instead of bytecode. In a lot of ways, this is an easier task than other transpilers; the IR is in SSA form, with branches/flow already worked out for you, and all of the instructions are really primitive, so you kind of just need to implement them all and then things will just magically start to work. The hard thing is that there’s a shit-ton of instructions.
This is the current LLVM IR reference. Scroll down that page and the list just keeps going.
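Because the IR is already in SSA form, the heart of a translator like this can be a big switch over opcodes that prints one C# statement per instruction. The sketch below is a guess at the general shape, not DExTr’s code: it uses a made-up, stripped-down `Inst` struct instead of LLVM’s real `llvm::Instruction` API so it stands on its own, and the C# naming scheme (`%3` becomes `v3`) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an LLVM IR instruction.
// A real translator would walk llvm::Function / llvm::Instruction objects.
enum class Op { Add, Sub, Mul, And, Shl, LShr, Ret };

struct Inst {
    Op op;
    std::string result;             // SSA name, e.g. "%3" (empty for ret)
    std::vector<std::string> args;  // operand SSA names or constants
};

// Turn an SSA name like "%3" into a legal C# local like "v3".
static std::string local(const std::string& name) {
    return name.size() > 1 && name[0] == '%' ? "v" + name.substr(1) : name;
}

// Emit one C# statement per i32 instruction. Every SSA value becomes its own
// local, so no register allocation or liveness analysis is needed.
std::string emit(const Inst& in) {
    assert(in.op == Op::Ret || !in.result.empty());
    auto a = [&](size_t i) { return local(in.args.at(i)); };
    const std::string r = local(in.result);
    switch (in.op) {
        case Op::Add:  return "int " + r + " = unchecked(" + a(0) + " + " + a(1) + ");";
        case Op::Sub:  return "int " + r + " = unchecked(" + a(0) + " - " + a(1) + ");";
        case Op::Mul:  return "int " + r + " = unchecked(" + a(0) + " * " + a(1) + ");";
        case Op::And:  return "int " + r + " = " + a(0) + " & " + a(1) + ";";
        case Op::Shl:  return "int " + r + " = " + a(0) + " << " + a(1) + ";";
        // lshr is a logical shift, so go through uint to avoid C#'s arithmetic >>.
        case Op::LShr: return "int " + r + " = (int)((uint)" + a(0) + " >> " + a(1) + ");";
        case Op::Ret:  return "return " + a(0) + ";";
    }
    throw std::logic_error("unhandled opcode");
}

// Translate a straight-line block, one line of C# per instruction.
std::string emit_block(const std::vector<Inst>& body) {
    std::string out;
    for (const Inst& in : body) out += emit(in) + "\n";
    return out;
}
```

Note the `lshr` case: C#’s `>>` on an `int` is an arithmetic shift, so a logical shift has to go through `uint`. Little mismatches like that are exactly where the math bugs hide.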
There are only about 50 “core” instructions; the rest are all intrinsics, but clang loves to generate intrinsics, even when you tell it explicitly not to. Every so often it’ll generate a new intrinsic that I haven’t seen before just because it thinks a left funnel shift might maybe be 5% faster on one or two platforms - at which point I have to go back to the drawing board and figure out how to represent that (perfectly) in C#. If there’s an error in your math, good luck finding out where the fuck it is in 68MB of pure C# source. Each and every math bug took about 2-3 days of work to track down - Visual Studio can’t even keep the monolith C# file open without crashing after a few minutes.
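For a taste of what “perfectly” means here, take that left funnel shift. The LLVM LangRef defines `llvm.fshl` as concatenating the two operands (first one high), shifting left by the amount modulo the bit width, and keeping the high half. The obvious one-liner `(a << c) | (b >> (32 - c))` breaks when `c % 32 == 0`: in C++ a shift by 32 is undefined, and in C# the shift count gets masked to 0, so `b` gets ORed into the result. A C++ sketch of the 32-bit case, with a slow reference version for checking (the function names are mine):

```cpp
#include <cassert>
#include <cstdint>

// llvm.fshl.i32(a, b, c): treat a:b as a 64-bit value (a is the high word),
// shift it left by (c mod 32), and return the high 32 bits.
uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
    const uint32_t s = c % 32;
    if (s == 0) return a;  // avoid the undefined (C++) / masked (C#) shift by 32
    return (a << s) | (b >> (32 - s));
}

// llvm.fshr.i32(a, b, c): same concatenation, shift right, keep the low word.
uint32_t fshr32(uint32_t a, uint32_t b, uint32_t c) {
    const uint32_t s = c % 32;
    if (s == 0) return b;
    return (b >> s) | (a << (32 - s));
}

// Reference implementation straight from the definition, for testing.
uint32_t fshl32_ref(uint32_t a, uint32_t b, uint32_t c) {
    const uint64_t concat = (uint64_t(a) << 32) | b;
    return uint32_t((concat << (c % 32)) >> 32);
}

// Compare the fast version against the reference, including shift amounts
// >= 32 that must wrap around.
void check_fshl() {
    for (uint32_t c = 0; c < 70; ++c)
        assert(fshl32(0xDEADBEEFu, 0x0BADF00Du, c) ==
               fshl32_ref(0xDEADBEEFu, 0x0BADF00Du, c));
}
```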
Eventually though, things started to work. I wrote some dead code elimination passes, I got my baby’s first virtual memory working, and all of a sudden my test program could calculate pi without crashing.
I wasn’t worried about includes or the C/C++ standard libraries at this point, the main thing was just to get the language syntax going, but I started to realize that I could actually start worrying about the more advanced stuff, because all of this has been done before: Emscripten.
So I went and grabbed the latest upstream Emscripten and started yanking everything out.
I’m using “DEX” to refer to everything that isn’t the core C++ app, which includes the Python compiler drivers and the (mixed C++/C#) compatibility layers that borrow heavily from Emscripten. It’s an even cooler name than DExTr (and makes even less sense), and it’s only three letters which means I can map my compiler commands in a way very similar to how Emscripten does it; ‘emmake’ -> ‘dexmake’, ‘emcc’ -> ‘dexcc’, etc.
Basically DEX includes all the things you would typically expect your compiler to include: musl libc, libcxx, and a C# layer that binds ‘syscalls’ to the s&box API under the hood. So stdout maps to Log.Info, stderr maps to Log.Error, fopen maps to FileSystem.Data.Open*, et cetera.
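Here’s roughly what the `write` side of such a shim could look like, sketched in C++ rather than the C# DEX actually uses for this layer. The `Logger` callbacks stand in for `Log.Info`/`Log.Error`, the `StdioShim` name is hypothetical, and the line buffering is an assumption made for this sketch (a logger wants whole lines, while `printf` may hand `write` partial ones).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Stand-in for a host logging call such as s&box's Log.Info / Log.Error.
using Logger = std::function<void(const std::string&)>;

// Hypothetical shim: libc's write() ends up here for fds 1 and 2.
class StdioShim {
public:
    StdioShim(Logger info, Logger error) : info_(std::move(info)), error_(std::move(error)) {}

    // Follows the raw Linux syscall convention: bytes consumed, or -errno.
    long sys_write(int fd, const char* buf, size_t len) {
        assert(buf != nullptr || len == 0);
        std::string* pending = nullptr;
        Logger* sink = nullptr;
        if (fd == 1)      { pending = &out_; sink = &info_; }
        else if (fd == 2) { pending = &err_; sink = &error_; }
        else return -EBADF;  // real files would go through the filesystem layer

        pending->append(buf, len);
        // Loggers expect whole lines, so only flush up to each newline.
        size_t nl;
        while ((nl = pending->find('\n')) != std::string::npos) {
            (*sink)(pending->substr(0, nl));
            pending->erase(0, nl + 1);
        }
        return static_cast<long>(len);
    }

private:
    Logger info_, error_;
    std::string out_, err_;  // partial lines waiting for a newline
};
```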
I also made an effort to mimic a lot of Emscripten functionality, so instead of an HTML5 header that you include, we have some s&box specific headers to set up a main loop that gets run in OnUpdate, alongside some other quality of life things.
After a few barebones experiments to make sure the filesystem etc. was all working how I expected, my first field test of DEX was the obvious choice; the rite of passage of all rites of passage, the quintessential DOOM port.
And holy shit, it was so much easier than I thought it would be. Like, too easy. I had it working in the span of like, two days. And the syntax for the transitions between C# and C++ was so beautifully intuitive.
In Emscripten, you can do this cool thing where you weave in and out from C++ mode to JavaScript and back to C++ and then do a little more JavaScript, passing variables between the two like it’s nothing. The DEX syntax for this is very similar. In my DOOM port, everything outside DEX_ASM_ARGS is C++ and everything inside is C#, and it just works. Like… come to think of it, this is actually easier than the normal way of passing pointers from C++ to C# when you’re working with unsafe code. The memory span is just, right there… you just grab it and everything’s gravy.
So then I had DOOM working (you can check that project out now, it’s available on sbox.game as DoomBox!) and I figured “let’s take it one step further and port Quake!”
At which point I started to realize, okay, this is actually running a LOT faster than I gave it credit for. Obviously it’s not anywhere near as fast as native code, but the fact that it was able to manage software rendering at 20-25 fps ENTIRELY in safe C# was another one of those “what the fuck” moments.
Then I realized, hey, interop is really easy, why don’t I just tailor GLQuake to call s&box’s Graphics.* APIs instead of calling OpenGL?
And bam, it worked, Quake in the scene system like it was nothing.
At this point I was having so much fun I figured… I’ll just go all out on this, I’ll find out how to make multiplayer work.
So I made a pretend socket layer that sits on top of your s&box connections, and once again, everything worked pretty much how I expected it to.
Then I was like - “this is really fast, I wonder if I could run emulators at a decent speed?” The answer was yes. This was getting too fun. Time to give myself a challenge.
At this point I’d been working on DEX for about a month and a half and was already super proud of the results. But I have this annoying habit where when something’s working really well, I give myself some masochistic goal to make an already cool project even cooler instead of just saying “I’m proud of this, this is good enough, let’s move on.”
So naturally I set my sights on something completely ridiculous that I wouldn’t be able to port in a million years: mupen64plus (specifically the LibRetro bindings). Not only does mupen64plus have an absolute shit-ton of code, but it’s also heavily reliant on weird shit like setjmp/longjmp, is notably broken in Emscripten, and makes use of OpenGL exclusively, no software rendering that I’m aware of.
And if that wasn’t hard enough, I wanted to do it without changing a single source file. I allowed myself to mod the makefiles and patch a few platform-specific things out to get coroutines working, but otherwise I made it my goal to get this working without changing a single line of the source.
(If you’re asking yourself “how the fuck…” at this point, hold all questions to the end.)
So the first thing to tackle here was that Mupen64Plus is pretty heavily threaded. I’m still a little wary about implementing pthreads in DEX, because having multiple threads interacting with memory etc. simultaneously introduces a whole lot of scary edge cases that I didn’t want to deal with.
Luckily, Mupen64Plus has a fallback which uses coroutines to jump around the code. Except their coroutine library uses setjmp and longjmp, which is just, not happening in C#. We don’t have enough control over execution to make it happen.
Enter Asyncify.
Emscripten has this neat fiber API which is more or less exactly what I needed. It’s pretty standard stuff - it lets you pause execution at a spot, go over to do something else with a brand new stack, then trampoline back to the original yield to continue whatever it was you were doing. Currently they’ve got this as a Binaryen pass, but it used to be an LLVM pass - either way, the principle is the same.
For every function that might call into something which yields execution, add some instrumentation after that call to check whether we did yield, and if so immediately return instead of continuing with what you were doing. Then add instrumentation to every function which calls that function, and every function which calls that function, on and on up the call tree. This has the effect that immediately after you yield, your whole stack just unwinds like dominoes all the way back up to the top.
Then, as you unwind, if you make note of the local variables (stack pointer, the index of the function that was the unwind source, etc.), you have everything you need to “rewind” the stack and continue execution like nothing happened.
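Here’s a hand-instrumented toy version of the unwind/rewind dance in C++. It’s only a sketch of the idea, not Asyncify’s actual output: a real pass generates this instrumentation automatically, keeps the saved state in linear memory, and also records which call site to resume at (with a single call site in `worker`, restoring `i` is enough to land back on it). All names here are made up.

```cpp
#include <cassert>
#include <vector>

// Toy model of Asyncify-style instrumentation, written by hand.
enum class State { Normal, Unwinding, Rewinding };

State g_state = State::Normal;
std::vector<int> g_saved;  // locals saved while unwinding

// The primitive that yields. On rewind it just flips back to Normal.
void yield_point() {
    if (g_state == State::Rewinding) { g_state = State::Normal; return; }
    g_state = State::Unwinding;  // start unwinding the whole stack
}

// Instrumented function: sums 0..n-1 and yields once halfway through.
int worker(int n) {
    int i = 0, sum = 0;
    if (g_state == State::Rewinding) {  // restore locals saved during unwind
        sum = g_saved.back(); g_saved.pop_back();
        i = g_saved.back(); g_saved.pop_back();
    }
    for (; i < n; ++i) {
        if (i == n / 2) {
            yield_point();
            if (g_state == State::Unwinding) {  // instrumentation after the call
                g_saved.push_back(i);
                g_saved.push_back(sum);
                return 0;  // return value is ignored while unwinding
            }
        }
        sum += i;
    }
    return sum;
}

// Instrumented caller: it has nothing to save, it just propagates the unwind.
int caller(int n) {
    int r = worker(n);
    if (g_state == State::Unwinding) return 0;
    return r + 1000;
}

// Top-level driver: run until the stack unwinds, do "something else", rewind.
int run(int n, int& yields) {
    yields = 0;
    int r = caller(n);
    while (g_state == State::Unwinding) {
        ++yields;                    // the host could run other work here
        g_state = State::Rewinding;  // replay the same call path
        r = caller(n);
    }
    assert(g_saved.empty());  // every saved local was consumed on rewind
    return r;
}
```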
This increases your module size like a motherfucker, because it has to instrument every function in the whole module which might be able to yield. So not just any function which calls a function which calls a function which yields… any function which calls a function pointer, because what if the function it points to has a yield, and any function which calls that function which calls a function pointer… yeah. Basically triples the size of your code. But it works. And there are ways to reduce that cost which you can go read up on if you’re interested.
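Deciding which functions need that instrumentation is basically a reverse reachability walk over the call graph, with every indirect call conservatively treated as a possible yield. A small C++ sketch using a made-up `FuncInfo` summary rather than real LLVM or Binaryen data structures:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-function summary of a module.
struct FuncInfo {
    std::vector<std::string> callees;  // direct calls only
    bool calls_indirect = false;       // any call through a function pointer
    bool yields = false;               // e.g. the fiber swap primitive itself
};

// Conservatively find every function that needs unwind/rewind instrumentation.
std::set<std::string> needs_instrumentation(const std::map<std::string, FuncInfo>& module) {
    // Reverse edges: callee -> list of callers.
    std::map<std::string, std::vector<std::string>> callers;
    for (const auto& entry : module) {
        for (const auto& callee : entry.second.callees) {
            assert(module.count(callee) && "call to unknown function");
            callers[callee].push_back(entry.first);
        }
    }

    std::set<std::string> result;
    std::vector<std::string> worklist;
    for (const auto& entry : module) {
        const FuncInfo& info = entry.second;
        // An indirect call might land on a yielding function, so assume it does.
        if (info.yields || info.calls_indirect) {
            result.insert(entry.first);
            worklist.push_back(entry.first);
        }
    }

    // Walk up the call graph: anyone calling an instrumented function is too.
    while (!worklist.empty()) {
        const std::string f = worklist.back();
        worklist.pop_back();
        auto it = callers.find(f);
        if (it == callers.end()) continue;
        for (const auto& caller : it->second)
            if (result.insert(caller).second) worklist.push_back(caller);
    }
    return result;
}
```

Emscripten exposes settings such as `ASYNCIFY_IGNORE_INDIRECT` and `ASYNCIFY_REMOVE` that prune this set, which is one of the ways to claw back some of that code size.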
At this point I’d already proven with QuakeBox that you could more or less rewrite OpenGL calls into s&box Graphics calls and things will work just well enough to get shit on screen. Maybe it’s not perfect or accurate, but with enough fiddling it’s good enough to see what’s happening and you don’t really care that it’s not 1:1 because how the fuck is this running in s&box anyway. But that was a custom-made solution, I had to modify the Quake code so that it could represent the objects in the scene system.
But you can’t really do that with an emulator, because who’s to say what is and isn’t an object, you don’t really know what’s UI, most of the time you don’t even know where the camera is, you just render what the game tells you in the way it tells you.
So what if instead of hacking the scene system into the game, I hack OpenGL into the scene system? Easy peasy. (Just kidding.)
Anyway, I call this faux OpenGL driver ‘DexGL’.
The first and most obvious hurdle here is GLSL. You can emulate draw calls all you want, but at the end of the day, if the shaders aren’t right things will just break down.
The way I saw it, I had two options:
Translate the GLSL into HLSL, formatted in a way s&box would accept it (easy but inaccurate and chock full of edge cases)
Compile the GLSL into shader_c (what?)
I brought this idea up in the s&box discord and they suggested I go with the first option. So naturally I went with the second option.
More backstory. I realized pretty early on that it wouldn’t be that big of a challenge to write a homebrew shader_c compiler for a few reasons:
shader_c used to contain DXBC. It doesn’t anymore. S&box is on Vulkan, so shader_c has SPIR-V now. GLSL to SPIR-V compilers are a dime a dozen (I chose to use glslang).
The wonderful folks who maintain VRF have already done the majority of the legwork in reverse engineering the shader_c format. It turned out to not be comprehensive enough to write the full compiler, so I had to do a little bit of reversing on my own - but once the hard stuff like compression has been figured out, it’s really just a matter of diffing compiled shaders to see what changed.
There’s actually a lot of stuff you can fuck up in the shader_c binary and s&box will still accept it. Obviously there are fields here and there which will crash the game if they’re even slightly incorrect, but my vertex layout was just dead wrong for the first half of the project without me realizing and the engine was able to cope.
So I got to work. Turns out the format’s pretty intuitive all things considered. Combos are organized how you’d expect (static top-level combos, then dynamic combos, then individual SPIR-V sources beneath that).
Each SPIR-V input file is bundled with some reflection metadata which the game uses to create the Vulkan pipeline layout, and then there’s some top-level information about the set of globals and textures and samplers and whatnot, and where to put them when they get mapped down to static->dynamic->SPIR-V.
Each dynamic combo definition is trailed with some renderstate information, so I can just enumerate all the possible permutations of renderstates I might need and control them C#-side with Graphics.SetCombo.
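Enumerating renderstate permutations is essentially mixed-radix counting: each renderstate option is a digit, and the combo index is the number those digits spell out. A C++ sketch of that bookkeeping; the dimension names and counts are invented, the first-dimension-varies-fastest ordering is an assumption, and none of this says how shader_c actually lays combos out on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One renderstate "dimension" baked into the dynamic combos (hypothetical).
struct ComboDim {
    std::string name;
    int count;  // number of possible values, e.g. 2 for on/off
};

// Map per-dimension values to a single combo index (mixed radix, first
// dimension varies fastest).
int combo_index(const std::vector<ComboDim>& dims, const std::vector<int>& values) {
    assert(values.size() == dims.size());
    int index = 0, stride = 1;
    for (size_t d = 0; d < dims.size(); ++d) {
        assert(values[d] >= 0 && values[d] < dims[d].count);
        index += values[d] * stride;
        stride *= dims[d].count;
    }
    return index;
}

// Inverse: decode a combo index back into per-dimension values.
std::vector<int> combo_values(const std::vector<ComboDim>& dims, int index) {
    std::vector<int> values(dims.size());
    for (size_t d = 0; d < dims.size(); ++d) {
        values[d] = index % dims[d].count;
        index /= dims[d].count;
    }
    return values;
}

// Total number of permutations the compiler has to emit.
int combo_count(const std::vector<ComboDim>& dims) {
    int total = 1;
    for (const auto& dim : dims) total *= dim.count;
    return total;
}
```

With, say, 3 blend modes × 2 depth-test states × 3 cull modes you get 18 dynamic combos, and `combo_index` tells you which one a given draw call needs.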
I’ll be publishing the DexGL source at some point soon. What I’ve gleaned from the format is not perfect, but it’s leaps and bounds more comprehensive than what VRF is able to read right now and it’s functional enough that it compiles all my shaders without issue.
Glslang, as cool as it is, was honestly a bit of a headache throughout this project. Their public API is really closed off, so you can’t peek at much besides the most basic reflection data, and you’re pretty SOL if you want to modify the AST in any way.
Eventually I just said screw it and forked glslang, and added a few transformation passes which I needed to get things working:
Firstly, glslang was generating combined image sampler objects in the SPIR-V for GLSL’s sampler2D object. I wasn’t sure shader_c had support for those (and if it did, I wasn’t able to find a descriptor range for them), so I added a transform pass to glslang which would take any combined image samplers and decombine them, putting the separated textures in descriptor bindings >= 150 and the separated samplers at >= 70 for fragment shaders (this appears to be the current s&box descriptor layout).
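Once the SPIR-V has been split, the binding side is just bookkeeping. A C++ sketch of it, assuming each former `sampler2D` gets its own texture and sampler slot in declaration order. The 150/70 bases come from the paragraph above, but `SplitSampler`, the `_tex`/`_smp` suffixes, and the omitted descriptor set are made up, and the real pass works on glslang’s internals rather than on strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Base bindings for fragment shaders (taken from the text above).
constexpr int kTextureBase = 150;
constexpr int kSamplerBase = 70;

// Hypothetical record for one GLSL `sampler2D` uniform after decombining.
struct SplitSampler {
    std::string glsl_name;  // e.g. "uTex0"
    int texture_binding;    // separate texture2D
    int sampler_binding;    // separate sampler
};

// Give each combined sampler its own texture/sampler pair, in declaration order.
std::vector<SplitSampler> assign_bindings(const std::vector<std::string>& combined) {
    std::vector<SplitSampler> out;
    for (size_t i = 0; i < combined.size(); ++i) {
        assert(!combined[i].empty());
        const int slot = static_cast<int>(i);
        out.push_back({combined[i], kTextureBase + slot, kSamplerBase + slot});
    }
    return out;
}

// Emit Vulkan-GLSL declarations for the split pair (descriptor set omitted).
std::string declare(const SplitSampler& s) {
    return "layout(binding = " + std::to_string(s.texture_binding) +
           ") uniform texture2D " + s.glsl_name + "_tex;\n" +
           "layout(binding = " + std::to_string(s.sampler_binding) +
           ") uniform sampler " + s.glsl_name + "_smp;\n";
}
```

In Vulkan-flavoured GLSL the sampling call then becomes `texture(sampler2D(uTex0_tex, uTex0_smp), uv)` instead of `texture(uTex0, uv)`.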
Later on I also added another pass which remaps gl_Position’s z component from [-1,1] (stupid OpenGL coordinates) to [0, 1] (sane Vulkan coordinates) to fix some issues with viewport depth.
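Because `gl_Position` is still in clip space (before the divide by `w`), the remap has to combine `z` with `w` rather than touch the final NDC value: visible OpenGL depth is `-w <= z <= w`, Vulkan wants `0 <= z <= w`, so `z' = (z + w) / 2`. The C++ below just checks that arithmetic; in the shader it amounts to `gl_Position.z = (gl_Position.z + gl_Position.w) * 0.5;` at the end of `main`. The function names are mine.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// OpenGL clip space: visible depth is -w <= z <= w (NDC z in [-1, 1]).
// Vulkan clip space: visible depth is 0 <= z <= w (NDC z in [0, 1]).
// Remapping before the perspective divide keeps it linear:
// z' = (z + w) / 2, which gives NDC z' = (ndc_z + 1) / 2 after the divide.
Vec4 gl_to_vk_depth(Vec4 p) {
    p.z = 0.5f * (p.z + p.w);
    return p;
}

// The depth value the rasterizer sees after the perspective divide.
float ndc_depth(const Vec4& p) {
    assert(p.w != 0.0f);
    return p.z / p.w;
}
```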
The final piece of the puzzle was tying everything together and writing just enough of an OpenGL driver to get things working. This was generally less impressive than its components, but there are still a few neat things I had to get working here that are worth mentioning.
Firstly, for whatever reason you can’t actually generate and then mount shader_c files at runtime. You could save them out, but they’d be stuck in FileSystem.Data, because FileSystem.Mounted is read-only, and the mounted filesystem is the only place considered when you call Shader.Load. So I opted to make the DexGL compiler a separate C++ app instead of attempting to bundle all of glslang into the DEX module. Right now the game just caches out the GLSL sources to your local data folder, then I’ve got a batch script which will compile out the shader_c for your next run and throw them in the mounted assets folder where they belong. Not too big of a deal.
The second weird hack was with how we render geometry. Mupen64plus uses client-side arrays (sigh) when it believes it’s running under GLES (which it does in the case of DEX).
So your vertex layout is changing all the time with calls to glVertexAttribPointer etc. That data can be ordered pretty weirdly, and so unfortunately s&box’s VertexAttribute struct is a little restrictive for what we need to do. During runtime we have to sort out the VertexAttributes for each drawcall and order/pad them out to match how the data is laid out in memory.
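A rough C++ sketch of the sort-and-pad step, assuming one interleaved client-side array per draw call. The `Attrib` struct is made up, padding is marked with location `-1`, and mapping the result onto s&box’s `VertexAttribute` is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one glVertexAttribPointer binding within a
// single interleaved client-side array.
struct Attrib {
    int location;  // shader attribute location, -1 for padding
    int offset;    // byte offset of this attribute within a vertex
    int size;      // size in bytes (components * component size)
};

// Order attributes by their position in memory and fill holes with padding
// entries so the layout accounts for every byte of the stride.
std::vector<Attrib> build_layout(std::vector<Attrib> attribs, int stride) {
    std::sort(attribs.begin(), attribs.end(),
              [](const Attrib& a, const Attrib& b) { return a.offset < b.offset; });
    std::vector<Attrib> layout;
    int cursor = 0;
    for (const Attrib& a : attribs) {
        assert(a.offset >= cursor && "overlapping attributes not handled");
        if (a.offset > cursor) layout.push_back({-1, cursor, a.offset - cursor});
        layout.push_back(a);
        cursor = a.offset + a.size;
    }
    assert(cursor <= stride);
    if (cursor < stride) layout.push_back({-1, cursor, stride - cursor});  // tail padding
    return layout;
}
```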
Then we use the DEX C# preprocessor (remember how I said there was too much stuff to talk about and I couldn't fit it all into a single blog post?) to generate out some structs which have a fixed compile-time size so that we can actually upload the geometry.
Then we actually need to get the geometry on screen. Here’s where it gets really hairy. Turns out the only way to get geometry with a custom vertex layout on screen is to use an instance of Mesh and build a Model from it with ModelBuilder. All of the other Graphics.Draw APIs use Vertex (which obviously won’t work as it doesn’t have the right layout), or custom vertex structs which have explicit compile-time [VertexLayout] attributes (won’t work because our vertex layout is determined at runtime).
But obviously we can’t create a Model each frame for each draw call, that’s just going to murder the engine.
So I settled on a mesh pool, which buckets a shit ton of meshes by their vertex layout and vertex count, and adds new entries to the pool if it can’t acquire a mesh that we’re confident is out-of-flight. It’s a little hacky and a little slow (Mesh.SetVertexBufferData and DrawModel are by far the slowest functions when profiling), but it works well enough for what I need it to do.
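A C++ sketch of that pooling strategy. The “out-of-flight” test here is a plain frame counter (a mesh last used at least `frames_in_flight` frames ago is assumed safe to overwrite), which is an assumption rather than how DexGL necessarily decides it, and the integer ids are placeholders for the actual s&box `Mesh`/`Model` objects.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical pool keyed by (vertex layout hash, vertex count). The int id
// stands in for an engine-side Mesh/Model pair.
class MeshPool {
public:
    explicit MeshPool(uint64_t frames_in_flight) : frames_in_flight_(frames_in_flight) {}

    // Returns a mesh id that is safe to overwrite this frame, creating one if
    // every pooled mesh in the bucket may still be in use by the GPU.
    int acquire(uint64_t layout_hash, int vertex_count, uint64_t frame) {
        assert(vertex_count > 0);
        auto& bucket = buckets_[std::make_pair(layout_hash, vertex_count)];
        for (Entry& e : bucket) {
            if (e.last_used + frames_in_flight_ <= frame) {  // assumed out-of-flight
                e.last_used = frame;
                return e.id;
            }
        }
        bucket.push_back({next_id_, frame});  // grow the pool
        return next_id_++;
    }

    int created() const { return next_id_; }

private:
    struct Entry {
        int id;
        uint64_t last_used;
    };
    uint64_t frames_in_flight_;
    int next_id_ = 0;
    std::map<std::pair<uint64_t, int>, std::vector<Entry>> buckets_;
};
```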
Long story short it works. What the fuck. This is the culmination of about three months of work (despite folks telling me I was crazy to attempt this!) but the results speak for themselves and I’m still struggling to believe it’s all possible.
I'm planning on making DEX open source in the near future, along with DexGL and the sources for RetroBox. At which point you guys can go crazy and port to your heart's content. I'm still keeping it to myself for a little while though, as there's still one or two things I want to try and get working ;)