I’m so sold on this name that I don’t really care that it doesn’t make much sense, or that the acronym doesn’t really work. It’s just cool to say and cool to look at. Which is fitting for a project with this much wizardry going on. Who knows what the fuck ‘emscripten’ means either.
Anyway, when I started working on DExTr my goal was pretty modest compared to what it can do now. All I wanted was a simple C++ command-line app that would take LLVM’s IR and go through it instruction by instruction, printing out C# instead of bytecode. In a lot of ways, this is an easier job than the one other transpilers face: the IR is in SSA form, with branches and control flow already worked out for you, and all of the instructions are really primitive, so you kind of just need to implement them all and things magically start to work. The hard part is that there’s a shit-ton of instructions.
Take a look at the current LLVM IR reference (the LLVM Language Reference Manual). Scroll down that page and the list just keeps going.
There are only ~50 “core” instructions; the rest are all intrinsics, and clang loves to generate intrinsics, even when you explicitly tell it not to. Every so often it’ll generate a new intrinsic that I haven’t seen before just because it thinks a left funnel shift might be 5% faster on one or two platforms, at which point I have to go back to the drawing board and figure out how to represent that (perfectly) in C#. If there’s an error in your math, good luck finding out where the fuck it is in 68MB of pure C# source. Each and every math bug took about 2-3 days to track down: Visual Studio can’t even keep the monolithic C# file open for more than a few minutes without crashing.
Eventually though, things started to work. I wrote some dead code elimination passes, I got my baby’s first virtual memory working, and all of a sudden my test program could calculate pi without crashing.
At this point I wasn’t worried about includes or the C/C++ standard libraries; the main thing was just to get the language syntax going. But I started to realize that I could actually start tackling the more advanced stuff, because all of it had been done before: Emscripten.
So I went and grabbed the latest upstream Emscripten and started yanking everything out.