
What is the Difference Between GHC and the HaLVM?


For those interested in developing the HaLVM itself -- rather than developing a unikernel based on the HaLVM -- it may be useful to familiarize oneself with the way the HaLVM is actually structured. (If you're in the other camp -- trying to build a unikernel -- then let me suggest you skip this whole section, as none of it is likely to be useful.)

At its core, the HaLVM is a version of GHC built for a funny environment in a funny way, plus a bunch of libraries. As it has grown over time, the actual number of changes to GHC proper has been declining, and we hope to continue this process as we move forward. Thus, the "version of GHC" bit is actually not that interesting: a few bits of code are commented out, and some deep magic in the threaded runtime is swapped out. The actual interesting bits are the "funny environment", the "built in a funny way", and the "plus a bunch of libraries." Let us discuss each of them, in turn.

Built For A Funny Environment

Normally, when one builds GHC, one uses the standard GNU build process: run configure, make, and make install. If you're fancy, you might change one of the arguments to configure, or you might have discovered GHC's build.mk file and tweaked one of the options there. But, generally: configure, make, and make install. This process builds you a nice, fresh version of GHC that runs on the operating system and processor that you built it on. In other words, you are building a compiler that will run on a host system and generate code for a target machine, where the build, host, and target machines are identical. Many people refer to this as a native compiler, or more commonly, "not one of those pain in the head cross compilers."

The HaLVM is a cross-compiler. While you build it on a 32-bit or 64-bit Intel Linux system, and the tools run on a 32-bit or 64-bit Intel Linux system, the HaLVM tools actually target a 32-bit or 64-bit Intel Xen system. In other words, build and host are the same, but the target is different.

Those of you who have done any cross compiling will be familiar with some of the annoyances of doing so. Instead of calling gcc, for example, you have to call arm-linux-gnueabi-gcc and then be very careful finding your include files and libraries.

GHC makes this effort a little more difficult than GCC, because GHC is only sort-of, kind-of designed to support cross-compilers. Fortunately, though, the system that we're building for (target) is very, very close to the system that we're going to be running the tools on (host), which makes our lives much easier. Because they're so similar, we actually use the same compilers and tools we would have used if we were building a native compiler. What we change, though, is the header files the system uses, and the very core starting objects that make things go.

Header Files

First, we have to define our own header files, so that we properly generate code for Xen rather than for Linux. If you're curious, these files can be found in our minimal version of libc and in the adaptation code in our GHC branch, both of which we will talk about shortly.

These header files ensure that the structure sizes, capabilities, etc., found by GHC and GCC match the versions that will be around when the unikernel runs on "bare metal." If we did not include them (and the very important -nostdlib and -nostdinc flags), then we might accidentally mix details of the Linux system the tools are running on with details of the Xen system we want to build for, making nobody happy.
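
To make the stakes a little more concrete, here is a small, purely illustrative Haskell sketch (not HaLVM code) of where header-derived details surface: the width of CSize below is fixed by whatever C headers GHC and GCC see at build time, so those headers had better describe the environment the binary will actually run in.

{-# LANGUAGE ForeignFunctionInterface #-}
-- Illustration only: if the headers seen at build time described Linux,
-- but the binary runs against minlibc on Xen, the two sides of this FFI
-- call could disagree about type sizes and calling details.
import Foreign.C.String (CString, withCString)
import Foreign.C.Types  (CSize (..))

foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

main :: IO ()
main = withCString "hello, xen" (\s -> c_strlen s >>= print)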

Core Starting Objects

Similarly, there are some core objects that we want to make sure are around: the equivalents of C's libc, libm, and crt0.o. Most people are familiar with the first two: they're the built-in C and math libraries used by most C programs. In the case of the HaLVM, we have to build special versions of these libraries that support the impoverished environment in which the HaLVM lives. Fortunately, we were able to borrow implementations from other sources: our libm comes from the Julia project (look for it in the main HaLVM repo under src/openlibm), and our minimal libc is Galois's own implementation (look for it in the HaLVM's GHC fork, under rts/minlibc).

Both of these libraries are built as part of building the HaLVM. Is there a particularly good reason that one is in the main HaLVM repository and the other in the HaLVM/GHC repository? No, not really. We could perhaps make up a post hoc explanation -- something about it being easier to make the paths work out -- but the truth is, there is no particular thought behind that decision. In fact, moving minlibc out of the halvm-ghc tree might be a good project for an aspiring HaLVM engineer. (Moving in that direction would continue our effort to minimize the differences between the halvm-ghc tree and mainline GHC.)

Finally, many C programmers have bounced off crt0.o, if only to the point at which they recognize that if they see that filename, something has gone horribly wrong in their build scripts. But to skip a lot of detail, crt0.o is a small object file that sets up your C program to run. Any initialization required by your operating system, for example, is run in that file before it calls your program's main.

The HaLVM doesn't have crt0.o. Instead, it has a whole bunch of files in halvm-ghc/rts/xen. These files are the ones responsible for bridging the services and capabilities that Xen provides to the services and capabilities that the GHC runtime expects. Perhaps the most direct comparison to crt0.o is the file halvm-ghc/rts/xen/entryexit.c, which is how a HaLVM boots and starts your unikernel. The other files in that tree are there to, for example, implement timers, clocks, interrupts, and similarly necessary things.

The Environment, Concluded

In many ways, the HaLVM is simpler than many cross-compilers because -- to date -- the host and target environments are very similar. The advantage of this situation is that it makes a lot of potential problems go away, and we get to re-use a lot of tools for the environment. For example, we steal local versions of Happy, Alex, hsc2hs, and other tools, because we can. The disadvantage of this situation, though, is that it can lead to a false sense of security. Several relatively painful bugs in the HaLVM's history have arisen from forgetting that host and target are different.

In the future, when we support more than one architecture, this split will need to become more formal and rigid. But until then, for better or worse, we can play some games.

Built in a Funny Way

Coming from outside the HaLVM, you may have the impression that most of the work involved in the HaLVM is in the port of GHC to Xen (mentioned in the previous section) or the libraries that comprise the programmer interface to the system (mentioned in the next section). In fact, though, most of the work in the HaLVM is in the build system and tooling -- the former being a disaster and the latter a delicate balance of pieces.

Before you say "a disaster? what right-minded software engineer would leave that much technical debt behind them?", though, best read this section. Re-jiggering the build system so it works better or makes more sense tends to be the first thing new HaLVM engineers get excited about, and yet we still have something I describe as "a disaster." The GHC and HaLVM build systems have broken many a good engineer. Best be wary. (But some brave souls are still fighting the good fight.)

To make this section a little more approachable, I've split it into a few pieces: the structure of the build, the games we play with GHC, and the tools.

The Structure of the Build

When I refer to "the build system", I mean the events that happen when you have appropriately pulled the HaLVM and typed make. The make install machinery isn't that interesting, and the make packages targets just invoke the previous steps in various contorted ways.

So, you want to build a HaLVM.

First, because GHC requires GHC to build, the HaLVM also requires GHC. However, because of some of the games we play with GHC (see the next section for details), we cannot use just any sufficiently modern version of GHC: we need exactly the same version of GHC as the one we are building. In addition, we need to be very sure that the system has all the tools we need, and any libraries they provide to HaLVMs need to be compiled with this same, exact-match compiler.

Early in the HaLVM's existence, Galois needed to deliver source releases of the HaLVM to clients, who wanted to not have to futz with their machines before installing the HaLVM. We also didn't want to have to deal with "bug reports" in which some subtly-different version of GHC caused a problematic mismatch. Thus, the HaLVM has always utilized a build process in which the first step of the build is to download and install the appropriate version of GHC for the local host, along with all the necessary libraries and tools. If you're looking in the Makefile, then these are all the targets that start with PLAT; PLAT stands for "platform", and refers to all the tools and libraries that are intended to run on Linux.

(This mechanism has remained useful as we continued to simply release source tarballs. However, with the advent of binary packages, this may no longer be the best decision. If we release binary tarballs to most people, then we can arrange our build machines appropriately and rely on configure and stern warnings to avoid this build step.)

Once the platform version of GHC is ready to go, we start building the bits and pieces of the HaLVM. This involves building libm (and maybe gmp), injecting a bunch of source code into the GHC tree, and then invoking GHC's build process. Simple.

Building GHC is a bit of a process. We try, as much as possible, to control it through its standard mechanisms (configure, build.mk, etc.), but you'll find that many of the differences between mainline GHC and the version of GHC we use in the HaLVM are small changes to the build system. For example, we introduce the GHC build system to minlibc, the Xen RTS code, and the two HaLVM libraries. At the same time, we tell it that there are some libraries it should skip, and some tools it shouldn't bother building.

Through configure, we tell the build where to find some of the tools it should use, including the platform compiler and tools we just installed. We then use build.mk to adjust some of the compiler and linker flags. We also use the latter to build GHC in just about every "way" imaginable, which is part of why it takes forever to build the HaLVM: we include support for vanilla and profiled builds, along with threaded, debug, and threaded+debug variants. It all takes a while. This is also where your integer library is chosen, and so forth.

We build a deeply corrupted -- see the next section -- Stage 1 compiler. Stage 1 compilers are compilers built by taking the platform's version of GHC and compiling the compiler source code with it. (Stage 2 compilers -- also known as "bootstrapped" compilers -- then use the Stage 1 compiler to compile themselves, so that the compiler itself is built with the latest stuff. Because we're cross-compiling to Xen, building a Stage 2 compiler doesn't make all that much sense. Just ... think about it: our Stage 1 compiler generates Xen code, so a Stage 2 compiler would be a Xen binary that couldn't run on your Linux build machine.)

To a large extent, again, this is captured in the patches we make to GHC, but some of it is visible from the outside. For example, this is why we build the compiler, but then go back and explicitly build the variations of the RTS we require; for some reason, our modified GHC build system doesn't do that for us. Because we have modified the build system and injected our new libraries into the halvm-ghc build process, the build system does automatically take care of our two libraries, though. It also takes care of installing them appropriately, with all the necessary hashing and package-database injection.

Once we've built all this stuff, we're done. The rest of the system is just installing all the pieces into the right places.

The Games We Play with GHC

As mentioned in the previous section, we build a Stage 1 compiler for the HaLVM, because the HaLVM is a cross-compiler. There's just one problem with this, however: Template Haskell requires a Stage 2 compiler. And, because Template Haskell is used in a surprising number of places, we would really like it to work with the HaLVM. Otherwise, for example, HaLVMs couldn't use lenses, and what sort of world would that be?

To allow this to work, we perform some shenanigans, all based on the fact that while host and target are different, they're very close.

Thus, to make things work, we adjust the GHC build system to build the Template Haskell code even in Stage 1. What this does is build a Template Haskell implementation in which the macro compiler targets "Intel/Xen" rather than "Intel/Linux." However, because they both start with "Intel", it works out. Sort of. What it means is that while you probably should never reach into IO in macros as a general rule, you really shouldn't do so when building for the HaLVM, because it's not totally clear what will happen when you do. It also means that the HaLVM libraries -- HaLVMCore and XenDevice -- are available to you in the macro system. Again, this availability has positives and negatives. On the plus side, it means that you can use information about Xen in the expansion of your macros. On the down side, it means that you're technically allowed to call Xen-specific code from your macros. Once again, let me suggest not doing this.
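
To make that caution concrete, here is a small sketch -- not HaLVM code, just ordinary Template Haskell -- of the kind of splice that is risky in this setup: the runIO action executes on the Linux host at compile time, even though the spliced result ends up inside a Xen binary.

{-# LANGUAGE TemplateHaskell #-}
-- Illustration only: a splice that reaches into IO at compile time.
module BuildInfo (buildHost) where

import Data.Maybe          (fromMaybe)
import Language.Haskell.TH (Exp, Q, litE, runIO, stringL)
import System.Environment  (lookupEnv)

-- Runs on the host (Linux) when the macro compiler expands it, even
-- though the resulting string literal is compiled into a Xen unikernel.
buildHost :: Q Exp
buildHost = do
  host <- runIO (fmap (fromMaybe "unknown") (lookupEnv "HOSTNAME"))
  litE (stringL host)

A user would splice this as $(buildHost); the IO happens in the host world while the result lives in the target world, which is exactly the ambiguity the paragraph above warns about.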

At some point, we are hoping that GHC will have a cleaner build system for cross-compilers, and that Template Haskell will be gracefully included in such builds. Until that day, though: shenanigans.

The Tools

Finally, the tools.

As you've probably noticed, when using the HaLVM, we strongly encourage you to use halvm-prefixed variants of all your favorite commands. We use halvm-ghc in our example Makefiles, for example, and point people to halvm-cabal. Doing so is fairly standard for cross-compilers; again, people developing for ARM computers are used to tool names like "arm-linux-gnueabi-gcc". They just hide them in their Makefiles so they never have to look at them. We, instead, hide our even longer and more confusing arguments in our tool scripts.

That's right; all of those tools are scripts, and live in src/scripts. They do a bunch of work hiding some of the extra flags or steps you would need to do if you were to use the raw tools. For example, in halvm-ghc, we add a few flags to tell GHC that we want to create a statically-linked executable using a provided linker script. halvm-cabal adds a bunch of flags telling the underlying cabal where to find tools, and enables split-objs to reduce final executable sizes.

You could use all the binaries raw ... but I wouldn't.

Note that there's some tension over where the magic lives. If you're going to start monkeying with these scripts, be aware that there are interactions between these scripts, the flags included in the various library Cabal files, and some magic placement of libraries by the install process. So if you want to shift the location of a library, for example, consider: when does it really need to be included? Should the flag go in a Cabal file, if it only needs to appear when that library appears? Or perhaps in the RTS's equivalent, package.conf? Or should it appear in a script, because it depends on how and when the tool is run?

With a Bunch of Libraries

OK, now we understand the connection from GHC to the ground, and how we build the whole thing. Final question: What do we add on top, to make the HaLVM the HaLVM? The answer is: two core libraries, which we distribute with the HaLVM, plus a few extra libraries, which we distribute via Hackage.

The Core Libraries

The two core libraries are HaLVMCore and XenDevice. Briefly, the former is intended to provide basic support for running Haskell programs on Xen, while the latter is intended to support interaction with Xen virtual devices.

HaLVMCore thus contains core memory management, event management, and console primitives. (The latter should arguably be in XenDevice ... but it isn't.) Want to allocate a page, share it with another domain, or set its region type or writability? You can do that. You can also create shared event channels with other domains, and we provide access to most of the raw system calls Xen provides.
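
As a rough sketch of what those console primitives look like to a unikernel author, here is a minimal "hello world" HaLVM; the module and function names (Hypervisor.Console, initXenConsole, writeConsole) are written from memory of the HaLVM examples, so treat them as assumptions and check HaLVMCore's documentation for the real interface.

-- Minimal HaLVM sketch, assuming HaLVMCore exposes a console interface
-- roughly like this (names are assumptions, not verified API).
import Hypervisor.Console (initXenConsole, writeConsole)

main :: IO ()
main = do
  con <- initXenConsole                 -- attach to the Xen console
  writeConsole con "Hello from a HaLVM unikernel!\n"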

In addition, we include our Inter-VM Communication (IVC) support in HaLVMCore, as well as XenStore support. The former allows authors to create high-speed, typed channels to other domains (either HaLVMs or domains using our libIVC C library). The latter provides access to the XenStore, Xen's global key-value store, which is typically used for performing handshakes between virtual machines. For example, the IVC layer uses it for the rendezvous portion of an IVC connection, and Domain 0 uses it to transmit information about devices and hardware to user domains.
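
As an illustration of the XenStore's role in handshakes, here is a hedged sketch of a domain advertising itself to a peer; the XenStore names used (Hypervisor.XenStore, initXenStore, xsWrite, xsRead) are assumptions about HaLVMCore's interface rather than verified signatures.

-- Hypothetical sketch: use the XenStore as a rendezvous point. The
-- module and function names are assumptions, not verified API.
import Hypervisor.Console  (initXenConsole, writeConsole)
import Hypervisor.XenStore (initXenStore, xsRead, xsWrite)

main :: IO ()
main = do
  con <- initXenConsole
  xs  <- initXenStore
  xsWrite xs "data/ready" "1"           -- advertise ourselves to a peer
  name <- xsRead xs "name"              -- read this domain's name
  writeConsole con ("Rendezvous ready for " ++ name ++ "\n")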

The XenDevice library is another user of the XenStore: it pulls information about devices from there in order both to configure the operation of the devices and to connect to the appropriate backend.

XenDevice is intended to do exactly what it says: allow HaLVMs to use Xen virtual devices. At its core, it exports the ring buffer protocol that Xen uses for most of its devices. It then directly supports Xen disk, network card, and PCI bus devices. Note that Xen exports (or claims to export) other devices as well, which we do not currently support: frame buffers, file systems, keyboards, TPMs, USB buses, and SCSI buses. If someone wanted to play around with these, we would love to integrate any pull requests.

The Other Libraries

The two core libraries are intended to bring users up to the point where they could reasonably be expected to start doing Haskell things using Haskell tools, without necessarily worrying about the guts of how the HaLVM is implemented. That isn't a very useful level by itself, though. Raw disk block access and raw ethernet frames off the wire aren't what most people are looking for.

To solve that problem, we began creating a series of other layers that others could pull in. These are largely distributed on Hackage, because we'd love it if people could and did think about using them in "normal" environments, as well as the HaLVM.

The better supported of these is the Haskell Network Stack, or HaNS. HaNS layers on top of our raw network driver and provides a full TCP stack, using all the normal primitives you're used to. Note that if you're using other networking libraries in the Haskell ecosystem, there may be some problems integrating HaNS directly. Some libraries (for example, the tls library) provide a mechanism for abstracting over the network stack. Others explicitly require the network library. In those cases, we offer network-hans as a partial solution. network-hans offers exactly the same interface as network, so it can be silently slipped into existing package dependencies. Unfortunately, until Backpack or a similar technology is ready to go, you will still need to manually modify the relevant libraries to use either network or network-hans depending on the platform.
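
As a sketch of why the drop-in replacement works, here is ordinary code written against Network.Socket: nothing in it names a package, so whether the module comes from network (on Linux) or network-hans (on the HaLVM) is decided entirely in the .cabal file. (The assumption here is that the particular calls used are among those network-hans implements.)

-- Ordinary Network.Socket code; the package providing the module is
-- chosen in the .cabal file, not here. Assumes these calls are among
-- the ones network-hans supports.
import Network.Socket

connectOnce :: String -> IO ()
connectOnce host = do
  addr:_ <- getAddrInfo Nothing (Just host) (Just "80")
  sock   <- socket (addrFamily addr) Stream defaultProtocol
  connect sock (addrAddress addr)
  close sock

main :: IO ()
main = connectOnce "example.com"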

The less supported of these -- due to time, not to love -- is the Haskell File System, or HalFS. Much as HaNS takes the raw network card and turns it into a full TCP/IP stack, HalFS is our attempt to turn a raw block device into a file system. Note that it is its own file system, and is not compatible with any other file system. (This was a benefit historically that may now be turning into a critical flaw.) Again, we'd love to work with you if you want to either buff up HalFS or find a way to replace it with a Haskell implementation of some other popular file system. It should support basic file system usage, but, sadly, it has received less attention than HaNS, so some of the more advanced features may not work. One nice thing about HalFS, though, is that the library was designed to work with both the HaLVM and FUSE. The latter means that you can mount HalFS trees in Linux (or, theoretically, on a Mac), play around with them as you'd like, and then distribute them to HaLVMs. Historically, this is how we've used HalFS: as a place where we pre-provision some configuration (or static data) files from Linux, and then use the remaining space for any state.

In Conclusion, A Guide

So, that's the HaLVM. A funky version of GHC, ported to Xen using a thick layer of glue code, a minimal C library, and a borrowed math library. Built in a funny way, with a bunch of libraries on top to support commonly-requested operations.

Now that you understand that, and have the context to understand the pointers, let me provide an overview of the repositories involved in the HaLVM:

  • The main HaLVM repository provides structure to the whole thing, and includes the source code for the core libraries (HALVMCore and XenDevice) as well as all the scripts, configuration files, build infrastructure, and miscellaneous things needed for packaging. Its structure should be fairly self-explanatory: src/scripts holds all the scripts, src/misc includes a bunch of miscellaneous configuration files, src/debian is for making Debian packages, and src/libIVC is for libIVC. This repository is the one you check out when you want to build the HaLVM. If you're going to be doing development on the HaLVM itself, you'll start getting very good at:
git clone -b <branch> git@github.com:GaloisInc/HaLVM
cd HaLVM
git submodule update --init --recursive
autoconf
./configure --prefix=`pwd`/dist
make
make install
  • Our port of GHC, halvm-ghc. Note that we put all of our work in the halvm branch of that repository, because we want to leave master and the other branches untainted for merging reasons. This submodule gets put into <topdir>/halvm-ghc.

  • A link to the Julia project's openlibm. This submodule gets put into <topdir>/src/openlibm.

  • Our port of GHC links to our libc implementation, minlibc, which is placed into <topdir>/halvm-ghc/rts/minlibc. There are some patches in the halvm-ghc/rts Makefile that build it for us, but it may make sense to eventually move this to the "top level", and just provide it on the side like libm.

  • We also use our own branch of the GHC base library, which we call halvm-base. For somewhat annoying reasons, this is dynamically pulled by the HaLVM build system, rather than being linked as a submodule. Looking ahead, future revisions of GHC pull base back into the GHC repository itself, so we may be able to get rid of this nonsense. Thus, depending on where you are in the build, you might see halvm-base in <topdir>/halvm-ghc/libraries/base, or you might not.

That concludes our survey of the repositories that make up the core HaLVM. Since we mentioned them in the libraries section, I will also point out the HaNS, HalFS, and network-hans repositories.

That's it.

If you discover something you want to add or change, go ahead! That's why this is a wiki, and not a static document somewhere. If you have further questions, let me suggest filing them as bug reports, perhaps with the "documentation" or "question" labels. Otherwise, happy hacking!