# Artifact: Meta
## Todo

- Test with Julia 1.5.4 and `nix-shell --pure`
- Implement depot removal (guarded by a parameter?)
- Compute stats with code, not with Excel
- "The Overview"
- Timings
- Update numbers in the paper

Time permits:

- Comments in the code
- Refactor methods > 15 LOC
- Check if a local registry of packages could allow for offline execution
Below is a more involved discussion of our choices. Kept here mostly for historical reasons.
## Reproducibility v2
Below (under "Inner workings") are first thought on how to make results of package analysis more predictable. Today (July 2nd) I had another thought "interesting" thought. Put it simply: Pkg
will hopefully get me exactly the same code if I package Manifest.toml
(and Project.toml
for convinience) in the artifact. Testing now.
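For concreteness, here is roughly what "testing now" amounts to (the path is illustrative; I'm assuming both TOML files sit at the artifact root):

```bash
# Recreate the exact dependency versions recorded in the shipped
# Manifest.toml (Pkg.instantiate fetches precisely the pinned versions).
cd /artifact
julia --project=. -e 'using Pkg; Pkg.instantiate()'
```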
## Inner workings
The artifact involves processing a lot of Julia packages. My desire was to prepare all packages locally so that the reviewer would not even need Internet access. I haven't succeeded in that so far. Here is my explanation.
First, some background. Package management is generally tricky for several reasons:

- Julia's package manager `Pkg` is known to be slow (see this discussion for an example);
- it is also not re-entrant (I lost the link to the relevant GH issue :-(), meaning that running it from multiple processes leads to races unless each process gets a completely isolated "depot" (a place in the file system where Julia stores all its metadata, including package-related things; it can be set via an environment variable);
- and running the whole pipeline sequentially for 1K packages would probably take a week or so.
So far I have been using the separate-depot approach: it allows for parallelism but eats a large amount of space: ~1 GB per depot for one package (before running tests), turning into ~1.5 TB (after tests) for all 1K packages.
Another idea was to not use separate depots, but instead construct one depot sequentially and then run the pipeline in parallel against the constructed depot. One common depot for 10 packages is 2.6 GB; how much would it take for 1K packages?.. I learned there's an obstacle to this approach anyway: it doesn't account for test dependencies (a package's test suite can declare its own dependencies, and there doesn't seem to be an easy way to prepare those ahead of time), so installing them at test time would probably fail again because of the re-entrancy problem.
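A rough sketch of what I had in mind (the script and file names are made up, not the artifact's actual ones):

```bash
# Shared-depot idea: one depot, populated up front.
export JULIA_DEPOT_PATH=$PWD/shared-depot
# 1. Install every package sequentially, avoiding Pkg races.
while read -r pkg; do
  julia -e "using Pkg; Pkg.add(\"$pkg\")"
done < packages.txt
# 2. Analyze in parallel against the pre-populated depot -- this is where
#    test-time dependencies (and hence re-entrancy) break things.
parallel --jobs 8 julia process_package.jl {} < packages.txt
```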
So the separate-depot parallel approach seems the most viable at the moment. If we add on top of it the idea of removing each depot right after processing the corresponding package (surprisingly, I only came up with this idea at the end of last week), it will not even require 1.5 TB of storage from the reviewer (good!).
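Sketched out, the separate-depot approach with immediate cleanup would look something like this (again, script and file names are illustrative):

```bash
# One fully isolated depot per package; delete it as soon as we're done.
process_one() {
  pkg=$1
  export JULIA_DEPOT_PATH=$PWD/depots/$pkg   # isolated depot: no Pkg races
  julia process_package.jl "$pkg"
  rm -rf "$PWD/depots/$pkg"                  # reclaim ~1.5 GB right away
}
export -f process_one
parallel --jobs 8 process_one {} < packages.txt
```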
There may be yet another way, one that could allow us to prepare all packages locally with reasonable space consumption: creating a local Julia registry. A registry is a service (it can run locally) that the package manager talks to when it tries to find a required package; essentially, it implements the mapping from package names to their locations (usually on the web, but my understanding is that it can be local too). I haven't been able to figure out how to set it up yet. Also, I'm afraid this would require computing the transitive closure of our dependencies "manually" (well, by writing custom code that would do that, of course). I deemed this path infeasible under the given time constraints.
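For the record, the direction I was probing looked roughly like this (the path is illustrative, and I never got this working):

```bash
# Point Pkg at a registry checked out on the local disk instead of the
# default General registry, so resolution happens offline.
julia -e 'using Pkg; Pkg.Registry.add(Pkg.RegistrySpec(path="/artifact/LocalRegistry"))'
```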
## Packaging
I've been using VirtualBox-based solutions for many years, including for the artifact of our subtyping paper at OOPSLA '18. Over the years I came to the conclusion that it takes too much space and has a clunky interface between host and guest. These days cool kids use Docker, and I think I can use it too. This alone will shrink the space requirement by an order of magnitude: from several GB to several hundred MB.
But I think even several hundred MB are not strictly necessary. I was thinking about using Nix (the purely functional package manager) to bring in the dependencies that I need (of which there are just three, I think: Julia 1.5.4, GNU parallel, and timeout from GNU coreutils): that's one more command, which I'm sure is reproducible and future-proof. This approach can benefit from Docker in the following way: if the reviewer doesn't have Nix and doesn't want to install it (likely), they can pull in a Docker image with Nix via a single shell command. That would be the publicly available Nix image (also future-proof).
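To make the two variants concrete, something along these lines should work (the nixpkgs attribute names are my guess, untested):

```bash
# With Nix installed locally: a pure shell with just the three dependencies.
nix-shell --pure -p julia parallel coreutils

# Without Nix: the same shell from inside the public Nix image
# (the /artifact mount point is illustrative).
docker run -it -v "$PWD":/artifact -w /artifact nixos/nix \
  nix-shell --pure -p julia parallel coreutils
```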
The alternative to Nix would be to actually submit a Docker image (~400 MB) with everything included. From the user's point of view, entering such an image is not much different from pulling the Nix image (the previous solution); the difference is that we have to copy those 400 MB back and forth. I don't think there are any other significant differences. Yulia prefers this option because she thinks it means fewer hops to the actual artifact, but she has never used Nix, so I don't think that's fair.
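For comparison, the full-image route would look roughly like this (image and file names are made up for illustration):

```bash
# Load the image shipped with the artifact (~400 MB), then enter it.
docker load -i julia-type-stability-artifact.tar
docker run -it julia-type-stability-artifact
```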
If you have any preference between the two approaches (Docker + Nix + light artifact, or a full Docker image), let me know. I'm fine with either. Just please, no virtual machines: I'm confident they a) are bloated; b) provide more isolation between guest and host than needed, at the expense of a clunky UX between the two; and c) are way less future-proof than people think (as I said, I've been using VirtualBox long enough to learn that the hard way).