Getting Started - sharpc42/CGM-theory-sims GitHub Wiki

Let's go over how to set up both our code and the accompanying Athena++ framework.

Download and Setup

Athena++ is required to run the CGM Virtual Environment. To get Athena++, you’ll need to head over to the download page and follow its instructions. If you’re familiar with Git and repositories, then you’ll be right at home.

Now that you have that, you’ll want to download the CGM Virtual Environment itself. Copy the repo's web address, start up the command line, go to the directory of your choice on your computer (using commands like cd ...), then type git clone https://github.com/sharpc42/CGM-theory-sims, which will begin the download. To grab any updates in the future, just type git pull at any time you wish.
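Put together, a typical first-time setup might look something like the following sketch (the destination directory is just an example; use whatever location you prefer):

    cd ~/sims                                               # example location; any directory works
    git clone https://github.com/sharpc42/CGM-theory-sims   # downloads the CVE repository
    cd CGM-theory-sims
    git pull                                                # later on, grabs any updates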

Again, if you know Git, you’ll be right at home.

The important part: To get the CVE to work according to design, it needs access to the bin folder of the Athena++ installation (typically a folder called athena-public-version). So cut-and-paste, drag-and-drop, or move with mv ... to get that folder into the top CGM-theory-sims folder, alongside the main CVE file and the custom athinput.cgm Athena++ input parameter file.
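For reference, here is a sketch of the intended layout once everything is in place (folder names per the description above; cve.sh is the main CVE script used later on this page):

    CGM-theory-sims/
        cve.sh                    # main CVE shell script
        athinput.cgm              # custom Athena++ input parameter file
        athena-public-version/    # your Athena++ installation, moved in here
            bin/                  # Athena++ builds and runs from here
            src/pgen/             # problem-generator source files live here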

OS-specific Instructions

What's Involved

PC vs Supercomputer

Running Athena++ and the CGM Virtual Environment is a little different on a home computer versus a supercomputer, for instance on the UW Seattle supercomputer Hyak. Therefore, these instructions will cover setup for both for completeness.

For the sake of specificity, we will cover supercomputer setup on Hyak here. If you’re using a different supercomputer, the instructions will likely differ at least somewhat, since we can’t know what your system is or what it requires. In that case, consult its documentation for how code such as Athena++ or our output-processing software would run on it.

That said, there are some significant parallels (including parallel computing) which will be covered here, so not all hope is lost. The instructions were written so that you should pick whichever matches your setup, read those instructions through to the end, and then continue on to output file processing. We did it this way because the Hyak setup in particular runs off on its own worldline; roping each approach off separately seemed the most straightforward way of handling this documentation.

General Stuff

You’ve downloaded both the CGM Virtual Environment and Athena++ to your platform, and due to the wonders of Git have already implicitly set them up. Now, you’ll need to take the following steps.

  • Copy and paste the file athinput.cgm from your CGM Virtual Environment download directory into the bin folder of where you installed Athena++ (whose parent folder should be athena-public-version).

  • Similarly, copy and paste the file cgm.cpp from your CGM Virtual Environment download directory into the pgen folder under the src folder of where you installed Athena++.

  • For output file processing, you’ll simply need to decide where in your file system you want it to take place, for organization’s sake, then copy and paste the [relevant Python files] to that location. Whenever you run a simulation, you’ll want to copy and paste the output files, of the type you specify in the parameter input text file (typically either .vtk or .athdf), from the bin folder in your Athena++ directory to that file location which you’ve designated for processing.
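Putting the steps above together, a minimal sketch of the copy commands might look like this (the processing directory is just a placeholder for wherever you decide to do output processing):

    # from inside the top-level CGM-theory-sims folder
    cp athinput.cgm athena-public-version/bin/              # input parameter file into Athena++'s bin
    cp cgm.cpp athena-public-version/src/pgen/              # problem file into Athena++'s pgen folder
    mkdir -p ~/cgm-processing                               # example processing location
    # plus the relevant Python processing files (see above), and after each run:
    cp athena-public-version/bin/*.vtk ~/cgm-processing/    # copy outputs (.vtk or .athdf) for processing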

Running From a PC

This is the simple part. If you’re running the CGM Virtual Environment and Athena++ from your home computer, then chances are you’re running in serial: for instance, perhaps you are perfecting some modifications or trial setups in a quick, iterative fashion. This form of computation means setup is straightforward. Let’s begin.

Once you’ve gotten the relevant files in their respective folders as was previously detailed, you just need to hop into your OS command line and start inputting.

The CGM Virtual Environment was designed specifically to facilitate easy execution of Athena++ with respect to given CGM meshes. So all you need to do is type in a command in the form of

sh cve.sh [qty] [qmn] [qmx]

Let's unpack this.

First, this is a shell script. That means the program is basically a bunch of terminal commands you could have typed in and entered yourself if you weren't so lazy. It's a fancy way of pre-typing terminal commands, all of which then run at the snap of the digital fingers. Thank you, automation.

From there, we have three optional arguments, shown above in brackets as placeholders for values the internal code recognizes. qty says which quantity you want to track for visualization and analysis, while qmn and qmx specify the minimum and maximum values of that quantity you're interested in.

A typical run of the CGM, for example, might be

sh cve.sh rho 0 2473

This says to run the CVE with attention to the density rho (which tends to correspond most directly to what one would see if one could actually look at this astrophysical environment), with minimum and maximum values of 0 and 2473 respectively. These are code units, which we'll go over in the Overview section.

That's it. Really.

Input File

All of this is driven by the parameter input file athinput.cgm provided with the download. It is the workhorse of the CVE. But what would you want to do with it?

First, there's the stuff we need to make sure that Athena++ configures correctly for the problem you desire.

Want magnetic fields? Head to the <problem> subsection and edit mag_fields to be 1 (True) instead of 0 (False), which is the default. This makes it a magnetohydrodynamic simulation run instead of a plain hydrodynamic one. Other useful options are in there, and we'll cover them in the Parameter Input section.
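For example, the relevant lines in athinput.cgm would look something like this (only mag_fields is shown; the other <problem> parameters are covered in the Parameter Input section below):

    <problem>
    mag_fields = 1    # 1 (True) = MHD run; 0 (False, the default) = plain hydro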

If something goes wrong, if an error message is given, if an exception is thrown, please copy and paste it into a message to us via the info provided in the Contact section. If you learn something, don't keep it to yourself!

The CVE should provide at least images if not an animation of the simulation output depending on the preferences you set. These will again be covered in the all-important Parameter Input section as well as the Output File Processing section.

Running From Hyak

In essence, high-performance computing (HPC) proceeds similarly to the process described above. But there are both more setup options and different file systems to consider.

Before we get lost in the depth of details, let's be clear. At the end of the day, a supercomputer simulation in the CVE via Athena++ proceeds via the same process described above.

Conceptually, though, this is considerably more complicated. While you can run in serial on a single node on a supercomputer (in which case, see the previous instructions), the whole fun of it is running in parallel across multiple nodes, as your system provider allows. This is typically for the really ridiculous setups: super high resolutions, detailed meshes, that kind of thing. Let’s begin.

Hyak-specific Instructions

This is a quick overview of the information available at the official Hyak wiki. Always defer to the official wiki if there are any conflicts, questions, or other weirdness between it and our summary here. We’re just trying to cover the quick and dirty details particular to using the CGM Virtual Environment.

[Login instructions and such]

General Instructions

Once you’ve gotten the relevant files in their respective folders, you have a bit more preliminary legwork to do than with serial processing. Outputs for parallel computations are best handled for us with the HDF5 format. See the Output File Processing section for install instructions. [where there will be install instructions for this, including what the below-mentioned “your hdf5 location” refers to.]

This proceeds similarly to the personal PC process. You just need to type in

sh cve.sh [qty] [qmn] [qmx]

with appropriate inputs for the desired quantity of analysis and its minimum and maximum: for instance, rho 0 2473 would be appropriate in our code units.

There are other special parameter inputs you need to consider in the athinput.cgm file. Let's go over them here; for redundancy, we'll go over them again in the special section dedicated to the parameter input file.

Here's what we assume you want when running the CVE in an HPC context like this.

First, we assume you want parallel processing if you set parallel to 1 (True) in the parameter input file; the default is 0 (False).

Second, we need to handle file output in the HDF5 format, which in Athena++ terms means the .athdf file extension. There are complexities associated with this; they are handled as long as you make sure the file_type parameter under the <output2> subsection says hdf5, otherwise Athena++ will not be happy. (You can do parallel processing with .vtk files, after all, but this is highly unrecommended for the CVE.)

A file path will also need to be specified via adhf_path under the <extra> subsection in the parameter input file.

There's also the question of the C++ compiler. We assume icc for the Hyak computation environment. If that's not okay, say so in the CVE parameter input file via the cxx parameter within the <extra> subsection.
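Putting those together, a sketch of the relevant parameter-file entries for an HPC run might look like the following (the path is a placeholder, and the exact placement of parallel is an assumption; check your athinput.cgm for where it actually lives):

    <output2>
    file_type = hdf5            # required for parallel output processing in the CVE

    <extra>
    parallel  = 1               # 1 (True) requests parallel processing; default 0
    adhf_path = /path/to/hdf5   # "your hdf5 location" (see Output File Processing)
    cxx       = g++             # compiler override if icc isn't what you want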

That should do it, assuming HDF5 is set up right. (See the Output File Processing section.) Again, if there are errors, please contact us via the info in the Contact section.

The Parameter Input File

This is our Athena++ file for you to edit in order to set up and manipulate different varieties of environments. It should be the primary file you use directly, without touching source code. The file consists of all the parameters used by the C++ problem code.

Comment and Job

The comment section mainly gives you what you need for configuration, based on the key given in the following job section. This key should be the name of the C++ problem file.
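In a standard Athena++ input file this key is the problem_id entry, so for our problem file cgm.cpp it would plausibly read:

    <job>
    problem_id = cgm    # matches the C++ problem file cgm.cpp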

Output

This concerns history data first, rendering data second.

The history data can be useful, but we're usually interested in the rendering data. This typically comes as .vtk or .athdf files (and that is all we support). VTK is simpler and useful for single-core runs on personal computers and such, while HDF5 is more complicated but very useful for multicore computations.

Regardless, each output section, aside from specifying file type (avoid touching the .hst parameter in the output1 subsection), mainly requires you to specify the interval, in code time units, between file outputs. This is analogous to the time between frames in film and video games. The larger your dt input (say, greater than one), the faster a smooth replay of your output renders will appear to march through the simulation. Conversely, smaller dt inputs give finer-grained renders which progress through the simulation much more slowly.

We've set the render output for every 10 megayears, which is ten times our code unit for time.
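As a concrete sketch (assuming the rendering output lives in <output2> as described above; values illustrative):

    <output2>
    file_type = vtk     # or hdf5 for parallel runs (see Running From Hyak)
    dt        = 10.0    # output every 10 code time units = 10 Myr here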

Time

This section is very practically useful, albeit mainly for one input. The user will mostly be interested in the tlim variable, which specifies, in code time units, how long the simulation is to run.

The rest of it should probably be left alone unless the user specifically and deliberately knows better.

But I Insist...

There's the Courant-Friedrichs-Lewy number, or CFL limit, which sets a tradeoff between computational stability and total physical accuracy; lower is more stable, and anything more than 1 is not viable.

The computational cycle cutoff is set by nlim, but we keep this at -1 to remove it as a limit on the simulation.

The integrator is set to vl2, the second-order van Leer integrator, which is better for the steep density gradients in multiphase, shock-prone environments like ours (McCourt et al., 2016).

The final parameters also affect stability vs. accuracy or output for computational cycles.

We generally do not recommend changing any of these.
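For orientation, a sketch of what the <time> block might contain (values are illustrative; defer to the shipped athinput.cgm, and note that cfl_number is the standard Athena++ name for the CFL limit mentioned above):

    <time>
    cfl_number = 0.3     # CFL limit; lower is more stable, must stay below 1
    nlim       = -1      # cycle cutoff disabled
    tlim       = 100.0   # total run time in code time units (illustrative)
    integrator = vl2     # second-order van Leer integrator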

Mesh

As the name implies, this affects how the virtual space is laid out for the physics, divvied up into sections along the three Cartesian directions: x, y, and z, or x1, x2, and x3 in the parameter file's indexed notation.

For each dimension, the parameters involved are these, using the x-direction as an example:

The nx1 variable sets the number of cells in that (x) direction. This is directly akin to setting the physical as well as visual resolution of the simulation, so it's pretty important, affecting both the meaningfulness of the output and the practical computation time.

The x1min and x1max variables define, in physical code units, the inner and outer boundaries (left and right horizontally, bottom and top vertically).

Finally, boundary conditions. The three options available are "reflecting," "periodic," and "outflow," and they mostly do what they suggest. For example, outflow can be used when streams are expected to carry gas out beyond the simulated system, and reflecting can be used in cases of symmetry to simplify computation in larger environments, e.g., in our case either side of a galactic disk.

The one catch is that periodic, which lets you pretend this is one part of a larger system extending beyond the mesh, must apply to both sides of the relevant dimension, not just one. In other words, for a given dimension you may mix "reflecting" and "outflow," but "periodic" must be paired with "periodic" on both sides. This is the case in our default parameter file.
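As a sketch, one dimension's entries in the <mesh> block might look like this (cell count and extents are illustrative; ix1_bc and ox1_bc are the standard Athena++ names for the inner and outer boundary conditions):

    <mesh>
    nx1    = 256         # number of cells in x (resolution)
    x1min  = -1.0        # inner (left) boundary in code units
    x1max  = 1.0         # outer (right) boundary in code units
    ix1_bc = periodic    # inner boundary condition
    ox1_bc = periodic    # outer boundary condition; periodic must be paired with periodic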

Hydro

This short section sets the thermal properties of the mesh's physics.

Gamma is the adiabatic index, reflecting the gas's degrees of freedom via the equipartition theorem; we set it to 5/2 = 2.5 for the generic CGM environment.

Meanwhile, grav_acc2 is the parameter specifying the gravitational acceleration, which in Athena++ is generally assumed to be constant (a linear potential gradient) like what we see near the surface of the Earth; in the CGM environment this is a fair approximation. The default is calculated for a generic Milky Way-like spiral galaxy using centripetal acceleration in force equilibrium with typical physical values for velocity and radius.

Problem

Finally, most of what the CGM Virtual Environment is envisioned to do is accomplished via the problem section of the parameter input file.

There is the ambient density (damb) which sets the baseline for the matter distribution profile in the simulation's initial mesh.

When magnetic fields are enabled, their strength is determined by the ratio of their pressure to that of the thermal gas in the ambient environment, or the pbrat parameter in the file. The angle of the magnetic field vectors relative to the horizontal axis is set by angle.

Astrophysical cooling is a significant component of the CVE, and is defined in terms of the ratio of cooling time to gravitational free-fall dynamical time, trat, with smaller values corresponding to more powerful and effective cooling mechanisms.

The atmospheric scale height, H, corresponds to the height (here, above the galactic disk) at which the quantity in question -- density -- falls off by a single e-folding.

But most relevantly we have the environment parameter envr which specifies which type of circumgalactic environment is to be computed. This is a continuously evolving set of possibilities, and all suggestions for further relevant simulation environments are welcome for consideration.

For now, we support a generic CGM environment (the ambient medium plus perturbations of McCourt et al., 2012), selected with envr set to 1, and with 2 a generic spherical cloud-crushing setup which we hope will parallel the cloudlets scenarios. [Perhaps both options even will be available?] If the cloud-crushing problem is chosen for envr, then the radius of the cloud is set with the radius parameter.
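Pulling the above together, a hypothetical <problem> block might look like the following (parameter names as referred to above, values purely illustrative placeholders; check the shipped athinput.cgm for the actual names and defaults):

    <problem>
    damb   = 1.0     # ambient density baseline
    pbrat  = 0.01    # magnetic-to-thermal pressure ratio (used when mag_fields = 1)
    angle  = 0.0     # field angle relative to the horizontal axis, if fields are on
    trat   = 1.0     # ratio of cooling time to free-fall time; smaller = stronger cooling
    H      = 1.0     # atmospheric scale height above the disk
    envr   = 1       # 1 = generic CGM (McCourt et al. 2012), 2 = cloud crushing
    radius = 0.1     # cloud radius, only used when envr = 2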

Output File Processing

There are two baseline file formats (with corresponding types) which we use in the CGM Virtual Environment depending on exact need and context: VTK and HDF5. We will cover both here in turn.

VTK

One of the two baseline file types we use is “.vtk”. It is often the easier to use, as effectively no setup is required in serial, but as a consequence it is less robust, and in parallel the setup difficulty increases considerably. It is therefore potentially quite useful on a home PC in particular, as well as for some trial setups on a supercomputer like Hyak using single nodes; but if you want to do any kind of scaled parallel processing for larger simulations it can be very limited, and so is not recommended. Opt for HDF5 in that case.

Serial Processing

Parallel Processing

If you insist on using VTK with parallel anyway, that’s technically fine, but with conditions. Because parallel processing breaks the mesh up into different segments, you’ll get as output a little VTK file for each segment at each output time step. You’ll therefore need to use Athena++’s join-file Python code to literally patch them together into a single VTK for processing at each output time step, though the performance isn’t great. The practical limit for this approach is 1000 files: if your simulation runs for more output time steps than that, you start running into errors and you’ll likely lose the renderings and analysis for any output time steps beyond 1000. That’s simply where the code is at this time.

[Instructions for join-VTK use]

Rendering and Plotting with the CGM Virtual Environment

By design, rendering and plotting is handled [nearly?] identically as with the HDF5 case.

HDF5

The other of the two baseline file types we use is the HDF5 file format. It is much more robust for processing parallel computational outputs, and is therefore very friendly for large simulations on supercomputers like Hyak. The fairly major catch is that there is extensive setup, which we will cover in full, and so the difficulty increases quite a bit. However, it is still easier to use alongside parallel computations than VTK, so in those contexts the difficulty is very much worth it.

Right off the bat, the weirdness starts with the fact that there are multiple file types. While HDF5 is the format, the base file type is actually “.athdf” and its companion is “.xdmf”. Athena++ will output both under otherwise similar file names anytime it runs while configured for HDF5. Most if not all of the Python-based processing of HDF5 files in this project will just use the “.athdf” files, but external rendering programs that you may wish to use, such as VisIt or yt, will use the “.xdmf” files.

The HDF5 format means that, once setup is complete and the simulation using it finishes, both serial and parallel processing work the same. You simply take the output files and feed them into your rendering and/or plotting program of choice. If that is our supplied code, see below for detailed instructions.

Serial Processing

Parallel Processing

Rendering and Plotting with the CGM Virtual Environment

By design, rendering and plotting is handled [nearly?] identically as with the VTK case.

REFERENCES

McCourt, Michael, et al. "Thermal Instability in Gravitationally-Stratified Plasmas: Implications for Multi-Phase Structure in Clusters and Galaxy Halos." 2012.

McCourt, Michael, et al. "A Characteristic Scale for Cold Gas." 2016.
