managing and accessing our cloud computers - AstrobioMike/JPL-HBCU-2020 GitHub Wiki

Overview

We are using computational resources provided by an organization known as XSEDE and managed through what's known as Jetstream.

After you created an account here at XSEDE and sent me your username, I added you to our allocation, which was awarded for this internship program (the documents for that proposal are stored here in case they are useful as a future example).

Here we will cover how to start up a computer (called an "instance"), how to access it, and how to manage our resources.

NOTE
Using this interface is definitely a little strange at first, so feel free to message me (Mike) anytime on Slack if you have any questions or want help with anything 🙂




Signing into Jetstream

We manage our resources through Jetstream. So first we need to go there, and sign in with our XSEDE user information by using the "Login with XSEDE" icon at the top-right:



Which brings us to:



This option should be on "XSEDE". After clicking continue, we can enter our username and password to sign in to Jetstream, and our home screen should look something like this:




Creating a "Project"

Starting up and handling our computers happens in the "Projects" tab at the top. After clicking that tab, you will see a screen like this (though with no projects at first):



Click on "CREATE NEW PROJECT", name it whatever you'd like, and optionally add a description:



Once that is done, click the project to enter it:




Starting up a computer ("instance")

Now we are ready to launch a computer ("instance")! We first want to click on "NEW", then "Instance":



Now we need to select an "image" to use for our "instance". This is just the type of computer we want to start (which operating system it runs, plus any programs and configurations we want already set up). We have one prepared for us called "JPL-Summer-2020". To find it, in the pop-up screen, we need to select the "Show All" tab at the right:



Then it will either be at the top already, or we can find it by beginning to type the name "JPL-Summer-2020":



After selecting it, we are brought to the following screen, with a few options we want to check:

  1. Name the instance something more specific if you'd like
  2. Make sure the Allocation Source is set to TG-DEB200016
  3. The Provider can be either "Jetstream - TACC" or "Jetstream - Indiana University". If you get an error when trying to start the instance, switching to the other provider can sometimes help (just be sure to pay attention to which provider your instance is on when creating a Volume, as discussed below)
  4. For Instance Size, if we are starting an instance that will be running one of our 3 primary tools, it needs to be "m1.xxlarge (CPU: 44, Mem: 120 GB, Disk: 60 GB)" – which is different from the picture below. If starting an instance to do some other work, it is likely a smaller one of "m1.medium (CPU:6, MEM: 16 GB, Disk: 60 GB)" will be sufficient. Feel free to message Mike on Slack if unsure 🙂
  5. The image version below says 1.0 in the picture, but use whatever it starts on when going through the process. It's not a problem if it's e.g. 1.1 or higher.

So at the end, things should look something like this:

NOTE
As noted just above, for Instance Size, if we are starting an instance that will be running one of our 3 primary tools, it needs to be "m1.xxlarge (CPU: 44, Mem: 120 GB, Disk: 60 GB)" – which is different from the picture below.



Now, when we click "LAUNCH INSTANCE", the window will change and say "LAUNCHING":



After a minute or so, it will bring us to a refreshed project page that shows the stages occurring as that instance is built and deployed, e.g.:



This typically takes 5-10 minutes. When it is done, its status will say "Active" with a green light:




Accessing our instances

We have at least 2 ways we can access our new computing environment. One way is through a web-browser, another is through a Unix-like terminal. Both need the unique IP address that has been assigned to our instance, so we should copy that to our clipboard:



The examples below will use this one, but you should use your IP, of course 🙂

Accessing our instance through a web-browser

One way we can access our computers is through a web-browser. The address is our IP followed by :8000/lab, e.g., pasting my IP into my browser and adding that:
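As a minimal sketch of how that address is put together (the IP below is a placeholder from a range reserved for documentation, and `INSTANCE_IP` and `LAB_ADDRESS` are just illustrative names):

```shell
# Placeholder IP from a range reserved for documentation; use your instance's IP
INSTANCE_IP="203.0.113.10"

# The browser address is the IP followed by :8000/lab
LAB_ADDRESS="${INSTANCE_IP}:8000/lab"
echo "${LAB_ADDRESS}"
```

Pasting the printed address into the browser's address bar is the same as typing the IP and adding `:8000/lab` by hand.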



Hitting enter will bring us to a password entry box:



We'll share the password when we're together, or you can write me on Slack for it. After entering the password, we'll have a screen like this:



And we can access our Unix-like command line environment by clicking the "Terminal" icon on the bottom:



Accessing our instance through ssh

The other way we can easily access our instance is through ssh, if we have a Unix-like environment set up on our local computers. If you don't, and would like help with this, definitely reach out on Slack 🙂

To connect through our Unix-like terminal, we need to use the command ssh with this formula <username>@<IP>, replacing the relevant info. So for me with the example IP we are using above, I would run the following:
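As a minimal sketch of that pattern (the username and IP below are placeholders, not real values for this project, and the variable names are only for illustration):

```shell
# Placeholder values; swap in your own username and your instance's IP
JETSTREAM_USER="your_username"
INSTANCE_IP="203.0.113.10"

# Build the command following the <username>@<IP> formula
SSH_CMD="ssh ${JETSTREAM_USER}@${INSTANCE_IP}"

# Print the command rather than running it, since the placeholders won't connect
echo "${SSH_CMD}"
```

With real values filled in, running the printed command in a terminal starts the connection.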

The first time connecting to this IP, we might get asked if we want to continue like this:



Typing "yes" and hitting enter will bring us to the password entry:



Entering the password (which we'll share when we're together, or write me on Slack for it) finishes connecting us:



And we're ready to rock 🙂


Adding additional storage

We can attach a volume, a virtual external hard drive, to our instance to give us more storage. A volume can also be detached from one instance and attached to another if we want. Since we are going to be working with large databases, it's a good idea to create a volume and attach it to our new instance. A volume can only be attached to an instance that is on the same "Provider", though, so we need to note which one our instance is on.

From our project page, we can see on the right which provider we are on (Jetstream - TACC in the image below), and we now want to click on "NEW" and then "Volume":



In the creation window, we want to give it a name, set its size (250 GB should be good for the work we are going to do), and then choose the provider our active instance is on:



Then we can click "CREATE VOLUME", and on our project page it should say Status Unattached with a green light in a few seconds:



Now we just need to attach it to our instance. Clicking on the name of the volume will bring us to this page:



And clicking attach on the right side should bring up a menu that lets us select our instance (so long as the instance is running and on the same provider as the volume):



Now clicking "ATTACH VOLUME TO INSTANCE" will bring us back to the volume, which after a few seconds will say it is attached to our instance:



We can see our new volume in the root directory of our instance if we run ls /. It will be added as something like vol_a, vol_b, or vol_c, e.g. on my instance it was attached as vol_b:
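The check above can be sketched as a small helper that only prints the volume-style directories. The function name `list_volumes` is made up for this example, and it assumes volumes show up as directories named `vol_*` as described:

```shell
# List volume-style mount points (vol_a, vol_b, ...) under a given root.
# The root defaults to "/", matching what `ls /` shows on the instance.
list_volumes() {
    root="${1:-/}"
    for dir in "${root%/}"/vol_*; do
        # Skip the unexpanded pattern when nothing matches
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}

# Prints nothing if no volume is attached yet
list_volumes
```

Plain `ls /` works just as well; the helper only filters out everything that isn't a volume.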



Another way we can check is with the df command, which gives us information about storage on our system:



And we can see in that list that the largest one has a size of 246 GB and is mounted at /vol_b.
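A minimal sketch of that df check, using the `-h` flag for human-readable sizes. `MOUNT_POINT` is an illustrative name, and it is set to `/` only so the example runs anywhere; on the instance it would be the volume's path (e.g. /vol_b):

```shell
# Show disk usage for every mounted filesystem, in human-readable units
df -h

# Narrow the report to a single mount point.
# "/" is a stand-in here; on the instance, use the volume's path instead.
MOUNT_POINT="/"
df -h "$MOUNT_POINT"
```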

We can now put things in this location, like our reference databases, and then if we need or want to work on a different instance, we can detach the volume from this one and move it to another (so long as it's on the same provider).

There is some more information on volumes in Jetstream available here.


A note on resources and "shelving" our instances

We have been awarded a limited amount of resources for this project. On Jetstream, if an instance's status is "Active":



That means it is using up resources. And the larger the instance (number of CPUs/RAM), the more resources it uses up per hour. This is of course fine and expected while we are doing work (actively or if we are leaving a process running overnight or while we're away). But if we know we aren't going to be using the instance for maybe a day or longer, and we aren't running any processes, it is a good idea to "shelve" it to save on our resources. This will preserve everything about the instance (except the IP), but just not use any resources until we want to reactivate it. At that point, if we get a new IP, we just use the new one to access the same instance as before.
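To get a rough feel for why instance size matters, here is a small sketch of the arithmetic. It assumes usage is charged at 1 service unit (SU) per CPU per hour, which is an assumption for illustration and not something stated on this page; the function name `su_estimate` is also made up for this example:

```shell
# Estimate usage for an instance left in the "Active" state.
# ASSUMPTION: 1 service unit (SU) per CPU per hour; check the allocation
# details for the actual rate.
su_estimate() {
    cpus="$1"
    hours="$2"
    echo $(( cpus * hours ))
}

XXLARGE_DAY="$(su_estimate 44 24)"   # m1.xxlarge (44 CPUs) for one day
MEDIUM_DAY="$(su_estimate 6 24)"     # m1.medium (6 CPUs) for one day

echo "m1.xxlarge for 24 h: ${XXLARGE_DAY} SUs"
echo "m1.medium for 24 h: ${MEDIUM_DAY} SUs"
```

Under that assumption, an idle m1.xxlarge left active for a day costs about seven times what an m1.medium would, which is why shelving the larger instances matters most.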

To shelve an instance, we want to click on its name to go to its page:



Click "Shelve" on the right side:



And then click "YES, SHELVE THIS INSTANCE", and the activity will switch to shelving, and after 5 or so minutes to "Shelved_offloaded":



In this state it's not using any resources.

To bring it back online, we just need to click "Unshelve" on the right:





After clicking "YES, UNSHELVE THIS INSTANCE", our status changes to "Unshelving":



And after about 5 minutes or so our computer is ready to go again. We just need to use the new IP address to access it the same way as above.

Note on shelving
Please try to be conscientious about our resource usage. We may need to launch some larger instances as the project moves on, and it will help ensure we have all the resources we need for the duration of the project if we do our best to make sure we aren't leaving instances in an active state when they aren't being utilized 🙂
