Home - LLNL-Collaboration/uiuc2015 GitHub Wiki

Goal

In this project we will explore technologies and prototype solutions that allow users to discover, interact with, and manipulate data generated by simulations running on a secure High Performance Computing (HPC) infrastructure. Solving this problem involves creating the following capabilities:

A library that allows a simulation code to publish data to the HPC infrastructure in a secure way
A brokering service built into Lorenz (LLNL's web-based HPC dashboard) that enables users to choose amongst data sources
Front-end services in Lorenz for visualizing the list of all running data-servicing applications and the data streams served by each of them

The overall goal is to enable two machines to communicate over a connection restricted to a particular user.

The core requirement here is that we must do this securely! Accordingly, in our suggested project ideas below, we advocate starting with prototypes that lack security in order to build intuition for how the three pieces above should connect, then analyzing their weaknesses and working toward securing the infrastructure.
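As a starting point for such an insecure prototype, the broker can be modeled as an in-memory registry: a publishing application registers where it can be reached, and a front end lists what is available. This is a minimal sketch; all names (Registration, InsecureBroker, the example apps and hosts) are hypothetical and nothing here reflects an existing Lorenz or Conduit API. By design there is no authentication, so any caller can register or list anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Registration:
    """What a publishing application tells the broker about itself (hypothetical)."""
    app_name: str
    host: str
    port: int
    owner: str  # user who launched the simulation (self-reported, unverified)


@dataclass
class InsecureBroker:
    """In-memory broker: no authentication, no encryption, no access control.

    Anyone can register anything and anyone can list everything; this only
    shows how publisher, broker, and front end fit together.
    """
    _apps: Dict[str, Registration] = field(default_factory=dict)

    def register(self, reg: Registration) -> str:
        # Key each data source by host:port, so re-registering replaces the entry.
        key = f"{reg.host}:{reg.port}"
        self._apps[key] = reg
        return key

    def unregister(self, key: str) -> None:
        self._apps.pop(key, None)

    def list_sources(self) -> List[Registration]:
        # A front end would render this list so the user can pick a data source.
        return sorted(self._apps.values(), key=lambda r: (r.app_name, r.host, r.port))


if __name__ == "__main__":
    broker = InsecureBroker()
    broker.register(Registration("wave-demo", "node42", 9000, "alice"))
    broker.register(Registration("heat-solver", "node17", 9001, "bob"))
    for reg in broker.list_sources():
        print(reg.app_name, reg.host, reg.port, reg.owner)
```

Securing this prototype later means answering who may call register and list_sources, and how the broker knows the owner field is true.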

Approach

It will be useful for the UIUC team members to develop on their laptops or shared system(s) so that they can freely probe security vulnerabilities and try out solutions without being limited by Livermore Computing requirements. As we build prototypes, we can begin to deploy them to LLNL resources and try them at scale and in realistic environments.

The project can be broken down into three interconnected components. Below we sketch some initial work on them for the first 30-90 days of the project:

Conduit: a library on which we will build an initial application that generates data the front end can interact with
OpenLorenz: a web framework for managing and interacting with HPC resources. We eventually want it to broker connections between apps and users by supporting registration of simulations that want to publish data, along with mechanisms for client apps to discover and connect to them
Front End: one or more front ends of your choosing. We'll start with simple web pages, then move to Lorenz portlets. We can begin by connecting directly to our Conduit app, but eventually we will discover and connect to it via Lorenz
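For the first step of connecting a front end directly to a Conduit app, a minimal sketch using only Python's standard library could serve a simulation's data tree as JSON over plain HTTP. The nested dict stands in for a Conduit node (Conduit stores hierarchical data, but this code does not use Conduit's API), and SIM_DATA, DataHandler, and start_server are hypothetical names. It is deliberately unauthenticated and unencrypted.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the data a Conduit-based simulation might expose:
# a nested dict mimics Conduit's hierarchical tree of named values.
SIM_DATA = {
    "state": {"cycle": 100, "time": 0.5},
    "fields": {"temperature": [300.0, 301.5, 302.25]},
}


class DataHandler(BaseHTTPRequestHandler):
    """Serves SIM_DATA, or a subtree of it, as JSON. No authentication."""

    def do_GET(self):
        # Map the URL path onto the tree, e.g. /state/cycle -> SIM_DATA["state"]["cycle"].
        path = self.path.split("?", 1)[0]
        node = SIM_DATA
        for part in filter(None, path.split("/")):
            if not isinstance(node, dict) or part not in node:
                self.send_error(404, "no such path")
                return
            node = node[part]
        body = json.dumps(node).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def start_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    import urllib.request

    srv = start_server()
    port = srv.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/state/cycle") as resp:
        print(resp.read().decode())  # 100
    srv.shutdown()
```

A simple web page could poll a URL such as /state/cycle for the current value. Binding to 127.0.0.1 keeps the unsecured prototype off the network until TLS and authentication are in place.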

Getting Started (First 30-90 Days)

Set up OpenLorenz (Instructions)
- Security Tasks: bootstrap certificates, investigate compatible authentication strategies (start with plain text passwords)
- Front End Tasks: create a few example Lorenz portlets, modify the OpenLorenz front end so that it can display Conduit's application data
Set up Conduit (Instructions)
- Application Tasks: create an example application that uses Conduit to communicate with OpenLorenz
- Use a VM or AWS instance for the Conduit application to mock up a three-component system: a user laptop for the client, a VM for OpenLorenz, and a separate VM for a Conduit-enabled application.
- Ideas to explore:
  - use Munge to authenticate connections from a separate VM running a Conduit app to the OpenLorenz instance.
  - try getting suexec working in Apache
  - TLS might serve as a point-to-point encryption mechanism distinct from SSH. It would be interesting to explore the differences and trade-offs; a TLS solution could potentially be easier than an SSH one.
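To make the TLS option concrete, here is a minimal sketch of Python ssl settings for each end of a point-to-point link, for example a Conduit app connecting to OpenLorenz. It assumes certificates have already been bootstrapped; the function names and file arguments are placeholders, and this is one possible configuration rather than a vetted LC setup. Requiring client certificates (mutual TLS) lets the server authenticate the connecting application as well as encrypt the traffic, which is one point of comparison with SSH keys.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for the connecting side (e.g. a Conduit-enabled app).

    ca_file: CA bundle used to verify the server; None uses the system trust store.
    cert_file/key_file: optional client certificate, for mutual TLS.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    # create_default_context already sets these two; restated for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_server_context(cert_file: str, key_file: str,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for the listening side; demands client certs if a client CA is given."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
        ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: reject clients without a valid cert
    return ctx
```

A client would then wrap an ordinary socket with ctx.wrap_socket(sock, server_hostname="...") and a server with ctx.wrap_socket(sock, server_side=True). Unlike an SSH tunnel, this needs no shell account on the remote end, but it does require managing a certificate authority.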

Next Steps (Remainder of the Project)

Generalize the Application/Lorenz Connection
- Security Tasks: determine how HPC applications can communicate and create a connection with Lorenz
- Application Tasks: create a library that applications can use to broadcast their intent to service data to Lorenz, integrate this library into Conduit
- Front End Tasks: migrate the portlets from OpenLorenz to Lorenz, create a Lorenz portlet to discover and display registered applications
Analyze the Security Faults and Vulnerabilities of the Initial Solution
- Security Tasks: determine how to break the communication between an HPC application and Lorenz, and run applications that try to hack into or snoop on the data coming from a running HPC application
- Application/Front End Tasks: determine how your area of the application can be modified to remove the security vulnerabilities discovered
Create a More Secure Solution
- Security Tasks: assist the application and front end developers in weeding out security vulnerabilities from their ends of the code
- Application/Front End Tasks: adapt your area of the application to remove any security vulnerabilities discovered
Branch Out the Solution to Other Applications
- Security Tasks: continue to improve the integrity and security of the existing pipeline, ensure that no security vulnerabilities are introduced by extended applications
- Application Tasks: integrate the registration library into more applications and libraries (e.g., IPython Notebook)
- Front End Tasks: augment the Lorenz application registration portlet so that it displays more information (e.g. host machine, host user), create customized portlets for all of the data servicing applications, add the ability to open discovered services as separate portlets
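The registration library described in these tasks could start from a small, versioned message. In this sketch the field names, version number, and functions are all hypothetical; the message carries the host machine and host user that a discovery portlet would display.

```python
import getpass
import json
import os
import socket
import time
from typing import Optional

# Hypothetical message format; nothing here is an existing Lorenz or Conduit API.
REGISTRATION_VERSION = 1


def build_registration(app_name: str, port: int, *,
                       host: Optional[str] = None,
                       user: Optional[str] = None,
                       description: str = "") -> str:
    """Return a JSON registration an application could send to the broker."""
    msg = {
        "version": REGISTRATION_VERSION,
        "app_name": app_name,
        "host": host or socket.gethostname(),  # host machine shown in the portlet
        "port": port,
        "user": user or getpass.getuser(),     # host user shown in the portlet
        "pid": os.getpid(),
        "registered_at": int(time.time()),
        "description": description,
    }
    return json.dumps(msg, sort_keys=True)


def parse_registration(text: str) -> dict:
    """Broker side: decode and sanity-check a registration (NOT authenticated)."""
    msg = json.loads(text)
    if not isinstance(msg, dict):
        raise ValueError("registration must be a JSON object")
    required = {"version", "app_name", "host", "port", "user"}
    missing = required - set(msg)
    if missing:
        raise ValueError(f"registration missing fields: {sorted(missing)}")
    if not isinstance(msg["port"], int) or not 0 < msg["port"] < 65536:
        raise ValueError("invalid port")
    return msg
```

Note that the user field is self-reported: any process can claim to be any user. That is exactly the sort of weakness the analysis phase should exploit, and a mechanism such as Munge could let the broker stop trusting it.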

Technologies

Data Publishing and Application Discovery Technologies

Munge
- A library that can help link a process running on a batch node with a user and his/her credentials.
- Can this library be used in conjunction with or as an alternative to Crowd (see below) to allow applications running on the LC machines to securely connect to Lorenz?
Pathos (GitHub Page)
- A library that facilitates the configuration and launching of remote processes within heterogeneous environments.
- Can this library be used within an application running on an LC machine to open communications with Lorenz?
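The idea behind Munge can be illustrated with a toy model: nodes share a secret key, a credential records the uid and gid of its creator plus an optional payload, and a receiver checks it without any password. This is not MUNGE's credential format or API (real code would use the munge/unmunge tools or libmunge's munge_encode and munge_decode); it uses an unencrypted HMAC-SHA256 tag purely to show the flow, and encode_credential and decode_credential are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Tuple

# Toy model of the MUNGE idea. NOT MUNGE's wire format: the body is only
# signed (HMAC-SHA256), not encrypted, and uid/gid are passed in by the caller.


def encode_credential(key: bytes, uid: int, gid: int, payload: str = "",
                      ttl: int = 300, now: Optional[float] = None) -> str:
    """Create a credential '<body>.<tag>', both parts URL-safe base64."""
    issued = int(now if now is not None else time.time())
    body = json.dumps({"uid": uid, "gid": gid, "payload": payload,
                       "issued": issued, "ttl": ttl}, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(body).decode() + "."
            + base64.urlsafe_b64encode(tag).decode())


def decode_credential(key: bytes, cred: str,
                      now: Optional[float] = None) -> Tuple[int, int, str]:
    """Return (uid, gid, payload) if the credential is authentic and unexpired."""
    body_b64, tag_b64 = cred.split(".")
    body = base64.urlsafe_b64decode(body_b64)
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison so the check does not leak timing information.
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(tag_b64)):
        raise ValueError("bad credential: MAC mismatch")
    fields = json.loads(body)
    current = now if now is not None else time.time()
    if current > fields["issued"] + fields["ttl"]:
        raise ValueError("credential expired")
    return fields["uid"], fields["gid"], fields["payload"]
```

In MUNGE proper, the local munged daemon fills in the uid and gid of the calling process, so they cannot simply be chosen by the caller as they are here; that property is what would let Lorenz tie a connecting application to a user.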

Security and Communication Technologies

Right now, connections between the client web browser and the Lorenz web server use TLS/SSL. This works on the RZ, CZ, etc.
We can now authenticate users against Cryptocard and LSA directories, implemented over a RESTful API
Crowd: Atlassian's authentication solution
- We use it to talk to LDAP on LC; it uses either RSA or Cryptocard tokens
- Exposes a RESTful API
- Includes a concept of groups that manage access to resources
- After authentication by Crowd, we run a custom version of suexec; from that point on, you can perform operations as yourself.
- There may be a way to pass the key to Conduit, so that Conduit could ask Crowd to verify the user and grant them access to the data
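To illustrate what using Crowd's RESTful API might look like from a service such as Lorenz or a Conduit-enabled application, the sketch below builds (but does not send) a request asking Crowd whether a username/password pair is valid. The endpoint is an assumption based on Crowd's usermanagement REST resource (POST .../rest/usermanagement/1/authentication?username=... with a JSON body {"value": password}, the calling application authenticating with HTTP Basic auth); check it against the deployed Crowd version. The function name and example values are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_crowd_auth_request(crowd_base_url: str, app_name: str, app_password: str,
                             username: str, password: str) -> urllib.request.Request:
    """Build (not send) a Crowd REST request that checks a user's password.

    ASSUMED endpoint: POST <base>/rest/usermanagement/1/authentication?username=<user>
    with JSON body {"value": "<password>"}; the *application* (not the user)
    authenticates to Crowd with HTTP Basic auth.
    """
    query = urllib.parse.urlencode({"username": username})
    url = f"{crowd_base_url.rstrip('/')}/rest/usermanagement/1/authentication?{query}"
    app_creds = base64.b64encode(f"{app_name}:{app_password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"value": password}).encode(),
        method="POST",
        headers={
            "Authorization": f"Basic {app_creds}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Sending it with urllib.request.urlopen over HTTPS would return a success status if Crowd accepts the credentials and raise an HTTPError otherwise. Passing a key or session token to Conduit instead, as suggested above, would rely on Crowd's session resources rather than this password check, which keeps the user's password out of the simulation entirely.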