hapi activities - hapi-server/data-specification GitHub Wiki

Data Server Development in the Context of the Heliophysics Data and Modeling Consortium (HDMC)

Started by Jon Vandegriff, Jeremy Faden, and Bob Weigel on 2016-12-19; modified subsequently in discussion during multiple HAPI Data Server telecons.

The following activities describe some of the next steps that the HDMC can take to foster more interoperable data access using the newly developed Heliophysics Application Programmer's Interface (HAPI) specification.

Spec updates ("Activity 0")

Changes to the spec:

  • Proposed enhancements on issue tracker

  • Support mode changes (time-dependent DEPEND_1); make sure we have not painted ourselves into a corner for higher-dimensional data

  • Support for static datasets? (This would require relaxing some basic requirements. Maybe support this by having a pass-through server that can take static content and re-expose it dynamically?)

  • Ability to request individual array elements in a data parameter request, like parameters=protons[2]; such a request might look like the sketch below
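For example, using the spec's data endpoint with the proposed subsetting syntax (the server URL and dataset id here are placeholders):

```
http://example.org/hapi/data?id=MY_DATASET&time.min=2017-01-01T00:00:00Z&time.max=2017-01-02T00:00:00Z&parameters=protons[2]
```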

Make many datasets available through the HAPI interface ("Activity 1")

Create full production implementations of HAPI servers at the institutions listed below. (Note: it would be best if each team made their code available as open source or at least easily obtainable by other teams so that additional data providers can use it as examples.)

CDAWeb

  • how to do this is ultimately up to Bernie et al.; these are just suggestions/ideas
  • Nand has a prototype for this at http://cdaweb.gsfc.nasa.gov/~nand/hapi.html
  • have other teams assist to get it done in less than 3 months
  • start with just the datasets with master CDFs
  • for other datasets (those without master CDFs or other ones that CDAWeb wants to offer), create a framework for capturing the required metadata to hook into their HAPI system

PDS

  • again, this is up to Todd
  • maybe focus on a current mission or instrument
  • utilize Autoplot readers for PDS datasets?
  • focus only on PDS 4 data, depending on the expected speed of the PDS 3 to PDS 4 migration

TSDS

  • add lots of magnetometer datasets
  • decide how best to deal with datasets that require an account/ID for access
  • serve for now as a pass-through server for SSCWeb/CDAWeb?
  • use the TSDS implementation as a reference implementation (drop-in implementation for use by others?)

APL

  • add multiple instrument datasets (which ones are not available at CDAWeb?)
  • focus on current missions
  • share access with Iowa on JUNO, Cassini

Test Server

  • create a very basic server for client testing
  • it just delivers well-understood functions (sine wave, simple spectrograms, etc.)
  • open source code so people can see how to emit HAPI data
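A minimal sketch of what such a server could look like in Python is below. The dataset id, port, time range, and version string are all made up, and the spec's required status/error fields are omitted for brevity; this is illustrative, not an existing implementation.

```python
# Minimal sketch of a HAPI-style test server emitting a sine wave as CSV.
import json
import math
from datetime import datetime, timedelta
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

INFO = {
    "HAPI": "1.1",
    "startDate": "2016-01-01T00:00:00Z",
    "stopDate": "2017-01-01T00:00:00Z",
    "parameters": [
        {"name": "Time", "type": "isotime", "length": 24, "fill": None},
        {"name": "sine", "type": "double", "units": None, "fill": None},
    ],
}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        query = parse_qs(url.query)
        if url.path == "/hapi/catalog":
            self._send(json.dumps({"HAPI": "1.1",
                                   "catalog": [{"id": "sinewave"}]}))
        elif url.path == "/hapi/info":
            self._send(json.dumps(INFO))
        elif url.path == "/hapi/data":
            fmt = "%Y-%m-%dT%H:%M:%SZ"
            t = datetime.strptime(query["time.min"][0], fmt)
            stop = datetime.strptime(query["time.max"][0], fmt)
            rows = []
            while t < stop:
                # one sine cycle per ~6.3 hours, one sample per minute
                phase = (t - datetime(2016, 1, 1)).total_seconds() / 3600.0
                rows.append("%s,%.4f" % (t.strftime(fmt), math.sin(phase)))
                t += timedelta(minutes=1)
            self._send("\n".join(rows) + "\n", "text/csv")
        else:
            self.send_error(404)

    def _send(self, body, ctype="application/json"):
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), Handler).serve_forever()
```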

Advertising the HAPI mechanism at multiple venues ("Activity 2")

  • EGU meetings?
  • MOP meeting in June
  • pick one or two data conferences

Create documentation and workshops for people who want to:

  1. serve their data using HAPI (by creating their own server or adapting a drop-in server)
  2. write a client to access existing HAPI server data
  3. use a HAPI client to do analysis

Keep the SPASE crowd up to date on latest versions.

At the beginning, work with specific individuals to bring adoption to key datasets and agencies, as was done with LASP. The SPEDAS developers would also be helpful, as would existing instrument teams and other data centers.

Also, an EOS article about the Heliophysics Data Environment (HPDE) could describe HAPI and SPASE, and other data infrastructure items.

A separate HAPI paper might be publishable as a methods paper in JGR.

Create multiple clients ("Activity 3")

Have client developers coordinate so that the native client APIs all look the same (i.e., the IDL and Python calls to get HAPI data look similar: same argument order, same time formats, same names for key functions).
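As an illustrative sketch of the kind of uniform signature this coordination might produce (the function name, argument order, and return values are hypothetical, not an agreed-upon API):

```python
# Hypothetical uniform client signature; nothing here is an agreed-upon API.
# Each language binding (IDL, MATLAB, Java, ...) would mirror the same shape.
def hapi(server, dataset, parameters, time_min, time_max):
    """Return (data, meta): the filled data array and the /info metadata."""
    raise NotImplementedError  # fetching and CSV parsing omitted in this sketch

# Intended usage, with identical argument order in every language:
# data, meta = hapi('http://example.org/hapi', 'MY_DATASET', 'protons',
#                   '2017-01-01T00:00:00Z', '2017-01-02T00:00:00Z')
```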

Generate the native client APIs first, then have people fill that out (?).

  • create the basic "fill my array" client for multiple languages: IDL, Python, Java, MATLAB, others (Perl?, C/C++).
  • web-app client for simple plotting. TSDS already has this, generating plots using an Autoplot back-end. Jeremy is developing a thin Autoplot client that lets one zoom in on a PNG, after which it fetches another PNG from the server. We probably also want a native JavaScript option like the one Doug has implemented for visualizing LiSIRD data (uses plot.ly or similar).
  • client that is also another HAPI server and offers more advanced data-processing capabilities, such as averaging, spike removal, and merging of datasets. RSW: For most languages this would be easy: just link to autoplot.jar. This has been done in Python, MATLAB, and IDL already.
  • create a client that can create a local set of files (hourly, daily, monthly, yearly, or a custom list of times) from any HAPI server. RSW: This is a lot of code that would need to be written for many languages. A better approach is to link into a jar file that does this.

Create a drop-in server other providers can adapt/use to serve data ("Activity 4")

RSW: TSDS is already a drop-in server in the sense that if you have a set of files that follow the http://tsds.org/uri_templates specification, you can serve data provided that you write a template. TSDS then provides a web selection interface, many plotting options, a data inspection option, filtering (averaging/mean/min/max over a time interval), auto-generation of download scripts, and links to more information about the data and how to cite it. To provide HAPI metadata, extra information is needed in a configuration file. If we want a second full-featured drop-in server, I suggest building on the Autoplot codebase; all of the back-end code is already written.
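For reference, a file-naming template in the tsds.org/uri_templates style might look like the following (the directory layout is hypothetical; $Y, $m, and $d expand to the year, month, and day of each file's start time):

```
data/$Y/mag_$Y$m$d.csv
```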

RSW: I think that this activity should be split into two parts: (1) Drop-in full-featured and (2) Basic implementations. The basic implementations should just be "Sample Data Servers" that parse and validate the URL and serve only test data. The idea is that if a user wanted to work on back-end server development, they would have the option of using TSDS or Autoplot or starting with something simple.

Have several types of servers that data providers can choose from to drop in over their data to easily make it available in a HAPI-compliant way.

  • a server-side Java web-app that can be dropped into Tomcat, Glassfish, Jetty, etc. (maybe use Autoplot data-reader code inside this to allow easily configurable readers; APL has similar software)

  • a script-based solution that does not require a full web-app container, i.e., something you can drop into the cgi-bin directory of a regular Apache web server
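A cgi-bin version could be very small. The sketch below is hypothetical (not an existing package) and handles only the catalog endpoint; Apache would map, e.g., /cgi-bin/hapi/catalog to this script with PATH_INFO="/catalog".

```python
#!/usr/bin/env python3
# Hypothetical cgi-bin HAPI endpoint sketch; only the catalog is implemented.
import json
import os

endpoint = os.environ.get("PATH_INFO", "").strip("/")

if endpoint == "catalog":
    print("Content-Type: application/json\n")
    print(json.dumps({"HAPI": "1.1",
                      "catalog": [{"id": "example_dataset"}]}))
else:
    print("Status: 404 Not Found\nContent-Type: text/plain\n")
    print("unknown or unimplemented endpoint")
```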

The ultimate goal is to have several options for various levels of users, but for non-experts to have at least one option that installs the HAPI server through an interview-like process and connects existing data to the server.

RSW: This interview process would require a significant amount of development for a very small community. Connecting a new set of files, or using TSDS to convert from one API to HAPI, takes about one day of effort. We could spend months getting an interview process right that would be used by ~10 groups.

Key elements to the drop-in server include:

  1. a solid reference implementation of the URI Template specification
  2. configurable file readers for CDF, ASCII / CSV, binary files (like PDS binaries)

Consider creating a SPASE-driven reader for the cases where the SPASE records are good enough to prime the reader. This will likely require lots of small tweaks to SPASE records if they start getting used this way.

RSW: There are many difficulties with this approach.

The idea that we can "just" use SPASE has been floating around for a long time. If you research the implementation in any depth, you will quickly realize that making "small tweaks" to SPASE records is not realistic.

Some notes from a recent attempt to do this with TSDS reading SSCWeb SPASE records (see also Bernie's notes on using SPASE for CDAWeb holdings):

The Parameter information section highlights some issues with using SPASE to "prime" (Jon V.'s word) a reader. The parameter information is specific to the CDF output and is not correct in some cases for the ASCII output from the SSCWeb web services. For example, time is given in the number of milliseconds since 0000 in the SPASE record, but in ASCII it is something else. The Region parameter is indicated as an integer in the SPASE record, but the web service puts out a string.

yyyy ddd hh:mm:ss     X        Y        Z     Region
2000 200 00:00:00  -121.34   191.68   101.59  Intpl_Med
2000 200 00:12:00  -121.36   191.67   101.59  Intpl_Med

The last stage of the interview process should be to run a test client that tries the newly deployed server and executes basic queries against the catalog, info, and data endpoints.
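Such a smoke test could be as simple as the sketch below (the server URL is a placeholder; the catalog/info/data endpoints and JSON fields follow the spec):

```python
# Sketch of a post-install smoke test for a newly deployed HAPI server.
import json
from urllib.request import urlopen

server = "http://example.org/hapi"

# 1. catalog: list the datasets the server advertises
catalog = json.load(urlopen(server + "/catalog"))
dataset = catalog["catalog"][0]["id"]

# 2. info: fetch the metadata for the first dataset
info = json.load(urlopen(server + "/info?id=" + dataset))
print(dataset, "has", len(info["parameters"]), "parameters")

# 3. data: request the dataset's full advertised range
#    (fine for a small test dataset; use a sub-range for real data)
data = urlopen(server + "/data?id=" + dataset
               + "&time.min=" + info["startDate"]
               + "&time.max=" + info["stopDate"]).read()
print("received", len(data), "bytes of CSV")
```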

Need some testers to try out initial drop-in servers: Darren De Zeeuw (?), CCMC (Justin, Chiu)

HAPI Registry Development ("Activity 5")

Options for a registry:

  1. none
  2. static file with a few people who can add to it: servers.txt (add name and description; maybe make it JSON format instead)
  3. add a registry endpoint that servers could optionally support; this could be advertised in the capabilities endpoint (hasRegistry in capabilities defaults to false if absent); we probably do not want to imply relationships (a hierarchy) by which one server lists others; we do need a naming convention for HAPI servers, and this could be hierarchical
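For option 2, a JSON version of the registry file might look like the following (the names, URLs, and descriptions are placeholders):

```json
[
  {"name": "TestData",
   "url": "http://example.org/hapi",
   "description": "Test server offering synthetic datasets"},
  {"name": "AnotherServer",
   "url": "http://example.net/hapi",
   "description": "Production data server"}
]
```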

Note: probably want to add a naming convention for servers, even now!

Question: where to add the 'name' attribute?

  • in the root endpoint (but we said this was non-standard)
  • in the capabilities
  • in every response?

Also create a mechanism that polls everything in the registry to see which HAPI servers are actually live, and make the liveness-test history available online.
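A liveness poller could be as simple as this sketch (it assumes the hypothetical servers.json registry file above; hitting any spec endpoint would do, and capabilities is cheap to serve):

```python
# Sketch of a registry liveness poller; servers.json is the hypothetical
# registry file from Activity 5, option 2.
import json
import time
from urllib.request import urlopen

with open("servers.json") as f:
    servers = json.load(f)

for s in servers:
    try:
        response = json.load(urlopen(s["url"] + "/capabilities", timeout=10))
        alive = "HAPI" in response
    except Exception:
        alive = False
    # Append one timestamped line per server to build up the test history.
    print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()), s["name"], alive)
```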

HAPI/SPASE connections ("Activity 8")

  1. access URL to the HAPI root, or maybe the info endpoint?
  2. name of the dataset in the HAPI catalog

Security ("Activity 6")

  • need to think through various security scenarios – think about possible exploit pathways for the server
  • any differences for https?
  • just use HTTP usernames and passwords?
  • develop a test client to probe the security robustness of a new server

Longer-Term Development ("Activity 7")

  • get active SOCs to use this for instrument team support on current and future missions: MMS, Van Allen Probes, JUNO, STEREO
  • get instrument teams to use this inside their existing tools (having HAPI client libraries would help with this!)
  • seek adoption by major software tools: SPEDAS/AMDA/VESPA/SpacePy
  • seek adoption by other communities:
    • ground magnetometers (SuperMAG, other magnetometer chains)
    • SuperDARN?
    • Planetary (beyond the PPI node of PDS)
    • Geomagnetic index providers (Dst, AE, etc.); already available via CDAWeb
    • European, Canadian, and Japanese groups
    • Astrophysics (HEASARC has a lot of time series and event data)
    • USGS