GVRS Jump Start 03_Storing Data - gwlucastrig/gridfour GitHub Wiki

Introduction

The previous two articles in this series discussed how to read data from GVRS files. This article will build on the concepts introduced in those articles and use them to explore how to write a GVRS file.

Before getting started, I should note that this wiki already includes an article on writing GVRS files. How to Package Data Using the GVRS Library described the PackageData.java application that is supplied as part of the Gridfour software distribution. PackageData is the application that was used to build the ETOPO1_v1.0.4.gvrs file that was used in the previous articles. A lot of the discussion that follows is taken directly from that earlier article. But this article attempts to simplify the discussion and streamline the process.

Why we need a file

A GVRS file is just a means to an end. The GVRS API was designed to answer three basic requirements:

  1. Allow applications to manage data for rasters that are too large to conveniently store in memory.
  2. Preserve data between application runs.
  3. Allow different applications to share raster data.

The obvious solution for these requirements is to store data in a file. The GVRS API allows applications to off-load data from memory and write it to a file when it is not needed, and then read it back on demand. GVRS manages this data-swapping operation using an efficient in-memory cache that works transparently to application code. And, naturally, the same GVRS file that supports virtual memory management can also be used to store data between program runs. In fact, GVRS files can even be shared across systems to provide data to different applications.

So behind every Gridfour Virtual Raster Store, there is a file. Creating that file isn't a goal in and of itself. It's just the way GVRS gets the job done.
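To make the idea of a transparent in-memory cache a bit more concrete, here is a minimal sketch of a least-recently-used (LRU) cache in plain Java. It is not GVRS code. It assumes the raster is split into fixed-size blocks (called "tiles" here), and a HashMap stands in for the backing file. The class name TileCache is hypothetical, and GVRS's actual caching strategy may differ.

```java
// Generic LRU cache sketch (NOT the GVRS implementation).
// A HashMap simulates the backing file; evicted tiles are "written" to it
// and reloaded on demand, invisibly to the caller of getTile().
class TileCache {
    private final int capacity;   // maximum number of tiles held in memory
    private final int tileSize;   // number of values per tile
    private final java.util.Map<Integer, float[]> backingStore = new java.util.HashMap<>();
    private final java.util.LinkedHashMap<Integer, float[]> cache;
    int reads;   // tiles reloaded from the backing store
    int writes;  // tiles swapped out to the backing store

    TileCache(int capacity, int tileSize) {
        this.capacity = capacity;
        this.tileSize = tileSize;
        // accessOrder = true: iteration runs from least- to most-recently used
        this.cache = new java.util.LinkedHashMap<Integer, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(java.util.Map.Entry<Integer, float[]> eldest) {
                if (size() > TileCache.this.capacity) {
                    // Swap the least-recently-used tile out to the backing store
                    backingStore.put(eldest.getKey(), eldest.getValue());
                    writes++;
                    return true;
                }
                return false;
            }
        };
    }

    // Returns the tile, loading it from the backing store (or creating it) if needed
    float[] getTile(int tileIndex) {
        float[] tile = cache.get(tileIndex);
        if (tile == null) {
            tile = backingStore.remove(tileIndex);
            if (tile == null) {
                tile = new float[tileSize];  // never written before: start empty
            } else {
                reads++;
            }
            cache.put(tileIndex, tile);      // may trigger an eviction
        }
        return tile;
    }
}
```

For simplicity, this sketch writes every evicted tile back to the store; a real implementation would typically track which tiles have been modified and skip writing the unchanged ones.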

Creating a Gridfour Virtual Raster Store

Writing a GVRS file-backed data store is a three-step process:

  1. Construct a specification object to describe the organization of the GVRS file:

    • Specify the raster (grid) dimensions.
    • Specify names and data types for the elements to be stored in the file.
    • Set other configuration options (data compression, checksums, etc.)
  2. The grid specification and a Java File specification are used to create a new file for writing data. The initial file is treated as an empty collection of data and will typically be smaller than 1 kilobyte in size.

  3. Values are added to the file one grid point at a time. The internal bookkeeping and management of the underlying raster is mostly transparent to the calling application.

Constructing a GVRS File Specification

To establish the structure and data elements for a GVRS file, an application constructs an object of the class GvrsFileSpecification. GvrsFileSpecification includes a number of methods to allow applications to set parameters and options for creating and interacting with the file. Once an application completes setting up a GvrsFileSpecification, the results are passed to the constructor for GvrsFile which actually creates the file on disk.

The previous articles in this series discussed how objects of type GvrsElement were used to access data stored in a GVRS file. When reading data from an existing file, the GVRS API obtains the specifications for GvrsElements using descriptive metadata stored in the GVRS file header. When creating a new GVRS file, the GVRS API obtains those specifications from the GvrsFileSpecification object. So before an application creates a GvrsFile, it needs to supply specifications for the data elements.

The example code below shows how a GvrsFileSpecification and a GvrsElementSpecification can be used to create a GVRS file. The output file contains one element based on two-byte signed integers (i.e., Java "short" integers).

import java.io.File;
import java.io.IOException;
import org.gridfour.gvrs.*;  // GvrsFile, GvrsElement, GvrsFileSpecification, etc.

// set up file specification (10 rows, 10 columns), add one element named "z" ------
GvrsFileSpecification fileSpec = new GvrsFileSpecification(10, 10);
GvrsElementSpecificationShort zElementSpec = new GvrsElementSpecificationShort("z");
fileSpec.addElementSpecification(zElementSpec);

// create a new file, access the element named "z" --------
File outputFileRef = new File("Example1.gvrs");
try (GvrsFile gvrsFile = new GvrsFile(outputFileRef, fileSpec)) {
    GvrsElement zElement = gvrsFile.getElement("z");
    // the zElement object may now be used to read and write
    // data to the file:
    zElement.writeValue(0, 1, 2021); // write a value at grid row 0, column 1
} catch (IOException ioex) {
    // report the failure rather than silently ignoring it
    System.err.println("Error writing Example1.gvrs: " + ioex.getMessage());
}

Specifying a coordinate system and transform

Let's take a look at how we can overlay a raster grid with a real-valued "model" coordinate system. In the code below, a call to the setCartesianCoordinates() method tells the GvrsFileSpecification to use Cartesian coordinates for the model. The call also establishes the domain for the grid. It assigns the coordinates (-1, -1) to the grid cell at row 0, column 0 (in GVRS, grid cells are numbered starting at zero). It also assigns the coordinates (1, 1) to the grid cell in the last row and column of the raster.
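Since the corner coordinates pin down the domain, the coordinates of every other grid point can be found by linear interpolation. The sketch below assumes a simple linear mapping in which x follows the column index and y follows the row index; the exact formulas GVRS uses are not given in this article, and the class CartesianMapping is hypothetical.

```java
// Hypothetical linear grid-to-model mapping, assuming (x0, y0) is assigned to
// grid point (0, 0) and (x1, y1) to grid point (nRows-1, nColumns-1).
class CartesianMapping {
    final int nRows, nColumns;
    final double x0, y0, x1, y1;

    CartesianMapping(int nRows, int nColumns, double x0, double y0, double x1, double y1) {
        this.nRows = nRows;
        this.nColumns = nColumns;
        this.x0 = x0;
        this.y0 = y0;
        this.x1 = x1;
        this.y1 = y1;
    }

    // Returns {x, y} for the given (possibly fractional) grid row and column
    double[] gridToModel(double row, double column) {
        double x = x0 + column * (x1 - x0) / (nColumns - 1);
        double y = y0 + row * (y1 - y0) / (nRows - 1);
        return new double[] {x, y};
    }

    // Inverse mapping: returns {row, column} for the given model coordinates
    double[] modelToGrid(double x, double y) {
        double column = (x - x0) * (nColumns - 1) / (x1 - x0);
        double row = (y - y0) * (nRows - 1) / (y1 - y0);
        return new double[] {row, column};
    }
}
```

For a 500 by 500 grid spanning (-1, -1) to (1, 1), this gives a grid spacing of 2/499 (about 0.004) model units, and the model origin (0, 0) falls at row 249.5, column 249.5, halfway between grid points.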

An example of the GVRS API at work

The example defines a function called f() which converts Cartesian xy coordinates to a value z = f(x,y).

The example also enables data compression by calling the setDataCompressionEnabled() method provided by the file specification.

double f(double x, double y) {
    return Math.sin(x * Math.PI) * Math.sin(y * Math.PI);
}

void run() {
    int nRows = 500;
    int nColumns = 500;
    GvrsFileSpecification fileSpec
      = new GvrsFileSpecification(nRows, nColumns);
    GvrsElementSpecificationFloat zElementSpec = new GvrsElementSpecificationFloat("z");
    fileSpec.addElementSpecification(zElementSpec);
    fileSpec.setDataCompressionEnabled(true);

    // establish the domain of the grid by assigning coordinates
    // to the first grid point (0, 0) and the last grid point (nRows-1, nColumns-1)
    fileSpec.setCartesianCoordinates(-1.0, -1.0, 1.0, 1.0); // sets model to Cartesian coordinates

    File outputFileRef = new File("Example1.gvrs");
    try (GvrsFile gvrsFile = new GvrsFile(outputFileRef, fileSpec)) {
      GvrsElement zElement = gvrsFile.getElement("z");
      for (int iRow = 0; iRow < nRows; iRow++) {
        for (int iCol = 0; iCol < nColumns; iCol++) {
          GvrsModelPoint xy = fileSpec.mapGridToModelPoint(iRow, iCol);
          double z = f(xy.getX(), xy.getY());
          zElement.writeValue(iRow, iCol, (float) z);
        }
      }
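    } catch (IOException ioex) {
      // report the failure and skip the size report for an incomplete file
      System.err.println("Error writing Example1.gvrs: " + ioex.getMessage());
      return;
    }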

    long fileSize = outputFileRef.length();
    double bitsPerSymbol = (8.0 * fileSize) / (nRows * nColumns);
    System.out.println("Resulting file size: " + fileSize + " bytes");
    System.out.println("Bits per symbol:     " + bitsPerSymbol);
}

When the program runs, it creates a raster containing 250 thousand grid cells (500 rows by 500 columns). The data is stored as four-byte floating-point values (single-precision floats), so the uncompressed data would occupy 1,000,000 bytes. In general, floating-point data does not compress as readily as integer values (see Lossless Compression for Floating-Point Data). But, in this case, the Gridfour compression reduces the data size by about half:

Resulting file size: 475272 bytes
Bits per symbol:     15.208704
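As a quick check of these numbers, the arithmetic can be reproduced directly. Storing 250,000 four-byte floats without compression takes 1,000,000 bytes, or 32 bits per value, so 475,272 bytes is about 47.5% of the raw size. The class below is just a calculator for that arithmetic; its name is hypothetical.

```java
// Simple calculator for the compression figures reported by the example program.
class CompressionStats {
    // Average number of bits in the file per grid value
    static double bitsPerSymbol(long fileSizeBytes, long nCells) {
        return (8.0 * fileSizeBytes) / nCells;
    }

    // File size as a fraction of the uncompressed size (nCells * bytesPerValue)
    static double fractionOfRaw(long fileSizeBytes, long nCells, int bytesPerValue) {
        return (double) fileSizeBytes / (nCells * (long) bytesPerValue);
    }
}
```

Note that the measured file size also includes the file header, so these figures slightly overstate the space taken by the compressed values themselves.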

Metadata for GVRS elements

The GVRS API features a number of classes and methods for supporting the addition of metadata to a GVRS file. I will cover metadata in a future article. But for now, there are three optional metadata parameters that are directly related to the operation of a Gridfour Virtual Raster Store.

Parameter       Description
Minimum value   The minimum value allowed for data storage
Maximum value   The maximum value allowed for data storage
Fill value      The value assigned to grid cells that haven't been populated

These parameters may be supplied when a GVRS element specification is created. For example, the function f(x,y) used in the code fragment above produces values in the range -1.0 to 1.0. The associated element specification could be created using the following constructor, which takes, in order, the minimum value (-1.0), the maximum value (1.0), and the fill value (the floating-point not-a-number value, Float.NaN).

GvrsElementSpecificationFloat zElementSpec = new GvrsElementSpecificationFloat("z", -1f, 1f, Float.NaN);
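To illustrate what the three parameters mean, here is a small in-memory sketch. The class BoundedFloatGrid is hypothetical and does not reflect how GVRS itself enforces the bounds or stores the fill value. It simply initializes every cell to the fill value and rejects writes outside the [min, max] range.

```java
// Hypothetical in-memory illustration of min/max/fill semantics (NOT GVRS code).
class BoundedFloatGrid {
    private final float[] values;
    private final int nColumns;
    final float minValue, maxValue, fillValue;

    BoundedFloatGrid(int nRows, int nColumns, float minValue, float maxValue, float fillValue) {
        this.values = new float[nRows * nColumns];
        this.nColumns = nColumns;
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.fillValue = fillValue;
        // Every cell starts out holding the fill value, meaning "not yet populated"
        java.util.Arrays.fill(values, fillValue);
    }

    void write(int row, int column, float value) {
        // NaN compares false against both bounds, so writing a NaN fill value is allowed
        if (value < minValue || value > maxValue) {
            throw new IllegalArgumentException(
                "value " + value + " outside [" + minValue + ", " + maxValue + "]");
        }
        values[row * nColumns + column] = value;  // row-major layout
    }

    float read(int row, int column) {
        return values[row * nColumns + column];
    }
}
```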

The min/max bounds specification can be used to prevent an application from storing out-of-range data when building a file. But their main value is that they tell applications accessing a GVRS file the possible range of values for its content. That information can be used when setting up palettes, designing processing logic, etc.
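As an example of using the declared range, the hypothetical helper below maps a value onto an 8-bit grayscale level and treats a NaN fill value as "no data". It assumes the min/max values have already been obtained from the element specification; how an application reads them back from a GVRS file is not shown here.

```java
// Hypothetical palette helper: maps a value in [minValue, maxValue] to a gray level 0..255.
class GrayscalePalette {
    static int toGray(float value, float minValue, float maxValue, int fillGray) {
        if (Float.isNaN(value)) {
            return fillGray;  // NaN fill value: the cell was never populated
        }
        double t = (value - minValue) / (double) (maxValue - minValue);
        t = Math.max(0.0, Math.min(1.0, t));  // clamp to guard against rounding
        return (int) Math.round(t * 255.0);
    }
}
```

Because the range is declared up front, an application can build the palette before reading a single grid value, instead of scanning the whole raster to discover its extremes.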

Creating a temporary file with automatic clean up

The example above created a persistent data file called "Example1.gvrs". In many cases, an application doesn't need to keep the file once it finishes processing. If the only purpose for using the backing file is to manage memory, the GVRS API offers a shortcut constructor:

GvrsFile gvrsFile = new GvrsFile(fileSpec);

When this alternate constructor is used, GVRS stores the data in a temporary file and deletes that file when the GvrsFile is closed.
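The "delete the backing file on close" behavior follows a familiar Java pattern: an AutoCloseable resource that removes its file in close(). The sketch below uses only java.io.File. It is not the GVRS implementation, and the class name and file prefix are hypothetical.

```java
// Generic sketch of a temporary backing file that is deleted when closed (NOT GVRS code).
class TemporaryBackingFile implements AutoCloseable {
    private final java.io.File file;

    TemporaryBackingFile() {
        try {
            // Creates an empty, uniquely named file in the default temporary directory
            file = java.io.File.createTempFile("raster-", ".tmp");
        } catch (java.io.IOException ioex) {
            throw new java.io.UncheckedIOException(ioex);
        }
    }

    java.io.File getFile() {
        return file;
    }

    @Override
    public void close() {
        // Remove the backing file once the store is no longer needed
        if (!file.delete() && file.exists()) {
            System.err.println("Could not delete " + file);
        }
    }
}
```

Used in a try-with-resources block, the file is removed as soon as the block exits, even if an exception is thrown inside it.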

In some cases, an application might not even require the degree of control provided by the GvrsFileSpecification. In that case, it can use the following approach:

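 try (GvrsFile gvrsFile = new GvrsFile(500, 500, GvrsElementType.FLOAT)) {
   GvrsElement element = gvrsFile.getElement(0);
   element.writeValue(0, 1, 2021);
 } catch (IOException ioex) {
   // report the failure rather than silently ignoring it
   System.err.println("Error accessing temporary GVRS store: " + ioex.getMessage());
 }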

Conclusion

This introduction to the data-storage elements of a GVRS file provides a starting place for developers using the API. A longer, more detailed discussion of storing data with GVRS is available in the related article How to Package Data Using the GVRS Library. Additional concepts related to data storage are described in the Javadoc for the GVRS API.

Future articles will discuss the use of metadata in GVRS files and will elaborate on the various data types supported by GvrsElements.