GVRS Jump Start 05_Metadata - gwlucastrig/gridfour GitHub Wiki

Introduction

The GVRS metadata features enable applications to attach supplemental information to their products. Metadata can be used to keep a record of how a product was processed, what it used for data sources, who created it, or how application developers and data scientists can integrate it into their own systems.

This article provides background on the ideas underlying the GVRS metadata API and offers practical examples on how they may be used.

Metadata at work

Here's an example of metadata at work. Although Google Earth does not support the GVRS format (at least not yet), I was able to use the georeferencing metadata in a GVRS elevation data set to create a GeoTIFF image that could be shown in a Google Earth display. In the figure below, the base image from Google Earth is overlaid by a color-coded, shaded-relief elevation product. I will have more to say about how this depiction was produced later on in this article. For now, the image provides an introduction to the idea that metadata is useful not just for documentation purposes, but can also serve a functional role in data-processing workflows.

Shaded-relief image over Nantes, France

Basic metadata

The GvrsFile class includes methods that access the basic metadata related to the file organization and overall content. This information includes the items listed in the table below:

Property | Type | Description
UUID | Java UUID | Universally Unique Identifier
Label | String (UTF-8) | An arbitrary, application-defined, human-oriented identifier
Date/Time Created | long | Internal record of creation date/time, in milliseconds since the epoch (Jan 1, 1970)
Date/Time Modified | long | Internal record of last modification date/time, in milliseconds since the epoch (Jan 1, 1970)
GVRS specification | GvrsFileSpecification | Safe copy of the file specification used to create the original GVRS file
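
Because the two date/time properties are plain long values, standard Java classes can turn them into readable timestamps. The sketch below assumes the values follow the usual Java convention of milliseconds since Jan 1, 1970 UTC. The class name EpochMillisExample and the sample values are illustrative stand-ins and are not part of the GVRS API.

```java
import java.time.Instant;

class EpochMillisExample {

    /** Formats a milliseconds-since-epoch value as an ISO-8601 UTC string. */
    static String toIsoString(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // Stand-in values; a real application would obtain them from its GVRS file.
        long timeCreated = 0L;
        long timeModified = 1_000_000_000_000L;
        System.out.println(toIsoString(timeCreated));   // 1970-01-01T00:00:00Z
        System.out.println(toIsoString(timeModified));  // 2001-09-09T01:46:40Z
    }
}
```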

The UUID is a randomly generated 128-bit label used to uniquely identify a GVRS file. While it is possible that two GVRS file instances could have the same UUID, the probability of that happening is "close enough to zero to be negligible" (Wikipedia, "UUID"). The UUID returned by the Java version of the GVRS API follows widely accepted standards and can be formatted into a 36-character string for use in user interfaces and printed reports.

If you are using GVRS in conjunction with a relational database (in a content management system or other application), the string value obtained from the UUID instance is suitable as a database key. If you wish, you could also use the label property to represent an inventory control number. Or, of course, you could just use it to label a product so that other users would know what it contains. GVRS does not impose any restrictions on the content of a label.
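
As a minimal sketch of the string form described above, the following code uses only the standard java.util.UUID class. The randomly generated UUID is a stand-in for one obtained from a GVRS file, and the class and method names (UuidKeyExample, toKey, fromKey) are hypothetical.

```java
import java.util.UUID;

class UuidKeyExample {

    /** Produces the 36-character string form, suitable for a text key column. */
    static String toKey(UUID uuid) {
        return uuid.toString();
    }

    /** Recovers the UUID from its string form. */
    static UUID fromKey(String key) {
        return UUID.fromString(key);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID(); // stand-in for the UUID of a GVRS file
        String key = toKey(uuid);
        System.out.println(key.length());              // 36
        System.out.println(fromKey(key).equals(uuid)); // true
    }
}
```

The string form round-trips without loss, so an application could store only the string in its database and rebuild the UUID instance when needed.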

The basic metadata for a GvrsFile is a little different from that of some of the other classes in the GVRS API in that it does not include a name property. The semantics of the name property, as used elsewhere in the API, just don't fit with the design of GvrsFile. In particular, the name is defined to be always non-null, always unique, and always based on a restricted character set (the ASCII identifier characters). In the development of GvrsFile, I was unable to find a suitable way of supporting these attributes through the API.

General-purpose metadata using the GvrsMetadata class

The GvrsMetadata class provides a container for exchanging metadata between a GvrsFile instance and an application. It is intended to serve as a general-purpose mechanism for handling metadata.

The design of the GvrsMetadata class was inspired in part by the Variable-Length Record concept used in the Lidar LAS format and also the TIFF tags used in TIFF files. The GvrsMetadata class contains the following fields:

Property | Type | Description
Name | String (ASCII) | A string that identifies the category of metadata (32 character maximum length)
RecordID | Integer | An integer code used to uniquely identify entities within a category
DataType | GvrsMetadataType | Indicates the kind of data stored in an element
Content | byte array | A binary representation of the metadata (defined based on data type)
Description | String (UTF-8) | Free-form explanatory text

The combination of the name and record-ID fields serves to uniquely identify a metadata entity. The name, by itself, is not sufficient to uniquely identify an entity. Instead, the name refers to a category of metadata types. A GVRS file may contain multiple metadata entities having the same name, but separate record-ID values.
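
To make the idea of a two-part identity concrete, here is a small stand-alone sketch. The MetadataKey class is hypothetical and is not part of the GVRS API. It only illustrates that two entities sharing a name remain distinct as long as their record IDs differ.

```java
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical composite key; not part of the GVRS API. */
final class MetadataKey implements Comparable<MetadataKey> {
    final String name;
    final int recordId;

    MetadataKey(String name, int recordId) {
        this.name = Objects.requireNonNull(name);
        this.recordId = recordId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MetadataKey)) {
            return false;
        }
        MetadataKey other = (MetadataKey) o;
        return recordId == other.recordId && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, recordId);
    }

    /** Orders keys by name first, then by record ID. */
    @Override
    public int compareTo(MetadataKey other) {
        int c = name.compareTo(other.name);
        return c != 0 ? c : Integer.compare(recordId, other.recordId);
    }
}

class MetadataKeyDemo {
    public static void main(String[] args) {
        TreeMap<MetadataKey, String> index = new TreeMap<>();
        index.put(new MetadataKey("Copyright", 1), "English notice");
        index.put(new MetadataKey("Copyright", 2), "Portuguese notice");
        index.put(new MetadataKey("Copyright", 2), "Replacement for record 2");
        System.out.println(index.size()); // 2: one name, two distinct record IDs
    }
}
```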

The use of the record ID property is somewhat specialized. In many cases it is irrelevant to applications that use GVRS. Therefore, the GVRS API provides methods that handle record ID internally and allow applications to work with metadata using only the name attribute. To keep things simple, we will begin our code examples using those methods and defer our discussion of record ID until later in this article.

An example using just the name property

The code fragment shown below constructs a GvrsMetadata object, writes it to a GVRS data store, and then reads back a safe copy of the information in the metadata.

    try (GvrsFile gvrs = new GvrsFile(spec)) {
        // Build a string-valued metadata object and store it in the file
        GvrsMetadata productCopyright = new GvrsMetadata("Copyright", GvrsMetadataType.STRING);
        productCopyright.setString("This data product is in the public domain");
        gvrs.writeMetadata(productCopyright);

        // Read back every metadata object stored under the name "Copyright"
        List<GvrsMetadata> metadata = gvrs.readMetadata("Copyright");
        GvrsMetadata resultCopyright = metadata.get(0);
        System.out.println(resultCopyright.getString());
    }

The return value from the readMetadata() method is a list of GvrsMetadata objects. Each item in the list is a safe copy of information stored within the GvrsFile. The version of the readMetadata() method used above did not specify a record ID. So, had multiple copyright entries been written to the file, all of them would have been valid results for the query operation.

A convenience method for string-based metadata

GVRS metadata instances can be assigned a number of data types including both numeric and string types. Because the use of string-based metadata is so common, GVRS implements a convenience method for storing metadata strings. An example is shown below:

    gvrs.writeMetadata("Copyright", "This data product is in the public domain.");

Automatic assignment for a record ID

Earlier, I mentioned that if the record ID is not specified, the GVRS API handles it internally. In the absence of an explicit specification, the API simply assigns an ID sequentially. Let's consider a case where we wish to add two copyright notices to a file:

    gvrs.writeMetadata("Copyright", "This data product is in the public domain.");
    gvrs.writeMetadata("Copyright", "Este produto de dados é de domínio público.");

    List<GvrsMetadata> metadata = gvrs.readMetadata("Copyright");
    for (GvrsMetadata m : metadata) {
        System.out.format("%2d.  %s%n", m.getRecordID(), m.getString());
    }

The result would be:

 ID  Content
 1.  This data product is in the public domain.
 2.  Este produto de dados é de domínio público.

An application could fetch a specific metadata element by specifying both the name and record ID as shown below. In this case, the return value is a single instance rather than a list.

    GvrsMetadata copyrightInEnglish    = gvrs.readMetadata("Copyright", 1);
    GvrsMetadata copyrightInPortuguese = gvrs.readMetadata("Copyright", 2);
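
The behavior described in this section can be imitated with a toy in-memory store. The sketch below is not the GVRS implementation. The class ToyMetadataStore is hypothetical, and its numbering rule (start at 1, then use the highest existing ID for the name plus one) is an assumption that happens to reproduce the output shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a metadata store; not the GVRS implementation. */
class ToyMetadataStore {
    // For each name, the stored text values ordered by record ID
    private final Map<String, TreeMap<Integer, String>> records = new HashMap<>();

    /** Stores text under the name, assigns the next record ID, and returns it. */
    int write(String name, String text) {
        TreeMap<Integer, String> byId = records.computeIfAbsent(name, k -> new TreeMap<>());
        int id = byId.isEmpty() ? 1 : byId.lastKey() + 1; // assumed numbering rule
        byId.put(id, text);
        return id;
    }

    /** Returns all values stored under the name; empty if there are none. */
    List<String> read(String name) {
        TreeMap<Integer, String> byId = records.get(name);
        return byId == null ? new ArrayList<>() : new ArrayList<>(byId.values());
    }

    /** Returns the single value for the name and record ID, or null if absent. */
    String read(String name, int recordId) {
        TreeMap<Integer, String> byId = records.get(name);
        return byId == null ? null : byId.get(recordId);
    }

    public static void main(String[] args) {
        ToyMetadataStore store = new ToyMetadataStore();
        store.write("Copyright", "This data product is in the public domain.");
        store.write("Copyright", "Este produto de dados é de domínio público.");
        System.out.println(store.read("Copyright").size()); // 2
        System.out.println(store.read("Copyright", 2));     // the Portuguese notice
    }
}
```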

Data type and content

So far, the metadata objects that I have used in the examples have all been of type String. But the GvrsMetadata class supports a number of numerical data types. It also provides an option for storing an array of bytes that can be used in whatever manner an application requires.

The metadata API design follows a strong-typing model. The data type for a GvrsMetadata instance is specified when it is constructed, using the GvrsMetadataType enumeration. A family of accessor methods is provided for each supported data type. Once a GvrsMetadata object is initialized, it will only accept data of the kind specified in the constructor. A few of the GvrsMetadata accessor methods provide the capability to cast from one data type to another. These are supplied when there is a well-defined mapping between the two types.

Here's a code example that shows how to move data of a particular type in and out of metadata objects.

    GvrsMetadata example1 = new GvrsMetadata("IntExample1", GvrsMetadataType.INTEGER);
    GvrsMetadata example2 = new GvrsMetadata("IntExample2", GvrsMetadataType.INTEGER);
    int[] iArray = new int[]{1, 2, 3};
    example1.setInteger(8675309);
    example2.setIntegers(iArray);

    int[] result1 = example1.getIntegers(); // returns array of length 1
    int[] result2 = example2.getIntegers(); // returns array of length 3
    int result1a  = example1.getInteger();  // returns integer value 8675309
    int result2a  = example2.getInteger();  // returns integer value 1

    gvrs.writeMetadata(example1);
    gvrs.writeMetadata(example2);

The GVRS Javadoc gives more information about metadata-related data types and operations.
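
Since the content of a metadata element is ultimately a byte array, it may help to see how an integer array could be packed into bytes and recovered. The sketch below uses the standard java.nio.ByteBuffer class. The choice of little-endian byte order and the class name IntContentCodec are assumptions made for illustration; the sketch does not claim to reproduce the byte layout that GVRS actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative packing of int values into bytes; the layout is an assumption. */
class IntContentCodec {

    /** Packs each int into four bytes, least-significant byte first. */
    static byte[] encode(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    /** Recovers the int values from a byte array produced by encode(). */
    static int[] decode(byte[] content) {
        ByteBuffer buffer = ByteBuffer.wrap(content).order(ByteOrder.LITTLE_ENDIAN);
        int[] values = new int[content.length / Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] content = encode(new int[]{1, 2, 3});
        System.out.println(content.length);         // 12
        System.out.println(decode(content).length); // 3
    }
}
```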

Standardizing metadata names

When working with the GvrsMetadata class, application developers should be careful about how names are specified. Because GVRS allows developers to name metadata in any way that they choose, it is easy for typographic errors to result in a mismatch between applications. For example, consider the following code fragment:

    gvrs.writeMetadata("ExampleName", "Test");
    List<GvrsMetadata> emptyList = gvrs.readMetadata("exampleName");

Because names are case-sensitive, the GVRS API will not match the metadata for "ExampleName". The read operation will return an empty list.

In practice, mistakes like this one are easy to make. To reduce the likelihood of errors and incompatibilities between different applications, the GVRS API includes an enumerated type that defines a handful of common metadata names. The enumerated type is named GvrsMnc for "Metadata Naming Convention". It provides a name string and data-type definition for each enumerated value. It also includes a small set of convenience methods for creating new instances of GvrsMetadata objects from enumeration values. Examples are shown below:

    // Use the enumeration to look up the name to be used for copyrights
    String copyrightNameString = GvrsMnc.Copyright.name();

    // Write the copyright using either the name string or the enumeration value
    gvrs.writeMetadata(copyrightNameString, "Public domain.");
    gvrs.writeMetadata(GvrsMnc.Copyright,   "Public domain.");

    // Use the enumeration value to create a new instance of a metadata object.
    GvrsMetadata copyrightMetadata = GvrsMnc.Copyright.newInstance();
    copyrightMetadata.setString("This data product is in the public domain.");
    gvrs.writeMetadata(copyrightMetadata);

Finally, here's an example of some code that loops through the various values defined by the GvrsMnc enumeration and prints their content. Some of the pre-defined enumeration values provide descriptions, though that feature is optional.

    for(GvrsMnc mncValue :  GvrsMnc.values()){
      System.out.format("%-25s %-15s %s%n",
        mncValue.name(),
        mncValue.getDataType(),
        mncValue.getDescription());
    }

And here are the results:

    Name                      Type            Description 
    ---------------------     ----------      ------------------------------------------------------
    Author                    STRING          The person or organization that created a data product.
    Copyright                 STRING          
    TermsOfUse                STRING          
    Disclaimer                STRING          
    TIFF                      UNSPECIFIED     Tagged Image File Format (Tag specification)
    WKT                       STRING          Well-Known Text (map specification)
    GvrsJavaCodecs            ASCII           Classpaths for codecs (Java only)
    GvrsCompressionCodecs     ASCII           Identification key for compression codecs (all programming languages)
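
Applications that need names outside the standardized set could apply the same pattern to their own names. The sketch below is one way to do that, using a plain Java enumeration. The enumeration AppMetadataName, its values, and the type labels are all hypothetical and are not part of the GVRS API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical application-defined names, modeled on the GvrsMnc idea. */
enum AppMetadataName {
    ProcessingHistory("STRING"),
    SourceDataUrl("STRING"),
    GridResolutionMeters("DOUBLE");

    private final String typeLabel;

    AppMetadataName(String typeLabel) {
        this.typeLabel = typeLabel;
    }

    /** Returns a label for the kind of data the application stores under this name. */
    String getTypeLabel() {
        return typeLabel;
    }
}

class AppMetadataNameDemo {
    public static void main(String[] args) {
        // A map with exact-match string keys stands in for a metadata store.
        Map<String, String> store = new HashMap<>();
        store.put(AppMetadataName.ProcessingHistory.name(), "Resampled to a coarser grid");

        // Every caller that uses the enumeration spells the name the same way.
        System.out.println(store.get(AppMetadataName.ProcessingHistory.name()));

        // A hand-typed name with different capitalization finds nothing.
        System.out.println(store.get("processingHistory")); // null
    }
}
```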

Future work on standardization

At this time, the set of standardized metadata names is quite small. It is easy to envision a larger set of specifications. However, creating a good standard really requires the participation of a well-informed community of users. So the expansion of the standardized name set will have to wait until such time as the GVRS user base grows to the point where it is feasible to do so.

Using metadata to create a georeferenced image

Having covered the GVRS metadata classes and methods, we are now ready to discuss how GVRS's support of metadata allowed us to create the Google Earth image that I showed at the beginning of this article. While Google Earth does not support GVRS data products, it does support georeferenced TIFF formatted images. So I used GVRS and modules from the Apache Commons Imaging software library to produce a GeoTIFF that could be displayed in Google Earth.

The Tagged Image File Format (TIFF) is an older image format that continues to see wide use in computerized mapping applications because it provides excellent support for geospatial metadata. In fact, we borrowed some of the ideas for our own metadata implementation from the TIFF standard. A TIFF file consists of a number of metadata-related entities called "tags" that are closely related to GvrsMetadata. I extracted the geospatial metadata from these tags and transcribed it to a GVRS data file.

For this example, I started with a GeoTIFF that provided a raster grid of elevation data over the area of Nantes, France. The file contained georeferenced metadata that would have allowed it to be displayed on Google Earth, except that Google Earth does not implement logic for mapping raw elevations to colors. So what I needed was a way to assign color and shading to the source data to produce a georeferenced image. To do so, I took advantage of tools included in the Gridfour software library and the GVRS API.

To produce the image, I used a four-step process:

  1. Create a GVRS file using the elevation data and, most importantly, the geospatial metadata from the source TIFF file
  2. Create a shaded relief image. Use the GvrsInterpolatorBSpline to compute elevations and surface normals (illumination angles) across the data raster. Use the ColorPaletteTable classes to assign color by elevation.
  3. Use the results from the shaded-relief rendering to create a new TIFF file. Transcribe the metadata from the GVRS file to populate the geospatial metadata in the TIFF file.
  4. Import the resulting TIFF file into Google Earth for plotting.
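
The shading idea in step 2 can be sketched in a few lines. The code below is a generic illustration and not the method used to produce the figure: it estimates the surface normal with simple central differences rather than the GvrsInterpolatorBSpline, and it applies basic Lambertian (cosine) shading. The class name HillshadeSketch, the grid values, and the light direction are all illustrative assumptions.

```java
/** Generic shaded-relief sketch; not the Gridfour implementation. */
class HillshadeSketch {

    /**
     * Estimates the unit surface normal at an interior grid cell using
     * central differences. Columns are treated as x and rows as y.
     */
    static double[] surfaceNormal(double[][] z, int row, int col, double spacing) {
        double dzdx = (z[row][col + 1] - z[row][col - 1]) / (2 * spacing);
        double dzdy = (z[row + 1][col] - z[row - 1][col]) / (2 * spacing);
        double length = Math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0);
        return new double[]{-dzdx / length, -dzdy / length, 1.0 / length};
    }

    /** Lambertian shade in the range 0 to 1 for a unit normal and unit light vector. */
    static double shade(double[] normal, double[] light) {
        double dot = normal[0] * light[0] + normal[1] * light[1] + normal[2] * light[2];
        return Math.max(0.0, dot);
    }

    public static void main(String[] args) {
        double[][] flat = {
            {10, 10, 10},
            {10, 10, 10},
            {10, 10, 10}
        };
        double[] overhead = {0, 0, 1}; // light shining straight down
        double[] n = surfaceNormal(flat, 1, 1, 1.0);
        System.out.println(shade(n, overhead)); // 1.0 for a flat surface lit from above
    }
}
```

The resulting shade value would then be used to scale the brightness of the color chosen for the elevation at that cell.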

The above example is, admittedly, a bit contrived. But it illustrates how metadata can be used to integrate GVRS products into other data systems.

And, finally, the shaded-relief example also gives us an opportunity to look at how we can use the record-ID element of the GvrsMetadata class. While GvrsMetadata objects are identified by name and a numeric ID, TIFF tags use just a numeric ID. The TIFF specification includes quite a few tag definitions, but for this article we will limit ourselves to just one: TIFF tag number 33922, the "Model Tie Point" tag. Tag 33922 gives a set of floating-point parameters for tying pixels to real-valued Cartesian coordinates. To transcribe the Model Tie Point tag to a GvrsMetadata element, we simply declare a GvrsMetadata object with the name "TIFF" and the record ID 33922.

    GvrsMetadata modelTiePoint = new GvrsMetadata("TIFF", 33922, GvrsMetadataType.DOUBLE);
    modelTiePoint.setDescription("TIFF Tag 33922: Model tie point");

This variation of the constructor for GvrsMetadata takes a record ID as its second argument. We supply the TIFF tag ID as the record ID. Later, when we wish to produce an output GeoTIFF file from our GVRS product, we can use the record ID from the metadata to specify a tag ID for the TIFF file.

The TIFF Model Tiepoint Tag takes six floating-point parameters. These give the pixel coordinates for an arbitrary reference point (the "tie point") on the image and a set of real-valued "model" coordinates associated with that reference point. The model coordinates can be either geographic coordinates or, as in this case, map projection coordinates. So we had:

  1. Pixel coordinates (0, 0, 0) for a "tie point" at the upper-left corner of the source image.
  2. Map projection coordinates (273987.5, 2291012.5, 0) for the tie point.

We added the parameters to the metadata using the following:

    double[] mtp = new double[]{0, 0, 0, 273987.5, 2291012.5, 0};
    modelTiePoint.setDoubles(mtp);

Completing the georeferencing operation did require a few more metadata elements. But the Model Tie Point example will suffice to illustrate the idea of how GVRS metadata can be specified to preserve georeferencing information.
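
To show how a tie point ties pixels to model coordinates, the sketch below maps a pixel position to map projection coordinates. It relies on the standard GeoTIFF relationship between the Model Tie Point tag and the Model Pixel Scale tag (TIFF tag 33550), assuming a north-up image in which the y coordinate decreases as the row number increases. The pixel scale of 25 units per pixel is a hypothetical value chosen for illustration and is not taken from the Nantes data set. The class name TiePointMapping is also hypothetical.

```java
/** Maps pixel positions to model coordinates for a north-up image. */
class TiePointMapping {

    /**
     * @param tiePoint six values: pixel (i, j, k) followed by model (x, y, z)
     * @param scaleX model units per pixel along a row
     * @param scaleY model units per pixel along a column
     * @return model coordinates {x, y} for the pixel at (col, row)
     */
    static double[] pixelToModel(
        double[] tiePoint, double scaleX, double scaleY, double col, double row) {
        double x = tiePoint[3] + (col - tiePoint[0]) * scaleX;
        double y = tiePoint[4] - (row - tiePoint[1]) * scaleY; // y decreases down the image
        return new double[]{x, y};
    }

    public static void main(String[] args) {
        double[] mtp = {0, 0, 0, 273987.5, 2291012.5, 0};
        double scale = 25.0; // hypothetical pixel scale, for illustration only
        double[] corner = pixelToModel(mtp, scale, scale, 0, 0);
        double[] inner = pixelToModel(mtp, scale, scale, 10, 4);
        System.out.println(corner[0] + ", " + corner[1]); // 273987.5, 2291012.5
        System.out.println(inner[0] + ", " + inner[1]);   // 274237.5, 2290912.5
    }
}
```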

Incidentally, the algorithms we used for illuminating our shaded-relief image are described in detail in the two-part article "Elevation Data from Cloud-Optimized GeoTIFFs". The article also provides a lot of information about how GeoTIFF tags are specified and can be interpreted.

Note: The code that was used to process the data is not yet ready for distribution.
When it is complete, we will provide it as part of the Gridfour "Demo" module.

Conclusion

This article concludes our five-part "Jump Start" series describing GVRS concepts. Future articles will delve into algorithms and software coding techniques that may be useful to developers working with the Gridfour library in general and the GVRS API in particular.

I finish this article with a reminder that it is not always easy to know what developers need in order to use an API. So your feedback and questions will go a long way toward letting us know the best way to document our software. If you wish, feel free to post comments on our Gridfour Discussions Page or contact us directly.