A GVRS FAQ - gwlucastrig/gridfour GitHub Wiki

What is GVRS?

The Gridfour Virtual Raster Store (GVRS) is a software tool that uses data files to manage raster (grid) data products. It is particularly useful for grids that are too large to be stored completely in memory.

What does GVRS do?

GVRS implements a fast data cache that automatically moves raster data to and from memory and files in a seamless manner.

  1. Because every GVRS raster product is backed by a data file, only part of the grid needs to be kept in memory at any time.

  2. Data files can be retained between different program-execution sessions or shared between different applications.

  3. GVRS implements custom data-compression techniques that significantly reduce the size of files needed to store data.

  4. GVRS is designed for simplicity of implementation and to make efficient use of resources. These features make GVRS suitable for operation on single-board computers (such as the Raspberry Pi) and other small processors.

  5. GVRS is implemented as a software library that can be integrated into applications. It has a small code footprint, is sparing in its use of memory, and does not introduce secondary dependencies into a code base.
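To make the caching idea in the list above concrete, here is a minimal sketch of a file-backed tile cache. It is not the GVRS API or its file format: the class `TileCacheSketch`, its methods, the 16 x 16 tile size, and the raw on-disk layout are all hypothetical, chosen only to show how a bounded set of tiles can stay in memory while the rest of the grid lives in a file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical file-backed tile cache; an illustration only, not the GVRS API. */
final class TileCacheSketch implements AutoCloseable {
  private static final int TILE_SIZE = 16; // each tile covers 16 x 16 cells
  private static final int TILE_BYTES = TILE_SIZE * TILE_SIZE * Integer.BYTES;

  private final int nTileCols;
  private final int maxTilesInMemory;
  private final RandomAccessFile file;
  // Access-ordered map: iteration starts at the least-recently-used tile.
  private final LinkedHashMap<Integer, int[]> tiles = new LinkedHashMap<>(16, 0.75f, true);

  TileCacheSketch(File f, int nRows, int nCols, int maxTilesInMemory) throws IOException {
    int nTileRows = (nRows + TILE_SIZE - 1) / TILE_SIZE;
    this.nTileCols = (nCols + TILE_SIZE - 1) / TILE_SIZE;
    this.maxTilesInMemory = maxTilesInMemory;
    this.file = new RandomAccessFile(f, "rw");
    // Reserve file space for every tile; cells never written hold unspecified values.
    this.file.setLength((long) nTileRows * nTileCols * TILE_BYTES);
  }

  int read(int row, int col) throws IOException {
    return tile(row, col)[cellIndex(row, col)];
  }

  void write(int row, int col, int value) throws IOException {
    tile(row, col)[cellIndex(row, col)] = value;
  }

  private static int cellIndex(int row, int col) {
    return (row % TILE_SIZE) * TILE_SIZE + (col % TILE_SIZE);
  }

  // Return the tile containing the cell, loading it from the file if needed.
  private int[] tile(int row, int col) throws IOException {
    int index = (row / TILE_SIZE) * nTileCols + (col / TILE_SIZE);
    int[] t = tiles.get(index); // get() also marks the tile as recently used
    if (t == null) {
      if (tiles.size() >= maxTilesInMemory) {
        evictLeastRecentlyUsed();
      }
      t = load(index);
      tiles.put(index, t);
    }
    return t;
  }

  private void evictLeastRecentlyUsed() throws IOException {
    Integer oldest = tiles.keySet().iterator().next();
    store(oldest, tiles.remove(oldest)); // simplification: written even if unchanged
  }

  private int[] load(int index) throws IOException {
    byte[] bytes = new byte[TILE_BYTES];
    file.seek((long) index * TILE_BYTES);
    file.readFully(bytes);
    int[] t = new int[TILE_SIZE * TILE_SIZE];
    ByteBuffer.wrap(bytes).asIntBuffer().get(t);
    return t;
  }

  private void store(int index, int[] t) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(TILE_BYTES);
    buffer.asIntBuffer().put(t);
    file.seek((long) index * TILE_BYTES);
    file.write(buffer.array());
  }

  @Override
  public void close() throws IOException {
    for (Map.Entry<Integer, int[]> e : tiles.entrySet()) {
      store(e.getKey(), e.getValue());
    }
    file.close();
  }

  /** Demo: touch three tiles through a two-tile cache, then read early values back. */
  static int[] demo() {
    try {
      File f = File.createTempFile("tile-cache-sketch", ".bin");
      f.deleteOnExit();
      try (TileCacheSketch cache = new TileCacheSketch(f, 64, 64, 2)) {
        cache.write(0, 0, 42);
        cache.write(0, 20, 7);
        cache.write(20, 0, 9); // third tile: evicts the tile holding (0, 0)
        return new int[] {cache.read(0, 0), cache.read(0, 20)};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In the demo, the cache never holds more than two tiles, yet the values written to the first two tiles are still read back correctly because evicted tiles are written to the file and reloaded on demand. A production cache would also track which tiles were modified so that unchanged tiles are not rewritten.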

How can I use GVRS?

We thought of a few potential uses for GVRS. We hope that our users will find many more:

  1. GVRS's efficient caching mechanism and fast performance make it suitable for server-side applications that analyze data and create raster data products.

  2. GVRS data compression allows it to compactly store large data sets in long-term archives. Data compression is also useful when data is transmitted across bandwidth-limited communication channels.

  3. The GVRS API's light processing load makes it suitable for deployment on autonomous platforms such as airborne survey platforms and Unmanned Underwater Vehicles (UUVs).

  4. The GVRS open architecture can assist you in developing and testing your own data compression algorithms.

What's the best way to get started with GVRS?

We've written a series of "jump start" articles that show how to use the GVRS library. The Jump Start articles cover the basics of using our software and provide details to help you get the best performance when integrating GVRS in your own applications.

Why did you write the GVRS API?

We started the GVRS project because we needed a good test platform for investigating new ways of performing lossless data compression on numerical raster products. Although there were lots of platforms for working with raster data, none of them provided straightforward ways to integrate new data compression logic with the existing code.

At the beginning, we saw GVRS as an experimental tool. Later on, we realized that there could be a niche for a data format and API that struck a balance between simplicity and performance. So that became our motivation for continuing with the project. Future development will depend on finding a user base and identifying interesting enhancements.

How well does the GVRS data compression work?

GVRS implements data compression for two kinds of data: integers and floating-point values. In both cases, the compression is lossless. In other words, the extracted output is an exact match for the input. The compression is also implemented as an on-demand feature. Data is compressed and decompressed a piece at a time, so an application does not have to decompress the entire data set in order to access values at a single location.
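The piece-at-a-time idea can be sketched in a few lines of Java. The example below is an assumption-laden illustration, not GVRS code: it uses the JDK's Deflate codec (`java.util.zip`) as a stand-in for the GVRS compressors, and the class `TileCompressionSketch` and its method names are hypothetical. The point is only that when each tile is compressed independently, reading one sample requires decompressing one tile.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: every tile is compressed on its own, so one tile can be read alone. */
final class TileCompressionSketch {

  /** Compress one tile of integer samples with Deflate (a stand-in codec). */
  static byte[] compressTile(int[] samples) {
    ByteBuffer raw = ByteBuffer.allocate(samples.length * Integer.BYTES);
    raw.asIntBuffer().put(samples);
    Deflater deflater = new Deflater();
    deflater.setInput(raw.array());
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Decompress one tile; no other tile is touched. */
  static int[] decompressTile(byte[] packed, int nSamples) {
    byte[] raw = new byte[nSamples * Integer.BYTES];
    Inflater inflater = new Inflater();
    inflater.setInput(packed);
    try {
      int offset = 0;
      while (offset < raw.length && !inflater.finished()) {
        int n = inflater.inflate(raw, offset, raw.length - offset);
        if (n == 0) {
          break; // input exhausted early or a preset dictionary is required
        }
        offset += n;
      }
      if (offset != raw.length) {
        throw new IllegalArgumentException("compressed tile is shorter than expected");
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt compressed tile", e);
    } finally {
      inflater.end();
    }
    int[] samples = new int[nSamples];
    ByteBuffer.wrap(raw).asIntBuffer().get(samples);
    return samples;
  }

  /** Read a single sample by decompressing only the tile that contains it. */
  static int readSample(byte[][] packedTiles, int tileLength, int sampleIndex) {
    int[] tile = decompressTile(packedTiles[sampleIndex / tileLength], tileLength);
    return tile[sampleIndex % tileLength];
  }
}
```

Because the round trip restores every bit of the input, the scheme is lossless in the same sense described above, whatever codec is used inside each tile.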

As to how well it works... Integer data tends to compress quite well. In our tests with data from the Shuttle Radar Topography Mission (SRTM), the GVRS compression logic achieved between 1.25 and 2.3 bits per sample, depending on the local terrain for the area studied. For more detail, see Lossless Compression for Raster Data Using Optimal Predictors.
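For readers unfamiliar with predictor-based compression, the general idea can be shown with the simplest possible predictor: guess that each sample equals the one before it and store only the error. This sketch is an assumption for illustration, with hypothetical names (`PredictorSketch`, `toResiduals`, `fromResiduals`); the predictors GVRS actually uses are described in the article cited above.

```java
/** Hypothetical sketch of predictor-based coding using a previous-sample predictor. */
final class PredictorSketch {

  /** Replace each sample with its difference from the prediction (the previous sample). */
  static int[] toResiduals(int[] samples) {
    int[] residuals = new int[samples.length];
    int prediction = 0; // nothing precedes the first sample, so predict zero
    for (int i = 0; i < samples.length; i++) {
      residuals[i] = samples[i] - prediction;
      prediction = samples[i];
    }
    return residuals;
  }

  /** Invert toResiduals exactly, which is what keeps the scheme lossless. */
  static int[] fromResiduals(int[] residuals) {
    int[] samples = new int[residuals.length];
    int prediction = 0;
    for (int i = 0; i < residuals.length; i++) {
      samples[i] = prediction + residuals[i];
      prediction = samples[i];
    }
    return samples;
  }
}
```

For a run of elevations such as 1204, 1206, 1209, 1209, 1207, the residuals are 1204, 2, 3, 0, -2. After the first value, the numbers are small and cluster near zero, which is why an entropy coder can represent smoothly varying terrain in only a few bits per sample.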

Floating-point data presents a challenge for most data compression implementations. Let's take a look at how GVRS compares to two other compressed data formats: NetCDF and TIFF. GEBCO_2019 is a global-scale elevation and bathymetry data set that is provided in NetCDF format. It contains over 3 billion sample points. The San Jacinto data set is provided as a TIFF file (USGS_13_n34w117.tif) giving a high-resolution data set consisting of about 155 million points. Both the NetCDF and TIFF formats implement custom data compression for floating point data. Here's how they compare:

| Product | Standard format (bits/symbol) | GVRS (bits/symbol) |
| --- | --- | --- |
| GEBCO_2019 (NetCDF) | 25.13 | 15.35 |
| San Jacinto (TIFF) | 24.09 | 16.42 |

As you can see, the GVRS approach offers an improvement over both TIFF and NetCDF. Our technique is documented at Lossless Compression for Floating Point Data. Of course, NetCDF is an actively supported data format. So, it wouldn't surprise us if they were able to leverage our work to improve their own results in the near future.
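The relative savings implied by the table can be worked out directly from its two columns. The short Java sketch below does the arithmetic; the class and method names are hypothetical, and the only inputs are the bits/symbol figures quoted in the table.

```java
/** Illustrative arithmetic on the bits/symbol figures quoted in the table. */
final class BitsPerSymbolSketch {

  /** Percent size reduction when moving from one bits/symbol rate to another. */
  static double percentReduction(double standardBits, double gvrsBits) {
    return 100.0 * (1.0 - gvrsBits / standardBits);
  }

  public static void main(String[] args) {
    System.out.printf("GEBCO_2019:  %.1f%% smaller%n", percentReduction(25.13, 15.35));
    System.out.printf("San Jacinto: %.1f%% smaller%n", percentReduction(24.09, 16.42));
  }
}
```

By this measure, the GVRS encoding is roughly 39 percent smaller than the NetCDF encoding for GEBCO_2019 and roughly 32 percent smaller than the TIFF encoding for San Jacinto.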

Is GVRS fast?

We like to think so. You can read about our test results at GVRS Performance for Data Reading, Writing, and Compression.

Is GVRS intended to compete with NetCDF?

No. Not even close... NetCDF is a well-established, widely used raster data-exchange format that is well-funded and has a large user base and developer community. GVRS is a new project with a tiny user base and a small group of volunteer developers. More importantly, the two products tackle very different problems. NetCDF is primarily intended as a tool for distributing data. GVRS is primarily intended as a tool to assist in its production.

The other difference between the two software products is complexity. NetCDF implements a broad set of requirements. But providing so many features leads to a more complex API and file format. GVRS attempts to maintain simplicity by focusing on a core set of requirements and streamlining its API. That being said, we do not mean to imply that there is anything wrong with the NetCDF implementation. The Java source code for NetCDF is solid and well executed. In fact, NetCDF set the bar pretty high for the GVRS implementation. We are doing our best to live up to that standard.

Is GVRS a GIS Tool?

GVRS is a general purpose raster data processing utility. It is not a Geographic Information System (GIS) tool.

A lot of the examples we use on the Gridfour software project website are based on geophysical or environmental data. But that's mainly a matter of convenience. Geographic data is easy to find and easy for readers to relate to. So it provides an effective subject for illustrating some of our software concepts.

Of course, GVRS would be useful for geospatial processing as part of applications that combine it with conventional GIS libraries. And it would be a natural candidate for inclusion in a geodatabase or other GIS implementation.

Are GVRS products portable across different operating systems?

Yes. The GVRS file format specification deliberately avoids platform-specific features.

Why is GVRS written in Java?

Well, we had to start somewhere. GVRS is a new product and Java is just the first language we chose for development. We hope that eventually GVRS will be ported to other languages such as C/C++, C#, and Rust. And we'd really like to see a Python package compatible with NumPy.

From the start, we designed GVRS so that it would be easy to port to other languages or development environments. To that end, our software and data format aims for simplicity, consistency, and predictability. We also worked to keep our feature set small and nimble, so that translating the API would be a manageable task.

Why is it important to port GVRS to other languages?

First off, we're not trying to build an empire. Honest!

The idea behind porting GVRS to other languages is that it is the best way to ensure that data stored in GVRS files remains accessible to its users and authors, even if they are not running in a Java environment. When somebody invests a lot of effort into creating a data set, they should expect to be able to use that data for years to come. Porting GVRS to additional languages is one way of preserving data products from accidental obsolescence.

Is the GVRS file format documented?

Yes. A PDF document giving a full description of the GVRS file format can be found under the Gridfour Project Notes page. For anyone interested in working on the GVRS API or implementing new code to read GVRS files, we've also posted a suite of test files as part of the Resources collection in our standard software distribution.

How do you pronounce "GVRS"?

In English, you may find it convenient to pronounce GVRS as "givers". It will be interesting to hear what speakers of other languages come up with.

What was the Gem93 project?

We mentioned Gem93 on our main page. You may be wondering what it was all about. Gem93 was a software project that anticipated many of the features that are incorporated into GVRS. It implemented predictor-based data compression, tile-indexing, dynamic file-space management, and a data caching mechanism that resembles the one used in GVRS today. Although the state of the art has advanced in the 30 years since Gem93 was created, it is still in use at a few sites today. So that's a good legacy. And we will end this FAQ by acknowledging our roots. To Rick, Laura, Willi, and John... Thank you for the work you did then and the inspiration you've given to the GVRS project now.