GVRS Using Multiple Threads to Speed Processing - gwlucastrig/gridfour GitHub Wiki

Introduction

As of release 1.0.4, the Gridfour software library supports the use of multi-threading techniques to expedite the reading and writing of compressed data.

These notes describe the multi-threaded techniques that the Gridfour API uses for writing and reading data to and from GVRS files. They include code examples and enough background information to permit an application developer to use the multi-threaded options effectively.

Reading data using multiple threads

Throughout these wiki pages, we use the ETOPO1 global elevation and ocean depth data set as an example product for describing the Gridfour API. ETOPO1 consists of a grid of 233 million data points. When stored without compression in the Gridfour Virtual Raster Store (GVRS) file format, it requires 456 megabytes of file space. When stored with compression, the ETOPO1 data set requires only 106 megabytes. But that reduction in storage space comes with a cost. The Gridfour API can read the entire uncompressed version of ETOPO1 in 0.28 seconds. Reading the compressed version requires 3.09 seconds. So 90 percent of the time required to read the compressed version of the data is due to the extra processing required to decompress it.

Clearly, we’d like to speed up file read operations. Multi-threading offers a way to do just that. By sharing the decompression operations across multiple concurrent processes, multi-threading techniques significantly reduce the amount of time required to access data. In fact, the Gridfour multi-threaded implementation allows us to read the compressed ETOPO1 file in just 1.88 seconds. [1]

How multi-threaded reading works

A GVRS file is organized into tiles. When processing large raster data sets, the overall grid is divided into smaller sections, or "tiles", which can be read from a data set independently. The figure below shows one such tiling scheme. [2]

ten-degree tile scheme

When an application needs a single data value, the Gridfour API identifies the tile that contains the point of interest and reads it from the file. For a large number of read operations, multiple tiles may be required. And naturally, each tile read operation carries two cost factors: the time needs to read the encoded information from its data file, and the time needed to decompress that information so that it can be used. To avoid reading the same tile multiple times, the Gridfour API maintains an in-memory cache of decompressed tiles that are retained when needed and discarded when no longer required.

Ordinarily, the time needed to read those tiles is a sequential combination of file access and decompression. But, if the application accesses tiles in a predictable order, the GVRS library can anticipate the next tile that the application will require. Using that prediction, it decompresses the content of the anticipated tile in parallel with its main processing operations. Since the decompression happens in a parallel thread, it does not contribute to the overall time required to read the data. The anticipated tile is stored in the tile cache and kept ready until needed.

In practice, predictions are not always perfect. But, even if a predicted tile is not used immediately, it will be added to the GVRS cache and available for future use. As long as there is a reasonably high probability that it will be accessed before it ages out of the cache, having it available represents a net gain in performance.

The following code snippet shows an example of how an application could open a GVRS file for reading and enable multi-threaded processing:

    File inputFile = new File("ExampleFileWithCompression.gvrs");
    try(GvrsFile gvrsFile = new GvrsFile(inputFile,  "r"))
    {
        gvrsFile.setMultiThreadingEnabled(true);
        // Other operations as defined by application
    }catch(IOException ioex){
        // Action as required by application
    } 

Limitations

There are, of course, situations in which future requirements for tiles cannot be predicted. If an application accesses the grid in random order or across multiple scattered locations, predictions do not help. The GVRS API is designed so that the cost of unsuccessful predictions is small. Also, practical considerations currently require that all I/O operations for file-read access be performed in the primary GVRS thread (the thread from which the API is invoked). Even so, the multi-threaded approach is often worth consideration.

Writing data using multiple threads

When writing data to a GVRS file, the largest contributor to the overall time to process the data is the compression functions. This time can reduced by using multiple threads to conduct different processing operations in parallel.

The Gridfour library implements three separate compressors:

  1. Huffman coding
  2. Deflate (the algorithm used in "zip" files)
  3. Lewis-Smith Optimal Predictors (LSOP).

All three compressors can reduce the size of a raster data set. In general, Deflate works better than Huffman, and LSOP works better than Deflate. Also, in general, the more powerful compressors require more processing time than their weaker counterparts. But, depending on the nature of the data, there are times when that extra processing does not necessarily yield better results. Sometimes the nature of the data is such that the more effective compressor turns out to be Huffman, or Deflate, rather than LSOP.

Because there is no reliable way to predict which compressor will work best, the GVRS API tries all three. When storing data, it compares the results for each of the compressors and and picks the one that produces the smallest output. Thus the processing-time cost of data compression is the aggregate cost of all three compressors.

Using the optional multi-threaded approach, the GVRS API attempts to reduce the time required to compress data by running the three compressors in parallel. The table below gives times-to-compress results from tests conducted using the PackageData.java application which is included in the Gridfour software distribution. The tests were run on a medium-powered laptop computer with a fast Solid-State Drive (SSD) and multiple CPU's. Both the ETOPO1 and GEBCO 2020 products give worldwide surface elevation and ocean bathymetry data. GEBCO 2020 has a higher resolution and contains a substantially larger number of grid cells than ETOPO1, thus it requires more processing time.

In both cases, the use of multi-threading reduces the time to process by nearly 50 percent.

Product Grid Size Single-Thread Multi-Thread
ETOPO1 10800 x 21600 120 seconds 61 seconds
GEBCO 2020 43200 x 86400 1985 seconds 1056 seconds

Activating the multi-threading option

By default, the multi-threaded compression option is not enabled. To activate it, an application simply switches on the option after a GvrsFile is constructed. An example is shown in the code fragment below:

    GvrsFile gvrsFile = new GvrsFile(outputFile,  gvrsFileSpecification);
    gvrsFile.setMultiThreadingEnabled(true); 

How different compressors affect time-to-process

The Gridfour compressors tend to be CPU-intensive processes. In fact, when we tested the time to store the ETOPO1 data in uncompressed format running on the same machine that produced the results above, the data set required only 7.5 seconds for processing. So almost 94 percent of the run time for ETOPO1 was spent compressing the data.

Running the compressors through a profiler (from the Netbeans IDE), we recorded the following times for processing ETOPO1:

Compressor Time (sec)
Huffman 21.3
Deflate 38.8
LSOP 46.7

When both ETOPO1 and GEBCO 2020 were processed in single-thread mode, the overall contribution of data compression to the total time required to process the data was the sum of the individual times for processing. When the three compressors ran in parallel, the two faster compressors (Huffman and Deflate) would finish before LSOP, but the compression process could not complete until all three finished. Thus, the total time to process was driven by the time required for the longest-duration processor.

Coordinating a group of threads

As noted above, the main compressor module needs to wait for all three compressors to finish their work before it can move forward with its own processing. Thus the compression module implements a kind of "barrier" point where it pauses until the individual compressors are finished. The Java standard API features two classes that provide execution barriers: CyclicBarrier, and Phaser. Either class would have worked for our compression threads. But, for the Gridfour software library, we also wanted to take advantage of the capabilities of a Java thread pool. In particular, we wanted to be able to reuse our compression threads and avoid the overhead of creating and terminating a large number of threads. So we implemented a class called TaskGroupExecutor that extends the Java ThreadPoolExecutor and adds logic for waiting for a group of tasks to complete.

The use of the TaskGroupExecutor is illustrated in the code fragment below:

    System.out.println("Begin Test");
    
    TaskGroupExecutor exec = new TaskGroupExecutor(4);  // Allocates 4 threads

    exec.groupExecute(new TestRunnable("alpha  ", 13)); // The test Runnable prints start message
    exec.groupExecute(new TestRunnable("beta   ", 5));  // including its name and its thread identifier,
    exec.groupExecute(new TestRunnable("gamma  ", 3));  // then sleeps the designated time, and
    exec.groupExecute(new TestRunnable("delta  ", 2));  // finally prints an end message.

    exec.groupWaitUntilDone();
    
    System.out.println("Completion of Test");
    exec.shutdown();

The call to groupExecute() indicates that the Runnable being executed is to be treated as part of a group of tasks. The groupWaitUntiDone() method simply waits until all tasks in the group have completed. The TaskGroupExecutor also implements logic for exception handling and determining the status of a processing operation. Like the Java ThreadPoolExecutor, the TaskGroupExecutor can be used and reused many times until it is no longer needed. When an application is done using the executor, it can dispose of it through a call to shutdown().

The example code shown above prints the following text:

     Begin Test
     Start alpha   pool-1-thread-1
     Start delta   pool-1-thread-4
     Start gamma   pool-1-thread-3
     Start beta    pool-1-thread-2
     End   delta
     End   gamma
     End   beta
     End   alpha
     Completion of Test

The TaskGroupExecutor class is part of the public API in the Gridfour software library and can be used by any application that requires its functions. You may view the source code for the class at TaskGroupExecutor.java.

GVRS compressor classes are not required to be thread-safe.

When a GVRS file is opened for writing compressed data, a single instance of each compressor is created for internal use. If the application performs multiple write operations over the lifespan of a GVRS file object, the internal compressor instances are used over and over again. No new compressor objects are be constructed. But even though compressor objects can be used multiple times, GVRS never accesses the same compressor object simultaneously in different threads. So there is no requirement for GVRS data compressors to be thread-safe.

Future work for data compression

The time required to compress data is driven by the longest of the three compressor procedures, the LSOP implementation. The first step in LSOP compression is an analysis operation that processes a grid to compute prediction parameters. As with many grid-based operations, the analysis is easily divided into operations over a set of smaller grids that could be processed using parallel threads. This analysis is followed by a pair of conventional compression operations (Huffman and Deflate) that could also be conducted in parallel. So it is feasible to further reduce the time required by the LSOP compressor, and thus the overall time to process a GVRS data set. This idea will be the topic of future investigations.

To learn more about the Lewis-Smith Optimal Predictor (LSOP) compression technique, see our Application Note Lossless Compression for Raster Data using Optimal Predictors

A caveat

When using the multi-threaded options, it is imperative that the data files be closed through a proper call to the GvrsFile class's close() method. This practice is always advised when working with file-based API's because it disposes of any system resources (such as open file handles) that may be associated with the class instance. But when working with the GVRS multi-threaded options, it is additionally important because the GvrsFile class uses the close() method to terminate the background threads and ensure that they are not left running after the file is no longer needed.

In practice, we like to use the Java try-with-resources construct to ensure proper file closing.

    try(GvrsFile gvrsFile = new GvrsFile(inputFile,  "r"))
    {
        gvrsFile.setMultiThreadingEnabled(true);
        // Other operations as defined by application
    }catch(IOException ioex){
        // Action as required by application
    } 

When used in a try-with-resources block, the GVRS API guarantees that background threads will be terminated even if I/O based operations terminate improperly due to an unexpected I/O exception.

Notes

[1] Times for data reading operations are based on the "Tile Load" operation described in GVRS Performance for Data Reading, Writing, and Compression ). Timing data was collected on a mid-range laptop computer with a Solid State Drive (SSD) storage device.

[2] Although the picture shows a 10 degree tiling scheme, the actual ETOPO1 implementation uses a smaller tile size. The larger size was used for purposes of illustration.