Overview

GrowClust3D is a new-and-improved Julia implementation of GrowClust, a software package for the relative relocation of earthquake hypocenters. Just as in the original (Fortran90) implementation, GrowClust3D uses a clustering approach to relocate earthquake catalogs based on differential time data obtained (externally by the user) from waveform cross-correlation. The Julia release includes several new features that will likely be of interest to users, including:

the ability to relocate using travel time grids obtained from 3D models
the option to parallelize uncertainty quantification through multithreading or multiprocessing
the explicit use of map projections and station elevations during the relocation process
the removal of array size constraints, which are now determined on the fly
the refinement of several program defaults and definitions to improve algorithm robustness
and more yet to come!

Schematic workflow for GrowClust3D

The 3D relocation capabilities comes through integration and compatability with those produced by NonLinLoc (https://github.com/alomax/NonLinLoc), which allows for self-consistency between absolute and relative relocation estimates.

If you use GrowClust3D in your work, please take care to cite the following manuscript:

Daniel T. Trugman, Calum J. Chamberlain, Alexandros Savvaidis, Anthony Lomax; GrowClust3D.jl: A Julia Package for the Relative Relocation of Earthquake Hypocenters Using 3D Velocity Models. Seismological Research Letters 2022; 94 (1): 443–456. doi: https://doi.org/10.1785/0220220193

Installation

To use these codes, you will first need a Julia installation. GrowClust3D is a Julia package;if you are new to Julia, please take a look at the excellent documentation and tutorials to find out more:

While in an initial testing stage, GrowClust3D is an "unregistered" Julia package. However, the package can and its dependencies can still be installed using the Julia Pkg manager:

pkg> add https://github.com/dttrugman/GrowClust3D.jl

Then GrowClust3D can be activated within a Julia script just like any other package.

Once installed, please do run the test set to check your installation by running the suite of test problems:

pkg> test GrowClust3D

Note: The GrowClust3D software was finalized using Julia v1.8. It is recommended to use this version or later; older versions may work but there can be compatibility issues with the packages GrowClust3D depends on. For example, Proj.jl requires v1.6 or later at present writing.

Algorithm Control File

This a text file that is used to specify the file paths and run parameters for GrowClust3D problem. The convention is to denote the file with a ".inp" extension. There are several example .inp files with different configurations listed in the /examples directory of the GitHub repository.

Once you have a GrowClust3D script to run (e.g. my_example.jl) and an control file (e.g. example.inp), you run the program in the command line using syntax like:

julia my_example.jl example.inp

Note: Most users will want to copy and use the example .jl scripts from the example repository with little or no modification. The .inp files can then be used to tailor GrowClust3D to your dataset and specifications.

The control file needs to include the following information, in order. Comment lines are denoted with a leading "*" and are ignored, as are empty / blank lines. Most of the lines correspond to information analogous to the control files used by the original Fortran90 release. In numerical order, the lines should specify:

The input event list format.

This should be an integer between 1 and 3 that indicate how the input catalog columns are formatted. The default value is 1, this format is used for the cases presented in the /examples repository. The possible formats are:

         1 --> yr mon day hr min sec lat lon dep mag errh errz rms evid
         2 --> yr mon day hr min sec evid lat lon dep mag
         3 --> evid yr mon day hr min sec lat lon dep rms errh errz qmag

File path to the input event list.
Input station list format.

This should be an integer, 1 or 2. Longitude and latitude information should be in decimal degrees, the station name should not include spaces, and elevations should be in meters. If you select option 1, all stations will be assumed to be at 0 elevation. The two formats are:

1 -->sname lat lon
2 -->sname lat lon elev

File path to the input station list.
Cross-correlation data formats.

This line has two numbers. The first should be "1", denoting that the cross-correlation file is in a standardized text format (more on that below). The second number indicates the sign convention for the differential times data. "12" means differential times are the travel time from event 1 minus the travel time from event 2. "21" is the opposite sign convention, event 2 minus event 1.

The expected format for the differential time data is as follows. It should contain a series of event pair lines followed by the corresponding differential time lines. Typically, there are several differential time lines following an event pair line. Each event pair line starts with a "#" and contains the two event IDs and an optional origin time adjustment term. Each differential time line includes, in order: the station name, the differential time measurement, the cross correlation value, and the phase ID (P or S). There can be as many or as few differential times per event pair as you like, in any order.

An example would be the following:

#   956586   956621  0.000
    PEA -0.05600 0.9300 P
    PAH -0.05900 0.9300 P
    WVA -0.06500 0.6900 P
   SRVL -0.05900 0.8600 P
   SRV2 -0.05600 0.7100 P
    PEA -0.05200 0.9200 S
    PAH -0.05600 0.9300 S
    WVA -0.05600 0.9900 S
#   956586   956790  0.000
    PEA -0.13500 0.9000 P
    PAH -0.15600 0.9300 P
    ... etc.

File path to the cross-correlation data.
Travel time table mode.

This should either be "trace" if you would like GrowClust3D to perform ray tracing with a 1D model (comparable to the original GrowClust software or "nllgrid" if you would like to use precomputed travel time grids from NonLinLoc (https://github.com/alomax/NonLinLoc). These modes will control the required information in some of the subsequent lines.
Velocity model name.

In "trace" mode (see item 7 above), this should specify a file name for a 1D model to ray trace. In "nllgrid" mode, this should be the name of the velocity model used by NonLinLoc to create the grids, i.e. the first part of grid file names per NonLinLoc's file naming convention.
Travel time grid directory.

In "trace" mode, text files displaying the generated travel time grids will be placed here. In "nllgrid" mode, this should specify the directory where your precomputed NonLinLoc travel time grids live. Include a "/" at the end of the file path.
Projection information.

This line specifies the PROJ4-type map projection (https://proj.org/development/quickstart.html) to use in the location. It is important that the projection is consistent with the one used in NonLinLoc, should you be in "nllgrid" mode. Note however that NonLinLoc and PROJ4 have slightly different naming conventions.

The format for this line is: proj ellps lon0 lat0 rotA [latP1 latP2] where:
proj: name of projection: "tmerc", "merc", "lcc", or "aeqd" (Transverse Mercator, Mercator, Lambert Conformal Conical, Azimuthal Equidistant)
ellps: name of reference ellipse, e.g. "WGS84", "GRS80", "clrk80", etc. (see PROJ4 documentation for more).
lon0 and lat0: center point for projection, should be near center of study region
rotA: rotation angle, should be 0 unless you would like to rotate the coordinate system
latP1 and latP2: these are only required for a "lcc" projection and denote the two parallels
Ray tracing information.

The first number here sets a default Vp/Vs ratio to use in cases where the velocity model only specifies P-wave velocities. The second number is only required in "trace" mode and sets the minimum ray parameter to use. The default value is 0.0 which will allow refracted rays to be first arrivals in some cases.
Travel time grid information: vertical.

In "trace" mode, this line lists the start(top), end(bottom), and interval(step) between travel time grid depth points (three numbers total, units of km, positive down). In "nllgrid mode", only two numbers are needed: the start(top) and end(bottom), and can be used to limit the memory requirements in cases where precomputed grids are larger than needed for the problem size. The start and end points should span the range of possible event depths of your dataset.
Travel time grid information: lateral.

In "trace" mode, this line lists the start, end, and interval(step) between travel time grid horizontal offset points (three numbers total, units of km). In "nllgrid mode", only either two or four numbers are needed: the start and end points for horizontal offset, or for x and y coordinates separately (xmin xmax ymin ymax). Whether two or four numbers are needed depends on whether the grid defines a 1D or 3D velocity model. The start and end points should span the range of possible horizontal positions of your dataset.
Event pair similarity parameters.

Three numbers total are required: rmin delmax rmsmax:
rmin: minimum cross-correlation coefficient (r) for differential times used in the computation of the GrowClust event-pair similarity coefficient.
delmax: maximum station distance used for differential times used in the computation of the GrowClust event-pair similarity coefficient.
rmsmax: maximum root-mean-square (rms) differential time residual for a proposed cluster merger to be allowed during the relocation algorithm. This value can be adjusted up or down depending on the quality and geometry of your dataset; longer path lengths or lower quality data may need higher rmsmax values.
Cross-correlation file threshold parameters.

Four numbers total are required: rpsavgmin rmincut ngoodmin iponly:
rpsavgmin: minimum desired average cross-correlation coefficient to keep event pair (set this to 0 to keep all in the input cross-correlation data file).
rmincut: minimum desired cross-correlation coefficient to keep a differential time observation (set this to 0 to keep all in the input cross-correlation data file)
ngoodmin: minimum number of differential time observations with cross-correlation coefficients greater than rmin to keep an event pair (set this to 0 to keep all in the input cross-correlation data file)
iponly: (0 or 2) keep both P and S-phase differential time, or (1) keep only P-phase
Bootstrapping information.

Two numbers are required: nboot nbranch_min:
nboot: number of bootstrap iterations desired for event location uncertainty analysis (set to zero to skip this)
nbranch_min: minimum nbranch (number of events/cluster) to be output in cluster file
Name/path of output catalog file.
Name/path of output cluster file (can be NONE).
Name/path of output log file.
Name/path of output bootstrap file (can be NONE).

Input Files

In addition to the control file, GrowClust3D requires four different input files:

A initial event catalog that lists the starting location of each earthquake
A station list with the geographic location of each station with a differential travel time measurement
A differential time file with cross-correlation measurements from pairs of earthquakes
A 1D velocity model to ray trace or set of travel time grids

Allowable formats for items (1), (2), and (3) are described above. All three are text files; custom binary formats are no longer supported but can be implemented by the user if needed.

For item (4), if one is using a 1D velocity model and the GrowClust3D ray-tracer, the text file should have three columns: depth Vp Vs specifying the top of a layer (km), the P-wave velocity (km/s) of that layer, and the S-wave velocity of that layer. If the S-wave velocity is unknown, set that field to 0.0 in the input file, and the program will infer the appropriate Vs from Vp and the Vp/Vs ratio specified in the control file.

In generating the travel time tables, GrowClust performs linear interpolation to subsample the input velocity model at regular but finely spaced intervals between the input depth points. To preserve ``layer-cake'' structure, one must craft the input file accordingly. As a simple example, a 3-layer model with Vp of 4.0, 5.0, and 6.0 km/s at layer (top) depths of 0.0, 5.0, and 10.0 km should be input as:

 0.0 3.0 0.0 
 5.0 3.0 0.0
 5.0 4.0 0.0
10.0 4.0 0.0
10.0 5.0 0.0

The first two lines correspond to the top layer, the next two to the second layer, and the last one to the bottom layer (a halfspace). In contrast, a model of

 0.0 3.0 0.0
 5.0 4.0 0.0
10.0 5.0 0.0

implies a linear velocity gradient between the depth points 0.0, 5.0, and 10.0 km.

If instead one is using pre-computed travel time grids, these need to be in NonLinLoc (https://github.com/alomax/NonLinLoc) format with a consistent map projection to the one specified in the control file. These can be grids corresponding to 1D or 3D velocity models, and there should be one file for each station/phase combination where there is a differential time measurement.

Output Files

GrowClust generates four output files, two of which are optional.

An output relocated catalog
A log file containing run information and statistics
Optional: A cluster file with events and relative positions grouped by cluster
Optional: A bootstrapping file with a detailed breakdown of event locations for each bootstrap resampling.

The most important output file to understand is item (1). Each line of the file corresponds to a different event. The format is the same as in the original Fortran90 release:

yr mon day hr min sec: relocated origin time (columns 1-6)
evid: event ID (column 7)
latR lonR depR: relocated latitude, longitude and depth (decimal degrees and km; columns 8-10)
mag: event magnitude (column 11)
qID cID nbranch: event serial ID number, cluster serial ID number, total number of events in this cluster (columns 12-14)
qnpair qndiffP qndiffS: number of event pairs, P-phase differential times, and S-phase differential times used to relocate this event (columns 15--17)
rmsP rmsS: RMS residual differential times for this event for P-phases and S-phases (s; columns 18-19)
eh ez et: estimated location errors in horizontal (km), vertical (km), and origin time (s; columns 20-22)
latC lonC depC: initial (catalog) latitude, longitude and depth (decimal degrees and km; columns 23-25)

Running the Program

For most users, Julia scripts listed in the examples/ repository can be copied and reused with only minor (if any) modification. Each script consists of the Julia code that executes the same basic set of steps:

Defining some global program constants (these are parameters in grow_params.f90 of the Fortran release)
Reading in the algorithm control file and its parameters
Parsing and organizing input files (event and station lists, travel time data)
Implementing quality control and consistency checks
Projecting geographic data into an appropriate coordinate system
Ray-tracing or reading in pre-computed travel time grids for each station
Computing event pair similarity factors
Running the core clustering and relocation algorithm to reconcile observed and predicted differential times
Bootstrap resampling runs for uncertainty quantification (optional)
Writing output files and logging run statistics

The examples/ directory has two different Julia (.jl) "run/driver" scripts: run_growclust3D.jl as a reference example for typical usage on a single processor, and run_growclust3D-MP.jl which is similar in spirit but designed for multiprocessing on multiple cores. These two .jl scripts can be copied and reused as examples for any generic GrowClust3D problem with little/no modification.

The primary way to tailor GrowClust3D to your dataset is by modifying the input files. With this in mind, in examples/ there are four different example input (.inp) files related to the Spanish Springs, Nevada earthquake sequence. Any of the input files can be paired with either of the two run scripts. The example input files are summarized as follows:

example.trace1D.inp: example raytracing a 1D velocity model
example.nllgrid1D.inp: example using precomputed (NonLinLoc) 1D travel time grid
example.nllgrid3D.inp: example using precomputed (NonLinLoc) 3D travel time grid (see note below)
example.nboot100.inp: example with bootstrapping (e.g. to test parallelization)

Example 1 uses internal ray-tracing codes and emulates the classic GrowClust example from the original Trugman and Shearer (2017) publication. Examples 2 and 3 use 1D and (quasi)-3D travel-time grids generated by NonLinLoc (https://github.com/alomax/NonLinLoc) from an equivalent velocity model. Example 4 is identical to Example 1 but with bootstrapping turned on. This is useful for testing the parallelization described below.

Note: To run Example 3, you will first need to generate the 3D travel time grids. If NonLinLoc is installed and its source files added to your path, simply navigate to the examples/ directory and run the driver script:

julia make_nllgrids.jl

This can also be used to regenerate the grids for Example 2. The reason why the 3D grids needed for Example 3 are not on the repo is that they take up a lot of memory, making it hard to pull or sync the repository.

The central difference between the two example run (.jl) scripts is in their mode of parallelization. The run_growclust3D.jl script can use multithreading on a single computational core to accelerate bootstrap uncertainty analysis. This is useful but limited to the number of threads accessible. The run_growclust3D-MP.jl script can use multiprocessing to further accelerate these calculations in cases where multiple cores are available (e.g., runs on a computing cluster). Generally, if sufficient resources are available, multiprocessing will be faster than multithreading (especially for large datasets) due to memory considerations. For most use cases, the computational overhead in transferring data to the additional processors is negligible compared to additional compute power. Below are some example use cases of the codes:

Simple examples without bootstrapping (good for testing):

julia run_growclust3D.jl example.serial1D.inp

julia run_growclust3D.jl example.nllgrid1D.inp

julia run_growclust3D.jl example.nllgrid3D.inp

Multithreading and Multiprocessing examples:

julia -t4 run_growclust3D.jl example.nboot100.inp

julia -p10 run_growclust3D-MP.jl example.nboot100.inp

In the first case above, I request 4 total threads. In the second, I request 10 processes (in addition to main). In either case, the user must request additional threads or processes to invoke parallelization.

Selecting Appropriate Run Parameters For Your Dataset

While the examples and input files provided in the GitHub repository may work well for many use cases, sometimes the default parameters in the algorithm control file or Julia scripts may need modification in order to optimize performance. This often requires a bit of experimentation, running GrowClust multiple times with different parameters and inspecting the results in map view and cross-section, and looking at the statistics in the output log. When doing this, it helps to set nboot to zero for efficiency.

Some general guidelines may be useful (though are not hard and fast rules):

If the relocation results still appear scattered, it can help to raise the minimum cross-correlation coefficient rmincut and/or ngoodmin which controls the number of quality differential times needed to use an event pair (where quality is defined by rmin). Fewer events will be relocated as these parameters are increased, but the benefit is that you are relying on higher-quality differential times.
Conversely, if you are relocating too few events for your liking, you may decrease these parameters, or raise rmsmax to a higher value. Generally speaking this parameter should scale with the typical travel time in your dataset. The examples in the tutorial are at a regional scale, so for a local dataset rmsmax may need to be smaller, and for a larger study region rmsmax may need to be larger. Higher noise levels or large uncertainties in the velocity model may also warrant a larger value of rmsmax.
For datasets collected over larger study regions or with high input location uncertainty, raising the parameters distmax, distmax2, hshiftmax, and vshiftmax may be warranted. These appear at the top of the Julia run script, not in the algorithm control file.

Good luck! Feel free to reach out if you need help thinking through these choices; while I am of course busy I am keen to collaborate with users as time allows.

Documentation - dttrugman/GrowClust3D.jl GitHub Wiki

Overview

Installation

Algorithm Control File

Input Files

Output Files

Running the Program

Selecting Appropriate Run Parameters For Your Dataset

⚠️ GitHub.com Fallback ⚠️

Documentation - dttrugman/GrowClust3D.jl GitHub Wiki

Overview

Installation

Algorithm Control File

Input Files

Output Files

Running the Program

Selecting Appropriate Run Parameters For Your Dataset

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️