# Detailed Usage
## Out of the Box
Once the dependencies are installed and REDPy is downloaded, REDPy can be run out of the box with the following commands to test whether the code is working on your computer (you may append `-v` to any of these scripts for additional verbosity):
```
>> python initialize.py
>> python catfill.py mshcat.csv
>> python backfill.py -s 2004-09-15 -e 2004-09-24
```
This will load a catalog of located earthquakes contained in `mshcat.csv` (from the PNSN catalog, within 5 km of the summit of Mount St. Helens between 2004-09-15 and 2004-09-24), then continue using continuous data from the same dates. It uses the default settings in `settings.cfg` to download data from 8 nearby stations at IRIS. The dates correspond to the beginning of the 2004 eruption of Mount St. Helens, during which several repeating earthquakes are sure to be found. An event must have a cross-correlation coefficient above the 0.7 threshold on 4 or more stations to be considered a repeater.
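For reference, the repeater criteria above correspond to two values in the configuration file. The excerpt below is only a sketch; the parameter names and comments are my assumptions, so verify them against the commented lines in your copy of `settings.cfg`:

```
# illustrative excerpt -- names and values assumed; check settings.cfg
cmin=0.7    # minimum cross-correlation coefficient for a match
nstaC=4     # number of stations that must exceed cmin for a repeater
```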
## Setup
REDPy is configured using a .cfg file. The default file is `settings.cfg`, but the `-c` flag can be used to specify a different .cfg file when running scripts. `settings.cfg` is commented with what each setting generally does. The most important settings to define for personal use are the station parameters. You may delete the lines of any settings you wish to keep at their defaults; REDPy will automatically assume default values for any missing settings.
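As a sketch of what the station parameters might look like, here is a hypothetical two-station excerpt. The parameter names below are assumptions based on the commented defaults, so treat this as illustrative and confirm the real names in `settings.cfg`:

```
# hypothetical excerpt -- parameter names assumed; check settings.cfg
nsta=2            # number of stations
station=SEP,YEL   # comma-separated station codes
channel=EHZ,EHZ   # one channel per station
network=CC,CC     # one network code per station
location=--,--    # location codes ('--' for blank)
samprate=100.0    # sample rate in Hz
```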
`initialize.py` sets up the HDF5 PyTable where data are stored, and creates all the necessary folders. Run this script only once; it will overwrite any existing tables that share a name with the table you are creating (defined in the .cfg file)!
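If you want to confirm what `initialize.py` built, the HDF5 file can be opened directly with PyTables. This is just a minimal sketch; `redpytable.h5` is an assumed filename, and the real one is whatever you defined in your .cfg file:

```python
import tables

# Open the REDPy table read-only and print its structure.
# 'redpytable.h5' is an assumed filename; use the one from your .cfg file.
with tables.open_file("redpytable.h5", mode="r") as h5:
    print(h5)  # lists the groups and tables that initialize.py created
```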
## Running Scripts
The table can be populated by two scripts: `catfill.py` and `backfill.py`.
`catfill.py` uses a CSV catalog file of earthquakes and requires a column named 'Time UTC' containing string times of the earthquakes. `mshcat.csv` is included as an example of one such catalog file, from the PNSN. For now, the 'Time UTC' column is the only one that is used. Loading a catalog of known earthquakes can be a good starting point to help you understand how often repeating earthquakes are caught in a catalog, and at what rate they repeat. This may be important in helping you choose how long to let 'orphans' stay in the orphan table before they are eliminated from the pool of potential repeaters. `catfill.py` is significantly faster than `backfill.py` for populating the table with multiple years' worth of earthquakes, but it is not very well optimized in how it downloads the data (it queries the client for each trigger).
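For illustration, a minimal catalog CSV might look like the following. The timestamp format and the extra `Magnitude` column here are assumptions (columns other than 'Time UTC' are ignored); match whatever format `mshcat.csv` uses when building your own catalogs:

```
Time UTC,Magnitude
2004-09-23 12:34:56,1.8
2004-09-23 13:01:02,2.1
```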
`backfill.py` fills the table using continuous data between a given start and end date, if provided. If no dates are provided, it will attempt to fill up to the current date, with a starting time based on whether data already exist in the table: if so, it picks up from the time of the last trigger in the table; if not, it downloads only the last `nsec` seconds (defined in the configuration file). `backfill.py` is designed to be run approximately every few minutes to hourly as a cron job.
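To make the date logic concrete, here are the invocation patterns: the first fills a fixed window, the second fills from a start date up to now, and the third resumes from the last trigger (or grabs the last `nsec` seconds on an empty table). The dates are placeholders, and the middle form assumes `-s` may be given without `-e`:

```
>> python backfill.py -s 2004-09-15 -e 2004-09-24
>> python backfill.py -s 2004-09-15
>> python backfill.py
```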
`removeFamily.py` allows the user to manually remove one or more families from the 'repeater' table. The 'core' of each removed cluster is kept in a separate table and correlated with all new triggers; if a match is found, the trigger is immediately removed, effectively preventing the cluster from reappearing. This is an imperfect way of removing correlated noise and other noise triggers that slip past the algorithm that attempts to remove them automatically. `removeFamilyGUI.py` instead lets the user choose families to remove in a pop-up window showing images of the 'core' waveforms.
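As a sketch of how removal might be invoked, assuming family (cluster) numbers are passed as arguments; the numbers below are placeholders, and you should run each script with `-h` to confirm the exact usage:

```
>> python removeFamily.py -c lapine.cfg 3 17
>> python removeFamilyGUI.py -c lapine.cfg
```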
Typing `-h` after each script will bring up usage help with all options. For descriptions of all included scripts, see Scripts and Helper Functions.
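For example:

```
>> python backfill.py -h
```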
Generated images and HTML files are output to a folder with the same name as the `groupName` in the configuration file.
## Additional Examples
I'm currently also testing the scripts on a swarm of earthquakes that cropped up to the southeast of La Pine, OR. `lapine.cfg` is the configuration file I am using. It uses a single station for correlation and triggering, and thus runs faster... but also lets more noise through.
Set up the table:
```
>> python initialize.py -v -c lapine.cfg
```
To start with, download continuous data from a week before the swarm began through a week after:
```
>> python backfill.py -v -c lapine.cfg -s 2015-10-15 -e 2015-10-29
```
You can keep the table up to date through the current time by periodically rerunning `backfill.py` without the `-s` or `-e` flags:
```
>> python backfill.py -v -c lapine.cfg
```
## Browsing Output
The primary outputs are `overview.html` and `overview_recent.html` in the `groupName` folder. See Outputs for full details on all the output files!
## Notes
I have tested `backfill.py` as a cron job; running it every 10-15 minutes seems to work well to prevent overlapping jobs.
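As a sketch, a crontab entry for a 15-minute cadence might look like this (the paths and log file are placeholders for your own setup):

```
# run backfill.py every 15 minutes; paths are hypothetical
*/15 * * * * cd /path/to/REDPy && python backfill.py -c settings.cfg >> backfill.log 2>&1
```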
The file size of the table can grow quite quickly, because waveform data are saved for cross-correlation and plotting with a decent amount of padding for each station downloaded. The 'out of the box' example takes up ~110 MB for a relatively meager number of repeaters (~200). Be sure you have plenty of disk space and memory to work with! Also, the more stations you download, the longer the run takes: the full 8-station default run takes ~30 minutes (a single-station run over the same period takes ~6), though much of that time is spent downloading the data rather than processing it.
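To keep an eye on growth, you can check the table's size periodically; the filename below assumes the default from `settings.cfg`, so substitute your own:

```
>> du -sh redpytable.h5
```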