07. Batch Processing - RJbalikian/SPRIT-HVSR GitHub Wiki

Introduction

Often HVSR campaigns involve several sites or individual measurements. If data quality is high, many or most of these can be processed with minimal input. For this reason, batch processing is a key feature of the sprit package.

When data is read in using a batch source (source='batch' in sprit.fetch_data() or sprit.run()), an HVSRBatch data object is produced. An HVSRBatch object is a collection of HVSRData objects, one for each site. HVSRData is the standard data object produced when running single files in sprit. See the API documentation for the data object classes in sprit for more details.

The easiest and best way to implement batch processing is to create a csv file (or a similar file) that can be read by pandas.read_csv(). Each row in this file is a different site/measurement location, and each column represents a keyword argument passed to sprit.run(). The column name should match the actual keyword, and the value in each cell is the value passed to that keyword for that row's site.
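A batch file of this kind can be assembled programmatically. The following is a minimal sketch that uses Python's standard csv module to keep it dependency-free; the output is plain csv text that pandas.read_csv() can read. The column names are taken from the batch file format table on this page, while the file paths, site names, and the helper build_batch_csv() are placeholders invented for illustration.

```python
import csv
import io

# Column names match keyword arguments listed in the batch file format table.
fieldnames = ["input_data", "site", "source", "tzone"]

# One dict per site. Paths and site names are illustrative placeholders only.
rows = [
    {"input_data": "/path/to/site1_file", "site": "ExampleSite1",
     "source": "file", "tzone": "UTC"},
    {"input_data": "/path/to/site2_file", "site": "ExampleSite2",
     "source": "file", "tzone": "US/Central"},
]


def build_batch_csv(rows, fieldnames):
    """Return csv text with one header row and one row per site (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


batch_csv_text = build_batch_csv(rows, fieldnames)
print(batch_csv_text)
```

Writing batch_csv_text to a file gives a csv whose path could then be used as the input_data value for batch processing.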

Batch processing works through multiple sites at once. If one site fails at any point in the processing algorithm, sprit stops processing that site and continues to work on the others. The processing status of each site can be accessed using the [Processing status] key or attribute of the resulting HVSRBatch object.
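The fail-and-continue behavior described above can be pictured with a generic sketch. This is not sprit's implementation: process_batch(), fake_process(), and the site dictionary are all hypothetical names used only to show the pattern of recording a per-site status while the rest of the batch keeps running.

```python
def process_batch(sites, process_site):
    """Process every site, recording failures instead of stopping the batch."""
    results = {}
    status = {}
    for name, params in sites.items():
        try:
            results[name] = process_site(**params)
            status[name] = True
        except Exception as err:
            # The failed site gets no result, but the loop
            # moves on to the remaining sites.
            results[name] = None
            status[name] = False
            print(f"{name} failed: {err}")
    return results, status


def fake_process(input_data):
    """Hypothetical stand-in for real processing; fails when no data is given."""
    if input_data is None:
        raise ValueError("no input data")
    return f"processed {input_data}"


sites = {
    "SiteA": {"input_data": "a_file"},
    "SiteB": {"input_data": None},   # this site will fail
    "SiteC": {"input_data": "c_file"},
}
results, status = process_batch(sites, fake_process)
print(status)
```

In this sketch, SiteB fails while SiteA and SiteC still complete, and the status dictionary records which sites succeeded.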

Usage

Batch processing is started by using the source='batch' parameter in either the sprit.run() or the sprit.fetch_data() function. The input_data parameter of sprit.input_params() should point to the csv file with all the processing information for each site (input_data may also be a pandas.DataFrame rather than the file itself). The input_data parameter of sprit.run() should be the same.

You can run batch processing on all the sample data by specifying input_data='sample' and source='batch' in the sprit.run() function. See examples below.

Examples

import sprit

# You can practice batch processing using the sample data
hvsrBatch = sprit.run(input_data='sample', source='batch')

# Replace the path with the filepath of a properly formatted csv file to run batch processing on your own data
hvsrBatch = sprit.run(input_data="/path/to/csv/file.csv", source='batch')

Batch file format

NOTE: The sample data represents real HVSR data, but the locations have been changed. Where data is missing, sprit fills in the default values for those parameters.

Note that the column names are the same as argument names in sprit.input_params(). If a parameter column is not specified, or a cell is blank for a given site, the default value is used for that site/parameter. Since this is already batch mode, the only values used in the source column should be either 'file' or 'raw' (if reading from raw Raspberry Shake data). The default value is 'file', so if you are reading files with all three components already included, this column could be omitted.

| input_data | site | source | acq_date | starttime | endtime | tzone | xcoord | ycoord | elevation | input_crs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /home/fpath/... | SampleHVSRSite1 | file | 2/15/2023 | 17:04 | 17:34 | UTC | -87.529 | 41.691 | 582 | EPSG:4326 |
| /home/fpath/... | SampleHVSRSite2 | file | 2/15/2023 | 15:32 | 16:00 | US/Central | -87.5376 | 41.6690 | 584.9 | EPSG:4326 |
| /home/fpath/... | SampleHVSRSite3 | file | 7/18/2023 | 14:32 | 14:55 | UTC | 456358 | 4610561 | 587 | EPSG:32616 |
| /home/fpath/... | SampleHVSRSite4 | file | 7/18/2023 | 16:09 | 16:29 | UTC | | | | |
| /home/fpath/... | SampleHVSRSite5 | file | 7/18/2023 | 16:39 | 17:00 | US/Eastern | 457908 | 4611959 | 587 | EPSG:32616 |
| /home/fpath/... | SampleHVSRSite6 | file | 7/11/2023 | 15:10 | 15:28 | UTC | 452658 | 4610214 | 754 | EPSG:32616 |
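Before running a batch, it can help to check the source column against the rule stated above: only 'file' or 'raw' are expected, and a blank or missing value falls back to 'file'. The following is a minimal sketch of such a check using only the standard library. check_source_column() and the example rows are hypothetical, and sprit's own handling of defaults may differ in detail.

```python
import csv
import io

ALLOWED_SOURCES = {"file", "raw"}
DEFAULT_SOURCE = "file"  # default value stated on this page


def check_source_column(csv_text):
    """Return (site, source) pairs, using the default for blank or missing cells."""
    resolved = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing column gives None and a blank cell gives ''; both use the default.
        source = (row.get("source") or "").strip() or DEFAULT_SOURCE
        if source not in ALLOWED_SOURCES:
            raise ValueError(
                f"{row.get('site')}: source must be 'file' or 'raw', got {source!r}"
            )
        resolved.append((row.get("site"), source))
    return resolved


# Placeholder batch content: SiteB leaves the source cell blank.
example_csv = (
    "input_data,site,source\n"
    "/path/a,SiteA,file\n"
    "/path/b,SiteB,\n"
    "/path/c,SiteC,raw\n"
)
resolved_sources = check_source_column(example_csv)
print(resolved_sources)
```

A value such as 'batch' in the source column would raise a ValueError in this sketch, which flags the row before any processing starts.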