07. Batch Processing - RJbalikian/SPRIT-HVSR GitHub Wiki

Introduction

Often HVSR campaigns involve several sites or individual measurements. If data quality is high, many or most of these can be processed with minimal input. For this reason, batch processing is a key feature of the sprit package.

When data is read in with a batch source (source='batch' in sprit.fetch_data() or sprit.run()), an HVSRBatch data object is produced. An HVSRBatch object is a collection of HVSRData objects, one for each site. HVSRData is the standard data object produced when running single files in sprit. See the API documentation for the data object classes in sprit for more details.

The easiest and best way to implement batch processing is to create a csv file (or a similar file) that can be read by pandas.read_csv(). Each row in this file is a different site/measurement location, and each column represents a keyword argument passed to sprit.run(). The column name should match the actual keyword, and the value in each cell is the value passed to that keyword for that row's site.
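As a minimal sketch of the file layout described above, the following builds a two-site batch table using only the standard-library csv module, so the result is a plain comma-separated text that pandas.read_csv() can read. The column names are taken from the batch table on this page; the file paths, the site names, and the helper build_batch_csv() are hypothetical placeholders, not part of sprit.

```python
import csv
import io

# Column names mirror the keyword arguments shown in this page's batch table.
columns = ["input_data", "site", "source", "acq_date", "starttime", "endtime",
           "tzone", "xcoord", "ycoord", "elevation", "input_crs"]

# Hypothetical rows: the file paths and site names are placeholders.
rows = [
    {"input_data": "/path/to/site1_data", "site": "ExampleSite1", "source": "file",
     "acq_date": "2/15/2023", "starttime": "17:04", "endtime": "17:34",
     "tzone": "UTC", "xcoord": -87.529, "ycoord": 41.691,
     "elevation": 582, "input_crs": "EPSG:4326"},
    # Location columns are left out here; DictWriter writes them as blank cells.
    {"input_data": "/path/to/site2_data", "site": "ExampleSite2", "source": "file",
     "acq_date": "7/18/2023", "starttime": "16:09", "endtime": "16:29",
     "tzone": "UTC"},
]


def build_batch_csv(rows, columns):
    """Return csv text with one header line and one line per site."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


batch_csv_text = build_batch_csv(rows, columns)
print(batch_csv_text)
```

To use the table, write batch_csv_text to a file and pass that file's path as input_data together with source='batch'.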

Because batch processing works through multiple sites at once, a failure at one site does not stop the whole run: if a site fails at any point in the processing algorithm, sprit stops processing that site and continues with the others. The processing status of each site can be accessed using the [Processing status] key or attribute of the resulting HVSRBatch object.
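The fail-and-continue behavior can be illustrated with a generic sketch. This is not sprit's internal code: process_batch(), fake_process(), and the status dictionary are hypothetical names used only to show the pattern of recording a failure for one site and moving on to the next.

```python
def process_batch(sites, process_site):
    """Run process_site on every site; record failures instead of stopping."""
    results = {}
    status = {}
    for name, params in sites.items():
        try:
            results[name] = process_site(**params)
            status[name] = True
        except Exception as err:
            # Processing of this site ends here; the loop moves on to the next.
            status[name] = False
            print(f"{name} failed: {err}")
    return results, status


# Hypothetical stand-in for a per-site processing step.
def fake_process(input_data):
    if not input_data:
        raise ValueError("no input data given")
    return f"processed {input_data}"


sites = {
    "SiteA": {"input_data": "/path/to/a"},
    "SiteB": {"input_data": ""},  # this site fails
    "SiteC": {"input_data": "/path/to/c"},
}

results, status = process_batch(sites, fake_process)
print(status)
```

The failing site is absent from results, while the sites after it are still processed.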

Usage

Batch processing is started by using the source='batch' parameter in either the sprit.run() or the sprit.fetch_data() function. The input_data parameter of sprit.input_params() should point to the csv file with all the processing information for each site (input_data may also point to a pandas.DataFrame() rather than the file itself). The input_data parameter of sprit.run() should be the same.

You can run batch processing on all the sample data by specifying input_data='sample' and source='batch' in the sprit.run() function. See examples below.

Examples

import sprit

# You can practice batch processing using the sample data
hvsrBatch = sprit.run(input_data='sample', source='batch')

# Replace the path with the filepath of a properly formatted csv file
hvsrBatch = sprit.run(input_data="/path/to/csv/file.csv", source='batch')

Batch file format

NOTE: The sample data represents real HVSR data, but the locations have been changed. Where data is missing, sprit fills in the default values for those parameters.

Note that the column names are the same as argument names in sprit.input_params(). If a parameter column is not specified, or a cell is blank for a given site, the default value is used for that site/parameter. Since this is already batch mode, the only values used in the source column should be either 'file' or 'raw' (if reading from raw Raspberry Shake data). The default value is 'file', so if you are reading files with all three components already included, this column could be omitted.

input_data	site	source	acq_date	starttime	endtime	tzone	xcoord	ycoord	elevation	input_crs
/home/fpath/...	SampleHVSRSite1	file	2/15/2023	17:04	17:34	UTC	-87.529	41.691	582	EPSG:4326
/home/fpath/...	SampleHVSRSite2	file	2/15/2023	15:32	16:00	US/Central	-87.5376	41.6690	584.9	EPSG:4326
/home/fpath/...	SampleHVSRSite3	file	7/18/2023	14:32	14:55	UTC	456358	4610561	587	EPSG:32616
/home/fpath/...	SampleHVSRSite4	file	7/18/2023	16:09	16:29	UTC
/home/fpath/...	SampleHVSRSite5	file	7/18/2023	16:39	17:00	US/Eastern	457908	4611959	587	EPSG:32616
/home/fpath/...	SampleHVSRSite6	file	7/11/2023	15:10	15:28	UTC	452658	4610214	754	EPSG:32616
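Before running a batch, it can help to see which cells are blank, since those are the parameters that fall back to default values. The sketch below parses two rows copied from the table above (with the elided paths left as they appear) using the standard-library csv module. The helper blank_columns_by_site() is a hypothetical name for illustration and is not part of sprit.

```python
import csv
import io

# Header plus two rows copied from the table above (tab-separated).
batch_text = (
    "input_data\tsite\tsource\tacq_date\tstarttime\tendtime\ttzone"
    "\txcoord\tycoord\televation\tinput_crs\n"
    "/home/fpath/...\tSampleHVSRSite3\tfile\t7/18/2023\t14:32\t14:55\tUTC"
    "\t456358\t4610561\t587\tEPSG:32616\n"
    "/home/fpath/...\tSampleHVSRSite4\tfile\t7/18/2023\t16:09\t16:29\tUTC\n"
)


def blank_columns_by_site(text):
    """Map each site name to the list of columns left blank in its row."""
    # restval="" fills in columns that a short row does not reach.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", restval="")
    blanks = {}
    for row in reader:
        blanks[row["site"]] = [col for col, val in row.items() if val == ""]
    return blanks


blanks = blank_columns_by_site(batch_text)
for site, cols in blanks.items():
    print(site, "uses defaults for:", cols or "nothing")
```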