04. Python API and SpRIT Worfklow - RJbalikian/SPRIT-HVSR GitHub Wiki
Introduction
The python API is the basis for how the rest of the package works. Both the graphical user interface (GUI) and the command line interface (CLI) call the python functions used in the python API. Understanding how the python functions are working can better help you understand how the GUI and CLI are working as well.
NOTE: IF THERE IS DISCREPANCY BETWEEN FUNCTION PARAMETERS IN THE EXAMPLES HERE AND ON THE PYTHON API DOCUMENTATION, THE API DOCUMENTATION IS USUALLY MORE UP-TO-DATE
See here: https://sprit.readthedocs.io/en/latest/sprit.html
Outline
The following basic steps are used when processing HVSR Data using SPRIT HVSR:
- Input parameters (including filenames)
- Read Metadata and Data
- Remove Noise (if needed) and handle gaps
- Generate Probabilistic Power Spectral Densities
- Generate and Calculate H/V ratios
- Remove outlier H/V Curves
- Find peaks in H/V Curve(s)
- Analyze and export
Example workflow: sprit.run() (Recommended)
The sprit
package is designed to be easily run with minimum inputs. If you have an existing seismic data file that does not need to be trimmed, edited, or have noise removed in any way, you can simply use:
sprit.run(input_data='/path/to/file.mseed')
Even if you do not have data ready to go, you can experiment with the features using the provided sample data. There are a handful of sample files included (6 at the time of writing), and they may be accessed simply by using the string "sample" or "sample1" as the input_data parameter (replace 1 with any number 1-6).
The simplest way to do this is:
hvsrData = sprit.run(input_data="sample")
You can add parameters to sprit.run()
or use your variable hvsrData to get further reports, plots, or other outputs. For example, the following code shows two ways to get a plot and a printed report from the sample data:
# Method 1: add sprit.run() parameters
hvsrData = sprit.run(input_data="sample2", report_format=['plot', 'print'])
# Method 2: use the get_report() method on the hvsrData variable itself
hvsrData = sprit.run(input_data="sample2")
hvsrData.get_report(report_format=['plot', 'print'])
The sample data can also be used in batch mode with the following code:
hvsrBatchData = sprit.run(input_data="sample", source='batch') #In the code above, source was set to the default value: source='file'
hvsrData.get_report(report_format='print') #This will display a series of reports
For more information on how to use the sample data, see the Using the Sample Data wiki page.
The sprit.run()
command takes as inputs any of the keyword parameters of the functions specified below (see the API documentation for these functions for more information about their parameters. Or, if you use verbose=True
in the sprit.run()
function, it will print each function and the parameters that were passed to them):
The sprit.run() function calls the following functions. This is also the recommended order/set of functions to run to process HVSR using SpRIT. Links to the API documentation for each function are given below
Function | Description |
---|---|
input_params() | Input all the metadata about your data, including filepaths to help SpRIT find and upload your data. The input_data parameter of input_params() is the only required variable, though others may also need to be called for your data to process correctly |
fetch_data() | Fetches data from indicated sources. The source parameter of fetch_data() is the only explicit variable in the sprit.run() function aside from input_data and verbose. Everything else gets delivered to the correct function via the kwargs dictionary. This defaults to 'file'. When file is used, metadata will update based on the actual data file. |
*calculate_azimuth() | [Optional] Function to calculate azimuthal horizontal component at specified angle(s). Adds each new horizontal component as a radial component to obspy.Stream object at hvsr_data[‘stream’] |
*remove_noise() | [Optional] This function will remove data that is too "noisy" for HVSR analysis, based on user input. Noise removal methods are outlined in the API documentation. By default, the kind of noise removal is remove_method='auto', which uses a saturation threshold, noise threshold, and sta/lta anti-trigger (if warmup/cooldown times are specified, it will also do this). If the remove_method parameter is set to anything other than one of the explicit options in remove_noise() , noise removal will not be carried out. |
generate_ppsds() | Generates (probabilistic) power spectral density (ppsd) data for each component, which will be used in later analysis. In addition to the explicit parameters of sprit.generate_ppsds(), any parameter of obspy.signal.spectral_estimation.PPSD() may also be used as a parameter to this function |
process_hvsr() | This is the main function processing the hvsr curve and statistics. The hvsr_band parameter sets the frequency spectrum over which these calculations occur, while the peak_freq_range specified where to search for peaks (i.e., peak_freq_range is not used in process_hvsr(), but hvsr_band is). |
*remove_outlier_curves() | [Optional] This function will remove "outlier" curves from further processing and for results validation. This can be run any time after the generate_ppsds() function has been run if comparing the ppsd value curves against themselves is desired (or after process_hvsr() if use_hv_curve=True and H/V curves are instead analyzed for outliers). |
check_peaks() | This is the main function that will find (and if desired 'score') peaks to get a best peak (which is then used in further analysis as the peak indicating the fundamental frequency). The parameter peak_freq_range can be set to limit the frequencies within which peaks are checked and scored. |
get_report() | This is the main function that will print, plot, and/or save the results of the data. |
export_data() | This function exports the final data output as a pickle file (by default, this pickle object has a .hvsr extension). This can be used to read data back into SpRIT without having to reprocess data. |
*Indicates optional steps
Example workflow: Running functions individually (not recommended)
Assuming the input data is correctly specified, this code can be used to rapidly and fully process an HVSR site.
This code is looking for a single .mseed file containing all three components generated by a Raspberry Shake instrument (this is not the default output, see the "Working with Raspberry Shake Data" page).
import sprit
dPath = r"path/to/data.mseed" #If you set this variable to "sample", this example should run with the first sample dataset
dPath = 'sample'
input_parameters = sprit.input_params(
input_data=dPath,
metapath = '', #not specified, will use default, packaged data for Raspberry Shake
site='HVSR Site of Interest',
station='ABC12',
instrument='RS', #can also use 'Raspberry Shake' to indicate raspberry shake is being used
channels=['EHZ', 'EHN', 'EHE'], #These are the channels used by raspberry shake
xcoord = -88.2290526,
ycoord = 40.1012122,
elevation = 755,
hvsr_band=[0.4, 40] #Will only process data between 0.4 Hz and 40Hz
)
#Get metadata and data
hvsr_data = sprit.fetch_data(params=input_parameters, source='file') #source=file means we already have a single file just to read in with all 3 channels
#Remove 'noisy' data from large amplitude transients, for example
hvsr_data = sprit.remove_noise(hvsr_data=hvsr_data, remove_method='auto') #automatically removes bad windows (see API for more details)
#Generate the Probabilistic Power Spectral Density data using Obspy
hvsr_data= sprit.generate_ppsds(hvsr_data=hvsr_data,
ppsd_length=20, #How long each processing window should be, in seconds for each individual HVSR curve
overlap=0.1, #How much to overlap each processing window
)
#Calculate the H/V data from the PPSDS
hvsr_data= sprit.process_hvsr(hvsr_data=hvsr_data,
method=4, #Which method should be used to combine the horizontal (E and N, usually) components
freq_smooth=True, #Whether to smooth the input PPSDS along the frequency/period axis before processing
resample=250, #Whether to resample the H/V curve along the frequency/period axis after processing
smooth=3, #Whether to smooth the H/V curve along the frequency/period axis after processing
)
#Find the best peak in the H/V curve
hvsr_data= sprit.check_peaks(hvsr_data=hvsr_data, peak_freq_range=[1,20])
hvsr_data.get_report(report_format=['print', 'csv'], export_path=False)
#Visualize and analyze processed data
sprit.plot_hvsr(hvsr_data,
plot_type='hvsr p t ann c+ p ann spec p ann', #Input string specifying which plots to plot and what to include on them. See [hvplot documentation](https://sprit.readthedocs.io/en/latest/main.html#sprit.plot_hvsr)
cmap='magma')
Input parameters
The very first step of data processing using SPRIT is to input the parameters of interest. The code snippet below shows the input_params() function and its default values.
sprit.input_params()
hvsr_inputs = sprit.input_params(input_data, #input_data is the filepath(s) to the data
site='HVSR Site', #This is a string for your site name, for your reference
network='AM', #Network name, using the International Federation of Digital Seismograph Networks (FDSN) naming standards
station='RAC84', #Station name, using the International Federation of Digital Seismograph Networks (FDSN) naming standards
loc='00', #Location, using the FDSN naming standards
channels=['EHZ', 'EHN', 'EHE'], #channel names, using the FDSN naming standards
acq_date=str(datetime.datetime.now().date()), #Date of acquisition
starttime = '00:00:00.00', #Start time of site data, especially important for Raspberry Shake or other long streams
endtime = '23:59:59.999', #Start time of site data, especially important for Raspberry Shake or other long streams
tzone = 'UTC', #Timezone of starttime and endtime
xcoord= -88.2290526, #Longitude (or xcoord) of seismometer at site
ycoord = 40.1012122, #Latitude (or ycoord) of seismometer at site
elevation = 755, #Elevation (or zcoord) of seismometer at site
depth = 0, #Depth of seismometer at site
elev_unit='feet',
input_crs = 'EPSG:4326',
output_crs = 'EPSG:4326',
instrument = 'Raspberry Shake', #Instrument being used
metapath = '', #Path to metadata (Poles and Zeros needed for PPSD step)
hvsr_band = [0.4, 40] #Frequency band within which to analyze data
peak_freq_range=[0.4,40]
)
Returns HVSRData or HVSRBatch
Read Metadata and Data
In some cases, the metadata (specifically, the Poles and Zeros information that is needed for the PPSD processing in Obspy) is not contained in the main data file. This is true for any mseed file, for example. In this case, the needed data may be contained in a separate file in, for example, a StationXML format.
The Poles and Zeros data for Raspberry Shake v7 data is included as a resource for the SPRIT package and will be read in by default if Raspberry Shake is selected as an instrument and no metadata path is specified.
sprit.fetch_data()
Based on the data in the input dictionary, this function will read in the actual data, and structure the data appropriately.
SPRIT uses ObsPy to manage data input/output and manipulation. The specified data must be compatible with obspy, and the output data will be in an Obspy Stream format.
hvsr_data = sprit.fetch_data(
params, #Dictionary containing appropriate parameters
source='file', #The type of source from which the data is being read (can be 'file', 'raw', 'dir', or 'batch' (see [API Documentation](https://sprit.readthedocs.io/en/latest/main.html#sprit.fetch_data)))
data_export_path=None, #Directory to export data trimmed to specified start/end times (in case file is larger, but analysis only takes place over subset)
data_export_format='mseed', #Format for export, if trim_dir is not None
detrend='spline', #Type of detrending to implement on data. detrend=False, will not detrend. Otherwise, see documentation for [SPRIT](https://sprit.readthedocs.io/en/latest/main.html#sprit.fetch_data) and [ObsPy](https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.detrend.html))
detrend_options=2 #Used with the obspy detrend method, as appropriate
filter_type=None,
filter_options=[],
update_metadata=True,
plot_input_stream=False,
plot_engine='matplotlib',
show_plot=True,
verbose=False
)
Returns HVSRData or HVSRBatch
Remove Noise (if needed) and handle gaps
Data gaps are handled automatically with SPRIT. Gaps in data can cause issues with the Obspy interface, which uses a numpy masked array to indicate where data does not exist. However, many Obspy functions cannot handle masked arrays, so raise errors if data with gaps is read in. In the SPRIT pacakge, where gaps need to be addressed, the data are split into separate traces between all gaps (using the stream.split() method in Obspy), then recombined back into 3 traces for their appropriate channel (using the stream.merge() method in Obspy). By default, the skip_on_gaps parameter in the obspy.signal.spectral_estimation.PPSD() function is set to True. One of the leading papers on HVSR uses the equivalent of skip_on_gaps=False, which fills gaps with a 0 value, resulting in outlier PPSDs that will likely need to be dealt with. However, skip_on_gaps=True also often produces outlier PPSDs (though not as many).
sprit.remove_noise()
hvsr_data = sprit.remove_noise(
hvsr_data, #Input object, usually dictionary output from fetch_data()
remove_method=None, #What kind of noise removal to use. See [API Documentation](https://sprit.readthedocs.io/en/latest/main.html#sprit.remove_noise) for options
processing_window=None,
sat_percent=0.995, #If _kind_='saturation threshold' (or 'auto'), value (percentage, out of 1) to use as saturation threshold
noise_percent=0.80, #If _kind_=;noise threshold' (or 'auto'), value (percentage, out of 1) to use as noise percentage
sta=2, #if _kind_='antitrigger' (or 'auto'), short-term-average window size (in seconds) to use
lta=30, #if _kind_='antitrigger' (or 'auto'), long-term-average window size (in seconds) to use
stalta_thresh=[0.5,5], #if _kind_='antitrigger' (or 'auto'), lower and upper stalta threshold values to use
std_ratio_thresh=2.0,
std_window_size=20.0,
min_std_win=5.0,
warmup_time=0, #if _kind_='warmup' or 'buffer', time at start of stream to remove for instrument warmup
cooldown_time=0, #if _kind_='cooldown' or 'buffer', time at end of stream to remove for instrument cooldown
min_win_size=1 #minimum window size (in seconds) to use for valid (not noise) window,
remove_raw_noise=False,
show_stalta_plot=False,
verbose=False
)
Returns HVSRData or HVSRBatch
Can also be used with an obspy stream, in which case it will return an obspy stream
Generate Power Spectral Densities
sprit.generate_psds()
hvsr_data = sprit.generate_ppsds(
hvsr_data, #Input parameters, output from fetch_data() or remove_noise()
window_length=30.0,
overlap_pct=0.5,
window_type='hann',
window_length_method='length',
skip_on_gaps=True,
num_freq_bins=500,
obspy_ppsds=False,
azimuthal_ppsds=False,
verbose=False, #Whether to print processing information to terminal
plot_psds=False,
**ppsd_kwargs #keyword arguments that will be directly passed to obspy.signal.spectral_estimation.PPSD()
)
Returns a dictionary like hvsr_data, with much more ppsd information.
Remove outlier curves
This function takes the PPSD Curves or H/V curve of each time step, calculates a median curve value at each frequency, then calculates an error (sqaure root of the square of the difference) at each frequency step. It then sums these errors for each curve to get an RMSE value for each curve. The curves are removed from further analysis of they exceed the RMSE value or percentile threshold designated by the rmse_thresh and use_percentile parameters.
sprit.remove_outlier_curves()
hvsr_data = remove_outlier_curves(
hvsr_data, # The input data, usually the output from generate_ppsds() or process_hvsr()
rmse_thresh=98, # The threshold value over which the data will considered an outlier. This is either percentile (0-100) or RMSE value, depending on value of use_percentile
use_percentile=True, # Whether rmse_thresh is referring to a percentile [0-100] or an actual RMSE value
use_hv_curve=False, # Whether to use the H/V curves (True) of each time step or the PPSD curves (False). Can only be True after process_hvsr()
plot_engine="matplotlib",
show_plot=False, # Whether to display a plot of which curves have been removed
verbose=False)
Generate and Calculate H/V ratios
sprit.process_hvsr()
hvsr_data = sprit.process_hvsr(
hvsr_data, #Input HVSRData or HVSRBatch, as output from generate_ppsds()
horizontal_method=3, #Method to use for combining horizontal components (see [API Documentation](https://sprit.readthedocs.io/en/latest/main.html#sprit.process_hvsr))
smooth=True, #Whether to smooth output H/V Curve
freq_smooth='konno ohmachi', #Type of smoothing to use along frequency axis before processing
f_smooth_width=40, #Parameter used by _freq_smooth_ to indicate how much smoothing to do (values mean different things depending on type of smoothing used)
resample=True, #Whether to resample the output H/V curve to add more resolution
outlier_curve_rmse_percentile=False, #RMSE Percentile to use for outlier curve removal, if _remove_outlier_curves_=True
azimuth=None,
verbose=False #Whether to print information about processing to terminal
)
Returns HVSRData or HVSRBatch
Find peaks in H/V Curve(s)
sprit.check_peaks()
sprit.check_peaks(
hvsr_dict, #Input dictionary, output from process_hvsr()
hvsr_band=[0.4, 40], #Final hvsr_band to use for peak search (can be smaller than original hvsr_band used for processing)
peak_selection='max',
peak_freq_range=[0.4, 40], #Peak values to use to get rid of peaks outside range of interest
azimuth="HV",
verbose=False
)
Returns HVSRData or HVSRBatch
Analyze and export
sprit.plot_hvsr()
sprit.plot_hvsr(
hvsr_dict, #Input dictionary, as output from check_peaks()
plot_type='HVSR', #String containing formatting for types of charts to use and what to include. See [Examples: HVPlot](https://github.com/RJbalikian/SPRIT-HVSR/wiki/Examples:-HVPlot) or [API Documentation](https://sprit.readthedocs.io/en/latest/sprit.html#sprit.plot_hvsr) for details
azimuth='HV',
use_subplots=True, #Whether to separate the charts into subplots (True) or separate figures (False)
fig=None, #Used to specify an existing figure to use for the chart, if desired
ax=None, #Used to specify an existing axis to use for the chart, if desired
return_fig=False, #Whether to return the figure
plot_engine='matplotlib',
save_dir=None, #Directory to save figure (not currently used)
save_suffix='', #Suffix to use when saving figure (not currently used)
show_legend=False,
show_plot=True, #Whether to show the chart (True), or just process/export it
close_figs=False,
clear_fig=True,
**kwargs #Keyword arguments to use, varies depending on which type of chart is being used (see _kind _parameter)
)
Returns figure if return_fig=True.
sprit.get_report()
sprit.get_report(
hvsr_data, #Dictionary with H/V data to work with (output from check_peaks())
report_formats=['print', 'table', 'plot', 'html', 'pdf'], #Format of report
azimuth="HV",
plot_type='HVSR p ann COMP+ p ann Spec p ann', #only used for plot report
plot_engine='matplotlib',
show_print_report=True,
show_table_report=False,
show_plot_report=True,
show_html_report=False,
show_pdf_report=True,
suppress_report_outputs=False,
show_report_outputs=False,
csv_handling='append',
report_export_format=None,
report_export_path=None, #Where to export report (not yet implemented)
verbose=False
)
sprit.export_data()
sprit.export_data(
hvsr_data,
hvsr_export_path=None,
ext='hvsr',
verbose=False
)