the datasource class - geoscience-community-codes/GISMO GitHub Wiki

The most up-to-date documentation about datasource is at https://seiscode.iris.washington.edu/projects/thewaveformsuitezetatest/wiki/Datasource_examples/

Datasource objects are the gateway between objects (such as waveform) and their databases or stored files.

The datasource class was added to the waveform suite in spring 2009, and has since proven a valuable way to insulate the waveform class from the (often changing) data streams. This class provides the flexibility to let anyone easily extend the waveform class to understand and import any number of file or database formats.

The datasource and waveform classes already have the innate ability to work with several database and file formats. A quick list of predefined formats follows:

  • Antelope - Accessing an Antelope database requires that the Antelope toolbox for MATLAB be installed. This toolbox is available from Boulder Real Time Technologies.
  • Winston - Accessing a Winston wave server requires that the appropriate Java libraries be installed. At the time of this writing, these utilities are included in the usgs.jar file, available with the SWARM program.
  • SAC - No additional utilities are required to access SAC (Seismic Analysis Code) files.
  • SEISAN - No additional utilities are required to import files from the SEISmic ANalysis system. However, due to SEISAN's file naming conventions, datasource will not likely be able to automatically determine which file is desired. In this case, the datasource class will need to be created with specific filenames, rather than with intelligent paths.
  • FILE - Datasource is capable of looking within .mat files for all variables of a desired type. This allows it to parse specific data from files with waveform variables of any array size, any name, or even if they are contained within cells.

The datasource class provides the connection between waveforms (or other classes) and their databases or stored files. The datasource class has proven to be a valuable way to insulate the waveform class from the (often changing) data streams. Together, the datasource and waveform classes have built-in interpreters for several database and file formats including:

  • Antelope - The waveform suite wraps the required elements from the Antelope toolbox for MATLAB (Lindquist 2009), providing the ability to access Antelope databases. The toolbox is included in the standard Antelope distribution from Boulder Real Time Technologies.

  • Winston - The waveform suite reaches directly into Winston using the Java library distributed with SWARM (Cervelli et al. 2004).

  • SAC - Seismic Analysis Code (SAC) files (Tapley and Tull 1992) may be imported without additional codes. Additional header fields are translated into similarly named user-defined fields.

  • SEISAN - No additional utilities are required to import files from the SEISmic ANalysis system (Havskov and Ottemöller, 2003). However, due to SEISAN's file naming conventions, datasource may not be able to automatically determine which file is desired. In this case, the datasource should be created using specific filenames.

  • .mat file - Datasource is capable of looking within .mat files for all variables of a desired type. This allows it to parse data from files that contain previously generated waveform objects.

  • User-defined - The datasource/waveform object combination makes it straightforward to translate a file of any type into an array of one or more waveform objects. A short wrapper function should be all that is necessary to make an existing import routine compatible with the waveform suite. Notably, this does not require an understanding of the datasource/waveform codebase beyond the set function.

Most users are well acquainted with reading data on a per file basis. This is straightforward in the waveform suite. In addition, complex directory structures and file naming schemes can be traversed thanks to the datasource's ability to interpret fprintf() style formatting statements that may describe a file's time and/or station/channel/network/location information (see Figure 2). The following example shows how a datasource might be created that can access SAC files stored in the current directory.
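As a rough illustration of the fprintf()-style idea, the plain-MATLAB sketch below expands a format pattern into a concrete SAC filename in the current directory. It uses only sprintf and date functions, not the datasource internals; the naming scheme (STATION.CHANNEL.YEAR.JDAY.sac), the pattern, and the variable names are assumptions made for the example.

```matlab
% Hypothetical naming scheme: STATION.CHANNEL.YEAR.JDAY.sac in the current directory
pattern = './%s.%s.%04d.%03d.sac';
station = 'STA01';
channel = 'LL';

t = datenum(2009, 2, 5, 12, 0, 0);         % requested time (noon on 5 Feb 2009)
v = datevec(t);                            % [year month day hour minute second]
yr = v(1);
jday = floor(t) - datenum(yr, 1, 1) + 1;   % day of year (1 Jan is day 1)

% Fill the pattern with the station, channel, and date fields
fname = sprintf(pattern, station, channel, yr, jday);
disp(fname)
```

A datasource built from a comparable pattern would perform this kind of substitution for each request, so calling code never has to spell out individual filenames.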

Essentially, the combination of the waveform, scnlobject, and datasource objects allows a user's homegrown data organization structure to be queried as a simple relational database. Datasource facilitates this by separating data requests from the explicit data storage structure. Though not required, this approach is in our opinion generally preferable to hardwiring code to specific file names.

Additionally, the datasource is able to return information that crosses file or database boundaries. Data that is retrieved from individual files or databases can be combined into a continuous object. In the case of a waveform object, this is done automatically. By accessing files through their generalized formats, instead of individually, issues such as the “11:59 p.m. earthquake problem” can be avoided.

While the datasource class works in concert with the waveform class to retrieve information, it is not dependent upon waveform and may be used to access data of any type. Many users choose to save commonly used datasource objects in a .mat data file or an .m script file where they can be loaded automatically, such as from the startup.m file.

Commonly used datasources are best saved in either a .mat data file or an .m script, so that they can be loaded without being re-entered each time. If they are placed in an .m script, that script can be called at startup by including it in startup.m.

##Creating a datasource

`ds = datasource(type, filename, [parameter1][, parameter2][...])`  
creates a datasource of type `type` from its constituent parts. Additional parameters depend upon the datasource type.

1. `ds = datasource('antelope',databasepath)`  
associates this datasource with a specific database. With additional parameters, the datasource may be able to traverse a directory tree to determine the correct database.
1. `ds = datasource('winston',server, port)`  
associates the datasource with a Winston wave server.
1. `ds = datasource('sac',filelocation)`  
associates the datasource with a SAC file. With additional parameters, the datasource may be able to traverse a directory tree to determine the correct file.
1. `ds = datasource('file',filelocation)`  
associates the datasource with a .mat file. With additional parameters, the datasource may be able to traverse a directory tree to determine the correct file.
1. `ds = datasource('seisan',filelocation)`  
associates the datasource with a SEISAN file. With additional parameters, the datasource may be able to traverse a directory tree to determine the correct SEISAN file.
1. `ds = datasource(@interpreter,filelocation)`  
associates the datasource with any function designed to read a file and then translate it into an object (or objects) via that object's constructor method. With additional parameters, the datasource may be able to traverse a directory tree to determine the correct file.

##Creating an interpreter (import) function

Interpreter functions translate a file of any type into an array of one or more objects.

The internals of the interpreter are a black box to the datasource class: it doesn't care how or where the interpreter retrieves the data. All that matters is that the function accepts a character array (string) as an argument and that it returns one or more objects of the appropriate class. (For the waveform suite, that would be, surprise, a waveform object.) If no objects are found, then an empty array may be returned.

Datasource will check to ensure that the data file exists, so the interpreter function is guaranteed that much. However, the existence of a file doesn't mean that it is of the proper type. The interpreter is responsible for generating an error if the file is unparsable.

##Information about filenames and parameters

Interpreter functions translate a file of any type into an array of one or more objects.

##Modifying Datasources

The filenames (path, directory, files) associated with a datasource can be changed with the setfile command. The interpreter function can likewise be changed with the setinterpreter command. For any other change, a new datasource should be created.

##Functions of note

getfilename
retrieves the file names associated with SCNL and date-time information

  1. getfilename(ds,myscnlobject, startTime)
    returns the filename associated with the place described by myscnlobject at time startTime. That is, if the filename depends on the date of the requested data, then startTime is used to determine that date.
  2. getfilename(ds,[], [])
    returns the generic form of the filename, where each data-dependent field is enclosed in brackets.

subdivide_files_by_date
given a datasource and a date range, returns a list of files.

  1. fns = subdivide_files_by_date(ds,startTime,endTime)
    returns a list of all files that have data during the time range for a particular datasource ds.
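
To make the date-based subdivision concrete, here is a plain-MATLAB sketch of the underlying idea: walk the days touched by a time range and expand a year/julian-day pattern once per day. This is an illustration only, not the code of subdivide_files_by_date; the pattern mirrors the one used in the example session on this page, and the variable names are assumptions.

```matlab
% Pattern with year and julian-day fields (assumed one-file-per-day layout)
pattern = 'C:/mydatafile/%04d/%03d.mat';

% A request that straddles midnight, as in the "11:59 p.m. earthquake problem"
tStart = datenum(2009, 2, 5, 23, 59, 0);
tEnd   = datenum(2009, 2, 6, 0, 2, 0);

days = floor(tStart):floor(tEnd);          % every calendar day touched by the range
fns = cell(size(days));
for k = 1:numel(days)
    v = datevec(days(k));                          % v(1) is the year
    jday = days(k) - datenum(v(1), 1, 1) + 1;      % day of year
    fns{k} = sprintf(pattern, v(1), jday);         % one filename per day
end
disp(fns(:))
```

A three-minute request therefore maps to two files, and the data read from each can be joined into one continuous object.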

##Examples

Example 1: example of an interpreter function

Here is a listing of a simple, yet relatively complete, interpreter function. Notice the error checks and how it builds the waveforms.

```matlab
function w = load_crazyformat(filename)
% LOAD_CRAZYFORMAT loads my old ad-hoc data from .mat files into waveforms.
%
% Data is stored in a MATLAB file FILENAME as a variable named "data" of
% type struct. Its fields are "location" (5 characters for station, the
% next 2 become channel), "values", which maps to data, and "start", which
% maps to the same name. All data is sampled at 20 samples per second.

dataIn = load(filename);
% should now have data in dataIn.data
if ~isfield(dataIn, 'data')
  error('DataImporter:unknownFormat', ...
    'The file was not interpretable as a crazyformat (.CF) file');
end
try
  w = repmat(waveform, size(dataIn.data)); % preallocate
  for n = 1:numel(dataIn.data)
    station = dataIn.data(n).location(1:5);
    channel = dataIn.data(n).location(6:7);
    data = dataIn.data(n).values;
    start = dataIn.data(n).start;           % start time stored with each record
    thisscnl = scnlobject(station, channel);
    w(n) = set(w(n), 'scnl', thisscnl, 'start', start, 'data', data, 'freq', 20);
  end
catch
  error('DataImporter:parseError', ...
    'Unable to parse... unexpected error with fields');
end
w = w(~isempty(w)); % keep only non-empty waveforms
w = addhistory(clearhistory(w), 'Imported from a crazyformat file');
```

And here's an example MATLAB session that uses the load_crazyformat function:

```matlab
>> ds = datasource(@load_crazyformat,...
'C:/mydatafile/%04d/%03d.mat','year','jday') % associate with my interpreter
% - notice the filename depends on the year and julian day

ds = 
type: USER_DEFINED
location: C:/mydatafile/[YEAR]/[JDAY].mat
Interpreting Function: load_crazyformat

>> scnl = scnlobject('STA01','LL'); % this is the station/channel of interest
>> w = waveform(ds,scnl,'2/5/2009','2/6/2009') % load a single waveform

w = 1 x 1 waveform object 
--details of waveform displayed here--

>> stanums = 1:25; % we'll load 25 stations that have numerical names
>> stanames = arrayfun(@(n) sprintf('STA%02d',n), stanums, 'UniformOutput', false); % change numbers into names like "STA03"
>> scnl = scnlobject(stanames,'LL'); % create 1x25 station/channel list
>> w = waveform(ds,scnl,'2/5/2009 02:00','2/9/2009 14:00') % load all stations for this 4.5-day period

w = 1 x 25 waveform object 
--details of waveform displayed here--
```

This page was ported from: http://kiska.giseis.alaska.edu/input/celso/matlabweb/waveform_suite/datasource.html