Reference : Directed graphs for DAS interpretation - Schlumberger/distpy GitHub Wiki
The distpy model of interpretation flow
In distpy the general model involves ingestion into multiple data blocks. Each block is processed by the same interpretation graph. The results are then further processed and plotted as multiple entities.
For example, if we have n data-chunks ingested from the store and want m interpretation results we will have a graph:
This rather complicated n-by-m interaction is broken down in distpy into the following generic steps:
- Ingest (convert a few very large files into n numpy 2D array files, each containing 1 second of data)
- Strainrate-to-Summary (for each of the n files, perform a directed graph signal processing flow, resulting in n times m results)
- Egest (amalgamate all n results for each result type, resulting in m output files)
This makes distpy potentially quite hungry for temporary storage during processing, but it allows the problem to be treated as up to n-way parallel, so with elastic compute even huge datasets can be processed very quickly.
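The three steps above can be sketched in a few lines of Python. This is only an illustration, not distpy's own code: the function names `ingest`, `strainrate_to_summary` and `egest`, the array shapes and the two toy summaries are all assumptions, and the blocks are kept in memory rather than written out as numpy files.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ingest(record, samples_per_block):
    """Split one long (depth, time) record into n blocks along the time axis."""
    n = record.shape[1] // samples_per_block
    return [record[:, i * samples_per_block:(i + 1) * samples_per_block]
            for i in range(n)]


def strainrate_to_summary(block):
    """Apply the same processing to one block, giving m named results.

    Each result holds one value per depth; here m = 2 toy summaries.
    """
    return {
        "rms": np.sqrt(np.mean(block ** 2, axis=1)),
        "peak_to_peak": block.max(axis=1) - block.min(axis=1),
    }


def egest(per_block_results):
    """Amalgamate the n per-block results into m arrays of shape (depth, n)."""
    names = per_block_results[0].keys()
    return {name: np.stack([r[name] for r in per_block_results], axis=1)
            for name in names}


# Hypothetical record: 8 depths, 5 seconds sampled at 100 Hz.
rng = np.random.default_rng(0)
record = rng.standard_normal((8, 500))

blocks = ingest(record, samples_per_block=100)    # n = 5 one-second blocks
with ThreadPoolExecutor() as pool:                # blocks are independent
    results = list(pool.map(strainrate_to_summary, blocks))
outputs = egest(results)                          # m = 2 outputs
```

Because no block depends on any other, the `pool.map` call could equally be replaced by n separate jobs on separate machines, which is the property the n-way parallel claim relies on.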
The strainrate-to-summary step is captured as a sub-graph, and that sub-graph is identical across all n data blocks. In distpy this sub-graph is captured separately in a JSON file, so the code can go to the data (e.g. a push to the Edge for real-time processing, with only the m outputs served back), or the data can come to the code (e.g. upload to Cloud cold storage, followed by up to n-way parallel processing).
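As an illustration of keeping the sub-graph in a JSON file, the sketch below parses a small description and checks that every step reads the output of an earlier one. The schema used here (a `command_list` whose entries carry `name`, `uid` and `in_uid`) is an assumption made for this example and should be checked against distpy's actual configuration files before use.

```python
import json

# Hypothetical JSON description of part of a strainrate-to-summary sub-graph.
GRAPH_JSON = """
{
  "command_list": [
    {"name": "fft",          "uid": 1, "in_uid": 0},
    {"name": "rms_from_fft", "uid": 2, "in_uid": 1},
    {"name": "butter",       "uid": 3, "in_uid": 0}
  ]
}
"""


def load_graph(text):
    """Parse the description and check each input refers to an earlier step."""
    commands = json.loads(text)["command_list"]
    known = {0}  # uid 0 is taken to be the loaded data block (an assumption)
    for cmd in commands:
        if cmd["in_uid"] not in known:
            raise ValueError(
                f"step {cmd['uid']} reads unknown input {cmd['in_uid']}")
        known.add(cmd["uid"])
    return commands


commands = load_graph(GRAPH_JSON)
# The same parsed graph would be reused for every one of the n data blocks.
edges = [(cmd["in_uid"], cmd["uid"]) for cmd in commands]
```

Keeping the description as plain data is what lets the same file be shipped to an edge device or to a cloud worker without changing any code.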
The description of strainrate-to-summary flows as a graph
Consider the following steps, which would correspond to a basic event detector:
0. Load data
1. Perform an FFT so that we have (x,f) domain data
2. Print a quicklook thumbnail image of the originally loaded data
3. Calculate the RMS at each depth as an output (commonly called Band-00)
4. Apply a Butterworth filter to the original data to dampen the high frequencies
5. Print a quicklook thumbnail image of the result from 4
6. Apply an STA/LTA transform to the result from 4 so that event onsets are highlighted
7. Print a quicklook thumbnail of the result from 6
8. Auto-pick the events by calculating the maximum peak-to-peak variation in a running time window, making an EventIndicator
9. Write out both Band-00 and the EventIndicator to WITSML
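The numerical steps in this list (1, 3, 4, 6 and 8) can be sketched with numpy alone. This is an illustration under stated assumptions rather than distpy's implementation: the sampling rate, window lengths, cutoff and synthetic data are invented, the thumbnail and WITSML steps are omitted, and the Butterworth filter is applied as a zero-phase magnitude response in the frequency domain instead of the recursive filter a signal-processing library would normally provide.

```python
import numpy as np

FS = 1000.0  # assumed sampling rate in Hz


def rms_from_fft(spectrum):
    """RMS per depth from a full two-sided FFT, via Parseval's theorem."""
    n = spectrum.shape[1]
    return np.sqrt(np.sum(np.abs(spectrum) ** 2, axis=1)) / n


def butter_lowpass(block, cutoff_hz, order=4):
    """Damp high frequencies with a Butterworth magnitude response."""
    n = block.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    gain = 1.0 / np.sqrt(1.0 + (freqs / cutoff_hz) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(block, axis=1) * gain, n=n, axis=1)


def running_mean(x, width):
    """Centred running mean along time (edges are zero-padded)."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, x)


def sta_lta(block, short, long, eps=1e-12):
    """Ratio of short-term to long-term average energy."""
    energy = block ** 2
    return running_mean(energy, short) / (running_mean(energy, long) + eps)


def peak_to_peak(x, width):
    """Largest (max - min) found in any running window, per depth."""
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)
    return (windows.max(axis=-1) - windows.min(axis=-1)).max(axis=1)


# Step 0: synthetic block of 4 depths and 1 second of data, with one
# burst added at depth index 2 to stand in for an event.
rng = np.random.default_rng(0)
data = 0.1 * rng.standard_normal((4, 1000))
data[2, 600:650] += 5.0 * np.sin(2 * np.pi * 20.0 * np.arange(50) / FS)

spectrum = np.fft.fft(data, axis=1)                # step 1
band_00 = rms_from_fft(spectrum)                   # step 3
filtered = butter_lowpass(data, cutoff_hz=100.0)   # step 4
ratio = sta_lta(filtered, short=50, long=500)      # step 6
event_indicator = peak_to_peak(ratio, width=50)    # step 8
```

Note how steps 3 and 4 both start from earlier results rather than from each other; that branching is what makes the flow a graph rather than a simple chain.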
The following dot language description captures the relationships:
digraph G {
load_data_0 -> fft_1
load_data_0 -> thumbnail_2
fft_1 -> rms_from_fft_3
load_data_0 -> butter_4
butter_4 -> thumbnail_5
butter_4 -> sta_lta_6
sta_lta_6 -> thumbnail_7
sta_lta_6 -> peak_to_peak_8
peak_to_peak_8 -> write_witsml_9
rms_from_fft_3 -> write_witsml_9
}
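One use of such a description is to derive an order in which the steps can run. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later). The regular expression is a deliberately minimal assumption that only handles plain `a -> b` lines like those above, not the full dot language.

```python
import re
from graphlib import TopologicalSorter

DOT_TEXT = """
digraph G {
load_data_0 -> fft_1
load_data_0 -> thumbnail_2
fft_1 -> rms_from_fft_3
load_data_0 -> butter_4
butter_4 -> thumbnail_5
butter_4 -> sta_lta_6
sta_lta_6 -> thumbnail_7
sta_lta_6 -> peak_to_peak_8
peak_to_peak_8 -> write_witsml_9
rms_from_fft_3 -> write_witsml_9
}
"""

# Matches simple edges of the form "source -> target".
EDGE = re.compile(r"(\w+)\s*->\s*(\w+)")


def parse_edges(dot_text):
    """Return the (source, target) pairs found in the dot text."""
    return EDGE.findall(dot_text)


def execution_order(dot_text):
    """Return one valid order in which the steps can be executed."""
    sorter = TopologicalSorter()
    for source, target in parse_edges(dot_text):
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


order = execution_order(DOT_TEXT)
print(order)
```

Several orders are valid for this graph, so the exact list may differ between runs of different sorters; what is guaranteed is that every step appears after the step that feeds it, and that a cycle would raise `graphlib.CycleError`.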
Auto-generated pictures of processing flows
The system can be run in documentation mode. In this case it generates a dot language description of the graph, which software such as GraphViz can then render as a picture.
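The idea can be illustrated with a short sketch that writes a dot file from a list of edges and renders it when the GraphViz `dot` executable is available. The helper `to_dot`, the file names and the edge list are assumptions made for this example; this is not how distpy's documentation mode is invoked.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def to_dot(edges, name="G"):
    """Build a dot language description from (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    lines += [f"    {src} -> {dst}" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Part of the event-detector flow, used here as sample input.
edges = [
    ("load_data_0", "butter_4"),
    ("butter_4", "sta_lta_6"),
    ("sta_lta_6", "peak_to_peak_8"),
]
dot_text = to_dot(edges)

out_dir = Path(tempfile.mkdtemp())
dot_file = out_dir / "flow.dot"
dot_file.write_text(dot_text + "\n")

# Render to SVG only if the GraphViz `dot` command is on the PATH.
if shutil.which("dot"):
    subprocess.run(
        ["dot", "-Tsvg", str(dot_file), "-o", str(out_dir / "flow.svg")],
        check=True,
    )
```

Generating the picture from the same edge list that drives the processing keeps the documentation from drifting away from what is actually executed.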
The example below shows a rendering of the processing flow described above