Query Language - gtfierro/pundat GitHub Wiki
This query language is a subset of the Giles query language which is itself a superset of the sMAP query language. The motivation for the subset is removing the "editing" capabilities from the query language; the BOSSWAVE archiver simply views data and metadata.
Metadata queries take the form
query : SELECT <target> WHERE <where-clause>
target : <taglist>
| <distinct>
taglist : tag
| tag , taglist
distinct: DISTINCT tag
Returns
// PID 2.0.8.2
type QueryMetadataResult struct {
Nonce uint32
Data []KeyValueMetadata
}
type KeyValueMetadata struct {
UUID string
Path string
Metadata map[string]interface{}
}
tag
s are simply the names of the keys, e.g. "UnitofTime", "UnitofMeasure", "UUID", "Location", etc. Note that here, we no longer maintain the distinction between "Metadata" and "Properties" like was seen in sMAP.
Removing this distinction makes it easier to reason about how your key-value pairs will be represented by the archiver. For example, if a stream inherits metadata from a URI a/b/c/!meta/keyname
, that metadata will be made available under the name keyname
rather than Metadata/keyname
.
Retrieving the list of all UUIDs:
select distinct uuid;
Retrieving the rooms and UUIDs of temperature sensors in Soda Hall:
select Room, uuid where Building="Soda Hall" and SensorType="Temperature"
The query language supports the three types of data queries available in BtrDB:
- raw data access
- statistical summaries of data
- changed ranges queries
Raw data queries will return the exact contents of the reported timeseries streams that match the provided WHERE clause. The returned data takes the form of a list of messages containing a UUID and a list of <time, value>
pairs, where time
is a uint64 in the requested units of time and value
is a float64.
query : SELECT DATA <target> <datalimit?> <streamlimit?> <time-conversion?> WHERE <where-clause>
target : IN ( start-time, end-time )
| BEFORE start-time
| AFTER start-time
datalimit : LIMIT number
streamlimit: STREAMLIMIT number
time-conversion : AS timeunit
Returns
// PID 2.0.8.4
type QueryTimeseriesResult struct {
Nonce uint32
Data []Timeseries
Stats []Statistics
}
type Timeseries struct {
UUID string
Path string
Generation uint64
Times []uint64
Values []float64
}
Statistical queries return the list of summaries for each of the streams matching the provided WHERE clause. The returned data takes the form of a list of messages containing a UUID and a list of <time, count, min, mean, max>
pairs; time
is a uint64 timestamp in the requested units of time respecting the start of the window, and the rest are all float64 values containing the number of points in the window as well as the minimum, mean and maximum values.
pointwidth
is the length of the time intervals to summarize. These intervals have width 1 << pointwidth
nanoseconds. The pointwidth
parameter should be an integer in the range [0, 62]
width
is the width of a window given as a time expression number``units
, where number
is any integer and units
is any entry from the "Time references" table below, e.g. "5min", "3day", "100ms"
query : SELECT <stats-query> DATA <target> <datalimit?> <streamlimit?> <time-conversion?> WHERE <where-clause>
stats-query : STATISTICAL (<pointwidth>)
| WINDOW (<width>)
pointwidth : number
width: duration
target : IN ( start-time, end-time )
| BEFORE start-time
| AFTER start-time
datalimit : LIMIT number
streamlimit: STREAMLIMIT number
time-conversion : AS timeunit
Returns
// PID 2.0.8.4
type QueryTimeseriesResult struct {
Nonce uint32
Data []Timeseries
Stats []Statistics
}
type Statistics struct {
UUID string
Generation uint64
Times []uint64
Count []uint64
Min []float64
Mean []float64
Max []float64
}
query : SELECT CHANGED(<gen1>, <gen2>, <resolution>) DATA WHERE <where-clause>
gen1 : number
gen2 : number
resolution : number
Returns
// PID 2.0.8.8
type QueryChangedResult struct {
Nonce uint32
Changed []ChangedRange
}
type ChangedRange struct {
UUID string
Generation uint64
StartTime int64
EndTime int64
}
gen{1,2}
are the generations of data as reported by BtrDB; these generations are returned with all data queries. To find the ranges of data that changed between generation X and generation Y (where X < Y), the arguments to "CHANGED" are CHANGED(X,Y, ). resolution
is the minimum size of a returned range in 2<<resolution nanoseconds.
Data can be retrieved for some time region using a range query (in
) or relative to some point in time (before
, after
). These reference times
must be a UNIX-style timestamp, the now keyword, or a quoted time string.
Time references use the following abbreviations:
Unit | Abbreviation | Unix support | Conversion to Seconds |
---|---|---|---|
nanoseconds | ns | yes | 1 second = 1e9 nanoseconds |
microseconds | us | yes | 1 second = 1e6 microseconds |
milliseconds | ms | yes | 1 second = 1000 milliseconds |
seconds | s | yes | 1 second = 1 second |
minutes | m | no | 1 minute = 60 seconds |
hours | h | no | 1 hour = 60 minutes |
days | d | no | 1 day = 24 hours |
Time reference options:
-
Unix-style timestamp: Unix/POSIX/Epoch time is defined as the number of seconds since 00:00:00 1 January 1970, UTC (Coordinated Universal Time), not counting leap seconds. In Python, the current Unix time (in seconds) can be found with
import time # Python actually returns the milliseconds as a decimal, # so we use int to coerce to seconds only print int(time.time())
Giles includes support for Unix-style timestamps in units other than seconds. By suffixing timestamps with one of the unit abbreviations specified above (that have Unix support), we can introduce a finer resolution to our data queries. The following timestamps are all equivalent.
1429655468s
1429655468000ms
1429655468000000us
1429655468000000000ns
Specifying a timestamp without units will default to seconds.
-
The now keyword: uses the current local time as perceived by the server. The now time can be adjusted using relative time references, described below.
-
Quoted time strings: Giles supports timestrings enclosed in double quotes that adhere to one of the following formats:
-
1/2/2006
-
1/2/2006 03:04:05 PM MST
-
1/2/2006 15:04:05 MST
-
1-2-2006
rather than1/2/2006
is also supportedThese time strings follow the canonical Go reference time, which is defined to be
Mon Jan 2 15:04:05 -0700 MST 2006
-
-
Relative time references: the above time references are absolute, meaning that they define a specific point in time. Using relative time references, these absolute times can be altered. The most common form of this is specifying offsets of now.
Relative time references in Giles take the form of
number``unit
wherenumber
is a positive or negative integer andunit
is one of the abbreviations defined in the table above (not limited to those marked with Unix support). Relative time references can be chained.For example, to specify 10 minutes before now, we could use
now -10m
. To specify 15 minutes and 30 seconds after midnight March 13th 2010, we could use"3/13/2010" +15m +30s
limit controls the number of points returned per stream, and streamlimit controls the number of streams returned. For the before and after queries, limit will always be 1, so it only makes sense to use streamlimit in those cases. The exact syntax looks like
limit number streamlimit number
where number
is some positive integer. Both the limit and streamlimit components are optional and can be specified independently, together
or not at all.
The as
component allows a query to specify what units of time it would like the data returned as. The default is milliseconds, but the user
can specify others (ns, us, ms, s) as per the Unix-compatible notation in the Time Reference table below.
For a sample source, here's the same data point with 4 different units of time. Obviously the resolution is only as good as the underlying source. The archiver does not add additional time resolution, so if our source published in milliseconds, querying for data as micro- or nanoseconds would not return more detailed information. The sample source here reported in nanoseconds.
The where-clause
describes how to filter the result set. There are several operators you can use:
Tag values should be quoted strings, and tag names should not be quoted. Statements can be grouped using parenthesis.
Operator | Description | Usage | Example |
---|---|---|---|
= |
Compare tag values. | tagname = "tagval" |
Metadata/Location/Building = "Soda Hall" |
like |
String matching. Use Perl-style regex | tagname like "pattern" |
Metadata/Instrument/Manufacturer like "Dent.*" |
has |
Filters streams that have the provided tag | has tagname |
has Metadata/System |
and |
Logical AND of two queries (on either side) | where-clause and where-clause |
has Metadata/System and Properties/UnitofTime = "s" |
or |
Logical OR of two queries | ||
not |
Inverts a where clause | not where-clause |
not Properties/UnitofMeasure = "volts" |
matches |
Performs regex search over all documents (warning: this can be slow) | matches "regex" |
matches ".*RTU.*" |