Data - smclements/vega GitHub Wiki

WikiDocumentationData

The basic data model used by Vega is tabular data, similar to a spreadsheet or database table. Individual data sets are assumed to contain a collection of records (or "rows"), which may contain any number of named data attributes (fields, or "columns"). Upon load, Vega maps each data record into a data object and assigns it a unique _id.

For example, if a Vega spec loads input JSON data like this:

[{"x":0, "y":3}, {"x":1, "y":5}]

the input data is then loaded into data objects like this:

[{"_id":0, "x":0, "y":3}, {"_id":1, "x":1, "y":5}]

Data sets can be specified directly (either through including the data inline or providing a URL from which to load the data), or bound dynamically at runtime (by providing data at chart instantiation time). Note that loading data from a URL will be subject to the policies of your runtime environment (e.g., cross-domain request rules).

Examples

Here is an example defining data directly in a specification:

{"name": "table", "values": [12, 23, 47, 6, 52, 19]}

One can also load data from an external file (in this case, JSON):

{"name": "points", "url": "data/points.json"}

Or, one can simply declare the existence of a data set. The data can then be dynamically provided when the visualization is instantiated (see the Runtime documentation for more):

{"name": "table"}

Finally, one might copy an existing data set and/or apply data transforms. In this case, we create a new data set ("stats") by computing statistics for data groups (facets) computed from the "table" data set:

{
  "name": "stats",
  "source": "table",
  "transform": [
    {
      "type": "facet", 
      "groupby": ["x"], 
      "summarize": {"y": ["average", "sum", "min", "max"]}
    }
  ]
}

Data Properties

Property Type Description
name String A unique name for the data set.
format Object An object that specifies the format for the data file, if loaded from a URL. The currently supported formats are json (JavaScript Object Notation), csv (comma-separated values), tsv (tab-separated values), topojson, and treejson. These options are specified by the type property of the format object. For other parameters, see the Formats documentation below.
values JSON The actual data set to use. The values property allows data to be inlined directly within the specification itself.
source String The name of another data set to use as the source for this data set. The source property is particularly useful in combination with a transform pipeline to derive new data.
url String A URL from which to load the data set. Use the format property to ensure the loaded data is correctly parsed. If the format property is not specified, the data is assumed to be in a row-oriented JSON format.
transform Array<Transform> An array of transforms to perform on the data. Transform operators will be run on the default data, as provided by late-binding or as specified by the source, values, or url properties. See the Data Transforms documentation for more details.
modify Array<StreamingOps> An array of streaming operators to insert, remove, & toggle data values, or clear the data set entirely. These operators are run after data transforms. See the Streaming Data documentation for more details.

Formats

The supported formats for loading data from external files are:

json

Loads a JavaScript Object Notation (JSON) file. Assumes row-oriented data, where each row is an object with named attributes. This is the default file format, and so will be used if no format parameter is provided. If specified, the format parameter should have a type property of "json", and can also accept the following:

Name Type Description
parse Object A collection of parsing instructions that can be used to define the data types of string-valued attributes in the JSON file. Each instruction is a name-value pair, where the name is the name of the attribute, and the value is the desired data type (one of "number", "boolean" or "date"). For example, "parse": {"modified_on":"date"} ensures that the modified_on value in each row of the input data is parsed as a Date value.
property String The JSON property containing the desired data. This parameter can be used when the loaded JSON file may have surrounding structure or meta-data. For example "property": "values.features" is equivalent to retrieving json.values.features from the loaded JSON object.

csv

Load a comma-separated values (CSV) file. The properties of the loaded JSON object are taken from the values of the first row of the file. The format parameter should have a type property of "csv", and can also accept the following:

Name Type Description
parse Object A collection of parsing instructions that can be used to define the data types of attributes in the CSV file. By default, all attributes are treated as string-typed data. Each instruction is a name-value pair, where the name is the name of the attribute, and the value is the desired data type (one of "number", "boolean" or "date"). For example, "parse": {"y":"number"} ensures that the y value in each row of the input data is parsed as a numerical value.

tsv

Load a tab-separated values (TSV) file. The properties of the loaded JSON object are taken from the values of the first row of the file. The format parameter should have a type property of "tsv", and can also accept the following:

Name Type Description
parse Object A collection of parsing instructions that can be used to define the data types of attributes in the TSV file. By default, all attributes are treated as string-typed data. Each instruction is a name-value pair, where the name is the name of the attribute, and the value is the desired data type (one of "number", "boolean" or "date"). For example, "parse": {"y":"number"} ensures that the y value in each row of the input data is parsed as a numerical value.

topojson

Load a JavaScript Object Notation (JSON) file using the TopoJSON format. The input file must contain valid TopoJSON data. The TopoJSON input is then converted into a GeoJSON format for use within Vega. There are two mutually exclusive properties that can be used to specify the conversion process:

Name Type Description
feature String The name of the TopoJSON object set to convert to a GeoJSON feature collection. For example, in a map of the world, there may be an object set named "countries". Using the feature property, we can extract this set and generate a GeoJSON feature object for each country.
mesh String The name of the TopoJSON object set to convert to a mesh. Similar to the feature option, mesh extracts a named TopoJSON object set. Unlike the feature option, the corresponding geo data is returned as a single, unified mesh instance, not as inidividual GeoJSON features. Extracting a mesh is useful for more efficiently drawing borders or other geographic elements that you do not need to associate with specific regions such as individual countries, states or counties.

treejson

Load a JavaScript Object Notation (JSON) file that contains hierarchical (tree) data. This format consists of a top-level JSON object (the root) that includes a named array of child objects. Each child may then have its own children in turn. For an example of this format, see flare.json. The treejson reader accepts the following properties:

Name Type Description
children String The JSON property that contains an array of children nodes for each intermediate node. This parameter defaults to "children".
parse Object A collection of parsing instructions that can be used to define the data types of string-valued attributes in the JSON file. Each instruction is a name-value pair, where the name is the name of the attribute, and the value is the desired data type (one of "number", "boolean" or "date"). For example, "parse": {"modified_on":"date"} ensures that the modified_on value in each row of the input data is parsed as a Date value.

Transforms

Data sets can also be manipulated by a number of transforms. Transformations are specified as an array of transform definitions. See the Data Transforms documentation for more details.

Streaming Data

Vega 2.0 supports streaming inserts, updates, and removals of data values. Streaming operations can be specified directly within the specification, or can be executed via an API for external updates. See the Streaming Data documentation for more details.