Data Transforms - nyurik/vega GitHub Wiki

This wiki documents Vega version 2. For Vega 3 documentation, see vega.github.io/vega.


A data transform performs operations on a data set prior to visualization. Common examples include filtering and grouping (e.g., group data points with the same stock ticker for plotting as separate lines). Other examples include layout functions (e.g., stream graphs, treemaps, graph layout) that are run prior to mark encoding.

All transform definitions must include a type parameter, which specifies the transform to apply. Each transform then has a set of transform-specific parameters. Transformation workflows are defined as an array of transforms; each transform is then evaluated in the specified order.

These workflows can be specified as part of either a dataset definition:

  "data":[
    {
      "name": "stats",
      "source": "table",
      "transform": [
        {"type": "facet", "groupby": ["x"]}
      ]
    }
  ]

or added to a mark's from property:

  "marks": [
    {
      "type": "group",
      "from": {
        "data": "fields",
        "transform": [{"type": "cross"}]
      },
      ...
    }
  ],

Many transforms accept data attributes, or fields (denoted below as "Field"), as parameters. Data field parameters are strings that describe either individual attributes (e.g., "price") or access paths (e.g., "price.min"). Data fields can also access array indices, for example "list[0]".
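The following minimal Python sketch makes the access-path syntax concrete. It resolves field strings such as "price.min" and "list[0]" against a nested datum. It is an illustration only, not Vega's accessor (Vega 2 is written in JavaScript). The names parse_field and get_field are hypothetical, and edge cases such as quoted or escaped property names are not handled.

```python
import re

# Hypothetical helper, not Vega's accessor: a name is a run of characters other
# than ".", "[" and "]"; an index is a run of digits inside square brackets.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_field(field: str) -> list:
    """Split a field string into an access path, e.g. "a.b[0]" -> ["a", "b", 0]."""
    return [int(index) if index else name for name, index in _TOKEN.findall(field)]


def get_field(datum, field: str):
    """Resolve a field string against a nested dict/list datum."""
    value = datum
    for step in parse_field(field):
        value = value[step]
    return value


print(get_field({"price": {"min": 3}}, "price.min"))  # 3
print(get_field({"list": [7, 8]}, "list[0]"))         # 7
```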

Data Manipulation Transforms

These transforms can be used to manipulate the data: to filter and sort elements, form groups, and merge different data sets together.

Data Manipulation Transforms:

aggregate - Computes aggregate summary statistics (e.g., median, min, max) over groups of data.
bin - Bins raw data values into quantitative bins (e.g., for a histogram).
countpattern - Counts the number of occurrences of a text pattern, as defined by a regular expression.
cross - Computes the cross-product of two data sets.
facet - Organizes a data set into groups or "facets".
filter - Filters elements from a data set to remove unwanted items.
fold - Collapses ("folds") one or more data properties into two properties: a key property (containing the original data property name) and a value property (containing the data value).
formula - Extends data elements with new values according to a calculation formula.
impute - Performs imputation of missing values.
lookup - Extends a primary data set by looking up values on a secondary data set.
rank - Computes an ascending rank score for each data tuple.
sort - Sorts the values of a data set.
treeify - Computes a tree structure over a flat tabular dataset.

▸ aggregate

Computes aggregate summary statistics (e.g., median, min, max) over groups of data.

Property	Type	Description
groupby	Array<String>	An optional array of fields by which to group data values.
summarize	JSON	Summary aggregates to compute for each group. This property supports two formats: a convenient short format, and a more complete long format.

The short format for summarize uses a single object hash that maps from field names to one or more aggregation operations: {"foo": "mean", "bar": ["sum", "median"]}. The aggregation operation can be either a single string or an array of strings, each a valid aggregation operation name.

The long format for summarize uses an array of aggregate specification objects. The previous short format example translates to the following long format: [{"field": "foo", "ops": ["mean"]}, {"field": "bar", "ops": ["sum", "median"]}]
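The short-to-long conversion takes only a few lines of Python. This sketch illustrates the rule stated above. It is not Vega's code, and normalize_summarize is a hypothetical name.

```python
def normalize_summarize(summarize):
    """Convert a short-format summarize object to the long (array) format."""
    if isinstance(summarize, list):
        return summarize  # already in the long format
    long_form = []
    for field, ops in summarize.items():
        if isinstance(ops, str):
            ops = [ops]  # a single operation name is shorthand for a list of one
        long_form.append({"field": field, "ops": list(ops)})
    return long_form


print(normalize_summarize({"foo": "mean", "bar": ["sum", "median"]}))
# [{'field': 'foo', 'ops': ['mean']}, {'field': 'bar', 'ops': ['sum', 'median']}]
```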

An aggregate specification supports the following properties:

Property	Type	Description
field	String	The name of the field to aggregate. This name will be used to generate output field names, unless a custom name is specified (below).
ops	Array	An array of aggregate operations. See below for supported aggregates.
as	Array	An optional array of names to use for the output properties. By default, the aggregator automatically creates output field names of the form op_field (e.g., `sum_bar`, `median_bar`). The as array provides custom names to use instead, and should be the same length as the ops array. Standard automatic name generation is used for `null` entries.

The supported aggregation operations are:

Operation	Description
values	Builds up an array of all input objects in the group.
count	Count the total number of elements in the group.
valid	Count values that are not `null`, `undefined` or `NaN`.
missing	Count the number of `null` or `undefined` values.
distinct	Count the number of distinct values.
sum	Compute the sum of values in a group.
mean	Compute the mean (average) of values in a group.
average	Compute the mean (average) of values in a group. Identical to mean.
variance	Compute the sample variance of values in a group.
variancep	Compute the population variance of values in a group.
stdev	Compute the sample standard deviation of values in a group.
stdevp	Compute the population standard deviation of values in a group.
median	Compute the median of values in a group.
q1	Compute the lower quartile boundary of values in a group.
q3	Compute the upper quartile boundary of values in a group.
modeskew	Compute the mode skewness of values in a group.
min	Compute the minimum value in a group.
max	Compute the maximum value in a group.
argmin	Find the input object that minimizes the value in a group.
argmax	Find the input object that maximizes the value in a group.

Many of the aggregation functions above are straightforward, but a few deserve additional discussion.

The 'values' and 'count' functions operate directly on the input objects and return the same value regardless of the provided field name. Similar to SQL's count(*), these can be specified with the special name "*", as in "summarize": {"*": "count"}.

The 'argmin' and 'argmax' functions are a bit unusual: instead of returning the minimum or maximum value of a field, they return the original input object that contains the minimum or maximum value. This can be useful for retrieving another field associated with the minimum or maximum value (e.g., for each region, in which year did I have the maximum revenue?). If multiple entries share the minimum or maximum value, the first observed input object will be returned.

Output

The aggregate transform outputs a new array of data objects, one for each group, with the computed aggregate statistics.

Example

For the following input data:

[{"foo": 1, "bar": 1}, {"foo": 1, "bar": 2}, {"foo": null, "bar": 3}]

This short format aggregate transform

{"type": "aggregate", "summarize": {"foo": "valid", "bar": ["sum", "median"]}}

would produce the following output:

[{"valid_foo": 2, "sum_bar": 6, "median_bar": 2}]

Similarly, this long format aggregate transform:

{
  "type": "aggregate",
  "summarize": [
    {"field": "foo", "ops": ["valid"]},
    {"field": "bar", "ops": ["sum", "median"], "as": ["s", "m"]}
  ]
}

would produce the following output:

[{"valid_foo": 2, "s": 6, "m": 2}]
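For reference, the following Python sketch reproduces the long format example above. It has several limitations:

- It supports only a subset of the operations (count, valid, sum, mean, median, min, max).
- It treats Python's None and NaN as the missing values.
- It does not guard against groups with no valid values.

The names aggregate, OPS and _is_valid are hypothetical and not part of Vega.

```python
import math
import statistics


def _is_valid(v):
    # Python has no `undefined`, so only None and NaN count as missing here.
    return v is not None and not (isinstance(v, float) and math.isnan(v))


# A small subset of the operations listed above; each receives the valid
# values of the field and the total number of rows in the group.
OPS = {
    "count": lambda vals, n: n,
    "valid": lambda vals, n: len(vals),
    "sum": lambda vals, n: sum(vals),
    "mean": lambda vals, n: sum(vals) / len(vals),
    "median": lambda vals, n: statistics.median(vals),
    "min": lambda vals, n: min(vals),
    "max": lambda vals, n: max(vals),
}


def aggregate(data, summarize, groupby=()):
    """Compute long-format summarize specs for each group of `data`."""
    groups = {}
    for datum in data:
        groups.setdefault(tuple(datum[f] for f in groupby), []).append(datum)
    out = []
    for key, rows in groups.items():
        result = dict(zip(groupby, key))  # keep the group-by values
        for spec in summarize:
            field, ops = spec["field"], spec["ops"]
            names = spec.get("as") or [None] * len(ops)
            vals = [r.get(field) for r in rows if _is_valid(r.get(field))]
            for op, name in zip(ops, names):
                # Fall back to the automatic op_field name for null entries.
                result[name or f"{op}_{field}"] = OPS[op](vals, len(rows))
        out.append(result)
    return out


data = [{"foo": 1, "bar": 1}, {"foo": 1, "bar": 2}, {"foo": None, "bar": 3}]
spec = [{"field": "foo", "ops": ["valid"]},
        {"field": "bar", "ops": ["sum", "median"], "as": ["s", "m"]}]
print(aggregate(data, spec))  # [{'valid_foo': 2, 's': 6, 'm': 2}]
```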

▸ bin

Bins raw data values into quantitative bins (e.g., for a histogram).

Property	Type	Description
field	String	The name of the field to bin values from.
min	Number	The minimum bin value to consider. If unspecified, the minimum value of the specified field is used.
max	Number	The maximum bin value to consider. If unspecified, the maximum value of the specified field is used.
base	Number	The number base to use for automatic bin determination (default is base 10).
maxbins	Number	The maximum number of allowable bins.
step	Number	An exact step size to use between bins. If provided, options such as maxbins will be ignored.
steps	Array	An array of allowable step sizes to choose from.
minstep	Number	A minimum allowable step size (particularly useful for integer values).
div	Array	Scale factors indicating allowable subdivisions. The default value is [5, 2], which indicates that for base 10 numbers (the default base), the method may consider dividing bin sizes by 5 and/or 2. For example, for an initial step size of 10, the method can check if bin sizes of 2 (= 10/5), 5 (= 10/2), or 1 (= 10/(5*2)) might also satisfy the given constraints.

Output

The bin transform sets the following values on each datum:

Name	Default Property	Description
start	bin_start	The starting bin boundary.
end	bin_end	The ending bin boundary.

These properties may be renamed by specifying an output map. For example, "output": {"start": "bs", "end": "be"} will use bs and be as the output properties instead of bin_start and bin_end.

Example

{"type": "bin", "field": "amount", "min": 0, "max": 10, "maxbins": 5}

This example will bin values in the amount field into one of 5 bins between 0 and 10. Given the following input data:

[
  {"amount": 3.7},
  {"amount": 6.2},
  {"amount": 5.9},
  {"amount": 8}
]

The bin transform produces the following output:

[
  {"amount": 3.7, "bin_start": 2, "bin_end": 4},
  {"amount": 6.2, "bin_start": 6, "bin_end": 8},
  {"amount": 5.9, "bin_start": 4, "bin_end": 6},
  {"amount": 8, "bin_start": 8, "bin_end": 10}
]
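The sketch below shows automatic step selection using the base, maxbins, div and minstep parameters. The search order is an assumption based on the parameter table, not a transcription of Vega's implementation:

1. Start near a power of base.
2. Grow by base until the bins fit within maxbins.
3. Try each div subdivision.

The sketch handles neither the steps parameter nor values outside [min, max]. The names bin_step and bin_value are hypothetical.

```python
import math


def bin_step(lo, hi, maxbins, base=10, div=(5, 2), minstep=0.0):
    """Pick a bin step for the range [lo, hi] (assumes hi > lo)."""
    span = hi - lo
    level = math.ceil(math.log(maxbins) / math.log(base))
    # Start from a power of `base` near span / base**level.
    step = max(minstep, base ** (round(math.log(span) / math.log(base)) - level))
    # Grow the step until the bins fit within maxbins.
    while math.ceil(span / step) > maxbins:
        step *= base
    # Try each allowed subdivision in turn, keeping it if the bins still fit.
    for d in div:
        candidate = step / d
        if candidate >= minstep and span / candidate <= maxbins:
            step = candidate
    return step


def bin_value(value, lo, step):
    """Return (bin_start, bin_end) for a value, with bin boundaries aligned to lo."""
    start = lo + step * math.floor((value - lo) / step)
    return start, start + step


step = bin_step(0, 10, maxbins=5)
print(step)                                                   # 2.0
print([bin_value(v, 0, step) for v in (3.7, 6.2, 5.9, 8)])
# [(2.0, 4.0), (6.0, 8.0), (4.0, 6.0), (8.0, 10.0)]
```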

▸ countpattern

Counts the number of occurrences of a text pattern, as defined by a regular expression. The countpattern transform iterates through each data object and counts the occurrences of every unique pattern match found within the designated text field.

Neither the pattern nor the stopwords parameter below is a "raw" regular expression literal; both are embedded in strings. As a result, take care to use proper escape characters as needed (e.g., to match digits, use "\\d", not "\d").

Parameter	Type	Description
field	Field	The field containing the text data.
pattern	String	A string containing a well-formatted regular expression, defining a pattern to match in the text. All unique pattern matches will be separately counted. The default value is `[\\w\']+`, which will match sequences containing word characters and apostrophes, but no other characters.
case	String	A lower- or upper-case transformation to apply prior to pattern matching. One of `"lower"` (the default), `"upper"` or `"none"`.
stopwords	String	A string containing a well-formatted regular expression, defining a pattern of text to ignore. For example, the value `"(foo

Output

The countpattern transform returns a new array of data objects, with two properties: text (a text segment that matches the pattern), and count (the number of observed occurrences of the pattern). The names of these new properties can be changed by setting the transform's output map. For example, the parameter "output": {"text": "t", "count": "c"} causes the properties t and c to be used instead of text and count.

Example

{
  "type": "countpattern",
  "field": "comment",
  "pattern": "\\d+",
  "stopwords": "13"
}

This example counts the occurrences of each digit sequence in the comment field, except for the number 13.

Running the transform on this input data:

[
  {"comment": "between 12 and 12.43"},
  {"comment": "43 minutes past 12 o'clock (and 13 seconds)"}
]

will produce the following output:

[
  {"text": "12", "count": 3},
  {"text": "43", "count": 2}
]
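Here is a Python sketch of the counting behavior, using the example above as a check. Two points are assumptions: stopwords are matched against whole tokens, and Python's re syntax stands in for JavaScript regular expressions. The function countpattern here is hypothetical, not the transform itself.

```python
import re
from collections import Counter


def countpattern(data, field, pattern=r"[\w']+", case="lower", stopwords=""):
    """Count every distinct match of `pattern` in the `field` text of each datum."""
    regex = re.compile(pattern)
    # Assumption: a stopword must match a whole token, not just part of one.
    stop = re.compile(stopwords) if stopwords else None
    counts = Counter()  # preserves first-seen order of the matched text
    for datum in data:
        text = str(datum[field])
        if case == "lower":
            text = text.lower()
        elif case == "upper":
            text = text.upper()
        for match in regex.finditer(text):
            token = match.group(0)
            if stop is None or not stop.fullmatch(token):
                counts[token] += 1
    return [{"text": t, "count": c} for t, c in counts.items()]


comments = [
    {"comment": "between 12 and 12.43"},
    {"comment": "43 minutes past 12 o'clock (and 13 seconds)"},
]
print(countpattern(comments, "comment", pattern=r"\d+", stopwords="13"))
# [{'text': '12', 'count': 3}, {'text': '43', 'count': 2}]
```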

▸ cross

Computes the cross-product of two data sets.

Property	Type	Description
with	String	The name of the secondary data set to cross with the primary data. If unspecified, the primary data is crossed with itself.
diagonal	Boolean	If false, items along the "diagonal" of the cross-product (those elements with the same index in their respective array) will not be included in the output. This parameter only applies when a dataset is crossed with itself, and is true by default.
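filter	Expression	An optional expression used to filter the cross-product. Only pairs for which the expression evaluates to true (with the items available as datum.a and datum.b by default) are included in the output.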

Output

The cross transform outputs an array of data objects with two properties: one containing an item from the primary data set (named a by default), and one containing an item from the secondary data set (named b by default). These names can be changed by setting the transform's output map. For example, the parameter "output": {"left": "thing1", "right": "thing2"} causes the properties thing1 and thing2 to be used instead of a and b.

Example

{"type": "cross", "diagonal": false}

This example crosses a data set with itself, ignoring entries with the same indices (i.e., those along the diagonal). If the input data is [1, 2, 3], then the cross transform will output:

[
  {"a":1, "b":2},
  {"a":1, "b":3},
  {"a":2, "b":1},
  {"a":2, "b":3},
  {"a":3, "b":1},
  {"a":3, "b":2}
]

Similarly, with the same input data, the following

{"type": "cross", "filter": "datum.a <= 2 && datum.b >= 2"}

will produce:

[
  {"a":1, "b":2},
  {"a":1, "b":3},
  {"a":2, "b":2},
  {"a":2, "b":3}
]
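Both cross examples can be reproduced with a short Python sketch. The filter expression is modeled as a Python predicate keep(a, b), a hypothetical stand-in for a Vega expression string. The default a/b output names are hard-coded.

```python
def cross(primary, secondary=None, diagonal=True, keep=None):
    """Pair every item of `primary` with every item of `secondary`."""
    self_cross = secondary is None
    if self_cross:
        secondary = primary  # cross the data set with itself
    out = []
    for i, a in enumerate(primary):
        for j, b in enumerate(secondary):
            # The diagonal option only applies when crossing a set with itself.
            if self_cross and not diagonal and i == j:
                continue
            if keep is None or keep(a, b):
                out.append({"a": a, "b": b})
    return out


pairs = cross([1, 2, 3], keep=lambda a, b: a <= 2 and b >= 2)
print([(p["a"], p["b"]) for p in pairs])  # [(1, 2), (1, 3), (2, 2), (2, 3)]
```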

▸ facet

Organizes a data set into groups or "facets". The facet transform is useful for creating collections of data that are then passed along to group marks to create hierarchical structure in a visualization. It can also be used (like the aggregate transform) to compute descriptive statistics over subgroups of data. In this sense, it is similar to a "group by" operation in SQL.

Property	Type	Description
groupby	Array<Field>	The fields to use as keys. Each unique set of key values corresponds to a single facet in the output. If unspecified, all data objects will be gathered into a single facet.
summarize	JSON	The summary aggregates to compute for each subgroup. See the aggregate transform for more information.
transform	Array<Transform>	A workflow of data transformations to apply to each subgroup.

Output

The facet transform returns a transformed data set organized into facets. Vega uses a standardized data structure for representing hierarchical or faceted data, which consists of a hierarchy of objects with properties for each key and a values array containing all data objects in the facet.

Example

{"type": "facet", "groupby": ["category"]}

Facets the data according to the values of the category attribute. Given the following input data:

[
  {"category":"A", "value":0},
  {"category":"B", "value":1},
  {"category":"A", "value":2},
  {"category":"B", "value":3}
]

The facet transform produces a hierarchical collection of data arrays:

[
  {
    "category": "A",
    "key": "A",
    "values": [{"category":"A", "value":0}, {"category":"A", "value":2}]
  },
  {
    "category": "B",
    "key": "B",
    "values": [{"category":"B", "value":1}, {"category":"B", "value":3}]
  }
]

Each facet (group) is bundled into a "sentinel" object that includes:

The original key properties and values (category in this example).
A string concatenating all key values (key). This can be useful in conjunction with ordinal scales.
The array of grouped data objects (values).

When a faceted data set is the input data for a group mark, Vega will automatically look up the values array and pass it down as the data source for any marks contained within each enclosing group mark instance.
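The grouping behavior described above can be sketched in Python. The "|" separator used to join multiple key values into key is an assumption; the example uses a single key, so it never shows a separator. The summarize and transform parameters are not modeled, and facet is a hypothetical name.

```python
def facet(data, groupby=()):
    """Group tuples into sentinel objects holding key values, `key`, and `values`."""
    facets = {}
    for datum in data:
        key_values = tuple(datum[f] for f in groupby)
        if key_values not in facets:
            sentinel = dict(zip(groupby, key_values))  # original key properties
            # Assumption: multiple key values are joined with "|" to form `key`.
            sentinel["key"] = "|".join(str(v) for v in key_values)
            sentinel["values"] = []
            facets[key_values] = sentinel
        facets[key_values]["values"].append(datum)
    return list(facets.values())


rows = [
    {"category": "A", "value": 0}, {"category": "B", "value": 1},
    {"category": "A", "value": 2}, {"category": "B", "value": 3},
]
for f in facet(rows, ["category"]):
    print(f["key"], [d["value"] for d in f["values"]])
# A [0, 2]
# B [1, 3]
```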

▸ filter

Filters elements from a data set to remove unwanted items.

Property	Type	Description
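test	Expression	An expression evaluated on each data element (available as datum). Elements for which the expression evaluates to false are removed.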

Output

The filter transform returns a new data set containing only elements that match the filter test.

Examples

{"type": "filter", "test": "datum.x > 10"}

This example retains only data elements for which the field x is greater than 10.

{"type": "filter", "test": "log(datum.y)/LN10 > 2"}

This example retains only data elements for which the base-10 logarithm of y is greater than 2.

▸ fold

Collapses ("folds") one or more data properties into two properties: a key property (containing the original data property name) and a value property (containing the data value). The fold transform is useful for mapping matrix or cross-tabulation data into a standardized format.

Property	Type	Description
fields	Array<Field>	An array of field references indicating the data properties to fold.

Output

The fold transform returns a new array of data objects, with two additional properties: key (an extracted property name), and value (an extracted data value). The names of these new properties can be changed by setting the transform's output map. For example, the parameter "output": {"key": "k", "value": "v"} causes the properties k and v to be used instead of key and value.

Example

{"type": "fold", "fields": ["gold", "silver"]}

This example folds the gold and silver properties. Given the following input data:

[
  {"country": "USA", "gold": 10, "silver": 20}, 
  {"country": "Canada", "gold": 7, "silver": 26}
]

this example will produce the following output:

[
  {"key": "gold", "value": 10, "country": "USA", "gold": 10, "silver": 20},
  {"key": "silver", "value": 20, "country": "USA", "gold": 10, "silver": 20},
  {"key": "gold", "value": 7, "country": "Canada", "gold": 7, "silver": 26},
  {"key": "silver", "value": 26, "country": "Canada", "gold": 7, "silver": 26}
]
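Here is a minimal Python sketch of the fold behavior, reproducing the example above. The key and value arguments stand in for the output map, and fold is a hypothetical function name.

```python
def fold(data, fields, key="key", value="value"):
    """Emit one tuple per (datum, field) pair, keeping the original properties."""
    out = []
    for datum in data:
        for field in fields:
            out.append({**datum, key: field, value: datum[field]})
    return out


medals = [
    {"country": "USA", "gold": 10, "silver": 20},
    {"country": "Canada", "gold": 7, "silver": 26},
]
for row in fold(medals, ["gold", "silver"]):
    print(row["country"], row["key"], row["value"])
# USA gold 10
# USA silver 20
# Canada gold 7
# Canada silver 26
```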

▸ formula

Extends data elements with new values according to a calculation formula.

Property	Type	Description
field	String	The property name in which to store the computed formula value.
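expr	Expression	The expression to evaluate for each data element (available as datum). The result is stored in the property named by field.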

Output

The formula transform returns the input data set, with each element extended with the computed formula value, stored in the property named by field.

Examples

{"type": "formula", "field": "logx", "expr": "log(datum.x)/LN10"}

This example computes the base-10 logarithm of x and stores the result on each datum as the "logx" property.

{"type": "formula", "field": "hr", "expr": "hours(datum.date)"}

This example extracts the hour of the date field, and stores the result on each datum as the hr property.

▸ impute

Performs imputation of missing values.

Property	Type	Description
method	String	The imputation method to use. One of `value`, `mean`, `median`, `min`, `max`.
value	*	The value to use for missing data if the method is `value`.
field	String	The name of the data field to impute.
groupby	Array	A list of fields to group the data into series.
orderby	Array	A list of fields to determine ordering within series.

Output

The impute transform returns the input data set, with additional imputed tuples for missing values.

Examples

{
  "data": [{
    "name": "table",
    "values": [
      {"x": 0, "y": 28, "c":0}, {"x": 0, "y": 55, "c":1},
      {"x": 1, "y": 43, "c":0}, {"x": 1, "y": 91, "c":1},
      {"x": 2, "y": 81, "c":0}, {"x": 2, "y": 53, "c":1},
      {"x": 3, "y": 19, "c":0}
    ],
    "transform": [
      {
        "type": "impute",
        "groupby": ["c"],
        "orderby": ["x"],
        "field": "y",
        "method": "value",
        "value": 500
      }
    ]
  }]
}

In this example, the series with c = 1 has no entry for x = 3, so the transform would impute the following tuple:

{"x": 3, "c": 1, "y": 500}
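The imputation rule can be sketched in Python. The sketch collects the distinct orderby keys across all series, then adds a tuple for each key a series is missing. Computing the mean, median, min and max methods per series is an assumption, and impute is a hypothetical name.

```python
from statistics import mean, median


def impute(data, field, groupby, orderby, method="value", value=None):
    """Add a tuple for each orderby key that is missing from a groupby series."""
    def key_of(datum, fields):
        return tuple(datum[f] for f in fields)

    series = {}      # groupby key -> tuples in that series
    order_keys = []  # distinct orderby keys across all series, first-seen order
    for datum in data:
        series.setdefault(key_of(datum, groupby), []).append(datum)
        if key_of(datum, orderby) not in order_keys:
            order_keys.append(key_of(datum, orderby))

    # Assumption: the statistical methods use the observed values of the same series.
    stats = {"mean": mean, "median": median, "min": min, "max": max}
    imputed = []
    for gkey, rows in series.items():
        present = {key_of(r, orderby) for r in rows}
        observed = [r[field] for r in rows if r.get(field) is not None]
        fill = value if method == "value" else stats[method](observed)
        for okey in order_keys:
            if okey not in present:
                new = dict(zip(groupby, gkey))
                new.update(zip(orderby, okey))
                new[field] = fill
                imputed.append(new)
    return data + imputed


table = [
    {"x": 0, "y": 28, "c": 0}, {"x": 0, "y": 55, "c": 1},
    {"x": 1, "y": 43, "c": 0}, {"x": 1, "y": 91, "c": 1},
    {"x": 2, "y": 81, "c": 0}, {"x": 2, "y": 53, "c": 1},
    {"x": 3, "y": 19, "c": 0},
]
print(impute(table, "y", ["c"], ["x"], method="value", value=500)[len(table):])
# [{'c': 1, 'x': 3, 'y': 500}]
```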

▸ lookup

Extends a primary data set by looking up values on a secondary data set. In other words, performs a join that adds new properties onto the primary data set only. Lookup accepts one or more key values for the primary data set, each of which is then searched for within a single key field of the secondary data set. If a match is found, the full data object in the secondary data set is added as a property of the primary data set.

Property	Type	Description
on	String	The name of the secondary data set to treat as a lookup table.
onKey	Field	The field in the secondary data set to match against the primary data set. If unspecified, the integer indices of the secondary data set will be used instead.
keys	Array<Field>	An array of one or more key fields in the primary data set to match against the secondary data set.
as	Array<String>	An array of field names in which to store the results of the lookup. This array should have the same length as the keys parameter.
default	*	A default value to use if no matching key value is found. If not specified `undefined` is used as the default value.

Output

The lookup transform extends the primary data set with matching records in the secondary (on) data set, and stores the value from the secondary data in a field specified by the as parameter.

The use of lookup often results in the need for nested property accessors. For example, if the primary data object {key: 'a', count: 5} is used to lookup {id: 'a', value: 3.14} and the as parameter is ["obj"], the result will be: {key: 'a', count:5, obj: {id: 'a', value: 3.14}}. To access the extended value property, use obj.value as the field name.

Example

{
  "type": "lookup",
  "on": "unemployment",
  "onKey": "key",
  "keys": ["id"],
  "as": ["value"],
  "default": null
}

This example matches records in the input data with records in the data set named "unemployment", where the values of id (primary data) and key (secondary data) match. Matching records in the secondary data are added to the primary data in the field named "value".
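Here is a Python sketch of the join, using the nested-accessor example above. A few details differ from Vega or are assumptions:

- The parameter as_ has an underscore only because as is a Python keyword.
- None stands in for JavaScript's undefined.
- When several records share an onKey value, the last one wins (an assumption).

The name lookup is hypothetical.

```python
def lookup(primary, secondary, keys, as_, on_key=None, default=None):
    """Join secondary records onto copies of the primary tuples."""
    if on_key is None:
        index = dict(enumerate(secondary))  # match on integer position
    else:
        # Assumption: if several records share a key, the last one wins.
        index = {rec[on_key]: rec for rec in secondary}
    out = []
    for datum in primary:
        extended = dict(datum)
        for key, name in zip(keys, as_):
            extended[name] = index.get(datum[key], default)
        out.append(extended)
    return out


print(lookup([{"key": "a", "count": 5}], [{"id": "a", "value": 3.14}],
             keys=["key"], as_=["obj"], on_key="id"))
# [{'key': 'a', 'count': 5, 'obj': {'id': 'a', 'value': 3.14}}]
```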

▸ rank

Computes an ascending rank score for data tuples based on their observed order and any key fields. This is particularly useful for sorting ordinal scales by multiple key fields.

Property	Type	Description
field	String	The key field used to rank tuples. If undefined, tuples will be ranked in their observed sort order.
normalize	Boolean	If true, calculated ranks will lie in the range [0, 1].

Output

The rank transform extends the primary data set with an additional field called rank. This property's name can be changed by setting the transform's output map. For example, the parameter "output": {"rank": "r"} causes the name r to be used instead of rank.

Example

With the following snippet of a Vega specification

{
  "data": [{
    "name": "table",
    "values": [
      {"x": "A","y": 12}, {"x": "A","y": 32},
      {"x": "B","y": 6},  {"x": "B","y": 35},
      {"x": "C","y": 19}, {"x": "C","y": 66}
    ],
    "transform": [
      {"type": "sort","by": ["y"]},
      {"type": "rank","field": "x"}
    ]
  }]
}

the table data set will contain the following tuples:

[
  {"x":"B", "y":6,  "rank":1},
  {"x":"A", "y":12, "rank":2},
  {"x":"C", "y":19, "rank":3},
  {"x":"A", "y":32, "rank":2},
  {"x":"B", "y":35, "rank":1},
  {"x":"C", "y":66, "rank":3}
]
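A short Python sketch confirms the ranking rule. Each distinct value of field receives the next rank the first time it appears in the (already sorted) data. The normalize option is omitted because its exact formula is not documented here, and rank is a hypothetical name.

```python
def rank(data, field=None, output="rank"):
    """Return copies of the tuples extended with a 1-based ascending rank."""
    ranks = {}
    out = []
    for position, datum in enumerate(data, start=1):
        if field is None:
            r = position  # rank by observed order
        else:
            # Each distinct key gets the next rank the first time it is seen.
            r = ranks.setdefault(datum[field], len(ranks) + 1)
        out.append({**datum, output: r})
    return out


table = [
    {"x": "A", "y": 12}, {"x": "A", "y": 32},
    {"x": "B", "y": 6}, {"x": "B", "y": 35},
    {"x": "C", "y": 19}, {"x": "C", "y": 66},
]
ranked = rank(sorted(table, key=lambda d: d["y"]), field="x")
print([(d["x"], d["y"], d["rank"]) for d in ranked])
# [('B', 6, 1), ('A', 12, 2), ('C', 19, 3), ('A', 32, 2), ('B', 35, 1), ('C', 66, 3)]
```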

▸ sort

Sorts the values of a data set. This is useful if subsequent transforms are order dependent (e.g., with a visual layout) or to determine rendering order of mark items. The sort order of an ordinal scale's domain can be specified on the scale directly.

Property	Type	Description
by	Field | Array<Field>	A list of fields to use as sort criteria. By default, ascending order is assumed. Field names may be prefixed with a "-" (minus) character to indicate descending order.

Output

The sort transform returns the input data set with elements sorted in place. Note: Sorting the elements on a data set will not affect the order of values in an ordinal scale. The latter must be specified on the scale directly.

Example

{"type": "sort", "by": "-_id"}

This example sorts a data set in descending order by the value of the _id field.
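Here is a Python sketch of multi-field sorting with the "-" prefix convention. Like the transform, it sorts in place. The name sort_data is hypothetical, and field values are assumed to be mutually comparable.

```python
from functools import cmp_to_key


def sort_data(data, by):
    """Sort `data` in place; a leading "-" on a field name means descending."""
    if isinstance(by, str):
        by = [by]
    criteria = [(f[1:], -1) if f.startswith("-") else (f, 1) for f in by]

    def compare(a, b):
        for field, direction in criteria:
            if a[field] != b[field]:
                return direction * (1 if a[field] > b[field] else -1)
        return 0  # equal on every criterion; the sort is stable

    data.sort(key=cmp_to_key(compare))
    return data


rows = [{"_id": 1}, {"_id": 3}, {"_id": 2}]
print(sort_data(rows, "-_id"))  # [{'_id': 3}, {'_id': 2}, {'_id': 1}]
```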

▸ treeify

Computes a tree structure over a flat tabular dataset.

Property	Type	Description
groupby	Array	An ordered list of fields by which to group tuples into a tree.

Output

The treeify transform computes a tree structure by adding children and parent pointers to every tuple. The names of these new properties can be changed by setting the transform's output map. For example, the parameter "output": {"children": "c", "parent": "p"} causes the properties c and p to be used instead of children and parent.

Visual Encoding Transforms

Visual encoding transforms can be used to create more advanced visualizations, including layout algorithms and geographic projections.

Visual Encoding Transforms: force, geo, geopath, hierarchy, linkpath, pie, stack, treemap, voronoi, wordcloud

▸ force

Performs force-directed layout for network data. Force-directed layouts treat nodes as charged particles and edges (links among nodes) as springs, and use a physics simulation to determine node positions. The force transform acts on two data sets: one containing nodes and one containing links. Apply the transform to the node data, and include the name of the link data as a transform parameter.

Note that the force transform modifies the nodes data only; it does not modify any properties of the links data. To lay out the links, you can apply a lookup transform to the links data to join in the node data, and then a linkpath transform to route each edge path.

Property	Type	Description
links	String	The name of the link (edge) data set. Objects in this data set must have appropriately defined `source` and `target` attributes.
size	Array	The dimensions [width, height] of this force layout. Defaults to the width and height of the enclosing data rectangle or group.
bound	Boolean	A flag indicating if the layout should constrain node positions to layout size. True by default.
interactive	Boolean	A flag indicating if the layout should run in interactive mode, using an animated layout until convergence. False by default.
iterations	Number	The number of iterations to run the force directed layout when not in interactive mode. The default value is 500.
charge	Number | Field	The strength of the charge each node exerts. If the parameter value is a number, it will be used for all nodes. If the parameter is a field definition, the charge will be determined by the data. The default value is -30. Negative values indicate a repulsive force, positive values an attractive force.
linkDistance	Number | Field	Determines the length of edges, in pixels. If the parameter value is a number, it will be used for all edges. If the parameter is a field definition, the linkDistance will be determined by the data. The default value is 20.
linkStrength	Number | Field	Determines the tension of edges (the spring constant). If the parameter value is a number, it will be used for all edges. If the parameter is a field definition, the linkStrength will be determined by the data. The default value is 1.
friction	Number	The strength of the friction force used to stabilize the layout.
theta	Number	The theta parameter for the Barnes-Hut algorithm, which is used to compute charge forces between nodes.
gravity	Number	The strength of the pseudo-gravity force that pulls nodes towards the center of the layout area.
alpha	Number	A "temperature" parameter that determines how much node positions are adjusted at each step.
active	Signal	A signal value with information about a node that is being interacted with (e.g., dragged). The signal value should take the form of either `{id: tuple_id}` or `{id: tuple_id, x: x_coord, y: y_coord, update: true}`. If only an `id` is specified, the node is treated as fixed, but no other changes are made. If the `x` and `y` coordinates are provided, the node is moved to the given position. If the `update` flag is true, the layout will re-start at the alpha temperature.
fixed	String	The name of a data set containing nodes whose layout should be fixed. This data set may be populated in response to interactions in order to make nodes stay at a specific coordinate.

Output

The force transform sets the following values on each node datum:

Name	Default Property	Description
x	layout_x	the x-coordinate of the current node position.
y	layout_y	the y-coordinate of the current node position.

These properties may be renamed by specifying an output map. For example, "output": {"x": "xcoord", "y": "ycoord"} will use "xcoord" and "ycoord" as the output properties instead of "layout_x" and "layout_y".

Example

{"type": "force", "links": "edges", "linkDistance": 70, "charge": -100, "iterations": 1000}

This example assumes a data set named "edges" has already been defined, and has appropriate source and target attributes that reference the graph nodes.

▸ geo

Performs a cartographic projection. Given longitude and latitude values, sets corresponding x and y properties for a mark.

Property	Type	Description
projection	String	The type of cartographic projection to use. Defaults to `"mercator"`. The geo transform accepts any projection supported by the D3 projection plug-in (e.g., `albersUsa`, `albers`, `hammer`, `winkel3`).
lon	Field	The input longitude values.
lat	Field	The input latitude values.
center	Array	The center of the projection. The value should be a two-element array of numbers.
translate	Array	The translation of the projection. The value should be a two-element array of numbers.
scale	Number	The scale of the projection.
rotate	Number	The rotation of the projection.
precision	Number	The desired precision of the projection.
clipAngle	Number	The clip angle of the projection.

Output

The geo transform sets the following values on each datum:

Name	Default Property	Description
x	layout_x	the x-coordinate of the projection.
y	layout_y	the y-coordinate of the projection.

Example

{
  "type": "geo",
  "lat": "latitude",
  "lon": "longitude",
  "projection": "winkel3",
  "scale": 300,
  "translate": [960, 500]
}

This example computes a Winkel3 projection for lat/lon pairs stored in the latitude and longitude attributes.

▸ geopath

Creates paths for geographic regions, such as countries, states and counties. Given a GeoJSON Feature data value, produces a corresponding path definition, subject to a specified cartographic projection. The geopath transform is intended for use with the path mark type.

Property	Type	Description
field	Field	The data field containing GeoJSON Feature data.
projection	String	The type of cartographic projection to use. Defaults to `"mercator"`. The geopath transform accepts any projection supported by the D3 projection plug-in (e.g., `albersUsa`, `albers`, `hammer`, `winkel3`).
center	Array	The center of the projection. The value should be a two-element array of numbers.
translate	Array	The translation of the projection. The value should be a two-element array of numbers.
scale	Number	The scale of the projection.
rotate	Number	The rotation of the projection.
precision	Number	The desired precision of the projection.
clipAngle	Number	The clip angle of the projection.

Output

The geopath transform sets the following values on each datum:

Name	Default Property	Description
path	layout_path	the resulting path, as an SVG path string.

Example

{
  "type": "geopath",
  "field": "data",
  "projection": "winkel3",
  "scale": 300,
  "translate": [960, 500]
}

This example creates path definitions using a Winkel3 projection applied to GeoJSON data stored in the data attribute of input data elements.

▸ hierarchy

Computes tidy, cluster, and partition layouts.

Property	Type	Description
children	String	The data field for the children node array (default: `children`).
parent	String	The data field for the parent node (default: `parent`).
sort	Array	A list of fields to use as sort criteria for sibling nodes.
field	String	The data field containing the value that determines the area of each leaf-level node in partition layouts.
mode	String	The layout algorithm mode to use. One of `tidy` (default), `cluster`, or `partition`.
orient	String	The layout orientation to use. One of `cartesian` (default) or `radial`.
size	Array	The dimensions of the tree layout. Defaults to the top-level `[width, height]`.
nodesize	Array	Sets a fixed x,y size for each node (overrides the size parameter).

Output

The hierarchy transform sets the following values on each datum:

Name	Default Property	Description
x	layout_x	the x-coordinate of the node.
y	layout_y	the y-coordinate of the node.
width	layout_width	the width of the node.
height	layout_height	the height of the node.
depth	layout_depth	the depth of the node within the hierarchy.

Example

See this link for example specifications of cartesian & radial trees, cluster & radial dendrograms, and a time-slice treemap using treeify.

▸ linkpath

Computes a path definition for connecting nodes within a node-link network or tree diagram.

Property	Type	Description
sourceX	Field	The data field for the source x-coordinate of this link.
sourceY	Field	The data field for the source y-coordinate of this link.
targetX	Field	The data field for the target x-coordinate of this link.
targetY	Field	The data field for the target y-coordinate of this link.
shape	String	A string describing the path shape to use. One of `"line"` (default), `"curve"`, `"diagonal"`, `"diagonalX"`, or `"diagonalY"`.
tension	Number	A tension parameter in the range [0,1] for the "tightness" of `"curve"`-shaped links.

Output

The linkpath transform sets the following values on each datum:

Name	Default Property	Description
path	layout_path	the resulting path, as an SVG path string.

Example

{"type": "linkpath", "shape": "line"}

Creates straight-line links.

{"type": "linkpath", "shape": "curve", "tension": 0.15}

Creates curved links with a limited amount of curvature (tension = 0.15).
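The table above does not say where Vega places control points for each shape. The Python sketch below assumes the common D3-style diagonal, in which both cubic Bézier control points sit halfway between source and target along x (`diagonalX`) or along y (`diagonalY`). The `"curve"` and `"diagonal"` shapes are omitted. The `linkpath` function here is a hypothetical illustration, not Vega's API.

```python
def fmt(v: float) -> str:
    # Compact number formatting (note: values >= 1e6 would switch to exponent form).
    return f"{v:g}"


def linkpath(sx: float, sy: float, tx: float, ty: float, shape: str = "line") -> str:
    """Return an SVG path string linking (sx, sy) to (tx, ty)."""
    s, t = f"{fmt(sx)},{fmt(sy)}", f"{fmt(tx)},{fmt(ty)}"
    if shape == "line":
        return f"M{s}L{t}"
    if shape == "diagonalX":
        m = (sx + tx) / 2  # control points share the midpoint x-coordinate
        return f"M{s}C{fmt(m)},{fmt(sy)} {fmt(m)},{fmt(ty)} {t}"
    if shape == "diagonalY":
        m = (sy + ty) / 2  # control points share the midpoint y-coordinate
        return f"M{s}C{fmt(sx)},{fmt(m)} {fmt(tx)},{fmt(m)} {t}"
    raise ValueError(f"shape not covered by this sketch: {shape}")
```

For example, `linkpath(0, 0, 10, 20, "diagonalX")` yields `M0,0C5,0 5,20 10,20`. This is a smooth S-curve that leaves the source horizontally and enters the target horizontally.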

▸ pie

Computes a pie chart layout: given a set of data values, computes the start, end, and mid angles of each slice. The pie transform is intended for use with the arc mark type, whose startAngle and endAngle properties can be set from the computed angles.

Property	Type	Description
field	Field	The data values from this field will be encoded as angular spans. If this property is omitted, all pie slices will have equal spans.
startAngle	Number	A starting angle, in radians, for angular span calculations (default 0).
endAngle	Number	An ending angle, in radians, for angular span calculations (default 2π).
sort	Boolean	If true, sorts the data prior to computing angles.

Output

The pie transform sets the following values on each datum:

Name	Default Property	Description
start	layout_start	the start angle of the pie slice (in radians).
end	layout_end	the end angle of the pie slice (in radians).
mid	layout_mid	the mid angle of the pie slice (in radians).

Examples

{"type": "pie", "field": "price"}

Computes angular widths for pie slices based on the field price.

{"type": "pie"}

Computes angular widths for equal-width pie slices.
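The angle computation amounts to a running sum of normalized values scaled to the span between `startAngle` and `endAngle`. Below is a minimal Python sketch of that idea, with data as a list of dicts. The `pie` function is hypothetical, and the ascending direction for `sort` is an assumption, because the table does not state one.

```python
import math
from typing import Dict, List, Optional


def pie(data: List[Dict], field: Optional[str] = None,
        start_angle: float = 0.0, end_angle: float = 2 * math.pi,
        sort: bool = False) -> List[Dict]:
    """Set layout_start/layout_end/layout_mid (radians) on each datum, in order."""
    if sort and field is not None:
        # Ascending sort by value is an assumption of this sketch.
        data = sorted(data, key=lambda d: d[field])
    # Without a field, every slice gets weight 1, i.e. equal spans.
    weights = [d[field] if field is not None else 1.0 for d in data]
    total = sum(weights)
    span = end_angle - start_angle
    angle = start_angle
    for d, w in zip(data, weights):
        width = span * w / total if total else 0.0
        d["layout_start"] = angle
        d["layout_end"] = angle + width
        d["layout_mid"] = angle + width / 2
        angle += width
    return data
```

With `pie([{"price": 1}, {"price": 3}], "price")`, the first slice spans [0, π/2] and the second spans [π/2, 2π].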

▸ stack

Computes layout values for stacked graphs, as in stacked bar charts or stream graphs.

Property	Type	Description
groupby	Array<Field>	A list of fields to partition the data into groups (stacks). When values are stacked vertically, this corresponds to the x-coordinates.
field	Field	The data field that determines the thickness or height of each stack.
sortby	Array<Field>	A list of fields to determine the order of stack layers.
offset	String	The baseline offset style. One of `"zero"` (default), `"center"`, or `"normalize"`. The `"center"` offset will center the stacks. The `"normalize"` offset will compute percentage values for each stack point; the output values will be in the range [0,1].

Output

The stack transform sets the following values on each datum:

Name	Default Property	Description
start	layout_start	the start coordinate of the stack.
end	layout_end	the end coordinate of the stack.
mid	layout_mid	the mid coordinate of the stack.

Example

{"type": "stack", "groupby": ["x"], "sortby": ["c"], "field": "y"}

Stacks the values of field y within each group of x, ordering the layers by field c.
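The three `offset` modes can be sketched in Python as follows. This is a hypothetical helper, not Vega's implementation. It groups rows by the `groupby` key, orders layers by `sortby` (ascending), and accumulates `field` values upward from a baseline. For `"center"`, the baseline used here is (tallest stack - this stack) / 2, which mirrors D3's "silhouette" offset. That choice is an assumption, because the description above only says the stacks are centered.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stack(data: List[Dict], groupby: Sequence[str], field: str,
          sortby: Sequence[str] = (), offset: str = "zero") -> List[Dict]:
    # Partition the data into stacks, one per distinct groupby key.
    stacks = defaultdict(list)
    for d in data:
        stacks[tuple(d[g] for g in groupby)].append(d)

    sums = {k: sum(d[field] for d in rows) for k, rows in stacks.items()}
    tallest = max(sums.values(), default=0)

    for key, rows in stacks.items():
        # Layer order within a stack follows the sortby fields (ascending).
        rows.sort(key=lambda d: tuple(d[s] for s in sortby))
        total = sums[key]
        # Baseline: 0 for "zero" and "normalize"; "center" is an assumption (see above).
        base = (tallest - total) / 2 if offset == "center" else 0.0
        # "normalize" rescales each stack so its layers sum to 1.
        scale = 1.0 / total if offset == "normalize" and total else 1.0
        y = base
        for d in rows:
            h = d[field] * scale
            d["layout_start"], d["layout_end"], d["layout_mid"] = y, y + h, y + h / 2
            y += h
    return data
```

With `offset="normalize"`, the last layer of every stack ends at exactly 1, which matches the [0,1] output range described above.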

▸ treemap

Computes a squarified treemap layout. The treemap transform is intended for visualizing hierarchical or faceted data with the rect mark type.

Property	Type	Description
field	Field	The values to use to determine the area of each leaf-level treemap cell.
padding	Number | Array	The padding (in pixels) to provide around internal nodes in the treemap, for example to leave space for labeling the internal nodes. The padding value can either be a single number or an array of four numbers [top, right, bottom, left]. The default padding is zero pixels.
ratio	Number	The target aspect ratio for the layout to optimize. The default value is the golden ratio, (1 + sqrt(5))/2 ≈ 1.618.
round	Boolean	If true, treemap cell dimensions will be rounded to integer pixels.
size	Array	The dimensions [width, height] of the treemap layout. Defaults to the width and height of the enclosing data rectangle or group.
sticky	Boolean	If true, repeated runs of the treemap will use cached partition boundaries. This results in smoother transition animations, at the cost of unoptimized aspect ratios. If sticky is used, do not reuse the same treemap encoder instance across data sets.
children	Field	A data field that represents the children array, `children` by default.
sort	Array<Field>	A list of fields to use as sort criteria for sibling nodes. By default, ascending order is assumed. Field names may be prepended with a "-" (minus) character to indicate descending order.

Output

The treemap transform sets the following values on each datum:

Name	Default Property	Description
x	layout_x	the x-coordinate of the treemap rectangle.
y	layout_y	the y-coordinate of the treemap rectangle.
width	layout_width	the width of the treemap rectangle.
height	layout_height	the height of the treemap rectangle.
depth	layout_depth	the depth of the node within the tree.

Example

{"type": "treemap", "field": "price"}

Computes a treemap layout where elements are sized according to the field price. This example assumes the input data is hierarchical or has already been suitably faceted.
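A squarified layout greedily packs cells into rows along the shorter side of the remaining rectangle. It closes a row as soon as adding another cell would worsen that row's worst aspect ratio. The Python sketch below covers only a single flat level of positive values. It targets an aspect ratio of 1 (the original squarified criterion) rather than Vega's golden-ratio default, and it ignores `padding`, `round`, and `sticky`. The function names are illustrative and not Vega's API.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def worst(row: List[float], side: float) -> float:
    """Worst aspect ratio in a row of cell areas laid along a side of length `side`."""
    s = sum(row)
    return max(side * side * max(row) / (s * s), (s * s) / (side * side * min(row)))


def layout_row(row: List[float], x: float, y: float, w: float, h: float):
    """Place a row along the shorter side of (x, y, w, h); return its cells and the leftover rect."""
    s = sum(row)
    cells: List[Rect] = []
    if w >= h:  # shorter side is vertical: the row becomes a column strip on the left
        strip = s / h
        cy = y
        for a in row:
            cells.append((x, cy, strip, a / strip))
            cy += a / strip
        return cells, (x + strip, y, w - strip, h)
    strip = s / w  # shorter side is horizontal: the row becomes a strip along the top
    cx = x
    for a in row:
        cells.append((cx, y, a / strip, strip))
        cx += a / strip
    return cells, (x, y + strip, w, h - strip)


def squarify(values: Sequence[float], width: float, height: float) -> List[Rect]:
    """Return one rectangle per value (same order as `values`); values must be positive."""
    total = float(sum(values))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    out: List[Rect] = [(0.0, 0.0, 0.0, 0.0)] * len(values)
    rect: Rect = (0.0, 0.0, float(width), float(height))
    row: List[float] = []
    row_ids: List[int] = []

    def flush() -> None:
        nonlocal rect, row, row_ids
        cells, rect = layout_row(row, *rect)
        for i, cell in zip(row_ids, cells):
            out[i] = cell
        row, row_ids = [], []

    for i in order:
        area = values[i] * width * height / total  # scale values to pixel area
        side = min(rect[2], rect[3])
        if row and worst(row + [area], side) > worst(row, side):
            flush()  # adding this cell would make the row less square
        row.append(area)
        row_ids.append(i)
    if row:
        flush()
    return out
```

For the values [6, 6, 4, 3, 2, 2, 1] in a 6×4 area, the two largest values become 3×2 cells stacked in a left column. The remaining values fill the right half.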

▸ voronoi

Computes a Voronoi diagram for a set of input seed points and returns the computed cell paths.

Property	Type	Description
x	Field	The data field for seed point x-coordinates.
y	Field	The data field for seed point y-coordinates.
clipExtent	Array<Array<Number>>	An array containing the minimum and maximum coordinates for clipping the extreme edges of the Voronoi diagram. For example, `[[-1e5, -1e5], [1e5, 1e5]]` will clip the Voronoi diagram at 100,000 pixels in both the negative and positive directions.

Output

The voronoi transform sets the following values on each datum:

Name	Default Property	Description
path	layout_path	the Voronoi cell path, as an SVG path string.

Example

{"type": "voronoi", "x": "layout_x", "y": "layout_y"}

Adds Voronoi cell paths based on previously computed layout coordinates.
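To show what each cell path contains, here is a deliberately simple O(n²) construction in Python. It is not the sweep-line algorithm a production library would use. Each cell starts as the `clipExtent` rectangle. It is then clipped by the perpendicular-bisector half-plane of every other seed, one line at a time, using Sutherland-Hodgman clipping. The sketch assumes distinct seeds, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def clip_half_plane(poly: List[Point], a: float, b: float, c: float) -> List[Point]:
    """Keep the part of convex polygon `poly` where a*x + b*y <= c (one Sutherland-Hodgman pass)."""
    out: List[Point] = []
    for k in range(len(poly)):
        p, q = poly[k], poly[(k + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)
        if (fp < 0 < fq) or (fq < 0 < fp):  # edge crosses the bisector
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_paths(seeds: Sequence[Point], clip_extent) -> List[str]:
    """Return one SVG path string per seed, clipped to [[x0, y0], [x1, y1]]."""
    (x0, y0), (x1, y1) = clip_extent
    paths = []
    for i, (px, py) in enumerate(seeds):
        cell = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        for j, (qx, qy) in enumerate(seeds):
            if i != j and cell:
                # Points nearer p than q satisfy 2(q - p) . v <= |q|^2 - |p|^2.
                cell = clip_half_plane(cell, 2 * (qx - px), 2 * (qy - py),
                                       qx * qx + qy * qy - px * px - py * py)
        paths.append(("M" + "L".join(f"{x:g},{y:g}" for x, y in cell) + "Z") if cell else "")
    return paths
```

Two seeds at (0, 0) and (10, 0) split the extent `[[-10, -10], [20, 10]]` along the bisector x = 5. The first cell is `M-10,-10L5,-10L5,10L-10,10Z`.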

▸ wordcloud

Computes a word cloud layout, similar to Wordle. The wordcloud transform is intended for visualizing words or phrases with the text mark type.

Property	Type	Description
font	String | Field	The font face to use within the word cloud, or a field containing the font value.
fontSize	Number | Field	The font size for a word, in pixels, or a field containing the size value.
fontStyle	String | Field	The font style (e.g., `"italic"`) to use, or a field containing the style value.
fontWeight	String | Field	The font weight (e.g., `"bold"`) to use, or a field containing the weight value.
fontScale	Array<Number> | Null	The minimum and maximum font size to use. The values of the fontSize parameter will be rescaled to this range according to a square-root transform (so that area more accurately represents the value). The default value is `[10, 50]`. If set to `null`, the fontSize values will be used as-is.
padding	Number | Array<Number>	The padding (in pixels) to provide around text in the word cloud. The padding value can either be a single number or an array of four numbers [top, right, bottom, left]. The default padding is zero pixels.
rotate	Field	The data field containing a rotation angle for a word (assumed zero by default).
size	Array<Number>	The dimensions [width, height] of the wordcloud layout.
spiral	String	The spiral type to use for the layout. One of `"archimedean"` (the default) or `"rectangular"`.
text	Field	The data field containing the text to visualize for each datum.

Output

The wordcloud transform sets the following values on each datum:

Name	Default Property	Description
x	layout_x	the x-coordinate of the center of the word.
y	layout_y	the y-coordinate of the center of the word.
font	layout_font	the font for the word.
fontSize	layout_fontSize	the fontSize for the word (in pixels).
fontStyle	layout_fontStyle	the fontStyle for the word.
fontWeight	layout_fontWeight	the fontWeight for the word.
rotate	layout_rotate	the angle by which to rotate the word (in degrees).

Example

{
  "type": "wordcloud",
  "text": "word",
  "font": "Helvetica Neue",
  "fontSize": "count",
  "fontScale": [10, 64]
}

Computes a word cloud layout for the word property, sized by the count property. The resulting font sizes will be scaled to the range [10, 64] pixels.
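The `fontScale` rescaling can be sketched as a square-root scale from the `fontSize` values onto `[min, max]` pixels. This minimal Python sketch makes two assumptions. First, the input domain is taken as the minimum and maximum of the data, because the description above does not say how Vega picks the domain. Second, a constant input is mapped to the lower bound. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sqrt_font_scale(values: Sequence[float],
                    font_range: Tuple[float, float] = (10.0, 50.0)) -> List[float]:
    """Rescale non-negative fontSize values onto font_range with a square-root transform."""
    lo, hi = font_range
    s_min, s_max = math.sqrt(min(values)), math.sqrt(max(values))
    if s_max == s_min:
        return [float(lo) for _ in values]  # assumption for a degenerate domain
    # Interpolate linearly in sqrt space, i.e. the square-root transform described above.
    return [lo + (math.sqrt(v) - s_min) / (s_max - s_min) * (hi - lo) for v in values]
```

For counts of 1, 4, and 9 with the example's `fontScale` of `[10, 64]`, `sqrt_font_scale([1, 4, 9], (10, 64))` gives 10, 37, and 64 pixels.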
