Tutorial - smclements/vega GitHub Wiki

This tutorial is intended to introduce you to the basics of Vega. We'll look at a bar chart with tooltips, and deconstruct it into its component elements. After completing the tutorial, you should be ready to start exploring and modifying Vega visualizations.

Bar Chart Specification

Here is one of the most basic (but also most useful!) forms of visualization, the humble bar chart:

[Image: Vega bar chart]

Here is the Vega specification that defines this bar chart. First take a look over the full definition; we'll then examine each part in turn.

{
  "width": 400,
  "height": 200,
  "padding": {"top": 10, "left": 30, "bottom": 20, "right": 10},

  "data": [
    {
      "name": "table",
      "values": [
        {"category":"A", "amount":28},
        {"category":"B", "amount":55},
        {"category":"C", "amount":43},
        {"category":"D", "amount":91},
        {"category":"E", "amount":81},
        {"category":"F", "amount":53},
        {"category":"G", "amount":19},
        {"category":"H", "amount":87},
        {"category":"I", "amount":52}
      ]
    }
  ],

  "signals": [
    {
      "name": "tooltip",
      "init": {},
      "streams": [
        {"type": "rect:mouseover", "expr": "datum"},
        {"type": "rect:mouseout", "expr": "{}"}
      ]
    }
  ],

  "predicates": [
    {
      "name": "ifTooltip", "type": "==",
      "operands": [{"signal": "tooltip._id"}, {"arg": "id"}]
    }
  ],

  "scales": [
    { "name": "xscale", "type": "ordinal", "range": "width",
      "domain": {"data": "table", "field": "category"} },
    { "name": "yscale", "range": "height", "nice": true,
      "domain": {"data": "table", "field": "amount"} }
  ],

  "axes": [
    { "type": "x", "scale": "xscale" },
    { "type": "y", "scale": "yscale" }
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "properties": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": true, "offset": -1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value":0}
        },
        "update": { "fill": {"value": "steelblue"} },
        "hover": { "fill": {"value": "red"} }
      }
    },
    {
      "type": "text",
      "properties": {
        "enter": {
          "align": {"value": "center"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category"},
          "dx": {"scale": "xscale", "band": true, "mult": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -5},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": {
            "rule": [
              {
                "predicate": {"name": "ifTooltip", "id": {"value": null}},
                "value": 0
              },
              {"value": 1}
            ]
          }
        }
      }
    }
  ]
}

Visualization

The first set of properties determines the size of the visualization. The total width is the width plus the left and right padding, and the total height is the height plus the top and bottom padding. The padding property provides internal margins for the visualization view. If the padding property is omitted, Vega will automatically compute the padding necessary to include all marks and labels.

  "width": 400,
  "height": 200,
  "padding": {"top": 10, "left": 30, "bottom": 20, "right": 10},
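As a quick check of that arithmetic, the sketch below computes the overall size from the width, height, and padding values above. It is plain TypeScript rather than part of the Vega API; the `Padding` and `ViewSize` types and the `totalSize` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical types mirroring the "width", "height", and "padding" spec properties.
interface Padding {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

interface ViewSize {
  width: number;
  height: number;
}

// Horizontal padding adds to the width; vertical padding adds to the height.
function totalSize(width: number, height: number, padding: Padding): ViewSize {
  return {
    width: padding.left + width + padding.right,
    height: padding.top + height + padding.bottom,
  };
}

const size = totalSize(400, 200, { top: 10, left: 30, bottom: 20, right: 10 });
console.log(size); // { width: 440, height: 230 }
```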

Though not shown here, one can use the viewport property to determine the visible, scrollable region. For example, "viewport": [500, 500] will limit the on-screen size of the visualization to 500 by 500 pixels. If the actual visualization is larger than that, the view will be scrollable.

Data

The data property is an array of data definitions. Each entry in the data array must be an object with a unique name for the data set. As shown here, data can be directly defined inline using the values property. In this example, we have an array of data objects with category (a string label) and amount (a number) fields.

  "data": [
    {
      "name": "table",
      "values": [
        {"category":"A", "amount":28},
        {"category":"B", "amount":55},
        {"category":"C", "amount":43},
        {"category":"D", "amount":91},
        {"category":"E", "amount":81},
        {"category":"F", "amount":53},
        {"category":"G", "amount":19},
        {"category":"H", "amount":87},
        {"category":"I", "amount":52}
      ]
    }
  ],

In Vega specifications, data can be:

  • loaded from the web by using the url property (including JSON and CSV files),
  • derived from a previously defined data set using the source property,
  • or left undefined and dynamically set when the visualization is constructed.

Only one of the values, url, or source properties may be defined for a given data set.

When a data set is loaded into Vega, it is also further processed. Each individual data object (or "datum") is wrapped within a new object, such that the new object inherits from the original data object (note: internally this is done using prototypal inheritance via Object.create). As a result, fields of the raw data such as category or amount can still be referenced as category or amount. The container object can be extended with additional fields, such as the results of layout or statistics calculations. This allows new data fields to be added without modifying the input data. For example, each container object includes a field "_id", which is a unique id assigned to the datum by Vega.
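The wrapping step can be pictured with a few lines of TypeScript. This is a simplified sketch of the idea described above, not Vega's actual ingest code; the `wrapDatum` helper and its counter-based id scheme are assumptions made for illustration.

```typescript
// A raw input record, as it appears in the "values" array.
interface RawDatum {
  category: string;
  amount: number;
}

// The container adds bookkeeping fields such as _id on top of the raw fields.
type Wrapped<T> = T & { _id: number };

let nextId = 1; // hypothetical id counter; Vega's real id assignment may differ

function wrapDatum<T extends object>(input: T): Wrapped<T> {
  // Object.create makes `input` the prototype, so its fields remain readable
  // through the wrapper, while new fields are stored only on the wrapper.
  const wrapped = Object.create(input) as Wrapped<T>;
  wrapped._id = nextId++;
  return wrapped;
}

const raw: RawDatum = { category: "A", amount: 28 };
const datum = wrapDatum(raw);

console.log(datum.category, datum.amount, datum._id); // A 28 1
console.log("_id" in raw); // false: the input object is left untouched
```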

Data sets in Vega can also be modified using a collection of Data Transforms such as filter, grouping, statistics and layout operations. Transformations are specified using the transform property, which takes an array of transform definitions.

For more details see the Data and Data Transforms documentation.

Scales

Scale functions map data values to visual values, such as pixel positions or colors:

  "scales": [
    { "name": "xscale", "type": "ordinal", "range": "width",
      "domain": {"data": "table", "field": "category"} },
    { "name": "yscale", "range": "height", "nice": true,
      "domain": {"data": "table", "field": "amount"} }
  ],

Here we've defined two scales, one each for the X and Y axes. The X axis uses an ordinal scale, which maps a domain of discrete, ordered elements (in this case letters) to a visual range. The Y axis uses a quantitative linear scale. Linear is the default scale type, so it is not explicitly specified in the Y scale definition above.

Note that each scale definition should have a unique name. (Actually, to be precise, scale definitions nested within group marks can repeat names to override previously defined scales, but that is a more advanced concept.)

The range settings of "width" and "height" are conveniences provided by Vega and, in this case, correspond to the pixel ranges [0, 400] and [200, 0], as defined by the size of the visualization. The vertical range runs from 200 down to 0 because pixel y-coordinates increase downward, so larger values are drawn higher up. Ranges can also be defined explicitly as arrays of values: two-element numerical arrays should be used for spatial mappings, while longer arrays (e.g., of RGB hex values like "#ffa804") can be used for ordinal mappings such as color palettes.
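To make the mapping concrete, here is a minimal linear-scale function in TypeScript. It sketches the underlying arithmetic only and is not Vega's scale implementation. It assumes the Y domain has already been resolved to [0, 100] (the "nice" result discussed below). Because screen y-coordinates grow downward, the sketch maps that domain onto a range running from 200 (bottom) to 0 (top), which is what lets larger amounts produce taller bars.

```typescript
// Maps a value from a numeric domain onto a numeric range by linear interpolation.
// A range whose first element is larger than its second is simply "flipped".
function linearScale(domain: [number, number], range: [number, number]) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value: number): number => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Assumed resolved domain [0, 100]; pixel range from 200 (bottom) to 0 (top).
const yscale = linearScale([0, 100], [200, 0]);

console.log(yscale(0));   // 200: the bottom edge of the plot area
console.log(yscale(91));  // 18: the top of the tallest bar (category D)
console.log(yscale(100)); // 0: the top edge of the plot area
```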

The domain property determines the input domain for the scale. The domain can be defined directly as an array of values (a quantitative range or list of ordinal values) or determined dynamically from the data. In the example above, the minimum and maximum values for the field amount from the data set named table are used as the domain. By default, quantitative scales also automatically include the zero value. To disable this feature, include the property "zero": false in the scale definition.

Finally, notice that the Y scale includes the property "nice": true. This optional property tells Vega that the scale domain can be made "nice" so that it is more human-friendly and readable. For example, if the raw data domain is [0, 94.345], it is made "nicer" as [0, 100].
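The zero and nice behaviors can be sketched as two small steps applied to the raw extent of the data. The tick-step rule below is modeled on the heuristic used by d3, which Vega builds on, but Vega's own implementation may differ in details; `extent`, `includeZero`, `tickStep`, and `niceDomain` are hypothetical helper names.

```typescript
// Minimum and maximum of a numeric field.
function extent(values: number[]): [number, number] {
  return [Math.min(...values), Math.max(...values)];
}

// Quantitative scales include zero by default (disable with "zero": false).
function includeZero([lo, hi]: [number, number]): [number, number] {
  return [Math.min(lo, 0), Math.max(hi, 0)];
}

// Picks a "round" step (1, 2, 5, or 10 times a power of ten) for about `count` ticks.
function tickStep(lo: number, hi: number, count: number): number {
  const rawStep = Math.abs(hi - lo) / count;
  let step = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const error = rawStep / step;
  if (error >= Math.sqrt(50)) step *= 10;
  else if (error >= Math.sqrt(10)) step *= 5;
  else if (error >= Math.sqrt(2)) step *= 2;
  return step;
}

// Extends the domain outward to the nearest multiples of the chosen step.
function niceDomain([lo, hi]: [number, number], count = 10): [number, number] {
  const step = tickStep(lo, hi, count);
  return [Math.floor(lo / step) * step, Math.ceil(hi / step) * step];
}

const amounts = [28, 55, 43, 91, 81, 53, 19, 87, 52];
console.log(extent(amounts));                          // [ 19, 91 ]
console.log(niceDomain(includeZero(extent(amounts)))); // [ 0, 100 ]
console.log(niceDomain([0, 94.345]));                  // [ 0, 100 ]
```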

For more details, see the Scales documentation.

Axes

Axes visualize scales using ticks and labels that can help viewers interpret a chart.

  "axes": [
    { "type": "x", "scale": "xscale" },
    { "type": "y", "scale": "yscale" }
  ],

Vega supports standard x and y axis types for horizontal and vertical axes, respectively. At minimum, an axis definition must specify the axis type and the scale to visualize. Based on their type, axes are automatically positioned on the edges of a visualization (or enclosing group mark, in more advanced situations). To ensure axes are visible, you may need to appropriately set the padding values for the visualization.

Now let's look at how we might further customize the axes:

  "axes": [
    { "type": "x", "scale": "xscale" },
    { "type": "y", "scale": "yscale",
      "ticks": 5, "orient": "right", "offset": 6 }
  ],

Here we've adjusted the Y axis in multiple ways. By setting "ticks": 5, we've requested that the axis show roughly five tick marks, rather than the ten or so shown previously. By setting "orient": "right", we've requested that the axis be placed on the right side of the chart, rather than the default left position. Finally, setting "offset": 6 adjusts the axis position, in this case moving it 6 pixels to the right. Here's what the modified visualization looks like:

[Image: Vega bar chart with the customized Y axis on the right]
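To see why a tick count is only a request, the sketch below picks a "round" step size for a requested count and lists the resulting ticks over the Y domain [0, 100]. The step heuristic is modeled on d3's, which Vega builds on, but it is an assumption here rather than Vega's exact code; `tickStep` and `ticks` are hypothetical helper names.

```typescript
// Round step sizes: 1, 2, 5, or 10 times a power of ten, for about `count` ticks.
function tickStep(lo: number, hi: number, count: number): number {
  const rawStep = Math.abs(hi - lo) / count;
  let step = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const error = rawStep / step;
  if (error >= Math.sqrt(50)) step *= 10;
  else if (error >= Math.sqrt(10)) step *= 5;
  else if (error >= Math.sqrt(2)) step *= 2;
  return step;
}

// All multiples of the chosen step that fall inside [lo, hi].
function ticks(lo: number, hi: number, count: number): number[] {
  const step = tickStep(lo, hi, count);
  const out: number[] = [];
  for (let i = Math.ceil(lo / step); i <= Math.floor(hi / step); i++) {
    out.push(i * step);
  }
  return out;
}

console.log(ticks(0, 100, 10)); // 11 ticks: 0, 10, 20, ..., 100
console.log(ticks(0, 100, 5));  // 6 ticks: 0, 20, 40, 60, 80, 100
```

Asking for five ticks yields six here, because the step is rounded to 20 and both ends of the domain are multiples of it.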

For more details, see the Axes documentation.

Marks

Marks are the primary elements of a visualization: they are graphical primitives whose properties (such as position, size, shape, and color) can be used to visually encode data. Similar to previous systems like Protovis, Vega provides a set of marks that serve as building blocks that can be combined to form rich visualizations. Here, we simply use rectangles (rect marks) to construct a bar chart.

Every mark must have a type property, which determines which kind of mark (rectangle, line, area, etc.) to use. Next, we must specify the data to be visualized using the from property. In many cases, one simply needs to reference a named data set defined in the earlier top-level data property. In addition, a from specification can include a transform definition to further manipulate the data (see the Data documentation for more details).

  "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "properties": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": true, "offset": -1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value":0}
        },
        "update": { "fill": {"value": "steelblue"} },
        "hover": { "fill": {"value": "red"} }
      }
    },

Visual mark properties, such as position and color, are specified using named property sets defined within the properties property. The standard property sets are enter (evaluated when the mark is first created), exit (evaluated when a mark is about to be removed), update (evaluated upon changes), and hover (evaluated upon mouse hover). In the example above, the enter set is evaluated first, followed by the update set, to create the bar chart. On mouseover, the hover set is evaluated to color the hovered bar red. When the mouse leaves a bar, the update set is evaluated again to return the bar to its original color. Note that if the fill were set only in the enter set, with no update set, a mouse hover would turn the bar permanently red!
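The interplay between the update and hover sets can be modeled as a tiny state machine. The sketch below is a simplification for illustration, not Vega's renderer: `PropertySet`, `MarkDef`, `MarkItem`, `apply`, and the `onEnter`/`onMouseOver`/`onMouseOut` helpers are all hypothetical. It shows that the bar returns to steelblue only because the update set is re-applied on mouseout.

```typescript
// A property set maps visual property names to values.
type PropertySet = Record<string, string>;

interface MarkDef {
  enter: PropertySet;
  update?: PropertySet;
  hover?: PropertySet;
}

// The current visual state of one rendered bar.
type MarkItem = Record<string, string>;

function apply(item: MarkItem, set?: PropertySet): void {
  if (set) Object.assign(item, set);
}

// Creation: the enter set is evaluated first, then the update set.
function onEnter(def: MarkDef): MarkItem {
  const item: MarkItem = {};
  apply(item, def.enter);
  apply(item, def.update);
  return item;
}

const onMouseOver = (def: MarkDef, item: MarkItem): void => apply(item, def.hover);
// Mouseout re-applies the update set; without one, nothing resets the fill.
const onMouseOut = (def: MarkDef, item: MarkItem): void => apply(item, def.update);

const withUpdate: MarkDef = { enter: {}, update: { fill: "steelblue" }, hover: { fill: "red" } };
const enterOnly: MarkDef = { enter: { fill: "steelblue" }, hover: { fill: "red" } };

for (const def of [withUpdate, enterOnly]) {
  const bar = onEnter(def);
  onMouseOver(def, bar);
  onMouseOut(def, bar);
  console.log(bar.fill); // steelblue (withUpdate), then red (enterOnly)
}
```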

Now let's take a closer look at the individual property definitions in the enter set:

          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": true, "offset": -1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value":0}

The first two properties (x and width) set the horizontal position and width of the bar. The x mark property (the leftmost edge of the bar) is set to the value obtained by applying the scale named "xscale" (defined in scales above) to the data field category.

The width property is set to the range band determined by the ordinal scale xscale. Ordinal scales can chop up a spatial range into a set of uniformly sized "bands". Including "band": true retrieves the size of the band for the scale. In addition, "offset": -1 reduces the width by one pixel, to enforce a 1px space between each of the bars.
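A simplified version of the band computation is sketched below. It assumes no band padding and no pixel rounding (Vega's ordinal scales can apply both, so real positions may differ slightly); `bandScale` is a hypothetical helper, not a Vega function.

```typescript
// Divides [r0, r1] into equal, unpadded bands, one per domain value, in domain order.
function bandScale(domain: string[], [r0, r1]: [number, number]) {
  const bandWidth = (r1 - r0) / domain.length;
  const index = new Map(domain.map((value, i) => [value, i] as [string, number]));
  const scale = (value: string): number => {
    const i = index.get(value);
    if (i === undefined) throw new Error(`unknown category: ${value}`);
    return r0 + i * bandWidth; // left edge of the band
  };
  return { scale, bandWidth };
}

const categories = ["A", "B", "C", "D", "E", "F", "G", "H", "I"];
const xscale = bandScale(categories, [0, 400]);

// "x": {"scale": "xscale", "field": "category"}
const x = xscale.scale("D");
// "width": {"scale": "xscale", "band": true, "offset": -1}
const width = xscale.bandWidth - 1;

console.log(x.toFixed(2), width.toFixed(2)); // 133.33 43.44
```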

The last two properties (y and y2) determine the vertical position and height of the bars. Analogous to x and width, one could instead use y and height properties. However, here it is easier to specify the bar heights using two end points: one for the top of the bar (y) and one for the bottom of the bar (y2). We hardwire the value 0 and pass it through the linear scale named "yscale" to ensure that one edge of each bar is always at zero. It actually does not matter which of y or y2 is greater than the other; Vega will set the positions correctly. You can similarly use x and x2, which can be useful for creating visualizations such as horizontal bar charts and timelines.
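The point that the order of y and y2 does not matter amounts to a normalization step like the one below. This is an illustrative sketch, not Vega's code; `rectFromEdges` is a hypothetical name, and the pixel values assume a linear Y scale mapping the domain [0, 100] onto 200 (bottom) to 0 (top).

```typescript
interface Rect {
  y: number;      // top edge in pixels
  height: number; // always non-negative
}

// Accepts the two vertical edges in either order and returns top + height.
function rectFromEdges(y: number, y2: number): Rect {
  return { y: Math.min(y, y2), height: Math.abs(y2 - y) };
}

// For amount 91 under the assumed scale: y = yscale(91) = 18, y2 = yscale(0) = 200.
console.log(rectFromEdges(18, 200)); // { y: 18, height: 182 }
console.log(rectFromEdges(200, 18)); // { y: 18, height: 182 }
```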

In addition to standard graphical marks (rectangles, arcs, plotting symbols, etc.), Vega also supports nested marks through the special group mark type. Groups are marks that can contain other marks, and can be used to visualize nested data (e.g., hierarchical data created with a data transform) and to create small multiple displays. If nested data is provided, one group element is created for each top-level data set. The data is then recursively passed down to child marks within the group. Groups can also include custom scales and axes definitions that are specific to a group instance and its backing data.

For more details see the Marks documentation.

Tooltip Interaction

The signals and predicates properties define the tooltip interaction technique.

  "signals": [
    {
      "name": "tooltip",
      "init": {},
      "streams": [
        {"type": "rect:mouseover", "expr": "datum"},
        {"type": "rect:mouseout", "expr": "{}"}
      ]
    }
  ],

  "predicates": [
    {
      "name": "ifTooltip", "type": "==", 
      "operands": [{"signal": "tooltip._id"}, {"arg": "id"}]
    }
  ]

Signals can be thought of as "dynamic variables": expressions that are automatically reevaluated when other signal values change or when DOM events occur. Each signal must have a unique name and an initial value (init); subsequent properties define how this value might change. In this example, the value of the tooltip signal changes in response to mouseover and mouseout events that occur on rect marks (see Event Stream Selectors). Every time one of these events occurs, the corresponding expression is evaluated and its result becomes the new tooltip value. Thus, when the mouse pointer moves over a rectangle mark, tooltip is equal to the mark's backing datum; when the pointer moves off the rectangle, tooltip is an empty object.
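The event-driven update of the tooltip signal can be modeled as a small reducer over event streams. The sketch hard-codes the two streams from the specification; the `TooltipEvent` type, the `streams` table, the `dispatch` function, and the `_id` value 4 are hypothetical stand-ins for Vega's event-stream machinery, expression evaluation, and id assignment.

```typescript
interface Datum {
  _id: number;
  category: string;
  amount: number;
}

// The tooltip signal holds either a bar's datum or an empty object.
type Tooltip = Partial<Datum>;

interface TooltipEvent {
  type: "mouseover" | "mouseout";
  markType: string;
  datum: Datum;
}

// One entry per stream: "rect:mouseover" -> datum, "rect:mouseout" -> {}.
const streams: Record<string, (e: TooltipEvent) => Tooltip> = {
  "rect:mouseover": (e) => e.datum,
  "rect:mouseout": () => ({}),
};

let tooltip: Tooltip = {}; // "init": {}

function dispatch(e: TooltipEvent): void {
  const handler = streams[`${e.markType}:${e.type}`];
  if (handler) tooltip = handler(e); // events on other mark types are ignored
}

const barD: Datum = { _id: 4, category: "D", amount: 91 }; // hypothetical _id
dispatch({ type: "mouseover", markType: "rect", datum: barD });
console.log(tooltip.amount); // 91
dispatch({ type: "mouseout", markType: "rect", datum: barD });
console.log(tooltip.amount); // undefined
```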

Signals can be used throughout a specification. For example, they can be directly used to specify the properties of Data Transforms, Scales and mark visual properties. For more details, see the Signals documentation.

In this example, the tooltip signal is used to define an interactive selection known as a "predicate." Predicates must also be uniquely named, and they specify a condition that identifies members of the selection. In this example, the ifTooltip predicate evaluates to true if the _id field of the tooltip signal is equal to a given argument, here named "id". For more details, see the Predicates documentation.
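Evaluated by hand, the ifTooltip predicate is a comparison between a signal field and an argument supplied where the predicate is used. The sketch below assumes the "==" predicate behaves like JavaScript's loose equality, so that a missing _id compares equal to null; `ifTooltip` here is a plain function standing in for the named predicate, not Vega's implementation, and the `_id` values are hypothetical.

```typescript
// The signal value: a bar's datum while hovering, {} otherwise.
type Tooltip = { _id?: number };

// Mirrors: "type": "==", "operands": [{"signal": "tooltip._id"}, {"arg": "id"}]
function ifTooltip(tooltip: Tooltip, args: { id: number | null }): boolean {
  // Loose equality on purpose: undefined == null is true in JavaScript.
  return tooltip._id == args.id;
}

console.log(ifTooltip({ _id: 4 }, { id: 4 }));    // true: the hovered bar matches
console.log(ifTooltip({ _id: 4 }, { id: 7 }));    // false: a different bar
console.log(ifTooltip({}, { id: null }));         // true: no bar is hovered
console.log(ifTooltip({ _id: 4 }, { id: null })); // false: some bar is hovered
```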

The final step is to use the ifTooltip selection to dynamically display the correct tooltip text mark:

{
  "marks": [
    ...
    {
      "type": "text",
      "properties": {
        "enter": {
          "align": {"value": "center"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category"},
          "dx": {"scale": "xscale", "band": true, "mult": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -5},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": {
            "rule": [
              {
                "predicate": {"name": "ifTooltip", "id": {"value": null}},
                "value": 0
              },
              {"value": 1}
            ]
          }
        }
      }
    }
  ]
}

Here, a single text mark instance serves as our tooltip text (when the from property of the mark definition is omitted, a single "dummy" datum is used by default). The position and text value are based on the tooltip signal. However, to show the tooltip text only when the mouse pointer is over a rectangle, we use the ifTooltip predicate along with a production rule. The fillOpacity of the tooltip is determined by an if-then-else style chain: if the ifTooltip predicate evaluates to true (that is, if the tooltip signal's _id field is null or missing, as it is when tooltip is the empty object), the tooltip text is fully transparent; otherwise, it is opaque.
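A production rule can be read as an ordered list of (condition, value) entries in which the first matching entry wins and a final entry without a condition acts as the else branch. The sketch below evaluates the fillOpacity rule that way; the `Rule` type and `evalRule` function are hypothetical, and the predicate is written as an inline closure (assuming loose `== null` semantics) rather than a reference to the named ifTooltip predicate.

```typescript
type Tooltip = { _id?: number };

// A rule entry either has a test (if / else-if) or none (the final else).
interface Rule<V> {
  test?: () => boolean;
  value: V;
}

// Returns the value of the first entry whose test passes or that has no test.
function evalRule<V>(rules: Rule<V>[]): V {
  for (const rule of rules) {
    if (!rule.test || rule.test()) return rule.value;
  }
  throw new Error("no rule matched and no default value was given");
}

function tooltipOpacity(tooltip: Tooltip): number {
  return evalRule([
    // {"predicate": {"name": "ifTooltip", "id": {"value": null}}, "value": 0}
    { test: () => tooltip._id == null, value: 0 },
    // {"value": 1}
    { value: 1 },
  ]);
}

console.log(tooltipOpacity({}));         // 0: hidden when no bar is hovered
console.log(tooltipOpacity({ _id: 4 })); // 1: shown while a bar is hovered
```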

We previously discussed the update and hover property sets that set the fill color of the rectangle mark: red on hover and steelblue otherwise. We could instead use the ifTooltip predicate to express this within a single property set, comparing each bar's own _id with the _id of the tooltip signal:

"update": {
  "fill": {
    "rule": [
      {
        "predicate": {"name": "ifTooltip", "id": {"field": "_id"}},
        "value": "red"
      },
      {"value": "steelblue"}
    ]
  }
}

Next Steps

You've now worked through a full Vega visualization. Next, we recommend experimenting with and modifying this example. Copy & paste the full specification above into the online Vega Editor. Can you adjust the scales and axes? Can you change the chart from a vertical bar chart to a horizontal bar chart? Can you visualize a new data set with a similar structure?

You should then be ready to explore and modify the other examples included in the Vega Editor. Many of the more advanced examples include data transforms that organize data elements and perform layouts. As you experiment with different examples, you may find it useful to refer to the documentation for each of the main specification components:

  • Visualization - Top-level visualization properties.
  • Data - Define and load data to visualize.
  • Data Transforms - Transform data prior to visualization.
  • Scales - Map data properties to visual properties using scales.
  • Axes - Axes visualize scales for spatial encodings.
  • Legends - Legends visualize scales for color, shape and size encodings.
  • Marks - Visualize data using various graphical marks.
  • Runtime - Deploying and using the browser-based Vega runtime.