autoplot - janeshdev/ggplot2 GitHub Wiki
autoplot
NOTE: This is still a discussion and nothing is final.
Summary
autoplot is an idiom, first introduced in version 0.9.0, for creating complete ggplot graphs that are appropriate for specific types of data. ggplot2 does not provide any useful methods itself, but declares the S3 generic for other packages to extend.
Rationale
ggplot provides a framework for creating plots: starting with data, mapping data to aesthetics, scaling the data and aesthetics, and controlling plot appearance through a theming system. At the same time, many packages which create specific data structures, typically S3 or S4 objects, provide plot methods to display them graphically, typically using base graphics. These default plots implement appropriate conventions for the type of data plotted. What is missing is the ability to create a standard plot using ggplot so that it can be further adjusted by setting scales, themes, etc. autoplot aims to fill this niche. However, the design should not be so rigid as to make adaptation impossible; a strength of ggplot is the ability to rearrange data presentation in different ways for different needs.
Best practices
The plotting of specialized types of data structures can be split into two steps:
- Converting the specialized data structure into a data.frame which exposes variables in a structure appropriate for ggplot. A mechanism/idiom for this already exists in fortify.
- Defining the plot by mapping these variables to the appropriate aesthetics using existing geoms/stats/scales. This is the step autoplot should do.
fortify
Any object which has an autoplot method should also have a fortify method, which the autoplot method uses to convert the specialized data structure into a data.frame. The purpose of this separation is to be able to re-use the work of data restructuring even if the specific autoplot method is not used. The documentation for the fortify method should enumerate the variables in the returned data.frame, so that they are available for other uses and their relation to the original data is clear.
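As an illustration of a fortify method with documented return columns, here is a minimal sketch for a hypothetical S3 class walk (a start value plus a vector of increments). The class, its constructor new_walk, and the column names step and position are assumptions made for this example only; they are not part of ggplot.

```r
library(ggplot2)

# Hypothetical S3 class used only for this sketch:
# a list holding a start value and a vector of increments.
new_walk <- function(start, steps) {
  structure(list(start = start, steps = steps), class = "walk")
}

# fortify method for "walk" objects.
# Returned data.frame has one row per point and two columns:
#   step     - integer index of the point; 0 is the starting point
#   position - start value plus the cumulative sum of increments up to `step`
fortify.walk <- function(model, data, ...) {
  data.frame(
    step = c(0L, seq_along(model$steps)),
    position = model$start + c(0, cumsum(model$steps))
  )
}

w <- new_walk(start = 10, steps = c(1, -2, 3))
df <- fortify(w)
```

Because df is a plain data.frame with documented columns, it can be passed to ggplot directly or reused for something other than plotting.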
autoplot
The autoplot method should use the appropriate fortify method to convert the data structure to a data.frame. It can then construct a ggplot object from this data.frame by creating layers (geoms or stats) with the appropriate aesthetic mappings. Documentation should state what layers/geoms/stats are created and what aesthetic mappings are made. Or should it just include the actual code?
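The following sketch shows one way an autoplot method might delegate to fortify and then build the plot. The walk class, new_walk, and the step and position columns are hypothetical names chosen for illustration; the layers and mappings are stated in a comment, as the documentation guideline above suggests.

```r
library(ggplot2)

# Hypothetical S3 class used only for this sketch.
new_walk <- function(start, steps) {
  structure(list(start = start, steps = steps), class = "walk")
}

# Step 1: restructure the object into a data.frame (columns: step, position).
fortify.walk <- function(model, data, ...) {
  data.frame(
    step = c(0L, seq_along(model$steps)),
    position = model$start + c(0, cumsum(model$steps))
  )
}

# Step 2: build the plot from the fortified data.
# Layers created: geom_line, geom_point.
# Aesthetic mappings: x = step, y = position.
autoplot.walk <- function(object, ...) {
  df <- fortify(object)
  ggplot(df, aes(x = step, y = position)) +
    geom_line() +
    geom_point()
}

p <- autoplot(new_walk(start = 0, steps = c(1, 1, -1)))

# The result is an ordinary ggplot object, so scales and themes can still be changed.
p_adjusted <- p + scale_y_continuous(limits = c(-5, 5)) + theme_minimal()
```

Returning the ggplot object rather than printing it is what keeps the plot adjustable after the fact.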
autoplot should define default aesthetic mappings but allow the user to override them. One way to do it is to have a mapping argument to autoplot and then define the actual mapping as, for example:
# start from the defaults, then let entries in the user-supplied mapping replace them
defaults <- aes_string(x="nameOfX", y="nameOfY", colour="nameOfColour")
map <- modifyList(defaults, mapping)
ggplot() + geom_foo(mapping=map)
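A variation on this idea, sketched below, avoids combining aes objects by hand: the defaults are set at the plot level and the user-supplied mapping is passed to the layer, relying on ggplot's rule that layer-level aesthetics take precedence over plot-level ones. The walk class, new_walk, and the step and position columns are hypothetical, and this is only one possible arrangement, not a settled convention.

```r
library(ggplot2)

# Hypothetical S3 class and fortify method used only for this sketch.
new_walk <- function(start, steps) {
  structure(list(start = start, steps = steps), class = "walk")
}

fortify.walk <- function(model, data, ...) {
  data.frame(
    step = c(0L, seq_along(model$steps)),
    position = model$start + c(0, cumsum(model$steps))
  )
}

# `mapping` defaults to an empty aes(), so the plot-level defaults apply
# unless the caller supplies a replacement for a given aesthetic.
autoplot.walk <- function(object, mapping = aes(), ...) {
  df <- fortify(object)
  ggplot(df, aes(x = step, y = position)) +  # default mappings
    geom_line(mapping = mapping)             # user overrides win in the layer
}

w <- new_walk(start = 0, steps = c(1, 1, -1))

p_default <- autoplot(w)
# Override only y; x keeps its default mapping to `step`.
p_custom <- autoplot(w, mapping = aes(y = -position))
```

With several layers, the same mapping argument would have to be passed to each layer that should honour the override.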
naming conventions
If a package which defines these specific data structures also defines fortify and autoplot, then they are just two additional methods. Should that package enhance or depend on ggplot?
If a separate package implements them, what should the package naming convention be? GGobject? ggobject? autoplotObject? originalpackageGG?
Future work / open questions
- S4 classes?
- What should the convention be if the data structure cannot be well represented by a single data.frame, but rather by a set of data.frames?
  - a list of data.frames. pro: natural R idiom for collecting two or more things; con: breaks the return value convention for fortify
  - a "block diagonal" data.frame, that is, a single data.frame that has columns which are the combination of all the columns in all the individual data.frames, but only one set of columns is filled in at a time. pros: is a data.frame, which fortify is supposed to return; easy to create given the separate data.frames (just plyr::rbind.fill them). con: inelegant as a data structure; wasteful of space
  - a data.frame with additional data.frames as attributes. pro: is a data.frame, which fortify is to return (and which won't choke ggplot/qplot). con: sets one of the data.frames as dominant over the others; seems not all that natural.
- What should package naming conventions be?
- Extends vs depends?
- How much should autoplot take extra parameters to define variations on the standard plot? Example: triangle versus square lines in ggdendro. Should the fortify function pull out all possible data so that any version can be plotted?
- If calling fortify is expensive, should each autoplot function check its input data to see if it is either an object of that type (dispatched via S3 methods) or called directly with the fortified data (and thus already a data.frame)? If so, then the autoplot function should be exported so it can be called directly.
- Should the fortify and autoplot methods for specific data types be exported/public?
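To make the three candidate representations for a set of data.frames easier to compare, the sketch below builds each of them in base R from two small pieces. The pieces (segments and labels), their columns, and the helper bind_fill are hypothetical; bind_fill is a base R stand-in for what plyr::rbind.fill does, used here only to avoid a dependency on plyr.

```r
# Two hypothetical pieces of one fortified object.
segments <- data.frame(x = c(1, 2), y = c(0, 0), xend = c(1.5, 1.5), yend = c(1, 1))
labels <- data.frame(x = c(1, 2), y = c(0, 0), label = c("a", "b"))

# Option 1: a list of data.frames.
as_list <- list(segments = segments, labels = labels)

# Option 2: a "block diagonal" data.frame.
# Every piece is padded with NA for the columns it lacks, then all are row-bound.
bind_fill <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}
block <- bind_fill(segments, labels)

# Option 3: one data.frame with the other attached as an attribute.
with_attr <- structure(segments, labels = labels)
```

Options 2 and 3 both return a data.frame, which is what fortify is expected to return; option 1 does not, but keeps each piece intact and equally prominent.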
Case studies / example implementations
These are examples of packages or functions which create complete graphics of specific data types using ggplot, whether or not they use the autoplot mechanism.
- Original discussion was based on http://stackoverflow.com/questions/7098830/bad-idea-ggplotting-an-s3-class-object which had discussion of linear regression model diagnostics and an example of trees.
- ggdendro (CRAN page) (GitHub repo): does not implement this approach (as of 0.0-7), but has many of the pieces and some of the separation. It could be expanded/adapted if conventions are settled on.
- granovaGG (CRAN page): first release September 4, 2011.
- Survival curves: I (BrianDiggs) have some code that creates Kaplan-Meier curves from survfit objects, but it needs work; partially, I was wondering about a framework such as this.