19.Visualization01.Standalone plots - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

Standalone plots are a series of graphics used for exploratory analysis. Each plot is briefly explained in terms of the types of variables it displays (continuous, categorical, time, among others), followed by a link to a web page with source code to reproduce the graphic. In practice, the easiest way to choose a plot is to flip through the page looking for an image that you like. Once you find one, check whether that graphic is feasible given the type of data you have.

Occasionally, you may need to use labeled plots. Labeled plots are figures, often with several panels, that call out specific individuals from a sample. For example, you could point out a specific hospital or country. The ggtext package (https://www.cararthompson.com/talks/user2022/) facilitates this.
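
As a minimal sketch of a labeled plot, the example below highlights one hospital in a scatter plot and colors its name in the title using `ggtext::element_markdown()`. The data frame, its column names, and the choice of "Hospital C" are hypothetical and only serve to illustrate the idea.

```r
library(ggplot2)
library(ggtext)

# Hypothetical data: one row per hospital (values invented for illustration)
hospitals <- data.frame(
  hospital  = c("A", "B", "C", "D", "E"),
  volume    = c(120, 340, 560, 210, 430),
  mortality = c(4.1, 3.2, 2.4, 3.9, 2.9)
)

# Flag the individual we want to point out
hospitals$highlight <- hospitals$hospital == "C"

labeled_plot <- ggplot(hospitals, aes(x = volume, y = mortality)) +
  geom_point(aes(color = highlight), size = 3) +
  # Label only the highlighted hospital
  geom_text(
    data = subset(hospitals, highlight),
    aes(label = hospital),
    vjust = -1, color = "#D55E00"
  ) +
  scale_color_manual(
    values = c("FALSE" = "grey60", "TRUE" = "#D55E00"),
    guide = "none"
  ) +
  labs(
    title = "Mortality by volume, with <span style='color:#D55E00'>Hospital C</span> highlighted",
    x = "Annual volume", y = "Mortality (%)"
  ) +
  theme_minimal() +
  # element_markdown() renders the HTML span in the title
  theme(plot.title = element_markdown())

labeled_plot
```

Coloring the name in the title can replace a legend when only one or two individuals need to be identified.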

As a last piece of advice, it is easier to create the plot by changing your local data set to match the one used in the example source code. In other words, first, check the data format used in the code, then transform your local data to ensure that it is in the same format, apply the source code, and only then make adjustments in terms of colors, lines, and other graphical aspects.
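
The workflow above can be sketched as follows, assuming the example source code expects long-format data (one row per observation) while your local data is wide (one column per visit). The data frame and column names are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Hypothetical local data in wide format: one column per visit
local_wide <- data.frame(
  patient_id = 1:4,
  baseline   = c(5.1, 6.3, 4.8, 5.9),
  month_6    = c(4.7, 5.8, 4.9, 5.1),
  month_12   = c(4.2, 5.5, 4.6, 4.8)
)

# Step 1: transform the local data to the long format used by the example code
local_long <- pivot_longer(
  local_wide,
  cols      = c(baseline, month_6, month_12),
  names_to  = "visit",
  values_to = "score"
)

# Keep visits in chronological order rather than alphabetical order
local_long$visit <- factor(
  local_long$visit,
  levels = c("baseline", "month_6", "month_12")
)

# Step 2: apply the example code unchanged to the reshaped data
box_plot <- ggplot(local_long, aes(x = visit, y = score)) +
  geom_boxplot()

# Step 3: only now adjust colors, lines, and other graphical aspects
box_plot <- box_plot +
  theme_minimal() +
  labs(x = "Visit", y = "Score")

box_plot
```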

2. Input: what kind of data does the method require?

3. Algorithm: how does the method work?

Model mechanics

Reporting guidelines

Data science packages

  • patchwork allows for multiple plots to be brought together into a single figure

  • A package for data cleaning

  • Composite plots - use the ggarrange function from the ggpubr package to combine plots and create a composite figure

  • A color palette for plots in R

  • ggstatsplot is an extension of ggplot2 that creates graphics with details from statistical tests, such as Welch's t-test, included in the plots. Vignettes are available for individual functions.

  • Visualize Data on Spirals explains how to visualize data along an Archimedean spiral and create spiral plots with time series data and genomic data.
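
As a minimal sketch of the two composite-figure options listed above, the example below combines the same pair of plots once with patchwork and once with `ggpubr::ggarrange()`. It uses the built-in `mtcars` data set purely for illustration; the object names are arbitrary.

```r
library(ggplot2)
library(patchwork)
library(ggpubr)

# Two standalone plots built from the built-in mtcars data
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Option 1: patchwork composes plots with `+` and tags the panels A, B, ...
combined_patchwork <- p_scatter + p_hist +
  plot_annotation(tag_levels = "A")

# Option 2: ggarrange places the plots on a grid with explicit labels
combined_ggarrange <- ggarrange(
  p_scatter, p_hist,
  ncol = 2, labels = c("A", "B")
)

combined_patchwork
combined_ggarrange
```

Both approaches produce a side-by-side figure; which one to use is mostly a matter of preference and of which package the rest of your code already depends on.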

Suggested companion methods

Learning materials

  1. Books
  2. Articles

4. Output: how do I interpret this method's results?

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

  • Univariate

    • Box (source) and pirate plots (source) for the display of continuous variables. Pirate plots and box plots (traditional but less informative) display the relationship between one continuous variable and one or multiple categorical variables.
      Imgur
      Imgur

    The image below is a use case of pirate plots, showing the distribution of individual lateralization index (LI) values for each task (list, sentence, word), with the bold horizontal line indicating the mean.

    Imgur

    • Histogram - displaying continuous variables in pre-defined bins - source

    Imgur

    • Bar and step charts - to display categories along the vertical (category) axis and values along the horizontal (value) axis - source

    Imgur

    • Basic density plot - shows the distribution of a numeric variable - source

    Imgur

  • Bivariate

    • Box (source) and pirate plots (source) - displaying continuous variables.
      Imgur
      Imgur

    • Histograms - source

    Imgur

    • Barplot - source
      Imgur

    • Lollipop plot is a cross between a bar plot and a dot plot. It is used to show the relationship between a numeric and a categorical variable - source
      Imgur

    • Half-violin half-dot plot - source

    Imgur

    • Scatter plot to visually identify trends in the data.

      • Scatter plot for two continuous variables - source.

      • Scatter plot can be associated with splines - source.

      • Scatter plot can be done in a blended way - source

      Imgur

    • Mosaic plots - to display the distribution of categorical variables - source.

    Imgur

    • Treemap - to display the percentage of individual components and subcomponents of a categorical variable. Treemaps are an alternative to mosaic plots when your categorical variables have many levels - source Imgur

    • Polar plot - for two continuous variables - source

    Imgur

    • Double density plot - shows the distribution of a numeric variable - source

    Imgur

    • Mirror density plot - shows the distribution of a numeric variable - source
      Imgur

    • Horizontal boxplot - source
      Imgur

  • Involving time

    • Parallel coordinates plot - to display the trajectory of individual observations over time. It can also be combined with facets.

      • Simple parallel coordinates - source. Imgur

      • Parallel coordinates with facet - source. Imgur

    • Plots displaying the evolution of continuous variables over time - source. Imgur

    • Plot of evolution of continuous variables over time with confidence interval - source
      Imgur

    • Candlestick chart for continuous variables over time - source. Imgur

    • Ridge plot - source
      Imgur

    • Circular barplot - is used to show cyclic data series from the combination of a measurement variable and a time-related variable (hours, days, months) - source
      Imgur

    • Density plots over time - source
      Imgur

    • Heatmap of monthly average - source
      Imgur

  • Multivariate

    • Scatter plots where additional numeric variables are encoded through a combination of circle size and color. In the example below, the smallest circles are yellow, and circles become redder as they get bigger - source Imgur

    • Ternary plots used with three continuous variables, often when dealing with the display of a composition - source Imgur

    • Contour plot - source
      Imgur

    • Heatmap is used for visualizing observations, correlations, missing values patterns, and more - source

    • Interactive heatmaps allow inspection of specific values using the mouse - source Imgur

    • Alluvial plot is a variant of a Parallel Coordinates Plot (PCP) used for multivariate and time series-like data - source. Imgur

    • Radio plot - for categorical x continuous variables - source
      Imgur

    • River charts for continuous variables over time in a categorical variable - source Imgur

    • Radar plot for continuous and categorical variables, usually with five or more categories - source. Imgur

      • Radar chart to compare multiple individuals (e.g., treatments, hospitals, etc.) across multiple quantitative attributes plotted on axes starting from a central point - source. Imgur
    • Stacked barplot - is used to show the variability within each category displayed on the x-axis; distinct colors illustrate the different subcategories within each bar - source
      Imgur

    • Stacked area plot - is used to display the evolution of several groups over time on the same graphic - source
      Imgur

    • Small multiple area plot - an option to show data series from different groups separately, each in its own area plot - source
      Imgur

    • Cleveland dot plot - It is basically a lollipop plot with three variables - source
      Imgur

    • Streamgraph - is used to show the evolution of two variables from several groups displaced around a central axis, resulting in a flowing and organic shape - source
      Imgur

    • Overlay Density Plots - enables you to visualize the density plots of several variables simultaneously - source
      Imgur
      Imgur

  • Multidimensional scaling plot - is used to evaluate the degree of association among multiple variables - source and Applied Multidimensional Scaling and Unfolding

    Imgur

  • Composite plots

    • Geofacet plot shows a sequence of plots of data for different geographical entities into a grid, preserving some original geographic orientation of the entities - source
  • Continuous and ordinal variables

    • Funnel plot assesses the potential role of publication bias - source Imgur

    • Sankey diagram is a dynamic visualization used to depict a flow from one set of values to another. They are used when you want to show multiple paths through a set of stages - source. Imgur

    • Sunburst plot shows hierarchical data spanning outwards radially from root to leaves. It is used for percentages under progressive stratification - source (https://echarts4r.john-coene.com/articles/chart_types.html#calendar). Imgur

    • Dendrogram plot - is used to show the hierarchical relationship between variables - source
      Imgur

  • Calendar for continuous variables over time - source. Imgur

  • Wordcloud is a visual representation of text data, where the frequency or importance of each word is shown by a proportional font size or color - source. Imgur

  • Interactivity - Interactive plots are constructed by adding a slider that controls a value in the plot, or by triggering a function when the user hovers over or moves the mouse across an element. Interactivity can also be used to filter or change the input dataset.

    • Density plot with slider - source

    Imgur

  • Correlogram - is used to visualize the relationship between each pair of variables through a scatterplot; it can also be represented by symbols that express the correlation. Correlograms can be constructed using different sets of plots. See below three different ways of showing the same data.

    • Correlogram basic - source
      Imgur

    • Correlogram with scatterplot - source
      Imgur

    • Correlogram with scatterplot and histogram - source
      Imgur

  • Bubble plot - It can be used instead of a scatter plot if your data has three variables, each containing a set of values. The third variable is represented by the sizes of the bubbles that determine the values in the data series.

  • Dynamic plots - are useful in situations where the relationship among variables progresses over time, with the dynamic plot demonstrating these changes.

  • Stratigraphic plots - display a series of variables in a stratigraphic diagram. They can be drawn as line graphs and/or bar charts. Samples are plotted on the y-axis by sample number by default but may be plotted against sample age or depth by specifying a variable for yvar.

  • Stratigraphic plot - source

    Imgur

  • Population pyramid plots - are back-to-back horizontal histograms, best used when the data is organized hierarchically. The levels indicate some kind of progressive order, e.g., most to least “important,” older to newer, most to least specific, least to most, etc. Although it is a hierarchical-type graph, it isn’t always in the shape of a pyramid. Although it is mainly used in demographic and ecological studies to show the overall age distribution of a population by plotting age (y-axis) vs. gender (x-axis), it can also be applied to other types of variables and areas of research.

    • Population pyramid plots - example

      Imgur

  • Scree plots - are a heuristic method to evaluate eigenvalues and help determine how many factors to extract from the data. Factors that explain more of the data variability have higher eigenvalues. A scree plot is heuristic because, even though you can't determine precisely the correlations related to each variable, the plot is a practical way to visualize the most relevant cluster of variables that can describe the data variability. There are many methods to determine the optimal number of factors to extract, but with a scree plot the most common rule is to use the point just before the line becomes horizontal. In practice, the most frequent route is to take those rules lightly and choose a number of factors such that each extracted factor makes sense from a qualitative perspective. In other words, if the items in a given factor all focus on something that we can distinguish as a trait, that's good enough. If the group of items doesn't mean anything and can't be distinguished from others, then we abandon that factor solution.

  • Treatment timelines - The treatment timeline, sometimes known as a "swimmer plot," is helpful for investigating longitudinal data structures - source

    Imgur
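
For the scree plot described in the list above, a minimal base R sketch is shown below. It assumes the eigenvalues are taken from the correlation matrix of the variables and uses the built-in `mtcars` data set only as a stand-in for real items. The object names are arbitrary.

```r
# Stand-in data: treat the mtcars columns as the items to be factored
items <- mtcars

# Eigenvalues of the correlation matrix, returned in decreasing order
eigenvalues <- eigen(cor(items))$values

# Drop in eigenvalue from one factor to the next; small drops suggest
# that the line is becoming horizontal
eigen_drop <- -diff(eigenvalues)

# Scree plot: eigenvalue against factor number
plot(
  seq_along(eigenvalues), eigenvalues,
  type = "b",
  xlab = "Factor number",
  ylab = "Eigenvalue",
  main = "Scree plot"
)

# Printed to help locate the point just before the curve flattens
print(round(eigenvalues, 2))
print(round(eigen_drop, 2))
```

The number of factors read off the plot is only a starting point. As noted above, the final choice should rest on whether each extracted factor makes sense qualitatively.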

5. SporeData-specific

Templates

Data science functions

  • sdatools::histogram
  • sdatools::boxPlot
  • sdatools::scatterPlot
  • sdatools::barPlot
  • sdatools::stackedBarPlot
  • sdatools::piratePlot
  • sdatools::likertPlot

Image repositories

