Data Visualization with Python - clizarraga-UAD7/Workshops GitHub Wiki

Data Visualization Libraries in Python

(Image credit: Plotly.com)


Introduction

There are many options for data visualization in Python. Matplotlib is among the most widely used plotting libraries and is the first choice for many data scientists and machine learning researchers and practitioners.

Matplotlib includes the Pyplot module, which provides a MATLAB-like interface. Matplotlib was designed to be as usable as MATLAB, with the added ability to use Python and the advantage of being free and open-source.

Among the many visualization libraries for Python, we list some of the most popular:

  • Altair. Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.
  • Bokeh. Bokeh is a Python library for creating interactive visualizations for modern web browsers.
  • Ggplot. Ggplot is a Python implementation of the grammar of graphics, modeled on R's ggplot2. Ggplot is not necessarily a feature-by-feature equivalent of ggplot2, but it does have some overlap.
  • HoloViews. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple.
  • Matplotlib. Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
  • Plotly. Plotly's Python graphing library makes interactive, publication-quality graphs.
  • Plotnine. Plotnine is an implementation of a grammar of graphics in Python, based on ggplot2 from the R programming language.
  • Seaborn. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Matplotlib Color Tables and Color Maps

Matplotlib comes with a series of color tables and colormap options. You can try different options and select one that suits your needs.

You can also use online colormap tools, such as Colorbrewer2.org or Paletton.com, to guide you in selecting a colormap.

Plots can be further enhanced with different line styles, data markers, or shapes.
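As a minimal sketch of these ideas (the data, the choice of the "viridis" colormap, and the particular styles are arbitrary illustrations), the snippet below lists the registered colormap names, samples evenly spaced colors from one colormap, and combines them with different line styles and markers. In Jupyter the figure is shown at the end of the cell; in a plain script, call plt.show().

```python
import numpy as np
import matplotlib.pyplot as plt

# Names of all registered colormaps, e.g. "viridis", "plasma", "coolwarm"
cmap_names = plt.colormaps()
print(cmap_names[:5])

cmap = plt.get_cmap("viridis")        # a perceptually uniform sequential colormap
x = np.linspace(0, 2 * np.pi, 50)
line_styles = ["-", "--", "-.", ":"]  # solid, dashed, dash-dot, dotted
markers = ["o", "s", "^", "x"]        # circle, square, triangle, x

fig, ax = plt.subplots(figsize=(6, 4))
for i, (ls, mk) in enumerate(zip(line_styles, markers)):
    # Sample evenly spaced colors along the colormap (values in [0, 1])
    color = cmap(i / (len(line_styles) - 1))
    ax.plot(x, np.sin(x + 0.5 * i), linestyle=ls, marker=mk, markevery=5,
            color=color, label=f"phase shift {0.5 * i:.1f}")
ax.set_xlabel("x")
ax.set_ylabel("sin(x + shift)")
ax.legend()
```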


Pandas Plot

Pandas includes a set of basic plotting functions for exploring a DataFrame. Below we show more options to supplement our graphics.

Remember that we need to import the Pandas library before using it, by running the following command in a code cell:

import pandas as pd

So far we may have come across some of these:

In the table below, df stands for any pandas DataFrame (the plotting methods are called on the DataFrame object, not on the pd module).

| Category | Function | Description |
|---|---|---|
| Relational plots | df.plot.line() | Simple (x, y) line plot |
| | df.plot.scatter() | Scatter plot of y vs. x |
| Distribution plots | df.plot.hist(), df.hist() | Histogram (df.hist() draws one histogram per column) |
| | df.plot.kde() | Kernel Density Estimate (KDE) plot using Gaussian kernels (requires SciPy) |
| | df.plot.hexbin() | Hexagonal bin plot of two columns |
| Categorical plots | df.plot.box(), df.boxplot(), pd.plotting.boxplot() | Box and whisker plot |
| | df.plot.bar() | Bar plot |
| Multiplot grids | pd.plotting.scatter_matrix() | Matrix of pairwise scatter plots |

Please see more plotting information in the Pandas documentation.
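The following sketch exercises a few of the functions from the table on hypothetical synthetic data (columns x, y and z generated with NumPy). Passing ax= places each pandas plot on a specific Axes of a Matplotlib grid; scatter_matrix builds its own grid of Axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "x": np.arange(100),
    "y": rng.normal(size=100).cumsum(),                # a random walk
    "z": rng.normal(loc=5.0, scale=2.0, size=100),
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
df.plot.line(x="x", y="y", ax=axes[0, 0], title="df.plot.line")        # y vs. x as a line
df.plot.scatter(x="y", y="z", ax=axes[0, 1], title="df.plot.scatter")  # z vs. y as points
df["z"].plot.hist(bins=20, ax=axes[1, 0], title="Series.plot.hist")    # distribution of z
df[["y", "z"]].plot.box(ax=axes[1, 1], title="df.plot.box")           # one box per column
fig.tight_layout()

# Pairwise scatter plots of all numeric columns, histograms on the diagonal
grid = pd.plotting.scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```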


Matplotlib

Anatomy of a Matplotlib figure. (Image credit: Matplotlib.org)

Useful resources:

There are essentially two ways to use Matplotlib:

  • Explicitly create Figures and Axes, and call methods on them (the "object-oriented (OO) style").
  • Rely on pyplot to automatically create and manage the Figures and Axes, and use pyplot functions for plotting.
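A minimal sketch contrasting the two styles on the same simple data (the labels and titles are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)

# 1) Object-oriented (OO) style: create the Figure and Axes explicitly,
#    then call methods on the Axes object
fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("OO style")
ax.legend()

# 2) Pyplot style: pyplot keeps track of the "current" Figure and Axes
plt.figure(figsize=(5, 3))
plt.plot(x, x, label="linear")
plt.plot(x, x**2, label="quadratic")
plt.xlabel("x label")
plt.ylabel("y label")
plt.title("pyplot style")
plt.legend()
```

The OO style is usually easier to manage once a figure has several Axes, since every call names the Axes it draws on.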

To load Matplotlib and Pyplot, we enter the following import commands in a code cell:

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

Some basic Matplotlib Pyplot functions:

| Category | Function | Description |
|---|---|---|
| Relational plots | plt.scatter() | Scatter plot of y vs. x with varying marker size and/or color |
| | plt.plot() | Plot y vs. x as lines and/or markers |
| | plt.hlines() | Plot horizontal lines |
| | plt.vlines() | Plot vertical lines |
| Distribution plots | plt.hist() | Plot a histogram |
| Categorical plots | plt.boxplot() | Draw a box and whisker plot |
| | plt.bar() | Make a bar plot |
| | plt.violinplot() | Make a violin plot |
| Multiplot grids | plt.subplots() | Create a figure and a set of subplots |

See the full list of Pyplot functions.
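The sketch below uses plt.subplots() to build a 2 x 3 grid and fills it with the other plot types from the table, using hypothetical random data. Because each panel is addressed explicitly, it calls the Axes methods of the same name (ax.scatter(), ax.hist(), and so on); the plt.* functions in the table do the same thing on the current Axes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical random data, for illustration only
rng = np.random.default_rng(seed=1)
labels = ["A", "B", "C"]
samples = [rng.normal(loc=mu, scale=1.0, size=200) for mu in (0.0, 1.0, 2.0)]

fig, axs = plt.subplots(2, 3, figsize=(12, 7))

# Relational: scatter with per-point marker size (s) and color (c), plus a horizontal line
px, py = rng.random(50), rng.random(50)
axs[0, 0].scatter(px, py, s=200 * rng.random(50), c=py, cmap="viridis", alpha=0.7)
axs[0, 0].hlines(y=0.5, xmin=0, xmax=1, colors="gray", linestyles="--")
axs[0, 0].set_title("scatter + hlines")

# Relational: line plot with vertical lines at multiples of pi
t = np.linspace(0, 10, 200)
axs[0, 1].plot(t, np.sin(t), color="tab:blue")
axs[0, 1].vlines(x=[np.pi, 2 * np.pi, 3 * np.pi], ymin=-1, ymax=1, colors="red", linestyles=":")
axs[0, 1].set_title("plot + vlines")

# Distribution: histogram of the first group
axs[0, 2].hist(samples[0], bins=15, edgecolor="black")
axs[0, 2].set_title("hist")

# Categorical: box and violin plots (default positions are 1, 2, 3)
axs[1, 0].boxplot(samples)
axs[1, 0].set_xticks([1, 2, 3])
axs[1, 0].set_xticklabels(labels)
axs[1, 0].set_title("boxplot")

axs[1, 1].violinplot(samples, showmedians=True)
axs[1, 1].set_xticks([1, 2, 3])
axs[1, 1].set_xticklabels(labels)
axs[1, 1].set_title("violinplot")

# Categorical: bar plot of the group means
axs[1, 2].bar(labels, [s.mean() for s in samples], color=["tab:blue", "tab:orange", "tab:green"])
axs[1, 2].set_title("bar (group means)")

fig.tight_layout()
```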


Seaborn

The Seaborn library is based on the general-purpose visualization library Matplotlib. Seaborn makes it easier to visualize the statistical properties of a dataset.

Useful resources:

To load Seaborn into our working memory, we include the following command at the beginning of the code:

import seaborn as sns

There are several types of graphics that we can produce with Seaborn; we will only show a small set of them.

| Category | Function | Description |
|---|---|---|
| Relational plots | sns.scatterplot() | Basic relational (scatter) plot between variables |
| | sns.lineplot() | Plot lines between values |
| Distribution plots | sns.histplot() | Basic frequency distribution plot (histogram) |
| | sns.kdeplot() | Kernel density estimation (KDE) plot |
| Categorical plots | sns.stripplot() | Basic categorical scatter plot (points may overlap) |
| | sns.swarmplot() | Categorical scatter plot without overlapping points |
| | sns.boxplot() | Categorical box plots |
| | sns.violinplot() | Categorical violin plots |
| | sns.boxenplot() | Enhanced box plot for larger datasets |
| | sns.pointplot() | Point estimates and confidence intervals using scatter plot glyphs |
| | sns.barplot() | Point estimates and confidence intervals as rectangular bars |
| | sns.countplot() | Counts of observations in each categorical bin using bars |
| Regression plots | sns.lmplot() | Plot data and regression model fits |
| Matrix plots | sns.heatmap() | Plot rectangular data as a color-encoded matrix |
| Multiplot grids | sns.FacetGrid() | Multi-plot grid for plotting conditional relationships |
| | sns.pairplot() | Plot pairwise relationships in a dataset |
| | sns.jointplot() | Plot two variables with bivariate and univariate graphs |

More information about Seaborn functions


Plotnine

The Plotnine library

Plotnine is a Python data visualization library that mimics the ggplot2 library of the R programming language. It was designed with R users in mind, letting them use the same interface to build charts in Python. ggplot2 is based on the concept of a grammar of graphics.

Useful resources:

To load the Plotnine library into the working environment, insert the following command in a code cell:

import plotnine as p9

Plotnine has many plotting functions; we will only mention some of them.

| Category | Function | Description |
|---|---|---|
| Relational plots | p9.geoms.geom_jitter() | Scatter plot with points jittered to reduce overplotting |
| | p9.geoms.geom_point() | Plot points (scatter plot) |
| | p9.geoms.geom_line() | Connected points |
| | p9.geoms.geom_hline() | Horizontal line |
| | p9.geoms.geom_vline() | Vertical line |
| Distribution plots | p9.geoms.geom_histogram() | Histogram |
| | p9.geoms.geom_density() | Smooth density estimate |
| Categorical plots | p9.geoms.geom_boxplot() | Box and whiskers plot |
| | p9.geoms.geom_violin() | Violin plot |
| | p9.geoms.geom_bar() | Bar plot of counts (bar height is the number of cases in each x group) |
| | p9.geoms.geom_col() | Bar plot with bar heights taken from the y values in the data |
| Matrix plots | p9.geoms.geom_bin2d() | Heatmap of 2D bin counts |
| Multiplot grids | p9.facets.facet_grid() | Panels laid out in a grid defined by row and column variables |
| | p9.facets.facet_wrap() | Wrap 1D panels onto a 2D surface |

More information about Plotnine functions.


Jupyter Notebook related to this workshop


General References

More on Data Visualization


Created: 04/04/2022 (C. Lizárraga); Last Update: 03/16/2023 (C. Lizárraga)

CC BY-NC-SA 4.0