Data Visualization with Python - clizarraga-UAD7/Workshops GitHub Wiki
Data Visualization Libraries in Python
(Image credit: Plotly.com)
Introduction
There are many options for data visualization in Python. Matplotlib is among the most widely used plotting libraries and is the first choice for many data scientists and machine learning researchers and practitioners.
Matplotlib includes the Pyplot module which provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use Python, and the advantage of being free and open-source.
Among the visualization libraries available for Python, we list some of the most popular:
- Altair. Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.
- Bokeh. Bokeh is a Python library for creating interactive visualizations for modern web browsers.
- Ggplot. Ggplot is a Python implementation of the grammar of graphics, modeled on ggplot2. Ggplot is not necessarily a feature-by-feature equivalent of ggplot2, but the two overlap in part.
- HoloViews. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple.
- Matplotlib. Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
- Plotly. Plotly's Python graphing library makes interactive, publication-quality graphs.
- Plotnine. Plotnine is an implementation of a grammar of graphics in Python. It is based on ggplot2, which is used in the R programming language.
- Seaborn. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Matplotlib Color Tables and Color Maps
Matplotlib comes with a series of color tables and colormap options. You can try different options and select one that suits your needs.
You can also use different online color map tools, like Colorbrewer2.org or Paletton.com to guide you in selecting a color map.
You could also enhance your plots by adding different line types, data markers or shapes.
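As a minimal sketch of these options, the example below samples a few colors from the built-in "viridis" colormap and combines them with different line styles and markers. The data, the number of curves, and the particular styles are assumptions chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# Sample five evenly spaced colors from the built-in "viridis" colormap.
cmap = plt.get_cmap("viridis")
colors = cmap(np.linspace(0, 1, 5))  # array of RGBA rows

# Line styles and markers to pair with each color (illustrative choices).
line_styles = ["-", "--", "-.", ":", "-"]
markers = ["o", "s", "^", "D", "x"]

fig, ax = plt.subplots()
for i, (color, style, marker) in enumerate(zip(colors, line_styles, markers)):
    # Each curve is a shifted sine wave; markevery thins out the markers.
    ax.plot(x, np.sin(x + 0.5 * i), color=color, linestyle=style,
            marker=marker, markevery=5, label=f"shift {i}")

ax.set_xlabel("x")
ax.set_ylabel("sin(x + shift)")
ax.legend()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to display it.
```

Swapping "viridis" for another colormap name is enough to compare palettes on the same data.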
Pandas Plot
Pandas includes a set of basic plot functions for analyzing a dataframe. We will show more options to supplement our graphics.
Please remember that we need to load the Pandas library before using it, by including the following command in a code cell:
import pandas as pd
So far we may have come across some of these (here df stands for any Pandas DataFrame):
Functions | Description |
---|---|
Relational Plots | |
df.plot.line() | Simple (x,y) line plot |
df.plot.scatter() | Plotting y vs. x as scatter plot |
Distribution Plots | |
df.plot.hist() | Plot a histogram |
df.hist() | Plot a histogram of each DataFrame column |
df.plot.kde() | Generate Kernel Density Estimate plot using Gaussian kernels. |
df.plot.hexbin() | Hexagonal bin distribution plot |
Categorical Plots | |
df.plot.box() | Draw a box and whisker plot |
df.boxplot() | Box plot of the DataFrame columns |
pd.plotting.boxplot() | Box plot of the DataFrame columns (function form, takes the DataFrame as its first argument) |
df.plot.bar() | Make a bar plot |
Multiplot Grids | |
pd.plotting.scatter_matrix() | A matrix of scatter plots |
Please see more plotting information with Pandas
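A short sketch of a few of these Pandas plot functions follows. The DataFrame is synthetic (random numbers standing in for real measurements), and the column names height and weight are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data standing in for a real dataset (assumed for illustration).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(70, 8, 200),
})

# One row of three panels; each Pandas call draws into the axes we pass in.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

df.plot.scatter(x="height", y="weight", ax=axes[0])  # relational plot
df["height"].plot.hist(bins=20, ax=axes[1])          # distribution plot
df.plot.box(ax=axes[2])                              # one box per column

fig.tight_layout()
```

Passing ax explicitly keeps every plot in the panel we intend, rather than relying on whichever axes happens to be current.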
Matplotlib
Anatomy of a Matplotlib figure. (Image credit Matplotlib.org)
Useful resources:
- Matplotlib Examples
- Matplotlib Basics
- Pyplot Tutorial
- Matplotlib documentation
- Matplotlib Pyplot Module documentation
- Matplotlib Cheat Sheets
There are essentially two ways to use Matplotlib:
- Explicitly create Figures and Axes, and call methods on them (the "object-oriented (OO) style").
- Rely on pyplot to automatically create and manage the Figures and Axes, and use pyplot functions for plotting.
To load Matplotlib and Pyplot, we enter the following import commands in a code cell:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
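The two ways of using Matplotlib described above can be sketched side by side. This is only an illustration on made-up data (a sine and a cosine curve); both styles produce equivalent figures, and the choice between them is largely a matter of how much explicit control is needed.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Object-oriented style: create the Figure and Axes explicitly,
# then call methods on the Axes object.
fig_oo, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("OO style")
ax.legend()

# pyplot style: pyplot keeps track of the "current" Figure and Axes,
# and each plt.* call acts on them implicitly.
plt.figure()
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.title("pyplot style")
plt.legend()
fig_pyplot = plt.gcf()  # handle to the figure pyplot created for us
```

The object-oriented style tends to be easier to follow once a figure has several subplots, because every call names the Axes it modifies.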
Some basic Matplotlib Pyplot functions:
Functions | Description |
---|---|
Relational Plots | |
plt.scatter() | Plotting y vs. x as scatter plot with varying marker size and/or color. |
plt.plot() | Plotting y vs. x, as lines or markers |
plt.hlines() | Plot horizontal lines |
plt.vlines() | Plot vertical lines |
Distribution Plot | |
plt.hist() | Plot a histogram |
Categorical Plots | |
plt.boxplot() | Draw a box and whisker plot |
plt.bar() | Make a bar plot |
plt.violinplot() | Make a violin plot |
Multiplot Grids | |
plt.subplots() | Creates a figure and a set of subplots |
See the full list of Pyplot functions.
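As a sketch of how several of the functions in the table fit together, the example below uses plt.subplots() to build a 2x2 grid and fills each panel with a different plot type, called here as Axes methods. The random data and the panel layout are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 300 synthetic observations of three variables (assumed for illustration).
rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=0.0, scale=1.0, size=(300, 3))

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# Scatter plot; the third variable sets the marker color.
axes[0, 0].scatter(samples[:, 0], samples[:, 1], s=10,
                   c=samples[:, 2], cmap="viridis")
axes[0, 0].set_title("scatter")

# Histogram of the first variable.
axes[0, 1].hist(samples[:, 0], bins=25)
axes[0, 1].set_title("hist")

# Box plot and violin plot: one box/violin per column of the array.
axes[1, 0].boxplot(samples)
axes[1, 0].set_title("boxplot")

axes[1, 1].violinplot(samples)
axes[1, 1].set_title("violinplot")

fig.tight_layout()
```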
Seaborn
The Seaborn library is built on top of the general-purpose visualization library Matplotlib. Seaborn makes it easier to visualize the statistical properties of a dataset.
Useful resources:
- Seaborn Examples
- Seaborn Tutorial
- Seaborn Documentation
- Seaborn Cheat Sheet - Kaggle
- Seaborn Cheat Sheet
To load Seaborn into our working environment, we include the following command at the beginning of the code:
import seaborn as sns
There are several types of graphics that we can produce with Seaborn. We will only show a small set of them.
Function | Description |
---|---|
Relational Plots | |
sns.scatterplot() | Basic relational plot between variables |
sns.lineplot() | Plot lines between values |
Distribution Plots | |
sns.histplot() | Basic frequency distribution plot |
sns.kdeplot() | The kernel density estimation plot |
Categorical Plots | |
sns.stripplot() | Basic distribution categorical plot |
sns.swarmplot() | Categorical plot without overlapping points |
sns.boxplot() | Categorical box plots |
sns.violinplot() | Categorical violin plots |
sns.boxenplot() | Enhanced boxplot for larger datasets |
sns.pointplot() | Point estimates and confidence intervals using scatter plot glyphs |
sns.barplot() | Point estimates and confidence intervals as rectangular bars |
sns.countplot() | Counts of observations in each categorical bin using bars |
Regression Plots | |
sns.lmplot() | Plot data and regression model fits |
Matrix Plots | |
sns.heatmap() | Plot rectangular data as a color-encoded matrix |
Multiplot grids | |
sns.FacetGrid() | Multi-plot grid for plotting conditional relationships |
sns.pairplot() | Plot pairwise relationships in a dataset |
sns.jointplot() | Draw a plot of two variables with bivariate and univariate graphs |
More information about Seaborn functions
Plotnine
Plotnine is a Python data visualization library that mimics the ggplot2 library of the R programming language. It was designed with R users in mind, letting them use the same interface to build charts in Python. ggplot2 itself is based on the concept of a grammar of graphics.
To load the Plotnine library into the working environment, insert the following command in a code cell:
import plotnine as p9
Plotnine has several plotting functions. We will only mention some.
Function | Description |
---|---|
Relational Plots | |
p9.geoms.geom_jitter() | Scatter plot with points jittered to reduce overplotting |
p9.geoms.geom_point() | Plot points (Scatter plot) |
p9.geoms.geom_line() | Connected points |
p9.geoms.geom_hline() | Horizontal line |
p9.geoms.geom_vline() | Vertical line |
Distribution Plots | |
p9.geoms.geom_histogram() | Histogram |
p9.geoms.geom_density() | Smooth density estimate |
Categorical Plots | |
p9.geoms.geom_boxplot() | Box and whiskers plot |
p9.geoms.geom_violin() | Violin plot |
p9.geoms.geom_bar() | Bar plot |
p9.geoms.geom_col() | Bar plot with its base on the x-axis (bar heights taken from the data values) |
Matrix Plots | |
p9.geoms.geom_bin2d() | 2D-bins counts Heatmap |
Multiplot grids | |
p9.facets.facet_grid() | Lay out panels in a grid of rows and columns |
More information about Plotnine functions.
Jupyter Notebook related to this workshop
General References
- Altair documentation
- Bokeh documentation
- Matplotlib documentation
- Matplotlib Pyplot Module
- Plotly Documentation
- Plotnine documentation
- Seaborn Documentation
More on Data Visualization
- Data Visualization. A practical introduction. Kieran Healy.
- Fundamentals of Data Visualization. Claus O. Wilke.
- The misuse of colour in science communication. Fabio Crameri, Grace E. Shephard and Philip J. Heron.
Created: 04/04/2022 (C. Lizárraga); Last Update: 03/16/2023 (C. Lizárraga)