A1.1 Scatterplot - cimat/data-visualization-patterns GitHub Wiki
A scatterplot displays the correlation between two metric variables. It visualizes a two-column table of pairs of variates, which conveys little meaningful information in tabular form, especially when the underlying dataset becomes large. In a scatterplot, each pair of variates is represented by a dot in a two-dimensional Cartesian coordinate system. With a sufficient number of elements, the chart lets the viewer identify trends in the data and may even point to functional relationships between the observed variables. Exceptions to such relationships, such as outliers, also become visible (Behrens, 2008).
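As a numeric complement to the visual impression, the strength of a linear relationship between the two variables can be summarized by Pearson's correlation coefficient r. The following is a minimal, standard-library-only sketch; the function name pearson_r is hypothetical, and the sample pairs are the weight/mpg values of the six mtcars rows printed by head(mtcars) further down. Note that r only measures linear association, so a scatterplot can reveal curved relationships or outliers that r alone would hide.

```python
import math


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs.

    The result lies in [-1, 1]; values near +1 or -1 mean the dots of the
    scatterplot lie close to a straight line.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Co-deviation and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant; r is undefined")
    return sxy / math.sqrt(sxx * syy)


# (weight in 1000 lbs, miles per gallon) for the first six cars in mtcars
sample = [(2.620, 21.0), (2.875, 21.0), (2.320, 22.8),
          (3.215, 21.4), (3.440, 18.7), (3.460, 18.1)]
print(round(pearson_r(sample), 3))
```

For this small sample the coefficient is clearly negative, which matches what the scatterplots below show: heavier cars tend to travel fewer miles per gallon.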
Use a scatterplot to display a dataset that consists of a list of pairs of variates. Both attributes of each pair (i.e., both observed variables) are quantitative. The usual written form for this kind of data is a two-column table in which each column represents one variable and each row contains the values of one pair of variates. A typical example of this kind of data is the quantitative results of experiments or studies (Behrens, 2008).
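The two-column table form described above maps directly onto a list of (x, y) pairs. This sketch assumes a comma-separated table with a header row and uses Python's csv module; read_pairs is a hypothetical helper, and rejecting non-numeric cells reflects the requirement that both variables be quantitative.

```python
import csv
import io


def read_pairs(text, delimiter=","):
    """Parse a two-column table (header row, then one pair per row).

    Returns (header, pairs). A non-numeric cell raises ValueError via float(),
    because a scatterplot needs two quantitative variables.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    if len(header) != 2:
        raise ValueError(f"expected 2 columns, got {len(header)}")
    pairs = []
    for row_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"row {row_number}: expected 2 values, got {len(row)}")
        pairs.append((float(row[0]), float(row[1])))
    return header, pairs


table = """wt,mpg
2.620,21.0
2.875,21.0
2.320,22.8
"""
header, pairs = read_pairs(table)
print(header, pairs)
```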
Create a two-dimensional Cartesian coordinate system. Label and subdivide the axes according to the variables they represent. For each pair of variates in your data table, draw a point at the corresponding coordinates. The result is a cloud of dots scattered over the coordinate space.
In some cases, identical pairs of variates occur more than once. The corresponding dots then overlap, so the plot hides how many observations lie at that position, which distorts the expressiveness of the graphic. There are several ways to circumvent this problem; the two most popular are the sunflower and the jittered scatterplot techniques. In a sunflower scatterplot, a short stroke is attached to a dot every time a duplicate occurs, so the viewer can count the overlapping observations. A jittered scatterplot instead adds a small random value to each pair of variates, so that elements with identical values deviate slightly from their actual coordinates and move out of the shadow of overlapping elements. The random value must be large enough for the deviation to be visible to the eye, but as small as possible so that it does not compromise the chart's accuracy. Either way, you gain readability at the price of a slightly less precise display (Behrens, 2008).
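The two overplotting remedies can be sketched with the standard library alone. sunflower_counts and jitter are hypothetical helpers: the first counts how often each identical pair occurs (the information a sunflower plot encodes as strokes), the second adds uniform random noise bounded by x_amount and y_amount. Uniform noise is an assumption, although it mirrors R's jitter(), which also adds uniform noise; R's graphics package provides sunflower plots as sunflowerplot().

```python
import random
from collections import Counter


def sunflower_counts(pairs):
    """Map each distinct (x, y) position to the number of observations there.

    Following the description above, a position with count n gets n - 1
    strokes (one per duplicate). Conventions differ: R's sunflowerplot()
    draws one petal per observation when n > 1.
    """
    return Counter(pairs)


def jitter(pairs, x_amount, y_amount, seed=None):
    """Return a new list of pairs with uniform noise added to each coordinate.

    Offsets lie in [-x_amount, x_amount] and [-y_amount, y_amount]; keep them
    large enough to separate duplicates visibly but small relative to the
    axis ranges so the dots stay close to their true positions.
    """
    rng = random.Random(seed)  # seeded generator for reproducible plots
    return [(x + rng.uniform(-x_amount, x_amount),
             y + rng.uniform(-y_amount, y_amount))
            for x, y in pairs]


# Illustrative pairs: one position occurs three times
points = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]
for (x, y), n in sunflower_counts(points).items():
    print(f"({x}, {y}): {n} observation(s), {n - 1} stroke(s)")
print(jitter(points, 0.05, 0.05, seed=42))
```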
In short: a picture says more than a thousand table rows. Human perception is poorly suited to browsing long tables of numbers and deriving meaning from them, but visual patterns and their exceptions become clear in an instant. That is why scatterplots are a simple but useful tool for displaying relationships, detecting trends, and revealing developments hidden in your data. Although it is a very simple display type, the scatterplot can give great insight into data with very little effort (Behrens, 2008).
A1.2 Bubble Chart
- rpy2: The rpy2 package is used to access R datasets, such as mtcars, from Python.
- Matplotlib
- Seaborn
- Vispy
- Pyqtgraph
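```python
import matplotlib.pyplot as plt
from datos import data  # project helper module; assumed to return a pandas DataFrame

d = data('mtcars')  # wt is in 1000 lbs, mpg in miles per (US) gallon

# One blue dot per car: weight on the x axis, fuel economy on the y axis
plt.scatter(d.wt, d.mpg, c='blue')
plt.title('Scatterplot of Miles per Gallon vs. Car Weight',
          family='serif', size=16)
plt.xlabel('Car Weight (1000 lbs)', family='serif')
plt.ylabel('Miles per Gallon', family='serif')
plt.show()
```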
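```python
import matplotlib.pyplot as plt
import seaborn as sns
from datos import data  # project helper module; assumed to return a pandas DataFrame

d = data('mtcars')
sns.set(style="white")  # plain white background without grid lines

# A FacetGrid without row/col variables contains a single panel
g = sns.FacetGrid(d)
g.map(plt.scatter, "wt", "mpg")
plt.title("Scatterplot of Miles per Gallon vs. Car Weight",
          family='serif', size=16)
g.set_axis_labels("Car Weight (1000 lbs)", "Miles per Gallon")
plt.show()
```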
```python
import sys

import pyqtgraph as pg
from datos import data  # project helper module; assumed to return a pandas DataFrame

d = data('mtcars')

app = pg.mkQApp("Scatterplot")  # create (or reuse) the Qt application
# GraphicsLayoutWidget replaces the deprecated pg.GraphicsWindow
win = pg.GraphicsLayoutWidget(show=True, title="pyqtgraph example: Scatterplot")
win.resize(800, 500)

plot = win.addPlot(title="Scatterplot of Miles per Gallon vs. Car Weight")
# pen=None draws markers only, without a connecting line
plot.plot(d.wt.to_numpy(), d.mpg.to_numpy(), pen=None, symbol='o', symbolSize=5,
          symbolPen=(255, 255, 255, 200), symbolBrush=(0, 0, 255, 150))
plot.setLabel('left', "Miles per Gallon", units='mpg')
plot.setLabel('bottom', "Car Weight (1000 lbs)")

if __name__ == '__main__':
    # Start the Qt event loop unless running in an interactive session
    if sys.flags.interactive != 1:
        pg.exec()
```
R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible (The R Foundation, 2016)[1].
For this project we use the mtcars dataset.
The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
```r
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```
- Lattice
- ggplot2
This package contains functions for "base" graphics. Base graphics are traditional S-like graphics, as opposed to the more recent grid graphics.
For a complete list of functions with individual help pages, use library(help = "graphics") (R Core Team, n.d.)[2].
The online documentation is also available at docs.Graphics
plot is a generic function for plotting R objects; given two numeric vectors, it draws a scatterplot with decorations such as axes and titles in the active graphics window.
```r
# Base graphics: pch = 19 draws solid circles
plot(mtcars$wt, mtcars$mpg,
     main = "Scatterplot of Miles per Gallon vs. Car Weight",
     xlab = "Car Weight (1000 lbs)", ylab = "Miles per Gallon",
     pch = 19, col = "blue")
```
The lattice add-on package is an implementation of Trellis graphics for R. It is a powerful and elegant high-level data visualization system with an emphasis on multivariate data. It is designed to meet most typical graphics needs with minimal tuning, but can also be easily extended to handle most nonstandard requirements (Sarkar, 2011)[3].
The complete online documentation is also available as a single PDF file at CRAN. From within R, after loading the package with library(lattice), type help(Lattice).
Among the lattice functions, xyplot produces bivariate scatterplots.
```r
library(lattice)
# Formula interface y ~ x; the columns are looked up in the data argument
xyplot(mpg ~ wt, data = mtcars,
       main = "Scatterplot of Miles per Gallon vs. Car Weight",
       xlab = "Car Weight (1000 lbs)", ylab = "Miles per Gallon", pch = 19)
```
ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) and provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics (Wickham, 2013)[4]. ggplot2 documentation is available at docs.ggplot2.org.
The point geom is used to create scatterplots in ggplot2.
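```r
library(ggplot2)
# Map wt to x and mpg to y, then draw one point per row
g <- ggplot(mtcars, aes(wt, mpg)) + geom_point(colour = "blue")
# labs() takes the labels as named arguments
g + labs(title = "Scatterplot of Miles per Gallon vs. Car Weight",
         x = "Car Weight (1000 lbs)", y = "Miles per Gallon")
```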
[1] The R Foundation. (n.d.). R. Retrieved March 3, 2016, from https://www.r-project.org/about.html.
[2] R Core Team. (n.d.). The R Graphics Package. Retrieved March 3, 2016, from https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/graphics-package.html.
[3] Sarkar, Deepayan. (2011). Lattice: Trellis graphics for R. Retrieved March 4, 2016, from http://lattice.r-forge.r-project.org/
[4] Wickham, Hadley. (2013). ggplot2. Retrieved March 4, 2016, from http://ggplot2.org/