08 03 Additional Plot Types - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki

Categorical Plot Types

Plots of each observation

  1. stripplot
    • sns.stripplot(data=df, y= , x= , jitter=True)
    • jitter: the amount of jitter to apply, along the categorical axis only. Jitter helps when many points overlap, because it makes the distribution easier to see. Pass a number to set the amount (half the width of the uniform random variable's support), or pass True for a good default.
  2. swarmplot
    • sns.swarmplot(data=df, y= , x= )
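
The two per-observation plots above can be compared side by side. This is a minimal sketch on synthetic data; the column names group and value are made up for illustration and are not from the course dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): three categories with 30 observations each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "value": rng.normal(loc=10, scale=2, size=90),
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, figsize=(10, 4))

# stripplot: points are randomly jittered along the categorical axis
sns.stripplot(data=df, x="value", y="group", jitter=True, ax=ax_strip)
ax_strip.set_title("stripplot")

# swarmplot: points are moved just enough that they do not overlap
sns.swarmplot(data=df, x="value", y="group", ax=ax_swarm)
ax_swarm.set_title("swarmplot")

fig.tight_layout()
```

swarmplot has to position every point individually, so it tends to scale poorly to very large datasets.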

Abstract representations

  1. boxplot
    • sns.boxplot(data=df, y= ,x= )
  2. violinplot: computationally intensive
    • sns.violinplot(data=df, y= ,x=)
  3. boxenplot (called lvplot before seaborn 0.9): hybrid between boxplot and violinplot
    • sns.boxenplot(data=df, y= , x= )

Statistical estimates

  1. barplot
  2. pointplot
  3. countplot
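
The notes list the three statistical-estimate plots without calls, so here is a hedged sketch of each. The data and the column names group and value are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): unequal group sizes so countplot is informative
rng = np.random.default_rng(2)
sizes = {"A": 10, "B": 20, "C": 30}
df = pd.DataFrame({
    "group": np.concatenate([[name] * n for name, n in sizes.items()]),
    "value": rng.normal(loc=5, scale=1, size=sum(sizes.values())),
})

fig, (ax_bar, ax_point, ax_count) = plt.subplots(1, 3, figsize=(15, 4))

# barplot: bar height is an estimate (the mean by default) with an error bar
sns.barplot(data=df, x="group", y="value", ax=ax_bar)
ax_bar.set_title("barplot")

# pointplot: the same estimate drawn as points joined by a line
sns.pointplot(data=df, x="group", y="value", ax=ax_point)
ax_point.set_title("pointplot")

# countplot: number of observations in each category, no y variable needed
sns.countplot(data=df, x="group", ax=ax_count)
ax_count.set_title("countplot")

fig.tight_layout()
```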

Regression Plot

Plotting with regplot()

  • sns.regplot(data=df, x= , y= , marker='+')

Evaluating regression with residplot()

  • useful for evaluating the fit of a model
  • sns.residplot(data=df, x= , y= )
  • supports polynomial regression using the order parameter
    • sns.regplot(data=df, x= , y= , order=2)
    • sns.residplot(data=df, x= , y= , order=2)
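
As a rough illustration of the regplot and residplot calls above, the sketch below fits a second-order polynomial to synthetic quadratic data and plots its residuals. The columns x and y are placeholders, not columns from the course data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): a quadratic relationship plus noise
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 2 + 0.5 * x**2 + rng.normal(0, 2, x.size)})

fig, (ax_fit, ax_resid) = plt.subplots(1, 2, figsize=(10, 4))

# regplot: scatter plus a fitted polynomial of degree `order`
sns.regplot(data=df, x="x", y="y", marker="+", order=2, ax=ax_fit)
ax_fit.set_title("regplot, order=2")

# residplot: residuals of the same fit; ideally scattered randomly around zero
sns.residplot(data=df, x="x", y="y", order=2, ax=ax_resid)
ax_resid.set_title("residplot, order=2")

fig.tight_layout()
```

If the residuals show a clear curve, the chosen order is probably too low for the data.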

Categorical values

  • sns.regplot(data=df, x='month', y='total_rentals', x_jitter=.1, order=2)

Estimators

  • x_estimator can be useful for highlighting trends when x takes discrete values, e.g. x_estimator=np.mean

Binning the data

  • x_bins can be used to divide the data into discrete bins, e.g. x_bins=4
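
The three regplot options for discrete x values (x_jitter, x_estimator, x_bins) can be tried on the same data. This sketch reuses the column names month and total_rentals from the example above, but the values are synthetic and only meant to show the calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): 20 observations per month, peaking mid-year
rng = np.random.default_rng(4)
month = np.repeat(np.arange(1, 13), 20)
total_rentals = 4000 - 80 * (month - 7) ** 2 + rng.normal(0, 300, month.size)
df = pd.DataFrame({"month": month, "total_rentals": total_rentals})

fig, (ax_jit, ax_est, ax_bin) = plt.subplots(1, 3, figsize=(15, 4), sharey=True)

# x_jitter: spread the points horizontally so each month's values are visible
sns.regplot(data=df, x="month", y="total_rentals", x_jitter=.1, order=2, ax=ax_jit)
ax_jit.set_title("x_jitter=.1")

# x_estimator: collapse each month to one estimate (here the mean) with an error bar
sns.regplot(data=df, x="month", y="total_rentals", x_estimator=np.mean, order=2, ax=ax_est)
ax_est.set_title("x_estimator=np.mean")

# x_bins: group months into 4 bins; the fit is still computed on the original data
sns.regplot(data=df, x="month", y="total_rentals", x_bins=4, order=2, ax=ax_bin)
ax_bin.set_title("x_bins=4")

fig.tight_layout()
```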

Matrix Plots

Getting data in the right format

  • pandas crosstab() is frequently used to reshape the data into a matrix
  • pd.crosstab(df["Group"], df["YEAR"]): if values and aggfunc are not specified, each cell holds the frequency count
  • pd.crosstab(df['mnth'], df['weekday'], values=df['total_rentals'], aggfunc='mean').round(0)
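
A small worked example may make the difference between the two crosstab calls clearer. The six rows below are invented; only the column names follow the notes.

```python
import pandas as pd

# Tiny made-up dataset (assumption) so the results can be checked by hand
df = pd.DataFrame({
    "mnth": [1, 1, 1, 2, 2, 2],
    "weekday": [0, 1, 1, 0, 0, 1],
    "total_rentals": [100, 200, 300, 150, 250, 400],
})

# Without values/aggfunc: each cell counts how often the pair occurs
counts = pd.crosstab(df["mnth"], df["weekday"])

# With values and aggfunc: each cell aggregates total_rentals for the pair
means = pd.crosstab(
    df["mnth"], df["weekday"],
    values=df["total_rentals"], aggfunc="mean",
).round(0)

print(counts)
print(means)
```

Here month 1, weekday 1 occurs twice with rentals of 200 and 300, so its count is 2 and its mean is 250.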

Build a heatmap

  • sns.heatmap(df_crosstab)

Customize a heatmap

  • annot=True: write the data value inside each cell
  • fmt='d': format the annotations as integers
  • cmap=: set the color map
  • cbar=False: do not display the color bar
  • center=df_crosstab.loc[x,y]: center the heatmap colors on a specific value
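
Putting the heatmap options together, a minimal sketch might look like the following. The data is synthetic, the colormap "YlGnBu" is just one arbitrary choice, and a count crosstab is used because fmt='d' expects integer values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): pair (m, w) appears exactly m + w times
rows = [(m, w) for m in range(1, 5) for w in range(3) for _ in range(m + w)]
df = pd.DataFrame(rows, columns=["mnth", "weekday"])

# Frequency table: integer counts, so fmt="d" is valid
df_crosstab = pd.crosstab(df["mnth"], df["weekday"])

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    df_crosstab,
    annot=True,                     # write each count inside its cell
    fmt="d",                        # integer formatting for the annotations
    cmap="YlGnBu",                  # colormap (arbitrary choice for this sketch)
    cbar=False,                     # hide the color bar
    center=df_crosstab.loc[2, 1],   # center the colors on one cell's value
    ax=ax,
)
fig.tight_layout()
```

If the table holds floats, such as the rounded means from crosstab with aggfunc='mean', either cast it to int first or use a float format like fmt='.0f'.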