R Python Cheatsheet - JasonLocklin/jasonlocklin.github.com GitHub Wiki

R to Python/Pandas/Numpy cheat-sheet

| Goal | R | Python* |
|---|---|---|
| Read data | `read.csv("file.csv")` | `pd.read_csv("file.csv")` |
| Write data | `write.csv(d, "file.csv")` | `d.to_csv("file.csv")` |
| Assignment | `a <- 1` | `a = 1` |
| Descriptive stats | `summary(d)` | `d.describe()` |
| Dataframe variable shortcut¹ | `d$variable` | `d.variable` |
| Boolean values | `TRUE` | `True` |
| Load a package | `library("package")`² | `import package`³ |

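  1. To be safe, use this shortcut only interactively and never for assignment. Both languages use d['variable'] for normal access (in R, d[['variable']] returns the column itself), and pandas also has .loc and .iloc (the older .ix was removed in pandas 1.0).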

  2. Never use require(package) unless you absolutely know what you are doing. It essentially tries library(package) and returns a boolean instead of raising an error, so if the package doesn't load, your code fails later, somewhere that is much harder to debug. In a similar vein, attach(dataframe) adds a data frame's variables to the search path to save typing. This is a bad idea! Just use short data frame names (like a single letter), and always refer to variables as dataframe[['variable']] or its short form, dataframe$variable.

  3. import package puts the package's attributes under its own namespace. If package includes the function foo, it can now be called as package.foo. To make things shorter, import package as pkg allows pkg.foo, and from package import foo brings foo into the top-level scope, callable as just foo (be wary of name conflicts here). Don't use from package import * except with your own packages, as it is likely to cause name conflicts. By convention, matplotlib, numpy, and scipy are usually imported in the idiosyncratic way described below.
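
The three import forms from note 3 can be sketched with the standard-library `math` module. It stands in for any package here (the choice is just for illustration), and `math.pow` is used to show the kind of name conflict the note warns about:

```python
import builtins              # gives explicit access to the built-in pow
import math                  # attributes live under the module name
import math as m             # same module, shorter alias
from math import sqrt        # sqrt becomes a top-level name

# All three spellings call the same function.
same_result = math.sqrt(16) == m.sqrt(16) == sqrt(16) == 4.0

# Name conflict: this import silently shadows the built-in pow.
from math import pow

builtin_result = builtins.pow(2, 3)   # built-in: int in, int out
shadowed_result = pow(2, 3)           # math.pow: always returns a float

print(type(builtin_result).__name__, type(shadowed_result).__name__)
```

Both calls compute 2 to the power of 3, but the result type changes once the name is shadowed, which is why `from package import *` is best avoided.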

*Assuming the following import statements:

import pandas as pd
import numpy as np
import scipy as sp
import seaborn as sns
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
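
As a rough sketch of how the Python column of the table fits together, the following round-trips a small data frame through a CSV file and shows why the `d.variable` shortcut is unsafe for assignment. The column names and values are made up for illustration, and a temporary directory is used so that nothing is left on disk:

```python
import os
import tempfile
import warnings

import pandas as pd

# Hypothetical toy data, invented for this example.
d = pd.DataFrame({"height": [1.5, 1.7, 1.6], "tall": [False, True, True]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    d.to_csv(path, index=False)    # R: write.csv(d, "file.csv", row.names = FALSE)
    d2 = pd.read_csv(path)         # R: read.csv("file.csv")

print(d2.describe())               # R: summary(d)

# Reading through the attribute shortcut matches bracket access.
shortcut_matches = d2.height.equals(d2["height"])

# Assigning a new name through the shortcut does NOT create a column;
# pandas only sets a Python attribute and emits a UserWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    d2.weight = [50, 60, 55]
column_created_by_attribute = "weight" in d2.columns

# Bracket assignment is the reliable way to add a column.
d2["weight"] = [50, 60, 55]
```

After the last line, `d2` has three columns, while the earlier attribute assignment left the columns untouched.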