R Python Cheatsheet - JasonLocklin/jasonlocklin.github.com GitHub Wiki
Goal | R | Python* |
---|---|---|
Read data | read.csv("file.csv") | pd.read_csv("file.csv") |
Write data | write.csv(d, "file.csv") | d.to_csv("file.csv") |
Assignment | a <- 1 | a = 1 |
Descriptive stats | summary(d) | d.describe() |
Dataframe variable shortcut<sup>1</sup> | d$variable | d.variable |
Boolean values | TRUE | True |
Load a package | library("package")<sup>2</sup> | import package<sup>3</sup> |

1. Use interactively only, and not for assignment, to be safe. Both languages use `d['variable']` for normal access, and pandas also has its `.loc` and `.iloc` indexers (the older `.ix` indexer was deprecated and then removed in pandas 1.0).
2. Never use `require(package)` unless you absolutely know what you are doing. It basically wraps `library(package)` in a try and returns a boolean, so if the package doesn't load, your code will fail later and be very difficult to debug. In a similar vein, `attach(dataframe)` is a way to add a data frame's variables to the search path so they can be used by bare name and save typing. This is a bad idea! Just use short dataframe names (like a single letter), and always call variables by `dataframe[['variable']]`, or its short form `dataframe$variable`.
3. `import package` puts the package's attributes under its own namespace. If `package` includes the function `foo`, it can now be called as `package.foo`. To make things shorter, `import package as pkg` allows `pkg.foo`, and `from package import foo` brings `foo` into the top-level scope, callable as just `foo` (be wary of name conflicts here). Don't use `from package import *` except from your own packages, as it is likely to cause problems and conflicts. By convention, `matplotlib`, `numpy`, and `scipy` are usually imported in the idiosyncratic way shown below.

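
The import forms from the note on `import package` can be sketched with the standard-library `math` and `cmath` modules. These modules are chosen here only for illustration and are not part of the cheatsheet; the second half shows the kind of name conflict the note warns about.

```python
import math                # names live under the module: math.sqrt
import math as m           # same module under a shorter alias: m.sqrt
from math import sqrt      # name copied into the current scope: sqrt

# All three spellings call the same function.
results = [math.sqrt(16), m.sqrt(16), sqrt(16)]

# Name conflict: importing another `sqrt` silently replaces the first one.
from cmath import sqrt
shadowed = sqrt(16)        # now the complex-valued version from cmath
```

After the second `from ... import`, the bare name `sqrt` refers to `cmath.sqrt`, while `math.sqrt` and `m.sqrt` are unaffected. That is the main argument for keeping names under their module or alias.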
*Assuming the following import statements:

    import pandas as pd
    import numpy as np
    import scipy as sp
    import seaborn as sns
    from scipy import stats
    import matplotlib as mpl
    import matplotlib.pyplot as plt
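
As a rough illustration of the Python column of the table, here is a minimal sketch that assumes only pandas is installed. The column names `subject` and `score` and the inline CSV text are made up for the example, and an in-memory buffer stands in for `"file.csv"` so nothing is written to disk.

```python
import io
import warnings

import pandas as pd

# Read data: an in-memory buffer stands in for a CSV file on disk.
csv_text = "subject,score\na,1.0\nb,2.0\nc,3.0\n"
d = pd.read_csv(io.StringIO(csv_text))

# Descriptive stats, roughly like summary(d) in R (numeric columns only).
summary = d.describe()

# Attribute shortcut and bracket access return the same column.
scores_attr = d.score
scores_key = d["score"]

# Label-based and position-based access to the same cell.
first_by_label = d.loc[0, "score"]
first_by_position = d.iloc[0, 1]

# Assigning through the attribute shortcut does NOT create a column;
# it only sets a plain Python attribute (pandas may warn about this).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    d.new_col = [1, 2, 3]

# Bracket assignment is the reliable way to add a column.
d["doubled"] = d["score"] * 2

# Write data: with no path, to_csv returns the CSV text as a string.
# index=False drops the row index, much like row.names = FALSE in R.
out = d.to_csv(index=False)
```

The attribute assignment is the reason the shortcut is best kept for interactive reading: `d.new_col` ends up as an attribute on the object, and `new_col` never appears in `d.columns`.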