12 04 Dates and Times in Pandas - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki

Reading date and time data in Pandas

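  • pd.read_csv(filename, parse_dates=[list of column names]) parses the listed columns as datetimes while reading the file
  • pd.to_datetime(df['column name'], format='') converts an existing column, where format is a strftime pattern such as '%Y-%m-%d %H:%M:%S'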
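
As a rough illustration of the two options above, the sketch below reads the same small CSV twice. The sample rows and the `End date` column are made up for the example; only `Start date` and `Member type` appear in these notes.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the rides CSV file
csv_text = (
    "Start date,End date,Member type\n"
    "2017-10-01 15:23:25,2017-10-01 15:26:26,Member\n"
    "2017-10-01 15:42:57,2017-10-01 17:49:59,Casual\n"
)

# Option 1: parse the date columns while reading the file
rides = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start date", "End date"])

# Option 2: read the columns as strings, then convert one with an explicit format
raw = pd.read_csv(io.StringIO(csv_text))
raw["Start date"] = pd.to_datetime(raw["Start date"], format="%Y-%m-%d %H:%M:%S")

print(rides.dtypes)
print(raw["Start date"].dtype)
```

Passing an explicit format is optional, but it makes the conversion fail loudly when a value does not match the expected pattern.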

Summarizing datetime data in Pandas

  • Grouping rows with .groupby() lets you calculate aggregates per group, for example with .first(), .min() or .mean()
  • .resample('M', on='') groups rows on the basis of a datetime column (named in on), by year, month, day, and so on
    • 'M' is monthly, 'D' is daily (pandas 2.2 and later spell month-end as 'ME')
# Import matplotlib
import matplotlib.pyplot as plt

# Resample rides to monthly, take the size, plot the results
rides.resample('M', on = 'Start date')\
  .size()\
  .plot(ylim = [0, 150])

# Show the results
plt.show()
# Resample rides to be monthly on the basis of Start date
monthly_rides = rides.resample('M',on='Start date')['Member type']

# Take the ratio of the .value_counts() over the total number of rides
print(monthly_rides.value_counts() / monthly_rides.size())
Start date  Member type
2017-10-31  Member         0.768519
            Casual         0.231481
2017-11-30  Member         0.825243
            Casual         0.174757
2017-12-31  Member         0.860759
            Casual         0.139241
Name: Member type, dtype: float64
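
To contrast .groupby() with .resample(), here is a minimal self-contained sketch on a made-up four-row table. The `Duration` column and its values are assumptions for illustration. The sketch uses the 'MS' (month start) alias instead of 'M' so that it runs unchanged on older and newer pandas versions; the rows are still grouped by calendar month, only the label is the first day of the month rather than the last.

```python
import pandas as pd

# Hypothetical rides table; Duration is in seconds
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-03 08:00", "2017-10-20 09:30",
        "2017-11-05 18:15", "2017-12-24 12:00",
    ]),
    "Member type": ["Member", "Casual", "Member", "Member"],
    "Duration": [300.0, 1200.0, 450.0, 600.0],
})

# groupby: one aggregate per category value
mean_by_type = rides.groupby("Member type")["Duration"].mean()

# resample: one aggregate per calendar month of the Start date column
monthly_mean = rides.resample("MS", on="Start date")["Duration"].mean()

print(mean_by_type)
print(monthly_mean)
```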

Additional datetime methods in Pandas

Timezones in Pandas

  • .dt.tz_localize(): sets a timezone, keeping the date and time the same
    • ambiguous='NaT' sets ambiguous datetimes to NaT
  • .dt.tz_convert(): changes the date and time to match a new timezone
# Localize the Start date column to America/New_York
rides['Start date'] = rides['Start date']\
    .dt.tz_localize('America/New_York', ambiguous = 'NaT')

# Print first value
print(rides['Start date'].iloc[0])
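
The difference between localizing and converting can be sketched on a standalone Series. The two timestamps and the Europe/London target zone are chosen for illustration; the second timestamp falls in the hour that repeats when New York clocks go back, so it is ambiguous.

```python
import pandas as pd

# Hypothetical timezone-naive start times
starts = pd.Series(pd.to_datetime([
    "2017-10-01 15:23:25",
    "2017-11-05 01:30:00",  # occurs twice in New York on this date
]))

# tz_localize attaches a timezone; the wall-clock time stays the same,
# and the ambiguous value becomes NaT
localized = starts.dt.tz_localize("America/New_York", ambiguous="NaT")

# tz_convert keeps the instant and changes the wall-clock time
converted = localized.dt.tz_convert("Europe/London")

print(localized.iloc[0])  # 2017-10-01 15:23:25-04:00
print(converted.iloc[0])  # 2017-10-01 20:23:25+01:00
print(localized.iloc[1])  # NaT
```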

Other datetime operations in Pandas

  • .dt.year
  • .dt.day_name() (replaces .dt.weekday_name, which was removed in pandas 1.0)
  • .dt.total_seconds() (on a timedelta column)
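
A short sketch of these accessors on a made-up two-row table follows. The `End date` column and the derived column names are assumptions for the example. Subtracting two datetime columns gives a timedelta column, which is where .dt.total_seconds() applies.

```python
import pandas as pd

# Hypothetical rides with start and end timestamps
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-10-02 08:00:00"]),
    "End date": pd.to_datetime(["2017-10-01 15:26:26", "2017-10-02 08:30:00"]),
})

# Datetime accessors on a datetime column
rides["Year"] = rides["Start date"].dt.year
rides["Weekday"] = rides["Start date"].dt.day_name()

# Timedelta accessor on the difference of two datetime columns
rides["Duration seconds"] = (rides["End date"] - rides["Start date"]).dt.total_seconds()

print(rides[["Year", "Weekday", "Duration seconds"]])
```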