12 04 Dates and Times in Pandas - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki
Reading date and time data in Pandas
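- pd.read_csv(..., parse_dates=[list of column names]) parses the listed columns as datetimes while the file is read.
- pd.to_datetime(df['column name'], format='') converts an existing column to datetimes; format takes a strftime-style string.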
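As a minimal sketch of the two approaches above, the example below reads a small made-up CSV from memory. The column names follow the rides data used later in these notes, but the rows and the format string are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for the real rides file
csv_text = (
    "Start date,End date,Member type\n"
    "2017-10-01 15:23:25,2017-10-01 15:26:26,Member\n"
    "2017-10-01 15:42:57,2017-10-01 17:49:59,Casual\n"
)

# Option 1: parse the date columns while reading the file
rides = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start date", "End date"])

# Option 2: read the columns as text, then convert one with an explicit format
rides_raw = pd.read_csv(io.StringIO(csv_text))
rides_raw["Start date"] = pd.to_datetime(
    rides_raw["Start date"], format="%Y-%m-%d %H:%M:%S"
)

# Both date columns in `rides` should now show a datetime64 dtype
print(rides.dtypes)
```

Passing an explicit format is usually worthwhile when the layout is known, since it avoids guessing and fails loudly on rows that do not match.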
Summarizing datetime data in Pandas
- Grouping rows with .groupby() lets you calculate aggregates per group, for example with .first(), .min(), or .mean().
- .resample('M', on='') groups rows on the basis of a datetime column, by year, month, day, and so on.
# Import matplotlib
import matplotlib.pyplot as plt
# Resample rides to monthly, take the size, plot the results
rides.resample('M', on = 'Start date')\
.size()\
.plot(ylim = [0, 150])
# Show the results
plt.show()
# Resample rides to be monthly on the basis of Start date
monthly_rides = rides.resample('M',on='Start date')['Member type']
# Take the ratio of the .value_counts() over the total number of rides
print(monthly_rides.value_counts() / monthly_rides.size())
Start date  Member type
2017-10-31  Member         0.768519
            Casual         0.231481
2017-11-30  Member         0.825243
            Casual         0.174757
2017-12-31  Member         0.860759
            Casual         0.139241
Name: Member type, dtype: float64
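To compare .groupby() and .resample() side by side, here is a small self-contained sketch on a made-up frame. The "Duration seconds" column and all values are hypothetical. It uses the 'MS' (month start) alias instead of 'M' because, as far as I know, recent pandas versions have renamed the month-end alias to 'ME', while 'MS' is accepted across versions.

```python
import pandas as pd

# Hypothetical rides: two in October 2017, two in November 2017
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23", "2017-10-20 08:00",
        "2017-11-03 12:30", "2017-11-15 09:10",
    ]),
    "Member type": ["Member", "Casual", "Member", "Member"],
    "Duration seconds": [180, 7600, 300, 420],
})

# groupby: one aggregate per category value
mean_by_type = rides.groupby("Member type")["Duration seconds"].mean()

# resample: one aggregate per calendar period of a datetime column
# ('MS' labels each month by its first day)
rides_per_month = rides.resample("MS", on="Start date").size()

print(mean_by_type)
print(rides_per_month)
```

The difference is in what defines a group: .groupby() uses the distinct values of a column, while .resample() uses regular time bins, so months without rides would still appear with a count of 0.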
Additional datetime methods in Pandas
Timezones in Pandas
.dt.tz_localize()
: to set a timezone, keeping the date and time the same
ambiguous = 'NaT'
to set ambiguous datetimes to NaT
.dt.tz_convert()
: to change the date and time to match a new timezone
# Localize the Start date column to America/New_York
rides['Start date'] = rides['Start date']\
.dt.tz_localize('America/New_York', ambiguous = 'NaT')
# Print first value
print(rides['Start date'].iloc[0])
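The sketch below contrasts .dt.tz_localize() with .dt.tz_convert() on two made-up timestamps. The second one is chosen to fall in the repeated hour when US daylight saving time ended on 2017-11-05, so it illustrates what ambiguous='NaT' does. The 'Europe/London' target is just an example timezone.

```python
import pandas as pd

# Hypothetical naive timestamps; 01:30 on 2017-11-05 occurs twice in New York
start = pd.Series(pd.to_datetime(["2017-10-01 15:23:25", "2017-11-05 01:30:00"]))

# tz_localize attaches a timezone and keeps the clock time unchanged;
# the ambiguous second value becomes NaT instead of raising an error
local = start.dt.tz_localize("America/New_York", ambiguous="NaT")

# tz_convert keeps the same instant but shows it on another timezone's clock
london = local.dt.tz_convert("Europe/London")

print(local)
print(london)
```

Localizing answers "which timezone was this clock reading taken in?", while converting answers "what did clocks elsewhere show at that same moment?".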
Other datetime operations in Pandas
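.dt.year
.dt.day_name() (replaces .dt.weekday_name, which was removed in pandas 1.0)
.dt.total_seconds() (for timedelta columns, such as a ride duration)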
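A short sketch of these accessors on made-up data follows. It assumes a reasonably recent pandas, where the weekday name comes from .dt.day_name() and the length of a timedelta in seconds from .dt.total_seconds(). The "Duration" column is computed here for illustration.

```python
import pandas as pd

# Hypothetical rides with start and end timestamps
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-10-02 06:37:10"]),
    "End date": pd.to_datetime(["2017-10-01 15:26:26", "2017-10-02 06:42:53"]),
})

# Datetime columns expose calendar components through .dt
years = rides["Start date"].dt.year
weekdays = rides["Start date"].dt.day_name()

# Subtracting two datetime columns gives a timedelta column,
# which has its own .dt methods such as total_seconds()
rides["Duration"] = rides["End date"] - rides["Start date"]
seconds = rides["Duration"].dt.total_seconds()

print(years.tolist(), weekdays.tolist(), seconds.tolist())
```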