1. Dates and Calendars
1.1 Dates in Python
1.2 Which day of the week?
Hurricane Andrew, which hit Florida on August 24, 1992, was one of the costliest and deadliest hurricanes in US history. Which day of the week did it make landfall?
Let’s walk through all of the steps to figure this out.
Instruction
- Import `date` from `datetime`.
- Create a `date` object for August 24, 1992.
- Now ask Python what day of the week Hurricane Andrew hit (remember that Python counts days of the week starting from Monday as 0, Tuesday as 1, and so on).
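A minimal sketch of these steps, using only the landfall date given above. The variable names are illustrative choices, not taken from the exercise.

```python
from datetime import date

# Hurricane Andrew's Florida landfall date, as stated in the text
hurricane_andrew = date(1992, 8, 24)

# .weekday() counts from Monday = 0 through Sunday = 6
andrew_weekday = hurricane_andrew.weekday()
print(andrew_weekday)  # 0, i.e. a Monday
```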
1.3 How many hurricanes come early?
In this chapter, you will work with a list of the hurricanes that made landfall in Florida from 1950 to 2017. There were 235 in total. Check out the variable `florida_hurricane_dates`, which has all of these dates.
Atlantic hurricane season officially begins on June 1. How many hurricanes since 1950 have made landfall in Florida before the official start of hurricane season?
Instruction
- Complete the `for` loop to iterate through `florida_hurricane_dates`.
- Complete the `if` statement to increment the counter (`early_hurricanes`) if the hurricane made landfall before June.
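One way the loop could look. The real `florida_hurricane_dates` list is not reproduced in this post, so the short list here is a hypothetical stand-in used only to make the sketch runnable.

```python
from datetime import date

# Hypothetical stand-in for florida_hurricane_dates (not the real 235 dates)
florida_hurricane_dates = [
    date(1950, 8, 31), date(1959, 5, 30), date(1992, 8, 24),
    date(2007, 5, 9), date(2007, 12, 13),
]

early_hurricanes = 0
for hurricane in florida_hurricane_dates:
    # Months are numbered 1-12, so "before June" means month < 6
    if hurricane.month < 6:
        early_hurricanes += 1

print(early_hurricanes)  # 2 for this sample list
```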
1.4 Math with dates
1.5 Subtracting dates
Python `date` objects let us treat calendar dates as something similar to numbers: we can compare them, sort them, add intervals to them, and even subtract them. This lets us do math with dates in a way that would be a pain to do by hand.
The 2007 Florida hurricane season was one of the busiest on record, with 8 hurricanes in one year. The first one hit on May 9th, 2007, and the last one hit on December 13th, 2007. How many days elapsed between the first and last hurricane in 2007?
Instruction
- Import `date` from `datetime`.
- Create a `date` object for May 9th, 2007, and assign it to the `start` variable.
- Create a `date` object for December 13th, 2007, and assign it to the `end` variable.
- Subtract `start` from `end`, to print the number of days in the resulting `timedelta` object.
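A short sketch of the subtraction, using the two dates from the text. The name `elapsed` is an illustrative choice.

```python
from datetime import date

start = date(2007, 5, 9)
end = date(2007, 12, 13)

# Subtracting two dates yields a timedelta; .days is its whole-day count
elapsed = end - start
print(elapsed.days)  # 218
```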
1.6 Counting events per calendar month
Hurricanes can make landfall in Florida throughout the year. As we’ve already discussed, some months are more hurricane-prone than others.
Using `florida_hurricane_dates`, let’s see how hurricanes in Florida were distributed across months throughout the year.
We’ve created a dictionary called `hurricanes_each_month` to hold your counts and set the initial counts to zero. You will loop over the list of hurricanes, incrementing the correct month in `hurricanes_each_month` as you go, and then print the result.
Instruction
- Within the `for` loop:
  - Assign `month` to be the month of that hurricane.
  - Increment `hurricanes_each_month` for the relevant month by 1.
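A sketch of the counting loop. The date list is again a hypothetical sample, and the dictionary is assumed to be keyed by month number 1 to 12.

```python
from datetime import date

# Hypothetical stand-in for florida_hurricane_dates
florida_hurricane_dates = [
    date(1950, 8, 31), date(1992, 8, 24), date(2007, 5, 9), date(2007, 12, 13),
]

# One counter per calendar month, all starting at zero
hurricanes_each_month = {month: 0 for month in range(1, 13)}

for hurricane in florida_hurricane_dates:
    month = hurricane.month
    hurricanes_each_month[month] += 1

print(hurricanes_each_month)
```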
1.7 Putting a list of dates in order
Much like numbers and strings, `date` objects in Python can be put in order. Earlier dates come before later ones, and so we can sort a list of `date` objects from earliest to latest.
What if our Florida hurricane dates had been scrambled? We’ve gone ahead and shuffled them so they’re in random order and saved the results as `dates_scrambled`. Your job is to put them back in chronological order, and then print the first and last dates from this sorted list.
Instruction
- Print the first and last dates in `dates_scrambled`.
- Sort `dates_scrambled` using Python’s built-in `sorted()` function, and save the results to `dates_ordered`.
- Print the first and last dates in `dates_ordered`.
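A sketch of the sorting step, with a small hypothetical `dates_scrambled` list in place of the shuffled course data.

```python
from datetime import date

# Hypothetical shuffled sample
dates_scrambled = [
    date(2007, 12, 13), date(1950, 8, 31), date(1992, 8, 24), date(2007, 5, 9),
]
print(dates_scrambled[0], dates_scrambled[-1])

# sorted() returns a new list and leaves dates_scrambled untouched
dates_ordered = sorted(dates_scrambled)
print(dates_ordered[0], dates_ordered[-1])
```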
1.8 Turning dates into strings
1.9 Printing dates in a friendly format
Because people may want to see dates in many different formats, Python comes with very flexible functions for turning `date` objects into strings.
Let’s see what event was recorded first in the Florida hurricane data set. In this exercise, you will format the earliest date in the `florida_hurricane_dates` list in two ways so you can decide which one you want to use: either the ISO standard or the typical US style.
Instruction
- Assign the earliest date in `florida_hurricane_dates` to `first_date`.
- Print `first_date` in the ISO standard. For example, December 1st, 2000 would be `"2000-12-01"`.
- Print `first_date` in the US style, using `.strftime()`. For example, December 1st, 2000 would be “12/1/2000”.
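A possible solution sketch with a hypothetical three-date list. Note that `%m` and `%d` zero-pad their output, so this format string would print December 1st, 2000 as 12/01/2000 rather than 12/1/2000.

```python
from datetime import date

# Hypothetical stand-in for florida_hurricane_dates
florida_hurricane_dates = [date(1992, 8, 24), date(1950, 8, 31), date(2007, 5, 9)]

# The earliest date is the minimum of the list
first_date = min(florida_hurricane_dates)

iso = first_date.isoformat()           # ISO 8601: YYYY-MM-DD
us = first_date.strftime("%m/%d/%Y")   # US style: MM/DD/YYYY

print("ISO:", iso)  # ISO: 1950-08-31
print("US:", us)    # US: 08/31/1950
```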
1.10 Representing dates in different ways
`date` objects in Python have a great number of ways they can be printed out as strings. In some cases, you want to know the date in a clear, language-agnostic format. In other cases, you want something which can fit into a paragraph and flow naturally.
Let’s try printing out the same date, August 24, 1992 (the day that Hurricane Andrew made landfall in Florida), in a number of different ways, to practice using the `.strftime()` method.
Instruction 1
Print `andrew` in the format ‘YYYY-MM’.
Instruction 2
Print `andrew` in the format ‘MONTH (YYYY)’, using `%B` for the month’s full name, which in this case will be August.
Instruction 3
Print `andrew` in the format ‘YYYY-DDD’ (where DDD is the day of the year) using `%j`.
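The three formats can be sketched as follows. This assumes `andrew` holds August 24, 1992, the Florida landfall date given at the start of this post, and that the default C locale is active so `%B` yields an English month name.

```python
from datetime import date

# Assumed value of andrew: the landfall date stated earlier in the post
andrew = date(1992, 8, 24)

year_month = andrew.strftime("%Y-%m")     # year and zero-padded month
month_year = andrew.strftime("%B (%Y)")   # full month name, then year
year_day = andrew.strftime("%Y-%j")       # year and day of the year

print(year_month)  # 1992-08
print(month_year)  # August (1992)
print(year_day)    # 1992-237
```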
2. Combining Dates and Times
2.1 Dates and times
2.2 Creating datetimes by hand
Often you create `datetime` objects based on outside data. Sometimes though, you want to create a `datetime` object from scratch.
You’re going to create a few different `datetime` objects from scratch to get the hang of that process. These come from the bikeshare data set that you’ll use throughout the rest of the chapter.
Instruction 1
- Import the `datetime` class.
- Create a `datetime` for October 1, 2017 at 15:26:26.
- Print the results in ISO format.
Instruction 2
- Import the `datetime` class.
- Create a `datetime` for December 31, 2017 at 15:19:13.
- Print the results in ISO format.
Instruction 3
Create a new `datetime` by replacing the year in `dt` with 1917 (instead of 2017).
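A sketch covering all three instructions. The names `dt_first` and `dt_old` are illustrative, and `dt` is assumed to be the December 31 value from the second instruction.

```python
from datetime import datetime

# Arguments are year, month, day, hour, minute, second
dt_first = datetime(2017, 10, 1, 15, 26, 26)
print(dt_first.isoformat())  # 2017-10-01T15:26:26

dt = datetime(2017, 12, 31, 15, 19, 13)
print(dt.isoformat())  # 2017-12-31T15:19:13

# .replace() returns a new datetime; dt itself is unchanged
dt_old = dt.replace(year=1917)
print(dt_old)
```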
2.3 Counting events before and after noon
In this chapter, you will be working with a list of all bike trips for one Capital Bikeshare bike, W20529, from October 1, 2017 to December 31, 2017. This list has been loaded as `onebike_datetimes`.
Each element of the list is a dictionary with two entries: `start` is a `datetime` object corresponding to the start of a trip (when a bike is removed from the dock) and `end` is a `datetime` object corresponding to the end of a trip (when a bike is put back into a dock).
You can use this data set to understand better how this bike was used. Did more trips start before noon or after noon?
Instruction
- Within the `for` loop, complete the `if` statement to check if the trip started before noon.
- Within the `for` loop, increment `trip_counts['AM']` if the trip started before noon, and `trip_counts['PM']` if it started after noon.
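A sketch of the counting logic on a hypothetical three-trip sample (the first two trips match the CSV rows shown later in this post; the third is invented). A trip starting exactly at noon would be counted as PM here.

```python
from datetime import datetime

# Hypothetical stand-in for onebike_datetimes
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25), "end": datetime(2017, 10, 1, 15, 26, 26)},
    {"start": datetime(2017, 10, 1, 15, 42, 57), "end": datetime(2017, 10, 1, 17, 49, 59)},
    {"start": datetime(2017, 10, 2, 6, 37, 10), "end": datetime(2017, 10, 2, 6, 42, 53)},
]

trip_counts = {"AM": 0, "PM": 0}
for trip in onebike_datetimes:
    # .hour runs from 0 to 23, so hours 0-11 are before noon
    if trip["start"].hour < 12:
        trip_counts["AM"] += 1
    else:
        trip_counts["PM"] += 1

print(trip_counts)  # {'AM': 1, 'PM': 2}
```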
2.4 Printing and parsing datetimes
2.5 Turning strings into datetimes
When you download data from the Internet, dates and times usually come to you as strings. Often the first step is to turn those strings into `datetime` objects.
In this exercise, you will practice this transformation.
Reference
Directive | Meaning |
---|---|
%Y | 4-digit year (0001-9999) |
%m | 2-digit month (01-12) |
%d | 2-digit day (01-31) |
%H | 2-digit hour (00-23) |
%M | 2-digit minute (00-59) |
%S | 2-digit second (00-59) |
Instruction 1
- Determine the format needed to convert `s` to datetime and assign it to `fmt`.
- Convert the string `s` to datetime using `fmt`.
Instruction 2
- Determine the format needed to convert `s` to datetime and assign it to `fmt`.
- Convert the string `s` to datetime using `fmt`.
Instruction 3
- Determine the format needed to convert `s` to datetime and assign it to `fmt`.
- Convert the string `s` to datetime using `fmt`.
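The exercise strings are not reproduced in this post, so the sketch below pairs three hypothetical input strings with format strings built from the reference table above.

```python
from datetime import datetime

# Hypothetical (string, format) pairs; the real exercise inputs are not shown
examples = [
    ("2017-02-03 00:00:01", "%Y-%m-%d %H:%M:%S"),
    ("2030-10-15", "%Y-%m-%d"),
    ("12/15/1986 08:00:00", "%m/%d/%Y %H:%M:%S"),
]

parsed = []
for s, fmt in examples:
    # strptime raises ValueError if s does not match fmt exactly
    d = datetime.strptime(s, fmt)
    parsed.append(d)
    print(d)
```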
2.6 Parsing pairs of strings as datetimes
Up until now, you’ve been working with a pre-processed list of datetimes for W20529’s trips. For this exercise, you’re going to go one step back in the data cleaning pipeline and work with the strings that the data started as.
Instruction
- Outside the `for` loop, fill out the `fmt` string with the correct parsing format for the data.
- Within the `for` loop, parse the `start` and `end` strings into the `trip` dictionary with `start` and `end` keys and `datetime` objects for values.
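A sketch of the parsing loop. The input name `onebike_datetime_strings` and its tuple layout are assumptions. The two sample rows use the timestamp layout shown in the CSV excerpt later in this post.

```python
from datetime import datetime

# Assumed input: a list of (start, end) string pairs
onebike_datetime_strings = [
    ("2017-10-01 15:23:25", "2017-10-01 15:26:26"),
    ("2017-10-01 15:42:57", "2017-10-01 17:49:59"),
]

fmt = "%Y-%m-%d %H:%M:%S"

onebike_datetimes = []
for start, end in onebike_datetime_strings:
    trip = {
        "start": datetime.strptime(start, fmt),
        "end": datetime.strptime(end, fmt),
    }
    onebike_datetimes.append(trip)

print(onebike_datetimes[0])
```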
2.7 Recreating ISO format with strftime()
In the last chapter, you used `strftime()` to create strings from `date` objects. Now that you know about `datetime` objects, let’s practice doing something similar.
Re-create the `.isoformat()` method, using `.strftime()`, and print the first trip start in our data set.
Instruction
- Complete `fmt` to match the format of ISO 8601.
- Print `first_start` with both `.isoformat()` and `.strftime()`; they should match.
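A sketch of the comparison, assuming `first_start` is the first start time from the CSV excerpt later in this post. The match holds for naive datetimes with no microseconds; otherwise `.isoformat()` appends extra fields.

```python
from datetime import datetime

# Assumed first trip start
first_start = datetime(2017, 10, 1, 15, 23, 25)

# ISO 8601 separates the date and time parts with a literal "T"
fmt = "%Y-%m-%dT%H:%M:%S"

iso_builtin = first_start.isoformat()
iso_manual = first_start.strftime(fmt)
print(iso_builtin)  # 2017-10-01T15:23:25
print(iso_manual)   # 2017-10-01T15:23:25
```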
2.8 Unix timestamps
Datetimes are sometimes stored as Unix timestamps: the number of seconds since midnight UTC on January 1, 1970. This is especially common with computer infrastructure, like the log files that websites keep when they get visitors.
Instruction
- Complete the `for` loop to loop over `timestamps`.
- Complete the code to turn each timestamp `ts` into a `datetime`.
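A sketch with hypothetical timestamp values. Calling `datetime.fromtimestamp(ts)` with no `tz` argument returns a naive datetime in the machine's local timezone, so the result varies between computers. The sketch passes `tz=timezone.utc` to keep the output reproducible, which is a deliberate departure from a plain `fromtimestamp(ts)` call.

```python
from datetime import datetime, timezone

# Hypothetical timestamps (seconds since the Unix epoch)
timestamps = [0, 86400, 1514764800]

dts = []
for ts in timestamps:
    # tz=timezone.utc yields an aware datetime in UTC
    dts.append(datetime.fromtimestamp(ts, tz=timezone.utc))

for d in dts:
    print(d.isoformat())
```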
2.9 Working with durations
2.10 Turning pairs of datetimes into durations
When working with timestamps, we often want to know how much time has elapsed between events. Thankfully, we can use `datetime` arithmetic to ask Python to do the heavy lifting for us so we don’t need to worry about day, month, or year boundaries. Let’s calculate the number of seconds that the bike was out of the dock for each trip.
Continuing our work from a previous coding exercise, the bike trip data has been loaded as the list `onebike_datetimes`.
Instruction
- Within the loop:
  - Use arithmetic on the `start` and `end` elements to find the length of the trip.
  - Save the results to `trip_duration`.
  - Calculate `trip_length_seconds` from `trip_duration`.
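A sketch of the duration loop on two sample trips taken from the CSV excerpt later in this post. Collecting the results in `onebike_durations` is an assumption based on the name used in the next exercise.

```python
from datetime import datetime

# Two sample trips standing in for the full onebike_datetimes list
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25), "end": datetime(2017, 10, 1, 15, 26, 26)},
    {"start": datetime(2017, 10, 1, 15, 42, 57), "end": datetime(2017, 10, 1, 17, 49, 59)},
]

onebike_durations = []
for trip in onebike_datetimes:
    # Subtracting datetimes gives a timedelta
    trip_duration = trip["end"] - trip["start"]
    # total_seconds() converts the whole timedelta to a float
    trip_length_seconds = trip_duration.total_seconds()
    onebike_durations.append(trip_length_seconds)

print(onebike_durations)  # [181.0, 7622.0]
```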
2.11 Average trip time
W20529 took 291 trips in our data set. How long were the trips on average? We can use the built-in Python functions `sum()` and `len()` to make this calculation.
Based on your last coding exercise, the data has been loaded as `onebike_durations`. Each entry is a number of seconds that the bike was out of the dock.
Instruction
- Calculate `total_elapsed_time` across all trips in `onebike_durations`.
- Calculate `number_of_trips` for `onebike_durations`.
- Divide `total_elapsed_time` by `number_of_trips` to get the average trip length.
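The averaging step, sketched on hypothetical durations chosen to give a round result.

```python
# Hypothetical durations in seconds (not the real 291 trips)
onebike_durations = [180.0, 7620.0, 300.0]

total_elapsed_time = sum(onebike_durations)
number_of_trips = len(onebike_durations)

average_trip_seconds = total_elapsed_time / number_of_trips
print(average_trip_seconds)  # 2700.0
```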
2.12 The long and the short of why time is hard
Out of 291 trips taken by W20529, how long was the longest? How short was the shortest? Does anything look fishy?
Instruction
- Calculate `shortest_trip` from `onebike_durations`.
- Calculate `longest_trip` from `onebike_durations`.
- Print the results, turning `shortest_trip` and `longest_trip` into strings so they can print.
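A sketch using hypothetical durations. One value is deliberately negative to illustrate the kind of "fishy" result worth looking for: a trip cannot really last less than zero seconds.

```python
# Hypothetical durations in seconds, including one suspicious negative value
onebike_durations = [181.0, 7622.0, -3346.0, 343.0]

shortest_trip = min(onebike_durations)
longest_trip = max(onebike_durations)

# str() is needed because a float cannot be concatenated to a string directly
print("The shortest trip was " + str(shortest_trip) + " seconds")
print("The longest trip was " + str(longest_trip) + " seconds")
```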
3. Time Zones and Daylight Saving
3.1 UTC offsets
3.2 Creating timezone aware datetime
In this exercise, you will practice setting timezones manually.
Instruction 1
- Import timezone.
- Set the tzinfo to UTC, without using timedelta.
Instruction 2
- Set `pst` to be a timezone set for UTC-8.
- Set `dt`'s timezone to be `pst`.
Instruction 3
- Set `tz` to be a timezone set for UTC+11.
- Set `dt`'s timezone to be `tz`.
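A sketch of all three instructions. The wall-clock time used for `dt` is an assumption, reusing the October 1 value from the earlier exercise.

```python
from datetime import datetime, timedelta, timezone

# UTC is available directly as timezone.utc, no timedelta needed
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=timezone.utc)
utc_iso = dt.isoformat()

# A fixed-offset zone 8 hours behind UTC
pst = timezone(timedelta(hours=-8))
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=pst)
pst_iso = dt.isoformat()

# A fixed-offset zone 11 hours ahead of UTC
tz = timezone(timedelta(hours=11))
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=tz)
plus11_iso = dt.isoformat()

print(utc_iso)     # 2017-10-01T15:26:26+00:00
print(pst_iso)     # 2017-10-01T15:26:26-08:00
print(plus11_iso)  # 2017-10-01T15:26:26+11:00
```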
3.3 Setting timezones
Now that you have the hang of setting timezones one at a time, let’s look at setting them for the first ten trips that W20529 took.
Instruction
- Create `edt`, a `timezone` object whose UTC offset is -4 hours.
- Within the `for` loop:
  - Set the `tzinfo` for `trip['start']`.
  - Set the `tzinfo` for `trip['end']`.
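A sketch on a two-trip sample list. `.replace(tzinfo=...)` only attaches the zone and leaves the clock reading unchanged.

```python
from datetime import datetime, timedelta, timezone

# Sample trips standing in for onebike_datetimes
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25), "end": datetime(2017, 10, 1, 15, 26, 26)},
    {"start": datetime(2017, 10, 1, 15, 42, 57), "end": datetime(2017, 10, 1, 17, 49, 59)},
]

# Eastern Daylight Time as a fixed offset of UTC-4
edt = timezone(timedelta(hours=-4))

# Slicing with [:10] is safe even when the list has fewer than ten trips
for trip in onebike_datetimes[:10]:
    trip["start"] = trip["start"].replace(tzinfo=edt)
    trip["end"] = trip["end"].replace(tzinfo=edt)

print(onebike_datetimes[0]["start"].isoformat())  # 2017-10-01T15:23:25-04:00
```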
3.4 What time did the bike leave in UTC?
Having set the timezone for the first ten rides that W20529 took, let’s see what time the bike left in UTC. We’ve already loaded the results of the previous exercise into memory.
Instruction
Within the `for` loop, move `dt` to be in UTC. Use `timezone.utc` as a convenient shortcut for UTC.
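A sketch of the conversion, rebuilding a small sample with a fixed UTC-4 offset so the block runs on its own. Unlike `.replace()`, `.astimezone()` shifts the clock reading so the result names the same instant.

```python
from datetime import datetime, timedelta, timezone

edt = timezone(timedelta(hours=-4))

# Sample trips already tagged with the UTC-4 offset
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25, tzinfo=edt)},
    {"start": datetime(2017, 10, 1, 15, 42, 57, tzinfo=edt)},
]

utc_starts = []
for trip in onebike_datetimes[:10]:
    dt = trip["start"]
    # Same instant, expressed in UTC
    dt = dt.astimezone(timezone.utc)
    utc_starts.append(dt)
    print("Original:", trip["start"], "| UTC:", dt.isoformat())
```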
3.5 Time zone database
3.6 Putting the bike trips into the right time zone
Instead of setting the timezones for W20529 by hand, let’s assign them to their IANA timezone: ‘America/New_York’. Since we know their political jurisdiction, we don’t need to look up their UTC offset. Python will do that for us.
Instruction
- Import `tz` from `dateutil`.
- Assign `et` to be the timezone `'America/New_York'`.
- Within the `for` loop, set `start` and `end` to have `et` as their timezone (use `.replace()`).
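A sketch using `dateutil` (a third-party package that must be installed). The December trip is hypothetical and is included only to show that the same zone object yields a different UTC offset once daylight saving has ended.

```python
from datetime import datetime
from dateutil import tz

# Sample trips; the second one is hypothetical
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25), "end": datetime(2017, 10, 1, 15, 26, 26)},
    {"start": datetime(2017, 12, 1, 9, 0, 0), "end": datetime(2017, 12, 1, 9, 10, 0)},
]

# Look up the zone in the IANA time zone database
et = tz.gettz("America/New_York")

for trip in onebike_datetimes:
    trip["start"] = trip["start"].replace(tzinfo=et)
    trip["end"] = trip["end"].replace(tzinfo=et)

print(onebike_datetimes[0]["start"].isoformat())  # 2017-10-01T15:23:25-04:00
print(onebike_datetimes[1]["start"].isoformat())  # 2017-12-01T09:00:00-05:00
```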
3.7 What time did the bike leave? (Global edition)
When you need to move a `datetime` from one timezone into another, use `.astimezone()` and `tz`. Often you will be moving things into UTC, but for fun let’s try moving things from ‘America/New_York’ into a few different time zones.
Instruction 1
- Set `uk` to be the timezone for the UK: ‘Europe/London’.
- Change `local` to be in the `uk` timezone and assign it to `notlocal`.
Instruction 2
- Set `ist` to be the timezone for India: ‘Asia/Kolkata’.
- Change `local` to be in the `ist` timezone and assign it to `notlocal`.
Instruction 3
- Set `sm` to be the timezone for Samoa: ‘Pacific/Apia’.
- Change `local` to be in the `sm` timezone and assign it to `notlocal`.
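A sketch of the three conversions with `dateutil`. The value of `local` is an assumption (the first trip start, placed in ‘America/New_York’), and the `converted` dictionary is an illustrative addition for collecting results.

```python
from datetime import datetime
from dateutil import tz

# Assumed starting point: first trip start in Eastern time
local = datetime(2017, 10, 1, 15, 23, 25, tzinfo=tz.gettz("America/New_York"))

uk = tz.gettz("Europe/London")
ist = tz.gettz("Asia/Kolkata")
sm = tz.gettz("Pacific/Apia")

converted = {}
for name, zone in [("uk", uk), ("ist", ist), ("sm", sm)]:
    # astimezone() keeps the instant and changes only its representation
    notlocal = local.astimezone(zone)
    converted[name] = notlocal
    print(name, notlocal.isoformat())
```

All three results compare equal to `local`, because aware datetimes are compared by the instant they represent, not by their clock reading.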
3.8 Starting daylight saving time
3.9 How many hours elapsed around daylight saving?
Since our bike data takes place in the fall, you’ll have to do something else to learn about the start of daylight saving time.
Let’s look at March 12, 2017, in the Eastern United States, when Daylight Saving kicked in at 2 AM.
If you create a `datetime` for midnight that night, and add 6 hours to it, how much time will have elapsed?
Instruction 1
- You already have a `datetime` called `start`, set for March 12, 2017 at midnight, set to the timezone ‘America/New_York’.
- Add six hours to `start` and assign it to `end`. Look at the UTC offset for the two results.
Instruction 2
- You added 6 hours, and got 6 AM, despite the fact that the clocks springing forward means only 5 hours would have actually elapsed!
- Calculate the time between `start` and `end`. How much time does Python think has elapsed?
Instruction 3
Move your `datetime` objects into UTC and calculate the elapsed time again.
Once you’re in UTC, what result do you get?
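A sketch of the whole experiment with `dateutil`. The names `wall_clock_seconds` and `utc_seconds` are illustrative. When two datetimes share the same `tzinfo` object, Python subtracts their clock readings and ignores the offset change, which is why the first result is six hours.

```python
from datetime import datetime, timedelta, timezone
from dateutil import tz

start = datetime(2017, 3, 12, tzinfo=tz.gettz("America/New_York"))
end = start + timedelta(hours=6)

print(start.isoformat())  # 2017-03-12T00:00:00-05:00
print(end.isoformat())    # 2017-03-12T06:00:00-04:00

# Same tzinfo on both sides: plain wall-clock subtraction
wall_clock_seconds = (end - start).total_seconds()
print(wall_clock_seconds)  # 21600.0, i.e. 6 hours

# In UTC the skipped hour is accounted for
utc_seconds = (
    end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
).total_seconds()
print(utc_seconds)  # 18000.0, i.e. 5 hours
```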
3.10 March 29, throughout a decade
Daylight Saving rules are complicated: they’re different in different places, they change over time, and they usually start on a Sunday (and so they move around the calendar).
For example, in the United Kingdom, as of the time this lesson was written, Daylight Saving begins on the last Sunday in March. Let’s look at the UTC offset for March 29, at midnight, for the years 2000 to 2010.
Instruction
- Using `tz`, set the timezone for `dt` to be `'Europe/London'`.
- Within the `for` loop:
  - Use the `.replace()` method to change the year for `dt` to be `y`.
  - Call `.isoformat()` on the result to observe the results.
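A sketch of the loop with `dateutil`. The `offsets` dictionary is an illustrative addition. Years in which the last Sunday of March falls before the 29th show `+01:00` at midnight; the others show `+00:00`.

```python
from datetime import datetime
from dateutil import tz

# Midnight on March 29, 2000, in UK time
dt = datetime(2000, 3, 29, tzinfo=tz.gettz("Europe/London"))

offsets = {}
for y in range(2000, 2011):
    # The UTC offset is recomputed for each year's date
    offsets[y] = dt.replace(year=y).isoformat()
    print(offsets[y])
```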
3.11 Ending daylight saving time
3.12 Finding ambiguous datetimes
At the end of lesson 2, we saw something anomalous in our bike trip duration data. Let’s see if we can identify what the problem might be.
Instruction
- Loop over the trips in `onebike_datetimes`:
  - Print any rides whose start is ambiguous.
  - Print any rides whose end is ambiguous.
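A sketch using `tz.datetime_ambiguous()` from `dateutil`. The second trip is a hypothetical example placed in the repeated 1 AM hour of November 5, 2017, when US clocks fell back. The `flags` list is an illustrative addition.

```python
from datetime import datetime
from dateutil import tz

eastern = tz.gettz("America/New_York")

# Sample trips; the second is a hypothetical ride during the fall-back hour
onebike_datetimes = [
    {"start": datetime(2017, 10, 1, 15, 23, 25, tzinfo=eastern),
     "end": datetime(2017, 10, 1, 15, 26, 26, tzinfo=eastern)},
    {"start": datetime(2017, 11, 5, 1, 56, 50, tzinfo=eastern),
     "end": datetime(2017, 11, 5, 1, 1, 4, tzinfo=eastern)},
]

flags = []
for trip in onebike_datetimes:
    start_ambiguous = tz.datetime_ambiguous(trip["start"])
    end_ambiguous = tz.datetime_ambiguous(trip["end"])
    flags.append((start_ambiguous, end_ambiguous))
    if start_ambiguous:
        print("Ambiguous start at " + str(trip["start"]))
    if end_ambiguous:
        print("Ambiguous end at " + str(trip["end"]))
```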
3.13 Cleaning daylight saving data with fold
As we’ve just discovered, there is a ride in our data set which is being messed up by a Daylight Saving shift. Let’s clean up the data set so we actually have a correct minimum ride length. We can use the fact that we know the end of the ride happened after the beginning to fix up the duration messed up by the shift out of Daylight Saving.
Since Python ignores the fold set by `tz.enfold()` when doing arithmetic on datetimes in the same timezone, we must put our datetime objects into UTC, where ambiguities have been resolved.
Instruction
- Complete the `if` statement to be true only when a ride’s `start` comes after its `end`.
- When `start` is after `end`, call `tz.enfold()` on the `end` so you know it refers to the one after the daylight saving time change.
- After the `if` statement, convert the start and end to UTC so you can make a proper comparison.
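A sketch of the fix on a single hypothetical ride that straddles the fall-back hour of November 5, 2017. The names `naive_seconds` and `fixed_seconds` are illustrative.

```python
from datetime import datetime, timezone
from dateutil import tz

eastern = tz.gettz("America/New_York")

# Hypothetical ride: starts at 1:56 AM EDT, ends at 1:01 AM EST (after fall-back)
start = datetime(2017, 11, 5, 1, 56, 50, tzinfo=eastern)
end = datetime(2017, 11, 5, 1, 1, 4, tzinfo=eastern)

# Wall-clock arithmetic makes the ride look negative
naive_seconds = (end - start).total_seconds()
print(naive_seconds)  # -3346.0

# A start after its end means the end belongs to the second pass through 1 AM
if start > end:
    end = tz.enfold(end)

# Converting to UTC resolves the ambiguity before subtracting
start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)
fixed_seconds = (end_utc - start_utc).total_seconds()
print(fixed_seconds)  # 254.0
```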
4. Easy and Powerful: Dates and Times in Pandas
4.1 Reading date and time data in Pandas
4.2 Loading a csv file in Pandas
The `capital_onebike.csv` file covers the October, November and December rides of the Capital Bikeshare bike W20529.
Here are the first two columns:
Start date | End date | ··· |
---|---|---|
2017-10-01 15:23:25 | 2017-10-01 15:26:26 | ··· |
2017-10-01 15:42:57 | 2017-10-01 17:49:59 | ··· |
Instruction
- Import Pandas.
- Complete the call to `read_csv()` so that it correctly parses the date columns `Start date` and `End date`.
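A sketch of the `parse_dates` argument. Since `capital_onebike.csv` is not available here, the two rows shown above are fed in through an in-memory buffer; with the real file you would pass its path instead.

```python
import io

import pandas as pd

# In-memory stand-in for capital_onebike.csv (only the two columns shown above)
csv_text = """Start date,End date
2017-10-01 15:23:25,2017-10-01 15:26:26
2017-10-01 15:42:57,2017-10-01 17:49:59
"""

# parse_dates lists the columns to convert from strings to datetimes
rides = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start date", "End date"])

print(rides.dtypes)
```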
4.3 Making timedelta columns
Earlier in this course, you wrote a loop to subtract `datetime` objects and determine how long our sample bike had been out of the docks. Now you’ll do the same thing with Pandas.
Instruction
- Subtract the `Start date` column from the `End date` column to get a Series of timedeltas; assign the result to `ride_durations`.
- Convert `ride_durations` into seconds and assign the result to the `'Duration'` column of `rides`.
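A sketch on a two-row DataFrame built from the rows shown in the CSV excerpt above.

```python
import pandas as pd

# Two-row stand-in for the rides DataFrame
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-10-01 15:42:57"]),
    "End date": pd.to_datetime(["2017-10-01 15:26:26", "2017-10-01 17:49:59"]),
})

# Column-wise subtraction yields a Series of timedeltas
ride_durations = rides["End date"] - rides["Start date"]

# .dt.total_seconds() converts each timedelta to a float number of seconds
rides["Duration"] = ride_durations.dt.total_seconds()

print(rides["Duration"].tolist())  # [181.0, 7622.0]
```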
4.4 Summarizing datetime data in Pandas
4.5 How many joyrides?
Suppose you have a theory that some people take long bike rides before putting their bike back in the same dock. Let’s call these rides “joyrides”.
You only have data on one bike, so while you can’t draw any bigger conclusions, it’s certainly worth a look.
Are there many joyrides? How long were they in our data set? Use the median instead of the mean, because we know there are some very long trips in our data set that might skew the answer, and the median is less sensitive to outliers.
Instruction
- Create a Pandas Series which is `True` when `Start station` and `End station` are the same, and assign the result to `joyrides`.
- Calculate the median duration of all rides.
- Calculate the median duration of `joyrides`.
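A sketch of the boolean mask and the two medians. The station names and durations are hypothetical values invented for the example.

```python
import pandas as pd

# Hypothetical rides; station names and durations are made up
rides = pd.DataFrame({
    "Start station": ["Station A", "Station B", "Station C", "Station A"],
    "End station": ["Station A", "Station C", "Station C", "Station B"],
    "Duration": [2600.0, 400.0, 3000.0, 700.0],
})

# True where the ride ended at the dock it started from
joyrides = rides["Start station"] == rides["End station"]

median_all = rides["Duration"].median()
median_joyrides = rides[joyrides]["Duration"].median()

print(f"{joyrides.sum()} rides were joyrides")
print(f"The median duration overall was {median_all:.2f} seconds")
print(f"The median duration for joyrides was {median_joyrides:.2f} seconds")
```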
4.6 It’s getting cold outside, W20529
Washington, D.C. has mild weather overall, but the average high temperature in October (68ºF / 20ºC) is certainly higher than the average high temperature in December (47ºF / 8ºC). People also travel more in December, and they work fewer days so they commute less.
How might the weather or the season have affected the length of bike trips?
Instruction 1
- Resample `rides` to the daily level, based on the `Start date` column.
- Plot the `.size()` of each result.
Instruction 2
Since the daily time series is so noisy for this one bike, change the resampling to be monthly.
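A sketch of both resampling levels on a hypothetical four-ride sample. Two choices here are assumptions: the plotting call is left out so the block runs without matplotlib, and the monthly frequency uses the month-start alias `"MS"` because the month-end alias is spelled differently across pandas versions.

```python
import pandas as pd

# Hypothetical sample; only the first two start times come from the CSV excerpt
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25", "2017-10-01 15:42:57",
        "2017-10-03 09:00:00", "2017-11-15 18:30:00",
    ]),
    "Duration": [181.0, 7622.0, 300.0, 420.0],
})

# Number of rides per calendar day (days with no rides count as 0)
daily = rides.resample("D", on="Start date").size()

# Number of rides per calendar month, labelled by the first day of the month
monthly = rides.resample("MS", on="Start date").size()

print(daily.head())
print(monthly)
```

With matplotlib installed, calling `.plot()` on either result would draw the corresponding time series.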
4.7 Members vs casual riders over time
Riders can either be “Members”, meaning they pay yearly for the ability to take a bike at any time, or “Casual”, meaning they pay at the kiosk attached to the bike dock.
Do members and casual riders drop off at the same rate over October to December, or does one drop off faster than the other?
Instruction
- Set `monthly_rides` to be a resampled version of `rides`, by month, based on start date.
- Use the method `.value_counts()` to find out how many Member and Casual rides there were, and divide them by the total number of rides per month.
4.8 Combining groupby() and resample()
A very powerful method in Pandas is `.groupby()`. Whereas `.resample()` groups rows by some time or date information, `.groupby()` groups rows based on the values in one or more columns. For example, `rides.groupby('Member type').size()` would tell us how many rides there were by member type in our entire DataFrame.
`.resample()` can be called after `.groupby()`. For example, how long was the median ride by month, and by Membership type?
Instruction
- Complete the `.groupby()` call to group by ‘Member type’, and the `.resample()` call to resample according to ‘Start date’, by month.
- Print the median `Duration` for each group.
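A sketch of chaining `.groupby()` and `.resample()` on hypothetical data. The month-start alias `"MS"` is an assumption chosen because it is spelled the same across pandas versions; the exercise itself only says "by month".

```python
import pandas as pd

# Hypothetical rides with member types and durations in seconds
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25", "2017-10-05 08:00:00",
        "2017-10-09 12:00:00", "2017-11-02 09:30:00",
    ]),
    "Member type": ["Member", "Casual", "Member", "Member"],
    "Duration": [181.0, 7622.0, 300.0, 420.0],
})

# Group by member type, then bucket each group by month of the start date
grouped = rides.groupby("Member type").resample("MS", on="Start date")

# Median duration per (member type, month) pair
medians = grouped["Duration"].median()
print(medians)
```

The result is a Series indexed by both member type and month, so a single value can be looked up with a `(member type, month)` tuple.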
4.9 Additional datetime methods in Pandas
4.10 Timezones in Pandas
Earlier in this course, you assigned a timezone to each `datetime` in a list. Now with Pandas you can do that with a single method call.
(Note that, just as before, your data set actually includes some ambiguous datetimes on account of daylight saving; for now, we’ll tell Pandas to not even try on those ones. Figuring them out would require more work.)
Instruction 1
Make the `Start date` column timezone aware by localizing it to `'America/New_York'` while ignoring any ambiguous datetimes.
Instruction 2
Now switch the `Start date` column to the timezone `'Europe/London'` using the `.dt.tz_convert()` method.
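A sketch of both steps on two start times. The second is a hypothetical value inside the repeated hour on November 5, 2017, to show what `ambiguous='NaT'` does. The name `london` is an illustrative choice.

```python
import pandas as pd

# First value from the CSV excerpt; second is a hypothetical ambiguous time
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-11-05 01:30:00"]),
})

# Attach the zone; ambiguous clock readings become NaT instead of raising
rides["Start date"] = rides["Start date"].dt.tz_localize(
    "America/New_York", ambiguous="NaT"
)

# Express the same instants in UK time
london = rides["Start date"].dt.tz_convert("Europe/London")

print(rides["Start date"].iloc[0])  # 2017-10-01 15:23:25-04:00
print(london.iloc[0])               # 2017-10-01 20:23:25+01:00
```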
4.11 How long per weekday?
Pandas has a number of datetime-related attributes within the `.dt` accessor. Many of them are ones you’ve encountered before, like `.dt.month`. Others are convenient and save time compared to standard Python, like `.dt.day_name()` (named `.dt.weekday_name` in pandas versions before 1.0).
Instruction
- Add a new column to `rides` called `'Ride start weekday'`, which is the weekday of the `Start date`.
- Print the median ride duration for each weekday.
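A sketch on a hypothetical three-ride sample. It uses `.dt.day_name()`, since the older `.dt.weekday_name` attribute is no longer available in recent pandas releases.

```python
import pandas as pd

# Hypothetical sample; October 1, 2017 was a Sunday
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25", "2017-10-01 15:42:57", "2017-10-02 06:37:10",
    ]),
    "Duration": [181.0, 7622.0, 343.0],
})

# Weekday name of each ride's start
rides["Ride start weekday"] = rides["Start date"].dt.day_name()

# Median duration per weekday
weekday_medians = rides.groupby("Ride start weekday")["Duration"].median()
print(weekday_medians)
```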
4.12 How long between rides?
For your final exercise, let’s take advantage of Pandas indexing to do something interesting. How much time elapsed between rides?
Instruction
- Calculate the difference in the `Start date` of the current row and the `End date` of the previous row and assign it to `rides['Time since']`.
- Convert `rides['Time since']` to seconds to make it easier to work with.
- Resample `rides` to be in monthly buckets according to the `Start date`.
- Divide the average by (60*60) to get the number of hours on average that W20529 waited in the dock before being picked up again.
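A sketch of the four steps. The first two rows come from the CSV excerpt shown earlier and the third is hypothetical. As an assumption, the monthly buckets use the month-start alias `"MS"`, which is spelled the same across pandas versions.

```python
import pandas as pd

# First two rows from the CSV excerpt; the third row is hypothetical
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25", "2017-10-01 15:42:57", "2017-10-01 18:49:59",
    ]),
    "End date": pd.to_datetime([
        "2017-10-01 15:26:26", "2017-10-01 17:49:59", "2017-10-01 19:00:00",
    ]),
})

# shift(1) moves each End date down one row, aligning it with the next ride
rides["Time since"] = rides["Start date"] - rides["End date"].shift(1)

# Convert the timedeltas to seconds (the first row has no previous ride: NaN)
rides["Time since"] = rides["Time since"].dt.total_seconds()

# Average idle time per month, converted from seconds to hours
monthly = rides.resample("MS", on="Start date")
hours_in_dock = monthly["Time since"].mean() / (60 * 60)
print(hours_in_dock)
```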