Flying over the usa - janeshdev/ggplot2 GitHub Wiki
The data is part of the ASA Data Expo 2009. One aspect we are interested in, is the daily pattern of flights across the US, as well as how this pattern is affected by extraordinary events, such as snow storms or hurricanes. In particular, we had a closer look at the following three dates.
cv service is great only in case you uncover your career plans and relevant data.
- January 19, 2006 – Just another day over the USA (data)
- March 13, 1993 – The White Hurricane (data)
- Attacks of Sep 11 (data)
The movies are strung together from a sequence of ggplot2 stills taken at 2 min intervals throughout the day.
is part of the ASA Data Expo ’09, but has been processed and supplemented with additional data. The functions used will assume the following variables:
- Orig, Dest – the airports of origin and destination for each flight
- DepTime, ArrTime – actual departure and arrival time
- CRSDepTime, CRSArrTime – scheduled arrival and departure time (CRS is the Computer Reservation System)
- DepDelay, ArrDelay – the departure and arrival delays based on scheduled arrival and departure times
- Cancelled – a binary variable on whether a flight was cancelled (0 = no, 1 = yes)
All flying times are reported in a format of HHMM (double digit hours and minutes), e.g. 1802 would correspond to 6:02 pm. Times are reported in local time of the airport, but for our purpose, we have transformed times to Eastern Standard Time (EST).
Additionally, we also make use of the data set airports which contains GPS coordinates of each airport and data on annual flight volume:
# airport information > names(airports) [1] "iata" "state" "airport" "city" "country" [6] "latitude" "longitude" "timezone" "dst" "Volume"
For a single time point, we subset the data to only regard those flights currently in the air, i.e. flights which depart before time point t and arrive after time t, as well as cancelled flights that should have taken off at this time. To give a better rendering, we regard not only a time point, but an interval as well (usually set to 2 minutes).
# get all flights in the air and all flights that have been cancelled inFlight <- function(df, tp, interval) { # df is the data set to subset # tp is the time point startHour <- tp endHour <- tp + interval if (endHour %% 100 >= 60) { endHour <- (endHour - 60) + 100 } tm <- subset(df, ((DepTime < endHour) & (ArrTime > startHour)) | ((CRSDepTime %in% (startHour:endHour))) & (Cancelled == 1))) return(tm) }
Depending on the time of day, this results in a few thousand flights on a regular day. For each of these records we want to get the approximate location of a plane between its originating airport and its destination. We do this by interpolating the GPS coordinates of origin and destination. In order to avoid overplotting, flight routes are not rendered as straight lines but as arcs with random curvature (resulting in plots very similar to what jittering would do) – with a trick, though. By specifically setting the seed of the random number generator used (here, we use DepTime*ArrTime as seed, which gives a unique number for each flight), we can control the rendering of a flight and are therefore able to keep track of a flight across multiple time points.
# flightTrack returns approximate latitude and longitude of a plane flightTrack <- function(fromXY, toXY, ratio, seed) { # from XY and toXY are GPS coordinates of origin and destination airports # ratio is a number between 0 and 1, indicating how much of the distance # the plane has travelled, with 0 indicating the origin and 1 indicating # the destination # seed is the seed used in the random number generator - here we use # ArrTime*DepTime to uniquely identify each flight rand <- sapply(seed, function(x) { set.seed(x) return(runif(1,-.5,.5)) }) dir <- toXY-fromXY orth <- rev(dir) orth[,1] <- orth[,1]*(-1) location <- fromXY+ratio*dir+(1-ratio)*ratio*orth*rand return(location) } # compute time in air, extract GPS coordinates for airports getAircraftLocation <- function(df) { # helper function: get coordinates for airport airport.location <- function(iata) { idx <- sapply(iata, function(x) return(which(airports$iata %in% x))) x <- airports[idx,7:6] return (x) } # get GPS coordinates of airports origXY <- airport.location(df$Origin) destXY <- airport.location(df$Dest) # compute air time based on departure and arrival times airtime <- with(df, (ArrTime %% 100 - DepTime %% 100) + (ArrTime%/%100 - DepTime%/%100)*60) # compute the ratio flown, adjust for possible data errors flown <- with(df, (time %% 100 - DepTime %% 100) + (time%/%100 - DepTime%/%100)*60) flown[flown < 0] <- 0 ratio <- flown/airtime ratio[is.na(ratio)] <- 0 ratio[ratio > 1] <- 1 # render flights on straight line # return(origXY+ratio*(destXY-origXY)) # render flights on arcs with random curvature return(flightTrack(origXY, destXY, ratio, df$DepTime*df$ArrTime)) }
To put all flights in a spatial context, we use a map of the USA as background. Since we are dealing with airports, these are shown as dots on the map (very small dots), but only if they have a volume of at least 100 commercial flights a year. In order to visualize the time of day, an additional slider display is put underneath the map. A marker will be put on this slider when we show flight data for a specific time.
# setting up a map for the US and specify plotting parameters library(ggplot2) library(maps) states <- map_data(states) # a set of personal choices for the map display map.opts <- opts(panel.grid.minor=theme_blank(), panel.grid.major=theme_blank(), panel.background=theme_blank(), axis.title.x=theme_blank(), axis.title.y=theme_blank(), axis.line=theme_blank(), axis.ticks=theme_blank(), axis.text.y = theme_text(colour="#FFFFFF"), axis.text.x = theme_text(colour = "#FFFFFF")) (usamap <- qplot(x, y, data = states, geom = "polygon", fill = I("grey85"), colour = I("white"), xlim = c(-130,-60), ylim = c(20,50)) + map.opts + geom_point(aes(x = longitude, y = latitude), size = 0.7, colour = "grey65", data = subset(airports, Volume > 100)) + geom_line(aes(x = x, y = y, group = id), data = slider, colour = "grey55", size = 0.25) + geom_line(aes(x = x, y = y, group = id), data = ticks, colour = "grey55", size = 0.25) + annotate("text", x = ticks$x[1], y = 22.8, label = c("Jan 18 2006"), colour = "grey40", size = 3, hjust = 0.25, vjust = 0) + annotate("text", x = ticks$x[nrow(ticks)-2], y = 22.8, label = c("Jan 19 2006"), colour = "grey40", size = 3, hjust = 0.5, vjust = 0) + annotate("text", x = ticks$x[idx], y = ticks$y[idx]-1, label = c("3am EST", "6am", "9am", "12pm", "3pm", "6pm", "9pm", "12am EST", "3am"), colour = "grey40", size = 3, hjust = 0.5, vjust = 0) )
Map of the USA. The time slider underneath the map is set to one day – in this case January 19th, 2006. Airports with a volume of at least 100 annual flights are rendered as (very) faint grey dots.
Since one of our goals with this data set was to be able to render a whole day of flights as a sequence of plots, we have to keep track of the time component in each individual plot.
We do this twofold: first, we use a time slider under the map, on which we add a single point to mark the position of a plot in its sequence during the day. Second, we plot clocks at either side of the map, one showing Pacific time, the other Eastern time. Plotting the clock is done by using three sets of lines in a graphic with polar coordinates. The first two lines come out from the center of the plot and represent the hand. The third line is a circle around, completing the clock’s face:
plotFace <- function(time) { # time is given in the format HHMM ho <- time %/% 100 ho <- ho %% 12 # ensure hour between 0 and 12 mi <- time %% 100 # minute and hour hands face <- data.frame(cbind(x = c(rep((ho+mi/60)/12,2), rep(mi/60, 2)), y = c(0,0.5, 0,0.9))) face$id <- c("hour", "hour", "min", "min") # clock frame around clock <- data.frame(cbind(x = seq(0,1,length = 100), y = rep(1,100))) clock$id <- "clock" face <- rbind(face,clock) q <- qplot(x,y, geom = "line", colour = I("grey65"), group = id, data = face, xlim = c(0,1)) + coord_polar() + scale_x_continuous(breaks = seq(0,1, length = 5), labels = "") + labs(x = NULL, y = NULL) + opts(plot.margin = unit(rep(0,4), "lines"), panel.background = theme_blank(), panel.grid.minor = theme_blank(), axis.title.x = theme_blank(), axis.title.y = theme_blank(), axis.line = theme_blank(), axis.ticks = theme_blank(), axis.text.y = theme_text(colour = "#FFFFFF", lineheight = -10, size = 0), axis.text.x = theme_text(colour = "#FFFFFF", lineheight = -10, size = 0)) return(q) }
Flights are shown as dots on the map with increasing point sizes for increasing amounts of delays. To get a more robust rendering, delays of more than 4 hours are right censored (i.e. clipped to 4 h). Two additional viewports are used for displaying the clock faces.
plotMap <- function(res, time) { require(ggplot2) cancel <- subset(res, Cancelled == 1) res <- subset(res, Cancelled == 0) q <- usamap + geom_point(aes(x = x, y = y), data = data.frame(cbind(x = getTSliderX(time), y = 22)), color = "grey30") if (nrow(cancel) > 0) { cancelxy <- airport.location(cancel$Origin) q <- q + geom_jitter(aes(x = longitude, y = latitude), size = 2, colour = I(alpha("red",5/10)), data = cancelxy) } if(nrow(res) > 0) { res$time <- time loc <- getAircraftLocation(res[,c("Origin", "Dest", "DepTime", "ArrTime", "time")]) loc$delay <- with(res, pmax(ArrDelay,0)) loc$delay <- with(res, pmin(ArrDelay,300)) q <- q + geom_point(aes(x = longitude, y = latitude, size = delay), colour = I(alpha("black", 7/10)), data = loc) + scale_size(name = "Arrival\nDelays", breaks = c(15, 60, 120, 240), labels = c("15 min", "1h", "2h", "4h"), limits = c(0,300)) } print(q) vp1 <- viewport(x = 0.05, y = 0.25, height = unit(2.5, "cm"), width = unit(2.5, "cm"), just = c("left","bottom")) vp2 <- viewport(x = 0.8, y = 0.25, height = unit(2.5, "cm"), width = unit(2.5, "cm"), just = c("right","bottom")) # show Pacific time print(plotFace(time-300), vp = vp1) # show Eastern time print(plotFace(time), vp = vp2) }
A map for a single time point can now be created with the command:
plotMap(inFlight(jan19, 1830, 2), 1830) # map for 6:30 pm on January 19
In order to get an idea about the regular flight pattern of planes over the US, we picked different days and rendered them as movies. January 19th is a typical day among those.
At 1 am PST red-eye flights are on their way from the West to airports on the East Coast. Regular flights start as early as 6 am local time. By 7 am EST, a first wave of planes has taken off, soon to be followed by flights from airports on Central Time. The red dots in a few airports along the East Coast indicate flight cancellations.
By 12 pm the sky is bustling with planes. Denver seems to have trouble with delayed flights on that particular day. Generally, delays build during the course of a day – a lot more flights show delays on the map on the right, showing a snapshot of the situation at 6 pm EST. Both maps show the absence of activity over the sparser populated states of Montana and the Dakotas.
At midnight Eastern Time the second time zone wave starts, when airports go to sleep along time zones. By midnight EST regular activity along the airports on the East Coast has ceased. Chicago is sending out its delayed planes at 2 am EST.
# get the above maps and save as pngs png("jan19-cancel-%3d.png", height=300, width=427) for (i in c(400, 730, 1500, 1800, 2400, 2600)) { res <- inFlight(jan19, i, 2) plotMap(res, i) } dev.off()
The snowstorm of March 13, 1993 is also dubbed the “Snowstorm of the Century”, the “No-Name Hurricane” or the “White Hurricane” – on its path across 26 States it dumped more than 3 ft of snow in some areas between Texas, Pennsylvania, and Nova Scotia. Not surprisingly, this did affect airports, and led to a lot of closings and flight cancellations.
At 6:30 am EST the usual flight pattern is already severely disrupted. Instead of a bustling sky, most planes are grounded and flights cancelled (red dots). Those flights that do take off, are not heavily delayed – most of them land within 60 min of their scheduled arrival. By noon EST, delays have increased dramatically – an area of low flight volume and multiple cancellations reaches from Alabama to the Northeast.
By early evening the Northeast has completely shut down. Flights neither leave nor come in. There are heavy flight delays across the US, particularly in the Chicago area and the Southeast. Most flights arrive with delays of over 2h. At 1 am the situation at airports is cleared enough that planes can take off – below a map of the situation at 2 am EST.
# get the above maps and save as pngs png("mar13-%1d.png", height=300, width=427) for (i in c(630, 1200, 1800, 2600)) { # get all flights that are in the air at time i res <- inFlight(mar13, i, 2) plotMap(res, i) } dev.off()
In the morning of September 11, 2001 terrorists hijacked four planes of American Airlines. Two of the airplanes hit the World Trade Center at 8:46 and 9:03 Eastern time. The third flight crashed into the Pentagon at 9:37 am. The terrorists were overcome by passengers of the fourth flight, but the hijacking ended tragically with the plane crashing into a field near Shanksville, Pennsylvania at 10:03 am.
Immediately after the attacks, planes in the air were grounded and flights across the country cancelled. Normal flights did not resume until days afterwards.
The maps below show snapshots of the day as the events unfold.
# get the above maps and save as pngs png("sep11-cancel-%1d.png", height=300, width=427) for (i in c(845, 900, 1030, 1200)) { res <- getMinutes(sep11, i, 2) plotMap(res, i) } dev.off()