Exploring Flight Delay Causes

- Course Project

Important note: The final plots are shown in the grid below, and the code for this project is in my GitHub repository. Because GitHub does not render Plotly visualizations directly, please visit this link (also provided in the repository) to view the Jupyter notebook with the code and images rendered. The notebook gives a better idea of the storytelling process: what my initial plots were and how I refined them into the final plots displayed below.

All plots were made using Plotly. To hover over the plots and view details, please visit the second link provided above.

This was the final project for the Visualization for Data Journalism course offered by the University of Illinois on Coursera.

Dataset: Airline Delay Causes. This data set consists of on-time statistics for airlines in the United States and is freely available from the US Department of Transportation: http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp.

The data and column labels are explained in the "Project Brief_Using_plotly_for_data_visualization" PDF document provided in the repository.

For this assignment, I downloaded and used the entries from February 2009 to February 2019.
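Restricting records to that window can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: it assumes each record carries integer `year` and `month` fields, and both the helper `in_study_window` and the sample rows are hypothetical.

```python
def in_study_window(year, month, start=(2009, 2), end=(2019, 2)):
    """Return True if (year, month) falls inside the inclusive study window."""
    # Tuples compare element by element, so (year, month) pairs sort chronologically.
    return start <= (year, month) <= end


# Hypothetical sample rows; the real file's column names are an assumption here.
rows = [
    {"year": 2009, "month": 1},
    {"year": 2009, "month": 2},
    {"year": 2014, "month": 7},
    {"year": 2019, "month": 2},
    {"year": 2019, "month": 3},
]

# Keep only the rows from February 2009 through February 2019.
kept = [r for r in rows if in_study_window(r["year"], r["month"])]
```

Comparing `(year, month)` tuples avoids building date objects for data that is only reported at monthly resolution.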

Delays can be categorized into:

  • Air carrier, where the cause was within the airline's control (e.g., maintenance or crew problems, aircraft cleaning, baggage loading, fueling, etc.).

  • Extreme weather such as tornadoes, blizzards, or hurricanes.

  • National Aviation System (NAS), which refers to a broad set of conditions, such as non-extreme weather conditions, airport operations, heavy traffic volume, and air traffic control.

  • Late-arriving aircraft, where a previous flight with the same aircraft arrived late, causing the present flight to depart late.

  • Security, caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment, and/or long lines in excess of 29 minutes at screening areas.
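As a rough sketch of how these five categories can be compared, the snippet below totals delay minutes per cause and converts them to shares, which is the kind of figure a pie chart of delay reasons would show. The column names (`carrier_delay`, `weather_delay`, `nas_delay`, `late_aircraft_delay`, `security_delay`) are assumptions about the downloaded file, and the sample values are made up for illustration.

```python
# Assumed mapping from delay category to the column holding its delay minutes.
CAUSE_COLUMNS = {
    "Air carrier": "carrier_delay",
    "Extreme weather": "weather_delay",
    "NAS": "nas_delay",
    "Late-arriving aircraft": "late_aircraft_delay",
    "Security": "security_delay",
}


def cause_shares(rows):
    """Return each cause's fraction of the total delay minutes in rows."""
    totals = {label: 0.0 for label in CAUSE_COLUMNS}
    for row in rows:
        for label, column in CAUSE_COLUMNS.items():
            # Treat missing or empty cells as zero minutes.
            totals[label] += float(row.get(column) or 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label in totals}
    return {label: minutes / grand_total for label, minutes in totals.items()}


# Made-up sample rows, for illustration only.
sample_rows = [
    {"carrier_delay": 300, "weather_delay": 50, "nas_delay": 200,
     "late_aircraft_delay": 400, "security_delay": 50},
    {"carrier_delay": 200, "weather_delay": 0, "nas_delay": 300,
     "late_aircraft_delay": 500, "security_delay": 0},
]

shares = cause_shares(sample_rows)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```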

This was an open-ended assignment that required me to explore and program on my own. As the images show, I visualized the data guided by the following questions:

  • On which airline should you fly to avoid significant delays?

  • In which months should you fly to avoid significant delays?

  • Are there any relationships between seasons and flight delays?

  • Which airport region has the highest number of delays?

  • What are the major reasons for the delays?
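Most of these questions reduce to grouping the records by one field (airline, month, airport) and comparing the share of delayed flights in each group. The sketch below shows one way to do that in plain Python. It assumes the columns `arr_flights` (arriving flights) and `arr_del15` (arrivals delayed by 15 minutes or more); the helper `delay_rate_by` and the sample values are hypothetical.

```python
from collections import defaultdict


def delay_rate_by(rows, key):
    """Return the fraction of delayed arrivals for each distinct value of key."""
    flights = defaultdict(float)
    delayed = defaultdict(float)
    for row in rows:
        # Treat missing or empty cells as zero flights.
        flights[row[key]] += float(row["arr_flights"] or 0)
        delayed[row[key]] += float(row["arr_del15"] or 0)
    # Skip groups with no flights to avoid dividing by zero.
    return {g: delayed[g] / flights[g] for g in flights if flights[g] > 0}


# Made-up sample rows, for illustration only.
sample_flights = [
    {"carrier": "AA", "month": 1, "arr_flights": 100, "arr_del15": 20},
    {"carrier": "AA", "month": 7, "arr_flights": 100, "arr_del15": 30},
    {"carrier": "DL", "month": 1, "arr_flights": 200, "arr_del15": 30},
    {"carrier": "DL", "month": 7, "arr_flights": 200, "arr_del15": 50},
]

by_airline = delay_rate_by(sample_flights, "carrier")
by_month = delay_rate_by(sample_flights, "month")
```

Using a rate rather than a raw count keeps large airlines and busy months from looking worse simply because they handle more flights.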

As the final images show, I used bar graphs, maps, pie charts, scatter plots, and other chart types to visualize the data and answer the questions above.