Data Study to Investigate Airport Connectedness

Posted on Feb 16, 2016
Contributed by Sricharan Maddineni. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from January 11th to April 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).

Why Are Airports Important?

Aviation infrastructure has been a bedrock of the United States economy and culture for many decades, and it was the first instrument through which we connected with the world. Before the invention of flight, humans were confined by the immensity of Earth's oceans.

All the discomfort and unpleasantness we endure on flights is quickly forgotten once we land safely at our destinations and realize we have just been transported to a new place on our vast planet. Every time I land in a new country or city, I am overwhelmed by how beautiful our world is and by how much I wish I could visit every corner of it. My love of aviation led me to investigate the connectedness of United States airports and the disparity in passenger numbers between developed and developing countries.

Click here to try the app!

The App

The interactive map is a tool for investigating the connectedness of US airports. Users can choose from a list of airports, including LAX, JFK, IAD and more, to visualize the connections out of that airport. The 'Airport Connections' table shows the connections broken down by airline carrier. For example, American Airlines (AA) had 8,058 flights from LAX to JFK (2009 dataset). The 'Carriers' table shows the total number of flights out of the selected airport for each carrier, for example 76,670 American Airlines flights out of LAX.

If we select Hartsfield-Jackson Atlanta International, we see that it is the most connected airport in the United States. Please note that I am not plotting all possible connections, only connections between major airports within the United States (the map would be filled solid if I plotted them all!). The size of each airport bubble is determined by its number of connections, so the large bubbles are generally international airports and the smaller bubbles are regional or domestic airports.

I also plotted Voronoi tessellations between the airports, using one nearest neighbor, to show how the area around each airport differs between the East Coast, West Coast, and Midwest. The largest polygons are found in the Midwest because airports there are far apart in all directions. These airports are generally more connected as well, since they link the East and West Coasts (see Denver International or Salt Lake City International). Clicking on a Voronoi polygon brings up the nearest airport within that area.

Why is it important for countries to improve their airport infrastructure?

Looking at the Motion/Bubble Chart, we observe that developing countries travel horizontally whereas developed countries travel vertically. This indicates that developed countries' populations have remained steady while their air passenger numbers have risen. On the flip side, developing countries have seen their populations boom, but their number of air travelers has remained stagnant.

Most importantly, countries moving upward show noticeable gains in GDP, whereas countries moving horizontally show minimal gains over the last four decades (GDP is represented by the size of the bubble). We can also see that airline passenger counts plunge during recessions in developed countries but remain comparatively steady in developing countries (1980, 2000, 2009). One interpretation is that developing countries are less connected to the rest of the world, since their air travel appears largely unaffected by global economic crises.

Flight Counts During Weekends and Holidays

The calendar heatmap shows the daily flight count in the United States. Airlines operate significantly fewer flights on Saturdays and on national holidays such as July 4th and Thanksgiving. The days leading up to and following national holidays show an increase in flights, as expected. Looking carefully, you can also see that there are fewer flights on Tuesdays and Wednesdays and more flights during the summer season.

If you select a day on the calendar, a table shows the top 20 airline carriers by flight count on that day. Southwest, American Airlines, SkyWest and Delta seem to operate the most flights in the United States.

The Data

1. Interactive Map

I used comprehensive datasets provided by the United States Department of Transportation and Open Data by Socrata to map airport connections in the United States. The first dataset included airport locations (city/state) along with their latitude and longitude, and the second included the airport connections (LAX-JFK, LAX-SFO, ...). I first used these datasets to calculate the size of each airport based on how many connections it had.
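
The bubble sizing described above can be sketched as follows. This is a minimal illustration in base R, not the app's actual code: the connections data frame, its origin and dest columns, and the square-root scaling of the radius are all assumptions made for the example.

```r
# Toy stand-in for the connections dataset (column names are hypothetical).
connections <- data.frame(
  origin = c("LAX", "LAX", "JFK", "ATL", "ATL", "ATL"),
  dest   = c("JFK", "SFO", "LAX", "LAX", "JFK", "SFO"),
  stringsAsFactors = FALSE
)

# Count the outgoing connections for each origin airport.
airport_sizes <- as.data.frame(
  table(origin = connections$origin),
  responseName = "n_connections",
  stringsAsFactors = FALSE
)

# Assumed scaling: radius grows with the square root of the count,
# so bubble area is proportional to the number of connections.
airport_sizes$radius <- sqrt(airport_sizes$n_connections)

print(airport_sizes)
```

The resulting table could then be joined to the airport locations by airport code so that each bubble has both a position and a size.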

2. Motion Chart

The second analysis was done using the airline passenger, population, and GDP numbers for the world's countries over the last 45 years. Most of the work here was in transforming the three datasets provided by the World Bank from wide to long. See the code below.
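
As a rough illustration of the wide-to-long step, here is a sketch in base R. It assumes each wide table has one identifier column and one column per year; the helper name wide_to_long, the column names, and the numbers are all made up for the example and are not taken from the World Bank files.

```r
# Reshape a wide table (one column per year) into long form
# (one row per country-year). Names here are hypothetical.
wide_to_long <- function(wide, id_col, value_name) {
  year_cols <- setdiff(names(wide), id_col)
  long <- data.frame(
    id    = rep(wide[[id_col]], times = length(year_cols)),
    year  = rep(as.integer(year_cols), each = nrow(wide)),
    value = unlist(wide[year_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long) <- c(id_col, "year", value_name)
  long
}

# Made-up wide tables standing in for two of the three datasets.
passengers_wide <- data.frame(
  country = c("A", "B"),
  `1970` = c(10, 20),
  `1971` = c(11, 22),
  check.names = FALSE, stringsAsFactors = FALSE
)
population_wide <- data.frame(
  country = c("A", "B"),
  `1970` = c(100, 200),
  `1971` = c(101, 202),
  check.names = FALSE, stringsAsFactors = FALSE
)

passengers <- wide_to_long(passengers_wide, "country", "passengers")
population <- wide_to_long(population_wide, "country", "population")

# Join the long tables on their shared keys.
motion_data <- merge(passengers, population, by = c("country", "year"))
print(motion_data)
```

A third table such as GDP would be reshaped and merged the same way, giving one row per country and year with all three measures.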

3. Calendar Chart

Lastly, I used the Transtats database to obtain the daily flight counts by airline carrier for the years 2004-2007. Some transformation was needed to create two separate data frames: flight counts per day and flight counts per carrier. While trying to calculate flight counts by day, I tried this code:

f2007_2 <- f2007 %>% group_by(UniqueCarrier, month) %>% summarise(sum = n())

I knew there was an error from looking at the resulting heatmap, but I didn't realize the code was giving me a total per carrier per month rather than a daily flight count, so I turned to Twitter to see if I could get help diagnosing my problem. I tweeted Jeff Weis, who appeared as the Aviation Analyst on CNN during the disappearance of Malaysia Airlines Flight MH370, and he caught my mistake! After he pushed me in the right direction, I corrected my code to:

f2007_2 <- f2007 %>% group_by(UniqueCarrier, date) %>% summarise(count = n())
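
To make the difference concrete, here is a small sketch with made-up data. The post itself uses dplyr; base R's aggregate is swapped in here so the example runs without extra packages. Grouping by month collapses every day of the month into one row per carrier, while grouping by date keeps one row per carrier per day.

```r
# Made-up flight records: one row per flight.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "WN", "WN"),
  date = as.Date(c("2007-01-01", "2007-01-01", "2007-01-02",
                   "2007-01-01", "2007-01-02")),
  stringsAsFactors = FALSE
)
flights$month <- as.integer(format(flights$date, "%m"))
flights$n <- 1L  # each row counts as one flight

# Grouping by month: one total per carrier per month (the original mistake).
by_month <- aggregate(n ~ UniqueCarrier + month, data = flights, FUN = sum)

# Grouping by date: one count per carrier per day (the intended result).
by_day <- aggregate(n ~ UniqueCarrier + date, data = flights, FUN = sum)

print(by_month)
print(by_day)
```

With this toy data, the monthly grouping yields two rows and the daily grouping yields four, which is the granularity a calendar heatmap needs.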

The Code

Creating Voronoi Polygons
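
A Voronoi polygon contains every point that is closer to its airport than to any other, which is why a click inside a polygon can be resolved to a single nearest airport. The sketch below illustrates that property only; it does not build the polygons. It uses base R and a haversine great-circle distance, the function names are hypothetical, and the coordinates are approximate values chosen for illustration.

```r
# Great-circle distance in km between points given in degrees (haversine).
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Approximate airport coordinates (illustrative only).
airports <- data.frame(
  code = c("DEN", "SLC", "LAX"),
  lat  = c(39.86, 40.79, 33.94),
  lon  = c(-104.67, -111.98, -118.41),
  stringsAsFactors = FALSE
)

# Return the code of the airport closest to a clicked location.
nearest_airport <- function(lat, lon, airports) {
  d <- haversine_km(lat, lon, airports$lat, airports$lon)
  airports$code[which.min(d)]
}

# A point in downtown Denver falls in the cell of Denver International.
print(nearest_airport(39.74, -104.99, airports))
```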

Connection Lines

The second step was creating the connecting lines between the airports. To do this, I used the addPolylines function in Leaflet to draw lines between airports, filtered by user input. input$Input1 captures the user-selected airport, and the dataset is subset to the rows whose origin airport equals that selection. The gcIntermediate function (from the geosphere package) makes those lines curved.
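
A minimal sketch of those two steps, with the assumptions labeled: the connections data frame, its column names, and selected_airport (standing in for input$Input1) are hypothetical, and a hand-written great-circle interpolation in base R is swapped in for gcIntermediate so that the example has no package dependencies. In the app, points like these would be handed to Leaflet to draw each curved line.

```r
# Points along the great circle between two locations (degrees in and out).
# Assumes the endpoints are distinct and not antipodal.
great_circle_points <- function(lat1, lon1, lat2, lon2, n = 50) {
  to_rad <- pi / 180
  p1 <- lat1 * to_rad; l1 <- lon1 * to_rad
  p2 <- lat2 * to_rad; l2 <- lon2 * to_rad
  # Angular distance between the endpoints (haversine form).
  d <- 2 * asin(sqrt(sin((p2 - p1) / 2)^2 +
                     cos(p1) * cos(p2) * sin((l2 - l1) / 2)^2))
  f <- seq(0, 1, length.out = n)  # fraction of the way along the route
  a <- sin((1 - f) * d) / sin(d)
  b <- sin(f * d) / sin(d)
  x <- a * cos(p1) * cos(l1) + b * cos(p2) * cos(l2)
  y <- a * cos(p1) * sin(l1) + b * cos(p2) * sin(l2)
  z <- a * sin(p1) + b * sin(p2)
  data.frame(lat = atan2(z, sqrt(x^2 + y^2)) / to_rad,
             lon = atan2(y, x) / to_rad)
}

# Hypothetical connections table with approximate coordinates.
connections <- data.frame(
  origin     = c("LAX", "LAX", "JFK"),
  dest       = c("JFK", "SFO", "LAX"),
  origin_lat = c(33.94, 33.94, 40.64),
  origin_lon = c(-118.41, -118.41, -73.78),
  dest_lat   = c(40.64, 37.62, 33.94),
  dest_lon   = c(-73.78, -122.38, -118.41),
  stringsAsFactors = FALSE
)

selected_airport <- "LAX"  # stand-in for the Shiny input value

# Keep only the routes that start at the selected airport.
routes <- connections[connections$origin == selected_airport, ]

# One data frame of curve points per route.
arcs <- lapply(seq_len(nrow(routes)), function(i) {
  great_circle_points(routes$origin_lat[i], routes$origin_lon[i],
                      routes$dest_lat[i], routes$dest_lon[i], n = 20)
})
```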

Calendar json capture

The calendar chart required two parameters: datevar, which reads the date column, and numvar, which supplies the value plotted for each day on the calendar. I then used the gvis.listener.jscode option to capture the user-selected date and filter the dataset for the table.
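
The table-filtering step can be sketched in base R as follows. The googleVis listener is left out; selected_date stands in for the value it would capture, and the daily_counts data frame, its column names, the helper top_carriers_on, and the counts themselves are made up for illustration.

```r
# Made-up per-carrier daily flight counts.
daily_counts <- data.frame(
  UniqueCarrier = c("WN", "AA", "OO", "DL", "WN", "AA"),
  date = as.Date(c(rep("2007-07-04", 4), rep("2007-07-05", 2))),
  count = c(3000, 2100, 1500, 1400, 3300, 2300),
  stringsAsFactors = FALSE
)

# Return the n busiest carriers on the selected day, busiest first.
top_carriers_on <- function(daily_counts, selected_date, n = 20) {
  day <- daily_counts[daily_counts$date == as.Date(selected_date), ]
  day <- day[order(day$count, decreasing = TRUE), ]
  head(day, n)
}

print(top_carriers_on(daily_counts, "2007-07-04", n = 3))
```

In a Shiny app, a function like this would typically sit inside a reactive table output so the table refreshes whenever a new day is selected.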

About Author

Sricharan Maddineni

Sricharan Maddineni was a Neuroscience undergrad at Rutgers University. He is a professional music producer turned Data Scientist who has worked with major artists like Kid Ink, DJ Mustard, BMG and garnered over 18 million plays. He has...
