Data Visualization on NYC Citi Bike

Posted on Aug 27, 2017

Introduction

Like other sharing-economy services, such as Airbnb for housing and Uber for cars, Citi Bike is a network of bicycle rental stations intended for point-to-point transportation. It is New York City's largest bike sharing system and a convenient option for trips that are too far to walk but too short for a taxi or the subway. Commuters combine it with the other transportation methods available in the area.

Currently, Citi Bike riders take about a million trips per month on average. The system has 10,000 bicycles and 610 stations. By the end of 2017, it will grow to 12,000 bikes and 750 stations. In the service map, the grey area is the current service area, and the yellow and blue areas are the sections to be covered by the end of 2017.

 

 

The Optimization Questions

Every Citi Bike rider has come up against two frustrating scenarios: an empty dock at the start of a trip and a full dock at the end. Researchers call this the "rebalancing" problem, part of the broader "fleet optimization" question. It has attracted data scientists, who have developed complex methodologies to optimize the number of available bikes and open docks.

Below, I use a Shiny visualization app to offer hints on three questions:

  1. Fleet Routing Pattern Detection: What are the most popular routes during peak and off-peak hours? What is the direction of the flow?
  2. Station Balance Prediction: What is the average volume of imbalance in the distributed system? What is the station-level inflow and outflow? Is it sensitive to the time of day? What does it look like as a time series?
  3. Reducing Rebalancing Demand: What are the riders' activities like? Is it possible to rebalance through pricing schemes?

The visualization app is intended to provide a way to explore different comparative measures at the route, station and system levels with spatial attributes and time series.

 

The Data

Citi Bike publishes its Trip Histories as downloadable files of trip data. I used the data for March 2017 (approximately 1 million observations). The data includes:

  • Trip Duration (in seconds)
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Lat/Long
  • Bike ID
  • User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
  • Gender (Zero=unknown; 1=male; 2=female)
  • Year of Birth
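As a rough illustration of how these fields could be read into typed records, here is a minimal Python sketch. It is not the code behind the app (which is written in R). The column names, the timestamp format and the `Trip` class are assumptions made for the example; the real files may label and format things differently.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Gender codes as described in the field list above.
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}

# Assumed timestamp layout; adjust to match the actual file.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class Trip:
    """Hypothetical typed view of one trip row."""
    duration_s: int
    started_at: datetime
    stopped_at: datetime
    start_station: str
    end_station: str
    user_type: str
    gender: str
    birth_year: Optional[int]


def parse_trips(lines: Iterable[str]) -> Iterator[Trip]:
    """Yield one Trip per CSV row; column names are assumed, not official."""
    for row in csv.DictReader(lines):
        raw_year = row["birth year"].strip()
        yield Trip(
            duration_s=int(row["tripduration"]),
            started_at=datetime.strptime(row["starttime"], TIME_FORMAT),
            stopped_at=datetime.strptime(row["stoptime"], TIME_FORMAT),
            start_station=row["start station name"],
            end_station=row["end station name"],
            user_type=row["usertype"],
            gender=GENDER_LABELS.get(row["gender"].strip(), "unknown"),
            # Birth year can be missing, e.g. for short-term pass users.
            birth_year=int(raw_year) if raw_year.isdigit() else None,
        )


# Two made-up rows standing in for the real download.
SAMPLE = """tripduration,starttime,stoptime,start station name,end station name,usertype,birth year,gender
600,2017-03-01 08:05:00,2017-03-01 08:15:00,Station A,Station B,Subscriber,1985,1
1500,2017-03-01 17:30:00,2017-03-01 17:55:00,Station B,Station A,Customer,,0
"""

trips = list(parse_trips(io.StringIO(SAMPLE)))
for trip in trips:
    print(trip)
```

Parsing the timestamps up front makes the later hour-of-day filters and groupings straightforward.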

Before moving ahead with building the app, I was interested in exploring the data and identifying patterns of rebalancing.

Bar Chart 1 - Time-wise imbalance (Peak/Off-peak)

Bar Chart 2 - Location-wise imbalance (Top 10 popular stations)

 

Data Insights

On the interactive map, each dot represents a station. The app also lets you identify popular routes by selecting a date and hour range. The most popular routes are drawn as orange lines between stations. The direction of each route runs from the redder dot towards the greener dot.
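One simple way to rank routes for a chosen hour range is to count start and end station pairs. The Python sketch below shows the idea on made-up trips; the function name `top_routes`, the tuple layout and the half-open hour window are assumptions for the example, not the app's actual logic.

```python
from collections import Counter
from datetime import datetime


def top_routes(trips, start_hour, end_hour, n=3):
    """Return the n most common (start, end) pairs for trips that
    begin in the half-open hour window [start_hour, end_hour)."""
    counts = Counter(
        (start, end)
        for started_at, start, end in trips
        if start_hour <= started_at.hour < end_hour
    )
    return counts.most_common(n)


# Made-up trips: (start time, start station, end station).
SAMPLE_TRIPS = [
    (datetime(2017, 3, 1, 8, 5), "A", "B"),
    (datetime(2017, 3, 1, 8, 40), "A", "B"),
    (datetime(2017, 3, 1, 9, 10), "C", "B"),
    (datetime(2017, 3, 1, 17, 30), "B", "A"),
    (datetime(2017, 3, 1, 18, 0), "B", "A"),
    (datetime(2017, 3, 1, 18, 20), "B", "C"),
]

morning = top_routes(SAMPLE_TRIPS, 7, 10)
evening = top_routes(SAMPLE_TRIPS, 16, 20)
print("morning peak:", morning)
print("evening peak:", evening)
```

Because each pair is ordered, A to B and B to A are counted as different routes, which is what lets the flow direction show up.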

 

  1. Citi Bike Daily Migration


The Patterns

Some interesting patterns emerge. The most popular routes on the west side run through Central Park and Chelsea Piers. Routes centered on Grand Central and Penn Station are also among the hottest. Outside Manhattan, hubs in Queens and Brooklyn originate many popular routes. Riders bike more along the east-west streets than along the north-south avenues. That makes sense given that there are more uptown and downtown subway lines than crosstown ones, and riders use Citi Bike as an alternative transportation option.

Hot pick-up stations run short of bikes, while hot drop-off stations run short of docks. The red dots are stations where the outflow of bikes exceeds the inflow. The green dots are stations where the inflow exceeds the outflow. In other words, the green dots (more inbound bikes) are the hot spots to pick up a bike, and the red dots (more empty docks) are the hot spots to drop one off.

The more extreme the color of a dot, the higher the percentage change in flows at that station. The size and transparency of a dot represent the combined volume of inflow and outflow at the station. The more prominent the dot, the hotter the station.
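The station-level measures behind the dots can be sketched in Python as follows. The exact formula the app uses for the percentage change is not stated here, so this example assumes net flow divided by total volume; the names `station_balance`, `net_share` and `dot_color` are hypothetical.

```python
from collections import Counter


def station_balance(trips):
    """Compute inflow, outflow, net flow and volume per station
    from (start_station, end_station) pairs."""
    outflow = Counter(start for start, _ in trips)
    inflow = Counter(end for _, end in trips)
    balance = {}
    for station in sorted(set(outflow) | set(inflow)):
        in_n, out_n = inflow[station], outflow[station]
        volume = in_n + out_n  # drives dot size in this sketch
        net = in_n - out_n     # positive means bikes pile up
        balance[station] = {
            "inflow": in_n,
            "outflow": out_n,
            "net": net,
            "volume": volume,
            # Assumed normalisation: net flow as a share of all movements.
            "net_share": net / volume,
        }
    return balance


def dot_color(net):
    """Red where outflow exceeds inflow, green for the reverse."""
    if net < 0:
        return "red"
    if net > 0:
        return "green"
    return "neutral"


# Made-up trips as (start station, end station).
SAMPLE_TRIPS = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "B")]

balance = station_balance(SAMPLE_TRIPS)
for name, stats in balance.items():
    print(name, dot_color(stats["net"]), stats)
```

In this toy data, station A loses more bikes than it receives and would be drawn red, while station B gains bikes and would be drawn green.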

Balancing Problems

What causes the balancing problem? The map-based interactive app offers insight for predicting demand. It displays hourly variables accumulated over the selected dates. Detailed statistics for each station are also available by zooming in.

New York has a classic peak commuter flow pattern. In the morning, most commuters ride from the edge of town towards the center. At the end of the day, they ride back to the edges where they live, especially in sections with fewer public transportation options.
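The commuter pattern shows up clearly when net flow is broken down by hour for a single station. The Python sketch below does this on made-up trips; the station names "Center" and "Edge" and the function `hourly_net_flow` are illustrative assumptions, not part of the app.

```python
from collections import defaultdict
from datetime import datetime


def hourly_net_flow(trips, station):
    """Return {hour: arrivals - departures} for one station.
    Departures are counted at the start hour, arrivals at the stop hour."""
    net = defaultdict(int)
    for started_at, stopped_at, start, end in trips:
        if start == station:
            net[started_at.hour] -= 1
        if end == station:
            net[stopped_at.hour] += 1
    return dict(sorted(net.items()))


def at(hour, minute):
    """Helper for building sample timestamps on one made-up day."""
    return datetime(2017, 3, 1, hour, minute)


# Made-up trips: (start time, stop time, start station, end station).
SAMPLE_TRIPS = [
    (at(8, 0), at(8, 20), "Edge", "Center"),
    (at(8, 10), at(8, 35), "Edge", "Center"),
    (at(8, 50), at(9, 5), "Edge", "Center"),
    (at(17, 30), at(17, 50), "Center", "Edge"),
    (at(17, 45), at(18, 10), "Center", "Edge"),
]

center = hourly_net_flow(SAMPLE_TRIPS, "Center")
edge = hourly_net_flow(SAMPLE_TRIPS, "Edge")
print("Center:", center)
print("Edge:  ", edge)
```

A central station fills up in the morning and drains in the evening, and an edge station does the opposite, which is the imbalance that rebalancing has to correct.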

 

Data on Riders' Activities

What about riders' activities? Is there any pattern? The app offers insights into rider performance that can help reduce rebalancing demand. Studying riders' activities suggests potential solutions.

Below, each bubble represents an age and gender group, and the number on each bubble is the age. Age and speed are negatively correlated: the younger the rider, the faster he or she rides. Similarly, the groups in their thirties show comparable miles per trip. Performance also differs between female and male groups. The male groups, in blue, ride at higher speeds than the female groups, in red.
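The trip data has no distance column, so speed has to be derived. How the app does this is not described here; the Python sketch below assumes straight-line (haversine) distance between the start and end stations divided by trip duration, grouped by gender and age. The function names, the dictionary keys and the sample coordinates are all made up for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_speed_by_group(trips, data_year=2017):
    """Return {(gender, age): mph}, using total miles over total hours."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [miles, hours]
    for trip in trips:
        # Skip rows without a birth year or with a non-positive duration.
        if trip["birth_year"] is None or trip["duration_s"] <= 0:
            continue
        miles = haversine_miles(trip["start_lat"], trip["start_lon"],
                                trip["end_lat"], trip["end_lon"])
        key = (trip["gender"], data_year - trip["birth_year"])
        totals[key][0] += miles
        totals[key][1] += trip["duration_s"] / 3600
    return {key: miles / hours for key, (miles, hours) in totals.items()}


# Made-up trips over the same stretch with different durations.
ROUTE = {"start_lat": 40.75, "start_lon": -73.99,
         "end_lat": 40.76, "end_lon": -73.98}
SAMPLE_TRIPS = [
    {**ROUTE, "duration_s": 300, "gender": "male", "birth_year": 1992},
    {**ROUTE, "duration_s": 600, "gender": "female", "birth_year": 1962},
    {**ROUTE, "duration_s": 400, "gender": "unknown", "birth_year": None},
]

speeds = mean_speed_by_group(SAMPLE_TRIPS)
for group, mph in sorted(speeds.items()):
    print(group, round(mph, 1), "mph")
```

Straight-line distance understates the distance actually ridden, and round trips that return to the same station come out as zero miles, so speeds computed this way are best read as relative comparisons between groups.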


 

The Balancing Solutions

Are there rebalancing solutions that cut cost and improve efficiency, instead of manually moving bikes from full stations to empty stations via trucks, bike-drawn trailers and sprinter vans? For that work, crews travel in pairs and take 45 minutes to load a truck.

Citi Bike sought a way to get riders to move the bikes themselves. In May it started the Bike Angels pilot. Reverse commuters would be the perfect target members for the program. What is so appealing about the program is that the bike sharing system could redistribute its own fleet given the proper incentives. A member can easily earn a $10 Amazon gift card with a few reverse trips. As a result, the demand for manually moving bikes around would decrease.

 

The Conclusions

The visualization app shows the real-time status of the fleet: popular routes, inbound and outbound flows, net change, station time series, hot spot analysis and riders' activities. It supports a self-distributing fleet by establishing a baseline for identifying "healthy" rebalancing within the bike share system. It provides a hint of a future transportation solution.

 

The App

The interactive app is available on Shiny.io.

About Author

Summer Sun

Summer is passionate about data science. She has 3 years' experience analyzing large-scale client data for major financial institutions. She welcomes contact about any challenging project.


Leave a Comment

Summer Sun November 7, 2017
Thank you for your interest. The R code is available here via https://github.com/summersuny/01shinyDemo
john froeschke August 28, 2017
Is the R code for this app available on github or other? Very interesting app!