Data Visualization on NYC Citi Bike
The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.
Introduction
Like all other sharing systems, Airbnb the housing sharing system, Uber the car sharing system, Citi Bike is the network of bicycle rental stations intended for point-to-point transportation. Data shows Citi Bike is New York City's largest bike sharing system. It’s a convenient solution for trips that are too far to walk but too short for a taxi or the subway. The bike sharing system is combined with all other transportation methods available in the area for commuters.
Currently, there are about a million trips on average per month by Citi Bike riders. The system has 10,000 bicycles and 610 stations. By end of 2017, the total size of Citi Bike system will be 12,000 bikes and 750 stations. The grey area is the current service area. The yellow and blue areas represent the sections to be covered by end of 2017.
The Optimization Questions
Any Citi Bike client has come up against two frustrating scenarios: the empty dock at the start and full dock at the end of the trip. Researchers call this as "rebalancing" problem as part of "fleet optimization" questions. This problem has attracted the attention of data scientists to develop complex methodologies to optimize the available bikes and open docks.
Following I attempt to utilize the shiny visualization app to provide a hint for the 3 questions:
- Fleet Routing Pattern Detection: what are the most popular routes during peak hours and off-peak? What is the direction of the flow?
- Station Balance Prediction: what is the average volume of imbalance in the distributed system? What is the station-level inflow and outflow? Is it sensitive to the time? How does it look like in a time series?
- Reducing rebalancing demand: What are the riders' activities like? Is it possible to rebalance through pricing schemes?
The visualization app is intended to provide a way to explore different comparative measures at the route, station and system levels with spatial attributes and time series.
The Data
Citi published Citi Bike Trip Histories - downloadable files of Citi Bike trip data. I used the Citi Bike data for the month of March 2017 (approximately 1 million observations). The data includes:
- Trip Duration (in seconds)
- Start Time and Date
- Stop Time and Date
- Start Station Name
- End Station Name
- Station ID
- Station Lat/Long
- Bike ID
- User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
- Gender (Zero=unknown; 1=male; 2=female)
- Year of Birth
Before moving ahead with building the app, I was interested in exploring the data and identifying patterns of rebalancing.
Bar Chart 1 - Time wise imbalance (Peak/Off peak)
Bar Chart 2 -Location wise imbalance (Top 10 popular Station)
Data Insights
On the interactive map, each dot presents a station. The visualization will also provide options to identify popular routes by selecting date and hour range. The top popular routes are marked in orange as the lines between the spatial points. The direction of the routes is indicated by moving from the more red towards the more green dots.
The Patterns
Interesting patterns are observed. The most popular routes on the west side run through Central Park and Chelsea Pier. Grand central/Penn Station centered routes are also in the hottest route list. Outside Manhattan there are centers in Queen and Brooklyn initiating lots of popular routes. Riders bike more along the west and east streets than along north and south avenue. That makes sense in light of the fact that there are more uptown and downtown subways than crosstown ones, and riders do utilize the Citi Bike as a an alternative transportation option.
While not enough bikes available in hot pick up stations, the docks are lacking in hot drop off stations. The red dots are where outflow of bikes exceeds the inflow of bikes The green dots are where inflow of bikes exceeds the outflow of bikes. In the other words, the green dots are the hot spot to pick up a bike(more inbound bikes) and the red(more empty docks) to drop them off.
And The more extreme the color of dot is, the higher percentage change of the flows this stations has. The size and transparency of the dot is represented by the volume of both inflow and outflow of the stations. The more obvious the dot is, the hotter spot the station is.
Balancing Problems
What caused the balancing problem? The map based interactive app provides an insight for predicting demand. The information displayed is the accumulated hourly variables based on dates selected. Details of statistic numbers is also available for each stations by zooming in.
New York has a classic peak commuter flow pattern. Most commuters ride bikes towards the center of the town from its edge in the morning. At the end of the day, they ride the reverse way to the edge where they live, especially at the edge sections with fewer public transportations options.
Data on The Rider's Activities
What about the rider's activities. Is there any pattern involved? The app provides insights of rider's performance for reducing rebalancing demands. By studying rider's activities, it will provides suggestions for potential solutions.
Below each bubble represents an age and gender group. The age is represented as the number on each bubble. A negative correlation is observed between age and speed. The younger the rider is, the faster he/she rides. In similarity, the group in the thirties shows similar miles per trip. The performance between female and male group are also different. The male groups in blue perform a higher speed level than female groups in red.
The Balancing Solutions
Is there solutions for rebalancing to cut the cost and improve the efficiency, instead of manually moving bikes via trucks, bike-draw trailers and sprinter vans from full stations to empty stations? The moving will take crews travel in pairs 45 minutes to load a truck.
Citi Bike sought a way to get the riders to move the bikes themselves. In May it started the pilot Bike Angel. The reverse-commuter would be perfect target member of the Angel program. What is so appealing about the program is the bike sharing system could self -distribute its fleet with the proper incentives. The member can easily make 10 Amazon gift card with a few reverse trips. As a result, the demand of manually moving bike around would decrease.
The Conclusions
The Visualization app provides the real time status of fleets: popular routes, inbound/outbound, net change, time series of stations, hot spot analysis and rider's activities. It supports the self distributed fleet by establishing a baseline for identifying "healthy" rebalancing within the bike share system. It provides a hint for a future transportation solutions.
The App
The interactive app is available on Shiny.io.