Data Visualization on NYC Citi Bike

Summer Sun

Posted on Aug 27, 2017

The skills the author demonstrates here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Like other sharing systems, such as Airbnb for housing and Uber for cars, Citi Bike is a network of bicycle rental stations intended for point-to-point transportation. Citi Bike is New York City's largest bike sharing system. It is a convenient solution for trips that are too far to walk but too short for a taxi or the subway. Commuters combine it with the other transportation methods available in the area.

Currently, Citi Bike riders take about a million trips per month on average. The system has 10,000 bicycles and 610 stations. By the end of 2017, it will grow to 12,000 bikes and 750 stations. On the service map, the grey area is the current service area. The yellow and blue areas represent the sections to be covered by the end of 2017.

The Optimization Questions

Any Citi Bike rider has come up against two frustrating scenarios: an empty dock at the start of a trip and a full dock at the end. Researchers call this the "rebalancing" problem, part of the broader "fleet optimization" question. It has drawn data scientists to develop complex methodologies for optimizing the supply of available bikes and open docks.

Below, I use a Shiny visualization app to offer hints on three questions:

Fleet Routing Pattern Detection: What are the most popular routes during peak and off-peak hours? What is the direction of the flow?
Station Balance Prediction: What is the average volume of imbalance in the distributed system? What is the station-level inflow and outflow? Is it sensitive to the time of day? What does it look like as a time series?
Reducing Rebalancing Demand: What are the riders' activities like? Is it possible to rebalance through pricing schemes?

The visualization app is intended to provide a way to explore different comparative measures at the route, station and system levels with spatial attributes and time series.

The Data

Citi Bike publishes Citi Bike Trip Histories, downloadable files of Citi Bike trip data. I used the data for March 2017 (approximately 1 million observations). The data includes:

Trip Duration (in seconds)
Start Time and Date
Stop Time and Date
Start Station Name
End Station Name
Station ID
Station Lat/Long
Bike ID
User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
Gender (0 = unknown; 1 = male; 2 = female)
Year of Birth
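As a rough illustration of working with these fields, the sketch below reads trip records from CSV text and converts them to typed values. The column names and the two sample rows are assumptions made up for this example; they are modeled loosely on the field list above and may not match the headers in the published files.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in an ASSUMED column layout (not real Citi Bike data).
SAMPLE_CSV = """tripduration,starttime,stoptime,start_station_id,start_station_name,end_station_id,end_station_name,bikeid,usertype,birth_year,gender
600,2017-03-01 08:05:00,2017-03-01 08:15:00,1,Station A,2,Station B,100,Subscriber,1985,1
900,2017-03-01 18:30:00,2017-03-01 18:45:00,2,Station B,1,Station A,101,Customer,,0
"""

# Gender coding as described in the field list.
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "duration_s": int(row["tripduration"]),
        "start": datetime.strptime(row["starttime"], TIME_FORMAT),
        "stop": datetime.strptime(row["stoptime"], TIME_FORMAT),
        "start_station": row["start_station_name"],
        "end_station": row["end_station_name"],
        "user_type": row["usertype"],
        "gender": GENDER_LABELS.get(row["gender"], "unknown"),
        # This sketch assumes the year of birth may be blank in some rows.
        "birth_year": int(row["birth_year"]) if row["birth_year"] else None,
    }


trips = [parse_trip(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for trip in trips:
    print(trip["start"].hour, trip["start_station"], "->", trip["end_station"])
```

To read a downloaded file instead, the same `parse_trip` function could be applied to `csv.DictReader(open(path, newline=""))` after adjusting the keys to the file's actual headers.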

Before moving ahead with building the app, I was interested in exploring the data and identifying patterns of rebalancing.

Bar Chart 1 - Time-wise imbalance (peak/off-peak)

Bar Chart 2 - Location-wise imbalance (top 10 popular stations)
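The numbers behind charts like these could be computed along the following lines. This is a minimal sketch, not the code used for the post: the trips are made up, and the peak windows (7:00 to 9:59 and 17:00 to 19:59) are an assumption, since the post does not state which hours it treated as peak.

```python
from collections import Counter

# Hypothetical trips: (start_hour, start_station, end_station).
trips = [
    (8, "A", "B"), (8, "A", "B"), (9, "C", "B"),
    (13, "B", "A"), (18, "B", "A"), (18, "B", "C"), (23, "A", "C"),
]

# ASSUMED peak windows; adjust to whatever definition is appropriate.
PEAK_HOURS = set(range(7, 10)) | set(range(17, 20))


def period(hour):
    """Label an hour of the day as 'peak' or 'off-peak'."""
    return "peak" if hour in PEAK_HOURS else "off-peak"


def imbalance_by_period(trips):
    """Return {period: Counter(station -> arrivals minus departures)}."""
    result = {"peak": Counter(), "off-peak": Counter()}
    for hour, start, end in trips:
        net = result[period(hour)]
        net[start] -= 1  # a bike leaves the start station
        net[end] += 1    # a bike arrives at the end station
    return result


def top_stations(trips, n=10):
    """Rank stations by the total number of trips starting or ending there."""
    volume = Counter()
    for _, start, end in trips:
        volume[start] += 1
        volume[end] += 1
    return volume.most_common(n)


imbalance = imbalance_by_period(trips)
print(dict(imbalance["peak"]))
print(top_stations(trips, n=2))
```

For simplicity, each trip is assigned to a period by its start hour only; a trip that crosses a period boundary is not split.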

Data Insights

On the interactive map, each dot represents a station. The visualization also lets you identify popular routes by selecting a date and hour range. The most popular routes are drawn as orange lines between the station points. The direction of a route runs from the redder dot toward the greener dot.
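Selecting popular routes for a date and hour range amounts to filtering trips and counting directed station pairs. The sketch below shows one way to do that; the function name `popular_routes`, its parameters, and the sample trips are all hypothetical and are not taken from the app's code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trips: (start_time, start_station, end_station).
trips = [
    ("2017-03-01 08:05:00", "A", "B"),
    ("2017-03-01 08:40:00", "A", "B"),
    ("2017-03-01 09:10:00", "B", "C"),
    ("2017-03-02 08:15:00", "A", "B"),
    ("2017-03-02 18:20:00", "B", "A"),
]


def popular_routes(trips, first_day, last_day, first_hour, last_hour, n=5):
    """Count directed (start, end) routes inside a date and hour range.

    Days are ISO strings such as "2017-03-01"; both ranges are inclusive.
    """
    counts = Counter()
    for start_time, start, end in trips:
        t = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        in_days = first_day <= t.date().isoformat() <= last_day
        in_hours = first_hour <= t.hour <= last_hour
        if in_days and in_hours:
            counts[(start, end)] += 1  # A -> B and B -> A are counted separately
    return counts.most_common(n)


print(popular_routes(trips, "2017-03-01", "2017-03-02", 7, 9))
```

Keeping the pair ordered preserves the direction of flow, which is what the map conveys with the red-to-green coloring.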

The Patterns

Some interesting patterns emerge. The most popular routes on the west side run through Central Park and Chelsea Piers. Routes centered on Grand Central and Penn Station are also among the hottest. Outside Manhattan, hubs in Queens and Brooklyn originate many popular routes. Riders bike more along the east-west streets than along the north-south avenues. That makes sense given that there are more uptown and downtown subway lines than crosstown ones, so riders use Citi Bike as an alternative transportation option.

Hot pickup stations run short of bikes, while hot drop-off stations run short of docks. The red dots are stations where the outflow of bikes exceeds the inflow. The green dots are stations where the inflow exceeds the outflow. In other words, the green dots are the hot spots to pick up a bike (more inbound bikes) and the red dots (more empty docks) are the spots to drop one off.

The more extreme the color of a dot, the higher the percentage change in flows at that station. The size and transparency of a dot reflect the combined volume of inflow and outflow at the station. The more prominent the dot, the hotter the station.
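The station-level measures described above can be sketched as follows. The post does not define "percentage change of the flows", so this example assumes it means net flow divided by total volume, which ranges from -1 (all departures) to +1 (all arrivals). The trips and the function name `station_flows` are hypothetical.

```python
from collections import Counter

# Hypothetical trips: (start_station, end_station).
trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]


def station_flows(trips):
    """Compute inflow, outflow, net change and an imbalance ratio per station."""
    outflow, inflow = Counter(), Counter()
    for start, end in trips:
        outflow[start] += 1
        inflow[end] += 1

    stats = {}
    for station in set(outflow) | set(inflow):
        in_n, out_n = inflow[station], outflow[station]
        net = in_n - out_n
        volume = in_n + out_n  # always >= 1 for a station that appears in a trip
        stats[station] = {
            "inflow": in_n,
            "outflow": out_n,
            "net": net,
            "pct": net / volume,  # ASSUMED definition of the percentage change
            "volume": volume,     # could drive dot size and transparency
            # Colors follow the text: green = more inflow, red = more outflow.
            "color": "green" if net > 0 else "red" if net < 0 else "neutral",
        }
    return stats


stats = station_flows(trips)
for name in sorted(stats):
    print(name, stats[name])
```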

Balancing Problems

What causes the balancing problem? The map-based interactive app provides insight for predicting demand. The information displayed is aggregated hourly for the selected dates. Detailed statistics for each station are also available by zooming in.

New York has a classic peak commuter flow pattern. In the morning, most commuters ride from the edges of the city toward its center. At the end of the day, they ride back out to the edges where they live, especially in outlying sections with fewer public transportation options.
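The commuter pattern shows up when a station's net flow is viewed hour by hour. The following is a minimal sketch under made-up data: the station names "Edge" and "Center" and the three trips are hypothetical, chosen only to mimic a morning flow into the center and an evening flow back out.

```python
from itertools import accumulate

# Hypothetical trips: (start_hour, end_hour, start_station, end_station).
trips = [
    (8, 8, "Edge", "Center"),
    (8, 9, "Edge", "Center"),
    (18, 18, "Center", "Edge"),
]


def hourly_net_flow(trips, station):
    """Return 24 values: arrivals minus departures at `station` for each hour."""
    net = [0] * 24
    for start_hour, end_hour, start, end in trips:
        if start == station:
            net[start_hour] -= 1  # departure counted in the hour the trip starts
        if end == station:
            net[end_hour] += 1    # arrival counted in the hour the trip ends
    return net


center = hourly_net_flow(trips, "Center")
# The running total approximates how far the station has drifted from its
# starting bike count as the day goes on.
center_running = list(accumulate(center))
print(center_running)
```

A station whose running total climbs through the morning and falls in the evening behaves like a commuter destination; the reverse shape suggests a residential origin.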

Data on the Riders' Activities

What about the riders' activities? Is there any pattern involved? The app provides insight into rider performance that may help reduce rebalancing demand. Studying riders' activities suggests potential solutions.

In the bubble chart, each bubble represents an age and gender group. The number on each bubble is the age. A negative correlation is observed between age and speed: the younger the rider, the faster he or she rides. Riders in their thirties show similar miles per trip. Performance also differs between the female and male groups: the male groups, in blue, ride faster than the female groups, in red.
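The trip data has no distance or speed field, so both must be derived. The sketch below assumes distance is the straight-line (haversine) distance between the start and end stations, which understates the real route and gives zero for round trips; the post does not say how its miles and speeds were computed. The sample trips, the ride year default, and the function names are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_speed_by_group(trips, ride_year=2017):
    """Average mph per (age, gender) group.

    trips: (birth_year, gender, duration_s, lat1, lon1, lat2, lon2).
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of mph, trip count]
    for birth_year, gender, duration_s, lat1, lon1, lat2, lon2 in trips:
        mph = haversine_miles(lat1, lon1, lat2, lon2) / (duration_s / 3600)
        group = (ride_year - birth_year, gender)
        totals[group][0] += mph
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up trips over the same short hop, with older riders taking longer.
trips = [
    (1992, 1, 300, 40.75, -73.99, 40.76, -73.99),
    (1977, 1, 400, 40.75, -73.99, 40.76, -73.99),
    (1957, 2, 600, 40.75, -73.99, 40.76, -73.99),
]
speeds = mean_speed_by_group(trips)
ages = [age for age, _ in speeds]
r = pearson(ages, list(speeds.values()))
print(speeds, r)
```

With real data, a negative value of `r` would correspond to the age and speed relationship described above.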

The Balancing Solutions

Are there solutions for rebalancing that cut costs and improve efficiency, instead of manually moving bikes from full stations to empty stations via trucks, bike-drawn trailers and sprinter vans? For that manual work, crews travel in pairs and take 45 minutes to load a truck.

Citi Bike sought a way to get riders to move the bikes themselves. In May it started the Bike Angels pilot. Reverse commuters would be the perfect target members for the program. What is so appealing about the program is that the bike sharing system could self-distribute its fleet given the proper incentives. A member can easily earn a $10 Amazon gift card with a few reverse trips. As a result, the demand for manually moving bikes around would decrease.

The Conclusions

The visualization app provides the real-time status of the fleet: popular routes, inbound/outbound flows, net change, station time series, hot spot analysis and riders' activities. It supports a self-distributed fleet by establishing a baseline for identifying "healthy" rebalancing within the bike share system. It provides a hint of future transportation solutions.

The App

The interactive app is available on Shiny.io.

About Author

Summer Sun

Summer is passionate about data science. She has 3 years’ experience analyzing large-scale client data for major financial institutions. She welcomes contact about any challenging project.


Summer Sun November 7, 2017

Thank you for your interest. The R code is available here via https://github.com/summersuny/01shinyDemo

john froeschke August 28, 2017

Is the R code for this app available on github or other? Very interesting app!
