CitiBike Supply and Demand in NYC

Posted on Jul 25, 2023

What is rebalancing?

CitiBike, the country's largest bikeshare program, provides an affordable and sustainable alternative to subway and taxi services. With a network of more than 25,000 bikes distributed over 1,500 stations throughout New York City and New Jersey, CitiBike's operations rely on its ability to meet shifts in demand throughout the day and across locations. As customer trips move bikes from station to station, CitiBike uses a variety of tactics to ensure that sufficient bikes and sufficient docks are available where needed. To manage these so-called “rebalancing” efforts, CitiBike employs a series of predictive algorithms to determine where and when to transport bikes.

This analysis does not employ a predictive algorithm. Instead, it examines the patterns of bike movement around NYC. Where and when do New Yorkers move around by CitiBike? How does demand for bikes differ by time, day, and location? To keep up with these shifts in demand, where and when must CitiBike deploy its rebalancing efforts?

Trends in Customer Trips

This analysis draws on publicly available CitiBike system data from June 2020, which details 1.8 million individual trips. Each observation includes the date and time of the trip start, the total trip duration, the trip’s origin and destination station IDs, and the station latitude and longitude. To facilitate geospatial visualizations, we also use NYC zip code shapefiles from NYC Open Data.

To understand the supply and demand of bikes, we visualize the total number of trips by time of day and day of the week. No matter the day of the week, the greatest number of trips occurs in the evening hours. During the week, usage peaks clearly at 5pm, with a second peak at 8am. On the weekend, however, demand is more evenly distributed throughout the afternoon hours.
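The counting behind this kind of chart can be sketched in a few lines. This is a minimal illustration and not the code used for the analysis: the `starttime` field name, the timestamp format, and the sample rows are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; real records would be read from the CitiBike trip file.
trips = [
    {"starttime": "2020-06-01 08:05:00"},  # a Monday
    {"starttime": "2020-06-01 17:10:00"},
    {"starttime": "2020-06-01 17:45:00"},
    {"starttime": "2020-06-06 14:20:00"},  # a Saturday
]

def trips_by_day_and_hour(trips):
    """Count trips per (weekday, hour); weekday 0 is Monday and 6 is Sunday."""
    counts = Counter()
    for trip in trips:
        # Assumed timestamp format; adjust if the file stores fractional seconds.
        start = datetime.strptime(trip["starttime"], "%Y-%m-%d %H:%M:%S")
        counts[(start.weekday(), start.hour)] += 1
    return counts

hourly_counts = trips_by_day_and_hour(trips)
```

Plotting `hourly_counts` with one line per weekday would give the hourly demand curves described above.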

Geographical demand also varies considerably. Aggregated trips by zip code are visualized below. These choropleths depict differences in user behavior across regions of the city. For bike trips beginning in each zip code, the map on the left illustrates the average trip duration, while the map on the right illustrates the total count of originating trips. On the left, we see that trips starting in lower Manhattan tend to be shorter, while trips beginning in zip codes farther from central Manhattan tend to be longer. On the right, we see that the greatest number of trips begin in zip codes in lower Manhattan.
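The two quantities mapped in these choropleths are a per-zip-code trip count and a mean duration. The sketch below assumes each trip has already been tagged with the zip code of its start station (for example through a spatial join against the shapefiles, which is not shown). The `start_zip` and `tripduration` field names and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: start zip code and trip duration in seconds.
trips = [
    {"start_zip": "10003", "tripduration": 600},
    {"start_zip": "10003", "tripduration": 900},
    {"start_zip": "11201", "tripduration": 1800},
]

def summarize_by_zip(trips):
    """Return the trip count and mean duration for each originating zip code."""
    totals = defaultdict(lambda: {"trips": 0, "seconds": 0})
    for trip in trips:
        entry = totals[trip["start_zip"]]
        entry["trips"] += 1
        entry["seconds"] += trip["tripduration"]
    return {
        zip_code: {
            "trips": entry["trips"],
            "mean_duration": entry["seconds"] / entry["trips"],
        }
        for zip_code, entry in totals.items()
    }

zip_summary = summarize_by_zip(trips)
```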

CitiBikes in Motion

Because 94% of rides end at a different station from the one where the bike was picked up, and 79% end in a different zip code than the one where they began, customer trips continually redistribute bikes across the system. This movement is the foundation of rebalancing. It can be represented as an hourly surplus or deficit of bikes by zip code or by station. A station or zip code with more bikes leaving than arriving is at an overall deficit, while one with fewer bikes leaving than arriving is at an overall surplus.
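One way to make the surplus and deficit definition concrete is to count arrivals minus departures for each station and hour. The sketch below is an illustration under stated assumptions, not the original analysis code: the field names are hypothetical, and it assumes a departure is counted in the hour the trip starts and an arrival in the hour the trip ends.

```python
from collections import Counter

# Hypothetical rows with station IDs and the hour of departure and arrival.
trips = [
    {"start_station_id": "A", "end_station_id": "B", "start_hour": 8, "end_hour": 8},
    {"start_station_id": "A", "end_station_id": "B", "start_hour": 8, "end_hour": 9},
    {"start_station_id": "B", "end_station_id": "A", "start_hour": 17, "end_hour": 17},
    {"start_station_id": "C", "end_station_id": "C", "start_hour": 12, "end_hour": 12},
]

def net_flow(trips):
    """Arrivals minus departures per (station, hour).

    Positive values are a surplus, negative values a deficit.
    """
    flow = Counter()
    for trip in trips:
        flow[(trip["start_station_id"], trip["start_hour"])] -= 1  # bike leaves
        flow[(trip["end_station_id"], trip["end_hour"])] += 1      # bike arrives
    return flow

def share_ending_elsewhere(trips):
    """Fraction of trips that end at a different station than they started."""
    moved = sum(1 for t in trips if t["start_station_id"] != t["end_station_id"])
    return moved / len(trips)

station_flow = net_flow(trips)
moved_share = share_ending_elsewhere(trips)
```

Keying on zip code instead of station ID yields the zip-level version of the same measure.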

Aggregating rides by zip code, we can see the maximum surplus and deficit occurring in any one zip code for each hour of the day. With the exception of the early morning hours, sizable surpluses and deficits occur consistently throughout the day. This indicates that bikes do not move across zip code lines evenly: at each time of day, certain zip codes receive more incoming traffic and others generate more outgoing traffic. The greatest surpluses and deficits occur between 5am and 10am and between 4pm and 9pm.
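Given a net flow per zip code and hour, the hourly extremes are the minimum and maximum across zip codes. A minimal sketch follows; the zip codes and flow values are made up for the example and are not results from the data.

```python
# Hypothetical net flow (arrivals minus departures) keyed by (zip code, hour).
zip_flow = {
    ("10003", 8): -40,
    ("10017", 8): 55,
    ("10021", 8): -10,
    ("10003", 17): 35,
    ("10021", 17): -50,
}

def hourly_extremes(flow):
    """Return {hour: (largest deficit, largest surplus)} across all zones."""
    extremes = {}
    for (_zone, hour), value in flow.items():
        low, high = extremes.get(hour, (value, value))
        extremes[hour] = (min(low, value), max(high, value))
    return extremes

extremes = hourly_extremes(zip_flow)
```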

Within these morning and evening peak hours, we can visualize which areas get the most incoming and outgoing traffic. In the following visuals, shades of blue represent a surplus of bikes (more bikes arriving than departing) and shades of red represent a deficit of bikes (more bikes departing than arriving). On the left, we see that during the morning peak, zip codes in midtown Manhattan have a surplus of bikes, while the East Village has a deficit. The map on the right shows that during the evening peak hours, zip codes around Central Park and the Upper East Side have the greatest deficit of bikes, while the East Village has the greatest surplus.

This offers an idea of the motion of bikes across the city throughout the day. However, to achieve rebalancing effectively, CitiBike is more concerned with individual stations, where bikes are either added or removed at different points throughout the day. In this visual, we see the maximum surplus and deficit of individual stations at each hour of the day. The five lines represent the five stations with the greatest surplus or deficit. The blue line has both the greatest surplus and the greatest deficit of all stations, with a large influx of bikes at 6am and an outflow of bikes at 4pm. This corresponds with the blue point, a station on the Upper East Side. This station, as well as the others shown on the map, requires bike rebalancing.
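Picking out the stations with the most extreme flows can be sketched as a ranking by peak hourly imbalance. Ranking by the largest absolute hourly net flow is an assumption made here for illustration. The criterion used for the chart may differ, and the station IDs and values below are hypothetical.

```python
# Hypothetical net flow (arrivals minus departures) keyed by (station, hour).
station_flow = {
    ("S1", 6): 30,
    ("S1", 16): -28,
    ("S2", 8): 12,
    ("S3", 17): -20,
    ("S3", 9): 5,
}

def top_stations(flow, n=5):
    """Rank stations by their largest absolute hourly surplus or deficit."""
    peak = {}
    for (station, _hour), value in flow.items():
        peak[station] = max(peak.get(station, 0), abs(value))
    return sorted(peak, key=peak.get, reverse=True)[:n]

busiest = top_stations(station_flow, n=2)
```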

Conclusion and Next Steps

The presence of these surpluses and deficits in the data is evidence of CitiBike’s rebalancing efforts in action. The data depict only the motion of bikes through customer trips and do not account for the transportation of bikes behind the scenes. Yet a station surplus in the data represents a moment when more users are arriving than the station has capacity to accept. Without rebalancing, a station surplus or deficit would be impossible, because a station can only accept as many bikes as it has docks for and can only provide as many bikes as it has available. Thus, each station surplus or deficit is evidence of CitiBike moving bikes to meet demand.

For further analysis, we would build a model that approximates CitiBike’s rebalancing algorithms by predicting trip destinations. Without knowing the details of CitiBike’s algorithm, we might try a logistic regression to classify whether a trip starts and ends in the same location. We would also try k-means clustering to group trips by shared features and determine the likelihood of different destinations.

About Author

Emmeline Danforth

Data scientist with a background producing analyses, visualizations, and insights to drive decision-making within the education and policy sectors.