CitiBike Supply and Demand in NYC
This post provides a summary of the project with several visualizations highlighted. View all code, generated visualizations, and the final presentation slides on GitHub.
What is rebalancing?
CitiBike, the country's largest bikeshare program, provides an affordable and sustainable alternative to subway and taxi services. With a network of more than 25,000 bikes distributed across 1,500 stations in New York City and New Jersey, CitiBike's operations depend on its ability to meet shifts in demand throughout the day and across locations. As customer trips move bikes from station to station, CitiBike uses a variety of tactics to ensure that both bikes and empty docks are available where needed. To manage these so-called “rebalancing” efforts, CitiBike uses a series of predictive algorithms to determine where and when to transport bikes.
Rather than building a predictive algorithm, this analysis examines the patterns in how bikes move around NYC. Where and when do New Yorkers travel by CitiBike? How does demand for bikes vary by time of day, day of the week, and location? To keep up with these shifts in demand, where and when must CitiBike focus its rebalancing efforts?
Trends in Customer Trips
This analysis uses publicly available CitiBike system data from June 2020, covering 1.8 million individual trips. Each record includes the trip’s start date and time, its total duration, the IDs of its origin and destination stations, and the latitude and longitude of each station. To support geospatial visualizations, we also use NYC zip code shapefiles from NYC Open Data.
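Joining the trip data to the zip code shapefiles comes down to deciding which zip code polygon contains each station's coordinates. As a minimal sketch, assuming each zip code boundary has already been read into a plain list of (longitude, latitude) vertices, the standard ray-casting point-in-polygon test is enough; the names `point_in_polygon` and `assign_zip` are hypothetical. Real zip code boundaries include multipolygons and holes, so in practice a spatial join such as GeoPandas' `sjoin` is the more robust choice.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a polygon is a list of (longitude, latitude) vertices.
# The ring does not need to repeat its first vertex at the end.
Polygon = List[Tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: count edges crossed by a ray running east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zip(lon: float, lat: float, zips: Dict[str, Polygon]) -> Optional[str]:
    """Return the first zip code whose polygon contains the point, or None."""
    for zip_code, polygon in zips.items():
        if point_in_polygon(lon, lat, polygon):
            return zip_code
    return None
```

Each station only needs to be assigned once, after which every trip inherits its origin and destination zip codes from its station IDs.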
To understand the supply and demand of bikes, we visualize the total number of trips by time of day and day of the week. On every day of the week, trip volume is highest in the evening hours. On weekdays, there is a clear peak in usage at 5pm, with a second peak in the morning at 8am. On weekends, by contrast, demand is spread more evenly across the afternoon hours.
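Counting trips by day of the week and time of day is a group-by on two keys derived from each trip's start time. A minimal standard-library sketch, assuming start times have already been parsed into `datetime` objects; the function names are hypothetical, and a pandas `groupby` would do the same job on the full 1.8 million rows.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def trips_by_weekday_hour(start_times: Iterable[datetime]) -> Counter:
    """Count trips per (weekday, hour); weekday() numbers Monday as 0 and Sunday as 6."""
    return Counter((t.weekday(), t.hour) for t in start_times)


def peak_hour(counts: Counter, weekday: int) -> int:
    """Return the hour (0-23) with the most trip starts on the given weekday."""
    # Counter returns 0 for missing keys, so hours with no trips are included.
    hourly = {hour: counts[(weekday, hour)] for hour in range(24)}
    return max(hourly, key=hourly.get)
```

For example, `peak_hour(counts, 0)` gives the busiest hour on Mondays; plotting `counts` as a 7 by 24 grid gives the day-of-week and hour-of-day view described above.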
Geographic demand also varies considerably. Aggregated trips by zip code are visualized below. These choropleths show how user behavior differs across regions of the city. For trips beginning in each zip code, the map on the left shows the average trip duration, while the map on the right shows the total number of trips originating there. On the left, trips starting in lower Manhattan tend to be shorter, while trips starting in zip codes farther from central Manhattan tend to be longer. On the right, the greatest number of trips begin in lower Manhattan zip codes.
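Once each trip carries the zip code of its origin station, the two choropleth layers reduce to a count and a mean per zip code. A dependency-free sketch, assuming trips arrive as hypothetical (origin zip, duration in seconds) pairs:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_by_origin_zip(trips: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[int, float]]:
    """Map each origin zip code to (trip count, mean trip duration in seconds)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for zip_code, duration_s in trips:
        totals[zip_code] += duration_s
        counts[zip_code] += 1
    # Every key in counts has at least one trip, so the division is safe.
    return {z: (counts[z], totals[z] / counts[z]) for z in counts}
```

Because a handful of very long trips can pull a mean upward, the median is a more robust alternative for the duration map.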
CitiBikes in Motion
Because 94% of rides end at a different station from the one where they began, and 79% end in a different zip code, customer trips constantly shift bikes around the city; this movement is what creates the need for rebalancing. Movement can be measured as the hourly surplus or deficit of bikes at each station or in each zip code. A station or zip code with more bikes leaving than arriving is at a deficit, while one with more bikes arriving than leaving is at a surplus.
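Both the share of trips that end elsewhere and the hourly surplus or deficit can be computed in one pass over the trips. The sketch below assumes each trip has been reduced to a hypothetical tuple of (start location, end location, start hour, end hour), where a location is either a station ID or a zip code, and pools all days of the month by hour of day. Net flow is arrivals minus departures, so positive values are surpluses and negative values are deficits.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical trip record: (start_location, end_location, start_hour, end_hour).
Trip = Tuple[str, str, int, int]


def share_ending_elsewhere(trips: Iterable[Trip]) -> float:
    """Fraction of trips whose end location differs from their start location."""
    trips = list(trips)
    if not trips:
        return 0.0
    moved = sum(1 for start, end, _, _ in trips if start != end)
    return moved / len(trips)


def hourly_net_flow(trips: Iterable[Trip]) -> Dict[Tuple[str, int], int]:
    """Arrivals minus departures per (location, hour of day)."""
    net: Dict[Tuple[str, int], int] = defaultdict(int)
    for start, end, start_hour, end_hour in trips:
        net[(start, start_hour)] -= 1  # bike leaves the origin in the hour the trip starts
        net[(end, end_hour)] += 1      # bike arrives at the destination in the hour it ends
    return dict(net)
```

Running the same functions once with station IDs and once with zip codes gives both levels of aggregation used in this section.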
Aggregating rides by zip code, we can find the largest surplus and the largest deficit in any single zip code for each hour of the day. Except in the early morning hours, sizable surpluses and deficits occur consistently throughout the day. This indicates that bikes do not flow evenly across zip code boundaries: at each time of day, some zip codes receive more incoming traffic while others send out more outgoing traffic. The largest surpluses and deficits occur between 5am and 10am and between 4pm and 9pm.
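Finding the hourly maximum surplus and deficit reduces a table of net flows keyed by (zip code, hour) to two numbers per hour. A minimal sketch, assuming net flow is arrivals minus departures so that the largest deficit is the most negative value; `hourly_extremes` is a hypothetical name.

```python
from typing import Dict, Tuple


def hourly_extremes(net: Dict[Tuple[str, int], int]) -> Dict[int, Tuple[int, int]]:
    """For each hour, return (highest net flow, lowest net flow) across all locations.

    A positive highest value is the largest surplus; a negative lowest value is the
    largest deficit (for example, -40 means 40 more bikes left than arrived).
    """
    extremes: Dict[int, Tuple[int, int]] = {}
    for (_, hour), value in net.items():
        highest, lowest = extremes.get(hour, (value, value))
        extremes[hour] = (max(highest, value), min(lowest, value))
    return extremes
```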
Within these morning and evening peak hours, we can map which areas receive the most incoming and outgoing traffic. In the following maps, shades of blue represent a surplus of bikes (more bikes arriving than departing) and shades of red represent a deficit (more bikes departing than arriving). The map on the left shows that during the morning peak, zip codes in Midtown Manhattan have a surplus of bikes, while the East Village has a deficit. The map on the right shows that during the evening peak, zip codes in Central Park and the Upper East Side have the greatest deficit of bikes, while the East Village has the greatest surplus.
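The peak-hour maps can be built by summing each zip code's hourly net flow over a window and shading by sign. The sketch assumes the windows are hours 5 through 9 and 16 through 20 (reading "between 5am and 10am" and "between 4pm and 9pm" as half-open ranges) and that net flows are keyed by (zip code, hour) as arrivals minus departures; the helper names are hypothetical. For continuous shading, a diverging colormap such as Matplotlib's `RdBu`, with limits symmetric around zero, puts red at the negative end and blue at the positive end, matching the maps' convention.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumed peak windows, as hour-of-day ranges (end hour excluded).
MORNING_PEAK = range(5, 10)   # 5:00am to 9:59am
EVENING_PEAK = range(16, 21)  # 4:00pm to 8:59pm


def window_net_flow(net: Dict[Tuple[str, int], int], hours: range) -> Dict[str, int]:
    """Total net flow per location over the given hours."""
    totals: Dict[str, int] = defaultdict(int)
    for (location, hour), value in net.items():
        if hour in hours:
            totals[location] += value
    return dict(totals)


def map_color(value: int) -> str:
    """Blue for a surplus, red for a deficit, as in the peak-hour maps."""
    if value > 0:
        return "blue"
    if value < 0:
        return "red"
    return "neutral"
```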
This gives a picture of how bikes move across the city throughout the day. To rebalance effectively, however, CitiBike must focus on individual stations, where bikes are added or removed at different points in the day. In this visual, we see the maximum surplus and deficit at individual stations for each hour of the day. The five lines represent the five stations with the greatest surplus or deficit. The station plotted in blue has both the greatest surplus and the greatest deficit of any station, with a large influx of bikes at 6am and a large outflow at 4pm. It corresponds to the blue point on the map, a station on the Upper East Side. This station and the others shown on the map require rebalancing.
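The post does not spell out how the five stations were chosen. One plausible reading, sketched below as an assumption, ranks stations by their single largest hourly imbalance in either direction, using net flows keyed by (station ID, hour); `top_imbalanced_stations` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def top_imbalanced_stations(net: Dict[Tuple[str, int], int], k: int = 5) -> List[str]:
    """Rank stations by their largest absolute hourly net flow (surplus or deficit)."""
    worst: Dict[str, int] = {}
    for (station, _), value in net.items():
        worst[station] = max(worst.get(station, 0), abs(value))
    # Largest imbalance first; keep the top k stations.
    return sorted(worst, key=worst.get, reverse=True)[:k]
```

Ranking by peak imbalance favors stations with one sharp rush; summing absolute hourly imbalances instead would favor stations that are out of balance all day.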
Conclusion and Next Steps
The presence of these surpluses and deficits in the data is evidence of CitiBike’s rebalancing efforts in action. The data depict only the motion of bikes through customer trips; they do not capture bikes moved behind the scenes. Yet a station can accept only as many bikes as it has docks, and can provide only as many bikes as it has available. When a station’s cumulative surplus exceeds its free docks, or its cumulative deficit exceeds the bikes it started with, customer trips alone cannot account for the numbers. Thus, large, sustained station surpluses and deficits are evidence of CitiBike moving bikes to meet demand.
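The capacity argument can be made precise for a single station. If customer trips were the only way bikes moved, the station's bike count would be its unknown starting count plus the running sum of its net flows, and that count must stay between 0 and the number of docks. Some starting count makes this possible only if the running sum's range (maximum minus minimum) is at most the dock count. The sketch below, with a hypothetical `needs_rebalancing` function and a dock count supplied by the caller, flags sequences that break this bound. The test is one-sided: hourly net flows hide within-hour swings, so a `False` result does not rule rebalancing out, and it assumes no bikes are taken out of service for repairs.

```python
from itertools import accumulate
from typing import Iterable


def needs_rebalancing(net_flows: Iterable[int], docks: int) -> bool:
    """True if customer trips alone cannot produce these net flows at one station.

    net_flows are successive (arrivals - departures) values for the station. The bike
    count after each step is an unknown starting count plus the running sum, and it
    must stay within [0, docks]. That is possible for some starting count only if the
    running sum's range, including the initial 0, fits within the dock count.
    """
    running = list(accumulate(net_flows, initial=0))  # initial= needs Python 3.8+
    return max(running) - min(running) > docks
```

For example, a 20-dock station that gains 10 and then 15 bikes on net cannot have absorbed those 25 bikes without some being removed along the way.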
For further analysis, we would build a model that approximates CitiBike’s rebalancing algorithms by predicting trip destinations. Without knowing the details of CitiBike’s algorithm, we might start with a logistic regression that classifies whether a trip starts and ends at the same location. We would also try k-means clustering to group trips that share features and estimate the likelihood of different destinations.
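As a hedged sketch of what those first experiments might look like, the block below implements plain logistic regression (batch gradient descent on the mean log loss) and Lloyd's k-means algorithm in pure Python. The feature vectors are left abstract: which trip features to use, such as start hour or trip duration, is an open choice, and all function names are hypothetical. A real analysis would more likely use scikit-learn's `LogisticRegression` and `KMeans`.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(label = 1), e.g. 'trip ends where it started', for weights [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [bias, w1, ..., wd] by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 100) -> List[List[float]]:
    """Lloyd's algorithm, naively seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid by squared Euclidean distance.
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; an empty cluster keeps its old centroid.
        updated = [[sum(coords) / len(cl) for coords in zip(*cl)] if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Features on very different scales (minutes versus hours, say) should be standardized first, since both gradient descent and Euclidean distance are sensitive to scale.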