How Fast Can You CitiBike?
This is Part 2 of a two-part series on analyzing the performance of CitiBike riders in NYC. The first part can be found here.
The original source code for producing the visualizations in this article can be found here.
Introduction
In the last blog post about CitiBike, the feasibility of bike-share ridership in NYC was evaluated. Now, it is time to turn attention to the actual performance of riding the bikes in context. Variables to be considered include the type of rider and bike, the distance covered, the location of the ride, and the time and day of the ride.
Snapshot of the Riders
As mentioned before, members make up three-quarters of CitiBike rides, and one-third of riders have used an electric bike. Both classic and electric bike rides average 14 to 15 minutes, but electric bike rides cover a little more distance on average (approximately 2.5 miles versus 2 miles for classic bikes, a 25% increase). Correspondingly, while classic bikes normally run at almost 8.75 mph, electric bikes typically glide at almost 10.5 mph (a 20% increase).
Although both member and casual riders tend to cover on average 2.25 miles on a typical ride, members tend to spend less time covering that distance, typically finishing a ride in less than 14 minutes as opposed to more than 16 minutes for casual riders (more than 15% in time savings). Correspondingly, members tend to ride faster at almost 9.50 mph as opposed to around 8.75 mph by casual riders (a 7% increase).
Members ride more frequently on weekdays and less on the weekends, while the opposite is true for casual riders.
The ridership of both members and casual riders is equally affected by seasonal trends and holidays.
Where Do You Come From, Where Do You Go?
CitiBike's system data files record only the start and end stations of each ride, each tagged with its geographic coordinates (latitude and longitude). On its own, that information offers little insight into the localized bike-share cultures of the boroughs and neighborhoods. Therefore, each station's coordinates were assigned to a borough and a neighborhood, and those labels were joined to the original ride data.
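The assignment step described above can be sketched as a point-in-polygon lookup. This is a minimal illustration, not the original source code: the function names and the rectangular toy polygons are hypothetical, and real boundaries would come from a neighborhood boundary file such as the Pediacities dataset listed under Resources.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(
    station: Point, neighborhoods: Dict[str, List[Point]]
) -> Optional[str]:
    """Return the first neighborhood whose polygon contains the station, else None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(station, polygon):
            return name
    return None


# Hypothetical toy rectangles, NOT real neighborhood boundaries.
TOY_NEIGHBORHOODS = {
    "Toy Hood A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "Toy Hood B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
```

Calling `assign_neighborhood((0.5, 0.5), TOY_NEIGHBORHOODS)` returns `"Toy Hood A"`, while a station outside every polygon yields `None` and would need manual review.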
The Boroughs
Almost three-quarters of CitiBike rides are in Manhattan. Among the remainder, the vast majority are in Brooklyn, with very small participation in Queens and the Bronx. There are no CitiBike stations in Staten Island.
For the most part, rides that begin in one borough end there as well. Almost 95% of rides that start in Manhattan end there, and those that leave Manhattan tend to go to Brooklyn. About 85% of rides that start in Brooklyn end there, while almost 12% travel to Manhattan and less than 3% make it to Queens. Both Queens and the Bronx retain about 69% of their rides. However, while the rides leaving Queens are equally split between Brooklyn and Manhattan, the rides leaving the Bronx go almost exclusively to Manhattan.
Nevertheless, in terms of net flow, the rides a borough loses to another borough are made up by the inflow of rides coming back from it, reflecting established commuter corridors.
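The retention shares and net flows discussed above can be computed from nothing more than each ride's start and end borough. The sketch below assumes the rides are available as `(start_borough, end_borough)` pairs; the function names and the toy trips are made up for illustration and are not the article's data.

```python
from collections import Counter, defaultdict


def destination_shares(trips):
    """For each start borough, return the share of its rides ending in each borough."""
    counts = defaultdict(Counter)
    for start, end in trips:
        counts[start][end] += 1
    shares = {}
    for start, ends in counts.items():
        total = sum(ends.values())
        shares[start] = {end: n / total for end, n in ends.items()}
    return shares


def net_flows(trips):
    """Inbound minus outbound rides per borough, ignoring same-borough rides."""
    net = Counter()
    for start, end in trips:
        if start != end:
            net[start] -= 1
            net[end] += 1
    return dict(net)


# Made-up toy trips, for illustration only.
toy_trips = (
    [("Manhattan", "Manhattan")] * 19
    + [("Manhattan", "Brooklyn")] * 1
    + [("Brooklyn", "Brooklyn")] * 17
    + [("Brooklyn", "Manhattan")] * 3
)

shares = destination_shares(toy_trips)
flows = net_flows(toy_trips)
```

A net flow near zero for every borough is what "whatever is lost is made up" would look like in this representation.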
Special Case: New Jersey
Hoboken (“the sixth borough”) and Jersey City (“the seventh city”) in Hudson County, New Jersey are serviced by the NYC CitiBike system. Still, it would seem logical that riders would have no interest, let alone the audacity or the legal travel lanes, to cross a state line marked by a river that is only crossable by toll bridges, tunnels, and an expensive ferry. Nevertheless, such trips do occur. From May 2021 to April 2022, a total of 458 such trips were braved. However, all of them were made by riders leaving New York for New Jersey.
Surprisingly, most of these rides were made from Monday through Thursday rather than on the weekend.
Throughout the week, rides occur mostly in the pre-rush hour times of the morning (5 – 6 am) and early afternoon (2 – 3 pm), and in a small window of time from midnight to early afternoon on the weekends.
The Neighborhoods
To examine CitiBike ridership patterns at an even more granular level, the neighborhoods with the most frequently recorded rides were studied.
Given the overall city ridership data illustrated before, it is expected that nine of the top ten neighborhoods for CitiBike usage are in Manhattan. They are all concentrated within the mostly commercial midtown neighborhoods of Chelsea, Midtown, and Hell’s Kitchen; the uptown residential neighborhoods of Upper West Side (UWS) and Upper East Side (UES); and the downtown neighborhoods of East Village (EV), West Village (WV), Lower East Side (LES) and SoHo. The outlier of this trend is the neighborhood of Williamsburg located in the northern part of Brooklyn.
In these key neighborhoods, most of the rides that start there end there as well. However, there are heavily traveled corridors between Chelsea and Hell’s Kitchen and between UES and Midtown within Manhattan. In Brooklyn, the busiest corridor runs between Williamsburg and Greenpoint, the neighborhood just north of it.
Manhattan
Whenever trips are made to a borough outside Manhattan, there is a corridor funneling traffic from LES, EV, and SoHo to Williamsburg in Brooklyn (through the bikeable Williamsburg Bridge), as well as another one funneling traffic from UES and Midtown to Long Island City (LIC) in Queens (through the bikeable Ed Koch Queensboro Bridge).
Although Chelsea, UES, and SoHo have strong CitiBike ridership, they only occupy three of the top five neighborhoods with the most active stations in terms of rides originating in Manhattan. Battery Park City (on the southwestern tip by the Hudson River waterfront) and Gramercy (directly north of EV) also have very active ridership in terms of volume.
Brooklyn
A strong bike-share culture seems to be entrenched in a corridor between adjoining neighborhoods in the northern half of Brooklyn: Greenpoint, Williamsburg, Bushwick, and Bedford-Stuyvesant (Bed-Stuy). Also, riders from Williamsburg frequently head across the Williamsburg Bridge westward to LES and then northward to EV. On the other hand, the Pulaski, Greenpoint Avenue, and Kosciuszko Bridges all allow bike traffic from Greenpoint and Williamsburg to cross the Newtown Creek into LIC in Queens. Finally, riders from Bushwick often go to the bordering neighborhood of Ridgewood that is within the borough limits of Queens.
Quite expectedly, all the top five most active stations for rides originating in Brooklyn are within Williamsburg.
Queens
Of the top five cross-neighborhood routes in Queens, four are within the borough. The contiguous neighborhoods of LIC, Astoria, and Ditmars Steinway on the western end of the borough have a vibrant bike-share culture. Likewise, riders from LIC travel to Greenpoint in Brooklyn as well as UES and Midtown in Manhattan, and riders from Ridgewood cross into Bushwick correspondingly.
Consistently, all the top five most active stations for rides originating in Queens are in LIC.
The Bronx
The top three cross-neighborhood routes originating from the Bronx go to Manhattan. Otherwise, there is frequent travel between the neighborhoods of Mott Haven and Longwood, both on the southern tip of the borough, which is separated from Manhattan by the Harlem River. Multiple bridges connecting the borough to Manhattan have pedestrian sidewalks that are probably used as bike lanes to facilitate travel from the Bronx neighborhoods of Mott Haven, Concourse, and Longwood to the Manhattan neighborhoods of Harlem, East Harlem, and UES.
While Concourse and Mott Haven have three of the five most active CitiBike stations for rides originating in the Bronx, Melrose (adjoining both neighborhoods) and Fordham (further north) also host the most frequently used ones in that borough.
A Bird’s Eye View
The visualization below shows where the most active CitiBike usage tends to be found. As elaborated before, Chelsea, Hell’s Kitchen, Midtown, and UES are strongly connected in Manhattan with high activity within UWS and EV, while a solid corridor between Williamsburg and Greenpoint exists in Brooklyn. The activity tapers off at their respective bordering neighborhoods for all the boroughs.
On weekdays, there are commuter rush hour periods, with a sharp spike at 8 am and a heavy period from 5 to 6 pm that gently tapers off later in the night.
During the weekday morning rush hour, Chelsea, UWS, UES, WV, and EV and the neighborhoods filling the voids between them – Hell’s Kitchen and Midtown – tend to pick up in Manhattan. In Brooklyn, Williamsburg is relatively active as compared to the nearby neighborhoods of Greenpoint, LIC, and Bed-Stuy.
During the weekday evening rush hour, Hell’s Kitchen and Midtown join Chelsea, UWS, UES, WV, and EV to produce the highest volume of rides in Manhattan, in addition to an uptick in activity in downtown neighborhoods such as Battery Park City, SoHo, and LES. Concurrently, Williamsburg is extremely active in Brooklyn. There is also activity in the neighborhoods nearby – Greenpoint, LIC, and Bed-Stuy. Ditmars-Steinway in Queens and Bushwick in Brooklyn, as well as the southern half of that borough, also pick up noticeably.
During the time outside rush hour on weekdays, the usual neighborhoods and corridors within Manhattan and Brooklyn with generally high activity remain consistently so.
On weekends, the ridership gently rises in the late morning, remains steady in the early afternoon, and then steadily tapers off in the evening.
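The weekday and weekend hourly profiles described above amount to counting ride starts per hour, split by day type. The sketch below is one way to do that; the function name, the timestamp format (`YYYY-MM-DD HH:MM:SS`), and the toy timestamps are assumptions for illustration rather than details taken from the original source code.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(start_times):
    """Count ride starts per hour of day, separately for weekdays and weekends."""
    weekday_counts = Counter()
    weekend_counts = Counter()
    for ts in start_times:
        # Assumed timestamp format; adjust if the data files differ.
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # datetime.weekday(): Monday = 0 ... Sunday = 6
        target = weekend_counts if dt.weekday() >= 5 else weekday_counts
        target[dt.hour] += 1
    return weekday_counts, weekend_counts


# Made-up toy timestamps: three on a Thursday, one on a Saturday.
toy_starts = [
    "2021-08-26 08:05:00",
    "2021-08-26 08:40:00",
    "2021-08-26 17:15:00",
    "2021-08-28 11:00:00",
]
weekday_counts, weekend_counts = hourly_profile(toy_starts)
```

Plotting each counter by hour would reproduce the kind of rush-hour spike and weekend plateau discussed in this section.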
Similarly, on weekends, the usual neighborhoods and corridors discussed previously with strong CitiBike activity maintain their pattern consistently.
The Final Model
To understand how the majority of CitiBike riders use the service, the distributions of ride durations, distances, and speeds were modeled after extensive cleaning of the data to eliminate the following aberrations:
Aberration | Definition
---|---
“Ghost Bikes” | No start station
Lost Bikes | No end station
“Neverminds”/“Joy Rides” | Same start and end station
The remaining rides also had to meet the following assumptions:
Assumption | Value | Note
---|---|---
Minimum Ride Duration | 5 min |
Maximum Ride Duration | 150 min | 2-½ hours
Minimum Ride Speed | 5 mph | greater than typical human walking speed
Maximum Ride Speed | 20 mph | maximum electric bike speed capped by manufacturers across the industry
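The cleaning rules in the two tables above can be expressed as a single validity check per ride. This is a sketch under stated assumptions: the function and constant names are hypothetical, the bounds are treated as inclusive, and the ride distance is taken as a given input because the article does not say how it was derived.

```python
from typing import Optional

MIN_DURATION_MIN = 5.0
MAX_DURATION_MIN = 150.0
MIN_SPEED_MPH = 5.0
MAX_SPEED_MPH = 20.0


def is_valid_ride(
    start_station: Optional[str],
    end_station: Optional[str],
    duration_min: float,
    distance_mi: float,
) -> bool:
    """Apply the aberration filters and the duration/speed assumptions."""
    if not start_station:  # "Ghost Bike": no start station
        return False
    if not end_station:  # Lost bike: no end station
        return False
    if start_station == end_station:  # "Nevermind" / "Joy Ride"
        return False
    # Bounds assumed inclusive; the article does not specify.
    if not (MIN_DURATION_MIN <= duration_min <= MAX_DURATION_MIN):
        return False
    speed_mph = distance_mi / (duration_min / 60.0)
    return MIN_SPEED_MPH <= speed_mph <= MAX_SPEED_MPH
```

For example, a 2.25-mile ride between two different stations in 15 minutes (9 mph) passes, while a 2-mile ride that takes an hour (2 mph) is rejected as slower than walking.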
Although the average ride lasts about 14.5 minutes, most rides are shorter than that.
Although the average ride distance is almost 2.25 miles, most rides cover less than that as well.
Finally, although the average ride speed is about 9.25 mph, most rides are slower than that as well.
The Record Holders
The extreme rides were examined for plausibility. Here are the honorable mentions:
The Longest Ride (Duration)
At 1 pm on Thursday, August 26, 2021, a casual rider on a classic bike rode for around 2-½ hours to travel 15 miles from SoHo in downtown Manhattan all the way to Inwood, the northernmost tip of the borough (the entire journey in that direction being uphill on top of that). That ride would have averaged about 6 mph, which is comparable to a typical human running speed. Such a ride would have cost $31.57 if a single ride were purchased and $32.37 if a $15 day pass was used. That person would have been better off taking the subway or just a cab if desiring to take the scenic route on this hot summer afternoon.
The Longest Ride (Distance)
At 4:45 pm on Monday, December 2021, a member rider on an electric bike spent around 1-½ hours to travel about 22-½ miles (a few miles short of a full marathon's 26.2 miles) from Inwood all the way to the neighborhood of Sunset Park in Brooklyn. This ride would have averaged 13-¼ mph, which is fast enough to be worth it. Such a ride would have cost $11.39, which is still cheaper than a cab ride but does not justify the time spent, which is comparable to that of riding the subway. As per membership policy, there is a $3.00 cap for any electric bike ride of up to 45 minutes that starts or ends outside Manhattan. Unfortunately, this ride ran well past that limit and so could not take advantage of the discount, unless the rider was willing to split the ride in two on this cold winter afternoon, which hopefully was not icy.
The Fastest Ride (Speed)
Just past the stroke of midnight to start Friday, May 21, 2021, a member on a classic bike darted around 2-½ miles from UES to LIC in about 7-½ minutes, which would have amounted to 20 mph. The need for such a trip at that time of night may never be discovered. But it proved to be a bargain in terms of both cost (free) and time (not having to wait for a subway train). Nevertheless, if the rider took the Queensboro Bridge to cross the East River, it would have been a daunting task, given the spiral climb required to reach the bridge's high vertical clearance. In fact, that approach has been rated one of the most dangerous for cyclists in the city, on a bridge that is itself one of the most dangerous for non-automobile traffic to cross.
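The average speeds quoted for the longest-duration and fastest rides follow directly from distance divided by time. The short check below uses the rounded figures from the text; the function name is hypothetical.

```python
def average_speed_mph(distance_mi: float, duration_min: float) -> float:
    """Average speed in miles per hour from miles and minutes."""
    return distance_mi / (duration_min / 60.0)


# Longest ride by duration: about 15 miles in about 2-1/2 hours (150 minutes).
longest_duration_speed = average_speed_mph(15.0, 150.0)  # 6.0 mph

# Fastest ride: about 2-1/2 miles in about 7-1/2 minutes.
fastest_ride_speed = average_speed_mph(2.5, 7.5)  # 20.0 mph
```

Both values sit inside the 5 to 20 mph window used when cleaning the data, which is why these rides survived as record holders.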
The Methodology
The dependent variable was the ride duration (in minutes), and the independent variables were as follows:
No. | Variable | Encoding
---|---|---
1 | Ride Distance | miles
2 | Member Type | Casual = 0, Member = 1
3 | Bike Type | Classic = 0, Electric = 1
4 | Month of the Year | 1 - 12
5 | Week of the Year | 1 - 52
6 | Day of the Week | Weekday = 0, Weekend = 1
7 | Hour of the Day | 0 - 23
8 | Neighborhood | Same = 0, Different = 1
9 | Borough | Same = 0, Different = 1
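The encoding in the table above can be sketched as a function that turns one ride into a numeric feature vector. Several details here are assumptions rather than facts from the original code: the function and argument names, the timestamp format, the string labels `"member"` and `"electric_bike"`, and the use of the ISO week number (which can occasionally be 53 rather than staying within 1 - 52).

```python
from datetime import datetime
from typing import List


def encode_ride(
    distance_mi: float,
    member_casual: str,
    rideable_type: str,
    started_at: str,
    start_hood: str,
    end_hood: str,
    start_boro: str,
    end_boro: str,
) -> List[float]:
    """Encode one ride as the nine independent variables, in table order."""
    dt = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")  # assumed format
    return [
        distance_mi,                                       # 1. ride distance (miles)
        1.0 if member_casual == "member" else 0.0,         # 2. member = 1, casual = 0
        1.0 if rideable_type == "electric_bike" else 0.0,  # 3. electric = 1, classic = 0
        float(dt.month),                                   # 4. month of the year
        float(dt.isocalendar()[1]),                        # 5. ISO week of the year (assumption)
        1.0 if dt.weekday() >= 5 else 0.0,                 # 6. weekend = 1, weekday = 0
        float(dt.hour),                                    # 7. hour of the day
        1.0 if start_hood != end_hood else 0.0,            # 8. different neighborhood = 1
        1.0 if start_boro != end_boro else 0.0,            # 9. different borough = 1
    ]


# Illustrative ride loosely based on the fastest ride described earlier.
example = encode_ride(
    2.5, "member", "classic_bike", "2021-05-21 00:05:00",
    "Upper East Side", "Long Island City", "Manhattan", "Queens",
)
```

The resulting vector for this example is `[2.5, 1.0, 0.0, 5.0, 20.0, 0.0, 0.0, 1.0, 1.0]`.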
The Results
Ridge was the most accurate and efficient regression, producing a coefficient of determination (R²) of 0.77. Logistic, Huber, and Random Forest regressions took too long to run, and Huber and Lasso generated results similar to those of Ridge. The Ridge regression coefficients are as follows:
No. | Coefficient | Value
---|---|---
0 | β0 | 2.32
1 | βdistance_mi | 5.31
2 | βmember_casual | -1.38
3 | βrideable_type | -2.27
4 | βmonth | 0.04
5 | βweek_of_year | -0.00
6 | βday_of_week | 0.57
7 | βhour_of_day | 0.06
8 | βdiff_hood | 0.89
9 | βdiff_boro | 1.86
Predictably, distance is by far the greatest factor governing ride duration. In descending order of coefficient magnitude, the type of bike (classic or electric), whether the ride crosses into a different borough (since the bridges connecting them tend to be bottlenecks), and the type of rider (member or casual) are significant factors as well. By contrast, crossing into a different neighborhood and riding on a weekend rather than a weekday have only a weak effect on ride duration, and the month, week, and hour of the day have inconsequential impacts. Such a machine learning model could improve the estimates provided by the NYC CitiBike app to better guide a rider’s decision as to whether riding a certain distance would be financially feasible, let alone physically practical.
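To make the coefficients concrete, the sketch below plugs them into the linear form duration = β0 + Σ βi·xi and also shows how a coefficient of determination is computed. It assumes the coefficients apply to raw, unscaled features encoded as in the methodology table, which the article does not confirm; the function names and the example ride are hypothetical.

```python
INTERCEPT = 2.32
COEFFICIENTS = {
    "distance_mi": 5.31,
    "member_casual": -1.38,   # member = 1, casual = 0
    "rideable_type": -2.27,   # electric = 1, classic = 0
    "month": 0.04,
    "week_of_year": -0.00,
    "day_of_week": 0.57,      # weekend = 1, weekday = 0
    "hour_of_day": 0.06,
    "diff_hood": 0.89,
    "diff_boro": 1.86,
}


def predict_duration_min(features: dict) -> float:
    """Linear prediction of ride duration in minutes from encoded features."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )


def r_squared(actual, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical ride: a member on a classic bike riding 2 miles at 8 am on a
# June weekday (week 24), staying within one neighborhood and borough.
example_ride = {
    "distance_mi": 2.0,
    "member_casual": 1,
    "rideable_type": 0,
    "month": 6,
    "week_of_year": 24,
    "day_of_week": 0,
    "hour_of_day": 8,
    "diff_hood": 0,
    "diff_boro": 0,
}
estimate = predict_duration_min(example_ride)  # about 12.28 minutes
```

Under these assumptions, switching the same ride to an electric bike would lower the estimate by 2.27 minutes, and crossing into another borough would raise it by 1.86 minutes.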
Resources for Data
CitiBike System Data
https://ride.citibikenyc.com/system-data
PEDIACITIES-NYC-NEIGHBORHOODS (data.βetaNYC)
https://data.beta.nyc/dataset/pediacities-nyc-neighborhoods/resource/35dd04fb-81b3-479b-a074-a27a37888ce7