Citi Bike Visual Exploration

Wing Yan Sang
Posted on Oct 12, 2017



Citi Bike is New York City's bike share program. It launched in May 2013 and has expanded rapidly ever since. Operated by Motivate, the nearly ubiquitous blue bikes have become a highly visible part of daily traffic in many parts of the city. Citi Bikes are now available at over 600 stations spread throughout most of Manhattan and parts of Brooklyn and Queens. As of August 2016, Citi Bike had even expanded across the Hudson River to Jersey City.

One of the major challenges of operating such an extensive network of bike stations, known as the "rebalancing problem", is making sure that bikes are available when users want to check them out and that docks are available when users arrive at their destinations. One could easily imagine that the rebalancing problem is exacerbated in the summer months, at the height of tourist season, when many out-of-town visitors use the bike share system to explore the city alongside the typical Citi Bike commuters.

Given the complexity of the rebalancing problem, my goal for this Shiny app project is modest. Citi Bike uses a simple pricing system: riders can buy an annual membership for $163 per year ("subscribers"), or pay $12 for one-day access or $24 for three-day access ("non-subscribers"). Using data visualization, I attempt to gain some insight into the ridership behavior of the latter group, who tend to be visitors to the city and who contribute to the increased volume of Citi Bike trips, especially on summer weekends. In particular, I try to understand which areas non-subscribers concentrate in, how this varies over the course of the day, and how bikes flow in and out of particular stations in select areas. I hope the analysis will be of some help to those interested in improving the experience for all Citi Bike users.



The datasets used in the analysis are the January 2017 and July 2017 Citi Bike Trip Histories files, downloadable from the System Data section of Citi Bike’s website. From these datasets, I excluded trips longer than 60 minutes, as well as observations with a missing value in the “usertype” field. I also excluded observations that appear to be related to Citi Bike’s transfer of bikes to and from its operations facilities in Brooklyn and Manhattan.
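The three exclusion rules above can be expressed as a simple row filter. The following is a minimal Python sketch, not the code behind the original Shiny app (which was written in R). It assumes the trip-history CSV uses lowercase column names such as `tripduration` (in seconds), `start station id`, `end station id`, and `usertype`; only `usertype` is named in the post. The set `FACILITY_STATION_IDS` is hypothetical: the IDs of the operations-facility stations are not given here and would have to be identified in the data.

```python
import csv
import io

# Hypothetical placeholder IDs for Citi Bike's operations-facility stations;
# the real IDs are not given in the post and must be found in the data.
FACILITY_STATION_IDS = {"9999"}

MAX_DURATION_SECONDS = 60 * 60  # drop trips longer than 60 minutes


def keep_trip(row):
    """Return True if a trip-history row passes all three exclusion rules."""
    if int(row["tripduration"]) > MAX_DURATION_SECONDS:
        return False  # trip is over 60 minutes
    if not row["usertype"].strip():
        return False  # usertype is missing
    if (row["start station id"] in FACILITY_STATION_IDS
            or row["end station id"] in FACILITY_STATION_IDS):
        return False  # looks like a transfer to/from an operations facility
    return True


def load_clean_trips(f):
    """Read a trip-history CSV from a file object and return the rows kept."""
    return [row for row in csv.DictReader(f) if keep_trip(row)]


sample = io.StringIO(
    "tripduration,starttime,start station id,end station id,usertype\n"
    "600,2017-07-01 09:15:00,72,79,Customer\n"
    "4000,2017-07-01 10:00:00,72,79,Subscriber\n"
    "300,2017-07-01 11:00:00,72,79,\n"
    "900,2017-07-01 12:00:00,9999,79,Subscriber\n"
)
trips = load_clean_trips(sample)
print(len(trips))  # 1
```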


Seasonal Comparison

To see how non-subscriber ridership volume changes from winter to summer relative to that of subscribers, I compared non-subscribers' share of total average daily trips in January and July 2017. As the following graph shows, their share more than doubles on both weekdays and weekends. It is also interesting to note that, with the exception of the first week of January, non-subscribers’ share of weekend trips is consistently higher than their share of weekday trips in both January and July. In July, however, the weekend spike in volume is more dramatic.
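A minimal Python sketch of the share calculation, assuming each cleaned trip is reduced to a `(date, usertype)` pair and that non-subscribers carry the `usertype` value `"Customer"` (an assumption about the 2017 files). Because the non-subscriber and overall daily averages for a given day type are divided by the same number of days, the ratio of the two averages reduces to a ratio of trip counts.

```python
from collections import defaultdict
from datetime import date


def day_type(d):
    """Classify a date as 'weekday' or 'weekend' (Saturday=5, Sunday=6)."""
    return "weekend" if d.weekday() >= 5 else "weekday"


def non_subscriber_share(trips):
    """Non-subscriber share of average daily trips, by day type.

    `trips` is an iterable of (date, usertype) pairs. Both averages divide
    by the same number of days, so the share equals the ratio of counts.
    """
    customers = defaultdict(int)
    totals = defaultdict(int)
    for d, usertype in trips:
        kind = day_type(d)
        totals[kind] += 1
        if usertype == "Customer":  # assumed label for non-subscribers
            customers[kind] += 1
    return {kind: customers[kind] / totals[kind] for kind in totals}


# July 1, 2017 was a Saturday; July 3, 2017 was a Monday.
sample = [
    (date(2017, 7, 1), "Customer"),
    (date(2017, 7, 1), "Subscriber"),
    (date(2017, 7, 3), "Subscriber"),
    (date(2017, 7, 3), "Subscriber"),
    (date(2017, 7, 3), "Subscriber"),
    (date(2017, 7, 3), "Customer"),
]
print(non_subscriber_share(sample))  # {'weekend': 0.5, 'weekday': 0.25}
```

Running the same function on the January and July data separately gives the two sets of shares to compare.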



To understand in more detail where this overall increase in summer ridership occurs, I added markers to the following Leaflet map showing the top twenty Citi Bike stations by percentage increase, from January to July, in the number of trips that started or ended at each station. Not surprisingly, almost all the markers lie along the waterfront (Hudson River Greenway, Brooklyn Bridge Park, and Red Hook) or along the parks (Central Park and Prospect Park West). Even if the list is expanded to the top fifty Citi Bike stations, the vast majority are in these same areas.
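The station ranking can be sketched in Python as follows. The sketch assumes each trip is reduced to a `(start_id, end_id)` pair, counts a round trip only once for its station, and skips stations with no January trips, whose percentage increase is undefined. These are illustrative choices, not necessarily those made in the original app.

```python
from collections import Counter


def station_activity(trips):
    """Count trips that started or ended at each station.

    `trips` is an iterable of (start_id, end_id) pairs; a round trip that
    starts and ends at the same station is counted once for that station.
    """
    counts = Counter()
    for start_id, end_id in trips:
        for station in {start_id, end_id}:
            counts[station] += 1
    return counts


def top_percentage_increase(jan_trips, jul_trips, n=20):
    """Return the n stations with the largest % increase from January to July.

    Stations with no January activity are skipped because their percentage
    increase is undefined.
    """
    jan = station_activity(jan_trips)
    jul = station_activity(jul_trips)
    increases = {
        station: 100.0 * (jul[station] - jan[station]) / jan[station]
        for station in jan
    }
    return sorted(increases.items(), key=lambda kv: kv[1], reverse=True)[:n]


jan_sample = [("A", "B"), ("A", "C"), ("B", "B")]
jul_sample = [("A", "B")] * 3 + [("C", "C")] * 4
print(top_percentage_increase(jan_sample, jul_sample, n=1))  # [('C', 300.0)]
```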



Citi Bike Hotspots by Time of Day during the Weekend in July

Given that weekends are especially popular with non-subscribers, a neighborhood breakdown of where these Citi Bike users start their trips may reveal some interesting trends. For this analysis, I overlaid neighborhood boundaries, defined by a GeoJSON file managed by Ontodia, on top of the Leaflet map shown previously. I then tabulated the number of trips started in each neighborhood for each time period of the day and shaded each neighborhood according to its number of trips. From 9 a.m. to 9 p.m., Central Park is the most popular neighborhood among non-subscribers, followed by the Upper West Side and Midtown. This contrasts markedly with subscribers, who consistently concentrate in Chelsea and the East Village during the same hours. For illustration, the following two images compare the neighborhood concentrations of subscribers and non-subscribers during the 3 p.m. to 6 p.m. period. The darkest region in the left image is Chelsea and the darkest region in the right image is Central Park.
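Assigning trips to neighborhoods is a point-in-polygon problem. The Python sketch below uses the standard ray-casting test against each neighborhood's outer ring and buckets start times into three-hour periods (9 a.m. to 12 p.m., 12 p.m. to 3 p.m., and so on). It assumes the GeoJSON features are `Polygon` or `MultiPolygon` geometries with `[longitude, latitude]` coordinates and a `neighborhood` property, and that start times look like `2017-07-01 15:30:00`; holes in polygons are ignored for simplicity.

```python
from collections import Counter
from datetime import datetime


def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a ring of [lon, lat] pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def neighborhood_of(lon, lat, geojson):
    """Name of the first neighborhood whose outer ring contains the point."""
    for feature in geojson["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        else:  # assumed to be MultiPolygon
            polygons = geom["coordinates"]
        for polygon in polygons:
            if point_in_ring(lon, lat, polygon[0]):  # outer ring only
                return feature["properties"]["neighborhood"]
    return None


def period_start(ts):
    """Map a 'YYYY-MM-DD HH:MM:SS' timestamp to its 3-hour period's start hour."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour // 3 * 3


def trips_by_neighborhood(trips, geojson):
    """Count trip starts per (period start hour, neighborhood).

    `trips` is an iterable of (starttime, start_lon, start_lat) tuples.
    """
    counts = Counter()
    for starttime, lon, lat in trips:
        hood = neighborhood_of(lon, lat, geojson)
        if hood is not None:
            counts[(period_start(starttime), hood)] += 1
    return counts


square = {"type": "FeatureCollection", "features": [{
    "type": "Feature",
    "properties": {"neighborhood": "Square"},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]},
}]}
sample = [("2017-07-01 15:30:00", 5, 5), ("2017-07-01 15:45:00", 20, 5),
          ("2017-07-01 09:05:00", 5, 5)]
counts = trips_by_neighborhood(sample, square)
```

The resulting counts per period are what a choropleth layer would shade; a spatial library would be a more robust choice for real boundary files.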



Central Park Citi Bike Stations Activity

The results of the previous analysis motivated me to examine the flow of Citi Bike usage at the stations around Central Park on weekends. For each time period of the day, I calculated the net inflow or outflow of Citi Bike trips at each station, represented by a circle on the map. A green circle indicates a net inflow (more Citi Bike trips end at the station than start there), whereas a red circle indicates a net outflow (more trips start at the station than end there). The size of each circle corresponds to the relative magnitude of the net inflow or outflow. The purpose of the analysis was to get a sense of where bike availability may or may not be an issue, since stations with a net inflow accumulate bikes while stations with a net outflow are drained of them. The analysis suggests that there may be a surplus of available bikes on weekend mornings, whereas beginning at 3 p.m., bikes may be in short supply at many stations on both the eastern and western edges of the park. For illustration, please refer to the following two images. The first represents the 9 a.m. to 12 p.m. period and the second represents the 3 p.m. to 6 p.m. period.
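The net-flow calculation and the circle styling can be sketched in Python as below. Each trip is assumed to be a `(start_id, start_period, end_id, end_period)` tuple, with periods given as the start hour of a three-hour bin, so a trip that straddles two periods counts as an outflow in one and an inflow in the other. Scaling the radius linearly by the largest absolute net flow is an assumption; scaling the circle's area instead would be an equally reasonable choice.

```python
from collections import Counter


def net_flow(trips, period):
    """Net inflow (trip ends minus trip starts) per station for one period.

    `trips` is an iterable of (start_id, start_period, end_id, end_period)
    tuples, where each period is the start hour of a 3-hour bin.
    """
    flow = Counter()
    for start_id, start_period, end_id, end_period in trips:
        if start_period == period:
            flow[start_id] -= 1  # a bike left this station
        if end_period == period:
            flow[end_id] += 1  # a bike arrived at this station
    return flow


def marker_style(flow, max_radius=20.0):
    """Green for net inflow, red for net outflow; radius scales with magnitude."""
    largest = max((abs(v) for v in flow.values()), default=0) or 1
    styles = {}
    for station, value in flow.items():
        if value == 0:
            continue  # balanced stations get no marker
        styles[station] = {
            "color": "green" if value > 0 else "red",
            "radius": max_radius * abs(value) / largest,
        }
    return styles


sample = [("A", 9, "B", 9), ("A", 9, "C", 9), ("D", 9, "A", 12)]
styles = marker_style(net_flow(sample, period=9))
```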



Inter-borough Citi Bike Trips

Central Park is not the only area where non-subscriber and subscriber ridership behavior differ markedly. On weekdays in July, non-subscribers account for only about 13 percent of inter-borough Citi Bike trips (trips that begin in one borough and end in another). On weekends, this figure jumps to about 32 percent. This is most likely driven by the drop in the number of Citi Bike users commuting to work on weekends. However, it is also driven by the fact that, among non-subscribers, the average number of inter-borough trips per weekend day is more than double the weekday average.
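Both figures in this paragraph, the non-subscriber share of inter-borough trips and the average number of non-subscriber inter-borough trips per day, can be computed in one pass. This Python sketch assumes each trip has already been tagged with its start and end borough (for example, by a point-in-polygon lookup against borough boundaries) and that non-subscribers are labeled `"Customer"`.

```python
from collections import defaultdict
from datetime import date


def inter_borough_summary(trips):
    """Summarise inter-borough trips by day type ('weekday' or 'weekend').

    `trips` is an iterable of (date, usertype, start_borough, end_borough).
    Returns, per day type, the non-subscriber share of inter-borough trips
    and the average daily number of non-subscriber inter-borough trips.
    """
    days = defaultdict(set)       # distinct dates seen per day type
    total = defaultdict(int)      # inter-borough trips, all users
    customers = defaultdict(int)  # inter-borough trips, non-subscribers
    for d, usertype, start_boro, end_boro in trips:
        kind = "weekend" if d.weekday() >= 5 else "weekday"
        days[kind].add(d)
        if start_boro == end_boro:
            continue  # trip stays within one borough
        total[kind] += 1
        if usertype == "Customer":
            customers[kind] += 1
    return {
        kind: {
            "share": customers[kind] / total[kind],
            "avg_daily_customers": customers[kind] / len(days[kind]),
        }
        for kind in total
    }


sample = [
    (date(2017, 7, 1), "Customer", "Manhattan", "Brooklyn"),    # Saturday
    (date(2017, 7, 1), "Subscriber", "Brooklyn", "Manhattan"),
    (date(2017, 7, 2), "Customer", "Queens", "Manhattan"),      # Sunday
    (date(2017, 7, 3), "Subscriber", "Manhattan", "Brooklyn"),  # Monday
    (date(2017, 7, 3), "Customer", "Manhattan", "Manhattan"),
]
summary = inter_borough_summary(sample)
```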

Using the same approach as for the Central Park stations, I examined the flow of trips at the top ten inter-borough Citi Bike stations used by non-subscribers on weekends, and represented the results with the same types of markers as in the Central Park analysis. The analysis shows a net inflow of Citi Bike trips at these stations in the morning and early afternoon, whereas most of them experience a net outflow in the evening starting at 6 p.m. Again, for illustration, I have included the following two images to contrast the flow of Citi Bike trips at these inter-borough stations. The first image is for the 9 a.m. to 12 p.m. period and the second is for the 6 p.m. to 9 p.m. period.
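Selecting the stations for this analysis amounts to a filtered count. In the Python sketch below, each trip is assumed to be a `(is_weekend, usertype, start_id, start_borough, end_id, end_borough)` tuple, and a station is credited once for every weekend inter-borough trip by a non-subscriber that starts or ends there; the ranking criterion in the original app may have differed.

```python
from collections import Counter


def top_inter_borough_stations(trips, n=10):
    """Rank stations by weekend inter-borough trips made by non-subscribers.

    `trips` is an iterable of (is_weekend, usertype, start_id, start_borough,
    end_id, end_borough) tuples.
    """
    counts = Counter()
    for is_weekend, usertype, start_id, start_boro, end_id, end_boro in trips:
        if is_weekend and usertype == "Customer" and start_boro != end_boro:
            counts[start_id] += 1
            counts[end_id] += 1
    # most_common breaks ties in order of first appearance
    return [station for station, _ in counts.most_common(n)]


sample = [
    (True, "Customer", "A", "Manhattan", "B", "Brooklyn"),
    (True, "Customer", "A", "Manhattan", "C", "Brooklyn"),
    (True, "Subscriber", "A", "Manhattan", "B", "Brooklyn"),
    (False, "Customer", "D", "Manhattan", "B", "Brooklyn"),
    (True, "Customer", "E", "Brooklyn", "F", "Brooklyn"),
]
print(top_inter_borough_stations(sample, n=1))  # ['A']
```

The resulting station IDs would then go through the same net-flow calculation described for the Central Park stations.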




General Observations

With the aid of the Shiny package, data visualization tools and techniques corroborated certain intuitions and hunches about Citi Bike ridership behavior. A simple map of the Citi Bike stations with the greatest increase in activity since January clearly showed that the summer increase among all Citi Bike riders was concentrated along the waterfront and around the parks. Furthermore, choropleth maps made it easier to compare non-subscriber and subscriber activity across neighborhoods on weekends: non-subscribers clearly prefer Central Park throughout most of the day, whereas subscribers concentrate further south in Chelsea and the East Village. Station-level analysis in Central Park also revealed a net inflow of Citi Bike trips at most stations in the morning, whereas from mid-afternoon onward most of the stations in this area experienced a net outflow. Finally, although non-subscribers are not a significant factor in inter-borough trips on weekdays, they account for almost a third of all inter-borough trips on weekends. The net inflow and outflow of trips at the most popular inter-borough stations follow a pattern similar to that at Central Park, with a net inflow in the morning and a net outflow towards the late afternoon and early evening.


Concluding Remarks

My experience on this project demonstrated to me the power of interactive data visualization to corroborate or disconfirm certain beliefs, as well as to serve as a catalyst for other avenues of research and analysis. The sensitivity of Citi Bike non-subscribers to season, day of the week, time of day, and location made them an excellent subject for a data visualization tool such as Shiny. As Citi Bike continues to expand to other areas of the city, the rebalancing problem will certainly remain a challenge to providing the best experience for all Citi Bike users. However, I am optimistic that problems that once seemed intractable will become less so as data science continues to advance and new tools are developed to address them.

About Author

Wing Yan Sang

Graduate of NYC Data Science Academy (December 2017)