Using NYC Citi Bike Data to Help Bike Enthusiasts Find their Mates

Claire Keser
Posted on Apr 26, 2017

There is no shortage of analyses of the NYC bike share system. Most of them aim to predict the demand for bikes and balance bike stock, i.e., forecasting when to remove bikes from fully occupied stations and when to refill stations before the supply runs dry.

 

This is why I decided to take a different approach and use the Citi Bike data to help riders find each other; a kind of Tinder for bike riders...

If you want to skip the analysis, you can check out the app here.

 

The Challenge

As a bike enthusiast, I wish I had a platform where I could spot like-minded people who actually ride a bike (and don't just pretend to).

The goal of this project was to turn the Citi Bike data into an app where a rider could identify the best spots and times to meet other Citi Bike users and cyclists in general.

 

 

 

 

The Data

As of March 31, 2016, the total number of annual subscribers to Citi Bike was 163,865, and its riders took an average of 38,491 rides per day in 2016 (source: Wikipedia).

That adds up to more than 14 million rides in 2016!
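As a quick check of that figure: 2016 was a leap year, so the daily average is multiplied by 366 days. The variable names below are only illustrative.

```python
# Daily average quoted above (from Wikipedia) times the days in 2016.
rides_per_day = 38_491
days_in_2016 = 366  # 2016 was a leap year

total_rides = rides_per_day * days_in_2016
print(f"{total_rides:,}")  # 14,087,706
```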

I used the Citi Bike data for the month of May 2016 (approximately 1 million observations). Citi Bike provides the following variables:

  • Trip duration (in seconds).
  • Timestamps for when the trip started and ended.
  • Station locations for where the trip started and ended (both the names and coordinates).
  • Rider’s gender and birth year - this is the only demographic data we have.
  • Rider’s plan (annual subscriber, 7-day pass user or 1-day pass user).
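To make the variables above concrete, here is a minimal Python sketch of how one trip record could be parsed into a typed structure. It is not the original analysis code. The column names, the gender coding (0/1/2) and the two sample rows are assumptions made up for illustration, so check them against the header of the file you actually download.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Two made-up rows in an ASSUMED column layout (not real Citi Bike data).
SAMPLE_CSV = """\
tripduration,starttime,start station name,end station name,usertype,birth year,gender
538,2016-05-02 08:05:12,Station A,Station B,Subscriber,1988,1
1260,2016-05-07 13:30:00,Station C,Station D,Customer,,0
"""

# Assumed coding of the gender column.
GENDER_CODES = {"0": "unknown", "1": "male", "2": "female"}


@dataclass
class Trip:
    duration_min: float
    start: datetime
    start_station: str
    end_station: str
    user_type: str
    age: Optional[int]  # None when the birth year is missing
    gender: str


def parse_trips(text, year=2016):
    """Parse CSV text into Trip objects, deriving age from the birth year."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        birth = row["birth year"].strip()
        trips.append(Trip(
            duration_min=int(row["tripduration"]) / 60,  # seconds -> minutes
            start=datetime.strptime(row["starttime"], "%Y-%m-%d %H:%M:%S"),
            start_station=row["start station name"],
            end_station=row["end station name"],
            user_type=row["usertype"],
            age=year - int(birth) if birth else None,
            gender=GENDER_CODES.get(row["gender"], "unknown"),
        ))
    return trips


trips = parse_trips(SAMPLE_CSV)
```

Short-term pass holders often have no birth year on record, which is why the age is optional here.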

 

Riders per Age Group

Before moving ahead with building the app, I was interested in exploring the data and identifying patterns in relation to gender, age and day of the week. Answering the following questions helped identify which variables influence how riders use the Citi Bike system and form better features for the app:

  • Who are the primary users of Citi Bike?
  • What is the median age per Citi Bike station?
  • How do the days of the week impact biking behaviours?

As I expected, based on my daily rides from Queens to Manhattan, 75% of the Citi Bike trips are taken by males. The primary users are 25 to 34 years old.
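Shares like these come from simple counting. The sketch below, in Python with only the standard library, shows one way to bucket riders into 10-year age groups and compute percentage shares. The band edges and the eight riders in the sample are invented for illustration and do not reproduce the figures above.

```python
from collections import Counter


def age_group(age):
    """Map an age to a 10-year band such as '25-34' (band edges are an assumption)."""
    if age < 25:
        return "under 25"
    if age >= 65:
        return "65+"
    low = 25 + 10 * ((age - 25) // 10)
    return f"{low}-{low + 9}"


def share(counter):
    """Convert raw counts into percentages rounded to one decimal."""
    total = sum(counter.values())
    return {key: round(100 * n / total, 1) for key, n in counter.items()}


# Made-up (gender, age) pairs, one per trip.
riders = [
    ("male", 29), ("male", 31), ("female", 27), ("male", 44),
    ("female", 52), ("female", 33), ("male", 26), ("male", 38),
]

gender_share = share(Counter(gender for gender, _ in riders))
age_share = share(Counter(age_group(age) for _, age in riders))
```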


Riders per Age Group

 

Distribution of Riders per Hour of the Day (weekdays)

However, while we might expect these young professionals to be the primary users on weekdays between 8am-9am and 5pm-6pm (when they commute to and from work), and the older users to take over the Citi Bike system midday, this hypothesis proved to be wrong. Tourists also seemed to have little impact on usage, as short-term customers represent only 10% of the dataset.


Distribution of Riders per Hour of the Day (weekdays only)

 

Median Age per Departure Station

Looking at the median age of the riders for each departure station, we see the youngest riders in the East Village, while older riders start their commute from Lower Manhattan (as shown in the map below). The age trends disappear when mapping the arrival stations, above all in the Financial District (in Lower Manhattan), which is populated by the young wolves of Wall Street (map not shown).

The map also confirms that the Citi Bike riders are mostly between 30 and 45 years old.
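The per-station figure behind a map like this is a grouped median. A minimal Python sketch, assuming each trip has already been reduced to a (departure station, rider age) pair; the station names and ages are placeholders, not real data:

```python
from collections import defaultdict
from statistics import median


def median_age_by_station(trips):
    """Return {station: median rider age}, skipping trips with no known age."""
    ages = defaultdict(list)
    for station, age in trips:
        if age is not None:
            ages[station].append(age)
    return {station: median(values) for station, values in ages.items()}


# Made-up (departure station, age) pairs.
sample = [
    ("Station A", 24), ("Station A", 28), ("Station A", 31),
    ("Station B", 41), ("Station B", None), ("Station B", 47),
]

median_ages = median_age_by_station(sample)
```

Swapping the departure station for the arrival station in each pair gives the arrival version of the same table.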


Median Age per Departure Station

 

 

Rides by Hour of the Day

Finally, when analyzing how the days of the week impacted biking behaviours, I was surprised to see that Citi Bike users didn’t ride for a longer period of time during the weekend: the median trip duration is 19 minutes for each day of the week.
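A comparison like this boils down to grouping trip durations by day of the week and taking the median of each group. Here is a small Python sketch under the assumption that each trip is a (start time, duration in seconds) pair; the three trips are invented:

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def median_minutes_by_weekday(trips):
    """Return {day name: median trip duration in minutes}."""
    by_day = defaultdict(list)
    for start, seconds in trips:
        by_day[DAY_NAMES[start.weekday()]].append(seconds / 60)
    return {day: median(minutes) for day, minutes in by_day.items()}


# Made-up trips: two on Monday, May 2, 2016 and one on Saturday, May 7, 2016.
sample = [
    (datetime(2016, 5, 2, 8, 15), 600),
    (datetime(2016, 5, 2, 18, 0), 900),
    (datetime(2016, 5, 7, 13, 0), 1500),
]

medians = median_minutes_by_weekday(sample)
```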


Trip Duration per Gender and Age Group

 

However, as illustrated below, there is a difference in peak hours. While the peak hours during the weekdays are around 8am-9am and 5pm-7pm when riders commute to and from work, on the weekends, riders hop on a bike later during the day, with most of the rides happening midday.

 


Number of Riders per Hour of the Day (weekdays vs. weekends)
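The weekday versus weekend split can be sketched as a count of trip start times per hour, kept in two separate tallies. This is a Python illustration with made-up timestamps, not the code that produced the chart:

```python
from collections import Counter
from datetime import datetime


def rides_per_hour(start_times):
    """Count rides per hour of the day, separately for weekdays and weekends."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for t in start_times:
        kind = "weekend" if t.weekday() >= 5 else "weekday"  # 5 = Sat, 6 = Sun
        counts[kind][t.hour] += 1
    return counts


def peak_hour(counter):
    """Return the hour with the most rides."""
    return counter.most_common(1)[0][0]


# Made-up start times from the first week of May 2016.
sample = [
    datetime(2016, 5, 2, 8, 5), datetime(2016, 5, 2, 8, 40),
    datetime(2016, 5, 2, 17, 30), datetime(2016, 5, 3, 8, 10),
    datetime(2016, 5, 7, 13, 0), datetime(2016, 5, 8, 13, 20),
    datetime(2016, 5, 8, 10, 0),
]

counts = rides_per_hour(sample)
weekday_peak = peak_hour(counts["weekday"])
weekend_peak = peak_hour(counts["weekend"])
```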

 

 

The App

Where does this analysis leave us?

  • The day of the week and the hour of the day are meaningful variables that we need to take into account in the app.
  • Most of the users are between 30 and 45 years old. This means that the age groups 25-34 and 35-44 won’t be granular enough when app users need to filter their search. We will let them filter by exact age instead.

 

The Citi Tinder app in a few words and screenshots.

There are 3 steps to the app:

  • The "when": find the times and days when your ideal mate is most likely to ride.

Screenshot: step 1, the "when"

 

  • The "where": once you know the best times and days, filter the locations by day of the week, time of the day, gender and age. You can also choose whether you want to spot where they arrive or where they depart.

Screenshot: step 2, the "where"

 

  • The "how": the final step is to grab a Citi Bike and get to those hot spots. The app calls the Google Maps API to show the directions with a little extra: you can compare the time Google estimates for the trip between two stations with the average time it actually took Citi Bike users. I believe the latter is more accurate because it factors in the time of the day and day of the week (which the app lets you filter).

Screenshot: step 3, the "how"
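The "average time it took Citi Bike users" in the last step can be sketched as a filtered mean over past trips between the two stations. The Python below is an illustration only: the tuple layout, the function name and the sample trips are assumptions, and the Google Maps estimate it would be compared against is not fetched here.

```python
from datetime import datetime
from statistics import mean


def average_trip_minutes(trips, start_station, end_station,
                         weekday=None, hour=None):
    """Mean duration in minutes of past trips between two stations.

    weekday uses Python's convention (0 = Monday ... 6 = Sunday).
    Returns None when no trip matches the filters.
    """
    durations = [
        seconds / 60
        for start, origin, destination, seconds in trips
        if origin == start_station
        and destination == end_station
        and (weekday is None or start.weekday() == weekday)
        and (hour is None or start.hour == hour)
    ]
    return mean(durations) if durations else None


# Made-up trips: (start time, origin, destination, duration in seconds).
sample = [
    (datetime(2016, 5, 2, 8, 5), "Station A", "Station B", 600),
    (datetime(2016, 5, 9, 8, 30), "Station A", "Station B", 720),
    (datetime(2016, 5, 7, 14, 0), "Station A", "Station B", 900),
    (datetime(2016, 5, 2, 8, 10), "Station A", "Station C", 300),
]

monday_8am = average_trip_minutes(sample, "Station A", "Station B",
                                  weekday=0, hour=8)
no_match = average_trip_minutes(sample, "Station B", "Station A")
```

Returning None for an empty match keeps the caller from showing a misleading average for station pairs nobody has ridden at that time.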

 

Although screenshots are nice, the interactive app is better, so head to the first step of the app to get started!

 

 

Would Have, Should Have, Could Have

This is the first of the four projects from the NYC Data Science Academy Data Science Bootcamp program. With a two-week timeline and only 24 hours in a day, it was impossible to cover every data angle. Below is a quick list of the analyses I could have, would have and should have done if given more time and data:

  • Limited scope: I only took the data from May 2016. However, I expect the Citi Bike riders to behave differently depending on the season, temperature, etc. A sample covering more months would make the insights more reliable.
  • Missing data: There was no data on the docks available per station that could be scraped from the Citi Bike website. The map would have been more complete if the availability of docks had been displayed.
  • Limited number of variables: I would have liked to have more demographic data (aside from gender and age); a dating app with only age and gender as filters is restrictive...
  • Incomplete filters: With more time, I'd have added a 'speed' filter in the second step of the app (the 'where' part) to enable the hardcore cyclists to find the fastest riders...
  • Sub-optimal visualization: I am aware that the map on the introduction page (with the dots displaying the median age per station) is hard to read and, with more time, I'd have used polygons instead to group stations by neighbourhood.
  • Finally, I would have liked to track unique users. Although users don't have a unique identifier in the Citi Bike dataset, I could have identified unique users by looking at their gender, age, zip and usual start/end stations.
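One rough way to approximate the unique-user idea in the last bullet is to treat a repeated combination of attributes as a candidate rider. The Python sketch below is a heuristic under stated assumptions, not a validated method: the key (gender, birth year, start station, end station), the minimum trip count and the sample rows are all hypothetical, and different riders can easily share the same key.

```python
from collections import Counter


def candidate_rider_keys(trips, min_trips=2):
    """Keep attribute combinations that occur at least min_trips times."""
    counts = Counter(trips)
    return {key: n for key, n in counts.items() if n >= min_trips}


# Made-up trips: (gender, birth year, start station, end station).
sample = [
    ("female", 1985, "Station A", "Station B"),
    ("female", 1985, "Station A", "Station B"),
    ("female", 1985, "Station A", "Station B"),
    ("male", 1990, "Station C", "Station D"),
    ("male", 1978, "Station A", "Station B"),
    ("male", 1978, "Station A", "Station B"),
]

candidates = candidate_rider_keys(sample)
```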

About Author

Claire Keser


Claire Keser completed her MBA at the University of Victoria (Canada). Her work experience has been primarily in Conversion Optimization (A/B testing) where she built & led a team focused on turning data into products, actionable insights, and...