Data Study on Citi Bike Riders in Different Ages

Posted on Feb 5, 2018

Introduction

As more and more people adopt healthy and efficient lifestyles, riding a bike to work is increasingly seen as a better commuting choice than driving or walking. The Citi Bike program launched in 2013 with 332 stations and 6,000 bikes. It now has 706 stations and 12,000 bikes, making it the largest bike-share program in the U.S. In 2016 Citi Bike riders took an average of 38,491 rides per day, a number that nearly doubled in 2017. While Citi Bike already has a large number of customers and subscribers, it can win over more by advertising to car drivers, bus riders, and pedestrians.

With this potential market for Citi Bike services in mind, I explore the trip data to better understand who the riders are.

Because different age groups have different needs and desires, I use age as the main factor in analyzing when and where to place advertisements for different people. To that end, I built an app with R Shiny and used ggplot2, plotly, and leaflet to visualize my findings. The code is on GitHub, and the Shiny app is also available online.

Data

I downloaded the January to December 2016 trip data for Jersey City from Citi Bike System Data.

The original data structure was as follows:

  • Trip Duration (seconds)
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Lat/Long
  • Bike ID
  • User Type (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual Member)
  • Gender (Zero=unknown; 1=male; 2=female)
  • Year of Birth

I used dplyr, tidyr, and data.table to clean and manipulate the data set. Based on the data, I divided riders into six age groups: 15-25, 25-35, 35-45, 45-55, 55-65, and 65-75.
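The grouping step can be sketched as follows. This is a minimal Python illustration, not the author's R code. The function name `age_group`, the choice of 2016 as the reference year, and the rule that a boundary age such as 25 falls into the older group are all assumptions, since the post does not say how the overlapping boundaries were resolved.

```python
from bisect import bisect_right

# Edges of the six age groups named in the text.
AGE_EDGES = [15, 25, 35, 45, 55, 65, 75]
AGE_LABELS = ["15-25", "25-35", "35-45", "45-55", "55-65", "65-75"]


def age_group(birth_year, ride_year=2016):
    """Return the age-group label for a rider, or None if out of range."""
    if birth_year is None:
        return None  # Year of Birth is missing for some riders
    age = ride_year - birth_year
    if age < AGE_EDGES[0] or age >= AGE_EDGES[-1]:
        return None
    # Assumption: a boundary age such as 25 goes to the older group (25-35).
    return AGE_LABELS[bisect_right(AGE_EDGES, age) - 1]
```

For example, `age_group(1986)` gives an age of 30 and returns `"25-35"`.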

Data Analysis

In this analysis, we try to figure out:

  • Which hours of the day have the highest rider traffic?
  • Which day of the week is most popular for each age group?
  • Which month would be best to advertise to specific age groups?
  • Where is the ideal location to reach a certain group?

Relationship Between Age Group and Times of Day

First, let's look at how ridership in each age group changes with the time of day.

In the bar chart, we can see that the 25-35 and 35-45 groups account for a large portion of bike use. However, these two groups behave differently. Leaving between 20:00 and 23:00 and returning between 1:00 and 4:00 is a pattern that applies to the two younger groups, 15-25 and 25-35 year-olds.

For 35-45 year-olds, riding times are relatively stable, likely because they have less of a nightlife. They are more likely to ride to work or for exercise during the daytime.
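The counts behind a chart like this can be sketched in a few lines of Python. The function name `rides_by_hour`, the input shape, and the timestamp format are assumptions made for illustration. The sample rows are invented and are not from the real data set.

```python
from collections import Counter
from datetime import datetime


def rides_by_hour(trips, time_format="%Y-%m-%d %H:%M:%S"):
    """Count rides per (age_group, start_hour).

    `trips` is an iterable of (age_group, start_time_string) pairs.
    """
    counts = Counter()
    for group, start_time in trips:
        hour = datetime.strptime(start_time, time_format).hour
        counts[(group, hour)] += 1
    return counts


# Hypothetical sample rows, not taken from the real data set.
sample = [
    ("25-35", "2016-07-02 21:15:00"),
    ("25-35", "2016-07-02 21:40:10"),
    ("35-45", "2016-07-05 08:05:30"),
]
hourly = rides_by_hour(sample)
```

Each key of `hourly` is an (age group, hour) pair, so one bar per hour can be drawn for each group.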

Behaviors Among Different Age Groups

Next, we analyze how the age groups behave on each day of the week.

From the 2D histogram, we can see that three age groups, 25-35, 35-45, and 45-55, make up the majority of users. Compared with the other groups, 25-35 year-olds are more likely to ride on Saturday, so it makes sense for ads to target young people on that day. Alternatively, we could deliberately market on the days when demand is low among a certain group in order to raise it, just as stores and eateries run promotions on slower days of the week rather than on days when they are already busy.
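Because the groups differ so much in size, comparing them by day of the week is easier when each group's counts are turned into shares of its own total. The sketch below shows one way to do that in Python. The function name `weekday_shares` and the input shape are assumptions, not part of the original app.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(trips):
    """Return {age_group: {weekday: share of that group's rides}}.

    `trips` is an iterable of (age_group, datetime.date) pairs.
    """
    counts = defaultdict(Counter)
    for group, day in trips:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        counts[group][WEEKDAYS[day.weekday()]] += 1
    shares = {}
    for group, by_day in counts.items():
        total = sum(by_day.values())
        shares[group] = {d: by_day[d] / total for d in WEEKDAYS}
    return shares
```

A group whose Saturday share is clearly higher than the other groups' Saturday shares is the kind of pattern described above for 25-35 year-olds.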

Scatter Plot

Third, I use a scatter plot to show how ridership in each age group varies by month.

In the scatter plot, we can see that the 15-25, 55-65, and 65-75 groups remain stable throughout the year, while the 25-35 and 35-45 groups vary widely from month to month. The curves for 25-35 and 35-45 rise from February to October and fall from October to February. This variation is probably caused by the weather: when the temperature drops, fewer people ride. So for these two groups, July to November is an optimal period for targeted ads.
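One way to put a number on "stable" versus "varies widely" is the coefficient of variation of the monthly ride counts, that is, the standard deviation divided by the mean. This measure is a suggestion of mine and does not appear in the original analysis. The function name `monthly_variation` and the input shape are likewise assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def monthly_variation(trips):
    """Return {age_group: coefficient of variation of monthly ride counts}.

    `trips` is an iterable of (age_group, month_number) pairs, month 1-12.
    Months with no rides count as zero.
    """
    counts = defaultdict(Counter)
    for group, month in trips:
        counts[group][month] += 1
    result = {}
    for group, by_month in counts.items():
        series = [by_month[m] for m in range(1, 13)]
        # The mean is never zero here: a group only appears if it has a ride.
        result[group] = pstdev(series) / mean(series)
    return result
```

A value near zero means ridership is flat across the year, while a larger value indicates a strong seasonal swing.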

Popular Locations

Lastly, I created an interactive map to show which locations are popular with each age group. On this map, the area of each yellow circle represents the number of people starting their journey at that station. On the right side of the screen, users of the Shiny app can choose an age group, user type, gender, and date to filter the data.
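The filtering and circle sizing can be sketched as follows. This Python illustration assumes hypothetical field names (`station`, `age_group`, `user_type`, `gender`) and leaves out the date filter for brevity. It is not the author's Shiny code. It also assumes the drawing function takes a radius, in which case the radius must grow with the square root of the count for the area to be proportional to the count.

```python
from collections import Counter
from math import sqrt


def start_counts(trips, age_group=None, user_type=None, gender=None):
    """Count trip starts per station, keeping rows that match every filter given.

    `trips` is an iterable of dicts with the hypothetical keys
    "station", "age_group", "user_type" and "gender".
    A filter left as None is not applied.
    """
    wanted = {"age_group": age_group, "user_type": user_type, "gender": gender}
    counts = Counter()
    for trip in trips:
        if all(v is None or trip[k] == v for k, v in wanted.items()):
            counts[trip["station"]] += 1
    return counts


def circle_radius(count, scale=1.0):
    """Radius that makes the circle's area proportional to `count`."""
    return scale * sqrt(count)
```

With this scaling, a station with four times as many starts gets a circle with twice the radius and therefore four times the area.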

From the map, I compiled the following table of the most popular start stations for each age group.

Age Group | Popular Places to Start
15-25     | Brunswick St
25-35     | Grove St. Path, Sip Ave
35-45     | Grove St. Path, Exchange Place
45-55     | Grove St. Path, Sip Ave
55-65     | Grove St. Path, Hamilton Park
65-75     | Grove St. Path, Newark Ave

From the table, we can see that although Grove St. Path shows up as a favourite for most of the age groups, the second-ranked station varies quite a bit among them. Those second-ranked stations may be good locations for targeting particular ages.
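A ranking like the one above can be produced by counting start stations within each age group and keeping the top few. The Python sketch below shows the idea. The function name `top_start_stations` and the input shape are assumptions, and ties are returned in the order they were first seen.

```python
from collections import Counter, defaultdict


def top_start_stations(trips, n=2):
    """Return {age_group: [n most common start stations, most common first]}.

    `trips` is an iterable of (age_group, start_station_name) pairs.
    """
    counts = defaultdict(Counter)
    for group, station in trips:
        counts[group][station] += 1
    return {g: [s for s, _ in c.most_common(n)] for g, c in counts.items()}
```

Calling it with `n=2` yields the first- and second-ranked stations per group, which is the comparison the table is built on.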

Conclusion

  • Grove St. Path is a good location for targeting almost every age group.
  • To target 25-45 year-olds at a particular time of year, July to November would be a good choice.
  • Saturday seems to be the best day for promotions aimed at 25-35 year-olds.
  • Around midnight, advertising would be most relevant to 25-35 year-olds. During business hours, it would likely appeal more to the next three age brackets, covering 35-65 year-olds.

Next Step

I will combine the time and location data to see how they interact and to check whether the combined view deviates substantially from the results so far. I will also provide more detailed analysis for specific dates and locations, and expand the scope to every U.S. city that runs a bike-share program.

References

https://en.wikipedia.org/wiki/Citi_Bike

https://www.citibikenyc.com/system-data

https://archpaper.com/2014/10/bikers-tan-citi-bike-system-opens-in-miami-next-month/

 
