Hubway Station Metrics

Posted on Jul 1, 2019

In 2014, Boston held a data visualization challenge that asked participants to look at ridership statistics in new and exciting ways. The city wanted to know what insight could be gained by crowdsourcing the analysis of its bike-share fleet's activity on the Hubway network to data scientists.

Hubway made all of their data public for the first three years of their operation. They posted data on every single instance of a user taking a bike from Station A to Station B. They also included user information on registered riders.

I approached this project with the aim of building a station statistics tool: an interface that lets station operators and riders examine the usage of a given station. The app provides insight into which types of users frequent the station, how much traffic it sees throughout the day, and where users are coming from or going to.

As cycling is a seasonal activity, I limited my data to rides taken in 2012 so that I could examine a full year of ridership. I also removed all trips shorter than a few minutes, as these likely indicated false starts.
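The filtering step can be sketched roughly as follows. This is a minimal illustration in Python rather than the app's own code; the field names `start_time` and `duration_sec` are hypothetical, and the 120-second cutoff is an assumption, since the exact threshold is not stated here.

```python
from datetime import datetime

# Assumed cutoff: the post only says "a few minutes", so 120 s is a placeholder.
MIN_DURATION_SEC = 120


def clean_trips(trips, year=2012, min_duration_sec=MIN_DURATION_SEC):
    """Keep trips that start in `year` and last at least `min_duration_sec`."""
    return [
        t for t in trips
        if t["start_time"].year == year and t["duration_sec"] >= min_duration_sec
    ]


# Hypothetical sample records, not real Hubway data.
sample = [
    {"start_time": datetime(2012, 7, 4, 8, 15), "duration_sec": 540},
    {"start_time": datetime(2012, 7, 4, 8, 20), "duration_sec": 30},   # likely false start
    {"start_time": datetime(2011, 9, 1, 17, 5), "duration_sec": 900},  # outside 2012
]
cleaned = clean_trips(sample)
```

Only the first sample record survives both filters.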

I built the tool as an interactive Shiny app. It lets you select any station on the network and immediately see graphs showing the net traffic through the day, the share of riders who are commuters or casual riders, and the gender ratio of riders.
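To make the per-station metrics concrete, here is a rough sketch in Python (not the Shiny app's R code). It assumes that net traffic means arrivals minus departures in each hour, which is one plausible reading rather than a confirmed definition, and the record fields (`start_station`, `end_station`, `start_time`, `end_time`, `rider_type`) are hypothetical.

```python
from collections import Counter
from datetime import datetime


def net_traffic_by_hour(trips, station):
    """Arrivals minus departures at `station` for each hour of the day (0-23)."""
    net = Counter()
    for t in trips:
        if t["end_station"] == station:
            net[t["end_time"].hour] += 1    # a bike arrived
        if t["start_station"] == station:
            net[t["start_time"].hour] -= 1  # a bike left
    return [net[h] for h in range(24)]


def rider_type_share(trips, station):
    """Fraction of trips touching `station` made by each rider type."""
    counts = Counter(
        t["rider_type"] for t in trips
        if station in (t["start_station"], t["end_station"])
    )
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


def _trip(start, end, start_hm, end_hm, rider_type):
    """Build a hypothetical trip record on 2 July 2012."""
    return {
        "start_station": start, "end_station": end,
        "start_time": datetime(2012, 7, 2, *start_hm),
        "end_time": datetime(2012, 7, 2, *end_hm),
        "rider_type": rider_type,
    }


trips = [
    _trip("A", "B", (8, 5), (8, 20), "commuter"),
    _trip("B", "A", (8, 30), (8, 45), "commuter"),
    _trip("B", "A", (17, 50), (18, 5), "casual"),
    _trip("A", "B", (8, 40), (9, 0), "commuter"),
]
hourly = net_traffic_by_hour(trips, "A")
share = rider_type_share(trips, "A")
```

A negative value for an hour means more bikes left the station than arrived, which is the kind of signal an operator might use when deciding where to rebalance.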

Users can also toggle the time period to examine: hours of the day, days of the week, or months of the year. This reveals differences in peak activity. For example, it confirms that the most active months are in the summer and the most active hours are 8 am and 6 pm. You can also toggle whether to include weekends. Because commuter traffic drops sharply on weekends, this can reveal rider demographics and activity patterns that are otherwise drowned out in the noise.
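The period and weekend toggles amount to grouping trips by a different time key and optionally skipping Saturdays and Sundays. A minimal Python sketch of that idea follows; the function and parameter names are hypothetical, and the app itself may be structured quite differently.

```python
from collections import Counter
from datetime import datetime

# Each toggle option maps a timestamp to the bucket it falls in.
BUCKETS = {
    "hour": lambda ts: ts.hour,          # 0-23
    "weekday": lambda ts: ts.weekday(),  # 0 = Monday ... 6 = Sunday
    "month": lambda ts: ts.month,        # 1-12
}


def trips_per_period(start_times, period="hour", include_weekends=True):
    """Count trips per bucket, optionally dropping Saturday and Sunday trips."""
    bucket_of = BUCKETS[period]
    counts = Counter()
    for ts in start_times:
        if not include_weekends and ts.weekday() >= 5:
            continue
        counts[bucket_of(ts)] += 1
    return dict(counts)


# Hypothetical start times: two on a Monday, one on a Tuesday, one on a Saturday.
start_times = [
    datetime(2012, 7, 2, 8, 10),
    datetime(2012, 7, 2, 18, 5),
    datetime(2012, 7, 3, 8, 40),
    datetime(2012, 7, 7, 13, 0),
]
by_hour = trips_per_period(start_times, "hour")
weekday_only = trips_per_period(start_times, "hour", include_weekends=False)
by_weekday = trips_per_period(start_times, "weekday")
```

Keeping the bucket functions in a dictionary means each toggle option is a one-line addition rather than a new code path.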

The map also updates to show the top 10 stations that users most often travel from or to when riding through the selected station. Paths arriving at the station are marked in blue, whereas paths leaving it are in red. This can be used to identify sister stations.
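Finding those top connections is essentially a frequency count over origin and destination stations. The Python sketch below illustrates the idea with hypothetical field names; excluding round trips that start and end at the same station is an assumption of this sketch, not something the app is known to do.

```python
from collections import Counter


def top_connections(trips, station, n=10):
    """Return (most common origins, most common destinations) for `station`."""
    arriving_from = Counter(
        t["start_station"] for t in trips
        if t["end_station"] == station and t["start_station"] != station
    )
    leaving_to = Counter(
        t["end_station"] for t in trips
        if t["start_station"] == station and t["end_station"] != station
    )
    return arriving_from.most_common(n), leaving_to.most_common(n)


# Hypothetical (start, end) pairs, not real Hubway stations.
pairs = [("B", "A"), ("B", "A"), ("C", "A"),
         ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
         ("A", "A")]  # round trip, ignored under the assumption above
trips = [{"start_station": s, "end_station": e} for s, e in pairs]

origins, destinations = top_connections(trips, "A")
```

A station that ranks highly in both lists for the selected station is a natural candidate for a sister station.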

Ideally, these tools would be used to identify stations that are historically underused or underserviced, along with rider demographics and peak activity times. Station operators can use the app to determine when and where to rebalance the network, and how to advertise to users based on need. For example, stations with heavy casual usage are likely major tourist locations, which could be tapped for advertising local shows or events that commuters are less interested in.

If you are interested in this project or others like it, please visit my GitHub.
