A look at the Bay Area Bike Share

Aungshuman Zaman
Posted on Jul 29, 2018

Introduction:

Bicycles are one of the best ways to travel in an urban setting. Because cycling produces no emissions and promotes an active lifestyle, it has become increasingly mainstream in cities across the world. City planners have moved to make their streets more bike-friendly, and several private-sector bike-sharing programs have emerged to make it easier for commuters and tourists to share bikes. For this project, I looked at a dataset made public by a bike-sharing program based in San Francisco, CA.

The dataset:
The data for the project came from the Kaggle website and can be found here. There are four .csv files, of which I have used two: 'station.csv' and 'trip.csv'. The dataset chronicles every bike trip undertaken using the program over two years, from August 2013 to August 2015.

I wrote an R Shiny app to summarize the findings of my exploration. The Shiny app link for the project can be found here. My code for this project can be found on my GitHub.

Description of the app:
The dataset contains information about 70 bike stations spread across five cities in the Bay Area: San Francisco, San Jose, Redwood City, Mountain View, and Palo Alto.

In the 'Stations' tab, the app presents a map showing the locations of the bike stations. There is also a bar plot showing the number of bicycle docks at each station. The user can select one or more cities, and the stations located in those cities will be shown. Each station is represented by a circle whose radius is proportional to its number of docks.
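The city filter and the dock-based circle size can be sketched in a few lines. This is a minimal Python illustration, not the code behind the Shiny app (which is written in R). The station records are made up, the field names `name`, `city`, and `dock_count` are assumed to match the columns of 'station.csv', and `marker_radii` and `max_radius` are hypothetical names introduced here.

```python
# Hypothetical station records; the field names (name, city, dock_count)
# are assumed to match the columns of 'station.csv'.
STATIONS = [
    {"name": "Station A", "city": "San Francisco", "dock_count": 27},
    {"name": "Station B", "city": "San Jose", "dock_count": 15},
    {"name": "Station C", "city": "Palo Alto", "dock_count": 11},
]


def marker_radii(stations, cities, max_radius=20.0):
    """Return {station name: radius} for stations in the selected cities.

    The radius scales linearly with dock count, so the station with the
    most docks among the selected ones gets max_radius.
    """
    selected = [s for s in stations if s["city"] in cities]
    if not selected:
        return {}
    largest = max(s["dock_count"] for s in selected)
    return {s["name"]: max_radius * s["dock_count"] / largest for s in selected}


radii = marker_radii(STATIONS, {"San Francisco", "San Jose"})
for name, radius in radii.items():
    print(f"{name}: radius {radius:.1f}")
```

Scaling against the largest selected station keeps the circles readable when only small stations are chosen. Scaling against the largest station overall would be an equally reasonable choice.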

In the 'Trip frequency' tab, the stations are shown again, but this time the radius is proportional to the number of trips originating from a station. Users may filter the trips by city, by the hour of the day in which the trip started, or by date range.
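The hour and date-range filters amount to counting trips per start station after discarding the rows that fall outside the selection. The following Python sketch shows the idea on made-up trips. The column names `start_station_name` and `start_date`, and the month/day/year timestamp format, are assumptions about 'trip.csv', and `trips_per_station` is a hypothetical helper. The city filter is left out because it would need a join with the station table.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical trips; the column names and the "month/day/year hour:minute"
# timestamp format are assumptions about 'trip.csv'.
START_TRIPS = [
    {"start_station_name": "Station A", "start_date": "8/29/2013 8:13"},
    {"start_station_name": "Station A", "start_date": "8/29/2013 8:45"},
    {"start_station_name": "Station B", "start_date": "8/29/2013 17:05"},
    {"start_station_name": "Station A", "start_date": "9/2/2013 8:30"},
]


def trips_per_station(trips, hour=None, first_day=None, last_day=None):
    """Count trips per start station, applying only the filters that are set."""
    counts = Counter()
    for trip in trips:
        started = datetime.strptime(trip["start_date"], "%m/%d/%Y %H:%M")
        if hour is not None and started.hour != hour:
            continue
        if first_day is not None and started.date() < first_day:
            continue
        if last_day is not None and started.date() > last_day:
            continue
        counts[trip["start_station_name"]] += 1
    return counts


# Trips that started during the 8 o'clock hour, up to the end of August 2013.
morning_august = trips_per_station(START_TRIPS, hour=8, last_day=date(2013, 8, 31))
print(morning_august)
```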

In the 'Station connection' tab, users may select a station where trips originate, and the map on the left will show all the stations where those trips ended. As before, a larger circle means a higher trip frequency. On the right side, the same trip frequencies are shown in a bar plot.
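The quantity behind this tab is a count of end stations for all trips that share one start station. Here is a minimal Python sketch on made-up trips. The column names `start_station_name` and `end_station_name` are assumed to match 'trip.csv', and `destinations_from` is a hypothetical helper rather than a function from the app.

```python
from collections import Counter

# Hypothetical origin/destination pairs; the column names are assumed
# to match 'trip.csv'.
OD_TRIPS = [
    {"start_station_name": "Station A", "end_station_name": "Station B"},
    {"start_station_name": "Station A", "end_station_name": "Station B"},
    {"start_station_name": "Station A", "end_station_name": "Station C"},
    {"start_station_name": "Station B", "end_station_name": "Station A"},
]


def destinations_from(trips, origin):
    """Return (end station, trip count) pairs for one origin, busiest first."""
    counts = Counter(
        trip["end_station_name"]
        for trip in trips
        if trip["start_station_name"] == origin
    )
    return counts.most_common()


for station, n_trips in destinations_from(OD_TRIPS, "Station A"):
    print(f"Station A -> {station}: {n_trips} trips")
```

The sorted pairs can feed both views: the counts set the circle sizes on the map and the bar heights in the plot.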

The 'Trip duration' tab shows the trip duration in different seasons (left) and the seasonal variation of trip frequency (right). The right plot also shows the split between the two subscription types. The program offers a subscription under which, for a monthly fee, a subscriber can ride a bike for free for 30 minutes. Adjusting the two sliders on this page shows that subscribers are more likely to take shorter rides, and to ride during the peak morning and evening hours, possibly to commute to and from work. A second group of users rents bikes on an hourly or daily basis. This group dominates when the ride is longer than 30 minutes. It is also clear that there is a general decline in bike riding during the winter months, despite the Bay Area's comparatively mild winter.
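One way to check the 30-minute observation outside the app is to split rides at that threshold and count the rider types on each side. The Python sketch below does this on made-up rides, so its output illustrates the method and says nothing about the real data. It assumes that `duration` is stored in seconds and that `subscription_type` takes the values "Subscriber" and "Customer". `split_by_duration` is a hypothetical helper.

```python
from collections import Counter

FREE_RIDE_SECONDS = 30 * 60  # the 30-minute free window for subscribers

# Hypothetical rides; 'duration' in seconds and the two 'subscription_type'
# labels are assumptions about 'trip.csv'.
RIDES = [
    {"duration": 420, "subscription_type": "Subscriber"},
    {"duration": 900, "subscription_type": "Subscriber"},
    {"duration": 1500, "subscription_type": "Customer"},
    {"duration": 2700, "subscription_type": "Customer"},
    {"duration": 5400, "subscription_type": "Customer"},
    {"duration": 2100, "subscription_type": "Subscriber"},
]


def split_by_duration(rides, threshold=FREE_RIDE_SECONDS):
    """Count rider types separately for rides up to and beyond the threshold."""
    short_rides, long_rides = Counter(), Counter()
    for ride in rides:
        bucket = short_rides if ride["duration"] <= threshold else long_rides
        bucket[ride["subscription_type"]] += 1
    return short_rides, long_rides


short_rides, long_rides = split_by_duration(RIDES)
print("30 minutes or less:", dict(short_rides))
print("more than 30 minutes:", dict(long_rides))
```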

The 'Variation of trip frequency' tab shows how trip frequency varied over the two years. Some of the day-to-day fluctuation is likely statistical noise, and some may be due to weather, which would be interesting to explore. Not surprisingly, the number of trips is larger on weekdays than on weekends. There is a steep decline in bike riding during the Christmas holidays. A bar plot of trip frequency by hour of the day gives some more insight. It confirms the bimodal nature of bike riding on weekdays, with the two peaks corresponding to the morning and evening peak hours. This suggests that, on a weekday, most bike rides are by commuters. On weekends, trip frequency stays fairly flat throughout the day.
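The hourly bar plot rests on a simple tabulation: trips counted by starting hour, with weekdays and weekends kept apart. A minimal Python sketch of that tabulation follows. The timestamps are made up and their month/day/year format is an assumption about 'trip.csv'. `hourly_profile` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

# Hypothetical start timestamps in an assumed "month/day/year hour:minute"
# format. August 29, 2013 was a Thursday; August 31 and September 1, 2013
# fell on a weekend.
STARTS = [
    "8/29/2013 8:13",
    "8/29/2013 8:45",
    "8/29/2013 17:05",
    "8/31/2013 13:20",
    "9/1/2013 11:02",
]


def hourly_profile(start_times):
    """Count trips by starting hour, separately for weekdays and weekends."""
    weekday_counts, weekend_counts = Counter(), Counter()
    for text in start_times:
        started = datetime.strptime(text, "%m/%d/%Y %H:%M")
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        bucket = weekend_counts if started.weekday() >= 5 else weekday_counts
        bucket[started.hour] += 1
    return weekday_counts, weekend_counts


weekday_counts, weekend_counts = hourly_profile(STARTS)
print("weekday trips by hour:", sorted(weekday_counts.items()))
print("weekend trips by hour:", sorted(weekend_counts.items()))
```

Plotting the two counters side by side, one bar per hour, is how a weekday commute pattern with two peaks would stand out against a flatter weekend profile.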

Conclusion:

As an exploration tool, I found Shiny to be quite useful. Its interactive nature makes it very easy to vary the different filters on the dataset, which sometimes leads to unexpected insights. For example, the behavior of bike renters under the two subscription models was strikingly different.

Future work:

The complete dataset contains files that I have not yet explored. In particular, it would be interesting to see how weather information can be used to predict general user behavior.

About Author

Aungshuman Zaman

I am a Physics PhD with experience of working with big datasets in large scientific collaboration. I worked for the ATLAS high energy physics experiment at the European Particle Physics Laboratory CERN, where I collected, cleaned and interpreted...