A look at the Bay Area Bike Share

Posted on Jul 29, 2018


Bicycles are one of the best ways to travel in an urban setting. Because cycling is zero-emission and promotes an active lifestyle, it has increasingly become mainstream in cities across the world. City planners have moved to make their streets more bike-friendly, and several private-sector bike-sharing programs have come forward to make it easier for commuters and tourists to share bikes. For this project, I looked at a dataset made public by a bike-sharing program based in San Francisco, CA.

The dataset:
The data for the project came from the Kaggle website and can be found here. There are four .csv files, of which I have used two: 'station.csv' and 'trip.csv'. The dataset chronicles every bike trip undertaken using the program over two years, from August 2013 to August 2015.

I wrote an R Shiny app to summarize the findings of my exploration. The Shiny app link for the project can be found here. My code to implement this project can be found on my GitHub.

Description of the app:
The dataset contains information about 70 bike stations scattered around five cities (namely San Francisco, San Jose, Redwood City, Mountain View, and Palo Alto) in the Bay Area.

In the 'Stations' tab, the app presents a map showing the location of the bike stations. There is also a bar plot showing the number of bicycle docks at each station. The user can select one or more cities, and the stations located in those cities will be shown. Each station is represented by a circle whose radius is proportional to its number of docks.

In the 'Trip frequency' tab, the stations are shown again, but this time the radius is proportional to the number of trips originating from each station. Users may filter the trips by city, by the hour of the day the trip originated, or by date range.
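The filtering behind this tab can be sketched in a few lines. The following is a minimal Python illustration, not the R Shiny code used in the app. The sample rows are made up, and the column names (`start_date`, `start_station_name`, `name`, `city`) and the timestamp layout are assumptions about how 'trip.csv' and 'station.csv' are organized. The function name `trips_per_station` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up sample rows; the column names are assumed to mirror
# station.csv and trip.csv, and are not taken from the post.
STATIONS = [
    {"name": "Station A", "city": "San Francisco"},
    {"name": "Station B", "city": "San Francisco"},
    {"name": "Station C", "city": "San Jose"},
]
TRIPS = [
    {"start_date": "8/29/2013 8:13", "start_station_name": "Station A"},
    {"start_date": "8/29/2013 8:45", "start_station_name": "Station A"},
    {"start_date": "8/29/2013 17:05", "start_station_name": "Station B"},
    {"start_date": "9/02/2013 8:30", "start_station_name": "Station C"},
]

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumed timestamp layout


def trips_per_station(trips, stations, city=None, hour=None, start=None, end=None):
    """Count trips by origin station, applying only the filters that are given."""
    city_of = {s["name"]: s["city"] for s in stations}
    counts = Counter()
    for trip in trips:
        when = datetime.strptime(trip["start_date"], DATE_FORMAT)
        station = trip["start_station_name"]
        if city is not None and city_of.get(station) != city:
            continue  # trip started in a different city
        if hour is not None and when.hour != hour:
            continue  # trip started at a different hour of the day
        if start is not None and when.date() < start:
            continue  # trip started before the date range
        if end is not None and when.date() > end:
            continue  # trip started after the date range
        counts[station] += 1
    return counts
```

Calling `trips_per_station(TRIPS, STATIONS, city="San Francisco", hour=8)` keeps only the trips that started in San Francisco during the 8 o'clock hour. The resulting counts are what would drive the circle radius on the map.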

In the 'Station connection' tab, users may select a station where trips originate, and the map on the left will show all the stations where those trips ended. A larger circle again means a higher trip frequency. On the right side, the trip frequency is shown in a bar plot.


The 'Trip duration' tab shows the trip duration in different seasons (left) and the seasonal variation of trip frequency (right). The right plot also shows the split between the two subscription types. The program has a subscription system in which, for a monthly fee, a subscriber can ride a bike for free for 30 minutes. Adjusting the two sliders on this page shows that these subscribers are more likely to ride for shorter periods, and during the peak morning and evening hours, possibly to commute to and from work. Another group of users rents the bikes on an hourly or daily basis. This group dominates when rides last longer than 30 minutes. It is also clear that bike riding generally declines during the winter months despite the Bay Area's comparatively mild winter.
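The 30-minute comparison can be reproduced with a simple tally. This is a minimal Python sketch rather than the code behind the app. The sample rows are made up, and it assumes that the `duration` column is recorded in seconds and that `subscription_type` holds labels such as "Subscriber" and "Customer". The helper name `split_by_duration` is hypothetical.

```python
from collections import Counter

# Made-up sample rows; "duration" is assumed to be in seconds and
# "subscription_type" to hold labels such as "Subscriber" and "Customer".
TRIPS = [
    {"duration": 420, "subscription_type": "Subscriber"},
    {"duration": 900, "subscription_type": "Subscriber"},
    {"duration": 2400, "subscription_type": "Customer"},
    {"duration": 5400, "subscription_type": "Customer"},
    {"duration": 1500, "subscription_type": "Customer"},
]

FREE_RIDE_SECONDS = 30 * 60  # the 30-minute window described in the text


def split_by_duration(trips, threshold=FREE_RIDE_SECONDS):
    """Tally subscription types separately for short and long rides."""
    short_rides, long_rides = Counter(), Counter()
    for trip in trips:
        bucket = short_rides if trip["duration"] <= threshold else long_rides
        bucket[trip["subscription_type"]] += 1
    return short_rides, long_rides


short_rides, long_rides = split_by_duration(TRIPS)
print("30 minutes or less:", dict(short_rides))
print("longer than 30 minutes:", dict(long_rides))
```

Moving `threshold` plays the same role as dragging the duration slider in the app: it changes which rides fall in each bucket and therefore which user group dominates.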

The 'Variation of trip frequency' tab shows how trip frequency varied over the two years. Some of the day-to-day fluctuation is clearly statistical noise or may be due to weather, which would be interesting to explore. Not surprisingly, the number of trips is larger on weekdays than on weekends. There is a steep decline in bike riding during the Christmas holidays. A bar plot of trip frequency over the hours of the day gives some more insight. It confirms the bimodal nature of bike riding on weekdays, with the two peaks corresponding to the morning and evening rush hours. This suggests that, on a weekday, most bike rides are by commuters. On weekends, trip frequency stays fairly flat throughout the day.
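The hourly bar plot amounts to counting trip start times by hour, separately for weekdays and weekends. Here is a minimal Python sketch of that grouping. It is not the app's R code, the sample timestamps are made up, and the timestamp layout is an assumption about 'trip.csv'. The function name `hourly_profile` is hypothetical.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumed timestamp layout

# Made-up sample start times for illustration only.
START_TIMES = [
    "8/29/2013 8:13",   # a Thursday
    "8/29/2013 8:40",   # a Thursday
    "8/29/2013 17:20",  # a Thursday
    "8/31/2013 13:05",  # a Saturday
]


def hourly_profile(start_times):
    """Count trips per hour of day, split into weekday and weekend tallies."""
    weekday, weekend = Counter(), Counter()
    for text in start_times:
        when = datetime.strptime(text, DATE_FORMAT)
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        bucket = weekend if when.weekday() >= 5 else weekday
        bucket[when.hour] += 1
    return weekday, weekend


weekday, weekend = hourly_profile(START_TIMES)
print("weekday trips by hour:", sorted(weekday.items()))
print("weekend trips by hour:", sorted(weekend.items()))
```

On the full dataset, a bimodal weekday tally with peaks in the morning and evening hours would correspond to the commuter pattern described above.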



As an exploration tool, I found Shiny to be quite useful. Its interactive nature makes it very easy to vary the different filters on the dataset. Sometimes this leads to unexpected insights. For example, the behavior of bike renters under the two different subscription models was strikingly different.

Future work:

The complete dataset contains files that I have not explored. In particular, it will be interesting to see how weather information can be used to predict general user behavior.

About Author

Aungshuman Zaman

I am a Physics PhD with experience of working with big datasets in large scientific collaboration. I worked for the ATLAS high energy physics experiment at the European Particle Physics Laboratory CERN, where I collected, cleaned and interpreted...