Data Visualization on Birds

Posted on Jul 30, 2018
The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

The American Birder

For the millions of bird watchers in America, relevant and useful data resources are always welcome. Range maps and ecological histories enhance the bird watching experience by adding a layer of conservation awareness and help hobbyists become more acquainted with the birds they observe.

As a birder myself, I am always looking for new applications that help me achieve a greater understanding of the birds I observe on a daily basis. Learning more about these fascinating and beautiful animals helps to bring the bigger picture into focus; they have a much larger role in our world than just the momentary glimpse you get when observing them in a park or when walking down the street.

The Data

I chose to use the eBird dataset from the Cornell Lab of Ornithology to construct a Shiny application in R that allows a birder to "zoom out" from an isolated bird observation. Crowd-sourced data from around the world enables eBird to track the locations and times of bird observations. Using their online interface, a user can view observations of many different species of birds, explore bird watching hot-spots, and even see a real-time observation submission map. Observations are available from as far back as the year 1900, and the increasing accessibility of technology translates to an ever-increasing avalanche of data pouring in. The total data set today consists of over 500 million observations!

The App

Using the Shiny package in R, I built an application that explores observations from 2016 of 10 species of birds in the United States. I aggregated the observations by county and produced a map using ggplot2 to show the frequency of observations throughout the US. The observations can be filtered by month using a slider bar to inspect the distribution of sightings during a particular season.
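As a rough illustration of the aggregation step described above, here is a minimal base R sketch. The column names, the sample rows, and the `count_by_county` helper are all assumptions made for this example; the app's actual code may be organized quite differently.

```r
# Hypothetical sample of observations. The column names are assumptions,
# not the actual eBird field names.
obs <- data.frame(
  state  = c("NY", "NY", "NY", "NJ", "NJ"),
  county = c("Kings", "Kings", "Queens", "Bergen", "Bergen"),
  month  = c(1, 2, 7, 1, 12),
  stringsAsFactors = FALSE
)

# Keep observations whose month falls inside a two-value slider range,
# then count the remaining rows per state and county.
count_by_county <- function(obs, month_range) {
  in_range <- obs$month >= month_range[1] & obs$month <= month_range[2]
  kept <- obs[in_range, , drop = FALSE]
  if (nrow(kept) == 0) {
    return(data.frame(state = character(0), county = character(0),
                      n = integer(0)))
  }
  counts <- aggregate(n ~ state + county,
                      data = transform(kept, n = 1L), FUN = sum)
  counts[order(counts$state, counts$county), ]
}

# January through February only
winter <- count_by_county(obs, c(1, 2))
print(winter)
```

A table like `winter` is the kind of input a county-level map needs: one row per county, with a count that can be mapped to a fill color.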

I also added a feature that allows the user to filter the observations by breeding season, which I implemented by using estimated breeding season ranges from the Cornell Lab of Ornithology Birds of North America website. A feature that I found really interesting and was excited to add to my application is the "play" button, which shows an animated map that cycles through the months of the year and displays the bird sightings accordingly. This provides the user with an important perspective on the movement of different species throughout the US, which can sometimes be lost when looking at separate monthly range maps one at a time.
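A breeding-season filter like the one described above can be sketched in base R as a month-range check. The species names and month ranges below are placeholders invented for this example, not values from Birds of North America, and `in_season` is a hypothetical helper. The one subtle case is a season that wraps over the new year.

```r
# Hypothetical breeding-season lookup (start and end month). These values
# are placeholders for illustration only.
breeding_season <- list(
  "Example species A" = c(start = 4, end = 7),
  "Example species B" = c(start = 11, end = 2)  # wraps over the new year
)

# TRUE for every month that falls inside the season.
in_season <- function(month, season) {
  if (season[["start"]] <= season[["end"]]) {
    month >= season[["start"]] & month <= season[["end"]]
  } else {
    # Wrapped season: late months of one year or early months of the next
    month >= season[["start"]] | month <= season[["end"]]
  }
}

months <- 1:12
season_a <- months[in_season(months, breeding_season[["Example species A"]])]
season_b <- months[in_season(months, breeding_season[["Example species B"]])]
print(season_a)
print(season_b)
```

The same logical vector could be used to subset an observation table by its month column before mapping.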


Map that shows number of species observations in a specific month range


In addition to visualizing the movement of birds throughout the US, I wanted to add additional functionality for bird watchers. I thought an interesting question to ask would be: at what time are birds most often being seen?

To answer this question, I added a histogram that shows how often a certain species was observed at each time of day. I found that 8:00 AM was by far the most frequent time at which an eBird user submitted an observation. My conclusion is that the data is biased: many more people are actively bird watching around 8:00 AM, so the number of observations spikes around that time. This does not necessarily mean that a species is more likely to be seen at 8:00 AM, only that there are more people actively looking.
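The hourly binning behind a histogram like this can be sketched in a few lines of base R. The observation times below are made up for illustration, and the "HH:MM" 24-hour format is an assumption about how the times are stored.

```r
# Hypothetical observation times in "HH:MM" 24-hour format.
obs_time <- c("07:45", "08:05", "08:30", "08:55", "09:10", "17:20")

# Extract the hour and count observations for every hour of the day (0-23),
# including hours with no observations.
hour <- as.integer(substr(obs_time, 1, 2))
hourly <- table(factor(hour, levels = 0:23))

# Hour with the most submitted observations
peak_hour <- as.integer(names(hourly)[which.max(hourly)])
print(peak_hour)
```

Passing `hourly` to `barplot()` would draw the counts. As noted above, a peak in these counts reflects when observers are active as much as when birds are.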

However, I did find a different trend for the only owl species (the Short-eared Owl) that I included in my list of species. This species showed a maximum sighting frequency at around 5:00 PM, which is in agreement with the fact that owls become more active around dusk. This leads me to believe that the functionality of this feature is mainly relevant for determining very basic activity levels for certain species.


Histogram that shows the frequency of species observations by time of day


When viewing the range map, I found that regional movement of bird sightings was apparent, but it was easy to overlook state-level observation trends. I elected to add a bar graph that breaks down observations by month for every state where there were sightings. This feature allows the user to see a clear pattern in sighting frequency over the course of a year.

The graph makes it easy to tell whether the bird is a year-round resident or only present for certain seasons. This is important for understanding seasonal distribution of species. Although I have not implemented this functionality yet, a daily breakdown of sightings could yield important information on bird migration stopover sites (i.e. where birds temporarily stop to refuel during migration).
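The monthly breakdown by state, and the resident-versus-seasonal reading of it, can be sketched as follows in base R. The sample data and column names are assumptions, and the "sighted in all 12 months" rule is only an illustrative rule of thumb, not an ornithological standard.

```r
# Hypothetical observations: one state with sightings in every month,
# one with winter sightings only.
obs <- data.frame(
  state = c(rep("NY", 12), "FL", "FL", "FL"),
  month = c(1:12, 12, 1, 2),
  stringsAsFactors = FALSE
)

# Rows are states, columns are months 1-12, cells are sighting counts.
monthly <- table(obs$state, factor(obs$month, levels = 1:12))

# Number of months with at least one sighting in each state.
months_present <- rowSums(monthly > 0)

# Illustrative classification based on month coverage alone.
presence <- ifelse(months_present == 12, "year-round", "seasonal")
print(presence)
```

Each row of `monthly` holds the twelve bar heights for one state, so a single row is what a per-state bar graph would display.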


Graph that shows the frequency of species observations by month for a specific state


The final section of my application allows the user to inspect the data behind the graphics. This can be useful if the user wishes to extract a specific value of observations at a certain time or in a certain location. The data table has a search function that allows the user to filter the data by county, state, or time of day.


Data tables that allow users to inspect specific values in the data set


Going Forward

Although the application is functional, there are several potential areas for improvement that I would like to address in the future.

  1. First, the size of the data for some species is quite large, which leads to loading issues where graphics can take several seconds to render. This makes the play function of the map difficult to use effectively in some cases. I believe that these cases would benefit tremendously from either further optimization of the code or incorporating graphics packages with quicker rendering capabilities (ideally a combination of both).
  2. Second, I would like to allow the user to inspect a much larger list of species and range of years. Due to the size of the data, storage is a significant issue which may not be avoidable without establishing a dedicated server to host the data.
  3. Third, the observations by county are currently displayed using a log scale. I chose a log scale over the raw number of observations because many areas have between 1 and 100 observations and were vastly overshadowed by areas with observations in the thousands. These areas with fewer observations can still show significant trends, and I wanted to make sure they were not ignored. Still, this system does not address the bias in sighting frequency that comes strictly from larger numbers of available birders. Areas with larger populations will produce higher numbers of sightings simply because there are more people actively looking for birds (similar to the issue I have with the time histogram). I would like to implement a system that standardizes county sightings by county population, which would give a more normalized representation of sighting frequency.
  4. Finally, my current system uses counties as groups for aggregating observations by location. This can introduce problems in location detail. For example, many counties in the Western United States are very large and can diminish the granularity of the map. I have seen other bird range mapping tools (such as the eBird map) that instead use rectangular areas denoted by longitude and latitude and do not rely on human-designated borders. Implementing this method could increase the accuracy of the map's representation of sighting hot-spots.
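The last two ideas in the list can be sketched in base R: the log scale with a per-capita normalization, and latitude/longitude grid cells in place of counties. All numbers below are invented for illustration, the per-10,000-residents rate is just one possible normalization, and `to_cell` with its 0.5 degree cell size is a hypothetical helper rather than how eBird builds its map.

```r
# Hypothetical county-level counts and populations.
counties <- data.frame(
  county     = c("A", "B", "C"),
  sightings  = c(5, 80, 4000),
  population = c(10000, 200000, 2000000)
)

# Log scale: compresses the gap between small and very large counts.
counties$log_sightings <- log10(counties$sightings)

# Sightings per 10,000 residents, one possible population normalization.
counties$per_10k <- counties$sightings / counties$population * 1e4

# Grid alternative to counties: snap each observation to the south-west
# corner of a fixed-size latitude/longitude cell.
to_cell <- function(lat, lon, size = 0.5) {
  data.frame(cell_lat = floor(lat / size) * size,
             cell_lon = floor(lon / size) * size)
}

# Hypothetical observation coordinates.
pts <- data.frame(lat = c(40.71, 40.78, 41.30),
                  lon = c(-73.99, -73.97, -72.92))
cells <- to_cell(pts$lat, pts$lon)

# Count observations per grid cell.
cell_counts <- aggregate(n ~ cell_lat + cell_lon,
                         data = cbind(cells, n = 1L), FUN = sum)
print(counties)
print(cell_counts)
```

In this made-up data the ranking changes after normalization: county B has far more raw sightings than county A but a lower per-capita rate. The grid version trades county borders for a cell size that has to be tuned, since cells that are too small leave most of the map empty.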

Hey, Thanks!

Thank you for taking the time to read about my project! As someone who is passionate about ecology and animal behavior, I found building this application to be very rewarding. I really appreciated the new insights I gained as a result of running it. I am always trying to think about new ways to marry technology and environmental biology, and data science is an incredibly powerful tool that I can use to ask and answer questions in a field that I think is fascinating. Feel free to check out my application and I welcome any feedback! Here is a link to my GitHub repository if you would like to explore my code.

About Author

Alex Baransky

Alex graduated from Columbia University with training in natural and technical sciences. He enjoys finding ways to utilize data science to answer questions efficiently and to improve the interpretability of results. Alex takes pride in his ability to...