Data Visualization on Birds

Alex Baransky

Posted on Jul 30, 2018

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

The American Birder

For the millions of bird watchers in America, relevant and useful data resources are always welcome. Range maps and ecological histories enhance the bird watching experience by adding a layer of conservation awareness and help hobbyists become more acquainted with the birds they observe.

As a birder myself, I am always looking for new applications that help me achieve a greater understanding of the birds I observe on a daily basis. Learning more about these fascinating and beautiful animals helps to bring the bigger picture into focus; they have a much larger role in our world than just the momentary glimpse you get when observing them in a park or when walking down the street.

The Data

I chose to use the eBird dataset from the Cornell Lab of Ornithology to construct a Shiny application in R that allows a birder to "zoom out" from an isolated bird observation. eBird tracks the locations and times of bird observations using crowd-sourced data submitted from around the world. Through its online interface, a user can view observations of many different species of birds, explore bird-watching hot spots, and even watch a real-time map of observation submissions. Observations go back as far as 1900, and as technology becomes more accessible, new submissions keep pouring in as an ever-growing avalanche of data. Today the dataset consists of over 500 million observations!

The App

Using the Shiny package in R, I built an application that explores observations of 10 bird species in the United States from 2016. I aggregated the observations by county and used ggplot2 to map the frequency of observations across the US. A slider bar filters the observations by month, so the user can inspect the distribution of sightings during a particular season.
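The county aggregation and month slider described above come down to a filter-then-count step. The Python sketch below illustrates the idea outside of R. The `Observation` record and its fields are hypothetical stand-ins for the eBird columns, and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    # Hypothetical minimal record; a real eBird export has many more columns.
    species: str
    state: str
    county: str
    month: int  # 1-12


def county_counts(observations: Iterable[Observation], species: str,
                  first_month: int, last_month: int) -> Counter:
    """Count one species' observations per (state, county) within an inclusive month range."""
    counts: Counter = Counter()
    for obs in observations:
        if obs.species == species and first_month <= obs.month <= last_month:
            # Key on the state too, because county names such as "Washington" repeat across states.
            counts[(obs.state, obs.county)] += 1
    return counts


# Made-up sample rows, not real eBird data.
sample = [
    Observation("Short-eared Owl", "NY", "Suffolk", 1),
    Observation("Short-eared Owl", "NY", "Suffolk", 2),
    Observation("Short-eared Owl", "OR", "Washington", 2),
    Observation("Short-eared Owl", "PA", "Washington", 7),
    Observation("Northern Cardinal", "NY", "Suffolk", 1),
]

print(county_counts(sample, "Short-eared Owl", 1, 3))
```

With a January to March range, the July row and the other species drop out. The two Washington counties stay separate because the key includes the state.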

I also added a feature that filters the observations by breeding season, using estimated breeding-season ranges from the Cornell Lab of Ornithology's Birds of North America website. One feature I found especially interesting, and was excited to add, is the “play” button. It animates the map by cycling through the months of the year and displaying the corresponding bird sightings. This gives the user an important perspective on how different species move throughout the US, which can be lost when looking at separate monthly range maps one at a time.
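Two small pieces of logic sit behind these controls. The first is a month-range check that also handles seasons crossing the new year. The second is a per-month split of the data that an animation can step through. The sketch below is a hypothetical Python illustration, and the season bounds passed in are placeholders rather than values from Birds of North America. In Shiny itself, stepping a slider automatically is commonly done with the `animate` argument of `sliderInput()`, although the post does not say how the play button was built.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def in_season(month: int, start: int, end: int) -> bool:
    """True if month (1-12) falls in the inclusive range start..end, which may wrap past December."""
    if start <= end:
        return start <= month <= end
    # Wrapped range: e.g. start=11, end=2 covers Nov, Dec, Jan and Feb.
    return month >= start or month <= end


def monthly_frames(records: Iterable[Tuple[str, int]]) -> List[Tuple[int, Counter]]:
    """Split (county, month) records into 12 frames, one per month, in playback order."""
    frames = {m: Counter() for m in range(1, 13)}
    for county, month in records:
        frames[month][county] += 1
    return [(m, frames[m]) for m in range(1, 13)]


# Hypothetical breeding season of April-August, applied to made-up records.
records = [("Suffolk", 1), ("Suffolk", 5), ("Kings", 5), ("Suffolk", 12)]
breeding = [(c, m) for c, m in records if in_season(m, 4, 8)]
for month, counts in monthly_frames(breeding):
    if counts:
        print(month, dict(counts))
```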

Map that shows number of species observations in a specific month range

In addition to visualizing the movement of birds throughout the US, I wanted to add more functionality for bird watchers. One interesting question to ask was: at what time of day are birds most often seen?

To answer this question, I added a histogram that shows how often a given species was observed at each time of day. I found that 8:00 AM was by far the most common time for eBird users to submit an observation. I concluded that the data are biased: many more people are actively bird watching around 8:00 AM, so the number of observations spikes around that time. This does not necessarily mean that a species is more likely to be seen at 8:00 AM, only that more people are actively looking.
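The time-of-day histogram is essentially a count of observations per hour. The Python sketch below makes a hypothetical assumption that each observation carries a time as an "HH:MM" string. The sample times are invented to mimic the morning peak described above.

```python
from typing import Iterable, List


def hourly_histogram(times: Iterable[str]) -> List[int]:
    """Bin "HH:MM" time strings into 24 hourly counts (index 0 = the midnight hour)."""
    counts = [0] * 24
    for t in times:
        hour = int(t.split(":")[0])
        counts[hour] += 1
    return counts


def peak_hour(counts: List[int]) -> int:
    """Hour with the most observations (the earliest hour wins a tie)."""
    return max(range(24), key=lambda h: counts[h])


# Invented sample times, not eBird data.
times = ["07:45", "08:05", "08:30", "08:55", "17:10", "12:00"]
hist = hourly_histogram(times)
print(peak_hour(hist), hist[8])
```

A histogram like this records when people are out looking, not only when birds are active. This is the same observer bias described above.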

However, I did find a different trend for the only owl species on my list, the Short-eared Owl. Its sightings peaked at around 5:00 PM, which agrees with the fact that owls become more active around dusk. This leads me to believe that the feature is mainly useful for determining very basic activity patterns for certain species.

Histogram that shows the frequency of species observations by time of day

When viewing the range map, I found that regional movement of bird sightings was apparent, but state-level trends were easy to overlook. I therefore added a bar graph that breaks down observations by month for every state with sightings. This feature lets the user see a clear pattern in sighting frequency over the course of a year.

The graph makes it simple to tell whether a bird is a year-round resident or present only in certain seasons, which is important for understanding a species' seasonal distribution. Although I have not implemented it yet, a daily breakdown of sightings could yield important information on migration stopover sites (i.e., places where birds temporarily stop to refuel during migration).
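The per-state bar graph boils down to a 12-slot monthly count for each state. The resident-versus-seasonal reading can be approximated by checking which months have any sightings. This Python sketch is an assumption-laden illustration: the (state, month) input format and the `min_count` threshold are hypothetical. With sparse data, a resident species could still be labeled "seasonal" simply because no one reported it in some month.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_by_state(records: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Turn (state, month) records into {state: [Jan count, ..., Dec count]}."""
    table: Dict[str, List[int]] = defaultdict(lambda: [0] * 12)
    for state, month in records:
        table[state][month - 1] += 1
    return dict(table)


def residency(monthly: List[int], min_count: int = 1) -> Tuple[str, List[int]]:
    """Label a state's pattern "year-round" if every month reaches min_count, else "seasonal"."""
    present = [m + 1 for m, n in enumerate(monthly) if n >= min_count]
    label = "year-round" if len(present) == 12 else "seasonal"
    return label, present


# Made-up records: sightings in every month in NY, only in summer in ME.
records = [("NY", m) for m in range(1, 13)] + [("ME", 6), ("ME", 7), ("ME", 7)]
table = monthly_by_state(records)
for state, monthly in table.items():
    print(state, residency(monthly))
```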

Graph that shows the frequency of species observations by month for a specific state

The final section of my application lets the user inspect the data behind the graphics. This is useful for extracting a specific observation count for a certain time or location. The data table includes a search function that filters the data by county, state, or time of day.
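The table's search box behaves like a case-insensitive substring filter over a few columns. In an R Shiny app this kind of search usually comes built in (for example, with the DT package's data tables). The Python sketch below is only meant to show the logic, and its row layout and field names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, str]


def search_rows(rows: Iterable[Row], query: str,
                fields: Sequence[str] = ("county", "state", "time")) -> List[Row]:
    """Keep rows where any of the given fields contains the query, ignoring case."""
    q = query.strip().lower()
    if not q:
        return list(rows)
    return [row for row in rows if any(q in row.get(f, "").lower() for f in fields)]


# Made-up rows for illustration.
rows = [
    {"county": "Suffolk", "state": "NY", "time": "08:15", "count": "3"},
    {"county": "Kings", "state": "NY", "time": "17:05", "count": "1"},
    {"county": "Washington", "state": "OR", "time": "08:40", "count": "2"},
]
print(search_rows(rows, "ny"))
print(search_rows(rows, "08:"))
```

Because this is substring matching, a short query such as "ny" would also match any county name that contains those letters. A real table search behaves similarly unless it is restricted to a single column.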

Data tables that allow users to inspect specific values in the data set

Going Forward

Although the application is functional, there are several potential areas for improvement that I would like to address in the future.

First

The data for some species is quite large, which causes loading issues: graphics can take several seconds to render. This makes the map's play function difficult to use effectively in some cases. I believe these cases would benefit tremendously from further optimization of the code, from graphics packages with faster rendering, or ideally from a combination of both.

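One common way to make an animation like the play button cheaper is to aggregate once up front. Advancing a frame then becomes a lookup rather than a fresh pass over the raw observations. The Python sketch below illustrates that idea using a hypothetical (species, county, month) record format. It is not the app's code. In R, a similar effect could come from precomputing the monthly summaries before the server logic runs.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

Key = Tuple[str, int]  # (species, month)


def precompute(records: Iterable[Tuple[str, str, int]]) -> DefaultDict[Key, Counter]:
    """Single pass over (species, county, month) records -> {(species, month): county counts}."""
    cache: DefaultDict[Key, Counter] = defaultdict(Counter)
    for species, county, month in records:
        cache[(species, month)][county] += 1
    return cache


def frame(cache: DefaultDict[Key, Counter], species: str, month: int) -> Counter:
    """Counts for one animation frame; .get avoids inserting empty entries into the cache."""
    return cache.get((species, month), Counter())


# Made-up records.
records = [
    ("Short-eared Owl", "Suffolk", 1),
    ("Short-eared Owl", "Suffolk", 1),
    ("Short-eared Owl", "Kings", 2),
]
cache = precompute(records)
for month in range(1, 4):
    print(month, dict(frame(cache, "Short-eared Owl", month)))
```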
Second

I would like to let the user inspect a much larger list of species and range of years. Because of the size of the data, storage is a significant issue that may not be avoidable without a dedicated server to host the data.

Third

The observations by county are currently displayed on a log scale. I chose a log scale over the raw number of observations because many areas have 1-100 observations and were vastly overshadowed by areas with observations in the thousands. These lower-count areas can still show significant trends, and I wanted to make sure they were not ignored. Still, this approach does not address the bias in sighting frequency that comes purely from larger numbers of available birders.

Areas with larger populations produce more sightings simply because more people are actively looking for birds (the same issue I have with the time-of-day histogram). I would like to standardize county sightings by county population, which would give a more normalized representation of sighting frequency.

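Both ideas in this section are simple transforms of the county counts. The Python sketch below shows two of them. The first is a log scale using log10(1 + n), an assumption chosen so that zero-count counties stay at zero. The second is the proposed population standardization, expressed here as sightings per 100,000 residents. The county names and population figures are made up.

```python
import math
from typing import Dict


def log_scale(count: int) -> float:
    """log10(1 + count): 0 stays 0, and each extra digit in the count adds roughly 1."""
    return math.log10(1 + count)


def per_capita(counts: Dict[str, int], population: Dict[str, int],
               per: int = 100_000) -> Dict[str, float]:
    """Sightings per `per` residents; counties with no known (or zero) population are skipped."""
    return {
        county: n * per / population[county]
        for county, n in counts.items()
        if population.get(county)
    }


# Made-up counts and populations for two hypothetical counties.
counts = {"Big County": 5000, "Small County": 50}
population = {"Big County": 2_000_000, "Small County": 10_000}
print({c: round(log_scale(n), 2) for c, n in counts.items()})
print(per_capita(counts, population))
```

In this made-up example, the smaller county has one hundredth of the raw count but twice the per-capita rate. Raw or log-scaled totals hide that kind of difference.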
Finally

My current system groups observations by county. This limits location detail: many counties in the western United States are very large, which reduces the granularity of the map. Other bird range-mapping tools, such as the eBird map, instead use rectangular areas defined by latitude and longitude that do not depend on human-designated borders. Implementing this method could make the map's representation of sighting hot spots more accurate.
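Binning by latitude and longitude instead of by county means mapping each point to a grid cell. The Python sketch below uses a hypothetical 0.5-degree cell size and made-up coordinates. It is not meant to reproduce how the eBird map builds its grid. One caveat applies: equal-degree cells are not equal-area, because a degree of longitude spans less ground the farther a cell is from the equator.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, size: float = 0.5) -> Cell:
    """Index (row, col) of the size-by-size degree cell that contains the point."""
    # math.floor rather than int(): int() truncates toward zero, so a negative
    # longitude such as -73.99 would land in the wrong cell.
    return math.floor(lat / size), math.floor(lon / size)


def cell_bounds(cell: Cell, size: float = 0.5) -> Tuple[float, float, float, float]:
    """(south, west, north, east) edges of a cell, in degrees."""
    row, col = cell
    return row * size, col * size, (row + 1) * size, (col + 1) * size


def grid_counts(points: Iterable[Tuple[float, float]], size: float = 0.5) -> Counter:
    """Count observations per grid cell."""
    return Counter(grid_cell(lat, lon, size) for lat, lon in points)


# Made-up coordinates: two near New York City, one near Portland, Oregon.
points = [(40.71, -73.99), (40.78, -73.97), (45.52, -122.68)]
for cell, n in grid_counts(points).items():
    print(cell_bounds(cell), n)
```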

Hey, Thanks!

Thank you for taking the time to read about my project! As someone who is passionate about ecology and animal behavior, I found building this application to be very rewarding. I really appreciated the new insights I gained as a result of running it. I am always trying to think about new ways to marry technology and environmental biology, and data science is an incredibly powerful tool that I can use to ask and answer questions in a field that I think is fascinating. Feel free to check out my application and I welcome any feedback! Here is a link to my GitHub repository if you would like to explore my code.

About Author

Alex Baransky

Alex graduated from Columbia University with training in natural and technical sciences. He enjoys finding ways to utilize data science to answer questions efficiently and to improve the interpretability of results. Alex takes pride in his ability to...
