Exploratory Data Visualization of Manhattan Street Trees

Posted on May 1, 2016
Contributed by Belinda Kanpetch. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from April 11th to July 1st, 2016. This post is based on her first class project, R visualization (due in the 2nd week of the program).

Why Street Trees?

The New York City street tree can sometimes be taken for granted or go unnoticed. Located along paths of travel, street trees stand steady and patient, quietly going about their business of filtering pollutants out of our air, bringing us oxygen, providing shade during the warmer months, blocking winds during cold seasons, and relieving our sewer systems during heavy rainfall, all while beautifying our streets and neighborhoods. Some recent studies have found a link between the presence of street trees and lower stress levels in urban citizens.

So what makes a street tree different from any other tree? Mainly its location. A street tree is defined as any tree that lives within the public right of way, not in a park or on private property. Although street trees reside in the public right of way (or within the jurisdiction of the Department of Transportation), they are the property of, and cared for by, the NYC Department of Parks and Recreation.

To understand the data and explore what it was telling me, I started with some very basic questions:

  • How many street trees are there in Manhattan?
  • How many different species are there?
  • What is the general condition of the street trees?
  • What is the distribution of species by community district?
  • Is there a connection between the median income of a community district and its number of street trees?

The Data Set

The dataset used for this exploratory visualization was downloaded from the NYC Open Data Portal and was collected as part of TreeCount!2015, a street tree census maintained by the NYC Department of Parks and Recreation. The first census was conducted in 1995, and a census has been conducted every 10 years since by trained volunteers.

This dataset posed some challenges, including missing values in the form of unidentifiable species types. There were 2285 observations with an unclassifiable species type and 487 observations with an unclassifiable community district. Geographic information (longitude and latitude) was stored as character strings that had to be split into separate variables. Species were recorded as 4-letter codes without any reference to genus, species, or cultivar, so I had to find another dataset to decipher the codes.

See the code on GitHub.
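The cleaning steps described above (splitting the location string and decoding species codes) can be sketched roughly as follows. The original project was written in R; this is a minimal Python illustration, not the author's code. The column names (`location`, `species_code`), the `(lat, lon)` string format, and the lookup entries are all assumptions made for the example.

```python
import re

# Hypothetical lookup table for illustration only; in the project the
# code-to-name mapping came from a separate dataset.
SPECIES_LOOKUP = {
    "GLTR": "Honeylocust",
    "GIBI": "Ginkgo",
}


def split_location(raw):
    """Split a '(lat, lon)' character string into two floats.

    Returns (None, None) when the string does not hold exactly two numbers.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", raw or "")
    if len(numbers) != 2:
        return None, None
    return float(numbers[0]), float(numbers[1])


def clean_row(row):
    """Return a cleaned copy of one record (a dict with assumed keys)."""
    lat, lon = split_location(row.get("location"))
    code = (row.get("species_code") or "").strip().upper()
    return {
        "latitude": lat,
        "longitude": lon,
        "species_code": code or None,
        # Codes missing from the lookup are kept but labeled "Unknown".
        "species_name": SPECIES_LOOKUP.get(code, "Unknown"),
    }
```

Labeling unmatched codes as "Unknown" instead of dropping those rows keeps the total tree count intact, which matters when the missing-species group is later plotted alongside the identified species.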

Visualizing the data

A quick summary of the dataset revealed a total of 51,660 trees in Manhattan across 91 identifiable species, plus one ‘species’ group made up of missing values.

A bar plot of all 92 species groups (the 91 identified species plus the missing-value group) gave an interesting snapshot of the range in total number of trees per species. It was quite obvious that one species has a dominant presence. To get a better understanding of the counts and of which species were common, I broke the species down by quartiles and plotted them.

[Figure: bar plot of tree counts for all species]

Plotting the first quartile (totals below 3.75) revealed that there were several species with only a single tree in Manhattan!

[Figure: species distribution, first quartile]

Within the IQR (totals between 3.75 and 181.75) there was still quite a range, but nothing that stood out as a species to investigate.

[Figure: species distribution, interquartile range]

The distribution within the 4th quartile (totals from 181.75 to 11529) was informative in that it helped to visualize the dominance of two specific species, the Honeylocust and the Ornamental Pear, which make up 23% and 15% of all the trees in Manhattan respectively. Coming in close were Ginkgo trees with 9.47% and London Plane with 7.8%. This quartile also contained the missing species group ‘0’.

[Figure: species distribution, 4th quartile (species_dist_4)]
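The quartile breakdown and percentage shares described above can be sketched as follows. This is a Python illustration on toy data, not the author's R code and not the census figures. It assumes the cutoffs came from a standard quartile calculation; Python's `method="inclusive"` is used here because, to my understanding, it interpolates the same way as R's default `quantile()`. Placing species that fall exactly on a cutoff in the middle group is also an assumption.

```python
from collections import Counter
from statistics import quantiles


def quartile_groups(counts):
    """Split species into below-Q1, within-IQR, and above-Q3 groups."""
    q1, _median, q3 = quantiles(counts.values(), n=4, method="inclusive")
    groups = {"first_quartile": [], "iqr": [], "fourth_quartile": []}
    for species, total in sorted(counts.items()):
        if total < q1:
            groups["first_quartile"].append(species)
        elif total > q3:
            groups["fourth_quartile"].append(species)
        else:
            # Totals equal to a cutoff land in the middle group (assumption).
            groups["iqr"].append(species)
    return q1, q3, groups


def percent_shares(counts):
    """Share of all trees held by each species, in percent."""
    grand_total = sum(counts.values())
    return {s: round(100 * n / grand_total, 2) for s, n in counts.items()}


# Toy data: one entry per tree, labeled with a made-up species name.
trees = ["A"] * 1 + ["B"] * 2 + ["C"] * 3 + ["D"] * 4 + ["E"] * 10
counts = Counter(trees)
q1, q3, groups = quartile_groups(counts)
shares = percent_shares(counts)
```

On this toy data the single-tree species "A" falls below the first cutoff and the dominant species "E" lands alone in the top group, which mirrors the pattern the plots revealed in the real data.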

A palette of the top 4 species in Manhattan.

[Figure: palette of the top 4 species (species_palette)]

Looking at data on trees by Community District

I wanted to look at community districts as opposed to zip codes because, in my opinion, community districts are more representative of community cohesiveness and character. So I plotted the distribution of trees by community district and tree condition.

[Figure: tree condition by community district (species_health_comdist)]
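The table behind a plot like this is a count of trees for each combination of community district and condition. A minimal Python sketch of that tally is shown below; it is not the author's R code, and the field names (`community_district`, `condition`) and condition labels are assumptions made for the example.

```python
from collections import Counter, defaultdict


def condition_by_district(rows):
    """Count trees per condition within each community district."""
    table = defaultdict(Counter)
    for row in rows:
        table[row["community_district"]][row["condition"]] += 1
    # Convert to plain dicts, ordered by district, for easy inspection.
    return {district: dict(c) for district, c in sorted(table.items())}


# Toy records, not census data.
sample_rows = [
    {"community_district": 102, "condition": "Good"},
    {"community_district": 101, "condition": "Good"},
    {"community_district": 101, "condition": "Poor"},
    {"community_district": 101, "condition": "Good"},
]
tally = condition_by_district(sample_rows)
```

Swapping the `condition` field for a species field would give the counts needed for a per-district species comparison in the same way.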

Plotting the species distribution by community district using a facet grid helped visualize other species that did not appear dominant in the previous graphs. It would be interesting to look further into what those species are and why they are more dominant within some community districts than others.

[Figure: species distribution by community district (species_dist_comdist)]

Attempts at mapping

The ultimate goal was to map each individual tree location on a map of Manhattan with the community districts outlined or shaded in. I tried two approaches: plotting with leaflet, and bringing in shapefiles, converting them to a data frame, and plotting with ggplot. Neither yielded anything useful. The only visualization I was able to get came from qplot, which took over 2 hours to render.

[Figure: tree locations plotted as points (map_point)]

Next steps:

I would like to take this into our next project, which uses Shiny. For this project I would:

  • Map tree locations with community district outlines.
  • Scale the analysis to include all 5 boroughs for comparison.
  • Add variables, such as drought and soil tolerances, to better understand the tree characteristics by geographic location.
  • Look at previous years' censuses to compare tree condition through time.

About Author

Belinda Kanpetch

Belinda hails from the Bayou City, where she earned her Bachelor of Architecture from the University of Houston. Drawn to the bright lights and energy of the city, she relocated to New York to attend Columbia University at...