A Tree Grows in New York City

KISOO CHO

Posted on Aug 11, 2019

Project GitHub | LinkedIn: Niki Moritz Hao-Wei Matthew Oren

The skills we demoed here are taught in NYC Data Science Academy's Data Science with Machine Learning bootcamp.

Introduction

While more famous for its skyscrapers, New York City is also home to many trees of many varieties. How many? That question has to be broken down further: How many trees are there in New York City altogether? Which variety is most common overall? How do the five boroughs differ in tree density and in the issues surrounding trees? To answer these questions, we dug into the street tree census data and the 311 service request data from NYC Open Data to see whether there is a relationship between complaints about trees and the street tree data.

Tree distribution

Let's start with a look at the overall distribution of trees, visualized below. The census records about 680,000 street trees in New York City.

Screen-Shot-2019-08-10-at-1.41.25-AM | Data Science Blog

Next, let's look at the distribution of street trees and 311 requests in New York City. Marker size reflects the size of the tree, and only a randomly sampled 10 percent of the trees are plotted.

Screen-Shot-2019-08-10-at-1.43.20-AM | Data Science Blog

Trees and requests are displayed on the same map. As expected, the more trees there are, the more complaints there are. The map mainly shows that both are distributed fairly evenly across the city.
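
The 10 percent random sample used for the map can be drawn along the following lines. This is only a sketch: the record layout (an ID and a diameter) and the function name `sample_trees` are assumptions for illustration, not the code behind the original app.

```python
import random

def sample_trees(trees, fraction=0.10, seed=0):
    """Return a random subset holding roughly `fraction` of the records."""
    rng = random.Random(seed)        # fixed seed so the same sample is drawn on every run
    k = round(len(trees) * fraction)
    return rng.sample(trees, k)      # sampling without replacement

# Hypothetical records (tree_id, diameter); not real census rows.
trees = [(i, 1 + i % 30) for i in range(1000)]
plotted = sample_trees(trees)
print(len(plotted), "of", len(trees), "trees plotted")
```

Plotting a sample keeps the map responsive, at the cost of hiding some individual trees.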

Distribution by NTA

Screen-Shot-2019-08-10-at-1.45.45-AM | Data Science Blog

This map shows the distribution of trees by Neighborhood Tabulation Area (NTA). Staten Island and Queens have many trees.

Screen-Shot-2019-08-10-at-1.47.01-AM | Data Science Blog

But when the tree count is divided by area to get density, Manhattan turns out to be dense with trees.
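
Dividing a count by area is simple, but it changes the ranking, as this small sketch shows. The region names and numbers are made up for illustration and are not census figures; `tree_density` is a hypothetical helper.

```python
def tree_density(tree_counts, areas):
    """Trees per unit area for every region present in both mappings."""
    return {
        region: tree_counts[region] / areas[region]
        for region in tree_counts
        if region in areas and areas[region] > 0
    }

# Made-up numbers for illustration only.
tree_counts = {"Region A": 9000, "Region B": 3000}
areas_sq_mi = {"Region A": 60.0, "Region B": 10.0}

density = tree_density(tree_counts, areas_sq_mi)
for region, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {value:.0f} trees per square mile")
```

Region A has three times as many trees, yet Region B is twice as dense, which is the same kind of reversal seen between the count map and the density map.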

Distribution by postcode

Screen-Shot-2019-08-10-at-1.48.00-AM | Data Science Blog

This time, let's compare the number of 311 requests with the number of trees by region. The two datasets were joined on postcode because the 311 data contains no NTA information. Some records were cleaned because postcode classification may not be consistent between the postcode map data and the 311 data. The results may also be distorted for very small areas. As a general rule, areas with many trees have many complaints. In a few areas, however, that ratio doesn't hold, and complaints come out above or below the usual ratio.

Screen-Shot-2019-08-10-at-1.50.29-AM | Data Science Blog

See the visualization above. Relative to their area, red regions have more 311 requests than blue ones. But I can't find anything unusual based on this alone.
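
The postcode comparison can be sketched as a join on postcode followed by a ratio check. The postcodes below are placeholders, the counts are invented, and the factor-of-two threshold is an arbitrary assumption rather than the rule used in the original analysis.

```python
from collections import Counter

def ratios_by_postcode(tree_postcodes, request_postcodes):
    """Requests per tree for postcodes found in both datasets, plus the overall ratio."""
    trees = Counter(tree_postcodes)
    requests = Counter(request_postcodes)
    shared = sorted(trees.keys() & requests.keys())   # drop postcodes missing on either side
    if not shared:
        return {}, 0.0
    ratios = {pc: requests[pc] / trees[pc] for pc in shared}
    overall = sum(requests[pc] for pc in shared) / sum(trees[pc] for pc in shared)
    return ratios, overall

def flag_unusual(ratios, overall, factor=2.0):
    """Postcodes whose ratio is more than `factor` times above or below the overall ratio."""
    return {pc: r for pc, r in ratios.items()
            if r > overall * factor or r < overall / factor}

# Placeholder postcodes; one record per tree or per request.
tree_postcodes = ["PC1"] * 10 + ["PC2"] * 20 + ["PC3"] * 10 + ["PC9"] * 5
request_postcodes = ["PC1"] * 2 + ["PC2"] * 4 + ["PC3"] * 9 + ["PC0"] * 3

ratios, overall = ratios_by_postcode(tree_postcodes, request_postcodes)
print(flag_unusual(ratios, overall))
```

Keeping only the postcodes present in both datasets mirrors the cleaning step described above, and it also means very small or mismatched areas silently drop out of the comparison.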

Distribution by borough

Screen-Shot-2019-08-10-at-1.59.30-AM | Data Science Blog

This is a comparison of tree density and complaints per tree by borough. New tree requests were excluded so that only actual complaints are counted. Although Manhattan is the densest in trees, Brooklyn has many more complaints. The requests in Manhattan are mostly for new trees. You might conclude that Manhattan is relatively good at managing its trees. Or maybe the tree species planted in Manhattan simply generate fewer complaints.
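
A minimal sketch of the complaints-per-tree calculation, with new tree requests filtered out first. The label `"New Tree Request"`, the record layout, and the borough placeholders are assumptions; the actual 311 field values should be checked before reusing this.

```python
from collections import Counter

NEW_TREE = "New Tree Request"  # assumed label; verify against the real 311 data

def complaints_per_tree(requests, tree_counts):
    """Complaints per tree by borough, ignoring requests for new trees."""
    complaints = Counter(borough for borough, kind in requests if kind != NEW_TREE)
    return {b: complaints[b] / n for b, n in tree_counts.items() if n > 0}

# Invented records (borough, request type) and tree counts.
requests = [
    ("Borough X", "Dead/Dying Tree"),
    ("Borough X", NEW_TREE),
    ("Borough X", NEW_TREE),
    ("Borough Y", "Dead/Dying Tree"),
    ("Borough Y", "Damaged Tree"),
]
tree_counts = {"Borough X": 10, "Borough Y": 4}

rates = complaints_per_tree(requests, tree_counts)
print(rates)
```

Without the filter, Borough X would show three requests for ten trees. With it, only one counts as a complaint.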

Complaints about street trees

Screen-Shot-2019-08-11-at-8.29.58-AM | Data Science Blog

Screen-Shot-2019-08-11-at-8.31.21-AM | Data Science Blog

This is the 311 service request data. Setting aside new tree requests, which rank first, the most common descriptor among street tree complaints is a dead or dying tree. Grouping similar complaints, such as dead branches and poor condition, makes that share even larger. Given so many complaints about dead trees, I expected that NYC would have planted many trees of species with high health rates.

Ranking of street trees by quantity and health ratio

Now, let's check how the street trees rank by health status. I compared each species' rank by number of trees in NYC with its rank by proportion of healthy trees.

Screen-Shot-2019-08-10-at-2.06.34-AM | Data Science Blog

Here is the result. The two rankings are not related at all.

In case species with very few trees distort the picture, I also checked only the top 20 species by quantity. The result is the same, as follows.

Screen-Shot-2019-08-10-at-2.07.41-AM | Data Science Blog
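
One way to put a number on "not related" is a rank correlation between quantity and healthy share. The sketch below uses Spearman's coefficient, which is my choice for illustration and not necessarily what the original analysis computed. The species table is invented, and ties are not handled.

```python
def ranks(values):
    """Rank from 1 (largest) downward; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)                      # needs at least two observations
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented table: (species, tree count, share of trees rated healthy).
species = [
    ("species A", 500, 0.70),
    ("species B", 400, 0.85),
    ("species C", 300, 0.60),
    ("species D", 200, 0.90),
    ("species E", 100, 0.75),
]

top_n = 5  # the post uses the top 20; 5 keeps the toy example short
top = sorted(species, key=lambda s: s[1], reverse=True)[:top_n]
rho = spearman([s[1] for s in top], [s[2] for s in top])
print(round(rho, 2))
```

A value near 0 indicates that the two rankings are unrelated, while values near 1 or -1 indicate that they agree or run opposite to each other.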

Percentage of tree quantity by multiple criteria

Screen-Shot-2019-08-10-at-2.11.22-AM | Data Science Blog

Now let's look for other relevant patterns by checking the percentage of trees by status, by region, and by variety. The most common trees are the London planetree and the honeylocust, as pictured above. But it's hard to find anything significant from this.

Number of trees by multiple criteria

Screen-Shot-2019-08-10-at-2.12.47-AM | Data Science Blog

This view shows which tree species are the most numerous and which tree sizes are the most common.

Diameter of tree by multiple criteria

Screen-Shot-2019-08-10-at-2.16.13-AM | Data Science Blog

Comparing average tree size by borough, Manhattan's trees are the smallest.

Screen-Shot-2019-08-10-at-2.21.29-AM | Data Science Blog

The average tree size in Manhattan is small because it has many honeylocusts, which are relatively small trees.

I'm guessing they planted a lot of trees that weren't too big because it's downtown.

Screen-Shot-2019-08-10-at-2.25.15-AM | Data Science Blog

Queens, on the other hand, has a relatively large number of London planetrees, so its average tree size is large.

Compare data for each borough

Screen-Shot-2019-08-10-at-2.26.30-AM | Data Science Blog

I compared data by borough. This reveals that the number of complaints and the number of trees are in proportion.

Screen-Shot-2019-08-10-at-2.27.14-AM | Data Science Blog

Screen-Shot-2019-08-10-at-2.27.54-AM | Data Science Blog

The graphs also show a proportional relationship between the number of complaints and the size of the trees, so complaints appear to track both the quantity and the size of trees. You might assume that there are more trees away from the city center, or that more trees bring more complaints, but the relationship in the data does not explain cause and effect.

Conclusion and future work

The number of trees, the number of complaints, and the size of trees were shown to be proportional when divided by borough. Whether the number of complaints increases with the size of the trees or with their species cannot be determined from the available information.

I tried hard to find a strong relationship, but good results were hard to come by. The only quantitative variable is the diameter of the tree, and apart from region, it was hard to find suitable variables for classification.

If I have time later, I would like to use the data related to pedestrian inconvenience, which I barely used in this analysis. It would also be interesting to analyze abnormal stem or root conditions and related data.

The Shiny app I built for the above analysis can be found here: Shinyapp of NYC street tree
