Pizza. Everyone loves pizza.

Posted on May 4, 2014

Contributed by Laila El Gohary.
Laila took the R003 class with Vivian Zhang (Data Science by R, Intensive Beginner Level) in March-April 2014 and did great in class. This post is based on her final project submission.
---------------------------------------

While preparing my final project for the R intensive beginner class, there were a lot of things (big and small) that I learnt along the way.

1. It's usually the small things that trip you up and have you searching the internet for hours for the answer.  Don't do it!  (Unless, of course, it's absolutely necessary to move your project along.)

2. Finding a reliable geocoding API or software that:

(a) takes more than a couple of thousand hits a day,

(b) doesn't cost a fortune, and

(c) moves faster than a snail's pace

...is a very hard thing to do.  Thankfully I found one, which I will go through later in this post!

3. Creating an aesthetically pleasing map takes some trial and error, and a few hours.  Make sure you've given yourself enough time to accomplish your goal.

With that, let's begin!

Acquire:
Finding an appropriate dataset is usually the most time-consuming part of the process. Luckily I was able to find just what I was looking for at the NYC Open Data site. I decided to look at restaurant sanitation grades across the 4 boroughs.

Clean:
As I noted earlier, it's always the smallest things that slow you down. In my case it was removing rows that had zero in the borough column, which indicated that the restaurants in question weren't in any of the 4 boroughs.

Here's how I did it.

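# Mark rows with a borough code of 0 as missing, then drop them.
# Note: na.omit() also drops rows that have NA in any other column.
df$BORO[df$BORO == 0] <- NA
df <- na.omit(df)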

Another slightly more elegant way to do this is:

#Looking at column 3, keep any rows that are not equal to zero.
df <- df[df[, 3] != "0", ]
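
The two approaches are not quite equivalent. Here is a minimal sketch on a made-up data frame (the names and values are invented for illustration, not taken from the NYC dataset) showing that na.omit() drops a row with a missing value in any column, while the direct filter only looks at the borough code.

```r
# Toy data: invented values, not the real NYC Open Data columns.
toy <- data.frame(
  NAME  = c("A", "B", "C", "D"),
  PHONE = c("111", NA, "333", "444"),
  BORO  = c(1, 2, 0, 4),
  stringsAsFactors = FALSE
)

# Approach 1: recode 0 as NA, then drop every row containing any NA.
a1 <- toy
a1$BORO[a1$BORO == 0] <- NA
a1 <- na.omit(a1)

# Approach 2: keep rows whose borough code is not zero.
a2 <- toy[toy$BORO != 0, ]

a1$NAME  # "A" "D"      (row B is lost because its PHONE is missing)
a2$NAME  # "A" "B" "D"
```

If the dataset has missing values elsewhere, the second approach is probably the safer choice.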

Geocode:

Finding free geocoding software that does the job can be difficult.  I tried a couple of options, but the one that worked best for me was the Data Science Toolkit API.

Make sure before you begin that all elements of the address are in one string, and that there are no characters in there that shouldn’t be.  Those were a few hours of my life I wish I could get back!
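
As a rough sketch of that preparation step, the snippet below pastes the address parts into one string and strips stray characters. The column names (BUILDING, STREET, ZIPCODE), the sample values, the helper name clean_address, and the choice of which characters to keep are all assumptions for illustration; adjust them to whatever your data actually contains.

```r
# Hypothetical address parts; the real column names may differ.
parts <- data.frame(
  BUILDING = c("123 ", "45-67"),
  STREET   = c("  MAIN   STREET", "BROADWAY #2"),
  ZIPCODE  = c("10001", "11101"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one clean address string per row.
clean_address <- function(building, street, zip) {
  addr <- paste(building, street, "New York, NY", zip)
  addr <- gsub("[^A-Za-z0-9 ,.-]", "", addr)  # drop unexpected characters
  addr <- gsub("\\s+", " ", addr)             # collapse repeated spaces
  trimws(addr)                                # remove leading/trailing spaces
}

parts$address <- clean_address(parts$BUILDING, parts$STREET, parts$ZIPCODE)
parts$address
# "123 MAIN STREET New York, NY 10001" "45-67 BROADWAY 2 New York, NY 11101"
```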

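#GEOCODING FUNCTION
library(httr)
library(rjson)

geo.dsk <- function(addr){
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  # Send the address as a query parameter; the API answers with JSON
  response <- GET(url, query = list(sensor = "FALSE", address = addr))
  # Read the body as text (as = "text") before parsing it
  json <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  # Take the coordinates of the first match
  loc  <- json$results[[1]]$geometry$location
  # c() returns a named character vector, so the coordinates come back as text
  return(c(address = addr, long = loc$lng, lat = loc$lat))
}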

Once you have the function, you need to call it, and pass through the necessary addresses.

#Calling the function
# Geocode each address, then stack the results into one data frame
result <- do.call(rbind, lapply(as.character(df$address), geo.dsk))
result <- data.frame(result)
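
With thousands of addresses, a single failed lookup would stop the whole run. One possible safeguard, sketched below, is to wrap the geocoder in tryCatch() so a failure yields NA coordinates instead of an error. The names safe_geocode and fake_geocoder are hypothetical; fake_geocoder only stands in for a real lookup so the example runs without a network connection.

```r
# Wrap any geocoder so that a failure yields NA coordinates instead of an error.
safe_geocode <- function(addr, geocoder) {
  tryCatch(
    geocoder(addr),
    error = function(e) c(address = addr, long = NA, lat = NA)
  )
}

# Stand-in geocoder for demonstration only: fails on an empty address.
fake_geocoder <- function(addr) {
  if (addr == "") stop("no match")
  c(address = addr, long = "-73.99", lat = "40.73")
}

addrs  <- c("1 Example St", "")
result <- do.call(rbind, lapply(addrs, safe_geocode, geocoder = fake_geocoder))
result <- data.frame(result, stringsAsFactors = FALSE)
result  # second row has NA for long and lat
```

In real use you would pass geo.dsk in place of fake_geocoder and then inspect the rows with NA coordinates by hand.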

Once you’ve managed to geocode everything and join it back to your data frame, you’re now ready for the fun part… mapping!
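
The join itself is not shown in this post, so here is one way it could look, using invented stand-in data. It assumes the geocoder output has the columns address, long and lat as text, and it renames long to lon because the plotting code further down refers to lon. The name finaldf matches the data frame used later, but how it was actually built is an assumption.

```r
# Invented stand-ins for the restaurant data and the geocoder output.
df <- data.frame(
  name    = c("Cafe A", "Cafe B", "Cafe C"),
  address = c("1 Example St", "2 Example Ave", "1 Example St"),
  stringsAsFactors = FALSE
)
result <- data.frame(
  address = c("1 Example St", "2 Example Ave", "1 Example St"),
  long    = c("-73.99", "-73.95", "-73.99"),
  lat     = c("40.73", "40.78", "40.73"),
  stringsAsFactors = FALSE
)

# Coordinates arrive as text, so convert them to numbers.
result$lon  <- as.numeric(result$long)
result$lat  <- as.numeric(result$lat)
result$long <- NULL

# Several restaurants can share one address; keep one row per address
# so the merge does not multiply rows.
result  <- unique(result)
finaldf <- merge(df, result, by = "address", all.x = TRUE)
finaldf
```

Using all.x = TRUE keeps restaurants whose address could not be geocoded, with NA coordinates, so nothing disappears silently.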

Mapping:

For my project I used ggmap because I was really interested in using Stamen maps.

#SETTING UP THE BASIC MAP
library(ggmap)

# The bounding box is given as left/bottom/right/top in degrees
nycmap <- get_map(location = c(left = -74.045448, bottom = 40.544714, right = -73.629591, top = 40.928859),
                  source = "stamen",
                  zoom = 13,
                  maptype = "toner",
                  urlonly = FALSE,
                  filename = "ggmapnyc")

You’ll notice above that for location, instead of using a fixed point and a zoom, I used the four corners that I wanted my map to plot within.  It makes things much more specific, and worked really well for a tricky geographical location like New York.

When pulling a map through Stamen, it’s good to know that what you get isn’t an image, but a bunch of color tiles.  To see them you need to plot them using ggmap.

ggmap(nycmap)

[Image: initial map]

After using the Stamen toner map I found the black to be too harsh for my purposes.

I was able to replace all the black with a light shade of blue.

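# Save the bounding box so it can be restored after the replacement
cny <- attr(nycmap, "bb")
# Swap every pure-black pixel for the lighter color
nycmap[nycmap == "#000000"] <- "#5DCFC3"
# Restore the class and bounding box so ggmap() can still plot the object
class(nycmap) <- c("ggmap", "raster")
attr(nycmap, "bb") <- cny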
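
As far as the code above suggests, the trick works because the map is stored as a matrix of hex color strings, so a logical index picks out every matching pixel at once. A minimal sketch on a made-up 2x2 matrix (a stand-in for the real map, which is far larger; the helper name recolor is hypothetical):

```r
# Invented 2x2 "tile" of hex colors standing in for the downloaded map.
tiles <- matrix(c("#000000", "#FFFFFF", "#000000", "#CCCCCC"), nrow = 2)

# Replace one color with another wherever it occurs.
recolor <- function(m, from, to) {
  m[m == from] <- to
  m
}

lighter <- recolor(tiles, "#000000", "#5DCFC3")
lighter  # both black cells are now "#5DCFC3"; the other two are unchanged
```

The same idea could be repeated for other colors, for example to soften the grays as well.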

For the final part of my project, I wanted to map particular cuisines to see if there were any clusters in certain parts of the city.

My first map was of Chinese food.

chinese <- subset(finaldf, CODEDESC == "Chinese")
ggmap(nycmap) +
  geom_point(aes(x = lon, y = lat), data = chinese, alpha = .5, color = "darkred", size = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.title.y = element_blank(), axis.text.y = element_blank(),
        title = element_text(face = "bold", size = 20)) +
  labs(title = "Map of Chinese Restaurants")

[Image: map of Chinese restaurants]

One of the great things about ggmap is the seemingly endless number of ways to customize it and make things look the way you’d like.

In the above map, I wanted to make sure to remove the x and y axis titles and labels, as well as add a title.

Looking at American restaurants, we can see that there is a big grouping in Manhattan.

[Image: map of American restaurants]

French food is particularly hot in Manhattan, but if you want some French cuisine in the outer boroughs you can forget about it.

[Image: map of French restaurants]

I decided to look at the pizza places in the city. I guessed that the map would look a lot like the American restaurants map: pizza everywhere, but much more densely packed in Manhattan.
This is what happened instead.

[Image: map of pizza places]

I knew New Yorkers loved pizza.
But it looks like pizza may be the great equalizer of the city.
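
One way to put a number on that visual impression is to compute, for each cuisine, the share of its restaurants that falls in each borough. The sketch below uses invented rows, not results from the real dataset; only the column names BORO and CODEDESC come from the code above.

```r
# Invented rows for illustration; these are not counts from the NYC data.
toy <- data.frame(
  CODEDESC = c("French", "French", "French", "French",
               "Pizza", "Pizza", "Pizza", "Pizza"),
  BORO     = c(1, 1, 1, 3,
               1, 2, 3, 4)
)

# Count restaurants per cuisine and borough, then turn each row into shares.
counts <- table(toy$CODEDESC, toy$BORO)
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

A cuisine that clusters shows one dominant share in its row, while an evenly spread cuisine has similar shares across boroughs.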
