Pizza. Everyone loves pizza.
Contributed by Laila El Gohary.
Laila took the R003 class (Data Science by R, Intensive Beginner Level) with Vivian Zhang in Mar-Apr 2014 and did great in class. This post is based on her final project submission.
---------------------------------------
While preparing my final project for the R intensive beginner class, there were a lot of things (big and small) that I learnt along the way.
1. It's usually the small things that trip you up and have you searching the internet for hours for the answer. Don't do it! (Unless, of course, it's absolutely necessary to move your project along.)
2. Finding a reliable geocode API or software that:
(a) takes more than a couple of thousand hits a day,
(b) doesn't cost a fortune, and
(c) moves faster than a snail's pace
is a very hard thing to do. Thankfully, I found one that I will go through later in this post!
3. Creating an aesthetically pleasing map takes some trial and error, and a few hours. Make sure you've given yourself enough time to accomplish your goal.
With that, let's begin!
Acquire:
Finding an appropriate dataset is usually the most time-consuming part of the process. Luckily, I was able to find just what I was looking for at the NYC Open Data site. I decided to look at restaurant sanitation grades across the 4 boroughs.
Clean:
As I noted earlier, it's always the smallest things that slow you down. In my case it was removing rows that had zero in the borough column, which indicated that the restaurants in question weren't in any of the 4 boroughs.
Here's how I did it.
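df$BORO[df$BORO == 0] <- NA   # mark rows with a zero borough as missing
df <- na.omit(df)             # note: drops rows with an NA in any column, not just BORO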
Another slightly more elegant way to do this is:
df <- df[df[, 3] != '0', ]
# Looking at column 3, keep any rows that are not equal to zero.
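To compare the two approaches on a small scale, here is a minimal sketch using a made-up data frame. The column names and values are illustrative assumptions, not the real NYC data. It shows that `na.omit()` removes rows with an `NA` in any column, while the direct filter only looks at the borough column.

```r
# Toy data standing in for the inspection data (hypothetical values).
toy <- data.frame(
  NAME  = c("A", "B", "C", "D"),
  GRADE = c("A", NA, "B", "A"),
  BORO  = c(1, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Approach 1: mark zero boroughs as NA, then drop every row containing an NA.
toy_na <- toy
toy_na$BORO[toy_na$BORO == 0] <- NA
toy_na <- na.omit(toy_na)

# Approach 2: keep only rows whose borough is not zero.
toy_filter <- toy[toy$BORO != 0, ]

nrow(toy_na)      # restaurant B is also dropped here because its GRADE is NA
nrow(toy_filter)  # restaurant B is kept
```

If other columns in your data have missing values you want to keep, the second approach is probably the safer choice.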
Geocode:
Finding appropriate geocoding software for free can be difficult. I tried a couple of options, but the one that worked best for me was the Data Science Toolkit API.
Make sure before you begin that all elements of the address are in one string, and that there are no characters in there that shouldn’t be. Those were a few hours of my life I wish I could get back!
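As a rough sketch of that preparation step, the address parts can be pasted into one string and cleaned with `gsub()`. The column names (`BUILDING`, `STREET`, `ZIPCODE`), the sample values, the appended "New York, NY", and the helper name `clean_address` are all assumptions for illustration; adjust them to whatever your data actually contains.

```r
# Hypothetical address parts; the real column names may differ.
parts <- data.frame(
  BUILDING = c("123", "45-10 "),
  STREET   = c("  MAIN   STREET", "QUEENS BLVD #2"),
  ZIPCODE  = c("10001", "11104"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one clean address string per row.
clean_address <- function(building, street, zip) {
  addr <- paste(building, street, "New York, NY", zip)
  # Drop everything except letters, digits, spaces, commas and hyphens.
  addr <- gsub("[^A-Za-z0-9 ,-]", "", addr)
  # Collapse runs of spaces into a single space.
  addr <- gsub(" +", " ", addr)
  trimws(addr)
}

parts$address <- clean_address(parts$BUILDING, parts$STREET, parts$ZIPCODE)
print(parts$address)
```

Which characters count as "shouldn't be there" depends on the geocoder, so treat the pattern above as a starting point.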
# GEOCODING FUNCTION
geo.dsk <- function(addr){
  require(httr)
  require(rjson)
  url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url, query = list(sensor = "FALSE", address = addr))
  json <- fromJSON(content(response, type = "text"))
  loc <- json['results'][[1]][[1]]$geometry$location
  return(c(address = addr, long = loc$lng, lat = loc$lat))
}
Once you have the function, you need to call it, and pass through the necessary addresses.
# Calling the function
result <- do.call(rbind, lapply(as.character(df$address), geo.dsk))
result <- data.frame(result)
Once you’ve managed to geocode everything and join it back to your data frame, you’re now ready for the fun part… mapping!
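The post doesn't show the join itself, so here is one possible way to do it in base R, using small made-up stand-ins for `result` and `df`. Because the geocoding function returns the address and the coordinates in one vector, the coordinates come back as text and need converting to numbers first. Renaming `long` to `lon` and calling the joined frame `finaldf` are assumptions chosen to match the names used in the mapping code.

```r
# Stand-in for the geocoder output: every column arrives as text because
# c(address = ..., long = ..., lat = ...) mixes strings and numbers.
result <- data.frame(
  address = c("1 A St", "2 B Ave"),
  long    = c("-73.99", "-73.95"),
  lat     = c("40.73", "40.78"),
  stringsAsFactors = FALSE
)

# Stand-in for the cleaned restaurant data (hypothetical rows).
df <- data.frame(
  address  = c("1 A St", "2 B Ave", "3 C Rd"),
  CODEDESC = c("Pizza", "Chinese", "French"),
  stringsAsFactors = FALSE
)

# Convert coordinates to numbers; as.character() guards against factor columns.
result$lon  <- as.numeric(as.character(result$long))
result$lat  <- as.numeric(as.character(result$lat))
result$long <- NULL

# Left join: keep every restaurant, with NA coordinates where geocoding found nothing.
finaldf <- merge(df, result, by = "address", all.x = TRUE)
print(finaldf)
```

Using `all.x = TRUE` keeps restaurants that failed to geocode, which makes it easy to count how many addresses still need attention.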
Mapping:
For my project I used ggmap because I was really interested in using stamen maps.
# SETTING UP THE BASIC MAP
library(ggmap)
nycmap <- get_map(location = c(left = -74.045448, bottom = 40.544714, right = -73.629591, top = 40.928859),
                  source = "stamen", zoom = 13, maptype = "toner",
                  urlonly = FALSE, filename = "ggmapnyc")
You’ll notice above that for location, instead of using a fixed point and a zoom, I used the four corners that I wanted my map to plot within. It makes things much more specific, and worked really well for a tricky geographical location like New York.
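If you'd rather not pick the corners by hand, one alternative is to derive them from the geocoded points themselves. This is only a sketch: the helper name `bbox_from_points`, the padding value, and the sample coordinates are all made up for illustration, and it is not how the corners in this post were chosen.

```r
# Hypothetical coordinates for a few geocoded points.
pts <- data.frame(
  lon = c(-74.00, -73.95, -73.80),
  lat = c(40.60, 40.75, 40.85)
)

# Hypothetical helper: smallest box containing all points, plus a margin.
bbox_from_points <- function(lon, lat, pad = 0.02) {
  c(left   = min(lon, na.rm = TRUE) - pad,
    bottom = min(lat, na.rm = TRUE) - pad,
    right  = max(lon, na.rm = TRUE) + pad,
    top    = max(lat, na.rm = TRUE) + pad)
}

bbox <- bbox_from_points(pts$lon, pts$lat)
print(bbox)
```

The resulting named vector has the same left/bottom/right/top layout as the `location` argument used above, so it could be passed in its place.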
When pulling a map through stamen, it’s good to know that what you get isn’t an image, but a bunch of color tiles. To see them you need to map them using ggmap.
ggmap(nycmap)
After using the stamen toner map I found the black to be too harsh for my purposes.
I was able to replace all the black with a light shade of blue.
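cny <- attr(nycmap, "bb")                  # save the bounding box attribute
nycmap[nycmap == "#000000"] <- "#5DCFC3"   # replace black with light blue
class(nycmap) <- c("ggmap", "raster")      # set the class that ggmap() expects
attr(nycmap, "bb") <- cny                  # reattach the bounding box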
For the final part of my project, I wanted to map particular cuisines to see if there were any clusters in certain parts of the city.
My first map was of Chinese food.
chinese <- subset(finaldf, CODEDESC == "Chinese")
ggmap(nycmap) +
  geom_point(aes(x = lon, y = lat), data = chinese, alpha = .5, color = "darkred", size = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.title.y = element_blank(), axis.text.y = element_blank(),
        title = element_text(face = "bold", size = 20)) +
  labs(title = "Map of Chinese Restaurants")
One of the great things about ggmap is the seemingly endless ways to customize it and make things look the way you'd like.
In the above map, I wanted to make sure to remove the x and y axes, as well as add a title.
Looking at American restaurants, we can see that there is a big grouping in Manhattan.
French food is particularly hot in Manhattan, but if you want some French cuisine in the outer boroughs you can forget about it.
I decided to look at the pizza places in the city. I guessed that the map would look a lot like the American restaurants map: restaurants everywhere, but much more densely concentrated in Manhattan.
This is what happened instead.
I knew New Yorkers loved pizza.
But it looks like pizza may be the great equalizer of the city.