Finding the Right NYC Bar
"Anywhere is walking distance, if you've got the time."
- Steven Wright
A few months ago I was tasked with the following question: do bars located near many other bars do a brisk business because of the foot traffic, or does too much competition cut into a bar's revenue? Along the way I was able to identify the top-rated bars for those who like popular watering holes, as well as the highly rated, out-of-the-way places for those of us who like a little quiet with our scotch.
I started with a list of 615 Manhattan bars and the following information for each: address, number of online reviews, and rating. To determine how many bars are close to each bar, the distance from every bar to every other bar had to be calculated. The latitude and longitude of every bar's location were downloaded using ggmap and a few lines of code:
library(ggmap)
#create readable addresses from the .csv data
addresses <- with(bars, paste(street, city, "NY", sep=", "))
locs <- geocode(addresses)
A double loop was used to create a 615 by 615 matrix of the distance between every bar and every other bar:
traffic <- function(blocks, lat, lon) {
  n <- length(lat) #number of bars (615 in this data set)
  distance <- matrix(0, n, n) #initialize the matrix
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      #"city block" distance in degrees: latitude difference plus longitude difference
      distance[i,j] <- abs(lat[i]-lat[j]) + abs(lon[i]-lon[j])
    }
  }
  distance <- distance*1000 #scale so that 1 is approximately equal to one block
  index.dist <- (distance <= blocks) #TRUE where two bars are within "blocks" of each other
  result <- rowSums(index.dist)-1 #subtract 1 so a bar does not count itself
  return(result)
}
crowd <- traffic(blocks, lat, lon) #blocks is the search radius, in approximate blocks
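The double loop above can also be written without explicit loops. The following is a minimal sketch, not the code used for the analysis: the name traffic_vec and the toy coordinates are made up for illustration, and it assumes the same rough scaling of 0.001 degrees to one block.

```r
# Vectorized version of the pairwise "city block" distance count.
# lat, lon: numeric vectors of equal length; blocks: radius in approximate blocks.
traffic_vec <- function(blocks, lat, lon) {
  # outer() builds the full n x n matrix of pairwise absolute differences
  d <- outer(lat, lat, function(a, b) abs(a - b)) +
       outer(lon, lon, function(a, b) abs(a - b))
  d <- d * 1000              # same rough scaling as the loop version
  rowSums(d <= blocks) - 1   # subtract 1 so a bar does not count itself
}

# Toy coordinates (made up for illustration, not taken from the bar data)
lat <- c(40.725, 40.7255, 40.730)
lon <- c(-73.990, -73.9905, -73.990)
near <- traffic_vec(2, lat, lon)
near
```

Because outer() works on whole vectors, this should give the same counts as the loop while being shorter and usually faster for a few hundred bars.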
For the sake of analysis, the number of reviews was used as a proxy for the amount of business a bar does. Using this measure it turns out there is no statistically significant relationship between how close a bar is to other bars and how much business it does. Anecdotally, however, at least in this sample, there appears to be a relationship that is characteristically New York.
When there are 5 or fewer bars within a 1-block radius, there is a lot of traffic (avg. review count = 222), but when there are 10 bars within a block, competition hurts everyone a little bit (avg. review count = 179). Finally, if the ten nearest bars are within 5 blocks...well, 5 blocks is just too far for a New Yorker to walk!
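The post does not say which test or grouping produced these figures. As a rough sketch of how such a check might look, the block below runs a rank correlation and then averages review counts by crowding bucket. The data are simulated, and the names reviews, bucket and avg are hypothetical, so the numbers it prints say nothing about the real bars.

```r
# Simulated data for illustration only (not the Manhattan bar data)
set.seed(1)
crowd   <- rpois(200, 4)     # stand-in for the number of nearby bars
reviews <- rpois(200, 200)   # stand-in for the review counts

# Rank correlation between crowding and review count;
# exact = FALSE avoids the warning about ties in the data
res <- cor.test(crowd, reviews, method = "spearman", exact = FALSE)
res$p.value

# Average review count for bars with 5 or fewer neighbors vs. more than 5
bucket <- cut(crowd, breaks = c(-Inf, 5, Inf),
              labels = c("5 or fewer", "more than 5"))
avg <- tapply(reviews, bucket, mean)
avg
```

A large p-value here would correspond to the "no statistically significant relationship" result, while the bucket averages are the kind of anecdotal comparison quoted above.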
Now for the fun facts. Two data frames were created so that the highly rated / high traffic bars and the highly rated / low traffic bars could each be mapped.
lo.traf <- subset(manhattan, crowd<=1 & rating == 5)
hi.traf <- subset(manhattan, crowd >= 5 & rating >= 4.5)
ggmap makes it easy to download maps that can then be used with ggplot2. It just takes a little trial and error to get the exact latitude / longitude dimensions that fit the data:
nycmap <- get_googlemap(center = c(lon = -73.99, lat = 40.725), zoom=14)
The first map shows the best choices for those of us who don't especially like crowds. These are the highest rated bars among the least crowded choices.
p <- ggmap(nycmap) #assumed: the post does not show how p was created
p + geom_point(data=lo.traf, aes(x=lon, y=lat), color="red", size=3) + geom_text(data=lo.traf, color="navy", label=name.lo, vjust=1.5, size=4)
Google Maps, while easy to download and use, doesn't offer many options in terms of color and style. The map above is rather too dark for adding text. While the colors can be changed, this needs to be done pixel by pixel, and "nycmap" has over 400,000 pixels! Daunting?
As it turns out, "nycmap" has fewer than 50 unique hexadecimal values. These unique values can be identified by turning the raster matrix into a data frame with the dimensions 410240 x 3. From there it is easy to pull out the unique values as follows:
library(reshape2)
id <- c(1:640)
new.map <-cbind(id,nycmap)
new.map.L <- melt(data=new.map, id='id')
new.map.U <- unique(new.map.L$value) #the distinct values in the melted map
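A shorter route to the same list, assuming the map can be treated as a plain character matrix of hex colors, is to flatten it and call unique() directly. The small matrix m below is a made-up stand-in for "nycmap", so this is only a sketch of the idea.

```r
# Toy stand-in for the map raster: a 2 x 3 character matrix of hex colors
m <- matrix(c("#A0A0A0", "#FFFFFF", "#A0A0A0",
              "#D1D1D3", "#FFFFFF", "#A0A0A0"),
            nrow = 2, byrow = TRUE)

cols <- unique(as.vector(m))   # the distinct color values
counts <- sort(table(as.vector(m)), decreasing = TRUE)   # pixels per color
cols
counts
```

The table() counts show which colors dominate the map, which helps in deciding which values are worth replacing.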
The unwanted values can be changed with code like this - for each value you want to change:
cny<-attr(nycmap,"bb")
nycmap[nycmap == "#A0A0A0"] <-"#D1D1D3" #in this case A0A0A0 is the original value being replaced
class(nycmap) #check the class; ggmap() expects c("ggmap", "raster")
attr(nycmap,"bb") <-cny
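Since the same steps repeat for every color, they can be wrapped in a small helper. This is a sketch under assumptions: recolor is a hypothetical name, and the toy object only imitates a ggmap raster (a character matrix with a class and a "bb" attribute, here filled with placeholder values).

```r
# Replace one color with another while keeping the object's attributes
# (dim, class and the "bb" bounding box) intact.
recolor <- function(map, from, to) {
  saved <- attributes(map)    # remember dim, class, bb, ...
  out <- unclass(map)         # work on a plain character matrix
  out[out == from] <- to      # swap every matching pixel
  attributes(out) <- saved    # restore whatever the assignment may have dropped
  out
}

# Toy stand-in for the map (values and bounding box are placeholders)
toy <- matrix(c("#A0A0A0", "#FFFFFF", "#A0A0A0", "#D1D1D3"), nrow = 2)
attr(toy, "bb") <- list(ll.lat = 0, ll.lon = 0, ur.lat = 1, ur.lon = 1)
class(toy) <- c("ggmap", "raster")

lighter <- recolor(toy, "#A0A0A0", "#D1D1D3")
```

Calling the helper once per unwanted color keeps the bounding box and class in place, so the result can still be passed to ggmap().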
The result is a map of the hot spots - the most highly rated bars on the most happening blocks.