Finding the Right NYC Bar

Posted on May 8, 2014

"Anywhere is walking distance, if you've got the time."

- Steven Wright

A few months ago I was tasked with the following question: do bars located near many other bars do a brisk business because of the foot traffic, or does too much competition cut into a bar's revenue? Along the way I was able to identify the top-rated bars for those who like popular watering holes, as well as the highly rated, out-of-the-way places for those of us who like a little quiet with our scotch.

I started with a list of 615 Manhattan bars and the following information for each: address, number of online reviews, and rating. To determine how many bars are in close proximity to other bars, the distance from every bar to every other bar had to be calculated. The latitude and longitude of every bar's location were downloaded using ggmap and a few lines of code:

library(ggmap)   # provides geocode()
addresses <- with(bars, paste(street, city, "NY", sep=", "))   # create readable addresses from the .csv file
locs <- geocode(addresses)   # returns a data frame with lon and lat columns

A double loop was used to create a 615 by 615 matrix of the distance between every bar and every other bar:

traffic <- function(blocks, lat, lon) {
  n <- length(lat)
  distance <- matrix(0, n, n)   # initialize the matrix (615 by 615 for this data set)
  for (i in 1:n) {
    for (j in 1:n) {
      # sum of absolute differences in latitude and longitude, in degrees
      distance[i, j] <- abs(lat[i] - lat[j]) + abs(lon[i] - lon[j])
    }
  }
  distance <- distance * 1000          # scale so that 1 is approximately one block
  index.dist <- distance <= blocks     # TRUE where two bars are within x "blocks" of each other
  result <- rowSums(index.dist) - 1    # subtract 1 to exclude each bar's distance from itself
  result                               # number of nearby bars for every bar
}

crowd <- traffic(blocks, lat, lon)
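
The same count can be computed without explicit loops. The following is a minimal sketch, assuming the same distance measure and block scaling as above; `traffic_vec` is a hypothetical name, and the three coordinates are made-up values for illustration only.

```r
# Vectorized sketch of the neighbor count (hypothetical helper, not the original code)
traffic_vec <- function(blocks, lat, lon) {
  # outer() builds the full matrix of pairwise differences in one call
  distance <- abs(outer(lat, lat, "-")) + abs(outer(lon, lon, "-"))
  distance <- distance * 1000        # same rough "1 unit = 1 block" scaling
  rowSums(distance <= blocks) - 1    # subtract 1 so a bar does not count itself
}

# Made-up coordinates: the first two bars are close together, the third is far away
lat_demo <- c(40.7250, 40.7255, 40.7400)
lon_demo <- c(-73.9900, -73.9902, -73.9700)
crowd_demo <- traffic_vec(1, lat_demo, lon_demo)   # gives 1, 1, 0
```

For 615 bars either version is fast enough; the vectorized form mainly removes the hard-coded matrix size.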

For the sake of analysis, the number of reviews was used as a proxy for the amount of business a bar does. Using this measure, there is no statistically significant relationship between how close a bar is to other bars and how much business it does. Anecdotally, however, at least in this sample, there appears to be a relationship that is characteristically New York.
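
One way to check for a relationship like this is a rank correlation test. The sketch below uses simulated numbers, not the bar data, and the choice of Spearman's test is an assumption, since the analysis above does not say which test was used.

```r
# Simulated stand-ins for the real columns (NOT the bar data from this post)
set.seed(1)
crowd_sim   <- rpois(100, lambda = 4)     # simulated nearby-bar counts
reviews_sim <- rpois(100, lambda = 200)   # simulated review counts, generated independently

# Spearman rank correlation; exact = FALSE avoids warnings when there are ties
test <- cor.test(crowd_sim, reviews_sim, method = "spearman", exact = FALSE)
rho     <- test$estimate   # rank correlation coefficient
p_value <- test$p.value    # compare against the chosen threshold, commonly 0.05
```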


When there are 5 or fewer bars within a 1-block radius, there is a lot of traffic (avg. review count = 222), but when there are 10 bars within a block, competition hurts everyone a little bit (avg. review count = 179). Finally, if the ten nearest bars are within 5 blocks... well, 5 blocks is just too far for a New Yorker to walk!
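
Averages like these can be computed by grouping bars on their neighbor count. A minimal sketch with made-up numbers; the bucket edges and the names `neighbors_demo` and `reviews_demo` are assumptions for illustration, not the values used in the analysis.

```r
# Made-up values for illustration only (NOT the bar data from this post)
neighbors_demo <- c(0, 3, 5, 7, 10, 12)
reviews_demo   <- c(250, 200, 210, 190, 180, 170)

# Group bars by how many neighbors they have; the bucket edges are assumptions
bucket <- cut(neighbors_demo, breaks = c(-Inf, 5, 9, Inf),
              labels = c("5 or fewer", "6 to 9", "10 or more"))

# Average review count within each bucket
avg_reviews <- tapply(reviews_demo, bucket, mean)
```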

Now for the fun facts. Two data frames were created so that highly rated, high-traffic bars and highly rated, low-traffic bars could each be mapped.

lo.traf <- subset(manhattan, crowd<=1 & rating == 5)
hi.traf <- subset(manhattan, crowd >= 5 & rating >= 4.5)

ggmap makes it easy to download maps that can then be used with ggplot2. It just takes a little trial and error to get the exact latitude / longitude dimensions that fit the data:

nycmap <- get_googlemap(center = c(lon = -73.99, lat = 40.725), zoom=14)

The first map shows the best choices for those of us who don't especially like crowds. These are the highest rated bars among the least crowded choices.

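# p is assumed to be the base map plot, e.g. p <- ggmap(nycmap); name.lo is assumed to hold the names of the bars in lo.traf
p + geom_point(data=lo.traf, aes(x=lon, y=lat), color="red", size=3) +
    geom_text(data=lo.traf, color="navy", label=name.lo, vjust=1.5, size=4)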

[Map: the highest rated bars among the least crowded choices]

Google Maps, while easy to download and use, doesn't offer a lot of options in terms of color and style. The map above is rather too dark for adding text. While the colors can be changed, this needs to be done pixel by pixel, and "nycmap" has over 400,000 pixels! Daunting?

As it turns out, "nycmap" has fewer than 50 unique hexadecimal values. These unique values can be identified by turning the raster matrix into a data frame with the dimensions 410240 x 3. From there it is easy to pull out the unique values as follows:


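library(reshape2)   # melt() is assumed to come from reshape2
id <- c(1:640)
map.df <- cbind(id, nycmap)            # map.df, map.long and map.values are placeholder names
map.long <- melt(map.df, id='id')
map.values <- unique(map.long$value)   # the unique values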
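
The unique colors can also be listed directly from the matrix without reshaping. A minimal sketch, assuming the map is stored as a character matrix of hex colors as described above, and using a tiny hand-made matrix (`toy_map`, hypothetical) in place of the 640 x 640 raster:

```r
# Toy stand-in for the map raster: a character matrix of hex colors
toy_map <- matrix(c("#A0A0A0", "#FFFFFF", "#A0A0A0",
                    "#D1D1D3", "#FFFFFF", "#A0A0A0"), nrow = 2, byrow = TRUE)

# Flatten the matrix to a vector, then keep one copy of each color
colors_found <- unique(as.vector(toy_map))

# table() additionally shows how many pixels use each color
color_counts <- table(toy_map)
```

The counts help decide which colors are worth changing, since a value that covers only a handful of pixels will barely affect the look of the map.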

The unwanted values can then be changed with code like this, repeated for each value you want to change:

nycmap[nycmap == "#A0A0A0"] <- "#D1D1D3"   # in this case A0A0A0 is the original value being replaced
attr(nycmap, "bb") <- cny   # cny is assumed to hold the map's bounding box, saved before the colors were changed
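
When several colors need changing, the replacements can be kept in one lookup vector. This is a minimal sketch on a toy matrix; `recolor`, `toy_raster`, and the color pairs are hypothetical, and the `bb` value is only a stand-in for a real bounding box.

```r
# Hypothetical helper: apply several color replacements in one pass
recolor <- function(m, mapping) {
  for (old in names(mapping)) {
    # for a plain matrix, subset assignment keeps dim and other attributes
    m[m == old] <- mapping[[old]]
  }
  m
}

# Toy stand-in for the map raster, with a placeholder attribute
toy_raster <- matrix(c("#A0A0A0", "#FFFFFF", "#333333", "#A0A0A0"), nrow = 2)
attr(toy_raster, "bb") <- "placeholder bounding box"

# Names are the original colors, values are their replacements (example values)
mapping <- c("#A0A0A0" = "#D1D1D3", "#333333" = "#777777")
lighter <- recolor(toy_raster, mapping)
```

Replacements are applied in order, so a color that appears both as a replacement and as a later original value would be changed twice.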

The result is a map of the hot spots - the most highly rated bars on the most happening blocks.

[Map: the most highly rated bars on the most happening blocks]
