Data Analysis on the Best Bars in New York City

Posted on Oct 22, 2018

| Motivation |

Many people go to bars to connect, relax, have fun, and meet people, while others go to break up a monotonous routine, stay in touch with friends, be seen, be heard, listen to music, or watch games. Whatever the reason for going, bars provide a social lubricant that helps people relax.

My sole intention with this project was to answer my friend’s question, “Which is the best bar in New York City?”, which I was unable to answer quantitatively when he asked me before his visit here. Prior to this project, I had no quantitative information about bars beyond reviewing search results on Yelp or similar applications. With this project I intended to update my understanding of bars around New York City with my own quantitative measurements.

| Questions expected to be answered |

Which neighborhood in New York City has the most active nightlife? Which are the best bars in New York City? Which days of the week are the best and worst for going to bars? What percentage of bars are wheelchair accessible? What percentage of bars have happy hours, a bar TV, their own parking, and so on?

| Methods and tools |

To collect data about bars in New York City, I scraped Yelp using Scrapy, a web-scraping framework written in Python. All data cleaning, analysis, and visualization were performed with Pandas and NumPy. All of my code, including the data, can be found on GitHub: link

| Data on neighborhoods with the most active nightlife |

Before diving into the best New York City bars, I wanted to find out which neighborhood had the most active nightlife. To do this, I created a bar plot of the number of bars in each neighborhood, using the bar count as a proxy for nightlife activity. From this plot, the five neighborhoods with the most active nightlife were Midtown West, Midtown East, East Village, Upper East Side, and West Village.


[Figure: number of bars by neighborhood]
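The neighborhood count described above can be sketched in a few lines of Pandas. This is only an illustration, not the code from the project: the sample rows and the column names `name` and `neighborhood` are assumptions, and the real scraped data may be laid out differently.

```python
import pandas as pd

# Hypothetical sample of scraped rows; the column names are assumptions.
bars = pd.DataFrame({
    "name": ["Bar A", "Bar B", "Bar C", "Bar D", "Bar E", "Bar F"],
    "neighborhood": ["East Village", "Midtown West", "Midtown West",
                     "West Village", "East Village", "Midtown West"],
})


def top_neighborhoods(df, n=5):
    """Return the n neighborhoods with the most bars, largest first."""
    return df["neighborhood"].value_counts().head(n)


counts = top_neighborhoods(bars)
print(counts)
# counts.plot(kind="bar") would draw the bar plot (requires matplotlib).
```

Counting rows per neighborhood treats every listed bar equally, so a neighborhood with many small bars ranks above one with a few very busy ones.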

| Data on Best Bars in New York City |

To find the best bars in New York City, I created a "popularity index", defined as the product of a bar's number of reviews and its rating as listed on Yelp. The five best bars in New York City by popularity index are shown below. The best bars also tended to fall in the less expensive price range.

[Figure: the five best bars in New York City by popularity index]
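As a minimal sketch of the popularity index defined above (number of reviews multiplied by rating), assuming hypothetical column names `rating` and `review_count` and made-up sample values:

```python
import pandas as pd

# Hypothetical sample data; names and numbers are made up for illustration.
bars = pd.DataFrame({
    "name": ["Bar A", "Bar B", "Bar C"],
    "rating": [4.5, 4.0, 5.0],
    "review_count": [1200, 3000, 40],
})


def add_popularity_index(df):
    """Return a copy with popularity_index = review_count * rating."""
    out = df.copy()
    out["popularity_index"] = out["review_count"] * out["rating"]
    return out


# Rank bars from highest to lowest popularity index.
ranked = add_popularity_index(bars).sort_values(
    "popularity_index", ascending=False
)
print(ranked[["name", "popularity_index"]])
```

In this sample, the 5-star bar with only 40 reviews ranks last. The index rewards review volume as much as rating, which is worth keeping in mind when reading the ranking.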

| Best and worst nights for going to bars |

The best and worst nights of the week to go to bars were calculated from both the popularity index and the best nights listed on each bar's page.

For example, if a bar listed Friday as its best night, Friday was given a value of 1 and every other day of the week a value of 0. The value for each day was then multiplied by the bar's popularity index, and the products were summed over all bars. Finally, the day with the highest summed popularity index was designated the best night to go to bars, and the day with the lowest was designated the worst. The histogram of popularity index by night is shown below:

[Figure: popularity index by night of the week]
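The weighting procedure described above can be sketched as follows. The column names `popularity_index` and `best_night`, the day abbreviations, and the sample values are assumptions made for illustration. The sketch also assumes each bar lists exactly one best night, which may not hold for every Yelp page.

```python
import pandas as pd

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical sample data; one listed best night per bar is an assumption.
bars = pd.DataFrame({
    "name": ["Bar A", "Bar B", "Bar C", "Bar D"],
    "popularity_index": [5400.0, 12000.0, 200.0, 900.0],
    "best_night": ["Fri", "Sat", "Fri", "Mon"],
})


def night_scores(df):
    """Sum each bar's popularity index onto its listed best night."""
    # One indicator column per day: 1 for the listed best night, 0 otherwise.
    indicators = (
        pd.get_dummies(df["best_night"])
        .reindex(columns=DAYS, fill_value=0)
        .astype(float)
    )
    # Weight each bar's indicator row by its popularity index, then sum over bars.
    return indicators.mul(df["popularity_index"], axis=0).sum()


scores = night_scores(bars)
best = scores.idxmax()
# Days that no bar lists score zero, so ties for "worst" are possible;
# idxmin returns the first such day in DAYS order.
worst = scores.idxmin()
print(scores)
print("best:", best, "worst:", worst)
```

Because each bar contributes its whole popularity index to a single day, a handful of very popular bars can decide the outcome.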

| Other useful data |

From the data I collected from Yelp, I calculated the percentage of bars offering various facilities. In New York City, 30 percent of bars are wheelchair accessible. Only 8.15 percent have dancing facilities and 9.44 percent have their own parking garage. The percentages of bars that take reservations, offer happy hours, and have a bar TV are 51.23, 53.04, and 53.56, respectively.
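Percentages like these could be computed along the following lines, assuming each facility is stored as a True/False column. The column names and sample values are hypothetical, and the sketch assumes a facility missing from a bar's page has already been recorded as False.

```python
import pandas as pd

# Hypothetical boolean facility flags, one row per bar.
bars = pd.DataFrame({
    "wheelchair_accessible": [True, False, False, True],
    "happy_hour": [True, True, False, False],
    "has_tv": [True, True, True, False],
    "own_parking": [False, False, False, True],
})


def facility_percentages(df):
    """Return the percentage of bars offering each facility."""
    # The mean of a boolean column is the fraction of True values.
    return (df.mean() * 100).round(2)


percentages = facility_percentages(bars)
print(percentages)
```

How missing values are handled matters here: counting an unlisted facility as absent gives a lower bound on the true percentage.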

| Conclusions |

I hope this information about bars will help you choose the best bar for you in New York City. From a business point of view, this project points to areas for improvement, such as parking and dancing facilities, that could help bars succeed in New York City.

| Future directions |

It would be useful to also collect information about the male-to-female ratio at each bar by scraping the individual reviewers for each bar. The male-to-female ratio might help people choose the right bar according to their interests. Finding zones of popular drinking sites (perhaps using a heat map) might give the driving industry new areas to focus on as it expands its business in the future.

About Author

Basant Dhital

Basant Dhital is a Physics Ph.D. with an excellent background in Mathematics and Statistics and demonstrated programming skills. During his Ph.D. research, he developed several algorithms to process and analyze NMR and other spectroscopic data. He developed a...

Leave a Comment

Basant Dhital January 16, 2020
I, later on, edited on writing to address some statistical flaws but on the presentation slide maybe I forgot to do it. I don't remember about the six-star review. Where did you find six-star review? I don't remember.
Lexi De Veaux January 15, 2020
This is awesome, Basant ! In this writeup you define "popularity index" as the product of the number of reviews and the bar ratings listed in yelp. However, in your presentation from the Github link I noticed you defined the "popularity index" as number of reviews/6-star reviews. Could you expand a little on what exactly is a 6-star review? On yelp, reviews only go up to 5-stars. Thanks !
