NYC Data Science Academy | Blog

A Traveler’s Guide to Broadway Musicals

Zhenggang Xu
Posted on Aug 16, 2018

Motivation: Broadway is one of the signatures of New York City. Statistics show that 13.8 million people attended a Broadway show during the 2017-2018 season (https://www.broadwayleague.com/press/press-releases/2017-2018-broadway-end-of-season-statistics/), a number roughly 1.6 times the population of NYC. Statistics also show that ~60% of the attendance came from tourists. Since tourists make up such a significant share of the Broadway audience, it would be interesting to find out what their take is on the shows. Are there any patterns, and can we use this information to guide future tourists? To explore possible answers to these questions, I did some research on the Tripadvisor reviews of some of the most popular Broadway musicals. Tripadvisor might not be the most comprehensive or professional website for Broadway reviews, but it is an ideal place to study travelers' real opinions of those shows, as local people do not normally post their reviews there. If you are traveling to NYC and considering taking in a Broadway show, those reviews might be helpful.

 

 

(1) Methodology

I used Python's Scrapy package to do the web scraping. I chose ~10 of the most popular musicals on Broadway and collected the reviews along with some user information. For shows with over 5,000 reviews, I only grabbed half of the review items. I ended up scraping ~20,000 review items in total, which served as the starting point for all my analysis.
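As a rough illustration of the sampling rule for shows with many reviews, here is a minimal sketch in plain Python. It is not the spider used for this project: the function name `pages_to_scrape`, the page size of 10 reviews, and the choice to keep every other page are all assumptions made for the example.

```python
def pages_to_scrape(total_reviews, page_size=10, threshold=5000):
    """Return the review offsets to request for one show.

    Hypothetical helper: shows with more than `threshold` reviews are
    thinned to every other page, which keeps roughly half of the reviews.
    """
    offsets = list(range(0, total_reviews, page_size))
    if total_reviews > threshold:
        offsets = offsets[::2]  # keep every second page
    return offsets
```

Skipping whole pages rather than individual reviews keeps the number of requests down, at the cost of a slightly less even sample.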

 

(2) Analysis

i. The reviewers

I first plotted the distributions of the reviewers' review counts and of the helpful votes (votes indicating that a review was helpful) they received. Since most of the reviewers are ordinary travelers, the voting is not affected by biases such as reputation. As we can see in the charts, most people published fewer than ten reviews and received very few endorsements. Only a small portion of them are active review writers. I defined a metric called review quality (number of helpful votes received / number of reviews written) to roughly quantify the impact of a particular reviewer. I then divided the reviewers into three groups (low, medium and high quality) and compared the mean ratings each group gave to the selected shows.
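The review quality metric and the grouping step can be sketched as follows. This is a minimal illustration only: the cutoffs `low_cut` and `high_cut`, the field names, and the sample rows are made up for the example and are not the values or data used in the analysis.

```python
from statistics import mean

def review_quality(helpful_votes, reviews_written):
    """Helpful votes received per review written."""
    if reviews_written == 0:
        return 0.0
    return helpful_votes / reviews_written

def quality_group(quality, low_cut=0.5, high_cut=2.0):
    """Assign a reviewer to a quality group (cutoffs are hypothetical)."""
    if quality < low_cut:
        return "low"
    if quality < high_cut:
        return "medium"
    return "high"

def mean_rating_by_group(reviews):
    """Average the ratings given by each quality group."""
    ratings = {"low": [], "medium": [], "high": []}
    for r in reviews:
        q = review_quality(r["helpful_votes"], r["reviews_written"])
        ratings[quality_group(q)].append(r["rating"])
    # Skip groups that received no reviews to avoid averaging an empty list.
    return {group: mean(vals) for group, vals in ratings.items() if vals}

# Made-up rows, only to show the expected input shape.
sample = [
    {"helpful_votes": 1, "reviews_written": 10, "rating": 4},
    {"helpful_votes": 0, "reviews_written": 3, "rating": 5},
    {"helpful_votes": 12, "reviews_written": 10, "rating": 5},
    {"helpful_votes": 90, "reviews_written": 30, "rating": 4},
]
group_means = mean_rating_by_group(sample)
```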

A clear trend came to light. There may be two reasons: either readers find reviews that contain some criticism more trustworthy, or people who comment actively tend to be on the picky side.

 

 

 

ii. Seasonal fluctuations

Then I looked at the number of reviews by month. Clear patterns can be observed in the bar graph. After the holiday season, the number of reviews drops sharply in February. If we assume the number of reviews is correlated with attendance, this indicates that tourist attendance at Broadway shows hits bottom in February. It gradually picks up in the spring and finally peaks in July, as NYC is a popular destination for travelers during their summer vacations. Attendance in the second half of the year remains good, with some random fluctuations.
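The monthly counts behind such a bar graph can be computed in a few lines. The sketch below assumes each review carries a Python `date`; the function name `reviews_per_month` is hypothetical, and pooling all years into twelve calendar months is an assumption of this example.

```python
from collections import Counter
from datetime import date

def reviews_per_month(review_dates):
    """Count reviews for each calendar month (January to December), pooling all years."""
    counts = Counter(d.month for d in review_dates)
    return [counts.get(month, 0) for month in range(1, 13)]
```

The returned list has one entry per month, so it can be passed directly to any bar-chart function.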

 

 

When looking at this graph, you may wonder whether there is a similar pattern in tourists' overall experience (e.g., whether they were satisfied). Is there an optimal time to experience a Broadway show? I looked at the ratings in detail and found that the rating distributions do not fluctuate much across the year. This is good news for tourists: you can go to Broadway any time of the year and enjoy the same experience from the shows.
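One way to check how stable the ratings are is to compare the share of each star rating month by month. The following is a minimal sketch under the assumption that reviews are available as `(month, rating)` pairs with ratings from 1 to 5; the function name is hypothetical.

```python
from collections import Counter, defaultdict

def rating_shares_by_month(reviews):
    """Return {month: {rating: share}} for (month, rating) pairs."""
    by_month = defaultdict(Counter)
    for month, rating in reviews:
        by_month[month][rating] += 1
    shares = {}
    for month, counts in by_month.items():
        total = sum(counts.values())
        # A Counter returns 0 for ratings that never occurred in this month.
        shares[month] = {r: counts[r] / total for r in range(1, 6)}
    return shares
```

Using shares rather than raw counts removes the seasonal swing in review volume, so the months can be compared directly.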

 

 

You may have noticed that the overall ratings are pretty high. This is indeed one problem with the data: it suffers from strong "survivorship bias." These popular Broadway shows are the best of the best. Competition on Broadway stages is fierce. Only 20% of the new productions each year break even, and far fewer achieve success and survive to the next season and beyond. Therefore, the most popular shows must be outstanding in many ways, so that people traveling in from across the country or around the world are willing to spend their money and vacation time to see them.

Another question I sought to answer was: do these reviews just contain words of praise without any insight?

Not really. I will show you some analysis of the reviews using simple word clouds.

iii. Analysis of the reviews

We begin by running a word cloud for all the reviews.
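A word cloud is essentially a picture of word frequencies, so the underlying counts can be sketched with the standard library alone. The stop-word list and the function name `word_frequencies` below are assumptions made for this example; drawing the cloud itself would require a separate plotting library.

```python
import re
from collections import Counter

# A deliberately small, illustrative stop-word list.
STOP_WORDS = {"the", "and", "was", "were", "this", "that", "but", "for",
              "with", "have", "had", "not", "you", "our", "are", "very"}

def word_frequencies(reviews, top_n=10):
    """Return the top_n most frequent words across all review texts."""
    counts = Counter()
    for text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        # Drop stop words and very short tokens, which carry little meaning.
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)
```

Running the same function on the reviews of a single show gives the per-show counts.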

 

From this word cloud we can see a lot of key words, such as “performance,” “song,” “story,” “cast,” and so on. But it is not easy to see any pattern in it. Sometimes too much information is not all that informative, so we need to take a different approach. How about looking at individual shows?

We can do so with the musical Come from Away, a new production that landed on Broadway last March and has the highest rating on TripAdvisor. Based on true stories that happened in a small Canadian town far from the United States in the week following 9/11, the musical makes us believe in humanity over hate in the darkest hours.

 

 

From this word cloud, we can see that “story” is the biggest key word in the reviews, indicating that what audiences appreciated most was the story of the show. “Music” and “cast” received a lot of attention too. The storytelling is the show's best part. With no props beyond a few tables and chairs and no elaborate stage sets or costumes, a dozen people vividly convey a warm story.

Next (I am not following the order of the overall ratings), let’s look at the longest-running musical currently on Broadway, The Phantom of the Opera. The word cloud shows that audience members are most fascinated by the music of the show. Its songs have drawn countless people into the world of musicals, including me. Besides music, surprisingly, people mentioned “seat” many times. I think this is probably because the Majestic Theater is a bigger theater. Where you sit really affects the experience, so people tend to keep talking about it. In contrast to Come from Away, “story” is no longer among the key words, and “cast” received less attention too. For this show, music outshines everything else.

 

 

The next word cloud comes from The Lion King. This is where things start to get more interesting.

 

 

Although the music of The Lion King is outstanding and the story is well known to everyone, those are not what people focus on. Costumes are the most commented-on component. I think this is indeed the key to its success, because the music and story are nothing new to the audience. The spectacular costumes, however, give the audience sitting in the theater, especially the kids (another key word in the word cloud), a totally different experience from watching the movie. Besides “costume,” we notice that “ticket” receives considerable attention, probably because tickets to the show are normally quite pricey.

Now let us move to Hamilton, the biggest Broadway hit in recent years. What do people talk about most for this show? If you think it is the music, the history, the story or even the rap, you are wrong. Actually, “ticket” is the word that gets the most attention. In my opinion, Hamilton is truly a work of genius, but when people pay more attention to the tickets than to the show itself, that is not a good sign. Besides tickets, of course, audiences do like the music, the story, the cast and the performance, and all of them are highlighted in the word cloud.

 

 

Now let us look at the four word clouds together.

 

 

We can easily see that the four shows have different key words in their reviews. Thanks to the diversity of Broadway shows, theater-lovers can always find something they like on stage. Diverse as the shows are, good music is the bottom line for a good musical, so you will find that “music” is a significant element in all of them.

Furthermore, if the show is in a bigger theater, people tend to mention “seat” more often, since seating is a critical factor there. Similarly, the more expensive the tickets are, the more often people talk about them. In the case of Come from Away, which is playing in a smaller theater with lower ticket prices, neither “seat” nor “ticket” appears in the word cloud. That indicates to me that for this musical people can focus more on the show itself, in contrast to The Phantom of the Opera, where “seat” is a central concern, and Hamilton, where “ticket” dominates everything else.

We have covered some overall patterns in the reviews. How about criticisms? We do see some low ratings in the previous bar graphs. I looked into the negative reviews (those with ratings of 1 or 2) of some shows. Here is one example:

The key words in low-rating reviews of Hamilton.

 

Besides the complaints about the ticket price, we can see that “understudy” was mentioned quite a few times. I do not think this is necessarily because the understudies did a bad job. It is natural for people to be upset when their favorite actors or actresses do not show up on stage. But when you add in the very high price they paid, the disappointment grows to the point of exaggeration. I checked some of the bad reviews and found that people often emphasize that they paid a fortune for the show but ended up watching understudies. So if you really care, do some homework on the cast schedules.
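To see whether a word such as “understudy” is really more common in negative reviews, one can compare how often it appears in reviews rated 1 or 2 against all reviews. The sketch below is a minimal illustration: it assumes reviews are `(rating, text)` pairs, the function name `mention_rate` is hypothetical, and it uses simple substring matching.

```python
def mention_rate(reviews, keyword, max_rating=5):
    """Share of reviews rated at or below max_rating whose text contains keyword."""
    selected = [text for rating, text in reviews if rating <= max_rating]
    if not selected:
        return 0.0
    hits = sum(keyword in text.lower() for text in selected)
    return hits / len(selected)
```

Calling `mention_rate(reviews, "understudy", max_rating=2)` gives the rate among negative reviews, and leaving `max_rating` at its default gives the rate among all reviews.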

In summary, what is the takeaway from this little study?

  1. Broadway shows are highly diverse in topics and features. You might not like every one of them, but there must be something for your particular taste. Do some homework before purchasing tickets. Also, if you want to see a particular actor or actress, check his or her schedule.
  2. If you come from away to NYC and have already spent a fortune on the flight and other things, my suggestion is not to try to save too much on the tickets. For a lot of theaters and shows, different seats will give you a totally different experience. You don’t want to be one of the people who end up posting "I should have bought a better seat" on TripAdvisor.

About Author

Zhenggang Xu

Zhenggang is currently a data science fellow at NYC Data Science Academy. He received his education in computational chemistry and worked in deep water exploration for a few years. He believes in numbers since computations have helped him...