Spotify Data Tells Us What Makes a Hit Song

Posted on Mar 10, 2022

The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Source: Trusted Reviews

Data Science Introduction

The holy grail for aspiring musical stars of every genre is finding out what makes a mega-hit song. Is it a powerful love ballad or a beat-thumping club banger? For this exploratory data analysis project, I wanted to take the first steps toward finding that grail using data science techniques, the Spotify API, and the Spotify Charts Kaggle Dataset.

Why Spotify, you ask? First, Spotify is one of the largest online music streaming services in the world, with 406 million monthly active users, including 180 million paying subscribers, as of December 2021 (Spotify). Second, the Spotify API gives you access to a wide variety of song characteristics/audio features that provide a quantifiable way to analyze a song.

Furthermore, the Spotify playlist has become the new mixtape: a way to share your music with friends, family, or the public.

Project Goals

  1. Determine if there is a relationship between song characteristics/audio features and the number of times a hit song is streamed.
  2. Determine if there is a significant relationship for mega-hit songs, defined as the top 1% of songs by streams.
  3. Determine how song genre plays into this analysis.

Wrangling the Data

The Kaggle data set had over 26,000,000 entries covering multiple markets, a global streaming aggregate, the top 50 viral songs, and the top 200 daily hit songs. I decided that was too broad and unwieldy, so I selected the largest market in the data set for the most recent complete year and excluded the viral songs.
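As a rough illustration of that narrowing step, here is a minimal sketch using only the Python standard library. The column names (`region`, `chart`, `date`, `streams`), the chart labels, the sample rows, and the year are all assumptions made for the example, not values confirmed by this post.

```python
from datetime import date

# Hypothetical rows shaped like chart entries; every value here is made up.
rows = [
    {"title": "Song A", "region": "United States", "chart": "top200",
     "date": "2021-03-01", "streams": 900_000},
    {"title": "Song B", "region": "United States", "chart": "viral50",
     "date": "2021-03-01", "streams": None},
    {"title": "Song C", "region": "Global", "chart": "top200",
     "date": "2021-03-01", "streams": 2_000_000},
    {"title": "Song A", "region": "United States", "chart": "top200",
     "date": "2020-12-31", "streams": 800_000},
]


def select_market(rows, region, chart, year):
    """Keep only rows for one market, one chart type, and one calendar year."""
    return [
        r for r in rows
        if r["region"] == region
        and r["chart"] == chart
        and date.fromisoformat(r["date"]).year == year
    ]


# The year below is a placeholder, not the year used in the analysis.
subset = select_market(rows, region="United States", chart="top200", year=2021)
print(len(subset), "rows kept")
```

Filtering on the chart type drops the viral entries, and filtering on region and year leaves a single market for a single year.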

Into the Data...

Since I wanted to find out whether there was a relationship between characteristics/audio features and the number of streams, I first needed to figure out which songs were streamed the most.
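One way to rank songs by streams is to sum each song's daily chart entries. The sketch below assumes each row carries `title`, `artist`, and `streams` fields; those names and the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical daily chart rows; one song can appear on many days.
daily_rows = [
    {"title": "Song A", "artist": "Artist X", "streams": 900_000},
    {"title": "Song B", "artist": "Artist Y", "streams": 500_000},
    {"title": "Song A", "artist": "Artist X", "streams": 800_000},
    {"title": "Song C", "artist": "Artist Z", "streams": 300_000},
]


def total_streams_by_song(rows):
    """Sum daily streams per (title, artist) pair."""
    totals = Counter()
    for row in rows:
        totals[(row["title"], row["artist"])] += row["streams"]
    return totals


totals = total_streams_by_song(daily_rows)
# most_common sorts by total streams, highest first.
for (title, artist), streams in totals.most_common(2):
    print(title, artist, streams)
```

Keying on both title and artist keeps two different songs that share a title from being merged.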

It was also important to see which genres were streamed the most.

Song Characteristics/Audio Features

Structural: Key, Mode, Time Signature, Duration
Mood: Danceability, Valence, Energy, Tempo
Context: Loudness, Speechiness, Instrumentalness
Properties: Liveness, Acousticness
Comparative: Popularity

Data Analysis

There wasn’t any significant correlation among hit songs.
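For readers who want to try a correlation check like this on their own data, here is a minimal sketch of a Pearson correlation coefficient between one audio feature and stream counts. The post does not say which correlation measure was used, so Pearson is an assumption, and the sample numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values: one valence score and one stream total per song.
valence = [0.21, 0.48, 0.65, 0.80, 0.35]
streams = [1_200_000, 950_000, 1_800_000, 1_100_000, 1_400_000]
print(round(pearson(valence, streams), 3))
```

A value near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong positive or negative one.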

Megahits Correlation Characteristics Values

Megahits have slightly more correlation to each other, but it is not enough to make any real statements based on this analysis.
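Picking out the megahits, the top 1% of songs by streams, could look something like the sketch below. The function name `top_percent` and the sample totals are hypothetical, and rounding up so that at least one song is always kept is a design choice of this example rather than something stated in the post.

```python
def top_percent(totals, percent=1):
    """Return the (song, streams) pairs in the top `percent` of songs by streams."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Integer ceiling division avoids floating-point surprises; keep at least one.
    keep = max(1, -(-len(ranked) * percent // 100))
    return ranked[:keep]


# Invented totals for 200 songs: song_1 has 1,000 streams, song_200 has 200,000.
totals = {f"song_{i}": i * 1_000 for i in range(1, 201)}
megahits = top_percent(totals, percent=1)
print(megahits)
```

With 200 songs, the top 1% is the two most-streamed songs, and the same correlation check can then be run on that subset alone.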

Most streamed Genre: Pop Correlation Characteristics Values

Pop megahits had a strong correlation to valence and a weak correlation to danceability, energy, and loudness.


1. There is no magic bullet or holy grail for hit songs.
2. In the US market, the characteristics of hit songs vary widely.
3. Megahits have slightly more correlation to each other, but it is not enough to make any real statements based on this analysis.
4. As would be expected, songs in the same genre have some correlation, but I was surprised that, based on this analysis, it wasn't significantly greater.

Content creators who want to leverage this analysis should focus on making pop music that is happy, cheerful, and euphoric (high valence), energetic, and danceable, meaning it combines tempo, rhythm stability, beat strength, and overall regularity.

Source: Piqsels


About Author

Corey Kelly

I received a Bachelor of Science in Aerospace Engineering and a Master of Education from the University of Notre Dame. I also received a Master of Business Administration in Management from Dowling College. I am looking to contribute...