Analyzing Spotify Song Metrics to Visualize Popular Songs

Posted on May 6, 2018

How would you go about describing your favorite new song? What makes it catchy to you? According to the people at Spotify, characteristics like the energy or mood of a song can actually be quantified, and they have many algorithms that describe music in a remarkable number of ways.

Now, why would you care about what the people at Spotify have to say about a song? With a user base of 159 million monthly active users, determining the key factors that affect popularity can be a powerful tool for record label producers looking for new artists to sign, or for aspiring data scientists who want to show off some nice visualizations and decide what to put on the ultimate summer playlist. Spotify's API notes define popularity as a function of the number of times a particular song was streamed, as well as how recent those streams are.

About the App

This app visualizes several key factors and explores how they correlate with popularity across a wide spectrum of music.

The first two plots offer a degree of interactivity, allowing the user to visualize the differences among the genres. The box plot gives a more quantitative view of how the genres separate across a wide musical spectrum. The density plot shows how popularity is distributed along its 0 - 100 scale, which makes abnormalities easier to spot (for instance, classical music seems to have a pretty well-defined bimodal distribution of popularity!).

Besides looking at each genre as a whole, I wanted to visualize a subset of each genre on a scatter plot, both to identify clusters and to see how other variables like energy or danceability change along with popularity. However, I ran into many issues in trying to separate the genres and display the information effectively. I decided on a 3D scatter plot, adding another user-input variable so that two separate correlations can be viewed at once. The plot is very interactive: the user can zoom in and rotate the axes to display the information however they prefer. I have also included a small table showing the Pearson correlation coefficient between popularity and several of Spotify's metrics.
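For readers unfamiliar with the Pearson correlation coefficient, here is a minimal sketch of how such a table could be computed. It is not the code from the app (that analysis was done in R). The function names and the sample rows are made up for illustration, and the sketch uses only the Python standard library to stay dependency-free.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance without the 1/n factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_popularity(rows, metrics):
    """Return {metric: Pearson r against popularity} for the given rows."""
    popularity = [row["popularity"] for row in rows]
    return {m: pearson_r([row[m] for row in rows], popularity) for m in metrics}


# Made-up rows for illustration only; these are NOT values from the dataset.
tracks = [
    {"popularity": 20, "energy": 0.30, "danceability": 0.72},
    {"popularity": 45, "energy": 0.55, "danceability": 0.60},
    {"popularity": 60, "energy": 0.50, "danceability": 0.66},
    {"popularity": 85, "energy": 0.80, "danceability": 0.41},
]

table = correlations_with_popularity(tracks, ["energy", "danceability"])
for metric, r in table.items():
    print(f"{metric:>12}: {r:+.3f}")
```

A value near +1 or -1 indicates a strong linear relationship with popularity, while a value near 0 indicates little linear relationship. The coefficient only captures linear trends, which is one reason to look at the scatter plot as well.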

Finally, I took the 50th percentile (in terms of popularity) from each genre in my dataset and displayed them in a data table as 'threshold values' for each genre. For instance, for a successfully popular metal song, the relative would need to be quite high, as the 50th percentile has a value of 0.902. Interestingly, danceability also seems to be a much more crucial factor for pop than for indie pop.
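As a rough illustration of the idea, the sketch below computes a per-genre median (the 50th percentile) for each metric. This is only one reading of the threshold table described above, not the app's actual code. The function name and the sample rows are hypothetical, and the numbers are made up.

```python
from collections import defaultdict
from statistics import median


def genre_thresholds(rows, metrics):
    """Return {genre: {metric: median value}} for the given rows."""
    by_genre = defaultdict(list)
    for row in rows:
        by_genre[row["genre"]].append(row)
    return {
        genre: {m: median(r[m] for r in members) for m in metrics}
        for genre, members in by_genre.items()
    }


# Made-up rows for illustration only; these are NOT values from the dataset.
rows = [
    {"genre": "metal", "energy": 0.95, "danceability": 0.40},
    {"genre": "metal", "energy": 0.90, "danceability": 0.35},
    {"genre": "metal", "energy": 0.85, "danceability": 0.45},
    {"genre": "pop", "energy": 0.60, "danceability": 0.80},
    {"genre": "pop", "energy": 0.70, "danceability": 0.70},
]

thresholds = genre_thresholds(rows, ["energy", "danceability"])
for genre, values in thresholds.items():
    print(genre, values)
```

With an even number of songs in a genre, `statistics.median` averages the two middle values. A different percentile convention would shift the thresholds slightly.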

About the Dataset

The dataset was obtained by using the Spotify Web API in combination with the Python 3 library Spotipy. For each genre I chose, I queried the Spotify audio analysis and audio features for 3,000 songs. The API limits each request to 50 songs, so I wrote a loop that queries it in 50-song chunks until all 3,000 songs were collected, and stored the results in a pandas dataframe. Afterwards, I wrote the data to a CSV file so that most of the analysis could be done in R.
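The chunked querying can be sketched roughly as follows. This is not the notebook's code. `chunked`, `fetch_all`, and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in so the example runs without credentials or network access. With Spotipy, the callable passed in would presumably wrap a client method such as `audio_features`, which is an assumption about how the real loop was written.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all(track_ids, fetch_batch, batch_size=50):
    """Call `fetch_batch` once per chunk and collect the results in order."""
    results = []
    for batch in chunked(track_ids, batch_size):
        results.extend(fetch_batch(batch))
    return results


def fake_fetch(batch):
    """Stand-in for a real API call; returns one placeholder row per ID."""
    return [{"id": track_id} for track_id in batch]


# Hypothetical track IDs; a real run would use IDs returned by the API.
track_ids = [f"track-{i}" for i in range(3000)]
rows = fetch_all(track_ids, fake_fetch)
print(len(rows))
```

Keeping the batching separate from the API call makes the loop easy to test offline, and the batch size can be changed in one place if a different endpoint has a different limit.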

The Jupyter notebook used to query the server, as well as the Shiny application, can be found at this GitHub repo.

Future Work

I would love to continue this analysis of popularity metrics with clustering or regression analysis at a later date, or to develop a predictive model and feed information into it via Spotipy to identify up-and-coming artists.

For any comments or questions, please reach me via e-mail at [email protected]

About Author

Josh Vichare

BS in Materials Science & Engineering with a concentration on the study of Nanomaterials at Rutgers University. Josh has worked in the biomedical engineering field for close to 4 years in research and development, analyzing various performance metrics...

Leave a Comment

Josh Vichare May 8, 2018
Thanks! Spotify's Web API documentation has a lot of the definitions you're looking for, I think. As for how they actually determine the values, unfortunately that's not too well known. My guess is that it's an in-house algorithm that they're not too willing to share with the public.
luca May 8, 2018
hey! great job! how are the metrics you're including defined, e.g. danceability, etc.?
