Analysis of Top 50 Spotify Songs in 2021

Posted on Oct 18, 2022

Spotify is one of the world's most popular subscription streaming services. It offers over 80 million tracks, and its subscriber base has grown dramatically in recent years. According to Spotify's financial report, it had 433 million monthly active users (MAUs) in 2022, including 188 million premium subscribers.

Streaming has genuinely changed today's music industry. It changes not only the way we listen to music and how services respond to our preferences, but also the way artists share their work. It also benefits musicians and music makers, who can use large libraries and data to understand listeners' tastes better.

So we all have the same question: do hit songs share common traits? In this post, we analyze Spotify's 50 most-listened songs worldwide in 2021 to investigate this question. The dataset was extracted from Spotify and contains 14 descriptive variables:

  • Popularity - The higher the value, the more popular the song is
  • Danceability - The higher the value, the easier it is to dance to this song
  • Energy - The higher the value, the more energetic the song is
  • Key - The key of the song, encoded as an integer (0 = C, 1 = C♯/D♭, ..., 11 = B)
  • Loudness (dB) - The higher the value, the louder the song
  • Mode - Indicates the modality of a track (1 = major, 0 = minor)
  • Speechiness - The higher the value, the more spoken word the song contains
  • Acousticness - The higher the value, the more acoustic the song is
  • Instrumentalness - The closer the value to 1, the more instrumental the song is
  • Liveness - The higher the value, the more likely the song is a live recording
  • Valence - The higher the value, the more positive the mood of the song
  • Tempo - The overall estimated tempo of a track in beats per minute (BPM)
  • Duration - The duration of the song in minutes
  • Time signature - A notational convention that specifies how many beats are in each bar (measure)
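To make the variables concrete, here is a minimal sketch of how one row of the dataset could be represented in Python. The CSV column names used below (`popularity`, `loudness`, `tempo`, `duration_min`, and so on) are assumptions for illustration; the post does not show the dataset's actual headers, and the key and mode encodings assume Spotify's usual convention.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """One song with the 14 descriptive variables listed above."""
    popularity: int          # 0-100 Spotify popularity score
    danceability: float      # 0.0-1.0
    energy: float            # 0.0-1.0
    key: int                 # pitch class (assumed): 0 = C, 1 = C#/Db, ..., 11 = B
    loudness_db: float       # overall loudness in decibels
    mode: int                # assumed encoding: 1 = major, 0 = minor
    speechiness: float       # 0.0-1.0
    acousticness: float      # 0.0-1.0
    instrumentalness: float  # 0.0-1.0
    liveness: float          # 0.0-1.0
    valence: float           # 0.0-1.0
    tempo_bpm: float         # estimated tempo in beats per minute
    duration_min: float      # duration in minutes
    time_signature: int      # beats per bar, e.g. 4 for 4/4


def track_from_row(row: dict) -> Track:
    """Convert one csv.DictReader row (all values are strings) into a Track.

    The column names here are hypothetical; adjust them to the real CSV headers.
    """
    return Track(
        popularity=int(row["popularity"]),
        danceability=float(row["danceability"]),
        energy=float(row["energy"]),
        key=int(row["key"]),
        loudness_db=float(row["loudness"]),
        mode=int(row["mode"]),
        speechiness=float(row["speechiness"]),
        acousticness=float(row["acousticness"]),
        instrumentalness=float(row["instrumentalness"]),
        liveness=float(row["liveness"]),
        valence=float(row["valence"]),
        tempo_bpm=float(row["tempo"]),
        duration_min=float(row["duration_min"]),
        time_signature=int(row["time_signature"]),
    )
```

With the standard `csv` module, a whole file could then be loaded as `[track_from_row(r) for r in csv.DictReader(f)]`.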

Before analyzing, let's take a look at this data.

Density Plot

This density plot shows the distribution of Spotify popularity scores for the songs on the list. The Spotify popularity index is a 0-to-100 score that ranks popularity relative to other artists and tracks on Spotify. As the score grows, an artist tends to be placed in more editorial playlists and to get wider reach in algorithmic playlists and recommendations. Other factors, such as the save rate, the number of playlists a song appears in, the skip rate, and the share rate, can also indirectly raise or lower a song's popularity index. Here, the density is highest between 85 and 90, which means most songs in this top 50 list have a popularity score in that range. The lowest score on the list is 65, so every song that made this top 50 scored at least 65.
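Both observations in this paragraph (where the scores are most concentrated, and the lowest score on the list) can be checked numerically without a plot. The following is a minimal standard-library sketch, assuming the popularity scores are available as a list of integers. The 5-point bin width is an arbitrary choice, and a simple histogram count stands in for the smoothed density curve.

```python
from collections import Counter


def densest_bin(scores: list[int], width: int = 5) -> tuple[int, int]:
    """Return the [low, high) bin of the given width that holds the most scores.

    A histogram count is a coarse stand-in for a kernel density estimate.
    """
    bins = Counter((s // width) * width for s in scores)
    low, _count = bins.most_common(1)[0]
    return low, low + width


def popularity_summary(scores: list[int], width: int = 5) -> dict:
    """Summarize the popularity distribution of the tracks on the list."""
    return {
        "min": min(scores),
        "max": max(scores),
        "densest_bin": densest_bin(scores, width),
    }
```

A smoothed curve like the one in the plot would normally come from a kernel density estimate (for example seaborn's `kdeplot`), but the bin count is enough to locate the peak.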

Next, let's see who is on the top 50 list.

Based on the graph, there are 35 artists in total with songs on the top 50 most-listened list. About 75% of them have one song on the list, and about 25% have more than one. The artists with the most songs on the list are Olivia Rodrigo, Doja Cat, and Bad Bunny, each with at least three.
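The artist breakdown above is a simple frequency count. Here is a minimal sketch using `collections.Counter`, assuming one credited artist per song. Songs with several credited artists would need to be split into separate entries first, and how the original analysis handled features is not stated.

```python
from collections import Counter


def artist_song_counts(artists: list[str]) -> Counter:
    """Count how many songs each artist has on the list (one entry per song)."""
    return Counter(artists)


def share_with_multiple_songs(counts: Counter) -> float:
    """Fraction of distinct artists that have more than one song on the list."""
    multi = sum(1 for n in counts.values() if n > 1)
    return multi / len(counts)
```

`counts.most_common(3)` then returns the three artists with the most songs, and `len(counts)` gives the number of distinct artists.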

With this basic information in hand, we can start to analyze which features may make a song a hit. Let's check the correlation table to identify some baseline correlations between the variables.

In this table, "energy" and "loudness" have the strongest positive correlation, while "energy" and "acousticness" are inversely correlated. Unfortunately, our dependent variable, "popularity", shows low correlation with all of the independent variables. Even though the correlations don't tell us much here, we can still look for common features through exploratory data analysis (EDA).
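The post does not say which coefficient the correlation table uses. Pearson's r is the usual default (it is the default method of pandas' `DataFrame.corr()`), so the sketch below assumes Pearson and shows what each cell of such a table contains. It uses only the standard library and takes each feature as a list of values keyed by feature name.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pearson r for every pair of columns, e.g. ('energy', 'loudness')."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
```

Values near +1 or -1 indicate a strong linear relationship (like energy and loudness), while values near 0, as seen for popularity, mean the feature alone does little to explain the outcome linearly.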

Based on the features given, I split them into two groups: one describes the character of the music, and the other describes how the music is presented.

First, let's look at the character group. Three features describe a song's character: valence, energy, and danceability. All of them are measured on a scale from 0 to 1.

In this plot, valence is spread fairly evenly. Valence describes musical positiveness: tracks with high valence sound more positive, while tracks with low valence sound more negative. This suggests that songs of any mood have a chance to become popular.

Next, let's look at danceability and energy. Energy measures a song's intensity and activity. Typically, energetic tracks feel fast and loud; for example, death metal has high energy, while classical music scores low. Danceability describes how suitable a track is for dancing, based on a combination of musical elements including tempo, rhythm stability, and beat strength.

Here, songs with high danceability and high energy are more common among these hits than songs with low danceability and low energy. The pattern is most obvious for danceability: almost all the songs have a danceability score above 0.5.
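Statements like "almost all the songs have a danceability score above 0.5" can be quantified with a one-line share. This is a minimal sketch, assuming the feature values for the 50 songs are available as a list of floats. The 0.5 cutoff comes from the text above; any other threshold can be passed in.

```python
def share_above(values: list[float], threshold: float) -> float:
    """Fraction of tracks whose feature value is strictly above the threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > threshold) / len(values)
```

For example, `share_above(danceability_values, 0.5)` would give the fraction of songs above the 0.5 mark, where `danceability_values` is a hypothetical list holding the danceability column.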

Next, let's look at the presentation features: acousticness, liveness, instrumentalness, and speechiness.

Instrumentalness predicts whether a track contains no vocals. The closer the value is to 1.0, the greater the likelihood that the track has no vocal content; values above 0.5 are intended to represent instrumental tracks. Liveness detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live, and a value above 0.8 provides a strong likelihood that the track is live. Based on instrumentalness and liveness in this plot, all the hit tracks contain vocals and none of them is a live recording.
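The two cutoffs described above (0.5 for instrumentalness, 0.8 for liveness) translate directly into boolean flags. This is a minimal sketch that takes each track as a hypothetical `(instrumentalness, liveness)` pair. Treating a value exactly at the cutoff as "not flagged" is an assumption, since the text only says "above".

```python
INSTRUMENTAL_THRESHOLD = 0.5  # above this, a track is treated as instrumental
LIVE_THRESHOLD = 0.8          # above this, the track is very likely a live recording


def is_instrumental(instrumentalness: float) -> bool:
    """True if the track is likely instrumental (no vocals)."""
    return instrumentalness > INSTRUMENTAL_THRESHOLD


def is_likely_live(liveness: float) -> bool:
    """True if an audience is very likely present in the recording."""
    return liveness > LIVE_THRESHOLD


def count_flags(tracks: list[tuple[float, float]]) -> tuple[int, int]:
    """Count (instrumental, live) tracks from (instrumentalness, liveness) pairs."""
    instrumental = sum(1 for inst, _ in tracks if is_instrumental(inst))
    live = sum(1 for _, liv in tracks if is_likely_live(liv))
    return instrumental, live
```

A result of `(0, 0)` for the top 50 would match the observation that every hit track has vocals and none is a live recording.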

The speechiness scores are all below 0.3, and most songs score below 0.2. Speechiness detects the presence of spoken words in a track. Values above 0.66 describe tracks that are probably made entirely of spoken words (e.g., talk shows or audiobooks), values between 0.33 and 0.66 describe tracks that may contain both music and speech (e.g., rap), and values below 0.33 most likely represent music and other non-speech-like tracks. This suggests that tracks that are neither rap-heavy nor speech-like are more likely to become hits.
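The three speechiness ranges above can be written as a small bucketing function. This is a minimal sketch. The text does not say which bucket a value of exactly 0.33 or 0.66 falls into, so the boundary handling here (both go to the middle bucket) is an assumption.

```python
def speech_category(speechiness: float) -> str:
    """Bucket a speechiness score using the 0.33 / 0.66 thresholds."""
    if speechiness > 0.66:
        return "spoken word"       # e.g. talk shows, audiobooks
    if speechiness >= 0.33:
        return "music and speech"  # e.g. rap
    return "music"                 # music and other non-speech-like tracks
```

Combined with `collections.Counter`, `Counter(speech_category(s) for s in scores)` summarizes how many songs fall in each bucket; with every score below 0.3, all 50 songs would land in "music".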

Last but not least, acousticness. Acousticness is a confidence measure from 0.0 to 1.0 of whether the track is acoustic; 1.0 represents high confidence that it is, which usually means the song is lower in energy and quieter. Acousticness is spread comparatively evenly here, which means both quiet and loud songs have their market, but songs with more energy tend to be more popular, as we also saw in the previous graph.

Next, I want to talk about the other features that may also affect the songs' popularity - keys, duration, tempo, and time signature.

In Western music, there are 12 major keys and 12 minor keys. The bar plot shows how popularity differs for the same key across the two modes. Keys are encoded as integers in standard pitch-class notation: 0 stands for C, 1 for C♯, and so on up to 11 for B. As a rule of thumb, most people can sing fairly comfortably in the range from middle C to C' or below. Here, C major, C♯ major, and B minor are more popular than all the other keys, and major songs are more popular than minor songs on this 50 most-listened list. This suggests that the more popular a track is, the more likely it is to contain vocals and to be easy for listeners to sing.
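Spotify's audio features encode key as a pitch-class integer (with -1 when no key was detected) and mode as 1 for major and 0 for minor. This sketch assumes the dataset follows that encoding. It also assumes the bar plot shows average popularity per key and mode, an aggregation the post does not actually state. The sketch turns the integers into readable labels and groups popularity by them.

```python
from collections import defaultdict
from statistics import mean

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key: int, mode: int) -> str:
    """Turn a key (0-11, or -1) and mode (1 = major, 0 = minor) into a label."""
    if key == -1:
        return "unknown"  # Spotify uses -1 when no key was detected
    if not 0 <= key <= 11:
        raise ValueError(f"key must be in 0..11 or -1, got {key}")
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


def mean_popularity_by_key(rows: list[tuple[int, int, int]]) -> dict[str, float]:
    """Average popularity per key/mode label from (key, mode, popularity) tuples."""
    groups = defaultdict(list)
    for key, mode, popularity in rows:
        groups[key_name(key, mode)].append(popularity)
    return {label: mean(vals) for label, vals in groups.items()}
```

Grouping by the combined label keeps, for example, C major and C minor as separate bars, matching the per-mode comparison described above.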

As for duration, tempo, and time signature, the numbers suggest that songs lasting 2.5 to 4 minutes and written in 4/4 time have a better chance of becoming popular.
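The duration and time-signature pattern can be expressed as a simple check. Spotify's API reports duration in milliseconds and encodes time signature as beats per bar (so 4 means 4/4), which is why a conversion helper is included for data pulled straight from the API. The 2.5 to 4 minute window comes from the text above. Treat this as a description of the pattern on this list, not a predictive rule.

```python
def ms_to_minutes(duration_ms: int) -> float:
    """Convert a duration in milliseconds (as Spotify's API reports it) to minutes."""
    return duration_ms / 60_000


def fits_hit_profile(duration_min: float, time_signature: int,
                     lo: float = 2.5, hi: float = 4.0) -> bool:
    """True if a track lasts between lo and hi minutes and is in 4/4 time.

    Time signature is assumed to be encoded as beats per bar, so 4 means 4/4.
    """
    return lo <= duration_min <= hi and time_signature == 4
```

For a track reported as 180,000 ms in 4/4, `fits_hit_profile(ms_to_minutes(180_000), 4)` is true, since that is exactly three minutes.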

Without a doubt, hit songs are not easy to create. But this analysis shows that hit songs do share some common characteristics. There are other features and factors we didn't cover here that could be explored more deeply, but building these traits into your song may help you write a popular one.
