Data Analysis on Spotify

Posted on Jun 14, 2021

The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Data Science Introduction

Spotify is one of the largest streaming and media services in the world. A better understanding of Spotify's markets would be highly beneficial to artists, producers, and labels alike. Of particular importance is the popularity score that Spotify calculates for each artist, as it provides a standardized metric by which artists can compare themselves to the competition.

Previously, we explored artists in the US market to better understand the relationship between artists' popularity and follower count. We saw an expected correlation: as the popularity of an artist increased, so did their number of followers. However, this trend is only visible at the macro scale of the entire US market. Within a given popularity range, we see numerous artists that have more followers than artists at higher popularity scores. This held for popularity scores up to 90, after which the number of artists at each score dropped considerably.

From this, we gathered that a high follower count does not guarantee a high popularity score. In other words, artists need to focus on getting their tracks played, not on acquiring followers, to increase their popularity.

With this in mind, we move on to determining whether this trend exists in foreign markets as well. It is likely that this trend is present in all markets for several reasons. First, the scoring system used by Spotify is the same across all markets. Second, highly popular artists likely have an international presence. We will also briefly explore the genres of US artists, to see if there is anything that may help us understand how a US artist can grow their popularity. Confirming this trend will help us set a baseline for future work.

Data Preparation

We again used the US Spotify Tracks dataset available on Kaggle. In addition, we used another dataset, also available on Kaggle, with artist market data for 125 foreign markets. The genre data is part of the previously mentioned US market dataset.

We manipulate and clean the artist market data in Python, using the pandas and NumPy libraries. The cleaned data is exported to .csv files, which we then import into R. There we can manipulate these datasets to generate the necessary visualizations and statistics.
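The cleaning step described above might look roughly like the following. This is a minimal sketch, not the code used for this post: the column names (`artist`, `market`, `popularity`, `followers`), the sample rows, and the helper `clean_artist_market` are all assumptions made for illustration, and the real Kaggle files may be laid out differently.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw rows; the real Kaggle files have their own column names.
raw_csv = io.StringIO(
    "artist,market,popularity,followers\n"
    "Artist A,US,85,1200000\n"
    "Artist A,US,85,1200000\n"  # exact duplicate row
    "Artist B,JP,72,\n"  # missing follower count
    "Artist C,JP,not available,5000\n"  # non-numeric popularity
    "Artist D,US,40,30000\n"
)


def clean_artist_market(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce numeric columns, drop unusable rows, and remove duplicates."""
    out = df.copy()
    for col in ("popularity", "followers"):
        # Values that cannot be parsed as numbers become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=["popularity", "followers"])
    # Keep one row per artist within each market
    out = out.drop_duplicates(subset=["artist", "market"])
    out["popularity"] = out["popularity"].astype(np.int64)
    out["followers"] = out["followers"].astype(np.int64)
    return out.reset_index(drop=True)


raw = pd.read_csv(raw_csv)
clean = clean_artist_market(raw)

# Passing a file path to to_csv instead would write the file that R reads later
csv_text = clean.to_csv(index=False)
print(csv_text)
```

Writing the result with `index=False` keeps the pandas row index out of the file, so R reads only the data columns.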

Visualizing the Analysis

We can take a side-by-side look at the US and Japan markets to see if trends are similar:

[Figure: followers versus popularity for the US and Japan markets, side by side]

As expected, the Japan market shows the trend seen in the US market. We will adjust the bins to see how this looks at each popularity above 80:

[Figure: followers at each popularity score above 80]

In general, an increase in popularity means an artist will have more followers. However, we still see artists that have more followers than other artists with greater popularity.
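The observation above can also be checked numerically rather than only by eye. The sketch below is one possible approach in pandas, under stated assumptions: the data values are made up, the column names are hypothetical, and `follower_summary` is an illustrative helper rather than part of the original analysis. For each popularity score above 80 it summarizes follower counts and flags scores where some artist at a lower score has more followers than at least one artist at that score.

```python
import pandas as pd

# Hypothetical artists in a single market; values are made up for illustration.
artists = pd.DataFrame(
    {
        "artist": ["A", "B", "C", "D", "E", "F"],
        "popularity": [81, 81, 84, 84, 88, 88],
        "followers": [9_000_000, 400_000, 2_500_000, 700_000, 6_000_000, 1_000_000],
    }
)


def follower_summary(df: pd.DataFrame, min_popularity: int = 80) -> pd.DataFrame:
    """Summarize follower counts at each popularity score above a threshold."""
    top = df[df["popularity"] > min_popularity]
    summary = (
        top.groupby("popularity")["followers"]
        .agg(["count", "min", "median", "max"])
        .sort_index()
    )
    # Largest follower count seen at any strictly lower popularity score
    best_below = summary["max"].cummax().shift(1)
    # True when a less popular artist has more followers than someone at this score
    summary["outnumbered_from_below"] = best_below > summary["min"]
    return summary


summary = follower_summary(artists)
print(summary)
```

The lowest score in the table has nothing below it to compare against, so its flag is always False.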

We will now explore whether genres overall have any relation to popularity. We get the following distribution for number of genres at each popularity:

[Figure: number of genres at each popularity score]

The genre counts follow a roughly normal distribution, centered around a popularity of 45. A significant number of genres appear at zero popularity, which makes sense: many unique, niche genres are associated only with artists of zero popularity. There are also few genres associated with artists of very high popularity. This is expected, since the number of artists decreases significantly at higher popularity scores, and these highly popular artists are unlikely to have many unique genres associated with them.
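One way to produce a count of genres per popularity score is sketched below. This is an assumption about how such a count could be built, not the method used for the figure: it assumes each artist row carries a list of genre labels in a hypothetical `genres` column, and it counts distinct genres at each score. If the genres are stored as text in the source file, they would need to be parsed into lists first.

```python
import pandas as pd

# Hypothetical artists; each row holds a list of genre labels.
artists = pd.DataFrame(
    {
        "artist": ["A", "B", "C", "D"],
        "popularity": [0, 45, 45, 90],
        "genres": [
            ["obscure folk"],
            ["pop", "dance pop"],
            ["pop", "indie rock"],
            ["pop"],
        ],
    }
)


def genres_per_popularity(df: pd.DataFrame) -> pd.Series:
    """Count the distinct genres that appear at each popularity score."""
    # One row per (artist, genre) pair; artists with no genres are dropped
    exploded = df.explode("genres").dropna(subset=["genres"])
    return exploded.groupby("popularity")["genres"].nunique()


counts = genres_per_popularity(artists)
print(counts)
```

Counting distinct genres means a genre shared by several artists at the same score is counted once; counting every artist and genre pair instead would give a different distribution.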



Although we did not find anything novel in this analysis, we can confirm that the trend seen in the US market is pervasive across foreign markets as well. This finding helps to simplify our future work: we can reasonably expect that trends we find within the US market will also be present in foreign markets.

By looking at US market genres as a whole, we could not obtain any information to concretely establish a relationship between an artist's genre and popularity.

Future Work

With this baseline established, our next analyses can delve deeper into other relationships that might impact an artist's Spotify popularity score. For instance, we can take a closer look at whether the number of genres an artist has correlates with their popularity. Another angle would be to see whether any particular genre is more popular than others, though this will require us to categorize all genres first. In addition, we are free to explore possible trends between a song's composition and an artist's popularity.

About Author

Aleksey Klimchenko

Data Scientist seeking to leverage model development experience and an understanding of research design and hypothesis testing. Previous experience in Computer Science and Bioinformatics.