Data Analysis on Spotify
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data Science Introduction
Spotify is one of the largest streaming and media services in the world. A better understanding of the Spotify markets would be highly beneficial to artists, producers, and labels alike. Of particular importance is the popularity score that Spotify calculates for each artist, as it provides a standardized metric by which artists can compare themselves to the competition.
Previously, we explored artists in the US market to better understand the relationship between artists' popularity and follower count. We saw an expected correlation: as the popularity of an artist increased, so did their number of followers. However, this trend is only visible at the macro scale of the entire US market. When looking at follower counts within a popularity range, we see numerous artists that have more followers than artists at higher popularity scores. This held for popularity scores up to 90, after which the number of artists at each score dropped considerably.
From this, we gathered that a high follower count does not guarantee a high popularity score. In other words, to increase their popularity, artists need to focus on getting their tracks played rather than on acquiring followers.
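The two views described above (a positive trend overall, with exceptions inside it) can be sketched in pandas. This is a minimal illustration on made-up rows, not values from the Kaggle dataset; the column names `name`, `popularity`, and `followers` and the helper `outranked_count` are assumptions introduced for the example.

```python
import pandas as pd

# Toy data for illustration only; these are NOT values from the dataset.
# Column names are assumptions.
artists = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "popularity": [40, 45, 60, 62, 80, 85],
    "followers": [1_000, 90_000, 20_000, 30_000, 50_000, 400_000],
})

# Macro view: Spearman rank correlation, computed as the Pearson
# correlation of the ranks, between popularity and follower count.
rho = artists["popularity"].rank().corr(artists["followers"].rank())

# Micro view: for each artist, count the artists with a HIGHER popularity
# score that nevertheless have FEWER followers.
def outranked_count(row):
    higher = artists[artists["popularity"] > row["popularity"]]
    return int((higher["followers"] < row["followers"]).sum())

artists["outranks"] = artists.apply(outranked_count, axis=1)

print(f"rank correlation: {rho:.2f}")
print(artists[["name", "popularity", "followers", "outranks"]])
```

In this toy table the rank correlation is positive, yet artist B has more followers than three artists with higher popularity scores, which is the pattern the text describes.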
With this in mind, we move on to determining whether this trend exists in foreign markets as well. It is likely that this trend is present in all markets for several reasons. First, the scoring system used by Spotify is the same across all markets. Second, highly popular artists likely have an international presence. We will also briefly explore the genres of US artists, to see if there is anything that may help us understand how a US artist can grow their popularity. Confirming this trend will help us set a baseline for future work.
We again use the US Spotify Tracks dataset available on Kaggle. In addition, we use another dataset, also available on Kaggle, with artist market data for 125 foreign markets. The genre data is part of the previously mentioned US market dataset.
We will manipulate and clean the artist market data in Python, using the pandas and NumPy libraries. The cleaned data will be exported to .csv files, which we will import into R. We can then manipulate these datasets to generate the necessary visualizations and statistics.
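As a rough sketch of what such a cleaning step might look like, the example below reads a small in-memory stand-in for the raw file, removes duplicates and unusable rows, and produces CSV output. The column names (`artist_id`, `name`, `popularity`, `followers`, `market`), the specific cleaning rules, and the helper `clean_artists` are all assumptions for illustration; the actual dataset layout and cleaning steps may differ.

```python
import io
import pandas as pd

# In-memory stand-in for a raw export (made-up rows, hypothetical columns).
# With a real file this would be pd.read_csv on that file's path.
raw_csv = io.StringIO(
    "artist_id,name,popularity,followers,market\n"
    "a1,Artist One,72,150000,JP\n"
    "a1,Artist One,72,150000,JP\n"
    "a2,Artist Two,,3000,JP\n"
    "a3,Artist Three,55,not available,US\n"
    "a4,Artist Four,90,2500000,US\n"
)
raw = pd.read_csv(raw_csv)

def clean_artists(df):
    """Drop duplicate artist/market rows and rows without usable numbers."""
    out = df.drop_duplicates(subset=["artist_id", "market"]).copy()
    for col in ["popularity", "followers"]:
        # Non-numeric entries become NaN so they can be dropped below.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=["popularity", "followers"])
    return out.astype({"popularity": "int64", "followers": "int64"})

cleaned = clean_artists(raw)

# to_csv without a path returns the CSV text; passing a file name instead
# would write the .csv file to be imported into R.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Writing without the index keeps the file free of an extra unnamed column when it is read back in R.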
Visualizing the Analysis
We can take a side-by-side look at the US and Japan markets to see if trends are similar:
As expected, the Japan market shows the trend seen in the US market. We will adjust the bins to see how this looks at each popularity above 80:
In general, an increase in popularity means an artist will have more followers. Furthermore, we still see artists that have more followers than other artists with greater popularity.
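The effect of adjusting the bins can be sketched in pandas as well. The example below is a minimal illustration on made-up rows: it summarizes followers per market first in 10-point popularity bins, then at every individual score above 80. The column names and the helper `max_followers_by_bin` are assumptions, and taking the maximum is just one possible summary.

```python
import pandas as pd

# Toy rows for illustration only; NOT values from the dataset.
artists = pd.DataFrame({
    "market": ["US", "US", "US", "US", "JP", "JP", "JP", "JP"],
    "popularity": [35, 78, 84, 91, 42, 81, 84, 88],
    "followers": [5_000, 900_000, 300_000, 4_000_000,
                  8_000, 200_000, 650_000, 1_200_000],
})

def max_followers_by_bin(df, width):
    """Largest follower count per popularity bin, one column per market.

    Each bin is labelled by its lower edge, so width=10 puts scores
    80 through 89 into bin 80.
    """
    binned = df.assign(pop_bin=(df["popularity"] // width) * width)
    return (binned.groupby(["pop_bin", "market"])["followers"]
                  .max()
                  .unstack("market"))

# Coarse view: 10-point bins across the whole popularity range.
coarse = max_followers_by_bin(artists, width=10)

# Fine view: one bin per popularity score, restricted to scores above 80.
fine = max_followers_by_bin(artists[artists["popularity"] > 80], width=1)

print(coarse)
print(fine)
```

In the toy data, the US artist in the 70 bin has more followers than the US artist in the 80 bin, mirroring the exceptions noted above.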
We will now explore whether genres overall have any relation to popularity. We get the following distribution of the number of genres at each popularity score:
The number of genres follows a roughly normal distribution, centered around a popularity of 45. A significant number of genres have zero popularity, which makes sense considering unique genres will be associated with artists of zero popularity. There are also not many genres associated with artists of increasingly high popularity. This is expected, since the number of artists decreases significantly at higher popularity scores. These highly popular artists will likely not have many unique genres associated with them.
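One way such a count could be produced is sketched below: each artist's genre list is expanded to one row per genre, and the distinct genres are counted at each popularity score. The rows are made up, and the layout (a `genres` column holding a list per artist) is an assumption; the actual dataset may store genres differently.

```python
import pandas as pd

# Toy rows for illustration only; genre names and scores are made up.
artists = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "popularity": [0, 45, 45, 88],
    "genres": [
        ["microgenre x"],
        ["indie rock", "dream pop"],
        ["indie rock", "lo-fi"],
        ["pop"],
    ],
})

# One row per (artist, genre) pair.
exploded = artists.explode("genres")

# Number of distinct genres seen at each popularity score.
genres_per_popularity = exploded.groupby("popularity")["genres"].nunique()

print(genres_per_popularity)
```

Because "indie rock" appears for two artists at popularity 45, it is counted once there, giving three distinct genres at that score.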
Although we did not find anything novel in this analysis, we can confirm that the trend seen in the US market is pervasive across all markets. This finding helps to simplify our future work; we can infer that any trends we find within the US market will also be present in foreign markets.
By looking at US market genres as a whole, we could not obtain any information to concretely establish a relationship between an artist's genre and popularity.
With this baseline established, our next analyses can delve deeper into other relationships that might impact an artist's Spotify popularity score. For instance, we can take a closer look at whether the number of genres an artist has correlates with their popularity. Another angle would be to see whether any particular genre is more popular than others, though this will require us to categorize all genres first. In addition, we are free to explore possible trends between a song's composition and an artist's popularity.