Data Study on listening habits: Spotify’s Top 200
4 out of every 5 streams came from the top 20% most streamed songs in Spotify’s Top 200 charts for 2017
90% of streaming in the U.K. comes from the top 20%
Spotify publishes a chart of the most streamed tracks per country. The chart is updated daily and goes back as far as the beginning of 2017. The data used here comes from Kaggle, where all the historical chart data from 2017 was collected. It contains over 3 million rows and totals more than 99 billion streams.
Data on Track performance
Exploratory Data Analysis helps reveal what kind of interesting insights can be drawn from the dataset. I started off by mapping the top tracks worldwide, where every country is colored by the number of streams generated.
Top track per country in 2017:
Top track for Mexico was 'Me Rehúso' with 127.5 million streams
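The per-country top track behind a map like this can be found by summing each track's daily streams within a country and keeping the largest total. Below is a minimal sketch of that aggregation using only Python's standard library. The row layout (region, track, daily streams) and the toy values are assumptions for illustration, not the real chart data.

```python
from collections import defaultdict

# Toy rows in an ASSUMED layout: (region, track, streams on one day).
# Names and numbers are made up for illustration only.
rows = [
    ("region_a", "Track A", 500),
    ("region_a", "Track B", 450),
    ("region_a", "Track A", 520),
    ("region_b", "Track C", 900),
    ("region_b", "Track B", 800),
    ("region_b", "Track B", 300),
]

def top_track_per_region(rows):
    """Return {region: (track, total_streams)} for the most streamed track."""
    # Sum daily streams per (region, track) pair.
    totals = defaultdict(int)
    for region, track, streams in rows:
        totals[(region, track)] += streams

    # Keep the track with the largest total in each region.
    best = {}
    for (region, track), total in totals.items():
        if region not in best or total > best[region][1]:
            best[region] = (track, total)
    return best

top = top_track_per_region(rows)
for region, (track, total) in sorted(top.items()):
    print(f"{region}: {track} with {total} streams")
```

Ties are broken by whichever track is encountered first, which is an arbitrary choice for this sketch.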
The map only shows ‘the tip of the iceberg’ concerning top ranking tracks. It does not shed light onto how the songs ranked over time and how they performed in different countries.
To monitor individual songs over time, we can use a time series plot to visualize how their position changes and compare the different patterns between countries. This can serve as a granular analysis into music adoption and retention per country.
In general, each song seems to have its own particular behaviour, and this becomes more noticeable when comparing music of different genres.
Individual track adoption and retention:
‘Despacito’ and ‘HUMBLE.’ ranking over time in Great Britain, Mexico and the U.S.
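To draw a ranking-over-time plot, the chart rows for one track first have to be split into one date-ordered series per country. The sketch below shows one way to do that with the standard library. The row layout (date, region, track, position) and the toy values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Toy rows in an ASSUMED layout: (date, region, track, chart position).
rows = [
    (date(2017, 1, 2), "region_a", "Track A", 9),
    (date(2017, 1, 1), "region_a", "Track A", 12),
    (date(2017, 1, 1), "region_b", "Track A", 40),
    (date(2017, 1, 3), "region_b", "Track A", 25),
    (date(2017, 1, 1), "region_a", "Track B", 1),
]

def rank_series(rows, track):
    """Return {region: [(date, position), ...]} for one track, sorted by date."""
    series = defaultdict(list)
    for day, region, name, position in rows:
        if name == track:
            series[region].append((day, position))
    # Days on which the track was not in the chart simply have no point.
    return {region: sorted(points) for region, points in series.items()}

series = rank_series(rows, "Track A")
for region, points in sorted(series.items()):
    print(region, [(d.isoformat(), pos) for d, pos in points])
```

When plotting positions, it is common to invert the y-axis so that position 1 appears at the top.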
It is important to note that the data is not representative of the entire music streaming industry and that it has significant selection bias (it covers only Spotify users). Audio streaming also comes from important sources other than Spotify. Apple Music probably shows similar behavior in its data, while YouTube likely reflects quite different streaming habits.
Singling out individual songs has provided some interesting insight into how differently songs perform in each country. However, does this behaviour scale up to countrywide listening habits? On average, does every country stream music in the same way? Or does average music adoption and retention vary between countries?
To answer these questions, I grouped the data by country and took every track that reached the top 10 in that country. For each country, a scatter plot compares the average number of days those tracks took to reach their top position against the average number of days they took to leave the charts entirely. The following plot sums up the findings.
Mean track adoption and retention:
Mexico’s adoption of new music is slower than in most countries; however, the top tracks stay in the charts for longer
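The two averages compared in this plot can be sketched as follows. This is an illustration under stated assumptions rather than the exact computation behind the figure: "adoption" is taken as the days from a track's first chart appearance to the first day it held its best position, and "retention" as the days from that peak to its last chart appearance. The row layout and toy values are also assumed.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy rows in an ASSUMED layout: (date, region, track, chart position).
rows = [
    (date(2017, 1, 1), "region_a", "Track A", 40),
    (date(2017, 1, 11), "region_a", "Track A", 5),
    (date(2017, 1, 31), "region_a", "Track A", 150),
    (date(2017, 1, 1), "region_a", "Track B", 100),  # never reaches the top 10
    (date(2017, 1, 5), "region_a", "Track B", 60),
    (date(2017, 1, 1), "region_b", "Track A", 8),
    (date(2017, 1, 3), "region_b", "Track A", 2),
    (date(2017, 1, 9), "region_b", "Track A", 180),
]

def adoption_retention(rows, top_n=10):
    """Return {region: (mean days to peak, mean days from peak to exit)}."""
    by_track = defaultdict(list)
    for day, region, track, position in rows:
        by_track[(region, track)].append((day, position))

    per_region = defaultdict(list)
    for (region, _track), points in by_track.items():
        points.sort()
        first_day, last_day = points[0][0], points[-1][0]
        # Best (lowest) position; ties resolved to the earliest day.
        peak_day, peak_pos = min(points, key=lambda p: (p[1], p[0]))
        if peak_pos > top_n:
            continue  # keep only tracks that reached the top N
        per_region[region].append(
            ((peak_day - first_day).days, (last_day - peak_day).days)
        )

    return {
        region: (mean(a for a, _ in pairs), mean(r for _, r in pairs))
        for region, pairs in per_region.items()
    }

result = adoption_retention(rows)
print(result)
```

Each region then contributes one point to the scatter plot, with mean days to peak on one axis and mean days from peak to exit on the other.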
The graph shows an important difference between continents, which is probably due to the native languages of each country. Regardless, it seems that if a country is quick to adopt new music, it is also quick to move on to the next music trend.
There are some countries in which the top tracks stay in the charts for longer, like Brazil or Mexico. The U.S. and Great Britain adopt new music faster and therefore move on more easily to new songs.
Data on Extreme events
Individual song streaming:
Global phenomena like ‘Despacito’ dwarf most songs
Zooming in again on individual songs, I plotted a random sample of 1,000 songs and their respective total global streams. The plot shows several extreme values in the sample (this holds true for every random sample taken from the dataset). In addition, most songs add up to a very low number of streams in comparison to the extremely popular tracks.
This is surprising given that the dataset only contains information on the most streamed songs in the world.
To check whether this holds on a nationwide scale, I made boxplots showing the total number of streams for each song by country.
Song streaming in top countries:
Red points represent outlying observations in the data
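Boxplots conventionally flag as outliers any points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that conventional rule to a list of per-song stream totals. It is an assumption that the plot above used this rule, and the toy values are made up. It relies on `statistics.quantiles`, available from Python 3.8, whose quartile method can differ slightly from the one a plotting library uses.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], in input order."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Toy per-song stream totals (made-up numbers): one song dwarfs the rest.
totals = [10, 12, 14, 16, 18, 20, 22, 500]
outliers = tukey_outliers(totals)
print(outliers)
```

With heavily skewed totals like these, a logarithmic axis is one common way to keep both the boxes and the outliers readable.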
As expected, the outlying observations have very extreme values, so much so that the boxes are barely visible. Changing the scale would give a better look at the boxes, but regular (non-outlying) observations are predictable and consistent events. Examining the extreme values can be much more insightful.
I labelled songs that belong to the top 20% most streamed tracks to compare the number of streams they generate with that of the rest of the songs in the data.
Streaming from the top 20%
Top streaming countries in order of polarized listening habits
Polarized listening habits refer to the imbalance in music streaming. High polarization means that a small number of songs are responsible for a high volume of streams. The higher the imbalance, the higher the polarization.
The barplot shows a pretty stark contrast in listening habits between countries. At the top is Great Britain, where the top 20% of songs account for 90% of total streams during 2017. Sweden’s top songs account for 89% of streams.
At the bottom is Brazil, where the top tracks contribute only 75% of total streaming. In Mexico, 79% of streaming comes from the top 20%.
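The share of streams coming from the top 20% can be computed per country by ranking tracks by total streams, taking the top fifth, and dividing their streams by the country's total. Below is a minimal sketch under assumed inputs: rows of (region, track, streams) with made-up values. Rounding the cutoff up to a whole number of tracks is also an assumption of this sketch.

```python
import math
from collections import defaultdict

# Toy rows in an ASSUMED layout: (region, track, streams). Values are made up.
rows = [
    ("region_a", "T1", 900), ("region_a", "T2", 40), ("region_a", "T3", 30),
    ("region_a", "T4", 20), ("region_a", "T5", 10),
    ("region_b", "T1", 500), ("region_b", "T2", 200), ("region_b", "T3", 150),
    ("region_b", "T4", 100), ("region_b", "T5", 50),
]

def top_share(rows, fraction=0.2):
    """Return {region: share of streams from the top `fraction` of tracks}."""
    totals = defaultdict(lambda: defaultdict(int))
    for region, track, streams in rows:
        totals[region][track] += streams

    shares = {}
    for region, tracks in totals.items():
        ranked = sorted(tracks.values(), reverse=True)
        # Number of tracks in the top fraction, rounded up, at least one.
        k = max(1, math.ceil(fraction * len(ranked)))
        shares[region] = sum(ranked[:k]) / sum(ranked)
    return shares

shares = top_share(rows)
# Most polarized region first.
for region, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {share:.0%} of streams from the top 20% of tracks")
```

Sorting regions by this share gives the ordering used in a polarization bar plot, with the most imbalanced region first.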
Valuable information has emerged from diving into extreme events in the data. Now only one last question remains: does streaming polarization have any relation to music adoption and retention?
This sheds light on how to market new releases and what to expect within each country. Marketing strategies can be tailored for a country that is highly reliant on top tracks and that easily moves on to the next trending music.
For future research, we can look into the best moment for an artist to announce tour dates in specific countries.