Data Visualization on Spotify's Progress Over Time

Posted on Aug 17, 2020
The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


The medium through which we listen to music has changed from cassette tapes to CDs, MP3 players, and now streaming services. To explore data trends in music and how they have changed over the past 10 years, it was only logical to look at the popular songs on Spotify, the leading music streaming service. If trends exist among the features of Spotify's popular songs, they might help the entertainment business predict future music trends and be more successful.


Popular Features

First, I wanted to find the most popular song, artist, and genre over the past 10 years. According to the data, Memories by Maroon 5 turned out to be the most popular song of the past 10 years. Katy Perry is the most popular artist, and dance pop is the most popular genre.

In addition, using the song characteristics that Spotify provides, I was able to check the most common values of the other features. For example, a BPM (beats per minute) of about 124 was the most common among the songs, and the most common energy value (the higher the value, the more energetic the song) was about 81.

The most common values of the remaining features were as follows: danceability (the higher the value, the easier it is to dance to the song) was 69; loudness (the higher the value, the louder the song) was -5; liveness (the higher the value, the more likely the song is a live recording) was 10; valence (the higher the value, the more positive the mood of the song) was 45; duration (the length of the song) was 220; acousticness (the higher the value, the more acoustic the song) was 3.4; and speechiness (the higher the value, the more spoken word the song contains) was 5. This revealed the most popular feature values, but how did the features change over the span of 10 years?
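
As a rough illustration of how a "most common value" per column could be found, here is a minimal sketch using only the Python standard library. The rows are made-up toy values, and the column names (artist, genre, bpm, energy) are assumptions for illustration, not the actual names in the dataset.

```python
from collections import Counter

# Toy rows with made-up values; column names are assumptions,
# not necessarily the ones used in the real dataset.
songs = [
    {"title": "Song A", "artist": "Artist X", "genre": "dance pop", "bpm": 124, "energy": 81},
    {"title": "Song B", "artist": "Artist X", "genre": "dance pop", "bpm": 124, "energy": 75},
    {"title": "Song C", "artist": "Artist Y", "genre": "pop", "bpm": 100, "energy": 81},
    {"title": "Song D", "artist": "Artist Z", "genre": "dance pop", "bpm": 95, "energy": 60},
]


def most_common_value(rows, column):
    """Return the value that appears most often in the given column."""
    counts = Counter(row[column] for row in rows)
    value, _count = counts.most_common(1)[0]
    return value


# Most common value for a few categorical and numeric columns.
summary = {col: most_common_value(songs, col) for col in ("artist", "genre", "bpm", "energy")}
print(summary)
```

For a continuous feature, the exact mode can be noisy, so rounding or binning the values first may give a more stable "most common" value. With pandas, `Series.mode()` would do a similar job.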

Average of Each Feature Every Year

To answer that, I looked at the average of each feature for every year.

BPM rose until 2014 and then started decreasing, while energy has tended to keep decreasing. Danceability started increasing significantly in 2013, and loudness has fluctuated since 2014. Liveness has been decreasing since 2016, valence has fluctuated since 2015, and duration has tended to decrease since 2015, which suggests that people prefer shorter songs. Acousticness increased significantly in 2018, while speechiness fluctuated but seems to have been in general decline since 2017.
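
The per-year averages described above can be sketched as a simple group-by. This is a minimal example on made-up rows, assuming hypothetical column names (year, bpm, duration); it is not the code used to produce the original charts.

```python
from collections import defaultdict
from statistics import mean

# Toy rows with made-up values; column names are assumptions.
rows = [
    {"year": 2013, "bpm": 120, "duration": 230},
    {"year": 2013, "bpm": 130, "duration": 250},
    {"year": 2014, "bpm": 110, "duration": 210},
    {"year": 2014, "bpm": 100, "duration": 190},
]


def yearly_average(rows, column):
    """Group rows by year and return {year: mean of column}, sorted by year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row[column])
    return {year: mean(values) for year, values in sorted(by_year.items())}


avg_bpm = yearly_average(rows, "bpm")
avg_duration = yearly_average(rows, "duration")
print(avg_bpm)
print(avg_duration)
```

Plotting each resulting dictionary as a line chart, one per feature, gives the kind of year-over-year view discussed here. In pandas, the equivalent would be along the lines of `df.groupby("year")["bpm"].mean()`.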

After exploring the annual trends, I started wondering: how much does each feature contribute to the popularity of a song?

Scatterplot Data

I decided to create scatterplots and density maps to identify more accurately how each feature's values are distributed against song popularity, and I checked the approximate median of the densest area. Surprisingly, every feature is densest at popularity values in the 70s, which indicates that songs with popularity around 70 have more mainstream characteristics than songs at either extreme of the popularity spectrum.
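
One simple way to approximate the "densest area" of a scatterplot is to bin the points into a grid and pick the fullest cell. The sketch below does that with the standard library. The helper name `densest_cell`, the bin width of 10, and the sample points are all assumptions for illustration, and a real density map would typically use a smoother estimate.

```python
from collections import Counter


def densest_cell(points, bin_width=10):
    """Bin (feature, popularity) pairs into square cells.

    Returns the lower-left corner of the cell holding the most points,
    together with the number of points in that cell.
    """
    cells = Counter(
        (int(x // bin_width) * bin_width, int(y // bin_width) * bin_width)
        for x, y in points
    )
    corner, count = cells.most_common(1)[0]
    return corner, count


# Made-up (energy, popularity) pairs, not values from the real dataset.
points = [(81, 72), (85, 78), (88, 71), (60, 40), (45, 90), (83, 55)]
corner, count = densest_cell(points)
print(corner, count)
```

Here the fullest cell covers feature values from 80 to 89 and popularity from 70 to 79. Taking the median of the points inside that cell would give an approximate center of the dense region.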


But is there any correlation between the features? Across all 10 years combined, there seems to be no strong correlation between year and the features, or between popularity and the features. However, among the features themselves, three pairs seem to be correlated: energy and loudness, danceability and valence, and energy and acousticness.
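
For reference, the strength of a linear relationship between two features is commonly measured with the Pearson correlation coefficient. The sketch below is a plain-Python version, assuming that is the kind of correlation meant here. The sample values are made up and only illustrate the calculation; they say nothing about the direction or size of the correlations in the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up feature values for five hypothetical songs.
energy = [50, 60, 70, 80, 90]
loudness = [-9, -8, -6, -5, -4]
acousticness = [40, 30, 25, 10, 5]

r_loud = pearson(energy, loudness)
r_acoustic = pearson(energy, acousticness)
print(round(r_loud, 2), round(r_acoustic, 2))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. Computing this for every pair of columns gives a correlation matrix, which pandas exposes as `DataFrame.corr()`.
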
With the analysis above, I was able to explore the trends in music over the past 10 years. It also made me want to go deeper by comparing this data with 10 years of Billboard charts, or with a list of award-winning songs, to see whether any other trends exist between them.

About Author

Jay Kim

BA in Psychology at NYU & Assistant Accountant