NYC Data Science Academy| Blog

Voyage to the intelligent music streaming service: Ep 1. The analysis of Top ranked Songs and Artists

Daniel (Donghyun) Kang
Posted on Mar 10, 2018

Prologue

In 2012 and 2013, I worked on a music recommendation algorithm based on information gathered from smartphone sensors. There were several reasons it never went into mass production, but the critical one was the processing time needed for big data and its clustering, which was too slow for a real-time engine at that time. Now I am rebooting this project from scratch by myself. This journey will cover not only music recommendation systems but also smart artificial intelligence systems more broadly. Even though I do not know where this trip ends, enjoying it would be enough.

My journey starts with an analysis of current music streaming trends, using data from Spotify.

Why Spotify?

Spotify is a music streaming service founded in 2006 in Sweden. It provides a convenient user interface for listening to music and sharing playlists with other users, in two tiers: one is free of charge with advertisements, the other is a paid subscription for unlimited ad-free streaming. According to statistics published by Statista, Spotify had 70 million paying subscribers worldwide as of January 2018, up from 60 million in July 2017. This rapid and steady growth in paid subscribers gives data scientists a good reason to investigate current music trends.

Data Source

Spotify data sets from Kaggle were used for the analysis.

  1. spotifys-worldwide-daily-song-ranking (for this analysis)
  2. top-tracks-of-2017 (for additional information)
  3. every-song-you-have-heard-almost (for lyric analytics)
  4. world-cities-database (for city mapping)

The first data set is the main one used here. It contains the daily ranking of the 200 most-listened songs in 53 countries from 2017 and 2018 by Spotify users. It has more than 2 million rows, covering 6,629 artists and 18,598 songs, for a total of about 105 billion streams. Each row contains the ranking position of a song on a specific day. For instance, the first 200 rows present the ranking for the 1st of January in Argentina. The following 200 rows contain the ranking for the 2nd of January in Argentina. The regions are sorted alphabetically. To build an interactive Shiny app efficiently, only the top 100 ranked songs per day in each country are kept. A summary of the reduced data set is below:

RANKING           TRACK_NAME          ARTIST              STREAMS             DATE                REGION
Min.   :  1.00    Length:1874422      Length:1874422      Min.   :    1001    Length:1874422      Length:1874422
1st Qu.: 24.00    Class :character    Class :character    1st Qu.:    4004    Class :character    Class :character
Median : 49.00    Mode  :character    Mode  :character    Median :   12341    Mode  :character    Mode  :character
Mean   : 49.25                                            Mean   :   70325
3rd Qu.: 74.00                                            3rd Qu.:   44110
Max.   :100.00                                            Max.   :11381520
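
The reduction step described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code behind the app (which appears to have been built in R). The sample rows are made up, and the column names are assumed to match the summary above.

```python
import csv
import io

# Hypothetical sample rows; column names are assumed to match the summary above.
SAMPLE_CSV = """RANKING,TRACK_NAME,ARTIST,STREAMS,DATE,REGION
1,Song A,Artist X,19272,2017-01-01,ar
100,Song B,Artist Y,2100,2017-01-01,ar
101,Song C,Artist Z,2050,2017-01-01,ar
200,Song D,Artist W,1001,2017-01-01,ar
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["RANKING"] = int(record["RANKING"])
        record["STREAMS"] = int(record["STREAMS"])
        rows.append(record)
    return rows


def keep_top_n(rows, n=100):
    """Keep only the rows ranked within the top n on their day."""
    return [row for row in rows if row["RANKING"] <= n]


reduced = keep_top_n(load_rows(SAMPLE_CSV))
print([row["TRACK_NAME"] for row in reduced])
```

Filtering on the rank column alone is enough here because each row already carries the rank for one day in one country.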

Motivation

Trend analysis lets us explore how the popularity of artists and songs varies over time. Using the daily ranking of the 100 most-listened songs in 53 countries on Spotify, the goals are to:
- track the current music trend and forecast the next one;
- track the flow or influence of top-ranked music from one country to another, and derive business-model insights for commercial services such as localized advertising;
- build a distribution/influence map by region, country, genre (music category), and other information (such as season, weather, and social events);
- check whether a music recommendation service based on user preference is feasible.

Insights

Predict
- what type of songs will be popular in the future?
Forecast
- what is the next trend of music in this country?
Share
- how do popular songs propagate from country to country?
Pattern
- can we categorize countries by their music trend?
Dominant factors
- is there something common in lyrics, ranks, or artists?
Business Models
- what kinds of features create a tendency or trend in this field?
Influence Map
- how long does a top-ranked song take to enter the rankings of neighboring countries?
Recommendation Service
- is this possible? Is there any pattern? Does user preference differ country by country?

All of these questions will be answered one by one during this journey, and the visualization app built in this episode is the first step toward an intuitive grasp.
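
As one illustration of how the Influence Map question above could be approached, the sketch below measures how many days after its first appearance in one country a song first appears in the charts of others. It is a Python sketch under stated assumptions, not a method taken from this analysis: the function names, the sample rows, and the choice of "first chart date" as the measure of arrival are all hypothetical.

```python
from datetime import date

# Hypothetical chart rows for a single song in three countries.
rows = [
    {"TRACK_NAME": "Song A", "ARTIST": "Artist X", "DATE": "2017-01-06", "REGION": "gb"},
    {"TRACK_NAME": "Song A", "ARTIST": "Artist X", "DATE": "2017-01-07", "REGION": "gb"},
    {"TRACK_NAME": "Song A", "ARTIST": "Artist X", "DATE": "2017-01-07", "REGION": "se"},
    {"TRACK_NAME": "Song A", "ARTIST": "Artist X", "DATE": "2017-01-08", "REGION": "us"},
    {"TRACK_NAME": "Song B", "ARTIST": "Artist Y", "DATE": "2017-01-01", "REGION": "us"},
]


def first_chart_dates(rows, track, artist):
    """Return the earliest chart date per region for one song."""
    first = {}
    for row in rows:
        if row["TRACK_NAME"] == track and row["ARTIST"] == artist:
            day = date.fromisoformat(row["DATE"])
            region = row["REGION"]
            if region not in first or day < first[region]:
                first[region] = day
    return first


def entry_lags(first, origin):
    """Days between the song's first chart date in origin and in each other region."""
    return {
        region: (day - first[origin]).days
        for region, day in first.items()
        if region != origin
    }


lags = entry_lags(first_chart_dates(rows, "Song A", "Artist X"), origin="gb")
print(lags)
```

A negative lag would mean the song charted in that country before it charted in the chosen origin.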

Data Analysis

There are 53 countries categorized as 'region' in this data set. Nine of them are in Asia, 26 in Europe, and 16 in South America, and the remaining 2 are Canada and the United States. Currently no countries from Africa or the Middle East are included. The distribution is shown in the map below. This data set was provided by Spotify, and it is likely that where a country has a more influential local streaming provider, the number of Spotify subscribers is too small to appear in the statistics. For example, South Korea has three major music streaming companies whose combined market share is more than 90%.

Aggregating the total number of streams by country shows that streaming power (the number of streams per year) is concentrated in a few countries such as the US, Great Britain, Brazil, and Mexico. Looking at the top 10 countries (the USA, Great Britain, Brazil, Mexico, Germany, Spain, the Netherlands, Sweden, Australia, and the Philippines), not all of them speak English as a first language. So streaming power does not appear to be tied to language; rather, Spotify seems to be a strong streaming provider in these countries. This also explains why Sweden, Spotify's home country, is in the top 10. For an in-depth analysis of trends and tendencies, the top 7 to 10 countries will be the focus of the initial research stage.
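
The aggregation behind this ranking can be sketched as a sum of streams per region. The snippet below is a minimal Python illustration with made-up numbers; the function name `streaming_power` and the sample rows are assumptions, not part of the original analysis.

```python
from collections import Counter

# Hypothetical rows; only the two columns needed for the aggregation are shown.
rows = [
    {"REGION": "us", "STREAMS": 500},
    {"REGION": "gb", "STREAMS": 200},
    {"REGION": "us", "STREAMS": 300},
    {"REGION": "se", "STREAMS": 150},
    {"REGION": "gb", "STREAMS": 100},
]


def streaming_power(rows):
    """Sum the stream counts per region."""
    totals = Counter()
    for row in rows:
        totals[row["REGION"]] += row["STREAMS"]
    return totals


# most_common(n) returns the n regions with the largest totals, largest first.
top_two = streaming_power(rows).most_common(2)
print(top_two)
```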

As common sense suggests, there is a strong correlation between top songs and top artists, and 'Shape of You' and Ed Sheeran are a case in point. Both were very popular on Spotify globally in 2017, based on the table below, which is ranked by maximum stream count.


* global, us, gb (Great Britain), mx (Mexico), br (Brazil), de (Germany), es (Spain), se (Sweden)

When we aggregate the number of streams for 'Shape of You' in each country, the ranking of countries shown in the pie chart below is almost the same as the streaming power list. There are only small differences in order.

To find patterns in the time series, the number of streams was measured on a monthly basis for the top seven countries by streaming power. There is no strong pattern overall, but the number of streams in January and February is noticeably lower than in other months. Why? Are people too busy to listen to music in January and February? Or was less new music released? With this data set alone, it is difficult to say whether there is a real tendency in the first quarter of the year. This may have happened only in 2017. To clarify, further data sets will be explored later, at least for 2015 and 2016.
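
The monthly measurement can be sketched as grouping by region and calendar month. This is a Python illustration with hypothetical rows, and it assumes the DATE column is an ISO-formatted string (YYYY-MM-DD), which the article does not confirm.

```python
from collections import defaultdict

# Hypothetical rows; DATE is assumed to be an ISO "YYYY-MM-DD" string.
rows = [
    {"REGION": "us", "DATE": "2017-01-15", "STREAMS": 100},
    {"REGION": "us", "DATE": "2017-01-16", "STREAMS": 120},
    {"REGION": "us", "DATE": "2017-03-01", "STREAMS": 400},
    {"REGION": "gb", "DATE": "2017-01-15", "STREAMS": 60},
    {"REGION": "ar", "DATE": "2017-01-15", "STREAMS": 30},
]


def monthly_streams(rows, regions):
    """Sum streams per (region, 'YYYY-MM') for the selected regions."""
    totals = defaultdict(int)
    for row in rows:
        if row["REGION"] in regions:
            month = row["DATE"][:7]  # first seven characters are "YYYY-MM"
            totals[(row["REGION"], month)] += row["STREAMS"]
    return dict(totals)


monthly = monthly_streams(rows, regions={"us", "gb"})
for key in sorted(monthly):
    print(key, monthly[key])
```

Note that months with fewer days, or with missing days in the data, will have lower totals for that reason alone, so a per-day average may be a fairer comparison.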

To trace popular music trends, the most popular songs per period were calculated, and the top 3 songs for each country are listed in the three figures below. The first covers the whole of 2017, the second January to July 2017, and the last August to December 2017. Some songs were loved globally, while others were popular only in a specific country. When a song or artist is especially loved in one country, that country is most likely the artist's homeland. For example, although 'Shape of You' by Ed Sheeran was a great hit all over the world, its number of streams in Great Britain was almost twice that of other countries, and Sheeran was born in Halifax, West Yorkshire. The next factor is language. It makes sense that 'Shape of You' has been so popular in the US. Language reflects culture, and music cannot be fully understood without shared cultural context.
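
A per-period top list like this can be sketched as follows. The article does not say which measure defines "popular", so this Python sketch assumes total streams within the date window; the function name and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; DATE is assumed to be an ISO "YYYY-MM-DD" string.
rows = [
    {"REGION": "us", "TRACK_NAME": "Song A", "ARTIST": "X", "DATE": "2017-01-10", "STREAMS": 100},
    {"REGION": "us", "TRACK_NAME": "Song A", "ARTIST": "X", "DATE": "2017-02-10", "STREAMS": 50},
    {"REGION": "us", "TRACK_NAME": "Song B", "ARTIST": "Y", "DATE": "2017-03-01", "STREAMS": 120},
    {"REGION": "us", "TRACK_NAME": "Song C", "ARTIST": "Z", "DATE": "2017-09-01", "STREAMS": 900},
    {"REGION": "gb", "TRACK_NAME": "Song A", "ARTIST": "X", "DATE": "2017-05-01", "STREAMS": 80},
]


def top_songs(rows, start, end, k=3):
    """Return the k most-streamed (track, artist) pairs per region within [start, end]."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # ISO date strings sort chronologically, so string comparison is enough.
        if start <= row["DATE"] <= end:
            song = (row["TRACK_NAME"], row["ARTIST"])
            totals[row["REGION"]][song] += row["STREAMS"]
    return {
        region: sorted(songs.items(), key=lambda item: item[1], reverse=True)[:k]
        for region, songs in totals.items()
    }


first_half = top_songs(rows, "2017-01-01", "2017-07-31", k=2)
print(first_half)
```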

As you can see, the data set looks simple, but getting insight from it requires many trials: changing the date, artist, and streams, and selecting various countries, which is repetitive and time-consuming. For this reason, an interactive visualization tool was designed for the analysis. In fact, all of the above findings came from the visualization tool introduced in the next section.

Data Visualization Tool

This app is designed not only to visualize the Spotify data set for better understanding but also to analyze it and find insights.

ShinyApp

Spotify_Music_Analyzer

Data Table

A number selector is provided for sorting numerical columns such as Ranking, Streams, and DATE, whereas a string-based filter is provided for categorical columns such as Track_name, Artist, and Region.

Chart

Four types of graph can be plotted with facet_wrap by selecting countries. Start and end dates can be selected from 2017-01-01 to 2018-01-31. Whenever an Artist and a related Track_name are chosen, the corresponding plot is drawn immediately. When no Region is selected, the default countries (the USA, Great Britain, France, Ecuador) are used.

Graph

This section shows two graphs for the corresponding selections. One is a trend curve for the Artist, the other a trend curve for the Track across the selected countries. This graph really helps to understand how long a song stays in the top ranks and when a new trend emerges.

Status

After a cutoff ranking is selected, the number of streams of the songs within that ranking is calculated and compared across the respective countries. This gives insight into how similar countries are in their music streaming, with the precise number of streams shown.

Word Cloud

Originally this was intended to forecast the next trend in music genres by analyzing statistics of the lyrics used in popular music. Currently, the word cloud shows characteristic words in trending song names and the popularity of artists. Once the lyric data set has been cleaned and transformed, the forecasting service will be provided. You can see that 'Drake', 'Chainsmokers', and 'Sheeran' were the most popular artists in the USA from January to March 2017.
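
The counts that feed a word cloud like this can be sketched as a simple word-frequency tally over song titles. The snippet below is a Python illustration only: the stop-word list, the tokenizing rule, and the sample titles are assumptions, and the app's actual preprocessing is not described in this post.

```python
import re
from collections import Counter

# Hypothetical stop words; a real list would be much longer.
STOP_WORDS = {"the", "of", "a", "feat", "and"}


def word_frequencies(titles):
    """Count lowercase words across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


# Hypothetical song titles.
titles = ["Shape of You", "Love Yourself", "Fake Love"]
frequencies = word_frequencies(titles)
print(frequencies.most_common(1))
```

In a word cloud, each word would then be drawn at a size proportional to its count.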

Takeaway

Through this app, several insights have emerged.

  • Whatever the reason may be, Spotify does not have many subscribers in the Middle East, Africa, China, or South Korea.
  • Streaming power (the number of streams per year) is concentrated in a few countries, but not only English-speaking ones.
  • 'Shape of You' was the most streamed song by Spotify users worldwide in 2017, and Ed Sheeran was the most streamed artist.
  • Spotify users did not use the streaming service as much as usual in January and February 2017.
  • Music reflects culture, so it is highly correlated with language.
  • 'Drake', 'Chainsmokers', and 'Sheeran' were the most popular artists in the USA in 2017.
  • Popular music spread across borders without much delay. Why? Is it the ecosystem of online streaming?

Next Place for Visit

The starting point is important. Data science involves several steps, and the very first and most important is data selection and its reliability. The next episode explores why Spotify's data is a representative sample for music streaming trend analysis.

Epilogue

"We've only just begun this journey" by Carpenters.

About Author

Daniel (Donghyun) Kang

Daniel (Donghyun) got a Ph.D. in Electronic Engineering (Wireless Communication Systems) from Sungkyunkwan University, South Korea. Since 2002, he has served as a wireless communication system design engineer for Samsung Electronics, where he has been recognized for...