NYC Data Science Academy| Blog

2016 in songs on Germany's most popular radio station

Stefan Heinz
Posted on Feb 18, 2017

Introduction

This post is about the second of the four projects we are supposed to deliver at the NYC Data Science Academy Data Science Bootcamp program. The requirements were:

For this project, your primary task is to collect data from a web source by method of
scraping. What you do with that data after its collection is up to you (e.g., numeric
description, basic/interactive graphics, machine learning, etc.); however, you still must
lead the audience through an overall insight. While it is required you scrape your data
using Python, the analyses following are language agnostic – but remember that the
primary task is to foster data scraping skills.

After looking for a unique and interesting subject matter, I decided to base this project on the songs that were played throughout the whole year 2016 on Germany's most popular radio station SWR3.

Code and data can be found on GitHub, while the app itself is online at shinyapps.io.

Data Source

SWR3 is part of the regional public broadcasting corporation Südwestrundfunk (SWR, "Southwest Broadcasting"), which serves the southwest of Germany (Source: Wikipedia). On its website, swr3.de, you can explore which songs were played on any given day at any given time.

Time and day can be selected with two dropdown filters at the top of the page, both implemented as select elements. Clicking the submit button reloads the page with two parameters added to the URL in the format: ?hour=10&date=2017-02-18. This makes it especially easy to step through a given date range and get all the songs played in that range.
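A minimal sketch of how the URLs for a date range could be generated from that query string format. The base URL below is a placeholder (the post only documents the parameters), and whether the site expects zero-padded hours is an assumption left open here.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical base URL: the post only documents the query string format.
BASE_URL = "https://www.swr3.de/playlist"


def playlist_urls(start, end):
    """Yield one URL per hour for every day from start to end (inclusive)."""
    day = start
    while day <= end:
        for hour in range(24):
            query = urlencode({"hour": hour, "date": day.isoformat()})
            yield f"{BASE_URL}?{query}"
        day += timedelta(days=1)


# One URL per hour of the leap year 2016.
urls_2016 = list(playlist_urls(date(2016, 1, 1), date(2016, 12, 31)))
```

Generating the URLs up front also makes it easy to split the work into monthly blocks by calling playlist_urls once per month.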

Each hourly page lists the date, time, artist and title of every song played during that hour. To retrieve these elements I used the Python package Beautiful Soup and employed CSS selectors to extract them one by one. The four fields describing one entry were saved as a dictionary, and the entries for one hour as a list of dictionaries. That list was in turn appended to an overall list holding the data for the whole run of the script, usually one month at a time. The data was then written to a CSV file.
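The data structure described above can be sketched as follows. The parsing step (done with Beautiful Soup in the project) is replaced here by a stand-in function fed with hard-coded placeholder rows, and the output goes to an in-memory buffer rather than a file. All names and sample values are hypothetical.

```python
import csv
import io

FIELDS = ["date", "time", "artist", "title"]


def parse_hour(page_rows):
    """Stand-in for the scraping step: one dictionary per playlist entry."""
    return [dict(zip(FIELDS, row)) for row in page_rows]


# Placeholder rows for two hourly pages (not real playlist data).
hours = [
    [("2016-01-01", "00:03", "Artist A", "Title A"),
     ("2016-01-01", "00:07", "Artist B", "Title B")],
    [("2016-01-01", "01:02", "Artist A", "Title A")],
]

# Collect the entries of every hour in one flat list for the whole run.
all_songs = []
for page_rows in hours:
    all_songs.extend(parse_hour(page_rows))


def to_csv(songs):
    """Serialize a list of entry dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(songs)
    return buffer.getvalue()


csv_text = to_csv(all_songs)
```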

Source Data

I ended up making 24 * 366 = 8,784 HTTP requests to the SWR3 playlists page, one for each hour of each day of the leap year 2016. I did this in blocks of months for two reasons: breaking down the overall scraping time into several chunks, and trying not to get banned from the web server. I ended up with 12 CSV files - one for each month - which were combined into one large CSV file consisting of 113,174 rows and 4 variables:

date time artist title

After some date/time arithmetic I ended up with the final CSV file consisting of 113,174 rows and 16 variables:

date time artist title ts day month year wday wdayLabel wk qrtr hr min rushHour season

Most of the variables are straightforward. However, I would like to explain two of them in more detail:

  • rushHour {morning, evening, NULL}: the value in this column determines whether a song was played during one of the rush hours of the day, with them being defined as:
    • morning: 06:00am - 08:59am
    • evening: 04:00pm - 06:59pm
    • only on weekdays (Mon - Fri)
  • season {winter15, spring, summer, fall, winter16}: the value in this column determines in which season of the year a song was played, with them being defined as:
    • winter15: 2015-12-01 - 2016-02-29
    • spring: 2016-03-01 - 2016-05-31
    • summer: 2016-06-01 - 2016-08-31
    • fall: 2016-09-01 - 2016-11-30
    • winter16: 2016-12-01 - 2017-02-28
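The two definitions above translate into a pair of small functions. This is a sketch in Python using only the standard library; the function names are my own and the project's actual implementation may differ.

```python
from datetime import datetime


def rush_hour(ts):
    """Return 'morning', 'evening', or None for a timestamp."""
    if ts.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return None
    if 6 <= ts.hour <= 8:  # 06:00am - 08:59am
        return "morning"
    if 16 <= ts.hour <= 18:  # 04:00pm - 06:59pm
        return "evening"
    return None


def season(ts):
    """Return the season label, e.g. 'winter15' for January 2016."""
    if ts.month in (3, 4, 5):
        return "spring"
    if ts.month in (6, 7, 8):
        return "summer"
    if ts.month in (9, 10, 11):
        return "fall"
    # A winter is named after the year in which it starts (December).
    start_year = ts.year if ts.month == 12 else ts.year - 1
    return f"winter{start_year % 100:02d}"
```

For example, a song played on a Monday at 07:30 falls into the morning rush hour, while the same time on a Saturday yields no rush hour value.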

The data was of very good quality, so I only had to do some light cleaning. Usually, the artist is stated on the website as LastName, FirstName. Sometimes, however, it is listed as LastName,FirstName, without the space. The reason for this is unclear. I used regular expressions to catch each occurrence of the latter form and converted it into the former. When a song had two or more artists, this substitution was applied to each of the artists.
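One possible regular expression for this cleanup is sketched below. The exact pattern used in the project is not shown in the post, so this is an assumption, and the sample names (as well as the "; " separator between artists) are placeholders.

```python
import re

# A comma directly followed by a non-space character marks the short form.
COMMA_WITHOUT_SPACE = re.compile(r",(?=\S)")


def normalize_artist(artist):
    """Turn 'LastName,FirstName' into 'LastName, FirstName'."""
    return COMMA_WITHOUT_SPACE.sub(", ", artist)


cleaned = normalize_artist("Doe,Jane; Roe,Richard")
```

Because the substitution is applied to every match in the string, entries with several artists are handled in one pass, and names that already contain the space are left unchanged.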

Also, some hours might be missing songs, especially when the song count for an hour was much lower than the average of 12.8. This might be due to an error in web scraping but might just as well be due to special programming on the radio station.

Data Aggregates

Up to this point I had been working in Python, both for scraping the data and for the date/time arithmetic. For creating the data aggregates for the final visualization I switched to R.

Based on the CSV file mentioned above, the following aggregates were created:

  • distinctSongs: Grouping of playlist entries by artist and title to count the overall occurrence, or playcount, as well as the first and last day of appearance on the program for each song in the year 2016.
  • songsPerArtist: Grouping of distinct songs per artist to count the number of distinct songs that were played by an artist.
  • songsPerDate: Grouping of songs per date to count the number of songs played on each day of the year.
  • songsPerHour: Grouping of songs per hour to count the number of songs played per hour of day.
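The project computes these aggregates in R. For illustration only, the same groupings can be sketched in Python with the standard library; the rows below are invented placeholders and the variable names merely mirror the aggregate names above.

```python
from collections import Counter

# Placeholder playlist entries (not real data); dates are ISO strings.
rows = [
    {"date": "2016-01-01", "hr": 0, "artist": "Artist A", "title": "Title A"},
    {"date": "2016-01-01", "hr": 1, "artist": "Artist A", "title": "Title A"},
    {"date": "2016-01-02", "hr": 0, "artist": "Artist A", "title": "Title B"},
    {"date": "2016-01-02", "hr": 1, "artist": "Artist B", "title": "Title C"},
]

# distinctSongs: playcount plus first and last day per (artist, title).
distinct_songs = {}
for row in rows:
    key = (row["artist"], row["title"])
    entry = distinct_songs.setdefault(
        key, {"playcount": 0, "first": row["date"], "last": row["date"]}
    )
    entry["playcount"] += 1
    # ISO dates sort correctly as plain strings.
    entry["first"] = min(entry["first"], row["date"])
    entry["last"] = max(entry["last"], row["date"])

# songsPerArtist: number of distinct songs per artist.
songs_per_artist = Counter(artist for artist, _title in distinct_songs)

# songsPerDate and songsPerHour: number of entries per day and per hour of day.
songs_per_date = Counter(row["date"] for row in rows)
songs_per_hour = Counter(row["hr"] for row in rows)
```

Applying a filter first and recomputing these groupings afterwards is what "calculated on the fly" amounts to.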

All of these aggregates are calculated on the fly because the user is able to filter the underlying data source using up to 8 of these filters:

Figure: Filters

Visualization in Shiny

For this project I again turned to Shiny, which is a "web application framework for R [to] turn [...] analyses into interactive web applications", together with the Shiny Dashboard package. This makes the data and results more approachable and interactive than just having it all in a rigid report such as a PDF or PPTX file.

Start

The start page of the SWR3 Song Explorer features key insights into the data such as the date range of the data loaded, how many songs were played in that range, which song and artist were played the most, etc.

Figure: Start page

It is supposed to give a quick overview of the data at hand.

Table

The table could be seen as the centerpiece of this application. It is sorted in descending order by play count, meaning the most played song is on top. The sorting can be changed by the user.

Figure: Table

Next to the artist name and title of the song it also shows the first and last time the song was played on this radio station and - most importantly - how often.

This table, as well as all the other charts under Songs which are described below, can be filtered by up to 8 filters hidden in the collapsible box right above the table/chart.

Calendar

The calendar shows how many songs were played on any given day of the year. It does not yield much information when no filters are applied, since roughly the same number of songs is played every day. It becomes more interesting when filtered for specific artists or titles.

Figure: Calendar filtered for the most played artist, "Coldplay"

Clock

This is a rose diagram which should be read as a 24h clock. For each hour in a 24-hour cycle it shows how many songs were played. When not filtered by day, it shows the accumulated song count played during this hour of every day.

Figure: Clock filtered for the most played artist, "Coldplay"

The same applies to this chart as to the calendar: it becomes more interesting when filtered for specific artists or titles.

Histogram

The histogram gives an interesting insight into the distribution of play counts on this radio station.

Figure: Histogram

While the vast majority of songs are played between 1 and 15 times over the course of the year 2016, there are some outliers which are played much more often, up to the most played song "X Ambassadors - Renegades", being played 35 times more often than the mean (21.6) or even 318 times more often than the median (2).
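How such a comparison of the top song against the mean and median is computed can be sketched as follows. The play counts are invented to mimic a skewed distribution and are not taken from the SWR3 data.

```python
from statistics import mean, median

# Invented play counts: many rarely played songs and one heavy outlier.
playcounts = [1, 1, 2, 2, 3, 15, 60]

avg = mean(playcounts)
mid = median(playcounts)
top = max(playcounts)

# How many times larger the top play count is than each measure of center.
ratio_to_mean = top / avg
ratio_to_median = top / mid
```

In a distribution this skewed the median sits far below the mean, which is why the ratio to the median is so much larger than the ratio to the mean.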

Songs per Artist

While it is interesting to learn about song statistics, taking a look at the artists might not be a bad idea.

Figure: Songs per artist

This chart shows in decreasing order the most popular artists in terms of distinct songs played. The top artist, Bon Jovi, had 41 of their songs played, with the mean and median only being 2.3 and 1, respectively.

Song Title Word Cloud

The word cloud, very fashionable in infographics these days, shows the most used words in song titles played on SWR3. Each song is only counted once in order to not take popularity into account.

Figure: Word cloud

The word cloud has two more special filters: the number of words shown can be changed, and words to be filtered out can be specified by the user. The cloud is only updated when the apply button is pressed. Because the mascot of SWR3 is a moose, this word cloud is generated in a moose shape when rendered for the first time. The shape can be changed to a circle.

Conclusion

There were some interesting findings in the data. First of all, I was amazed by the sheer number of songs played over the course of one year: 113,174. On second thought it makes sense, though: ~13 songs per hour, 24 hours a day, over 366 days gives a close estimate.

Also very surprising was the distribution of songs. While I had a feeling that some songs were played more than others - which was actually the main reason I analyzed the data in the first place - I was surprised that there indeed seems to be a kind of "hot rotation" going on: only 226 songs are played more than 100 times (509 with a threshold of 50), while the rest of the songs - 5,021 - are played fewer than 100 times (4,738 with a threshold of 50), with 1,985 of these songs being played exactly once.
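The split into "hot rotation" and the rest is a simple threshold count over the play counts. The sketch below uses invented numbers. It counts songs played exactly as often as the threshold towards the rest, which is an assumption, as the post does not say how that boundary case was handled.

```python
def rotation_split(playcounts, threshold):
    """Return (songs played more than threshold times, all other songs)."""
    above = sum(1 for count in playcounts if count > threshold)
    return above, len(playcounts) - above


# Invented play counts, one value per distinct song.
playcounts = [1, 1, 2, 50, 51, 100, 101, 300]

split_100 = rotation_split(playcounts, 100)
split_50 = rotation_split(playcounts, 50)
played_once = sum(1 for count in playcounts if count == 1)
```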

The same goes for the artists: only 68 artists have 10 or more of their songs played, while the rest - 2,188 - have fewer than 10 of their songs played, with 1,406 artists having exactly one of their songs played.

As stated above, code and data can be found on GitHub, while the app itself is online at shinyapps.io.

About Author

Stefan Heinz

Stefan received his Bachelor's degree in Logistics from Heilbronn University in Germany, including a one year stopover in Hong Kong. He then went on to graduate cum laude from Maastricht University's School of Business and Economics in the...

