Analyzing the Evolution of Rap Music from 1989 to 2016

Efezino Erome-Utunedi
Posted on May 27, 2017

Introduction

The current state of rap music is widely discussed in hip hop and rap communities. Many people, myself included, believe that rap music has been slowly deteriorating, especially since 2010, because today’s rap artists rely on the beat rather than on good lyrics. In my view, the lyrics favored today have little or no wordplay, and the vocabulary has been severely dumbed down. I wanted to put that impression to the test with data analysis.

One of the perks of being a data scientist is being able to identify metrics to quantify a question like this. Some of the questions I wanted to answer were:

  • What are some of the main topics per year for rap music?
  • What were the most used words per year? Do they provide further insight into topics?
  • What was the change in the usage of derogatory words?
  • What is the measure of the vocabulary level of lyrics each year?

The assumption for this project was:

  • Rap music is limited to the top rap songs listed on the Billboard website

Data (Web Scraping)

In order to get a general trend of the lyrics of rap music, I decided to scrape the Billboard website for top rap songs from 1989 to 2016. Although this is by no means a comprehensive list of the rap songs from the selected time period, it is a good place to start.

I used the Python library BeautifulSoup to scrape the Billboard site and stored the results in two dataframes: one for the artists per year and another for the song titles per year.

Once I had the artist name and song title for each top rap song, I used the unofficial API tswift to fetch the lyrics. To make the lookups succeed, many artist names and song titles had to be reformatted: non-alphanumeric characters had to be removed and spaces had to be replaced with “-”. Below are tables displaying the artist names and song titles from 2011 to 2016 after they were adjusted to work with the tswift API.

The lyrics were then put in a pandas dataframe where they could later be manipulated or used for analysis. Note that the tswift API was unable to find about 10% of the songs. In future work on this project, I will create my own API to obtain song lyrics from a website with a bigger database of songs, or from several websites.
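The reformatting of artist names and song titles described above can be sketched in a few lines. This is only an illustration of the two stated rules (drop non-alphanumeric characters, replace spaces with “-”); the function name `to_slug` is hypothetical, and any further normalization the tswift API may need, such as lowercasing, is not covered here.

```python
import re


def to_slug(text):
    """Drop non-alphanumeric characters and join the remaining words with '-'."""
    # Keep only ASCII letters, digits, and whitespace; everything else is removed.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # split() discards leading/trailing whitespace and collapses repeated spaces.
    return "-".join(cleaned.split())


print(to_slug("Can't Hold Us"))   # Cant-Hold-Us
print(to_slug("Kendrick Lamar"))  # Kendrick-Lamar
```

Removing the characters before splitting means that a title with punctuation between spaces does not end up with doubled hyphens.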

Data Analysis (NLP)

LDA

The first Natural Language Processing (NLP) technique I wanted to try was latent Dirichlet allocation (LDA). This is an unsupervised method that iteratively estimates the probability that each word in a set of lyrics (a document) belongs to a particular topic, as well as the mixture of topics that each document touches on. As the number of topics is a very important tuning parameter, I tried several values (2, 3, 4, and 5) to see whether the model could break each year's lyrics down into distinct topics. I also tuned two other parameters of the LDA model, known as alpha and beta. Both values are usually between 0 and 1. A higher alpha means each document is likely to contain a mixture of most topics, and vice versa. A higher beta means each topic is likely to contain a mixture of most of the words across all the documents, and vice versa.

Unfortunately, I was unable to adequately identify different topics in the lyrics per year. What the model did tell me was that the usage of derogatory words spiked after 1996. Some of the topics per year definitely centered on love but, apart from that, the LDA model provided little insight into the topics. Below is an example of the words belonging to the top three and top four topics in 2015 and 2016.

 

What I hope to do in the future is pass in a list of all the lyrics and see whether the model can identify different topics across all the lyrics rather than per year.
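To make the role of alpha in the LDA section more concrete, here is a small simulation that uses only the Python standard library. It is not the LDA model used in this project. It only draws per-document topic mixtures from a symmetric Dirichlet prior and reports how much of a document the dominant topic takes up on average. The function names, the choice of four topics, and the two alpha values are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) over k topics."""
    # A Dirichlet draw is k independent Gamma(alpha, 1) draws, normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(alpha, k=4, n_docs=2000, seed=0):
    """Average weight of the dominant topic across simulated documents."""
    rng = random.Random(seed)
    shares = (max(sample_dirichlet(alpha, k, rng)) for _ in range(n_docs))
    return sum(shares) / n_docs


low = mean_largest_share(alpha=0.1)
high = mean_largest_share(alpha=10.0)
print(f"alpha=0.1  -> dominant topic share: {low:.2f}")
print(f"alpha=10.0 -> dominant topic share: {high:.2f}")
```

With a small alpha, one topic tends to dominate each simulated document, while a large alpha spreads the weight more evenly across topics. Beta plays the same role for the distribution of words within each topic.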

Top 25 Word Count Per Year

Since I was unable to identify different topics from the lyrics, I decided to look at the top 25 words per year. I did this using the Natural Language Toolkit (NLTK) library in Python. I tokenized the lyrics, found the stem of each word, and passed the stems into an NLTK function that identified the top 25 words per year. Rather than display bar plots of the top 25 words for every year from 1989 to 2016, I show 1996, 2006, and 2016 as examples.

As can be seen from the bar plots above, words like “what”, “that”, “said”, and “they” can be ignored since they provide no insight into the topics. Once they are taken out, it is obvious that derogatory words become more prevalent in the top 25 words per year.
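The counting and filtering steps above can be approximated with the standard library alone. The sketch below is a stand-in, not the NLTK pipeline used for the plots: it uses a simple regular-expression tokenizer, skips stemming entirely, and relies on a short illustrative stop list. The names `top_words` and `STOP_WORDS` are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop list only; it includes the filler words named above.
STOP_WORDS = {"what", "that", "said", "they", "the", "a", "and", "i", "you", "to", "it"}


def top_words(lyrics, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent words in a lyrics string, skipping stop words."""
    # Simple tokenizer: lowercase the text and keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


sample = "They said that money talks, money walks, and the money stays"
print(top_words(sample, n=1))  # [('money', 3)]
```

Filtering before counting keeps the stop words from crowding the more informative words out of the top 25.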

Trend of Derogatory Words and the Like

Again, many of the words obtained did not help in differentiating the lyrics from 1989 to 2016, but I did notice the derogatory words in the top 25 words per year. From this, I created a list of derogatory words and plotted the number of times they were used in the lyrics each year.

Apart from the spikes, the general trend in the plot shows an increase in the usage of derogatory words, especially from the early 2000s to 2016.
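A per-year count like the one behind this plot could be computed roughly as follows. This is a sketch under assumptions: the lyrics are taken to be available as a dictionary mapping each year to a list of lyric strings, the function name `yearly_term_counts` is hypothetical, and the placeholder words “foo” and “bar” stand in for the actual word list, which is not reproduced here.

```python
import re
from collections import Counter


def yearly_term_counts(lyrics_by_year, terms):
    """Map each year to the total number of times any listed term appears."""
    terms = {t.lower() for t in terms}
    totals = {}
    for year, songs in sorted(lyrics_by_year.items()):
        counts = Counter()
        for lyrics in songs:
            # Same simple tokenizer idea: lowercase runs of letters and apostrophes.
            counts.update(re.findall(r"[a-z']+", lyrics.lower()))
        totals[year] = sum(counts[t] for t in terms)
    return totals


# Made-up placeholder data; "foo" and "bar" stand in for the real word list.
lyrics_by_year = {
    1996: ["foo in the morning", "nothing to see here"],
    2016: ["foo bar foo", "bar all day"],
}
print(yearly_term_counts(lyrics_by_year, ["foo", "bar"]))  # {1996: 1, 2016: 4}
```

The same function could be reused with a different word list, for example to track a small set of theme words across the years. Raw totals depend on how many songs were retrieved for each year, so dividing by the number of songs or total words per year may give a fairer comparison.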

Another thing I wanted to look at was the number of times words like “money”, “power”, and “sex” were used.

As can be seen from the figure above, apart from the spike in 1998, the trend is relatively flat, maybe even decreasing. This suggests that rappers have consistently talked about money, power, and sex, while the usage of derogatory words is definitely rising.

One point to note is that the slang used today is completely different from the slang used 10 or even five years ago. So although the usage of words like “money”, “power”, and “sex” has decreased, different slang could have been used to refer to the same things. What I hope to do in the future is identify all words similar to “money”, “power”, and “sex” and check whether the trend displayed above changes substantially.

Conclusion

Although this is by no means a comprehensive list of the rap songs from 1989 to 2016, the lyrics for the songs listed on the Billboard website were a good place to start. Given more time, more tuning of the LDA model will be necessary to determine whether there are topics to be extracted from the lyrics. Although I was able to show that the usage of derogatory words has on average been increasing, I was unable to find evidence of a corresponding decrease in lyrical wordplay or in the overall quality of storytelling in the lyrics. What I hope to do in the future is create an API capable of accessing a wider database of rap music lyrics, write code to measure the vocabulary level of the lyrics, and identify measures for quantifying lyrical superiority.

 

References

  • Wikipedia reference of latent dirichlet allocation (lda)
    • https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
  • Lda reference
    • https://www.youtube.com/watch?v=BuMu-bdoVrU&t=1039s

 

About Author

Efezino Erome-Utunedi

Efezino recently completed his MENG in Mechatronics Design at the University of British Columbia, focusing on controls engineering. He now works full-time at an engineering consulting firm while enrolled in the NYCDSA's 2017 January to May online cohort,...