Data Analysis: References to U.S. Presidents in Rap Music, 2009-2017

Posted on Apr 22, 2019




Since its birth in the early 1970s, hip hop culture has been closely tied to politics. Rap music became a voice for African Americans as they lost social standing amid the gentrification of New York. Many rappers have referred to presidents in their lyrics to raise awareness of the political and economic situation of the black community.

After the change of presidency two years ago, it seems clear that the hip hop community loved Obama and has been openly critical of Donald Trump's political actions. However, I had not seen an analysis that directly relates references to the president in rap lyrics to the actual living conditions of African Americans. I therefore decided to scrape lyrics containing such references and analyze the relationship between the two.

Scraping & Data Collection

To collect references to Obama and Trump in rap lyrics, I first scraped with Selenium. The scraper searched the website for the name of each president, collected the hyperlinks to all the lyric results, and recorded the song name, artist name, album name, and release year. However, scraping the lyrics themselves from that website was nearly impossible, because the lyrics were split across multiple hyperlinks rather than laid out as a paragraph of text. I therefore used Scrapy to get the lyrics of the songs that referred to the U.S. presidents.
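One way to handle lyrics that are split across many hyperlinks is to collect the text of each link and join the pieces back together. The sketch below is not the original scraper: it uses Python's standard-library HTML parser as a stand-in for a Scrapy spider, and the class name lyric-fragment is a hypothetical placeholder, since the real page markup is not shown in this post.

```python
from html.parser import HTMLParser


class LyricFragmentParser(HTMLParser):
    """Collect the text of every <a> tag that carries a given CSS class.

    The class name 'lyric-fragment' is an assumption made for illustration.
    """

    def __init__(self, fragment_class="lyric-fragment"):
        super().__init__()
        self.fragment_class = fragment_class
        self.in_fragment = False
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # The class attribute may be missing or empty.
            classes = (dict(attrs).get("class") or "").split()
            self.in_fragment = self.fragment_class in classes

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_fragment = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a lyric link.
        if self.in_fragment and data.strip():
            self.pieces.append(data.strip())


def extract_lyrics(html):
    """Join the text of all lyric links into one string."""
    parser = LyricFragmentParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.pieces)
```

Links without the assumed class, such as navigation links, are ignored, so only the lyric fragments end up in the joined text.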

To analyze the scraped lyrics, I manually gathered yearly data on the approval rate of the two presidents, the rate of police killings, the poverty rate, and the mean income of African Americans. These figures came from different government organizations and media outlets, so they are not perfectly consistent with one another. Some series are missing values for particular years, and no analysis of those series was possible for those years.
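Because some series are missing values for some years, any comparison between two series has to be restricted to the years where both have a value. A minimal sketch of that alignment, assuming each series is stored as a dictionary from year to value with None marking a missing year (the storage format is an assumption, not something stated in this post):

```python
def align_by_year(series_a, series_b):
    """Return (years, a_values, b_values) for the years present in both series.

    A year is skipped when it is absent from either dictionary or when
    either value is None.
    """
    years = sorted(
        year
        for year in series_a.keys() & series_b.keys()
        if series_a[year] is not None and series_b[year] is not None
    )
    a_values = [series_a[year] for year in years]
    b_values = [series_b[year] for year in years]
    return years, a_values, b_values
```

Calling align_by_year on two such dictionaries returns only the overlapping years, so any later calculation works on equally long lists.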

Data Found

Although most of the data could be found in government-issued reports, the figures for police killings of African Americans were not completely reliable. The government did not collect such data in a reliable, structured way until 2015, when the FBI announced that it would ask police departments to report police killings of black people. (That announcement most likely followed multiple incidents of police killing black people, including the shooting of Michael Brown in Ferguson, Missouri and the death of Eric Garner in Staten Island.)

In addition, different media companies reported different numbers. Consequently, I took the rates from multiple media platforms, including Vox, the New York Times, the Washington Post, and the Guardian.

Natural Language Processing (LSTM)

To analyze the scraped lyrics, sentiment analysis was a crucial step. For this, I used TensorFlow and a previously developed Long Short-Term Memory (LSTM) model. The model converts each word of a sentence into a vector, producing a sequence of n vectors (n being the number of words in the sentence). The recurrent neural network (RNN) reads this sequence in order and builds a single vector that captures the relationships among the words in the sentence, which is what enables it to interpret the sentiment of the sentence.
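To make the idea of folding a sequence of word vectors into one vector concrete, here is a minimal sketch of a plain recurrent update in pure Python. It is deliberately simpler than an LSTM (it has no gates or cell state) and it is not the model used in this project. The embeddings and weights are toy values chosen only for illustration.

```python
import math

# Toy 2-dimensional word vectors and weights: illustrative assumptions only.
TOY_EMBEDDINGS = {"love": [1.0, 0.0], "hate": [0.0, 1.0], "obama": [0.5, 0.5]}
W_IN = [[0.5, -0.5], [0.3, 0.8]]    # input-to-hidden weights
W_REC = [[0.1, 0.2], [-0.3, 0.4]]   # hidden-to-hidden weights


def rnn_encode(word_vectors, w_in, w_rec):
    """Fold a sequence of word vectors into one hidden-state vector."""
    hidden = [0.0] * len(w_rec)
    for vec in word_vectors:
        new_hidden = []
        for j in range(len(hidden)):
            total = sum(w_in[j][k] * vec[k] for k in range(len(vec)))
            total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        # The new state depends on the current word and the previous state.
        hidden = new_hidden
    return hidden


def encode_sentence(words):
    """Look up each word's toy vector and encode the sequence."""
    return rnn_encode([TOY_EMBEDDINGS[w] for w in words], W_IN, W_REC)
```

Because each step mixes the current word with the state left by the previous words, the same words in a different order give a different final vector. A classifier layer on top of that vector would then produce the sentiment label.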

Shortcomings in NLP

There were two major shortcomings in the sentiment analysis process. The first was that lyrics do not take the form of sentences. Since there is no punctuation, such as periods, to separate sentences, I had to find another way to determine whether each reference to Obama or Trump was positive or not. I took the 50 characters before and the 50 characters after each president's name to form a group of words that roughly resembles a sentence. This made the interpretation of the references vaguer than I wanted it to be.
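The 50-character windowing can be sketched as follows. This is a reconstruction from the description above rather than the original code; the function name and the whitespace normalization step are assumptions.

```python
import re


def context_windows(lyrics, name, width=50):
    """Return the text around each mention of `name`.

    Each window holds up to `width` characters before the name, the name
    itself, and up to `width` characters after it.
    """
    # Collapse line breaks and repeated spaces so that windows are
    # measured in visible characters (an added assumption).
    text = " ".join(lyrics.split())
    windows = []
    for match in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        windows.append(text[start:end])
    return windows
```

Each returned window can then be passed to the sentiment model as if it were a sentence. A window may cut words in half at its edges, which is one source of the vagueness mentioned above.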

The second shortcoming was that the NLP model could not read between the lines as humans do. As the example above shows, if a sentence contained a number of negative words, the reference was interpreted as negative even if the negativity was not directed at the president.


First, I should clarify a possible misunderstanding caused by the visualization of the presidential approval rate. From 2009 to 2016, the values are Obama's approval rate; the values for 2017 and 2018 are Trump's. For the sake of visualization, I plotted them in one graph.

As the graph shows, negative references to President Obama rise sharply in 2013. This can be understood in the context of the mean income and poverty rate of black Americans. Obama's approval rate was highest in 2009 and kept falling until 2014, except for a bump in 2012, when he was re-elected. Over the same period, the poverty rate of African Americans rose steadily and their mean income fell steadily.
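The yearly trend in negative references can be computed by tallying the sentiment labels per release year. A minimal sketch, assuming each reference has been reduced to a (year, label) pair with the label "negative" for negative mentions (this data layout is an assumption):

```python
from collections import Counter


def negative_share_by_year(references):
    """Map each year to the fraction of its references labeled negative."""
    totals = Counter()
    negatives = Counter()
    for year, label in references:
        totals[year] += 1
        if label == "negative":
            negatives[year] += 1
    return {year: negatives[year] / totals[year] for year in sorted(totals)}
```

Using a share rather than a raw count keeps years with many released songs from dominating the comparison.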

From this, it can be inferred that the economic situation of black Americans is correlated not only with the president's approval rate but also with the sentiment of references to him in rap songs. In addition, negative mentions of the president fell considerably when Obamacare came into force in 2014, but rose again through 2016, as it did not have a major impact on either the poverty rate or the mean income of African Americans.
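The relationships described here are read off the graphs. One way to put a number on them would be the Pearson correlation coefficient between two yearly series. The sketch below is an illustration of that calculation, not something computed in this post, and with fewer than ten yearly points any coefficient should be read with caution.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two series rise and fall together, a value near -1 means one rises as the other falls, and a value near 0 means there is no linear relationship.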

Although mean income and the poverty rate do seem to be related to the approval rate, they do not explain the sudden drop in the approval rate in 2014. For that, we can look at the major incidents of injustice that year. In 2014, police killed multiple black people, including Michael Brown in Ferguson, Missouri and Eric Garner in Staten Island. Those two incidents drew national attention, and Obama was criticized for not making an appearance in Ferguson while the protests were at their height.

However, both Obama's approval rate and the positive mentions of him in rap music rise in 2015, when the FBI announced that it would ask police departments for a better system of reporting police injustice against black people. It is interesting that the announcement was made in April, because the 2015 Baltimore protests, sparked by the death of Freddie Gray as a result of police brutality, also took place in April 2015.

As the rate of police killings of black people went down steadily from then on, it can be inferred that the change in the government's policies on police brutality against black people is correlated with the approval rate and with positive references to Obama.

Donald Trump is an interesting reference to analyze within rap music. As many hip hop fans may know, he was used as an icon of capitalist success as bragging about one's wealth became a trend in hip hop. For example, Mac Miller released a song named Donald Trump in 2011 in which he describes his rise from a local rapper to a wealthy rap star and compares himself to Donald Trump.

This trend, however, changed drastically after Donald Trump announced his run for the presidency in 2015. Since then, negative references to Trump have been rising.

Conclusion & Future Work

Through the analysis of lyrics scraped from two websites, we found that Obama's policies and actions were correlated not only with several dimensions of black Americans' lives but also with references to him in rap music. We could also infer a relationship between references to Donald Trump and his presidency.

Although this was a very interesting topic to collect data on and analyze, the relationship between Trump's actions and the lives of African Americans was not direct enough for me to connect the dots. In the future, I would like to study Trump's policies and actions in more depth and correlate them with references to him in rap music and with the living conditions of black people.

About Author

Joon Soo (Rudin) Ro

Rudin is a passionate data scientist with a problem-solving ability aided by strong communication skills and goal-oriented leadership. He has hands-on experience in R and Python in web-scraping, data visualization, machine learning, as well as algorithm optimization. He...
