Examining Billboard Hot 100 Lyrics from 1987 - 2016

Scott Edenbaum
Posted on February 20, 2017

Since August 4, 1958, Billboard has published the "Hot 100" singles chart every week.

Introduction - Billboard magazine cemented its status as an integral part of American popular culture with the creation of the Billboard Hot 100 chart. Since 1958, the Hot 100 has been accepted as the 'gold standard' benchmark for popular music rankings.

The rankings are based on a formula, not on the subjective musical preferences of the individuals who compile the list. Airplay on roughly one thousand terrestrial radio stations is tracked to form the foundation of the ranking data. Nielsen provides song sales data for both digital and physical formats, which is factored into the rankings. Most recently, Billboard added music streaming data to the Hot 100 chart rankings.

Scope - For this project, I wanted to analyze the lyrics of popular songs from the past 30 years. To have a consistent input source and to keep my own musical preferences from biasing the results, I chose to work with the Billboard Hot 100 charts. Billboard has released weekly Hot 100 charts going back to the 1950s. Click here to find This Week's Hot 100 Chart.

My goal was to analyze the lyrics by year and find trends in the most frequently used words.

Data - All data used in this project was scraped.

The Billboard Hot 100 chart data was scraped from The Ultimate Music Database using a combination of BeautifulSoup and regular expressions.
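As a rough illustration of combining BeautifulSoup with a regular expression, here is a minimal sketch. The HTML layout, the `SAMPLE_HTML` snippet, the `ROW_PATTERN` regex, and the `parse_chart` function are all hypothetical; the real markup of The Ultimate Music Database is not shown in this post and will differ.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one table row per chart entry, "rank." then "Title - Artist".
SAMPLE_HTML = """
<table>
  <tr><td>1.</td><td>Song A - Artist A</td></tr>
  <tr><td>2.</td><td>Song B - Artist B</td></tr>
</table>
"""

# Split "Title - Artist" at the first " - " separator.
ROW_PATTERN = re.compile(r"^(?P<title>.+?)\s+-\s+(?P<artist>.+)$")

def parse_chart(html):
    """Return a list of (rank, title, artist) tuples from the sample layout."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 2:
            continue  # skip rows that do not look like chart entries
        match = ROW_PATTERN.match(cells[1])
        if match:
            rank = int(cells[0].rstrip("."))
            entries.append((rank, match.group("title"), match.group("artist")))
    return entries

print(parse_chart(SAMPLE_HTML))
```

BeautifulSoup handles the document structure, while the regex only has to split one short text field, which keeps the pattern simple.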

The tswift unofficial API for MetroLyrics was used to acquire the lyrics corresponding to each song entry in the Billboard Hot 100 charts. This API allows quick access to the lyrical content hosted by MetroLyrics, with one major caveat: the song title and artist must be meticulously adjusted (removing non-alphanumeric characters, replacing spaces with '-', and correctly identifying the title and artist), otherwise it won't return the correct lyrics.
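The cleanup described above can be sketched roughly as follows. The function name `to_slug` is hypothetical, and the sketch keeps the original letter case because the post does not say whether names were lowercased. It shows only the two stated steps (dropping non-alphanumeric characters and joining words with '-'), not the project's actual code.

```python
import re

def to_slug(text):
    """Drop non-alphanumeric characters and join the remaining words with '-'."""
    # Keep ASCII letters, digits, and whitespace; remove everything else.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # split() collapses runs of whitespace, so no empty segments are produced.
    return "-".join(cleaned.split())

print(to_slug("Don't Stop Believin'"))
print(to_slug("  What's Love Got to Do with It? "))
```

Correctly separating the title from the artist (for example, handling featured artists) is a separate step that this sketch does not attempt.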

Click here for lyric scraping code





Top 25 Words per Year, 1988-2016

[Figures: one bar chart per year, 1988 through 2016, showing that year's 25 most common lyric words.]

Word Clouds by Year, 1987-2016

[Figures: one word cloud per year, 1987 through 2016, generated from that year's lyrics.]


###### Generate word cloud by year

from __future__ import print_function  # print() behaves the same on Python 2 and 3

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# words_by_year and remove_nonalphanum are helpers defined elsewhere in the
# project; all_stopwords is the combined stopword set.

def get_wordcloud_year(year):
    wordbag = words_by_year(year)

    words = remove_nonalphanum(wordbag)
    print(0, len(words))  # length of the cleaned text before splitting

    words = words.split()
    # Remove single-character & 2-character tokens (mostly punctuation)
    words = [word for word in words if len(word) > 2]
    print(1, len(words))

    # Remove numbers
    words = [word for word in words if not word.isdigit()]
    print(2, len(words))

    # Lowercase all words (default_stopwords are lowercase too)
    words = [word.lower() for word in words]
    print(3, len(words))

    # Remove stopwords
    words = [word for word in words if word not in all_stopwords]
    print(4, len(words))

    wordcloud = WordCloud(
        width=1000,
        height=750,
        font_path='/Library/Fonts/Verdana.ttf',  # macOS font location; adjust on other systems
        relative_scaling=1.0,
        stopwords=all_stopwords,
    ).generate(' '.join(words))

    # Display the generated image with matplotlib
    plt.figure(figsize=(20, 12))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()


 

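### Generate bar chart of the top 25 words

import codecs
from collections import Counter

import nltk
import pandas as pd

# default_stopwords is defined elsewhere in the project (not shown here).
stopwords_file = './stopwords.txt'
with codecs.open(stopwords_file, 'r', 'utf-8') as f:
    custom_stopwords = set(f.read().splitlines())

all_stopwords = default_stopwords | custom_stopwords

def get_wordfreq_df(year):
    wordbag = words_by_year(year)
    # The original code always called .decode('utf-8') (Python 2 byte string);
    # decode only when needed so the function also works with unicode text.
    if isinstance(wordbag, bytes):
        wordbag = wordbag.decode('utf-8')

    words = nltk.word_tokenize(wordbag)
    words = [word for word in words if len(word) > 2]        # drop very short tokens
    words = [word for word in words if not word.isdigit()]   # drop numbers
    words = [word.lower() for word in words]
    words = [word for word in words if word not in all_stopwords]

    fdist = nltk.FreqDist(words)

    # Full word/count table for the year
    word_df = pd.DataFrame.from_dict(Counter(fdist), orient='index').reset_index()
    word_df = word_df.rename(columns={'index': 'Word', 0: 'count'})

    # Plot the 25 most common words, most frequent at the top
    df = pd.DataFrame(fdist.most_common(25), columns=['Words', 'Count'])
    df.sort_index(ascending=False).plot(
        kind='barh',
        x='Words',
        title="Most Common Lyrics in: " + str(year),
    )
    return word_df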
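For readers who want to try the filtering steps without NLTK or pandas, here is a dependency-free sketch of the same idea. The function name `top_words`, the whitespace tokenizer, and the tiny sample text and stopword set are illustrative assumptions, not part of the project's code.

```python
from collections import Counter

def top_words(lyrics, stopwords, n=25):
    """Return the n most common words after the same filters used above."""
    # Whitespace split is a simplification; the project uses nltk.word_tokenize.
    words = [w.lower() for w in lyrics.split()]
    words = [
        w for w in words
        if len(w) > 2             # drop 1- and 2-character tokens
        and not w.isdigit()       # drop numbers
        and w not in stopwords    # drop stopwords (assumed lowercase)
    ]
    return Counter(words).most_common(n)

sample = "Love me do you know I love you 1987 Love"
print(top_words(sample, {"you"}, n=2))
```

Lowercasing before the stopword check matters: a lowercase stopword list would otherwise miss capitalized tokens such as "You".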

 

Conclusion - A lot has changed in popular music over the past 30 years, but one theme has stood the test of time: love. Although its lead seems to be fading in recent years (which may be an artifact of my data collection), "love" is consistently one of the most frequently used words in popular music.

The most frequent words in the Billboard Hot 100 lyrics since 1987 are:

I'm

Love

Don't

Like

Know

Oh

Just

Got

Baby

Yeah

Want

You're

Cause

Make

Time

Let

Girl

Say

Way

Come

I'll

Ain't

Right

Gonna

Need


