2016 Presidential Candidates Twitter Analysis
Contributed by Sricharan Maddineni. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from January 11th to April 1st, 2016. This post is based on his third class project, Web scraping (due in the 6th week of the program).
The 2016 presidential race is by far the most interesting race I've ever followed. As Twitter becomes more important, it has become a must for presidential candidates to use it to get their message out, and their tweets reflect their personality, interests, and strategy. Analyzing their social presence leads to some interesting insights.
Using Twitter's API, I've scraped the last 3,200 tweets from each of the top 6 presidential candidates and explored differences in engagement and sentiment. If you're unfamiliar with Twitter, you can use the widgets below to view their Twitter accounts (left), and see the live firehose of tweets from all candidates (right).
I am only interested in the most retweeted tweet for each candidate, as it shows their highest engagement on that day. I also figured it would be interesting to see the actual tweet content on the selected date. In the example above, I've selected December 25th to gain some insights on what the candidates were tweeting on Christmas day. Interestingly, all the Republican candidates tweeted some lengthy messages, and Jeb Bush even mentioned the troops, but Hillary and Bernie simply tweeted "Merry Christmas!" and a picture of their families. In the top right corner, we can also obtain the exact retweet count for the candidates being plotted. Below the graph, we can see the total retweet count for all the candidates (across all their tweets, not just their most retweeted) and the total number of tweets by the candidates on the selected day.
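The per-day selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the plot: the row layout (`candidate`, `created_at`, `retweet_count`, `text`), the function names, and the sample retweet counts are all assumptions made up for the example.

```python
from datetime import date, datetime

# Made-up sample rows for illustration only; real rows would come from
# the per-candidate tweet files. The retweet counts are NOT real data.
SAMPLE_TWEETS = [
    {"candidate": "realDonaldTrump", "created_at": datetime(2015, 12, 25, 9, 0),
     "retweet_count": 300, "text": "tweet A"},
    {"candidate": "realDonaldTrump", "created_at": datetime(2015, 12, 25, 18, 0),
     "retweet_count": 500, "text": "tweet B"},
    {"candidate": "BernieSanders", "created_at": datetime(2015, 12, 25, 12, 0),
     "retweet_count": 400, "text": "Merry Christmas!"},
    {"candidate": "BernieSanders", "created_at": datetime(2015, 12, 26, 12, 0),
     "retweet_count": 900, "text": "tweet C"},
]


def top_tweet_per_candidate(tweets, day):
    """Return {candidate: most retweeted tweet posted on `day`}."""
    best = {}
    for tweet in tweets:
        if tweet["created_at"].date() != day:
            continue  # only keep tweets from the selected day
        current = best.get(tweet["candidate"])
        if current is None or tweet["retweet_count"] > current["retweet_count"]:
            best[tweet["candidate"]] = tweet
    return best


def daily_totals(tweets, day):
    """Return (total retweets, number of tweets) over all candidates on `day`."""
    on_day = [t for t in tweets if t["created_at"].date() == day]
    return sum(t["retweet_count"] for t in on_day), len(on_day)


christmas = date(2015, 12, 25)
top = top_tweet_per_candidate(SAMPLE_TWEETS, christmas)
total_retweets, tweet_count = daily_totals(SAMPLE_TWEETS, christmas)
```

Here `top` holds one tweet per candidate for the plot, while `total_retweets` and `tweet_count` correspond to the two summary figures shown below the graph.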
I also performed a sentiment analysis on the tweet content to determine if there were any patterns between the candidates. The example plots show the sentiment score plotted against the retweet count for Donald Trump and Hillary Clinton. What's interesting is that both anti-establishment candidates (Trump and Bernie) have more retweets at both ends of the sentiment distribution, whereas Hillary has a much flatter score distribution. This suggests that supporters are very engaged with the anti-establishment rhetoric (negative sentiment tweets). In Donald Trump's case, these tweets generally refer to ISIS, and for Bernie, they pertain to gun control.
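The post does not say which sentiment method was used, so the following is only a toy sketch of one common approach: a word-list score (positive words minus negative words) paired with each tweet's retweet count. The word lists, function names, and row layout are all hypothetical and far smaller than any real sentiment lexicon.

```python
import re

# Tiny illustrative word lists (an assumption, not the author's lexicon).
POSITIVE = {"great", "good", "win", "love", "thank", "merry"}
NEGATIVE = {"bad", "terrible", "attack", "fail", "weak", "war"}


def sentiment_score(text):
    """Count positive words minus negative words in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def score_vs_retweets(tweets):
    """Return (sentiment score, retweet count) pairs, ready for a scatter plot."""
    return [(sentiment_score(t["text"]), t["retweet_count"]) for t in tweets]
```

A score above zero marks a tweet as leaning positive and a score below zero as leaning negative; plotting the pairs from `score_vs_retweets` gives the kind of sentiment versus retweet view described above.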
It was also pertinent to visualize the word frequency in each candidate's tweets to notice differences in their vocabulary. Preliminary text cleaning was done to remove stop words and force lowercase letters in order to extract the most useful word frequencies.
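The cleaning steps mentioned above (lowercasing and stop word removal) can be sketched with Python's standard library. The stop word list here is a short hypothetical sample, and stripping URLs is an extra assumption of this sketch rather than something the post states.

```python
import re
from collections import Counter

# Short illustrative stop word list (an assumption); a real analysis
# would use a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
              "for", "on", "we", "our", "rt", "amp"}


def word_frequencies(tweets, stop_words=STOP_WORDS):
    """Count words across tweet texts after lowercasing and stop word removal."""
    counts = Counter()
    for text in tweets:
        # Lowercase first, then drop links so URL fragments are not counted.
        cleaned = re.sub(r"https?://\S+", " ", text.lower())
        words = re.findall(r"[a-z']+", cleaned)
        counts.update(w for w in words if w not in stop_words and len(w) > 1)
    return counts


frequencies = word_frequencies([
    "Make America great again",
    "We will make America GREAT! https://t.co/abc",
])
top_words = frequencies.most_common(3)
```

The resulting `Counter` maps each remaining word to its count, which is the input a word cloud needs.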
Looking @realDonaldTrump, we notice that his most frequent word is 'Trump', which suggests that he is extremely self-referential. Interestingly, 'Donald' is mentioned less frequently, leading me to believe he finds more value in his last name, which makes sense because of his brand. The Trump brand is the backbone of his campaign strategy. Some other words that stand out are 'great' and 'will', since he frequently uses the phrase "make America great again". He also mentions Hillary, Cruz, Jeb Bush, Carson, and Rubio but doesn't seem to be mentioning Bernie Sanders.
Looking @BernieSanders, we see that his vocabulary is more varied. Compared to Trump, a significant number of words are highlighted in red, meaning they are used with medium frequency. Also, Bernie doesn't seem to be mentioning the other candidates much, because none of their names show up in the word cloud.
Under the Hood
I utilized the Tweepy Python library to extract the tweets from the Twitter API. Once I had the tweets for each candidate in separate CSVs, I prepared the data for visualization.
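For readers who have not used Tweepy, the extraction step might look roughly like the sketch below. It is not the original script: the helper names, the chosen CSV columns, and the credential placeholders are assumptions, and Twitter's API access rules have changed over time, so treat it as an outline of the approach.

```python
import csv

# Columns kept for analysis (an assumption about what the CSVs contained).
FIELDS = ["id", "created_at", "retweet_count", "favorite_count", "text"]


def status_to_row(status):
    """Flatten one Tweepy status object into a plain dict."""
    return {
        "id": status.id_str,
        "created_at": status.created_at.isoformat(),
        "retweet_count": status.retweet_count,
        "favorite_count": status.favorite_count,
        "text": status.text,
    }


def fetch_timeline(api, screen_name, limit=3200):
    """Yield up to `limit` recent tweets for one account, 200 per request."""
    import tweepy  # imported lazily so the helpers above work without Tweepy

    cursor = tweepy.Cursor(api.user_timeline, screen_name=screen_name, count=200)
    for status in cursor.items(limit):
        yield status_to_row(status)


def write_csv(rows, path):
    """Write tweet rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def main():
    """Download one CSV per account; the credentials are placeholders."""
    import tweepy

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)
    for name in ["realDonaldTrump", "BernieSanders"]:
        write_csv(fetch_timeline(api, name), f"{name}.csv")
```

Calling `main()` with real credentials would write one file per handle. The limit of 3,200 matches the number of tweets collected per candidate in this project.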
Summarizing Retweets by Hour of Day example: