2016 Presidential Candidates Twitter Analysis

Sricharan Maddineni
Posted on Mar 3, 2016

Contributed by Sricharan Maddineni. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from January 11th to April 1st, 2016. This post is based on his third class project: web scraping (due in the 6th week of the program).

Overview

The 2016 presidential race is by far the most interesting race I've ever followed. As Twitter becomes more important, it has become a must for presidential candidates to use it to get their message out, and their tweets reflect their personality, interests, and strategy. Analyzing their social presence leads to some interesting insights.

Using Twitter's API, I've scraped the last 3,200 tweets from each of the top 6 presidential candidates and explored differences in engagement and sentiment. If you're unfamiliar with Twitter, you can use the widgets below to view their Twitter accounts (left), and see the live firehose of tweets from all candidates (right).


The App

Time-Series

Screen Shot 2016-03-03 at 4.37.20 PM

For each day, I plot only the most retweeted tweet for each candidate, since it shows their highest engagement on that day. I also figured it would be interesting to see the actual tweet content on the selected date. In the example above, I've selected December 25th to gain some insight into what the candidates were tweeting on Christmas Day. Interestingly, all the Republican candidates tweeted lengthy messages, and Jeb Bush even mentioned the troops, but Hillary and Bernie simply tweeted "Merry Christmas!" and a picture of their families. In the top right corner, we can also see the exact retweet count for the candidates being plotted. Below the graph are the total retweet count for each candidate on the selected day (across all their tweets, not just the most retweeted one) and the total number of tweets by the candidates on that day.
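The per-day summary described above can be sketched in a few lines of plain Python. This is only an illustration, not the app's actual code: the field names (`candidate`, `created_at`, `retweets`, `text`), the timestamp format, and the sample rows (including the retweet numbers) are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows for illustration; real rows would come from the scraped tweets.
tweets = [
    {"candidate": "HillaryClinton", "created_at": "2015-12-25 15:00:00", "retweets": 900, "text": "Merry Christmas!"},
    {"candidate": "HillaryClinton", "created_at": "2015-12-25 20:30:00", "retweets": 300, "text": "second tweet"},
    {"candidate": "JebBush", "created_at": "2015-12-25 13:10:00", "retweets": 450, "text": "holiday message"},
    {"candidate": "JebBush", "created_at": "2015-12-26 09:00:00", "retweets": 700, "text": "next-day tweet"},
]


def daily_summary(rows, day):
    """For one day (YYYY-MM-DD), return each candidate's most retweeted tweet,
    total retweets, and number of tweets."""
    by_candidate = defaultdict(list)
    for row in rows:
        # Assumed timestamp format; adjust to match how the tweets were saved.
        created = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
        if created.date().isoformat() == day:
            by_candidate[row["candidate"]].append(row)

    summary = {}
    for candidate, items in by_candidate.items():
        summary[candidate] = {
            "top_tweet": max(items, key=lambda r: r["retweets"]),
            "total_retweets": sum(r["retweets"] for r in items),
            "tweet_count": len(items),
        }
    return summary


summary = daily_summary(tweets, "2015-12-25")
for candidate, stats in summary.items():
    print(candidate, stats["top_tweet"]["text"], stats["total_retweets"], stats["tweet_count"])
```

The `top_tweet` entry is what would be plotted for the selected date, while `total_retweets` and `tweet_count` correspond to the figures shown below the graph.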


Sentiment

Screen Shot 2016-03-03 at 5.21.40 PM

Screen Shot 2016-03-03 at 5.13.37 PM

I also performed a sentiment analysis on the tweet content to determine if there were any patterns between the candidates. The example plots show the sentiment score plotted against the retweet count for Donald Trump and Hillary Clinton. What's interesting is that both anti-establishment candidates (Trump and Bernie) have more retweets on both sides of the sentiment distribution, whereas Hillary has a much flatter score distribution. This suggests that supporters are very engaged with the anti-establishment rhetoric (negative sentiment tweets). In Donald Trump's case, these tweets generally refer to ISIS, and for Bernie, they pertain to gun control.


WordCloud

Screen Shot 2016-03-03 at 5.30.48 PM

It was also pertinent to visualize the word frequency in each candidate's tweets to notice differences in their vocabulary. Preliminary text cleaning was done to remove stop words and force lowercase letters in order to extract the most useful word frequencies.
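As a rough sketch of that cleaning step, the snippet below lowercases the text, drops stop words, and counts what remains. The stop-word list is a tiny hypothetical one, and the link-stripping step is an extra assumption that the post does not mention; the second sample sentence is made up.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "for", "on", "we", "our", "be", "rt", "amp"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count words across tweets after lowercasing and removing stop words."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        # Assumption: strip links so URL fragments don't show up as words.
        lowered = re.sub(r"https?://\S+", " ", lowered)
        words = re.findall(r"[a-z']+", lowered)
        counts.update(w for w in words if w not in stop_words)
    return counts


freq = word_frequencies([
    "We will make America great again",
    "Great rally in Iowa, it will be great",  # made-up example text
])
print(freq.most_common(3))
```

The resulting counts are the kind of input a word cloud needs: each word's size is driven by its frequency.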

Looking @realDonaldTrump, we notice that his most frequent word is 'Trump', which shows that he is extremely self-referential. Interestingly, 'Donald' is mentioned less frequently, leading me to believe he finds more value in his last name, which makes sense because of his brand. The Trump brand is the backbone of his campaign strategy. Some other words that stand out are 'great' and 'will', since he frequently uses the phrase "make America great again". He also mentions Hillary, Cruz, Jeb Bush, Carson, and Rubio but doesn't seem to be mentioning Bernie Sanders.

Looking @BernieSanders, we see that his vocabulary is more varied. Compared to Trump, a significant number of his words are highlighted in red, meaning they are used with medium frequency. Bernie also doesn't seem to mention the other candidates much, because none of their names show up in the word cloud.


Under the Hood

I used the Tweepy Python library to extract the tweets from the Twitter API. Once I had the tweets for each candidate in separate CSVs, I prepared the data for visualization.

https://gist.github.com/sriyoda/f205ec649f3f3debf28c
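The gist above holds the actual extraction code. For readers who want the general idea, here is a minimal sketch of the paging pattern such a scrape typically relies on: Twitter's user timeline endpoint returns at most 200 tweets per request and about 3,200 per account, so you walk backwards with `max_id`. The `fetch_page` callable and the dict-shaped tweets are assumptions for illustration; with Tweepy, `fetch_page` would wrap a call such as `api.user_timeline(screen_name=..., count=..., max_id=...)` and convert each result to a dict.

```python
def collect_timeline(fetch_page, limit=3200, page_size=200):
    """Page backwards through a timeline until `limit` tweets are collected
    or the source runs out. `fetch_page(count=..., max_id=...)` is a
    hypothetical callable returning a list of dicts with an "id" key."""
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {"count": min(page_size, limit - len(collected))}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # no older tweets available
        collected.extend(page)
        # Next request should only return tweets older than the oldest seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


def fake_fetch_page(count, max_id=None):
    """Stand-in for the API: serves tweets with ids 500 down to 1, newest first."""
    newest = 500 if max_id is None else min(max_id, 500)
    return [{"id": i} for i in range(newest, max(newest - count, 0), -1)]


all_tweets = collect_timeline(fake_fetch_page)
first_250 = collect_timeline(fake_fetch_page, limit=250)
print(len(all_tweets), len(first_250))
```

Keeping the paging logic separate from the API call makes it easy to test without network access or credentials.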

Summarizing Retweets by Hour of Day example:

https://gist.github.com/sriyoda/ad6d76c8bfe58c4854d3
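A standard-library sketch of the same kind of summary is shown below; it is not a copy of the gist. The CSV column names, the timestamp format, and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample data standing in for one candidate's CSV file.
SAMPLE_CSV = """created_at,retweets,text
2016-02-01 09:15:00,120,morning tweet
2016-02-01 09:45:00,80,another morning tweet
2016-02-02 21:05:00,300,evening tweet
"""


def retweets_by_hour(rows):
    """Sum retweets for each hour of the day (0-23) across all rows."""
    totals = Counter()
    for row in rows:
        # Hours are in whatever timezone the timestamps were saved in
        # (the Twitter API reports UTC).
        hour = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S").hour
        totals[hour] += int(row["retweets"])
    return [totals[h] for h in range(24)]


hourly = retweets_by_hour(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(hourly)
```

Returning a fixed 24-element list keeps hours with no tweets at zero, which is convenient when plotting.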

Sentiment

https://gist.github.com/sriyoda/aec02c9c32936432ecf3
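The gist above contains the scoring actually used. To show the general shape of a sentiment-versus-retweets analysis, here is a deliberately simple word-list scorer; it is a swapped-in technique for illustration, and the word lists, sample tweets, and retweet counts are all made up.

```python
import re

# Tiny hypothetical word lists; real sentiment lexicons are far larger.
POSITIVE = {"great", "win", "love", "proud", "thank", "merry"}
NEGATIVE = {"bad", "terrible", "failing", "weak", "disaster", "wrong"}


def sentiment_score(text):
    """Score = (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


# Made-up tweets paired with made-up retweet counts.
sample = [
    {"text": "Merry Christmas!", "retweets": 900},
    {"text": "A terrible, failing policy", "retweets": 2500},
    {"text": "Town hall tonight", "retweets": 150},
]

# (sentiment, retweets) pairs are what a scatter plot of the two would use.
points = [(sentiment_score(t["text"]), t["retweets"]) for t in sample]
print(points)
```

Plotting these pairs per candidate gives a view comparable to the sentiment plots described earlier, with strongly negative or positive tweets sitting at the edges of the x-axis.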

 

About Author

Sricharan Maddineni

Sricharan Maddineni was a Neuroscience undergrad at Rutgers university. He is a professional music producer turned Data Scientist who has worked with major artists like Kid Ink, Dj Mustard, BMG and garnered over 18 million plays. He has...