2016 Presidential Candidates Twitter Analysis

Posted on Mar 3, 2016

Contributed by Sricharan Maddineni. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from January 11th to April 1st, 2016. This post is based on his third class project, Web Scraping (due in the 6th week of the program).


The 2016 presidential race is by far the most interesting race I've ever followed. As Twitter becomes more important, it has become a must for presidential candidates to get their message out there, and their tweets reflect their personality, interests, and strategy. Analyzing their social presence leads to some interesting insights.

Using Twitter's API, I scraped the last 3,200 tweets from each of the top six presidential candidates and explored differences in engagement and sentiment. If you're unfamiliar with Twitter, you can use the widgets below to view their Twitter accounts (left) and see the live firehose of tweets from all candidates (right).

The App


[Screenshot: the app's retweet view for December 25th]

I focus only on the most retweeted tweet for each candidate, since it reflects their highest engagement on that day. I also figured it would be interesting to see the actual tweet content on the selected date. In the example above, I've selected December 25th to see what the candidates were tweeting on Christmas Day. Interestingly, all the Republican candidates tweeted some lengthy messages, and Jeb Bush even mentioned the troops, but Hillary and Bernie simply tweeted "Merry Christmas!" along with a picture of their families. In the top right corner, we can also see the exact retweet count for each candidate being plotted. Below the graph, we can see the total retweet count for all the candidates (across all of their tweets, not only the most retweeted one) and the total number of tweets they posted on the selected day.
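A minimal Python sketch of this per-day summary (not the app's actual code), assuming each tweet is a dict with the hypothetical keys `candidate`, `created_at` (a `datetime`), `retweet_count`, and `text`. Days are bucketed in whatever timezone the timestamps carry; Twitter reports UTC, so a local calendar day would need a conversion first.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_summary(tweets, day):
    """Per-candidate summary of one calendar day.

    Returns {candidate: {"top_tweet", "top_retweets", "total_retweets",
    "tweet_count"}} for every candidate who tweeted on `day`.
    """
    summary = defaultdict(lambda: {"top_tweet": None, "top_retweets": -1,
                                   "total_retweets": 0, "tweet_count": 0})
    for t in tweets:
        if t["created_at"].date() != day:
            continue  # ignore tweets from other days
        s = summary[t["candidate"]]
        s["total_retweets"] += t["retweet_count"]  # every tweet counts toward the total
        s["tweet_count"] += 1
        if t["retweet_count"] > s["top_retweets"]:  # keep the most retweeted tweet
            s["top_retweets"] = t["retweet_count"]
            s["top_tweet"] = t["text"]
    return dict(summary)


if __name__ == "__main__":
    # Toy data with hypothetical values, not real tweets
    sample = [
        {"candidate": "A", "created_at": datetime(2015, 12, 25, 9), "retweet_count": 10, "text": "first"},
        {"candidate": "A", "created_at": datetime(2015, 12, 25, 15), "retweet_count": 30, "text": "second"},
        {"candidate": "B", "created_at": datetime(2015, 12, 26, 8), "retweet_count": 99, "text": "next day"},
    ]
    print(daily_summary(sample, date(2015, 12, 25)))
```

Keeping the running maximum and the running totals in a single pass means each day's plot values and the below-the-graph totals come from the same loop.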

Sentiment

[Screenshots: sentiment score vs. retweet count for Donald Trump and Hillary Clinton]

I also performed a sentiment analysis on the tweet content to look for patterns across the candidates. The example plots show the sentiment score plotted against the retweet count for Donald Trump and Hillary Clinton. Interestingly, both anti-establishment candidates (Trump and Bernie) have more retweets at both ends of the sentiment distribution, whereas Hillary has a much flatter distribution. This suggests that their supporters are very engaged with anti-establishment rhetoric (negative-sentiment tweets). In Donald Trump's case, these tweets generally refer to ISIS, and for Bernie, they pertain to gun control.
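One common way to compute a sentiment score, shown here as a hedged Python sketch rather than the method actually used in this project, is lexicon-based: score each tweet as the number of positive words minus the number of negative words, then total the retweets at each score. The tiny `POSITIVE` and `NEGATIVE` sets are hypothetical placeholders; a real analysis would load a full published opinion lexicon.

```python
import re
from collections import defaultdict

# Hypothetical placeholder lexicons; a real analysis would load a full opinion lexicon.
POSITIVE = {"great", "win", "proud", "thank", "merry", "love"}
NEGATIVE = {"bad", "weak", "disaster", "terrible", "sad", "fail"}


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def retweets_by_score(tweets):
    """Total retweets at each sentiment score.

    `tweets` is a list of (text, retweet_count) pairs; the result maps
    score -> summed retweets, sorted by score, ready for plotting.
    """
    totals = defaultdict(int)
    for text, retweets in tweets:
        totals[sentiment_score(text)] += retweets
    return dict(sorted(totals.items()))
```

Plotting each tweet's own (score, retweet count) pair instead of the per-score totals gives a scatter plot rather than a distribution.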

Word Cloud

[Screenshot: word clouds of the candidates' tweets]

It was also pertinent to visualize word frequencies in each candidate's tweets to spot differences in their vocabulary. Preliminary text cleaning removed stop words and converted all text to lowercase so that the most useful word frequencies could be extracted.
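A minimal standard-library Python sketch of this cleaning step (an illustration, not the project's code): lowercase each tweet, strip links, split it into words, drop stop words, and count what remains. The short `STOP_WORDS` set is a hypothetical stand-in for a full stop-word list; the resulting counts could then be fed to any word-cloud tool.

```python
import re
from collections import Counter

# Hypothetical short stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "on", "for", "is",
              "it", "at", "we", "our", "you", "my", "i", "be", "with", "this"}


def word_frequencies(tweets, extra_stop_words=()):
    """Count non-stop-words across a list of tweet texts."""
    stop = STOP_WORDS | set(extra_stop_words)
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links such as t.co URLs
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)   # words, allowing one apostrophe
        counts.update(w for w in words if w not in stop)
    return counts


if __name__ == "__main__":
    sample = ["Make America GREAT again! https://t.co/example",
              "We will make America great again"]
    print(word_frequencies(sample).most_common(3))
```

Removing links before tokenizing matters for tweets: otherwise fragments like "https" and "co" would rank among the most frequent words.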

Looking at @realDonaldTrump, we notice that his most frequent word is 'Trump', which suggests he is extremely self-referential. Interestingly, 'Donald' appears less frequently, which leads me to believe he finds more value in his last name; that makes sense, because the Trump brand is the backbone of his campaign strategy. Other words that stand out are 'great', since he frequently uses the phrase "make America great again", and 'will'. He also mentions Hillary, Cruz, Jeb Bush, Carson, and Rubio, but doesn't seem to mention Bernie Sanders.

Looking at @BernieSanders, we see that his vocabulary is more varied: compared with Trump's, a significant number of words are highlighted in red, meaning they are used with medium frequency. Bernie also doesn't seem to mention the other candidates much, since none of their names show up in his word cloud.
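Whether a candidate mentions rivals can also be checked directly instead of being read off a word cloud. A minimal Python sketch, where `RIVAL_NAMES` is a hypothetical alias table (an assumption about how the candidates refer to one another) and each tweet counts at most once per rival it mentions. Short surnames such as "bush" can also match unrelated references, so the counts are approximate.

```python
import re
from collections import Counter

# Hypothetical alias table: rival label -> lowercase words that signal a mention.
RIVAL_NAMES = {
    "Clinton": ("hillary", "clinton"),
    "Sanders": ("bernie", "sanders"),
    "Cruz": ("cruz",),
    "Bush": ("jeb", "bush"),
    "Carson": ("carson",),
    "Rubio": ("rubio",),
    "Trump": ("donald", "trump"),
}


def rival_mentions(tweets, own_name):
    """Count how many of a candidate's tweets mention each rival."""
    counts = Counter()
    for text in tweets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for rival, aliases in RIVAL_NAMES.items():
            # Skip the candidate's own name; count a rival once per tweet.
            if rival != own_name and words.intersection(aliases):
                counts[rival] += 1
    return counts
```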

Under the Hood

I used the Tweepy Python library to extract the tweets from the Twitter API. Once I had each candidate's tweets in a separate CSV, I prepared the data for visualization.
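A hedged Python sketch of this extraction step, assuming Tweepy 4.x and the Twitter v1.1 `user_timeline` endpoint, which returns only about the 3,200 most recent tweets per account (the source of the 3,200-tweet limit mentioned earlier). Access rules for this endpoint have changed considerably since 2016, so treat it as illustrative. The helper names and the CSV column layout are hypothetical, not the author's code.

```python
import csv

FIELDS = ["id", "created_at", "retweet_count", "text"]  # hypothetical CSV layout


def fetch_timeline(api, screen_name, limit=3200):
    """Page through a user's recent tweets with tweepy.Cursor.

    `api` is an authenticated tweepy.API object. tweet_mode="extended"
    makes the untruncated text available as `full_text`.
    """
    import tweepy  # imported lazily so the CSV helper works without Tweepy installed

    cursor = tweepy.Cursor(api.user_timeline, screen_name=screen_name,
                           count=200, tweet_mode="extended")  # 200 tweets per page
    return [{"id": s.id,
             "created_at": s.created_at.isoformat(),
             "retweet_count": s.retweet_count,
             "text": s.full_text}
            for s in cursor.items(limit)]


def write_tweets_csv(rows, path):
    """Write one candidate's tweets to its own CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Usage (requires your own API credentials):
# import tweepy
# auth = tweepy.OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET,
#                                 ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
# api = tweepy.API(auth, wait_on_rate_limit=True)
# write_tweets_csv(fetch_timeline(api, "BernieSanders"), "BernieSanders.csv")
```

Writing one file per candidate mirrors the separate-CSV setup described above and keeps each scrape independent if a run is rate-limited.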

Summarizing Retweets by Hour of Day example:
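One way to produce this kind of summary, sketched in Python rather than reproduced from the original project, assuming tweets are available as hypothetical (`created_at`, `retweet_count`) pairs with `datetime` timestamps: group by hour of day and report the tweet count, total retweets, and mean retweets per tweet. Hours follow the timestamps' timezone, which for raw Twitter data is UTC.

```python
from collections import defaultdict
from datetime import datetime


def retweets_by_hour(tweets):
    """Summarize retweets by hour of day (0-23).

    Returns {hour: {"tweets": n, "total": retweets, "mean": retweets / n}},
    sorted by hour and containing only hours with at least one tweet.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for created_at, retweets in tweets:
        totals[created_at.hour] += retweets
        counts[created_at.hour] += 1
    return {h: {"tweets": counts[h], "total": totals[h], "mean": totals[h] / counts[h]}
            for h in sorted(counts)}


if __name__ == "__main__":
    sample = [(datetime(2016, 1, 1, 9, 5), 10),
              (datetime(2016, 1, 2, 9, 40), 30),
              (datetime(2016, 1, 1, 21, 0), 7)]
    for hour, stats in retweets_by_hour(sample).items():
        print(hour, stats)
```

Reporting the mean alongside the total separates "tweets a lot at this hour" from "tweets at this hour get more engagement".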



About Author

Sricharan Maddineni

Sricharan Maddineni was a Neuroscience undergrad at Rutgers University. He is a professional music producer turned Data Scientist who has worked with major artists like Kid Ink, DJ Mustard, and BMG and garnered over 18 million plays. He has...
