Creating a Real-time Streaming Analytical Platform to Manage Social Media Marketing Campaigns

Motivation and Vision

The goal of the project was to provide actionable, scalable and data-driven insights to marketing managers to grow their customer base.


Research from Twitter (2016) shows that 49 percent of consumers seek purchase guidance from social media influencers. Even more important for marketers, nearly 40 percent of Twitter users said they had made a purchase as a direct result of an influencer’s tweet. Finally, a study by Collective Bias found that non-celebrity influencers were likely to drive 10x more in-store purchases than celebrities.


This is why we decided to help marketers understand the top influencers for their brand by creating a platform, TwitterTalker, on which they could see the top influencers for a specific keyword.


When a brand aligns with an influencer, not only do they bring their audience, but they also bring their audience’s network. Because of the loyalty of their audience, an influencer has the ability to drive traffic to a company’s site, increase its social media exposure, and sell its product through their recommendation or story about their experience.

That’s why it’s important for marketers to understand the dynamics of influence and how influence is changing all the time.


Process & Team Work

Foundational Methodology

The methodology we followed was an iterative process that consisted of 10 stages from business understanding to solution deployment (see figure below). It illustrates the iterative nature of the problem-solving process. We believed the application should not be left in place unchanged after being created. Through the iterative process of feedback, refinement and redeployment, the application continuously evolved.






Teamwork Management

To manage our resources as effectively as possible, we used a Shiny app and Asana, a web-based project management tool, for task assignment and progress tracking. For urgent information sharing, we used Slack to communicate with each other.



Pipeline Structure

TwitterTalker was designed and developed in the course of 2 weeks. The final workflow is described below:



The data collection began by implementing the streaming Twitter API using Python. We also implemented the Google NLP API to understand and analyze the sentiment of the data being scraped. The script constantly ran on the cloud on an Amazon EC2 instance. For each tweet, the following information was stored:



The streaming data was sent to Amazon S3 for convenient storage using Amazon Kinesis Firehose. Firehose provides an easy way to load streaming data into Amazon Web Services (AWS), enabling near real-time analytics with existing business tools.
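As a rough illustration of this hand-off, here is a minimal sketch of pushing one tweet record to a Firehose delivery stream. It relies on the `put_record(DeliveryStreamName=..., Record={"Data": ...})` call of the boto3 Firehose client. The stream name `twitter-talker-stream`, the record fields and the `FakeFirehose` stand-in are assumptions made for this example, not the project's actual code.

```python
import json


def send_tweet(firehose_client, stream_name, tweet_record):
    """Serialize one tweet record and push it to a Firehose delivery stream.

    A trailing newline is appended so that the objects Firehose writes to S3
    contain one JSON document per line.
    """
    payload = (json.dumps(tweet_record) + "\n").encode("utf-8")
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload},
    )


class FakeFirehose:
    """Stand-in for boto3.client("firehose") so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def put_record(self, DeliveryStreamName, Record):
        self.sent.append((DeliveryStreamName, Record["Data"]))
        return {"RecordId": str(len(self.sent))}


# Hypothetical record; the field names are illustrative, not the project's schema.
example = {"user": "some_user", "text": "Happy #FridayFeeling", "sentiment": 0.8}
client = FakeFirehose()
response = send_tweet(client, "twitter-talker-stream", example)
```

In a real deployment, `boto3.client("firehose")` would take the place of `FakeFirehose`. Passing the client in as an argument keeps the serialization logic easy to test without AWS credentials.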


The data stored on S3 was then batch processed using Apache Spark to ensure a scalable and reliable product for the clients of TwitterTalker. We used Spark to implement the majority of our analysis, such as calculating the influence score. The data was processed every 15 minutes.


The final output from our analysis in Spark was sent to Amazon RDS as a SQL table and updated every 15 minutes as mentioned earlier. The Python Flask App called the database to display the real time data to the client interface.


Flask App Architecture

We used Flask, a so-called micro-framework, to take raw data from RDS through SQLAlchemy and used templates to convert it into a viewable form. Like a magician, we mixed some ingredients (data) according to a recipe (template) to create a potion (website).

The formats that a browser can display are the HTML, CSS and JS triple. By combining all three, a browser is able to render a good-looking, interactive website, web page or web application.
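The "data plus template" idea can be illustrated with the standard library alone. The sketch below is not the Flask app itself: an in-memory SQLite database stands in for RDS and SQLAlchemy, `string.Template` stands in for Flask's template engine, and the `influencers` table with its columns is a hypothetical schema.

```python
import sqlite3
from html import escape
from string import Template

# In-memory SQLite stands in for the RDS database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE influencers (name TEXT, hashtag TEXT, score REAL)")
conn.executemany(
    "INSERT INTO influencers VALUES (?, ?, ?)",
    [
        ("alice", "fathersday", 0.91),
        ("bob", "fathersday", 0.75),
        ("carol", "trump", 0.88),
    ],
)

# The "recipe": a page template and a row template.
PAGE = Template("<h1>Top influencers for #$hashtag</h1>\n<ol>\n$rows\n</ol>")
ROW = Template("  <li>$name ($score)</li>")


def render_top_influencers(connection, hashtag, limit=5):
    """Query the highest-scoring influencers for a hashtag and render them as HTML."""
    cursor = connection.execute(
        "SELECT name, score FROM influencers "
        "WHERE hashtag = ? ORDER BY score DESC LIMIT ?",
        (hashtag, limit),
    )
    rows = "\n".join(
        ROW.substitute(name=escape(name), score=f"{score:.2f}")
        for name, score in cursor
    )
    return PAGE.substitute(hashtag=escape(hashtag), rows=rows)


html_page = render_top_influencers(conn, "fathersday")
```

In a Flask application the same two steps, a database query followed by template rendering, would typically live inside a route function.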



Interactive App

Below is the homepage of the interactive web application. After clicking the ‘Find out more’ button, users are invited to sign up. This is designed to collect user information. All input is stored in our RDS database. For future work, we would like to collect more data on how users interact with our platform, which would be used to train models to provide more sophisticated recommendations.


The real-time analytics platform consists of 4 core parts:

  1. Influence Score Dashboard, which allows users to identify top influencers in hashtag events and their tweets.
  2. Influence Map, which helps users spot where the top influencers are located.
  3. Hashtag Word Cloud, which allows users to understand the top keywords associated with a specific hashtag.
  4. Hashtag Trend, which aims at providing an overview of the hashtag’s popularity over time.


1- Influence Score Dashboard

The Influence Score Dashboard provides a clear breakdown of what influencers are saying about a hashtag, how many people the influencer is reaching and whether their view is positive, negative or neutral.

We expect marketers to leverage positive influencers in their campaigns and transform the negative and neutral ones into brand ambassadors.


Below is an example of the dashboard for influencers with a positive influence sentiment. The dashboard includes the names of the top 5 influencers, their influence scores, their total number of followers, the number of times they were retweeted for the searched hashtag and their tweets for the searched hashtag.




In order to define and measure influence, we first identified which aspects gave an individual the power to influence others within their social sphere.

There are 3 components called the “Pillars of influence” that marketers want to consider: Reach, Resonance, and Relevance.


Reach is the ability to reach and impact a large audience. We measured it by looking at:

  • The number of people who were following the influencer on Twitter
  • The number of lists the influencer was part of
  • The total number of tweets posted by the influencer


Resonance is the ability to engage others with valuable content. We measured it by analyzing:

  • The number of retweets and likes the influencer received on a specific keyword


Relevance is the ability to create content that is relevant to the brand. This is where the Flask app comes in handy, as we let marketers filter influencers based on specific keywords. The score of an influencer will be different for each keyword.

The criteria in the influence score needed to be weighted differently. The challenge was that we were working in an unsupervised environment where we had a lot of criteria but no output variable. In other words, we did not know whether a user for whom we had scraped data was an influencer or not.

Luckily, there was a Kaggle competition that aimed at predicting people who were influential on social media. The dataset included the criteria defined in the influence score above, plus the output variable, i.e. whether or not a user was an influencer.

We used the dataset from Kaggle to weight each variable using the measure of variable importance in Random Forest.
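To make the weighting idea concrete, here is a minimal sketch of turning variable importances into weights and combining them into a single score. The importance values are placeholders rather than the ones produced by our Random Forest, the feature names are shortened for the example, and scaling each feature by its maximum over the compared users is an assumed choice.

```python
# Placeholder importances; the real values came from the Random Forest
# importance measure and are not reproduced here.
IMPORTANCES = {
    "followers": 4.0,
    "listed": 3.0,
    "statuses": 1.0,
    "retweets": 2.0,
}


def normalize_weights(importances):
    """Rescale raw importances so the weights sum to 1."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


def influence_scores(users, importances):
    """Score each user as a weighted sum of features scaled to [0, 1].

    Each feature is divided by its maximum over the users being compared,
    so the score is relative to that group (an assumed scaling choice).
    """
    weights = normalize_weights(importances)
    maxima = {
        name: max(user[name] for user in users.values()) or 1
        for name in weights
    }
    return {
        handle: sum(weights[name] * user[name] / maxima[name] for name in weights)
        for handle, user in users.items()
    }


# Hypothetical users and feature values.
users = {
    "alice": {"followers": 1000, "listed": 10, "statuses": 500, "retweets": 40},
    "bob": {"followers": 500, "listed": 20, "statuses": 100, "retweets": 80},
}
scores = influence_scores(users, IMPORTANCES)
```

Because the retweet feature is counted per keyword, the same user would receive a different score for each hashtag, which matches the Relevance pillar described above.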


Below is the importance plot generated by random forest that we used to choose and assign weight to the variables in the influence score.


2- Influencer Map

Once marketers have identified the top 5 influencers by influencer sentiment, the next step is to locate them and identify possible trends. The Flask app displays where the top influencers are, and differentiates influencers who demonstrate a positive sentiment towards the hashtag from influencers who express negative feelings towards it.


This information is valuable as it enables marketers to spot potential markets that they may have dismissed. For retailers, it is an opportunity to open a new store next to one of these hot spots or display an ad showcasing one of the local influencers.


3- Hashtag Word Cloud

Our real-time app generates three different word clouds for the hashtag specified by the marketer, one for each sentiment: positive, neutral and negative.
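The counting behind such word clouds can be sketched as follows. The sentiment threshold of 0.25, the tiny stop-word list and the sample tweets are assumptions for illustration. The sketch only produces word frequencies per sentiment, which a plotting library would then turn into the actual cloud.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "to", "and", "is", "my", "for", "of", "so"}


def sentiment_label(score, threshold=0.25):
    """Map a numeric sentiment score to a label; the threshold is an assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def word_counts_by_sentiment(tweets):
    """Count word frequencies separately for each sentiment label."""
    counts = defaultdict(Counter)
    for text, score in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        counts[sentiment_label(score)].update(
            word for word in words if word not in STOP_WORDS
        )
    return counts


# Hypothetical (text, sentiment score) pairs.
tweets = [
    ("Love the gift for my dad", 0.9),
    ("Best dad ever, dad jokes included", 0.7),
    ("Traffic is so bad today", -0.6),
    ("Dinner with dad tonight", 0.0),
]
counts = word_counts_by_sentiment(tweets)
top_positive = counts["positive"].most_common(1)
```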


There are several ways marketers could use the hashtag word cloud. However, the main goal would be to evaluate how effectively they are conveying their brand messaging by identifying the top words associated with their brand (or whichever hashtag they specified). Are the industry buzzwords part of the word cloud? Are the words they are targeting for SEO (Search Engine Optimization) showing up? Which words have a positive or negative connotation?


Marketers can also use the interactive app to search for their competitors’ hashtag and see which top words are associated with them.


4- Hashtag Trend

Finally, in a social media marketing campaign, it is important to know a topic’s popularity and people’s sentiments towards it. Our real-time Twitter analytics platform tracks the number of tweets and the average sentiment score for a specified hashtag.
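The hourly aggregation behind such a trend chart can be sketched in a few lines. The timestamps and scores below are made up, and the function is an illustration of the idea rather than the Spark job we ran.

```python
from collections import defaultdict
from datetime import datetime


def hourly_trend(tweets):
    """Return {hour: (tweet_count, average_sentiment)} for (timestamp, score) pairs."""
    buckets = defaultdict(list)
    for timestamp, score in tweets:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(score)
    return {
        hour: (len(scores), sum(scores) / len(scores))
        for hour, scores in sorted(buckets.items())
    }


# Hypothetical (timestamp, sentiment score) pairs.
tweets = [
    (datetime(2017, 6, 16, 14, 5), 0.5),
    (datetime(2017, 6, 16, 14, 40), 0.1),
    (datetime(2017, 6, 16, 15, 10), -0.2),
]
trend = hourly_trend(tweets)
```

Each entry of `trend` holds the two quantities plotted for a hashtag: the tweet count for the hour and the mean sentiment of those tweets.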


This helps marketers track how much conversation their brand (or whichever hashtag they specified) is driving and gives them a clear understanding of their brand in the competitive landscape. The average sentiment score is another component that helps marketers understand how the social media audience feels about the hashtag they searched for.


Overall, this part of the dashboard can serve as an indicator of business growth by displaying upward or downward trends in social media health for a specific hashtag.


Case Study

To gain deeper insight into the Twitter data we had collected, we analyzed retweet patterns, sentiment and influence power. The goal here is to guide users of our app by showing a real-life application of our dashboard.


We selected 3 keywords, ‘Friday Feeling’, ‘Fathers day’ and ‘Trump’, and scraped over 100K tweets within 72 hours.




Number of tweets & average sentiment / hour

In the hashtag event ‘Friday Feeling’, we can see sustained topic popularity starting from Friday 2:00 pm. Approximately 2750 tweets were posted during peak hours. The popularity gradually declined from 2:00 am.

As for the sentiment, the average scores remained positive at all times. The sentiment scores decreased slightly during peak hours but increased after 4:00 am.


This is just an example of hashtag popularity tracking. In a real marketing campaign, a marketing manager should always keep an eye on popularity. If a sharp drop in tweet volume appears, the manager can react immediately by implementing the following strategies:

  1. Identify current influencers and keep them active in posting tweets.
  2. Invite potential influencers (similar users) to join the current topic.
  3. Create new related topics.


Understanding what Twitter users are talking about is an important component of a marketing campaign. The sentiment tracking and word cloud functions allow marketing managers to know how audiences feel and which words are mentioned most frequently. If the average sentiment score drops significantly, it means that a potential public relations crisis might be happening. Our product can be used as a powerful tool for crisis management as it can reveal the direction of public opinion in real time, which allows leadership to take action faster.
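A simple way to flag the kind of sharp drop in tweet volume discussed above is to compare each hour with the average of the preceding hours. This is a minimal sketch: the window of 3 hours, the 50% ratio and the sample counts are illustrative assumptions, not tuned thresholds. Sentiment scores can be negative, so they would need a rule based on absolute differences instead of a ratio.

```python
def sharp_drops(values, window=3, drop_ratio=0.5):
    """Return indexes where a value falls below drop_ratio times the trailing mean.

    `window` and `drop_ratio` are illustrative defaults, not tuned thresholds.
    """
    alerts = []
    for i in range(window, len(values)):
        baseline = sum(values[i - window:i]) / window
        if baseline > 0 and values[i] < drop_ratio * baseline:
            alerts.append(i)
    return alerts


# Hypothetical hourly tweet counts for one hashtag.
hourly_tweet_counts = [2600, 2750, 2700, 2650, 900, 850]
alerts = sharp_drops(hourly_tweet_counts)
```

Here `alerts` contains the positions of the hours whose volume fell to less than half of the trailing three-hour average, which is the moment a marketing manager would want to be notified.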


Take United Airlines (UA) as an example: UA's CEO apologized 24 hours after the video of the incident had gone viral on Twitter. The late and insincere apology caused UA's stock price to drop by 1.1%, the equivalent of a $255M loss. If the crisis manager had identified the increase in tweet volume and negativity against UA, the CEO could have responded much more quickly, and the crisis management team could have responded to the viral videos at an early stage of their spread.

Source: Business Insider


Social media marketing patterns

The difference between the two patterns: The figure below reveals the relationship between retweet count and follower count for two hashtags: ‘#Trump’ and ‘#FathersDay’. The difference between the two hashtags is easy to see:

  1. #Trump: Medium-low follower count with high retweet count
  2. #FathersDay: High follower count with medium-low retweet count


What is causing the difference: #Trump is always a hot topic, and that can be leveraged for marketing purposes. It is what enabled Trump to get elected even though his presidential campaign spent only about half of what Clinton’s did. In social media marketing, this pattern is called viral spreading.

Compared with viral marketing, traditional marketing seems far less cost-efficient. As in the ‘#FathersDay’ pattern, companies need to pay big Twitter influencers to post tweets about their products. However, the retweet counts are much lower since people are not interested in sharing the topics.


Elements of viral marketing: To reap the benefits of viral marketing, marketers need to leverage two important components of a viral marketing strategy.

  1. Exploit common motivations / behaviors / emotions: Viral content uses a strong emotional hook to encourage hyper-accelerated sharing. The top viral emotions are: surprise, curiosity, amazement, interest, astonishment, and uncertainty.
  2. Provide for effortless transfer to others: Twitter is one of the most fertile grounds for viral marketing, as instant communication is easy and inexpensive.




Future Work

The product we created is a Minimum Viable Product (MVP), meaning that we built the minimum set of features needed to deploy our product. The goal is to put it in front of real users and keep improving it as we gather feedback. This is based on an agile approach: building a product quickly, measuring each iteration of it and upgrading it often (as depicted below).


For future work, we’d like to collect data on the usage of our product and feedback from our users to keep providing scalable, actionable and data-driven insights to marketers. We also want to help marketers to be proactive by predicting the top influencers for a specific hashtag in the next 4-6 months. Finally, we want to support decision-making by giving recommendations based on the information our platform provides. This is where a data scientist would want to collaborate with the marketing team to bring the product to a new level.


About Authors

Claire Keser

Claire Keser completed her MBA at the University of Victoria (Canada). Her work experience has been primarily in Conversion Optimization (A/B testing) where she built & led a team focused on turning data into products, actionable insights, and...

Grant Webb

Dedicated professional with nine years of experience in high-energy nuclear physics and large-scale data analysis at Brookhaven National Laboratory. As a member of an extensive collaboration of over 500+ scientists from 33 institutions in 15 countries, I directed...

Yabin Fan

Yabin recently received her Master’s degree in Information System Engineering in Computer Science track from Johns Hopkins University. Before she came to NYC Data Science Academy, she worked as a Data Programmer to develop scripts in SQL and...

William Zhou

William Zhou is a quantitative thinker and deep learning enthusiast with a strong background in healthcare. After graduating from Soochow University in Pharmaceutical Science, he obtained an MHA from Columbia University in Healthcare Management. In the following 2 years,...
