Finding Influencers on Twitter
Have you been followed on Twitter or Instagram by someone you don't know? I get this a lot, and to avoid seeming rude, I follow back. Eventually, I got tired of following back when I realized that some of these accounts don't really do anything but collect followers. Now, why would anyone go through all the trouble of following people in the hopes of being followed back? Why would anyone spend so much time on the internet for this?
I eventually realized the answer when I saw that most of these accounts were not personal. Many of the accounts I encountered were about food, some were about beach vacations, and occasionally there were accounts with risqué content.
Advertising has infiltrated the social network. It used to be just banner ads, but now companies hire social media personalities to spread the word about their product or event. Companies spend big bucks on celebrities in an effort to publicize their brand and attract a celebrity's fan base. A sponsored tweet could net as much as $13,000, as was the case for Khloe Kardashian in 2013.
Celebrities have multitudes of followers and get paid big bucks by sponsors. So people may have thought that creating accounts and amassing followers would eventually get them sponsorship deals with advertisers. In this exercise, we see that sponsors might be looking for things other than the number of followers.
In a social network, a link could represent a relationship, as in Facebook, or the passing of a tweet, as in Twitter. These links determine the flow of information and are therefore a good indicator of a user's influence. I will present two methods of finding potential influencers in a network: one extracts a user's influence measures, and the other uses network graphs.
The data came from a large database found on Followthehashtag.com. The database contained a stream of tweets related to NASDAQ 100 stocks, extracted from Twitter over 79 days, from March 28 to June 15, 2016. It was selected because it has a good mix of accounts representing organizations and personalities. The database also contained information about how many times a tweet was passed along and who the original tweet came from. This act, more popularly known as retweeting, can be identified in the stream as tweets having 'RT @user' or 'via @user' at the beginning of the tweet. The stream also contained information about mentions. On Twitter, a mention is a public conversation between users: a user calls the attention of another user by mentioning them in a tweet. Mentions are identified as tweets beginning with '@user'.
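The retweet and mention conventions described above can be sketched as a small classifier. This is only an illustration: the function name and the exact patterns are assumptions, not the code used for the original extraction.

```python
import re

# Patterns follow the conventions described in the text: retweets start with
# "RT @user" or "via @user", and mentions start with "@user".
RETWEET_RE = re.compile(r"^\s*(?:RT|via)\s+@(\w+)", re.IGNORECASE)
MENTION_RE = re.compile(r"^\s*@(\w+)")


def classify_tweet(text):
    """Return (kind, target_user); kind is 'retweet', 'mention', or 'plain'."""
    match = RETWEET_RE.match(text)
    if match:
        return "retweet", match.group(1)
    match = MENTION_RE.match(text)
    if match:
        return "mention", match.group(1)
    return "plain", None
```

The retweet pattern is checked first so that a tweet such as "RT @WSJ ..." is never counted as a mention.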
The influence measures extracted from the stream were the following: indegree, retweets, and mentions. These measures were selected because of how they affect the flow of information in the network. Indegree measures the user's popularity. It was easily extracted from the database as the number of followers a user has, which shows us the size of the user's audience base. Retweet influence represents a user's ability to create content that other users find worthy of sharing. When a tweet is shared by another user, a bigger network of users is exposed to the tweet. From the stream, this was extracted by counting the number of retweeted messages for each user. The third measure, mention influence, was extracted by counting the number of mentions containing the user's name. This influence measure indicates the ability of the user to engage others in a conversation, and it represents the top-of-mind value of the user's name.
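As a rough sketch of how the three measures could be tallied, assume each tweet has already been reduced to a hypothetical record of (author, follower count, kind, target). The record layout and the sample values are made up for illustration, and retweets are credited to the user whose content was passed along.

```python
from collections import Counter

# Hypothetical records: (author, follower_count, kind, target_user).
tweets = [
    ("alice", 120, "retweet", "bob"),
    ("carol", 45, "retweet", "bob"),
    ("bob", 3000, "mention", "alice"),
    ("carol", 45, "plain", None),
]

indegree = {}          # follower count per user (popularity)
retweets = Counter()   # how often a user's content was retweeted
mentions = Counter()   # how often a user was mentioned

for author, followers, kind, target in tweets:
    indegree[author] = followers
    if kind == "retweet":
        retweets[target] += 1
    elif kind == "mention":
        mentions[target] += 1
```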
A total of 96,613 users tweeted about NASDAQ 100 stocks during the timeframe. Between them, over 680 thousand tweets were broadcast. A word cloud of the NASDAQ symbols most often mentioned shows that Apple, represented by AAPL, was the most tweeted stock among the group.
Users were most active on April 27, when they broadcast over 20,800 tweets. This coincides with the day when AAPL stock slumped following speculation that iPhone sales may decline by as much as 60 million units compared to the same quarter a year ago. The slump in Apple shares dragged the tech-heavy NASDAQ into the red by the day's end.
Activity on this day was concentrated during market trading hours, which are 13:30 to 20:30 UTC.
Each user's ranking over the three influence categories was assigned using fractional ranking. For example, in assigning the indegree ranking, a rank of 1 was given to the user with the most followers. Users with the same number of followers receive the same ranking number, which is the mean of the ranks they would have under ordinal ranking. Table 1 shows the top 30 users across the three influence measures. Notice the minimal overlap across the influence ranks. The first user to show up across all three measures of influence was "WSJ".
Table 1. Top influentials based on indegree, retweets, and mentions
Rank | TopIndegree | TopRT | TopMentions |
1 | cnnbrk | philstockworld | jimcramer |
2 | nytimes | StocksHighAlert | CNBC |
3 | CNN | ValaAfshar | AlertTrade |
4 | Reuters | YahooFinance | WSJ |
5 | WSJ | BK_Stocks | Benzinga |
6 | Forbes | businessinsider | YahooFinance |
7 | AP | StockTwits | CNBCFastMoney |
8 | DRJAMESCABOT | timothysykes | TheStreet |
9 | GMA | CNNMoney | autumnalcity87 |
10 | MarketWatch | devonshiretech | carlquintanilla |
11 | JohnLegere | CNBCnow | HalftimeReport |
12 | USATODAY | ppprophet | markbspiegel |
13 | CNBC | carlquintanilla | RiskReversal |
14 | ForbesTech | Stockology101 | petenajarian |
15 | FortuneMagazine | Benzinga | TTtradertwit |
16 | timoreilly | OpenOutcrier | GerberKawasaki |
17 | rsAnakin_FBGx20 | marketexclusive | barronsonline |
18 | philstockworld | DayTradersGroup | ReformedBroker |
19 | dickc | theflynews | Reuters |
20 | ReutersBiz | TakeFlightSales | GuyAdami |
21 | businessinsider | WrigleyTom | StockTwits |
22 | om | TweakTown | RedDogT3 |
23 | Yahoo | markbspiegel | jonnajarian |
24 | SAI | SAI | cek_cpa |
25 | globeandmail | WSJ | JustinPulitzer |
26 | Variety | CenterTrading | ryanwallace198 |
27 | VH1 | CBOE | DougKass |
28 | CNNMoney | davidmoadel | technology |
29 | WSJbusiness | GerberKawasaki | BossHoggHazzard |
30 | CNET | options_answers | SquawkStreet |
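The fractional ranking behind Table 1 can be sketched as follows. This is a minimal illustration with an assumed function name, not the original implementation: users are sorted by descending score, and tied users share the mean of the ordinal ranks they span.

```python
def fractional_rank(scores):
    """Rank users by descending score; ties get the mean of their ordinal ranks."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every user tied with the user at position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of ordinal ranks i+1 .. j+1
        for user, _ in ordered[i:j + 1]:
            ranks[user] = mean_rank
        i = j + 1
    return ranks
```

For example, two users tied for second and third place would both receive a rank of 2.5.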
To see how much users overlap across the three categories, a Venn diagram of the top 100 users in each category was derived. Figure 4 shows that among the 239 distinct users in the combined top lists, only 10 appear across all three measures of influence.


Figure 4. Venn diagram of top influentials across measures.
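The counts behind a Venn diagram like this one come down to set operations on the top-k users of each ranking. A small sketch, with hypothetical function names and rank dictionaries mapping user to rank (1 is best):

```python
def top_k(ranks, k):
    """Return the set of the k best-ranked users (rank 1 is best)."""
    return set(sorted(ranks, key=ranks.get)[:k])


def overlap_summary(indegree_rank, retweet_rank, mention_rank, k):
    """Count distinct users in the three top-k lists and those present in all."""
    a, b, c = (top_k(r, k) for r in (indegree_rank, retweet_rank, mention_rank))
    return {"union": len(a | b | c), "all_three": len(a & b & c)}
```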
Figure 5 below shows a correlation matrix which represents how a user's rank varies across the three different measures of influence. The correlation matrix represents the strength of the association between a pair of rankings. This matrix was derived by comparing the relative influence ranks of all 96,613 users in the database.


Figure 5. Correlation plot across all influence measures.
The users show a strong correlation between their retweet influence and mention influence. The low correlation of the indegree measure with the other two measures shows that indegree ranking may not be related to the other rankings.
A couple of conclusions can be derived from the correlation plot. First, in most cases, users who are retweeted often are also mentioned often, and vice versa. Second, the most followed user may not be the most engaging user in the group. A user's popularity, therefore, is a weak representation of the ability to motivate the spread of information.
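One way to compute a correlation between two rankings is Spearman's rank correlation, which is simply the Pearson correlation applied to the (fractional) ranks. The text does not state which coefficient was used, so treat Spearman as an assumption here; the function names are also illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_correlation(rank_a, rank_b):
    """Spearman-style correlation: Pearson computed on two rank dictionaries."""
    users = sorted(rank_a.keys() & rank_b.keys())
    return pearson([rank_a[u] for u in users], [rank_b[u] for u in users])
```

The sketch assumes each ranking has at least two distinct rank values; a constant ranking would make the denominator zero.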
Retweets and mentions have direction. A retweet is the path of an idea from User A to User B. User A broadcasts a tweet which is read by User B. User B thinks it is worth sharing and retweets it. This retweet will eventually be seen by users not directly accessible to User A. When User A mentions User B, this is again a link from User A to User B. With this in mind, we have enough data to convert our Twitter stream into a directed network graph. Each user will be a node in our graph and each directed link will be an edge. The igraph library will be used to extract information from the resulting network graph.
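Before handing the data to a graph library, the stream can be reduced to a weighted, directed edge list. The sketch below uses made-up interactions and plain Python rather than igraph; the weight of an edge is the number of times that link was activated.

```python
from collections import Counter

# Hypothetical interactions, one (source_user, target_user) pair per retweet
# or mention, using the A -> B direction described in the text.
interactions = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "A")]

# Each unique pair is one directed edge; its count is the edge weight.
edge_weights = Counter(interactions)

# Every user that appears at either end of an edge is a node.
nodes = {user for edge in edge_weights for user in edge}
```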
A quick look at the resulting network graph for the whole stream shows that we were able to create a graph with 96,613 nodes and 168,519 edges. Because of this size, the resulting network graph will not be shown: a plot would take considerable time and computational effort, and would most likely be a crowded mess of dots and lines anyway. However, we can still extract some information from the graph object.
The density of a network object is the proportion of present edges out of all possible edges in the network. Our present graph has a density of 2.799118e-05. A density this low means that there is very little interaction between our users.
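For a directed graph without self-loops, the number of possible edges among n nodes is n(n - 1), so density is the edge count divided by that product. A minimal sketch with an assumed function name and a toy example:

```python
def directed_density(num_nodes, num_edges):
    """Share of possible directed edges (no self-loops) that are present."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# Toy example: 4 users allow 4 * 3 = 12 directed links; 3 of them exist.
toy_density = directed_density(4, 3)
```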
The diameter of a network graph is the length of the longest shortest path (geodesic) between any pair of nodes. Considering the direction of the links, the diameter of our network is 14. This means that the two most distant connected users are 14 hops apart, an unbroken path across 15 users.
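The diameter of a directed graph can be sketched with a breadth-first search from every node, keeping the longest shortest path found. This illustration ignores edge weights and unreachable pairs, which is an assumption about how the reported value was obtained; it is also far too slow for a graph of this size and is meant only to show the idea.

```python
from collections import deque


def directed_diameter(adjacency):
    """Longest finite shortest-path length, following edge direction."""
    longest = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(distance.values()))
    return longest
```

A chain of four users linked one after another has a diameter of 3 hops across 4 nodes, mirroring the 14 hops across 15 users in the full graph.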
The hubs and authorities algorithm was developed by Jon Kleinberg to examine the relevance of a web page's content. He categorized pages into hubs and authorities. Hubs, which have more outgoing links, are the internet's catalog. This is similar to the early days of Yahoo, when it touted itself as the internet's yellow pages. Authority pages have more incoming links, presumably because of their high-quality content. Translated to Twitter activity, a hub would fit the description of a user with high retweet influence, and an authority would be similar to a user who has high mention influence.
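The idea behind hub and authority scores can be illustrated with a simple power iteration: a node's authority score is the sum of the hub scores of nodes linking to it, and its hub score is the sum of the authority scores of the nodes it links to. This is an unweighted sketch with assumed names and a fixed iteration count, not the igraph routine used in the analysis.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of directed edges."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: add up hub scores of nodes pointing in.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = _normalize(auth)
        # Hub update: add up authority scores of nodes pointed to.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = _normalize(hub)
    return hub, auth
```

Because each unique edge contributes once, how many times a link was activated plays no role in this version.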
The hub and authority scores of the network graph were derived using a simple igraph function call. The resulting top hub score went to "markbspiegel" while the top authority score went to "Benzinga". This is in contrast to the ranking tables, where the top retweet and mention ranks belong to "philstockworld" and "jimcramer" respectively.
To find out where the discrepancy came from, each node was investigated. Although "markbspiegel" had more unique edges than "philstockworld", once we sum the weight of each unique edge, "philstockworld" still beats "markbspiegel". The same is observed when looking at the edges of "Benzinga" and "jimcramer". The discrepancy is consistent with how web pages are rated, wherein the number of links matters more than the number of times each link was activated. The hub and authority scores also do not take into account the weight characteristics of the nodes.
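The difference between counting unique edges and summing their weights can be shown on a toy example. The users and weights below are hypothetical; "x" has more unique outgoing edges, but "y" has the larger total weight, which is the same pattern described for the two pairs of accounts above.

```python
from collections import Counter


def out_degree_and_strength(edge_weights):
    """Per source user: number of unique outgoing edges and their summed weight."""
    degree = Counter()
    strength = Counter()
    for (src, _dst), weight in edge_weights.items():
        degree[src] += 1
        strength[src] += weight
    return degree, strength


# Hypothetical weighted edges: (source, target) -> times the link was used.
toy_edges = {
    ("x", "a"): 1, ("x", "b"): 1, ("x", "c"): 1,
    ("y", "a"): 5, ("y", "b"): 4,
}
toy_degree, toy_strength = out_degree_and_strength(toy_edges)
```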
To see an actual network graph, we narrow down our selection to a Twitter stream of users tweeting about CA Technologies.
Table 2 shows us the resulting top influentials derived from our ranking method. The first user to cross the three influence categories is "Benzinga".
Table 2. Top influentials of the CA stream.
Rank | TopIndegree | TopRT | TopMentions |
1 | CBOE | WrigleyTom | sam_miller00 |
2 | InvestorIdeas | ppprophet | diggingplatinum |
3 | 247WallSt | TradeZer0 | AlertTrade |
4 | androsForm | LMTentarelli | LMTentarelli |
5 | jjjinvesting | eWhispers | AdaptToReality |
6 | AlertTrade | PersonsPlanet | Opinterest |
7 | DirectorsTalk | Boursier_com | eWhispers |
8 | PENNYBUSTER1 | bored2tears | Le_Revenu |
9 | scottrade | pnoytrader | TransitoOK |
10 | MorningstarInc | crosshairtrader | DozenStocks |
11 | Quaikey | UTradePH | Benzinga |
12 | PersonsPlanet | OpenOutcrier | leahanneta |
13 | Benzinga | quack1612 | jascapital1 |
14 | airtransat | MorningstarInc | Jascapitalforex |
15 | traderstewie | SleekMoneycom | XFenaux |
16 | OptionAlert | App_sw_ | AmericanBanking |
17 | KimAuclair | DividendSheet | ConsumerFeed |
18 | stt2318 | InvestirFr | SleekMoneycom |
19 | AltruistWealth | iviewmarkets | desota |
20 | MarketCurrents | ChinaInvest | dailypoliticaln |
21 | selfmade_harris | 1MinuteStock | saidjarrah |
22 | jfahmy | ACInvestorBlog | Boursier_com |
23 | daytradingninja | Benzinga | TickerReport |
The resulting network graph of this smaller Twitter stream comes up with 431 nodes and 131 edges.
There is comparatively more interaction between users than in our initial network object, with the density clocking in at 0.0009550531. The diameter is shorter, with just 9 hops across 10 nodes.
The resulting hub and authority scores are more consistent with the ranking tables because the actual number of retweets and mentions was low. This time, the number of unique edges was not significantly lower than the total weight of the edges.
Figures 7 and 8 show the network graphs with the nodes adjusted based on the hub and authority scores. The higher the score, the bigger the node size.


Figure 7. Closeup of network graph with node sizes adjusted based on hub score.


Figure 8. Closeup of network graph with node sizes adjusted based on authority score.
The fractional ranking method is found to be a more realistic measure of a Twitter user's influence. The frequency of interactions between users must be considered in measuring influence, even if it is among the same set of users. Repeated interaction just means that the user is consistent in producing high-quality content that has pass-along value.
For smaller networks, the network graph method may yield additional information that can't be derived from fractional ranking. The key would be to check whether the ratio of the number of edges to the total edge weight is close to 1. The discrepancy between the ranking method and the network graph is expected to be greater when this ratio approaches zero.
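The ratio check suggested above is a one-line calculation once the edges are stored with their weights. A sketch, with an assumed function name and the convention that an empty graph returns 0.0:

```python
def edge_to_weight_ratio(edge_weights):
    """Number of unique edges divided by the total weight of all edges."""
    total_weight = sum(edge_weights.values())
    if total_weight == 0:
        return 0.0
    return len(edge_weights) / total_weight
```

A ratio of 1 means every link was used exactly once, so counting edges and summing weights tell the same story; a ratio near zero means a few links were activated many times.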
References:
Celli, F., Di Lascio, F., Magnani, M., Pacelli, B., Rossi, L. 2009. Social Network Data and Practices: the case of Friendfeed.
Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. 2010. Measuring User Influence in Twitter: The Million Follower Fallacy.
Ognyanova, K. 2016. Network Analysis and Visualization with R and igraph.