Finding Influencers on Twitter
Have you been followed on Twitter or Instagram by someone you don't know? I get this a lot, and to avoid being thought of as rude, I follow back. Eventually, I got tired of following back when I realized that some of these accounts don't really do anything but collect followers. Now, why would anyone go through all the trouble of following people in the hopes of being followed back? Why would anyone waste so much time on the internet for this?
I eventually realized the answer when I saw that most of these accounts were not personal. A lot of the accounts I encountered were about food, some were about beach vacations, and on occasion there were accounts with risqué content.
Advertising has infiltrated the social network. It used to be just banner ads, but now companies hire personalities on social media to spread the word about their product or event. Companies spend big bucks on celebrities in an effort to publicize their brand and attract a celebrity's fan base. A sponsored tweet could net as much as $13,000, as was the case for Khloe Kardashian in 2013.
Celebrities have multitudes of followers and get paid big bucks by sponsors. So people may have thought that creating accounts and amassing followers would eventually get them sponsorship deals with advertisers. In this exercise, we see that sponsors might be looking for things other than the number of followers.
In a social network, a link could represent a relationship, as in Facebook, or the passing of a tweet, as in Twitter. These links determine the flow of information and are therefore a good indicator of a user's influence. I will be presenting two methods of finding potential influencers in a network. One is by extracting a user's influence measures and the other is by using network graphs.
A large database was found on Followthehashtag.com. The database contained a stream of tweets related to NASDAQ 100 stocks extracted from Twitter over 79 days, from March 28 to June 15, 2016. This was selected because of its good mix of accounts representing organizations and personalities. The database also contained information about how many times a tweet was passed along and who the original tweet came from. This act, more popularly known as retweeting, can be identified in the stream as tweets having 'RT @user' or 'via @user' at the beginning of the tweet. The stream also contained information about mentions. In Twitter, a mention is a public conversation between users. A user calls the attention of another user by mentioning them in a tweet. Mentions are identified as tweets beginning with '@user'.
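As a rough sketch of how the conventions above could be detected, the snippet below classifies a tweet by its opening text. The regular expressions and the `classify` helper are assumptions made for illustration, not the ones used in the original analysis.

```python
import re

# Assumed patterns based on the conventions described in the text:
# retweets start with "RT @user" or "via @user", mentions start with "@user".
RETWEET = re.compile(r"^\s*(?:RT|via)\s+@(\w+)", re.IGNORECASE)
MENTION = re.compile(r"^\s*@(\w+)")

def classify(text):
    """Return (kind, target_user) for the text of a tweet."""
    match = RETWEET.match(text)
    if match:
        return "retweet", match.group(1)
    match = MENTION.match(text)
    if match:
        return "mention", match.group(1)
    return "original", None

print(classify("RT @WSJ: Apple shares slide after earnings"))
print(classify("@jimcramer what do you think of $AAPL?"))
print(classify("$AAPL looks cheap here"))
```

Checking the retweet pattern first matters only for clarity here, since a tweet cannot begin with both forms at once.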
The influence measures extracted from the stream were the following: indegree, retweets, and mentions. These measures were selected because of how they affect the flow of information in the network. Indegree measures the user's popularity. This was easily extracted from the database as the number of followers a user has. The number of followers shows us the size of the user's audience base. Retweet influence represents a user's ability to create content which other users find worthy of sharing. When a tweet is shared by another user, a bigger network of users is exposed to the tweet. From the stream, this was extracted by counting the number of retweeted messages for each user. The third measure, mention influence, was extracted by counting the number of mentions containing the user's name. This influence measure indicates the ability of the user to engage others in a conversation. It represents the top-of-mind value of the user's name.
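The three measures can be tallied in a single pass over the stream. The sketch below assumes each record has already been classified as a retweet, a mention, or an original tweet. The record layout and the user names are hypothetical.

```python
from collections import Counter

# Hypothetical, pre-classified stream records:
# (author, author_follower_count, kind, target_user)
stream = [
    ("alice", 120, "original", None),
    ("bob",   300, "retweet",  "alice"),
    ("carol",  50, "retweet",  "alice"),
    ("bob",   300, "mention",  "carol"),
    ("alice", 120, "mention",  "carol"),
    ("carol",  50, "mention",  "bob"),
]

indegree = {}          # follower count per user
retweets = Counter()   # times a user's tweets were retweeted
mentions = Counter()   # times a user was mentioned by others

for author, followers, kind, target in stream:
    indegree[author] = followers
    if kind == "retweet":
        retweets[target] += 1
    elif kind == "mention":
        mentions[target] += 1

print(indegree, dict(retweets), dict(mentions))
```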
A total of 96,613 users tweeted about NASDAQ 100 stocks during the timeframe. Between them, over 680 thousand tweets were broadcast. A word cloud of the NASDAQ symbols most often mentioned shows that Apple, represented by AAPL, was the most tweeted stock among the group.
Users were most active on April 27, when they broadcast over 20,800 tweets. This coincides with the day AAPL stock slumped following speculation that iPhone sales may decline by as much as 60 million units compared to the same quarter a year ago. The slump in Apple shares dragged the tech-heavy NASDAQ into the red by the day's end.
Activity on this day took place mostly during market trading hours, 13:30 to 20:30 UTC.
Each user's ranking over the three influence categories was assigned using fractional ranking. For example, in assigning the indegree ranking, a rank of 1 was given to the user with the most followers. Users with the same number of followers receive the same rank, which is the mean of the ranks they would have under ordinal ranking. Table 1 shows the top 30 users across the three influence measures. Notice that there is minimal overlap across the influence ranks. The first user to show up across all three measures of influence was "WSJ".
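To make the tie handling concrete, here is a minimal sketch of fractional ranking. The `fractional_rank` helper and the follower counts are made up for illustration.

```python
def fractional_rank(scores):
    """Rank users by score, highest first; tied users share the mean ordinal rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every user tied with the user at position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of ordinal ranks i+1 .. j+1
        for user, _ in ordered[i:j + 1]:
            ranks[user] = mean_rank
        i = j + 1
    return ranks

followers = {"a": 900, "b": 500, "c": 500, "d": 100}
print(fractional_rank(followers))  # {'a': 1.0, 'b': 2.5, 'c': 2.5, 'd': 4.0}
```

Users "b" and "c" would hold ordinal ranks 2 and 3, so both receive 2.5, and "d" keeps rank 4.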
Table 1. Top influentials based on indegree, retweets, and mentions
Rank | TopIndegree | TopRT | TopMentions |
1 | cnnbrk | philstockworld | jimcramer |
2 | nytimes | StocksHighAlert | CNBC |
3 | CNN | ValaAfshar | AlertTrade |
4 | Reuters | YahooFinance | WSJ |
5 | WSJ | BK_Stocks | Benzinga |
6 | Forbes | businessinsider | YahooFinance |
7 | AP | StockTwits | CNBCFastMoney |
8 | DRJAMESCABOT | timothysykes | TheStreet |
9 | GMA | CNNMoney | autumnalcity87 |
10 | MarketWatch | devonshiretech | carlquintanilla |
11 | JohnLegere | CNBCnow | HalftimeReport |
12 | USATODAY | ppprophet | markbspiegel |
13 | CNBC | carlquintanilla | RiskReversal |
14 | ForbesTech | Stockology101 | petenajarian |
15 | FortuneMagazine | Benzinga | TTtradertwit |
16 | timoreilly | OpenOutcrier | GerberKawasaki |
17 | rsAnakin_FBGx20 | marketexclusive | barronsonline |
18 | philstockworld | DayTradersGroup | ReformedBroker |
19 | dickc | theflynews | Reuters |
20 | ReutersBiz | TakeFlightSales | GuyAdami |
21 | businessinsider | WrigleyTom | StockTwits |
22 | om | TweakTown | RedDogT3 |
23 | Yahoo | markbspiegel | jonnajarian |
24 | SAI | SAI | cek_cpa |
25 | globeandmail | WSJ | JustinPulitzer |
26 | Variety | CenterTrading | ryanwallace198 |
27 | VH1 | CBOE | DougKass |
28 | CNNMoney | davidmoadel | technology |
29 | WSJbusiness | GerberKawasaki | BossHoggHazzard |
30 | CNET | options_answers | SquawkStreet |
To see how much users overlap across the three categories, a Venn diagram of the top 100 users in each category was derived. Figure 4 shows that among the 239 users in the combined top lists, only 10 appear across all three measures of influence.
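The counts behind such a Venn diagram come down to set operations. The sketch below applies them to just the top 5 users of each column in Table 1, so its numbers differ from the top-100 figures quoted above. The `overlap_summary` helper is a name chosen for illustration.

```python
def overlap_summary(top_a, top_b, top_c):
    """Count distinct users across three top lists and users present in all three."""
    a, b, c = set(top_a), set(top_b), set(top_c)
    return {"unique_users": len(a | b | c), "in_all_three": len(a & b & c)}

# Top 5 of each column in Table 1
top_indegree = ["cnnbrk", "nytimes", "CNN", "Reuters", "WSJ"]
top_rt = ["philstockworld", "StocksHighAlert", "ValaAfshar", "YahooFinance", "BK_Stocks"]
top_mentions = ["jimcramer", "CNBC", "AlertTrade", "WSJ", "Benzinga"]

print(overlap_summary(top_indegree, top_rt, top_mentions))
# {'unique_users': 14, 'in_all_three': 0}
```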
Figure 5 below shows a correlation matrix which represents how a user's rank varies across the three measures of influence. Each entry represents the strength of the association between a pair of rankings. This matrix was derived by comparing the relative influence ranks of all 96,613 users in the database.
Users show a strong correlation between their retweet influence and mention influence. The low correlation between the indegree measure and the other two measures shows that indegree ranking may not be related to the other rankings.
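The post does not name the correlation coefficient used. Spearman's rank correlation, which is the Pearson correlation computed on ranks, is one common choice for comparing rankings and is assumed in the sketch below. The rank vectors are invented to mimic the pattern described.

```python
from math import sqrt

def rank_correlation(rank_x, rank_y):
    """Pearson correlation of two rank vectors (Spearman's rho when inputs are ranks)."""
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(rank_x, rank_y))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in rank_x))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in rank_y))
    return cov / (spread_x * spread_y)

# Hypothetical ranks of five users under each measure
indegree = [1, 2, 3, 4, 5]
retweet = [4, 1, 5, 2, 3]
mention = [5, 1, 4, 2, 3]

for name, (x, y) in {
    "indegree vs retweet": (indegree, retweet),
    "indegree vs mention": (indegree, mention),
    "retweet vs mention": (retweet, mention),
}.items():
    print(name, round(rank_correlation(x, y), 2))
```

In this toy data the retweet and mention ranks move together, while the indegree ranks are nearly unrelated to either.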
A couple of conclusions can be derived from the correlation plot. First, we can say that in most cases, users who are retweeted often are also mentioned often, and vice versa. Second, the most followed user may not be the most engaging user in the group. A user's popularity, therefore, is a weak representation of the ability to motivate the spread of information.
Retweets and mentions have direction. A retweet is the path of an idea from User A to User B. User A broadcast a tweet which was read by User B. User B thought it was worth sharing and retweeted it. This retweet will eventually be seen by users not directly accessible to User A. When User A mentions User B, this is again a link from User A to User B. With this in mind, we have enough data to convert our Twitter stream into a directed network graph. Each user will be a node in our graph and each directed link will be an edge. The igraph library will be used to extract information from the resulting network graph.
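A minimal sketch of that conversion, using plain Python rather than igraph, keeps one weighted edge per ordered pair of users. The helper names and the sample interactions are hypothetical.

```python
from collections import Counter

# (source, target) -> number of times the link was used
edge_weights = Counter()

def add_retweet(original_author, retweeter):
    """A retweet carries an idea from the original author to the retweeter."""
    edge_weights[(original_author, retweeter)] += 1

def add_mention(mentioner, mentioned):
    """A mention links the user who writes the tweet to the user named in it."""
    edge_weights[(mentioner, mentioned)] += 1

add_retweet("A", "B")
add_retweet("A", "B")   # the same link used again raises its weight
add_mention("A", "C")
add_mention("B", "C")

nodes = {user for edge in edge_weights for user in edge}
print(len(nodes), "nodes,", len(edge_weights), "unique edges")
```

Storing a weight per edge keeps the number of unique links separate from the number of times each link was used, a distinction that matters when ranking users later.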
A quick look at the resulting network graph for the whole stream shows that we were able to create a graph with 96,613 nodes and 168,519 edges. Because of this size, the resulting network graph will not be shown: plotting it would take considerable time and computational effort, and it would most likely be a crowded mess of dots and lines anyway. However, we can still extract some information from the graph object.
The density of a network object is the proportion of present edges out of all possible edges in the network. Our present graph has a density of 2.799118e-05. Such a low density means that there is very little interaction between our users.
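For a directed graph without self-loops, the usual definition divides the edge count by n(n-1), the number of ordered pairs of distinct nodes. The sketch below assumes that definition. As a check, 177 edges over 431 nodes reproduces the 0.0009550531 reported later for the smaller network, which suggests (the post does not say so) that repeated interactions were counted as separate edges when computing density.

```python
def density(node_count, edge_count):
    """Directed-graph density: edges present over ordered pairs of distinct nodes."""
    possible = node_count * (node_count - 1)
    return edge_count / possible if possible else 0.0

print(density(4, 3))      # 3 of 12 possible directed edges -> 0.25
print(density(431, 177))  # approximately 0.0009550531
```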
The diameter of a network graph is the length of the longest shortest path between any two connected nodes. Considering the direction of the links, the diameter of our network is 14. This means that the two most distant connected users are joined by an unbroken path of 14 links across 15 users.
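One way to compute this on a small directed graph is a breadth-first search from every node, keeping the largest shortest-path length found among reachable pairs. This is a plain-Python sketch on a made-up graph, not the igraph routine used in the post.

```python
from collections import deque

def diameter(adjacency):
    """Longest shortest directed path, in hops, among pairs that can reach each other."""
    best = 0
    for start in adjacency:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        best = max(best, max(distance.values()))
    return best

# Hypothetical graph: E -> A -> B -> C -> D
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
print(diameter(graph))  # 4 hops across 5 users
```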
The hubs and authorities algorithm was developed by Jon Kleinberg to examine the relevance of a web page's content. He categorized pages into hubs and authorities. Hubs, which have more outgoing links, are the internet's catalog. This is similar to the early days of Yahoo, when it touted itself as the internet's yellow pages. Authority pages have more incoming links, presumably because of their high-quality content. Translated to Twitter activity, a hub would fit the description of a user with high retweet influence, and an authority would be similar to a Twitter user who has high mention influence.
The hub and authority scores of the network graph were derived using a simple igraph function call. The top hub score went to "markbspiegel" while the top authority score went to "Benzinga". This is in contrast to the ranking tables, where the top retweet and mention ranks belong to "philstockworld" and "jimcramer" respectively.
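To show what such a call computes, here is a small unweighted sketch of the hubs and authorities iteration in plain Python. It is not igraph's implementation, and it normalises scores to unit length, so only the ordering of users should be compared with igraph's output. The edge list is hypothetical.

```python
from math import sqrt

def hits(edges, iterations=50):
    """Iteratively compute hub and authority scores from unique directed edges."""
    nodes = {user for edge in edges for user in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of users linking in.
        auth = {n: 0.0 for n in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of users linked to.
        hub = {n: 0.0 for n in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}
hub, auth = hits(edges)
print("top hub:", max(hub, key=hub.get))
print("top authority:", max(auth, key=auth.get))
```

Because the edges are a set, a link used ten times counts the same as a link used once.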
To find out where the discrepancy came from, each node was investigated. Although "markbspiegel" had more unique edges than "philstockworld", if we sum the weights of each unique edge, philstockworld still beats markbspiegel. The same is observed when looking at the edges of "Benzinga" and "jimcramer". The discrepancy is consistent with how web pages are rated, wherein the number of links matters more than the number of times each link was activated. The hub and authority scores also do not take the edge weights into account.
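The difference between the two views can be seen by counting, for each user, the unique outgoing edges and the total weight of those edges. The users and weights below are invented to reproduce the pattern described, not taken from the stream.

```python
from collections import Counter

# Hypothetical weighted edges: (source, target) -> number of interactions
weights = {
    ("u1", "a"): 1, ("u1", "b"): 1, ("u1", "c"): 1,  # 3 unique edges, total weight 3
    ("u2", "a"): 5, ("u2", "b"): 4,                  # 2 unique edges, total weight 9
}

out_degree = Counter()    # number of unique outgoing edges per user
out_strength = Counter()  # total weight of outgoing edges per user
for (source, _target), weight in weights.items():
    out_degree[source] += 1
    out_strength[source] += weight

print(out_degree.most_common(1))    # u1 leads on unique edges
print(out_strength.most_common(1))  # u2 leads on total weight
```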
To see an actual network graph, we narrow down our selection to a Twitter stream of users tweeting about CA Technologies.
Table 2 shows us the resulting top influentials derived from our ranking method. The first user to cross the three influence categories is "Benzinga".
Table 2. Top influentials of the CA stream.
Rank | TopIndegree | TopRT | TopMentions |
1 | CBOE | WrigleyTom | sam_miller00 |
2 | InvestorIdeas | ppprophet | diggingplatinum |
3 | 247WallSt | TradeZer0 | AlertTrade |
4 | androsForm | LMTentarelli | LMTentarelli |
5 | jjjinvesting | eWhispers | AdaptToReality |
6 | AlertTrade | PersonsPlanet | Opinterest |
7 | DirectorsTalk | Boursier_com | eWhispers |
8 | PENNYBUSTER1 | bored2tears | Le_Revenu |
9 | scottrade | pnoytrader | TransitoOK |
10 | MorningstarInc | crosshairtrader | DozenStocks |
11 | Quaikey | UTradePH | Benzinga |
12 | PersonsPlanet | OpenOutcrier | leahanneta |
13 | Benzinga | quack1612 | jascapital1 |
14 | airtransat | MorningstarInc | Jascapitalforex |
15 | traderstewie | SleekMoneycom | XFenaux |
16 | OptionAlert | App_sw_ | AmericanBanking |
17 | KimAuclair | DividendSheet | ConsumerFeed |
18 | stt2318 | InvestirFr | SleekMoneycom |
19 | AltruistWealth | iviewmarkets | desota |
20 | MarketCurrents | ChinaInvest | dailypoliticaln |
21 | selfmade_harris | 1MinuteStock | saidjarrah |
22 | jfahmy | ACInvestorBlog | Boursier_com |
23 | daytradingninja | Benzinga | TickerReport |
The resulting network graph of this smaller Twitter stream comes up with 431 nodes and 131 edges.
There is more interaction between users than in our initial network object, with the density clocking in at 0.0009550531. The diameter is shorter, with just 9 hops across 10 nodes.
The resulting hub and authority scores are more consistent with the ranking tables because the actual number of retweets and mentions was low. This time, the number of unique edges was not significantly lower than the total weight of the edges.
Figures 7 and 8 show the network graphs with the nodes sized according to the hub and authority scores. The higher the score, the bigger the node.
The fractional ranking method is found to be a more realistic measure of a Twitter user's influence. The frequency of interactions between users must be considered in measuring influence, even if the interactions occur within the same regular audience. This just means that the user is consistent in producing high-quality content that has pass-along value.
For smaller networks, the network graph method may yield additional information that can't be derived from fractional ranking. The key would be to check whether the ratio of the number of edges to the total edge weight is close to 1. The discrepancy between the ranking method and the network graph is expected to be greater as this ratio approaches zero.
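That check is a one-line computation once the edges carry weights. The sketch below uses two invented edge sets, one with few repeated interactions and one with many. The `edge_weight_ratio` helper is a name chosen for illustration.

```python
def edge_weight_ratio(weights):
    """Unique edges divided by total edge weight; 1.0 means no link was used twice."""
    total = sum(weights.values())
    return len(weights) / total if total else 0.0

few_repeats = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 2}
many_repeats = {("a", "b"): 10, ("b", "c"): 10}

print(edge_weight_ratio(few_repeats))   # 0.75
print(edge_weight_ratio(many_repeats))  # 0.1
```

A ratio near 1 suggests the unweighted graph scores and the count-based rankings should broadly agree, while a ratio near zero signals that repeated interactions dominate and the two methods may diverge.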
References:
Celli, F., Di Lascio, F., Magnani, M., Pacelli, B., Rossi, L. 2009. Social Network Data and Practices: the case of Friendfeed.
Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. 2010. Measuring User Influence in Twitter: The Million Follower Fallacy.
Ognyanova, K. 2016. Network Analysis and Visualization with R and igraph.