Scraping Instagram for Hashtags
On Instagram, I have an account where I share pictures and/or videos related to my yoga practice. Lately, I have been thinking about how to acquire more followers.
I was interested in knowing what hashtags my favorite Instagram yoga teachers use. Typically, on any given day there is a trend going on in the yoga Instagram community. Some days you may find yogis posting headstands and other days handstands or balancing poses. Normally I would visit my favorite yogis’ pages to see the trending yoga pose of the day/week and collect their hashtags from their most recent posts. In turn, I would use these hashtags on my own posts in the hopes of getting more views and likes. Then I thought to myself, why not automate it?
I decided to scrape the 7 most recent posts (pictures) from dylanwerneryoga (Dylan), seanphelpsyoga (Sean), and kevindhofer (Kevin).
First, I had to automate signing into my account (you cannot see posts without an Instagram handle). Then I navigated to each yogi's page by locating the search field with its XPath and creating an ActionChain to type in and click on the yogi's handle. Afterwards, using another XPath and ActionChain, I clicked on the most recent post. However, once I reached that page, the hashtags were not available to scrape because they were not visible on the page. The reason is that many teachers post their hashtags as a comment on their own post, under the caption, rather than including them in the caption itself. They do this so that people focus solely on the content of the caption. As a result, the hashtag comment “disappears” once followers start commenting on the post as well. Consequently, the only way to see the hashtag comment is to load all the comments for that post.
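The sign-in and navigation steps can be sketched roughly as follows. This is a minimal sketch, not the original program: it assumes Selenium 4, where `find_element` accepts the locator strategy string `"xpath"` (the value of `By.XPATH`). The helper names `sign_in` and `open_latest_post` and every XPath passed to them are hypothetical placeholders, since Instagram's markup changes often. For brevity it calls `click()` and `send_keys()` on elements directly instead of building `ActionChains`, which the original program used.

```python
import time


def sign_in(driver, username, password, user_xpath, pass_xpath, submit_xpath):
    """Fill in and submit the login form (all three XPaths are hypothetical)."""
    driver.find_element("xpath", user_xpath).send_keys(username)
    driver.find_element("xpath", pass_xpath).send_keys(password)
    driver.find_element("xpath", submit_xpath).click()


def open_latest_post(driver, handle, search_xpath, result_xpath,
                     first_post_xpath, pause=2.0):
    """Search for `handle`, open that profile, then open its most recent post.

    `result_xpath` is a hypothetical template containing "{handle}", e.g.
    "//a[contains(@href, '/{handle}/')]", filled in with the yogi's handle.
    """
    search_box = driver.find_element("xpath", search_xpath)
    search_box.click()
    search_box.send_keys(handle)
    time.sleep(pause)  # crude wait for the search results to render
    driver.find_element("xpath", result_xpath.format(handle=handle)).click()
    time.sleep(pause)  # crude wait for the profile grid to load
    driver.find_element("xpath", first_post_xpath).click()
```

With a real browser, `driver` would come from something like `webdriver.Chrome()` followed by `driver.get("https://www.instagram.com/")`. Selenium's `WebDriverWait` with expected conditions is a sturdier alternative to the fixed `time.sleep` pauses used here.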
To tackle this problem, I created a while loop that clicks the "load more comments" button until it disappears. Next, I scraped the hashtags from the yogis' comments and saved them to a CSV file to be imported later into RStudio.
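The loop, the hashtag extraction, and the CSV export might look something like this. It is a sketch under a few assumptions: the button lookup is passed in as a function so the loop does not depend on a particular XPath, `max_clicks` is an added safety limit that was not part of the original description, hashtags are taken to be `#` followed by word characters, and they are lower-cased so that, for example, `#Yoga` and `#yoga` count as the same tag. The function names and CSV column names are hypothetical.

```python
import csv
import re

HASHTAG_RE = re.compile(r"#\w+")  # '#' followed by letters, digits or underscores


def load_all_comments(find_load_more, max_clicks=200):
    """Click "load more comments" until the button is gone.

    `find_load_more` returns the button element, or None once it has disappeared.
    `max_clicks` guards against an endless loop if the page never settles.
    Returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_load_more()
        if button is None:
            break
        button.click()
        clicks += 1
    return clicks


def extract_hashtags(comment_text):
    """Return the hashtags in a comment, lower-cased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(comment_text)]


def save_hashtags(rows, path):
    """Write (yogi, post, hashtag) rows to a CSV file for later analysis in R."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["yogi", "post", "hashtag"])
        writer.writerows(rows)
```

With Selenium, `find_load_more` could be `lambda: next(iter(driver.find_elements("xpath", load_more_xpath)), None)`, where `load_more_xpath` is a placeholder; `find_elements` returns an empty list rather than raising when nothing matches, which is what ends the loop.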
Exploratory Data Analysis
Total likes and average likes per post.
Likes per day for the 7 most recent posts.
Dylan used 76 hashtags, 41 of which were unique. On average, Dylan used 11 hashtags per post. His favorites were #yoga (7), #mensyoga (7), #yogainspiration (7), and #yogachallenge (5).
Sean used 78 hashtags, 49 of which were unique. On average, Sean used 11 hashtags per post. His favorites were #yogatips (8), #yogahelp (4), #yogafit (4), and #yogabeginners (3). His hashtag choices show that he is targeting practitioners who are new to yoga. In fact, Sean recently released an online training program.
Kevin used 163 hashtags, 84 of which were unique. On average, Kevin used 23 hashtags per post. His favorites were #portugal (11), #yoga (7), #yogainspiration (7), and #instayoga (6). His tags focus on yoga postures and on the place or environment of the post itself.
In total, the 21 photos yielded 317 hashtags, 156 of which were unique.
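The averages above follow from dividing each yogi's total by 7 posts: 76 / 7 ≈ 10.9 for Dylan, 78 / 7 ≈ 11.1 for Sean, and 163 / 7 ≈ 23.3 for Kevin, or 317 / 21 ≈ 15.1 hashtags per photo overall. The per-yogi unique counts add up to 41 + 49 + 84 = 174, more than the 156 unique tags overall, because a tag used by more than one yogi (such as #yoga) is counted only once in the combined figure. The analysis itself was done in R; the following is a Python equivalent sketch, not the original code, and the function names and example tags are hypothetical.

```python
from collections import Counter


def hashtag_summary(posts):
    """Summarise one yogi's hashtags.

    `posts` is a list with one list of hashtags per post. Returns the total
    number of tags, the number of distinct tags, the mean tags per post, and
    a Counter of tag frequencies (use .most_common(n) for the favorites).
    """
    counts = Counter(tag for post in posts for tag in post)
    total = sum(counts.values())
    return {
        "total": total,
        "unique": len(counts),
        "per_post": total / len(posts) if posts else 0.0,
        "counts": counts,
    }


def overall_unique(summaries):
    """Distinct hashtags across several yogis; tags they share count once."""
    combined = Counter()
    for summary in summaries:
        combined.update(summary["counts"])
    return len(combined)
```

Calling `hashtag_summary` once per yogi and `overall_unique` on the three results reproduces the kind of per-yogi and combined figures reported above.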
My program works only if the yogi's hashtag comment is the first comment on a given post. If it is not, I have to change the XPath to match the index of the comment I am searching for. The algorithm can also be used to scrape captions; however, if the caption contains emojis, an error occurs.
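Two small helpers illustrate ways around the first-comment limitation. The first builds an XPath that picks the n-th match of a comment expression; wrapping the expression in parentheses makes the position apply to the whole result set in document order, rather than to each parent's children as `//li[3]` would. The second avoids hard-coding a position at all: it assumes the loaded comments have already been read as (author, text) pairs and returns the first one written by the yogi that contains a hashtag. Both function names and the example XPath are hypothetical.

```python
def nth_comment_xpath(comment_xpath, index):
    """Select the index-th (1-based) node matched by `comment_xpath`.

    The parentheses apply the position to the whole node-set, so "(//ul/li)[3]"
    is the third matching <li> on the page, not every third-child <li>.
    """
    if index < 1:
        raise ValueError("XPath positions start at 1")
    return f"({comment_xpath})[{index}]"


def find_owner_hashtag_comment(comments, owner):
    """Return the first comment by `owner` that contains a hashtag, else None.

    `comments` is a list of (author, text) pairs scraped from the post.
    """
    for author, text in comments:
        if author == owner and "#" in text:
            return text
    return None
```

The source does not say where the emoji error comes from. If it happens when captions are written to a file, opening that file with an explicit `encoding="utf-8"` is one thing to try.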
#yogainspiration, #yoga, and #mensyoga are the most widely used hashtags in my favorite yogis' most recent posts. I plan to use these tags, along with this program, to decide which tags to use on my next 21 posts. I will update this blog with the results once that is complete.