Scraping Instagram for hashtags

Posted on December 3, 2017

 

On Instagram, I have an account where I share pictures and/or videos related to my yoga practice.  Lately, I have been thinking about how to acquire more followers.

I was interested in knowing what hashtags my favorite Instagram yoga teachers use. Typically, on any given day there is a trend going on in the yoga Instagram community. Some days you may find yogis posting headstands and other days handstands or balancing poses. Normally I would visit my favorite yogis’ pages to see the trending yoga pose of the day and collect their hashtags from their most recent post. In turn, I would use these hashtags on my own posts in hopes of getting more views and likes. Then I thought to myself, why not automate it?

Technologies Used:

Selenium

Python

RStudio

WordCloud

The Process

First, I had to automate signing in to my account (you cannot see posts without an Instagram handle). Then I navigated to each yogi’s page by locating the search field with its XPath and creating an ActionChains object to type the yogi’s handle and click on it. Afterwards, using another XPath and another ActionChains object, I clicked on the most recent post. However, when I reached the post, the hashtags were not visible on the page, so they could not be scraped right away. The reason is that many teachers put their hashtags in a comment on their own post, under the caption, rather than in the caption itself. They do this because they want people to focus solely on the content of the caption. As a result, the hashtag comment “disappears” once followers start commenting on the post as well. Consequently, the only way to see the hashtag comment is to load all the comments for that post.

 

 

To tackle this problem, I created a while loop that clicks the "load more comments" button until it disappears. Next, I scraped the hashtags from the yogis' comments and saved them to a CSV file to be imported later into RStudio. After removing the ‘#’ symbol, I created a WordCloud to show which hashtags are used the most.
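The click-until-gone loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: the function names, the pause length, the click cap, and the XPath you would pass in are all assumptions. The loop itself only needs a callable that returns the button or None, and `selenium_finder` shows one hypothetical way to build that callable from a Selenium WebDriver.

```python
import time


def click_until_gone(find_button, pause=1.0, max_clicks=200):
    """Click the button returned by find_button() until it no longer appears.

    find_button is any callable that returns a clickable object, or None once
    the button has disappeared. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:  # the cap guards against an endless loop
        button = find_button()
        if button is None:
            break  # button is gone, so all comments should be loaded
        button.click()
        clicks += 1
        time.sleep(pause)  # give the next batch of comments time to load
    return clicks


def selenium_finder(driver, xpath):
    """Build a find_button callable for a Selenium WebDriver.

    The XPath is supplied by the caller; any value used here is hypothetical
    and has to be read off the live page.
    """
    from selenium.webdriver.common.by import By  # imported lazily, only needed here

    def find():
        matches = driver.find_elements(By.XPATH, xpath)  # empty list if absent
        return matches[0] if matches else None

    return find
```

Looking the button up again on every pass, instead of holding on to one element, avoids clicking a reference that has gone stale after the page re-renders.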

Scraping 7 of my favorite yogis resulted in acquiring 148 hashtags.
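For the extraction and CSV step, a small sketch in plain Python might look like this. The function names, the column names, and the lowercasing are assumptions made for illustration. Note that this version strips the ‘#’ while extracting, whereas I did that step later in R.

```python
import csv
import re

# One or more word characters directly after a '#'
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(comment_text):
    """Return the hashtags in a comment, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(comment_text)]


def save_hashtags(rows, path):
    """Write (handle, hashtag) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["handle", "hashtag"])  # hypothetical column names
        writer.writerows(rows)
```

A file written this way can be loaded in R with read.csv.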

 

 

 

Problems

My program works only if the yogi’s hashtag comment is the first comment on a given post. If it is not, I have to change the XPath to match the index of the comment I am looking for. The same approach can also be used to scrape captions. However, if a caption contains emojis, an error comes up.
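One possible way around the fixed-index limitation, which I have not implemented, is to scrape every comment as an (author, text) pair and then pick the post owner's first comment that contains a hashtag. The sketch below assumes the comments have already been collected into such a list; the function name and the data shape are hypothetical.

```python
def find_hashtag_comment(comments, owner):
    """Return the first comment by `owner` that contains a '#', else None.

    `comments` is a list of (author_handle, text) pairs in page order.
    """
    for author, text in comments:
        if author == owner and "#" in text:
            return text
    return None


# Made-up example data
comments = [
    ("fan1", "Beautiful!"),
    ("yogi_a", "thank you"),
    ("yogi_a", "#yoga #handstand"),
]
print(find_hashtag_comment(comments, "yogi_a"))  # prints "#yoga #handstand"
```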

Results

#Yogainspritation, #yoga, and #yogaeverydamnday are the hashtags my favorite yogis used most often in their most recent posts. I plan to use these tags, along with this program, to decide which tags to use on my next 7 posts. I will update this blog with the results when that is complete.
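I ranked the tags visually with a WordCloud in R, but the same ranking can be checked numerically. A minimal Python alternative using collections.Counter is sketched below. The function name is an assumption, and the sample list is made up for illustration, not my scraped data.

```python
from collections import Counter


def top_hashtags(tags, n=3):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    return Counter(tag.lower() for tag in tags).most_common(n)


# Made-up sample, not the 148 scraped hashtags
sample = ["yoga", "yoga", "handstand", "yoga", "handstand", "balance"]
print(top_hashtags(sample))  # [('yoga', 3), ('handstand', 2), ('balance', 1)]
```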

 

Code

https://gist.github.com/JosephMata/7c3bac580234a17102c2d3ba19822cd2

 

 

 


About Author

Joseph Mata
