Scraping Instagram for Hashtags

Posted on Dec 3, 2017

On Instagram, I have an account where I share pictures and videos related to my yoga practice. Lately, I have been thinking about how to gain more followers.
I was interested in knowing which hashtags my favorite Instagram yoga teachers use. On any given day, there is usually a trend going on in the yoga Instagram community. Some days you may find yogis posting headstands, and other days handstands or balancing poses. Normally, I would visit my favorite yogis' pages to see the trending yoga pose of the day or week and collect the hashtags from their most recent posts. I would then use these hashtags on my own posts in the hope of getting more views and likes. Then I thought to myself, why not automate it?

I decided to scrape the last 7 posts (pictures) from each of three accounts: dylanwerneryoga (Dylan), seanphelpsyoga (Sean), and kevindhofer (Kevin).

Technologies Used:



R Studio


The Process

First, I had to automate signing into my account (you cannot see posts without having an Instagram handle). Then I navigated to each yogi's page by using the XPath of the search field and creating an ActionChain to type and click on the handle I wanted. Afterward, using another XPath and another ActionChain, I was able to click on the most recent post. However, when I reached the desired page, the hashtags were not visible on the page and so could not be scraped right away. The reason is that many teachers put their hashtags in a comment on their own post, under the caption, rather than in the caption itself. They do this because they want people to focus solely on the content of the caption. As a result, the hashtag comment "disappears" once followers start commenting on the post as well. Consequently, the only way to see the hashtag comment is to load all the comments for that post.

To tackle this problem, I created a while loop that clicks the "load more comments" button until it disappears. Next, I scraped the hashtags from the yogis' comments and saved them into a CSV file to be imported later into R Studio.
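The loop and the hashtag extraction described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the XPath in `LOAD_MORE_XPATH`, the function names, and the CSV column names are all assumptions, and Instagram's markup changes often enough that the XPath should be treated as a placeholder. The `driver` argument is expected to behave like a Selenium WebDriver.

```python
import csv
import re
import time

# Hypothetical XPath for the "load more comments" button (an assumption,
# not taken from the original script).
LOAD_MORE_XPATH = "//button[contains(., 'more comments')]"

# A hashtag is a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def load_all_comments(driver, max_clicks=200, pause=1.0):
    """Click 'load more comments' until the button is no longer found.

    The string "xpath" is the value of Selenium's By.XPATH constant.
    max_clicks is a safety cap so the loop cannot run forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements("xpath", LOAD_MORE_XPATH)
        if not buttons:
            break  # the button disappeared: all comments are loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch
    return clicks


def extract_hashtags(text):
    """Return every hashtag in a comment, lower-cased, in order."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def save_hashtags(rows, path):
    """Write (yogi, post, hashtag) rows to a CSV file for later analysis."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["yogi", "post", "hashtag"])
        writer.writerows(rows)
```

Using `find_elements` (plural) returns an empty list when the button is gone, which gives the loop a clean exit condition without catching an exception.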

Exploratory Data Analysis

Total likes and average likes per post.

Likes per day for the 7 most recent posts.

Dylan used 76 hashtags, of which 41 were unique. On average, Dylan used 11 hashtags per post. His favorites are #yoga (7), #mensyoga (7), #yogainspiration (7), and #yogachallenge (5).

Sean used 78 hashtags, of which 49 were unique. On average, Sean used about 11 hashtags per post. His favorites are #yogatips (8), #yogahelp (4), #yogafit (4), and #yogabeginners (3). Examining his hashtag choices shows that he is targeting practitioners who are new to yoga. In fact, Sean has recently released an online training program.

Kevin used 163 hashtags, of which 84 were unique. On average, Kevin used about 23 hashtags per post (163 hashtags over 7 posts). His favorites are #portugal (11), #yoga (7), #yogainspiration (7), and #instayoga (6). Judging by his tags, Kevin focuses on yoga postures and on the place or environment of the post itself.

In total, the 21 photos provided 317 hashtags, of which 156 were unique.
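The totals above are consistent: 76 + 78 + 163 = 317. The overall unique count (156) is smaller than the sum of the per-yogi unique counts (41 + 49 + 84 = 174) because tags shared between yogis, such as #yoga, are counted only once. The analysis itself was done in R Studio; the sketch below shows the same kind of summary in Python. The function name `summarize` and the sample data are made up for illustration.

```python
from collections import Counter


def summarize(tags_per_post, top_n=4):
    """Summarize hashtag usage.

    tags_per_post: a list with one inner list of hashtags per post.
    Returns the total count, the unique count, the mean number of
    hashtags per post, and the top_n most frequent hashtags.
    """
    all_tags = [tag.lower() for post in tags_per_post for tag in post]
    counts = Counter(all_tags)
    mean = len(all_tags) / len(tags_per_post) if tags_per_post else 0.0
    return {
        "total": len(all_tags),
        "unique": len(counts),
        "mean_per_post": mean,
        "top": counts.most_common(top_n),
    }


# Toy data, not the scraped data set.
sample_posts = [
    ["#yoga", "#mensyoga"],
    ["#yoga", "#yogachallenge", "#Yoga"],
]
summary = summarize(sample_posts)
print(summary["total"], summary["unique"], summary["mean_per_post"])
```

Lower-casing before counting means #Yoga and #yoga are treated as the same tag, which matches how Instagram itself treats them.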


My program works only if the yogi's hashtag comment is the first comment on a given post. If it is not, I have to change the XPath according to the index of the comment I am looking for. The same algorithm can also be used to scrape captions. However, if emojis are used in the caption, an error occurs.
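One possible way around both limitations is sketched below. Instead of relying on the comment's position, it picks the first comment written by the account owner that contains several hashtags. For the emoji problem, it drops characters outside the Basic Multilingual Plane before processing. This assumes the error comes from such characters, which the original script may or may not confirm. The function names, the `min_tags` threshold, and the `(author, text)` input format are all hypothetical.

```python
import re

HASHTAG_RE = re.compile(r"#\w+")


def find_hashtag_comment(comments, owner, min_tags=3):
    """Return the first comment by `owner` with at least `min_tags` hashtags.

    comments: list of (author, text) tuples in page order.
    Returns None when no such comment exists.
    """
    for author, text in comments:
        if author == owner and len(HASHTAG_RE.findall(text)) >= min_tags:
            return text
    return None


def strip_non_bmp(text):
    """Drop characters above U+FFFF, which covers most emojis."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


# Made-up example comments.
comments = [
    ("fan1", "Amazing! #wow"),
    ("dylanwerneryoga", "Thanks!"),
    ("dylanwerneryoga", "#yoga #mensyoga #yogainspiration"),
]
print(find_hashtag_comment(comments, "dylanwerneryoga"))
```

Selecting by author rather than by index means the XPath would not need to change when followers comment before the yogi does.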


#yogainspiration, #yoga, and #mensyoga are the hashtags my favorite yogis used most widely in their most recent posts. I plan to use these tags and this algorithm to decide which tags to use on my next 21 posts. I will update this blog with the results when that is complete.

