Scraping Instagram for Hashtags

Posted on Dec 3, 2017

 

On Instagram, I have an account where I share pictures and/or videos related to my yoga practice. Lately, I have been thinking about how to acquire more followers.
I was interested in knowing what hashtags my favorite Instagram yoga teachers use. Typically, on any given day there is a trend going on in the yoga Instagram community. Some days you may find yogis posting headstands and other days handstands or balancing poses. Normally I would visit my favorite yogis’ pages to see the trending yoga pose of the day/week and collect their hashtags from their most recent posts. In turn, I would use these hashtags on my own posts in the hopes of getting more views and likes. Then I thought to myself, why not automate it?

I decided to scrape the last 7 posts (pictures) from dylanwerneryoga (Dylan), seanphelpsyoga (Sean), and kevindhofer (Kevin).

 

 

Technologies Used:

Selenium

Python

R Studio

WordCloud2

 

The Process

First, I had to automate signing in to my account (you cannot see posts without an Instagram handle). Then I navigated to each yogi's page by locating the search field via its XPath and using an ActionChains sequence to type the handle and click on the matching result. Next, using another XPath and another ActionChains sequence, I clicked on the most recent post. However, when I reached the desired page, the hashtags were not visible, so they could not be scraped right away. The reason is that many teachers post their hashtags as a comment under their own post rather than including them in the caption, because they want people to focus solely on the caption's content. As a result, the hashtag comment "disappears" once followers start commenting on the post as well. Consequently, the only way to see the hashtag comment is to load all the comments for that post.

 

 

 

To tackle this problem, I created a while loop that clicks the "load more comments" button until it disappears. Next, I scraped the hashtags from the yogis' comments and saved them to a CSV file to be imported into R Studio later.
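
The loop and the hashtag export described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: `find_button` is a hypothetical callable (with Selenium it could wrap a `find_elements` lookup and return the first match or `None`), and the CSV column names and the `max_clicks` safety cap are assumptions.

```python
import csv
import re

# A hashtag is treated here as "#" followed by word characters (an assumption).
HASHTAG_RE = re.compile(r"#\w+")


def click_until_gone(find_button, max_clicks=500):
    """Click the button returned by find_button() until it returns None.

    find_button is a hypothetical callable supplied by the caller.
    max_clicks is a safety cap so the loop cannot run forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:  # the "load more comments" button is gone
            break
        button.click()
        clicks += 1
    return clicks


def extract_hashtags(comment_text):
    """Return the hashtags in a comment, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(comment_text)]


def save_hashtags(rows, path):
    """Write (handle, post, hashtag) rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["handle", "post", "hashtag"])
        writer.writerows(rows)
```

Keeping the click loop independent of the browser driver makes it easy to try out with a fake button before pointing it at a real page.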

 

Exploratory Data Analysis

Total likes and average likes per post.

 

Likes per day for the 7 most recent posts.

 

Dylan used 76 hashtags, of which 41 were unique. On average, Dylan used 11 hashtags per post. His favorites are #yoga (7), #mensyoga (7), #yogainspiration (7), and #yogachallenge (5).

 

 

Sean used 78 hashtags, of which 49 were unique. On average, Sean used 12 hashtags per post. His favorites are #yogatips (8), #yogahelp (4), #yogafit (4), and #yogabeginners (3). Examining his hashtag choices shows that he is targeting practitioners who are new to yoga. In fact, Sean recently released an online training program.

 

 

 

Kevin used 163 hashtags, of which 84 were unique. On average, Kevin used about 23 hashtags per post (163 hashtags across 7 posts). His favorites are #portugal (11), #yoga (7), #yogainspiration (7), and #instayoga (6). A survey of his tags shows that they focus on yoga postures and on the place or environment of the post itself.

 

 

In total, the 21 photos provided 317 hashtags, of which 156 were unique.
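
The per-account figures in this section (total hashtags, unique hashtags, average per post, and most frequent tags) were produced in R Studio. For readers who prefer to stay in Python, a rough equivalent could look like the sketch below. The function name, the input shape (one list of hashtags per post), and the choice of returning the top four tags are assumptions made for illustration.

```python
from collections import Counter


def summarize_hashtags(posts):
    """Summarize hashtag usage for one account.

    posts is a list of lists: one list of hashtags per post
    (a hypothetical input shape, not the layout of the original CSV).
    """
    all_tags = [tag for post in posts for tag in post]
    counts = Counter(all_tags)
    return {
        "total": len(all_tags),                 # every hashtag, repeats included
        "unique": len(counts),                  # distinct hashtags
        "mean_per_post": len(all_tags) / len(posts) if posts else 0.0,
        "top": counts.most_common(4),           # (hashtag, count) pairs
    }


# Made-up toy data, only to show the call; these are not the scraped results.
example = summarize_hashtags([
    ["#yoga", "#mensyoga"],
    ["#yoga", "#yogachallenge", "#yoga"],
])
```

Because unique tags are counted per call, passing the posts of all three accounts together gives the overall unique count, which can be lower than the sum of the per-account unique counts when accounts share tags.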

 

Problems

My program runs only if the yogi's hashtag comment is the first comment on a given post. If it is not, I have to change the XPath according to the index of the comment I am searching for. The algorithm can also be used to scrape captions. However, if emojis are used in the caption, an error is raised.
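
One possible way to loosen the first-comment restriction is to select comments by author instead of by position. The sketch below assumes that each scraped comment can be paired with the username of its author; that data shape and the function name are hypothetical and are not part of the current program, which picks the comment by its XPath index.

```python
def hashtag_comments_by(author, comments):
    """Return the comment texts written by `author` that contain a hashtag.

    comments is a list of (username, text) pairs in page order
    (an assumed data shape, for illustration only).
    """
    return [
        text
        for user, text in comments
        if user == author and "#" in text
    ]
```

With this approach, the position of the hashtag comment no longer matters, as long as all comments have been loaded first.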

Results

#yogainspiration, #yoga, and #mensyoga are the hashtags my favorite yogis used most often in their most recent posts. I plan to use these tags and this algorithm to decide which tags to use on my next 21 posts. I will update this blog with the results when that is complete.

 

Code

https://gist.github.com/JosephMata/7c3bac580234a17102c2d3ba19822cd2

 

 

 
