Yoga Retreats Worldwide: Scraping Data From the Web
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Background
This project aims to scrape data from a randomly selected website and conduct a basic analysis of it from a business point of view.
- Chosen website is: https://www.bookyogaretreats.com/
- Theme: yoga
- Retreats
- Vacation
- Worldwide
- 7447 retreats
Question:
Provide some insights for someone who is planning to set up a yoga retreat.
Website Structure

12 listings/page
621 pages

Scraping Tool: Scrapy
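The page count follows from the listing count: 7447 retreats at 12 listings per page needs 621 pages. The sketch below checks that arithmetic and builds a list of page URLs that a Scrapy spider could use as its start URLs. The `page` query parameter and the `build_page_urls` helper are assumptions made for illustration; the site's real pagination scheme may differ.

```python
import math
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://www.bookyogaretreats.com/"
TOTAL_LISTINGS = 7447
LISTINGS_PER_PAGE = 12


def page_count(total: int, per_page: int) -> int:
    """Number of result pages needed to show every listing."""
    return math.ceil(total / per_page)


def build_page_urls(base_url: str, pages: int) -> List[str]:
    """Build one URL per result page.

    Assumption: the site paginates with a `page` query parameter.
    """
    return [f"{base_url}?{urlencode({'page': n})}" for n in range(1, pages + 1)]


pages = page_count(TOTAL_LISTINGS, LISTINGS_PER_PAGE)
urls = build_page_urls(BASE_URL, pages)
print(pages)    # 621
print(urls[0])  # https://www.bookyogaretreats.com/?page=1
```

The last page is only partly filled, which is why 621 pages hold 7447 listings rather than 7452.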


The collected data consists of 7447 observations with 17 features each.
EDA
Language Distribution:


Popularity by Country

- India: 19%
- Indonesia: 11%
- Spain: 9%
- Thailand: 6%
- Portugal: 6%
- USA: 5%
- Italy: 5%
- Costa Rica: 4%
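Shares like these can be computed by counting how often each country appears among the scraped listings. The sketch below is one possible way to do it with the standard library; the `share_by_value` helper and the sample list are illustrative assumptions, not the original analysis code or data.

```python
from collections import Counter
from typing import Dict, Iterable


def share_by_value(values: Iterable[str]) -> Dict[str, float]:
    """Return each distinct value's share of the total in percent, largest first."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: round(100 * n / total, 1) for value, n in counts.most_common()}


# Toy data for illustration only, not the scraped data set.
sample_countries = ["India", "India", "Indonesia", "Spain", "India", "Indonesia"]
shares = share_by_value(sample_countries)
print(shares)
```

The same helper would work for any categorical column, such as instruction language or skill level.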
Participants Review Rating Distribution

Overall, ratings are quite good worldwide.
•5.0: 27%
•4.5: 39%
•4.0: 28%
•3.5: 9%
•91% of ratings are 4.0 or above
•No ratings are lower than 3.5
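A summary such as "share of ratings at or above 4.0" can be derived directly from the list of scraped ratings. The sketch below shows one way to compute it; the `share_at_or_above` helper and the sample ratings are hypothetical and are not taken from the collected data.

```python
from typing import Sequence


def share_at_or_above(ratings: Sequence[float], threshold: float) -> float:
    """Percentage of ratings greater than or equal to `threshold`."""
    if not ratings:
        return 0.0
    hits = sum(1 for r in ratings if r >= threshold)
    return round(100 * hits / len(ratings), 1)


# Toy ratings for illustration only, not the scraped data.
sample_ratings = [5.0, 4.5, 4.5, 4.0, 3.5, 5.0, 4.0, 4.5, 3.5, 4.5]
high_share = share_at_or_above(sample_ratings, 4.0)
lowest = min(sample_ratings)
print(high_share, lowest)
```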
Participants Review Rating by Country

Skill Levels

•All levels: 58%
•Beginner and Intermediate: 23%
•Beginner: 11%
•Intermediate and Advanced: 3%
Yoga Style Popularity


Popularity of Lengths of the stay


Programs lasting 2 to 8 days are the most common.
Possible answers for the new yoga retreat business:
- Instruction: English
- Skill level: all levels
- Yoga style: consider Vinyasa, Ashtanga, etc.
- Length of program: 5-8 days
- Location: along the coastline
- Third party – expert tip
Palm Beach/Miami, Florida, USA would be my first choice.
Improvement:
Use Selenium to scrape and collect:
- Price
- Customer reviews (detailed)
- Traveler type (single or family)
With detailed reviews, we can use NLP to extract more information from the feedback. We can also discover more travel patterns, which will help in setting up the new business.
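As a rough idea of what a first pass over review text might look like, the sketch below counts the most frequent non-trivial words across a few reviews. It is a simple term-frequency count rather than a full NLP pipeline; the `top_terms` helper, the small stopword set, and the sample reviews are all made up for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny, illustrative stopword set; a real analysis would use a fuller list.
STOPWORDS = {"the", "a", "and", "was", "were", "is", "to", "of", "but", "very", "too"}


def top_terms(reviews: List[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the `n` most frequent words across all reviews, ignoring stopwords."""
    words = []
    for text in reviews:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)


# Invented reviews for illustration only.
sample_reviews = [
    "The food was great and the teachers were great.",
    "Great location, but the room was small.",
    "Teachers were kind and the food was fresh.",
]
terms = top_terms(sample_reviews)
print(terms)
```

Frequent terms like these could point to what guests mention most, before moving on to sentiment analysis or topic modeling.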