Web Scraping the Apple Mac App Store

Drucila Lefevre
Posted on Dec 2, 2019

The use of Apple devices in the workplace has increased. From tech companies to graphic design studios, Apple devices have boosted productivity and made workloads easier for employees. In 1982, Steve Jobs, Apple's former CEO, bought 19% of Adobe's shares, and Adobe's software went on to become a resource for people who work and study in design. Today, thousands of apps on Apple's platform serve as tools for people in professional fields. Today's laptop market is split between Windows and Apple, and the Mac App Store gives users a variety of apps that can benefit them in their professional field.

As a start in building my knowledge of market research, I explored seven categories (Education, Business, Medical, Photography, Graphics & Design, Music, Video) in the Mac App Store. The objective of this web scraping project was to visualize which categories are likely to keep driving this market.

Tools: Scrapy using Python

Using Scrapy, I scraped and iterated through each feature needed and formatted the information extracted into a dataset.

Source Code

Information Extracted: 

  • Name
  • Size (GB, KB, MB)
  • App Category
  • Languages
  • App Rating (0-5)
  • Price
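
As a rough illustration of how the six fields above could be gathered into one record per app and written out as a dataset, here is a minimal sketch. It is not the project's actual spider: the `AppRecord` class, the `write_records` helper, and the sample values are hypothetical, and only the Python standard library is used.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AppRecord:
    """One scraped app; every value is kept as raw text until cleaning."""
    name: str
    size: str        # e.g. "1.2 GB"
    category: str
    languages: str
    rating: str      # 0-5, as shown on the page
    price: str       # e.g. "Free" or "$4.99"


def write_records(records, handle):
    """Write records as CSV rows, one column per AppRecord field."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(AppRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Illustrative values only, not taken from the scraped data.
sample = AppRecord("Example App", "1.2 GB", "Education", "English", "4.5", "Free")
buffer = io.StringIO()
write_records([sample], buffer)
print(buffer.getvalue())
```

Recent Scrapy versions accept dataclass objects as items, so a record like this could in principle be yielded directly from a spider's parse callback.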

Cleaning: 

Using the pandas library in Python, I wrote functions to clean the raw dataset.
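
The cleaning functions themselves are not shown in this post, so the following is only a sketch of what two of them might look like. It assumes sizes arrive as text such as "1.2 GB" and prices as "Free" or "$4.99", and it treats 1 GB as 1000 MB. The names `size_to_mb` and `price_to_float` are hypothetical.

```python
import re

# Size of each unit expressed in KB (decimal units are an assumption).
_UNIT_IN_KB = {"KB": 1, "MB": 1000, "GB": 1_000_000}
_SIZE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB)\s*", re.IGNORECASE)


def size_to_mb(text):
    """Convert a size string such as '1.2 GB' to megabytes, or None if unparseable."""
    match = _SIZE_PATTERN.fullmatch(text)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _UNIT_IN_KB[unit.upper()] / 1000


def price_to_float(text):
    """Map 'Free' to 0.0 and '$4.99' to 4.99; return None for anything else."""
    cleaned = text.strip()
    if cleaned.lower() == "free":
        return 0.0
    try:
        return float(cleaned.lstrip("$").replace(",", ""))
    except ValueError:
        return None
```

With pandas, functions like these can be applied to a whole column, for example `df["size"].map(size_to_mb)`, where `df` and the column name are placeholders.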

Data Analysis:

 

How many apps are in each category? 

Education is the largest category by number of apps. One possible factor is that Apple sells its laptops and iPads at a discount to current students.

Size (GB, MB, KB):

Which category takes up the most storage space (GB)?

  • Graphics & Design apps take up the most storage space, while Business apps take up the least.
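
One way to make this comparison is to average app size within each category. The sketch below assumes sizes have already been converted to megabytes and that each row is a `(category, size_mb)` pair. The function name and the sample rows are hypothetical, and the standard library is used here instead of pandas.

```python
from collections import defaultdict
from statistics import mean


def mean_size_by_category(rows):
    """Return {category: mean size in MB}, skipping rows with a missing size."""
    sizes = defaultdict(list)
    for category, size_mb in rows:
        if size_mb is not None:
            sizes[category].append(size_mb)
    return {category: mean(values) for category, values in sizes.items()}


# Illustrative rows only, not taken from the scraped data.
rows = [("A", 100.0), ("A", 300.0), ("B", 50.0), ("B", None)]
print(mean_size_by_category(rows))
```

The pandas equivalent would be along the lines of `df.groupby("category")["size_mb"].mean()`, with placeholder column names.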

Let's continue to explore a few of the categories. 

  • Each of the seven categories contains applications that vary in size.
  • Depending on the storage available on their machine, users can choose which application suits them best.

Price (Free / Paid): 

Paid Apps

  • The majority of applications are not free.
  • I picked the ten most expensive apps in each category and compared their price ranges across categories.

 

  • The majority of the paid apps are Graphics & Design applications. The applications that take up the most storage space tend not to be free.
  • These apps are also the most expensive ones.
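
Selecting the ten most expensive paid apps per category, as described above, can be sketched as follows. It assumes prices are already numeric and that each row is a `(category, price)` pair. The function name and the sample rows are hypothetical, and the standard library is used here instead of pandas.

```python
import heapq
from collections import defaultdict


def top_prices_by_category(rows, n=10):
    """Return {category: the n highest prices, in descending order} for paid apps."""
    prices = defaultdict(list)
    for category, price in rows:
        if price is not None and price > 0:  # free apps are excluded
            prices[category].append(price)
    return {category: heapq.nlargest(n, values) for category, values in prices.items()}


# Illustrative rows only, not taken from the scraped data.
rows = [("A", 9.99), ("A", 49.99), ("A", 0.0), ("A", 19.99), ("B", 4.99)]
print(top_prices_by_category(rows, n=2))
```

In pandas, a similar result could be obtained with `df[df["price"] > 0].groupby("category")["price"].nlargest(10)`, again with placeholder column names.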

Free Apps:

  • We all love free items, and the majority of educational applications are free. People get no-cost resources to build knowledge in their field of study.

Ratings: 

  • Now that we have looked at price and size, ratings are also essential in deciding which applications are worth buying and downloading.
  • Every category except Business is made up mostly of highly rated applications.

 

Summary: 

In conclusion, users in these seven professional fields have a variety of applications to choose from. Graphics & Design apps take up the most storage space, tend to be the most expensive, and have average ratings.

Educational applications, on the other hand, cost little or nothing and take up less storage space. After visualizing all of this information, it is fair to predict that both Education and Graphics & Design applications are benefiting the Apple market. As new applications are released, profits are likely to increase.

Future Work: 

  1. Scraping the total ratings
  2. Scraping the Windows Application store and comparing its features to Apple

 

 

 

About Author

Drucila Lefevre

Drucila holds a master's degree in Psychology with a concentration in Applied Statistics from Columbia University. She discovered her passion for data through research projects during her employment, where she worked as a research analyst and analyzed neuro...