Web Scraping Data on the Apple Mac App Store

Posted on Dec 2, 2019
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Apple devices are increasingly used in the workplace. From tech companies to graphic design, data shows that Apple devices have improved productivity and eased the workload for employees. In 1982, Apple's former CEO, Steve Jobs, bought 19% of Adobe's shares, an investment that became a resource for people who work and study in the design field. Today, the Mac App Store offers thousands of apps that serve as tools for people in professional fields.

Today's laptop market is split between Windows and Apple. The Mac App Store provides users with a variety of apps that can benefit them in their professional field. To build my knowledge of market research, I explored seven categories (Education, Business, Medical, Photography, Graphics & Design, Music, Video) in the Mac App Store. The objective of this web scraping project was to visualize which category is most likely to keep strengthening this market.

Tools: Scrapy using Python

Using Scrapy, I iterated over the apps in each category, scraped each feature needed, and formatted the extracted information into a dataset.

Source Code

Information and Data Extracted: 

  • Name
  • Size (GB, KB, MB)
  • App Category
  • Languages
  • App Rating (0-5)
  • Price

Cleaning: 

Using the pandas library in Python, I wrote functions to clean the raw dataset.
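
The cleaning functions are not reproduced here. A minimal sketch of what they might do, assuming the raw columns are named `size`, `price`, and `rating` (hypothetical names), that sizes look like `1.2 GB`, that prices look like `Free` or `$4.99`, and that decimal units apply (1 GB = 1000 MB):

```python
import pandas as pd

# Multipliers that express every size in megabytes (decimal units assumed).
UNIT_TO_MB = {"KB": 1 / 1000, "MB": 1.0, "GB": 1000.0}


def size_to_mb(size_text):
    """Convert a string such as '1.2 GB' to megabytes; None if unparseable."""
    try:
        value, unit = size_text.split()
        return float(value) * UNIT_TO_MB[unit.upper()]
    except (AttributeError, ValueError, KeyError):
        return None


def price_to_usd(price_text):
    """Map 'Free' to 0.0 and '$4.99' to 4.99; None if unparseable."""
    if not isinstance(price_text, str):
        return None
    text = price_text.strip()
    if text.lower() == "free":
        return 0.0
    try:
        return float(text.lstrip("$").replace(",", ""))
    except ValueError:
        return None


def clean_apps(raw):
    """Return a cleaned copy of the raw scraped frame."""
    df = raw.copy()
    df["size_mb"] = df["size"].map(size_to_mb)
    df["price_usd"] = df["price"].map(price_to_usd)
    # Non-numeric ratings (for example, empty strings) become NaN.
    df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
    df["is_free"] = df["price_usd"] == 0
    return df
```

Putting every size on a single unit is what makes the later per-category size comparison meaningful.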

Data Analysis:

How many apps are in each category? 

Education apps are the most numerous. One possible reason is that Apple sells its laptops and iPads at a discounted rate to current students.

Size (GB, MB, KB):

Which category takes up the most storage space (GB)?

  • Graphics & Design apps take up the most storage space, while Business apps take up the least.

Let's continue to explore a few of the categories. 

  • Each of the seven categories contains applications that vary in size.
  • Users can choose the application that best fits the storage they have available.
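
The per-category counts and sizes discussed above can be computed with a pandas `groupby`. This is a sketch under assumptions: the cleaned frame is taken to have columns named `name`, `category`, and `size_mb`, which are hypothetical names rather than the ones used in the project.

```python
import pandas as pd


def summarize_by_category(apps):
    """Count apps and summarize size (in MB) for each category."""
    return (
        apps.groupby("category")
        .agg(
            apps=("name", "count"),
            mean_size_mb=("size_mb", "mean"),
            max_size_mb=("size_mb", "max"),
        )
        # Largest average size first, so the heaviest category is on top.
        .sort_values("mean_size_mb", ascending=False)
    )
```

The mean is sensitive to a few very large apps, so swapping `"mean"` for `"median"` is a reasonable alternative when sizes within a category vary widely.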

Price (Free / Paid) Data: 

Paid Apps

  • The majority of applications are not free.
  • I picked the top ten most expensive apps in each category and compared their price ranges to the others.

 

  • The majority of the paid apps are Graphics & Design applications. The applications that take up the most storage space tend not to be free.
  • These apps are also the most expensive ones. 
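
One way to pull the ten most expensive apps per category, along with the share of paid apps in each, is sketched below. It assumes a cleaned frame with hypothetical columns `category` and `price_usd`, where free apps have a price of 0.

```python
import pandas as pd


def top_paid_apps(apps, n=10):
    """Return the n most expensive paid apps within each category."""
    paid = apps[apps["price_usd"] > 0]
    return (
        paid.sort_values("price_usd", ascending=False)
        .groupby("category")
        .head(n)  # keeps the first n rows of each group, i.e. the priciest
        .sort_values(["category", "price_usd"], ascending=[True, False])
    )


def paid_share(apps):
    """Return the fraction of apps in each category that are paid."""
    is_paid = apps["price_usd"] > 0
    return is_paid.groupby(apps["category"]).mean()
```

Categories with no paid apps simply do not appear in the `top_paid_apps` result, while `paid_share` reports them with a share of 0.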

Free Apps:

  • We all love free items. The majority of Education applications are free, which gives individuals free resources to build knowledge in their field of study.

Ratings: 

  • Now that we have looked at price and size, ratings also matter when deciding which applications are worth buying and downloading.
  • Every category except Business consists mostly of highly rated applications.
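
A comparison like this can be made by binning ratings and tabulating the bins per category. The sketch below assumes hypothetical column names `category` and `rating`; the band edges (0 to 3, 3 to 4, 4 to 5) and their labels are illustrative choices, not the thresholds used in the project.

```python
import pandas as pd

# Illustrative rating bands: [0, 3] is low, (3, 4] is mid, (4, 5] is high.
RATING_BINS = [0, 3, 4, 5]
RATING_LABELS = ["low", "mid", "high"]


def rating_mix(apps):
    """Return, per category, the fraction of apps in each rating band."""
    bands = pd.cut(
        apps["rating"],
        bins=RATING_BINS,
        labels=RATING_LABELS,
        include_lowest=True,  # so a rating of exactly 0 lands in "low"
    )
    # normalize="index" makes each category's row sum to 1.
    return pd.crosstab(apps["category"], bands, normalize="index")
```

Apps with a missing rating are left out of the table, so each row describes only the rated apps in that category.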

 

Summary: 

In conclusion, users within these seven professional fields have a variety of applications to choose from. Graphics & Design is the category that takes up the most storage space, tends to be the most expensive, and has average ratings.

On the other hand, Education applications cost little or nothing and take up less storage space. After visualizing all of this information, it is fair to predict that both Education and Graphics & Design applications are benefiting the Apple market. As new applications are released, profits are likely to increase.

Future Work: 

  1. Scraping the total ratings
  2. Scraping the Windows app store and comparing its features to Apple's

About Author

Drucila Lefevre

Drucila holds a Master's degree in Psychology with a concentration in Applied Statistics from Columbia University. Drucila discovered her passion for data through research projects during her employment, where she worked as a research analyst and analyzed neuro...