Google Play Store 2018 Category Analysis

Posted on Mar 20, 2021
Shiny App | Github 


The Google Play Store hosts a large number of applications across many categories. As the digital distribution service for Android, it gives users access to millions of apps in categories such as Family, Game, and Tools, to name a few. Approximately 2.7 million applications were available as of February 2017. The Kaggle dataset used here, scraped from the store, reflects the state of the Google Play Store as of 2018. It covers 33 categories that developers can explore for potential business growth.

Further exploration and analysis can be done in the Shiny app, which helps organizations compare categories side by side: which categories contain the most or fewest apps and installations, what characterizes each category, and where opportunities for app development may lie.

Overview and Ranking

At first glance, ratings across all apps in the Google Play Store follow a left-skewed distribution with an average rating of 4.1, which indicates a large number of highly rated apps.
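As a rough illustration of what "left-skewed" means here, the sketch below computes a population skewness for a small list of ratings. The ratings are made up for the example and are not taken from the Kaggle dataset, and the `skewness` helper is a hypothetical name, not something from the Shiny app. A negative value, together with a mean below the median, is the usual signature of a long left tail.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness: the third central moment divided by the cubed
    population standard deviation. Negative means a longer left tail."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


# Hypothetical ratings: most are high, a few low ones pull the mean down.
ratings = [4.5, 4.4, 4.3, 4.6, 4.2, 4.7, 4.1, 3.0, 1.9, 4.4]

print("mean:", round(mean(ratings), 2))
print("median:", round(median(ratings), 2))
print("left-skewed:", skewness(ratings) < 0)
```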

However, ratings have a low correlation with the number of installs an app receives, so ratings alone say little about how many installs to expect. Instead, the correlation matrix shows that the number of reviews is moderately correlated (at 0.6) with the number of installations. This matches the expectation that the number of reviews grows in line with the number of installations.
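For readers who want to see how a single entry of such a correlation matrix could be computed, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It assumes the matrix reports Pearson correlations, which the post does not state. The `reviews` and `installs` values are invented for illustration and will not reproduce the 0.6 figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-app values, not taken from the dataset.
reviews = [120, 4500, 300, 98000, 15]
installs = [10_000, 500_000, 50_000, 5_000_000, 1_000]

print(round(pearson(reviews, installs), 3))
```

A value near 1 means the two columns rise together, a value near 0 means little linear relationship, and a value near -1 means one falls as the other rises.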

Top 10 Made


Top 10 Installs


Although Family has the most apps, as seen in the Top 10 Made bar chart, the Top 10 Installs chart shows Game as the most installed category, followed by Communication and Tools. From a business point of view, it is worth recognizing which categories are heavily saturated, like Family, when assessing competition in the market. On the other hand, the bottom 10 categories by apps made may offer room for growth, since there is less competition within them. In addition, the most installed categories give insight into what consumers prefer to download, which points developers and entrepreneurs toward categories where they can enter and compete for installs.

Category        Number of Apps    Installs
FAMILY          1877              6,237,542,505
GAME            946               13,457,924,415
TOOLS           830               8,112,771,915
PRODUCTIVITY    374               5,793,091,369


Four categories appear in both top 10 lists, most made and most installed: Family, Game, Tools, and Productivity.
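The comparison between the two lists can be sketched as a per-category aggregation followed by a set intersection. The snippet below is only an illustration: it assumes each row has a `Category` field and an `Installs` field stored as text such as "1,000,000+", and the rows themselves, along with the helper names `parse_installs`, `summarize`, and `top_n`, are hypothetical.

```python
from collections import defaultdict


def parse_installs(text):
    """Turn a string such as '1,000,000+' into the integer 1000000."""
    return int(text.replace(",", "").replace("+", "").strip())


def summarize(rows):
    """Return (apps per category, total installs per category)."""
    counts = defaultdict(int)
    installs = defaultdict(int)
    for row in rows:
        category = row["Category"]
        counts[category] += 1
        installs[category] += parse_installs(row["Installs"])
    return dict(counts), dict(installs)


def top_n(totals, n):
    """Names of the n categories with the largest totals."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical rows, not taken from the dataset.
rows = [
    {"Category": "FAMILY", "Installs": "1,000+"},
    {"Category": "FAMILY", "Installs": "5,000+"},
    {"Category": "FAMILY", "Installs": "10,000+"},
    {"Category": "GAME", "Installs": "1,000,000+"},
    {"Category": "GAME", "Installs": "500,000+"},
    {"Category": "TOOLS", "Installs": "100,000+"},
]

counts, installs = summarize(rows)
# Categories that rank in the top 2 on both measures.
in_both = set(top_n(counts, 2)) & set(top_n(installs, 2))
print(in_both)
```

With the real data, `n` would be 10 and the intersection would be compared against the categories named above.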

Category Details

Top 3 Category Apps Made: Free or Paid


Looking at the top 3 categories by apps made (Family, Game, and Tools), we see that most of these apps are free, and paid apps have a median price of $2.99. Together, these 3 categories account for 27.81 billion installations, or 36.9% of all installations from the Google Play Store.
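The 27.81 billion figure can be checked directly from the install counts in the category table. The short calculation below also backs out the total number of installations that a 36.9% share would imply. That total is not stated in the post, so treat it as an approximation derived from a rounded percentage.

```python
# Install counts for the top 3 categories, copied from the table above.
top3_installs = {
    "FAMILY": 6_237_542_505,
    "GAME": 13_457_924_415,
    "TOOLS": 8_112_771_915,
}

top3_total = sum(top3_installs.values())
print(f"Top 3 total: {top3_total / 1e9:.2f} billion")

# Assumption: 36.9% is the rounded share reported in the post, so the
# implied overall total is only approximate.
reported_share = 0.369
implied_total = top3_total / reported_share
print(f"Implied total installs: about {implied_total / 1e9:.1f} billion")
```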

Top 3 Category Apps Made: Content Rating


Among the top 3 categories, the most common content rating is Everyone. There is also a notable share of Teen-rated apps in the Family and Game categories. Apps rated as appropriate for everyone reach the widest audience, which may contribute to a high number of installations.

The two features most commonly found among the top categories by apps made and installed are being Free and being rated for Everyone.

Future Direction

This Shiny app could be expanded with an up-to-date dataset to determine the characteristics of the categories with the most installs. Based on the correlation matrix, ratings, price, and the number of reviews do not by themselves determine the number of installs. Since most apps are free, installations may be driven by other features of an app that are not included in the dataset. With an updated dataset, I would like to identify the characteristics of categories and apps that lead to more installs. The categories with the largest number of installs not only indicate what consumers want today but also show where developers can enter this competitive market.

The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

About Author

Kristin Teves

Kristin has a Master's in Business Administration with a concentration in Business Analytics from California State University, Fullerton and a Bachelor's in Biological Science from University of California, Irvine. She is excited to combine previous business domain knowledge...