Google Play Store 2018 Category Analysis

Posted on Mar 20, 2021
Shiny App | Github 


The Google Play Store hosts a large number of applications across varying categories. As the digital distribution service for Android, it gives users access to millions of apps in categories such as Family, Game, and Tools, to name a few. There were approximately 2.7 million applications available as of February 2017. The scraped Kaggle data captures the state of the Google Play Store as of 2018 and covers 33 categories that developers can explore for potential business growth. The Shiny app supports further exploration and analysis: organizations can compare specific categories, see which categories contain the largest or smallest number of apps and installations, and examine the characteristics of each category to identify opportunities for app development.

Overview and Ranking

At first glance, ratings across all apps in the Google Play Store follow a left-skewed distribution with an average rating of 4.1, which indicates a large number of highly rated apps.
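
To make "left-skewed" concrete, here is a minimal sketch of the moment-based skewness coefficient in plain Python. The function name `skewness` and the ratings in `toy_ratings` are invented for illustration and are not taken from the Kaggle dataset; a negative result means the long tail sits on the low-rating side.

```python
def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


# Invented ratings: most apps are rated highly, one is rated very low.
toy_ratings = [1.0, 3.5, 4.0, 4.2, 4.3, 4.5, 4.5, 4.7]

# A negative coefficient indicates a left-skewed distribution.
print(skewness(toy_ratings) < 0)
```

A few very low ratings pull the mean below the bulk of the data, which is the pattern a left-skewed ratings histogram shows.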

However, ratings have a low correlation with the number of installs an app receives, so ratings are a weak guide to how many installs an app will get. Instead, the correlation matrix shows that the number of reviews is moderately correlated (at 0.6) with the number of installations. This matches the expectation that the number of reviews increases in line with the number of installations.
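
The kind of coefficient reported in a correlation matrix can be sketched with a hand-written Pearson correlation. This is an illustration under assumptions, not the analysis behind the Shiny app: the helper names `parse_installs` and `pearson` are hypothetical, the rows in `toy_rows` are invented, and the `"10,000+"` string format for installs is an assumption about how the scraped data stores that column.

```python
import math


def parse_installs(raw):
    """Turn a string such as '1,000,000+' into the integer 1000000.

    The '1,000,000+' format is an assumption about the scraped data.
    """
    return int(raw.replace(",", "").replace("+", ""))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented rows for illustration only.
toy_rows = [
    {"rating": 4.5, "reviews": 120, "installs": "10,000+"},
    {"rating": 3.9, "reviews": 4_500, "installs": "500,000+"},
    {"rating": 4.2, "reviews": 80_000, "installs": "10,000,000+"},
    {"rating": 4.7, "reviews": 35, "installs": "1,000+"},
]

reviews = [row["reviews"] for row in toy_rows]
ratings = [row["rating"] for row in toy_rows]
installs = [parse_installs(row["installs"]) for row in toy_rows]

print("reviews vs installs:", pearson(reviews, installs))
print("rating vs installs:", pearson(ratings, installs))
```

A coefficient only describes how closely two columns move together; it does not say that one drives the other.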

Top 10 Made


Top 10 Installs


Although Family has the most apps, as seen in the Top 10 Made bar chart, Top 10 Installs reveals Game as the most installed category, followed by Communication and Tools. From a business point of view, a heavily saturated category such as Family signals strong competition and many competitors in the market. On the other hand, the 10 categories with the fewest apps made may offer room for growth, since there is less competition within them. In addition, the most installed categories give insight into what consumers prefer to download, which points developers and entrepreneurs toward categories where they can compete for installs.

Category | Number of Apps | Installs
FAMILY | 1877 | 6,237,542,505
GAME | 946 | 13,457,924,415
TOOLS | 830 | 8,112,771,915
PRODUCTIVITY | 374 | 5,793,091,369


Family, Game, Tools, and Productivity appear in both top 10 lists: the categories with the most apps made and the categories with the most installs.
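
The comparison of the two rankings can be sketched as a small aggregation: count apps and sum installs per category, rank each, and keep the categories that appear in both lists. The function names and the rows in `toy_apps` below are hypothetical and invented for illustration; they are not the dataset behind the charts.

```python
from collections import Counter, defaultdict


def category_totals(apps):
    """Return (app count per category, total installs per category)."""
    counts = Counter()
    installs = defaultdict(int)
    for category, app_installs in apps:
        counts[category] += 1
        installs[category] += app_installs
    return counts, dict(installs)


def top_n(totals, n):
    """Names of the n categories with the largest totals, largest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def in_both_lists(apps, n=10):
    """Categories in both the top-n 'made' and top-n 'installs' lists."""
    counts, installs = category_totals(apps)
    most_installed = set(top_n(installs, n))
    return [name for name in top_n(counts, n) if name in most_installed]


# Invented (category, installs) rows for illustration only.
toy_apps = [
    ("FAMILY", 1_000), ("FAMILY", 500), ("FAMILY", 100),
    ("GAME", 1_000_000), ("GAME", 5_000),
    ("TOOLS", 50_000),
    ("DATING", 10),
]

print(in_both_lists(toy_apps, n=2))
```

In the toy data, the category with the most apps is not the one with the most installs, which is the same kind of gap the two bar charts show between Family and Game.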

Category Details

Top 3 Category Apps Made: Free or Paid


Looking at the top 3 categories by apps made (Family, Game, and Tools), we see that most of these apps are Free, and the paid apps have a median price of $2.99. Together, these 3 categories account for 27.81 billion installations, or 36.9% of all installs from the Google Play Store.
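
The 27.81 billion figure can be reproduced from the install counts in the category table above. The short sketch below sums them and, as a rough derived estimate rather than a number stated in this post, backs out the total install count that a 36.9% share would imply. The variable names are made up for illustration.

```python
# Install counts for the top 3 categories, copied from the table above.
top3_installs = {
    "FAMILY": 6_237_542_505,
    "GAME": 13_457_924_415,
    "TOOLS": 8_112_771_915,
}

top3_total = sum(top3_installs.values())
print(f"Top 3 total: {top3_total / 1e9:.2f} billion installs")

# Reported share of all installs held by these 3 categories.
reported_share = 0.369

# Derived estimate only: the overall total that share would imply.
implied_total = top3_total / reported_share
print(f"Implied overall total: about {implied_total / 1e9:.1f} billion installs")
```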

Top 3 Category Apps Made: Content Rating


Among the top 3 categories of apps, the most common content rating is Everyone. There is also a notable Teen rating presence among Family and Game apps. Overall, having more apps rated as appropriate for everyone can lead to a high number of installations from users.

The two features most commonly found among the categories with the most apps made and installed are a Free price and an Everyone content rating.

Future Direction

This Shiny app could be expanded with an up-to-date dataset to determine the characteristics of the categories with the most installs. Based on the correlation matrix, high ratings, price, and the number of reviews do not by themselves determine the number of installs. Since most apps are free, installations may be driven by other features of an app that are not in the dataset. With an updated dataset, I would like to identify the characteristics of categories and apps that lead to an increased number of installs. The categories with the highest number of installs not only indicate what consumers need today but also give developers an opportunity to enter this competitive market.

About Author

Kristin Teves

Kristin has a Masters in Business Administration with a concentration in Business Analytics from California State University, Fullerton and a Bachelor's in Biological Science from University of California, Irvine. She is excited to combine previous business domain knowledge...