PokeViz: Visualizing and Predicting Pokemon Go using k-Nearest Neighbors

Posted on Nov 7, 2016


Pokemon Go is a location-based, augmented reality mobile game released by Niantic Inc. Players use their phone's GPS to locate, capture, and battle fictional creatures in a virtual setting. Like many millennials, I grew up playing Pokemon games on my Game Boy Color, so when this game was released, I immediately joined the ranks of fellow Pokemon Go trainers, running through the streets of New York City looking for the rarest Pokemon. Although the game was a huge success, it had many flaws that needed to be addressed and features that had yet to be implemented. One of these was a way to locate Pokemon and predict where they would appear. As a hardcore Pokemon fan, I developed a Shiny app, PokeViz, to offer a potential solution to this problem.



This app has three main functionalities:

  • Visualization: The app displays all available Pokemon and their previous spawn locations, which can be used to identify specific Pokemon nests.
  • Prediction: The app implements a simple k-Nearest Neighbors (kNN) algorithm to predict the rarity class of Pokemon at any specific location (given by longitude and latitude).
  • Density and Distribution: The app can generate density contour plots of selected regions/cities, allowing the user to identify the locations within a city that have a higher Pokemon spawn density.

Visualization and Prediction:


The map displays all the Pokemon and their previous spawn locations. The user can use the filters to choose the type of Pokemon, the number of Pokemon to display, and the continent of interest. Once the selection is complete, simply clicking the "Go!" button generates the desired map.
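The filtering step can be sketched roughly as follows. This is a hypothetical Python illustration, not the app's actual R/Shiny code: the `Spawn` record, its field names, and the `filter_spawns` function are assumptions made for the example, not column names from the Kaggle dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Spawn:
    # Hypothetical record layout; the real dataset's columns may differ.
    pokemon: str
    continent: str
    lat: float
    lon: float


def filter_spawns(spawns: Iterable[Spawn],
                  pokemon: Optional[str] = None,
                  continent: Optional[str] = None,
                  limit: Optional[int] = None) -> List[Spawn]:
    """Apply the map filters: which Pokemon, which continent, how many points."""
    selected = [
        s for s in spawns
        if (pokemon is None or s.pokemon == pokemon)
        and (continent is None or s.continent == continent)
    ]
    # Cap the number of markers drawn on the map.
    return selected if limit is None else selected[:limit]


spawns = [
    Spawn("Pidgey", "North America", 40.75, -73.99),
    Spawn("Pidgey", "Europe", 51.50, -0.12),
    Spawn("Dratini", "North America", 40.78, -73.97),
    Spawn("Pidgey", "North America", 40.71, -74.01),
]
shown = filter_spawns(spawns, pokemon="Pidgey", continent="North America", limit=1)
print([(s.lat, s.lon) for s in shown])  # [(40.75, -73.99)]
```

Leaving a filter as `None` means "no restriction", so `filter_spawns(spawns)` returns every spawn.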


The Pokemon are divided into five classes based on how rarely they appear: Common, Uncommon, Rare, Very Rare, and Super Rare. Clicking any location on the map displays its longitude and latitude in the "Position" panel at the top left, and the predicted probability of each rarity class near that location in the "Prediction" panel at the top right.

Density and Distribution:

Another important feature of the app is the density contour map of all Pokemon spawns in major cities. With this feature, users can look specifically for areas within a city that have a higher Pokemon spawn rate.


A contour map uses contour lines to show the different levels of a surface, as well as how steeply that surface changes. On a topographic map the surface is elevation; here it is Pokemon spawn density. For example, in the map above, the color gradient ranges from red to yellow. Red contour lines mark lower density levels, and as the line color shifts toward yellow, the density increases. The brightest yellow indicates the highest density, and thus the most Pokemon spawns.
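The post does not say how the density surface behind the contours is computed. A common choice is a 2D Gaussian kernel density estimate, and the sketch below evaluates one on a grid; the contour lines are level sets of this surface, and the grid cell with the highest value corresponds to the brightest yellow region. The function names, the bandwidth, and treating (longitude, latitude) as flat x/y coordinates over a city-sized area are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def kde2d(points: Sequence[Tuple[float, float]],
          xs: Sequence[float], ys: Sequence[float],
          bandwidth: float) -> List[List[float]]:
    """Evaluate a 2D isotropic Gaussian kernel density estimate on the grid xs x ys.

    Returns density[i][j] at (xs[i], ys[j]).
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    grid = []
    for x in xs:
        row = []
        for y in ys:
            # Sum one Gaussian bump per observed spawn.
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid


def peak(grid: List[List[float]], xs: Sequence[float], ys: Sequence[float]):
    """Grid coordinate with the highest estimated density."""
    i, j = max(((i, j) for i in range(len(xs)) for j in range(len(ys))),
               key=lambda ij: grid[ij[0]][ij[1]])
    return xs[i], ys[j]


# Three spawns clustered near (0, 0) and one isolated spawn at (5, 5).
spawns = [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.05), (5.0, 5.0)]
axis = [i * 0.5 for i in range(-2, 13)]  # -1.0 to 6.0 in steps of 0.5
density = kde2d(spawns, axis, axis, bandwidth=0.5)
print(peak(density, axis, axis))  # (0.0, 0.0)
```

The bandwidth controls how smooth the contours look: a larger value merges nearby hotspots into one broad peak, while a smaller one keeps individual clusters separate.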


The distribution of each rarity class can be found under the Distribution tab. It shows the frequency of each rarity class within the area covered by the contour map.
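Computing that distribution amounts to counting spawns per rarity class inside the plotted region. Below is a minimal Python sketch, assuming each spawn is a hypothetical `(lat, lon, rarity)` tuple and the region is a simple latitude/longitude bounding box; the app's actual region logic is not described in the post.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RARITY_CLASSES: List[str] = ["Common", "Uncommon", "Rare", "Very Rare", "Super Rare"]


def rarity_counts(spawns: Iterable[Tuple[float, float, str]],
                  lat_range: Tuple[float, float],
                  lon_range: Tuple[float, float]) -> Dict[str, int]:
    """Count spawns of each rarity class inside a lat/lon bounding box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    counts = Counter(
        rarity for lat, lon, rarity in spawns
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    )
    # Report every class, including ones with no spawns in the region.
    return {c: counts[c] for c in RARITY_CLASSES}


spawns = [
    (40.75, -73.99, "Common"),
    (40.71, -74.01, "Common"),
    (40.78, -73.97, "Rare"),
    (51.50, -0.12, "Uncommon"),  # outside the box below
]
# Rough box around New York City (illustrative values).
print(rarity_counts(spawns, (40.5, 41.0), (-74.3, -73.7)))
# {'Common': 2, 'Uncommon': 0, 'Rare': 1, 'Very Rare': 0, 'Super Rare': 0}
```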

Data Source:

The dataset used in this app was uploaded by Kaggle user SemionKorchevskiy and can be downloaded at https://www.kaggle.com/semioniy/predictemall. All data processing was done in R. The code can be found here.

Insights and Future Updates:

From this app, we can gauge the general popularity of Pokemon Go around the world. For instance, North America and Europe show the largest number of Pokemon spawns, while some countries, such as China, show no sign of Pokemon spawns.

In future updates, more Pokemon data will be added to make the predictor more accurate. In addition, the density contour could be split into five separate contours, each with a distinguishable color corresponding to one of the five rarity classes, making it easier for users to target Pokemon of a specific rarity.


k-Nearest Neighbors Algorithm:

The k-Nearest Neighbors (kNN) algorithm is a non-parametric method that can be used for both classification and regression. For a given input observation, it finds the k closest observations (neighbors) in the training data. For classification, it outputs a class membership (or the probability of each class) based on a majority vote of those neighbors; for regression, it outputs the average value of the neighbors.
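Both modes can be sketched in a few lines of Python. This is an illustrative sketch, not the app's R implementation (which the post does not show): the function names are hypothetical, k is chosen arbitrarily, and plain Euclidean distance on the raw coordinates is a simplifying assumption. The class "probabilities" are simply the vote shares among the k neighbors.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
RARITY_CLASSES = ["Common", "Uncommon", "Rare", "Very Rare", "Super Rare"]


def k_nearest(train: Sequence[Tuple[Point, object]], query: Point, k: int) -> List:
    """Return the k training examples closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]


def knn_class_probabilities(train, query: Point, k: int,
                            classes: Sequence[str]) -> Dict[str, float]:
    """Classification: share of the k neighbors that fall in each class."""
    neighbors = k_nearest(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return {c: votes[c] / len(neighbors) for c in classes}


def knn_regress(train, query: Point, k: int) -> float:
    """Regression: average target value of the k neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


train = [((0.0, 0.0), "Common"), ((0.0, 1.0), "Common"),
         ((1.0, 0.0), "Rare"), ((5.0, 5.0), "Super Rare")]
probs = knn_class_probabilities(train, (0.2, 0.2), k=3, classes=RARITY_CLASSES)
print(probs)  # Common: 2/3, Rare: 1/3, every other class 0

train_values = [((0.0, 0.0), 10.0), ((0.0, 1.0), 20.0),
                ((1.0, 0.0), 30.0), ((5.0, 5.0), 100.0)]
print(knn_regress(train_values, (0.2, 0.2), k=3))  # 20.0
```

The predicted class is the one with the largest vote share; keeping the full dictionary is what allows a panel like "Prediction" to show a probability for every rarity class.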


I chose kNN as the prediction model because of the nature of the dataset: the observations are location coordinates, and kNN predicts directly from nearby observations, which suits this purpose well. I also wanted a model capable of multi-class classification that outputs a probability for each class, and compared to other classification models, kNN is easier to implement and achieves better results for this task.
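Because the features are latitude and longitude, the choice of distance matters for which spawns count as "nearest". The post does not say which distance the app uses; one reasonable option, assumed here, is great-circle distance via the haversine formula. The sketch below (function names are hypothetical) shows how plain Euclidean distance on raw degrees can rank neighbors differently away from the equator, where a degree of longitude covers less ground than a degree of latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def degree_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Naive Euclidean distance on raw degrees (ignores longitude shrinking with latitude)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Query at 60 degrees north; A is 0.9 degrees east, B is 0.5 degrees north.
print(degree_distance(60, 10, 60, 10.9), degree_distance(60, 10, 60.5, 10))  # A looks farther
print(round(haversine_km(60, 10, 60, 10.9), 1))  # about 50.0 km: A is actually closer
print(round(haversine_km(60, 10, 60.5, 10), 1))  # about 55.6 km
```

For a single city the distortion is small, but with data spanning several continents, a distance that respects the Earth's geometry keeps kNN's "nearest" neighbors physically near.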

About Author

Alex Yuan Li

Alex Li received his Master of Science in Mechanical Engineering from Columbia University and his Bachelor of Science in Chemical Engineering from the University of Utah, and has professional experience in research and development in the medical device industry. He discovered...