PokeViz: Visualizing and Predicting Pokemon Go using k-Nearest Neighbors

Posted on Nov 7, 2016


Pokemon Go is a location-based, augmented reality mobile game released by Niantic Inc. Players use GPS to locate, capture, and battle fictional creatures in a virtual setting. Like many millennials, I grew up playing Pokemon games on my Game Boy Color, so when this game was released I inevitably joined the ranks of fellow Pokemon Go trainers, running through the streets of New York City looking for the rarest Pokemon. Although the game was a huge success, it had many flaws that needed to be addressed and features that were still missing. One of them was a way to locate Pokemon and predict where they would appear. As a hardcore Pokemon fan, I developed a Shiny app, PokeViz, to offer a potential solution to this problem.



This app has three main functionalities:

  • Visualization: The app displays all available Pokemon and their previous spawn locations, which can be used to detect the nests of specific Pokemon.
  • Prediction: The app implements a simple k-Nearest Neighbor (kNN) algorithm to predict the rarity of the Pokemon at any specific location (given by longitude and latitude).
  • Density and Distribution: The app can generate a density contour plot of selected regions/cities, allowing the user to find the locations within a city that have a higher Pokemon spawn density.

Visualization and Prediction:


The map displays all the Pokemon and their previous spawn locations. The user can use the filters to choose the type of Pokemon, the number of Pokemon to display, and the continent of interest. Once the selection is complete, clicking the "Go!" button generates the desired map.


The Pokemon are divided into five classes based on how rarely they appear: Common, Uncommon, Rare, Very Rare, and Super Rare. When the user clicks any location on the map, the longitude and latitude of that location are shown in the top left "Position" panel, and the predicted probability of each Pokemon rarity class near that location is shown in the top right "Prediction" panel.
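To make the "Prediction" panel more concrete, here is a minimal sketch of how class probabilities at a clicked location could be computed. It is not the app's actual code (the app was written in R); it is a Python illustration that assumes the probability of each rarity class is simply its share among the k nearest recorded spawns, measured by great-circle distance. The function names, the spawn records, and the value of k are all made up for the example.

```python
import math
from collections import Counter

RARITY_CLASSES = ["Common", "Uncommon", "Rare", "Very Rare", "Super Rare"]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rarity_probabilities(spawns, lat, lon, k=5):
    """Return {rarity class: share of the k nearest spawns that belong to it}."""
    if not spawns:
        raise ValueError("at least one spawn record is required")
    nearest = sorted(spawns, key=lambda s: haversine_km(lat, lon, s[0], s[1]))[:k]
    counts = Counter(label for _, _, label in nearest)
    return {c: counts[c] / len(nearest) for c in RARITY_CLASSES}

# Hypothetical spawn records: (latitude, longitude, rarity class)
spawns = [
    (40.7580, -73.9855, "Common"),
    (40.7590, -73.9845, "Common"),
    (40.7575, -73.9860, "Uncommon"),
    (40.7600, -73.9800, "Rare"),
    (40.7484, -73.9857, "Common"),
    (40.6892, -74.0445, "Super Rare"),
]

# "Clicked" location near the first few records
probs = rarity_probabilities(spawns, lat=40.7582, lon=-73.9850, k=4)
for rarity in RARITY_CLASSES:
    print(f"{rarity}: {probs[rarity]:.2f}")
```

With these made-up records, the four nearest spawns are two Common, one Uncommon, and one Rare, so the sketch reports 0.50, 0.25, and 0.25 for those classes and 0 for the other two.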

Density and Distribution:

Another important feature of the app is the density contour map of all Pokemon spawns in major cities. With this feature, the user can look specifically for areas within a city that have a higher Pokemon spawn rate.


A contour map is drawn with contour lines. On a topographic map these lines show the levels of elevation and the steepness of the slope; here, the "elevation" is Pokemon spawn density. In the map above, the color gradient ranges from red to yellow: the red contour lines mark lower density, and the density increases as the line color moves toward yellow. The brightest yellow indicates the highest density, and thus the most Pokemon spawns.
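As a rough illustration of where such contour lines come from, the sketch below estimates spawn density on a grid with a Gaussian kernel density estimate; contour lines would then join grid cells of equal density. This is an assumption about the general technique, not the app's actual R code. The coordinates, bandwidth, and function names are invented for the example, and longitude/latitude are treated as flat x/y coordinates, which is only a reasonable approximation at city scale.

```python
import math

def kde_density(points, x, y, bandwidth=0.01):
    """Gaussian kernel density estimate at (x, y) from a list of (x, y) points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        sq_dist = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total

def density_grid(points, x_range, y_range, steps=20, bandwidth=0.01):
    """Evaluate the density on a steps x steps grid covering the given ranges."""
    (x0, x1), (y0, y1) = x_range, y_range
    xs = [x0 + (x1 - x0) * i / (steps - 1) for i in range(steps)]
    ys = [y0 + (y1 - y0) * j / (steps - 1) for j in range(steps)]
    grid = [[kde_density(points, x, y, bandwidth) for x in xs] for y in ys]
    return xs, ys, grid

# Hypothetical spawn coordinates (longitude, latitude): a tight cluster plus one outlier
spawn_points = [
    (-73.985, 40.758),
    (-73.986, 40.759),
    (-73.984, 40.757),
    (-73.985, 40.759),
    (-73.960, 40.780),
]

xs, ys, grid = density_grid(
    spawn_points, (-73.99, -73.95), (40.75, 40.79), steps=21, bandwidth=0.005
)

# The grid cell with the highest density is where the "brightest" contour would sit
peak_value, peak_x, peak_y = max(
    (grid[j][i], xs[i], ys[j]) for j in range(len(ys)) for i in range(len(xs))
)
print(f"highest density near lon={peak_x:.3f}, lat={peak_y:.3f}")
```

The bandwidth controls how smooth the surface is: a larger value merges nearby clusters into one broad hill, while a smaller value produces many sharp peaks.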


The distribution of each rarity class can be found under the Distribution tab. It shows the frequency of every rarity class in the contour map.

Data Source:

The dataset used in this app was uploaded by Kaggle user SemionKorchevskiy and can be downloaded at https://www.kaggle.com/semioniy/predictemall. All data processing was done in R. The code can be found here.

Insights and Future Updates:

From this app, we can get a sense of the general popularity of Pokemon Go. For instance, North America and Europe show the largest number of Pokemon spawns, while some countries, such as China, show no Pokemon spawns at all.

In future versions, more Pokemon data will be added to make the predictor more accurate. In addition, the density contour could be split into five contours with distinguishable colors corresponding to the five rarity classes, which would make it easier for the user to target Pokemon of a specific rarity.


K-Nearest Neighbor Algorithm:

The k-Nearest Neighbor (kNN) algorithm is a non-parametric algorithm that can be used for both classification and regression. It finds the k observations (neighbors) nearest to the input observation. For classification, it outputs a class membership from the majority vote of the neighbors (or the probability of each class from their vote shares); for regression, it outputs the average value of the neighbors.
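The two modes described above can be sketched in a few lines of Python. This is a generic, from-scratch illustration on toy data, assuming Euclidean distance; it is not the implementation used in the app, and all names and values are made up.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training rows (features, target) closest to query (Euclidean)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority vote over the labels of the k nearest neighbors."""
    labels = [target for _, target in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average of the numeric targets of the k nearest neighbors."""
    values = [target for _, target in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Toy labelled data: two features per observation, class "A" or "B"
labelled = [
    ((0.0, 0.0), "A"),
    ((0.1, 0.2), "A"),
    ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"),
    ((0.9, 1.1), "B"),
]

# Toy numeric data: one feature per observation, numeric target
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_label = knn_classify(labelled, (0.15, 0.15), k=3)
predicted_value = knn_regress(numeric, (2.1,), k=3)
print(predicted_label, predicted_value)
```

Because kNN stores the training data and defers all work to query time, there is no fitting step; the main choices are k and the distance measure.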


I used kNN as the prediction model because the observations in this dataset are location coordinates, and a distance-based method such as kNN is a natural fit for them. I also wanted a model capable of multi-class, multi-label classification, and for this task I found kNN easier to implement and better-performing than the other classification models.

About Author

Alex Yuan Li

Alex Li received his Master of Science in Mechanical Engineering at Columbia University and Bachelor of Science in Chemical Engineering at University of Utah with professional experience in Research and Development in the Medical Device industry. He discovered...