PokeViz: Visualizing and Predicting Pokemon Go using k-Nearest Neighbors
Motivation:
Pokemon Go is a location-based, augmented reality mobile game released by Niantic Inc. Players use GPS to locate, capture, and battle fictional creatures in a virtual setting. Like many millennials, I grew up playing Pokemon games on my Game Boy Color. So when this game was released, I promptly joined the ranks of fellow Pokemon Go trainers, running through the streets of New York City looking for the rarest Pokemon. Although the game was a huge success, it had many flaws that needed to be addressed and features that had yet to be implemented. One missing feature was a way to locate Pokemon and predict where they would appear. As a hardcore Pokemon fan, I developed a Shiny app, PokeViz, to offer a potential solution to this problem.
Functionality:
The app has three main functions:
- Visualization: The app displays all available Pokemon and their previous spawn locations, which can be used to detect the nests of specific Pokemon.
- Prediction: The app implements a simple k-Nearest Neighbor (kNN) algorithm to predict the rarity of the Pokemon at any specific location (given by longitude and latitude).
- Density and Distribution: The app can generate a density contour plot of selected regions/cities, allowing the user to find the areas within a city that have a higher Pokemon spawn density.
Visualization and Prediction:
The map displays all the Pokemon and their previous spawn locations. The user can set filters to choose the type of Pokemon, the number of Pokemon to be displayed, and the continent of interest. Once the selection is complete, clicking the "Go!" button generates the desired map.
The Pokemon are divided into five classes based on how rarely they appear: Common, Uncommon, Rare, Very Rare, and Super Rare. When the user clicks any location on the map, the longitude and latitude of that location are shown in the top left "Position" panel, and the predicted probability of each Pokemon rarity class near that location is displayed in the top right "Prediction" panel.
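The prediction step can be illustrated with a short sketch. The app itself was built in R with Shiny, so the Python below is only an illustration and not the app's code. The spawn records, the choice of k = 5, the use of great-circle (haversine) distance, and the function names `haversine_km` and `rarity_probabilities` are all assumptions made for this example. It reports the share of each rarity class among the k past spawns nearest to a clicked point.

```python
import math
from collections import Counter

RARITY_CLASSES = ["Common", "Uncommon", "Rare", "Very Rare", "Super Rare"]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def rarity_probabilities(spawns, lat, lon, k=5):
    """Share of each rarity class among the k spawns nearest to (lat, lon).

    `spawns` is a non-empty list of (latitude, longitude, rarity) tuples.
    """
    nearest = sorted(
        spawns, key=lambda s: haversine_km(lat, lon, s[0], s[1])
    )[:k]
    counts = Counter(rarity for _, _, rarity in nearest)
    return {c: counts[c] / len(nearest) for c in RARITY_CLASSES}

# Made-up spawn records around Manhattan, for illustration only.
spawns = [
    (40.7580, -73.9855, "Common"),
    (40.7585, -73.9850, "Common"),
    (40.7590, -73.9860, "Uncommon"),
    (40.7575, -73.9845, "Common"),
    (40.7600, -73.9870, "Rare"),
    (40.7000, -74.0100, "Very Rare"),
    (40.8000, -73.9500, "Super Rare"),
]

# Simulate a click on the map and print the "Prediction" panel values.
probs = rarity_probabilities(spawns, lat=40.7582, lon=-73.9853, k=5)
for rarity, p in probs.items():
    print(f"{rarity}: {p:.2f}")
```

With these made-up records, the five nearest spawns are three Common, one Uncommon, and one Rare, so the two far-away spawns contribute nothing to the estimate.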
Density and Distribution:
Another important feature of the app is the density contour map of all Pokemon spawns in major cities. With this feature, users can look specifically for areas with a higher Pokemon spawn rate within a city.
A contour map is a map drawn with contour lines. On a topographic map, these lines show the different levels of elevation as well as the steepness of the slope. Here, "elevation" stands for Pokemon spawn density. For example, in the map above, the color gradient ranges from red to yellow. The red contour lines mark areas of lower density, and as the line color moves toward yellow, the density gets higher. The brightest yellow indicates the highest density, and thus the most Pokemon spawns.
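To make the idea of density levels concrete, here is a minimal sketch. The source does not say how the app estimates density, so this example uses plain grid binning as a stand-in. The bounding box, the spawn points, and the function names `density_grid` and `contour_level` are hypothetical. It counts spawns per grid cell, then maps each count to a level, where the lowest level would be drawn in red and the highest in yellow.

```python
from collections import Counter

def density_grid(points, lat_min, lat_max, lon_min, lon_max, n_bins=4):
    """Count points per cell of an n_bins x n_bins grid over the bounding box."""
    counts = Counter()
    lat_step = (lat_max - lat_min) / n_bins
    lon_step = (lon_max - lon_min) / n_bins
    for lat, lon in points:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # ignore spawns outside the selected region
        # Clamp so points exactly on the upper edge fall in the last cell.
        row = min(int((lat - lat_min) / lat_step), n_bins - 1)
        col = min(int((lon - lon_min) / lon_step), n_bins - 1)
        counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(n_bins)] for r in range(n_bins)]

def contour_level(count, max_count, n_levels=3):
    """Map a cell count to a level from 0 (lowest) to n_levels - 1 (highest)."""
    if max_count == 0:
        return 0
    return min(int(count / max_count * n_levels), n_levels - 1)

# Made-up spawn coordinates, for illustration only.
points = [
    (40.751, -73.985),
    (40.752, -73.987),
    (40.755, -73.983),
    (40.712, -74.005),
    (40.771, -73.955),
    (40.900, -73.900),  # outside the bounding box, so it is skipped
]

grid = density_grid(points, 40.70, 40.78, -74.02, -73.94, n_bins=4)
max_count = max(max(row) for row in grid)
levels = [[contour_level(c, max_count) for c in row] for row in grid]

for row in grid:
    print(row)
```

A smoother picture would come from a kernel density estimate rather than raw counts, but the reading of the map is the same: cells at the top level correspond to the brightest yellow regions.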
The distribution of each rarity class can be found under the Distribution tab. It shows the frequency of every rarity class for the region shown in the contour map.
Data Source:
The dataset used in this app was uploaded by Kaggle user SemionKorchevskiy and can be downloaded at https://www.kaggle.com/semioniy/predictemall. All data processing was done using R. The code can be found here.
Insights and Future Updates:
This app also gives a sense of the general popularity of Pokemon Go. For instance, North America and Europe show the largest number of Pokemon spawns, while some countries, such as China, show no sign of Pokemon spawning.
As for future directions, more Pokemon data will be added to make the predictor more accurate. In addition, the density contour could be split into five contours with distinguishable colors corresponding to the five rarity classes, which would make it easier for the user to target Pokemon of a specific rarity.
Appendix:
K-Nearest Neighbor Algorithm:
The k-Nearest Neighbor (kNN) algorithm is a non-parametric algorithm that is capable of both classification and regression. It finds the k observations (neighbors) nearest to the input observation. For classification, it outputs a class membership (or the probability of each class) based on the majority vote of the neighbors. For regression, it outputs the average value of the neighbors.
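The two output modes described above can be sketched in a few lines. This is a generic toy illustration, not the app's R code. It assumes Euclidean distance on 2-D points, and the helper names `k_nearest`, `knn_classify`, and `knn_regress` and the sample data are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to `query`."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average target value of the k nearest neighbors."""
    values = k_nearest(train, query, k)
    return sum(values) / len(values)

# Toy training sets: ((x, y), target) pairs.
labeled = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "B"),
           ((5, 5), "B"), ((6, 5), "B")]
numeric = [((0, 0), 1.0), ((0, 1), 2.0), ((1, 0), 3.0), ((5, 5), 10.0)]

label = knn_classify(labeled, (0.2, 0.2), k=3)
value = knn_regress(numeric, (0.2, 0.2), k=3)
print(label, value)
```

Because kNN stores the training data and defers all work to query time, there is no separate fitting step. The main choices are the distance measure and the value of k.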
I chose kNN as the prediction model because the observations in the dataset are location coordinates, which suit a distance-based method well. I also wanted a model capable of multi-class classification, for which kNN is easier to implement and achieves better results than other classification models.