Secret to Winning a League of Legends Game
As a League of Legends fan and a data scientist, I never miss a chance to combine the two things I love. In this project, League of Legends game data was collected with a well-structured scraping framework to support further analysis and exploration. About 35,000 rows of match data from more than 400 players were scraped. The dataset consists of the original data plus features engineered from my own gaming experience, and it covers each player's information, her or his in-game performance statistics, and so on. With this fairly informative dataset, I not only built a model that predicts game results, but also made a pre-game strategy analysis function and an automatic "break-up" dating reminder function that helps users build the romantic relationship they want and find the right person sooner.
The data source website: na.op.gg
Data Structure and Scraping
There are 10 players in each game, and they form 2 teams that fight each other. The website provides almost all player-level information for every game a player played over the past 2 months.
Figure 1 - game information sample from na.op.gg
Don't panic if you are not familiar with these game terms. Just imagine that we are predicting the result of a fight between 2 teams of people. For each person in this fight, let's say we examine the following features (speed, strength and intelligence):
Figure 2 - table of a fighter's tests
If the number of tests is large enough, we can confidently use each person's average values as her or his 'fight power', combine these into the 'fight power' of each team, and finally use a forecasting model to predict the result of a League of Legends game.
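A minimal sketch of this averaging idea, assuming each test is a dict of numeric scores and that a team's 'fight power' is the mean of its members' averages (that aggregation rule is an assumption; for equal-sized teams, summing would rank teams the same way). The difference between the two teams' powers is what a forecasting model could then take as input. All function and fighter names here are hypothetical.

```python
from statistics import mean


def fight_power(tests):
    """Average each attribute (e.g. speed, strength, intelligence) over one fighter's tests."""
    return {key: mean(test[key] for test in tests) for key in tests[0]}


def team_power(team):
    """Team 'fight power': the mean of the members' average attributes (assumed rule)."""
    powers = [fight_power(tests) for tests in team]
    return {key: mean(p[key] for p in powers) for key in powers[0]}


def power_gap(team_a, team_b):
    """Model input: team A's fight power minus team B's, attribute by attribute."""
    a, b = team_power(team_a), team_power(team_b)
    return {key: a[key] - b[key] for key in a}


# Hypothetical test results for three fighters.
alice = [{"speed": 6, "strength": 4, "intelligence": 8},
         {"speed": 8, "strength": 6, "intelligence": 6}]
bob = [{"speed": 5, "strength": 5, "intelligence": 5}]
carol = [{"speed": 4, "strength": 7, "intelligence": 6}]

print(power_gap([alice, bob], [carol]))
```

A positive value means team A is stronger on that attribute; in the game setting, the "tests" become a player's past games and the attributes become per-game statistics.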
To scrape the website, the Python package Selenium was used, because the website requires visitors to click the 'game details' button to display the full information of a game. Selenium simulates a real user's behavior, so we can access content that could not otherwise be scraped with packages like Beautiful Soup.
The scraping process starts by visiting a player's page, and then:
1) Refresh the information by clicking the 'renew' button.
2) Choose the 'normal' game type from the drop-down list.
3) Expand all collapsed game tabs so that all the information in the table can be scraped.
4) Open another web page from which we can scrape each player's champion preferences.
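The four steps above could be scripted roughly as follows. This is a sketch, not the project's actual scraper: the CSS selectors are hypothetical placeholders (op.gg's real markup differs and changes over time), and the function relies only on standard Selenium WebDriver methods (get, find_element, find_elements, click, text, page_source), so any Selenium driver can be passed in.

```python
import time

# Selenium's By.CSS_SELECTOR constant is the plain string "css selector";
# using the literal keeps this sketch importable without Selenium installed.
CSS = "css selector"

# HYPOTHETICAL selectors: placeholders, not op.gg's real class names.
SELECTORS = {
    "renew": "button.renew",
    "game_type_menu": "div.game-type-menu",
    "normal_option": "li.game-type-normal",
    "details": "button.game-details",
    "game_rows": "div.game-item",
}


def scrape_player(driver, player_url, champions_url, pause=2.0):
    """Run steps 1) to 4) for one player with a Selenium WebDriver."""
    driver.get(player_url)

    # 1) Refresh the player's records with the 'renew' button.
    driver.find_element(CSS, SELECTORS["renew"]).click()
    time.sleep(pause)  # crude wait for the page to reload

    # 2) Open the drop-down list and pick the 'normal' game type.
    driver.find_element(CSS, SELECTORS["game_type_menu"]).click()
    driver.find_element(CSS, SELECTORS["normal_option"]).click()
    time.sleep(pause)

    # 3) Expand every game tab so its full details are rendered, then read them.
    for button in driver.find_elements(CSS, SELECTORS["details"]):
        button.click()
    games = [row.text for row in driver.find_elements(CSS, SELECTORS["game_rows"])]

    # 4) Open the champion-preference page and keep its raw HTML for parsing.
    driver.get(champions_url)
    return games, driver.page_source
```

With Selenium installed, one would create a driver such as webdriver.Chrome() from selenium.webdriver, pass it to scrape_player together with the two page URLs, and call driver.quit() at the end. Explicit waits (Selenium's WebDriverWait) are usually more robust than the fixed sleeps used here.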
Figure 3 - the tricky parts of scraping
For each player, about 80 game records are collected to estimate her or his skill level. For a 10-player game, it takes about 35 to scrape, after I optimized the script to collect data efficiently and made it flexible enough to overcome problems caused by the website's complex HTML structure.
To make better predictions, four features were created from the original dataset based on my understanding of the game:
Champion Win Rate: The player's win rate when she or he plays a certain champion, which brings the analysis down to the champion level.
Champion Top Number: The website keeps a list for each player indicating how well she or he does with each champion. By reordering this list, I got a feature that precisely captures how good a player is with a certain champion.
Game Frequency: How many games a player played during a period. This feature was created based on the belief that a player has to keep playing a champion to stay familiar with it, just as you have to keep programming to stay good at it.
Champion Frequency: How many games a player played with a certain champion during a period.
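Three of these features can be derived directly from per-game records. The sketch below assumes a hypothetical Match record holding the player, the champion played, the game date and the result, and computes Champion Win Rate, Game Frequency and Champion Frequency for every (player, champion) pair within a date window. Champion Top Number is left out because it comes from reordering the website's own per-player list rather than from match rows.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Match:
    """One player's record in one game (hypothetical schema)."""
    player: str
    champion: str
    played_on: date
    won: bool


def engineer_features(matches, start, end):
    """Per (player, champion) features over games played in [start, end]."""
    window = [m for m in matches if start <= m.played_on <= end]
    games = Counter(m.player for m in window)
    champ_games = Counter((m.player, m.champion) for m in window)
    champ_wins = Counter((m.player, m.champion) for m in window if m.won)

    features = {}
    for (player, champion), n in champ_games.items():
        features[(player, champion)] = {
            "champion_win_rate": champ_wins[(player, champion)] / n,
            "game_frequency": games[player],   # all games by this player
            "champion_frequency": n,           # games on this champion
        }
    return features


matches = [
    Match("ann", "Ahri", date(2020, 3, 2), True),
    Match("ann", "Ahri", date(2020, 3, 5), False),
    Match("ann", "Lux", date(2020, 3, 6), True),
    Match("ann", "Ahri", date(2019, 12, 1), True),  # outside the window
]
features = engineer_features(matches, date(2020, 3, 1), date(2020, 4, 30))
print(features[("ann", "Ahri")])
# → {'champion_win_rate': 0.5, 'game_frequency': 3, 'champion_frequency': 2}
```

In a game-level model, these per-player values would then be averaged or compared across the two teams.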
Figure 4 - summary of the logistic regression model
Figure 4 illustrates the importance of each feature. Champion Win Rate is the most significant feature for predicting the game result, beating all the original features. Another engineered feature, Champion Frequency, also has strong predictive power. We can conclude that predicting the game result at the champion level (drilling the features down to the champion level) yields a strong model. Applied to the test data, this model reaches 86.1% accuracy.
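For readers curious about what sits behind a summary like Figure 4, here is a dependency-free sketch of logistic regression trained by batch gradient descent. It illustrates the technique only and is not the project's actual model: the toy rows are made up, with the first column standing in for a team-level Champion Win Rate gap and the second for a noisier feature.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Minimise the mean log-loss with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """1 = team wins, 0 = team loses."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    """Share of rows whose predicted result matches the real one."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Made-up rows: [win-rate gap, noisy feature] -> game result.
X = [[3, 1], [2, -2], [1, 3], [-1, 2], [-2, -1], [-3, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(w, b, accuracy(w, b, X, y))
```

On real data, the model would be fitted on a training split and its accuracy reported on a held-out test split; comparing coefficient sizes as "importance" is only meaningful when the features are on comparable scales.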
Have More Fun
Pre-game strategy analysis
With the model, we know what matters for winning a League of Legends normal game, and by scraping we can get the opponents' information before the real game starts. So we will know which opponent to focus on, and knowing our enemies better lets us make wiser decisions.
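One way to turn the model into a pre-game checklist is to score each opponent with per-feature weights and focus on the highest scores. This is a simple heuristic, assuming each opponent's features have already been scraped into a dict; the weights and numbers below are made up for illustration and are not the values behind Figure 4.

```python
def threat_scores(opponents, weights):
    """Weighted sum of each opponent's features (a simple heuristic)."""
    return {
        name: sum(weight * feats[feature] for feature, weight in weights.items())
        for name, feats in opponents.items()
    }


def focus_order(opponents, weights):
    """Opponents sorted from most to least threatening."""
    scores = threat_scores(opponents, weights)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights and scraped opponent features.
weights = {"champion_win_rate": 2.0, "champion_frequency": 0.05}
opponents = {
    "top": {"champion_win_rate": 0.45, "champion_frequency": 5},
    "jungle": {"champion_win_rate": 0.60, "champion_frequency": 30},
    "mid": {"champion_win_rate": 0.70, "champion_frequency": 20},
}
print(focus_order(opponents, weights))  # → ['jungle', 'mid', 'top']
```

Here the jungler ranks first despite a lower win rate than the mid laner, because she or he has played that champion far more often.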
Note that there is a button named 'Living Game' next to the refresh button, which lets users check a player's current status, and we scrapers can play with it too.
Using the Python googlemaps package, we can find out how long it takes to travel from one place to another. Then, with another package that sends e-mails for us, we can automatically send date reminders.
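The two building blocks could look like this. The googlemaps client's distance_matrix call returns a nested dict whose first element holds the travel duration in seconds; for sending, this sketch swaps in Python's standard email and smtplib modules, since the original does not name its e-mail package. The API key, addresses and SMTP host in the comments are placeholders.

```python
from email.message import EmailMessage


def travel_seconds(matrix):
    """Travel time in seconds from a Distance Matrix response (one origin, one destination)."""
    element = matrix["rows"][0]["elements"][0]
    if element["status"] != "OK":
        raise ValueError(f"no route found: {element['status']}")
    return element["duration"]["value"]


def build_reminder(sender, recipient, subject, body):
    """Assemble a plain-text e-mail ready to hand to smtplib."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Live usage (needs network access, an API key and an SMTP account; placeholders):
#   import googlemaps, smtplib
#   gmaps = googlemaps.Client(key="YOUR_API_KEY")
#   matrix = gmaps.distance_matrix("origin address", "restaurant address", mode="driving")
#   msg = build_reminder("bot@example.com", "boyfriend@example.com", "Date reminder", "...")
#   with smtplib.SMTP_SSL("smtp.example.com", 465) as smtp:
#       smtp.login("bot@example.com", "APP_PASSWORD")
#       smtp.send_message(msg)
```

Keeping the response parsing and message building as pure functions means they can be checked offline with a sample response before any real API call or e-mail is made.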
Let's say a girl is going to have dinner with her game-addict boyfriend at 5:00, and it takes 1 hour to get to the restaurant. She gives the function the time of the date and the destination. At 3:20, my function finds that her boyfriend is still in a game, so the following e-mail is sent:
It's 3:20, and YOU START A GAME!
the average length of lol game is 40 mins
you have 50% chance to be dumped. so quit.
or date a girl who doesn't know data science next time 🙂
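The arithmetic behind that e-mail is simple: with a 5:00 dinner and 1 hour of travel, the latest departure is 4:00, and a game started at 3:20 is expected to end at 3:20 + 40 minutes = 4:00, right at the deadline. If game lengths are roughly symmetric around the 40-minute average, such a game runs over about half the time, which fits the 50% in the e-mail. The sketch below encodes the check, assuming the in-game status has already been read from the 'Living Game' button; the function name and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta

AVERAGE_GAME = timedelta(minutes=40)  # average game length quoted in the e-mail


def check_date(now, date_time, travel, in_game, avg_game=AVERAGE_GAME):
    """Return a reminder body if an ongoing game threatens the date, else None."""
    if not in_game:
        return None
    must_leave = date_time - travel
    expected_end = now + avg_game
    if expected_end < must_leave:
        return None  # the game should finish with time to spare
    clock = f"{now.hour % 12 or 12}:{now.minute:02d}"
    minutes = int(avg_game.total_seconds() // 60)
    return (
        f"It's {clock}, and YOU START A GAME!\n"
        f"the average length of lol game is {minutes} mins\n"
        "you may miss your date. so quit."
    )


date_time = datetime(2021, 5, 1, 17, 0)  # hypothetical day, 5:00 dinner
travel = timedelta(hours=1)              # one hour to reach the restaurant
print(check_date(datetime(2021, 5, 1, 15, 20), date_time, travel, in_game=True))
```

Run every few minutes, this check stays silent while the boyfriend is idle or has plenty of time, and only fires once an ongoing game is expected to eat into the travel time.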
We can have tons of fun with scraping, can't we?