Dota 2: Heroes and Items Selection

Posted on Feb 3, 2017


Contributed by Xu Gao. He is currently enrolled in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from January 9 to March 30, 2017. This post is based on his first class project, Shiny (due in the 4th week of the program). See the app at:

I. Introduction to Dota 2

Dota 2 is a free-to-play multiplayer online battle arena (MOBA) video game developed and published by Valve Corporation.
Dota 2 is played in matches between two teams of five players, with each team occupying its own base on the map. Each of the ten players independently controls a powerful character, known as a hero, with unique abilities and a distinct style of play. During a match, players and their team collect experience points and items for their heroes in order to fight through the opposing team's defenses. A team wins by being the first to destroy a large structure in the opposing team's base, called the Ancient.

Each player controls one of the game's 113 playable heroes, each with its own design, strengths, and weaknesses. Heroes are divided into two primary roles, Carry and Support. Carries, also called "cores", begin each match weak and vulnerable but grow more powerful later in the game, which lets them "carry" their team to victory. Supports generally lack abilities that deal heavy damage and instead have abilities with more functionality and utility that assist their carries.

Each hero has at most 6 item slots, and each item has a different function, so choosing items from a list of 189 is a difficult but interesting problem. Choosing a hero is also hard for everyone who plays this game, especially when you pick after your teammates. Players are free to pick whatever they want, but if they want to win, it is their responsibility to choose the hero that best fits the team.

This R visualization and Shiny project is meant to help rookies choose heroes, and it also provides important information about each hero you choose. I hope it can be truly helpful to those who are still confused about Dota 2.


II. Data Source & Data Cleaning

The data comes from Kaggle. The dataset contains 50,000 ranked ladder matches from the Dota 2 data dump created by OpenDota. It was inspired by the Dota 2 Matches data published by Joe Ramir.

I sampled 20,000 matches from "match.csv" and "players.csv", joined the two tables so that every match recorded in "match.csv" is linked to its rows in "players.csv", and wrote the result to "ql.csv".



Note: V16 indicates whether the player won the game. Hero_id and Item_id are matched against two other tables, "hero_names.csv" and "item_ids.csv".
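The sampling and joining step described above can be sketched as follows. This is a minimal Python illustration, not the project's R code: the column names (match_id, player_slot, radiant_win), the rule that a player_slot below 128 means the Radiant side, and the derived "win" flag standing in for V16 are all assumptions made for the example, and the rows are made up.

```python
import random

# Tiny in-memory stand-ins for "match.csv" and "players.csv".
# Column names and values are hypothetical, chosen only for illustration.
matches = [
    {"match_id": 0, "duration": 2375, "radiant_win": True},
    {"match_id": 1, "duration": 2582, "radiant_win": False},
    {"match_id": 2, "duration": 2716, "radiant_win": False},
    {"match_id": 3, "duration": 3085, "radiant_win": True},
]
players = [
    {"match_id": 0, "player_slot": 0, "hero_id": 86},
    {"match_id": 0, "player_slot": 128, "hero_id": 51},
    {"match_id": 1, "player_slot": 0, "hero_id": 7},
    {"match_id": 3, "player_slot": 129, "hero_id": 11},
    {"match_id": 9, "player_slot": 0, "hero_id": 2},  # no matching match row
]


def sample_and_join(matches, players, n, seed=0):
    """Sample n matches, then keep only player rows that belong to them."""
    rng = random.Random(seed)
    sampled = rng.sample(matches, n)
    by_id = {m["match_id"]: m for m in sampled}
    joined = []
    for p in players:
        m = by_id.get(p["match_id"])
        if m is None:
            continue  # player row has no sampled match, so drop it
        # Assumption: slots below 128 are Radiant, the rest are Dire.
        is_radiant = p["player_slot"] < 128
        # One row per player, with a per-player win flag (the role V16 plays).
        joined.append({**p, **m, "win": is_radiant == m["radiant_win"]})
    return joined


rows = sample_and_join(matches, players, n=4)
```

Each joined row carries both the match fields and the player fields, so later per-hero summaries only need this one table.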



III. A Quick Look at the Dota 2 Dataset

We can start with a first look at the dataset. A Dota player usually cares about which heroes have the best winning ratio or which items are most useful. We can also look at the top 10 most popular heroes or the richest heroes, since gold is a very important resource in the game: it buys the gear that strengthens heroes.

Top 10 Winning Ratio

Top 10 Pick Heroes

Top 10 Gold Per Min

Top 10 Popular Items

From the slider, we can choose how many top heroes or items to display, which helps when starting a new game. For example, people who want to win tend to choose heroes with a high winning ratio; Omniknight, Wraith King, Ursa, Spectre, and Undying are the top 5 choices. However, one interesting finding is that the heroes with the highest winning ratio are not the most popular ones. From the Top 10 Pick Heroes graph, the top 5 heroes are Windranger, Shadow Fiend, Invoker, Earthshaker, and Queen of Pain. The only hero that appears in both top 10 lists is Slardar. People play games for fun, so heroes with a high winning ratio might not be the most exciting to play. A new player usually wants a hero with a high winning ratio that is also enjoyable to play, and Slardar is probably the best choice.
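The two rankings compared above (winning ratio and pick count) and the overlap between them can be computed along these lines. This is a small Python sketch with made-up records, not the app's R code; the function names hero_stats and top_n are hypothetical, and n plays the role of the slider value.

```python
from collections import defaultdict

# Each record is (hero name, won?) for one player in one match.
# These rows are invented for illustration only.
records = [
    ("Slardar", True), ("Slardar", True), ("Slardar", False),
    ("Invoker", True), ("Invoker", False), ("Invoker", False), ("Invoker", False),
    ("Omniknight", True), ("Omniknight", True),
]


def hero_stats(records):
    """Return pick count and winning ratio per hero."""
    picks = defaultdict(int)
    wins = defaultdict(int)
    for hero, won in records:
        picks[hero] += 1
        wins[hero] += int(won)
    return {h: {"picks": picks[h], "win_ratio": wins[h] / picks[h]} for h in picks}


def top_n(stats, key, n):
    """Names of the n heroes with the largest value of the given statistic."""
    return sorted(stats, key=lambda h: stats[h][key], reverse=True)[:n]


stats = hero_stats(records)
top_winners = top_n(stats, "win_ratio", 2)  # n plays the role of the slider
top_picked = top_n(stats, "picks", 2)
in_both = [h for h in top_winners if h in top_picked]
```

With real data, heroes picked only a handful of times can show extreme winning ratios, so a minimum pick count would be a sensible extra filter.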

Gold is used to buy gear in this game. It is earned by killing creeps and enemy heroes and by destroying towers. One position in the team is called the "carry". Carries like to farm gold and buy powerful gear to strengthen themselves, so for them gold per minute (GPM) is a good measure. Alchemist leads all heroes in GPM, followed by Anti-Mage, Shadow Fiend, and Meepo. Among the most popular items, tpscroll, power treads, and blink are the top 3. Tpscroll and blink are very strategic items that can change the course of a game. Power treads suit most heroes because they increase movement speed.
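The GPM and item rankings can be sketched in the same spirit. The rows, the GPM values, and the item spellings below are hypothetical; the only idea taken from the post is that each player row has a GPM value and up to six item slots.

```python
from collections import Counter, defaultdict

# Hypothetical rows: hero, gold per minute, six item slots (None = empty slot).
rows = [
    {"hero": "Alchemist", "gpm": 820,
     "items": ["tpscroll", "power_treads", "blink", None, None, None]},
    {"hero": "Alchemist", "gpm": 760,
     "items": ["tpscroll", "blink", None, None, None, None]},
    {"hero": "Anti-Mage", "gpm": 640,
     "items": ["tpscroll", "power_treads", None, None, None, None]},
    {"hero": "Crystal Maiden", "gpm": 280,
     "items": ["tpscroll", None, None, None, None, None]},
]


def mean_gpm_by_hero(rows):
    """Average gold per minute for each hero."""
    values = defaultdict(list)
    for r in rows:
        values[r["hero"]].append(r["gpm"])
    return {h: sum(v) / len(v) for h, v in values.items()}


def item_popularity(rows):
    """Count how often each item appears across all six slots."""
    counts = Counter()
    for r in rows:
        counts.update(i for i in r["items"] if i is not None)
    return counts


gpm = mean_gpm_by_hero(rows)
richest = sorted(gpm, key=gpm.get, reverse=True)
popular = item_popularity(rows).most_common(3)
```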


IV. Customizing Your Hero Choice & Specific Hero Information

In some game modes, a Dota 2 player has to choose a hero from a limited pool. This mode is called Random Draft (RD). In this scenario, it is important to compare the heroes in the pool to find the one with the best winning ratio. This function is under the "Customized Heroes" menu item on the left of the Shiny Dashboard.

For example, suppose the player wants to choose a carry, or the team needs the player to choose one. The teammates list some choices: Anti-Mage, Morphling, Sven, Luna, etc. The player can select these heroes in the chooser box and compare their winning ratios.



This graph compares several widely used carry heroes. Wraith King has the highest winning ratio. Luna and Sven come next with similar performance, while Faceless Void and Morphling perform worst and sit on the lowest step of the chart.

Once you decide on a hero, besides the spell information listed in the Dota 2 interface, it is also important to know when in the game this hero is most dominant. Players may also want a full picture of the hero's GPM, XPM, winning ratio, and kill, death, and assist statistics.

This is under "Pick one hero to see". On this page, you can choose any hero and check all of its information.

We choose Luna as an example. The table shows average GPM, average kills per minute, average deaths per minute, average assists per minute, and winning ratio, along with the hero's rank on each measure. The graph shows when this hero has the best winning ratio. For Luna, players should try to end the game within 20 to 30 minutes. If the game has already reached 20 minutes, you have to be aggressive from that point on.
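One way to find the time window in which a hero wins most often is to bucket that hero's matches by duration and compute the winning ratio per bucket. The Python sketch below assumes 10-minute buckets and durations in seconds; both are illustrative choices, and the games are made up rather than taken from the dataset.

```python
from collections import defaultdict

# Hypothetical (duration in seconds, won?) pairs for a single hero.
games = [
    (1150, False), (1500, True), (1700, True), (1790, False),
    (2000, True), (2100, False), (2400, False), (3100, False),
]


def win_ratio_by_bucket(games, bucket_minutes=10):
    """Winning ratio keyed by the starting minute of each duration bucket."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for seconds, won in games:
        # Floor the duration in minutes to the start of its bucket.
        start = (seconds // 60) // bucket_minutes * bucket_minutes
        total[start] += 1
        wins[start] += int(won)
    return {start: wins[start] / total[start] for start in sorted(total)}


ratios = win_ratio_by_bucket(games)
best_start = max(ratios, key=ratios.get)  # start minute of the best bucket
```

Here best_start is the first minute of the bucket with the highest winning ratio, so a value of 20 corresponds to the 20 to 30 minute window.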


V. Simulation of Hero Picks

This Shiny app also provides a very cool function that helps you choose a hero after your teammates have picked. The algorithm is quite simple. I assume that the total gold per minute and experience per minute for the whole team are fixed, which is roughly true if both teams play the same style, aggressive or defensive. If your teammates choose four support heroes, who do not need much gold or experience, you will have plenty of room to "farm" gold, so choosing one of the richest "cores" might be a good idea. But what if your teammates have already used up the available gold? Then you need to choose a support hero, who needs few resources and helps the "core" dominate the game.

The algorithm deserves a little more detail. Since a real game is more volatile than this assumption, I allow the fixed team gold and team experience to range from 90% to 110%, and rank all heroes that fit within these ranges.
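A minimal Python sketch of this idea follows. Everything numeric in it is hypothetical: the per-hero averages, the team budgets TEAM_GPM and TEAM_XPM, and the function name suggest_last_pick. The post does not say how the fitting heroes are ranked, so ordering them by winning ratio is an assumption of this sketch, not the app's confirmed behaviour.

```python
# Hypothetical per-hero averages: (gold per min, experience per min, win ratio).
HEROES = {
    "Alchemist": (700, 600, 0.48),
    "Shadow Fiend": (520, 560, 0.47),
    "Luna": (580, 520, 0.52),
    "Crystal Maiden": (260, 300, 0.51),
    "Lion": (300, 330, 0.46),
    "Omniknight": (320, 380, 0.58),
    "Tidehunter": (350, 420, 0.49),
}

# Assumed fixed team budgets; the post does not state the actual values.
TEAM_GPM = 2000
TEAM_XPM = 2100


def suggest_last_pick(teammates, low=0.9, high=1.1):
    """Heroes whose averages keep the team within 90%-110% of both budgets."""
    used_gpm = sum(HEROES[h][0] for h in teammates)
    used_xpm = sum(HEROES[h][1] for h in teammates)
    fits = []
    for hero, (gpm, xpm, win_ratio) in HEROES.items():
        if hero in teammates:
            continue
        gpm_ok = low * TEAM_GPM <= used_gpm + gpm <= high * TEAM_GPM
        xpm_ok = low * TEAM_XPM <= used_xpm + xpm <= high * TEAM_XPM
        if gpm_ok and xpm_ok:
            fits.append((win_ratio, hero))
    # Assumption: rank the fitting heroes by winning ratio, best first.
    return [hero for _, hero in sorted(fits, reverse=True)]


# Four low-resource teammates leave room for a core.
after_supports = suggest_last_pick(
    ["Crystal Maiden", "Lion", "Omniknight", "Tidehunter"])
# Two greedy cores already on the team leave room only for a cheap support.
after_cores = suggest_last_pick(["Alchemist", "Luna", "Lion", "Tidehunter"])
```

With these made-up numbers, a team of four supports gets core suggestions, while a team that already holds two greedy cores is pointed to a low-resource support, which mirrors the reasoning in the text.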

Let's use an example:

This example is Game 1 of the Grand Finals of the Dota 2 Asia Championships 2015, EG vs. VG. VG picked Rubick, Ember Spirit, Leshrac, Tidehunter, and Shadow Fiend. EG picked Phoenix, Enigma, Huskar, Treant Protector, and Templar Assassin. The following graphs show the simulation results from the Shiny app.




VG Pick


EG Pick


From these two simulations, we find that for VG, Shadow Fiend is ranked No. 6. Considering the patch and the enemy picks, this can be seen as a good pick, because Shadow Fiend can use the remaining gold and experience available to the team. On the other side, EG picked Phoenix, Enigma, Huskar, and Treant Protector. The final pick suggested by our simulation is Templar Assassin, which is exactly the hero they picked in this game.


VI. Summary

Dota 2 is a great and diverse game. Players need to weigh a lot of information to decide which heroes to choose and which items to use. This data visualisation project surfaces many important details about hero picks, which shows that Dota 2 data is useful. In the future, I will analyse the influence of in-game chat. Sometimes putting pressure on teammates helps, but sometimes we need to encourage our friends so that they play better in a team battle or during regular game time. I hope to find the best way to communicate with teammates.


Source Code:



About Author


Xu is a Master of Financial Engineering student at New York University. He received a Bachelor of Economics from the University of International Business and Economics. Xu has solid experience in machine learning and pair trading systems. Besides, he...