How to Make the Best Board Game
Introduction - Data Set Overview
For this project, I built a data set based on the ranked games on BoardGameGeek. BoardGameGeek is a popular board game forum where users can rank and discuss board games. The site also covers the categories, mechanics, and designers of the games, as well as expansions to board games. However, this analysis does not look deeply into the expansions. Each board game has a page similar to the one shown in figure 1.
This game page exposes information for ratings, statistics, general game information, and more. From pages like this it is possible to collect a wealth of information about many different games, and patterns may start to emerge.
To answer the question of what makes a game popular, it seemed necessary to look at both the top and bottom games. For this, I used the site's search function with no parameters, and then sorted the resulting list by rank, both increasing (figure 2) and decreasing. Note that a game ranked #1 is considered better than a game ranked #100.
Collecting games by increasing rank gave me access to 50 pages, each holding 100 games. The exception was the final page, which held only 99 games, giving a total of 4999 ranked games. However, when searching by decreasing rank, the site returned many games with a rank of N/A. On BoardGameGeek, ranking is based on the site's own metric, Geek Rating, which depends on a number of factors, including the number of votes. If too few people have voted, the Geek Rating cannot be calculated and the game cannot be ranked. For this reason, I was limited to the top 4999 games. Ultimately this was not an issue: given the sluggishness of Selenium and my time constraints, I would not have been able to scrape data for many more games.
As mentioned previously, the data set contained data for 4999 ranked games. For each game, the scraper attempted to collect:
- Game Id
- Game Page
- Year Published
- BoardGameGeek Ranking
- Number of Votes
- Geek Rating
- Average User Rating
- User Rating Standard Deviation
- Number of Comments
- Number of Fans
- Game Mechanics
- Game Categories
- Minimum Players
- Maximum Players
- Best Players
- Minimum Age
- Minimum Playtime
- Maximum Playtime
- Number of Expansions
- Number of Plays
- Number Owned
- Number Previously Owned
Not all games had information for all of these features. All of them did at least have game id, name, page, geek rating, number of votes, average user rating, user rating standard deviation, number of comments, number of fans, number of expansions, number of plays, number owned, and number previously owned.
Most games also had data for the remaining features, and the values most often missing were the "maximum" ones. However, due to a bug in the code, the "best players" feature was missing for all games. If data was missing for any feature, it was replaced by an empty string. As can be seen in figure 3, most of the missing data is in max playtime. This means that most games either didn't have a maximum playtime listed, or the scraper failed to collect that data (as it did with best players). Just over 5% of games are missing a minimum age.
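The missing-data share per feature, as plotted in figure 3, could be computed along the lines of the sketch below. It assumes each scraped game is stored as a dictionary with an empty string for any missing value, as described above. The field names and the four sample rows are hypothetical and are not taken from the actual scraper output.

```python
def missing_share(rows, features):
    """Return, for each feature, the fraction of rows where it is an empty string."""
    total = len(rows)
    if total == 0:
        return {feature: 0.0 for feature in features}
    return {
        feature: sum(1 for row in rows if row.get(feature, "") == "") / total
        for feature in features
    }


# Hypothetical rows: the keys are illustrative names, not the scraper's real ones.
games = [
    {"min_age": "12", "max_playtime": "", "best_players": ""},
    {"min_age": "", "max_playtime": "90", "best_players": ""},
    {"min_age": "10", "max_playtime": "", "best_players": ""},
    {"min_age": "8", "max_playtime": "60", "best_players": ""},
]

shares = missing_share(games, ["min_age", "max_playtime", "best_players"])

# Print the features from most to least incomplete.
for feature, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{feature}: {share:.0%} missing")
```

A feature that is missing everywhere, like best players here, shows up at 100% and can be dropped before any further analysis.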
To see what did and didn't make a good game, I began looking at how different game features correlated with game rank. Figure 4 shows the correlation with game features such as playtime and minimum age. These are generally values suggested by the game's producer. Average user rating was included in this plot to see how these features also affect it.
As can be seen in figure 4, features such as minimum age and minimum playtime have almost no correlation with geek rating, average rating, or rank. Number of expansions is somewhat positively correlated, but this is most likely a case of reverse causation: the more popular a game is, the more expansions it will receive.
Next I looked at the correlation between user input and game rank. User input covers features that are more in line with user opinions. This is both explicit, through their ratings, and implicit, through how they interact and share ideas on the game page. For figure 5, I included average user rating in the correlation, since this figure looks at the different ways users voiced their thoughts on the game rather than at what came from the producers.
Again, geek rating and rank are tightly correlated (as expected, since geek rating determines rank). However, most other attributes are not closely correlated with rank. The closest are number of votes and number of comments. One interesting thing to note here is that average rating is not a huge contributor towards geek rating and rank. While it may be nice to have a high user rating, it may be better to have a more active player community that posts a lot of comments.
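As a rough illustration of the kind of correlation behind figure 4, the sketch below computes a Pearson correlation coefficient between rank and two features. The six data points are invented for the example and the variable names are hypothetical. The original analysis may have used a different tool or a different correlation measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample: rank 1 is the best game in this toy list.
rank = [1, 2, 3, 4, 5, 6]
expansions = [30, 12, 9, 4, 2, 0]
min_age = [12, 8, 14, 10, 12, 8]

r_expansions = pearson(rank, expansions)
r_min_age = pearson(rank, min_age)

print(f"rank vs expansions: {r_expansions:+.2f}")
print(f"rank vs min age:    {r_min_age:+.2f}")
```

Because a lower rank number is better, a feature that goes along with better games produces a negative coefficient against rank and a positive one against geek rating.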
Looking at the plots for some of these features, such as number of fans in figure 6, we see that there is a long tail up to high values.
However, if we look at log10(# of Fans) instead, the graph becomes much clearer. In figure 7, the histogram of log10(# of Fans) even takes on a roughly normal shape.
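One way to put a number on this change is sample skewness, which is close to zero for a symmetric distribution and large and positive for a long right tail. The sketch below is only an illustration of that idea: the fan counts are made up, and skipping games with zero fans before taking the logarithm is an assumption, not something the original analysis states.

```python
import math


def skewness(values):
    """Sample skewness: third central moment divided by variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up fan counts with a long right tail.
fans = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]

# log10 is undefined for zero, so non-positive counts are skipped (an assumption).
log_fans = [math.log10(v) for v in fans if v > 0]

print(f"skewness of raw counts:   {skewness(fans):.2f}")
print(f"skewness of log10 counts: {skewness(log_fans):.2f}")
```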
Similar graphs and results are produced when log10 is applied to comments and votes as well. These new values can then be run again through correlation to produce the plot in figure 8. Average rating follows a roughly normal distribution as well, and does not require a transformation.
Comparing this back to figure 5, we see that when log10 is applied, number of fans, number of votes, and number of comments become much more strongly correlated with rank. Applying log10 to these values also helps clarify the trends. Compare figures 9 and 10, where original values are used in figure 9 and log10 values are used in figure 10. When looking at the graphs below, keep in mind that a lower rank is better.
Figures 9 and 10 clearly show that better-ranked games get both more votes and more comments. This is in line with the log10 correlation plot in figure 8. These plots suggest that log10 of comments, votes, and fans are good indicators of how high a game will rank on BoardGameGeek. This stands to reason: a highly ranked game has a lot of fans and gets lots of comments and votes. Even so, these features still provide extra, useful information, because rating, and especially rank, are not normally distributed, whereas the log10 of fans, comments, and votes are very close to normal distributions. We can then use these, along with non-numerical features, to start to look at what makes games popular.
Also included in the game data that was scraped is information like game categories, mechanics, and designers. This information provides the best visibility into what does and doesn't make for popular games, as these are descriptors of the actual attributes of the games. Categories cover things like "Politics," "Economy," "WW II," and "Horror." Mechanics are more about "Dice Rolling," "Variable Player Powers," and "Trading." Not every game has listings for these, but more than 95% of the games collected do, which should be more than sufficient to look for trends in both categories and mechanics.
Since transforming certain features by log10 gives approximately normally distributed values (figure 7), we can run t-tests on those values for categories and mechanics and see which ones are more likely to yield more fans, comments, or votes. As before, games that have more fans, comments, or votes tend to rank better.
Four t-tests were run for every category, one each for average rating, fans, votes, and comments. In each case, the null hypothesis stated that games in the given category had the same average value for the tested feature as games outside it. For example, if the "Sci-Fi" category was being tested, the null hypothesis would state that the average log10(number of fans) for all games with "Sci-Fi" listed in their categories is equal to the average log10(number of fans) for all games without "Sci-Fi" listed in their categories. In all cases, the alternative hypothesis was that the average for games in the category was larger than the average for the other games. This would be like saying that "Sci-Fi" games have more fans on average than other games.
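A minimal sketch of one such one-sided test is shown below, using only the Python standard library. It computes a Welch-style t statistic and approximates the p-value with a standard normal distribution, which is only reasonable for large groups. The original tests may have used a different t-test variant and exact t-distribution p-values (for example via SciPy's `ttest_ind`). The game records and field names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def split_by_category(games, category, feature):
    """Split log10(feature) values into games with and without the category."""
    with_cat, without = [], []
    for game in games:
        if game[feature] <= 0:
            continue  # log10 is undefined here; skipping is an assumption
        value = math.log10(game[feature])
        (with_cat if category in game["categories"] else without).append(value)
    return with_cat, without


def one_sided_welch(sample, rest):
    """Test H1: mean(sample) > mean(rest). Returns (t, approximate p-value)."""
    std_err = math.sqrt(variance(sample) / len(sample) + variance(rest) / len(rest))
    t = (mean(sample) - mean(rest)) / std_err
    # Large-sample approximation: standard normal instead of the t distribution.
    p = 1.0 - NormalDist().cdf(t)
    return t, p


# Hypothetical records; a real run would use the full scraped data set.
games = [
    {"categories": ["Sci-Fi", "Economic"], "fans": 900},
    {"categories": ["Sci-Fi"], "fans": 1200},
    {"categories": ["Sci-Fi", "Horror"], "fans": 700},
    {"categories": ["Economic"], "fans": 40},
    {"categories": ["Horror"], "fans": 60},
    {"categories": ["Medieval"], "fans": 25},
    {"categories": [], "fans": 80},
]

with_cat, without = split_by_category(games, "Sci-Fi", "fans")
t_stat, p_value = one_sided_welch(with_cat, without)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.4f}")
```

The toy groups are far too small for the normal approximation to be trusted; they only show the mechanics of splitting the games by category and testing in one direction.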
Test Results: Categories
The table below shows the top ten categories that had the lowest p-values from their respective tests. The p-value is the probability of seeing a difference at least as large as the one observed if the null hypothesis were true. Typically, a p-value below .05 is required in order to reject the null hypothesis at the 5% significance level. For the categories below, we reject the null hypothesis at that level and conclude that the category yields a higher average rating, number of fans, number of votes, or number of comments than other categories. When looking at the top (or bottom) ten, order isn't hugely important; what matters is being in the top ten. The differences in placement here are not statistically significant.
| Average Rating | Number of Fans | Number of Votes | Number of Comments |
|---|---|---|---|
| World War II | Fantasy | Economic | City Building |
| American Civil War | Horror | Medieval | Medieval |
| Civil War | Space Exploration | Humor | Territory Building |
The table above starts to show some interesting things. War games tend to yield higher average ratings. Thematic games, on the other hand, tend to attract more fans. Finally, users are more likely to vote and comment on games that have some sort of constructive aspect to them. Since votes and comments correlate strongly with a better rank (and average rating, interestingly, less so), games with constructive themes will likely rank better. Especially interesting is that war games are not likely to yield more votes or comments, and so may not achieve a better rank.
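The selection step behind a table like this, keeping the categories whose p-values fall below .05 and listing the ten lowest, might look like the sketch below. The category names and p-values are placeholders invented for the example, not results from the analysis.

```python
def top_significant(p_values, alpha=0.05, limit=10):
    """Return up to `limit` (name, p-value) pairs below alpha, lowest p-value first."""
    significant = [(name, p) for name, p in p_values.items() if p < alpha]
    significant.sort(key=lambda item: item[1])
    return significant[:limit]


# Placeholder p-values for one feature (for example, log10 of number of fans).
p_values = {
    "Category A": 0.001,
    "Category B": 0.2,
    "Category C": 0.03,
    "Category D": 0.0004,
}

for name, p in top_significant(p_values):
    print(f"{name}: p = {p}")
```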
Test Results: Mechanics
Just as with the categories table above, t-tests produced the following table for game mechanics.
| Average Rating | Number of Fans | Number of Votes | Number of Comments |
|---|---|---|---|
| Hand Management | Variable Player Powers | Set Collection | Set Collection |
| Variable Player Powers | Dice Rolling | Hand Management | Hand Management |
| Set Collection | Modular Board | Card Drafting | Area Control/Area Influence |
| Player Elimination | Grid Movement | Variable Player Powers | Auction/Bidding |
| Card Drafting | Co-operative Play | Area Control/Area Influence | Card Drafting |
| Grid Movement | Action Point Allowance System | Player Elimination | Variable Player Powers |
| Simultaneous Action Selection | Deck/Pool Building | Auction/Bidding | Player Elimination |
| Worker Placement | Card Drafting | Grid Movement | Modular Board |
| Co-Operative Play | Area Movement | Modular Board | Tile Placement |
| Variable Phase Order | Worker Placement | Worker Placement | Grid Movement |
Again, we see that number of votes and number of comments share very similar results. Some results show up in all sets. However, unlike with categories, game mechanics are much more spread out and don't really follow a trend for the top values.
That being said, the bottom values do follow an interesting trend. Across all tests, performance-based games did the worst. The people who visit BoardGameGeek do not like to act or sing in their games.
Test Results: Designers
Unfortunately, there are not enough data points for each designer to perform accurate t-tests, so I was unable to learn anything about which designers are and are not popular.
Based on the analysis performed, two of the major indicators of a game's rank on BoardGameGeek are the number of votes it has gotten and the number of comments it has gotten. Likely this is because people who enjoy a game want to get online and talk about it, and express how much they enjoyed it. If they don't like a game, they are probably more likely to simply never play it again.
Using the knowledge that votes and comments are indicators, and the fact that they are more or less normally distributed when transformed, we can use the number of votes and comments to see which categories and mechanics are doing the best. Games with some sort of constructive aspect (building cities, businesses, etc.) tend to yield more votes and comments. War games, while earning a higher average rating, do not. While it may seem best to aim for games that yield higher average ratings, average rating isn't as highly correlated with rank on BoardGameGeek. While the top game mechanics are varied, one can still learn something by looking at the bottom mechanics. Performance-based games should be avoided. They don't increase the likelihood of higher ratings, fans, votes, or comments.
Going forward I'd like to look at more games. I was only able to scrape about 5000 games, and I think I could learn more from more games. I'd also like to look more into the connection between votes and comments. In all tests I performed, they were very tightly correlated, and I think it would be interesting to find out more. Finally, I want to run more varied tests and see how much that changes my results, so I can get a higher confidence in and understanding of what makes for good categories and mechanics in games.