Webscraping the Steam Game Platform
What is Steam?
Steam is a digital distribution platform for PC gaming that was created by Valve Software in 2003. It has grown over the years and has now become the largest PC platform for selling games. It has 150 million registered users and 18.5 million concurrent users. If you are a PC gamer, odds are good that you look for games to play by using the Steam platform.
The goal of this project is to scrape data associated with all of the games available for purchase through the Steam platform and to use this data to gain insights into the growth of the platform as well as the popularity of different types of games.
Description of Data to be Scraped
The data collected for each game includes the title, developer, price, description of the game, total number of reviews, percentage of reviews that are positive, release date, and the category of the game as defined by users. The user-defined categories are referred to as tags by Steam, and they provide some interesting information about the game.
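To make the list of fields concrete, here is a minimal sketch of how one scraped game could be represented. The class name, field names, and sample values are my own illustrative assumptions, not the structure used in the original scraper.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class GameRecord:
    """One scraped game. All names here are hypothetical."""
    title: str
    developer: str
    original_price: Optional[float]   # list price in USD
    sale_price: Optional[float]       # None when the game is not on sale
    description: str
    total_reviews: int
    percent_positive: Optional[int]   # None when Steam shows no score yet
    release_date: str
    tags: List[str] = field(default_factory=list)  # user-defined tags


# Made-up sample values, for illustration only.
example = GameRecord(
    title="Example Game",
    developer="Example Studio",
    original_price=9.99,
    sale_price=None,
    description="A short blurb written by the developer.",
    total_reviews=120,
    percent_positive=87,
    release_date="2017-05-01",
    tags=["Indie", "Action"],
)

# asdict() turns the record into a plain dict, ready to write to CSV or JSON.
row = asdict(example)
```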
While scraping, I encountered a few challenges. First, some games are not yet available for sale, and those games were skipped. Second, some games are on sale, so there are two prices associated with the game: the original price and the current sale price. Both of these prices were scraped. Finally, some games have fewer than ten reviews, so there is not yet a percent positive number for the game. For those games I set the percent positive to N/A.
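The price and review edge cases described above could be handled along these lines. This is a sketch under assumptions: the function names and the price string format (such as "$19.99") are hypothetical, and the real page markup may need different parsing.

```python
from typing import Optional, Tuple, Union

MIN_REVIEWS_FOR_SCORE = 10  # threshold mentioned in the text


def parse_price(text: str) -> Optional[float]:
    """Turn a string like '$1,299.00' into a float, or None if not numeric."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def extract_prices(original_text: str,
                   sale_text: Optional[str] = None
                   ) -> Tuple[Optional[float], Optional[float]]:
    """Return (original_price, sale_price); sale_price is None if not on sale."""
    original = parse_price(original_text)
    sale = parse_price(sale_text) if sale_text else None
    return original, sale


def percent_positive_or_na(total_reviews: int,
                           scraped_percent: Optional[int]
                           ) -> Union[int, str]:
    """Use 'N/A' when there are too few reviews for a percentage."""
    if total_reviews < MIN_REVIEWS_FOR_SCORE or scraped_percent is None:
        return "N/A"
    return scraped_percent
```

For example, `extract_prices("$19.99", "$9.99")` keeps both prices for a game that is on sale, while `percent_positive_or_na(4, None)` yields `"N/A"`.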
The average price for a game available on Steam is $8.42, and the median price is $5.99. Interestingly, the game Crisis Action VR is the most expensive game on the platform at $199. The price is a bit of a mystery: the game itself doesn’t seem out of the ordinary, and the reviews only add to the confusion about why the price is so high on a game that was priced lower on release.
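The summary statistics above can be computed with Python's standard library. The sketch below uses a handful of made-up prices rather than the scraped data; it also shows why a single very expensive title pulls the mean above the median.

```python
from statistics import mean, median

# Made-up (title, price) pairs for illustration; not the scraped data.
games = [
    ("Game A", 0.99),
    ("Game B", 4.99),
    ("Game C", 5.99),
    ("Game D", 9.99),
    ("Pricey Game", 199.00),
]

prices = [price for _, price in games]

average_price = mean(prices)    # sensitive to the one expensive outlier
median_price = median(prices)   # middle value, robust to the outlier

# The most expensive title is the pair with the largest price.
most_expensive_title, highest_price = max(games, key=lambda game: game[1])
```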
The average price of a game released since the platform became available has fluctuated around $8. One very interesting thing is that the number of games available has increased dramatically since 2014. This appears to be a response to Steam introducing Early Access, which allows small developers to release games in an unfinished state so that they can raise money for development as well as get early feedback on a game’s development.
Since 2014, each game available for purchase through Steam has had a number of categories that can be used to describe the game. These categories are called tags, and the users of Steam can pick them based on their experience with the game. Steam allows a total of 20 tags to be associated with each game, and there are a total of 354 distinct tags. Overall, the Indie tag has the largest number of games associated with it. Action, Casual, and Adventure are the next three top categories. That Indie is the top category is not very surprising, since Steam gives independent game developers a way to reach a sizeable audience that would otherwise be difficult to reach.
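Counting how many games carry each tag is a simple tally. Here is a small sketch using `collections.Counter` on made-up data; the dictionary layout and the helper name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List

MAX_TAGS_PER_GAME = 20  # limit mentioned in the text

# Made-up sample: game title -> list of user-defined tags.
game_tags: Dict[str, List[str]] = {
    "Game A": ["Indie", "Action"],
    "Game B": ["Indie", "Casual"],
    "Game C": ["Indie", "Action", "Adventure"],
}


def count_games_per_tag(tags_by_game: Dict[str, List[str]]) -> Counter:
    """Count the number of games associated with each tag."""
    counts: Counter = Counter()
    for tags in tags_by_game.values():
        # set() guards against a tag being listed twice for one game.
        counts.update(set(tags[:MAX_TAGS_PER_GAME]))
    return counts


tag_counts = count_games_per_tag(game_tags)
top_tag = tag_counts.most_common(1)[0]
```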
The overlap in tags is also interesting, with Indie and Action being the most common combination on the service.
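One way to measure tag overlap is to count how often each pair of tags appears on the same game. The sketch below does this with `itertools.combinations` on made-up data; it is an illustration of the idea, not necessarily how the overlap was computed for this post.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List

# Made-up sample: one list of tags per game.
tag_lists: List[List[str]] = [
    ["Indie", "Action"],
    ["Indie", "Casual"],
    ["Indie", "Action", "Adventure"],
]


def count_tag_pairs(all_tags: Iterable[List[str]]) -> Counter:
    """Count how many games carry each unordered pair of tags."""
    pairs: Counter = Counter()
    for tags in all_tags:
        # Sorting makes ('Action', 'Indie') and ('Indie', 'Action') one key.
        pairs.update(combinations(sorted(set(tags)), 2))
    return pairs


pair_counts = count_tag_pairs(tag_lists)
most_common_pair = pair_counts.most_common(1)[0]
```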
| Free to Play | Great Soundtrack | Early Access | RPG | VR |
By analyzing the top ten tags for each year since 2014, it’s possible to see certain types of games increase and sometimes decrease in popularity over time. For example, VR increased in popularity in 2016 and 2017, but that popularity appears to have waned in 2018. Early Access games have increased in popularity since 2016, and that increase is likely to continue this year as well.
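The per-year ranking can be sketched by grouping games by release year and keeping a separate tally for each year. The sample records and the function name below are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Made-up sample: (release year, tags) for each game.
records: List[Tuple[int, List[str]]] = [
    (2016, ["Indie", "VR", "Action"]),
    (2016, ["Indie", "VR"]),
    (2016, ["Indie"]),
    (2017, ["Indie", "Early Access"]),
    (2017, ["Early Access", "Action"]),
    (2017, ["Early Access", "VR"]),
]


def top_tags_by_year(games: Iterable[Tuple[int, List[str]]],
                     n: int = 10) -> Dict[int, List[Tuple[str, int]]]:
    """Return the n most common tags for each release year."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, tags in games:
        counts[year].update(set(tags))
    return {year: tally.most_common(n) for year, tally in sorted(counts.items())}


top_by_year = top_tags_by_year(records)
```

Comparing a tag's rank from one year to the next in `top_by_year` is what reveals trends such as a tag rising and then fading.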
Word Cloud Analysis
Each game on Steam comes with a description in the form of a short blurb provided by the game’s developer as a sort of sales pitch to potential buyers. By using these descriptions, it’s possible to do a word cloud analysis to get an idea of how games on Steam are marketed and to see which terms developers use most often. Some terms are what one would expect to find associated with games, like adventure and puzzle, but one term that is surprising to see so prominently is VR. This is surprising since even at its highest popularity point in 2017, it was only the 7th most popular tag. My guess as to why it’s so prominent is that there was a lot of buzz around VR games a few years ago, and that developers who use it wish to bring it to the attention of gamers because VR requires additional hardware to work properly.
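A word cloud is essentially a picture of word frequencies, so the underlying counts can be sketched with the standard library alone. The descriptions and the stop word list below are made up for illustration; a real analysis would use a fuller stop word list and would pass the frequencies to a plotting library.

```python
import re
from collections import Counter
from typing import Iterable

# Made-up descriptions standing in for the scraped blurbs.
descriptions = [
    "A VR puzzle adventure in space.",
    "An epic adventure with puzzle rooms.",
    "Explore a VR world of adventure.",
]

# A tiny, illustrative stop word list (assumption, not exhaustive).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Count lowercase words across all texts, skipping stop words."""
    counts: Counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


frequencies = word_frequencies(descriptions)
most_common_word = frequencies.most_common(1)[0]
```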
Ideas for Future Development
The Steam platform provides a large amount of data about games, and it’s possible to see many trends using that data. For future development, I would like to scrape the reviews for each game along with the data about the games. The difficulty is that the reviews are only available on an infinitely scrolling page, and some games have over a million reviews. Scrapy-splash will need to be used to access the reviews, and it might be worthwhile to only consider the most recent 500 reviews to keep the number to be scraped manageable.
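The idea of capping the number of reviews can be sketched independently of the scraping tool. In the example below, `fake_fetch_batch` is a hypothetical stand-in for whatever loads the next chunk of the scrolling page, and I am assuming the batches arrive newest first. It does not use the scrapy-splash API.

```python
from itertools import islice
from typing import Callable, Iterator, List

REVIEW_LIMIT = 500  # cap suggested in the text


def iter_reviews(fetch_batch: Callable[[int], List[str]]) -> Iterator[str]:
    """Lazily yield reviews batch by batch until a batch comes back empty."""
    offset = 0
    while True:
        batch = fetch_batch(offset)
        if not batch:
            return
        yield from batch
        offset += len(batch)


def most_recent_reviews(fetch_batch: Callable[[int], List[str]],
                        limit: int = REVIEW_LIMIT) -> List[str]:
    """Collect at most `limit` reviews, then stop requesting batches."""
    return list(islice(iter_reviews(fetch_batch), limit))


def fake_fetch_batch(offset: int, total: int = 1200, size: int = 100) -> List[str]:
    """Hypothetical loader: pretends a game has `total` reviews."""
    return [f"review {i}" for i in range(offset, min(offset + size, total))]


reviews = most_recent_reviews(fake_fetch_batch)
```

Because the generator is lazy, no further batches are requested once the limit is reached, which matters for games with very large review counts.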