Video Game Descriptions: Do Some Words Sell Better?

Posted on Feb 17, 2023

If you've ever considered purchasing a video game, you may have read the game's description. A game's description is a short or long blurb explaining the premise of the game, allowing players to get a sense of the gaming experience before they decide to make the purchase. Along with trailers and reviews, the description is one way you may be convinced to buy a new game. Could video game makers improve sales just by using the right words in the description? I wanted to determine if there are specific words and phrases that, when included in the description, drive more or fewer sales.

And if you haven't already guessed why some words above are in green, read on.

The Data

There were two main pieces of data I needed for this study: game descriptions and game sales. I chose to use Top Video Games 1995-2021 Metacritic and Video Game Sales from Kaggle.com. The "Top Video Games 1995-2021 Metacritic" dataset consists of 18,800 video games rated on Metacritic, including their name, platform, release date, and game description. The "Video Game Sales" dataset consists of 16,598 video games with sales greater than 100,000 copies and includes their name, platform, year, and global sales. After tidying up the datasets and making them compatible for merging, I merged them on the basis of name, platform, and year. That left 5,294 observations. All work for this study was done in Python.
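As a rough illustration of the merge step, here is a minimal sketch that uses only the standard library. The records and field names (`name`, `platform`, `year`, `description`, `global_sales`) are made up for the example and are not the actual Kaggle column names; the original work may well have used `pandas.DataFrame.merge` instead.

```python
# Hypothetical miniature versions of the two datasets.
metacritic = [
    {"name": "Game A", "platform": "Wii", "year": 2009, "description": "An open world adventure."},
    {"name": "Game B", "platform": "PC", "year": 2011, "description": "A turn based strategy game."},
    {"name": "Game C", "platform": "PS3", "year": 2010, "description": "A brand new puzzle game."},
]
sales = [
    {"name": "game a", "platform": "Wii", "year": 2009, "global_sales": 1.20},
    {"name": "Game B", "platform": "PC", "year": 2011, "global_sales": 0.35},
    {"name": "Game D", "platform": "X360", "year": 2008, "global_sales": 0.80},
]


def merge_key(row):
    """Normalize the join columns so trivial differences do not block a match."""
    return (row["name"].strip().lower(), row["platform"].strip().lower(), int(row["year"]))


def inner_merge(left, right):
    """Keep only rows whose (name, platform, year) key appears in both datasets."""
    right_by_key = {merge_key(row): row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(merge_key(row))
        if match is not None:
            merged.append({**row, "global_sales": match["global_sales"]})
    return merged


games = inner_merge(metacritic, sales)
print(len(games), "games appear in both datasets")
```

Normalizing the key before joining is the "making them compatible" part: without it, "game a" and "Game A" would be treated as different games and silently dropped.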

Data Cleaning

The figure below shows the distribution of global sales within the data. The majority of the games sold fewer than 2 million copies. However, the outliers – many of which were related to Wii and Mario – went as high as 80 million copies for Wii Sports. These outliers could sway the results to suggest that the words used in Wii and Mario games are the best words to use in a description, even though the wording of these descriptions is likely not the reason for such high sales. For this reason, I removed all games with sales greater than 2 million.

The next step was to clean up the descriptions to prepare them for analysis. This process included removing all special characters, making everything lowercase, removing filler words like "the" and "and", and removing words and phrases that are specific to a game or company, such as "star wars", "modern warfare", "nintendo", and "electronic arts".
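The cleaning steps above can be sketched roughly as follows. The stop-word and game-specific lists here are short illustrative stand-ins, since the full lists used in the study are not given, and the function name `clean_description` is hypothetical.

```python
import re

# Illustrative lists only; the study's actual lists are not given in the post.
STOPWORDS = {"the", "and", "a", "of", "in", "to", "from"}
GAME_SPECIFIC = ["star wars", "modern warfare", "nintendo", "electronic arts"]


def clean_description(text):
    """Lowercase, strip special characters, drop game-specific phrases and filler words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    for phrase in GAME_SPECIFIC:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return [word for word in text.split() if word not in STOPWORDS]


tokens = clean_description("The brand-new Star Wars adventure, from Electronic Arts!")
print(tokens)
```

Replacing special characters with spaces, rather than deleting them, means a hyphenated term such as "turn-based" becomes the two tokens "turn based", which matches how phrases are written in the results.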

Methodology

The methodology of this study involved finding unique words and phrases that appear frequently in the descriptions and determining if the average global sales are significantly different between games with and without each word or phrase in their description.

The 500 most frequently used words in the game descriptions (after words like "the" were removed):

I began by finding every word that appeared in the descriptions at least 100 times using Python's built-in collections module. There were 616 such words. I then wanted to find frequently used two- and three-word phrases, called bigrams and trigrams, respectively. Using the Python package nltk (Natural Language Toolkit), I found the bigrams and trigrams that appeared at least 50 times in the descriptions. There were 134 bigrams and 6 trigrams.
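A minimal sketch of this counting step is shown below. To keep it dependency-free, the n-grams are built with `zip` rather than with nltk, so this is a stand-in for the approach described, not the original code. The toy descriptions and the helper names `ngrams` and `frequent_terms` are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def frequent_terms(token_lists, n, min_count):
    """Count n-grams across all descriptions and keep those seen at least min_count times."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return {term: count for term, count in counts.items() if count >= min_count}


# Hypothetical cleaned descriptions (one token list per game).
descriptions = [
    ["brand", "new", "open", "world", "adventure"],
    ["open", "world", "racing", "brand", "new", "tracks"],
    ["classic", "puzzle", "adventure"],
]

# Thresholds are lowered to 2 for this toy data; the study used 100 for words
# and 50 for bigrams and trigrams.
words = frequent_terms(descriptions, 1, 2)
bigrams = frequent_terms(descriptions, 2, 2)
print(words)
print(bigrams)
```

Building n-grams per description, instead of over one long concatenated text, avoids counting phrases that span the end of one description and the start of the next.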

I removed every unique word that was contained inside a bigram or trigram. This was necessary to avoid results that considered both "brand" and "brand new" to be significant, since "brand" on its own may only seem significant because "brand new" was significant. The ideal way to deal with these overlaps would be to separate instances of "brand" that are not followed by "new" from the instances of "brand new". However, to keep the analyses simple, I removed "brand" and other such words.
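The overlap removal described above could look something like this. The word and phrase lists are small examples, and `drop_words_inside_phrases` is a hypothetical helper name.

```python
def drop_words_inside_phrases(words, phrases):
    """Remove single words that occur as a component of any retained bigram or trigram."""
    covered = {part for phrase in phrases for part in phrase.split()}
    return [word for word in words if word not in covered]


words = ["brand", "new", "puzzle", "world", "adventure"]
phrases = ["brand new", "open world", "never before seen"]
remaining = drop_words_inside_phrases(words, phrases)
print(remaining)
```

This is the simple version: a word is dropped everywhere once it appears in any frequent phrase, even in descriptions where it occurs on its own.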

Once I had my list of unique words, bigrams, and trigrams, I conducted a significance test on each word or phrase. I found the average global sales of games whose descriptions contained the word or phrase, and the average for those whose descriptions did not. I performed a two-sided t-test at the 5% significance level (95% confidence) on the null hypothesis that games with and without the word or phrase in their description had the same mean global sales.
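Here is a hedged sketch of the per-term comparison. Two things in it are assumptions rather than details from the study: it uses Welch's form of the t statistic (the post does not say whether equal variances were assumed), and it approximates the two-sided p-value with a normal distribution so that it runs on the standard library alone. With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the t-based p-value directly. The toy data and the function name `compare_sales` are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def contains(description, term):
    """Whole-word match on a cleaned, space-separated description."""
    return f" {term} " in f" {description} "


def compare_sales(games, term, alpha=0.05):
    """Compare mean sales of games whose description does and does not contain `term`."""
    with_term = [g["global_sales"] for g in games if contains(g["description"], term)]
    without = [g["global_sales"] for g in games if not contains(g["description"], term)]
    diff = mean(with_term) - mean(without)
    # Welch's standard error: does not assume equal variances in the two groups.
    se = sqrt(variance(with_term) / len(with_term) + variance(without) / len(without))
    t_stat = diff / se
    # Normal approximation to the two-sided p-value; reasonable only for large groups.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {"term": term, "diff": diff, "t": t_stat, "p": p_value, "significant": p_value < alpha}


# Hypothetical cleaned descriptions with sales in millions of copies.
games = [
    {"description": "open world adventure", "global_sales": 1.2},
    {"description": "huge open world racing", "global_sales": 1.0},
    {"description": "open world shooter", "global_sales": 1.1},
    {"description": "classic puzzle", "global_sales": 0.3},
    {"description": "card strategy", "global_sales": 0.4},
    {"description": "farm simulator", "global_sales": 0.5},
]

result = compare_sales(games, "open world")
print(result)
```

The `diff` value is the quantity plotted on the x-axis of the results figures: mean sales with the term minus mean sales without it, in millions.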

Results

The two figures below summarize the results of the significance tests. The left figure shows the words and phrases for which the mean global sales of games with these words were significantly lower than those without; these are the "bad" words. Similarly, the right figure shows the words and phrases whose inclusion in a game description was associated with higher average global sales, making these the "good" words. In both figures, the x-axis is the difference in mean global sales between games with and without the word, in millions of sales.

The phrases "open world", "new level", and "never before seen" were the highest performing, each associated with almost 250,000 more sales on average. "Puzzle", "turn based", and "strategy game" were the lowest performing, with "strategy game" associated with upwards of 250,000 fewer sales. One caveat of these results is that we cannot tell whether "strategy game" performed so poorly because the wording is unappealing or because strategy games are a genre that sells fewer copies. Future work could involve performing these analyses on each game genre separately.

 

Best and Worst Descriptions

Let's look back at the data and see which descriptions in the dataset were the best and worst, according to my findings. I calculated a score for each game. Games started with a score of zero and gained or lost value as determined by the change in mean global sales of each significant word contained in the description. (Example: "open world" = +0.256, "strategy game" = -0.273).
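The scoring rule can be sketched as below, using only the two effect sizes quoted in the example. One assumption is made that the post does not state: each significant term is counted once per description, no matter how often it is repeated. The function name `score_description` is hypothetical.

```python
# Differences in mean global sales (millions), taken from the example above.
EFFECTS = {"open world": 0.256, "strategy game": -0.273}


def score_description(cleaned_text, effects=EFFECTS):
    """Sum the effect of every significant term that appears in the description."""
    padded = f" {cleaned_text} "
    # Assumption: a term counts once per description, however often it is repeated.
    return sum(effect for term, effect in effects.items() if f" {term} " in padded)


score = score_description("an open world strategy game")
print(round(score, 3))
```

A description containing none of the significant terms keeps the starting score of zero.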

Highest Score: Guitar Hero 5

The highest scoring game description was for Guitar Hero 5. It received a score of 1.89. Below is the description with the good and bad words and phrases highlighted.

"For the first time ever, players can customize the make-up of their band by rocking with any combination of instruments in-game: whether it be two guitars and two drums, or three guitars and a microphone, any combination is possible, allowing players to experience music their own way. Brand new, innovative, easy-to-use gameplay modes like Party Play and RockFest put fun, competition and control at center stage as fans tailor the Guitar Hero experience to match their personal style and interests. For extended hours of entertainment, downloadable content from Guitar Hero World Tour is compatible with the game and can automatically be updated to include all of the upgrades and enhancements of Guitar Hero 5. Guitar Hero 5 features the strongest, most varied set list to-date comprised of master tracks from 85 of the hottest bands of today and the biggest classic acts including; Kings of Leon, The Rolling Stones, The White Stripes, Santana, Vampire Weekend, Tom Petty, Johnny Cash, Bob Dylan, plus more than 25 artists from a variety of music genres that are making their music video game debut. Among the first-time-ever features of Guitar Hero 5 are: Party Play, where players can jump in or drop out of gameplay seamlessly; RockFest, a comprehensive competitive experience available featuring five new head-to-head modes playable online or in your living room; and the ability to play the entire set-list from the first time the game is turned on. Guitar Hero 5 refines the player experience, enhances the art style and redesigns core features such as GHMusic StudioSM, making it the most accessible, fun-to-play and authentic experience for seasoned music gamers as well as first-time players. New innovations such as Band Moments, where bands are rewarded for hitting special note streams together and song challenges where gamers are tasked to play through a song a specific way, add a new competitive layer of excitement and accomplishment to the music rhythm genre. [Activision]"

Note: "Band Moments" was one of the game-specific terms that were dropped prior to analysis. "Band" and "moments" still came out as statistically significant words.

Lowest Score: The Chronicles of Narnia: The Lion, The Witch and The Wardrobe

The lowest scoring game description was for The Chronicles of Narnia: The Lion, The Witch and The Wardrobe. It received a score of -0.94. Below is the description with the good and bad words and phrases highlighted.

"The Chronicles of Narnia: The Lion, The Witch and The Wardrobe is an action adventure based on the Disney and Walden Media film capturing the book series from author C.S. Lewis. Players enter the world of Narnia, a land frozen in eternal winter by the evil powers of the malevolent and evil White Witch. In order to end this frigid captivity and free his people, the mighty lion Aslan, true ruler of Narnia, invokes an ancient prophecy. It will become the destiny of four young siblings from our world: Peter, Susan, Edmund and Lucy Pevensie to work together and use their unique combat skills, weaponry and abilities to defeat the Witch and her armies and save Narnia. These four unlikely heroes must battle the evil forces of the White Witch by waging war against a vast variety of creatures, including Centaurs, Minotaurs, Minoboars, Cyclops, Werewolves, Wraiths, Ankleslicers, Wolves, Satyrs, Boggles and more. Battle the evil forces of the White Witch by waging war against a huge variety of creatures, including Centaurs, Minotaurs, Minoboars, Cyclops, Werewolves, Wraiths, Ankleslicers, Wolves, Satyrs, Boggles and more. Utilize the unique combat skills, weaponry and special moves of each character to fight and fend off hordes of dark mythical beasts, or solve intricate puzzles and progress through the adventure. Two-player action featuring all four characters. [Disney Interactive]"

Remember the first paragraph of this article? That had a score of 1.09!

Final Thoughts

There are a few caveats not yet mentioned. For two game descriptions that both use "good" words, the longer description would score better because it has more opportunities to use the "good" words. However, in the real world, there is a downside to making a description too long (or too short), and the length of the descriptions is not considered in this study.

Another caveat is that variations of the same word, such as "customize" and "customization", are not treated as the same word. This also applies to singular vs plural and past tense vs present tense versions of words. An update to this study would involve reducing all words to a common root form, a step known as stemming or lemmatization, for which nltk provides tools.

So the next time you're writing a video game description, be sure to mention how it's a never-before-seen, open world, next gen gaming experience, and NOT how it's a turn-based, puzzle solving, action-adventure strategy game!

About Author

Grainne O'Neill

As a soon-to-be Ph.D. graduate with a background in mathematics and a passion for data science, I am seeking opportunities to leverage my skills and enthusiasm for solving complex problems through data-driven insights.