Words to Learn to Solve the New York Times Crossword
<Featured Header Image by vectorjuice on Freepik>
Introduction
If you've tried to solve a crossword puzzle before, you've probably learned some new words in the process. I never knew what an epee was or who Arthur Ashe was before doing a crossword. Are there ways to improve without just doing whole puzzles? What about a study list: a list of words that you might not know but that are common in crosswords? I analyzed a dataset from Kaggle featuring words and clues from the New York Times Crossword between 1993 and 2021. This included checking the dataset for missing values and examining trends in the most common clues.
Then, I made a list of the top words featured in the puzzles. This list features lots of common words that most English speakers already know, so it's not a great study guide. My idea, then, was to cross-reference this list with a second list from wordfrequency.info, which features 5,050 of the most common and currently used words in the English language. Eliminating these common words yields a valuable study guide: a list of words that are common in the NYT Crossword but less familiar to a beginner. Please feel free to take a closer look at my GitHub.
The Data
The NYT dataset features clues, words, and dates from puzzles between 1993 and 2021. There are 781,573 separate entries in the set that I examined and processed entirely with Python.
Missing Data
The dataset had some missing values that I needed to examine. 'NaN' ("not a number") is a marker indicating that a value is missing. Do you notice anything about the clues that point to these 'NaN's?
All of the missing words were originally the word null, which had been read in as a missing value and needed to be restored. This was a very fun problem to solve, but there were some other missing values as well.
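The post doesn't say how the file was loaded, so it is only an assumption that pandas was used. If it was, the likely culprit is `read_csv`'s default list of missing-value strings, which includes `NULL`, `null`, and `NA`. This is a problem for a crossword, where NULL and NA are legitimate answers. The minimal sketch below reads a small inline CSV twice: once with the defaults, and once with `keep_default_na=False` and only empty fields treated as missing. The `Date`, `Word`, and `Clue` column names are assumptions, not necessarily the Kaggle file's exact headers.

```python
import io

import pandas as pd

# Made-up sample rows; the Date/Word/Clue headers are assumed, not confirmed.
SAMPLE_CSV = """Date,Word,Clue
2021-10-31,NULL,Void
2021-10-31,NA,Sodium symbol
2021-10-31,ALAI,Jai ___
"""


def load_default(text):
    # Default parsing: "NULL" and "NA" are in pandas' built-in NA strings,
    # so those answers come back as NaN.
    return pd.read_csv(io.StringIO(text))


def load_preserving_words(text):
    # Disable the built-in NA strings and treat only empty fields as missing,
    # so answers like NULL and NA survive as ordinary words.
    return pd.read_csv(io.StringIO(text), keep_default_na=False, na_values=[""])


default_df = load_default(SAMPLE_CSV)
fixed_df = load_preserving_words(SAMPLE_CSV)
print(default_df["Word"].isna().sum())  # 2
print(fixed_df["Word"].tolist())        # ['NULL', 'NA', 'ALAI']
```

Restricting `na_values` to the empty string still lets genuinely blank cells show up as missing. Those are the ones worth investigating.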
A close look at the number of clues for each date revealed that some puzzles were missing clues:
I confirmed that some clues were missing by checking puzzles against xwordinfo.com, the authoritative online resource for the NYT Crossword. I decided to leave dates with missing clues in the set to make a more complete study list. Projects more concerned with tracing trends in the puzzles over time would need to either remove these incomplete dates or restore the missing data.
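A per-date count like the one described above can be sketched with the standard library. This is a minimal illustration, not the code behind the original chart. The `(date, word, clue)` tuples are made-up rows, and `minimum` is a hypothetical cutoff. A single threshold is blunt because Sunday puzzles have more clues than weekday ones, so any flagged date would still need checking against xwordinfo.com.

```python
from collections import Counter

# Made-up (date, word, clue) rows standing in for the full dataset.
rows = [
    ("2021-10-30", "ALAI", "Jai ___"),
    ("2021-10-30", "ERIE", "Great Lake"),
    ("2021-10-30", "OSLO", "Norway's capital"),
    ("2021-10-31", "ASHE", "Arthur of tennis"),
]


def clues_per_date(records):
    """Count how many clue rows each puzzle date has."""
    return Counter(date for date, _word, _clue in records)


def flag_short_dates(counts, minimum):
    """Return dates with fewer than `minimum` clues, sorted by date.

    `minimum` is a hypothetical threshold, not a documented puzzle size.
    """
    return sorted(date for date, n in counts.items() if n < minimum)


counts = clues_per_date(rows)
print(counts["2021-10-30"])                 # 3
print(flag_short_dates(counts, minimum=2))  # ['2021-10-31']
```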
Puzzles Over Time
Lots of people have done analyses on crosswords to understand how their difficulty changes over the week. For instance, from Monday through Saturday, mean word length gradually goes up while the number of clues goes down. While Sunday puzzles are bigger and look intimidating, their word lengths are shorter and they are intended to be easier than the Saturday puzzle. There is more great information on the NYT crossword here.
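Day-of-week findings like these can be checked against this dataset. The minimal sketch below uses made-up rows and assumes dates are ISO `YYYY-MM-DD` strings, which is an assumption about the file's format. It groups answers by weekday and reports the mean answer length and the average number of clues per puzzle.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# date.weekday() returns 0 for Monday; a fixed tuple avoids locale issues.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Made-up (date, word) rows; ISO-format dates are an assumption.
rows = [
    ("2021-10-25", "ERA"),       # a Monday
    ("2021-10-25", "OREO"),
    ("2021-10-30", "ABRIDGE"),   # a Saturday
    ("2021-10-30", "LESSENED"),
]


def weekday_stats(records):
    """Return {weekday: (mean answer length, mean clues per puzzle)}."""
    lengths = defaultdict(list)  # weekday name -> answer lengths
    puzzles = defaultdict(set)   # weekday name -> distinct puzzle dates
    for iso_date, word in records:
        day = WEEKDAYS[date.fromisoformat(iso_date).weekday()]
        lengths[day].append(len(word))
        puzzles[day].add(iso_date)
    return {
        day: (mean(lengths[day]), len(lengths[day]) / len(puzzles[day]))
        for day in lengths
    }


stats = weekday_stats(rows)
print(stats["Monday"])    # (3.5, 2.0)
print(stats["Saturday"])  # (7.5, 2.0)
```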
Analysis: Common Clues
Before working on the word list, I wanted to look at the most common clues and see whether they reveal anything about how crosswords are structured.
These clues are the most common because they are either very specific or very general. Some words that feature in the puzzle have only one or two plausible meanings in English, so there are only a few ways to clue them for the average solver. For instance, "JAI _" is the most frequently used clue because the word "ALAI" appears so often in puzzles. Merriam-Webster.com indicates that "ALAI" has only one other usage in English: a mountain range in Kyrgyzstan. These two meanings account for virtually all the clues for "ALAI".
Some clues feature prominently not because they point to only one common word, but because they point to many possible words. For example, "Cut", the 11th most common clue, points to 35 (!) different words, including AXED, SHEAR, SKIP, HEWN, LOP, SAWN, SCISSOR, MOW, OMIT, AXE, SEVER, SAWED, SLIT, DECREASE, ABRIDGE, DELETE, SHARE, KNIFE, HEWED, HEW, MOWED, MOWN, LESION, LESSENED, ETCH, SHORTENED, and SLASHED. Notice all the different connotations and tenses that the simple word "Cut" can point to. Because the clue is so general and flexible, it is very common in crosswords.
This second bar chart contrasts the top clues by the number of distinct words they point to. Some clues point to only a couple of words, and others point to many. Taking a close look at these clues would be a great way for a puzzler to study and improve their crossword game.
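Ranking clues by frequency only requires a `Counter`. The minimal sketch below runs over made-up `(word, clue)` pairs. It is an illustration, not the original analysis code. On the full dataset, the post reports "JAI _" at the top of this ranking.

```python
from collections import Counter

# Made-up (word, clue) pairs standing in for the full dataset.
pairs = [
    ("ALAI", "Jai ___"),
    ("ALAI", "Jai ___"),
    ("ALAI", "Jai ___"),
    ("AXED", "Cut"),
    ("MOW", "Cut"),
    ("OSLO", "Norway's capital"),
]


def top_clues(records, n):
    """Return the n most frequent clues as (clue, count) pairs."""
    return Counter(clue for _word, clue in records).most_common(n)


print(top_clues(pairs, 2))  # [('Jai ___', 3), ('Cut', 2)]
```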
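Counting how many distinct answers each clue points to is a grouping step: map each clue to a set of answers. Sets discard repeats, so a clue used many times for the same word (like "Jai ___" for ALAI) still counts as one answer. This is a minimal sketch over made-up pairs. It is not necessarily how the chart was built.

```python
from collections import defaultdict

# Made-up (word, clue) pairs; MOW appears twice for "Cut" on purpose.
pairs = [
    ("ALAI", "Jai ___"),
    ("ALAI", "Jai ___"),
    ("AXED", "Cut"),
    ("MOW", "Cut"),
    ("MOW", "Cut"),
    ("SEVER", "Cut"),
]


def answers_per_clue(records):
    """Map each clue to the set of distinct answers it has pointed to."""
    answers = defaultdict(set)
    for word, clue in records:
        answers[clue].add(word)
    return answers


answers = answers_per_clue(pairs)
print(len(answers["Jai ___"]))  # 1
print(sorted(answers["Cut"]))   # ['AXED', 'MOW', 'SEVER']
```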
Analysis: Common Words
Finding the most common words in the puzzle is a good start to making a study list, but a lot of these top words are already a bit too simple for a beginning solver:
You probably already know many of the words in this chart. Instead of using this list and having to take time to cross out words you already know, I wanted to do some of that work for you in advance.
To do that, I took the raw list of the most common words from the crossword and compared it to a list from wordfrequency.info of 5,050 of the most common and currently used words in the English language across a variety of media. Eliminating these common words from the puzzle list gives a closer approximation of an effective study guide: words that are common in the puzzles but not as common in English usage.
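The raw word ranking is a frequency count over the answer column. Here is a minimal sketch using a made-up list of answers:

```python
from collections import Counter

# Made-up answer column; the real one has one entry per clue.
words = ["ERA", "AREA", "ERA", "ERIE", "AREA", "ERA"]


def ranked_words(answers):
    """Return (word, count) pairs from most to least frequent."""
    return Counter(answers).most_common()


print(ranked_words(words))  # [('ERA', 3), ('AREA', 2), ('ERIE', 1)]
```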
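The cross-referencing step amounts to a set lookup that keeps the crossword ranking in order. The minimal sketch below assumes that crossword answers are uppercase and that the frequency list may not be, so it uppercases the frequency list before comparing. The counts and the three-word "frequency list" are invented stand-ins. Because multi-word answers appear run together in the grid (for example "ETAL"), they will never match an entry in an English word list, which helps explain why such answers survive the filter.

```python
def exclude_common(ranked, common_words):
    """Drop everyday English words from a ranked list of (word, count) pairs.

    Assumes crossword answers are uppercase, so the frequency list is
    uppercased before comparing. Ranking order is preserved.
    """
    common = {w.strip().upper() for w in common_words}
    return [(word, count) for word, count in ranked if word not in common]


# Invented counts and a tiny stand-in for the 5,050-word frequency list.
ranked = [("AREA", 50), ("ERIE", 40), ("ONE", 30), ("OSLO", 20)]
common_words = ["area", "one", "time"]
print(exclude_common(ranked, common_words))  # [('ERIE', 40), ('OSLO', 20)]
```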
Here are some samples of the words included in the top 5,050. These are words that are not too obscure to be used in everyday conversation and writing:
Removing these words and trimming that list down to words that feature more than 200 times yields a study list of 173 words.
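The final trim applies a strict cutoff to the filtered ranking. "More than 200" means a word with exactly 200 appearances is dropped. This is a minimal sketch with invented counts. According to the post, applying this cutoff to the real filtered list leaves 173 words.

```python
def study_list(filtered, min_count=200):
    """Keep words that appear more than `min_count` times, preserving rank order."""
    return [word for word, count in filtered if count > min_count]


# Invented counts for illustration only.
filtered = [("ERIE", 500), ("OSLO", 300), ("URN", 250), ("EDS", 150)]
print(study_list(filtered))  # ['ERIE', 'OSLO', 'URN']
```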
The Study List
Consider this random sample of 10 words from the study list:
The 'index' column gives each word's original position in the ranked crossword list, counting from zero. For example, "ERIE" is the 12th (11+1) most common word in the NYT Crossword, while "OSLO" is the 87th. 'count' indicates the number of times the word has appeared in the puzzle. VowelRatio is a function I designed to examine whether these common words tend to have a lot of vowels.
I bet that you know some of these words, but not all. And you may not know all of the ways these words can be used. Some words have a variety of meanings and clues, while others have only a few. Even within this sample, there are proper nouns ("OSLO", "OTTO", "ERIE", "ASHE"), other regular English words ("URN", "EERIE"), and abbreviations and non-English phrases ("RNA", "ETAL", "EDS"). A word like "ADA" functions in many contexts: as an abbreviation for multiple laws and associations, a character name in books, several geographic locations, and a given name, as with computing pioneer Ada Lovelace.
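The post doesn't show how VowelRatio is defined. The minimal sketch below is a hypothetical version that assumes the ratio is vowels divided by letters, with only A, E, I, O, and U counted as vowels (Y excluded). It also shows the zero-based index-to-rank conversion, using the ERIE and OSLO positions given above.

```python
# Assumption: only A, E, I, O, U count as vowels (Y is excluded).
VOWELS = frozenset("AEIOU")


def vowel_ratio(word):
    """Return the fraction of alphabetic characters in `word` that are vowels.

    A hypothetical stand-in for the author's VowelRatio, not the original code.
    """
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


# Zero-based positions from the sample described above.
sample_index = {"ERIE": 11, "OSLO": 86}
for word, index in sample_index.items():
    print(f"{word}: rank {index + 1}, vowel ratio {vowel_ratio(word):.2f}")
# ERIE: rank 12, vowel ratio 0.75
# OSLO: rank 87, vowel ratio 0.50
```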
Working with the List
You could take the list now and work on it on your own, but I also designed a function called "findclues" in the Python code that lets you search for all the featured clues that point to any of these words. Knowing that the word "ADA" is common is one thing, but knowing all the possible clues for it is another!
The following word cloud of the study list shows the kinds of words puzzlers should concentrate on as they practice. The words to focus on are mostly 3 and 4 letters long. Their letter content mirrors the distribution you see in other word games like Scrabble: lots of vowels (particularly A, E, I, and O) and the most commonly used consonants like L, R, S, N, and T.
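The post doesn't include the source of findclues. The minimal sketch below is a hypothetical function, `find_clues`, that does something similar. It filters made-up `(word, clue)` pairs for one answer and returns that answer's clues, most frequent first. Matching is case-insensitive on the word.

```python
from collections import Counter

# Made-up (word, clue) pairs standing in for the full dataset.
pairs = [
    ("ADA", "Lovelace of computing"),
    ("ADA", "Dental org."),
    ("ADA", "Dental org."),
    ("OSLO", "Norway's capital"),
]


def find_clues(word, records):
    """Return (clue, count) pairs for `word`, most frequent first.

    Hypothetical sketch; not the author's findclues implementation.
    """
    target = word.upper()
    return Counter(clue for answer, clue in records if answer == target).most_common()


print(find_clues("ada", pairs))  # [('Dental org.', 2), ('Lovelace of computing', 1)]
```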
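The length and letter patterns visible in the word cloud can also be checked directly by counting. This minimal sketch uses a handful of words from the random sample above rather than the full study list.

```python
from collections import Counter

# A few words from the sample above, not the full 173-word study list.
study = ["ERIE", "OSLO", "URN", "RNA", "ETAL", "ASHE"]

length_counts = Counter(len(w) for w in study)  # word length -> how many words
letter_counts = Counter("".join(study))         # letter -> total occurrences

print(sorted(length_counts.items()))  # [(3, 2), (4, 4)]
print(letter_counts["E"])             # 4
```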
Conclusions and Next Steps
For someone who wants to take solving the NYT Crossword seriously, my analysis points in a couple of directions. Studying high-yield clues and understanding why they are common is a good strategy. Studying the words in the study list is another good approach. By concentrating on common crossword answers that fall outside the most common words in the English language, solvers get a vital jumping-off point for improving their game. Puzzlers should also keep their eyes peeled in day-to-day life for short words, phrases, and abbreviations that could fit in with the study list!
Next steps in this work could include several possibilities. I plan to clean this work up into a database so that you could search individual words and clues.
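As a rough idea of what a searchable version could look like, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table and column names (`clues`, `word`, `clue`) and the helper functions are hypothetical, and the rows are made up. In SQLite, `LIKE` is case-insensitive for ASCII letters by default, so the clue search ignores case.

```python
import sqlite3

# Made-up rows standing in for the cleaned (word, clue) data.
PAIRS = [
    ("ADA", "Dental org."),
    ("ADA", "Lovelace of computing"),
    ("OSLO", "Norway's capital"),
]


def build_db(pairs):
    """Create an in-memory table of (word, clue) rows with an index on word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clues (word TEXT NOT NULL, clue TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_clues_word ON clues (word)")
    conn.executemany("INSERT INTO clues (word, clue) VALUES (?, ?)", pairs)
    conn.commit()
    return conn


def clues_for(conn, word):
    """All clues for an answer, in alphabetical order."""
    rows = conn.execute(
        "SELECT clue FROM clues WHERE word = ? ORDER BY clue", (word.upper(),)
    )
    return [clue for (clue,) in rows]


def words_for(conn, text):
    """Distinct answers whose clue contains `text`."""
    rows = conn.execute(
        "SELECT DISTINCT word FROM clues WHERE clue LIKE ? ORDER BY word",
        (f"%{text}%",),
    )
    return [word for (word,) in rows]


conn = build_db(PAIRS)
print(clues_for(conn, "ada"))      # ['Dental org.', 'Lovelace of computing']
print(words_for(conn, "CAPITAL"))  # ['OSLO']
```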
One way to test the study list's validity would be to trace how often each word on the list appears over time. A word may have featured prominently, say, in the 1990s, but not since. It could still appear on the study list even though it has gone out of fashion with newer puzzle creators.
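Tracing a word over time starts with counting its appearances per year. The minimal sketch below uses made-up rows and assumes ISO `YYYY-MM-DD` date strings, which is an assumption about the file's format. A word whose most recent appearance is many years old would be a candidate for demotion on the list.

```python
from collections import Counter

# Made-up (date, word) rows; ISO 'YYYY-MM-DD' dates are an assumption.
rows = [
    ("1994-05-02", "OTTO"),
    ("1994-09-14", "OTTO"),
    ("2019-03-08", "OTTO"),
    ("2020-06-01", "ERIE"),
]


def yearly_counts(word, records):
    """Count how many times `word` appears in each year."""
    target = word.upper()
    return Counter(int(d[:4]) for d, w in records if w == target)


def last_seen_year(word, records):
    """Most recent year `word` appeared, or None if it never did."""
    counts = yearly_counts(word, records)
    return max(counts) if counts else None


print(dict(yearly_counts("otto", rows)))  # {1994: 2, 2019: 1}
print(last_seen_year("EDS", rows))        # None
```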
Thanks for reading, and I hope you feel more confident the next time you take a crack at a crossword puzzle!