The NYT crossword puzzle is approximately as cool as the OED

Rachel Kogan
Posted on May 10, 2017

There’s a misconception that being good at crosswords requires knowledge of trivia, and it couldn't be more false. Sometimes a crossword constructor will resort to an obscure word just to get all the answers to fit, but trivia runs counter to the goal of the New York Times crossword, and puzzles with too many esoteric clues don't get printed.

A good New York Times crossword puzzle consists of two elements:

  • clever puns/jokes
  • clue-answer pairs that reflect the zeitgeist

“Zeitgeist” is a German word that means “spirit of the times.” The zeitgeist is the opposite of trivia; it is the collection of cultural references that should be familiar to most people.

My project is about the second bullet point: trying to understand and visualize how the NYT crossword puzzle stays current and captures the spirit of the time in which it is published.

I. Data

I scraped five years of clue-answer pairs from the crossword blog xwordinfo.com using Scrapy. There was a minor issue: my spider would get redirected if I tried to grab too much data at a time, so I had to crawl in chunks. Ultimately I was able to get most of the data I wanted, and I believe that anything excluded is missing completely at random.
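Crawling in chunks can be as simple as splitting the date range into short windows and running the spider once per window. Here is a minimal sketch, assuming the crawl is organized by puzzle date. The helper name `date_chunks`, the 30-day window, and the example dates are illustrative choices, not details of the actual spider.

```python
from datetime import date, timedelta

def date_chunks(start, end, days_per_chunk=30):
    """Yield (chunk_start, chunk_end) pairs that cover start..end inclusive."""
    step = timedelta(days=days_per_chunk)
    one_day = timedelta(days=1)
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is clipped so it never runs past `end`.
        chunk_end = min(chunk_start + step - one_day, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + one_day

# Hypothetical five-year window; each pair would drive one crawl run.
chunks = list(date_chunks(date(2012, 5, 1), date(2017, 4, 30)))
```

Each chunk starts the day after the previous one ends, so no puzzle date is crawled twice or skipped.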

I also scraped all the words added to the OED over the past four years (about 3000 words), and the entire Urban Dictionary word of the day archive (about 4000 words).  Lastly, I obtained a list of the 5000 most common English words from the Corpus of Contemporary American English (COCA).

II. Analysis

I examined two different classes of answers:

  • frequently-used answers, and how the clues to these answers change throughout time
  • answers that have recently been used for the very first time (known as "debuts")

A. Frequently-Used Answers

In order to analyze frequent words, let’s briefly summarize which words are actually showing up a lot in the crossword puzzle.  Here are the most commonly used crossword answers, along with their frequency counts over the last five years.

So we can see it's a lot of three-letter words, and a lot of the same letters appearing throughout the list.  In fact, out of the 5000 most common crossword puzzle answers, about 1400 are three letters long (of the 5000 most common words in the English language, only about 300 have three letters).  There aren't 1400 common three-letter words in the English language, so we get a lot of three-letter prefixes, acronyms, and names.
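Both numbers above come from simple counting. Here is a rough sketch of that counting, assuming the scraped data is a list of (clue, answer) pairs. The four records are toy placeholders, and the function names are mine for illustration.

```python
from collections import Counter

# Toy records for illustration only; the real data set covers five years.
pairs = [
    ("placeholder clue 1", "HBO"),
    ("placeholder clue 2", "LIN"),
    ("placeholder clue 3", "HBO"),
    ("placeholder clue 4", "MOOC"),
]

def answer_counts(pairs):
    """Count how many times each answer appears."""
    return Counter(answer for _clue, answer in pairs)

def share_of_length(counts, length, top_n):
    """Fraction of the top_n most frequent answers that have `length` letters."""
    top = [answer for answer, _count in counts.most_common(top_n)]
    if not top:
        return 0.0
    return sum(len(answer) == length for answer in top) / len(top)

counts = answer_counts(pairs)
three_letter_share = share_of_length(counts, length=3, top_n=5000)
```

On the real data, `counts.most_common(5000)` would give the list of most common answers, and `share_of_length` the proportion of them that are three letters long.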

We can learn a lot about the crossword puzzle by tracing some of these three-letter answers throughout time and cataloging how the clues change.  I chose the following answers intentionally to illustrate how the NYT crossword puzzle stays current.

HBO has been an answer in the crossword puzzle fourteen times in the past five years, and it’s always clued with a specific TV show: "Game of Thrones" network, "The Newsroom" channel, etc.

In this graph, the colorful dots represent the show appearing as an HBO crossword clue, and the black dots represent the year that show premiered.

If you follow which TV show was used throughout time, you can see that:

  • The editors are trying to switch it up, so in any year they use a few different shows
  • The editors are trying to stay current, so as new shows come out they add them to the clue roster – most recently, True Detective
  • The editors are trying to use the most popular shows, so there’s at least a year lag between a show premiering and the show appearing in the crossword for the first time
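The lag in the last bullet can be measured by comparing each show's premiere year with the year it first appears in an HBO clue. Here is a minimal sketch with made-up placeholder shows and dates (not the scraped data); the function names are illustrative.

```python
from datetime import date

def first_clue_year(appearances):
    """Map each show to the earliest year it was used in a clue."""
    first = {}
    for puzzle_date, show in appearances:
        year = puzzle_date.year
        if show not in first or year < first[show]:
            first[show] = year
    return first

def premiere_lag(appearances, premiere_years):
    """Years between a show's premiere and its first use in a clue."""
    first = first_clue_year(appearances)
    return {
        show: first[show] - premiere_years[show]
        for show in first
        if show in premiere_years
    }

# Placeholder data: (puzzle date, show named in the clue).
hbo_appearances = [
    (date(2014, 3, 2), "Show A"),
    (date(2013, 6, 9), "Show A"),
    (date(2016, 1, 17), "Show B"),
]
premieres = {"Show A": 2011, "Show B": 2014}

lags = premiere_lag(hbo_appearances, premieres)
```

A lag of at least 1 for every show would match the pattern described above.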

If it keeps up its current level of popularity, I predict that Westworld will appear in the crossword puzzle as an HBO clue sometime in 2018.

The next answer I analyzed was LIN.  LIN has appeared thirteen times in the past five years, and it’s always clued as a person's first or last name, for example: "Justin who directed four of the Fast and the Furious movies" or "Jeremy of the NBA".

In this graph, the orange dots represent LIN clues and the colorful dots represent relevant current events.

Jeremy Lin is a basketball player who started for the Knicks in 2012 and sparked a fan craze called LINSANITY. You can see that Jeremy Lin was the go-to LIN clue for a while after that. But then he went to play for Houston, and the crossword constructors started rotating in Justin Lin, the director, and Lin Biao, a figure in Communist China. And I don't think it's a coincidence that Lin-Manuel Miranda was the LIN clue three days before his show won 11 Tonys, or that Justin Lin showed up a few months after the Star Trek premiere.

You can see that LIN hasn’t appeared yet in 2017, but Jeremy Lin is back in NYC, playing for the Brooklyn Nets, and I predict that Jeremy Lin will make a crossword comeback.

B. Debut Answers

Debut answers are words that appear as answers in a puzzle for the very first time.  There are usually at least a few debut answers every day.  Here are some debut answers from the most recent Sunday crossword:

Debut answers usually come in one of two types:

  • Long multi-word jokes, usually related to the theme of the puzzle, that will probably never reappear
    • MODELYODEL, MASSAGEPASSAGE
  • New words added to the crossword corpus that may reappear
    • slang words like SWOLE
    • tech jargon like MOOC
    • celebrities like Amy POEHLER (I’m surprised this is her first xword appearance because she has been famous for a while but I guess her last name is a little long for the crossword.)
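Finding debuts amounts to recording the first date each answer appears. Here is a minimal sketch, assuming the data is a list of (puzzle date, answer) records. The function name and the sample dates are placeholders for illustration.

```python
from datetime import date

def find_debuts(records):
    """Return {answer: date of its earliest appearance in `records`}."""
    debuts = {}
    # Sorting puts the earliest date first, so setdefault keeps the debut.
    for puzzle_date, answer in sorted(records):
        debuts.setdefault(answer, puzzle_date)
    return debuts

# Placeholder records, deliberately out of order.
records = [
    (date(2017, 3, 1), "HBO"),
    (date(2017, 2, 10), "SWOLE"),
    (date(2016, 1, 1), "HBO"),
]
debuts = find_debuts(records)
```

One caveat: this only finds the first appearance within the data you have. With a five-year window, an answer that looks like a debut may have appeared in an older puzzle outside the window.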

I was curious about whether words were being added faster to the Oxford English Dictionary corpus or the NYT crossword puzzle corpus, so I scraped all the new additions to the OED over the past four years.  It turns out to be kind of a dead heat with few discernible patterns.

In this timeline graph, each end of the bar represents the word's addition to a corpus; the color of the bar represents whether the crossword or the OED was first.
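The bar colors boil down to a date comparison for every word that appears in both corpora. Here is a rough sketch, assuming each corpus is a dict from word to the date it was added and that both use the same spelling for keys. The words and dates are placeholders, not results.

```python
from datetime import date

def first_corpus(crossword_debuts, oed_additions):
    """For words in both corpora, report which corpus added the word first."""
    result = {}
    for word in crossword_debuts.keys() & oed_additions.keys():
        xword_date = crossword_debuts[word]
        oed_date = oed_additions[word]
        if xword_date < oed_date:
            result[word] = "crossword"
        elif oed_date < xword_date:
            result[word] = "oed"
        else:
            result[word] = "tie"
    return result

# Placeholder data for illustration only.
crossword_debuts = {"WORDA": date(2014, 5, 1), "WORDB": date(2016, 2, 1)}
oed_additions = {"WORDA": date(2015, 6, 1), "WORDB": date(2013, 9, 1),
                 "WORDC": date(2016, 3, 1)}

who_was_first = first_corpus(crossword_debuts, oed_additions)
```

Words found in only one corpus (like the placeholder WORDC) are left out, since there is nothing to compare.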

I was pretty surprised that "emoji" was adopted before "selfie".  I was taking selfies long before I ever used an emoji.

I also scraped Urban Dictionary to see if their words of the day end up in the NYT crossword, and they do! There’s actually a lot more overlap with UD than with the OED.  Here are a few of the overlapping words below, along with the debut date for each corpus.
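Checking for overlap requires putting dictionary entries into the same form as crossword answers, which are uppercase with no spaces or punctuation. Here is one possible way to do that; the normalization rule and the function names are assumptions for illustration, not a description of the matching used for the results above.

```python
import re

def to_grid_form(entry):
    """Uppercase and keep only A-Z, e.g. 'Amy Poehler' -> 'AMYPOEHLER'."""
    # Note: accented letters and digits are dropped by this simple rule.
    return re.sub(r"[^A-Z]", "", entry.upper())

def overlap(dictionary_entries, crossword_answers):
    """Map each dictionary entry to its grid form if that form is an answer."""
    answers = set(crossword_answers)
    matches = {}
    for entry in dictionary_entries:
        grid_form = to_grid_form(entry)
        if grid_form in answers:
            matches[entry] = grid_form
    return matches

# Toy inputs using words mentioned earlier in this post.
matches = overlap(["swole", "mooc"], ["SWOLE", "HBO"])
```

A stricter rule (for example, transliterating accented letters instead of dropping them) would change which entries match.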

It's not too surprising that words show up in Urban Dictionary a lot earlier than they show up in the crossword. But it is interesting that almost all of these words were submitted to Urban Dictionary before 2010.  It’s possible that more recent words just haven’t shown up in the crossword yet, but I think it suggests that UD had a golden age and is now on the decline.

III. Conclusion

I used to try to do crossword puzzles from before I was born, and I found them impossible.   So I assumed that the puzzles were just objectively harder back then.

Now after this project I no longer think that’s the case. I think that the NYT Crossword is so aligned with its publication era that it's very difficult to do puzzles that you didn't live through.

IV. Ideas for Further Exploration

  • Natural Language Processing
    • Get better at grouping clues and answers that are similar but not identical
    • Figure out how to distinguish between compound words and multi-word answers
    • Catalogue new portmanteaus and compound words
  • Build a crossword solver

V. Acknowledgements

Thanks to Zeyu Zhang for teaching me how to scrape a password-protected website, and to Thomas Kolasa for reminding me not to push my password to GitHub.

VI. Addendum

A debut word from Feb 10, 2017, and the only Friday crossword I've ever solved without cheating:


About Author

Rachel Kogan

Rachel graduated from Princeton in 2013 with a B.A. in Mathematics, and then worked at Morgan Stanley as a mortgage-backed securities trader for two years. Her lifelong obsession with math has more recently progressed into a fascination with...

Leave Responses

Rachel Kogan May 26, 2017
Thanks for the feedback, Rex! I'm a big fan of your crossword blog.
Rex May 25, 2017
AMYPOEHLER debuted many years earlier. I know 'cause I did it.