NYC Data Science Academy | Blog

Data-driven Crossword Puzzle Solving

Rachel Kogan
Posted on May 10, 2017

There’s a misconception that being good at crosswords and getting NYT crossword answers quickly requires knowledge of trivia, and it couldn't be more false. Sometimes a crossword constructor will resort to an obscure word just to get all the clues to fit, but trivia runs counter to the goal of the New York Times crossword, and puzzles with too many esoteric clues don't get printed. Let's collect data to see how to solve crossword puzzles.

A good New York Times crossword puzzle consists of two elements:

  • clever puns/jokes
  • clue-answer pairs that reflect the zeitgeist

“Zeitgeist” is a German word which means “spirit of the times”.  The zeitgeist is the opposite of trivia; it is the collection of cultural references that should be familiar to most people.

My project is about the second bullet point: trying to understand and visualize how the NYT crossword puzzle stays current and captures the spirit of the time in which it is published.

I. Data

I scraped five years of clue-answer pairs from the crossword blog xwordinfo.com using Scrapy.  There was a minor issue where my spider would get redirected if I tried to grab too much data at a time, so I had to crawl in chunks. Ultimately I was able to get most of the data I wanted, and I believe that anything excluded is missing completely at random.
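Crawling in chunks can be sketched as splitting the full date range into smaller windows and running the spider once per window. The snippet below is a minimal illustration using only the standard library; the `date_chunks` helper, the 30-day window, and the example dates are assumptions for illustration, not the settings used in the original crawl.

```python
from datetime import date, timedelta

def date_chunks(start, end, chunk_days=30):
    """Yield (chunk_start, chunk_end) pairs that cover start..end inclusive."""
    step = timedelta(days=chunk_days)
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        # The last chunk is clipped so it never runs past the end date.
        chunk_end = min(current + step - one_day, end)
        yield current, chunk_end
        current = chunk_end + one_day

# Hypothetical range: the first three months of 2012, in 30-day windows.
chunks = list(date_chunks(date(2012, 1, 1), date(2012, 3, 31), chunk_days=30))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each window could then be passed to a separate spider run, with a pause between runs, so that no single run requests enough pages to trigger the redirect.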

I also scraped all the words added to the OED over the past four years (about 3000 words), and the entire Urban Dictionary word of the day archive (about 4000 words).  Lastly, I obtained a list of the 5000 most common English words from the Corpus of Contemporary American English (COCA).

II. Analysis

I examined two different classes of answers:

  • frequently-used answers, and how the clues to these answers change throughout time
  • answers that have recently been used for the very first time (known as "debuts")

A. Frequently-Used Answers

In order to analyze frequent words, let’s briefly summarize which words are actually showing up a lot in the crossword puzzle.  Here are the most commonly used crossword answers, along with their frequency counts over the last five years.

NYT Frequently Used Answers

So we can see it's a lot of three-letter words, and a lot of the same letters appearing throughout the list.  In fact, out of the 5000 most common crossword puzzle answers, about 1400 are three letters long (of the 5000 most common words in the English language, only about 300 have three letters).  There aren't 1400 common three-letter words in the English language, so we get a lot of three-letter prefixes, acronyms, and names.

NYT Top Word Lengths
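Counts like these can be reproduced with a couple of `collections.Counter` calls. This is a minimal sketch that assumes the scraped data is a list of `(clue, answer)` tuples; the function names and the tiny sample list are placeholders, not the real data set.

```python
from collections import Counter

def top_answers(pairs, n=10):
    """Return the n most frequent answers as (answer, count) tuples."""
    counts = Counter(answer.upper() for _clue, answer in pairs)
    return counts.most_common(n)

def length_distribution(answers):
    """Count how many distinct answers there are of each length."""
    return Counter(len(answer) for answer in set(answers))

# Placeholder clue-answer pairs, for illustration only.
sample_pairs = [
    ("Placeholder clue 1", "ERA"),
    ("Placeholder clue 2", "AREA"),
    ("Placeholder clue 3", "ERA"),
    ("Placeholder clue 4", "ORE"),
]

top = top_answers(sample_pairs, n=2)
lengths = length_distribution(answer for _clue, answer in sample_pairs)
print(top)
print(dict(lengths))
```

Running `length_distribution` over both the top crossword answers and the COCA word list would give the two length profiles compared above.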

We can learn a lot about the crossword puzzle by tracing some of these three-letter answers throughout time and cataloging how the clues change.  I chose the following clues intentionally to illustrate how the NYT crossword puzzle stays current.

HBO has been an answer in the crossword puzzle fourteen times in the past five years, and it’s always clued with a specific TV show: "'Game of Thrones' network", "'The Newsroom' channel", etc.

HBO

In this graph, the colorful dots represent the show appearing as an HBO crossword clue, and the black dots represent the year that show premiered.

If you follow which TV show was used throughout time, you can see that:

  • The editors are trying to switch it up, so in any year they use a few different shows
  • The editors are trying to stay current, so as new shows come out they add them to the clue roster – most recently, True Detective
  • The editors are trying to use the most popular shows, so there’s at least a year lag between a show premiering and the show appearing in the crossword for the first time
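The lag in the last point can be measured by comparing each show's premiere year with the first year it appears in a clue. The sketch below assumes a dictionary of premiere years and a list of `(date, show)` clue uses; the show names and dates are placeholders, not values from the scraped data.

```python
from datetime import date

def first_clue_lag(premieres, clue_uses):
    """Return {show: years between its premiere and its first use in a clue}."""
    first_use = {}
    for used_on, show in clue_uses:
        if show not in first_use or used_on < first_use[show]:
            first_use[show] = used_on
    return {
        show: first_use[show].year - premieres[show]
        for show in first_use
        if show in premieres
    }

# Placeholder shows and dates, for illustration only.
premieres = {"Show A": 2011, "Show B": 2014}
clue_uses = [
    (date(2013, 5, 1), "Show A"),
    (date(2012, 3, 9), "Show A"),
    (date(2016, 1, 2), "Show B"),
]

lags = first_clue_lag(premieres, clue_uses)
print(lags)
```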

If it keeps up its current level of popularity, I predict that Westworld will appear in the crossword puzzle as an HBO clue sometime in 2018.

The next answer I analyzed was LIN.  LIN has appeared thirteen times in the past five years, and it’s always clued as a person's first or last name, for example: "Justin who directed four of the Fast and the Furious movies" or "Jeremy of the NBA".

LIN Clue

In this graph, the orange dots represent LIN clues and the colorful dots represent relevant current events.

Jeremy Lin is a basketball player who started for the Knicks in 2012 and sparked a fan craze called LINSANITY. You can see that Jeremy Lin was the go-to LIN clue for a while after that. But then he went to play for Houston, and the crossword constructors started rotating in Justin Lin the director and Lin Biao, a figure in Communist China. I don't think it's a coincidence that Lin-Manuel Miranda was the LIN clue three days before his show won 11 Tonys, or that Justin Lin showed up a few months after the Star Trek premiere.

You can see that LIN hasn’t appeared yet in 2017, but Jeremy Lin is back in NYC, playing for the Brooklyn Nets, and I predict that Jeremy Lin will make a crossword comeback.

B. Debut Answers

Debut answers are words that appear as answers in a puzzle for the very first time.  There are usually at least a few debut answers every day.  Here are some debut answers from the most recent Sunday crossword:

Debut answers usually come in one of two types:

  • Long multi-word jokes, usually related to the theme of the puzzle, that will probably never reappear
    • MODELYODEL, MASSAGEPASSAGE
  • New words added to the crossword corpus that may reappear
    • slang words like SWOLE
    • tech jargon like MOOC
    • celebrities like Amy POEHLER (I’m surprised this is her first xword appearance because she has been famous for a while but I guess her last name is a little long for the crossword.)
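One way to flag debuts in a scraped data set is to record the earliest date each answer appears and then list the answers whose earliest date matches a given puzzle. This is a minimal sketch, assuming records of the form `(puzzle_date, answer)`; the dates below are placeholders, not real puzzle dates. Note that with only a five-year window, "first seen in the data" is not the same as a true debut, since an answer may have appeared before the window starts.

```python
from datetime import date

def first_appearances(records):
    """Map each answer to the earliest puzzle date on which it appears."""
    first_seen = {}
    for puzzle_date, answer in records:
        key = answer.upper()
        if key not in first_seen or puzzle_date < first_seen[key]:
            first_seen[key] = puzzle_date
    return first_seen

def debuts_on(records, day):
    """Return the answers whose first recorded appearance is on the given day."""
    first_seen = first_appearances(records)
    return sorted(answer for answer, seen in first_seen.items() if seen == day)

# Placeholder dates, for illustration only.
records = [
    (date(2017, 1, 1), "HBO"),
    (date(2017, 1, 8), "HBO"),
    (date(2017, 1, 8), "SWOLE"),
    (date(2017, 1, 8), "MOOC"),
]

sunday_debuts = debuts_on(records, date(2017, 1, 8))
print(sunday_debuts)
```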

I was curious about whether words were being added faster to the Oxford English Dictionary corpus or the NYT crossword puzzle corpus, so I scraped all the new additions to the OED over the past four years.  It turns out to be kind of a dead heat with few discernible patterns.

In this timeline graph, each side of the bar represents the word's addition to a corpus; the color of the bar represents whether the crossword or the OED was first.

I was pretty surprised that "emoji" was adopted before "selfie".  I was taking selfies long before I ever used an emoji.
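The comparison behind a timeline like this can be sketched as a join on the words that appear in both corpora, followed by a date comparison. The snippet below assumes two dictionaries mapping each word to the date it was added; the `normalize` step, which upper-cases and strips spaces and hyphens so that dictionary entries line up with crossword answers, is also an assumption. The words and dates are placeholders, not scraped values.

```python
import re
from datetime import date

def normalize(word):
    """Upper-case a word and drop everything except letters, crossword style."""
    return re.sub(r"[^A-Z]", "", word.upper())

def who_was_first(crossword_dates, dictionary_dates):
    """For words in both corpora, return {word: (leader, gap_in_days)}."""
    crossword = {normalize(w): d for w, d in crossword_dates.items()}
    dictionary = {normalize(w): d for w, d in dictionary_dates.items()}
    result = {}
    for word in sorted(crossword.keys() & dictionary.keys()):
        gap = (dictionary[word] - crossword[word]).days
        if gap > 0:
            leader = "crossword"
        elif gap < 0:
            leader = "dictionary"
        else:
            leader = "tie"
        result[word] = (leader, abs(gap))
    return result

# Placeholder words and dates, for illustration only.
crossword_dates = {"WORDA": date(2014, 1, 1), "WORDB": date(2015, 6, 1)}
dictionary_dates = {
    "word-a": date(2014, 1, 11),
    "word b": date(2015, 5, 1),
    "wordc": date(2016, 2, 2),
}

comparison = who_was_first(crossword_dates, dictionary_dates)
print(comparison)
```

The same function works for any dated word list, so it could be pointed at a second dictionary source without changes.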

I also scraped Urban Dictionary to see if their words of the day end up in the NYT crossword, and they do! There’s actually a lot more overlap with UD than with the OED.  Here are a few of the overlapping words below, along with the debut date for each corpus.

It's not too surprising that words show up in Urban Dictionary a lot earlier than they show up in the crossword. But what is interesting is that almost all of these words were submitted to Urban Dictionary before 2010.  It’s possible that more recent words just haven’t shown up in the crossword yet, but I think it suggests that UD had a golden age and is now on the decline.

III. Conclusion

I used to try to do crossword puzzles from before I was born, and I found them impossible.   So I assumed that the puzzles were just objectively harder back then.

Now after this project I no longer think that’s the case. I think that the NYT Crossword is so aligned with its publication era that it's very difficult to do puzzles that you didn't live through.

IV. Ideas for Further Exploration

  • Natural Language Processing
    • Get better at grouping clues and answers that are similar but not identical
    • Figure out how to distinguish between compound words and multi-word answers
    • Catalogue new portmanteaus and compound words
  • Build a crossword solver
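As a starting point for the clue-grouping idea, clues can be normalized and then grouped greedily by string similarity. This is only a sketch using the standard library's `difflib`; the 0.8 threshold and the greedy strategy are arbitrary assumptions, and the punctuated "N.B.A." variant is a hypothetical example of a near-duplicate clue.

```python
import re
from difflib import SequenceMatcher

def normalize(clue):
    """Lower-case a clue, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", clue.lower()).split())

def group_similar(clues, threshold=0.8):
    """Greedily put each clue in the first group whose first member is similar."""
    groups = []
    for clue in clues:
        key = normalize(clue)
        for group in groups:
            ratio = SequenceMatcher(None, key, normalize(group[0])).ratio()
            if ratio >= threshold:
                group.append(clue)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([clue])
    return groups

clues = [
    "Jeremy of the NBA",
    "Jeremy of the N.B.A.",  # hypothetical variant of the clue above
    "Justin who directed four of the Fast and the Furious movies",
]

groups = group_similar(clues)
for group in groups:
    print(group)
```

Character-level similarity will miss clues that are reworded rather than re-punctuated, which is where a proper NLP approach would come in.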

V. Acknowledgements

Thanks to Zeyu Zhang for teaching me how to scrape a password-protected website, and to Thomas Kolasa for reminding me not to push my password to GitHub.

VI. Addendum

A debut word from Feb 10, 2017, and the only Friday crossword I've ever solved without cheating:

About Author

Rachel Kogan

Rachel graduated from Princeton in 2013 with a B.A. in Mathematics, and then worked at Morgan Stanley as a mortgage-backed securities trader for two years. She's currently a developer at Bloomberg L.P. Check out her blog at https://rachel1792.github.io/.
Comments

Rachel Kogan May 26, 2017
Thanks for the feedback, Rex! I'm a big fan of your crossword blog.
Rex May 25, 2017
AMYPOEHLER debuted many years earlier. I know 'cause I did it.
