
Data Study on NYT Bestseller

David Letzler
Posted on Apr 4, 2017
The skills the author demonstrated here can be learned through taking the Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Introduction: The Paucity of Book-Market Data

If you want to know what films have generated the highest receipts, box office data has long been available for major releases.  For television, it's not hard to find the Nielsen ratings for any show.  Music sales are a little trickier, but between RIAA certifications and the Billboard charts, you can usually locate what you need to know.

For books, it's a different story.  Did the latest Jodi Picoult bestseller outsell the latest John Grisham?  It's hard to say.  The publishing companies and bookstores do not divulge unit sales.  Though Nielsen and Amazon both track book sales, their rankings capture only a minority of the market and cannot be compiled over a long period of time without paying a fee.

The New York Times Bestseller Lists don't compile direct unit numbers, but they do provide weekly rankings of books in hardcover, trade paperback, mass-market paperback, and e-book formats, as well as in a variety of subgenres.  Granted, there may be ways of gaming their system, and it's not clear that the lists reflect true sales.  Still, even if they are not perfect, the bestseller lists provide data that is easily accessible (through the Times API) and formatted to allow comparisons over time.

The number of weeks spent as a bestseller may not directly reflect unit sales, but the two are surely correlated.  Moreover, they are important predictive tools in addition to being reflectors of success: getting onto the bestseller list encourages readers to buy the book.  They are probably our best tool to investigate the otherwise-opaque book market.

Objective

How well have the major publishers been doing against each other?  What imprints dominate each genre?  If you're not affiliated with a major publisher, where's your best bet to break through?  And how much is a spot in the prestigious Times Book Review worth?

To find out, I acquired the fiction bestsellers in the four major formats (hardcover, trade paperback, mass-market paperback, and e-book) from June 2008 through the beginning of March 2017, then hand-annotated them with data about their publishers' genre and corporate affiliation.  The compiled results are visualized in this application.  I also used the API to acquire the URLs of every review the Times Book Review published over that time and scraped their text (a minimal sketch of the API calls follows the list below).  In analyzing the data, I made the following discoveries:

  1. The book market is extremely concentrated.
  2. There are only a few genres not completely dominated by the biggest companies, including literary fiction and romance.
  3. Books reviewed by the Times are much more likely to be bestsellers, though it's not clear which is the cause and which the effect.
  4. It is exceedingly hard to determine the judgment of a Times book review without manually reading it.
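
For concreteness, pulling one week of lists from the Times Books API looks roughly like the sketch below.  This is illustrative rather than the code used in this project: the v3 endpoint and list slugs come from the public API documentation, the response fields are the commonly documented ones, and the API key is a placeholder.

```python
# A minimal sketch of fetching one week's fiction bestseller lists from the
# NYT Books API (v3).  List slugs, response fields, and the rate-limit pause
# are assumptions from the public documentation, not this project's code.
import time
import requests

API_KEY = "your-nyt-api-key"  # placeholder
BASE = "https://api.nytimes.com/svc/books/v3/lists/{date}/{name}.json"
LISTS = ["hardcover-fiction", "trade-fiction-paperback",
         "mass-market-paperback", "e-book-fiction"]

def fetch_week(date):
    """Return {list_name: [(title, publisher, weeks_on_list), ...]} for one week."""
    week = {}
    for name in LISTS:
        resp = requests.get(BASE.format(date=date, name=name),
                            params={"api-key": API_KEY})
        resp.raise_for_status()
        books = resp.json()["results"]["books"]
        week[name] = [(b["title"], b["publisher"], b["weeks_on_list"])
                      for b in books]
        time.sleep(6)  # stay under the per-minute rate limit
    return week

print(fetch_week("2017-03-05")["hardcover-fiction"][:3])
```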

The Bestseller Oligopoly Is in Danger of Becoming a Monopoly

Though hundreds of publishing imprints are represented on the bestseller lists, 90% of the space is taken up by a handful of parent companies.  Until 2013, these were known as the Big Six: Random House, Penguin, HarperCollins, Hachette, Simon & Schuster, and Macmillan.  With the merger of the two biggest houses (Random House and Penguin) in 2013, that concentration has only gotten more extreme.  A single company now controls nearly half of the bestseller lists.


The nested corporate structure of publishing imprints can be dizzying.  For instance, the prestige novels of Margaret Atwood and Ian McEwan are published under the personal imprint of veteran editor Nan A. Talese.  This might give the impression of a small, boutique operation.  But Talese is owned by the eminent publisher Doubleday, which, since 2009, has been part of Knopf Doubleday, having been merged with Alfred A. Knopf by their joint corporate owner Random House.  Random House is now a subsidiary of Penguin Random House, which is itself jointly operated by the international media conglomerates Bertelsmann and Pearson.

What hope does any smaller publisher have against a force so mammoth?

Though the biggest companies dominate mainstream commercial fiction, there is some breathing room within individual genres.  For instance, Macmillan, a distant fifth among the Big Five in most areas, leads the Science Fiction/Fantasy market via its Tor and Minotaur imprints (run by Tom Doherty Associates and St. Martin's, respectively).

Independent houses, led by Grove/Atlantic, do well in literary fiction, cumulatively publishing a fifth of literary imprint bestsellers.  Perhaps most astonishing is the e-book list.  A mere decade ago, the notion of a self-published bestseller would have been laughed at.  Now, nearly half of all romance/erotica e-book bestsellers are self-published.

Still, the bestseller list is extremely top-heavy, dominated by the biggest-selling books.  On the hardcover, e-book, and mass-market lists, about 10% of total space is taken up by the 1-3 annual books that last on the charts for over a year.  This is even more extreme on the trade paperback list, where over 33% of the list is taken up by books that chart for over a year.

When a book like Gone Girl or Fifty Shades of Grey hits the trade paperback list, that is, it stays there for a long time, preventing other books from climbing into the top 20.  As a result, only a third as many individual trade paperbacks reach bestseller status as in other formats.

Across the board, the typical bestseller in print lasts only 2-3 weeks.  For e-books, attention spans run even shorter: while the top books take up as much space as on the other lists, 75% of digital bestsellers don't merit a second week.

The Book Review Boost

The bestseller lists are compiled by the New York Times Book Review, arguably the most prestigious book review in the country.  Out of the ~40,000 new fiction titles released each year,* the Book Review covers about 350.  Those 350 are about 10 times as likely to make the hardcover and mass-market lists as the average new title, and 30 times as likely to make the trade paperback list.

However, it's not clear how much of this correlation is actually causative.  The Book Review is more likely to review books that are already a good bet to sell well.  Furthermore, while the Book Review impact looks big in the chart on the right, it looks less impressive in the chart on the left.  Since the number of reviewed books is so small, the majority of books that make the bestseller list are not reviewed at all.

Findings

We can see the importance of book reviews to the trade paperback bestsellers, as they are reviewed nearly as often as the hardcover bestsellers despite the smaller number of trade paperback bestsellers.  However, only 3% of mass-market bestsellers received a review.  Overall, the bestsellers that did not receive a review far outnumber those that did.  In other words, prestigious though it may be, a Times book review may not be that relevant to how a book sells.  How can we determine its market value?  One way is to see whether books that receive a good review do any better than those that do not.

New York Times Reviews Are Difficult to Classify

Determining whether or not a Times review is positive, though, is difficult.  Machine-learning classifiers are frequently trained to separate positive from negative commentary, but they are usually designed to handle born-digital material that is tailor-made for easy classification.  For example, the first review on today's top-selling Amazon book (Jay Asher's Thirteen Reasons Why) contains only slightly more than 100 words, lots of descriptive language reflecting the reviewer's judgment (e.g., "intrigued," "page turner"), and plenty of metadata, including a five-star rating.  It's easy to tell that this is a positive review.

A New York Times review is quite different.  Reviews average about 1,200 words, rendering them computationally complex.  Over 75% of their sentences make no direct evaluative statement about the book, instead discussing and contextualizing its contents.  They do not come with a star rating, which means we have no labels on which to train and test models.  The Times API supplies little metadata.  They are written, that is, so that you actually have to read the article.

This leaves us with two problems.  First, since we are focusing only on fiction, we have to separate out fiction reviews from nonfiction reviews.  Second, we have to devise a scalable method of evaluating the review's judgment.

Locating the "Fiction" Topic

Separating fiction and nonfiction proved relatively simple.  Even if a nonfiction book and a novel share similar content (say, Erik Larson's The Devil in the White City and Thomas Pynchon's Against the Day, both of which are partially set around the Chicago World's Fair of 1893), a review of the latter will likely feature different terminology than a review of the former: it will refer more frequently to the "narrator," the "characters," and the "story."

Topic Modeling

To isolate that discourse, we can use an unsupervised machine-learning process called "topic modeling."  Topic modeling assumes that a document corpus is generated by a set of topics, each itself construed as a set of word frequencies across the vocabulary.  Each document in the corpus has its own mix of those topics, and that mix is reflected in the document's specific word distribution.
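
As a concrete illustration, a 30-topic model of the kind described here can be fit with scikit-learn's LDA implementation.  The post doesn't specify the author's actual tooling, so treat this as a generic sketch, with a toy two-sentence corpus standing in for the scraped reviews:

```python
# Generic sketch of fitting a 30-topic model and inspecting topics by their
# top words.  The tiny repeated "corpus" is a toy stand-in for the reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = ["the novel's characters and story are haunting",
           "a study of the brain and the wider universe"] * 20  # toy corpus

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=30, random_state=0)
doc_topics = lda.fit_transform(counts)  # rows: documents, cols: topic shares

# Print each topic's most frequent words to locate the "fiction" topic by eye
words = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [words[i] for i in comp.argsort()[::-1][:5]]
    print(k, " ".join(top))
```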

Training multiple 30-topic models on the book reviews produced a consistent set of topics.  As expected, most deal with the books' subject matter.  In the model I used, Topic 9's most frequent words include "science," "brain," and "universe."  Topic 6, similarly, has "sex," "marriage," and "love."  Several topics, though, emphasize the writing process.  Most important, Topic 2, whose most frequent words include "novel," "characters," and "fiction," seems to address issues of fictionality.

There was not, unfortunately, a clear numerical boundary separating fiction and nonfiction, but examining specific reviews with high and low concentrations of Topic 2 confirmed that the former addressed fiction and the latter nonfiction.  I had only to sort out the ones in the middle.

Manual Decision Tree

To do so, I constructed a manual decision tree.  Those documents which had less than 8% of the "Fiction" topic I classified as nonfiction, and those over 16% I classified as fiction.  For the middle range, I filtered out three types of reviews as nonfiction.

First were several hundred that contained the word "memoir," since memoirs are nonfiction works written in a semi-fictional style.

Second were those with more than 15% of Topic 8 ("Literary Life"), which included author biographies, works of poetry, and literary essays.  Third were books with more than 15.5% of Topic 20 ("Public Writing"), which included social criticism and creative nonfiction.  For the books that remained, I assigned each one with over 12% of Topic 2 to fiction and the rest to nonfiction.  A spot check suggested this process had about 95% accuracy.
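
Written out as code, the routing rules above amount to a short function.  The thresholds and topic indices follow the text; `topic_mix` is assumed to be one review's topic-proportion vector from the model, and `text` its full text:

```python
# The hand-built decision tree from this section, expressed as one function.
def is_fiction(topic_mix, text):
    fiction = topic_mix[2]          # Topic 2: "fiction" discourse
    literary_life = topic_mix[8]    # Topic 8: "Literary Life"
    public_writing = topic_mix[20]  # Topic 20: "Public Writing"
    if fiction < 0.08:
        return False                # clearly nonfiction
    if fiction > 0.16:
        return True                 # clearly fiction
    # Middle range: filter out the three nonfiction patterns
    if "memoir" in text.lower():
        return False
    if literary_life > 0.15 or public_writing > 0.155:
        return False
    return fiction > 0.12
```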

In retrospect, I might have achieved better results by manually labeling ~400 reviews as fiction or nonfiction, then letting a Random Forest automate the decision tree process I underwent manually.  It might have been faster and more accurate.  Still, I am satisfied with my results.
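
For reference, that alternative would look something like the sketch below: label a few hundred reviews by hand, then train a random forest on their topic proportions.  The labels here are random placeholders standing in for real annotations.

```python
# Sketch of the retrospective alternative: learn the fiction/nonfiction split
# from topic proportions instead of hand-tuned thresholds.  Continues from
# the LDA sketch above; labels are placeholders for ~400 real hand labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

labels = np.random.randint(0, 2, len(doc_topics))  # stand-in for hand labels

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(rf, doc_topics, labels, cv=5).mean())  # check accuracy
rf.fit(doc_topics, labels)  # then predict on the remaining, unlabeled reviews
```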

The Limitations of a Sentiment Lexicon

Determining a review's critical evaluation proved more difficult.  The most basic algorithmic tools for classifying reviews are sentiment lexicons, which assign a positive or negative number to each word in a document based on its typical emotional valence.  However, because the majority of a Times review describes rather than evaluates a book, sentiment lexicons can become easily confused.  For instance, take this passage from Michiko Kakutani's review of Toni Morrison's Home:

Threaded through the story are reminders of our country's vicious inhospitality toward some of its own. On his way south, Frank makes use of a Green Book, part of the essential series of travelers' guides for African-Americans during a more overtly racist era. On a train, he encounters fellow passengers who've been beaten and bloodied simply for trying to buy coffee from a white establishment. He meets a boy who, out playing with a cap gun, was shot by a policeman and lost the use of one arm.

In context, this quote approvingly describes Morrison's depiction of racial intolerance.  But even a sentiment lexicon as subtle as the one created by literature professor Matthew Jockers (dubbed "Syuzhet") sees words like "vicious," "bloodied," and "racist" and decides this to be a starkly negative text, giving it one of the lowest ratings in the set.  Overall, on a three-category classification test ("positive," "mixed," "negative"), a straight sentiment score proved little better than random guessing.
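
The mechanics of the failure are easy to demonstrate.  A lexicon-based scorer simply sums the valences of the words it recognizes; Syuzhet (an R package) works on this principle, and the tiny hand-made lexicon below is an illustrative stand-in for its much larger one:

```python
# How a word-level sentiment lexicon scores text: sum the valences of known
# words, ignore everything else.  The lexicon here is a tiny stand-in.
import re

LEXICON = {"vicious": -3, "bloodied": -2, "racist": -3, "beaten": -2,
           "haunting": -1, "essential": 2, "perfect": 3}

def lexicon_score(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)

passage = ("Threaded through the story are reminders of our country's "
           "vicious inhospitality toward some of its own.")
print(lexicon_score(passage))  # negative, though the review is approving
```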

Filtering

A logical route for surmounting this problem is to filter out sentences that do not evaluate the book.  The best method I devised for doing so was to eliminate all sentences that make no mention of the author's surname or the book's title.  By getting rid of sentences like those quoted above, the classifier's analysis of Home focuses on sentences like "This haunting, slender novel is a kind of tiny Rosetta Stone to Toni Morrison's entire oeuvre" and returns a more accurate score.
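
A sketch of that filtering step, reusing `lexicon_score` from the previous sketch (the naive regex sentence-splitter is my assumption, not necessarily the author's):

```python
# Keep only sentences that name the author's surname or the book's title,
# then score just those.  Reuses lexicon_score from the sketch above.
import re

def evaluative_sentences(review_text, surname, title):
    sentences = re.split(r"(?<=[.!?])\s+", review_text)
    return [s for s in sentences
            if surname.lower() in s.lower() or title.lower() in s.lower()]

def filtered_score(review_text, surname, title):
    return sum(lexicon_score(s)
               for s in evaluative_sentences(review_text, surname, title))
```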

A spot check suggested that this approach still yielded only 66% accuracy.  One reason is that sentiment lexicons are bad at understanding irony.  For instance, take this arch assessment of Nora Roberts's The Villa by Janet Maslin:

So it would be an understatement to say that Nora Roberts deals in feminine wish fulfillment, especially when David turns out to be the kind of man who is excited by making the perfect jewelry purchase for his beloved, when he has teenage children who won't really mind a stepmom, and when he also turns out to be a stern corporate boss ready to upbraid Pilar's ex-husband on the job. Even when David is injured during one of the occasional moments of light mayhem in ''The Villa,'' he remains the romance reader's idea of a perfect 10.

Syuzhet sees words like "perfect," "excited," and "fulfillment" and gives this review a strong positive rating of 7.1.  Obviously, that does not accurately reflect Maslin's attitude.

The Limitations of Neural Networks

I attempted several clustering approaches to determine a review's evaluation, but because the majority of each review addressed content, standard algorithms proved ineffective.  Hoping for a more sophisticated approach to sentiment classification, one that would take context into account alongside individual word choice, I built a neural network based on the Word2Vec algorithm.**

Word2Vec generates a numeric vector for each word in a corpus, using back-propagation to align each word with its common collocates.  Consequently, each word's vector will be similar to those assigned to functionally similar words.  When loaded into a convolutional neural network, a Word2Vec model can then be used to classify sentences.

I spent some time tuning Word2Vec parameters by manually examining whether they produced appropriate similarity scores for common critical words like "chilling," "hackneyed," and "sympathize."  Next, I hand-labeled two thousand sentences from fifty reviews based on whether they were "positive," "neutral," or "negative."  I used a subset of those two thousand to train the CNN, which produced 81% accuracy on the test subset.  Finally, I loaded every sentence from the fiction reviews into the CNN to predict classifications for those sentences, then recombined the sentences to produce overall classifications.
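
The pipeline just described might be sketched as follows.  Everything here (layer sizes, window, sequence length, toy corpus) is illustrative rather than the author's actual configuration:

```python
# Illustrative sketch: train Word2Vec on review sentences, then feed padded
# index sequences into a small 1-D CNN that predicts negative/neutral/positive.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

tokenized = [["this", "haunting", "slender", "novel", "is", "a", "rosetta", "stone"],
             ["the", "plot", "is", "hackneyed", "and", "the", "prose", "chilling"]] * 25

w2v = Word2Vec(tokenized, vector_size=100, window=5, min_count=2, epochs=20)

vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}  # 0 = padding
emb = np.vstack([np.zeros(100), w2v.wv.vectors])               # row 0 = pad vector

MAXLEN = 40
def encode(sentence):
    ids = [vocab.get(w, 0) for w in sentence][:MAXLEN]
    return ids + [0] * (MAXLEN - len(ids))

emb_layer = layers.Embedding(len(emb), 100, trainable=False)
cnn = models.Sequential([
    emb_layer,
    layers.Conv1D(64, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(3, activation="softmax"),  # negative / neutral / positive
])
cnn.build(input_shape=(None, MAXLEN))
emb_layer.set_weights([emb])  # initialize with the Word2Vec vectors
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# cnn.fit(np.array([encode(s) for s in labeled_sents]), sent_labels, epochs=5)
```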

It didn't work.  On a spot check, I found accuracy to be barely better than random guessing.  The accuracy improved when the inputs were restricted to sentences mentioning the author's or the book's name, but it was still below 60%.  That left the raw word-by-word sentiment score on filtered sentences as the best remaining method.

Data Results

Given the inaccuracies in the sentiment score, any further results will need to be taken with a rock-sized portion of salt.  Still, to finish the exercise, I found that these figures did not show significant differences in reviewer attitudes toward bestsellers and non-bestsellers.  The bestseller median was slightly higher, but this difference was dwarfed by the standard deviation of the data.  Similarly, bestsellers received slightly longer reviews, but not significantly longer.


I am not satisfied with my results.  The problem of classifying Times reviews is a difficult one, and the subject will require further study and experimentation.

Lessons for Further Work

The superiority of the basic sentiment score to the neural network was surprising but logical.  Sentiment scores make word-level distinctions at various levels of intensity, while neural networks based on models like Word2Vec are limited to broad classifications of whole sentences.  That is, while the CNN could only judge a sentence on the range of three values {-1, 0, 1}, the Syuzhet sentiment score could evaluate one across a potentially infinite spectrum, though in practice it was limited to real numbers in the range [-6, 6].

Regardless, the sentiment score is more transparent and more flexible, a conclusion Jockers and co-author Jodie Archer reached in their own work text-mining bestsellers.

In further work, I would pursue the following avenues:

  1. We might refine the sentiment score by producing a criticism-specific sentiment lexicon.  A keyness test could be applied to the review corpus to isolate especially prominent critical words, to which criticism-specific scores could be assigned (a sketch of such a test follows this list).  This would not solve problems like those surrounding irony, but it might better handle words that a regular sentiment analysis would misclassify (e.g., "terrifying").
  2. Given that the CNN was limited by a lack of labeled reviews, we could give it a second chance by importing lexically similar book reviews from a source that labels its reviews with a rating (e.g., the starred/unstarred Publishers Weekly reviews).  This would still be difficult, because the rating would be at the review rather than the sentence level, but it would improve the training process.
  3. Once a more satisfactory sentiment measure is devised, we could use that information (in combination with review length, timing of the review, etc.) to generate a predictive model for the bestsellers.
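
For the first avenue, a standard keyness measure is Dunning's log-likelihood ratio, which flags words that occur disproportionately often in a target corpus relative to a reference corpus.  A minimal sketch, assuming tokenized corpora:

```python
# Dunning log-likelihood keyness: score how disproportionately often each
# word in the target corpus appears relative to a reference corpus.
import math
from collections import Counter

def keyness(target_tokens, reference_tokens, top=20):
    t, r = Counter(target_tokens), Counter(reference_tokens)
    nt, nr = sum(t.values()), sum(r.values())
    scores = {}
    for w, a in t.items():
        b = r.get(w, 0)
        e1 = nt * (a + b) / (nt + nr)  # expected count in target
        e2 = nr * (a + b) / (nt + nr)  # expected count in reference
        ll = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        scores[w] = ll
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top]

# e.g. keyness(review_tokens, newswire_tokens) -> candidate critical words
```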

Conclusion

In some ways, the question of how to classify Times reviews is merely an intellectual problem.  Still, it could provide some real insight.  If we could confirm that getting a Times review is all that matters, with the review's judgment being secondary, that would have significant implications for marketing strategies.  If the Times' opinion doesn't matter, it might be better to simply engage its attention, whether positively or not, rather than to try to court its favor.

*There is no good data on the actual number of new fiction titles published each year.  The ProQuest division R.R. Bowker logs about 50,000 new fiction ISBNs each year, but that figure a) double-counts titles released in multiple formats (i.e., a paperback and a hardcover of the same book will receive different ISBNs) and b) under-counts digitally published work.  My rough estimates are based on a back-of-napkin calculation (from overall market share and prices) that there are about 18,000 new trade paperback and 18,000 new e-book fiction titles annually, plus about 9,000 new titles each in hardcover and mass market.

**I used Word2Vec instead of Doc2Vec because, again, only a fraction of each review's sentences were evaluative.  Using Doc2Vec would likely cause the CNN to cluster reviews based on content rather than evaluation.


About Author

David Letzler

Dr. David Letzler has received a Ph.D. in English Literature from the Graduate Center at CUNY and an M.A. in creative writing from Temple University. While researching long, complicated novels and the cognitive science of attention for his...

Comments

rkiga  May 12, 2017
Hey David, really great work. But I can't figure out the dates for the charts of SFF, Literary, and Romance. Can you list them?

rkiga  May 12, 2017
I assumed it was for 2008-2017, but that seems like a lot of self-published Romance when stretching that far back. Also, is this site missing a "submit comment" button? I can only reply by hitting enter in the Email box. 0_o

David Letzler  May 12, 2017
Fair question. It's 2013-2017. I probably should have put that on the graphic (and may still, if I get some spare time), but you can work it out from context. Basically, there's one single time split allowed on the dashboard (which you can explore yourself by clicking the link in the Introduction) at 2013, separating out the period before and after the big Penguin-Random merger.
