
BookLab: Helping You Discover New Books With Machine Learning

Chris Valle, Jhonasttan Regalado and Conred
Posted on Dec 18, 2016

Overview

For our capstone project, the team created BookLab, a book recommendation engine for Barnes & Noble, a traditional brick-and-mortar bookstore, to help it increase book sales and customer loyalty. We used a hybrid approach, combining ensemble machine learning models (Random Forest) with collaborative filtering, to make BookLab's recommendations more creative than simple book-and-author matching.

I. Project Background

 

Target Audience

We envision BookLab as an app that helps customers of traditional brick-and-mortar booksellers like Barnes & Noble find the books that meet their needs, whether for school, work, or leisure reading. We feel that B&N is lagging behind Amazon, the current market leader, in terms of customer engagement and monetization using data analytics. B&N's website allows customers to search for books by title, author, or category, but it is not possible to get a book recommendation based on personal preferences such as your favorite books.

[Figure: Barnes & Noble search results for "Harry Potter"]

In the example above, a customer searches for Harry Potter and the search engine returns all Harry Potter books and memorabilia. However, unlike Amazon, there is no "similar books" recommendation feature.

Searching for the same book on Amazon also gives the customer a selection of similar books, not necessarily by J.K. Rowling. "Diary of a Wimpy Kid: Double Down" by Jeff Kinney and "Magnus Chase and the Gods of Asgard, Book 2: The Hammer of Thor" by Rick Riordan were among the algorithm's top five recommendations.

[Figure: Amazon's "similar books" recommendations for Harry Potter]

Presenting these two books of a similar genre but by different authors motivates the customer to explore new books. This is beneficial for both the customer and the bookstore.

 

Goals

We want to help Barnes & Noble's customers find great books that will inspire them, make them laugh, make them cry, and spark their curiosity: books that will turn them into book lovers who keep returning to B&N's website to discover new titles. To do this, we will:

  • analyze reader behavior and preferences using EDA and clustering
  • develop a machine learning algorithm that predicts the reader satisfaction rate for books
  • create a recommendation engine algorithm to select the top matches for their needs
  • design a customer-friendly interface that can be used by bookstore specialists and customers

Languages, Tools, Platforms

  • Languages: R, Python
  • Platforms: Spark, Data Science Studio, GraphLab
  • APIs: Google Books, Goodreads

II. Dataset and Pipeline

We sourced book ratings data from the University of Freiburg's Department of Computer Science, which scraped the Book-Crossing website with the permission of the website owner in 2004. The dataset contained 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit and implicit) for 271,379 books.
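
As a concrete starting point, here is a minimal sketch of loading the dump with pandas; the semicolon-separated layout and Latin-1 encoding reflect how the Book-Crossing files are typically distributed, so adjust if your copy differs.

    import pandas as pd

    # Book-Crossing ships as three semicolon-separated files; paths and encodings
    # may need adjusting for your copy of the data.
    users = pd.read_csv("BX-Users.csv", sep=";", encoding="latin-1", on_bad_lines="skip")
    books = pd.read_csv("BX-Books.csv", sep=";", encoding="latin-1", on_bad_lines="skip")
    ratings = pd.read_csv("BX-Book-Ratings.csv", sep=";", encoding="latin-1", on_bad_lines="skip")

    print(users.shape, books.shape, ratings.shape)  # roughly 278k users, 271k books, 1.15M ratings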

[Figure: The Book-Crossing website]

Similar to real business scenarios, our dataset had missing information and formatting issues. Upon inspection, we saw that we had to deal with the following challenges:

  • formatting issues such as misspelled city, state, and country information
  • book title issues, especially for non-English titles
  • missing user data such as age and location (city, state, and country)
  • read-but-unrated books outnumbering read-and-rated books
  • implausible user ages, exceeding 100 and reaching as high as 250 years old

[Figure: Missingness in the book ratings data]

[Figure: Original distribution of user ages]

Our approach was to pre-process the data carefully, preserving the original data structure as much as possible to avoid inducing bias. The appendix below details this process; a small cleaning sketch follows.
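
The sketch below illustrates two of these cleaning steps, assuming the users frame loaded above; the column names (Age, Location) and the age cutoffs are our assumptions, not the team's exact code.

    import numpy as np

    # Treat implausible ages as missing rather than dropping rows, to preserve
    # the original structure (the cutoffs here are assumed, not the team's).
    users["Age"] = pd.to_numeric(users["Age"], errors="coerce")
    users.loc[(users["Age"] < 5) | (users["Age"] > 100), "Age"] = np.nan

    # Location is free text ("city, state, country"); take the last token as a rough
    # country column so misspellings can be reviewed and corrected.
    users["Country"] = users["Location"].str.split(",").str[-1].str.strip()
    print(users["Country"].value_counts().head(20))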

Apart from this dataset, the team used the Google Books and Goodreads APIs to gather the following features (a small lookup sketch follows the list):

  • Book Genre
  • Page Count
  • Maturity
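
As a hedged illustration, the public Google Books volumes endpoint can be queried by ISBN; the helper below is our own sketch, and the exact fields the team pulled may differ.

    import requests

    def google_books_lookup(isbn):
        """Fetch genre, page count, and maturity rating for one ISBN from the
        public Google Books volumes endpoint."""
        resp = requests.get("https://www.googleapis.com/books/v1/volumes",
                            params={"q": "isbn:" + isbn}, timeout=10)
        items = resp.json().get("items", [])
        if not items:
            return None
        info = items[0].get("volumeInfo", {})
        return {"isbn": isbn,
                "genre": (info.get("categories") or [None])[0],
                "page_count": info.get("pageCount"),
                "maturity": info.get("maturityRating")}

    # Example: enrich a couple of ISBNs (rate limits apply for large batches).
    features = [google_books_lookup(i) for i in ["0439554934", "0316666343"]]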

III. Reader Insights from Exploratory Data Analysis

Insight 1: Global Readers Are Using Social Media to Feed Their Love for Books

To get a quick understanding of reader demographics, we created a geographical map of their locations. Book-Crossing has users from all over the world, with the majority of readers coming from the United States. There were also readers from the African continent, namely Egypt, South Africa, and Nigeria.

[Figure: Geographical map of Book-Crossing users]

Insight 2: The Young and Dissatisfied, the 30s and Happy

It is interesting to note that, on average, most of the low ratings came from Book-Crossing users between the ages of 16 and 18. On the other hand, most readers in their 30s rated their books higher on average.

It would be interesting to find out what drove these younger readers to rate books lower. That will be in another post.
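
A minimal pandas sketch of this breakdown is below, assuming the ratings and users frames loaded earlier and an explicit Book-Rating column; the column names and age bins are illustrative assumptions.

    # Average explicit rating by age band (column names and bins are assumptions).
    rated = ratings.merge(users[["User-ID", "Age"]], on="User-ID")
    rated = rated[rated["Book-Rating"] > 0]                      # keep explicit ratings only
    age_bins = pd.cut(rated["Age"], bins=[10, 18, 30, 40, 60, 100])
    print(rated.groupby(age_bins)["Book-Rating"].mean())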

[Figure: Average book rating by user age]

Insight 3: Readers Seem Happier to Escape Reality

Fiction, represented by the green points, seemed to be the most-read genre among Book-Crossing members. This needs to be taken with a grain of salt, as non-fiction titles appear to be sparse in the dataset.

[Figure: Book counts by genre, fiction vs. non-fiction]

Insight 4: Short Stories versus Long Stories Ratings

It appears more common for shorter books, with fewer than 250 pages, to be rated low. The same pattern seems visible for books with 750 pages and above. After further research, we found that "one reason it's harder for a new author to sell a 140,000 word manuscript is the size of the book. A 500+ page book is going to take up the space of almost two, 300 page books on the shelves. It's also going to cost more for the publishers to produce, so unless the author is well known, the book stores aren't going to stock that many copies of the 'door-stopper' novel as compared to the thinner novel."

[Figure: Story length guide]

[Figure: Ratings by page count and genre]

IV. Supervised Learning: Predicting Book Approval Ratings

Since the team found more value in determining which books lead to high or low satisfaction, books with no ratings were excluded from the training dataset. Below are two graphs breaking down the counts of rated and unrated book titles.

[Figure: Rated vs. unrated titles in the original data]

[Figure: Rated titles retained for training]

Model Selection: Logistic Regression and Random Forest

The team decided to use Logistic Regression and Random Forest to perform a 10-class prediction. We expected the Logistic Regression model to be highly interpretable and therefore easy to explain to B&N's management, while Random Forest seemed more robust for our dataset, since it deals better with uneven distributions and outliers and should theoretically give more accurate predictions in terms of sensitivity and specificity.

Cross-Validation: Imbalanced Classes

We saw that most of the reviews were between 8 and 10 and realized that we were faced with an imbalanced target class challenge. First, we performed a K-fold cross-validation split on the entire rated dataset.

Performing cross-validation and then fitting models on the rated dataset with 10 prediction classes resulted in low accuracy, low sensitivity, and low specificity.
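
A hedged scikit-learn sketch of this setup follows; it assumes a feature matrix X and the 1-10 rating labels y built upstream from the enriched dataset, and the hyperparameters are illustrative rather than the team's exact settings.

    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # X: engineered book/reader features, y: the 1-10 explicit rating (assumed built upstream).
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("random forest", RandomForestClassifier(n_estimators=200, random_state=42))]:
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        print(name, round(scores.mean(), 3))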

[Figure: Model performance dashboard for the 10-class prediction]

Tuning the Model and Feature Engineering

To improve the predictive power of the models, the team reframed the problem as a 3-class prediction (Low, Medium, High). With this revision, the model performed better overall, with a higher AUC.
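
One way to collapse the 1-10 ratings into three bands is shown below; the post does not state the exact cutoffs, so the thresholds here are our assumption.

    import pandas as pd

    def to_band(rating):
        # Assumed cutoffs: 1-4 Low, 5-7 Medium, 8-10 High.
        if rating <= 4:
            return "Low"
        if rating <= 7:
            return "Medium"
        return "High"

    y3 = pd.Series(y).map(to_band)
    print(y3.value_counts(normalize=True))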

[Figures: ROC curves for the 3-class model, overall and per class (High, Medium, Low)]

Resampling: Under-sampling and Over-sampling

This updated model performs much better than both random classification and the previous model. However, a third adjustment the team can make is to apply resampling methods that under-sample the majority class, in this case the High rating class, and over-sample the minority class, Low.
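
A small sketch of that resampling step using the imbalanced-learn package follows; the package choice and the even-class targets are our assumptions, not necessarily what the team will use.

    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import RandomOverSampler

    # Shrink the dominant High band, then inflate the remaining bands so all three
    # classes are even before refitting the models.
    X_u, y_u = RandomUnderSampler(sampling_strategy="majority", random_state=42).fit_resample(X, y3)
    X_bal, y_bal = RandomOverSampler(sampling_strategy="not majority", random_state=42).fit_resample(X_u, y_u)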

In a separate blog post, the team will test penalized models, such as penalized SVM and penalized LDA, which impose an additional cost for incorrect minority-class predictions, to see whether they perform better than the second model.

The team recognizes that it is essential for BookLab to offer high sensitivity in detecting lower-rated books. As part of its trust-building efforts, we want to decrease the chances of users encountering lower-rated books.

V. Unsupervised Learning: Book Recommendation 

Collaborative Filtering

The team used a distance-based similarity scoring approach, Collaborative Filtering (CF), to build BookLab, our recommendation engine. Since the Book-Crossing dataset has many zero (implicit) ratings, we replaced these ratings with average ratings when possible (for details, please review Appendix B). We then re-tested our engine with the enhanced dataset and observed that the number of ratings available to CF does impact the recommendations made by the engine.

Book Recommendation Functions

For details, please review Appendix C.

In addition to basic ETL functions for loading data and transforming data structures, we have three main types of functions (a condensed sketch follows this list):

  1. Functions that calculate a distance-based similarity score:
    • sim_euclidean: calculates the Euclidean distance similarity between user1 and user2.
    • sim_pearson: calculates the Pearson correlation coefficient for user1 and user2.
  2. Function topMatches, which finds similar users or items:
    • If you give it a prefs matrix and a user as input, it returns the top matched similar users.
    • If you give it a critics matrix and a book title as input, it returns the top matched similar books.
  3. Function getRecommendations, which finds items for users or users for items:
    • If you give it a prefs matrix and a user as input, it returns a few book recommendations.
    • If you give it a critics matrix and a book as input, it returns a few users who may want to read the book.
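
The sketch below is our condensed paraphrase of these building blocks, following the nested-dictionary prefs structure described in Appendix C; the team's actual notebooks follow Marcel Caraciolo's implementation, which may differ in details.

    from math import sqrt

    def sim_euclidean(prefs, p1, p2):
        """Similarity in [0, 1] based on Euclidean distance over commonly rated books."""
        shared = [b for b in prefs[p1] if b in prefs[p2]]
        if not shared:
            return 0.0
        sum_sq = sum((prefs[p1][b] - prefs[p2][b]) ** 2 for b in shared)
        return 1.0 / (1.0 + sqrt(sum_sq))

    def sim_pearson(prefs, p1, p2):
        """Pearson correlation over commonly rated books (0 when undefined)."""
        shared = [b for b in prefs[p1] if b in prefs[p2]]
        n = len(shared)
        if n == 0:
            return 0.0
        s1 = sum(prefs[p1][b] for b in shared)
        s2 = sum(prefs[p2][b] for b in shared)
        s1_sq = sum(prefs[p1][b] ** 2 for b in shared)
        s2_sq = sum(prefs[p2][b] ** 2 for b in shared)
        p_sum = sum(prefs[p1][b] * prefs[p2][b] for b in shared)
        num = p_sum - (s1 * s2 / n)
        den = sqrt((s1_sq - s1 ** 2 / n) * (s2_sq - s2 ** 2 / n))
        return num / den if den else 0.0

    def topMatches(prefs, person, n=5, similarity=sim_pearson):
        """Rank every other row (user or item) by similarity to `person`, keep the top n."""
        scores = [(similarity(prefs, person, other), other)
                  for other in prefs if other != person]
        return sorted(scores, reverse=True)[:n]

    def getRecommendations(prefs, person, similarity=sim_pearson):
        """Score unseen books by a similarity-weighted average of other users' ratings."""
        totals, sim_sums = {}, {}
        for other in prefs:
            if other == person:
                continue
            sim = similarity(prefs, person, other)
            if sim <= 0:
                continue
            for book, rating in prefs[other].items():
                if book not in prefs[person]:             # only books the user has not rated
                    totals[book] = totals.get(book, 0) + rating * sim
                    sim_sums[book] = sim_sums.get(book, 0) + sim
        rankings = [(total / sim_sums[book], book) for book, total in totals.items()]
        return sorted(rankings, reverse=True)

Passing a critics matrix (items as rows) into the same two functions yields similar-book and interested-reader lookups instead of similar users, exactly as described above.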

Experiment & Observation

For details, please review Appendix B.

We tested the functions topMatches and getRecommendations with:

  • Euclidean distance and Pearson correlation.
  • The original dataset (fewer ratings) and our enhanced dataset (more ratings).

We cannot confirm whether any recommendations made are valid, since:

  • The dataset is not ideally clean.
  • The dataset does not have enough information about users or books.
  • Marcel Caraciolo's approach does not make use of users' profiles.

But we do observe that the number of ratings available to CF does impact the recommendations made by the engine.

More Sophisticated Model

Although Marcel Caraciolo's collaborative filtering algorithm is simple, it does provide us with many of the basic functionalities needed to build a book recommendation engine. We believe we can build a more sophisticated engine by combining the important features we identified in the machine learning part of this project.

For example, "category" and "publisher" are two important features we identified. We can expand our data with "category" and "publisher".

Let's say we have one million user-item-rating records available.

Case-1

  • Let's say reader James Bond, with User-ID <007>, uses our book recommendation engine:
    • Action-1: We first get all the records from our expanded data for [User-ID=007].
      • Action-1a: We discover the most frequent category James bought from us is "Thriller & Suspense".
      • Action-1b: We discover the most frequent publisher James bought from us is "Harper Paperbacks".
    • Action-2: We then get all the records from our expanded data for [Category=Thriller & Suspense] and [Publisher=Harper Paperbacks].
      • Action-2a: We feed our engine the User-ID, ISBN, and Rating from this subset of the data.
      • Action-2b: We then invoke "getRecommendations(prefs, '007', sim_pearson)[0:5]" to give James our recommendation of five books.
    • Why?
      • Action-1 and its sub-steps discover James' taste / preference.
      • Action-2 narrows the ratings down to those with a similar taste / preference. Thus, our engine is fed far fewer ratings, and the response time of Action-2b will be much faster. We don't want to keep James waiting!
Case-2

  • Let's say Harper Paperbacks just found a never-published thriller-and-suspense book by Ian Fleming called "For the Data Scientists Only", and we start an email book campaign to 200 potential readers. Since response time is not as critical as in Case-1:
    • Action-1: We identify that Ian's "For Your Eyes Only" has the same category and publisher as the new book.
    • Action-2: We feed our engine all one million ratings.
    • Action-3: We enhance our distance calculation functions to also consider Category and Publisher.
    • Action-4:
      • Action-4a: We invoke the enhanced "getRecommendations(prefs, 'For your eyes only', sim_euclidean, "rating")[0:400]" to get 400 User-IDs.
      • Action-4b: We invoke the enhanced "getRecommendations(prefs, 'For your eyes only', sim_euclidean, "category")[0:400]" to get 400 User-IDs.
      • Action-4c: We invoke the enhanced "getRecommendations(prefs, 'For your eyes only', sim_euclidean, "publisher")[0:400]" to get 400 User-IDs.
      • Action-4d: Hopefully we will find 200 User-IDs common to all three result sets above.

Basically,

  • For Case-1, we reduce our data using the user's profile in order to provide reasonable recommendations as quickly as possible.
  • For Case-2, we expand our feature space and hope to cover more potential readers.

Moreover,

  • For Case-2, the reason we pick an existing book similar to the new book in Action-1 is to deal with the cold-start problem.
  • Let's say James' friend M, who has never bought from us, wants a book recommendation. How do we handle the cold-start problem of a new user in Case-1? In this case, we first ask her about her taste (category, publisher, ..), then follow the same Action-2 to provide her our recommendations.
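
A small sketch of the Case-1 flow is below; it reuses getRecommendations and sim_pearson from the sketch above and assumes a hypothetical expanded frame ratings_ext with User-ID, ISBN, Book-Rating, Category, and Publisher columns (names are ours, not the team's).

    # Case-1: narrow the data to the user's favorite category and publisher first.
    user = "007"
    user_rows = ratings_ext[ratings_ext["User-ID"] == user]
    fav_category = user_rows["Category"].mode()[0]      # e.g. "Thriller & Suspense"
    fav_publisher = user_rows["Publisher"].mode()[0]    # e.g. "Harper Paperbacks"

    subset = ratings_ext[(ratings_ext["Category"] == fav_category) &
                         (ratings_ext["Publisher"] == fav_publisher)]

    # Build the nested prefs dictionary from this slice only, then recommend five books.
    prefs = {}
    for _, row in subset.iterrows():
        prefs.setdefault(row["User-ID"], {})[row["ISBN"]] = row["Book-Rating"]

    print(getRecommendations(prefs, user, sim_pearson)[0:5])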

BookLab Interface Using GraphLab and Python

We wanted a user interface for both clients and customers to experience BookLab's recommendation system. This initial version uses Python to perform collaborative filtering and shows the results through a GraphLab user interface. GraphLab's syntax and objects are quite similar to Python's, which made it fast for the team to learn and to design an interface quickly. For example, GraphLab has its own versions of data frames and arrays, called SFrames and SArrays.

[Figure: GraphLab model parameters]

This initial version allows users to get a book recommendation using an ISBN or book title. The idea is that users can type in a favorite book (by its ISBN) and BookLab will provide them with five new book recommendations.
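
For orientation, here is a minimal sketch of how a GraphLab Create recommender could be wired up over the imputed ratings; the column names, the item-similarity model choice, and the example ISBN/user are our assumptions rather than the team's exact setup (the successor library, turicreate, exposes the same calls).

    import graphlab as gl

    # Load the imputed ratings into an SFrame, GraphLab's counterpart to a DataFrame
    # (column names User-ID / ISBN / Book-Rating are assumed).
    sf = gl.SFrame.read_csv("Good.Ratings.csv")

    # Item-similarity recommender over user-item-rating triples.
    model = gl.recommender.item_similarity_recommender.create(
        sf, user_id="User-ID", item_id="ISBN", target="Book-Rating")

    # Five books similar to a favorite ISBN, and five personalized picks for one user.
    print(model.get_similar_items(items=["0439554934"], k=5))
    print(model.recommend(users=[276725], k=5))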

[Figures: GraphLab ISBN lookup and top-10 rating predictions]

[Figure: GraphLab interface demo]

[Figure: Recommendation results using book category, publisher, and page count]

[Figure: Recommendation results using all features]

[Figure: Recommendation results using book category, publisher, page count, and predicted book rating]

[Figure: Recommendation results using all features plus predicted book ratings]

VI. Business Application

BookLab can be implemented by Barnes & Noble using their own proprietary dataset. We expect that with the rich dataset they have from their B&N Members and over 6 million books, BookLab would render more accurate book classification and recommendation results than with the more limited dataset the team used for this capstone.

The recommendation algorithm can also be used for more personalized email marketing campaigns to B&N members, in which monthly topMatches book alerts are sent instead of generic email ads.

[Figure: Mock-up of a targeted email campaign]

Future Work

  • Apply SMOTE data balancing and penalized models to check whether better ROC, sensitivity, and specificity can be achieved
  • Add more book features such as pricing via scraping to understand price sensitivity of customers
  • Identify interesting customer clusters after the addition of more features

Sources & Credits

  • Book Crossing Dataset
  • Barnes & Noble website
  • Amazon Website
  • Collective Intelligence
  • Quora
  • Collaborative Filtering: Implementation with Python

Appendix

Appendix A - Experiment & Observation

  • CollaborativeFiltering.ipynb

Appendix B - Imputation

Step-1-Baseline

  • BRCF.1.Baseline.ipynb is used to check Caraciolo's implementation against the Book-Crossing dataset.
  • Our baseline model produced the same results as shown in Caraciolo's article.
  • However, we observed that many exceptions occurred during data loading and that Caraciolo only utilized non-zero ratings:
    • There are 1,149,781 ratings in BX-Book-Ratings.csv.
    • When loading it into Caraciolo's implementation, there is 1 value exception and there are 49,818 key exceptions.
    • Only 383,853 non-zero ratings are used to build the dictionary prefs (for the user-based filter), which has 77,805 entries.
  • We believe we can treat zero ratings as missing values and impute them with the average rating.
    • Let's say 100 users bought Book-A, but only 10 users provided ratings.
    • We can compute the average rating for Book-A based on those 10 ratings and then assign it to the 90 zero ratings.
    • Many Amazon customers buy books without leaving feedback, but they see the average rating from customers who did rate. Thus, one can argue that they implicitly agree with the average rating posted on Amazon.

The following steps are our approach to this imputation.

Step-2-CleanerData

  • BRCF.2.CleanerData.ipynb is used to capture what Caraciolo's implementation used from BX-Books and BX-Book-Ratings and save it in true comma-separated files: MC.Books.csv and MC.Ratings.csv.
  • We also need to eliminate one line from MC.Ratings.csv with an editor: the line "130499,,.0330486187,6", as it has more than 3 fields.

Step-3-VerifyData

  • BRCF.3.VerifyData.ipynb is used to verify MC.Books.csv and MC.Ratings.csv.
  • MC.Books.csv and MC.Ratings.csv produced the same results as shown in Caraciolo's article.

Step-4-ImputeImplicit

  • BRCF.4.ImputeImplicit.ipynb is used to create Good.Ratings.csv, which replaces zero ratings with average ratings when available (a minimal imputation sketch follows this list).
    • We read 1,149,781 records from BX-Book-Ratings.csv.
    • 433,671 have non-zero ratings; thus, no imputation is needed.
    • We use average ratings for 494,024 records.
    • We can only keep a zero rating for 222,085 records, as no buyers of those books provided a rating.
    • We wrote 1,149,780 records to Good.Ratings.csv.
    • Thus, we more than doubled the number of non-zero ratings available for building CF.
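
A minimal pandas version of this step might look like the sketch below, assuming the ratings frame loaded earlier and a Book-Rating column; it illustrates the idea rather than the notebook's exact code.

    # Replace zero (implicit) ratings with the book's average explicit rating where one
    # exists; books never rated explicitly keep the zero.
    explicit = ratings[ratings["Book-Rating"] > 0]
    book_avg = explicit.groupby("ISBN")["Book-Rating"].mean()

    def impute(row):
        if row["Book-Rating"] > 0:
            return row["Book-Rating"]
        return book_avg.get(row["ISBN"], 0)

    good = ratings.copy()
    good["Book-Rating"] = good.apply(impute, axis=1)
    good.to_csv("Good.Ratings.csv", index=False)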

Step-5-DataImpact

  • BRCF.5.DataImpact.ipynb is used to re-examine our baseline model using the roughly doubled set of ratings from Good.Ratings.csv.
  • As expected, more ratings changed the recommendations.

Appendix C - Data Structure for Collaborative Filtering

We used two types of 2D matrices, implemented as nested Python dictionaries, to capture the user-item-rating information (a short construction sketch follows this section).

1. prefs

To provide recommendations for a user, the 2D prefs matrix has users as rows and items as columns; i.e., a rating is stored as prefs[user][item]. We provide two prefs matrices:

  1. prefsLess, which is based on the non-zero ratings from the original dataset.
  2. prefsMore, which includes prefsLess plus the additional non-zero ratings obtained by imputing average ratings.

2. critics

To provide recommendations for an item, the 2D critics matrix has items as rows and users as columns; i.e., a rating is stored as critics[item][user]. We provide two critics matrices:

  1. criticsLess, which is just a re-arrangement of prefsLess.
  2. criticsMore, which is just a re-arrangement of prefsMore.
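
For concreteness, the two structures can be built from a ratings table roughly as follows; the column names are assumed, and the "More" variants would use the imputed Good.Ratings.csv instead of the original ratings.

    # Nested dictionaries: prefs[user][item] = rating, critics[item][user] = rating.
    prefs = {}
    for _, row in ratings[ratings["Book-Rating"] > 0].iterrows():
        prefs.setdefault(row["User-ID"], {})[row["ISBN"]] = row["Book-Rating"]

    # critics is simply the transpose of prefs: items as rows, users as columns.
    critics = {}
    for user, items in prefs.items():
        for isbn, rating in items.items():
            critics.setdefault(isbn, {})[user] = rating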

About Authors

Chris Valle

Chris is a Digital Strategy Manager and Marketer who, for 10 years, has been combining her data-driven insights and customer-centric marketing strategies to grow her clients' business. Her forte is monetizing digital and mobile channels to drive international...

Jhonasttan Regalado

Jhonasttan Regalado is an established leader and technologist with domain expertise in Global Markets Trading and a Masters of Science in Management of Technology from the NYU Tandon School of Engineering, Polytechnic Institute. With practical knowledge and a...

Conred

As a software engineer, scrum master, and project management professional, Conred Wang believes in "Worry less, smile more. Don't regret, just learn and grow.", which motivated him to study at NYCDSA and become a data scientist. His exposure...

