NYC Data Science Academy| Blog
Bootcamps
Lifetime Job Support Available Financing Available
Bootcamps
Data Science with Machine Learning Flagship ๐Ÿ† Data Analytics Bootcamp Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lesson
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories Testimonials Alumni Directory Alumni Exclusive Study Program
Courses
View Bundled Courses
Financing Available
Bootcamp Prep Popular ๐Ÿ”ฅ Data Science Mastery Data Science Launchpad with Python View AI Courses Generative AI for Everyone New ๐ŸŽ‰ Generative AI for Finance New ๐ŸŽ‰ Generative AI for Marketing New ๐ŸŽ‰
Bundle Up
Learn More and Save More
Combination of data science courses.
View Data Science Courses
Beginner
Introductory Python
Intermediate
Data Science Python: Data Analysis and Visualization Popular ๐Ÿ”ฅ Data Science R: Data Analysis and Visualization
Advanced
Data Science Python: Machine Learning Popular ๐Ÿ”ฅ Data Science R: Machine Learning Designing and Implementing Production MLOps New ๐ŸŽ‰ Natural Language Processing for Production (NLP) New ๐ŸŽ‰
Find Inspiration
Get Course Recommendation Must Try ๐Ÿ’Ž An Ultimate Guide to Become a Data Scientist
For Companies
For Companies
Corporate Offerings Hiring Partners Candidate Portfolio Hire Our Graduates
Students Work
Students Work
All Posts Capstone Data Visualization Machine Learning Python Projects R Projects
Tutorials
About
About
About Us Accreditation Contact Us Join Us FAQ Webinars Subscription An Ultimate Guide to
Become a Data Scientist
    Login
NYC Data Science Acedemy
Bootcamps
Courses
Students Work
About
Bootcamps
Bootcamps
Data Science with Machine Learning Flagship
Data Analytics Bootcamp
Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lessons
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook
Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories
Testimonials
Alumni Directory
Alumni Exclusive Study Program
Courses
Bundles
financing available
View All Bundles
Bootcamp Prep
Data Science Mastery
Data Science Launchpad with Python NEW!
View AI Courses
Generative AI for Everyone
Generative AI for Finance
Generative AI for Marketing
View Data Science Courses
View All Professional Development Courses
Beginner
Introductory Python
Intermediate
Python: Data Analysis and Visualization
R: Data Analysis and Visualization
Advanced
Python: Machine Learning
R: Machine Learning
Designing and Implementing Production MLOps
Natural Language Processing for Production (NLP)
For Companies
Corporate Offerings
Hiring Partners
Candidate Portfolio
Hire Our Graduates
Students Work
All Posts
Capstone
Data Visualization
Machine Learning
Python Projects
R Projects
About
Accreditation
About Us
Contact Us
Join Us
FAQ
Webinars
Subscription
An Ultimate Guide to Become a Data Scientist
Tutorials
Data Analytics
  • Learn Pandas
  • Learn NumPy
  • Learn SciPy
  • Learn Matplotlib
Machine Learning
  • Boosting
  • Random Forest
  • Linear Regression
  • Decision Tree
  • PCA
Interview by Companies
  • JPMC
  • Google
  • Facebook
Artificial Intelligence
  • Learn Generative AI
  • Learn ChatGPT-3.5
  • Learn ChatGPT-4
  • Learn Google Bard
Coding
  • Learn Python
  • Learn SQL
  • Learn MySQL
  • Learn NoSQL
  • Learn PySpark
  • Learn PyTorch
Interview Questions
  • Python Hard
  • R Easy
  • R Hard
  • SQL Easy
  • SQL Hard
  • Python Easy
Data Science Blog > R > HarvardX / MITx Online Courses - Year 1

HarvardX / MITx Online Courses - Year 1

John Montroy
Posted on Nov 13, 2015

Contributed by John Montroy. John took NYC Data Science Academy 12 week full time Data Science Bootcamp program between Sept 23 to Dec 18, 2015. The post was based on his first class project(due at 2nd week of the program).

(Course) Introduction

Online courses are often touted as the savior of higher education, serving as an accessible and affordable alternative to four-year colleges. Proponents point to the accessibility and affordability of online courses; skeptics are quick to note poor completion rates and scant evidence of any clear job market advantage.

Using a student-level dataset provided by HarvardX/MITx, I explore the skeptic's former point in the below analysis. Specifically, I seek to discuss:

  • How do completion rates fair among various sample groups?
  • Are there any groups with higher completion rates than other groups, on average?
  • How do subgroups perform grade-wise, comparatively?

To summarize, we can simply ask: how can we better engage our students through online courses?

In this analysis, we tackle this question using R and ggplot2.

The Data

The dataset analyzed was provided  in an open release by HarvardX and MITx, publicly available here. The data contain student-level observations per course - a student who enrolled in multiple courses would have multiple records in the dataset. The data release included all de-identified student records as well as an accompanying codebook detailing column interpretation. Some high-level statistics:

  • All courses launched on edX for the 2012 - 2013 academic year
  • 17 courses across 3 semesters (eg. "The Ancient Greek Hero", "Intro to Solid State Chemistry")
  • 1,055,562 total registrants
  • 597,692 unique users
  • 43,196 certificates of completion issued

One glance at these numbers outlines the issue nicely - only 7.2% of users received a certificate of completion (without factoring in multiple certificates per user, which would only reduce this percentage!). With this as our touchpoint, we can dive into analysis using R and ggplot2.

Wrangling  - dynamically!

R makes it easy to wrangle your data appropriately, and with a little tweaking, can produce more functionally-oriented code to increase reproducibility. I've included a few highlights of re-usable/dynamic code developed for this analysis - you can find the rest of the code (with comments) at the bottom under "R Code Walks: Wrangling - dynamically!".

After basic cleaning/wrangling, we would like to start aggregating user counts. Specifically, we are interested in percentages - what percentage of students got an A? What percentage stayed engaged with the course for more than a month? What percentage of students with a college degree failed to complete a given course?

In order to generate these percentages from individual records, we need to develop a re-usable function to dynamically calculate percentages per subgroup. The function to perform this makes use of dplyr's Standard Evaluation (SE) functionality to pass a column name into a function. Here, we've focused in on faceting based on the "active_length" variable, but this can be changed easily:

getPercs <- function(df, colname, colpiv, colvals) {
 if(!missing(colvals)) { 
   df.sub <- df[df[[colname]] %in% colvals,]
 } else {
   df.sub <- df
 } # this checks for optional colvals param
 
 df.sub <- df.sub[!is.na(df[[colname]]),] # remove NA
 
 df.sub.groups <- df.sub %>%
   group_by_(interp(~x, x = as.name(colname)), interp(~y, y = as.name(colpiv))) %>% 
   summarise(count = n()) # get group totals

 df.sub.totals <- df.sub %>%
   group_by_(interp(~x, x = as.name(colname))) %>% 
   summarise(totcount = n()) # get totals
 
 df.perc <- df.sub.groups %>% inner_join(df.sub.totals, by = c(colname))
 df.perc <- df.perc %>% mutate(perc = count / totcount)
 
 df.perc$count <- NULL; df.perc$totcount <- NULL
 
 return(df.perc[complete.cases(df.perc),])
}

With this function in hand, we can quickly generate percentages of active lengths per subgroup, and use this data for plotting (as will become clearer momentarily). Our dataset contains many potentially interesting facets:

Facet-able features of edX dataset

Let's try a bunch, and celebrate re-usable code!

getPercs(edx.filt, 'gender', 'active_length', c('m','f'))
getPercs(edx.filt, 'age_cat', 'active_length') # 18 - 25, 25 - 32, etc.
getPercs(edx.filt, 'LoE_DI', 'active_length') # Bachelor's, Master's
getPercs(edx.filt, 'final_cc_cname_DI', 'active_length') # country
getPercs(edx.filt, 'school', 'active_length') # harvard / mit
getPercs(edx.filt, 'course', 'active_length')
getPercs(edx.filt, 'CourseCat', 'active_length') # humanities / sciences

Course Participation - The Bad, the Bad, and the still Bad

Now that we have percentages, let's visualize how course participation behaves over time. We do this through the "active_length" variable created above, bearing in mind that some people register for the course before the actual start date. We choose to visualize the subset of users who had a positive, non-zero active length. This tells us nothing about how frequently users interacted with their course, but it does allow us to see general trends in participation.

We create a function to plot attrition rate over "time", where time here is a user's active length. Note the use of "aes_strings" for dynamic columns.

createAttritionPlot <- function(df, colorcol, title, legendtitle) {
  ggplot(df, aes_string(x = 'active_length', y = 'perc', color = colorcol)) +
  geom_line() +
  labs(
    title = paste0("User Engagement Attrition Rate (by ", title, ")"), 
    y = "Percentage of Total Users", x = "Days Active") +
  theme_bw() +
  scale_colour_discrete(name = legendtitle) 
}

With this, we begin plotting. With the code set up as such, it's as easy as:

grid.arrange(
 createAttritionPlot(
   getPercs(edx.filt, 'gender', 'active_length', c('m','f')), 
   'gender', 'gender', 'Gender'),
 createAttritionPlot(
   getPercs(edx.filt, 'CourseCat', 'active_length'), 
   'CourseCat', 'course type', 'Course Type'),
 createAttritionPlot(
   getPercs(edx.filt, 'age_cat', 'active_length'), 
   'age_cat', 'age', 'Age Category'),
 createAttritionPlot(
   getPercs(edx.filt, 'LoE_DI', 'active_length'), 
   'LoE_DI', 'level of education', 'Level of Education'),
 nrow=2
)

(Note: grid.arrange comes courtesy of the gridExtra package)

And finally, this code produces:

Screen Shot 2015-11-12 at 11.50.17 PM

Wait a sec! That's not too encouraging and/or helpful. Looks like everyone is pretty much equally bad at committing to online courses - or, more importantly, online courses fail to capture students regardless of background.

Let's try faceting out one or two particular variable to look for trends:

Age category?

Screen Shot 2015-11-13 at 2.02.08 PM

Course?

Screen Shot 2015-11-13 at 2.05.36 PM

Neither of these facets are particular revelatory, other than reinforcing what we've already seen. At the beginning of course, hopes and ambitions are high, but they fall off very quickly. There is an odd spike at around day 25 in Electricity and Magnetism, which motivates the idea of targeted campaigns.

Besides verifying our fears, what can we do with these visualizations in hand? The best course is targeted campaigns. Set up A-B tests with different styles of campaigns (email, incentives, etc.) and carefully observe any "lift" in the above graphs. We can then find the best way to increase course retention through targeted campaigns.

Course Performance - Find your target

We now turn to a different question: how do various subgroups perform in terms of final grade? We've already seen that retention rates are universally abysmal, regardless of who you are. Perhaps performance comparisons will yield more useful differences.

With our data already clean, plotting becomes relatively straight-forward. We start with the plotting results gridded together:

fourplot

Let's break these graphs down, starting with the top-left.

barplot

This first plot shows certificates issued by percentage per gender and level of education. Worth noting is the ratio of male vs female certificates per educational level. Lower educational levels tilt towards higher completion rates for men, but flips at the college/master's level towards women. At the doctorate level, completion rates are about equal.

point_grade

The second plot is a plot of number of interaction events per user versus the days that user was active. The scatterplot is also color-scaled according to final grade, and log-scaled on the x-axis. This yields a rather pretty trend showing what we'd expect: the longer a user stays engaged and the more they interact, the higher their final grade tends to be.

More interesting here, however, is the subgroup stretching from ~ 3 - 7 on the x-axis. This starkly-defined cluster can be interepreted as very high performers who nonetheless did not interact with the course material very much. These over-achievers are ripe candidates to be targeted! Flatter them, incentivize them, get them involved with the course, as they clearly understand the material!

pointbar_grade

This third plot is similar to the second, but instead plots number of chapters completed against number of forum posts (which is basically a discrete variable, so we almost end up with a bar chart).  We see again that more interaction corresponds with a higher grade, and can trace the rise in grade by tracking the color gradient change through a horizontal line drawn most anywhere.

violin_grade

The final plot is a simple violin plot of of final grades, by education level and gender. We see once again that women generally out-perform men, but the distributions equalize at the highest education levels. This can be seen by the comparative flatness of each violin plot (or plunger, if you will) at the bottom, and the thickness at the top ends.

Conclusions and Future Work

What can we conclude here? As feared, completion rates are abysmal in online courses. A method of improving retention is targeted campaigns, whose effects we could hopefully observe in visualizations like the ones above.

Course performance is a bit more interesting. On the whole, women seem to outperform men except at the very highest level of educational attainment. Further, there are distinct clusters of users visible simply through observation of scatter plots. One of these groups is a set of high-performing but unengaged students - these students need to be identified and engaged, as they could be invaluable for overall success!

On the technical side - well-designed functions take a bit of time, but save much more on the other side. I've actually used the percentage function shown here in other projects, so it's proven worth the investment.

To conclude - there's plenty more to be done here. We can actually perform some data science on this data -- clustering algorithms may reveal even more subgroups of users that we could selectively and uniquely target in email campaigns. Classification models can help us predict the likelihood of completion for users based on various statistics a the course progresses, and regression can be done afterwards to understand the relationship between user interaction and final grade. A fuller dataset would also be more interesting in such an undertaking, of course.

Appendix: R Code Walks

This section is dedicated to all code not already covered above. Some might call this the less interesting code, but who knows?

Wrangling - dynamically!

I begin cleaning by importing the data with appropriate column classes and NA values accounted for. Reading in a snippet of your data and then adjusting your import is crucial for saving time later.

fileloc = '/path/to/dataset.csv'

# get classes, NAs from sample, adjust/account for
edx.head <- read.csv(fileloc, nrows = 1000)
classes <- sapply(edx.head, class)

classes[c("LoE_DI","YoB","gender","final_cc_cname_DI","userid_DI","course_id", "start_time_DI","last_event_DI")] <- "character" 

# re-read with colClasses
edx <- read.csv(fileloc, colClasses = classes, na.string = c("", " ", NA))

The provided dataset documentation reveals which columns are of interest - we can briefly dispense with useless columns and known bad data.

edx$roles <- NULL
edx.incomplete <- filter(edx, !is.na(incomplete_flag))
edx <- filter(edx, is.na(incomplete_flag))
edx$incomplete_flag = NULL; edx.incomplete$incomplete_flag = NULL;

The next few lines create: three new columns based on school/course/semester, a new age category variables, and a new "Active Length" variable. This variable is the difference between the course start time and the last time the user interacted with said course. We also convert the date columns [start_time_DI, last_event_DI] to R's Date class. Lastly, we create a mapping of course codes to course type ['Humanities,'Sciences'] and join that up to create new columns per record. These course category labels are obviously subjective - feel free to yell at me if you disagree.

edx <- edx %>% 
 mutate(start_time_DI = as.Date(start_time_DI, format = "%Y-%m-%d")) %>%
 mutate(., last_event_DI = as.Date(last_event_DI, format = "%Y-%m-%d"))


edx <- edx %>%
 separate(course_id, into = c("school", "course", "semester"), sep = "/", remove=TRUE)

edx <- edx %>%
 mutate(active_length = as.integer(last_event_DI - start_time_DI))

# create age category
edx <- edx %>%
 mutate(age_cat = 
  ifelse(!is.na(edx$YoB), 
   as.character(cut(as.integer(Sys.Date() 
    - as.Date(as.character(edx$YoB)[!is.na(edx$YoB)], 
    format = "%Y")) / 365, 
    breaks = c(0,18,25,35,45,65,Inf),
    labels = c("0 - 18", "18 - 25", "25 - 35", "35 - 45", "45 - 65", "65+")
  )), NA)
 )

# create course type map, no getting around this
coursetype_map <- data.frame(rbind(
 c('CB22x','The Ancient Greek Hero', 'Humanities'),
 c('CS50x','Introduction to Computer Science 1', 'Sciences'),
 c('ER22x','Justice','Humanities'),
 c('PH207x','Health in Numbers: Quantitative Methods', 'Sciences'),
 c('PH278x','Human Health and Global Environmental Change','Humanities'),
 c('14.73x','The Challenges of Global Poverty','Humanities'),
 c('2.01x','Elements of Structures','Sciences'),
 c('3.091x','Introduction to Solid State Chemistry','Sciences'),
 c('6.002x','Circuits and Electronics','Sciences'),
 c('6.00x','Introduction to Computer Science and Programming','Sciences'),
 c('7.00x','Introduction to Biology โ€“ The Secret of Life','Sciences'),
 c('8.02x','Electricity and Magnetism','Sciences'),
 c('8M.ReV','Mechanics Review','Sciences')
))

# rename cols, convert to strings, join it up
coursetype_map <- coursetype_map %>% rename(CourseCode = X1, CourseName = X2,CourseCat = X3)
coursetype_map <- data.frame(sapply(coursetype_map, as.character), stringsAsFactors = FALSE)
edx <- edx %>% inner_join(coursetype_map, by = c("course" = "CourseCode"))

Course Performance - Find your Target

The below code generates the four plots (bar plot, scatter1, scatter2, violin plot) discussed in the corresponding section above:

# number of completions
gg.bar <- ggplot(getPercs(subset(edx, certified == 1), 'gender', 'LoE_DI'), aes(x = LoE_DI, y = perc)) + 
 geom_bar(stat = "identity", position = "dodge", aes(fill = gender)) +
 labs(title = "Certifications Issued (by education and gender)", 
 y = "Number Certified",x = "Level of Education") +
 theme_economist()

# interactivity with color scaling for final grade
gg.point <- ggplot(subset(edx, grade > 0 & grade <= 1), aes(x = log(nevents), y = ndays_act)) +
 geom_point(aes(color = grade)) + 
 scale_color_gradient() +
 labs(title = "User Interactivity (color-scaled by final grade)", 
 y = "Number of Days Active ",x = "Number of interaction events (log)") +
 theme_economist() + scale_fill_economist()


# interactivity re: chapters and forum posts -- why such convergences of chapters read?
gg.jitter <- ggplot(subset(edx, grade > 0 & grade <= 1), aes(x = nforum_posts, y = nchapters)) +
 geom_jitter(aes(color = grade)) + 
 scale_color_gradient() +
 labs(title = "User Interactivity (color-scaled by final grade)", 
 y = "Number of chapters completed ", x = "Number of forum posts") +
 theme_economist() + scale_fill_economist()


# number of completions
gg.bar <- ggplot(getPercs(subset(edx, certified == 1), 'gender', 'LoE_DI'), aes(x = LoE_DI, y = perc)) + 
 geom_bar(stat = "identity", position = "dodge", aes(fill = gender)) +
 labs(title = "Certifications Issued (by education and gender)", 
 y = "Number Certified",x = "Level of Education") +
 theme_economist()

# interactivity with color scaling for final grade
gg.point <- ggplot(subset(edx, grade > 0 & grade <= 1), aes(x = log(nevents), y = ndays_act)) +
 geom_point(aes(color = grade)) + 
 scale_color_gradient() +
 labs(title = "User Interactivity (color-scaled by final grade)", 
 y = "Number of Days Active ",x = "Number of interaction events (log)") +
 theme_economist() + scale_fill_economist()


# interactivity re: chapters and forum posts
gg.jitter <- ggplot(subset(edx, grade > 0 & grade <= 1), aes(x = nforum_posts, y = nchapters)) +
 geom_jitter(aes(color = grade)) + 
 scale_color_gradient() +
 labs(title = "User Interactivity (color-scaled by final grade)", 
 y = "Number of chapters completed ", x = "Number of forum posts") +
 theme_economist() + scale_fill_economist()


# violin of grade by education
gg.violin <- ggplot(subset(edx, !is.na(LoE_DI) & grade > 0 & grade <= 1 & gender %in% c("f","m")), 
 aes(x = factor(LoE_DI), y = grade, fill = gender)) +
 geom_violin() +
 labs(title = "Grade (by education, gender)", 
 y = "Grade", x = "Level of Education") +
 theme_economist() ##+ scale_fill_economist()

About Author

John Montroy

John Montroy is a graduate of Middlebury College with a B.A. in Physics. After a summer of particle physics at CERN with the Harvard ATLAS team, he began his career as a data analyst in the auto industry....
View all posts by John Montroy >

Leave a Comment

Cancel reply

You must be logged in to post a comment.

Google September 29, 2020
Google The info mentioned inside the post are a number of the top offered.
AllanRKolker November 15, 2016
I had been very pleased to uncover this web site. I needed to thanks for your personal time exclusively for this fantastic read!! I definitely liked every small amount of it and that i have you book-marked to see new stuff on your own site.
FidelaDTelch August 27, 2016
What's up, constantly i used to check web site posts here in the early hours in the morning, for the reason that i like to find out more and more.
MerlinMKaren August 25, 2016
This part of writing is actually a fastidious one it helps new web people, who definitely are wishing for blogging.
fifa 17 coins August 15, 2016
I like this website - its so usefull and helpfull
RicoSBudzyna August 9, 2016
I think this can be one of many most vital information to me. And i am glad reading your article. But desire to remark on few general things, The internet site style is perfect, the articles is very nice : D. Good job, cheers
RetaGGoulart August 9, 2016
When someone writes an article he/she maintains the idea of a user within his/her brain that the way a user can be familiar with it. Thus that's why this paragraph is perfect. Thanks!
JulioZMasudi July 13, 2016
Hi there, its fastidious paragraph concerning media print, we all be aware of media is a fantastic source of facts.
Chiropractor Liverpool NY January 11, 2016
Are all the reviews from the same approximate time. Chiropractors offer to help you lead a pain free life for certain types of pain. In many cases, people find themselves unable to maintain their normal work schedule due to the pain that this condition causes.
Gregory Gandenberger December 4, 2015
I wonder why exactly low completion rates are supposed to be a problem, and for whom. I often sign up for five or more courses at a time, fully intending only to complete one of them but wanting to explore them a bit before I decide which one. My 20% completion rate is not a sign that I am not engaged with or benefiting from online courses; it is just a sign that I am shopping around. How many of the early dropouts are like me? Attempts to increase my engagement might influence me to choose one course over another but wouldn't really benefit me.

View Posts by Categories

All Posts 2399 posts
AI 7 posts
AI Agent 2 posts
AI-based hotel recommendation 1 posts
AIForGood 1 posts
Alumni 60 posts
Animated Maps 1 posts
APIs 41 posts
Artificial Intelligence 2 posts
Artificial Intelligence 2 posts
AWS 13 posts
Banking 1 posts
Big Data 50 posts
Branch Analysis 1 posts
Capstone 206 posts
Career Education 7 posts
CLIP 1 posts
Community 72 posts
Congestion Zone 1 posts
Content Recommendation 1 posts
Cosine SImilarity 1 posts
Data Analysis 5 posts
Data Engineering 1 posts
Data Engineering 3 posts
Data Science 7 posts
Data Science News and Sharing 73 posts
Data Visualization 324 posts
Events 5 posts
Featured 37 posts
Function calling 1 posts
FutureTech 1 posts
Generative AI 5 posts
Hadoop 13 posts
Image Classification 1 posts
Innovation 2 posts
Kmeans Cluster 1 posts
LLM 6 posts
Machine Learning 364 posts
Marketing 1 posts
Meetup 144 posts
MLOPs 1 posts
Model Deployment 1 posts
Nagamas69 1 posts
NLP 1 posts
OpenAI 5 posts
OpenNYC Data 1 posts
pySpark 1 posts
Python 16 posts
Python 458 posts
Python data analysis 4 posts
Python Shiny 2 posts
R 404 posts
R Data Analysis 1 posts
R Shiny 560 posts
R Visualization 445 posts
RAG 1 posts
RoBERTa 1 posts
semantic rearch 2 posts
Spark 17 posts
SQL 1 posts
Streamlit 2 posts
Student Works 1687 posts
Tableau 12 posts
TensorFlow 3 posts
Traffic 1 posts
User Preference Modeling 1 posts
Vector database 2 posts
Web Scraping 483 posts
wukong138 1 posts

Our Recent Popular Posts

AI 4 AI: ChatGPT Unifies My Blog Posts
by Vinod Chugani
Dec 18, 2022
Meet Your Machine Learning Mentors: Kyle Gallatin
by Vivian Zhang
Nov 4, 2020
NICU Admissions and CCHD: Predicting Based on Data Analysis
by Paul Lee, Aron Berke, Bee Kim, Bettina Meier and Ira Villar
Jan 7, 2020

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day ChatGPT citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay football gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income industry Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI

NYC Data Science Academy

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

Get detailed curriculum information about our
amazing bootcamp!

Please enter a valid email address
Sign up completed. Thank you!

Offerings

  • HOME
  • DATA SCIENCE BOOTCAMP
  • ONLINE DATA SCIENCE BOOTCAMP
  • Professional Development Courses
  • CORPORATE OFFERINGS
  • HIRING PARTNERS
  • About

  • About Us
  • Alumni
  • Blog
  • FAQ
  • Contact Us
  • Refund Policy
  • Join Us
  • SOCIAL MEDIA

    ยฉ 2025 NYC Data Science Academy
    All rights reserved. | Site Map
    Privacy Policy | Terms of Service
    Bootcamp Application