Prerequisite online coursework includes a total of forty hours of work and over two hundred exercises. The prework prepares students to work with both R and Python and revisits basic concepts in linear algebra, calculus, and statistics.
- Mathematics/Statistics: Refresh your memory of linear algebra and statistics.
- Calculus: Practice the basic calculus techniques used in data analysis.
- Conda Installation: Kick off your Python journey with a beginner-friendly setting!
- Python: Designed for people who are new to programming.
- R: Learn R to process and analyze data.
The Unix environment is widely used in the data science field, and familiarity with its
common tools is important for carrying out further data analysis. This course enables
students to interact with computers through the command line. It also introduces SQL
databases, the relational databases long used in enterprise settings, as well as GitHub,
a hosting platform for Git repositories that programmers widely use for version control
and collaboration.
This course introduces students to data analysis with the Python programming language.
Students learn to work with different data structures in Python and the most popular data
analytics and visualization packages such as numpy, scipy, pandas, matplotlib, and seaborn.
Ultimately, students will use effective Python code and packages to solve problems; extract,
transform, load, and analyze data to gain insights; and communicate the analyses, aided by
appropriate visualizations. Students are required to complete a project incorporating these
practices, culminating in a presentation of derived insights.
This course is designed to provide a comprehensive introduction to the R programming
language for data analysis. Students will learn to load, save, and otherwise wrangle data with
effective use of functions in R and relevant libraries, including those within the tidyverse
collection. Students will practice deriving insights from data using common statistical
techniques, including hypothesis testing and basic statistical modeling; effective visualization;
and other frequently used techniques within data analysis. Further, students will learn to
successfully communicate their insights, including creating reports with tools like knitr.
Students are required to complete a project demonstrating the ability to analyze data in R.
This course is designed to help students place data analytics and data science work in the
real-world context of business operations across industries. Students will be presented
with various business cases in which datasets were explored to gain insights that guide or
enhance business operations. They will also be required to take given business cases and
conceptualize viable project approaches with defined objectives, selected tools and methods,
and expected deliverables.
This course introduces students to Supervised Machine Learning from both a theoretical and practical perspective. Students will learn the theoretical foundations and mathematical structure behind several important, classical models; design a reproducible machine learning pipeline, including selection of an optimal model within a given context; and demonstrate the soundness and effectiveness of the final model, with a particular focus on the value of the model for extracting insights from data. Throughout the course, students will study linear models for regression and classification, Bayesian classifiers, and time series models.
This course continues from Machine Learning I to expand the students' arsenal of machine learning algorithms along with their underlying theoretical foundations and implementations in Python. Going further into Supervised Machine Learning, students will learn tree-based models, including Bagging Trees and Random Forest; Gradient Boosting; and Support Vector Machines. Moving into Unsupervised Machine Learning, students will learn techniques of Clustering, including K-Means and Hierarchical approaches; and Matrix Factorization, including Principal Component Analysis and Latent Dirichlet Allocation. Throughout the course, students will adhere to best practices in choosing, tuning, and critiquing their models. Finally, students will be required to complete one machine learning project, in which they will demonstrate their machine learning acumen by distilling deeper insights from data.
This course introduces students to more advanced data science practices, including Scalability and Deep Learning. On the scalability side, students will gain an overview of contemporary topics such as when to move from the desktop to a database, big data technologies, and cloud computing. On the deep learning side, students will learn the basic mathematical structure of deep learning models, understand where deep learning has and has not found success, and gain an overview of several important model architectures. Along the way, students will be given examples of how the material they have learned throughout the curriculum is applied in industry.
The capstone project is designed for students to apply the major data science concepts, tools, and methods they have learned in the program to solve a business operations problem with real data sets from a real business entity. Students are presented with data sets and potential problems to solve. They are then required to form project teams, develop a project proposal for instructor review and approval, and execute the project. When the project is completed, each team is required to present its findings and share the business insights obtained from the research.