Advanced
Designing and Implementing Production Machine Learning Systems (MLOps)

This course introduces ML systems in production and shows students how real production ML systems operate. Using Python, Docker, Kubernetes, Google Cloud, and various open-source tools, students will bring the different components of an ML system to life and set up real, automated infrastructure.

* Tuition paid for part-time courses can be applied to the Data Science Bootcamps if admitted within 9 months.
All courses are hosted online.
We do not offer this course at this moment. Please join our waiting list to be notified when it becomes available again.
Find out more information about our professional development courses.

Product Description

Course Overview

As machine learning (ML) becomes ubiquitous in technology, there is an increasing need for well-engineered ML systems and processes that enable ML algorithms to drive business value. Enterprise ML has experienced a shift in focus from just the ML models themselves to the software engineering, infrastructure and best practices necessary to support ML at scale in production. Bringing a model from a data scientist’s notebook to running live in an application requires robust systems, MLOps and ML governance.

This course introduces ML systems in production and shows students how real production ML systems operate. Using Python, Docker, Kubernetes, Google Cloud, and various open-source tools, students will bring the different components of an ML system to life and set up real, automated infrastructure.

Prerequisites

You are expected to be familiar with an object-oriented programming language (preferably Python) and to have experience with basic machine learning concepts and models. Previous exposure to a cloud environment (AWS, Google Cloud, Azure, etc.) or other software engineering experience is helpful but not required.

Certificate

Certificates are awarded at the end of the program upon satisfactory completion of the course. Students are evaluated on a pass/fail basis for their performance on the required homework and final project (where applicable). Students who complete 80% of the homework and attend a minimum of 85% of all classes are eligible for the certificate of completion.

Demo Lecture

Quick Course Overview of the MLOps Class
Module: Overview
Instructor: Kyle Gallatin
Description: A one-minute overview of what you will learn in this course.

Syllabus

Unit 1 - Overview of Machine Learning Systems in Production

  • Machine learning in industry versus academia
  • Comparing ML engineering and software engineering
  • Components of production ML systems
  • Online versus offline ML systems
  • Demonstration: a production ML system
  • Hands-on: Introduction to Google Cloud, project setup, and gcloud commands
  • Hands-on: Setting up our git repository

Unit 2 - Machine Learning Engineering Fundamentals

  • Software engineering principles
  • Systems design 101
  • ML Systems design 101
  • MLOps concepts and design principles
  • Hands-on: Essential Google Cloud services for ML
  • Hands-on: Kubernetes and Google Kubernetes Engine (GKE) intro
  • Your ML in production project: Ideating

Unit 3 - Feature Systems

  • Introduction to feature systems
  • Common feature systems design patterns
  • Developer experience in feature systems and ML systems
  • Hands-on: Working with different feature sources and data stores on Google Cloud
  • Hands-on: Building a miniature feature system in the cloud
  • Your ML in production project: Ideating

Unit 4 - ML Model Training Pipelines

  • Components of ML training pipelines
  • Workflow orchestration and automation
  • Cost and value analysis
  • Setting up an ML pipeline
  • Hands-on: Introduction to Kubeflow and building an automated pipeline
  • Hands-on: Running automated training jobs on Kubernetes
  • Your ML in production project: Design and Planning

Unit 5 - Managing Training Experiments, ML Metadata, and Model Registries

  • Experimentation as an ML practitioner
  • Hands-on: Setting up a centralized metadata store and model registry
  • Hands-on: Tracking and logging hyperparameters
  • Hands-on: Using model registries
  • Your ML in production project: Design and Planning

Unit 6 - Deploying Machine Learning Models

  • Generating offline predictions
  • Online model serving systems
  • Common real-time deployment architectures
  • Hands-on: Developing an automated offline prediction workflow using Kubeflow and Dataflow
  • Hands-on: Deploying ML models on Kubernetes for real-time inference with Seldon
  • Hands-on: Scaling ML model deployments
  • Your ML in production project: Architecture Review

Unit 7 - ML Observability

  • Infrastructure and software observability
  • Latency, throughput, availability, and reliability
  • ML observability, ML model/feature drift, and ML explainability
  • Fairness and bias
  • Hands-on: Setting up Prometheus and Grafana on Kubernetes
  • Hands-on: Accessing logs and metrics in Google Cloud
  • Hands-on: Logging predictions and implementing ML observability
  • Your ML in production project: Architecture Review

Unit 8 - Experimentation and Reliability Engineering

  • ML experimentation design and algorithms 101
  • Hands-on: A/B testing with Seldon on Kubernetes
  • Hands-on: Multi-armed bandits with Seldon on Kubernetes
  • Hands-on: Canary/shadow deployments on Kubernetes
  • Your ML in production project: Implementation

Unit 9 - Continuous Learning

  • Streaming versus batch processing
  • Event-driven, asynchronous systems
  • Stateful ML systems and incremental model updates
  • Hands-on: Designing and implementing a stateful ML system on Kubernetes
  • Your ML in production project: Implementation

Unit 10 - Machine Learning Governance

  • Observability, visibility and control
  • Monitoring and alerting
  • Model service catalogue
  • Security
  • Compliance and auditability
  • Your ML in production project: Presentation

Campus Location

500 8th Ave Suite 905, New York, NY 10018
Nearby Subways
1 2 3 34th, Penn Station
A C E 34th, Penn Station
N Q R B D F M 34th, Herald Square

Instructor

Kyle Gallatin
NYC Data Science Mentor
Kyle Gallatin is currently a software engineer on the machine learning platform team at Etsy. In this role, Kyle is redesigning existing ML systems with a focus on ML model training, real-time model serving, MLOps processes, and model governance. Kyle spends his free time teaching and volunteering within the ML space. He also writes articles for technical publications on ML engineering, MLOps, and infrastructure.

Related Courses

A class for people with no data science background who wish to learn AI trends.
This seven-hour workshop covers what you need to know to understand current trends in generative AI: how these models work, how to train them, and how to leverage them as a competitive edge in the finance industry.