Advanced
Natural Language Processing for Production (NLP)

This course demonstrates a practical and intuitive approach to NLP applications through a variety of use cases. The essentials and practical fundamentals of NLP methods are presented using common Python tools including, but not limited to, regular expressions, NLTK, spaCy and Hugging Face. High-level foundations are followed by hands-on code examples in a notebook environment, touching on different aspects of NLP, from conventional statistical text analytics approaches to state-of-the-art deep/transfer learning models, paired with result interpretation, industry challenges, visualizations and a prototype web application.

* Tuition paid for part-time courses can be applied to the Data Science Bootcamps if admitted within 9 months.
In response to COVID-19 State reopening, all our courses are hosted online.
We do not offer this course at this moment. Please join our waiting list to be notified when it becomes available again.
Find out more information about our professional development courses.

Product Description

Course Overview

People have communicated through text for centuries, and the volume of text data has grown rapidly over the last decade. Because NLP focuses on this kind of data, it has become one of the most exciting and fastest-growing subfields of Artificial Intelligence (AI), attracting immense research and practical interest. Organizations have been building text analytics capabilities in order to:

  • Store and process text data efficiently.
  • Enhance information extraction from data sources of high volume, velocity and
    variety.
  • Derive insights that are not obvious or feasible through manual human efforts.
  • Improve decision-making by utilizing different sources of information.
  • Automate or accelerate time-consuming manual processes.
  • Advance the technology towards more generally applicable human-like AI frameworks.

NLP has taken a big leap since 2017, when large transfer learning models started to become widely available. Today, thanks to open-source software, one can use very large neural network models trained on massive amounts of text data with just a few lines of code. This course aims to provide a solid foundation for using these open-source text analytics technologies effectively to create NLP pipelines for different use cases.

Prerequisites

This course covers text analytics starting from the very basics and uses Python. We keep the code in Jupyter notebooks using functions and will not dive into object-oriented programming, so an intermediate level of Python knowledge is sufficient to follow the course content and assignments.

Certificate

Certificates are awarded upon satisfactory completion of the course. Students are evaluated on a pass/fail basis for their performance on the required assignments.

Students who complete 80% of the homework and attend a minimum of 85% of all classes are eligible for the certificate of completion.

Demo Lecture

In-depth Course Overview of NLP class
Module: Course Demo
Instructor: Tolga Akiner
Description: A one-hour demo of what you will learn in this course.

Syllabus

Unit 1: Introduction

  • An introduction to Natural Language Processing, applications and course overview.
  • Running notebooks in different environments, either in the cloud or on a local machine.
  • An introduction to Text Analytics (TA) using Python.
  • String methods in Python.

Unit 2: Retrieving and Processing Text Data – 1

  • Parsing unstructured data from different types of sources such as PDF, DOCX and PPT.
  • How to scrape the web to fuel information extraction.
  • A first glance at the NLTK cookbook and the basics of text processing.

Unit 3: Retrieving and Processing Text Data – 2

  • Cleaning, normalizing and segmenting text.
  • Regular Expressions in Python.
  • Assignment 1 (Scraping, cleaning and indexing YouTube/Reddit/Twitter transcripts).
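
As a rough illustration of the cleaning, normalizing and segmenting steps listed in this unit, the sketch below uses only Python's built-in re module. The function names clean_text and tokenize and the sample string are hypothetical and chosen purely for illustration; they are not taken from the course materials.

```python
import re

def clean_text(raw: str) -> str:
    """Lowercase, strip URLs and non-letter characters, collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

def tokenize(text: str) -> list:
    """Segment cleaned text into word tokens."""
    return re.findall(r"[a-z]+", text)

# Hypothetical sample input mixing case, a URL, punctuation and digits.
sample = "Check THIS out: https://example.com  NLP is fun!!! 100%"
cleaned = clean_text(sample)
tokens = tokenize(cleaned)
print(cleaned)
print(tokens)
```

This is one simple normalization choice among many; depending on the use case, digits, punctuation or non-English characters may need to be kept rather than discarded.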

Unit 4: How Machines Understand Text – 1

  • Bag-of-Words (BoW) methodology.
  • Statistical interpretation of natural language via TF-IDF.
  • Semantics and Word Embeddings: How Neural Networks help capture context.
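
To give a feel for the first two topics in this unit, here is a minimal sketch of Bag-of-Words counts and a textbook TF-IDF score in plain Python. The toy documents and the helper name tf_idf are assumptions made for illustration. The formula used is the unsmoothed variant, (term count / document length) × log(N / document frequency); libraries such as scikit-learn use smoothed variants by default, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Bag-of-Words: each document becomes a mapping from word to raw count.
bow = [Counter(doc.split()) for doc in docs]

def tf_idf(term, doc_index):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = bow[doc_index]
    tf = counts[term] / sum(counts.values())
    df = sum(1 for c in bow if term in c)   # number of documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(bow) / df)

score_the = tf_idf("the", 0)   # frequent in this document, but common across the corpus
score_cat = tf_idf("cat", 0)   # rarer across the corpus, so weighted higher
print(score_the, score_cat)
```

Even though "the" appears twice in the first document and "cat" only once, "cat" receives the higher score because it occurs in fewer documents, which is the intuition behind TF-IDF weighting.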

Unit 5: How Machines Understand Text – 2

  • Chronological flow of contextual models: From word2vec to transformers.
  • Best practices in model selection in NLP.
  • Assignment 2 (Comparison and visualization of different language models for word similarity).

Unit 6: Supervised Approach in NLP

  • Supervised vs Unsupervised methods with text data.
  • Supervised text classification examples using Scikit-learn.
  • Data labeling and subjectivity in text classification.
  • Assignment 3 (Why is spam classification an easier problem than sentiment classification?).

Unit 7: Unsupervised Approach in NLP

  • EDA on text data: You don’t know what you don’t know.
  • Topic modeling with LDA and K-means clustering.
  • Visualizing text and topics.
  • Interpretability challenges in unsupervised techniques with natural language.

Unit 8: NLP Tasks 1: How to make sense of text data

  • Language deconstruction with spaCy.
  • Named Entity Recognition (NER) example on medical records.
  • Text analytics dilemma: Rule-based vs. training-based models.
  • Assignment 4 (Are unsupervised topics subjective? Comparing students’ models and interpretation).

Unit 9: NLP Tasks 2: Transfer Learning Applications

  • How transfer learning changed the course of NLP.
  • Hugging Face model hub and the power of open-source.
  • Hugging Face pipelines: Text summarization, zero-shot learning and question answering (Q&A).
  • Text generation: Why did GPT-3 get so famous?
  • Domain adaptation through fine-tuning.

Unit 10: NLP Tasks 3: Semantic similarity and NLP in production

  • Semantic similarity and NLP-based textual search using SBERT.
  • Indexing a text database for faster information extraction.
  • What does a typical NLP pipeline look like?
  • Web applications built on NLP pipelines via Streamlit.
  • Assignment 5 (Selecting a dataset and query that show the difference between rule-based search and semantic search).

Campus Location

500 8th Ave Suite 905, New York, NY 10018
Nearby Subways
1 2 3 34th, Penn Station
A C E 34th, Penn Station
N Q R B D F M 34th, Herald Square

Instructor

Tolga Akiner
Tolga Akiner is a Senior Data Scientist at LexisNexis with NLP experience across several companies and industries, including pharmaceuticals, healthcare, retail and legal. He holds a Ph.D. in Mechanical Engineering, where he worked on nanomaterials, followed by postdoctoral research that made heavy use of Machine Learning and Active Learning in the Materials Science domain. He previously contributed NLP lectures to the ‘Practical AI’ course on Udemy and has blogged on Medium about practical text analytics applications.