Generative AI Part 2

Introduces the theoretical foundations and advanced concepts of neural networks, generative models, transformers, and large language models. Students will explore how these AI systems create new data, process information, and learn through feedback, while analyzing their applications across various fields. The course emphasizes key principles in model building, optimization, and real-world generative AI use cases.

Recommended experience
Intermediate level
Understanding of linear algebra.
Details to know
April 2026
33 assignments
There are 7 modules in this course
In this module, you will explore Transformer-based models in natural language processing. You will study pretraining approaches such as BERT and GPT, the mathematics of pretraining word embeddings, and various optimization and scaling strategies critical to effective language modeling.
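The pretraining approaches surveyed in this module all reduce to cross-entropy over tokens. As a minimal illustration (not course code), here is the next-token prediction loss that GPT-style language-model pretraining minimizes, sketched in numpy with an invented toy vocabulary:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of predicting each next token.

    logits:  (T, V) unnormalized scores, one row per position.
    targets: (T,)   index of the true next token at each position.
    """
    # log-softmax with the usual max-subtraction for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: 3 positions, vocabulary of 4 tokens (values are made up).
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.1, 3.0, 0.1, 0.1],
                   [0.1, 0.1, 0.1, 2.5]])
targets = np.array([0, 1, 3])
loss = next_token_loss(logits, targets)
```

BERT-style masked-language-model pretraining uses the same cross-entropy, applied only at masked positions instead of every position.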
What's included
5 videos•20 readings•3 assignments
5 videos•Total 28 minutes
- Pre-Training•4 minutes
- BERT & Tuning•9 minutes
- GPT and RAG•5 minutes
- Prompt Engineering•6 minutes
- Scaling Law & Transfer Learning•4 minutes
20 readings•Total 217 minutes
- Course Introduction•1 minute
- Meet Your Faculty•1 minute
- Syllabus - Generative AI Part 2•10 minutes
- Recommended Prior Knowledge•100 minutes
- Academic Integrity•1 minute
- Module Overview•2 minutes
- Transformers for NLP•8 minutes
- Pre-Trained Word Embeddings•3 minutes
- Pre-Training Whole Models•3 minutes
- Reconstructing the Input•3 minutes
- Pre-Training Through Language Modeling•8 minutes
- Fine-Tuning BERT•15 minutes
- Fine-Tuning In-Depth•15 minutes
- Pre-Training Decoders•5 minutes
- Generative Pretrained Transformer•10 minutes
- Scaling Laws•8 minutes
- Scaling Efficiency•7 minutes
- Pre-Training Encoder/Decoders•7 minutes
- Span Corruption•7 minutes
- Module Wrap-Up•3 minutes
3 assignments•Total 9 minutes
- Module 8- Assess Your Learning 1•3 minutes
- Module 8- Assess Your Learning 2•3 minutes
- Module 8- Assess Your Learning 3•3 minutes
This module investigates deep latent variable models, focusing on variational autoencoders (VAEs) and related probabilistic methods. You will analyze the mathematics behind sampling strategies, evidence lower bound (ELBO), variational inference, reparameterization tricks, and amortized inference, developing an advanced toolkit for probabilistic generative modeling.
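Two of the building blocks named above can be sketched in a few lines; this is a hedged numpy illustration (toy values, not course code) of the analytic KL term that appears in the ELBO for a diagonal-Gaussian posterior against a standard-normal prior, and of the reparameterization z = mu + sigma * eps that makes sampling differentiable:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the analytic KL term of the ELBO."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); gradients can flow through mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy posterior parameters for a 2-dimensional latent.
mu = np.array([0.5, -0.3])
log_var = np.array([0.0, -1.0])
kl = kl_diag_gaussian(mu, log_var)
z = reparameterize(mu, log_var, rng)
```

The full ELBO adds an expected reconstruction term (estimated from samples of z) to the negative of this KL; the KL vanishes exactly when the posterior equals the prior.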
What's included
6 videos•14 readings•3 assignments
6 videos•Total 54 minutes
- Probability, Density, Mass Function•10 minutes
- VAE Introduction•8 minutes
- Sampling & Monte Carlo Optimization•11 minutes
- Evidence Lower Bound (ELBO) Part 1•8 minutes
- Evidence Lower Bound (ELBO) Part 2•5 minutes
- Variational Autoencoders in Depth•12 minutes
14 readings•Total 112 minutes
- Module Overview•2 minutes
- Deep Latent Variable Models•8 minutes
- Mixture of Gaussians•10 minutes
- Variational Autoencoder (VAE)•10 minutes
- Discrete and Continuous Space•8 minutes
- Naïve Monte Carlo•5 minutes
- Importance Sampling•8 minutes
- ELBO Deep Dive•8 minutes
- Return to Variational Autoencoders•15 minutes
- Variational Approximation•10 minutes
- Variational Autoencoder Continued•10 minutes
- Reparameterization Trick•10 minutes
- Amortization in VAE•5 minutes
- Module Wrap-Up•3 minutes
3 assignments•Total 9 minutes
- Module 9- Assess Your Learning 1•3 minutes
- Module 9- Assess Your Learning 2•3 minutes
- Module 9- Assess Your Learning 3•3 minutes
In this module, you'll explore normalizing flows as precise tools for modeling complex probability distributions through invertible neural networks. You'll examine the mathematical underpinnings, including determinants, geometry, invertibility constraints, and specific flow architectures such as Real-NVP and autoregressive models. You'll also investigate practical applications and the synthesis of complex densities using normalizing flows.
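The change-of-variables formula at the heart of the module can be checked in one dimension. As a minimal sketch (the affine flow and its parameters are invented for illustration), the density of x = a·z + b under a standard-normal base is recovered from log p_x(x) = log p_z(z) + log|dz/dx|, and it matches the known N(b, a²) density:

```python
import numpy as np

def base_log_prob(z):
    """Standard normal log-density, the base distribution of the flow."""
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def affine_flow_log_prob(x, a, b):
    """log p_x(x) for x = a*z + b via change of variables:
    log p_x(x) = log p_z(z) + log|dz/dx|, with z = (x - b)/a and dz/dx = 1/a."""
    z = (x - b) / a
    return base_log_prob(z) + np.log(abs(1.0 / a))

# The flow x = 2z + 1 pushes N(0, 1) to N(1, 4); compare against that density directly.
x = 3.0
lp_flow = affine_flow_log_prob(x, a=2.0, b=1.0)
lp_true = -0.5 * ((x - 1.0) / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
```

Deep flows such as Real-NVP compose many such invertible maps, so the scalar |dz/dx| becomes a Jacobian determinant that the architecture keeps cheap to evaluate (e.g., triangular).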
What's included
8 videos•25 readings•4 assignments
8 videos•Total 33 minutes
- Normalizing Flow Part 1•4 minutes
- 1D Introduction•4 minutes
- Change of Variables Explained•3 minutes
- Introduction to Forward and Inverse Mapping•4 minutes
- 2D Example: Deep Neural Network•4 minutes
- Linear Flows•6 minutes
- Elementwise & Other Types of Flows•7 minutes
- Summary of Normalizing Flows•1 minute
25 readings•Total 124 minutes
- Module Overview•2 minutes
- Introduction to Normalizing Flow•10 minutes
- 1D Normalizing Flow•2 minutes
- Measuring Probability•12 minutes
- Change of Variables Formula•5 minutes
- Geometry Info•5 minutes
- Determinants and Volumes•2 minutes
- Forward and Inverse Mapping•2 minutes
- Learning•1 minute
- General Use Case•12 minutes
- Forward Mapping With a Deep Neural Network•5 minutes
- Training Objective for Normalizing Flows•5 minutes
- Flow Model Requirements•3 minutes
- Triangular Jacobian•1 minute
- Overview and Linear Flows•3 minutes
- Elementwise Flows•5 minutes
- Coupling Flows•5 minutes
- Introduction to NICE•5 minutes
- Real-NVP: Non-Volume Preserving Extension of NICE•7 minutes
- Interpolation in Latent Space With Real-NVP•3 minutes
- Autoregressive Flows•3 minutes
- Continuous Autoregressive Models as Flow Models•5 minutes
- Inverse Autoregressive Flows•8 minutes
- Applications of Normalizing Flows•10 minutes
- Module Wrap-Up•3 minutes
4 assignments•Total 12 minutes
- Module 10- Assess Your Learning 1•3 minutes
- Module 10- Assess Your Learning 2•3 minutes
- Module 10- Assess Your Learning 3•3 minutes
- Module 10- Assess Your Learning 4•3 minutes
This module provides a deep exploration of Generative Adversarial Networks (GANs), focusing on their formulation as likelihood-free generative models. You'll analyze GAN training dynamics, including optimization challenges, mode collapse, and divergence minimization strategies. The module also covers advanced GAN variants such as f-GAN and Wasserstein GAN (WGAN).
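The adversarial objective discussed above can be written down concretely. This is a hedged numpy sketch (the logit values are made up) of the discriminator's binary cross-entropy loss and the non-saturating generator loss commonly used in place of the raw minimax objective:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator_loss(d_real_logits, d_fake_logits):
    """-E[log D(x)] - E[log(1 - D(G(z)))]: push real scores up, fake scores down."""
    return (-np.log(sigmoid(d_real_logits)).mean()
            - np.log(1.0 - sigmoid(d_fake_logits)).mean())

def generator_loss_nonsaturating(d_fake_logits):
    """-E[log D(G(z))], the non-saturating variant that keeps gradients alive
    early in training when the discriminator easily rejects generated samples."""
    return -np.log(sigmoid(d_fake_logits)).mean()

# Made-up logits: the discriminator currently favors real samples over fakes.
d_real = np.array([2.0, 1.5, 3.0])
d_fake = np.array([-1.0, -2.0, -0.5])
d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss_nonsaturating(d_fake)
```

f-GAN and WGAN replace this binary cross-entropy with variational bounds on other f-divergences and with the Wasserstein distance, respectively, which is where the training-stability analysis in this module leads.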
What's included
29 readings•5 assignments
29 readings•Total 121 minutes
- Module Overview•2 minutes
- Refresher•5 minutes
- Towards Likelihood-Free Learning•6 minutes
- Likelihood-Free Learning•5 minutes
- Generative Modeling and Two-Sample Tests•3 minutes
- Discrimination as a Signal•4 minutes
- Overview•4 minutes
- Generator vs. Discriminator Diagram•5 minutes
- Training Objective for Discriminator•4 minutes
- Interpretation•5 minutes
- Loss Functions•5 minutes
- Training Algorithm•3 minutes
- Key Observations•2 minutes
- Alternating Optimization in GANs•2 minutes
- Examples•4 minutes
- Introduction•1 minute
- Optimization Challenges in GANs•2 minutes
- Mode Collapse•5 minutes
- Beyond KL and Jensen-Shannon Divergence•2 minutes
- f-divergences•2 minutes
- What is Lower Semicontinuity?•4 minutes
- Examples of f-divergences and Training•5 minutes
- Toward Variational Divergence Minimization•10 minutes
- f-GAN Variational Divergence Minimization•5 minutes
- Wasserstein (Earth Mover) Distance•5 minutes
- Discrete Distributions•8 minutes
- Wasserstein Distance for Continuous Distributions•5 minutes
- Inferring Latent Representations in GANs•5 minutes
- Module Wrap-Up•3 minutes
5 assignments•Total 15 minutes
- Module 11- Assess Your Learning 1•3 minutes
- Module 11- Assess Your Learning 2•3 minutes
- Module 11- Assess Your Learning 3•3 minutes
- Module 11- Assess Your Learning 4•3 minutes
- Module 11- Assess Your Learning 5•3 minutes
In this module, you will explore energy-based generative models and score-based modeling frameworks from a mathematical and implementation perspective. You'll dive deeply into the details of training via score functions, contrastive divergence, and various forms of score matching including denoising techniques, highlighting their theoretical and practical implications.
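The score function central to this module is the gradient of the log-density. As a minimal sketch (toy values, not course code), here is the closed-form score of a Gaussian, and the regression target that denoising score matching trains a network toward after perturbing clean data with Gaussian noise:

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2): grad_x log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

def dsm_targets(x_clean, sigma, rng):
    """Denoising score matching: perturb x with Gaussian noise and regress the
    score network toward grad log q(x_noisy | x_clean) = -(x_noisy - x_clean) / sigma^2."""
    noise = sigma * rng.standard_normal(x_clean.shape)
    x_noisy = x_clean + noise
    target = -(x_noisy - x_clean) / sigma**2
    return x_noisy, target

rng = np.random.default_rng(0)
x_noisy, target = dsm_targets(np.zeros(5), sigma=0.1, rng=rng)
```

Unlike contrastive divergence, this objective never needs the intractable normalization constant: the score of an energy-based model exp(-E(x))/Z is just -grad E(x), because Z drops out of the gradient of the log.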
What's included
34 readings•5 assignments
34 readings•Total 175 minutes
- Module Overview•2 minutes
- Background•3 minutes
- Parameterizing Probability Distribution: Definition•3 minutes
- Parameterizing Probability Distributions: Solution•7 minutes
- Energy-Based Models•5 minutes
- Pros and Cons of Energy-Based Models•2 minutes
- Examples•5 minutes
- Examples Continued•5 minutes
- Computing the Normalization Constant•5 minutes
- Introduction•2 minutes
- Contrastive Divergence Algorithm•8 minutes
- Sampling in Energy-Based Models•5 minutes
- Score Function•8 minutes
- Score Matching•8 minutes
- Score-Based Models Introduction•2 minutes
- Background•3 minutes
- Denoising Score Matching Part 1: Introduction•6 minutes
- Denoising Score Matching Part 2: Defining the Objective•4 minutes
- Denoising Score Matching Part 3: Gradient Expansion•8 minutes
- Gradient Derivation•6 minutes
- Intuition•5 minutes
- Why Denoising Works in Score Matching•3 minutes
- Comparison Between NSM and DSM•2 minutes
- Tweedie Formula•4 minutes
- Overview of Sliced Score Matching (SSM)•8 minutes
- Data Generation with Score-Based Models•8 minutes
- Pitfalls With Score-Based Models•8 minutes
- Solution to Pitfalls•8 minutes
- Introduction to NCSBM•5 minutes
- Annealed Langevin Dynamics•8 minutes
- Training Noise Conditional Score Networks•3 minutes
- Choosing Noise Scales•5 minutes
- Choosing the Weighting Function•8 minutes
- Module Wrap-Up•3 minutes
5 assignments•Total 15 minutes
- Module 12- Assess Your Learning 1•3 minutes
- Module 12- Assess Your Learning 2•3 minutes
- Module 12- Assess Your Learning 3•3 minutes
- Module 12- Assess Your Learning 4•3 minutes
- Module 12- Assess Your Learning 5•3 minutes
You'll delve deeply into diffusion models, understanding them mathematically as stochastic processes and connecting them explicitly to score-based models. The module examines forward and reverse diffusion processes, training objectives, SDEs, predictor-corrector methods, and latent diffusion architectures, providing robust foundations for modern generative modeling.
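The forward diffusion process admits a closed-form kernel, which is what makes training tractable. This is a hedged sketch (the linear variance schedule is a common illustrative choice, not necessarily the one used in the course) of sampling x_t directly from q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1 − ᾱ_t)·I):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear variance schedule beta_1..beta_T (values are an illustrative choice).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot,
    without simulating the t intermediate noising steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(4)
x_early = q_sample(x0, t=0, rng=rng)     # nearly the clean signal
x_late = q_sample(x0, t=T - 1, rng=rng)  # nearly pure noise
```

Because ᾱ_t decays toward zero, x_T is approximately standard normal; the reverse (decoder) process learned in this module denoises step by step from that prior, and taking the number of noise levels to infinity yields the SDE view covered later.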
What's included
41 readings•6 assignments
41 readings•Total 201 minutes
- Module Overview•2 minutes
- Introduction•4 minutes
- Model Families Continued•8 minutes
- Definition•2 minutes
- Diffusion Process•5 minutes
- Distribution of Each Term•5 minutes
- Diffusion Kernel•15 minutes
- Marginal Distributions•4 minutes
- Conditional Distribution•7 minutes
- Backward Diffusion Process-Decoder•15 minutes
- Encoder / Decoder•4 minutes
- Loss Function•2 minutes
- Gaussian Distribution and Its Mean•2 minutes
- Diffusion Models as Score-Based Models•4 minutes
- Decoder Parameterization•8 minutes
- Loss Function•5 minutes
- Training and Inference•5 minutes
- U-Net Architecture•7 minutes
- Infinite Noise Levels Score-Based Modeling•5 minutes
- Perturbing Data With Stochastic Processes•3 minutes
- Stochastic Differential Equations (SDEs)•5 minutes
- Types of SDEs and Noise Evolution•5 minutes
- Reverse Stochastic Process•7 minutes
- Role of the Score Function•3 minutes
- Time-Dependent Score-Based Model•2 minutes
- Training Objective•2 minutes
- Reverse-Time SDE•3 minutes
- Euler-Maruyama Approximation and Summary•5 minutes
- Where Does the Time Step Come From?•5 minutes
- Step-by-Step Sampling: Euler-Maruyama Method•5 minutes
- Predictor-Corrector Sampling Methods•5 minutes
- Combined Predictor-Corrector Sampling•3 minutes
- Probability Flow ODE•5 minutes
- Likelihood Computation•6 minutes
- Practical Considerations and Conclusion•5 minutes
- Intro to Latent Diffusion Models•2 minutes
- Conditional Generation•5 minutes
- Improving Image Quality•3 minutes
- Control the Generation Process•7 minutes
- Examples•3 minutes
- Module Wrap-Up•3 minutes
6 assignments•Total 18 minutes
- Module 13- Assess Your Learning 1•3 minutes
- Module 13- Assess Your Learning 2•3 minutes
- Module 13- Assess Your Learning 3•3 minutes
- Module 13- Assess Your Learning 4•3 minutes
- Module 13- Assess Your Learning 5•3 minutes
- Module 13- Assess Your Learning 6•3 minutes
In this module, you'll study annealed importance sampling (AIS) methods for estimating complex probability distributions, with rigorous mathematical treatment. You will analyze the AIS procedure step by step, including intermediate distributions and normalization constants, and apply these techniques to probabilistic models. To wrap up the course, you will also assess the evolution of generative models and how they are evaluated.
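The AIS procedure can be demonstrated end to end on a case with a known answer. This is a minimal sketch (the target, proposal, and Metropolis step size are all invented for illustration) that anneals from N(0, 1) to an unnormalized N(0, 4) through a geometric bridge of intermediate distributions, estimating the log-ratio of normalizing constants, whose true value is log 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target: N(0, SIGMA^2) without its constant; proposal: N(0, 1).
SIGMA = 2.0
def log_f_target(x):   return -0.5 * x**2 / SIGMA**2
def log_f_proposal(x): return -0.5 * x**2

def ais_log_ratio(n_particles=4000, n_temps=20, rng=rng):
    """Estimate log(Z_target / Z_proposal) with annealed importance sampling:
    geometric bridge f_t = f_prop^(1-b) * f_targ^b, one MH step per temperature."""
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.standard_normal(n_particles)  # exact draws from the proposal (beta = 0)
    log_w = np.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # importance-weight increment between adjacent bridge distributions
        log_w += (b1 - b0) * (log_f_target(x) - log_f_proposal(x))
        # one random-walk Metropolis step targeting the bridge at b1
        def log_bridge(y):
            return (1 - b1) * log_f_proposal(y) + b1 * log_f_target(y)
        prop = x + 0.5 * rng.standard_normal(n_particles)
        accept = np.log(rng.random(n_particles)) < log_bridge(prop) - log_bridge(x)
        x = np.where(accept, prop, x)
    # log-mean-exp of the weights, stabilized by subtracting the max
    m = log_w.max()
    return m + np.log(np.exp(log_w - m).mean())

est = ais_log_ratio()
# True ratio: Z_target / Z_prop = SIGMA, so the log-ratio is log 2 ~ 0.693.
```

Because both endpoints are Gaussian, the annealed weights stay well behaved and the estimate lands close to log 2; plain importance sampling (a single temperature) degrades much faster as the two distributions move apart.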
What's included
40 readings•7 assignments
40 readings•Total 136 minutes
- Module Overview•2 minutes
- Overview of AIS•5 minutes
- Example: AIS With a Gaussian Distribution•3 minutes
- Intermediate Step (t = 1)•5 minutes
- Intermediate Step (t = 2)•5 minutes
- Final Steps (t = 8)•5 minutes
- Setup•5 minutes
- Step-by-Step Solution for t = 1•5 minutes
- Applications and Takeaways•2 minutes
- Normalization of Probability Density Functions•2 minutes
- Examples of Normalizing Constants•2 minutes
- Steps to Normalize p(z)•3 minutes
- Wrapping Up Probability Distributions•2 minutes
- Model Family Recap•5 minutes
- Model Families Continued•5 minutes
- Distances of Probability Distributions•5 minutes
- Evaluating Generative Models•1 minute
- What is the Task That You Care About?•1 minute
- Evaluation•7 minutes
- Kernel Density Estimation (KDE)•7 minutes
- Latent Variables & Sample Quality•5 minutes
- HYPE: Human Eye Perceptual Evaluation•3 minutes
- Inception Scores•3 minutes
- Sharpness•3 minutes
- Diversity•2 minutes
- Inception Scores Finalized•2 minutes
- Relationship Between Inception Score and KL Divergence•7 minutes
- Fréchet Inception Distance (FID)•2 minutes
- Kernel Inception Distance (KID)•2 minutes
- FID vs. KID•1 minute
- Evaluating Sample Quality for Text-to-Image Models•5 minutes
- Evaluating Latent Representations•1 minute
- Clustering•3 minutes
- Lossy Compression or Reconstruction•1 minute
- Disentanglement•3 minutes
- Beta-VAE•3 minutes
- Solving Tasks Through Prompting•4 minutes
- Holistic Evaluation of Language Models (HELM)•5 minutes
- Module Wrap-Up•3 minutes
- Congratulations!•1 minute
7 assignments•Total 21 minutes
- Module 14- Assess Your Learning 1•3 minutes
- Module 14- Assess Your Learning 2•3 minutes
- Module 14- Assess Your Learning 3•3 minutes
- Module 14- Assess Your Learning 4•3 minutes
- Module 14- Assess Your Learning 5•3 minutes
- Module 14- Assess Your Learning 6•3 minutes
- Module 14- Assess Your Learning 7•3 minutes
Instructor

Offered by
Founded in 1898, Northeastern is a global research university with a distinctive, experience-driven approach to education and discovery. The university is a leader in experiential learning, powered by the world’s most far-reaching cooperative education program. The spirit of collaboration guides a use-inspired research enterprise focused on solving global challenges in health, security, and sustainability.
Frequently asked questions
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in the course. You can try a Free Trial instead, or apply for Financial Aid. The course may also offer a 'Full Course, No Certificate' option, which lets you see all course materials, submit required assessments, and get a final grade, but means you will not be able to purchase a Certificate experience.
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.