KodeKloud

Data Engineering Essentials

Instructor: Mumshad Mannambeth

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Beginner level

5 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Build scalable data pipelines using Pandas, Polars, and Apache Spark for diverse dataset sizes

  • Architect real-time streaming solutions with Apache Kafka and feature stores for live ML inference

  • Automate complex ML workflows using Airflow and Prefect to ensure reliable continuous training

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

March 2026

Assessments

4 assignments

Taught in English

There are 4 modules in this course

Explore the foundational shift from traditional software development to data-centric machine learning operations. You will compare DevOps and MLOps workflows while mastering the core pillars of CI, CD, CT, and CM. This section establishes the architectural blueprint for building reliable and automated machine learning systems.

What's included

10 videos, 3 readings, 1 assignment

Master the essential techniques for collecting and preparing high-quality data for machine learning models. You will implement robust ETL processes and explore the strategic role of Data Lakes in modern ML stacks. Hands-on labs with Pandas and Polars will provide practical experience in transforming raw datasets into clean features.

What's included

7 videos, 2 readings, 1 assignment
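
As a rough illustration of the extract, transform, load (ETL) pattern this module covers, here is a minimal sketch. It is not taken from the course materials: it uses only the Python standard library rather than Pandas or Polars, and the sample data, column names, and the extract/transform/load function names are all hypothetical.

```python
import csv
import io

# Hypothetical raw input; a real pipeline would read from a file or data lake.
RAW_CSV = """user_id,age,country
1,34,US
2,,de
3,29,
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no age, cast types, normalize country codes."""
    cleaned = []
    for row in rows:
        if not row["age"]:
            continue  # skip incomplete records
        cleaned.append({
            "user_id": int(row["user_id"]),
            "age": int(row["age"]),
            "country": (row["country"] or "unknown").upper(),
        })
    return cleaned

def load(rows, sink):
    """Load: append cleaned rows to a destination (a plain list here)."""
    sink.extend(rows)
    return len(sink)

warehouse = []
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse)
```

Keeping the three stages as separate functions means each one can be tested or swapped out on its own, for example by replacing the list sink with a database writer.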

Scale your engineering capabilities to handle massive datasets and real-time information flows. This module introduces distributed computing with Apache Spark and Dask alongside high-velocity streaming via Apache Kafka. You will also evaluate the critical role of Feature Stores in maintaining consistency between training and serving.

What's included

7 videos, 1 reading, 1 assignment

Connect individual data tasks into a seamless and automated production pipeline using Airflow and Prefect. You will learn to manage complex dependencies and schedule automated training triggers to ensure model performance over time. This section focuses on making your data workflows resilient through advanced monitoring and error handling.

What's included

4 videos, 2 readings, 1 assignment
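
The ideas of dependency management and error handling in this module can be sketched without any orchestrator. The example below is an assumption-laden illustration, not Airflow or Prefect code: it orders hypothetical tasks with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and retries a task that fails transiently. The task names, `run_with_retries`, and `run_pipeline` are invented for this sketch.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "validate": {"extract"},
    "build_features": {"validate"},
    "train_model": {"build_features"},
    "evaluate": {"train_model"},
}

def run_with_retries(name, action, max_attempts=3):
    """Run one task, retrying on failure up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            print(f"{name}: attempt {attempt} failed ({exc}), retrying")

def run_pipeline(dependencies, actions):
    """Run every task after all of its dependencies have completed."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        run_with_retries(name, actions[name])
    return order

executed = []
actions = {name: (lambda n=name: executed.append(n)) for name in DEPENDENCIES}

# Simulate a transient failure: "validate" fails once, then succeeds.
attempts = {"validate": 0}

def flaky_validate():
    attempts["validate"] += 1
    if attempts["validate"] == 1:
        raise RuntimeError("transient failure")
    executed.append("validate")

actions["validate"] = flaky_validate
order = run_pipeline(DEPENDENCIES, actions)
print(order)
```

Orchestrators such as Airflow and Prefect layer scheduling, state tracking, and monitoring on top of this basic idea of a dependency graph with per-task retry policies.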

Instructor

Mumshad Mannambeth
KodeKloud
7 courses, 33,797 learners

Offered by

KodeKloud
