Packt

Cutting-Edge Topics in Deep Reinforcement Learning

Packt

Cutting-Edge Topics in Deep Reinforcement Learning

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Advanced level

Recommended experience

7 hours to complete
Flexible schedule
Learn at your own pace
Gain insight into a topic and learn the fundamentals.
Advanced level

Recommended experience

7 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Understand continuous action spaces and their applications in deep reinforcement learning

  • Master trust region methods for stable policy optimization in RL

  • Explore black-box optimization techniques to solve complex RL problems

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

April 2026

Assessments

8 assignments

Taught in English

See how employees at top companies are mastering in-demand skills

 logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

Build your subject-matter expertise

This course is part of the Deep Reinforcement Learning Hands-On Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 8 modules in this course

This module introduces advanced reinforcement learning techniques for environments with continuous action spaces. Learners will explore the A2C method, analyze its performance, and implement practical solutions for training agents in such domains. Hands-on coding examples and experimental results will deepen understanding of policy gradient methods in continuous settings.

What's included

1 video5 readings1 assignment

This module explores advanced techniques for stabilizing policy gradient methods in deep reinforcement learning. Learners will compare and contrast Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO), and ACKTR, examining their theoretical foundations and practical performance. By the end, you will understand how these methods improve training stability and efficiency.

What's included

1 video4 readings1 assignment

This module introduces black-box optimization techniques in reinforcement learning, highlighting their principles and recent applications to complex environments. Learners will explore practical implementations using evolutionary strategies and genetic algorithms, and analyze performance results on benchmark tasks such as CartPole and HalfCheetah.

What's included

1 video4 readings1 assignment

This module delves into advanced exploration strategies in reinforcement learning, highlighting the exploration/exploitation dilemma and presenting alternative methods such as random exploration, noisy networks, and network distillation. Learners will experiment with these techniques in the MountainCar environment and compare their effectiveness using both DQN and PPO algorithms.

What's included

1 video6 readings1 assignment

This module introduces reinforcement learning with human feedback (RLHF), a technique for training agents when explicit reward functions are difficult to define. Learners will explore the RLHF pipeline, including data labeling, reward model training, and integration with reinforcement learning algorithms. Real-world applications, such as training large language models, are also discussed.

What's included

1 video6 readings1 assignment

This module explores advanced model-based reinforcement learning techniques through the lens of AlphaGo Zero and MuZero. Learners will examine Monte Carlo Tree Search (MCTS), neural network architectures, and the process of training agents for board games like Connect 4. Practical implementation details and evaluation strategies are also covered.

What's included

1 video11 readings1 assignment

This module explores how deep reinforcement learning techniques can be applied to discrete optimization problems, using the example of solving cubes. Learners will examine neural network architectures, training processes, and experimental results, gaining insight into both implementation and evaluation of RL-based solvers.

What's included

1 video5 readings1 assignment

This module introduces the fundamentals of multi-agent reinforcement learning (MARL), exploring how multiple agents interact and learn within shared environments. Learners will examine the application of deep Q-networks to groups of agents and analyze the resulting behaviors. Practical examples illustrate how agent strategies evolve in multi-agent scenarios.

What's included

1 video2 readings1 assignment

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Packt - Course Instructors
Packt
1,728 Courses488,803 learners

Offered by

Packt

Explore more from Software Development

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."
Coursera Plus

Open new doors with Coursera Plus

Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription

Advance your career with an online degree

Earn a degree from world-class universities - 100% online

Join over 3,400 global companies that choose Coursera for Business

Upskill your employees to excel in the digital economy

Frequently asked questions