Adaptive dynamic programming (ADP), also known as approximate dynamic programming, neuro-dynamic programming, and reinforcement learning (RL), is a promising class of techniques for solving optimal control problems for discrete-time (DT) and continuous-time (CT) nonlinear systems.
ADP is a branch of machine learning that focuses on building controllers for dynamical systems. It is particularly useful for complex systems that are difficult to model mathematically or that have uncertain parameters. ADP algorithms learn the optimal control policy for a given system through a trial-and-error process, and the learned policy is then used to control the system in real time.
ADP is inspired by biological systems, which are able to learn from experience and adapt to changing conditions. In ADP, the system is modeled as a Markov decision process (MDP): the state of the system is represented by a set of variables, and the control policy is a mapping from the current state to an action. The goal of ADP is to learn the optimal control policy, the one that maximizes a long-run performance measure such as cumulative discounted reward.
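To make the MDP formulation concrete, the following sketch builds a small, entirely illustrative MDP (the transition tensor `P`, reward table `R`, and discount factor are made-up values, not from the text) and computes its optimal value function and greedy policy by value iteration, i.e., by repeatedly applying the Bellman optimality backup:

```python
import numpy as np

# Toy MDP (illustrative numbers): 3 states, 2 actions.
# P[a, s, t] = probability of moving from state s to state t under action a.
# R[s, a]   = immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.0, 0.9, 0.1],
     [0.1, 0.0, 0.9]],
    [[0.1, 0.9, 0.0],   # action 1
     [0.1, 0.0, 0.9],
     [0.9, 0.1, 0.0]],
])
R = np.array([[0.0, 1.0],
              [0.0, 2.0],
              [5.0, 0.0]])
gamma = 0.9  # discount factor

# Value iteration: V <- max_a [ R(s,a) + gamma * sum_t P(t|s,a) V(t) ]
V = np.zeros(3)
for _ in range(500):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)  # Q[s, a]
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)  # greedy policy: state -> action
```

Value iteration assumes the model (`P`, `R`) is known; the point of ADP is precisely to approximate such a value function and policy when the model is unavailable or too complex to write down.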
ADP algorithms typically consist of two main components: a critic and an actor. The critic estimates the value function, which measures the long-term performance of the control policy, while the actor updates the control policy based on the critic's estimate. ADP algorithms iterate these updates using a form of temporal difference (TD) learning: since the true value function is unknown, the critic is adjusted by the TD error, the difference between its current value estimate and a bootstrapped target built from the observed reward and the estimated value of the next state.
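The actor-critic loop above can be sketched as follows. This is a minimal illustration on a hypothetical 2-state, 2-action environment (the `step` dynamics, learning rates, and episode length are all assumptions, not from the text): the critic holds a table of state values updated by the TD error, and the actor holds softmax action preferences reinforced by the same TD error.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
V = np.zeros(n_states)                    # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))   # actor: action preferences

def step(s, a):
    """Hypothetical environment: action 1 tends to lead to state 1,
    and entering state 1 pays reward 1."""
    s_next = a if rng.random() < 0.9 else 1 - a
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

alpha, beta, gamma = 0.1, 0.1, 0.95  # critic rate, actor rate, discount
s = 0
for _ in range(5000):
    # Actor: sample an action from a softmax over preferences.
    p = np.exp(prefs[s] - prefs[s].max())
    p /= p.sum()
    a = rng.choice(n_actions, p=p)
    s_next, r = step(s, a)
    # Critic: TD error compares successive estimates, not a known target.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta            # critic update
    prefs[s, a] += beta * delta      # actor update: reinforce if delta > 0
    s = s_next
```

After training, the actor's preferences favor action 1 in both states, since it reliably leads to the rewarding state; the same TD-error-driven structure underlies ADP controllers with function approximators in place of these tables.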
ADP has been successfully applied to a wide range of applications, including robotics and control systems. Its ability to learn from experience and adapt to changing conditions makes it a powerful tool for building robust controllers for complex systems.