Reinforcement learning has shown notable empirical success in approximating solutions to the Hamilton-Jacobi-Bellman (HJB) equation, producing highly dynamic controllers as a result. However, because of finite sampling and the choice of function approximators, these methods cannot bound the suboptimality of the resulting controllers or the quality with which they approximate the true cost-to-go function, which has limited their broader application.
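For reference, the continuous-time, infinite-horizon HJB equation characterizes the optimal cost-to-go V; the notation below is the standard textbook form rather than anything specific to this paper:

```latex
0 = \min_{u} \left[ \ell(x, u) + \nabla V(x)^{\top} f(x, u) \right]
```

Here f is the system dynamics, \ell is the running cost, and the minimizing u at each state defines the optimal policy, so any error in approximating V propagates directly into the controller.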
Consequently, research efforts have intensified towards developing methods that offer such guarantees. Approaches explored include lower-bounding the value function, relaxing the HJB equation, and treating both discrete- and continuous-time systems.
In recent studies, researchers from MIT CSAIL have extended prior work by providing both under- and over-approximations of the value function over a compact region for continuous-time nonlinear systems. These tight value function approximations are synthesized via convex optimization, specifically sums-of-squares (SOS) programming, which can be solved efficiently.
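To make the SOS idea concrete, here is a minimal, self-contained sketch (a toy illustration, not the paper's code): a polynomial is certified nonnegative by finding a positive semidefinite Gram matrix, which is a semidefinite program solvable with cvxpy. The paper's programs impose SOS conditions of this kind on HJB-type inequalities instead of on a fixed toy polynomial.

```python
# Toy illustration of SOS programming (not the paper's code): a polynomial p(x)
# is a sum of squares iff p(x) = z(x)^T Q z(x) for some PSD "Gram" matrix Q,
# where z(x) is a vector of monomials. Finding such a Q is a semidefinite program.
import cvxpy as cp
import numpy as np

# Certify p(x) = x^4 + 4x^3 + 6x^2 + 4x + 5 is SOS with basis z(x) = [1, x, x^2].
Q = cp.Variable((3, 3), symmetric=True)

# Expand z(x)^T Q z(x) and match coefficients of p(x), power by power:
constraints = [
    Q >> 0,                          # Q must be positive semidefinite
    Q[0, 0] == 5,                    # x^0 coefficient
    2 * Q[0, 1] == 4,                # x^1
    2 * Q[0, 2] + Q[1, 1] == 6,      # x^2
    2 * Q[1, 2] == 4,                # x^3
    Q[2, 2] == 1,                    # x^4
]
problem = cp.Problem(cp.Minimize(0), constraints)
problem.solve()
print("status:", problem.status)            # "optimal" => an SOS certificate exists
print("Gram matrix Q:\n", np.round(Q.value, 3))
```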
Unlike many existing works that focus on global approximators, this approach builds local approximations over regions of interest, which improves approximation quality, particularly for underactuated robotic systems. Imposing the SOS conditions over compact sets strengthens the approximation and enlarges the regions over which the resulting controllers can stabilize the system.
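The standard device for enforcing an SOS condition only over a compact set is the S-procedure (a simple case of the Positivstellensatz); the symbols below are generic, not the paper's exact certificates. To certify p(x) \ge 0 on the set \{x : g(x) \ge 0\}, one searches for a multiplier polynomial \lambda such that:

```latex
p(x) - \lambda(x)\, g(x) \;\; \text{is SOS}, \qquad \lambda(x) \;\; \text{is SOS}
```

Wherever g(x) \ge 0, this gives p(x) \ge \lambda(x) g(x) \ge 0, so the inequality is imposed only where it matters, which is what permits tighter local fits than a single global certificate.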
Previous work in the controls literature has predominantly employed SOS-based methods for stability and safety analysis, typically via Lyapunov or barrier certificates; this research emphasizes optimality alongside stability. By working with the original robot dynamics rather than local approximations, and by incorporating a notion of optimality, the resulting SOS-based controllers can stabilize the system over larger regions of the state space. Notably, unlike prior approaches that require a locally stabilizing initial controller for non-autonomous systems, this method synthesizes value function approximations without any such requirement, which made it possible to derive stabilizing controllers across the experiments.
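In the control-affine setting common to this literature, the connection between a value approximation and a controller is explicit. Assuming dynamics \dot{x} = f_1(x) + f_2(x)u and running cost \ell(x, u) = q(x) + u^\top R u (a standard setup, not necessarily the paper's exact formulation), the HJB minimization over u has the closed-form solution:

```latex
\pi(x) = -\tfrac{1}{2} R^{-1} f_2(x)^{\top} \nabla V(x)
```

A polynomial approximation of V therefore yields a time-invariant polynomial feedback controller directly.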
Their research presents a strengthened numerical relaxation of existing programs for computing value function estimates that approximately satisfy the HJB over a compact domain. It then analyzes the local performance of these value approximations by computing inner approximations of both the closed-loop system's region of attraction and the region over which the synthesized controllers perform well.
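Such inner approximations are typically certified as sublevel sets of the value (or a Lyapunov-like) function; schematically, and again in standard rather than paper-specific notation:

```latex
\mathcal{R}_{\rho} = \{\, x : V(x) \le \rho \,\}, \qquad \nabla V(x)^{\top} f\big(x, \pi(x)\big) < 0 \quad \forall\, x \in \mathcal{R}_{\rho} \setminus \{x^{*}\}
```

If V decreases along closed-loop trajectories everywhere in the sublevel set except at the goal state x^*, every trajectory starting in \mathcal{R}_{\rho} stays inside it and converges to x^*, and the SOS machinery above is what verifies this decrease condition computationally.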
Finally, they apply this approach to continuous robotic systems, demonstrating tight under- and over-estimates of the value function and the corresponding controllers' ability to stabilize the systems across a large region of the state space. They also extend the under-approximation formulation to hybrid systems with contacts, validating the framework on the planar-pusher system. This represents the first instance of time-invariant polynomial controllers synthesized with SOS achieving full cart-pole swing-up and completing the planar-pushing task.
Check out the Paper and Code. All credit for this research goes to the researchers of this project.