DEEP REINFORCEMENT LEARNING IN UAV FLIGHT CONTROL AND NAVIGATION: A SYSTEMATIC REVIEW OF ALGORITHMS, BENCHMARKS, AND SAFETY
Keywords:
Deep Reinforcement Learning, Unmanned Aerial Vehicles, Flight Control, Autonomous Navigation, Sim-to-Real Transfer, Safe Reinforcement Learning, Benchmarking.
Abstract
Background
Unmanned Aerial Vehicles (UAVs) are increasingly required to operate autonomously in complex, dynamic, and uncertain environments. In such settings, traditional model-based control and planning methods often face scalability and flexibility limitations. In recent years, Deep Reinforcement Learning (DRL) has emerged as a promising data-driven paradigm for UAV flight control and navigation, learning control policies end-to-end through direct interaction with the environment. Nevertheless, benchmarking, sim-to-real transfer, and safety remain key obstacles to real-world deployment.
Objective
This paper reviews and synthesizes the literature on DRL-based UAV flight control and navigation, covering algorithmic performance, benchmarking environments, robustness, sim-to-real transfer, and safety mechanisms. The aim is to provide a systematic account of existing achievements, limitations, and knowledge gaps to inform the future development of autonomous UAVs.
Methods
The systematic review was conducted in accordance with the PRISMA 2020 guidelines. IEEE Xplore, Scopus, Web of Science, the ACM Digital Library, and arXiv were searched for peer-reviewed articles published since 2016. After screening and eligibility assessment, 187 articles were included in the qualitative synthesis. Data were extracted on DRL algorithms, UAV tasks, simulation platforms, performance metrics, safety mechanisms, and real-world validation. Study quality was assessed using adapted criteria covering experimental rigor, reproducibility, and safety reporting.
Results
The review shows that actor-critic algorithms, especially Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), are more than twice as effective as value-based approaches in continuous UAV control and navigation problems. In simulation, DRL-based methods achieved 20-45 percent improvements in control accuracy, navigation success rates of 78-96 percent, and 30-60 percent reductions in collision frequency. However, sim-to-real transfer caused a 35 percent performance decrease when training was not robustness-oriented. Safety-aware DRL frameworks (such as constrained learning and shielding) reduced safety violations but traded off learning efficiency.
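For readers unfamiliar with these algorithms, the following is a minimal, illustrative sketch (not drawn from any reviewed study) of how PPO and SAC are typically trained with the widely used Stable-Baselines3 library; the environment id "UAVNav-v0" is a hypothetical placeholder for a continuous-control UAV navigation task, not a registered benchmark.

    # Illustrative only: both algorithms share the actor-critic structure
    # highlighted by the review. "UAVNav-v0" is a hypothetical placeholder.
    import gymnasium as gym
    from stable_baselines3 import PPO, SAC

    env = gym.make("UAVNav-v0")  # hypothetical continuous-control UAV task

    # On-policy actor-critic: PPO optimizes a clipped surrogate objective.
    ppo = PPO("MlpPolicy", env, verbose=0)
    ppo.learn(total_timesteps=1_000_000)

    # Off-policy actor-critic: SAC maximizes reward plus policy entropy,
    # which tends to improve sample efficiency and exploration.
    sac = SAC("MlpPolicy", env, verbose=0)
    sac.learn(total_timesteps=1_000_000)

In practice, the choice between the two often comes down to sample efficiency (favoring off-policy SAC) versus training stability and simplicity (favoring on-policy PPO).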
Conclusion
The findings show that DRL holds strong promise for improving UAV autonomy, especially in complex control and navigation problems. However, sim-to-real generalization, safety assurance, and benchmark standardization remain unresolved. Future work should prioritize robustness-oriented training, safety-constrained learning, and real-world experimental validation to enable sound and practical deployment of DRL-based UAV systems.
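As background for the safety-constrained training recommended above, safe RL is most commonly formalized as a constrained Markov decision process (CMDP); this is the standard formulation in the safe-RL literature rather than one specific to the reviewed studies. The policy $\pi$ maximizes expected discounted return subject to a bound $d$ on expected discounted cost $c$ (e.g., collisions or geofence violations):

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \quad \text{subject to} \quad \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d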