Reinforcement learning approach based on Proximal Policy Optimization algorithm for efficient last mile delivery using Smart Lockers

Authors

  • Mohamed-Ali Ejjanfi
  • Jamal Benhra

Keywords:

Last-Mile Delivery, Reinforcement Learning (RL), Proximal Policy Optimization (PPO), Metaheuristics, Ant Colony Optimization (ACO), Smart locker, Logistics Optimization

Abstract

The Vehicle Routing Problem (VRP) is a core logistics challenge in which routing decisions drive both cost and service quality. Reinforcement Learning (RL), and Proximal Policy Optimization (PPO) in particular, has emerged as a competitive alternative to traditional metaheuristics because it learns adaptive routing policies that can transfer across problem instances. We compare PPO with the well-established Ant Colony Optimization (ACO) on city-based last-mile delivery benchmarks. Extensive experiments on both synthetic and real datasets show that PPO achieves comparable or better routing efficiency and, once trained, produces solutions orders of magnitude faster at inference time, whereas ACO remains competitive on static VRPs but degrades in dynamic, large-scale settings. Our analysis highlights the trade-offs between learning-based and search-based methods in terms of scalability, computation time, and adaptability, and offers guidance for future work on intelligent logistics optimization.
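The abstract does not state the training objective. For readers unfamiliar with PPO, the following is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. It is not taken from the authors' implementation. The function name `ppo_clip_objective`, the default clip range `eps=0.2`, and the reading of each action as "choose the next customer or smart locker to visit" are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximised).

    Hypothetical helper for illustration. Each triple (new_logp[t], old_logp[t],
    advantages[t]) describes one routing decision, e.g. picking the next stop.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: the new policy doubles the probability of one action (0.25 -> 0.5).
gain = ppo_clip_objective([math.log(0.5)], [math.log(0.25)], [1.0])
loss = ppo_clip_objective([math.log(0.5)], [math.log(0.25)], [-1.0])
print(round(gain, 6), round(loss, 6))
```

With a positive advantage, the ratio of 2 is capped at 1.2, which limits how far one update can push the policy toward an action. With a negative advantage, the unclipped term (-2.0) is kept because it is the more pessimistic of the two. This conservative step size is what lets PPO reuse collected routing episodes for several gradient updates without destabilising the policy.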

Published

2026-01-01

How to Cite

Reinforcement learning approach based on Proximal Policy Optimization algorithm for efficient last mile delivery using Smart Lockers. (2026). Inspire Smart Systems, 1(1). https://inspirequill.org/index.php/inspireSmartSystems/article/view/9-25
