In this RQ, we compare MORTAR against the original baseline policies and another representative action repair method, which is referred to as Safe Exploration (SafeExp) in this paper. Performance is evaluated from two perspectives: (1) how effectively the repair method enhances the safety of the original policy, and (2) the computational overhead the repair method introduces, which must not significantly interfere with system operation.
If an unsafe action is identified by the trained prediction model, we employ the targeted BIM to compute a repaired action. In our experiments, we set the step size and the maximum number of iterations of the BIM to 0.1 and 3, respectively. Note that the noise policies are used only to collect training data for the prediction model. To evaluate the effectiveness of MORTAR, we conduct experiments using the original DRL control policies. We also compare MORTAR with Safe Exploration (SafeExp), a representative action repair method that uses an offline-trained safety layer to rectify erroneous actions; for more details about SafeExp, we refer the reader to its original publication. To analyze the overhead introduced by MORTAR, we record the computational time when MORTAR is activated and compare it to the computational time when no action repair scheme is used.
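To make the repair step concrete, the following sketch illustrates how a targeted BIM-style repair of a flagged action could be implemented with the stated hyperparameters (step size 0.1, at most 3 iterations). It is a minimal sketch, not MORTAR's exact implementation: `predictor` is a hypothetical model mapping a (state, action) pair to the probability that the action is safe, and the action bounds are assumed to be normalized to [-1, 1].

```python
import torch
import torch.nn.functional as F

def bim_repair(state, action, predictor, step_size=0.1, max_iters=3):
    """Targeted BIM-style repair sketch: iteratively nudge an action flagged as
    unsafe towards higher predicted safety. `predictor` is a hypothetical model
    returning the probability that (state, action) is safe."""
    repaired = action.clone().detach().requires_grad_(True)
    with torch.no_grad():
        safe_target = torch.ones_like(predictor(state, repaired))  # targeted label: "safe"

    for _ in range(max_iters):
        prob_safe = predictor(state, repaired)
        loss = F.binary_cross_entropy(prob_safe, safe_target)
        loss.backward()
        with torch.no_grad():
            # Signed-gradient step towards the "safe" target, then clip to the valid range.
            repaired -= step_size * repaired.grad.sign()
            repaired.clamp_(-1.0, 1.0)  # assumed normalized action bounds
        repaired.grad.zero_()
    return repaired.detach()
```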
Table 5 presents the performance of MORTAR in enhancing the safety of DRL control policies. Under standard specifications, MORTAR achieves the highest success rates across all tasks and control policies. Even for already well-performing DRL policies, MORTAR can still increase the success rates to nearly 100%. An expected decrease in overall success rates is observed when stricter specifications are applied. However, MORTAR is still able to enhance safety, achieving improved task completion in 5 out of 6 tasks. The only exception is the Cube Stacking task with the TRPO controller under the strict specification, where SafeExp performs best. Nonetheless, the broad safety enhancement across almost all tasks demonstrates the effectiveness of MORTAR in increasing the safety of DRL controllers, even when relying solely on input and output information for the action repair.
It is worth noting that SafeExp performs poorly in repairing actions for complex CPSs and tasks. One possible reason is that SafeExp relies on a linear system model to estimate the consequences of each proposed action. Accurately modelling complex systems and tasks with a linear approximation is often challenging and requires a considerable amount of training data. This limitation hinders the performance of SafeExp in the robotic manipulation tasks considered in this study, which feature high-dimensional state spaces and highly nonlinear system dynamics.
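For context, a linear safety layer of this kind rectifies an action with a closed-form projection based on per-constraint linear models. The sketch below illustrates the general idea under our own naming, not SafeExp's exact code: each constraint is approximated as `c[i] + G[i] @ action <= C[i]`, and the proposed action is minimally shifted (in the L2 sense) to satisfy the most violated constraint.

```python
import numpy as np

def safety_layer_projection(action, c, G, C):
    """Closed-form correction used by linear safety layers (sketch).
    c: current constraint values, shape (k,)
    G: learned linear sensitivities of each constraint w.r.t. the action, shape (k, action_dim)
    C: constraint thresholds, shape (k,)"""
    # Per-constraint multipliers, clipped at zero when the constraint is not violated.
    lambdas = np.maximum(0.0, (G @ action + c - C) / np.einsum('ij,ij->i', G, G))
    i = np.argmax(lambdas)             # most violated constraint
    return action - lambdas[i] * G[i]  # minimal L2 correction of the proposed action
```

The quality of this correction hinges on how well the linear model `G` captures the true dynamics, which is precisely what becomes difficult in the high-dimensional, nonlinear manipulation tasks studied here.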
Table 6 presents the analysis of the computational overhead of MORTAR. The results show that MORTAR requires an average of 8.8 milliseconds per action repair step, corresponding to an increase of approximately 1.3 milliseconds over the original control policy without action repair. This minimal increase preserves real-time operation of the AI-CPSs, indicating that MORTAR introduces no latency issues and meets the real-time requirements.
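The per-step overhead reported in Table 6 can be obtained by timing the control loop with and without the repair stage. The snippet below is illustrative only; `policy` and `repair` are placeholder callables, not the exact instrumentation used in our experiments.

```python
import time

def timed_step(policy, repair, state):
    """Measure the wall-clock cost of one control step, separating the base
    policy inference from the action repair stage (illustrative sketch)."""
    t0 = time.perf_counter()
    action = policy(state)
    t_policy = time.perf_counter() - t0

    t0 = time.perf_counter()
    action = repair(state, action)  # safety prediction + BIM repair if flagged unsafe
    t_repair = time.perf_counter() - t0

    # Averaging t_repair over all steps yields the per-step overhead.
    return action, t_policy, t_repair
```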