We use the specified control policy to gather training data for constructing the prediction model. Note that if the control policy is already well trained, the collected data will consist predominantly of safe trajectories, with unsafe trajectories making up only a small fraction. This imbalance can hinder the development of an effective prediction model, because the model sees too few examples of the unsafe behavior it is meant to detect. To address this issue, we inject random action noise into the control policy during the data collection phase to balance the proportion of safe and unsafe trajectories. This strategy facilitates the construction of a more accurate prediction model capable of identifying potentially unsafe actions.
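The data collection step can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from our implementation: the scalar toy dynamics, the stand-in policy, the noise parameters \texttt{noise\_prob} and \texttt{noise\_scale}, the STL-style score for ``always $|x| < 1$'', and the choice to label each state-action pair with the score of the trajectory suffix that follows it.

```python
import random

def collect_dataset(policy, step, stl_score, init_state, horizon=20,
                    episodes=200, noise_prob=0.5, noise_scale=5.0, seed=0):
    """Roll out `policy` with random action noise and label each
    (state, action) pair with the score of the remaining trajectory."""
    rng = random.Random(seed)
    data = []
    for _ in range(episodes):
        state = init_state(rng)
        states, actions = [], []
        for _ in range(horizon):
            action = policy(state)
            # With probability noise_prob, perturb the action (assumed scheme).
            if rng.random() < noise_prob:
                action += rng.gauss(0.0, noise_scale)
            states.append(state)
            actions.append(action)
            state = step(state, action)
        states.append(state)  # final state reached after the last action
        for t in range(horizon):
            # Label: score of the states visited after taking actions[t].
            data.append((states[t], actions[t], stl_score(states[t + 1:])))
    return data

# Hypothetical toy task: keep |x| < 1 at all times.
def toy_policy(x):
    return -x                      # stand-in for a well-trained controller

def toy_step(x, a):
    return x + 0.2 * a             # assumed scalar dynamics

def toy_score(xs):
    return min(1.0 - abs(x) for x in xs)   # negative means violation

def toy_init(rng):
    return rng.uniform(-0.5, 0.5)

def unsafe_fraction(data):
    return sum(1 for _, _, score in data if score < 0) / len(data)

clean = collect_dataset(toy_policy, toy_step, toy_score, toy_init, noise_prob=0.0)
noisy = collect_dataset(toy_policy, toy_step, toy_score, toy_init)
print(unsafe_fraction(clean), unsafe_fraction(noisy))
```

In this toy setting the noise-free rollouts contain no unsafe samples at all, whereas the noisy rollouts contain both safe and unsafe ones. In practice the noise parameters would be tuned per task to reach the desired balance.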
To predict the safety of each proposed action at runtime, we train a prediction model $M$ on the collected dataset. Given a state-action pair, $M$ estimates the future STL score of that pair. For brevity, we refer to the output of the prediction model simply as the estimated STL score throughout the remainder of this paper. In this work, $M$ is a lightweight neural network that is trained separately for each CPS task under consideration.
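As a rough illustration of such a prediction model, the sketch below fits a one-hidden-layer network that maps a state-action vector to a scalar score by full-batch gradient descent on the mean squared error. The architecture, the hyperparameters, the synthetic training data, and the rule of flagging an action as unsafe when the predicted score is negative are all assumptions chosen for illustration; they are not the configuration used in our experiments.

```python
import numpy as np

class ScorePredictor:
    """Tiny MLP (one tanh hidden layer) predicting a scalar score."""

    def __init__(self, in_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return (H @ self.W2 + self.b2).ravel()

    def fit(self, X, y, lr=0.05, epochs=500):
        n = len(X)
        for _ in range(epochs):
            H = np.tanh(X @ self.W1 + self.b1)
            pred = (H @ self.W2 + self.b2).ravel()
            # Gradient of the mean squared error w.r.t. the predictions.
            g = (2.0 / n) * (pred - y)[:, None]
            gW2, gb2 = H.T @ g, g.sum(axis=0)
            # Backpropagate through the tanh hidden layer.
            gH = (g @ self.W2.T) * (1.0 - H ** 2)
            gW1, gb1 = X.T @ gH, gH.sum(axis=0)
            self.W1 -= lr * gW1
            self.b1 -= lr * gb1
            self.W2 -= lr * gW2
            self.b2 -= lr * gb2
        return self

def is_predicted_unsafe(model, state_action, threshold=0.0):
    """Flag a pair whose predicted score falls below `threshold` (assumed rule)."""
    return bool(model.predict(np.atleast_2d(state_action))[0] < threshold)

# Synthetic stand-in data: columns are (state, action), label is a toy score.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = 1.0 - np.abs(X[:, 0] + 0.2 * X[:, 1])

model = ScorePredictor(in_dim=2)
mse_before = float(np.mean((model.predict(X) - y) ** 2))
model.fit(X, y)
mse_after = float(np.mean((model.predict(X) - y) ** 2))
print(mse_before, mse_after)
```

A network of this size keeps the per-action inference cost small, which matters because the model is queried for every proposed action at runtime.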