Verification measures such as the RMSE and the ACC give equal weight to an event that was forecast but not observed and to an event that was observed but not forecast. In real life, however, failing to forecast a storm that occurred will normally have more dramatic consequences than forecasting a storm that did not occur. To assess forecast skill under these conditions, another type of verification must be used.
For any threshold (like frost/no frost, rain/dry or gale/no gale) the forecast is simplified to a yes/no statement (categorical forecast). The observation itself is put in one of two categories (event observed/not observed). Let H denote "hits", i.e. all correct yes-forecasts (the event is predicted to occur and it does occur), F false alarms, i.e. all incorrect yes-forecasts, M misses, i.e. all incorrect no-forecasts (the event is predicted not to occur but it does occur), and Z all correct no-forecasts. Assume altogether N forecasts of this type, with H+F+M+Z=N. A perfect forecast sample is one in which F and M are both zero. A large number of verification scores are computed from these four values.
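As a minimal sketch of how the four counts might be obtained from paired forecast and observed values, the function below applies the same threshold to both. The function name, the sample wind speeds and the 17 m/s gale threshold are illustrative assumptions, not values from this guide.

```python
def contingency_counts(forecasts, observations, threshold):
    """Return (H, F, M, Z) for the event 'value >= threshold'.

    Hypothetical helper: forecasts and observations are paired sequences.
    """
    H = F = M = Z = 0
    for fc, ob in zip(forecasts, observations):
        fc_yes = fc >= threshold   # categorical yes/no forecast
        ob_yes = ob >= threshold   # event observed / not observed
        if fc_yes and ob_yes:
            H += 1                 # hit
        elif fc_yes:
            F += 1                 # false alarm
        elif ob_yes:
            M += 1                 # miss
        else:
            Z += 1                 # correct no-forecast
    return H, F, M, Z


# Assumed sample wind speeds in m/s and an assumed gale threshold of 17 m/s
fc = [20.0, 18.5, 12.0, 9.0, 17.0, 5.0]
ob = [19.0, 14.0, 18.0, 8.0, 21.0, 6.0]
H, F, M, Z = contingency_counts(fc, ob, 17.0)
print(H, F, M, Z)  # 2 1 1 2
```

The four counts always add up to the number of forecast/observation pairs, N.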
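The frequency bias BIAS=(H+F)/(H+M) is the ratio of the yes-forecast frequency to the yes-observation frequency. A value above 1 means the event is forecast more often than it is observed, a value below 1 that it is forecast less often.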
The proportion correct PC=(H+Z)/N gives the fraction of all the forecasts that were correct. It is usually very misleading because it credits correct "yes" and "no" forecasts equally and is strongly influenced by the more common category (typically the "no" event).
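A small numerical illustration of why PC can mislead for rare events, using made-up counts: the event is observed 20 times in 1000 cases and the forecaster never forecasts it.

```python
# Hypothetical sample: 1000 forecasts, event observed 20 times, never forecast
H, F, M, Z = 0, 0, 20, 980
N = H + F + M + Z

PC = (H + Z) / N          # fraction of all forecasts that were correct
detected = H / (H + M)    # fraction of observed events that were forecast

print(f"PC = {PC:.2f}")                    # PC = 0.98
print(f"events detected = {detected:.2f}") # events detected = 0.00
```

Under these assumed counts, 98% of the forecasts are "correct" although not a single event was predicted.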
The probability of detection POD=H/(H+M), also known as Hit Rate (HR), measures the fraction of observed events that were correctly forecast.
The false alarm ratio FAR=F/(H+F) gives the fraction of forecast events that were observed to be non-events.
The probability of false detection POFD=F/(Z+F), also known as the false alarm rate, measures the fraction of non-events for which the event was incorrectly forecast. POFD is generally associated with the evaluation of probabilistic forecasts, where it is combined with POD in the Relative Operating Characteristic (ROC) diagram.
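The scores defined so far can be sketched in a few lines. The function name and the example counts are assumptions chosen for illustration, and returning NaN for a zero denominator is one possible convention, not one prescribed here.

```python
import math


def categorical_scores(H, F, M, Z):
    """Compute BIAS, PC, POD, FAR and POFD from the four counts.

    Hypothetical helper; an undefined ratio (zero denominator) gives NaN.
    """
    def ratio(num, den):
        return num / den if den else math.nan

    N = H + F + M + Z
    return {
        "BIAS": ratio(H + F, H + M),  # yes-forecasts / yes-observations
        "PC": ratio(H + Z, N),        # proportion correct
        "POD": ratio(H, H + M),       # probability of detection (hit rate)
        "FAR": ratio(F, H + F),       # false alarm ratio
        "POFD": ratio(F, Z + F),      # probability of false detection
    }


# Made-up counts: 40 observed events and 160 non-events
scores = categorical_scores(H=30, F=20, M=10, Z=140)
for name, value in scores.items():
    print(f"{name:5s} {value:.3f}")
```

With these counts the event is over-forecast (BIAS 1.25), three quarters of the events are detected (POD 0.75), and FAR (0.4) differs clearly from POFD (0.125) because the two use different denominators.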
A very simple measure of success of categorical forecasts is the difference POD-POFD, which is known as the Hanssen-Kuipers or True Skill Score. Among other properties, it can be easily generalised for the verification of probabilistic forecasts (see 7.4 below).
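The difference POD-POFD can be sketched as follows. The function name and the counts are illustrative assumptions, and the sketch assumes that both events and non-events occur in the sample, so that neither denominator is zero.

```python
def true_skill_score(H, F, M, Z):
    """POD - POFD; assumes H+M > 0 and Z+F > 0 (hypothetical helper)."""
    pod = H / (H + M)    # fraction of observed events that were forecast
    pofd = F / (Z + F)   # fraction of non-events that were forecast as events
    return pod - pofd


# Made-up sample with 40 observed events and 160 non-events
print(true_skill_score(30, 20, 10, 140))  # 0.75 - 0.125 = 0.625
print(true_skill_score(0, 0, 40, 160))    # never forecasting the event: 0.0
print(true_skill_score(40, 160, 0, 0))    # always forecasting the event: 0.0
```

In this sketch the two trivial strategies, always or never forecasting the event, both score zero, while a perfect sample (F and M zero) scores 1.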