Probabilistic metrics
Tip
All the metrics listed below are accessible via evalp, the probabilistic entry point of evalhyd.
For example, the Brier score can be computed as follows:
#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include <evalhyd/evalp.hpp>

int main()
{
    // streamflow observations
    xt::xtensor<double, 2> obs = {{4.7, 4.3, 5.5, 2.7, 4.1}};
    // ensemble streamflow predictions (three members)
    xt::xtensor<double, 4> prd = {{{{5.3, 4.2, 5.7, 2.3, 3.1},
                                     {4.3, 4.2, 4.7, 4.3, 3.3},
                                     {5.3, 5.2, 5.7, 2.3, 3.9}}}};
    // streamflow thresholds
    xt::xtensor<double, 2> thr = {{4., 5.}};
    std::cout << evalhyd::evalp(obs, prd, {"BS"}, thr, "high")[0] << std::endl;
    // {{{{{ 0.222222, 0.133333}}}}}
}
>>> import numpy
>>> obs = numpy.array(
...     [[4.7, 4.3, 5.5, 2.7, 4.1]]
... )
>>> prd = numpy.array(
...     [[[[5.3, 4.2, 5.7, 2.3, 3.1],
...        [4.3, 4.2, 4.7, 4.3, 3.3],
...        [5.3, 5.2, 5.7, 2.3, 3.9]]]]
... )
>>> thr = numpy.array([[4., 5.]])
>>> import evalhyd
>>> evalhyd.evalp(obs, prd, ["BS"], thr, events="high")
[array([[[[[0.22222222, 0.13333333]]]]])]
> obs <- rbind(
+ c(4.7, 4.3, 5.5, 2.7, 4.1)
+ )
> prd <- array(
+ rbind(
+ c(5.3, 4.2, 5.7, 2.3, 3.1),
+ c(4.3, 4.2, 4.7, 4.3, 3.3),
+ c(5.3, 5.2, 5.7, 2.3, 3.9)
+ ),
+ dim = c(1, 1, 3, 5)
+ )
> thr <- rbind(
+ c(4., 5.)
+ )
> library(evalhyd)
> evalhyd::evalp(obs, prd, c("BS"), thr, events="high")
[[1]]
, , 1, 1, 1
[,1]
[1,] 0.2222222
, , 1, 1, 2
[,1]
[1,] 0.1333333
$ ./evalhyd evalp "./obs/" "./prd/" "BS" --q_thr "./thr/" --events "high"
{{{{{ 0.222222, 0.133333}}}}}
BS
Brier Score ("BS"), originally derived by Brier (1950) but computed as per Wilks (2011):
\[BS = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2\]
where, for a dichotomous event, \(y_k\) is the event forecast probability, \(o_k\) is the observed event outcome, and \(n\) is the number of time steps.
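A minimal plain-Python sketch of this formula for one site and one threshold follows; it is an illustration, not evalhyd's implementation. It assumes that \(y_k\) is the fraction of ensemble members for which the event occurs and that a high-flow event includes the threshold value (footnote 1). The function names are hypothetical.

```python
def event_occurred(value, threshold, events="high"):
    """Return 1 if the event occurred, 0 otherwise (threshold included)."""
    if events == "high":
        return 1 if value >= threshold else 0
    return 1 if value <= threshold else 0


def brier_score(obs, members, threshold, events="high"):
    """Brier score for one site and one threshold.

    obs: list of n observations; members: list of m ensemble traces,
    each a list of n predictions aligned in time with obs.
    """
    n, m = len(obs), len(members)
    total = 0.0
    for k in range(n):
        # forecast probability: fraction of members predicting the event
        y_k = sum(event_occurred(mem[k], threshold, events) for mem in members) / m
        o_k = event_occurred(obs[k], threshold, events)
        total += (y_k - o_k) ** 2
    return total / n


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]
print(round(brier_score(obs, prd, 4.0), 6), round(brier_score(obs, prd, 5.0), 6))
```

With the data of the example at the top of this page, the sketch prints 0.222222 0.133333, the same values as those returned by evalp.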

BSS
Brier Skill Score ("BSS"), computed as per Wilks (2011):
\[BSS = 1 - \frac{BS}{BS_{reference}}\]
where \(BS_{reference} = \frac{1}{n} \sum_{k=1}^{n} (o_k - \bar{o})^2\) [2], \(o_k\) is the observed event outcome, \(n\) is the number of time steps, and \(\bar{o}\) is the mean observed event occurrence for the study period.
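The skill score can be sketched from the forecast probabilities \(y_k\) and the binary outcomes \(o_k\) as below. This is a plain-Python illustration with a hypothetical function name, not evalhyd's code; the zero-reference branch simply mirrors footnote 2.

```python
import math


def brier_skill_score(y, o):
    """BSS = 1 - BS / BS_reference, with a climatological reference."""
    n = len(o)
    bs = sum((y_k - o_k) ** 2 for y_k, o_k in zip(y, o)) / n
    o_bar = sum(o) / n  # mean observed event occurrence
    bs_ref = sum((o_k - o_bar) ** 2 for o_k in o) / n
    if bs_ref == 0:
        return math.inf  # footnote 2: infinite when the reference is zero
    return 1 - bs / bs_ref


# probabilities and outcomes for the 4.0 threshold of the example above
y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
print(round(brier_skill_score(y, o), 6))  # -0.388889
```

A negative value means the ensemble does worse than always forecasting the observed event frequency \(\bar{o}\).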

BS_CRD
Calibration-Refinement Decomposition of the Brier Score ("BS_CRD") into three components: reliability, resolution, and uncertainty [returned in this order].
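The calibration-refinement partition can be sketched by grouping time steps by forecast probability, so that \(BS = \text{reliability} - \text{resolution} + \text{uncertainty}\) holds exactly. Grouping by each distinct probability value is an assumption made for this plain-Python illustration; evalhyd's own binning may differ, and the function name is hypothetical.

```python
from collections import defaultdict


def bs_crd(y, o):
    """Return (reliability, resolution, uncertainty) of the Brier score."""
    n = len(o)
    o_bar = sum(o) / n
    groups = defaultdict(list)  # forecast probability -> observed outcomes
    for y_k, o_k in zip(y, o):
        groups[y_k].append(o_k)
    rel = res = 0.0
    for y_i, outcomes in groups.items():
        n_i = len(outcomes)
        o_i = sum(outcomes) / n_i  # observed frequency given y_i
        rel += n_i * (y_i - o_i) ** 2
        res += n_i * (o_i - o_bar) ** 2
    unc = o_bar * (1 - o_bar)
    return rel / n, res / n, unc


y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
rel, res, unc = bs_crd(y, o)
print(round(rel - res + unc, 6))  # 0.222222, i.e. the Brier score
```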

BS_LBD
Likelihood-Base rate Decomposition of the Brier Score ("BS_LBD") into three components: type 2 bias, discrimination, and sharpness (a.k.a. refinement) [returned in this order].
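The likelihood-base rate partition conditions the forecasts on the observed outcome instead, giving \(BS = \text{type 2 bias} - \text{discrimination} + \text{sharpness}\). The sketch below uses the standard form of these three terms, with sharpness taken as the variance of the forecast probabilities; whether evalhyd normalises them identically is an assumption, and the function name is hypothetical.

```python
def bs_lbd(y, o):
    """Return (type 2 bias, discrimination, sharpness) of the Brier score."""
    n = len(o)
    y_bar = sum(y) / n
    bias = disc = 0.0
    for outcome in (0, 1):
        y_j = [y_k for y_k, o_k in zip(y, o) if o_k == outcome]
        if not y_j:
            continue  # outcome never observed: no contribution
        mu_j = sum(y_j) / len(y_j)  # mean forecast given the outcome
        bias += len(y_j) * (mu_j - outcome) ** 2
        disc += len(y_j) * (mu_j - y_bar) ** 2
    sharp = sum((y_k - y_bar) ** 2 for y_k in y) / n  # forecast variance
    return bias / n, disc / n, sharp


y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
bias, disc, sharp = bs_lbd(y, o)
print(round(bias - disc + sharp, 6))  # 0.222222, i.e. the Brier score
```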

REL_DIAG
X and Y axes of the reliability diagram ("REL_DIAG") and ordinates of its associated sampling histogram: forecast probabilities (X), observed frequencies (Y), and number of forecasts for each forecast probability [returned in this order].
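The three returned series can be sketched by tallying, for each forecast probability, how often the event was observed. This plain-Python illustration keeps one point per distinct probability value actually forecast; evalhyd may instead report every possible level, including empty ones. The function name is hypothetical.

```python
def reliability_diagram(y, o):
    """Return (forecast probabilities, observed frequencies, counts)."""
    counts, hits = {}, {}
    for y_k, o_k in zip(y, o):
        counts[y_k] = counts.get(y_k, 0) + 1
        hits[y_k] = hits.get(y_k, 0) + o_k
    xs = sorted(counts)
    ys = [hits[x] / counts[x] for x in xs]
    return xs, ys, [counts[x] for x in xs]


y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
print(reliability_diagram(y, o))
```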

CRPS_FROM_BS
Continuous Ranked Probability Score computed from 101 Brier Scores ("CRPS_FROM_BS"), i.e. using the observed minimum, the 99 observed percentiles, and the observed maximum as streamflow thresholds.
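Since the CRPS is the integral of the Brier score over all thresholds, it can be approximated from the 101 threshold values described above. In this sketch, the percentile method (`statistics.quantiles` with `method="inclusive"`) and the trapezoidal integration are assumptions; evalhyd's exact choices may differ, and the function names are hypothetical.

```python
import statistics


def brier_score_at(obs, members, z):
    """BS for the non-exceedance event {value <= z} (threshold included)."""
    n, m = len(obs), len(members)
    total = 0.0
    for k in range(n):
        y_k = sum(1 for mem in members if mem[k] <= z) / m
        o_k = 1.0 if obs[k] <= z else 0.0
        total += (y_k - o_k) ** 2
    return total / n


def crps_from_bs(obs, members):
    # 101 thresholds: observed min, 99 observed percentiles, observed max
    thresholds = [min(obs)]
    thresholds += statistics.quantiles(obs, n=100, method="inclusive")
    thresholds.append(max(obs))
    scores = [brier_score_at(obs, members, z) for z in thresholds]
    # trapezoidal integration of BS(z) over the threshold range
    return sum(
        (z1 - z0) * (s0 + s1) / 2
        for z0, z1, s0, s1 in zip(thresholds, thresholds[1:], scores, scores[1:])
    )


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]
print(crps_from_bs(obs, prd))
```

Because the integral only spans the observed range, this approximation ignores forecast errors outside \([\min(obs), \max(obs)]\).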

CRPS_FROM_ECDF
Continuous Ranked Probability Score computed from the Empirical Cumulative Distribution Function ("CRPS_FROM_ECDF") constructed from the ensemble member predictions.
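For an ensemble ECDF, the integral of the squared difference between the ECDF and the observation's step function has a closed form, \(\frac{1}{m}\sum_i |x_i - x| - \frac{1}{2m^2}\sum_i\sum_j |x_i - x_j|\). The plain-Python sketch below uses that form and averages over time steps; evalhyd may compute the same quantity differently, and the function name is hypothetical.

```python
def crps_from_ecdf(obs, members):
    """Mean CRPS of the ensemble ECDF over the n time steps."""
    n, m = len(obs), len(members)
    total = 0.0
    for k in range(n):
        x = [mem[k] for mem in members]
        error = sum(abs(a - obs[k]) for a in x) / m  # E|X - x|
        spread = sum(abs(a - b) for a in x for b in x) / (2 * m * m)  # E|X - X'| / 2
        total += error - spread
    return total / n


# one time step, two members at -1 and 1, observation at 0
print(crps_from_ecdf([0.0], [[-1.0], [1.0]]))  # 0.5
```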

QS
Quantile Scores ("QS"), where the ensemble member predictions are treated as quantiles.
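Treating sorted members as quantiles, each quantile score can be sketched as a pinball loss averaged over time. The quantile level \(\alpha_i = i/(m+1)\) assigned to the \(i\)-th smallest of \(m\) members, and the absence of any scaling factor, are assumptions of this plain-Python illustration; the function name is hypothetical.

```python
def quantile_scores(obs, members):
    """One quantile score per sorted ensemble member (pinball loss)."""
    n, m = len(obs), len(members)
    # members sorted at each time step: sorted_prd[k][i] is the i-th quantile
    sorted_prd = [sorted(mem[k] for mem in members) for k in range(n)]
    scores = []
    for i in range(m):
        alpha = (i + 1) / (m + 1)  # assumed quantile level
        total = 0.0
        for k in range(n):
            u = obs[k] - sorted_prd[k][i]
            # pinball loss: alpha * u if u >= 0, (alpha - 1) * u otherwise
            total += u * (alpha - (1.0 if u < 0 else 0.0))
        scores.append(total / n)
    return scores


print(quantile_scores([0.0], [[-1.0], [1.0]]))
```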

CRPS_FROM_QS
Continuous Ranked Probability Score computed from the Quantile Scores ("CRPS_FROM_QS").
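The link rests on the identity \(CRPS = 2\int_0^1 QS_\alpha \, d\alpha\) when \(QS_\alpha\) is the pinball loss. A minimal sketch, assuming the integral is approximated by the mean of the quantile scores over evenly spaced levels (evalhyd's quadrature may differ); the function name is hypothetical.

```python
def crps_from_qs(qs):
    """Approximate CRPS as twice the mean of pinball-loss quantile scores."""
    return 2.0 * sum(qs) / len(qs)


# quantile scores for two levels, e.g. alpha = 1/3 and 2/3
print(crps_from_qs([1 / 3, 1 / 3]))
```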

POD
Probability Of Detection ("POD"), also known as “hit rate”, derived from the contingency table.

POFD
Probability Of False Detection ("POFD"), also known as “false alarm rate”, derived from the contingency table.
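For probabilistic forecasts, a contingency table requires turning probabilities into yes/no forecasts. The sketch below assumes the event is forecast when its probability reaches a chosen probability level \(p\), then derives POD and POFD; the functions are hypothetical, and each ratio is undefined when its denominator is zero.

```python
def contingency_table(y, o, p):
    """Counts (hits, false alarms, misses, correct negatives) at level p."""
    a = b = c = d = 0
    for y_k, o_k in zip(y, o):
        forecast = y_k >= p  # assumed rule: forecast 'yes' when y_k >= p
        if forecast and o_k:
            a += 1  # hit
        elif forecast:
            b += 1  # false alarm
        elif o_k:
            c += 1  # miss
        else:
            d += 1  # correct negative
    return a, b, c, d


def pod(y, o, p):
    a, b, c, d = contingency_table(y, o, p)
    return a / (a + c)  # hits over observed events


def pofd(y, o, p):
    a, b, c, d = contingency_table(y, o, p)
    return b / (b + d)  # false alarms over observed non-events


y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
print(pod(y, o, 0.5), pofd(y, o, 0.5))  # 0.75 0.0
```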

FAR
False Alarm Ratio ("FAR"), derived from the contingency table.

CSI
Critical Success Index ("CSI"), derived from the contingency table.
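Given the contingency table counts of hits, false alarms, and misses, FAR and CSI reduce to simple ratios, as in this plain-Python sketch with hypothetical function names. Unlike POFD, FAR divides by the number of "yes" forecasts rather than by the number of observed non-events.

```python
def false_alarm_ratio(hits, false_alarms):
    """FAR: fraction of 'yes' forecasts for which the event did not occur."""
    return false_alarms / (hits + false_alarms)


def critical_success_index(hits, false_alarms, misses):
    """CSI: hits over all cases where the event was forecast or observed."""
    return hits / (hits + false_alarms + misses)


# counts for the 4.0 threshold of the example above when the event is
# forecast as soon as one member in three predicts it:
# 3 hits, 1 false alarm, 1 miss, 0 correct negatives
print(false_alarm_ratio(3, 1), critical_success_index(3, 1, 1))  # 0.25 0.6
```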

ROCSS
Relative Operating Characteristic Skill Score ("ROCSS"), derived from the contingency table and based on the area under the ROC curve.
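A sketch of the skill score: build (POFD, POD) points from one contingency table per probability level, integrate the curve with the trapezoidal rule, and rescale the area \(A\) as \(2A - 1\), so that 0 means no skill and 1 a perfect ensemble. Using the levels \(k/m\) of an \(m\)-member ensemble and the rule "forecast yes when \(y_k \ge p\)" are assumptions of this plain-Python illustration; the function name is hypothetical.

```python
def rocss(y, o, m):
    """ROC skill score 2 * AUC - 1 from forecast probabilities y."""
    events = sum(o)
    non_events = len(o) - events
    points = [(0.0, 0.0), (1.0, 1.0)]  # ROC curve end points
    for k in range(m + 1):
        p = k / m  # assumed probability levels for an m-member ensemble
        hits = sum(1 for y_k, o_k in zip(y, o) if y_k >= p and o_k)
        false_alarms = sum(1 for y_k, o_k in zip(y, o) if y_k >= p and not o_k)
        points.append((false_alarms / non_events, hits / events))  # (POFD, POD)
    points.sort()
    # trapezoidal area under the ROC curve
    auc = sum(
        (f1 - f0) * (h0 + h1) / 2
        for (f0, h0), (f1, h1) in zip(points, points[1:])
    )
    return 2 * auc - 1


y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
o = [1, 1, 1, 0, 1]
print(rocss(y, o, 3))
```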

RANK_HIST
Frequencies of the Rank Histogram ("RANK_HIST"), also known as the Talagrand diagram.
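The histogram counts where each observation falls among the sorted members. In this plain-Python sketch the rank is the number of members strictly below the observation (0 to \(m\)), and ties are not randomised, which is an assumption; the function name is hypothetical.

```python
def rank_histogram(obs, members):
    """Relative frequencies of the observation rank among the m members."""
    m = len(members)
    counts = [0] * (m + 1)
    for k, x in enumerate(obs):
        rank = sum(1 for mem in members if mem[k] < x)  # members below x
        counts[rank] += 1
    return [c / len(obs) for c in counts]


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]
print(rank_histogram(obs, prd))  # [0.0, 0.4, 0.4, 0.2]
```

A reliable ensemble yields a roughly flat histogram; a U shape suggests under-dispersion.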

DS
Delta score ("DS") as per Candille and Talagrand (2005).
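The delta score measures how far the rank histogram departs from flatness: \(\Delta = \sum_{i=1}^{m+1} (s_i - \frac{n}{m+1})^2\) is normalised by \(\Delta_0 = \frac{n\,m}{m+1}\), the expected value of \(\Delta\) for a reliable ensemble, so that values near 1 indicate reliability. A plain-Python sketch working from rank-histogram counts \(s_i\), with a hypothetical function name:

```python
def delta_score(counts):
    """Delta score from rank-histogram counts (m + 1 bins, n observations)."""
    n = sum(counts)
    m = len(counts) - 1
    expected = n / (m + 1)  # count per bin for a flat histogram
    delta = sum((s - expected) ** 2 for s in counts)
    return delta / (n * m / (m + 1))


print(round(delta_score([0, 2, 2, 1]), 6))  # 0.733333
```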

AS
Alpha score ("AS") as per Renard et al. (2010).
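One common way to compute the alpha score compares the sorted PIT values \(z_{(i)}\) of the observations with the uniform quantiles \(i/(n+1)\): \(\alpha = 1 - \frac{2}{n}\sum_{i=1}^{n} |z_{(i)} - \frac{i}{n+1}|\), where 1 indicates perfect reliability. This plain-Python sketch assumes that formulation and takes the PIT value as the fraction of members at or below the observation; evalhyd's details may differ, and the function name is hypothetical.

```python
def alpha_score(obs, members):
    """Alpha score from the sorted PIT values of the observations."""
    n, m = len(obs), len(members)
    # assumed PIT: fraction of members less than or equal to the observation
    pit = sorted(sum(1 for mem in members if mem[k] <= x) / m
                 for k, x in enumerate(obs))
    # mean distance between the PIT QQ-plot and the 1:1 line
    dist = sum(abs(z - (i + 1) / (n + 1)) for i, z in enumerate(pit))
    return 1 - 2 * dist / n


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]
print(round(alpha_score(obs, prd), 6))  # 0.8
```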

CR
Coverage ratio ("CR"), i.e. the proportion of observations falling within the predictive intervals. It is a measure of the reliability of the predictions.
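Given lower and upper interval bounds at each time step, the coverage ratio is a simple proportion. How the bounds are derived from the ensemble members is not shown here, and treating the bounds as included in the interval is an assumption; the function name is hypothetical.

```python
def coverage_ratio(obs, lower, upper):
    """Proportion of observations within [lower_k, upper_k], bounds included."""
    inside = sum(1 for x, l, u in zip(obs, lower, upper) if l <= x <= u)
    return inside / len(obs)


print(coverage_ratio([1.0, 2.0, 3.0, 4.0],
                     [0.0, 2.5, 3.0, 5.0],
                     [2.0, 3.0, 3.0, 6.0]))  # 0.5
```

For a reliable ensemble, the coverage ratio of a central interval should be close to its nominal confidence level.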

AW
Average width ("AW") of the predictive interval(s). It is a measure of the sharpness of the predictions.
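A minimal sketch of the average width over time steps, given interval bounds as input (the function name is hypothetical):

```python
def average_width(lower, upper):
    """Mean width of the predictive intervals over time steps."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


print(average_width([0.0, 1.0], [2.0, 4.0]))  # 2.5
```

Narrower intervals mean sharper predictions, but sharpness is only valuable together with adequate coverage.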

AWN
Average width of the predictive interval(s) normalised by the mean observation [2] ("AWN"), computed as per Bourgin et al. (2015).
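A minimal sketch, assuming the normalisation is a plain division of the average width by the mean observation over the period, as the sentence above describes. The zero-mean branch mirrors footnote 2, and the function name is hypothetical.

```python
import math


def average_width_normalised(obs, lower, upper):
    """Average interval width divided by the mean observation."""
    width = sum(u - l for l, u in zip(lower, upper)) / len(lower)
    mean_obs = sum(obs) / len(obs)
    if mean_obs == 0:
        return math.inf  # footnote 2: infinite when the normalisation is zero
    return width / mean_obs


print(average_width_normalised([2.0, 3.0], [0.0, 1.0], [2.0, 4.0]))  # 1.0
```

The normalisation makes widths comparable across catchments with different flow magnitudes.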

WS
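Winkler score ("WS"), also known as interval score, computed as per Gneiting and Raftery (2007):
\[WS = \frac{1}{n} \sum_{k=1}^{n} \left[ (u_k - l_k) + \frac{2}{\alpha} (l_k - x_k) \mathbb{1}\{x_k < l_k\} + \frac{2}{\alpha} (x_k - u_k) \mathbb{1}\{x_k > u_k\} \right]\]
where, for a given confidence level, \(\alpha\) is the proportion not included in the central predictive interval, \(u_k\) and \(l_k\) are the upper and lower bounds of the predictive interval, respectively, \(x_k\) is the observation at time step \(k\), and \(n\) is the number of time steps.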
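The score rewards narrow intervals and adds a penalty, scaled by \(2/\alpha\), whenever an observation falls outside the interval. A direct plain-Python transcription of the formula, with a hypothetical function name and bounds given as input:

```python
def winkler_score(obs, lower, upper, alpha):
    """Mean interval (Winkler) score for a central (1 - alpha) interval."""
    total = 0.0
    for x, l, u in zip(obs, lower, upper):
        score = u - l  # interval width
        if x < l:
            score += 2 / alpha * (l - x)  # penalty below the interval
        elif x > u:
            score += 2 / alpha * (x - u)  # penalty above the interval
        total += score
    return total / len(obs)


print(winkler_score([1.0, -1.0, 3.0], [0.0] * 3, [2.0] * 3, alpha=0.5))
```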

ES
Energy score ("ES"), a multivariate (i.e. multisite) generalisation of the continuous ranked probability score.
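The energy score replaces the absolute differences of the CRPS kernel form with Euclidean distances between vectors of site values: \(ES = \frac{1}{m}\sum_i \lVert x_i - y \rVert - \frac{1}{2m^2}\sum_i\sum_j \lVert x_i - x_j \rVert\). A plain-Python sketch averaged over time steps, with a hypothetical function name and data layout (`obs[k]` and `members[i][k]` are tuples of site values):

```python
import math


def energy_score(obs, members):
    """Mean energy score over time steps (Euclidean norm across sites)."""
    n, m = len(obs), len(members)
    total = 0.0
    for k in range(n):
        x = [mem[k] for mem in members]
        error = sum(math.dist(a, obs[k]) for a in x) / m
        spread = sum(math.dist(a, b) for a in x for b in x) / (2 * m * m)
        total += error - spread
    return total / n


# one time step, two sites, two members
print(energy_score([(0.0, 0.0)], [[(3.0, 4.0)], [(-3.0, -4.0)]]))  # 2.5
```

With a single site, the energy score reduces to the CRPS of the ensemble.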

Footnotes
[1]
The threshold value is included in the definition of the events both for low flow and high flow events, i.e. where a streamflow observation/prediction value is equal to the threshold value, the event is considered to have occurred.
[2]
The metric value returned is \(\infty\) when the reference/climatology/normalisation value is zero.