Probabilistic metrics

Tip

All the metrics listed below are accessible via evalp, the probabilistic entry point of evalhyd.

For example, the Brier score can be computed as follows:

C++

#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include <evalhyd/evalp.hpp>

xt::xtensor<double, 2> obs = {{4.7, 4.3, 5.5, 2.7, 4.1}};
xt::xtensor<double, 4> prd = {{{{5.3, 4.2, 5.7, 2.3, 3.1},
                                {4.3, 4.2, 4.7, 4.3, 3.3},
                                {5.3, 5.2, 5.7, 2.3, 3.9}}}};
xt::xtensor<double, 2> thr = {{4., 5.}};

std::cout << evalhyd::evalp(obs, prd, {"BS"}, thr, "high")[0] << std::endl;
// {{{{{ 0.222222,  0.133333}}}}}

Python

>>> import numpy
>>> obs = numpy.array(
...     [[4.7, 4.3, 5.5, 2.7, 4.1]]
... )
>>> prd = numpy.array(
...     [[[[5.3, 4.2, 5.7, 2.3, 3.1],
...        [4.3, 4.2, 4.7, 4.3, 3.3],
...        [5.3, 5.2, 5.7, 2.3, 3.9]]]]
... )
>>> thr = numpy.array([[4., 5.]])
>>> import evalhyd
>>> evalhyd.evalp(obs, prd, ["BS"], thr, events="high")
[array([[[[[0.22222222, 0.13333333]]]]])]

R

> obs <- rbind(
+     c(4.7, 4.3, 5.5, 2.7, 4.1)
+ )
> prd <- array(
+     rbind(
+         c(5.3, 4.2, 5.7, 2.3, 3.1),
+         c(4.3, 4.2, 4.7, 4.3, 3.3),
+         c(5.3, 5.2, 5.7, 2.3, 3.9)
+     ),
+     dim = c(1, 1, 3, 5)
+ )
> thr <- rbind(
+     c(4., 5.)
+ )
> library(evalhyd)
> evalhyd::evalp(obs, prd, c("BS"), thr, events="high")
[[1]]
, , 1, 1, 1

          [,1]
[1,] 0.2222222

, , 1, 1, 2

          [,1]
[1,] 0.1333333

CLI

$ ./evalhyd evalp "./obs/" "./prd/" "BS" --q_thr "./thr/" --events "high"
{{{{{ 0.222222,  0.133333}}}}}

BS

Brier Score ("BS") originally derived by Brier (1950), but computed as per Wilks (2011):

\[BS = \frac{1}{n} \sum_{k=1}^{n} (o_k - y_k)^2\]

where, for a dichotomous event, \(y_k\) is the event forecast probability, \(o_k\) is the observed event outcome, and \(n\) is the number of time steps.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds)`
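
As a rough illustration of the formula above, the following pure-Python sketch computes the Brier score for a single site and lead time. It assumes that the forecast probability \(y_k\) is the fraction of ensemble members reaching the threshold, which is consistent with the example output at the top of this page. `brier_score` is a hypothetical helper, not part of evalhyd.

```python
def brier_score(obs, members, threshold):
    """Brier score for "high" events (value >= threshold).

    obs: observations, one per time step.
    members: ensemble members, each a list with one value per time step.
    Assumption: the forecast probability is the fraction of members
    at or above the threshold.
    """
    n = len(obs)
    total = 0.0
    for k in range(n):
        o_k = 1.0 if obs[k] >= threshold else 0.0
        y_k = sum(m[k] >= threshold for m in members) / len(members)
        total += (o_k - y_k) ** 2
    return total / n


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]

print(brier_score(obs, prd, 4.0))  # approximately 0.2222
print(brier_score(obs, prd, 5.0))  # approximately 0.1333
```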

BSS

Brier Skill Score ("BSS"), computed as per Wilks (2011):

\[BSS = 1 - \frac{BS}{BS_{reference}}\]

where \(BS_{reference} = \frac{1}{n} \sum_{k=1}^{n} (o_k - \bar{o})^2\) [2], \(o_k\) is the observed event outcome, \(n\) is the number of time steps, and \(\bar{o}\) is the mean observed event occurrence for the study period.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds)`
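
The skill score can be sketched in a few lines of pure Python, taking the observed event outcomes and forecast probabilities as already computed. `brier_skill_score` is a hypothetical helper; the handling of a zero reference score simply mirrors the footnote on this page and is not taken from the evalhyd source.

```python
import math


def brier_skill_score(o, y):
    """BSS from event outcomes o (0 or 1) and forecast probabilities y."""
    n = len(o)
    bs = sum((o_k - y_k) ** 2 for o_k, y_k in zip(o, y)) / n
    o_bar = sum(o) / n  # mean observed event occurrence
    bs_ref = sum((o_k - o_bar) ** 2 for o_k in o) / n
    if bs_ref == 0.0:
        # assumption: follow the footnote, which states -inf is returned
        return -math.inf
    return 1.0 - bs / bs_ref


# outcomes and probabilities for the threshold 4.0 of the example data
o = [1, 1, 1, 0, 1]
y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
print(brier_skill_score(o, y))  # negative: worse than climatology here
```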

BS_CRD

Calibration-Refinement Decomposition of the Brier Score ("BS_CRD") into the three components reliability, resolution, and uncertainty [returned in this order].

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds, 3)`
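
To make the three components concrete, here is a minimal sketch of the standard decomposition, in which the Brier score equals reliability minus resolution plus uncertainty. It assumes forecast probabilities are member fractions, so time steps are grouped by the number of members forecasting the event. `brier_decomposition` is a hypothetical helper, not the evalhyd implementation.

```python
def brier_decomposition(o, counts, n_members):
    """Return (reliability, resolution, uncertainty).

    o: observed event outcomes (0 or 1), one per time step.
    counts: number of members forecasting the event, one per time step.
    """
    n = len(o)
    o_bar = sum(o) / n
    # group time steps by member count, i.e. by distinct forecast probability
    groups = {}
    for o_k, c_k in zip(o, counts):
        groups.setdefault(c_k, []).append(o_k)
    reliability = 0.0
    resolution = 0.0
    for c, outcomes in groups.items():
        y_i = c / n_members                      # forecast probability
        o_bar_i = sum(outcomes) / len(outcomes)  # conditional frequency
        reliability += len(outcomes) * (y_i - o_bar_i) ** 2
        resolution += len(outcomes) * (o_bar_i - o_bar) ** 2
    uncertainty = o_bar * (1.0 - o_bar)
    return reliability / n, resolution / n, uncertainty


# threshold 4.0 of the example data: outcomes and member counts
rel, res, unc = brier_decomposition([1, 1, 1, 0, 1], [3, 3, 3, 1, 0], 3)
print(rel - res + unc)  # recovers the Brier score, approximately 0.2222
```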

BS_LBD

Likelihood-Base rate Decomposition of the Brier Score ("BS_LBD") into the three components type 2 bias, discrimination, and sharpness (a.k.a. refinement) [returned in this order].

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds, 3)`

REL_DIAG

X and Y axes of the reliability diagram ("REL_DIAG") and ordinates of its associated sampling histogram: forecast probabilities (X), observed frequencies (Y), and number of forecasts for each forecast probability [returned in this order].

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds, bins, 3)`

CRPS_FROM_BS

Continuous Ranked Probability Score computed from 101 Brier Scores ("CRPS_FROM_BS"), i.e. using the observed minimum, the 99 observed percentiles, and the observed maximum as streamflow thresholds.

Required inputs	Output shape
`q_obs`, `q_prd`, `events` [1]	`(sites, lead times, subsets, samples)`

CRPS_FROM_ECDF

Continuous Ranked Probability Score computed from the Empirical Cumulative Distribution Function ("CRPS_FROM_ECDF"), where the distribution function is constructed from the ensemble member predictions.

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples)`
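
For a single time step, the CRPS of an empirical distribution can be sketched with the well-known ensemble identity: the mean absolute error of the members minus half the mean absolute difference between member pairs. This is an illustrative sketch; `crps_ensemble` is a hypothetical helper and evalhyd may organise the computation differently (and averages over time steps).

```python
def crps_ensemble(members, obs):
    """CRPS of the empirical CDF of `members` against one observation."""
    m = len(members)
    # mean absolute error of the members against the observation
    term_obs = sum(abs(x - obs) for x in members) / m
    # mean absolute difference over all ordered member pairs
    term_spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return term_obs - 0.5 * term_spread


# first time step of the example data
print(crps_ensemble([5.3, 4.3, 5.3], 4.7))  # approximately 0.3111
```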

QS

Quantile Scores ("QS") where the ensemble member predictions are treated as quantiles.

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples, quantiles)`
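
One way to read "members treated as quantiles" is sketched below for a single time step, using the standard quantile (pinball) loss. The quantile levels \(i / (m + 1)\) assigned to the sorted members and the absence of any scaling factor are assumptions made for illustration; `quantile_scores` is a hypothetical helper, not the evalhyd implementation.

```python
def quantile_scores(members, obs):
    """Pinball loss of each sorted member against one observation."""
    m = len(members)
    scores = []
    for i, q in enumerate(sorted(members), start=1):
        tau = i / (m + 1)  # assumed quantile level of the i-th sorted member
        if obs >= q:
            scores.append(tau * (obs - q))
        else:
            scores.append((1.0 - tau) * (q - obs))
    return scores


# first time step of the example data: one score per quantile
print(quantile_scores([5.3, 4.3, 5.3], 4.7))
```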

CRPS_FROM_QS

Continuous Ranked Probability Score computed from the Quantile Scores ("CRPS_FROM_QS").

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples)`

CONT_TBL

Cells of the Contingency Table ("CONT_TBL"), i.e. the hits \(a\), the false alarms \(b\), the misses \(c\), and the correct rejections \(d\), in this order.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, levels, thresholds, 4)`

POD

Probability Of Detection ("POD") also known as “hit rate”, derived from the contingency table.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, levels, thresholds)`

POFD

Probability Of False Detection ("POFD") also known as “false alarm rate”, derived from the contingency table.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, levels, thresholds)`

FAR

False Alarm Ratio ("FAR"), derived from the contingency table.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, levels, thresholds)`

CSI

Critical Success Index ("CSI"), derived from the contingency table.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, levels, thresholds)`
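
The four scores derived from the contingency table (POD, POFD, FAR, and CSI) follow standard definitions in terms of the cells \(a\), \(b\), \(c\), and \(d\). The sketch below spells them out for one table; `contingency_scores` is a hypothetical helper, the cell counts are made up for illustration, and empty denominators are not handled.

```python
def contingency_scores(a, b, c, d):
    """Scores from hits a, false alarms b, misses c, correct rejections d."""
    return {
        "POD": a / (a + c),       # hits among observed events
        "POFD": b / (b + d),      # false alarms among observed non-events
        "FAR": b / (a + b),       # false alarms among forecast events
        "CSI": a / (a + b + c),   # hits among all observed or forecast events
    }


scores = contingency_scores(a=3, b=1, c=1, d=5)
print(scores)
```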

ROCSS

Relative Operating Characteristic Skill Score ("ROCSS"), derived from the contingency table, and based on computing the area under the ROC curve.

Required inputs	Output shape
`q_obs`, `q_prd`, `q_thr`, `events` [1]	`(sites, lead times, subsets, samples, thresholds)`

RANK_HIST

Frequencies of the Rank Histogram ("RANK_HIST"), also known as the Talagrand diagram.

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples, ranks)`
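
A minimal sketch of how such frequencies can be obtained: at each time step the observation is ranked among the ensemble members, and the ranks are tallied over time. The sketch assumes there are no ties between the observation and the members (how evalhyd resolves ties is not covered here); `rank_histogram` is a hypothetical helper.

```python
def rank_histogram(obs, members):
    """Relative frequency of each observation rank (m + 1 possible ranks)."""
    m = len(members)
    counts = [0] * (m + 1)
    for k, o_k in enumerate(obs):
        # rank = number of members strictly below the observation
        rank = sum(mem[k] < o_k for mem in members)
        counts[rank] += 1
    return [c / len(obs) for c in counts]


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [[5.3, 4.2, 5.7, 2.3, 3.1],
       [4.3, 4.2, 4.7, 4.3, 3.3],
       [5.3, 5.2, 5.7, 2.3, 3.9]]

print(rank_histogram(obs, prd))  # four ranks for three members
```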

DS

Delta score ("DS") as per Candille and Talagrand (2005).

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples)`

AS

Alpha score ("AS") as per Renard et al. (2010).

Required inputs	Output shape
`q_obs`, `q_prd`	`(sites, lead times, subsets, samples)`

CR

Coverage ratio ("CR"), i.e. the portion of observations falling within the predictive intervals. It is a measure of the reliability of the predictions.

Required inputs	Output shape
`q_obs`, `q_prd`, `c_lvl`	`(sites, lead times, subsets, samples, intervals)`

AW

Average width ("AW") of the predictive interval(s). It is a measure of the sharpness of the predictions.

Required inputs	Output shape
`q_obs`, `q_prd`, `c_lvl`	`(sites, lead times, subsets, samples, intervals)`
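
The two interval metrics above (CR and AW) can be sketched as follows once the bounds of a predictive interval are known. The bounds are passed in directly here, because how they are derived from the ensemble for a given `c_lvl` is not described on this page; the example uses the ensemble minimum and maximum purely for illustration, and intervals are assumed to be closed. Both function names are hypothetical.

```python
def coverage_ratio(obs, lower, upper):
    """Fraction of observations inside [lower, upper] (bounds included)."""
    inside = sum(l <= x <= u for x, l, u in zip(obs, lower, upper))
    return inside / len(obs)


def average_width(lower, upper):
    """Mean width of the predictive interval over time steps."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
lower = [4.3, 4.2, 4.7, 2.3, 3.1]  # illustrative bounds: ensemble minimum
upper = [5.3, 5.2, 5.7, 4.3, 3.9]  # illustrative bounds: ensemble maximum

print(coverage_ratio(obs, lower, upper))  # 4 of 5 observations are covered
print(average_width(lower, upper))        # approximately 1.16
```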

AWN

Average width of the predictive interval(s) normalised by the mean observation [2] ("AWN"), computed as per Bourgin et al. (2015).

Required inputs	Output shape
`q_obs`, `q_prd`, `c_lvl`	`(sites, lead times, subsets, samples, intervals)`

WS

Winkler score ("WS"), also known as interval score, computed as per Gneiting and Raftery (2007).

\[WS = \frac{1}{n} \sum_{k=1}^{n} \left[ (u_k - l_k) + \frac{2}{\alpha} (l_k - x_k) 𝟙\{x_k < l_k\} + \frac{2}{\alpha} (x_k - u_k) 𝟙\{x_k > u_k\} \right]\]

where, for a given confidence level, \(\alpha\) is the proportion not covered by the central predictive interval, \(u_k\) and \(l_k\) are the upper and lower bounds of the predictive interval at time step \(k\), respectively, \(x_k\) is the corresponding observation, 𝟙 is the indicator function, and \(n\) is the number of time steps.

Required inputs	Output shape
`q_obs`, `q_prd`, `c_lvl`	`(sites, lead times, subsets, samples, intervals)`
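
The formula above translates into a short loop: each time step contributes the interval width plus a penalty, scaled by \(2 / \alpha\), whenever the observation falls outside the interval. In this sketch the bounds are given directly and \(\alpha\) is passed as a fraction (0.2 for an 80% interval); both choices, and the name `winkler_score`, are assumptions made for illustration.

```python
def winkler_score(obs, lower, upper, alpha):
    """Mean Winkler (interval) score over time steps."""
    total = 0.0
    for x, l, u in zip(obs, lower, upper):
        score = u - l  # interval width
        if x < l:
            score += (2.0 / alpha) * (l - x)  # observation below the interval
        elif x > u:
            score += (2.0 / alpha) * (x - u)  # observation above the interval
        total += score
    return total / len(obs)


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
lower = [4.3, 4.2, 4.7, 2.3, 3.1]  # illustrative bounds
upper = [5.3, 5.2, 5.7, 4.3, 3.9]

print(winkler_score(obs, lower, upper, alpha=0.2))  # approximately 1.56
```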

ES

Energy score ("ES"), a multivariate (i.e. multisite) generalisation of the continuous ranked probability score.

Required inputs	Output shape
`q_obs`, `q_prd`	`(1, lead times, subsets, samples)`
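
For one time step, the multisite generalisation can be sketched by replacing absolute differences with Euclidean distances between vectors holding one value per site. `energy_score` is a hypothetical helper using the common ensemble estimator of the energy score; it is not taken from the evalhyd source, and the values are made up for illustration.

```python
import math


def energy_score(members, obs):
    """Energy score for one time step.

    members: list of vectors, each with one predicted value per site.
    obs: vector with one observed value per site.
    """
    m = len(members)
    # mean distance between each member and the observation
    term_obs = sum(math.dist(x, obs) for x in members) / m
    # mean distance over all ordered member pairs
    term_spread = sum(
        math.dist(xi, xj) for xi in members for xj in members
    ) / (m * m)
    return term_obs - 0.5 * term_spread


# two members, two sites
print(energy_score([(0.0, 0.0), (3.0, 4.0)], (0.0, 0.0)))  # 1.25
```

With a single site, the distances reduce to absolute differences and the expression becomes the ensemble form of the CRPS.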

Footnotes

[1] The threshold value is included in the definition of the events both for low flow and high flow events, i.e. where a streamflow observation/prediction value is equal to the threshold value, the event is considered to have occurred.
[2] The metric value returned is \(-\infty\) when the reference/climatology/normalisation value is zero.