Probabilistic metrics

Tip

All the metrics listed below are accessible via evalp, the probabilistic entry point of evalhyd.

For example, the Brier score can be computed as follows:

C++

#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include <evalhyd/evalp.hpp>

xt::xtensor<double, 2> obs = {{4.7, 4.3, 5.5, 2.7, 4.1}};
xt::xtensor<double, 4> prd = {{{{5.3, 4.2, 5.7, 2.3, 3.1},
                                {4.3, 4.2, 4.7, 4.3, 3.3},
                                {5.3, 5.2, 5.7, 2.3, 3.9}}}};
xt::xtensor<double, 2> thr = {{4., 5.}};

std::cout << evalhyd::evalp(obs, prd, {"BS"}, thr, "high")[0] << std::endl;
// {{{{{ 0.222222,  0.133333}}}}}

Python

>>> import numpy
>>> obs = numpy.array(
...     [[4.7, 4.3, 5.5, 2.7, 4.1]]
... )
>>> prd = numpy.array(
...     [[[[5.3, 4.2, 5.7, 2.3, 3.1],
...        [4.3, 4.2, 4.7, 4.3, 3.3],
...        [5.3, 5.2, 5.7, 2.3, 3.9]]]]
... )
>>> thr = numpy.array([[4., 5.]])
>>> import evalhyd
>>> evalhyd.evalp(obs, prd, ["BS"], thr, events="high")
[array([[[[[0.22222222, 0.13333333]]]]])]

R

> obs <- rbind(
+     c(4.7, 4.3, 5.5, 2.7, 4.1)
+ )
> prd <- array(
+     rbind(
+         c(5.3, 4.2, 5.7, 2.3, 3.1),
+         c(4.3, 4.2, 4.7, 4.3, 3.3),
+         c(5.3, 5.2, 5.7, 2.3, 3.9)
+     ),
+     dim = c(1, 1, 3, 5)
+ )
> thr <- rbind(
+     c(4., 5.)
+ )
> library(evalhyd)
> evalhyd::evalp(obs, prd, c("BS"), thr, events="high")
[[1]]
, , 1, 1, 1

          [,1]
[1,] 0.2222222

, , 1, 1, 2

          [,1]
[1,] 0.1333333

CLI

$ ./evalhyd evalp "./obs/" "./prd/" "BS" --q_thr "./thr/" --events "high"
{{{{{ 0.222222,  0.133333}}}}}

BS

Brier Score ("BS"), originally derived by Brier (1950) but computed as per Wilks (2011):

\[BS = \frac{1}{n} \sum_{k=1}^{n} (o_k - y_k)^2\]

where, for a dichotomous event, \(y_k\) is the event forecast probability, \(o_k\) is the observed event outcome, and \(n\) is the number of time steps.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds)
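As a rough illustration of the formula above, the following NumPy sketch recomputes the Brier score for the example data at the top of this page, outside of evalhyd. It assumes that the forecast probability \(y_k\) is the fraction of ensemble members predicting the event and that "high" events include the threshold value; the variable names are illustrative only.

```python
import numpy as np

# Example data from the top of this page, reduced to one site and one lead time.
obs = np.array([4.7, 4.3, 5.5, 2.7, 4.1])            # (time,)
prd = np.array([[5.3, 4.2, 5.7, 2.3, 3.1],
                [4.3, 4.2, 4.7, 4.3, 3.3],
                [5.3, 5.2, 5.7, 2.3, 3.9]])          # (members, time)
thr = np.array([4.0, 5.0])                           # (thresholds,)

# Observed event outcome o_k: 1 where the observation reaches the threshold.
o = (obs[None, :] >= thr[:, None]).astype(float)     # (thresholds, time)

# Forecast probability y_k: assumed here to be the fraction of members
# that reach the threshold at each time step.
y = (prd[None, :, :] >= thr[:, None, None]).mean(axis=1)  # (thresholds, time)

# Mean squared difference over the time steps, one value per threshold.
bs = ((o - y) ** 2).mean(axis=1)
print(bs)
```

Under these assumptions the sketch gives approximately 0.2222 and 0.1333, which matches the values returned by evalp in the example above.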

BSS

Brier Skill Score ("BSS"), computed as per Wilks (2011):

\[BSS = 1 - \frac{BS}{BS_{reference}}\]

where \(BS_{reference} = \frac{1}{n} \sum_{k=1}^{n} (o_k - \bar{o})^2\) [2], \(o_k\) is the observed event outcome, \(n\) is the number of time steps, and \(\bar{o}\) is the mean observed event occurrence for the study period.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds)
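The skill score can be sketched in the same way. The snippet below applies the two formulas above to the example data from the top of this page; it is an illustration in plain NumPy, not evalhyd's implementation, and it again assumes that the forecast probability is the fraction of members predicting the event.

```python
import numpy as np

obs = np.array([4.7, 4.3, 5.5, 2.7, 4.1])            # (time,)
prd = np.array([[5.3, 4.2, 5.7, 2.3, 3.1],
                [4.3, 4.2, 4.7, 4.3, 3.3],
                [5.3, 5.2, 5.7, 2.3, 3.9]])          # (members, time)
thr = np.array([4.0, 5.0])                           # (thresholds,)

# Observed outcomes and (assumed) ensemble-based forecast probabilities.
o = (obs[None, :] >= thr[:, None]).astype(float)
y = (prd[None, :, :] >= thr[:, None, None]).mean(axis=1)

# Brier score of the forecast.
bs = ((o - y) ** 2).mean(axis=1)

# Brier score of the reference: always forecasting the mean observed occurrence.
o_bar = o.mean(axis=1, keepdims=True)
bs_ref = ((o - o_bar) ** 2).mean(axis=1)

# Skill relative to the reference (division by zero if bs_ref is zero).
bss = 1.0 - bs / bs_ref
print(bss)
```

For this data the reference score is 0.16 for both thresholds, so the sketch gives a negative skill for the threshold 4 (about -0.389) and a positive skill for the threshold 5 (about 0.167).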

BS_CRD

Calibration-Refinement Decomposition of the Brier Score ("BS_CRD") into the three components reliability, resolution, and uncertainty [returned in this order].

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds, 3)
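To make the three components more concrete, here is a minimal sketch of the classical decomposition \(BS = reliability - resolution + uncertainty\), assuming time steps are grouped by their distinct forecast probability. It uses the outcomes and member counts obtained for the threshold 4 in the example at the top of this page; the grouping strategy and the names are assumptions, not a description of evalhyd's internals.

```python
import numpy as np

# Threshold 4, "high" events, example data from the top of this page.
o = np.array([1.0, 1.0, 1.0, 0.0, 1.0])   # observed event outcomes
k = np.array([3, 3, 3, 1, 0])             # members (out of 3) predicting the event
n_mbr = 3

n = o.size
o_bar = o.mean()

rel = 0.0
res = 0.0
for count in np.unique(k):
    sel = k == count            # time steps sharing this forecast probability
    n_i = sel.sum()
    y_i = count / n_mbr         # forecast probability of the group
    o_i = o[sel].mean()         # observed frequency within the group
    rel += n_i * (y_i - o_i) ** 2
    res += n_i * (o_i - o_bar) ** 2
rel /= n
res /= n
unc = o_bar * (1.0 - o_bar)

# The three components recombine into the Brier score.
bs = ((o - k / n_mbr) ** 2).mean()
print(rel, res, unc, rel - res + unc, bs)
```

Here resolution and uncertainty are both 0.16, so they cancel and the Brier score equals the reliability term (about 0.2222).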

BS_LBD

Likelihood-Base rate Decomposition of the Brier Score ("BS_LBD") into the three components type 2 bias, discrimination, and sharpness (a.k.a. refinement) [returned in this order].

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds, 3)

REL_DIAG

X and Y axes of the reliability diagram ("REL_DIAG") and ordinates of its associated sampling histogram: forecast probabilities (X), observed frequencies (Y), and number of forecasts for each forecast probability [returned in this order].

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds, bins, 3)

CRPS_FROM_BS

Continuous Ranked Probability Score computed from 101 Brier Scores ("CRPS_FROM_BS"), i.e. using the observed minimum, the 99 observed percentiles, and the observed maximum as streamflow thresholds.

Required inputs: q_obs, q_prd, events [1]
Output shape: (sites, lead times, subsets, samples)

CRPS_FROM_ECDF

Continuous Ranked Probability Score computed from the Empirical Cumulative Distribution Function ("CRPS_FROM_ECDF"), i.e. the distribution function constructed from the ensemble member predictions.

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples)
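For a single time step, the CRPS of an empirical distribution can be written as \(E|X - x| - \frac{1}{2} E|X - X'|\), where \(X\) and \(X'\) are drawn from the ensemble members and \(x\) is the observation. The sketch below uses this well-known identity; whether evalhyd evaluates the score this way or by integrating the ECDF directly is not stated here, and the function name is hypothetical.

```python
import numpy as np

def crps_ensemble(members, observation):
    """CRPS of the empirical distribution of `members` for one observation."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error between each member and the observation.
    term_obs = np.abs(members - observation).mean()
    # Mean absolute difference between all ordered pairs of members.
    term_spread = np.abs(members[:, None] - members[None, :]).mean()
    return term_obs - 0.5 * term_spread

# Three members and one observation; the score over a period would be
# the average of such values over the time steps.
crps = crps_ensemble([1.0, 2.0, 3.0], 2.0)
print(crps)
```

For this small case the result is 2/9, which is also what integrating the squared difference between the ensemble ECDF and the observation step function gives.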

QS

Quantile Scores ("QS") where the ensemble member predictions are treated as quantiles.

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples, quantiles)

CRPS_FROM_QS

Continuous Ranked Probability Score computed from the Quantile Scores ("CRPS_FROM_QS").

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples)

CONT_TBL

Cells of the Contingency Table ("CONT_TBL"), i.e. the hits \(a\), the false alarms \(b\), the misses \(c\), and the correct rejections \(d\), in this order.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, levels, thresholds, 4)
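The four cells can be illustrated for one streamflow threshold and one probability level. The sketch below reuses the outcomes and forecast probabilities obtained for the threshold 4 in the example at the top of this page. The level value of 0.5 and the rule "the event is forecast when the probability reaches the level" are assumptions made for illustration only.

```python
import numpy as np

# Threshold 4, "high" events, example data from the top of this page.
o = np.array([1, 1, 1, 0, 1], dtype=bool)     # observed event outcomes
y = np.array([1.0, 1.0, 1.0, 1 / 3, 0.0])     # forecast probabilities

level = 0.5          # hypothetical probability level
f = y >= level       # assumed rule turning probabilities into yes/no forecasts

a = int(np.sum(f & o))      # hits: forecast and observed
b = int(np.sum(f & ~o))     # false alarms: forecast but not observed
c = int(np.sum(~f & o))     # misses: observed but not forecast
d = int(np.sum(~f & ~o))    # correct rejections: neither forecast nor observed
print(a, b, c, d)
```

With these assumptions the five time steps split into 3 hits, 0 false alarms, 1 miss, and 1 correct rejection.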

POD

Probability Of Detection ("POD"), also known as “hit rate”, derived from the contingency table.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, levels, thresholds)

POFD

Probability Of False Detection ("POFD"), also known as “false alarm rate”, derived from the contingency table.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, levels, thresholds)

FAR

False Alarm Ratio ("FAR"), derived from the contingency table.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, levels, thresholds)

CSI

Critical Success Index ("CSI"), derived from the contingency table.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, levels, thresholds)
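The four contingency-table scores above (POD, POFD, FAR, CSI) follow the usual textbook definitions in terms of the hits \(a\), false alarms \(b\), misses \(c\), and correct rejections \(d\). The sketch below applies those definitions to hypothetical cell counts; it does not handle empty denominators, and how evalhyd treats those cases is not covered here.

```python
# Hypothetical cell counts for one threshold and one probability level.
a, b, c, d = 3, 0, 1, 1

pod = a / (a + c)        # share of observed events that were forecast
pofd = b / (b + d)       # share of observed non-events that were forecast
far = b / (a + b)        # share of forecast events that did not occur
csi = a / (a + b + c)    # hits relative to all forecast or observed events

print(pod, pofd, far, csi)
```

Note that POFD and FAR share the same numerator but not the same denominator: the former is conditioned on the observations, the latter on the forecasts.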

ROCSS

Relative Operating Characteristic Skill Score ("ROCSS"), derived from the contingency table, and based on computing the area under the ROC curve.

Required inputs: q_obs, q_prd, q_thr, events [1]
Output shape: (sites, lead times, subsets, samples, thresholds)

RANK_HIST

Frequencies of the Rank Histogram ("RANK_HIST"), also known as the Talagrand diagram.

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples, ranks)
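A rank histogram can be sketched by counting, at each time step, how many ensemble members fall below the observation, which gives a rank between 0 and the number of members. The snippet below does this for the example data at the top of this page. It ignores ties between members and observations (there are none in this data), so any tie-breaking rule used by evalhyd is not represented.

```python
import numpy as np

obs = np.array([4.7, 4.3, 5.5, 2.7, 4.1])            # (time,)
prd = np.array([[5.3, 4.2, 5.7, 2.3, 3.1],
                [4.3, 4.2, 4.7, 4.3, 3.3],
                [5.3, 5.2, 5.7, 2.3, 3.9]])          # (members, time)

n_mbr = prd.shape[0]

# Rank of each observation: number of members strictly below it.
ranks = (prd < obs).sum(axis=0)                      # values in 0..n_mbr

# Relative frequency of each of the n_mbr + 1 possible ranks.
counts = np.bincount(ranks, minlength=n_mbr + 1)
freq = counts / obs.size
print(freq)
```

With three members there are four possible ranks; for this short series the frequencies are 0, 0.4, 0.4, and 0.2.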

DS

Delta score ("DS") as per Candille and Talagrand (2005).

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples)

AS

Alpha score ("AS") as per Renard et al. (2010).

Required inputs: q_obs, q_prd
Output shape: (sites, lead times, subsets, samples)

CR

Coverage ratio ("CR"), i.e. the portion of observations falling within the predictive intervals. It is a measure of the reliability of the predictions.

Required inputs: q_obs, q_prd, c_lvl
Output shape: (sites, lead times, subsets, samples, intervals)

AW

Average width ("AW") of the predictive interval(s). It is a measure of the sharpness of the predictions.

Required inputs: q_obs, q_prd, c_lvl
Output shape: (sites, lead times, subsets, samples, intervals)
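The coverage ratio and the average width can both be illustrated from the bounds of a single predictive interval. In the sketch below the lower and upper bounds are hypothetical values given directly; in practice they would be derived from the ensemble predictions for a given confidence level, and that step is not shown. Treating an observation equal to a bound as covered, and expressing the coverage as a fraction rather than a percentage, are assumptions.

```python
import numpy as np

# Hypothetical observations and predictive interval bounds for three time steps.
x = np.array([2.0, 5.0, 2.5])        # observations
lower = np.array([1.0, 2.0, 3.0])    # lower bounds of the interval
upper = np.array([3.0, 4.0, 5.0])    # upper bounds of the interval

# Coverage ratio: fraction of observations inside the interval (bounds included).
cr = ((x >= lower) & (x <= upper)).mean()

# Average width of the interval over the time steps.
aw = (upper - lower).mean()
print(cr, aw)
```

Only the first observation falls within its interval, so the coverage is one third, while the average width is 2.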

AWN

Average width of the predictive interval(s) normalised by the mean observation [2] ("AWN"), computed as per Bourgin et al. (2015).

Required inputs: q_obs, q_prd, c_lvl
Output shape: (sites, lead times, subsets, samples, intervals)

WS

Winkler score ("WS"), also known as interval score, computed as per Gneiting and Raftery (2007).

\[WS = \frac{1}{n} \sum_{k=1}^{n} \left[ (u_k - l_k) + \frac{2}{\alpha} (l_k - x_k)𝟙\{x_k < l_k\} + \frac{2}{\alpha} (x_k - u_k)𝟙\{x_k > u_k\} \right]\]

where, for a given confidence level, \(\alpha\) is the portion not included in the central predictive interval, \(u_k\) and \(l_k\) are the upper and lower bounds of the predictive interval, respectively, \(x_k\) is the observation, and \(n\) is the number of time steps.

Required inputs: q_obs, q_prd, c_lvl
Output shape: (sites, lead times, subsets, samples, intervals)
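The Winkler score formula can be checked by hand on a few time steps. The sketch below evaluates it for hypothetical bounds and observations with an 80% confidence level, i.e. \(\alpha = 0.2\); the values and variable names are illustrative only, and deriving the bounds from the ensemble is not shown.

```python
import numpy as np

alpha = 0.2                          # 80% central predictive interval

# Hypothetical observations and interval bounds for three time steps.
x = np.array([2.0, 5.0, 2.5])        # observations
lower = np.array([1.0, 2.0, 3.0])    # lower bounds l_k
upper = np.array([3.0, 4.0, 5.0])    # upper bounds u_k

width = upper - lower
# Penalties apply only when the observation falls outside the interval.
below = (2.0 / alpha) * (lower - x) * (x < lower)
above = (2.0 / alpha) * (x - upper) * (x > upper)

ws = (width + below + above).mean()
print(ws)
```

The three time steps contribute 2 (observation inside), 12 (observation 1 unit above the upper bound), and 7 (observation 0.5 unit below the lower bound), for a mean of 7.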

ES

Energy score ("ES") is a multivariate (i.e. multisite) generalisation of the Continuous Ranked Probability Score.

Required inputs: q_obs, q_prd
Output shape: (1, lead times, subsets, samples)
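For one time step, the energy score of an ensemble is commonly estimated as \(E\|X - x\| - \frac{1}{2} E\|X - X'\|\), where the Euclidean norm is taken across sites. The sketch below implements this estimator under that assumption; the function name is hypothetical and the snippet is not taken from evalhyd. With a single site it reduces to the CRPS of the ensemble.

```python
import numpy as np

def energy_score(members, observation):
    """Energy score for one time step.

    members: array of shape (members, sites)
    observation: array of shape (sites,)
    """
    members = np.asarray(members, dtype=float)
    observation = np.asarray(observation, dtype=float)
    # Mean Euclidean distance between each member and the observation.
    term_obs = np.linalg.norm(members - observation, axis=1).mean()
    # Mean Euclidean distance between all ordered pairs of members.
    diffs = members[:, None, :] - members[None, :, :]
    term_spread = np.linalg.norm(diffs, axis=2).mean()
    return term_obs - 0.5 * term_spread

# One site: identical to the CRPS of the three-member ensemble.
es_one_site = energy_score([[1.0], [2.0], [3.0]], [2.0])

# Two sites, the second one perfectly predicted by every member.
es_two_sites = energy_score([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], [2.0, 0.0])
print(es_one_site, es_two_sites)
```

Both calls give 2/9: adding a site where all members match the observation leaves the distances unchanged. Because the sites are combined into a single vector norm, one value is obtained for all sites together, which is consistent with the leading dimension of 1 in the output shape.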

Footnotes

[1] The threshold value is included in the definition of the events both for low flow and high flow events, i.e. where a streamflow observation/prediction value is equal to the threshold value, the event is considered to have occurred.

[2] The metric value returned is \(-\infty\) when the reference/climatology/normalisation value is zero.