# Probabilistic metrics

Tip: All the metrics listed below are accessible via evalp, the probabilistic entry point of evalhyd.

For example, the Brier score can be computed as follows:

```cpp
#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include <evalhyd/evalp.hpp>

xt::xtensor<double, 2> obs = {{4.7, 4.3, 5.5, 2.7, 4.1}};
xt::xtensor<double, 4> prd = {{{{5.3, 4.2, 5.7, 2.3, 3.1},
                                {4.3, 4.2, 4.7, 4.3, 3.3},
                                {5.3, 5.2, 5.7, 2.3, 3.9}}}};
xt::xtensor<double, 2> thr = {{4., 5.}};

std::cout << evalhyd::evalp(obs, prd, {"BS"}, thr, "high") << std::endl;
// {{{{{ 0.222222,  0.133333}}}}}
```

```python
>>> import numpy
>>> obs = numpy.array(
...     [[4.7, 4.3, 5.5, 2.7, 4.1]]
... )
>>> prd = numpy.array(
...     [[[[5.3, 4.2, 5.7, 2.3, 3.1],
...        [4.3, 4.2, 4.7, 4.3, 3.3],
...        [5.3, 5.2, 5.7, 2.3, 3.9]]]]
... )
>>> thr = numpy.array([[4., 5.]])
>>> import evalhyd
>>> evalhyd.evalp(obs, prd, ["BS"], thr, events="high")
[array([[[[[0.22222222, 0.13333333]]]]])]
```

```r
> obs <- rbind(
+     c(4.7, 4.3, 5.5, 2.7, 4.1)
+ )
> prd <- array(
+     rbind(
+         c(5.3, 4.2, 5.7, 2.3, 3.1),
+         c(4.3, 4.2, 4.7, 4.3, 3.3),
+         c(5.3, 5.2, 5.7, 2.3, 3.9)
+     ),
+     dim = c(1, 1, 3, 5)
+ )
> thr <- rbind(
+     c(4., 5.)
+ )
> library(evalhyd)
> evalhyd::evalp(obs, prd, c("BS"), thr, events="high")
[[1]]
, , 1, 1, 1

          [,1]
[1,] 0.2222222

, , 1, 1, 2

          [,1]
[1,] 0.1333333
```

```shell
$ ./evalhyd evalp "./obs/" "./prd/" "BS" --q_thr "./thr/" --events "high"
{{{{{ 0.222222,  0.133333}}}}}
```


## BS

Brier Score ("BS") originally derived by Brier (1950), but computed as per Wilks (2011):

$BS = \frac{1}{n} \sum_{k=1}^{n} (o_k - y_k)^2$

where, for a dichotomous event, $$y_k$$ is the event forecast probability, $$o_k$$ is the observed event outcome, and $$n$$ is the number of time steps.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds) |
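As a rough cross-check of the formula, the sketch below recomputes the Brier score by hand for the example data used at the top of this page. It is not evalhyd's implementation: the helper name `brier_score` is hypothetical, and it assumes that the forecast probability is the fraction of ensemble members for which the event occurs, with "high" flow events including the threshold value.

```python
# Hand-computed Brier score for the example data (one site, one lead time).
obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [  # one row per ensemble member, one column per time step
    [5.3, 4.2, 5.7, 2.3, 3.1],
    [4.3, 4.2, 4.7, 4.3, 3.3],
    [5.3, 5.2, 5.7, 2.3, 3.9],
]


def brier_score(obs, prd, threshold):
    """Mean squared difference between event outcome and forecast probability."""
    n = len(obs)
    total = 0.0
    for k in range(n):
        # "high" flow event: the value reaches or exceeds the threshold
        o_k = 1.0 if obs[k] >= threshold else 0.0
        # assumption: probability = fraction of members in the event
        y_k = sum(member[k] >= threshold for member in prd) / len(prd)
        total += (o_k - y_k) ** 2
    return total / n


scores = [brier_score(obs, prd, thr) for thr in (4.0, 5.0)]
print(scores)
```

Under these assumptions, the two values agree with the results shown in the evalp example above (about 0.2222 and 0.1333).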

## BSS

Brier Skill Score ("BSS"), computed as per Wilks (2011):

$BSS = 1 - \frac{BS}{BS_{reference}}$

where $$BS_{reference} = \frac{1}{n} \sum_{k=1}^{n} (o_k - \bar{o})^2$$ [2], $$o_k$$ is the observed event outcome, $$n$$ is the number of time steps, and $$\bar{o}$$ is the mean observed event occurrence for the study period.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds) |
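The skill score can be illustrated with a minimal sketch that takes event outcomes and forecast probabilities directly. The function name `brier_skill_score` is hypothetical, and returning negative infinity for a zero reference follows the footnote on this page rather than any inspection of evalhyd's source.

```python
import math


def brier_skill_score(o, y):
    """1 - BS / BS_reference, with the observed event frequency as reference."""
    n = len(o)
    bs = sum((o_k - y_k) ** 2 for o_k, y_k in zip(o, y)) / n
    o_bar = sum(o) / n  # mean observed event occurrence
    bs_ref = sum((o_k - o_bar) ** 2 for o_k in o) / n
    if bs_ref == 0.0:
        # the event always (or never) occurred: no skill can be measured
        return -math.inf
    return 1.0 - bs / bs_ref


# event outcomes and forecast probabilities for five time steps
o = [1.0, 1.0, 1.0, 0.0, 1.0]
y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
bss = brier_skill_score(o, y)
print(bss)
```

A negative value, as obtained here, indicates that the forecast performs worse than always forecasting the observed event frequency.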

## BS_CRD

Calibration-Refinement Decomposition of the Brier Score ("BS_CRD") into the three components reliability, resolution, and uncertainty [returned in this order].

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds, 3) |
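The sketch below illustrates the textbook form of this decomposition, in which the Brier score equals reliability minus resolution plus uncertainty. It groups time steps by distinct forecast probability value, which is an assumption made for illustration; evalhyd may bin the probabilities differently. The function name `brier_decomposition` is hypothetical.

```python
from collections import defaultdict


def brier_decomposition(o, y):
    """Return (reliability, resolution, uncertainty) of the Brier score."""
    n = len(o)
    o_bar = sum(o) / n  # overall observed event frequency
    groups = defaultdict(list)
    for o_k, y_k in zip(o, y):
        # group observed outcomes by distinct forecast probability
        groups[y_k].append(o_k)
    rel = sum(
        len(g) * (y_i - sum(g) / len(g)) ** 2 for y_i, g in groups.items()
    ) / n
    res = sum(
        len(g) * (sum(g) / len(g) - o_bar) ** 2 for g in groups.values()
    ) / n
    unc = o_bar * (1.0 - o_bar)
    return rel, res, unc


o = [1.0, 1.0, 1.0, 0.0, 1.0]
y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
rel, res, unc = brier_decomposition(o, y)
print(rel, res, unc, rel - res + unc)
```

For these inputs, the recombined value `rel - res + unc` matches the Brier score computed directly from the same outcomes and probabilities.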

## BS_LBD

Likelihood-Base rate Decomposition of the Brier Score ("BS_LBD") into the three components type 2 bias, discrimination, and sharpness (a.k.a. refinement) [returned in this order].

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds, 3) |

## REL_DIAG

X and Y axes of the reliability diagram ("REL_DIAG") and ordinates of its associated sampling histogram: forecast probabilities (X), observed frequencies (Y), and number of forecasts for each forecast probability [returned in this order].

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds, bins, 3) |

## CRPS_FROM_BS

Continuous Ranked Probability Score computed from 101 Brier Scores ("CRPS_FROM_BS"), i.e. using the observed minimum, the 99 observed percentiles, and the observed maximum as streamflow thresholds.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, events [1] | (sites, lead times, subsets, samples) |

## CRPS_FROM_ECDF

Continuous Ranked Probability Score computed from the Empirical Cumulative Distribution Function ("CRPS_FROM_ECDF"), i.e. constructed from the ensemble member predictions.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples) |
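For a finite ensemble, the integral of the squared difference between the empirical distribution function and the step function at the observation has a well-known closed form in terms of absolute differences. The sketch below uses that identity; it is an illustration only and not necessarily how evalhyd evaluates the score. The function name `crps_ensemble` is hypothetical.

```python
def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast against one observation.

    Uses the identity CRPS = E|X - o| - 0.5 * E|X - X'|, where X and X'
    are drawn independently from the empirical distribution.
    """
    m = len(members)
    term_obs = sum(abs(x - observation) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [  # one row per ensemble member, one column per time step
    [5.3, 4.2, 5.7, 2.3, 3.1],
    [4.3, 4.2, 4.7, 4.3, 3.3],
    [5.3, 5.2, 5.7, 2.3, 3.9],
]

# score each time step, then average over the period
crps_per_step = [
    crps_ensemble([member[k] for member in prd], obs[k])
    for k in range(len(obs))
]
crps = sum(crps_per_step) / len(crps_per_step)
print(crps)
```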

## QS

Quantile Scores ("QS") where the ensemble member predictions are treated as quantiles.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples, quantiles) |

## CRPS_FROM_QS

Continuous Ranked Probability Score computed from the Quantile Scores ("CRPS_FROM_QS").

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples) |

## POD

Probability Of Detection ("POD"), also known as “hit rate”, derived from the contingency table.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, levels, thresholds) |

## POFD

Probability Of False Detection ("POFD"), also known as “false alarm rate”, derived from the contingency table.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, levels, thresholds) |

## FAR

False Alarm Ratio ("FAR"), derived from the contingency table.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, levels, thresholds) |

## CSI

Critical Success Index ("CSI"), derived from the contingency table.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, levels, thresholds) |
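The four contingency table metrics (POD, POFD, FAR, CSI) can be illustrated together with a minimal sketch. It assumes that a "yes" forecast is issued when the forecast probability reaches a given probability level, which is one plausible reading of the levels dimension in the output shape and not a confirmed description of evalhyd's behaviour. The names `contingency_metrics` and `ratio` are hypothetical.

```python
import math


def ratio(numerator, denominator):
    """Safe division: undefined ratios are reported as NaN in this sketch."""
    return numerator / denominator if denominator else math.nan


def contingency_metrics(o, y, level):
    """Return (POD, POFD, FAR, CSI) for one probability level."""
    hits = misses = false_alarms = correct_negatives = 0
    for o_k, y_k in zip(o, y):
        forecast_yes = y_k >= level  # assumption: level is inclusive
        if forecast_yes and o_k:
            hits += 1
        elif forecast_yes and not o_k:
            false_alarms += 1
        elif not forecast_yes and o_k:
            misses += 1
        else:
            correct_negatives += 1
    pod = ratio(hits, hits + misses)
    pofd = ratio(false_alarms, false_alarms + correct_negatives)
    far = ratio(false_alarms, hits + false_alarms)
    csi = ratio(hits, hits + misses + false_alarms)
    return pod, pofd, far, csi


# observed event outcomes and forecast probabilities for five time steps
o = [True, True, True, False, True]
y = [1.0, 1.0, 1.0, 1 / 3, 0.0]
metrics_half = contingency_metrics(o, y, level=0.5)
metrics_low = contingency_metrics(o, y, level=0.3)
print(metrics_half, metrics_low)
```

Lowering the probability level turns more forecasts into "yes" forecasts, which here leaves POD unchanged but raises POFD and FAR.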

## ROCSS

Relative Operating Characteristic Skill Score ("ROCSS"), derived from the contingency table, and based on computing the area under the ROC curve.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, q_thr, events [1] | (sites, lead times, subsets, samples, thresholds) |

## RANK_HIST

Frequencies of the Rank Histogram ("RANK_HIST"), also known as the Talagrand diagram.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples, ranks) |
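A rank histogram counts how often the observation falls at each position within the sorted ensemble, so an ensemble of m members yields m + 1 ranks. The sketch below is a simplified illustration that ignores ties between observations and members; how evalhyd handles ties is not covered here. The function name `rank_histogram` is hypothetical.

```python
def rank_histogram(obs, prd):
    """Relative frequency of each observation rank within the ensemble."""
    n_members = len(prd)
    counts = [0] * (n_members + 1)
    for k, o_k in enumerate(obs):
        # rank = number of members strictly below the observation
        rank = sum(member[k] < o_k for member in prd)
        counts[rank] += 1
    return [c / len(obs) for c in counts]


obs = [4.7, 4.3, 5.5, 2.7, 4.1]
prd = [  # one row per ensemble member, one column per time step
    [5.3, 4.2, 5.7, 2.3, 3.1],
    [4.3, 4.2, 4.7, 4.3, 3.3],
    [5.3, 5.2, 5.7, 2.3, 3.9],
]
frequencies = rank_histogram(obs, prd)
print(frequencies)
```

A flat histogram suggests a reliable ensemble, although five time steps are far too few to draw any conclusion from this example.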

## DS

Delta score ("DS") as per Candille and Talagrand (2005).

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples) |

## AS

Alpha score ("AS") as per Renard et al. (2010).

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (sites, lead times, subsets, samples) |

## CR

Coverage ratio ("CR"), i.e. the portion of observations falling within the predictive intervals. It is a measure of the reliability of the predictions.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, c_lvl | (sites, lead times, subsets, samples, intervals) |

## AW

Average width ("AW") of the predictive interval(s). It is a measure of the sharpness of the predictions.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, c_lvl | (sites, lead times, subsets, samples, intervals) |
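Coverage ratio and average width are often read together, since a wide interval trivially covers more observations. The sketch below computes both from interval bounds that are given directly; the bounds are hypothetical values, and how evalhyd derives them from the ensemble and c_lvl is not shown. Treating the bounds as inclusive and returning the coverage as a fraction are assumptions, and the function names are hypothetical.

```python
def coverage_ratio(lower, upper, obs):
    """Fraction of observations inside the predictive interval."""
    inside = sum(l_k <= x_k <= u_k for l_k, u_k, x_k in zip(lower, upper, obs))
    return inside / len(obs)


def average_width(lower, upper):
    """Mean distance between the upper and lower interval bounds."""
    return sum(u_k - l_k for l_k, u_k in zip(lower, upper)) / len(lower)


# hypothetical bounds of one predictive interval at three time steps
lower = [1.0, 2.0, 3.0]
upper = [3.0, 4.0, 5.0]
obs = [2.0, 5.0, 2.5]

cr = coverage_ratio(lower, upper, obs)
aw = average_width(lower, upper)
print(cr, aw)
```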

## AWN

Average width of the predictive interval(s) normalised by the mean observation [2] ("AWN"), computed as per Bourgin et al. (2015).

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, c_lvl | (sites, lead times, subsets, samples, intervals) |

## WS

Winkler score ("WS"), also known as interval score, computed as per Gneiting and Raftery (2007).

$WS = \frac{1}{n} \sum_{k=1}^{n} \left[ (u_k - l_k) + \frac{2}{\alpha} (l_k - x_k) \mathbb{1}\{x_k < l_k\} + \frac{2}{\alpha} (x_k - u_k) \mathbb{1}\{x_k > u_k\} \right]$

where, for a given confidence level, $$\alpha$$ is the portion not included in the central predictive interval, $$u_k$$ and $$l_k$$ are the upper and lower bounds of the predictive interval, respectively, $$x_k$$ is the observation, $$\mathbb{1}$$ is the indicator function, and $$n$$ is the number of time steps.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd, c_lvl | (sites, lead times, subsets, samples, intervals) |
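To make the penalty terms concrete, the sketch below evaluates the Winkler score for interval bounds that are given directly. The bounds, observations, and the value of alpha are hypothetical, and the derivation of the bounds from the ensemble is deliberately left out. The function name `winkler_score` is hypothetical.

```python
def winkler_score(lower, upper, obs, alpha):
    """Interval width plus a penalty of 2/alpha per unit of miss distance."""
    total = 0.0
    for l_k, u_k, x_k in zip(lower, upper, obs):
        score = u_k - l_k  # width of the interval
        if x_k < l_k:
            score += (2.0 / alpha) * (l_k - x_k)  # observation below interval
        elif x_k > u_k:
            score += (2.0 / alpha) * (x_k - u_k)  # observation above interval
        total += score
    return total / len(obs)


# hypothetical bounds of an 80% interval (alpha = 0.2) at three time steps
lower = [1.0, 2.0, 3.0]
upper = [3.0, 4.0, 5.0]
obs = [2.0, 5.0, 2.5]

ws = winkler_score(lower, upper, obs, alpha=0.2)
print(ws)
```

Here the first observation lies inside its interval and contributes only the width (2), while the other two miss by 1 and 0.5 and contribute 12 and 7, giving a mean of 7.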

## ES

Energy score ("ES"), a multivariate (i.e. multisite) generalisation of the Continuous Ranked Probability Score.

| Required inputs | Output shape |
| --- | --- |
| q_obs, q_prd | (1, lead times, subsets, samples) |

Footnotes

[1] The threshold value is included in the definition of the events both for low flow and high flow events, i.e. where a streamflow observation/prediction value is equal to the threshold value, the event is considered to have occurred.

[2] The metric value returned is $$-\infty$$ when the reference/climatology/normalisation value is zero.