Bootstrapping

deterministic probabilistic

A bootstrapping method is available in evalhyd to assess the sampling uncertainty in the evaluation metrics computed. It follows a non-overlapping block bootstrapping approach (see e.g. Clark et al. (2021)) where blocks are taken to be full years of data. For a given period, the bootstrap method randomly draws with replacement from the years it contains.

Note

Providing full years of data is a requirement enforced by evalhyd, to preserve seasonal patterns and intra-annual auto-correlation. The start of the year, however, is left to the discretion of the user (e.g. hydrological years, calendar years, etc.): it is defined by the first date and time provided via the parameter `dts`.
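To illustrate how the first date and time can define the year blocks, here is a minimal sketch in plain Python. It is not evalhyd's implementation: the helper `label_year_blocks` is hypothetical, and it assumes that a block runs from the first date and time up to (but excluding) the same month, day, and time one year later.

```python
from datetime import datetime, timedelta


def label_year_blocks(dts):
    """Assign each datetime to a year block counted from the first datetime.

    Assumption (not taken from evalhyd): a block spans from the first date
    and time up to, but excluding, its anniversary one year later.
    """
    start = dts[0]
    start_key = (start.month, start.day, start.time())
    labels = []
    for dt in dts:
        block = dt.year - start.year
        # step back one block if dt falls before this year's anniversary
        if (dt.month, dt.day, dt.time()) < start_key:
            block -= 1
        labels.append(block)
    return labels


# two full hydrological years starting on 1 October (daily time step)
dts = [datetime(2010, 10, 1) + timedelta(days=i) for i in range(731)]
labels = label_year_blocks(dts)
```

With these dates, the first 365 time steps fall in block 0 and the remaining 366 (which include 29 February 2012) fall in block 1.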

The bootstrap thus allows for the estimation of the sampling uncertainty of the evaluation metrics, i.e. the influence of the choice of the study period on the metric values.
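The idea of drawing whole years with replacement and recomputing a metric on each draw can be sketched as follows. This is an illustration only, not evalhyd's code: the helper `draw_year_samples`, the toy values, and the way blocks are concatenated before computing the metric are all assumptions. The metric is the standard Nash-Sutcliffe efficiency.

```python
import random


def draw_year_samples(years, n_samples, len_sample, seed=None):
    """Draw n_samples samples of len_sample years each, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(years, k=len_sample) for _ in range(n_samples)]


def nse(obs, prd):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, prd))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den


# toy data: three year blocks of four time steps each (hypothetical values)
obs_by_year = {
    2001: [1.0, 3.0, 2.0, 4.0],
    2002: [2.0, 5.0, 3.0, 6.0],
    2003: [1.5, 2.5, 2.0, 3.5],
}
prd_by_year = {
    2001: [1.2, 2.7, 2.1, 3.6],
    2002: [2.5, 4.4, 3.3, 5.1],
    2003: [1.4, 2.9, 1.8, 3.9],
}

# a sample may be longer than the period itself because years can repeat
samples = draw_year_samples(list(obs_by_year), n_samples=5, len_sample=4, seed=0)

# one metric value per sample: together they form the sampling distribution
nse_values = []
for sample in samples:
    obs = [v for year in sample for v in obs_by_year[year]]
    prd = [v for year in sample for v in prd_by_year[year]]
    nse_values.append(nse(obs, prd))
```

The spread of `nse_values` across the samples is what reflects the sampling uncertainty of the metric.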

The bootstrap method is configurable through three parameters:

Parameter	Description	Possible values
`n_samples`	The number of random samples to generate.	any integer
`len_sample`	The length of one sample in number of blocks (i.e. years).	any integer
`summary`	The statistics to summarise the sampling distribution (i.e. across the samples).	`0` for no summary
		`1` for mean & standard deviation
		`2` for percentiles 5, 10, 25, 50, 75, 90, 95
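As a rough illustration of what the three `summary` options correspond to, the sketch below summarises a list of metric values, one per bootstrap sample. The function `summarise`, its return types, the use of the population standard deviation, and the linear interpolation between ranks for percentiles are all assumptions made for this example. They are not taken from evalhyd.

```python
import statistics

PERCENTILES = (5, 10, 25, 50, 75, 90, 95)


def percentile(sorted_values, q):
    """Percentile q (0-100) using linear interpolation between ranks."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def summarise(values, summary):
    """Summarise metric values obtained across bootstrap samples."""
    if summary == 0:
        # no summary: keep one value per sample
        return list(values)
    if summary == 1:
        # mean and (population) standard deviation across samples
        return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
    if summary == 2:
        ordered = sorted(values)
        return {q: percentile(ordered, q) for q in PERCENTILES}
    raise ValueError("summary must be 0, 1, or 2")


# hypothetical metric values from five bootstrap samples
values = [0.9, 0.6, 1.0, 0.7, 0.8]
raw = summarise(values, 0)
moments = summarise(values, 1)
quantiles = summarise(values, 2)
```

In this sketch, option `0` returns as many values as there are samples, option `1` returns two statistics, and option `2` returns seven.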

Hint

The seed of the random generator is configurable through the `seed` parameter.

Note

Since the sampling is performed with replacement, the number of samples and the length of a sample have no upper limit.

Examples using the bootstrapping functionality are provided below.

Python

>>> res = evalhyd.evald(
...     obs, prd, ["NSE"],
...     bootstrap={"n_samples": 100, "len_sample": 10, "summary": 0},
...     dts=dts
... )
>>> res = evalhyd.evalp(
...     obs, prd, ["CRPS_FROM_ECDF"],
...     bootstrap={"n_samples": 100, "len_sample": 10, "summary": 0},
...     dts=dts
... )

R

> res <- evalhyd::evald(
+     obs, prd, c("NSE"),
+     bootstrap = list(n_samples = 100, len_sample = 10, summary = 0),
+     dts = dts
+ )
> res <- evalhyd::evalp(
+     obs, prd, c("CRPS_FROM_ECDF"),
+     bootstrap = list(n_samples = 100, len_sample = 10, summary = 0),
+     dts = dts
+ )

CLI

$ ./evalhyd evald "obs.csv" "prd.csv" "NSE" --to_file \
> --bootstrap "n_samples" 100 "len_sample" 10 "summary" 0 --dts "dts.csv"
$ ./evalhyd evalp "./obs" "./prd" "CRPS_FROM_ECDF" --to_file \
> --bootstrap "n_samples" 100 "len_sample" 10 "summary" 0 --dts "dts.csv"