Memoisation — evalhyd latest documentation

Across metrics

Applies to: deterministic and probabilistic evaluation

Several evaluation metrics rely on the same intermediate computations. Storing these intermediate results so that every metric requiring them can reuse them avoids redundant work (i.e. the concept of memoisation in computing).

For example, the deterministic metrics NSE and KGE both require computing the quadratic error between the observations and their arithmetic mean, so it is more efficient to compute this quadratic error only once and reuse it in the computation of both NSE and KGE.
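The idea can be illustrated with a minimal pure-Python sketch. It is not evalhyd's actual implementation: the `SharedTerms` class, its `quad_obs` method, and the `nse` and `kge` helpers are hypothetical names introduced here, and KGE is assumed to follow its original formulation (correlation, ratio of standard deviations, ratio of means).

```python
import math


class SharedTerms:
    """Hypothetical cache of intermediate terms shared by several metrics."""

    def __init__(self, obs, prd):
        self.obs = obs
        self.prd = prd
        self._cache = {}
        self.n_computed = 0  # how many times the shared term was really computed

    def quad_obs(self):
        # quadratic error between the observations and their arithmetic mean,
        # computed on first request and then served from the cache
        if "quad_obs" not in self._cache:
            mean_obs = sum(self.obs) / len(self.obs)
            self._cache["quad_obs"] = sum((o - mean_obs) ** 2 for o in self.obs)
            self.n_computed += 1
        return self._cache["quad_obs"]


def nse(terms):
    # Nash-Sutcliffe efficiency, reusing the cached quadratic error
    quad_err = sum((o - p) ** 2 for o, p in zip(terms.obs, terms.prd))
    return 1 - quad_err / terms.quad_obs()


def kge(terms):
    # Kling-Gupta efficiency (original formulation assumed),
    # reusing the same cached quadratic error
    obs, prd = terms.obs, terms.prd
    mean_obs = sum(obs) / len(obs)
    mean_prd = sum(prd) / len(prd)
    quad_prd = sum((p - mean_prd) ** 2 for p in prd)
    cov = sum((o - mean_obs) * (p - mean_prd) for o, p in zip(obs, prd))
    r = cov / math.sqrt(terms.quad_obs() * quad_prd)
    alpha = math.sqrt(quad_prd / terms.quad_obs())
    beta = mean_prd / mean_obs
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


terms = SharedTerms([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
nse_value = nse(terms)
kge_value = kge(terms)
print(terms.n_computed)  # → 1
```

Both metrics are evaluated, yet the shared quadratic error is computed a single time. Two separate `SharedTerms` objects, one per metric, would compute it twice.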

evalhyd implements such an approach to compute its evaluation metrics. This is why it is recommended to request all evaluation metrics of interest in a single call to evalhyd rather than requesting them separately in several calls.

That is to say, prefer:

Python

>>> evalhyd.evald(obs, prd, ["NSE", "KGE"])

R

> evalhyd::evald(obs, prd, c("NSE", "KGE"))

CLI

$ ./evalhyd evald "obs.csv" "prd.csv" "NSE" "KGE"

over:

Python

>>> evalhyd.evald(obs, prd, ["NSE"])
>>> evalhyd.evald(obs, prd, ["KGE"])

R

> evalhyd::evald(obs, prd, c("NSE"))
> evalhyd::evald(obs, prd, c("KGE"))

CLI

$ ./evalhyd evald "obs.csv" "prd.csv" "NSE"
$ ./evalhyd evald "obs.csv" "prd.csv" "KGE"

Across masks

Applies to: deterministic and probabilistic evaluation

In addition, most evaluation metrics first perform intermediate computations on each time step individually (e.g. errors between individual observations and their corresponding predictions), before performing some reduction across all time steps (e.g. arithmetic mean of these individual errors).

If several sub-periods of the entire study period are needed (i.e. using the temporal masking or the conditional masking functionalities), and these sub-periods happen to overlap, it is recommended to provide all the masks to evalhyd at once rather than one mask at a time. Indeed, evalhyd applies the masks only after the intermediate computations on individual time steps have been performed, which reduces the computation time by avoiding repeating these intermediate computations on the time steps shared by several masks.

That is to say, prefer:

Python

>>> import numpy as np
>>> res = evalhyd.evald(
...     obs, prd, ["NSE"],
...     t_msk=np.array([[[True, True, False, True, False, True],
...                      [False, True, True, True, False, True]]])
... )

R

> evalhyd::evald(
+     obs, prd, c("NSE"),
+     t_msk = array(
+         data = rbind(c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),
+                      c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)),
+         dim = c(1, 2, 6)
+     )
+ )

over:

Python

>>> import numpy as np
>>> res = evalhyd.evald(
...     obs, prd, ["NSE"],
...     t_msk=np.array([[[True, True, False, True, False, True]]])
... )
>>> res = evalhyd.evald(
...     obs, prd, ["NSE"],
...     t_msk=np.array([[[False, True, True, True, False, True]]])
... )

R

> evalhyd::evald(
+     obs, prd, c("NSE"),
+     t_msk = array(
+         data = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),
+         dim = c(1, 1, 6)
+     )
+ )
> evalhyd::evald(
+     obs, prd, c("NSE"),
+     t_msk = array(
+         data = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
+         dim = c(1, 1, 6)
+     )
+ )
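The benefit of providing overlapping masks together can be sketched in pure Python. This is only an illustration under stated assumptions, not evalhyd's code: `masked_mse` is a hypothetical helper, and a plain mean squared error is used instead of NSE because its per-time-step term (the squared error) does not depend on the mask.

```python
def masked_mse(obs, prd, masks):
    """Mean squared error for each mask (hypothetical helper).

    Assumes every mask keeps at least one time step.
    """
    # element-wise step: done once for every time step, whatever the masks
    quad_err = [(o - p) ** 2 for o, p in zip(obs, prd)]

    results = []
    for mask in masks:
        # masking happens after the element-wise step, so time steps shared
        # by several masks are not recomputed
        kept = [q for q, keep in zip(quad_err, mask) if keep]
        # reduction step: arithmetic mean over the retained time steps
        results.append(sum(kept) / len(kept))
    return results


obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
prd = [2.0, 3.0, 3.0, 6.0, 5.0, 5.0]
masks = [
    [True, True, False, True, False, True],
    [False, True, True, True, False, True],
]
scores = masked_mse(obs, prd, masks)
print(scores)  # → [1.75, 1.5]
```

Here the squared errors for the six time steps are computed once and reduced twice. Calling the helper once per mask would compute them twice, including for the three time steps that both masks retain.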