Evaluates random holdouts from a timestep grid, produced using hold_grid(), against the corresponding values imputed using impute_grid().

Usage

hold_eval(true, imp, hold, PI_upr = NULL, PI_lwr = NULL, norm = F)

Arguments

true

timestep grid used to generate holdout data with known "true" values.

imp

imputed timestep grid.

hold

holdout grid in which known "true" values were coerced to NA (not available) values.

PI_upr

timestep grid containing upper prediction intervals. Default is NULL.

PI_lwr

timestep grid containing lower prediction intervals. Default is NULL.

norm

logical flag to normalize returned root mean square error by standard deviation of observations. Default is FALSE.

Value

named list containing:

comp

data frame containing holdout comparison results. Fields include:

  • Site - observed holdout sites

  • Timestep - observed holdout timesteps

  • Observed - observed holdout values

  • Imputed - imputed holdout values

diff

vector of differences (imputed minus observed values) with NAs where holdouts were not imputed

rmse

root mean square error for observed vs imputed values. Normalized by standard deviation of observations if norm is TRUE.

CR

coverage rate, computed as the proportion of holdout values that fall within the modeled prediction interval. NULL if prediction intervals are not supplied.
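
The returned metrics can be illustrated on a handful of toy values. The following is a minimal sketch in base R, not the package's own implementation: the numbers and variable names are hypothetical, the interval bounds are assumed to be inclusive, and the normalized RMSE is assumed to use the standard deviation of only those observations that were actually imputed.

```r
# Toy holdout values (hypothetical numbers, not from the package)
observed <- c(10.2, 11.5, 9.8, 12.1, 10.9)
imputed  <- c(10.0, 11.9, NA, 12.4, 10.5)   # NA: a holdout that was not imputed
pi_lwr   <- c(9.5, 11.0, NA, 12.5, 10.0)    # lower prediction interval bounds
pi_upr   <- c(10.8, 12.3, NA, 13.0, 11.2)   # upper prediction interval bounds

# differences: imputed minus observed, NA where nothing was imputed
diff_vals <- imputed - observed

# root mean square error over the holdouts that were imputed
rmse <- sqrt(mean(diff_vals^2, na.rm = TRUE))

# normalized variant: divide by the standard deviation of the observations
# (assumption: SD is taken over the same holdouts that were imputed)
ok    <- !is.na(diff_vals)
nrmse <- rmse / sd(observed[ok])

# coverage rate: share of observed holdouts inside [pi_lwr, pi_upr]
# (assumption: bounds are inclusive; unimputed holdouts are ignored)
inside   <- observed >= pi_lwr & observed <= pi_upr
coverage <- mean(inside, na.rm = TRUE)
```

A coverage rate close to the nominal level of the prediction interval suggests the intervals are reasonably calibrated; a much lower value suggests they are too narrow.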

Author

Maintainer: Zeno F. Levy zlevy@usgs.gov

Examples

if (FALSE) { # \dontrun{
# load example Long Island dataset
  data(LI_data)

# aggregate data at monthly timestep using median observed values
  grid <- timestep_grid(data = LI_data, 
                        timestep = "monthly", 
                        agg_method = "median")
                        
# trim grid to remove sites that are less than 35 percent complete
  grid <- trim_grid(grid, data_thresh = 0.35)
                        
# set seed for reproducibility
  set.seed(123)
                         
# hold out a random 5 percent of observed values
  hold <- hold_grid(input_grid = grid, p = 0.05) 
 
# impute holdout grid using top 10 most correlated reference sites
  out <- impute_grid(input_grid = hold,
                     n_refwl = 10,
                     bootstrap_PI = TRUE)

# evaluate imputation of holdout data
  eval <- hold_eval(true = grid,
                    imp = out$imputed_grid,
                    hold = hold,
                    PI_upr = out$PI_upr,
                    PI_lwr = out$PI_lwr)

# view root mean squared error of imputed holdouts
  eval$rmse
  
# view coverage rate
  eval$CR
  
# plot observed vs imputed values with 1:1 line
  plot(eval$comp$Observed, eval$comp$Imputed,
       xlab = "Observed", ylab = "Imputed")
  abline(0, 1, lty = 2, col = "red")
} # }