Evaluate random holdouts from timestep grid following imputation

Evaluates random holdouts from a timestep grid, produced using hold_grid(), against data imputed using impute_grid().

Usage

hold_eval(true, imp, hold, PI_upr = NULL, PI_lwr = NULL, norm = F)

Arguments

true: timestep grid used to generate holdout data with known "true" values.
imp: imputed timestep grid.
hold: holdout grid in which known "true" values were coerced to NA (not available) values.
PI_upr: timestep grid containing upper prediction intervals. Default is NULL.
PI_lwr: timestep grid containing lower prediction intervals. Default is NULL.
norm: logical flag to normalize returned root mean square error by standard deviation of observations. Default is FALSE.

Value

named list containing:

comp

data frame containing holdout comparison results. Fields include:

Site - observed holdout sites
Timestep - observed holdout timesteps
Observed - observed holdout values
Imputed - imputed holdout values

diff

vector of differences (imputed minus observed values) with NAs where holdouts were not imputed

rmse

root mean square error for observed vs imputed values. Normalized by standard deviation of observations if norm is TRUE.

CR

Coverage rate computed as proportion of holdout values within the modeled prediction interval. Defaults to NULL if prediction intervals are not included.
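
The returned metrics can be illustrated with a minimal base-R sketch. This is not the package's implementation: the vectors obs, imp, lwr, and upr are made-up values, and the choices to drop non-imputed holdouts before computing the error, to normalize by sd() of the remaining observations, and to treat interval bounds as inclusive are all assumptions made for illustration.

```r
# Hypothetical holdout values (illustrative only, not taken from LI_data)
obs <- c(10.0, 12.0, 11.0, 13.0, 9.0)   # observed "true" values
imp <- c(10.5, 11.0, NA, 13.5, 9.0)     # imputed values; NA = not imputed
lwr <- c(9.0, 10.0, NA, 12.0, 8.0)      # lower prediction interval
upr <- c(11.0, 11.8, NA, 14.0, 10.0)    # upper prediction interval

# differences: imputed minus observed, NA where no imputation was made
d <- imp - obs

# root mean square error over the holdouts that were imputed
ok <- !is.na(d)
rmse <- sqrt(mean(d[ok]^2))

# normalized variant: divide by the standard deviation of the observations
# (assumption: only observations with an imputed counterpart are used)
nrmse <- rmse / sd(obs[ok])

# coverage rate: proportion of observed values inside [lwr, upr]
# (assumption: interval bounds are inclusive)
cr <- mean(obs[ok] >= lwr[ok] & obs[ok] <= upr[ok])
```

In this toy case one of the four imputed holdouts falls outside its interval, so the coverage rate is below 1. A normalized error well under 1 indicates that the imputation error is small relative to the spread of the observations.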

Author

Maintainer: Zeno F. Levy zlevy@usgs.gov

Examples

if (FALSE) { # \dontrun{
# load example Long Island dataset
  data(LI_data)

# aggregate data at monthly timestep using median observed values
  grid <- timestep_grid(data = LI_data, 
                        timestep = "monthly", 
                        agg_method = "median")
                        
# trim grid to remove sites that are less than 35 percent complete
  grid <- trim_grid(grid, data_thresh = 0.35)
                        
# set seed for reproducibility
  set.seed(123)
                         
# hold out a random 5 percent of observed values
  hold <- hold_grid(input_grid = grid, p = 0.05) 
 
# impute holdout grid using top 10 most correlated reference sites
  out <- impute_grid(input_grid = hold,
                     n_refwl = 10,
                     bootstrap_PI = TRUE)

# evaluate imputation of holdout data
  eval <- hold_eval(true = grid,
                    imp = out$imputed_grid,
                    hold = hold,
                    PI_upr = out$PI_upr,
                    PI_lwr = out$PI_lwr)

# view root mean squared error of imputed holdouts
  eval$rmse
  
# view coverage rate
  eval$CR
  
# plot observed vs imputed values with 1:1 line
  plot(eval$comp$Observed, eval$comp$Imputed,
       xlab = "Observed", ylab = "Imputed")
  abline(0, 1, lty = 2, col = "red")
} # }