Create random holdout data in timestep grid
Randomly converts a proportion p of the observed values in a timestep grid to NA (not available, i.e. missing) values. The timestep index column is disregarded.
Arguments
- input_grid
- Data frame populated with observed numeric values and NAs. Sites (variables) are included as columns and timesteps (observations) as rows. A leading timestep index column is not assessed for holdouts.
- p
- Proportion of observed numeric values to randomly convert to NAs. Default is 0.05.
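The holdout described above can be approximated in a few lines of base R. What follows is a minimal sketch under stated assumptions, not the package's actual hold_grid() source. The function name hold_grid_sketch() is hypothetical. Column 1 is assumed to be the timestep index and all remaining columns numeric. The number of held-out cells is assumed to be round(p * n_observed), drawn without replacement. The package's own sampling scheme may differ: in the Examples section, 4.9 percent rather than exactly 5 percent of the observed values were removed.

```r
# Hypothetical sketch of a random holdout; NOT the package's hold_grid() source.
# Assumes column 1 is the timestep index and all other columns are numeric sites.
hold_grid_sketch <- function(input_grid, p = 0.05) {
  stopifnot(is.data.frame(input_grid), p >= 0, p <= 1)
  values <- as.matrix(input_grid[, -1, drop = FALSE])  # site columns only
  observed <- which(!is.na(values))                     # linear indices of observed cells
  n_hold <- round(p * length(observed))                 # assumed rounding rule
  held <- observed[sample.int(length(observed), n_hold)] # drawn without replacement
  values[held] <- NA                                    # blank out the sampled cells
  out <- input_grid
  out[-1] <- as.data.frame(values)                      # timestep index left untouched
  out
}

# Usage on a toy grid: 6 observed site values, half of them held out.
set.seed(42)
toy <- data.frame(timestep = 1:4,
                  site_a = c(1, 2, NA, 4),
                  site_b = c(5, NA, 7, 8))
held <- hold_grid_sketch(toy, p = 0.5)
sum(!is.na(toy[, -1]))
#> [1] 6
sum(!is.na(held[, -1]))
#> [1] 3
```

Sampling only among the indices of observed cells, rather than among all cells, means that pre-existing NAs never count toward p and every sampled cell actually changes from a value to NA.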
Author
Maintainer: Zeno F. Levy zlevy@usgs.gov
Examples
# load example Long Island dataset
  data(LI_data)
# aggregate data at monthly timestep using median observed values
  grid <- timestep_grid(data = LI_data, 
                        timestep = "monthly", 
                        agg_method = "median")

# hold out a random 5 percent of observed values
  hold <- hold_grid(input_grid = grid, p = 0.05)
# number of observed values in original grid (not counting timestep index column)
  sum(!is.na(grid[,-1]))
#> [1] 78473
# number of observed values in holdout grid (not counting timestep index column)
  sum(!is.na(hold[,-1]))
#> [1] 74615
# compute proportion of values coerced to `NA`s
  (sum(!is.na(grid[,-1])) - sum(!is.na(hold[,-1]))) / sum(!is.na(grid[,-1]))
#> [1] 0.04916341
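When a holdout grid is used to test a gap-filling or imputation method, the held-out cells are those observed in the original grid but NA in the holdout grid. A minimal base-R sketch of that mask follows. The grid and hold objects here are small hypothetical stand-ins built by hand (not LI_data output), but the same masking lines apply to the grid and hold objects created in the example above. There, because hold_grid() only converts observed values to NA, sum(held_out) should equal 78473 - 74615 = 3858.

```r
# Hypothetical stand-ins for `grid` and `hold` (not LI_data output).
grid <- data.frame(timestep = 1:3,
                   site_a = c(1.2, NA, 3.4),
                   site_b = c(5.6, 7.8, 9.0))
hold <- grid
hold$site_b[2] <- NA  # pretend this observed value was held out

# Drop the timestep index column and compare cell by cell.
grid_vals <- as.matrix(grid[, -1])
hold_vals <- as.matrix(hold[, -1])
held_out <- !is.na(grid_vals) & is.na(hold_vals)

sum(held_out)        # number of held-out cells
#> [1] 1
grid_vals[held_out]  # original values, e.g. to compare against imputed ones
#> [1] 7.8
```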