Create random holdout data in timestep grid
hold_grid.Rd
Randomly converts a proportion `p` of the observed values in a timestep grid to `NA` (not available, i.e. missing) values. The leading timestep index column is disregarded.
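The holdout described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: `hold_grid_sketch()` is a hypothetical name, and the sketch assumes that exactly `round(p * n)` of the `n` observed site values are drawn without replacement. The realized proportion in the example output on this page (about 0.0492 for `p = 0.05`) suggests the packaged function's sampling rule may differ in detail.

```r
# Hypothetical sketch of a random holdout on a timestep grid (not the package code).
# Assumptions: column 1 is the timestep index and is never modified;
# exactly round(p * n_observed) observed site values are set to NA.
hold_grid_sketch <- function(input_grid, p = 0.05) {
  vals <- as.matrix(input_grid[, -1, drop = FALSE])  # site columns only
  observed <- which(!is.na(vals))                    # linear indices of observed cells
  n_hold <- round(p * length(observed))
  # index into `observed` via sample.int(); sample(observed, ...) would
  # misbehave if `observed` had length 1 (it would sample from 1:observed)
  idx <- observed[sample.int(length(observed), n_hold)]
  vals[idx] <- NA
  input_grid[-1] <- as.data.frame(vals)              # write site columns back
  input_grid
}

# Usage on a toy grid: 19 observed site values, p = 0.2 -> round(3.8) = 4 held out
set.seed(1)
toy <- data.frame(timestep = 1:10,
                  site_A = c(1:9, NA),
                  site_B = 11:20)
held <- hold_grid_sketch(toy, p = 0.2)
sum(!is.na(toy[, -1])) - sum(!is.na(held[, -1]))
```

Sampling linear indices of the site matrix keeps the draw uniform over all observed cells, regardless of which site or timestep they belong to.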
Arguments
- input_grid
Data frame populated with observed numeric values and `NA`s. Sites (variables) are included as columns and timesteps (observations) are included as rows. A leading `timestep` index column is not assessed for holdouts.
- p
Proportion of observed numeric values to randomly transform to `NA`s. Default is `0.05`.
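To make the expected layout of `input_grid` concrete, here is a hypothetical toy grid. The index column name `timestep`, the monthly dates, and the site names are assumptions for illustration; only the layout (index column first, one numeric column per site) comes from the argument description above.

```r
# Hypothetical toy grid: leading index column, then one numeric column per site.
toy_grid <- data.frame(
  timestep = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  site_A   = c(3.1, NA, 2.8, 3.0, NA, 2.9),
  site_B   = c(NA, 5.2, 5.0, NA, 4.7, 4.9)
)

# Only the site columns (everything after column 1) are eligible for holdout.
n_observed <- sum(!is.na(toy_grid[, -1]))
n_observed          # 8 observed values across both sites

# Nominal number of values targeted for p = 0.25
n_observed * 0.25   # 2
```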
Author
Maintainer: Zeno F. Levy zlevy@usgs.gov
Examples
# load example Long Island dataset
data(LI_data)
# aggregate data at monthly timestep using median observed values
grid <- timestep_grid(data = LI_data,
timestep = "monthly",
agg_method = "median")
# holdout random 5 percent of observed values
hold <- hold_grid(input_grid = grid, p = 0.05)
# number of observed values in original grid (not counting timestep index column)
sum(!is.na(grid[,-1]))
#> [1] 78473
# number of observed values in holdout grid (not counting timestep index column)
sum(!is.na(hold[,-1]))
#> [1] 74615
# compute proportion of observed values converted to `NA`s
(sum(!is.na(grid[,-1])) - sum(!is.na(hold[,-1]))) / sum(!is.na(grid[,-1]))
#> [1] 0.04916341
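The proportion computed in the last step can be wrapped in a small checking helper that also confirms the holdout touched nothing else. `holdout_summary()` is a hypothetical function written for illustration, and the toy grids below stand in for the `grid` and `hold` objects above so the block runs without the package or `LI_data`.

```r
# Hypothetical helper, not part of the package: compares an original grid
# with its held-out copy, ignoring the leading index column.
holdout_summary <- function(original, held) {
  orig_vals <- as.matrix(original[, -1, drop = FALSE])
  held_vals <- as.matrix(held[, -1, drop = FALSE])
  was_observed <- !is.na(orig_vals)
  now_missing  <- is.na(held_vals)
  held_out     <- was_observed & now_missing   # observed before, missing now
  list(
    n_observed = sum(was_observed),
    n_held_out = sum(held_out),
    proportion = sum(held_out) / sum(was_observed),
    # the index column should pass through untouched
    index_unchanged = identical(original[[1]], held[[1]]),
    # every value still present in `held` must equal the original value
    values_unchanged = isTRUE(all.equal(orig_vals[!now_missing],
                                        held_vals[!now_missing]))
  )
}

# Toy example: 9 observed site values, one of which is held out by hand.
grid_toy <- data.frame(timestep = 1:5,
                       site_A = c(1, 2, NA, 4, 5),
                       site_B = c(6, 7, 8, 9, 10))
hold_toy <- grid_toy
hold_toy$site_B[2] <- NA
s <- holdout_summary(grid_toy, hold_toy)
s$proportion   # 1/9, about 0.111
```

Because it counts held-out cells directly (observed originally, missing now), this check would also flag a holdout step that accidentally filled an existing `NA`, which the difference-of-counts calculation alone would not reveal.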