Randomly replaces a proportion p of the observed values in a timestep grid with NA (not available). The timestep index column is ignored.

Usage

hold_grid(input_grid, p = 0.05)

Arguments

input_grid

data frame populated with observed numeric values and NAs. Sites (variables) are included as columns and timesteps (observations) are included as rows. A leading timestep index column is not assessed for holdouts.

p

proportion of observed numeric values to randomly transform to NAs. Default is 0.05.

Value

data frame with p proportion of observed values from input_grid randomly coerced to NAs.
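
The behavior described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name hold_sketch, the rounding rule for the number of holdouts, and the toy data frame are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): hold out a proportion p
# of observed cells, leaving the first (timestep index) column untouched.
hold_sketch <- function(input_grid, p = 0.05) {
  values <- as.matrix(input_grid[, -1, drop = FALSE])
  observed <- which(!is.na(values))        # linear indices of observed cells
  n_hold <- round(p * length(observed))    # assumed rounding rule
  # sample positions through seq_along() to avoid sample()'s scalar shortcut
  drop_idx <- observed[sample(seq_along(observed), n_hold)]
  values[drop_idx] <- NA
  out <- input_grid
  out[-1] <- as.data.frame(values)
  out
}

# Toy grid: 17 observed values across two site columns
set.seed(1)
toy <- data.frame(timestep = 1:10,
                  site_a = c(1, 2, NA, 4, 5, 6, 7, 8, 9, 10),
                  site_b = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA))
toy_hold <- hold_sketch(toy, p = 0.2)

# number of values turned into NA
n_removed <- sum(!is.na(toy[, -1])) - sum(!is.na(toy_hold[, -1]))
n_removed
```

Because the count of holdouts is rounded to a whole number of cells in this sketch, the realized proportion can differ slightly from p on small grids.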

Author

Maintainer: Zeno F. Levy zlevy@usgs.gov

Examples

# load example Long Island dataset
  data(LI_data)

# aggregate data at monthly timestep using median observed values
  grid <- timestep_grid(data = LI_data, 
                        timestep = "monthly", 
                        agg_method = "median")
                        
# hold out a random 5 percent of observed values
  hold <- hold_grid(input_grid = grid, p = 0.05)

# number of observed values in original grid (not counting timestep index column)   
  sum(!is.na(grid[,-1]))
#> [1] 78473

# number of observed values in holdout grid (not counting timestep index column)
  sum(!is.na(hold[,-1]))
#> [1] 74615

# compute proportion of values coerced to `NA`s  
  (sum(!is.na(grid[,-1])) - sum(!is.na(hold[,-1]))) / sum(!is.na(grid[,-1]))
#> [1] 0.04916341
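
A reader may also want to know which cells were held out, not only how many. The sketch below shows one way to find them in base R by comparing the original grid with the holdout grid. It uses small made-up data frames (original, held) in place of the package output, so the names and values are illustrative only.

```r
# Toy stand-ins for an original grid and its holdout version
original <- data.frame(timestep = 1:4,
                       site_a = c(1.0, NA, 3.0, 4.0),
                       site_b = c(5.0, 6.0, 7.0, NA))
held <- original
held$site_a[3] <- NA   # pretend these two cells were held out
held$site_b[1] <- NA

# TRUE where a value was observed originally but is NA after the holdout
was_held <- !is.na(original[, -1]) & is.na(held[, -1])

# row/column positions (within the site columns) of held-out cells
which(was_held, arr.ind = TRUE)

# original values at those positions
held_values <- as.matrix(original[, -1])[was_held]
held_values
```

Cells that were already NA in the original grid are excluded by the first condition, so only true holdouts are flagged.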