Trim timestep grid by column and row completeness
trim_grid.Rd
Trims data frame by column and row completeness. Can be used directly on output from the timestep_grid function or any other generic data frame object.
Automatically removes rows (timesteps) with no observations.
Can remove near-zero variance site records (columns).
Arguments
- input_grid
data frame populated with observed, numeric values and NAs. Sites (variables) are included as columns and timesteps (observations) are included as rows. A leading timestep index column is not included when assessing row completeness.
- data_thresh
site completeness threshold. Removes columns (sites) with less than this proportion of observations (non-NA values) in constituent rows (timesteps). Default is 0.5.
- time_thresh
timestep completeness threshold. Removes rows (timesteps) with less than this proportion of observations (non-NA values) in constituent columns (sites). Default is 0 (only empty timesteps are removed).
- rm_nzv
logical flag to remove near-zero variance columns (sites). Uses nearZeroVar default settings to select columns for removal. Default is TRUE.
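The two completeness thresholds can be illustrated with a short base R sketch. This is not the package's implementation: the helper trim_sketch and the toy data frame are hypothetical, the order of operations (columns trimmed first, then rows) is an assumption, and the near-zero variance step is left out.

```r
# Toy grid: a leading timestep index column plus three site columns.
toy <- data.frame(
  timestep = 1:4,
  site_a = c(1.2, NA, 1.4, NA),
  site_b = c(NA, NA, NA, 2.0),
  site_c = c(3.1, NA, 3.3, 3.4)
)

# Hypothetical helper (not the package's code): drop sparse columns, then sparse rows.
trim_sketch <- function(input_grid, data_thresh = 0.5, time_thresh = 0) {
  # Exclude the leading timestep index column from the completeness checks.
  sites <- input_grid[, -1, drop = FALSE]

  # Proportion of non-NA values per site; keep sites at or above data_thresh.
  col_complete <- colMeans(!is.na(sites))
  sites <- sites[, col_complete >= data_thresh, drop = FALSE]

  # Proportion of non-NA values per timestep, computed over the retained sites.
  # Empty timesteps are always dropped; others must reach time_thresh.
  row_complete <- rowMeans(!is.na(sites))
  keep_rows <- row_complete > 0 & row_complete >= time_thresh

  cbind(input_grid[keep_rows, 1, drop = FALSE], sites[keep_rows, , drop = FALSE])
}

trimmed <- trim_sketch(toy, data_thresh = 0.5)
```

In this toy case site_b is 25 percent complete and is dropped, while site_a sits exactly at the 0.5 threshold and is kept because only columns with less than the threshold proportion are removed. The second timestep has no observations in the remaining sites, so it is dropped even with time_thresh = 0.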
Author
Maintainer: Zeno F. Levy zlevy@usgs.gov
Examples
# load example Long Island dataset
data(LI_data)
# grid data at monthly timestep using median observed values
grid <- timestep_grid(data = LI_data,
                      timestep = "monthly",
                      agg_method = "median")
# trim grid to remove sites that are less than 35 percent complete
grid <- trim_grid(grid, data_thresh = 0.35)
#> 339 site(s) removed with proportion complete less than site-data threshold of 0.35