
Trims a data frame by column and row completeness. Can be used directly on output from the timestep_grid function or on any other generic data frame. Automatically removes rows (timesteps) with no observations and can optionally remove near-zero variance site records (columns).
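The completeness trimming described above can be sketched in base R. The function below, trim_grid_sketch, is a hypothetical illustration and not the package's implementation. It assumes that the first column is the timestep index (excluded from completeness), that sites are trimmed before timesteps (this page does not state the order trim_grid uses), and it omits near-zero variance removal.

```r
# Hypothetical sketch of completeness-based trimming (not the package's code).
# Assumptions: column 1 is a timestep index that is excluded from completeness,
# sites are trimmed before timesteps, and near-zero variance removal is omitted.
trim_grid_sketch <- function(input_grid, data_thresh = 0.5, time_thresh = 0) {
  index_col <- input_grid[, 1, drop = FALSE]
  sites <- input_grid[, -1, drop = FALSE]

  # Proportion of non-NA values in each site column; drop columns below threshold
  site_complete <- colMeans(!is.na(sites))
  sites <- sites[, site_complete >= data_thresh, drop = FALSE]

  # Proportion of non-NA values in each timestep row, over the remaining sites.
  # Empty rows (proportion 0, or NaN when no sites remain) are always dropped.
  time_complete <- rowMeans(!is.na(sites))
  keep_rows <- !is.na(time_complete) & time_complete > 0 &
    time_complete >= time_thresh

  cbind(index_col[keep_rows, , drop = FALSE], sites[keep_rows, , drop = FALSE])
}

# Toy grid: a timestep index plus three sites with 75%, 25% and 50% coverage
grid <- data.frame(
  date   = c("2020-01", "2020-02", "2020-03", "2020-04"),
  site_a = c(1, 2, NA, 4),
  site_b = c(NA, NA, NA, 5),
  site_c = c(3, NA, NA, 6)
)

trim_grid_sketch(grid, data_thresh = 0.5)
```

In this toy case, site_b (25 percent complete) falls below data_thresh = 0.5 and is dropped, site_c (exactly 50 percent) is kept because only columns strictly below the threshold are removed, and the 2020-03 row is dropped because it has no observations in the remaining sites.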

Usage

trim_grid(input_grid, data_thresh = 0.5, time_thresh = 0, rm_nzv = T)

Arguments

input_grid

data frame of observed numeric values and NAs, with sites (variables) as columns and timesteps (observations) as rows. A leading timestep index column is excluded when assessing row completeness.

data_thresh

site completeness threshold. Removes columns (sites) whose proportion of observations (non-NA values) across rows (timesteps) is less than this value. Default is 0.5.

time_thresh

timestep completeness threshold. Removes rows (timesteps) whose proportion of observations (non-NA values) across columns (sites) is less than this value. Default is 0, so only empty timesteps are removed.

rm_nzv

logical flag to remove near-zero variance columns (sites). Uses nearZeroVar default settings to select columns for removal. Default is TRUE.
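If nearZeroVar here refers to caret::nearZeroVar (an assumption; this page does not name the package), its default settings (freqCut = 95/5, uniqueCut = 10) flag a column that has a single distinct value, or one where the ratio of the most common to the second most common value exceeds 95/5 while distinct values make up at most 10 percent of rows. The base-R function below, nzv_cols_sketch, is a hypothetical approximation of those criteria for previewing the effect of rm_nzv = TRUE; it is not the package's implementation, and caret's handling of edge cases may differ.

```r
# Hypothetical base-R approximation of the default near-zero variance criteria
# documented for caret::nearZeroVar (freqCut = 95/5, uniqueCut = 10).
# Not the package's implementation; caret's edge-case handling may differ.
nzv_cols_sketch <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- vapply(df, function(x) {
    counts <- sort(table(x[!is.na(x)]), decreasing = TRUE)
    # A single distinct value (or no observations) means zero variance
    if (length(counts) <= 1) return(TRUE)
    freq_ratio <- counts[[1]] / counts[[2]]          # most / second most common
    pct_unique <- 100 * length(counts) / length(x)   # distinct values as % of rows
    freq_ratio > freq_cut && pct_unique <= unique_cut
  }, logical(1))
  names(df)[flagged]
}

# Toy site columns (timestep index already excluded)
sites <- data.frame(
  varied   = seq_len(100),
  constant = rep(5, 100),
  spiky    = c(rep(0, 96), 1, 1, 2, 3)
)

flagged <- nzv_cols_sketch(sites)
trimmed <- sites[, setdiff(names(sites), flagged), drop = FALSE]
```

Here constant is flagged for zero variance, and spiky is flagged because 96/2 = 48 exceeds 95/5 = 19 while only 4 of its 100 values are distinct.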

Value

Returns the input_grid data frame with the selected rows and columns removed.

Author

Maintainer: Zeno F. Levy zlevy@usgs.gov

Examples

# load example Long Island dataset
  data(LI_data)

# grid data at monthly timestep using median observed values
  grid <- timestep_grid(data = LI_data, 
                        timestep = "monthly", 
                        agg_method = "median")

# trim grid to remove sites that are less than 35 percent complete
  grid <- trim_grid(grid, data_thresh = 0.35)
#> 339 site(s) removed with proportion complete less than site-data threshold of 0.35