Aggregate data into wide format by timestep and site
Aggregates observed values, observation dates, and site identifiers
from long format into wide format, where timesteps of regular frequency
(rows) are indexed to unique sites (columns) and populated with NAs where
no data are present. Generated timesteps can be of daily, weekly, monthly,
seasonal, or annual frequency. Data can be aggregated by min, max, mean,
median, or a user-specified quantile.
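As a rough illustration of the long-to-wide reshaping described above, the base-R sketch below grids a small hypothetical data frame at a monthly timestep using the median. It is not the package's implementation: the toy data, the variable names, and the helper make_agg() (which maps the documented agg_method options to functions, using stats::quantile() for the quantile case) are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): map an agg_method string
# to an aggregation function, mirroring the options documented for timestep_grid()
make_agg <- function(agg_method = "median", q_perc = NULL) {
  switch(agg_method,
    min = min,
    max = max,
    mean = mean,
    median = median,
    quantile = {
      if (is.null(q_perc) || q_perc < 0 || q_perc > 1) {
        stop("q_perc must be a numeric value from 0 to 1")
      }
      function(x) quantile(x, probs = q_perc, names = FALSE)
    },
    stop("unknown agg_method: ", agg_method)
  )
}

# Hypothetical long-format input with the three required columns
long <- data.frame(
  site_no = c("A", "A", "A", "B", "B"),
  date    = as.Date(c("2000-01-05", "2000-01-20", "2000-03-10",
                      "2000-01-15", "2000-02-01")),
  value   = c(1, 3, 5, 10, 20)
)

# Label each observation with the first date of its calendar month
month_start <- format(long$date, "%Y-%m-01")

# Regular monthly sequence spanning the first to the last observed month
steps <- seq(as.Date(min(month_start)), as.Date(max(month_start)), by = "month")

# Aggregate per (month, site); combinations with no data are filled with NA
wide <- tapply(long$value,
               list(factor(month_start, levels = format(steps)), long$site_no),
               make_agg("median"))

# Prepend the Date index column to form the wide grid
grid <- data.frame(timestep = steps, as.data.frame(wide), row.names = NULL)
grid
```

Site A has two January values (1 and 3), so its January cell holds their median, 2; February has no observation at site A and March none at site B, so those cells are NA.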
Usage
timestep_grid(
data,
timestep = "monthly",
agg_method = "median",
q_perc = NULL,
type_year = "calendar",
start_month = 1,
n_seasons = 4,
year_range = NULL,
months = c(1:12),
output_date = "first_date"
)
Arguments
- data
a data frame with at least three named columns that must include: site_no (character or numeric), date (character or Date), and value (numeric). The following character format orders, as defined by lubridate::parse_date_time(), are recognized for the date field: "ymd", "dmy", "mdy", and "ymd HMS", with or without separators.
- timestep
temporal frequency for data aggregation. Can be set to "daily", "weekly", "monthly", "seasonal", or "annual". Default is "monthly". If "seasonal" is selected, the type_year argument must be set to "water" and start_month must be set to a numeric value from 1 to 12.
- agg_method
data aggregation function. All data values with dates within a given timestep are aggregated by this function. Can be set to "min", "max", "mean", "median", or "quantile". Default is "median". If "quantile" is selected, the q_perc argument must be set to a numeric value from 0 to 1.
- q_perc
user-defined quantile used for data aggregation when agg_method is set to "quantile". Must be set to a numeric value from 0 to 1. Default is NULL.
- type_year
type of year used for data aggregation. Can be set to "calendar" or "water". Default is "calendar". Must be set to "water" if timestep is set to "seasonal".
- start_month
first month defining the water year. Must be set to a numeric value corresponding to a calendar month from 1 (January) to 12 (December). Default is 1.
- n_seasons
number of seasons. Seasons are defined by evenly dividing the months of the year, beginning with start_month, into n_seasons groups. Only months included in the months argument are considered, and the number of months considered must be divisible by n_seasons. Default is 4.
- year_range
two-element vector containing the first and last year by which to filter input dates. The year_range filter applies to calendar or water years depending on the type_year argument, but all output dates are calendar dates. Default is NULL, which does not filter the output by year.
- months
vector of calendar months to include in the output, ranging from 1 (January) to 12 (December). Months not included in the months argument are not considered during timestep discretization and aggregation. Default is c(1:12).
- output_date
date used as the timestep identifier. Can be set to "first_date" or "median_date", which labels each timestep by its first or median date, respectively. Default is "first_date".
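The season rules described under start_month, n_seasons, and months can be sketched as follows. The helper season_of() is hypothetical and not part of the package; it assumes seasons are formed by walking the retained months in water-year order from start_month and cutting them into n_seasons equal runs. How the package treats non-contiguous months selections is an assumption here.

```r
# Hypothetical helper: map calendar months to season indices within a water
# year that begins in `start_month`, splitting the retained months evenly
season_of <- function(month, start_month = 1, n_seasons = 4, months = 1:12) {
  # Calendar months reordered so the water year begins at start_month
  wy_order <- ((start_month - 1 + 0:11) %% 12) + 1
  # Keep only months listed in `months`, preserving water-year order
  kept <- wy_order[wy_order %in% months]
  if (length(kept) %% n_seasons != 0) {
    stop("number of months considered must be divisible by n_seasons")
  }
  season_len <- length(kept) / n_seasons
  # Position within the water year determines the season; excluded months give NA
  ceiling(match(month, kept) / season_len)
}

# Four seasons in a water year beginning in October:
# Oct-Dec -> 1, Jan-Mar -> 2, Apr-Jun -> 3, Jul-Sep -> 4
season_of(1:12, start_month = 10, n_seasons = 4)
#> [1] 2 2 2 3 3 3 4 4 4 1 1 1
```

Requiring the count of retained months to be divisible by n_seasons guarantees that every season spans the same number of months.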
Value
A data frame with one row per timestep and one column per unique site,
plus a leading timestep index column. The timestep index column is of R
class "Date" and all other columns are "numeric".
Each numeric value is the aggregate, computed with the function specified by
the agg_method argument, of all values observed at the indexed site during
the given timestep. Timesteps with no observed data at a given site are
populated with NAs. Column headers are derived from the unique values of the
site_no input field and are prefixed with "X." to prevent truncation of
numeric site identifiers.
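Because site columns carry an "X." prefix, downstream code may need to recover the original identifiers. The sketch below builds a small hypothetical grid shaped like the documented return value (the site numbers and values are invented) and strips the prefix with base R, assuming the prefix is exactly "X." as stated above.

```r
# Hypothetical grid shaped like the documented return value:
# a leading Date column followed by "X."-prefixed site columns
grid <- data.frame(
  timestep = as.Date(c("1975-01-01", "1975-02-01", "1975-03-01")),
  X.01234  = c(1.5, NA, 2.0),
  X.05678  = c(NA, NA, 3.2)
)

# Strip the "X." prefix to recover the original site identifiers
site_ids <- sub("^X\\.", "", names(grid)[-1])
site_ids

# Count timesteps with an observed (non-NA) aggregate at each site
n_obs <- colSums(!is.na(grid[-1]))
n_obs
```

Keeping identifiers as character strings preserves leading zeros, which is the truncation the prefix is meant to avoid.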
Details
All dates in the timestep column of the returned data frame are indexed by
calendar year, even if data are aggregated by water year.
If year_range is not specified, sequential timesteps are generated spanning
the earliest to the latest timestep containing an observed value.
Sub-daily gridding (e.g., hours, minutes, seconds) is not currently available,
but the impute_grid function accepts user-formatted grids of such data
without a leading timestep column, in which case timesteps in the model output
are indexed by sequential integers.
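To see how a water-year timestep can carry a calendar date, consider the calendar date on which the water year containing an observation begins. The helper water_year_start() below is hypothetical and not part of the package. Its results are consistent with the annual water-year example under Examples, whose first timestep is 1974-10-01 even though the monthly grid starts in 1975.

```r
# Hypothetical helper (not part of the package): calendar date on which the
# water year containing `date` begins, for a water year starting in start_month
water_year_start <- function(date, start_month = 10) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # Months before start_month belong to a water year that began in the
  # previous calendar year
  start_yr <- ifelse(mo >= start_month, yr, yr - 1L)
  as.Date(sprintf("%d-%02d-01", start_yr, as.integer(start_month)))
}

water_year_start(as.Date(c("1975-01-15", "1975-09-30", "1975-10-03")))
#> [1] "1974-10-01" "1974-10-01" "1975-10-01"
```

With start_month = 1 every date maps to January 1 of its own year, so water years reduce to calendar years.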
Author
Timothy J. Stagnitta, Zeno F. Levy
Maintainer: Zeno F. Levy <zlevy@usgs.gov>
Examples
# load example Long Island dataset
data(LI_data)
# aggregate data at monthly timestep using median observed values
grid <- timestep_grid(data = LI_data,
timestep = "monthly",
agg_method = "median")
# view first five timesteps
grid$timestep[1:5]
#> [1] "1975-01-01" "1975-02-01" "1975-03-01" "1975-04-01" "1975-05-01"
# output median dates for timestep indices and apply year range filter
grid <- timestep_grid(data = LI_data,
timestep = "monthly",
agg_method = "median",
output_date = "median_date",
year_range = c(1990, 2000))
# view first five timesteps
grid$timestep[1:5]
#> [1] "1990-01-16" "1990-02-14" "1990-03-16" "1990-04-15" "1990-05-16"
# aggregate data by water year beginning in October using median observed values
grid <- timestep_grid(data = LI_data,
timestep = "annual",
agg_method = "median",
type_year = "water",
start_month = 10)
# view first five timesteps
grid$timestep[1:5]
#> [1] "1974-10-01" "1975-10-01" "1976-10-01" "1977-10-01" "1978-10-01"
# aggregate data seasonally by four-season water year beginning in October
grid <- timestep_grid(data = LI_data,
timestep = "seasonal",
n_seasons = 4,
agg_method = "median",
type_year = "water",
start_month = 10)
# view first five timesteps
grid$timestep[1:5]
#> [1] "1975-01-01" "1975-04-01" "1975-07-01" "1975-10-01" "1976-01-01"