Estimate smoothing basis dimensions for GAM smoothers.

Estimates the smoothing basis dimension, k, for each smooth term.

Usage

# Default S3 method
estimate_smooth_basis(
  target,
  smooth_terms,
  df,
  regressors = NULL,
  k_values = 1:50,
  bs = "tp",
  kindex_thr = 0.95,
  pvalue_thr = 0.05,
  family = "auto",
  method = "fREML",
  discrete = TRUE,
  ...
)

# S3 method for class 'formula'
estimate_smooth_basis(
  formula,
  df,
  k_values = 1:50,
  bs = "tp",
  kindex_thr = 0.95,
  pvalue_thr = 0.05,
  family = "auto",
  method = "fREML",
  discrete = TRUE,
  ...
)

Arguments

target

The column name that encodes the metric to model.

smooth_terms

List or character vector of smooth terms whose basis dimension, k, should be estimated. See Details for example smooth term specifications.

formula

GAM formula. Can contain multiple smooth terms to estimate. See Details for more information.

df

The data frame containing the target, regressor, and smooth term variables.

regressors

Column name or list of column names to use as regressors. This list can also include smoothing terms. Default: NULL.

k_values

A vector of candidate k values, tried in order. Default: 1:50

bs

The default smoothing basis, used for smooth terms that do not specify bs. Default: "tp"

kindex_thr

The k-index threshold used by the stopping criterion (see Details). Default: 0.95

pvalue_thr

The p-value threshold used by the stopping criterion (see Details). Default: 0.05

family

Name or family function of the distribution used to model the GAM dependent variable. Default: "auto"

If a name, one of "auto", "beta", "gamma", or "gaussian".
If "auto", the best-fitting distribution is chosen automatically from mgcv::betar ("beta"), stats::Gamma ("gamma"), and stats::gaussian ("gaussian").
If a function, see family or family.mgcv for available family and extended.family class functions.

method

GAM fitting method passed to bam. Default: "fREML"

discrete

When method is "fREML", covariates can be discretized for storage and efficiency reasons. See bam for more information. Default: TRUE

...

Further named arguments passed to bam.

Value

A list containing two items:

	`est_terms`		List of smoothing terms with "best" estimated smoothing basis.
	`k_estimates`		A data frame that contains all estimated smooth terms and their corresponding k-index and p-values.

Details

Smooth terms specification

Smooth terms can be specified as:

s(x)
The smooth term will be estimated with the defaults from k_values and bs.

s(x, k = 1:10, bs = 'cp')
The smooth term will be estimated with k values 1 to 10 and a basis set of 'cp'.

s(x, by = group, k = c(2, 7))
The smooth term will be estimated with k values of 2 and 7, using the by variable group.

s(x, y, bs = 'fs', m = 3)
The smooth term over two variables, x and y, will be estimated with the default k_values and the additional arguments bs = 'fs' and m = 3.

Other mgcv smoothers, such as te, ti, and t2, are also supported.
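The rule for choosing each term's candidate k values can be sketched in base R. This is a minimal sketch, assuming the term is given as a string and that a `k = ...` argument inside it lists candidate values as in the specifications above; `term_k_values` is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper (not part of the package): return the candidate k values
# for one smooth term string. A `k = ...` argument inside the term takes
# precedence; otherwise the `k_values` default is used.
term_k_values <- function(smooth_term, k_values = 1:50) {
  term_call <- str2lang(smooth_term)  # parse the string into an unevaluated call
  k_expr <- term_call$k               # NULL when the term has no k argument
  if (is.null(k_expr)) k_values else eval(k_expr, envir = baseenv())
}

term_k_values("s(x)")                           # 1:50, the default
term_k_values("s(x, k = 1:10, bs = 'cp')")      # 1:10
term_k_values("s(x, by = group, k = c(2, 7))")  # c(2, 7)
```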

Estimation process

For each smooth_term, the function iteratively fits a GAM of the following form, stepping through k_values:

target ~ regressor_terms + smooth_term

where target is the dependent variable, regressor_terms are the additive effects that should be accounted for while estimating the smoothing term, and smooth_term is the smoothing term that is currently being estimated.
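A single iteration of this loop can be sketched with mgcv directly. This is a minimal sketch, assuming the k-index and p-value come from mgcv::k.check (the package may compute them differently); `build_k_formula` is a hypothetical helper, and the simulated data, including the reuse of the `nodeID` and `age` names, is made up for illustration.

```r
library(mgcv)  # bam() and k.check()

# Hypothetical helper (not part of the package): build
#   target ~ regressor_terms + smooth_term
# with the smooth term's k argument set to one candidate value.
build_k_formula <- function(target, smooth_term, k, regressors = NULL) {
  term_call <- str2lang(smooth_term)  # parse e.g. "s(nodeID)" into a call
  term_call$k <- k                    # add k = <k>, or overwrite an existing k
  term_text <- paste(deparse(term_call), collapse = " ")
  reformulate(c(regressors, term_text), response = target)
}

deparse(build_k_formula("fa", "s(nodeID, bs = 'tp')", k = 10,
                        regressors = c("age", "group")))
# "fa ~ age + group + s(nodeID, bs = \"tp\", k = 10)"

# Simulated data (made up for illustration): 5 subjects x 100 nodes
set.seed(1)
sim <- data.frame(nodeID = rep(0:99, times = 5),
                  age    = rep(runif(5, 20, 60), each = 100))
sim$fa <- 0.5 + 0.1 * sin(sim$nodeID / 15) + rnorm(nrow(sim), sd = 0.02)

# One iteration: fit with the documented defaults for method and discrete,
# then read the basis-dimension diagnostics for the smooth term
fit <- bam(build_k_formula("fa", "s(nodeID)", k = 10, regressors = "age"),
           data = sim, method = "fREML", discrete = TRUE)
kc <- k.check(fit)  # one row per smooth; columns k', edf, k-index, p-value
kc
```

Overwriting k on the parsed call, rather than editing the string, keeps any other arguments such as bs or by intact.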

Stopping criterion

The estimation process has two stopping criteria:

The procedure stops once the k-index exceeds kindex_thr AND the p-value exceeds pvalue_thr.
If the thresholds are never met, the procedure stops after running through all of the k_values and returns the term with the last k value.
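The stopping rule can be sketched on its own, separate from model fitting. This is a minimal sketch: `select_k` is a hypothetical helper that receives k-index and p-values already computed for each candidate k, whereas the function described here fits one model per k and stops as soon as both thresholds are exceeded. The numbers in the usage line are illustrative, not real fit output.

```r
# Hypothetical helper (not part of the package): apply the stopping rule to
# k-index and p-values listed in the order the k values were tried.
select_k <- function(k_values, kindex, pvalue,
                     kindex_thr = 0.95, pvalue_thr = 0.05) {
  stopifnot(length(k_values) == length(kindex),
            length(kindex) == length(pvalue))
  for (i in seq_along(k_values)) {
    # Criterion 1: both thresholds exceeded, so stop at this k
    if (kindex[i] > kindex_thr && pvalue[i] > pvalue_thr) {
      return(k_values[i])
    }
  }
  # Criterion 2: thresholds never met, so fall back to the last k tried
  k_values[length(k_values)]
}

# Illustrative numbers only
select_k(k_values = c(4, 6, 8, 10),
         kindex   = c(0.80, 0.93, 0.97, 0.99),
         pvalue   = c(0.00, 0.02, 0.31, 0.62))  # returns 8
```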

Examples

if (FALSE) { # \dontrun{
df_sarica <- read_afq_sarica(na_omit = TRUE)

# default specification method
estimate_smooth_basis(
  target       = "fa",
  smooth_terms = c("s(nodeID)", "s(nodeID, by = group, bs = 'fs')"),
  df           = df_sarica,
  regressors   = c("age", "group")
)

# formula specification method
estimate_smooth_basis(
  formula = fa ~ age + group + s(nodeID) + s(nodeID, by = group, bs = "fs"),
  df      = df_sarica
)} # }