Estimate smoothing basis dimensions for GAM smoothers.
Source: R/fit-models.R
Function used to estimate the smoothing basis dimension, k, for each smooth term.
Usage
# Default S3 method
estimate_smooth_basis(
  target,
  smooth_terms,
  df,
  regressors = NULL,
  k_values = 1:50,
  bs = "tp",
  kindex_thr = 0.95,
  pvalue_thr = 0.05,
  family = "auto",
  method = "fREML",
  discrete = TRUE,
  ...
)
# S3 method for class 'formula'
estimate_smooth_basis(
  formula,
  df,
  k_values = 1:50,
  bs = "tp",
  kindex_thr = 0.95,
  pvalue_thr = 0.05,
  family = "auto",
  method = "fREML",
  discrete = TRUE,
  ...
)
Arguments
- target
The column name that encodes the metric to model.
- smooth_terms
List of smooth terms whose smoothing basis should be estimated. See Details for examples of smooth terms.
- formula
GAM formula. Can contain multiple smooth terms to estimate. See Details for more information.
- df
The data frame that contains the GAM metrics.
- regressors
Column name or list of column names to use as regressors. This list can also include smoothing terms. Default: NULL.
- k_values
A list of k values to consider. Default: 1:50
- bs
The name of the default smoothing basis. Default: "tp"
- kindex_thr
The k-index threshold. Default: 0.95
- pvalue_thr
The p-value threshold. Default: 0.05
- family
Name or family function of the distribution to use for modeling the GAM dependent variable.
If a name, the possible values are "auto", "beta", "gamma", and "gaussian".
If "auto", the function will automatically determine the distribution of best fit among mgcv::betar ("beta"), stats::Gamma ("gamma"), and stats::gaussian ("gaussian").
If a function, see family or family.mgcv for more family or extended.family class functions.
- method
GAM fitting method passed to bam. Default: "fREML"
- discrete
When method is "fREML", it is possible to discretize covariates for storage and efficiency reasons. See bam for more information. Default: TRUE
- ...
Further keyword arguments to be passed to bam
Value
A list containing two items:
- est_terms
List of smoothing terms with the "best" estimated smoothing basis.
- k_estimates
A data frame that contains all estimated smoothing terms and their corresponding k-index and p-values.
Details
Smooth terms specification
Smooth terms can be specified as:
- s(x)
The smooth term will be estimated with the defaults from k_values and bs.
- s(x, k = 1:10, bs = 'cp')
The smooth term will be estimated with k values 1 to 10 and a basis set of 'cp'.
- s(x, by = group, k = c(2, 7))
The smooth term will be estimated with k values of 2 and 7, using the by variable group.
- s(x, y, bs = 'fs', m = 3)
The smooth term over two variables, x and y, will be estimated with the default k_values, with the additional arguments of a basis set of 'fs' and m of 3.
Not shown are other mgcv smoothers, such as te, ti, and t2, which are also available.
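As a rough illustration of what k controls in a smooth term, the sketch below builds a thin plate regression spline basis directly with mgcv::smoothCon and inspects its dimension. The data frame toy and its column x are made up for this example and are not part of this package.

```r
library(mgcv)

set.seed(42)
# Hypothetical toy data; any numeric covariate would do.
toy <- data.frame(x = runif(100))

# Construct the basis for s(x, k = 10, bs = "tp") without fitting a model.
sm <- smoothCon(s(x, k = 10, bs = "tp"), data = toy)[[1]]

basis_dim <- sm$bs.dim   # basis dimension requested through k
n_columns <- ncol(sm$X)  # number of columns in the smooth's model matrix
```

A larger k gives the smooth more basis functions and therefore more flexibility, which is why each candidate in k_values has to be checked against the data.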
Estimation process
For each smooth_term, the function will iteratively fit a GAM model following the formula below while incrementing through k_values:
target ~ regressor_terms + smooth_term
where target is the dependent variable, regressor_terms are the additive effects that should be accounted for while estimating the smoothing term, and smooth_term is the smoothing term that is currently being estimated.
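The loop described here can be sketched with mgcv alone. This is a minimal illustration under stated assumptions, not the package's implementation: the data come from mgcv::gamSim rather than a real data set, the candidate k range is shortened, and the rule for picking the "best" k from the thresholds is assumed, since this page does not spell it out.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in data (columns y, x0, x2, ...), not this package's data.
dat <- gamSim(1, n = 400, verbose = FALSE)

k_values   <- 4:12   # candidate basis dimensions, shortened for speed
kindex_thr <- 0.95
pvalue_thr <- 0.05

# Fit target ~ regressor_terms + smooth_term once per candidate k and
# record the k-index and p-value reported by mgcv::k.check().
fits <- lapply(k_values, function(k) {
  f   <- as.formula(sprintf("y ~ x0 + s(x2, k = %d, bs = 'tp')", k))
  fit <- bam(f, data = dat, method = "fREML", discrete = TRUE)
  chk <- k.check(fit)
  data.frame(k = k,
             k_index = chk[1, "k-index"],
             p_value = chk[1, "p-value"])
})
k_estimates <- do.call(rbind, fits)

# ASSUMED selection rule (hypothetical): take the first k whose k-index or
# p-value clears its threshold, falling back to the largest candidate.
ok     <- k_estimates$k_index >= kindex_thr | k_estimates$p_value >= pvalue_thr
best_k <- if (any(ok)) k_estimates$k[which(ok)[1]] else max(k_values)
```

The table k_estimates built here is only a stand-in for the k_estimates item described under Value; its column names are chosen for this sketch.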
Examples
if (FALSE) { # \dontrun{
df_sarica <- read_afq_sarica(na_omit = TRUE)
# default specification method
estimate_smooth_basis(
  target = "fa",
  smooth_terms = c("s(nodeID)", "s(nodeID, by = group, bs = 'fs')"),
  df = df_sarica,
  regressors = c("age", "group")
)
# formula specification method
estimate_smooth_basis(
  formula = fa ~ age + group + s(nodeID) + s(nodeID, by = "group", bs = "fs"),
  df = df_sarica
)
} # }