Fits the Bayesian kernel machine regression (BKMR) model using Markov chain Monte Carlo (MCMC) methods.
Usage
kmbayes(
  y,
  Z,
  X = NULL,
  iter = 1000,
  family = "gaussian",
  id = NULL,
  verbose = TRUE,
  Znew = NULL,
  starting.values = NULL,
  control.params = NULL,
  varsel = FALSE,
  groups = NULL,
  knots = NULL,
  ztest = NULL,
  rmethod = "varying",
  est.h = FALSE
)
Arguments
- y: a vector of outcome data of length n.
- Z: an n-by-M matrix of predictor variables to be included in the h function. Each row represents an observation and each column represents a predictor.
- X: an n-by-K matrix of covariate data where each row represents an observation and each column represents a covariate. Should not contain an intercept column.
- iter: number of iterations to run the sampler.
- family: a description of the error distribution and link function to be used in the model. Currently implemented for the gaussian and binomial families.
- id: optional vector (of length n) of grouping factors for fitting a model with a random intercept. If NULL, no random intercept will be included.
- verbose: TRUE or FALSE: flag indicating whether to print intermediate diagnostic information during the model fitting.
- Znew: optional matrix of new predictor values at which to predict h, where each row represents a new observation. This will slow down the model fitting; prediction can instead be done as a post-processing step using SamplePred.
- starting.values: list of starting values for each parameter. If not specified, default values will be chosen.
- control.params: list of parameters specifying the prior distributions and tuning parameters for the MCMC algorithm. If not specified, default values will be chosen.
- varsel: TRUE or FALSE: indicator for whether to conduct variable selection on the Z variables in h.
- groups: optional vector (of length M) of group indicators for fitting hierarchical variable selection when varsel = TRUE (see the sketch after this list). If varsel = TRUE and no groups are specified, component-wise variable selection will be performed.
- knots: optional matrix of knot locations for implementing the Gaussian predictive process of Banerjee et al. (2008), also illustrated in the sketch after this list. Currently only implemented for models without a random intercept.
- ztest: optional vector indicating on which variables in Z to conduct variable selection (the remaining variables will be forced into the model).
- rmethod: for those predictors being forced into the h function, the method for sampling the r[m] values. Takes the value 'varying' to allow a separate r[m] for each predictor; 'equal' to force the same r[m] for all predictors; or 'fixed' to fix the r[m] at their starting values.
- est.h: TRUE or FALSE: indicator for whether to sample from the posterior distribution of the subject-specific effects h_i within the main sampler. This will slow down the model fitting.
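As a minimal sketch (an illustration using simulated data as in the Examples below, not part of the original documentation), hierarchical variable selection and the Gaussian predictive process can be requested via the groups and knots arguments; the grouping and the knot design here are assumed, illustrative choices:

library(bkmr)
set.seed(111)
dat <- SimData(n = 50, M = 4)

## Hierarchical variable selection: columns 1-2 of Z form group 1,
## columns 3-4 form group 2 (an assumed grouping for illustration)
fit_grp <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X, iter = 100,
                   varsel = TRUE, groups = c(1, 1, 2, 2), verbose = FALSE)

## Gaussian predictive process: use a random subset of observed exposure
## rows as knot locations (any space-filling design would also work)
knots10 <- dat$Z[sample(nrow(dat$Z), 10), ]
fit_gpp <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X, iter = 100,
                   knots = knots10, verbose = FALSE)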
Value
an object of class "bkmrfit" (containing the posterior samples from the model fit), which has the associated methods:
- print (i.e., print.bkmrfit)
- summary (i.e., summary.bkmrfit)
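For instance (a usage sketch, assuming a fitted object fitkm as created in the Examples below):

print(fitkm)    # dispatches to print.bkmrfit
summary(fitkm)  # dispatches to summary.bkmrfit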
References
Bobb, JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA (2015). Bayesian Kernel Machine Regression for Estimating the Health Effects of Multi-Pollutant Mixtures. Biostatistics 16, no. 3: 493-508.
Banerjee S, Gelfand AE, Finley AO, Sang H (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.
See also
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
Examples
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically one should use many more iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
#> Iteration: 10 (10% completed; 0.00501 secs elapsed)
#> Iteration: 20 (20% completed; 0.01019 secs elapsed)
#> Iteration: 30 (30% completed; 0.01542 secs elapsed)
#> Iteration: 40 (40% completed; 0.02071 secs elapsed)
#> Iteration: 50 (50% completed; 0.02588 secs elapsed)
#> Iteration: 60 (60% completed; 0.03137 secs elapsed)
#> Iteration: 70 (70% completed; 0.03656 secs elapsed)
#> Iteration: 80 (80% completed; 0.04174 secs elapsed)
#> Iteration: 90 (90% completed; 0.04691 secs elapsed)
#> Iteration: 100 (100% completed; 0.05233 secs elapsed)
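As a follow-up sketch (not part of the original example), the posterior inclusion probabilities from the variable selection can be extracted with ExtractPIPs, and h can be predicted at new exposure profiles as a post-processing step with SamplePred, as mentioned under the Znew argument; the Znew values below are an illustrative choice:

## Posterior inclusion probabilities from the component-wise selection
ExtractPIPs(fitkm)

## Predict h at new exposure profiles (here, the median and 75th
## percentile of each exposure) instead of supplying Znew to kmbayes
Znew <- rbind(apply(Z, 2, median), apply(Z, 2, quantile, 0.75))
h.pred <- SamplePred(fitkm, Znew = Znew, Xnew = cbind(0))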