Fits the Bayesian kernel machine regression (BKMR) model using Markov chain Monte Carlo (MCMC) methods.
Usage
kmbayes(
  y,
  Z,
  X = NULL,
  iter = 1000,
  family = "gaussian",
  id = NULL,
  verbose = TRUE,
  Znew = NULL,
  starting.values = NULL,
  control.params = NULL,
  varsel = FALSE,
  groups = NULL,
  knots = NULL,
  ztest = NULL,
  rmethod = "varying",
  est.h = FALSE
)
Arguments
- y: a vector of outcome data of length n.
- Z: an n-by-M matrix of predictor variables to be included in the h function. Each row represents an observation and each column represents a predictor.
- X: an n-by-K matrix of covariate data where each row represents an observation and each column represents a covariate. Should not contain an intercept column.
- iter: number of iterations to run the sampler.
- family: a description of the error distribution and link function to be used in the model. Currently implemented for the gaussian and binomial families.
- id: optional vector (of length n) of grouping factors for fitting a model with a random intercept. If NULL, no random intercept will be included.
- verbose: TRUE or FALSE: flag indicating whether to print intermediate diagnostic information during the model fitting.
- Znew: optional matrix of new predictor values at which to predict h, where each row represents a new observation. This will slow down the model fitting; prediction can instead be done as a post-processing step using SamplePred.
- starting.values: list of starting values for each parameter. If not specified, default values will be chosen.
- control.params: list of parameters specifying the prior distributions and tuning parameters for the MCMC algorithm. If not specified, default values will be chosen.
- varsel: TRUE or FALSE: indicator for whether to conduct variable selection on the Z variables in h.
- groups: optional vector (of length M) of group indicators for fitting hierarchical variable selection when varsel = TRUE (see the sketch after this list). If varsel = TRUE and no groups are specified, component-wise variable selection will be performed.
- knots: optional matrix of knot locations for implementing the Gaussian predictive process of Banerjee et al. (2008), also illustrated in the sketch after this list. Currently only implemented for models without a random intercept.
- ztest: optional vector indicating on which variables in Z to conduct variable selection (the remaining variables will be forced into the model).
- rmethod: for those predictors being forced into the h function, the method for sampling the r[m] values. Takes the value 'varying' to allow a separate r[m] for each predictor; 'equal' to force the same r[m] for all predictors; or 'fixed' to fix the r[m] at their starting values.
- est.h: TRUE or FALSE: indicator for whether to sample from the posterior distribution of the subject-specific effects h_i within the main sampler. This will slow down the model fitting.
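As a minimal sketch (an illustration using simulated data as in the Examples below, not part of the original documentation), hierarchical variable selection and the Gaussian predictive process can be requested via the groups and knots arguments; the grouping and the knot design here are assumed, illustrative choices:

library(bkmr)
set.seed(111)
dat <- SimData(n = 50, M = 4)

## Hierarchical variable selection: columns 1-2 of Z form group 1,
## columns 3-4 form group 2 (an assumed grouping for illustration)
fit_grp <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X, iter = 100,
                   varsel = TRUE, groups = c(1, 1, 2, 2), verbose = FALSE)

## Gaussian predictive process: use a random subset of observed exposure
## rows as knot locations (any space-filling design would also work)
knots10 <- dat$Z[sample(nrow(dat$Z), 10), ]
fit_gpp <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X, iter = 100,
                   knots = knots10, verbose = FALSE)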
Value
an object of class "bkmrfit" (containing the posterior samples from the model fit), which has the associated methods:
- print (i.e., print.bkmrfit)
- summary (i.e., summary.bkmrfit)
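For instance (a usage sketch, assuming a fitted object fitkm as created in the Examples below):

print(fitkm)    # dispatches to print.bkmrfit
summary(fitkm)  # dispatches to summary.bkmrfit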
References
Bobb, JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA (2015). Bayesian Kernel Machine Regression for Estimating the Health Effects of Multi-Pollutant Mixtures. Biostatistics 16, no. 3: 493-508.
Banerjee S, Gelfand AE, Finley AO, Sang H (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.
See also
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
Examples
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically one should use many more iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
#> Iteration: 10 (10% completed; 0.00501 secs elapsed)
#> Iteration: 20 (20% completed; 0.01019 secs elapsed)
#> Iteration: 30 (30% completed; 0.01542 secs elapsed)
#> Iteration: 40 (40% completed; 0.02071 secs elapsed)
#> Iteration: 50 (50% completed; 0.02588 secs elapsed)
#> Iteration: 60 (60% completed; 0.03137 secs elapsed)
#> Iteration: 70 (70% completed; 0.03656 secs elapsed)
#> Iteration: 80 (80% completed; 0.04174 secs elapsed)
#> Iteration: 90 (90% completed; 0.04691 secs elapsed)
#> Iteration: 100 (100% completed; 0.05233 secs elapsed)
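As a follow-up sketch (not part of the original example), the posterior inclusion probabilities from the variable selection can be extracted with ExtractPIPs, and h can be predicted at new exposure profiles as a post-processing step with SamplePred, as mentioned under the Znew argument; the Znew values below are an illustrative choice:

## Posterior inclusion probabilities from the component-wise selection
ExtractPIPs(fitkm)

## Predict h at new exposure profiles (here, the median and 75th
## percentile of each exposure) instead of supplying Znew to kmbayes
Znew <- rbind(apply(Z, 2, median), apply(Z, 2, quantile, 0.75))
h.pred <- SamplePred(fitkm, Znew = Znew, Xnew = cbind(0))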