Package 'GPCERF' reference manual

Title:	Gaussian Processes for Estimating Causal Exposure Response Curves
Description:	Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F.(2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.
Authors:	Naeem Khoshnevis [aut] (<https://orcid.org/0000-0003-4315-1426>, HUIT), Boyu Ren [aut, cre] (<https://orcid.org/0000-0002-5300-1184>, McLean Hospital), Tanujit Dey [ctb] (<https://orcid.org/0000-0001-5559-211X>, HMS), Danielle Braun [aut] (<https://orcid.org/0000-0002-5177-8598>, HSPH)
Maintainer:	Boyu Ren <[email protected]>
License:	GPL (>= 3)
Version:	0.2.4
Built:	2025-04-01 05:19:54 UTC
Source:	https://github.com/nsaph-software/gpcerf

The 'GPCERF' package.

Description

Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data.

Author(s)

Naeem Khoshnevis

Boyu Ren

Danielle Braun

References

Ren, B., Wu, X., Braun, D., Pillai, N. and Dominici, F., 2021. Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes. arXiv preprint arXiv:2105.03454.

Detect change-point in standard GP

Description

Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points.

Usage

compute_rl_deriv_gp(
  w,
  w_obs,
  y_obs,
  gps_m,
  hyperparam,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)
compute_rl_deriv_gp(
  w,
  w_obs,
  y_obs,
  gps_m,
  hyperparam,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

`w`	A scalar of exposure level of interest.
`w_obs`	A vector of observed exposure levels of all samples.
`y_obs`	A vector of observed outcome values of all samples.
`gps_m`	An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FLASE
`hyperparam`	A vector of hyper-parameters in the GP model.
`kernel_fn`	The covariance function.
`kernel_deriv_fn`	The partial derivative of the covariance function.

Value

A numeric value of the posterior mean of the difference between two one-sided derivatives.

Examples


set.seed(847)
data <- generate_synthetic_data(sample_size = 100)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 8.6

val <- compute_rl_deriv_gp(w = wi,
                           w_obs = data$treat,
                           y_obs = data$Y,
                           gps_m = gps_m,
                           hyperparam = c(1,1,2))

set.seed(847)
data <- generate_synthetic_data(sample_size = 100)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 8.6

val <- compute_rl_deriv_gp(w = wi,
                           w_obs = data$treat,
                           y_obs = data$Y,
                           gps_m = gps_m,
                           hyperparam = c(1,1,2))

Calculate right minus left derivatives for change-point detection in nnGP

Description

Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points. nnGP approximation is used.

Usage

compute_rl_deriv_nn(
  w,
  w_obs,
  gps_m,
  y_obs,
  hyperparam,
  n_neighbor,
  block_size,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)
compute_rl_deriv_nn(
  w,
  w_obs,
  gps_m,
  y_obs,
  hyperparam,
  n_neighbor,
  block_size,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

`w`	A scalar of exposure level of interest.
`w_obs`	A vector of observed exposure levels of all samples.
`gps_m`	An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FLASE
`y_obs`	A vector of observed outcome values.
`hyperparam`	A vector of hyper-parameters in the GP model.
`n_neighbor`	The number of nearest neighbors on one side.
`block_size`	The number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger `block_size` is faster, but requires more memory.
`kernel_fn`	The covariance function. The input is the square of Euclidean distance.
`kernel_deriv_fn`	The partial derivative of the covariance function. The input is the square of Euclidean distance.

Value

A numeric value of the posterior mean of the difference between two one-sided derivatives.

Examples


set.seed(325)
data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 12.2

deriv_val <- compute_rl_deriv_nn(w = wi,
                                 w_obs = data$treat,
                                 gps_m = gps_m,
                                 y_obs = data$Y,
                                 hyperparam = c(0.2,0.4,1.2),
                                 n_neighbor = 20,
                                 block_size = 10)

set.seed(325)
data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 12.2

deriv_val <- compute_rl_deriv_nn(w = wi,
                                 w_obs = data$treat,
                                 gps_m = gps_m,
                                 y_obs = data$Y,
                                 hyperparam = c(0.2,0.4,1.2),
                                 n_neighbor = 20,
                                 block_size = 10)

Compute weighted covariate balance

Description

Computes weighted covariate balance for given data sets.

Usage

compute_w_corr(w, covariate, weight)
compute_w_corr(w, covariate, weight)

Arguments

`w`	A vector of observed continuous exposure variable.
`covariate`	A data frame of observed covariates variable.
`weight`	A vector of weights.

Value

The function returns a list saved the measure related to covariate balance absolute_corr: the absolute correlations for each pre-exposure covairates; mean_absolute_corr: the average absolute correlations for all pre-exposure covairates.

Examples

set.seed(639)
n <- 100
mydata <- generate_synthetic_data(sample_size=100)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n,
 replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
cor_val <- compute_w_corr(mydata[,2],
                          mydata[, 3:length(mydata)],
                          runif(n))

print(cor_val$mean_absolute_corr)

set.seed(639)
n <- 100
mydata <- generate_synthetic_data(sample_size=100)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n,
 replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
cor_val <- compute_w_corr(mydata[,2],
                          mydata[, 3:length(mydata)],
                          runif(n))

print(cor_val$mean_absolute_corr)

Estimate the conditional exposure response function using Gaussian process

Description

Estimates the conditional exposure response function (cerf) using Gaussian Process (gp). The function tune the best match (the lowest covariate balance) for the provided set of hyperparameters.

Usage

estimate_cerf_gp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  nthread = 1,
  kernel_fn = function(x) exp(-x^2)
)
estimate_cerf_gp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  nthread = 1,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

`data`	A data.frame of observation data.
`w`	A vector of exposure level to compute CERF (please also see the notes).
`gps_m`	An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FLASE
`params`	A list of parameters that is required to run the process. These parameters include: alpha: A scaling factor for the GPS value. beta: A scaling factor for the exposure value. g_sigma: A scaling factor for kernel function (gamma/sigma). tune_app: A tuning approach. Available approaches: all: try all combinations of hyperparameters. alpha, beta, and g_sigma can be a vector of parameters.
`outcome_col`	An outcome column name in `data`.
`treatment_col`	A treatment column name in `data`.
`covariates_col`	Covariates columns name in `data`.
`nthread`	An integer value that represents the number of threads to be used by internal packages.
`kernel_fn`	A kernel function. A default value is a Gaussian Kernel.

Value

A cerf_gp object that includes the following values:

w, the vector of exposure levels.
pst_mean, Computed mean for the w vector.
pst_sd, Computed credible interval for the w vector.

Note

Please note that w is a vector representing a grid of exposure levels at which the CERF is to be estimated. This grid can include both observed and hypothetical values of the exposure variable. The purpose of defining this grid is to provide a structured set of points across the exposure spectrum for estimating the CERF. This approach is essential in nonparametric models like Gaussian Processes (GPs), where the CERF is evaluated at specific points to understand the relationship between the exposure and outcome variables across a continuum. It facilitates a comprehensive analysis by allowing practitioners to examine the effect of varying exposure levels, including those not directly observed in the dataset.

Examples


set.seed(129)
data <- generate_synthetic_data(sample_size = 100, gps_spec = 3)


# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

# exposure values
w_all <- seq(0,10,1)


cerf_gp_obj <- estimate_cerf_gp(data,
                                w_all,
                                gps_m,
                                params = list(alpha = c(0.1),
                                              beta=0.2,
                                              g_sigma = 1,
                                              tune_app = "all"),
                                outcome_col = "Y",
                                treatment_col = "treat",
                                covariates_col = paste0("cf", seq(1,6)),
                                nthread = 1)


set.seed(129)
data <- generate_synthetic_data(sample_size = 100, gps_spec = 3)


# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

# exposure values
w_all <- seq(0,10,1)


cerf_gp_obj <- estimate_cerf_gp(data,
                                w_all,
                                gps_m,
                                params = list(alpha = c(0.1),
                                              beta=0.2,
                                              g_sigma = 1,
                                              tune_app = "all"),
                                outcome_col = "Y",
                                treatment_col = "treat",
                                covariates_col = paste0("cf", seq(1,6)),
                                nthread = 1)

Estimate the conditional exposure response function using nearest neighbor Gaussian process

Description

Estimates the conditional exposure response function (cerf) using the nearest neighbor (nn) Gaussian Process (gp). The function tune the best match (the lowest covariate balance) for the provided set of hyperparameters.

Usage

estimate_cerf_nngp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  kernel_fn = function(x) exp(-x^2),
  nthread = 1
)
estimate_cerf_nngp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  kernel_fn = function(x) exp(-x^2),
  nthread = 1
)

Arguments

`data`	A data.frame of observation data.
`w`	A vector of exposure level to compute CERF (please also see the notes).
`gps_m`	An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FLASE
`params`	A list of parameters that is required to run the process. These parameters include: alpha: A scaling factor for the GPS value. beta: A scaling factor for the exposure value. g_sigma: A scaling factor for kernel function (gamma/sigma). tune_app: A tuning approach. Available approaches: all: try all combinations of hyperparameters. n_neighbor: Number of nearest neighbors on one side. block_size: Number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger `block_size` is faster, but requires more memory. alpha, beta, and g_sigma can be a vector of parameters.
`outcome_col`	An outcome column name in `data`.
`treatment_col`	A treatment column name in `data`.
`covariates_col`	Covariates columns name in `data`.
`kernel_fn`	A kernel function. A default value is a Gaussian Kernel.
`nthread`	An integer value that represents the number of threads to be used by internal packages.

Value

A cerf_nngp object that includes the following values:

w, the vector of exposure levels.
pst_mean, the computed mean for the w vector.
pst_sd, the computed credible interval for the w vector.

Note

Examples



set.seed(19)
data <- generate_synthetic_data(sample_size = 120, gps_spec = 3)
# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
# exposure values
w.all <- seq(0,20,2)
cerf_nngp_obj <- estimate_cerf_nngp(data,
                                    w.all,
                                    gps_m,
                                    params = list(alpha = c(0.1),
                                                  beta = 0.2,
                                                  g_sigma = 1,
                                                  tune_app = "all",
                                                  n_neighbor = 20,
                                                  block_size = 1e4),
                                    outcome_col = "Y",
                                    treatment_col = "treat",
                                    covariates_col = paste0("cf", seq(1,6)),
                                    nthread = 1)


set.seed(19)
data <- generate_synthetic_data(sample_size = 120, gps_spec = 3)
# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
# exposure values
w.all <- seq(0,20,2)
cerf_nngp_obj <- estimate_cerf_nngp(data,
                                    w.all,
                                    gps_m,
                                    params = list(alpha = c(0.1),
                                                  beta = 0.2,
                                                  g_sigma = 1,
                                                  tune_app = "all",
                                                  n_neighbor = 20,
                                                  block_size = 1e4),
                                    outcome_col = "Y",
                                    treatment_col = "treat",
                                    covariates_col = paste0("cf", seq(1,6)),
                                    nthread = 1)

Estimate a model for generalized propensity score

Description

Estimates a model for generalized propensity score (GPS) using parametric approach.

Usage

estimate_gps(cov_mt, w_all, sl_lib, dnorm_log)
estimate_gps(cov_mt, w_all, sl_lib, dnorm_log)

Arguments

`cov_mt`	A covariate matrix containing all covariates. Each row is a data sample and each column is a covariate.
`w_all`	A vector of observed exposure levels.
`sl_lib`	A vector of SuperLearner's package libraries.
`dnorm_log`	Logical, if TRUE, probabilities p are given as log(p).

Value

A data.frame that includes:

a vector of estimated GPS at the observed exposure levels;
a vector of estimated conditional means of exposure levels when the covariates are fixed at the observed values;
estimated standard deviation of exposure levels
a vector of observed exposure levels.

Examples


data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

Generate synthetic data for the GPCERF package

Description

Generates synthetic data set based on different GPS models and covariates.

Usage

generate_synthetic_data(
  sample_size = 1000,
  outcome_sd = 10,
  gps_spec = 1,
  cova_spec = 1
)
generate_synthetic_data(
  sample_size = 1000,
  outcome_sd = 10,
  gps_spec = 1,
  cova_spec = 1
)

Arguments

`sample_size`	A number of data samples.
`outcome_sd`	Standard deviation used to generate the outcome in the synthetic data set.
`gps_spec`	A numeric value (1-6) that indicates the GPS model used to generate the continuous exposure.
`cova_spec`	A numeric value (1-2) to modify the covariates.

Value

A data frame of the synthetic data. Outcome is labeled as Y, exposure as w, and covariates cf1-6.

Examples


set.seed(351)
data <- generate_synthetic_data(sample_size = 200)

set.seed(351)
data <- generate_synthetic_data(sample_size = 200)

Get logger settings

Description

Returns current logger settings.

Usage

get_logger()
get_logger()

Value

Returns a list that includes logger_file_path and logger_level.

Examples


set_logger("mylogger.log", "INFO")
log_meta <- get_logger()

set_logger("mylogger.log", "INFO")
log_meta <- get_logger()

Extend generic plot functions for cerf_gp class

Description

A wrapper function to extend generic plot functions for cerf_gp class.

Usage

## S3 method for class 'cerf_gp'
plot(x, ...)
## S3 method for class 'cerf_gp'
plot(x, ...)

Arguments

`x`	A cerf_gp object.
`...`	Additional arguments passed to customize the plot.

Value

Returns a ggplot2 object, invisibly. This function is called for side effects.

Extend generic plot functions for cerf_nngp class

Description

A wrapper function to extend generic plot functions for cerf_nngp class.

Usage

## S3 method for class 'cerf_nngp'
plot(x, ...)
## S3 method for class 'cerf_nngp'
plot(x, ...)

Arguments

`x`	A cerf_nngp object.
`...`	Additional arguments passed to customize the plot.

Value

Returns a ggplot2 object, invisibly. This function is called for side effects.

Extend print function for cerf_gp object

Description

Extend print function for cerf_gp object

Usage

## S3 method for class 'cerf_gp'
print(x, ...)
## S3 method for class 'cerf_gp'
print(x, ...)

Arguments

`x`	A cerf_gp object.
`...`	Additional arguments passed to customize the results.

Value

No return value. This function is called for side effects.

Extend print function for cerf_nngp object

Description

Extend print function for cerf_nngp object

Usage

## S3 method for class 'cerf_nngp'
print(x, ...)
## S3 method for class 'cerf_nngp'
print(x, ...)

Arguments

`x`	A cerf_nngp object.
`...`	Additional arguments passed to customize the results.

Value

No return value. This function is called for side effects.

Set logger settings

Description

Updates logger settings, including log level and location of the file.

Usage

set_logger(logger_file_path = "GPCERF.log", logger_level = "INFO")
set_logger(logger_file_path = "GPCERF.log", logger_level = "INFO")

Arguments

logger_file_path

A path (including file name) to log the messages. (Default: GPCERF.log)

logger_level

The log level. Available levels include:

TRACE
DEBUG
INFO (Default)
SUCCESS
WARN
ERROR
FATAL

Value

No return value. This function is called for side effects.

Examples


set_logger("mylogger.log", "INFO")

set_logger("mylogger.log", "INFO")

print summary of cerf_gp object

Description

print summary of cerf_gp object

Usage

## S3 method for class 'cerf_gp'
summary(object, ...)
## S3 method for class 'cerf_gp'
summary(object, ...)

Arguments

`object`	A cerf_gp object.
`...`	Additional arguments passed to customize the results.

Value

Returns summary of data

print summary of cerf_nngp object

Description

print summary of cerf_nngp object

Usage

## S3 method for class 'cerf_nngp'
summary(object, ...)
## S3 method for class 'cerf_nngp'
summary(object, ...)

Arguments

`object`	A cerf_nngp object.
`...`	Additional arguments passed to customize the results.

Value

Returns summary of data.

Package 'GPCERF'

Help Index

The 'GPCERF' package.

Description

Author(s)

References

Detect change-point in standard GP

Description

Usage

Arguments

Value

Examples

Calculate right minus left derivatives for change-point detection in nnGP

Description

Usage

Arguments

Value

Examples

Compute weighted covariate balance

Description

Usage

Arguments

Value

Examples

Estimate the conditional exposure response function using Gaussian process

Description

Usage

Arguments

Value

Note

Examples

Estimate the conditional exposure response function using nearest neighbor Gaussian process

Description

Usage

Arguments

Value

Note

Examples

Estimate a model for generalized propensity score

Description

Usage

Arguments

Value

Examples

Generate synthetic data for the GPCERF package

Description

Usage

Arguments

Value

Examples

Get logger settings

Description

Usage

Value

Examples

Extend generic plot functions for cerf_gp class

Description

Usage

Arguments

Value

Extend generic plot functions for cerf_nngp class

Description

Usage

Arguments

Value

Extend print function for cerf_gp object

Description

Usage

Arguments

Value

Extend print function for cerf_nngp object

Description

Usage

Arguments

Value

Set logger settings

Description

Usage

Arguments

Value