Title: | Interpretable Discovery and Inference of Heterogeneous Treatment Effects |
---|---|
Description: | Provides a new method for interpretable heterogeneous treatment effects characterization in terms of decision rules via an extensive exploration of heterogeneity patterns by an ensemble-of-trees approach, enforcing high stability in the discovery. It relies on a two-stage pseudo-outcome regression, and it is supported by theoretical convergence guarantees. Bargagli-Stoffi, F. J., Cadei, R., Lee, K., & Dominici, F. (2023) Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. arXiv preprint <doi:10.48550/arXiv.2009.09036>. |
Authors: | Naeem Khoshnevis [aut] |
Maintainer: | Falco Joannes Bargagli Stoffi <[email protected]> |
License: | GPL-3 |
Version: | 0.2.7 |
Built: | 2025-02-17 02:44:43 UTC |
Source: | https://github.com/nsaph-software/cre |
In health and social sciences, it is critically important to identify subgroups of the study population in which the causal effect of a treatment differs notably from the average treatment effect. Data-driven discovery of heterogeneous treatment effects (HTE) via decision tree methods has been proposed for this task. Despite its high interpretability, single-tree discovery of HTE tends to be highly unstable and to find an oversimplified representation of treatment heterogeneity. To address these shortcomings, we propose Causal Rule Ensemble (CRE), a new method to discover heterogeneous subgroups through an ensemble-of-trees approach. CRE has the following features:
1) provides an interpretable representation of the HTE; 2) allows extensive exploration of complex heterogeneity patterns; and 3) guarantees high stability in the discovery. The discovered subgroups are defined in terms of interpretable decision rules, and we develop a general two-stage approach for subgroup-specific conditional causal effects estimation, providing theoretical guarantees.
Naeem Khoshnevis
Daniela Maria Garcia
Riccardo Cadei
Kwonsang Lee
Falco Joannes Bargagli Stoffi
Bargagli-Stoffi, F. J., Cadei, R., Lee, K. and Dominici, F. (2023). Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects, arXiv preprint arXiv:2009.09036.
Performs the Causal Rule Ensemble on a data set with a response variable, a treatment variable, and various features.
cre(y, z, X, method_params = NULL, hyper_params = NULL, ite = NULL)
y |
An observed response vector. |
z |
A treatment vector. |
X |
A covariate matrix (or a data frame). Should be provided as numerical values. |
method_params |
The list of parameters to define the models used, including:
|
hyper_params |
The list of hyper parameters to fine-tune the method, including:
|
ite |
The estimated ITE vector. If given, both the ITE estimation steps in Discovery and Inference are skipped (default: NULL). |
An S3 object composed of:
M |
the number of Decision Rules extracted at each step, |
CATE |
the data.frame of Conditional Average Treatment Effect decomposition estimates with corresponding uncertainty quantification, |
method_params |
the list of method parameters, |
hyper_params |
the list of hyper parameters, |
rules |
the list of rules (implicit form) decomposing the CATE. |
If intervention_vars are provided, it is important to note that the individual treatment effect will still be computed using all covariates.
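To make the individual treatment effect (ITE) estimation step more concrete, here is a minimal base R sketch of an augmented inverse probability weighting (AIPW) pseudo-outcome, the estimator named by ite_method = "aipw" in the example. It is an illustration under stated assumptions, not the package's internal code: the toy data, the variable names, and the plain glm/lm nuisance models (standing in for learners such as SL.xgboost) are all hypothetical.

```r
# Illustrative AIPW pseudo-outcome in base R (not CRE's internal code).
set.seed(1)
n <- 500

# Toy data (ASSUMPTION): the true effect is 2 when x1 == 1, else 0.
X <- data.frame(x1 = rbinom(n, 1, 0.5),
                x2 = rbinom(n, 1, 0.5),
                x3 = rnorm(n))
z <- rbinom(n, 1, plogis(0.5 * X$x3))   # treatment depends on x3
y <- X$x2 + 2 * X$x1 * z + rnorm(n)

# Nuisance models: propensity score and outcome regression.
# ASSUMPTION: simple GLMs replace the SuperLearner-style learners.
ps_fit <- glm(z ~ ., data = cbind(X, z = z), family = binomial())
e_hat <- predict(ps_fit, newdata = X, type = "response")

out_fit <- lm(y ~ z * (x1 + x2 + x3), data = cbind(X, z = z, y = y))
mu1 <- predict(out_fit, newdata = cbind(X, z = 1))
mu0 <- predict(out_fit, newdata = cbind(X, z = 0))

# AIPW pseudo-outcome: regression contrast plus weighted residual corrections.
ite_hat <- mu1 - mu0 +
  z * (y - mu1) / e_hat -
  (1 - z) * (y - mu0) / (1 - e_hat)

# Compare the average pseudo-outcome across the two x1 subgroups.
effect_gap <- mean(ite_hat[X$x1 == 1]) - mean(ite_hat[X$x1 == 0])
```

According to the argument description above, a vector like ite_hat could be supplied through the ite argument of cre() to skip the built-in ITE estimation steps.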
set.seed(123)
dataset <- generate_cre_dataset(n = 400,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

hyper_params <- list(intervention_vars = NULL,
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.6,
                     pfer = 1,
                     B = 20,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
Generates synthetic data sets to run simulations for causal inference experiments, composed of an outcome vector (y), a treatment vector (z), a covariates matrix (X), and an unobserved individual treatment effects vector (ite).
The arguments specify the data set characteristics, including the number of individuals (n), the number of covariates (p), the correlation within the covariates (rho), the number of decision rules (n_rules) decomposing the Conditional Average Treatment Effect (CATE), the treatment effect magnitude (effect_size), the confounding mechanism (confounding), and whether the covariates and outcomes are binary or continuous (binary_covariates, binary_outcome).
generate_cre_dataset(
  n = 1000,
  rho = 0,
  n_rules = 2,
  p = 10,
  effect_size = 2,
  binary_covariates = TRUE,
  binary_outcome = TRUE,
  confounding = "no"
)
n |
An integer number that represents the number of observations. Non-integer values will be converted into an integer number. |
rho |
A non-negative double number that represents the correlation within the covariates (default: 0, range: [0,1)). |
n_rules |
The number of causal rules (default: 2, range: {1,2,3,4}). |
p |
The number of covariates (default: 10). |
effect_size |
The treatment effect size magnitude (default: 2,
range: |
binary_covariates |
Whether to use binary or continuous covariates (default: TRUE). |
binary_outcome |
Whether to use binary or continuous outcomes (default: TRUE). |
confounding |
Only for continuous outcome, add confounding variables:
|
The covariates matrix is generated with the specified correlation among covariates, and each covariate is sampled either from a Bernoulli(0.5) if binary, or a Gaussian(0,1) if continuous.
The treatment vector is sampled from a Bernoulli distribution whose success probability is a function of the observed covariates.
The two potential outcomes are then sampled from a Bernoulli if binary, or a Gaussian (with standard deviation equal to 1) if continuous. Their mean is equal to a confounding term (null, linear, or non-linear; always null for binary outcomes) plus 1 to 4 decision rules weighted by the treatment effect magnitude. The two potential outcomes characterize the CATE (and hence the unobserved individual treatment effects vector) as the sum of additive contributions, one for each decision rule considered (plus an intercept).
The final expression of the CATE depends on the treatment effect magnitude and the number of decision rules considered.
The 4 decision rules are:
Rule 1:
Rule 2:
Rule 3:
Rule 4:
with corresponding additive average treatment effect (AATE) equal to:
Rule 1:
effect_size
,
Rule 2:
effect_size
,
Rule 3:
effect_size
,
Rule 4:
effect_size
.
For example, setting effect_size = 4 and n_rules = 2:
The observed outcome vector y is then computed by combining the potential outcomes according to the treatment assignment.
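The generation steps described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code of generate_cre_dataset(): the logistic treatment model, the two decision rules, their signs, and the shared noise term between the potential outcomes are all hypothetical choices made for the example.

```r
# Simplified sketch of the described data-generating process (base R only).
set.seed(42)
n <- 1000
p <- 10
effect_size <- 2

# Binary covariates, each Bernoulli(0.5), independent (i.e. rho = 0).
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))

# Treatment probability as a function of observed covariates.
# ASSUMPTION: this logistic form is illustrative only.
prob_z <- plogis(X[, "x1"] - X[, "x2"])
z <- rbinom(n, 1, prob_z)

# Two HYPOTHETICAL decision rules (not the package's rules).
rule_a <- as.numeric(X[, "x1"] > 0.5 & X[, "x2"] <= 0.5)
rule_b <- as.numeric(X[, "x3"] > 0.5)

# CATE as a sum of additive rule contributions weighted by effect_size.
# ASSUMPTION: the signs and weights are made up for this sketch.
cate <- effect_size * rule_a - effect_size * rule_b

# Continuous potential outcomes: Gaussian with sd 1 and no confounding term.
# ASSUMPTION: both share one noise draw, so the ITE equals the CATE.
y0 <- rnorm(n, mean = 0, sd = 1)
y1 <- y0 + cate

# Observed outcome: pick the potential outcome matching the assignment.
y <- z * y1 + (1 - z) * y0
ite <- y1 - y0

dataset <- list(y = y, z = z, X = X, ite = ite)
```

The last two steps mirror the description: only one potential outcome per individual reaches y, while ite keeps the contrast that is unobserved in real data.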
A list, representing the generated synthetic data set, containing:
y |
an outcome vector, |
z |
a treatment vector, |
X |
a covariates matrix, |
ite |
an individual treatment effects vector. |
Set the covariates domain (binary_covariates) and outcome domain (binary_outcome) according to the experiment of interest.
Increase complexity in heterogeneity discovery by:
decreasing the sample size (n),
adding correlation among covariates (rho),
increasing the number of rules (n_rules),
increasing the number of covariates (p),
decreasing the absolute value of the causal effect (effect_size),
adding linear or non-linear confounders (confounding).
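As a rough illustration of the levers listed above, the sketch below builds an easier and a harder argument set for generate_cre_dataset(). The argument names come from the documented signature; the specific values are arbitrary assumptions, and confounding is left at "no" in both settings. The call itself only runs if the CRE package is installed.

```r
# Argument names follow the documented signature of generate_cre_dataset();
# the values are illustrative assumptions, not recommended settings.
easy_args <- list(n = 2000, rho = 0, n_rules = 1, p = 10, effect_size = 4,
                  binary_covariates = TRUE, binary_outcome = FALSE,
                  confounding = "no")

# Harder setting: fewer observations, correlated covariates, more rules,
# more covariates, and a smaller effect size.
hard_args <- utils::modifyList(
  easy_args,
  list(n = 500, rho = 0.3, n_rules = 4, p = 20, effect_size = 1)
)

# Only attempt the call when the package is available.
if (requireNamespace("CRE", quietly = TRUE)) {
  set.seed(123)
  hard_dataset <- do.call(CRE::generate_cre_dataset, hard_args)
  str(hard_dataset, max.level = 1)
}
```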
set.seed(123)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = TRUE,
                                confounding = "no")
Returns current logger settings.
get_logger()
Returns a list that includes logger_file_path and logger_level.
See set_logger for information on setting the log level and file path.
set_logger("mylogger.log", "INFO")
log_meta <- get_logger()
A wrapper function to extend generic plot functions for cre class.
## S3 method for class 'cre'
plot(x, ...)
x |
A cre object from running the CRE function. |
... |
Additional arguments passed to customize the plot. |
Returns a ggplot2 object, invisibly. This function is called for side effects.
Predicts individual treatment effect via causal rule ensemble algorithm.
## S3 method for class 'cre'
predict(object, X, ...)
object |
A cre object from running the CRE function. |
X |
A covariate matrix (or data.frame) |
... |
Additional arguments passed to customize the prediction. |
An array with the estimated Individual Treatment Effects.
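Since the CATE is described as an intercept plus additive contributions from decision rules, a rule-based prediction can be sketched in a few lines of base R. Everything here is hypothetical: the rule strings, the coefficient values, the column names of cate_table, and the helper predict_from_rules() are assumptions made for illustration and do not reproduce the package's internal representation.

```r
# HYPOTHETICAL rule table: an intercept row followed by rule rows.
# The rule syntax and the column names are assumptions for this sketch.
cate_table <- data.frame(
  Rule = c("(Intercept)", "x1>0.5 & x2<=0.5", "x5>0.5"),
  Estimate = c(0.1, -2.0, 1.5),
  stringsAsFactors = FALSE
)

X_new <- data.frame(x1 = c(1, 0, 1),
                    x2 = c(0, 0, 1),
                    x5 = c(1, 1, 0))

# Hypothetical helper: intercept plus the coefficients of the active rules.
predict_from_rules <- function(cate_table, X) {
  intercept <- cate_table$Estimate[1]
  rules <- cate_table$Rule[-1]
  coefs <- cate_table$Estimate[-1]
  # Evaluate each rule on the covariates as a 0/1 indicator column.
  indicators <- vapply(
    rules,
    function(r) as.numeric(eval(parse(text = r), envir = X)),
    numeric(nrow(X))
  )
  indicators <- matrix(indicators, nrow = nrow(X))
  as.vector(intercept + indicators %*% coefs)
}

ite_pred <- predict_from_rules(cate_table, X_new)
# Approximately -0.4, 1.6, 0.1 for the three rows of X_new.
```

Each individual's prediction depends only on which rules apply to them, which is what keeps the estimated effects interpretable.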
Prints a brief summary of the CRE object
## S3 method for class 'cre'
print(x, verbose = 2, ...)
x |
A cre object from running the CRE function. |
verbose |
Set level of results description details: 0 for only results summary, 1 for results and parameters summary, 2 for results and parameters and rules summary (default 2). |
... |
Additional arguments passed to customize the results description. |
No return value. This function is called for side effects.
Updates logger settings, including log level and location of the file.
set_logger(logger_file_path = "CRE.log", logger_level = "INFO")
logger_file_path |
A path (including file name) to log the messages. (Default: CRE.log) |
logger_level |
The log level. When a log level is set, all log levels below it are also activated (if implemented). Available levels include:
|
No return value. This function is called for side effects.
Log levels are specified by developers during the initial implementation. Future developers or contributors can leverage these log levels to better capture and document the application's processes and events.
set_logger(logger_level = "DEBUG")
Prints a brief summary of the CRE object
## S3 method for class 'cre'
summary(object, verbose = 2, ...)
object |
A cre object from running the CRE function. |
verbose |
Set level of results description details: 0 for only results summary, 1 for results and parameters summary, 2 for results and parameters and rules summary (default 2). |
... |
Additional arguments passed to customize the results description. |
A summary of the CRE object