---
title: "CausalGPS"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{CausalGPS}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Installation
```r
library("devtools")
install_github("NSAPH-Software/CausalGPS", ref="master")
library("CausalGPS")
```
## Usage
Input parameters:
**`Y`** A vector of observed outcome variable.
**`w`** A vector of observed continuous exposure variable.
**`c`** A data.frame or matrix of observed covariates variable.
**`ci_appr`** The causal inference approach. Possible values are:
- "matching": Matching by GPS
- "weighting": Weighting by GPS
**`gps_density`** Model density type which is used for estimating GPS value, including normal (default) and kernel.
**`use_cov_transform`** If TRUE, the function uses transformer to meet the covariate balance.
**`transformers`** A list of transformers. Each transformer should be a
unary function. You can pass name of customized function in the quotes.
Available transformers:
- pow2: to the power of 2
- pow3: to the power of 3
**`bin_seq`** Sequence of w (treatment) to generate pseudo population. If
NULL is passed the default value will be used, which is `seq(min(w)+delta_n/2,max(w), by=delta_n)`.
**`exposure_trim_qtls`** A numerical vector of two. Represents the trim quantile level for exposure value. Both numbers should be in the range of [0,1] and in increasing order (default: c(0.01,0.99)).
**`gps_trim_qtls`** A numerical vector of two. Represents the trim quantile level for gps value. Both numbers should be in the range of [0,1] and in increasing order (default: c(0.0, 1.0)).
**`params`** Includes list of params that is used internally. Unrelated parameters will be ignored.
**`sl_lib`**: A vector of prediction algorithms.
**`nthread`** An integer value that represents the number of threads to be used by internal packages.
**`...`** Additional arguments passed to different models.
## Additional parameters
### Causal Inference Approach (`ci.appr`)
- if ci.appr = 'matching':
- *dist_measure*: Distance measuring function. Available options:
- l1: Manhattan distance matching
- *delta_n*: caliper parameter.
- *scale*: a specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS.
- *covar_bl_method*: covariate balance method. Available options:
- 'absolute'
- *covar_bl_trs*: covariate balance threshold
- *covar_bl_trs_type*: covariate balance type (mean, median, maximal)
- *max_attempt*: maximum number of attempt to satisfy covariate balance.
See create_matching() for more details about the parameters and default values.
- if ci.appr = 'weighting':
- *covar_bl_method*: Covariate balance method.
- *covar_bl_trs*: Covariate balance threshold
- *max_attempt*: Maximum number of attempt to satisfy covariate balance.
- Generating Pseudo Population
```r
set.seed(422)
n <- 1000
mydata <- generate_syn_data(sample_size = n)
year <- sample(x=c("2001", "2002", "2003", "2004", "2005"), size = n,
replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
pseudo_pop <- generate_pseudo_pop(
mydata[, c("id", "w")],
mydata[, c("id", "cf1", "cf2", "cf3", "cf4",
"cf5", "cf6","year","region")],
ci_appr = "matching",
gps_density = "kernel",
use_cov_transform = TRUE,
transformers = list("pow2", "pow3", "abs",
"scale"),
exposure_trim_qtls = c(0.01,0.99),
sl_lib = c("m_xgboost"),
covar_bl_method = "absolute",
covar_bl_trs = 0.1,
covar_bl_trs_type = "mean",
max_attempt = 4,
dist_measure = "l1",
delta_n = 1,
scale = 0.5,
nthread = 1)
plot(pseudo_pop)
```
**`matching_fn`** is Manhattan distance matching approach. For prediction model we use [SuperLearner](https://github.com/ecpolley/SuperLearner/) package.
SuperLearner supports different machine learning methods and packages.
**`params`** is a list of hyperparameters that users can pass to the third party libraries in the SuperLearner package.
All hyperparameters go into the params list. The prefixes are used to distinguished parameters for different libraries.
The following table shows the external package names, their equivalent name that should be used in **`sl_lib`**, the prefixes that should be used for their
hyperparameters in the **`params`** list, and available hyperparameters.
| Package name | `sl_lib` name | prefix| available hyperparameters |
|:------------:|:-------------:|:-----:|:-------------------------:|
| [XGBoost](https://xgboost.readthedocs.io/en/latest/index.html)| `m_xgboost` | `xgb_`| nrounds, eta, max_depth, min_child_weight |
| [ranger](https://cran.r-project.org/package=ranger) |`m_ranger`| `rgr_` | num.trees, write.forest, replace, verbose, family |
**`nthread`** is the number of available threads (cores). XGBoost needs OpenMP installed on the system to parallelize the processing.
- Estimating GPS
```r
data_with_gps <- estimate_gps(w,
c,
params = list(xgb_max_depth = c(3,4,5),
xgb_rounds = c(10,20,30,40)),
nthread = 1,
sl_lib = c("m_xgboost")
)
```
- Estimating Exposure Rate Function
```r
estimate_npmetric_erf<-function(matched_Y,
matched_w,
matched_counter = NULL,
bw_seq=seq(0.2,2,0.2),
w_vals,
nthread)
```
- Generating Synthetic Data
```r
syn_data <- generate_syn_data(sample_size=100,
outcome_sd = 10,
gps_spec = 1,
cova_spec = 1)
```
- Logging
The CausalGPS package is logging internal activities into the `CausalGPS.log` file. The file is located in the source file location and will be appended. Users can change the logging file name (and path) and logging threshold. The logging mechanism has different thresholds (see [logger](https://cran.r-project.org/package=logger) package). The two most important thresholds are INFO and DEBUG levels. The former, which is the default level, logs more general information about the process. The latter, if activated, logs more detailed information that can be used for debugging purposes.