tlverse / tmle3
Showing 1 of 4 files from the diff.

@@ -2,41 +2,117 @@
 #'
 #' This learner provides fitting procedures for semiparametric generalized linear models using a user-given baseline learner and
 #' \code{\link[stats]{glm.fit}}. It supports models of the form `linkfun(E[Y|A,W]) = linkfun(E[Y|A=0,W]) + A * f(W)` where `A` is a binary or continuous interaction variable,
-#' and `f(W)` is a user-specified parametric function (e.g. `f(W) = model.matrix(formula_sp, W)`). The baseline function `E[Y|A=0,W]` is fit using a user-specified \code{sl3}-Learner (possibly pooled over values of `A` and then projected onto the semiparametric model).
+#' and `f(W)` is a user-specified parametric function (e.g. `f(W) = model.matrix(formula_sp, W)`). The baseline function `E[Y|A=0,W]` is fit using a user-specified Learner (possibly pooled over values of `A` and then projected onto the semiparametric model).
 #'
 #' @docType class
 #'
 #' @importFrom R6 R6Class
-#' @importFrom stats glm predict family
 #'
 #' @export
 #'
 #' @keywords data
 #'
-#' @return Learner object with methods for training and prediction. See
-#'  \code{\link{Lrnr_base}} for documentation on learners.
+#' @return A learner object inheriting from \code{\link{Lrnr_base}} with
+#'  methods for training and prediction. For a full list of learner
+#'  functionality, see the complete documentation of \code{\link{Lrnr_base}}.
 #'
-#' @format \code{\link{R6Class}} object.
+#' @format An \code{\link[R6]{R6Class}} object inheriting from
+#'  \code{\link{Lrnr_base}}.
 #'
 #' @family Learners
 #'
 #' @section Parameters:
 #' \describe{
-#'   \item{\code{formula_sp}}{ A \code{formula} object specifying the parametric component of the semiparametric model.}
-#'   \item{\code{lrnr_baseline}}{A baseline learner for estimation of the nonparametric component.}
-#'   \item{\code{interaction_variable}}{A interaction variable to multiply with the design matrix generated by \code{formula_sp}. If NULL then the interaction variable is treated as the value 1.
-#'   In many applications, this represents a binary treatment variable `A`.}
-#'   \item{\code{family}}{A family object whose link function specifies the type of semiparametric model (e.g. partially-linear least-squares (\code{gaussian}), partially-linear logistic regression (\code{binomial}), partially-linear relative-risk regression (\code{poisson}) }
-#'   \item{\code{append_interaction_matrix}}{Whether to \code{lrnr_baseline} should be fit on `cbind(task$X,A*V)` where `A` is the interaction variable and `V` is the design matrix obtained from \code{formula_sp}.
+#'   \item{\code{formula_sp}}{A \code{\link{formula}} object specifying the parametric component of the semiparametric model.}
+#'   \item{\code{lrnr_baseline}}{A baseline learner for estimation of the nonparametric component. This can be pooled or unpooled by specifying \code{return_matrix_predictions}.}
+#'   \item{\code{interaction_variable = "A"}}{The name of an interaction variable (which must be found in `training_task$data`) to multiply by the design matrix generated by \code{formula_sp}. If \code{NULL}, the interaction variable is treated as identically `1`.
+#'   In many applications, this will be the name of a binary treatment variable (e.g. `A`).}
+#'   \item{\code{family = NULL}}{A family object whose link function specifies the type of semiparametric model, e.g. partially-linear least-squares (\code{\link{gaussian}}), partially-linear logistic regression (\code{\link{binomial}}), or partially-linear log-linear regression (\code{\link{poisson}}).}
+#'   \item{\code{append_interaction_matrix = TRUE}}{Whether \code{lrnr_baseline} should be fit on `cbind(task$X, A*V)`, where `A` is the interaction variable and `V` is the design matrix obtained from \code{formula_sp}.
 #'   Note, if `append_interaction_matrix = TRUE`, the resulting estimator will be projected onto the semiparametric model using \code{glm.fit}.
 #'   If this is FALSE and \code{interaction_variable} is binary then the semiparametric model is learned by stratifying on \code{interaction_variable}.
-#'   Specifically, if FALSE, \code{lrnr_baseline} is used to estimate `E[Y|A=0,W]` by subsetting to only observations with `A` = 0.
+#'   Specifically, if FALSE, \code{lrnr_baseline} is used to estimate `E[Y|A=0,W]` by subsetting to only observations with `A = 0`.
 #'   In the binary case, setting `append_interaction_matrix = TRUE` allows one to pool the learning across treatment arms and allows additive models to perform well.  }
-#'   \item{\code{return_matrix_predictions}}{Only used if \code{interaction_variable} is binary. Whether to return a matrix output with three columns being `E[Y|A=0,W], E[Y|A=1,W], E[Y|A,W]`.}
-#'
+#'   \item{\code{return_matrix_predictions = FALSE}}{Only used if \code{interaction_variable} is binary. Whether to return a matrix output with three columns being `E[Y|A=0,W], E[Y|A=1,W], E[Y|A,W]`.}
+#'   \item{\code{...}}{Not used.}
 #' }
 #'
-#
+#' @examples
+#' library(glmnet)
+#' library(data.table)
+#' n <- 200
+#' W <- runif(n, -1, 1)
+#' A <- rbinom(n, 1, plogis(W))
+#' Y_continuous <- rnorm(n, mean = A + W, sd = 0.3)
+#' Y_binary <- rbinom(n, 1, plogis(A + W))
+#' Y_count <- rpois(n, exp(A + W))
+#' data <- data.table(W, A, Y_continuous, Y_binary, Y_count)
+#'
+#' # Make tasks
+#' task_continuous <- sl3_Task$new(data, covariates = c("A", "W"), outcome = "Y_continuous")
+#' task_binary <- sl3_Task$new(data, covariates = c("A", "W"), outcome = "Y_binary")
+#' task_count <- sl3_Task$new(data, covariates = c("A", "W"), outcome = "Y_count", outcome_type = "continuous")
+#'
+#' formula_sp <- ~ 1 + W
+#'
+#' # fit partially-linear least-squares regression with `append_interaction_matrix = TRUE`
+#' set.seed(100)
+#' lrnr_baseline <- Lrnr_glmnet$new()
+#' family <- gaussian()
+#' lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(formula_sp = formula_sp, family = family, lrnr_baseline = lrnr_baseline, interaction_variable = "A", append_interaction_matrix = TRUE)
+#' lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
+#' preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
+#' beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
+#' ## In this case, since `append_interaction_matrix = TRUE`, it is equivalent to:
+#' V <- model.matrix(formula_sp, task_continuous$data)
+#' X <- cbind(task_continuous$data[["W"]], task_continuous$data[["A"]] * V)
+#' X0 <- cbind(task_continuous$data[["W"]], 0 * V)
+#' colnames(X) <- c("W", "A", "A*W")
+#' Y <- task_continuous$Y
+#' set.seed(100)
+#' beta_equiv <- coef(cv.glmnet(X, Y, family = "gaussian"), s = "lambda.min")[c(3, 4)]
+#' ## Actually, the glmnet fit is projected onto the semiparametric model with glm.fit (no effect in this case)
+#' print(beta - beta_equiv)
+#'
+#' # fit partially-linear least-squares regression with `append_interaction_matrix = FALSE`
+#' set.seed(100)
+#' lrnr_baseline <- Lrnr_glm$new(family = gaussian())
+#' family <- gaussian()
+#' lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(formula_sp = formula_sp, family = family, lrnr_baseline = lrnr_baseline, interaction_variable = "A", append_interaction_matrix = FALSE)
+#' lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
+#' preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
+#' beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
+#' ## In this case, since `append_interaction_matrix = FALSE`, it is equivalent to:
+#' ## Subset to baseline treatment arm
+#' subset_to <- task_continuous$data[["A"]] == 0
+#'
+#' V <- model.matrix(formula_sp, task_continuous$data)
+#' X <- cbind(rep(1, n), task_continuous$data[["W"]])
+#' Y <- task_continuous$Y
+#' set.seed(100)
+#' beta_Y0W <- lrnr_glm_sp_gaussian$fit_object$lrnr_baseline$fit_object$coefficients
+#' beta_Y0W_equiv <- coef(glm.fit(X[subset_to, , drop = FALSE], Y[subset_to], family = gaussian())) # Subset to baseline treatment arm
+#' EY0 <- X %*% beta_Y0W
+#' beta_equiv <- coef(glm.fit(A * V, Y, offset = EY0, family = gaussian()))
+#' print(beta_Y0W - beta_Y0W_equiv)
+#' print(beta - beta_equiv)
+#'
+#' # fit partially-linear logistic regression
+#' lrnr_baseline <- Lrnr_glmnet$new()
+#' family <- binomial()
+#' lrnr_glm_sp_binomial <- Lrnr_glm_semiparametric$new(formula_sp = formula_sp, family = family, lrnr_baseline = lrnr_baseline, interaction_variable = "A", append_interaction_matrix = TRUE)
+#' lrnr_glm_sp_binomial <- lrnr_glm_sp_binomial$train(task_binary)
+#' preds <- lrnr_glm_sp_binomial$predict(task_binary)
+#' beta <- lrnr_glm_sp_binomial$fit_object$coefficients
+#'
+#' # fit partially-linear log-link (relative-risk) regression
+#' lrnr_baseline <- Lrnr_glmnet$new(family = "poisson") # This setting requires that lrnr_baseline predicts nonnegative values. Poisson-regression-based learners are recommended.
+#' family <- poisson()
+#' lrnr_glm_sp_poisson <- Lrnr_glm_semiparametric$new(formula_sp = formula_sp, family = family, lrnr_baseline = lrnr_baseline, interaction_variable = "A", append_interaction_matrix = TRUE)
+#' lrnr_glm_sp_poisson <- lrnr_glm_sp_poisson$train(task_count)
+#' preds <- lrnr_glm_sp_poisson$predict(task_count)
+#' beta <- lrnr_glm_sp_poisson$fit_object$coefficients
+#'
+#' #
 Lrnr_glm_semiparametric <- R6Class(
   classname = "Lrnr_glm_semiparametric", inherit = Lrnr_base,
   portable = TRUE, class = TRUE,
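
Not part of the diff, but a quick sanity-check sketch of the model form documented above, reusing `data`, `formula_sp`, and the trained `lrnr_glm_sp_binomial` from the @examples. It assumes that predictions on counterfactual copies of the data (interaction variable set to 1 and to 0) follow the documented form `linkfun(E[Y|A,W]) = linkfun(E[Y|A=0,W]) + A * f(W)`, so the link-scale contrast should recover the parametric term.

library(data.table)
# Counterfactual tasks with the interaction variable A set to 1 and to 0
data_A1 <- copy(data)[, A := 1]
data_A0 <- copy(data)[, A := 0]
task_A1 <- sl3_Task$new(data_A1, covariates = c("A", "W"), outcome = "Y_binary")
task_A0 <- sl3_Task$new(data_A0, covariates = c("A", "W"), outcome = "Y_binary")

# Per the documented model, qlogis(E[Y|A=1,W]) - qlogis(E[Y|A=0,W]) = V %*% beta
p1 <- lrnr_glm_sp_binomial$predict(task_A1)
p0 <- lrnr_glm_sp_binomial$predict(task_A0)
V <- model.matrix(formula_sp, data)
beta <- lrnr_glm_sp_binomial$fit_object$coefficients
max(abs(qlogis(p1) - qlogis(p0) - as.vector(V %*% beta))) # should be ~ 0
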
@@ -113,7 +189,7 @@
       }

       fit_object <- list(
-        beta = beta, lrnr_baseline = lrnr_baseline, covariates = covariates, family = family, formula = formula,
+        coefficients = beta, lrnr_baseline = lrnr_baseline, covariates = covariates, family = family, formula = formula,
         append_interaction_matrix = append_interaction_matrix, binary = binary, task_baseline = task_baseline
       )
       return(fit_object)
@@ -122,7 +198,7 @@
       fit_object <- self$fit_object
       append_interaction_matrix <- fit_object$append_interaction_matrix
       binary <- fit_object$binary
-      beta <- fit_object$beta
+      beta <- fit_object$coefficients
       lrnr_baseline <- fit_object$lrnr_baseline
       covariates <- fit_object$covariates
       family <- fit_object$family
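
The substantive change in these last two hunks is renaming the fit-object slot from `beta` to `coefficients`; the stored value and the prediction logic are unchanged. A minimal sketch of how downstream code reads the parametric coefficients after this rename, reusing `formula_sp` and `task_continuous` from the @examples (the generic `Lrnr_base` accessor mentioned in the comments is an assumption, not something shown in this diff):

fit <- Lrnr_glm_semiparametric$new(
  formula_sp = formula_sp, family = gaussian(),
  lrnr_baseline = Lrnr_glmnet$new(), interaction_variable = "A",
  append_interaction_matrix = TRUE
)$train(task_continuous)

# Before this change the slot was `fit$fit_object$beta`; after it, use:
beta_hat <- fit$fit_object$coefficients
print(beta_hat)
# If Lrnr_base exposes a coefficients accessor keyed on `fit_object$coefficients`
# (an assumption, not confirmed by this diff), the rename would also let
# `fit$coefficients` return these values for this learner.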
Files Coverage
R 80.28%
Project Totals (59 files) 80.28%