#' Compatibility with dplyr
#'
#' @description
#' rsample should be fully compatible with dplyr 1.0.0.
#'
#' With older versions of dplyr, there is partial support for the following
#' verbs: `mutate()`, `arrange()`, `filter()`, `rename()`, `select()`, and
#' `slice()`. We strongly recommend updating to dplyr 1.0.0, if possible, to
#' get more complete integration with dplyr.
#'
#' @section Version Specific Behavior:
#'
#' rsample behaves somewhat differently depending on whether you have
#' dplyr >= 1.0.0 (new) or dplyr < 1.0.0 (old). Additionally, version
#' 0.0.7 of rsample (new) introduced some changes to how rsample objects
#' work with dplyr, even on old dplyr. Most of these changes influence the
#' return value of a dplyr verb and determine whether it will be a tibble
#' or an rsample rset subclass.
#'
#' The tables below attempt to capture most of these changes. These examples
#' are not exhaustive and may not capture some edge cases.
#'
#' ## Joins
#'
#' The following applies to all of the dplyr joins, such as `left_join()`,
#' `right_join()`, `full_join()`, and `inner_join()`.
#'
#' Joins that alter the rows of the original rset object:
#'
#' | operation                  | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :------------------------- | :---------------------: | :---------------------: | :---------------------:
#' | `join(rset, tbl)`          | error                   | error                   | tibble
#'
#' The idea here is that, if the join adds or removes rows, the result should
#' not be an rset object. For example, you can't have a 10-fold CV object
#' without 10 rows.
#'
#' Joins that keep the rows of the original rset object:
#'
#' | operation                  | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :------------------------- | :---------------------: | :---------------------: | :---------------------:
#' | `join(rset, tbl)`          | error                   | error                   | rset
#'
#' As with the logic above, if the original rset object (defined by the
#' `splits` column and the id column(s)) is left intact, the result should be
#' an rset.
#'
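#' As a hedged illustration, assuming rsample >= 0.0.7 and dplyr >= 1.0.0 are
#' installed, a join that matches every fold exactly once should keep the rset
#' class, while a join that drops folds should return a tibble. `fold_info` and
#' `one_fold` are hypothetical lookup tables used only for this sketch.
#'
#' ```r
#' library(rsample)
#' library(dplyr)
#'
#' set.seed(123)
#' folds <- vfold_cv(mtcars, v = 10)
#'
#' # One row per fold id, so the join keeps all 10 rows of `folds`
#' fold_info <- tibble(id = folds$id, weight = 1)
#' kept <- left_join(folds, fold_info, by = "id")
#' inherits(kept, "rset")
#'
#' # Only one fold id matches, so `inner_join()` drops 9 rows
#' one_fold <- tibble(id = "Fold01", weight = 1)
#' dropped <- inner_join(folds, one_fold, by = "id")
#' inherits(dropped, "rset")
#' ```
#'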
#' ## Row Subsetting
#'
#' As mentioned above, row subsetting should result in a tibble if any rows
#' are removed or added. Simply reordering rows still results in a valid rset
#' with new rsample.
#'
#' Cases where rows are removed or added:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[ind,]`    | tibble                  | tibble                  | tibble
#' | `slice(rset)`   | rset                    | tibble                  | tibble
#' | `filter(rset)`  | rset                    | tibble                  | tibble
#'
#' Cases where all rows are kept, but are possibly reordered:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[ind,]`    | tibble                  | rset                    | rset
#' | `slice(rset)`   | rset                    | rset                    | rset
#' | `filter(rset)`  | rset                    | rset                    | rset
#' | `arrange(rset)` | rset                    | rset                    | rset
#'
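#' A similar sketch for row subsetting, under the same version assumptions
#' (rsample >= 0.0.7, dplyr >= 1.0.0; object names are illustrative only):
#' reordering should keep an rset, and dropping rows should give a tibble.
#'
#' ```r
#' library(rsample)
#' library(dplyr)
#'
#' set.seed(123)
#' folds <- vfold_cv(mtcars, v = 10)
#'
#' # All 10 rows are kept, only their order changes
#' reordered <- arrange(folds, desc(id))
#' reversed <- slice(folds, 10:1)
#'
#' # Rows are removed, so the rset structure is no longer intact
#' first_three <- slice(folds, 1:3)
#' without_fold01 <- filter(folds, id != "Fold01")
#'
#' class(reordered)
#' class(without_fold01)
#' ```
#'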
#' ## Column Subsetting
#'
#' When the `splits` column or any `id` columns are dropped or renamed,
#' the result should no longer be considered a valid rset.
#'
#' Cases when the required columns are removed or renamed:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[,ind]`    | tibble                  | tibble                  | tibble
#' | `select(rset)`  | rset                    | tibble                  | tibble
#' | `rename(rset)`  | tibble                  | tibble                  | tibble
#'
#' Cases when no required columns are affected:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[,ind]`    | tibble                  | rset                    | rset
#' | `select(rset)`  | rset                    | rset                    | rset
#' | `rename(rset)`  | rset                    | rset                    | rset
#'
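#' Sketched for column subsetting, again assuming rsample >= 0.0.7 and
#' dplyr >= 1.0.0 (the new column name `fold` is hypothetical):
#'
#' ```r
#' library(rsample)
#' library(dplyr)
#'
#' set.seed(123)
#' folds <- vfold_cv(mtcars, v = 10)
#'
#' # `splits` and `id` are both kept unchanged
#' both_cols <- select(folds, splits, id)
#'
#' # `id` is dropped or renamed, so the result is no longer a valid rset
#' no_id <- select(folds, -id)
#' renamed <- rename(folds, fold = id)
#' ```
#'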
#' ## Other Column Operations
#'
#' Cases when the required columns are altered:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `mutate(rset)`  | rset                    | tibble                  | tibble
#'
#' Cases when no required columns are affected:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `mutate(rset)`  | rset                    | rset                    | rset
#'
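#' For `mutate()`, a sketch under the same assumptions (rsample >= 0.0.7,
#' dplyr >= 1.0.0; `weight` is a hypothetical new column):
#'
#' ```r
#' library(rsample)
#' library(dplyr)
#'
#' set.seed(123)
#' folds <- vfold_cv(mtcars, v = 10)
#'
#' # A new column leaves `splits` and `id` untouched
#' with_weight <- mutate(folds, weight = 1)
#'
#' # Overwriting `id` alters a required column
#' lower_ids <- mutate(folds, id = tolower(id))
#' ```
#'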
#' @name rsample-dplyr
NULL

# `dplyr_reconstruct()`
#
# `dplyr_reconstruct()` is called:
# - After a complex dplyr operation, like a `left_join()`, to restore the
#   type of the first input, `x`.
# - At the end of a call to `dplyr_col_modify()`.
# - At the end of a call to `dplyr_row_slice()`.
# - See `?dplyr_reconstruct` for the full list.
#
# Because `dplyr_reconstruct()` is called at the end of `dplyr_col_modify()`
# and `dplyr_row_slice()`, we don't need methods for them. The default methods
# in dplyr do the right thing automatically, and then our reconstruction
# method decides whether or not the result should still be an rset.
#
# The implementation for rsample is the same as `vec_restore()`. Generally,
# it falls back to reconstructing a bare tibble unless the rset structure
# is still completely intact, which is the case when the rset-specific rows
# and columns (splits, id cols) are exactly identical to how they were before the
# dplyr operation (with the exception of column reordering).
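#
# A rough, standalone sketch of that rule (required columns present, same
# rows up to reordering, column order ignored). `rset_still_intact()` is a
# hypothetical helper, not rsample's internal implementation; it assumes the
# required columns are `splits` plus `id`, `id2`, ..., and uses only base R.
#
# ```r
# rset_still_intact <- function(data, template) {
#   # Hypothetical: required columns are `splits` plus `id`, `id2`, ...
#   id_cols <- grep("^id[0-9]*$", names(template), value = TRUE)
#   req <- c("splits", id_cols)
#   # Dropping or renaming a required column breaks the rset
#   if (!all(req %in% names(data))) return(FALSE)
#   # Adding or removing rows breaks the rset
#   if (nrow(data) != nrow(template)) return(FALSE)
#   # Reordering rows is allowed, so align both sides by their id columns
#   row_key <- function(df) do.call(order, unname(as.list(df[id_cols])))
#   o_data <- row_key(data)
#   o_template <- row_key(template)
#   # Each required column must match exactly once rows are aligned;
#   # columns are looked up by name, so column order does not matter
#   all(vapply(req, function(col) {
#     identical(data[[col]][o_data], template[[col]][o_template])
#   }, logical(1)))
# }
# ```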

# Registered in `.onLoad()`
dplyr_reconstruct_rset <- function(data, template) {
  rset_reconstruct(data, template)
}
