#' Compatibility with dplyr
#'
#' @description
#' rsample should be fully compatible with dplyr 1.0.0.
#'
#' With older versions of dplyr, there is partial support for the following
#' verbs: `mutate()`, `arrange()`, `filter()`, `rename()`, `select()`, and
#' `slice()`. We strongly recommend updating to dplyr 1.0.0 if possible to
#' get more complete integration with dplyr.
#'
#' @section Version Specific Behavior:
#'
#' rsample behaves somewhat differently depending on whether you have
#' dplyr >= 1.0.0 (new) or dplyr < 1.0.0 (old). Additionally, version
#' 0.0.7 of rsample (new) introduced some changes to how rsample objects
#' work with dplyr, even on old dplyr. Most of these changes influence the
#' return value of a dplyr verb and determine whether it will be a tibble
#' or an rsample rset subclass.
#'
#' The tables below attempt to capture most of these changes. These examples
#' are not exhaustive and may not capture some edge cases.
#'
#' ## Joins
#'
#' The following applies to all of the dplyr joins, such as `left_join()`,
#' `right_join()`, `full_join()`, and `inner_join()`.
#'
#' Joins that alter the rows of the original rset object:
#'
#' | operation         | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :---------------- | :---------------------: | :---------------------: | :---------------------:
#' | `join(rset, tbl)` | error                   | error                   | tibble
#'
#' The idea here is that, if there are fewer rows in the result, the result
#' should not be an rset object. For example, you can't have a 10-fold CV
#' object without 10 rows.
#'
#' Joins that keep the rows of the original rset object:
#'
#' | operation         | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :---------------- | :---------------------: | :---------------------: | :---------------------:
#' | `join(rset, tbl)` | error                   | error                   | rset
#'
#' As with the logic above, if the original rset object (defined by the split
#' column and the id column(s)) is left intact, the result should be an rset.
#'
#' ## Row Subsetting
#'
#' As with joins, row subsetting should result in a tibble if any rows are
#' removed or added. Simply reordering rows still results in a valid rset with
#' new rsample.
#'
#' Cases where rows are removed or added:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[ind,]`    | tibble                  | tibble                  | tibble
#' | `slice(rset)`   | rset                    | tibble                  | tibble
#' | `filter(rset)`  | rset                    | tibble                  | tibble
#'
#' Cases where all rows are kept, but are possibly reordered:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[ind,]`    | tibble                  | rset                    | rset
#' | `slice(rset)`   | rset                    | rset                    | rset
#' | `filter(rset)`  | rset                    | rset                    | rset
#' | `arrange(rset)` | rset                    | rset                    | rset
#'
#' ## Column Subsetting
#'
#' When the `splits` column or any `id` columns are dropped or renamed,
#' the result should no longer be considered a valid rset.
#'
#' Cases when the required columns are removed or renamed:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[,ind]`    | tibble                  | tibble                  | tibble
#' | `select(rset)`  | rset                    | tibble                  | tibble
#' | `rename(rset)`  | tibble                  | tibble                  | tibble
#'
#' Cases when no required columns are affected:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `rset[,ind]`    | tibble                  | rset                    | rset
#' | `select(rset)`  | rset                    | rset                    | rset
#' | `rename(rset)`  | rset                    | rset                    | rset
#'
#' ## Other Column Operations
#'
#' Cases when the required columns are altered:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `mutate(rset)`  | rset                    | tibble                  | tibble
#'
#' Cases when no required columns are affected:
#'
#' | operation       | old rsample + old dplyr | new rsample + old dplyr | new rsample + new dplyr
#' | :-------------- | :---------------------: | :---------------------: | :---------------------:
#' | `mutate(rset)`  | rset                    | rset                    | rset
#'
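#' @examples
#' # A minimal sketch of the behavior tabulated above. It assumes
#' # rsample >= 0.0.7 and dplyr >= 1.0.0 (the "new rsample + new dplyr"
#' # column); the comments restate what the tables say to expect.
#' library(dplyr)
#'
#' folds <- vfold_cv(mtcars, v = 5)
#'
#' # Reordering rows keeps every split and id, so this should still be an rset
#' inherits(arrange(folds, desc(id)), "rset")
#'
#' # Removing a row breaks the 5-fold structure, so this should be a tibble
#' inherits(filter(folds, id != "Fold1"), "rset")
#'
#' # Adding an unrelated column leaves the required columns untouched
#' inherits(mutate(folds, fold_number = row_number()), "rset")
#'
#' # Dropping the `id` column means the result is no longer a valid rset
#' inherits(select(folds, -id), "rset")
#'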
#' @name rsample-dplyr
NULL

# `dplyr_reconstruct()`
#
# `dplyr_reconstruct()` is called:
# - After a complex dplyr operation, like a `left_join()`, to restore the
#   result to the type of the first input, `x`.
# - At the end of a call to `dplyr_col_modify()`
# - At the end of a call to `dplyr_row_slice()`
#
# See `?dplyr_reconstruct` for the full list.
#
# Because `dplyr_reconstruct()` is called at the end of `dplyr_col_modify()`
# and `dplyr_row_slice()`, we don't need methods for them. The default methods
# in dplyr do the right thing automatically, and then our reconstruction
# method decides whether or not the result should still be an rset.
#
# The implementation for rsample is the same as `vec_restore()`. Generally
# it will fall back to reconstructing a bare tibble, unless the rset structure
# is still completely intact. This happens when rset-specific rows and columns
# (splits, id cols) are still exactly identical to how they were before the
# dplyr operation (with the exception of row or column reordering).

# Registered in `.onLoad()`
dplyr_reconstruct_rset <- function(data, template) {
  rset_reconstruct(data, template)
}
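
# Illustrative sketch only, NOT rsample's actual implementation: one way the
# "is the rset structure still intact?" decision described above could be
# written in base R. `sketch_rset_cols()` and `sketch_rset_intact()` are
# hypothetical names. The sketch assumes the rset-specific columns are
# `splits` plus every column whose name starts with `id`, and that both row
# and column order may change without invalidating the rset.
sketch_rset_cols <- function(x) {
  nms <- names(x)
  sort(nms[nms == "splits" | startsWith(nms, "id")])
}

sketch_rset_intact <- function(data, template) {
  cols <- sketch_rset_cols(template)
  ids <- setdiff(cols, "splits")

  # Without id columns there is nothing to match rows on
  if (length(ids) == 0L) return(FALSE)
  # The same rset-specific columns must be present (column order is ignored)
  if (!identical(sketch_rset_cols(data), cols)) return(FALSE)
  # Adding or removing rows always invalidates the rset
  if (nrow(data) != nrow(template)) return(FALSE)

  # Rows may have been reordered, so align both inputs on the id columns
  row_order <- function(x) {
    do.call(order, unname(lapply(ids, function(nm) x[[nm]])))
  }
  data_rows <- row_order(data)
  template_rows <- row_order(template)

  # Every rset-specific column must be unchanged once the rows are aligned
  all(vapply(
    cols,
    function(nm) identical(data[[nm]][data_rows], template[[nm]][template_rows]),
    logical(1)
  ))
}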