8 Avoid dependencies between arguments
8.1 What’s the problem?
Avoid creating dependencies between details arguments so that only certain combinations are permitted. Dependencies between arguments make functions harder to use because you have to remember how arguments interact, and when reading a call, you need to read multiple arguments before you can interpret any one of them.
8.2 What are some examples?
In `rep()` you can supply both `times` and `each` unless `times` is a vector. Learn more in Chapter 11.
In `var()`, `na.rm` is only used if `use` is not set. If you supply both, `na.rm` is silently ignored.
In `rgamma()` you can provide either `scale` or `rate`. If you supply both you will get an error or a warning:
```r
rgamma(5, shape = 1, rate = 2, scale = 1/2)
#> Warning in rgamma(5, shape = 1, rate = 2, scale = 1/2): specify 'rate' or
#> 'scale' but not both
#> [1] 0.9827712 0.2016992 0.6153597 0.1510318 0.8513697
rgamma(5, shape = 1, rate = 2, scale = 2)
#> Error in rgamma(5, shape = 1, rate = 2, scale = 2): specify 'rate' or 'scale' but not both
```
`grepl()` has `perl`, `fixed`, and `ignore.case` arguments, each of which can be either `TRUE` or `FALSE`. But `fixed = TRUE` overrides `perl = TRUE`, and `ignore.case` only works if `fixed = FALSE`. Both `fixed` and `perl` change how another argument, `pattern`, is interpreted.
In `library()`, the `character.only` argument changes how the `package` argument is interpreted.
`forcats::fct_lump()` decides which algorithm to use based on a combination of the `n` and `prop` arguments.
In `ggplot2::geom_histogram()`, you can specify the histogram breaks in three ways: as a number of `bins`, as the width of each bin (`binwidth`, plus `center` or `boundary`), or as the exact `breaks`. You can only pick one of the three options, which is hard to convey in the documentation. There’s also an implied precedence so that if more than one option is supplied, one will silently win.
In `readr::locale()` there’s a complex dependency between `decimal_mark` and `grouping_mark` because they can’t be the same value, and the US and Europe use different standards.
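Two of the base R examples above are easy to check directly. The following is a small sketch using only base R; the input vectors are made up for illustration.

```r
# `times` and `each` combine when `times` is a single number:
# each element is repeated 3 times, then the whole result is repeated twice.
both <- rep(1:3, times = 2, each = 3)
length(both)

# With a vector `times`, the same combination fails because the length of
# `times` no longer matches the input after it has been expanded by `each`.
vector_times <- tryCatch(
  rep(1:3, times = 1:3, each = 2),
  error = function(e) conditionMessage(e)
)
vector_times

# In var(), `na.rm` is only consulted when `use` is missing.
x <- c(1, 2, NA)
var(x, na.rm = TRUE)                      # NA is dropped
var(x, na.rm = TRUE, use = "everything")  # `na.rm` is silently ignored
```

In the last call nothing signals that `na.rm = TRUE` had no effect; the only clue is the `NA` result.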
8.3 Why is this important?
Having complicated interdependencies between arguments has major downsides:
It suggests that there are many more viable code paths than there really are and all those (unnecessary) possibilities still occupy head space. You have to memorise the set of allowed combinations, rather than them being implied by the structure of the function.
It increases implementation complexity. Interdependence of arguments suggests complex implementation paths which are harder to analyse and test.
It makes documentation harder to write. You have to use extra words to explain exactly how combinations of arguments work together, and it’s not obvious where those words should go. If there’s an interaction between `arg_a` and `arg_b`, do you document it with `arg_a`, with `arg_b`, or with both?
8.4 How do I remediate?
Often these problems arise because the scope of a function grows over time. When the function was initially designed, the scope was small, and it grew incrementally over time. At no point did it seem worth the additional effort to refactor to a new design, but now you have a large complex function. This makes the problem hard to avoid.
To remediate the problem, you’ll need to think holistically and reconsider the complete interface. There are two common outcomes which are illustrated in the case studies below:
Splitting the function into multiple functions that each do one thing.
Encapsulating related details arguments into a single object.
See also the larger case study in Chapter 11, where this problem is tangled up with other problems.
If these changes to the interface affect exported functions in a package, you’ll need to consider how to preserve the interface with deprecation warnings. For important functions, it is worth generating a message that includes new code to copy and paste.
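One way to generate such a message is to rebuild the user’s call with the new function name. The sketch below uses only base R; `keep_top()` and `keep_top_n()` are hypothetical names invented for this illustration, not functions from any package.

```r
# Hypothetical replacement function with a single, focused purpose.
keep_top_n <- function(x, n) {
  counts <- sort(table(x), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# Hypothetical old entry point, kept so that existing code keeps working.
keep_top <- function(x, n) {
  # Rebuild the user's call with the new function name so that the warning
  # contains code that can be copied and pasted.
  new_call <- match.call()
  new_call[[1]] <- quote(keep_top_n)
  warning(
    "`keep_top()` is deprecated. Please use:\n  ",
    paste(deparse(new_call), collapse = "\n  "),
    call. = FALSE
  )
  keep_top_n(x, n)
}

v <- c("a", "a", "a", "b", "b", "c")

# Capture the warning text to show what the user would see.
msg <- tryCatch(keep_top(v, n = 2), warning = function(w) conditionMessage(w))
cat(msg, "\n")

# The old function still returns the same result as the new one.
top <- suppressWarnings(keep_top(v, n = 2))
```

Packages in the tidyverse typically use a dedicated deprecation helper rather than a bare `warning()`; the base R version here is only meant to show the shape of the message.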
8.4.1 Case study: `fct_lump()`
There are many different ways to decide how to lump uncommon factor levels together, and initially we attempted to encode these through arguments to `fct_lump()`. However, as the number of arguments increased over time, it got harder and harder to tell what the options are. Currently there are three behaviours:
Both `n` and `prop` missing: merge together the least frequent levels, ensuring that `other` is still the smallest level. (For this case, the `ties.method` argument is ignored.)
`n` supplied: if positive, preserves the `n` most common values.
`prop` supplied: if positive, preserves values that appear at least `prop` of the time.
Both `n` and `prop` supplied: due to a bug in the code, this is treated the same way as both `n` and `prop` missing! (But it really should be an error.)
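It would be better to break this into three functions, one per behaviour; forcats now provides `fct_lump_n()`, `fct_lump_prop()`, and `fct_lump_lowfreq()`.
That has three advantages:
The name of the function helps remind you of its purpose.
There’s no way to supply both `n` and `prop`.
The `ties.method` argument would only appear in `fct_lump_n()`.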
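To make the split concrete, here is a simplified base R sketch of three single-purpose functions. It is not the forcats implementation: the names `lump_n()`, `lump_prop()`, `lump_lowfreq()`, and `lump_to_other()` are hypothetical, the functions work on character vectors rather than factors, and ties are broken by sort order instead of a `ties.method` argument.

```r
# Shared helper: replace every value not in `keep` with `other`.
lump_to_other <- function(x, keep, other = "Other") {
  ifelse(x %in% keep, x, other)
}

# Keep the `n` most common values.
lump_n <- function(x, n, other = "Other") {
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  lump_to_other(x, keep, other)
}

# Keep values that appear at least `prop` of the time.
lump_prop <- function(x, prop, other = "Other") {
  freq <- table(x) / length(x)
  keep <- names(freq)[freq >= prop]
  lump_to_other(x, keep, other)
}

# Lump the least frequent values for as long as `other` stays the
# smallest group.
lump_lowfreq <- function(x, other = "Other") {
  counts <- sort(table(x), decreasing = TRUE)
  remaining <- sum(counts)
  n_keep <- length(counts)
  for (i in seq_along(counts)) {
    remaining <- remaining - counts[[i]]
    if (counts[[i]] > remaining) {
      n_keep <- i
      break
    }
  }
  lump_to_other(x, names(counts)[seq_len(n_keep)], other)
}

x <- c(rep("a", 5), rep("b", 3), "c", "d")
table(lump_n(x, n = 1))
table(lump_prop(x, prop = 0.25))
table(lump_lowfreq(x))
```

Each function takes only the arguments that make sense for its algorithm, so there is no combination of arguments that needs to be documented as invalid.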
8.4.2 Case study: `grepl()`
`grepl()` has three arguments that take either `TRUE` or `FALSE`: `ignore.case`, `perl`, and `fixed`, which might suggest that there are 2 ^ 3 = 8 possible invocations. However, a number of combinations are not allowed:
```r
x <- grepl("a", letters, fixed = TRUE, ignore.case = TRUE)
#> Warning in grepl("a", letters, fixed = TRUE, ignore.case = TRUE): argument
#> 'ignore.case = TRUE' will be ignored
x <- grepl("a", letters, fixed = TRUE, perl = TRUE)
#> Warning in grepl("a", letters, fixed = TRUE, perl = TRUE): argument 'perl =
#> TRUE' will be ignored
```
Part of this problem could be resolved by making it clearer that one important choice is the matching engine to use: POSIX 1003.2 extended regular expressions (the default), Perl-style regular expressions (`perl = TRUE`), or fixed matching (`fixed = TRUE`). A better approach would be to use the pattern in Chapter 12, and create a new argument called something like `engine = c("POSIX", "perl", "fixed")`.
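A minimal sketch of what such an argument could look like, written as a thin wrapper around `grepl()`. The wrapper name `detect()` is hypothetical; only the `engine` argument comes from the text above.

```r
# Hypothetical wrapper: one `engine` argument replaces `perl` and `fixed`.
detect <- function(x, pattern, engine = c("POSIX", "perl", "fixed")) {
  # match.arg() picks the first value by default and rejects anything
  # outside the enumerated choices.
  engine <- match.arg(engine)
  switch(engine,
    POSIX = grepl(pattern, x),
    perl = grepl(pattern, x, perl = TRUE),
    fixed = grepl(pattern, x, fixed = TRUE)
  )
}

x <- c("a.b", "axb")
detect(x, "a.b")                      # "." matches any character
detect(x, "a.b", engine = "fixed")    # "." is matched literally
detect(x, "a(?=x)", engine = "perl")  # lookahead needs the Perl engine
```

Because the three engines are values of a single argument, it is impossible to ask for `fixed` and `perl` at the same time, and the set of valid choices is visible in the function signature.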
The other problem is that `ignore.case` can only affect two of the three engines: POSIX and perl. This is hard to remedy without creating a completely new matching engine. Anything to do with case is always harder than you might expect because different languages have different rules.
stringr takes a different approach, encoding the engine as an attribute of the pattern. This has the advantage that each engine can take different arguments.
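The following short example shows stringr’s approach, assuming the stringr package is installed. The input strings are made up for illustration.

```r
library(stringr)

# The helper wrapped around the pattern selects the engine.
literal <- str_detect(c("a.b", "axb"), fixed("a.b"))
regexp <- str_detect(c("a.b", "axb"), regex("a.b"))

# Engine-specific options live on the helper, not on str_detect().
caseless <- str_detect(c("A", "b"), regex("a", ignore_case = TRUE))

literal
regexp
caseless
```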
An alternative approach would be to have a separate `engine` argument.
This approach is a bit more discoverable (because there’s clearly another argument that affects the pattern), but it’s slightly less general, because of the `boundary()` engine, which doesn’t match patterns but boundaries.
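To see why `boundary()` does not fit an `engine` argument, consider this small example, again assuming stringr is installed. There is no pattern string at all, only a description of the kind of boundary to find.

```r
library(stringr)

# boundary("word") describes where words start and end; there is no
# pattern text for a separate `engine` argument to reinterpret.
n_words <- str_count("These are   some words.", boundary("word"))
n_words
```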
It would also mean that you had an argument, `engine`, that affected how another argument, `pattern`, was interpreted, so it would repeat the problem in a slightly different form.
It’s appealing to have all the details of the match wrapped up into a single object.