10 Required args shouldn’t have defaults

10.1 What’s the problem?

The absence of a default value should imply than an argument is required; the presence of a default should imply that an argument is optional.

When reading a function, it’s important to be able to tell at a glance which arguments must be supplied and which are optional. Otherwise you need to rely on the user having carefully read the documentation.

10.2 What are some examples?

  • In sample() neither x not size has a default value, suggesting that both are required, and the function would error if you didn’t supply them. But size is optional, determined by a complex conditional.

    sample(1:4)
    #> [1] 1 2 3 4
    sample(4)
    #> [1] 4 2 1 3
  • rt() (draw random numbers from the t-distribution) looks like it requires the ncp parameter but it doesn’t.

  • download.file() looks like it requires the method argument but actually consults a global option (download.file.method) if it’s not supplied.

  • lm() does not have defaults for formula, data, subset, weights, na.action, or offset. Only formula is actually required, but even its absence fails to generate a clear error message:

    lm()
    #> Error in terms.formula(formula, data = data): argument is not a valid model
  • help() and vignette() have no default for their first argument, suggesting that they’re required. But they’re not: calling help() or vignette() without any arguments lists all help topics and vignettes respectively.

  • In diag(), the argument x has a default 1, but it’s required: if you don’t supply it you get an error:

    diag()
    #> Error in diag(): argument "nrow" is missing, with no default
    diag(x = 1)
    #>      [,1]
    #> [1,]    1

    Conversely, nrow and ncol don’t have defaults but aren’t required.

  • In ggplot2::geom_abline(), slope and intercept don’t have defaults but are not required. If you don’t supply them they default to slope = 1 and intercept = 0, or are taken from aes() if they’re provided there.

A common warning sign is the use of missing() inside the function.

10.3 What are the exceptions?

There are two exceptions to this rule:

  • A pair of arguments that provide an alternative specification for the same underlying concept. It is only ever possible to supply one argument.

  • When you can either supply one complex object, or a handful of simpler objects.

In both cases, I believe the benefits outweigh the costs of violating a standard pattern.

10.3.1 Pair of mututally exclusive arguments

A number of functions that allow you to supply exactly one of two possible arguments:

  • read.table() allows you to supply data either with a path to a file, or inline as text.

  • rvest::html_node() allows you to select HTML nodes either with a css selector or an xpath expression.

  • forcats::fct_other() allows you to either keep or drop specified factor values.

  • modelr::seq_range() allows you create a sequence over the range of x by either specifying the length of the sequence (with n) or the distance between values (with by).

If you use this technique, use xor() and missing() to check that exactly one argument is supplied:

if (!xor(missing(keep), missing(drop))) {
  stop("Must supply exactly one of `keep` and `drop`", call. = FALSE)
}

And in the documentation, make it clear that only one of the pair can be supplied:

#' @param keep,drop Pick one of `keep` and `drop`:
#'   * `keep` will preserve listed levels, replacing all others with 
#'     `other_level`.
#'   * `drop` will replace listed levels with `other_level`, keeping all
#'     as is.

This technique should only be used for are exactly two possible arguments. If there are more than two , that is generally a sign you should create more functions. See case studies in Chapter 11 and Section 8.4.1 for examples.

10.3.2 One compound argument vs multiple simple arguments

A related, if less generally useful, form is to allow the user to supply either a single complex argument or several smaller arguments. For example:

  • stringr::str_sub(x, cbind(start, end)) is equivalent to str_sub(x, start, end).

  • stringr::str_replace_all(x, c(pattern = replacement)) is equivalent to stringr(x, pattern, replacement).

  • rgb(cbind(r, g, b)) is equivalent to rgb(r, g, b) (See Chapter 17 for more details).

  • options(list(a = 1, b = 2)) is equivalent to options(a = 1, b = 2).

The most compelling reason to provide this sort of interface is when another function might return a complex output that you want to use as an input. For example, it seems reasonable that you should be able to feed the output of str_locate() directly into str_sub():

library(stringr)

x <- c("aaaaab", "aaab", "ccccb")
loc <- str_locate(x, "a+b")

str_sub(x, loc)
#> [1] "aaaaab" "aaab"   NA

But equally, it would be weird to have to provide a matrix when subsetting with known positions:

str_sub("Hadley", cbind(2, 4))
#> [1] "adl"

So str_sub() allows either individual vectors supplied to start and end, or a two-colummn matrix supplied to start.

To implement in your own functions, you should branch on the type of the first argument:

(Why? Why not branch if the other arguments are missing? Or some combination?)

str_sub <- function(string, start = 1L, end = -1L) {
  if (is.matrix(start)) {
    if (!missing(end)) {
      stop("`end` must be missing when `start` is a matrix", call. = FALSE)
    }
    if (ncol(start) != 2) {
      stop("Matrix `start` must have exactly two columns", call. = FALSE)
    }
    stri_sub(string, from = start[, 1], to = start[, 2])
  } else {
    stri_sub(string, from = start, to = end)
  }
}

And make it clear in the documentation:

#' @param start,end Integer vectors giving the `start` (default: first)
#'   and `end` (default: last) positions, inclusively. Alternatively, you
#'   pass a two-column matrix to `start`, i.e. `str_sub(x, start, end)`
#'   is equivalent to `str_sub(x, cbind(start, end))`