19 Making data with ...

19.1 What’s the problem?

A number of functions, sum() among them, take ... to save the user from having to create a vector themselves.
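As a minimal illustration in base R (not necessarily the example the chapter originally showed), the two calls below give the same result; the only difference is whether the caller wraps the values in c(). The variable names are arbitrary.

```r
# Values passed one by one and collected by sum() through ...
via_dots <- sum(1, 1, 1)

# The same values passed as a single vector built with c()
via_vector <- sum(c(1, 1, 1))

# Both calls add the same three numbers
identical(via_dots, via_vector)
```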

19.3 Why is it important?

In general, I think it is best to avoid using ... for this purpose because it has a relatively small benefit, saving only the three characters of c(), but has a number of costs.
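One cost that is easy to demonstrate in base R, offered here as an illustration rather than as a complete list, is that functions which look like siblings do not all treat ... the same way. sum() adds everything it receives, while mean() averages only its first argument and matches the remaining values to other parameters.

```r
# sum() collects all three values through ...
total <- sum(1, 2, 3)

# mean() treats only the first value as data; 2 and 3 are matched
# positionally to mean.default()'s `trim` and `na.rm` arguments
surprising <- mean(1, 2, 3)

# Passing one vector gives the result most callers expect
expected <- mean(c(1, 2, 3))

c(total = total, surprising = surprising, expected = expected)
```

A reader who has learned the sum(1, 2, 3) idiom can reasonably, and wrongly, assume that mean(1, 2, 3) works the same way.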

19.4 What are the exceptions?

Note that in all the examples above, the ... are used to collect a single details argument. It’s ok to use ... to collect data, as in paste(), data.frame(), or list().
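A short base R sketch of what collecting data looks like: here each element passed through ... is a separate piece of data, and any names are part of the result rather than an accident. The values are made up for illustration.

```r
# paste() joins each element supplied through ...
joined <- paste("a", "b", "c")

# list() keeps every element, and its name, as a component
parts <- list(x = 1, y = "a")

# data.frame() turns each named element into a column
df <- data.frame(x = 1:2, y = c("a", "b"))

joined
names(parts)
names(df)
```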

19.5 How can I remediate it?

If you’ve already published a function where you’ve used ... for this purpose, you can change the interface by adding a new argument in front of ..., and then warning if anything ends up in ....

Because this is an interface change, it should be prominently advertised in the package that makes it.
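The remediation described above could look something like the following sketch. The functions old_total() and new_total() are hypothetical and exist only for illustration: the new argument x sits in front of ..., and anything that still arrives through ... triggers a warning before being folded into x.

```r
# Hypothetical published interface: values are collected through ...
old_total <- function(..., na.rm = FALSE) {
  sum(c(...), na.rm = na.rm)
}

# Revised interface: a single vector argument `x` placed in front of ...
new_total <- function(x, ..., na.rm = FALSE) {
  if (...length() > 0) {
    warning(
      "Passing values through `...` is deprecated; ",
      "supply a single vector to `x` instead.",
      call. = FALSE
    )
    # Keep old calls working by folding the extra values into `x`
    x <- c(x, ...)
  }
  sum(x, na.rm = na.rm)
}

# New style: no warning
new_total(c(1, 2, 3))

# Old style: same answer, but the caller is warned
suppressWarnings(new_total(1, 2, 3))
```

Old call sites keep returning the same value, so the change can be rolled out without breaking existing code on day one.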

19.6 How can I protect myself?

If you do feel that the tradeoff is worth it (i.e. it’s an extremely frequently used function and the savings over time will be considerable), you need to take some steps to minimise the downsides.

This is easiest if you’re constructing a vector that shouldn’t have names. In this case, you can call ellipsis::check_dots_unnamed() to ensure that no named arguments have been accidentally passed to .... This protects you against the following undesirable behaviour of sum():
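A sketch of the behaviour in question and of one way to guard against it. To stay dependency-free, the hypothetical wrapper safe_sum() performs the check by hand in base R; in a package, that hand-written check is the part a call to ellipsis::check_dots_unnamed() would replace.

```r
# The misspelled `na.omit` does not match `na.rm`, so TRUE is swept
# into ... and silently added to the total as 1
silent_bug <- sum(1, 2, 3, na.omit = TRUE)
silent_bug

# Hypothetical wrapper that refuses named arguments in ...
safe_sum <- function(..., na.rm = FALSE) {
  dot_names <- names(list(...))
  named <- dot_names[!is.null(dot_names) & nzchar(dot_names)]
  if (length(named) > 0) {
    stop(
      "Arguments in `...` must not be named: ",
      paste(named, collapse = ", "),
      call. = FALSE
    )
  }
  sum(..., na.rm = na.rm)
}

safe_sum(1, 2, 3)

# The typo now fails loudly instead of changing the result
result <- try(safe_sum(1, 2, 3, na.omit = TRUE), silent = TRUE)
inherits(result, "try-error")
```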

If you want your vector to have names, the problem is harder, and there’s relatively little that you can do. You’ll need to ensure that all other arguments get a . prefix (to minimise the chances of a mismatch) and then think carefully about how you might detect problems by thinking about the expected type of c(...). As far as I know, there are no general techniques, and you’ll have to think about the problem on a case-by-case basis.
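As one case-specific sketch, consider a hypothetical function weights() that collects named numeric values. Its only other argument carries a . prefix, and it checks that every element of ... is numeric, which happens to catch a misspelled logical option. Both the function and the check are assumptions made for illustration; a different expected type would need a different check.

```r
# Hypothetical collector of named numeric weights
weights <- function(..., .normalise = FALSE) {
  dots <- list(...)

  # Expected type: every element supplied through ... is numeric
  is_num <- vapply(dots, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("All values in `...` must be numeric.", call. = FALSE)
  }

  x <- c(...)
  if (.normalise) {
    x <- x / sum(x)
  }
  x
}

weights(a = 1, b = 3)
weights(a = 1, b = 3, .normalise = TRUE)

# Dropping the dot sends the option into ..., where the type check rejects it
typo <- try(weights(a = 1, b = 3, normalise = TRUE), silent = TRUE)
inherits(typo, "try-error")
```

The check only works here because the data and the option have different types; a misspelled numeric option would still slip through.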

19.7 Selecting variables

A number of functions in the tidyverse use ... for selecting variables. For example, tidyr::fill() lets you fill in missing values based on the previous row:
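A small sketch of that usage, assuming tidyr is installed; the data frame is made up for illustration. The column to fill is passed as a bare name through ....

```r
# Made-up data: the year is recorded only on the first row
df <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  year = c(2000, NA, NA, NA)
)

# Guarded so the snippet still runs where tidyr is not installed
if (requireNamespace("tidyr", quietly = TRUE)) {
  # `year` is supplied through ...; missing values take the previous row's value
  filled <- tidyr::fill(df, year)
  print(filled)
}
```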

All functions that work like this include a call to tidyselect::vars_select() that looks something like this:
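A sketch of that pattern, assuming tidyselect is installed. The wrapper select_names() is hypothetical; it only forwards ... to tidyselect::vars_select() and returns the matched column names, where a real function would go on to use them.

```r
# Hypothetical wrapper showing how ... is handed to vars_select()
select_names <- function(data, ...) {
  tidyselect::vars_select(names(data), ...)
}

df <- data.frame(quarter = "Q1", year = 2000, sales = 10)

# Guarded so the snippet still runs where tidyselect is not installed
if (requireNamespace("tidyselect", quietly = TRUE)) {
  # Bare column names travel through ... and come back as a character vector
  picked <- select_names(df, year, sales)
  print(picked)
}
```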

I now think that this interface is a mistake because it suffers from the same problem as sum(): we’re using ... only to save a little typing. We can eliminate the use of dots by requiring the user to use c(). (This change also requires explicit quoting and unquoting of vars since we’re no longer using ....)

In other words, I believe that a better interface to fill() would be:
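A sketch of what such an interface could look like, assuming tidyr, tidyselect, and rlang are installed. The name fill2() and the made-up data are hypothetical; the function quotes its single vars argument with rlang::enquo(), unquotes it into tidyselect::vars_select(), and delegates the filling itself to tidyr::fill().

```r
# Hypothetical dots-free variant of fill(): one `vars` argument, built with c()
fill2 <- function(data, vars) {
  # Quote the user's selection, then unquote it for vars_select()
  vars <- tidyselect::vars_select(names(data), !!rlang::enquo(vars))
  tidyr::fill(data, tidyselect::all_of(unname(vars)))
}

df <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  year = c(2000, NA, NA, NA),
  sales = c(10, NA, 30, NA)
)

# Guarded so the snippet still runs where the packages are not installed
pkgs <- c("tidyr", "tidyselect", "rlang")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  both <- fill2(df, c(year, sales))  # several variables need c()
  one <- fill2(df, year)             # a single variable needs no wrapper
  print(both)
}
```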

Other tidyverse functions like dplyr’s scoped verbs and ggplot2::facet_grid() require the user to explicitly quote the input. I now believe that this is also a suboptimal interface because it is more typing (vars() is longer than c(), and you must quote even single variables), and arguments that require their inputs to be explicitly quoted are rare in the tidyverse.
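For comparison, explicit quoting in a scoped verb looks like the following sketch, which assumes dplyr is installed and uses a made-up data frame. The selection has to be wrapped in dplyr::vars() even when it names a single column.

```r
df <- data.frame(
  x = c(1.234, 2.345),
  y = c(3.456, 4.567),
  z = c("a", "b")
)

# Guarded so the snippet still runs where dplyr is not installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Several columns: the selection is quoted explicitly with vars()
  rounded <- dplyr::mutate_at(df, dplyr::vars(x, y), round)

  # One column: vars() is still required
  rounded_x <- dplyr::mutate_at(df, dplyr::vars(x), round)

  print(rounded)
}
```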

That said, it is unlikely we will ever change these functions, because the benefit is smaller (primarily improved consistency) and the costs are high, as it is impossible to switch from an evaluated argument to a quoted argument without breaking backward compatibility in some small percentage of cases.