# 24 Type-stability

The less you need to know about a function’s inputs to predict the type of its output, the better. Ideally, a function should either always return the same type of thing, or return something that can be trivially computed from its inputs.

If a function is **type-stable** it satisfies two conditions:

- You can predict the output type based only on the input types (not their values).
- If the function uses `...`, the order of the arguments in it does not affect the output type.
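As a minimal sketch of the first condition, consider two hypothetical helpers (the names `first_or_label()` and `first_or_na()` are invented here for illustration). With the first, knowing the type of `x` is not enough to predict the output type; with the second, it is.

```r
# Hypothetical helper: NOT type-stable. The output type depends on the
# length of `x`, not just on its type.
first_or_label <- function(x) {
  if (length(x) == 0) "empty" else x[[1]]
}

# Hypothetical helper: type-stable. Subsetting with `[` keeps the type of
# `x`, and indexing past the end yields a missing value of that type.
first_or_na <- function(x) {
  x[1]
}

typeof(first_or_label(1:3))
#> [1] "integer"
typeof(first_or_label(integer()))
#> [1] "character"

typeof(first_or_na(1:3))
#> [1] "integer"
typeof(first_or_na(integer()))
#> [1] "integer"
```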

```
library(vctrs)
```

## 24.1 Simple examples

`purrr::map()` and `base::lapply()` are trivially type-stable because they always return lists.

`paste()` is type-stable because it always returns a character vector.

```
vec_ptype(paste(1))
#> Prototype: character
vec_ptype(paste("x"))
#> Prototype: character
```

`base::mean(x)` almost always returns the same type of output as `x`. For example, the mean of a numeric vector is a numeric vector, and the mean of a date-time is a date-time.

```
vec_ptype(mean(1))
#> Prototype: double
vec_ptype(mean(Sys.time()))
#> Prototype: datetime<local>
```

`ifelse()` is not type-stable because the output type depends on the value of its first argument, not just on the input types:

```
vec_ptype(ifelse(NA, 1L, 2))
#> Prototype: logical
vec_ptype(ifelse(FALSE, 1L, 2))
#> Prototype: double
vec_ptype(ifelse(TRUE, 1L, 2))
#> Prototype: integer
```

## 24.2 More complicated examples

Some functions are more complex because they take multiple input types and have to return a single output type. This includes functions like `c()` and `ifelse()`. The rules governing base R functions are idiosyncratic, and each function tends to apply its own slightly different set of rules. Tidy functions should use the consistent set of rules provided by the vctrs package.
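The difference can be sketched by combining the same inputs with `c()` and with `vctrs::vec_c()`. This is only an illustration under the assumption that a recent version of vctrs is installed; the integer/character pair was picked here because it shows the contrast clearly.

```r
library(vctrs)

# Base R silently coerces an integer and a string to character.
typeof(c(1L, "a"))
#> [1] "character"

# vctrs finds a common type for integer and double ...
typeof(vec_c(1L, 2.5))
#> [1] "double"

# ... and that common type does not depend on argument order.
identical(vec_ptype_common(1L, 2.5), vec_ptype_common(2.5, 1L))
#> [1] TRUE

# Integer and character have no common type in vctrs, so combining them
# is an error rather than a silent coercion.
res <- tryCatch(vec_c(1L, "a"), error = function(e) "error")
res
#> [1] "error"
```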

## 24.3 Challenge: the median

A more challenging example is `median()`. The median of a vector is a value that (as evenly as possible) splits the vector into a lower half and an upper half. In the absence of ties, `mean(x > median(x))` and `mean(x <= median(x))` are both as close to 0.5 as the length of `x` allows. The median is straightforward to compute for odd lengths: you simply order the vector and pick the value in the middle, i.e. `sort(x)[(length(x) + 1) / 2]`. It’s clear that the type of the output should be the same type as `x`, and this algorithm can be applied to any vector that can be ordered.
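The odd-length rule can be sketched as a small function, taking the middle position to be `(length(x) + 1) / 2`. The name `median_odd()` is hypothetical, and the sketch assumes `x` has no missing values (`sort()` drops them by default).

```r
# Hypothetical helper: median for odd-length vectors only.
# Assumes `x` can be ordered and contains no missing values.
median_odd <- function(x) {
  stopifnot(length(x) %% 2 == 1)
  sort(x)[(length(x) + 1) / 2]
}

# The output has the same type as the input.
typeof(median_odd(c(3L, 1L, 2L)))
#> [1] "integer"
median_odd(c("b", "c", "a"))
#> [1] "b"
class(median_odd(as.Date("2019-05-13") + 0:2))
#> [1] "Date"
```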

But what if the vector has an even length? In this case, there’s no longer a unique median, and by convention we usually take the mean of the middle two numbers.

In R, this means that `median()` is not type-stable:

```
typeof(median(1:3))
#> [1] "integer"
typeof(median(1:4))
#> [1] "double"
```

Base R doesn’t appear to follow a consistent principle when computing the median of a vector of length 2. Factors throw an error, but dates do not (even though there’s no date halfway between two days that differ by an odd number of days).

```
median(factor(1:2))
#> Error in median.default(factor(1:2)): need numeric data
median(Sys.Date() + 0:1)
#> [1] "2019-05-13"
```
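To see what base R does with the two dates, you can inspect the number underlying the result. This sketch uses a fixed date instead of `Sys.Date()` so that it is reproducible; the variable names are arbitrary.

```r
d <- as.Date("2019-05-13") + 0:1
m <- median(d)

# The result is still a Date ...
class(m)
#> [1] "Date"

# ... but the underlying number of days since 1970-01-01 is not a whole
# number, so it does not correspond to an actual calendar day.
as.numeric(m) %% 1
#> [1] 0.5
```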

To be clear, the problems that this causes in practice are quite small, but it makes the analysis of `median()` more complex, and it makes it hard to know what principle you should adhere to when creating `median` methods for new vector classes.

```
median("foo")
#> [1] "foo"
median(c("foo", "bar"))
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
#> [1] NA
```

## 24.4 Exercises

How is a date like an integer? Why is this inconsistent?

```
vec_ptype(mean(Sys.Date()))
#> Prototype: date
vec_ptype(mean(1L))
#> Prototype: double
```