Author:
John Högberg <john(at)erlang(dot)org> , Ilya Klyuchnikov <ilya(dot)klyuchnikov(at)gmail(dot)com>
Status:
Accepted
Type:
Standards Track
Created:
7-Aug-2024
Erlang-Version:
OTP-28

EEP 71: Clarify the type documentation, especially regarding type variables #

Abstract #

Type variables are an often-used yet very misunderstood part of the type language. As dialyzer has been lax in enforcing their defined semantics owing to a few bugs, many users have been led to believe that they are simple generics as in most other languages. As a significant amount of code has been written with this in mind, OTP cannot fix the bugs that made dialyzer lax by accident without breaking the analysis of a lot of code.

While that would be fine if dialyzer were the only tool out there, all the other code analyzers we know of interpret type variables as generics which would fragment the ecosystem should we elect to fix the bugs. One application might have been written with eqWAlizer in mind and another with dialyzer, preventing either tool from being applied to the whole release.

As dialyzer is simply broken in this regard, this EEP aims to solve the problem by changing the definition of type variables from equality constraints to generics (parametric polymorphism). We also propose several other changes to make the type language more useful and less ambiguous.

Rationale #

As currently documented, variables in -specs and -types act just like in ordinary code. That is, if the variable V appears several times it has the same value in all places. In a language like C++ or Java where said value is a type, this works pretty well:

template<typename V>
V foo(V a, V b) {
   return a + b;
}

For example, foo(12, 34) works because both arguments have the same type int (yielding int for T), but foo(12, 34.0) doesn’t because int is not double.

However, as the documentation for the Erlang type language says that repeated variables refer to the same value at runtime, tools like gradualizer and eqWAlizer are not free to treat them as generics. For example:

-spec my_add(T, T) -> T.
my_add(A, B) -> A + B.

Here, the only possible values of T that will work are the various representations of zero: foo(12, 34) will fail because 12 is not 34, and foo(1, 1) will fail because the result 2 is not 1.

This makes it impossible to declare that a function takes a value of a certain type and returns something of the same type (but not necessarily same value). This makes sense in dialyzer as values and types are the same thing in its paradigm1, but restricts what other type checkers can do; while the documentation states that the “extra information” given by type variables may be ignored, it leaves no room for other interpretations when they are taken into account.

`integer()` is. The type of `1` is simply `1`.

Changes #

To solve this issue and make the type language less surprising, we propose the following changes:

  1. Redefine :: as alias instead of “subtype of.”

    All currently known tools implement this interpretation2. Should bounded quantification be needed in the future, the “subtype of” operator can be introduced later under the more widely used <: or =< syntaxes.

    Multiple aliases under the same name are rejected, making the following illegal:

    -spec multiple(X :: integer(), X :: integer()) -> atom().
    -spec when_multiple(X) -> atom() when X :: integer(), X :: number().
    

    To prevent breaking existing code as a result of this change, the compiler will not enforce the above, leaving that to the type checking tools (or optionally a compiler flag).

  2. Normalize types in the parser, substituting aliases and treating “annotated types” the same as when. This makes the following signatures equivalent:

    -spec xyzzy(A) -> term() when A :: number().
    -spec waldo(B :: number()) -> term().
    -spec fred(number()) -> term().
    

    Where B :: number() is mere shorthand for B ... when B :: number(), and all variants declare that the first argument is a number().

  3. Treat type variables as generics rather than as equality constraints.

    Example:

    The current lists:append type spec is:

    -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [T], List3 :: [T], T :: term().
    

    With the defined semantics, T :: term() makes the append function to refer to a top type. Instead, the following is preferred and more in line with current type systems:

    -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [U], List3 :: [T | U].
    
  4. Have function contracts (-spec) describe argument types as they go in.

    A very surprising aspect of function contracts is that they describe the types of the arguments when the function returns successfully, and do not affect argument types within the function. For example:

     -spec foo(integer()) -> integer().
     foo(X) ->
         bar(X), %% X is not necessarily an integer here!
         X + 1.
    

    Most people take the above to mean that the type passed on to bar/1 is integer(), yet it can be whatever is passed to foo/1 including values that cause foo/1 to fail. This often hides the fact that there is unreachable code in callees such as bar/1 here, and sometimes results in cryptic warnings describing something the user believes could never happen.

    This also makes it much easier for a user to see what the type of a variable is at any point: just start with what it says in the -spec. At present, the type of X depends entirely on how X got there which is not always trivial to figure out.

Reference Implementation #

We will make a PR with changes in the documentation to reflect above once it is accepted

Copyright #

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

  1. In an untyped language we cannot simply say that the type of 1 is integer() since 1 | foo is just as much of a superset of 1 as 

  2. dialyzer also does this as a consequence of the bugs mentioned in the abstract.