Type Inference vs. Static/Dynamic Typing

Jeff Atwood just wrote a nice piece on why type inference is convenient, using a C# sample:

I was absolutely thrilled to be able to refactor this code:

StringBuilder sb = new StringBuilder(256);
UTF8Encoding e = new UTF8Encoding();
MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

Into this:

var sb = new StringBuilder(256);
var e = new UTF8Encoding();
var md5 = new MD5CryptoServiceProvider();

It’s not dynamic typing, per se; C# is still very much a statically typed language. It’s more of a compiler trick, a baby step toward a world of Static Typing Where Possible, and Dynamic Typing When Needed.

It’s worth making a stronger demarcation among:

  • type inference, which you can do in any language
  • static vs. dynamic typing, which is completely orthogonal but all too often confused with inference
  • strong vs. weak typing, which is mostly orthogonal (e.g., C is statically typed because every variable has a statically known actual type, but also weakly typed because of its casts)

Above, Jeff explicitly separates inference from dynamic-ness. Unfortunately, he later implies that inference is a small step toward dynamic typing. That is true as a matter of style, but it might mislead some readers into thinking inference has something to do with dynamic-ness, which it doesn’t.

Type Inference

Many languages, including C# (as shown above) and the next C++ standard (C++0x, shown below), provide type inference. C++0x does it via the repurposed auto keyword. For example, say you have an object m of type map<int,list<string>>, and you want to create an iterator to it:

map<int,list<string>>::iterator i = m.begin(); // type is required in today’s C++, allowed in C++0x
auto i = m.begin(); // type can be inferred in C++0x

How many times have you said to your compiler, “Compiler, you know the type already, why are you making me repeat it?!” Even the IDE can tell you what the type is when you hover over an expression.

Well, in C++0x you won’t have to any more, which is often niftily convenient. This gets increasingly important as we don’t want to, or can’t, write out the type ourselves, because we have:

  • types with more complicated names
  • types without names (or hard-to-find names)
  • types held most conveniently via an indirection
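As a minimal sketch of the first case (a type with a complicated name), here is a loop over the map from the example above. The helper name count_strings is made up for illustration, and the static_assert is only there to show that the inferred type is still one fixed static type:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical helper: counts every string stored in the map.
std::size_t count_strings(const std::map<int, std::list<std::string>>& m) {
    std::size_t total = 0;
    // m is const here, so auto deduces the map's const_iterator type.
    for (auto it = m.begin(); it != m.end(); ++it) {
        static_assert(
            std::is_same<decltype(it),
                         std::map<int, std::list<std::string>>::const_iterator>::value,
            "the inferred type is an ordinary, fixed static type");
        total += it->second.size();
    }
    return total;
}
```

Nothing about the variable is looser than if the long name had been typed out; the compiler just fills it in.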

In particular, consider that C++0x lambda functions generate a function object whose type you generally can’t spell, so if you want to hold that function object and don’t have auto then you generally have to use an indirection:

function<void(void)> f = [] { DoSomething(); }; // hold via a wrapper, which requires indirection
auto f = [] { DoSomething(); }; // infer the type and bind directly

Note that the auto version above is more efficient than the C equivalent using a pointer to function, because the compiler knows the exact type of the function object and can inline the call. For more on this, see Item 46 in Scott Meyers’ Effective STL on why it’s preferable to use function objects rather than functions, because (counterintuitively) they’re more efficient.
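To make the two ways of holding a lambda concrete, here is a small sketch. The names call_three_times and run_both are invented for illustration; the only point is that both forms are callable, while only the auto form keeps the lambda's own unnameable type:

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical helper: invokes f three times.
template <typename F>
void call_three_times(F f) {
    for (int i = 0; i < 3; ++i) f();
}

int run_both() {
    int calls = 0;
    // Held via a wrapper: the call goes through std::function's indirection.
    std::function<void()> wrapped = [&calls] { ++calls; };
    // Held directly: the variable's type is the closure type itself.
    auto direct = [&calls] { ++calls; };
    static_assert(!std::is_same<decltype(direct), std::function<void()>>::value,
                  "auto keeps the lambda's own type, not a wrapper type");
    call_three_times(wrapped);
    call_three_times(direct);
    return calls;  // each version incremented the counter three times
}
```

Whether the direct version actually ends up faster depends on your compiler and optimization settings, so measure before relying on it.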

Now, though there’s no question auto and var are great, there are some minor limitations. In particular, you may not want the exact type, but another type that it can be converted to:

map<int,list<string>>::const_iterator ci = m.begin(); // ci’s type is map<int,list<string>>::const_iterator
auto i = m.begin(); // i’s type is map<int,list<string>>::iterator

Widget* w = new Widget(); // w’s type is Widget*
const Widget* cw = new Widget(); // cw’s type is const Widget*
WidgetBase* wb = new Widget(); // wb’s type is WidgetBase*
shared_ptr<Widget> spw( new Widget() ); // spw’s type is shared_ptr<Widget>
auto w = new Widget(); // w’s type is Widget*

So C++0x auto (like C# var) only gets you the most obvious type. Still and all, that does cover a lot of the cases.
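Here is a sketch of that limitation and two ways around it: name the converted-to type explicitly, or call something whose exact type is already the one you want. The Widget and WidgetBase definitions and the function name check_deduced_types are placeholders I made up, and cbegin() assumes a C++0x-era standard library:

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <type_traits>

// Placeholder types standing in for the Widget hierarchy in the text.
struct WidgetBase { virtual ~WidgetBase() {} };
struct Widget : WidgetBase {};

using Map = std::map<int, std::list<std::string>>;

bool check_deduced_types() {
    Map m;
    auto i = m.begin();                  // exact type: Map::iterator
    Map::const_iterator ci = m.begin();  // conversion requested by naming the type
    auto ci2 = m.cbegin();               // exact type is already Map::const_iterator
    static_assert(std::is_same<decltype(i), Map::iterator>::value, "");
    static_assert(std::is_same<decltype(ci2), Map::const_iterator>::value, "");
    (void)i;  // only declared to show its deduced type

    Widget widget;
    auto w = &widget;         // exact type: Widget*
    const Widget* cw = w;     // adding const needs the type spelled out
    WidgetBase* wb = w;       // so does converting to a base pointer
    static_assert(std::is_same<decltype(w), Widget*>::value, "");

    return ci == ci2 && cw == w && wb == w;
}
```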

The important thing to note in all of the above examples is that, regardless how you spell it, every variable has a clear, unambiguous, well-known and predictable static type. C++0x auto and C# var are purely notational conveniences that save us from having to spell it out in many cases, but the variable still has one fixed and static type.

Static and Dynamic Typing

As Jeff correctly noted in the above-quoted part, this isn’t dynamic typing, which permits the same variable to actually have different types at different points in its lifetime. Unfortunately, he goes on to say the following, which some readers could take to imply otherwise:

You might even say implicit variable typing is a gateway drug to more dynamically typed languages.

I know Jeff knows what he’s talking about because he said it correctly earlier in the same post, but let’s be clear: Inference doesn’t have anything to do with dynamic typing. Jeff is just noting that inference happens to let you declare variables in a style similar to the one you use all the time in a dynamically typed language. (Before I could post this, I saw that Lambda the Ultimate also commented on this confusion. At least one commenter noted that this could be equally viewed as a gateway drug to statically typed languages, because you can get the notational convenience without abandoning static typing.)

Quoting from Bjarne’s glossary:

dynamic type – the type of an object as determined at run-time; e.g. using dynamic_cast or typeid. Also known as most-derived type.

static type – the type of an object as known to the compiler based on its declaration. See also: dynamic type.

Let’s revisit an earlier C++ example, which shows the difference between a variable’s static type and dynamic type:

WidgetBase* wb = new Widget(); // wb’s static type is WidgetBase*
if( dynamic_cast<Widget*>( wb ) ) { … } // cast succeeds: wb’s dynamic type is Widget*

The static type of the variable says what interface it supports, so in this case wb allows you to access only the members of WidgetBase. The dynamic type of the variable is the type of the object it points to right now.
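A small self-contained sketch of that distinction follows. The class bodies, the Widget-only member size(), and the two function names are all invented for illustration:

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

struct WidgetBase {
    virtual ~WidgetBase() {}
};
struct Widget : WidgetBase {
    int size() const { return 42; }  // hypothetical Widget-only member
};

// The static type of wb is WidgetBase* no matter what it points to.
int widget_size_or_minus_one(WidgetBase* wb) {
    static_assert(std::is_same<decltype(wb), WidgetBase*>::value,
                  "the static type never changes");
    // wb->size() would not compile here: size() is not part of WidgetBase.
    if (Widget* w = dynamic_cast<Widget*>(wb)) {
        return w->size();  // reached only when the dynamic type is Widget
    }
    return -1;
}

// typeid on a polymorphic object reports the dynamic type. wb must not be null.
bool points_to_widget(WidgetBase* wb) {
    return typeid(*wb) == typeid(Widget);
}
```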

In dynamically typed languages, however, variables don’t have a static type and you generally don’t have to mention the type. In many dynamic languages, you don’t even have to declare variables. For example:

# Python
x = 10 # x’s type is int
x = "hello, world" # x’s type is str

Boost’s variant and any

There are two popular ways to get this effect in C++, even though the language remains statically typed. The first is Boost variant:

// C++ using Boost
variant< int, string > x; // say what types are allowed
x = 42; // now x holds an int
x = "hello, world"; // now x holds a string
x = new Widget(); // error, not int or string

Unlike a union, a variant can include essentially any kind of type, but you have to say what the legal types are up front. You can even simulate getting overload resolution via boost::apply_visitor, which is checked statically (at compile time).
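To show what statically checked visitation looks like, here is a sketch that swaps in std::variant and std::visit from the C++17 standard library as stand-ins for boost::variant and boost::apply_visitor, so that it builds without Boost. The two are closely related but not identical, and the names Describe and describe are made up:

```cpp
#include <cassert>
#include <string>
#include <variant>

// One overload per allowed type. If an alternative had no matching overload,
// the std::visit call below would fail to compile.
struct Describe {
    std::string operator()(int i) const { return "int:" + std::to_string(i); }
    std::string operator()(const std::string& s) const { return "string:" + s; }
};

// Dispatches on whichever alternative x currently holds.
std::string describe(const std::variant<int, std::string>& x) {
    return std::visit(Describe{}, x);
}
```

The choice of overload happens at run time, but the guarantee that every legal type is handled is established at compile time.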

The second is Boost any:

// C++ using Boost
any x;
x = 42; // now x holds an int
x = "hello, world"; // now x holds a string
x = new Widget(); // now x holds a Widget*

Again unlike a union, an any can include essentially any kind of type. Unlike variant, however, any doesn’t make (or let) you say what the legal types are up front, which can be good or bad depending on how relaxed you want your typing to be. Also, any doesn’t have a way to simulate overload resolution, and it always requires heap storage for the contained object.
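Without visitation, getting a value back out of an any means asking for one type at a time. The sketch below uses std::any and std::any_cast from C++17 as stand-ins for boost::any and boost::any_cast so that it builds without Boost; the standard version's storage strategy may differ from Boost's, and the function name describe_any is made up:

```cpp
#include <any>
#include <cassert>
#include <string>

// Probes the contained type by hand. The pointer form of any_cast returns
// null on a mismatch instead of throwing.
std::string describe_any(const std::any& x) {
    if (const int* i = std::any_cast<int>(&x)) {
        return "int:" + std::to_string(*i);
    }
    if (const std::string* s = std::any_cast<std::string>(&x)) {
        return "string:" + *s;
    }
    return "unknown";  // nothing forces us to cover every possible type
}
```

Compare this with variant's visitor: here a forgotten type is only discovered at run time, when the final fallback is reached.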

Interestingly, this shows how C++ is well and firmly (and let’s not forget efficiently) on the path of Static Typing Where Possible, and Dynamic Typing When Needed.

Use variant when:

  • You want an object that holds a value of one of a specific set of types.
  • You want compile-time checked visitation.
  • You want the efficiency of a stack-based storage scheme where possible (avoiding the overhead of dynamic allocation).
  • You can live with horrible error messages when you don’t type it exactly right.

Use any when:

  • You want the flexibility of having an object that can hold a value of virtually “any” type.
  • You want the flexibility of any_cast.
  • You want the no-throw exception safety guarantee for swap.
