Contracts and protocols as a substitute for types and interfaces

December 8, 2011

I am a big fan of assertions. Whenever I reach a point in my code where I say “that pointer can’t possibly be null”, I immediately write – assert( p != NULL ); – and whenever I say “this list can’t possibly be longer than 256” I write assert len(l) <= 256. If you wonder why I keep doing this, it’s because very often I’m wrong. It’s not that I’m a particularly bad programmer, but sometimes I make mistakes, and even when I don’t, sometimes I get very unexpected input, and even when I don’t, sometimes other pieces of code conspire against me. Assertions save me from mythical bug hunts on a regular basis.

So, it’s not a big surprise that I’m a big fan of contracts too. If you don’t know what contracts are, they’re essentially asserts that run at the beginning and end of each function, and check that the parameters and the return values meet certain expectations. In a way, function type declarations, as can be found in C or Java, are a special case of contracts.
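To make that concrete, here is a minimal sketch of a contract as a pair of asserts wrapped around a function. The decorator name contract and the function total_length are hypothetical names made up for illustration; they do not come from any library.

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical decorator: assert on the arguments at entry and on the result at exit."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), "precondition failed for %s" % func.__name__
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), "postcondition failed for %s" % func.__name__
            return result
        return wrapper
    return decorate


@contract(pre=lambda items: len(items) <= 256,   # "this list can't be longer than 256"
          post=lambda total: total >= 0)
def total_length(items):
    """Sum the lengths of the strings in items."""
    return sum(len(s) for s in items)
```

Calling total_length with 257 items fails at the call itself, not somewhere downstream. Like any assert, these checks disappear when Python runs with -O.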

Why not just use duck-typing?

Duck typing is great, but in my experience it becomes a burden as the system grows in size and complexity. Sometimes objects aren’t fully used right away; they are stored as an instance variable, pickled for later use, or sent to another process or another computer. When you finally get the AttributeError, it’s in another execution stack, or in another thread, or in another computer, and debugging it becomes very unpleasant! And what happens when you get the correct object, but it’s in the wrong state? You won’t even get an exception until something somewhere gets corrupted.
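A small sketch of that delayed failure, using two made-up classes (Logger and CheckedLogger are hypothetical names): the first stores whatever it is given and only trips over it later, the second asserts at the point where the wrong object is handed over.

```python
class Logger:
    """Stores a sink now and uses it much later."""

    def __init__(self, sink):
        self.sink = sink            # nothing is checked here

    def flush(self, message):
        self.sink.write(message)    # a wrong sink only fails here


class CheckedLogger(Logger):
    """Same behaviour, but the mistake is reported where it was made."""

    def __init__(self, sink):
        assert callable(getattr(sink, "write", None)), "sink needs a write() method"
        super().__init__(sink)
```

Logger("not a file") is accepted without complaint, and the AttributeError appears only when flush() is eventually called, possibly in another thread or process. CheckedLogger("not a file") raises an AssertionError immediately.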

In my experience, using an assertion system is the best way to find the subtle bugs and incongruities of big and complex systems.

Why do we need something new?

Types are very confining, even in “typeless” dynamic languages. Take Python: If your API has to verify that it’s getting a file object, the only way is to call isinstance(x, file). That forces the caller to inherit from file, even if he’s writing a mock object (say, as an RPC proxy) that makes no disk access. In every statically typed language that I know, it’s impossible to say that you accept either int or float, and you’re forced to either write the same function twice, or use a template and just define it twice.

Today’s interfaces are ridiculous. In C#, an interface with a method that returns an IList<int> will be very upset if you try to implement it as returning List<int>! And don’t even try to return a List<int> when you’re expected to return List. Note that C# will gladly cast between these types in the code, but when dealing with interfaces and function signatures it just goes nuts. It gets very annoying when you’re implementing an ITree interface and can’t use your own class as the nodes’ type because the signatures collide, and instead you have to explicitly cast from ITree at every method. But I digress.

Even if today’s implementations were better, types are just not enough. They tell you very little about the input or the output. You want to be able to test its values, lengths, states, and maybe to even interact with it to some degree. What we have just doesn’t cut it.

What should we do instead?

Contracts are already pretty good: they have a lot of flexibility and power, they’re self-documenting, and they can be reasoned about by the compiler/interpreter (“Oh it only accepts a list[int<256]? Time to use my optimized string functions instead!”). But they only serve as a band-aid to existing type systems. They don’t give you the wholesome experience of abstract classes and methods. But, they can.

To me, contracts are much bigger than just assertions. I see them as stepping-stones to a completely new paradigm, that will replace our current system of interfaces, abstract methods, and needless inheritance, with “Contract Protocols”.

How? These are the steps that we need to take to get there:
  1. Be able to state your assertions about a function, in a declarative manner. Treat these assertions as an entity called a “contract”. We’re in the middle of this step, and some contract implementations (such as the wonderful PyContracts for Python) have already taken the declarative entity route, which is essential for the next step.
  2. Be able to compare contracts. Basically, I want to be able to tell if a contract is contained within another contract, so if C1⊂C2 and x∊C1 then x∊C2. I suspect it’s easier said than done, but I believe that the following (much easier) steps make it worth doing.
  3. Be able to bundle contracts in a “contract protocol”, and use it to test a class. A protocol is basically just a mapping of {method-name: contract}, and applying it to a class tests that each method exists in the class, and that its contract is a subset of the protocol’s corresponding contract. If these terms are met, it can be said that the class implements the protocol. A class can implement several protocols, obviously.
  4. Be able to compare protocols. Similarly to contracts, we want to check if a protocol is a subset of another protocol. Arguably, it’s the same as step 3.
  5. Contracts can also check if an instance implements a protocol. Making a full circle, we can now use protocols to check for protocols and so on, allowing layers of complexity. We can now write infinitely detailed demands about what a variable should be, but very concisely.
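The steps above can be sketched in a few dozen lines, under heavy assumptions. Everything here is hypothetical: IntRange is a toy contract that only understands integer bounds (which is what makes the comparison in step 2 easy), returns attaches a contract to a function's return value, and ContractProtocol and implements are made-up names. Argument contracts and the recursion of step 5 are left out.

```python
class IntRange:
    """Toy contract: an int within [lo, hi]; None means unbounded on that side."""

    def __init__(self, lo=None, hi=None):
        self.lo, self.hi = lo, hi

    def accepts(self, value):
        if not isinstance(value, int) or isinstance(value, bool):
            return False
        return ((self.lo is None or value >= self.lo) and
                (self.hi is None or value <= self.hi))

    def __le__(self, other):
        """Step 2: True if every value accepted by self is accepted by other."""
        lo_ok = other.lo is None or (self.lo is not None and self.lo >= other.lo)
        hi_ok = other.hi is None or (self.hi is not None and self.hi <= other.hi)
        return lo_ok and hi_ok


def returns(contract):
    """Step 1: declare a contract on a function's return value."""
    def decorate(func):
        func.returns = contract
        return func
    return decorate


class ContractProtocol:
    """Step 3: a mapping of {method name: contract on the return value}."""

    def __init__(self, **methods):
        self.methods = methods

    def __le__(self, other):
        """Step 4: True if anything implementing self also implements other."""
        return all(name in self.methods and self.methods[name] <= required
                   for name, required in other.methods.items())


def implements(obj, protocol):
    """Step 3 (and part of 5): works on a class or on an instance."""
    for name, required in protocol.methods.items():
        declared = getattr(getattr(obj, name, None), "returns", None)
        if declared is None or not declared <= required:
            return False
    return True


Positioned = ContractProtocol(tell=IntRange(lo=0))
Byte = ContractProtocol(tell=IntRange(lo=0, hi=255))


class Cursor:
    @returns(IntRange(lo=0, hi=100))
    def tell(self):
        return 0
```

With these definitions, implements(Cursor, Positioned) and implements(Cursor(), Byte) both hold, Byte <= Positioned holds, and Positioned <= Byte does not. The hard part of step 2 is hidden by the choice of contract: comparing arbitrary predicates is a much bigger problem than comparing two intervals.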

When we finish step 5, we have a complete and very powerful system in our hands. We don’t need to ever discuss types, except for the most basic ones. Inheritance is now only needed to gain functionality, not identity. We can use it for debug-only purposes, but also for run-time decisions in production (for example, in a Strategy pattern).

Example

As a last attempt to get my point across, here is vaguely how I imagine the file protocol to look in pseudo-code.

It doesn’t do the idea any justice, but hopefully it’s enough to get you started.

protocol Closeable:
    close()

protocol _File < Closeable:
    [str] name
    [int,>=0] tell()
    seek( [int,in (0,1,2)] )

protocol ReadOnlyFile < _File:
    [str,!=''] read( [int,>0 or None]? )
    [Iterable[str]] readlines( )
    [Iterable[str]] __iter__( )

protocol WriteOnlyFile < _File:
    [int,>0] write( [str,!=''] )
    writelines( [Iterable[str]] )
    flush()

protocol RWFile < ReadOnlyFile | WriteOnlyFile:
    pass

>>> print RWFile < ReadOnlyFile
True
>>> print implements( open('bla'), ReadOnlyFile )
True
>>> print implements( open('bla'), Iterable )  # has __iter__ function
True
>>> print implements( open('bla'), Iterable[int] )
False
>>> print implements( open('bla'), WriteOnlyFile )  # default is 'r'
False
>>> print implements( open('bla'), RWFile )
False
>>> print implements( open('bla', 'w+'), RWFile )
True
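
For comparison, Python versions from 3.8 onward include typing.Protocol, which covers a small part of this idea. The sketch below redoes only the Closeable protocol with it. RpcProxy is a hypothetical mock class made up for illustration, and the runtime check only tests that the method exists; it says nothing about values such as [int,>0].

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closeable(Protocol):
    """Anything with a close() method, no inheritance required."""

    def close(self) -> None: ...


class RpcProxy:
    """Hypothetical mock that inherits from nothing file-like."""

    def close(self) -> None:
        pass


is_closeable = isinstance(RpcProxy(), Closeable)   # structural check on the instance
is_plain_object_closeable = isinstance(object(), Closeable)
```

Here is_closeable is True and is_plain_object_closeable is False, even though RpcProxy never mentions Closeable.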

Separation of Data and Functionality

August 18, 2008

It’s well-known that computer algorithms (and thus programs) consist of two elements: data, and functionality (operations on said data). Different programming models offer different ways to look at the interaction between these two elements.

Object-Orientation is probably the newest and most popular model in that regard (and it’s pretty old), and it says the following:

  1. Data and functionality are tightly coupled.
  2. Data is more important.

This makes sense. Functionality is meaningless without data to work with (data exists even without manipulation, it’s just kinda boring), and it’s also meaningless without the correct kind of data. So the solution is to put data inside what they call “objects” (I feel a bit like Dr. Evil with these quotes) and surround it with methods to act upon that specific data.

While this makes sense, it also misses an opportunity: Functionality doesn’t have to work on specific data; it can work on a class of data (and I don’t mean OO-class, I mean kind of data). This is proven true by functional-programming styles all the time: A tuple of 3 integers can represent a color (RGB), or a position (xyz), and a length function can be useful for both cases.
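A tiny sketch of that example, assuming plain tuples for the data and a free function named length (the name is taken from the paragraph above, the implementation is mine):

```python
import math


def length(vec):
    """Euclidean length of any sequence of numbers, whatever it represents."""
    return math.sqrt(sum(c * c for c in vec))


color = (3, 4, 0)       # read as RGB
position = (3, 4, 0)    # read as xyz

color_intensity = length(color)
distance_from_origin = length(position)
```

The function never asks what the three numbers mean; it only cares that they are three numbers.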

This example can be imitated in OO, if both the Color class and the Position class inherit from a Vector3 class. However, it is not considered good practice in OO to inherit by data rather than by nature. This can be easily seen in the definition of Color in Java’s AWT: it inherits from Object, and implements no vector behaviour. Well, it can’t inherit from a vector class, because it has more data than just RGB, and it needs Object’s functionality. So if we want to measure the intensity of the color (its “length”), we have to write a function especially for Color, which is functionally equal to length. Not pretty!

I hope the idea behind the separation of data and functionality is beginning to become clear. Data exists even without being manipulated, and should be class-able without being dependent on manipulations. In other words, it sometimes makes sense to manipulate data by what it is, and not by what it is supposed to be.

The problem I presented can be solved by a relatively new design mechanism: the protocol, or contract. It’s a bit like an interface, but better. Contracts allow you to specify functionality without inheritance. By giving each data-class its own contract (at least conceptually), you can design some separation of data and functionality.

But this alone is a bit of a hack. True separation goes much further. For example, did you know that in Python, converting a list to a tuple is O(n)? There is no reason for this to happen, and if there were any separation of data and functionality, it would be O(1).

I have a lot more to say about this subject, but I think I’m done for now. I hope this text wasn’t too confused or vague; I was figuring it out as I was writing, and it might show.

So, to conclude:

  1. Data and functionality are not tightly coupled.
  2. Data-class and functionality are tightly coupled.
  3. Data is more important.