The Evils Of Positional Arguments

January 16, 2009

A few days ago I found myself writing pure ANSI C code, after more than two years of not touching it at all. In fact, those two years consisted mostly of Python and a little C#, both very far from it. While coding away, trying to get re-accustomed to mallocs and frees, return values instead of exceptions, and pointer arithmetic, I had to perform a memcpy. Very easy, of course, but I could not for the life of me remember how to call it. I remembered rather vividly that it had three arguments, named ‘dst’, ‘src’ and ‘len’, the first two being void* and the last... perhaps a plain int? But in which order they came, I could not recall. Now, that is not so evil, of course, because a quick look at the C reference refreshed my memory (until next time). This repeated with other functions (such as memset). But only when it started to happen with my own functions did I realize how unnatural it is for me to remember things by order rather than by name. Names allow us to establish a context, to find reasoning. Also, we are (relatively) good at remembering names.

By using position as a way of passing parameters, we are essentially naming them “one”, “two”, “three”, etc. These are still names, but they are devoid of context or meaning, and they are the same for every function. Not only is the right order hard to remember (we can endure that, as reality proves), it also makes our code less readable and less natural.

Positional arguments also make it harder to change existing APIs. Move an argument from its position, and you have to update every call site in the code. This can easily lead to subtle bugs (consider swapping dst and src: it may be very hard to find where this happened!). So new parameters must be appended at the end, which sometimes does not make much sense.
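Some languages let a function refuse positional passing altogether. In Python, every parameter declared after a bare `*` is keyword-only. The sketch below is a hypothetical memcpy-like helper (the name `copy_bytes` and its parameters are made up for illustration, and it copies between byte buffers rather than raw pointers). Swapping `dst` and `src` at a call site is harmless, a positional call is rejected instead of silently misbehaving, and `dst_offset`, standing in for a parameter added later, can sit next to `dst` where it makes sense instead of being forced to the end.

```python
def copy_bytes(*, dst: bytearray, dst_offset: int = 0, src: bytes, length: int) -> None:
    """Copy `length` bytes from `src` into `dst`, starting at `dst_offset`.

    Hypothetical helper: every parameter is keyword-only, so callers
    must name each argument and cannot mix up their order.
    """
    if length < 0 or length > len(src):
        raise ValueError("length out of range for src")
    if dst_offset < 0 or dst_offset + length > len(dst):
        raise ValueError("destination too small")
    dst[dst_offset:dst_offset + length] = src[:length]


buf = bytearray(8)
# The order of the keywords at the call site does not matter.
copy_bytes(src=b"hello", length=5, dst=buf)
print(buf)  # bytearray(b'hello\x00\x00\x00')

# A positional call fails loudly instead of silently swapping dst and src.
try:
    copy_bytes(buf, b"hello", 5)
except TypeError as exc:
    print("rejected:", exc)
```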

While all programming languages (that I know of) commit the sin of positionality, one device takes it a step further: regular expressions. Specifically, RE substitution. Not only do you address the matched groups by number, but figuring out which number belongs to which group can be confounding. I cannot guess how many bugs this must have produced.
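Some regex engines do allow naming groups, which removes the counting. A minimal sketch with Python's `re` module, using a made-up date-reordering task: both substitutions produce the same text, but only the second one says what it is moving where (`(?P<name>...)` names a group in the pattern, `\g<name>` refers to it in the replacement).

```python
import re

text = "released 2009-01-16, updated 2009-02-03"

# Numbered groups: the reader has to count parentheses to know what \3 is.
numbered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Named groups: each piece of the match is referred to by name.
named = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    text,
)

print(numbered)  # released 16/01/2009, updated 03/02/2009
print(named)     # released 16/01/2009, updated 03/02/2009
```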

In conclusion:

Design your systems differently.
Let computers count. Let humans use names!


Separation of Data and Functionality

August 18, 2008

It’s well-known that computer algorithms (and thus programs) consist of two elements: data, and functionality (operations on said data). Different programming models offer different ways to look at the interaction between these two elements.

Object-Orientation is probably the newest and most popular model in that regard (and it’s pretty old), and it says the following:

  1. Data and functionality are tightly coupled.
  2. Data is more important.

This makes sense. Functionality is meaningless without data to work with (data exists even without manipulation; it’s just kinda boring), and it’s also meaningless without the correct kind of data. So the solution is to put data inside what they call “objects” (I feel a bit like Dr. Evil with these quotes) and surround it with methods that act upon that specific data.

While this makes sense, it also misses an opportunity: functionality doesn’t have to work on specific data; it can work on a class of data (and I don’t mean an OO class, I mean a kind of data). Functional-programming styles demonstrate this all the time: a tuple of 3 integers can represent a color (RGB) or a position (xyz), and a length function can be useful in both cases.
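A minimal Python sketch of that idea: a single `length` function that only cares about receiving three numbers, so it serves a color and a position alike. The names are illustrative, not from any library.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def length(v: Vec3) -> float:
    """Euclidean length of any triple of numbers, whatever it represents."""
    x, y, z = v
    return math.sqrt(x * x + y * y + z * z)


color = (255, 0, 0)     # an RGB color
position = (3, 4, 12)   # an xyz position

print(length(color))     # 255.0
print(length(position))  # 13.0
```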

This example can be imitated in OO if both the Color class and the Position class inherit from a Vector3 class. However, it is not considered good practice in OO to inherit by data rather than by nature. This is easy to see in the definition of Color in Java’s AWT (java.awt.Color): it inherits directly from Object and implements no vector behaviour. Well, it can’t inherit from a vector class, because it holds more data than just RGB (an alpha component, for one). So if we want to measure the intensity of the color (its “length”), we have to write a function especially for Color, which is functionally equal to length. Not pretty!

I hope the idea behind separating data and functionality is becoming clear. Data exists even without being manipulated, and should be classifiable without depending on manipulations. In other words, it sometimes makes sense to manipulate data by what it is, and not by what it is supposed to be.

The problem I presented can be solved by a relatively new design mechanism: the protocol, or contract. It’s a bit like an interface, but better: contracts let you specify functionality without inheritance. By giving each data-class its own contract (at least conceptually), you can achieve some separation of data and functionality.
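Python's `typing.Protocol` (Python 3.8+) is one concrete form of such a contract: a class satisfies it structurally, simply by having the right methods, without inheriting from it. A minimal sketch, assuming a hypothetical `HasXYZ` contract with a `components()` method; `Color` here is a toy dataclass, not `java.awt.Color`, and its extra `alpha` field shows that carrying more data does not stop it from meeting the contract.

```python
import math
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasXYZ(Protocol):
    """Hypothetical contract: anything exposing three numeric components."""

    def components(self) -> tuple: ...


@dataclass
class Color:
    r: float
    g: float
    b: float
    alpha: float = 1.0  # extra data that a plain vector would not have

    def components(self) -> tuple:
        return (self.r, self.g, self.b)


@dataclass
class Position:
    x: float
    y: float
    z: float

    def components(self) -> tuple:
        return (self.x, self.y, self.z)


def length(v: HasXYZ) -> float:
    """Works for any type that satisfies HasXYZ; no common base class needed."""
    return math.sqrt(sum(c * c for c in v.components()))


print(length(Color(255, 0, 0)))            # 255.0
print(length(Position(3, 4, 12)))          # 13.0
print(isinstance(Color(0, 0, 0), HasXYZ))  # True (checked structurally)
```

Neither `Color` nor `Position` names `HasXYZ` anywhere; the contract is matched by shape alone.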

But this alone is a bit of a hack. True separation goes much further. For example, did you know that in Python, converting a list to a tuple is O(n)? There is no reason for this to happen, and if there were any separation of data and functionality, it would be O(1).
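The copy is easy to observe. In current Python, `tuple(lst)` builds a brand-new object and copies every element reference into it, which is where the O(n) comes from; the tuple then stays unaffected by later changes to the list. A small sketch:

```python
lst = [1, 2, 3]
t = tuple(lst)

# tuple() builds a separate object holding its own copy of the references...
assert t == (1, 2, 3)
assert t is not lst

# ...so later changes to the list are not visible through the tuple.
lst[0] = 99
print(t)    # (1, 2, 3)
print(lst)  # [99, 2, 3]
```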

I have a lot more to say about this subject, but I think I’m done for now. I hope this text wasn’t too confused or vague; I was figuring it out as I was writing, and it might show.

So, to conclude:

  1. Data and functionality are not tightly coupled.
  2. Data-class and functionality are tightly coupled.
  3. Data is more important.

Broken Inheritance

June 23, 2008

Hello readers, and welcome to my new blog.

Inheritance (in the OO sense) is a tricky business. When creating a sub-class, nothing explicit is said about the purpose of this new sub-class, or about its relation to its parent. One is forced to guess, based on how the sub-class is written and used. This has been well said before, so I will not repeat it. But if it is not clear to humans, why should it be clear to a compiler?

Allow me to be concrete by telling you my (sad) story. It all started when I was writing a tree class in C#. To simplify matters, suppose this is how it looked:

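using System.Collections.Generic;

class TreeNode
{
    public TreeData data { get; set; }
    private List<TreeNode> children = new List<TreeNode>();

    public IEnumerable<TreeNode> subnodes() { return children; }
    public void AddSubNode(TreeData data)
    {
        children.Add(new TreeNode { data = data });
    }
    /* ...
    A lot of useful methods for trees,
    such as BFS/DFS visitors
    ... */
}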

This was very useful, but then I needed some new functionality. It did not fit inside TreeNode (for reasons I will spare you; you could probably make some up yourself), so I decided to create a new class, called BetterTreeNode.
So, suppose I wrote this code:

class BetterTreeNode : TreeNode
{
    /* ...
    Additional functionality
    ... */
}

I was happy, until horror struck: the class was flawed. I will spare you the detective work (pause now if you want to guess it yourself): whenever a method traversed the tree, subnodes() would return TreeNode objects rather than BetterTreeNode objects. The same went for other inherited methods. This required an explicit cast after every call to an inherited method. Also, methods such as AddSubNode were written to create a new TreeNode instance. In short, a mess. To solve it, I had the following options:
1. Re-declare every inherited method with the new modifier (which hides the base method rather than overriding it) so that it returns proper values. Rewrite AddSubNode (in the hope that it is possible).
2. Not inherit. Keep a TreeNode inside BetterTreeNode as a shadow tree. This means wrapping every method I want to use.
3. Rewrite everything from scratch. While the first two are annoying, this is a complete failure at code-reuse.

But none of these options seemed good; none of them answered my need for easy, concise code reuse.
I felt the true solution would be to have the methods automatically rewritten to return BetterTreeNode. But here enters the ambiguity and obscurity of inheritance: perhaps for some methods I would not want that. The compiler cannot guess this.

This problem was demonstrated using C#, but it exists in every statically-typed object-oriented language I know (granted, not that many).
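The AddSubNode half of the trap does not even need static typing to appear. A minimal Python rendition (names mirror the C# listing; `shout()` is a made-up stand-in for the extra functionality): the inherited method hard-codes its own class, so the child silently loses the subclass's behaviour.

```python
class TreeNode:
    def __init__(self, data):
        self.data = data
        self._children = []

    def subnodes(self):
        return iter(self._children)

    def add_subnode(self, data):
        # The base class names itself explicitly when creating children.
        self._children.append(TreeNode(data))


class BetterTreeNode(TreeNode):
    def shout(self):
        # Hypothetical "additional functionality".
        return str(self.data).upper()


root = BetterTreeNode("root")
root.add_subnode("child")
child = next(root.subnodes())
print(type(child).__name__)  # TreeNode, not BetterTreeNode
# child.shout() would raise AttributeError here.
```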

Therefore, I propose a new feature for these languages: a “thisclass” keyword.
So this is how I would write TreeNode:

class TreeNode
{
    public TreeData data { get; set; }
    public IEnumerable<thisclass> subnodes();
    public void AddSubNode(TreeData data)
    {
        ... new thisclass(...) ...
    }
    ...
}
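C# has no such keyword, but Python gets close in two pieces. At runtime, `type(self)` names the class of the current instance, so `type(self)(...)` constructs whatever subclass the method was called on. For static checkers, a `TypeVar` bound to the class and used to annotate `self` (described in PEP 484), or `typing.Self` from Python 3.11 on, expresses "this class" in return types. A minimal sketch redoing TreeNode in Python; the `T` variable and `shout()` are hypothetical illustrations, not a proposal for C# syntax.

```python
from __future__ import annotations

from typing import Iterator, List, TypeVar, cast

# Hypothetical self-type variable: stands for "the class of self".
T = TypeVar("T", bound="TreeNode")


class TreeNode:
    def __init__(self, data: object) -> None:
        self.data = data
        self._children: List[TreeNode] = []

    def subnodes(self: T) -> Iterator[T]:
        # Children are built with type(self), so they share self's class.
        return iter(cast(List[T], self._children))

    def add_subnode(self: T, data: object) -> None:
        # type(self)(...) plays the role of the proposed "new thisclass(...)".
        self._children.append(type(self)(data))


class BetterTreeNode(TreeNode):
    def shout(self) -> str:
        # Hypothetical stand-in for the "additional functionality".
        return str(self.data).upper()


root = BetterTreeNode("root")
root.add_subnode("child")
child = next(root.subnodes())
print(type(child).__name__)  # BetterTreeNode
print(child.shout())         # CHILD
```

This covers only the "always this class" case; the ambiguity noted above, where some methods should keep returning the base type, still has to be decided per method.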

I also have a suggestion: take another look at code reuse. Inheritance is a great way to reuse and extend code in classes, but perhaps it is not always the best way. Perhaps other ways should be sought.