Monday, May 24, 2010

What motivates programmers?

Watch this amazing lecture. It's only 10 minutes, but if you don't have that much time, at least go to 5:40.

According to many experiments, for cognitive tasks, the main factors for better performance (and, incidentally, personal satisfaction) are:
  • Autonomy
  • Mastery
  • Purpose
It applies to everyone, but especially to programmers, since programming is a highly creative task.

Thursday, May 20, 2010

The Humble Programmer

A recent reddit post introduced me to a wonderfully lucid description of the problem of software engineering. Dijkstra himself provides a particular prediction, based on well-organized reasoning, that software will become both cheaper and more reliable. While the prediction's fruition may be subject to debate, his arguments are as relevant today as they ever were.
The vision is that, well before the seventies have run to completion, we shall be able to design and implement the kind of systems that are now straining our programming ability, at the expense of only a few percent in man-years of what they cost us now, and that besides that, these systems will be virtually free of bugs.

These two improvements go hand in hand. In the latter respect software seems to be different from many other products, where as a rule a higher quality implies a higher price. Those who want really reliable software will discover that they must find means of avoiding the majority of bugs to start with, and as a result the programming process will become cheaper. If you want more effective programmers, you will discover that they should not waste their time debugging; they should not introduce the bugs to start with. [Emphasis added.] In other words: both goals point to the same change.
...
I shall give you six arguments in support of ... the technical feasibility of the revolution which might take place [in the reliability of software]...
I summarize them here:
  1. Avoidance of unmanageable tasks
  2. Smaller solution spaces
  3. Correctness by construction: Today a usual technique is to make a program and then to test it. But: program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.
  4. More abstraction
  5. Better programming languages: the influence of the tool we are trying to use upon our own thinking habits.
  6. Code reuse: the wider applicability of nicely factored solutions.

In the most interesting part of the essay, Dijkstra disparages clever programming tricks, which he admits are often encouraged by the languages themselves.
The analysis of the influence that programming languages have on the thinking habits of its users, and the recognition that, by now, brainpower is by far our scarcest resource, they together give us a new collection of yardsticks for comparing the relative merits of various programming languages.

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague. In the case of a well-known conversational programming language I have been told from various sides that as soon as a programming community is equipped with a terminal for it, a specific phenomenon occurs that even has a well-established name: it is called "the one-liners".

It takes one of two different forms: one programmer places a one-line program on the desk of another and either he proudly tells what it does and adds the question "Can you code this in less symbols?" -- as if this were of any conceptual relevance! -- or he just asks "Guess what it does!". From this observation we must conclude that this language as a tool is an open invitation for clever tricks; and while exactly this may be the explanation for some of its appeal, viz. to those who like to show how clever they are, I am sorry, but I must regard this as one of the most damning things that can be said about a programming language.

I don't know who scares me more: authors of excessively terse code, or blind advocates of particular programming paradigms (viz. OOP).

The full essay is The Humble Programmer, by Edsger W. Dijkstra.
More discussion is here.

Wednesday, May 5, 2010

Problems with Perforce (p4)

Gentle Reader,

First I should say that p4 is great for many jobs. In particular, it's efficient for large files or large numbers of files. It also fits well with a common work-flow: Several projects checked out, with several branches, all in one working directory.

Besides, with a title that conjures Shakespeare, it is too great to be by me gainsaid. If it works well enough for you, then you don't need this weblog. Get thee to a nunnery. Parting is such sweet sorrow, but get thee gone. Stop reading!

If on the other hand you are required to use Perforce by your employer and wish it were not so, then like the Duke of Clarence, have patience; you must perforce. Hopefully, after you show this blog to your co-workers, your imprisonment shall not be long.

To sleep, perforce to dream

P4 has a reputation for being fast. Well, it is fast on the server, but communicating with the server, not so much ado.

Suppose you need to run 'p4 fstat' or 'p4 diff' on a huge number of files. And remember: P4 is supposed to be great on large numbers of files.
p4 diff files*

That will print a bunch of info. Great....

Now suppose this is part of a script. You want to learn about all files simultaneously. The output has errors for some files, and some files are not mentioned at all. Consider 5 paths, in 5 different states:
ls non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 not-added
 unmapped-but-changed
 opened-up-to-date
 unopened-up-to-date
[STDERR]
 ls: non-existent: No such file or directory
Here are several flavors of 'p4 diff':
p4 diff -sa non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 /home/wshakes/work/opened-up-to-date
[STDERR]
 non-existent - file(s) not opened on this client.
 not-added - file(s) not opened on this client.
 unmapped-but-changed - file(s) not opened on this client.
p4 diff -sr non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 /home/wshakes/work/unopened-up-to-date
[STDERR]
 non-existent - file(s) not opened on this client.
 not-added - file(s) not opened on this client.
 unmapped-but-changed - file(s) not opened on this client.
p4 diff -se non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 not-added - file(s) not on client.
 unmapped-but-changed - file(s) not on client.
 opened-up-to-date - file(s) up-to-date.
 unopened-up-to-date - file(s) up-to-date.
[STDERR]
 unmapped-but-changed
It is very difficult to match each section of output to the corresponding file on the command-line. First, you have to parse stderr and stdout. Then, you have to figure out how to map the filename listed in the output back to the filename on the command-line, which can be very tricky in sub-directories.
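
To give a feel for the bookkeeping involved, here is a minimal sketch in POSIX sh. It assumes the stdout and stderr of one 'p4 diff -sa' run have already been captured as text, and it only handles the two line shapes shown in the samples above. The function name classify_args and the status labels are made up for illustration; nothing here is part of p4. It matches on the trailing path component only, which is exactly the shortcut that breaks down once sub-directories are involved.

```shell
#!/bin/sh
# Hypothetical helper (not part of p4): classify_args STDOUT_TEXT STDERR_TEXT ARG...
# Prints one "ARG: STATUS" line per argument, in command-line order.
# The body runs in a subshell, so its variables do not leak to the caller.
classify_args() (
  out=$1
  err=$2
  shift 2
  for arg in "$@"; do
    status=unmentioned
    # stderr lines look like "NAME - message"; a plain read trims the indent.
    while read -r line; do
      case $line in
        "$arg - "*)
          msg=${line#"$arg - "}
          status="stderr: $msg"
          ;;
      esac
    done <<EOF
$err
EOF
    # stdout lines (for -sa and -sr) are absolute paths, so compare only the
    # trailing path component. This is a naive assumption.
    while read -r line; do
      case $line in
        "$arg" | */"$arg") status=stdout ;;
      esac
    done <<EOF
$out
EOF
    printf '%s: %s\n' "$arg" "$status"
  done
)
```

Fed the 'p4 diff -sa' sample above, this would report the first three names with their stderr message, opened-up-to-date as seen on stdout, and unopened-up-to-date as unmentioned. Every other flavor of the command needs its own set of patterns.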

That's way too much work, especially the file-path mapping, so you decide to run the command on one file at a time.
for f in files*; do
  # Quote "$f" so names containing spaces survive word splitting.
  p4 diff -sa "$f" > "$f.diff-sa"
done
But soft! For large numbers of files, that will take minutes, or worse. So you decide to use a Perl API. (The C API does not prove to be any more helpful.)
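use P4;
my $p4 = P4->new;
$p4->Connect() or die "Failed to connect to the Perforce server\n";
for my $file (@files) {
  # Pass the command and each of its arguments as separate list items.
  my $fdiffs = ($p4->Run('diff', '-sa', $file))[0];
  if ($p4->ErrorCount()) {
    print $p4->Errors();
  }
  Process($fdiffs, $file);
}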
Most excellent, i' faith! $fdiffs is a hash of the fields that would have gone to stdout. You still have the pesky stderr output, but you know what everything refers to. Only there's one thing wanting ...

Behold! It's still slow -- not as slow, since it now maintains the server connection, but nowhere near so fast as 'p4 diff files*' all at once. Fine. You can pass multiple filenames to the Run() command.
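use P4;
my $p4 = P4->new;
$p4->Connect() or die "Failed to connect to the Perforce server\n";
# One request for all files; command and arguments are separate list items.
my @fdiffs = $p4->Run('diff', '-sa', @files);
if ($p4->ErrorCount()) {
  print $p4->Errors();
}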

for $file (@files) ...
Hark! Not only are you back to the problem of parsing stderr, but you also need to map @fdiffs back to @files in order to know which files were ignored.

This is incredible. The API returns an array of data-structures, but the size of the array does not match the size of the request. What would be so wrong with returning 'undef' to denote the missing files, and maybe '{}' for errors?

Other problems


I could go on and on about minor annoyances, but the problems above do me most insupportable vexation. They make p4 completely impractical, at least in many cases. Just beware. As the Bard wrote, perforce must wither and come to deadly use.