Wednesday, May 5, 2010

Problems with Perforce (p4)

Gentle Reader,

First I should say that p4 is great for many jobs. In particular, it's efficient for large files or large numbers of files. It also fits well with a common workflow: several projects checked out, with several branches, all in one working directory.

Besides, with a title that conjures Shakespeare, it is too great to be by me gainsaid. If it works well enough for you, then you don't need this weblog. Get thee to a nunnery. Parting is such sweet sorrow, but get thee gone. Stop reading!

If on the other hand you are required to use Perforce by your employer and wish it were not so, then like the Duke of Clarence, have patience; you must perforce. Hopefully, after you show this blog to your co-workers, your imprisonment shall not be long.

To sleep, perforce to dream

P4 has a reputation for being fast. Well, it is fast on the server, but communicating with the server, not so much ado.

Suppose you need to run 'p4 fstat' or 'p4 diff' on a huge number of files. And remember: P4 is supposed to be great on large numbers of files.
p4 diff files*

That will print a bunch of info. Great....

Now suppose this is part of a script. You want to learn about all files simultaneously. The output has errors for some files, and some files are not mentioned at all. Consider 5 paths, in 5 different states:
ls non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 not-added
 unmapped-but-changed
 opened-up-to-date
 unopened-up-to-date
[STDERR]
 ls: non-existent: No such file or directory
Here are several flavors of 'p4 diff':
p4 diff -sa non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 /home/wshakes/work/opened-up-to-date
[STDERR]
 non-existent - file(s) not opened on this client.
 not-added - file(s) not opened on this client.
 unmapped-but-changed - file(s) not opened on this client.
p4 diff -sr non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 /home/wshakes/work/unopened-up-to-date
[STDERR]
 non-existent - file(s) not opened on this client.
 not-added - file(s) not opened on this client.
 unmapped-but-changed - file(s) not opened on this client.
p4 diff -se non-existent not-added unmapped-but-changed opened-up-to-date unopened-up-to-date
[STDOUT]
 not-added - file(s) not on client.
 unmapped-but-changed - file(s) not on client.
 opened-up-to-date - file(s) up-to-date.
 unopened-up-to-date - file(s) up-to-date.
[STDERR]
 unmapped-but-changed
It is very difficult to match each section of output to the corresponding file on the command-line. First, you have to parse stderr and stdout. Then, you have to figure out how to map the filename listed in the output back to the filename on the command-line, which can be very tricky in sub-directories.
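
To make the chore concrete, here is a minimal sketch of that matching step in plain sh. The helper name report_per_file is hypothetical, and it assumes you have already captured stdout and stderr together in one file (for instance with '> out.txt 2>&1'), and that messages look like "NAME - MESSAGE" as in the listings above. The path matching is only a heuristic: it accepts an exact name or a trailing "/NAME" component, which is exactly the part that gets tricky in sub-directories.

```shell
# Hypothetical helper: report_per_file OUTPUT_FILE ARG...
# Prints one "ARG<TAB>STATUS" line per command-line argument, where STATUS is
#   the text after " - " if p4 printed "NAME - MESSAGE" for that argument,
#   "listed" if p4 printed the bare path, or
#   "(not mentioned)" if p4 said nothing about it at all.
# Caveat: awk -v interprets backslash escapes, so names containing
# backslashes are not handled by this sketch.
report_per_file() {
  out=$1; shift
  for arg in "$@"; do
    msg=$(awk -v want="$arg" '
      {
        i = index($0, " - ")
        if (i == 0) { name = $0; text = "listed" }
        else { name = substr($0, 1, i - 1); text = substr($0, i + 3) }
        sub(/^[ \t]+/, "", name)
        n = length(name); w = length(want)
        # Exact match, or the output path ends in "/" followed by the argument.
        if (name == want || (n > w && substr(name, n - w) == ("/" want))) {
          print text; exit
        }
      }' "$out")
    [ -n "$msg" ] || msg='(not mentioned)'
    printf '%s\t%s\n' "$arg" "$msg"
  done
}

# Example use (assumes a working p4 client):
#   p4 diff -sa a b c > out.txt 2>&1
#   report_per_file out.txt a b c
```

Even this much only papers over the problem: two arguments with the same base name in different directories would still be confused.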

That's way too much work, especially the file-path mapping, so you decide to run the command on one file at a time.
for f in files*; do
  p4 diff -sa "$f" > "$f.diff-sa"
done
But soft! For large numbers of files, that will take minutes, or worse. So you decide to use a Perl API. (The C API does not prove to be any more helpful.)
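use P4;
my $p4 = P4->new;
$p4->Connect() or die "Failed to connect to Perforce server\n";
for my $file (@files) {
  # The command name and its flags are separate arguments to Run().
  my $fdiffs = ($p4->Run('diff', '-sa', $file))[0];
  if ($p4->ErrorCount()) {
    print $p4->Errors();
  }
  Process($fdiffs, $file);
}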
Most excellent, i' faith! $fdiffs is a hash of the fields that would have gone to stdout. You still have the pesky stderr output, but you know what everything refers to. Only there's one thing wanting ...

Behold! It's still slow -- not as slow, since it now maintains the server connection, but nowhere near so fast as 'p4 diff files*' all at once. Fine. You can pass multiple filenames to the Run() command.
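use P4;
my $p4 = P4->new;
$p4->Connect() or die "Failed to connect to Perforce server\n";
# One round trip: the flag and every filename are separate arguments.
my @fdiffs = $p4->Run('diff', '-sa', @files);
if ($p4->ErrorCount()) {
  print $p4->Errors();
}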

for $file (@files) ...
Hark! Not only are you back to the problem of parsing stderr, but you also need to map @fdiffs back to @files in order to know which files were ignored.

This is incredible. The API returns an array of data-structures, but the size of the array does not match the size of the request. What would be so wrong with returning 'undef' to denote the missing files, and maybe '{}' for errors?

Other problems


I could go on and on about minor annoyances, but the problems above do me most insupportable vexation. They make p4 completely impractical, at least in many cases. Just beware. As the Bard wrote, perforce must wither and come to deadly use.
