Monday, August 30, 2010

Linux: How strace works

Here is a highly instructive article on the innards of strace, a Linux utility for tracing system calls in a process and its children.  The sample code turns on ptrace for the child before starting the child, so I am a little curious about how it works when attached to a process that is already running.
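Out of curiosity, here is my own rough sketch of the mechanism, assuming Linux on x86-64, with all error handling omitted; each system call shows up twice, once at entry and once at exit.  For the attach case, strace -p uses ptrace(PTRACE_ATTACH, pid, 0, 0) instead of PTRACE_TRACEME, which stops the already-running target and gives the tracer the same control afterward.

/* trace.c: rough sketch of the strace mechanism (Linux, x86-64).
 * Build with: cc -o trace trace.c */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);  /* child asks to be traced */
        execvp(argv[1], &argv[1]);        /* exec stops the child, notifying the tracer */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);             /* wait for the stop at exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, pid, 0, 0);  /* run to the next syscall boundary */
        waitpid(pid, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, pid, 0, &regs);
            printf("syscall %llu\n", (unsigned long long)regs.orig_rax);
        }
    }
    return 0;
}

(Following children, as strace -f does, additionally requires setting the PTRACE_O_TRACEFORK family of options.)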

At a previous job, I saw crashes in vfork() when using strace on make.  I'm told that it should work, so I have to assume that it was caused by yet more memory bugs in the poorly written code at that company. Now I wish I'd saved the crashing example so that I could dive into it further.

Saturday, August 28, 2010

Bug in Blogspot: Spaces after periods

I just noticed a problem with the rendering at Blogspot.  It is traditional to add 2 spaces after a period when typing (for the sake of typesetting, an ancient art form) but if I do that here, line-wrapping may carry a space to the beginning of the next line.  I'll just use a single space from now on.

Friday, August 27, 2010

Python: Comprehension loop variables

Python3 adds the nonlocal keyword, and it alters list comprehensions to make their loop variables behave more like those in generator expressions.
"... the loop control variables are no longer leaked into the surrounding scope."
Consider the following:

from __future__ import print_function

global x
x = 7

def main():
    x = 5
    def foo():
        #nonlocal x  # Python3-only keyword; a syntax error in Python2
        print("x =", x)
        return x

    foo()
    print([x for x in range(2)])
    print([foo() for x in range(2)])
    print(x)
    foo()

main()
Python2.7 prints:

x = 5
[0, 1]
x = 0
x = 1
[0, 1]
1
x = 1

Python3.1 prints:

x = 5
[0, 1]
x = 5
x = 5
[5, 5]
5
x = 5

In other words, in Python2.7 the comprehension's loop variable is simply the x of the enclosing scope, while in Python3.1 the comprehension gets its own scope, so its x shadows the outer one only within the comprehension itself.  Note that foo() still sees the outer x even when called from inside the comprehension: foo resolves x lexically, from the scope where it was defined, not from where it is called.  It's a subtle distinction.

"8 things I wish I knew before starting a business"

Here are lessons for start-ups, from Don Rainey, a general partner at Grotech Ventures.

Summary:

  1. Things take longer than you imagine.
  2. Items that succeed tend to do so quickly.
  3. People will let you down.
  4. Good employees are really hard to find.
  5. Your bad employees rarely quit.
  6. You will be lucky and unlucky.
  7. Avoid the myth and misery of sunk cost.
  8. Fill the pipe, always.
#7 seems to be the same as #2, but you can read the full article yourself.

Thursday, August 26, 2010

Giving F# another look

Here is an interesting comparison among Haskell, Ocaml, and F# hashtable implementations.

F# is starting to look very good.  It used to be only a research project at Microsoft, and much slower than Ocaml.  Now, it's part of Visual Studio 2010, and the speed is clearly competitive with Ocaml for some common operations.  The clear syntax might encourage adoption at Fortune 500 companies.

I guess I should learn F# along with the .NET API.  I can use Ocaml for MacOS work.  The languages are quite similar, after all.

Wednesday, August 25, 2010

Ocaml: Improved polymorphism

Ocaml now supports polymorphism!  Well, it always did, somewhat, but with fatal caveats.  "Jumping through hoops,"  as Janestreet coders say, or relying on macros is not my idea of "high level".

Janestreet has provided an excellent description of these new features of Ocaml.  There is a very subtle distinction between polymorphic type annotations and explicit type parameters, particularly for recursive functions, and I certainly could not explain it better than they do.

This is big news for me.  As soon as the MacPorts installation is ready, I plan to switch to Ocaml as my go-to language, whereas I had been considering Haskell.  I'll still use Python, Ruby, Go, Perl, Bash, etc. for many tasks, of course.

Maximum argument length in Linux

From a 2004 Slashdot interview with Rob Pike:
I didn't use Unix at all, really, from about 1990 until 2002, when I joined Google. (I worked entirely on Plan 9, which I still believe does a pretty good job of solving those fundamental problems.) I was surprised when I came back to Unix how many of even the little things that were annoying in 1990 continue to annoy today. In 1975, when the argument vector had to live in a 512-byte-block, the 6th Edition system would often complain, 'arg list too long'. But today, when machines have gigabytes of memory, I still see that silly message far too often. The argument list is now limited somewhere north of 100K on the Linux machines I use at work, but come on people, dynamic memory allocation is a done deal!
Pike is referring to this problem, most common when a '*' wildcard expands to too many files.  For the examples, I would say that 3a/b might as well be a Perl/Python/Ruby script.  I would also add Example 2b:
% find -X $directory1 -name '*' -depth 1 -type f | xargs mv --target-directory=$directory2
(if --target-directory is available on mv) since xargs already holds the line-length well below ARG_MAX. Or just use the little-known plus sign in find (note that {} must come last, immediately before the +):
% find $directory1 -name '*' -depth 1 -type f -exec mv --target-directory=$directory2 {} +
That might be the fastest solution.

Another idea, from Lyren Brown:
for f in *foo*; do echo "$f"; done | tar -T/dev/stdin -cf - | tar -C/dest/path -xvf -
Apparently, recent Linux kernels (2.6.23 and later) finally remove the hard-coded limit, instead letting the argument list grow to a quarter of the maximum stack size.
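To see what limit your own kernel enforces, ask sysconf; this is the same value that 'getconf ARG_MAX' prints.  A minimal C program:

/* argmax.c: print the kernel's limit on exec() argument space.
 * The value covers argv and envp together, in bytes. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long arg_max = sysconf(_SC_ARG_MAX);
    if (arg_max == -1)
        printf("ARG_MAX is indeterminate here\n");
    else
        printf("ARG_MAX = %ld bytes\n", arg_max);
    return 0;
}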

Monday, August 23, 2010

How much coverage is right for testing?

Here is an amusing parable on test coverage.

Ocaml in the real world (or at least Wall Street)

Here are two interesting lectures on Ocaml by a fellow who works at Janestreet, a Wall Street trading firm.

Janestreet has switched from the common development languages, like C# and Java, to almost solely Ocaml.  The reason is that an error can cost huge amounts of money.  They value the strong type safety of ML.  In fact, according to the second video, their main goals are:

  • Correctness
  • Agility
  • Performance

The higher the performance, the greater the importance of correctness, since mistakes can cost money more quickly.

The first video is advice on good practices in Ocaml coding.

  • Favor readers over writers.
  • Create uniform interfaces.
  • Make illegal states unrepresentable.
  • Code for exhaustiveness.
  • Open few modules.
  • Make common errors obvious.
  • Avoid boilerplate.
  • Avoid complex type hackery.
  • Don't be puritanical about purity.

Some of these may seem obviously beneficial to non-Ocaml coders, but the point is that, since you're using Ocaml for its ability to increase the likelihood of correctness elegantly, you should let Ocaml do its job.

The most important point, at about the 11-minute mark, is generally applicable to software engineering.  It is that there are two basic schemes of preferences in coding, one for code authors and one for code reviewers.  When there is a conflict, the speaker says, "Favor readers over writers."  He gives several reasons for that, the most important being that readers tend to eschew complexity.  "The enemy of correctness is complexity."

These are long, but worth hearing, since real world (i.e. profitable) experience with highly pure functional languages is rare.

Sunday, August 15, 2010

Errors in moving to 64-bit architecture

Here are some errors that one may encounter when compiling old code on a new, 64-bit architecture.  I have seen all but #7, and I have seen other errors similar to that one, involving how headers are included.  Many, fortunately, would be caught by examination of compiler warnings, but explicit casts will usually suppress those warnings.

The main advice seems to be: Let the compiler help you.
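For illustration, here is a small, hypothetical program showing two of the classic hazards; compile it with -Wall -Wextra -Wconversion on a 32-bit and a 64-bit box and compare the complaints:

/* lp64.c: two classic 32-to-64-bit porting bugs.  On LP64 Linux,
 * long and pointers widen to 64 bits while int stays at 32. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "a reasonably long string";

    int n = strlen(s);  /* size_t silently narrowed to int; -Wconversion flags it */
    printf("length = %d\n", n);

    unsigned low = (unsigned)(size_t)s;  /* pointer truncated to 32 bits; the
                                            explicit double cast hides the warning */
    printf("low pointer bits = %x\n", low);

    printf("sizeof(long) = %zu\n", sizeof(long));  /* 4 on ILP32, 8 on LP64 */
    return 0;
}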

Thursday, August 12, 2010

More Perl strangeness: eval and $@

I just learned something dangerous about altering $@ within an eval block.  I guess these things are getting fixed in Perl6.  Not sure.  I just can't believe that people use Perl for more than minor tasks. Ruby, Python, Go, and many others are much better alternatives, requiring much less expertise to avoid disastrous side-effects.

On hiring bad programmers

Here is an interesting article by Paul Graham about Yahoo!, with this bit of wisdom:
Microsoft (back in the day), Google, and Facebook have all been obsessed with hiring the best programmers. Yahoo wasn't. They preferred good programmers to bad ones, but they didn't have the kind of single-minded, almost obnoxiously elitist focus on hiring the smartest people that the big winners have had. And when you consider how much competition there was for programmers when they were hiring, during the Bubble, it's not surprising that the quality of their programmers was uneven.

In technology, once you have bad programmers, you're doomed. I can't think of an instance where a company has sunk into technical mediocrity and recovered. Good programmers want to work with other good programmers. So once the quality of programmers at your company starts to drop, you enter a death spiral from which there is no recovery.
And this:
Probably the most impressive commitment I've heard to having a hacker-centric culture came from Mark Zuckerberg, when he spoke at Startup School in 2007. He said that in the early days Facebook made a point of hiring programmers even for jobs that would not ordinarily consist of programming, like HR and marketing.

Wednesday, August 11, 2010

bash: little-known alias trick

According to the bash manual:
If the last character of the alias value is a space or tab character, then the next command word following the alias is also checked for alias expansion.
By default, alias expansion is not performed in a non-interactive shell (unless 'shopt -s expand_aliases' is set), so a command string passed to ssh will not have its aliases expanded on the remote side.  Besides, the account on the server might not define all the aliases you are used to.  The trick is to alias ssh itself, with a trailing space, so that the word after it is expanded locally:

$ alias ll="ls -al"
$ alias s="ssh me@mymachine.com "
$ s ll
(long listing of all files...)

More on 'mk' (plan9)

The following is a valid mk-file (with basically the same syntax as a makefile):

% cat mkfile
MKSHELL=bash
dinosaur:
        for i in a b c; do
            echo $i;
            echo next...;
        done
MKSHELL=rc
target:
        for(i in a b c) {
            echo $i
            echo next...
        }
MKSHELL=./wrap-python
foo:
        import sys
        for i in range(3):
          sys.stdout.write('Whoa!\n')
MKSHELL=./wrap-perl
bar:
        use Cwd;
        for my $i (1..3) {
            print getcwd() . "\n";
        }

% cat wrap-perl
#!/bin/sh
echo $*
exec perl

(The wrappers are needed only because mk passes '-e' to the MKSHELL program.)

Not only does mk make sh easier to use than 'make' does, but it also allows the recipes to be written in any language.  (If you use SHELL in 'make', each line of a recipe is a separate invocation.)  If you want, you can use multiple languages within the same mk-file.  You have to admit that's pretty cool.

As for rc, because of complications with readline and command-history, I can't yet advocate it for an interactive shell.  Oh, well.

Monday, August 9, 2010

Fun new language

http://marcuswest.in/read/fun-intro/

It's a new way of doing web development.  I will follow this to see where it goes.  One interesting thing: it uses PEG.js for its grammar.

Better than make?

As mentioned earlier, I have switched from bash to rc.  To avoid a problem with the MacPorts version, I installed rc from Plan 9 from User Space, which also provides mk, a replacement for make.

There are many good reasons to prefer mk, but for me the strongest is to avoid vfork().  Here is the comment thread from reddit:
uriel says:

I suspect the 'rc' in MacPorts is an old re-implementation which has some serious flaws.
cdunn2001 says:
Can you be more specific? I got version 1.7.1. How can I know which implementation I got? The description says it's the Rakitzis re-implementation, but maybe it's been properly patched.
I will look into mk and Plan 9 from User Space. I am a fan of Go, though I ran into minor flaws in the installation process last time I installed it a few months ago (under RedHat Linux).
One problem I've had with traditional make is that it uses vfork() instead of fork() (for speed, I think) which breaks strace. This may be too technical a question for a reddit forum, but do you have any idea whether strace would work on mk?
I certainly will not switch to general scripting with rc. That seems like a giant step backward.
uriel says:

The flaws in Rakitzis's re-implementation are too fundamental to be fixed, some of the syntax is just incompatible with the real rc, and I don't think anyone is using it anymore (I certainly hope that nobody is using it, as it creates great confusion).
No idea why on earth any make or make-like program would use vfork(), mk certainly doesn't use it as there is no vfork in Plan 9. vfork as far as I can tell is just a hideous hack in systems where fork() is too badly broken and slow.
As for general scripting, rc is a huge step backwards perhaps compared to for example Perl, but it is a step backwards from a not very good direction and back to what made Unix awesome. The power of combining small specialized tools using pipes is amazing, specially when you have a shell that is simple enough not to get in the way while providing everything you need to glue things together.
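For reference, here is a minimal sketch of why vfork() is so fragile, and quite possibly why strace on make crashed for me at that previous job: the vfork'd child borrows the parent's address space while the parent is suspended, so it may safely do nothing except exec or _exit.

/* vforkdemo.c: the only safe use of vfork().  The child runs in the
 * parent's address space and the parent stays suspended until the
 * child calls exec or _exit; anything else is undefined behavior. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = vfork();
    if (pid == 0) {
        /* Do not modify variables, call stdio, or return from here:
         * all of that would scribble on the suspended parent. */
        execlp("echo", "echo", "hello from the vfork child", (char *)0);
        _exit(127);  /* _exit, not exit(): no atexit handlers, no stdio flush */
    }
    int status;
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}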

Better than bash?

I avoided bash (the GNU extension of the Bourne shell) for a long time for one reason, csh's convenient pipe shorthand:
  prog1 |& prog2
I have always been annoyed by the bash equivalent:
  prog1 2>&1 | prog2
When I learned that a new version of bash (4.0) allowed the |&, I made the switch and have been very happy.

Recently, I learned about something that could be better than bash: rc.  The docs for rc are very interesting and amply justify the switch.

Instead of switching immediately, I am moving gradually by putting
  SHELL=rc
into my makefiles.  So far, I am very happy with it.

There is one caveat, pointed out by uriel on reddit:
I suspect the 'rc' in MacPorts is an old re-implementation which has some serious flaws.
The original rc, plus all the great Plan 9 commands are available as part of Plan 9 from User Space which runs great on OS X (Russ Cox, the author, is also one of the main Google Go developers, and uses OS X as his main development platform, with p9p's acme of course ;)).