Sunday, July 10, 2011

Concurrency in node.js: Objects vs. Functions -- or maybe both!

Here is an excellent article by Manuel Kiessling on using node.js as a simple web-server. In particular, it talks about dependency injection (with a reference to Martin Fowler's article) to handle routing, and it includes an aside on "nouns vs. verbs". Kiessling's aside refers to Steve Yegge's 2006 piece "Execution in the Kingdom of Nouns", which shows why Java is so verbose. As Yegge says,
I've really come around to what Perl folks were telling me 8 or 9 years ago: "Dude, not everything is an object."
All the talk of imperative vs. functional code -- and message-passing vs. function-passing -- seems to miss the point: We need both objects and functions!

Functional code facilitates multiprocessing by reducing dependencies. For example:
function upload() {
  console.log("Request handler 'upload' was called.");
  return "Hello Upload";
}
Except for the log message, that is functional code, which might be part of a web-server. Maybe a URL like "http://foo.com/upload" would eventually lead to this function. A more complex version of it could produce a whole web-page.

At first, this seems pleasantly scalable, but looks are deceiving. Consider how it might be called:
function route(pathname) {
  console.log("About to route a request for " + pathname);
  if (pathname == "/upload") {
    return upload();
  } else {
    console.log("No request handler found for " + pathname);
    return "404 Not found";
  }
}

function onRequest(request, response) {
  var pathname = url.parse(request.url).pathname;
  console.log("Request for " + pathname + " received.");

  response.writeHead(200, {"Content-Type": "text/plain"});
  var content = route(pathname);
  response.write(content);
  response.end();
}
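For context, the snippets above assume that onRequest has been registered with Node's built-in http and url modules. A minimal sketch of that wiring (the port number here is arbitrary):
var http = require("http");
var url = require("url");

// Hand every incoming request to onRequest().
http.createServer(onRequest).listen(8888);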
The problem is that the entire stack -- from onRequest() to route() to upload() -- may block the server. The author of node.js, Ryan Dahl, has given many talks in which he discusses the importance of non-blocking calls for the sake of concurrency. Here is an example that can be non-blocking:
function upload(response) {
  console.log("Request handler 'upload' was called.");
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.write("Hello Upload");
  response.end();
}

function route(pathname, response) {
  console.log("About to route a request for " + pathname);
  if (pathname == "/upload") {
    upload(response);
  } else {
    console.log("No request handler found for " + pathname);
    response.writeHead(404, {"Content-Type": "text/plain"});
    response.write("404 Not found");
    response.end();
  }
}

function onRequest(request, response) {
  var pathname = url.parse(request.url).pathname;
  console.log("Request for " + pathname + " received.");
  route(pathname, response);
}
Notice that we are passing the response object from function to function. That is message-passing. The handler eventually writes directly into that object, rather than returning a string. Thus, the handler has side-effects; it is no longer functional code. But because it is passed everything it needs, nothing has to be returned back up the call stack -- we can forget about it. Why is this an advantage? Because it allows us to use a cheap event loop. Let's suppose that upload() is a time-consuming operation:
var exec = require("child_process").exec;

function upload(response) {
  console.log("Request handler 'upload' was called.");

  // Illustrative: node's real exec() callback receives only
  // (error, stdout, stderr); here response is written as if it were
  // passed along as a message (see the discussion below).
  exec("slow-operation", function (response, error, stdout, stderr) {
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.write(stdout);
    response.end();
  });
}
The command run by exec() is slow, but exec() itself is non-blocking. When it is called, a child process is spawned, the callback is registered with the event loop, and upload() returns immediately. The slow operation therefore runs concurrently with everything else the server is doing. That's the essence of node.js.
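A quick way to see that non-blocking behaviour is to watch the order of the log messages. This standalone sketch (assuming a Unix-like system with a sleep command) prints "after exec" immediately, and the callback's message about a second later:
var exec = require("child_process").exec;

console.log("before exec");

exec("sleep 1", function (error, stdout, stderr) {
  // Runs via the event loop once the child process has finished.
  console.log("callback: child process finished");
});

console.log("after exec");  // printed before the callback runs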

To be clear, message-passing is not the main point. The response object could just as well be held temporarily in a closure, which is the typical JavaScript idiom. (In fact, exec() in node.js does not pass response to its callback at all; the callback receives only error, stdout, and stderr, so response has to come from the enclosing scope.) We do not need a mutex lock on the response object, because this is a single-threaded program -- but even that's not the point. We could have multiple threads and lock the response object; it lives in memory, so response.write() is very fast either way. The point is that we create new events (with threads, processes, or whatever) only for slow -- that is, blocking -- operations, and we attach callbacks to those events so that processing can be deferred. That paradigm is what makes concurrency simple.
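For reference, here is what the closure version looks like with node's real exec() API: the callback receives only (error, stdout, stderr), and response is simply captured from the enclosing scope. This is a sketch; "slow-operation" stands for any long-running shell command, as above.
var exec = require("child_process").exec;

function upload(response) {
  console.log("Request handler 'upload' was called.");

  exec("slow-operation", function (error, stdout, stderr) {
    // 'response' is not a parameter here; the closure keeps it
    // available until the child process finishes.
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.write(stdout);
    response.end();
  });
}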

1 comment:

  1. Closures cannot be used to implement this paradigm in C++0x, because it has only downward funargs.

    http://en.wikipedia.org/wiki/Funarg_problem

    But the code as written could (basically) be written in C++, Java, etc.
