DunnDunnDunn: 2011

Friday, October 21, 2011

Linux utilities

I'll try to update this list over time...

bbcp -- alternative to rsync

Thursday, September 8, 2011

MySQL 4.1 for Windows performance problems

http://lists.mysql.com/mysql/202489

Excellent explanation.

Sunday, August 7, 2011

Git: What is the purpose of `git reset`?

Scott Chacon, the author of ProGit, also has a helpful blog, and this particular entry on "git reset" is absolutely invaluable. Also see this blogpost by Mark Dominus.

ShowOff: PowerPoint/Keynote alternative?

According to Scott Chacon (also the author of the ProGit book and a senior developer at GitHub) near the end of this interview, the following inequalities exist in the world of presentation slides for software developers:

PowerPoint < Keynote < ShowOff

Actually, the entire interview (of Scott Chacon by Werner Schuster on 06 September 2010 at the Scottish Ruby Conference in Edinburgh) is interesting, covering such topics as the use of Erlang, Redis, and memcached in the GitHub infrastructure.

Node.js, Redis, Pub-Sub, WebSockets in one brief example

http://howtonode.org/redis-pubsub

That's a quick way to learn a bunch of stuff.

... Well, as of 2014 that's pretty much out-of-date, but the discussion thread has useful links.

Sunday, July 24, 2011

Java: The worst part? Checked exceptions.

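This guy hates checked exceptions in Java, with good reason. Exception specifications, the closest analogue in C++, are also a bad idea. Bruce Eckel has chimed in as well.

A "checked" exception is any Throwable that is not a subclass of RuntimeException or Error. A method that can throw one must either catch it or name it in a "throws" clause: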

void foo() throws MyException {}

Friday, July 15, 2011

Git: Why should I use git instead of Subversion, CVS, etc?

Since the announcement that GoogleCode now supports git, many people are wondering why it's preferable to Subversion or even CVS. Here is my opinion:

I saw part of an interesting [1] video in which a YUI dev claimed that her productivity went up after switching from svn to git. YMMV.

For me, the advantages are:

Distributed repositories
- At first, a central repo seems more appealing to a Project Manager, but eventually you may prefer the Integration Manager model which a DVCS facilitates. Also, a DVCS allows one to commit while offline.
Private branches
- Keep your dirty laundry to yourself. With svn, many devs avoid frequent commits for this reason.
Simpler branch-merging
- When it's easy, people do it.
Rebasing
- The "killer" feature of git. (Also available in Mercurial.) Lets you consolidate groups of commits and pretend that you did them all after the most recent update.
The .git directory
- Very unobtrusive, unlike CVS/ and .svn/. Perforce is even worse, requiring a specific directory for the check-out. With git/hg/bzr/etc., you can version-control any sub-directory in your filesystem at any time, very easily, without setting up a central repo. I sometimes run git init inside a working area for Subversion, for a one-day project. Remember: With Subversion you cannot hide your dirty laundry.
The "stash"
- Unique to git. Syntactic sugar for temporary branching.

"rerere" (reuse recorded resolution)
- Pure magic. Caches merge-conflict resolution, so you never have to resolve manually the identical conflict again.

The biggest advantage of git over Mercurial is the [3] index, which is the genius of Linus Torvalds (at least to recognize the value). Otherwise, Mercurial is very good and in some ways better.

And what's the biggest disadvantage of git? Large files can make it really slow. With default settings, it's for source-code only. If you want to store big files in git, try git-annex, which even allows the files to be stored on remotes such as rsync, the web (RESTfully), or Amazon S3. Also consider git-media. I wouldn't bother with git-bigfiles.

Wednesday, July 13, 2011

Linux: Memory usage with Exmap

Wow. I can read the output from top just fine, but this little utility is amazing.

Monday, July 11, 2011

C++0x: Does it have closures?

No. It has downward funargs, but not upward. More discussion is here.

This means that something in the style of node.js would be a bit more complicated in C++ (or Java, etc.) than in JavaScript/Perl/Python/Ruby/Lua/Go, etc.
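For readers unfamiliar with the terms, here is a minimal sketch in Python (chosen only for brevity; the names make_counter and apply_twice are made up for illustration). A downward funarg is a function passed down into a call. An upward funarg is a function returned out of the call that created it, still carrying that call's local state.

```python
def make_counter(start=0):
    """Return a function that keeps using `count` after make_counter exits."""
    count = start

    def increment():
        nonlocal count  # refers to the variable in the enclosing call
        count += 1
        return count

    # Upward funarg: the inner function escapes the frame that created it.
    return increment


def apply_twice(f):
    """Downward funarg: f is only used while apply_twice is running."""
    return f(), f()


counter = make_counter(10)
first = counter()            # the enclosing frame is gone, the state is not
pair = apply_twice(counter)  # the same closure, passed downward
```

The callback style used by node.js leans on the upward case: a handler creates a callback, returns, and the callback runs later with the handler's variables still available.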

HTTP: Truly a stateless protocol?

> Is HTTP stateful or Stateless? Also, it would be really great is you
> please do let me know where can I find more details regarding HTTP protocol?

Fundamentally, HTTP as a protocol is stateless. In general, though, a
stateless protocol can be made to act as if it were stateful, assuming
you've got help from the client. This happens by arranging for the
server to send the state (or some representative of the state) to the
client, and for the client to send it back again next time.

There are three ways this happens in HTTP. One is cookies, in which
case the state is sent and returned in HTTP headers. The second is URL
rewriting, in which case the state is sent as part of the response and
returned as part of the request URI. The third is hidden form fields,
in which the state is sent to the client as part of the response, and
returned to the server as part of a form's data (which can be in the
request URI or the POST body, depending on the form's method).

To learn more about HTTP as a protocol, see http://www.w3.org/Protocols/

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

(I'll elaborate on this later... ~cdunn2001)
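Until then, here is a minimal sketch of the cookie approach described above, using Python's standard http.cookies module. The handler name, the "visits" cookie, and the simulated client loop are all made up for illustration; no real network traffic is involved. The handler keeps nothing between calls, and the state survives only because the client echoes it back.

```python
from http.cookies import SimpleCookie


def handle(request_headers):
    """A stateless handler: everything it knows arrives with the request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    visits = int(cookie["visits"].value) if "visits" in cookie else 0
    visits += 1

    # Send the new state to the client in a response header.
    reply = SimpleCookie()
    reply["visits"] = str(visits)
    response_headers = {"Set-Cookie": reply["visits"].OutputString()}
    return response_headers, "visit number %d\n" % visits


def client_round_trips(n):
    """Simulate a client that returns the cookie on each new request."""
    request_headers = {}
    bodies = []
    for _ in range(n):
        response_headers, body = handle(request_headers)
        request_headers = {"Cookie": response_headers["Set-Cookie"]}
        bodies.append(body)
    return bodies
```

URL rewriting and hidden form fields follow the same pattern; only the place where the state travels changes.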

Parameterized tests

There was a reddit discussion on assert vs. UnitTest-style assert_equal etc.

I really have no preference between assert x == y and assert_equal(x, y), given pytest's helpful tracebacks.

My problem with pytest is the awkward support for parameterized tests. nose handles parameterized tests much better. (pytest handles the nose-style too, but that's hard to find in the docs.) This is an open issue for unittest2.

In a nutshell, when I test "purely functional" code (i.e. free of side-effects, referentially transparent) I want this:

To list inputs and correct outputs.
To apply each input to a specific function.
To consider each a separate testcase, so that they will all run even when one fails.
To learn which inputs failed, along with expected and actual results.

(ToDo: Write an example.)
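In the meantime, here is a minimal sketch of those four requirements using only the standard library. It relies on unittest's subTest, which was added to Python after this post was written, and slugify is a made-up pure function standing in for the code under test.

```python
import unittest


def slugify(title):
    """Hypothetical pure function under test: lowercase, hyphen-separated."""
    return "-".join(title.lower().split())


# Inputs paired with their correct outputs.
CASES = [
    ("Hello World", "hello-world"),
    ("  Git  Reset ", "git-reset"),
    ("already-a-slug", "already-a-slug"),
]


class SlugifyTest(unittest.TestCase):
    def test_cases(self):
        for title, expected in CASES:
            # Each subTest is reported separately, so one failing input
            # does not stop the others, and the failure message shows the
            # input along with the expected and actual values.
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)


def run():
    """Run the cases programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a case means adding one line to CASES; the test body does not change.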

GoogleTest (C++, not Python) has very good support for parameterized tests via TEST_P. However, GoogleTest does not allow TEST_P to be combined with TEST_F (parameterized within a fixture). That is something pytest allows, with a bit of work.

Sunday, July 10, 2011

Concurrency in node.js: Objects vs. Functions -- or maybe both!

Here is an excellent article on using node.js as a simple web-server. In particular, it talks about dependency injection (with a reference to Martin Fowler's article) to handle routing, and it includes an aside on "nouns vs. verbs". Kiessling's aside refers to Yegge's 2006 article, which shows why Java is so verbose. As Yegge says,

I've really come around to what Perl folks were telling me 8 or 9 years ago: "Dude, not everything is an object."

All the talk of imperative vs. functional code -- and message-passing vs. function-passing -- seems to miss the point: We need both objects and functions!

Functional code facilitates multiprocessing by reducing dependencies. For example:

function upload() {
  console.log("Request handler 'upload' was called.");
  return "Hello Upload";
}

Except for the log message, that is functional code, which might be part of a web-server. Maybe a URL like "http://foo.com/upload" would eventually lead to this function. A more complex version of it could produce a whole web-page.

At first, this seems pleasantly scalable, but looks are deceiving. Consider how it might be called:

function route(pathname) {
  console.log("About to route a request for " + pathname);
  if (pathname == "/upload") {
    return upload();
  } else {
    console.log("No request handler found for " + pathname);
    return "404 Not found";
  }
}

var url = require("url"); // needed for url.parse() below

function onRequest(request, response) {
  var pathname = url.parse(request.url).pathname;
  console.log("Request for " + pathname + " received.");

  response.writeHead(200, {"Content-Type": "text/plain"});
  // route() must finish before anything is written.
  var content = route(pathname);
  response.write(content);
  response.end();
}

The problem is that the entire stack -- from onRequest() to route() to upload() -- may block the server. The author of node.js, Ryan Dahl, has given many talks in which he discusses the importance of non-blocking calls for the sake of concurrency. Here is an example that can be non-blocking:

function upload(response) {
  console.log("Request handler 'upload' was called.");
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.write("Hello Upload");
  response.end();
}

function route(pathname, response) {
  console.log("About to route a request for " + pathname);
  if (pathname == '/upload') {
    upload(response);
  } else {
    console.log("No request handler found for " + pathname);
    response.writeHead(404, {"Content-Type": "text/plain"});
    response.write("404 Not found");
    response.end();
  }
}

function onRequest(request, response) {
  var pathname = url.parse(request.url).pathname;
  console.log("Request for " + pathname + " received.");
  route(pathname, response);
}

Notice that we are passing the response object from function to function. That is message-passing. The handler eventually writes directly into that object, rather than returning a string. Thus, the handler has side-effects. It is no longer functional code. But because it will be passed everything it needs, we can forget the call stack. Why is this an advantage? Because it allows us to use a cheap event loop. Let's suppose that upload() is a time-consuming operation:

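var exec = require("child_process").exec;

function upload(response) {
  console.log("Request handler 'upload' was called.");

  // Illustrative signature only: the real exec() callback receives
  // (error, stdout, stderr), so response would come from the closure.
  exec("slow-operation", function (response, error, stdout, stderr) {
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.write(stdout);
    response.end();
  });
}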

The command run by exec() is slow, but exec() itself is non-blocking. When called, a sub-process starts, the callback is registered to run when the sub-process finishes, and upload() returns immediately. Thus, the slow command is executed concurrently with other operations. That's the essence of node.js.

To be clear, message-passing is not the main point. The response object could be stored temporarily in a closure, which is typical in JavaScript. (In fact, exec() in node.js does not actually allow response to be passed to its function.) We do not need a mutex lock on the response object because this is a single-threaded program, but even that's not the point. We could have multiple threads and lock the response object. It's in-memory, so response.write() is very fast. We create new events (with threads, processes, or whatever) only for slow (aka blocking) operations, and we assign call-backs to those events so that processing can be deferred. This is a paradigm which makes concurrency simple.
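The same paradigm can be sketched outside node.js. The example below uses Python's asyncio, with coroutines standing in for node.js callbacks; that is a substitution, not the node.js API. The handler names and the short sleep that simulates the slow operation are assumptions for illustration.

```python
import asyncio


async def slow_operation():
    """Stand-in for a slow sub-process; the sleep yields to the event loop."""
    await asyncio.sleep(0.05)
    return "slow result"


async def upload(log):
    log.append("upload started")
    result = await slow_operation()  # deferred; other handlers run meanwhile
    log.append("upload finished: " + result)


async def hello(log):
    log.append("hello handled")  # fast, purely in-memory work


async def main():
    log = []
    # Both "requests" are handled by one thread and one event loop.
    await asyncio.gather(upload(log), hello(log))
    return log


def run():
    return asyncio.run(main())
```

The log shows the fast handler completing while the slow one is still waiting, with no threads and no locks on the shared list.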

25 Most Dangerous Software Errors

FWIW, these are the 25 most dangerous software errors, according to CWE/SANS.

93.8% Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
83.3% Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')
79.0% Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
77.7% Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
76.9% Missing Authentication for Critical Function
76.8% Missing Authorization
75.0% Use of Hard-coded Credentials
75.0% Missing Encryption of Sensitive Data
74.0% Unrestricted Upload of File with Dangerous Type
73.8% Reliance on Untrusted Inputs in a Security Decision
73.1% Execution with Unnecessary Privileges
70.1% Cross-Site Request Forgery (CSRF)
69.3% Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
68.5% Download of Code Without Integrity Check
67.8% Incorrect Authorization
66.0% Inclusion of Functionality from Untrusted Control Sphere
65.5% Incorrect Permission Assignment for Critical Resource
64.6% Use of Potentially Dangerous Function
64.1% Use of a Broken or Risky Cryptographic Algorithm
62.4% Incorrect Calculation of Buffer Size
61.5% Improper Restriction of Excessive Authentication Attempts
61.1% URL Redirection to Untrusted Site ('Open Redirect')
61.0% Uncontrolled Format String
60.3% Integer Overflow or Wraparound
59.9% Use of a One-Way Hash without a Salt

SQL: The value of NOT NULL

Most SQL resources teach only the language. It's hard to find blogs with useful advice, and even harder to find bloggers who back up their claims. Here is an excellent article:

"The NOT IN took over 5 times longer to execute and did thousands of times more reads."

That site has lots of interesting comparisons for various SQL queries.
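The quoted claim is about performance. A related semantic reason to value NOT NULL can be shown with a small sketch using Python's built-in sqlite3 module. The tables a and b are made up, and SQLite is only a convenient stand-in for whatever database the article measured. Once the subquery column contains a NULL, NOT IN matches nothing, while NOT EXISTS still gives the expected rows.

```python
import sqlite3


def unmatched(allow_null):
    """Return (NOT IN rows, NOT EXISTS rows) for values of a missing from b."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE a (x INTEGER NOT NULL)")
    db.execute("CREATE TABLE b (x INTEGER)")  # nullable on purpose
    db.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
    rows_b = [(1,)] + ([(None,)] if allow_null else [])
    db.executemany("INSERT INTO b VALUES (?)", rows_b)

    not_in = db.execute(
        "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
    ).fetchall()
    not_exists = db.execute(
        "SELECT x FROM a WHERE NOT EXISTS "
        "(SELECT 1 FROM b WHERE b.x = a.x) ORDER BY x"
    ).fetchall()
    db.close()
    return not_in, not_exists
```

Declaring b.x as NOT NULL rules out the surprising case altogether.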

Here are some other useful links:

Friday, July 8, 2011

GoLang: I've created a new 'brush' for SyntaxHighlighter

I created a SyntaxHighlighter file for Go. (To set up SyntaxHighlighter, refer to this.)

This example is from the Go website.

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
   Write an http server to present pages in the file system, but
   transformed somehow. Substitutable? With Fibonacci program?
*/
package main

import (
 "bytes"
 "expvar"
 "flag"
 "fmt"
 "http"
 "io"
 "log"
 "os"
 "strconv"
)


// hello world, the web server
var helloRequests = expvar.NewInt("hello-requests")

func HelloServer(w http.ResponseWriter, req *http.Request) {
 helloRequests.Add(1)
 io.WriteString(w, "hello, world!\n")
}

// Simple counter server. POSTing to it will set the value.
type Counter struct {
 n int
}

// This makes Counter satisfy the expvar.Var interface, so we can export
// it directly.
func (ctr *Counter) String() string { return fmt.Sprintf("%d", ctr.n) }

func (ctr *Counter) ServeHTTP(w http.ResponseWriter, req *http.Request) {
 switch req.Method {
 case "GET":
  ctr.n++
 case "POST":
  buf := new(bytes.Buffer)
  io.Copy(buf, req.Body)
  body := buf.String()
  if n, err := strconv.Atoi(body); err != nil {
   fmt.Fprintf(w, "bad POST: %v\nbody: [%v]\n", err, body)
  } else {
   ctr.n = n
   fmt.Fprint(w, "counter reset\n")
  }
 }
 fmt.Fprintf(w, "counter = %d\n", ctr.n)
}

// simple flag server
var booleanflag = flag.Bool("boolean", true, "another flag for testing")

func FlagServer(w http.ResponseWriter, req *http.Request) {
 w.Header().Set("Content-Type", "text/plain; charset=utf-8")
 fmt.Fprint(w, "Flags:\n")
 flag.VisitAll(func(f *flag.Flag) {
  if f.Value.String() != f.DefValue {
   fmt.Fprintf(w, "%s = %s [default = %s]\n", f.Name, f.Value.String(), f.DefValue)
  } else {
   fmt.Fprintf(w, "%s = %s\n", f.Name, f.Value.String())
  }
 })
}

// simple argument server
func ArgServer(w http.ResponseWriter, req *http.Request) {
 for _, s := range os.Args {
  fmt.Fprint(w, s, " ")
 }
}

// a channel (just for the fun of it)
type Chan chan int

func ChanCreate() Chan {
 c := make(Chan)
 go func(c Chan) {
  for x := 0; ; x++ {
   c <- x
  }
 }(c)
 return c
}

func (ch Chan) ServeHTTP(w http.ResponseWriter, req *http.Request) {
 io.WriteString(w, fmt.Sprintf("channel send #%d\n", <-ch))
}

// exec a program, redirecting output
func DateServer(rw http.ResponseWriter, req *http.Request) {
 rw.Header().Set("Content-Type", "text/plain; charset=utf-8")
 r, w, err := os.Pipe()
 if err != nil {
  fmt.Fprintf(rw, "pipe: %s\n", err)
  return
 }

 p, err := os.StartProcess("/bin/date", []string{"date"}, &os.ProcAttr{Files: []*os.File{nil, w, w}})
 defer r.Close()
 w.Close()
 if err != nil {
  fmt.Fprintf(rw, "fork/exec: %s\n", err)
  return
 }
 defer p.Release()
 io.Copy(rw, r)
 wait, err := p.Wait(0)
 if err != nil {
  fmt.Fprintf(rw, "wait: %s\n", err)
  return
 }
 if !wait.Exited() || wait.ExitStatus() != 0 {
  fmt.Fprintf(rw, "date: %v\n", wait)
  return
 }
}

func Logger(w http.ResponseWriter, req *http.Request) {
 log.Print(req.URL.Raw)
 w.WriteHeader(404)
 w.Write([]byte("oops"))
}


var webroot = flag.String("root", "/home/rsc", "web root directory")

func main() {
 flag.Parse()

 // The counter is published as a variable directly.
 ctr := new(Counter)
 http.Handle("/counter", ctr)
 expvar.Publish("counter", ctr)

 http.Handle("/", http.HandlerFunc(Logger))
 http.Handle("/go/", http.StripPrefix("/go/", http.FileServer(http.Dir(*webroot))))
 http.Handle("/flags", http.HandlerFunc(FlagServer))
 http.Handle("/args", http.HandlerFunc(ArgServer))
 http.Handle("/go/hello", http.HandlerFunc(HelloServer))
 http.Handle("/chan", ChanCreate())
 http.Handle("/date", http.HandlerFunc(DateServer))
 err := http.ListenAndServe(":12345", nil)
 if err != nil {
  log.Panicln("ListenAndServe:", err)
 }
}

Embedding code with SyntaxHighlighter

Alex Gorbatchev's SyntaxHighlighter is the best thing around for adding syntax highlighting to code embedded into your blogs. Readers see line numbers but can easily cut-and-paste with or without those often pesky line numbers.

After you have all the necessary JavaScript and CSS files loaded into your web-page (see below for details on that) you have two choices for wrapping your source code. The "pre" method is necessary for RSS feeds, but the "script/CDATA" method handles embedded HTML tags without escaping them. The title is optional for both.

<script type="syntaxhighlighter" class="brush: js"><![CDATA[
  /**
   * SyntaxHighlighter
   */
  function foo()
  {
      convert("<body>Hello</body>");
      if (counter <= 10)
          return;
      // it works!
  }
]]></script>

Examples:

print "Hallo"

If you are using Google's Blogger (aka 'blogspot') as I am, you can prepare for syntax highlighting all blog posts via your Design template. Here are the steps:

Click 'Design' at the top of your blogspot page.
Click 'Edit HTML' at the top of that page.
Find the tag which ends the HEAD section (</head>), and insert the following just before it:

<head>
...
<link href='http://alexgorbatchev.com/pub/sh/current/styles/shThemeDefault.css' rel='stylesheet' type='text/css'/>
<link href='http://alexgorbatchev.com/pub/sh/current/styles/shCore.css' rel='stylesheet' type='text/css'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJScript.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPython.js' type='text/javascript'/>


<script type='text/javascript'>
  SyntaxHighlighter.config.bloggerMode = true;
  SyntaxHighlighter.all();
</script>

</head>

Add brushes for everything you might use, or try the new autoload feature of version 3.0. Then, you can use the pre or script/CDATA blocks as I've shown at the top of this page. If you need more help, follow these directions. Good luck!

Thursday, July 7, 2011

MVC Framework: Template-inversion

In many MVC frameworks, a template is used to generate web pages. For example, here is a template for Erb (the default template language for Ruby on Rails):

When I parse this using erb from the command-line, I get this:

That is usually done at runtime, within a Rails server.

Instead, I propose inverting the template prior to deployment, so that it becomes a Ruby file, like this:

For Ruby, there is no benefit. It's an extra step, and it's harder to debug. However, for a compiled language like Go, the benefit is that the inverted template can be compiled and linked into the web application. All the code inside a template is fully type-checked before the app is deployed, which is both faster and safer than the current paradigm. It also means that the templates are compiled into the executable, rather than kept as separate files, so deployment becomes simpler (assuming that static content is served by "the cloud", not by the application server).

Note that Go lacks an "eval" function, and rightly so. With template-inversion, "eval" is not needed.
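To make the idea concrete, here is a minimal sketch of an inverter written in Python. It is an illustration under heavy assumptions, not the tool proposed here: it handles only Erb-style `<%= name %>` substitutions of plain variable names, and it emits Python source rather than Ruby or Go. The function name invert and the generated render function are made up.

```python
import re

# Matches <%= name %>, capturing the name without surrounding whitespace.
TAG = re.compile(r"<%=\s*(.*?)\s*%>", re.S)


def invert(template):
    """Turn template text into source code for a render(**ctx) function."""
    lines = ["def render(**ctx):", "    out = []"]
    pos = 0
    for match in TAG.finditer(template):
        literal = template[pos:match.start()]
        if literal:
            # Literal text becomes a string constant in the generated code.
            lines.append("    out.append(%r)" % literal)
        # The tag becomes a lookup of the named variable.
        lines.append("    out.append(str(ctx[%r]))" % match.group(1))
        pos = match.end()
    if template[pos:]:
        lines.append("    out.append(%r)" % template[pos:])
    lines.append("    return ''.join(out)")
    return "\n".join(lines) + "\n"


source = invert("Hello, <%= name %>!")
```

The generated source would be written to a file before deployment and then compiled or imported like any other module, so nothing is parsed or evaluated at request time.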

A more realistic example in Go might result in something more like this:

Saturday, June 25, 2011

Version-control for the home directory dot-files

Lots of people revision-control the dotfiles in their home directories: .bashrc, .vim, etc. That works ok as long as you can ignore any files not controlled, and most VCSs allow that.

But what if you have several homedirs, and you want to maintain some common files between them? Of course, you also have files that differ. I think I've found an elegant solution: 2-tiered VCS.

Each homedir gets a git repo, which is pulled from one in Dropbox. (You'll see why I use Dropbox instead of GitHub in a minute.) I have a branch for each machine, so I can do some comparisons if I want. The master branch has only the common files, which can be used for seeding a new branch on a new machine.

What if I change a common file? I'd hate to have to merge it on each machine. I could forget easily, and that's a lot of work for every little change.

Instead, I keep the common files in cvs, also in Dropbox. Each local cvs workspace is also added to git. (That's not strictly necessary, but it makes setting up a new machine trivial.) When I change a common file, I just 'cvs commit' that file. On any machine, I can run 'cvs update' at any time.

One of the keys to this is the presence of 'CVS/Entries.Static' in the homedir. Otherwise, 'cvs update' could wreak havoc, as some common files are over-ridden on specific machines. (That's why a simpler solution does not work.) Cvs creates that file for you automatically if you 'cvs co' a single file. Otherwise, you can just 'touch CVS/Entries.Static', and remove unwanted files/directories from 'CVS/Entries'.

Another helpful thing is to commit a file called 'cvsignore' (no dot) into the CVSROOT directory (which is in the repo on Dropbox). It has just a single '*', which means to 'ignore everything not listed explicitly in CVS/Entries'. For sub-directories (e.g. .vim/), add a file called .cvsignore with just a single character, '!', to let cvs see all files there.

Also put '*' in '~/.gitignore', and add/commit that file. Henceforth, you will need 'git add -f' for any new files, but that's not really a bad thing.

The most difficult -- and dangerous -- part is setting up the local git repo. Normally, 'cd ~; git clone URL .' will set up a clone in the current directory, but that only works when the directory is empty. Instead, I came up with this sequence of steps:

git init
git remote add origin ~/Dropbox/homedir-repo
git fetch origin
git checkout -f -B mymachine origin/mymachine

Of course, the homedir-repo is 'bare', and the relevant branch was set-up safely in a different directory, with lots of testing. We don't want to destroy our homedir by accident!

So far, this is working extremely well for me, and I have not seen any better ideas out there.

This is helpful in ~/.git/config:

[gc]
auto = 0

That way, git will not pack stuff on Dropbox. Pushing to the remote repo will then only add files. Very little will change. (That's the problem with hosting CVS on Dropbox; files are edited or appended for every commit.)

Monday, April 18, 2011

Java: Are strings really immutable?

No, but a Security Manager can at least spot their mutations.

http://directwebremoting.org/blog/joe/2005/05/26/1117108773674.html

git: A decision on merging.

A lot of people don't really understand why branch-merging is an ambiguous operation. Here is a good explanation:

http://bramcohen.livejournal.com/74462.html

jQuery: "return false"

Back from vacation, I found this fascinating explanation of a common mis-use of jQuery:

http://fuelyourcoding.com/jquery-events-stop-misusing-return-false/

Monday, February 28, 2011

Facebook: The "Like" Button Just Changed

http://www.techi.com/2011/02/why-the-facebook-like-button-change-is-a-bait-and-switch/

Excuse the opinionated title of that link. The information is worth knowing.

Wednesday, February 23, 2011

Go: Race conditions

Race conditions should be impossible in Go. Am I wrong?

Here is a verbose article on the subject. I'm not sure I get the point of the author, but the comments are interesting.

Tuesday, February 15, 2011

Ruby: Following HTTP redirects

I'll update this with my Ruby code later. For now, read this:

How to control your HTTP transactions in Go

You might want to read his previous post also, on using Go for an HTTP throttler to simulate a low-bandwidth connection.

Monday, January 24, 2011

Napping and memories

http://www.cosmosmagazine.com/news/3982/memories-take-hold-better-during-sleep

Friday, January 21, 2011

HTTP: safety and idempotency

This blogpost has lots of useful links and a good example.

A big part of RESTfulness is mapping CRUD (Create/Read/Update/Destroy) to POST/GET/PUT/DELETE without violating idempotency (repeatability) or safety (no side-effects).

As pointed out here, using GET unsafely does not break anything, but I think the author misses the point. GET safety is a convention, which makes it easier for the rest of us to understand Web APIs. It's similar to conventions about identifying side-effects in software. The trouble is that most software engineers do not recognize the cognitive penalty of side-effects, so they do not see any reason to illuminate them.

Also, note the difference between idempotency and referential transparency. Technically, we should say referentially transparent, rather than idempotent, since the result of a GET (or PUT or DELETE) cannot be applied to itself. To me, that's a less important distinction than the tenor of the rule. And here is a lucid defense of using idempotent in the context of the web.
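As a minimal sketch of these conventions, here is an in-memory resource store in Python. The Store class and its method names are made up for illustration; no HTTP library is involved. Reading is safe, PUT and DELETE can be repeated without further effect, and POST cannot.

```python
class Store:
    """Toy resource collection whose methods mirror the HTTP verbs."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def get(self, key):
        # Safe: reading never changes the store.
        return self.items.get(key)

    def post(self, value):
        # Neither safe nor idempotent: every call creates a new resource.
        key = self.next_id
        self.next_id += 1
        self.items[key] = value
        return key

    def put(self, key, value):
        # Idempotent: repeating the same call leaves the same state.
        self.items[key] = value

    def delete(self, key):
        # Idempotent: deleting a missing resource changes nothing.
        self.items.pop(key, None)
```

A client that times out can retry put or delete without checking whether the first attempt arrived, which is not true of post.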

See also this StackOverflow discussion, especially the link to Roy Fielding's comment on REST and Cookies.

Thursday, January 6, 2011

jQuery: WTF?

Unbelievable.

I definitely prefer Prototype. Still, jQuery is much better than ASP. (Fortunately, the folks at MS recognized this, and VS MVC has supported jQuery since 2008.)

RoR: Nested routes

I learned something interesting enough that I want to keep a link to it. Unfortunately, I posted the answer a year after the question, so nobody will ever see it. Awwww.