Thursday, December 30, 2010

Ruby: Inconsistencies

This is a growing list:
  • String#concat
    • Should be String#concat!
    • I guess it's meant to resemble #insert, #delete, #fill, #replace, #clear, etc., but I wish those had "!" too, like #compact!, #reject!, #slice!, etc. There should be 2 versions of all these, but because the mutating versions have no "!", non-mutating versions can never be added under the same names.
  • Hash#update is a synonym for Hash#merge!
    • I have to remember which one has the "!"
    • Hash#invert does *not* mutate. How do I remember all this?
    • Even worse, Set#merge (no "!") mutates, unlike Hash#merge.
  • Set#add? and Set#delete?
    • These mutate.
  • s.chomp!.upcase
    • Can fail, since String#chomp! can return nil
  • Array#fetch(i) (and Hash#fetch(k)) can raise IndexError
    • Should be Array#fetch!(i)
    • Block and multi-arg version could drop the "!"
  • String#each does not exist in Ruby 1.9, while #each_char does not exist in 1.8
    • We cannot write forward-compatible code!
    • Solution: s.split("")
  • '0x10'.hex and '0x10'.oct are same, but
    • '010'.hex and '010'.oct are different
  • Given: a = [1,2,3]
    • a[2] == 3
    • a[3] == nil
    • a[4] == nil
    • a[2,1] == [3] (and a[2,2] == [3] as well)
    • but a[3,1] == []
    • while a[4,1] == nil
  • For Arrays
    • a[x,y] = [nil] substitutes the slice
    • a[x,y] = z     substitutes [z], but ...
    • a[x,y] = nil   deletes the slice! (fixed in 1.9)
  • inspect/to_s
    • There used to be a clear distinction, like Python's __repr__/__str__, but 1.9 often (though not in all cases) erases that useful distinction.
  • Array   Float   Integer   String 
    • Those are Kernel functions, not constants.
  • ...
Also see Ruby Best Practices, a great book.

Wednesday, December 29, 2010

Ruby: Introspection

Supposedly, Ruby has introspection, but some things are missing. For example:

# How to get the name of the current method?
# Add this snippet of code to your logic somewhere
 
module Kernel
  private
  # Defined in ruby 1.9
  unless defined?(__method__)
    def __method__
      caller[0] =~ /`([^']*)'/ and $1
    end
  end
end
In Python, this is not much better:

import traceback

def asdf():
    # The last stack entry is the frame of the function we are in.
    (filename, line_number, function_name, text) = traceback.extract_stack()[-1]
    print(function_name)

asdf()
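A somewhat more direct route in Python goes through the inspect module. This is only a sketch: the helper name current_function_name is made up for illustration, and it assumes an interpreter such as CPython where inspect.currentframe() returns real frame objects.

```python
import inspect


def current_function_name():
    """Return the name of the function that called this helper (hypothetical helper)."""
    frame = inspect.currentframe()
    # f_back is the caller's frame; co_name is the name of its code object.
    return frame.f_back.f_code.co_name


def asdf():
    return current_function_name()


print(asdf())
```

The helper plays the same role as the Kernel#__method__ shim above: it looks one frame up the stack rather than parsing a formatted traceback.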

Update: __method__ is part of Ruby as of 1.8.7.

Here is something else rather awkward in Ruby:

# Print all modules (excluding classes)
puts Module.constants.sort.select {|x| eval(x.to_s).instance_of? Module}
In Python, we could simply do this:
import sys; print(list(sys.modules.keys()))
Ruby's introspection (also) reveals a lot about the structure of classes.

Tuesday, December 14, 2010

Web: Better URL navigation

This post made me realize that very few people are aware of a nice way to work with URLs for dynamic content.

Many sites use the hash (#) or shebang (#!) in their URLs for AJAX, and some end up breaking the *Back* button on your browser. The hash is important for letting Google index a link to a self-reloading page, but for pages that should not be indexed, there is a better way.
  •    http://foo.com/current.html
    • Contains a POST-method link to "/link".
  •    http://foo.com/link
    • On the server, the web framework interprets POST, computes new context, then redirects to the template "final.html", along with extra context.
  •    http://foo.com/final.html
    • This would be the result of the template substitution.
There are several benefits to this scheme:
  1. POST interpretation (the "controller") is separated from template substitution (the "view"). Most people instead use an if-clause in their controller.
  2. The web-designer can keep a simple redirect stub for the "link" page. That way, he can continue web-design in his static environment. He does not have to use a server or the intended framework.
  3. final.html is inherently secure, since none of the extra context is ever provided directly to that URL by the user.
The *Back* button works fine, because the `link` page was never rendered.
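
The flow above can be sketched without any particular framework. Everything here is an assumption made for illustration: the handle function, the in-memory CONTEXT dictionary standing in for whatever the framework uses to carry the extra context, and the "name" form field.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical server-side store for the "extra context" (an assumption;
# a real framework would use its own session or context mechanism).
CONTEXT = {}


def handle(method, path, body=""):
    """Return (status, headers, body) for a single request."""
    if method == "POST" and path == "/link":
        # "Controller": interpret the POST and compute the new context.
        form = parse_qs(body)
        name = form.get("name", ["stranger"])[0]
        CONTEXT["greeting"] = "Hello, " + name
        # Redirect instead of rendering, so /link never lands in the history.
        return 303, {"Location": "/final.html"}, ""
    if method == "GET" and path == "/final.html":
        # "View": template substitution only, fed from server-side context.
        greeting = escape(CONTEXT.get("greeting", "Hello"))
        return 200, {"Content-Type": "text/html"}, "<p>%s</p>" % greeting
    return 404, {}, "not found"


status, headers, _ = handle("POST", "/link", "name=Ann")
print(status, headers["Location"])
print(handle("GET", "/final.html")[2])
```

The POST branch never produces a page of its own, which is why the browser has nothing to go back to except current.html.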

I'm not sure why this pattern is not better known. Maybe I am overlooking some advantages of the alternatives.

Thursday, December 9, 2010

Linux: Interesting, obscure commands


# First, the most important place for interesting commands:
http://www.commandlinefu.com/commands/browse/sort-by-votes


# Now, a bunch of cut-and-pasted stuff, from a thread...


# to fix the terminal
reset

# or try Ctrl-v Ctrl-o

Or try:
reset='echo "X[mX(BX)0OX[?5lX7X[rX8" | tr "XO" "\033\017" && /usr/bin/reset'
ESC [m (actually ESC [0m) Character Attributes: Normal (not bold f.i.)
ESC (B Select G0 Character Set: United States (USASCII)
ESC )0 Select G1 Character Set: Special Character and Line Drawing Set
O ( Ctrl-O ) Switch to Standard Character Set
ESC [?5l DEC Private Mode Reset: Normal Video
ESC 7 Save Cursor
ESC [r weird (actually 'ESC [0;0r' ? Set Scrolling Region [top;bottom] )
ESC 8 Restore Cursor

# to turn off display
xset dpms force off


# for virtual terminal
Personally, I think every Linux user should know how to use the virtual terminals. Just hit Ctrl+Alt+F1 and that should take you to a bash prompt. Usually the main one you're on with X running is F7, so you can switch back to that.
If X locks up on me, just a simple:
Ctrl+Alt+F1
login, and run
$ sudo /etc/init.d/gdm restart
Note that it could be gdm, kdm or xdm depending on your distro.
On RedHat or Ubuntu, you could instead:
$ sudo service gdm restart
Or
$ invoke-rc.d gdm restart # for ubuntu/debian

# Others
GNU-screen (or tmux) is an excellent command (won't have to use nohup again), if you don't have it you should install it and try it.
If you're on a Red Hat based distro, yum and rpm are good to know. If it's Debian based, apt-get and dpkg for installing stuff.
ping, traceroute (or mtr --curses or nstat), and ifconfig are all handy for networking stuff.
Look into htop; it's a much nicer version of top, but you may need to enable additional 3rd party repositories if using yum or apt-get (or aptitude). Or nmon, or atop. And pgrep?

# More on screen:
screen (start a screen session)
screen -dr (detach said screen session and reattach it in current sess)
screen -ls (show active screen sessions)
screen -dr [screen session] (detach and resume a specific session)
# And for pair programming:
screen -S sessionname (start a session with a name)
screen -x sessionname (attach the named session, even if it's attached elsewhere)
Those are essentially the only two I use, with the occasional "screen -ls". I much prefer -x over -r as you can attach in multiple places. So at home I always leave stuff running in screen and when I log in from work or wherever I can attach that same screen session without first detaching it from my terminal at home. Plus you could have two people working together in one "screen" which is good for pair programming.
http://www.ibm.com/developerworks/aix/library/au-gnu_screen/

# The magic SysRq key
https://secure.wikimedia.org/wikipedia/en/wiki/Magic_SysRq_key

# General stuff

Strings

  • grep
  • awk
  • uniq, sort, sort -n
  • seq
  • cut
  • wc

Files

  • rsync
  • lsof
  • find | xargs
  • locate
  • df -H
  • du -cks | sort -n
  • scp
  • strings
  • file
  • touch
  • z* (zgrep, zcat, etc)
  • tail -f, head

Administration

  • man
  • ps auxf (f only on GNU)
  • kill, -HUP, -9
  • sudo
  • screen
  • /etc/init.d/ scripts
  • id
  • ^Z, fg, jobs, &

Networking

  • nmap
  • dig
  • tcpdump
  • ifconfig

Operators

  • The knowledge that bash is a programming language that provides all your basic constructs (ifs, loops, variables, functions), but instead of having a library of functions, you execute simple programs instead
  • |
  • <, >, >>
  • - as stdin, e.g. "cat somefile.txt | vi -"
  • for i in a b c d; do echo $i; something_else $i; done
  • alias
  • All the goodies at http://samrowe.com/wordpress/advancing-in-the-bash-shell/

# And more

netstat -ano # view your open TCP and UDP connections
netstat -tulp # what is listening on which port
# or lsof -i
top -b | grep processname # continuous info about a process, you have to Ctrl+C out of it though
nmap -sS -sV -O localhost # local listening ports and what versions of daemons are running.
# maybe -p 1-65535
xsel --clipboard --input # stdin to clipboard

# OSX
pbcopy # stdin to clipboard


diff this that | vim -
pgrep firef
watch sensors #?

ncdu # to find out where all space is being used
htop > ps # not a redirect

# For bigger programs
mocp, alsamixer, ncdu, htop, emacs, screen, feh, acpi, dpkg, convert

diff -wyW160 this that | less  #compare side-by-side
diff -u this that >other       #write unified diff
patch 

Wednesday, December 8, 2010

JavaScript: Namespaces

http://javascriptweblog.wordpress.com/2010/12/07/namespacing-in-javascript/

The section on using this as a namespace proxy is brilliant. The origin of that idea is here (James Edwards).

Example:
var myApp = {};
(function() {
  var id = 0;

  this.next = function() {
    return id++;
  };

  this.reset = function() {
    id = 0;
  }
}).apply(myApp)

window.console && console.log(
  myApp.next(),
  myApp.next(),
  myApp.reset(),
  myApp.next()
) //0, 1, undefined, 0
or more powerfully

var subsys1 = {}, subsys2 = {};
var nextIdMod = function(startId) {
  var id = startId || 0;

  this.next = function() {
    return id++;
  };

  this.reset = function() {
    id = 0;
  }
};

nextIdMod.call(subsys1);
nextIdMod.call(subsys2,1000);

window.console && console.log(
  subsys1.next(),
  subsys1.next(),
  subsys2.next(),
  subsys1.reset(),
  subsys2.next(),
  subsys1.next()
) //0, 1, 1000, undefined, 1001, 0

Monday, December 6, 2010

C/flex: reentrant lexical analysis

There is a great deal of confusion on how to use flex/bison (lex/yacc) with reentrancy. The biggest reason, as shown here, is that flex and bison deal with reentrancy differently.

To clear some of that up, I posted an Answer at Stackoverflow. With reentrant flex, I recommend using Lemon Parser, rather than bison.

Note that flex scanners are not "as reentrant" as lex scanners. However, if bison is avoided, then the flex C++ scanner (also reentrant) is probably a good alternative to %option reentrant, maybe with a small speed penalty.

Sunday, November 21, 2010

Ruby: Whitespace significant?

Many people claim that they do not like Python because of the significance of whitespace. Well, whitespace is also significant in Ruby. E.g.
>> def say
>>  puts 'hi'
>> end
=> nil
>> say
hi
=> nil
>> def say puts 'hi' end
SyntaxError: (irb):5: syntax error, unexpected tSTRING_BEG, expecting ';' or '\n'
def say puts 'hi' end
              ^
(irb):5: syntax error, unexpected keyword_end, expecting $end
        from /Users/cdunn2001/bin/irb:12:in `<main>'
>> def say; puts 'hi'; end
=> nil
>> say
hi
=> nil
See? The newline is a substitute for the semicolon, not equivalent to a space or tab. This is not pedantry. To me, the main value of whitespace-independence is that you can insert the code into something else -- e.g. an HTML template -- without breaking it.

I wouldn't mind if I could use curly brackets instead of 'end', but that works only for blocks, not function definitions. So I wish that Ruby fans would quit bragging that their language is whitespace-independent. There are such languages, e.g. Perl, where whitespace merely delimits tokens and can be removed completely by relying on parentheses and other delimiters. Ruby is not one of them.

I do understand the objection to Python's syntax. It's not the enforced indentation; it's the lack of an 'end' delimiter. The result is that copy-and-paste operations can introduce mistakes. I get that.

I'm just sayin' ...

Monday, November 15, 2010

StartupWeekend - We won! (Judge's prize)

My team won the judge's prize at StartupWeekend Seattle, Nov. 14 2010. It's more than just a software event, but software was a large part of it. Here is a good summary of our experience, from one of our team members. Including the votes from other teams, we came in 2nd overall. I guess that's like winning the coaches' poll but losing the BCS.

Friday, November 12, 2010

Git: Why?

Here is a funny and informative blog on problems with git, and why the benefits outweigh them. I liked this part the best:
... by far the best justification I’ve ever seen for git rebase (or git lie, as I prefer to call it).
Anyway, the comments are where the real information can be found.

Wednesday, November 10, 2010

Lots of ideas on NULL

There was quite a discussion on this at StackOverflow. The arguments are often language-specific, but basically folks like a Null option as long as it's part of the type system.

Friday, November 5, 2010

Objective-C: Slow?

Here is an interesting discussion of speed problems in Objective-C. I guess the bottom line is that it's generally only a little slow, not too bad, but some standard libraries are terrible. I assume that applies to iPhone apps.

Thursday, October 28, 2010

C#: Async on the way!

Easy asynchronous communication, coming in C# 5:

I am pleased to announce that there will be a C# 5.0 (*), and that in C# 5.0 you’ll be able to take this synchronous code:
void ArchiveDocuments(List<Url> urls)
{
  for(int i = 0; i < urls.Count; ++i)
    Archive(Fetch(urls[i]));
}
and, given reasonable implementations of the FetchAsync and ArchiveAsync methods, transform it into this code to achieve the goal of sharing wait times as described yesterday:
async void ArchiveDocuments(List<Url> urls)
{
  Task archive = null;
  for(int i = 0; i < urls.Count; ++i)
  {
    var document = await FetchAsync(urls[i]);
    if (archive != null)
      await archive;
    archive = ArchiveAsync(document);
  }
}
Where is the state machine code, the lambdas, the continuations, the checks to see if the task is already complete? They’re all still there. Let the compiler generate all that stuff for you
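
For comparison, the same overlap of waits can be sketched with Python's asyncio. This is an analogy only, not the C# feature: fetch_async and archive_async are hypothetical stand-ins that sleep instead of doing real I/O, and unlike the async void sample the function here waits for the last archive and returns a list so the result can be inspected.

```python
import asyncio


async def fetch_async(url):
    # Stand-in for a network fetch (assumption: real code would do I/O here).
    await asyncio.sleep(0.01)
    return "document for " + url


async def archive_async(document, archived):
    # Stand-in for a slow write to an archive.
    await asyncio.sleep(0.01)
    archived.append(document)


async def archive_documents(urls):
    archived = []
    archive = None
    for url in urls:
        # Fetch the next document while the previous archive task is running.
        document = await fetch_async(url)
        if archive is not None:
            await archive
        # create_task starts the archive immediately, like a C# Task.
        archive = asyncio.create_task(archive_async(document, archived))
    if archive is not None:
        await archive
    return archived


print(asyncio.run(archive_documents(["a", "b", "c"])))
```

Each archive is awaited before the next one starts, so documents are archived in the order they were fetched.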

Haskell/Python: Calling Python from Haskell

Interesting. I don't know what else to say. Embedding Python into Haskell? I can't imagine this is useful, but maybe for someone.

Wednesday, October 27, 2010

Haskell: Finger Trees

The Finger Tree is a way to implement arrays in a functional language. This description for a Haskell version is instructive.

(The example in my previous Haskell post happened to use Finger Trees.)

Tuesday, October 26, 2010

Haskell: Complete example program

Whoa! Someone has written an excellent, didactic description of a complete Haskell program. While "Learn You A Haskell for Great Good" and the Haskell Wikibook are still great ways to learn the language, this new walk-thru helps to put all the pieces together. Highly recommended.

Monday, October 25, 2010

Java/C#: Generics

Java and C# generics are quite similar, but the implementations are very different. Here are some useful links for the curious:

Summary:
In short, all that Java generics permit is greater type safety with no new capabilities, with an implementation that permits blatant violation of the type system with nothing more than warnings
Furthermore, C#/.NET convey additional performance benefits due to the lack of required casts (as the verifier ensures everything is kosher) and support for value types (Java generics don't work with the builtin types like int), thus removing the overhead of boxing, and C# permits faster, more elegant, more understandable, and more maintainable code.

Friday, October 22, 2010

Ruby: Parsing Expression Grammars

Ruby's syntactic flexibility often makes it a convenient choice for a DSL (Domain-Specific Language). Treetop is a nice way to specify a PEG (Parsing Expression Grammar) in Ruby. Here is an example.

In the example, note that the SexpParser class is defined magically via Treetop.load. (See Instantiating and using parsers.) I do not like such magic, but it is a common thing amongst Ruby coders.

Tuesday, October 19, 2010

Win32: Beware string-length restrictions

Some Win32 functions impose constraints on the lengths of strings. E.g. UNICODE_STRING. Here is a nasty example:

http://blogs.msdn.com/b/larryosterman/archive/2010/10/19/because-if-you-do_2c00_-stuff-doesn_2700_t-work-the-way-you-intended_2e00_.aspx

Tuesday, October 5, 2010

JavaScript: An idea for closures

Here someone advocates the Thrush Combinator (let) in lieu of anonymous functions in JavaScript. Interesting.

A beautiful memory. (Also, shifting arrays in place.)

Here is a wonderful epitaph for a deceased coder, written as a description of efficient array-shifting.

In case this is ever deleted, here is the quote from Shakespeare, sonnet LX:
Like as the waves make towards the pebbled shore,
So do our minutes hasten to their end,
Each changing place with that which goes before
In sequent toil all forwards do contend.

Monday, October 4, 2010

Python: List of running processes (cross-platform)

On Linux, we have lsof. On Windows, things are much more complicated. Fortunately, there is a cross-platform Python module which can find information about all running processes, psutil.

UNIX/Python: sockets, select, and poll

Doug Hellmann has provided a very readable description of how to use sockets in Python.

Note: Most of this will not work with Windows.
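
As a taste of the select-based style, here is a minimal sketch using only the standard library. The function name wait_and_read is made up for illustration, and a local socketpair stands in for a real network connection.

```python
import select
import socket


def wait_and_read(timeout=1.0):
    """Send bytes down one end of a socket pair and read them via select."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"ping")
        # select blocks until b is readable or the timeout expires.
        readable, _, _ = select.select([b], [], [], timeout)
        if b in readable:
            return b.recv(1024)
        return None
    finally:
        a.close()
        b.close()


print(wait_and_read())
```

A real server would pass many sockets to select at once and loop over whichever ones come back ready.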

Wednesday, September 29, 2010

Eff: Like Haskell, but simpler than Monads

I may someday learn yet another interesting language, Eff. It is like Python + OCaml/Haskell in syntax, but it really has nothing to do with Python. Interestingly, it's written in OCaml. It's not ready for primetime, but could be preferable to Haskell someday, though I do not yet see the value.

The main difference with Haskell is the Effect, rather than the Monad.

Linux Multi-Threaded "Swap Insanity"

Here is a helpful description of a problem that can be encountered on Linux machines with NUMA (Non-Uniform Memory Access), often seen with MySQL.

In a nutshell, the machine may bog down with memory-swapping when total memory use is far below what is available, because the default policy is for a processor to prefer the memory in its own node, even when that forces swapping while other nodes still have free memory.

The simplest solution is to start the process under numactl --interleave=all.

Thursday, September 23, 2010

Eigen: Wrapper for SIMD instructions

From @Rikrd at http://blog.wolfire.com/2010/09/SIMD-optimization:
If you make extensive use of linear algebra (matrices, vectors, etc.) you may consider using Eigen. It is a header only library and it implements quite a few linear algebra operations in a very intuitive API and it uses whatever backend is available (simple loop unrolling, vectorizing with SSE/SSE2/SSE3..., Altivec, etc...). It may save you a lot of vectorization work... and it is just really fun to write stuff using Eigen.

Parsers and grammars

A friend expressed an interest in an improved version of yacc. I think this fits the bill:
    http://en.wikipedia.org/wiki/Lemon_Parser
    http://hwaci.com/sw/lemon/lemon.html

It's always re-entrant and thread-safe, with some other advantages and all the best lessons of yacc (e.g. error-handling).

Anyway, I'm nearly sold on PEG (Parsing Expression Grammars).
    http://en.wikipedia.org/wiki/Parsing_expression_grammar
    http://piumarta.com/software/peg/peg.1.html

Unlike LR parsers, a PEG parser does not require a separate tokenizer.  It's very elegant, much like BNF.  I want to do some runtime comparisons, but I expect it to be comparable.  I will try to attach a pegleg grammar that I wrote to solve this (with the added constraint of left-associativity):
    https://www.spoj.pl/problems/ONP/
Just install peg/leg, run 'leg' on the attached file (if I ever manage to post it to this blog), and compile.
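
Until that grammar is posted, the idea can be sketched in Python as a hand-written recursive-descent parser in the PEG spirit: one function per rule, reading characters directly with no separate tokenizer. This is not the peg/leg grammar mentioned above; the rule names, the single-letter operands, and the grouping of operators into three precedence levels are all assumptions made for illustration.

```python
def to_rpn(text):
    """Translate an infix expression over single letters into RPN."""
    pos = 0
    out = []

    def peek():
        return text[pos] if pos < len(text) else ""

    def atom():
        # Atom <- [a-z] / '(' Expr ')'
        nonlocal pos
        ch = peek()
        if ch.isalpha():
            pos += 1
            out.append(ch)
        elif ch == "(":
            pos += 1
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')' at position %d" % pos)
            pos += 1
        else:
            raise SyntaxError("unexpected %r at position %d" % (ch, pos))

    def level(operators, operand):
        # Level <- Operand (Op Operand)*
        # Emitting the operator after each right operand makes it left-associative.
        nonlocal pos
        operand()
        while peek() != "" and peek() in operators:
            op = peek()
            pos += 1
            operand()
            out.append(op)

    def power():
        level("^", atom)

    def term():
        level("*/", power)

    def expr():
        level("+-", term)

    expr()
    if pos != len(text):
        raise SyntaxError("trailing input at position %d" % pos)
    return "".join(out)


print(to_rpn("(a+(b*c))"))
print(to_rpn("a-b-c"))
```

The repetition in each rule is what provides left-associativity; a PEG cannot express it with a left-recursive rule.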

Here is an interesting table:
    http://en.wikipedia.org/wiki/Comparison_of_parser_generators

To learn a ton about PEG:
    http://pdos.csail.mit.edu/~baford/packrat/

In a nutshell, a Context-Free Grammar parser like yacc is meant for natural language, based on Chomsky's work, not for an unambiguous computer language.  PEG simplifies the parser by eliminating ambiguity in the grammar.

Functional data-structures (F#, ML, Haskell)

A good way to learn Standard ML (which is very similar to F#) is to read Chris Okasaki's book, Purely Functional Data Structures, which uses ML for examples in the main text.

Okasaki has many interesting ideas on functional programming. For example, among other interesting articles here, I enjoyed "Alternatives to Two Classic Data Structures".

There is also source code next to the PDF, in both Java and Ada. However, the deletion routine for red-black trees is missing from the source code. I mention that because, as Randall Helzerman points out in an Amazon comment on Okasaki's book,
Although he presents an EXTREMELY lucid description of how to implement Red-Black trees in a functional language, he only presented algorithms for insertion and querying. Of course, deletion from a red-black tree is the hardest part, left here, I suppose, as an exercise to the student. If you want to supply this missing piece yourself, check out a paper by Stefan Kars, "Red-black trees with types", J. Functional Programming 11(4):425-432, July, 2001. It presents deletion routines, but you'll still want to read Okasaki's book first, for unless you're very much smarter than me you won't be able to understand Kars' paper until you read Okasaki's exposition of red black trees.
Unfortunately, I do not have access to that Journal. I'm hoping to get a PDF from a friend.... Here it is. Warning: The code it shows, including for deletion, uses existential types.

Functional data-structures are interesting for many reasons, not the least of which is their value in persistent data storage, such as in a system that was used when I was at AMD. Here is an overview of current knowledge of such data-structures.