Wednesday, April 24, 2013

Mathematicians and jokes: Shlemiel the Painter

http://discuss.fogcreek.com/techInterview/default.asp?cmd=show&ixPost=153
The puzzle? Tell me the a set of correct numbers to replace 300/150/30 in this joke.
Joel Spolsky
Tuesday, December 11, 2001 
I'd go with 300, 125, and "a bit less then 100" yards. But then, the joke is not as funny!

Sunday, March 31, 2013

C: Computed gotos vs. switch statement

A fairly new feature of some C compilers is the "computed goto", which is around 20% faster than a traditional switch-statement.

http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/

That's a great description, with examples, benchmarks, and an explanation of why it's faster. I'm recording that link here because I often forget the *name* of this thing. If you don't know what you're looking for, it's hard to find.

Netty3 Tutorial

The main Netty author (normanmauer@) tweets, "Best #netty 3.x tutorial I saw so far out there."
  • http://seeallhearall.blogspot.de/2012/05/netty-tutorial-part-1-introduction-to.html
  • http://seeallhearall.blogspot.de/2012/06/netty-tutorial-part-15-on-channel.html

Saturday, March 23, 2013

JarJar Links


  • http://priyanka-tyagi.blogspot.com/2013/03/dealing-with-java-hell-using-jarjar.html
  • https://code.google.com/p/jarjar/
Despite my animadversion for his namesake, JarJar has become my guardian angel in my escape from Java jar hell.

In a nutshell, if you distribute Java libraries to others, (and if you do not work at Google), you cannot expect them to use the same versions of sub-libraries as you do. Hiding those sub-libraries in their own namespace is one way to solve the problem, at the cost of some memory and disk space.

Here is what JarJar does: It uses ASM to walk through .class files, changing their Java package namespace according to a pattern that you supply.

The hard part is to change all the self-references as well. ASM-3 used to have problems with Java generics, so this was only a partial solution to dependency hell. But JarJar uses ASM-4. In fact, it embeds its own copy of ASM-4 so you don't need to worry about finding a compatible version. (ASM-4 works fine with JDK-7.)

You could fairly say that JarJar uses ASM to JarJar ASM into JarJar! Here's how (inside JarJar's own Antfile):


     <target name="jar" depends="compile" description="Create Jar">
          <mkdir dir="dist"/>
          <jarjar jarfile="${jarfile}">
              <fileset dir="build/main"/>
              <zipfileset src="lib/asm-4.0.jar"/>
              <zipfileset src="lib/asm-commons-4.0.jar">
                  <include name="org/objectweb/asm/commons/Remap*.class"/>
                  <include name="org/objectweb/asm/commons/LocalVariablesSorter.class"/>
              </zipfileset>
              <keep pattern="com.tonicsystems.jarjar.JarJarTask"/>
              <rule pattern="com.tonicsystems.jarjar.util.**" result="com.tonicsystems.jarjar.ext_util.@1"/>
              <rule pattern="org.objectweb.asm.**" result="com.tonicsystems.jarjar.asm.@1"/>
              <manifest>
                  <attribute name="Main-Class" value="com.tonicsystems.jarjar.Main"/>
                  <attribute name="Implementation-Version" value="${version}"/>
              </manifest>
          </jarjar>
      </target>

Monday, June 18, 2012

Linux: ELF shared library versioning

Here is an amazingly helpful description of all aspects of shared library and symbol versioning for ELF libraries (on Linux, BSD, etc.):

  http://plan99.net/~mike/writing-shared-libraries.html

For info on @ and @@:

  http://www.trevorpounds.com/blog/?tag=symbol-versioning

Sunday, May 13, 2012

GCC demangling and stack traces

Tons of useful info:
http://www.acsu.buffalo.edu/~charngda/backtrace.html

In fact, it's so useful, I'm afraid to lose that page, so I've copied it here:

Call stack trace generation

Call stack trace ("backtrace") is very useful in debugging. Here are several ways to retrieve the backtrace in a user program.

(The contents are mostly from here and here)

Easiest approach: __builtin_return_address

GCC has a built-in function to retrieve call stack trace's addresses. For example
void do_backtrace()
{
    printf("Frame 0: PC=%p\n", __builtin_return_address(0));
    printf("Frame 1: PC=%p\n", __builtin_return_address(1));
    printf("Frame 2: PC=%p\n", __builtin_return_address(2));
    printf("Frame 3: PC=%p\n", __builtin_return_address(3));
}
__builtin_return_address(0) is always current function's address. On the other hand, __builtin_return_address(1), __builtin_return_address(2), ... may not be available on all platforms.

What to do with these addresses ?

Addresses can be mapped to the binary executable or dynamic link libraries. This is always doable even if the binary executable has been stripped off the symbols.

To see the mapping during runtime, parse the following plain-text file on the /proc file system:

/proc/self/maps
A utility called pmap can do the same.

If the address belongs to a DLL, it is possible to obtain the function name since DLLs are usually not stripped.

Addresses can be mapped to function names. Even if a binary executable is compiled without -g option, it still contains function names. To see the function names in the binary executable, do

nm -C -n a.out
To see the function names programmatically in the binary executable during run-time, read later paragraphs.

Addresses can be mapped to line numbers in source files. This extra information (in DWARF format) is added to the binary executable if it is compiled with -g option. To see line numbers in source files, do

objdump -WL a.out
objdump --dwarf=decodedline a.out
or even better:
addr2line -ifC a.out 0x123456
where 0x123456 is the address of interest. To see line numbers in source files programmatically during run-time, read later paragraphs.

Approach 2: backtrace

backtrace and backtrace_symbols are functions in Glibc. To use backtrace_symbols, one must compile the program with -rdynamic option.

One does not need to compile with -g option (but -rdynamic option cannot be used together with -static option) since backtrace_symbols cannot retrieve line number information. Actually, one can even strip off the symbols, and the backtrace_symbols will still work. This is because when -rdynamic is used, all global symbols are also stored in .dynsym section in the ELF-formatted executable binary, and this section cannot be stripped away. (To see the content of .dynsym section, use readelf -s a.out command, or readelf -p .dynstr a.out command.)

backtrace_symbols obtains symbol information from .dynsym section.

(The main purpose of .dynsym section is for dynamic link libraries to expose their symbols so the runtime linker ld.so can find them.)

Here is the sample program:

#include <execinfo.h>
void do_backtrace()
{
    #define BACKTRACE_SIZ 100
    void    *array[BACKTRACE_SIZ];
    size_t  size, i;
    char    **strings;

    size = backtrace(array, BACKTRACE_SIZ);
    strings = backtrace_symbols(array, size);

    for (i = 0; i < size; ++i) {
        printf("%p : %s\n", array[i], strings[i]);
    }

    free(strings);
}

For C++ programs, to get demangled names, use abi::__cxa_demangle (include the header cxxabi.h)

Approach 3: Improved backtrace

The backtrace_symbols in Glibc uses dladdr to obtain function names, but it cannot retrieve line numbers. Jeff Muizelaar has an improved version here which can do line numbers.

If the user program is compiled without any special command-line options, then one can obtain function names (of course, provided the binary executable is not stripped.) Better yet, -rdynamic compiler option is not needed.

If the user program is compiled with -g option, one can obtain both line numbers and function names.

Note that to compile Jeff Muizelaar's backtrace_symbols implementation, make sure the following two macros are defined and appears as the first two lines of a user program (they must precede before all #include ...):

#define __USE_GNU
#define _GNU_SOURCE
and one needs Binary File Descriptor (BFD) library, which is now part of GNU binutils when linking Jeff's code to the user program.

Approach 4: libunwind

libunwind does pretty much what the original backtrace/backtrace_symbols do. Its main purpose, however, is to unwind the stack programmatically (even more powerful than setjmp/longjmp pair) through unw_step and unw_resume calls. One can also peek and modify the saved register values on stack via unw_get_reg, unw_get_freg, unw_set_reg, and unw_set_freg calls.

If one just wants to retrieve the backtrace, use the following code:

#include <libunwind.h>
void do_backtrace()
{
    unw_cursor_t    cursor;
    unw_context_t   context;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    while (unw_step(&cursor) > 0) {
        unw_word_t  offset, pc;
        char        fname[64];

        unw_get_reg(&cursor, UNW_REG_IP, &pc);

        fname[0] = '\0';
        (void) unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);

        printf ("%p : (%s+0x%x) [%p]\n", pc, fname, offset, pc);
    }
}

and linked the user program with -lunwind -lunwind-x86_64.

There is no need to compile the user program with -g option.

Wednesday, April 25, 2012

Blekko's NoSQL Database

Here is an interesting blog on the Blekko NoSQL DB. Greg Lindahl (CTO of blekko) mentions cost/value of Amazon AWS and details of the implementation. In particular, the "Combinator", which relies on commutative and associative operations.

http://highscalability.com/blog/2012/4/25/the-anatomy-of-search-technology-blekkos-nosql-database.html