Thursday, June 18, 2015

Using C++11 in Travis CI

There are many strange workarounds for getting a C++11 compiler on Travis CI, but there is now a better way:

    sudo: false
    language: cpp
    compiler:
      - gcc
      - clang
    install:
    - if [ "$CXX" = "g++" ]; then export CXX="g++-4.8" CC="gcc-4.8"; fi
    addons:
      apt:
        sources:
        - ubuntu-toolchain-r-test
        packages:
        - gcc-4.8
        - g++-4.8
        - clang

The explicit sudo: false will let it build in Docker (for speed) even if you have a pre-docker repo, according to Travis support.

Thanks to solarce at Travis support for noticing my error and fixing the docs.




Friday, May 8, 2015

Deceptively simple interview questions

Someone posted this blog to reddit:



Then, someone else posted this correction:


Embarrassingly, the original poster had stated:

"I never said that you'll be hired if you know how to answer these problems, but I won't consider you if you can't."
Therefore, he would not hire himself!

The moral is not to assume that you know the answer to your own interview question. Be humble. Let the interview be a discussion. Maybe you will learn something.

Wednesday, April 29, 2015

git-fat for large files

GitHub recently added support for large files. If you want to share repos globally, that's fine. But for work within a corporate network, I like git-fat. It has few dependencies -- just plain Python and rsync -- and caches files in /tmp. It's much simpler than alternatives like git-annex or git-lfs, which are better when you need the option to store files in S3, etc.

However, there is one problem with all these: They still deal with expensive checksums for many operations. This is partly to keep things simple -- letting git operate on files directly. But even just copying large files is slow, let alone checksumming. It's much faster to store URLs and to let the plugin update symlinks (or hardlinks) and handle caching. If you want a checksum, that can be encoded into the URL. Another advantage is that you can store whole directories, rather than individual files.

That is a plan I am working on with a friend. All large files would be read-only, unless explicitly "opened".

For now, git-fat works pretty well.

Sunday, February 15, 2015

Google C++ Style Guide

  • https://www.youtube.com/watch?v=NOCElcMcFik (Titus Winters at CppCon)
I am a big fan of this, mainly because I like to see side-effects clearly identified. There is an amazing amount of resistance. I don't think that the opposition can be moved, but maybe this reddit comment will at least convince managers of its value:

ericanderton 20 points  
My $0.02:
I've done a lot of work using this style guide religiously, largely because it was what my team came together over. It was also the most cogent guide we could find online that was more than merely prescriptive. It's exactness was the overall deciding factor.
I hated using it at first. Overall, the guide is very regressive, and chops the legs off of C++ such that it's not much more than "C with namespaces and classes." You could even go as far as to say that most arguments in the guide reduce to "this isn't a problem in Java or Python because you can't make this mistake, so don't use feature X here at all." Ultimately, it keeps you from doing anything that would allow bugs to creep in by mistake or by misinterpretation, by keeping it all nice and simple. And this is where this style guide actually helps.
The problem is that C++ provides almost too much leverage. Left unchecked, developers will likely use the language to its limits, which inevitably will confound other members of the team. While the program may be a masterpiece of template code, move semantics, and other concepts, it's now unmaintainable by anyone but the original author. Business wise, it's vastly preferable to have an uninspired piece of software if it means you can fix bugs while half your staff has the flu.
Also, consider that the goals of the Google C++ style guide align incredibly well with Go. To me, this is a very salient case against using C++ if that guide is at all a good fit for new development. Go has a very tight language spec, and a (IMO) superior concurrency model that is easier to construct, reason about, and debug. And it's still a compiled language.
Anyway, I'm proud to say that I have delivered excellent results, and relatively bug free code using this guide. If left to my own devices I probably would have used too much template magic and other mechanics that, while are all valid C++, would be harder to debug and understand by other members of my team. The resulting codebase is boring as all hell to read, but stable, reliable, and works incredibly well.
The downside is that compared to conventional languages, the result still takes a long time to compile, is twice as much code as is needed (header files), and relies heavily on smart pointers to manage memory (may as well use a GC). Again, this is why I mentioned using Go earlier.
One last thing: this guide does not stress the importance of "const correctness" in class construction. Add that to your work and you'll really have some solid code to rely upon.
tl;dr: For new development, either forgo this guide completely, or just use Go. Otherwise, you'll just piss off experienced C++ developers by using this thing.
Edit: I forgot to mention that the GSG has a massive blind spot for exception safety. Just because your code doesn't use exceptions, doesn't mean that the libraries you use don't throw. This includes the STL; the guide should steer you away from throwing variants of STL functions/methods, but it doesn't. So be on the lookout for coders not throwing try/catch around third party code, and refusing to use basic RAII (also not mentioned in the guide) to make things airtight. Either that, or just except that every exception-based runtime fault should halt the program, and that it's an acceptable failure mode (probably not).
For a longer discussion, see:



Wednesday, January 7, 2015

C++: STL Iterator confusion

Recently, folks at my last company got confused by STL iterators.
  1. They used boost::python to create an InputIterator, using stl_iterator.
  2. They used is_sorted() to check whether the underlying list was sorted (of course).
  3. They then looped over their iterator.
That was a mistake. An InputIterator is single-pass, so someone obviously forgot to read the documentation.

But this highlights a problem with STL. is_sorted() actually requires a ForwardIterator. At that link, you can see that a ForwardIterator might be "mutable", meaning that it can be written into. But is_sorted() takes a copy of an iterator, and it certainly does not write into it. There can be no issue of side-effects, right?

Wrong. You can pass an InputIterator if you want, at your own risk. is_sorted() might still return the correct result, but the original iterator (which was copied for the call to is_sorted()) might now be invalid.
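The effect is easy to reproduce without boost::python. Here is a minimal sketch using std::istream_iterator, a standard single-pass InputIterator. The names count_elements and second_pass_count are made up for illustration. The first pass runs through a copy of the iterator, just as is_sorted() would, and the second pass through the original then sees fewer elements than the stream held. Exactly what the original iterator still yields is implementation-dependent, so the sketch only checks that the full sequence is no longer available.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Walks [first, last) once. The iterators are taken by value, so the caller's
// iterator object is untouched, but the stream underneath is consumed.
template <class InputIterator>
std::size_t count_elements(InputIterator first, InputIterator last) {
  std::size_t n = 0;
  for (; first != last; ++first) ++n;
  return n;
}

// Hypothetical helper: how many elements can a second pass over the
// caller's original iterator still see?
inline std::size_t second_pass_count(const std::string& text) {
  std::istringstream in(text);
  std::istream_iterator<int> first(in), last;
  count_elements(first, last);             // first pass, through a copy
  std::vector<int> leftover(first, last);  // second pass, through the original
  return leftover.size();
}
```

With the input "1 2 3", the first pass counts all three values, but the second pass cannot see all three again.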

So the question became whether it was safe to pass an InputIterator to is_sorted(). In general, the answer is "No". Look at representative source code:

template <class ForwardIterator>
  bool is_sorted (ForwardIterator first, ForwardIterator last)
{
  if (first==last) return true;
  ForwardIterator next = first;
  while (++next!=last) {
    if (*next<*first)     // or, if (comp(*next,*first)) for version (2)
      return false;
    ++first;
  }
  return true;
}

That function requires a copy of the original "first" iterator. "first" and "next" must both be valid during the entire function, and that is not guaranteed for an InputIterator.

However, the boost::python InputIterator is actually a memoized iterator. Incrementing it advances the underlying Python iterator that all copies share, so incrementing any copy afterwards will not yield the element that follows the copy's current one. But each copy keeps its current value and remains valid to dereference. So is_sorted() is fine, as long as no further iteration is performed later.

The problem is that STL is missing a concept between ForwardIterator and InputIterator. So is_sorted() is given the more restrictive designation.

Another way to look at this is that C++ templates are insufficient for practical iterators. The following is closer to the function we really want:

template <class ForwardIterator>
  bool is_sorted (ForwardIterator first, ForwardIterator last)
{
  if (first==last) return true;
  ForwardIterator next = first;
  auto prev_value = *first;      // copy the element, not the iterator position
  while (++next!=last) {
    auto next_value = *next;     // each element is read exactly once
    if (next_value<prev_value)
      return false;
    prev_value = next_value;
  }
  return true;
}

But that has problems too. Suddenly, the underlying type must be copy-constructible (and copy-assignable), and those copies could be expensive. We could use a pointer, but a pointer to what? If we don't make a copy, then there is nothing to point to.

We need an extra version of this algorithm which can safely accept InputIterators, with the implication that there will be copies of the underlying data. But it would need a different function name, and it would confuse everyone.
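As a rough sketch of what such a variant could look like: the name is_sorted_single_pass is hypothetical and nothing like it exists in the STL. It advances a single iterator and keeps a copy of the previous element, so it only relies on InputIterator guarantees, at the cost of copying (or moving) each element. The stream_is_sorted helper is also made up, purely to exercise it on a real single-pass iterator.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical single-pass variant; not part of the STL.
// Only one iterator is ever advanced, and no stale copy is dereferenced.
template <class InputIterator>
bool is_sorted_single_pass(InputIterator first, InputIterator last) {
  typedef typename std::iterator_traits<InputIterator>::value_type value_type;
  if (first == last) return true;
  value_type prev = *first;          // copy of the element, not of the position
  while (++first != last) {
    value_type cur = *first;
    if (cur < prev) return false;
    prev = std::move(cur);           // move instead of a second copy
  }
  return true;
}

// Illustration only: checks whitespace-separated ints read from a string stream.
inline bool stream_is_sorted(const std::string& text) {
  std::istringstream in(text);
  return is_sorted_single_pass(std::istream_iterator<int>(in),
                               std::istream_iterator<int>());
}
```

The function consumes the range it is given, so the caller still cannot iterate a second time. It only makes the single pass itself well defined.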

What we really need is the concept of "Iterator with invariant current element (unless underlying container is modified) but possibly variant subsequent Iterator". Lots of iterators actually work that way. But it would probably confuse people even more.