Wednesday, January 7, 2015

C++: STL Iterator confusion

Recently, folks at my last company got confused with STL iterators.
  1. They used boost::python to create an InputIterator, using stl_iterator.
  2. They used is_sorted() to check whether the underlying list was sorted (of course).
  3. They then looped over their iterator.
That was a mistake. An InputIterator is single-pass, so someone obviously forgot to read the documentation.

But this highlights a problem with STL. is_sorted() actually requires a ForwardIterator. At that link, you can see that a ForwardIterator might be "mutable", meaning that it can be written into. But is_sorted() takes a copy of an iterator, and it certainly does not write into it. There can be no issue of side-effects, right?

Wrong. You can pass an InputIterator if you want, at your own risk. is_sorted() might still return the correct result, but the original iterator (which was copied for the call to is_sorted()) might now be invalid.

So the question became whether it was safe to pass an Inputerator to is_sorted(). In general, the answer is "No". Look at representative source code:

template <class ForwardIterator>
  bool is_sorted (ForwardIterator first, ForwardIterator last)
{
  if (first==last) return true;
  ForwardIterator next = first;
  while (++next!=last) {
    if (*next<*first)     // or, if (comp(*next,*first)) for version (2)
      return false;
    ++first;
  }
  return true;
}

That function requires a copy of the original "first" iterator. "first" and "next" must both be valid during the entire function, and that is not guaranteed for an InputIterator.

However, the boost::python InputIterator is actually a memoized iterator. If you increment it, then the next value for any copy will no longer be the value after what the copy points to. But the copy remains valid for reference. So is_sorted() is fine, as long as no further iteration is performed later.

The problem is that STL is missing a concept between ForwardIterator and InputIterator. So is_sorted() is given the more restrictive designation.

Another way to look at this is that C++ templates are insufficient for practical iterators. The following is closer to the function we really want:

template <class ForwardIterator>
  bool is_sorted (ForwardIterator first, ForwardIterator last)
{
  if (first==last) return true;
  ForwardIterator next = first;
  auto prev_value = *first;
  while (++next!=last) {
    auto next_value = *next;
    if (next_value<prev_value);
      return false;
    prev_value = next_value;;
  }
  return true;
}

But that has problems too. Suddenly, the underlying type must be copy-constructible, and those copies could be expensive. We could use a pointer, but a pointer to what? If we don't make a copy, then there is nothing to point to.

We need an extra version of this algorithm which can safely accept InputIterators, with the implication that there will be copies of the underlying data. But it would need a different function name, and it would confuse everyone.

What we really need is the concept of "Iterator with invariant current element (unless underlying container is modified) but possibly variant subsequent Iterator". Lots of iterators actually work that way. But it would probably confuse people even more.