comex

klib

10 Apr 2014

klib is a medium-size example of the modern C generic library, a species of program I want very much to like. Among other features, it provides simple implementations of dynamic arrays, hash tables, B-trees, sorting, etc., which generate efficient code specialized for arbitrary types - the kind of thing that C++ templates make easy, but in C.
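For a taste of what that looks like in use, here's a minimal sketch using klib's kvec.h dynamic array (written from memory, so treat the details as approximate):

#include "kvec.h"

void example(void) {
    kvec_t(int) v;           /* a growable array specialized for int */
    kv_init(v);
    kv_push(int, v, 10);     /* amortized O(1) append, no void* or callbacks involved */
    kv_push(int, v, 20);
    int first = kv_A(v, 0);  /* direct indexed access */
    (void)first;
    kv_destroy(v);
}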

I want to like it because I don't like C++ and would like to be able to use high-level functionality in C where it's useful, without the pointless performance loss of void*/callback-based solutions. The affinity is conditional, though, because I admit these things are very ugly [1]. Even mostly well-written C macros like klib's end up looking like this:

#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map,    \
                     __hash_func, __hash_equal)                   \
    SCOPE kh_##name##_t *kh_init_##name(void) {                   \
        return (kh_##name##_t*)kcalloc(1, sizeof(kh_##name##_t)); \
    }                                                             \

Some (but not all) macros in klib will evaluate their arguments multiple times, causing hard-to-spot bugs if they have side effects. This is partially klib's fault (kvec.h could improve), but mostly C's for making it so easy to screw up.
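The hazard itself is easy to demonstrate with a toy macro (mine, not klib's):

#define SQUARE(x) ((x) * (x))

void demo(void) {
    int i = 2;
    int sq = SQUARE(i++);  /* expands to ((i++) * (i++)): i is modified twice with no
                              intervening sequence point, which is undefined behavior */
    (void)sq;
}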

I wanted to point out a more subtle issue I recently noticed. Some code I wrote which had been working fine on my OS X system (compiled using Clang) immediately started segfaulting when I compiled it on a Linux VM (using GCC). It was crashing on this line:

kh_value(map, kh_put(ccp, map, key, &_)) = val;

kh is short for khash, a hash map type. This is supposed to set "map[key]" to val: kh_put inserts key into the map (the &_ is an output parameter for a status code I don't care about here) and returns an iterator for the bucket the key landed in, while kh_value takes the map and an iterator and gives back the corresponding value slot as an lvalue.

This being a hash table, an iterator is just defined as the index of a bucket, and in this implementation keys and values are stored separately. Altogether, so far, so good. The code I had to write is somewhat verbose just to put something in a map, but I can deal with verbosity; C code shouldn't need hash tables that often, and it's loads better than the common C practice of implementing a hash table manually, badly, for each use case. The nested function call - kh_put inside kh_value - looks fishy when macros are involved, but without multiple evaluation what could go wrong? Except it was crashing.
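For concreteness, the surrounding code looked roughly like this (a sketch: the declaration and value type are my stand-ins, not the real program):

#include "khash.h"

/* Generates a specialized map type with string keys: kh_ccp_t, kh_put_ccp, and
   friends all come out of this macro. The value type here is just a guess. */
KHASH_MAP_INIT_STR(ccp, const char *)

void set(kh_ccp_t *map, const char *key, const char *val) {
    int _;  /* kh_put's status out-parameter, which I don't care about here */
    kh_value(map, kh_put(ccp, map, key, &_)) = val;  /* the line that crashed */
}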

Well, if you expand the macro invocations, it looks like:

map->vals[kh_put_ccp(map, key, &_)]

The problem is that multiple evaluation isn't the only hazard of turning what looks like a function call into an expression with different semantics; ordering is another. If kh_value were a function call, the arguments would obviously be evaluated before the start of the function body, but in this case, there is no particular reason the kh_put_ccp call should be evaluated before map->vals, part of the "body" of kh_value!

In fact, this is undefined behavior in C. There is no sequence point between the two expressions, and the order of side-effecting expressions is only defined between sequence points. Sequence points include semicolons, &&, and the other operators you'd usually see in sane side-effecting code, so this rarely comes up in practice. Usually when I think about sequence points I think about snippets like i = i++, which is clearly insane, but here I had accidentally written ambiguous code.

If the hash table gets too close to capacity, kh_put_ccp will resize it, which involves reallocating the keys and vals arrays; if map->vals was evaluated earlier, the code will write to a freed pointer. The version of Clang I was using on OS X, at the optimization level I was using, happened to emit assembly code that evaluated map->vals after kh_put_ccp, so I never noticed the problem. But GCC decided to evaluate it before, making my program promptly crash.
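Spelled out as straight-line code, the two orders look something like this (a sketch, reusing the same variables as above; the element type of vals is my guess):

/* The order GCC picked: read map->vals first. */
{
    const char **vals = map->vals;           /* stale once the table resizes   */
    khiter_t it = kh_put_ccp(map, key, &_);  /* may realloc and move ->vals    */
    vals[it] = val;                          /* writes through the old pointer */
}

/* The order Clang happened to pick: call kh_put_ccp first. */
{
    khiter_t it = kh_put_ccp(map, key, &_);
    map->vals[it] = val;                     /* reads the current pointer */
}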

Incidentally, by undefined I meant undefined. In C, if any undefined behavior is invoked, the compiled program is allowed to do anything at all, including crash or make demons fly out of your nose. This is obviously true for wild writes that could mess up control flow, and it is also exploited by optimizers to generate code that completely disregards the possibility of things like array overflow happening - speeding up correct code, but potentially causing incorrect code to think 2 < 2 (a great bug). In a related situation, the evaluation of arguments to a function call, the C standard says the order is merely "unspecified", meaning the compiler can pick any order but has to emit code consistent with one of them. Here, though, the standard says the "prior value" of an object may be read only "to determine the value to be stored": i = i + 1 is OK even though there is no sequence point between the read and write of i, because the read determines the value to be stored, but no such excuse applies to map->vals.

Therefore, reasoning about which expression the compiler decided to "evaluate first" isn't necessarily valid - if the function were inlined, the optimizer could have, say, decided that because of the read, any conditions leading to the write are impossible, and proceeded to remove the check for the hash table being full, perhaps causing a misleading infinite loop trying to find a free bucket that was supposed to be guaranteed to exist. No such thing actually happened; rather, I got a nice crash on the exact line with the bug. But I thought it was worth pointing out: optimizers are dangerous.

Anyway, the solution was just assigning the result of kh_put to a variable:

khiter_t it = kh_put(ccp, map, key, &_);
kh_value(map, it) = val;

Simple enough fix: the semicolon between the two statements is a sequence point, so kh_put runs to completion before map->vals is read. Not a particularly hard bug, but an unusual one.

Oh, and C++ would have been completely immune, since there is no reason to use a syntactic macro rather than a semantic template specialization. But don't get me started on how weird C++ can get... maybe I should do another post about that.

[1] Naturally, some C++ STL implementations seem to have a goal of being as ugly as possible, backwards indents included, but C++ templates are certainly supposed to be prettier.