comex

klib

10 Apr 2014

klib is a medium-size example of the modern C generic library, a species of program I want very much to like. Among other features, it provides simple implementations of dynamic arrays, hash tables, B-trees, sorting, etc. which generate efficient code specialized for arbitrary types - the kind of thing that C++ templates make easy, but in C.

I want to like it because I don't like C++ and would like to be able to use high-level functionality in C where it's useful, without the pointless performance loss of void*/callback-based solutions. The affinity is conditional, though, because I admit these things are very ugly [1]. Even mostly well-written C macros like klib's end up looking like this:

#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map,    \
                     __hash_func, __hash_equal)                   \
    SCOPE kh_##name##_t *kh_init_##name(void) {                   \
        return (kh_##name##_t*)kcalloc(1, sizeof(kh_##name##_t)); \
    }                                                             \

Some (but not all) macros in klib will evaluate their arguments multiple times, causing hard-to-spot bugs if they have side effects. This is partially klib's fault (kvec.h could improve), but mostly C's for making it so easy to screw up.
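
As a toy illustration (my own example, not klib code), here is the classic shape of a multiple-evaluation bug:

#define SQUARE(x) ((x) * (x))

int n = 3;
int sq = SQUARE(n++); /* expands to ((n++) * (n++)): n is modified
                         twice with no sequence point in between,
                         which is undefined behavior */

Written as a function, square(n++) would be perfectly fine; as a macro, the argument's side effect runs once for every appearance of x in the body.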

I wanted to point out a more subtle issue I recently noticed. Some code I wrote which had been working fine on my OS X system (compiled using Clang) immediately started segfaulting when I compiled it on a Linux VM (using GCC). It was crashing on this line:

kh_value(map, kh_put(ccp, map, key, &_)) = val;

kh is short for khash, a hash map type. This is supposed to set "map[key]" to val:

  • kh_put returns an iterator for the given key, inserting if necessary; ccp is an arbitrary symbol representing the specialization (i.e. key/value types), and _ indicates whether the key was already present, which I don't care about. It's defined as

    #define kh_put(name, h, k, r) kh_put_##name(h, k, r)
    

    The ## operator concatenates tokens, so this expands to a call to kh_put_ccp, which is a static function. At the top of the file, I had declared the specialization using

    KHASH_MAP_INIT_INT64(ccp, const char *)
    

    which expanded to a bunch of declarations, including functions for all the code-heavy operations on the map (a rough sketch of their shape appears after this list). The macro could have put the whole operation in the definition of kh_put, but using functions makes it easier to avoid multiple evaluation, and lets multiple kh_put call sites share one copy of the code instead of each expanding it inline.

  • kh_value just turns the iterator into a 'reference' to the value, like the * operator in C++. It's defined directly in the macro, since it's simple enough to not need a function, and has no multiple evaluation hazard:

    #define kh_value(h, x) ((h)->vals[x])
    

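For a sense of scale, here is roughly the shape of what that KHASH_MAP_INIT_INT64 line declares. This is paraphrased from khash.h from memory, with most details and several functions elided, so treat it as a sketch rather than the exact code:

typedef struct {
    khint_t n_buckets, size, n_occupied, upper_bound;
    khint32_t *flags;   /* per-bucket occupied/deleted bits */
    khint64_t *keys;    /* keys and values live in separate arrays */
    const char **vals;
} kh_ccp_t;

static kh_ccp_t *kh_init_ccp(void);
static khint_t kh_get_ccp(const kh_ccp_t *h, khint64_t key);
static khint_t kh_put_ccp(kh_ccp_t *h, khint64_t key, int *ret);
static void kh_del_ccp(kh_ccp_t *h, khint_t x);
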
This being a hash table, an iterator is just defined as the index of a bucket, and in this implementation keys and values are stored separately. Altogether, so far, so good. The code I had to write is somewhat verbose just to put something in a map, but I can deal with verbosity; C code shouldn't need hash tables that often, and it's loads better than the common C practice of implementing a hash table manually, badly, for each use case. The nested function call - kh_put inside kh_value - looks fishy when macros are involved, but without multiple evaluation what could go wrong? Except it was crashing.

Well, if you expand the macro invocations, it looks like:

map->vals[kh_put_ccp(map, key, &_)] = val;

The problem is that multiple evaluation isn't the only hazard of turning what looks like a function call into an expression with different semantics; ordering is another. If kh_value were a function call, the arguments would obviously be evaluated before the start of the function body, but in this case, there is no particular reason the kh_put_ccp call should be evaluated before map->vals, part of the "body" of kh_value!

In fact, this is undefined behavior in C. There is no sequence point between the two expressions, and C only defines the order of side-effecting expressions relative to sequence points. Sequence points include semicolons, &&, and the other operators you'd usually see in sane side-effecting code, so this rarely comes up in practice. Usually when I think about sequence points I think of snippets like i = i++, which is clearly insane, but here I had accidentally written ambiguous code.

If the hash table gets too close to capacity, kh_put_ccp will resize it, which involves reallocating the keys and vals arrays; if map->vals is evaluated first, the code writes through a freed pointer. The version of Clang I was using on OS X, at the optimization level I was using, happened to emit assembly code that evaluated map->vals after kh_put_ccp, so I never noticed the problem. But GCC decided to evaluate it before, making my program promptly crash.
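
Hand-expanded for illustration, the crashing order behaves like these separate steps:

const char **vals = map->vals;           /* stale copy of the array */
khiter_t it = kh_put_ccp(map, key, &_);  /* may realloc map->vals   */
vals[it] = val;                          /* write into freed memory */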

Incidentally, by undefined I meant undefined. In C, if any undefined behavior is invoked, the compiled program is allowed to do anything at all, including crash or make demons fly out of your nose. This is obviously true for wild writes that could mess up control flow, and it is also exploited by optimizers to generate code that completely disregards the possibility of things like array overflow happening - speeding up correct code, but potentially causing incorrect code to think 2 < 2 (a great bug).

In a related situation, evaluation of arguments to a function call, the C standard says the order is "unspecified", meaning that the compiler can pick any order but has to emit code consistent with one. But in this case, the standard says the prior value of an object may be read only "to determine the value to be stored": i = i + 1 is OK even though there is no sequence point between the read and write of i, because the read determines the value to be stored, but no such exemption applies to map->vals.

Therefore, reasoning about which expression the compiler decided to "evaluate first" isn't necessarily valid - if the function were inlined, the optimizer could have, say, decided that since map->vals was read, any conditions leading to it being freed and reallocated must be impossible, and proceeded to remove the check for the hash table being full, perhaps causing a misleading infinite loop trying to find a free bucket that was supposed to be guaranteed to exist. No such thing actually happened; rather, I got a nice crash on the exact line with the bug. But I thought it was worth pointing out: optimizers are dangerous.

Anyway, the solution was just assigning the result of kh_put to a variable:

khiter_t it = kh_put(ccp, map, key, &_);
kh_value(map, it) = val;

Simple enough fix: the semicolon after the first statement is a sequence point, so map->vals is now read only after any resize has already happened. Not a particularly hard bug, but an unusual one.

Oh, and C++ would have been completely immune, since there is no reason to use a syntactic macro rather than a semantic template specialization. But don't get me started on how weird C++ can get... maybe I should do another post about that.

[1] Naturally, some C++ STL implementations seem to have a goal of being as ugly as possible, backwards indents included, but C++ templates are certainly supposed to be prettier.

Is this thing on?

07 Apr 2014

So this is my new blog. I've had blogs before: one full of teenage angst I still have archived somewhere, one with five useless code snippets or images over as many years, and one repository on GitHub called "bloggy" with another code snippet. But this time I want to be a teeny bit more serious: partly to practice writing, and partly because there are some things I find interesting and would like to talk about in actual blog-length pieces on technical subjects.

However, I also intend to post (arbitrarily) short pieces of text which I do not feel like fully developing, but which are longer than a tweet, or don't fit neatly into a Hacker News or Reddit discussion. For tonight, I'll start with one of those.


From a Hacker News comment about a Richard Bird textbook (https://news.ycombinator.com/item?id=7537997):

    It's not exactly for the faint of heart, to be clear. Bird's idea is that we can establish certain mathematical rules which allow us to transform "the simplest thing which could possibly be correct" to an efficient program through a series of steps, each one maintaining correctness as an invariant.

An interesting thing about this wording is that it still makes sense if you flip it around:

    We can establish certain principles which allow us to transform "the simplest thing which could possibly do the job" to a complete program through a series of steps, each one maintaining performance as an invariant.

Of course, that completely misses the point of the original sentence, which is to take a mathematical approach to evolving programs. But although I hadn't thought about it quite as such, this sounds like a reasonable approach to developing programs where performance is a priority. Start with the core algorithms, make them work, and make them fast, because optimization often involves heavy refactoring that is hard to do in large programs. Then add the rest of the functionality, with the ability to measure how much performance impact each step had rather than only getting numbers at the end.

I'd say it resembles the way I've done it, but most of the time I seem to end up with the core done and too bored to do any of the rest, so my experience should be taken with quite a bit of salt... still.

Incidentally, I've bought an e-book of the textbook in question and intend to read it soon.