Code review comment for lp:~mnordhoff/loggerhead/statictuples

Revision history for this message
Matt Nordhoff (mnordhoff) wrote :

This totally isn't ready to merge (though it does work 100%, AFAICT), but I wanted to bring it to your attention.

This branch changes Loggerhead to make use of StaticTuples, jam's recent addition to bzrlib. (If you don't know, they're very similar to tuples, but they are faster and save memory by only holding certain types of objects that can't get in reference cycles, so they can avoid the GC overhead.)

I'm bundling a copy of the Python version of StaticTuple for compatibility with older bzr versions. This has a performance cost, but hopefully it's minor, and worth it.

I don't really have information on how much of a savings this is; I haven't been running it long enough for memory usage to stabilize. But I figure it should reduce the number of tuples by 50-75%, which is 100,000-300,000 for me.

BTW, I fixed a missing import since the last time Launchpad mirrored this branch.

Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat, which has a proposal up for merging to bzr.dev.

What's left to do:

1.) It's not possible to marshal StaticTuples, so RevInfoDiskCache currently uses listcomps to convert to/from regular tuples when serializing/deserializing. This probably has a performance impact, but all I cared about at the time was getting it working.

    It's not possible to pickle them either, so we can't switch to that. Besides, pickling uses more disk space than marshal. (As a quick test, I took a random row from revinfo.sql. Without changing from tuple to StaticTuple, changing it from marshal+zlib to pickle+zlib (using pickle protocol version 2) went from 39 KB to 54 KB.)

2.) Currently I work around StaticTuple.from_sequence() only supporting sequences, not generators. I'll poke jam about that next time I see him.

3.) There weren't very many of them, but I saw some tuples like ('replace', 0, 1, 0, 1) in memory, and want to StaticTupleify them too, but don't know where they come from.

4.) I convert the results of iter_ancestry to StaticTuples. I should probably send a patch to bzr.dev, or maybe just leave them as normal tuples, to save the cost of converting them (though it's probably very minor).

4.) Maybe some things should be interned? I dunno.

5.) Some of the things I StaticTupled probably have a negligible impact on memory usage, but hey, why not?

6.) After this, the only other interesting tuples I see in Dozer are related to Subversion exceptions (bug #436406) and a few like ('sha1:...',), which I haven't investigated yet. Plus a couple ('$some_revid', [$graph_cache]).

7.) It might be possible for some of Loggerhead's many lists to be turned into StaticTuples. In particular, the graph cache is a list of lists of 3 tuples. (They actually are sometimes modified where they're built, wholehistory.compute_whole_history_data(), but it may be worth the inconvenience of turning them into StaticTuples anyway.)

« Back to merge proposal