Code review comment for lp:~jameinel/bzr/2.1-peak-mem-tweak

John A Meinel (jameinel) wrote:

This patch probably gives the greatest peak memory reduction of the patches I've proposed so far.

The basic work is just to provide "clear_cache()" functions and have them called at appropriate times.

Namely, VersionedFiles.clear_cache() and BTreeGraphIndex.clear_cache() are the actual points that matter. I then updated GroupCompressStreamSource() so that caches are cleared once we have streamed all the content out of a given VF.

(So we stream revisions, and then clear the cache, then inventories, etc.)
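As a rough illustration of that flow, here is a minimal sketch of the pattern, not bzrlib's actual classes: the substream handling is heavily simplified, and the _keys_for() helper is hypothetical.

    class GroupCompressStreamSource(object):
        """Streams each versioned file in turn, clearing its caches after."""

        def __init__(self, from_repository):
            self.from_repository = from_repository

        def get_stream(self, search):
            repo = self.from_repository
            # Stream revisions first, then inventories, then texts,
            # clearing each VF's caches once its substream is exhausted.
            substreams = [('revisions', repo.revisions),
                          ('inventories', repo.inventories),
                          ('texts', repo.texts)]
            for name, vf in substreams:
                keys = self._keys_for(name, search)  # hypothetical helper
                for record in vf.get_record_stream(keys, 'unordered', True):
                    yield name, record
                # This VF will not be read again during the fetch, so its
                # cached blocks and index pages are now pure overhead.
                vf.clear_cache()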

The basic observation is that every GCVF instance has a '50MB' LRU cache for its GroupCompressBlocks. In reality, that 50MB is often 160MB due to 'counting errors'. I haven't tracked down the details there yet, but the oversizing isn't really the problem. What specifically matters is that once we have finished reading all the inventories, we never go back and read them again, yet we keep the cached state around.
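For context, here is a minimal sketch of a byte-size-limited LRU cache in the spirit of the one described above (bzrlib's real LRUSizeCache is more involved). One plausible source of the drift, an unverified assumption given the details above are untracked, is that the cache only knows the size each value reported at insert time, so values that grow afterwards are undercounted.

    from collections import OrderedDict

    class LRUSizeCache(object):
        """LRU cache bounded by the total *reported* size of its values."""

        def __init__(self, max_size=50 * 1024 * 1024, compute_size=len):
            self._max_size = max_size
            self._compute_size = compute_size
            self._current_size = 0
            self._cache = OrderedDict()  # key -> (value, size when added)

        def __setitem__(self, key, value):
            if key in self._cache:
                _, old_size = self._cache.pop(key)
                self._current_size -= old_size
            size = self._compute_size(value)
            self._cache[key] = (value, size)
            self._current_size += size
            # Evict least-recently-used entries until back under budget.
            # The budget tracks sizes as reported at insert time; if a
            # cached value grows later, real memory use exceeds it.
            while self._current_size > self._max_size and self._cache:
                _, (_, evicted_size) = self._cache.popitem(last=False)
                self._current_size -= evicted_size

        def __getitem__(self, key):
            value, size = self._cache.pop(key)
            self._cache[key] = (value, size)  # re-insert as most recent
            return value

        def clear(self):
            self._cache.clear()
            self._current_size = 0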

As for the effect: branching 'launchpad' locally, I see no change in time (8m37s both times), but I do see:
  760MB bzr.dev@4756
  551MB clear from_repository versioned files after 'get_record_stream()'

One thing this could affect is "get_stream_for_missing_keys()", since it probably forces the indexes to re-read some pages they would otherwise still have in cache.

I *could* change the code to clear only the GroupCompressBlock cache. However, my earlier numbers for that were:
  652MB clear _group_cache
  551MB clear _group_cache + index LeafNode cache

So clearing _group_cache saves 110MB, and clearing the index cache saves another 100MB. (Give or take 10MB on each measurement, as they were taken at different times.)
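For concreteness, a hedged sketch of how the two levels might hang together; the attribute names follow the text, but the class skeleton is illustrative rather than bzrlib's actual code.

    class GroupCompressVersionedFiles(object):
        """VF layer holding the two caches whose savings are compared above."""

        def __init__(self, index):
            self._index = index     # a BTreeGraphIndex with its own clear_cache()
            self._group_cache = {}  # stands in for the 50MB block LRU

        def clear_cache(self):
            # Dropping the cached GroupCompressBlocks alone: ~110MB back.
            self._group_cache.clear()
            # Also dropping the BTree LeafNode cache: another ~100MB.
            self._index.clear_cache()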

Note that I also tried adding 'self.to_repo.X.clear_cache()' calls to the StreamSink code; however, that gave:
  549MB clear source and target

And 2MB wasn't worth it.

The numbers for branching bzr.dev are:
  345MB bzr.dev@4756
  227MB clear both
