Merge lp:~jameinel/bzr/2.1-peak-mem-tweak into lp:bzr
Proposed by
John A Meinel
Status: | Merged |
---|---|
Approved by: | Ian Clatworthy |
Approved revision: | no longer in the source branch. |
Merged at revision: | not available |
Proposed branch: | lp:~jameinel/bzr/2.1-peak-mem-tweak |
Merge into: | lp:bzr |
Diff against target: |
279 lines 9 files modified
bzrlib/btree_index.py (+13/-0) bzrlib/groupcompress.py (+5/-0) bzrlib/index.py (+20/-0) bzrlib/repofmt/groupcompress_repo.py (+5/-0) bzrlib/tests/per_versionedfile.py (+4/-0) bzrlib/tests/test_btree_index.py (+27/-0) bzrlib/tests/test_groupcompress.py (+13/-1) bzrlib/tests/test_index.py (+35/-0) bzrlib/versionedfile.py (+7/-0) |
To merge this branch: | bzr merge lp:~jameinel/bzr/2.1-peak-mem-tweak |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Ian Clatworthy | Approve | ||
Review via email: mp+13582@code.launchpad.net |
To post a comment you must log in.
This patch probably gives the greatest peak memory reduction of the patches I've proposed so far.
The basic work is to just provide "clear_cache()" functions, and then have them called at appropriate times.
Namely, VersionedFiles. clear_cache( ) and BTreeGraphIndex .clear_ cache() are the actual points that matter. I then updated the GroupCompressSt reamSource( ) so that caches are cleared once we have streamed the content out of that VF.
(So we stream revisions, and then clear the cache, then inventories, etc.)
The basic observation is that all of the GCVF instances have a '50MB' LRU cache for their GroupCompressBl ocks. In reality, that 50MB is often 160MB do to 'counting errors'. I haven't tracked down the details there yet, but it doesn't really matter that the cache is oversized. What specifically matters is that once we are done reading all the inventories, we aren't going back and reading them again, but we keep the cached state around.
As for the effect. Branching 'launchpad' locally, I don't see a change in time (8m37s both times), but I see stream( )'
760MB bzr.dev@4756
551MB clear from_repository versioned files after 'get_record_
One thing that this could effect is the "get_stream_ for_missing_ keys()" . As it probably triggers the indexes to re-read some pages it may otherwise have in cache.
I *could* change the code to only clear the GroupCompressBlock cache. However, my numbers from earlier show that as:
652MB clear _group_cache
551MB clear _group_cache + index LeafNode cache
So clearing _group_cache is 110MB saved, clearing the index cache is another 100MB saved. (Give or take 10MB on each measurement, as they were done at different times.)
Note that I also tried adding 'self.to_ repo.X. clear_cache( )' to the StreamSink code, however that was
549MB clear source and target
And 2MB wasn't worth it.
The numbers for branching bzr.dev
345MB bzr.dev@4756
227MB clear both