Code review comment for lp:~jameinel/bzr/2.1b1-pack-on-the-fly

John A Meinel (jameinel) wrote :

This adds 'pack-on-the-fly' support for gc streaming.

1) It restores 'groupcompress' sorting for the requested inventories and texts.
2) It uses a heuristic that is approximately:
  if a given block is less than 75% of the size of a 'fully utilized' block, don't re-use its
  content directly; instead, schedule it to be packed into a new block.
  The specifics are in '_LazyGroupContentManager.check_is_well_utilized()'.
3) I did some real-world testing, and the results seem pretty good.
   To start with, the copy of bzr.dev on Launchpad is currently very poorly packed, taking up >90MB of disk space for a single pack file. After branching it with bzr.dev, I get a 101MB repository locally. If I then run 'bzr pack', I end up with 39MB (30MB in the .pack file and 8.8MB in indices):

101MB poorly-packed-from-lp
101MB post 'bzr.dev branch new-repo' (takes 1m0s locally)
 39MB post 'bzr pack' (takes 2m0s locally)

I then tested the results of using pack-on-the-fly:
 41MB post 'bzr-pack branch new-repo' (takes 1m43s locally)
 41MB post 'bzr-pack branch new-repo new-repo2' (takes 1m0s)

This means that pack-on-the-fly is working as we hoped it would. It:
 a) Gives almost as good a result as an explicit 'bzr pack' (41MB vs 39MB)
 b) Takes some extra time when the source is poorly packed (1m0s => 1m43s)
 c) Takes no extra time when the source is already properly packed (1m0s => 1m0s)
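
To put rough numbers on those three points, here is a small sketch that only redoes the arithmetic on the measurements quoted above. The variable names are made up for illustration and nothing here comes from bzr itself.

```python
# Sizes in MB and times in seconds, copied from the measurements above.
unpacked_mb, packed_mb, on_the_fly_mb = 101, 39, 41
plain_branch_s, on_the_fly_branch_s = 60, 103  # 1m0s and 1m43s

# How much bigger is the pack-on-the-fly result than a full 'bzr pack'?
size_overhead = (on_the_fly_mb - packed_mb) / packed_mb
# How much smaller is it than the poorly packed copy?
saved_vs_unpacked = (unpacked_mb - on_the_fly_mb) / unpacked_mb
# How much longer does branching take when the source is poorly packed?
time_overhead = (on_the_fly_branch_s - plain_branch_s) / plain_branch_s

print(f"{size_overhead:.1%} larger than a full 'bzr pack'")
print(f"{saved_vs_unpacked:.1%} smaller than the poorly packed copy")
print(f"{time_overhead:.1%} slower when the source is poorly packed")
```

These are single runs on one machine, so the percentages are only indicative.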

4) Unfortunately, this was built on top of bzr.dev rather than 2.0. We can land it there and then cherry-pick it back to 2.0; I'll still submit a separate merge request for 2.0.
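
For readers who want the gist of the heuristic in point 2 without opening the diff, here is a minimal sketch of the idea. It is not the code in '_LazyGroupContentManager.check_is_well_utilized()'. The 'Block' class, the 'plan_stream' helper and the 4MB 'full' block size are assumptions made up for illustration; only the 75% threshold comes from the description above, and the real check is only approximately this.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed values for illustration only; the real limits live in bzrlib.
FULL_BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical size of a 'fully utilized' block
MIN_UTILIZATION = 0.75             # the 75% threshold described in point 2


@dataclass
class Block:
    """Hypothetical stand-in for a compressed group block in the source repo."""
    name: str
    size: int  # bytes of content held by the block


def is_well_utilized(block: Block,
                     full_size: int = FULL_BLOCK_SIZE,
                     min_utilization: float = MIN_UTILIZATION) -> bool:
    """Return True if the block is full enough to be streamed as-is."""
    return block.size >= full_size * min_utilization


def plan_stream(blocks: Iterable[Block]) -> Tuple[List[Block], List[Block]]:
    """Split blocks into (re-use directly, repack into new blocks)."""
    reuse: List[Block] = []
    repack: List[Block] = []
    for block in blocks:
        if is_well_utilized(block):
            reuse.append(block)
        else:
            repack.append(block)
    return reuse, repack


blocks = [
    Block("a", 4 * 1024 * 1024),  # full block
    Block("b", 3 * 1024 * 1024),  # exactly 75% of a full block
    Block("c", 512 * 1024),       # mostly empty block
]
reuse, repack = plan_stream(blocks)
print("re-use:", [b.name for b in reuse])
print("repack:", [b.name for b in repack])
```

In this sketch, a well-packed source puts every block in the 're-use' list, which matches the observation that branching from an already packed repository takes no extra time.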
