Merge lp:~jameinel/bzr/2.3-gcb-peak-mem into lp:bzr
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the revision history of the source branch.
Merged at revision: 5451
Proposed branch: lp:~jameinel/bzr/2.3-gcb-peak-mem
Merge into: lp:bzr
Diff against target: 289 lines (+99/-42), 4 files modified:
  NEWS (+3/-0)
  bzrlib/groupcompress.py (+69/-39)
  bzrlib/tests/per_lock/test_lock.py (+1/-1)
  bzrlib/tests/test_groupcompress.py (+26/-2)
To merge this branch: bzr merge lp:~jameinel/bzr/2.3-gcb-peak-mem
Related bugs: (none)
Reviewer: Andrew Bennetts (Approve)
Review via email: mp+36200@code.launchpad.net
Commit message
Add GroupCompressBlock
Description of the change
This is just a small push towards one of my pet peeves. Specifically, "bzr commit" has a peak memory of 1 fulltext + 2 copies of the compressed text. This is happening at several layers in the stack, and this branch only fixes one of them. But it is a push in the right direction.
Basically, we have a pattern of:
header = xyz
content = (from some_source)
return ''.join([header, content])
That occurs in a few places. This changes GroupCompressBlock to just return:
return [header, content, lines]
So that higher up the stack we can then do:
outer_header = abc
chunks = [outer_header]
chunks.extend(content_chunks)
Which creates a list, rather than copying all of the content.
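The memory difference between the two patterns can be sketched as follows (function and variable names here are invented for illustration, not from the branch itself):

```python
def serialise_joined(header, content):
    # Old pattern: ''.join() allocates a brand-new string holding a full
    # copy of `content`, so peak memory is roughly header + 2x content.
    return ''.join([header, content])


def serialise_chunks(header, content_chunks):
    # New pattern: build a list that merely references the existing
    # chunk strings. Only a small list object is allocated; no byte
    # data is copied until (and unless) a caller joins the chunks.
    return [header] + content_chunks


chunks = serialise_chunks('hdr:', ['aaa', 'bbb'])
# The chunked form carries the same bytes without an intermediate copy:
assert ''.join(chunks) == serialise_joined('hdr:', 'aaabbb')
```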
To actually get this copying removed we need to:
1) Change _DirectPackAccess to have an api that can be passed the chunks of the content
2) Change Serializer.
3) Change how we write, to maybe issue multiple write requests. At the moment I'm thinking to use a size check: if len(content) > 1MB, we issue multiple writes; otherwise we just ''.join([header, content]).
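A minimal sketch of that size-check idea (the helper name, the write callback, and the exact threshold are assumptions, not code from the branch):

```python
_MAX_JOIN_SIZE = 1024 * 1024  # 1MB threshold, as suggested above


def write_record(write, header, content):
    """Write header + content via the `write` callback.

    Small records are joined so we issue a single write call; large
    records are written in separate calls, which avoids allocating a
    second full copy of `content` just to prepend the header.
    """
    if len(content) > _MAX_JOIN_SIZE:
        write(header)
        write(content)
    else:
        write(''.join([header, content]))


out = []
write_record(out.append, 'hdr:', 'small payload')
# Small content arrives as one joined write: ['hdr:small payload']
```

The trade-off is between system-call (or network round-trip) count for small records and peak memory for large ones, which is why a threshold rather than always-chunked writing is proposed.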
Anyway, I'm pretty sure this is a step in the right direction, so I'd like to get it merged.
56 + @property
57 + def _z_content(self):
I guess the plan is to eventually remove this property? It would be good to have an explicit comment saying so.
Otherwise this looks good to me. I agree it makes sense to move to a chunks-based API, rather than one based on monolithic strs.
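The compatibility shim under discussion could look roughly like this (a simplified sketch, not the actual bzrlib code; the real GroupCompressBlock in bzrlib/groupcompress.py carries much more state):

```python
class GroupCompressBlock(object):
    """Simplified sketch of a chunks-based block with a legacy property."""

    def __init__(self):
        # The new canonical representation: a list of compressed chunks.
        self._z_content_chunks = None

    @property
    def _z_content(self):
        # TODO: remove this property once all callers use
        # _z_content_chunks directly; it exists only so legacy code that
        # expects a single str keeps working, at the cost of a full join.
        if self._z_content_chunks is None:
            return None
        return ''.join(self._z_content_chunks)


block = GroupCompressBlock()
block._z_content_chunks = ['abc', 'def']
# Legacy callers still see one monolithic string: 'abcdef'
```

Keeping the old name as a read-only property lets the chunks migration land incrementally, with the explicit removal comment the reviewer asks for marking it as transitional.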