Merge lp:~parthm/bzr/538868-message-for-heavy-checkout into lp:bzr

Proposed by Parth Malwankar
Status: Superseded
Proposed branch: lp:~parthm/bzr/538868-message-for-heavy-checkout
Merge into: lp:bzr
Diff against target: 336 lines (+139/-23)
8 files modified
NEWS (+3/-3)
bzrlib/builtins.py (+0/-5)
bzrlib/recordcounter.py (+65/-0)
bzrlib/remote.py (+2/-1)
bzrlib/repofmt/groupcompress_repo.py (+23/-4)
bzrlib/repository.py (+6/-4)
bzrlib/smart/repository.py (+40/-4)
bzrlib/tests/blackbox/test_checkout.py (+0/-2)
To merge this branch: bzr merge lp:~parthm/bzr/538868-message-for-heavy-checkout
Reviewer Review Type Date Requested Status
John A Meinel Needs Resubmitting
Martin Pool 2nd review Needs Information
Vincent Ladeuil Approve
Gary van der Merwe Approve
Review via email: mp+24483@code.launchpad.net

This proposal has been superseded by a proposal from 2010-05-14.

Commit message

(parthm) heavyweight checkout now indicates that history is being copied.

Description of the change

=== Fixes Bug #538868 ===
For heavyweight checkout, show a message indicating that history is being copied and that it may take some time.

Sample output:

[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk foobar
Copying history to "foobar". This may take some time.
bzr: interrupted
[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk
Copying history to "trunk". This may take some time.
bzr: interrupted

The only ugliness I see is in the odd case where to_location already exists. In this case the output is:

[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk
Copying history to "trunk". This may take some time.
bzr: ERROR: File exists: u'/home/parthm/tmp/trunk/.bzr': [Errno 17] File exists: '/home/parthm/tmp/trunk/.bzr'

It would be ideal if the "copying history" message were not shown, but I suppose that's not too bad. I had an early-failure fix for this but haven't put it in, considering that bzr works across multiple transports.

+            # Fail early if to_location/.bzr exists. We don't want to
+            # give a message "Copying history ..." and then fail
+            # saying to_location/.bzr exists.
+            to_loc_bzr = osutils.joinpath([to_location, '.bzr'])
+            if osutils.lexists(to_loc_bzr):
+                raise errors.BzrCommandError('"%s" exists.' % to_loc_bzr)
+

Revision history for this message
Martin Pool (mbp) wrote :

Thanks, this is a very nice bug to fix.

I would prefer the message came out through trace or the ui factory
than directly to self.outf, because that will make it easier to
refactor out of the cmd implementation, and it's more likely to
automatically respect --quiet. You might then be able to test more
cleanly through TestUIFactory.

Revision history for this message
Gary van der Merwe (garyvdm) :
review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

Apart from the message tweaks mentioned on IRC, that's good to land !

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

I realise this has gone through, so I'd like to just request some more
stuff if you have time; if not please file a bug.

The message will show up when doing a heavyweight checkout inside a
repository; that's just annoying - no history is being copied, so no
message should appear. Recommended fix: move the notification into the
core, out of builtins.py.

Secondly, if it's worth telling people we're copying [a lot] of history
for checkout, I think it's worth telling them about it for branch and
merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
revisions) and have the warning trigger on that?

-Rob
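
The threshold idea suggested here could be sketched as below. This is only an illustration of the suggestion, not anything in the patch; the helper name is hypothetical, and the 100-revision cutoff is just the example figure from the comment:

```python
def should_note_history_copy(pending_revisions, threshold=100):
    """Return True when enough history is being copied to warrant a note.

    threshold=100 matches the "100 or more revisions" heuristic floated
    in the review; the right cutoff would need tuning against real use.
    """
    return pending_revisions >= threshold
```

A caller would compute the pending revision count first, then gate the "Copying history" message on this check.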

Revision history for this message
Martin Pool (mbp) wrote :

On 6 May 2010 04:28, Robert Collins <email address hidden> wrote:
> I realise this has gone through, so I'd like to just request some more
> stuff if you have time; if not please file a bug.
>
> The message will show up when doing a heavyweight checkout inside a
> repository; that's just annoying - no history is being copied, so no
> message should appear. Recommended fix: move the notification into the
> core, out of builtins.py.

+1

perhaps just showing it from fetch would be best

> Secondly, if it's worth telling people we're copying [a lot] of history
> for checkout, I think it's worth telling them about it for branch and
> merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
> revisions) and have the warning trigger on that?

-½ on that, because it will create questions about "but it worked
before, what changed?" If we want that kind of approach we should
make sure there's a clear progress bar message, so that it's visible
only while the slow operation is taking place.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Parth Malwankar (parthm) wrote :

On Thu, May 6, 2010 at 8:58 AM, Robert Collins
<email address hidden> wrote:
> I realise this has gone through, so I'd like to just request some more
> stuff if you have time; if not please file a bug.
>
> The message will show up when doing a heavyweight checkout inside a
> repository; that's just annoying - no history is being copied, so no
> message should appear. Recommended fix: move the notification into the
> core, out of builtins.py.
>
> Secondly, if it's worth telling people we're copying [a lot] of history
> for checkout, I think it's worth telling them about it for branch and
> merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
> revisions) and have the warning trigger on that?
>

Good points. Thanks for the review.
As discussed on IRC, I will work on fixing this.
I don't have a good solution yet. I will propose something taking into
account Martin Pool's recommendation.

Revision history for this message
Parth Malwankar (parthm) wrote :

So I updated this patch to skip the message when the checkout is done in a shared repo. However, there is an interesting case below.

[tmp]% bzr init-repo foo
Shared repository with trees (format: 2a)
Location:
  shared repository: foo
[tmp]% cd foo
[foo]% /home/parthm/src/bzr.dev/538868-message-for-heavy-checkout/bzr checkout ~/src/bzr.dev/trunk foo
[foo]%

In this case, the entire history is copied, so it does take time. I am wondering if we should just stick to the simpler earlier patch. Alternatively, if there were a way to know how many changes need to be pulled, we could show the message based on that.

This is still checkout specific and doesn't touch other operations.

Revision history for this message
Robert Collins (lifeless) wrote :

Well the main point for me is that the issue - lots of history being
copied - is separate from the commands. So I guess I'm really saying
'do it more broadly please'.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

test

Revision history for this message
Martin Pool (mbp) wrote :

test

review: Needs Information (2nd review)
Revision history for this message
John A Meinel (jameinel) wrote :

23 pb = ui.ui_factory.nested_progress_bar()
24 + key_count = len(search.get_keys())
25 try:

^- We've discussed that this is a fairly unfortunate regression, as it requires polling the remote server for the list of revisions rather than just having it stream them out.

I'm pretty sure Parth is already looking at how to fix this.

review: Needs Resubmitting
Revision history for this message
Parth Malwankar (parthm) wrote :

With a lot of help from John, this patch is in good enough shape for review.
It has evolved from a fix for bug #538868 into a fix for bug #374740.

The intent is to show users an _estimate_ of the amount of work pending in branch/push/pull/checkout (remote-local, local-remote, remote-remote) operations. This is done by showing the number of records pending.

E.g.

[tmp]% ~/src/bzr.dev/edge/bzr checkout ~/src/bzr.dev/trunk pqr
- Fetching revisions:Inserting stream:Estimate 106429/320381

As the number of records is proportional to the number of revisions to be fetched, and for remote operations this count is not known upfront, the progress bar starts with "Estimating.. X", where X goes from 0 to revs-to-fetch; after that, the progress bar changes to what's shown above. For local operations we know the count upfront, so the progress starts at 0/N.

A RecordCounter object has been added to maintain current, max, and key_count, and to encapsulate the estimation algorithm. An instance of this is added to StreamSource, which is then shared across the various sub-streams to show progress. The wrap_and_count generator wraps existing sub-streams with the progress-bar printer.
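
As a rough standalone sketch of this mechanism (not the patch itself: `update` here stands in for the real progress bar, while the 10.3 multiplier and the STEP of 71 are the values visible in the diff below):

```python
class RecordCounter(object):
    """Simplified sketch: track fetch progress with a growing estimate."""

    def __init__(self):
        self.initialized = False
        self.current = 0
        self.key_count = 0
        self.max = 0
        self.STEP = 71  # report progress every STEP records

    def setup(self, key_count, current=0):
        # ~10.3 records per revision: the empirical constant from the patch.
        self.current = current
        self.key_count = key_count
        self.max = int(key_count * 10.3)
        self.initialized = True

    def increment(self, count):
        # Grow max when the estimate is exceeded, so the bar stays sane.
        self.current += count
        if self.current > self.max:
            self.max += self.key_count


def wrap_and_count(rc, stream, update):
    """Yield records from stream, reporting progress every rc.STEP records."""
    count = 0
    for record in stream:
        if count == rc.STEP:
            rc.increment(count)
            update('Estimate', rc.current, rc.max)
            count = 0
        count += 1
        yield record
```

Because wrap_and_count is a generator, the sub-streams are still consumed lazily; the counter only adds a periodic progress callback on top of them.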

Revision history for this message
Parth Malwankar (parthm) wrote :

Just to add: this progress is seen during the "Inserting stream" phase, which is the big time consumer. There is still room for improvement in the "Getting stream" and "Inserting missing keys" phases, but that can probably be a separate bug.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2010-05-14 09:02:35 +0000
3+++ NEWS 2010-05-14 13:38:33 +0000
4@@ -96,9 +96,9 @@
5 versions before 1.6.
6 (Andrew Bennetts, #528041)
7
8-* Heavyweight checkout operation now shows a message to the user indicating
9- history is being copied.
10- (Parth Malwankar, #538868)
11+* Improved progress bar for fetch. Bazaar now shows an estimate of the
12+ number of records to be fetched vs actually fetched.
13+ (Parth Malwankar, #374740, #538868)
14
15 * Reduce peak memory by one copy of compressed text.
16 (John Arbash Meinel, #566940)
17
18=== modified file 'bzrlib/builtins.py'
19--- bzrlib/builtins.py 2010-05-14 09:20:34 +0000
20+++ bzrlib/builtins.py 2010-05-14 13:38:33 +0000
21@@ -1336,11 +1336,6 @@
22 except errors.NoWorkingTree:
23 source.bzrdir.create_workingtree(revision_id)
24 return
25-
26- if not lightweight:
27- message = ('Copying history to "%s". '
28- 'To checkout without local history use --lightweight.' % to_location)
29- ui.ui_factory.show_message(message)
30 source.create_checkout(to_location, revision_id, lightweight,
31 accelerator_tree, hardlink)
32
33
34=== added file 'bzrlib/recordcounter.py'
35--- bzrlib/recordcounter.py 1970-01-01 00:00:00 +0000
36+++ bzrlib/recordcounter.py 2010-05-14 13:38:33 +0000
37@@ -0,0 +1,65 @@
38+# Copyright (C) 2006-2010 Canonical Ltd
39+#
40+# This program is free software; you can redistribute it and/or modify
41+# it under the terms of the GNU General Public License as published by
42+# the Free Software Foundation; either version 2 of the License, or
43+# (at your option) any later version.
44+#
45+# This program is distributed in the hope that it will be useful,
46+# but WITHOUT ANY WARRANTY; without even the implied warranty of
47+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
48+# GNU General Public License for more details.
49+#
50+# You should have received a copy of the GNU General Public License
51+# along with this program; if not, write to the Free Software
52+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
53+"""Record counting support for showing progress of revision fetch."""
54+
55+class RecordCounter(object):
56+ """Container for maintains estimates of work requires for fetch.
57+
58+ An instance of this class is used along with a progress bar to provide
59+ the user an estimate of the amount of work pending for a fetch (push,
60+ pull, branch, checkout) operation.
61+ """
62+ def __init__(self):
63+ self.initialized = False
64+ self.current = 0
65+ self.key_count = 0
66+ self.max = 0
67+ self.STEP = 71
68+
69+ def is_initialized(self):
70+ return self.initialized
71+
72+ def _estimate_max(self, key_count):
73+ """Estimate the maximum amount of 'inserting stream' work.
74+
75+ This is just an estimate.
76+ """
77+ # Note: The magic number below is based on empirical data
78+ # from 3 separate projects. Estimation can probably
79+ # be improved but this should work well for most cases.
80+ return int(key_count * 10.3)
81+
82+ def setup(self, key_count, current=0):
83+ """Setup RecordCounter with basic estimate of work pending.
84+
85+ Setup self.max and self.current to reflect the amount of work
86+ pending for a fetch.
87+ """
88+ self.current = current
89+ self.key_count = key_count
90+ self.max = self._estimate_max(key_count)
91+ self.initialized = True
92+
93+ def increment(self, count):
94+ """Increment self.current by count.
95+
96+ Apart from incrementing self.current by count, also ensure
97+ that self.max > self.current.
98+ """
99+ self.current += count
100+ if self.current > self.max:
101+ self.max += self.key_count
102+
103
104=== modified file 'bzrlib/remote.py'
105--- bzrlib/remote.py 2010-05-13 16:17:54 +0000
106+++ bzrlib/remote.py 2010-05-14 13:38:33 +0000
107@@ -1980,7 +1980,8 @@
108 if response_tuple[0] != 'ok':
109 raise errors.UnexpectedSmartServerResponse(response_tuple)
110 byte_stream = response_handler.read_streamed_body()
111- src_format, stream = smart_repo._byte_stream_to_stream(byte_stream)
112+ src_format, stream = smart_repo._byte_stream_to_stream(byte_stream,
113+ self._record_counter)
114 if src_format.network_name() != repo._format.network_name():
115 raise AssertionError(
116 "Mismatched RemoteRepository and stream src %r, %r" % (
117
118=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
119--- bzrlib/repofmt/groupcompress_repo.py 2010-05-13 18:52:58 +0000
120+++ bzrlib/repofmt/groupcompress_repo.py 2010-05-14 13:38:33 +0000
121@@ -1108,13 +1108,29 @@
122 yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages()
123
124 def get_stream(self, search):
125+ def wrap_and_count(pb, rc, stream):
126+ """Yield records from stream while showing progress."""
127+ count = 0
128+ for record in stream:
129+ if count == rc.STEP:
130+ rc.increment(count)
131+ pb.update('Estimate', rc.current, rc.max)
132+ count = 0
133+ count += 1
134+ yield record
135+
136 revision_ids = search.get_keys()
137+ pb = ui.ui_factory.nested_progress_bar()
138+ rc = self._record_counter
139+ self._record_counter.setup(len(revision_ids))
140 for stream_info in self._fetch_revision_texts(revision_ids):
141- yield stream_info
142+ yield (stream_info[0],
143+ wrap_and_count(pb, rc, stream_info[1]))
144 self._revision_keys = [(rev_id,) for rev_id in revision_ids]
145 self.from_repository.revisions.clear_cache()
146 self.from_repository.signatures.clear_cache()
147- yield self._get_inventory_stream(self._revision_keys)
148+ s = self._get_inventory_stream(self._revision_keys)
149+ yield (s[0], wrap_and_count(pb, rc, s[1]))
150 self.from_repository.inventories.clear_cache()
151 # TODO: The keys to exclude might be part of the search recipe
152 # For now, exclude all parents that are at the edge of ancestry, for
153@@ -1123,10 +1139,13 @@
154 parent_keys = from_repo._find_parent_keys_of_revisions(
155 self._revision_keys)
156 for stream_info in self._get_filtered_chk_streams(parent_keys):
157- yield stream_info
158+ yield (stream_info[0], wrap_and_count(pb, rc, stream_info[1]))
159 self.from_repository.chk_bytes.clear_cache()
160- yield self._get_text_stream()
161+ s = self._get_text_stream()
162+ yield (s[0], wrap_and_count(pb, rc, s[1]))
163 self.from_repository.texts.clear_cache()
164+ pb.update('Done', rc.max, rc.max)
165+ pb.finished()
166
167 def get_stream_for_missing_keys(self, missing_keys):
168 # missing keys can only occur when we are byte copying and not
169
170=== modified file 'bzrlib/repository.py'
171--- bzrlib/repository.py 2010-05-13 18:52:58 +0000
172+++ bzrlib/repository.py 2010-05-14 13:38:33 +0000
173@@ -43,7 +43,6 @@
174 symbol_versioning,
175 trace,
176 tsort,
177- ui,
178 versionedfile,
179 )
180 from bzrlib.bundle import serializer
181@@ -55,6 +54,7 @@
182 from bzrlib import (
183 errors,
184 registry,
185+ ui,
186 )
187 from bzrlib.decorators import needs_read_lock, needs_write_lock, only_raises
188 from bzrlib.inter import InterObject
189@@ -64,6 +64,7 @@
190 ROOT_ID,
191 entry_factory,
192 )
193+from bzrlib.recordcounter import RecordCounter
194 from bzrlib.lock import _RelockDebugMixin, LogicalLockResult
195 from bzrlib.trace import (
196 log_exception_quietly, note, mutter, mutter_callsite, warning)
197@@ -4283,7 +4284,8 @@
198 is_resume = False
199 try:
200 # locked_insert_stream performs a commit|suspend.
201- return self._locked_insert_stream(stream, src_format, is_resume)
202+ return self._locked_insert_stream(stream, src_format,
203+ is_resume)
204 except:
205 self.target_repo.abort_write_group(suppress_errors=True)
206 raise
207@@ -4336,8 +4338,7 @@
208 # required if the serializers are different only in terms of
209 # the inventory.
210 if src_serializer == to_serializer:
211- self.target_repo.revisions.insert_record_stream(
212- substream)
213+ self.target_repo.revisions.insert_record_stream(substream)
214 else:
215 self._extract_and_insert_revisions(substream,
216 src_serializer)
217@@ -4451,6 +4452,7 @@
218 """Create a StreamSource streaming from from_repository."""
219 self.from_repository = from_repository
220 self.to_format = to_format
221+ self._record_counter = RecordCounter()
222
223 def delta_on_metadata(self):
224 """Return True if delta's are permitted on metadata streams.
225
226=== modified file 'bzrlib/smart/repository.py'
227--- bzrlib/smart/repository.py 2010-05-06 23:41:35 +0000
228+++ bzrlib/smart/repository.py 2010-05-14 13:38:33 +0000
229@@ -39,6 +39,7 @@
230 SuccessfulSmartServerResponse,
231 )
232 from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
233+from bzrlib.recordcounter import RecordCounter
234 from bzrlib import revision as _mod_revision
235 from bzrlib.versionedfile import (
236 NetworkRecordStream,
237@@ -544,12 +545,14 @@
238 :ivar first_bytes: The first bytes to give the next NetworkRecordStream.
239 """
240
241- def __init__(self, byte_stream):
242+ def __init__(self, byte_stream, record_counter):
243 """Create a _ByteStreamDecoder."""
244 self.stream_decoder = pack.ContainerPushParser()
245 self.current_type = None
246 self.first_bytes = None
247 self.byte_stream = byte_stream
248+ self._record_counter = record_counter
249+ self.key_count = 0
250
251 def iter_stream_decoder(self):
252 """Iterate the contents of the pack from stream_decoder."""
253@@ -580,13 +583,46 @@
254
255 def record_stream(self):
256 """Yield substream_type, substream from the byte stream."""
257+ def wrap_and_count(pb, rc, substream):
258+ """Yield records from stream while showing progress."""
259+ counter = 0
260+ if rc:
261+ if self.current_type != 'revisions' and self.key_count != 0:
262+ # As we know the number of revisions now (in self.key_count)
263+ # we can setup and use record_counter (rc).
264+ if not rc.is_initialized():
265+ rc.setup(self.key_count, self.key_count)
266+ for record in substream.read():
267+ if rc:
268+ if rc.is_initialized() and counter == rc.STEP:
269+ rc.increment(counter)
270+ pb.update('Estimate', rc.current, rc.max)
271+ counter = 0
272+ if self.current_type == 'revisions':
273+ # Total records is proportional to number of revs
274+ # to fetch. With remote, we used self.key_count to
275+ # track the number of revs. Once we have the revs
276+ # counts in self.key_count, the progress bar changes
277+ # from 'Estimating..' to 'Estimate' above.
278+ self.key_count += 1
279+ if counter == rc.STEP:
280+ pb.update('Estimating..', self.key_count)
281+ counter = 0
282+ counter += 1
283+ yield record
284+
285 self.seed_state()
286+ pb = ui.ui_factory.nested_progress_bar()
287+ rc = self._record_counter
288 # Make and consume sub generators, one per substream type:
289 while self.first_bytes is not None:
290 substream = NetworkRecordStream(self.iter_substream_bytes())
291 # after substream is fully consumed, self.current_type is set to
292 # the next type, and self.first_bytes is set to the matching bytes.
293- yield self.current_type, substream.read()
294+ yield self.current_type, wrap_and_count(pb, rc, substream)
295+ if rc:
296+ pb.update('Done', rc.max, rc.max)
297+ pb.finished()
298
299 def seed_state(self):
300 """Prepare the _ByteStreamDecoder to decode from the pack stream."""
301@@ -597,13 +633,13 @@
302 list(self.iter_substream_bytes())
303
304
305-def _byte_stream_to_stream(byte_stream):
306+def _byte_stream_to_stream(byte_stream, record_counter=None):
307 """Convert a byte stream into a format and a stream.
308
309 :param byte_stream: A bytes iterator, as output by _stream_to_byte_stream.
310 :return: (RepositoryFormat, stream_generator)
311 """
312- decoder = _ByteStreamDecoder(byte_stream)
313+ decoder = _ByteStreamDecoder(byte_stream, record_counter)
314 for bytes in byte_stream:
315 decoder.stream_decoder.accept_bytes(bytes)
316 for record in decoder.stream_decoder.read_pending_records(max=1):
317
318=== modified file 'bzrlib/tests/blackbox/test_checkout.py'
319--- bzrlib/tests/blackbox/test_checkout.py 2010-04-30 09:52:08 +0000
320+++ bzrlib/tests/blackbox/test_checkout.py 2010-05-14 13:38:33 +0000
321@@ -65,7 +65,6 @@
322
323 def test_checkout_dash_r(self):
324 out, err = self.run_bzr(['checkout', '-r', '-2', 'branch', 'checkout'])
325- self.assertContainsRe(out, 'Copying history to "checkout".')
326 # the working tree should now be at revision '1' with the content
327 # from 1.
328 result = bzrdir.BzrDir.open('checkout')
329@@ -75,7 +74,6 @@
330 def test_checkout_light_dash_r(self):
331 out, err = self.run_bzr(['checkout','--lightweight', '-r', '-2',
332 'branch', 'checkout'])
333- self.assertNotContainsRe(out, 'Copying history')
334 # the working tree should now be at revision '1' with the content
335 # from 1.
336 result = bzrdir.BzrDir.open('checkout')