Merge lp:~parthm/bzr/538868-message-for-heavy-checkout into lp:bzr

Proposed by Parth Malwankar
Status: Superseded
Proposed branch: lp:~parthm/bzr/538868-message-for-heavy-checkout
Merge into: lp:bzr
Diff against target: 336 lines (+139/-23)
8 files modified
NEWS (+3/-3)
bzrlib/builtins.py (+0/-5)
bzrlib/recordcounter.py (+65/-0)
bzrlib/remote.py (+2/-1)
bzrlib/repofmt/groupcompress_repo.py (+23/-4)
bzrlib/repository.py (+6/-4)
bzrlib/smart/repository.py (+40/-4)
bzrlib/tests/blackbox/test_checkout.py (+0/-2)
To merge this branch: bzr merge lp:~parthm/bzr/538868-message-for-heavy-checkout
Reviewer Review Type Date Requested Status
John A Meinel Needs Resubmitting
Martin Pool 2nd review Needs Information
Vincent Ladeuil Approve
Gary van der Merwe Approve
Review via email: mp+24483@code.launchpad.net

This proposal has been superseded by a proposal from 2010-05-14.

Commit message

(parthm) heavyweight checkout now indicates that history is being copied.

Description of the change

=== Fixes Bug #538868 ===
For heavyweight checkout, show a message indicating that history is being copied and that it may take some time.

Sample output:

[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk foobar
Copying history to "foobar". This may take some time.
bzr: interrupted
[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk
Copying history to "trunk". This may take some time.
bzr: interrupted

The only ugliness I see is in the odd case where to_location already exists. In this case the output is:

[tmp]% ~/src/bzr.dev/538868-message-for-heavy-checkout/bzr --no-plugins checkout ~/src/bzr.dev/trunk
Copying history to "trunk". This may take some time.
bzr: ERROR: File exists: u'/home/parthm/tmp/trunk/.bzr': [Errno 17] File exists: '/home/parthm/tmp/trunk/.bzr'

It would be ideal if the "copying history" message were not shown, but I suppose that's not too bad. I had an early-failure fix for this but haven't put it in, considering that bzr works across multiple transports.

+            # Fail early if to_location/.bzr exists. We don't want to
+            # give a message "Copying history ..." and then fail
+            # saying to_location/.bzr exists.
+            to_loc_bzr = osutils.joinpath([to_location, '.bzr'])
+            if osutils.lexists(to_loc_bzr):
+                raise errors.BzrCommandError('"%s" exists.' % to_loc_bzr)
+

Revision history for this message
Martin Pool (mbp) wrote :

Thanks, this is a very nice bug to fix.

I would prefer the message came out through trace or the ui factory
than directly to self.outf, because that will make it easier to
refactor out of the cmd implementation, and it's more likely to
automatically respect --quiet. You might then be able to test more
cleanly through TestUIFactory.

Revision history for this message
Gary van der Merwe (garyvdm) :
review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

Apart from the message tweaks mentioned on IRC, that's good to land !

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

I realise this has gone through, so I'd like to just request some more
stuff if you have time; if not please file a bug.

The message will show up when doing a heavyweight checkout inside a
repository; that's just annoying - no history is being copied, so no
message should appear. Recommended fix: move the notification into the
core, out of builtins.py.

Secondly, if it's worth telling people we're copying [a lot] of history
for checkout, I think it's worth telling them about it for branch and
merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
revisions) and have the warning trigger on that?

-Rob
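
The threshold idea suggested here could be sketched as below. This is only an illustration of the suggestion, not anything in the patch; the helper name is hypothetical, and the 100-revision cutoff is just the example figure from the comment:

```python
def should_note_history_copy(pending_revisions, threshold=100):
    """Return True when enough history is being copied to warrant a note.

    threshold=100 matches the "100 or more revisions" heuristic floated
    in the review; the right cutoff would need tuning against real use.
    """
    return pending_revisions >= threshold
```

A caller would compute the pending revision count first, then gate the "Copying history" message on this check.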

Revision history for this message
Martin Pool (mbp) wrote :

On 6 May 2010 04:28, Robert Collins <email address hidden> wrote:
> I realise this has gone through, so I'd like to just request some more
> stuff if you have time; if not please file a bug.
>
> The message will show up when doing a heavyweight checkout inside a
> repository; that's just annoying - no history is being copied, so no
> message should appear. Recommended fix: move the notification into the
> core, out of builtins.py.

+1

perhaps just showing it from fetch would be best

> Secondly, if it's worth telling people we're copying [a lot] of history
> for checkout, I think it's worth telling them about it for branch and
> merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
> revisions) and have the warning trigger on that?

-½ on that, because it will create questions about "but it worked
before, what changed?" If we want that kind of approach we should
make sure there's a clear progress bar message, so that it's visible
only while the slow operation is taking place.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Parth Malwankar (parthm) wrote :

On Thu, May 6, 2010 at 8:58 AM, Robert Collins
<email address hidden> wrote:
> I realise this has gone through, so I'd like to just request some more
> stuff if you have time; if not please file a bug.
>
> The message will show up when doing a heavyweight checkout inside a
> repository; that's just annoying - no history is being copied, so no
> message should appear. Recommended fix: move the notification into the
> core, out of builtins.py.
>
> Secondly, if it's worth telling people we're copying [a lot] of history
> for checkout, I think it's worth telling them about it for branch and
> merge too. Perhaps let's set some sort of heuristic (e.g. 100 or more
> revisions) and have the warning trigger on that?
>

Good points. Thanks for the review.
As discussed on IRC, I will work on fixing this.
I don't have a good solution yet. I will propose something taking into
account Martin Pool's recommendation.

Revision history for this message
Parth Malwankar (parthm) wrote :

So I updated this patch to skip the message when the checkout is done in a shared repo. However, there is an interesting case below.

[tmp]% bzr init-repo foo
Shared repository with trees (format: 2a)
Location:
  shared repository: foo
[tmp]% cd foo
[foo]% /home/parthm/src/bzr.dev/538868-message-for-heavy-checkout/bzr checkout ~/src/bzr.dev/trunk foo
[foo]%

In this case, the entire history is copied, so it does take time. I am wondering if we should just stick to the simpler earlier patch. Alternatively, if there were a way to know how many changes need to be pulled, we could show the message based on that.

This is still checkout specific and doesn't touch other operations.

Revision history for this message
Robert Collins (lifeless) wrote :

Well the main point for me is that the issue - lots of history being
copied - is separate from the commands. So I guess I'm really saying
'do it more broadly please'.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

test

Revision history for this message
Martin Pool (mbp) wrote :

test

review: Needs Information (2nd review)
Revision history for this message
John A Meinel (jameinel) wrote :

23 pb = ui.ui_factory.nested_progress_bar()
24 + key_count = len(search.get_keys())
25 try:

^- We've discussed that this is a fairly unfortunate regression, as it requires polling the remote server for the list of revisions rather than just having it stream them out.

I'm pretty sure Parth is already looking at how to fix this.

review: Needs Resubmitting
Revision history for this message
Parth Malwankar (parthm) wrote :

With a lot of help from John, this patch is in good enough shape for review.
It has evolved from a fix for bug #538868 into a fix for bug #374740.

The intent is to show users an _estimate_ of the amount of work pending in branch/push/pull/checkout (remote-local, local-remote, remote-remote) operations. This is done by showing the number of records pending.

E.g.

[tmp]% ~/src/bzr.dev/edge/bzr checkout ~/src/bzr.dev/trunk pqr
- Fetching revisions:Inserting stream:Estimate 106429/320381

As the number of records is proportional to the number of revisions to be fetched, and for remote operations this count is not known upfront, the progress bar starts with "Estimating.. X", where X goes from 0 to revs-to-fetch; after that, the progress bar changes to what's shown above. For local operations we know the count upfront, so the progress starts at 0/N.

A RecordCounter object has been added to maintain current, max, and key_count, and to encapsulate the estimation algorithm. An instance of this is added to StreamSource, which is then shared across the various sub-streams to show progress. The wrap_and_count generator wraps existing sub-streams with the progress-bar printer.
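
As a rough standalone sketch of this mechanism (not the patch itself: `update` here stands in for the real progress bar, while the 10.3 multiplier and the STEP of 71 are the values visible in the diff below):

```python
class RecordCounter(object):
    """Simplified sketch: track fetch progress with a growing estimate."""

    def __init__(self):
        self.initialized = False
        self.current = 0
        self.key_count = 0
        self.max = 0
        self.STEP = 71  # report progress every STEP records

    def setup(self, key_count, current=0):
        # ~10.3 records per revision: the empirical constant from the patch.
        self.current = current
        self.key_count = key_count
        self.max = int(key_count * 10.3)
        self.initialized = True

    def increment(self, count):
        # Grow max when the estimate is exceeded, so the bar stays sane.
        self.current += count
        if self.current > self.max:
            self.max += self.key_count


def wrap_and_count(rc, stream, update):
    """Yield records from stream, reporting progress every rc.STEP records."""
    count = 0
    for record in stream:
        if count == rc.STEP:
            rc.increment(count)
            update('Estimate', rc.current, rc.max)
            count = 0
        count += 1
        yield record
```

Because wrap_and_count is a generator, the sub-streams are still consumed lazily; the counter only adds a periodic progress callback on top of them.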

Revision history for this message
Parth Malwankar (parthm) wrote :

Just to add: this progress is seen during the "Inserting stream" phase, which is the big time consumer. There is still room for improvement in the "Getting stream" and "Inserting missing keys" phases, but that can probably be a separate bug.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2010-05-14 09:02:35 +0000
3+++ NEWS 2010-05-14 13:38:33 +0000
4@@ -96,9 +96,9 @@
5 versions before 1.6.
6 (Andrew Bennetts, #528041)
7
8-* Heavyweight checkout operation now shows a message to the user indicating
9- history is being copied.
10- (Parth Malwankar, #538868)
11+* Improved progress bar for fetch. Bazaar now shows an estimate of the
12+ number of records to be fetched vs actually fetched.
13+ (Parth Malwankar, #374740, #538868)
14
15 * Reduce peak memory by one copy of compressed text.
16 (John Arbash Meinel, #566940)
17
18=== modified file 'bzrlib/builtins.py'
19--- bzrlib/builtins.py 2010-05-14 09:20:34 +0000
20+++ bzrlib/builtins.py 2010-05-14 13:38:33 +0000
21@@ -1336,11 +1336,6 @@
22 except errors.NoWorkingTree:
23 source.bzrdir.create_workingtree(revision_id)
24 return
25-
26- if not lightweight:
27- message = ('Copying history to "%s". '
28- 'To checkout without local history use --lightweight.' % to_location)
29- ui.ui_factory.show_message(message)
30 source.create_checkout(to_location, revision_id, lightweight,
31 accelerator_tree, hardlink)
32
33
34=== added file 'bzrlib/recordcounter.py'
35--- bzrlib/recordcounter.py 1970-01-01 00:00:00 +0000
36+++ bzrlib/recordcounter.py 2010-05-14 13:38:33 +0000
37@@ -0,0 +1,65 @@
38+# Copyright (C) 2006-2010 Canonical Ltd
39+#
40+# This program is free software; you can redistribute it and/or modify
41+# it under the terms of the GNU General Public License as published by
42+# the Free Software Foundation; either version 2 of the License, or
43+# (at your option) any later version.
44+#
45+# This program is distributed in the hope that it will be useful,
46+# but WITHOUT ANY WARRANTY; without even the implied warranty of
47+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
48+# GNU General Public License for more details.
49+#
50+# You should have received a copy of the GNU General Public License
51+# along with this program; if not, write to the Free Software
52+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
53+"""Record counting support for showing progress of revision fetch."""
54+
55+class RecordCounter(object):
56+ """Container for maintains estimates of work requires for fetch.
57+
58+ An instance of this class is used along with a progress bar to provide
59+ the user an estimate of the amount of work pending for a fetch (push,
60+ pull, branch, checkout) operation.
61+ """
62+ def __init__(self):
63+ self.initialized = False
64+ self.current = 0
65+ self.key_count = 0
66+ self.max = 0
67+ self.STEP = 71
68+
69+ def is_initialized(self):
70+ return self.initialized
71+
72+ def _estimate_max(self, key_count):
73+ """Estimate the maximum amount of 'inserting stream' work.
74+
75+ This is just an estimate.
76+ """
77+ # Note: The magic number below is based on empirical data
78+ # from 3 separate projects. Estimation can probably
79+ # be improved but this should work well for most cases.
80+ return int(key_count * 10.3)
81+
82+ def setup(self, key_count, current=0):
83+ """Setup RecordCounter with basic estimate of work pending.
84+
85+ Setup self.max and self.current to reflect the amount of work
86+ pending for a fetch.
87+ """
88+ self.current = current
89+ self.key_count = key_count
90+ self.max = self._estimate_max(key_count)
91+ self.initialized = True
92+
93+ def increment(self, count):
94+ """Increment self.current by count.
95+
96+ Apart from incrementing self.current by count, also ensure
97+ that self.max > self.current.
98+ """
99+ self.current += count
100+ if self.current > self.max:
101+ self.max += self.key_count
102+
103
104=== modified file 'bzrlib/remote.py'
105--- bzrlib/remote.py 2010-05-13 16:17:54 +0000
106+++ bzrlib/remote.py 2010-05-14 13:38:33 +0000
107@@ -1980,7 +1980,8 @@
108 if response_tuple[0] != 'ok':
109 raise errors.UnexpectedSmartServerResponse(response_tuple)
110 byte_stream = response_handler.read_streamed_body()
111- src_format, stream = smart_repo._byte_stream_to_stream(byte_stream)
112+ src_format, stream = smart_repo._byte_stream_to_stream(byte_stream,
113+ self._record_counter)
114 if src_format.network_name() != repo._format.network_name():
115 raise AssertionError(
116 "Mismatched RemoteRepository and stream src %r, %r" % (
117
118=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
119--- bzrlib/repofmt/groupcompress_repo.py 2010-05-13 18:52:58 +0000
120+++ bzrlib/repofmt/groupcompress_repo.py 2010-05-14 13:38:33 +0000
121@@ -1108,13 +1108,29 @@
122 yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages()
123
124 def get_stream(self, search):
125+ def wrap_and_count(pb, rc, stream):
126+ """Yield records from stream while showing progress."""
127+ count = 0
128+ for record in stream:
129+ if count == rc.STEP:
130+ rc.increment(count)
131+ pb.update('Estimate', rc.current, rc.max)
132+ count = 0
133+ count += 1
134+ yield record
135+
136 revision_ids = search.get_keys()
137+ pb = ui.ui_factory.nested_progress_bar()
138+ rc = self._record_counter
139+ self._record_counter.setup(len(revision_ids))
140 for stream_info in self._fetch_revision_texts(revision_ids):
141- yield stream_info
142+ yield (stream_info[0],
143+ wrap_and_count(pb, rc, stream_info[1]))
144 self._revision_keys = [(rev_id,) for rev_id in revision_ids]
145 self.from_repository.revisions.clear_cache()
146 self.from_repository.signatures.clear_cache()
147- yield self._get_inventory_stream(self._revision_keys)
148+ s = self._get_inventory_stream(self._revision_keys)
149+ yield (s[0], wrap_and_count(pb, rc, s[1]))
150 self.from_repository.inventories.clear_cache()
151 # TODO: The keys to exclude might be part of the search recipe
152 # For now, exclude all parents that are at the edge of ancestry, for
153@@ -1123,10 +1139,13 @@
154 parent_keys = from_repo._find_parent_keys_of_revisions(
155 self._revision_keys)
156 for stream_info in self._get_filtered_chk_streams(parent_keys):
157- yield stream_info
158+ yield (stream_info[0], wrap_and_count(pb, rc, stream_info[1]))
159 self.from_repository.chk_bytes.clear_cache()
160- yield self._get_text_stream()
161+ s = self._get_text_stream()
162+ yield (s[0], wrap_and_count(pb, rc, s[1]))
163 self.from_repository.texts.clear_cache()
164+ pb.update('Done', rc.max, rc.max)
165+ pb.finished()
166
167 def get_stream_for_missing_keys(self, missing_keys):
168 # missing keys can only occur when we are byte copying and not
169
170=== modified file 'bzrlib/repository.py'
171--- bzrlib/repository.py 2010-05-13 18:52:58 +0000
172+++ bzrlib/repository.py 2010-05-14 13:38:33 +0000
173@@ -43,7 +43,6 @@
174 symbol_versioning,
175 trace,
176 tsort,
177- ui,
178 versionedfile,
179 )
180 from bzrlib.bundle import serializer
181@@ -55,6 +54,7 @@
182 from bzrlib import (
183 errors,
184 registry,
185+ ui,
186 )
187 from bzrlib.decorators import needs_read_lock, needs_write_lock, only_raises
188 from bzrlib.inter import InterObject
189@@ -64,6 +64,7 @@
190 ROOT_ID,
191 entry_factory,
192 )
193+from bzrlib.recordcounter import RecordCounter
194 from bzrlib.lock import _RelockDebugMixin, LogicalLockResult
195 from bzrlib.trace import (
196 log_exception_quietly, note, mutter, mutter_callsite, warning)
197@@ -4283,7 +4284,8 @@
198 is_resume = False
199 try:
200 # locked_insert_stream performs a commit|suspend.
201- return self._locked_insert_stream(stream, src_format, is_resume)
202+ return self._locked_insert_stream(stream, src_format,
203+ is_resume)
204 except:
205 self.target_repo.abort_write_group(suppress_errors=True)
206 raise
207@@ -4336,8 +4338,7 @@
208 # required if the serializers are different only in terms of
209 # the inventory.
210 if src_serializer == to_serializer:
211- self.target_repo.revisions.insert_record_stream(
212- substream)
213+ self.target_repo.revisions.insert_record_stream(substream)
214 else:
215 self._extract_and_insert_revisions(substream,
216 src_serializer)
217@@ -4451,6 +4452,7 @@
218 """Create a StreamSource streaming from from_repository."""
219 self.from_repository = from_repository
220 self.to_format = to_format
221+ self._record_counter = RecordCounter()
222
223 def delta_on_metadata(self):
224 """Return True if delta's are permitted on metadata streams.
225
226=== modified file 'bzrlib/smart/repository.py'
227--- bzrlib/smart/repository.py 2010-05-06 23:41:35 +0000
228+++ bzrlib/smart/repository.py 2010-05-14 13:38:33 +0000
229@@ -39,6 +39,7 @@
230 SuccessfulSmartServerResponse,
231 )
232 from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
233+from bzrlib.recordcounter import RecordCounter
234 from bzrlib import revision as _mod_revision
235 from bzrlib.versionedfile import (
236 NetworkRecordStream,
237@@ -544,12 +545,14 @@
238 :ivar first_bytes: The first bytes to give the next NetworkRecordStream.
239 """
240
241- def __init__(self, byte_stream):
242+ def __init__(self, byte_stream, record_counter):
243 """Create a _ByteStreamDecoder."""
244 self.stream_decoder = pack.ContainerPushParser()
245 self.current_type = None
246 self.first_bytes = None
247 self.byte_stream = byte_stream
248+ self._record_counter = record_counter
249+ self.key_count = 0
250
251 def iter_stream_decoder(self):
252 """Iterate the contents of the pack from stream_decoder."""
253@@ -580,13 +583,46 @@
254
255 def record_stream(self):
256 """Yield substream_type, substream from the byte stream."""
257+ def wrap_and_count(pb, rc, substream):
258+ """Yield records from stream while showing progress."""
259+ counter = 0
260+ if rc:
261+ if self.current_type != 'revisions' and self.key_count != 0:
262+ # As we know the number of revisions now (in self.key_count)
263+ # we can setup and use record_counter (rc).
264+ if not rc.is_initialized():
265+ rc.setup(self.key_count, self.key_count)
266+ for record in substream.read():
267+ if rc:
268+ if rc.is_initialized() and counter == rc.STEP:
269+ rc.increment(counter)
270+ pb.update('Estimate', rc.current, rc.max)
271+ counter = 0
272+ if self.current_type == 'revisions':
273+ # Total records is proportional to number of revs
274+ # to fetch. With remote, we used self.key_count to
275+ # track the number of revs. Once we have the revs
276+ # counts in self.key_count, the progress bar changes
277+ # from 'Estimating..' to 'Estimate' above.
278+ self.key_count += 1
279+ if counter == rc.STEP:
280+ pb.update('Estimating..', self.key_count)
281+ counter = 0
282+ counter += 1
283+ yield record
284+
285 self.seed_state()
286+ pb = ui.ui_factory.nested_progress_bar()
287+ rc = self._record_counter
288 # Make and consume sub generators, one per substream type:
289 while self.first_bytes is not None:
290 substream = NetworkRecordStream(self.iter_substream_bytes())
291 # after substream is fully consumed, self.current_type is set to
292 # the next type, and self.first_bytes is set to the matching bytes.
293- yield self.current_type, substream.read()
294+ yield self.current_type, wrap_and_count(pb, rc, substream)
295+ if rc:
296+ pb.update('Done', rc.max, rc.max)
297+ pb.finished()
298
299 def seed_state(self):
300 """Prepare the _ByteStreamDecoder to decode from the pack stream."""
301@@ -597,13 +633,13 @@
302 list(self.iter_substream_bytes())
303
304
305-def _byte_stream_to_stream(byte_stream):
306+def _byte_stream_to_stream(byte_stream, record_counter=None):
307 """Convert a byte stream into a format and a stream.
308
309 :param byte_stream: A bytes iterator, as output by _stream_to_byte_stream.
310 :return: (RepositoryFormat, stream_generator)
311 """
312- decoder = _ByteStreamDecoder(byte_stream)
313+ decoder = _ByteStreamDecoder(byte_stream, record_counter)
314 for bytes in byte_stream:
315 decoder.stream_decoder.accept_bytes(bytes)
316 for record in decoder.stream_decoder.read_pending_records(max=1):
317
318=== modified file 'bzrlib/tests/blackbox/test_checkout.py'
319--- bzrlib/tests/blackbox/test_checkout.py 2010-04-30 09:52:08 +0000
320+++ bzrlib/tests/blackbox/test_checkout.py 2010-05-14 13:38:33 +0000
321@@ -65,7 +65,6 @@
322
323 def test_checkout_dash_r(self):
324 out, err = self.run_bzr(['checkout', '-r', '-2', 'branch', 'checkout'])
325- self.assertContainsRe(out, 'Copying history to "checkout".')
326 # the working tree should now be at revision '1' with the content
327 # from 1.
328 result = bzrdir.BzrDir.open('checkout')
329@@ -75,7 +74,6 @@
330 def test_checkout_light_dash_r(self):
331 out, err = self.run_bzr(['checkout','--lightweight', '-r', '-2',
332 'branch', 'checkout'])
333- self.assertNotContainsRe(out, 'Copying history')
334 # the working tree should now be at revision '1' with the content
335 # from 1.
336 result = bzrdir.BzrDir.open('checkout')