Merge lp:~jameinel/bzr/1.15-gc-stacking into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.15-gc-stacking
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2014 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.15-gc-stacking
Reviewer: Andrew Bennetts, Status: Approve
Review via email: mp+6880@code.launchpad.net

This proposal supersedes a proposal from 2009-05-28.

John A Meinel (jameinel) wrote: Posted in a previous version of this proposal

This change enables --development6-rich-root to stack. It ends up including the Repository fallback locking fixes, and a few other code cleanups that we encountered along the way.

It unfortunately adds a somewhat more direct coupling between PackRepository and PackRepository.revisions._index._key_dependencies.

We already had an explicit connection, because get_missing_parent_inventories() was directly accessing that variable. What we added is a reset() of that cache whenever a write group is committed, aborted, or suspended. We felt that this was the 'right thing', and it was also required to fix a test about ghosts.

(We had a test that ghosts aren't filled in, but without resetting the key dependencies, an earlier commit that introduced the ghost would still record that the ghost is missing. Existing Pack fetching would suffer from this as well if it used the Stream code for fetching rather than Pack => Pack.)
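
For reference, the reset amounts to clearing the tracked references at
each write-group state transition. A simplified excerpt of what the
diff below adds to the pack repository (context trimmed, comments added
here for explanation):

    def _abort_write_group(self):
        # Discard tracked key references so that stale 'missing ghost'
        # entries cannot leak into later operations.
        self.revisions._index._key_dependencies.refs.clear()
        self._pack_collection._abort_write_group()

    def _commit_write_group(self):
        self.revisions._index._key_dependencies.refs.clear()
        return self._pack_collection._commit_write_group()

    def suspend_write_group(self):
        tokens = self._pack_collection._suspend_write_group()
        self.revisions._index._key_dependencies.refs.clear()
        self._write_group = None
        return tokens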

This is potentially up for backporting to a 1.15.1 release.

Andrew Bennetts (spiv) wrote: Posted in a previous version of this proposal

It's a shame that you add both "_find_present_inventory_ids" and "_find_present_inventories" to groupcompress_repo.py, but it's not trivial to factor out that duplication. Similarly, around line 950 of that file you have a duplication of the logic of find_parent_ids_of_revisions, but again, reusing that code isn't trivial. Something to clean up in the future, I guess...

In test_sprout_from_stacked_with_short_history in bzrlib/tests/per_repository_reference/test_fetch.py you start with a comment saying "Now copy this ...", which is a bit weird as the first thing in a test. Probably this comment hasn't been updated after you refactored the test? Anyway, please update it.

for record in stream:
    records.append(record.key)
    if record.key == ('a-id', 'A-id'):
        self.assertEqual(''.join(content[:-2]),
                         record.get_bytes_as('fulltext'))
    elif record.key == ('a-id', 'B-id'):
        self.assertEqual(''.join(content[:-1]),
                         record.get_bytes_as('fulltext'))
    elif record.key == ('a-id', 'C-id'):
        self.assertEqual(''.join(content),
                         record.get_bytes_as('fulltext'))
    else:
        self.fail('Unexpected record: %s' % (record.key,))

This is ok, but I think I'd rather:

for record in stream:
    records.append((record.key, record.get_bytes_as('fulltext')))
records.sort()
self.assertEqual(
    [(('a-id', 'A-id'), ''.join(content[:-2])),
     (('a-id', 'B-id'), ''.join(content[:-1])),
     (('a-id', 'C-id'), ''.join(content))],
    records)

This is more compact, avoids any need for conditionals in the test, and will probably give more informative failures.

bzrlib/tests/per_repository_reference/test_initialize.py adds a test with no assert* calls. Is that intentional?

In bzrlib/tests/test_pack_repository.py, test_resume_chk_bytes has a line of unreachable code after a raise statement.

In bzrlib/tests/test_repository.py, is the typo in 'abcdefghijklmnopqrstuvwxzy123456789' meant to be a test to see how attentive your reviewer is? ;)

Other than those, this seems fine to me though.

review: Needs Fixing
John A Meinel (jameinel) wrote: Posted in a previous version of this proposal


Andrew Bennetts wrote:
> Review: Needs Fixing
> It's a shame that you add both "_find_present_inventory_ids" and "_find_present_inventories" to groupcompress_repo.py, but it's not trivial to factor out that duplication. Similarly, around line 950 of that file you have a duplication of the logic of find_parent_ids_of_revisions, but again reusing that code isn't trivial. Something to cleanup in the future I guess...
>

_find_present_inventory_ids and _find_present_inventories are actually
interchangeable; the only difference is calling
self.from_repository._find_present_inventory_ids
rather than
self._find_present_inventories.

I'm glad you caught the duplication.

And for "_find_parent_ids_of_revisions()" it also is available as
self.from_repository....

Mostly because this is GroupCHKStreamSource, which can assume that its
.from_repository is a RepositoryCHK1.

Ultimately, we should probably move those functions onto Repository,
and potentially make them public. I don't really like widening the
Repository API, but since the default implementation works just fine
for all other implementations, it doesn't really impose a burden on
something like SVNRepository.
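
For reference, here is the helper as it lands in groupcompress_repo.py
(adapted from the diff below, with the TODO comment replaced by a
description); hoisting it onto Repository would be these same few lines:

    def _find_parent_ids_of_revisions(self, revision_ids):
        # Collect every parent of the given revisions, then drop the
        # revisions themselves and the NULL_REVISION sentinel, leaving
        # only the parents at the edge of the set.
        parent_map = self.get_parent_map(revision_ids)
        parents = set()
        map(parents.update, parent_map.itervalues())
        parents.difference_update(revision_ids)
        parents.discard(_mod_revision.NULL_REVISION)
        return parents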

> In test_sprout_from_stacked_with_short_history in bzrlib/tests/per_repository_reference/test_fetch.py you start with a comment saying "Now copy this ...", which is a bit weird as the first thing in a test. Probably this comment hasn't been updated after you refactored the test? Anyway, please update it.

Done.

...

> for record in stream:
>     records.append((record.key, record.get_bytes_as('fulltext')))
> records.sort()
> self.assertEqual(
>     [(('a-id', 'A-id'), ''.join(content[:-2])),
>      (('a-id', 'B-id'), ''.join(content[:-1])),
>      (('a-id', 'C-id'), ''.join(content))],
>     records)
>
> Which is more compact and doesn't have any need for conditionals in the test, and will probably give more informative failures.

Done.

>
> bzrlib/tests/per_repository_reference/test_initialize.py adds a test with no assert* calls. Is that intentional?
>

It exercises the code path that was previously broken (as in, the call
would raise an exception).
I can add arbitrary assertions, but the point of the test was to have a
simple call to "initialize_on_transport_ex()" across all repository
formats, remote requests, etc.

I'll add some basic bits, just to make it look like a real test. I'll
even add one that tests that we can initialize all formats over the
smart server.
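
Something along these lines, perhaps. A rough sketch only: the exact
transport setup and assertions in what I land may differ, and the
keyword arguments are from memory of the initialize_on_transport_ex()
API rather than checked against it:

    def test_initialize_on_transport_ex(self):
        # The call itself is the point of the test; it used to raise
        # for some formats. The assertions just keep it from being a
        # bare smoke test.
        t = self.get_transport('repo')
        self.bzrdir_format.initialize_on_transport_ex(t,
            create_prefix=True,
            repo_format_name=self.repository_format.get_format_string())
        # Basic sanity check: we can open what we just created.
        # (Assumes 'from bzrlib import bzrdir' at module scope.)
        made = bzrdir.BzrDir.open_from_transport(t)
        made.open_repository()  # raises NoRepositoryPresent on failure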

> In bzrlib/tests/test_pack_repository.py, test_resume_chk_bytes has a line of unreachable code after a raise statement.
>
> In bzrlib/tests/test_repository.py, is the typo in 'abcdefghijklmnopqrstuvwxzy123456789' meant to be a test to see how attentive your reviewer is? ;)
>
> Other than those, this seems fine to me though.

Fixed.
John
=:->

Andrew Bennetts (spiv) wrote:

Looks good to me now.

review: Approve

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2009-05-28 18:56:55 +0000
3+++ NEWS 2009-05-29 10:35:20 +0000
4@@ -1,6 +1,6 @@
5-====================
6+####################
7 Bazaar Release Notes
8-====================
9+####################
10
11
12 .. contents:: List of Releases
13@@ -25,12 +25,22 @@
14
15 * ``bzr diff`` is now faster on large trees. (Ian Clatworthy)
16
17+* ``--development6-rich-root`` can now stack. (Modulo some smart-server
18+ bugs with stacking and non default formats.)
19+ (John Arbash Meinel, #373455)
20+
21+
22 Bug Fixes
23 *********
24
25 * Better message in ``bzr add`` output suggesting using ``bzr ignored`` to
26 see which files can also be added. (Jason Spashett, #76616)
27
28+* Clarify the rules for locking and fallback repositories. Fix bugs in how
29+ ``RemoteRepository`` was handling fallbacks along with the
30+ ``_real_repository``. (Andrew Bennetts, John Arbash Meinel, #375496)
31+
32+
33 Documentation
34 *************
35
36@@ -76,6 +86,15 @@
37 New Features
38 ************
39
40+* New command ``bzr dpush`` that can push changes to foreign
41+ branches (svn, git) without setting custom bzr-specific metadata.
42+ (Jelmer Vernooij)
43+
44+* The new development format ``--development6-rich-root`` now supports
45+ stacking. We chose not to use a new format marker, since old clients
46+ will just fail to open stacked branches, the same as if we used a new
47+ format flag. (John Arbash Meinel, #373455)
48+
49 * Plugins can now define their own annotation tie-breaker when two revisions
50 introduce the exact same line. See ``bzrlib.annotate._break_annotation_tie``
51 Be aware though that this is temporary, private (as indicated by the leading
52
53=== modified file 'bzrlib/branch.py'
54--- bzrlib/branch.py 2009-05-26 20:32:34 +0000
55+++ bzrlib/branch.py 2009-05-29 10:35:20 +0000
56@@ -101,13 +101,9 @@
57 def _open_hook(self):
58 """Called by init to allow simpler extension of the base class."""
59
60- def _activate_fallback_location(self, url, lock_style):
61+ def _activate_fallback_location(self, url):
62 """Activate the branch/repository from url as a fallback repository."""
63 repo = self._get_fallback_repository(url)
64- if lock_style == 'write':
65- repo.lock_write()
66- elif lock_style == 'read':
67- repo.lock_read()
68 self.repository.add_fallback_repository(repo)
69
70 def break_lock(self):
71@@ -656,7 +652,7 @@
72 self.repository.fetch(source_repository, revision_id,
73 find_ghosts=True)
74 else:
75- self._activate_fallback_location(url, 'write')
76+ self._activate_fallback_location(url)
77 # write this out after the repository is stacked to avoid setting a
78 # stacked config that doesn't work.
79 self._set_config_location('stacked_on_location', url)
80@@ -2370,7 +2366,7 @@
81 raise AssertionError(
82 "'transform_fallback_location' hook %s returned "
83 "None, not a URL." % hook_name)
84- self._activate_fallback_location(url, None)
85+ self._activate_fallback_location(url)
86
87 def __init__(self, *args, **kwargs):
88 self._ignore_fallbacks = kwargs.get('ignore_fallbacks', False)
89
90=== modified file 'bzrlib/groupcompress.py'
91--- bzrlib/groupcompress.py 2009-05-25 19:04:59 +0000
92+++ bzrlib/groupcompress.py 2009-05-29 10:35:20 +0000
93@@ -31,13 +31,13 @@
94 diff,
95 errors,
96 graph as _mod_graph,
97+ knit,
98 osutils,
99 pack,
100 patiencediff,
101 trace,
102 )
103 from bzrlib.graph import Graph
104-from bzrlib.knit import _DirectPackAccess
105 from bzrlib.btree_index import BTreeBuilder
106 from bzrlib.lru_cache import LRUSizeCache
107 from bzrlib.tsort import topo_sort
108@@ -911,7 +911,7 @@
109 writer.begin()
110 index = _GCGraphIndex(graph_index, lambda:True, parents=parents,
111 add_callback=graph_index.add_nodes)
112- access = _DirectPackAccess({})
113+ access = knit._DirectPackAccess({})
114 access.set_writer(writer, graph_index, (transport, 'newpack'))
115 result = GroupCompressVersionedFiles(index, access, delta)
116 result.stream = stream
117@@ -1547,7 +1547,7 @@
118 """Mapper from GroupCompressVersionedFiles needs into GraphIndex storage."""
119
120 def __init__(self, graph_index, is_locked, parents=True,
121- add_callback=None):
122+ add_callback=None, track_external_parent_refs=False):
123 """Construct a _GCGraphIndex on a graph_index.
124
125 :param graph_index: An implementation of bzrlib.index.GraphIndex.
126@@ -1558,12 +1558,19 @@
127 :param add_callback: If not None, allow additions to the index and call
128 this callback with a list of added GraphIndex nodes:
129 [(node, value, node_refs), ...]
130+ :param track_external_parent_refs: As keys are added, keep track of the
131+ keys they reference, so that we can query get_missing_parents(),
132+ etc.
133 """
134 self._add_callback = add_callback
135 self._graph_index = graph_index
136 self._parents = parents
137 self.has_graph = parents
138 self._is_locked = is_locked
139+ if track_external_parent_refs:
140+ self._key_dependencies = knit._KeyRefs()
141+ else:
142+ self._key_dependencies = None
143
144 def add_records(self, records, random_id=False):
145 """Add multiple records to the index.
146@@ -1614,6 +1621,11 @@
147 for key, (value, node_refs) in keys.iteritems():
148 result.append((key, value))
149 records = result
150+ key_dependencies = self._key_dependencies
151+ if key_dependencies is not None and self._parents:
152+ for key, value, refs in records:
153+ parents = refs[0]
154+ key_dependencies.add_references(key, parents)
155 self._add_callback(records)
156
157 def _check_read(self):
158@@ -1668,6 +1680,14 @@
159 result[node[1]] = None
160 return result
161
162+ def get_missing_parents(self):
163+ """Return the keys of missing parents."""
164+ # Copied from _KnitGraphIndex.get_missing_parents
165+ # We may have false positives, so filter those out.
166+ self._key_dependencies.add_keys(
167+ self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
168+ return frozenset(self._key_dependencies.get_unsatisfied_refs())
169+
170 def get_build_details(self, keys):
171 """Get the various build details for keys.
172
173@@ -1719,6 +1739,23 @@
174 delta_end = int(bits[3])
175 return node[0], start, stop, basis_end, delta_end
176
177+ def scan_unvalidated_index(self, graph_index):
178+ """Inform this _GCGraphIndex that there is an unvalidated index.
179+
180+ This allows this _GCGraphIndex to keep track of any missing
181+ compression parents we may want to have filled in to make those
182+ indices valid.
183+
184+ :param graph_index: A GraphIndex
185+ """
186+ if self._key_dependencies is not None:
187+ # Add parent refs from graph_index (and discard parent refs that
188+ # the graph_index has).
189+ add_refs = self._key_dependencies.add_references
190+ for node in graph_index.iter_all_entries():
191+ add_refs(node[1], node[3][0])
192+
193+
194
195 from bzrlib._groupcompress_py import (
196 apply_delta,
197
198=== modified file 'bzrlib/inventory.py'
199--- bzrlib/inventory.py 2009-04-10 12:11:58 +0000
200+++ bzrlib/inventory.py 2009-05-29 10:35:20 +0000
201@@ -1547,11 +1547,9 @@
202 def _get_mutable_inventory(self):
203 """See CommonInventory._get_mutable_inventory."""
204 entries = self.iter_entries()
205- if self.root_id is not None:
206- entries.next()
207- inv = Inventory(self.root_id, self.revision_id)
208+ inv = Inventory(None, self.revision_id)
209 for path, inv_entry in entries:
210- inv.add(inv_entry)
211+ inv.add(inv_entry.copy())
212 return inv
213
214 def create_by_apply_delta(self, inventory_delta, new_revision_id,
215
216=== modified file 'bzrlib/knit.py'
217--- bzrlib/knit.py 2009-05-25 19:04:59 +0000
218+++ bzrlib/knit.py 2009-05-29 10:35:20 +0000
219@@ -2882,6 +2882,8 @@
220
221 def get_missing_parents(self):
222 """Return the keys of missing parents."""
223+ # If updating this, you should also update
224+ # groupcompress._GCGraphIndex.get_missing_parents
225 # We may have false positives, so filter those out.
226 self._key_dependencies.add_keys(
227 self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
228
229=== modified file 'bzrlib/remote.py'
230--- bzrlib/remote.py 2009-05-10 23:45:33 +0000
231+++ bzrlib/remote.py 2009-05-29 10:35:21 +0000
232@@ -670,9 +670,10 @@
233 self._ensure_real()
234 return self._real_repository.suspend_write_group()
235
236- def get_missing_parent_inventories(self):
237+ def get_missing_parent_inventories(self, check_for_missing_texts=True):
238 self._ensure_real()
239- return self._real_repository.get_missing_parent_inventories()
240+ return self._real_repository.get_missing_parent_inventories(
241+ check_for_missing_texts=check_for_missing_texts)
242
243 def _ensure_real(self):
244 """Ensure that there is a _real_repository set.
245@@ -860,10 +861,10 @@
246 self._unstacked_provider.enable_cache(cache_misses=True)
247 if self._real_repository is not None:
248 self._real_repository.lock_read()
249+ for repo in self._fallback_repositories:
250+ repo.lock_read()
251 else:
252 self._lock_count += 1
253- for repo in self._fallback_repositories:
254- repo.lock_read()
255
256 def _remote_lock_write(self, token):
257 path = self.bzrdir._path_for_remote_call(self._client)
258@@ -901,13 +902,13 @@
259 self._lock_count = 1
260 cache_misses = self._real_repository is None
261 self._unstacked_provider.enable_cache(cache_misses=cache_misses)
262+ for repo in self._fallback_repositories:
263+ # Writes don't affect fallback repos
264+ repo.lock_read()
265 elif self._lock_mode == 'r':
266 raise errors.ReadOnlyError(self)
267 else:
268 self._lock_count += 1
269- for repo in self._fallback_repositories:
270- # Writes don't affect fallback repos
271- repo.lock_read()
272 return self._lock_token or None
273
274 def leave_lock_in_place(self):
275@@ -1015,6 +1016,10 @@
276 self._lock_token = None
277 if not self._leave_lock:
278 self._unlock(old_token)
279+ # Fallbacks are always 'lock_read()' so we don't pay attention to
280+ # self._leave_lock
281+ for repo in self._fallback_repositories:
282+ repo.unlock()
283
284 def break_lock(self):
285 # should hand off to the network
286@@ -1084,6 +1089,11 @@
287 # We need to accumulate additional repositories here, to pass them in
288 # on various RPC's.
289 #
290+ if self.is_locked():
291+ # We will call fallback.unlock() when we transition to the unlocked
292+ # state, so always add a lock here. If a caller passes us a locked
293+ # repository, they are responsible for unlocking it later.
294+ repository.lock_read()
295 self._fallback_repositories.append(repository)
296 # If self._real_repository was parameterised already (e.g. because a
297 # _real_branch had its get_stacked_on_url method called), then the
298@@ -1971,7 +1981,7 @@
299 except (errors.NotStacked, errors.UnstackableBranchFormat,
300 errors.UnstackableRepositoryFormat), e:
301 return
302- self._activate_fallback_location(fallback_url, None)
303+ self._activate_fallback_location(fallback_url)
304
305 def _get_config(self):
306 return RemoteBranchConfig(self)
307
308=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
309--- bzrlib/repofmt/groupcompress_repo.py 2009-05-26 13:12:59 +0000
310+++ bzrlib/repofmt/groupcompress_repo.py 2009-05-29 10:35:21 +0000
311@@ -51,6 +51,7 @@
312 PackRootCommitBuilder,
313 RepositoryPackCollection,
314 RepositoryFormatPack,
315+ ResumedPack,
316 Packer,
317 )
318
319@@ -163,7 +164,21 @@
320 have deltas based on a fallback repository.
321 (See <https://bugs.launchpad.net/bzr/+bug/288751>)
322 """
323- # Groupcompress packs don't have any external references
324+ # Groupcompress packs don't have any external references, arguably CHK
325+ # pages have external references, but we cannot 'cheaply' determine
326+ # them without actually walking all of the chk pages.
327+
328+
329+class ResumedGCPack(ResumedPack):
330+
331+ def _check_references(self):
332+ """Make sure our external compression parents are present."""
333+ # See GCPack._check_references for why this is empty
334+
335+ def _get_external_refs(self, index):
336+ # GC repositories don't have compression parents external to a given
337+ # pack file
338+ return set()
339
340
341 class GCCHKPacker(Packer):
342@@ -540,6 +555,7 @@
343 class GCRepositoryPackCollection(RepositoryPackCollection):
344
345 pack_factory = GCPack
346+ resumed_pack_factory = ResumedGCPack
347
348 def _already_packed(self):
349 """Is the collection already packed?"""
350@@ -609,7 +625,8 @@
351 self.revisions = GroupCompressVersionedFiles(
352 _GCGraphIndex(self._pack_collection.revision_index.combined_index,
353 add_callback=self._pack_collection.revision_index.add_callback,
354- parents=True, is_locked=self.is_locked),
355+ parents=True, is_locked=self.is_locked,
356+ track_external_parent_refs=True),
357 access=self._pack_collection.revision_index.data_access,
358 delta=False)
359 self.signatures = GroupCompressVersionedFiles(
360@@ -719,52 +736,21 @@
361 # make it raise to trap naughty direct users.
362 raise NotImplementedError(self._iter_inventory_xmls)
363
364- def _find_revision_outside_set(self, revision_ids):
365- revision_set = frozenset(revision_ids)
366- for revid in revision_ids:
367- parent_ids = self.get_parent_map([revid]).get(revid, ())
368- for parent in parent_ids:
369- if parent in revision_set:
370- # Parent is not outside the set
371- continue
372- if parent not in self.get_parent_map([parent]):
373- # Parent is a ghost
374- continue
375- return parent
376- return _mod_revision.NULL_REVISION
377+ def _find_parent_ids_of_revisions(self, revision_ids):
378+ # TODO: we probably want to make this a helper that other code can get
379+ # at
380+ parent_map = self.get_parent_map(revision_ids)
381+ parents = set()
382+ map(parents.update, parent_map.itervalues())
383+ parents.difference_update(revision_ids)
384+ parents.discard(_mod_revision.NULL_REVISION)
385+ return parents
386
387- def _find_file_keys_to_fetch(self, revision_ids, pb):
388- rich_root = self.supports_rich_root()
389- revision_outside_set = self._find_revision_outside_set(revision_ids)
390- if revision_outside_set == _mod_revision.NULL_REVISION:
391- uninteresting_root_keys = set()
392- else:
393- uninteresting_inv = self.get_inventory(revision_outside_set)
394- uninteresting_root_keys = set([uninteresting_inv.id_to_entry.key()])
395- interesting_root_keys = set()
396- for idx, inv in enumerate(self.iter_inventories(revision_ids)):
397- interesting_root_keys.add(inv.id_to_entry.key())
398- revision_ids = frozenset(revision_ids)
399- file_id_revisions = {}
400- bytes_to_info = inventory.CHKInventory._bytes_to_utf8name_key
401- for record, items in chk_map.iter_interesting_nodes(self.chk_bytes,
402- interesting_root_keys, uninteresting_root_keys,
403- pb=pb):
404- # This is cheating a bit to use the last grabbed 'inv', but it
405- # works
406- for name, bytes in items:
407- (name_utf8, file_id, revision_id) = bytes_to_info(bytes)
408- if not rich_root and name_utf8 == '':
409- continue
410- if revision_id in revision_ids:
411- # Would we rather build this up into file_id => revision
412- # maps?
413- try:
414- file_id_revisions[file_id].add(revision_id)
415- except KeyError:
416- file_id_revisions[file_id] = set([revision_id])
417- for file_id, revisions in file_id_revisions.iteritems():
418- yield ('file', file_id, revisions)
419+ def _find_present_inventory_ids(self, revision_ids):
420+ keys = [(r,) for r in revision_ids]
421+ parent_map = self.inventories.get_parent_map(keys)
422+ present_inventory_ids = set(k[-1] for k in parent_map)
423+ return present_inventory_ids
424
425 def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None):
426 """Find the file ids and versions affected by revisions.
427@@ -776,23 +762,39 @@
428 revision_ids. Each altered file-ids has the exact revision_ids that
429 altered it listed explicitly.
430 """
431- rich_roots = self.supports_rich_root()
432- result = {}
433+ rich_root = self.supports_rich_root()
434+ bytes_to_info = inventory.CHKInventory._bytes_to_utf8name_key
435+ file_id_revisions = {}
436 pb = ui.ui_factory.nested_progress_bar()
437 try:
438- total = len(revision_ids)
439- for pos, inv in enumerate(self.iter_inventories(revision_ids)):
440- pb.update("Finding text references", pos, total)
441- for entry in inv.iter_just_entries():
442- if entry.revision != inv.revision_id:
443- continue
444- if not rich_roots and entry.file_id == inv.root_id:
445- continue
446- alterations = result.setdefault(entry.file_id, set([]))
447- alterations.add(entry.revision)
448- return result
449+ parent_ids = self._find_parent_ids_of_revisions(revision_ids)
450+ present_parent_inv_ids = self._find_present_inventory_ids(parent_ids)
451+ uninteresting_root_keys = set()
452+ interesting_root_keys = set()
453+ inventories_to_read = set(present_parent_inv_ids)
454+ inventories_to_read.update(revision_ids)
455+ for inv in self.iter_inventories(inventories_to_read):
456+ entry_chk_root_key = inv.id_to_entry.key()
457+ if inv.revision_id in present_parent_inv_ids:
458+ uninteresting_root_keys.add(entry_chk_root_key)
459+ else:
460+ interesting_root_keys.add(entry_chk_root_key)
461+
462+ chk_bytes = self.chk_bytes
463+ for record, items in chk_map.iter_interesting_nodes(chk_bytes,
464+ interesting_root_keys, uninteresting_root_keys,
465+ pb=pb):
466+ for name, bytes in items:
467+ (name_utf8, file_id, revision_id) = bytes_to_info(bytes)
468+ if not rich_root and name_utf8 == '':
469+ continue
470+ try:
471+ file_id_revisions[file_id].add(revision_id)
472+ except KeyError:
473+ file_id_revisions[file_id] = set([revision_id])
474 finally:
475 pb.finished()
476+ return file_id_revisions
477
478 def find_text_key_references(self):
479 """Find the text key references within the repository.
480@@ -843,12 +845,6 @@
481 return GroupCHKStreamSource(self, to_format)
482 return super(CHKInventoryRepository, self)._get_source(to_format)
483
484- def suspend_write_group(self):
485- raise errors.UnsuspendableWriteGroup(self)
486-
487- def _resume_write_group(self, tokens):
488- raise errors.UnsuspendableWriteGroup(self)
489-
490
491 class GroupCHKStreamSource(repository.StreamSource):
492 """Used when both the source and target repo are GroupCHK repos."""
493@@ -861,7 +857,7 @@
494 self._chk_id_roots = None
495 self._chk_p_id_roots = None
496
497- def _get_filtered_inv_stream(self):
498+ def _get_inventory_stream(self, inventory_keys):
499 """Get a stream of inventory texts.
500
501 When this function returns, self._chk_id_roots and self._chk_p_id_roots
502@@ -873,7 +869,7 @@
503 id_roots_set = set()
504 p_id_roots_set = set()
505 source_vf = self.from_repository.inventories
506- stream = source_vf.get_record_stream(self._revision_keys,
507+ stream = source_vf.get_record_stream(inventory_keys,
508 'groupcompress', True)
509 for record in stream:
510 bytes = record.get_bytes_as('fulltext')
511@@ -897,16 +893,29 @@
512 p_id_roots_set.clear()
513 return ('inventories', _filtered_inv_stream())
514
515- def _get_filtered_chk_streams(self, excluded_keys):
516+ def _find_present_inventories(self, revision_ids):
517+ revision_keys = [(r,) for r in revision_ids]
518+ inventories = self.from_repository.inventories
519+ present_inventories = inventories.get_parent_map(revision_keys)
520+ return [p[-1] for p in present_inventories]
521+
522+ def _get_filtered_chk_streams(self, excluded_revision_ids):
523 self._text_keys = set()
524- excluded_keys.discard(_mod_revision.NULL_REVISION)
525- if not excluded_keys:
526+ excluded_revision_ids.discard(_mod_revision.NULL_REVISION)
527+ if not excluded_revision_ids:
528 uninteresting_root_keys = set()
529 uninteresting_pid_root_keys = set()
530 else:
531+ # filter out any excluded revisions whose inventories are not
532+ # actually present
533+ # TODO: Update Repository.iter_inventories() to add
534+ # ignore_missing=True
535+ present_ids = self.from_repository._find_present_inventory_ids(
536+ excluded_revision_ids)
537+ present_ids = self._find_present_inventories(excluded_revision_ids)
538 uninteresting_root_keys = set()
539 uninteresting_pid_root_keys = set()
540- for inv in self.from_repository.iter_inventories(excluded_keys):
541+ for inv in self.from_repository.iter_inventories(present_ids):
542 uninteresting_root_keys.add(inv.id_to_entry.key())
543 uninteresting_pid_root_keys.add(
544 inv.parent_id_basename_to_file_id.key())
545@@ -922,12 +931,16 @@
546 self._text_keys.add((file_id, revision_id))
547 if record is not None:
548 yield record
549+ # Consumed
550+ self._chk_id_roots = None
551 yield 'chk_bytes', _filter_id_to_entry()
552 def _get_parent_id_basename_to_file_id_pages():
553 for record, items in chk_map.iter_interesting_nodes(chk_bytes,
554 self._chk_p_id_roots, uninteresting_pid_root_keys):
555 if record is not None:
556 yield record
557+ # Consumed
558+ self._chk_p_id_roots = None
559 yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages()
560
561 def _get_text_stream(self):
562@@ -943,18 +956,43 @@
563 for stream_info in self._fetch_revision_texts(revision_ids):
564 yield stream_info
565 self._revision_keys = [(rev_id,) for rev_id in revision_ids]
566- yield self._get_filtered_inv_stream()
567- # The keys to exclude are part of the search recipe
568- _, _, exclude_keys, _ = search.get_recipe()
569- for stream_info in self._get_filtered_chk_streams(exclude_keys):
570+ yield self._get_inventory_stream(self._revision_keys)
571+ # TODO: The keys to exclude might be part of the search recipe
572+ # For now, exclude all parents that are at the edge of ancestry, for
573+ # which we have inventories
574+ from_repo = self.from_repository
575+ parent_ids = from_repo._find_parent_ids_of_revisions(revision_ids)
576+ for stream_info in self._get_filtered_chk_streams(parent_ids):
577 yield stream_info
578 yield self._get_text_stream()
579
580+ def get_stream_for_missing_keys(self, missing_keys):
581+ # missing keys can only occur when we are byte copying and not
582+ # translating (because translation means we don't send
583+ # unreconstructable deltas ever).
584+ missing_inventory_keys = set()
585+ for key in missing_keys:
586+ if key[0] != 'inventories':
587+ raise AssertionError('The only missing keys we should'
588+ ' be filling in are inventory keys, not %s'
589+ % (key[0],))
590+ missing_inventory_keys.add(key[1:])
591+ if self._chk_id_roots or self._chk_p_id_roots:
592+ raise AssertionError('Cannot call get_stream_for_missing_keys'
593+ ' untill all of get_stream() has been consumed.')
594+ # Yield the inventory stream, so we can find the chk stream
595+ yield self._get_inventory_stream(missing_inventory_keys)
596+ # We use the empty set for excluded_revision_ids, to make it clear that
597+ # we want to transmit all referenced chk pages.
598+ for stream_info in self._get_filtered_chk_streams(set()):
599+ yield stream_info
600+
601
602 class RepositoryFormatCHK1(RepositoryFormatPack):
603 """A hashed CHK+group compress pack repository."""
604
605 repository_class = CHKInventoryRepository
606+ supports_external_lookups = True
607 supports_chks = True
608 # For right now, setting this to True gives us InterModel1And2 rather
609 # than InterDifferingSerializer
610
611=== modified file 'bzrlib/repofmt/pack_repo.py'
612--- bzrlib/repofmt/pack_repo.py 2009-04-27 23:14:00 +0000
613+++ bzrlib/repofmt/pack_repo.py 2009-05-29 10:35:21 +0000
614@@ -268,10 +268,11 @@
615
616 def __init__(self, name, revision_index, inventory_index, text_index,
617 signature_index, upload_transport, pack_transport, index_transport,
618- pack_collection):
619+ pack_collection, chk_index=None):
620 """Create a ResumedPack object."""
621 ExistingPack.__init__(self, pack_transport, name, revision_index,
622- inventory_index, text_index, signature_index)
623+ inventory_index, text_index, signature_index,
624+ chk_index=chk_index)
625 self.upload_transport = upload_transport
626 self.index_transport = index_transport
627 self.index_sizes = [None, None, None, None]
628@@ -281,6 +282,9 @@
629 ('text', text_index),
630 ('signature', signature_index),
631 ]
632+ if chk_index is not None:
633+ indices.append(('chk', chk_index))
634+ self.index_sizes.append(None)
635 for index_type, index in indices:
636 offset = self.index_offset(index_type)
637 self.index_sizes[offset] = index._size
638@@ -301,6 +305,8 @@
639 self.upload_transport.delete(self.file_name())
640 indices = [self.revision_index, self.inventory_index, self.text_index,
641 self.signature_index]
642+ if self.chk_index is not None:
643+ indices.append(self.chk_index)
644 for index in indices:
645 index._transport.delete(index._name)
646
647@@ -308,7 +314,10 @@
648 self._check_references()
649 new_name = '../packs/' + self.file_name()
650 self.upload_transport.rename(self.file_name(), new_name)
651- for index_type in ['revision', 'inventory', 'text', 'signature']:
652+ index_types = ['revision', 'inventory', 'text', 'signature']
653+ if self.chk_index is not None:
654+ index_types.append('chk')
655+ for index_type in index_types:
656 old_name = self.index_name(index_type, self.name)
657 new_name = '../indices/' + old_name
658 self.upload_transport.rename(old_name, new_name)
659@@ -316,6 +325,11 @@
660 self._state = 'finished'
661
662 def _get_external_refs(self, index):
663+ """Return compression parents for this index that are not present.
664+
665+ This returns any compression parents that are referenced by this index,
666+ which are not contained *in* this index. They may be present elsewhere.
667+ """
668 return index.external_references(1)
669
670
671@@ -1352,6 +1366,7 @@
672 """
673
674 pack_factory = NewPack
675+ resumed_pack_factory = ResumedPack
676
677 def __init__(self, repo, transport, index_transport, upload_transport,
678 pack_transport, index_builder_class, index_class,
679@@ -1680,9 +1695,14 @@
680 inv_index = self._make_index(name, '.iix', resume=True)
681 txt_index = self._make_index(name, '.tix', resume=True)
682 sig_index = self._make_index(name, '.six', resume=True)
683- result = ResumedPack(name, rev_index, inv_index, txt_index,
684- sig_index, self._upload_transport, self._pack_transport,
685- self._index_transport, self)
686+ if self.chk_index is not None:
687+ chk_index = self._make_index(name, '.cix', resume=True)
688+ else:
689+ chk_index = None
690+ result = self.resumed_pack_factory(name, rev_index, inv_index,
691+ txt_index, sig_index, self._upload_transport,
692+ self._pack_transport, self._index_transport, self,
693+ chk_index=chk_index)
694 except errors.NoSuchFile, e:
695 raise errors.UnresumableWriteGroup(self.repo, [name], str(e))
696 self.add_pack_to_memory(result)
697@@ -1809,14 +1829,11 @@
698 def reset(self):
699 """Clear all cached data."""
700 # cached revision data
701- self.repo._revision_knit = None
702 self.revision_index.clear()
703 # cached signature data
704- self.repo._signature_knit = None
705 self.signature_index.clear()
706 # cached file text data
707 self.text_index.clear()
708- self.repo._text_knit = None
709 # cached inventory data
710 self.inventory_index.clear()
711 # cached chk data
712@@ -2035,7 +2052,6 @@
713 except KeyError:
714 pass
715 del self._resumed_packs[:]
716- self.repo._text_knit = None
717
718 def _remove_resumed_pack_indices(self):
719 for resumed_pack in self._resumed_packs:
720@@ -2081,7 +2097,6 @@
721 # when autopack takes no steps, the names list is still
722 # unsaved.
723 self._save_pack_names()
724- self.repo._text_knit = None
725
726 def _suspend_write_group(self):
727 tokens = [pack.name for pack in self._resumed_packs]
728@@ -2095,7 +2110,6 @@
729 self._new_pack.abort()
730 self._new_pack = None
731 self._remove_resumed_pack_indices()
732- self.repo._text_knit = None
733 return tokens
734
735 def _resume_write_group(self, tokens):
736@@ -2202,6 +2216,7 @@
737 % (self._format, self.bzrdir.transport.base))
738
739 def _abort_write_group(self):
740+ self.revisions._index._key_dependencies.refs.clear()
741 self._pack_collection._abort_write_group()
742
743 def _find_inconsistent_revision_parents(self):
744@@ -2262,11 +2277,13 @@
745 self._pack_collection._start_write_group()
746
747 def _commit_write_group(self):
748+ self.revisions._index._key_dependencies.refs.clear()
749 return self._pack_collection._commit_write_group()
750
751 def suspend_write_group(self):
752 # XXX check self._write_group is self.get_transaction()?
753 tokens = self._pack_collection._suspend_write_group()
754+ self.revisions._index._key_dependencies.refs.clear()
755 self._write_group = None
756 return tokens
757
758@@ -2295,10 +2312,10 @@
759 self._write_lock_count += 1
760 if self._write_lock_count == 1:
761 self._transaction = transactions.WriteTransaction()
762+ if not locked:
763 for repo in self._fallback_repositories:
764 # Writes don't affect fallback repos
765 repo.lock_read()
766- if not locked:
767 self._refresh_data()
768
769 def lock_read(self):
770@@ -2307,10 +2324,9 @@
771 self._write_lock_count += 1
772 else:
773 self.control_files.lock_read()
774+ if not locked:
775 for repo in self._fallback_repositories:
776- # Writes don't affect fallback repos
777 repo.lock_read()
778- if not locked:
779 self._refresh_data()
780
781 def leave_lock_in_place(self):
782@@ -2356,10 +2372,10 @@
783 transaction = self._transaction
784 self._transaction = None
785 transaction.finish()
786- for repo in self._fallback_repositories:
787- repo.unlock()
788 else:
789 self.control_files.unlock()
790+
791+ if not self.is_locked():
792 for repo in self._fallback_repositories:
793 repo.unlock()
794
795
796=== modified file 'bzrlib/repository.py'
797--- bzrlib/repository.py 2009-05-12 04:54:04 +0000
798+++ bzrlib/repository.py 2009-05-29 10:35:21 +0000
799@@ -969,6 +969,10 @@
800 """
801 if not self._format.supports_external_lookups:
802 raise errors.UnstackableRepositoryFormat(self._format, self.base)
803+ if self.is_locked():
804+ # This repository will call fallback.unlock() when we transition to
805+ # the unlocked state, so we make sure to increment the lock count
806+ repository.lock_read()
807 self._check_fallback_repository(repository)
808 self._fallback_repositories.append(repository)
809 self.texts.add_fallback_versioned_files(repository.texts)
810@@ -1240,19 +1244,19 @@
811 """
812 locked = self.is_locked()
813 result = self.control_files.lock_write(token=token)
814- for repo in self._fallback_repositories:
815- # Writes don't affect fallback repos
816- repo.lock_read()
817 if not locked:
818+ for repo in self._fallback_repositories:
819+ # Writes don't affect fallback repos
820+ repo.lock_read()
821 self._refresh_data()
822 return result
823
824 def lock_read(self):
825 locked = self.is_locked()
826 self.control_files.lock_read()
827- for repo in self._fallback_repositories:
828- repo.lock_read()
829 if not locked:
830+ for repo in self._fallback_repositories:
831+ repo.lock_read()
832 self._refresh_data()
833
834 def get_physical_lock_status(self):
835@@ -1424,7 +1428,7 @@
836 def suspend_write_group(self):
837 raise errors.UnsuspendableWriteGroup(self)
838
839- def get_missing_parent_inventories(self):
840+ def get_missing_parent_inventories(self, check_for_missing_texts=True):
841 """Return the keys of missing inventory parents for revisions added in
842 this write group.
843
844@@ -1439,7 +1443,7 @@
845 return set()
846 if not self.is_in_write_group():
847 raise AssertionError('not in a write group')
848-
849+
850 # XXX: We assume that every added revision already has its
851 # corresponding inventory, so we only check for parent inventories that
852 # might be missing, rather than all inventories.
853@@ -1448,9 +1452,12 @@
854 unstacked_inventories = self.inventories._index
855 present_inventories = unstacked_inventories.get_parent_map(
856 key[-1:] for key in parents)
857- if len(parents.difference(present_inventories)) == 0:
858+ parents.difference_update(present_inventories)
859+ if len(parents) == 0:
860 # No missing parent inventories.
861 return set()
862+ if not check_for_missing_texts:
863+ return set(('inventories', rev_id) for (rev_id,) in parents)
864 # Ok, now we have a list of missing inventories. But these only matter
865 # if the inventories that reference them are missing some texts they
866 # appear to introduce.
867@@ -1577,8 +1584,8 @@
868 self.control_files.unlock()
869 if self.control_files._lock_count == 0:
870 self._inventory_entry_cache.clear()
871- for repo in self._fallback_repositories:
872- repo.unlock()
873+ for repo in self._fallback_repositories:
874+ repo.unlock()
875
876 @needs_read_lock
877 def clone(self, a_bzrdir, revision_id=None):
878@@ -4003,18 +4010,20 @@
879 try:
880 if resume_tokens:
881 self.target_repo.resume_write_group(resume_tokens)
882+ is_resume = True
883 else:
884 self.target_repo.start_write_group()
885+ is_resume = False
886 try:
887 # locked_insert_stream performs a commit|suspend.
888- return self._locked_insert_stream(stream, src_format)
889+ return self._locked_insert_stream(stream, src_format, is_resume)
890 except:
891 self.target_repo.abort_write_group(suppress_errors=True)
892 raise
893 finally:
894 self.target_repo.unlock()
895
896- def _locked_insert_stream(self, stream, src_format):
897+ def _locked_insert_stream(self, stream, src_format, is_resume):
898 to_serializer = self.target_repo._format._serializer
899 src_serializer = src_format._serializer
900 new_pack = None
901@@ -4070,14 +4079,18 @@
902 if new_pack is not None:
903 new_pack._write_data('', flush=True)
904 # Find all the new revisions (including ones from resume_tokens)
905- missing_keys = self.target_repo.get_missing_parent_inventories()
906+ missing_keys = self.target_repo.get_missing_parent_inventories(
907+ check_for_missing_texts=is_resume)
908 try:
909 for prefix, versioned_file in (
910 ('texts', self.target_repo.texts),
911 ('inventories', self.target_repo.inventories),
912 ('revisions', self.target_repo.revisions),
913 ('signatures', self.target_repo.signatures),
914+ ('chk_bytes', self.target_repo.chk_bytes),
915 ):
916+ if versioned_file is None:
917+ continue
918 missing_keys.update((prefix,) + key for key in
919 versioned_file.get_missing_compression_parent_keys())
920 except NotImplementedError:
921@@ -4230,6 +4243,7 @@
922 keys['texts'] = set()
923 keys['revisions'] = set()
924 keys['inventories'] = set()
925+ keys['chk_bytes'] = set()
926 keys['signatures'] = set()
927 for key in missing_keys:
928 keys[key[0]].add(key[1:])
929@@ -4242,6 +4256,13 @@
930 keys['revisions'],))
931 for substream_kind, keys in keys.iteritems():
932 vf = getattr(self.from_repository, substream_kind)
933+ if vf is None and keys:
934+ raise AssertionError(
935+ "cannot fill in keys for a versioned file we don't"
936+ " have: %s needs %s" % (substream_kind, keys))
937+ if not keys:
938+ # No need to stream something we don't have
939+ continue
940 # Ask for full texts always so that we don't need more round trips
941 # after this stream.
942 stream = vf.get_record_stream(keys,
943
944=== modified file 'bzrlib/tests/per_repository/test_fileid_involved.py'
945--- bzrlib/tests/per_repository/test_fileid_involved.py 2009-03-23 14:59:43 +0000
946+++ bzrlib/tests/per_repository/test_fileid_involved.py 2009-05-29 10:35:21 +0000
947@@ -1,4 +1,4 @@
948-# Copyright (C) 2005 Canonical Ltd
949+# Copyright (C) 2005, 2009 Canonical Ltd
950 #
951 # This program is free software; you can redistribute it and/or modify
952 # it under the terms of the GNU General Public License as published by
953@@ -16,7 +16,12 @@
954
955 import os
956 import sys
957+import time
958
959+from bzrlib import (
960+ revision as _mod_revision,
961+ tests,
962+ )
963 from bzrlib.errors import IllegalPath, NonAsciiRevisionId
964 from bzrlib.tests import TestSkipped
965 from bzrlib.tests.per_repository.test_repository import TestCaseWithRepository
966@@ -49,11 +54,11 @@
967 super(TestFileIdInvolved, self).setUp()
968 # create three branches, and merge it
969 #
970- # /-->J ------>K (branch2)
971- # / \
972- # A ---> B --->C ---->D->G (main)
973- # \ / /
974- # \---> E---/----> F (branch1)
975+ # ,-->J------>K (branch2)
976+ # / \
977+ # A --->B --->C---->D-->G (main)
978+ # \ / /
979+ # '--->E---+---->F (branch1)
980
981 # A changes:
982 # B changes: 'a-file-id-2006-01-01-abcd'
983@@ -137,8 +142,6 @@
984 self.branch = main_branch
985
986 def test_fileids_altered_between_two_revs(self):
987- def foo(old, new):
988- print set(self.branch.repository.get_ancestry(new)).difference(set(self.branch.repository.get_ancestry(old)))
989 self.branch.lock_read()
990 self.addCleanup(self.branch.unlock)
991 self.branch.repository.fileids_altered_by_revision_ids(["rev-J","rev-K"])
992@@ -295,7 +298,7 @@
993 self.branch = main_branch
994
995 def test_fileid_involved_full_compare2(self):
996- # this tests that fileids_alteted_by_revision_ids returns
997+ # this tests that fileids_altered_by_revision_ids returns
998 # more information than compare_tree can, because it
999 # sees each change rather than the aggregate delta.
1000 self.branch.lock_read()
1001@@ -315,6 +318,73 @@
1002 self.assertSubset(l2, l1)
1003
1004
1005+class FileIdInvolvedWGhosts(TestCaseWithRepository):
1006+
1007+ def create_branch_with_ghost_text(self):
1008+ builder = self.make_branch_builder('ghost')
1009+ builder.build_snapshot('A-id', None, [
1010+ ('add', ('', 'root-id', 'directory', None)),
1011+ ('add', ('a', 'a-file-id', 'file', 'some content\n'))])
1012+ b = builder.get_branch()
1013+ old_rt = b.repository.revision_tree('A-id')
1014+ new_inv = old_rt.inventory._get_mutable_inventory()
1015+ new_inv.revision_id = 'B-id'
1016+ new_inv['a-file-id'].revision = 'ghost-id'
1017+ new_rev = _mod_revision.Revision('B-id',
1018+ timestamp=time.time(),
1019+ timezone=0,
1020+ message='Committing against a ghost',
1021+ committer='Joe Foo <joe@foo.com>',
1022+ properties={},
1023+ parent_ids=('A-id', 'ghost-id'),
1024+ )
1025+ b.lock_write()
1026+ self.addCleanup(b.unlock)
1027+ b.repository.start_write_group()
1028+ b.repository.add_revision('B-id', new_rev, new_inv)
1029+ b.repository.commit_write_group()
1030+ return b
1031+
1032+ def test_file_ids_include_ghosts(self):
1033+ b = self.create_branch_with_ghost_text()
1034+ repo = b.repository
1035+ self.assertEqual(
1036+ {'a-file-id':set(['ghost-id'])},
1037+ repo.fileids_altered_by_revision_ids(['B-id']))
1038+
1039+ def test_file_ids_uses_fallbacks(self):
1040+ builder = self.make_branch_builder('source',
1041+ format=self.bzrdir_format)
1042+ repo = builder.get_branch().repository
1043+ if not repo._format.supports_external_lookups:
1044+ raise tests.TestNotApplicable('format does not support stacking')
1045+ builder.start_series()
1046+ builder.build_snapshot('A-id', None, [
1047+ ('add', ('', 'root-id', 'directory', None)),
1048+ ('add', ('file', 'file-id', 'file', 'contents\n'))])
1049+ builder.build_snapshot('B-id', ['A-id'], [
1050+ ('modify', ('file-id', 'new-content\n'))])
1051+ builder.build_snapshot('C-id', ['B-id'], [
1052+ ('modify', ('file-id', 'yet more content\n'))])
1053+ builder.finish_series()
1054+ source_b = builder.get_branch()
1055+ source_b.lock_read()
1056+ self.addCleanup(source_b.unlock)
1057+ base = self.make_branch('base')
1058+ base.pull(source_b, stop_revision='B-id')
1059+ stacked = self.make_branch('stacked')
1060+ stacked.set_stacked_on_url('../base')
1061+ stacked.pull(source_b, stop_revision='C-id')
1062+
1063+ stacked.lock_read()
1064+ self.addCleanup(stacked.unlock)
1065+ repo = stacked.repository
1066+ keys = {'file-id': set(['A-id'])}
1067+ if stacked.repository.supports_rich_root():
1068+ keys['root-id'] = set(['A-id'])
1069+ self.assertEqual(keys, repo.fileids_altered_by_revision_ids(['A-id']))
1070+
1071+
1072 def set_executability(wt, path, executable=True):
1073 """Set the executable bit for the file at path in the working tree
1074
1075
1076=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
1077--- bzrlib/tests/per_repository/test_write_group.py 2009-05-12 09:05:30 +0000
1078+++ bzrlib/tests/per_repository/test_write_group.py 2009-05-29 10:35:21 +0000
1079@@ -18,7 +18,15 @@
1080
1081 import sys
1082
1083-from bzrlib import bzrdir, errors, graph, memorytree, remote
1084+from bzrlib import (
1085+ bzrdir,
1086+ errors,
1087+ graph,
1088+ memorytree,
1089+ osutils,
1090+ remote,
1091+ versionedfile,
1092+ )
1093 from bzrlib.branch import BzrBranchFormat7
1094 from bzrlib.inventory import InventoryDirectory
1095 from bzrlib.transport import local, memory
1096@@ -240,9 +248,9 @@
1097 inventory) in it must have all the texts in its inventory (even if not
1098 changed w.r.t. to the absent parent), otherwise it will report missing
1099 texts/parent inventory.
1100-
1101+
1102 The core of this test is that a file was changed in rev-1, but in a
1103- stacked repo that only has rev-2
1104+ stacked repo that only has rev-2
1105 """
1106 # Make a trunk with one commit.
1107 trunk_repo = self.make_stackable_repo()
1108@@ -284,6 +292,69 @@
1109 set(), reopened_repo.get_missing_parent_inventories())
1110 reopened_repo.abort_write_group()
1111
1112+ def test_get_missing_parent_inventories_check(self):
1113+ builder = self.make_branch_builder('test')
1114+ builder.build_snapshot('A-id', ['ghost-parent-id'], [
1115+ ('add', ('', 'root-id', 'directory', None)),
1116+ ('add', ('file', 'file-id', 'file', 'content\n'))],
1117+ allow_leftmost_as_ghost=True)
1118+ b = builder.get_branch()
1119+ b.lock_read()
1120+ self.addCleanup(b.unlock)
1121+ repo = self.make_repository('test-repo')
1122+ repo.lock_write()
1123+ self.addCleanup(repo.unlock)
1124+ repo.start_write_group()
1125+ self.addCleanup(repo.abort_write_group)
1126+ # Now, add the objects manually
1127+ text_keys = [('file-id', 'A-id')]
1128+ if repo.supports_rich_root():
1129+ text_keys.append(('root-id', 'A-id'))
1130+ # Directly add the texts, inventory, and revision object for 'A-id'
1131+ repo.texts.insert_record_stream(b.repository.texts.get_record_stream(
1132+ text_keys, 'unordered', True))
1133+ repo.add_revision('A-id', b.repository.get_revision('A-id'),
1134+ b.repository.get_inventory('A-id'))
1135+ get_missing = repo.get_missing_parent_inventories
1136+ if repo._format.supports_external_lookups:
1137+ self.assertEqual(set([('inventories', 'ghost-parent-id')]),
1138+ get_missing(check_for_missing_texts=False))
1139+ self.assertEqual(set(), get_missing(check_for_missing_texts=True))
1140+ self.assertEqual(set(), get_missing())
1141+ else:
1142+ # If we don't support external lookups, we always return empty
1143+ self.assertEqual(set(), get_missing(check_for_missing_texts=False))
1144+ self.assertEqual(set(), get_missing(check_for_missing_texts=True))
1145+ self.assertEqual(set(), get_missing())
1146+
1147+ def test_insert_stream_passes_resume_info(self):
1148+ repo = self.make_repository('test-repo')
1149+ if not repo._format.supports_external_lookups:
1150+ raise TestNotApplicable('only valid in resumable repos')
1151+ # log calls to get_missing_parent_inventories, so that we can assert it
1152+ # is called with the correct parameters
1153+ call_log = []
1154+ orig = repo.get_missing_parent_inventories
1155+ def get_missing(check_for_missing_texts=True):
1156+ call_log.append(check_for_missing_texts)
1157+ return orig(check_for_missing_texts=check_for_missing_texts)
1158+ repo.get_missing_parent_inventories = get_missing
1159+ repo.lock_write()
1160+ self.addCleanup(repo.unlock)
1161+ sink = repo._get_sink()
1162+ sink.insert_stream((), repo._format, [])
1163+ self.assertEqual([False], call_log)
1164+ del call_log[:]
1165+ repo.start_write_group()
1166+ # We need to insert something, or suspend_write_group won't actually
1167+ # create a token
1168+ repo.texts.insert_record_stream([versionedfile.FulltextContentFactory(
1169+ ('file-id', 'rev-id'), (), None, 'lines\n')])
1170+ tokens = repo.suspend_write_group()
1171+ self.assertNotEqual([], tokens)
1172+ sink.insert_stream((), repo._format, tokens)
1173+ self.assertEqual([True], call_log)
1174+
1175
1176 class TestResumeableWriteGroup(TestCaseWithRepository):
1177
1178@@ -518,9 +589,12 @@
1179 source_repo.start_write_group()
1180 key_base = ('file-id', 'base')
1181 key_delta = ('file-id', 'delta')
1182- source_repo.texts.add_lines(key_base, (), ['lines\n'])
1183- source_repo.texts.add_lines(
1184- key_delta, (key_base,), ['more\n', 'lines\n'])
1185+ def text_stream():
1186+ yield versionedfile.FulltextContentFactory(
1187+ key_base, (), None, 'lines\n')
1188+ yield versionedfile.FulltextContentFactory(
1189+ key_delta, (key_base,), None, 'more\nlines\n')
1190+ source_repo.texts.insert_record_stream(text_stream())
1191 source_repo.commit_write_group()
1192 return source_repo
1193
1194@@ -536,9 +610,20 @@
1195 stream = source_repo.texts.get_record_stream(
1196 [key_delta], 'unordered', False)
1197 repo.texts.insert_record_stream(stream)
1198- # It's not commitable due to the missing compression parent.
1199- self.assertRaises(
1200- errors.BzrCheckError, repo.commit_write_group)
1201+ # It's either not commitable due to the missing compression parent, or
1202+ # the stacked location has already filled in the fulltext.
1203+ try:
1204+ repo.commit_write_group()
1205+ except errors.BzrCheckError:
1206+ # It refused to commit because we have a missing parent
1207+ pass
1208+ else:
1209+ same_repo = self.reopen_repo(repo)
1210+ same_repo.lock_read()
1211+ record = same_repo.texts.get_record_stream([key_delta],
1212+ 'unordered', True).next()
1213+ self.assertEqual('more\nlines\n', record.get_bytes_as('fulltext'))
1214+ return
1215 # Merely suspending and resuming doesn't make it commitable either.
1216 wg_tokens = repo.suspend_write_group()
1217 same_repo = self.reopen_repo(repo)
1218@@ -570,8 +655,19 @@
1219 same_repo.texts.insert_record_stream(stream)
1220 # Just like if we'd added that record without a suspend/resume cycle,
1221 # commit_write_group fails.
1222- self.assertRaises(
1223- errors.BzrCheckError, same_repo.commit_write_group)
1224+ try:
1225+ same_repo.commit_write_group()
1226+ except errors.BzrCheckError:
1227+ pass
1228+ else:
1229+ # If the commit_write_group didn't fail, that is because the
1230+ # insert_record_stream already gave it a fulltext.
1231+ same_repo = self.reopen_repo(repo)
1232+ same_repo.lock_read()
1233+ record = same_repo.texts.get_record_stream([key_delta],
1234+ 'unordered', True).next()
1235+ self.assertEqual('more\nlines\n', record.get_bytes_as('fulltext'))
1236+ return
1237 same_repo.abort_write_group()
1238
1239 def test_add_missing_parent_after_resume(self):
1240
1241=== modified file 'bzrlib/tests/per_repository_reference/__init__.py'
1242--- bzrlib/tests/per_repository_reference/__init__.py 2009-03-23 14:59:43 +0000
1243+++ bzrlib/tests/per_repository_reference/__init__.py 2009-05-29 10:35:21 +0000
1244@@ -97,6 +97,9 @@
1245 'bzrlib.tests.per_repository_reference.test_break_lock',
1246 'bzrlib.tests.per_repository_reference.test_check',
1247 'bzrlib.tests.per_repository_reference.test_default_stacking',
1248+ 'bzrlib.tests.per_repository_reference.test_fetch',
1249+ 'bzrlib.tests.per_repository_reference.test_initialize',
1250+ 'bzrlib.tests.per_repository_reference.test_unlock',
1251 ]
1252 # Parameterize per_repository_reference test modules by format.
1253 standard_tests.addTests(loader.loadTestsFromModuleNames(module_list))
1254
1255=== modified file 'bzrlib/tests/per_repository_reference/test_default_stacking.py'
1256--- bzrlib/tests/per_repository_reference/test_default_stacking.py 2009-03-23 14:59:43 +0000
1257+++ bzrlib/tests/per_repository_reference/test_default_stacking.py 2009-05-29 10:35:21 +0000
1258@@ -21,19 +21,13 @@
1259
1260 class TestDefaultStackingPolicy(TestCaseWithRepository):
1261
1262- # XXX: this helper probably belongs on TestCaseWithTransport
1263- def make_smart_server(self, path):
1264- smart_server = server.SmartTCPServer_for_testing()
1265- smart_server.setUp(self.get_server())
1266- return smart_server.get_url() + path
1267-
1268 def test_sprout_to_smart_server_stacking_policy_handling(self):
1269 """Obey policy where possible, ignore otherwise."""
1270 stack_on = self.make_branch('stack-on')
1271 parent_bzrdir = self.make_bzrdir('.', format='default')
1272 parent_bzrdir.get_config().set_default_stack_on('stack-on')
1273 source = self.make_branch('source')
1274- url = self.make_smart_server('target')
1275+ url = self.make_smart_server('target').abspath('')
1276 target = source.bzrdir.sprout(url).open_branch()
1277 self.assertEqual('../stack-on', target.get_stacked_on_url())
1278 self.assertEqual(
1279
1280=== added file 'bzrlib/tests/per_repository_reference/test_fetch.py'
1281--- bzrlib/tests/per_repository_reference/test_fetch.py 1970-01-01 00:00:00 +0000
1282+++ bzrlib/tests/per_repository_reference/test_fetch.py 2009-05-29 10:35:21 +0000
1283@@ -0,0 +1,101 @@
1284+# Copyright (C) 2009 Canonical Ltd
1285+#
1286+# This program is free software; you can redistribute it and/or modify
1287+# it under the terms of the GNU General Public License as published by
1288+# the Free Software Foundation; either version 2 of the License, or
1289+# (at your option) any later version.
1290+#
1291+# This program is distributed in the hope that it will be useful,
1292+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1293+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1294+# GNU General Public License for more details.
1295+#
1296+# You should have received a copy of the GNU General Public License
1297+# along with this program; if not, write to the Free Software
1298+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1299+
1300+
1301+from bzrlib.smart import server
1302+from bzrlib.tests.per_repository import TestCaseWithRepository
1303+
1304+
1305+class TestFetch(TestCaseWithRepository):
1306+
1307+ def make_source_branch(self):
1308+ # It would be nice if there were a way to force this to be memory-only
1309+ builder = self.make_branch_builder('source')
1310+ content = ['content lines\n'
1311+ 'for the first revision\n'
1312+ 'which is a marginal amount of content\n'
1313+ ]
1314+ builder.start_series()
1315+ builder.build_snapshot('A-id', None, [
1316+ ('add', ('', 'root-id', 'directory', None)),
1317+ ('add', ('a', 'a-id', 'file', ''.join(content))),
1318+ ])
1319+ content.append('and some more lines for B\n')
1320+ builder.build_snapshot('B-id', ['A-id'], [
1321+ ('modify', ('a-id', ''.join(content)))])
1322+ content.append('and yet even more content for C\n')
1323+ builder.build_snapshot('C-id', ['B-id'], [
1324+ ('modify', ('a-id', ''.join(content)))])
1325+ builder.finish_series()
1326+ source_b = builder.get_branch()
1327+ source_b.lock_read()
1328+ self.addCleanup(source_b.unlock)
1329+ return content, source_b
1330+
1331+ def test_sprout_from_stacked_with_short_history(self):
1332+ content, source_b = self.make_source_branch()
1333+ # Split the generated content into a base branch, and a stacked branch
1334+ # Use 'make_branch' which gives us a bzr:// branch when appropriate,
1335+ # rather than creating a branch-on-disk
1336+ stack_b = self.make_branch('stack-on')
1337+ stack_b.pull(source_b, stop_revision='B-id')
1338+ target_b = self.make_branch('target')
1339+ target_b.set_stacked_on_url('../stack-on')
1340+ target_b.pull(source_b, stop_revision='C-id')
1341+ # At this point, 'target' should hold exactly one revision (C-id),
1342+ # stacked on top of the revisions in 'stack-on'.
1343+ final_b = self.make_branch('final')
1344+ final_b.pull(target_b)
1345+ final_b.lock_read()
1346+ self.addCleanup(final_b.unlock)
1347+ self.assertEqual('C-id', final_b.last_revision())
1348+ text_keys = [('a-id', 'A-id'), ('a-id', 'B-id'), ('a-id', 'C-id')]
1349+ stream = final_b.repository.texts.get_record_stream(text_keys,
1350+ 'unordered', True)
1351+ records = sorted([(r.key, r.get_bytes_as('fulltext')) for r in stream])
1352+ self.assertEqual([
1353+ (('a-id', 'A-id'), ''.join(content[:-2])),
1354+ (('a-id', 'B-id'), ''.join(content[:-1])),
1355+ (('a-id', 'C-id'), ''.join(content)),
1356+ ], records)
1357+
1358+ def test_sprout_from_smart_stacked_with_short_history(self):
1359+ content, source_b = self.make_source_branch()
1360+ transport = self.make_smart_server('server')
1361+ transport.ensure_base()
1362+ url = transport.abspath('')
1363+ stack_b = source_b.bzrdir.sprout(url + '/stack-on', revision_id='B-id')
1364+ # self.make_branch only takes relative paths, so we do it the 'hard'
1365+ # way
1366+ target_transport = transport.clone('target')
1367+ target_transport.ensure_base()
1368+ target_bzrdir = self.bzrdir_format.initialize_on_transport(
1369+ target_transport)
1370+ target_bzrdir.create_repository()
1371+ target_b = target_bzrdir.create_branch()
1372+ target_b.set_stacked_on_url('../stack-on')
1373+ target_b.pull(source_b, stop_revision='C-id')
1374+ # Now we should be able to branch from the remote location to a local
1375+ # location
1376+ final_b = target_b.bzrdir.sprout('final').open_branch()
1377+ self.assertEqual('C-id', final_b.last_revision())
1378+
1379+ # bzrdir.sprout() has slightly different code paths if you supply a
1380+ # revision_id versus not. If you supply revision_id, then you get a
1381+ # PendingAncestryResult for the search, versus a SearchResult...
1382+ final2_b = target_b.bzrdir.sprout('final2',
1383+ revision_id='C-id').open_branch()
1384+ self.assertEqual('C-id', final2_b.last_revision())
1385
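
The stacking setup both tests exercise reduces to a small pattern; a minimal
sketch using only the calls that appear above (branch names are illustrative):

    # 'stack-on' holds the shared history up to B-id; 'target' is
    # stacked on it and physically holds only the C-id delta.
    stack_b = self.make_branch('stack-on')
    stack_b.pull(source_b, stop_revision='B-id')
    target_b = self.make_branch('target')
    target_b.set_stacked_on_url('../stack-on')  # relative, resolved on open
    target_b.pull(source_b, stop_revision='C-id')
    # The point of the tests: fetching from 'target' must reconstruct
    # 'fulltext' records even for texts whose compression parents live
    # only in the fallback ('stack-on').
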
1386=== added file 'bzrlib/tests/per_repository_reference/test_initialize.py'
1387--- bzrlib/tests/per_repository_reference/test_initialize.py 1970-01-01 00:00:00 +0000
1388+++ bzrlib/tests/per_repository_reference/test_initialize.py 2009-05-29 10:35:21 +0000
1389@@ -0,0 +1,59 @@
1390+# Copyright (C) 2009 Canonical Ltd
1391+#
1392+# This program is free software; you can redistribute it and/or modify
1393+# it under the terms of the GNU General Public License as published by
1394+# the Free Software Foundation; either version 2 of the License, or
1395+# (at your option) any later version.
1396+#
1397+# This program is distributed in the hope that it will be useful,
1398+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1399+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1400+# GNU General Public License for more details.
1401+#
1402+# You should have received a copy of the GNU General Public License
1403+# along with this program; if not, write to the Free Software
1404+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1405+
1406+"""Tests for initializing a repository with external references."""
1407+
1408+
1409+from bzrlib import (
1410+ errors,
1411+ tests,
1412+ )
1413+from bzrlib.tests.per_repository_reference import (
1414+ TestCaseWithExternalReferenceRepository,
1415+ )
1416+
1417+
1418+class TestInitialize(TestCaseWithExternalReferenceRepository):
1419+
1420+ def initialize_and_check_on_transport(self, base, trans):
1421+ network_name = base.repository._format.network_name()
1422+ result = self.bzrdir_format.initialize_on_transport_ex(
1423+ trans, use_existing_dir=False, create_prefix=False,
1424+ stacked_on='../base', stack_on_pwd=base.base,
1425+ repo_format_name=network_name)
1426+ result_repo, a_bzrdir, require_stacking, repo_policy = result
1427+ self.addCleanup(result_repo.unlock)
1428+ self.assertEqual(1, len(result_repo._fallback_repositories))
1429+ return result_repo
1430+
1431+ def test_initialize_on_transport_ex(self):
1432+ base = self.make_branch('base')
1433+ trans = self.get_transport('stacked')
1434+ repo = self.initialize_and_check_on_transport(base, trans)
1435+ self.assertEqual(base.repository._format.network_name(),
1436+ repo._format.network_name())
1437+
1438+ def test_remote_initialize_on_transport_ex(self):
1439+ # All formats can be initialized appropriately over bzr://
1440+ base = self.make_branch('base')
1441+ trans = self.make_smart_server('stacked')
1442+ repo = self.initialize_and_check_on_transport(base, trans)
1443+ network_name = base.repository._format.network_name()
1444+ if network_name != repo._format.network_name():
1445+ raise tests.KnownFailure('Remote initialize_on_transport_ex()'
1446+ ' tries to "upgrade" the format because it doesn\'t have a'
1447+ ' branch format, and hard-codes the new repository format.')
1448+ self.assertEqual(network_name, repo._format.network_name())
1449
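
One non-obvious detail in initialize_and_check_on_transport() above:
initialize_on_transport_ex() hands the new repository back already
write-locked, which is why the helper registers addCleanup(result_repo.unlock).
The unpacking, as the helper does it:

    result = self.bzrdir_format.initialize_on_transport_ex(
        trans, use_existing_dir=False, create_prefix=False,
        stacked_on='../base', stack_on_pwd=base.base,
        repo_format_name=network_name)
    result_repo, a_bzrdir, require_stacking, repo_policy = result
    # result_repo comes back locked; the caller owns the unlock.
    self.addCleanup(result_repo.unlock)
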
1450=== added file 'bzrlib/tests/per_repository_reference/test_unlock.py'
1451--- bzrlib/tests/per_repository_reference/test_unlock.py 1970-01-01 00:00:00 +0000
1452+++ bzrlib/tests/per_repository_reference/test_unlock.py 2009-05-29 10:35:21 +0000
1453@@ -0,0 +1,76 @@
1454+# Copyright (C) 2009 Canonical Ltd
1455+#
1456+# This program is free software; you can redistribute it and/or modify
1457+# it under the terms of the GNU General Public License as published by
1458+# the Free Software Foundation; either version 2 of the License, or
1459+# (at your option) any later version.
1460+#
1461+# This program is distributed in the hope that it will be useful,
1462+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1463+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1464+# GNU General Public License for more details.
1465+#
1466+# You should have received a copy of the GNU General Public License
1467+# along with this program; if not, write to the Free Software
1468+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1469+
1470+"""Tests for locking/unlocking a repository with external references."""
1471+
1472+from bzrlib import (
1473+ branch,
1474+ tests,
1475+ )
1476+from bzrlib.tests.per_repository_reference import (
1477+ TestCaseWithExternalReferenceRepository,
1478+ )
1479+
1480+
1481+class TestUnlock(TestCaseWithExternalReferenceRepository):
1482+
1483+ def create_stacked_branch(self):
1484+ builder = self.make_branch_builder('source',
1485+ format=self.bzrdir_format)
1486+ builder.start_series()
1487+ repo = builder.get_branch().repository
1488+ if not repo._format.supports_external_lookups:
1489+ raise tests.TestNotApplicable('format does not support stacking')
1490+ builder.build_snapshot('A-id', None, [
1491+ ('add', ('', 'root-id', 'directory', None)),
1492+ ('add', ('file', 'file-id', 'file', 'contents\n'))])
1493+ builder.build_snapshot('B-id', ['A-id'], [
1494+ ('modify', ('file-id', 'new-content\n'))])
1495+ builder.build_snapshot('C-id', ['B-id'], [
1496+ ('modify', ('file-id', 'yet more content\n'))])
1497+ builder.finish_series()
1498+ source_b = builder.get_branch()
1499+ source_b.lock_read()
1500+ self.addCleanup(source_b.unlock)
1501+ base = self.make_branch('base')
1502+ base.pull(source_b, stop_revision='B-id')
1503+ stacked = self.make_branch('stacked')
1504+ stacked.set_stacked_on_url('../base')
1505+ stacked.pull(source_b, stop_revision='C-id')
1506+
1507+ return base, stacked
1508+
1509+ def test_unlock_unlocks_fallback(self):
1510+ base = self.make_branch('base')
1511+ stacked = self.make_branch('stacked')
1512+ repo = stacked.repository
1513+ stacked.set_stacked_on_url('../base')
1514+ self.assertEqual(1, len(repo._fallback_repositories))
1515+ fallback_repo = repo._fallback_repositories[0]
1516+ self.assertFalse(repo.is_locked())
1517+ self.assertFalse(fallback_repo.is_locked())
1518+ repo.lock_read()
1519+ self.assertTrue(repo.is_locked())
1520+ self.assertTrue(fallback_repo.is_locked())
1521+ repo.unlock()
1522+ self.assertFalse(repo.is_locked())
1523+ self.assertFalse(fallback_repo.is_locked())
1524+ repo.lock_write()
1525+ self.assertTrue(repo.is_locked())
1526+ self.assertTrue(fallback_repo.is_locked())
1527+ repo.unlock()
1528+ self.assertFalse(repo.is_locked())
1529+ self.assertFalse(fallback_repo.is_locked())
1530
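
Stated directly, the invariant test_unlock_unlocks_fallback pins down is
(illustrative assertions, not the Repository implementation):

    repo.lock_read()
    assert all(f.is_locked() for f in repo._fallback_repositories)
    repo.unlock()
    assert not any(f.is_locked() for f in repo._fallback_repositories)
    # The same pairing must hold for lock_write()/unlock().
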
1531=== modified file 'bzrlib/tests/test_graph.py'
1532--- bzrlib/tests/test_graph.py 2009-03-24 23:19:12 +0000
1533+++ bzrlib/tests/test_graph.py 2009-05-29 10:35:21 +0000
1534@@ -1558,6 +1558,19 @@
1535 result = _mod_graph.PendingAncestryResult(['rev-2'], repo)
1536 self.assertEqual(set(['rev-1', 'rev-2']), set(result.get_keys()))
1537
1538+ def test_get_keys_excludes_ghosts(self):
1539+ builder = self.make_branch_builder('b')
1540+ builder.start_series()
1541+ builder.build_snapshot('rev-1', None, [
1542+ ('add', ('', 'root-id', 'directory', ''))])
1543+ builder.build_snapshot('rev-2', ['rev-1', 'ghost'], [])
1544+ builder.finish_series()
1545+ repo = builder.get_branch().repository
1546+ repo.lock_read()
1547+ self.addCleanup(repo.unlock)
1548+ result = _mod_graph.PendingAncestryResult(['rev-2'], repo)
1549+ self.assertEqual(sorted(['rev-1', 'rev-2']), sorted(result.get_keys()))
1550+
1551 def test_get_keys_excludes_null(self):
1552 # Make a 'graph' with an iter_ancestry that returns NULL_REVISION
1553 # somewhere other than the last element, which can happen in real
1554
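
For the new ghost test: Graph.iter_ancestry() reports a ghost as a (key, None)
pair, so a filter along these lines is presumably all get_keys() needs (a
sketch, not necessarily the exact implementation):

    def get_keys(self):
        self.repo.lock_read()
        try:
            g = self.repo.get_graph()
            # Drop ghosts (parents is None) as well as NULL_REVISION.
            return [key for key, parents in g.iter_ancestry(self.heads)
                    if parents is not None
                    and key != _mod_revision.NULL_REVISION]
        finally:
            self.repo.unlock()
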
1555=== modified file 'bzrlib/tests/test_groupcompress.py'
1556--- bzrlib/tests/test_groupcompress.py 2009-04-22 17:18:45 +0000
1557+++ bzrlib/tests/test_groupcompress.py 2009-05-29 10:35:21 +0000
1558@@ -19,8 +19,10 @@
1559 import zlib
1560
1561 from bzrlib import (
1562+ btree_index,
1563 groupcompress,
1564 errors,
1565+ index as _mod_index,
1566 osutils,
1567 tests,
1568 versionedfile,
1569@@ -475,6 +477,23 @@
1570
1571 class TestGroupCompressVersionedFiles(TestCaseWithGroupCompressVersionedFiles):
1572
1573+ def make_g_index(self, name, ref_lists=0, nodes=[]):
1574+ builder = btree_index.BTreeBuilder(ref_lists)
1575+ for node, references, value in nodes:
1576+ builder.add_node(node, references, value)
1577+ stream = builder.finish()
1578+ trans = self.get_transport()
1579+ size = trans.put_file(name, stream)
1580+ return btree_index.BTreeGraphIndex(trans, name, size)
1581+
1582+ def make_g_index_missing_parent(self):
1583+ graph_index = self.make_g_index('missing_parent', 1,
1584+ [(('parent', ), '2 78 2 10', ([],)),
1585+ (('tip', ), '2 78 2 10',
1586+ ([('parent', ), ('missing-parent', )],)),
1587+ ])
1588+ return graph_index
1589+
1590 def test_get_record_stream_as_requested(self):
1591 # Consider promoting 'as-requested' to general availability, and
1592 # make this a VF interface test
1593@@ -606,6 +625,30 @@
1594 else:
1595 self.assertIs(block, record._manager._block)
1596
1597+ def test_add_missing_noncompression_parent_unvalidated_index(self):
1598+ unvalidated = self.make_g_index_missing_parent()
1599+ combined = _mod_index.CombinedGraphIndex([unvalidated])
1600+ index = groupcompress._GCGraphIndex(combined,
1601+ is_locked=lambda: True, parents=True,
1602+ track_external_parent_refs=True)
1603+ index.scan_unvalidated_index(unvalidated)
1604+ self.assertEqual(
1605+ frozenset([('missing-parent',)]), index.get_missing_parents())
1606+
1607+ def test_track_external_parent_refs(self):
1608+ g_index = self.make_g_index('empty', 1, [])
1609+ mod_index = btree_index.BTreeBuilder(1, 1)
1610+ combined = _mod_index.CombinedGraphIndex([g_index, mod_index])
1611+ index = groupcompress._GCGraphIndex(combined,
1612+ is_locked=lambda: True, parents=True,
1613+ add_callback=mod_index.add_nodes,
1614+ track_external_parent_refs=True)
1615+ index.add_records([
1616+ (('new-key',), '2 10 2 10', [(('parent-1',), ('parent-2',))])])
1617+ self.assertEqual(
1618+ frozenset([('parent-1',), ('parent-2',)]),
1619+ index.get_missing_parents())
1620+
1621
1622 class TestLazyGroupCompress(tests.TestCaseWithTransport):
1623
1624
1625=== modified file 'bzrlib/tests/test_pack_repository.py'
1626--- bzrlib/tests/test_pack_repository.py 2009-05-11 15:30:40 +0000
1627+++ bzrlib/tests/test_pack_repository.py 2009-05-29 10:35:21 +0000
1628@@ -620,7 +620,7 @@
1629 Also requires that the exception is logged.
1630 """
1631 self.vfs_transport_factory = memory.MemoryServer
1632- repo = self.make_repository('repo')
1633+ repo = self.make_repository('repo', format=self.get_format())
1634 token = repo.lock_write()
1635 self.addCleanup(repo.unlock)
1636 repo.start_write_group()
1637@@ -637,7 +637,7 @@
1638
1639 def test_abort_write_group_does_raise_when_not_suppressed(self):
1640 self.vfs_transport_factory = memory.MemoryServer
1641- repo = self.make_repository('repo')
1642+ repo = self.make_repository('repo', format=self.get_format())
1643 token = repo.lock_write()
1644 self.addCleanup(repo.unlock)
1645 repo.start_write_group()
1646@@ -650,23 +650,51 @@
1647
1648 def test_suspend_write_group(self):
1649 self.vfs_transport_factory = memory.MemoryServer
1650- repo = self.make_repository('repo')
1651+ repo = self.make_repository('repo', format=self.get_format())
1652 token = repo.lock_write()
1653 self.addCleanup(repo.unlock)
1654 repo.start_write_group()
1655 repo.texts.add_lines(('file-id', 'revid'), (), ['lines'])
1656 wg_tokens = repo.suspend_write_group()
1657 expected_pack_name = wg_tokens[0] + '.pack'
1658+ expected_names = [wg_tokens[0] + ext for ext in
1659+ ('.rix', '.iix', '.tix', '.six')]
1660+ if repo.chk_bytes is not None:
1661+ expected_names.append(wg_tokens[0] + '.cix')
1662+ expected_names.append(expected_pack_name)
1663 upload_transport = repo._pack_collection._upload_transport
1664 limbo_files = upload_transport.list_dir('')
1665- self.assertTrue(expected_pack_name in limbo_files, limbo_files)
1666+ self.assertEqual(sorted(expected_names), sorted(limbo_files))
1667 md5 = osutils.md5(upload_transport.get_bytes(expected_pack_name))
1668 self.assertEqual(wg_tokens[0], md5.hexdigest())
1669
1670+ def test_resume_chk_bytes(self):
1671+ self.vfs_transport_factory = memory.MemoryServer
1672+ repo = self.make_repository('repo', format=self.get_format())
1673+ if repo.chk_bytes is None:
1674+ raise TestNotApplicable('no chk_bytes for this repository')
1675+ token = repo.lock_write()
1676+ self.addCleanup(repo.unlock)
1677+ repo.start_write_group()
1678+ text = 'a bit of text\n'
1679+ key = ('sha1:' + osutils.sha_string(text),)
1680+ repo.chk_bytes.add_lines(key, (), [text])
1681+ wg_tokens = repo.suspend_write_group()
1682+ same_repo = repo.bzrdir.open_repository()
1683+ same_repo.lock_write()
1684+ self.addCleanup(same_repo.unlock)
1685+ same_repo.resume_write_group(wg_tokens)
1686+ self.assertEqual([key], list(same_repo.chk_bytes.keys()))
1687+ self.assertEqual(
1688+ text, same_repo.chk_bytes.get_record_stream([key],
1689+ 'unordered', True).next().get_bytes_as('fulltext'))
1690+ same_repo.abort_write_group()
1691+ self.assertEqual([], list(same_repo.chk_bytes.keys()))
1692+
1693 def test_resume_write_group_then_abort(self):
1694 # Create a repo, start a write group, insert some data, suspend.
1695 self.vfs_transport_factory = memory.MemoryServer
1696- repo = self.make_repository('repo')
1697+ repo = self.make_repository('repo', format=self.get_format())
1698 token = repo.lock_write()
1699 self.addCleanup(repo.unlock)
1700 repo.start_write_group()
1701@@ -685,10 +713,38 @@
1702 self.assertEqual(
1703 [], same_repo._pack_collection._pack_transport.list_dir(''))
1704
1705+ def test_commit_resumed_write_group(self):
1706+ self.vfs_transport_factory = memory.MemoryServer
1707+ repo = self.make_repository('repo', format=self.get_format())
1708+ token = repo.lock_write()
1709+ self.addCleanup(repo.unlock)
1710+ repo.start_write_group()
1711+ text_key = ('file-id', 'revid')
1712+ repo.texts.add_lines(text_key, (), ['lines'])
1713+ wg_tokens = repo.suspend_write_group()
1714+ # Get a fresh repository object for the repo on the filesystem.
1715+ same_repo = repo.bzrdir.open_repository()
1716+ # Resume
1717+ same_repo.lock_write()
1718+ self.addCleanup(same_repo.unlock)
1719+ same_repo.resume_write_group(wg_tokens)
1720+ same_repo.commit_write_group()
1721+ expected_pack_name = wg_tokens[0] + '.pack'
1722+ expected_names = [wg_tokens[0] + ext for ext in
1723+ ('.rix', '.iix', '.tix', '.six')]
1724+ if repo.chk_bytes is not None:
1725+ expected_names.append(wg_tokens[0] + '.cix')
1726+ self.assertEqual(
1727+ [], same_repo._pack_collection._upload_transport.list_dir(''))
1728+ index_names = repo._pack_collection._index_transport.list_dir('')
1729+ self.assertEqual(sorted(expected_names), sorted(index_names))
1730+ pack_names = repo._pack_collection._pack_transport.list_dir('')
1731+ self.assertEqual([expected_pack_name], pack_names)
1732+
1733 def test_resume_malformed_token(self):
1734 self.vfs_transport_factory = memory.MemoryServer
1735 # Make a repository with a suspended write group
1736- repo = self.make_repository('repo')
1737+ repo = self.make_repository('repo', format=self.get_format())
1738 token = repo.lock_write()
1739 self.addCleanup(repo.unlock)
1740 repo.start_write_group()
1741@@ -696,7 +752,7 @@
1742 repo.texts.add_lines(text_key, (), ['lines'])
1743 wg_tokens = repo.suspend_write_group()
1744 # Make a new repository
1745- new_repo = self.make_repository('new_repo')
1746+ new_repo = self.make_repository('new_repo', format=self.get_format())
1747 token = new_repo.lock_write()
1748 self.addCleanup(new_repo.unlock)
1749 hacked_wg_token = (
1750@@ -732,12 +788,12 @@
1751 # can only stack on repositories that have compatible internal
1752 # metadata
1753 if getattr(repo._format, 'supports_tree_reference', False):
1754+ matching_format_name = 'pack-0.92-subtree'
1755+ else:
1756 if repo._format.supports_chks:
1757 matching_format_name = 'development6-rich-root'
1758 else:
1759- matching_format_name = 'pack-0.92-subtree'
1760- else:
1761- matching_format_name = 'rich-root-pack'
1762+ matching_format_name = 'rich-root-pack'
1763 mismatching_format_name = 'pack-0.92'
1764 else:
1765 # We don't have a non-rich-root CHK format.
1766@@ -763,15 +819,14 @@
1767 if getattr(repo._format, 'supports_tree_reference', False):
1768 # can only stack on repositories that have compatible internal
1769 # metadata
1770- if repo._format.supports_chks:
1771- # No CHK subtree formats in bzr.dev, so this doesn't execute.
1772- matching_format_name = 'development6-subtree'
1773- else:
1774- matching_format_name = 'pack-0.92-subtree'
1775+ matching_format_name = 'pack-0.92-subtree'
1776 mismatching_format_name = 'rich-root-pack'
1777 else:
1778 if repo.supports_rich_root():
1779- matching_format_name = 'rich-root-pack'
1780+ if repo._format.supports_chks:
1781+ matching_format_name = 'development6-rich-root'
1782+ else:
1783+ matching_format_name = 'rich-root-pack'
1784 mismatching_format_name = 'pack-0.92-subtree'
1785 else:
1786 raise TestNotApplicable('No formats use non-v5 serializer'
1787@@ -844,6 +899,66 @@
1788 self.assertTrue(large_pack_name in pack_names)
1789
1790
1791+class TestKeyDependencies(TestCaseWithTransport):
1792+
1793+ def get_format(self):
1794+ return bzrdir.format_registry.make_bzrdir(self.format_name)
1795+
1796+ def create_source_and_target(self):
1797+ builder = self.make_branch_builder('source', format=self.get_format())
1798+ builder.start_series()
1799+ builder.build_snapshot('A-id', None, [
1800+ ('add', ('', 'root-id', 'directory', None))])
1801+ builder.build_snapshot('B-id', ['A-id', 'ghost-id'], [])
1802+ builder.finish_series()
1803+ repo = self.make_repository('target', format=self.get_format())
1804+ b = builder.get_branch()
1805+ b.lock_read()
1806+ self.addCleanup(b.unlock)
1807+ repo.lock_write()
1808+ self.addCleanup(repo.unlock)
1809+ return b.repository, repo
1810+
1811+ def test_key_dependencies_cleared_on_abort(self):
1812+ source_repo, target_repo = self.create_source_and_target()
1813+ target_repo.start_write_group()
1814+ try:
1815+ stream = source_repo.revisions.get_record_stream([('B-id',)],
1816+ 'unordered', True)
1817+ target_repo.revisions.insert_record_stream(stream)
1818+ key_refs = target_repo.revisions._index._key_dependencies
1819+ self.assertEqual([('B-id',)], sorted(key_refs.get_referrers()))
1820+ finally:
1821+ target_repo.abort_write_group()
1822+ self.assertEqual([], sorted(key_refs.get_referrers()))
1823+
1824+ def test_key_dependencies_cleared_on_suspend(self):
1825+ source_repo, target_repo = self.create_source_and_target()
1826+ target_repo.start_write_group()
1827+ try:
1828+ stream = source_repo.revisions.get_record_stream([('B-id',)],
1829+ 'unordered', True)
1830+ target_repo.revisions.insert_record_stream(stream)
1831+ key_refs = target_repo.revisions._index._key_dependencies
1832+ self.assertEqual([('B-id',)], sorted(key_refs.get_referrers()))
1833+ finally:
1834+ target_repo.suspend_write_group()
1835+ self.assertEqual([], sorted(key_refs.get_referrers()))
1836+
1837+ def test_key_dependencies_cleared_on_commit(self):
1838+ source_repo, target_repo = self.create_source_and_target()
1839+ target_repo.start_write_group()
1840+ try:
1841+ stream = source_repo.revisions.get_record_stream([('B-id',)],
1842+ 'unordered', True)
1843+ target_repo.revisions.insert_record_stream(stream)
1844+ key_refs = target_repo.revisions._index._key_dependencies
1845+ self.assertEqual([('B-id',)], sorted(key_refs.get_referrers()))
1846+ finally:
1847+ target_repo.commit_write_group()
1848+ self.assertEqual([], sorted(key_refs.get_referrers()))
1849+
1850+
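
These three tests pin the same behaviour down at each exit from a write group:
the _key_dependencies cache must be emptied on abort, suspend and commit alike.
A rough sketch of the bookkeeping the assertions imply (illustrative only, not
bzrlib's actual _KeyRefs class):

    class KeyDepsSketch(object):

        def __init__(self):
            # parent key -> set of keys that referenced it
            self._refs = {}

        def add_references(self, key, parent_keys):
            for parent in parent_keys:
                self._refs.setdefault(parent, set()).add(key)

        def get_referrers(self):
            referrers = set()
            for children in self._refs.itervalues():
                referrers.update(children)
            return referrers

        def reset(self):
            # Run when the write group is committed, aborted or
            # suspended, so the ghost parent recorded for B-id above
            # cannot leak 'missing parent' state into a later group.
            self._refs.clear()
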
1851 class TestSmartServerAutopack(TestCaseWithTransport):
1852
1853 def setUp(self):
1854@@ -931,7 +1046,7 @@
1855 dict(format_name='development6-rich-root',
1856 format_string='Bazaar development format - group compression '
1857 'and chk inventory (needs bzr.dev from 1.14)\n',
1858- format_supports_external_lookups=False,
1859+ format_supports_external_lookups=True,
1860 index_class=BTreeGraphIndex),
1861 ]
1862 # name of the scenario is the format name
1863
1864=== modified file 'bzrlib/tests/test_repository.py'
1865--- bzrlib/tests/test_repository.py 2009-04-09 20:23:07 +0000
1866+++ bzrlib/tests/test_repository.py 2009-05-29 10:35:21 +0000
1867@@ -686,11 +686,11 @@
1868 inv.parent_id_basename_to_file_id._root_node.maximum_size)
1869
1870
1871-class TestDevelopment6FindRevisionOutsideSet(TestCaseWithTransport):
1872- """Tests for _find_revision_outside_set."""
1873+class TestDevelopment6FindParentIdsOfRevisions(TestCaseWithTransport):
1874+ """Tests for _find_parent_ids_of_revisions."""
1875
1876 def setUp(self):
1877- super(TestDevelopment6FindRevisionOutsideSet, self).setUp()
1878+ super(TestDevelopment6FindParentIdsOfRevisions, self).setUp()
1879 self.builder = self.make_branch_builder('source',
1880 format='development6-rich-root')
1881 self.builder.start_series()
1882@@ -699,42 +699,42 @@
1883 self.repo = self.builder.get_branch().repository
1884 self.addCleanup(self.builder.finish_series)
1885
1886- def assertRevisionOutsideSet(self, expected_result, rev_set):
1887- self.assertEqual(
1888- expected_result, self.repo._find_revision_outside_set(rev_set))
1889+ def assertParentIds(self, expected_result, rev_set):
1890+ self.assertEqual(sorted(expected_result),
1891+ sorted(self.repo._find_parent_ids_of_revisions(rev_set)))
1892
1893 def test_simple(self):
1894 self.builder.build_snapshot('revid1', None, [])
1895- self.builder.build_snapshot('revid2', None, [])
1896+ self.builder.build_snapshot('revid2', ['revid1'], [])
1897 rev_set = ['revid2']
1898- self.assertRevisionOutsideSet('revid1', rev_set)
1899+ self.assertParentIds(['revid1'], rev_set)
1900
1901 def test_not_first_parent(self):
1902 self.builder.build_snapshot('revid1', None, [])
1903- self.builder.build_snapshot('revid2', None, [])
1904- self.builder.build_snapshot('revid3', None, [])
1905+ self.builder.build_snapshot('revid2', ['revid1'], [])
1906+ self.builder.build_snapshot('revid3', ['revid2'], [])
1907 rev_set = ['revid3', 'revid2']
1908- self.assertRevisionOutsideSet('revid1', rev_set)
1909+ self.assertParentIds(['revid1'], rev_set)
1910
1911 def test_not_null(self):
1912 rev_set = ['initial']
1913- self.assertRevisionOutsideSet(_mod_revision.NULL_REVISION, rev_set)
1914+ self.assertParentIds([], rev_set)
1915
1916 def test_not_null_set(self):
1917 self.builder.build_snapshot('revid1', None, [])
1918 rev_set = [_mod_revision.NULL_REVISION]
1919- self.assertRevisionOutsideSet(_mod_revision.NULL_REVISION, rev_set)
1920+ self.assertParentIds([], rev_set)
1921
1922 def test_ghost(self):
1923 self.builder.build_snapshot('revid1', None, [])
1924 rev_set = ['ghost', 'revid1']
1925- self.assertRevisionOutsideSet('initial', rev_set)
1926+ self.assertParentIds(['initial'], rev_set)
1927
1928 def test_ghost_parent(self):
1929 self.builder.build_snapshot('revid1', None, [])
1930 self.builder.build_snapshot('revid2', ['revid1', 'ghost'], [])
1931 rev_set = ['revid2', 'revid1']
1932- self.assertRevisionOutsideSet('initial', rev_set)
1933+ self.assertParentIds(['ghost', 'initial'], rev_set)
1934
1935 def test_righthand_parent(self):
1936 self.builder.build_snapshot('revid1', None, [])
1937@@ -742,7 +742,7 @@
1938 self.builder.build_snapshot('revid2b', ['revid1'], [])
1939 self.builder.build_snapshot('revid3', ['revid2a', 'revid2b'], [])
1940 rev_set = ['revid3', 'revid2a']
1941- self.assertRevisionOutsideSet('revid2b', rev_set)
1942+ self.assertParentIds(['revid1', 'revid2b'], rev_set)
1943
1944
1945 class TestWithBrokenRepo(TestCaseWithTransport):
1946@@ -1220,3 +1220,68 @@
1947 stream = source._get_source(target._format)
1948 # We don't want the child GroupCHKStreamSource
1949 self.assertIs(type(stream), repository.StreamSource)
1950+
1951+ def test_get_stream_for_missing_keys_includes_all_chk_refs(self):
1952+ source_builder = self.make_branch_builder('source',
1953+ format='development6-rich-root')
1954+ # We have to build a fairly large tree, so that we are sure the chk
1955+ # pages will have split into multiple pages.
1956+ entries = [('add', ('', 'a-root-id', 'directory', None))]
1957+ for i in 'abcdefghijklmnopqrstuvwxyz123456789':
1958+ for j in 'abcdefghijklmnopqrstuvwxyz123456789':
1959+ fname = i + j
1960+ fid = fname + '-id'
1961+ content = 'content for %s\n' % (fname,)
1962+ entries.append(('add', (fname, fid, 'file', content)))
1963+ source_builder.start_series()
1964+ source_builder.build_snapshot('rev-1', None, entries)
1965+ # Now change a few of them, so we get a few new pages for the second
1966+ # revision
1967+ source_builder.build_snapshot('rev-2', ['rev-1'], [
1968+ ('modify', ('aa-id', 'new content for aa-id\n')),
1969+ ('modify', ('cc-id', 'new content for cc-id\n')),
1970+ ('modify', ('zz-id', 'new content for zz-id\n')),
1971+ ])
1972+ source_builder.finish_series()
1973+ source_branch = source_builder.get_branch()
1974+ source_branch.lock_read()
1975+ self.addCleanup(source_branch.unlock)
1976+ target = self.make_repository('target', format='development6-rich-root')
1977+ source = source_branch.repository._get_source(target._format)
1978+ self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource)
1979+
1980+ # On a regular pass, getting the inventories and chk pages for rev-2
1981+ # would only get the newly created chk pages
1982+ search = graph.SearchResult(set(['rev-2']), set(['rev-1']), 1,
1983+ set(['rev-2']))
1984+ simple_chk_records = []
1985+ for vf_name, substream in source.get_stream(search):
1986+ if vf_name == 'chk_bytes':
1987+ for record in substream:
1988+ simple_chk_records.append(record.key)
1989+ else:
1990+ for _ in substream:
1991+ continue
1992+ # 3 pages, the root (InternalNode), + 2 pages which actually changed
1993+ self.assertEqual([('sha1:91481f539e802c76542ea5e4c83ad416bf219f73',),
1994+ ('sha1:4ff91971043668583985aec83f4f0ab10a907d3f',),
1995+ ('sha1:81e7324507c5ca132eedaf2d8414ee4bb2226187',),
1996+ ('sha1:b101b7da280596c71a4540e9a1eeba8045985ee0',)],
1997+ simple_chk_records)
1998+ # Now, when we do a similar call using 'get_stream_for_missing_keys'
1999+ # we should get a much larger set of pages.
2000+ missing = [('inventories', 'rev-2')]
2001+ full_chk_records = []
2002+ for vf_name, substream in source.get_stream_for_missing_keys(missing):
2003+ if vf_name == 'inventories':
2004+ for record in substream:
2005+ self.assertEqual(('rev-2',), record.key)
2006+ elif vf_name == 'chk_bytes':
2007+ for record in substream:
2008+ full_chk_records.append(record.key)
2009+ else:
2010+ self.fail('Should not be getting a stream of %s' % (vf_name,))
2011+ # We have 257 records now. This is because we have 1 root page, and 256
2012+ # leaf pages in a complete listing.
2013+ self.assertEqual(257, len(full_chk_records))
2014+ self.assertSubset(simple_chk_records, full_chk_records)
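
The 257 in the final assertion is easy to sanity-check; a back-of-the-envelope
sketch (the fanout reasoning is an assumption, the test itself only asserts the
totals):

    len('abcdefghijklmnopqrstuvwxyz123456789') ** 2  # 35 * 35 = 1225 files
    # 1225 inventory entries overflow a single chk leaf, so the
    # id_to_entry map splits; a root node fanning out over a
    # two-hex-digit key prefix gives 16 * 16 = 256 leaves, plus the
    # root itself:
    1 + 16 * 16  # => 257
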