Merge lp:~spiv/bzr/smarter-index-search into lp:bzr

Proposed by Andrew Bennetts
Status: Superseded
Proposed branch: lp:~spiv/bzr/smarter-index-search
Merge into: lp:bzr
Diff against target: 348 lines (+170/-29)
4 files modified
bzrlib/index.py (+95/-7)
bzrlib/repofmt/pack_repo.py (+14/-22)
bzrlib/tests/per_pack_repository.py (+17/-0)
bzrlib/tests/test_index.py (+44/-0)
To merge this branch: bzr merge lp:~spiv/bzr/smarter-index-search
Reviewer: John A Meinel
Review status: Needs Fixing
Review via email: mp+21210@code.launchpad.net

This proposal has been superseded by a proposal from 2010-03-18.

Description of the change

Optimise index lookups in repositories with many pack files.

First, the headline: this greatly improves "bzr pull" of one new revision of grub from savannah by HTTP (as reported on the mailing list, which has further analysis):

bzr.dev: 2424kB (50.2kB/s r:2395kB w:30kB)
this patch: 1034kB (43.3kB/s r:1022kB w:12kB)

Given that the pack data transferred is 701266 bytes (which itself seems quite large for such a small change...), that brings the index-searching overhead from 2.42x to 0.45x of the bytes read. It also halves the wall-clock time :)
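Those overhead ratios can be re-derived from the byte counts quoted above. A quick sanity check (my own arithmetic, not part of the patch; it assumes the progress bar's "kB" means 1000 bytes, which best matches the quoted 2.42x):

```python
pack_bytes = 701266                      # useful pack data transferred
total_read = {'bzr.dev': 2395 * 1000,    # the r: figures above, 1 kB = 1000 B
              'patched': 1022 * 1000}

# Overhead = bytes read beyond the pack payload, as a multiple of the payload.
overhead = dict((name, (read - pack_bytes) / float(pack_bytes))
                for name, read in total_read.items())

for name in ('bzr.dev', 'patched'):
    print('%s index-search overhead: %.2fx' % (name, overhead[name]))
```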

That repo has, I think, 14 packs, and bzr.dev tries 11 indices for each of rix, iix, etc. before finding the data it needs for that fetch.

There are two parts to this change:

 1) when a CombinedGraphIndex performs a lookup, it shuffles the index or indices that contained the records to the front of self._indices, on the assumption that future lookups should try those first.
 2) that reordering is propagated to the other CombinedGraphIndex objects from the same pack collection. This is done by: a) associating a name (the pack name) with each element of a CombinedGraphIndex; b) linking the revisions/inventories/etc. CombinedGraphIndex objects belonging to a single pack collection by setting a _sibling_indices attribute on them; and c) using those links and names to apply the same reordering to those sibling indices.
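A minimal standalone sketch of those two parts (my own simplification; the real logic is the _move_to_front machinery on CombinedGraphIndex shown in the preview diff, and the class/attribute shapes here are illustrative only):

```python
class CombinedIndexSketch(object):
    """Simplified stand-in for CombinedGraphIndex's reordering behaviour."""

    def __init__(self, names_and_indices):
        self._index_names = [name for name, idx in names_and_indices]
        self._indices = [idx for name, idx in names_and_indices]
        self._sibling_indices = []

    def _move_to_front(self, hit_indices):
        # Promote hit indices (stable), then mirror the move onto siblings
        # by pack name, since siblings hold different index objects.
        hit_names = self._move_to_front_by_index(hit_indices)
        for sibling in self._sibling_indices:
            sibling._move_to_front_by_name(hit_names)

    def _move_to_front_by_index(self, hit_indices):
        hit, unhit = [], []
        for pair in zip(self._index_names, self._indices):
            (hit if pair[1] in hit_indices else unhit).append(pair)
        self._index_names = [name for name, idx in hit + unhit]
        self._indices = [idx for name, idx in hit + unhit]
        return [name for name, idx in hit]

    def _move_to_front_by_name(self, hit_names):
        self._move_to_front_by_index(
            [idx for name, idx in zip(self._index_names, self._indices)
             if name in hit_names])

# Two sibling combined indices (think .rix and .iix) over packs p1..p3.
rix = CombinedIndexSketch([('p1', 'rix1'), ('p2', 'rix2'), ('p3', 'rix3')])
iix = CombinedIndexSketch([('p1', 'iix1'), ('p2', 'iix2'), ('p3', 'iix3')])
rix._sibling_indices = [iix]

rix._move_to_front(['rix3'])  # a lookup found its records in pack p3
print(rix._indices)  # ['rix3', 'rix1', 'rix2']
print(iix._indices)  # ['iix3', 'iix1', 'iix2'] -- reordered via the name link
```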

I've been pretty conservative with API changes: the new behaviour is only activated by optional keyword arguments, so existing uses of CombinedGraphIndex should see no change of behaviour (including no improvement). This is to make it as easy as possible to backport this change to 2.1 and 2.0 if we choose to.

I think this change needs some tests before it's truly ready to merge, but it's getting to the end of my work week and I think this code is ready for feedback, so here it is!

Revision history for this message
John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/smarter-index-search into lp:bzr.
[...]

Thanks for doing this. I did the work in Packer, but it is nice to have
it here.

I would tend to set "auto_reorder" to always on. I don't see it ever
making things worse, and there should only be a small overhead for the
reordering. As such, I would also get rid of the constructor flag.

Doing it by names is ok, I think Index already has _name for most
indices. You could also order by "access_tuple()" which I believe is a
public API and returns (transport, name) to the original .pack file. The
*nice* thing about doing that is that you don't have to introduce any
new apis here. But if it isn't as clean, then don't worry too much about it.

It also avoids having to rebuild the mapping for every move call:
+        indices_info = zip(self._index_names, self._indices)
+        hit_indices_info = []
+        hit_names = []
+        unhit_indices_info = []
+        for name, idx in indices_info:
+            if idx in hit_indices:
+                info = hit_indices_info
+                hit_names.append(name)
+            else:
+                info = unhit_indices_info
+        ...


Revision history for this message
John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/smarter-index-search into lp:bzr.
[...]


I forgot to vote.

  review: needs_fixing

Because it certainly should have some amount of testing, if only to
prevent future regressions (like we already have :). Though you don't
have to test it at the "get_stream()" level.

John
=:->

review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> Thanks for doing this. I did the work in Packer, but it is nice to have
> it here.

Ah! I thought we had something like this. I'll see what code and ideas
I can reuse.

> I would tend to set "auto_reorder" to always on. I don't see it ever
> making things worse, and there should only be a small overhead for the
> reordering. As such, I would also get rid of the constructor flag.

Hmm, ok. If I were just targeting bzr.dev I would have done this
already, but if you're confident it won't cause unexpected issues I'm
happy to make it unconditional (including in backports).

> Doing it by names is ok, I think Index already has _name for most
> indices. You could also order by "access_tuple()" which I believe is a
> public API and returns (transport, name) to the original .pack file. The
> *nice* thing about doing that is that you don't have to introduce any
> new apis here. But if it isn't as clean, then don't worry too much about it.

_name is really a path for _transport, and as such includes the suffix.
So I don't think it's as clean either code-wise or conceptually to rely
on that. And as you noticed later access_tuple is on AggregateIndex, so
it's not readily available either.

> If it isn't too much overhead, you might consider:
>
> keys_remove = keys.remove
> # Be careful, if keys starts empty, we would break, and get a name error
> index_hit_count = 0
> for index in self._indices:
>     if not keys:
>         break
>     index_hit_count = 0
>     for node in index.iter_entries(keys):
>         keys_remove(node[1])
>         yield node
>         index_hit_count += 1
>
>     if index_hit_count:
>         hit_indices.append((index_hit_count, index))
>
> hit_indices.sort(reverse=True)
>
> self._move_index_order(hit_indices)
>
>
> The nice thing about it, is that it is adaptive. In that if you start a
> search, and hit it in index 1, then you keep looking there, but if the
> keys start showing up in index 2, then you'll switch. The only other bit
> of logic that I could think of is:

I don't quite follow you here. Wouldn't my logic switch to preferring
index 2 anyway?

It's not clear to me that ordering the hit_indices by hit count is
better.

> if index_hit_count == index.key_count():
> index_hit_count = 0 (or maybe 1)
>
> I realize to do this correctly, you'd need to track hit count between
> calls, which isn't realistic. However, that basic check will work for
> pack files that only have 1 or 2 keys that are found quickly. (And
> commit creates indexes with only 1 entry, so it is fairly common.) Just
> a thought, though.

This is an interesting idea. Although perhaps those lookups will
already fail very quickly, because by that point the entire index has
already been read and cached? I don't think I'll do this in this patch,
as I think it'd need more investigation and measurement to justify, but
it might make a good incremental improvement later.

> +        # Tell all the CombinedGraphIndex objects about each other, so they can
> +        # share hints about which pack names to search first.
> +        all_combined = [agg_idx.combined_index for agg_idx in all_indices]
> +        for combined_idx in all_combined:
> +            ...


Revision history for this message
Andrew Bennetts (spiv) wrote :

Andrew Bennetts wrote:
> John A Meinel wrote:
[...]
> > The nice thing about it, is that it is adaptive. In that if you start a
> > search, and hit it in index 1, then you keep looking there, but if the
> > keys start showing up in index 2, then you'll switch. The only other bit
> > of logic that I could think of is:
>
> I don't quite follow you here. Wouldn't my logic switch to preferring
> index 2 anyway?

On reflection, I *think* what you mean is if a call to iter_entries (via
get_parent_map or otherwise) queries for [key_from_1, key_from_2,
key_from_2], my logic will keep the index order as [1, 2], but it might
be better to switch them to [2, 1]. (I initially thought that you were
talking about a search conducted over multiple calls to iter_entries,
not just one.)

That could be true, but I'm not sure it would really make a big
difference in practice. How about writing a patch for my patch and
doing some measurements? I'll happily review it :)

Or do you already have some analysis that would support this from when
you changed the Packer code?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
[...]
> I don't quite follow you here. Wouldn't my logic switch to preferring
> index 2 anyway?
>
> It's not clear to me that ordering the hit_indices by hit count is
> better.
>

The point is that if the first index in the list gets 1 key hit, but the
second gets 10, and the third gets 5, then it will get sorted into
(index2, index3, index1) rather than (index1, index2, index3). At least,
what I saw with your code was that it is just 'preserve existing order,
but move the ones hit to the front'. Put another way, say the hits are:

5 index1
10 index2
0 index3
4 index5
0 index6

your version would sort to

index1
index2
index5
index3
index6

mine would sort to

index2
index1
index5
index3
index6

I think the latter is slightly preferred. I don't have any numbers to
back that up, just a feeling. (Most likely in this case index2 is a
'bigger' index, and thus has more to *be* hit.)
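To make the two policies concrete, here is a small sketch (the function names are mine, not from the patch) applied to the hit counts above:

```python
# Hit counts from the example above: (index name, number of keys found).
hits = [('index1', 5), ('index2', 10), ('index3', 0),
        ('index5', 4), ('index6', 0)]

def stable_move_to_front(hits):
    """The patch's policy: hit indices first, original relative order kept."""
    return ([name for name, count in hits if count] +
            [name for name, count in hits if not count])

def sort_by_hit_count(hits):
    """The suggested policy: hit indices first, most-hit first."""
    hit = sorted((nc for nc in hits if nc[1]), key=lambda nc: -nc[1])
    return ([name for name, count in hit] +
            [name for name, count in hits if not count])

print(stable_move_to_front(hits))
# ['index1', 'index2', 'index5', 'index3', 'index6']
print(sort_by_hit_count(hits))
# ['index2', 'index1', 'index5', 'index3', 'index6']
```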

As for small indices getting rejected quickly, it is true they won't
encounter any I/O overhead, which may be sufficient. It is a function
call, but th...


Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> I think the latter is slightly preferred. I don't have any numbers to
> back that up, just a feeling. (Most likely in this case index2 is a
> 'bigger' index, and thus has more to *be* hit.)

I agree it probably helps, a little, but I'm not going to do it without
some measurements to confirm it does, because of the risk of
counter-intuitive results from untested optimisations. I know you've done
more exploring in this area so I trust your intuitions a bit more than
mine, but I'm still reluctant as the bang for buck seems low.

Also, even my simple logic will tend to keep the more hit indices closer
to the front than others after more and more queries, just not quite as
quickly as it might with hit counts. And this way we don't need to
worry about whether we should be keeping and aging hit counts between
queries, etc.
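A toy illustration of that convergence (my own, not from the patch): plain stable move-to-front, with no hit counts kept between queries, still pulls the most frequently hit index toward the front:

```python
def move_to_front(order, hit_names):
    """Stable move-to-front: hit names first, relative order preserved."""
    return ([name for name in order if name in hit_names] +
            [name for name in order if name not in hit_names])

order = ['index1', 'index2', 'index3']
# index2 is hit in most queries; no per-index counts are remembered.
for hit in [{'index3'}, {'index2'}, {'index2'}, {'index1'}, {'index2'}]:
    order = move_to_front(order, hit)

print(order)  # ['index2', 'index1', 'index3'] -- index2 drifts to the front
```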

> As for small indices getting rejected quickly, it is true they won't
> encounter any I/O overhead, which may be sufficient. It is a function
> call, but that is probably tiny overhead, and they'll get sorted late
> quickly anyway.

Right. (FWIW, it's I/O overhead I'm most worried about.)

Preview Diff

=== modified file 'bzrlib/index.py'
--- bzrlib/index.py	2010-02-17 17:11:16 +0000
+++ bzrlib/index.py	2010-03-18 05:26:26 +0000
@@ -1227,10 +1227,15 @@
     static data.
 
     Queries against the combined index will be made against the first index,
-    and then the second and so on. The order of index's can thus influence
+    and then the second and so on. The order of indices can thus influence
     performance significantly. For example, if one index is on local disk and a
     second on a remote server, the local disk index should be before the other
     in the index list.
+
+    Also, queries tend to need results from the same indices as previous
+    queries. So the indices will be reordered after every query to put the
+    indices that had the result(s) of that query first (while otherwise
+    preserving the relative ordering).
     """
 
     def __init__(self, indices, reload_func=None):
@@ -1243,6 +1248,13 @@
         """
         self._indices = indices
         self._reload_func = reload_func
+        # Sibling indices are other CombinedGraphIndex that we should call
+        # _move_to_front_by_name on when we auto-reorder ourself.
+        self._sibling_indices = []
+        # A list of names that corresponds to the instances in self._indices,
+        # so _index_names[0] is always the name for _indices[0], etc. Sibling
+        # indices must all use the same set of names as each other.
+        self._index_names = [None] * len(self._indices)
 
     def __repr__(self):
         return "%s(%s)" % (
@@ -1271,13 +1283,17 @@
 
     has_key = _has_key_from_parent_map
 
-    def insert_index(self, pos, index):
+    def insert_index(self, pos, index, name=None):
         """Insert a new index in the list of indices to query.
 
         :param pos: The position to insert the index.
         :param index: The index to insert.
+        :param name: a name for this index, e.g. a pack name. These names can
+            be used to reflect index reorderings to related CombinedGraphIndex
+            instances that use the same names. (see set_sibling_indices)
         """
         self._indices.insert(pos, index)
+        self._index_names.insert(pos, name)
 
     def iter_all_entries(self):
         """Iterate over all keys within the index
@@ -1308,22 +1324,28 @@
             value and are only reported once.
 
         :param keys: An iterable providing the keys to be retrieved.
-        :return: An iterable of (index, key, reference_lists, value). There is no
-            defined order for the result iteration - it will be in the most
+        :return: An iterable of (index, key, reference_lists, value). There is
+            no defined order for the result iteration - it will be in the most
             efficient order for the index.
         """
         keys = set(keys)
+        hit_indices = []
         while True:
             try:
                 for index in self._indices:
                     if not keys:
-                        return
+                        break
+                    index_hit = False
                     for node in index.iter_entries(keys):
                         keys.remove(node[1])
                         yield node
-                return
+                        index_hit = True
+                    if index_hit:
+                        hit_indices.append(index)
+                break
             except errors.NoSuchFile:
                 self._reload_or_raise()
+        self._move_to_front(hit_indices)
 
     def iter_entries_prefix(self, keys):
         """Iterate over keys within the index using prefix matching.
@@ -1349,17 +1371,77 @@
         if not keys:
             return
         seen_keys = set()
+        hit_indices = []
         while True:
             try:
                 for index in self._indices:
+                    index_hit = False
                     for node in index.iter_entries_prefix(keys):
                         if node[1] in seen_keys:
                             continue
                         seen_keys.add(node[1])
                         yield node
-                return
+                        index_hit = True
+                    if index_hit:
+                        hit_indices.append(index)
+                break
             except errors.NoSuchFile:
                 self._reload_or_raise()
+        self._move_to_front(hit_indices)
+
+    def _move_to_front(self, hit_indices):
+        """Rearrange self._indices so that hit_indices are first.
+
+        Order is maintained as much as possible, e.g. the first unhit index
+        will be the first index in _indices after the hit_indices, and the
+        hit_indices will be present in exactly the order they are passed to
+        _move_to_front.
+
+        _move_to_front propagates to all objects in self._sibling_indices by
+        calling _move_to_front_by_name.
+        """
+        hit_names = self._move_to_front_by_index(hit_indices)
+        for sibling_idx in self._sibling_indices:
+            sibling_idx._move_to_front_by_name(hit_names)
+
+    def _move_to_front_by_index(self, hit_indices):
+        """Core logic for _move_to_front.
+
+        Returns a list of names corresponding to the hit_indices param.
+        """
+        indices_info = zip(self._index_names, self._indices)
+        if 'index' in debug.debug_flags:
+            mutter('CombinedGraphIndex reordering: currently %r, promoting %r',
+                   indices_info, hit_indices)
+        hit_indices_info = []
+        hit_names = []
+        unhit_indices_info = []
+        for name, idx in indices_info:
+            if idx in hit_indices:
+                info = hit_indices_info
+                hit_names.append(name)
+            else:
+                info = unhit_indices_info
+            info.append((name, idx))
+        final_info = hit_indices_info + unhit_indices_info
+        self._indices = [idx for (name, idx) in final_info]
+        self._index_names = [name for (name, idx) in final_info]
+        if 'index' in debug.debug_flags:
+            mutter('CombinedGraphIndex reordered: %r', self._indices)
+        return hit_names
+
+    def _move_to_front_by_name(self, hit_names):
+        """Moves indices named by 'hit_names' to front of the search order, as
+        described in _move_to_front.
+        """
+        # Translate names to index instances, and then call
+        # _move_to_front_by_index.
+        indices_info = zip(self._index_names, self._indices)
+        hit_indices = []
+        for name, idx in indices_info:
+            if name in hit_names:
+                hit_indices.append(idx)
+        self._move_to_front_by_index(hit_indices)
 
     def find_ancestry(self, keys, ref_list_num):
         """Find the complete ancestry for the given set of keys.
@@ -1372,6 +1454,7 @@
             we care about.
         :return: (parent_map, missing_keys)
         """
+        # XXX: make this call _move_to_front?
         missing_keys = set()
         parent_map = {}
         keys_to_lookup = set(keys)
@@ -1457,6 +1540,11 @@
                          ' Raising original exception.')
             raise exc_type, exc_value, exc_traceback
 
+    def set_sibling_indices(self, sibling_combined_graph_indices):
+        """Set the CombinedGraphIndex objects to reorder after reordering self.
+        """
+        self._sibling_indices = sibling_combined_graph_indices
+
     def validate(self):
         """Validate that everything in the index can be accessed."""
         while True:

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py	2010-02-12 11:58:21 +0000
+++ bzrlib/repofmt/pack_repo.py	2010-03-18 05:26:26 +0000
@@ -587,26 +587,6 @@
                                              flush_func=flush_func)
         self.add_callback = None
 
-    def replace_indices(self, index_to_pack, indices):
-        """Replace the current mappings with fresh ones.
-
-        This should probably not be used eventually, rather incremental add and
-        removal of indices. It has been added during refactoring of existing
-        code.
-
-        :param index_to_pack: A mapping from index objects to
-            (transport, name) tuples for the pack file data.
-        :param indices: A list of indices.
-        """
-        # refresh the revision pack map dict without replacing the instance.
-        self.index_to_pack.clear()
-        self.index_to_pack.update(index_to_pack)
-        # XXX: API break - clearly a 'replace' method would be good?
-        self.combined_index._indices[:] = indices
-        # the current add nodes callback for the current writable index if
-        # there is one.
-        self.add_callback = None
-
     def add_index(self, index, pack):
         """Add index to the aggregate, which is an index for Pack pack.
 
@@ -619,7 +599,7 @@
         # expose it to the index map
         self.index_to_pack[index] = pack.access_tuple()
         # put it at the front of the linear index list
-        self.combined_index.insert_index(0, index)
+        self.combined_index.insert_index(0, index, pack.name)
 
     def add_writable_index(self, index, pack):
         """Add an index which is able to have data added to it.
@@ -645,6 +625,7 @@
         self.data_access.set_writer(None, None, (None, None))
         self.index_to_pack.clear()
         del self.combined_index._indices[:]
+        del self.combined_index._index_names[:]
         self.add_callback = None
 
     def remove_index(self, index):
@@ -653,7 +634,9 @@
         :param index: An index from the pack parameter.
         """
         del self.index_to_pack[index]
-        self.combined_index._indices.remove(index)
+        pos = self.combined_index._indices.index(index)
+        del self.combined_index._indices[pos]
+        del self.combined_index._index_names[pos]
         if (self.add_callback is not None and
             getattr(index, 'add_nodes', None) == self.add_callback):
             self.add_callback = None
@@ -1415,11 +1398,20 @@
         self.inventory_index = AggregateIndex(self.reload_pack_names, flush)
         self.text_index = AggregateIndex(self.reload_pack_names, flush)
         self.signature_index = AggregateIndex(self.reload_pack_names, flush)
+        all_indices = [self.revision_index, self.inventory_index,
+                self.text_index, self.signature_index]
         if use_chk_index:
             self.chk_index = AggregateIndex(self.reload_pack_names, flush)
+            all_indices.append(self.chk_index)
         else:
             # used to determine if we're using a chk_index elsewhere.
             self.chk_index = None
+        # Tell all the CombinedGraphIndex objects about each other, so they can
+        # share hints about which pack names to search first.
+        all_combined = [agg_idx.combined_index for agg_idx in all_indices]
+        for combined_idx in all_combined:
+            combined_idx.set_sibling_indices(
+                set(all_combined).difference([combined_idx]))
         # resumed packs
         self._resumed_packs = []
 

=== modified file 'bzrlib/tests/per_pack_repository.py'
--- bzrlib/tests/per_pack_repository.py	2010-02-23 07:43:11 +0000
+++ bzrlib/tests/per_pack_repository.py	2010-03-18 05:26:26 +0000
@@ -288,6 +288,23 @@
         repo._pack_collection._clear_obsolete_packs()
         self.assertTrue(repo_transport.has('obsolete_packs/.nfsblahblah'))
 
+    def test_pack_collection_sets_sibling_indices(self):
+        """The CombinedGraphIndex objects in the pack collection are all
+        siblings of each other, so that search-order reorderings will be copied
+        to each other.
+        """
+        repo = self.make_repository('repo')
+        pack_coll = repo._pack_collection
+        indices = set([pack_coll.revision_index, pack_coll.inventory_index,
+            pack_coll.text_index, pack_coll.signature_index])
+        if pack_coll.chk_index is not None:
+            indices.add(pack_coll.chk_index)
+        combined_indices = set(idx.combined_index for idx in indices)
+        for combined_index in combined_indices:
+            self.assertEqual(
+                combined_indices.difference([combined_index]),
+                combined_index._sibling_indices)
+
     def test_pack_after_two_commits_packs_everything(self):
         format = self.get_format()
         tree = self.make_branch_and_tree('.', format=format)

=== modified file 'bzrlib/tests/test_index.py'
--- bzrlib/tests/test_index.py	2010-02-17 17:11:16 +0000
+++ bzrlib/tests/test_index.py	2010-03-18 05:26:26 +0000
@@ -1349,6 +1349,50 @@
         self.assertListRaises(errors.NoSuchFile, index.iter_entries_prefix,
             [('1',)])
 
+
+    def make_index_with_simple_nodes(self, name, num_nodes=1):
+        """Make an index named after 'name', with keys named after 'name' too.
+
+        Nodes will have a value of '' and no references.
+        """
+        nodes = [
+            (('index-%s-key-%s' % (name, n),), '', ())
+            for n in range(1, num_nodes+1)]
+        return self.make_index('index-%s' % name, 0, nodes=nodes)
+
+    def test_reorder_after_iter_entries(self):
+        # Four indices: [key1] in index1, [key2,key3] in index2, [] in index3,
+        # [key4] in index4.
+        index = CombinedGraphIndex([])
+        index.insert_index(0, self.make_index_with_simple_nodes('1'), '1')
+        index.insert_index(1, self.make_index_with_simple_nodes('2'), '2')
+        index.insert_index(2, self.make_index_with_simple_nodes('3'), '3')
+        index.insert_index(3, self.make_index_with_simple_nodes('4'), '4')
+        index1, index2, index3, index4 = index._indices
+        # Query a key from index4 and index2.
+        self.assertLength(2, list(index.iter_entries(
+            [('index-4-key-1',), ('index-2-key-1',)])))
+        # Now index2 and index4 should be moved to the front (and index1 should
+        # still be before index3).
+        self.assertEqual([index2, index4, index1, index3], index._indices)
+        self.assertEqual(['2', '4', '1', '3'], index._index_names)
+
+    def test_reorder_propagates_to_siblings(self):
+        # Two CombinedGraphIndex objects, with the same number of indicies with
+        # matching names.
+        cgi1 = CombinedGraphIndex([])
+        cgi2 = CombinedGraphIndex([])
+        cgi1.insert_index(0, self.make_index_with_simple_nodes('1-1'), 'one')
+        cgi1.insert_index(1, self.make_index_with_simple_nodes('1-2'), 'two')
+        cgi2.insert_index(0, self.make_index_with_simple_nodes('2-1'), 'one')
+        cgi2.insert_index(1, self.make_index_with_simple_nodes('2-2'), 'two')
+        index2_1, index2_2 = cgi2._indices
+        cgi1.set_sibling_indices([cgi2])
+        # Trigger a reordering in cgi1. cgi2 will be reordered as well.
+        list(cgi1.iter_entries([('index-1-2-key-1',)]))
+        self.assertEqual([index2_2, index2_1], cgi2._indices)
+        self.assertEqual(['two', 'one'], cgi2._index_names)
+
    def test_validate_reloads(self):
        index, reload_counter = self.make_combined_index_with_missing()
        index.validate()