Merge lp:~jameinel/bzr/2.1.0b4-convert-kg-heads into lp:bzr
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1.0b4-convert-kg-heads
Merge into: lp:bzr
Diff against target: 229 lines (+69/-16), 5 files modified
  NEWS (+4/-0)
  bzrlib/fetch.py (+16/-5)
  bzrlib/graph.py (+13/-0)
  bzrlib/repository.py (+18/-11)
  bzrlib/tests/test_graph.py (+18/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1.0b4-convert-kg-heads
Related bugs: (none)
Reviewer: Robert Collins (community), status: Approve
Review via email: mp+15394@code.launchpad.net
Commit message
Description of the change
John A Meinel (jameinel) wrote:

Robert Collins (lifeless) wrote:
I'm not sure that 100 is the right figure, could you do some timing data
on e.g. emacs and put forward a change to that figure if its wrong?
This branch can land now however.
review: approve
merge: approve
John A Meinel (jameinel) wrote:
Robert Collins wrote:
> Review: Approve
> I'm not sure that 100 is the right figure, could you do some timing data
> on e.g. emacs and put forward a change to that figure if its wrong?
>
> This branch can land now however.
>
> review: approve
> merge: approve
>
So I don't think there is going to be an ideal value that is always correct.
The emacs history is going to be very different from mysql, for example.
(I've got about 70k mysql revs, but the mainline is only 2k-4k long
depending on the branch.)
However, loading the whole revision graph from repo.revisions takes
somewhere between 0.5s and 1.0s. Conversions I've done run at around 3
revs per second, and this heads() implementation is about 2x faster, so
if we assume 3 revs/s * 1.0s / 0.5 = 6 revs.
So somewhere around 6 revs it will be faster to just load the whole
revision graph.
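The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The timing figures are the ones quoted in this thread; the helper name is made up for illustration:

```python
def break_even_revs(graph_load_secs, conversion_revs_per_sec,
                    heads_speedup_fraction=0.5):
    """Revision count beyond which preloading the whole graph pays off.

    Loading the graph costs graph_load_secs up front; the faster heads()
    then saves roughly heads_speedup_fraction of the per-revision time,
    so the break-even point is rate * load_time / saved_fraction.
    """
    return conversion_revs_per_sec * graph_load_secs / heads_speedup_fraction

# mysql: ~1.0s to load the graph, ~3 revs/sec conversion rate
print(break_even_revs(1.0, 3))   # 6.0
```

With the emacs figure quoted later in the thread (3.5s to load the graph), the same formula gives 21, matching the number John reports.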
As mentioned, there are lots of variables. (I think Emacs heads() is
going to generally be cheaper because it is more linear, though it may
be more expensive because then you have to walk more revs?)
I could drop it to 10 revs if you would prefer. Adding 1.0s is pretty
trivial, and we only do this on 'local' access. (IDS only works on
file=>file, and it is being done on the 'source' side for smart streaming.)
Were you thinking the number should be bigger or smaller?
John
=:->
Robert Collins (lifeless) wrote:
So, all I knew was that 100 was a WAG :)
If the tradeoff point is 6 revisions on mysql, it would be great to also
have a figure for where it is on emacs, then we could triangulate.
However, it seems like 25 or 50 or something might be better for now
until we have more data.
-Rob
John A Meinel (jameinel) wrote:
Robert Collins wrote:
> So, all I knew was that 100 was a WAG :)
>
> If the tradeoff point is 6 revisions on mysql, it would be great to also
> have a figure for where it is on emacs, then we could triangulate.
> However, it seems like 25 or 50 or something might be better for now
> until we have more data.
>
> -Rob
>
I'm having some difficulty downloading the emacs source. My laptop seems
to hiccup on new network connections periodically, and downloading all
of emacs takes longer than the interval between hiccups. I do have it
downloading on my server, which is a bit underpowered.
It did show our limitations over HTTP, as I hit 335MB downloaded, but
the repo is only about 206MB total. The download is also *very* choppy,
but that machine is woefully underpowered. (P-III 700MHz.)
Anyway, loading the emacs graph seems to take 3.5s. So if you go with
the same 50% speedup and 3 revs/sec, you end up with 21.
John
=:->
Preview Diff
=== modified file 'NEWS'
--- NEWS	2009-11-27 10:43:47 +0000
+++ NEWS	2009-11-30 03:50:26 +0000
@@ -53,6 +53,10 @@
   etc. Now only change slashes if there is something being glob expanded.
   (John Arbash Meinel, #485771)

+* Use our faster ``KnownGraph.heads()`` functionality when computing the
+  new rich-root heads. This can cut a conversion time in half (mysql from
+  13.5h => 6.2h) (John Arbash Meinel, #487632)
+
 Improvements
 ************


=== modified file 'bzrlib/fetch.py'
--- bzrlib/fetch.py	2009-08-07 04:29:36 +0000
+++ bzrlib/fetch.py	2009-11-30 03:50:26 +0000
@@ -28,6 +28,8 @@
 from bzrlib.lazy_import import lazy_import
 lazy_import(globals(), """
 from bzrlib import (
+    graph as _mod_graph,
+    static_tuple,
     tsort,
     versionedfile,
     )
@@ -36,10 +38,10 @@
 from bzrlib import (
     errors,
     symbol_versioning,
+    ui,
     )
 from bzrlib.revision import NULL_REVISION
 from bzrlib.trace import mutter
-import bzrlib.ui


 class RepoFetcher(object):
@@ -96,7 +98,7 @@
         # assert not missing
         self.count_total = 0
         self.file_ids_names = {}
-        pb = bzrlib.ui.ui_factory.nested_progress_bar()
+        pb = ui.ui_factory.nested_progress_bar()
         pb.show_pct = pb.show_count = False
         try:
             pb.update("Finding revisions", 0, 2)
@@ -123,7 +125,7 @@
             raise errors.IncompatibleRepositories(
                 self.from_repository, self.to_repository,
                 "different rich-root support")
-        pb = bzrlib.ui.ui_factory.nested_progress_bar()
+        pb = ui.ui_factory.nested_progress_bar()
         try:
             pb.update("Get stream source")
             source = self.from_repository._get_source(
@@ -251,13 +253,22 @@
         # yet, and are unlikely to in non-rich-root environments anyway.
         root_id_order.sort(key=operator.itemgetter(0))
         # Create a record stream containing the roots to create.
-        from bzrlib.graph import FrozenHeadsCache
-        graph = FrozenHeadsCache(graph)
+        if len(revs) > 100:
+            graph = _get_rich_root_heads_graph(self.source_repo, revs)
         new_roots_stream = _new_root_data_stream(
             root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
         return [('texts', new_roots_stream)]


+def _get_rich_root_heads_graph(source_repo, revision_ids):
+    """Get a Graph object suitable for asking heads() for new rich roots."""
+    st = static_tuple.StaticTuple
+    revision_keys = [st(r_id).intern() for r_id in revision_ids]
+    known_graph = source_repo.revisions.get_known_graph_ancestry(
+        revision_keys)
+    return _mod_graph.GraphThunkIdsToKeys(known_graph)
+
+
 def _new_root_data_stream(
     root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
     """Generate a texts substream of synthesised root entries.

=== modified file 'bzrlib/graph.py'
--- bzrlib/graph.py	2009-09-14 01:48:28 +0000
+++ bzrlib/graph.py	2009-11-30 03:50:26 +0000
@@ -1679,6 +1679,19 @@
         return result


+class GraphThunkIdsToKeys(object):
+    """Forwards calls about 'ids' to be about keys internally."""
+
+    def __init__(self, graph):
+        self._graph = graph
+
+    def heads(self, ids):
+        """See Graph.heads()"""
+        as_keys = [(i,) for i in ids]
+        head_keys = self._graph.heads(as_keys)
+        return set([h[0] for h in head_keys])
+
+
 _counters = [0,0,0,0,0,0,0]
 try:
     from bzrlib._known_graph_pyx import KnownGraph

=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py	2009-11-19 15:06:47 +0000
+++ bzrlib/repository.py	2009-11-30 03:50:26 +0000
@@ -26,6 +26,7 @@
     chk_map,
     debug,
     errors,
+    fetch as _mod_fetch,
     fifo_cache,
     generate_ids,
     gpg,
@@ -2668,8 +2669,8 @@
         for ((revision_id,), parent_keys) in \
                 self.revisions.get_parent_map(query_keys).iteritems():
             if parent_keys:
-                result[revision_id] = tuple(parent_revid
-                    for (parent_revid,) in parent_keys)
+                result[revision_id] = tuple([parent_revid
+                    for (parent_revid,) in parent_keys])
             else:
                 result[revision_id] = (_mod_revision.NULL_REVISION,)
         return result
@@ -3412,8 +3413,7 @@
             provided a default one will be created.
         :return: None.
         """
-        from bzrlib.fetch import RepoFetcher
-        f = RepoFetcher(to_repository=self.target,
+        f = _mod_fetch.RepoFetcher(to_repository=self.target,
                        from_repository=self.source,
                        last_revision=revision_id,
                        fetch_spec=fetch_spec,
@@ -3819,13 +3819,15 @@
                 basis_id, delta, current_revision_id, parents_parents)
             cache[current_revision_id] = parent_tree

-    def _fetch_batch(self, revision_ids, basis_id, cache):
+    def _fetch_batch(self, revision_ids, basis_id, cache, a_graph=None):
         """Fetch across a few revisions.

         :param revision_ids: The revisions to copy
         :param basis_id: The revision_id of a tree that must be in cache, used
             as a basis for delta when no other base is available
         :param cache: A cache of RevisionTrees that we can use.
+        :param a_graph: A Graph object to determine the heads() of the
+            rich-root data stream.
         :return: The revision_id of the last converted tree. The RevisionTree
             for it will be in cache
         """
@@ -3895,10 +3897,9 @@
         from_texts = self.source.texts
         to_texts = self.target.texts
         if root_keys_to_create:
-            from bzrlib.fetch import _new_root_data_stream
-            root_stream = _new_root_data_stream(
+            root_stream = _mod_fetch._new_root_data_stream(
                 root_keys_to_create, self._revision_id_to_root_id, parent_map,
-                self.source)
+                self.source, graph=a_graph)
             to_texts.insert_record_stream(root_stream)
         to_texts.insert_record_stream(from_texts.get_record_stream(
             text_keys, self.target._format._fetch_order,
@@ -3961,13 +3962,20 @@
             cache[basis_id] = basis_tree
             del basis_tree # We don't want to hang on to it here
         hints = []
+        if self._converting_to_rich_root and len(revision_ids) > 100:
+            a_graph = _mod_fetch._get_rich_root_heads_graph(self.source,
+                                                            revision_ids)
+        else:
+            a_graph = None
+
         for offset in range(0, len(revision_ids), batch_size):
             self.target.start_write_group()
             try:
                 pb.update('Transferring revisions', offset,
                           len(revision_ids))
                 batch = revision_ids[offset:offset+batch_size]
-                basis_id = self._fetch_batch(batch, basis_id, cache)
+                basis_id = self._fetch_batch(batch, basis_id, cache,
+                                             a_graph=a_graph)
             except:
                 self.target.abort_write_group()
                 raise
@@ -4446,8 +4454,7 @@
             fetching the inventory weave.
         """
         if self._rich_root_upgrade():
-            import bzrlib.fetch
-            return bzrlib.fetch.Inter1and2Helper(
+            return _mod_fetch.Inter1and2Helper(
                 self.from_repository).generate_root_texts(revs)
         else:
             return []

=== modified file 'bzrlib/tests/test_graph.py'
--- bzrlib/tests/test_graph.py	2009-08-04 04:36:34 +0000
+++ bzrlib/tests/test_graph.py	2009-11-30 03:50:26 +0000
@@ -1582,6 +1582,24 @@
         self.assertCollapsed(d, d)


+class TestGraphThunkIdsToKeys(tests.TestCase):
+
+    def test_heads(self):
+        # A
+        # |\
+        # B C
+        # |/
+        # D
+        d = {('D',): [('B',), ('C',)], ('C',):[('A',)],
+             ('B',): [('A',)], ('A',): []}
+        g = _mod_graph.Graph(_mod_graph.DictParentsProvider(d))
+        graph_thunk = _mod_graph.GraphThunkIdsToKeys(g)
+        self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'A'])))
+        self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'B'])))
+        self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'C'])))
+        self.assertEqual(['B', 'C'], sorted(graph_thunk.heads(['B', 'C'])))
+
+
 class TestPendingAncestryResultGetKeys(TestCaseWithMemoryTransport):
     """Tests for bzrlib.graph.PendingAncestryResult."""
For the net effect of this patch, see this graph:
http://launchpadlibrarian.net/36247556/conversion_times.png
In converting mysql from 1.9 => 2a it cuts the time for the conversion approx 2:1.
The basic observation was that we spend *way* too much time graph walking during the conversion. Part of that was because we didn't pass *any* graph object down, so we didn't have any benefit of caching, etc. So for all heads() requests we paid the full api layering (translating back and forth from strings <=> tuples for all entries in the history, etc.)
Anyway, we have a very fast heads() implementation, so let's use it. I put in the check:
if ... len(revision_ids) > 100:
in there, so that we don't *always* load the full history for small conversion updates. However, for anything more than that, we probably get a net win out of loading the whole ancestry and using the faster heads implementation.
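The layering cost described above is what the thunk in the patch avoids paying per call. A minimal standalone sketch of the technique: the thunk class mirrors GraphThunkIdsToKeys from the diff, while FakeKeyGraph is an invented stand-in for the real key-based KnownGraph, just to make the example runnable:

```python
class GraphThunkIdsToKeys(object):
    """Forward heads() calls about plain revision ids to a key-based graph."""

    def __init__(self, graph):
        self._graph = graph

    def heads(self, ids):
        as_keys = [(i,) for i in ids]          # 'rev-id' -> ('rev-id',)
        head_keys = self._graph.heads(as_keys)
        return set(h[0] for h in head_keys)    # ('rev-id',) -> 'rev-id'


class FakeKeyGraph(object):
    """Invented stand-in for KnownGraph: pretends ('D',) descends from all
    other revisions in a tiny history, so it dominates any heads() query."""

    def heads(self, keys):
        if ('D',) in keys:
            return set([('D',)])
        return set(keys)


thunk = GraphThunkIdsToKeys(FakeKeyGraph())
print(sorted(thunk.heads(['D', 'A'])))   # ['D']
```

The point is that the string-to-tuple translation happens once at the boundary, instead of being paid inside every layer of the real heads() machinery.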
I wish there were a way I could put the image inline, like I used to be able to do with Thunderbird sending requests via Bundle Buggy. (At least people using email clients would see the image right away...)