Merge lp:~jameinel/bzr/2.1.0b4-convert-kg-heads into lp:bzr

Proposed by John A Meinel
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1.0b4-convert-kg-heads
Merge into: lp:bzr
Diff against target: 229 lines (+69/-16)
5 files modified
NEWS (+4/-0)
bzrlib/fetch.py (+16/-5)
bzrlib/graph.py (+13/-0)
bzrlib/repository.py (+18/-11)
bzrlib/tests/test_graph.py (+18/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1.0b4-convert-kg-heads
Reviewer Review Type Date Requested Status
Robert Collins (community) Approve
Review via email: mp+15394@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

See this graph:
http://launchpadlibrarian.net/36247556/conversion_times.png

For the net effect of this patch. In converting mysql from 1.9 => 2a it cuts the time for the conversion approx 2:1.

The basic observation was that we spend *way* too much time graph walking during the conversion. Part of that was because we didn't pass *any* graph object down, so we didn't have any benefit of caching, etc. So for all heads() requests we paid the full api layering (translating back and forth from strings <=> tuples for all entries in the history, etc.)

Anyway, we have a very fast heads implementation, so lets use it. I put in the check:

if ... len(revision_ids) > 100:

in there, so that we don't *always* load the full history for small conversion updates. However, for anything more than that, we probably get a net win out of loading the whole ancestry and using the faster heads implementation.

I wish there was a way I could put the image inline, like I used to be able to do with Thunderbird sending requests via Bundle Buggy. (At least, people using email clients would see the image right away...)

Revision history for this message
Robert Collins (lifeless) wrote :

I'm not sure that 100 is the right figure, could you do some timing data
on e.g. emacs and put forward a change to that figure if its wrong?

This branch can land now however.

 review: approve
 merge: approve

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Review: Approve
> I'm not sure that 100 is the right figure, could you do some timing data
> on e.g. emacs and put forward a change to that figure if its wrong?
>
> This branch can land now however.
>
> review: approve
> merge: approve
>

So I don't think there is going to be an ideal value that is always correct.

The emacs history is going to be very different from mysql, for example.
(I've got about 70k mysql revs, but the mainline is only 2k-4k long
depending on the branch.)

However, repo.revisions.get_known_graph_ancestry([tip])

Takes somewhere between 0.5 to 1.0s. Conversions I've done are around 3
revs per second. This heads is about 2x faster, so if we assume

3r/s * 1.0s / 0.5 = 6revs.

So somewhere around 6 revs it will be faster to just load the whole
revision graph.

As mentioned, there are lots of variables. (I think Emacs heads() is
going to generally be cheaper because it is more linear, though it may
be more expensive because then you have to walk more revs?)

I could drop it to 10 revs if you would prefer. Adding 1.0s is pretty
trivial, and we only do this on 'local' access. (IDS only works on
file=>file, and it is being done on the 'source' side for smart streaming.)

Were you thinking the number should be bigger or smaller?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksT7JQACgkQJdeBCYSNAANbMgCfcqUZg3F6mfUsOGfCpdwjnKx/
0bcAn1QsHYKA4FW2cKvYnw5jnGj6OM6w
=QFMf
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

So, all I knew was that 100 was a WAG :)

If the tradeoff point is 6 revisions on mysql, it would be great to also
have a figure for where it is on emacs, then we could triangulate.
However, it seems like 25 or 50 or something might be better for now
until we have more more data.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> So, all I knew was that 100 was a WAG :)
>
> If the tradeoff point is 6 revisions on mysql, it would be great to also
> have a figure for where it is on emacs, then we could triangulate.
> However, it seems like 25 or 50 or something might be better for now
> until we have more more data.
>
> -Rob
>

I'm having some difficulty downloading the emacs source. My laptop seems
to like to hiccup on new network connections periodically, and the time
to download all of emacs is outside of that time. I do have it
downloading on my server, which is a bit underpowered.

It did show our limitations over HTTP, as I hit 335MB downloaded, but
the repo is only about 206MB total. The download is also *very* choppy,
but that machine is woefully underpowered. (P-III 700MHz.)

Anyway, emacs seems to be 3.5s. So if you go with the same 50% +
3revs/sec you end up with 21.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksWev8ACgkQJdeBCYSNAAMv/wCeNGBEQdhdpOYKy+YcbBZ4E/Rs
MOwAnjyuNYtmU5wJ+Uh3azUscM0inIJV
=oHv6
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-11-27 10:43:47 +0000
+++ NEWS 2009-11-30 03:50:26 +0000
@@ -53,6 +53,10 @@
53 etc. Now only change slashes if there is something being glob expanded.53 etc. Now only change slashes if there is something being glob expanded.
54 (John Arbash Meinel, #485771)54 (John Arbash Meinel, #485771)
5555
56* Use our faster ``KnownGraph.heads()`` functionality when computing the
57 new rich-root heads. This can cut a conversion time in half (mysql from
58 13.5h => 6.2h) (John Arbash Meinel, #487632)
59
56Improvements60Improvements
57************61************
5862
5963
=== modified file 'bzrlib/fetch.py'
--- bzrlib/fetch.py 2009-08-07 04:29:36 +0000
+++ bzrlib/fetch.py 2009-11-30 03:50:26 +0000
@@ -28,6 +28,8 @@
28from bzrlib.lazy_import import lazy_import28from bzrlib.lazy_import import lazy_import
29lazy_import(globals(), """29lazy_import(globals(), """
30from bzrlib import (30from bzrlib import (
31 graph as _mod_graph,
32 static_tuple,
31 tsort,33 tsort,
32 versionedfile,34 versionedfile,
33 )35 )
@@ -36,10 +38,10 @@
36from bzrlib import (38from bzrlib import (
37 errors,39 errors,
38 symbol_versioning,40 symbol_versioning,
41 ui,
39 )42 )
40from bzrlib.revision import NULL_REVISION43from bzrlib.revision import NULL_REVISION
41from bzrlib.trace import mutter44from bzrlib.trace import mutter
42import bzrlib.ui
4345
4446
45class RepoFetcher(object):47class RepoFetcher(object):
@@ -96,7 +98,7 @@
96 # assert not missing98 # assert not missing
97 self.count_total = 099 self.count_total = 0
98 self.file_ids_names = {}100 self.file_ids_names = {}
99 pb = bzrlib.ui.ui_factory.nested_progress_bar()101 pb = ui.ui_factory.nested_progress_bar()
100 pb.show_pct = pb.show_count = False102 pb.show_pct = pb.show_count = False
101 try:103 try:
102 pb.update("Finding revisions", 0, 2)104 pb.update("Finding revisions", 0, 2)
@@ -123,7 +125,7 @@
123 raise errors.IncompatibleRepositories(125 raise errors.IncompatibleRepositories(
124 self.from_repository, self.to_repository,126 self.from_repository, self.to_repository,
125 "different rich-root support")127 "different rich-root support")
126 pb = bzrlib.ui.ui_factory.nested_progress_bar()128 pb = ui.ui_factory.nested_progress_bar()
127 try:129 try:
128 pb.update("Get stream source")130 pb.update("Get stream source")
129 source = self.from_repository._get_source(131 source = self.from_repository._get_source(
@@ -251,13 +253,22 @@
251 # yet, and are unlikely to in non-rich-root environments anyway.253 # yet, and are unlikely to in non-rich-root environments anyway.
252 root_id_order.sort(key=operator.itemgetter(0))254 root_id_order.sort(key=operator.itemgetter(0))
253 # Create a record stream containing the roots to create.255 # Create a record stream containing the roots to create.
254 from bzrlib.graph import FrozenHeadsCache256 if len(revs) > 100:
255 graph = FrozenHeadsCache(graph)257 graph = _get_rich_root_heads_graph(self.source_repo, revs)
256 new_roots_stream = _new_root_data_stream(258 new_roots_stream = _new_root_data_stream(
257 root_id_order, rev_id_to_root_id, parent_map, self.source, graph)259 root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
258 return [('texts', new_roots_stream)]260 return [('texts', new_roots_stream)]
259261
260262
263def _get_rich_root_heads_graph(source_repo, revision_ids):
264 """Get a Graph object suitable for asking heads() for new rich roots."""
265 st = static_tuple.StaticTuple
266 revision_keys = [st(r_id).intern() for r_id in revision_ids]
267 known_graph = source_repo.revisions.get_known_graph_ancestry(
268 revision_keys)
269 return graph.GraphThunkIdsToKeys(known_graph)
270
271
261def _new_root_data_stream(272def _new_root_data_stream(
262 root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):273 root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
263 """Generate a texts substream of synthesised root entries.274 """Generate a texts substream of synthesised root entries.
264275
=== modified file 'bzrlib/graph.py'
--- bzrlib/graph.py 2009-09-14 01:48:28 +0000
+++ bzrlib/graph.py 2009-11-30 03:50:26 +0000
@@ -1679,6 +1679,19 @@
1679 return result1679 return result
16801680
16811681
1682class GraphThunkIdsToKeys(object):
1683 """Forwards calls about 'ids' to be about keys internally."""
1684
1685 def __init__(self, graph):
1686 self._graph = graph
1687
1688 def heads(self, ids):
1689 """See Graph.heads()"""
1690 as_keys = [(i,) for i in ids]
1691 head_keys = self._graph.heads(as_keys)
1692 return set([h[0] for h in head_keys])
1693
1694
1682_counters = [0,0,0,0,0,0,0]1695_counters = [0,0,0,0,0,0,0]
1683try:1696try:
1684 from bzrlib._known_graph_pyx import KnownGraph1697 from bzrlib._known_graph_pyx import KnownGraph
16851698
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-11-19 15:06:47 +0000
+++ bzrlib/repository.py 2009-11-30 03:50:26 +0000
@@ -26,6 +26,7 @@
26 chk_map,26 chk_map,
27 debug,27 debug,
28 errors,28 errors,
29 fetch as _mod_fetch,
29 fifo_cache,30 fifo_cache,
30 generate_ids,31 generate_ids,
31 gpg,32 gpg,
@@ -2668,8 +2669,8 @@
2668 for ((revision_id,), parent_keys) in \2669 for ((revision_id,), parent_keys) in \
2669 self.revisions.get_parent_map(query_keys).iteritems():2670 self.revisions.get_parent_map(query_keys).iteritems():
2670 if parent_keys:2671 if parent_keys:
2671 result[revision_id] = tuple(parent_revid2672 result[revision_id] = tuple([parent_revid
2672 for (parent_revid,) in parent_keys)2673 for (parent_revid,) in parent_keys])
2673 else:2674 else:
2674 result[revision_id] = (_mod_revision.NULL_REVISION,)2675 result[revision_id] = (_mod_revision.NULL_REVISION,)
2675 return result2676 return result
@@ -3412,8 +3413,7 @@
3412 provided a default one will be created.3413 provided a default one will be created.
3413 :return: None.3414 :return: None.
3414 """3415 """
3415 from bzrlib.fetch import RepoFetcher3416 f = _mod_fetch.RepoFetcher(to_repository=self.target,
3416 f = RepoFetcher(to_repository=self.target,
3417 from_repository=self.source,3417 from_repository=self.source,
3418 last_revision=revision_id,3418 last_revision=revision_id,
3419 fetch_spec=fetch_spec,3419 fetch_spec=fetch_spec,
@@ -3819,13 +3819,15 @@
3819 basis_id, delta, current_revision_id, parents_parents)3819 basis_id, delta, current_revision_id, parents_parents)
3820 cache[current_revision_id] = parent_tree3820 cache[current_revision_id] = parent_tree
38213821
3822 def _fetch_batch(self, revision_ids, basis_id, cache):3822 def _fetch_batch(self, revision_ids, basis_id, cache, a_graph=None):
3823 """Fetch across a few revisions.3823 """Fetch across a few revisions.
38243824
3825 :param revision_ids: The revisions to copy3825 :param revision_ids: The revisions to copy
3826 :param basis_id: The revision_id of a tree that must be in cache, used3826 :param basis_id: The revision_id of a tree that must be in cache, used
3827 as a basis for delta when no other base is available3827 as a basis for delta when no other base is available
3828 :param cache: A cache of RevisionTrees that we can use.3828 :param cache: A cache of RevisionTrees that we can use.
3829 :param a_graph: A Graph object to determine the heads() of the
3830 rich-root data stream.
3829 :return: The revision_id of the last converted tree. The RevisionTree3831 :return: The revision_id of the last converted tree. The RevisionTree
3830 for it will be in cache3832 for it will be in cache
3831 """3833 """
@@ -3895,10 +3897,9 @@
3895 from_texts = self.source.texts3897 from_texts = self.source.texts
3896 to_texts = self.target.texts3898 to_texts = self.target.texts
3897 if root_keys_to_create:3899 if root_keys_to_create:
3898 from bzrlib.fetch import _new_root_data_stream3900 root_stream = _mod_fetch._new_root_data_stream(
3899 root_stream = _new_root_data_stream(
3900 root_keys_to_create, self._revision_id_to_root_id, parent_map,3901 root_keys_to_create, self._revision_id_to_root_id, parent_map,
3901 self.source)3902 self.source, graph=a_graph)
3902 to_texts.insert_record_stream(root_stream)3903 to_texts.insert_record_stream(root_stream)
3903 to_texts.insert_record_stream(from_texts.get_record_stream(3904 to_texts.insert_record_stream(from_texts.get_record_stream(
3904 text_keys, self.target._format._fetch_order,3905 text_keys, self.target._format._fetch_order,
@@ -3961,13 +3962,20 @@
3961 cache[basis_id] = basis_tree3962 cache[basis_id] = basis_tree
3962 del basis_tree # We don't want to hang on to it here3963 del basis_tree # We don't want to hang on to it here
3963 hints = []3964 hints = []
3965 if self._converting_to_rich_root and len(revision_ids) > 100:
3966 a_graph = _mod_fetch._get_rich_root_heads_graph(self.source,
3967 revision_ids)
3968 else:
3969 a_graph = None
3970
3964 for offset in range(0, len(revision_ids), batch_size):3971 for offset in range(0, len(revision_ids), batch_size):
3965 self.target.start_write_group()3972 self.target.start_write_group()
3966 try:3973 try:
3967 pb.update('Transferring revisions', offset,3974 pb.update('Transferring revisions', offset,
3968 len(revision_ids))3975 len(revision_ids))
3969 batch = revision_ids[offset:offset+batch_size]3976 batch = revision_ids[offset:offset+batch_size]
3970 basis_id = self._fetch_batch(batch, basis_id, cache)3977 basis_id = self._fetch_batch(batch, basis_id, cache,
3978 a_graph=a_graph)
3971 except:3979 except:
3972 self.target.abort_write_group()3980 self.target.abort_write_group()
3973 raise3981 raise
@@ -4446,8 +4454,7 @@
4446 fetching the inventory weave.4454 fetching the inventory weave.
4447 """4455 """
4448 if self._rich_root_upgrade():4456 if self._rich_root_upgrade():
4449 import bzrlib.fetch4457 return _mod_fetch.Inter1and2Helper(
4450 return bzrlib.fetch.Inter1and2Helper(
4451 self.from_repository).generate_root_texts(revs)4458 self.from_repository).generate_root_texts(revs)
4452 else:4459 else:
4453 return []4460 return []
44544461
=== modified file 'bzrlib/tests/test_graph.py'
--- bzrlib/tests/test_graph.py 2009-08-04 04:36:34 +0000
+++ bzrlib/tests/test_graph.py 2009-11-30 03:50:26 +0000
@@ -1582,6 +1582,24 @@
1582 self.assertCollapsed(d, d)1582 self.assertCollapsed(d, d)
15831583
15841584
1585class TestGraphThunkIdsToKeys(tests.TestCase):
1586
1587 def test_heads(self):
1588 # A
1589 # |\
1590 # B C
1591 # |/
1592 # D
1593 d = {('D',): [('B',), ('C',)], ('C',):[('A',)],
1594 ('B',): [('A',)], ('A',): []}
1595 g = _mod_graph.Graph(_mod_graph.DictParentsProvider(d))
1596 graph_thunk = _mod_graph.GraphThunkIdsToKeys(g)
1597 self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'A'])))
1598 self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'B'])))
1599 self.assertEqual(['D'], sorted(graph_thunk.heads(['D', 'C'])))
1600 self.assertEqual(['B', 'C'], sorted(graph_thunk.heads(['B', 'C'])))
1601
1602
1585class TestPendingAncestryResultGetKeys(TestCaseWithMemoryTransport):1603class TestPendingAncestryResultGetKeys(TestCaseWithMemoryTransport):
1586 """Tests for bzrlib.graph.PendingAncestryResult."""1604 """Tests for bzrlib.graph.PendingAncestryResult."""
15871605