Merge lp:~mkbosmans/bzr/topo_sort into lp:~bzr/bzr/trunk-old

Proposed by Maarten Bosmans
Status: Merged
Merged at revision: not available
Proposed branch: lp:~mkbosmans/bzr/topo_sort
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 323 lines
To merge this branch: bzr merge lp:~mkbosmans/bzr/topo_sort
Reviewer Review Type Date Requested Status
John A Meinel Approve
Review via email: mp+9902@code.launchpad.net

This proposal supersedes a proposal from 2009-07-31.

To post a comment you must log in.
Revision history for this message
Maarten Bosmans (mkbosmans) wrote : Posted in a previous version of this proposal

By refactoring the code in tsort.TopoSorter.iter_topo_order a performance improvement of 40% can be achieved.
The runtime to sort the graph of bzr.dev went from 620 ms to 380 ms on my computer. The slightly modified algorithm is also a bit more readable by flattening out a while loop.

A further improvement to 260 ms can be achieved by using a completely different algorithm for tsort.topo_sort. I'm not shure whether this second improvement is enough to warrant two algorithms, but it probably is because it's quite easy to understand and follow. The faster algorithm cannot be used for the iterator because almost all the works takes place before the first element can be yielded.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal
Download full text (3.8 KiB)

131 + for parents in graph.itervalues():
132 + for parent in parents:
133 + if parent in nchildren:
134 + nchildren[parent] += 1

^- since you just populated the dictionary with 0 for every node, the only times when a parent isn't in the nchildren dict is when it is a ghost.
So I wouldn't do a "x in y" check for each one, but just catch a potential KeyError.

eg:

for parents in graph.itervalues():
  for parent in parents:
    try:
      nchildren[parent] += 1
    except KeyError:
      # Ghost, we don't track their child count
      pass

Similarly for this pass:
145 + parents = graph.pop(node_name)
146 + for parent in parents:
147 + if parent in nchildren:
148 + nchildren[parent] -= 1
149 + if nchildren[parent] == 0:
150 + nochildnodes.append(parent)

Actually, in this pass, if we end up with a parent that isn't in 'nchildren' that didn't miss earlier, then we probably have a serious issue.

I might actually turn the first pass into gathering nodes which are known to be ghosts, doing another try/except here and making sure anything missing is already a known ghost.

Avoiding the 'if' check should more than make up for whatever overhead introduced when there is a ghost.

There are other things we can do if we really want to optimize this loop.

1) You are calling "foo.append()". It turns out to usually be a bit faster to do:
  foo_append = foo.append
  while XXX:
     foo_append

2) You seem to be popping an item off the list at the beginning, and probably are often pushing (append) one back on at the end. In my tests with this sort of code in KnownGraph it turns out to often be a lot better to peek at the last item, and then replace it the first time you append, rather than pop + append. For example:

no_children_append = no_children.append
node_stack_append = node_stack.append
graph_pop = graph.pop

... # while
node_name = no_children[-1]
remaining = True
node_stack_append(node_name)

parents = graph_pop(node_name)
for parent in parents:
  try:
    n = numchildren[parent] - 1
  except KeyError:
    # We don't worry about ghosts
    continue
  if n > 0:
    numchildren[parent] = n
  else:
    # We could set num_children[parent] = 0, but we don't need to because
    # it shouldn't ever be referenced again
    # we might want to actually do numchildren.pop(parent), because that will
    # make future dict lookups slightly faster, at the cost of resizing
    # needs to be benchmarked
    if remaining:
      remaining = False
      no_children[-1] = parent
    else:
      no_children_append(parent)
if remaining:
  # We added this node, and didn't have any parents replace it
  # remove the last entry
  no_children.pop()

You can actually take it a step farther, and never 'pop' the no_children list, but keep a pointer to the 'current' item. And then append when you need to add one more, and just move the pointer otherwise. (Look at the _known_graph.pyx code for some details.)

I would be curious what results you would get, but I wouldn't be surprised if it was significant. Given that you are at 260ms for 20k+ nodes, you are already in the sub ms per node, which is where stuff like attribute lookup, append/pop really start to add up.

...

no...

Read more...

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Sorry, I didn't mean to mark it approve prematurely.

To give some numbers using topo_sort on 26k revs of bzr.dev loaded into a dict:

 188.0ms bzr.dev
  75.8ms mkbosmans code
  74.0ms getting rid of the "if x in y" lookup (not a big win)
  64.2ms changing "foo.append" to "foo_append"
  62.5ms using nochildren[-1] rather than always pop+append
  63.0ms using nochildren[last_tip] (probably a small win on more linear histories)
  65.6ms reversed(node_name_stack) # I was surprised that reversed(x) is slower than
          x.reverse() when the list has 26k items

  37.4ms Pyrex version that mostly matches the optimized version I put together

  49.9ms k = KnownGraph(parent_map); k.topo_sort()
   9.5ms k.topo_sort()

So building the KnownGraph object costs approximately as much as the optimized C topo_sort (which makes sense, because to find_gdfo we basically topo_sort and update the counters.)

However, once we *have* a KnownGraph (if we decided we wanted one for some other reason), then we can generate the topological sorted output in <10ms. Or about 18x faster than the original python TopoSorted implementation.

My test code is available at lp:~jameinel/bzr/1.18-topo-sort

The new implementations in _known_graph.pyx don't have test cases on them, and we would need to implement topo_sort in _known_graph.py, etc. But I honestly think that is the area to work in to get the best performance.

I believe you were also considering updating "tsort.merge_sort", and again this would be the place to look into writing some really optimized code.

review: Needs Fixing
Revision history for this message
Maarten Bosmans (mkbosmans) wrote : Posted in a previous version of this proposal

I benchmarked all the suggested changes with the same result as you had. That is, mainly the foo_append change was a significant improvement, the rest not some much.

> ^- since you just populated the dictionary with 0 for every node, the only
> times when a parent isn't in the nchildren dict is when it is a ghost.
> So I wouldn't do a "x in y" check for each one, but just catch a potential
> KeyError.

Indeed, I added this check after the first version failed the unit test for a ghost node in the graph. As you mentioned removing the check wasn't a big win, so I'm inclined to leave it with the simpler version as is, perhaps with a comment added that explains that the check is for ghost parents.

> 1) You are calling "foo.append()". It turns out to usually be a bit faster to
> do:

This is indeed faster. I'll change it.

> 2) You seem to be popping an item off the list at the beginning, and probably
> are often pushing (append) one back on at the end. In my tests with this sort
> of code in KnownGraph it turns out to often be a lot better to peek at the
> last item, and then replace it the first time you append, rather than pop +
> append. For example:
>
> ...
>
> I would be curious what results you would get, but I wouldn't be surprised if
> it was significant. Given that you are at 260ms for 20k+ nodes, you are
> already in the sub ms per node, which is where stuff like attribute lookup,
> append/pop really start to add up.

So perhaps Python needs a more efficient stack implementation? I'm not very attracted to having such hacks in the otherwise quite clean algorithm. And as you already mentioned, it's not a big performance win. A similar speedup can be acquired by using a set instead of a stack. This requires only an added set() on initialization and using .add instead of .append.

> Of course, if we really think this code is a bottleneck and worth optimizing,
> then we may want to keep a reasonable and fairly simple implementation in
> python, and look into writing a pyrex version.
>
> Could you look at some of the suggestions here and give benchmark results?

If your pyrex code is a lot faster, may be it is better to add that and scrap all these optimizations. The topo_sorter function can than simply remain

  return TopoSorter(graph).sorted()

and we don't have to Python algorithms implemented.

If you still think this new algorithm is worth going in, I will make a new version with the foo_append and stack->set changes. I propose that we leave the optimizations with those two and instead of micro-optimizing it further, focus our attention to merge_sort, which is much more used, I believe.

BTW, how do I make these extra changes? Should I add a commit to the branch that only makes these few changes, or should I go back to the revision that added the whole algorithm, recommit it with changes and push the new branch over the current in lp?

Revision history for this message
Marius Kruger (amanica) wrote : Posted in a previous version of this proposal

>
> BTW, how do I make these extra changes? Should I add a commit to the branch
> that only makes these few changes
>
yes. i.e if I'm not missing anything, just commit the changes as per review
feedback, push to the same branch on lp.
Then set the status of the merge proposal to resubmit.
And when you are ready propose the branch for merging again.

> , or should I go back to the revision that added the whole algorithm,
> recommit it with changes and push the new branch over the current in lp?
>
no

Revision history for this message
Maarten Bosmans (mkbosmans) wrote : Posted in a previous version of this proposal

The last commit 4582 adds the foo_append improvement.
Also instead of toying with pointer to list items to be popped, etc., I used collections.deque, which is a double-linked list, so that data structure captures your suggested optimization for the list used as stack nicely. It depends on Python 2.4, I hope that's OK for bzr.
To make the algorithm more clear, I added a comment about ghost parents and renamed some of the variables.

These are my new timings (not to be compared with any times posted elsewhere)

r4579 1.41 improved TopoSorted.sorted()
r4580 1.05 new algorithm topo_sort()
r4582 0.93 foo_append, etc.
       0.89 use deque

So this is a nice 15% further improvement for the new algorithm.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Maarten Bosmans wrote:
> Maarten Bosmans has proposed merging lp:~mkbosmans/bzr/topo_sort into lp:bzr.

Seems a reasonable stepping stone to me.

 review approve
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqBm5cACgkQJdeBCYSNAAPM/ACgqCI0mKbfJfmD47lgedEIlwGl
CkEAoMeYdjo8mfNBY/XXAaiK/RTSxSdh
=G6va
-----END PGP SIGNATURE-----

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/reconcile.py'
2--- bzrlib/reconcile.py 2009-06-10 03:56:49 +0000
3+++ bzrlib/reconcile.py 2009-08-19 09:36:26 +0000
4@@ -33,7 +33,7 @@
5 repofmt,
6 )
7 from bzrlib.trace import mutter, note
8-from bzrlib.tsort import TopoSorter
9+from bzrlib.tsort import topo_sort
10 from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
11
12
13@@ -247,8 +247,7 @@
14
15 # we have topological order of revisions and non ghost parents ready.
16 self._setup_steps(len(self._rev_graph))
17- revision_keys = [(rev_id,) for rev_id in
18- TopoSorter(self._rev_graph.items()).iter_topo_order()]
19+ revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
20 stream = self._change_inv_parents(
21 self.inventory.get_record_stream(revision_keys, 'unordered', True),
22 self._new_inv_parents,
23@@ -378,7 +377,7 @@
24 new_inventories = self.repo._temp_inventories()
25 # we have topological order of revisions and non ghost parents ready.
26 graph = self.revisions.get_parent_map(self.revisions.keys())
27- revision_keys = list(TopoSorter(graph).iter_topo_order())
28+ revision_keys = topo_sort(graph)
29 revision_ids = [key[-1] for key in revision_keys]
30 self._setup_steps(len(revision_keys))
31 stream = self._change_inv_parents(
32
33=== modified file 'bzrlib/repository.py'
34--- bzrlib/repository.py 2009-08-17 23:15:55 +0000
35+++ bzrlib/repository.py 2009-08-19 09:36:26 +0000
36@@ -4351,7 +4351,7 @@
37 phase = 'file'
38 revs = search.get_keys()
39 graph = self.from_repository.get_graph()
40- revs = list(graph.iter_topo_order(revs))
41+ revs = tsort.topo_sort(graph.get_parent_map(revs))
42 data_to_fetch = self.from_repository.item_keys_introduced_by(revs)
43 text_keys = []
44 for knit_kind, file_id, revisions in data_to_fetch:
45
46=== modified file 'bzrlib/tests/blackbox/test_ancestry.py'
47--- bzrlib/tests/blackbox/test_ancestry.py 2009-03-23 14:59:43 +0000
48+++ bzrlib/tests/blackbox/test_ancestry.py 2009-08-19 09:36:26 +0000
49@@ -44,7 +44,7 @@
50 def _check_ancestry(self, location='', result=None):
51 out = self.run_bzr(['ancestry', location])[0]
52 if result is None:
53- result = "A1\nB1\nA2\nA3\n"
54+ result = "A1\nA2\nB1\nA3\n"
55 self.assertEqualDiff(out, result)
56
57 def test_ancestry(self):
58
59=== modified file 'bzrlib/tests/test_tsort.py'
60--- bzrlib/tests/test_tsort.py 2009-08-17 15:26:18 +0000
61+++ bzrlib/tests/test_tsort.py 2009-08-19 09:36:26 +0000
62@@ -39,6 +39,20 @@
63 list,
64 TopoSorter(graph).iter_topo_order())
65
66+ def assertSortAndIterateOrder(self, graph):
67+ """Check that sorting and iter_topo_order on graph really results in topological order.
68+
69+ For every child in the graph, check if it comes after all of it's parents.
70+ """
71+ sort_result = topo_sort(graph)
72+ iter_result = list(TopoSorter(graph).iter_topo_order())
73+ for (node, parents) in graph:
74+ for parent in parents:
75+ self.assertTrue(sort_result.index(node) > sort_result.index(parent),
76+ "parent %s must come before child %s:\n%s" % (parent, node, sort_result))
77+ self.assertTrue(iter_result.index(node) > iter_result.index(parent),
78+ "parent %s must come before child %s:\n%s" % (parent, node, iter_result))
79+
80 def test_tsort_empty(self):
81 """TopoSort empty list"""
82 self.assertSortAndIterate([], [])
83@@ -72,10 +86,10 @@
84 def test_tsort_partial(self):
85 """Topological sort with partial ordering.
86
87- If the graph does not give an order between two nodes, they are
88- returned in lexicographical order.
89+ Multiple correct orderings are possible, so test for
90+ correctness, not for exact match on the resulting list.
91 """
92- self.assertSortAndIterate(([(0, []),
93+ self.assertSortAndIterateOrder([(0, []),
94 (1, [0]),
95 (2, [0]),
96 (3, [0]),
97@@ -83,8 +97,7 @@
98 (5, [1, 2]),
99 (6, [1, 2]),
100 (7, [2, 3]),
101- (8, [0, 1, 4, 5, 6])]),
102- [0, 1, 2, 3, 4, 5, 6, 7, 8])
103+ (8, [0, 1, 4, 5, 6])])
104
105 def test_tsort_unincluded_parent(self):
106 """Sort nodes, but don't include some parents in the output"""
107
108=== modified file 'bzrlib/tsort.py'
109--- bzrlib/tsort.py 2009-08-17 15:26:18 +0000
110+++ bzrlib/tsort.py 2009-08-19 09:36:26 +0000
111@@ -20,6 +20,7 @@
112
113 from bzrlib import errors
114 import bzrlib.revision as _mod_revision
115+from collections import deque
116
117
118 __all__ = ["topo_sort", "TopoSorter", "merge_sort", "MergeSorter"]
119@@ -34,8 +35,57 @@
120 their children.
121
122 node identifiers can be any hashable object, and are typically strings.
123+
124+ This function has the same purpose as the TopoSorter class, but uses a
125+ different algorithm to sort the graph. That means that while both return a list
126+ with parents before their child nodes, the exact ordering can be different.
127+
128+ topo_sort is faster when the whole list is needed, while when iterating over a
129+ part of the list, TopoSorter.iter_topo_order should be used.
130 """
131- return TopoSorter(graph).sorted()
132+ # store a dict of the graph.
133+ graph = dict(graph)
134+ # this is the stack storing on which the sorted nodes are pushed.
135+ node_stack = []
136+
137+ # count the number of children for every node in the graph
138+ node_child_count = dict.fromkeys(graph.iterkeys(), 0)
139+ for parents in graph.itervalues():
140+ for parent in parents:
141+ # don't count the parent if it's a ghost
142+ if parent in node_child_count:
143+ node_child_count[parent] += 1
144+ # keep track of nodes without children in a separate list
145+ nochild_nodes = deque([node for (node, n) in node_child_count.iteritems() if n == 0])
146+
147+ graph_pop = graph.pop
148+ node_stack_append = node_stack.append
149+ nochild_nodes_pop = nochild_nodes.pop
150+ nochild_nodes_append = nochild_nodes.append
151+
152+ while nochild_nodes:
153+ # pick a node without a child and add it to the stack.
154+ node_name = nochild_nodes_pop()
155+ node_stack_append(node_name)
156+
157+ # the parents of the node lose it as a child; if it was the last
158+ # child, add the parent to the list of childless nodes.
159+ parents = graph_pop(node_name)
160+ for parent in parents:
161+ if parent in node_child_count:
162+ node_child_count[parent] -= 1
163+ if node_child_count[parent] == 0:
164+ nochild_nodes_append(parent)
165+
166+ # if there are still nodes left in the graph,
167+ # that means that there is a cycle
168+ if graph:
169+ raise errors.GraphCycleError(graph)
170+
171+ # the nodes where pushed on the stack child first, so this list needs to be
172+ # reversed before returning it.
173+ node_stack.reverse()
174+ return node_stack
175
176
177 class TopoSorter(object):
178@@ -60,22 +110,8 @@
179 iteration or sorting may raise GraphCycleError if a cycle is present
180 in the graph.
181 """
182- # a dict of the graph.
183+ # store a dict of the graph.
184 self._graph = dict(graph)
185- self._visitable = set(self._graph)
186- ### if debugging:
187- # self._original_graph = dict(graph)
188-
189- # this is a stack storing the depth first search into the graph.
190- self._node_name_stack = []
191- # at each level of 'recursion' we have to check each parent. This
192- # stack stores the parents we have not yet checked for the node at the
193- # matching depth in _node_name_stack
194- self._pending_parents_stack = []
195- # this is a set of the completed nodes for fast checking whether a
196- # parent in a node we are processing on the stack has already been
197- # emitted and thus can be skipped.
198- self._completed_node_names = set()
199
200 def sorted(self):
201 """Sort the graph and return as a list.
202@@ -100,67 +136,64 @@
203 After finishing iteration the sorter is empty and you cannot continue
204 iteration.
205 """
206- while self._graph:
207+ graph = self._graph
208+ visitable = set(graph)
209+
210+ # this is a stack storing the depth first search into the graph.
211+ pending_node_stack = []
212+ # at each level of 'recursion' we have to check each parent. This
213+ # stack stores the parents we have not yet checked for the node at the
214+ # matching depth in pending_node_stack
215+ pending_parents_stack = []
216+
217+ # this is a set of the completed nodes for fast checking whether a
218+ # parent in a node we are processing on the stack has already been
219+ # emitted and thus can be skipped.
220+ completed_node_names = set()
221+
222+ while graph:
223 # now pick a random node in the source graph, and transfer it to the
224- # top of the depth first search stack.
225- node_name, parents = self._graph.popitem()
226- self._push_node(node_name, parents)
227- while self._node_name_stack:
228- # loop until this call completes.
229- parents_to_visit = self._pending_parents_stack[-1]
230- # if all parents are done, the revision is done
231+ # top of the depth first search stack of pending nodes.
232+ node_name, parents = graph.popitem()
233+ pending_node_stack.append(node_name)
234+ pending_parents_stack.append(list(parents))
235+
236+ # loop until pending_node_stack is empty
237+ while pending_node_stack:
238+ parents_to_visit = pending_parents_stack[-1]
239+ # if there are no parents left, the revision is done
240 if not parents_to_visit:
241 # append the revision to the topo sorted list
242- # all the nodes parents have been added to the output, now
243- # we can add it to the output.
244- yield self._pop_node()
245+ # all the nodes parents have been added to the output,
246+ # now we can add it to the output.
247+ popped_node = pending_node_stack.pop()
248+ pending_parents_stack.pop()
249+ completed_node_names.add(popped_node)
250+ yield popped_node
251 else:
252- while self._pending_parents_stack[-1]:
253- # recurse depth first into a single parent
254- next_node_name = self._pending_parents_stack[-1].pop()
255- if next_node_name in self._completed_node_names:
256- # this parent was completed by a child on the
257- # call stack. skip it.
258- continue
259- if next_node_name not in self._visitable:
260- continue
261- # otherwise transfer it from the source graph into the
262- # top of the current depth first search stack.
263- try:
264- parents = self._graph.pop(next_node_name)
265- except KeyError:
266- # if the next node is not in the source graph it has
267- # already been popped from it and placed into the
268- # current search stack (but not completed or we would
269- # have hit the continue 4 lines up.
270- # this indicates a cycle.
271- raise errors.GraphCycleError(self._node_name_stack)
272- self._push_node(next_node_name, parents)
273- # and do not continue processing parents until this 'call'
274- # has recursed.
275- break
276-
277- def _push_node(self, node_name, parents):
278- """Add node_name to the pending node stack.
279-
280- Names in this stack will get emitted into the output as they are popped
281- off the stack.
282- """
283- self._node_name_stack.append(node_name)
284- self._pending_parents_stack.append(list(parents))
285-
286- def _pop_node(self):
287- """Pop the top node off the stack
288-
289- The node is appended to the sorted output.
290- """
291- # we are returning from the flattened call frame:
292- # pop off the local variables
293- node_name = self._node_name_stack.pop()
294- self._pending_parents_stack.pop()
295-
296- self._completed_node_names.add(node_name)
297- return node_name
298+ # recurse depth first into a single parent
299+ next_node_name = parents_to_visit.pop()
300+
301+ if next_node_name in completed_node_names:
302+ # parent was already completed by a child, skip it.
303+ continue
304+ if next_node_name not in visitable:
305+ # parent is not a node in the original graph, skip it.
306+ continue
307+
308+ # transfer it along with its parents from the source graph
309+ # into the top of the current depth first search stack.
310+ try:
311+ parents = graph.pop(next_node_name)
312+ except KeyError:
313+ # if the next node is not in the source graph it has
314+ # already been popped from it and placed into the
315+ # current search stack (but not completed or we would
316+ # have hit the continue 6 lines up). this indicates a
317+ # cycle.
318+ raise errors.GraphCycleError(pending_node_stack)
319+ pending_node_stack.append(next_node_name)
320+ pending_parents_stack.append(list(parents))
321
322
323 def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False):