Merge lp:~mkbosmans/bzr/topo_sort into lp:~bzr/bzr/trunk-old

Proposed by Maarten Bosmans
Status: Superseded
Proposed branch: lp:~mkbosmans/bzr/topo_sort
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~mkbosmans/bzr/topo_sort
Reviewer: John A Meinel
Status: Needs Fixing
Review via email: mp+9532@code.launchpad.net

This proposal has been superseded by a proposal from 2009-08-09.

Maarten Bosmans (mkbosmans) wrote:

By refactoring the code in tsort.TopoSorter.iter_topo_order, a performance improvement of 40% can be achieved.
The runtime to sort the graph of bzr.dev went from 620 ms to 380 ms on my computer. The slightly modified algorithm is also a bit more readable, because the inner while loop has been flattened out.

A further improvement, to 260 ms, can be achieved by using a completely different algorithm for tsort.topo_sort. I'm not sure whether this second improvement is enough to warrant having two algorithms, but it probably is, because the new one is quite easy to understand and follow. The faster algorithm cannot be used for the iterator, because almost all the work takes place before the first element can be yielded.
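
For readers following the thread, the counting-based topo_sort boils down to something like this condensed sketch (the code actually proposed lives in bzrlib/tsort.py, see the preview diff below; it raises errors.GraphCycleError rather than ValueError):

def topo_sort(graph):
    # graph: dict (or sequence of pairs) mapping node -> list of parent nodes
    graph = dict(graph)
    # first pass: count how many children each node in the graph has
    nchildren = dict.fromkeys(graph, 0)
    for parents in graph.itervalues():
        for parent in parents:
            if parent in nchildren:   # parents missing from the graph are ghosts
                nchildren[parent] += 1
    # nodes with no children (the tips) can be emitted straight away
    nochildnodes = [node for node, n in nchildren.iteritems() if n == 0]
    node_name_stack = []
    while nochildnodes:
        node_name = nochildnodes.pop()
        node_name_stack.append(node_name)
        # each parent loses a child; a parent whose count reaches zero
        # becomes eligible for emission
        for parent in graph.pop(node_name):
            if parent in nchildren:
                nchildren[parent] -= 1
                if nchildren[parent] == 0:
                    nochildnodes.append(parent)
    if graph:
        # nodes left over never reached a child count of zero: a cycle
        raise ValueError('graph contains a cycle: %r' % graph)
    # nodes were emitted children-first, so reverse to put parents first
    node_name_stack.reverse()
    return node_name_stack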

John A Meinel (jameinel) wrote:

131 +    for parents in graph.itervalues():
132 +        for parent in parents:
133 +            if parent in nchildren:
134 +                nchildren[parent] += 1

^- since you just populated the dictionary with 0 for every node, the only time a parent isn't in the nchildren dict is when it is a ghost.
So I wouldn't do an "x in y" check for each one, but just catch the potential KeyError.

eg:

for parents in graph.itervalues():
  for parent in parents:
    try:
      nchildren[parent] += 1
    except KeyError:
      # Ghost, we don't track their child count
      pass

Similarly for this pass:
145 +        parents = graph.pop(node_name)
146 +        for parent in parents:
147 +            if parent in nchildren:
148 +                nchildren[parent] -= 1
149 +                if nchildren[parent] == 0:
150 +                    nochildnodes.append(parent)

Actually, in this pass, if we end up with a parent that isn't in 'nchildren' and that didn't miss in the first pass, then we probably have a serious issue.

I might actually turn the first pass into one that gathers the nodes known to be ghosts, then do another try/except here and make sure anything missing is already a known ghost.

Avoiding the 'if' check should more than make up for whatever overhead is introduced when there is a ghost.
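
A self-contained sketch of that variant (hypothetical; this is not what ended up in the branch, and the cycle check is omitted for brevity):

def topo_sort_with_ghost_check(graph):
    graph = dict(graph)
    nchildren = dict.fromkeys(graph, 0)
    ghosts = set()
    # first pass: count children, remembering which parents are ghosts
    for parents in graph.itervalues():
        for parent in parents:
            try:
                nchildren[parent] += 1
            except KeyError:
                # not a key of the graph, so it must be a ghost
                ghosts.add(parent)
    nochildnodes = [node for node, n in nchildren.iteritems() if n == 0]
    result = []
    while nochildnodes:
        node_name = nochildnodes.pop()
        result.append(node_name)
        # second pass: anything missing here must be a ghost we already saw
        for parent in graph.pop(node_name):
            try:
                nchildren[parent] -= 1
            except KeyError:
                if parent not in ghosts:
                    raise AssertionError('%r missing but not a known ghost'
                                         % (parent,))
                continue
            if nchildren[parent] == 0:
                nochildnodes.append(parent)
    result.reverse()
    return result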

There are other things we can do if we really want to optimize this loop.

1) You are calling "foo.append()". It usually turns out to be a bit faster to hoist the attribute lookup out of the loop:
  foo_append = foo.append
  while XXX:
      foo_append(item)

2) You seem to be popping an item off the list at the beginning, and probably are often pushing (append) one back on at the end. In my tests with this sort of code in KnownGraph, it often turns out to be a lot better to peek at the last item and then replace it the first time you append, rather than pop + append. For example:

no_children_append = no_children.append
node_stack_append = node_stack.append
graph_pop = graph.pop

... # while
node_name = no_children[-1]
remaining = True
node_stack_append(node_name)

parents = graph_pop(node_name)
for parent in parents:
  try:
    n = numchildren[parent] - 1
  except KeyError:
    # We don't worry about ghosts
    continue
  if n > 0:
    numchildren[parent] = n
  else:
    # We could set num_children[parent] = 0, but we don't need to because
    # it shouldn't ever be referenced again
    # we might want to actually do numchildren.pop(parent), because that will
    # make future dict lookups slightly faster, at the cost of resizing
    # needs to be benchmarked
    if remaining:
      remaining = False
      no_children[-1] = parent
    else:
      no_children_append(parent)
if remaining:
  # We added this node, and didn't have any parents replace it
  # remove the last entry
  no_children.pop()

You can actually take it a step farther, and never 'pop' the no_children list, but keep a pointer to the 'current' item. And then append when you need to add one more, and just move the pointer otherwise. (Look at the _known_graph.pyx code for some details.)
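
A rough, self-contained illustration of that pointer idea (hypothetical names; the real thing in _known_graph.pyx is more involved):

def drain_without_pop(tips, ready_parents):
    """Consume a work stack without ever calling pop() to shrink the list.

    tips: initial list of ready nodes.
    ready_parents: callable returning the nodes that become ready once the
    given node has been handled (a stand-in for the child-count bookkeeping).
    """
    order = []
    pending = list(tips)
    last = len(pending) - 1              # index of the current top of the stack
    while last >= 0:
        node = pending[last]
        last -= 1                        # 'pop' by moving the pointer down
        order.append(node)
        for parent in ready_parents(node):
            last += 1
            if last < len(pending):
                pending[last] = parent   # overwrite a dead slot...
            else:
                pending.append(parent)   # ...and only grow when we must
    return order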

I would be curious what results you would get, but I wouldn't be surprised if it was significant. Given that you are at 260ms for 20k+ nodes, you are already in the sub ms per node, which is where stuff like attribute lookup, append/pop really start to add up.

review: Approve
John A Meinel (jameinel) wrote:

Sorry, I didn't mean to mark it approve prematurely.

To give some numbers using topo_sort on 26k revs of bzr.dev loaded into a dict:

 188.0ms bzr.dev
  75.8ms mkbosmans code
  74.0ms getting rid of the "if x in y" lookup (not a big win)
  64.2ms changing "foo.append" to "foo_append"
  62.5ms using nochildren[-1] rather than always pop+append
  63.0ms using nochildren[last_tip] (probably a small win on more linear histories)
  65.6ms reversed(node_name_stack) # I was surprised that reversed(x) is slower than
          x.reverse() when the list has 26k items

  37.4ms Pyrex version that mostly matches the optimized version I put together

  49.9ms k = KnownGraph(parent_map); k.topo_sort()
   9.5ms k.topo_sort()

So building the KnownGraph object costs approximately as much as the optimized C topo_sort (which makes sense, because find_gdfo basically does a topo_sort and updates the counters).

However, once we *have* a KnownGraph (if we decided we wanted one for some other reason), then we can generate the topologically sorted output in <10ms, or about 18x faster than the original Python TopoSorter implementation.

My test code is available at lp:~jameinel/bzr/1.18-topo-sort
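
For anyone who wants to reproduce rough numbers, a minimal harness along these lines should do (a sketch, assuming you already have a parent_map dict of revision id -> parent ids loaded; this is not the code in that branch):

import time

from bzrlib import tsort

def compare_topo_sorts(parent_map, repeats=5):
    """Return the best-of-N wall-clock time of each implementation, in seconds."""
    candidates = [
        ('topo_sort', lambda: tsort.topo_sort(parent_map)),
        ('TopoSorter.sorted', lambda: tsort.TopoSorter(parent_map).sorted()),
        ]
    results = {}
    for name, func in candidates:
        best = None
        for _ in range(repeats):
            start = time.time()
            func()
            elapsed = time.time() - start
            if best is None or elapsed < best:
                best = elapsed
        results[name] = best
    return results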

The new implementations in _known_graph.pyx don't have test cases on them, and we would need to implement topo_sort in _known_graph.py, etc. But I honestly think that is the area to work in to get the best performance.

I believe you were also considering updating "tsort.merge_sort", and again this would be the place to look into writing some really optimized code.

review: Needs Fixing
Maarten Bosmans (mkbosmans) wrote:

I benchmarked all the suggested changes and got the same results as you did. That is, mainly the foo_append change was a significant improvement; the rest not so much.

> ^- since you just populated the dictionary with 0 for every node, the only
> times when a parent isn't in the nchildren dict is when it is a ghost.
> So I wouldn't do a "x in y" check for each one, but just catch a potential
> KeyError.

Indeed, I added this check after the first version failed the unit test for a ghost node in the graph. As you mentioned, removing the check wasn't a big win, so I'm inclined to keep the simpler version as is, perhaps with a comment added explaining that the check is there for ghost parents.

> 1) You are calling "foo.append()". It turns out to usually be a bit faster to
> do:

This is indeed faster. I'll change it.

> 2) You seem to be popping an item off the list at the beginning, and probably
> are often pushing (append) one back on at the end. In my tests with this sort
> of code in KnownGraph it turns out to often be a lot better to peek at the
> last item, and then replace it the first time you append, rather than pop +
> append. For example:
>
> ...
>
> I would be curious what results you would get, but I wouldn't be surprised if
> it was significant. Given that you are at 260ms for 20k+ nodes, you are
> already in the sub ms per node, which is where stuff like attribute lookup,
> append/pop really start to add up.

So perhaps Python needs a more efficient stack implementation? I'm not very attracted to having such hacks in the otherwise quite clean algorithm. And as you already mentioned, it's not a big performance win. A similar speedup can be achieved by using a set instead of a stack; that requires only a set() on initialization and using .add instead of .append.
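
Concretely, that change only swaps the container for the childless nodes; a sketch of the fragment (the surrounding setup of graph, nchildren and node_name_stack is as in the proposed topo_sort):

# track childless nodes in a set rather than a list; set.pop() returns an
# arbitrary element, which is fine because any childless node may be
# emitted next
nochildnodes = set(node for node, n in nchildren.iteritems() if n == 0)
while nochildnodes:
    node_name = nochildnodes.pop()
    node_name_stack.append(node_name)
    for parent in graph.pop(node_name):
        if parent in nchildren:
            nchildren[parent] -= 1
            if nchildren[parent] == 0:
                nochildnodes.add(parent)      # .add instead of .append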

> Of course, if we really think this code is a bottleneck and worth optimizing,
> then we may want to keep a reasonable and fairly simple implementation in
> python, and look into writing a pyrex version.
>
> Could you look at some of the suggestions here and give benchmark results?

If your pyrex code is a lot faster, maybe it is better to add that and scrap all these optimizations. The topo_sort function can then simply remain

  return TopoSorter(graph).sorted()

and we don't end up with two Python algorithms implemented.

If you still think this new algorithm is worth going in, I will make a new version with the foo_append and stack->set changes. I propose that we leave the optimizations at those two and, instead of micro-optimizing further, focus our attention on merge_sort, which is used much more, I believe.

BTW, how do I make these extra changes? Should I add a commit to the branch that only makes these few changes, or should I go back to the revision that added the whole algorithm, recommit it with changes and push the new branch over the current in lp?

Marius Kruger (amanica) wrote:

>
> BTW, how do I make these extra changes? Should I add a commit to the branch
> that only makes these few changes
>
Yes, i.e. if I'm not missing anything, just commit the changes as per the review feedback and push to the same branch on lp.
Then set the status of the merge proposal to Resubmit.
And when you are ready, propose the branch for merging again.

> , or should I go back to the revision that added the whole algorithm,
> recommit it with changes and push the new branch over the current in lp?
>
no

Maarten Bosmans (mkbosmans) wrote:

The last commit, 4582, adds the foo_append improvement.
Also, instead of toying with pointers to list items to be popped, etc., I used collections.deque, which is a doubly linked list, so that data structure captures your suggested optimization for the list-used-as-a-stack nicely. It requires Python 2.4; I hope that's OK for bzr.
To make the algorithm clearer, I added a comment about ghost parents and renamed some of the variables.
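
The deque change amounts to roughly this fragment (a sketch only; which end is popped, and other details, may differ in the committed revision, and the surrounding setup is as in the proposed topo_sort):

from collections import deque     # new in Python 2.4

# childless nodes go into a deque; appending and popping at the ends is O(1)
# and avoids the occasional resize/copy of a plain list used as a stack
nochildnodes = deque(node for node, n in nchildren.iteritems() if n == 0)
while nochildnodes:
    node_name = nochildnodes.pop()
    node_name_stack.append(node_name)
    for parent in graph.pop(node_name):
        if parent in nchildren:
            nchildren[parent] -= 1
            if nchildren[parent] == 0:
                nochildnodes.append(parent)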

These are my new timings (not to be compared with any times posted elsewhere)

r4579 1.41 improved TopoSorter.sorted()
r4580 1.05 new algorithm topo_sort()
r4582 0.93 foo_append, etc.
       0.89 use deque

So this is a nice 15% further improvement for the new algorithm.

Preview Diff

1=== modified file 'bzrlib/reconcile.py'
2--- bzrlib/reconcile.py 2009-06-10 03:56:49 +0000
3+++ bzrlib/reconcile.py 2009-07-31 22:48:55 +0000
4@@ -33,7 +33,7 @@
5 repofmt,
6 )
7 from bzrlib.trace import mutter, note
8-from bzrlib.tsort import TopoSorter
9+from bzrlib.tsort import topo_sort
10 from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
11
12
13@@ -247,8 +247,7 @@
14
15 # we have topological order of revisions and non ghost parents ready.
16 self._setup_steps(len(self._rev_graph))
17- revision_keys = [(rev_id,) for rev_id in
18- TopoSorter(self._rev_graph.items()).iter_topo_order()]
19+ revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
20 stream = self._change_inv_parents(
21 self.inventory.get_record_stream(revision_keys, 'unordered', True),
22 self._new_inv_parents,
23@@ -378,7 +377,7 @@
24 new_inventories = self.repo._temp_inventories()
25 # we have topological order of revisions and non ghost parents ready.
26 graph = self.revisions.get_parent_map(self.revisions.keys())
27- revision_keys = list(TopoSorter(graph).iter_topo_order())
28+ revision_keys = topo_sort(graph)
29 revision_ids = [key[-1] for key in revision_keys]
30 self._setup_steps(len(revision_keys))
31 stream = self._change_inv_parents(
32
33=== modified file 'bzrlib/repository.py'
34--- bzrlib/repository.py 2009-07-20 10:14:42 +0000
35+++ bzrlib/repository.py 2009-07-31 22:48:55 +0000
36@@ -4154,7 +4154,7 @@
37 phase = 'file'
38 revs = search.get_keys()
39 graph = self.from_repository.get_graph()
40- revs = list(graph.iter_topo_order(revs))
41+ revs = tsort.topo_sort(graph.get_parent_map(revs))
42 data_to_fetch = self.from_repository.item_keys_introduced_by(revs)
43 text_keys = []
44 for knit_kind, file_id, revisions in data_to_fetch:
45
46=== modified file 'bzrlib/tests/blackbox/test_ancestry.py'
47--- bzrlib/tests/blackbox/test_ancestry.py 2009-03-23 14:59:43 +0000
48+++ bzrlib/tests/blackbox/test_ancestry.py 2009-07-31 22:48:55 +0000
49@@ -44,7 +44,7 @@
50 def _check_ancestry(self, location='', result=None):
51 out = self.run_bzr(['ancestry', location])[0]
52 if result is None:
53- result = "A1\nB1\nA2\nA3\n"
54+ result = "A1\nA2\nB1\nA3\n"
55 self.assertEqualDiff(out, result)
56
57 def test_ancestry(self):
58
59=== modified file 'bzrlib/tests/test_tsort.py'
60--- bzrlib/tests/test_tsort.py 2009-03-23 14:59:43 +0000
61+++ bzrlib/tests/test_tsort.py 2009-07-30 12:55:32 +0000
62@@ -39,6 +39,20 @@
63 list,
64 TopoSorter(graph).iter_topo_order())
65
66+ def assertSortAndIterateOrder(self, graph):
67+ """Check that sorting and iter_topo_order on graph really results in topological order.
68+
69+ For every child in the graph, check if it comes after all of it's parents.
70+ """
71+ sort_result = topo_sort(graph)
72+ iter_result = list(TopoSorter(graph).iter_topo_order())
73+ for (node, parents) in graph:
74+ for parent in parents:
75+ self.assertTrue(sort_result.index(node) > sort_result.index(parent),
76+ "parent %s must come before child %s:\n%s" % (parent, node, sort_result))
77+ self.assertTrue(iter_result.index(node) > iter_result.index(parent),
78+ "parent %s must come before child %s:\n%s" % (parent, node, iter_result))
79+
80 def test_tsort_empty(self):
81 """TopoSort empty list"""
82 self.assertSortAndIterate([], [])
83@@ -72,10 +86,10 @@
84 def test_tsort_partial(self):
85 """Topological sort with partial ordering.
86
87- If the graph does not give an order between two nodes, they are
88- returned in lexicographical order.
89+ Multiple correct orderings are possible, so test for
90+ correctness, not for exact match on the resulting list.
91 """
92- self.assertSortAndIterate(([(0, []),
93+ self.assertSortAndIterateOrder([(0, []),
94 (1, [0]),
95 (2, [0]),
96 (3, [0]),
97@@ -83,8 +97,7 @@
98 (5, [1, 2]),
99 (6, [1, 2]),
100 (7, [2, 3]),
101- (8, [0, 1, 4, 5, 6])]),
102- [0, 1, 2, 3, 4, 5, 6, 7, 8])
103+ (8, [0, 1, 4, 5, 6])])
104
105 def test_tsort_unincluded_parent(self):
106 """Sort nodes, but don't include some parents in the output"""
107
108=== modified file 'bzrlib/tsort.py'
109--- bzrlib/tsort.py 2009-03-23 14:59:43 +0000
110+++ bzrlib/tsort.py 2009-07-31 09:23:52 +0000
111@@ -34,8 +34,51 @@
112 their children.
113
114 node identifiers can be any hashable object, and are typically strings.
115+
116+ This function has the same purpose as the TopoSorter class, but uses a
117+ different algorithm to sort the graph. That means that while both return a list
118+ with parents before their child nodes, the exact ordering can be different.
119+
120+ topo_sort is faster when the whole list is needed, while when iterating over a
121+ part of the list, TopoSorter.iter_topo_order should be used.
122 """
123- return TopoSorter(graph).sorted()
124+ # store a dict of the graph.
125+ graph = dict(graph)
126+ # this is the stack storing on which the sorted nodes are pushed.
127+ node_name_stack = []
128+
129+ # count the number of children for every node in the graph
130+ nchildren = dict.fromkeys(graph.iterkeys(), 0)
131+ for parents in graph.itervalues():
132+ for parent in parents:
133+ if parent in nchildren:
134+ nchildren[parent] += 1
135+ # keep track of nodes without children in a separate list
136+ nochildnodes = [node for (node, n) in nchildren.iteritems() if n == 0]
137+
138+ while nochildnodes:
139+ # pick a node without a child and add it to the stack.
140+ node_name = nochildnodes.pop()
141+ node_name_stack.append(node_name)
142+
143+ # the parents of the node lose it as a child; if it was the last
144+ # child, add the parent to the list of childless nodes.
145+ parents = graph.pop(node_name)
146+ for parent in parents:
147+ if parent in nchildren:
148+ nchildren[parent] -= 1
149+ if nchildren[parent] == 0:
150+ nochildnodes.append(parent)
151+
152+ # if there are still nodes left in the graph,
153+ # that means that there is a cycle
154+ if graph:
155+ raise errors.GraphCycleError(graph)
156+
157+ # the nodes where pushed on the stack child first, so this list needs to be
158+ # reversed before returning it.
159+ node_name_stack.reverse()
160+ return node_name_stack
161
162
163 class TopoSorter(object):
164@@ -60,22 +103,8 @@
165 iteration or sorting may raise GraphCycleError if a cycle is present
166 in the graph.
167 """
168- # a dict of the graph.
169+ # store a dict of the graph.
170 self._graph = dict(graph)
171- self._visitable = set(self._graph)
172- ### if debugging:
173- # self._original_graph = dict(graph)
174-
175- # this is a stack storing the depth first search into the graph.
176- self._node_name_stack = []
177- # at each level of 'recursion' we have to check each parent. This
178- # stack stores the parents we have not yet checked for the node at the
179- # matching depth in _node_name_stack
180- self._pending_parents_stack = []
181- # this is a set of the completed nodes for fast checking whether a
182- # parent in a node we are processing on the stack has already been
183- # emitted and thus can be skipped.
184- self._completed_node_names = set()
185
186 def sorted(self):
187 """Sort the graph and return as a list.
188@@ -100,67 +129,64 @@
189 After finishing iteration the sorter is empty and you cannot continue
190 iteration.
191 """
192- while self._graph:
193+ graph = self._graph
194+ visitable = set(graph)
195+
196+ # this is a stack storing the depth first search into the graph.
197+ pending_node_stack = []
198+ # at each level of 'recursion' we have to check each parent. This
199+ # stack stores the parents we have not yet checked for the node at the
200+ # matching depth in pending_node_stack
201+ pending_parents_stack = []
202+
203+ # this is a set of the completed nodes for fast checking whether a
204+ # parent in a node we are processing on the stack has already been
205+ # emitted and thus can be skipped.
206+ completed_node_names = set()
207+
208+ while graph:
209 # now pick a random node in the source graph, and transfer it to the
210- # top of the depth first search stack.
211- node_name, parents = self._graph.popitem()
212- self._push_node(node_name, parents)
213- while self._node_name_stack:
214- # loop until this call completes.
215- parents_to_visit = self._pending_parents_stack[-1]
216- # if all parents are done, the revision is done
217+ # top of the depth first search stack of pending nodes.
218+ node_name, parents = graph.popitem()
219+ pending_node_stack.append(node_name)
220+ pending_parents_stack.append(list(parents))
221+
222+ # loop until pending_node_stack is empty
223+ while pending_node_stack:
224+ parents_to_visit = pending_parents_stack[-1]
225+ # if there are no parents left, the revision is done
226 if not parents_to_visit:
227 # append the revision to the topo sorted list
228- # all the nodes parents have been added to the output, now
229- # we can add it to the output.
230- yield self._pop_node()
231+ # all the nodes parents have been added to the output,
232+ # now we can add it to the output.
233+ popped_node = pending_node_stack.pop()
234+ pending_parents_stack.pop()
235+ completed_node_names.add(popped_node)
236+ yield popped_node
237 else:
238- while self._pending_parents_stack[-1]:
239- # recurse depth first into a single parent
240- next_node_name = self._pending_parents_stack[-1].pop()
241- if next_node_name in self._completed_node_names:
242- # this parent was completed by a child on the
243- # call stack. skip it.
244- continue
245- if next_node_name not in self._visitable:
246- continue
247- # otherwise transfer it from the source graph into the
248- # top of the current depth first search stack.
249- try:
250- parents = self._graph.pop(next_node_name)
251- except KeyError:
252- # if the next node is not in the source graph it has
253- # already been popped from it and placed into the
254- # current search stack (but not completed or we would
255- # have hit the continue 4 lines up.
256- # this indicates a cycle.
257- raise errors.GraphCycleError(self._node_name_stack)
258- self._push_node(next_node_name, parents)
259- # and do not continue processing parents until this 'call'
260- # has recursed.
261- break
262-
263- def _push_node(self, node_name, parents):
264- """Add node_name to the pending node stack.
265-
266- Names in this stack will get emitted into the output as they are popped
267- off the stack.
268- """
269- self._node_name_stack.append(node_name)
270- self._pending_parents_stack.append(list(parents))
271-
272- def _pop_node(self):
273- """Pop the top node off the stack
274-
275- The node is appended to the sorted output.
276- """
277- # we are returning from the flattened call frame:
278- # pop off the local variables
279- node_name = self._node_name_stack.pop()
280- self._pending_parents_stack.pop()
281-
282- self._completed_node_names.add(node_name)
283- return node_name
284+ # recurse depth first into a single parent
285+ next_node_name = parents_to_visit.pop()
286+
287+ if next_node_name in completed_node_names:
288+ # parent was already completed by a child, skip it.
289+ continue
290+ if next_node_name not in visitable:
291+ # parent is not a node in the original graph, skip it.
292+ continue
293+
294+ # transfer it along with its parents from the source graph
295+ # into the top of the current depth first search stack.
296+ try:
297+ parents = graph.pop(next_node_name)
298+ except KeyError:
299+ # if the next node is not in the source graph it has
300+ # already been popped from it and placed into the
301+ # current search stack (but not completed or we would
302+ # have hit the continue 6 lines up). this indicates a
303+ # cycle.
304+ raise errors.GraphCycleError(pending_node_stack)
305+ pending_node_stack.append(next_node_name)
306+ pending_parents_stack.append(list(parents))
307
308
309 def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False):