Bazaar

Merge lp:~mkbosmans/bzr/topo_sort into lp:~bzr/bzr/trunk-old

Proposed by Maarten Bosmans on 2009-08-09

Status:	Merged
Merged at revision:	not available
Proposed branch:	lp:~mkbosmans/bzr/topo_sort
Merge into:	lp:~bzr/bzr/trunk-old
Diff against target:	323 lines
To merge this branch:	bzr merge lp:~mkbosmans/bzr/topo_sort

Reviewer	Review Type	Date Requested	Status
John A Meinel			Approve on 2009-08-11
Review via email: mp+9902@code.launchpad.net

This proposal supersedes a proposal from 2009-07-31.

Revision history for this message

Maarten Bosmans (mkbosmans) wrote on 2009-07-31: Posted in a previous version of this proposal

By refactoring the code in tsort.TopoSorter.iter_topo_order, a performance improvement of about 40% can be achieved.
The runtime to sort the graph of bzr.dev went from 620 ms to 380 ms on my computer. The slightly modified algorithm is also a bit more readable, because a nested while loop is flattened out.

A further improvement to 260 ms can be achieved by using a completely different algorithm for tsort.topo_sort. I'm not sure whether this second improvement is enough to warrant two algorithms, but it probably is, because the new one is quite easy to understand and follow. The faster algorithm cannot be used for the iterator because almost all of the work takes place before the first element can be yielded.

John A Meinel (jameinel) wrote on 2009-08-04: Posted in a previous version of this proposal

+    for parents in graph.itervalues():
+        for parent in parents:
+            if parent in nchildren:
+                nchildren[parent] += 1

^- since you just populated the dictionary with 0 for every node, the only times when a parent isn't in the nchildren dict is when it is a ghost.
So I wouldn't do a "x in y" check for each one, but just catch a potential KeyError.

eg:

for parents in graph.itervalues():
  for parent in parents:
    try:
      nchildren[parent] += 1
    except KeyError:
      # Ghost, we don't track their child count
      pass

Similarly for this pass:
+        parents = graph.pop(node_name)
+        for parent in parents:
+            if parent in nchildren:
+                nchildren[parent] -= 1
+                if nchildren[parent] == 0:
+                    nochildnodes.append(parent)

Actually, in this pass, if we end up with a parent that isn't in 'nchildren' and that wasn't also missing in the first pass, then we probably have a serious issue.

I might actually turn the first pass into gathering nodes which are known to be ghosts, doing another try/except here and making sure anything missing is already a known ghost.

Avoiding the 'if' check should more than make up for whatever overhead is introduced when there is a ghost.
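The two-pass idea described above (collect known ghosts while counting, then verify against them later) could look roughly like the following Python 3 sketch. The helper names `count_children` and `release_parents` are hypothetical and do not appear in the branch; the graph is assumed to be a dict mapping each node to its parents.

```python
def count_children(graph):
    """First pass: count children per node and collect ghost parents.

    A ghost is a parent that is referenced but is not itself a key of graph.
    """
    nchildren = dict.fromkeys(graph, 0)
    ghosts = set()
    for parents in graph.values():
        for parent in parents:
            try:
                nchildren[parent] += 1
            except KeyError:
                # Ghost: remember it, but do not track a child count.
                ghosts.add(parent)
    return nchildren, ghosts


def release_parents(node, graph, nchildren, ghosts, nochildnodes):
    """Second pass step: node has been emitted, so its parents lose a child."""
    for parent in graph.pop(node):
        try:
            nchildren[parent] -= 1
        except KeyError:
            # Anything missing here must already be a known ghost.
            if parent not in ghosts:
                raise AssertionError("unknown parent %r" % (parent,))
            continue
        if nchildren[parent] == 0:
            nochildnodes.append(parent)


graph = {"A": [], "B": ["A", "ghost"], "C": ["A"]}
nchildren, ghosts = count_children(graph)
```

The `try`/`except` costs almost nothing on the common path where the parent exists, and the second pass turns an unexpected miss into a loud failure instead of silently skipping it.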

There are other things we can do if we really want to optimize this loop.

1) You are calling "foo.append()". It turns out to usually be a bit faster to do:
  foo_append = foo.append
  while XXX:
     foo_append(item)

2) You seem to be popping an item off the list at the beginning, and probably are often pushing (append) one back on at the end. In my tests with this sort of code in KnownGraph it turns out to often be a lot better to peek at the last item, and then replace it the first time you append, rather than pop + append. For example:

no_children_append = no_children.append
node_stack_append = node_stack.append
graph_pop = graph.pop

... # while
node_name = no_children[-1]
remaining = True
node_stack_append(node_name)

parents = graph_pop(node_name)
for parent in parents:
  try:
    n = numchildren[parent] - 1
  except KeyError:
    # We don't worry about ghosts
    continue
  if n > 0:
    numchildren[parent] = n
  else:
    # We could set numchildren[parent] = 0, but we don't need to, because
    # it shouldn't ever be referenced again.
    # We might want to actually do numchildren.pop(parent), because that will
    # make future dict lookups slightly faster, at the cost of resizing;
    # this needs to be benchmarked.
    if remaining:
      remaining = False
      no_children[-1] = parent
    else:
      no_children_append(parent)
if remaining:
  # We added this node, and didn't have any parents replace it
  # remove the last entry
  no_children.pop()
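For reference, the fragment above can be completed into a runnable function. This is only a Python 3 sketch of the peek-and-replace idea: the function name `topo_sort_peek` is made up here, `ValueError` stands in for the GraphCycleError that bzrlib raises, and the graph is assumed to be a dict mapping each node to a list of parents.

```python
def topo_sort_peek(graph):
    """Return nodes parents-first; graph maps node -> list of parents."""
    graph = dict(graph)
    numchildren = dict.fromkeys(graph, 0)
    for parents in graph.values():
        for parent in parents:
            try:
                numchildren[parent] += 1
            except KeyError:
                pass  # ghost parent, not tracked
    no_children = [node for node, n in numchildren.items() if n == 0]
    node_stack = []

    no_children_append = no_children.append
    node_stack_append = node_stack.append
    graph_pop = graph.pop

    while no_children:
        node_name = no_children[-1]  # peek instead of pop
        remaining = True
        node_stack_append(node_name)
        for parent in graph_pop(node_name):
            try:
                n = numchildren[parent] - 1
            except KeyError:
                continue  # ghost
            if n > 0:
                numchildren[parent] = n
            elif remaining:
                # First newly childless parent overwrites the peeked slot.
                remaining = False
                no_children[-1] = parent
            else:
                no_children_append(parent)
        if remaining:
            # No parent replaced the peeked node, so drop it now.
            no_children.pop()

    if graph:
        raise ValueError("graph contains a cycle")
    node_stack.reverse()
    return node_stack
```

On a linear history every node frees exactly one parent, so the list is overwritten in place and never grows or shrinks.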

You can actually take it a step farther, and never 'pop' the no_children list, but keep a pointer to the 'current' item. And then append when you need to add one more, and just move the pointer otherwise. (Look at the _known_graph.pyx code for some details.)
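One way to read the "pointer" suggestion is to treat the list as a stack with an explicit top index, so that popping only moves the index and pushing reuses stale slots. The sketch below is Python 3 and is an interpretation of the comment, not a copy of the _known_graph.pyx code; the name `topo_sort_top_index` and the use of `ValueError` for cycles are assumptions.

```python
def topo_sort_top_index(graph):
    """Return nodes parents-first; graph maps node -> list of parents."""
    graph = dict(graph)
    numchildren = dict.fromkeys(graph, 0)
    for parents in graph.values():
        for parent in parents:
            if parent in numchildren:  # skip ghosts
                numchildren[parent] += 1

    # Slots 0..top hold the live stack; slots above top are stale.
    no_children = [node for node, n in numchildren.items() if n == 0]
    top = len(no_children) - 1
    output = []
    while top >= 0:
        node_name = no_children[top]
        top -= 1  # "pop" by moving the index only
        output.append(node_name)
        for parent in graph.pop(node_name):
            if parent not in numchildren:
                continue
            numchildren[parent] -= 1
            if numchildren[parent] == 0:
                top += 1
                if top == len(no_children):
                    no_children.append(parent)  # grow only when needed
                else:
                    no_children[top] = parent  # reuse a stale slot
    if graph:
        raise ValueError("graph contains a cycle")
    output.reverse()
    return output
```

In pure Python the extra index arithmetic may cost as much as it saves; the approach is more likely to pay off in compiled code, where the list never has to be resized downward.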

I would be curious what results you would get, but I wouldn't be surprised if it was significant. Given that you are at 260ms for 20k+ nodes, you are already in the sub ms per node, which is where stuff like attribute lookup, append/pop really start to add up.
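To put the per-node cost in perspective: 260 ms spread over 20k+ nodes is at most about 13 microseconds per node. A rough way to measure what the bound-method trick saves is a small timeit comparison like the Python 3 sketch below. The function names are hypothetical, and the size of the gap depends on the interpreter version, so no expected numbers are given.

```python
import timeit


def fill_attr(n):
    out = []
    for i in range(n):
        out.append(i)  # attribute lookup on every iteration
    return out


def fill_bound(n):
    out = []
    out_append = out.append  # look the method up once
    for i in range(n):
        out_append(i)
    return out


if __name__ == "__main__":
    n = 26000  # roughly the number of revisions mentioned in this thread
    for fn in (fill_attr, fill_bound):
        best = min(timeit.repeat(lambda: fn(n), number=20, repeat=5))
        print("%-12s %.2f ms per call" % (fn.__name__, best / 20 * 1000))
```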

...

node_name_stack.reverse()
return node_name_stack

We might consider doing this as:

return reversed(node_name_stack)

instead.

Creating a reverse iterator is cheap as it doesn't require doing any movement of actual nodes (so no memory allocation or memcpy of items.)

I don't think we have any code that assumes the result of 'topo_sort' is anything but something you can iterate. (ie, we don't have anything that indexes into it.)
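The practical difference between the two options, shown as a small Python 3 sketch with made-up node names: `reversed()` hands back a one-shot iterator that cannot be indexed or measured with `len()`, while `list.reverse()` mutates the list in place and keeps it a list.

```python
node_name_stack = ["child", "parent", "grandparent"]

# Option 1: a reverse iterator; the list itself is left untouched.
lazy = reversed(node_name_stack)
print(list(lazy))  # ['grandparent', 'parent', 'child']
print(list(lazy))  # [] because the iterator is already consumed

# Option 2: reverse in place and hand back a real list.
node_name_stack.reverse()
print(node_name_stack[0])  # grandparent
```

Any caller that iterates the result twice, indexes into it, or calls `len()` on it would break with option 1, which is why the assumption about callers matters.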

Of course, if we really think this code is a bottleneck and worth optimizing, then we may want to keep a reasonable and fairly simple implementation in python, and look into writing a pyrex version.

Could you look at some of the suggestions here and give benchmark results?

review: Approve

John A Meinel (jameinel) wrote on 2009-08-05: Posted in a previous version of this proposal

Sorry, I didn't mean to mark it approve prematurely.

To give some numbers using topo_sort on 26k revs of bzr.dev loaded into a dict:

188.0ms bzr.dev
  75.8ms mkbosmans code
  74.0ms getting rid of the "if x in y" lookup (not a big win)
  64.2ms changing "foo.append" to "foo_append"
  62.5ms using nochildren[-1] rather than always pop+append
  63.0ms using nochildren[last_tip] (probably a small win on more linear histories)
  65.6ms reversed(node_name_stack) # I was surprised that reversed(x) is slower than
          x.reverse() when the list has 26k items

37.4ms Pyrex version that mostly matches the optimized version I put together

49.9ms k = KnownGraph(parent_map); k.topo_sort()
9.5ms k.topo_sort()

So building the KnownGraph object costs approximately as much as the optimized C topo_sort (which makes sense, because to find_gdfo we basically topo_sort and update the counters.)

However, once we *have* a KnownGraph (if we decided we wanted one for some other reason), then we can generate the topologically sorted output in <10ms, or about 20x faster than the original Python TopoSorter implementation (188.0ms vs. 9.5ms).
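The ratios can be recomputed from the timings listed above. The Python 3 snippet below only does that arithmetic; the dictionary labels are shorthand chosen here, and the numbers are copied from the list.

```python
timings_ms = {
    "bzr.dev TopoSorter": 188.0,
    "mkbosmans code": 75.8,
    "optimized Python (nochildren[-1])": 62.5,
    "Pyrex version": 37.4,
    "KnownGraph build + topo_sort": 49.9,
    "KnownGraph.topo_sort only": 9.5,
}

baseline = timings_ms["bzr.dev TopoSorter"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}
for name, factor in speedups.items():
    print("%-36s %5.1fx" % (name, factor))

# Cost of building the KnownGraph alone, for comparison with the Pyrex sort.
build_ms = (timings_ms["KnownGraph build + topo_sort"]
            - timings_ms["KnownGraph.topo_sort only"])
print("KnownGraph build: %.1f ms" % build_ms)
```

The build cost comes out at about 40 ms, close to the 37.4 ms of the Pyrex topo_sort, which matches the observation that building the graph is roughly one topological sort.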

My test code is available at lp:~jameinel/bzr/1.18-topo-sort

The new implementations in _known_graph.pyx don't have test cases on them, and we would need to implement topo_sort in _known_graph.py, etc. But I honestly think that is the area to work in to get the best performance.

I believe you were also considering updating "tsort.merge_sort", and again this would be the place to look into writing some really optimized code.

review: Needs Fixing

Maarten Bosmans (mkbosmans) wrote on 2009-08-05: Posted in a previous version of this proposal

I benchmarked all the suggested changes, with the same result as you had. That is, mainly the foo_append change was a significant improvement, the rest not so much.

> ^- since you just populated the dictionary with 0 for every node, the only
> times when a parent isn't in the nchildren dict is when it is a ghost.
> So I wouldn't do a "x in y" check for each one, but just catch a potential
> KeyError.

Indeed, I added this check after the first version failed the unit test for a ghost node in the graph. As you mentioned removing the check wasn't a big win, so I'm inclined to leave it with the simpler version as is, perhaps with a comment added that explains that the check is for ghost parents.

> 1) You are calling "foo.append()". It turns out to usually be a bit faster to
> do:

This is indeed faster. I'll change it.

> 2) You seem to be popping an item off the list at the beginning, and probably
> are often pushing (append) one back on at the end. In my tests with this sort
> of code in KnownGraph it turns out to often be a lot better to peek at the
> last item, and then replace it the first time you append, rather than pop +
> append. For example:
>
> ...
>
> I would be curious what results you would get, but I wouldn't be surprised if
> it was significant. Given that you are at 260ms for 20k+ nodes, you are
> already in the sub ms per node, which is where stuff like attribute lookup,
> append/pop really start to add up.

So perhaps Python needs a more efficient stack implementation? I'm not very attracted to having such hacks in the otherwise quite clean algorithm. And as you already mentioned, it's not a big performance win. A similar speedup can be acquired by using a set instead of a stack. This requires only an added set() on initialization and using .add instead of .append.

> Of course, if we really think this code is a bottleneck and worth optimizing,
> then we may want to keep a reasonable and fairly simple implementation in
> python, and look into writing a pyrex version.
>
> Could you look at some of the suggestions here and give benchmark results?

If your pyrex code is a lot faster, maybe it is better to add that and scrap all these optimizations. The topo_sort function can then simply remain

return TopoSorter(graph).sorted()

and we don't have to have two Python algorithms implemented.

If you still think this new algorithm is worth going in, I will make a new version with the foo_append and stack->set changes. I propose that we stop optimizing after those two changes and, instead of micro-optimizing further, focus our attention on merge_sort, which is much more widely used, I believe.

BTW, how do I make these extra changes? Should I add a commit to the branch that only makes these few changes, or should I go back to the revision that added the whole algorithm, recommit it with changes and push the new branch over the current in lp?

Marius Kruger (amanica) wrote on 2009-08-05: Posted in a previous version of this proposal

>
> BTW, how do I make these extra changes? Should I add a commit to the branch
> that only makes these few changes
>
yes. i.e if I'm not missing anything, just commit the changes as per review
feedback, push to the same branch on lp.
Then set the status of the merge proposal to resubmit.
And when you are ready propose the branch for merging again.

> , or should I go back to the revision that added the whole algorithm,
> recommit it with changes and push the new branch over the current in lp?
>
no

Maarten Bosmans (mkbosmans) wrote on 2009-08-09: Posted in a previous version of this proposal

The last commit, r4582, adds the foo_append improvement.
Also, instead of toying with pointers to list items to be popped, etc., I used collections.deque, which is a doubly linked list, so that data structure nicely captures your suggested optimization for the list used as a stack. It requires Python 2.4 or later; I hope that's OK for bzr.
To make the algorithm clearer, I added a comment about ghost parents and renamed some of the variables.

These are my new timings (not to be compared with any times posted elsewhere)

r4579  1.41  improved TopoSorter.sorted()
r4580  1.05  new algorithm topo_sort()
r4582  0.93  foo_append, etc.
       0.89  use deque

So this is a nice 15% further improvement for the new algorithm.
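For readers on a current interpreter, the final version of the algorithm (child counts, a deque of childless nodes, cached bound methods) can be sketched in Python 3 as follows. This is a port for illustration, not the merged code: `iterkeys`/`itervalues`/`iteritems` are replaced by their Python 3 equivalents, and the `GraphCycleError` class defined here is a stand-in for bzrlib.errors.GraphCycleError.

```python
from collections import deque


class GraphCycleError(Exception):
    """Stand-in for bzrlib.errors.GraphCycleError (assumed for this sketch)."""


def topo_sort(graph):
    """Sort graph (node -> parents, or (node, parents) pairs) parents-first."""
    graph = dict(graph)
    node_stack = []

    # Count the children of every node; ghost parents are not counted.
    node_child_count = dict.fromkeys(graph, 0)
    for parents in graph.values():
        for parent in parents:
            if parent in node_child_count:
                node_child_count[parent] += 1
    nochild_nodes = deque(
        node for node, n in node_child_count.items() if n == 0)

    graph_pop = graph.pop
    node_stack_append = node_stack.append
    nochild_nodes_pop = nochild_nodes.pop
    nochild_nodes_append = nochild_nodes.append

    while nochild_nodes:
        node_name = nochild_nodes_pop()
        node_stack_append(node_name)
        for parent in graph_pop(node_name):
            if parent in node_child_count:
                node_child_count[parent] -= 1
                if node_child_count[parent] == 0:
                    nochild_nodes_append(parent)

    # Nodes left over never became childless, so the graph has a cycle.
    if graph:
        raise GraphCycleError(graph)

    # Nodes were pushed child first, so reverse before returning.
    node_stack.reverse()
    return node_stack


print(topo_sort([("C", ["A", "B"]), ("B", ["A"]), ("A", [])]))
# ['A', 'B', 'C']
```

Because every node must be counted before the first childless node is known, nothing can be yielded early, which is why this approach suits topo_sort but not iter_topo_order.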

John A Meinel (jameinel) wrote on 2009-08-11:

Maarten Bosmans wrote:
> Maarten Bosmans has proposed merging lp:~mkbosmans/bzr/topo_sort into lp:bzr.

Seems a reasonable stepping stone to me.

review approve

review: Approve

Preview Diff

 === modified file 'bzrlib/reconcile.py'
 --- bzrlib/reconcile.py	2009-06-10 03:56:49 +0000
 +++ bzrlib/reconcile.py	2009-08-19 09:36:26 +0000
@@ -33,7 +33,7 @@
      repofmt,
      )
  from bzrlib.trace import mutter, note
--from bzrlib.tsort import TopoSorter
++from bzrlib.tsort import topo_sort
  from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
@@ -247,8 +247,7 @@
          # we have topological order of revisions and non ghost parents ready.
          self._setup_steps(len(self._rev_graph))
--        revision_keys = [(rev_id,) for rev_id in
--            TopoSorter(self._rev_graph.items()).iter_topo_order()]
++        revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
          stream = self._change_inv_parents(
              self.inventory.get_record_stream(revision_keys, 'unordered', True),
              self._new_inv_parents,
@@ -378,7 +377,7 @@
          new_inventories = self.repo._temp_inventories()
          # we have topological order of revisions and non ghost parents ready.
          graph = self.revisions.get_parent_map(self.revisions.keys())
--        revision_keys = list(TopoSorter(graph).iter_topo_order())
++        revision_keys = topo_sort(graph)
          revision_ids = [key[-1] for key in revision_keys]
          self._setup_steps(len(revision_keys))
          stream = self._change_inv_parents(
 === modified file 'bzrlib/repository.py'
 --- bzrlib/repository.py	2009-08-17 23:15:55 +0000
 +++ bzrlib/repository.py	2009-08-19 09:36:26 +0000
@@ -4351,7 +4351,7 @@
          phase = 'file'
          revs = search.get_keys()
          graph = self.from_repository.get_graph()
--        revs = list(graph.iter_topo_order(revs))
++        revs = tsort.topo_sort(graph.get_parent_map(revs))
          data_to_fetch = self.from_repository.item_keys_introduced_by(revs)
          text_keys = []
          for knit_kind, file_id, revisions in data_to_fetch:
 === modified file 'bzrlib/tests/blackbox/test_ancestry.py'
 --- bzrlib/tests/blackbox/test_ancestry.py	2009-03-23 14:59:43 +0000
 +++ bzrlib/tests/blackbox/test_ancestry.py	2009-08-19 09:36:26 +0000
@@ -44,7 +44,7 @@
      def _check_ancestry(self, location='', result=None):
          out = self.run_bzr(['ancestry', location])[0]
          if result is None:
--            result = "A1\nB1\nA2\nA3\n"
++            result = "A1\nA2\nB1\nA3\n"
          self.assertEqualDiff(out, result)
      def test_ancestry(self):
 === modified file 'bzrlib/tests/test_tsort.py'
 --- bzrlib/tests/test_tsort.py	2009-08-17 15:26:18 +0000
 +++ bzrlib/tests/test_tsort.py	2009-08-19 09:36:26 +0000
@@ -39,6 +39,20 @@
                            list,
                            TopoSorter(graph).iter_topo_order())
++    def assertSortAndIterateOrder(self, graph):
++        """Check that sorting and iter_topo_order on graph really results in topological order.
++
++        For every child in the graph, check if it comes after all of it's parents.
++        """
++        sort_result = topo_sort(graph)
++        iter_result = list(TopoSorter(graph).iter_topo_order())
++        for (node, parents) in graph:
++            for parent in parents:
++                self.assertTrue(sort_result.index(node) > sort_result.index(parent),
++                    "parent %s must come before child %s:\n%s" % (parent, node, sort_result))
++                self.assertTrue(iter_result.index(node) > iter_result.index(parent),
++                    "parent %s must come before child %s:\n%s" % (parent, node, iter_result))
++
      def test_tsort_empty(self):
          """TopoSort empty list"""
          self.assertSortAndIterate([], [])
@@ -72,10 +86,10 @@
      def test_tsort_partial(self):
          """Topological sort with partial ordering.
--        If the graph does not give an order between two nodes, they are
--        returned in lexicographical order.
++        Multiple correct orderings are possible, so test for
++        correctness, not for exact match on the resulting list.
          """
--        self.assertSortAndIterate(([(0, []),
++        self.assertSortAndIterateOrder([(0, []),
                                     (1, [0]),
                                     (2, [0]),
                                     (3, [0]),
@@ -83,8 +97,7 @@
                                     (5, [1, 2]),
                                     (6, [1, 2]),
                                     (7, [2, 3]),
--                                   (8, [0, 1, 4, 5, 6])]),
--                                  [0, 1, 2, 3, 4, 5, 6, 7, 8])
++                                   (8, [0, 1, 4, 5, 6])])
      def test_tsort_unincluded_parent(self):
          """Sort nodes, but don't include some parents in the output"""
 === modified file 'bzrlib/tsort.py'
 --- bzrlib/tsort.py	2009-08-17 15:26:18 +0000
 +++ bzrlib/tsort.py	2009-08-19 09:36:26 +0000
@@ -20,6 +20,7 @@
  from bzrlib import errors
  import bzrlib.revision as _mod_revision
++from collections import deque
  __all__ = ["topo_sort", "TopoSorter", "merge_sort", "MergeSorter"]
@@ -34,8 +35,57 @@
      their children.
      node identifiers can be any hashable object, and are typically strings.
++
++    This function has the same purpose as the TopoSorter class, but uses a
++    different algorithm to sort the graph. That means that while both return a list
++    with parents before their child nodes, the exact ordering can be different.
++
++    topo_sort is faster when the whole list is needed, while when iterating over a
++    part of the list, TopoSorter.iter_topo_order should be used.
      """
--    return TopoSorter(graph).sorted()
++    # store a dict of the graph.
++    graph = dict(graph)
++    # this is the stack storing on which the sorted nodes are pushed.
++    node_stack = []
++
++    # count the number of children for every node in the graph
++    node_child_count = dict.fromkeys(graph.iterkeys(), 0)
++    for parents in graph.itervalues():
++        for parent in parents:
++            # don't count the parent if it's a ghost
++            if parent in node_child_count:
++                node_child_count[parent] += 1
++    # keep track of nodes without children in a separate list
++    nochild_nodes = deque([node for (node, n) in node_child_count.iteritems() if n == 0])
++
++    graph_pop = graph.pop
++    node_stack_append = node_stack.append
++    nochild_nodes_pop = nochild_nodes.pop
++    nochild_nodes_append = nochild_nodes.append
++
++    while nochild_nodes:
++        # pick a node without a child and add it to the stack.
++        node_name = nochild_nodes_pop()
++        node_stack_append(node_name)
++
++        # the parents of the node lose it as a child; if it was the last
++        # child, add the parent to the list of childless nodes.
++        parents = graph_pop(node_name)
++        for parent in parents:
++            if parent in node_child_count:
++                node_child_count[parent] -= 1
++                if node_child_count[parent] == 0:
++                    nochild_nodes_append(parent)
++
++    # if there are still nodes left in the graph,
++    # that means that there is a cycle
++    if graph:
++        raise errors.GraphCycleError(graph)
++
++    # the nodes where pushed on the stack child first, so this list needs to be
++    # reversed before returning it.
++    node_stack.reverse()
++    return node_stack
  class TopoSorter(object):
@@ -60,22 +110,8 @@
          iteration or sorting may raise GraphCycleError if a cycle is present
          in the graph.
          """
--        # a dict of the graph.
++        # store a dict of the graph.
          self._graph = dict(graph)
--        self._visitable = set(self._graph)
--        ### if debugging:
--        # self._original_graph = dict(graph)
--
--        # this is a stack storing the depth first search into the graph.
--        self._node_name_stack = []
--        # at each level of 'recursion' we have to check each parent. This
--        # stack stores the parents we have not yet checked for the node at the
--        # matching depth in _node_name_stack
--        self._pending_parents_stack = []
--        # this is a set of the completed nodes for fast checking whether a
--        # parent in a node we are processing on the stack has already been
--        # emitted and thus can be skipped.
--        self._completed_node_names = set()
      def sorted(self):
          """Sort the graph and return as a list.
@@ -100,67 +136,64 @@
          After finishing iteration the sorter is empty and you cannot continue
          iteration.
          """
--        while self._graph:
++        graph = self._graph
++        visitable = set(graph)
++
++        # this is a stack storing the depth first search into the graph.
++        pending_node_stack = []
++        # at each level of 'recursion' we have to check each parent. This
++        # stack stores the parents we have not yet checked for the node at the
++        # matching depth in pending_node_stack
++        pending_parents_stack = []
++
++        # this is a set of the completed nodes for fast checking whether a
++        # parent in a node we are processing on the stack has already been
++        # emitted and thus can be skipped.
++        completed_node_names = set()
++
++        while graph:
              # now pick a random node in the source graph, and transfer it to the
--            # top of the depth first search stack.
--            node_name, parents = self._graph.popitem()
--            self._push_node(node_name, parents)
--            while self._node_name_stack:
--                # loop until this call completes.
--                parents_to_visit = self._pending_parents_stack[-1]
--                # if all parents are done, the revision is done
++            # top of the depth first search stack of pending nodes.
++            node_name, parents = graph.popitem()
++            pending_node_stack.append(node_name)
++            pending_parents_stack.append(list(parents))
++
++            # loop until pending_node_stack is empty
++            while pending_node_stack:
++                parents_to_visit = pending_parents_stack[-1]
++                # if there are no parents left, the revision is done
                  if not parents_to_visit:
                      # append the revision to the topo sorted list
--                    # all the nodes parents have been added to the output, now
--                    # we can add it to the output.
--                    yield self._pop_node()
++                    # all the nodes parents have been added to the output,
++                    # now we can add it to the output.
++                    popped_node = pending_node_stack.pop()
++                    pending_parents_stack.pop()
++                    completed_node_names.add(popped_node)
++                    yield popped_node
                  else:
--                    while self._pending_parents_stack[-1]:
--                        # recurse depth first into a single parent
--                        next_node_name = self._pending_parents_stack[-1].pop()
--                        if next_node_name in self._completed_node_names:
--                            # this parent was completed by a child on the
--                            # call stack. skip it.
--                            continue
--                        if next_node_name not in self._visitable:
--                            continue
--                        # otherwise transfer it from the source graph into the
--                        # top of the current depth first search stack.
--                        try:
--                            parents = self._graph.pop(next_node_name)
--                        except KeyError:
--                            # if the next node is not in the source graph it has
--                            # already been popped from it and placed into the
--                            # current search stack (but not completed or we would
--                            # have hit the continue 4 lines up.
--                            # this indicates a cycle.
--                            raise errors.GraphCycleError(self._node_name_stack)
--                        self._push_node(next_node_name, parents)
--                        # and do not continue processing parents until this 'call'
--                        # has recursed.
--                        break
--
--    def _push_node(self, node_name, parents):
--        """Add node_name to the pending node stack.
--
--        Names in this stack will get emitted into the output as they are popped
--        off the stack.
--        """
--        self._node_name_stack.append(node_name)
--        self._pending_parents_stack.append(list(parents))
--
--    def _pop_node(self):
--        """Pop the top node off the stack
--
--        The node is appended to the sorted output.
--        """
--        # we are returning from the flattened call frame:
--        # pop off the local variables
--        node_name = self._node_name_stack.pop()
--        self._pending_parents_stack.pop()
--
--        self._completed_node_names.add(node_name)
--        return node_name
++                    # recurse depth first into a single parent
++                    next_node_name = parents_to_visit.pop()
++
++                    if next_node_name in completed_node_names:
++                        # parent was already completed by a child, skip it.
++                        continue
++                    if next_node_name not in visitable:
++                        # parent is not a node in the original graph, skip it.
++                        continue
++
++                    # transfer it along with its parents from the source graph
++                    # into the top of the current depth first search stack.
++                    try:
++                        parents = graph.pop(next_node_name)
++                    except KeyError:
++                        # if the next node is not in the source graph it has
++                        # already been popped from it and placed into the
++                        # current search stack (but not completed or we would
++                        # have hit the continue 6 lines up).  this indicates a
++                        # cycle.
++                        raise errors.GraphCycleError(pending_node_stack)
++                    pending_node_stack.append(next_node_name)
++                    pending_parents_stack.append(list(parents))
  def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False):
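The order check that assertSortAndIterateOrder performs in the diff above can also be written as a standalone predicate. The Python 3 sketch below uses a position dict instead of repeated `list.index` calls, which keeps the check linear in the size of the graph. The name `is_topological_order` is hypothetical, and unlike the test helper it skips parents that are missing from the output (ghosts), which is an assumption made here.

```python
def is_topological_order(graph, order):
    """Return True if every node in order comes after all of its parents.

    graph is a sequence of (node, parents) pairs, as in the tsort tests.
    """
    position = {node: i for i, node in enumerate(order)}
    if len(position) != len(order):
        return False  # a node was emitted more than once
    for node, parents in graph:
        if node not in position:
            return False  # a node is missing from the output
        for parent in parents:
            # Parents absent from the output are treated as ghosts.
            if parent in position and position[parent] > position[node]:
                return False
    return True


graph = [(0, []), (1, [0]), (2, [0]), (3, [1, 2])]
print(is_topological_order(graph, [0, 2, 1, 3]))  # True
print(is_topological_order(graph, [0, 3, 1, 2]))  # False
```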