Code review comment for lp:~mkbosmans/bzr/topo_sort

Revision history for this message
John A Meinel (jameinel) wrote :

Sorry, I didn't mean to mark it approve prematurely.

To give some numbers using topo_sort on 26k revs of bzr.dev loaded into a dict:

 188.0ms bzr.dev
  75.8ms mkbosmans code
  74.0ms getting rid of the "if x in y" lookup (not a big win)
  64.2ms changing "foo.append" to "foo_append"
  62.5ms using nochildren[-1] rather than always pop+append
  63.0ms using nochildren[last_tip] (probably a small win on more linear histories)
  65.6ms reversed(node_name_stack) # I was surprised that reversed(x) is slower than
          x.reverse() when the list has 26k items

  37.4ms Pyrex version that mostly matches the optimized version I put together

  49.9ms k = KnownGraph(parent_map); k.topo_sort()
   9.5ms k.topo_sort()

So building the KnownGraph object costs approximately as much as the optimized C topo_sort (which makes sense, because to find_gdfo we basically topo_sort and update the counters.)

However, once we *have* a KnownGraph (if we decided we wanted one for some other reason), then we can generate the topological sorted output in <10ms. Or about 18x faster than the original python TopoSorted implementation.

My test code is available at lp:~jameinel/bzr/1.18-topo-sort

The new implementations in _known_graph.pyx don't have test cases on them, and we would need to implement topo_sort in _known_graph.py, etc. But I honestly think that is the area to work in to get the best performance.

I believe you were also considering updating "tsort.merge_sort", and again this would be the place to look into writing some really optimized code.

review: Needs Fixing

« Back to merge proposal