Bazaar

Code review comment for lp:~mkbosmans/bzr/topo_sort

topo_sort
Merge into trunk-old

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-04:

131 + for parents in graph.itervalues():
132 + for parent in parents:
133 + if parent in nchildren:
134 + nchildren[parent] += 1

^- since you just populated the dictionary with 0 for every node, the only times when a parent isn't in the nchildren dict is when it is a ghost.
So I wouldn't do a "x in y" check for each one, but just catch a potential KeyError.

eg:

for parents in graph.itervalues():
  for parent in parents:
    try:
      nchildren[parent] += 1
    except KeyError:
      # Ghost, we don't track their child count
      pass

Similarly for this pass:
145 + parents = graph.pop(node_name)
146 + for parent in parents:
147 + if parent in nchildren:
148 + nchildren[parent] -= 1
149 + if nchildren[parent] == 0:
150 + nochildnodes.append(parent)

Actually, in this pass, if we end up with a parent that isn't in 'nchildren' that didn't miss earlier, then we probably have a serious issue.

I might actually turn the first pass into gathering nodes which are known to be ghosts, doing another try/except here and making sure anything missing is already a known ghost.

Avoiding the 'if' check should more than make up for whatever overhead introduced when there is a ghost.

There are other things we can do if we really want to optimize this loop.

1) You are calling "foo.append()". It turns out to usually be a bit faster to do:
  foo_append = foo.append
  while XXX:
     foo_append

2) You seem to be popping an item off the list at the beginning, and probably are often pushing (append) one back on at the end. In my tests with this sort of code in KnownGraph it turns out to often be a lot better to peek at the last item, and then replace it the first time you append, rather than pop + append. For example:

no_children_append = no_children.append
node_stack_append = node_stack.append
graph_pop = graph.pop

... # while
node_name = no_children[-1]
remaining = True
node_stack_append(node_name)

parents = graph_pop(node_name)
for parent in parents:
  try:
    n = numchildren[parent] - 1
  except KeyError:
    # We don't worry about ghosts
    continue
  if n > 0:
    numchildren[parent] = n
  else:
    # We could set num_children[parent] = 0, but we don't need to because
    # it shouldn't ever be referenced again
    # we might want to actually do numchildren.pop(parent), because that will
    # make future dict lookups slightly faster, at the cost of resizing
    # needs to be benchmarked
    if remaining:
      remaining = False
      no_children[-1] = parent
    else:
      no_children_append(parent)
if remaining:
  # We added this node, and didn't have any parents replace it
  # remove the last entry
  no_children.pop()

You can actually take it a step farther, and never 'pop' the no_children list, but keep a pointer to the 'current' item. And then append when you need to add one more, and just move the pointer otherwise. (Look at the _known_graph.pyx code for some details.)

I would be curious what results you would get, but I wouldn't be surprised if it was significant. Given that you are at 260ms for 20k+ nodes, you are already in the sub ms per node, which is where stuff like attribute lookup, append/pop really start to add up.

...

node_name_stack.reverse()
return node_name_stack

We might consider doing this as:

return reversed(node_name_stack)

instead.

Creating a reverse iterator is cheap as it doesn't require doing any movement of actual nodes (so no memory allocation or memcpy of items.)

I don't think we have any code that assumes the result of 'topo_sort' is anything but something you can iterate. (ie, we don't have anything that indexes into it.)

Of course, if we really think this code is a bottleneck and worth optimizing, then we may want to keep a reasonable and fairly simple implementation in python, and look into writing a pyrex version.

Could you look at some of the suggestions here and give benchmark results?

131	+ for parents in graph.itervalues():
132	+ for parent in parents:
133	+ if parent in nchildren:
134	+ nchildren[parent] += 1

eg:

for parents in graph.itervalues():
  for parent in parents:
    try:
      nchildren[parent] += 1
    except KeyError:
      # Ghost, we don't track their child count
      pass

Similarly for this pass:
145	+ parents = graph.pop(node_name)
146	+ for parent in parents:
147	+ if parent in nchildren:
148	+ nchildren[parent] -= 1
149	+ if nchildren[parent] == 0:
150	+ nochildnodes.append(parent)

Actually, in this pass, if we end up with a parent that isn't in 'nchildren' that didn't miss earlier, then we probably have a serious issue.

I might actually turn the first pass into gathering nodes which are known to be ghosts, doing another try/except here and making sure anything missing is already a known ghost.

Avoiding the 'if' check should more than make up for whatever overhead introduced when there is a ghost.

There are other things we can do if we really want to optimize this loop.

1) You are calling "foo.append()". It turns out to usually be a bit faster to do:
  foo_append = foo.append
  while XXX:
     foo_append

no_children_append = no_children.append
node_stack_append = node_stack.append
graph_pop = graph.pop

... # while
node_name = no_children[-1]
remaining = True
node_stack_append(node_name)

parents = graph_pop(node_name)
for parent in parents:
  try:
    n = numchildren[parent] - 1
  except KeyError:
    # We don't worry about ghosts
    continue
  if n > 0:
    numchildren[parent] = n
  else:
    # We could set num_children[parent] = 0, but we don't need to because
    # it shouldn't ever be referenced again
    # we might want to actually do numchildren.pop(parent), because that will 
    # make future dict lookups slightly faster, at the cost of resizing
    # needs to be benchmarked
    if remaining:
      remaining = False
      no_children[-1] = parent
    else:
      no_children_append(parent)
if remaining:
  # We added this node, and didn't have any parents replace it
  # remove the last entry
  no_children.pop()

...

node_name_stack.reverse()
return node_name_stack

We might consider doing this as:

return reversed(node_name_stack)

instead.

Creating a reverse iterator is cheap as it doesn't require doing any movement of actual nodes (so no memory allocation or memcpy of items.)

I don't think we have any code that assumes the result of 'topo_sort' is anything but something you can iterate. (ie, we don't have anything that indexes into it.)

Of course, if we really think this code is a bottleneck and worth optimizing, then we may want to keep a reasonable and fairly simple implementation in python, and look into writing a pyrex version.

Could you look at some of the suggestions here and give benchmark results?

review: Approve

« Back to merge proposal