Merge lp:~jameinel/bzr/1.17-rework-annotate into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.17-rework-annotate
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2746 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-rework-annotate
Reviewer Review Type Date Requested Status
Vincent Ladeuil Pending
bzr-core Pending
Review via email: mp+8281@code.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

This is a fairly major overhaul of the annotate code, with an eye on improving annotations overall. In the short term, it just makes it faster (~9s => ~7s for NEWS)

Overview of changes
1) Some small tweaks to BranchBuilder that I needed to write some test cases.
2) Changing from a bunch of loose functions in 'bzrlib.annotate.reannotate*' to a class Annotator.
3) Re-implement _KnitAnnotator as an implementation of this class. I didn't change much about how the texts were extracted and then compared, but there is a much better test suite against it now. It also vetted the design a bit, to ensure that the Annotator could be properly subclassed to do specialized extraction. (Knits and knitpack give us hints as to what our delta should be, so that we don't have to re-delta every text versus all of its parents. Timing shows this to be rather significant.)
4) Implement a pyrex version of Annotator, to handle some inner-loop functionality. (Nicely enough, you can still subclass from it.)
5) This also includes a fairly fundamental change to how the annotation is produced.
   a) Switch from [(ann1, line1), (ann2, line2)] to ([ann1, ann2], [line1, line2]) style of tracking annotations. This means fewer times when we have to re-cast the data from a list of annotated lines into a list of plain lines.
   b) When computing the delta, compare all plain lines first. The prior code used to compare annotated lines, because it made computing overall annotations faster. However, that introduced bug #387294.
   c) Start tracking *multiple* sources for a given line. This means that rather than resolving 'heads()' and collisions at every revision, we wait to do the resolution on the *final* text. This removes a lot of heads() calls for stuff like NEWS (and is the primary performance improvement). I don't think this gets us a lot when dealing with a command line interface, but it has lots of potential for a GUI (which could then show all sources that introduced a line, etc.)
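For illustration, changes (a) and (c) amount to something like the following sketch (plain Python, not the actual bzrlib data structures; the revision ids are made up):

```python
# Old style: one (annotation, line) pair per line.  Extracting the
# plain text for diffing means building a whole new list each time.
old_style = [(("rev-1",), "hello\n"), (("rev-2",), "world\n")]
plain = [line for _ann, line in old_style]  # extra pass, extra list

# New style: two parallel lists -- the plain lines are already
# available for the matcher without any re-casting.
annotations = [("rev-1",), ("rev-2", "rev-3")]
lines = ["hello\n", "world\n"]

# Each annotation entry holds *every* candidate source for its line;
# ambiguities (the second line here) are resolved with heads() only
# once, on the final text, rather than at every intermediate revision.
ambiguous = [idx for idx, ann in enumerate(annotations) if len(ann) > 1]
```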

Still to do:
1) I'd like to break it up a bit more, and allow you to pass some sort of Policy object into the code. That would let you do things like ignore whitespace changes, only annotate based on mainline changes, etc.

2) Support _break_annotation_tie again. This is really something I'd like to fold into Policy, but I guess MySQL wanted a custom implementation. I don't really like the current api, as it is probably fixed at 2 revisions, and it passes in the lines along with the revisions. But I can certainly adapt what I need to the old api. Note that technically the api supports > 2, though I doubt that is actually used anywhere; I haven't seen the MySQL implementation as a comparison point.

3) GUI support. I don't really know how to expose the Annotator functionality to a GUI, or if the api really works well there until I've actually written some code. However, this patch has gotten too big already, so I'd like to get it reviewed as is.

Vincent Ladeuil (vila) wrote :

Review: Approve

Good to see more tests in that area!

There is little to comment on given the detailed cover letter. I
like the cleanup (to come? :) in annotate and the introduction
of the Annotator class, but since you intend to build policy
classes as a front-end, why not make the class private until you
feel more confident about the overall API? Like you, I'm not
sure the GUIs will really need to access that class...

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> You have been requested to review the proposed merge of
    jam> lp:~jameinel/bzr/1.17-rework-annotate into lp:bzr.

    jam> This is a fairly major overhaul of the annotate code,
    jam> with an eye on improving annotations overall. In the
    jam> short term, it just makes it faster (~9s => ~7s for
    jam> NEWS)

Which is always good to take (for interested readers, that's still
measured with --show-ids).

    jam> Overview of changes

    jam> 1) Some small tweaks to BranchBuilder that I needed to
    jam> write some test cases.

Good, obviously the tests you modified are clearer too.

    jam> 2) Changing from a bunch of loose functions in
    jam> 'bzrlib.annotate.reannotate*' to a class Annotator.

Good step forward.

A couple of comments on the class API:

- _update_needed_children() and _get_needed_keys() sound like
  good candidates for Graph() or some specialization of it.

- _update_from_one_parent(): the doc string says first parent, so
  why not call it _update_from_first_parent() then? Unless you
  envision some other possible usage...

- add_special_text(), hmm, what's that? The doc string doesn't
  help a lot :-) Does that need to be public?

    jam> 4) Implement a pyrex version of Annotator, to handle some
    jam> inner-loop functionality. (Nicely enough, you can still subclass
    jam> from it.)

It would be nice to define in pyrex *only* those inner loops,
though that's not a requirement to land this patch.

<snip/>

    jam> Still to do:
    jam> 1) I'd like to break it up a bit more, and allow you to pass some
    jam> sort of Policy object into the code. That would let you do things
    jam> like ignore whitespace changes, only annotate based on mainline
    jam> changes, etc.

    jam> 2) Support _break_annotation_tie again. This is really something
    jam> I'd like to fold into Policy, but I guess MySQL wanted a custom
    jam> implementation. I don't really like the current api, as it is
    jam> probably fixed at 2 revisions, and it passes in the lines along
    jam> with the revisions. But I can certainly adapt what I need to the
    jam> old api. Note that technically the api supports > 2, but I doubt
    jam> that is actually supported anywhere, but I haven't seen the MySQL
    jam> implementation as a comparison point.

Pretty much:

- extract the date from the revids of the annotations (only the
  first two),
- return the oldest

It would be much appreciated not to break the actual result, or at
the very least to provide a way to get the same functionality
before 1.17 is out.
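A tie-breaker along those lines might look like the sketch below; the revid layout and the helper's name are assumptions (the MySQL implementation hasn't been seen), shown only to make the two steps concrete:

```python
def break_annotation_tie(annotated_lines):
    """Return the oldest of the first two (rev_id, line) candidates.

    Assumes revids of the form 'user@host-YYYYMMDDhhmmss-uniq', so the
    second '-'-separated field is a sortable timestamp.  Purely a guess
    at the shape of the hook, not the actual implementation.
    """
    def date_of(annotated_line):
        rev_id = annotated_line[0]
        # Pull out the timestamp field from the revid.
        return rev_id.split("-")[1]
    return min(annotated_lines[:2], key=date_of)

oldest = break_annotation_tie(
    [("jam@host-20090708-aaaa", "line\n"),
     ("vila@host-20090601-bbbb", "line\n")])
```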

    jam> 3) GUI support. I don't really know how to expose the Annotator
    jam> functionality to a GUI, or if the api really works well there
    jam...


John A Meinel (jameinel) wrote :


> There is little to comment on given the detailed cover letter, I
> like the cleanup (to come ? :) in annotate and the introduction
> of the Annotator class, but since you intend to build policy
> classes as front-end, why not make the class private until you
> feel more confident about the overall API ? Like you, I'm not
> sure the GUIs will really need to access that class...

I'm not sure if you understood me correctly. Annotator is *meant* to be
the final api for getting an annotation for various versions of a file.

An AnnotationPolicy is meant to be the way to say "I want to ignore
whitespace", etc.

I can make it hidden, but since:

VF.get_annotator() is meant to be the public api that
qannotate/gannotate will use...

...
>
> Pretty much:
>
> - extract the date from the revids of the annotations (only the
> first two ones),
> - return the oldest
>
> It would be very appreciated to not break the actual result or at
> the very least provides a way to get the same functionality
> before 1.17 is out.

So my idea was to do:

_break_annotation_tie = None

if _break_annotation_tie is not None:
  # mutate the data to fit the old api
else:
  # do it my way

...

> jam> + the_heads = heads(annotation)
> jam> + if len(the_heads) == 1:
> jam> + for head in the_heads:
> jam> + break
>
> That's a really funny way to write head = heads[0]... Do I miss
> something ?

heads is a set, you can't do set()[0], you have to:

list(heads)[0]

iter(heads).next()

heads.pop() # though I believe it is a frozenset and this is illegal

[head for head in heads][0]

for head in heads:
  continue

for head in heads:
  pass

for head in heads:
  break

Take your pick. The last one is the fastest because it evaluates the
iter() inline, and doesn't have a function call. Nor does it build an
intermediate list. And for whatever reason, timeit says that 'break' is
faster than the others.
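The relative speeds are easy to re-check with timeit; this little benchmark is only illustrative, and the absolute numbers depend on machine and Python version (note that frozenset really has no .pop(), so that option is out):

```python
import timeit

# Benchmark three ways of pulling the single element out of a set.
setup = "heads = frozenset(['rev-a', 'rev-b'])"
candidates = {
    "list(heads)[0]": "head = list(heads)[0]",
    "next(iter(heads))": "head = next(iter(heads))",
    "for head in heads: break": "for head in heads: break",
}
for name, stmt in candidates.items():
    t = timeit.timeit(stmt, setup=setup, number=10000)
    print("%-26s %.5fs" % (name, t))
```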

>
> <snip/>
>
> jam> === added file 'bzrlib/_annotator_pyx.pyx'
> ...
> jam> +class Annotator:
> ...
> jam> + def _update_needed_children(self, key, parent_keys):
> jam> + for parent_key in parent_keys:
> jam> + if parent_key in self._num_needed_children:
> jam> + self._num_needed_children[parent_key] += 1
>
> +=1 ? Hmm, I like that pyrex version you're using, send me some :)

Actually for pyrex 0.9.8 you can even do:

cdef list foo

and then *it* will translate

foo.append(...)

into

PyList_Append(foo, ...)

It would be *really* nice to depend on 0.9.8.5 as it would clean up
certain bits. (Note it only really supports lists; it allows:
  cdef dict foo
  cdef tuple foo

and will do runtime checking, etc., but it doesn't have any smarts
about set_item/get_item/append, etc.)

John
=:->

John A Meinel (jameinel) wrote :


...
>
> A couple of comments on the class API:
>
> - _update_needed_children() and _get_needed_keys() sounds like
> good candidates for Graph() or some specialization of it.

True, though they also have side effects like removing texts when there
are no more needed children, etc.

>
> - _update_from_one_parent() the doc string says first parent, why
> not call it _update_from_first_parent() then ? Unless you envision
> some other possible usage...

Sure.

>
> - add_special_text(), hmm, what's that ? The doc string doesn't
> help a lot :-) Does that need to be public ?

It does, as it is used by WorkingTree to add the 'current:' text to be
annotated. (One other benefit of this new code is that 'bzr annotate
NEWS' after a merge doesn't annotate both parents independently... \o/)
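Per the add_special_text() body in the preview diff, the mechanics reduce to registering a key, its parents, and its split lines, then invalidating the cached heads provider. The stripped-down stand-in here (not the real bzrlib class) shows just that:

```python
def split_lines(text):
    # Stand-in for bzrlib's osutils.split_lines.
    return text.splitlines(True)

class SketchAnnotator(object):
    """Bare-bones stand-in illustrating Annotator.add_special_text()."""

    def __init__(self):
        self._parent_map = {}
        self._text_cache = {}
        self._heads_provider = None

    def add_special_text(self, key, parent_keys, text):
        # Register a text absent from the versioned file, e.g. the
        # WorkingTree's 'current:' content after a merge.
        self._parent_map[key] = parent_keys
        self._text_cache[key] = split_lines(text)
        # The graph changed, so any cached heads() data is stale.
        self._heads_provider = None

ann = SketchAnnotator()
ann.add_special_text(('current:',), [('rev-1',), ('rev-2',)],
                     "merged line\n")
```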

>
> jam> 4) Implement a pyrex version of Annotator, to handle some
> jam> inner-loop functionality. (Nicely enough, you can still subclass
> jam> from it.)
>
> It would be nice to defines in pyrex *only* those inner-loops,
> not a requirement to land that patch tough.
>

John
=:->

Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    >> There is little to comment on given the detailed cover letter, I
    >> like the cleanup (to come ? :) in annotate and the introduction
    >> of the Annotator class, but since you intend to build policy
    >> classes as front-end, why not make the class private until you
    >> feel more confident about the overall API ? Like you, I'm not
    >> sure the GUIs will really need to access that class...

    jam> I'm not sure if you understood me correctly. Annotator
    jam> is *meant* to be the final api for getting an annotation
    jam> for various versions of a file.

Oh! Indeed, I understood you wanted trees to be the primary interface.

<snip/>

    jam> So my idea was to do:

    jam> _break_annotation_tie = None

    jam> if _break_annotation_tie is not None:
    jam> # mutate the data to fit the old api
    jam> else:
    jam> # do it my way

Fine.

    jam> heads is a set, you can't do set()[0], you have to:

    jam> list(heads)[0]

    jam> iter(heads).next()

    jam> heads.pop() # though I believe it is a frozenset and this is illegal

    jam> [head for head in heads][0]

    jam> for head in heads:
    jam> continue

    jam> for head in heads:
    jam> pass

    jam> for head in heads:
    jam> break

Then I'd go with:

  for head in heads: break # Get head from the set

on a single line to make it more obvious.

That's the first time I've seen that idiom; I won't be surprised
next time (hopefully, but others may be).

<snip/>

    >> +=1 ? Hmm, I like that pyrex version you're using, send me some :)

    jam> Actually for pyrex 0.9.8 you can even do:

Oh! Yes, jaunty is still at 0.9.7.2 ...

Reading the NEWS about it, I can only agree here.

What is needed to have the package updated? Host it in the bzr
PPAs?

     Vincent

Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> ...
    >>
    >> A couple of comments on the class API:
    >>
    >> - _update_needed_children() and _get_needed_keys() sounds like
    >> good candidates for Graph() or some specialization of it.

    jam> True, though they also have side effects like removing
    jam> texts when there are no more needed children, etc.

I see.

<snip/>

    >> - add_special_text(), hmm, what's that ? The doc string doesn't
    >> help a lot :-) Does that need to be public ?

    jam> It does, as it is used by WorkingTree to add the 'current:' text to be
    jam> annotated.

Haaa! Worth mentioning then :)

    jam> (One other benefit of this new code is that 'bzr
    jam> annotate NEWS' after a merge doesn't annotate both
    jam> parents independently... \o/)

Hurrah!

       Vincent

John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
>>>>>> "jam" == John A Meinel <email address hidden> writes:
>
> >> There is little to comment on given the detailed cover letter, I
> >> like the cleanup (to come ? :) in annotate and the introduction
> >> of the Annotator class, but since you intend to build policy
> >> classes as front-end, why not make the class private until you
> >> feel more confident about the overall API ? Like you, I'm not
> >> sure the GUIs will really need to access that class...
>
> jam> I'm not sure if you understood me correctly. Annotator
> jam> is *meant* to be the final api for getting an annotation
> jam> for various versions of a file.
>
> Oh ! Indeed, I understood you wanted trees to be the primary interface.

So I'm trying to resolve the two issues. But a *tree* doesn't talk about
the history of a file. I might add:

 tree.get_annotator(file_id)

That would say something like: "give me an object that can talk about
the history of the file you have."

I need some way for the WT to inject the 'current' version, and to
have it default to annotating that tip. I haven't worked out the
details yet.

You *could* do this by having a bunch of revision trees for each
revision you want to annotate, and then having the annotator cached at
the VF layer. But it seems better to control the caching lifetime and
parameters in the GUI layer, rather than underneath trees inside VF.

...

>
> Then I'd go with:
>
> for head in heads: break # Get head from the set
>
> on a single line to make it more obvious.
>
> That's the first time I see that idiom, I will not be surprised
> next time (hopefully, but others can).
>
> <snip/>
>
> >> +=1 ? Hmm, I like that pyrex version you're using, send me some :)
>
> jam> Actually for pyrex 0.9.8 you can even do:
>
> Oh ! Yes, jaunty is still at 0.9.7.2 ...
>
> Reading the NEWS about it, I can only agree here.
>
> What is needed to have the package updated ? Host it in the bzr
> PPAs ?
>
> Vincent

I don't really know.

John
=:->

Preview Diff

1=== modified file '.bzrignore'
2--- .bzrignore 2009-06-22 12:52:39 +0000
3+++ .bzrignore 2009-07-08 23:35:26 +0000
4@@ -38,6 +38,7 @@
5 ./api
6 doc/**/*.html
7 doc/developers/performance.png
8+bzrlib/_annotator_pyx.c
9 bzrlib/_bencode_pyx.c
10 bzrlib/_btree_serializer_pyx.c
11 bzrlib/_chk_map_pyx.c
12
13=== modified file 'NEWS'
14--- NEWS 2009-07-08 18:05:38 +0000
15+++ NEWS 2009-07-08 23:35:27 +0000
16@@ -48,6 +48,9 @@
17 diverged-branches`` when a push fails because the branches have
18 diverged. (Neil Martinsen-Burrell, #269477)
19
20+* Annotate would sometimes 'latch on' to trivial lines, causing important
21+ lines to be incorrectly annotated. (John Arbash Meinel, #387952)
22+
23 * Automatic format upgrades triggered by default stacking policies on a
24 1.16rc1 (or later) smart server work again.
25 (Andrew Bennetts, #388675)
26@@ -164,7 +167,12 @@
27 Improvements
28 ************
29
30-``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
31+* ``bzr annotate`` can now be significantly faster. The time for
32+ ``bzr annotate NEWS`` is down to 7s from 22s in 1.16. Files with long
33+ histories and lots of 'duplicate insertions' will be improved more than
34+ others. (John Arbash Meinel, Vincent Ladeuil)
35+
36+* ``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
37 to 1.1 seconds. The improvement for ``bzr ls -r-1`` is more
38 substantial dropping from 54.3 to 1.1 seconds. (Ian Clatworthy)
39
40
41=== added file 'bzrlib/_annotator_py.py'
42--- bzrlib/_annotator_py.py 1970-01-01 00:00:00 +0000
43+++ bzrlib/_annotator_py.py 2009-07-08 23:35:27 +0000
44@@ -0,0 +1,309 @@
45+# Copyright (C) 2009 Canonical Ltd
46+#
47+# This program is free software; you can redistribute it and/or modify
48+# it under the terms of the GNU General Public License as published by
49+# the Free Software Foundation; either version 2 of the License, or
50+# (at your option) any later version.
51+#
52+# This program is distributed in the hope that it will be useful,
53+# but WITHOUT ANY WARRANTY; without even the implied warranty of
54+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
55+# GNU General Public License for more details.
56+#
57+# You should have received a copy of the GNU General Public License
58+# along with this program; if not, write to the Free Software
59+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
60+
61+"""Functionality for doing annotations in the 'optimal' way"""
62+
63+from bzrlib.lazy_import import lazy_import
64+lazy_import(globals(), """
65+from bzrlib import annotate # Must be lazy to avoid circular importing
66+""")
67+from bzrlib import (
68+ errors,
69+ graph as _mod_graph,
70+ osutils,
71+ patiencediff,
72+ ui,
73+ )
74+
75+
76+class Annotator(object):
77+ """Class that drives performing annotations."""
78+
79+ def __init__(self, vf):
80+ """Create a new Annotator from a VersionedFile."""
81+ self._vf = vf
82+ self._parent_map = {}
83+ self._text_cache = {}
84+ # Map from key => number of nexts that will be built from this key
85+ self._num_needed_children = {}
86+ self._annotations_cache = {}
87+ self._heads_provider = None
88+ self._ann_tuple_cache = {}
89+
90+ def _update_needed_children(self, key, parent_keys):
91+ for parent_key in parent_keys:
92+ if parent_key in self._num_needed_children:
93+ self._num_needed_children[parent_key] += 1
94+ else:
95+ self._num_needed_children[parent_key] = 1
96+
97+ def _get_needed_keys(self, key):
98+ """Determine the texts we need to get from the backing vf.
99+
100+ :return: (vf_keys_needed, ann_keys_needed)
101+ vf_keys_needed These are keys that we need to get from the vf
102+ ann_keys_needed Texts which we have in self._text_cache but we
103+ don't have annotations for. We need to yield these
104+ in the proper order so that we can get proper
105+ annotations.
106+ """
107+ parent_map = self._parent_map
108+ # We need 1 extra copy of the node we will be looking at when we are
109+ # done
110+ self._num_needed_children[key] = 1
111+ vf_keys_needed = set()
112+ ann_keys_needed = set()
113+ needed_keys = set([key])
114+ while needed_keys:
115+ parent_lookup = []
116+ next_parent_map = {}
117+ for key in needed_keys:
118+ if key in self._parent_map:
119+ # We don't need to lookup this key in the vf
120+ if key not in self._text_cache:
121+ # Extract this text from the vf
122+ vf_keys_needed.add(key)
123+ elif key not in self._annotations_cache:
124+ # We do need to annotate
125+ ann_keys_needed.add(key)
126+ next_parent_map[key] = self._parent_map[key]
127+ else:
128+ parent_lookup.append(key)
129+ vf_keys_needed.add(key)
130+ needed_keys = set()
131+ next_parent_map.update(self._vf.get_parent_map(parent_lookup))
132+ for key, parent_keys in next_parent_map.iteritems():
133+ if parent_keys is None: # No graph versionedfile
134+ parent_keys = ()
135+ next_parent_map[key] = ()
136+ self._update_needed_children(key, parent_keys)
137+ needed_keys.update([key for key in parent_keys
138+ if key not in parent_map])
139+ parent_map.update(next_parent_map)
140+ # _heads_provider does some graph caching, so it is only valid while
141+ # self._parent_map hasn't changed
142+ self._heads_provider = None
143+ return vf_keys_needed, ann_keys_needed
144+
145+ def _get_needed_texts(self, key, pb=None):
146+ """Get the texts we need to properly annotate key.
147+
148+ :param key: A Key that is present in self._vf
149+ :return: Yield (this_key, text, num_lines)
150+ 'text' is an opaque object that just has to work with whatever
151+ matcher object we are using. Currently it is always 'lines' but
152+ future improvements may change this to a simple text string.
153+ """
154+ keys, ann_keys = self._get_needed_keys(key)
155+ if pb is not None:
156+ pb.update('getting stream', 0, len(keys))
157+ stream = self._vf.get_record_stream(keys, 'topological', True)
158+ for idx, record in enumerate(stream):
159+ if pb is not None:
160+ pb.update('extracting', 0, len(keys))
161+ if record.storage_kind == 'absent':
162+ raise errors.RevisionNotPresent(record.key, self._vf)
163+ this_key = record.key
164+ lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
165+ num_lines = len(lines)
166+ self._text_cache[this_key] = lines
167+ yield this_key, lines, num_lines
168+ for key in ann_keys:
169+ lines = self._text_cache[key]
170+ num_lines = len(lines)
171+ yield key, lines, num_lines
172+
173+ def _get_parent_annotations_and_matches(self, key, text, parent_key):
174+ """Get the list of annotations for the parent, and the matching lines.
175+
176+ :param text: The opaque value given by _get_needed_texts
177+ :param parent_key: The key for the parent text
178+ :return: (parent_annotations, matching_blocks)
179+ parent_annotations is a list as long as the number of lines in
180+ parent
181+ matching_blocks is a list of (parent_idx, text_idx, len) tuples
182+ indicating which lines match between the two texts
183+ """
184+ parent_lines = self._text_cache[parent_key]
185+ parent_annotations = self._annotations_cache[parent_key]
186+ # PatienceSequenceMatcher should probably be part of Policy
187+ matcher = patiencediff.PatienceSequenceMatcher(None,
188+ parent_lines, text)
189+ matching_blocks = matcher.get_matching_blocks()
190+ return parent_annotations, matching_blocks
191+
192+ def _update_from_first_parent(self, key, annotations, lines, parent_key):
193+ """Reannotate this text relative to its first parent."""
194+ (parent_annotations,
195+ matching_blocks) = self._get_parent_annotations_and_matches(
196+ key, lines, parent_key)
197+
198+ for parent_idx, lines_idx, match_len in matching_blocks:
199+ # For all matching regions we copy across the parent annotations
200+ annotations[lines_idx:lines_idx + match_len] = \
201+ parent_annotations[parent_idx:parent_idx + match_len]
202+
203+ def _update_from_other_parents(self, key, annotations, lines,
204+ this_annotation, parent_key):
205+ """Reannotate this text relative to a second (or more) parent."""
206+ (parent_annotations,
207+ matching_blocks) = self._get_parent_annotations_and_matches(
208+ key, lines, parent_key)
209+
210+ last_ann = None
211+ last_parent = None
212+ last_res = None
213+ # TODO: consider making all annotations unique and then using 'is'
214+ # everywhere. Current results claim that isn't any faster,
215+ # because of the time spent deduping
216+ # deduping also saves a bit of memory. For NEWS it saves ~1MB,
217+ # but that is out of 200-300MB for extracting everything, so a
218+ # fairly trivial amount
219+ for parent_idx, lines_idx, match_len in matching_blocks:
220+ # For lines which match this parent, we will now resolve whether
221+ # this parent wins over the current annotation
222+ ann_sub = annotations[lines_idx:lines_idx + match_len]
223+ par_sub = parent_annotations[parent_idx:parent_idx + match_len]
224+ if ann_sub == par_sub:
225+ continue
226+ for idx in xrange(match_len):
227+ ann = ann_sub[idx]
228+ par_ann = par_sub[idx]
229+ ann_idx = lines_idx + idx
230+ if ann == par_ann:
231+ # Nothing to change
232+ continue
233+ if ann == this_annotation:
234+ # Originally claimed 'this', but it was really in this
235+ # parent
236+ annotations[ann_idx] = par_ann
237+ continue
238+ # Resolve the fact that both sides have a different value for
239+ # last modified
240+ if ann == last_ann and par_ann == last_parent:
241+ annotations[ann_idx] = last_res
242+ else:
243+ new_ann = set(ann)
244+ new_ann.update(par_ann)
245+ new_ann = tuple(sorted(new_ann))
246+ annotations[ann_idx] = new_ann
247+ last_ann = ann
248+ last_parent = par_ann
249+ last_res = new_ann
250+
251+ def _record_annotation(self, key, parent_keys, annotations):
252+ self._annotations_cache[key] = annotations
253+ for parent_key in parent_keys:
254+ num = self._num_needed_children[parent_key]
255+ num -= 1
256+ if num == 0:
257+ del self._text_cache[parent_key]
258+ del self._annotations_cache[parent_key]
259+ # Do we want to clean up _num_needed_children at this point as
260+ # well?
261+ self._num_needed_children[parent_key] = num
262+
263+ def _annotate_one(self, key, text, num_lines):
264+ this_annotation = (key,)
265+ # Note: annotations will be mutated by calls to _update_from*
266+ annotations = [this_annotation] * num_lines
267+ parent_keys = self._parent_map[key]
268+ if parent_keys:
269+ self._update_from_first_parent(key, annotations, text,
270+ parent_keys[0])
271+ for parent in parent_keys[1:]:
272+ self._update_from_other_parents(key, annotations, text,
273+ this_annotation, parent)
274+ self._record_annotation(key, parent_keys, annotations)
275+
276+ def add_special_text(self, key, parent_keys, text):
277+ """Add a specific text to the graph.
278+
279+ This is used to add a text which is not otherwise present in the
280+ versioned file. (eg. a WorkingTree injecting 'current:' into the
281+ graph to annotate the edited content.)
282+
283+ :param key: The key to use to request this text be annotated
284+ :param parent_keys: The parents of this text
285+ :param text: A string containing the content of the text
286+ """
287+ self._parent_map[key] = parent_keys
288+ self._text_cache[key] = osutils.split_lines(text)
289+ self._heads_provider = None
290+
291+ def annotate(self, key):
292+ """Return annotated fulltext for the given key.
293+
294+ :param key: A tuple defining the text to annotate
295+ :return: ([annotations], [lines])
296+ annotations is a list of tuples of keys, one for each line in lines
297+ each key is a possible source for the given line.
298+ lines the text of "key" as a list of lines
299+ """
300+ pb = ui.ui_factory.nested_progress_bar()
301+ try:
302+ for text_key, text, num_lines in self._get_needed_texts(key, pb=pb):
303+ self._annotate_one(text_key, text, num_lines)
304+ finally:
305+ pb.finished()
306+ try:
307+ annotations = self._annotations_cache[key]
308+ except KeyError:
309+ raise errors.RevisionNotPresent(key, self._vf)
310+ return annotations, self._text_cache[key]
311+
312+ def _get_heads_provider(self):
313+ if self._heads_provider is None:
314+ self._heads_provider = _mod_graph.KnownGraph(self._parent_map)
315+ return self._heads_provider
316+
317+ def _resolve_annotation_tie(self, the_heads, line, tiebreaker):
318+ if tiebreaker is None:
319+ head = sorted(the_heads)[0]
320+ else:
321+ # Backwards compatibility, break up the heads into pairs and
322+ # resolve the result
323+ next_head = iter(the_heads)
324+ head = next_head.next()
325+ for possible_head in next_head:
326+ annotated_lines = ((head, line), (possible_head, line))
327+ head = tiebreaker(annotated_lines)[0]
328+ return head
329+
330+ def annotate_flat(self, key):
331+ """Determine the single-best-revision to source for each line.
332+
333+ This is meant as a compatibility thunk to how annotate() used to work.
334+ :return: [(ann_key, line)]
335+ A list of tuples with a single annotation key for each line.
336+ """
337+ custom_tiebreaker = annotate._break_annotation_tie
338+ annotations, lines = self.annotate(key)
339+ out = []
340+ heads = self._get_heads_provider().heads
341+ append = out.append
342+ for annotation, line in zip(annotations, lines):
343+ if len(annotation) == 1:
344+ head = annotation[0]
345+ else:
346+ the_heads = heads(annotation)
347+ if len(the_heads) == 1:
348+ for head in the_heads: break # get the item out of the set
349+ else:
350+ head = self._resolve_annotation_tie(the_heads, line,
351+ custom_tiebreaker)
352+ append((head, line))
353+ return out
354
355=== added file 'bzrlib/_annotator_pyx.pyx'
356--- bzrlib/_annotator_pyx.pyx 1970-01-01 00:00:00 +0000
357+++ bzrlib/_annotator_pyx.pyx 2009-07-08 23:35:27 +0000
358@@ -0,0 +1,287 @@
359+# Copyright (C) 2009 Canonical Ltd
360+#
361+# This program is free software; you can redistribute it and/or modify
362+# it under the terms of the GNU General Public License as published by
363+# the Free Software Foundation; either version 2 of the License, or
364+# (at your option) any later version.
365+#
366+# This program is distributed in the hope that it will be useful,
367+# but WITHOUT ANY WARRANTY; without even the implied warranty of
368+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
369+# GNU General Public License for more details.
370+#
371+# You should have received a copy of the GNU General Public License
372+# along with this program; if not, write to the Free Software
373+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
374+
375+"""Functionality for doing annotations in the 'optimal' way"""
376+
377+cdef extern from "python-compat.h":
378+ pass
379+
380+cdef extern from "Python.h":
381+ ctypedef int Py_ssize_t
382+ ctypedef struct PyObject:
383+ pass
384+ ctypedef struct PyListObject:
385+ PyObject **ob_item
386+ int PyList_CheckExact(object)
387+ PyObject *PyList_GET_ITEM(object, Py_ssize_t o)
388+ Py_ssize_t PyList_GET_SIZE(object)
389+ int PyList_Append(object, object) except -1
390+ int PyList_SetItem(object, Py_ssize_t o, object) except -1
391+ int PyList_Sort(object) except -1
392+
393+ int PyTuple_CheckExact(object)
394+ object PyTuple_New(Py_ssize_t len)
395+ void PyTuple_SET_ITEM(object, Py_ssize_t pos, object)
396+ void PyTuple_SET_ITEM_ptr "PyTuple_SET_ITEM" (object, Py_ssize_t,
397+ PyObject *)
398+ int PyTuple_Resize(PyObject **, Py_ssize_t newlen)
399+ PyObject *PyTuple_GET_ITEM(object, Py_ssize_t o)
400+ Py_ssize_t PyTuple_GET_SIZE(object)
401+
402+ PyObject *PyDict_GetItem(object d, object k)
403+ int PyDict_SetItem(object d, object k, object v) except -1
404+
405+ void Py_INCREF(object)
406+ void Py_INCREF_ptr "Py_INCREF" (PyObject *)
407+ void Py_DECREF_ptr "Py_DECREF" (PyObject *)
408+
409+ int Py_EQ
410+ int Py_LT
411+ int PyObject_RichCompareBool(object, object, int opid) except -1
412+ int PyObject_RichCompareBool_ptr "PyObject_RichCompareBool" (
413+ PyObject *, PyObject *, int opid)
414+
415+
416+from bzrlib import _annotator_py
417+
418+
419+cdef int _check_annotations_are_lists(annotations,
420+ parent_annotations) except -1:
421+ if not PyList_CheckExact(annotations):
422+ raise TypeError('annotations must be a list')
423+ if not PyList_CheckExact(parent_annotations):
424+ raise TypeError('parent_annotations must be a list')
425+ return 0
426+
427+
428+cdef int _check_match_ranges(parent_annotations, annotations,
429+ Py_ssize_t parent_idx, Py_ssize_t lines_idx,
430+ Py_ssize_t match_len) except -1:
431+ if parent_idx + match_len > PyList_GET_SIZE(parent_annotations):
432+ raise ValueError('Match length exceeds len of'
433+ ' parent_annotations %s > %s'
434+ % (parent_idx + match_len,
435+ PyList_GET_SIZE(parent_annotations)))
436+ if lines_idx + match_len > PyList_GET_SIZE(annotations):
437+ raise ValueError('Match length exceeds len of'
438+ ' annotations %s > %s'
439+ % (lines_idx + match_len,
440+ PyList_GET_SIZE(annotations)))
441+ return 0
442+
443+
444+cdef PyObject *_next_tuple_entry(object tpl, Py_ssize_t *pos):
445+ pos[0] = pos[0] + 1
446+ if pos[0] >= PyTuple_GET_SIZE(tpl):
447+ return NULL
448+ return PyTuple_GET_ITEM(tpl, pos[0])
449+
450+
451+cdef object _combine_annotations(ann_one, ann_two, cache):
452+ """Combine the annotations from both sides."""
453+ cdef Py_ssize_t pos_one, pos_two, len_one, len_two
454+ cdef Py_ssize_t out_pos
455+ cdef PyObject *temp, *left, *right
456+
457+ if (PyObject_RichCompareBool(ann_one, ann_two, Py_LT)):
458+ cache_key = (ann_one, ann_two)
459+ else:
460+ cache_key = (ann_two, ann_one)
461+ temp = PyDict_GetItem(cache, cache_key)
462+ if temp != NULL:
463+ return <object>temp
464+
465+ if not PyTuple_CheckExact(ann_one) or not PyTuple_CheckExact(ann_two):
466+ raise TypeError('annotations must be tuples')
467+ # We know that annotations are tuples, and that both sides are already
468+ # sorted, so we can just walk and update a new list.
469+ pos_one = -1
470+ pos_two = -1
471+ out_pos = 0
472+ left = _next_tuple_entry(ann_one, &pos_one)
473+ right = _next_tuple_entry(ann_two, &pos_two)
474+ new_ann = PyTuple_New(PyTuple_GET_SIZE(ann_one)
475+ + PyTuple_GET_SIZE(ann_two))
476+ while left != NULL and right != NULL:
477+ # left == right is also handled by PyObject_RichCompareBool_ptr, but
478+ # checking pointer equality first avoids a function call in a very
479+ # common case: it drops 'time bzr annotate NEWS' from 7.25s to 7.16s.
480+ if (left == right
481+ or PyObject_RichCompareBool_ptr(left, right, Py_EQ)):
482+ # Identical values, step both
483+ Py_INCREF_ptr(left)
484+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
485+ left = _next_tuple_entry(ann_one, &pos_one)
486+ right = _next_tuple_entry(ann_two, &pos_two)
487+ elif (PyObject_RichCompareBool_ptr(left, right, Py_LT)):
488+ # left < right or right == NULL
489+ Py_INCREF_ptr(left)
490+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
491+ left = _next_tuple_entry(ann_one, &pos_one)
492+ else: # right < left or left == NULL
493+ Py_INCREF_ptr(right)
494+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, right)
495+ right = _next_tuple_entry(ann_two, &pos_two)
496+ out_pos = out_pos + 1
497+ while left != NULL:
498+ Py_INCREF_ptr(left)
499+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
500+ left = _next_tuple_entry(ann_one, &pos_one)
501+ out_pos = out_pos + 1
502+ while right != NULL:
503+ Py_INCREF_ptr(right)
504+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, right)
505+ right = _next_tuple_entry(ann_two, &pos_two)
506+ out_pos = out_pos + 1
507+ if out_pos != PyTuple_GET_SIZE(new_ann):
508+ # Timing _PyTuple_Resize was not significantly faster than slicing
509+ # PyTuple_Resize((<PyObject **>new_ann), out_pos)
510+ new_ann = new_ann[0:out_pos]
511+ PyDict_SetItem(cache, cache_key, new_ann)
512+ return new_ann
513+
514+
515+cdef int _apply_parent_annotations(annotations, parent_annotations,
516+ matching_blocks) except -1:
517+ """Apply the annotations from parent_annotations into annotations.
518+
519+ matching_blocks defines the ranges that match.
520+ """
521+ cdef Py_ssize_t parent_idx, lines_idx, match_len, idx
522+ cdef PyListObject *par_list, *ann_list
523+ cdef PyObject **par_temp, **ann_temp
524+
525+ _check_annotations_are_lists(annotations, parent_annotations)
526+ par_list = <PyListObject *>parent_annotations
527+ ann_list = <PyListObject *>annotations
528+ # For NEWS and bzrlib/builtins.py, over 99% of the lines are simply copied
529+ # across from the parent entry. So this routine is heavily optimized for
530+ # that. It would be interesting to use memcpy(), but we have to incref
531+ # and decref each pointer we copy.
532+ for parent_idx, lines_idx, match_len in matching_blocks:
533+ _check_match_ranges(parent_annotations, annotations,
534+ parent_idx, lines_idx, match_len)
535+ par_temp = par_list.ob_item + parent_idx
536+ ann_temp = ann_list.ob_item + lines_idx
537+ for idx from 0 <= idx < match_len:
538+ Py_INCREF_ptr(par_temp[idx])
539+ Py_DECREF_ptr(ann_temp[idx])
540+ ann_temp[idx] = par_temp[idx]
541+ return 0
542+
543+
544+cdef int _merge_annotations(this_annotation, annotations, parent_annotations,
545+ matching_blocks, ann_cache) except -1:
546+ cdef Py_ssize_t parent_idx, ann_idx, lines_idx, match_len, idx
547+ cdef Py_ssize_t pos
548+ cdef PyObject *ann_temp, *par_temp
549+
550+ _check_annotations_are_lists(annotations, parent_annotations)
551+ last_ann = None
552+ last_parent = None
553+ last_res = None
554+ for parent_idx, lines_idx, match_len in matching_blocks:
555+ _check_match_ranges(parent_annotations, annotations,
556+ parent_idx, lines_idx, match_len)
557+ # For lines which match this parent, we will now resolve whether
558+ # this parent wins over the current annotation
559+ for idx from 0 <= idx < match_len:
560+ ann_idx = lines_idx + idx
561+ ann_temp = PyList_GET_ITEM(annotations, ann_idx)
562+ par_temp = PyList_GET_ITEM(parent_annotations, parent_idx + idx)
563+ if (ann_temp == par_temp):
564+ # This is parent, do nothing
565+ # Pointer comparison is fine here. Value comparison would
566+ # be ok, but it will be handled in the final if clause by
567+ # merging the two tuples into the same tuple
568+ # Avoiding the Py_INCREF and function call to
569+ # PyObject_RichCompareBool using pointer comparison drops
570+ # timing from 215ms => 125ms
571+ continue
572+ par_ann = <object>par_temp
573+ ann = <object>ann_temp
574+ if (ann is this_annotation):
575+ # Originally claimed 'this', but it was really in this
576+ # parent
577+ Py_INCREF(par_ann)
578+ PyList_SetItem(annotations, ann_idx, par_ann)
579+ continue
580+ # Resolve the fact that both sides have a different value for
581+ # last modified
582+ if (ann is last_ann and par_ann is last_parent):
583+ Py_INCREF(last_res)
584+ PyList_SetItem(annotations, ann_idx, last_res)
585+ else:
586+ new_ann = _combine_annotations(ann, par_ann, ann_cache)
587+ Py_INCREF(new_ann)
588+ PyList_SetItem(annotations, ann_idx, new_ann)
589+ last_ann = ann
590+ last_parent = par_ann
591+ last_res = new_ann
592+ return 0
593+
594+
595+class Annotator(_annotator_py.Annotator):
596+ """Class that drives performing annotations."""
597+
598+ def _update_from_first_parent(self, key, annotations, lines, parent_key):
599+ """Reannotate this text relative to its first parent."""
600+ (parent_annotations,
601+ matching_blocks) = self._get_parent_annotations_and_matches(
602+ key, lines, parent_key)
603+
604+ _apply_parent_annotations(annotations, parent_annotations,
605+ matching_blocks)
606+
607+ def _update_from_other_parents(self, key, annotations, lines,
608+ this_annotation, parent_key):
609+ """Reannotate this text relative to a second (or more) parent."""
610+ (parent_annotations,
611+ matching_blocks) = self._get_parent_annotations_and_matches(
612+ key, lines, parent_key)
613+ _merge_annotations(this_annotation, annotations, parent_annotations,
614+ matching_blocks, self._ann_tuple_cache)
615+
616+ def annotate_flat(self, key):
617+ """Determine the single best source revision for each line.
618+
619+ This is meant as a compatibility thunk to how annotate() used to work.
620+ """
621+ cdef Py_ssize_t pos, num_lines
622+
623+ from bzrlib import annotate
624+
625+ custom_tiebreaker = annotate._break_annotation_tie
626+ annotations, lines = self.annotate(key)
627+ num_lines = len(lines)
628+ out = []
629+ heads = self._get_heads_provider().heads
630+ for pos from 0 <= pos < num_lines:
631+ annotation = annotations[pos]
632+ line = lines[pos]
633+ if len(annotation) == 1:
634+ head = annotation[0]
635+ else:
636+ the_heads = heads(annotation)
637+ if len(the_heads) == 1:
638+ for head in the_heads: break # get the item out of the set
639+ else:
640+ # We need to resolve the ambiguity, for now just pick the
641+ # sorted smallest
642+ head = self._resolve_annotation_tie(the_heads, line,
643+ custom_tiebreaker)
644+ PyList_Append(out, (head, line))
645+ return out
646
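The tuple-merge at the heart of `_combine_annotations` above is a cached merge of two sorted, de-duplicated sequences. A pure-Python sketch of the same logic (illustrative only; the real code walks C-level tuple pointers):

```python
def combine_annotations(ann_one, ann_two, cache):
    """Merge two sorted annotation tuples into one sorted, de-duplicated
    tuple, memoizing the result in `cache` keyed by the ordered pair."""
    if ann_one < ann_two:
        cache_key = (ann_one, ann_two)
    else:
        cache_key = (ann_two, ann_one)
    if cache_key in cache:
        return cache[cache_key]
    out = []
    i = j = 0
    while i < len(ann_one) and j < len(ann_two):
        if ann_one[i] == ann_two[j]:
            out.append(ann_one[i])  # identical values, step both sides
            i += 1
            j += 1
        elif ann_one[i] < ann_two[j]:
            out.append(ann_one[i])
            i += 1
        else:
            out.append(ann_two[j])
            j += 1
    out.extend(ann_one[i:])  # drain whichever side still has entries
    out.extend(ann_two[j:])
    result = tuple(out)
    cache[cache_key] = result
    return result
```

Because the key orders the pair, `(a, b)` and `(b, a)` share one cache entry, matching the Pyrex version's `ann_cache`.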
647=== modified file 'bzrlib/_known_graph_py.py'
648--- bzrlib/_known_graph_py.py 2009-06-19 17:40:59 +0000
649+++ bzrlib/_known_graph_py.py 2009-07-08 23:35:27 +0000
650@@ -63,18 +63,12 @@
651 - ghosts will have a parent_keys = None,
652 - all nodes found will also have .child_keys populated with all known
653 child_keys,
654- - self._tails will list all the nodes without parents.
655 """
656- tails = self._tails = set()
657 nodes = self._nodes
658 for key, parent_keys in parent_map.iteritems():
659 if key in nodes:
660 node = nodes[key]
661 node.parent_keys = parent_keys
662- if parent_keys:
663- # This node has been added before being seen in parent_map
664- # (see below)
665- tails.remove(node)
666 else:
667 node = _KnownGraphNode(key, parent_keys)
668 nodes[key] = node
669@@ -84,17 +78,18 @@
670 except KeyError:
671 parent_node = _KnownGraphNode(parent_key, None)
672 nodes[parent_key] = parent_node
673- # Potentially a tail, if we're wrong we'll remove it later
674- # (see above)
675- tails.add(parent_node)
676 parent_node.child_keys.append(key)
677
678+ def _find_tails(self):
679+ return [node for node in self._nodes.itervalues()
680+ if not node.parent_keys]
681+
682 def _find_gdfo(self):
683 nodes = self._nodes
684 known_parent_gdfos = {}
685 pending = []
686
687- for node in self._tails:
688+ for node in self._find_tails():
689 node.gdfo = 1
690 pending.append(node)
691
692@@ -144,9 +139,6 @@
693 # No or only one candidate
694 return frozenset(candidate_nodes)
695 heads_key = frozenset(candidate_nodes)
696- if heads_key != frozenset(keys):
697- # Mention duplicates
698- note('%s != %s', heads_key, frozenset(keys))
699 # Do we have a cached result ?
700 try:
701 heads = self._known_heads[heads_key]
702
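The `_find_tails`/`_find_gdfo` pair above computes tails on demand and then walks children, finalizing a node only once every one of its parents has a gdfo. A self-contained sketch of that algorithm (the `Node` and `build_nodes` helpers are hypothetical scaffolding, not bzrlib's, and assume every parent key appears in the map):

```python
class Node(object):
    """Minimal graph node for the sketch: parent/child links plus gdfo."""
    def __init__(self, key, parent_keys):
        self.key = key
        self.parent_keys = parent_keys
        self.child_keys = []
        self.gdfo = None

def build_nodes(parent_map):
    """Build key->Node from {key: (parent_key, ...)}."""
    nodes = {}
    for key, parents in parent_map.items():
        nodes[key] = Node(key, parents)
    for key, parents in parent_map.items():
        for parent_key in parents:
            nodes[parent_key].child_keys.append(key)
    return nodes

def find_gdfo(nodes):
    """Assign each node its gdfo: tails get 1, and a child is queued only
    after all of its parents have been finalized."""
    known_parent_gdfos = {}
    # Tails are found lazily, mirroring _find_tails()
    pending = [node for node in nodes.values() if not node.parent_keys]
    for node in pending:
        node.gdfo = 1
    while pending:
        node = pending.pop()
        for child_key in node.child_keys:
            child = nodes[child_key]
            seen = known_parent_gdfos.get(child_key, 0) + 1
            known_parent_gdfos[child_key] = seen
            if child.gdfo is None or node.gdfo + 1 > child.gdfo:
                child.gdfo = node.gdfo + 1
            if seen == len(child.parent_keys):
                pending.append(child)  # all parents resolved
```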
703=== modified file 'bzrlib/annotate.py'
704--- bzrlib/annotate.py 2009-04-08 13:13:30 +0000
705+++ bzrlib/annotate.py 2009-07-08 23:35:27 +0000
706@@ -1,4 +1,4 @@
707-# Copyright (C) 2004, 2005, 2006, 2007 Canonical Ltd
708+# Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Canonical Ltd
709 #
710 # This program is free software; you can redistribute it and/or modify
711 # it under the terms of the GNU General Public License as published by
712@@ -313,7 +313,9 @@
713 return matcher.get_matching_blocks()
714
715
716-def _break_annotation_tie(annotated_lines):
717+_break_annotation_tie = None
718+
719+def _old_break_annotation_tie(annotated_lines):
720 """Chose an attribution between several possible ones.
721
722 :param annotated_lines: A list of tuples ((file_id, rev_id), line) where
723@@ -394,7 +396,11 @@
724 # If the result is not stable, there is a risk a
725 # performance degradation as criss-cross merges will
726 # flip-flop the attribution.
727- output_append(_break_annotation_tie([left, right]))
728+ if _break_annotation_tie is None:
729+ output_append(
730+ _old_break_annotation_tie([left, right]))
731+ else:
732+ output_append(_break_annotation_tie([left, right]))
733 last_child_idx = child_idx + match_len
734
735
736@@ -444,3 +450,9 @@
737 # If left and right agree on a range, just push that into the output
738 lines_extend(annotated_lines[left_idx:left_idx + match_len])
739 return lines
740+
741+
742+try:
743+ from bzrlib._annotator_pyx import Annotator
744+except ImportError:
745+ from bzrlib._annotator_py import Annotator
746
747=== modified file 'bzrlib/branchbuilder.py'
748--- bzrlib/branchbuilder.py 2009-05-07 05:08:46 +0000
749+++ bzrlib/branchbuilder.py 2009-07-08 23:35:27 +0000
750@@ -161,7 +161,8 @@
751 self._tree = None
752
753 def build_snapshot(self, revision_id, parent_ids, actions,
754- message=None, timestamp=None, allow_leftmost_as_ghost=False):
755+ message=None, timestamp=None, allow_leftmost_as_ghost=False,
756+ committer=None):
757 """Build a commit, shaped in a specific way.
758
759 :param revision_id: The handle for the new commit, can be None
760@@ -176,6 +177,7 @@
761 commit message will be written.
762 :param timestamp: If non-None, set the timestamp of the commit to this
763 value.
764+ :param committer: An optional username to use for commit
765 :param allow_leftmost_as_ghost: True if the leftmost parent should be
766 permitted to be a ghost.
767 :return: The revision_id of the new commit
768@@ -241,7 +243,7 @@
769 for file_id, content in new_contents.iteritems():
770 tree.put_file_bytes_non_atomic(file_id, content)
771 return self._do_commit(tree, message=message, rev_id=revision_id,
772- timestamp=timestamp)
773+ timestamp=timestamp, committer=committer)
774 finally:
775 tree.unlock()
776
777
778=== modified file 'bzrlib/groupcompress.py'
779--- bzrlib/groupcompress.py 2009-07-01 10:47:37 +0000
780+++ bzrlib/groupcompress.py 2009-07-08 23:35:27 +0000
781@@ -1069,29 +1069,11 @@
782
783 def annotate(self, key):
784 """See VersionedFiles.annotate."""
785- graph = Graph(self)
786- parent_map = self.get_parent_map([key])
787- if not parent_map:
788- raise errors.RevisionNotPresent(key, self)
789- if parent_map[key] is not None:
790- parent_map = dict((k, v) for k, v in graph.iter_ancestry([key])
791- if v is not None)
792- keys = parent_map.keys()
793- else:
794- keys = [key]
795- parent_map = {key:()}
796- # We used Graph(self) to load the parent_map, but now that we have it,
797- # we can just query the parent map directly, so create a KnownGraph
798- heads_provider = _mod_graph.KnownGraph(parent_map)
799- parent_cache = {}
800- reannotate = annotate.reannotate
801- for record in self.get_record_stream(keys, 'topological', True):
802- key = record.key
803- lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
804- parent_lines = [parent_cache[parent] for parent in parent_map[key]]
805- parent_cache[key] = list(
806- reannotate(parent_lines, lines, key, None, heads_provider))
807- return parent_cache[key]
808+ ann = annotate.Annotator(self)
809+ return ann.annotate_flat(key)
810+
811+ def get_annotator(self):
812+ return annotate.Annotator(self)
813
814 def check(self, progress_bar=None):
815 """See VersionedFiles.check()."""
816
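`annotate_flat` reduces the new `([ann1, ann2], [line1, line2])`-style result back to the old one-revision-per-line form by consulting a heads provider. A simplified Python sketch (the `heads` and `tiebreak` callables stand in for `_get_heads_provider().heads` and `annotate._break_annotation_tie`; `min` is only a placeholder tie-breaker):

```python
def annotate_flat(annotations, lines, heads, tiebreak=min):
    """Reduce per-line annotation tuples to a single best revision each.

    annotations is a list of tuples of revision ids (one tuple per line),
    lines the matching texts, and heads a callable mapping a set of
    revisions to its head revisions.
    """
    out = []
    for annotation, line in zip(annotations, lines):
        if len(annotation) == 1:
            head = annotation[0]
        else:
            the_heads = heads(annotation)
            if len(the_heads) == 1:
                (head,) = the_heads  # pull the single item out of the set
            else:
                head = tiebreak(the_heads)  # ambiguous: pick deterministically
        out.append((head, line))
    return out
```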
817=== modified file 'bzrlib/knit.py'
818--- bzrlib/knit.py 2009-06-23 15:27:50 +0000
819+++ bzrlib/knit.py 2009-07-08 23:35:27 +0000
820@@ -664,8 +664,6 @@
821
822 see parse_fulltext which this inverts.
823 """
824- # TODO: jam 20070209 We only do the caching thing to make sure that
825- # the origin is a valid utf-8 line, eventually we could remove it
826 return ['%s %s' % (o, t) for o, t in content._lines]
827
828 def lower_line_delta(self, delta):
829@@ -758,7 +756,7 @@
830
831 def annotate(self, knit, key):
832 annotator = _KnitAnnotator(knit)
833- return annotator.annotate(key)
834+ return annotator.annotate_flat(key)
835
836
837
838@@ -1044,6 +1042,9 @@
839 """See VersionedFiles.annotate."""
840 return self._factory.annotate(self, key)
841
842+ def get_annotator(self):
843+ return _KnitAnnotator(self)
844+
845 def check(self, progress_bar=None):
846 """See VersionedFiles.check()."""
847 # This doesn't actually test extraction of everything, but that will
848@@ -3336,103 +3337,33 @@
849 recommended.
850 """
851 annotator = _KnitAnnotator(knit)
852- return iter(annotator.annotate(revision_id))
853-
854-
855-class _KnitAnnotator(object):
856+ return iter(annotator.annotate_flat(revision_id))
857+
858+
859+class _KnitAnnotator(annotate.Annotator):
860 """Build up the annotations for a text."""
861
862- def __init__(self, knit):
863- self._knit = knit
864-
865- # Content objects, differs from fulltexts because of how final newlines
866- # are treated by knits. the content objects here will always have a
867- # final newline
868- self._fulltext_contents = {}
869-
870- # Annotated lines of specific revisions
871- self._annotated_lines = {}
872-
873- # Track the raw data for nodes that we could not process yet.
874- # This maps the revision_id of the base to a list of children that will
875- # annotated from it.
876- self._pending_children = {}
877-
878- # Nodes which cannot be extracted
879- self._ghosts = set()
880-
881- # Track how many children this node has, so we know if we need to keep
882- # it
883- self._annotate_children = {}
884- self._compression_children = {}
885+ def __init__(self, vf):
886+ annotate.Annotator.__init__(self, vf)
887+
888+ # TODO: handle Nodes which cannot be extracted
889+ # self._ghosts = set()
890+
891+ # Map from (key, parent_key) => matching_blocks, should be 'use once'
892+ self._matching_blocks = {}
893+
894+ # KnitContent objects
895+ self._content_objects = {}
896+ # The number of children that depend on this fulltext content object
897+ self._num_compression_children = {}
898+ # Delta records that need their compression parent before they can be
899+ # expanded
900+ self._pending_deltas = {}
901+ # Fulltext records that are waiting for their parents' fulltexts before
902+ # they can be yielded for annotation
903+ self._pending_annotation = {}
904
905 self._all_build_details = {}
906- # The children => parent revision_id graph
907- self._revision_id_graph = {}
908-
909- self._heads_provider = None
910-
911- self._nodes_to_keep_annotations = set()
912- self._generations_until_keep = 100
913-
914- def set_generations_until_keep(self, value):
915- """Set the number of generations before caching a node.
916-
917- Setting this to -1 will cache every merge node, setting this higher
918- will cache fewer nodes.
919- """
920- self._generations_until_keep = value
921-
922- def _add_fulltext_content(self, revision_id, content_obj):
923- self._fulltext_contents[revision_id] = content_obj
924- # TODO: jam 20080305 It might be good to check the sha1digest here
925- return content_obj.text()
926-
927- def _check_parents(self, child, nodes_to_annotate):
928- """Check if all parents have been processed.
929-
930- :param child: A tuple of (rev_id, parents, raw_content)
931- :param nodes_to_annotate: If child is ready, add it to
932- nodes_to_annotate, otherwise put it back in self._pending_children
933- """
934- for parent_id in child[1]:
935- if (parent_id not in self._annotated_lines):
936- # This parent is present, but another parent is missing
937- self._pending_children.setdefault(parent_id,
938- []).append(child)
939- break
940- else:
941- # This one is ready to be processed
942- nodes_to_annotate.append(child)
943-
944- def _add_annotation(self, revision_id, fulltext, parent_ids,
945- left_matching_blocks=None):
946- """Add an annotation entry.
947-
948- All parents should already have been annotated.
949- :return: A list of children that now have their parents satisfied.
950- """
951- a = self._annotated_lines
952- annotated_parent_lines = [a[p] for p in parent_ids]
953- annotated_lines = list(annotate.reannotate(annotated_parent_lines,
954- fulltext, revision_id, left_matching_blocks,
955- heads_provider=self._get_heads_provider()))
956- self._annotated_lines[revision_id] = annotated_lines
957- for p in parent_ids:
958- ann_children = self._annotate_children[p]
959- ann_children.remove(revision_id)
960- if (not ann_children
961- and p not in self._nodes_to_keep_annotations):
962- del self._annotated_lines[p]
963- del self._all_build_details[p]
964- if p in self._fulltext_contents:
965- del self._fulltext_contents[p]
966- # Now that we've added this one, see if there are any pending
967- # deltas to be done, certainly this parent is finished
968- nodes_to_annotate = []
969- for child in self._pending_children.pop(revision_id, []):
970- self._check_parents(child, nodes_to_annotate)
971- return nodes_to_annotate
972
973 def _get_build_graph(self, key):
974 """Get the graphs for building texts and annotations.
975@@ -3446,202 +3377,243 @@
976 passing to read_records_iter to start reading in the raw data from
977 the pack file.
978 """
979- if key in self._annotated_lines:
980- # Nothing to do
981- return []
982 pending = set([key])
983 records = []
984- generation = 0
985- kept_generation = 0
986+ ann_keys = set()
987+ self._num_needed_children[key] = 1
988 while pending:
989 # get all pending nodes
990- generation += 1
991 this_iteration = pending
992- build_details = self._knit._index.get_build_details(this_iteration)
993+ build_details = self._vf._index.get_build_details(this_iteration)
994 self._all_build_details.update(build_details)
995- # new_nodes = self._knit._index._get_entries(this_iteration)
996+ # new_nodes = self._vf._index._get_entries(this_iteration)
997 pending = set()
998 for key, details in build_details.iteritems():
999- (index_memo, compression_parent, parents,
1000+ (index_memo, compression_parent, parent_keys,
1001 record_details) = details
1002- self._revision_id_graph[key] = parents
1003+ self._parent_map[key] = parent_keys
1004+ self._heads_provider = None
1005 records.append((key, index_memo))
1006 # Do we actually need to check _annotated_lines?
1007- pending.update(p for p in parents
1008- if p not in self._all_build_details)
1009+ pending.update([p for p in parent_keys
1010+ if p not in self._all_build_details])
1011+ if parent_keys:
1012+ for parent_key in parent_keys:
1013+ if parent_key in self._num_needed_children:
1014+ self._num_needed_children[parent_key] += 1
1015+ else:
1016+ self._num_needed_children[parent_key] = 1
1017 if compression_parent:
1018- self._compression_children.setdefault(compression_parent,
1019- []).append(key)
1020- if parents:
1021- for parent in parents:
1022- self._annotate_children.setdefault(parent,
1023- []).append(key)
1024- num_gens = generation - kept_generation
1025- if ((num_gens >= self._generations_until_keep)
1026- and len(parents) > 1):
1027- kept_generation = generation
1028- self._nodes_to_keep_annotations.add(key)
1029+ if compression_parent in self._num_compression_children:
1030+ self._num_compression_children[compression_parent] += 1
1031+ else:
1032+ self._num_compression_children[compression_parent] = 1
1033
1034 missing_versions = this_iteration.difference(build_details.keys())
1035- self._ghosts.update(missing_versions)
1036- for missing_version in missing_versions:
1037- # add a key, no parents
1038- self._revision_id_graph[missing_version] = ()
1039- pending.discard(missing_version) # don't look for it
1040- if self._ghosts.intersection(self._compression_children):
1041- raise KnitCorrupt(
1042- "We cannot have nodes which have a ghost compression parent:\n"
1043- "ghosts: %r\n"
1044- "compression children: %r"
1045- % (self._ghosts, self._compression_children))
1046- # Cleanout anything that depends on a ghost so that we don't wait for
1047- # the ghost to show up
1048- for node in self._ghosts:
1049- if node in self._annotate_children:
1050- # We won't be building this node
1051- del self._annotate_children[node]
1052+ if missing_versions:
1053+ for key in missing_versions:
1054+ if key in self._parent_map and key in self._text_cache:
1055+ # We already have this text ready, we just need to
1056+ # yield it later so we get it annotated
1057+ ann_keys.add(key)
1058+ parent_keys = self._parent_map[key]
1059+ for parent_key in parent_keys:
1060+ if parent_key in self._num_needed_children:
1061+ self._num_needed_children[parent_key] += 1
1062+ else:
1063+ self._num_needed_children[parent_key] = 1
1064+ pending.update([p for p in parent_keys
1065+ if p not in self._all_build_details])
1066+ else:
1067+ raise errors.RevisionNotPresent(key, self._vf)
1068 # Generally we will want to read the records in reverse order, because
1069 # we find the parent nodes after the children
1070 records.reverse()
1071- return records
1072-
1073- def _annotate_records(self, records):
1074- """Build the annotations for the listed records."""
1075+ return records, ann_keys
1076+
1077+ def _get_needed_texts(self, key, pb=None):
1078+ # if True or len(self._vf._fallback_vfs) > 0:
1079+ if len(self._vf._fallback_vfs) > 0:
1080+ # If we have fallbacks, go to the generic path
1081+ for v in annotate.Annotator._get_needed_texts(self, key, pb=pb):
1082+ yield v
1083+ return
1084+ while True:
1085+ try:
1086+ records, ann_keys = self._get_build_graph(key)
1087+ for idx, (sub_key, text, num_lines) in enumerate(
1088+ self._extract_texts(records)):
1089+ if pb is not None:
1090+ pb.update('annotating', idx, len(records))
1091+ yield sub_key, text, num_lines
1092+ for sub_key in ann_keys:
1093+ text = self._text_cache[sub_key]
1094+ num_lines = len(text) # bad assumption
1095+ yield sub_key, text, num_lines
1096+ return
1097+ except errors.RetryWithNewPacks, e:
1098+ self._vf._access.reload_or_raise(e)
1099+ # The cached build_details are no longer valid
1100+ self._all_build_details.clear()
1101+
1102+ def _cache_delta_blocks(self, key, compression_parent, delta, lines):
1103+ parent_lines = self._text_cache[compression_parent]
1104+ blocks = list(KnitContent.get_line_delta_blocks(delta, parent_lines, lines))
1105+ self._matching_blocks[(key, compression_parent)] = blocks
1106+
1107+ def _expand_record(self, key, parent_keys, compression_parent, record,
1108+ record_details):
1109+ delta = None
1110+ if compression_parent:
1111+ if compression_parent not in self._content_objects:
1112+ # Waiting for the parent
1113+ self._pending_deltas.setdefault(compression_parent, []).append(
1114+ (key, parent_keys, record, record_details))
1115+ return None
1116+ # We have the basis parent, so expand the delta
1117+ num = self._num_compression_children[compression_parent]
1118+ num -= 1
1119+ if num == 0:
1120+ base_content = self._content_objects.pop(compression_parent)
1121+ self._num_compression_children.pop(compression_parent)
1122+ else:
1123+ self._num_compression_children[compression_parent] = num
1124+ base_content = self._content_objects[compression_parent]
1125+ # It is tempting to want to copy_base_content=False for the last
1126+ # child object. However, whenever noeol=False,
1127+ # self._text_cache[parent_key] is content._lines. So mutating it
1128+ # gives very bad results.
1129+ # The alternative is to copy the lines into text cache, but then we
1130+ # are copying anyway, so just do it here.
1131+ content, delta = self._vf._factory.parse_record(
1132+ key, record, record_details, base_content,
1133+ copy_base_content=True)
1134+ else:
1135+ # Fulltext record
1136+ content, _ = self._vf._factory.parse_record(
1137+ key, record, record_details, None)
1138+ if self._num_compression_children.get(key, 0) > 0:
1139+ self._content_objects[key] = content
1140+ lines = content.text()
1141+ self._text_cache[key] = lines
1142+ if delta is not None:
1143+ self._cache_delta_blocks(key, compression_parent, delta, lines)
1144+ return lines
1145+
1146+ def _get_parent_annotations_and_matches(self, key, text, parent_key):
1147+ """Get the list of annotations for the parent, and the matching lines.
1148+
1149+ :param text: The opaque value given by _get_needed_texts
1150+ :param parent_key: The key for the parent text
1151+ :return: (parent_annotations, matching_blocks)
1152+ parent_annotations is a list as long as the number of lines in
1153+ parent
1154+ matching_blocks is a list of (parent_idx, text_idx, len) tuples
1155+ indicating which lines match between the two texts
1156+ """
1157+ block_key = (key, parent_key)
1158+ if block_key in self._matching_blocks:
1159+ blocks = self._matching_blocks.pop(block_key)
1160+ parent_annotations = self._annotations_cache[parent_key]
1161+ return parent_annotations, blocks
1162+ return annotate.Annotator._get_parent_annotations_and_matches(self,
1163+ key, text, parent_key)
1164+
1165+ def _process_pending(self, key):
1166+ """The content for 'key' was just processed.
1167+
1168+ Determine if there is any more pending work to be processed.
1169+ """
1170+ to_return = []
1171+ if key in self._pending_deltas:
1172+ compression_parent = key
1173+ children = self._pending_deltas.pop(key)
1174+ for child_key, parent_keys, record, record_details in children:
1175+ lines = self._expand_record(child_key, parent_keys,
1176+ compression_parent,
1177+ record, record_details)
1178+ if self._check_ready_for_annotations(child_key, parent_keys):
1179+ to_return.append(child_key)
1180+ # Also check any children that are waiting for this parent to
1181+ # become ready for annotation
1182+ if key in self._pending_annotation:
1183+ children = self._pending_annotation.pop(key)
1184+ to_return.extend([c for c, p_keys in children
1185+ if self._check_ready_for_annotations(c, p_keys)])
1186+ return to_return
1187+
1188+ def _check_ready_for_annotations(self, key, parent_keys):
1189+ """Return True if this text is ready to be yielded.
1190+
1191+ Otherwise, this will return False, and queue the text into
1192+ self._pending_annotation
1193+ """
1194+ for parent_key in parent_keys:
1195+ if parent_key not in self._annotations_cache:
1196+ # still waiting on at least one parent text, so queue it up
1197+ # Note that if there are multiple parents, we need to wait
1198+ # for all of them.
1199+ self._pending_annotation.setdefault(parent_key,
1200+ []).append((key, parent_keys))
1201+ return False
1202+ return True
1203+
1204+ def _extract_texts(self, records):
1205+ """Extract the various texts needed based on records"""
1206 # We iterate in the order read, rather than a strict order requested
1207 # However, process what we can, and put off to the side things that
1208 # still need parents, cleaning them up when those parents are
1209 # processed.
1210- for (rev_id, record,
1211- digest) in self._knit._read_records_iter(records):
1212- if rev_id in self._annotated_lines:
1213+ # Basic data flow:
1214+ # 1) As 'records' are read, see if we can expand these records into
1215+ # Content objects (and thus lines)
1216+ # 2) If a given line-delta is waiting on its compression parent, it
1217+ # gets queued up into self._pending_deltas, otherwise we expand
1218+ # it, and put it into self._text_cache and self._content_objects
1219+ # 3) If we expanded the text, we will then check to see if all
1220+ # parents have also been processed. If so, this text gets yielded,
1221+ # else this record gets set aside into pending_annotation
1222+ # 4) Further, if we expanded the text in (2), we will then check to
1223+ # see if there are any children in self._pending_deltas waiting to
1224+ # also be processed. If so, we go back to (2) for those
1225+ # 5) Further again, if we yielded the text, we can then check if that
1226+ # 'unlocks' any of the texts in pending_annotations, which should
1227+ # then get yielded as well
1228+ # Note that both steps 4 and 5 are 'recursive' in that unlocking one
1229+ # compression child could unlock yet another, and yielding a fulltext
1230+ # will also 'unlock' the children that are waiting on that annotation.
1231+ # (Though also, unlocking 1 parent's fulltext, does not unlock a child
1232+ # if other parents are also waiting.)
1233+ # We want to yield content before expanding child content objects, so
1234+ # that we know when we can re-use the content lines, and the annotation
1235+ # code can know when it can stop caching fulltexts, as well.
1236+
1237+ # Children that are missing their compression parent
1238+ pending_deltas = {}
1239+ for (key, record, digest) in self._vf._read_records_iter(records):
1240+ # ghosts?
1241+ details = self._all_build_details[key]
1242+ (_, compression_parent, parent_keys, record_details) = details
1243+ lines = self._expand_record(key, parent_keys, compression_parent,
1244+ record, record_details)
1245+ if lines is None:
1246+ # Waiting on its compression parent; _expand_record queued
1247› 1246+ # it as a pending delta
1247 continue
1248- parent_ids = self._revision_id_graph[rev_id]
1249- parent_ids = [p for p in parent_ids if p not in self._ghosts]
1250- details = self._all_build_details[rev_id]
1251- (index_memo, compression_parent, parents,
1252- record_details) = details
1253- nodes_to_annotate = []
1254- # TODO: Remove the punning between compression parents, and
1255- # parent_ids, we should be able to do this without assuming
1256- # the build order
1257- if len(parent_ids) == 0:
1258- # There are no parents for this node, so just add it
1259- # TODO: This probably needs to be decoupled
1260- fulltext_content, delta = self._knit._factory.parse_record(
1261- rev_id, record, record_details, None)
1262- fulltext = self._add_fulltext_content(rev_id, fulltext_content)
1263- nodes_to_annotate.extend(self._add_annotation(rev_id, fulltext,
1264- parent_ids, left_matching_blocks=None))
1265- else:
1266- child = (rev_id, parent_ids, record)
1267- # Check if all the parents are present
1268- self._check_parents(child, nodes_to_annotate)
1269- while nodes_to_annotate:
1270- # Should we use a queue here instead of a stack?
1271- (rev_id, parent_ids, record) = nodes_to_annotate.pop()
1272- (index_memo, compression_parent, parents,
1273- record_details) = self._all_build_details[rev_id]
1274- blocks = None
1275- if compression_parent is not None:
1276- comp_children = self._compression_children[compression_parent]
1277- if rev_id not in comp_children:
1278- raise AssertionError("%r not in compression children %r"
1279- % (rev_id, comp_children))
1280- # If there is only 1 child, it is safe to reuse this
1281- # content
1282- reuse_content = (len(comp_children) == 1
1283- and compression_parent not in
1284- self._nodes_to_keep_annotations)
1285- if reuse_content:
1286- # Remove it from the cache since it will be changing
1287- parent_fulltext_content = self._fulltext_contents.pop(compression_parent)
1288- # Make sure to copy the fulltext since it might be
1289- # modified
1290- parent_fulltext = list(parent_fulltext_content.text())
1291- else:
1292- parent_fulltext_content = self._fulltext_contents[compression_parent]
1293- parent_fulltext = parent_fulltext_content.text()
1294- comp_children.remove(rev_id)
1295- fulltext_content, delta = self._knit._factory.parse_record(
1296- rev_id, record, record_details,
1297- parent_fulltext_content,
1298- copy_base_content=(not reuse_content))
1299- fulltext = self._add_fulltext_content(rev_id,
1300- fulltext_content)
1301- if compression_parent == parent_ids[0]:
1302- # the compression_parent is the left parent, so we can
1303- # re-use the delta
1304- blocks = KnitContent.get_line_delta_blocks(delta,
1305- parent_fulltext, fulltext)
1306- else:
1307- fulltext_content = self._knit._factory.parse_fulltext(
1308- record, rev_id)
1309- fulltext = self._add_fulltext_content(rev_id,
1310- fulltext_content)
1311- nodes_to_annotate.extend(
1312- self._add_annotation(rev_id, fulltext, parent_ids,
1313- left_matching_blocks=blocks))
1314-
1315- def _get_heads_provider(self):
1316- """Create a heads provider for resolving ancestry issues."""
1317- if self._heads_provider is not None:
1318- return self._heads_provider
1319- self._heads_provider = _mod_graph.KnownGraph(self._revision_id_graph)
1320- return self._heads_provider
1321-
1322- def annotate(self, key):
1323- """Return the annotated fulltext at the given key.
1324-
1325- :param key: The key to annotate.
1326- """
1327- if len(self._knit._fallback_vfs) > 0:
1328- # stacked knits can't use the fast path at present.
1329- return self._simple_annotate(key)
1330- while True:
1331- try:
1332- records = self._get_build_graph(key)
1333- if key in self._ghosts:
1334- raise errors.RevisionNotPresent(key, self._knit)
1335- self._annotate_records(records)
1336- return self._annotated_lines[key]
1337- except errors.RetryWithNewPacks, e:
1338- self._knit._access.reload_or_raise(e)
1339- # The cached build_details are no longer valid
1340- self._all_build_details.clear()
1341-
1342- def _simple_annotate(self, key):
1343- """Return annotated fulltext, rediffing from the full texts.
1344-
1345- This is slow but makes no assumptions about the repository
1346- being able to produce line deltas.
1347- """
1348- # TODO: this code generates a parent maps of present ancestors; it
1349- # could be split out into a separate method
1350- # -- mbp and robertc 20080704
1351- graph = _mod_graph.Graph(self._knit)
1352- parent_map = dict((k, v) for k, v in graph.iter_ancestry([key])
1353- if v is not None)
1354- if not parent_map:
1355- raise errors.RevisionNotPresent(key, self)
1356- keys = parent_map.keys()
1357- heads_provider = _mod_graph.KnownGraph(parent_map)
1358- parent_cache = {}
1359- reannotate = annotate.reannotate
1360- for record in self._knit.get_record_stream(keys, 'topological', True):
1361- key = record.key
1362- fulltext = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
1363- parents = parent_map[key]
1364- if parents is not None:
1365- parent_lines = [parent_cache[parent] for parent in parent_map[key]]
1366- else:
1367- parent_lines = []
1368- parent_cache[key] = list(
1369- reannotate(parent_lines, fulltext, key, None, heads_provider))
1370- try:
1371- return parent_cache[key]
1372- except KeyError, e:
1373- raise errors.RevisionNotPresent(key, self._knit)
1374-
1375+ # At this point, we may be able to yield this content, if all
1376+ # parents are also finished
1377+ yield_this_text = self._check_ready_for_annotations(key,
1378+ parent_keys)
1379+ if yield_this_text:
1380+ # All parents present
1381+ yield key, lines, len(lines)
1382+ to_process = self._process_pending(key)
1383+ while to_process:
1384+ this_process = to_process
1385+ to_process = []
1386+ for key in this_process:
1387+ lines = self._text_cache[key]
1388+ yield key, lines, len(lines)
1389+ to_process.extend(self._process_pending(key))
1390
1391 try:
1392 from bzrlib._knit_load_data_c import _load_data_c as _load_data
1393
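The five-step data flow described in the `_extract_texts` comments above (expand a record once its compression parent is available, yield a text once all of its parents have been yielded, and let each yield unlock waiting children) is at heart a dependency-ordered traversal. A minimal, hypothetical sketch of just that ordering, with plain dicts standing in for the real record and content machinery:

```python
# Simplified model of the _extract_texts flow: nothing here is bzrlib API,
# only an illustration of the pending/unlock bookkeeping the comments describe.

def extract_in_dependency_order(texts, parent_map):
    """Yield (key, text) pairs so every key follows all of its parents.

    texts: dict key -> text; parent_map: dict key -> list of parent keys.
    """
    # Step 3 analogue: how many parents each key is still waiting on
    needed = dict((k, len(parent_map[k])) for k in texts)
    children = {}
    for key, parent_keys in parent_map.items():
        for p in parent_keys:
            children.setdefault(p, []).append(key)
    ready = [k for k, n in needed.items() if n == 0]
    while ready:
        key = ready.pop()
        yield key, texts[key]
        # Step 5 analogue: yielding a text may 'unlock' waiting children
        for child in children.get(key, []):
            needed[child] -= 1
            if needed[child] == 0:
                ready.append(child)
```

As the comments note, one yield can unlock several children, and a child with multiple parents stays pending until the last parent is done.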
1394=== modified file 'bzrlib/revisiontree.py'
1395--- bzrlib/revisiontree.py 2009-06-17 03:41:33 +0000
1396+++ bzrlib/revisiontree.py 2009-07-08 23:35:27 +0000
1397@@ -87,7 +87,8 @@
1398 default_revision=revision.CURRENT_REVISION):
1399 """See Tree.annotate_iter"""
1400 text_key = (file_id, self.inventory[file_id].revision)
1401- annotations = self._repository.texts.annotate(text_key)
1402+ annotator = self._repository.texts.get_annotator()
1403+ annotations = annotator.annotate_flat(text_key)
1404 return [(key[-1], line) for key, line in annotations]
1405
1406 def get_file_size(self, file_id):
1407
1408=== modified file 'bzrlib/tests/__init__.py'
1409--- bzrlib/tests/__init__.py 2009-07-02 11:37:38 +0000
1410+++ bzrlib/tests/__init__.py 2009-07-08 23:35:28 +0000
1411@@ -3344,6 +3344,7 @@
1412 'bzrlib.tests.per_repository',
1413 'bzrlib.tests.per_repository_chk',
1414 'bzrlib.tests.per_repository_reference',
1415+ 'bzrlib.tests.test__annotator',
1416 'bzrlib.tests.test__chk_map',
1417 'bzrlib.tests.test__dirstate_helpers',
1418 'bzrlib.tests.test__groupcompress',
1419
1420=== added file 'bzrlib/tests/test__annotator.py'
1421--- bzrlib/tests/test__annotator.py 1970-01-01 00:00:00 +0000
1422+++ bzrlib/tests/test__annotator.py 2009-07-08 23:35:28 +0000
1423@@ -0,0 +1,403 @@
1424+# Copyright (C) 2009 Canonical Ltd
1425+#
1426+# This program is free software; you can redistribute it and/or modify
1427+# it under the terms of the GNU General Public License as published by
1428+# the Free Software Foundation; either version 2 of the License, or
1429+# (at your option) any later version.
1430+#
1431+# This program is distributed in the hope that it will be useful,
1432+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1433+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1434+# GNU General Public License for more details.
1435+#
1436+# You should have received a copy of the GNU General Public License
1437+# along with this program; if not, write to the Free Software
1438+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1439+
1440+"""Tests for Annotators."""
1441+
1442+from bzrlib import (
1443+ annotate,
1444+ _annotator_py,
1445+ errors,
1446+ knit,
1447+ revision,
1448+ tests,
1449+ )
1450+
1451+
1452+def load_tests(standard_tests, module, loader):
1453+ """Parameterize tests for all versions of the Annotator."""
1454+ scenarios = [
1455+ ('python', {'module': _annotator_py}),
1456+ ]
1457+ suite = loader.suiteClass()
1458+ if CompiledAnnotator.available():
1459+ from bzrlib import _annotator_pyx
1460+ scenarios.append(('C', {'module': _annotator_pyx}))
1461+ else:
1462+ # the compiled module isn't available, so we add a failing test
1463+ class FailWithoutFeature(tests.TestCase):
1464+ def test_fail(self):
1465+ self.requireFeature(CompiledAnnotator)
1466+ suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature))
1467+ result = tests.multiply_tests(standard_tests, scenarios, suite)
1468+ return result
1469+
1470+
1471+class _CompiledAnnotator(tests.Feature):
1472+
1473+ def _probe(self):
1474+ try:
1475+ import bzrlib._annotator_pyx
1476+ except ImportError:
1477+ return False
1478+ return True
1479+
1480+ def feature_name(self):
1481+ return 'bzrlib._annotator_pyx'
1482+
1483+CompiledAnnotator = _CompiledAnnotator()
1484+
1485+
1486+class TestAnnotator(tests.TestCaseWithMemoryTransport):
1487+
1488+ module = None # Set by load_tests
1489+
1490+ fa_key = ('f-id', 'a-id')
1491+ fb_key = ('f-id', 'b-id')
1492+ fc_key = ('f-id', 'c-id')
1493+ fd_key = ('f-id', 'd-id')
1494+ fe_key = ('f-id', 'e-id')
1495+ ff_key = ('f-id', 'f-id')
1496+
1497+ def make_no_graph_texts(self):
1498+ factory = knit.make_pack_factory(False, False, 2)
1499+ self.vf = factory(self.get_transport())
1500+ self.ann = self.module.Annotator(self.vf)
1501+ self.vf.add_lines(self.fa_key, (), ['simple\n', 'content\n'])
1502+ self.vf.add_lines(self.fb_key, (), ['simple\n', 'new content\n'])
1503+
1504+ def make_simple_text(self):
1505+ # TODO: all we really need is a VersionedFile instance, we'd like to
1506+ # avoid creating all the intermediate stuff
1507+ factory = knit.make_pack_factory(True, True, 2)
1508+ self.vf = factory(self.get_transport())
1509+ # This assumes nothing special happens during __init__, which may be
1510+ # valid
1511+ self.ann = self.module.Annotator(self.vf)
1512+ # A 'simple|content|'
1513+ # |
1514+ # B 'simple|new content|'
1515+ self.vf.add_lines(self.fa_key, [], ['simple\n', 'content\n'])
1516+ self.vf.add_lines(self.fb_key, [self.fa_key],
1517+ ['simple\n', 'new content\n'])
1518+
1519+ def make_merge_text(self):
1520+ self.make_simple_text()
1521+ # A 'simple|content|'
1522+ # |\
1523+ # B | 'simple|new content|'
1524+ # | |
1525+ # | C 'simple|from c|content|'
1526+ # |/
1527+ # D 'simple|from c|new content|introduced in merge|'
1528+ self.vf.add_lines(self.fc_key, [self.fa_key],
1529+ ['simple\n', 'from c\n', 'content\n'])
1530+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1531+ ['simple\n', 'from c\n', 'new content\n',
1532+ 'introduced in merge\n'])
1533+
1534+ def make_common_merge_text(self):
1535+ """Both sides of the merge will have introduced a line."""
1536+ self.make_simple_text()
1537+ # A 'simple|content|'
1538+ # |\
1539+ # B | 'simple|new content|'
1540+ # | |
1541+ # | C 'simple|new content|'
1542+ # |/
1543+ # D 'simple|new content|'
1544+ self.vf.add_lines(self.fc_key, [self.fa_key],
1545+ ['simple\n', 'new content\n'])
1546+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1547+ ['simple\n', 'new content\n'])
1548+
1549+ def make_many_way_common_merge_text(self):
1550+ self.make_simple_text()
1551+ # A-. 'simple|content|'
1552+ # |\ \
1553+ # B | | 'simple|new content|'
1554+ # | | |
1555+ # | C | 'simple|new content|'
1556+ # |/ |
1557+ # D | 'simple|new content|'
1558+ # | |
1559+ # | E 'simple|new content|'
1560+ # | /
1561+ # F-' 'simple|new content|'
1562+ self.vf.add_lines(self.fc_key, [self.fa_key],
1563+ ['simple\n', 'new content\n'])
1564+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1565+ ['simple\n', 'new content\n'])
1566+ self.vf.add_lines(self.fe_key, [self.fa_key],
1567+ ['simple\n', 'new content\n'])
1568+ self.vf.add_lines(self.ff_key, [self.fd_key, self.fe_key],
1569+ ['simple\n', 'new content\n'])
1570+
1571+ def make_merge_and_restored_text(self):
1572+ self.make_simple_text()
1573+ # A 'simple|content|'
1574+ # |\
1575+ # B | 'simple|new content|'
1576+ # | |
1577+ # C | 'simple|content|' # reverted to A
1578+ # \|
1579+ # D 'simple|content|'
1580+ # c reverts back to 'a' for the new content line
1581+ self.vf.add_lines(self.fc_key, [self.fb_key],
1582+ ['simple\n', 'content\n'])
1583+ # d merges 'a' and 'c', and finds that both claim to have last
1584› 1583+ # modified the line
1584+ self.vf.add_lines(self.fd_key, [self.fa_key, self.fc_key],
1585+ ['simple\n', 'content\n'])
1586+
1587+ def assertAnnotateEqual(self, expected_annotation, key, exp_text=None):
1588+ annotation, lines = self.ann.annotate(key)
1589+ self.assertEqual(expected_annotation, annotation)
1590+ if exp_text is None:
1591+ record = self.vf.get_record_stream([key], 'unordered', True).next()
1592+ exp_text = record.get_bytes_as('fulltext')
1593+ self.assertEqualDiff(exp_text, ''.join(lines))
1594+
1595+ def test_annotate_missing(self):
1596+ self.make_simple_text()
1597+ self.assertRaises(errors.RevisionNotPresent,
1598+ self.ann.annotate, ('not', 'present'))
1599+
1600+ def test_annotate_simple(self):
1601+ self.make_simple_text()
1602+ self.assertAnnotateEqual([(self.fa_key,)]*2, self.fa_key)
1603+ self.assertAnnotateEqual([(self.fa_key,), (self.fb_key,)], self.fb_key)
1604+
1605+ def test_annotate_merge_text(self):
1606+ self.make_merge_text()
1607+ self.assertAnnotateEqual([(self.fa_key,), (self.fc_key,),
1608+ (self.fb_key,), (self.fd_key,)],
1609+ self.fd_key)
1610+
1611+ def test_annotate_common_merge_text(self):
1612+ self.make_common_merge_text()
1613+ self.assertAnnotateEqual([(self.fa_key,), (self.fb_key, self.fc_key)],
1614+ self.fd_key)
1615+
1616+ def test_annotate_many_way_common_merge_text(self):
1617+ self.make_many_way_common_merge_text()
1618+ self.assertAnnotateEqual([(self.fa_key,),
1619+ (self.fb_key, self.fc_key, self.fe_key)],
1620+ self.ff_key)
1621+
1622+ def test_annotate_merge_and_restored(self):
1623+ self.make_merge_and_restored_text()
1624+ self.assertAnnotateEqual([(self.fa_key,), (self.fa_key, self.fc_key)],
1625+ self.fd_key)
1626+
1627+ def test_annotate_flat_simple(self):
1628+ self.make_simple_text()
1629+ self.assertEqual([(self.fa_key, 'simple\n'),
1630+ (self.fa_key, 'content\n'),
1631+ ], self.ann.annotate_flat(self.fa_key))
1632+ self.assertEqual([(self.fa_key, 'simple\n'),
1633+ (self.fb_key, 'new content\n'),
1634+ ], self.ann.annotate_flat(self.fb_key))
1635+
1636+ def test_annotate_flat_merge_and_restored_text(self):
1637+ self.make_merge_and_restored_text()
1638+ # fc is a simple dominator of fa
1639+ self.assertEqual([(self.fa_key, 'simple\n'),
1640+ (self.fc_key, 'content\n'),
1641+ ], self.ann.annotate_flat(self.fd_key))
1642+
1643+ def test_annotate_flat_common_merge_text(self):
1644+ self.make_common_merge_text()
1645+ # there is no common ancestor to prefer, so we pick the
1646+ # lexicographically lowest key, and 'b-id' sorts before 'c-id'
1647+ self.assertEqual([(self.fa_key, 'simple\n'),
1648+ (self.fb_key, 'new content\n'),
1649+ ], self.ann.annotate_flat(self.fd_key))
1650+
1651+ def test_annotate_flat_many_way_common_merge_text(self):
1652+ self.make_many_way_common_merge_text()
1653+ self.assertEqual([(self.fa_key, 'simple\n'),
1654+ (self.fb_key, 'new content\n')],
1655+ self.ann.annotate_flat(self.ff_key))
1656+
1657+ def test_annotate_flat_respects_break_ann_tie(self):
1658+ tiebreaker = annotate._break_annotation_tie
1659+ try:
1660+ calls = []
1661+ def custom_tiebreaker(annotated_lines):
1662+ self.assertEqual(2, len(annotated_lines))
1663+ left = annotated_lines[0]
1664+ self.assertEqual(2, len(left))
1665+ self.assertEqual('new content\n', left[1])
1666+ right = annotated_lines[1]
1667+ self.assertEqual(2, len(right))
1668+ self.assertEqual('new content\n', right[1])
1669+ calls.append((left[0], right[0]))
1670+ # Our custom tiebreaker takes the *largest* value, rather than
1671+ # the *smallest* value
1672+ if left[0] < right[0]:
1673+ return right
1674+ else:
1675+ return left
1676+ annotate._break_annotation_tie = custom_tiebreaker
1677+ self.make_many_way_common_merge_text()
1678+ self.assertEqual([(self.fa_key, 'simple\n'),
1679+ (self.fe_key, 'new content\n')],
1680+ self.ann.annotate_flat(self.ff_key))
1681+ self.assertEqual([(self.fe_key, self.fc_key),
1682+ (self.fe_key, self.fb_key)], calls)
1683+ finally:
1684+ annotate._break_annotation_tie = tiebreaker
1685+
1686+
1687+ def test_needed_keys_simple(self):
1688+ self.make_simple_text()
1689+ keys, ann_keys = self.ann._get_needed_keys(self.fb_key)
1690+ self.assertEqual([self.fa_key, self.fb_key], sorted(keys))
1691+ self.assertEqual({self.fa_key: 1, self.fb_key: 1},
1692+ self.ann._num_needed_children)
1693+ self.assertEqual(set(), ann_keys)
1694+
1695+ def test_needed_keys_many(self):
1696+ self.make_many_way_common_merge_text()
1697+ keys, ann_keys = self.ann._get_needed_keys(self.ff_key)
1698+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1699+ self.fd_key, self.fe_key, self.ff_key,
1700+ ], sorted(keys))
1701+ self.assertEqual({self.fa_key: 3,
1702+ self.fb_key: 1,
1703+ self.fc_key: 1,
1704+ self.fd_key: 1,
1705+ self.fe_key: 1,
1706+ self.ff_key: 1,
1707+ }, self.ann._num_needed_children)
1708+ self.assertEqual(set(), ann_keys)
1709+
1710+ def test_needed_keys_with_special_text(self):
1711+ self.make_many_way_common_merge_text()
1712+ spec_key = ('f-id', revision.CURRENT_REVISION)
1713+ spec_text = 'simple\nnew content\nlocally modified\n'
1714+ self.ann.add_special_text(spec_key, [self.fd_key, self.fe_key],
1715+ spec_text)
1716+ keys, ann_keys = self.ann._get_needed_keys(spec_key)
1717+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1718+ self.fd_key, self.fe_key,
1719+ ], sorted(keys))
1720+ self.assertEqual([spec_key], sorted(ann_keys))
1721+
1722+ def test_needed_keys_with_parent_texts(self):
1723+ self.make_many_way_common_merge_text()
1724+ # If 'D' and 'E' are already annotated, we don't need to extract all
1725+ # the texts
1726+ # D | 'simple|new content|'
1727+ # | |
1728+ # | E 'simple|new content|'
1729+ # | /
1730+ # F-' 'simple|new content|'
1731+ self.ann._parent_map[self.fd_key] = (self.fb_key, self.fc_key)
1732+ self.ann._text_cache[self.fd_key] = ['simple\n', 'new content\n']
1733+ self.ann._annotations_cache[self.fd_key] = [
1734+ (self.fa_key,),
1735+ (self.fb_key, self.fc_key),
1736+ ]
1737+ self.ann._parent_map[self.fe_key] = (self.fa_key,)
1738+ self.ann._text_cache[self.fe_key] = ['simple\n', 'new content\n']
1739+ self.ann._annotations_cache[self.fe_key] = [
1740+ (self.fa_key,),
1741+ (self.fe_key,),
1742+ ]
1743+ keys, ann_keys = self.ann._get_needed_keys(self.ff_key)
1744+ self.assertEqual([self.ff_key], sorted(keys))
1745+ self.assertEqual({self.fd_key: 1,
1746+ self.fe_key: 1,
1747+ self.ff_key: 1,
1748+ }, self.ann._num_needed_children)
1749+ self.assertEqual([], sorted(ann_keys))
1750+
1751+ def test_record_annotation_removes_texts(self):
1752+ self.make_many_way_common_merge_text()
1753+ # Populate the caches
1754+ for x in self.ann._get_needed_texts(self.ff_key):
1755+ pass
1756+ self.assertEqual({self.fa_key: 3,
1757+ self.fb_key: 1,
1758+ self.fc_key: 1,
1759+ self.fd_key: 1,
1760+ self.fe_key: 1,
1761+ self.ff_key: 1,
1762+ }, self.ann._num_needed_children)
1763+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1764+ self.fd_key, self.fe_key, self.ff_key,
1765+ ], sorted(self.ann._text_cache.keys()))
1766+ self.ann._record_annotation(self.fa_key, [], [])
1767+ self.ann._record_annotation(self.fb_key, [self.fa_key], [])
1768+ self.assertEqual({self.fa_key: 2,
1769+ self.fb_key: 1,
1770+ self.fc_key: 1,
1771+ self.fd_key: 1,
1772+ self.fe_key: 1,
1773+ self.ff_key: 1,
1774+ }, self.ann._num_needed_children)
1775+ self.assertTrue(self.fa_key in self.ann._text_cache)
1776+ self.assertTrue(self.fa_key in self.ann._annotations_cache)
1777+ self.ann._record_annotation(self.fc_key, [self.fa_key], [])
1778+ self.ann._record_annotation(self.fd_key, [self.fb_key, self.fc_key], [])
1779+ self.assertEqual({self.fa_key: 1,
1780+ self.fb_key: 0,
1781+ self.fc_key: 0,
1782+ self.fd_key: 1,
1783+ self.fe_key: 1,
1784+ self.ff_key: 1,
1785+ }, self.ann._num_needed_children)
1786+ self.assertTrue(self.fa_key in self.ann._text_cache)
1787+ self.assertTrue(self.fa_key in self.ann._annotations_cache)
1788+ self.assertFalse(self.fb_key in self.ann._text_cache)
1789+ self.assertFalse(self.fb_key in self.ann._annotations_cache)
1790+ self.assertFalse(self.fc_key in self.ann._text_cache)
1791+ self.assertFalse(self.fc_key in self.ann._annotations_cache)
1792+
1793+ def test_annotate_special_text(self):
1794+ # Things like WT and PreviewTree want to annotate an arbitrary text
1795+ # ('current:') so we need a way to add that to the group of files to be
1796+ # annotated.
1797+ self.make_many_way_common_merge_text()
1798+ # A-. 'simple|content|'
1799+ # |\ \
1800+ # B | | 'simple|new content|'
1801+ # | | |
1802+ # | C | 'simple|new content|'
1803+ # |/ |
1804+ # D | 'simple|new content|'
1805+ # | |
1806+ # | E 'simple|new content|'
1807+ # | /
1808+ # SPEC 'simple|new content|locally modified|'
1809+ spec_key = ('f-id', revision.CURRENT_REVISION)
1810+ spec_text = 'simple\nnew content\nlocally modified\n'
1811+ self.ann.add_special_text(spec_key, [self.fd_key, self.fe_key],
1812+ spec_text)
1813+ self.assertAnnotateEqual([(self.fa_key,),
1814+ (self.fb_key, self.fc_key, self.fe_key),
1815+ (spec_key,),
1816+ ], spec_key,
1817+ exp_text=spec_text)
1818+
1819+ def test_no_graph(self):
1820+ self.make_no_graph_texts()
1821+ self.assertAnnotateEqual([(self.fa_key,),
1822+ (self.fa_key,),
1823+ ], self.fa_key)
1824+ self.assertAnnotateEqual([(self.fb_key,),
1825+ (self.fb_key,),
1826+ ], self.fb_key)
1827
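The `annotate_flat` tests above depend on `annotate()` producing a tuple of candidate keys per line, which `annotate_flat()` must reduce to a single key, breaking ties via `_break_annotation_tie` (lexicographically smallest by default, as the common-merge tests check). A simplified, hypothetical sketch of that flattening step; the real Annotator also consults a heads provider before falling back to the tiebreaker, which is omitted here:

```python
# Illustration only: a stand-in for the annotate_flat() reduction, not the
# bzrlib implementation. Each entry in annotations is a tuple of keys that
# all claim a line; ties default to the lexicographically smallest key.

def flatten_annotations(annotations, lines):
    result = []
    for anns, line in zip(annotations, lines):
        if len(anns) == 1:
            result.append((anns[0], line))
        else:
            # multiple heads claim this line; pick the smallest key
            result.append((min(anns), line))
    return result
```

This matches the test expectation that, with no dominating ancestor, 'b-id' wins over 'c-id' purely by sort order.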
1828=== modified file 'bzrlib/tests/test__known_graph.py'
1829--- bzrlib/tests/test__known_graph.py 2009-06-18 19:45:24 +0000
1830+++ bzrlib/tests/test__known_graph.py 2009-07-08 23:35:28 +0000
1831@@ -14,7 +14,7 @@
1832 # along with this program; if not, write to the Free Software
1833 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1834
1835-"""Tests for the python and pyrex extensions of groupcompress"""
1836+"""Tests for the python and pyrex extensions of KnownGraph"""
1837
1838 from bzrlib import (
1839 errors,
1840@@ -63,6 +63,16 @@
1841 CompiledKnownGraphFeature = _CompiledKnownGraphFeature()
1842
1843
1844+# a
1845+# |\
1846+# b |
1847+# | |
1848+# c |
1849+# \|
1850+# d
1851+alt_merge = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a', 'c']}
1852+
1853+
1854 class TestKnownGraph(tests.TestCase):
1855
1856 module = None # Set by load_tests
1857@@ -203,6 +213,10 @@
1858 self.assertEqual(set(['w', 'q']), graph.heads(['w', 's', 'q']))
1859 self.assertEqual(set(['z']), graph.heads(['s', 'z']))
1860
1861+ def test_heads_alt_merge(self):
1862+ graph = self.make_known_graph(alt_merge)
1863+ self.assertEqual(set(['c']), graph.heads(['a', 'c']))
1864+
1865 def test_heads_with_ghost(self):
1866 graph = self.make_known_graph(test_graph.with_ghost)
1867 self.assertEqual(set(['e', 'g']), graph.heads(['e', 'g']))
1868
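`test_heads_alt_merge` above checks the semantic that `heads()` drops any candidate that is an ancestor of another candidate. A naive, hypothetical reference implementation of that semantic (the real KnownGraph is much more efficient, but the answer should agree):

```python
# Reference semantic for heads(): among the candidate keys, keep only those
# that are not ancestors of some other candidate. Illustration only.

def heads(parent_map, keys):
    def ancestors(key):
        seen, todo = set(), list(parent_map[key])
        while todo:
            k = todo.pop()
            if k not in seen:
                seen.add(k)
                todo.extend(parent_map[k])
        return seen
    keys = set(keys)
    return set(k for k in keys
               if not any(k in ancestors(other) for other in keys - set([k])))
```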
1869=== modified file 'bzrlib/tests/test_annotate.py'
1870--- bzrlib/tests/test_annotate.py 2009-03-23 14:59:43 +0000
1871+++ bzrlib/tests/test_annotate.py 2009-07-08 23:35:28 +0000
1872@@ -176,38 +176,23 @@
1873 |
1874 rev-3
1875 """
1876-
1877- tree1 = self.make_branch_and_tree('tree1')
1878- self.build_tree_contents([('tree1/a', 'first\n')])
1879- tree1.add(['a'], ['a-id'])
1880- tree1.commit('a', rev_id='rev-1',
1881- committer="joe@foo.com",
1882- timestamp=1166046000.00, timezone=0)
1883-
1884- tree2 = tree1.bzrdir.sprout('tree2').open_workingtree()
1885-
1886- self.build_tree_contents([('tree1/a', 'first\nsecond\n')])
1887- tree1.commit('b', rev_id='rev-2',
1888- committer='joe@foo.com',
1889- timestamp=1166046001.00, timezone=0)
1890-
1891- self.build_tree_contents([('tree2/a', 'first\nthird\n')])
1892- tree2.commit('c', rev_id='rev-1_1_1',
1893- committer="barry@foo.com",
1894- timestamp=1166046002.00, timezone=0)
1895-
1896- num_conflicts = tree1.merge_from_branch(tree2.branch)
1897- self.assertEqual(1, num_conflicts)
1898-
1899- self.build_tree_contents([('tree1/a',
1900- 'first\nsecond\nthird\n')])
1901- tree1.set_conflicts(conflicts.ConflictList())
1902- tree1.commit('merge 2', rev_id='rev-3',
1903- committer='sal@foo.com',
1904- timestamp=1166046003.00, timezone=0)
1905- tree1.lock_read()
1906- self.addCleanup(tree1.unlock)
1907- return tree1, tree2
1908+ builder = self.make_branch_builder('branch')
1909+ builder.start_series()
1910+ self.addCleanup(builder.finish_series)
1911+ builder.build_snapshot('rev-1', None, [
1912+ ('add', ('', 'root-id', 'directory', None)),
1913+ ('add', ('a', 'a-id', 'file', 'first\n')),
1914+ ], timestamp=1166046000.00, committer="joe@foo.com")
1915+ builder.build_snapshot('rev-2', ['rev-1'], [
1916+ ('modify', ('a-id', 'first\nsecond\n')),
1917+ ], timestamp=1166046001.00, committer="joe@foo.com")
1918+ builder.build_snapshot('rev-1_1_1', ['rev-1'], [
1919+ ('modify', ('a-id', 'first\nthird\n')),
1920+ ], timestamp=1166046002.00, committer="barry@foo.com")
1921+ builder.build_snapshot('rev-3', ['rev-2', 'rev-1_1_1'], [
1922+ ('modify', ('a-id', 'first\nsecond\nthird\n')),
1923+ ], timestamp=1166046003.00, committer="sal@foo.com")
1924+ return builder
1925
1926 def create_deeply_merged_trees(self):
1927 """Create some trees with a more complex merge history.
1928@@ -232,69 +217,51 @@
1929 |
1930 rev-6
1931 """
1932- tree1, tree2 = self.create_merged_trees()
1933- tree1.unlock()
1934-
1935- tree3 = tree2.bzrdir.sprout('tree3').open_workingtree()
1936-
1937- tree2.commit('noop', rev_id='rev-1_1_2')
1938- self.assertEqual(0, tree1.merge_from_branch(tree2.branch))
1939- tree1.commit('noop merge', rev_id='rev-4')
1940-
1941- self.build_tree_contents([('tree3/a', 'first\nthird\nfourth\n')])
1942- tree3.commit('four', rev_id='rev-1_2_1',
1943- committer='jerry@foo.com',
1944- timestamp=1166046003.00, timezone=0)
1945-
1946- tree4 = tree3.bzrdir.sprout('tree4').open_workingtree()
1947-
1948- tree3.commit('noop', rev_id='rev-1_2_2',
1949- committer='jerry@foo.com',
1950- timestamp=1166046004.00, timezone=0)
1951- self.assertEqual(0, tree1.merge_from_branch(tree3.branch))
1952- tree1.commit('merge four', rev_id='rev-5')
1953-
1954- self.build_tree_contents([('tree4/a',
1955- 'first\nthird\nfourth\nfifth\nsixth\n')])
1956- tree4.commit('five and six', rev_id='rev-1_3_1',
1957- committer='george@foo.com',
1958- timestamp=1166046005.00, timezone=0)
1959- self.assertEqual(0, tree1.merge_from_branch(tree4.branch))
1960- tree1.commit('merge five and six', rev_id='rev-6')
1961- tree1.lock_read()
1962- return tree1
1963+ builder = self.create_merged_trees()
1964+ builder.build_snapshot('rev-1_1_2', ['rev-1_1_1'], [])
1965+ builder.build_snapshot('rev-4', ['rev-3', 'rev-1_1_2'], [])
1966+ builder.build_snapshot('rev-1_2_1', ['rev-1_1_1'], [
1967+ ('modify', ('a-id', 'first\nthird\nfourth\n')),
1968+ ], timestamp=1166046003.00, committer="jerry@foo.com")
1969+ builder.build_snapshot('rev-1_2_2', ['rev-1_2_1'], [],
1970+ timestamp=1166046004.00, committer="jerry@foo.com")
1971+ builder.build_snapshot('rev-5', ['rev-4', 'rev-1_2_2'], [
1972+ ('modify', ('a-id', 'first\nsecond\nthird\nfourth\n')),
1973+ ], timestamp=1166046004.00, committer="jerry@foo.com")
1974+ builder.build_snapshot('rev-1_3_1', ['rev-1_2_1'], [
1975+ ('modify', ('a-id', 'first\nthird\nfourth\nfifth\nsixth\n')),
1976+ ], timestamp=1166046005.00, committer="george@foo.com")
1977+ builder.build_snapshot('rev-6', ['rev-5', 'rev-1_3_1'], [
1978+ ('modify', ('a-id',
1979+ 'first\nsecond\nthird\nfourth\nfifth\nsixth\n')),
1980+ ])
1981+ return builder
1982
1983 def create_duplicate_lines_tree(self):
1984- tree1 = self.make_branch_and_tree('tree1')
1985+ builder = self.make_branch_builder('branch')
1986+ builder.start_series()
1987+ self.addCleanup(builder.finish_series)
1988 base_text = ''.join(l for r, l in duplicate_base)
1989 a_text = ''.join(l for r, l in duplicate_A)
1990 b_text = ''.join(l for r, l in duplicate_B)
1991 c_text = ''.join(l for r, l in duplicate_C)
1992 d_text = ''.join(l for r, l in duplicate_D)
1993 e_text = ''.join(l for r, l in duplicate_E)
1994- self.build_tree_contents([('tree1/file', base_text)])
1995- tree1.add(['file'], ['file-id'])
1996- tree1.commit('base', rev_id='rev-base')
1997- tree2 = tree1.bzrdir.sprout('tree2').open_workingtree()
1998-
1999- self.build_tree_contents([('tree1/file', a_text),
2000- ('tree2/file', b_text)])
2001- tree1.commit('A', rev_id='rev-A')
2002- tree2.commit('B', rev_id='rev-B')
2003-
2004- tree2.merge_from_branch(tree1.branch)
2005- conflicts.resolve(tree2, None) # Resolve the conflicts
2006- self.build_tree_contents([('tree2/file', d_text)])
2007- tree2.commit('D', rev_id='rev-D')
2008-
2009- self.build_tree_contents([('tree1/file', c_text)])
2010- tree1.commit('C', rev_id='rev-C')
2011-
2012- tree1.merge_from_branch(tree2.branch)
2013- conflicts.resolve(tree1, None) # Resolve the conflicts
2014- self.build_tree_contents([('tree1/file', e_text)])
2015- tree1.commit('E', rev_id='rev-E')
2016- return tree1
2017+ builder.build_snapshot('rev-base', None, [
2018+ ('add', ('', 'root-id', 'directory', None)),
2019+ ('add', ('file', 'file-id', 'file', base_text)),
2020+ ])
2021+ builder.build_snapshot('rev-A', ['rev-base'], [
2022+ ('modify', ('file-id', a_text))])
2023+ builder.build_snapshot('rev-B', ['rev-base'], [
2024+ ('modify', ('file-id', b_text))])
2025+ builder.build_snapshot('rev-C', ['rev-A'], [
2026+ ('modify', ('file-id', c_text))])
2027+ builder.build_snapshot('rev-D', ['rev-B', 'rev-A'], [
2028+ ('modify', ('file-id', d_text))])
2029+ builder.build_snapshot('rev-E', ['rev-C', 'rev-D'], [
2030+ ('modify', ('file-id', e_text))])
2031+ return builder
2032
2033 def assertRepoAnnotate(self, expected, repo, file_id, revision_id):
2034 """Assert that the revision is properly annotated."""
2035@@ -307,8 +274,8 @@
2036
2037 def test_annotate_duplicate_lines(self):
2038 # XXX: Should this be a per_repository test?
2039- tree1 = self.create_duplicate_lines_tree()
2040- repo = tree1.branch.repository
2041+ builder = self.create_duplicate_lines_tree()
2042+ repo = builder.get_branch().repository
2043 repo.lock_read()
2044 self.addCleanup(repo.unlock)
2045 self.assertRepoAnnotate(duplicate_base, repo, 'file-id', 'rev-base')
2046@@ -319,10 +286,10 @@
2047 self.assertRepoAnnotate(duplicate_E, repo, 'file-id', 'rev-E')
2048
2049 def test_annotate_shows_dotted_revnos(self):
2050- tree1, tree2 = self.create_merged_trees()
2051+ builder = self.create_merged_trees()
2052
2053 sio = StringIO()
2054- annotate.annotate_file(tree1.branch, 'rev-3', 'a-id',
2055+ annotate.annotate_file(builder.get_branch(), 'rev-3', 'a-id',
2056 to_file=sio)
2057 self.assertEqualDiff('1 joe@foo | first\n'
2058 '2 joe@foo | second\n'
2059@@ -331,10 +298,10 @@
2060
2061 def test_annotate_limits_dotted_revnos(self):
2062 """Annotate should limit dotted revnos to a depth of 12"""
2063- tree1 = self.create_deeply_merged_trees()
2064+ builder = self.create_deeply_merged_trees()
2065
2066 sio = StringIO()
2067- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2068+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2069 to_file=sio, verbose=False, full=False)
2070 self.assertEqualDiff('1 joe@foo | first\n'
2071 '2 joe@foo | second\n'
2072@@ -345,7 +312,7 @@
2073 sio.getvalue())
2074
2075 sio = StringIO()
2076- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2077+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2078 to_file=sio, verbose=False, full=True)
2079 self.assertEqualDiff('1 joe@foo | first\n'
2080 '2 joe@foo | second\n'
2081@@ -357,7 +324,7 @@
2082
2083 # verbose=True shows everything, the full revno, user id, and date
2084 sio = StringIO()
2085- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2086+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2087 to_file=sio, verbose=True, full=False)
2088 self.assertEqualDiff('1 joe@foo.com 20061213 | first\n'
2089 '2 joe@foo.com 20061213 | second\n'
2090@@ -368,7 +335,7 @@
2091 sio.getvalue())
2092
2093 sio = StringIO()
2094- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2095+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2096 to_file=sio, verbose=True, full=True)
2097 self.assertEqualDiff('1 joe@foo.com 20061213 | first\n'
2098 '2 joe@foo.com 20061213 | second\n'
2099@@ -384,10 +351,10 @@
2100 When annotating a non-mainline revision, the annotation should still
2101 use dotted revnos from the mainline.
2102 """
2103- tree1 = self.create_deeply_merged_trees()
2104+ builder = self.create_deeply_merged_trees()
2105
2106 sio = StringIO()
2107- annotate.annotate_file(tree1.branch, 'rev-1_3_1', 'a-id',
2108+ annotate.annotate_file(builder.get_branch(), 'rev-1_3_1', 'a-id',
2109 to_file=sio, verbose=False, full=False)
2110 self.assertEqualDiff('1 joe@foo | first\n'
2111 '1.1.1 barry@f | third\n'
2112@@ -397,10 +364,10 @@
2113 sio.getvalue())
2114
2115 def test_annotate_show_ids(self):
2116- tree1 = self.create_deeply_merged_trees()
2117+ builder = self.create_deeply_merged_trees()
2118
2119 sio = StringIO()
2120- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2121+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2122 to_file=sio, show_ids=True, full=False)
2123
2124 # It looks better with real revision ids :)
2125@@ -413,7 +380,7 @@
2126 sio.getvalue())
2127
2128 sio = StringIO()
2129- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2130+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2131 to_file=sio, show_ids=True, full=True)
2132
2133 self.assertEqualDiff(' rev-1 | first\n'
2134
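The annotate tests above consume the pair-based `[(revision, line), ...]` output of `annotate_file`; internally, this branch switches to tracking annotations and lines as parallel lists (point 5a of the description). A minimal sketch of the two representations and the conversion between them (function names are hypothetical, not bzrlib API):

```python
def to_parallel(annotated_lines):
    """Split [(annotation, line), ...] into ([annotations], [lines]).

    Keeping the plain lines in their own list lets delta and matching
    code operate on them directly, without repeatedly re-casting a list
    of annotated tuples into a list of plain lines.
    """
    if not annotated_lines:
        return [], []
    annotations, lines = zip(*annotated_lines)
    return list(annotations), list(lines)


def to_pairs(annotations, lines):
    """Recombine parallel lists into [(annotation, line), ...] pairs."""
    return list(zip(annotations, lines))
```

With parallel lists, computing a delta between two texts only ever touches the plain-lines list, which is part of what makes comparing plain lines first (the fix for bug #387294) practical.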
2135=== modified file 'bzrlib/tests/test_knit.py'
2136--- bzrlib/tests/test_knit.py 2009-06-16 13:57:14 +0000
2137+++ bzrlib/tests/test_knit.py 2009-07-08 23:35:28 +0000
2138@@ -366,16 +366,25 @@
2139 :return: (versioned_file, reload_counter)
2140 versioned_file a KnitVersionedFiles using the packs for access
2141 """
2142- tree = self.make_branch_and_memory_tree('tree')
2143- tree.lock_write()
2144- self.addCleanup(tree.unlock)
2145- tree.add([''], ['root-id'])
2146- tree.commit('one', rev_id='rev-1')
2147- tree.commit('two', rev_id='rev-2')
2148- tree.commit('three', rev_id='rev-3')
2149+ builder = self.make_branch_builder('.')
2150+ builder.start_series()
2151+ builder.build_snapshot('rev-1', None, [
2152+ ('add', ('', 'root-id', 'directory', None)),
2153+ ('add', ('file', 'file-id', 'file', 'content\nrev 1\n')),
2154+ ])
2155+ builder.build_snapshot('rev-2', ['rev-1'], [
2156+ ('modify', ('file-id', 'content\nrev 2\n')),
2157+ ])
2158+ builder.build_snapshot('rev-3', ['rev-2'], [
2159+ ('modify', ('file-id', 'content\nrev 3\n')),
2160+ ])
2161+ builder.finish_series()
2162+ b = builder.get_branch()
2163+ b.lock_write()
2164+ self.addCleanup(b.unlock)
2165 # Pack these three revisions into another pack file, but don't remove
2166 # the originals
2167- repo = tree.branch.repository
2168+ repo = b.repository
2169 collection = repo._pack_collection
2170 collection.ensure_loaded()
2171 orig_packs = collection.packs
2172@@ -384,7 +393,7 @@
2173 # forget about the new pack
2174 collection.reset()
2175 repo.refresh_data()
2176- vf = tree.branch.repository.revisions
2177+ vf = repo.revisions
2178 # Set up a reload() function that switches to using the new pack file
2179 new_index = new_pack.revision_index
2180 access_tuple = new_pack.access_tuple()
2181@@ -1313,6 +1322,168 @@
2182 return _KndxIndex(transport, mapper, lambda:None, allow_writes, lambda:True)
2183
2184
2185+class Test_KnitAnnotator(TestCaseWithMemoryTransport):
2186+
2187+ def make_annotator(self):
2188+ factory = knit.make_pack_factory(True, True, 1)
2189+ vf = factory(self.get_transport())
2190+ return knit._KnitAnnotator(vf)
2191+
2192+ def test__expand_fulltext(self):
2193+ ann = self.make_annotator()
2194+ rev_key = ('rev-id',)
2195+ ann._num_compression_children[rev_key] = 1
2196+ res = ann._expand_record(rev_key, (('parent-id',),), None,
2197+ ['line1\n', 'line2\n'], ('fulltext', True))
2198+ # The content object and text lines should be cached appropriately
2199+ self.assertEqual(['line1\n', 'line2'], res)
2200+ content_obj = ann._content_objects[rev_key]
2201+ self.assertEqual(['line1\n', 'line2\n'], content_obj._lines)
2202+ self.assertEqual(res, content_obj.text())
2203+ self.assertEqual(res, ann._text_cache[rev_key])
2204+
2205+ def test__expand_delta_comp_parent_not_available(self):
2206+ # Parent isn't available yet, so we return nothing, but queue up this
2207+ # node for later processing
2208+ ann = self.make_annotator()
2209+ rev_key = ('rev-id',)
2210+ parent_key = ('parent-id',)
2211+ record = ['0,1,1\n', 'new-line\n']
2212+ details = ('line-delta', False)
2213+ res = ann._expand_record(rev_key, (parent_key,), parent_key,
2214+ record, details)
2215+ self.assertEqual(None, res)
2216+ self.assertTrue(parent_key in ann._pending_deltas)
2217+ pending = ann._pending_deltas[parent_key]
2218+ self.assertEqual(1, len(pending))
2219+ self.assertEqual((rev_key, (parent_key,), record, details), pending[0])
2220+
2221+ def test__expand_record_tracks_num_children(self):
2222+ ann = self.make_annotator()
2223+ rev_key = ('rev-id',)
2224+ rev2_key = ('rev2-id',)
2225+ parent_key = ('parent-id',)
2226+ record = ['0,1,1\n', 'new-line\n']
2227+ details = ('line-delta', False)
2228+ ann._num_compression_children[parent_key] = 2
2229+ ann._expand_record(parent_key, (), None, ['line1\n', 'line2\n'],
2230+ ('fulltext', False))
2231+ res = ann._expand_record(rev_key, (parent_key,), parent_key,
2232+ record, details)
2233+ self.assertEqual({parent_key: 1}, ann._num_compression_children)
2234+ # Expanding the second child should remove the content object, and the
2235+ # num_compression_children entry
2236+ res = ann._expand_record(rev2_key, (parent_key,), parent_key,
2237+ record, details)
2238+ self.assertFalse(parent_key in ann._content_objects)
2239+ self.assertEqual({}, ann._num_compression_children)
2240+ # We should not cache the content_objects for rev2 and rev, because
2241+ # they do not have compression children of their own.
2242+ self.assertEqual({}, ann._content_objects)
2243+
2244+ def test__expand_delta_records_blocks(self):
2245+ ann = self.make_annotator()
2246+ rev_key = ('rev-id',)
2247+ parent_key = ('parent-id',)
2248+ record = ['0,1,1\n', 'new-line\n']
2249+ details = ('line-delta', True)
2250+ ann._num_compression_children[parent_key] = 2
2251+ ann._expand_record(parent_key, (), None,
2252+ ['line1\n', 'line2\n', 'line3\n'],
2253+ ('fulltext', False))
2254+ ann._expand_record(rev_key, (parent_key,), parent_key, record, details)
2255+ self.assertEqual({(rev_key, parent_key): [(1, 1, 1), (3, 3, 0)]},
2256+ ann._matching_blocks)
2257+ rev2_key = ('rev2-id',)
2258+ record = ['0,1,1\n', 'new-line\n']
2259+ details = ('line-delta', False)
2260+ ann._expand_record(rev2_key, (parent_key,), parent_key, record, details)
2261+ self.assertEqual([(1, 1, 2), (3, 3, 0)],
2262+ ann._matching_blocks[(rev2_key, parent_key)])
2263+
2264+ def test__get_parent_ann_uses_matching_blocks(self):
2265+ ann = self.make_annotator()
2266+ rev_key = ('rev-id',)
2267+ parent_key = ('parent-id',)
2268+ parent_ann = [(parent_key,)]*3
2269+ block_key = (rev_key, parent_key)
2270+ ann._annotations_cache[parent_key] = parent_ann
2271+ ann._matching_blocks[block_key] = [(0, 1, 1), (3, 3, 0)]
2272+ # We should not try to access any parent_lines content, because we know
2273+ # we already have the matching blocks
2274+ par_ann, blocks = ann._get_parent_annotations_and_matches(rev_key,
2275+ ['1\n', '2\n', '3\n'], parent_key)
2276+ self.assertEqual(parent_ann, par_ann)
2277+ self.assertEqual([(0, 1, 1), (3, 3, 0)], blocks)
2278+ self.assertEqual({}, ann._matching_blocks)
2279+
2280+ def test__process_pending(self):
2281+ ann = self.make_annotator()
2282+ rev_key = ('rev-id',)
2283+ p1_key = ('p1-id',)
2284+ p2_key = ('p2-id',)
2285+ record = ['0,1,1\n', 'new-line\n']
2286+ details = ('line-delta', False)
2287+ p1_record = ['line1\n', 'line2\n']
2288+ ann._num_compression_children[p1_key] = 1
2289+ res = ann._expand_record(rev_key, (p1_key,p2_key), p1_key,
2290+ record, details)
2291+ self.assertEqual(None, res)
2292+ # self.assertTrue(p1_key in ann._pending_deltas)
2293+ self.assertEqual({}, ann._pending_annotation)
2294+ # Now insert p1, and we should be able to expand the delta
2295+ res = ann._expand_record(p1_key, (), None, p1_record,
2296+ ('fulltext', False))
2297+ self.assertEqual(p1_record, res)
2298+ ann._annotations_cache[p1_key] = [(p1_key,)]*2
2299+ res = ann._process_pending(p1_key)
2300+ self.assertEqual([], res)
2301+ self.assertFalse(p1_key in ann._pending_deltas)
2302+ self.assertTrue(p2_key in ann._pending_annotation)
2303+ self.assertEqual({p2_key: [(rev_key, (p1_key, p2_key))]},
2304+ ann._pending_annotation)
2305+ # Now fill in parent 2, and pending annotation should be satisfied
2306+ res = ann._expand_record(p2_key, (), None, [], ('fulltext', False))
2307+ ann._annotations_cache[p2_key] = []
2308+ res = ann._process_pending(p2_key)
2309+ self.assertEqual([rev_key], res)
2310+ self.assertEqual({}, ann._pending_annotation)
2311+ self.assertEqual({}, ann._pending_deltas)
2312+
2313+ def test_record_delta_removes_basis(self):
2314+ ann = self.make_annotator()
2315+ ann._expand_record(('parent-id',), (), None,
2316+ ['line1\n', 'line2\n'], ('fulltext', False))
2317+ ann._num_compression_children['parent-id'] = 2
2318+
2319+ def test_annotate_special_text(self):
2320+ ann = self.make_annotator()
2321+ vf = ann._vf
2322+ rev1_key = ('rev-1',)
2323+ rev2_key = ('rev-2',)
2324+ rev3_key = ('rev-3',)
2325+ spec_key = ('special:',)
2326+ vf.add_lines(rev1_key, [], ['initial content\n'])
2327+ vf.add_lines(rev2_key, [rev1_key], ['initial content\n',
2328+ 'common content\n',
2329+ 'content in 2\n'])
2330+ vf.add_lines(rev3_key, [rev1_key], ['initial content\n',
2331+ 'common content\n',
2332+ 'content in 3\n'])
2333+ spec_text = ('initial content\n'
2334+ 'common content\n'
2335+ 'content in 2\n'
2336+ 'content in 3\n')
2337+ ann.add_special_text(spec_key, [rev2_key, rev3_key], spec_text)
2338+ anns, lines = ann.annotate(spec_key)
2339+ self.assertEqual([(rev1_key,),
2340+ (rev2_key, rev3_key),
2341+ (rev2_key,),
2342+ (rev3_key,),
2343+ ], anns)
2344+ self.assertEqualDiff(spec_text, ''.join(lines))
2345+
2346+
2347 class KnitTests(TestCaseWithTransport):
2348 """Class containing knit test helper routines."""
2349
2350
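Several of the `_KnitAnnotator` tests above feed `_expand_record` raw line-delta records such as `['0,1,1\n', 'new-line\n']`. As a rough sketch of what such a record encodes (simplified; real knit records also carry compression and end-of-line details, which is why the noeol variants above produce shorter matching blocks), each hunk header `start,end,count` replaces `parent_lines[start:end]` with the next `count` delta lines:

```python
def apply_line_delta(parent_lines, delta_lines):
    """Apply a knit-style line delta to a parent text (simplified sketch).

    Each hunk begins with a 'start,end,count\n' header: drop
    parent_lines[start:end] and splice in the following `count` lines
    taken from the delta itself.
    """
    out = []
    pos = 0  # next unconsumed index into parent_lines
    i = 0    # current index into delta_lines
    while i < len(delta_lines):
        start, end, count = map(int, delta_lines[i].split(','))
        i += 1
        out.extend(parent_lines[pos:start])   # unchanged region before the hunk
        out.extend(delta_lines[i:i + count])  # replacement lines from the delta
        i += count
        pos = end
    out.extend(parent_lines[pos:])            # trailing unchanged region
    return out
```

For the record used repeatedly above, applying `['0,1,1\n', 'new-line\n']` to `['line1\n', 'line2\n', 'line3\n']` swaps only the first line, which matches the `(1, 1, 2), (3, 3, 0)` blocks the test expects in the eol case.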
2351=== modified file 'bzrlib/tests/test_versionedfile.py'
2352--- bzrlib/tests/test_versionedfile.py 2009-06-22 15:37:06 +0000
2353+++ bzrlib/tests/test_versionedfile.py 2009-07-08 23:35:28 +0000
2354@@ -1557,6 +1557,42 @@
2355 self.assertRaises(RevisionNotPresent,
2356 files.annotate, prefix + ('missing-key',))
2357
2358+ def test_get_annotator(self):
2359+ files = self.get_versionedfiles()
2360+ self.get_diamond_files(files)
2361+ origin_key = self.get_simple_key('origin')
2362+ base_key = self.get_simple_key('base')
2363+ left_key = self.get_simple_key('left')
2364+ right_key = self.get_simple_key('right')
2365+ merged_key = self.get_simple_key('merged')
2366+ # annotator = files.get_annotator()
2367+ # introduced full text
2368+ origins, lines = files.get_annotator().annotate(origin_key)
2369+ self.assertEqual([(origin_key,)], origins)
2370+ self.assertEqual(['origin\n'], lines)
2371+ # a delta
2372+ origins, lines = files.get_annotator().annotate(base_key)
2373+ self.assertEqual([(base_key,)], origins)
2374+ # a merge
2375+ origins, lines = files.get_annotator().annotate(merged_key)
2376+ if self.graph:
2377+ self.assertEqual([
2378+ (base_key,),
2379+ (left_key,),
2380+ (right_key,),
2381+ (merged_key,),
2382+ ], origins)
2383+ else:
2384+ # Without a graph everything is new.
2385+ self.assertEqual([
2386+ (merged_key,),
2387+ (merged_key,),
2388+ (merged_key,),
2389+ (merged_key,),
2390+ ], origins)
2391+ self.assertRaises(RevisionNotPresent,
2392+ files.get_annotator().annotate, self.get_simple_key('missing-key'))
2393+
2394 def test_construct(self):
2395 """Each parameterised test can be constructed on a transport."""
2396 files = self.get_versionedfiles()
2397
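`test_get_annotator` checks that lines inherited from an ancestor keep that ancestor's origin while new lines get the key of the revision that introduced them. The core move, reusing matching blocks between a parent text and a child text, can be sketched with `difflib` (single-parent only; the real Annotator merges several parents and caches the blocks, as the `_KnitAnnotator` tests show):

```python
import difflib


def annotate_from_parent(parent_annotations, parent_lines, child_lines,
                         child_key):
    """Annotate child lines against one parent using matching blocks.

    Lines inside a matching block inherit the parent's per-line
    annotation tuple; every other line is attributed to child_key.
    """
    # Start by assuming every line is new in the child.
    annotations = [(child_key,)] * len(child_lines)
    matcher = difflib.SequenceMatcher(None, parent_lines, child_lines)
    for parent_idx, child_idx, length in matcher.get_matching_blocks():
        for offset in range(length):
            # This child line also exists in the parent: keep its origin.
            annotations[child_idx + offset] = \
                parent_annotations[parent_idx + offset]
    return annotations
```

Caching the matching blocks per `(child, parent)` pair, as `_matching_blocks` does above, avoids recomputing exactly this diff when the knit delta already implied it.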
2398=== modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
2399--- bzrlib/tests/workingtree_implementations/__init__.py 2009-06-10 03:56:49 +0000
2400+++ bzrlib/tests/workingtree_implementations/__init__.py 2009-07-08 23:35:28 +0000
2401@@ -63,6 +63,7 @@
2402 test_workingtree_implementations = [
2403 'bzrlib.tests.workingtree_implementations.test_add_reference',
2404 'bzrlib.tests.workingtree_implementations.test_add',
2405+ 'bzrlib.tests.workingtree_implementations.test_annotate_iter',
2406 'bzrlib.tests.workingtree_implementations.test_basis_inventory',
2407 'bzrlib.tests.workingtree_implementations.test_basis_tree',
2408 'bzrlib.tests.workingtree_implementations.test_break_lock',
2409
2410=== added file 'bzrlib/tests/workingtree_implementations/test_annotate_iter.py'
2411--- bzrlib/tests/workingtree_implementations/test_annotate_iter.py 1970-01-01 00:00:00 +0000
2412+++ bzrlib/tests/workingtree_implementations/test_annotate_iter.py 2009-07-08 23:35:28 +0000
2413@@ -0,0 +1,189 @@
2414+# Copyright (C) 2009 Canonical Ltd
2415+#
2416+# This program is free software; you can redistribute it and/or modify
2417+# it under the terms of the GNU General Public License as published by
2418+# the Free Software Foundation; either version 2 of the License, or
2419+# (at your option) any later version.
2420+#
2421+# This program is distributed in the hope that it will be useful,
2422+# but WITHOUT ANY WARRANTY; without even the implied warranty of
2423+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2424+# GNU General Public License for more details.
2425+#
2426+# You should have received a copy of the GNU General Public License
2427+# along with this program; if not, write to the Free Software
2428+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
2429+
2430+"""Tests for interface conformance of 'WorkingTree.annotate_iter'"""
2431+
2432+import os
2433+
2434+from bzrlib import (
2435+ errors,
2436+ inventory,
2437+ osutils,
2438+ tests,
2439+ )
2440+from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree
2441+
2442+
2443+class TestAnnotateIter(TestCaseWithWorkingTree):
2444+
2445+ def make_single_rev_tree(self):
2446+ builder = self.make_branch_builder('branch')
2447+ builder.build_snapshot('rev-1', None, [
2448+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2449+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2450+ ])
2451+ b = builder.get_branch()
2452+ tree = b.create_checkout('tree', lightweight=True)
2453+ tree.lock_read()
2454+ self.addCleanup(tree.unlock)
2455+ return tree
2456+
2457+ def test_annotate_same_as_parent(self):
2458+ tree = self.make_single_rev_tree()
2459+ annotations = tree.annotate_iter('file-id')
2460+ self.assertEqual([('rev-1', 'initial content\n')],
2461+ annotations)
2462+
2463+ def test_annotate_mod_from_parent(self):
2464+ tree = self.make_single_rev_tree()
2465+ self.build_tree_contents([('tree/file',
2466+ 'initial content\nnew content\n')])
2467+ annotations = tree.annotate_iter('file-id')
2468+ self.assertEqual([('rev-1', 'initial content\n'),
2469+ ('current:', 'new content\n'),
2470+ ], annotations)
2471+
2472+ def test_annotate_merge_parents(self):
2473+ builder = self.make_branch_builder('branch')
2474+ builder.start_series()
2475+ builder.build_snapshot('rev-1', None, [
2476+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2477+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2478+ ])
2479+ builder.build_snapshot('rev-2', ['rev-1'], [
2480+ ('modify', ('file-id', 'initial content\ncontent in 2\n')),
2481+ ])
2482+ builder.build_snapshot('rev-3', ['rev-1'], [
2483+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2484+ ])
2485+ builder.finish_series()
2486+ b = builder.get_branch()
2487+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2488+ tree.lock_write()
2489+ self.addCleanup(tree.unlock)
2490+ tree.set_parent_ids(['rev-2', 'rev-3'])
2491+ self.build_tree_contents([('tree/file',
2492+ 'initial content\ncontent in 2\n'
2493+ 'content in 3\nnew content\n')])
2494+ annotations = tree.annotate_iter('file-id')
2495+ self.assertEqual([('rev-1', 'initial content\n'),
2496+ ('rev-2', 'content in 2\n'),
2497+ ('rev-3', 'content in 3\n'),
2498+ ('current:', 'new content\n'),
2499+ ], annotations)
2500+
2501+ def test_annotate_merge_parent_no_file(self):
2502+ builder = self.make_branch_builder('branch')
2503+ builder.start_series()
2504+ builder.build_snapshot('rev-1', None, [
2505+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2506+ ])
2507+ builder.build_snapshot('rev-2', ['rev-1'], [
2508+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2509+ ])
2510+ builder.build_snapshot('rev-3', ['rev-1'], [])
2511+ builder.finish_series()
2512+ b = builder.get_branch()
2513+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2514+ tree.lock_write()
2515+ self.addCleanup(tree.unlock)
2516+ tree.set_parent_ids(['rev-2', 'rev-3'])
2517+ self.build_tree_contents([('tree/file',
2518+ 'initial content\nnew content\n')])
2519+ annotations = tree.annotate_iter('file-id')
2520+ self.assertEqual([('rev-2', 'initial content\n'),
2521+ ('current:', 'new content\n'),
2522+ ], annotations)
2523+
2524+ def test_annotate_merge_parent_was_directory(self):
2525+ builder = self.make_branch_builder('branch')
2526+ builder.start_series()
2527+ builder.build_snapshot('rev-1', None, [
2528+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2529+ ])
2530+ builder.build_snapshot('rev-2', ['rev-1'], [
2531+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2532+ ])
2533+ builder.build_snapshot('rev-3', ['rev-1'], [
2534+ ('add', ('a_dir', 'file-id', 'directory', None)),
2535+ ])
2536+ builder.finish_series()
2537+ b = builder.get_branch()
2538+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2539+ tree.lock_write()
2540+ self.addCleanup(tree.unlock)
2541+ tree.set_parent_ids(['rev-2', 'rev-3'])
2542+ self.build_tree_contents([('tree/file',
2543+ 'initial content\nnew content\n')])
2544+ annotations = tree.annotate_iter('file-id')
2545+ self.assertEqual([('rev-2', 'initial content\n'),
2546+ ('current:', 'new content\n'),
2547+ ], annotations)
2548+
2549+ def test_annotate_same_as_merge_parent(self):
2550+ builder = self.make_branch_builder('branch')
2551+ builder.start_series()
2552+ builder.build_snapshot('rev-1', None, [
2553+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2554+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2555+ ])
2556+ builder.build_snapshot('rev-2', ['rev-1'], [
2557+ ])
2558+ builder.build_snapshot('rev-3', ['rev-1'], [
2559+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2560+ ])
2561+ builder.finish_series()
2562+ b = builder.get_branch()
2563+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2564+ tree.lock_write()
2565+ self.addCleanup(tree.unlock)
2566+ tree.set_parent_ids(['rev-2', 'rev-3'])
2567+ self.build_tree_contents([('tree/file',
2568+ 'initial content\ncontent in 3\n')])
2569+ annotations = tree.annotate_iter('file-id')
2570+ self.assertEqual([('rev-1', 'initial content\n'),
2571+ ('rev-3', 'content in 3\n'),
2572+ ], annotations)
2573+
2574+ def test_annotate_same_as_merge_parent_supersedes(self):
2575+ builder = self.make_branch_builder('branch')
2576+ builder.start_series()
2577+ builder.build_snapshot('rev-1', None, [
2578+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2579+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2580+ ])
2581+ builder.build_snapshot('rev-2', ['rev-1'], [
2582+ ('modify', ('file-id', 'initial content\nnew content\n')),
2583+ ])
2584+ builder.build_snapshot('rev-3', ['rev-2'], [
2585+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2586+ ])
2587+ builder.build_snapshot('rev-4', ['rev-3'], [
2588+ ('modify', ('file-id', 'initial content\nnew content\n')),
2589+ ])
2590+ # In this case, the content locally is the same as content in basis
2591+ # tree, but the merge revision states that *it* should win
2592+ builder.finish_series()
2593+ b = builder.get_branch()
2594+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2595+ tree.lock_write()
2596+ self.addCleanup(tree.unlock)
2597+ tree.set_parent_ids(['rev-2', 'rev-4'])
2598+ annotations = tree.annotate_iter('file-id')
2599+ self.assertEqual([('rev-1', 'initial content\n'),
2600+ ('rev-4', 'new content\n'),
2601+ ], annotations)
2602+
2603
2604=== modified file 'bzrlib/transform.py'
2605--- bzrlib/transform.py 2009-06-30 04:08:12 +0000
2606+++ bzrlib/transform.py 2009-07-08 23:35:27 +0000
2607@@ -1,4 +1,4 @@
2608-# Copyright (C) 2006, 2007, 2008 Canonical Ltd
2609+# Copyright (C) 2006, 2007, 2008, 2009 Canonical Ltd
2610 #
2611 # This program is free software; you can redistribute it and/or modify
2612 # it under the terms of the GNU General Public License as published by
2613@@ -1962,6 +1962,13 @@
2614 return old_annotation
2615 if not changed_content:
2616 return old_annotation
2617+ # TODO: This is doing something similar to what WT.annotate_iter is
2618+ # doing, however it fails slightly because it doesn't know what
2619+ # the *other* revision_id is, so it doesn't know how to give the
2620+ # other as the origin for some lines, they all get
2621+ # 'default_revision'
2622+ # It would be nice to be able to use the new Annotator based
2623+ # approach, as well.
2624 return annotate.reannotate([old_annotation],
2625 self.get_file(file_id).readlines(),
2626 default_revision)
2627
2628=== modified file 'bzrlib/versionedfile.py'
2629--- bzrlib/versionedfile.py 2009-06-22 15:47:25 +0000
2630+++ bzrlib/versionedfile.py 2009-07-08 23:35:28 +0000
2631@@ -30,6 +30,7 @@
2632 import urllib
2633
2634 from bzrlib import (
2635+ annotate,
2636 errors,
2637 groupcompress,
2638 index,
2639@@ -1122,6 +1123,9 @@
2640 result.append((prefix + (origin,), line))
2641 return result
2642
2643+ def get_annotator(self):
2644+ return annotate.Annotator(self)
2645+
2646 def check(self, progress_bar=None):
2647 """See VersionedFiles.check()."""
2648 for prefix, vf in self._iter_all_components():
2649
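The `annotate_iter` rewrite in `bzrlib/workingtree.py` below calls `annotator.annotate_flat(this_key)` and keeps `key[-1]` for each line. A flat annotation has to collapse lines that carry several origins (as in `test_annotate_special_text`, where one line is attributed to both rev-2 and rev-3) down to a single one. A toy sketch, using a plain sort as the tie-break where bzrlib actually consults the revision graph's heads:

```python
def annotate_flat(annotations, lines):
    """Collapse per-line origin tuples to single (origin, line) pairs.

    Each entry in `annotations` is a tuple of every revision that
    introduced the matching line. For flat output one of them must win;
    sorting and taking the first is a deterministic stand-in for the
    real head-based resolution against the revision graph.
    """
    return [(sorted(origins)[0], line)
            for origins, line in zip(annotations, lines)]
```

Only the multi-origin lines are affected; lines with a single origin pass through unchanged.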
2650=== modified file 'bzrlib/workingtree.py'
2651--- bzrlib/workingtree.py 2009-07-01 10:40:07 +0000
2652+++ bzrlib/workingtree.py 2009-07-08 23:35:28 +0000
2653@@ -1,4 +1,4 @@
2654-# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd
2655+# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd
2656 #
2657 # This program is free software; you can redistribute it and/or modify
2658 # it under the terms of the GNU General Public License as published by
2659@@ -58,6 +58,7 @@
2660 errors,
2661 generate_ids,
2662 globbing,
2663+ graph as _mod_graph,
2664 hashcache,
2665 ignores,
2666 inventory,
2667@@ -477,31 +478,42 @@
2668 incorrectly attributed to CURRENT_REVISION (but after committing, the
2669 attribution will be correct).
2670 """
2671- basis = self.basis_tree()
2672- basis.lock_read()
2673- try:
2674- changes = self.iter_changes(basis, True, [self.id2path(file_id)],
2675- require_versioned=True).next()
2676- changed_content, kind = changes[2], changes[6]
2677- if not changed_content:
2678- return basis.annotate_iter(file_id)
2679- if kind[1] is None:
2680- return None
2681- import annotate
2682- if kind[0] != 'file':
2683- old_lines = []
2684- else:
2685- old_lines = list(basis.annotate_iter(file_id))
2686- old = [old_lines]
2687- for tree in self.branch.repository.revision_trees(
2688- self.get_parent_ids()[1:]):
2689- if file_id not in tree:
2690- continue
2691- old.append(list(tree.annotate_iter(file_id)))
2692- return annotate.reannotate(old, self.get_file(file_id).readlines(),
2693- default_revision)
2694- finally:
2695- basis.unlock()
2696+ maybe_file_parent_keys = []
2697+ for parent_id in self.get_parent_ids():
2698+ try:
2699+ parent_tree = self.revision_tree(parent_id)
2700+ except errors.NoSuchRevisionInTree:
2701+ parent_tree = self.branch.repository.revision_tree(parent_id)
2702+ parent_tree.lock_read()
2703+ try:
2704+ if file_id not in parent_tree:
2705+ continue
2706+ ie = parent_tree.inventory[file_id]
2707+ if ie.kind != 'file':
2708+ # Note: this is slightly unnecessary, because symlinks and
2709+ # directories have a "text" which is the empty text, and we
2710+ # know that won't mess up annotations. But it seems cleaner
2711+ continue
2712+ parent_text_key = (file_id, ie.revision)
2713+ if parent_text_key not in maybe_file_parent_keys:
2714+ maybe_file_parent_keys.append(parent_text_key)
2715+ finally:
2716+ parent_tree.unlock()
2717+ graph = _mod_graph.Graph(self.branch.repository.texts)
2718+ heads = graph.heads(maybe_file_parent_keys)
2719+ file_parent_keys = []
2720+ for key in maybe_file_parent_keys:
2721+ if key in heads:
2722+ file_parent_keys.append(key)
2723+
2724+ # Now we have the parents of this content
2725+ annotator = self.branch.repository.texts.get_annotator()
2726+ text = self.get_file(file_id).read()
2727+ this_key =(file_id, default_revision)
2728+ annotator.add_special_text(this_key, file_parent_keys, text)
2729+ annotations = [(key[-1], line)
2730+ for key, line in annotator.annotate_flat(this_key)]
2731+ return annotations
2732
2733 def _get_ancestors(self, default_revision):
2734 ancestors = set([default_revision])
2735
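The new `annotate_iter` above gathers a per-file parent key for every tree parent, then uses `Graph.heads()` to drop keys that another candidate supersedes; this is what makes `test_annotate_same_as_merge_parent_supersedes` attribute the line to rev-4 rather than the basis. A toy version over a precomputed ancestry map (the real code walks `repository.texts`; the `ancestry` dict here is a hypothetical stand-in):

```python
def simple_heads(candidates, ancestry):
    """Return the candidates that no other candidate descends from.

    `ancestry` maps each key to the set of all its ancestor keys. A
    candidate that is an ancestor of another candidate is superseded
    and dropped, mirroring Graph.heads() on the per-file text graph.
    """
    heads = []
    for key in candidates:
        superseded = any(key in ancestry.get(other, set())
                        for other in candidates if other != key)
        if not superseded:
            heads.append(key)
    return heads
```

In the supersedes test, the basis key for rev-2 is an ancestor of the merged rev-4 key, so only the rev-4 key survives and its annotations win.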
2736=== modified file 'setup.py'
2737--- setup.py 2009-06-23 07:10:03 +0000
2738+++ setup.py 2009-07-08 23:35:27 +0000
2739@@ -259,6 +259,7 @@
2740 define_macros=define_macros, libraries=libraries))
2741
2742
2743+add_pyrex_extension('bzrlib._annotator_pyx')
2744 add_pyrex_extension('bzrlib._bencode_pyx')
2745 add_pyrex_extension('bzrlib._btree_serializer_pyx')
2746 add_pyrex_extension('bzrlib._chunks_to_lines_pyx')