Merge lp:~jameinel/bzr/1.17-rework-annotate into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.17-rework-annotate
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2746 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-rework-annotate
Reviewer Review Type Date Requested Status
Vincent Ladeuil Pending
bzr-core Pending
Review via email: mp+8281@code.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

This is a fairly major overhaul of the annotate code, with an eye on improving annotations overall. In the short term, it just makes it faster (~9s => ~7s for NEWS)

Overview of changes
1) Some small tweaks to BranchBuilder that I needed to write some test cases.
2) Changing from a bunch of loose functions in 'bzrlib.annotate.reannotate*' to a class Annotator.
3) Re-implement _KnitAnnotator as an implementation of this class. I didn't change much about how the texts were extracted and then compared, but there is a much better test suite against it now. It also vetted the design a bit, to ensure that the Annotator could be properly subclassed to do specialized extraction. (Knits and knitpack give us hints as to what our delta should be, so that we don't have to re-delta every text versus all of its parents. Timing shows this to be rather significant.)
4) Implement a pyrex version of Annotator, to handle some inner-loop functionality. (Nicely enough, you can still subclass from it.)
5) This also includes a fairly fundamental change to how the annotation is produced.
   a) Switch from [(ann1, line1), (ann2, line2)] to ([ann1, ann2], [line1, line2]) style of tracking annotations. This means fewer times when we have to re-cast the data from a list of annotated lines into a list of plain lines.
   b) When computing the delta, compare all plain lines first. The prior code used to compare annotated lines, because it made computing overall annotations faster. However, that introduced bug #387294.
   c) Start tracking *multiple* sources for a given line. This means that rather than resolving 'heads()' and collisions at every revision, we wait to do the resolution on the *final* text. This removes a lot of heads() calls for stuff like NEWS (and is the primary performance improvement). I don't think this gets us a lot when dealing with a command line interface, but it has lots of potential for a GUI (which could then show all sources that introduced a line, etc.)
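For illustration, changes (a) and (c) amount to something like the following sketch (plain Python, not the actual bzrlib data structures; the revision ids are made up):

```python
# Old style: one (annotation, line) pair per line.  Extracting the
# plain text for diffing means building a whole new list each time.
old_style = [(("rev-1",), "hello\n"), (("rev-2",), "world\n")]
plain = [line for _ann, line in old_style]  # extra pass, extra list

# New style: two parallel lists -- the plain lines are already
# available for the matcher without any re-casting.
annotations = [("rev-1",), ("rev-2", "rev-3")]
lines = ["hello\n", "world\n"]

# Each annotation entry holds *every* candidate source for its line;
# ambiguities (the second line here) are resolved with heads() only
# once, on the final text, rather than at every intermediate revision.
ambiguous = [idx for idx, ann in enumerate(annotations) if len(ann) > 1]
```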

Still to do:
1) I'd like to break it up a bit more, and allow you to pass some sort of Policy object into the code. That would let you do things like ignore whitespace changes, only annotate based on mainline changes, etc.

2) Support _break_annotation_tie again. This is really something I'd like to fold into Policy, but I guess MySQL wanted a custom implementation. I don't really like the current api, as it is probably fixed at 2 revisions, and it passes in the lines along with the revisions. But I can certainly adapt what I need to the old api. Note that technically the api supports > 2, though I doubt that is actually used anywhere; I haven't seen the MySQL implementation as a comparison point.

3) GUI support. I don't really know how to expose the Annotator functionality to a GUI, or if the api really works well there until I've actually written some code. However, this patch has gotten too big already, so I'd like to get it reviewed as is.

Vincent Ladeuil (vila) wrote :

Review: Approve

Good to see more tests in that area!

There is little to comment on given the detailed cover letter. I
like the cleanup (to come? :) in annotate and the introduction
of the Annotator class, but since you intend to build policy
classes as a front-end, why not make the class private until you
feel more confident about the overall API? Like you, I'm not
sure the GUIs will really need to access that class...

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> You have been requested to review the proposed merge of
    jam> lp:~jameinel/bzr/1.17-rework-annotate into lp:bzr.

    jam> This is a fairly major overhaul of the annotate code,
    jam> with an eye on improving annotations overall. In the
    jam> short term, it just makes it faster (~9s => ~7s for
    jam> NEWS)

Which is always good to take (for interested readers, that's still
measured with --show-ids).

    jam> Overview of changes

    jam> 1) Some small tweaks to BranchBuilder that I needed to
    jam> write some test cases.

Good, obviously the tests you modified are clearer too.

    jam> 2) Changing from a bunch of loose functions in
    jam> 'bzrlib.annotate.reannotate*' to a class Annotator.

Good step forward.

A couple of comments on the class API:

- _update_needed_children() and _get_needed_keys() sound like
  good candidates for Graph() or some specialization of it.

- _update_from_one_parent(): the doc string says first parent, so
  why not call it _update_from_first_parent() then? Unless you
  envision some other possible usage...

- add_special_text(), hmm, what's that? The doc string doesn't
  help a lot :-) Does that need to be public?

    jam> 4) Implement a pyrex version of Annotator, to handle some
    jam> inner-loop functionality. (Nicely enough, you can still subclass
    jam> from it.)

It would be nice to define in pyrex *only* those inner loops,
though that's not a requirement to land this patch.

<snip/>

    jam> Still to do:
    jam> 1) I'd like to break it up a bit more, and allow you to pass some
    jam> sort of Policy object into the code. That would let you do things
    jam> like ignore whitespace changes, only annotate based on mainline
    jam> changes, etc.

    jam> 2) Support _break_annotation_tie again. This is really something
    jam> I'd like to fold into Policy, but I guess MySQL wanted a custom
    jam> implementation. I don't really like the current api, as it is
    jam> probably fixed at 2 revisions, and it passes in the lines along
    jam> with the revisions. But I can certainly adapt what I need to the
    jam> old api. Note that technically the api supports > 2, but I doubt
    jam> that is actually supported anywhere, but I haven't seen the MySQL
    jam> implementation as a comparison point.

Pretty much:

- extract the date from the revids of the annotations (only the
  first two),
- return the oldest

It would be much appreciated not to break the actual result, or at
the very least to provide a way to get the same functionality
before 1.17 is out.
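A tie-breaker along those lines might look like the sketch below; the revid layout and the helper's name are assumptions (the MySQL implementation hasn't been seen), shown only to make the two steps concrete:

```python
def break_annotation_tie(annotated_lines):
    """Return the oldest of the first two (rev_id, line) candidates.

    Assumes revids of the form 'user@host-YYYYMMDDhhmmss-uniq', so the
    second '-'-separated field is a sortable timestamp.  Purely a guess
    at the shape of the hook, not the actual implementation.
    """
    def date_of(annotated_line):
        rev_id = annotated_line[0]
        # Pull out the timestamp field from the revid.
        return rev_id.split("-")[1]
    return min(annotated_lines[:2], key=date_of)

oldest = break_annotation_tie(
    [("jam@host-20090708-aaaa", "line\n"),
     ("vila@host-20090601-bbbb", "line\n")])
```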

    jam> 3) GUI support. I don't really know how to expose the Annotator
    jam> functionality to a GUI, or if the api really works well there
    jam...


John A Meinel (jameinel) wrote :


> There is little to comment on given the detailed cover letter, I
> like the cleanup (to come ? :) in annotate and the introduction
> of the Annotator class, but since you intend to build policy
> classes as front-end, why not make the class private until you
> feel more confident about the overall API ? Like you, I'm not
> sure the GUIs will really need to access that class...

I'm not sure if you understood me correctly. Annotator is *meant* to be
the final api for getting an annotation for various versions of a file.

An AnnotationPolicy is meant to be the way to say "I want to ignore
whitespace", etc.

I can make it hidden, but since:

VF.get_annotator() is meant to be the public api that
qannotate/gannotate will use...

...
>
> Pretty much:
>
> - extract the date from the revids of the annotations (only the
> first two ones),
> - return the oldest
>
> It would be very appreciated to not break the actual result or at
> the very least provides a way to get the same functionality
> before 1.17 is out.

So my idea was to do:

_break_annotation_tie = None

if _break_annotation_tie is not None:
  # mutate the data to fit the old api
else:
  # do it my way

...

> jam> + the_heads = heads(annotation)
> jam> + if len(the_heads) == 1:
> jam> + for head in the_heads:
> jam> + break
>
> That's a really funny way to write head = heads[0]... Do I miss
> something ?

heads is a set, you can't do set()[0], you have to:

list(heads)[0]

iter(heads).next()

heads.pop() # though I believe it is a frozenset and this is illegal

[head for head in heads][0]

for head in heads:
  continue

for head in heads:
  pass

for head in heads:
  break

Take your pick. The last one is the fastest because it evaluates the
iter() inline, and doesn't have a function call. Nor does it build an
intermediate list. And for whatever reason, timeit says that 'break' is
faster than the others.
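The relative speeds are easy to re-check with timeit; this little benchmark is only illustrative, and the absolute numbers depend on machine and Python version (note that frozenset really has no .pop(), so that option is out):

```python
import timeit

# Benchmark three ways of pulling the single element out of a set.
setup = "heads = frozenset(['rev-a', 'rev-b'])"
candidates = {
    "list(heads)[0]": "head = list(heads)[0]",
    "next(iter(heads))": "head = next(iter(heads))",
    "for head in heads: break": "for head in heads: break",
}
for name, stmt in candidates.items():
    t = timeit.timeit(stmt, setup=setup, number=10000)
    print("%-26s %.5fs" % (name, t))
```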

>
> <snip/>
>
> jam> === added file 'bzrlib/_annotator_pyx.pyx'
> ...
> jam> +class Annotator:
> ...
> jam> + def _update_needed_children(self, key, parent_keys):
> jam> + for parent_key in parent_keys:
> jam> + if parent_key in self._num_needed_children:
> jam> + self._num_needed_children[parent_key] += 1
>
> +=1 ? Hmm, I like that pyrex version you're using, send me some :)

Actually for pyrex 0.9.8 you can even do:

cdef list foo

and then *it* will translate

foo.append(...)

into

PyList_Append(foo, ...)

It would be *really* nice to depend on 0.9.8.5 as it would clean up
certain bits. (Note it only really supports lists; it allows:
  cdef dict foo
  cdef tuple foo

and will do runtime checking, etc., but it doesn't have any smarts
about set_item/get_item/append, etc.)

John
=:->

John A Meinel (jameinel) wrote :


...
>
> A couple of comments on the class API:
>
> - _update_needed_children() and _get_needed_keys() sounds like
> good candidates for Graph() or some specialization of it.

True, though they also have side effects like removing texts when there
are no more needed children, etc.

>
> - _update_from_one_parent() the doc string says first parent, why
> not call it _update_from_first_parent() then ? Unless you envision
> some other possible usage...

Sure.

>
> - add_special_text(), hmm, what's that ? The doc string doesn't
> help a lot :-) Does that need to be public ?

It does, as it is used by WorkingTree to add the 'current:' text to be
annotated. (One other benefit of this new code is that 'bzr annotate
NEWS' after a merge doesn't annotate both parents independently... \o/)
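Per the add_special_text() body in the preview diff, the mechanics reduce to registering a key, its parents, and its split lines, then invalidating the cached heads provider. The stripped-down stand-in here (not the real bzrlib class) shows just that:

```python
def split_lines(text):
    # Stand-in for bzrlib's osutils.split_lines.
    return text.splitlines(True)

class SketchAnnotator(object):
    """Bare-bones stand-in illustrating Annotator.add_special_text()."""

    def __init__(self):
        self._parent_map = {}
        self._text_cache = {}
        self._heads_provider = None

    def add_special_text(self, key, parent_keys, text):
        # Register a text absent from the versioned file, e.g. the
        # WorkingTree's 'current:' content after a merge.
        self._parent_map[key] = parent_keys
        self._text_cache[key] = split_lines(text)
        # The graph changed, so any cached heads() data is stale.
        self._heads_provider = None

ann = SketchAnnotator()
ann.add_special_text(('current:',), [('rev-1',), ('rev-2',)],
                     "merged line\n")
```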

>
> jam> 4) Implement a pyrex version of Annotator, to handle some
> jam> inner-loop functionality. (Nicely enough, you can still subclass
> jam> from it.)
>
> It would be nice to defines in pyrex *only* those inner-loops,
> not a requirement to land that patch tough.
>

John
=:->

Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    >> There is little to comment on given the detailed cover letter, I
    >> like the cleanup (to come ? :) in annotate and the introduction
    >> of the Annotator class, but since you intend to build policy
    >> classes as front-end, why not make the class private until you
    >> feel more confident about the overall API ? Like you, I'm not
    >> sure the GUIs will really need to access that class...

    jam> I'm not sure if you understood me correctly. Annotator
    jam> is *meant* to be the final api for getting an annotation
    jam> for various versions of a file.

Oh! Indeed, I understood you wanted trees to be the primary interface.

<snip/>

    jam> So my idea was to do:

    jam> _break_annotation_tie = None

    jam> if _break_annotation_tie is not None:
    jam> # mutate the data to fit the old api
    jam> else:
    jam> # do it my way

Fine.

    jam> heads is a set, you can't do set()[0], you have to:

    jam> list(heads)[0]

    jam> iter(heads).next()

    jam> heads.pop() # though I believe it is a frozenset and this is illegal

    jam> [head for head in heads][0]

    jam> for head in heads:
    jam> continue

    jam> for head in heads:
    jam> pass

    jam> for head in heads:
    jam> break

Then I'd go with:

  for head in heads: break # Get head from the set

on a single line to make it more obvious.

That's the first time I've seen that idiom; I won't be surprised
next time (hopefully, but others may be).

<snip/>

    >> +=1 ? Hmm, I like that pyrex version you're using, send me some :)

    jam> Actually for pyrex 0.9.8 you can even do:

Oh! Yes, jaunty is still at 0.9.7.2 ...

Reading the NEWS about it, I can only agree here.

What is needed to have the package updated? Host it in the bzr
PPAs?

     Vincent

Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> ...
    >>
    >> A couple of comments on the class API:
    >>
    >> - _update_needed_children() and _get_needed_keys() sounds like
    >> good candidates for Graph() or some specialization of it.

    jam> True, though they also have side effects like removing
    jam> texts when there are no more needed children, etc.

I see.

<snip/>

    >> - add_special_text(), hmm, what's that ? The doc string doesn't
    >> help a lot :-) Does that need to be public ?

    jam> It does, as it is used by WorkingTree to add the 'current:' text to be
    jam> annotated.

Haaa! Worth mentioning then :)

    jam> (One other benefit of this new code is that 'bzr
    jam> annotate NEWS' after a merge doesn't annotate both
    jam> parents independently... \o/)

Hurrah!

       Vincent

John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
>>>>>> "jam" == John A Meinel <email address hidden> writes:
>
> >> There is little to comment on given the detailed cover letter, I
> >> like the cleanup (to come ? :) in annotate and the introduction
> >> of the Annotator class, but since you intend to build policy
> >> classes as front-end, why not make the class private until you
> >> feel more confident about the overall API ? Like you, I'm not
> >> sure the GUIs will really need to access that class...
>
> jam> I'm not sure if you understood me correctly. Annotator
> jam> is *meant* to be the final api for getting an annotation
> jam> for various versions of a file.
>
> Oh ! Indeed, I understood you wanted trees to be the primary interface.

So I'm trying to resolve the two issues. But a *tree* doesn't talk about
the history of a file. I might add:

 tree.get_annotator(file_id)

That would say something like: "give me an object that can talk about
the history of the file you have."

I need some way for the WT to inject the 'current' version, and to
have it default to annotating that tip. I haven't worked out the
details yet.

You *could* do this by having a bunch of revision trees for each
revision you want to annotate, and then having the annotator cached at
the VF layer. But it seems better to control the caching lifetime and
parameters in the GUI layer, rather than underneath trees inside VF.

...

>
> Then I'd go with:
>
> for head in heads: break # Get head from the set
>
> on a single line to make it more obvious.
>
> That's the first time I see that idiom, I will not be surprised
> next time (hopefully, but others can).
>
> <snip/>
>
> >> +=1 ? Hmm, I like that pyrex version you're using, send me some :)
>
> jam> Actually for pyrex 0.9.8 you can even do:
>
> Oh ! Yes, jaunty is still at 0.9.7.2 ...
>
> Reading the NEWS about it, I can only agree here.
>
> What is needed to have the package updated ? Host it in the bzr
> PPAs ?
>
> Vincent

I don't really know.

John
=:->

Preview Diff

1=== modified file '.bzrignore'
2--- .bzrignore 2009-06-22 12:52:39 +0000
3+++ .bzrignore 2009-07-08 23:35:26 +0000
4@@ -38,6 +38,7 @@
5 ./api
6 doc/**/*.html
7 doc/developers/performance.png
8+bzrlib/_annotator_pyx.c
9 bzrlib/_bencode_pyx.c
10 bzrlib/_btree_serializer_pyx.c
11 bzrlib/_chk_map_pyx.c
12
13=== modified file 'NEWS'
14--- NEWS 2009-07-08 18:05:38 +0000
15+++ NEWS 2009-07-08 23:35:27 +0000
16@@ -48,6 +48,9 @@
17 diverged-branches`` when a push fails because the branches have
18 diverged. (Neil Martinsen-Burrell, #269477)
19
20+* Annotate would sometimes 'latch on' to trivial lines, causing important
21+ lines to be incorrectly annotated. (John Arbash Meinel, #387952)
22+
23 * Automatic format upgrades triggered by default stacking policies on a
24 1.16rc1 (or later) smart server work again.
25 (Andrew Bennetts, #388675)
26@@ -164,7 +167,12 @@
27 Improvements
28 ************
29
30-``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
31+* ``bzr annotate`` can now be significantly faster. The time for
32+ ``bzr annotate NEWS`` is down to 7s from 22s in 1.16. Files with long
33+ histories and lots of 'duplicate insertions' will be improved more than
34+ others. (John Arbash Meinel, Vincent Ladeuil)
35+
36+* ``bzr ls`` is now faster. On OpenOffice.org, the time drops from 2.4
37 to 1.1 seconds. The improvement for ``bzr ls -r-1`` is more
38 substantial dropping from 54.3 to 1.1 seconds. (Ian Clatworthy)
39
40
41=== added file 'bzrlib/_annotator_py.py'
42--- bzrlib/_annotator_py.py 1970-01-01 00:00:00 +0000
43+++ bzrlib/_annotator_py.py 2009-07-08 23:35:27 +0000
44@@ -0,0 +1,309 @@
45+# Copyright (C) 2009 Canonical Ltd
46+#
47+# This program is free software; you can redistribute it and/or modify
48+# it under the terms of the GNU General Public License as published by
49+# the Free Software Foundation; either version 2 of the License, or
50+# (at your option) any later version.
51+#
52+# This program is distributed in the hope that it will be useful,
53+# but WITHOUT ANY WARRANTY; without even the implied warranty of
54+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
55+# GNU General Public License for more details.
56+#
57+# You should have received a copy of the GNU General Public License
58+# along with this program; if not, write to the Free Software
59+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
60+
61+"""Functionality for doing annotations in the 'optimal' way"""
62+
63+from bzrlib.lazy_import import lazy_import
64+lazy_import(globals(), """
65+from bzrlib import annotate # Must be lazy to avoid circular importing
66+""")
67+from bzrlib import (
68+ errors,
69+ graph as _mod_graph,
70+ osutils,
71+ patiencediff,
72+ ui,
73+ )
74+
75+
76+class Annotator(object):
77+ """Class that drives performing annotations."""
78+
79+ def __init__(self, vf):
80+ """Create a new Annotator from a VersionedFile."""
81+ self._vf = vf
82+ self._parent_map = {}
83+ self._text_cache = {}
84+ # Map from key => number of nexts that will be built from this key
85+ self._num_needed_children = {}
86+ self._annotations_cache = {}
87+ self._heads_provider = None
88+ self._ann_tuple_cache = {}
89+
90+ def _update_needed_children(self, key, parent_keys):
91+ for parent_key in parent_keys:
92+ if parent_key in self._num_needed_children:
93+ self._num_needed_children[parent_key] += 1
94+ else:
95+ self._num_needed_children[parent_key] = 1
96+
97+ def _get_needed_keys(self, key):
98+ """Determine the texts we need to get from the backing vf.
99+
100+ :return: (vf_keys_needed, ann_keys_needed)
101+ vf_keys_needed These are keys that we need to get from the vf
102+ ann_keys_needed Texts which we have in self._text_cache but we
103+ don't have annotations for. We need to yield these
104+ in the proper order so that we can get proper
105+ annotations.
106+ """
107+ parent_map = self._parent_map
108+ # We need 1 extra copy of the node we will be looking at when we are
109+ # done
110+ self._num_needed_children[key] = 1
111+ vf_keys_needed = set()
112+ ann_keys_needed = set()
113+ needed_keys = set([key])
114+ while needed_keys:
115+ parent_lookup = []
116+ next_parent_map = {}
117+ for key in needed_keys:
118+ if key in self._parent_map:
119+ # We don't need to lookup this key in the vf
120+ if key not in self._text_cache:
121+ # Extract this text from the vf
122+ vf_keys_needed.add(key)
123+ elif key not in self._annotations_cache:
124+ # We do need to annotate
125+ ann_keys_needed.add(key)
126+ next_parent_map[key] = self._parent_map[key]
127+ else:
128+ parent_lookup.append(key)
129+ vf_keys_needed.add(key)
130+ needed_keys = set()
131+ next_parent_map.update(self._vf.get_parent_map(parent_lookup))
132+ for key, parent_keys in next_parent_map.iteritems():
133+ if parent_keys is None: # No graph versionedfile
134+ parent_keys = ()
135+ next_parent_map[key] = ()
136+ self._update_needed_children(key, parent_keys)
137+ needed_keys.update([key for key in parent_keys
138+ if key not in parent_map])
139+ parent_map.update(next_parent_map)
140+ # _heads_provider does some graph caching, so it is only valid while
141+ # self._parent_map hasn't changed
142+ self._heads_provider = None
143+ return vf_keys_needed, ann_keys_needed
144+
145+ def _get_needed_texts(self, key, pb=None):
146+ """Get the texts we need to properly annotate key.
147+
148+ :param key: A Key that is present in self._vf
149+ :return: Yield (this_key, text, num_lines)
150+ 'text' is an opaque object that just has to work with whatever
151+ matcher object we are using. Currently it is always 'lines' but
152+ future improvements may change this to a simple text string.
153+ """
154+ keys, ann_keys = self._get_needed_keys(key)
155+ if pb is not None:
156+ pb.update('getting stream', 0, len(keys))
157+ stream = self._vf.get_record_stream(keys, 'topological', True)
158+ for idx, record in enumerate(stream):
159+ if pb is not None:
160+ pb.update('extracting', 0, len(keys))
161+ if record.storage_kind == 'absent':
162+ raise errors.RevisionNotPresent(record.key, self._vf)
163+ this_key = record.key
164+ lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
165+ num_lines = len(lines)
166+ self._text_cache[this_key] = lines
167+ yield this_key, lines, num_lines
168+ for key in ann_keys:
169+ lines = self._text_cache[key]
170+ num_lines = len(lines)
171+ yield key, lines, num_lines
172+
173+ def _get_parent_annotations_and_matches(self, key, text, parent_key):
174+ """Get the list of annotations for the parent, and the matching lines.
175+
176+ :param text: The opaque value given by _get_needed_texts
177+ :param parent_key: The key for the parent text
178+ :return: (parent_annotations, matching_blocks)
179+ parent_annotations is a list as long as the number of lines in
180+ parent
181+ matching_blocks is a list of (parent_idx, text_idx, len) tuples
182+ indicating which lines match between the two texts
183+ """
184+ parent_lines = self._text_cache[parent_key]
185+ parent_annotations = self._annotations_cache[parent_key]
186+ # PatienceSequenceMatcher should probably be part of Policy
187+ matcher = patiencediff.PatienceSequenceMatcher(None,
188+ parent_lines, text)
189+ matching_blocks = matcher.get_matching_blocks()
190+ return parent_annotations, matching_blocks
191+
192+ def _update_from_first_parent(self, key, annotations, lines, parent_key):
193+ """Reannotate this text relative to its first parent."""
194+ (parent_annotations,
195+ matching_blocks) = self._get_parent_annotations_and_matches(
196+ key, lines, parent_key)
197+
198+ for parent_idx, lines_idx, match_len in matching_blocks:
199+ # For all matching regions we copy across the parent annotations
200+ annotations[lines_idx:lines_idx + match_len] = \
201+ parent_annotations[parent_idx:parent_idx + match_len]
202+
203+ def _update_from_other_parents(self, key, annotations, lines,
204+ this_annotation, parent_key):
205+ """Reannotate this text relative to a second (or more) parent."""
206+ (parent_annotations,
207+ matching_blocks) = self._get_parent_annotations_and_matches(
208+ key, lines, parent_key)
209+
210+ last_ann = None
211+ last_parent = None
212+ last_res = None
213+ # TODO: consider making all annotations unique and then using 'is'
214+ # everywhere. Current results claim that isn't any faster,
215+ # because of the time spent deduping
216+ # deduping also saves a bit of memory. For NEWS it saves ~1MB,
217+ # but that is out of 200-300MB for extracting everything, so a
218+ # fairly trivial amount
219+ for parent_idx, lines_idx, match_len in matching_blocks:
220+ # For lines which match this parent, we will now resolve whether
221+ # this parent wins over the current annotation
222+ ann_sub = annotations[lines_idx:lines_idx + match_len]
223+ par_sub = parent_annotations[parent_idx:parent_idx + match_len]
224+ if ann_sub == par_sub:
225+ continue
226+ for idx in xrange(match_len):
227+ ann = ann_sub[idx]
228+ par_ann = par_sub[idx]
229+ ann_idx = lines_idx + idx
230+ if ann == par_ann:
231+ # Nothing to change
232+ continue
233+ if ann == this_annotation:
234+ # Originally claimed 'this', but it was really in this
235+ # parent
236+ annotations[ann_idx] = par_ann
237+ continue
238+ # Resolve the fact that both sides have a different value for
239+ # last modified
240+ if ann == last_ann and par_ann == last_parent:
241+ annotations[ann_idx] = last_res
242+ else:
243+ new_ann = set(ann)
244+ new_ann.update(par_ann)
245+ new_ann = tuple(sorted(new_ann))
246+ annotations[ann_idx] = new_ann
247+ last_ann = ann
248+ last_parent = par_ann
249+ last_res = new_ann
250+
251+ def _record_annotation(self, key, parent_keys, annotations):
252+ self._annotations_cache[key] = annotations
253+ for parent_key in parent_keys:
254+ num = self._num_needed_children[parent_key]
255+ num -= 1
256+ if num == 0:
257+ del self._text_cache[parent_key]
258+ del self._annotations_cache[parent_key]
259+ # Do we want to clean up _num_needed_children at this point as
260+ # well?
261+ self._num_needed_children[parent_key] = num
262+
263+ def _annotate_one(self, key, text, num_lines):
264+ this_annotation = (key,)
265+ # Note: annotations will be mutated by calls to _update_from*
266+ annotations = [this_annotation] * num_lines
267+ parent_keys = self._parent_map[key]
268+ if parent_keys:
269+ self._update_from_first_parent(key, annotations, text,
270+ parent_keys[0])
271+ for parent in parent_keys[1:]:
272+ self._update_from_other_parents(key, annotations, text,
273+ this_annotation, parent)
274+ self._record_annotation(key, parent_keys, annotations)
275+
276+ def add_special_text(self, key, parent_keys, text):
277+ """Add a specific text to the graph.
278+
279+ This is used to add a text which is not otherwise present in the
280+ versioned file. (eg. a WorkingTree injecting 'current:' into the
281+ graph to annotate the edited content.)
282+
283+ :param key: The key to use to request this text be annotated
284+ :param parent_keys: The parents of this text
285+ :param text: A string containing the content of the text
286+ """
287+ self._parent_map[key] = parent_keys
288+ self._text_cache[key] = osutils.split_lines(text)
289+ self._heads_provider = None
290+
291+ def annotate(self, key):
292+ """Return annotated fulltext for the given key.
293+
294+ :param key: A tuple defining the text to annotate
295+ :return: ([annotations], [lines])
296+ annotations is a list of tuples of keys, one for each line in lines
297+ each key is a possible source for the given line.
298+ lines the text of "key" as a list of lines
299+ """
300+ pb = ui.ui_factory.nested_progress_bar()
301+ try:
302+ for text_key, text, num_lines in self._get_needed_texts(key, pb=pb):
303+ self._annotate_one(text_key, text, num_lines)
304+ finally:
305+ pb.finished()
306+ try:
307+ annotations = self._annotations_cache[key]
308+ except KeyError:
309+ raise errors.RevisionNotPresent(key, self._vf)
310+ return annotations, self._text_cache[key]
311+
312+ def _get_heads_provider(self):
313+ if self._heads_provider is None:
314+ self._heads_provider = _mod_graph.KnownGraph(self._parent_map)
315+ return self._heads_provider
316+
317+ def _resolve_annotation_tie(self, the_heads, line, tiebreaker):
318+ if tiebreaker is None:
319+ head = sorted(the_heads)[0]
320+ else:
321+ # Backwards compatibility, break up the heads into pairs and
322+ # resolve the result
323+ next_head = iter(the_heads)
324+ head = next_head.next()
325+ for possible_head in next_head:
326+ annotated_lines = ((head, line), (possible_head, line))
327+ head = tiebreaker(annotated_lines)[0]
328+ return head
329+
330+ def annotate_flat(self, key):
331+ """Determine the single-best-revision to source for each line.
332+
333+ This is meant as a compatibility thunk to how annotate() used to work.
334+ :return: [(ann_key, line)]
335+ A list of tuples with a single annotation key for each line.
336+ """
337+ custom_tiebreaker = annotate._break_annotation_tie
338+ annotations, lines = self.annotate(key)
339+ out = []
340+ heads = self._get_heads_provider().heads
341+ append = out.append
342+ for annotation, line in zip(annotations, lines):
343+ if len(annotation) == 1:
344+ head = annotation[0]
345+ else:
346+ the_heads = heads(annotation)
347+ if len(the_heads) == 1:
348+ for head in the_heads: break # get the item out of the set
349+ else:
350+ head = self._resolve_annotation_tie(the_heads, line,
351+ custom_tiebreaker)
352+ append((head, line))
353+ return out
354
355=== added file 'bzrlib/_annotator_pyx.pyx'
356--- bzrlib/_annotator_pyx.pyx 1970-01-01 00:00:00 +0000
357+++ bzrlib/_annotator_pyx.pyx 2009-07-08 23:35:27 +0000
358@@ -0,0 +1,287 @@
359+# Copyright (C) 2009 Canonical Ltd
360+#
361+# This program is free software; you can redistribute it and/or modify
362+# it under the terms of the GNU General Public License as published by
363+# the Free Software Foundation; either version 2 of the License, or
364+# (at your option) any later version.
365+#
366+# This program is distributed in the hope that it will be useful,
367+# but WITHOUT ANY WARRANTY; without even the implied warranty of
368+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
369+# GNU General Public License for more details.
370+#
371+# You should have received a copy of the GNU General Public License
372+# along with this program; if not, write to the Free Software
373+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
374+
375+"""Functionality for doing annotations in the 'optimal' way"""
376+
377+cdef extern from "python-compat.h":
378+ pass
379+
380+cdef extern from "Python.h":
381+ ctypedef int Py_ssize_t
382+ ctypedef struct PyObject:
383+ pass
384+ ctypedef struct PyListObject:
385+ PyObject **ob_item
386+ int PyList_CheckExact(object)
387+ PyObject *PyList_GET_ITEM(object, Py_ssize_t o)
388+ Py_ssize_t PyList_GET_SIZE(object)
389+ int PyList_Append(object, object) except -1
390+ int PyList_SetItem(object, Py_ssize_t o, object) except -1
391+ int PyList_Sort(object) except -1
392+
393+ int PyTuple_CheckExact(object)
394+ object PyTuple_New(Py_ssize_t len)
395+ void PyTuple_SET_ITEM(object, Py_ssize_t pos, object)
396+ void PyTuple_SET_ITEM_ptr "PyTuple_SET_ITEM" (object, Py_ssize_t,
397+ PyObject *)
398+ int PyTuple_Resize(PyObject **, Py_ssize_t newlen)
399+ PyObject *PyTuple_GET_ITEM(object, Py_ssize_t o)
400+ Py_ssize_t PyTuple_GET_SIZE(object)
401+
402+ PyObject *PyDict_GetItem(object d, object k)
403+ int PyDict_SetItem(object d, object k, object v) except -1
404+
405+ void Py_INCREF(object)
406+ void Py_INCREF_ptr "Py_INCREF" (PyObject *)
407+ void Py_DECREF_ptr "Py_DECREF" (PyObject *)
408+
409+ int Py_EQ
410+ int Py_LT
411+ int PyObject_RichCompareBool(object, object, int opid) except -1
412+ int PyObject_RichCompareBool_ptr "PyObject_RichCompareBool" (
413+ PyObject *, PyObject *, int opid)
414+
415+
416+from bzrlib import _annotator_py
417+
418+
419+cdef int _check_annotations_are_lists(annotations,
420+ parent_annotations) except -1:
421+ if not PyList_CheckExact(annotations):
422+ raise TypeError('annotations must be a list')
423+ if not PyList_CheckExact(parent_annotations):
424+ raise TypeError('parent_annotations must be a list')
425+ return 0
426+
427+
428+cdef int _check_match_ranges(parent_annotations, annotations,
429+ Py_ssize_t parent_idx, Py_ssize_t lines_idx,
430+ Py_ssize_t match_len) except -1:
431+ if parent_idx + match_len > PyList_GET_SIZE(parent_annotations):
432+ raise ValueError('Match length exceeds len of'
433+ ' parent_annotations %s > %s'
434+ % (parent_idx + match_len,
435+ PyList_GET_SIZE(parent_annotations)))
436+ if lines_idx + match_len > PyList_GET_SIZE(annotations):
437+ raise ValueError('Match length exceeds len of'
438+ ' annotations %s > %s'
439+ % (lines_idx + match_len,
440+ PyList_GET_SIZE(annotations)))
441+ return 0
442+
443+
444+cdef PyObject *_next_tuple_entry(object tpl, Py_ssize_t *pos):
445+ pos[0] = pos[0] + 1
446+ if pos[0] >= PyTuple_GET_SIZE(tpl):
447+ return NULL
448+ return PyTuple_GET_ITEM(tpl, pos[0])
449+
450+
451+cdef object _combine_annotations(ann_one, ann_two, cache):
452+ """Combine the annotations from both sides."""
453+ cdef Py_ssize_t pos_one, pos_two, len_one, len_two
454+ cdef Py_ssize_t out_pos
455+ cdef PyObject *temp, *left, *right
456+
457+ if (PyObject_RichCompareBool(ann_one, ann_two, Py_LT)):
458+ cache_key = (ann_one, ann_two)
459+ else:
460+ cache_key = (ann_two, ann_one)
461+ temp = PyDict_GetItem(cache, cache_key)
462+ if temp != NULL:
463+ return <object>temp
464+
465+ if not PyTuple_CheckExact(ann_one) or not PyTuple_CheckExact(ann_two):
466+ raise TypeError('annotations must be tuples')
467+ # We know that annotations are tuples, and that both sides are already
468+ # sorted, so we can just walk and update a new list.
469+ pos_one = -1
470+ pos_two = -1
471+ out_pos = 0
472+ left = _next_tuple_entry(ann_one, &pos_one)
473+ right = _next_tuple_entry(ann_two, &pos_two)
474+ new_ann = PyTuple_New(PyTuple_GET_SIZE(ann_one)
475+ + PyTuple_GET_SIZE(ann_two))
476+ while left != NULL and right != NULL:
477+ # left == right is also handled by PyObject_RichCompareBool_ptr, but
478+ # checking pointer equality first avoids a function call in a very
479+ # common case: it drops 'time bzr annotate NEWS' from 7.25s to 7.16s.
480+ if (left == right
481+ or PyObject_RichCompareBool_ptr(left, right, Py_EQ)):
482+ # Identical values, step both
483+ Py_INCREF_ptr(left)
484+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
485+ left = _next_tuple_entry(ann_one, &pos_one)
486+ right = _next_tuple_entry(ann_two, &pos_two)
487+ elif (PyObject_RichCompareBool_ptr(left, right, Py_LT)):
488+ # left < right or right == NULL
489+ Py_INCREF_ptr(left)
490+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
491+ left = _next_tuple_entry(ann_one, &pos_one)
492+ else: # right < left or left == NULL
493+ Py_INCREF_ptr(right)
494+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, right)
495+ right = _next_tuple_entry(ann_two, &pos_two)
496+ out_pos = out_pos + 1
497+ while left != NULL:
498+ Py_INCREF_ptr(left)
499+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, left)
500+ left = _next_tuple_entry(ann_one, &pos_one)
501+ out_pos = out_pos + 1
502+ while right != NULL:
503+ Py_INCREF_ptr(right)
504+ PyTuple_SET_ITEM_ptr(new_ann, out_pos, right)
505+ right = _next_tuple_entry(ann_two, &pos_two)
506+ out_pos = out_pos + 1
507+ if out_pos != PyTuple_GET_SIZE(new_ann):
508+ # Timing _PyTuple_Resize was not significantly faster than slicing
509+ # PyTuple_Resize((<PyObject **>new_ann), out_pos)
510+ new_ann = new_ann[0:out_pos]
511+ PyDict_SetItem(cache, cache_key, new_ann)
512+ return new_ann
513+
514+
515+cdef int _apply_parent_annotations(annotations, parent_annotations,
516+ matching_blocks) except -1:
517+ """Apply the annotations from parent_annotations into annotations.
518+
519+ matching_blocks defines the ranges that match.
520+ """
521+ cdef Py_ssize_t parent_idx, lines_idx, match_len, idx
522+ cdef PyListObject *par_list, *ann_list
523+ cdef PyObject **par_temp, **ann_temp
524+
525+ _check_annotations_are_lists(annotations, parent_annotations)
526+ par_list = <PyListObject *>parent_annotations
527+ ann_list = <PyListObject *>annotations
528+ # For NEWS and bzrlib/builtins.py, over 99% of the lines are simply copied
529+ # across from the parent entry. So this routine is heavily optimized for
530+ # that. It would be interesting to use memcpy(), but we have to incref
531+ # and decref each pointer we copy.
532+ for parent_idx, lines_idx, match_len in matching_blocks:
533+ _check_match_ranges(parent_annotations, annotations,
534+ parent_idx, lines_idx, match_len)
535+ par_temp = par_list.ob_item + parent_idx
536+ ann_temp = ann_list.ob_item + lines_idx
537+ for idx from 0 <= idx < match_len:
538+ Py_INCREF_ptr(par_temp[idx])
539+ Py_DECREF_ptr(ann_temp[idx])
540+ ann_temp[idx] = par_temp[idx]
541+ return 0
542+
543+
544+cdef int _merge_annotations(this_annotation, annotations, parent_annotations,
545+ matching_blocks, ann_cache) except -1:
546+ cdef Py_ssize_t parent_idx, ann_idx, lines_idx, match_len, idx
547+ cdef Py_ssize_t pos
548+ cdef PyObject *ann_temp, *par_temp
549+
550+ _check_annotations_are_lists(annotations, parent_annotations)
551+ last_ann = None
552+ last_parent = None
553+ last_res = None
554+ for parent_idx, lines_idx, match_len in matching_blocks:
555+ _check_match_ranges(parent_annotations, annotations,
556+ parent_idx, lines_idx, match_len)
557+ # For lines which match this parent, we will now resolve whether
558+ # this parent wins over the current annotation
559+ for idx from 0 <= idx < match_len:
560+ ann_idx = lines_idx + idx
561+ ann_temp = PyList_GET_ITEM(annotations, ann_idx)
562+ par_temp = PyList_GET_ITEM(parent_annotations, parent_idx + idx)
563+ if (ann_temp == par_temp):
564+ # This is parent, do nothing
565+ # Pointer comparison is fine here. Value comparison would
566+ # be ok, but it will be handled in the final if clause by
567+ # merging the two tuples into the same tuple
568+ # Avoiding the Py_INCREF and function call to
569+ # PyObject_RichCompareBool using pointer comparison drops
570+ # timing from 215ms => 125ms
571+ continue
572+ par_ann = <object>par_temp
573+ ann = <object>ann_temp
574+ if (ann is this_annotation):
575+ # Originally claimed 'this', but it was really in this
576+ # parent
577+ Py_INCREF(par_ann)
578+ PyList_SetItem(annotations, ann_idx, par_ann)
579+ continue
580+ # Resolve the fact that both sides have a different value for
581+ # last modified
582+ if (ann is last_ann and par_ann is last_parent):
583+ Py_INCREF(last_res)
584+ PyList_SetItem(annotations, ann_idx, last_res)
585+ else:
586+ new_ann = _combine_annotations(ann, par_ann, ann_cache)
587+ Py_INCREF(new_ann)
588+ PyList_SetItem(annotations, ann_idx, new_ann)
589+ last_ann = ann
590+ last_parent = par_ann
591+ last_res = new_ann
592+ return 0
593+
594+
595+class Annotator(_annotator_py.Annotator):
596+ """Class that drives performing annotations."""
597+
598+ def _update_from_first_parent(self, key, annotations, lines, parent_key):
599+ """Reannotate this text relative to its first parent."""
600+ (parent_annotations,
601+ matching_blocks) = self._get_parent_annotations_and_matches(
602+ key, lines, parent_key)
603+
604+ _apply_parent_annotations(annotations, parent_annotations,
605+ matching_blocks)
606+
607+ def _update_from_other_parents(self, key, annotations, lines,
608+ this_annotation, parent_key):
609+ """Reannotate this text relative to a second (or more) parent."""
610+ (parent_annotations,
611+ matching_blocks) = self._get_parent_annotations_and_matches(
612+ key, lines, parent_key)
613+ _merge_annotations(this_annotation, annotations, parent_annotations,
614+ matching_blocks, self._ann_tuple_cache)
615+
616+ def annotate_flat(self, key):
617+ """Determine the single best source revision for each line.
618+
619+ This is meant as a compatibility thunk to how annotate() used to work.
620+ """
621+ cdef Py_ssize_t pos, num_lines
622+
623+ from bzrlib import annotate
624+
625+ custom_tiebreaker = annotate._break_annotation_tie
626+ annotations, lines = self.annotate(key)
627+ num_lines = len(lines)
628+ out = []
629+ heads = self._get_heads_provider().heads
630+ for pos from 0 <= pos < num_lines:
631+ annotation = annotations[pos]
632+ line = lines[pos]
633+ if len(annotation) == 1:
634+ head = annotation[0]
635+ else:
636+ the_heads = heads(annotation)
637+ if len(the_heads) == 1:
638+ for head in the_heads: break # get the item out of the set
639+ else:
640+ # We need to resolve the ambiguity, for now just pick the
641+ # sorted smallest
642+ head = self._resolve_annotation_tie(the_heads, line,
643+ custom_tiebreaker)
644+ PyList_Append(out, (head, line))
645+ return out
646
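The tuple-merge at the heart of `_combine_annotations` above is a cached merge of two sorted, de-duplicated sequences. A pure-Python sketch of the same logic (illustrative only; the real code walks C-level tuple pointers):

```python
def combine_annotations(ann_one, ann_two, cache):
    """Merge two sorted annotation tuples into one sorted, de-duplicated
    tuple, memoizing the result in `cache` keyed by the ordered pair."""
    if ann_one < ann_two:
        cache_key = (ann_one, ann_two)
    else:
        cache_key = (ann_two, ann_one)
    if cache_key in cache:
        return cache[cache_key]
    out = []
    i = j = 0
    while i < len(ann_one) and j < len(ann_two):
        if ann_one[i] == ann_two[j]:
            out.append(ann_one[i])  # identical values, step both sides
            i += 1
            j += 1
        elif ann_one[i] < ann_two[j]:
            out.append(ann_one[i])
            i += 1
        else:
            out.append(ann_two[j])
            j += 1
    out.extend(ann_one[i:])  # drain whichever side still has entries
    out.extend(ann_two[j:])
    result = tuple(out)
    cache[cache_key] = result
    return result
```

Because the key orders the pair, `(a, b)` and `(b, a)` share one cache entry, matching the Pyrex version's `ann_cache`.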
647=== modified file 'bzrlib/_known_graph_py.py'
648--- bzrlib/_known_graph_py.py 2009-06-19 17:40:59 +0000
649+++ bzrlib/_known_graph_py.py 2009-07-08 23:35:27 +0000
650@@ -63,18 +63,12 @@
651 - ghosts will have a parent_keys = None,
652 - all nodes found will also have .child_keys populated with all known
653 child_keys,
654- - self._tails will list all the nodes without parents.
655 """
656- tails = self._tails = set()
657 nodes = self._nodes
658 for key, parent_keys in parent_map.iteritems():
659 if key in nodes:
660 node = nodes[key]
661 node.parent_keys = parent_keys
662- if parent_keys:
663- # This node has been added before being seen in parent_map
664- # (see below)
665- tails.remove(node)
666 else:
667 node = _KnownGraphNode(key, parent_keys)
668 nodes[key] = node
669@@ -84,17 +78,18 @@
670 except KeyError:
671 parent_node = _KnownGraphNode(parent_key, None)
672 nodes[parent_key] = parent_node
673- # Potentially a tail, if we're wrong we'll remove it later
674- # (see above)
675- tails.add(parent_node)
676 parent_node.child_keys.append(key)
677
678+ def _find_tails(self):
679+ return [node for node in self._nodes.itervalues()
680+ if not node.parent_keys]
681+
682 def _find_gdfo(self):
683 nodes = self._nodes
684 known_parent_gdfos = {}
685 pending = []
686
687- for node in self._tails:
688+ for node in self._find_tails():
689 node.gdfo = 1
690 pending.append(node)
691
692@@ -144,9 +139,6 @@
693 # No or only one candidate
694 return frozenset(candidate_nodes)
695 heads_key = frozenset(candidate_nodes)
696- if heads_key != frozenset(keys):
697- # Mention duplicates
698- note('%s != %s', heads_key, frozenset(keys))
699 # Do we have a cached result ?
700 try:
701 heads = self._known_heads[heads_key]
702
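The `_find_tails`/`_find_gdfo` pair above computes tails on demand and then walks children, finalizing a node only once every one of its parents has a gdfo. A self-contained sketch of that algorithm (the `Node` and `build_nodes` helpers are hypothetical scaffolding, not bzrlib's, and assume every parent key appears in the map):

```python
class Node(object):
    """Minimal graph node for the sketch: parent/child links plus gdfo."""
    def __init__(self, key, parent_keys):
        self.key = key
        self.parent_keys = parent_keys
        self.child_keys = []
        self.gdfo = None

def build_nodes(parent_map):
    """Build key->Node from {key: (parent_key, ...)}."""
    nodes = {}
    for key, parents in parent_map.items():
        nodes[key] = Node(key, parents)
    for key, parents in parent_map.items():
        for parent_key in parents:
            nodes[parent_key].child_keys.append(key)
    return nodes

def find_gdfo(nodes):
    """Assign each node its gdfo: tails get 1, and a child is queued only
    after all of its parents have been finalized."""
    known_parent_gdfos = {}
    # Tails are found lazily, mirroring _find_tails()
    pending = [node for node in nodes.values() if not node.parent_keys]
    for node in pending:
        node.gdfo = 1
    while pending:
        node = pending.pop()
        for child_key in node.child_keys:
            child = nodes[child_key]
            seen = known_parent_gdfos.get(child_key, 0) + 1
            known_parent_gdfos[child_key] = seen
            if child.gdfo is None or node.gdfo + 1 > child.gdfo:
                child.gdfo = node.gdfo + 1
            if seen == len(child.parent_keys):
                pending.append(child)  # all parents resolved
```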
703=== modified file 'bzrlib/annotate.py'
704--- bzrlib/annotate.py 2009-04-08 13:13:30 +0000
705+++ bzrlib/annotate.py 2009-07-08 23:35:27 +0000
706@@ -1,4 +1,4 @@
707-# Copyright (C) 2004, 2005, 2006, 2007 Canonical Ltd
708+# Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Canonical Ltd
709 #
710 # This program is free software; you can redistribute it and/or modify
711 # it under the terms of the GNU General Public License as published by
712@@ -313,7 +313,9 @@
713 return matcher.get_matching_blocks()
714
715
716-def _break_annotation_tie(annotated_lines):
717+_break_annotation_tie = None
718+
719+def _old_break_annotation_tie(annotated_lines):
720 """Chose an attribution between several possible ones.
721
722 :param annotated_lines: A list of tuples ((file_id, rev_id), line) where
723@@ -394,7 +396,11 @@
724 # If the result is not stable, there is a risk a
725 # performance degradation as criss-cross merges will
726 # flip-flop the attribution.
727- output_append(_break_annotation_tie([left, right]))
728+ if _break_annotation_tie is None:
729+ output_append(
730+ _old_break_annotation_tie([left, right]))
731+ else:
732+ output_append(_break_annotation_tie([left, right]))
733 last_child_idx = child_idx + match_len
734
735
736@@ -444,3 +450,9 @@
737 # If left and right agree on a range, just push that into the output
738 lines_extend(annotated_lines[left_idx:left_idx + match_len])
739 return lines
740+
741+
742+try:
743+ from bzrlib._annotator_pyx import Annotator
744+except ImportError:
745+ from bzrlib._annotator_py import Annotator
746
747=== modified file 'bzrlib/branchbuilder.py'
748--- bzrlib/branchbuilder.py 2009-05-07 05:08:46 +0000
749+++ bzrlib/branchbuilder.py 2009-07-08 23:35:27 +0000
750@@ -161,7 +161,8 @@
751 self._tree = None
752
753 def build_snapshot(self, revision_id, parent_ids, actions,
754- message=None, timestamp=None, allow_leftmost_as_ghost=False):
755+ message=None, timestamp=None, allow_leftmost_as_ghost=False,
756+ committer=None):
757 """Build a commit, shaped in a specific way.
758
759 :param revision_id: The handle for the new commit, can be None
760@@ -176,6 +177,7 @@
761 commit message will be written.
762 :param timestamp: If non-None, set the timestamp of the commit to this
763 value.
764+ :param committer: An optional username to use for commit
765 :param allow_leftmost_as_ghost: True if the leftmost parent should be
766 permitted to be a ghost.
767 :return: The revision_id of the new commit
768@@ -241,7 +243,7 @@
769 for file_id, content in new_contents.iteritems():
770 tree.put_file_bytes_non_atomic(file_id, content)
771 return self._do_commit(tree, message=message, rev_id=revision_id,
772- timestamp=timestamp)
773+ timestamp=timestamp, committer=committer)
774 finally:
775 tree.unlock()
776
777
778=== modified file 'bzrlib/groupcompress.py'
779--- bzrlib/groupcompress.py 2009-07-01 10:47:37 +0000
780+++ bzrlib/groupcompress.py 2009-07-08 23:35:27 +0000
781@@ -1069,29 +1069,11 @@
782
783 def annotate(self, key):
784 """See VersionedFiles.annotate."""
785- graph = Graph(self)
786- parent_map = self.get_parent_map([key])
787- if not parent_map:
788- raise errors.RevisionNotPresent(key, self)
789- if parent_map[key] is not None:
790- parent_map = dict((k, v) for k, v in graph.iter_ancestry([key])
791- if v is not None)
792- keys = parent_map.keys()
793- else:
794- keys = [key]
795- parent_map = {key:()}
796- # We used Graph(self) to load the parent_map, but now that we have it,
797- # we can just query the parent map directly, so create a KnownGraph
798- heads_provider = _mod_graph.KnownGraph(parent_map)
799- parent_cache = {}
800- reannotate = annotate.reannotate
801- for record in self.get_record_stream(keys, 'topological', True):
802- key = record.key
803- lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
804- parent_lines = [parent_cache[parent] for parent in parent_map[key]]
805- parent_cache[key] = list(
806- reannotate(parent_lines, lines, key, None, heads_provider))
807- return parent_cache[key]
808+ ann = annotate.Annotator(self)
809+ return ann.annotate_flat(key)
810+
811+ def get_annotator(self):
812+ return annotate.Annotator(self)
813
814 def check(self, progress_bar=None):
815 """See VersionedFiles.check()."""
816
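`annotate_flat` reduces the new `([ann1, ann2], [line1, line2])`-style result back to the old one-revision-per-line form by consulting a heads provider. A simplified Python sketch (the `heads` and `tiebreak` callables stand in for `_get_heads_provider().heads` and `annotate._break_annotation_tie`; `min` is only a placeholder tie-breaker):

```python
def annotate_flat(annotations, lines, heads, tiebreak=min):
    """Reduce per-line annotation tuples to a single best revision each.

    annotations is a list of tuples of revision ids (one tuple per line),
    lines the matching texts, and heads a callable mapping a set of
    revisions to its head revisions.
    """
    out = []
    for annotation, line in zip(annotations, lines):
        if len(annotation) == 1:
            head = annotation[0]
        else:
            the_heads = heads(annotation)
            if len(the_heads) == 1:
                (head,) = the_heads  # pull the single item out of the set
            else:
                head = tiebreak(the_heads)  # ambiguous: pick deterministically
        out.append((head, line))
    return out
```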
817=== modified file 'bzrlib/knit.py'
818--- bzrlib/knit.py 2009-06-23 15:27:50 +0000
819+++ bzrlib/knit.py 2009-07-08 23:35:27 +0000
820@@ -664,8 +664,6 @@
821
822 see parse_fulltext which this inverts.
823 """
824- # TODO: jam 20070209 We only do the caching thing to make sure that
825- # the origin is a valid utf-8 line, eventually we could remove it
826 return ['%s %s' % (o, t) for o, t in content._lines]
827
828 def lower_line_delta(self, delta):
829@@ -758,7 +756,7 @@
830
831 def annotate(self, knit, key):
832 annotator = _KnitAnnotator(knit)
833- return annotator.annotate(key)
834+ return annotator.annotate_flat(key)
835
836
837
838@@ -1044,6 +1042,9 @@
839 """See VersionedFiles.annotate."""
840 return self._factory.annotate(self, key)
841
842+ def get_annotator(self):
843+ return _KnitAnnotator(self)
844+
845 def check(self, progress_bar=None):
846 """See VersionedFiles.check()."""
847 # This doesn't actually test extraction of everything, but that will
848@@ -3336,103 +3337,33 @@
849 recommended.
850 """
851 annotator = _KnitAnnotator(knit)
852- return iter(annotator.annotate(revision_id))
853-
854-
855-class _KnitAnnotator(object):
856+ return iter(annotator.annotate_flat(revision_id))
857+
858+
859+class _KnitAnnotator(annotate.Annotator):
860 """Build up the annotations for a text."""
861
862- def __init__(self, knit):
863- self._knit = knit
864-
865- # Content objects, differs from fulltexts because of how final newlines
866- # are treated by knits. the content objects here will always have a
867- # final newline
868- self._fulltext_contents = {}
869-
870- # Annotated lines of specific revisions
871- self._annotated_lines = {}
872-
873- # Track the raw data for nodes that we could not process yet.
874- # This maps the revision_id of the base to a list of children that will
875- # annotated from it.
876- self._pending_children = {}
877-
878- # Nodes which cannot be extracted
879- self._ghosts = set()
880-
881- # Track how many children this node has, so we know if we need to keep
882- # it
883- self._annotate_children = {}
884- self._compression_children = {}
885+ def __init__(self, vf):
886+ annotate.Annotator.__init__(self, vf)
887+
888+ # TODO: handle Nodes which cannot be extracted
889+ # self._ghosts = set()
890+
891+ # Map from (key, parent_key) => matching_blocks, should be 'use once'
892+ self._matching_blocks = {}
893+
894+ # KnitContent objects
895+ self._content_objects = {}
896+ # The number of children that depend on this fulltext content object
897+ self._num_compression_children = {}
898+ # Delta records that need their compression parent before they can be
899+ # expanded
900+ self._pending_deltas = {}
901+ # Fulltext records that are waiting for their parents' fulltexts before
902+ # they can be yielded for annotation
903+ self._pending_annotation = {}
904
905 self._all_build_details = {}
906- # The children => parent revision_id graph
907- self._revision_id_graph = {}
908-
909- self._heads_provider = None
910-
911- self._nodes_to_keep_annotations = set()
912- self._generations_until_keep = 100
913-
914- def set_generations_until_keep(self, value):
915- """Set the number of generations before caching a node.
916-
917- Setting this to -1 will cache every merge node, setting this higher
918- will cache fewer nodes.
919- """
920- self._generations_until_keep = value
921-
922- def _add_fulltext_content(self, revision_id, content_obj):
923- self._fulltext_contents[revision_id] = content_obj
924- # TODO: jam 20080305 It might be good to check the sha1digest here
925- return content_obj.text()
926-
927- def _check_parents(self, child, nodes_to_annotate):
928- """Check if all parents have been processed.
929-
930- :param child: A tuple of (rev_id, parents, raw_content)
931- :param nodes_to_annotate: If child is ready, add it to
932- nodes_to_annotate, otherwise put it back in self._pending_children
933- """
934- for parent_id in child[1]:
935- if (parent_id not in self._annotated_lines):
936- # This parent is present, but another parent is missing
937- self._pending_children.setdefault(parent_id,
938- []).append(child)
939- break
940- else:
941- # This one is ready to be processed
942- nodes_to_annotate.append(child)
943-
944- def _add_annotation(self, revision_id, fulltext, parent_ids,
945- left_matching_blocks=None):
946- """Add an annotation entry.
947-
948- All parents should already have been annotated.
949- :return: A list of children that now have their parents satisfied.
950- """
951- a = self._annotated_lines
952- annotated_parent_lines = [a[p] for p in parent_ids]
953- annotated_lines = list(annotate.reannotate(annotated_parent_lines,
954- fulltext, revision_id, left_matching_blocks,
955- heads_provider=self._get_heads_provider()))
956- self._annotated_lines[revision_id] = annotated_lines
957- for p in parent_ids:
958- ann_children = self._annotate_children[p]
959- ann_children.remove(revision_id)
960- if (not ann_children
961- and p not in self._nodes_to_keep_annotations):
962- del self._annotated_lines[p]
963- del self._all_build_details[p]
964- if p in self._fulltext_contents:
965- del self._fulltext_contents[p]
966- # Now that we've added this one, see if there are any pending
967- # deltas to be done, certainly this parent is finished
968- nodes_to_annotate = []
969- for child in self._pending_children.pop(revision_id, []):
970- self._check_parents(child, nodes_to_annotate)
971- return nodes_to_annotate
972
973 def _get_build_graph(self, key):
974 """Get the graphs for building texts and annotations.
975@@ -3446,202 +3377,243 @@
976 passing to read_records_iter to start reading in the raw data from
977 the pack file.
978 """
979- if key in self._annotated_lines:
980- # Nothing to do
981- return []
982 pending = set([key])
983 records = []
984- generation = 0
985- kept_generation = 0
986+ ann_keys = set()
987+ self._num_needed_children[key] = 1
988 while pending:
989 # get all pending nodes
990- generation += 1
991 this_iteration = pending
992- build_details = self._knit._index.get_build_details(this_iteration)
993+ build_details = self._vf._index.get_build_details(this_iteration)
994 self._all_build_details.update(build_details)
995- # new_nodes = self._knit._index._get_entries(this_iteration)
996+ # new_nodes = self._vf._index._get_entries(this_iteration)
997 pending = set()
998 for key, details in build_details.iteritems():
999- (index_memo, compression_parent, parents,
1000+ (index_memo, compression_parent, parent_keys,
1001 record_details) = details
1002- self._revision_id_graph[key] = parents
1003+ self._parent_map[key] = parent_keys
1004+ self._heads_provider = None
1005 records.append((key, index_memo))
1006 # Do we actually need to check _annotated_lines?
1007- pending.update(p for p in parents
1008- if p not in self._all_build_details)
1009+ pending.update([p for p in parent_keys
1010+ if p not in self._all_build_details])
1011+ if parent_keys:
1012+ for parent_key in parent_keys:
1013+ if parent_key in self._num_needed_children:
1014+ self._num_needed_children[parent_key] += 1
1015+ else:
1016+ self._num_needed_children[parent_key] = 1
1017 if compression_parent:
1018- self._compression_children.setdefault(compression_parent,
1019- []).append(key)
1020- if parents:
1021- for parent in parents:
1022- self._annotate_children.setdefault(parent,
1023- []).append(key)
1024- num_gens = generation - kept_generation
1025- if ((num_gens >= self._generations_until_keep)
1026- and len(parents) > 1):
1027- kept_generation = generation
1028- self._nodes_to_keep_annotations.add(key)
1029+ if compression_parent in self._num_compression_children:
1030+ self._num_compression_children[compression_parent] += 1
1031+ else:
1032+ self._num_compression_children[compression_parent] = 1
1033
1034 missing_versions = this_iteration.difference(build_details.keys())
1035- self._ghosts.update(missing_versions)
1036- for missing_version in missing_versions:
1037- # add a key, no parents
1038- self._revision_id_graph[missing_version] = ()
1039- pending.discard(missing_version) # don't look for it
1040- if self._ghosts.intersection(self._compression_children):
1041- raise KnitCorrupt(
1042- "We cannot have nodes which have a ghost compression parent:\n"
1043- "ghosts: %r\n"
1044- "compression children: %r"
1045- % (self._ghosts, self._compression_children))
1046- # Cleanout anything that depends on a ghost so that we don't wait for
1047- # the ghost to show up
1048- for node in self._ghosts:
1049- if node in self._annotate_children:
1050- # We won't be building this node
1051- del self._annotate_children[node]
1052+ if missing_versions:
1053+ for key in missing_versions:
1054+ if key in self._parent_map and key in self._text_cache:
1055+ # We already have this text ready, we just need to
1056+ # yield it later so we get it annotated
1057+ ann_keys.add(key)
1058+ parent_keys = self._parent_map[key]
1059+ for parent_key in parent_keys:
1060+ if parent_key in self._num_needed_children:
1061+ self._num_needed_children[parent_key] += 1
1062+ else:
1063+ self._num_needed_children[parent_key] = 1
1064+ pending.update([p for p in parent_keys
1065+ if p not in self._all_build_details])
1066+ else:
1067+ raise errors.RevisionNotPresent(key, self._vf)
1068 # Generally we will want to read the records in reverse order, because
1069 # we find the parent nodes after the children
1070 records.reverse()
1071- return records
1072-
1073- def _annotate_records(self, records):
1074- """Build the annotations for the listed records."""
1075+ return records, ann_keys
1076+
1077+ def _get_needed_texts(self, key, pb=None):
1078+ # if True or len(self._vf._fallback_vfs) > 0:
1079+ if len(self._vf._fallback_vfs) > 0:
1080+ # If we have fallbacks, go to the generic path
1081+ for v in annotate.Annotator._get_needed_texts(self, key, pb=pb):
1082+ yield v
1083+ return
1084+ while True:
1085+ try:
1086+ records, ann_keys = self._get_build_graph(key)
1087+ for idx, (sub_key, text, num_lines) in enumerate(
1088+ self._extract_texts(records)):
1089+ if pb is not None:
1090+ pb.update('annotating', idx, len(records))
1091+ yield sub_key, text, num_lines
1092+ for sub_key in ann_keys:
1093+ text = self._text_cache[sub_key]
1094+ num_lines = len(text) # bad assumption
1095+ yield sub_key, text, num_lines
1096+ return
1097+ except errors.RetryWithNewPacks, e:
1098+ self._vf._access.reload_or_raise(e)
1099+ # The cached build_details are no longer valid
1100+ self._all_build_details.clear()
1101+
1102+ def _cache_delta_blocks(self, key, compression_parent, delta, lines):
1103+ parent_lines = self._text_cache[compression_parent]
1104+ blocks = list(KnitContent.get_line_delta_blocks(delta, parent_lines, lines))
1105+ self._matching_blocks[(key, compression_parent)] = blocks
1106+
1107+ def _expand_record(self, key, parent_keys, compression_parent, record,
1108+ record_details):
1109+ delta = None
1110+ if compression_parent:
1111+ if compression_parent not in self._content_objects:
1112+ # Waiting for the parent
1113+ self._pending_deltas.setdefault(compression_parent, []).append(
1114+ (key, parent_keys, record, record_details))
1115+ return None
1116+ # We have the basis parent, so expand the delta
1117+ num = self._num_compression_children[compression_parent]
1118+ num -= 1
1119+ if num == 0:
1120+ base_content = self._content_objects.pop(compression_parent)
1121+ self._num_compression_children.pop(compression_parent)
1122+ else:
1123+ self._num_compression_children[compression_parent] = num
1124+ base_content = self._content_objects[compression_parent]
1125+ # It is tempting to want to copy_base_content=False for the last
1126+ # child object. However, whenever noeol=False,
1127+ # self._text_cache[parent_key] is content._lines. So mutating it
1128+ # gives very bad results.
1129+ # The alternative is to copy the lines into text cache, but then we
1130+ # are copying anyway, so just do it here.
1131+ content, delta = self._vf._factory.parse_record(
1132+ key, record, record_details, base_content,
1133+ copy_base_content=True)
1134+ else:
1135+ # Fulltext record
1136+ content, _ = self._vf._factory.parse_record(
1137+ key, record, record_details, None)
1138+ if self._num_compression_children.get(key, 0) > 0:
1139+ self._content_objects[key] = content
1140+ lines = content.text()
1141+ self._text_cache[key] = lines
1142+ if delta is not None:
1143+ self._cache_delta_blocks(key, compression_parent, delta, lines)
1144+ return lines
1145+
1146+ def _get_parent_annotations_and_matches(self, key, text, parent_key):
1147+ """Get the list of annotations for the parent, and the matching lines.
1148+
1149+ :param text: The opaque value given by _get_needed_texts
1150+ :param parent_key: The key for the parent text
1151+ :return: (parent_annotations, matching_blocks)
1152+ parent_annotations is a list as long as the number of lines in
1153+ parent
1154+ matching_blocks is a list of (parent_idx, text_idx, len) tuples
1155+ indicating which lines match between the two texts
1156+ """
1157+ block_key = (key, parent_key)
1158+ if block_key in self._matching_blocks:
1159+ blocks = self._matching_blocks.pop(block_key)
1160+ parent_annotations = self._annotations_cache[parent_key]
1161+ return parent_annotations, blocks
1162+ return annotate.Annotator._get_parent_annotations_and_matches(self,
1163+ key, text, parent_key)
1164+
1165+ def _process_pending(self, key):
1166+ """The content for 'key' was just processed.
1167+
1168+ Determine if there is any more pending work to be processed.
1169+ """
1170+ to_return = []
1171+ if key in self._pending_deltas:
1172+ compression_parent = key
1173+ children = self._pending_deltas.pop(key)
1174+ for child_key, parent_keys, record, record_details in children:
1175+ lines = self._expand_record(child_key, parent_keys,
1176+ compression_parent,
1177+ record, record_details)
1178+ if self._check_ready_for_annotations(child_key, parent_keys):
1179+ to_return.append(child_key)
1180+ # Also check any children that are waiting for this parent to
1181+ # become ready for annotation
1182+ if key in self._pending_annotation:
1183+ children = self._pending_annotation.pop(key)
1184+ to_return.extend([c for c, p_keys in children
1185+ if self._check_ready_for_annotations(c, p_keys)])
1186+ return to_return
1187+
1188+ def _check_ready_for_annotations(self, key, parent_keys):
1189+ """Return True if this text is ready to be yielded.
1190+
1191+ Otherwise, this will return False, and queue the text into
1192+ self._pending_annotation
1193+ """
1194+ for parent_key in parent_keys:
1195+ if parent_key not in self._annotations_cache:
1196+ # still waiting on at least one parent text, so queue it up
1197+ # Note that if there are multiple parents, we need to wait
1198+ # for all of them.
1199+ self._pending_annotation.setdefault(parent_key,
1200+ []).append((key, parent_keys))
1201+ return False
1202+ return True
1203+
1204+ def _extract_texts(self, records):
1205+ """Extract the various texts needed based on records"""
1206 # We iterate in the order read, rather than a strict order requested
1207 # However, process what we can, and put off to the side things that
1208 # still need parents, cleaning them up when those parents are
1209 # processed.
1210- for (rev_id, record,
1211- digest) in self._knit._read_records_iter(records):
1212- if rev_id in self._annotated_lines:
1213+ # Basic data flow:
1214+ # 1) As 'records' are read, see if we can expand these records into
1215+ # Content objects (and thus lines)
1216+ # 2) If a given line-delta is waiting on its compression parent, it
1217+ # gets queued up into self._pending_deltas, otherwise we expand
1218+ # it, and put it into self._text_cache and self._content_objects
1219+ # 3) If we expanded the text, we will then check to see if all
1220+ # parents have also been processed. If so, this text gets yielded,
1221+ # else this record gets set aside into pending_annotation
1222+ # 4) Further, if we expanded the text in (2), we will then check to
1223+ # see if there are any children in self._pending_deltas waiting to
1224+ # also be processed. If so, we go back to (2) for those
1225+ # 5) Further again, if we yielded the text, we can then check if that
1226+ # 'unlocks' any of the texts in pending_annotations, which should
1227+ # then get yielded as well
1228+ # Note that both steps 4 and 5 are 'recursive' in that unlocking one
1229+ # compression child could unlock yet another, and yielding a fulltext
1230+ # will also 'unlock' the children that are waiting on that annotation.
1231+ # (Though also, unlocking 1 parent's fulltext, does not unlock a child
1232+ # if other parents are also waiting.)
1233+ # We want to yield content before expanding child content objects, so
1234+ # that we know when we can re-use the content lines, and the annotation
1235+ # code can know when it can stop caching fulltexts, as well.
1236+
1237+ # Children that are missing their compression parent
1238+ pending_deltas = {}
1239+ for (key, record, digest) in self._vf._read_records_iter(records):
1240+ # ghosts?
1241+ details = self._all_build_details[key]
1242+ (_, compression_parent, parent_keys, record_details) = details
1243+ lines = self._expand_record(key, parent_keys, compression_parent,
1244+ record, record_details)
1245+ if lines is None:
1246+ # Waiting on its compression parent; _expand_record queued
1247› 1246+ # it as a pending delta
1247 continue
1248- parent_ids = self._revision_id_graph[rev_id]
1249- parent_ids = [p for p in parent_ids if p not in self._ghosts]
1250- details = self._all_build_details[rev_id]
1251- (index_memo, compression_parent, parents,
1252- record_details) = details
1253- nodes_to_annotate = []
1254- # TODO: Remove the punning between compression parents, and
1255- # parent_ids, we should be able to do this without assuming
1256- # the build order
1257- if len(parent_ids) == 0:
1258- # There are no parents for this node, so just add it
1259- # TODO: This probably needs to be decoupled
1260- fulltext_content, delta = self._knit._factory.parse_record(
1261- rev_id, record, record_details, None)
1262- fulltext = self._add_fulltext_content(rev_id, fulltext_content)
1263- nodes_to_annotate.extend(self._add_annotation(rev_id, fulltext,
1264- parent_ids, left_matching_blocks=None))
1265- else:
1266- child = (rev_id, parent_ids, record)
1267- # Check if all the parents are present
1268- self._check_parents(child, nodes_to_annotate)
1269- while nodes_to_annotate:
1270- # Should we use a queue here instead of a stack?
1271- (rev_id, parent_ids, record) = nodes_to_annotate.pop()
1272- (index_memo, compression_parent, parents,
1273- record_details) = self._all_build_details[rev_id]
1274- blocks = None
1275- if compression_parent is not None:
1276- comp_children = self._compression_children[compression_parent]
1277- if rev_id not in comp_children:
1278- raise AssertionError("%r not in compression children %r"
1279- % (rev_id, comp_children))
1280- # If there is only 1 child, it is safe to reuse this
1281- # content
1282- reuse_content = (len(comp_children) == 1
1283- and compression_parent not in
1284- self._nodes_to_keep_annotations)
1285- if reuse_content:
1286- # Remove it from the cache since it will be changing
1287- parent_fulltext_content = self._fulltext_contents.pop(compression_parent)
1288- # Make sure to copy the fulltext since it might be
1289- # modified
1290- parent_fulltext = list(parent_fulltext_content.text())
1291- else:
1292- parent_fulltext_content = self._fulltext_contents[compression_parent]
1293- parent_fulltext = parent_fulltext_content.text()
1294- comp_children.remove(rev_id)
1295- fulltext_content, delta = self._knit._factory.parse_record(
1296- rev_id, record, record_details,
1297- parent_fulltext_content,
1298- copy_base_content=(not reuse_content))
1299- fulltext = self._add_fulltext_content(rev_id,
1300- fulltext_content)
1301- if compression_parent == parent_ids[0]:
1302- # the compression_parent is the left parent, so we can
1303- # re-use the delta
1304- blocks = KnitContent.get_line_delta_blocks(delta,
1305- parent_fulltext, fulltext)
1306- else:
1307- fulltext_content = self._knit._factory.parse_fulltext(
1308- record, rev_id)
1309- fulltext = self._add_fulltext_content(rev_id,
1310- fulltext_content)
1311- nodes_to_annotate.extend(
1312- self._add_annotation(rev_id, fulltext, parent_ids,
1313- left_matching_blocks=blocks))
1314-
1315- def _get_heads_provider(self):
1316- """Create a heads provider for resolving ancestry issues."""
1317- if self._heads_provider is not None:
1318- return self._heads_provider
1319- self._heads_provider = _mod_graph.KnownGraph(self._revision_id_graph)
1320- return self._heads_provider
1321-
1322- def annotate(self, key):
1323- """Return the annotated fulltext at the given key.
1324-
1325- :param key: The key to annotate.
1326- """
1327- if len(self._knit._fallback_vfs) > 0:
1328- # stacked knits can't use the fast path at present.
1329- return self._simple_annotate(key)
1330- while True:
1331- try:
1332- records = self._get_build_graph(key)
1333- if key in self._ghosts:
1334- raise errors.RevisionNotPresent(key, self._knit)
1335- self._annotate_records(records)
1336- return self._annotated_lines[key]
1337- except errors.RetryWithNewPacks, e:
1338- self._knit._access.reload_or_raise(e)
1339- # The cached build_details are no longer valid
1340- self._all_build_details.clear()
1341-
1342- def _simple_annotate(self, key):
1343- """Return annotated fulltext, rediffing from the full texts.
1344-
1345- This is slow but makes no assumptions about the repository
1346- being able to produce line deltas.
1347- """
1348- # TODO: this code generates a parent maps of present ancestors; it
1349- # could be split out into a separate method
1350- # -- mbp and robertc 20080704
1351- graph = _mod_graph.Graph(self._knit)
1352- parent_map = dict((k, v) for k, v in graph.iter_ancestry([key])
1353- if v is not None)
1354- if not parent_map:
1355- raise errors.RevisionNotPresent(key, self)
1356- keys = parent_map.keys()
1357- heads_provider = _mod_graph.KnownGraph(parent_map)
1358- parent_cache = {}
1359- reannotate = annotate.reannotate
1360- for record in self._knit.get_record_stream(keys, 'topological', True):
1361- key = record.key
1362- fulltext = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
1363- parents = parent_map[key]
1364- if parents is not None:
1365- parent_lines = [parent_cache[parent] for parent in parent_map[key]]
1366- else:
1367- parent_lines = []
1368- parent_cache[key] = list(
1369- reannotate(parent_lines, fulltext, key, None, heads_provider))
1370- try:
1371- return parent_cache[key]
1372- except KeyError, e:
1373- raise errors.RevisionNotPresent(key, self._knit)
1374-
1375+ # At this point, we may be able to yield this content, if all
1376+ # parents are also finished
1377+ yield_this_text = self._check_ready_for_annotations(key,
1378+ parent_keys)
1379+ if yield_this_text:
1380+ # All parents present
1381+ yield key, lines, len(lines)
1382+ to_process = self._process_pending(key)
1383+ while to_process:
1384+ this_process = to_process
1385+ to_process = []
1386+ for key in this_process:
1387+ lines = self._text_cache[key]
1388+ yield key, lines, len(lines)
1389+ to_process.extend(self._process_pending(key))
1390
1391 try:
1392 from bzrlib._knit_load_data_c import _load_data_c as _load_data
1393
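The five-step data flow described in the `_extract_texts` comments above (expand a record once its compression parent is available, yield a text once all of its parents have been yielded, and let each yield unlock waiting children) is at heart a dependency-ordered traversal. A minimal, hypothetical sketch of just that ordering, with plain dicts standing in for the real record and content machinery:

```python
# Simplified model of the _extract_texts flow: nothing here is bzrlib API,
# only an illustration of the pending/unlock bookkeeping the comments describe.

def extract_in_dependency_order(texts, parent_map):
    """Yield (key, text) pairs so every key follows all of its parents.

    texts: dict key -> text; parent_map: dict key -> list of parent keys.
    """
    # Step 3 analogue: how many parents each key is still waiting on
    needed = dict((k, len(parent_map[k])) for k in texts)
    children = {}
    for key, parent_keys in parent_map.items():
        for p in parent_keys:
            children.setdefault(p, []).append(key)
    ready = [k for k, n in needed.items() if n == 0]
    while ready:
        key = ready.pop()
        yield key, texts[key]
        # Step 5 analogue: yielding a text may 'unlock' waiting children
        for child in children.get(key, []):
            needed[child] -= 1
            if needed[child] == 0:
                ready.append(child)
```

As the comments note, one yield can unlock several children, and a child with multiple parents stays pending until the last parent is done.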
1394=== modified file 'bzrlib/revisiontree.py'
1395--- bzrlib/revisiontree.py 2009-06-17 03:41:33 +0000
1396+++ bzrlib/revisiontree.py 2009-07-08 23:35:27 +0000
1397@@ -87,7 +87,8 @@
1398 default_revision=revision.CURRENT_REVISION):
1399 """See Tree.annotate_iter"""
1400 text_key = (file_id, self.inventory[file_id].revision)
1401- annotations = self._repository.texts.annotate(text_key)
1402+ annotator = self._repository.texts.get_annotator()
1403+ annotations = annotator.annotate_flat(text_key)
1404 return [(key[-1], line) for key, line in annotations]
1405
1406 def get_file_size(self, file_id):
1407
1408=== modified file 'bzrlib/tests/__init__.py'
1409--- bzrlib/tests/__init__.py 2009-07-02 11:37:38 +0000
1410+++ bzrlib/tests/__init__.py 2009-07-08 23:35:28 +0000
1411@@ -3344,6 +3344,7 @@
1412 'bzrlib.tests.per_repository',
1413 'bzrlib.tests.per_repository_chk',
1414 'bzrlib.tests.per_repository_reference',
1415+ 'bzrlib.tests.test__annotator',
1416 'bzrlib.tests.test__chk_map',
1417 'bzrlib.tests.test__dirstate_helpers',
1418 'bzrlib.tests.test__groupcompress',
1419
1420=== added file 'bzrlib/tests/test__annotator.py'
1421--- bzrlib/tests/test__annotator.py 1970-01-01 00:00:00 +0000
1422+++ bzrlib/tests/test__annotator.py 2009-07-08 23:35:28 +0000
1423@@ -0,0 +1,403 @@
1424+# Copyright (C) 2009 Canonical Ltd
1425+#
1426+# This program is free software; you can redistribute it and/or modify
1427+# it under the terms of the GNU General Public License as published by
1428+# the Free Software Foundation; either version 2 of the License, or
1429+# (at your option) any later version.
1430+#
1431+# This program is distributed in the hope that it will be useful,
1432+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1433+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1434+# GNU General Public License for more details.
1435+#
1436+# You should have received a copy of the GNU General Public License
1437+# along with this program; if not, write to the Free Software
1438+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1439+
1440+"""Tests for Annotators."""
1441+
1442+from bzrlib import (
1443+ annotate,
1444+ _annotator_py,
1445+ errors,
1446+ knit,
1447+ revision,
1448+ tests,
1449+ )
1450+
1451+
1452+def load_tests(standard_tests, module, loader):
1453+ """Parameterize tests for all versions of the Annotator."""
1454+ scenarios = [
1455+ ('python', {'module': _annotator_py}),
1456+ ]
1457+ suite = loader.suiteClass()
1458+ if CompiledAnnotator.available():
1459+ from bzrlib import _annotator_pyx
1460+ scenarios.append(('C', {'module': _annotator_pyx}))
1461+ else:
1462+ # the compiled module isn't available, so we add a failing test
1463+ class FailWithoutFeature(tests.TestCase):
1464+ def test_fail(self):
1465+ self.requireFeature(CompiledAnnotator)
1466+ suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature))
1467+ result = tests.multiply_tests(standard_tests, scenarios, suite)
1468+ return result
1469+
1470+
1471+class _CompiledAnnotator(tests.Feature):
1472+
1473+ def _probe(self):
1474+ try:
1475+ import bzrlib._annotator_pyx
1476+ except ImportError:
1477+ return False
1478+ return True
1479+
1480+ def feature_name(self):
1481+ return 'bzrlib._annotator_pyx'
1482+
1483+CompiledAnnotator = _CompiledAnnotator()
1484+
1485+
1486+class TestAnnotator(tests.TestCaseWithMemoryTransport):
1487+
1488+ module = None # Set by load_tests
1489+
1490+ fa_key = ('f-id', 'a-id')
1491+ fb_key = ('f-id', 'b-id')
1492+ fc_key = ('f-id', 'c-id')
1493+ fd_key = ('f-id', 'd-id')
1494+ fe_key = ('f-id', 'e-id')
1495+ ff_key = ('f-id', 'f-id')
1496+
1497+ def make_no_graph_texts(self):
1498+ factory = knit.make_pack_factory(False, False, 2)
1499+ self.vf = factory(self.get_transport())
1500+ self.ann = self.module.Annotator(self.vf)
1501+ self.vf.add_lines(self.fa_key, (), ['simple\n', 'content\n'])
1502+ self.vf.add_lines(self.fb_key, (), ['simple\n', 'new content\n'])
1503+
1504+ def make_simple_text(self):
1505+ # TODO: all we really need is a VersionedFile instance, we'd like to
1506+ # avoid creating all the intermediate stuff
1507+ factory = knit.make_pack_factory(True, True, 2)
1508+ self.vf = factory(self.get_transport())
1509+ # This assumes nothing special happens during __init__, which may be
1510+ # valid
1511+ self.ann = self.module.Annotator(self.vf)
1512+ # A 'simple|content|'
1513+ # |
1514+ # B 'simple|new content|'
1515+ self.vf.add_lines(self.fa_key, [], ['simple\n', 'content\n'])
1516+ self.vf.add_lines(self.fb_key, [self.fa_key],
1517+ ['simple\n', 'new content\n'])
1518+
1519+ def make_merge_text(self):
1520+ self.make_simple_text()
1521+ # A 'simple|content|'
1522+ # |\
1523+ # B | 'simple|new content|'
1524+ # | |
1525+ # | C 'simple|from c|content|'
1526+ # |/
1527+ # D 'simple|from c|new content|introduced in merge|'
1528+ self.vf.add_lines(self.fc_key, [self.fa_key],
1529+ ['simple\n', 'from c\n', 'content\n'])
1530+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1531+ ['simple\n', 'from c\n', 'new content\n',
1532+ 'introduced in merge\n'])
1533+
1534+ def make_common_merge_text(self):
1535+ """Both sides of the merge will have introduced a line."""
1536+ self.make_simple_text()
1537+ # A 'simple|content|'
1538+ # |\
1539+ # B | 'simple|new content|'
1540+ # | |
1541+ # | C 'simple|new content|'
1542+ # |/
1543+ # D 'simple|new content|'
1544+ self.vf.add_lines(self.fc_key, [self.fa_key],
1545+ ['simple\n', 'new content\n'])
1546+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1547+ ['simple\n', 'new content\n'])
1548+
1549+ def make_many_way_common_merge_text(self):
1550+ self.make_simple_text()
1551+ # A-. 'simple|content|'
1552+ # |\ \
1553+ # B | | 'simple|new content|'
1554+ # | | |
1555+ # | C | 'simple|new content|'
1556+ # |/ |
1557+ # D | 'simple|new content|'
1558+ # | |
1559+ # | E 'simple|new content|'
1560+ # | /
1561+ # F-' 'simple|new content|'
1562+ self.vf.add_lines(self.fc_key, [self.fa_key],
1563+ ['simple\n', 'new content\n'])
1564+ self.vf.add_lines(self.fd_key, [self.fb_key, self.fc_key],
1565+ ['simple\n', 'new content\n'])
1566+ self.vf.add_lines(self.fe_key, [self.fa_key],
1567+ ['simple\n', 'new content\n'])
1568+ self.vf.add_lines(self.ff_key, [self.fd_key, self.fe_key],
1569+ ['simple\n', 'new content\n'])
1570+
1571+ def make_merge_and_restored_text(self):
1572+ self.make_simple_text()
1573+ # A 'simple|content|'
1574+ # |\
1575+ # B | 'simple|new content|'
1576+ # | |
1577+ # C | 'simple|content|' # reverted to A
1578+ # \|
1579+ # D 'simple|content|'
1580+ # c reverts back to 'a' for the new content line
1581+ self.vf.add_lines(self.fc_key, [self.fb_key],
1582+ ['simple\n', 'content\n'])
1583+ # d merges 'a' and 'c', and finds that both claim to have last
1584› 1583+ # modified the line
1584+ self.vf.add_lines(self.fd_key, [self.fa_key, self.fc_key],
1585+ ['simple\n', 'content\n'])
1586+
1587+ def assertAnnotateEqual(self, expected_annotation, key, exp_text=None):
1588+ annotation, lines = self.ann.annotate(key)
1589+ self.assertEqual(expected_annotation, annotation)
1590+ if exp_text is None:
1591+ record = self.vf.get_record_stream([key], 'unordered', True).next()
1592+ exp_text = record.get_bytes_as('fulltext')
1593+ self.assertEqualDiff(exp_text, ''.join(lines))
1594+
1595+ def test_annotate_missing(self):
1596+ self.make_simple_text()
1597+ self.assertRaises(errors.RevisionNotPresent,
1598+ self.ann.annotate, ('not', 'present'))
1599+
1600+ def test_annotate_simple(self):
1601+ self.make_simple_text()
1602+ self.assertAnnotateEqual([(self.fa_key,)]*2, self.fa_key)
1603+ self.assertAnnotateEqual([(self.fa_key,), (self.fb_key,)], self.fb_key)
1604+
1605+ def test_annotate_merge_text(self):
1606+ self.make_merge_text()
1607+ self.assertAnnotateEqual([(self.fa_key,), (self.fc_key,),
1608+ (self.fb_key,), (self.fd_key,)],
1609+ self.fd_key)
1610+
1611+ def test_annotate_common_merge_text(self):
1612+ self.make_common_merge_text()
1613+ self.assertAnnotateEqual([(self.fa_key,), (self.fb_key, self.fc_key)],
1614+ self.fd_key)
1615+
1616+ def test_annotate_many_way_common_merge_text(self):
1617+ self.make_many_way_common_merge_text()
1618+ self.assertAnnotateEqual([(self.fa_key,),
1619+ (self.fb_key, self.fc_key, self.fe_key)],
1620+ self.ff_key)
1621+
1622+ def test_annotate_merge_and_restored(self):
1623+ self.make_merge_and_restored_text()
1624+ self.assertAnnotateEqual([(self.fa_key,), (self.fa_key, self.fc_key)],
1625+ self.fd_key)
1626+
1627+ def test_annotate_flat_simple(self):
1628+ self.make_simple_text()
1629+ self.assertEqual([(self.fa_key, 'simple\n'),
1630+ (self.fa_key, 'content\n'),
1631+ ], self.ann.annotate_flat(self.fa_key))
1632+ self.assertEqual([(self.fa_key, 'simple\n'),
1633+ (self.fb_key, 'new content\n'),
1634+ ], self.ann.annotate_flat(self.fb_key))
1635+
1636+ def test_annotate_flat_merge_and_restored_text(self):
1637+ self.make_merge_and_restored_text()
1638+ # fc is a simple dominator of fa
1639+ self.assertEqual([(self.fa_key, 'simple\n'),
1640+ (self.fc_key, 'content\n'),
1641+ ], self.ann.annotate_flat(self.fd_key))
1642+
1643+ def test_annotate_flat_common_merge_text(self):
1644+ self.make_common_merge_text()
1645+ # there is no common ancestor to prefer, so we pick the
1646+ # lexicographically lowest key, and 'b-id' sorts before 'c-id'
1647+ self.assertEqual([(self.fa_key, 'simple\n'),
1648+ (self.fb_key, 'new content\n'),
1649+ ], self.ann.annotate_flat(self.fd_key))
1650+
1651+ def test_annotate_flat_many_way_common_merge_text(self):
1652+ self.make_many_way_common_merge_text()
1653+ self.assertEqual([(self.fa_key, 'simple\n'),
1654+ (self.fb_key, 'new content\n')],
1655+ self.ann.annotate_flat(self.ff_key))
1656+
1657+ def test_annotate_flat_respects_break_ann_tie(self):
1658+ tiebreaker = annotate._break_annotation_tie
1659+ try:
1660+ calls = []
1661+ def custom_tiebreaker(annotated_lines):
1662+ self.assertEqual(2, len(annotated_lines))
1663+ left = annotated_lines[0]
1664+ self.assertEqual(2, len(left))
1665+ self.assertEqual('new content\n', left[1])
1666+ right = annotated_lines[1]
1667+ self.assertEqual(2, len(right))
1668+ self.assertEqual('new content\n', right[1])
1669+ calls.append((left[0], right[0]))
1670+ # Our custom tiebreaker takes the *largest* value, rather than
1671+ # the *smallest* value
1672+ if left[0] < right[0]:
1673+ return right
1674+ else:
1675+ return left
1676+ annotate._break_annotation_tie = custom_tiebreaker
1677+ self.make_many_way_common_merge_text()
1678+ self.assertEqual([(self.fa_key, 'simple\n'),
1679+ (self.fe_key, 'new content\n')],
1680+ self.ann.annotate_flat(self.ff_key))
1681+ self.assertEqual([(self.fe_key, self.fc_key),
1682+ (self.fe_key, self.fb_key)], calls)
1683+ finally:
1684+ annotate._break_annotation_tie = tiebreaker
1685+
1686+
1687+ def test_needed_keys_simple(self):
1688+ self.make_simple_text()
1689+ keys, ann_keys = self.ann._get_needed_keys(self.fb_key)
1690+ self.assertEqual([self.fa_key, self.fb_key], sorted(keys))
1691+ self.assertEqual({self.fa_key: 1, self.fb_key: 1},
1692+ self.ann._num_needed_children)
1693+ self.assertEqual(set(), ann_keys)
1694+
1695+ def test_needed_keys_many(self):
1696+ self.make_many_way_common_merge_text()
1697+ keys, ann_keys = self.ann._get_needed_keys(self.ff_key)
1698+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1699+ self.fd_key, self.fe_key, self.ff_key,
1700+ ], sorted(keys))
1701+ self.assertEqual({self.fa_key: 3,
1702+ self.fb_key: 1,
1703+ self.fc_key: 1,
1704+ self.fd_key: 1,
1705+ self.fe_key: 1,
1706+ self.ff_key: 1,
1707+ }, self.ann._num_needed_children)
1708+ self.assertEqual(set(), ann_keys)
1709+
1710+ def test_needed_keys_with_special_text(self):
1711+ self.make_many_way_common_merge_text()
1712+ spec_key = ('f-id', revision.CURRENT_REVISION)
1713+ spec_text = 'simple\nnew content\nlocally modified\n'
1714+ self.ann.add_special_text(spec_key, [self.fd_key, self.fe_key],
1715+ spec_text)
1716+ keys, ann_keys = self.ann._get_needed_keys(spec_key)
1717+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1718+ self.fd_key, self.fe_key,
1719+ ], sorted(keys))
1720+ self.assertEqual([spec_key], sorted(ann_keys))
1721+
1722+ def test_needed_keys_with_parent_texts(self):
1723+ self.make_many_way_common_merge_text()
1724+ # If 'D' and 'E' are already annotated, we don't need to extract all
1725+ # the texts
1726+ # D | 'simple|new content|'
1727+ # | |
1728+ # | E 'simple|new content|'
1729+ # | /
1730+ # F-' 'simple|new content|'
1731+ self.ann._parent_map[self.fd_key] = (self.fb_key, self.fc_key)
1732+ self.ann._text_cache[self.fd_key] = ['simple\n', 'new content\n']
1733+ self.ann._annotations_cache[self.fd_key] = [
1734+ (self.fa_key,),
1735+ (self.fb_key, self.fc_key),
1736+ ]
1737+ self.ann._parent_map[self.fe_key] = (self.fa_key,)
1738+ self.ann._text_cache[self.fe_key] = ['simple\n', 'new content\n']
1739+ self.ann._annotations_cache[self.fe_key] = [
1740+ (self.fa_key,),
1741+ (self.fe_key,),
1742+ ]
1743+ keys, ann_keys = self.ann._get_needed_keys(self.ff_key)
1744+ self.assertEqual([self.ff_key], sorted(keys))
1745+ self.assertEqual({self.fd_key: 1,
1746+ self.fe_key: 1,
1747+ self.ff_key: 1,
1748+ }, self.ann._num_needed_children)
1749+ self.assertEqual([], sorted(ann_keys))
1750+
1751+ def test_record_annotation_removes_texts(self):
1752+ self.make_many_way_common_merge_text()
1753+ # Populate the caches
1754+ for x in self.ann._get_needed_texts(self.ff_key):
1755+ pass
1756+ self.assertEqual({self.fa_key: 3,
1757+ self.fb_key: 1,
1758+ self.fc_key: 1,
1759+ self.fd_key: 1,
1760+ self.fe_key: 1,
1761+ self.ff_key: 1,
1762+ }, self.ann._num_needed_children)
1763+ self.assertEqual([self.fa_key, self.fb_key, self.fc_key,
1764+ self.fd_key, self.fe_key, self.ff_key,
1765+ ], sorted(self.ann._text_cache.keys()))
1766+ self.ann._record_annotation(self.fa_key, [], [])
1767+ self.ann._record_annotation(self.fb_key, [self.fa_key], [])
1768+ self.assertEqual({self.fa_key: 2,
1769+ self.fb_key: 1,
1770+ self.fc_key: 1,
1771+ self.fd_key: 1,
1772+ self.fe_key: 1,
1773+ self.ff_key: 1,
1774+ }, self.ann._num_needed_children)
1775+ self.assertTrue(self.fa_key in self.ann._text_cache)
1776+ self.assertTrue(self.fa_key in self.ann._annotations_cache)
1777+ self.ann._record_annotation(self.fc_key, [self.fa_key], [])
1778+ self.ann._record_annotation(self.fd_key, [self.fb_key, self.fc_key], [])
1779+ self.assertEqual({self.fa_key: 1,
1780+ self.fb_key: 0,
1781+ self.fc_key: 0,
1782+ self.fd_key: 1,
1783+ self.fe_key: 1,
1784+ self.ff_key: 1,
1785+ }, self.ann._num_needed_children)
1786+ self.assertTrue(self.fa_key in self.ann._text_cache)
1787+ self.assertTrue(self.fa_key in self.ann._annotations_cache)
1788+ self.assertFalse(self.fb_key in self.ann._text_cache)
1789+ self.assertFalse(self.fb_key in self.ann._annotations_cache)
1790+ self.assertFalse(self.fc_key in self.ann._text_cache)
1791+ self.assertFalse(self.fc_key in self.ann._annotations_cache)
1792+
1793+ def test_annotate_special_text(self):
1794+ # Things like WT and PreviewTree want to annotate an arbitrary text
1795+ # ('current:') so we need a way to add that to the group of files to be
1796+ # annotated.
1797+ self.make_many_way_common_merge_text()
1798+ # A-. 'simple|content|'
1799+ # |\ \
1800+ # B | | 'simple|new content|'
1801+ # | | |
1802+ # | C | 'simple|new content|'
1803+ # |/ |
1804+ # D | 'simple|new content|'
1805+ # | |
1806+ # | E 'simple|new content|'
1807+ # | /
1808+ # SPEC 'simple|new content|locally modified|'
1809+ spec_key = ('f-id', revision.CURRENT_REVISION)
1810+ spec_text = 'simple\nnew content\nlocally modified\n'
1811+ self.ann.add_special_text(spec_key, [self.fd_key, self.fe_key],
1812+ spec_text)
1813+ self.assertAnnotateEqual([(self.fa_key,),
1814+ (self.fb_key, self.fc_key, self.fe_key),
1815+ (spec_key,),
1816+ ], spec_key,
1817+ exp_text=spec_text)
1818+
1819+ def test_no_graph(self):
1820+ self.make_no_graph_texts()
1821+ self.assertAnnotateEqual([(self.fa_key,),
1822+ (self.fa_key,),
1823+ ], self.fa_key)
1824+ self.assertAnnotateEqual([(self.fb_key,),
1825+ (self.fb_key,),
1826+ ], self.fb_key)
1827
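The `annotate_flat` tests above depend on `annotate()` producing a tuple of candidate keys per line, which `annotate_flat()` must reduce to a single key, breaking ties via `_break_annotation_tie` (lexicographically smallest by default, as the common-merge tests check). A simplified, hypothetical sketch of that flattening step; the real Annotator also consults a heads provider before falling back to the tiebreaker, which is omitted here:

```python
# Illustration only: a stand-in for the annotate_flat() reduction, not the
# bzrlib implementation. Each entry in annotations is a tuple of keys that
# all claim a line; ties default to the lexicographically smallest key.

def flatten_annotations(annotations, lines):
    result = []
    for anns, line in zip(annotations, lines):
        if len(anns) == 1:
            result.append((anns[0], line))
        else:
            # multiple heads claim this line; pick the smallest key
            result.append((min(anns), line))
    return result
```

This matches the test expectation that, with no dominating ancestor, 'b-id' wins over 'c-id' purely by sort order.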
1828=== modified file 'bzrlib/tests/test__known_graph.py'
1829--- bzrlib/tests/test__known_graph.py 2009-06-18 19:45:24 +0000
1830+++ bzrlib/tests/test__known_graph.py 2009-07-08 23:35:28 +0000
1831@@ -14,7 +14,7 @@
1832 # along with this program; if not, write to the Free Software
1833 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1834
1835-"""Tests for the python and pyrex extensions of groupcompress"""
1836+"""Tests for the python and pyrex extensions of KnownGraph"""
1837
1838 from bzrlib import (
1839 errors,
1840@@ -63,6 +63,16 @@
1841 CompiledKnownGraphFeature = _CompiledKnownGraphFeature()
1842
1843
1844+# a
1845+# |\
1846+# b |
1847+# | |
1848+# c |
1849+# \|
1850+# d
1851+alt_merge = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a', 'c']}
1852+
1853+
1854 class TestKnownGraph(tests.TestCase):
1855
1856 module = None # Set by load_tests
1857@@ -203,6 +213,10 @@
1858 self.assertEqual(set(['w', 'q']), graph.heads(['w', 's', 'q']))
1859 self.assertEqual(set(['z']), graph.heads(['s', 'z']))
1860
1861+ def test_heads_alt_merge(self):
1862+ graph = self.make_known_graph(alt_merge)
1863+ self.assertEqual(set(['c']), graph.heads(['a', 'c']))
1864+
1865 def test_heads_with_ghost(self):
1866 graph = self.make_known_graph(test_graph.with_ghost)
1867 self.assertEqual(set(['e', 'g']), graph.heads(['e', 'g']))
1868
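`test_heads_alt_merge` above checks the semantic that `heads()` drops any candidate that is an ancestor of another candidate. A naive, hypothetical reference implementation of that semantic (the real KnownGraph is much more efficient, but the answer should agree):

```python
# Reference semantic for heads(): among the candidate keys, keep only those
# that are not ancestors of some other candidate. Illustration only.

def heads(parent_map, keys):
    def ancestors(key):
        seen, todo = set(), list(parent_map[key])
        while todo:
            k = todo.pop()
            if k not in seen:
                seen.add(k)
                todo.extend(parent_map[k])
        return seen
    keys = set(keys)
    return set(k for k in keys
               if not any(k in ancestors(other) for other in keys - set([k])))
```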
1869=== modified file 'bzrlib/tests/test_annotate.py'
1870--- bzrlib/tests/test_annotate.py 2009-03-23 14:59:43 +0000
1871+++ bzrlib/tests/test_annotate.py 2009-07-08 23:35:28 +0000
1872@@ -176,38 +176,23 @@
1873 |
1874 rev-3
1875 """
1876-
1877- tree1 = self.make_branch_and_tree('tree1')
1878- self.build_tree_contents([('tree1/a', 'first\n')])
1879- tree1.add(['a'], ['a-id'])
1880- tree1.commit('a', rev_id='rev-1',
1881- committer="joe@foo.com",
1882- timestamp=1166046000.00, timezone=0)
1883-
1884- tree2 = tree1.bzrdir.sprout('tree2').open_workingtree()
1885-
1886- self.build_tree_contents([('tree1/a', 'first\nsecond\n')])
1887- tree1.commit('b', rev_id='rev-2',
1888- committer='joe@foo.com',
1889- timestamp=1166046001.00, timezone=0)
1890-
1891- self.build_tree_contents([('tree2/a', 'first\nthird\n')])
1892- tree2.commit('c', rev_id='rev-1_1_1',
1893- committer="barry@foo.com",
1894- timestamp=1166046002.00, timezone=0)
1895-
1896- num_conflicts = tree1.merge_from_branch(tree2.branch)
1897- self.assertEqual(1, num_conflicts)
1898-
1899- self.build_tree_contents([('tree1/a',
1900- 'first\nsecond\nthird\n')])
1901- tree1.set_conflicts(conflicts.ConflictList())
1902- tree1.commit('merge 2', rev_id='rev-3',
1903- committer='sal@foo.com',
1904- timestamp=1166046003.00, timezone=0)
1905- tree1.lock_read()
1906- self.addCleanup(tree1.unlock)
1907- return tree1, tree2
1908+ builder = self.make_branch_builder('branch')
1909+ builder.start_series()
1910+ self.addCleanup(builder.finish_series)
1911+ builder.build_snapshot('rev-1', None, [
1912+ ('add', ('', 'root-id', 'directory', None)),
1913+ ('add', ('a', 'a-id', 'file', 'first\n')),
1914+ ], timestamp=1166046000.00, committer="joe@foo.com")
1915+ builder.build_snapshot('rev-2', ['rev-1'], [
1916+ ('modify', ('a-id', 'first\nsecond\n')),
1917+ ], timestamp=1166046001.00, committer="joe@foo.com")
1918+ builder.build_snapshot('rev-1_1_1', ['rev-1'], [
1919+ ('modify', ('a-id', 'first\nthird\n')),
1920+ ], timestamp=1166046002.00, committer="barry@foo.com")
1921+ builder.build_snapshot('rev-3', ['rev-2', 'rev-1_1_1'], [
1922+ ('modify', ('a-id', 'first\nsecond\nthird\n')),
1923+ ], timestamp=1166046003.00, committer="sal@foo.com")
1924+ return builder
1925
1926 def create_deeply_merged_trees(self):
1927 """Create some trees with a more complex merge history.
1928@@ -232,69 +217,51 @@
1929 |
1930 rev-6
1931 """
1932- tree1, tree2 = self.create_merged_trees()
1933- tree1.unlock()
1934-
1935- tree3 = tree2.bzrdir.sprout('tree3').open_workingtree()
1936-
1937- tree2.commit('noop', rev_id='rev-1_1_2')
1938- self.assertEqual(0, tree1.merge_from_branch(tree2.branch))
1939- tree1.commit('noop merge', rev_id='rev-4')
1940-
1941- self.build_tree_contents([('tree3/a', 'first\nthird\nfourth\n')])
1942- tree3.commit('four', rev_id='rev-1_2_1',
1943- committer='jerry@foo.com',
1944- timestamp=1166046003.00, timezone=0)
1945-
1946- tree4 = tree3.bzrdir.sprout('tree4').open_workingtree()
1947-
1948- tree3.commit('noop', rev_id='rev-1_2_2',
1949- committer='jerry@foo.com',
1950- timestamp=1166046004.00, timezone=0)
1951- self.assertEqual(0, tree1.merge_from_branch(tree3.branch))
1952- tree1.commit('merge four', rev_id='rev-5')
1953-
1954- self.build_tree_contents([('tree4/a',
1955- 'first\nthird\nfourth\nfifth\nsixth\n')])
1956- tree4.commit('five and six', rev_id='rev-1_3_1',
1957- committer='george@foo.com',
1958- timestamp=1166046005.00, timezone=0)
1959- self.assertEqual(0, tree1.merge_from_branch(tree4.branch))
1960- tree1.commit('merge five and six', rev_id='rev-6')
1961- tree1.lock_read()
1962- return tree1
1963+ builder = self.create_merged_trees()
1964+ builder.build_snapshot('rev-1_1_2', ['rev-1_1_1'], [])
1965+ builder.build_snapshot('rev-4', ['rev-3', 'rev-1_1_2'], [])
1966+ builder.build_snapshot('rev-1_2_1', ['rev-1_1_1'], [
1967+ ('modify', ('a-id', 'first\nthird\nfourth\n')),
1968+ ], timestamp=1166046003.00, committer="jerry@foo.com")
1969+ builder.build_snapshot('rev-1_2_2', ['rev-1_2_1'], [],
1970+ timestamp=1166046004.00, committer="jerry@foo.com")
1971+ builder.build_snapshot('rev-5', ['rev-4', 'rev-1_2_2'], [
1972+ ('modify', ('a-id', 'first\nsecond\nthird\nfourth\n')),
1973+ ], timestamp=1166046004.00, committer="jerry@foo.com")
1974+ builder.build_snapshot('rev-1_3_1', ['rev-1_2_1'], [
1975+ ('modify', ('a-id', 'first\nthird\nfourth\nfifth\nsixth\n')),
1976+ ], timestamp=1166046005.00, committer="george@foo.com")
1977+ builder.build_snapshot('rev-6', ['rev-5', 'rev-1_3_1'], [
1978+ ('modify', ('a-id',
1979+ 'first\nsecond\nthird\nfourth\nfifth\nsixth\n')),
1980+ ])
1981+ return builder
1982
1983 def create_duplicate_lines_tree(self):
1984- tree1 = self.make_branch_and_tree('tree1')
1985+ builder = self.make_branch_builder('branch')
1986+ builder.start_series()
1987+ self.addCleanup(builder.finish_series)
1988 base_text = ''.join(l for r, l in duplicate_base)
1989 a_text = ''.join(l for r, l in duplicate_A)
1990 b_text = ''.join(l for r, l in duplicate_B)
1991 c_text = ''.join(l for r, l in duplicate_C)
1992 d_text = ''.join(l for r, l in duplicate_D)
1993 e_text = ''.join(l for r, l in duplicate_E)
1994- self.build_tree_contents([('tree1/file', base_text)])
1995- tree1.add(['file'], ['file-id'])
1996- tree1.commit('base', rev_id='rev-base')
1997- tree2 = tree1.bzrdir.sprout('tree2').open_workingtree()
1998-
1999- self.build_tree_contents([('tree1/file', a_text),
2000- ('tree2/file', b_text)])
2001- tree1.commit('A', rev_id='rev-A')
2002- tree2.commit('B', rev_id='rev-B')
2003-
2004- tree2.merge_from_branch(tree1.branch)
2005- conflicts.resolve(tree2, None) # Resolve the conflicts
2006- self.build_tree_contents([('tree2/file', d_text)])
2007- tree2.commit('D', rev_id='rev-D')
2008-
2009- self.build_tree_contents([('tree1/file', c_text)])
2010- tree1.commit('C', rev_id='rev-C')
2011-
2012- tree1.merge_from_branch(tree2.branch)
2013- conflicts.resolve(tree1, None) # Resolve the conflicts
2014- self.build_tree_contents([('tree1/file', e_text)])
2015- tree1.commit('E', rev_id='rev-E')
2016- return tree1
2017+ builder.build_snapshot('rev-base', None, [
2018+ ('add', ('', 'root-id', 'directory', None)),
2019+ ('add', ('file', 'file-id', 'file', base_text)),
2020+ ])
2021+ builder.build_snapshot('rev-A', ['rev-base'], [
2022+ ('modify', ('file-id', a_text))])
2023+ builder.build_snapshot('rev-B', ['rev-base'], [
2024+ ('modify', ('file-id', b_text))])
2025+ builder.build_snapshot('rev-C', ['rev-A'], [
2026+ ('modify', ('file-id', c_text))])
2027+ builder.build_snapshot('rev-D', ['rev-B', 'rev-A'], [
2028+ ('modify', ('file-id', d_text))])
2029+ builder.build_snapshot('rev-E', ['rev-C', 'rev-D'], [
2030+ ('modify', ('file-id', e_text))])
2031+ return builder
2032
2033 def assertRepoAnnotate(self, expected, repo, file_id, revision_id):
2034 """Assert that the revision is properly annotated."""
2035@@ -307,8 +274,8 @@
2036
2037 def test_annotate_duplicate_lines(self):
2038 # XXX: Should this be a per_repository test?
2039- tree1 = self.create_duplicate_lines_tree()
2040- repo = tree1.branch.repository
2041+ builder = self.create_duplicate_lines_tree()
2042+ repo = builder.get_branch().repository
2043 repo.lock_read()
2044 self.addCleanup(repo.unlock)
2045 self.assertRepoAnnotate(duplicate_base, repo, 'file-id', 'rev-base')
2046@@ -319,10 +286,10 @@
2047 self.assertRepoAnnotate(duplicate_E, repo, 'file-id', 'rev-E')
2048
2049 def test_annotate_shows_dotted_revnos(self):
2050- tree1, tree2 = self.create_merged_trees()
2051+ builder = self.create_merged_trees()
2052
2053 sio = StringIO()
2054- annotate.annotate_file(tree1.branch, 'rev-3', 'a-id',
2055+ annotate.annotate_file(builder.get_branch(), 'rev-3', 'a-id',
2056 to_file=sio)
2057 self.assertEqualDiff('1 joe@foo | first\n'
2058 '2 joe@foo | second\n'
2059@@ -331,10 +298,10 @@
2060
2061 def test_annotate_limits_dotted_revnos(self):
2062 """Annotate should limit dotted revnos to a depth of 12"""
2063- tree1 = self.create_deeply_merged_trees()
2064+ builder = self.create_deeply_merged_trees()
2065
2066 sio = StringIO()
2067- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2068+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2069 to_file=sio, verbose=False, full=False)
2070 self.assertEqualDiff('1 joe@foo | first\n'
2071 '2 joe@foo | second\n'
2072@@ -345,7 +312,7 @@
2073 sio.getvalue())
2074
2075 sio = StringIO()
2076- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2077+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2078 to_file=sio, verbose=False, full=True)
2079 self.assertEqualDiff('1 joe@foo | first\n'
2080 '2 joe@foo | second\n'
2081@@ -357,7 +324,7 @@
2082
2083 # verbose=True shows everything, the full revno, user id, and date
2084 sio = StringIO()
2085- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2086+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2087 to_file=sio, verbose=True, full=False)
2088 self.assertEqualDiff('1 joe@foo.com 20061213 | first\n'
2089 '2 joe@foo.com 20061213 | second\n'
2090@@ -368,7 +335,7 @@
2091 sio.getvalue())
2092
2093 sio = StringIO()
2094- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2095+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2096 to_file=sio, verbose=True, full=True)
2097 self.assertEqualDiff('1 joe@foo.com 20061213 | first\n'
2098 '2 joe@foo.com 20061213 | second\n'
2099@@ -384,10 +351,10 @@
2100 When annotating a non-mainline revision, the annotation should still
2101 use dotted revnos from the mainline.
2102 """
2103- tree1 = self.create_deeply_merged_trees()
2104+ builder = self.create_deeply_merged_trees()
2105
2106 sio = StringIO()
2107- annotate.annotate_file(tree1.branch, 'rev-1_3_1', 'a-id',
2108+ annotate.annotate_file(builder.get_branch(), 'rev-1_3_1', 'a-id',
2109 to_file=sio, verbose=False, full=False)
2110 self.assertEqualDiff('1 joe@foo | first\n'
2111 '1.1.1 barry@f | third\n'
2112@@ -397,10 +364,10 @@
2113 sio.getvalue())
2114
2115 def test_annotate_show_ids(self):
2116- tree1 = self.create_deeply_merged_trees()
2117+ builder = self.create_deeply_merged_trees()
2118
2119 sio = StringIO()
2120- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2121+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2122 to_file=sio, show_ids=True, full=False)
2123
2124 # It looks better with real revision ids :)
2125@@ -413,7 +380,7 @@
2126 sio.getvalue())
2127
2128 sio = StringIO()
2129- annotate.annotate_file(tree1.branch, 'rev-6', 'a-id',
2130+ annotate.annotate_file(builder.get_branch(), 'rev-6', 'a-id',
2131 to_file=sio, show_ids=True, full=True)
2132
2133 self.assertEqualDiff(' rev-1 | first\n'
2134
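The annotate tests above consume the pair-based `[(revision, line), ...]` output of `annotate_file`; internally, this branch switches to tracking annotations and lines as parallel lists (point 5a of the description). A minimal sketch of the two representations and the conversion between them (function names are hypothetical, not bzrlib API):

```python
def to_parallel(annotated_lines):
    """Split [(annotation, line), ...] into ([annotations], [lines]).

    Keeping the plain lines in their own list lets delta and matching
    code operate on them directly, without repeatedly re-casting a list
    of annotated tuples into a list of plain lines.
    """
    if not annotated_lines:
        return [], []
    annotations, lines = zip(*annotated_lines)
    return list(annotations), list(lines)


def to_pairs(annotations, lines):
    """Recombine parallel lists into [(annotation, line), ...] pairs."""
    return list(zip(annotations, lines))
```

With parallel lists, computing a delta between two texts only ever touches the plain-lines list, which is part of what makes comparing plain lines first (the fix for bug #387294) practical.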
2135=== modified file 'bzrlib/tests/test_knit.py'
2136--- bzrlib/tests/test_knit.py 2009-06-16 13:57:14 +0000
2137+++ bzrlib/tests/test_knit.py 2009-07-08 23:35:28 +0000
2138@@ -366,16 +366,25 @@
2139 :return: (versioned_file, reload_counter)
2140 versioned_file a KnitVersionedFiles using the packs for access
2141 """
2142- tree = self.make_branch_and_memory_tree('tree')
2143- tree.lock_write()
2144- self.addCleanup(tree.unlock)
2145- tree.add([''], ['root-id'])
2146- tree.commit('one', rev_id='rev-1')
2147- tree.commit('two', rev_id='rev-2')
2148- tree.commit('three', rev_id='rev-3')
2149+ builder = self.make_branch_builder('.')
2150+ builder.start_series()
2151+ builder.build_snapshot('rev-1', None, [
2152+ ('add', ('', 'root-id', 'directory', None)),
2153+ ('add', ('file', 'file-id', 'file', 'content\nrev 1\n')),
2154+ ])
2155+ builder.build_snapshot('rev-2', ['rev-1'], [
2156+ ('modify', ('file-id', 'content\nrev 2\n')),
2157+ ])
2158+ builder.build_snapshot('rev-3', ['rev-2'], [
2159+ ('modify', ('file-id', 'content\nrev 3\n')),
2160+ ])
2161+ builder.finish_series()
2162+ b = builder.get_branch()
2163+ b.lock_write()
2164+ self.addCleanup(b.unlock)
2165 # Pack these three revisions into another pack file, but don't remove
2166 # the originals
2167- repo = tree.branch.repository
2168+ repo = b.repository
2169 collection = repo._pack_collection
2170 collection.ensure_loaded()
2171 orig_packs = collection.packs
2172@@ -384,7 +393,7 @@
2173 # forget about the new pack
2174 collection.reset()
2175 repo.refresh_data()
2176- vf = tree.branch.repository.revisions
2177+ vf = repo.revisions
2178 # Set up a reload() function that switches to using the new pack file
2179 new_index = new_pack.revision_index
2180 access_tuple = new_pack.access_tuple()
2181@@ -1313,6 +1322,168 @@
2182 return _KndxIndex(transport, mapper, lambda:None, allow_writes, lambda:True)
2183
2184
2185+class Test_KnitAnnotator(TestCaseWithMemoryTransport):
2186+
2187+ def make_annotator(self):
2188+ factory = knit.make_pack_factory(True, True, 1)
2189+ vf = factory(self.get_transport())
2190+ return knit._KnitAnnotator(vf)
2191+
2192+ def test__expand_fulltext(self):
2193+ ann = self.make_annotator()
2194+ rev_key = ('rev-id',)
2195+ ann._num_compression_children[rev_key] = 1
2196+ res = ann._expand_record(rev_key, (('parent-id',),), None,
2197+ ['line1\n', 'line2\n'], ('fulltext', True))
2198+ # The content object and text lines should be cached appropriately
2199+ self.assertEqual(['line1\n', 'line2'], res)
2200+ content_obj = ann._content_objects[rev_key]
2201+ self.assertEqual(['line1\n', 'line2\n'], content_obj._lines)
2202+ self.assertEqual(res, content_obj.text())
2203+ self.assertEqual(res, ann._text_cache[rev_key])
2204+
2205+ def test__expand_delta_comp_parent_not_available(self):
2206+ # Parent isn't available yet, so we return nothing, but queue up this
2207+ # node for later processing
2208+ ann = self.make_annotator()
2209+ rev_key = ('rev-id',)
2210+ parent_key = ('parent-id',)
2211+ record = ['0,1,1\n', 'new-line\n']
2212+ details = ('line-delta', False)
2213+ res = ann._expand_record(rev_key, (parent_key,), parent_key,
2214+ record, details)
2215+ self.assertEqual(None, res)
2216+ self.assertTrue(parent_key in ann._pending_deltas)
2217+ pending = ann._pending_deltas[parent_key]
2218+ self.assertEqual(1, len(pending))
2219+ self.assertEqual((rev_key, (parent_key,), record, details), pending[0])
2220+
2221+ def test__expand_record_tracks_num_children(self):
2222+ ann = self.make_annotator()
2223+ rev_key = ('rev-id',)
2224+ rev2_key = ('rev2-id',)
2225+ parent_key = ('parent-id',)
2226+ record = ['0,1,1\n', 'new-line\n']
2227+ details = ('line-delta', False)
2228+ ann._num_compression_children[parent_key] = 2
2229+ ann._expand_record(parent_key, (), None, ['line1\n', 'line2\n'],
2230+ ('fulltext', False))
2231+ res = ann._expand_record(rev_key, (parent_key,), parent_key,
2232+ record, details)
2233+ self.assertEqual({parent_key: 1}, ann._num_compression_children)
2234+ # Expanding the second child should remove the content object, and the
2235+ # num_compression_children entry
2236+ res = ann._expand_record(rev2_key, (parent_key,), parent_key,
2237+ record, details)
2238+ self.assertFalse(parent_key in ann._content_objects)
2239+ self.assertEqual({}, ann._num_compression_children)
2240+ # We should not cache the content_objects for rev2 and rev, because
2241+ # they do not have compression children of their own.
2242+ self.assertEqual({}, ann._content_objects)
2243+
2244+ def test__expand_delta_records_blocks(self):
2245+ ann = self.make_annotator()
2246+ rev_key = ('rev-id',)
2247+ parent_key = ('parent-id',)
2248+ record = ['0,1,1\n', 'new-line\n']
2249+ details = ('line-delta', True)
2250+ ann._num_compression_children[parent_key] = 2
2251+ ann._expand_record(parent_key, (), None,
2252+ ['line1\n', 'line2\n', 'line3\n'],
2253+ ('fulltext', False))
2254+ ann._expand_record(rev_key, (parent_key,), parent_key, record, details)
2255+ self.assertEqual({(rev_key, parent_key): [(1, 1, 1), (3, 3, 0)]},
2256+ ann._matching_blocks)
2257+ rev2_key = ('rev2-id',)
2258+ record = ['0,1,1\n', 'new-line\n']
2259+ details = ('line-delta', False)
2260+ ann._expand_record(rev2_key, (parent_key,), parent_key, record, details)
2261+ self.assertEqual([(1, 1, 2), (3, 3, 0)],
2262+ ann._matching_blocks[(rev2_key, parent_key)])
2263+
2264+ def test__get_parent_ann_uses_matching_blocks(self):
2265+ ann = self.make_annotator()
2266+ rev_key = ('rev-id',)
2267+ parent_key = ('parent-id',)
2268+ parent_ann = [(parent_key,)]*3
2269+ block_key = (rev_key, parent_key)
2270+ ann._annotations_cache[parent_key] = parent_ann
2271+ ann._matching_blocks[block_key] = [(0, 1, 1), (3, 3, 0)]
2272+ # We should not try to access any parent_lines content, because we know
2273+ # we already have the matching blocks
2274+ par_ann, blocks = ann._get_parent_annotations_and_matches(rev_key,
2275+ ['1\n', '2\n', '3\n'], parent_key)
2276+ self.assertEqual(parent_ann, par_ann)
2277+ self.assertEqual([(0, 1, 1), (3, 3, 0)], blocks)
2278+ self.assertEqual({}, ann._matching_blocks)
2279+
2280+ def test__process_pending(self):
2281+ ann = self.make_annotator()
2282+ rev_key = ('rev-id',)
2283+ p1_key = ('p1-id',)
2284+ p2_key = ('p2-id',)
2285+ record = ['0,1,1\n', 'new-line\n']
2286+ details = ('line-delta', False)
2287+ p1_record = ['line1\n', 'line2\n']
2288+ ann._num_compression_children[p1_key] = 1
2289+ res = ann._expand_record(rev_key, (p1_key,p2_key), p1_key,
2290+ record, details)
2291+ self.assertEqual(None, res)
2292+ # self.assertTrue(p1_key in ann._pending_deltas)
2293+ self.assertEqual({}, ann._pending_annotation)
2294+ # Now insert p1, and we should be able to expand the delta
2295+ res = ann._expand_record(p1_key, (), None, p1_record,
2296+ ('fulltext', False))
2297+ self.assertEqual(p1_record, res)
2298+ ann._annotations_cache[p1_key] = [(p1_key,)]*2
2299+ res = ann._process_pending(p1_key)
2300+ self.assertEqual([], res)
2301+ self.assertFalse(p1_key in ann._pending_deltas)
2302+ self.assertTrue(p2_key in ann._pending_annotation)
2303+ self.assertEqual({p2_key: [(rev_key, (p1_key, p2_key))]},
2304+ ann._pending_annotation)
2305+ # Now fill in parent 2, and pending annotation should be satisfied
2306+ res = ann._expand_record(p2_key, (), None, [], ('fulltext', False))
2307+ ann._annotations_cache[p2_key] = []
2308+ res = ann._process_pending(p2_key)
2309+ self.assertEqual([rev_key], res)
2310+ self.assertEqual({}, ann._pending_annotation)
2311+ self.assertEqual({}, ann._pending_deltas)
2312+
2313+ def test_record_delta_removes_basis(self):
2314+ ann = self.make_annotator()
2315+ ann._expand_record(('parent-id',), (), None,
2316+ ['line1\n', 'line2\n'], ('fulltext', False))
2317+ ann._num_compression_children['parent-id'] = 2
2318+
2319+ def test_annotate_special_text(self):
2320+ ann = self.make_annotator()
2321+ vf = ann._vf
2322+ rev1_key = ('rev-1',)
2323+ rev2_key = ('rev-2',)
2324+ rev3_key = ('rev-3',)
2325+ spec_key = ('special:',)
2326+ vf.add_lines(rev1_key, [], ['initial content\n'])
2327+ vf.add_lines(rev2_key, [rev1_key], ['initial content\n',
2328+ 'common content\n',
2329+ 'content in 2\n'])
2330+ vf.add_lines(rev3_key, [rev1_key], ['initial content\n',
2331+ 'common content\n',
2332+ 'content in 3\n'])
2333+ spec_text = ('initial content\n'
2334+ 'common content\n'
2335+ 'content in 2\n'
2336+ 'content in 3\n')
2337+ ann.add_special_text(spec_key, [rev2_key, rev3_key], spec_text)
2338+ anns, lines = ann.annotate(spec_key)
2339+ self.assertEqual([(rev1_key,),
2340+ (rev2_key, rev3_key),
2341+ (rev2_key,),
2342+ (rev3_key,),
2343+ ], anns)
2344+ self.assertEqualDiff(spec_text, ''.join(lines))
2345+
2346+
2347 class KnitTests(TestCaseWithTransport):
2348 """Class containing knit test helper routines."""
2349
2350
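Several of the `_KnitAnnotator` tests above feed `_expand_record` raw line-delta records such as `['0,1,1\n', 'new-line\n']`. As a rough sketch of what such a record encodes (simplified; real knit records also carry compression and end-of-line details, which is why the noeol variants above produce shorter matching blocks), each hunk header `start,end,count` replaces `parent_lines[start:end]` with the next `count` delta lines:

```python
def apply_line_delta(parent_lines, delta_lines):
    """Apply a knit-style line delta to a parent text (simplified sketch).

    Each hunk begins with a 'start,end,count\n' header: drop
    parent_lines[start:end] and splice in the following `count` lines
    taken from the delta itself.
    """
    out = []
    pos = 0  # next unconsumed index into parent_lines
    i = 0    # current index into delta_lines
    while i < len(delta_lines):
        start, end, count = map(int, delta_lines[i].split(','))
        i += 1
        out.extend(parent_lines[pos:start])   # unchanged region before the hunk
        out.extend(delta_lines[i:i + count])  # replacement lines from the delta
        i += count
        pos = end
    out.extend(parent_lines[pos:])            # trailing unchanged region
    return out
```

For the record used repeatedly above, applying `['0,1,1\n', 'new-line\n']` to `['line1\n', 'line2\n', 'line3\n']` swaps only the first line, which matches the `(1, 1, 2), (3, 3, 0)` blocks the test expects in the eol case.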
2351=== modified file 'bzrlib/tests/test_versionedfile.py'
2352--- bzrlib/tests/test_versionedfile.py 2009-06-22 15:37:06 +0000
2353+++ bzrlib/tests/test_versionedfile.py 2009-07-08 23:35:28 +0000
2354@@ -1557,6 +1557,42 @@
2355 self.assertRaises(RevisionNotPresent,
2356 files.annotate, prefix + ('missing-key',))
2357
2358+ def test_get_annotator(self):
2359+ files = self.get_versionedfiles()
2360+ self.get_diamond_files(files)
2361+ origin_key = self.get_simple_key('origin')
2362+ base_key = self.get_simple_key('base')
2363+ left_key = self.get_simple_key('left')
2364+ right_key = self.get_simple_key('right')
2365+ merged_key = self.get_simple_key('merged')
2366+ # annotator = files.get_annotator()
2367+ # introduced full text
2368+ origins, lines = files.get_annotator().annotate(origin_key)
2369+ self.assertEqual([(origin_key,)], origins)
2370+ self.assertEqual(['origin\n'], lines)
2371+ # a delta
2372+ origins, lines = files.get_annotator().annotate(base_key)
2373+ self.assertEqual([(base_key,)], origins)
2374+ # a merge
2375+ origins, lines = files.get_annotator().annotate(merged_key)
2376+ if self.graph:
2377+ self.assertEqual([
2378+ (base_key,),
2379+ (left_key,),
2380+ (right_key,),
2381+ (merged_key,),
2382+ ], origins)
2383+ else:
2384+ # Without a graph everything is new.
2385+ self.assertEqual([
2386+ (merged_key,),
2387+ (merged_key,),
2388+ (merged_key,),
2389+ (merged_key,),
2390+ ], origins)
2391+ self.assertRaises(RevisionNotPresent,
2392+ files.get_annotator().annotate, self.get_simple_key('missing-key'))
2393+
2394 def test_construct(self):
2395 """Each parameterised test can be constructed on a transport."""
2396 files = self.get_versionedfiles()
2397
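`test_get_annotator` checks that lines inherited from an ancestor keep that ancestor's origin while new lines get the key of the revision that introduced them. The core move, reusing matching blocks between a parent text and a child text, can be sketched with `difflib` (single-parent only; the real Annotator merges several parents and caches the blocks, as the `_KnitAnnotator` tests show):

```python
import difflib


def annotate_from_parent(parent_annotations, parent_lines, child_lines,
                         child_key):
    """Annotate child lines against one parent using matching blocks.

    Lines inside a matching block inherit the parent's per-line
    annotation tuple; every other line is attributed to child_key.
    """
    # Start by assuming every line is new in the child.
    annotations = [(child_key,)] * len(child_lines)
    matcher = difflib.SequenceMatcher(None, parent_lines, child_lines)
    for parent_idx, child_idx, length in matcher.get_matching_blocks():
        for offset in range(length):
            # This child line also exists in the parent: keep its origin.
            annotations[child_idx + offset] = \
                parent_annotations[parent_idx + offset]
    return annotations
```

Caching the matching blocks per `(child, parent)` pair, as `_matching_blocks` does above, avoids recomputing exactly this diff when the knit delta already implied it.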
2398=== modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
2399--- bzrlib/tests/workingtree_implementations/__init__.py 2009-06-10 03:56:49 +0000
2400+++ bzrlib/tests/workingtree_implementations/__init__.py 2009-07-08 23:35:28 +0000
2401@@ -63,6 +63,7 @@
2402 test_workingtree_implementations = [
2403 'bzrlib.tests.workingtree_implementations.test_add_reference',
2404 'bzrlib.tests.workingtree_implementations.test_add',
2405+ 'bzrlib.tests.workingtree_implementations.test_annotate_iter',
2406 'bzrlib.tests.workingtree_implementations.test_basis_inventory',
2407 'bzrlib.tests.workingtree_implementations.test_basis_tree',
2408 'bzrlib.tests.workingtree_implementations.test_break_lock',
2409
2410=== added file 'bzrlib/tests/workingtree_implementations/test_annotate_iter.py'
2411--- bzrlib/tests/workingtree_implementations/test_annotate_iter.py 1970-01-01 00:00:00 +0000
2412+++ bzrlib/tests/workingtree_implementations/test_annotate_iter.py 2009-07-08 23:35:28 +0000
2413@@ -0,0 +1,189 @@
2414+# Copyright (C) 2009 Canonical Ltd
2415+#
2416+# This program is free software; you can redistribute it and/or modify
2417+# it under the terms of the GNU General Public License as published by
2418+# the Free Software Foundation; either version 2 of the License, or
2419+# (at your option) any later version.
2420+#
2421+# This program is distributed in the hope that it will be useful,
2422+# but WITHOUT ANY WARRANTY; without even the implied warranty of
2423+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2424+# GNU General Public License for more details.
2425+#
2426+# You should have received a copy of the GNU General Public License
2427+# along with this program; if not, write to the Free Software
2428+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
2429+
2430+"""Tests for interface conformance of 'WorkingTree.annotate_iter'"""
2431+
2432+import os
2433+
2434+from bzrlib import (
2435+ errors,
2436+ inventory,
2437+ osutils,
2438+ tests,
2439+ )
2440+from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree
2441+
2442+
2443+class TestAnnotateIter(TestCaseWithWorkingTree):
2444+
2445+ def make_single_rev_tree(self):
2446+ builder = self.make_branch_builder('branch')
2447+ builder.build_snapshot('rev-1', None, [
2448+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2449+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2450+ ])
2451+ b = builder.get_branch()
2452+ tree = b.create_checkout('tree', lightweight=True)
2453+ tree.lock_read()
2454+ self.addCleanup(tree.unlock)
2455+ return tree
2456+
2457+ def test_annotate_same_as_parent(self):
2458+ tree = self.make_single_rev_tree()
2459+ annotations = tree.annotate_iter('file-id')
2460+ self.assertEqual([('rev-1', 'initial content\n')],
2461+ annotations)
2462+
2463+ def test_annotate_mod_from_parent(self):
2464+ tree = self.make_single_rev_tree()
2465+ self.build_tree_contents([('tree/file',
2466+ 'initial content\nnew content\n')])
2467+ annotations = tree.annotate_iter('file-id')
2468+ self.assertEqual([('rev-1', 'initial content\n'),
2469+ ('current:', 'new content\n'),
2470+ ], annotations)
2471+
2472+ def test_annotate_merge_parents(self):
2473+ builder = self.make_branch_builder('branch')
2474+ builder.start_series()
2475+ builder.build_snapshot('rev-1', None, [
2476+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2477+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2478+ ])
2479+ builder.build_snapshot('rev-2', ['rev-1'], [
2480+ ('modify', ('file-id', 'initial content\ncontent in 2\n')),
2481+ ])
2482+ builder.build_snapshot('rev-3', ['rev-1'], [
2483+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2484+ ])
2485+ builder.finish_series()
2486+ b = builder.get_branch()
2487+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2488+ tree.lock_write()
2489+ self.addCleanup(tree.unlock)
2490+ tree.set_parent_ids(['rev-2', 'rev-3'])
2491+ self.build_tree_contents([('tree/file',
2492+ 'initial content\ncontent in 2\n'
2493+ 'content in 3\nnew content\n')])
2494+ annotations = tree.annotate_iter('file-id')
2495+ self.assertEqual([('rev-1', 'initial content\n'),
2496+ ('rev-2', 'content in 2\n'),
2497+ ('rev-3', 'content in 3\n'),
2498+ ('current:', 'new content\n'),
2499+ ], annotations)
2500+
2501+ def test_annotate_merge_parent_no_file(self):
2502+ builder = self.make_branch_builder('branch')
2503+ builder.start_series()
2504+ builder.build_snapshot('rev-1', None, [
2505+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2506+ ])
2507+ builder.build_snapshot('rev-2', ['rev-1'], [
2508+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2509+ ])
2510+ builder.build_snapshot('rev-3', ['rev-1'], [])
2511+ builder.finish_series()
2512+ b = builder.get_branch()
2513+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2514+ tree.lock_write()
2515+ self.addCleanup(tree.unlock)
2516+ tree.set_parent_ids(['rev-2', 'rev-3'])
2517+ self.build_tree_contents([('tree/file',
2518+ 'initial content\nnew content\n')])
2519+ annotations = tree.annotate_iter('file-id')
2520+ self.assertEqual([('rev-2', 'initial content\n'),
2521+ ('current:', 'new content\n'),
2522+ ], annotations)
2523+
2524+ def test_annotate_merge_parent_was_directory(self):
2525+ builder = self.make_branch_builder('branch')
2526+ builder.start_series()
2527+ builder.build_snapshot('rev-1', None, [
2528+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2529+ ])
2530+ builder.build_snapshot('rev-2', ['rev-1'], [
2531+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2532+ ])
2533+ builder.build_snapshot('rev-3', ['rev-1'], [
2534+ ('add', ('a_dir', 'file-id', 'directory', None)),
2535+ ])
2536+ builder.finish_series()
2537+ b = builder.get_branch()
2538+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2539+ tree.lock_write()
2540+ self.addCleanup(tree.unlock)
2541+ tree.set_parent_ids(['rev-2', 'rev-3'])
2542+ self.build_tree_contents([('tree/file',
2543+ 'initial content\nnew content\n')])
2544+ annotations = tree.annotate_iter('file-id')
2545+ self.assertEqual([('rev-2', 'initial content\n'),
2546+ ('current:', 'new content\n'),
2547+ ], annotations)
2548+
2549+ def test_annotate_same_as_merge_parent(self):
2550+ builder = self.make_branch_builder('branch')
2551+ builder.start_series()
2552+ builder.build_snapshot('rev-1', None, [
2553+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2554+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2555+ ])
2556+ builder.build_snapshot('rev-2', ['rev-1'], [
2557+ ])
2558+ builder.build_snapshot('rev-3', ['rev-1'], [
2559+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2560+ ])
2561+ builder.finish_series()
2562+ b = builder.get_branch()
2563+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2564+ tree.lock_write()
2565+ self.addCleanup(tree.unlock)
2566+ tree.set_parent_ids(['rev-2', 'rev-3'])
2567+ self.build_tree_contents([('tree/file',
2568+ 'initial content\ncontent in 3\n')])
2569+ annotations = tree.annotate_iter('file-id')
2570+ self.assertEqual([('rev-1', 'initial content\n'),
2571+ ('rev-3', 'content in 3\n'),
2572+ ], annotations)
2573+
2574+ def test_annotate_same_as_merge_parent_supersedes(self):
2575+ builder = self.make_branch_builder('branch')
2576+ builder.start_series()
2577+ builder.build_snapshot('rev-1', None, [
2578+ ('add', ('', 'TREE_ROOT', 'directory', None)),
2579+ ('add', ('file', 'file-id', 'file', 'initial content\n')),
2580+ ])
2581+ builder.build_snapshot('rev-2', ['rev-1'], [
2582+ ('modify', ('file-id', 'initial content\nnew content\n')),
2583+ ])
2584+ builder.build_snapshot('rev-3', ['rev-2'], [
2585+ ('modify', ('file-id', 'initial content\ncontent in 3\n')),
2586+ ])
2587+ builder.build_snapshot('rev-4', ['rev-3'], [
2588+ ('modify', ('file-id', 'initial content\nnew content\n')),
2589+ ])
2590+ # In this case, the content locally is the same as content in basis
2591+ # tree, but the merge revision states that *it* should win
2592+ builder.finish_series()
2593+ b = builder.get_branch()
2594+ tree = b.create_checkout('tree', revision_id='rev-2', lightweight=True)
2595+ tree.lock_write()
2596+ self.addCleanup(tree.unlock)
2597+ tree.set_parent_ids(['rev-2', 'rev-4'])
2598+ annotations = tree.annotate_iter('file-id')
2599+ self.assertEqual([('rev-1', 'initial content\n'),
2600+ ('rev-4', 'new content\n'),
2601+ ], annotations)
2602+
2603
2604=== modified file 'bzrlib/transform.py'
2605--- bzrlib/transform.py 2009-06-30 04:08:12 +0000
2606+++ bzrlib/transform.py 2009-07-08 23:35:27 +0000
2607@@ -1,4 +1,4 @@
2608-# Copyright (C) 2006, 2007, 2008 Canonical Ltd
2609+# Copyright (C) 2006, 2007, 2008, 2009 Canonical Ltd
2610 #
2611 # This program is free software; you can redistribute it and/or modify
2612 # it under the terms of the GNU General Public License as published by
2613@@ -1962,6 +1962,13 @@
2614 return old_annotation
2615 if not changed_content:
2616 return old_annotation
2617+ # TODO: This is doing something similar to what WT.annotate_iter is
2618+ # doing, however it fails slightly because it doesn't know what
2619+ # the *other* revision_id is, so it doesn't know how to give the
2620+ # other as the origin for some lines, they all get
2621+ # 'default_revision'
2622+ # It would be nice to be able to use the new Annotator based
2623+ # approach, as well.
2624 return annotate.reannotate([old_annotation],
2625 self.get_file(file_id).readlines(),
2626 default_revision)
2627
2628=== modified file 'bzrlib/versionedfile.py'
2629--- bzrlib/versionedfile.py 2009-06-22 15:47:25 +0000
2630+++ bzrlib/versionedfile.py 2009-07-08 23:35:28 +0000
2631@@ -30,6 +30,7 @@
2632 import urllib
2633
2634 from bzrlib import (
2635+ annotate,
2636 errors,
2637 groupcompress,
2638 index,
2639@@ -1122,6 +1123,9 @@
2640 result.append((prefix + (origin,), line))
2641 return result
2642
2643+ def get_annotator(self):
2644+ return annotate.Annotator(self)
2645+
2646 def check(self, progress_bar=None):
2647 """See VersionedFiles.check()."""
2648 for prefix, vf in self._iter_all_components():
2649
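The `annotate_iter` rewrite in `bzrlib/workingtree.py` below calls `annotator.annotate_flat(this_key)` and keeps `key[-1]` for each line. A flat annotation has to collapse lines that carry several origins (as in `test_annotate_special_text`, where one line is attributed to both rev-2 and rev-3) down to a single one. A toy sketch, using a plain sort as the tie-break where bzrlib actually consults the revision graph's heads:

```python
def annotate_flat(annotations, lines):
    """Collapse per-line origin tuples to single (origin, line) pairs.

    Each entry in `annotations` is a tuple of every revision that
    introduced the matching line. For flat output one of them must win;
    sorting and taking the first is a deterministic stand-in for the
    real head-based resolution against the revision graph.
    """
    return [(sorted(origins)[0], line)
            for origins, line in zip(annotations, lines)]
```

Only the multi-origin lines are affected; lines with a single origin pass through unchanged.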
2650=== modified file 'bzrlib/workingtree.py'
2651--- bzrlib/workingtree.py 2009-07-01 10:40:07 +0000
2652+++ bzrlib/workingtree.py 2009-07-08 23:35:28 +0000
2653@@ -1,4 +1,4 @@
2654-# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd
2655+# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd
2656 #
2657 # This program is free software; you can redistribute it and/or modify
2658 # it under the terms of the GNU General Public License as published by
2659@@ -58,6 +58,7 @@
2660 errors,
2661 generate_ids,
2662 globbing,
2663+ graph as _mod_graph,
2664 hashcache,
2665 ignores,
2666 inventory,
2667@@ -477,31 +478,42 @@
2668 incorrectly attributed to CURRENT_REVISION (but after committing, the
2669 attribution will be correct).
2670 """
2671- basis = self.basis_tree()
2672- basis.lock_read()
2673- try:
2674- changes = self.iter_changes(basis, True, [self.id2path(file_id)],
2675- require_versioned=True).next()
2676- changed_content, kind = changes[2], changes[6]
2677- if not changed_content:
2678- return basis.annotate_iter(file_id)
2679- if kind[1] is None:
2680- return None
2681- import annotate
2682- if kind[0] != 'file':
2683- old_lines = []
2684- else:
2685- old_lines = list(basis.annotate_iter(file_id))
2686- old = [old_lines]
2687- for tree in self.branch.repository.revision_trees(
2688- self.get_parent_ids()[1:]):
2689- if file_id not in tree:
2690- continue
2691- old.append(list(tree.annotate_iter(file_id)))
2692- return annotate.reannotate(old, self.get_file(file_id).readlines(),
2693- default_revision)
2694- finally:
2695- basis.unlock()
2696+ maybe_file_parent_keys = []
2697+ for parent_id in self.get_parent_ids():
2698+ try:
2699+ parent_tree = self.revision_tree(parent_id)
2700+ except errors.NoSuchRevisionInTree:
2701+ parent_tree = self.branch.repository.revision_tree(parent_id)
2702+ parent_tree.lock_read()
2703+ try:
2704+ if file_id not in parent_tree:
2705+ continue
2706+ ie = parent_tree.inventory[file_id]
2707+ if ie.kind != 'file':
2708+ # Note: this is slightly unnecessary, because symlinks and
2709+ # directories have a "text" which is the empty text, and we
2710+ # know that won't mess up annotations. But it seems cleaner
2711+ continue
2712+ parent_text_key = (file_id, ie.revision)
2713+ if parent_text_key not in maybe_file_parent_keys:
2714+ maybe_file_parent_keys.append(parent_text_key)
2715+ finally:
2716+ parent_tree.unlock()
2717+ graph = _mod_graph.Graph(self.branch.repository.texts)
2718+ heads = graph.heads(maybe_file_parent_keys)
2719+ file_parent_keys = []
2720+ for key in maybe_file_parent_keys:
2721+ if key in heads:
2722+ file_parent_keys.append(key)
2723+
2724+ # Now we have the parents of this content
2725+ annotator = self.branch.repository.texts.get_annotator()
2726+ text = self.get_file(file_id).read()
2727+ this_key =(file_id, default_revision)
2728+ annotator.add_special_text(this_key, file_parent_keys, text)
2729+ annotations = [(key[-1], line)
2730+ for key, line in annotator.annotate_flat(this_key)]
2731+ return annotations
2732
2733 def _get_ancestors(self, default_revision):
2734 ancestors = set([default_revision])
2735
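The new `annotate_iter` above gathers a per-file parent key for every tree parent, then uses `Graph.heads()` to drop keys that another candidate supersedes; this is what makes `test_annotate_same_as_merge_parent_supersedes` attribute the line to rev-4 rather than the basis. A toy version over a precomputed ancestry map (the real code walks `repository.texts`; the `ancestry` dict here is a hypothetical stand-in):

```python
def simple_heads(candidates, ancestry):
    """Return the candidates that no other candidate descends from.

    `ancestry` maps each key to the set of all its ancestor keys. A
    candidate that is an ancestor of another candidate is superseded
    and dropped, mirroring Graph.heads() on the per-file text graph.
    """
    heads = []
    for key in candidates:
        superseded = any(key in ancestry.get(other, set())
                        for other in candidates if other != key)
        if not superseded:
            heads.append(key)
    return heads
```

In the supersedes test, the basis key for rev-2 is an ancestor of the merged rev-4 key, so only the rev-4 key survives and its annotations win.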
2736=== modified file 'setup.py'
2737--- setup.py 2009-06-23 07:10:03 +0000
2738+++ setup.py 2009-07-08 23:35:27 +0000
2739@@ -259,6 +259,7 @@
2740 define_macros=define_macros, libraries=libraries))
2741
2742
2743+add_pyrex_extension('bzrlib._annotator_pyx')
2744 add_pyrex_extension('bzrlib._bencode_pyx')
2745 add_pyrex_extension('bzrlib._btree_serializer_pyx')
2746 add_pyrex_extension('bzrlib._chunks_to_lines_pyx')