Merge lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/per-file-merge-hook-491711
Merge into: lp:bzr
Prerequisite: lp:~spiv/bzr/merge-minor-refactoring
Diff against target: 960 lines (+672/-74)
10 files modified
NEWS (+16/-0)
bzrlib/decorators.py (+76/-0)
bzrlib/hooks.py (+2/-0)
bzrlib/merge.py (+176/-74)
bzrlib/plugins/news_merge/README (+9/-0)
bzrlib/plugins/news_merge/__init__.py (+73/-0)
bzrlib/plugins/news_merge/news_merge.py (+88/-0)
bzrlib/plugins/news_merge/parser.py (+62/-0)
bzrlib/tests/__init__.py (+1/-0)
bzrlib/tests/per_merger.py (+169/-0)
To merge this branch: bzr merge lp:~spiv/bzr/per-file-merge-hook-491711
Reviewer Review Type Date Requested Status
John A Meinel Approve
Vincent Ladeuil Approve
Review via email: mp+17279@code.launchpad.net

This proposal supersedes a proposal from 2009-12-18.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).

The tests hopefully make the capabilities fairly clear. In short:

 * it is invoked for all changes where one side has changed the content, and the other side has changed the content or deleted the file
 * hook functions are passed the merger, file_id and trans_id (so can get access to filenames and file contents in the various this/other/base trees)
 * hook functions can explicitly do nothing (allowing the next registered function to try), or return a conflict, a successful merge, or a deletion of the file.
 * hook functions can set the contents of a file in both the success and conflict case

Some open questions (could be addressed in future patches perhaps):

 * should merge of a file rename vs. a text change use this hook point?
 * should a hook function be able to have some control over the emission of THIS/BASE/OTHER files when it results in a conflict, or over their contents of those files?
 * what helpers can we provide to help people write hook functions?
 * what more can we do to help multiple hook registrations co-exist peacefully? e.g. These functions already chain, but if multiple plugins register with this hook the ordering is not defined.

I think ideally this hook point would be marked experimental until we get some practical experience proving that it works well in practice (e.g. that someone can write a plugin to merge NEWS files), but that's what beta series are for ;)

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>
> * it is invoked for all changes where one side has changed the
> content, and the other side has changed the content or deleted the fil

Why not 'all changes' ? I don't see why we should prevent hooks merging
even one-side changes in a more sophisticated manner.

> * should a hook function be able to have some control over the
> emission of THIS/BASE/OTHER files when it results in a conflict, or
> over their contents of those files?

I think yes, but that can come later.

> * what helpers can we provide to help people write hook functions?
> * what more can we do to help multiple hook registrations co-exist
> peacefully? e.g. These functions already chain, but if multiple
> plugins register with this hook the ordering is not defined.

I don't think the ordering matters, except in the case that someone is
writing a new 'fallback' hook, to replace our three-way merge attempt.
Perhaps put a priority into the hook or something? I'd wait for signs we
need it.

> I think ideally this hook point would be marked experimental until we
> get some practical experience proving that it works well in practice
> (e.g. that someone can write a plugin to merge NEWS files), but that's
> what beta series are for ;)

You can just mark it experimental in the docs ;)

-Rob

Revision history for this message
James Westby (james-w) wrote : Posted in a previous version of this proposal

On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).

Yay, thanks for working on this.

> The tests hopefully make the capabilities fairly clear. In short:
>
> * hook functions are passed the merger, file_id and trans_id (so can
> get access to filenames and file contents in the various
> this/other/base trees)

Is there any other information that is readily available?

I am a little concerned about the performance issues. bzr-builddeb is
going to be shipping one of these hooks, and I don't want that to
destroy the performance of merge.

At minimum on every conflict it has to look up the path names. If
we do as Rob suggests and call the hook on every modification that
is going to be O(tree) lookups.

Some way to short-circuit this might be good. Either passing in the
paths if they are already known (and are likely to be known even
with refactoring), or even allowing the hook to declare what paths
it is interested in (though that could be limiting).

  (e.g. if path_filter is not None and basename == path_filter:)

Have you written a very simple implementation of e.g. a NEWS merger
and measured the impact?

> Some open questions (could be addressed in future patches perhaps):
>
> * should merge of a file rename vs. a text change use this hook
> point?

I imagine we would want to make it possible.

Some helpers would be useful for splitting out the cases, e.g. I imagine
many will just want to act on two text changes.

> * should a hook function be able to have some control over the
> emission of THIS/BASE/OTHER files when it results in a conflict, or
> over their contents of those files?

I'm not sure I see a use case for this.

> * what helpers can we provide to help people write hook functions?

Helpers for getting the paths, and other meta information.

Helpers for determining the cause of the conflict, text changes vs.
path changes etc.

> * what more can we do to help multiple hook registrations co-exist
> peacefully? e.g. These functions already chain, but if multiple
> plugins register with this hook the ordering is not defined.

I think using a known ordering would be good. Even using sorted() on
the hook names. This would allow you to place something at the end
by calling it zzfallback for instance.

Otherwise changing the API to allow a priority could be good. However,
these APIs are always tricky to deal with, as you are playing a game
of rock-paper-scissors with and unbounded number of choices.

I bring it up now, as it may require API changes in the registration
to do this.

Thanks,

James

Revision history for this message
Aaron Bentley (abentley) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>> * it is invoked for all changes where one side has changed the
>> content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

That would require us to pay attention to THIS-only changes, and that
would be a performance regression.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksrtrIACgkQ0F+nu1YWqI3a6wCeNdmME7ZY/xSZ7ma6snOd30fE
XUUAni6xZaHfi47GfpkTivlXF8uFDa5w
=uew/
-----END PGP SIGNATURE-----

Revision history for this message
Aaron Bentley (abentley) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Westby wrote:
> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
>> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>> * should a hook function be able to have some control over the
>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>> over their contents of those files?
>
> I'm not sure I see a use case for this.

merge3 and weave merge emit different things as their BASE, so if we
hooked merge to force a weave merge for certain files, we might want to
also use that style of BASE.

Can't see any use for controlling THIS or OTHER, though.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksrt6oACgkQ0F+nu1YWqI26pgCdF24SIHw4I4pnYV1DpikUoPwT
3EgAn08cMSUKDneD39c0rBt4aKK+zYCh
=QYTu
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>> * it is invoked for all changes where one side has changed the
>> content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

I assume this is because the merge code itself splits these cases. Files
with clear winners don't get passed into the Merger code. Also, there is
bidirectional issues.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrw+oACgkQJdeBCYSNAANPTQCeLRj5ST7fT58nGOkIJldtyNis
9fsAoMnNoCWak28nr8shdECpIyLRbxtY
=lRgj
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> Is there any other information that is readily available?
>
> I am a little concerned about the performance issues. bzr-builddeb is
> going to be shipping one of these hooks, and I don't want that to
> destroy the performance of merge.
>
> At minimum on every conflict it has to look up the path names. If
> we do as Rob suggests and call the hook on every modification that
> is going to be O(tree) lookups.

Still O(changes), I don't see how you get O(tree).

>
> Some way to short-circuit this might be good. Either passing in the
> paths if they are already known (and are likely to be known even
> with refactoring), or even allowing the hook to declare what paths
> it is interested in (though that could be limiting).

I think passing in the "path_in_this_tree, path_in_other_tree, and
path_in_final_tree" as determined by the current merge logic would be
reasonable.

>
> (e.g. if path_filter is not None and basename == path_filter:)
>
> Have you written a very simple implementation of e.g. a NEWS merger
> and measured the impact?
>
>> Some open questions (could be addressed in future patches perhaps):
>>
>> * should merge of a file rename vs. a text change use this hook
>> point?
>
> I imagine we would want to make it possible.
>
> Some helpers would be useful for splitting out the cases, e.g. I imagine
> many will just want to act on two text changes.
>
>> * should a hook function be able to have some control over the
>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>> over their contents of those files?
>
> I'm not sure I see a use case for this.
>
>> * what helpers can we provide to help people write hook functions?
>
> Helpers for getting the paths, and other meta information.
>
> Helpers for determining the cause of the conflict, text changes vs.
> path changes etc.
>
>> * what more can we do to help multiple hook registrations co-exist
>> peacefully? e.g. These functions already chain, but if multiple
>> plugins register with this hook the ordering is not defined.
>
> I think using a known ordering would be good. Even using sorted() on
> the hook names. This would allow you to place something at the end
> by calling it zzfallback for instance.
>
> Otherwise changing the API to allow a priority could be good. However,
> these APIs are always tricky to deal with, as you are playing a game
> of rock-paper-scissors with and unbounded number of choices.
>
> I bring it up now, as it may require API changes in the registration
> to do this.
>
> Thanks,
>
> James

I think explicitly making them unordered forces people to think about
whether they can do what they need in an unordered fashion.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrxLwACgkQJdeBCYSNAANiYwCgpaTb3KqL65jqOHou/mpzrQwt
/3oAn1TyJC5/tS1+Xc0+3//PRxoai0lI
=Gavq
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> James Westby wrote:
>> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
>>> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>>> * should a hook function be able to have some control over the
>>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>>> over their contents of those files?
>> I'm not sure I see a use case for this.
>
> merge3 and weave merge emit different things as their BASE, so if we
> hooked merge to force a weave merge for certain files, we might want to
> also use that style of BASE.
>
> Can't see any use for controlling THIS or OTHER, though.
>
> Aaron

Well, potentially a whitespace ignoring merger may want to pass that
ability on to the output files?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrxN0ACgkQJdeBCYSNAAMJ9gCgxngkhtNINZPw7iNuWPe8JUs3
6UIAn30zdkrAz6hhN2PoLlhgxsfy0vCm
=omK9
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-12-18 at 17:09 +0000, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
> >> * it is invoked for all changes where one side has changed the
> >> content, and the other side has changed the content or deleted the fil
> >
> > Why not 'all changes' ? I don't see why we should prevent hooks merging
> > even one-side changes in a more sophisticated manner.
>
> That would require us to pay attention to THIS-only changes, and that
> would be a performance regression.

So there is a middle ground, where OTHER-only changes are examined.
There are a number of useful things one can do where OTHER-only changes
are examined (such as better merges of NEWS items by accounting for
changes after a release) which the current definition won't help with. I
see that checking for THIS-only changes would have an impact, and while
it might be nice, I don't see a need to do it at this point.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal
Download full text (6.4 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #491711 hook to permit per file merges
> https://bugs.launchpad.net/bugs/491711
>
>
> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>
> The tests hopefully make the capabilities fairly clear. In short:
>
> * it is invoked for all changes where one side has changed the content, and the other side has changed the content or deleted the file
> * hook functions are passed the merger, file_id and trans_id (so can get access to filenames and file contents in the various this/other/base trees)
> * hook functions can explicitly do nothing (allowing the next registered function to try), or return a conflict, a successful merge, or a deletion of the file.
> * hook functions can set the contents of a file in both the success and conflict case
>
> Some open questions (could be addressed in future patches perhaps):
>
> * should merge of a file rename vs. a text change use this hook point?
> * should a hook function be able to have some control over the emission of THIS/BASE/OTHER files when it results in a conflict, or over their contents of those files?
> * what helpers can we provide to help people write hook functions?
> * what more can we do to help multiple hook registrations co-exist peacefully? e.g. These functions already chain, but if multiple plugins register with this hook the ordering is not defined.
>
> I think ideally this hook point would be marked experimental until we get some practical experience proving that it works well in practice (e.g. that someone can write a plugin to merge NEWS files), but that's what beta series are for ;)
>
>

+ self.create_hook(hooks.HookPoint('merge_file_content',
+ "Called when file content needs to be merged (including
when one "
+ "side has deleted the file and the other has changed it)."
+ "merge_file_content is called with a "
+ "bzrlib.merge.MergeHookParams. The function should return a
tuple "
+ "of (status, lines), where status is one of 'not_applicable', "
+ "'success', 'conflicted', or 'delete'. If status is
success or "
+ "conflicted, then lines should be an iterable of the new
lines "
+ "for the file.",
+ (2, 1), None))

^- I'm wondering if we wouldn't want something smarter than a tuple of
(status, lines) being returned. It might give us a way to move to
supporting returning THIS/BASE/OTHER more directly. It may also make it
easier to handle path conflicts / updating the path as part of the hook.

I think that since the transform is being passed in (via the Merger
object), you actually can do any renaming you want. You just tell the
transform that the path for 'trans_id' should really be 'X'.

As james_w mentioned, we may want to make the api a little bit more
verbose, and have some helper functions o...

Read more...

review: Needs Information
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal
Download full text (13.7 KiB)

First, thanks everyone for the deluge of thoughtful comments! It's
great to have so much enthusiasm about a patch :)

I'm going to reply to all the comments in this one jumbo reply, rather
than try to detangle the conversation threads this proposal has spawned.
I've skipped over some comments that are repeats of others, but I think
I've replied to every point. Do let me know if I've inadvertantly
forgotten something you'd like me to consider.

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
> >
> > * it is invoked for all changes where one side has changed the
> > content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

Hmm. I guess because if there's a clear one-side winner then I don't
think of that as a “merge” (from the point of view of that file), and I
didn't have a use-case for it. I was also trying to be mindful of
keeping the performance cost minimal, as Aaron mentioned. You have
given a use-case in a follow-up (“better merges of NEWS items by
accounting for changes after a release”), and that would definitely be
nice to have. I'll have a play and see what it would take to make
OTHER-only changes go via the hook, as your followup suggests. It
sounds like no-one is strongly interested in THIS-only at this point, so
I'm happy to ignore that for this patch.

Also, I know it's a bit of a cop-out answer, but a plugin could always
implement a whole new Merger if the hook isn't powerful enough...

John's reply to you said:

] I assume this is because the merge code itself splits these cases. Files
] with clear winners don't get passed into the Merger code. Also, there is
] bidirectional issues.

Right, I definitely did want to keep the changes as minimal as possible
because I'm not deeply familiar with the structure of the merge code,
but I do get the sense that it would be easy to accidentally introduce
bugs if I rearranged things.

I'm not certain what you mean by bidirectional issues? I think I can
make some educated guesses, but I'd rather you explained for me :)

> > * should a hook function be able to have some control over the
> > emission of THIS/BASE/OTHER files when it results in a conflict, or
> > over their contents of those files?
>
> I think yes, but that can come later.

Cool.

James Westby wrote:
> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
[...]
> > * hook functions are passed the merger, file_id and trans_id (so can
> > get access to filenames and file contents in the various
> > this/other/base trees)
>
> Is there any other information that is readily available?

As John says, the path in the various tree versions are fairly cheaply
available via the merger object. I think it'll be worth adding
convenience methods to get at them. I imagine some plugins might want
to just check the basename, others would like to check the full path,
some might just use the file_id.

> I am a little concerned about the performance issues. bzr-builddeb is
> going to be shipping one of these hooks, ...

Revision history for this message
James Westby (james-w) wrote : Posted in a previous version of this proposal

On Mon, 21 Dec 2009 02:39:14 -0000, Andrew Bennetts <email address hidden> wrote:
> I'm interested in the “on every conflict it [bzr-builddeb?] has to look
> up the path names” remark though — I don't think this hook as-is
> provides a convenient way to implement “try the regular logic, if that
> conflicts then try my custom plugin logic”. Is that something you want?
> I suspect the way to provide that nicely may be via making the regular
> logic interact via the same interfaces as the hook (i.e. receiving the
> same params object and returning the same values), and then providing a
> convenient way to call that via the hook params.

The way I imagine it working is that a hook will be registered by
bzr-builddeb that will merge debian/changelog files. The question
is to know what files to act on. We can try and parse the files
and just skip everything that doesn't parse as one of those files.
However, that will be both quite slow, plus hugely pessimistic.

Therefore I would want to short-circuit, and only bother trying
to parse files where the basename is "changelog". It would be
great if the basename was easily accesible to the hook so
that it could do this check without doing something like
trans_id->file_id->path.

> Yeah, that's a good point. Even many those that want to handle more than
> plain text vs text changes might still want to punt on delete vs text,
> for example. So I need to take the knowledge about what sort of changes
> are involved that's implicit in the code path now and turn that into
> some data that the hook function can inspect.

Yep, I'm mainly concerned in these comments about the API that is
presented to plugin writers. Being able simply separate out the cases
without each plugin having to cargo cult code to detect what situation
it is being called for would be good.

I think this will partially come from APIs, which I imagine are there,
I just haven't looked, and from docs. We should docuement the things
people commonly want to do so that it is obvious to plugin writers.
My concern would be that you write a hook that assumes you are always
dealing with text vs. text, and then it crashes when you get text vs.
delete. Making it clear that you can detect the cases at least points
you towards considering which you want to handle, even if the API
doesn't force you to think about them all.

> Well what sort of priority system do you have in mind that would be
> better than ordering by name?

Nothing in particular. I was just making a plea for a defined order.
If e.g. the order was whatever dict.keys() returns then there is no
way to get your hook to run last if you need to.

Thanks,

James

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-12-21 at 02:57 +0000, James Westby wrote:

> Nothing in particular. I was just making a plea for a defined order.
> If e.g. the order was whatever dict.keys() returns then there is no
> way to get your hook to run last if you need to.

You don't with names either.

'Run last' is very very hard to pin down.

'Topological order' is a bit easier, 'numeric order' is very simple.

Hooks don't have names so much as descriptions, I strongly urge /not/ to
use the labels as a sort order; ugh. We don't have orders elsewhere in
hook land and we haven't needed to - even though this discussion happens
every time.

Re: merging changelog, I'd lookup the full path and compare to
'debian/changelog' and 'changelog' - its simple and should be fast.

Andrew: are kind changes able to invoke this hook? If not please make
that clear.

I think passing in the Merger is fine; we pass in Branch in other hooks
and trust folk to use public bits and not make a mess. If someone does
fiddle, the conflict resolution phases will detect that pretty robustly
I think.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> On Mon, 2009-12-21 at 02:57 +0000, James Westby wrote:
>
> > Nothing in particular. I was just making a plea for a defined order.
> > If e.g. the order was whatever dict.keys() returns then there is no
> > way to get your hook to run last if you need to.
>
> You don't with names either.
>
> 'Run last' is very very hard to pin down.
>
> 'Topological order' is a bit easier, 'numeric order' is very simple.
>
> Hooks don't have names so much as descriptions, I strongly urge /not/ to
> use the labels as a sort order; ugh. We don't have orders elsewhere in
> hook land and we haven't needed to - even though this discussion happens
> every time.

Yeah, I don't much like punning that. I'd like to make the order
defined and stable somehow rather than dependent on the somewhat
unstable order that plugins will get loaded, though.

> Re: merging changelog, I'd lookup the full path and compare to
> 'debian/changelog' and 'changelog' - its simple and should be fast.

I definitely think that's what I'd try to start with. Maybe that will
be a significant performance hit in which case we can rethink it, but
let's not prematurely optimise.

> Andrew: are kind changes able to invoke this hook? If not please make
> that clear.

I don't know off the top of my head. I'll find out and make it
explicit. My guess is no, but I'm not very confident about that.

> I think passing in the Merger is fine; we pass in Branch in other hooks
> and trust folk to use public bits and not make a mess. If someone does
> fiddle, the conflict resolution phases will detect that pretty robustly
> I think.

I'm not worried about people accessing non-public bits (they should
already know that _foo attributes are off-limits unless they're ready to
deal with the likely breakage), apart from ensuring that the public bits
are complete enough for hooks to do what they need.

However, I think it's important to decide if we pass the Merger with the
intent that the hook will use it only read-only (to retrieve file names,
file contents, maybe details about other things in the tree, etc), or if
we actually want to support the hook mutating things via the Merger. It
seems the answer is the latter, which is fine, but it's worth being
clear about that. That answer suggests that my original proposal that
the return value from the hook makes changes isn't quite right when the
hook can just do change(s) directly using the Merger.

-Andrew.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

I'd suggest one of:
 return value indicates 'handled' or 'not handled'
 return value is parameter to pass to next hook or None-to-stop

and let changes that require calling fn's on Merger be done directly.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

This addresses most (all?) of the points raised in the original reviews.

Major changes:

  * Added 'news_merge' plugin to contrib/! This is an example plugin using the merge hook function so that you can see it in action, and so that users will have some inspiration for writing their own.
  * The built-in merge logic is now invoked the same way as user-registered hook functions. So in theory anything the built-in logic can do, a plugin can do too.
  * the hook point is invoked a bit earlier, so it can have a say in more kinds of conflicts by inspecting the 'winner' attribute.
  * added some convenience methods/properties to the params object.
  * generally simpler and more linear code path than lp:bzr, thanks to refactorings for above changes.

I think looking at the news_merge plugin the API is still a bit cumbersome (note the repetition of logic vs. the built-in text merge to see if this is an ordinary regular file, both sides changed merge). But having an example proves it works, and gives us good data for thinking of improvements.

Revision history for this message
Martin Pool (mbp) wrote :

2010/1/13 Andrew Bennetts <email address hidden>:
> This addresses most (all?) of the points raised in the original reviews.
>
> Major changes:
>
>  * Added 'news_merge' plugin to contrib/!  This is an example plugin using the merge hook function so that you can see it in action, and so that users will have some inspiration for writing their own.
>  * The built-in merge logic is now invoked the same way as user-registered hook functions.  So in theory anything the built-in logic can do, a plugin can do too.
>  * the hook point is invoked a bit earlier, so it can have a say in more kinds of conflicts by inspecting the 'winner' attribute.
>  * added some convenience methods/properties to the params object.
>  * generally simpler and more linear code path than lp:bzr, thanks to refactorings for above changes.
>
> I think looking at the news_merge plugin the API is still a bit cumbersome (note the repetition of logic vs. the built-in text merge to see if this is an ordinary regular file, both sides changed merge).  But having an example proves it works, and gives us good data for thinking of improvements.

(Without reading the diff) that sounds like a nice list of changes.

I don't want to add latency to these changes but I would say that
putting things in contrib/ is not a fabulous way to get it out to
people: it relies on them discovering that it's there and how to use
it. It is a bit confusing for packagers whether they should install
the things from in there or not: if they don't, it's more
inaccessible, and if they do then the packaged version is quite
different from the standard one. Empirically, the other things in
there have tended to rot. They are not tested.

Therefore maybe you should put this in bzrlib.plugins but arrange that
it has no effect (and little load time overhead) unless it's
configured on?

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2010-01-13 at 07:54 +0000, Martin Pool wrote:
>
> Therefore maybe you should put this in bzrlib.plugins but arrange that
> it has no effect (and little load time overhead) unless it's
> configured on?

+1000

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (6.1 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
>
> Requested reviews:
> John A Meinel (jameinel)
> Related bugs:
> #491711 hook to permit per file merges
> https://bugs.launchpad.net/bugs/491711
>
>

I think overall this is looking good. Looking over the actual diff
brought up a lot of questions about how various things hook together. I
don't know that everything needs to be addressed before merging, but I
think it would be good to at least understand what is going on.

 review: needsfixing

+class MergeHookParams(object):
+ """Object holding parameters passed to merge_file_content hooks.
+
+ There are 3 fields hooks can access:
+

What about "this_pair", "other_pair", etc. I don't really understand
what the "pair" is. this_pair[0] looks to be a Kind. Also, you should
probably document "base_lines", "this_lines", and "other_lines" here as
well. The current documentation assumes you'll go to the merger, but
you've already provided helpers for that.

base_lines, this_lines and other_lines doesn't map very cleanly into
WeaveMerge... But maybe it is good enough.

+ if hook_status == 'not_applicable':
+ # This is a contents conflict, because none of the available
+ # functions could merge it.
+ result = None
+ name = self.tt.final_name(trans_id)
+ parent_id = self.tt.final_parent(trans_id)
+ if file_id in self.this_tree.inventory:
+ self.tt.unversion_file(trans_id)
+ file_group = self._dump_conflicts(name, parent_id, file_id,
+ set_version=True)
+ self._raw_conflicts.append(('contents conflict', file_group))

^- Are we sure this is correct? If there is a contents_conflict we mark
the file as unversioned?

Also, you are using "self.this_tree.inventory" but later we use "file_id
in self.this_tree".

I'd probably rather use "self.this_tree.has_id(file_id)" in both cases.
Trees shouldn't really have a __contains__ method, and I'd like to avoid
using ".inventory" if we can. (Most Dirstate actions can be resolved
without building an inventory at all.)

...

+ elif hook_status == 'conflicted':
+ # XXX: perhaps the hook should be able to provide
+ # the BASE/THIS/OTHER files?
+ self.tt.create_file(lines, trans_id)
+ self._raw_conflicts.append(('text conflict', trans_id))
+ name = self.tt.final_name(trans_id)
+ parent_id = self.tt.final_parent(trans_id)
+ file_group = self._dump_conflicts(name, parent_id, file_id)
+ file_group.append(trans_id)

^- Do you know why we are appending the trans_id here? file_group isn't
mentioned again. So is this mutating an internal structure behind the
scenes? (_dump_conflicts is creating a list and storing it in state
somewhere, and we mutate it to add the new id.) If so, a comment would
be nice. (Probably not your code, but certainly a "what is going on?"
moment.)

...

+ try:...

Read more...

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :

As a follow on (after merging) it would be good to

* announce this on udd
* see if there is some existing code for merging debian changelogs and merge that

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (9.4 KiB)

John A Meinel wrote:
> Review: Needs Fixing
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andrew Bennetts wrote:
> > Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
> >
> > Requested reviews:
> > John A Meinel (jameinel)
> > Related bugs:
> > #491711 hook to permit per file merges
> > https://bugs.launchpad.net/bugs/491711
> >
> >
>
> I think overall this is looking good. Looking over the actual diff
> brought up a lot of questions about how various things hook together. I
> don't know that everything needs to be addressed before merging, but I
> think it would be good to at least understand what is going on.

Fair enough. There are parts I don't fully understand, but those parts
I'm pretty sure are working the same as in trunk (and are equally
confusing in trunk).

Thanks for the review!

> +class MergeHookParams(object):
> + """Object holding parameters passed to merge_file_content hooks.
> +
> + There are 3 fields hooks can access:
> +
>
> What about "this_pair", "other_pair", etc. I don't really understand
> what the "pair" is. this_pair[0] looks to be a Kind. Also, you should

Yeah, it is a kind. This was a refactoring that I should have taken a
little further by just passing the kinds rather than the “pairs”. I've
done that now, and added this_kind and other_kind to the class
docstring.

> probably document "base_lines", "this_lines", and "other_lines" here as
> well. The current documentation assumes you'll go to the merger, but
> you've already provided helpers for that.

I've removed the outdated comment that suggests using the merger for
this.

The *_lines properties are already documented, though, they have their
own docstrings. Unless you think they should be mentioned in the class
docstring too, but I didn't think that was our usual practice?

> base_lines, this_lines and other_lines doesn't map very cleanly into
> WeaveMerge... But maybe it is good enough.

Yeah, I admit to being unsure about what to do with the non-default
Merger classes. I guess for now we leave it up to the plugins to check
the type of Merger if they are sensitive to it? I would like to give
plugins the chance to use their own hooks even in the case of
WeaveMerge.

> + if hook_status == 'not_applicable':
> + # This is a contents conflict, because none of the available
> + # functions could merge it.
> + result = None
> + name = self.tt.final_name(trans_id)
> + parent_id = self.tt.final_parent(trans_id)
> + if file_id in self.this_tree.inventory:
> + self.tt.unversion_file(trans_id)
> + file_group = self._dump_conflicts(name, parent_id, file_id,
> + set_version=True)
> + self._raw_conflicts.append(('contents conflict', file_group))
>
> ^- Are we sure this is correct? If there is a contents_conflict we mark
> the file as unversioned?

I'm not really sure why, but this is what lp:bzr does in these cases.
So this isn't an issue introduced by this branch.

> Also, you are u...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (7.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

>> + try:
>> + self.tt.tree_kind(trans_id)
>> + self.tt.delete_contents(trans_id)
>> + except errors.NoSuchFile:
>> + pass
>>
>> ^- I don't quite understand the unconditional "delete_contents" if the
>> contents exist. Especially given that:
>>
>> + elif hook_status == 'success':
>> + self.tt.create_file(lines, trans_id)
>>
>> Is called *before* this.
>
> Yes, this certainly puzzled me when I came across it in trunk! (Again,
> this oddity is definitely present in trunk, just that my patch has
> perhaps made it more visible.)
>
>> Is it that TreeTransform.delete_contents() indicates we want to delete
>> the content in the current tree, and "create_file()" is creating
>> contents in the "next" tree?
>
> Exactly how TreeTransform isn't clearly documented that I could find.
> If someone that knows precisely what the operations on it means and
> could document that it would be a great help, I think. AFAICS there
> isn't even a description of what a trans_id is... is an abbreviation of
> “transform id” or “transaction id”, are they file_ids, or something else
> but like file_ids? What's the difference between unversion_file()
> and delete_contents()? etc.

I can help with this.

A trans_id is a handle for the TreeTransform. Similar in concept to a
file_id mapping to a path in an inventory. However, TreeTransform can
talk about files that aren't versioned (have no trans_id), and multiple
copies of the same logical file (two people introduce the same file_id
at different locations, etc). As such, trans_id are an abstraction of a
specific file/directory content during the transaction. AFAICT they are
always of the form "new-%d".

unversion_file is saying that we need to remove the file_id from the
inventory.
delete_contents says we want to call 'os.remove()' on the object.

Note that you don't get a new trans_id for content being removed versus
content being added. I checked and _removed_contents indicates files
that are to be deleted from the wt, but you can *also* have the same
trans_id in _new_contents. So I think trans_id is like file_id, in that
it is how TT maps this file should be replacing that file.

I think part of it is that TT doesn't think about replacements. It only
has a remove phase and an add phase.

1) All new content is staged into .bzr/checkout/limbo, generally with
names like 'new-1'. Though we do try to create files in their final
directories with their final names if we know in advance. (So we don't
rename all files, just the containing dirs for 'bzr co'.)

2) All old files are renamed into .bzr/checkout/pending_deletion

3) All new files are renamed from .bzr/checkout/limbo into the working tree

4) All old files in .bzr/checkout/pending_deletion are removed

That is how we get atomicity of updates, so that if something fails in
(3) we can still revert back to the original tree.

>
> So, for this specific puzzle...
>
> The code I'm guessing is there to ensure that the file is marked as
> versioned, but that the temporary file used for the new contents is
> removed? Staring again at the treetransform.py...

Read more...

Revision history for this message
Aaron Bentley (abentley) wrote :
Download full text (4.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
>> ^- Are we sure this is correct? If there is a contents_conflict we mark
>> the file as unversioned?

That's correct. For a contents conflict that is not a text conflict,
bzr does not leave a file in place with the original filename, therefore
we also unversion it, and version the .OTHER, .THIS or .BASE--
whatever's there.

>> + elif hook_status == 'conflicted':
>> + # XXX: perhaps the hook should be able to provide
>> + # the BASE/THIS/OTHER files?
>> + self.tt.create_file(lines, trans_id)
>> + self._raw_conflicts.append(('text conflict', trans_id))
>> + name = self.tt.final_name(trans_id)
>> + parent_id = self.tt.final_parent(trans_id)
>> + file_group = self._dump_conflicts(name, parent_id, file_id)
>> + file_group.append(trans_id)
>>
>> ^- Do you know why we are appending the trans_id here? file_group isn't
>> mentioned again.

> Hmm. Again, this is something that lp:bzr does that I've
> preserved/copied. On closer inspection, I don't think it has any effect
> (either in lp:bzr or in my branch), so I'll just delete that.

This code doesn't appear to have ever made sense. It was introduced in
<email address hidden>. I
suspect it was meant to help with treating a file and its conflict files
as a group for conflict handling.

>> + try:
>> + self.tt.tree_kind(trans_id)
>> + self.tt.delete_contents(trans_id)
>> + except errors.NoSuchFile:
>> + pass
>>
>> ^- I don't quite understand the unconditional "delete_contents" if the
>> contents exist. Especially given that:
>>
>> + elif hook_status == 'success':
>> + self.tt.create_file(lines, trans_id)
>>
>> Is called *before* this.

Operations on a TreeTransform simply schedule things to happen when you
call 'apply'. Before and after are irrelevant (except when you cancel
an operation you had previously scheduled). All that matters is that
the tree transform is consistent when you call apply.

>> Is it that TreeTransform.delete_contents() indicates we want to delete
>> the content in the current tree, and "create_file()" is creating
>> contents in the "next" tree?

delete_contents means that applying the transform should delete these
contents, and create_file means that applying the transform should
create a file with specified contents.

> Exactly how TreeTransform isn't clearly documented that I could find.

How it what? Works?

> If someone that knows precisely what the operations on it means and
> could document that it would be a great help, I think.

I've certainly tried in the past, and I'm happy to enhance the
documentation. Could you start by explaining what is confusing or
missing from the TreeTransform docstring?

> AFAICS there
> isn't even a description of what a trans_id is...

There are three paragraphs of description:

> Transform/Transaction ids
> -------------------------
> trans_ids are temporary ids assigned to all files involved in a transform.
> It's possible, even common, that not all ...

Read more...

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (3.4 KiB)

Aaron Bentley wrote:
[...]
> > Exactly how TreeTransform isn't clearly documented that I could find.
>
> How it what? Works?

Yes, that's the missing word. I'm glad your telepathy is functioning
well :)

> > If someone that knows precisely what the operations on it means and
> > could document that it would be a great help, I think.
>
> I've certainly tried in the past, and I'm happy to enhance the
> documentation. Could you start by explaining what is confusing or
> missing from the TreeTransform docstring?

Simply that I didn't find it when I looked for it, believe or not! I
looked for a class docstring on the class that implemented these methods
and instance variables (TreeTransformBase), and then looked for a module
docstring, and then tried grepping doc/developers/. It didn't occur to
me to try looking a couple of classes (and 1000 lines) further down
transform.py (and a couple of classes further down the inheritance
hierarchy too) to see if one of those had introductory documentation.

This is the sort of situation where I miss dedicated “interface”
documentation, I guess.

Thanks for pointing me to that, I thought I'd seen it before but several
minutes of solid searching failed to find it for me! I expect I'll
remember it for next time.

[...]
> is an abbreviation of
> > “transform id” or “transaction id”
>
> Both, as you can see above.

[Eep! Ambiguity of names doesn't help clarity. I realise it's basically
an arbitrary choice, but I suggest picking one. Actually, it's also
that neither name is particuarly satisfying: one of these IDs doesn't
correspond to a single tree transform (which is done as a single
transaction), it corresponds to an element of that transform. So, I'd
probably find “tentry_id” (short for “transform entry ID”) clear and
mnemonic... but it's a bit late to change now :)
]

> >, are they file_ids, or something else
> > but like file_ids?
>
>
> > What's the difference between unversion_file()
> > and delete_contents()?
>
> One schedules the deletion of the contents of a file. The other
> schedules removes the removal of file id of the file. The docstring for
> delete_contents is:
>
> > """Schedule the contents of a path entry for deletion"""
>
> The docstring for unversion_file is:
> > """Schedule a path entry to become unversioned"""
>
> If this is unclear, could you say what about it is unclear?

First I should say I had an intuition about the meanings of these, and I
think my intuition was basically correct. So it's not so bad.

It's unclear for a few reasons:

 * the docstrings don't really add information not already in the method
   name,
 * my confidence in my intuition about the names was low because the
   terms were so close to synonymous, and
 * the terminology is not memorably clear.

By “memorably clear”, I mean that terms like “delete” and “unversion”
are close to synonymous, so the distinction is hard to notice and
recall.

I don't mean say you necessarily should have found better names, maybe
there are no better names for these concepts... none spring to mind for
me! But (assuming I had found the TreeTransform docstring in the first
place) some explicit descri...

Read more...

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> Though it also means that calling:
> self.tt.tree_kind()
> self.tt.delete_contents()
>
> is redundant and double statting the file. So we should clean that up

Yeah, I'd noticed that, but wasn't sure if it was tasteful to clean that
up. Seeing as you think we should I've gone ahead and removed that line
:)

[...]
> So the comment looks correct to my understanding.

I've added a comment based on Aaron's wording.

[...]
> If I set up location.conf to say:
> [/home/jameinel/dev/bzr]
> news_merger_files = NEWS
> news_merge_files:policy = recurse

(Thanks for this, this is exactly the syntax I've used for the
news_merge plugin configuration)

> Will that match when 'bzr merge --preview' is invoked (what Branch is
> given to the PreviewTree, etc.) I'm not 100% sure that PreviewTree *has*
> a .branch property. (Given that RevisionTree does not, but has a
> _repository attribute. Even better it is passed in as 'branch'... :)

It turned out I had to pass the branch a little deeper in all cases, but
having done that 'merge --preview' works just as well as regular 'merge'
:)

[...]
> It is probably reasonable to merge soon. I'd probably like to see the
> news plugin in 'plugins' first, but that is not critical.

I've done that anyway. Please try it out! :)

-Andrew.

Revision history for this message
Vincent Ladeuil (vila) wrote :

I reviewed this chronologically (reading oldest revision first)
and I found it was far easier to understand what was going
on. I'm not sure that trick could be used for all reviews but it
definetly made that one a nice tour :)

Keep that in mind when reading the comments below as they may
seem out of context otherwise.

I had a moderately good understanding of the merge code and I
find that your refactoring makes it far clearer, thanks for that
!

32 === modified file 'bzrlib/decorators.py'
33 --- bzrlib/decorators.py 2009-10-15 02:19:43 +0000
34 +++ bzrlib/decorators.py 2010-01-18 08:00:37 +0000
35 @@ -253,3 +253,19 @@
36 global needs_read_lock, needs_write_lock
37 needs_read_lock = _pretty_needs_read_lock
38 needs_write_lock = _pretty_needs_write_lock
39 +
40 +
41 +def cachedproperty(getter_func):
42 + """Like property(getter_func), but caches the value.
43 +
44 + Note: there's no way to clear or reset the cache.
45 + """
46 + memo = []

AIUI, this is gc'ed when no more references exist on the cached
value. It's not obvious and may be worth documenting.

Also, this can be used in a lot of places in the codebase so
that's worth a NEWS entry isn't it ?

And last remark on that subject, can't use some unique object()
to initialize the cache instead of a list for each property ?
Using a list here is cute but a bit hackish, using a dedicated
object to represent the 'not cached yet' could make the code more
obvious to the casual reader.

The hook scope is a bit unclear regarding the tt updates (spotted
when reading revno 4887). Why don't you let the hooks do them (in
merge.merge_contents()) ? Shouldn't that be at least documented
in the hook docstring ?

I had to read the code to verify that is_file_merge() was indeed
is_text_merge(), is it just me or should it be renamed ? Ha,
fixed in revno 4890 :) Never mind then.

248 + params = MergeHookParams(self, file_id, trans_id, this_pair[0],
249 + other_pair[0], winner)

Using this_kind= this_pair[0] and other_kind=other_pair[0] will
add a final touch of clarity to revno 4894 :)

Overall, nice job and reading the history of how it was put together
was really a big help to understand it.

I'll be happy to land this for 2.1 unless John or Aaron have more objections
until tomorrow.

Addressing the points I raised above with be nice but not mandatory for landing.

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

Vincent Ladeuil wrote:
[...]
> 41 +def cachedproperty(getter_func):
> 42 + """Like property(getter_func), but caches the value.
> 43 +
> 44 + Note: there's no way to clear or reset the cache.
> 45 + """
> 46 + memo = []
>
>
> AIUI, this is gc'ed when no more references exist on the cached
> value. It's not obvious and may be worth documenting.

Erk, this is totally bogus. I'm not sure what I was thinking! I was
trying for a limited quick & dirty cachedproperty decorator without the
complexity of the one in Launchpad, but I think I should just borrow
Launchpad's, so I've now done that (including arranging for its doctests
to be run).

> Also, this can be used in a lot of places in the codebase so
> that's worth a NEWS entry isn't it ?

Good idea. Done.

> The hook scope is a bit unclear regarding the tt updates (spotted
> when reading revno 4887). Why don't you let the hooks do them (in
> merge.merge_contents()) ? Shouldn't that be at least documented
> in the hook docstring ?

I'm not sure what you mean here? You mean let the hook do the
self.tt.version_file and self.tt.delete_contents calls that happen at
the end of merge_contents?

I guess I'm still not certain exactly how much the hook should be
allowed to do... the more plugins people try to write for it, the more
we'll know :)

[...]
> I'll be happy to land this for 2.1 unless John or Aaron have more objections
> until tomorrow.
>
> Addressing the points I raised above with be nice but not mandatory for landing.

That's very much for the extra review.

-Andrew.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "Andrew" == Andrew Bennetts <email address hidden> writes:

    Andrew> Vincent Ladeuil wrote:
    Andrew> [...]
    >> 41 +def cachedproperty(getter_func):
    >> 42 + """Like property(getter_func), but caches the value.
    >> 43 +
    >> 44 + Note: there's no way to clear or reset the cache.
    >> 45 + """
    >> 46 + memo = []
    >>
    >>
    >> AIUI, this is gc'ed when no more references exist on the cached
    >> value. It's not obvious and may be worth documenting.

    Andrew> Erk, this is totally bogus.

Really ? How so ?

<snip/>

    >> The hook scope is a bit unclear regarding the tt updates
    >> (spotted when reading revno 4887). Why don't you let the
    >> hooks do them (in merge.merge_contents()) ? Shouldn't
    >> that be at least documented in the hook docstring ?

    Andrew> I'm not sure what you mean here? You mean let the
    Andrew> hook do the self.tt.version_file and
    Andrew> self.tt.delete_contents calls that happen at the end
    Andrew> of merge_contents?

Yes or let the caller do it but document it anyway.

    Andrew> I guess I'm still not certain exactly how much the
    Andrew> hook should be allowed to do... the more plugins
    Andrew> people try to write for it, the more we'll know :)

Ok, no problem, I'm not sure either. It's just that it seems odd
to *forbid* the hook to do some stuff a bit different if
needed. But there is no point forcing people to do it
unconditionally either. Time will tell.

I'm happy enough with what is possible today anyway.

Revision history for this message
John A Meinel (jameinel) wrote :

One problem I potentially see with the plugin is this:
585 +def filename_matches_config(params):
586 + config = params.merger.this_branch.get_config()
587 + affected_files = config.get_user_option_as_list('news_merge_files')
588 + if affected_files:
589 + filename = params.merger.this_tree.id2path(params.file_id)
590 + if filename in affected_files:
591 + return True
592 + return False

The issue is that "branch.get_config()" is not cached, and re-reads all config files. And you trigger this for *all* modified files. So if you are merging, say, bzr.dev after 20 revs, you'll get 50 changed files, and we will re-read branch.conf, bazaar.conf and locations.conf 50 times.

I wonder if we should have a place on MergeHookParams to store hook data...

Otherwise I like this revision.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2010-01-18 at 11:06 +0000, Andrew Bennetts wrote:

> > AIUI, this is gc'ed when no more references exist on the cached
> > value. It's not obvious and may be worth documenting.
>
> Erk, this is totally bogus. I'm not sure what I was thinking! I was
> trying for a limited quick & dirty cachedproperty decorator without the
> complexity of the one in Launchpad, but I think I should just borrow
> Launchpad's, so I've now done that (including arranging for its doctests
> to be run).

You'll need to get an ack to rerelease the LP code to do this: Launchpad
is AGPL, which is incompatible with GPL.

I agree with John that we need to avoid processing branch.conf per-file.

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

> On Mon, 2010-01-18 at 11:06 +0000, Andrew Bennetts wrote:
>

> You'll need to get an ack to rerelease the LP code to do this: Launchpad
> is AGPL, which is incompatible with GPL.

I've got the ack from flacoste.

>
> I agree with John that we need to avoid processing branch.conf per-file.

spiv proposed to cache the config in the plugin, I'll tweak and merge.

We may want to discuss caching the config in the branch itself
(I can't find scenarios where a locked branch could have trouble with that)
but that's outside the scope of this plugin.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-01-17 20:06:41 +0000
3+++ NEWS 2010-01-18 11:01:23 +0000
4@@ -20,6 +20,15 @@
5 * Add bug information to log output when available.
6 (Neil Martinsen-Burrell, Guillermo Gonzalez, #251729)
7
8+* Added ``merge_file_content`` hook point to ``Merger``, allowing plugins
9+ to register custom merge logic, e.g. to provide smarter merging for
10+ particular files.
11+
12+* Bazaar now includes the ``news_merge`` plugin. It is disabled by
13+ default, to enable it add a ``news_merge_files`` option to your
14+ configuration. Consult ``bzr help news_merge`` for more information.
15+ (Andrew Bennetts)
16+
17 * ``bzr branch`` now takes a ``--bind`` option. This lets you
18 branch and bind all in one command. (Ian Clatworthy)
19
20@@ -154,11 +163,18 @@
21 API Changes
22 ***********
23
24+* Added ``cachedproperty`` decorator to ``bzrlib.decorators``.
25+ (Andrew Bennetts)
26+
27 * Many test features were renamed from ``FooFeature`` to ``foo_feature``
28 to be consistent with instances being lower case and classes being
29 CamelCase. For the features that were more likely to be used, we added a
30 deprecation thunk, but not all. (John Arbash Meinel)
31
32+* Merger classes (such as ``Merge3Merger``) now expect a ``this_branch``
33+ parameter in their constructora, and provide ``this_branch`` as an
34+ attribute. (Andrew Bennetts)
35+
36 * The Branch hooks pre_change_branch_tip no longer masks exceptions raised
37 by plugins - the original exceptions are now preserved. (Robert Collins)
38
39
40=== modified file 'bzrlib/decorators.py'
41--- bzrlib/decorators.py 2009-10-15 02:19:43 +0000
42+++ bzrlib/decorators.py 2010-01-18 11:01:23 +0000
43@@ -253,3 +253,79 @@
44 global needs_read_lock, needs_write_lock
45 needs_read_lock = _pretty_needs_read_lock
46 needs_write_lock = _pretty_needs_write_lock
47+
48+
49+# This implementation of cachedproperty is copied from Launchpad's
50+# canonical.launchpad.cachedproperty module.
51+def cachedproperty(attrname_or_fn):
52+ """A decorator for methods that makes them properties with their return
53+ value cached.
54+
55+ The value is cached on the instance, using the attribute name provided.
56+
57+ If you don't provide a name, the mangled name of the property is used.
58+
59+ >>> class CachedPropertyTest(object):
60+ ...
61+ ... @cachedproperty('_foo_cache')
62+ ... def foo(self):
63+ ... print 'foo computed'
64+ ... return 23
65+ ...
66+ ... @cachedproperty
67+ ... def bar(self):
68+ ... print 'bar computed'
69+ ... return 69
70+
71+ >>> cpt = CachedPropertyTest()
72+ >>> getattr(cpt, '_foo_cache', None) is None
73+ True
74+ >>> cpt.foo
75+ foo computed
76+ 23
77+ >>> cpt.foo
78+ 23
79+ >>> cpt._foo_cache
80+ 23
81+ >>> cpt.bar
82+ bar computed
83+ 69
84+ >>> cpt._bar_cached_value
85+ 69
86+
87+ """
88+ if isinstance(attrname_or_fn, basestring):
89+ attrname = attrname_or_fn
90+ return _CachedPropertyForAttr(attrname)
91+ else:
92+ fn = attrname_or_fn
93+ attrname = '_%s_cached_value' % fn.__name__
94+ return _CachedProperty(attrname, fn)
95+
96+
97+class _CachedPropertyForAttr(object):
98+
99+ def __init__(self, attrname):
100+ self.attrname = attrname
101+
102+ def __call__(self, fn):
103+ return _CachedProperty(self.attrname, fn)
104+
105+
106+class _CachedProperty(object):
107+
108+ def __init__(self, attrname, fn):
109+ self.fn = fn
110+ self.attrname = attrname
111+ self.marker = object()
112+
113+ def __get__(self, inst, cls=None):
114+ if inst is None:
115+ return self
116+ cachedresult = getattr(inst, self.attrname, self.marker)
117+ if cachedresult is self.marker:
118+ result = self.fn(inst)
119+ setattr(inst, self.attrname, result)
120+ return result
121+ else:
122+ return cachedresult
123
124=== modified file 'bzrlib/hooks.py'
125--- bzrlib/hooks.py 2010-01-03 03:49:07 +0000
126+++ bzrlib/hooks.py 2010-01-18 11:01:23 +0000
127@@ -45,6 +45,8 @@
128 'bzrlib.info', 'InfoHooks')
129 known_hooks.register_lazy(('bzrlib.lock', 'Lock.hooks'), 'bzrlib.lock',
130 'LockHooks')
131+known_hooks.register_lazy(('bzrlib.merge', 'Merger.hooks'), 'bzrlib.merge',
132+ 'MergeHooks')
133 known_hooks.register_lazy(('bzrlib.msgeditor', 'hooks'), 'bzrlib.msgeditor',
134 'MessageEditorHooks')
135 known_hooks.register_lazy(('bzrlib.mutabletree', 'MutableTree.hooks'),
136
137=== modified file 'bzrlib/merge.py'
138--- bzrlib/merge.py 2009-12-18 08:22:42 +0000
139+++ bzrlib/merge.py 2010-01-18 11:01:23 +0000
140@@ -19,8 +19,10 @@
141 branch as _mod_branch,
142 conflicts as _mod_conflicts,
143 debug,
144+ decorators,
145 errors,
146 graph as _mod_graph,
147+ hooks,
148 merge3,
149 osutils,
150 patiencediff,
151@@ -50,7 +52,68 @@
152 from_tree.unlock()
153
154
155+class MergeHooks(hooks.Hooks):
156+
157+ def __init__(self):
158+ hooks.Hooks.__init__(self)
159+ self.create_hook(hooks.HookPoint('merge_file_content',
160+ "Called when file content needs to be merged (including when one "
161+ "side has deleted the file and the other has changed it)."
162+ "merge_file_content is called with a "
163+ "bzrlib.merge.MergeHookParams. The function should return a tuple "
164+ "of (status, lines), where status is one of 'not_applicable', "
165+ "'success', 'conflicted', or 'delete'. If status is success or "
166+ "conflicted, then lines should be an iterable of strings of the "
167+ "new file contents.",
168+ (2, 1), None))
169+
170+
171+class MergeHookParams(object):
172+ """Object holding parameters passed to merge_file_content hooks.
173+
174+ There are 3 fields hooks can access:
175+
176+ :ivar merger: the Merger object
177+ :ivar file_id: the file ID of the file being merged
178+ :ivar trans_id: the transform ID for the merge of this file
179+ :ivar this_kind: kind of file_id in 'this' tree
180+ :ivar other_kind: kind of file_id in 'other' tree
181+ :ivar winner: one of 'this', 'other', 'conflict'
182+ """
183+
184+ def __init__(self, merger, file_id, trans_id, this_kind, other_kind,
185+ winner):
186+ self.merger = merger
187+ self.file_id = file_id
188+ self.trans_id = trans_id
189+ self.this_kind = this_kind
190+ self.other_kind = other_kind
191+ self.winner = winner
192+
193+ def is_file_merge(self):
194+ """True if this_kind and other_kind are both 'file'."""
195+ return self.this_kind == 'file' and self.other_kind == 'file'
196+
197+ @decorators.cachedproperty
198+ def base_lines(self):
199+ """The lines of the 'base' version of the file."""
200+ return self.merger.get_lines(self.merger.base_tree, self.file_id)
201+
202+ @decorators.cachedproperty
203+ def this_lines(self):
204+ """The lines of the 'this' version of the file."""
205+ return self.merger.get_lines(self.merger.this_tree, self.file_id)
206+
207+ @decorators.cachedproperty
208+ def other_lines(self):
209+ """The lines of the 'other' version of the file."""
210+ return self.merger.get_lines(self.merger.other_tree, self.file_id)
211+
212+
213 class Merger(object):
214+
215+ hooks = MergeHooks()
216+
217 def __init__(self, this_branch, other_tree=None, base_tree=None,
218 this_tree=None, pb=None, change_reporter=None,
219 recurse='down', revision_graph=None):
220@@ -432,7 +495,7 @@
221 'other_tree': self.other_tree,
222 'interesting_ids': self.interesting_ids,
223 'interesting_files': self.interesting_files,
224- 'pp': self.pp,
225+ 'pp': self.pp, 'this_branch': self.this_branch,
226 'do_merge': False}
227 if self.merge_type.requires_base:
228 kwargs['base_tree'] = self.base_tree
229@@ -542,13 +605,14 @@
230 interesting_ids=None, reprocess=False, show_base=False,
231 pb=progress.DummyProgress(), pp=None, change_reporter=None,
232 interesting_files=None, do_merge=True,
233- cherrypick=False, lca_trees=None):
234+ cherrypick=False, lca_trees=None, this_branch=None):
235 """Initialize the merger object and perform the merge.
236
237 :param working_tree: The working tree to apply the merge to
238 :param this_tree: The local tree in the merge operation
239 :param base_tree: The common tree in the merge operation
240 :param other_tree: The other tree to merge changes from
241+ :param this_branch: The branch associated with this_tree
242 :param interesting_ids: The file_ids of files that should be
243 participate in the merge. May not be combined with
244 interesting_files.
245@@ -577,6 +641,7 @@
246 self.this_tree = working_tree
247 self.base_tree = base_tree
248 self.other_tree = other_tree
249+ self.this_branch = this_branch
250 self._raw_conflicts = []
251 self.cooked_conflicts = []
252 self.reprocess = reprocess
253@@ -892,7 +957,7 @@
254 self.tt.final_kind(other_root)
255 except errors.NoSuchFile:
256 return
257- if self.other_tree.inventory.root.file_id in self.this_tree.inventory:
258+ if self.this_tree.has_id(self.other_tree.inventory.root.file_id):
259 # the other tree's root is a non-root in the current tree
260 return
261 self.reparent_children(self.other_tree.inventory.root, self.tt.root)
262@@ -940,7 +1005,7 @@
263 @staticmethod
264 def executable(tree, file_id):
265 """Determine the executability of a file-id (used as a key method)."""
266- if file_id not in tree:
267+ if not tree.has_id(file_id):
268 return None
269 if tree.kind(file_id) != "file":
270 return False
271@@ -949,7 +1014,7 @@
272 @staticmethod
273 def kind(tree, file_id):
274 """Determine the kind of a file-id (used as a key method)."""
275- if file_id not in tree:
276+ if not tree.has_id(file_id):
277 return None
278 return tree.kind(file_id)
279
280@@ -1038,7 +1103,7 @@
281
282 def merge_names(self, file_id):
283 def get_entry(tree):
284- if file_id in tree.inventory:
285+ if tree.has_id(file_id):
286 return tree.inventory[file_id]
287 else:
288 return None
289@@ -1107,18 +1172,6 @@
290 contents = None
291 return kind, contents
292
293- def contents_conflict():
294- trans_id = self.tt.trans_id_file_id(file_id)
295- name = self.tt.final_name(trans_id)
296- parent_id = self.tt.final_parent(trans_id)
297- if file_id in self.this_tree.inventory:
298- self.tt.unversion_file(trans_id)
299- if file_id in self.this_tree:
300- self.tt.delete_contents(trans_id)
301- file_group = self._dump_conflicts(name, parent_id, file_id,
302- set_version=True)
303- self._raw_conflicts.append(('contents conflict', file_group))
304-
305 # See SPOT run. run, SPOT, run.
306 # So we're not QUITE repeating ourselves; we do tricky things with
307 # file kind...
308@@ -1140,63 +1193,112 @@
309 if winner == 'this':
310 # No interesting changes introduced by OTHER
311 return "unmodified"
312+ # We have a hypothetical conflict, but if we have files, then we
313+ # can try to merge the content
314 trans_id = self.tt.trans_id_file_id(file_id)
315- if winner == 'other':
316+ params = MergeHookParams(self, file_id, trans_id, this_pair[0],
317+ other_pair[0], winner)
318+ hooks = Merger.hooks['merge_file_content']
319+ hooks = list(hooks) + [self.default_text_merge]
320+ hook_status = 'not_applicable'
321+ for hook in hooks:
322+ hook_status, lines = hook(params)
323+ if hook_status != 'not_applicable':
324+ # Don't try any more hooks, this one applies.
325+ break
326+ result = "modified"
327+ if hook_status == 'not_applicable':
328+ # This is a contents conflict, because none of the available
329+ # functions could merge it.
330+ result = None
331+ name = self.tt.final_name(trans_id)
332+ parent_id = self.tt.final_parent(trans_id)
333+ if self.this_tree.has_id(file_id):
334+ self.tt.unversion_file(trans_id)
335+ file_group = self._dump_conflicts(name, parent_id, file_id,
336+ set_version=True)
337+ self._raw_conflicts.append(('contents conflict', file_group))
338+ elif hook_status == 'success':
339+ self.tt.create_file(lines, trans_id)
340+ elif hook_status == 'conflicted':
341+ # XXX: perhaps the hook should be able to provide
342+ # the BASE/THIS/OTHER files?
343+ self.tt.create_file(lines, trans_id)
344+ self._raw_conflicts.append(('text conflict', trans_id))
345+ name = self.tt.final_name(trans_id)
346+ parent_id = self.tt.final_parent(trans_id)
347+ self._dump_conflicts(name, parent_id, file_id)
348+ elif hook_status == 'delete':
349+ self.tt.unversion_file(trans_id)
350+ result = "deleted"
351+ elif hook_status == 'done':
352+ # The hook function did whatever it needs to do directly, no
353+ # further action needed here.
354+ pass
355+ else:
356+ raise AssertionError('unknown hook_status: %r' % (hook_status,))
357+ if not self.this_tree.has_id(file_id) and result == "modified":
358+ self.tt.version_file(file_id, trans_id)
359+ # The merge has been performed, so the old contents should not be
360+ # retained.
361+ try:
362+ self.tt.delete_contents(trans_id)
363+ except errors.NoSuchFile:
364+ pass
365+ return result
366+
367+ def _default_other_winner_merge(self, merge_hook_params):
368+ """Replace this contents with other."""
369+ file_id = merge_hook_params.file_id
370+ trans_id = merge_hook_params.trans_id
371+ file_in_this = self.this_tree.has_id(file_id)
372+ if self.other_tree.has_id(file_id):
373+ # OTHER changed the file
374+ wt = self.this_tree
375+ if wt.supports_content_filtering():
376+ # We get the path from the working tree if it exists.
377+ # That fails though when OTHER is adding a file, so
378+ # we fall back to the other tree to find the path if
379+ # it doesn't exist locally.
380+ try:
381+ filter_tree_path = wt.id2path(file_id)
382+ except errors.NoSuchId:
383+ filter_tree_path = self.other_tree.id2path(file_id)
384+ else:
385+ # Skip the id2path lookup for older formats
386+ filter_tree_path = None
387+ transform.create_from_tree(self.tt, trans_id,
388+ self.other_tree, file_id,
389+ filter_tree_path=filter_tree_path)
390+ return 'done', None
391+ elif file_in_this:
392+ # OTHER deleted the file
393+ return 'delete', None
394+ else:
395+ raise AssertionError(
396+ 'winner is OTHER, but file_id %r not in THIS or OTHER tree'
397+ % (file_id,))
398+
399+ def default_text_merge(self, merge_hook_params):
400+ if merge_hook_params.winner == 'other':
401 # OTHER is a straight winner, so replace this contents with other
402- file_in_this = file_id in self.this_tree
403- if file_in_this:
404- # Remove any existing contents
405- self.tt.delete_contents(trans_id)
406- if file_id in self.other_tree:
407- # OTHER changed the file
408- wt = self.this_tree
409- if wt.supports_content_filtering():
410- # We get the path from the working tree if it exists.
411- # That fails though when OTHER is adding a file, so
412- # we fall back to the other tree to find the path if
413- # it doesn't exist locally.
414- try:
415- filter_tree_path = wt.id2path(file_id)
416- except errors.NoSuchId:
417- filter_tree_path = self.other_tree.id2path(file_id)
418- else:
419- # Skip the id2path lookup for older formats
420- filter_tree_path = None
421- transform.create_from_tree(self.tt, trans_id,
422- self.other_tree, file_id,
423- filter_tree_path=filter_tree_path)
424- if not file_in_this:
425- self.tt.version_file(file_id, trans_id)
426- return "modified"
427- elif file_in_this:
428- # OTHER deleted the file
429- self.tt.unversion_file(trans_id)
430- return "deleted"
431+ return self._default_other_winner_merge(merge_hook_params)
432+ elif merge_hook_params.is_file_merge():
433+ # THIS and OTHER are both files, so text merge. Either
434+ # BASE is a file, or both converted to files, so at least we
435+ # have agreement that output should be a file.
436+ try:
437+ self.text_merge(merge_hook_params.file_id,
438+ merge_hook_params.trans_id)
439+ except errors.BinaryFile:
440+ return 'not_applicable', None
441+ return 'done', None
442 else:
443- # We have a hypothetical conflict, but if we have files, then we
444- # can try to merge the content
445- if this_pair[0] == 'file' and other_pair[0] == 'file':
446- # THIS and OTHER are both files, so text merge. Either
447- # BASE is a file, or both converted to files, so at least we
448- # have agreement that output should be a file.
449- try:
450- self.text_merge(file_id, trans_id)
451- except errors.BinaryFile:
452- return contents_conflict()
453- if file_id not in self.this_tree:
454- self.tt.version_file(file_id, trans_id)
455- try:
456- self.tt.tree_kind(trans_id)
457- self.tt.delete_contents(trans_id)
458- except errors.NoSuchFile:
459- pass
460- return "modified"
461- else:
462- return contents_conflict()
463+ return 'not_applicable', None
464
465 def get_lines(self, tree, file_id):
466 """Return the lines in a file, or an empty list."""
467- if file_id in tree:
468+ if tree.has_id(file_id):
469 return tree.get_file(file_id).readlines()
470 else:
471 return []
472@@ -1205,7 +1307,7 @@
473 """Perform a three-way text merge on a file_id"""
474 # it's possible that we got here with base as a different type.
475 # if so, we just want two-way text conflicts.
476- if file_id in self.base_tree and \
477+ if self.base_tree.has_id(file_id) and \
478 self.base_tree.kind(file_id) == "file":
479 base_lines = self.get_lines(self.base_tree, file_id)
480 else:
481@@ -1274,7 +1376,7 @@
482 versioned = False
483 file_group = []
484 for suffix, tree, lines in data:
485- if file_id in tree:
486+ if tree.has_id(file_id):
487 trans_id = self._conflict_file(name, parent_id, tree, file_id,
488 suffix, lines, filter_tree_path)
489 file_group.append(trans_id)
490@@ -1324,11 +1426,11 @@
491 if winner == "this":
492 executability = this_executable
493 else:
494- if file_id in self.other_tree:
495+ if self.other_tree.has_id(file_id):
496 executability = other_executable
497- elif file_id in self.this_tree:
498+ elif self.this_tree.has_id(file_id):
499 executability = this_executable
500- elif file_id in self.base_tree:
501+ elif self.base_tree_has_id(file_id):
502 executability = base_executable
503 if executability is not None:
504 trans_id = self.tt.trans_id_file_id(file_id)
505
506=== added directory 'bzrlib/plugins/news_merge'
507=== added file 'bzrlib/plugins/news_merge/README'
508--- bzrlib/plugins/news_merge/README 1970-01-01 00:00:00 +0000
509+++ bzrlib/plugins/news_merge/README 2010-01-18 11:01:23 +0000
510@@ -0,0 +1,9 @@
511+A plugin for merging bzr's NEWS file.
512+
513+Install as usual (e.g. symlink this directory into ~/.bazaar/plugins), and any
514+file with a path of NEWS that is in bzr's NEWS format will be merged with this
515+hook.
516+
517+This hook can resolve conflicts where both sides add entries at the same place.
518+If it encounters a more difficult conflict it gives up and bzr will fallback to
519+the default merge algorithm.
520
521=== added file 'bzrlib/plugins/news_merge/__init__.py'
522--- bzrlib/plugins/news_merge/__init__.py 1970-01-01 00:00:00 +0000
523+++ bzrlib/plugins/news_merge/__init__.py 2010-01-18 11:01:23 +0000
524@@ -0,0 +1,73 @@
525+# Copyright (C) 2010 Canonical Ltd
526+#
527+# This program is free software; you can redistribute it and/or modify
528+# it under the terms of the GNU General Public License as published by
529+# the Free Software Foundation; either version 2 of the License, or
530+# (at your option) any later version.
531+#
532+# This program is distributed in the hope that it will be useful,
533+# but WITHOUT ANY WARRANTY; without even the implied warranty of
534+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
535+# GNU General Public License for more details.
536+#
537+# You should have received a copy of the GNU General Public License
538+# along with this program; if not, write to the Free Software
539+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
540+
541+"""Merge hook for bzr's NEWS file.
542+
543+To enable this plugin, add a section to your branch.conf or location.conf
544+like::
545+
546+ [/home/user/code/bzr]
547+ news_merge_files = NEWS
548+ news_merge_files:policy = recurse
549+
550+The news_merge_files config option takes a list of file paths, separated by
551+commas.
552+
553+Limitations:
554+
555+* if there's a conflict in more than just bullet points, this doesn't yet know
556+ how to resolve that, so bzr will fallback to the default line-based merge.
557+"""
558+
559+# Put most of the code in a separate module that we lazy-import to keep the
560+# overhead of this plugin as minimal as possible.
561+from bzrlib.lazy_import import lazy_import
562+lazy_import(globals(), """
563+from bzrlib.plugins.news_merge.news_merge import news_merger
564+""")
565+
566+from bzrlib.merge import Merger
567+
568+
569+def news_merge_hook(params):
570+ """Merger.merge_file_content hook function for bzr-format NEWS files."""
571+ # First, check whether this custom merge logic should be used. We expect
572+ # most files should not be merged by this file.
573+ if params.winner == 'other':
574+ # OTHER is a straight winner, rely on default merge.
575+ return 'not_applicable', None
576+ elif not params.is_file_merge():
577+ # THIS and OTHER aren't both files.
578+ return 'not_applicable', None
579+ elif not filename_matches_config(params):
580+ # The filename isn't listed in the 'news_merge_files' config option.
581+ return 'not_applicable', None
582+ return news_merger(params)
583+
584+
585+def filename_matches_config(params):
586+ config = params.merger.this_branch.get_config()
587+ affected_files = config.get_user_option_as_list('news_merge_files')
588+ if affected_files:
589+ filename = params.merger.this_tree.id2path(params.file_id)
590+ if filename in affected_files:
591+ return True
592+ return False
593+
594+
595+Merger.hooks.install_named_hook(
596+ 'merge_file_content', news_merge_hook, 'NEWS file merge')
597+
598
599=== added file 'bzrlib/plugins/news_merge/news_merge.py'
600--- bzrlib/plugins/news_merge/news_merge.py 1970-01-01 00:00:00 +0000
601+++ bzrlib/plugins/news_merge/news_merge.py 2010-01-18 11:01:23 +0000
602@@ -0,0 +1,88 @@
603+# Copyright (C) 2010 Canonical Ltd
604+#
605+# This program is free software; you can redistribute it and/or modify
606+# it under the terms of the GNU General Public License as published by
607+# the Free Software Foundation; either version 2 of the License, or
608+# (at your option) any later version.
609+#
610+# This program is distributed in the hope that it will be useful,
611+# but WITHOUT ANY WARRANTY; without even the implied warranty of
612+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
613+# GNU General Public License for more details.
614+#
615+# You should have received a copy of the GNU General Public License
616+# along with this program; if not, write to the Free Software
617+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
618+
619+"""Merge logic for news_merge plugin."""
620+
621+
622+from bzrlib.plugins.news_merge.parser import simple_parse
623+from bzrlib import merge3
624+
625+
626+magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
627+
628+
629+def news_merger(params):
630+ """Perform a simple 3-way merge of a bzr NEWS file.
631+
632+ Each section of a bzr NEWS file is essentially an ordered set of bullet
633+ points, so we can simply take a set of bullet points, determine which
634+ bullets to add and which to remove, sort, and reserialize.
635+ """
636+ # Transform the different versions of the NEWS file into a bunch of text
637+ # lines where each line matches one part of the overall structure, e.g. a
638+ # heading or bullet.
639+ def munge(lines):
640+ return list(blocks_to_fakelines(simple_parse(''.join(lines))))
641+ this_lines = munge(params.this_lines)
642+ other_lines = munge(params.other_lines)
643+ base_lines = munge(params.base_lines)
644+ m3 = merge3.Merge3(base_lines, this_lines, other_lines)
645+ result_lines = []
646+ for group in m3.merge_groups():
647+ if group[0] == 'conflict':
648+ _, base, a, b = group
649+ # Are all the conflicting lines bullets? If so, we can merge this.
650+ for line_set in [base, a, b]:
651+ for line in line_set:
652+ if not line.startswith('bullet'):
653+ # Something else :(
654+ # Maybe the default merge can cope.
655+ return 'not_applicable', None
656+ # Calculate additions and deletions.
657+ new_in_a = set(a).difference(base)
658+ new_in_b = set(b).difference(base)
659+ all_new = new_in_a.union(new_in_b)
660+ deleted_in_a = set(base).difference(a)
661+ deleted_in_b = set(base).difference(b)
662+ # Combine into the final set of bullet points.
663+ final = all_new.difference(deleted_in_a).difference(deleted_in_b)
664+ # Sort, and emit.
665+ final = sorted(final, key=sort_key)
666+ result_lines.extend(final)
667+ else:
668+ result_lines.extend(group[1])
669+ # Transform the merged elements back into real blocks of lines.
670+ return 'success', list(fakelines_to_blocks(result_lines))
671+
672+
673+def blocks_to_fakelines(blocks):
674+ for kind, text in blocks:
675+ yield '%s%s%s' % (kind, magic_marker, text)
676+
677+
678+def fakelines_to_blocks(fakelines):
679+ fakelines = list(fakelines)
680+ # Strip out the magic_marker, and reinstate the \n\n between blocks
681+ for fakeline in fakelines[:-1]:
682+ yield fakeline.split(magic_marker, 1)[1] + '\n\n'
683+ # The final block doesn't have a trailing \n\n.
684+ for fakeline in fakelines[-1:]:
685+ yield fakeline.split(magic_marker, 1)[1]
686+
687+
688+def sort_key(s):
689+ return s.replace('`', '').lower()
690+
691
692=== added file 'bzrlib/plugins/news_merge/parser.py'
693--- bzrlib/plugins/news_merge/parser.py 1970-01-01 00:00:00 +0000
694+++ bzrlib/plugins/news_merge/parser.py 2010-01-18 11:01:23 +0000
695@@ -0,0 +1,62 @@
696+# Copyright (C) 2010 Canonical Ltd
697+#
698+# This program is free software; you can redistribute it and/or modify
699+# it under the terms of the GNU General Public License as published by
700+# the Free Software Foundation; either version 2 of the License, or
701+# (at your option) any later version.
702+#
703+# This program is distributed in the hope that it will be useful,
704+# but WITHOUT ANY WARRANTY; without even the implied warranty of
705+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
706+# GNU General Public License for more details.
707+#
708+# You should have received a copy of the GNU General Public License
709+# along with this program; if not, write to the Free Software
710+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
711+
712+"""Simple parser for bzr's NEWS file.
713+
714+Simple as this is, it's a bit over-powered for news_merge's needs, which only
715+cares about 'bullet' and 'everything else'.
716+
717+This module can be run as a standalone Python program; pass it a filename and
718+it will print the parsed form of a file (a series of 2-tuples, see
719+simple_parse's docstring).
720+"""
721+
722+
723+def simple_parse(content):
724+ """Returns blocks, where each block is a 2-tuple (kind, text).
725+
726+ :kind: one of 'heading', 'release', 'section', 'empty' or 'text'.
727+ :text: a str, including newlines.
728+ """
729+ blocks = content.split('\n\n')
730+ for block in blocks:
731+ if block.startswith('###'):
732+ # First line is ###...: Top heading
733+ yield 'heading', block
734+ continue
735+ last_line = block.rsplit('\n', 1)[-1]
736+ if last_line.startswith('###'):
737+ # last line is ###...: 2nd-level heading
738+ yield 'release', block
739+ elif last_line.startswith('***'):
740+ # last line is ***...: 3rd-level heading
741+ yield 'section', block
742+ elif block.startswith('* '):
743+ # bullet
744+ yield 'bullet', block
745+ elif block.strip() == '':
746+ # empty
747+ yield 'empty', block
748+ else:
749+ # plain text
750+ yield 'text', block
751+
752+
753+if __name__ == '__main__':
754+ import sys
755+ content = open(sys.argv[1], 'rb').read()
756+ for result in simple_parse(content):
757+ print result
758
759=== modified file 'bzrlib/tests/__init__.py'
760--- bzrlib/tests/__init__.py 2010-01-17 20:06:41 +0000
761+++ bzrlib/tests/__init__.py 2010-01-18 11:01:23 +0000
762@@ -3752,6 +3752,7 @@
763 return [
764 'bzrlib',
765 'bzrlib.branchbuilder',
766+ 'bzrlib.decorators',
767 'bzrlib.export',
768 'bzrlib.inventory',
769 'bzrlib.iterablefile',
770
771=== modified file 'bzrlib/tests/per_merger.py'
772--- bzrlib/tests/per_merger.py 2009-12-18 06:54:20 +0000
773+++ bzrlib/tests/per_merger.py 2010-01-18 11:01:23 +0000
774@@ -18,6 +18,7 @@
775
776 import os
777
778+from bzrlib.conflicts import TextConflict
779 from bzrlib import (
780 errors,
781 merge as _mod_merge,
782@@ -28,6 +29,7 @@
783 multiply_tests,
784 TestCaseWithTransport,
785 )
786+from bzrlib.tests.test_merge_core import MergeBuilder
787 from bzrlib.transform import TreeTransform
788
789
790@@ -188,4 +190,171 @@
791 self.assertRaises(errors.LockError, wt.unlock)
792
793
794+class TestHookMergeFileContent(TestCaseWithTransport):
795+ """Tests that the 'merge_file_content' hook is invoked."""
796+
797+ def setUp(self):
798+ TestCaseWithTransport.setUp(self)
799+ self.hook_log = []
800+
801+ def install_hook_noop(self):
802+ def hook_na(merge_params):
803+ # This hook unconditionally does nothing.
804+ self.hook_log.append(('no-op',))
805+ return 'not_applicable', None
806+ _mod_merge.Merger.hooks.install_named_hook(
807+ 'merge_file_content', hook_na, 'test hook (no-op)')
808+
809+ def install_hook_success(self):
810+ def hook_success(merge_params):
811+ self.hook_log.append(('success',))
812+ if merge_params.file_id == '1':
813+ return 'success', ['text-merged-by-hook']
814+ return 'not_applicable', None
815+ _mod_merge.Merger.hooks.install_named_hook(
816+ 'merge_file_content', hook_success, 'test hook (success)')
817+
818+ def install_hook_conflict(self):
819+ def hook_conflict(merge_params):
820+ self.hook_log.append(('conflict',))
821+ if merge_params.file_id == '1':
822+ return 'conflicted', ['text-with-conflict-markers-from-hook']
823+ return 'not_applicable', None
824+ _mod_merge.Merger.hooks.install_named_hook(
825+ 'merge_file_content', hook_conflict, 'test hook (delete)')
826+
827+ def install_hook_delete(self):
828+ def hook_delete(merge_params):
829+ self.hook_log.append(('delete',))
830+ if merge_params.file_id == '1':
831+ return 'delete', None
832+ return 'not_applicable', None
833+ _mod_merge.Merger.hooks.install_named_hook(
834+ 'merge_file_content', hook_delete, 'test hook (delete)')
835+
836+ def install_hook_log_lines(self):
837+ """Install a hook that saves the get_lines for the this, base and other
838+ versions of the file.
839+ """
840+ def hook_log_lines(merge_params):
841+ self.hook_log.append((
842+ 'log_lines',
843+ merge_params.this_lines,
844+ merge_params.other_lines,
845+ merge_params.base_lines,
846+ ))
847+ return 'not_applicable', None
848+ _mod_merge.Merger.hooks.install_named_hook(
849+ 'merge_file_content', hook_log_lines, 'test hook (log_lines)')
850+
851+ def make_merge_builder(self):
852+ builder = MergeBuilder(self.test_base_dir)
853+ self.addCleanup(builder.cleanup)
854+ return builder
855+
856+ def create_file_needing_contents_merge(self, builder, file_id):
857+ builder.add_file(file_id, builder.tree_root, "name1", "text1", True)
858+ builder.change_contents(file_id, other="text4", this="text3")
859+
860+ def test_change_vs_change(self):
861+ """Hook is used for (changed, changed)"""
862+ self.install_hook_success()
863+ builder = self.make_merge_builder()
864+ builder.add_file("1", builder.tree_root, "name1", "text1", True)
865+ builder.change_contents("1", other="text4", this="text3")
866+ conflicts = builder.merge(self.merge_type)
867+ self.assertEqual(conflicts, [])
868+ self.assertEqual(
869+ builder.this.get_file('1').read(), 'text-merged-by-hook')
870+
871+ def test_change_vs_deleted(self):
872+ """Hook is used for (changed, deleted)"""
873+ self.install_hook_success()
874+ builder = self.make_merge_builder()
875+ builder.add_file("1", builder.tree_root, "name1", "text1", True)
876+ builder.change_contents("1", this="text2")
877+ builder.remove_file("1", other=True)
878+ conflicts = builder.merge(self.merge_type)
879+ self.assertEqual(conflicts, [])
880+ self.assertEqual(
881+ builder.this.get_file('1').read(), 'text-merged-by-hook')
882+
883+ def test_result_can_be_delete(self):
884+ """A hook's result can be the deletion of a file."""
885+ self.install_hook_delete()
886+ builder = self.make_merge_builder()
887+ self.create_file_needing_contents_merge(builder, "1")
888+ conflicts = builder.merge(self.merge_type)
889+ self.assertEqual(conflicts, [])
890+ self.assertRaises(errors.NoSuchId, builder.this.id2path, '1')
891+ self.assertEqual([], list(builder.this.list_files()))
892+
893+ def test_result_can_be_conflict(self):
894+ """A hook's result can be a conflict."""
895+ self.install_hook_conflict()
896+ builder = self.make_merge_builder()
897+ self.create_file_needing_contents_merge(builder, "1")
898+ conflicts = builder.merge(self.merge_type)
899+ self.assertEqual(conflicts, [TextConflict('name1', file_id='1')])
900+ # The hook still gets to set the file contents in this case, so that it
901+ # can insert custom conflict markers.
902+ self.assertEqual(
903+ builder.this.get_file('1').read(),
904+ 'text-with-conflict-markers-from-hook')
905+
906+ def test_can_access_this_other_and_base_versions(self):
907+ """The hook function can call params.merger.get_lines to access the
908+ THIS/OTHER/BASE versions of the file.
909+ """
910+ self.install_hook_log_lines()
911+ builder = self.make_merge_builder()
912+ builder.add_file("1", builder.tree_root, "name1", "text1", True)
913+ builder.change_contents("1", this="text2", other="text3")
914+ conflicts = builder.merge(self.merge_type)
915+ self.assertEqual(
916+ [('log_lines', ['text2'], ['text3'], ['text1'])], self.hook_log)
917+
918+ def test_chain_when_not_applicable(self):
919+ """When a hook function returns not_applicable, the next function is
920+ tried (when one exists).
921+ """
922+ self.install_hook_noop()
923+ self.install_hook_success()
924+ builder = self.make_merge_builder()
925+ self.create_file_needing_contents_merge(builder, "1")
926+ conflicts = builder.merge(self.merge_type)
927+ self.assertEqual(conflicts, [])
928+ self.assertEqual(
929+ builder.this.get_file('1').read(), 'text-merged-by-hook')
930+ self.assertEqual([('no-op',), ('success',)], self.hook_log)
931+
932+ def test_chain_stops_after_success(self):
933+ """When a hook function returns success, no later functions are tried.
934+ """
935+ self.install_hook_success()
936+ self.install_hook_noop()
937+ builder = self.make_merge_builder()
938+ self.create_file_needing_contents_merge(builder, "1")
939+ conflicts = builder.merge(self.merge_type)
940+ self.assertEqual([('success',)], self.hook_log)
941+
942+ def test_chain_stops_after_conflict(self):
943+ """When a hook function returns conflict, no later functions are tried.
944+ """
945+ self.install_hook_conflict()
946+ self.install_hook_noop()
947+ builder = self.make_merge_builder()
948+ self.create_file_needing_contents_merge(builder, "1")
949+ conflicts = builder.merge(self.merge_type)
950+ self.assertEqual([('conflict',)], self.hook_log)
951+
952+ def test_chain_stops_after_delete(self):
953+ """When a hook function returns delete, no later functions are tried.
954+ """
955+ self.install_hook_delete()
956+ self.install_hook_noop()
957+ builder = self.make_merge_builder()
958+ self.create_file_needing_contents_merge(builder, "1")
959+ conflicts = builder.merge(self.merge_type)
960+ self.assertEqual([('delete',)], self.hook_log)
961