Merge lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/per-file-merge-hook-491711
Merge into: lp:bzr
Prerequisite: lp:~spiv/bzr/merge-minor-refactoring
Diff against target: 960 lines (+672/-74)
10 files modified
NEWS (+16/-0)
bzrlib/decorators.py (+76/-0)
bzrlib/hooks.py (+2/-0)
bzrlib/merge.py (+176/-74)
bzrlib/plugins/news_merge/README (+9/-0)
bzrlib/plugins/news_merge/__init__.py (+73/-0)
bzrlib/plugins/news_merge/news_merge.py (+88/-0)
bzrlib/plugins/news_merge/parser.py (+62/-0)
bzrlib/tests/__init__.py (+1/-0)
bzrlib/tests/per_merger.py (+169/-0)
To merge this branch: bzr merge lp:~spiv/bzr/per-file-merge-hook-491711
Reviewer Review Type Date Requested Status
John A Meinel Approve
Vincent Ladeuil Approve
Review via email: mp+17279@code.launchpad.net

This proposal supersedes a proposal from 2009-12-18.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).

The tests hopefully make the capabilities fairly clear. In short:

 * it is invoked for all changes where one side has changed the content, and the other side has changed the content or deleted the file
 * hook functions are passed the merger, file_id and trans_id (so can get access to filenames and file contents in the various this/other/base trees)
 * hook functions can explicitly do nothing (allowing the next registered function to try), or return a conflict, a successful merge, or a deletion of the file.
 * hook functions can set the contents of a file in both the success and conflict case

Some open questions (could be addressed in future patches perhaps):

 * should merge of a file rename vs. a text change use this hook point?
 * should a hook function be able to have some control over the emission of THIS/BASE/OTHER files when it results in a conflict, or over their contents of those files?
 * what helpers can we provide to help people write hook functions?
 * what more can we do to help multiple hook registrations co-exist peacefully? e.g. These functions already chain, but if multiple plugins register with this hook the ordering is not defined.

I think ideally this hook point would be marked experimental until we get some practical experience proving that it works well in practice (e.g. that someone can write a plugin to merge NEWS files), but that's what beta series are for ;)

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>
> * it is invoked for all changes where one side has changed the
> content, and the other side has changed the content or deleted the fil

Why not 'all changes' ? I don't see why we should prevent hooks merging
even one-side changes in a more sophisticated manner.

> * should a hook function be able to have some control over the
> emission of THIS/BASE/OTHER files when it results in a conflict, or
> over their contents of those files?

I think yes, but that can come later.

> * what helpers can we provide to help people write hook functions?
> * what more can we do to help multiple hook registrations co-exist
> peacefully? e.g. These functions already chain, but if multiple
> plugins register with this hook the ordering is not defined.

I don't think the ordering matters, except in the case that someone is
writing a new 'fallback' hook, to replace our three-way merge attempt.
Perhaps put a priority into the hook or something? I'd wait for signs we
need it.

> I think ideally this hook point would be marked experimental until we
> get some practical experience proving that it works well in practice
> (e.g. that someone can write a plugin to merge NEWS files), but that's
> what beta series are for ;)

You can just mark it experimental in the docs ;)

-Rob

Revision history for this message
James Westby (james-w) wrote : Posted in a previous version of this proposal

On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).

Yay, thanks for working on this.

> The tests hopefully make the capabilities fairly clear. In short:
>
> * hook functions are passed the merger, file_id and trans_id (so can
> get access to filenames and file contents in the various
> this/other/base trees)

Is there any other information that is readily available?

I am a little concerned about the performance issues. bzr-builddeb is
going to be shipping one of these hooks, and I don't want that to
destroy the performance of merge.

At minimum on every conflict it has to look up the path names. If
we do as Rob suggests and call the hook on every modification that
is going to be O(tree) lookups.

Some way to short-circuit this might be good. Either passing in the
paths if they are already known (and are likely to be known even
with refactoring), or even allowing the hook to declare what paths
it is interested in (though that could be limiting).

  (e.g. if path_filter is not None and basename == path_filter:)

Have you written a very simple implementation of e.g. a NEWS merger
and measured the impact?

> Some open questions (could be addressed in future patches perhaps):
>
> * should merge of a file rename vs. a text change use this hook
> point?

I imagine we would want to make it possible.

Some helpers would be useful for splitting out the cases, e.g. I imagine
many will just want to act on two text changes.

> * should a hook function be able to have some control over the
> emission of THIS/BASE/OTHER files when it results in a conflict, or
> over their contents of those files?

I'm not sure I see a use case for this.

> * what helpers can we provide to help people write hook functions?

Helpers for getting the paths, and other meta information.

Helpers for determining the cause of the conflict, text changes vs.
path changes etc.

> * what more can we do to help multiple hook registrations co-exist
> peacefully? e.g. These functions already chain, but if multiple
> plugins register with this hook the ordering is not defined.

I think using a known ordering would be good. Even using sorted() on
the hook names. This would allow you to place something at the end
by calling it zzfallback for instance.

Otherwise changing the API to allow a priority could be good. However,
these APIs are always tricky to deal with, as you are playing a game
of rock-paper-scissors with and unbounded number of choices.

I bring it up now, as it may require API changes in the registration
to do this.

Thanks,

James

Revision history for this message
Aaron Bentley (abentley) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>> * it is invoked for all changes where one side has changed the
>> content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

That would require us to pay attention to THIS-only changes, and that
would be a performance regression.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksrtrIACgkQ0F+nu1YWqI3a6wCeNdmME7ZY/xSZ7ma6snOd30fE
XUUAni6xZaHfi47GfpkTivlXF8uFDa5w
=uew/
-----END PGP SIGNATURE-----

Revision history for this message
Aaron Bentley (abentley) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Westby wrote:
> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
>> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>> * should a hook function be able to have some control over the
>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>> over their contents of those files?
>
> I'm not sure I see a use case for this.

merge3 and weave merge emit different things as their BASE, so if we
hooked merge to force a weave merge for certain files, we might want to
also use that style of BASE.

Can't see any use for controlling THIS or OTHER, though.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksrt6oACgkQ0F+nu1YWqI26pgCdF24SIHw4I4pnYV1DpikUoPwT
3EgAn08cMSUKDneD39c0rBt4aKK+zYCh
=QYTu
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
>> * it is invoked for all changes where one side has changed the
>> content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

I assume this is because the merge code itself splits these cases. Files
with clear winners don't get passed into the Merger code. Also, there is
bidirectional issues.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrw+oACgkQJdeBCYSNAANPTQCeLRj5ST7fT58nGOkIJldtyNis
9fsAoMnNoCWak28nr8shdECpIyLRbxtY
=lRgj
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> Is there any other information that is readily available?
>
> I am a little concerned about the performance issues. bzr-builddeb is
> going to be shipping one of these hooks, and I don't want that to
> destroy the performance of merge.
>
> At minimum on every conflict it has to look up the path names. If
> we do as Rob suggests and call the hook on every modification that
> is going to be O(tree) lookups.

Still O(changes), I don't see how you get O(tree).

>
> Some way to short-circuit this might be good. Either passing in the
> paths if they are already known (and are likely to be known even
> with refactoring), or even allowing the hook to declare what paths
> it is interested in (though that could be limiting).

I think passing in the "path_in_this_tree, path_in_other_tree, and
path_in_final_tree" as determined by the current merge logic would be
reasonable.

>
> (e.g. if path_filter is not None and basename == path_filter:)
>
> Have you written a very simple implementation of e.g. a NEWS merger
> and measured the impact?
>
>> Some open questions (could be addressed in future patches perhaps):
>>
>> * should merge of a file rename vs. a text change use this hook
>> point?
>
> I imagine we would want to make it possible.
>
> Some helpers would be useful for splitting out the cases, e.g. I imagine
> many will just want to act on two text changes.
>
>> * should a hook function be able to have some control over the
>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>> over their contents of those files?
>
> I'm not sure I see a use case for this.
>
>> * what helpers can we provide to help people write hook functions?
>
> Helpers for getting the paths, and other meta information.
>
> Helpers for determining the cause of the conflict, text changes vs.
> path changes etc.
>
>> * what more can we do to help multiple hook registrations co-exist
>> peacefully? e.g. These functions already chain, but if multiple
>> plugins register with this hook the ordering is not defined.
>
> I think using a known ordering would be good. Even using sorted() on
> the hook names. This would allow you to place something at the end
> by calling it zzfallback for instance.
>
> Otherwise changing the API to allow a priority could be good. However,
> these APIs are always tricky to deal with, as you are playing a game
> of rock-paper-scissors with and unbounded number of choices.
>
> I bring it up now, as it may require API changes in the registration
> to do this.
>
> Thanks,
>
> James

I think explicitly making them unordered forces people to think about
whether they can do what they need in an unordered fashion.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrxLwACgkQJdeBCYSNAANiYwCgpaTb3KqL65jqOHou/mpzrQwt
/3oAn1TyJC5/tS1+Xc0+3//PRxoai0lI
=Gavq
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> James Westby wrote:
>> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
>>> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>>> * should a hook function be able to have some control over the
>>> emission of THIS/BASE/OTHER files when it results in a conflict, or
>>> over their contents of those files?
>> I'm not sure I see a use case for this.
>
> merge3 and weave merge emit different things as their BASE, so if we
> hooked merge to force a weave merge for certain files, we might want to
> also use that style of BASE.
>
> Can't see any use for controlling THIS or OTHER, though.
>
> Aaron

Well, potentially a whitespace ignoring merger may want to pass that
ability on to the output files?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksrxN0ACgkQJdeBCYSNAAMJ9gCgxngkhtNINZPw7iNuWPe8JUs3
6UIAn30zdkrAz6hhN2PoLlhgxsfy0vCm
=omK9
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-12-18 at 17:09 +0000, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
> >> * it is invoked for all changes where one side has changed the
> >> content, and the other side has changed the content or deleted the fil
> >
> > Why not 'all changes' ? I don't see why we should prevent hooks merging
> > even one-side changes in a more sophisticated manner.
>
> That would require us to pay attention to THIS-only changes, and that
> would be a performance regression.

So there is a middle ground, where OTHER-only changes are examined.
There are a number of useful things one can do where OTHER-only changes
are examined (such as better merges of NEWS items by accounting for
changes after a release) which the current definition won't help with. I
see that checking for THIS-only changes would have an impact, and while
it might be nice, I don't see a need to do it at this point.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal
Download full text (6.4 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #491711 hook to permit per file merges
> https://bugs.launchpad.net/bugs/491711
>
>
> This branch adds a hook point that allows a plugin to perform the merging of the contents of specific files (see bug 491711).
>
> The tests hopefully make the capabilities fairly clear. In short:
>
> * it is invoked for all changes where one side has changed the content, and the other side has changed the content or deleted the file
> * hook functions are passed the merger, file_id and trans_id (so can get access to filenames and file contents in the various this/other/base trees)
> * hook functions can explicitly do nothing (allowing the next registered function to try), or return a conflict, a successful merge, or a deletion of the file.
> * hook functions can set the contents of a file in both the success and conflict case
>
> Some open questions (could be addressed in future patches perhaps):
>
> * should merge of a file rename vs. a text change use this hook point?
> * should a hook function be able to have some control over the emission of THIS/BASE/OTHER files when it results in a conflict, or over their contents of those files?
> * what helpers can we provide to help people write hook functions?
> * what more can we do to help multiple hook registrations co-exist peacefully? e.g. These functions already chain, but if multiple plugins register with this hook the ordering is not defined.
>
> I think ideally this hook point would be marked experimental until we get some practical experience proving that it works well in practice (e.g. that someone can write a plugin to merge NEWS files), but that's what beta series are for ;)
>
>

+ self.create_hook(hooks.HookPoint('merge_file_content',
+ "Called when file content needs to be merged (including
when one "
+ "side has deleted the file and the other has changed it)."
+ "merge_file_content is called with a "
+ "bzrlib.merge.MergeHookParams. The function should return a
tuple "
+ "of (status, lines), where status is one of 'not_applicable', "
+ "'success', 'conflicted', or 'delete'. If status is
success or "
+ "conflicted, then lines should be an iterable of the new
lines "
+ "for the file.",
+ (2, 1), None))

^- I'm wondering if we wouldn't want something smarter than a tuple of
(status, lines) being returned. It might give us a way to move to
supporting returning THIS/BASE/OTHER more directly. It may also make it
easier to handle path conflicts / updating the path as part of the hook.

I think that since the transform is being passed in (via the Merger
object), you actually can do any renaming you want. You just tell the
transform that the path for 'trans_id' should really be 'X'.

As james_w mentioned, we may want to make the api a little bit more
verbose, and have some helper functions o...

Read more...

review: Needs Information
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal
Download full text (13.7 KiB)

First, thanks everyone for the deluge of thoughtful comments! It's
great to have so much enthusiasm about a patch :)

I'm going to reply to all the comments in this one jumbo reply, rather
than try to detangle the conversation threads this proposal has spawned.
I've skipped over some comments that are repeats of others, but I think
I've replied to every point. Do let me know if I've inadvertantly
forgotten something you'd like me to consider.

Robert Collins wrote:
> On Fri, 2009-12-18 at 08:08 +0000, Andrew Bennetts wrote:
> >
> > * it is invoked for all changes where one side has changed the
> > content, and the other side has changed the content or deleted the fil
>
> Why not 'all changes' ? I don't see why we should prevent hooks merging
> even one-side changes in a more sophisticated manner.

Hmm. I guess because if there's a clear one-side winner then I don't
think of that as a “merge” (from the point of view of that file), and I
didn't have a use-case for it. I was also trying to be mindful of
keeping the performance cost minimal, as Aaron mentioned. You have
given a use-case in a follow-up (“better merges of NEWS items by
accounting for changes after a release”), and that would definitely be
nice to have. I'll have a play and see what it would take to make
OTHER-only changes go via the hook, as your followup suggests. It
sounds like no-one is strongly interested in THIS-only at this point, so
I'm happy to ignore that for this patch.

Also, I know it's a bit of a cop-out answer, but a plugin could always
implement a whole new Merger if the hook isn't powerful enough...

John's reply to you said:

] I assume this is because the merge code itself splits these cases. Files
] with clear winners don't get passed into the Merger code. Also, there is
] bidirectional issues.

Right, I definitely did want to keep the changes as minimal as possible
because I'm not deeply familiar with the structure of the merge code,
but I do get the sense that it would be easy to accidentally introduce
bugs if I rearranged things.

I'm not certain what you mean by bidirectional issues? I think I can
make some educated guesses, but I'd rather you explained for me :)

> > * should a hook function be able to have some control over the
> > emission of THIS/BASE/OTHER files when it results in a conflict, or
> > over their contents of those files?
>
> I think yes, but that can come later.

Cool.

James Westby wrote:
> On Fri, 18 Dec 2009 08:08:34 -0000, Andrew Bennetts <email address hidden> wrote:
[...]
> > * hook functions are passed the merger, file_id and trans_id (so can
> > get access to filenames and file contents in the various
> > this/other/base trees)
>
> Is there any other information that is readily available?

As John says, the path in the various tree versions are fairly cheaply
available via the merger object. I think it'll be worth adding
convenience methods to get at them. I imagine some plugins might want
to just check the basename, others would like to check the full path,
some might just use the file_id.

> I am a little concerned about the performance issues. bzr-builddeb is
> going to be shipping one of these hooks, ...

Revision history for this message
James Westby (james-w) wrote : Posted in a previous version of this proposal

On Mon, 21 Dec 2009 02:39:14 -0000, Andrew Bennetts <email address hidden> wrote:
> I'm interested in the “on every conflict it [bzr-builddeb?] has to look
> up the path names” remark though — I don't think this hook as-is
> provides a convenient way to implement “try the regular logic, if that
> conflicts then try my custom plugin logic”. Is that something you want?
> I suspect the way to provide that nicely may be via making the regular
> logic interact via the same interfaces as the hook (i.e. receiving the
> same params object and returning the same values), and then providing a
> convenient way to call that via the hook params.

The way I imagine it working is that a hook will be registered by
bzr-builddeb that will merge debian/changelog files. The question
is to know what files to act on. We can try and parse the files
and just skip everything that doesn't parse as one of those files.
However, that will be both quite slow, plus hugely pessimistic.

Therefore I would want to short-circuit, and only bother trying
to parse files where the basename is "changelog". It would be
great if the basename was easily accesible to the hook so
that it could do this check without doing something like
trans_id->file_id->path.

> Yeah, that's a good point. Even many those that want to handle more than
> plain text vs text changes might still want to punt on delete vs text,
> for example. So I need to take the knowledge about what sort of changes
> are involved that's implicit in the code path now and turn that into
> some data that the hook function can inspect.

Yep, I'm mainly concerned in these comments about the API that is
presented to plugin writers. Being able simply separate out the cases
without each plugin having to cargo cult code to detect what situation
it is being called for would be good.

I think this will partially come from APIs, which I imagine are there,
I just haven't looked, and from docs. We should docuement the things
people commonly want to do so that it is obvious to plugin writers.
My concern would be that you write a hook that assumes you are always
dealing with text vs. text, and then it crashes when you get text vs.
delete. Making it clear that you can detect the cases at least points
you towards considering which you want to handle, even if the API
doesn't force you to think about them all.

> Well what sort of priority system do you have in mind that would be
> better than ordering by name?

Nothing in particular. I was just making a plea for a defined order.
If e.g. the order was whatever dict.keys() returns then there is no
way to get your hook to run last if you need to.

Thanks,

James

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-12-21 at 02:57 +0000, James Westby wrote:

> Nothing in particular. I was just making a plea for a defined order.
> If e.g. the order was whatever dict.keys() returns then there is no
> way to get your hook to run last if you need to.

You don't with names either.

'Run last' is very very hard to pin down.

'Topological order' is a bit easier, 'numeric order' is very simple.

Hooks don't have names so much as descriptions, I strongly urge /not/ to
use the labels as a sort order; ugh. We don't have orders elsewhere in
hook land and we haven't needed to - even though this discussion happens
every time.

Re: merging changelog, I'd lookup the full path and compare to
'debian/changelog' and 'changelog' - its simple and should be fast.

Andrew: are kind changes able to invoke this hook? If not please make
that clear.

I think passing in the Merger is fine; we pass in Branch in other hooks
and trust folk to use public bits and not make a mess. If someone does
fiddle, the conflict resolution phases will detect that pretty robustly
I think.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> On Mon, 2009-12-21 at 02:57 +0000, James Westby wrote:
>
> > Nothing in particular. I was just making a plea for a defined order.
> > If e.g. the order was whatever dict.keys() returns then there is no
> > way to get your hook to run last if you need to.
>
> You don't with names either.
>
> 'Run last' is very very hard to pin down.
>
> 'Topological order' is a bit easier, 'numeric order' is very simple.
>
> Hooks don't have names so much as descriptions, I strongly urge /not/ to
> use the labels as a sort order; ugh. We don't have orders elsewhere in
> hook land and we haven't needed to - even though this discussion happens
> every time.

Yeah, I don't much like punning that. I'd like to make the order
defined and stable somehow rather than dependent on the somewhat
unstable order that plugins will get loaded, though.

> Re: merging changelog, I'd lookup the full path and compare to
> 'debian/changelog' and 'changelog' - its simple and should be fast.

I definitely think that's what I'd try to start with. Maybe that will
be a significant performance hit in which case we can rethink it, but
let's not prematurely optimise.

> Andrew: are kind changes able to invoke this hook? If not please make
> that clear.

I don't know off the top of my head. I'll find out and make it
explicit. My guess is no, but I'm not very confident about that.

> I think passing in the Merger is fine; we pass in Branch in other hooks
> and trust folk to use public bits and not make a mess. If someone does
> fiddle, the conflict resolution phases will detect that pretty robustly
> I think.

I'm not worried about people accessing non-public bits (they should
already know that _foo attributes are off-limits unless they're ready to
deal with the likely breakage), apart from ensuring that the public bits
are complete enough for hooks to do what they need.

However, I think it's important to decide if we pass the Merger with the
intent that the hook will use it only read-only (to retrieve file names,
file contents, maybe details about other things in the tree, etc), or if
we actually want to support the hook mutating things via the Merger. It
seems the answer is the latter, which is fine, but it's worth being
clear about that. That answer suggests that my original proposal that
the return value from the hook makes changes isn't quite right when the
hook can just do change(s) directly using the Merger.

-Andrew.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

I'd suggest one of:
 return value indicates 'handled' or 'not handled'
 return value is parameter to pass to next hook or None-to-stop

and let changes that require calling fn's on Merger be done directly.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

This addresses most (all?) of the points raised in the original reviews.

Major changes:

  * Added 'news_merge' plugin to contrib/! This is an example plugin using the merge hook function so that you can see it in action, and so that users will have some inspiration for writing their own.
  * The built-in merge logic is now invoked the same way as user-registered hook functions. So in theory anything the built-in logic can do, a plugin can do too.
  * the hook point is invoked a bit earlier, so it can have a say in more kinds of conflicts by inspecting the 'winner' attribute.
  * added some convenience methods/properties to the params object.
  * generally simpler and more linear code path than lp:bzr, thanks to refactorings for above changes.

I think looking at the news_merge plugin the API is still a bit cumbersome (note the repetition of logic vs. the built-in text merge to see if this is an ordinary regular file, both sides changed merge). But having an example proves it works, and gives us good data for thinking of improvements.

Revision history for this message
Martin Pool (mbp) wrote :

2010/1/13 Andrew Bennetts <email address hidden>:
> This addresses most (all?) of the points raised in the original reviews.
>
> Major changes:
>
>  * Added 'news_merge' plugin to contrib/!  This is an example plugin using the merge hook function so that you can see it in action, and so that users will have some inspiration for writing their own.
>  * The built-in merge logic is now invoked the same way as user-registered hook functions.  So in theory anything the built-in logic can do, a plugin can do too.
>  * the hook point is invoked a bit earlier, so it can have a say in more kinds of conflicts by inspecting the 'winner' attribute.
>  * added some convenience methods/properties to the params object.
>  * generally simpler and more linear code path than lp:bzr, thanks to refactorings for above changes.
>
> I think looking at the news_merge plugin the API is still a bit cumbersome (note the repetition of logic vs. the built-in text merge to see if this is an ordinary regular file, both sides changed merge).  But having an example proves it works, and gives us good data for thinking of improvements.

(Without reading the diff) that sounds like a nice list of changes.

I don't want to add latency to these changes but I would say that
putting things in contrib/ is not a fabulous way to get it out to
people: it relies on them discovering that it's there and how to use
it. It is a bit confusing for packagers whether they should install
the things from in there or not: if they don't, it's more
inaccessible, and if they do then the packaged version is quite
different from the standard one. Empirically, the other things in
there have tended to rot. They are not tested.

Therefore maybe you should put this in bzrlib.plugins but arrange that
it has no effect (and little load time overhead) unless it's
configured on?

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2010-01-13 at 07:54 +0000, Martin Pool wrote:
>
> Therefore maybe you should put this in bzrlib.plugins but arrange that
> it has no effect (and little load time overhead) unless it's
> configured on?

+1000

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (6.1 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
>
> Requested reviews:
> John A Meinel (jameinel)
> Related bugs:
> #491711 hook to permit per file merges
> https://bugs.launchpad.net/bugs/491711
>
>

I think overall this is looking good. Looking over the actual diff
brought up a lot of questions about how various things hook together. I
don't know that everything needs to be addressed before merging, but I
think it would be good to at least understand what is going on.

 review: needsfixing

+class MergeHookParams(object):
+ """Object holding parameters passed to merge_file_content hooks.
+
+ There are 3 fields hooks can access:
+

What about "this_pair", "other_pair", etc. I don't really understand
what the "pair" is. this_pair[0] looks to be a Kind. Also, you should
probably document "base_lines", "this_lines", and "other_lines" here as
well. The current documentation assumes you'll go to the merger, but
you've already provided helpers for that.

base_lines, this_lines and other_lines doesn't map very cleanly into
WeaveMerge... But maybe it is good enough.

+ if hook_status == 'not_applicable':
+ # This is a contents conflict, because none of the available
+ # functions could merge it.
+ result = None
+ name = self.tt.final_name(trans_id)
+ parent_id = self.tt.final_parent(trans_id)
+ if file_id in self.this_tree.inventory:
+ self.tt.unversion_file(trans_id)
+ file_group = self._dump_conflicts(name, parent_id, file_id,
+ set_version=True)
+ self._raw_conflicts.append(('contents conflict', file_group))

^- Are we sure this is correct? If there is a contents_conflict we mark
the file as unversioned?

Also, you are using "self.this_tree.inventory" but later we use "file_id
in self.this_tree".

I'd probably rather use "self.this_tree.has_id(file_id)" in both cases.
Trees shouldn't really have a __contains__ method, and I'd like to avoid
using ".inventory" if we can. (Most Dirstate actions can be resolved
without building an inventory at all.)

...

+ elif hook_status == 'conflicted':
+ # XXX: perhaps the hook should be able to provide
+ # the BASE/THIS/OTHER files?
+ self.tt.create_file(lines, trans_id)
+ self._raw_conflicts.append(('text conflict', trans_id))
+ name = self.tt.final_name(trans_id)
+ parent_id = self.tt.final_parent(trans_id)
+ file_group = self._dump_conflicts(name, parent_id, file_id)
+ file_group.append(trans_id)

^- Do you know why we are appending the trans_id here? file_group isn't
mentioned again. So is this mutating an internal structure behind the
scenes? (_dump_conflicts is creating a list and storing it in state
somewhere, and we mutate it to add the new id.) If so, a comment would
be nice. (Probably not your code, but certainly a "what is going on?"
moment.)

...

+ try:...

Read more...

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :

As a follow on (after merging) it would be good to

* announce this on udd
* see if there is some existing code for merging debian changelogs and merge that

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (9.4 KiB)

John A Meinel wrote:
> Review: Needs Fixing
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andrew Bennetts wrote:
> > Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
> >
> > Requested reviews:
> > John A Meinel (jameinel)
> > Related bugs:
> > #491711 hook to permit per file merges
> > https://bugs.launchpad.net/bugs/491711
> >
> >
>
> I think overall this is looking good. Looking over the actual diff
> brought up a lot of questions about how various things hook together. I
> don't know that everything needs to be addressed before merging, but I
> think it would be good to at least understand what is going on.

Fair enough. There are parts I don't fully understand, but those parts
I'm pretty sure are working the same as in trunk (and are equally
confusing in trunk).

Thanks for the review!

> +class MergeHookParams(object):
> + """Object holding parameters passed to merge_file_content hooks.
> +
> + There are 3 fields hooks can access:
> +
>
> What about "this_pair", "other_pair", etc. I don't really understand
> what the "pair" is. this_pair[0] looks to be a Kind. Also, you should

Yeah, it is a kind. This was a refactoring that I should have taken a
little further by just passing the kinds rather than the “pairs”. I've
done that now, and added this_kind and other_kind to the class
docstring.

> probably document "base_lines", "this_lines", and "other_lines" here as
> well. The current documentation assumes you'll go to the merger, but
> you've already provided helpers for that.

I've removed the outdated comment that suggests using the merger for
this.

The *_lines properties are already documented, though, they have their
own docstrings. Unless you think they should be mentioned in the class
docstring too, but I didn't think that was our usual practice?

> base_lines, this_lines and other_lines doesn't map very cleanly into
> WeaveMerge... But maybe it is good enough.

Yeah, I admit to being unsure about what to do with the non-default
Merger classes. I guess for now we leave it up to the plugins to check
the type of Merger if they are sensitive to it? I would like to give
plugins the chance to use their own hooks even in the case of
WeaveMerge.

> + if hook_status == 'not_applicable':
> + # This is a contents conflict, because none of the available
> + # functions could merge it.
> + result = None
> + name = self.tt.final_name(trans_id)
> + parent_id = self.tt.final_parent(trans_id)
> + if file_id in self.this_tree.inventory:
> + self.tt.unversion_file(trans_id)
> + file_group = self._dump_conflicts(name, parent_id, file_id,
> + set_version=True)
> + self._raw_conflicts.append(('contents conflict', file_group))
>
> ^- Are we sure this is correct? If there is a contents_conflict we mark
> the file as unversioned?

I'm not really sure why, but this is what lp:bzr does in these cases.
So this isn't an issue introduced by this branch.

> Also, you are u...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (7.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

>> + try:
>> + self.tt.tree_kind(trans_id)
>> + self.tt.delete_contents(trans_id)
>> + except errors.NoSuchFile:
>> + pass
>>
>> ^- I don't quite understand the unconditional "delete_contents" if the
>> contents exist. Especially given that:
>>
>> + elif hook_status == 'success':
>> + self.tt.create_file(lines, trans_id)
>>
>> Is called *before* this.
>
> Yes, this certainly puzzled me when I came across it in trunk! (Again,
> this oddity is definitely present in trunk, just that my patch has
> perhaps made it more visible.)
>
>> Is it that TreeTransform.delete_contents() indicates we want to delete
>> the content in the current tree, and "create_file()" is creating
>> contents in the "next" tree?
>
> Exactly how TreeTransform isn't clearly documented that I could find.
> If someone that knows precisely what the operations on it means and
> could document that it would be a great help, I think. AFAICS there
> isn't even a description of what a trans_id is... is an abbreviation of
> “transform id” or “transaction id”, are they file_ids, or something else
> but like file_ids? What's the difference between unversion_file()
> and delete_contents()? etc.

I can help with this.

A trans_id is a handle for the TreeTransform. Similar in concept to a
file_id mapping to a path in an inventory. However, TreeTransform can
talk about files that aren't versioned (have no trans_id), and multiple
copies of the same logical file (two people introduce the same file_id
at different locations, etc). As such, trans_id are an abstraction of a
specific file/directory content during the transaction. AFAICT they are
always of the form "new-%d".

unversion_file is saying that we need to remove the file_id from the
inventory.
delete_contents says we want to call 'os.remove()' on the object.

Note that you don't get a new trans_id for content being removed versus
content being added. I checked and _removed_contents indicates files
that are to be deleted from the wt, but you can *also* have the same
trans_id in _new_contents. So I think trans_id is like file_id, in that
it is how TT maps this file should be replacing that file.

I think part of it is that TT doesn't think about replacements. It only
has a remove phase and an add phase.

1) All new content is staged into .bzr/checkout/limbo, generally with
names like 'new-1'. Though we do try to create files in their final
directories with their final names if we know in advance. (So we don't
rename all files, just the containing dirs for 'bzr co'.)

2) All old files are renamed into .bzr/checkout/pending_deletion

3) All new files are renamed from .bzr/checkout/limbo into the working tree

4) All old files in .bzr/checkout/pending_deletion are removed

That is how we get atomicity of updates, so that if something fails in
(3) we can still revert back to the original tree.

>
> So, for this specific puzzle...
>
> The code I'm guessing is there to ensure that the file is marked as
> versioned, but that the temporary file used for the new contents is
> removed? Staring again at the treetransform.py...

Read more...

Revision history for this message
Aaron Bentley (abentley) wrote :
Download full text (4.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
>> ^- Are we sure this is correct? If there is a contents_conflict we mark
>> the file as unversioned?

That's correct. For a contents conflict that is not a text conflict,
bzr does not leave a file in place with the original filename, therefore
we also unversion it, and version the .OTHER, .THIS or .BASE--
whatever's there.

>> + elif hook_status == 'conflicted':
>> + # XXX: perhaps the hook should be able to provide
>> + # the BASE/THIS/OTHER files?
>> + self.tt.create_file(lines, trans_id)
>> + self._raw_conflicts.append(('text conflict', trans_id))
>> + name = self.tt.final_name(trans_id)
>> + parent_id = self.tt.final_parent(trans_id)
>> + file_group = self._dump_conflicts(name, parent_id, file_id)
>> + file_group.append(trans_id)
>>
>> ^- Do you know why we are appending the trans_id here? file_group isn't
>> mentioned again.

> Hmm. Again, this is something that lp:bzr does that I've
> preserved/copied. On closer inspection, I don't think it has any effect
> (either in lp:bzr or in my branch), so I'll just delete that.

This code doesn't appear to have ever made sense. It was introduced in
<email address hidden>. I
suspect it was meant to help with treating a file and its conflict files
as a group for conflict handling.

>> + try:
>> + self.tt.tree_kind(trans_id)
>> + self.tt.delete_contents(trans_id)
>> + except errors.NoSuchFile:
>> + pass
>>
>> ^- I don't quite understand the unconditional "delete_contents" if the
>> contents exist. Especially given that:
>>
>> + elif hook_status == 'success':
>> + self.tt.create_file(lines, trans_id)
>>
>> Is called *before* this.

Operations on a TreeTransform simply schedule things to happen when you
call 'apply'. Before and after are irrelevant (except when you cancel
an operation you had previously scheduled). All that matters is that
the tree transform is consistent when you call apply.

>> Is it that TreeTransform.delete_contents() indicates we want to delete
>> the content in the current tree, and "create_file()" is creating
>> contents in the "next" tree?

delete_contents means that applying the transform should delete these
contents, and create_file means that applying the transform should
create a file with specified contents.

> Exactly how TreeTransform isn't clearly documented that I could find.

How it what? Works?

> If someone that knows precisely what the operations on it means and
> could document that it would be a great help, I think.

I've certainly tried in the past, and I'm happy to enhance the
documentation. Could you start by explaining what is confusing or
missing from the TreeTransform docstring?

> AFAICS there
> isn't even a description of what a trans_id is...

There are three paragraphs of description:

> Transform/Transaction ids
> -------------------------
> trans_ids are temporary ids assigned to all files involved in a transform.
> It's possible, even common, that not all ...

Read more...

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (3.4 KiB)

Aaron Bentley wrote:
[...]
> > Exactly how TreeTransform isn't clearly documented that I could find.
>
> How it what? Works?

Yes, that's the missing word. I'm glad your telepathy is functioning
well :)

> > If someone that knows precisely what the operations on it means and
> > could document that it would be a great help, I think.
>
> I've certainly tried in the past, and I'm happy to enhance the
> documentation. Could you start by explaining what is confusing or
> missing from the TreeTransform docstring?

Simply that I didn't find it when I looked for it, believe or not! I
looked for a class docstring on the class that implemented these methods
and instance variables (TreeTransformBase), and then looked for a module
docstring, and then tried grepping doc/developers/. It didn't occur to
me to try looking a couple of classes (and 1000 lines) further down
transform.py (and a couple of classes further down the inheritance
hierarchy too) to see if one of those had introductory documentation.

This is the sort of situation where I miss dedicated “interface”
documentation, I guess.

Thanks for pointing me to that, I thought I'd seen it before but several
minutes of solid searching failed to find it for me! I expect I'll
remember it for next time.

[...]
> is an abbreviation of
> > “transform id” or “transaction id”
>
> Both, as you can see above.

[Eep! Ambiguity of names doesn't help clarity. I realise it's basically
an arbitrary choice, but I suggest picking one. Actually, it's also
that neither name is particuarly satisfying: one of these IDs doesn't
correspond to a single tree transform (which is done as a single
transaction), it corresponds to an element of that transform. So, I'd
probably find “tentry_id” (short for “transform entry ID”) clear and
mnemonic... but it's a bit late to change now :)
]

> >, are they file_ids, or something else
> > but like file_ids?
>
>
> > What's the difference between unversion_file()
> > and delete_contents()?
>
> One schedules the deletion of the contents of a file. The other
> schedules removes the removal of file id of the file. The docstring for
> delete_contents is:
>
> > """Schedule the contents of a path entry for deletion"""
>
> The docstring for unversion_file is:
> > """Schedule a path entry to become unversioned"""
>
> If this is unclear, could you say what about it is unclear?

First I should say I had an intuition about the meanings of these, and I
think my intuition was basically correct. So it's not so bad.

It's unclear for a few reasons:

 * the docstrings don't really add information not already in the method
   name,
 * my confidence in my intuition about the names was low because the
   terms were so close to synonymous, and
 * the terminology is not memorably clear.

By “memorably clear”, I mean that terms like “delete” and “unversion”
are close to synonymous, so the distinction is hard to notice and
recall.

I don't mean say you necessarily should have found better names, maybe
there are no better names for these concepts... none spring to mind for
me! But (assuming I had found the TreeTransform docstring in the first
place) some explicit descri...

Read more...

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> Though it also means that calling:
> self.tt.tree_kind()
> self.tt.delete_contents()
>
> is redundant and double statting the file. So we should clean that up

Yeah, I'd noticed that, but wasn't sure if it was tasteful to clean that
up. Seeing as you think we should I've gone ahead and removed that line
:)

[...]
> So the comment looks correct to my understanding.

I've added a comment based on Aaron's wording.

[...]
> If I set up location.conf to say:
> [/home/jameinel/dev/bzr]
> news_merger_files = NEWS
> news_merge_files:policy = recurse

(Thanks for this, this is exactly the syntax I've used for the
news_merge plugin configuration)

> Will that match when 'bzr merge --preview' is invoked (what Branch is
> given to the PreviewTree, etc.) I'm not 100% sure that PreviewTree *has*
> a .branch property. (Given that RevisionTree does not, but has a
> _repository attribute. Even better it is passed in as 'branch'... :)

It turned out I had to pass the branch a little deeper in all cases, but
having done that 'merge --preview' works just as well as regular 'merge'
:)

[...]
> It is probably reasonable to merge soon. I'd probably like to see the
> news plugin in 'plugins' first, but that is not critical.

I've done that anyway. Please try it out! :)

-Andrew.

Revision history for this message
Vincent Ladeuil (vila) wrote :

I reviewed this chronologically (reading oldest revision first)
and I found it was far easier to understand what was going
on. I'm not sure that trick could be used for all reviews but it
definetly made that one a nice tour :)

Keep that in mind when reading the comments below as they may
seem out of context otherwise.

I had a moderately good understanding of the merge code and I
find that your refactoring makes it far clearer, thanks for that
!

32 === modified file 'bzrlib/decorators.py'
33 --- bzrlib/decorators.py 2009-10-15 02:19:43 +0000
34 +++ bzrlib/decorators.py 2010-01-18 08:00:37 +0000
35 @@ -253,3 +253,19 @@
36 global needs_read_lock, needs_write_lock
37 needs_read_lock = _pretty_needs_read_lock
38 needs_write_lock = _pretty_needs_write_lock
39 +
40 +
41 +def cachedproperty(getter_func):
42 + """Like property(getter_func), but caches the value.
43 +
44 + Note: there's no way to clear or reset the cache.
45 + """
46 + memo = []

AIUI, this is gc'ed when no more references exist on the cached
value. It's not obvious and may be worth documenting.

Also, this can be used in a lot of places in the codebase so
that's worth a NEWS entry isn't it ?

And last remark on that subject, can't use some unique object()
to initialize the cache instead of a list for each property ?
Using a list here is cute but a bit hackish, using a dedicated
object to represent the 'not cached yet' could make the code more
obvious to the casual reader.

The hook scope is a bit unclear regarding the tt updates (spotted
when reading revno 4887). Why don't you let the hooks do them (in
merge.merge_contents()) ? Shouldn't that be at least documented
in the hook docstring ?

I had to read the code to verify that is_file_merge() was indeed
is_text_merge(), is it just me or should it be renamed ? Ha,
fixed in revno 4890 :) Never mind then.

248 + params = MergeHookParams(self, file_id, trans_id, this_pair[0],
249 + other_pair[0], winner)

Using this_kind= this_pair[0] and other_kind=other_pair[0] will
add a final touch of clarity to revno 4894 :)

Overall, nice job and reading the history of how it was put together
was really a big help to understand it.

I'll be happy to land this for 2.1 unless John or Aaron have more objections
until tomorrow.

Addressing the points I raised above with be nice but not mandatory for landing.

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

Vincent Ladeuil wrote:
[...]
> 41 +def cachedproperty(getter_func):
> 42 + """Like property(getter_func), but caches the value.
> 43 +
> 44 + Note: there's no way to clear or reset the cache.
> 45 + """
> 46 + memo = []
>
>
> AIUI, this is gc'ed when no more references exist on the cached
> value. It's not obvious and may be worth documenting.

Erk, this is totally bogus. I'm not sure what I was thinking! I was
trying for a limited quick & dirty cachedproperty decorator without the
complexity of the one in Launchpad, but I think I should just borrow
Launchpad's, so I've now done that (including arranging for its doctests
to be run).

> Also, this can be used in a lot of places in the codebase so
> that's worth a NEWS entry isn't it ?

Good idea. Done.

> The hook scope is a bit unclear regarding the tt updates (spotted
> when reading revno 4887). Why don't you let the hooks do them (in
> merge.merge_contents()) ? Shouldn't that be at least documented
> in the hook docstring ?

I'm not sure what you mean here? You mean let the hook do the
self.tt.version_file and self.tt.delete_contents calls that happen at
the end of merge_contents?

I guess I'm still not certain exactly how much the hook should be
allowed to do... the more plugins people try to write for it, the more
we'll know :)

[...]
> I'll be happy to land this for 2.1 unless John or Aaron have more objections
> until tomorrow.
>
> Addressing the points I raised above with be nice but not mandatory for landing.

That's very much for the extra review.

-Andrew.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "Andrew" == Andrew Bennetts <email address hidden> writes:

    Andrew> Vincent Ladeuil wrote:
    Andrew> [...]
    >> 41 +def cachedproperty(getter_func):
    >> 42 + """Like property(getter_func), but caches the value.
    >> 43 +
    >> 44 + Note: there's no way to clear or reset the cache.
    >> 45 + """
    >> 46 + memo = []
    >>
    >>
    >> AIUI, this is gc'ed when no more references exist on the cached
    >> value. It's not obvious and may be worth documenting.

    Andrew> Erk, this is totally bogus.

Really ? How so ?

<snip/>

    >> The hook scope is a bit unclear regarding the tt updates
    >> (spotted when reading revno 4887). Why don't you let the
    >> hooks do them (in merge.merge_contents()) ? Shouldn't
    >> that be at least documented in the hook docstring ?

    Andrew> I'm not sure what you mean here? You mean let the
    Andrew> hook do the self.tt.version_file and
    Andrew> self.tt.delete_contents calls that happen at the end
    Andrew> of merge_contents?

Yes or let the caller do it but document it anyway.

    Andrew> I guess I'm still not certain exactly how much the
    Andrew> hook should be allowed to do... the more plugins
    Andrew> people try to write for it, the more we'll know :)

Ok, no problem, I'm not sure either. It's just that it seems odd
to *forbid* the hook to do some stuff a bit different if
needed. But there is no point forcing people to do it
unconditionally either. Time will tell.

I'm happy enough with what is possible today anyway.

Revision history for this message
John A Meinel (jameinel) wrote :

One problem I potentially see with the plugin is this:
585 +def filename_matches_config(params):
586 + config = params.merger.this_branch.get_config()
587 + affected_files = config.get_user_option_as_list('news_merge_files')
588 + if affected_files:
589 + filename = params.merger.this_tree.id2path(params.file_id)
590 + if filename in affected_files:
591 + return True
592 + return False

The issue is that "branch.get_config()" is not cached, and re-reads all config files. And you trigger this for *all* modified files. So if you are merging, say, bzr.dev after 20 revs, you'll get 50 changed files, and we will re-read branch.conf, bazaar.conf and locations.conf 50 times.

I wonder if we should have a place on MergeHookParams to store hook data...

Otherwise I like this revision.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2010-01-18 at 11:06 +0000, Andrew Bennetts wrote:

> > AIUI, this is gc'ed when no more references exist on the cached
> > value. It's not obvious and may be worth documenting.
>
> Erk, this is totally bogus. I'm not sure what I was thinking! I was
> trying for a limited quick & dirty cachedproperty decorator without the
> complexity of the one in Launchpad, but I think I should just borrow
> Launchpad's, so I've now done that (including arranging for its doctests
> to be run).

You'll need to get an ack to rerelease the LP code to do this: Launchpad
is AGPL, which is incompatible with GPL.

I agree with John that we need to avoid processing branch.conf per-file.

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

> On Mon, 2010-01-18 at 11:06 +0000, Andrew Bennetts wrote:
>

> You'll need to get an ack to rerelease the LP code to do this: Launchpad
> is AGPL, which is incompatible with GPL.

I've got the ack from flacoste.

>
> I agree with John that we need to avoid processing branch.conf per-file.

spiv proposed to cache the config in the plugin, I'll tweak and merge.

We may want to discuss caching the config in the branch itself
(I can't find scenarios where a locked branch could have trouble with that)
but that's outside the scope of this plugin.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2010-01-17 20:06:41 +0000
+++ NEWS 2010-01-18 11:01:23 +0000
@@ -20,6 +20,15 @@
20* Add bug information to log output when available.20* Add bug information to log output when available.
21 (Neil Martinsen-Burrell, Guillermo Gonzalez, #251729)21 (Neil Martinsen-Burrell, Guillermo Gonzalez, #251729)
2222
23* Added ``merge_file_content`` hook point to ``Merger``, allowing plugins
24 to register custom merge logic, e.g. to provide smarter merging for
25 particular files.
26
27* Bazaar now includes the ``news_merge`` plugin. It is disabled by
28 default, to enable it add a ``news_merge_files`` option to your
29 configuration. Consult ``bzr help news_merge`` for more information.
30 (Andrew Bennetts)
31
23* ``bzr branch`` now takes a ``--bind`` option. This lets you32* ``bzr branch`` now takes a ``--bind`` option. This lets you
24 branch and bind all in one command. (Ian Clatworthy)33 branch and bind all in one command. (Ian Clatworthy)
2534
@@ -154,11 +163,18 @@
154API Changes163API Changes
155***********164***********
156165
166* Added ``cachedproperty`` decorator to ``bzrlib.decorators``.
167 (Andrew Bennetts)
168
157* Many test features were renamed from ``FooFeature`` to ``foo_feature``169* Many test features were renamed from ``FooFeature`` to ``foo_feature``
158 to be consistent with instances being lower case and classes being170 to be consistent with instances being lower case and classes being
159 CamelCase. For the features that were more likely to be used, we added a171 CamelCase. For the features that were more likely to be used, we added a
160 deprecation thunk, but not all. (John Arbash Meinel)172 deprecation thunk, but not all. (John Arbash Meinel)
161173
174* Merger classes (such as ``Merge3Merger``) now expect a ``this_branch``
175 parameter in their constructora, and provide ``this_branch`` as an
176 attribute. (Andrew Bennetts)
177
162* The Branch hooks pre_change_branch_tip no longer masks exceptions raised178* The Branch hooks pre_change_branch_tip no longer masks exceptions raised
163 by plugins - the original exceptions are now preserved. (Robert Collins)179 by plugins - the original exceptions are now preserved. (Robert Collins)
164180
165181
=== modified file 'bzrlib/decorators.py'
--- bzrlib/decorators.py 2009-10-15 02:19:43 +0000
+++ bzrlib/decorators.py 2010-01-18 11:01:23 +0000
@@ -253,3 +253,79 @@
253 global needs_read_lock, needs_write_lock253 global needs_read_lock, needs_write_lock
254 needs_read_lock = _pretty_needs_read_lock254 needs_read_lock = _pretty_needs_read_lock
255 needs_write_lock = _pretty_needs_write_lock255 needs_write_lock = _pretty_needs_write_lock
256
257
258# This implementation of cachedproperty is copied from Launchpad's
259# canonical.launchpad.cachedproperty module.
260def cachedproperty(attrname_or_fn):
261 """A decorator for methods that makes them properties with their return
262 value cached.
263
264 The value is cached on the instance, using the attribute name provided.
265
266 If you don't provide a name, the mangled name of the property is used.
267
268 >>> class CachedPropertyTest(object):
269 ...
270 ... @cachedproperty('_foo_cache')
271 ... def foo(self):
272 ... print 'foo computed'
273 ... return 23
274 ...
275 ... @cachedproperty
276 ... def bar(self):
277 ... print 'bar computed'
278 ... return 69
279
280 >>> cpt = CachedPropertyTest()
281 >>> getattr(cpt, '_foo_cache', None) is None
282 True
283 >>> cpt.foo
284 foo computed
285 23
286 >>> cpt.foo
287 23
288 >>> cpt._foo_cache
289 23
290 >>> cpt.bar
291 bar computed
292 69
293 >>> cpt._bar_cached_value
294 69
295
296 """
297 if isinstance(attrname_or_fn, basestring):
298 attrname = attrname_or_fn
299 return _CachedPropertyForAttr(attrname)
300 else:
301 fn = attrname_or_fn
302 attrname = '_%s_cached_value' % fn.__name__
303 return _CachedProperty(attrname, fn)
304
305
306class _CachedPropertyForAttr(object):
307
308 def __init__(self, attrname):
309 self.attrname = attrname
310
311 def __call__(self, fn):
312 return _CachedProperty(self.attrname, fn)
313
314
315class _CachedProperty(object):
316
317 def __init__(self, attrname, fn):
318 self.fn = fn
319 self.attrname = attrname
320 self.marker = object()
321
322 def __get__(self, inst, cls=None):
323 if inst is None:
324 return self
325 cachedresult = getattr(inst, self.attrname, self.marker)
326 if cachedresult is self.marker:
327 result = self.fn(inst)
328 setattr(inst, self.attrname, result)
329 return result
330 else:
331 return cachedresult
256332
=== modified file 'bzrlib/hooks.py'
--- bzrlib/hooks.py 2010-01-03 03:49:07 +0000
+++ bzrlib/hooks.py 2010-01-18 11:01:23 +0000
@@ -45,6 +45,8 @@
45 'bzrlib.info', 'InfoHooks')45 'bzrlib.info', 'InfoHooks')
46known_hooks.register_lazy(('bzrlib.lock', 'Lock.hooks'), 'bzrlib.lock',46known_hooks.register_lazy(('bzrlib.lock', 'Lock.hooks'), 'bzrlib.lock',
47 'LockHooks')47 'LockHooks')
48known_hooks.register_lazy(('bzrlib.merge', 'Merger.hooks'), 'bzrlib.merge',
49 'MergeHooks')
48known_hooks.register_lazy(('bzrlib.msgeditor', 'hooks'), 'bzrlib.msgeditor',50known_hooks.register_lazy(('bzrlib.msgeditor', 'hooks'), 'bzrlib.msgeditor',
49 'MessageEditorHooks')51 'MessageEditorHooks')
50known_hooks.register_lazy(('bzrlib.mutabletree', 'MutableTree.hooks'),52known_hooks.register_lazy(('bzrlib.mutabletree', 'MutableTree.hooks'),
5153
=== modified file 'bzrlib/merge.py'
--- bzrlib/merge.py 2009-12-18 08:22:42 +0000
+++ bzrlib/merge.py 2010-01-18 11:01:23 +0000
@@ -19,8 +19,10 @@
19 branch as _mod_branch,19 branch as _mod_branch,
20 conflicts as _mod_conflicts,20 conflicts as _mod_conflicts,
21 debug,21 debug,
22 decorators,
22 errors,23 errors,
23 graph as _mod_graph,24 graph as _mod_graph,
25 hooks,
24 merge3,26 merge3,
25 osutils,27 osutils,
26 patiencediff,28 patiencediff,
@@ -50,7 +52,68 @@
50 from_tree.unlock()52 from_tree.unlock()
5153
5254
55class MergeHooks(hooks.Hooks):
56
57 def __init__(self):
58 hooks.Hooks.__init__(self)
59 self.create_hook(hooks.HookPoint('merge_file_content',
60 "Called when file content needs to be merged (including when one "
61 "side has deleted the file and the other has changed it)."
62 "merge_file_content is called with a "
63 "bzrlib.merge.MergeHookParams. The function should return a tuple "
64 "of (status, lines), where status is one of 'not_applicable', "
65 "'success', 'conflicted', or 'delete'. If status is success or "
66 "conflicted, then lines should be an iterable of strings of the "
67 "new file contents.",
68 (2, 1), None))
69
70
71class MergeHookParams(object):
72 """Object holding parameters passed to merge_file_content hooks.
73
74 There are 3 fields hooks can access:
75
76 :ivar merger: the Merger object
77 :ivar file_id: the file ID of the file being merged
78 :ivar trans_id: the transform ID for the merge of this file
79 :ivar this_kind: kind of file_id in 'this' tree
80 :ivar other_kind: kind of file_id in 'other' tree
81 :ivar winner: one of 'this', 'other', 'conflict'
82 """
83
84 def __init__(self, merger, file_id, trans_id, this_kind, other_kind,
85 winner):
86 self.merger = merger
87 self.file_id = file_id
88 self.trans_id = trans_id
89 self.this_kind = this_kind
90 self.other_kind = other_kind
91 self.winner = winner
92
93 def is_file_merge(self):
94 """True if this_kind and other_kind are both 'file'."""
95 return self.this_kind == 'file' and self.other_kind == 'file'
96
97 @decorators.cachedproperty
98 def base_lines(self):
99 """The lines of the 'base' version of the file."""
100 return self.merger.get_lines(self.merger.base_tree, self.file_id)
101
102 @decorators.cachedproperty
103 def this_lines(self):
104 """The lines of the 'this' version of the file."""
105 return self.merger.get_lines(self.merger.this_tree, self.file_id)
106
107 @decorators.cachedproperty
108 def other_lines(self):
109 """The lines of the 'other' version of the file."""
110 return self.merger.get_lines(self.merger.other_tree, self.file_id)
111
112
53class Merger(object):113class Merger(object):
114
115 hooks = MergeHooks()
116
54 def __init__(self, this_branch, other_tree=None, base_tree=None,117 def __init__(self, this_branch, other_tree=None, base_tree=None,
55 this_tree=None, pb=None, change_reporter=None,118 this_tree=None, pb=None, change_reporter=None,
56 recurse='down', revision_graph=None):119 recurse='down', revision_graph=None):
@@ -432,7 +495,7 @@
432 'other_tree': self.other_tree,495 'other_tree': self.other_tree,
433 'interesting_ids': self.interesting_ids,496 'interesting_ids': self.interesting_ids,
434 'interesting_files': self.interesting_files,497 'interesting_files': self.interesting_files,
435 'pp': self.pp,498 'pp': self.pp, 'this_branch': self.this_branch,
436 'do_merge': False}499 'do_merge': False}
437 if self.merge_type.requires_base:500 if self.merge_type.requires_base:
438 kwargs['base_tree'] = self.base_tree501 kwargs['base_tree'] = self.base_tree
@@ -542,13 +605,14 @@
542 interesting_ids=None, reprocess=False, show_base=False,605 interesting_ids=None, reprocess=False, show_base=False,
543 pb=progress.DummyProgress(), pp=None, change_reporter=None,606 pb=progress.DummyProgress(), pp=None, change_reporter=None,
544 interesting_files=None, do_merge=True,607 interesting_files=None, do_merge=True,
545 cherrypick=False, lca_trees=None):608 cherrypick=False, lca_trees=None, this_branch=None):
546 """Initialize the merger object and perform the merge.609 """Initialize the merger object and perform the merge.
547610
548 :param working_tree: The working tree to apply the merge to611 :param working_tree: The working tree to apply the merge to
549 :param this_tree: The local tree in the merge operation612 :param this_tree: The local tree in the merge operation
550 :param base_tree: The common tree in the merge operation613 :param base_tree: The common tree in the merge operation
551 :param other_tree: The other tree to merge changes from614 :param other_tree: The other tree to merge changes from
615 :param this_branch: The branch associated with this_tree
552 :param interesting_ids: The file_ids of files that should be616 :param interesting_ids: The file_ids of files that should be
553 participate in the merge. May not be combined with617 participate in the merge. May not be combined with
554 interesting_files.618 interesting_files.
@@ -577,6 +641,7 @@
577 self.this_tree = working_tree641 self.this_tree = working_tree
578 self.base_tree = base_tree642 self.base_tree = base_tree
579 self.other_tree = other_tree643 self.other_tree = other_tree
644 self.this_branch = this_branch
580 self._raw_conflicts = []645 self._raw_conflicts = []
581 self.cooked_conflicts = []646 self.cooked_conflicts = []
582 self.reprocess = reprocess647 self.reprocess = reprocess
@@ -892,7 +957,7 @@
892 self.tt.final_kind(other_root)957 self.tt.final_kind(other_root)
893 except errors.NoSuchFile:958 except errors.NoSuchFile:
894 return959 return
895 if self.other_tree.inventory.root.file_id in self.this_tree.inventory:960 if self.this_tree.has_id(self.other_tree.inventory.root.file_id):
896 # the other tree's root is a non-root in the current tree961 # the other tree's root is a non-root in the current tree
897 return962 return
898 self.reparent_children(self.other_tree.inventory.root, self.tt.root)963 self.reparent_children(self.other_tree.inventory.root, self.tt.root)
@@ -940,7 +1005,7 @@
940 @staticmethod1005 @staticmethod
941 def executable(tree, file_id):1006 def executable(tree, file_id):
942 """Determine the executability of a file-id (used as a key method)."""1007 """Determine the executability of a file-id (used as a key method)."""
943 if file_id not in tree:1008 if not tree.has_id(file_id):
944 return None1009 return None
945 if tree.kind(file_id) != "file":1010 if tree.kind(file_id) != "file":
946 return False1011 return False
@@ -949,7 +1014,7 @@
949 @staticmethod1014 @staticmethod
950 def kind(tree, file_id):1015 def kind(tree, file_id):
951 """Determine the kind of a file-id (used as a key method)."""1016 """Determine the kind of a file-id (used as a key method)."""
952 if file_id not in tree:1017 if not tree.has_id(file_id):
953 return None1018 return None
954 return tree.kind(file_id)1019 return tree.kind(file_id)
9551020
@@ -1038,7 +1103,7 @@
10381103
1039 def merge_names(self, file_id):1104 def merge_names(self, file_id):
1040 def get_entry(tree):1105 def get_entry(tree):
1041 if file_id in tree.inventory:1106 if tree.has_id(file_id):
1042 return tree.inventory[file_id]1107 return tree.inventory[file_id]
1043 else:1108 else:
1044 return None1109 return None
@@ -1107,18 +1172,6 @@
1107 contents = None1172 contents = None
1108 return kind, contents1173 return kind, contents
11091174
1110 def contents_conflict():
1111 trans_id = self.tt.trans_id_file_id(file_id)
1112 name = self.tt.final_name(trans_id)
1113 parent_id = self.tt.final_parent(trans_id)
1114 if file_id in self.this_tree.inventory:
1115 self.tt.unversion_file(trans_id)
1116 if file_id in self.this_tree:
1117 self.tt.delete_contents(trans_id)
1118 file_group = self._dump_conflicts(name, parent_id, file_id,
1119 set_version=True)
1120 self._raw_conflicts.append(('contents conflict', file_group))
1121
1122 # See SPOT run. run, SPOT, run.1175 # See SPOT run. run, SPOT, run.
1123 # So we're not QUITE repeating ourselves; we do tricky things with1176 # So we're not QUITE repeating ourselves; we do tricky things with
1124 # file kind...1177 # file kind...
@@ -1140,63 +1193,112 @@
1140 if winner == 'this':1193 if winner == 'this':
1141 # No interesting changes introduced by OTHER1194 # No interesting changes introduced by OTHER
1142 return "unmodified"1195 return "unmodified"
1196 # We have a hypothetical conflict, but if we have files, then we
1197 # can try to merge the content
1143 trans_id = self.tt.trans_id_file_id(file_id)1198 trans_id = self.tt.trans_id_file_id(file_id)
1144 if winner == 'other':1199 params = MergeHookParams(self, file_id, trans_id, this_pair[0],
1200 other_pair[0], winner)
1201 hooks = Merger.hooks['merge_file_content']
1202 hooks = list(hooks) + [self.default_text_merge]
1203 hook_status = 'not_applicable'
1204 for hook in hooks:
1205 hook_status, lines = hook(params)
1206 if hook_status != 'not_applicable':
1207 # Don't try any more hooks, this one applies.
1208 break
1209 result = "modified"
1210 if hook_status == 'not_applicable':
1211 # This is a contents conflict, because none of the available
1212 # functions could merge it.
1213 result = None
1214 name = self.tt.final_name(trans_id)
1215 parent_id = self.tt.final_parent(trans_id)
1216 if self.this_tree.has_id(file_id):
1217 self.tt.unversion_file(trans_id)
1218 file_group = self._dump_conflicts(name, parent_id, file_id,
1219 set_version=True)
1220 self._raw_conflicts.append(('contents conflict', file_group))
1221 elif hook_status == 'success':
1222 self.tt.create_file(lines, trans_id)
1223 elif hook_status == 'conflicted':
1224 # XXX: perhaps the hook should be able to provide
1225 # the BASE/THIS/OTHER files?
1226 self.tt.create_file(lines, trans_id)
1227 self._raw_conflicts.append(('text conflict', trans_id))
1228 name = self.tt.final_name(trans_id)
1229 parent_id = self.tt.final_parent(trans_id)
1230 self._dump_conflicts(name, parent_id, file_id)
1231 elif hook_status == 'delete':
1232 self.tt.unversion_file(trans_id)
1233 result = "deleted"
1234 elif hook_status == 'done':
1235 # The hook function did whatever it needs to do directly, no
1236 # further action needed here.
1237 pass
1238 else:
1239 raise AssertionError('unknown hook_status: %r' % (hook_status,))
1240 if not self.this_tree.has_id(file_id) and result == "modified":
1241 self.tt.version_file(file_id, trans_id)
1242 # The merge has been performed, so the old contents should not be
1243 # retained.
1244 try:
1245 self.tt.delete_contents(trans_id)
1246 except errors.NoSuchFile:
1247 pass
1248 return result
1249
1250 def _default_other_winner_merge(self, merge_hook_params):
1251 """Replace this contents with other."""
1252 file_id = merge_hook_params.file_id
1253 trans_id = merge_hook_params.trans_id
1254 file_in_this = self.this_tree.has_id(file_id)
1255 if self.other_tree.has_id(file_id):
1256 # OTHER changed the file
1257 wt = self.this_tree
1258 if wt.supports_content_filtering():
1259 # We get the path from the working tree if it exists.
1260 # That fails though when OTHER is adding a file, so
1261 # we fall back to the other tree to find the path if
1262 # it doesn't exist locally.
1263 try:
1264 filter_tree_path = wt.id2path(file_id)
1265 except errors.NoSuchId:
1266 filter_tree_path = self.other_tree.id2path(file_id)
1267 else:
1268 # Skip the id2path lookup for older formats
1269 filter_tree_path = None
1270 transform.create_from_tree(self.tt, trans_id,
1271 self.other_tree, file_id,
1272 filter_tree_path=filter_tree_path)
1273 return 'done', None
1274 elif file_in_this:
1275 # OTHER deleted the file
1276 return 'delete', None
1277 else:
1278 raise AssertionError(
1279 'winner is OTHER, but file_id %r not in THIS or OTHER tree'
1280 % (file_id,))
1281
1282 def default_text_merge(self, merge_hook_params):
1283 if merge_hook_params.winner == 'other':
1145 # OTHER is a straight winner, so replace this contents with other1284 # OTHER is a straight winner, so replace this contents with other
1146 file_in_this = file_id in self.this_tree1285 return self._default_other_winner_merge(merge_hook_params)
1147 if file_in_this:1286 elif merge_hook_params.is_file_merge():
1148 # Remove any existing contents1287 # THIS and OTHER are both files, so text merge. Either
1149 self.tt.delete_contents(trans_id)1288 # BASE is a file, or both converted to files, so at least we
1150 if file_id in self.other_tree:1289 # have agreement that output should be a file.
1151 # OTHER changed the file1290 try:
1152 wt = self.this_tree1291 self.text_merge(merge_hook_params.file_id,
1153 if wt.supports_content_filtering():1292 merge_hook_params.trans_id)
1154 # We get the path from the working tree if it exists.1293 except errors.BinaryFile:
1155 # That fails though when OTHER is adding a file, so1294 return 'not_applicable', None
1156 # we fall back to the other tree to find the path if1295 return 'done', None
1157 # it doesn't exist locally.
1158 try:
1159 filter_tree_path = wt.id2path(file_id)
1160 except errors.NoSuchId:
1161 filter_tree_path = self.other_tree.id2path(file_id)
1162 else:
1163 # Skip the id2path lookup for older formats
1164 filter_tree_path = None
1165 transform.create_from_tree(self.tt, trans_id,
1166 self.other_tree, file_id,
1167 filter_tree_path=filter_tree_path)
1168 if not file_in_this:
1169 self.tt.version_file(file_id, trans_id)
1170 return "modified"
1171 elif file_in_this:
1172 # OTHER deleted the file
1173 self.tt.unversion_file(trans_id)
1174 return "deleted"
1175 else:1296 else:
1176 # We have a hypothetical conflict, but if we have files, then we1297 return 'not_applicable', None
1177 # can try to merge the content
1178 if this_pair[0] == 'file' and other_pair[0] == 'file':
1179 # THIS and OTHER are both files, so text merge. Either
1180 # BASE is a file, or both converted to files, so at least we
1181 # have agreement that output should be a file.
1182 try:
1183 self.text_merge(file_id, trans_id)
1184 except errors.BinaryFile:
1185 return contents_conflict()
1186 if file_id not in self.this_tree:
1187 self.tt.version_file(file_id, trans_id)
1188 try:
1189 self.tt.tree_kind(trans_id)
1190 self.tt.delete_contents(trans_id)
1191 except errors.NoSuchFile:
1192 pass
1193 return "modified"
1194 else:
1195 return contents_conflict()
11961298
1197 def get_lines(self, tree, file_id):1299 def get_lines(self, tree, file_id):
1198 """Return the lines in a file, or an empty list."""1300 """Return the lines in a file, or an empty list."""
1199 if file_id in tree:1301 if tree.has_id(file_id):
1200 return tree.get_file(file_id).readlines()1302 return tree.get_file(file_id).readlines()
1201 else:1303 else:
1202 return []1304 return []
@@ -1205,7 +1307,7 @@
1205 """Perform a three-way text merge on a file_id"""1307 """Perform a three-way text merge on a file_id"""
1206 # it's possible that we got here with base as a different type.1308 # it's possible that we got here with base as a different type.
1207 # if so, we just want two-way text conflicts.1309 # if so, we just want two-way text conflicts.
1208 if file_id in self.base_tree and \1310 if self.base_tree.has_id(file_id) and \
1209 self.base_tree.kind(file_id) == "file":1311 self.base_tree.kind(file_id) == "file":
1210 base_lines = self.get_lines(self.base_tree, file_id)1312 base_lines = self.get_lines(self.base_tree, file_id)
1211 else:1313 else:
@@ -1274,7 +1376,7 @@
1274 versioned = False1376 versioned = False
1275 file_group = []1377 file_group = []
1276 for suffix, tree, lines in data:1378 for suffix, tree, lines in data:
1277 if file_id in tree:1379 if tree.has_id(file_id):
1278 trans_id = self._conflict_file(name, parent_id, tree, file_id,1380 trans_id = self._conflict_file(name, parent_id, tree, file_id,
1279 suffix, lines, filter_tree_path)1381 suffix, lines, filter_tree_path)
1280 file_group.append(trans_id)1382 file_group.append(trans_id)
@@ -1324,11 +1426,11 @@
1324 if winner == "this":1426 if winner == "this":
1325 executability = this_executable1427 executability = this_executable
1326 else:1428 else:
1327 if file_id in self.other_tree:1429 if self.other_tree.has_id(file_id):
1328 executability = other_executable1430 executability = other_executable
1329 elif file_id in self.this_tree:1431 elif self.this_tree.has_id(file_id):
1330 executability = this_executable1432 executability = this_executable
1331 elif file_id in self.base_tree:1433 elif self.base_tree_has_id(file_id):
1332 executability = base_executable1434 executability = base_executable
1333 if executability is not None:1435 if executability is not None:
1334 trans_id = self.tt.trans_id_file_id(file_id)1436 trans_id = self.tt.trans_id_file_id(file_id)
13351437
=== added directory 'bzrlib/plugins/news_merge'
=== added file 'bzrlib/plugins/news_merge/README'
--- bzrlib/plugins/news_merge/README 1970-01-01 00:00:00 +0000
+++ bzrlib/plugins/news_merge/README 2010-01-18 11:01:23 +0000
@@ -0,0 +1,9 @@
1A plugin for merging bzr's NEWS file.
2
3Install as usual (e.g. symlink this directory into ~/.bazaar/plugins), and any
4file with a path of NEWS that is in bzr's NEWS format will be merged with this
5hook.
6
7This hook can resolve conflicts where both sides add entries at the same place.
8If it encounters a more difficult conflict it gives up and bzr will fallback to
9the default merge algorithm.
010
=== added file 'bzrlib/plugins/news_merge/__init__.py'
--- bzrlib/plugins/news_merge/__init__.py 1970-01-01 00:00:00 +0000
+++ bzrlib/plugins/news_merge/__init__.py 2010-01-18 11:01:23 +0000
@@ -0,0 +1,73 @@
1# Copyright (C) 2010 Canonical Ltd
2#
3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by
5# the Free Software Foundation; either version 2 of the License, or
6# (at your option) any later version.
7#
8# This program is distributed in the hope that it will be useful,
9# but WITHOUT ANY WARRANTY; without even the implied warranty of
10# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11# GNU General Public License for more details.
12#
13# You should have received a copy of the GNU General Public License
14# along with this program; if not, write to the Free Software
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
16
17"""Merge hook for bzr's NEWS file.
18
19To enable this plugin, add a section to your branch.conf or location.conf
20like::
21
22 [/home/user/code/bzr]
23 news_merge_files = NEWS
24 news_merge_files:policy = recurse
25
26The news_merge_files config option takes a list of file paths, separated by
27commas.
28
29Limitations:
30
31* if there's a conflict in more than just bullet points, this doesn't yet know
32 how to resolve that, so bzr will fallback to the default line-based merge.
33"""
34
35# Put most of the code in a separate module that we lazy-import to keep the
36# overhead of this plugin as minimal as possible.
37from bzrlib.lazy_import import lazy_import
38lazy_import(globals(), """
39from bzrlib.plugins.news_merge.news_merge import news_merger
40""")
41
42from bzrlib.merge import Merger
43
44
45def news_merge_hook(params):
46 """Merger.merge_file_content hook function for bzr-format NEWS files."""
47 # First, check whether this custom merge logic should be used. We expect
48 # most files should not be merged by this file.
49 if params.winner == 'other':
50 # OTHER is a straight winner, rely on default merge.
51 return 'not_applicable', None
52 elif not params.is_file_merge():
53 # THIS and OTHER aren't both files.
54 return 'not_applicable', None
55 elif not filename_matches_config(params):
56 # The filename isn't listed in the 'news_merge_files' config option.
57 return 'not_applicable', None
58 return news_merger(params)
59
60
61def filename_matches_config(params):
62 config = params.merger.this_branch.get_config()
63 affected_files = config.get_user_option_as_list('news_merge_files')
64 if affected_files:
65 filename = params.merger.this_tree.id2path(params.file_id)
66 if filename in affected_files:
67 return True
68 return False
69
70
71Merger.hooks.install_named_hook(
72 'merge_file_content', news_merge_hook, 'NEWS file merge')
73
074
=== added file 'bzrlib/plugins/news_merge/news_merge.py'
--- bzrlib/plugins/news_merge/news_merge.py 1970-01-01 00:00:00 +0000
+++ bzrlib/plugins/news_merge/news_merge.py 2010-01-18 11:01:23 +0000
@@ -0,0 +1,88 @@
1# Copyright (C) 2010 Canonical Ltd
2#
3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by
5# the Free Software Foundation; either version 2 of the License, or
6# (at your option) any later version.
7#
8# This program is distributed in the hope that it will be useful,
9# but WITHOUT ANY WARRANTY; without even the implied warranty of
10# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11# GNU General Public License for more details.
12#
13# You should have received a copy of the GNU General Public License
14# along with this program; if not, write to the Free Software
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
16
17"""Merge logic for news_merge plugin."""
18
19
20from bzrlib.plugins.news_merge.parser import simple_parse
21from bzrlib import merge3
22
23
24magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
25
26
27def news_merger(params):
28 """Perform a simple 3-way merge of a bzr NEWS file.
29
30 Each section of a bzr NEWS file is essentially an ordered set of bullet
31 points, so we can simply take a set of bullet points, determine which
32 bullets to add and which to remove, sort, and reserialize.
33 """
34 # Transform the different versions of the NEWS file into a bunch of text
35 # lines where each line matches one part of the overall structure, e.g. a
36 # heading or bullet.
37 def munge(lines):
38 return list(blocks_to_fakelines(simple_parse(''.join(lines))))
39 this_lines = munge(params.this_lines)
40 other_lines = munge(params.other_lines)
41 base_lines = munge(params.base_lines)
42 m3 = merge3.Merge3(base_lines, this_lines, other_lines)
43 result_lines = []
44 for group in m3.merge_groups():
45 if group[0] == 'conflict':
46 _, base, a, b = group
47 # Are all the conflicting lines bullets? If so, we can merge this.
48 for line_set in [base, a, b]:
49 for line in line_set:
50 if not line.startswith('bullet'):
51 # Something else :(
52 # Maybe the default merge can cope.
53 return 'not_applicable', None
54 # Calculate additions and deletions.
55 new_in_a = set(a).difference(base)
56 new_in_b = set(b).difference(base)
57 all_new = new_in_a.union(new_in_b)
58 deleted_in_a = set(base).difference(a)
59 deleted_in_b = set(base).difference(b)
60 # Combine into the final set of bullet points.
61 final = all_new.difference(deleted_in_a).difference(deleted_in_b)
62 # Sort, and emit.
63 final = sorted(final, key=sort_key)
64 result_lines.extend(final)
65 else:
66 result_lines.extend(group[1])
67 # Transform the merged elements back into real blocks of lines.
68 return 'success', list(fakelines_to_blocks(result_lines))
69
70
71def blocks_to_fakelines(blocks):
72 for kind, text in blocks:
73 yield '%s%s%s' % (kind, magic_marker, text)
74
75
76def fakelines_to_blocks(fakelines):
77 fakelines = list(fakelines)
78 # Strip out the magic_marker, and reinstate the \n\n between blocks
79 for fakeline in fakelines[:-1]:
80 yield fakeline.split(magic_marker, 1)[1] + '\n\n'
81 # The final block doesn't have a trailing \n\n.
82 for fakeline in fakelines[-1:]:
83 yield fakeline.split(magic_marker, 1)[1]
84
85
86def sort_key(s):
87 return s.replace('`', '').lower()
88
089
=== added file 'bzrlib/plugins/news_merge/parser.py'
--- bzrlib/plugins/news_merge/parser.py 1970-01-01 00:00:00 +0000
+++ bzrlib/plugins/news_merge/parser.py 2010-01-18 11:01:23 +0000
@@ -0,0 +1,62 @@
1# Copyright (C) 2010 Canonical Ltd
2#
3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by
5# the Free Software Foundation; either version 2 of the License, or
6# (at your option) any later version.
7#
8# This program is distributed in the hope that it will be useful,
9# but WITHOUT ANY WARRANTY; without even the implied warranty of
10# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11# GNU General Public License for more details.
12#
13# You should have received a copy of the GNU General Public License
14# along with this program; if not, write to the Free Software
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
16
17"""Simple parser for bzr's NEWS file.
18
19Simple as this is, it's a bit over-powered for news_merge's needs, which only
20cares about 'bullet' and 'everything else'.
21
22This module can be run as a standalone Python program; pass it a filename and
23it will print the parsed form of a file (a series of 2-tuples, see
24simple_parse's docstring).
25"""
26
27
28def simple_parse(content):
29 """Returns blocks, where each block is a 2-tuple (kind, text).
30
31 :kind: one of 'heading', 'release', 'section', 'empty' or 'text'.
32 :text: a str, including newlines.
33 """
34 blocks = content.split('\n\n')
35 for block in blocks:
36 if block.startswith('###'):
37 # First line is ###...: Top heading
38 yield 'heading', block
39 continue
40 last_line = block.rsplit('\n', 1)[-1]
41 if last_line.startswith('###'):
42 # last line is ###...: 2nd-level heading
43 yield 'release', block
44 elif last_line.startswith('***'):
45 # last line is ***...: 3rd-level heading
46 yield 'section', block
47 elif block.startswith('* '):
48 # bullet
49 yield 'bullet', block
50 elif block.strip() == '':
51 # empty
52 yield 'empty', block
53 else:
54 # plain text
55 yield 'text', block
56
57
58if __name__ == '__main__':
59 import sys
60 content = open(sys.argv[1], 'rb').read()
61 for result in simple_parse(content):
62 print result
063
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2010-01-17 20:06:41 +0000
+++ bzrlib/tests/__init__.py 2010-01-18 11:01:23 +0000
@@ -3752,6 +3752,7 @@
3752 return [3752 return [
3753 'bzrlib',3753 'bzrlib',
3754 'bzrlib.branchbuilder',3754 'bzrlib.branchbuilder',
3755 'bzrlib.decorators',
3755 'bzrlib.export',3756 'bzrlib.export',
3756 'bzrlib.inventory',3757 'bzrlib.inventory',
3757 'bzrlib.iterablefile',3758 'bzrlib.iterablefile',
37583759
=== modified file 'bzrlib/tests/per_merger.py'
--- bzrlib/tests/per_merger.py 2009-12-18 06:54:20 +0000
+++ bzrlib/tests/per_merger.py 2010-01-18 11:01:23 +0000
@@ -18,6 +18,7 @@
1818
19import os19import os
2020
21from bzrlib.conflicts import TextConflict
21from bzrlib import (22from bzrlib import (
22 errors,23 errors,
23 merge as _mod_merge,24 merge as _mod_merge,
@@ -28,6 +29,7 @@
28 multiply_tests,29 multiply_tests,
29 TestCaseWithTransport,30 TestCaseWithTransport,
30 )31 )
32from bzrlib.tests.test_merge_core import MergeBuilder
31from bzrlib.transform import TreeTransform33from bzrlib.transform import TreeTransform
3234
3335
@@ -188,4 +190,171 @@
188 self.assertRaises(errors.LockError, wt.unlock)190 self.assertRaises(errors.LockError, wt.unlock)
189191
190192
193class TestHookMergeFileContent(TestCaseWithTransport):
194 """Tests that the 'merge_file_content' hook is invoked."""
195
196 def setUp(self):
197 TestCaseWithTransport.setUp(self)
198 self.hook_log = []
199
200 def install_hook_noop(self):
201 def hook_na(merge_params):
202 # This hook unconditionally does nothing.
203 self.hook_log.append(('no-op',))
204 return 'not_applicable', None
205 _mod_merge.Merger.hooks.install_named_hook(
206 'merge_file_content', hook_na, 'test hook (no-op)')
207
208 def install_hook_success(self):
209 def hook_success(merge_params):
210 self.hook_log.append(('success',))
211 if merge_params.file_id == '1':
212 return 'success', ['text-merged-by-hook']
213 return 'not_applicable', None
214 _mod_merge.Merger.hooks.install_named_hook(
215 'merge_file_content', hook_success, 'test hook (success)')
216
217 def install_hook_conflict(self):
218 def hook_conflict(merge_params):
219 self.hook_log.append(('conflict',))
220 if merge_params.file_id == '1':
221 return 'conflicted', ['text-with-conflict-markers-from-hook']
222 return 'not_applicable', None
223 _mod_merge.Merger.hooks.install_named_hook(
224 'merge_file_content', hook_conflict, 'test hook (delete)')
225
226 def install_hook_delete(self):
227 def hook_delete(merge_params):
228 self.hook_log.append(('delete',))
229 if merge_params.file_id == '1':
230 return 'delete', None
231 return 'not_applicable', None
232 _mod_merge.Merger.hooks.install_named_hook(
233 'merge_file_content', hook_delete, 'test hook (delete)')
234
235 def install_hook_log_lines(self):
236 """Install a hook that saves the get_lines for the this, base and other
237 versions of the file.
238 """
239 def hook_log_lines(merge_params):
240 self.hook_log.append((
241 'log_lines',
242 merge_params.this_lines,
243 merge_params.other_lines,
244 merge_params.base_lines,
245 ))
246 return 'not_applicable', None
247 _mod_merge.Merger.hooks.install_named_hook(
248 'merge_file_content', hook_log_lines, 'test hook (log_lines)')
249
250 def make_merge_builder(self):
251 builder = MergeBuilder(self.test_base_dir)
252 self.addCleanup(builder.cleanup)
253 return builder
254
255 def create_file_needing_contents_merge(self, builder, file_id):
256 builder.add_file(file_id, builder.tree_root, "name1", "text1", True)
257 builder.change_contents(file_id, other="text4", this="text3")
258
259 def test_change_vs_change(self):
260 """Hook is used for (changed, changed)"""
261 self.install_hook_success()
262 builder = self.make_merge_builder()
263 builder.add_file("1", builder.tree_root, "name1", "text1", True)
264 builder.change_contents("1", other="text4", this="text3")
265 conflicts = builder.merge(self.merge_type)
266 self.assertEqual(conflicts, [])
267 self.assertEqual(
268 builder.this.get_file('1').read(), 'text-merged-by-hook')
269
270 def test_change_vs_deleted(self):
271 """Hook is used for (changed, deleted)"""
272 self.install_hook_success()
273 builder = self.make_merge_builder()
274 builder.add_file("1", builder.tree_root, "name1", "text1", True)
275 builder.change_contents("1", this="text2")
276 builder.remove_file("1", other=True)
277 conflicts = builder.merge(self.merge_type)
278 self.assertEqual(conflicts, [])
279 self.assertEqual(
280 builder.this.get_file('1').read(), 'text-merged-by-hook')
281
282 def test_result_can_be_delete(self):
283 """A hook's result can be the deletion of a file."""
284 self.install_hook_delete()
285 builder = self.make_merge_builder()
286 self.create_file_needing_contents_merge(builder, "1")
287 conflicts = builder.merge(self.merge_type)
288 self.assertEqual(conflicts, [])
289 self.assertRaises(errors.NoSuchId, builder.this.id2path, '1')
290 self.assertEqual([], list(builder.this.list_files()))
291
292 def test_result_can_be_conflict(self):
293 """A hook's result can be a conflict."""
294 self.install_hook_conflict()
295 builder = self.make_merge_builder()
296 self.create_file_needing_contents_merge(builder, "1")
297 conflicts = builder.merge(self.merge_type)
298 self.assertEqual(conflicts, [TextConflict('name1', file_id='1')])
299 # The hook still gets to set the file contents in this case, so that it
300 # can insert custom conflict markers.
301 self.assertEqual(
302 builder.this.get_file('1').read(),
303 'text-with-conflict-markers-from-hook')
304
305 def test_can_access_this_other_and_base_versions(self):
306 """The hook function can call params.merger.get_lines to access the
307 THIS/OTHER/BASE versions of the file.
308 """
309 self.install_hook_log_lines()
310 builder = self.make_merge_builder()
311 builder.add_file("1", builder.tree_root, "name1", "text1", True)
312 builder.change_contents("1", this="text2", other="text3")
313 conflicts = builder.merge(self.merge_type)
314 self.assertEqual(
315 [('log_lines', ['text2'], ['text3'], ['text1'])], self.hook_log)
316
317 def test_chain_when_not_applicable(self):
318 """When a hook function returns not_applicable, the next function is
319 tried (when one exists).
320 """
321 self.install_hook_noop()
322 self.install_hook_success()
323 builder = self.make_merge_builder()
324 self.create_file_needing_contents_merge(builder, "1")
325 conflicts = builder.merge(self.merge_type)
326 self.assertEqual(conflicts, [])
327 self.assertEqual(
328 builder.this.get_file('1').read(), 'text-merged-by-hook')
329 self.assertEqual([('no-op',), ('success',)], self.hook_log)
330
331 def test_chain_stops_after_success(self):
332 """When a hook function returns success, no later functions are tried.
333 """
334 self.install_hook_success()
335 self.install_hook_noop()
336 builder = self.make_merge_builder()
337 self.create_file_needing_contents_merge(builder, "1")
338 conflicts = builder.merge(self.merge_type)
339 self.assertEqual([('success',)], self.hook_log)
340
341 def test_chain_stops_after_conflict(self):
342 """When a hook function returns conflict, no later functions are tried.
343 """
344 self.install_hook_conflict()
345 self.install_hook_noop()
346 builder = self.make_merge_builder()
347 self.create_file_needing_contents_merge(builder, "1")
348 conflicts = builder.merge(self.merge_type)
349 self.assertEqual([('conflict',)], self.hook_log)
350
351 def test_chain_stops_after_delete(self):
352 """When a hook function returns delete, no later functions are tried.
353 """
354 self.install_hook_delete()
355 self.install_hook_noop()
356 builder = self.make_merge_builder()
357 self.create_file_needing_contents_merge(builder, "1")
358 conflicts = builder.merge(self.merge_type)
359 self.assertEqual([('delete',)], self.hook_log)
191360