Bazaar

Code review comment for lp:~spiv/bzr/per-file-merge-hook-491711

per-file-merge-hook-491711
Merge into bzr.dev

Revision history for this message

Andrew Bennetts (spiv) wrote on 2010-01-15:

John A Meinel wrote:
> Review: Needs Fixing
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andrew Bennetts wrote:
> > Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
> >
> > Requested reviews:
> > John A Meinel (jameinel)
> > Related bugs:
> > #491711 hook to permit per file merges
> > https://bugs.launchpad.net/bugs/491711
> >
> >
>
> I think overall this is looking good. Looking over the actual diff
> brought up a lot of questions about how various things hook together. I
> don't know that everything needs to be addressed before merging, but I
> think it would be good to at least understand what is going on.

Fair enough. There are parts I don't fully understand, but those parts
I'm pretty sure are working the same as in trunk (and are equally
confusing in trunk).

Thanks for the review!

> +class MergeHookParams(object):
> + """Object holding parameters passed to merge_file_content hooks.
> +
> + There are 3 fields hooks can access:
> +
>
> What about "this_pair", "other_pair", etc. I don't really understand
> what the "pair" is. this_pair[0] looks to be a Kind. Also, you should

Yeah, it is a kind. This was a refactoring that I should have taken a
little further by just passing the kinds rather than the “pairs”. I've
done that now, and added this_kind and other_kind to the class
docstring.

> probably document "base_lines", "this_lines", and "other_lines" here as
> well. The current documentation assumes you'll go to the merger, but
> you've already provided helpers for that.

I've removed the outdated comment that suggests using the merger for
this.

The *_lines properties are already documented, though, they have their
own docstrings. Unless you think they should be mentioned in the class
docstring too, but I didn't think that was our usual practice?

> base_lines, this_lines and other_lines doesn't map very cleanly into
> WeaveMerge... But maybe it is good enough.

Yeah, I admit to being unsure about what to do with the non-default
Merger classes. I guess for now we leave it up to the plugins to check
the type of Merger if they are sensitive to it? I would like to give
plugins the chance to use their own hooks even in the case of
WeaveMerge.

> + if hook_status == 'not_applicable':
> + # This is a contents conflict, because none of the available
> + # functions could merge it.
> + result = None
> + name = self.tt.final_name(trans_id)
> + parent_id = self.tt.final_parent(trans_id)
> + if file_id in self.this_tree.inventory:
> + self.tt.unversion_file(trans_id)
> + file_group = self._dump_conflicts(name, parent_id, file_id,
> + set_version=True)
> + self._raw_conflicts.append(('contents conflict', file_group))
>
> ^- Are we sure this is correct? If there is a contents_conflict we mark
> the file as unversioned?

I'm not really sure why, but this is what lp:bzr does in these cases.
So this isn't an issue introduced by this branch.

> Also, you are using "self.this_tree.inventory" but later we use "file_id
> in self.this_tree".
>
> I'd probably rather use "self.this_tree.has_id(file_id)" in both cases.
> Trees shouldn't really have a __contains__ method, and I'd like to avoid
> using ".inventory" if we can. (Most Dirstate actions can be resolved
> without building an inventory at all.)

Fair enough. Again, I think this was probably an issue before this
branch, but this at least is something I can improve with confidence in
this one :)

I've gone ahead and changed this everywhere in this file, even places I
hadn't previously touched.

> + elif hook_status == 'conflicted':
> + # XXX: perhaps the hook should be able to provide
> + # the BASE/THIS/OTHER files?
> + self.tt.create_file(lines, trans_id)
> + self._raw_conflicts.append(('text conflict', trans_id))
> + name = self.tt.final_name(trans_id)
> + parent_id = self.tt.final_parent(trans_id)
> + file_group = self._dump_conflicts(name, parent_id, file_id)
> + file_group.append(trans_id)
>
> ^- Do you know why we are appending the trans_id here? file_group isn't
> mentioned again. So is this mutating an internal structure behind the
> scenes? (_dump_conflicts is creating a list and storing it in state
> somewhere, and we mutate it to add the new id.) If so, a comment would
> be nice. (Probably not your code, but certainly a "what is going on?"
> moment.)

Hmm. Again, this is something that lp:bzr does that I've
preserved/copied. On closer inspection, I don't think it has any effect
(either in lp:bzr or in my branch), so I'll just delete that.

> + try:
> + self.tt.tree_kind(trans_id)
> + self.tt.delete_contents(trans_id)
> + except errors.NoSuchFile:
> + pass
>
> ^- I don't quite understand the unconditional "delete_contents" if the
> contents exist. Especially given that:
>
> + elif hook_status == 'success':
> + self.tt.create_file(lines, trans_id)
>
> Is called *before* this.

Yes, this certainly puzzled me when I came across it in trunk! (Again,
this oddity is definitely present in trunk, just that my patch has
perhaps made it more visible.)

> Is it that TreeTransform.delete_contents() indicates we want to delete
> the content in the current tree, and "create_file()" is creating
> contents in the "next" tree?

Exactly how TreeTransform isn't clearly documented that I could find.
If someone that knows precisely what the operations on it means and
could document that it would be a great help, I think. AFAICS there
isn't even a description of what a trans_id is... is an abbreviation of
“transform id” or “transaction id”, are they file_ids, or something else
but like file_ids? What's the difference between unversion_file()
and delete_contents()? etc.

So, for this specific puzzle...

The code I'm guessing is there to ensure that the file is marked as
versioned, but that the temporary file used for the new contents is
removed? Staring again at the treetransform.py code isn't bringing me
enlightenment.

> A comment of this might clarify:
> # Now get rid of the contents in the working tree, so that any new
> # content (even conflicts) has a place to go.
> ...

I really wish I knew if that was right.

All I'm certain of is that I'm doing the same as trunk in this case.

> + def _default_other_winner_merge(self, merge_hook_params):
> + """Replace this contents with other."""
> + file_id = merge_hook_params.file_id
> + trans_id = merge_hook_params.trans_id
> + file_in_this = file_id in self.this_tree
> + if file_id in self.other_tree:
> + # OTHER changed the file
> + wt = self.this_tree
> + if wt.supports_content_filtering():
> + # We get the path from the working tree if it exists.
> + # That fails though when OTHER is adding a file, so
> + # we fall back to the other tree to find the path if
> + # it doesn't exist locally.
> + try:
> + filter_tree_path = wt.id2path(file_id)
> + except errors.NoSuchId:
> + filter_tree_path = self.other_tree.id2path(file_id)
> + else:
> + # Skip the id2path lookup for older formats
> + filter_tree_path = None
>
> ^- Something seems missing if we are doing this only in this code path.
> What about all the other code paths that will be creating file contents?
> How do the MergeHooks interact with content filtering?

That's a good question! I don't know, but it's interesting that even in
trunk I only see content filtering involved in this case (OTHER is
winner) and when generating conflicts. When both sides have modified
the file no content filtering appears to be involved, which doesn't
sound right.

So I *think* my patch doesn't make this worse, unless a plugin decides
to perform the merge for the winner == 'other' case.

[...]
> +"""Merge hook for bzr's NEWS file.
> +
> +Install this as a plugin, e.g:
> +
> + cp contrib/news-file-merge-hook.py ~/.bazaar/plugins/news_merge.py
> +"""
>
> ^- This is no longer true, since you changed it to "news_merge/__init__.py".
>
> +from .parser import simple_parse
>
> ^- ????
>
> I'm guessing ".parser" is python 2.6 syntax, but this certainly isn't
> 2.4 syntax.

Right. Fixed.

> Given the discussion, I would probably move this into bzrlib/plugins/*
> and change the trigger to look at a config variable to determine whether
> it is active.

I haven't done this yet, but will soon.

> Which makes me think that MergeHookParams is incomplete if we don't have
> semi-easy access to what Branch/Tree this hook is being run on. I
> suppose we can use "params.merger.this_tree.branch.get_config()" ?

I suppose so... I'll find out when I move the plugin to bzrlib.plugins I
guess! :)

> How does this interact with "bzr merge --preview".

It seems to work as I expect: any registered hook, such as news_merge's,
is used. Do you have some specific concerns in mind?

> I would also move the bulk of the code into a separate file that is only
> imported with the merge hook is actually triggered.

I haven't yet moved the code, although there isn't much of it, but I
have added some lazy_import magic.

-Andrew.

John A Meinel wrote:
> Review: Needs Fixing
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Andrew Bennetts wrote:
> > Andrew Bennetts has proposed merging lp:~spiv/bzr/per-file-merge-hook-491711 into lp:bzr with lp:~spiv/bzr/merge-minor-refactoring as a prerequisite.
> > 
> >     Requested reviews:
> >     John A Meinel (jameinel)
> > Related bugs:
> >   #491711 hook to permit per file merges
> >   https://bugs.launchpad.net/bugs/491711
> > 
> > 
> 
> I think overall this is looking good. Looking over the actual diff
> brought up a lot of questions about how various things hook together. I
> don't know that everything needs to be addressed before merging, but I
> think it would be good to at least understand what is going on.

Fair enough.  There are parts I don't fully understand, but those parts
I'm pretty sure are working the same as in trunk (and are equally
confusing in trunk).

Thanks for the review!

> +class MergeHookParams(object):
> +    """Object holding parameters passed to merge_file_content hooks.
> +
> +    There are 3 fields hooks can access:
> +
> 
> What about "this_pair", "other_pair", etc. I don't really understand
> what the "pair" is. this_pair[0] looks to be a Kind. Also, you should

Yeah, it is a kind.  This was a refactoring that I should have taken a
little further by just passing the kinds rather than the “pairs”.  I've
done that now, and added this_kind and other_kind to the class
docstring.

> probably document "base_lines", "this_lines", and "other_lines" here as
> well. The current documentation assumes you'll go to the merger, but
> you've already provided helpers for that.

I've removed the outdated comment that suggests using the merger for
this.

The *_lines properties are already documented, though, they have their
own docstrings.  Unless you think they should be mentioned in the class
docstring too, but I didn't think that was our usual practice?

> base_lines, this_lines and other_lines doesn't map very cleanly into
> WeaveMerge... But maybe it is good enough.

Yeah, I admit to being unsure about what to do with the non-default
Merger classes.  I guess for now we leave it up to the plugins to check
the type of Merger if they are sensitive to it?  I would like to give
plugins the chance to use their own hooks even in the case of
WeaveMerge.

> +        if hook_status == 'not_applicable':
> +            # This is a contents conflict, because none of the available
> +            # functions could merge it.
> +            result = None
> +            name = self.tt.final_name(trans_id)
> +            parent_id = self.tt.final_parent(trans_id)
> +            if file_id in self.this_tree.inventory:
> +                self.tt.unversion_file(trans_id)
> +            file_group = self._dump_conflicts(name, parent_id, file_id,
> +                                              set_version=True)
> +            self._raw_conflicts.append(('contents conflict', file_group))
> 
> ^- Are we sure this is correct? If there is a contents_conflict we mark
> the file as unversioned?

I'm not really sure why, but this is what lp:bzr does in these cases.
So this isn't an issue introduced by this branch.

> Also, you are using "self.this_tree.inventory" but later we use "file_id
> in self.this_tree".
> 
> I'd probably rather use "self.this_tree.has_id(file_id)" in both cases.
> Trees shouldn't really have a __contains__ method, and I'd like to avoid
> using ".inventory" if we can. (Most Dirstate actions can be resolved
> without building an inventory at all.)

Fair enough.  Again, I think this was probably an issue before this
branch, but this at least is something I can improve with confidence in
this one :)

I've gone ahead and changed this everywhere in this file, even places I
hadn't previously touched.

> +        elif hook_status == 'conflicted':
> +            # XXX: perhaps the hook should be able to provide
> +            # the BASE/THIS/OTHER files?
> +            self.tt.create_file(lines, trans_id)
> +            self._raw_conflicts.append(('text conflict', trans_id))
> +            name = self.tt.final_name(trans_id)
> +            parent_id = self.tt.final_parent(trans_id)
> +            file_group = self._dump_conflicts(name, parent_id, file_id)
> +            file_group.append(trans_id)
> 
> ^- Do you know why we are appending the trans_id here? file_group isn't
> mentioned again. So is this mutating an internal structure behind the
> scenes? (_dump_conflicts is creating a list and storing it in state
> somewhere, and we mutate it to add the new id.) If so, a comment would
> be nice. (Probably not your code, but certainly a "what is going on?"
> moment.)

Hmm.  Again, this is something that lp:bzr does that I've
preserved/copied.  On closer inspection, I don't think it has any effect
(either in lp:bzr or in my branch), so I'll just delete that.

> +        try:
> +            self.tt.tree_kind(trans_id)
> +            self.tt.delete_contents(trans_id)
> +        except errors.NoSuchFile:
> +            pass
> 
> ^- I don't quite understand the unconditional "delete_contents" if the
> contents exist. Especially given that:
> 
> +        elif hook_status == 'success':
> +            self.tt.create_file(lines, trans_id)
> 
> Is called *before* this.

Yes, this certainly puzzled me when I came across it in trunk!  (Again,
this oddity is definitely present in trunk, just that my patch has
perhaps made it more visible.)

> Is it that TreeTransform.delete_contents() indicates we want to delete
> the content in the current tree, and "create_file()" is creating
> contents in the "next" tree?

Exactly how TreeTransform isn't clearly documented that I could find.
If someone that knows precisely what the operations on it means and
could document that it would be a great help, I think.  AFAICS there
isn't even a description of what a trans_id is... is an abbreviation of
“transform id” or “transaction id”, are they file_ids, or something else
but like file_ids?  What's the difference between unversion_file()
and delete_contents()?  etc.

So, for this specific puzzle...

The code I'm guessing is there to ensure that the file is marked as
versioned, but that the temporary file used for the new contents is
removed?  Staring again at the treetransform.py code isn't bringing me
enlightenment.

> A comment of this might clarify:
>   # Now get rid of the contents in the working tree, so that any new
>   # content (even conflicts) has a place to go.
> ...

I really wish I knew if that was right.

All I'm certain of is that I'm doing the same as trunk in this case.

> +    def _default_other_winner_merge(self, merge_hook_params):
> +        """Replace this contents with other."""
> +        file_id = merge_hook_params.file_id
> +        trans_id = merge_hook_params.trans_id
> +        file_in_this = file_id in self.this_tree
> +        if file_id in self.other_tree:
> +            # OTHER changed the file
> +            wt = self.this_tree
> +            if wt.supports_content_filtering():
> +                # We get the path from the working tree if it exists.
> +                # That fails though when OTHER is adding a file, so
> +                # we fall back to the other tree to find the path if
> +                # it doesn't exist locally.
> +                try:
> +                    filter_tree_path = wt.id2path(file_id)
> +                except errors.NoSuchId:
> +                    filter_tree_path = self.other_tree.id2path(file_id)
> +            else:
> +                # Skip the id2path lookup for older formats
> +                filter_tree_path = None
> 
> ^- Something seems missing if we are doing this only in this code path.
> What about all the other code paths that will be creating file contents?
> How do the MergeHooks interact with content filtering?

That's a good question!  I don't know, but it's interesting that even in
trunk I only see content filtering involved in this case (OTHER is
winner) and when generating conflicts.  When both sides have modified
the file no content filtering appears to be involved, which doesn't
sound right.

So I *think* my patch doesn't make this worse, unless a plugin decides
to perform the merge for the winner == 'other' case.

[...]
> +"""Merge hook for bzr's NEWS file.
> +
> +Install this as a plugin, e.g:
> +
> +    cp contrib/news-file-merge-hook.py ~/.bazaar/plugins/news_merge.py
> +"""
> 
> ^- This is no longer true, since you changed it to "news_merge/__init__.py".
> 
> +from .parser import simple_parse
> 
> ^- ????
> 
> I'm guessing ".parser" is python 2.6 syntax, but this certainly isn't
> 2.4 syntax.

Right.  Fixed.