Merge lp:~toshio/bzr/allow-dirty-patches into lp:bzr

Proposed by Toshio Kuratomi
Status: Merged
Merged at revision: not available
Proposed branch: lp:~toshio/bzr/allow-dirty-patches
Merge into: lp:bzr
Diff against target: 113 lines (+53/-8)
1 file modified
bzrlib/patches.py (+53/-8)
To merge this branch: bzr merge lp:~toshio/bzr/allow-dirty-patches
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+18854@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Toshio Kuratomi (toshio) wrote :

Handle dirty patches.

Revision history for this message
Robert Collins (lifeless) wrote :

> Handle dirty patches.

Could you enlarge on what you mean a little? Thanks

-Rob

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On Mon, 2010-02-08 at 22:09 +0000, Robert Collins wrote:
> > Handle dirty patches.
>
> Could you enlarge on what you mean a little? Thanks
This was about being able to parse patches with other data in them, such
as patches generated by git.

Cheers,

Jelmer

Revision history for this message
John A Meinel (jameinel) wrote :

This looks ok to me. I'm a bit surprised that we stop as soon as we get trailing garbage, but we allow preceding garbage. What happens with a patch like:

=== modified file 'file-1'
--- file-1 2010-01-13 21:15:55 +0000
+++ file-1 2010-02-09 19:19:46 +0000
@@ -54,17 +54,21 @@
 text
+one
-two

A comment about this next change
=== modified file 'file-2'
--- file-1 2010-01-13 21:15:55 +0000
+++ file-1 2010-02-09 19:19:46 +0000
@@ -54,17 +54,21 @@

Are they just considered separate 'patches', and thus the line is treated as bogus header data for the second patch?

How does 'patch' handle interleaved stuff?

Having thought about it more, we may want to be more relaxed overall about what we allow. I guess the original argument was that we wanted to make sure to understand the diff in a merge-proposal, so that we could validate that the diff you preview is the diff that would be applied once merged.

Revision history for this message
Martin Pool (mbp) wrote :

> How does 'patch' handle interleaved stuff?

I think unix patch just keeps reading through the whole stream, applying them one after the other.

>
> Having thought about it more, we may want to be more relaxed overall about
> what we allow. I guess the original argument was that we wanted to make sure
> to understand the diff in a merge-proposal, so that we could validate that the
> diff you preview is the diff that would be applied once merged.

I think so too. I don't see how this is necessarily a problem for merge proposals.

However, this patch is probably still an improvement in its own right, even if we want to accept trailing junk later.

It would be nice to see at least a basic test for this. Toshio, would you be able to add one? bzrlib/tests/test_patches is pretty clear so you just need to add one or two other cases.

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :

I thought I'd add a test for this as patch pilot, so I put in this one based on bug 502076:

    def test_parse_leading_noise(self):
        """Parse a valid patch header"""
        # https://bugs.edge.launchpad.net/bzr/+bug/502076
        lines = ["diff -pruN commands.py",
            "--- orig/commands.py"
            "+++ mod/dommands.py"]
        bits = parse_patch(iter(lines), allow_dirty=True)

however this fails with

Traceback (most recent call last):
  File "/home/mbp/lib/python/testtools/runtest.py", line 128, in _run_user
    return fn(*args)
  File "/home/mbp/lib/python/testtools/testcase.py", line 369, in _run_test_method
    testMethod()
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/tests/test_patches.py", line 63, in test_parse_leading_noise
    bits = parse_patch(iter(lines), allow_dirty=True)
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/patches.py", line 364, in parse_patch
    (orig_name, mod_name) = get_patch_names(iter_lines)
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/patches.py", line 76, in get_patch_names
    raise MalformedPatchHeader("No orig name", line)
MalformedPatchHeader: Malformed patch header. No orig name
'diff -pruN commands.py'

Can someone tell me what format this is supposed to be parsing, or what's wrong with that test?

review: Needs Fixing
Revision history for this message
Toshio Kuratomi (toshio) wrote :

> I thought I'd add a test for this as patch pilot, so I put in this one based
> on bug 502076:
>
> def test_parse_leading_noise(self):
> """Parse a valid patch header"""
> # https://bugs.edge.launchpad.net/bzr/+bug/502076
> lines = ["diff -pruN commands.py",
> "--- orig/commands.py"
> "+++ mod/dommands.py"]
> bits = parse_patch(iter(lines), allow_dirty=True)
>
> however this fails with

> Can someone tell me what format this is supposed to be parsing, or what's
> wrong with that test?

Thanks for the test -- there's two problems with this patch:
1) Call parse_patches() instead of parse_patch():
   bits = parse_patch(iter(lines), allow_dirty=True)

2) You need a comma in your header lines list:
             "--- orig/commands.py",

Revision history for this message
Martin Pool (mbp) wrote :

> > Can someone tell me what format this is supposed to be parsing, or what's
> > wrong with that test?
>
> Thanks for the test -- there's two problems with this patch:
> 1) Call parse_patches() instead of parse_patch():
> bits = parse_patch(iter(lines), allow_dirty=True)
>
> 2) You need a comma in your header lines list:
> "--- orig/commands.py",

Thanks for that. With those changes, this works.

Is it really intended and reasonable that parse_patch (rather than parse_patches) takes allow_dirty, but doesn't accept garbage at the start? If so, we're ok to merge. (Or we could even handle the rest as a followon.)

Revision history for this message
Martin Pool (mbp) wrote :

I'll submit this with those changes.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/patches.py'
2--- bzrlib/patches.py 2009-11-03 15:45:56 +0000
3+++ bzrlib/patches.py 2010-02-08 17:05:27 +0000
4@@ -250,7 +250,13 @@
5 return shift
6
7
8-def iter_hunks(iter_lines):
9+def iter_hunks(iter_lines, allow_dirty=False):
10+ '''
11+ :arg iter_lines: iterable of lines to parse for hunks
12+ :kwarg allow_dirty: If True, when we encounter something that is not
13+ a hunk header when we're looking for one, assume the rest of the lines
14+ are not part of the patch (comments or other junk). Default False
15+ '''
16 hunk = None
17 for line in iter_lines:
18 if line == "\n":
19@@ -260,7 +266,15 @@
20 continue
21 if hunk is not None:
22 yield hunk
23- hunk = hunk_from_header(line)
24+ try:
25+ hunk = hunk_from_header(line)
26+ except MalformedHunkHeader:
27+ if allow_dirty:
28+ # If the line isn't a hunk header, then we've reached the end
29+ # of this patch and there's "junk" at the end. Ignore the
30+ # rest of this patch.
31+ return
32+ raise
33 orig_size = 0
34 mod_size = 0
35 while orig_size < hunk.orig_range or mod_size < hunk.mod_range:
36@@ -339,7 +353,12 @@
37 pos += 1
38
39
40-def parse_patch(iter_lines):
41+def parse_patch(iter_lines, allow_dirty=False):
42+ '''
43+ :arg iter_lines: iterable of lines to parse
44+ :kwarg allow_dirty: If True, allow the patch to have trailing junk.
45+ Default False
46+ '''
47 iter_lines = iter_lines_handle_nl(iter_lines)
48 try:
49 (orig_name, mod_name) = get_patch_names(iter_lines)
50@@ -347,15 +366,29 @@
51 return BinaryPatch(e.orig_name, e.mod_name)
52 else:
53 patch = Patch(orig_name, mod_name)
54- for hunk in iter_hunks(iter_lines):
55+ for hunk in iter_hunks(iter_lines, allow_dirty):
56 patch.hunks.append(hunk)
57 return patch
58
59
60-def iter_file_patch(iter_lines):
61+def iter_file_patch(iter_lines, allow_dirty=False):
62+ '''
63+ :arg iter_lines: iterable of lines to parse for patches
64+ :kwarg allow_dirty: If True, allow comments and other non-patch text
65+ before the first patch. Note that the algorithm here can only find
66+ such text before any patches have been found. Comments after the
67+ first patch are stripped away in iter_hunks() if it is also passed
68+ allow_dirty=True. Default False.
69+ '''
70+ ### FIXME: Docstring is not quite true. We allow certain comments no
71+ # matter what, If they startwith '===', '***', or '#' Someone should
72+ # reexamine this logic and decide if we should include those in
73+ # allow_dirty or restrict those to only being before the patch is found
74+ # (as allow_dirty does).
75 regex = re.compile(binary_files_re)
76 saved_lines = []
77 orig_range = 0
78+ beginning = True
79 for line in iter_lines:
80 if line.startswith('=== ') or line.startswith('*** '):
81 continue
82@@ -365,7 +398,12 @@
83 if line.startswith('-') or line.startswith(' '):
84 orig_range -= 1
85 elif line.startswith('--- ') or regex.match(line):
86- if len(saved_lines) > 0:
87+ if allow_dirty and beginning:
88+ # Patches can have "junk" at the beginning
89+ # Stripping junk from the end of patches is handled when we
90+ # parse the patch
91+ beginning = False
92+ elif len(saved_lines) > 0:
93 yield saved_lines
94 saved_lines = []
95 elif line.startswith('@@'):
96@@ -397,8 +435,15 @@
97 yield last_line
98
99
100-def parse_patches(iter_lines):
101- return [parse_patch(f.__iter__()) for f in iter_file_patch(iter_lines)]
102+def parse_patches(iter_lines, allow_dirty=False):
103+ '''
104+ :arg iter_lines: iterable of lines to parse for patches
105+ :kwarg allow_dirty: If True, allow text that's not part of the patch at
106+ selected places. This includes comments before and after a patch
107+ for instance. Default False.
108+ '''
109+ return [parse_patch(f.__iter__(), allow_dirty) for f in
110+ iter_file_patch(iter_lines, allow_dirty)]
111
112
113 def difference_index(atext, btext):