Merge lp:~toshio/bzr/allow-dirty-patches into lp:bzr

Proposed by Toshio Kuratomi
Status: Merged
Merged at revision: not available
Proposed branch: lp:~toshio/bzr/allow-dirty-patches
Merge into: lp:bzr
Diff against target: 113 lines (+53/-8)
1 file modified
bzrlib/patches.py (+53/-8)
To merge this branch: bzr merge lp:~toshio/bzr/allow-dirty-patches
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+18854@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Toshio Kuratomi (toshio) wrote :

Handle dirty patches.

Revision history for this message
Robert Collins (lifeless) wrote :

> Handle dirty patches.

Could you enlarge on what you mean a little? Thanks

-Rob

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On Mon, 2010-02-08 at 22:09 +0000, Robert Collins wrote:
> > Handle dirty patches.
>
> Could you enlarge on what you mean a little? Thanks
This was about being able to parse patches with other data in them, such
as patches generated by git.

Cheers,

Jelmer

Revision history for this message
John A Meinel (jameinel) wrote :

This looks ok to me. I'm a bit surprised that we stop as soon as we get trailing garbage, but we allow preceding garbage. What happens with a patch like:

=== modified file 'file-1'
--- file-1 2010-01-13 21:15:55 +0000
+++ file-1 2010-02-09 19:19:46 +0000
@@ -54,17 +54,21 @@
 text
+one
-two

A comment about this next change
=== modified file 'file-2'
--- file-1 2010-01-13 21:15:55 +0000
+++ file-1 2010-02-09 19:19:46 +0000
@@ -54,17 +54,21 @@

Are they just considered separate 'patches', and thus the line is treated as bogus header data for the second patch?

How does 'patch' handle interleaved stuff?

Having thought about it more, we may want to be more relaxed overall about what we allow. I guess the original argument was that we wanted to make sure to understand the diff in a merge-proposal, so that we could validate that the diff you preview is the diff that would be applied once merged.

Revision history for this message
Martin Pool (mbp) wrote :

> How does 'patch' handle interleaved stuff?

I think unix patch just keeps reading through the whole stream, applying them one after the other.

>
> Having thought about it more, we may want to be more relaxed overall about
> what we allow. I guess the original argument was that we wanted to make sure
> to understand the diff in a merge-proposal, so that we could validate that the
> diff you preview is the diff that would be applied once merged.

I think so too. I don't see how this is necessarily a problem for merge proposals.

However, this patch is probably still an improvement in its own right, even if we want to accept trailing junk later.

It would be nice to see at least a basic test for this. Toshio, would you be able to add one? bzrlib/tests/test_patches is pretty clear so you just need to add one or two other cases.

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :

I thought I'd add a test for this as patch pilot, so I put in this one based on bug 502076:

    def test_parse_leading_noise(self):
        """Parse a valid patch header"""
        # https://bugs.edge.launchpad.net/bzr/+bug/502076
        lines = ["diff -pruN commands.py",
            "--- orig/commands.py"
            "+++ mod/dommands.py"]
        bits = parse_patch(iter(lines), allow_dirty=True)

however this fails with

Traceback (most recent call last):
  File "/home/mbp/lib/python/testtools/runtest.py", line 128, in _run_user
    return fn(*args)
  File "/home/mbp/lib/python/testtools/testcase.py", line 369, in _run_test_method
    testMethod()
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/tests/test_patches.py", line 63, in test_parse_leading_noise
    bits = parse_patch(iter(lines), allow_dirty=True)
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/patches.py", line 364, in parse_patch
    (orig_name, mod_name) = get_patch_names(iter_lines)
  File "/home/mbp/bzr/502076-dirty-patches/bzrlib/patches.py", line 76, in get_patch_names
    raise MalformedPatchHeader("No orig name", line)
MalformedPatchHeader: Malformed patch header. No orig name
'diff -pruN commands.py'

Can someone tell me what format this is supposed to be parsing, or what's wrong with that test?

review: Needs Fixing
Revision history for this message
Toshio Kuratomi (toshio) wrote :

> I thought I'd add a test for this as patch pilot, so I put in this one based
> on bug 502076:
>
> def test_parse_leading_noise(self):
> """Parse a valid patch header"""
> # https://bugs.edge.launchpad.net/bzr/+bug/502076
> lines = ["diff -pruN commands.py",
> "--- orig/commands.py"
> "+++ mod/dommands.py"]
> bits = parse_patch(iter(lines), allow_dirty=True)
>
> however this fails with

> Can someone tell me what format this is supposed to be parsing, or what's
> wrong with that test?

Thanks for the test -- there's two problems with this patch:
1) Call parse_patches() instead of parse_patch():
   bits = parse_patch(iter(lines), allow_dirty=True)

2) You need a comma in your header lines list:
             "--- orig/commands.py",

Revision history for this message
Martin Pool (mbp) wrote :

> > Can someone tell me what format this is supposed to be parsing, or what's
> > wrong with that test?
>
> Thanks for the test -- there's two problems with this patch:
> 1) Call parse_patches() instead of parse_patch():
> bits = parse_patch(iter(lines), allow_dirty=True)
>
> 2) You need a comma in your header lines list:
> "--- orig/commands.py",

Thanks for that. With those changes, this works.

Is it really intended and reasonable that parse_patch (rather than parse_patches) takes allow_dirty, but doesn't accept garbage at the start? If so, we're ok to merge. (Or we could even handle the rest as a followon.)

Revision history for this message
Martin Pool (mbp) wrote :

I'll submit this with those changes.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/patches.py'
--- bzrlib/patches.py 2009-11-03 15:45:56 +0000
+++ bzrlib/patches.py 2010-02-08 17:05:27 +0000
@@ -250,7 +250,13 @@
250 return shift250 return shift
251251
252252
253def iter_hunks(iter_lines):253def iter_hunks(iter_lines, allow_dirty=False):
254 '''
255 :arg iter_lines: iterable of lines to parse for hunks
256 :kwarg allow_dirty: If True, when we encounter something that is not
257 a hunk header when we're looking for one, assume the rest of the lines
258 are not part of the patch (comments or other junk). Default False
259 '''
254 hunk = None260 hunk = None
255 for line in iter_lines:261 for line in iter_lines:
256 if line == "\n":262 if line == "\n":
@@ -260,7 +266,15 @@
260 continue266 continue
261 if hunk is not None:267 if hunk is not None:
262 yield hunk268 yield hunk
263 hunk = hunk_from_header(line)269 try:
270 hunk = hunk_from_header(line)
271 except MalformedHunkHeader:
272 if allow_dirty:
273 # If the line isn't a hunk header, then we've reached the end
274 # of this patch and there's "junk" at the end. Ignore the
275 # rest of this patch.
276 return
277 raise
264 orig_size = 0278 orig_size = 0
265 mod_size = 0279 mod_size = 0
266 while orig_size < hunk.orig_range or mod_size < hunk.mod_range:280 while orig_size < hunk.orig_range or mod_size < hunk.mod_range:
@@ -339,7 +353,12 @@
339 pos += 1353 pos += 1
340354
341355
342def parse_patch(iter_lines):356def parse_patch(iter_lines, allow_dirty=False):
357 '''
358 :arg iter_lines: iterable of lines to parse
359 :kwarg allow_dirty: If True, allow the patch to have trailing junk.
360 Default False
361 '''
343 iter_lines = iter_lines_handle_nl(iter_lines)362 iter_lines = iter_lines_handle_nl(iter_lines)
344 try:363 try:
345 (orig_name, mod_name) = get_patch_names(iter_lines)364 (orig_name, mod_name) = get_patch_names(iter_lines)
@@ -347,15 +366,29 @@
347 return BinaryPatch(e.orig_name, e.mod_name)366 return BinaryPatch(e.orig_name, e.mod_name)
348 else:367 else:
349 patch = Patch(orig_name, mod_name)368 patch = Patch(orig_name, mod_name)
350 for hunk in iter_hunks(iter_lines):369 for hunk in iter_hunks(iter_lines, allow_dirty):
351 patch.hunks.append(hunk)370 patch.hunks.append(hunk)
352 return patch371 return patch
353372
354373
355def iter_file_patch(iter_lines):374def iter_file_patch(iter_lines, allow_dirty=False):
375 '''
376 :arg iter_lines: iterable of lines to parse for patches
377 :kwarg allow_dirty: If True, allow comments and other non-patch text
378 before the first patch. Note that the algorithm here can only find
379 such text before any patches have been found. Comments after the
380 first patch are stripped away in iter_hunks() if it is also passed
381 allow_dirty=True. Default False.
382 '''
383 ### FIXME: Docstring is not quite true. We allow certain comments no
384 # matter what, If they startwith '===', '***', or '#' Someone should
385 # reexamine this logic and decide if we should include those in
386 # allow_dirty or restrict those to only being before the patch is found
387 # (as allow_dirty does).
356 regex = re.compile(binary_files_re)388 regex = re.compile(binary_files_re)
357 saved_lines = []389 saved_lines = []
358 orig_range = 0390 orig_range = 0
391 beginning = True
359 for line in iter_lines:392 for line in iter_lines:
360 if line.startswith('=== ') or line.startswith('*** '):393 if line.startswith('=== ') or line.startswith('*** '):
361 continue394 continue
@@ -365,7 +398,12 @@
365 if line.startswith('-') or line.startswith(' '):398 if line.startswith('-') or line.startswith(' '):
366 orig_range -= 1399 orig_range -= 1
367 elif line.startswith('--- ') or regex.match(line):400 elif line.startswith('--- ') or regex.match(line):
368 if len(saved_lines) > 0:401 if allow_dirty and beginning:
402 # Patches can have "junk" at the beginning
403 # Stripping junk from the end of patches is handled when we
404 # parse the patch
405 beginning = False
406 elif len(saved_lines) > 0:
369 yield saved_lines407 yield saved_lines
370 saved_lines = []408 saved_lines = []
371 elif line.startswith('@@'):409 elif line.startswith('@@'):
@@ -397,8 +435,15 @@
397 yield last_line435 yield last_line
398436
399437
400def parse_patches(iter_lines):438def parse_patches(iter_lines, allow_dirty=False):
401 return [parse_patch(f.__iter__()) for f in iter_file_patch(iter_lines)]439 '''
440 :arg iter_lines: iterable of lines to parse for patches
441 :kwarg allow_dirty: If True, allow text that's not part of the patch at
442 selected places. This includes comments before and after a patch
443 for instance. Default False.
444 '''
445 return [parse_patch(f.__iter__(), allow_dirty) for f in
446 iter_file_patch(iter_lines, allow_dirty)]
402447
403448
404def difference_index(atext, btext):449def difference_index(atext, btext):