Merge lp:~abentley/bzr/parse-binary-diff-2.0 into lp:bzr/2.0

Proposed by Aaron Bentley
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~abentley/bzr/parse-binary-diff-2.0
Merge into: lp:bzr/2.0
Diff against target: 82 lines
4 files modified
NEWS (+1/-0)
bzrlib/patches.py (+6/-2)
bzrlib/tests/test_patches.py (+9/-0)
bzrlib/tests/test_patches_data/binary-after-normal.patch (+6/-0)
To merge this branch: bzr merge lp:~abentley/bzr/parse-binary-diff-2.0
Reviewer Review Type Date Requested Status
John A Meinel Approve
Review via email: mp+14421@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

My initial fix for binary file sections in patches turns out to be
woefully incomplete, as it only works for binary file sections that
appear at the beginning of the patch. This patch ensures that binary
file sections that appear after normal hunks also work.

In order to do this, iter_file_patch must recognize that a "binary files
differ" section indicates the beginning of a new file. The performance
impact should be negligible, since a compiled regex is used.

I am proposing this against 2.0 because it's a relatively uninvasive bugfix.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkrxkHUACgkQ0F+nu1YWqI1zGACghB5mp3zJCml707PHS7kDEF+A
wysAn3iEt/8GrStU8c9tJiaFBst7xUpZ
=qcqP
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Aaron Bentley has proposed merging lp:~abentley/bzr/parse-binary-diff-2.0 into lp:bzr/2.0.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> Hi all,
>
> My initial fix for binary file sections in patches turns out to be
> woefully incomplete, as it only works for binary file sections that
> appear at the beginning of the patch. This patch ensures that binary
> file sections that appear after normal hunks also work.
>
> In order to do this, iter_file_patch must recognize that a "binary files
> differ" section indicates the beginning of a new file. The performance
> impact should be negligible, since a compiled regex is used.
>
> I am proposing this against 2.0 because it's a relatively uninvasive bugfix.
>
> Aaron

So the main concern with this is whether you could match that exact line
in the content of a patch. However, I think the diff syntax requires
that each line inside a patch chunk starts with ' ' (single whitespace
character.)

Any line that doesn't is just ignored (I think).

Can you confirm that.

Looking at your 'binary-after-normal.patch' file, I'm not sure it is
actually correct. When I test it here, I get:

- --- baz 2009-10-14 19:49:59 +0000
+++ quxx 2009-10-14 19:51:00 +0000
@@ -1 +1 @@
- -hello
+goodbye

Binary files bar 2009-10-14 19:49:59 +0000 and qux 2009-10-14 19:50:35
+0000 differ

(A single blank line declares the end of the previous section.)

Anyway, I'm ok not relying on that blank line, but I certainly thought
it was semi-important.

 review: approve
 merge: approve

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkryBPkACgkQJdeBCYSNAAPUbQCgzQ25sEy0QKAZdrOdANan3FRa
w9kAoJUTWYGJ+UTMcTeSdcw5mAuwKuvM
=g58t
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Aaron Bentley (abentley) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks for your review.

John A Meinel wrote:
> So the main concern with this is whether you could match that exact line
> in the content of a patch. However, I think the diff syntax requires
> that each line inside a patch chunk starts with ' ' (single whitespace
> character.)

That is correct.

> Any line that doesn't is just ignored (I think).

Any line that doesn't will cause patch to abort like this:

patch: **** malformed patch at line 10: I am a line without a leading space

> Can you confirm that.

I can confirm that the line "Binary files x and y differ" cannot appear
inside a valid patch hunk. It can only occur outside patch hunks, so
the contents being diff'ed can't cause a false positive.

> When I test it here, I get:
>
> --- baz 2009-10-14 19:49:59 +0000
> +++ quxx 2009-10-14 19:51:00 +0000
> @@ -1 +1 @@
> -hello
> +goodbye
>
> Binary files bar 2009-10-14 19:49:59 +0000 and qux 2009-10-14 19:50:35
> +0000 differ
>
> (A single blank line declares the end of the previous section.)

No, the single blank line has no significance. It is not emitted by
diff, for example:

$ diff -ru old new
diff -ru old/baz new/baz
- --- old/baz 2009-11-04 23:19:29.000000000 -0500
+++ new/baz 2009-11-04 23:19:41.000000000 -0500
@@ -1 +1 @@
- -hello
+goodbye
Binary files old/tar and new/tar differ

> Anyway, I'm ok not relying on that blank line, but I certainly thought
> it was semi-important.

It's not. bzr tends to emit extra blank lines that diff itself would
not. We could consider this a bug. Patch parsing is usually pretty
loose, so these extra lines are ignored.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkryVVYACgkQ0F+nu1YWqI2j7gCfb5ZAJdWh1+//DC12eMCFqF3a
hj4An3uWtuxMt+IGOYiuenYvrNbijaw6
=ESka
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-10-30 15:58:10 +0000
3+++ NEWS 2009-11-04 14:40:23 +0000
4@@ -19,6 +19,7 @@
5
6 Bug Fixes
7 *********
8+* Improve "Binary files differ" hunk handling. (Aaron Bentley, #436325)
9
10 Improvements
11 ************
12
13=== modified file 'bzrlib/patches.py'
14--- bzrlib/patches.py 2009-10-14 22:08:45 +0000
15+++ bzrlib/patches.py 2009-11-04 14:40:23 +0000
16@@ -17,6 +17,9 @@
17 import re
18
19
20+binary_files_re = 'Binary files (.*) and (.*) differ\n'
21+
22+
23 class BinaryFiles(Exception):
24
25 def __init__(self, orig_name, mod_name):
26@@ -66,7 +69,7 @@
27 def get_patch_names(iter_lines):
28 try:
29 line = iter_lines.next()
30- match = re.match('Binary files (.*) and (.*) differ\n', line)
31+ match = re.match(binary_files_re, line)
32 if match is not None:
33 raise BinaryFiles(match.group(1), match.group(2))
34 if not line.startswith("--- "):
35@@ -350,6 +353,7 @@
36
37
38 def iter_file_patch(iter_lines):
39+ regex = re.compile(binary_files_re)
40 saved_lines = []
41 orig_range = 0
42 for line in iter_lines:
43@@ -360,7 +364,7 @@
44 elif orig_range > 0:
45 if line.startswith('-') or line.startswith(' '):
46 orig_range -= 1
47- elif line.startswith('--- '):
48+ elif line.startswith('--- ') or regex.match(line):
49 if len(saved_lines) > 0:
50 yield saved_lines
51 saved_lines = []
52
53=== modified file 'bzrlib/tests/test_patches.py'
54--- bzrlib/tests/test_patches.py 2009-10-15 15:25:20 +0000
55+++ bzrlib/tests/test_patches.py 2009-11-04 14:40:23 +0000
56@@ -156,6 +156,15 @@
57 self.assertContainsRe(str(patches[0]),
58 'Binary files bar\t.* and qux\t.* differ\n')
59
60+ def test_parse_binary_after_normal(self):
61+ patches = parse_patches(self.data_lines("binary-after-normal.patch"))
62+ self.assertIs(BinaryPatch, patches[1].__class__)
63+ self.assertIs(Patch, patches[0].__class__)
64+ self.assertContainsRe(patches[1].oldname, '^bar\t')
65+ self.assertContainsRe(patches[1].newname, '^qux\t')
66+ self.assertContainsRe(str(patches[1]),
67+ 'Binary files bar\t.* and qux\t.* differ\n')
68+
69 def test_roundtrip_binary(self):
70 patchtext = ''.join(self.data_lines("binary.patch"))
71 patches = parse_patches(patchtext.splitlines(True))
72
73=== added file 'bzrlib/tests/test_patches_data/binary-after-normal.patch'
74--- bzrlib/tests/test_patches_data/binary-after-normal.patch 1970-01-01 00:00:00 +0000
75+++ bzrlib/tests/test_patches_data/binary-after-normal.patch 2009-11-04 14:40:23 +0000
76@@ -0,0 +1,6 @@
77+--- baz 2009-10-14 19:49:59 +0000
78++++ quxx 2009-10-14 19:51:00 +0000
79+@@ -1 +1 @@
80+-hello
81++goodbye
82+Binary files bar 2009-10-14 19:49:59 +0000 and qux 2009-10-14 19:50:35 +0000 differ

Subscribers

People subscribed via source and target branches