Merge lp:~songofacandy/bzr/fix-523746-dev into lp:bzr

Proposed by methane
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 5892
Proposed branch: lp:~songofacandy/bzr/fix-523746-dev
Merge into: lp:bzr
Diff against target: 156 lines (+98/-12)
2 files modified
bzrlib/diff.py (+51/-11)
bzrlib/tests/test_diff.py (+47/-1)
To merge this branch: bzr merge lp:~songofacandy/bzr/fix-523746-dev
Reviewer Review Type Date Requested Status
John A Meinel Approve
Vincent Ladeuil Abstain
Alexander Belchenko Pending
Gary van der Merwe Pending
Martin Packman Pending
bzr-core Pending
Review via email: mp+20537@code.launchpad.net

Commit message

better handling of subprocesses with non-ascii encodings and filenames (bug 523746)

To post a comment you must log in.
Revision history for this message
methane (songofacandy) wrote :
Revision history for this message
Vincent Ladeuil (vila) wrote :

@Naoki, can you please use more meaningful branch names
(see Making a Merge Proposal in doc/developers/HACKING.txt) to ease Patch Pilot work ? :-)

Also, it helps a lot if you put an initial comment (we sometimes call it the cover letter)
when you start the merge proposal explaining the intent of the patch.

If you've fixed a bug, you desserve credit for that, so please add a NEWS entry too :)

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

Oh, silly me, I mis-read your initial comment as pointing to itself,
where in fact, you just did what I asked in the *other* mp.

Anyway, this illustrates why I find sparse branch names unhelpful, I just didn't
recognize it :)

In fact you made the comments I was waiting for on the other mp:

> This problem depends on OS, locale settings and other environment.
> So I agree with you that this patch should land to bzr.dev and be tested on
> vary environment while bzr 2.2 is released.

Fine, I'll mark the other mp rejected.

So, test against variations on locale settings exist in bzrlib/tests/test_non_ascii.py
and bzrlib/tests/blackbox/test_non_ascii.py

Ask for help if you don't find the existing tests enough to write your own.

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

@Naoki, I've added tests in lp:~vila/bzr/523746-difftool-file-names and doing so I think I found a simpler fix, could you test it and give feedback ?

review: Needs Information
Revision history for this message
methane (songofacandy) wrote :

This problem contains many situation.
I think your fix doesn't work sometimes because:

1. osutils.get_user_encoding() is not good in rare case.
user_encoding is not equals to filesystem encoding.

2. path.encode(enc) can raise UnicodeEncodeError.
So bzr should use "encode(enc, 'replace')"

3. _write_file() writes file with unicode path.
But if bzr can't encode the path collectly, external command can't
open the file.
So bzr should encode the path before _write_file().

To test these situation:

* Use fixed encoding in the test.
* Use filenames that contains chars that can't encode with the encoding.
* Replace subprocess.call and check arguments.

--
INADA Naoki <email address hidden>

Revision history for this message
Vincent Ladeuil (vila) wrote :

> This problem contains many situation.
> I think your fix doesn't work

That's why I wanted you to test it :) Did you ? If yes, did it fail and how ?

>sometimes because:
>
> 1. osutils.get_user_encoding() is not good in rare case.
> user_encoding is not equals to filesystem encoding.

My assumption was that subprocess.Popen should receive its arguments as if
they were coming from the command line: in user encoding.

>
> 2. path.encode(enc) can raise UnicodeEncodeError.
> So bzr should use "encode(enc, 'replace')"

That will mean some path can't be encoded in user encoding which in turn means
it could not be user as a parameter to bzr no ?

>
> 3. _write_file() writes file with unicode path.
> But if bzr can't encode the path collectly, external command can't
> open the file.

We rely on python to do the right thing from unicode paths there.

> So bzr should encode the path before _write_file().
>
> To test these situation:
>
> * Use fixed encoding in the test.

That's what the tests are doing for the user encoding.

> * Use filenames that contains chars that can't encode with the encoding.

Hmm, AFAIR, when we encounter problems there it's because some setting is wrong in the environment. As mentioned below, I couldn't automate the tests enough
to reach that point, hence my call for real-life tests :)

> * Replace subprocess.call and check arguments.

I don't want to do that, if the problem is in subprocess I will not be able to detect it.

That's the main point I dislike with the current tests, they don't exercise
subprocess.Popen, but I don't have an external diff tool I can call available
on all platforms.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> We rely on python to do the right thing from unicode paths there.
i.e. encode using 'mbcs' as returned by getsystemfilecoding() on windows
and whatever is appropriate on other systems.

Revision history for this message
methane (songofacandy) wrote :

>> 1. osutils.get_user_encoding() is not good in rare case.
>> user_encoding is not equals to filesystem encoding.
>
>My assumption was that subprocess.Popen should receive its arguments as if
>they were coming from the command line: in user encoding.

OK, I don't know about when get_user_encoding() returns encoding that is not
suitable for commandline arguments.

>> 2. path.encode(enc) can raise UnicodeEncodeError.
>> So bzr should use "encode(enc, 'replace')"
>
>That will mean some path can't be encoded in user encoding which in turn means
>it could not be user as a parameter to bzr no ?

No. bzr uses unicode arguments in Windows. So users can pass unicode filename to bzr.
And "bzr diff --using=winmergeu.exe" doesn't requires file path but shows diff on
such a file.

>> 3. _write_file() writes file with unicode path.
>> But if bzr can't encode the path collectly, external command can't
>> open the file.
>
> We rely on python to do the right thing from unicode paths there.

On windows, unicode path is too perfect.
If unicode path contains chars that is not be able to encode,
we can't use the path in encoded string.
But we can create a file with such a name when using unicode.
So we should replace such a char before _write_file().

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "Naoki" == INADA Naoki <email address hidden> writes:

    >>> 1. osutils.get_user_encoding() is not good in rare case.
    >>> user_encoding is not equals to filesystem encoding.
    >>
    >> My assumption was that subprocess.Popen should receive its arguments as if
    >> they were coming from the command line: in user encoding.

    Naoki> OK, I don't know about when get_user_encoding() returns encoding that is not
    Naoki> suitable for commandline arguments.

Good.

    >>> 2. path.encode(enc) can raise UnicodeEncodeError.
    >>> So bzr should use "encode(enc, 'replace')"
    >>
    >> That will mean some path can't be encoded in user encoding which
    >> in turn means it could not be user as a parameter to bzr no ?

    Naoki> No. bzr uses unicode arguments in Windows.

No. bzr uses unicode paths internally, they are coming from the command
line and decoded with user encoding or from the bzr db where they are
encoding into utf-8.

    Naoki> So users can pass unicode filename to bzr.

Encoded appropriately yes. They are decoded into unicode and encoded by
bzr or by python when access to the filesystem is needed at which point
the file system encoding is used (which may be different from the user
encoding).

    Naoki> And "bzr diff --using=winmergeu.exe" doesn't requires file
    Naoki> path but shows diff on such a file.

Sure and it may decode them using the *terminal* encoding for the
content of the diff but DiffFromTool has a parameter 'path_encoding' to
control that.

Note that this is still not the encoding we need to use to create any
needed file though.

    >>> 3. _write_file() writes file with unicode path.
    >>> But if bzr can't encode the path collectly, external command can't
    >>> open the file.
    >>
    >> We rely on python to do the right thing from unicode paths there.

    Naoki> On windows, unicode path is too perfect.
    Naoki> If unicode path contains chars that is not be able to encode,
    Naoki> we can't use the path in encoded string.

If we can't use it here, we can't use it either when creating the
working tree, so I'll tend to ignore the problem here.

    Naoki> But we can create a file with such a name when using unicode.
    Naoki> So we should replace such a char before _write_file().

I don't think so.

Can we chat about that on IRC or can you provide me a file name that
breaks my patch (preferably as a python string using the \xnnnn
notation) ?

Revision history for this message
Vincent Ladeuil (vila) wrote :

So it appears that I lack the windows knowledge (and the ability to tests)
to effectively help Naoki here.

The consensus seems to be that the problem is now to either:
- make subprocess.Popen accept unicode arguments (Naoki as a monkey-patch for that
  but I find it a bit scary),
- bypass subprocess.Popen and use a direct win32 API (I'm out of my league on
  this one)

John ? Alexander ? Can you give it a try ?

review: Abstain
Revision history for this message
Vincent Ladeuil (vila) wrote :

> John ? Alexander ? Can you give it a try ?

Or may be Gary since you're working in this area right now and have some way of testing it on windows ?

Revision history for this message
Alexander Belchenko (bialix) wrote :

> So it appears that I lack the windows knowledge (and the ability to tests)
> to effectively help Naoki here.
>
> The consensus seems to be that the problem is now to either:
> - make subprocess.Popen accept unicode arguments (Naoki as a monkey-patch for
> that
> but I find it a bit scary),
> - bypass subprocess.Popen and use a direct win32 API (I'm out of my league on
> this one)
>
> John ? Alexander ? Can you give it a try ?

Sorry, I'm a bit busy these days, have lost this ping. I can't promise I will look into it soon. But I'll try.

Revision history for this message
Gary van der Merwe (garyvdm) wrote :

Vincent Ladeuil wrote on 2010-04-21 :
> That's the main point I dislike with the current tests, they don't exercise
> subprocess.Popen, but I don't have an external diff tool I can call available
> on all platforms.

I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names

Those tests fail in wine for both Vincent's and INADA's branches.

- new_abs_path = self._prepare_files(file_id, old_path, new_path,
- allow_write_new=True,
- force_temp=True)[1]
- command = self._get_command(osutils.pathjoin('old', old_path),
- osutils.pathjoin('new', new_path))
+ old_abs_path, new_abs_path = self._prepare_files(
+ file_id, old_path, new_path,
+ allow_write_new=True,
+ force_temp=True)
+ command = self._get_command(old_abs_path, new_abs_path)

This is a nice simple change. Maybe we should merge this straight away, separately from the other changes.

- new_file = open(new_abs_path, 'r')
+ new_file = open(new_abs_path, 'rb')

Why is this changed?

Revision history for this message
methane (songofacandy) wrote :

> Vincent Ladeuil wrote on 2010-04-21 :
> > That's the main point I dislike with the current tests, they don't exercise
> > subprocess.Popen, but I don't have an external diff tool I can call
> available
> > on all platforms.
>
> I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names
>
> Those tests fail in wine for both Vincent's and INADA's branches.

I'll see the branch later.

> - new_abs_path = self._prepare_files(file_id, old_path, new_path,
> - allow_write_new=True,
> - force_temp=True)[1]
> - command = self._get_command(osutils.pathjoin('old', old_path),
> - osutils.pathjoin('new', new_path))
> + old_abs_path, new_abs_path = self._prepare_files(
> + file_id, old_path, new_path,
> + allow_write_new=True,
> + force_temp=True)
> + command = self._get_command(old_abs_path, new_abs_path)
>
> This is a nice simple change. Maybe we should merge this straight away,
> separately from the other changes.

I think so.

> - new_file = open(new_abs_path, 'r')
> + new_file = open(new_abs_path, 'rb')
>
> Why is this changed?

This change is made because I think bzr should use 'binary' mode in this point.
External diff tool may have edit feature and user may fix line ending. So bzr should keep
EOL bytes.

But, this change is not relating to #523746.

Revision history for this message
methane (songofacandy) wrote :

I'm ashamed to leave this so long.

> > Vincent Ladeuil wrote on 2010-04-21 :
> > > That's the main point I dislike with the current tests, they don't
> exercise
> > > subprocess.Popen, but I don't have an external diff tool I can call
> > available
> > > on all platforms.
> >
> > I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names
> >
> > Those tests fail in wine for both Vincent's and INADA's branches.
>
> I'll see the branch later.

Your test fails because my branch changes filename for it safely write
to the filename and pass the filename to subprocess.

I split filename creation from DiffFromTool._write_file and wrote test for it.
Please see it.

I've wrote test executing DiffFromTool but it doesn't tests real problem
because overriding sys.getfilesystemencoding() doesn't changes
Py_FileSystemDefaultEncoding in CPython core.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Thanks for coming back songofacandy (If you explain me one last time which of INADA or Naoki I should use, I swear I won't forget anymore :-}) !

Once again, I'd like to defer to people with a better windows knowledge than myself in this specific area, I'll nominate explicit reviewers, please, one of you, review ! :) Or if you can't just abstain ;)

Revision history for this message
Vincent Ladeuil (vila) wrote :

My understanding was that we couldn't arbitrarily try to encode/decode both paths and other arguments. I may be right in general and wrong in this particular case where mbcs may be the silver bullet. That's why I'm deferring to people knowing better than me.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.2 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/06/2011 03:29 AM, INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>
> Requested reviews:
> Vincent Ladeuil (vila)
> bzr-core (bzr-core)
> Related bugs:
> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> https://bugs.launchpad.net/bzr/+bug/523746
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537

+ def _safe_filename(self, prefix, relpath):
+ if sys.platform == 'win32':
+ fenc = 'mbcs'
+ else:
+ fenc = sys.getfilesystemencoding() or 'ascii'
+ # encoded_str.replace('?', '_') may break multibyte char.
+ # So we should encode, decode, then replace(u'?', u'_')
+ relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
'replace')
+ relpath_tmp = relpath_tmp.replace(u'?', u'_')
+ return osutils.pathjoin(self._root, prefix, relpath_tmp)

Why are we using "encode('replace')" rather than "encode('ignore')". It
doesn't seem to buy us much to be putting "_" into the filename, rather
than just writing out whatever letters we can.

I also don't understand why you do the encode/decode dance and the put a
munged Unicode string back into the path. I think I sort of understand,
but a docstring would probably help a lot at understanding why we need
to use this function.

...

+ command = [AtTemplate(t).substitute(my_map) for t in
+ self.command_template]
+ if sys.platform == 'win32': # Popen doesn't accept unicode on win32
+ command_encoded = []
+ for c in command:
+ if isinstance(c, unicode):
+ command_encoded.append(c.encode('mbcs'))
+ else:
+ command_encoded.append(c)
+ return command_encoded
+ else:
+ return command

^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
return '?' for unusable characters. For example, I can create Arabic or
Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
is only able to handle ~latin-1. Any thoughts on how to handle that
cleanly? We don't have to change Popen to support Unicode (though that
would be ideal), but at least something that would give users helpful
feedback, rather than just tracebacks.

+ def test_encodable_filename(self):
+ import sys
+ diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
+ None, None, None)

On *my* system, there is no 'python' (I never add it to sys.path, as I
happen to have 2.4/5/6/7 all installed.) There is 'sys.executable', though.

diffobj = diff.DiffFromTool([sys.executable, ...])

Also, we probably want to skip this test if "sys.frozen" is set, since I
don't think we can run it from 'bzr.exe'. I think we already had a
feature for that.

John
=:->

 review: needsfixing

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3EA3oACgkQJdeBCYSNAAMYNgCeJIIEWRF1Dg/OLWQ5kRDCI6EP
W0cAoID...

Read more...

review: Needs Fixing
Revision history for this message
methane (songofacandy) wrote :
Download full text (3.6 KiB)

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/06/2011 03:29 AM, INADA Naoki wrote:
> > INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into
> lp:bzr.
> >
> > Requested reviews:
> > Vincent Ladeuil (vila)
> > bzr-core (bzr-core)
> > Related bugs:
> > Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> > https://bugs.launchpad.net/bzr/+bug/523746
> >
> > For more details, see:
> > https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
>
>
> + def _safe_filename(self, prefix, relpath):
> + if sys.platform == 'win32':
> + fenc = 'mbcs'
> + else:
> + fenc = sys.getfilesystemencoding() or 'ascii'
> + # encoded_str.replace('?', '_') may break multibyte char.
> + # So we should encode, decode, then replace(u'?', u'_')
> + relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
> 'replace')
> + relpath_tmp = relpath_tmp.replace(u'?', u'_')
> + return osutils.pathjoin(self._root, prefix, relpath_tmp)
>
> Why are we using "encode('replace')" rather than "encode('ignore')". It
> doesn't seem to buy us much to be putting "_" into the filename, rather
> than just writing out whatever letters we can.

Because "encode('ignore')" may makes null string.
Fallback to other string is also OK, but I don't have nice name for fallback.

> I also don't understand why you do the encode/decode dance and the put a
> munged Unicode string back into the path. I think I sort of understand,
> but a docstring would probably help a lot at understanding why we need
> to use this function.

OK, I'll add docstrings.

>
> ...
>
> + command = [AtTemplate(t).substitute(my_map) for t in
> + self.command_template]
> + if sys.platform == 'win32': # Popen doesn't accept unicode on win32
> + command_encoded = []
> + for c in command:
> + if isinstance(c, unicode):
> + command_encoded.append(c.encode('mbcs'))
> + else:
> + command_encoded.append(c)
> + return command_encoded
> + else:
> + return command
>
> ^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
> return '?' for unusable characters. For example, I can create Arabic or
> Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
> is only able to handle ~latin-1. Any thoughts on how to handle that
> cleanly? We don't have to change Popen to support Unicode (though that
> would be ideal), but at least something that would give users helpful
> feedback, rather than just tracebacks.

c.encode('mbcs') may fail on very rare circumstance. For example, path of
diff tool or temporary directory is not be able to encode with 'mbcs'.
I don't have any idea what should we do on that circumstance.

But filepath on commandline should not be fail decoding unlsess
decode('mbcs').encode('mbcs') can be fail.

> + def test_encodable_filename(self):
> + import sys
> + diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
> + None, None, No...

Read more...

Revision history for this message
methane (songofacandy) wrote :

Done

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/6/2011 8:28 PM, INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>
> Requested reviews:
> John A Meinel (jameinel)
> Alexander Belchenko (bialix)
> Gary van der Merwe (garyvdm)
> Martin [gz] (gz)
> Vincent Ladeuil (vila)
> bzr-core (bzr-core)
> Related bugs:
> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> https://bugs.launchpad.net/bzr/+bug/523746
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537

Can you give a little bit more background in your merge proposal? I know
there is a bug linked, but describing what you are doing makes it easier
to understand the diff.

 review: needsinfo

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3Liw0ACgkQJdeBCYSNAAOxzACguDKS9ztLjW6NJ54ii0fWIffL
dgkAoJ8kAv8ws8Atxw+O4d4o3ke54TMa
=u/ie
-----END PGP SIGNATURE-----

review: Needs Information
Revision history for this message
methane (songofacandy) wrote :

OK.

1. On Windows.
Some diff application doesn't receive command line argument as a UTF-16 string
but as a string encoded with current codepage.
In this situation, passing Unicode argument to child process may fail
when the file path
contains character that is not be able to encode with current codepage.
So, we should use only chars those are able to encode with current codepage as a
temporary file.

2. On Unix.
All application may receive commandline argument as just bytes. So we can pass
any path as a argument.
But, GUI diff tool may be not able to show such filepath. So we should
use string
that is able to encode with current locale as a temporary file, too.

My fix removes chars those are not able to encode with current codepage/locale.
And force to use temporary file when checkouted file contains such chars as a
path.

On Thu, May 12, 2011 at 4:23 PM, John Arbash Meinel
<email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 5/6/2011 8:28 PM, INADA Naoki wrote:
>> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>>
>> Requested reviews:
>>   John A Meinel (jameinel)
>>   Alexander Belchenko (bialix)
>>   Gary van der Merwe (garyvdm)
>>   Martin [gz] (gz)
>>   Vincent Ladeuil (vila)
>>   bzr-core (bzr-core)
>> Related bugs:
>>   Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
>>   https://bugs.launchpad.net/bzr/+bug/523746
>>
>> For more details, see:
>> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
>
> Can you give a little bit more background in your merge proposal? I know
> there is a bug linked, but describing what you are doing makes it easier
> to understand the diff.
>
>  review: needsinfo
>
> John
> =:->
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk3Liw0ACgkQJdeBCYSNAAOxzACguDKS9ztLjW6NJ54ii0fWIffL
> dgkAoJ8kAv8ws8Atxw+O4d4o3ke54TMa
> =u/ie
> -----END PGP SIGNATURE-----
>

--
INADA Naoki  <email address hidden>

Revision history for this message
John A Meinel (jameinel) wrote :

Seems fine to me. I don't really like the encode().decode() stuff, but I'll live with it.

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/diff.py'
--- bzrlib/diff.py 2011-04-07 10:36:24 +0000
+++ bzrlib/diff.py 2011-05-06 17:43:35 +0000
@@ -745,8 +745,18 @@
745745
746 def _get_command(self, old_path, new_path):746 def _get_command(self, old_path, new_path):
747 my_map = {'old_path': old_path, 'new_path': new_path}747 my_map = {'old_path': old_path, 'new_path': new_path}
748 return [AtTemplate(t).substitute(my_map) for t in748 command = [AtTemplate(t).substitute(my_map) for t in
749 self.command_template]749 self.command_template]
750 if sys.platform == 'win32': # Popen doesn't accept unicode on win32
751 command_encoded = []
752 for c in command:
753 if isinstance(c, unicode):
754 command_encoded.append(c.encode('mbcs'))
755 else:
756 command_encoded.append(c)
757 return command_encoded
758 else:
759 return command
750760
751 def _execute(self, old_path, new_path):761 def _execute(self, old_path, new_path):
752 command = self._get_command(old_path, new_path)762 command = self._get_command(old_path, new_path)
@@ -772,12 +782,42 @@
772 raise782 raise
773 return True783 return True
774784
785 @staticmethod
786 def _fenc():
787 """Returns safe encoding for passing file path to diff tool"""
788 if sys.platform == 'win32':
789 return 'mbcs'
790 else:
791 # Don't fallback to 'utf-8' because subprocess may not be able to
792 # handle utf-8 correctly when locale is not utf-8.
793 return sys.getfilesystemencoding() or 'ascii'
794
795 def _is_safepath(self, path):
796 """Return true if `path` may be able to pass to subprocess."""
797 fenc = self._fenc()
798 try:
799 return path == path.encode(fenc).decode(fenc)
800 except UnicodeError:
801 return False
802
803 def _safe_filename(self, prefix, relpath):
804 """Replace unsafe character in `relpath` then join `self._root`,
805 `prefix` and `relpath`."""
806 fenc = self._fenc()
807 # encoded_str.replace('?', '_') may break multibyte char.
808 # So we should encode, decode, then replace(u'?', u'_')
809 relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc, 'replace')
810 relpath_tmp = relpath_tmp.replace(u'?', u'_')
811 return osutils.pathjoin(self._root, prefix, relpath_tmp)
812
775 def _write_file(self, file_id, tree, prefix, relpath, force_temp=False,813 def _write_file(self, file_id, tree, prefix, relpath, force_temp=False,
776 allow_write=False):814 allow_write=False):
777 if not force_temp and isinstance(tree, WorkingTree):815 if not force_temp and isinstance(tree, WorkingTree):
778 return tree.abspath(tree.id2path(file_id))816 full_path = tree.abspath(tree.id2path(file_id))
779 817 if self._is_safepath(full_path):
780 full_path = osutils.pathjoin(self._root, prefix, relpath)818 return full_path
819
820 full_path = self._safe_filename(prefix, relpath)
781 if not force_temp and self._try_symlink_root(tree, prefix):821 if not force_temp and self._try_symlink_root(tree, prefix):
782 return full_path822 return full_path
783 parent_dir = osutils.dirname(full_path)823 parent_dir = osutils.dirname(full_path)
@@ -840,13 +880,13 @@
840 """880 """
841 old_path = self.old_tree.id2path(file_id)881 old_path = self.old_tree.id2path(file_id)
842 new_path = self.new_tree.id2path(file_id)882 new_path = self.new_tree.id2path(file_id)
843 new_abs_path = self._prepare_files(file_id, old_path, new_path,883 old_abs_path, new_abs_path = self._prepare_files(
844 allow_write_new=True,884 file_id, old_path, new_path,
845 force_temp=True)[1]885 allow_write_new=True,
846 command = self._get_command(osutils.pathjoin('old', old_path),886 force_temp=True)
847 osutils.pathjoin('new', new_path))887 command = self._get_command(old_abs_path, new_abs_path)
848 subprocess.call(command, cwd=self._root)888 subprocess.call(command, cwd=self._root)
849 new_file = open(new_abs_path, 'r')889 new_file = open(new_abs_path, 'rb')
850 try:890 try:
851 return new_file.read()891 return new_file.read()
852 finally:892 finally:
853893
=== modified file 'bzrlib/tests/test_diff.py'
--- bzrlib/tests/test_diff.py 2011-04-18 23:19:16 +0000
+++ bzrlib/tests/test_diff.py 2011-05-06 17:43:35 +0000
@@ -33,7 +33,7 @@
33 transform,33 transform,
34 )34 )
35from bzrlib.symbol_versioning import deprecated_in35from bzrlib.symbol_versioning import deprecated_in
36from bzrlib.tests import features36from bzrlib.tests import features, EncodingAdapter
37from bzrlib.tests.blackbox.test_diff import subst_dates37from bzrlib.tests.blackbox.test_diff import subst_dates
3838
3939
@@ -1421,6 +1421,52 @@
1421 diff_obj._prepare_files('file2-id', 'oldname2', 'newname2')1421 diff_obj._prepare_files('file2-id', 'oldname2', 'newname2')
14221422
14231423
1424class TestDiffFromToolEncodedFilename(tests.TestCaseWithTransport):
1425
1426 def test_encodable_filename(self):
1427 # Just checks file path for external diff tool.
1428 # We cannot change CPython's internal encoding used by os.exec*.
1429 import sys
1430 diffobj = diff.DiffFromTool(['dummy', '@old_path', '@new_path'],
1431 None, None, None)
1432 for _, scenario in EncodingAdapter.encoding_scenarios:
1433 encoding = scenario['encoding']
1434 dirname = scenario['info']['directory']
1435 filename = scenario['info']['filename']
1436
1437 self.overrideAttr(diffobj, '_fenc', lambda: encoding)
1438 relpath = dirname + u'/' + filename
1439 fullpath = diffobj._safe_filename('safe', relpath)
1440 self.assertEqual(
1441 fullpath,
1442 fullpath.encode(encoding).decode(encoding)
1443 )
1444 self.assert_(fullpath.startswith(diffobj._root + '/safe'))
1445
1446 def test_unencodable_filename(self):
1447 import sys
1448 diffobj = diff.DiffFromTool(['dummy', '@old_path', '@new_path'],
1449 None, None, None)
1450 for _, scenario in EncodingAdapter.encoding_scenarios:
1451 encoding = scenario['encoding']
1452 dirname = scenario['info']['directory']
1453 filename = scenario['info']['filename']
1454
1455 if encoding == 'iso-8859-1':
1456 encoding = 'iso-8859-2'
1457 else:
1458 encoding = 'iso-8859-1'
1459
1460 self.overrideAttr(diffobj, '_fenc', lambda: encoding)
1461 relpath = dirname + u'/' + filename
1462 fullpath = diffobj._safe_filename('safe', relpath)
1463 self.assertEqual(
1464 fullpath,
1465 fullpath.encode(encoding).decode(encoding)
1466 )
1467 self.assert_(fullpath.startswith(diffobj._root + '/safe'))
1468
1469
1424class TestGetTreesAndBranchesToDiffLocked(tests.TestCaseWithTransport):1470class TestGetTreesAndBranchesToDiffLocked(tests.TestCaseWithTransport):
14251471
1426 def call_gtabtd(self, path_list, revision_specs, old_url, new_url):1472 def call_gtabtd(self, path_list, revision_specs, old_url, new_url):