Merge lp:~songofacandy/bzr/fix-523746 into lp:bzr/2.0

Proposed by methane
Status: Superseded
Proposed branch: lp:~songofacandy/bzr/fix-523746
Merge into: lp:bzr/2.0
Diff against target: 95 lines (+24/-13)
3 files modified
bzrlib/diff.py (+7/-12)
bzrlib/osutils.py (+10/-1)
bzrlib/tests/test_osutils.py (+7/-0)
To merge this branch: bzr merge lp:~songofacandy/bzr/fix-523746
Reviewer Review Type Date Requested Status
bzr-core Pending
Review via email: mp+19734@code.launchpad.net
To post a comment you must log in.
Revision history for this message
methane (songofacandy) wrote :

Fix UnicodeEncodeError when invoking external diff with non ASCII filenames.

Revision history for this message
Robert Collins (lifeless) wrote :

I recall subprocess being very slow to import; it worries me that we'll
be importing it all the time.

A quick check on my machine took about 10ms to import, so its definitely
noticable.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> I recall subprocess being very slow to import; it worries me that we'll
> be importing it all the time.
>
> A quick check on my machine took about 10ms to import, so its definitely
> noticable.
>
> -Rob
>

It can pretty easily be moved into the lazy_import section.

Also, this doesn't handle stuff like Unicode filenames that are valid on
disk, but not valid in user encoding. (Which can be pretty easy to
trigger on Windows.)

However, it is probably better than what we have today.

Actually, we have a different problem. In that 'get_user_encoding()' !=
'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
to programs.

The real fix is to use CreateProcessW and pass it a unicode string,
rather than trying to get everything round-tripped across an encoding
barrier.

This may be a stepping point in that direction, though.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkt+9AQACgkQJdeBCYSNAANcDACeMCCHcSVntHnO3XkvWLzCj82R
H2IAoMGFEiBOXh+t4Z0W+C64WG59gPWo
=6JUh
-----END PGP SIGNATURE-----

Revision history for this message
methane (songofacandy) wrote :

>> I recall subprocess being very slow to import; it worries me that we'll
>> be importing it all the time.
>>
>> A quick check on my machine took about 10ms to import, so its definitely
>> noticable.
>>
>> -Rob
>>
>
>It can pretty easily be moved into the lazy_import section.

osutils.Popen is a class. So subprocess imported every times
when import osutils.
I'll make osutils.Popen as a factory function instead of class.

> Actually, we have a different problem. In that 'get_user_encoding()' !=
> 'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
> to programs.

Yes. Arguments contains file path and others texts.
I think there are no single right way to encode process arguments.
But I change to use filesystemencoding because file paths appears in
arguments rather than non-ASCII text arguments.
In most case, filesystemencoding == user_encoding(). So I think using
filesystemencoding solves 90% case.

> The real fix is to use CreateProcessW and pass it a unicode string,
> rather than trying to get everything round-tripped across an encoding
> barrier.

On Windows, CreateProcessW is good. But this bug affects not only Windows.
Popen encoding arguments with defaultencoding so popen can't encode utf-8
file path on Linux, too.

To use CreateProcessW, we should decode bytes strings into unicode.
This brings another problem: "What encoding should we use when decoding".
And to use CreateProcessW, we should change subprocess internals or reimplement
subprocess module.

So, just for now, I'd like to fix 90% case.

lp:~songofacandy/bzr/fix-523746 updated
4737. By methane

Make Popen as a factory function.

4738. By methane

Use _fs_enc instead of user_encoding()

4739. By methane

Add test and fix easy miss.

Revision history for this message
methane (songofacandy) wrote :

I resubmitted merge proposal.

lp:~songofacandy/bzr/fix-523746 updated
4740. By methane

Fix import ordering.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

INADA Naoki wrote:
>>> I recall subprocess being very slow to import; it worries me that we'll
>>> be importing it all the time.
>>>
>>> A quick check on my machine took about 10ms to import, so its definitely
>>> noticable.
>>>
>>> -Rob
>>>
>> It can pretty easily be moved into the lazy_import section.
>
> osutils.Popen is a class. So subprocess imported every times
> when import osutils.
> I'll make osutils.Popen as a factory function instead of class.
>
>
>> Actually, we have a different problem. In that 'get_user_encoding()' !=
>> 'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
>> to programs.
>
> Yes. Arguments contains file path and others texts.
> I think there are no single right way to encode process arguments.
> But I change to use filesystemencoding because file paths appears in
> arguments rather than non-ASCII text arguments.
> In most case, filesystemencoding == user_encoding(). So I think using
> filesystemencoding solves 90% case.

Only outsidef of Windows. On Windows filesystemencoding == mbcs, but
user_encoding is something like 'cp1252'. (and terminal encoding is
cp437...)

Now, I don't know whether arbitrary programs are going to use filesystem
encoding or user_encoding for decoding their arguments. I *think* that
argument decoding would be an OS sort of thing, and we use
filesystemencoding for ENV variables and thus it would make sense for
arguments. bzr itself used user_encoding before we use the Unicode api.

>
>> The real fix is to use CreateProcessW and pass it a unicode string,
>> rather than trying to get everything round-tripped across an encoding
>> barrier.
>
> On Windows, CreateProcessW is good. But this bug affects not only Windows.
> Popen encoding arguments with defaultencoding so popen can't encode utf-8
> file path on Linux, too.

Sure. And on Linux, it has pretty much standardized that everything is
in utf-8. Which is *soooo* much nicer. (file content, filenames, env
vars, terminal encoding, etc.)

>
> To use CreateProcessW, we should decode bytes strings into unicode.
> This brings another problem: "What encoding should we use when decoding".
> And to use CreateProcessW, we should change subprocess internals or reimplement
> subprocess module.
>
> So, just for now, I'd like to fix 90% case.
>

As long as we don't break other cases, I'm fine with that.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuC3GUACgkQJdeBCYSNAAMUuwCfS28ggzvo1JNRialesoqv8Slw
U2MAn0J3tGr5HGGOPIqEz+w7CCm10hKm
=V9+v
-----END PGP SIGNATURE-----

lp:~songofacandy/bzr/fix-523746 updated
4741. By methane

Don't use Conditional Expressions for Python 2.4 compatibility.

Unmerged revisions

4741. By methane

Don't use Conditional Expressions for Python 2.4 compatibility.

4740. By methane

Fix import ordering.

4739. By methane

Add test and fix easy miss.

4738. By methane

Use _fs_enc instead of user_encoding()

4737. By methane

Make Popen as a factory function.

4736. By methane

Wrap Popen() to support unicode arguments.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/diff.py'
--- bzrlib/diff.py 2009-07-29 21:35:05 +0000
+++ bzrlib/diff.py 2010-02-20 05:02:13 +0000
@@ -131,11 +131,11 @@
131 stderr = None131 stderr = None
132132
133 try:133 try:
134 pipe = subprocess.Popen(diffcmd,134 pipe = osutils.Popen(diffcmd,
135 stdin=subprocess.PIPE,135 stdin=subprocess.PIPE,
136 stdout=subprocess.PIPE,136 stdout=subprocess.PIPE,
137 stderr=stderr,137 stderr=stderr,
138 env=env)138 env=env)
139 except OSError, e:139 except OSError, e:
140 if e.errno == errno.ENOENT:140 if e.errno == errno.ENOENT:
141 raise errors.NoDiff(str(e))141 raise errors.NoDiff(str(e))
@@ -171,11 +171,6 @@
171171
172 if not diff_opts:172 if not diff_opts:
173 diff_opts = []173 diff_opts = []
174 if sys.platform == 'win32':
175 # Popen doesn't do the proper encoding for external commands
176 # Since we are dealing with an ANSI api, use mbcs encoding
177 old_filename = old_filename.encode('mbcs')
178 new_filename = new_filename.encode('mbcs')
179 diffcmd = ['diff',174 diffcmd = ['diff',
180 '--label', old_filename,175 '--label', old_filename,
181 old_abspath,176 old_abspath,
@@ -687,8 +682,8 @@
687 def _execute(self, old_path, new_path):682 def _execute(self, old_path, new_path):
688 command = self._get_command(old_path, new_path)683 command = self._get_command(old_path, new_path)
689 try:684 try:
690 proc = subprocess.Popen(command, stdout=subprocess.PIPE,685 proc = osutils.Popen(command, stdout=subprocess.PIPE,
691 cwd=self._root)686 cwd=self._root)
692 except OSError, e:687 except OSError, e:
693 if e.errno == errno.ENOENT:688 if e.errno == errno.ENOENT:
694 raise errors.ExecutableMissing(command[0])689 raise errors.ExecutableMissing(command[0])
695690
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2009-10-08 03:55:30 +0000
+++ bzrlib/osutils.py 2010-02-20 05:02:13 +0000
@@ -34,11 +34,11 @@
34 splitdrive as _nt_splitdrive,34 splitdrive as _nt_splitdrive,
35 )35 )
36import posixpath36import posixpath
37import subprocess
37import shutil38import shutil
38from shutil import (39from shutil import (
39 rmtree,40 rmtree,
40 )41 )
41import subprocess
42import tempfile42import tempfile
43from tempfile import (43from tempfile import (
44 mkdtemp,44 mkdtemp,
@@ -1907,3 +1907,12 @@
1907 if use_cache:1907 if use_cache:
1908 _cached_concurrency = concurrency1908 _cached_concurrency = concurrency
1909 return concurrency1909 return concurrency
1910
1911def Popen(subprocess_args, *args, **kwargs):
1912 """A wrapper for Popen for unicode arguments."""
1913 if not isinstance(subprocess_args, basestring):
1914 subprocess_args = [s.encode(_fs_enc) if isinstance(s, unicode) else s
1915 for s in subprocess_args]
1916 elif isinstance(subprocess_args, unicode):
1917 subprocess_args = subprocess_args.encode(_fs_enc)
1918 return subprocess.Popen(subprocess_args, *args, **kwargs)
19101919
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py 2009-10-08 17:13:24 +0000
+++ bzrlib/tests/test_osutils.py 2010-02-20 05:02:13 +0000
@@ -24,6 +24,7 @@
24import stat24import stat
25import sys25import sys
26import time26import time
27import subprocess
2728
28from bzrlib import (29from bzrlib import (
29 errors,30 errors,
@@ -1841,3 +1842,9 @@
1841 def test_local_concurrency(self):1842 def test_local_concurrency(self):
1842 concurrency = osutils.local_concurrency()1843 concurrency = osutils.local_concurrency()
1843 self.assertIsInstance(concurrency, int)1844 self.assertIsInstance(concurrency, int)
1845
1846class TestPopen(tests.TestCase):
1847
1848 def test_unicode_args(self):
1849 proc = osutils.Popen(["echo", u"a\N{Euro Sign}b"], stdout=subprocess.PIPE)
1850 self.assertEquals(proc.wait(), 0)

Subscribers

People subscribed via source and target branches