Merge into bzr.dev : fix-523746-dev : Code : Bazaar

Reviewer	Date Requested	Status
John A Meinel	2011-05-06	Approve on 2011-05-16
Vincent Ladeuil	2010-03-03	Abstain on 2010-04-22
Alexander Belchenko	2011-05-06	Pending
Gary van der Merwe	2011-05-06	Pending
Martin Packman	2011-05-06	Pending
bzr-core	2010-04-29	Pending
Review via email: mp+20537@code.launchpad.net

Revision history for this message

methane (songofacandy) wrote on 2010-03-03:

#

https://code.edge.launchpad.net/~songofacandy/bzr/fix-523746-2/+merge/19938 for bzr.dev.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-03-03:

#

@Naoki, can you please use more meaningful branch names
(see Making a Merge Proposal in doc/developers/HACKING.txt) to ease Patch Pilot work ? :-)

Also, it helps a lot if you put an initial comment (we sometimes call it the cover letter)
when you start the merge proposal explaining the intent of the patch.

If you've fixed a bug, you desserve credit for that, so please add a NEWS entry too :)

review: Needs Fixing

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-03-03:

#

Oh, silly me, I mis-read your initial comment as pointing to itself,
where in fact, you just did what I asked in the *other* mp.

Anyway, this illustrates why I find sparse branch names unhelpful, I just didn't
recognize it :)

In fact you made the comments I was waiting for on the other mp:

> This problem depends on OS, locale settings and other environment.
> So I agree with you that this patch should land to bzr.dev and be tested on
> vary environment while bzr 2.2 is released.

Fine, I'll mark the other mp rejected.

So, test against variations on locale settings exist in bzrlib/tests/test_non_ascii.py
and bzrlib/tests/blackbox/test_non_ascii.py

Ask for help if you don't find the existing tests enough to write your own.

review: Needs Fixing

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-21:

#

@Naoki, I've added tests in lp:~vila/bzr/523746-difftool-file-names and doing so I think I found a simpler fix, could you test it and give feedback ?

review: Needs Information

Revision history for this message

methane (songofacandy) wrote on 2010-04-21:

#

This problem contains many situation.
I think your fix doesn't work sometimes because:

1. osutils.get_user_encoding() is not good in rare case.
user_encoding is not equals to filesystem encoding.

2. path.encode(enc) can raise UnicodeEncodeError.
So bzr should use "encode(enc, 'replace')"

3. _write_file() writes file with unicode path.
But if bzr can't encode the path collectly, external command can't
open the file.
So bzr should encode the path before _write_file().

To test these situation:

* Use fixed encoding in the test.
* Use filenames that contains chars that can't encode with the encoding.
* Replace subprocess.call and check arguments.

--
INADA Naoki <email address hidden>

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-21:

#

> This problem contains many situation.
> I think your fix doesn't work

That's why I wanted you to test it :) Did you ? If yes, did it fail and how ?

>sometimes because:
>
> 1. osutils.get_user_encoding() is not good in rare case.
> user_encoding is not equals to filesystem encoding.

My assumption was that subprocess.Popen should receive its arguments as if
they were coming from the command line: in user encoding.

>
> 2. path.encode(enc) can raise UnicodeEncodeError.
> So bzr should use "encode(enc, 'replace')"

That will mean some path can't be encoded in user encoding which in turn means
it could not be user as a parameter to bzr no ?

>
> 3. _write_file() writes file with unicode path.
> But if bzr can't encode the path collectly, external command can't
> open the file.

We rely on python to do the right thing from unicode paths there.

> So bzr should encode the path before _write_file().
>
> To test these situation:
>
> * Use fixed encoding in the test.

That's what the tests are doing for the user encoding.

> * Use filenames that contains chars that can't encode with the encoding.

Hmm, AFAIR, when we encounter problems there it's because some setting is wrong in the environment. As mentioned below, I couldn't automate the tests enough
to reach that point, hence my call for real-life tests :)

> * Replace subprocess.call and check arguments.

I don't want to do that, if the problem is in subprocess I will not be able to detect it.

That's the main point I dislike with the current tests, they don't exercise
subprocess.Popen, but I don't have an external diff tool I can call available
on all platforms.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-21:

#

> We rely on python to do the right thing from unicode paths there.
i.e. encode using 'mbcs' as returned by getsystemfilecoding() on windows
and whatever is appropriate on other systems.

Revision history for this message

methane (songofacandy) wrote on 2010-04-22:

#

>> 1. osutils.get_user_encoding() is not good in rare case.
>> user_encoding is not equals to filesystem encoding.
>
>My assumption was that subprocess.Popen should receive its arguments as if
>they were coming from the command line: in user encoding.

OK, I don't know about when get_user_encoding() returns encoding that is not
suitable for commandline arguments.

>> 2. path.encode(enc) can raise UnicodeEncodeError.
>> So bzr should use "encode(enc, 'replace')"
>
>That will mean some path can't be encoded in user encoding which in turn means
>it could not be user as a parameter to bzr no ?

No. bzr uses unicode arguments in Windows. So users can pass unicode filename to bzr.
And "bzr diff --using=winmergeu.exe" doesn't requires file path but shows diff on
such a file.

>> 3. _write_file() writes file with unicode path.
>> But if bzr can't encode the path collectly, external command can't
>> open the file.
>
> We rely on python to do the right thing from unicode paths there.

On windows, unicode path is too perfect.
If unicode path contains chars that is not be able to encode,
we can't use the path in encoded string.
But we can create a file with such a name when using unicode.
So we should replace such a char before _write_file().

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-22:

#

>>>>> "Naoki" == INADA Naoki <email address hidden> writes:

    >>> 1. osutils.get_user_encoding() is not good in rare case.
    >>> user_encoding is not equals to filesystem encoding.
    >>
    >> My assumption was that subprocess.Popen should receive its arguments as if
    >> they were coming from the command line: in user encoding.

Naoki> OK, I don't know about when get_user_encoding() returns encoding that is not
Naoki> suitable for commandline arguments.

Good.

    >>> 2. path.encode(enc) can raise UnicodeEncodeError.
    >>> So bzr should use "encode(enc, 'replace')"
    >>
    >> That will mean some path can't be encoded in user encoding which
    >> in turn means it could not be user as a parameter to bzr no ?

Naoki> No. bzr uses unicode arguments in Windows.

No. bzr uses unicode paths internally, they are coming from the command
line and decoded with user encoding or from the bzr db where they are
encoding into utf-8.

Naoki> So users can pass unicode filename to bzr.

Encoded appropriately yes. They are decoded into unicode and encoded by
bzr or by python when access to the filesystem is needed at which point
the file system encoding is used (which may be different from the user
encoding).

Naoki> And "bzr diff --using=winmergeu.exe" doesn't requires file
Naoki> path but shows diff on such a file.

Sure and it may decode them using the *terminal* encoding for the
content of the diff but DiffFromTool has a parameter 'path_encoding' to
control that.

Note that this is still not the encoding we need to use to create any
needed file though.

    >>> 3. _write_file() writes file with unicode path.
    >>> But if bzr can't encode the path collectly, external command can't
    >>> open the file.
    >>
    >> We rely on python to do the right thing from unicode paths there.

    Naoki> On windows, unicode path is too perfect.
    Naoki> If unicode path contains chars that is not be able to encode,
    Naoki> we can't use the path in encoded string.

If we can't use it here, we can't use it either when creating the
working tree, so I'll tend to ignore the problem here.

Naoki> But we can create a file with such a name when using unicode.
Naoki> So we should replace such a char before _write_file().

I don't think so.

Can we chat about that on IRC or can you provide me a file name that
breaks my patch (preferably as a python string using the \xnnnn
notation) ?

>>>>> "Naoki" == INADA Naoki <songofacandy@gmail.com> writes:

>>> 1. osutils.get_user_encoding() is not good in rare case.
    >>> user_encoding is not equals to filesystem encoding.
    >> 
    >> My assumption was that subprocess.Popen should receive its arguments as if
    >> they were coming from the command line: in user encoding.

Naoki> OK, I don't know about when get_user_encoding() returns encoding that is not
    Naoki> suitable for commandline arguments.

Good.

>>> 2. path.encode(enc) can raise UnicodeEncodeError.
    >>> So bzr should use "encode(enc, 'replace')"
    >> 
    >> That will mean some path can't be encoded in user encoding which
    >> in turn means it could not be user as a parameter to bzr no ?

Naoki> No. bzr uses unicode arguments in Windows.

No. bzr uses unicode paths internally, they are coming from the command
line and decoded with user encoding or from the bzr db where they are
encoding into utf-8.

Naoki> So users can pass unicode filename to bzr.

Encoded appropriately yes. They are decoded into unicode and encoded by
bzr or by python when access to the filesystem is needed at which point
the file system encoding is used (which may be different from the user
encoding).

Naoki> And "bzr diff --using=winmergeu.exe" doesn't requires file
    Naoki> path but shows diff on such a file.

Sure and it may decode them using the *terminal* encoding for the
content of the diff but DiffFromTool has a parameter 'path_encoding' to
control that.

Note that this is still not the encoding we need to use to create any
needed file though.

>>> 3. _write_file() writes file with unicode path.
    >>> But if bzr can't encode the path collectly, external command can't
    >>> open the file.
    >> 
    >> We rely on python to do the right thing from unicode paths there.

Naoki> On windows, unicode path is too perfect.
    Naoki> If unicode path contains chars that is not be able to encode,
    Naoki> we can't use the path in encoded string.

If we can't use it here, we can't use it either when creating the
working tree, so I'll tend to ignore the problem here.

Naoki> But we can create a file with such a name when using unicode.
    Naoki> So we should replace such a char before _write_file().

I don't think so.

Can we chat about that on IRC or can you provide me a file name that
breaks my patch (preferably as a python string using the \xnnnn
notation) ?

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-22:

#

So it appears that I lack the windows knowledge (and the ability to tests)
to effectively help Naoki here.

The consensus seems to be that the problem is now to either:
- make subprocess.Popen accept unicode arguments (Naoki as a monkey-patch for that
but I find it a bit scary),
- bypass subprocess.Popen and use a direct win32 API (I'm out of my league on
this one)

John ? Alexander ? Can you give it a try ?

review: Abstain

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-04-29:

#

> John ? Alexander ? Can you give it a try ?

Or may be Gary since you're working in this area right now and have some way of testing it on windows ?

Revision history for this message

Alexander Belchenko (bialix) wrote on 2010-04-29:

#

> So it appears that I lack the windows knowledge (and the ability to tests)
> to effectively help Naoki here.
>
> The consensus seems to be that the problem is now to either:
> - make subprocess.Popen accept unicode arguments (Naoki as a monkey-patch for
> that
> but I find it a bit scary),
> - bypass subprocess.Popen and use a direct win32 API (I'm out of my league on
> this one)
>
> John ? Alexander ? Can you give it a try ?

Sorry, I'm a bit busy these days, have lost this ping. I can't promise I will look into it soon. But I'll try.

Revision history for this message

Gary van der Merwe (garyvdm) wrote on 2010-05-04:

#

Vincent Ladeuil wrote on 2010-04-21 :
> That's the main point I dislike with the current tests, they don't exercise
> subprocess.Popen, but I don't have an external diff tool I can call available
> on all platforms.

I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names

Those tests fail in wine for both Vincent's and INADA's branches.

- new_abs_path = self._prepare_files(file_id, old_path, new_path,
- allow_write_new=True,
- force_temp=True)[1]
- command = self._get_command(osutils.pathjoin('old', old_path),
- osutils.pathjoin('new', new_path))
+ old_abs_path, new_abs_path = self._prepare_files(
+ file_id, old_path, new_path,
+ allow_write_new=True,
+ force_temp=True)
+ command = self._get_command(old_abs_path, new_abs_path)

This is a nice simple change. Maybe we should merge this straight away, separately from the other changes.

- new_file = open(new_abs_path, 'r')
+ new_file = open(new_abs_path, 'rb')

Why is this changed?

Revision history for this message

methane (songofacandy) wrote on 2010-05-04:

#

> Vincent Ladeuil wrote on 2010-04-21 :
> > That's the main point I dislike with the current tests, they don't exercise
> > subprocess.Popen, but I don't have an external diff tool I can call
> available
> > on all platforms.
>
> I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names
>
> Those tests fail in wine for both Vincent's and INADA's branches.

I'll see the branch later.

> - new_abs_path = self._prepare_files(file_id, old_path, new_path,
> - allow_write_new=True,
> - force_temp=True)[1]
> - command = self._get_command(osutils.pathjoin('old', old_path),
> - osutils.pathjoin('new', new_path))
> + old_abs_path, new_abs_path = self._prepare_files(
> + file_id, old_path, new_path,
> + allow_write_new=True,
> + force_temp=True)
> + command = self._get_command(old_abs_path, new_abs_path)
>
> This is a nice simple change. Maybe we should merge this straight away,
> separately from the other changes.

I think so.

> - new_file = open(new_abs_path, 'r')
> + new_file = open(new_abs_path, 'rb')
>
> Why is this changed?

This change is made because I think bzr should use 'binary' mode in this point.
External diff tool may have edit feature and user may fix line ending. So bzr should keep
EOL bytes.

But, this change is not relating to #523746.

Revision history for this message

methane (songofacandy) wrote on 2011-05-06:

#

I'm ashamed to leave this so long.

> > Vincent Ladeuil wrote on 2010-04-21 :
> > > That's the main point I dislike with the current tests, they don't
> exercise
> > > subprocess.Popen, but I don't have an external diff tool I can call
> > available
> > > on all platforms.
> >
> > I've fixed that in this branch: lp:~garyvdm/bzr/523746-difftool-file-names
> >
> > Those tests fail in wine for both Vincent's and INADA's branches.
>
> I'll see the branch later.

Your test fails because my branch changes filename for it safely write
to the filename and pass the filename to subprocess.

I split filename creation from DiffFromTool._write_file and wrote test for it.
Please see it.

I've wrote test executing DiffFromTool but it doesn't tests real problem
because overriding sys.getfilesystemencoding() doesn't changes
Py_FileSystemDefaultEncoding in CPython core.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-06:

#

Thanks for coming back songofacandy (If you explain me one last time which of INADA or Naoki I should use, I swear I won't forget anymore :-}) !

Once again, I'd like to defer to people with a better windows knowledge than myself in this specific area, I'll nominate explicit reviewers, please, one of you, review ! :) Or if you can't just abstain ;)

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-06:

#

My understanding was that we couldn't arbitrarily try to encode/decode both paths and other arguments. I may be right in general and wrong in this particular case where mbcs may be the silver bullet. That's why I'm deferring to people knowing better than me.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-05-06:

#

Download full text (3.2 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/06/2011 03:29 AM, INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>
> Requested reviews:
> Vincent Ladeuil (vila)
> bzr-core (bzr-core)
> Related bugs:
> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> https://bugs.launchpad.net/bzr/+bug/523746
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537

+ def _safe_filename(self, prefix, relpath):
+ if sys.platform == 'win32':
+ fenc = 'mbcs'
+ else:
+ fenc = sys.getfilesystemencoding() or 'ascii'
+ # encoded_str.replace('?', '_') may break multibyte char.
+ # So we should encode, decode, then replace(u'?', u'_')
+ relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
'replace')
+ relpath_tmp = relpath_tmp.replace(u'?', u'_')
+ return osutils.pathjoin(self._root, prefix, relpath_tmp)

Why are we using "encode('replace')" rather than "encode('ignore')". It
doesn't seem to buy us much to be putting "_" into the filename, rather
than just writing out whatever letters we can.

I also don't understand why you do the encode/decode dance and the put a
munged Unicode string back into the path. I think I sort of understand,
but a docstring would probably help a lot at understanding why we need
to use this function.

...

+ command = [AtTemplate(t).substitute(my_map) for t in
+ self.command_template]
+ if sys.platform == 'win32': # Popen doesn't accept unicode on win32
+ command_encoded = []
+ for c in command:
+ if isinstance(c, unicode):
+ command_encoded.append(c.encode('mbcs'))
+ else:
+ command_encoded.append(c)
+ return command_encoded
+ else:
+ return command

^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
return '?' for unusable characters. For example, I can create Arabic or
Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
is only able to handle ~latin-1. Any thoughts on how to handle that
cleanly? We don't have to change Popen to support Unicode (though that
would be ideal), but at least something that would give users helpful
feedback, rather than just tracebacks.

+ def test_encodable_filename(self):
+ import sys
+ diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
+ None, None, None)

On *my* system, there is no 'python' (I never add it to sys.path, as I
happen to have 2.4/5/6/7 all installed.) There is 'sys.executable', though.

diffobj = diff.DiffFromTool([sys.executable, ...])

Also, we probably want to skip this test if "sys.frozen" is set, since I
don't think we can run it from 'bzr.exe'. I think we already had a
feature for that.

John
=:->

review: needsfixing

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3EA3oACgkQJdeBCYSNAAMYNgCeJIIEWRF1Dg/OLWQ5kRDCI6EP
W0cAoID...

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 05/06/2011 03:29 AM, INADA Naoki wrote:
> > INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into
> lp:bzr.
> >
> > Requested reviews:
> >   Vincent Ladeuil (vila)
> >   bzr-core (bzr-core)
> > Related bugs:
> >   Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> >   https://bugs.launchpad.net/bzr/+bug/523746
> >
> > For more details, see:
> > https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
> 
> 
> +    def _safe_filename(self, prefix, relpath):
> +        if sys.platform == 'win32':
> +            fenc = 'mbcs'
> +        else:
> +            fenc = sys.getfilesystemencoding() or 'ascii'
> +        # encoded_str.replace('?', '_') may break multibyte char.
> +        # So we should encode, decode, then replace(u'?', u'_')
> +        relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
> 'replace')
> +        relpath_tmp = relpath_tmp.replace(u'?', u'_')
> +        return osutils.pathjoin(self._root, prefix, relpath_tmp)
> 
> Why are we using "encode('replace')" rather than "encode('ignore')". It
> doesn't seem to buy us much to be putting "_" into the filename, rather
> than just writing out whatever letters we can.

Because "encode('ignore')" may makes null string.
Fallback to other string is also OK, but I don't have nice name for fallback.

> I also don't understand why you do the encode/decode dance and the put a
> munged Unicode string back into the path. I think I sort of understand,
> but a docstring would probably help a lot at understanding why we need
> to use this function.

OK, I'll add docstrings.

> 
> ...
> 
> +        command = [AtTemplate(t).substitute(my_map) for t in
> +                   self.command_template]
> +        if sys.platform == 'win32': # Popen doesn't accept unicode on win32
> +            command_encoded = []
> +            for c in command:
> +                if isinstance(c, unicode):
> +                    command_encoded.append(c.encode('mbcs'))
> +                else:
> +                    command_encoded.append(c)
> +            return command_encoded
> +        else:
> +            return command
> 
> ^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
> return '?' for unusable characters. For example, I can create Arabic or
> Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
> is only able to handle ~latin-1. Any thoughts on how to handle that
> cleanly? We don't have to change Popen to support Unicode (though that
> would be ideal), but at least something that would give users helpful
> feedback, rather than just tracebacks.

c.encode('mbcs') may fail on very rare circumstance. For example, path of
diff tool or temporary directory is not be able to encode with 'mbcs'.
I don't have any idea what should we do on that circumstance.

But filepath on commandline should not be fail decoding unlsess
decode('mbcs').encode('mbcs') can be fail.

> +    def test_encodable_filename(self):
> +        import sys
> +        diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
> +                                    None, None, None)
> 
> 
> On *my* system, there is no 'python' (I never add it to sys.path, as I
> happen to have 2.4/5/6/7 all installed.) There is 'sys.executable', though.
> 
> diffobj = diff.DiffFromTool([sys.executable, ...])
> 
> Also, we probably want to skip this test if "sys.frozen" is set, since I
> don't think we can run it from 'bzr.exe'. I think we already had a
> feature for that.

It doesn't execute any commands. It just builds file names.
So I'll change 'python' to 'dummy'.

Revision history for this message

methane (songofacandy) wrote on 2011-05-06:

#

Done

Revision history for this message

John A Meinel (jameinel) wrote on 2011-05-12:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/6/2011 8:28 PM, INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>
> Requested reviews:
> John A Meinel (jameinel)
> Alexander Belchenko (bialix)
> Gary van der Merwe (garyvdm)
> Martin [gz] (gz)
> Vincent Ladeuil (vila)
> bzr-core (bzr-core)
> Related bugs:
> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> https://bugs.launchpad.net/bzr/+bug/523746
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537

Can you give a little bit more background in your merge proposal? I know
there is a bug linked, but describing what you are doing makes it easier
to understand the diff.

review: needsinfo

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3Liw0ACgkQJdeBCYSNAAOxzACguDKS9ztLjW6NJ54ii0fWIffL
dgkAoJ8kAv8ws8Atxw+O4d4o3ke54TMa
=u/ie
-----END PGP SIGNATURE-----

review: Needs Information

Revision history for this message

methane (songofacandy) wrote on 2011-05-12:

#

OK.

1. On Windows.
Some diff application doesn't receive command line argument as a UTF-16 string
but as a string encoded with current codepage.
In this situation, passing Unicode argument to child process may fail
when the file path
contains character that is not be able to encode with current codepage.
So, we should use only chars those are able to encode with current codepage as a
temporary file.

2. On Unix.
All application may receive commandline argument as just bytes. So we can pass
any path as a argument.
But, GUI diff tool may be not able to show such filepath. So we should
use string
that is able to encode with current locale as a temporary file, too.

My fix removes chars those are not able to encode with current codepage/locale.
And force to use temporary file when checkouted file contains such chars as a
path.

On Thu, May 12, 2011 at 4:23 PM, John Arbash Meinel
<email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 5/6/2011 8:28 PM, INADA Naoki wrote:
>> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>>
>> Requested reviews:
>> John A Meinel (jameinel)
>> Alexander Belchenko (bialix)
>> Gary van der Merwe (garyvdm)
>> Martin [gz] (gz)
>> Vincent Ladeuil (vila)
>> bzr-core (bzr-core)
>> Related bugs:
>> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
>> https://bugs.launchpad.net/bzr/+bug/523746
>>
>> For more details, see:
>> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
>
> Can you give a little bit more background in your merge proposal? I know
> there is a bug linked, but describing what you are doing makes it easier
> to understand the diff.
>
> review: needsinfo
>
> John
> =:->
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk3Liw0ACgkQJdeBCYSNAAOxzACguDKS9ztLjW6NJ54ii0fWIffL
> dgkAoJ8kAv8ws8Atxw+O4d4o3ke54TMa
> =u/ie
> -----END PGP SIGNATURE-----
>

--
INADA Naoki <email address hidden>

OK.

1. On Windows.
Some diff application doesn't receive command line argument as a UTF-16 string
but as a string encoded with current codepage.
In this situation, passing Unicode argument to child process may fail
when the file path
contains character that is not be able to encode with current codepage.
So, we should use only chars those are able to encode with current codepage as a
temporary file.

2. On Unix.
All application may receive commandline argument as just bytes. So we can pass
any path as a argument.
But, GUI diff tool may be not able to show such filepath. So we should
use string
that is able to encode with current locale as a temporary file, too.

My fix removes chars those are not able to encode with current codepage/locale.
And force to use temporary file when checkouted file contains such chars as a
path.

On Thu, May 12, 2011 at 4:23 PM, John Arbash Meinel
<john@arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 5/6/2011 8:28 PM, INADA Naoki wrote:
>> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>>
>> Requested reviews:
>>   John A Meinel (jameinel)
>>   Alexander Belchenko (bialix)
>>   Gary van der Merwe (garyvdm)
>>   Martin [gz] (gz)
>>   Vincent Ladeuil (vila)
>>   bzr-core (bzr-core)
>> Related bugs:
>>   Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
>>   https://bugs.launchpad.net/bzr/+bug/523746
>>
>> For more details, see:
>> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
>
> Can you give a little bit more background in your merge proposal? I know
> there is a bug linked, but describing what you are doing makes it easier
> to understand the diff.
>
>  review: needsinfo
>
> John
> =:->
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk3Liw0ACgkQJdeBCYSNAAOxzACguDKS9ztLjW6NJ54ii0fWIffL
> dgkAoJ8kAv8ws8Atxw+O4d4o3ke54TMa
> =u/ie
> -----END PGP SIGNATURE-----
>

-- 
INADA Naoki  <songofacandy@gmail.com>

Revision history for this message

John A Meinel (jameinel) wrote on 2011-05-16:

#

Seems fine to me. I don't really like the encode().decode() stuff, but I'll live with it.

review: Approve

Revision history for this message

Martin Pool (mbp) wrote on 2011-05-18:

#

sent to pqm by email

 === modified file 'bzrlib/diff.py'
 --- bzrlib/diff.py	2011-04-07 10:36:24 +0000
 +++ bzrlib/diff.py	2011-05-06 17:43:35 +0000
@@ -745,8 +745,18 @@
      def _get_command(self, old_path, new_path):
          my_map = {'old_path': old_path, 'new_path': new_path}
--        return [AtTemplate(t).substitute(my_map) for t in
--                self.command_template]
++        command = [AtTemplate(t).substitute(my_map) for t in
++                   self.command_template]
++        if sys.platform == 'win32': # Popen doesn't accept unicode on win32
++            command_encoded = []
++            for c in command:
++                if isinstance(c, unicode):
++                    command_encoded.append(c.encode('mbcs'))
++                else:
++                    command_encoded.append(c)
++            return command_encoded
++        else:
++            return command
      def _execute(self, old_path, new_path):
          command = self._get_command(old_path, new_path)
@@ -772,12 +782,42 @@
                  raise
          return True
++    @staticmethod
++    def _fenc():
++        """Returns safe encoding for passing file path to diff tool"""
++        if sys.platform == 'win32':
++            return 'mbcs'
++        else:
++            # Don't fallback to 'utf-8' because subprocess may not be able to
++            # handle utf-8 correctly when locale is not utf-8.
++            return sys.getfilesystemencoding() or 'ascii'
++
++    def _is_safepath(self, path):
++        """Return true if `path` may be able to pass to subprocess."""
++        fenc = self._fenc()
++        try:
++            return path == path.encode(fenc).decode(fenc)
++        except UnicodeError:
++            return False
++
++    def _safe_filename(self, prefix, relpath):
++        """Replace unsafe character in `relpath` then join `self._root`,
++        `prefix` and `relpath`."""
++        fenc = self._fenc()
++        # encoded_str.replace('?', '_') may break multibyte char.
++        # So we should encode, decode, then replace(u'?', u'_')
++        relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc, 'replace')
++        relpath_tmp = relpath_tmp.replace(u'?', u'_')
++        return osutils.pathjoin(self._root, prefix, relpath_tmp)
++
      def _write_file(self, file_id, tree, prefix, relpath, force_temp=False,
                      allow_write=False):
          if not force_temp and isinstance(tree, WorkingTree):
--            return tree.abspath(tree.id2path(file_id))
--
--        full_path = osutils.pathjoin(self._root, prefix, relpath)
++            full_path = tree.abspath(tree.id2path(file_id))
++            if self._is_safepath(full_path):
++                return full_path
++
++        full_path = self._safe_filename(prefix, relpath)
          if not force_temp and self._try_symlink_root(tree, prefix):
              return full_path
          parent_dir = osutils.dirname(full_path)
@@ -840,13 +880,13 @@
          """
          old_path = self.old_tree.id2path(file_id)
          new_path = self.new_tree.id2path(file_id)
--        new_abs_path = self._prepare_files(file_id, old_path, new_path,
--                                           allow_write_new=True,
--                                           force_temp=True)[1]
--        command = self._get_command(osutils.pathjoin('old', old_path),
--                                    osutils.pathjoin('new', new_path))
++        old_abs_path, new_abs_path = self._prepare_files(
++                                            file_id, old_path, new_path,
++                                            allow_write_new=True,
++                                            force_temp=True)
++        command = self._get_command(old_abs_path, new_abs_path)
          subprocess.call(command, cwd=self._root)
--        new_file = open(new_abs_path, 'r')
++        new_file = open(new_abs_path, 'rb')
          try:
              return new_file.read()
          finally:
 === modified file 'bzrlib/tests/test_diff.py'
 --- bzrlib/tests/test_diff.py	2011-04-18 23:19:16 +0000
 +++ bzrlib/tests/test_diff.py	2011-05-06 17:43:35 +0000
@@ -33,7 +33,7 @@
      transform,
+     )
  from bzrlib.symbol_versioning import deprecated_in
--from bzrlib.tests import features
++from bzrlib.tests import features, EncodingAdapter
  from bzrlib.tests.blackbox.test_diff import subst_dates
@@ -1421,6 +1421,52 @@
          diff_obj._prepare_files('file2-id', 'oldname2', 'newname2')
++class TestDiffFromToolEncodedFilename(tests.TestCaseWithTransport):
++
++    def test_encodable_filename(self):
++        # Just checks file path for external diff tool.
++        # We cannot change CPython's internal encoding used by os.exec*.
++        import sys
++        diffobj = diff.DiffFromTool(['dummy', '@old_path', '@new_path'],
++                                    None, None, None)
++        for _, scenario in EncodingAdapter.encoding_scenarios:
++            encoding = scenario['encoding']
++            dirname  = scenario['info']['directory']
++            filename = scenario['info']['filename']
++
++            self.overrideAttr(diffobj, '_fenc', lambda: encoding)
++            relpath = dirname + u'/' + filename
++            fullpath = diffobj._safe_filename('safe', relpath)
++            self.assertEqual(
++                    fullpath,
++                    fullpath.encode(encoding).decode(encoding)
++                    )
++            self.assert_(fullpath.startswith(diffobj._root + '/safe'))
++
++    def test_unencodable_filename(self):
++        import sys
++        diffobj = diff.DiffFromTool(['dummy', '@old_path', '@new_path'],
++                                    None, None, None)
++        for _, scenario in EncodingAdapter.encoding_scenarios:
++            encoding = scenario['encoding']
++            dirname  = scenario['info']['directory']
++            filename = scenario['info']['filename']
++
++            if encoding == 'iso-8859-1':
++                encoding = 'iso-8859-2'
++            else:
++                encoding = 'iso-8859-1'
++
++            self.overrideAttr(diffobj, '_fenc', lambda: encoding)
++            relpath = dirname + u'/' + filename
++            fullpath = diffobj._safe_filename('safe', relpath)
++            self.assertEqual(
++                    fullpath,
++                    fullpath.encode(encoding).decode(encoding)
++                    )
++            self.assert_(fullpath.startswith(diffobj._root + '/safe'))
++
++
  class TestGetTreesAndBranchesToDiffLocked(tests.TestCaseWithTransport):
      def call_gtabtd(self, path_list, revision_specs, old_url, new_url):

Bazaar

Merge lp:~songofacandy/bzr/fix-523746-dev into lp:bzr

Commit message

Description of the change

Preview Diff

Subscribers