Code review comment for lp:~songofacandy/bzr/fix-523746-dev

Revision history for this message
methane (songofacandy) wrote :

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/06/2011 03:29 AM, INADA Naoki wrote:
> > INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into
> lp:bzr.
> >
> > Requested reviews:
> > Vincent Ladeuil (vila)
> > bzr-core (bzr-core)
> > Related bugs:
> > Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> > https://bugs.launchpad.net/bzr/+bug/523746
> >
> > For more details, see:
> > https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537
>
>
> + def _safe_filename(self, prefix, relpath):
> + if sys.platform == 'win32':
> + fenc = 'mbcs'
> + else:
> + fenc = sys.getfilesystemencoding() or 'ascii'
> + # encoded_str.replace('?', '_') may break multibyte char.
> + # So we should encode, decode, then replace(u'?', u'_')
> + relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
> 'replace')
> + relpath_tmp = relpath_tmp.replace(u'?', u'_')
> + return osutils.pathjoin(self._root, prefix, relpath_tmp)
>
> Why are we using "encode('replace')" rather than "encode('ignore')". It
> doesn't seem to buy us much to be putting "_" into the filename, rather
> than just writing out whatever letters we can.

Because "encode('ignore')" may makes null string.
Fallback to other string is also OK, but I don't have nice name for fallback.

> I also don't understand why you do the encode/decode dance and the put a
> munged Unicode string back into the path. I think I sort of understand,
> but a docstring would probably help a lot at understanding why we need
> to use this function.

OK, I'll add docstrings.

>
> ...
>
> + command = [AtTemplate(t).substitute(my_map) for t in
> + self.command_template]
> + if sys.platform == 'win32': # Popen doesn't accept unicode on win32
> + command_encoded = []
> + for c in command:
> + if isinstance(c, unicode):
> + command_encoded.append(c.encode('mbcs'))
> + else:
> + command_encoded.append(c)
> + return command_encoded
> + else:
> + return command
>
> ^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
> return '?' for unusable characters. For example, I can create Arabic or
> Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
> is only able to handle ~latin-1. Any thoughts on how to handle that
> cleanly? We don't have to change Popen to support Unicode (though that
> would be ideal), but at least something that would give users helpful
> feedback, rather than just tracebacks.

c.encode('mbcs') may fail on very rare circumstance. For example, path of
diff tool or temporary directory is not be able to encode with 'mbcs'.
I don't have any idea what should we do on that circumstance.

But filepath on commandline should not be fail decoding unlsess
decode('mbcs').encode('mbcs') can be fail.

> + def test_encodable_filename(self):
> + import sys
> + diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
> + None, None, None)
>
>
> On *my* system, there is no 'python' (I never add it to sys.path, as I
> happen to have 2.4/5/6/7 all installed.) There is 'sys.executable', though.
>
> diffobj = diff.DiffFromTool([sys.executable, ...])
>
> Also, we probably want to skip this test if "sys.frozen" is set, since I
> don't think we can run it from 'bzr.exe'. I think we already had a
> feature for that.

It doesn't execute any commands. It just builds file names.
So I'll change 'python' to 'dummy'.

« Back to merge proposal