Code review comment for lp:~songofacandy/bzr/fix-523746-dev

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/06/2011 03:29 AM, INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/fix-523746-dev into lp:bzr.
>
> Requested reviews:
> Vincent Ladeuil (vila)
> bzr-core (bzr-core)
> Related bugs:
> Bug #523746 in Bazaar: "crash with winmerge in cp932 japanese character"
> https://bugs.launchpad.net/bzr/+bug/523746
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/fix-523746-dev/+merge/20537

+ def _safe_filename(self, prefix, relpath):
+ if sys.platform == 'win32':
+ fenc = 'mbcs'
+ else:
+ fenc = sys.getfilesystemencoding() or 'ascii'
+ # encoded_str.replace('?', '_') may break multibyte char.
+ # So we should encode, decode, then replace(u'?', u'_')
+ relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc,
'replace')
+ relpath_tmp = relpath_tmp.replace(u'?', u'_')
+ return osutils.pathjoin(self._root, prefix, relpath_tmp)

Why are we using "encode('replace')" rather than "encode('ignore')". It
doesn't seem to buy us much to be putting "_" into the filename, rather
than just writing out whatever letters we can.

I also don't understand why you do the encode/decode dance and the put a
munged Unicode string back into the path. I think I sort of understand,
but a docstring would probably help a lot at understanding why we need
to use this function.

...

+ command = [AtTemplate(t).substitute(my_map) for t in
+ self.command_template]
+ if sys.platform == 'win32': # Popen doesn't accept unicode on win32
+ command_encoded = []
+ for c in command:
+ if isinstance(c, unicode):
+ command_encoded.append(c.encode('mbcs'))
+ else:
+ command_encoded.append(c)
+ return command_encoded
+ else:
+ return command

^- I'm not sure whether c.encode('mbcs') can fail, or whether it will
return '?' for unusable characters. For example, I can create Arabic or
Japanese filenames (since the filesystem is UTF-16) even though 'mbcs'
is only able to handle ~latin-1. Any thoughts on how to handle that
cleanly? We don't have to change Popen to support Unicode (though that
would be ideal), but at least something that would give users helpful
feedback, rather than just tracebacks.

+ def test_encodable_filename(self):
+ import sys
+ diffobj = diff.DiffFromTool(['python', '@old_path', '@new_path'],
+ None, None, None)

On *my* system, there is no 'python' (I never add it to sys.path, as I
happen to have 2.4/5/6/7 all installed.) There is 'sys.executable', though.

diffobj = diff.DiffFromTool([sys.executable, ...])

Also, we probably want to skip this test if "sys.frozen" is set, since I
don't think we can run it from 'bzr.exe'. I think we already had a
feature for that.

John
=:->

 review: needsfixing

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3EA3oACgkQJdeBCYSNAAMYNgCeJIIEWRF1Dg/OLWQ5kRDCI6EP
W0cAoIDOG6NTQVPXNh0PrjFUUyoTcble
=vNIg
-----END PGP SIGNATURE-----

review: Needs Fixing

« Back to merge proposal