Merge lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug into lp:bzr

Proposed by Eric Moritz
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 5302
Proposed branch: lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug
Merge into: lp:bzr
Diff against target: 194 lines (+74/-13)
4 files modified
bzrlib/errors.py (+3/-3)
bzrlib/osutils.py (+25/-9)
bzrlib/tests/__init__.py (+11/-0)
bzrlib/tests/test_osutils.py (+35/-1)
To merge this branch: bzr merge lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug
Reviewer Review Type Date Requested Status
Martin Packman (community) Approve
John A Meinel Approve
Martin Pool Pending
Eric Moritz Pending
Review via email: mp+27006@code.launchpad.net

This proposal supersedes a proposal from 2010-06-08.

Description of the change

A fix for bug #488519

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> You have been requested to review the proposed merge of lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug into lp:bzr.
>
> A fix for bug #488519
>
>

This is the 'walkdirs' code and not the '_walkdirs_utf8' code, the
latter being the more performance critical code. And he is correct that
we are already casting everything to Unicode, except for what Python
cannot cast based on fs_encoding. And the only way we can determine that
is by checking the type of the string. As such, I think his change is
reasonable.

I will note that:

a) I prefer "if type(filename) is unicode" versus the isinstance check,
but it shouldn't be a huge difference. (isinstance has to check the
inheritance heirarchy, but I would assume that being exact 99.9% of the
time means it would still be fast.)

b) I expect that if we get a string, then the decode will fail. That is
sort of the crux of the problem (python listdir() failed to decode it,
so it just returns a string, I don't know why we would expect to find
differently.)

c) It would be better to put ByteStreamedNamedFilesystem into
bzrlib/tests/features.py and call it "byte_stream_named_filesystem". But
maybe not since the other ones like UTF8Filesystem are still in this
location.

The rest seems ok as long as the other reviewers are happy.

 review: approve
John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwOn90ACgkQJdeBCYSNAAObeACgqq60DJ3Jy3T3MnZ4ZSERL4gK
X4YAoMH9WydE+dCoYL1c4MEmEQzie42y
=Ctuq
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

There's still bzrlib.errors spurious trailing whitespace change, and now there are more in bzrlib.osutils as well...

Looks to me like all the substantive issues have been addressed though, let's try this out.

review: Approve
Revision history for this message
Eric Moritz (ericmoritz) wrote :

Ok I'll delete the trailing whitespace and resubmit

Revision history for this message
Martin Packman (gz) wrote :

> Ok I'll delete the trailing whitespace and resubmit

You don't need to resubmit for every minor change, just push the new revision up to you branch and launchpad will update this merge proposal and the diff.

Revision history for this message
Eric Moritz (ericmoritz) wrote :

Ok, I've applied the new suggestion

Revision history for this message
Vincent Ladeuil (vila) wrote :

The NEWS entry was missing, I added it and submitted to pqm.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/errors.py'
--- bzrlib/errors.py 2010-05-28 06:04:57 +0000
+++ bzrlib/errors.py 2010-06-11 01:51:29 +0000
@@ -1923,12 +1923,12 @@
1923class CantMoveRoot(BzrError):1923class CantMoveRoot(BzrError):
19241924
1925 _fmt = "Moving the root directory is not supported at this time"1925 _fmt = "Moving the root directory is not supported at this time"
1926 1926
1927 1927
1928class TransformRenameFailed(BzrError):1928class TransformRenameFailed(BzrError):
19291929
1930 _fmt = "Failed to rename %(from_path)s to %(to_path)s: %(why)s"1930 _fmt = "Failed to rename %(from_path)s to %(to_path)s: %(why)s"
1931 1931
1932 def __init__(self, from_path, to_path, why, errno):1932 def __init__(self, from_path, to_path, why, errno):
1933 self.from_path = from_path1933 self.from_path = from_path
1934 self.to_path = to_path1934 self.to_path = to_path
19351935
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2010-05-27 17:59:18 +0000
+++ bzrlib/osutils.py 2010-06-11 01:51:29 +0000
@@ -931,7 +931,7 @@
931931
932def parent_directories(filename):932def parent_directories(filename):
933 """Return the list of parent directories, deepest first.933 """Return the list of parent directories, deepest first.
934 934
935 For example, parent_directories("a/b/c") -> ["a/b", "a"].935 For example, parent_directories("a/b/c") -> ["a/b", "a"].
936 """936 """
937 parents = []937 parents = []
@@ -961,7 +961,7 @@
961 # NB: This docstring is just an example, not a doctest, because doctest961 # NB: This docstring is just an example, not a doctest, because doctest
962 # currently can't cope with the use of lazy imports in this namespace --962 # currently can't cope with the use of lazy imports in this namespace --
963 # mbp 20090729963 # mbp 20090729
964 964
965 # This currently doesn't report the failure at the time it occurs, because965 # This currently doesn't report the failure at the time it occurs, because
966 # they tend to happen very early in startup when we can't check config966 # they tend to happen very early in startup when we can't check config
967 # files etc, and also we want to report all failures but not spam the user967 # files etc, and also we want to report all failures but not spam the user
@@ -1037,8 +1037,8 @@
10371037
10381038
1039def delete_any(path):1039def delete_any(path):
1040 """Delete a file, symlink or directory. 1040 """Delete a file, symlink or directory.
1041 1041
1042 Will delete even if readonly.1042 Will delete even if readonly.
1043 """1043 """
1044 try:1044 try:
@@ -1233,6 +1233,22 @@
1233 # but for now, we haven't optimized...1233 # but for now, we haven't optimized...
1234 return [canonical_relpath(base, p) for p in paths]1234 return [canonical_relpath(base, p) for p in paths]
12351235
1236
1237def decode_filename(filename):
1238 """Decode the filename using the filesystem encoding
1239
1240 If it is unicode, it is returned.
1241 Otherwise it is decoded from the the filesystem's encoding. If decoding
1242 fails, a errors.BadFilenameEncoding exception is raised.
1243 """
1244 if type(filename) is unicode:
1245 return filename
1246 try:
1247 return filename.decode(_fs_enc)
1248 except UnicodeDecodeError:
1249 raise errors.BadFilenameEncoding(filename, _fs_enc)
1250
1251
1236def safe_unicode(unicode_or_utf8_string):1252def safe_unicode(unicode_or_utf8_string):
1237 """Coerce unicode_or_utf8_string into unicode.1253 """Coerce unicode_or_utf8_string into unicode.
12381254
@@ -1644,7 +1660,7 @@
1644 dirblock = []1660 dirblock = []
1645 append = dirblock.append1661 append = dirblock.append
1646 try:1662 try:
1647 names = sorted(_listdir(top))1663 names = sorted(map(decode_filename, _listdir(top)))
1648 except OSError, e:1664 except OSError, e:
1649 if not _is_error_enotdir(e):1665 if not _is_error_enotdir(e):
1650 raise1666 raise
@@ -2020,14 +2036,14 @@
20202036
2021def send_all(sock, bytes, report_activity=None):2037def send_all(sock, bytes, report_activity=None):
2022 """Send all bytes on a socket.2038 """Send all bytes on a socket.
2023 2039
2024 Breaks large blocks in smaller chunks to avoid buffering limitations on2040 Breaks large blocks in smaller chunks to avoid buffering limitations on
2025 some platforms, and catches EINTR which may be thrown if the send is2041 some platforms, and catches EINTR which may be thrown if the send is
2026 interrupted by a signal.2042 interrupted by a signal.
20272043
2028 This is preferred to socket.sendall(), because it avoids portability bugs2044 This is preferred to socket.sendall(), because it avoids portability bugs
2029 and provides activity reporting.2045 and provides activity reporting.
2030 2046
2031 :param report_activity: Call this as bytes are read, see2047 :param report_activity: Call this as bytes are read, see
2032 Transport._report_activity2048 Transport._report_activity
2033 """2049 """
@@ -2121,7 +2137,7 @@
21212137
2122def until_no_eintr(f, *a, **kw):2138def until_no_eintr(f, *a, **kw):
2123 """Run f(*a, **kw), retrying if an EINTR error occurs.2139 """Run f(*a, **kw), retrying if an EINTR error occurs.
2124 2140
2125 WARNING: you must be certain that it is safe to retry the call repeatedly2141 WARNING: you must be certain that it is safe to retry the call repeatedly
2126 if EINTR does occur. This is typically only true for low-level operations2142 if EINTR does occur. This is typically only true for low-level operations
2127 like os.read. If in any doubt, don't use this.2143 like os.read. If in any doubt, don't use this.
@@ -2258,7 +2274,7 @@
2258if sys.platform == 'win32':2274if sys.platform == 'win32':
2259 def open_file(filename, mode='r', bufsize=-1):2275 def open_file(filename, mode='r', bufsize=-1):
2260 """This function is used to override the ``open`` builtin.2276 """This function is used to override the ``open`` builtin.
2261 2277
2262 But it uses O_NOINHERIT flag so the file handle is not inherited by2278 But it uses O_NOINHERIT flag so the file handle is not inherited by
2263 child processes. Deleting or renaming a closed file opened with this2279 child processes. Deleting or renaming a closed file opened with this
2264 function is not blocking child processes.2280 function is not blocking child processes.
22652281
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2010-05-25 17:27:52 +0000
+++ bzrlib/tests/__init__.py 2010-06-11 01:51:29 +0000
@@ -4345,6 +4345,17 @@
4345UnicodeFilename = _UnicodeFilename()4345UnicodeFilename = _UnicodeFilename()
43464346
43474347
4348class _ByteStringNamedFilesystem(Feature):
4349 """Is the filesystem based on bytes?"""
4350
4351 def _probe(self):
4352 if os.name == "posix":
4353 return True
4354 return False
4355
4356ByteStringNamedFilesystem = _ByteStringNamedFilesystem()
4357
4358
4348class _UTF8Filesystem(Feature):4359class _UTF8Filesystem(Feature):
4349 """Is the filesystem UTF-8?"""4360 """Is the filesystem UTF-8?"""
43504361
43514362
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py 2010-05-27 17:59:18 +0000
+++ bzrlib/tests/test_osutils.py 2010-06-11 01:51:29 +0000
@@ -1083,6 +1083,40 @@
1083 # Ensure the message contains the file name1083 # Ensure the message contains the file name
1084 self.assertContainsRe(str(e), "\./test-unreadable")1084 self.assertContainsRe(str(e), "\./test-unreadable")
10851085
1086
1087 def test_walkdirs_encoding_error(self):
1088 # <https://bugs.launchpad.net/bzr/+bug/488519>
1089 # walkdirs didn't raise a useful message when the filenames
1090 # are not using the filesystem's encoding
1091
1092 # require a bytestring based filesystem
1093 self.requireFeature(tests.ByteStringNamedFilesystem)
1094
1095 tree = [
1096 '.bzr',
1097 '0file',
1098 '1dir/',
1099 '1dir/0file',
1100 '1dir/1dir/',
1101 '1file'
1102 ]
1103
1104 self.build_tree(tree)
1105
1106 # rename the 1file to a latin-1 filename
1107 os.rename("./1file", "\xe8file")
1108
1109 self._save_platform_info()
1110 win32utils.winver = None # Avoid the win32 detection code
1111 osutils._fs_enc = 'UTF-8'
1112
1113 # this should raise on error
1114 def attempt():
1115 for dirdetail, dirblock in osutils.walkdirs('.'):
1116 pass
1117
1118 self.assertRaises(errors.BadFilenameEncoding, attempt)
1119
1086 def test__walkdirs_utf8(self):1120 def test__walkdirs_utf8(self):
1087 tree = [1121 tree = [
1088 '.bzr',1122 '.bzr',
@@ -1921,7 +1955,7 @@
1921 def restore_osutils_globals(self):1955 def restore_osutils_globals(self):
1922 osutils._terminal_size_state = self._orig_terminal_size_state1956 osutils._terminal_size_state = self._orig_terminal_size_state
1923 osutils._first_terminal_size = self._orig_first_terminal_size1957 osutils._first_terminal_size = self._orig_first_terminal_size
1924 1958
1925 def replace_stdout(self, new):1959 def replace_stdout(self, new):
1926 self.overrideAttr(sys, 'stdout', new)1960 self.overrideAttr(sys, 'stdout', new)
19271961