Merge lp:~gagern/bzr/encodingSafeTests into lp:bzr

Proposed by Martin von Gagern
Status: Work in progress
Proposed branch: lp:~gagern/bzr/encodingSafeTests
Merge into: lp:bzr
Diff against target: 142 lines (+79/-11)
2 files modified
bzrlib/tests/__init__.py (+29/-11)
bzrlib/tests/test_selftest.py (+50/-0)
To merge this branch: bzr merge lp:~gagern/bzr/encodingSafeTests
Reviewer Review Type Date Requested Status
Martin Packman (community) Abstain
Andrew Bennetts Needs Fixing
Review via email: mp+23925@code.launchpad.net

Description of the change

Ensure tests won't cause the whole selftest to abort even if UnicodeErrors occur when formatting some test result.

I had proposed this in 2008: http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/43607
Back then, John Arbash Meinel and Martin Pool had approved it, but Martin Pool later voted resubmit, as the feature failed its own test cases.

I've updated it to current bzr.dev, and selftests work for me now. Do you want this patch?

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote :

+ return string_or_unicode.encode('string_escape').replace(r'\n', '\n')

This mangles the output:

>>> x = r'backslash n, not newline: \n'
>>> x.encode('string_escape').replace(r'\n', '\n')
'backslash n, not newline: \\\n'
>>> '\n' in x
False
>>> '\n' in x.encode('string_escape').replace(r'\n', '\n')
True

Otherwise this seems ok. When UnicodeErrors occur you _safe_str all variable that might be involved, which is perhaps too zealous. But it's undoubtedly better than allowing the UnicodeError to abort selftest.

review: Needs Fixing
lp:~gagern/bzr/encodingSafeTests updated
3530. By Martin von Gagern

Fix broken newline unescaping.

In order to show useful diffs, and readable output in general, newlines
should not be escaped. The unescaping done before was broken if r'\n'
appeared in the input, as Andrew Bennetts reviewed. In order to avoid using
regular expressions, we now split lines and join them after escaping. Added
a test case as well, and a new test class for toplevel functions in tests.

Revision history for this message
Martin Packman (gz) wrote :

Making the test result output robust against dodgy bytestrings is a worthy goal, but there are issues with the specifics here. Most generally, I think testtools is the place to start.

There are three things this patch is making 'safe':

* Test descriptions, which come from test ids
Always ascii, as noted by John last time around:
<https://lists.ubuntu.com/archives/bazaar/2008q3/045619.html>

* Exception instances from testtools
Already been run through an encoding and decoding process. There *are* issues like bug 501166 that need fixing upstream.

* Diffs where one text is a (non-ascii) str and the other is unicode
This is undecidable as the function does not know the encoding of the bytestring. Just escaping both inputs can result in a bogus pass, asserting the types are the same would catch badly written tests.

review: Abstain
Revision history for this message
Robert Collins (lifeless) wrote :

So, I think Martin[gz]'s analysis is good; I'm ok with us doing what
your patch does *as long as* we don't lose motivation to fix these
other things:
 - so, any comparisons doing implicit casting should error (that is,
we should say that u'foo' != 'foo'), which will need to be its own
patch [it will be huge].
 - we need to *really visibly* log the unicode exception
 - we need to not permit any unicode exceptions on PQM, because its
meant to only get clean things.

I think on balance, that I wouldn't do what you've done, myself - I'd
just find and fix the root causes anytime the test suite did get
aborted: there can't be many such root causes.

HTH,
Rob

Revision history for this message
John A Meinel (jameinel) wrote :

I'm a bit unsure of how to proceed on this one. It looks like it should be a resubmit/do it differently sort of vote. At least going by Martin and Robert's comments.

Revision history for this message
Martin von Gagern (gagern) wrote :

See https://code.edge.launchpad.net/~gagern/bzr/setlocale/+merge/391 for some background on this request here: I mostly submitted the branch because it was the only thing left over from that ancient setlocale merge request. I wanted to get things closed down properly. It was vila who suggested I submit it as well.

The issues that motivated these changes have long since been dealt with, so I don't urgently need this fix. If you don't feel like merging it, then don't, I won't object (for this merge here). Only thing I'd like to avoid is rejecting the thing now and redoing the work in a few years because nobody remembers this branch.

I won't have time to code a different approach to this. So if you want it done differently, fine by me, but at least now I won't be the one implementing it.

Revision history for this message
Martin Packman (gz) wrote :

Submitting this merge request was a good idea, there are real problems here still even after two years. What is different is the test suite now depends on testtools, and there's more of a push to get str/unicode issues sorted out. It's fine if you're not interested in working out what needs changing now as it's no longer an issue for you.

I've spoken to Robert already about making some changes to how tracebacks are stringified in testtools, and with Martin Pool about general exception printing in Bazaar. These changes are going to be a pain, but if you're okay with it I'd like to CC you on the review when I'm done.

Revision history for this message
Martin von Gagern (gagern) wrote :

> if you're okay with it I'd like to CC you on the review when I'm done.

Sure do.

Unmerged revisions

3530. By Martin von Gagern

Fix broken newline unescaping.

In order to show useful diffs, and readable output in general, newlines
should not be escaped. The unescaping done before was broken if r'\n'
appeared in the input, as Andrew Bennetts reviewed. In order to avoid using
regular expressions, we now split lines and join them after escaping. Added
a test case as well, and a new test class for toplevel functions in tests.

3529. By Martin von Gagern

Merge from trunk, with lots of adjustments.

This branch has been lying around for quite a while, so there has been a
number of changes to incorporate. Therefore this merge has a large edit
component to it.

3528. By Martin von Gagern

Added tests for tests._safe_string.

3527. By Martin von Gagern

More robust handling of encoding issues in tests.

Problems with encoding errors could lead to not only single tests
failing but the whole selftest process crashing. The introduction of
safe strings as a fallback in case of trouble avoids most of these
problems. The safe strings try to maintain as much information as
possible while using only ASCII characters. This is achieved by
formatting the strings using escape sequences.

This modification only changes how certain test results are displayed
to the user. There is no change to the semantics of these tests, so a
test that used to fail before will still continue to fail. Hopefully if
it does fail it is more likely that the test suite will continue with
the next test and not abort due to some exception displaying the error.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2010-04-22 15:44:21 +0000
+++ bzrlib/tests/__init__.py 2010-04-23 07:05:38 +0000
@@ -139,6 +139,18 @@
139SUBUNIT_SEEK_SET = 0139SUBUNIT_SEEK_SET = 0
140SUBUNIT_SEEK_CUR = 1140SUBUNIT_SEEK_CUR = 1
141141
142def _safe_string(string_or_unicode):
143 """Escape argument so it won't likely cause UnicodeErrors."""
144 if isinstance(string_or_unicode, str):
145 return '\n'.join([s.encode('string_escape')
146 for s in string_or_unicode.split('\n')])
147 if isinstance(string_or_unicode, unicode):
148 return '\n'.join([s.encode('unicode_escape')
149 for s in string_or_unicode.split(u'\n')])
150 if string_or_unicode is None:
151 return '<None>'
152 return str(string_or_unicode).encode('string_escape')
153
142154
143class ExtendedTestResult(unittest._TextTestResult):155class ExtendedTestResult(unittest._TextTestResult):
144 """Accepts, reports and accumulates the results of running tests.156 """Accepts, reports and accumulates the results of running tests.
@@ -488,17 +500,20 @@
488 def _test_description(self, test):500 def _test_description(self, test):
489 return self._shortened_test_description(test)501 return self._shortened_test_description(test)
490502
503 def _report(self, kind, test, err):
504 descr = self._test_description(test)
505 msg = err[1]
506 try:
507 self.stream.write('%s: %s\n %s\n' % (kind, descr, msg))
508 except UnicodeError:
509 self.stream.write('%s: %s\n %s\n' %
510 (kind, _safe_string(descr), _safe_string(msg)))
511
491 def report_error(self, test, err):512 def report_error(self, test, err):
492 self.stream.write('ERROR: %s\n %s\n' % (513 self._report('ERROR', test, err)
493 self._test_description(test),
494 err[1],
495 ))
496514
497 def report_failure(self, test, err):515 def report_failure(self, test, err):
498 self.stream.write('FAIL: %s\n %s\n' % (516 self._report('FAIL', test, err)
499 self._test_description(test),
500 err[1],
501 ))
502517
503 def report_known_failure(self, test, err):518 def report_known_failure(self, test, err):
504 pass519 pass
@@ -1113,9 +1128,12 @@
1113 message = 'first string is missing a final newline.\n'1128 message = 'first string is missing a final newline.\n'
1114 if a == b + '\n':1129 if a == b + '\n':
1115 message = 'second string is missing a final newline.\n'1130 message = 'second string is missing a final newline.\n'
1116 raise AssertionError(message +1131 try:
1117 self._ndiff_strings(a, b))1132 diff = self._ndiff_strings(a, b)
11181133 except UnicodeError:
1134 diff = self._ndiff_strings(_safe_string(a), _safe_string(b))
1135 raise AssertionError(message + diff)
1136
1119 def assertEqualMode(self, mode, mode_test):1137 def assertEqualMode(self, mode, mode_test):
1120 self.assertEqual(mode, mode_test,1138 self.assertEqual(mode, mode_test,
1121 'mode mismatch %o != %o' % (mode, mode_test))1139 'mode mismatch %o != %o' % (mode, mode_test))
11221140
=== modified file 'bzrlib/tests/test_selftest.py'
--- bzrlib/tests/test_selftest.py 2010-04-18 08:51:37 +0000
+++ bzrlib/tests/test_selftest.py 2010-04-23 07:05:38 +0000
@@ -510,6 +510,14 @@
510 os.lstat("foo"), os.lstat("longname"))510 os.lstat("foo"), os.lstat("longname"))
511511
512512
513class TestTestsPackage(tests.TestCase):
514 """Test parts of the bzrlib.tests package like top-level functions."""
515
516 def test_safe_string_with_newline(self):
517 self.assertEquals('foo\\\\\nbar\\\\nbaz',
518 tests._safe_string('foo\\\nbar\\nbaz'))
519
520
513class TestTestCaseWithMemoryTransport(tests.TestCaseWithMemoryTransport):521class TestTestCaseWithMemoryTransport(tests.TestCaseWithMemoryTransport):
514522
515 def test_home_is_non_existant_dir_under_root(self):523 def test_home_is_non_existant_dir_under_root(self):
@@ -996,6 +1004,32 @@
996 test.run(result)1004 test.run(result)
997 self.assertEquals(1, result.calls)1005 self.assertEquals(1, result.calls)
9981006
1007 def test_safe_string_message(self):
1008 result_stream = StringIO()
1009 result = bzrlib.tests.TextTestResult(
1010 unittest._WritelnDecorator(result_stream),
1011 descriptions=0,
1012 verbosity=1,
1013 )
1014 test = self.get_passing_test()
1015 # To be a bit like real life, the strings deal with
1016 # japanese Katakana characters. The binary string first
1017 # encodes the character U+30A4 in EUC-JP and then U+30A6
1018 # in UTF-8. Mixed encodings like this is the kind of issue
1019 # addressed by _safe_string.
1020 test.id = lambda: u'\u30a2\n\u30a8'
1021 # this seeds the state to handle reporting the test.
1022 result.startTest(test)
1023 prefix = len(result_stream.getvalue())
1024 # the err parameter has the shape:
1025 # (class, exception object, traceback)
1026 err = (AssertionError, AssertionError('\xa5\xa4\n\xe3\x82\xa6'), [])
1027 result.report_failure(test, err)
1028 output = result_stream.getvalue()[prefix:]
1029 self.assertEqual(
1030 'FAIL: \\u30a2\n\\u30a8\n \\xa5\\xa4\\n\\xe3\\x82\\xa6\n',
1031 output)
1032
9991033
1000class TestUnicodeFilenameFeature(tests.TestCase):1034class TestUnicodeFilenameFeature(tests.TestCase):
10011035
@@ -1654,6 +1688,22 @@
1654 test.run(unittest.TestResult())1688 test.run(unittest.TestResult())
1655 self.assertEqual('original', obj.test_attr)1689 self.assertEqual('original', obj.test_attr)
16561690
1691 def test_safe_string_diff(self):
1692 """Problems with encodings should fall back to _safe_string."""
1693 try:
1694 # To be a bit like real life, the strings deal with
1695 # japanese Katakana characters. The binary string first
1696 # encodes the character U+30A4 in EUC-JP and then U+30A6
1697 # in UTF-8. Mixed encodings like this is the kind of issue
1698 # addressed by _safe_string.
1699 self.assertEqualDiff(u'\u30a2\n\u30a8',
1700 '\xa5\xa4\n\xe3\x82\xa6')
1701 except AssertionError, err:
1702 self.assertEqualDiff('texts not equal:\n' +
1703 '- \\u30a2\n- \\u30a8\n' +
1704 '+ \\xa5\\xa4\n+ \\xe3\\x82\\xa6\n',
1705 err.message)
1706
16571707
1658# NB: Don't delete this; it's not actually from 0.11!1708# NB: Don't delete this; it's not actually from 0.11!
1659@deprecated_function(deprecated_in((0, 11, 0)))1709@deprecated_function(deprecated_in((0, 11, 0)))