Merge lp:~mbp/bzr/340394-encoding-option into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 5317
Proposed branch: lp:~mbp/bzr/340394-encoding-option
Merge into: lp:bzr
Diff against target: 244 lines (+153/-3)
7 files modified
NEWS (+3/-0)
bzrlib/help_topics/en/configuration.txt (+11/-0)
bzrlib/tests/__init__.py (+2/-0)
bzrlib/tests/fixtures.py (+84/-0)
bzrlib/tests/test_fixtures.py (+28/-0)
bzrlib/tests/test_ui.py (+18/-1)
bzrlib/ui/__init__.py (+7/-2)
To merge this branch: bzr merge lp:~mbp/bzr/340394-encoding-option
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Information
John A Meinel Approve
Review via email: mp+27913@code.launchpad.net

Description of the change

This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output. My next step was going to be to add a way to set this on the command line for the duration of a process.

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/340394-encoding-option into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output. My next step was going to be to add a way to set this on the command line for the duration of a process.
>

=== modified file 'bzrlib/tests/__init__.py'
- --- bzrlib/tests/__init__.py 2010-06-08 01:42:50 +0000
+++ bzrlib/tests/__init__.py 2010-06-18 09:45:47 +0000
@@ -3704,6 +3704,7 @@
         'bzrlib.tests.test_export',
         'bzrlib.tests.test_extract',
         'bzrlib.tests.test_fetch',
+ 'bzrlib.tests.test_fixtures',
         'bzrlib.tests.test_fifo_cache',
         'bzrlib.tests.test_filters',
         'bzrlib.tests.test_ftp_transport',
@@ -3839,6 +3840,7 @@
         'bzrlib.option',
         'bzrlib.symbol_versioning',
         'bzrlib.tests',
+ 'bzrlib.tests.fixtures',
         'bzrlib.timestamp',
         'bzrlib.version_info_formats.format_custom',
         ]

^- This confused me at first, but I see that it is "tests.test_fixtures"
versus "tests.fixtures". Probably fine overall, just mentioning the
confusion.

=== added file 'bzrlib/tests/test_fixtures.py'
- --- bzrlib/tests/test_fixtures.py 1970-01-01 00:00:00 +0000
+++ bzrlib/tests/test_fixtures.py 2010-06-18 09:45:47 +0000
@@ -0,0 +1,28 @@
+# Copyright (C) 2005-2010 Canonical Ltd

^- I'm pretty sure this isn't accurate :).

+class TestUIConfiguration(tests.TestCaseWithTransport):
+
+ def test_output_encoding_configuration(self):
+ enc = fixtures.generate_unicode_encodings().next()
+ config.GlobalConfig().set_user_option('output_encoding',
+ enc)
+ ui = tests.TestUIFactory(stdin=None,
+ stdout=tests.StringIOWrapper(),
+ stderr=tests.StringIOWrapper())
+ os = ui.make_output_stream()
+ self.assertEquals(os.encoding, enc)
+
+

^- this seems like it would want to be a permutation test across all the
'generate_unicode_encodings()' rather than a single test of the first one.

Conceptually I'm fine with this. I think it is a reasonable way to
handle stuff like "I want utf-8 always, even though my system doesn't
think so."

 merge: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwbgo4ACgkQJdeBCYSNAANGdQCghkA48qKjGt9reS13FkE8e2bK
ydAAoJZYizvZDCtdcKauNFwhfss15XTn
=4eZG
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Review: Needs Information
> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.

Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
detected well, neither FreeBSD, etc.)

Output to a file.

bzr log | less
  vs
bzr log > content.txt

(On Windows, the former should probably use Terminal encoding, but the
later should use locale.getpreferredencoding())

I think the big thing is having this as a step along the way to being
able to supply it on a per-command basis.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwfbKAACgkQJdeBCYSNAANDyACggTDoJffExuFR7Rkrj6X0CxZi
3B8AoI06keflVw93Ppezw6p++KQx/yrV
=DM/S
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

That seems to be coupled to the as yet untackled command-line-option-setting work; still, I can see the argument, will land.

The fixtures file is pretty rudimentary now but we can iterate.

Revision history for this message
Robert Collins (lifeless) wrote :

Oh, one more thing of interest - I'm not sure 'Unicode options' is the best name for this in the help, but I can't think of a better one right now.

I'm proposing a fixed branch with NEWS entries and correct copyright years now; would appreciate a rubber stamp on it.

Revision history for this message
Alexander Belchenko (bialix) wrote :

John A Meinel пишет:
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)
>
> Output to a file.
>
> bzr log | less
> vs
> bzr log > content.txt
>
> (On Windows, the former should probably use Terminal encoding, but the
> later should use locale.getpreferredencoding())

Exactly.

> I think the big thing is having this as a step along the way to being
> able to supply it on a per-command basis.

On UDS I've talked with Martin and proposed global command-line option
--encoding for this, e.g.:

bzr --encoding utf-8 log > file.txt

But if this patch is the first step towards this goal, it's ok.

Revision history for this message
Martin Pool (mbp) wrote :

On 21 June 2010 23:44, John Arbash Meinel <email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)

It is a good question. Beyond the cases John mentions I just might
want the output in some arbitrary encoding: I might always work in
8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
bug.

--
Martin

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>> detected well, neither FreeBSD, etc.)
>
> It is a good question.  Beyond the cases John mentions I just might
> want the output in some arbitrary encoding: I might always work in
> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
> bug.

I don't see that particular one as relevant because the OS
autodetection permits changing that via the existing standard
environment variables already - or am I missing something? [possibly
the variables are too big a hammer and the diff won't generate from
disk correctly if the variables are changed?]

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>> It is a good question. Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]

You can't set utf-8 as the codepage? (I believe you can only set 8-bit
code pages for the shell on Windows)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwhDIwACgkQJdeBCYSNAANP3gCgp95nm1ELcGjxd7XmzbN/Wr+1
0z4AoLuawZMpxliHyQ+VxWfxNGiTmdKx
=KNBS
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

Ah! anyhow, as said above, lets merge it.

Revision history for this message
Martin Pool (mbp) wrote :

On 23 June 2010 05:01, Robert Collins <email address hidden> wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>>
>> It is a good question.  Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]

It's a few things:

 * eventually I want to have a command line option that rebinds these
variables for the scope of a process
 * I'm not sure if it's easy to change this from the windows command line
 * OS autodetection is sometimes wrong
 * you might want to set things on a finer granularity, for example
the unix path encoding, and these are not well exposed

--
Martin

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-06-17 14:06:36 +0000
3+++ NEWS 2010-06-18 09:45:47 +0000
4@@ -403,6 +403,9 @@
5 Testing
6 *******
7
8+* Add ``bzrlib.tests.fixtures`` to hold code for setting up objects
9+ to test. (Martin Pool)
10+
11 * Added ``bzrlib.tests.matchers`` as a place to put matchers, along with
12 our first in-tree matcher. See the module docstring for details.
13 (Robert Collins)
14
15=== modified file 'bzrlib/help_topics/en/configuration.txt'
16--- bzrlib/help_topics/en/configuration.txt 2010-06-02 05:03:31 +0000
17+++ bzrlib/help_topics/en/configuration.txt 2010-06-18 09:45:47 +0000
18@@ -496,6 +496,17 @@
19 using deprecated formats.
20
21
22+Unicode options
23+---------------
24+
25+output_encoding
26+~~~~~~~~~~~~~~~
27+
28+A Python unicode encoding name for text output from bzr, such as log
29+information. Values include: utf8, cp850, ascii, iso-8859-1. The default
30+is the terminal encoding prefered by the operating system.
31+
32+
33 Branch type specific options
34 ----------------------------
35
36
37=== modified file 'bzrlib/tests/__init__.py'
38--- bzrlib/tests/__init__.py 2010-06-08 01:42:50 +0000
39+++ bzrlib/tests/__init__.py 2010-06-18 09:45:47 +0000
40@@ -3704,6 +3704,7 @@
41 'bzrlib.tests.test_export',
42 'bzrlib.tests.test_extract',
43 'bzrlib.tests.test_fetch',
44+ 'bzrlib.tests.test_fixtures',
45 'bzrlib.tests.test_fifo_cache',
46 'bzrlib.tests.test_filters',
47 'bzrlib.tests.test_ftp_transport',
48@@ -3839,6 +3840,7 @@
49 'bzrlib.option',
50 'bzrlib.symbol_versioning',
51 'bzrlib.tests',
52+ 'bzrlib.tests.fixtures',
53 'bzrlib.timestamp',
54 'bzrlib.version_info_formats.format_custom',
55 ]
56
57=== added file 'bzrlib/tests/fixtures.py'
58--- bzrlib/tests/fixtures.py 1970-01-01 00:00:00 +0000
59+++ bzrlib/tests/fixtures.py 2010-06-18 09:45:47 +0000
60@@ -0,0 +1,84 @@
61+# Copyright (C) 2005-2010 Canonical Ltd
62+#
63+# This program is free software; you can redistribute it and/or modify
64+# it under the terms of the GNU General Public License as published by
65+# the Free Software Foundation; either version 2 of the License, or
66+# (at your option) any later version.
67+#
68+# This program is distributed in the hope that it will be useful,
69+# but WITHOUT ANY WARRANTY; without even the implied warranty of
70+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
71+# GNU General Public License for more details.
72+#
73+# You should have received a copy of the GNU General Public License
74+# along with this program; if not, write to the Free Software
75+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
76+
77+
78+"""Fixtures that can be used within tests.
79+
80+Fixtures can be created during a test as a way to separate out creation of
81+objects to test. Fixture objects can hold some state so that different
82+objects created during a test instance can be related. Normally a fixture
83+should live only for the duration of a single test, and its tearDown method
84+should be passed to `addCleanup` on the test.
85+"""
86+
87+
88+import itertools
89+
90+
91+def generate_unicode_names():
92+ """Generate a sequence of arbitrary unique unicode names.
93+
94+ By default they are not representable in ascii.
95+
96+ >>> gen = generate_unicode_names()
97+ >>> n1 = gen.next()
98+ >>> n2 = gen.next()
99+ >>> type(n1)
100+ <type 'unicode'>
101+ >>> n1 == n2
102+ False
103+ >>> n1.encode('ascii', 'replace') == n1
104+ False
105+ """
106+ # include a mathematical symbol unlikely to be in 8-bit encodings
107+ return (u"\N{SINE WAVE}%d" % x for x in itertools.count())
108+
109+
110+interesting_encodings = [
111+ ('iso-8859-1', False),
112+ ('ascii', False),
113+ ('cp850', False),
114+ ('utf-8', True),
115+ ('ucs-2', True),
116+ ]
117+
118+
119+def generate_unicode_encodings(universal_encoding=None):
120+ """Return a generator of unicode encoding names.
121+
122+ These can be passed to Python encode/decode/etc.
123+
124+ :param universal_encoding: True/False/None tristate to say whether the
125+ generated encodings either can or cannot encode all unicode
126+ strings.
127+
128+ >>> n1 = generate_unicode_names().next()
129+ >>> enc = generate_unicode_encodings(universal_encoding=True).next()
130+ >>> enc2 = generate_unicode_encodings(universal_encoding=False).next()
131+ >>> n1.encode(enc).decode(enc) == n1
132+ True
133+ >>> try:
134+ ... n1.encode(enc2).decode(enc2)
135+ ... except UnicodeError:
136+ ... print 'fail'
137+ fail
138+ """
139+ # TODO: check they're supported on this platform?
140+ if universal_encoding is not None:
141+ e = [n for (n, u) in interesting_encodings if u == universal_encoding]
142+ else:
143+ e = [n for (n, u) in interesting_encodings]
144+ return itertools.cycle(iter(e))
145
146=== added file 'bzrlib/tests/test_fixtures.py'
147--- bzrlib/tests/test_fixtures.py 1970-01-01 00:00:00 +0000
148+++ bzrlib/tests/test_fixtures.py 2010-06-18 09:45:47 +0000
149@@ -0,0 +1,28 @@
150+# Copyright (C) 2005-2010 Canonical Ltd
151+#
152+# This program is free software; you can redistribute it and/or modify
153+# it under the terms of the GNU General Public License as published by
154+# the Free Software Foundation; either version 2 of the License, or
155+# (at your option) any later version.
156+#
157+# This program is distributed in the hope that it will be useful,
158+# but WITHOUT ANY WARRANTY; without even the implied warranty of
159+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
160+# GNU General Public License for more details.
161+#
162+# You should have received a copy of the GNU General Public License
163+# along with this program; if not, write to the Free Software
164+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
165+
166+"""Tests for test fixtures"""
167+
168+import codecs
169+
170+from bzrlib import (
171+ tests,
172+ )
173+from bzrlib.tests import (
174+ fixtures,
175+ )
176+
177+
178
179=== modified file 'bzrlib/tests/test_ui.py'
180--- bzrlib/tests/test_ui.py 2010-05-20 18:23:17 +0000
181+++ bzrlib/tests/test_ui.py 2010-06-18 09:45:47 +0000
182@@ -24,6 +24,7 @@
183 from StringIO import StringIO
184
185 from bzrlib import (
186+ config,
187 errors,
188 remote,
189 repository,
190@@ -33,10 +34,26 @@
191 from bzrlib.symbol_versioning import (
192 deprecated_in,
193 )
194-from bzrlib.tests import test_progress
195+from bzrlib.tests import (
196+ fixtures,
197+ test_progress,
198+ )
199 from bzrlib.ui import text as _mod_ui_text
200
201
202+class TestUIConfiguration(tests.TestCaseWithTransport):
203+
204+ def test_output_encoding_configuration(self):
205+ enc = fixtures.generate_unicode_encodings().next()
206+ config.GlobalConfig().set_user_option('output_encoding',
207+ enc)
208+ ui = tests.TestUIFactory(stdin=None,
209+ stdout=tests.StringIOWrapper(),
210+ stderr=tests.StringIOWrapper())
211+ os = ui.make_output_stream()
212+ self.assertEquals(os.encoding, enc)
213+
214+
215 class TestTextUIFactory(tests.TestCase):
216
217 def test_text_factory_ascii_password(self):
218
219=== modified file 'bzrlib/ui/__init__.py'
220--- bzrlib/ui/__init__.py 2010-03-25 07:34:15 +0000
221+++ bzrlib/ui/__init__.py 2010-06-18 09:45:47 +0000
222@@ -158,8 +158,9 @@
223 version of stdout, but in a GUI it might be appropriate to send it to a
224 window displaying the text.
225
226- :param encoding: Unicode encoding for output; default is the
227- terminal encoding, which may be different from the user encoding.
228+ :param encoding: Unicode encoding for output; if not specified
229+ uses the configured 'output_encoding' if any; otherwise the
230+ terminal encoding.
231 (See get_terminal_encoding.)
232
233 :param encoding_type: How to handle encoding errors:
234@@ -167,6 +168,10 @@
235 """
236 # XXX: is the caller supposed to close the resulting object?
237 if encoding is None:
238+ from bzrlib import config
239+ encoding = config.GlobalConfig().get_user_option(
240+ 'output_encoding')
241+ if encoding is None:
242 encoding = osutils.get_terminal_encoding()
243 if encoding_type is None:
244 encoding_type = 'replace'