Merge lp:~mbp/bzr/340394-encoding-option into lp:bzr
- 340394-encoding-option
- Merge into bzr.dev
Status: | Merged |
---|---|
Approved by: | John A Meinel |
Approved revision: | no longer in the source branch. |
Merged at revision: | 5317 |
Proposed branch: | lp:~mbp/bzr/340394-encoding-option |
Merge into: | lp:bzr |
Diff against target: |
244 lines (+153/-3) 7 files modified
NEWS (+3/-0) bzrlib/help_topics/en/configuration.txt (+11/-0) bzrlib/tests/__init__.py (+2/-0) bzrlib/tests/fixtures.py (+84/-0) bzrlib/tests/test_fixtures.py (+28/-0) bzrlib/tests/test_ui.py (+18/-1) bzrlib/ui/__init__.py (+7/-2) |
To merge this branch: | bzr merge lp:~mbp/bzr/340394-encoding-option |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | Needs Information | ||
John A Meinel | Approve | ||
Review via email: mp+27913@code.launchpad.net |
Commit message
Description of the change
This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output. My next step was going to be to add a way to set this on the command line for the duration of a process.
Robert Collins (lifeless) wrote : | # |
This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> Review: Needs Information
> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
detected well, neither FreeBSD, etc.)
Output to a file.
bzr log | less
vs
bzr log > content.txt
(On Windows, the former should probably use Terminal encoding, but the
later should use locale.
I think the big thing is having this as a step along the way to being
able to supply it on a per-command basis.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkw
3B8AoI06keflVw9
=DM/S
-----END PGP SIGNATURE-----
Robert Collins (lifeless) wrote : | # |
That seems to be coupled to the as yet untackled command-
The fixtures file is pretty rudimentary now but we can iterate.
Robert Collins (lifeless) wrote : | # |
Oh, one more thing of interest - I'm not sure 'Unicode options' is the best name for this in the help, but I can't think of a better one right now.
I'm proposing a fixed branch with NEWS entries and correct copyright years now; would appreciate a rubber stamp on it.
Alexander Belchenko (bialix) wrote : | # |
John A Meinel пишет:
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)
>
> Output to a file.
>
> bzr log | less
> vs
> bzr log > content.txt
>
> (On Windows, the former should probably use Terminal encoding, but the
> later should use locale.
Exactly.
> I think the big thing is having this as a step along the way to being
> able to supply it on a per-command basis.
On UDS I've talked with Martin and proposed global command-line option
--encoding for this, e.g.:
bzr --encoding utf-8 log > file.txt
But if this patch is the first step towards this goal, it's ok.
Martin Pool (mbp) wrote : | # |
On 21 June 2010 23:44, John Arbash Meinel <email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating systems encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think its a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)
It is a good question. Beyond the cases John mentions I just might
want the output in some arbitrary encoding: I might always work in
8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
bug.
--
Martin
Robert Collins (lifeless) wrote : | # |
On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>> detected well, neither FreeBSD, etc.)
>
> It is a good question. Beyond the cases John mentions I just might
> want the output in some arbitrary encoding: I might always work in
> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
> bug.
I don't see that particular one as relevant because the OS
autodetection permits changing that via the existing standard
environment variables already - or am I missing something? [possibly
the variables are too big a hammer and the diff won't generate from
disk correctly if the variables are changed?]
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>> It is a good question. Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]
You can't set utf-8 as the codepage? (I believe you can only set 8-bit
code pages for the shell on Windows)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkw
0z4AoLuawZMpxli
=KNBS
-----END PGP SIGNATURE-----
Robert Collins (lifeless) wrote : | # |
Ah! anyhow, as said above, lets merge it.
Martin Pool (mbp) wrote : | # |
On 23 June 2010 05:01, Robert Collins <email address hidden> wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>>
>> It is a good question. Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]
It's a few things:
* eventually I want to have a command line option that rebinds these
variables for the scope of a process
* I'm not sure if it's easy to change this from the windows command line
* OS autodetection is sometimes wrong
* you might want to set things on a finer granularity, for example
the unix path encoding, and these are not well exposed
--
Martin
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2010-06-17 14:06:36 +0000 |
3 | +++ NEWS 2010-06-18 09:45:47 +0000 |
4 | @@ -403,6 +403,9 @@ |
5 | Testing |
6 | ******* |
7 | |
8 | +* Add ``bzrlib.tests.fixtures`` to hold code for setting up objects |
9 | + to test. (Martin Pool) |
10 | + |
11 | * Added ``bzrlib.tests.matchers`` as a place to put matchers, along with |
12 | our first in-tree matcher. See the module docstring for details. |
13 | (Robert Collins) |
14 | |
15 | === modified file 'bzrlib/help_topics/en/configuration.txt' |
16 | --- bzrlib/help_topics/en/configuration.txt 2010-06-02 05:03:31 +0000 |
17 | +++ bzrlib/help_topics/en/configuration.txt 2010-06-18 09:45:47 +0000 |
18 | @@ -496,6 +496,17 @@ |
19 | using deprecated formats. |
20 | |
21 | |
22 | +Unicode options |
23 | +--------------- |
24 | + |
25 | +output_encoding |
26 | +~~~~~~~~~~~~~~~ |
27 | + |
28 | +A Python unicode encoding name for text output from bzr, such as log |
29 | +information. Values include: utf8, cp850, ascii, iso-8859-1. The default |
30 | +is the terminal encoding prefered by the operating system. |
31 | + |
32 | + |
33 | Branch type specific options |
34 | ---------------------------- |
35 | |
36 | |
37 | === modified file 'bzrlib/tests/__init__.py' |
38 | --- bzrlib/tests/__init__.py 2010-06-08 01:42:50 +0000 |
39 | +++ bzrlib/tests/__init__.py 2010-06-18 09:45:47 +0000 |
40 | @@ -3704,6 +3704,7 @@ |
41 | 'bzrlib.tests.test_export', |
42 | 'bzrlib.tests.test_extract', |
43 | 'bzrlib.tests.test_fetch', |
44 | + 'bzrlib.tests.test_fixtures', |
45 | 'bzrlib.tests.test_fifo_cache', |
46 | 'bzrlib.tests.test_filters', |
47 | 'bzrlib.tests.test_ftp_transport', |
48 | @@ -3839,6 +3840,7 @@ |
49 | 'bzrlib.option', |
50 | 'bzrlib.symbol_versioning', |
51 | 'bzrlib.tests', |
52 | + 'bzrlib.tests.fixtures', |
53 | 'bzrlib.timestamp', |
54 | 'bzrlib.version_info_formats.format_custom', |
55 | ] |
56 | |
57 | === added file 'bzrlib/tests/fixtures.py' |
58 | --- bzrlib/tests/fixtures.py 1970-01-01 00:00:00 +0000 |
59 | +++ bzrlib/tests/fixtures.py 2010-06-18 09:45:47 +0000 |
60 | @@ -0,0 +1,84 @@ |
61 | +# Copyright (C) 2005-2010 Canonical Ltd |
62 | +# |
63 | +# This program is free software; you can redistribute it and/or modify |
64 | +# it under the terms of the GNU General Public License as published by |
65 | +# the Free Software Foundation; either version 2 of the License, or |
66 | +# (at your option) any later version. |
67 | +# |
68 | +# This program is distributed in the hope that it will be useful, |
69 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
70 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
71 | +# GNU General Public License for more details. |
72 | +# |
73 | +# You should have received a copy of the GNU General Public License |
74 | +# along with this program; if not, write to the Free Software |
75 | +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
76 | + |
77 | + |
78 | +"""Fixtures that can be used within tests. |
79 | + |
80 | +Fixtures can be created during a test as a way to separate out creation of |
81 | +objects to test. Fixture objects can hold some state so that different |
82 | +objects created during a test instance can be related. Normally a fixture |
83 | +should live only for the duration of a single test, and its tearDown method |
84 | +should be passed to `addCleanup` on the test. |
85 | +""" |
86 | + |
87 | + |
88 | +import itertools |
89 | + |
90 | + |
91 | +def generate_unicode_names(): |
92 | + """Generate a sequence of arbitrary unique unicode names. |
93 | + |
94 | + By default they are not representable in ascii. |
95 | + |
96 | + >>> gen = generate_unicode_names() |
97 | + >>> n1 = gen.next() |
98 | + >>> n2 = gen.next() |
99 | + >>> type(n1) |
100 | + <type 'unicode'> |
101 | + >>> n1 == n2 |
102 | + False |
103 | + >>> n1.encode('ascii', 'replace') == n1 |
104 | + False |
105 | + """ |
106 | + # include a mathematical symbol unlikely to be in 8-bit encodings |
107 | + return (u"\N{SINE WAVE}%d" % x for x in itertools.count()) |
108 | + |
109 | + |
110 | +interesting_encodings = [ |
111 | + ('iso-8859-1', False), |
112 | + ('ascii', False), |
113 | + ('cp850', False), |
114 | + ('utf-8', True), |
115 | + ('ucs-2', True), |
116 | + ] |
117 | + |
118 | + |
119 | +def generate_unicode_encodings(universal_encoding=None): |
120 | + """Return a generator of unicode encoding names. |
121 | + |
122 | + These can be passed to Python encode/decode/etc. |
123 | + |
124 | + :param universal_encoding: True/False/None tristate to say whether the |
125 | + generated encodings either can or cannot encode all unicode |
126 | + strings. |
127 | + |
128 | + >>> n1 = generate_unicode_names().next() |
129 | + >>> enc = generate_unicode_encodings(universal_encoding=True).next() |
130 | + >>> enc2 = generate_unicode_encodings(universal_encoding=False).next() |
131 | + >>> n1.encode(enc).decode(enc) == n1 |
132 | + True |
133 | + >>> try: |
134 | + ... n1.encode(enc2).decode(enc2) |
135 | + ... except UnicodeError: |
136 | + ... print 'fail' |
137 | + fail |
138 | + """ |
139 | + # TODO: check they're supported on this platform? |
140 | + if universal_encoding is not None: |
141 | + e = [n for (n, u) in interesting_encodings if u == universal_encoding] |
142 | + else: |
143 | + e = [n for (n, u) in interesting_encodings] |
144 | + return itertools.cycle(iter(e)) |
145 | |
146 | === added file 'bzrlib/tests/test_fixtures.py' |
147 | --- bzrlib/tests/test_fixtures.py 1970-01-01 00:00:00 +0000 |
148 | +++ bzrlib/tests/test_fixtures.py 2010-06-18 09:45:47 +0000 |
149 | @@ -0,0 +1,28 @@ |
150 | +# Copyright (C) 2005-2010 Canonical Ltd |
151 | +# |
152 | +# This program is free software; you can redistribute it and/or modify |
153 | +# it under the terms of the GNU General Public License as published by |
154 | +# the Free Software Foundation; either version 2 of the License, or |
155 | +# (at your option) any later version. |
156 | +# |
157 | +# This program is distributed in the hope that it will be useful, |
158 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
159 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
160 | +# GNU General Public License for more details. |
161 | +# |
162 | +# You should have received a copy of the GNU General Public License |
163 | +# along with this program; if not, write to the Free Software |
164 | +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
165 | + |
166 | +"""Tests for test fixtures""" |
167 | + |
168 | +import codecs |
169 | + |
170 | +from bzrlib import ( |
171 | + tests, |
172 | + ) |
173 | +from bzrlib.tests import ( |
174 | + fixtures, |
175 | + ) |
176 | + |
177 | + |
178 | |
179 | === modified file 'bzrlib/tests/test_ui.py' |
180 | --- bzrlib/tests/test_ui.py 2010-05-20 18:23:17 +0000 |
181 | +++ bzrlib/tests/test_ui.py 2010-06-18 09:45:47 +0000 |
182 | @@ -24,6 +24,7 @@ |
183 | from StringIO import StringIO |
184 | |
185 | from bzrlib import ( |
186 | + config, |
187 | errors, |
188 | remote, |
189 | repository, |
190 | @@ -33,10 +34,26 @@ |
191 | from bzrlib.symbol_versioning import ( |
192 | deprecated_in, |
193 | ) |
194 | -from bzrlib.tests import test_progress |
195 | +from bzrlib.tests import ( |
196 | + fixtures, |
197 | + test_progress, |
198 | + ) |
199 | from bzrlib.ui import text as _mod_ui_text |
200 | |
201 | |
202 | +class TestUIConfiguration(tests.TestCaseWithTransport): |
203 | + |
204 | + def test_output_encoding_configuration(self): |
205 | + enc = fixtures.generate_unicode_encodings().next() |
206 | + config.GlobalConfig().set_user_option('output_encoding', |
207 | + enc) |
208 | + ui = tests.TestUIFactory(stdin=None, |
209 | + stdout=tests.StringIOWrapper(), |
210 | + stderr=tests.StringIOWrapper()) |
211 | + os = ui.make_output_stream() |
212 | + self.assertEquals(os.encoding, enc) |
213 | + |
214 | + |
215 | class TestTextUIFactory(tests.TestCase): |
216 | |
217 | def test_text_factory_ascii_password(self): |
218 | |
219 | === modified file 'bzrlib/ui/__init__.py' |
220 | --- bzrlib/ui/__init__.py 2010-03-25 07:34:15 +0000 |
221 | +++ bzrlib/ui/__init__.py 2010-06-18 09:45:47 +0000 |
222 | @@ -158,8 +158,9 @@ |
223 | version of stdout, but in a GUI it might be appropriate to send it to a |
224 | window displaying the text. |
225 | |
226 | - :param encoding: Unicode encoding for output; default is the |
227 | - terminal encoding, which may be different from the user encoding. |
228 | + :param encoding: Unicode encoding for output; if not specified |
229 | + uses the configured 'output_encoding' if any; otherwise the |
230 | + terminal encoding. |
231 | (See get_terminal_encoding.) |
232 | |
233 | :param encoding_type: How to handle encoding errors: |
234 | @@ -167,6 +168,10 @@ |
235 | """ |
236 | # XXX: is the caller supposed to close the resulting object? |
237 | if encoding is None: |
238 | + from bzrlib import config |
239 | + encoding = config.GlobalConfig().get_user_option( |
240 | + 'output_encoding') |
241 | + if encoding is None: |
242 | encoding = osutils.get_terminal_encoding() |
243 | if encoding_type is None: |
244 | encoding_type = 'replace' |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/340394-encoding-option into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output. My next step was going to be to add a way to set this on the command line for the duration of a process.
>
=== modified file 'bzrlib/ tests/_ _init__ .py' tests/_ _init__ .py 2010-06-08 01:42:50 +0000 tests/_ _init__ .py 2010-06-18 09:45:47 +0000
'bzrlib. tests.test_ export' ,
'bzrlib. tests.test_ extract' ,
'bzrlib. tests.test_ fetch', tests.test_ fixtures' ,
'bzrlib. tests.test_ fifo_cache' ,
'bzrlib. tests.test_ filters' ,
'bzrlib. tests.test_ ftp_transport' ,
'bzrlib. option' ,
'bzrlib. symbol_ versioning' ,
'bzrlib. tests', tests.fixtures' ,
'bzrlib. timestamp' ,
'bzrlib. version_ info_formats. format_ custom' ,
- --- bzrlib/
+++ bzrlib/
@@ -3704,6 +3704,7 @@
+ 'bzrlib.
@@ -3839,6 +3840,7 @@
+ 'bzrlib.
]
^- This confused me at first, but I see that it is "tests. test_fixtures"
versus "tests.fixtures". Probably fine overall, just mentioning the
confusion.
=== added file 'bzrlib/ tests/test_ fixtures. py' tests/test_ fixtures. py 1970-01-01 00:00:00 +0000 tests/test_ fixtures. py 2010-06-18 09:45:47 +0000
- --- bzrlib/
+++ bzrlib/
@@ -0,0 +1,28 @@
+# Copyright (C) 2005-2010 Canonical Ltd
^- I'm pretty sure this isn't accurate :).
+class TestUIConfigura tion(tests. TestCaseWithTra nsport) : encoding_ configuration( self): generate_ unicode_ encodings( ).next( ) GlobalConfig( ).set_user_ option( 'output_ encoding' , tory(stdin= None, tests.StringIOW rapper( ), tests.StringIOW rapper( )) output_ stream( ) ls(os.encoding, enc)
+
+ def test_output_
+ enc = fixtures.
+ config.
+ enc)
+ ui = tests.TestUIFac
+ stdout=
+ stderr=
+ os = ui.make_
+ self.assertEqua
+
+
^- this seems like it would want to be a permutation test across all the unicode_ encodings( )' rather than a single test of the first one.
'generate_
Conceptually I'm fine with this. I think it is a reasonable way to
handle stuff like "I want utf-8 always, even though my system doesn't
think so."
merge: approve
John
=:->
-----BEGIN PGP SIGNATURE----- enigmail. mozdev. org/
bgo4ACgkQJdeBCY SNAANGdQCghkA48 qKjGt9reS13FkE8 e2bK dcKauNFwhfss15X Tn
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkw
ydAAoJZYizvZDCt
=4eZG
-----END PGP SIGNATURE-----