Bazaar

Merge lp:~mbp/bzr/340394-encoding-option into lp:bzr

340394-encoding-option
Merge into bzr.dev

Proposed by Martin Pool on 2010-06-18

Status:	Merged
Approved by:	John A Meinel on 2010-06-18
Approved revision:	no longer in the source branch.
Merged at revision:	5317
Proposed branch:	lp:~mbp/bzr/340394-encoding-option
Merge into:	lp:bzr
Diff against target:	244 lines (+153/-3), 7 files modified
	NEWS (+3/-0)
	bzrlib/help_topics/en/configuration.txt (+11/-0)
	bzrlib/tests/__init__.py (+2/-0)
	bzrlib/tests/fixtures.py (+84/-0)
	bzrlib/tests/test_fixtures.py (+28/-0)
	bzrlib/tests/test_ui.py (+18/-1)
	bzrlib/ui/__init__.py (+7/-2)
To merge this branch:	bzr merge lp:~mbp/bzr/340394-encoding-option

Reviewer	Date Requested	Status
Robert Collins (community)		Needs Information on 2010-06-20
John A Meinel	2010-06-18	Approve on 2010-06-18
Review via email: mp+27913@code.launchpad.net

Description of the change

This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output. My next step was going to be to add a way to set this on the command line for the duration of a process.
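
The lookup order this branch introduces (an explicit argument first, then the configured `output_encoding`, then the terminal encoding) can be sketched in isolation. This is a minimal standalone illustration, not the bzrlib code: `resolve_output_encoding` and its parameters are hypothetical names, and a plain dict stands in for the user's global configuration.

```python
import codecs


def resolve_output_encoding(explicit=None, config=None, terminal_encoding="ascii"):
    """Pick an output encoding: explicit argument, then config, then terminal.

    Hypothetical helper: `config` is a plain dict standing in for the
    global configuration, and `terminal_encoding` for the detected value.
    """
    encoding = explicit
    if encoding is None and config is not None:
        encoding = config.get("output_encoding")
    if encoding is None:
        encoding = terminal_encoding
    # Fail early (LookupError) on names Python's codec registry rejects.
    codecs.lookup(encoding)
    return encoding


# The configured value is used when nothing explicit is passed.
print(resolve_output_encoding(config={"output_encoding": "utf-8"}))
# An explicit argument wins over the configured value.
print(resolve_output_encoding(explicit="cp850", config={"output_encoding": "utf-8"}))
# With neither, the terminal encoding is the fallback.
print(resolve_output_encoding(config={}))
```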

John A Meinel (jameinel) wrote on 2010-06-18:

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/340394-encoding-option into lp:bzr.
> 
> Requested reviews:
>   bzr-core (bzr-core)
> 
> 
> This is probably a bit out of date with other test changes, but I'll put it up so bialix can see it: it adds an output_encoding config option that controls how we encode non-file-content output.  My next step was going to be to add a way to set this on the command line for the duration of a process.
>

=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py	2010-06-08 01:42:50 +0000
+++ bzrlib/tests/__init__.py	2010-06-18 09:45:47 +0000
@@ -3704,6 +3704,7 @@
         'bzrlib.tests.test_export',
         'bzrlib.tests.test_extract',
         'bzrlib.tests.test_fetch',
+        'bzrlib.tests.test_fixtures',
         'bzrlib.tests.test_fifo_cache',
         'bzrlib.tests.test_filters',
         'bzrlib.tests.test_ftp_transport',
@@ -3839,6 +3840,7 @@
         'bzrlib.option',
         'bzrlib.symbol_versioning',
         'bzrlib.tests',
+        'bzrlib.tests.fixtures',
         'bzrlib.timestamp',
         'bzrlib.version_info_formats.format_custom',
         ]

^- This confused me at first, but I see that it is "tests.test_fixtures"
versus "tests.fixtures". Probably fine overall, just mentioning the
confusion.

=== added file 'bzrlib/tests/test_fixtures.py'
--- bzrlib/tests/test_fixtures.py	1970-01-01 00:00:00 +0000
+++ bzrlib/tests/test_fixtures.py	2010-06-18 09:45:47 +0000
@@ -0,0 +1,28 @@
+# Copyright (C) 2005-2010 Canonical Ltd

^- I'm pretty sure this isn't accurate :).

+class TestUIConfiguration(tests.TestCaseWithTransport):
+
+    def test_output_encoding_configuration(self):
+        enc = fixtures.generate_unicode_encodings().next()
+        config.GlobalConfig().set_user_option('output_encoding',
+            enc)
+        ui = tests.TestUIFactory(stdin=None,
+            stdout=tests.StringIOWrapper(),
+            stderr=tests.StringIOWrapper())
+        os = ui.make_output_stream()
+        self.assertEquals(os.encoding, enc)
+
+

^- this seems like it would want to be a permutation test across all the
'generate_unicode_encodings()' rather than a single test of the first one.

Conceptually I'm fine with this. I think it is a reasonable way to
handle stuff like "I want utf-8 always, even though my system doesn't
think so."

merge: approve

John
=:->

review: Approve
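
The permutation test suggested in this review could look roughly like the sketch below. It is an assumption-laden illustration rather than bzrlib code: `FakeOutputStream` and `make_output_stream` are hypothetical stand-ins for the UI factory, and the encoding list mirrors `interesting_encodings` from the diff minus `ucs-2`, since the diff's own TODO notes that platform support is unchecked. The sketch loops over a finite list because `generate_unicode_encodings()` in this branch returns an `itertools.cycle`, so iterating over the generator directly would never finish.

```python
import unittest

# Mirrors interesting_encodings in bzrlib/tests/fixtures.py, without 'ucs-2'.
INTERESTING_ENCODINGS = ["iso-8859-1", "ascii", "cp850", "utf-8"]


class FakeOutputStream:
    """Hypothetical stand-in for the stream a UI factory would return."""

    def __init__(self, encoding):
        self.encoding = encoding


def make_output_stream(config):
    """Hypothetical stand-in: honour 'output_encoding' if it is configured."""
    return FakeOutputStream(config.get("output_encoding", "ascii"))


class TestOutputEncodingConfiguration(unittest.TestCase):

    def test_every_interesting_encoding(self):
        # One sub-test per encoding, so a failure names the encoding involved.
        for enc in INTERESTING_ENCODINGS:
            with self.subTest(encoding=enc):
                stream = make_output_stream({"output_encoding": enc})
                self.assertEqual(stream.encoding, enc)
```

Running the class through `unittest` reports each encoding as its own sub-test, which is the main practical gain over checking only the first value.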

Robert Collins (lifeless) wrote on 2010-06-20:

This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating system's encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think it's a good question.

review: Needs Information

John A Meinel (jameinel) wrote on 2010-06-21:

Robert Collins wrote:
> Review: Needs Information
> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating system's encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think it's a good question.

Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
detected well, neither FreeBSD, etc.)

Output to a file.

bzr log | less
vs
bzr log > content.txt

(On Windows, the former should probably use Terminal encoding, but the
latter should use locale.getpreferredencoding())

I think the big thing is having this as a step along the way to being
able to supply it on a per-command basis.

John
=:->
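
One reason autodetection cannot settle the pipe-versus-file case by itself: from inside the process, stdout is not a terminal in either situation. The sketch below gathers what a program can actually observe about its output stream. `describe_output` is a hypothetical helper written for this illustration, not part of bzrlib.

```python
import io
import locale


def describe_output(stream):
    """Collect the facts a program can see about its output stream."""
    return {
        # False for both `cmd | less` and `cmd > file.txt`.
        "is_terminal": stream.isatty(),
        # May be None for streams that carry no encoding of their own.
        "stream_encoding": getattr(stream, "encoding", None),
        # What the locale settings suggest for text data.
        "locale_preferred": locale.getpreferredencoding(False),
    }


# An in-memory stream behaves like a redirect here: it is not a terminal.
info = describe_output(io.StringIO())
print(info["is_terminal"], info["locale_preferred"])
```

Since these observations look the same for a pager and a file, an explicit setting such as `output_encoding` gives the user a way to state the intent directly.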

Robert Collins (lifeless) wrote on 2010-06-21:

That seems to be coupled to the as yet untackled command-line-option-setting work; still, I can see the argument, will land.

The fixtures file is pretty rudimentary now but we can iterate.

Robert Collins (lifeless) wrote on 2010-06-21:

Oh, one more thing of interest - I'm not sure 'Unicode options' is the best name for this in the help, but I can't think of a better one right now.

I'm proposing a fixed branch with NEWS entries and correct copyright years now; would appreciate a rubber stamp on it.

Alexander Belchenko (bialix) wrote on 2010-06-22:

John A Meinel wrote:
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating system's encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think it's a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)
>
> Output to a file.
>
> bzr log | less
> vs
> bzr log > content.txt
>
> (On Windows, the former should probably use Terminal encoding, but the
> latter should use locale.getpreferredencoding())

Exactly.

> I think the big thing is having this as a step along the way to being
> able to supply it on a per-command basis.

At UDS I talked with Martin and proposed a global command-line option
--encoding for this, e.g.:

bzr --encoding utf-8 log > file.txt

But if this patch is the first step towards this goal, it's ok.
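
A global option of the kind proposed here could be parsed ahead of the subcommand along these lines. This is only a sketch using the standard `argparse` module: the `--encoding` flag is the proposal above, not an option this branch adds, and `parse_global_options` and the `bzr-sketch` program name are hypothetical.

```python
import argparse
import codecs


def parse_global_options(argv):
    """Split a hypothetical global --encoding option from the command line."""
    parser = argparse.ArgumentParser(prog="bzr-sketch")
    parser.add_argument("--encoding", default=None,
                        help="encoding for text output (illustrative only)")
    parser.add_argument("command")
    # Everything after the command name is left for the command itself.
    parser.add_argument("command_args", nargs=argparse.REMAINDER)
    options = parser.parse_args(argv)
    if options.encoding is not None:
        # Reject unknown encoding names up front (raises LookupError).
        codecs.lookup(options.encoding)
    return options


opts = parse_global_options(["--encoding", "utf-8", "log", "--limit", "5"])
print(opts.encoding, opts.command, opts.command_args)
```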

Martin Pool (mbp) wrote on 2010-06-22:

On 21 June 2010 23:44, John Arbash Meinel <email address hidden> wrote:
> Robert Collins wrote:
>> Review: Needs Information
>> This doesn't make sense: why would we want to have an output encoding that isn't compatible with the operating system's encoding? I'm channelling Martin[gz] here I think - I recall him asking this question in IRC, and I think it's a good question.
>
> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
> detected well, neither FreeBSD, etc.)

It is a good question. Beyond the cases John mentions I just might
want the output in some arbitrary encoding: I might always work in
8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
bug.

--
Martin
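
The 8859-7 versus utf-8 case comes down to the same text producing different bytes depending on the chosen encoding. A small sketch, with `transcode` as a hypothetical helper and an arbitrary Greek sample string:

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one encoding to another via unicode text."""
    return data.decode(source_encoding).encode(target_encoding)


greek = "αβγ"  # arbitrary sample text representable in iso-8859-7

local_bytes = greek.encode("iso-8859-7")   # one byte per letter
utf8_bytes = transcode(local_bytes, "iso-8859-7", "utf-8")  # two bytes per letter

print(len(local_bytes), len(utf8_bytes))
# Both byte strings decode back to the same text with the right codec.
print(utf8_bytes.decode("utf-8") == greek)
```

A reader that assumes utf-8 would mis-decode `local_bytes`, which is why choosing the output encoding per export is useful.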

Robert Collins (lifeless) wrote on 2010-06-22:

On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>> detected well, neither FreeBSD, etc.)
>
> It is a good question. Beyond the cases John mentions I just might
> want the output in some arbitrary encoding: I might always work in
> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
> bug.

I don't see that particular one as relevant because the OS
autodetection permits changing that via the existing standard
environment variables already - or am I missing something? [possibly
the variables are too big a hammer and the diff won't generate from
disk correctly if the variables are changed?]

John A Meinel (jameinel) wrote on 2010-06-22:

Robert Collins wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>> It is a good question. Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]

You can't set utf-8 as the codepage? (I believe you can only set 8-bit
code pages for the shell on Windows)

John
=:->

Robert Collins (lifeless) wrote on 2010-06-22:

Ah! Anyhow, as said above, let's merge it.

Martin Pool (mbp) wrote on 2010-06-23:

On 23 June 2010 05:01, Robert Collins <email address hidden> wrote:
> On Tue, Jun 22, 2010 at 6:58 PM, Martin Pool <email address hidden> wrote:
>>> Incorrect auto-detection. (Fairly common, Mac hasn't traditionally
>>> detected well, neither FreeBSD, etc.)
>>
>> It is a good question. Beyond the cases John mentions I just might
>> want the output in some arbitrary encoding: I might always work in
>> 8859-7 but want to export a diff as utf-8 to attach it to a Launchpad
>> bug.
>
> I don't see that particular one as relevant because the OS
> autodetection permits changing that via the existing standard
> environment variables already - or am I missing something? [possibly
> the variables are too big a hammer and the diff won't generate from
> disk correctly if the variables are changed?]

It's a few things:

* eventually I want to have a command line option that rebinds these
variables for the scope of a process
* I'm not sure if it's easy to change this from the windows command line
* OS autodetection is sometimes wrong
* you might want to set things on a finer granularity, for example
the unix path encoding, and these are not well exposed

--
Martin
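
Rebinding a setting for a limited scope, as the first bullet describes, might be modelled with a context manager. The sketch below is an analogy only: `override_option` is a hypothetical helper, a dict stands in for the configuration, and the scope is a `with` block rather than a whole process.

```python
import contextlib


@contextlib.contextmanager
def override_option(options, name, value):
    """Temporarily set options[name] to value, restoring the old state on exit."""
    missing = object()  # sentinel: distinguishes "unset" from a stored None
    previous = options.get(name, missing)
    options[name] = value
    try:
        yield options
    finally:
        if previous is missing:
            del options[name]
        else:
            options[name] = previous


settings = {"output_encoding": "iso-8859-7"}
with override_option(settings, "output_encoding", "utf-8"):
    print(settings["output_encoding"])  # overridden inside the block
print(settings["output_encoding"])      # restored afterwards
```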

Preview Diff


=== modified file 'NEWS'
--- NEWS	2010-06-17 14:06:36 +0000
+++ NEWS	2010-06-18 09:45:47 +0000
@@ -403,6 +403,9 @@
 Testing
 *******
+* Add ``bzrlib.tests.fixtures`` to hold code for setting up objects
+  to test.  (Martin Pool)
+
 * Added ``bzrlib.tests.matchers`` as a place to put matchers, along with
   our first in-tree matcher. See the module docstring for details.
   (Robert Collins)
=== modified file 'bzrlib/help_topics/en/configuration.txt'
--- bzrlib/help_topics/en/configuration.txt	2010-06-02 05:03:31 +0000
+++ bzrlib/help_topics/en/configuration.txt	2010-06-18 09:45:47 +0000
@@ -496,6 +496,17 @@
     using deprecated formats.
+Unicode options
+---------------
+
+output_encoding
+~~~~~~~~~~~~~~~
+
+A Python unicode encoding name for text output from bzr, such as log
+information.  Values include: utf8, cp850, ascii, iso-8859-1.  The default
+is the terminal encoding prefered by the operating system.
+
+
 Branch type specific options
 ----------------------------
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py	2010-06-08 01:42:50 +0000
+++ bzrlib/tests/__init__.py	2010-06-18 09:45:47 +0000
@@ -3704,6 +3704,7 @@
         'bzrlib.tests.test_export',
         'bzrlib.tests.test_extract',
         'bzrlib.tests.test_fetch',
+        'bzrlib.tests.test_fixtures',
         'bzrlib.tests.test_fifo_cache',
         'bzrlib.tests.test_filters',
         'bzrlib.tests.test_ftp_transport',
@@ -3839,6 +3840,7 @@
         'bzrlib.option',
         'bzrlib.symbol_versioning',
         'bzrlib.tests',
+        'bzrlib.tests.fixtures',
         'bzrlib.timestamp',
         'bzrlib.version_info_formats.format_custom',
         ]
=== added file 'bzrlib/tests/fixtures.py'
--- bzrlib/tests/fixtures.py	1970-01-01 00:00:00 +0000
+++ bzrlib/tests/fixtures.py	2010-06-18 09:45:47 +0000
@@ -0,0 +1,84 @@
+# Copyright (C) 2005-2010 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+
+"""Fixtures that can be used within tests.
+
+Fixtures can be created during a test as a way to separate out creation of
+objects to test.  Fixture objects can hold some state so that different
+objects created during a test instance can be related.  Normally a fixture
+should live only for the duration of a single test, and its tearDown method
+should be passed to `addCleanup` on the test.
+"""
+
+
+import itertools
+
+
+def generate_unicode_names():
+    """Generate a sequence of arbitrary unique unicode names.
+
+    By default they are not representable in ascii.
+
+    >>> gen = generate_unicode_names()
+    >>> n1 = gen.next()
+    >>> n2 = gen.next()
+    >>> type(n1)
+    <type 'unicode'>
+    >>> n1 == n2
+    False
+    >>> n1.encode('ascii', 'replace') == n1
+    False
+    """
+    # include a mathematical symbol unlikely to be in 8-bit encodings
+    return (u"\N{SINE WAVE}%d" % x for x in itertools.count())
+
+
+interesting_encodings = [
+    ('iso-8859-1', False),
+    ('ascii', False),
+    ('cp850', False),
+    ('utf-8', True),
+    ('ucs-2', True),
+    ]
+
+
+def generate_unicode_encodings(universal_encoding=None):
+    """Return a generator of unicode encoding names.
+
+    These can be passed to Python encode/decode/etc.
+
+    :param universal_encoding: True/False/None tristate to say whether the
+        generated encodings either can or cannot encode all unicode
+        strings.
+
+    >>> n1 = generate_unicode_names().next()
+    >>> enc = generate_unicode_encodings(universal_encoding=True).next()
+    >>> enc2 = generate_unicode_encodings(universal_encoding=False).next()
+    >>> n1.encode(enc).decode(enc) == n1
+    True
+    >>> try:
+    ...   n1.encode(enc2).decode(enc2)
+    ... except UnicodeError:
+    ...   print 'fail'
+    fail
+    """
+    # TODO: check they're supported on this platform?
+    if universal_encoding is not None:
+        e = [n for (n, u) in interesting_encodings if u == universal_encoding]
+    else:
+        e = [n for (n, u) in interesting_encodings]
+    return itertools.cycle(iter(e))
=== added file 'bzrlib/tests/test_fixtures.py'
--- bzrlib/tests/test_fixtures.py	1970-01-01 00:00:00 +0000
+++ bzrlib/tests/test_fixtures.py	2010-06-18 09:45:47 +0000
@@ -0,0 +1,28 @@
+# Copyright (C) 2005-2010 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+"""Tests for test fixtures"""
+
+import codecs
+
+from bzrlib import (
+    tests,
+    )
+from bzrlib.tests import (
+    fixtures,
+    )
+
+
=== modified file 'bzrlib/tests/test_ui.py'
--- bzrlib/tests/test_ui.py	2010-05-20 18:23:17 +0000
+++ bzrlib/tests/test_ui.py	2010-06-18 09:45:47 +0000
@@ -24,6 +24,7 @@
 from StringIO import StringIO
 from bzrlib import (
+    config,
     errors,
     remote,
     repository,
@@ -33,10 +34,26 @@
 from bzrlib.symbol_versioning import (
     deprecated_in,
     )
-from bzrlib.tests import test_progress
+from bzrlib.tests import (
+    fixtures,
+    test_progress,
+    )
 from bzrlib.ui import text as _mod_ui_text
+class TestUIConfiguration(tests.TestCaseWithTransport):
+
+    def test_output_encoding_configuration(self):
+        enc = fixtures.generate_unicode_encodings().next()
+        config.GlobalConfig().set_user_option('output_encoding',
+            enc)
+        ui = tests.TestUIFactory(stdin=None,
+            stdout=tests.StringIOWrapper(),
+            stderr=tests.StringIOWrapper())
+        os = ui.make_output_stream()
+        self.assertEquals(os.encoding, enc)
+
+
 class TestTextUIFactory(tests.TestCase):
     def test_text_factory_ascii_password(self):
=== modified file 'bzrlib/ui/__init__.py'
--- bzrlib/ui/__init__.py	2010-03-25 07:34:15 +0000
+++ bzrlib/ui/__init__.py	2010-06-18 09:45:47 +0000
@@ -158,8 +158,9 @@
         version of stdout, but in a GUI it might be appropriate to send it to a
         window displaying the text.
-        :param encoding: Unicode encoding for output; default is the
-            terminal encoding, which may be different from the user encoding.
+        :param encoding: Unicode encoding for output; if not specified
+            uses the configured 'output_encoding' if any; otherwise the
+            terminal encoding.
             (See get_terminal_encoding.)
         :param encoding_type: How to handle encoding errors:
@@ -167,6 +168,10 @@
         """
         # XXX: is the caller supposed to close the resulting object?
         if encoding is None:
+            from bzrlib import config
+            encoding = config.GlobalConfig().get_user_option(
+                'output_encoding')
+        if encoding is None:
             encoding = osutils.get_terminal_encoding()
         if encoding_type is None:
             encoding_type = 'replace'
