Bazaar

Merge lp:~gz/bzr/require_unicode_committer_614593 into lp:bzr

require_unicode_committer_614593
Merge into bzr.dev

Proposed by Martin Packman on 2010-10-13

Status:

Merged

Approved by:

John A Meinel on 2010-10-15

Approved revision:

no longer in the source branch.

Merged at revision:

5510

Proposed branch:

lp:~gz/bzr/require_unicode_committer_614593

Merge into:

lp:bzr

Diff against target:

46 lines (+13/-1)

3 files modified

bzrlib/repository.py (+2/-0)
bzrlib/tests/per_repository/test_commit_builder.py (+10/-0)
bzrlib/tests/test_testament.py (+1/-1)

To merge this branch:

bzr merge lp:~gz/bzr/require_unicode_committer_614593

Low

Fix Released

Link a bug report

Reviewer	Date Requested	Status
Vincent Ladeuil	2010-10-13	Approve on 2010-10-14
John A Meinel	2010-10-14	Pending
Review via email: mp+38334@code.launchpad.net

Commit message

Check committer values are ascii or unicode and fix a test where it was not

Description of the change

Currently commit can be passed a non-ascii str value for committer and the value makes it all the way through to the repository serialisation code where it potentially outputs bogus data. As generally the input will be decoded to unicode already from the command line or config this isn't generally fatal, but is laying a bit of a trap for test and plugin authors to do the wrong thing without realising. With Python 2.7 it happens that this is now caught by the xml escaping function, and a misspelt test started failing.

As well as fixing that test, I've added a guard in commit that raises UnicodeDecodeError for all non-ascii str values. I think that's better than decoding with user_encoding or similar as we don't actually have a good basis for believing a str passed is one encoding or another, so it's better to leave that up to the caller. We could trap the decode error and raise some other flavour of exception instead, but as this will mostly be for bzrlib coders rather than users that's probably no more informative.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-10-14:

This sounds fine to me, but I'd like John to confirm that we are indeed expecting Unicode there.

review: Approve

Revision history for this message

John A Meinel (jameinel) wrote on 2010-10-14:

If I were doing this, I would probably have done it deeper. Specifically "committer" is just passed into Repository.get_commit_builder(), and the commit builder code has:

        if committer is None:
            self._committer = self._config.username()
        else:
            self._committer = committer

That is probably the point where I would assert that it is properly Unicode. That may be a bit later than you would like (since the commit is pretty much finished then), but it should help prevent injection via some other path.

You could also potentially check in the Revision constructor.

Either way, we are expecting a unicode string here.

Revision history for this message

Martin Packman (gz) wrote on 2010-10-15:

Thanks John, that's just the kind of feedback I wanted. I'll move the guard further in as you suggest.

Doing that, just realised the test I added doesn't actually fail unless I add a format="dirstate-with-subtree" like the testament tests have. This suggests it should be in... per_repository.test_commit_builder instead maybe?

Revision history for this message

Martin Packman (gz) wrote on 2010-10-15:

Implemented those changes, but am now worrying over some new things.

Using per_repository seems wasteful as the guard triggers before the repo actually gets touched, so the 22 variations aren't interestingly different.

Also, CommitBuilder has an existing _validate_unicode_text used with the message and rev props I could have used for the committer check... but it doesn't actually do what it says. Which suggests they also could introduce bad bytestrings to the lower levels.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-10-19:

I'm fine with it being per-repository, since each repository can have a custom CommitBuilder. Even if 99% of them are the same, catching the failure on a new repo format is good.

I would be ok if you wanted to add something like "if not isinstance(text, unicode)" to the _validate check. The code exists because revision property values (vs keys) are meant to be unicode content. And the set of unicode strings has some additional restrictions (not having '\r' in them).

(Just because it doesn't check everything it could *today* doesn't mean we can't add more checks :)

Revision history for this message

Martin Packman (gz) wrote on 2010-10-19:

Thanks John, I'll land this then and file a followup bug on tightening up the existing _validate methods.

Revision history for this message

Martin Packman (gz) wrote on 2010-10-19:

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Alejandro Cornejo2

Bazaar Codereview Subscribers

Benoit Pierre

Gmood

Karl Bielefeldt

Mahmoud Hassan

Martin Packman

Matt Nordhoff

Mohd Fikri Mohd Amin

MrJOHN

Václav Haisman

bzr PQM

vincenzo

to status/vote changes:

Alexander Belchenko

amandla2023

1	=== modified file 'bzrlib/repository.py'
2	--- bzrlib/repository.py 2010-10-15 11:20:45 +0000
3	+++ bzrlib/repository.py 2010-10-15 16:41:54 +0000
4	@@ -115,6 +115,8 @@
5
6	if committer is None:
7	self._committer = self._config.username()
8	+ elif not isinstance(committer, unicode):
9	+ self._committer = committer.decode() # throw if non-ascii
10	else:
11	self._committer = committer
12
13
14	=== modified file 'bzrlib/tests/per_repository/test_commit_builder.py'
15	--- bzrlib/tests/per_repository/test_commit_builder.py 2010-08-10 19:17:49 +0000
16	+++ bzrlib/tests/per_repository/test_commit_builder.py 2010-10-15 16:41:54 +0000
17	@@ -1288,6 +1288,16 @@
18	self.assertRaises(ValueError, builder.commit,
19	u'Invalid\r\ncommit message\r\n')
20
21	+ def test_non_ascii_str_committer_rejected(self):
22	+ """Ensure an error is raised on a non-ascii byte string committer"""
23	+ branch = self.make_branch('.')
24	+ branch.repository.lock_write()
25	+ self.addCleanup(branch.repository.unlock)
26	+ self.assertRaises(UnicodeDecodeError,
27	+ branch.repository.get_commit_builder,
28	+ branch, [], branch.get_config(),
29	+ committer="Erik B\xe5gfors <erik@example.com>")
30	+
31	def test_stacked_repositories_reject_commit_builder(self):
32	# As per bug 375013, committing to stacked repositories is currently
33	# broken, so repositories with fallbacks refuse to hand out a commit
34
35	=== modified file 'bzrlib/tests/test_testament.py'
36	--- bzrlib/tests/test_testament.py 2009-03-23 14:59:43 +0000
37	+++ bzrlib/tests/test_testament.py 2010-10-15 16:41:54 +0000
38	@@ -129,7 +129,7 @@
39	timestamp=1129025493,
40	timezone=36000,
41	rev_id='test@user-3',
42	- committer='Erik B\xe5gfors <test@user>',
43	+ committer=u'Erik B\xe5gfors <test@user>',
44	revprops={'uni':u'\xb5'}
45	)
46	t = self.from_revision(self.b.repository, 'test@user-3')