I've fixed it to only escape the headers. I tried using message_from_string(), but the Message object requires that each header be deleted and then re-added, otherwise it would just create a duplicate header, and I started to worry about corner cases such as duplicate headers, which the method I was using didn't seem to take into account.
from zope.component import getUtility
@@ -233,10 +234,15 @@
# though it's much more convenient to just pass 8-bit strings.
if isinstance(bytes, xmlrpclib.Binary):
bytes = bytes.data
- # Although it is illegal for an email to have unencoded non-ascii
- # characters, it is better to let the list owner process the
- # message than to cause an oops.
- bytes = escape_nonascii_uniquely(bytes)
+ # Although it is illegal for an email header to have unencoded
+ # non-ascii characters, it is better to let the list owner
+ # process the message than to cause an oops.
+ header_body_separator = re.compile('\r\n\r\n|\r\r|\n\n')
+ match = header_body_separator.search(bytes)
+ header = bytes[:match.start()]
+ header = escape_nonascii_uniquely(header)
+ bytes = header + bytes[match.start():]
+ mailing_list = getUtility(IMailingListSet).get(team_name)
message = getUtility(IMessageSet).fromEmail(bytes) mailing_list.holdMessage(message)
-Messages with non-ascii in their headers or bodies are not exactly legal
+Messages with non-ascii in their headers are not exactly legal
(they should be encoded) but do occur especially in spam. These
messages can be held for moderator approval too. To avoid blowing up
later if the string is converted to a unicode object, the non-ascii
@@ -239,8 +239,7 @@
... Message-ID: <fifth-post\xa9>
... Date: Fri, 01 Aug 2000 01:08:59 -0000
...
- ... Watch out for badgers! \xa9
- ... Don't double quote characters: =E3=F6=FC
+ ... Don't escape non-ascii characters in the body! \xa9
... """)
>>> import xmlrpclib
@@ -269,8 +268,7 @@
'Message-ID: <fifth-post\\xa9>',
'Date: Fri, 01 Aug 2000 01:08:59 -0000',
'',
- 'Watch out for badgers! \\xa9',
- "Don't double quote characters: =E3=F6=FC"]
+ "Don't escape non-ascii characters in the body! \xa9"]
>>> held_message_spam.status
<DBItem PostedMessageStatus.NEW, (0) New status>
Hi Aaron,
I've fixed it to only escape the headers. I tried using message_ from_string( ), but the Message object requires that each header be deleted and then re-added, otherwise it would just create a duplicate header, and I started to worry about corner cases such as duplicate headers, which the method I was using didn't seem to take into account.
-Edwin
=== modified file 'lib/canonical/ launchpad/ xmlrpc/ mailinglist. py' launchpad/ xmlrpc/ mailinglist. py 2010-08-23 17:43:51 +0000 launchpad/ xmlrpc/ mailinglist. py 2010-08-23 22:03:50 +0000 stAPIView' ,
--- lib/canonical/
+++ lib/canonical/
@@ -8,6 +8,7 @@
'MailingLi
]
+import re
import xmlrpclib
from zope.component import getUtility nonascii_ uniquely( bytes) body_separator = re.compile( '\r\n\r\ n|\r\r| \n\n') body_separator. search( bytes) match.start( )] nonascii_ uniquely( header) start() :]
mailing_ list = getUtility( IMailingListSet ).get(team_ name) IMessageSet) .fromEmail( bytes)
mailing_ list.holdMessag e(message)
@@ -233,10 +234,15 @@
# though it's much more convenient to just pass 8-bit strings.
if isinstance(bytes, xmlrpclib.Binary):
bytes = bytes.data
- # Although it is illegal for an email to have unencoded non-ascii
- # characters, it is better to let the list owner process the
- # message than to cause an oops.
- bytes = escape_
+ # Although it is illegal for an email header to have unencoded
+ # non-ascii characters, it is better to let the list owner
+ # process the message than to cause an oops.
+ header_
+ match = header_
+ header = bytes[:
+ header = escape_
+ bytes = header + bytes[match.
+
message = getUtility(
=== modified file 'lib/lp/ registry/ doc/message- holds-xmlrpc. txt' registry/ doc/message- holds-xmlrpc. txt 2010-08-23 17:50:11 +0000 registry/ doc/message- holds-xmlrpc. txt 2010-08-23 22:06:19 +0000
--- lib/lp/
+++ lib/lp/
@@ -226,7 +226,7 @@
Non-ascii messages
==================
-Messages with non-ascii in their headers or bodies are not exactly legal
+Messages with non-ascii in their headers are not exactly legal
(they should be encoded) but do occur especially in spam. These
messages can be held for moderator approval too. To avoid blowing up
later if the string is converted to a unicode object, the non-ascii
@@ -239,8 +239,7 @@
... Message-ID: <fifth-post\xa9>
... Date: Fri, 01 Aug 2000 01:08:59 -0000
...
- ... Watch out for badgers! \xa9
- ... Don't double quote characters: =E3=F6=FC
+ ... Don't escape non-ascii characters in the body! \xa9
... """)
>>> import xmlrpclib
@@ -269,8 +268,7 @@
'Message-ID: <fifth-post\\xa9>',
'Date: Fri, 01 Aug 2000 01:08:59 -0000',
'',
- 'Watch out for badgers! \\xa9',
- "Don't double quote characters: =E3=F6=FC"]
+ "Don't escape non-ascii characters in the body! \xa9"]
>>> held_message_ spam.status atus.NEW, (0) New status>
<DBItem PostedMessageSt