mail handling stalls when a large message is received and not deliverable

Bug #788874 reported by Robert Collins
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Brad Crittenden

Bug Description

Right now the following is happening:
We have a large email inbound (larger than our outbound limit).
The mail fails to deliver for some reason:
http://launchpadlibrarian.net/72490694/9DXUl51Txcr83QLvOBe5YOGwyvJ.txt
And then we try to send a non-delivery report (NDR) but include so much of the original that our NDR cannot be sent:
http://launchpadlibrarian.net/72490577/gDKKPU6dDPL03MfpfvG4HKw1R1S.txt

Mail processing stalls while this happens, because we never pop the bad mail from the mailbox.

Recommendation
==============

Limit the size of our NDRs so they fit under our outbound mail limit (currently 10MB; there is probably a config setting for it).
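
A minimal sketch of that recommendation, not the actual Launchpad code: quote only a bounded prefix of the original message in the NDR. The function name `build_ndr`, the subject line, and the 50KB quoting cap are assumptions made for illustration; only the 10MB outbound limit comes from this report.

```python
from email.message import EmailMessage

# Outbound limit as stated in this report (10MB currently).
MAX_OUTBOUND_BYTES = 10 * 1024 * 1024
# Hypothetical cap on how much of the original message is quoted.
MAX_QUOTED_BYTES = 50 * 1024


def build_ndr(original_bytes, sender, recipient, error_text,
              max_quoted=MAX_QUOTED_BYTES):
    """Build an NDR that quotes at most max_quoted bytes of the original."""
    quoted = original_bytes[:max_quoted]
    body = error_text
    if len(original_bytes) > max_quoted:
        body += "\n\n[Original message truncated to %d bytes.]" % max_quoted
    ndr = EmailMessage()
    ndr["From"] = sender
    ndr["To"] = recipient
    ndr["Subject"] = "Message could not be processed"
    ndr.set_content(body)
    # Attach the (possibly truncated) original as opaque bytes.
    ndr.add_attachment(quoted, maintype="application",
                       subtype="octet-stream", filename="original.eml")
    return ndr
```

Because the attachment is base64-encoded, the quoting cap has to sit well below the outbound limit rather than equal to it.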

Revision history for this message
Martin Pool (mbp) wrote :

Various issues here:

1- send_process_error_notification should probably truncate the bounced-back mail
2- Arguably, incoming.py should be more aggressive about dropping problematic incoming mails. There is a tradeoff here: if we fail to send an error, would we rather silently drop the mail or stall processing? Perhaps the best thing is to log the OOPS, delete the incoming mail, then send the bounce. But that may be overoptimizing for this particular failure.
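
The ordering suggested in point 2 (log, delete, then bounce) could look roughly like the sketch below. It is not taken from incoming.py: `handle_one`, the `mailbox` object with `fetch`/`delete` methods, and the `process` and `send_bounce` callables are all hypothetical stand-ins.

```python
import logging

log = logging.getLogger("incoming-mail-sketch")


def handle_one(mailbox, message_id, process, send_bounce):
    """Process one message without ever leaving a failing one in the mailbox.

    mailbox, process and send_bounce are hypothetical stand-ins.
    Returns True if the message was processed, False if it was dropped.
    """
    raw = mailbox.fetch(message_id)
    try:
        process(raw)
    except Exception as error:
        # 1. Record the failure first, so nothing is lost silently.
        log.exception("Failed to process message %s", message_id)
        # 2. Remove the message so it cannot stall later runs.
        mailbox.delete(message_id)
        # 3. Only then try to notify the sender; a failure here is logged
        #    but no longer blocks the queue.
        try:
            send_bounce(raw, error)
        except Exception:
            log.exception("Could not send bounce for %s", message_id)
        return False
    mailbox.delete(message_id)
    return True
```

The cost of this ordering is the one named above: if the bounce fails, the sender never hears about the dropped mail and only the log records it.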

Revision history for this message
Robert Collins (lifeless) wrote :

We've worked around the immediate failure in production, but this will happen again if not fixed.

summary: - mail handling stalls when a large message is received
+ mail handling stalls when a large message is received and not
+ deliverable
description: updated
Changed in launchpad:
importance: Critical → High
tags: added: canonical-losa-lp
Revision history for this message
William Grant (wgrant) wrote :

This should be an OOPS, and has happened again.

Changed in launchpad:
importance: High → Critical
Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 788874] Re: mail handling stalls when a large message is received and not deliverable

A few options here:

1- Set the border MTA incoming message size limit down, or the
in-application limit up, so messages are rejected before reaching
incoming.py. In some ways this is cleaner than bouncing them.

2- Truncate when sending errors. Probably good, though I wonder if that will
leave open other cases where we could fail to send the reply.

3- Delete the message from the mailbox before sending the oops. One line
change. It does not seem likely that retrying them will help.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, Jun 2, 2011 at 3:35 PM, Martin Pool <email address hidden> wrote:
> A few options here:
>
> 1- Set the border mta incoming message size limit down, or the
> in-application limit up, so messages are rejected before reaching
> incoming.py. In some ways this is cleaner than bouncing them.

We could, but even after that...

> 2- Truncate when sending errors. Probably good, though I wonder if that will
> leave open other cases where we could fail to send the reply.

This really appeals to me

Revision history for this message
Martin Pool (mbp) wrote :

For interest, cinerama tells me the gateway MTA looks to be set at a 50MB limit. It is perhaps odd we accept 50MB mails but limit ourselves to sending out 10MB mails.

> This should be an OOPS, and has happened again.

It's happening while already trying to send and report on an oops. Perhaps we should also oops about the failure to send notification of the oops?

I'm not sure oopsing on the original message fits the oops policy, because this is something users could produce on demand. To handle that we would have to distinguish "message too big" from "something else bad happened", and perhaps there it's better to fix it on the border MTA.

I think perhaps the easiest thing is to just send back the OOPS id and not the whole message?
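
As a rough sketch of that last idea, assuming nothing about the real notification code: the reply carries only the OOPS id and a few identifying headers, so its size does not depend on the size of the original. `build_oops_reply` and the wording of the body are hypothetical.

```python
from email.message import EmailMessage


def build_oops_reply(original, oops_id, sender):
    """Build a reply that carries the OOPS id but none of the original body."""
    reply = EmailMessage()
    reply["From"] = sender
    reply["To"] = original["From"]
    reply["Subject"] = "Re: %s" % original.get("Subject", "(no subject)")
    if original["Message-ID"]:
        # Lets the sender's mail client thread the reply with the original.
        reply["In-Reply-To"] = original["Message-ID"]
    reply.set_content(
        "Your message could not be processed.\n"
        "Please quote this error ID if you report the problem: %s\n" % oops_id)
    return reply
```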

Brad Crittenden (bac)
description: updated
Brad Crittenden (bac)
Changed in launchpad:
assignee: nobody → Brad Crittenden (bac)
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Brad Crittenden (bac)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released