REST server crashes on "reopen"

Bug #1184376 reported by Aurélien Bompard
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GNU Mailman
Fix Released
High
Barry Warsaw

Bug Description

Mailman's REST server crashes on the "mailman reopen" command. Here are the only lines I get in the logs:

May 26 12:29:01 2013 (30830) Master started
May 26 12:29:05 2013 (30844) RESTRunner runner started.
May 26 12:29:05 2013 (30844) Starting REST server
May 26 12:29:05 2013 (30837) DigestRunner runner started.
May 26 12:29:05 2013 (30843) RetryRunner runner started.
May 26 12:29:05 2013 (30848) CommandRunner runner started.
May 26 12:29:05 2013 (30845) BounceRunner runner started.
May 26 12:29:05 2013 (30846) LMTPRunner runner started.
May 26 12:29:05 2013 (30850) NNTPRunner runner started.
May 26 12:29:05 2013 (30838) PipelineRunner runner started.
May 26 12:29:06 2013 (30839) ArchiveRunner runner started.
May 26 12:29:06 2013 (30849) VirginRunner runner started.
May 26 12:29:06 2013 (30836) IncomingRunner runner started.
May 26 12:29:06 2013 (30840) OutgoingRunner runner started.
May 26 12:29:33 2013 (30830) Master watcher caught SIGHUP. Re-opening log files.
May 26 12:29:33 2013 (30848) CommandRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30843) RetryRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30838) PipelineRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30837) DigestRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30839) ArchiveRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30836) IncomingRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30846) LMTPRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30849) VirginRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30845) BounceRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30850) NNTPRunner runner caught SIGHUP. Reopening logs.
May 26 12:29:33 2013 (30840) OutgoingRunner runner caught SIGHUP. Reopening logs.

As you can see in the "caught SIGHUP" lines, there is nothing about the REST server, and indeed there is no "runner --runner=rest:0:1" process running anymore. I have to restart mailman to get the REST server back online.

Tags: mailman3 rest
Revision history for this message
Aurélien Bompard (abompard) wrote :

After looking at the source code, it seems to come from the fact that the REST runner has the intercept_signals attribute set to False. Any reason for that ?

Revision history for this message
Barry Warsaw (barry) wrote :

To be honest, I don't remember why I added that. Note that the REST server is the only one that doesn't intercept signals. I tried commenting out that line (since the default is True) and cli testing seemed just fine. The test suite passes too. I wonder if it was due to a problem in Python 2.6 which perhaps is fixed now in 2.7?

Try commenting out line 43 in rest.py and see if things work better for you. If so, then perhaps we can just remove this mis-feature. I sure wish I remembered why I added it though.

Revision history for this message
Aurélien Bompard (abompard) wrote :

I've been running without this line for a few days already, and it seems to work fine.

Revision history for this message
Barry Warsaw (barry) wrote :

Okay, I've commented out the line in RESTRunner, but not removed the basic functionality to override interception of signals. That way, if I remember why I added that in the first place, it will be easier to re-enable. OTOH, if nothing bad happens, then we should eventually remove the flag.

Changed in mailman:
milestone: none → 3.0.0b4
assignee: nobody → Barry Warsaw (barry)
importance: Undecided → High
status: New → Fix Committed
Revision history for this message
Aurélien Bompard (abompard) wrote :

OK, now I know why you added this flag. Removing it does not work, the REST server still exits on reopen (I don't know how I missed this, it used to work). But it's an exit, not a crash. In the logs I get "REST server exiting", because the server gets an interrupted system call error. This is very probably linked to http://bugs.python.org/issue7978.

In python upstream they suggested to simply restart the select call, so I'll prepare a patch to restart the serve_forever() call.
Can you think of a reason where we would really want the REST server to exit on an interrupted system call ?

Changed in mailman:
status: Fix Committed → In Progress
Revision history for this message
Aurélien Bompard (abompard) wrote :

Here's the patch to restart the REST server on an interrupted system call. I tried to make use of the existing runner infrastructure so it's a short patch, but it may not be a good idea (I don't want to hijack something not designed for this purpose).

Revision history for this message
Barry Warsaw (barry) wrote :

I generally like that you're using the runner infrastructure for this. Is there any chance you can come up with a unittest case for this? (I'm still working on one but haven't gotten it working yet.)

Revision history for this message
Aurélien Bompard (abompard) wrote :

Here's a testcase that I finally managed to get working. Download the attached file to src/mailman/runners/tests/test_rest.py. I'm not sure it's the best way to test for such a case, but at least it works (it fails without my patch and passes with it).

Revision history for this message
Barry Warsaw (barry) wrote :

This turned out to be a more extensive branch than anticipated because of weirdness in three competing constraints. To properly shutdown the WSGI server, the .shutdown() method must be called in a separate thread from the server. But because of SQLite constraints, the server must run in the main thread, and because of Python signal handling semantics, the signal handlers must run in the main thread. I solved this, by running a subthread that just waits on an Event and then calls the .shutdown(). The signal handler fires the Event.

I also greatly simplified the bin/runner implementation and removed some command line options that weren't really being used.

Thanks for the test code! I did a little bit of refactoring, but it was very helpful.

Changed in mailman:
status: In Progress → Fix Committed
Barry Warsaw (barry)
Changed in mailman:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.