Merge lp:~flacoste/launchpad/bug-688503 into lp:launchpad

Proposed by Francis J. Lacoste
Status: Merged
Approved by: Gary Poster
Approved revision: no longer in the source branch.
Merged at revision: 12104
Proposed branch: lp:~flacoste/launchpad/bug-688503
Merge into: lp:launchpad
Diff against target: 293 lines (+197/-9)
8 files modified
lib/canonical/launchpad/webapp/configure.zcml (+11/-0)
lib/canonical/launchpad/webapp/dbpolicy.py (+2/-2)
lib/canonical/launchpad/webapp/haproxy.py (+63/-0)
lib/canonical/launchpad/webapp/publication.py (+8/-5)
lib/canonical/launchpad/webapp/sighup.py (+26/-0)
lib/canonical/launchpad/webapp/tests/test_haproxy.py (+49/-0)
lib/canonical/launchpad/webapp/tests/test_sighup.py (+36/-0)
utilities/page-performance-report.ini (+2/-2)
To merge this branch: bzr merge lp:~flacoste/launchpad/bug-688503
Reviewer Review Type Date Requested Status
Gary Poster (community) Approve
Review via email: mp+44102@code.launchpad.net

Commit message

[r=gary][ui=none][bug=688503] Add a /+haproxy monitoring URL that can controlled through SIGHUP. That allows an easy way to take app servers out of rotation for graceful shutdown and restart.

Description of the change

Hi,

(Looks like I'm missing a plugin...)

This implements the LOSA request described in bug 688503. It adds a view at
/+haproxy that return status 200 or 500 based on a global flag. That flag is
controlled through the SIGHUP signal (suggested by elmo).

This URL will be used as the probe URL in HAProxy. Normally, it returns 200
and this tells HAProxy that the app server is functioning normally. When that
requests fails or returns 500, HAProxy will stop sending requests to it.

This will allow to restart app servers without interrupting user requests, as
LOSA will send the HUP signal which will stop HAProxy from dispatching
requests to it. The deployment script will then monitor the HAProxy status
board until all existing requests dispatched are completed. The app server
will then be able to be restarted.

Previously the URL used was /+opstats and we have some exceptional rules for
that view. I've implemented the same exceptions for this as it should also not
talk to the DB and be excluded from the PPR reports.

Tests can be run with test -vvm canonical.launchpad.webapp.tests.test_haproxy
and canonical.launchpad.webapp.tests.test_sighup.

QA can be done by sedning the HUP signal to the app server, looking at the log
and visiting the /+haproxy url.

Let me know if you have any questions.

To post a comment you must log in.
Revision history for this message
Gary Poster (gary) wrote :

Looks good, thank you.

in the PPR, you maybe could change "All launchpad except opstats" to "All launchpad except haproxy pages". I don't think it really matters, but if you want to.

review: Approve
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

I changed it to 'All Launchpad exception operational pages', since +opstats is still used for Nagios checks and other metrics gathering.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/canonical/launchpad/webapp/configure.zcml'
--- lib/canonical/launchpad/webapp/configure.zcml 2010-10-14 23:03:41 +0000
+++ lib/canonical/launchpad/webapp/configure.zcml 2010-12-17 21:57:48 +0000
@@ -331,6 +331,10 @@
331 <!-- Signal handlers -->331 <!-- Signal handlers -->
332 <subscriber332 <subscriber
333 for="zope.app.appsetup.IProcessStartingEvent"333 for="zope.app.appsetup.IProcessStartingEvent"
334 handler="canonical.launchpad.webapp.sighup.setup_sighup"
335 />
336 <subscriber
337 for="zope.app.appsetup.IProcessStartingEvent"
334 handler="canonical.launchpad.webapp.sigusr1.setup_sigusr1"338 handler="canonical.launchpad.webapp.sigusr1.setup_sigusr1"
335 />339 />
336 <subscriber340 <subscriber
@@ -431,4 +435,11 @@
431 factory="canonical.launchpad.webapp.namespace.FormNamespaceView"435 factory="canonical.launchpad.webapp.namespace.FormNamespaceView"
432 />436 />
433437
438 <!-- Registrations to support +haproxy status url. -->
439 <browser:page
440 for="canonical.launchpad.webapp.interfaces.ILaunchpadRoot"
441 name="+haproxy"
442 permission="zope.Public"
443 class="canonical.launchpad.webapp.haproxy.HAProxyStatusView"
444 />
434</configure>445</configure>
435446
=== modified file 'lib/canonical/launchpad/webapp/dbpolicy.py'
--- lib/canonical/launchpad/webapp/dbpolicy.py 2010-11-16 06:29:36 +0000
+++ lib/canonical/launchpad/webapp/dbpolicy.py 2010-12-17 21:57:48 +0000
@@ -193,13 +193,13 @@
193def LaunchpadDatabasePolicyFactory(request):193def LaunchpadDatabasePolicyFactory(request):
194 """Return the Launchpad IDatabasePolicy for the current appserver state.194 """Return the Launchpad IDatabasePolicy for the current appserver state.
195 """195 """
196 # We need to select a non-load balancing DB policy for +opstats so196 # We need to select a non-load balancing DB policy for some status URLs so
197 # it doesn't query the DB for lag information (this page should not197 # it doesn't query the DB for lag information (this page should not
198 # hit the database at all). We haven't traversed yet, so we have198 # hit the database at all). We haven't traversed yet, so we have
199 # to sniff the request this way. Even though PATH_INFO is always199 # to sniff the request this way. Even though PATH_INFO is always
200 # present in real requests, we need to tread carefully (``get``) because200 # present in real requests, we need to tread carefully (``get``) because
201 # of test requests in our automated tests.201 # of test requests in our automated tests.
202 if request.get('PATH_INFO') == u'/+opstats':202 if request.get('PATH_INFO') in [u'/+opstats', u'/+haproxy']:
203 return DatabaseBlockedPolicy(request)203 return DatabaseBlockedPolicy(request)
204 elif is_read_only():204 elif is_read_only():
205 return ReadOnlyLaunchpadDatabasePolicy(request)205 return ReadOnlyLaunchpadDatabasePolicy(request)
206206
=== added file 'lib/canonical/launchpad/webapp/haproxy.py'
--- lib/canonical/launchpad/webapp/haproxy.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/haproxy.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,63 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Implementation of the HAProxy probe URL."""
5
6
7__metaclass__ = type
8__all__ = [
9 'HAProxyStatusView',
10 'set_going_down_flag',
11 'switch_going_down_flag',
12 ]
13
14# This is the global flag, when this is True, the HAProxy view
15# will return 500, it returns 200 otherwise.
16going_down_flag = False
17
18
19def set_going_down_flag(state):
20 """Sets the going_down_flag to state"""
21 global going_down_flag
22 going_down_flag = state
23
24
25def switch_going_down_flag():
26 """Switch the going down flag.
27
28 This is is registered as a signal handler for HUP.
29 """
30 global going_down_flag
31 going_down_flag = not going_down_flag
32
33
34class HAProxyStatusView:
35 """
36 View implementing the HAProxy probe URL.
37
38 HAProxy doesn't support programmatically taking servers in and our of
39 rotation. It does however uses a probe URL that it scans regularly to see
40 if the server is still alive. The /+haproxy is that URL for us.
41
42 If it times out or returns a non-200 status, it will take the server out
43 of rotation, until the probe works again.
44
45 This allow us to send a signal (HUP) to the app servers when we want to
46 restart them. The probe URL will then return 500 and the app server will
47 be taken out of rotation. Once HAProxy reports that all connections to the
48 app servers are finished, we can restart the server safely.
49 """
50
51 def __init__(self, context, request):
52 self.context = context
53 self.request = request
54
55 def __call__(self):
56 """Return 200 or 500 depending on the global flag."""
57 global going_down_flag
58 if going_down_flag:
59 self.request.response.setStatus(500)
60 return u"May day! May day! I'm going down. Stop the flood gate."
61 else:
62 self.request.response.setStatus(200)
63 return u"Everything is groovy. Keep them coming!"
064
=== modified file 'lib/canonical/launchpad/webapp/publication.py'
--- lib/canonical/launchpad/webapp/publication.py 2010-11-02 01:34:05 +0000
+++ lib/canonical/launchpad/webapp/publication.py 2010-12-17 21:57:48 +0000
@@ -456,9 +456,11 @@
456 # And spit the pageid out to our tracelog.456 # And spit the pageid out to our tracelog.
457 tracelog(request, 'p', pageid)457 tracelog(request, 'p', pageid)
458458
459 # For opstats, where we really don't want to have any DB access at all,459 # For status URLs, where we really don't want to have any DB access
460 # ensure that all flag lookups will stop early.460 # at all, ensure that all flag lookups will stop early.
461 if pageid in ('RootObject:OpStats', 'RootObject:+opstats'):461 if pageid in (
462 'RootObject:OpStats', 'RootObject:+opstats',
463 'RootObject:+haproxy'):
462 request.features = NullFeatureController()464 request.features = NullFeatureController()
463 features.per_thread.features = request.features465 features.per_thread.features = request.features
464466
@@ -466,8 +468,9 @@
466 # to control the hard timeout, and they trigger DB access, but our468 # to control the hard timeout, and they trigger DB access, but our
467 # DB tracers are not safe for reentrant use, so we must do this469 # DB tracers are not safe for reentrant use, so we must do this
468 # outside of the SQL stack. We must also do it after traversal so that470 # outside of the SQL stack. We must also do it after traversal so that
469 # the view is known and can be used in scope resolution. As we actually471 # the view is known and can be used in scope resolution. As we
470 # stash the pageid after afterTraversal, we need to do this even later.472 # actually stash the pageid after afterTraversal, we need to do this
473 # even later.
471 da.set_permit_timeout_from_features(True)474 da.set_permit_timeout_from_features(True)
472 da._get_request_timeout()475 da._get_request_timeout()
473476
474477
=== added file 'lib/canonical/launchpad/webapp/sighup.py'
--- lib/canonical/launchpad/webapp/sighup.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/sighup.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,26 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Signal handler for SIGHUP."""
5
6__metaclass__ = type
7__all__ = []
8
9import logging
10import signal
11
12from canonical.launchpad.webapp import haproxy
13
14
15def sighup_handler(signum, frame):
16 """Switch the state of the HAProxy going_down flag."""
17 haproxy.switch_going_down_flag()
18 logging.getLogger('sighup').info(
19 'Received SIGHUP, swiched going_down flag to %s' %
20 haproxy.going_down_flag)
21
22
23def setup_sighup(event):
24 """Configure the SIGHUP handler. Called at startup."""
25 signal.signal(signal.SIGHUP, sighup_handler)
26
027
=== added file 'lib/canonical/launchpad/webapp/tests/test_haproxy.py'
--- lib/canonical/launchpad/webapp/tests/test_haproxy.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_haproxy.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,49 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Test the haproxy integration view."""
5
6__metaclass__ = type
7__all__ = []
8
9from canonical.testing.layers import FunctionalLayer
10from canonical.launchpad.webapp import haproxy
11from canonical.launchpad.webapp.dbpolicy import (
12 DatabaseBlockedPolicy,
13 LaunchpadDatabasePolicyFactory,
14 )
15from canonical.launchpad.webapp.servers import LaunchpadTestRequest
16
17from zope.app.testing.functional import HTTPCaller
18from lp.testing import TestCase
19
20
21class HAProxyIntegrationTest(TestCase):
22 layer = FunctionalLayer
23
24 def setUp(self):
25 TestCase.setUp(self)
26 self.http = HTTPCaller()
27 self.original_flag = haproxy.going_down_flag
28 self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
29
30 def test_HAProxyStatusView_all_good_returns_200(self):
31 result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
32 self.assertEquals(200, result.getStatus())
33
34 def test_HAProxyStatusView_going_down_returns_500(self):
35 haproxy.set_going_down_flag(True)
36 result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
37 self.assertEquals(500, result.getStatus())
38
39 def test_haproxy_url_uses_DatabaseBlocked_policy(self):
40 request = LaunchpadTestRequest(environ={'PATH_INFO': '/+haproxy'})
41 policy = LaunchpadDatabasePolicyFactory(request)
42 self.assertIsInstance(policy, DatabaseBlockedPolicy)
43
44 def test_switch_going_down_flag(self):
45 haproxy.set_going_down_flag(True)
46 haproxy.switch_going_down_flag()
47 self.assertEquals(False, haproxy.going_down_flag)
48 haproxy.switch_going_down_flag()
49 self.assertEquals(True, haproxy.going_down_flag)
050
=== added file 'lib/canonical/launchpad/webapp/tests/test_sighup.py'
--- lib/canonical/launchpad/webapp/tests/test_sighup.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_sighup.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,36 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Test the SIGHUP signal handler."""
5
6__metaclass__ = type
7
8import os
9import signal
10
11from canonical.testing.layers import FunctionalLayer
12from canonical.launchpad.webapp import haproxy
13from canonical.launchpad.webapp import sighup
14from lp.testing import TestCase
15
16
17class SIGHUPTestCase(TestCase):
18 layer = FunctionalLayer
19
20 def setUp(self):
21 TestCase.setUp(self)
22 self.original_handler = signal.getsignal(signal.SIGHUP)
23 self.addCleanup(signal.signal, signal.SIGHUP, self.original_handler)
24 sighup.setup_sighup(None)
25
26 self.original_flag = haproxy.going_down_flag
27 self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
28
29 def test_sighup(self):
30 # Sending SIGHUP should switch the PID
31 os.kill(os.getpid(), signal.SIGHUP)
32 self.assertEquals(not self.original_flag, haproxy.going_down_flag)
33
34 # Sending again should switch again.
35 os.kill(os.getpid(), signal.SIGHUP)
36 self.assertEquals(self.original_flag, haproxy.going_down_flag)
037
=== modified file 'utilities/page-performance-report.ini'
--- utilities/page-performance-report.ini 2010-11-05 19:58:32 +0000
+++ utilities/page-performance-report.ini 2010-12-17 21:57:48 +0000
@@ -3,7 +3,7 @@
3# Remeber to quote ?, ., + & ? characters to match litterally.3# Remeber to quote ?, ., + & ? characters to match litterally.
4# 'kodos' is useful for interactively testing regular expressions.4# 'kodos' is useful for interactively testing regular expressions.
5All Launchpad=.5All Launchpad=.
6All launchpad except opstats=(?<!\+opstats)$6All Launchpad except operational pages=(?<!\+opstats|\+haproxy)$
77
8Launchpad Frontpage=^https?://launchpad\.[^/]+/?$8Launchpad Frontpage=^https?://launchpad\.[^/]+/?$
99
@@ -46,7 +46,7 @@
46Shipit=^https?://shipit\.46Shipit=^https?://shipit\.
4747
48[metrics]48[metrics]
49ppr_all=All launchpad except opstats49ppr_all=All Launchpad except operational pages
50ppr_bugs=Bugs50ppr_bugs=Bugs
51ppr_api=API51ppr_api=API
52ppr_code=Code52ppr_code=Code