Merge lp:~flacoste/launchpad/bug-688503 into lp:launchpad

Proposed by Francis J. Lacoste
Status: Merged
Approved by: Gary Poster
Approved revision: no longer in the source branch.
Merged at revision: 12104
Proposed branch: lp:~flacoste/launchpad/bug-688503
Merge into: lp:launchpad
Diff against target: 293 lines (+197/-9)
8 files modified
lib/canonical/launchpad/webapp/configure.zcml (+11/-0)
lib/canonical/launchpad/webapp/dbpolicy.py (+2/-2)
lib/canonical/launchpad/webapp/haproxy.py (+63/-0)
lib/canonical/launchpad/webapp/publication.py (+8/-5)
lib/canonical/launchpad/webapp/sighup.py (+26/-0)
lib/canonical/launchpad/webapp/tests/test_haproxy.py (+49/-0)
lib/canonical/launchpad/webapp/tests/test_sighup.py (+36/-0)
utilities/page-performance-report.ini (+2/-2)
To merge this branch: bzr merge lp:~flacoste/launchpad/bug-688503
Reviewer                 Review Type  Date Requested  Status
Gary Poster (community)                               Approve
Review via email: mp+44102@code.launchpad.net

Commit message

[r=gary][ui=none][bug=688503] Add a /+haproxy monitoring URL that can be controlled through SIGHUP. This provides an easy way to take app servers out of rotation for graceful shutdown and restart.

Description of the change

Hi,

(Looks like I'm missing a plugin...)

This implements the LOSA request described in bug 688503. It adds a view at
/+haproxy that returns status 200 or 500 based on a global flag. That flag is
toggled by the SIGHUP signal (suggested by elmo).
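
As a rough illustration of the idea only (not the code in this branch, which
appears in the diff), a module-level flag flipped by a signal handler can
decide the probe status. The names going_down, toggle_going_down,
probe_status and install_handler are hypothetical:

```python
import signal

# Hypothetical module-level flag: True means "take me out of rotation".
going_down = False


def toggle_going_down(signum=None, frame=None):
    """Flip the flag. The signature also works as a signal handler."""
    global going_down
    going_down = not going_down


def probe_status():
    """Status code a probe view would answer with for the current flag."""
    return 500 if going_down else 200


def install_handler():
    """Register the toggle for SIGHUP (Unix only, main thread only)."""
    signal.signal(signal.SIGHUP, toggle_going_down)
```

Calling install_handler() once at startup would be enough; every later HUP
then flips the status between 200 and 500.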

This URL will be used as the probe URL in HAProxy. Normally it returns 200,
which tells HAProxy that the app server is functioning normally. When the
request fails or returns 500, HAProxy will stop sending requests to that
app server.

This will allow app servers to be restarted without interrupting user
requests: LOSA will send the HUP signal, which stops HAProxy from dispatching
new requests to that server. The deployment script will then monitor the
HAProxy status board until all requests already dispatched have completed.
The app server can then be restarted.
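
The sequence above could be sketched roughly as follows. This is only an
assumption about what a deployment script might look like: drain_then_restart,
count_active and restart are hypothetical names, and how the number of active
requests is read from the HAProxy status board is left to the caller.

```python
import os
import signal
import time


def drain_then_restart(pid, count_active, restart, send_signal=os.kill,
                       poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Send HUP, wait for in-flight requests to finish, then restart.

    count_active is a caller-supplied callable returning the number of
    requests HAProxy still has open against this app server.
    Returns False, without restarting, if the server does not drain in time.
    """
    send_signal(pid, signal.SIGHUP)
    waited = 0.0
    while count_active() > 0:
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
    restart()
    return True
```

Because the handler toggles the flag, such a script should send HUP exactly
once per restart: a second HUP would put the server back into rotation.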

Previously the probe URL was /+opstats, and we have some special-case rules
for that view. I've implemented the same exceptions for /+haproxy, since it
also should not talk to the DB and should be excluded from the PPR.
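
For the PPR exclusion, the updated category pattern can be checked quickly.
This sketch assumes the report evaluates its patterns with Python's re module;
counted_in_ppr_all is a hypothetical helper, and only the pattern itself comes
from the .ini change in this branch.

```python
import re

# Pattern taken from the updated page-performance-report.ini entry
# "All Launchpad except operational pages".
OPERATIONAL_EXCLUDED = re.compile(r'(?<!\+opstats|\+haproxy)$')


def counted_in_ppr_all(url):
    """True if the URL falls in the 'except operational pages' category."""
    return OPERATIONAL_EXCLUDED.search(url) is not None
```

The two lookbehind alternatives are the same length, which is what lets
Python accept them inside a single negative lookbehind.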

Tests can be run with test -vvm canonical.launchpad.webapp.tests.test_haproxy
and canonical.launchpad.webapp.tests.test_sighup.

QA can be done by sending the HUP signal to the app server, looking at the
log, and visiting the /+haproxy URL.
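
A small helper along these lines could stand in for visiting the URL by hand
during QA. It is a sketch under assumptions: probe_status is a hypothetical
name, and the base URL passed in is whatever address the app server under
test listens on.

```python
import urllib.error
import urllib.request


def probe_status(base_url, opener=urllib.request.urlopen):
    """Return the HTTP status of <base_url>/+haproxy.

    urlopen raises HTTPError for a 500 response, so the status code is
    read from the exception in that case.
    """
    url = base_url.rstrip('/') + '/+haproxy'
    try:
        with opener(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Expected behaviour during QA: 200 before the first HUP, 500 after it, and 200
again after a second HUP.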

Let me know if you have any questions.

Gary Poster (gary) wrote :

Looks good, thank you.

In the PPR, you could maybe change "All launchpad except opstats" to "All launchpad except haproxy pages". I don't think it really matters, but feel free if you want to.

review: Approve
Francis J. Lacoste (flacoste) wrote :

I changed it to 'All Launchpad except operational pages', since +opstats is still used for Nagios checks and other metrics gathering.

Preview Diff

=== modified file 'lib/canonical/launchpad/webapp/configure.zcml'
--- lib/canonical/launchpad/webapp/configure.zcml 2010-10-14 23:03:41 +0000
+++ lib/canonical/launchpad/webapp/configure.zcml 2010-12-17 21:57:48 +0000
@@ -331,6 +331,10 @@
     <!-- Signal handlers -->
     <subscriber
         for="zope.app.appsetup.IProcessStartingEvent"
+        handler="canonical.launchpad.webapp.sighup.setup_sighup"
+        />
+    <subscriber
+        for="zope.app.appsetup.IProcessStartingEvent"
         handler="canonical.launchpad.webapp.sigusr1.setup_sigusr1"
         />
     <subscriber
@@ -431,4 +435,11 @@
         factory="canonical.launchpad.webapp.namespace.FormNamespaceView"
         />
 
+    <!-- Registrations to support +haproxy status url. -->
+    <browser:page
+        for="canonical.launchpad.webapp.interfaces.ILaunchpadRoot"
+        name="+haproxy"
+        permission="zope.Public"
+        class="canonical.launchpad.webapp.haproxy.HAProxyStatusView"
+        />
 </configure>

=== modified file 'lib/canonical/launchpad/webapp/dbpolicy.py'
--- lib/canonical/launchpad/webapp/dbpolicy.py 2010-11-16 06:29:36 +0000
+++ lib/canonical/launchpad/webapp/dbpolicy.py 2010-12-17 21:57:48 +0000
@@ -193,13 +193,13 @@
 def LaunchpadDatabasePolicyFactory(request):
     """Return the Launchpad IDatabasePolicy for the current appserver state.
     """
-    # We need to select a non-load balancing DB policy for +opstats so
+    # We need to select a non-load balancing DB policy for some status URLs so
     # it doesn't query the DB for lag information (this page should not
     # hit the database at all). We haven't traversed yet, so we have
     # to sniff the request this way. Even though PATH_INFO is always
     # present in real requests, we need to tread carefully (``get``) because
     # of test requests in our automated tests.
-    if request.get('PATH_INFO') == u'/+opstats':
+    if request.get('PATH_INFO') in [u'/+opstats', u'/+haproxy']:
         return DatabaseBlockedPolicy(request)
     elif is_read_only():
         return ReadOnlyLaunchpadDatabasePolicy(request)

=== added file 'lib/canonical/launchpad/webapp/haproxy.py'
--- lib/canonical/launchpad/webapp/haproxy.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/haproxy.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,63 @@
+# Copyright 2010 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Implementation of the HAProxy probe URL."""
+
+
+__metaclass__ = type
+__all__ = [
+    'HAProxyStatusView',
+    'set_going_down_flag',
+    'switch_going_down_flag',
+    ]
+
+# This is the global flag. When it is True, the HAProxy view
+# will return 500; it returns 200 otherwise.
+going_down_flag = False
+
+
+def set_going_down_flag(state):
+    """Sets the going_down_flag to state"""
+    global going_down_flag
+    going_down_flag = state
+
+
+def switch_going_down_flag():
+    """Switch the going down flag.
+
+    This is registered as a signal handler for HUP.
+    """
+    global going_down_flag
+    going_down_flag = not going_down_flag
+
+
+class HAProxyStatusView:
+    """
+    View implementing the HAProxy probe URL.
+
+    HAProxy doesn't support programmatically taking servers in and out of
+    rotation. It does however use a probe URL that it scans regularly to see
+    if the server is still alive. The /+haproxy is that URL for us.
+
+    If it times out or returns a non-200 status, it will take the server out
+    of rotation, until the probe works again.
+
+    This allows us to send a signal (HUP) to the app servers when we want to
+    restart them. The probe URL will then return 500 and the app server will
+    be taken out of rotation. Once HAProxy reports that all connections to the
+    app servers are finished, we can restart the server safely.
+    """
+
+    def __init__(self, context, request):
+        self.context = context
+        self.request = request
+
+    def __call__(self):
+        """Return 200 or 500 depending on the global flag."""
+        global going_down_flag
+        if going_down_flag:
+            self.request.response.setStatus(500)
+            return u"May day! May day! I'm going down. Stop the flood gate."
+        else:
+            self.request.response.setStatus(200)
+            return u"Everything is groovy. Keep them coming!"

=== modified file 'lib/canonical/launchpad/webapp/publication.py'
--- lib/canonical/launchpad/webapp/publication.py 2010-11-02 01:34:05 +0000
+++ lib/canonical/launchpad/webapp/publication.py 2010-12-17 21:57:48 +0000
@@ -456,9 +456,11 @@
         # And spit the pageid out to our tracelog.
         tracelog(request, 'p', pageid)
 
-        # For opstats, where we really don't want to have any DB access at all,
-        # ensure that all flag lookups will stop early.
-        if pageid in ('RootObject:OpStats', 'RootObject:+opstats'):
+        # For status URLs, where we really don't want to have any DB access
+        # at all, ensure that all flag lookups will stop early.
+        if pageid in (
+            'RootObject:OpStats', 'RootObject:+opstats',
+            'RootObject:+haproxy'):
             request.features = NullFeatureController()
             features.per_thread.features = request.features
 
@@ -466,8 +468,9 @@
         # to control the hard timeout, and they trigger DB access, but our
         # DB tracers are not safe for reentrant use, so we must do this
         # outside of the SQL stack. We must also do it after traversal so that
-        # the view is known and can be used in scope resolution. As we actually
-        # stash the pageid after afterTraversal, we need to do this even later.
+        # the view is known and can be used in scope resolution. As we
+        # actually stash the pageid after afterTraversal, we need to do this
+        # even later.
         da.set_permit_timeout_from_features(True)
         da._get_request_timeout()
 

=== added file 'lib/canonical/launchpad/webapp/sighup.py'
--- lib/canonical/launchpad/webapp/sighup.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/sighup.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,26 @@
+# Copyright 2010 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Signal handler for SIGHUP."""
+
+__metaclass__ = type
+__all__ = []
+
+import logging
+import signal
+
+from canonical.launchpad.webapp import haproxy
+
+
+def sighup_handler(signum, frame):
+    """Switch the state of the HAProxy going_down flag."""
+    haproxy.switch_going_down_flag()
+    logging.getLogger('sighup').info(
+        'Received SIGHUP, switched going_down flag to %s' %
+        haproxy.going_down_flag)
+
+
+def setup_sighup(event):
+    """Configure the SIGHUP handler. Called at startup."""
+    signal.signal(signal.SIGHUP, sighup_handler)
+

=== added file 'lib/canonical/launchpad/webapp/tests/test_haproxy.py'
--- lib/canonical/launchpad/webapp/tests/test_haproxy.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_haproxy.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,49 @@
+# Copyright 2010 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the haproxy integration view."""
+
+__metaclass__ = type
+__all__ = []
+
+from canonical.testing.layers import FunctionalLayer
+from canonical.launchpad.webapp import haproxy
+from canonical.launchpad.webapp.dbpolicy import (
+    DatabaseBlockedPolicy,
+    LaunchpadDatabasePolicyFactory,
+    )
+from canonical.launchpad.webapp.servers import LaunchpadTestRequest
+
+from zope.app.testing.functional import HTTPCaller
+from lp.testing import TestCase
+
+
+class HAProxyIntegrationTest(TestCase):
+    layer = FunctionalLayer
+
+    def setUp(self):
+        TestCase.setUp(self)
+        self.http = HTTPCaller()
+        self.original_flag = haproxy.going_down_flag
+        self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
+
+    def test_HAProxyStatusView_all_good_returns_200(self):
+        result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
+        self.assertEquals(200, result.getStatus())
+
+    def test_HAProxyStatusView_going_down_returns_500(self):
+        haproxy.set_going_down_flag(True)
+        result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
+        self.assertEquals(500, result.getStatus())
+
+    def test_haproxy_url_uses_DatabaseBlocked_policy(self):
+        request = LaunchpadTestRequest(environ={'PATH_INFO': '/+haproxy'})
+        policy = LaunchpadDatabasePolicyFactory(request)
+        self.assertIsInstance(policy, DatabaseBlockedPolicy)
+
+    def test_switch_going_down_flag(self):
+        haproxy.set_going_down_flag(True)
+        haproxy.switch_going_down_flag()
+        self.assertEquals(False, haproxy.going_down_flag)
+        haproxy.switch_going_down_flag()
+        self.assertEquals(True, haproxy.going_down_flag)

=== added file 'lib/canonical/launchpad/webapp/tests/test_sighup.py'
--- lib/canonical/launchpad/webapp/tests/test_sighup.py 1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_sighup.py 2010-12-17 21:57:48 +0000
@@ -0,0 +1,36 @@
+# Copyright 2010 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the SIGHUP signal handler."""
+
+__metaclass__ = type
+
+import os
+import signal
+
+from canonical.testing.layers import FunctionalLayer
+from canonical.launchpad.webapp import haproxy
+from canonical.launchpad.webapp import sighup
+from lp.testing import TestCase
+
+
+class SIGHUPTestCase(TestCase):
+    layer = FunctionalLayer
+
+    def setUp(self):
+        TestCase.setUp(self)
+        self.original_handler = signal.getsignal(signal.SIGHUP)
+        self.addCleanup(signal.signal, signal.SIGHUP, self.original_handler)
+        sighup.setup_sighup(None)
+
+        self.original_flag = haproxy.going_down_flag
+        self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
+
+    def test_sighup(self):
+        # Sending SIGHUP should switch the flag.
+        os.kill(os.getpid(), signal.SIGHUP)
+        self.assertEquals(not self.original_flag, haproxy.going_down_flag)
+
+        # Sending again should switch again.
+        os.kill(os.getpid(), signal.SIGHUP)
+        self.assertEquals(self.original_flag, haproxy.going_down_flag)

=== modified file 'utilities/page-performance-report.ini'
--- utilities/page-performance-report.ini 2010-11-05 19:58:32 +0000
+++ utilities/page-performance-report.ini 2010-12-17 21:57:48 +0000
@@ -3,7 +3,7 @@
 # Remember to quote ?, ., + & ? characters to match literally.
 # 'kodos' is useful for interactively testing regular expressions.
 All Launchpad=.
-All launchpad except opstats=(?<!\+opstats)$
+All Launchpad except operational pages=(?<!\+opstats|\+haproxy)$
 
 Launchpad Frontpage=^https?://launchpad\.[^/]+/?$
 
@@ -46,7 +46,7 @@
 Shipit=^https?://shipit\.
 
 [metrics]
-ppr_all=All launchpad except opstats
+ppr_all=All Launchpad except operational pages
 ppr_bugs=Bugs
 ppr_api=API
 ppr_code=Code