implement haproxy check page that can be forced to return a 500 error

Bug #688503 reported by Tom Haddon
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Francis J. Lacoste
Milestone: (none)

Bug Description

This bug is intended to describe the solution to RT#41503. Essentially what we're looking for is:

- A lightweight page that haproxy can check (this would be checked every 2 seconds)
- A means of forcing that page to return a 500 (this could be triggered by sending a signal, checking for the presence of a file, etc.); a minimal sketch of this follows the list
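
As an illustration of the idea (a minimal sketch, not the actual Launchpad implementation), a tiny WSGI app could serve the check page and latch into the broken state on a signal. The /+check path, the port, and the response bodies are assumptions for this sketch; "groovy" is borrowed from the QA comment further down.

    import signal
    from wsgiref.simple_server import make_server

    going_down = False  # one-way flag: once down, stay down until restart

    def mark_going_down(signum, frame):
        global going_down
        going_down = True

    # SIGHUP flips the page into the broken state (and never back).
    signal.signal(signal.SIGHUP, mark_going_down)

    def app(environ, start_response):
        if environ.get('PATH_INFO') != '/+check':
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'not found']
        if going_down:
            # haproxy treats any non-2xx/3xx response as a failed check.
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')])
            return [b'disabled']
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'groovy']

    if __name__ == '__main__':
        make_server('0.0.0.0', 8080, app).serve_forever()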

This will allow us to implement "no downtime" rolling upgrades (a sample haproxy configuration follows the list) by:

1) Force the check page to return a 500
2) Watch haproxy to confirm the service is reported as down, and wait until it reports no active connections to the instance in question
3) Stop the instance in question using the normal shutdown process
4) Start the service again using the new code
5) Haproxy will automatically see the new instance as "up" and start sending traffic to it
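
On the haproxy side, the check would look something like the following backend stanza. The backend name, the addresses, and the /+check path are made up for illustration; "inter 2000" gives the 2-second check interval mentioned above.

    backend launchpad-appservers
        # Any response other than 2xx/3xx counts as a failed check.
        option httpchk GET /+check
        # inter 2000: poll every 2000 ms; fall 3: mark down after three
        # failed checks; rise 2: mark up again after two good ones.
        server app1 10.0.0.1:8080 check inter 2000 fall 3 rise 2
        server app2 10.0.0.2:8080 check inter 2000 fall 3 rise 2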


Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Gary Poster (gary)
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → High
tags: added: bugjam2010
Changed in launchpad:
assignee: nobody → Francis J. Lacoste (flacoste)
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote: Bug fixed by a commit
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Gary Poster (gary)
tags: added: qa-bad
removed: qa-needstesting
Revision history for this message
Gary Poster (gary) wrote:

The fix did not work on qastaging. Chex HUPped the appserver processes, but I continued to get 200/"groovy" responses from wget afterwards. Chex verified that neither haproxy nor squid was in front of qastaging.

lifeless requested that the HUP always turn the appserver into the "broken" state, rather than flipping it back and forth.
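
To make the "flip back and forth" concern concrete, a toggling handler along these lines (hypothetical, not Launchpad's actual code) would silently re-enable the check page on every second HUP; the one-way latch sketched under the bug description is what lifeless is asking for instead.

    import signal

    broken = False

    def toggle_on_hup(signum, frame):
        # Toggling: a stray or repeated HUP puts the appserver straight
        # back into rotation mid-upgrade.
        global broken
        broken = not broken

    signal.signal(signal.SIGHUP, toggle_on_hup)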

I am not sure right now why this did not work.

If I have to be the one to complete it, I will try to get it working locally, as Francis did. Then, if nothing comes to mind as to what might have gone wrong, I may add some logging (while making the change lifeless suggests). I don't know yet whether I'll have time for this.

Revision history for this message
Gary Poster (gary) wrote:

I'm changing this to qa-ok because there is no reason I know of for this not to be deployed. It does not break anything, and it does not expose broken functionality to the end user. That said, I'll move this bug back to Triaged.

tags: added: qa-ok
removed: qa-bad
Changed in launchpad:
status: Fix Committed → Triaged
Changed in launchpad:
status: Triaged → Fix Committed
Revision history for this message
Francis J. Lacoste (flacoste) wrote:

OK, it doesn't work in production because the servers are started through nohup... which makes them ignore SIGHUP!

I need to use another signal: SIGWINCH? SIGRTMIN?
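
For context, this is standard Unix behaviour rather than anything Launchpad-specific: nohup starts the child with SIGHUP set to SIG_IGN. An explicitly installed handler still overrides that, but code following the convention of leaving inherited-ignored signals alone would never install one, e.g. (a generic sketch):

    import signal

    # Under nohup, signal.getsignal(signal.SIGHUP) reports the inherited
    # SIG_IGN disposition, so this guard skips installing the handler.
    if signal.getsignal(signal.SIGHUP) is not signal.SIG_IGN:
        signal.signal(signal.SIGHUP, lambda signum, frame: None)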

Changed in launchpad:
status: Fix Committed → In Progress
Revision history for this message
Francis J. Lacoste (flacoste) wrote:

It's working fine, actually. The HUP disappearing was because the signal was sent to the wrong process!

Changed in launchpad:
status: In Progress → Fix Released