Merge lp:~jameinel/launchpad/lp-service into lp:launchpad/db-devel

Proposed by John A Meinel
Status: Superseded
Proposed branch: lp:~jameinel/launchpad/lp-service
Merge into: lp:launchpad/db-devel
Diff against target: 273 lines (+194/-11)
2 files modified
bzrplugins/lpserve/__init__.py (+53/-1)
bzrplugins/lpserve/test_lpserve.py (+141/-10)
To merge this branch: bzr merge lp:~jameinel/launchpad/lp-service
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle Approve
Launchpad code reviewers Pending
Review via email: mp+40386@code.launchpad.net

This proposal has been superseded by a proposal from 2010-11-29.

Commit message

Add '--pid-file' to launchpad-forking-service

Description of the change

This adds a --pid-file option to "bzr launchpad-forking-service". Which then causes the service to Daemonize itself, and write its pid to the given file.

This was brought out of the discussion of trying to get the service rolled out into qastaging.
see https://rt.admin.canonical.com/Ticket/Display.html?=42199

It would seem that 'start-stop-daemon' could be told to do the forking and pid-file management (--background and --make-pidfile), but it seems to state that "This is a last resort, and is only meant for programs that either make no sense forking on their own, or where it's not feasible to add the code for them to do this themselves."

The change seems reasonably localized, and hopefully reasonably tested. In general with IPC, I'm not always sure how to avoid both race conditions, hangs, and spurious failures. I chose 2.0s as a way to wait for the process to cleanup properly, but not block forever and hang the test suite.

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

Note that I still need some reviewer hand-holding to submit this to ec2, etc.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

As I said on IRC, I think you should wait until the daemon process is accepting requests before proceeding in _start_subprocess.

The 'out' variable in _daemonize is strangely located and named -- can you call it something clearer and assign to it closer to its use? I don't think we need to chdir to /, so please delete that comment. I don't think we really need to setsid either, but it does no harm I guess.

It's a bit of a shame you had to write (or copy/paste) so much code for the process handling, but oh well. There are good reasons this daemon isn't using twistd.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

setsid() is recommended from the daemonize code I've read:
  http://www.itp.uzh.ch/~dpotter/howto/daemonize

os.chdir('/') is sort of recommended, but doesn't seem to be required.

I moved the '/dev/null' definition closer.

I think that addresses everything. If you could give it a once-over and then send it via ec2land, I would appreciate it.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrplugins/lpserve/__init__.py'
2--- bzrplugins/lpserve/__init__.py 2010-10-07 23:43:40 +0000
3+++ bzrplugins/lpserve/__init__.py 2010-11-08 22:47:54 +0000
4@@ -568,6 +568,10 @@
5 self.log(client_addr, 'request timeout failure: %s' % (e,))
6 conn.sendall('FAILURE\nrequest timed out\n')
7 conn.close()
8+ except Exception, e:
9+ trace.log_exception_quietly()
10+ self.log(client_addr, 'trapped a failure while handling'
11+ ' connection: %s' % (e,))
12 self._poll_children()
13
14 def log(self, client_addr, message):
15@@ -759,6 +763,8 @@
16 help="Do/don't preload libraries before startup."),
17 Option('children-timeout', type=int, argname='SEC',
18 help="Only wait SEC seconds for children to exit"),
19+ Option('pid-file', type=unicode,
20+ help='Write the process PID to this file.')
21 ]
22
23 def _preload_libraries(self):
24@@ -768,8 +774,48 @@
25 except ImportError, e:
26 trace.mutter('failed to preload %s: %s' % (pyname, e))
27
28+
29+ def _daemonize(self, pid_filename):
30+ """Turn this process into a child-of-init daemon.
31+
32+ Upon request, we relinquish our control and switch to daemon mode,
33+ writing out the final pid of the daemon process.
34+ """
35+ # If fork fails, it will bubble out naturally and be reported by the
36+ # cmd logic
37+ pid = os.fork()
38+ if pid > 0:
39+ # Original process exits cleanly
40+ os._exit(0)
41+
42+ # Disconnect from the parent process
43+ os.setsid()
44+
45+ # fork again, to truly become a daemon.
46+ pid = os.fork()
47+ if pid > 0:
48+ os._exit(0)
49+
50+ # Redirect file handles
51+ stdin = open('/dev/null', 'r')
52+ os.dup2(stdin.fileno(), sys.stdin.fileno())
53+ stdout = open('/dev/null', 'a+')
54+ os.dup2(stdout.fileno(), sys.stdout.fileno())
55+ stderr = open('/dev/null', 'a+', 0)
56+ os.dup2(stderr.fileno(), sys.stderr.fileno())
57+
58+ # Now that we are a daemon, let people know what pid is running
59+ f = open(pid_filename, 'wb')
60+ try:
61+ f.write('%d\n' % (os.getpid(),))
62+ finally:
63+ f.close()
64+
65 def run(self, path=None, perms=None, preload=True,
66- children_timeout=LPForkingService.WAIT_FOR_CHILDREN_TIMEOUT):
67+ children_timeout=LPForkingService.WAIT_FOR_CHILDREN_TIMEOUT,
68+ pid_file=None):
69+ if pid_file is not None:
70+ self._daemonize(pid_file)
71 if path is None:
72 path = LPForkingService.DEFAULT_PATH
73 if perms is None:
74@@ -781,6 +827,12 @@
75 service = LPForkingService(path, perms)
76 service.WAIT_FOR_CHILDREN_TIMEOUT = children_timeout
77 service.main_loop()
78+ if pid_file is not None:
79+ try:
80+ os.remove(pid_file)
81+ except (OSError, IOError), e:
82+ trace.mutter('Failed to cleanup pid_file: %s\n%s'
83+ % (pid_file, e))
84
85 register_command(cmd_launchpad_forking_service)
86
87
88=== modified file 'bzrplugins/lpserve/test_lpserve.py'
89--- bzrplugins/lpserve/test_lpserve.py 2010-10-07 23:43:40 +0000
90+++ bzrplugins/lpserve/test_lpserve.py 2010-11-08 22:47:54 +0000
91@@ -1,6 +1,7 @@
92 # Copyright 2010 Canonical Ltd. This software is licensed under the
93 # GNU Affero General Public License version 3 (see the file LICENSE).
94
95+import errno
96 import os
97 import signal
98 import socket
99@@ -383,6 +384,18 @@
100 self.assertContainsRe(path, '/lp-forking-service-child-')
101 return path, pid, sock
102
103+ def _start_subprocess(self, path, env_changes):
104+ proc = self.start_bzr_subprocess(
105+ ['lp-service', '--path', path, '--no-preload',
106+ '--children-timeout=1'],
107+ env_changes=env_changes)
108+ trace.mutter('started lp-service subprocess')
109+ expected = 'Listening on socket: %s\n' % (path,)
110+ path_line = proc.stderr.readline()
111+ trace.mutter(path_line)
112+ self.assertEqual(expected, path_line)
113+ return proc
114+
115 def start_service_subprocess(self):
116 # Make sure this plugin is exposed to the subprocess
117 # SLOOWWW (~2 seconds, which is why we are doing the work anyway)
118@@ -406,16 +419,7 @@
119 os.remove(path) # service wants create it as a socket
120 env_changes = {'BZR_PLUGIN_PATH': lpserve.__path__[0],
121 'BZR_LOG': tempname}
122- proc = self.start_bzr_subprocess(
123- ['lp-service', '--path', path, '--no-preload',
124- '--children-timeout=1'],
125- env_changes=env_changes)
126- trace.mutter('started lp-service subprocess')
127- expected = 'Listening on socket: %s\n' % (path,)
128- path_line = proc.stderr.readline()
129- trace.mutter(path_line)
130- self.assertEqual(expected, path_line)
131- # The process won't delete it, so we do
132+ proc = self._start_subprocess(path, env_changes)
133 return proc, path
134
135 def stop_service(self):
136@@ -469,6 +473,9 @@
137 code = int(response.split('\n', 1)[1])
138 self.assertEqual(expected_code, code)
139
140+
141+class TestLPServiceInSubprocess(TestCaseWithLPForkingServiceSubprocess):
142+
143 def test_fork_lp_serve_hello(self):
144 path, _, sock = self.send_fork_request('lp-serve --inet 2')
145 stdout_content, stderr_content = self.communicate_with_fork(path,
146@@ -532,3 +539,127 @@
147
148 def test_sigint_exits_nicely(self):
149 self._check_exits_nicely(signal.SIGINT)
150+
151+
152+class TestCaseWithLPForkingServiceDaemon(
153+ TestCaseWithLPForkingServiceSubprocess):
154+ """Test LPForkingService interaction, when run in daemon mode."""
155+
156+ def _cleanup_daemon(self, pid, pid_filename):
157+ try:
158+ os.kill(pid, signal.SIGKILL)
159+ except (OSError, IOError), e:
160+ trace.mutter('failed to kill pid %d, might be already dead: %s'
161+ % (pid, e))
162+ try:
163+ os.remove(pid_filename)
164+ except (OSError, IOError), e:
165+ if e.errno != errno.ENOENT:
166+ trace.mutter('failed to remove %r: %s'
167+ % (pid_filename, e))
168+
169+ def _start_subprocess(self, path, env_changes):
170+ fd, pid_filename = tempfile.mkstemp(prefix='tmp-lp-forking-service-',
171+ suffix='.pid')
172+ self.service_pid_filename = pid_filename
173+ os.close(fd)
174+ proc = self.start_bzr_subprocess(
175+ ['lp-service', '--path', path, '--no-preload',
176+ '--children-timeout=1', '--pid-file', pid_filename],
177+ env_changes=env_changes)
178+ trace.mutter('started lp-service daemon')
179+ # We wait for the spawned process to exit, expecting it to report the
180+ # final pid into the pid_filename.
181+ tnow = time.time()
182+ tstop_waiting = tnow + 1.0
183+ # When this returns, the first fork has completed and the parent has
184+ # exited.
185+ proc.wait()
186+ while tnow < tstop_waiting:
187+ # Wait for the socket to become available
188+ if os.path.exists(path):
189+ # The service has created the socket for us to communicate
190+ break
191+ time.sleep(0.1)
192+ tnow = time.time()
193+
194+ with open(pid_filename, 'rb') as f:
195+ pid = f.read()
196+ trace.mutter('found pid: %r' % (pid,))
197+ pid = int(pid.strip())
198+ # This is now the pid of the final daemon
199+ trace.mutter('lp-forking-service daemon at pid %s' % (pid,))
200+ # Because nothing else will clean this up, add this final handler to
201+ # clean up if all else fails.
202+ self.addCleanup(self._cleanup_daemon, pid, pid_filename)
203+ # self.service_process will now be the pid of the daemon, rather than a
204+ # Popen object.
205+ return pid
206+
207+ def stop_service(self):
208+ if self.service_process is None:
209+ # Already stopped
210+ return
211+ # First, try to stop the service gracefully, by sending a 'quit'
212+ # message
213+ try:
214+ response = self.send_message_to_service('quit\n')
215+ except socket.error, e:
216+ # Ignore a failure to connect, the service must be stopping/stopped
217+ # already
218+ response = None
219+ if response is not None:
220+ self.assertEqual('ok\nquit command requested... exiting\n',
221+ response)
222+ # Wait for the process to actually exit, or force it if necessary.
223+ tnow = time.time()
224+ tend = tnow + 2.0
225+ # We'll be nice a couple of times, and then get mean
226+ attempts = [None, None, None, signal.SIGTERM, signal.SIGKILL]
227+ stopped = False
228+ unclean = False
229+ while tnow < tend:
230+ try:
231+ os.kill(self.service_process, 0)
232+ except (OSError, IOError), e:
233+ if e.errno == errno.ESRCH:
234+ # The process has successfully exited
235+ stopped = True
236+ break
237+ raise
238+ else:
239+ # The process has not exited yet
240+ time.sleep(0.1)
241+ if attempts:
242+ sig = attempts.pop(0)
243+ if sig is not None:
244+ unclean = True
245+ try:
246+ os.kill(self.service_process, sig)
247+ except (OSError, IOError), e:
248+ if e.errno == errno.ESRCH:
249+ stopped = True
250+ break
251+ raise
252+ if not stopped:
253+ self.fail('Unable to stop the daemon process (pid %s) after 2.0s'
254+ % (self.service_process,))
255+ elif unclean:
256+ self.fail('Process (pid %s) had to be shut-down'
257+ % (self.service_process,))
258+ self.service_process = None
259+
260+ def test_simple_start_and_stop(self):
261+ pass # All the work is done in setUp()
262+
263+ def test_starts_and_cleans_up(self):
264+ # The service should be up and responsive
265+ response = self.send_message_to_service('hello\n')
266+ self.assertEqual('ok\nyep, still alive\n', response)
267+ self.failUnless(os.path.isfile(self.service_pid_filename))
268+ with open(self.service_pid_filename, 'rb') as f:
269+ content = f.read()
270+ self.assertEqualDiff('%d\n' % (self.service_process,), content)
271+ # We're done, shut it down
272+ self.stop_service()
273+ self.failIf(os.path.isfile(self.service_pid_filename))