Merge lp:~jameinel/launchpad/lp-service into lp:launchpad/db-devel

Proposed by John A Meinel
Status: Superseded
Proposed branch: lp:~jameinel/launchpad/lp-service
Merge into: lp:launchpad/db-devel
Diff against target: 273 lines (+194/-11)
2 files modified
bzrplugins/lpserve/__init__.py (+53/-1)
bzrplugins/lpserve/test_lpserve.py (+141/-10)
To merge this branch: bzr merge lp:~jameinel/launchpad/lp-service
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle Approve
Launchpad code reviewers Pending
Review via email: mp+40386@code.launchpad.net

This proposal has been superseded by a proposal from 2010-11-29.

Commit message

Add '--pid-file' to launchpad-forking-service

Description of the change

This adds a --pid-file option to "bzr launchpad-forking-service". Which then causes the service to Daemonize itself, and write its pid to the given file.

This was brought out of the discussion of trying to get the service rolled out into qastaging.
see https://rt.admin.canonical.com/Ticket/Display.html?=42199

It would seem that 'start-stop-daemon' could be told to do the forking and pid-file management (--background and --make-pidfile), but it seems to state that "This is a last resort, and is only meant for programs that either make no sense forking on their own, or where it's not feasible to add the code for them to do this themselves."

The change seems reasonably localized, and hopefully reasonably tested. In general with IPC, I'm not always sure how to avoid both race conditions, hangs, and spurious failures. I chose 2.0s as a way to wait for the process to cleanup properly, but not block forever and hang the test suite.

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

Note that I still need some reviewer hand-holding to submit this to ec2, etc.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

As I said on IRC, I think you should wait until the daemon process is accepting requests before proceeding in _start_subprocess.

The 'out' variable in _daemonize is strangely located and named -- can you call it something clearer and assign to it closer to its use? I don't think we need to chdir to /, so please delete that comment. I don't think we really need to setsid either, but it does no harm I guess.

It's a bit of a shame you had to write (or copy/paste) so much code for the process handling, but oh well. There are good reasons this daemon isn't using twistd.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

setsid() is recommended from the daemonize code I've read:
  http://www.itp.uzh.ch/~dpotter/howto/daemonize

os.chdir('/') is sort of recommended, but doesn't seem to be required.

I moved the '/dev/null' definition closer.

I think that addresses everything. If you could give it a once-over and then send it via ec2land, I would appreciate it.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrplugins/lpserve/__init__.py'
--- bzrplugins/lpserve/__init__.py 2010-10-07 23:43:40 +0000
+++ bzrplugins/lpserve/__init__.py 2010-11-08 22:47:54 +0000
@@ -568,6 +568,10 @@
568 self.log(client_addr, 'request timeout failure: %s' % (e,))568 self.log(client_addr, 'request timeout failure: %s' % (e,))
569 conn.sendall('FAILURE\nrequest timed out\n')569 conn.sendall('FAILURE\nrequest timed out\n')
570 conn.close()570 conn.close()
571 except Exception, e:
572 trace.log_exception_quietly()
573 self.log(client_addr, 'trapped a failure while handling'
574 ' connection: %s' % (e,))
571 self._poll_children()575 self._poll_children()
572576
573 def log(self, client_addr, message):577 def log(self, client_addr, message):
@@ -759,6 +763,8 @@
759 help="Do/don't preload libraries before startup."),763 help="Do/don't preload libraries before startup."),
760 Option('children-timeout', type=int, argname='SEC',764 Option('children-timeout', type=int, argname='SEC',
761 help="Only wait SEC seconds for children to exit"),765 help="Only wait SEC seconds for children to exit"),
766 Option('pid-file', type=unicode,
767 help='Write the process PID to this file.')
762 ]768 ]
763769
764 def _preload_libraries(self):770 def _preload_libraries(self):
@@ -768,8 +774,48 @@
768 except ImportError, e:774 except ImportError, e:
769 trace.mutter('failed to preload %s: %s' % (pyname, e))775 trace.mutter('failed to preload %s: %s' % (pyname, e))
770776
777
778 def _daemonize(self, pid_filename):
779 """Turn this process into a child-of-init daemon.
780
781 Upon request, we relinquish our control and switch to daemon mode,
782 writing out the final pid of the daemon process.
783 """
784 # If fork fails, it will bubble out naturally and be reported by the
785 # cmd logic
786 pid = os.fork()
787 if pid > 0:
788 # Original process exits cleanly
789 os._exit(0)
790
791 # Disconnect from the parent process
792 os.setsid()
793
794 # fork again, to truly become a daemon.
795 pid = os.fork()
796 if pid > 0:
797 os._exit(0)
798
799 # Redirect file handles
800 stdin = open('/dev/null', 'r')
801 os.dup2(stdin.fileno(), sys.stdin.fileno())
802 stdout = open('/dev/null', 'a+')
803 os.dup2(stdout.fileno(), sys.stdout.fileno())
804 stderr = open('/dev/null', 'a+', 0)
805 os.dup2(stderr.fileno(), sys.stderr.fileno())
806
807 # Now that we are a daemon, let people know what pid is running
808 f = open(pid_filename, 'wb')
809 try:
810 f.write('%d\n' % (os.getpid(),))
811 finally:
812 f.close()
813
771 def run(self, path=None, perms=None, preload=True,814 def run(self, path=None, perms=None, preload=True,
772 children_timeout=LPForkingService.WAIT_FOR_CHILDREN_TIMEOUT):815 children_timeout=LPForkingService.WAIT_FOR_CHILDREN_TIMEOUT,
816 pid_file=None):
817 if pid_file is not None:
818 self._daemonize(pid_file)
773 if path is None:819 if path is None:
774 path = LPForkingService.DEFAULT_PATH820 path = LPForkingService.DEFAULT_PATH
775 if perms is None:821 if perms is None:
@@ -781,6 +827,12 @@
781 service = LPForkingService(path, perms)827 service = LPForkingService(path, perms)
782 service.WAIT_FOR_CHILDREN_TIMEOUT = children_timeout828 service.WAIT_FOR_CHILDREN_TIMEOUT = children_timeout
783 service.main_loop()829 service.main_loop()
830 if pid_file is not None:
831 try:
832 os.remove(pid_file)
833 except (OSError, IOError), e:
834 trace.mutter('Failed to cleanup pid_file: %s\n%s'
835 % (pid_file, e))
784836
785register_command(cmd_launchpad_forking_service)837register_command(cmd_launchpad_forking_service)
786838
787839
=== modified file 'bzrplugins/lpserve/test_lpserve.py'
--- bzrplugins/lpserve/test_lpserve.py 2010-10-07 23:43:40 +0000
+++ bzrplugins/lpserve/test_lpserve.py 2010-11-08 22:47:54 +0000
@@ -1,6 +1,7 @@
1# Copyright 2010 Canonical Ltd. This software is licensed under the1# Copyright 2010 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).2# GNU Affero General Public License version 3 (see the file LICENSE).
33
4import errno
4import os5import os
5import signal6import signal
6import socket7import socket
@@ -383,6 +384,18 @@
383 self.assertContainsRe(path, '/lp-forking-service-child-')384 self.assertContainsRe(path, '/lp-forking-service-child-')
384 return path, pid, sock385 return path, pid, sock
385386
387 def _start_subprocess(self, path, env_changes):
388 proc = self.start_bzr_subprocess(
389 ['lp-service', '--path', path, '--no-preload',
390 '--children-timeout=1'],
391 env_changes=env_changes)
392 trace.mutter('started lp-service subprocess')
393 expected = 'Listening on socket: %s\n' % (path,)
394 path_line = proc.stderr.readline()
395 trace.mutter(path_line)
396 self.assertEqual(expected, path_line)
397 return proc
398
386 def start_service_subprocess(self):399 def start_service_subprocess(self):
387 # Make sure this plugin is exposed to the subprocess400 # Make sure this plugin is exposed to the subprocess
388 # SLOOWWW (~2 seconds, which is why we are doing the work anyway)401 # SLOOWWW (~2 seconds, which is why we are doing the work anyway)
@@ -406,16 +419,7 @@
406 os.remove(path) # service wants create it as a socket419 os.remove(path) # service wants create it as a socket
407 env_changes = {'BZR_PLUGIN_PATH': lpserve.__path__[0],420 env_changes = {'BZR_PLUGIN_PATH': lpserve.__path__[0],
408 'BZR_LOG': tempname}421 'BZR_LOG': tempname}
409 proc = self.start_bzr_subprocess(422 proc = self._start_subprocess(path, env_changes)
410 ['lp-service', '--path', path, '--no-preload',
411 '--children-timeout=1'],
412 env_changes=env_changes)
413 trace.mutter('started lp-service subprocess')
414 expected = 'Listening on socket: %s\n' % (path,)
415 path_line = proc.stderr.readline()
416 trace.mutter(path_line)
417 self.assertEqual(expected, path_line)
418 # The process won't delete it, so we do
419 return proc, path423 return proc, path
420424
421 def stop_service(self):425 def stop_service(self):
@@ -469,6 +473,9 @@
469 code = int(response.split('\n', 1)[1])473 code = int(response.split('\n', 1)[1])
470 self.assertEqual(expected_code, code)474 self.assertEqual(expected_code, code)
471475
476
477class TestLPServiceInSubprocess(TestCaseWithLPForkingServiceSubprocess):
478
472 def test_fork_lp_serve_hello(self):479 def test_fork_lp_serve_hello(self):
473 path, _, sock = self.send_fork_request('lp-serve --inet 2')480 path, _, sock = self.send_fork_request('lp-serve --inet 2')
474 stdout_content, stderr_content = self.communicate_with_fork(path,481 stdout_content, stderr_content = self.communicate_with_fork(path,
@@ -532,3 +539,127 @@
532539
533 def test_sigint_exits_nicely(self):540 def test_sigint_exits_nicely(self):
534 self._check_exits_nicely(signal.SIGINT)541 self._check_exits_nicely(signal.SIGINT)
542
543
544class TestCaseWithLPForkingServiceDaemon(
545 TestCaseWithLPForkingServiceSubprocess):
546 """Test LPForkingService interaction, when run in daemon mode."""
547
548 def _cleanup_daemon(self, pid, pid_filename):
549 try:
550 os.kill(pid, signal.SIGKILL)
551 except (OSError, IOError), e:
552 trace.mutter('failed to kill pid %d, might be already dead: %s'
553 % (pid, e))
554 try:
555 os.remove(pid_filename)
556 except (OSError, IOError), e:
557 if e.errno != errno.ENOENT:
558 trace.mutter('failed to remove %r: %s'
559 % (pid_filename, e))
560
561 def _start_subprocess(self, path, env_changes):
562 fd, pid_filename = tempfile.mkstemp(prefix='tmp-lp-forking-service-',
563 suffix='.pid')
564 self.service_pid_filename = pid_filename
565 os.close(fd)
566 proc = self.start_bzr_subprocess(
567 ['lp-service', '--path', path, '--no-preload',
568 '--children-timeout=1', '--pid-file', pid_filename],
569 env_changes=env_changes)
570 trace.mutter('started lp-service daemon')
571 # We wait for the spawned process to exit, expecting it to report the
572 # final pid into the pid_filename.
573 tnow = time.time()
574 tstop_waiting = tnow + 1.0
575 # When this returns, the first fork has completed and the parent has
576 # exited.
577 proc.wait()
578 while tnow < tstop_waiting:
579 # Wait for the socket to become available
580 if os.path.exists(path):
581 # The service has created the socket for us to communicate
582 break
583 time.sleep(0.1)
584 tnow = time.time()
585
586 with open(pid_filename, 'rb') as f:
587 pid = f.read()
588 trace.mutter('found pid: %r' % (pid,))
589 pid = int(pid.strip())
590 # This is now the pid of the final daemon
591 trace.mutter('lp-forking-service daemon at pid %s' % (pid,))
592 # Because nothing else will clean this up, add this final handler to
593 # clean up if all else fails.
594 self.addCleanup(self._cleanup_daemon, pid, pid_filename)
595 # self.service_process will now be the pid of the daemon, rather than a
596 # Popen object.
597 return pid
598
599 def stop_service(self):
600 if self.service_process is None:
601 # Already stopped
602 return
603 # First, try to stop the service gracefully, by sending a 'quit'
604 # message
605 try:
606 response = self.send_message_to_service('quit\n')
607 except socket.error, e:
608 # Ignore a failure to connect, the service must be stopping/stopped
609 # already
610 response = None
611 if response is not None:
612 self.assertEqual('ok\nquit command requested... exiting\n',
613 response)
614 # Wait for the process to actually exit, or force it if necessary.
615 tnow = time.time()
616 tend = tnow + 2.0
617 # We'll be nice a couple of times, and then get mean
618 attempts = [None, None, None, signal.SIGTERM, signal.SIGKILL]
619 stopped = False
620 unclean = False
621 while tnow < tend:
622 try:
623 os.kill(self.service_process, 0)
624 except (OSError, IOError), e:
625 if e.errno == errno.ESRCH:
626 # The process has successfully exited
627 stopped = True
628 break
629 raise
630 else:
631 # The process has not exited yet
632 time.sleep(0.1)
633 if attempts:
634 sig = attempts.pop(0)
635 if sig is not None:
636 unclean = True
637 try:
638 os.kill(self.service_process, sig)
639 except (OSError, IOError), e:
640 if e.errno == errno.ESRCH:
641 stopped = True
642 break
643 raise
644 if not stopped:
645 self.fail('Unable to stop the daemon process (pid %s) after 2.0s'
646 % (self.service_process,))
647 elif unclean:
648 self.fail('Process (pid %s) had to be shut-down'
649 % (self.service_process,))
650 self.service_process = None
651
652 def test_simple_start_and_stop(self):
653 pass # All the work is done in setUp()
654
655 def test_starts_and_cleans_up(self):
656 # The service should be up and responsive
657 response = self.send_message_to_service('hello\n')
658 self.assertEqual('ok\nyep, still alive\n', response)
659 self.failUnless(os.path.isfile(self.service_pid_filename))
660 with open(self.service_pid_filename, 'rb') as f:
661 content = f.read()
662 self.assertEqualDiff('%d\n' % (self.service_process,), content)
663 # We're done, shut it down
664 self.stop_service()
665 self.failIf(os.path.isfile(self.service_pid_filename))