Merge lp:~julian-edwards/launchpad/builder-reset-fail-bug-563353 into lp:launchpad

Proposed by Julian Edwards
Status: Merged
Approved by: Jonathan Lange
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~julian-edwards/launchpad/builder-reset-fail-bug-563353
Merge into: lp:launchpad
Diff against target: 428 lines (+198/-36)
3 files modified
lib/lp/buildmaster/manager.py (+33/-17)
lib/lp/buildmaster/tests/test_manager.py (+96/-19)
lib/lp/services/twistedsupport/processmonitor.py (+69/-0)
To merge this branch: bzr merge lp:~julian-edwards/launchpad/builder-reset-fail-bug-563353
Reviewer Review Type Date Requested Status
Jonathan Lange (community) Approve
Review via email: mp+23772@code.launchpad.net

Description of the change

Dear reviewer, if you're not familiar with twisted this branch is going to
make your head spin. Sorry. If you are familiar, it will still make your
head spin, because it's Soyuz, and it's one of the most complicated parts of
Soyuz. Sorry again.

= Summary =
Disable virtual builders that fail to reset

== Proposed fix ==
As per bug 563353, sometimes a PPA builder will fail to reset because the
reset command to its host fails. Right now, this is ignored and we blindly
try again and again and the builder remains idle because the dispatcher
ignores it on each scan once the reset fails.

What we need to do is to fail the builder and insert the results of the reset
command into the builder.failnotes column so that the buildd-admins can see
that the builder failed, and why.

== Pre-implementation notes ==
Chatted at length to jml about this since he's a twisted expert and I'm not!
Basically the buildd-manager is using the wrong way to time out a process as
we don't get to see its stdout/stderr.

The old way of using run_process_with_timeout() just returns a twisted Failure
in the Deferred callback.

jml explained to me how to set up a protocol.ProcessProtocol with a
TimeoutMixin so that we can capture a process's output and fire the timer in
the future if it didn't complete fast enough.

== Implementation details ==
I've implemented a new class "ProcessWithTimeout" that will handle managing an
external process while catching its stdout, stderr and return code, and
killing it if it overruns the timeout specified.

This could potentially move to lp.services in the future and gain its own unit
tests, but that is out of scope for this branch since it's mostly geared to
what the buildd-manager is doing.

Other changes:
 * resumeSlave() changed to use the new process spawning. It also takes an
optional twisted.internet.task.Clock to make testing easier as the clock can
be moved around at will, which makes testing timeouts much faster!

 * ResetDispatchResult now fails the builder (this is the class that
encapsulates the actions on builder reset failures) and pokes the failnotes on
the builder based on the stdout/stderr from the process.

== Tests ==
Lots of test updated in test_manager - some of them had some nasty bugs!

 * generally, I've changed "d.addCallback" or "d.addErrBack" calls to
d.addBoth. The problem with the former ones is that it allows the code to do
the wrong thing and the tests don't notice if a failure callback is generated
when it's only expecting success, or vice-versa.

 * config.pop moved to addCleanup() so that it's guaranteed to clean up.
Config changing is one of those nasty gotchas that's persistent across tests.

bin/test -cvv test_manager

== Demo and Q/A ==
Dogfood will get a beating when it's up.

= Launchpad lint =

I'm completely ignoring all this crack - pyflakes is returning total rubbish.

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/buildmaster/tests/test_manager.py
  lib/lp/buildmaster/manager.py

== Pyflakes notices ==

lib/lp/buildmaster/manager.py
    418: local variable 'candidate' is assigned to but never used

== Pylint notices ==

lib/lp/buildmaster/tests/test_manager.py
    8: [F0401] Unable to import 'transaction'
    18: [F0401] Unable to import 'zope.component'
    19: [F0401] Unable to import 'zope.security.proxy'
    21: [F0401] Unable to import 'canonical.buildd.tests'
    22: [F0401] Unable to import 'canonical.config'
    23: [F0401] Unable to import 'canonical.launchpad.ftests'
    24: [F0401] Unable to import 'canonical.launchpad.scripts.logger'
    25: [F0401] Unable to import 'canonical.testing.layers'
    27: [F0401] Unable to import 'lp.buildmaster.interfaces.buildbase'
    28: [F0401] Unable to import 'lp.buildmaster.interfaces.builder'
    29: [F0401] Unable to import 'lp.buildmaster.interfaces.buildqueue'
    30: [F0401] Unable to import 'lp.buildmaster.manager'
    33: [F0401] Unable to import 'lp.buildmaster.tests.harness'
    34: [F0401] Unable to import 'lp.registry.interfaces.distribution'
    35: [F0401] Unable to import 'lp.soyuz.interfaces.binarypackagebuild'
    36: [F0401] Unable to import 'lp.soyuz.tests.soyuzbuilddhelpers'
    37: [F0401] Unable to import 'lp.soyuz.tests.test_publishing'
    518: [F0401, TestBuilddManagerScan.assertBuildingJob] Unable to import
'lp.services.job.interfaces.job'

lib/lp/buildmaster/manager.py
    20: [F0401] Unable to import 'transaction'
    29: [F0401] Unable to import 'zope.component'
    31: [F0401] Unable to import 'canonical.config'
    32: [F0401] Unable to import 'canonical.launchpad.webapp'
    33: [F0401] Unable to import 'canonical.librarian.db'
    34: [F0401] Unable to import 'lp.buildmaster.interfaces.buildbase'
    235: [F0401, FailDispatchResult.__call__] Unable to import
'lp.buildmaster.interfaces.builder'
    255: [F0401, ResetDispatchResult.__call__] Unable to import
'lp.buildmaster.interfaces.builder'
    391: [F0401, BuilddManager.scan] Unable to import
'lp.buildmaster.interfaces.builder'

To post a comment you must log in.
Revision history for this message
Jonathan Lange (jml) wrote :

Hey Julian,

My head didn't spin while doing this review!

It's great to see the tests becoming faster *and* simpler while at the same
time the code is becoming more correct.

I have a bunch of comments, mostly shallow.

Thanks!
jml

=== modified file 'lib/lp/buildmaster/manager.py'

 * Move ProcessWithTimeout into lp.services.twistedsupport. It's generic
   enough to be re-used, and no one will ever find it in
   lp.buildmaster.manager.

 * Properly speaking, a class this big ought to have unit tests. I won't
   insist on it -- follow your heart.

 * 'deferred', 'outBuf', 'errBuf' and 'processTransport' should all begin
   with underscores if they are not used outside of the class. If they are
   used, they should be documented with :ivar deferred: etc as part of the
   class docstring.

 * 'outBuf', 'errBuf' and 'processTransport' should be formatted with
   underscores instead of spaces.

 * The line 'self.outReceived = self.outBuf.write' and its err twin will
   probably be a little hard to follow for someone who isn't familiar with
   ProcessProtocol. Could you please add a comment explaining what's going on?
   e.g.

     # outReceived and errReceived are callback methods on ProcessProtocol.
     # All we want to do when we receive stuff from stdout or stderr is store
     # it for later.

 * ProcessWithTimeout.spawnProcess might as well take *args and **kwargs and
   pass them on to reactor.spawnProcess.

 * In ProcessWithTimeout.timeoutConnection, I'm nearly certain that method
   does nothing to record a TimeoutError -- is the docstring correct?

 * You can get rid of a lot of the if/else business in checkResume by
   not using 'addBoth' to add it as a callback to the Deferred.

 * In the following line:
   + self.logger.error( '%s resume failure: %s' % (slave, error_text))
   The space after "error(" should be deleted for PEP 8 glory.

=== modified file 'lib/lp/buildmaster/tests/test_manager.py'

 * The "from twisted.internet.error import" line can be joined with the next

 * I don't think you need to do self.assertIsInstance(failure.value, tuple).
   You don't actually care if it's a tuple, you just care if it will unpack
   to the right values.

 * I don't grok the 'builderok' change at the end of the test file -- what's
   going on there?

 * Possibly related, but I don't see anything here that tests the new
   "builder.failBuilder(self.info)" command that you've added.

Hope this helps,
jml

review: Needs Fixing
Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 20 April 2010 19:34:36 you wrote:
> Review: Needs Fixing
> Hey Julian,
>
> My head didn't spin while doing this review!

Yes but it's cheating if you review it!

Thanks for stepping in though.

> It's great to see the tests becoming faster *and* simpler while at the same
> time the code is becoming more correct.

Hell yeah.

> === modified file 'lib/lp/buildmaster/manager.py'
>
> * Move ProcessWithTimeout into lp.services.twistedsupport. It's generic
> enough to be re-used, and no one will ever find it in
> lp.buildmaster.manager.

Done.

>
> * Properly speaking, a class this big ought to have unit tests. I won't
> insist on it -- follow your heart.

My heart says yes, my schedule says add an XXX.

:(

> * 'deferred', 'outBuf', 'errBuf' and 'processTransport' should all begin
> with underscores if they are not used outside of the class. If they are
> used, they should be documented with :ivar deferred: etc as part of the
> class docstring.

Right, all underscored.

> * 'outBuf', 'errBuf' and 'processTransport' should be formatted with
> underscores instead of spaces.

Assuming you mean camelCase -> under_scored, done.

> * The line 'self.outReceived = self.outBuf.write' and its err twin will
> probably be a little hard to follow for someone who isn't familiar with
> ProcessProtocol. Could you please add a comment explaining what's going
> on? e.g.
>
> # outReceived and errReceived are callback methods on ProcessProtocol.
> # All we want to do when we receive stuff from stdout or stderr is
> store # it for later.

Copied verbatim, thanks.

> * ProcessWithTimeout.spawnProcess might as well take *args and **kwargs
> and pass them on to reactor.spawnProcess.

Done.

> * In ProcessWithTimeout.timeoutConnection, I'm nearly certain that method
> does nothing to record a TimeoutError -- is the docstring correct?

Well I copied it from your example :)

I removed the reference!

> * You can get rid of a lot of the if/else business in checkResume by
> not using 'addBoth' to add it as a callback to the Deferred.

All sorted, the success case doesn't need a callback.

>
> * In the following line:
> + self.logger.error( '%s resume failure: %s' % (slave,
> error_text)) The space after "error(" should be deleted for PEP 8 glory.

Fixed.

> === modified file 'lib/lp/buildmaster/tests/test_manager.py'
>
> * The "from twisted.internet.error import" line can be joined with the
> next

Fixed.

> * I don't think you need to do self.assertIsInstance(failure.value,
> tuple). You don't actually care if it's a tuple, you just care if it will
> unpack to the right values.

Fixed.

> * I don't grok

Heinlein fan?

> the 'builderok' change at the end of the test file --
> what's going on there?

The new changes mean that when the resume fails, we also fail the builder,
therefore builderok == False.

> * Possibly related, but I don't see anything here that tests the new
> "builder.failBuilder(self.info)" command that you've added.

It's related, that was the test.

Partial diff attached, cheers!

Revision history for this message
Jonathan Lange (jml) wrote :

On Wed, Apr 21, 2010 at 6:26 PM, Julian Edwards
<email address hidden> wrote:
> On Tuesday 20 April 2010 19:34:36 you wrote:
>> Review: Needs Fixing
>> Hey Julian,
>>
>> My head didn't spin while doing this review!
>
> Yes but it's cheating if you review it!
>
> Thanks for stepping in though.
>

My pleasure.

>>  * Properly speaking, a class this big ought to have unit tests. I won't
>>    insist on it -- follow your heart.
>
> My heart says yes, my schedule says add an XXX.
>
> :(
>

That's OK. Maybe I'll get bored one evening and add some.

>>  * 'outBuf', 'errBuf' and 'processTransport' should be formatted with
>>    underscores instead of spaces.
>
> Assuming you mean camelCase -> under_scored, done.
>

Right. Thanks.

...
>>  * I don't grok
>
> Heinlein fan?

I've read some of his stuff, but I'm not especially a fan. I used to
hear the word a lot more a few years ago on IRC. Seems to have fallen
out of fashion. See also http://catb.org/jargon/html/G/grok.html

Everything looks good to me.

jml

Revision history for this message
Jonathan Lange (jml) :
review: Approve
Revision history for this message
Jonathan Lange (jml) wrote :

As discussed on IRC yesterday, I think r10734 is a step backwards. You could achieve the same result by leaving the errback as it was and adding a callback that swallows the result and returns None. This would leave fewer code branches in the error path, which means that we can spend less time messing about on dogfood when something else unexpected happens.

That said, the branch as it stands contains no regressions compared to *trunk*, and is definitely a net improvement, so I'm happy for you to land it as-is.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/buildmaster/manager.py'
2--- lib/lp/buildmaster/manager.py 2010-04-03 03:46:16 +0000
3+++ lib/lp/buildmaster/manager.py 2010-04-23 15:05:47 +0000
4@@ -15,10 +15,11 @@
5 ]
6
7 import logging
8+import os
9 import transaction
10
11 from twisted.application import service
12-from twisted.internet import reactor, defer
13+from twisted.internet import defer, reactor
14 from twisted.protocols.policies import TimeoutMixin
15 from twisted.python import log
16 from twisted.python.failure import Failure
17@@ -30,7 +31,7 @@
18 from canonical.launchpad.webapp import urlappend
19 from canonical.librarian.db import write_transaction
20 from lp.buildmaster.interfaces.buildbase import BUILDD_MANAGER_LOG_NAME
21-from lp.services.twistedsupport.processmonitor import run_process_with_timeout
22+from lp.services.twistedsupport.processmonitor import ProcessWithTimeout
23
24
25 buildd_success_result_map = {
26@@ -113,7 +114,7 @@
27 self.resume_requested = True
28 return ['', '', 0]
29
30- def resumeSlave(self):
31+ def resumeSlave(self, clock=None):
32 """Resume the builder in a asynchronous fashion.
33
34 Used the configuration command-line in the same way
35@@ -122,14 +123,20 @@
36 Also use the builddmaster configuration 'socket_timeout' as
37 the process timeout.
38
39+ :param clock: An optional twisted.internet.task.Clock to override
40+ the default clock. For use in tests.
41+
42 :return: a Deferred
43 """
44 resume_command = config.builddmaster.vm_resume_command % {
45 'vm_host': self.vm_host}
46 # Twisted API require string and the configuration provides unicode.
47 resume_argv = [str(term) for term in resume_command.split()]
48- d = run_process_with_timeout(
49- tuple(resume_argv), timeout=config.builddmaster.socket_timeout)
50+
51+ d = defer.Deferred()
52+ p = ProcessWithTimeout(
53+ d, config.builddmaster.socket_timeout, clock=clock)
54+ p.spawnProcess(resume_argv[0], tuple(resume_argv))
55 return d
56
57
58@@ -180,11 +187,11 @@
59 """Represents a failure to reset a builder.
60
61 When evaluated this object simply cleans up the running job
62- (`IBuildQueue`).
63+ (`IBuildQueue`) and marks the builder down.
64 """
65
66 def __repr__(self):
67- return '%r reset' % self.slave
68+ return '%r reset failure' % self.slave
69
70 @write_transaction
71 def __call__(self):
72@@ -192,6 +199,9 @@
73 from lp.buildmaster.interfaces.builder import IBuilderSet
74
75 builder = getUtility(IBuilderSet)[self.slave.name]
76+ # Builders that fail to reset should be disabled as per bug
77+ # 563353.
78+ builder.failBuilder(self.info)
79 self._cleanJob(builder.currentjob)
80
81
82@@ -357,18 +367,24 @@
83 return recording_slaves
84
85 def checkResume(self, response, slave):
86- """Verify the results of a slave resume procedure.
87+ """Deal with a slave resume failure.
88
89- If it failed, it returns a corresponding `ResetDispatchResult`
90- dispatch result.
91+ Return a corresponding `ResetDispatchResult` dispatch result,
92+ which is chained to the next callback, dispatchBuild.
93 """
94- if not isinstance(response, Failure):
95- return None
96+ # 'response' is the tuple that's constructed in
97+ # ProcessWithTimeout.processEnded(), or is a Failure that
98+ # contains the tuple.
99+ if isinstance(response, Failure):
100+ out, err, code = response.value
101+ else:
102+ out, err, code = response
103+ if code == os.EX_OK:
104+ return None
105
106- self.logger.error(
107- '%s resume failure: %s' % (slave, response.getErrorMessage()))
108- self.slaveDone(slave)
109- return self.reset_result(slave)
110+ error_text = '%s\n%s' % (out, err)
111+ self.logger.error('%s resume failure: %s' % (slave, error_text))
112+ return self.reset_result(slave, error_text)
113
114 def checkDispatch(self, response, method, slave):
115 """Verify the results of a slave xmlrpc call.
116@@ -442,10 +458,10 @@
117 If the slave resuming succeed, it starts the XMLRPC dialog. See
118 `_mayDispatch` for more information.
119 """
120- self.logger.info('Dispatching: %s' % slave)
121 if resume_result is not None:
122 self.slaveDone(slave)
123 return resume_result
124+ self.logger.info('Dispatching: %s' % slave)
125 self._mayDispatch(slave)
126
127 def _getProxyForSlave(self, slave):
128
129=== modified file 'lib/lp/buildmaster/tests/test_manager.py'
130--- lib/lp/buildmaster/tests/test_manager.py 2010-04-12 16:23:13 +0000
131+++ lib/lp/buildmaster/tests/test_manager.py 2010-04-23 15:05:47 +0000
132@@ -4,12 +4,13 @@
133 """Tests for the renovated slave scanner aka BuilddManager."""
134
135 import os
136+import signal
137 import transaction
138 import unittest
139
140 from twisted.internet import defer
141-from twisted.internet.error import (
142- ConnectionClosed, ProcessTerminated, TimeoutError)
143+from twisted.internet.error import ConnectionClosed
144+from twisted.internet.task import Clock
145 from twisted.python.failure import Failure
146 from twisted.trial.unittest import TestCase as TrialTestCase
147
148@@ -109,9 +110,11 @@
149
150 # On success the response is None.
151 def check_resume_success(response):
152- self.assertEquals(None, response)
153+ out, err, code = response
154+ self.assertEqual(os.EX_OK, code)
155+ self.assertEqual("%s\n" % self.slave.vm_host, out)
156 d = self.slave.resumeSlave()
157- d.addCallback(check_resume_success)
158+ d.addBoth(check_resume_success)
159 return d
160
161 def test_resumeHost_failure(self):
162@@ -124,15 +127,16 @@
163 vm_resume_command: test "%(vm_host)s = 'no-sir'"
164 """
165 config.push('failed_resume_command', failed_config)
166+ self.addCleanup(config.pop, 'failed_resume_command')
167
168 # On failures, the response is a twisted `Failure` object containing
169- # a `ProcessTerminated` error.
170+ # a tuple.
171 def check_resume_failure(failure):
172- self.assertIsInstance(failure, Failure)
173- self.assertIsInstance(failure.value, ProcessTerminated)
174- config.pop('failed_resume_command')
175+ out, err, code = failure.value
176+ # The process will exit with a return code of "1".
177+ self.assertEqual(code, 1)
178 d = self.slave.resumeSlave()
179- d.addErrback(check_resume_failure)
180+ d.addBoth(check_resume_failure)
181 return d
182
183 def test_resumeHost_timeout(self):
184@@ -146,15 +150,21 @@
185 socket_timeout: 1
186 """
187 config.push('timeout_resume_command', timeout_config)
188+ self.addCleanup(config.pop, 'timeout_resume_command')
189
190 # On timeouts, the response is a twisted `Failure` object containing
191 # a `TimeoutError` error.
192 def check_resume_timeout(failure):
193 self.assertIsInstance(failure, Failure)
194- self.assertIsInstance(failure.value, TimeoutError)
195- config.pop('timeout_resume_command')
196- d = self.slave.resumeSlave()
197- d.addErrback(check_resume_timeout)
198+ out, err, code = failure.value
199+ self.assertEqual(code, signal.SIGKILL)
200+ clock = Clock()
201+ d = self.slave.resumeSlave(clock=clock)
202+ # Move the clock beyond the socket_timeout but earlier than the
203+ # sleep 5. This stops the test having to wait for the timeout.
204+ # Fast tests FTW!
205+ clock.advance(2)
206+ d.addBoth(check_resume_timeout)
207 return d
208
209
210@@ -236,6 +246,7 @@
211 # Stop automatic collection of dispatching results.
212 def testSlaveDone(slave):
213 pass
214+ self._realSlaveDone = self.manager.slaveDone
215 self.manager.slaveDone = testSlaveDone
216
217 def testFinishCycle(self):
218@@ -274,7 +285,7 @@
219 # They corresponds to the ones created above and were already
220 # processed.
221 self.assertEqual(
222- '<foo:http://foo> reset', repr(reset))
223+ '<foo:http://foo> reset failure', repr(reset))
224 self.assertTrue(reset.processed)
225 self.assertEqual(
226 '<bar:http://bar> failure (boingo)', repr(fail))
227@@ -293,7 +304,7 @@
228 isinstance(result, TestingFailDispatchResult),
229 'Dispatch failure did not result in a FailBuildResult object')
230
231- def testCheckResume(self):
232+ def test_checkResume(self):
233 """`BuilddManager.checkResume` is chained after resume requests.
234
235 If the resume request succeed it returns None, otherwise it returns
236@@ -310,11 +321,76 @@
237 self.assertEqual(
238 None, result, 'Successful resume checks should return None')
239
240- failed_response = Failure(ProcessTerminated())
241+ failed_response = ['stdout', 'stderr', 1]
242 result = self.manager.checkResume(failed_response, slave)
243 self.assertIsDispatchReset(result)
244 self.assertEqual(
245- '<foo:http://foo.buildd:8221/> reset', repr(result))
246+ '<foo:http://foo.buildd:8221/> reset failure', repr(result))
247+ self.assertEqual(
248+ result.info, "stdout\nstderr")
249+
250+ def test_fail_to_resume_slave_resets_slave(self):
251+ # If an attempt to resume and dispatch a slave fails, we reset the
252+ # slave by calling self.reset_result(slave)().
253+
254+ reset_result_calls = []
255+ class LoggingResetResult(BaseDispatchResult):
256+ """A DispatchResult that logs calls to itself.
257+
258+ This *must* subclass BaseDispatchResult, otherwise finishCycle()
259+ won't treat it like a dispatch result.
260+ """
261+ def __init__(self, slave, info=None):
262+ self.slave = slave
263+ def __call__(self):
264+ reset_result_calls.append(self.slave)
265+
266+ # Make a failing slave that is requesting a resume.
267+ slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
268+ slave.resume_requested = True
269+ slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1)))
270+
271+ # Make the manager log the reset result calls.
272+ self.manager.reset_result = LoggingResetResult
273+ # Restore the slaveDone method. It's very relevant to this test.
274+ self.manager.slaveDone = self._realSlaveDone
275+ # We only care about this one slave. Reset the list of manager
276+ # deferreds in case setUp did something unexpected.
277+ self.manager._deferreds = []
278+
279+ self.manager.resumeAndDispatch([slave])
280+ # Note: finishCycle isn't generally called by external users, normally
281+ # resumeAndDispatch or slaveDone calls it. However, these calls
282+ # swallow the Deferred that finishCycle returns, and we need that
283+ # Deferred to make sure this test completes properly.
284+ d = self.manager.finishCycle()
285+ return d.addCallback(
286+ lambda ignored: self.assertEqual([slave], reset_result_calls))
287+
288+ def test_failed_to_resume_slave_ready_for_reset(self):
289+ # When a slave fails to resume, the manager has a Deferred in its
290+ # Deferred list that is ready to fire with a ResetDispatchResult.
291+
292+ # Make a failing slave that is requesting a resume.
293+ slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
294+ slave.resume_requested = True
295+ slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1)))
296+
297+ # We only care about this one slave. Reset the list of manager
298+ # deferreds in case setUp did something unexpected.
299+ self.manager._deferreds = []
300+ # Restore the slaveDone method. It's very relevant to this test.
301+ self.manager.slaveDone = self._realSlaveDone
302+ self.manager.resumeAndDispatch([slave])
303+ [d] = self.manager._deferreds
304+
305+ # The Deferred for our failing slave should be ready to fire
306+ # successfully with a ResetDispatchResult.
307+ def check_result(result):
308+ self.assertIsInstance(result, ResetDispatchResult)
309+ self.assertEqual(slave, result.slave)
310+ self.assertFalse(result.processed)
311+ return d.addCallback(check_result)
312
313 def testCheckDispatch(self):
314 """`BuilddManager.checkDispatch` is chained after dispatch requests.
315@@ -366,7 +442,7 @@
316 twisted_failure, 'ensurepresent', slave)
317 self.assertIsDispatchReset(result)
318 self.assertEqual(
319- '<foo:http://foo.buildd:8221/> reset', repr(result))
320+ '<foo:http://foo.buildd:8221/> reset failure', repr(result))
321
322 # Unexpected response, results in a `FailDispatchResult`.
323 unexpected_response = [1, 2, 3]
324@@ -746,6 +822,7 @@
325 Although it keeps the builder active in pool.
326 """
327 builder, job_id = self._getBuilder('bob')
328+ builder.builderok = True
329
330 # Setup a interaction to satisfy 'write_transaction' decorator.
331 login(ANONYMOUS)
332@@ -755,7 +832,7 @@
333
334 self.assertJobIsClean(job_id)
335
336- self.assertTrue(builder.builderok)
337+ self.assertFalse(builder.builderok)
338 self.assertEqual(None, builder.currentjob)
339
340 def testFailDispatchResult(self):
341
342=== modified file 'lib/lp/services/twistedsupport/processmonitor.py'
343--- lib/lp/services/twistedsupport/processmonitor.py 2009-07-26 14:19:49 +0000
344+++ lib/lp/services/twistedsupport/processmonitor.py 2010-04-23 15:05:47 +0000
345@@ -7,10 +7,14 @@
346 __all__ = [
347 'ProcessMonitorProtocol',
348 'ProcessMonitorProtocolWithTimeout',
349+ 'ProcessWithTimeout',
350 'run_process_with_timeout',
351 ]
352
353
354+import os
355+import StringIO
356+
357 from twisted.internet import defer, error, reactor
358 from twisted.internet.protocol import ProcessProtocol
359 from twisted.protocols.policies import TimeoutMixin
360@@ -253,3 +257,68 @@
361 executable = args[0]
362 reactor.spawnProcess(p, executable, args)
363 return d
364+
365+
366+class ProcessWithTimeout(ProcessProtocol, TimeoutMixin):
367+ """Run a process and capture its output while applying a timeout."""
368+
369+ # XXX Julian 2010-04-21
370+ # This class doesn't have any unit tests yet, it's used by
371+ # lib/lp/buildmaster/manager.py which tests its features indirectly.
372+
373+ def __init__(self, deferred, timeout, clock=None):
374+ self._deferred = deferred
375+ self._clock = clock
376+ self._timeout = timeout
377+ self._out_buf = StringIO.StringIO()
378+ self._err_buf = StringIO.StringIO()
379+ self._process_transport = None
380+
381+ # outReceived and errReceived are callback methods on
382+ # ProcessProtocol.
383+ # All we want to do when we receive stuff from stdout
384+ # or stderr is store it for later.
385+ self.outReceived = self._out_buf.write
386+ self.errReceived = self._err_buf.write
387+
388+ def callLater(self, period, func):
389+ """Override TimeoutMixin.callLater so we use self._clock.
390+
391+ This allows us to write unit tests that don't depend on actual wall
392+ clock time.
393+ """
394+ if self._clock is None:
395+ return TimeoutMixin.callLater(self, period, func)
396+
397+ return self._clock.callLater(period, func)
398+
399+ def spawnProcess(self, *args, **kwargs):
400+ """Start a process.
401+
402+ See reactor.spawnProcess.
403+ """
404+ self._process_transport = reactor.spawnProcess(
405+ self, *args, **kwargs)
406+
407+ def connectionMade(self):
408+ """Start the timeout counter when connection is made."""
409+ self.setTimeout(self._timeout)
410+
411+ def timeoutConnection(self):
412+ """When a timeout occurs, kill the process with a SIGKILL."""
413+ self._process_transport.signalProcess("KILL")
414+ # processEnded will get called.
415+
416+ def processEnded(self, reason):
417+ self.setTimeout(None)
418+ out = self._out_buf.getvalue()
419+ err = self._err_buf.getvalue()
420+ e = reason.value
421+ code = e.exitCode
422+ if e.signal:
423+ self._deferred.errback((out, err, e.signal))
424+ elif code != os.EX_OK:
425+ self._deferred.errback((out, err, code))
426+ else:
427+ self._deferred.callback((out, err, code))
428+