Merge lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck into lp:launchpad

Proposed by Michael Nelson
Status: Merged
Approved by: Brad Crittenden
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck
Merge into: lp:launchpad
Diff against target: 87 lines (+15/-16)
2 files modified
lib/lp/buildmaster/buildergroup.py (+2/-8)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+13/-8)
To merge this branch: bzr merge lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck
Reviewer Review Type Date Requested Status
Brad Crittenden (community) code Approve
Review via email: mp+16865@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :

= Summary =
This is a fix for the 2nd cause of bug 499095, where we sometimes get builds on builders with inconsistent states (date_started is not set because the job was already in the RUNNING state).

For details of the cause, see the comments:
https://bugs.edge.launchpad.net/soyuz/+bug/499095/comments/4
https://bugs.edge.launchpad.net/soyuz/+bug/499095/comments/11

== Proposed fix ==

Ensure that the job status is correctly reset back to WAITING after buildStatus_GIVENBACK() and buildStatus_BUILDERFAIL().

== Pre-implementation notes ==

Worked with wgrant to diagnose and fix this second cause.

We really need to update the bm logs so we can see the relevant info. Much of the bm logging is done as warning/debug which doesn't seem to be logged currently.

== Implementation details ==

Nothing weird.

== Tests ==

bin/test -vvt buildd-slavescanner.txt

== Demo and Q/A ==

We can QA this on dogfood (wgrant already QA'd the core of the fix locally).

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/buildmaster/buildergroup.py
  lib/lp/soyuz/doc/buildd-slavescanner.txt

Revision history for this message
Brad Crittenden (bac) wrote :

Changes look reasonable Michael. Thanks for the fix.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/buildmaster/buildergroup.py'
--- lib/lp/buildmaster/buildergroup.py 2009-12-03 15:07:17 +0000
+++ lib/lp/buildmaster/buildergroup.py 2010-01-05 16:52:18 +0000
@@ -521,10 +521,8 @@
521 "for its status"))521 "for its status"))
522 # simply reset job522 # simply reset job
523 build = getUtility(IBuildSet).getByQueueEntry(queueItem)523 build = getUtility(IBuildSet).getByQueueEntry(queueItem)
524 build.buildstate = BuildStatus.NEEDSBUILD
525 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)524 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)
526 queueItem.builder = None525 queueItem.reset()
527 queueItem.setDateStarted(None)
528526
529 def buildStatus_GIVENBACK(self, queueItem, librarian, buildid,527 def buildStatus_GIVENBACK(self, queueItem, librarian, buildid,
530 filemap=None, dependencies=None):528 filemap=None, dependencies=None):
@@ -537,16 +535,12 @@
537 self.logger.warning("***** %s is GIVENBACK by %s *****"535 self.logger.warning("***** %s is GIVENBACK by %s *****"
538 % (buildid, queueItem.builder.name))536 % (buildid, queueItem.builder.name))
539 build = getUtility(IBuildSet).getByQueueEntry(queueItem)537 build = getUtility(IBuildSet).getByQueueEntry(queueItem)
540 build.buildstate = BuildStatus.NEEDSBUILD
541 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)538 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)
542 # XXX cprov 2006-05-30: Currently this information is not539 # XXX cprov 2006-05-30: Currently this information is not
543 # properly presented in the Web UI. We will discuss it in540 # properly presented in the Web UI. We will discuss it in
544 # the next Paris Summit, infinity has some ideas about how541 # the next Paris Summit, infinity has some ideas about how
545 # to use this content. For now we just ensure it's stored.542 # to use this content. For now we just ensure it's stored.
546 queueItem.builder.cleanSlave()543 queueItem.builder.cleanSlave()
547 queueItem.builder = None544 queueItem.reset()
548 queueItem.setDateStarted(None)
549 queueItem.logtail = None
550 queueItem.lastscore = 0
551545
552546
553547
=== modified file 'lib/lp/soyuz/doc/buildd-slavescanner.txt'
--- lib/lp/soyuz/doc/buildd-slavescanner.txt 2009-12-14 20:36:19 +0000
+++ lib/lp/soyuz/doc/buildd-slavescanner.txt 2010-01-05 16:52:18 +0000
@@ -311,11 +311,10 @@
311 >>> build.buildstate.title311 >>> build.buildstate.title
312 'Chroot problem'312 'Chroot problem'
313313
314WAITING - BUILDERFAIL -> builder has failed by internal error, job is314WAITING - BUILDERFAIL -> builder has failed by internal error, job is available for next build round:
315available for next build round:
316315
317 >>> bqItem6 = a_build.createBuildQueueEntry()316 >>> bqItem6 = a_build.createBuildQueueEntry()
318 >>> setupBuildQueue(bqItem6, a_builder)317 >>> bqItem6.markAsBuilding(a_builder)
319 >>> last_stub_mail_count = len(stub.test_emails)318 >>> last_stub_mail_count = len(stub.test_emails)
320319
321Create a mock slave so the builder can operate - one with a builder error.320Create a mock slave so the builder can operate - one with a builder error.
@@ -336,8 +335,11 @@
336 >>> check_mail_sent(last_stub_mail_count)335 >>> check_mail_sent(last_stub_mail_count)
337 False336 False
338 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem6)337 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem6)
339 >>> build.buildstate.title338 >>> print build.buildstate.title
340 'Needs building'339 Needs building
340 >>> job = bqItem6.specific_job.job
341 >>> print job.status.title
342 Waiting
341343
342Cleanup in preparation for the next test:344Cleanup in preparation for the next test:
343345
@@ -556,7 +558,7 @@
556WAITING -> GIVENBACK - slave requested build record to be rescheduled.558WAITING -> GIVENBACK - slave requested build record to be rescheduled.
557559
558 >>> bqItem11 = a_build.createBuildQueueEntry()560 >>> bqItem11 = a_build.createBuildQueueEntry()
559 >>> setupBuildQueue(bqItem11, a_builder)561 >>> bqItem11.markAsBuilding(a_builder)
560 >>> last_stub_mail_count = len(stub.test_emails)562 >>> last_stub_mail_count = len(stub.test_emails)
561563
562Create a mock slave so the builder gets the right responses for this test.564Create a mock slave so the builder gets the right responses for this test.
@@ -580,8 +582,11 @@
580 >>> check_mail_sent(last_stub_mail_count)582 >>> check_mail_sent(last_stub_mail_count)
581 False583 False
582 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem11)584 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem11)
583 >>> build.buildstate.title585 >>> print build.buildstate.title
584 'Needs building'586 Needs building
587 >>> job = bqItem11.specific_job.job
588 >>> print job.status.title
589 Waiting
585590
586Cleanup in preparation for the next test:591Cleanup in preparation for the next test:
587592