Merge lp:~mwhudson/launchpad/back-off-failing-imports-bug-413637 into lp:launchpad/db-devel

Proposed by Michael Hudson-Doyle
Status: Rejected
Rejected by: Michael Hudson-Doyle
Proposed branch: lp:~mwhudson/launchpad/back-off-failing-imports-bug-413637
Merge into: lp:launchpad/db-devel
Diff against target: 873 lines (+431/-102)
13 files modified
lib/canonical/buildd/debian/launchpad-buildd.init (+1/-1)
lib/lp/buildmaster/interfaces/buildfarmjobbehavior.py (+1/-1)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+94/-10)
lib/lp/buildmaster/tests/test_buildfarmjobbehavior.py (+80/-8)
lib/lp/code/interfaces/codeimportjob.py (+7/-3)
lib/lp/code/model/codeimportjob.py (+26/-21)
lib/lp/code/model/recipebuilder.py (+11/-2)
lib/lp/code/model/tests/test_codeimportjob.py (+25/-0)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+4/-3)
lib/lp/testing/factory.py (+87/-40)
lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+41/-12)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+17/-1)
lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+37/-0)
To merge this branch: bzr merge lp:~mwhudson/launchpad/back-off-failing-imports-bug-413637
Reviewer Review Type Date Requested Status
Canonical Launchpad Engineering Pending
Review via email: mp+21408@code.launchpad.net

Description of the change

Hi there,

This small branch changes the code import system to exponentially back off on retrying code imports -- see the linked bug for more.

Currently it backs off so that it tries imports at 6, 12, 24 and 48 hours (for subversion) for a total of three and a bit days of trying before marking an import failed -- is that enough? It seems OK, though perhaps a bit on the low side, to me.

Cheers,
mwh

To post a comment you must log in.
Revision history for this message
Jonathan Lange (jml) wrote :
Download full text (6.6 KiB)

On Mon, Mar 15, 2010 at 10:22 PM, Michael Hudson
<email address hidden> wrote:
> Michael Hudson has proposed merging lp:~mwhudson/launchpad/back-off-failing-imports-bug-413637 into lp:launchpad.
>
> Requested reviews:
>  Canonical Launchpad Engineering (launchpad)
> Related bugs:
>  #413637 imports disabled too rapidly
>  https://bugs.launchpad.net/bugs/413637
>
>
> Hi there,
>
> This small branch changes the code import system to exponentially back off on retrying code imports -- see the linked bug for more.
>

I think the branch is bigger than you intended. It's got a bunch of
build farm job stuff in the diff.

> Currently it backs off so that it tries imports at 6, 12, 24 and 48 hours (for subversion) for a total of three and a bit days of trying before marking an import failed -- is that enough?  It seems OK, though perhaps a bit on the low side, to me.
>

Sounds fine to me.

...
> === modified file 'lib/lp/code/model/codeimportjob.py'
> --- lib/lp/code/model/codeimportjob.py  2010-02-24 10:18:16 +0000
> +++ lib/lp/code/model/codeimportjob.py  2010-03-15 22:22:20 +0000
> @@ -10,6 +10,8 @@
>     'CodeImportJobWorkflow',
>     ]
>
> +import datetime
> +
>  from sqlobject import ForeignKey, IntCol, SQLObjectNotFound, StringCol
>
>  from zope.component import getUtility
> @@ -140,7 +142,7 @@
>
>     implements(ICodeImportJobWorkflow)
>
> -    def newJob(self, code_import, date_due=None):
> +    def newJob(self, code_import, interval=None):
>         """See `ICodeImportJobWorkflow`."""
>         assert code_import.review_status == CodeImportReviewStatus.REVIEWED, (
>             "Review status of %s is not REVIEWED: %s" % (
> @@ -149,23 +151,22 @@
>             "Already associated to a CodeImportJob: %s" % (
>             code_import.branch.unique_name))
>
> +        if interval is None:
> +            interval = code_import.effective_update_interval
> +
>         job = CodeImportJob(code_import=code_import, date_due=UTC_NOW)
>
> -        if date_due is None:
> -            # Find the most recent CodeImportResult for this CodeImport. We
> -            # sort by date_created because we do not have an index on
> -            # date_job_started in the database, and that should give the same
> -            # sort order.
> -            most_recent_result_list = list(CodeImportResult.selectBy(
> -                code_import=code_import).orderBy(['-date_created']).limit(1))
> +        # Find the most recent CodeImportResult for this CodeImport. We
> +        # sort by date_created because we do not have an index on
> +        # date_job_started in the database, and that should give the same
> +        # sort order.
> +        most_recent_result_list = list(CodeImportResult.selectBy(
> +            code_import=code_import).orderBy(['-date_created']).limit(1))
>

Why aren't you using Storm, with its 'one()' notation for this?

> @@ -281,15 +282,19 @@
>         # If the import has failed too many times in a row, mark it as
>         # FAILING.
>         failure_limit = config.codeimport.consecutive_failure_limit
> -        if code_import.consecutive_failure_count >= failure_limit:
> +        failure_count = code_import.consecutive_failure_count...

Read more...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :
Download full text (7.3 KiB)

Jonathan Lange wrote:
> On Mon, Mar 15, 2010 at 10:22 PM, Michael Hudson
> <email address hidden> wrote:
>> Michael Hudson has proposed merging lp:~mwhudson/launchpad/back-off-failing-imports-bug-413637 into lp:launchpad.
>>
>> Requested reviews:
>> Canonical Launchpad Engineering (launchpad)
>> Related bugs:
>> #413637 imports disabled too rapidly
>> https://bugs.launchpad.net/bugs/413637
>>
>>
>> Hi there,
>>
>> This small branch changes the code import system to exponentially back off on retrying code imports -- see the linked bug for more.
>>
>
> I think the branch is bigger than you intended. It's got a bunch of
> build farm job stuff in the diff.

Grar grar, it's proposed into db-devel of course.

I proposed again:
https://code.edge.launchpad.net/~mwhudson/launchpad/back-off-failing-imports-bug-413637/+merge/21411

>> Currently it backs off so that it tries imports at 6, 12, 24 and 48 hours (for subversion) for a total of three and a bit days of trying before marking an import failed -- is that enough? It seems OK, though perhaps a bit on the low side, to me.
>>
>
> Sounds fine to me.
>
> ...
>> === modified file 'lib/lp/code/model/codeimportjob.py'
>> --- lib/lp/code/model/codeimportjob.py 2010-02-24 10:18:16 +0000
>> +++ lib/lp/code/model/codeimportjob.py 2010-03-15 22:22:20 +0000
>> @@ -10,6 +10,8 @@
>> 'CodeImportJobWorkflow',
>> ]
>>
>> +import datetime
>> +
>> from sqlobject import ForeignKey, IntCol, SQLObjectNotFound, StringCol
>>
>> from zope.component import getUtility
>> @@ -140,7 +142,7 @@
>>
>> implements(ICodeImportJobWorkflow)
>>
>> - def newJob(self, code_import, date_due=None):
>> + def newJob(self, code_import, interval=None):
>> """See `ICodeImportJobWorkflow`."""
>> assert code_import.review_status == CodeImportReviewStatus.REVIEWED, (
>> "Review status of %s is not REVIEWED: %s" % (
>> @@ -149,23 +151,22 @@
>> "Already associated to a CodeImportJob: %s" % (
>> code_import.branch.unique_name))
>>
>> + if interval is None:
>> + interval = code_import.effective_update_interval
>> +
>> job = CodeImportJob(code_import=code_import, date_due=UTC_NOW)
>>
>> - if date_due is None:
>> - # Find the most recent CodeImportResult for this CodeImport. We
>> - # sort by date_created because we do not have an index on
>> - # date_job_started in the database, and that should give the same
>> - # sort order.
>> - most_recent_result_list = list(CodeImportResult.selectBy(
>> - code_import=code_import).orderBy(['-date_created']).limit(1))
>> + # Find the most recent CodeImportResult for this CodeImport. We
>> + # sort by date_created because we do not have an index on
>> + # date_job_started in the database, and that should give the same
>> + # sort order.
>> + most_recent_result_list = list(CodeImportResult.selectBy(
>> + code_import=code_import).orderBy(['-date_created']).limit(1))
>>
>
> Why aren't you using Storm, with its 'one()' notation for this?

Because all I did to this code was dedent it. I ...

Read more...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/buildd/debian/launchpad-buildd.init'
2--- lib/canonical/buildd/debian/launchpad-buildd.init 2009-12-16 00:16:24 +0000
3+++ lib/canonical/buildd/debian/launchpad-buildd.init 2010-03-15 22:22:20 +0000
4@@ -29,7 +29,7 @@
5 CONF=$1
6 PIDFILE="$PIDROOT"/"$CONF".pid
7 LOGFILE="$LOGROOT"/"$CONF".log
8- su - buildd -c "BUILDD_SLAVE_CONFIG=$CONFROOT/$CONF PYTHONPATH=/usr/share/launchpad-buildd twistd --no_save --pidfile $PIDFILE --python $TACFILE --logfile $LOGFILE"
9+ su - buildd -c "BUILDD_SLAVE_CONFIG=$CONFROOT/$CONF PYTHONPATH=/usr/share/launchpad-buildd twistd --no_save --pidfile $PIDFILE --python $TACFILE --logfile $LOGFILE --umask 022"
10 }
11
12 #
13
14=== modified file 'lib/lp/buildmaster/interfaces/buildfarmjobbehavior.py'
15--- lib/lp/buildmaster/interfaces/buildfarmjobbehavior.py 2010-01-21 05:03:16 +0000
16+++ lib/lp/buildmaster/interfaces/buildfarmjobbehavior.py 2010-03-15 22:22:20 +0000
17@@ -1,4 +1,4 @@
18-# Copyright 2009 Canonical Ltd. This software is licensed under the
19+# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
20 # GNU Affero General Public License version 3 (see the file LICENSE).
21
22 # pylint: disable-msg=E0211,E0213
23
24=== modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py'
25--- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-03-05 13:52:32 +0000
26+++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-03-15 22:22:20 +0000
27@@ -1,4 +1,4 @@
28-# Copyright 2009 Canonical Ltd. This software is licensed under the
29+# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
30 # GNU Affero General Public License version 3 (see the file LICENSE).
31
32 # pylint: disable-msg=E0211,E0213
33@@ -17,17 +17,22 @@
34 import xmlrpclib
35
36 from sqlobject import SQLObjectNotFound
37+
38 from zope.component import getUtility
39 from zope.interface import implements
40-from zope.security.proxy import removeSecurityProxy
41+from zope.security.proxy import removeSecurityProxy, isinstance as zisinstance
42
43 from canonical import encoding
44 from canonical.librarian.interfaces import ILibrarianClient
45+
46+from canonical.launchpad.webapp.interfaces import NotFoundError
47 from lp.buildmaster.interfaces.builder import CorruptBuildID
48 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
49 BuildBehaviorMismatch, IBuildFarmJobBehavior)
50 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
51+from lp.buildmaster.model.buildqueue import BuildQueue
52 from lp.services.job.interfaces.job import JobStatus
53+from lp.soyuz.interfaces.build import IBuildSet
54
55
56 class BuildFarmJobBehaviorBase:
57@@ -59,22 +64,101 @@
58 The default behavior is that we don't add any extra values."""
59 return {}
60
61+ def _helpVerifyBuildIDComponent(self, raw_id, item_type, finder):
62+ """Helper for verifying parts of a `BuildFarmJob` name.
63+
64+ Different `IBuildFarmJob` implementations can have different
65+ ways of constructing their identifying names. The names are
66+ produced by `IBuildFarmJob.getName` and verified by
67+ `IBuildFarmJobBehavior.verifySlaveBuildID`.
68+
69+ This little helper makes it easier to verify an object id
70+ embedded in that name, check that it's a valid number, and
71+ retrieve the associated database object.
72+
73+ :param raw_id: An unverified id string as extracted from the
74+ build name. The method will verify that it is a number, and
75+ try to retrieve the associated object.
76+ :param item_type: The type of object this id represents. Should
77+ be a class.
78+ :param finder: A function that, given an integral id, finds the
79+ associated object of type `item_type`.
80+ :raise CorruptBuildID: If `raw_id` is malformed in some way or
81+ the associated `item_type` object is not found.
82+ :return: An object that is an instance of `item_type`.
83+ """
84+ type_name = item_type.__name__
85+ try:
86+ numeric_id = int(raw_id)
87+ except ValueError:
88+ raise CorruptBuildID(
89+ "%s ID is not a number: '%s'" % (type_name, raw_id))
90+
91+ try:
92+ item = finder(numeric_id)
93+ except (NotFoundError, SQLObjectNotFound), reason:
94+ raise CorruptBuildID(
95+ "%s %d is not available: %s" % (
96+ type_name, numeric_id, reason))
97+ except Exception, reason:
98+ raise CorruptBuildID(
99+ "Error while looking up %s %d: %s" % (
100+ type_name, numeric_id, reason))
101+
102+ if item is None:
103+ raise CorruptBuildID("There is no %s with id %d." % (
104+ type_name, numeric_id))
105+
106+ assert zisinstance(item, item_type), (
107+ "Looked for %s, but found %s." % (type_name, repr(item)))
108+
109+ return item
110+
111+ def getVerifiedBuild(self, raw_id):
112+ """Verify and retrieve the `Build` component of a slave build id.
113+
114+ This does part of the verification for `verifySlaveBuildID`.
115+
116+ By default, a `BuildFarmJob` has an identifying name of the form
117+ "b-q", where b is the id of its `Build` and q is the id of its
118+ `BuildQueue` record.
119+
120+ Use `getVerifiedBuild` to verify the "b" part, and retrieve the
121+ associated `Build`.
122+ """
123+ # Avoid circular import.
124+ from lp.soyuz.model.build import Build
125+
126+ return self._helpVerifyBuildIDComponent(
127+ raw_id, Build, getUtility(IBuildSet).getByBuildID)
128+
129+ def getVerifiedBuildQueue(self, raw_id):
130+ """Verify and retrieve the `BuildQueue` component of a slave build id.
131+
132+ This does part of the verification for `verifySlaveBuildID`.
133+
134+ By default, a `BuildFarmJob` has an identifying name of the form
135+ "b-q", where b is the id of its `Build` and q is the id of its
136+ `BuildQueue` record.
137+
138+ Use `getVerifiedBuildQueue` to verify the "q" part, and retrieve
139+ the associated `BuildQueue` object.
140+ """
141+ return self._helpVerifyBuildIDComponent(
142+ raw_id, BuildQueue, getUtility(IBuildQueueSet).get)
143+
144 def verifySlaveBuildID(self, slave_build_id):
145 """See `IBuildFarmJobBehavior`."""
146 # Extract information from the identifier.
147 try:
148 build_id, queue_item_id = slave_build_id.split('-')
149- build_id = int(build_id)
150- queue_item_id = int(queue_item_id)
151 except ValueError:
152 raise CorruptBuildID('Malformed build ID')
153+
154+ build = self.getVerifiedBuild(build_id)
155+ queue_item = self.getVerifiedBuildQueue(queue_item_id)
156
157- try:
158- queue_item = getUtility(IBuildQueueSet).get(queue_item_id)
159- # Check whether build and buildqueue are properly related.
160- except SQLObjectNotFound, reason:
161- raise CorruptBuildID(str(reason))
162- if queue_item.specific_job.build.id != build_id:
163+ if build != queue_item.specific_job.build:
164 raise CorruptBuildID('Job build entry mismatch')
165
166 def updateBuild(self, queueItem):
167
168=== modified file 'lib/lp/buildmaster/tests/test_buildfarmjobbehavior.py'
169--- lib/lp/buildmaster/tests/test_buildfarmjobbehavior.py 2010-03-06 04:57:40 +0000
170+++ lib/lp/buildmaster/tests/test_buildfarmjobbehavior.py 2010-03-15 22:22:20 +0000
171@@ -3,10 +3,18 @@
172
173 """Unit tests for BuildFarmJobBehaviorBase."""
174
175-from unittest import TestCase, TestLoader
176+from unittest import TestLoader
177+
178+from zope.component import getUtility
179+
180+from canonical.testing.layers import ZopelessDatabaseLayer
181+from lp.testing import TestCaseWithFactory
182
183 from lp.buildmaster.interfaces.buildbase import BuildStatus
184+from lp.buildmaster.interfaces.builder import CorruptBuildID
185 from lp.buildmaster.model.buildfarmjobbehavior import BuildFarmJobBehaviorBase
186+from lp.registry.interfaces.pocket import PackagePublishingPocket
187+from lp.soyuz.interfaces.processor import IProcessorFamilySet
188
189
190 class FakeBuildFarmJob:
191@@ -14,27 +22,91 @@
192 pass
193
194
195-class TestBuildFarmJobBehaviorBase(TestCase):
196+class TestBuildFarmJobBehaviorBase(TestCaseWithFactory):
197 """Test very small, basic bits of BuildFarmJobBehaviorBase."""
198
199- def setUp(self):
200- self.behavior = BuildFarmJobBehaviorBase(FakeBuildFarmJob())
201+ layer = ZopelessDatabaseLayer
202+
203+ def _makeBehavior(self, buildfarmjob=None):
204+ """Create a `BuildFarmJobBehaviorBase`."""
205+ if buildfarmjob is None:
206+ buildfarmjob = FakeBuildFarmJob()
207+ return BuildFarmJobBehaviorBase(buildfarmjob)
208+
209+ def _makeBuild(self):
210+ """Create a `Build` object."""
211+ x86 = getUtility(IProcessorFamilySet).getByName('x86')
212+ distroarchseries = self.factory.makeDistroArchSeries(
213+ architecturetag='x86', processorfamily=x86)
214+ distroseries = distroarchseries.distroseries
215+ archive = self.factory.makeArchive(
216+ distribution=distroseries.distribution)
217+ pocket = PackagePublishingPocket.RELEASE
218+ spr = self.factory.makeSourcePackageRelease(
219+ distroseries=distroseries, archive=archive)
220+
221+ return spr.createBuild(
222+ distroarchseries=distroarchseries, pocket=pocket, archive=archive)
223+
224+ def _makeBuildQueue(self):
225+ """Create a `BuildQueue` object."""
226+ return self.factory.makeSourcePackageRecipeBuildJob()
227
228 def test_extractBuildStatus_baseline(self):
229 # extractBuildStatus picks the name of the build status out of a
230 # dict describing the slave's status.
231 slave_status = {'build_status': 'BuildStatus.BUILDING'}
232+ behavior = self._makeBehavior()
233 self.assertEqual(
234 BuildStatus.BUILDING.name,
235- self.behavior.extractBuildStatus(slave_status))
236+ behavior.extractBuildStatus(slave_status))
237
238 def test_extractBuildStatus_malformed(self):
239 # extractBuildStatus errors out when the status string is not
240 # of the form it expects.
241 slave_status = {'build_status': 'BUILDING'}
242- self.assertRaises(
243- AssertionError,
244- self.behavior.extractBuildStatus, slave_status)
245+ behavior = self._makeBehavior()
246+ self.assertRaises(
247+ AssertionError, behavior.extractBuildStatus, slave_status)
248+
249+ def test_getVerifiedBuild_success(self):
250+ build = self._makeBuild()
251+ behavior = self._makeBehavior()
252+ raw_id = str(build.id)
253+
254+ self.assertEqual(build, behavior.getVerifiedBuild(raw_id))
255+
256+ def test_getVerifiedBuild_malformed(self):
257+ behavior = self._makeBehavior()
258+ self.assertRaises(CorruptBuildID, behavior.getVerifiedBuild, 'hi!')
259+
260+ def test_getVerifiedBuild_notfound(self):
261+ build = self._makeBuild()
262+ behavior = self._makeBehavior()
263+ nonexistent_id = str(build.id + 99)
264+
265+ self.assertRaises(
266+ CorruptBuildID, behavior.getVerifiedBuild, nonexistent_id)
267+
268+ def test_getVerifiedBuildQueue_success(self):
269+ buildqueue = self._makeBuildQueue()
270+ behavior = self._makeBehavior()
271+ raw_id = str(buildqueue.id)
272+
273+ self.assertEqual(buildqueue, behavior.getVerifiedBuildQueue(raw_id))
274+
275+ def test_getVerifiedBuildQueue_malformed(self):
276+ behavior = self._makeBehavior()
277+ self.assertRaises(
278+ CorruptBuildID, behavior.getVerifiedBuildQueue, 'bye!')
279+
280+ def test_getVerifiedBuildQueue_notfound(self):
281+ buildqueue = self._makeBuildQueue()
282+ behavior = self._makeBehavior()
283+ nonexistent_id = str(buildqueue.id + 99)
284+
285+ self.assertRaises(
286+ CorruptBuildID, behavior.getVerifiedBuildQueue, nonexistent_id)
287
288
289 def test_suite():
290
291=== modified file 'lib/lp/code/interfaces/codeimportjob.py'
292--- lib/lp/code/interfaces/codeimportjob.py 2010-02-22 08:10:03 +0000
293+++ lib/lp/code/interfaces/codeimportjob.py 2010-03-15 22:22:20 +0000
294@@ -16,6 +16,8 @@
295 'ICodeImportJobWorkflow',
296 ]
297
298+import datetime
299+
300 from zope.interface import Interface
301 from zope.schema import Choice, Datetime, Int, Object, Text
302
303@@ -137,7 +139,7 @@
304 class ICodeImportJobWorkflow(Interface):
305 """Utility to manage `CodeImportJob` objects through their life cycle."""
306
307- def newJob(code_import):
308+ def newJob(code_import, interval=datetime.timedelta(0)):
309 """Create a `CodeImportJob` associated with a reviewed `CodeImport`.
310
311 Call this method from `CodeImport.updateFromData` when the
312@@ -218,9 +220,11 @@
313 display for diagnostics. May be None.
314 :precondition: `import_job`.state == RUNNING.
315 :postcondition: `import_job` is deleted.
316- :postcondition: `code_import.import_job` is not None.
317+ :postcondition: `code_import.import_job` is not None unless the job
318+ has failed more than consecutive_failure_limit times in a row.
319 :postcondition: `code_import.import_job.date_due` is
320- import_job.date_due + code_import.effective_update_interval`.
321+ import_job.date_due + code_import.effective_update_interval`, with
322+ scaling to retry failing imports less often.
323 :postcondition: A `CodeImportResult` was created.
324 :postcondition: A FINISH `CodeImportEvent` was created.
325 """
326
327=== modified file 'lib/lp/code/model/codeimportjob.py'
328--- lib/lp/code/model/codeimportjob.py 2010-02-24 10:18:16 +0000
329+++ lib/lp/code/model/codeimportjob.py 2010-03-15 22:22:20 +0000
330@@ -10,6 +10,8 @@
331 'CodeImportJobWorkflow',
332 ]
333
334+import datetime
335+
336 from sqlobject import ForeignKey, IntCol, SQLObjectNotFound, StringCol
337
338 from zope.component import getUtility
339@@ -140,7 +142,7 @@
340
341 implements(ICodeImportJobWorkflow)
342
343- def newJob(self, code_import, date_due=None):
344+ def newJob(self, code_import, interval=None):
345 """See `ICodeImportJobWorkflow`."""
346 assert code_import.review_status == CodeImportReviewStatus.REVIEWED, (
347 "Review status of %s is not REVIEWED: %s" % (
348@@ -149,23 +151,22 @@
349 "Already associated to a CodeImportJob: %s" % (
350 code_import.branch.unique_name))
351
352+ if interval is None:
353+ interval = code_import.effective_update_interval
354+
355 job = CodeImportJob(code_import=code_import, date_due=UTC_NOW)
356
357- if date_due is None:
358- # Find the most recent CodeImportResult for this CodeImport. We
359- # sort by date_created because we do not have an index on
360- # date_job_started in the database, and that should give the same
361- # sort order.
362- most_recent_result_list = list(CodeImportResult.selectBy(
363- code_import=code_import).orderBy(['-date_created']).limit(1))
364+ # Find the most recent CodeImportResult for this CodeImport. We
365+ # sort by date_created because we do not have an index on
366+ # date_job_started in the database, and that should give the same
367+ # sort order.
368+ most_recent_result_list = list(CodeImportResult.selectBy(
369+ code_import=code_import).orderBy(['-date_created']).limit(1))
370
371- if len(most_recent_result_list) != 0:
372- [most_recent_result] = most_recent_result_list
373- interval = code_import.effective_update_interval
374- date_due = most_recent_result.date_job_started + interval
375- job.date_due = max(job.date_due, date_due)
376- else:
377- job.date_due = date_due
378+ if len(most_recent_result_list) != 0:
379+ [most_recent_result] = most_recent_result_list
380+ date_due = most_recent_result.date_job_started + interval
381+ job.date_due = max(job.date_due, date_due)
382
383 return job
384
385@@ -281,15 +282,19 @@
386 # If the import has failed too many times in a row, mark it as
387 # FAILING.
388 failure_limit = config.codeimport.consecutive_failure_limit
389- if code_import.consecutive_failure_count >= failure_limit:
390+ failure_count = code_import.consecutive_failure_count
391+ if failure_count >= failure_limit:
392 code_import.updateFromData(
393 dict(review_status=CodeImportReviewStatus.FAILING), None)
394+ elif status == CodeImportResultStatus.SUCCESS_PARTIAL:
395+ interval = datetime.timedelta(0)
396+ elif failure_count > 0:
397+ interval = code_import.effective_update_interval * (2**(failure_count - 1))
398+ else:
399+ interval = code_import.effective_update_interval
400 # Only start a new one if the import is still in the REVIEWED state.
401 if code_import.review_status == CodeImportReviewStatus.REVIEWED:
402- extra = {}
403- if status == CodeImportResultStatus.SUCCESS_PARTIAL:
404- extra['date_due'] = UTC_NOW
405- self.newJob(code_import, **extra)
406+ self.newJob(code_import, interval=interval)
407 # If the status was successful, update date_last_successful.
408 if status in [CodeImportResultStatus.SUCCESS,
409 CodeImportResultStatus.SUCCESS_NOCHANGE]:
410@@ -321,7 +326,7 @@
411 import_job, CodeImportResultStatus.RECLAIMED, None)
412 # 3)
413 if code_import.review_status == CodeImportReviewStatus.REVIEWED:
414- self.newJob(code_import, UTC_NOW)
415+ self.newJob(code_import, datetime.timedelta(0))
416 # 4)
417 getUtility(ICodeImportEventSet).newReclaim(
418 code_import, machine, job_id)
419
420=== modified file 'lib/lp/code/model/recipebuilder.py'
421--- lib/lp/code/model/recipebuilder.py 2010-02-05 15:06:28 +0000
422+++ lib/lp/code/model/recipebuilder.py 2010-03-15 22:22:20 +0000
423@@ -8,7 +8,7 @@
424 'RecipeBuildBehavior',
425 ]
426
427-from zope.component import adapts
428+from zope.component import adapts, getUtility
429 from zope.interface import implements
430
431 from lp.archiveuploader.permission import check_upload_to_pocket
432@@ -18,7 +18,8 @@
433 from lp.buildmaster.model.buildfarmjobbehavior import (
434 BuildFarmJobBehaviorBase)
435 from lp.code.interfaces.sourcepackagerecipebuild import (
436- ISourcePackageRecipeBuildJob)
437+ ISourcePackageRecipeBuildJob, ISourcePackageRecipeBuildSource)
438+from lp.code.model.sourcepackagerecipebuild import SourcePackageRecipeBuild
439 from lp.registry.interfaces.pocket import PackagePublishingPocket
440 from lp.soyuz.adapters.archivedependencies import (
441 get_primary_current_component, get_sources_list_for_building)
442@@ -168,3 +169,11 @@
443 extra_info['logtail'] = raw_slave_status[2]
444
445 return extra_info
446+
447+ def getVerifiedBuild(self, raw_id):
448+ """See `IBuildFarmJobBehavior`."""
449+ # This type of job has a build that is of type BuildBase but not
450+ # actually a Build.
451+ return self._helpVerifyBuildIDComponent(
452+ raw_id, SourcePackageRecipeBuild,
453+ getUtility(ISourcePackageRecipeBuildSource).getById)
454
455=== modified file 'lib/lp/code/model/tests/test_codeimportjob.py'
456--- lib/lp/code/model/tests/test_codeimportjob.py 2010-03-14 20:15:34 +0000
457+++ lib/lp/code/model/tests/test_codeimportjob.py 2010-03-15 22:22:20 +0000
458@@ -778,6 +778,31 @@
459 self.assertEqual(new_job.machine, None)
460 self.assertSqlAttributeEqualsDate(new_job, 'date_due', UTC_NOW)
461
462+ def test_failures_back_off(self):
463+ # We wait for longer and longer between retrying failing imports, to
464+ # make it less likely that an import is marked failing just because
465+ # someone's DNS went down for a day.
466+ running_job = self.makeRunningJob()
467+ intervals = []
468+ interval = running_job.code_import.effective_update_interval
469+ expected_intervals = []
470+ for i in range(config.codeimport.consecutive_failure_limit - 1):
471+ expected_intervals.append(interval)
472+ interval *= 2
473+ # Fail an import a bunch of times and record how far in the future the
474+ # next job was scheduled.
475+ for i in range(config.codeimport.consecutive_failure_limit - 1):
476+ code_import = running_job.code_import
477+ getUtility(ICodeImportJobWorkflow).finishJob(
478+ running_job, CodeImportResultStatus.FAILURE, None)
479+ intervals.append(
480+ code_import.import_job.date_due -
481+ code_import.results[-1].date_job_started)
482+ running_job = code_import.import_job
483+ getUtility(ICodeImportJobWorkflow).startJob(
484+ running_job, self.machine)
485+ self.assertEqual(expected_intervals, intervals)
486+
487 def test_doesntCreateNewJobIfCodeImportNotReviewed(self):
488 # finishJob() creates a new CodeImportJob for the given CodeImport,
489 # unless the CodeImport has been suspended or marked invalid.
490
491=== modified file 'lib/lp/soyuz/doc/buildd-slavescanner.txt'
492--- lib/lp/soyuz/doc/buildd-slavescanner.txt 2010-03-10 12:50:18 +0000
493+++ lib/lp/soyuz/doc/buildd-slavescanner.txt 2010-03-15 22:22:20 +0000
494@@ -95,15 +95,16 @@
495 >>> lostbuilding_builder = MockBuilder(
496 ... 'Lost Building Slave', LostBuildingSlave())
497 >>> buildergroup.rescueBuilderIfLost(lostbuilding_builder)
498- WARNING:root:Builder 'Lost Building Slave' rescued from
499- '1000-10000': 'Object not found'
500+ WARNING:root:Builder 'Lost Building Slave' rescued from '1000-10000':
501+ 'Build 1000 is not available: 'Object not found''
502
503 Then a lost slave in status 'WAITING':
504
505 >>> lostwaiting_builder = MockBuilder(
506 ... 'Lost Waiting Slave', LostWaitingSlave())
507 >>> buildergroup.rescueBuilderIfLost(lostwaiting_builder)
508- WARNING:root:Builder 'Lost Waiting Slave' rescued from '1000-10000': 'Object not found'
509+ WARNING:root:Builder 'Lost Waiting Slave' rescued from '1000-10000':
510+ 'Build 1000 is not available: 'Object not found''
511
512 Both got rescued, as expected.
513
514
515=== modified file 'lib/lp/testing/factory.py'
516--- lib/lp/testing/factory.py 2010-03-11 20:54:36 +0000
517+++ lib/lp/testing/factory.py 2010-03-15 22:22:20 +0000
518@@ -2008,6 +2008,81 @@
519 def getAnySourcePackageUrgency(self):
520 return SourcePackageUrgency.MEDIUM
521
522+ def makeSourcePackageRelease(self, archive=None, sourcepackagename=None,
523+ distroseries=None, maintainer=None,
524+ creator=None, component=None, section=None,
525+ urgency=None, version=None,
526+ builddepends=None, builddependsindep=None,
527+ build_conflicts=None,
528+ build_conflicts_indep=None,
529+ architecturehintlist='all',
530+ dsc_maintainer_rfc822=None,
531+ dsc_standards_version='3.6.2',
532+ dsc_format='1.0', dsc_binaries='foo-bin',
533+ date_uploaded=UTC_NOW):
534+ """Make a `SourcePackageRelease`."""
535+ if distroseries is None:
536+ if archive is None:
537+ distribution = None
538+ else:
539+ distribution = archive.distribution
540+ distroseries = self.makeDistroRelease(distribution=distribution)
541+
542+ if archive is None:
543+ archive = self.makeArchive(
544+ distribution=distroseries.distribution,
545+ purpose=ArchivePurpose.PRIMARY)
546+
547+ if sourcepackagename is None:
548+ sourcepackagename = self.makeSourcePackageName()
549+
550+ if component is None:
551+ component = self.makeComponent()
552+
553+ if urgency is None:
554+ urgency = self.getAnySourcePackageUrgency()
555+
556+ if section is None:
557+ section = self.getUniqueString('section')
558+ section = getUtility(ISectionSet).ensure(section)
559+
560+ if maintainer is None:
561+ maintainer = self.makePerson()
562+
563+ maintainer_email = '%s <%s>' % (
564+ maintainer.displayname,
565+ maintainer.preferredemail.email)
566+
567+ if creator is None:
568+ creator = self.makePerson()
569+
570+ if version is None:
571+ version = self.getUniqueString('version')
572+
573+ return distroseries.createUploadedSourcePackageRelease(
574+ sourcepackagename=sourcepackagename,
575+ maintainer=maintainer,
576+ creator=creator,
577+ component=component,
578+ section=section,
579+ urgency=urgency,
580+ version=version,
581+ builddepends=builddepends,
582+ builddependsindep=builddependsindep,
583+ build_conflicts=build_conflicts,
584+ build_conflicts_indep=build_conflicts_indep,
585+ architecturehintlist=architecturehintlist,
586+ changelog_entry=None,
587+ dsc=None,
588+ copyright=self.getUniqueString(),
589+ dscsigningkey=None,
590+ dsc_maintainer_rfc822=maintainer_email,
591+ dsc_standards_version=dsc_standards_version,
592+ dsc_format=dsc_format,
593+ dsc_binaries=dsc_binaries,
594+ archive=archive,
595+ dateuploaded=date_uploaded)
596+
597 def makeSourcePackagePublishingHistory(self, sourcepackagename=None,
598 distroseries=None, maintainer=None,
599 creator=None, component=None,
600@@ -2027,54 +2102,31 @@
601 dsc_format='1.0',
602 dsc_binaries='foo-bin',
603 ):
604- if sourcepackagename is None:
605- sourcepackagename = self.makeSourcePackageName()
606- spn = sourcepackagename
607-
608+ """Make a `SourcePackagePublishingHistory`."""
609 if distroseries is None:
610- distroseries = self.makeDistroRelease()
611+ if archive is None:
612+ distribution = None
613+ else:
614+ distribution = archive.distribution
615+ distroseries = self.makeDistroRelease(distribution=distribution)
616
617 if archive is None:
618 archive = self.makeArchive(
619 distribution=distroseries.distribution,
620 purpose=ArchivePurpose.PRIMARY)
621
622- if component is None:
623- component = self.makeComponent()
624-
625 if pocket is None:
626 pocket = self.getAnyPocket()
627
628 if status is None:
629 status = PackagePublishingStatus.PENDING
630
631- if urgency is None:
632- urgency = self.getAnySourcePackageUrgency()
633-
634- if section is None:
635- section = self.getUniqueString('section')
636- section = getUtility(ISectionSet).ensure(section)
637-
638- if maintainer is None:
639- maintainer = self.makePerson()
640-
641- maintainer_email = '%s <%s>' % (
642- maintainer.displayname,
643- maintainer.preferredemail.email)
644-
645- if creator is None:
646- creator = self.makePerson()
647-
648- if version is None:
649- version = self.getUniqueString('version')
650-
651- gpg_key = self.makeGPGKey(creator)
652-
653- spr = distroseries.createUploadedSourcePackageRelease(
654- sourcepackagename=spn,
655+ spr = self.makeSourcePackageRelease(
656+ archive=archive,
657+ sourcepackagename=sourcepackagename,
658+ distroseries=distroseries,
659 maintainer=maintainer,
660- creator=creator,
661- component=component,
662+ creator=creator, component=component,
663 section=section,
664 urgency=urgency,
665 version=version,
666@@ -2083,15 +2135,10 @@
667 build_conflicts=build_conflicts,
668 build_conflicts_indep=build_conflicts_indep,
669 architecturehintlist=architecturehintlist,
670- changelog_entry=None,
671- dsc=None,
672- copyright=self.getUniqueString(),
673- dscsigningkey=gpg_key,
674- dsc_maintainer_rfc822=maintainer_email,
675 dsc_standards_version=dsc_standards_version,
676 dsc_format=dsc_format,
677 dsc_binaries=dsc_binaries,
678- archive=archive, dateuploaded=date_uploaded)
679+ date_uploaded=date_uploaded)
680
681 sspph = SourcePackagePublishingHistory(
682 distroseries=distroseries,
683
684=== modified file 'lib/lp/translations/doc/translationtemplatesbuildbehavior.txt'
685--- lib/lp/translations/doc/translationtemplatesbuildbehavior.txt 2010-03-05 13:52:32 +0000
686+++ lib/lp/translations/doc/translationtemplatesbuildbehavior.txt 2010-03-15 22:22:20 +0000
687@@ -2,13 +2,21 @@
688
689 == Setup ==
690
691-Set up build environment.
692+Set up build environment. Clear out the build queue.
693
694 >>> import transaction
695 >>> import logging
696 >>> logger = logging.getLogger()
697 >>> logger.setLevel(logging.CRITICAL)
698
699+ >>> from canonical.database.sqlbase import quote
700+ >>> from canonical.launchpad.interfaces.lpstorm import IMasterStore
701+ >>> from lp.services.job.interfaces.job import JobStatus
702+ >>> from lp.services.job.model.job import Job
703+ >>> store = IMasterStore(Job)
704+ >>> query = store.execute("UPDATE Job SET status = %s" % quote(
705+ ... JobStatus.FAILED))
706+
707 >>> from lp.buildmaster.master import BuilddMaster
708 >>> from canonical.buildd.tests import BuilddSlaveTestSetup
709 >>> bm = BuilddMaster(logger, transaction)
710@@ -39,7 +47,21 @@
711
712 Make a builder to process our build request.
713
714- >>> builder = factory.makeBuilder(virtualized=False, processor=processor)
715+ >>> builder = factory.makeBuilder(
716+ ... virtualized=True, processor=processor, vm_host='hostname')
717+
718+The builder doesn't talk to a real slave. We don't have those in our
719+test suite. But we give it a fake one.
720+
721+ >>> from lp.testing.fakemethod import FakeMethod
722+ >>> class FakeSlave:
723+ ... build = FakeMethod()
724+ ... cacheFile = FakeMethod()
725+ ... resume = FakeMethod(result=('Output here', 'Errors here', 0))
726+
727+ >>> from zope.security.proxy import removeSecurityProxy
728+ >>> slave = FakeSlave()
729+ >>> removeSecurityProxy(builder).slave = slave
730
731
732 == Get a job! ==
733@@ -60,20 +82,27 @@
734 >>> print buildqueue.date_started
735 None
736
737-Dispatch the job to the build slave. The proper method to call here
738-would have been Builder.findAndStartJob, but it has not been generalised
739-to handle new job types yet.
740-
741-# XXX JeroenVermeulen bug=506617: call findAndStartJob instead.
742-
743- >>> from zope.security.proxy import removeSecurityProxy
744- >>> removeSecurityProxy(builder)._dispatchBuildCandidate(buildqueue)
745-
746-The build is now marked as started.
747+Our job is now first in line to be executed.
748+
749+ >>> removeSecurityProxy(builder)._findBuildCandidate() == buildqueue
750+ True
751+
752+Dispatch the first job in line (ours!) to the build slave.
753+
754+ >>> activated_job = builder.findAndStartJob()
755+ >>> activated_job == buildqueue
756+ True
757+
758+Our build is now marked as started.
759
760 >>> buildqueue.date_started is None
761 False
762
763+The slave's build method has been called.
764+
765+ >>> slave.build.call_count
766+ 1
767+
768
769 == Teardown ==
770
771
772=== modified file 'lib/lp/translations/model/translationtemplatesbuildbehavior.py'
773--- lib/lp/translations/model/translationtemplatesbuildbehavior.py 2010-02-11 19:11:11 +0000
774+++ lib/lp/translations/model/translationtemplatesbuildbehavior.py 2010-03-15 22:22:20 +0000
775@@ -17,6 +17,7 @@
776
777 from canonical.launchpad.interfaces import ILaunchpadCelebrities
778
779+from lp.buildmaster.interfaces.builder import CorruptBuildID
780 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
781 IBuildFarmJobBehavior)
782 from lp.buildmaster.model.buildfarmjobbehavior import (
783@@ -40,12 +41,27 @@
784 self._builder.slave.cacheFile(logger, chroot)
785 buildid = self.buildfarmjob.getName()
786
787- args = { 'branch_url': self.buildfarmjob.branch.url }
788+ args = self.buildfarmjob.metadata
789 filemap = {}
790
791 self._builder.slave.build(
792 buildid, self.build_type, chroot_sha1, filemap, args)
793
794+ def verifySlaveBuildID(self, slave_build_id):
795+ """See `IBuildFarmJobBehavior`."""
796+ try:
797+ branch_name, queue_item_id = slave_build_id.rsplit('-', 1)
798+ except ValueError:
799+ raise CorruptBuildID(
800+ "Malformed translation templates build id: '%s'" % (
801+ slave_build_id))
802+
803+ buildqueue = self.getVerifiedBuildQueue(queue_item_id)
804+ if buildqueue.job != self.buildfarmjob.job:
805+ raise CorruptBuildID(
806+ "ID mismatch for translation templates build '%s'" % (
807+ slave_build_id))
808+
809 def _getChroot(self):
810 ubuntu = getUtility(ILaunchpadCelebrities).ubuntu
811 return ubuntu.currentseries.nominatedarchindep.getChroot()
812
813=== modified file 'lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py'
814--- lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py 2010-03-12 08:31:43 +0000
815+++ lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py 2010-03-15 22:22:20 +0000
816@@ -19,6 +19,7 @@
817 from lp.buildmaster.interfaces.buildbase import BuildStatus
818 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
819 IBuildFarmJobBehavior)
820+from lp.buildmaster.interfaces.builder import CorruptBuildID
821 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
822 from lp.testing import TestCaseWithFactory
823 from lp.testing.fakemethod import FakeMethod
824@@ -123,6 +124,10 @@
825 self.assertEqual(
826 'translation-templates', slave_status['test_build_type'])
827 self.assertIn('branch_url', slave_status['test_build_args'])
828+ # The slave receives the public http URL for the branch.
829+ self.assertEqual(
830+ behavior.buildfarmjob.branch.composePublicURL(),
831+ slave_status['test_build_args']['branch_url'])
832
833 def test_getChroot(self):
834 # _getChroot produces the current chroot for the current Ubuntu
835@@ -170,6 +175,38 @@
836 self.assertEqual(1, builder.cleanSlave.call_count)
837 self.assertEqual(0, behavior._uploadTarball.call_count)
838
839+ def test_verifySlaveBuildID_success(self):
840+ # TranslationTemplatesBuildJob.getName generates slave build ids
841+ # that TranslationTemplatesBuildBehavior.verifySlaveBuildID
842+ # accepts.
843+ behavior = self._makeBehavior()
844+ buildfarmjob = behavior.buildfarmjob
845+ job = buildfarmjob.job
846+
847+ # The test is that this not raise CorruptBuildID (or anything
848+ # else, for that matter).
849+ behavior.verifySlaveBuildID(behavior.buildfarmjob.getName())
850+
851+ def test_verifySlaveBuildID_handles_dashes(self):
852+ # TranslationTemplatesBuildBehavior.verifySlaveBuildID can deal
853+ # with dashes in branch names.
854+ behavior = self._makeBehavior()
855+ buildfarmjob = behavior.buildfarmjob
856+ job = buildfarmjob.job
857+ buildfarmjob.branch.name = 'x-y-z--'
858+
859+ # The test is that this not raise CorruptBuildID (or anything
860+ # else, for that matter).
861+ behavior.verifySlaveBuildID(behavior.buildfarmjob.getName())
862+
863+ def test_verifySlaveBuildID_malformed(self):
864+ behavior = self._makeBehavior()
865+ self.assertRaises(CorruptBuildID, behavior.verifySlaveBuildID, 'huh?')
866+
867+ def test_verifySlaveBuildID_notfound(self):
868+ behavior = self._makeBehavior()
869+ self.assertRaises(CorruptBuildID, behavior.verifySlaveBuildID, '1-1')
870+
871
872 def test_suite():
873 return TestLoader().loadTestsFromName(__name__)

Subscribers

People subscribed via source and target branches

to status/vote changes: