Code review comment for lp:~julian-edwards/launchpad/buildd-failure-counting

Revision history for this message
Jonathan Lange (jml) wrote :

On Fri, Aug 20, 2010 at 1:16 PM, Julian Edwards
<email address hidden> wrote:
> On Friday 20 August 2010 11:43:40 you wrote:
>> On Thu, Aug 19, 2010 at 5:13 PM, Julian Edwards
>> <email address hidden> wrote:
>> > On Thursday 19 August 2010 13:37:40 you wrote:

...
>> >> > === modified file 'lib/lp/buildmaster/manager.py'
>> >> > --- lib/lp/buildmaster/manager.py   2010-08-02 16:00:50 +0000
>> >> > +++ lib/lp/buildmaster/manager.py   2010-08-18 17:01:23 +0000
>>
>> ...
>>
>> >> > +        :return: True if we disabled something, False if we did not.
>> >>
>> >> This seems like an odd thing to return. It doesn't seem to be used in
>> >> the code. What's it for?
>> >
>> > For the tests.
>>
>> Yeah, but what about the tests? Or rather, why do the tests care about
>> something that the code doesn't?
>
> Well lots of our tests return stuff that the code doesn't need, like Deferreds
> for example.  But fine, I've got rid of it.
>

Thanks.

>> >> > +            # Decide if we need to terminate the job or fail the
>> >> > +            # builder.
>> >> > +            self._incrementFailureCounts(builder)
>> >> > +            self.logger.info(
>> >> > +                "builder failure count: %s, job failure count: %s" %
>> >> > ( +                    builder.failure_count,
>> >> > +
>> >> > builder.currentjob.specific_job.build.failure_count)) +
>> >> > BaseDispatchResult(slave=None).assessFailureCounts(builder)
>> >>
>> >> This line makes me wonder why assessFailureCounts is on
>> >> BaseDispatchResult at all.  Perhaps it should be a method on Builder
>> >> that takes an optional 'info' parameter?
>> >>
>> >> The above code would then read:
>> >>
>> >>   builder.gotFailure()
>> >>   self.logger.info(...)
>> >>   builder.assessFailureCounts()
>> >
>> > I don't think we should continue to bloat model classes with code that is
>> > only used in one place.  This is a buildd-manager-specific function.
>> >
>> > Anyway, bugger, I had intended to factor it out to a standalone function
>> > as I also realised that that line is ridiculous and again forgot :/
>>
>> A standalone function would be a definite improvement, and fine by me.
>> I personally don't think it's bloat to say "a builder knows how to
>> handle builder failures", but I'm not going to push it.
>
> By the same argument, anything that deals with object X should be a method on
> that object X.
>

I would argue that most things that manipulate state on object X
should be methods of object X. I would not say it's a universal rule.

> The difference here is that it's not dealing with builder failures
> exclusively, it's dealing with a combination of builder and job failures, and
> I would personally find it odd that a builder was trying to modify job
> properties.
>
> Generally, I think the boundaries of responsibility in our code are poor and
> should be less like this.
>
> Did I convince you yet?  Probably not :)
>

I think you have, in this case, particularly given the concessions
below to have methods on Job (however it's spelled) and Builder that
do the failure incrementing. That seems like a clean division of
responsibility.

>>
>> >> > +        builder.failure_count += 1
>> >> > +        builder.currentjob.specific_job.build.failure_count += 1
>> >> > +
>> >>
>> >> Perhaps this should be a method on Builder. e.g.
>> >>
>> >>   class Builder:
>> >>
>> >>      # ...
>> >>
>> >>      def gotFailure(self):
>> >>          self.failure_count += 1
>> >>          self.currentjob.specific_job.build.failure_count += 1
>> >>
>> >> Likewise, there could also be a gotSuccess() that resets to 0.
>> >
>> > For the same reasons as before, I don't think this belongs on IBuilder.
>> >  Not to mention it's not just manipulating builder properties, it's
>> > changing the job's properties.
>>
>> In answer to the second point:
>>   def gotFailure(self):
>>     self.failure_count += 1
>>     self.getCurrentBuild().gotFailure()
>
> I still don't like it :( It's mixing responsibilities with what needs to
> happen on failures, which is something that only the build manager should be
> thinking about.
>
> I'd prefer to have a gotFailure() on each of Builder and BuildFarmJob and have
> the build manager call each in turn.
>
> Let me know if you think that is acceptable and I'll do that change (it's not
> in the attached diff).
>

I think that's a great idea.

>>
>> And as for bloat, I don't see why adding extra columns is not bloat.
>
> Sorry, I don't understand what you mean by that.
>

Never mind. Some other time :)

>> >> > === modified file 'lib/lp/buildmaster/tests/test_manager.py'
>> >> > --- lib/lp/buildmaster/tests/test_manager.py        2010-08-06
>> >> > 10:48:49 +0000 +++ lib/lp/buildmaster/tests/test_manager.py
>> >> >  2010-08-18 17:01:23 +0000
>> >>
>> >> ...
>> >>
>> >> > @@ -360,9 +360,17 @@
>> >> >
>> >> >              self.assertFalse(result.processed)
>> >> >
>> >> >          return d.addCallback(check_result)
>> >> >
>> >> > -    def testCheckDispatch(self):
>> >> > -        """`SlaveScanner.checkDispatch` is chained after dispatch
>> >> > requests. -
>> >> > +    def _setUpSlaveAndBuilder(self):
>> >> > +        # Helper function to set up a builder and its recording
>> >> > slave. +        slave = RecordingSlave('bob',
>> >> > 'http://foo.buildd:8221/', 'foo.host') +        bob_builder =
>> >> > getUtility(IBuilderSet)[slave.name] +        return slave,
>> >> > bob_builder
>> >> > +
>> >>
>> >> There's already a slave called 'bob' in the sampledata?
>> >
>> > There's a builder called 'bob'.  There are no slaves.
>> >
>> > There's also another called 'frog'.
>>
>> OK. As long as there are meaningfully named constants, I'm happy. When
>> reading the test in the first place, one can't tell whether that
>> hostname and port number is significant or not.
>
> OK, I'll make them obvious via intent-revealing variable names.
>

Sweet, thanks.

>>
>> >> Also, it might be a good idea to have this method guarantee that the
>> >> failure counts are zero.
>> >
>> > I considered that, but the methods that call this one set the counts to
>> > varying values.
>>
>> I see that, but I do think it would be clearer, and less surprising
>> for anyone who adds more tests in future.
>
> I've re-jigged it so that the counts are passed to the setup method.
>

Ooh. Nice.

>> >> > +        manager.scheduleNextScanCycle = FakeMethod()
>> >> > +        self.patch(BaseDispatchResult, 'assessFailureCounts',
>> >> > FakeMethod())
>> >>
>> >> Why are you stubbing out assessFailureCounts here?
>> >
>> > Because at the bottom of the method you'll see that it's checking the
>> > call_count.
>>
>> Yeah, but why are you checking that it's called? Just as easy and more
>> robust to check that the state is what you expect.
>
> I think it's less robust like that.  The expected state after calling that
> function is checked elsewhere.  Now, these tests can just check that it's
> called at all and be confident that it will DTRT.  This means that if we tweak
> the method later, it will only require one test to be fixed instead of fixing
> all the tests that duplicate the same check.  (this is something that
> particularly pisses me off with Soyuz tests and I am slowly fixing them all)
>
> If you think this is wrong, I am open as to why you think so.
>

Re robustness, I think you lose one kind of robustness and gain another.

"Was this method called?" tests are robust against behaviour change.
If you change what assessFailureCounts is supposed to do, then you
won't have to change these tests, only the assessFailureCounts ones.
(Some people call this "Mockist")

Tests that check state are robust against implementation change. If
you change the details of how you've implemented the code (as you've
done during this review process), you won't have to change any tests.
(Some people call this "Classic")

As a rule, I lean heavily toward the latter, since I change
implementation details much more frequently than behaviour. I use
mocks in situations where there's a very clearly defined, fairly
stable interface and when using some other kind of double would be
prohibitively difficult.

See http://martinfowler.com/articles/mocksArentStubs.html for a good
discussion on the whole topic.

As far as this diff goes, I think using a mock is less preferable but
still sound, and I'm happy to see it merged.

>> > === modified file 'lib/lp/buildmaster/manager.py'
>> > --- lib/lp/buildmaster/manager.py   2010-08-18 16:58:18 +0000
>> > +++ lib/lp/buildmaster/manager.py   2010-08-19 15:51:41 +0000
>> > @@ -141,10 +141,55 @@
>> >
>> >          return d
>> >
>> > +def get_builder(name):
>> > +    """Helper to return the builder given the slave for this request."""
>> > +    # Avoiding circular imports.
>> > +    from lp.buildmaster.interfaces.builder import IBuilderSet
>> > +    return getUtility(IBuilderSet)[name]
>> > +
>> > +
>> > +def assessFailureCounts(builder, fail_notes):
>> > +    """View builder/job failure_count and work out which needs to die.
>> > +
>> > +    :return: True if we disabled something, False if we did not.
>> > +    """
>> > +    current_job = builder.currentjob
>> > +    build_job = current_job.specific_job.build
>> > +
>>
>> Or indeed, build_job = builder.getCurrentBuildFarmJob()
>
> That would incur duplicated queries! (via getting currentjob twice)  It's why
> I've split it up as I did.
>
> I think performance should trump things here but perhaps you can think of a
> better way?
>

It seems to me that Builder.currentjob ought to be replaced with a
method asap, since it's doing a query, and naive readers might well
miss that fact.

If it's doing dupe queries and you aren't going to change currentjob
to a method in this branch, then please add a comment saying something
like:

  # builder.currentjob runs a query. Don't run it twice.

cheers,
jml

« Back to merge proposal