Hi Henning, thanks for a very thorough review.

I've made most of the changes you suggested, except where we agreed
otherwise. Note that, in accordance with Stuart's recommendation I've
moved the tunable loop out of bugheat.py and into garbo.py as a
DBTunableLoop.

An incremental diff is attached.

On Mon, Jan 11, 2010 at 01:06:16PM -0000, Henning Eggers wrote:
> 1. Using "heat" consistently as the term and not use "hotness" will
> avoid confusion. We already agreed on that on IRC. Could you please also
> file a bug about renaming the db column?

I've changed the term in the code and filed bug 506514.

> 2. Using constants for heat calculations. Whenever I see constant values
> used in code, some alarm goes off inside of me. With a collection of
> values like this, multiple alarms where ringing ... ;-) Please see my
> suggestion below.

Understood. I've changed the class to use named constants.

> Apart from that, just a few minor things.
> 
> Again, thanks for your work.
> 
> Henning
> 
> 
> 

> > +    >>> factory = LaunchpadObjectFactory()
> 
> You can delete those two lines, "factory" is already defined in doc tests.

Done.

> > +The heat generated by duplicates is n*6, where n is the number of
> > +duplicates that the bug has. With one duplicate,
> > +_getHeatFromDuplicates() will return 6.
> 
> OK, this weighting value should definitely go into a constant of some
> kind. I think a BugHeatConstants class in the interface file would be
> appropriate.
> 
> class BugHeatConstants:
>     """Constants needed to calculate bug heat."""
> 
>     DUPLICATES = 6
> 
> etc. See lib/lp/translations/interfaces/translations.py for an example.
> 
> Now, the question is what to do about the test. If you change the
> weighting later on, do you want to update your test, too? In the
> documentation and in the test code? You can decide on that but I think I
> would do something like this:
> 
> 
> Weighting parameters are found in BugHeatConstants.
> The heat generated by duplicates is n*DUPLICATES, where n is the number of
> duplicates that the bug has.
> 
> > +
> > +    >>> dupe = factory.makeBug()
> > +    >>> dupe.duplicateof = bug
> > +
> > +    >>> bug_heat_calculator._getHeatFromDuplicates()
> > +    6
> 
> >>> heat = bug_heat_calculator._getHeatFromDuplicates()
> >>> print heat == BugHeatConstants.DUPLICATES
> True
> 
> > +
> > +With 5 duplicates, _getHeatFromDuplicates() will return 30.
> 
> With 5 duplicates, the value will be 5 times as high.
> 
> > +
> > +    >>> for i in range(4):
> > +    ...     dupe = factory.makeBug()
> > +    ...     dupe.duplicateof = bug
> > +    >>> transaction.commit()
> > +
> > +    >>> bug_heat_calculator._getHeatFromDuplicates()
> > +    30
> 
> >>> heat = bug_heat_calculator._getHeatFromDuplicates()
> >>> print heat == 5 * BugHeatConstants.DUPLICATES
> True
> 
> 
> But as I said, I leave that up to you to decide.
> 
> This goes for all heat calculations, obviously, so I am not repeating
> that here.

I've done as you suggested and, as we discussed on IRC, transformed
these tests into unit tests.

> > +BugHeatCalculator.getBugHeat()
> > +------------------------------
> > +
> > +BugHeatCalculator.getBugHeat() returns the total heat for a given bug as
> > +the sum of the results of all _getHeatFrom*() methods.
> > +
> > +For our bug, the total heat will be 470:
> > +
> > + * From duplicates 30
> > + * From subscribers 20
> > + * From privacy 150
> > + * From security 250
> > + * From affected users 20
> 
> This might not work so clearly with my suggestion for the test. But
> you'll think of something, I am sure. ;-P

See the nice new shiny unittests.

> > 
> > === modified file 'lib/lp/bugs/interfaces/bug.py'
> 
> BugHeatConstants should go in here.
> 

I disagree; I think they should be in bugheat.py since they're not going
to be used elsewhere.

> > --- lib/lp/bugs/interfaces/bug.py	2009-12-09 20:48:01 +0000
> > +++ lib/lp/bugs/interfaces/bug.py	2010-01-11 11:29:16 +0000
> > @@ -296,6 +296,10 @@
> >              value_type=Reference(schema=IPerson),
> >              readonly=True))
> >  
> > +    hotness = Int(
> > +        title=_("The 'hotness' of the bug"),
> > +        required=False, readonly=True)
> > +
> 
> heat = ...
> 

Fixed.

> >      # Adding related BugMessages provides a hook for getting at
> >      # BugMessage.visible when building bug comments.
> >      bug_messages = Attribute('The bug messages related to this object.')
> > @@ -732,6 +736,9 @@
> >          if the user is the owner or an admin.
> >          """
> >  
> > +    def setBugHotness(hotness):
> > +        """Set the hotness for the bug."""
> 
> setBugHeat
> 

Fixed.

> > +
> >  class InvalidDuplicateValue(Exception):
> >      """A bug cannot be set as the duplicate of another."""
> >      webservice_error(417)
> > @@ -970,6 +977,19 @@
> >          Otherwise, return False.
> >          """
> >  
> > +    def dangerousGetAllBugs():
> > +        """DO NOT CALL THIS METHOD.
> > +
> > +        This method exists solely to allow the bug heat script to grab
> > +        all the bugs in the database - including private ones - and
> > +        iterate over them. DO NOT USE IT UNLESS YOU KNOW WHAT YOU'RE
> > +        DOING. AND IF YOU KNOW WHAT YOU'RE DOING YOU KNOW BETTER THAN TO
> > +        USE THIS ANYWAY.
> > +        """
> > +        # XXX 2010-01-08 gmb bug=505850:
> > +        #     Note, this method should go away when we have a proper
> > +        #     permissions system for scripts.
> > +
> 
> Yeah, it's really bad there is no better solution for this. Hope this
> gets fixed soon.
> 

Me too.

> >  
> >  class InvalidBugTargetType(Exception):
> >      """Bug target's type is not valid."""
> > 
> > === modified file 'lib/lp/bugs/model/bug.py'
> > --- lib/lp/bugs/model/bug.py	2009-12-14 15:25:55 +0000
> > +++ lib/lp/bugs/model/bug.py	2010-01-11 11:29:16 +0000
> > @@ -254,6 +254,7 @@
> >      message_count = IntCol(notNull=True, default=0)
> >      users_affected_count = IntCol(notNull=True, default=0)
> >      users_unaffected_count = IntCol(notNull=True, default=0)
> > +    hotness = IntCol(notNull=True, default=0)
> 
> heat = IntCol(notNull=True, default=0, name="hotness") AFAIK
> 

Fixed

> >  
> >      @property
> >      def comment_count(self):
> > @@ -1425,6 +1426,10 @@
> >  
> >          return not subscriptions_from_dupes.is_empty()
> >  
> > +    def setHotness(self, hotness):
> > +        """See `IBug`."""
> > +        self.hotness = hotness
> > +
> 
> self.heat = heat
> 

Fixed

> >  
> >  class BugSet:
> >      """See BugSet."""
> > @@ -1665,13 +1670,19 @@
> >          return bugs
> >  
> >      def getByNumbers(self, bug_numbers):
> > -        """see `IBugSet`."""
> > +        """See `IBugSet`."""
> >          if bug_numbers is None or len(bug_numbers) < 1:
> >              return EmptyResultSet()
> >          store = IStore(Bug)
> >          result_set = store.find(Bug, In(Bug.id, bug_numbers))
> >          return result_set.order_by('id')
> >  
> > +    def dangerousGetAllBugs(self):
> > +        """See `IBugSet`."""
> > +        store = IStore(Bug)
> > +        result_set = store.find(Bug)
> > +        return result_set.order_by('id')
> > +
> >  
> >  class BugAffectsPerson(SQLBase):
> >      """A bug is marked as affecting a user."""
> > 
> > === added file 'lib/lp/bugs/scripts/bugheat.py'
> > --- lib/lp/bugs/scripts/bugheat.py	1970-01-01 00:00:00 +0000
> > +++ lib/lp/bugs/scripts/bugheat.py	2010-01-11 11:29:16 +0000
> > @@ -0,0 +1,153 @@
> > +# Copyright 2010 Canonical Ltd.  This software is licensed under the
> > +# GNU Affero General Public License version 3 (see the file LICENSE).
> > +
> > +"""The innards of the Bug Heat cronscript."""
> > +
> > +__metaclass__ = type
> > +__all__ = []
> > +
> > +
> > +from zope.component import getUtility
> > +from zope.interface import implements
> > +
> > +from canonical.launchpad.interfaces.looptuner import ITunableLoop
> > +from canonical.launchpad.utilities.looptuner import LoopTuner
> > +
> > +from lp.bugs.interfaces.bug import IBugSet
> > +
> > +class BugHeatCalculator:
> 
> I am surprised this goes into this file. True, it is used in the script
> only atm, but it is not really specific to being used in scripts, is it?
> But that is not a major point for me.
> 

I've put it here because:

 - It's not really model code and so doesn't belong with the model
 - It's not interface code and doesn't belong with the interface
 - It will *only* be used by scripts.

As you've said you're not too bothered about this I'll leave it where it
is.

> > +    """A class to calculate the heat for a bug."""
> > +
> > +    def __init__(self, bug):
> > +        self.bug = bug
> > +
> > +    def _getHeatFromPrivacy(self):
> > +        """Return the heat generated by the bug's `private` attribute."""
> > +        if self.bug.private:
> > +            return 150
> 
> return BugHeatConstants.PRIVATE
> 
> You get what I mean, I am sure... ;)
> 

Yep.

> > +        else:
> > +            return 0
> 
> return BugHeatConstants.COLD  :-) Maybe not....
> 

I think we can survive with zeroes ;)

> > +
> > +    def getBugHeat(self):
> > +        """Return the total heat for the current bug."""
> > +        heat_counts = [
> > +            self._getHeatFromAffectedUsers(),
> > +            self._getHeatFromDuplicates(),
> > +            self._getHeatFromPrivacy(),
> > +            self._getHeatFromSecurity(),
> > +            self._getHeatFromSubscribers(),
> > +            ]
> > +
> > +        total_heat = 0
> > +        for count in heat_counts:
> > +            total_heat += count
> > +
> > +        return total_heat
> 
> Err, what's wrong with "sum" here?
> return sum(heat_counts)
> 
> Or even just one line:
>         return sum([
>             self._getHeatFromAffectedUsers(),
>             self._getHeatFromDuplicates(),
>             self._getHeatFromPrivacy(),
>             self._getHeatFromSecurity(),
>             self._getHeatFromSubscribers(),
>             ])
> 

Nothing, I just forgot I could do that ;).

> > +
> > +
> > +class BugHeatTunableLoop:
> > +    """An `ITunableLoop` implementation for bug heat calculations."""
> > +
> > +    implements(ITunableLoop)
> > +
> > +    total_updated = 0
> > +
> > +    def __init__(self, transaction, logger, offset=0):
> > +        self.transaction = transaction
> > +        self.logger = logger
> > +        self.offset = offset
> > +        self.total_updated = 0
> > +
> > +    def isDone(self):
> > +        """See `ITunableLoop`."""
> > +        # When the main loop has no more Bugs to process it sets
> > +        # offset to None. Until then, it always has a numerical
> > +        # value.
> > +        return self.offset is None
> > +
> > +    def __call__(self, chunk_size):
> > +        """Retrieve a batch of Bugs and update their heat.
> > +
> > +        See `ITunableLoop`.
> > +        """
> > +        # XXX 2010-01-08 gmb bug=198767:
> > +        #     We cast chunk_size to an integer to ensure that we're not
> > +        #     trying to slice using floats or anything similarly
> > +        #     foolish. We shouldn't have to do this.
> > +        chunk_size = int(chunk_size)
> > +
> > +        start = self.offset
> > +        end = self.offset + chunk_size
> > +
> > +        self.transaction.begin()
> > +        # XXX 2010-01-08 gmb bug=505850:
> > +        #     This method call should be taken out and shot as soon as
> > +        #     we have a proper permissions system for scripts.
> > +        bugs = list(getUtility(IBugSet).dangerousGetAllBugs()[start:end])
> 
> We talked about this on IRC and you wanted to remove list().
> 

Yes, that's done.

> > +
> > +        self.offset = None
> > +        if bugs:
> 
> Either use bugs.any() or test the length for zero.
> 
> number_of_bugs = len(bugs)
> if number_of_bugs > 0:
> 

Done.

> > +            starting_id = bugs[0].id
> > +            self.logger.info("Updating %i Bugs (starting id: %i)" %
> > +                (len(bugs), starting_id))
> > +
> > +        for bug in bugs:
> > +            # We set the starting point of the next batch to the Bug
> > +            # id after the one we're looking at now. If there aren't any
> > +            # bugs this loop will run for 0 iterations and start_id
> 
> What's start_id? ;-)

Ha, fixed.

> > +            # will remain set to None.
> > +            start += 1
> > +            self.offset = start
> > +            self.logger.debug("Updating heat for bug %s" % bug.id)
> > +            bug_heat_calculator = BugHeatCalculator(bug)
> > +            heat = bug_heat_calculator.getBugHeat()
> > +            bug.setHotness(heat)
> > +            self.total_updated += 1
> 
> Seriously, I don't see the point of all those assignments in the loop
> (start, self.offset, self.total_updated). start is not even used.
> Probably copy-and-paste error.

No, it's because the offset will be None when there are no bugs left to
check, so we copy it into `start` and then set it to None until we start
the loop, when we know that there actually are bugs left to cover.

> Why not let the loop run and update the values at the end?
> 
> 	if number_of_bugs == 0:
>             # Terminate loop
>             self.total_updated = self.offset
>             self.offset = None
>             return
>         # Insert log output here ...
>         for bug in bugs:
>             self.logger.debug("Updating heat for bug %s" % bug.id)
>             bug_heat_calculator = BugHeatCalculator(bug)
>             heat = bug_heat_calculator.getBugHeat()
>             bug.setHotness(heat)
>         self.offset += number_of_bugs
> 
> 

Because if one bug fails things might get interesting. I don't know how
LoopTuner works with such situations but it's safer to update after each
one.

> > === modified file 'lib/lp/bugs/tests/test_doc.py'
> 
> Oh, good! I never knew about this file. Something learned for me, here! :-)
> 

:)

> > --- lib/lp/bugs/tests/test_doc.py	2009-10-02 11:29:22 +0000
> > +++ lib/lp/bugs/tests/test_doc.py	2010-01-11 11:29:16 +0000
> > @@ -89,6 +89,12 @@
> >          '../doc/cve-update.txt',
> >          setUp=cveSetUp, tearDown=tearDown, layer=LaunchpadZopelessLayer
> >          ),
> > +    'bug-heat.txt': LayeredDocFileSuite(
> > +        '../doc/bug-heat.txt',
> > +        setUp=setUp,
> > +        tearDown=tearDown,
> > +        layer=LaunchpadZopelessLayer
> > +        ),
> >      'bugnotificationrecipients.txt-uploader': LayeredDocFileSuite(
> >          '../doc/bugnotificationrecipients.txt',
> >          setUp=uploaderBugsSetUp,

-- 
Graham Binns | PGP Key: EC66FA7D