Merge lp:~jtv/launchpad/bug-409330 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/bug-409330
Merge into: lp:launchpad
Diff against target: None lines
To merge this branch: bzr merge lp:~jtv/launchpad/bug-409330
Reviewer Review Type Date Requested Status
Celso Providelo (community) Approve
Review via email: mp+9760@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

= Bug 409330 =

The nightly cron job that verifies the cached statistics for POFiles is
taking far too long. The last run pushed the nightly script as a whole
over 24 hours, which obviously causes trouble.

With message sharing, this script has become more important than it was.
A change to one template or translation can have all sorts of influence
on other templates and/or translations, and we leave it to the script to
set things straight offline.

Message sharing also makes it slightly too complex to find exactly the
POFiles whose statistics most need updating. We may still do that later
but for now we have an immediate problem to solve.

The only remaining way is to use the starting-ID option. This tells the
job to consider only POFiles with ids starting at a given number. We
can run the script less frequently on older POFiles, on the assumption
that newer POFiles are more likely to be updated more frequently. (It's
a widely effective optimization assumption that new data is likely to be
hot data).

Looking at recent logs, the sweet spot seems to lie between ids 870,000
and 880,000. Starting there covers almost all updates that are usually
made by the script; it's the point where the changes start to become
dense. On the most recent script run, it would have saved slightly over
half the script's run time while still doing almost all of the same job.

We can tweak this number later if needed, or come up with a better
mechanism to approximate the right set of POFiles.

== Implementation ==

The option to start verifying at a given POFile id was already present
in the lower levels of this script's implementation, but the script
object did not expose it to the command line yet. This branch adds the
corresponding command-line option.

It also takes the stats script run out of nightly.sh, so that that
script can stay nicely within 24 hours of run time. The script is to be
run daily as:

    python2.4 rosetta-pofile-stats.py --start-id 875000

And less frequently, say weekly, we'll want a slow but complete run:

    python2.4 rosetta-pofile-stats.py

Finally, the script is made to use DBLoopTuner instead of LoopTuner to
make it a bit more mindful of other clients on the database.

== Tests ==

{{{
./bin/test -vv -t pofile-verify-stats
}}}

== Demo and Q/A ==

After the change, perform a partial script run:

    python2.4 rosetta-pofile-stats.py --start-id 875000

This should take at least 3 hours, probably more, and update a lot of
POFiles. It's hard to predict the number, but at a guess I'd expect it
to be close to a thousand. (Though it may also be as low as a hundred).

Now perform a full script run:

    python2.4 rosetta-pofile-stats.py

This should take very roughly twice as long, but update much fewer
incorrect statistics. The partial run took care of most.

== lint ==

No lint complaints.

Jeroen

Revision history for this message
Celso Providelo (cprov) wrote :

Jeroen,

Thanks for investigating and find a quick solution for this production bug.

You need a new test for VerifyPOFileStats, to certify 'start_id' is handled as you expect. Other than that the code changes are good and harmless.

I'm assuming you are coordinating these changes with LOSAs, so they can add the new cronjob with the right parameters in production once it's CPed.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'cronscripts/nightly.sh'
2--- cronscripts/nightly.sh 2009-06-24 20:52:01 +0000
3+++ cronscripts/nightly.sh 2009-08-06 12:30:58 +0000
4@@ -75,9 +75,6 @@
5 echo == Updating package cache `date` ==
6 python2.4 update-pkgcache.py -q
7
8-echo == POFile stats `date` ==
9-python2.4 rosetta-pofile-stats.py
10-
11 echo == Product Release Finder `date` ==
12 python2.4 product-release-finder.py -q
13
14
15=== modified file 'cronscripts/rosetta-pofile-stats.py'
16--- cronscripts/rosetta-pofile-stats.py 2009-07-17 00:26:05 +0000
17+++ cronscripts/rosetta-pofile-stats.py 2009-08-06 12:30:58 +0000
18@@ -17,11 +17,22 @@
19 class VerifyPOFileStats(LaunchpadCronScript):
20 """Trawl `POFile` table, verifying and updating cached statistics."""
21
22+ def add_my_options(self):
23+ self.parser.add_option('-i', '--start-id', dest='start_id',
24+ type='int',
25+ help="Verify from this POFile id upward.")
26+
27 def main(self):
28- VerifyPOFileStatsProcess(self.txn, self.logger).run()
29+ if self.options.start_id is not None:
30+ start_id = int(self.options.start_id)
31+ else:
32+ start_id = 0
33+
34+ verifier = VerifyPOFileStatsProcess(
35+ self.txn, self.logger, start_at_id=start_id)
36+ verifier.run()
37
38
39 if __name__ == '__main__':
40 script = VerifyPOFileStats(name="pofile-stats", dbuser='pofilestats')
41 script.lock_and_run()
42-
43
44=== modified file 'lib/lp/translations/scripts/verify_pofile_stats.py'
45--- lib/lp/translations/scripts/verify_pofile_stats.py 2009-07-17 00:26:05 +0000
46+++ lib/lp/translations/scripts/verify_pofile_stats.py 2009-08-06 12:30:58 +0000
47@@ -19,7 +19,8 @@
48 from lp.translations.interfaces.pofile import IPOFileSet
49 from lp.services.mail.sendmail import simple_sendmail
50 from canonical.launchpad.mailnotification import MailWrapper
51-from canonical.launchpad.utilities.looptuner import LoopTuner
52+from canonical.launchpad.utilities.looptuner import DBLoopTuner
53+
54
55 class Verifier:
56 """`ITunableLoop` that recomputes & checks all `POFile`s' statistics."""
57@@ -109,7 +110,7 @@
58 # acquired only at the very end of the iteration, we can afford to
59 # make relatively long, low-overhead iterations without disrupting
60 # application response times.
61- LoopTuner(loop, 4).run()
62+ DBLoopTuner(loop, 4).run()
63
64 if loop.total_incorrect > 0 or loop.total_exceptions > 0:
65 # Not all statistics were correct, or there were failures while