Code review comment for lp:~al-maisan/launchpad/unembargo-oops-526645

Muharem Hrnjadovic (al-maisan) wrote :

On 03/11/2010 03:06 PM, Michael Nelson wrote:
> Review: Approve code
>
> Wow! I'm looking forward to seeing how this improves the response time
> for uploads with lots of binaries :)
Hello Michael,

thanks very much for the review! I am quite confident that this change
will have a positive impact :)

> I've only got a few suggestions (extra comments, use of MasterStore,
> possible renaming etc.). I've also attached a diff with the equivalent
> (unformatted) storm version that works too (as I know you don't like
> to work with storm ;) ). Of course, it's up to you whether you use it
> or go with the SQL.
Hrm .. the SQL version has been running on dogfood for a few days now,
and I am quite confident that it actually works.

Please see my in-line responses below, as well as the attached
incremental diff.
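For anyone reading along, the gist of the fix can be sketched outside
Launchpad. This is a standalone toy (sqlite3, with made-up table and
file names — not the real Soyuz schema): instead of issuing one "is
this file already published?" query per binary, we collect all the
filenames and check them against the destination archive in a single
IN (...) query.

```python
import sqlite3

# Toy stand-in for the destination archive's published files
# (made-up schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE published (archive TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO published VALUES (?, ?)",
    [("primary", "foo_1.0_i386.deb"), ("primary", "bar_1.0_i386.deb")])

# Filenames of the binaries in the upload being accepted.
uploaded = ["foo_1.0_i386.deb", "baz_1.0_i386.deb"]

# One round trip for the whole upload rather than len(uploaded) queries.
placeholders = ", ".join("?" for _ in uploaded)
rows = conn.execute(
    "SELECT filename FROM published"
    " WHERE archive = ? AND filename IN (%s)" % placeholders,
    ["primary"] + uploaded).fetchall()
known_filenames = [filename for (filename,) in rows]
print(known_filenames)
```

With N binaries this turns N lookups into one, which is where the
response-time win for uploads with lots of binaries comes from.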

> Thanks!
> -Michael
>
> On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
> <email address hidden> wrote:
>> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>>
>> Requested reviews:
>> Canonical Launchpad Engineering (launchpad)
>> Related bugs:
>> #526645 cannot unembargo due to LP API timeouts
>> https://bugs.launchpad.net/bugs/526645
>>
>>
>> Hello there!
>>
>> The 4 OOPSs attached to the related bug indicate the optimization with the
>> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
>> we query the database for *each* binary in order to find out whether it's
>> already published in the destination archive. That's inefficient and accounts
>> for 50% of the SQL time.
>>
>> The branch at hand performs the "is the binary published in the destination
>> archive already?" check in one go.
>>
>> Tests to run:
>>
>> bin/test -vv test_uploadprocessor
>>
>> No "make lint" issues.
>>
>> --
>> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
>> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>>
>> === modified file 'lib/lp/soyuz/interfaces/queue.py'
>> --- lib/lp/soyuz/interfaces/queue.py 2009-07-19 04:41:14 +0000
>> +++ lib/lp/soyuz/interfaces/queue.py 2010-03-11 09:44:23 +0000
>> @@ -411,23 +411,6 @@
>> title=_("The related build"), required=True, readonly=False,
>> )
>>
>> - def verifyBeforeAccept():
>> - """Perform overall checks before accepting a binary upload.
>> -
>> - Ensure each uploaded binary file can be published in the targeted
>> - archive.
>> -
>> - If any of the uploaded binary files are already published a
>> - QueueInconsistentStateError is raised containing all filenames
>> - that cannot be published.
>> -
>> - This check is very similar to the one we do for source upload and
>> - was designed to prevent the creation of binary publications that
>> - will never reach the archive.
>> -
>> - See bug #227184 for further details.
>> - """
>> -
>> def publish(logger=None):
>> """Publish this queued source in the distroseries referred to by
>> the parent queue item.
>>
>> === modified file 'lib/lp/soyuz/model/queue.py'
>> --- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
>> +++ lib/lp/soyuz/model/queue.py 2010-03-11 09:44:23 +0000
>> @@ -217,9 +217,9 @@
>> except QueueSourceAcceptError, info:
>> raise QueueInconsistentStateError(info)
>>
>> + self._checkForBinariesinDestinationArchive(
>> + [pub.build for pub in self.builds])
>
> It looks like we should rename self.builds here?
Hmm .. good idea, but probably better done in a separate clean-up
branch since self.builds is used all over the place.

>> for build in self.builds:
>> - # as before, but for QueueBuildAcceptError
>> - build.verifyBeforeAccept()
>> try:
>> build.checkComponentAndSection()
>> except QueueBuildAcceptError, info:
>> @@ -228,6 +228,62 @@
>> # if the previous checks applied and pass we do set the value
>> self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)
>>
>> + def _checkForBinariesinDestinationArchive(self, builds):
>> + """
>> + Check for existing binaries (in destination archive) for all binary
>> + uploads to be accepted.
>> +
>> + Before accepting binary uploads we check whether any of the binaries
>> + already exists in the destination archive and raise an exception
>> + (QueueInconsistentStateError) if this is the case.
>> +
>> + The only way to find pre-existing binaries is to match on binary
>> + package file names.
>> + """
>> + if len(builds) == 0:
>> + return
>> +
>
> It would be great to include a summary of what the query below is
> doing... something like:
> # Collect all the binary filenames for the set of builds.
Thanks for the suggestion. Done.

>> + inner_query = """
>> + SELECT DISTINCT lfa.filename
>> + FROM
>> + binarypackagefile bpf, binarypackagerelease bpr,
>> + libraryfilealias lfa
>> + WHERE
>> + bpr.build IN %s
>> + AND bpf.binarypackagerelease = bpr.id
>> + AND bpf.libraryfile = lfa.id
>> + """ % sqlvalues([build.id for build in builds])
>> +
>
> And again... let's see if I've understood this correctly, something like:
>
> # Check whether any of the binary filenames have already been
> published in this archive.
Yup, only s/this/the destination/

>> + query = """
>> + SELECT DISTINCT lfa.filename
>> + FROM
>> + binarypackagefile bpf, binarypackagepublishinghistory bpph,
>> + distroarchseries das, distroseries ds, libraryfilealias lfa
>> + WHERE
>> + bpph.archive = %s
>> + AND bpph.distroarchseries = das.id
>> + AND das.distroseries = ds.id
>> + AND ds.distribution = %s
>> + AND bpph.binarypackagerelease = bpf.binarypackagerelease
>> + AND bpf.libraryfile = lfa.id
>> + AND lfa.filename IN (%%s)
>> + """ % sqlvalues(self.archive, self.distroseries.distribution)
>> + # Inject the inner query.
>> + query %= inner_query
>> +
>> + store = IMasterStore(PackageUpload)
>
> Is there a reason for using the master store here? Given that we're not writing?
Not really .. changed this to a normal Store.of()
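(As a side note on the query construction above, for anyone puzzled by
the %%s: the outer query is %-formatted twice, so its IN clause has to
be escaped as %%s — the first substitution fills in the archive and
distribution values and leaves a literal %s behind, into which the
inner query is then injected. A toy sketch with made-up table and
column names:)

```python
# Two-stage %-formatting, as used above: the IN clause is written as
# %%s so the first substitution leaves a plain %s placeholder behind.
inner_query = "SELECT name FROM uploads WHERE build IN (1, 2, 3)"
query = (
    "SELECT name FROM published WHERE archive = %s AND name IN (%%s)"
    % (42,))
# After the first substitution, %%s has collapsed to %s.
assert "IN (%s)" in query
# Inject the inner query, as the diff does with `query %= inner_query`.
query %= inner_query
print(query)
```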

> Thanks!
>> + result_set = store.execute(query)
>> + known_filenames = [row[0] for row in result_set.get_all()]
>> +
>> + # Do any of the files to be uploaded already exist in the destination
>> + # archive?
>> + if len(known_filenames) > 0:
>> + filename_list = "\n\t%s".join(
>> + [filename for filename in known_filenames])
>> + raise QueueInconsistentStateError(
>> + 'The following files are already published in %s:\n%s' % (
>> + self.archive.displayname, filename_list))
>> +
>> def setDone(self):
>> """See `IPackageUpload`."""
>> if self.status == PackageUploadStatus.DONE:
>> @@ -1334,31 +1390,6 @@
>> # guaranteed to exist in the DB. We don't care if sections are
>> # not official.
>>
>> - def verifyBeforeAccept(self):
>> - """See `IPackageUploadBuild`."""
>> - distribution = self.packageupload.distroseries.distribution
>> - known_filenames = []
>> - # Check if the uploaded binaries are already published in the archive.
>> - for binary_package in self.build.binarypackages:
>> - for binary_file in binary_package.files:
>> - try:
>> - published_binary = distribution.getFileByName(
>> - binary_file.libraryfile.filename, source=False,
>> - archive=self.packageupload.archive)
>> - except NotFoundError:
>> - # Only unknown files are ok.
>> - continue
>> -
>> - known_filenames.append(binary_file.libraryfile.filename)
>> -
>> - # If any of the uploaded files are already present we have a problem.
>> - if len(known_filenames) > 0:
>> - filename_list = "\n\t%s".join(
>> - [filename for filename in known_filenames])
>> - raise QueueInconsistentStateError(
>> - 'The following files are already published in %s:\n%s' % (
>> - self.packageupload.archive.displayname, filename_list))
>> -
>> def publish(self, logger=None):
>> """See `IPackageUploadBuild`."""
>> # Determine the build's architecturetag
>>
>>
>> --
>> launchpad-reviews mailing list
>> <email address hidden>
>> https://lists.ubuntu.com/mailman/listinfo/launchpad-reviews

Best regards

--
Muharem Hrnjadovic <email address hidden>
Public key id : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F 5602 219F 6B60 B2BB FCFC

=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py 2010-03-11 14:38:48 +0000
@@ -218,10 +218,10 @@
 raise QueueInconsistentStateError(info)

 self._checkForBinariesinDestinationArchive(
- [pub.build for pub in self.builds])
- for build in self.builds:
+ [queue_build.build for queue_build in self.builds])
+ for queue_build in self.builds:
 try:
- build.checkComponentAndSection()
+ queue_build.checkComponentAndSection()
 except QueueBuildAcceptError, info:
 raise QueueInconsistentStateError(info)

@@ -243,6 +243,7 @@
 if len(builds) == 0:
 return

+ # Collects the binary file names for all builds.
 inner_query = """
 SELECT DISTINCT lfa.filename
 FROM
@@ -254,6 +255,8 @@
 AND bpf.libraryfile = lfa.id
 """ % sqlvalues([build.id for build in builds])

+ # Check whether any of the binary file names have already been
+ # published in the destination archive.
 query = """
 SELECT DISTINCT lfa.filename
 FROM
@@ -271,7 +274,7 @@
 # Inject the inner query.
 query %= inner_query

- store = IMasterStore(PackageUpload)
+ store = Store.of(self)
 result_set = store.execute(query)
 known_filenames = [row[0] for row in result_set.get_all()]
