Code review comment for lp:~al-maisan/launchpad/unembargo-oops-526645

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Review: Approve code

Wow! I'm looking forward to seeing how this improves the response time
for uploads with lots of binaries :)

I've only got a few suggestions (extra comments, use of MasterStore,
possible renaming etc.). I've also attached a diff with the equivalent
(unformatted) storm version that works too (as I know you don't like
to work with storm ;) ). Of course, it's up to you whether you use it
or go with the SQL.

Thanks!
-Michael

On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
<email address hidden> wrote:
> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>
> Requested reviews:
>  Canonical Launchpad Engineering (launchpad)
> Related bugs:
>  #526645 cannot unembargo due to LP API timeouts
>  https://bugs.launchpad.net/bugs/526645
>
>
> Hello there!
>
> The 4 OOPSs attached to the related bug indicate the optimization with the
> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
> we query the database for *each* binary in order to find out whether it's
> already published in the destination archive. That's inefficient and accounts
> for 50% of the SQL time.
>
> The branch at hand performs the "is the binary published in the destination
> archive already?" check in one go.
>
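(For anyone following along, here's a toy, self-contained illustration of the win being described: replacing one lookup per binary with a single IN-query over all the filenames. This uses sqlite3 stand-ins, not the Launchpad schema.)

```python
import sqlite3

# Toy stand-in for the destination archive's published files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE published (filename TEXT)")
conn.executemany("INSERT INTO published VALUES (?)",
                 [("a_1.0_i386.deb",), ("b_1.0_i386.deb",)])

uploaded = ["a_1.0_i386.deb", "c_1.0_i386.deb"]

# The old approach: one round trip per binary.
slow = [f for f in uploaded
        if conn.execute("SELECT 1 FROM published WHERE filename = ?",
                        (f,)).fetchone()]

# The batched approach: a single query for the whole upload.
placeholders = ", ".join("?" * len(uploaded))
rows = conn.execute(
    "SELECT DISTINCT filename FROM published WHERE filename IN (%s)"
    % placeholders, uploaded).fetchall()
fast = [r[0] for r in rows]

assert sorted(slow) == sorted(fast) == ["a_1.0_i386.deb"]
```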
> Tests to run:
>
>    bin/test -vv test_uploadprocessor
>
> No "make lint" issues.
>
> --
> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>
> === modified file 'lib/lp/soyuz/interfaces/queue.py'
> --- lib/lp/soyuz/interfaces/queue.py    2009-07-19 04:41:14 +0000
> +++ lib/lp/soyuz/interfaces/queue.py    2010-03-11 09:44:23 +0000
> @@ -411,23 +411,6 @@
>             title=_("The related build"), required=True, readonly=False,
>             )
>
> -    def verifyBeforeAccept():
> -        """Perform overall checks before accepting a binary upload.
> -
> -        Ensure each uploaded binary file can be published in the targeted
> -        archive.
> -
> -        If any of the uploaded binary files are already published a
> -        QueueInconsistentStateError is raised containing all filenames
> -        that cannot be published.
> -
> -        This check is very similar to the one we do for source upload and
> -        was designed to prevent the creation of binary publications that
> -        will never reach the archive.
> -
> -        See bug #227184 for further details.
> -        """
> -
>     def publish(logger=None):
>         """Publish this queued source in the distroseries referred to by
>         the parent queue item.
>
> === modified file 'lib/lp/soyuz/model/queue.py'
> --- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
> +++ lib/lp/soyuz/model/queue.py 2010-03-11 09:44:23 +0000
> @@ -217,9 +217,9 @@
>             except QueueSourceAcceptError, info:
>                 raise QueueInconsistentStateError(info)
>
> +        self._checkForBinariesinDestinationArchive(
> +            [pub.build for pub in self.builds])

It looks like we should rename self.builds here?

>         for build in self.builds:
> -            # as before, but for QueueBuildAcceptError
> -            build.verifyBeforeAccept()
>             try:
>                 build.checkComponentAndSection()
>             except QueueBuildAcceptError, info:
> @@ -228,6 +228,62 @@
>         # if the previous checks applied and pass we do set the value
>         self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)
>
> +    def _checkForBinariesinDestinationArchive(self, builds):
> +        """
> +        Check for existing binaries (in destination archive) for all binary
> +        uploads to be accepted.
> +
> +        Before accepting binary uploads we check whether any of the binaries
> +        already exists in the destination archive and raise an exception
> +        (QueueInconsistentStateError) if this is the case.
> +
> +        The only way to find pre-existing binaries is to match on binary
> +        package file names.
> +        """
> +        if len(builds) == 0:
> +            return
> +

It would be great to include a summary of what the query below is
doing... something like:
# Collect all the binary filenames for the set of builds.

> +        inner_query = """
> +            SELECT DISTINCT lfa.filename
> +            FROM
> +                binarypackagefile bpf, binarypackagerelease bpr,
> +                libraryfilealias lfa
> +            WHERE
> +                bpr.build IN %s
> +                AND bpf.binarypackagerelease = bpr.id
> +                AND bpf.libraryfile = lfa.id
> +        """ % sqlvalues([build.id for build in builds])
> +

And again... let's see if I've understood this correctly, something like:

# Check whether any of the binary filenames have already been
# published in this archive.

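(A side note on the query construction below: the doubled `%%s` threw me for a second. Here's a toy rendering of how the two-stage interpolation plays out, with literal stand-ins for what sqlvalues() would produce; the real helper quotes and joins the values.)

```python
# Stand-in for the rendered inner query (hypothetical values).
inner_query = "SELECT filename FROM files WHERE build IN (1, 2, 3)"

# The first interpolation fills %s with the archive id; the doubled
# %%s survives it as a literal %s placeholder...
query = ("SELECT filename FROM published "
         "WHERE archive = %s AND filename IN (%%s)" % 7)
assert query.endswith("IN (%s)")

# ...which the second interpolation then replaces with the inner query.
query %= inner_query
```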
> +        query = """
> +            SELECT DISTINCT lfa.filename
> +            FROM
> +                binarypackagefile bpf, binarypackagepublishinghistory bpph,
> +                distroarchseries das, distroseries ds, libraryfilealias lfa
> +            WHERE
> +                bpph.archive = %s
> +                AND bpph.distroarchseries = das.id
> +                AND das.distroseries = ds.id
> +                AND ds.distribution = %s
> +                AND bpph.binarypackagerelease = bpf.binarypackagerelease
> +                AND bpf.libraryfile = lfa.id
> +                AND lfa.filename IN (%%s)
> +        """ % sqlvalues(self.archive, self.distroseries.distribution)
> +        # Inject the inner query.
> +        query %= inner_query
> +
> +        store = IMasterStore(PackageUpload)

Is there a reason for using the master store here, given that we're not writing?

Thanks!
> +        result_set = store.execute(query)
> +        known_filenames = [row[0] for row in result_set.get_all()]
> +
> +        # Do any of the files to be uploaded already exist in the destination
> +        # archive?
> +        if len(known_filenames) > 0:
> +            filename_list = "\n\t%s".join(
> +                [filename for filename in known_filenames])
> +            raise QueueInconsistentStateError(
> +                'The following files are already published in %s:\n%s' % (
> +                    self.archive.displayname, filename_list))
> +
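One more small thing I noticed while reading the new method (it's carried over verbatim from the old verifyBeforeAccept, so feel free to punt): the "\n\t%s".join(...) idiom doesn't do what the message format suggests, since join() treats its string as a separator and never interpolates the %s. A quick demonstration:

```python
known_filenames = ["foo_1.0_i386.deb", "bar_1.0_i386.deb"]

# The current idiom: "%s" is part of the *separator*, so a literal
# "%s" ends up embedded between the filenames.
broken = "\n\t%s".join(known_filenames)
assert broken == "foo_1.0_i386.deb\n\t%sbar_1.0_i386.deb"

# Probably what was intended: each filename indented on its own line.
filename_list = "\n\t" + "\n\t".join(known_filenames)
assert filename_list == "\n\tfoo_1.0_i386.deb\n\tbar_1.0_i386.deb"
```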
>     def setDone(self):
>         """See `IPackageUpload`."""
>         if self.status == PackageUploadStatus.DONE:
> @@ -1334,31 +1390,6 @@
>             # guaranteed to exist in the DB. We don't care if sections are
>             # not official.
>
> -    def verifyBeforeAccept(self):
> -        """See `IPackageUploadBuild`."""
> -        distribution = self.packageupload.distroseries.distribution
> -        known_filenames = []
> -        # Check if the uploaded binaries are already published in the archive.
> -        for binary_package in self.build.binarypackages:
> -            for binary_file in binary_package.files:
> -                try:
> -                    published_binary = distribution.getFileByName(
> -                        binary_file.libraryfile.filename, source=False,
> -                        archive=self.packageupload.archive)
> -                except NotFoundError:
> -                    # Only unknown files are ok.
> -                    continue
> -
> -                known_filenames.append(binary_file.libraryfile.filename)
> -
> -        # If any of the uploaded files are already present we have a problem.
> -        if len(known_filenames) > 0:
> -            filename_list = "\n\t%s".join(
> -                [filename for filename in known_filenames])
> -            raise QueueInconsistentStateError(
> -                'The following files are already published in %s:\n%s' % (
> -                    self.packageupload.archive.displayname, filename_list))
> -
>     def publish(self, logger=None):
>         """See `IPackageUploadBuild`."""
>         # Determine the build's architecturetag
>
>
> --
> launchpad-reviews mailing list
> <email address hidden>
> https://lists.ubuntu.com/mailman/listinfo/launchpad-reviews
>
>

=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py 2010-03-11 13:54:03 +0000
@@ -243,37 +243,38 @@
         if len(builds) == 0:
             return
 
-        inner_query = """
-            SELECT DISTINCT lfa.filename
-            FROM
-                binarypackagefile bpf, binarypackagerelease bpr,
-                libraryfilealias lfa
-            WHERE
-                bpr.build IN %s
-                AND bpf.binarypackagerelease = bpr.id
-                AND bpf.libraryfile = lfa.id
-        """ % sqlvalues([build.id for build in builds])
-
-        query = """
-            SELECT DISTINCT lfa.filename
-            FROM
-                binarypackagefile bpf, binarypackagepublishinghistory bpph,
-                distroarchseries das, distroseries ds, libraryfilealias lfa
-            WHERE
-                bpph.archive = %s
-                AND bpph.distroarchseries = das.id
-                AND das.distroseries = ds.id
-                AND ds.distribution = %s
-                AND bpph.binarypackagerelease = bpf.binarypackagerelease
-                AND bpf.libraryfile = lfa.id
-                AND lfa.filename IN (%%s)
-        """ % sqlvalues(self.archive, self.distroseries.distribution)
-        # Inject the inner query.
-        query %= inner_query
-
-        store = IMasterStore(PackageUpload)
-        result_set = store.execute(query)
-        known_filenames = [row[0] for row in result_set.get_all()]
+        from canonical.launchpad.database.librarian import LibraryFileAlias
+        from lp.soyuz.model.files import BinaryPackageFile
+        from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
+        from lp.soyuz.model.build import Build
+        from storm.store import Store
+        from storm.expr import And, Select
+        build_ids = [build.id for build in builds]
+        lfa_filenames = Select(
+            LibraryFileAlias.filename,
+            And(
+                LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
+                And(
+                    BinaryPackageFile.binarypackagerelease == BinaryPackageRelease.id,
+                    BinaryPackageRelease.buildID.is_in(build_ids)
+                )),
+            distinct=True)
+
+        from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
+        from lp.registry.model.distribution import Distribution
+        from lp.registry.model.distroseries import DistroSeries
+        from lp.soyuz.model.distroarchseries import DistroArchSeries
+        published_lfas = Store.of(self).find(
+            LibraryFileAlias,
+            LibraryFileAlias.filename.is_in(lfa_filenames),
+            BinaryPackageFile.libraryfile == LibraryFileAlias.id,
+            BinaryPackagePublishingHistory.binarypackagereleaseID == BinaryPackageFile.binarypackagereleaseID,
+            DistroSeries.distribution == self.distroseries.distribution,
+            DistroArchSeries.distroseries == DistroSeries.id,
+            BinaryPackagePublishingHistory.distroarchseries == DistroArchSeries.id,
+            BinaryPackagePublishingHistory.archive == self.archive)
+        published_lfas.config(distinct=True)
+        known_filenames = list(published_lfas.values(LibraryFileAlias.filename))
 
         # Do any of the files to be uploaded already exist in the destination
         # archive?
