Merge lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad

Proposed by Muharem Hrnjadovic
Status: Merged
Approved by: Muharem Hrnjadovic
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~al-maisan/launchpad/unembargo-oops-526645
Merge into: lp:launchpad
Diff against target: 145 lines (+64/-46)
2 files modified
lib/lp/soyuz/interfaces/queue.py (+0/-17)
lib/lp/soyuz/model/queue.py (+64/-29)
To merge this branch: bzr merge lp:~al-maisan/launchpad/unembargo-oops-526645
Reviewer Review Type Date Requested Status
Michael Nelson (community) Approve
Review via email: mp+21126@code.launchpad.net

Description of the change

Hello there!

The 4 OOPSs attached to the related bug indicate the optimization with the
highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
we query the database for *each* binary in order to find out whether it's
already published in the destination archive. That's inefficient and accounts
for 50% percent of the SQL time.

The branch at hand performs the "is the binary published in the destination
archive already?" check in one go.

Tests to run:

    bin/test -vv test_uploadprocessor

No "make lint" issues.

To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (8.0 KiB)

Review: Approve code

Wow! I'm looking forward to seeing how this improves the response time
for uploads with lots of binaries :)

I've only got a few suggestions (extra comments, use of MasterStore,
possible renaming etc.). I've also attached a diff with the equivalent
(unformatted) storm version that works too (as I know you don't like
to work with storm ;) ). Of course, it's up to you whether you use it
or go with the SQL.

Thanks!
-Michael

On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
<email address hidden> wrote:
> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>
> Requested reviews:
>  Canonical Launchpad Engineering (launchpad)
> Related bugs:
>  #526645 cannot unembargo due to LP API timeouts
>  https://bugs.launchpad.net/bugs/526645
>
>
> Hello there!
>
> The 4 OOPSs attached to the related bug indicate the optimization with the
> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
> we query the database for *each* binary in order to find out whether it's
> already published in the destination archive. That's inefficient and accounts
> for 50% percent of the SQL time.
>
> The branch at hand performs the "is the binary published in the destination
> archive already?" check in one go.
>
> Tests to run:
>
>    bin/test -vv test_uploadprocessor
>
> No "make lint" issues.
>
> --
> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>
> === modified file 'lib/lp/soyuz/interfaces/queue.py'
> --- lib/lp/soyuz/interfaces/queue.py    2009-07-19 04:41:14 +0000
> +++ lib/lp/soyuz/interfaces/queue.py    2010-03-11 09:44:23 +0000
> @@ -411,23 +411,6 @@
>             title=_("The related build"), required=True, readonly=False,
>             )
>
> -    def verifyBeforeAccept():
> -        """Perform overall checks before accepting a binary upload.
> -
> -        Ensure each uploaded binary file can be published in the targeted
> -        archive.
> -
> -        If any of the uploaded binary files are already published a
> -        QueueInconsistentStateError is raised containing all filenames
> -        that cannot be published.
> -
> -        This check is very similar to the one we do for source upload and
> -        was designed to prevent the creation of binary publications that
> -        will never reach the archive.
> -
> -        See bug #227184 for further details.
> -        """
> -
>     def publish(logger=None):
>         """Publish this queued source in the distroseries referred to by
>         the parent queue item.
>
> === modified file 'lib/lp/soyuz/model/queue.py'
> --- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
> +++ lib/lp/soyuz/model/queue.py 2010-03-11 09:44:23 +0000
> @@ -217,9 +217,9 @@
>             except QueueSourceAcceptError, info:
>                 raise QueueInconsistentStateError(info)
>
> +        self._checkForBinariesinDestinationArchive(
> +            [pub.build for pub in self.builds])

It looks like we should rename self.builds here?

>         for build in self.builds:
> -     ...

Read more...

1=== modified file 'lib/lp/soyuz/model/queue.py'
2--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
3+++ lib/lp/soyuz/model/queue.py 2010-03-11 13:54:03 +0000
4@@ -243,37 +243,38 @@
5 if len(builds) == 0:
6 return
7
8- inner_query = """
9- SELECT DISTINCT lfa.filename
10- FROM
11- binarypackagefile bpf, binarypackagerelease bpr,
12- libraryfilealias lfa
13- WHERE
14- bpr.build IN %s
15- AND bpf.binarypackagerelease = bpr.id
16- AND bpf.libraryfile = lfa.id
17- """ % sqlvalues([build.id for build in builds])
18-
19- query = """
20- SELECT DISTINCT lfa.filename
21- FROM
22- binarypackagefile bpf, binarypackagepublishinghistory bpph,
23- distroarchseries das, distroseries ds, libraryfilealias lfa
24- WHERE
25- bpph.archive = %s
26- AND bpph.distroarchseries = das.id
27- AND das.distroseries = ds.id
28- AND ds.distribution = %s
29- AND bpph.binarypackagerelease = bpf.binarypackagerelease
30- AND bpf.libraryfile = lfa.id
31- AND lfa.filename IN (%%s)
32- """ % sqlvalues(self.archive, self.distroseries.distribution)
33- # Inject the inner query.
34- query %= inner_query
35-
36- store = IMasterStore(PackageUpload)
37- result_set = store.execute(query)
38- known_filenames = [row[0] for row in result_set.get_all()]
39+ from canonical.launchpad.database.librarian import LibraryFileAlias
40+ from lp.soyuz.model.files import BinaryPackageFile
41+ from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
42+ from lp.soyuz.model.build import Build
43+ from storm.store import Store
44+ from storm.expr import And, Select
45+ build_ids = [build.id for build in builds]
46+ lfa_filenames = Select(
47+ LibraryFileAlias.filename,
48+ And(
49+ LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
50+ And(
51+ BinaryPackageFile.binarypackagerelease == BinaryPackageRelease.id,
52+ BinaryPackageRelease.buildID.is_in(build_ids)
53+ )),
54+ distinct=True)
55+
56+ from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
57+ from lp.registry.model.distribution import Distribution
58+ from lp.registry.model.distroseries import DistroSeries
59+ from lp.soyuz.model.distroarchseries import DistroArchSeries
60+ published_lfas = Store.of(self).find(
61+ LibraryFileAlias,
62+ LibraryFileAlias.filename.is_in(lfa_filenames),
63+ BinaryPackageFile.libraryfile == LibraryFileAlias.id,
64+ BinaryPackagePublishingHistory.binarypackagereleaseID == BinaryPackageFile.binarypackagereleaseID,
65+ DistroSeries.distribution == self.distroseries.distribution,
66+ DistroArchSeries.distroseries == DistroSeries.id,
67+ BinaryPackagePublishingHistory.distroarchseries == DistroArchSeries.id,
68+ BinaryPackagePublishingHistory.archive == self.archive)
69+ published_lfas.config(distinct=True)
70+ known_filenames = list(published_lfas.values(LibraryFileAlias.filename))
71
72 # Do any of the files to be uploaded already exist in the destination
73 # archive?
Revision history for this message
Michael Nelson (michael.nelson) :
review: Approve
Revision history for this message
Muharem Hrnjadovic (al-maisan) wrote :
Download full text (9.0 KiB)

On 03/11/2010 03:06 PM, Michael Nelson wrote:
> Review: Approve code
>
> Wow! I'm looking forward to seeing how this improves the response time
> for uploads with lots of binaries :)
Hello Michael,

thanks very much for the review! I am quite confident that this change
will have a positive impact :)

> I've only got a few suggestions (extra comments, use of MasterStore,
> possible renaming etc.). I've also attached a diff with the equivalent
> (unformatted) storm version that works too (as I know you don't like
> to work with storm ;) ). Of course, it's up to you whether you use it
> or go with the SQL.
Hrm .. the SQL version has been running on dogfood for a few days now
and I am quite confident that it actually works..

Please see also my in-line responses below as well as the attached
incremental diff.

> Thanks!
> -Michael
>
> On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
> <email address hidden> wrote:
>> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>>
>> Requested reviews:
>> Canonical Launchpad Engineering (launchpad)
>> Related bugs:
>> #526645 cannot unembargo due to LP API timeouts
>> https://bugs.launchpad.net/bugs/526645
>>
>>
>> Hello there!
>>
>> The 4 OOPSs attached to the related bug indicate the optimization with the
>> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
>> we query the database for *each* binary in order to find out whether it's
>> already published in the destination archive. That's inefficient and accounts
>> for 50% percent of the SQL time.
>>
>> The branch at hand performs the "is the binary published in the destination
>> archive already?" check in one go.
>>
>> Tests to run:
>>
>> bin/test -vv test_uploadprocessor
>>
>> No "make lint" issues.
>>
>> --
>> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
>> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>>
>> === modified file 'lib/lp/soyuz/interfaces/queue.py'
>> --- lib/lp/soyuz/interfaces/queue.py 2009-07-19 04:41:14 +0000
>> +++ lib/lp/soyuz/interfaces/queue.py 2010-03-11 09:44:23 +0000
>> @@ -411,23 +411,6 @@
>> title=_("The related build"), required=True, readonly=False,
>> )
>>
>> - def verifyBeforeAccept():
>> - """Perform overall checks before accepting a binary upload.
>> -
>> - Ensure each uploaded binary file can be published in the targeted
>> - archive.
>> -
>> - If any of the uploaded binary files are already published a
>> - QueueInconsistentStateError is raised containing all filenames
>> - that cannot be published.
>> -
>> - This check is very similar to the one we do for source upload and
>> - was designed to prevent the creation of binary publications that
>> - will never reach the archive.
>> -
>> - See bug #227184 for further details.
>> - """
>> -
>> def publish(logger=None):
>> """Publish this queued source in the distroseries referred to by
>> the parent queue item.
>>
>> === modified file 'lib/lp/soyuz/model...

Read more...

1=== modified file 'lib/lp/soyuz/model/queue.py'
2--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
3+++ lib/lp/soyuz/model/queue.py 2010-03-11 14:38:48 +0000
4@@ -218,10 +218,10 @@
5 raise QueueInconsistentStateError(info)
6
7 self._checkForBinariesinDestinationArchive(
8- [pub.build for pub in self.builds])
9- for build in self.builds:
10+ [queue_build.build for queue_build in self.builds])
11+ for queue_build in self.builds:
12 try:
13- build.checkComponentAndSection()
14+ queue_build.checkComponentAndSection()
15 except QueueBuildAcceptError, info:
16 raise QueueInconsistentStateError(info)
17
18@@ -243,6 +243,7 @@
19 if len(builds) == 0:
20 return
21
22+ # Collects the binary file names for all builds.
23 inner_query = """
24 SELECT DISTINCT lfa.filename
25 FROM
26@@ -254,6 +255,8 @@
27 AND bpf.libraryfile = lfa.id
28 """ % sqlvalues([build.id for build in builds])
29
30+ # Check whether any of the binary file names have already been
31+ # published in the destination archive.
32 query = """
33 SELECT DISTINCT lfa.filename
34 FROM
35@@ -271,7 +274,7 @@
36 # Inject the inner query.
37 query %= inner_query
38
39- store = IMasterStore(PackageUpload)
40+ store = Store.of(self)
41 result_set = store.execute(query)
42 known_filenames = [row[0] for row in result_set.get_all()]
43

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/soyuz/interfaces/queue.py'
2--- lib/lp/soyuz/interfaces/queue.py 2009-07-19 04:41:14 +0000
3+++ lib/lp/soyuz/interfaces/queue.py 2010-03-12 16:59:28 +0000
4@@ -411,23 +411,6 @@
5 title=_("The related build"), required=True, readonly=False,
6 )
7
8- def verifyBeforeAccept():
9- """Perform overall checks before accepting a binary upload.
10-
11- Ensure each uploaded binary file can be published in the targeted
12- archive.
13-
14- If any of the uploaded binary files are already published a
15- QueueInconsistentStateError is raised containing all filenames
16- that cannot be published.
17-
18- This check is very similar to the one we do for source upload and
19- was designed to prevent the creation of binary publications that
20- will never reach the archive.
21-
22- See bug #227184 for further details.
23- """
24-
25 def publish(logger=None):
26 """Publish this queued source in the distroseries referred to by
27 the parent queue item.
28
29=== modified file 'lib/lp/soyuz/model/queue.py'
30--- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
31+++ lib/lp/soyuz/model/queue.py 2010-03-12 16:59:28 +0000
32@@ -217,17 +217,77 @@
33 except QueueSourceAcceptError, info:
34 raise QueueInconsistentStateError(info)
35
36- for build in self.builds:
37- # as before, but for QueueBuildAcceptError
38- build.verifyBeforeAccept()
39+ self._checkForBinariesinDestinationArchive(
40+ [queue_build.build for queue_build in self.builds])
41+ for queue_build in self.builds:
42 try:
43- build.checkComponentAndSection()
44+ queue_build.checkComponentAndSection()
45 except QueueBuildAcceptError, info:
46 raise QueueInconsistentStateError(info)
47
48 # if the previous checks applied and pass we do set the value
49 self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)
50
51+ def _checkForBinariesinDestinationArchive(self, builds):
52+ """
53+ Check for existing binaries (in destination archive) for all binary
54+ uploads to be accepted.
55+
56+ Before accepting binary uploads we check whether any of the binaries
57+ already exists in the destination archive and raise an exception
58+ (QueueInconsistentStateError) if this is the case.
59+
60+ The only way to find pre-existing binaries is to match on binary
61+ package file names.
62+ """
63+ if len(builds) == 0:
64+ return
65+
66+ # Collects the binary file names for all builds.
67+ inner_query = """
68+ SELECT DISTINCT lfa.filename
69+ FROM
70+ binarypackagefile bpf, binarypackagerelease bpr,
71+ libraryfilealias lfa
72+ WHERE
73+ bpr.build IN %s
74+ AND bpf.binarypackagerelease = bpr.id
75+ AND bpf.libraryfile = lfa.id
76+ """ % sqlvalues([build.id for build in builds])
77+
78+ # Check whether any of the binary file names have already been
79+ # published in the destination archive.
80+ query = """
81+ SELECT DISTINCT lfa.filename
82+ FROM
83+ binarypackagefile bpf, binarypackagepublishinghistory bpph,
84+ distroarchseries das, distroseries ds, libraryfilealias lfa
85+ WHERE
86+ bpph.archive = %s
87+ AND bpph.distroarchseries = das.id
88+ AND bpph.dateremoved IS NULL
89+ AND das.distroseries = ds.id
90+ AND ds.distribution = %s
91+ AND bpph.binarypackagerelease = bpf.binarypackagerelease
92+ AND bpf.libraryfile = lfa.id
93+ AND lfa.filename IN (%%s)
94+ """ % sqlvalues(self.archive, self.distroseries.distribution)
95+ # Inject the inner query.
96+ query %= inner_query
97+
98+ store = Store.of(self)
99+ result_set = store.execute(query)
100+ known_filenames = [row[0] for row in result_set.get_all()]
101+
102+ # Do any of the files to be uploaded already exist in the destination
103+ # archive?
104+ if len(known_filenames) > 0:
105+ filename_list = "\n\t%s".join(
106+ [filename for filename in known_filenames])
107+ raise QueueInconsistentStateError(
108+ 'The following files are already published in %s:\n%s' % (
109+ self.archive.displayname, filename_list))
110+
111 def setDone(self):
112 """See `IPackageUpload`."""
113 if self.status == PackageUploadStatus.DONE:
114@@ -1334,31 +1394,6 @@
115 # guaranteed to exist in the DB. We don't care if sections are
116 # not official.
117
118- def verifyBeforeAccept(self):
119- """See `IPackageUploadBuild`."""
120- distribution = self.packageupload.distroseries.distribution
121- known_filenames = []
122- # Check if the uploaded binaries are already published in the archive.
123- for binary_package in self.build.binarypackages:
124- for binary_file in binary_package.files:
125- try:
126- published_binary = distribution.getFileByName(
127- binary_file.libraryfile.filename, source=False,
128- archive=self.packageupload.archive)
129- except NotFoundError:
130- # Only unknown files are ok.
131- continue
132-
133- known_filenames.append(binary_file.libraryfile.filename)
134-
135- # If any of the uploaded files are already present we have a problem.
136- if len(known_filenames) > 0:
137- filename_list = "\n\t%s".join(
138- [filename for filename in known_filenames])
139- raise QueueInconsistentStateError(
140- 'The following files are already published in %s:\n%s' % (
141- self.packageupload.archive.displayname, filename_list))
142-
143 def publish(self, logger=None):
144 """See `IPackageUploadBuild`."""
145 # Determine the build's architecturetag