Merge lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad

Proposed by Muharem Hrnjadovic
Status: Merged
Approved by: Muharem Hrnjadovic
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~al-maisan/launchpad/unembargo-oops-526645
Merge into: lp:launchpad
Diff against target: 145 lines (+64/-46)
2 files modified
lib/lp/soyuz/interfaces/queue.py (+0/-17)
lib/lp/soyuz/model/queue.py (+64/-29)
To merge this branch: bzr merge lp:~al-maisan/launchpad/unembargo-oops-526645
Reviewer Review Type Date Requested Status
Michael Nelson (community) Approve
Review via email: mp+21126@code.launchpad.net

Description of the change

Hello there!

The 4 OOPSs attached to the related bug indicate the optimization with the
highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
we query the database for *each* binary in order to find out whether it's
already published in the destination archive. That's inefficient and accounts
for 50% percent of the SQL time.

The branch at hand performs the "is the binary published in the destination
archive already?" check in one go.

Tests to run:

    bin/test -vv test_uploadprocessor

No "make lint" issues.

To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (8.0 KiB)

Review: Approve code

Wow! I'm looking forward to seeing how this improves the response time
for uploads with lots of binaries :)

I've only got a few suggestions (extra comments, use of MasterStore,
possible renaming etc.). I've also attached a diff with the equivalent
(unformatted) storm version that works too (as I know you don't like
to work with storm ;) ). Of course, it's up to you whether you use it
or go with the SQL.

Thanks!
-Michael

On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
<email address hidden> wrote:
> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>
> Requested reviews:
>  Canonical Launchpad Engineering (launchpad)
> Related bugs:
>  #526645 cannot unembargo due to LP API timeouts
>  https://bugs.launchpad.net/bugs/526645
>
>
> Hello there!
>
> The 4 OOPSs attached to the related bug indicate the optimization with the
> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
> we query the database for *each* binary in order to find out whether it's
> already published in the destination archive. That's inefficient and accounts
> for 50% percent of the SQL time.
>
> The branch at hand performs the "is the binary published in the destination
> archive already?" check in one go.
>
> Tests to run:
>
>    bin/test -vv test_uploadprocessor
>
> No "make lint" issues.
>
> --
> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>
> === modified file 'lib/lp/soyuz/interfaces/queue.py'
> --- lib/lp/soyuz/interfaces/queue.py    2009-07-19 04:41:14 +0000
> +++ lib/lp/soyuz/interfaces/queue.py    2010-03-11 09:44:23 +0000
> @@ -411,23 +411,6 @@
>             title=_("The related build"), required=True, readonly=False,
>             )
>
> -    def verifyBeforeAccept():
> -        """Perform overall checks before accepting a binary upload.
> -
> -        Ensure each uploaded binary file can be published in the targeted
> -        archive.
> -
> -        If any of the uploaded binary files are already published a
> -        QueueInconsistentStateError is raised containing all filenames
> -        that cannot be published.
> -
> -        This check is very similar to the one we do for source upload and
> -        was designed to prevent the creation of binary publications that
> -        will never reach the archive.
> -
> -        See bug #227184 for further details.
> -        """
> -
>     def publish(logger=None):
>         """Publish this queued source in the distroseries referred to by
>         the parent queue item.
>
> === modified file 'lib/lp/soyuz/model/queue.py'
> --- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
> +++ lib/lp/soyuz/model/queue.py 2010-03-11 09:44:23 +0000
> @@ -217,9 +217,9 @@
>             except QueueSourceAcceptError, info:
>                 raise QueueInconsistentStateError(info)
>
> +        self._checkForBinariesinDestinationArchive(
> +            [pub.build for pub in self.builds])

It looks like we should rename self.builds here?

>         for build in self.builds:
> -     ...

Read more...

=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py 2010-03-11 13:54:03 +0000
@@ -243,37 +243,38 @@
243 if len(builds) == 0:243 if len(builds) == 0:
244 return244 return
245245
246 inner_query = """246 from canonical.launchpad.database.librarian import LibraryFileAlias
247 SELECT DISTINCT lfa.filename247 from lp.soyuz.model.files import BinaryPackageFile
248 FROM248 from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
249 binarypackagefile bpf, binarypackagerelease bpr,249 from lp.soyuz.model.build import Build
250 libraryfilealias lfa250 from storm.store import Store
251 WHERE251 from storm.expr import And, Select
252 bpr.build IN %s252 build_ids = [build.id for build in builds]
253 AND bpf.binarypackagerelease = bpr.id253 lfa_filenames = Select(
254 AND bpf.libraryfile = lfa.id254 LibraryFileAlias.filename,
255 """ % sqlvalues([build.id for build in builds])255 And(
256256 LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
257 query = """257 And(
258 SELECT DISTINCT lfa.filename258 BinaryPackageFile.binarypackagerelease == BinaryPackageRelease.id,
259 FROM259 BinaryPackageRelease.buildID.is_in(build_ids)
260 binarypackagefile bpf, binarypackagepublishinghistory bpph,260 )),
261 distroarchseries das, distroseries ds, libraryfilealias lfa261 distinct=True)
262 WHERE262
263 bpph.archive = %s263 from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
264 AND bpph.distroarchseries = das.id264 from lp.registry.model.distribution import Distribution
265 AND das.distroseries = ds.id265 from lp.registry.model.distroseries import DistroSeries
266 AND ds.distribution = %s266 from lp.soyuz.model.distroarchseries import DistroArchSeries
267 AND bpph.binarypackagerelease = bpf.binarypackagerelease267 published_lfas = Store.of(self).find(
268 AND bpf.libraryfile = lfa.id268 LibraryFileAlias,
269 AND lfa.filename IN (%%s)269 LibraryFileAlias.filename.is_in(lfa_filenames),
270 """ % sqlvalues(self.archive, self.distroseries.distribution)270 BinaryPackageFile.libraryfile == LibraryFileAlias.id,
271 # Inject the inner query.271 BinaryPackagePublishingHistory.binarypackagereleaseID == BinaryPackageFile.binarypackagereleaseID,
272 query %= inner_query272 DistroSeries.distribution == self.distroseries.distribution,
273273 DistroArchSeries.distroseries == DistroSeries.id,
274 store = IMasterStore(PackageUpload)274 BinaryPackagePublishingHistory.distroarchseries == DistroArchSeries.id,
275 result_set = store.execute(query)275 BinaryPackagePublishingHistory.archive == self.archive)
276 known_filenames = [row[0] for row in result_set.get_all()]276 published_lfas.config(distinct=True)
277 known_filenames = list(published_lfas.values(LibraryFileAlias.filename))
277278
278 # Do any of the files to be uploaded already exist in the destination279 # Do any of the files to be uploaded already exist in the destination
279 # archive?280 # archive?
Revision history for this message
Michael Nelson (michael.nelson) :
review: Approve
Revision history for this message
Muharem Hrnjadovic (al-maisan) wrote :
Download full text (9.0 KiB)

On 03/11/2010 03:06 PM, Michael Nelson wrote:
> Review: Approve code
>
> Wow! I'm looking forward to seeing how this improves the response time
> for uploads with lots of binaries :)
Hello Michael,

thanks very much for the review! I am quite confident that this change
will have a positive impact :)

> I've only got a few suggestions (extra comments, use of MasterStore,
> possible renaming etc.). I've also attached a diff with the equivalent
> (unformatted) storm version that works too (as I know you don't like
> to work with storm ;) ). Of course, it's up to you whether you use it
> or go with the SQL.
Hrm .. the SQL version has been running on dogfood for a few days now
and I am quite confident that it actually works..

Please see also my in-line responses below as well as the attached
incremental diff.

> Thanks!
> -Michael
>
> On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
> <email address hidden> wrote:
>> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>>
>> Requested reviews:
>> Canonical Launchpad Engineering (launchpad)
>> Related bugs:
>> #526645 cannot unembargo due to LP API timeouts
>> https://bugs.launchpad.net/bugs/526645
>>
>>
>> Hello there!
>>
>> The 4 OOPSs attached to the related bug indicate the optimization with the
>> highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
>> we query the database for *each* binary in order to find out whether it's
>> already published in the destination archive. That's inefficient and accounts
>> for 50% percent of the SQL time.
>>
>> The branch at hand performs the "is the binary published in the destination
>> archive already?" check in one go.
>>
>> Tests to run:
>>
>> bin/test -vv test_uploadprocessor
>>
>> No "make lint" issues.
>>
>> --
>> https://code.launchpad.net/~al-maisan/launchpad/unembargo-oops-526645/+merge/21126
>> Your team Launchpad code reviewers from Canonical is subscribed to branch lp:launchpad/devel.
>>
>> === modified file 'lib/lp/soyuz/interfaces/queue.py'
>> --- lib/lp/soyuz/interfaces/queue.py 2009-07-19 04:41:14 +0000
>> +++ lib/lp/soyuz/interfaces/queue.py 2010-03-11 09:44:23 +0000
>> @@ -411,23 +411,6 @@
>> title=_("The related build"), required=True, readonly=False,
>> )
>>
>> - def verifyBeforeAccept():
>> - """Perform overall checks before accepting a binary upload.
>> -
>> - Ensure each uploaded binary file can be published in the targeted
>> - archive.
>> -
>> - If any of the uploaded binary files are already published a
>> - QueueInconsistentStateError is raised containing all filenames
>> - that cannot be published.
>> -
>> - This check is very similar to the one we do for source upload and
>> - was designed to prevent the creation of binary publications that
>> - will never reach the archive.
>> -
>> - See bug #227184 for further details.
>> - """
>> -
>> def publish(logger=None):
>> """Publish this queued source in the distroseries referred to by
>> the parent queue item.
>>
>> === modified file 'lib/lp/soyuz/model...

Read more...

=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py 2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py 2010-03-11 14:38:48 +0000
@@ -218,10 +218,10 @@
218 raise QueueInconsistentStateError(info)218 raise QueueInconsistentStateError(info)
219219
220 self._checkForBinariesinDestinationArchive(220 self._checkForBinariesinDestinationArchive(
221 [pub.build for pub in self.builds])221 [queue_build.build for queue_build in self.builds])
222 for build in self.builds:222 for queue_build in self.builds:
223 try:223 try:
224 build.checkComponentAndSection()224 queue_build.checkComponentAndSection()
225 except QueueBuildAcceptError, info:225 except QueueBuildAcceptError, info:
226 raise QueueInconsistentStateError(info)226 raise QueueInconsistentStateError(info)
227227
@@ -243,6 +243,7 @@
243 if len(builds) == 0:243 if len(builds) == 0:
244 return244 return
245245
246 # Collects the binary file names for all builds.
246 inner_query = """247 inner_query = """
247 SELECT DISTINCT lfa.filename248 SELECT DISTINCT lfa.filename
248 FROM249 FROM
@@ -254,6 +255,8 @@
254 AND bpf.libraryfile = lfa.id255 AND bpf.libraryfile = lfa.id
255 """ % sqlvalues([build.id for build in builds])256 """ % sqlvalues([build.id for build in builds])
256257
258 # Check whether any of the binary file names have already been
259 # published in the destination archive.
257 query = """260 query = """
258 SELECT DISTINCT lfa.filename261 SELECT DISTINCT lfa.filename
259 FROM262 FROM
@@ -271,7 +274,7 @@
271 # Inject the inner query.274 # Inject the inner query.
272 query %= inner_query275 query %= inner_query
273276
274 store = IMasterStore(PackageUpload)277 store = Store.of(self)
275 result_set = store.execute(query)278 result_set = store.execute(query)
276 known_filenames = [row[0] for row in result_set.get_all()]279 known_filenames = [row[0] for row in result_set.get_all()]
277280

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/soyuz/interfaces/queue.py'
--- lib/lp/soyuz/interfaces/queue.py 2009-07-19 04:41:14 +0000
+++ lib/lp/soyuz/interfaces/queue.py 2010-03-12 16:59:28 +0000
@@ -411,23 +411,6 @@
411 title=_("The related build"), required=True, readonly=False,411 title=_("The related build"), required=True, readonly=False,
412 )412 )
413413
414 def verifyBeforeAccept():
415 """Perform overall checks before accepting a binary upload.
416
417 Ensure each uploaded binary file can be published in the targeted
418 archive.
419
420 If any of the uploaded binary files are already published a
421 QueueInconsistentStateError is raised containing all filenames
422 that cannot be published.
423
424 This check is very similar to the one we do for source upload and
425 was designed to prevent the creation of binary publications that
426 will never reach the archive.
427
428 See bug #227184 for further details.
429 """
430
431 def publish(logger=None):414 def publish(logger=None):
432 """Publish this queued source in the distroseries referred to by415 """Publish this queued source in the distroseries referred to by
433 the parent queue item.416 the parent queue item.
434417
=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py 2010-01-10 04:58:44 +0000
+++ lib/lp/soyuz/model/queue.py 2010-03-12 16:59:28 +0000
@@ -217,17 +217,77 @@
217 except QueueSourceAcceptError, info:217 except QueueSourceAcceptError, info:
218 raise QueueInconsistentStateError(info)218 raise QueueInconsistentStateError(info)
219219
220 for build in self.builds:220 self._checkForBinariesinDestinationArchive(
221 # as before, but for QueueBuildAcceptError221 [queue_build.build for queue_build in self.builds])
222 build.verifyBeforeAccept()222 for queue_build in self.builds:
223 try:223 try:
224 build.checkComponentAndSection()224 queue_build.checkComponentAndSection()
225 except QueueBuildAcceptError, info:225 except QueueBuildAcceptError, info:
226 raise QueueInconsistentStateError(info)226 raise QueueInconsistentStateError(info)
227227
228 # if the previous checks applied and pass we do set the value228 # if the previous checks applied and pass we do set the value
229 self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)229 self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)
230230
231 def _checkForBinariesinDestinationArchive(self, builds):
232 """
233 Check for existing binaries (in destination archive) for all binary
234 uploads to be accepted.
235
236 Before accepting binary uploads we check whether any of the binaries
237 already exists in the destination archive and raise an exception
238 (QueueInconsistentStateError) if this is the case.
239
240 The only way to find pre-existing binaries is to match on binary
241 package file names.
242 """
243 if len(builds) == 0:
244 return
245
246 # Collects the binary file names for all builds.
247 inner_query = """
248 SELECT DISTINCT lfa.filename
249 FROM
250 binarypackagefile bpf, binarypackagerelease bpr,
251 libraryfilealias lfa
252 WHERE
253 bpr.build IN %s
254 AND bpf.binarypackagerelease = bpr.id
255 AND bpf.libraryfile = lfa.id
256 """ % sqlvalues([build.id for build in builds])
257
258 # Check whether any of the binary file names have already been
259 # published in the destination archive.
260 query = """
261 SELECT DISTINCT lfa.filename
262 FROM
263 binarypackagefile bpf, binarypackagepublishinghistory bpph,
264 distroarchseries das, distroseries ds, libraryfilealias lfa
265 WHERE
266 bpph.archive = %s
267 AND bpph.distroarchseries = das.id
268 AND bpph.dateremoved IS NULL
269 AND das.distroseries = ds.id
270 AND ds.distribution = %s
271 AND bpph.binarypackagerelease = bpf.binarypackagerelease
272 AND bpf.libraryfile = lfa.id
273 AND lfa.filename IN (%%s)
274 """ % sqlvalues(self.archive, self.distroseries.distribution)
275 # Inject the inner query.
276 query %= inner_query
277
278 store = Store.of(self)
279 result_set = store.execute(query)
280 known_filenames = [row[0] for row in result_set.get_all()]
281
282 # Do any of the files to be uploaded already exist in the destination
283 # archive?
284 if len(known_filenames) > 0:
285 filename_list = "\n\t%s".join(
286 [filename for filename in known_filenames])
287 raise QueueInconsistentStateError(
288 'The following files are already published in %s:\n%s' % (
289 self.archive.displayname, filename_list))
290
231 def setDone(self):291 def setDone(self):
232 """See `IPackageUpload`."""292 """See `IPackageUpload`."""
233 if self.status == PackageUploadStatus.DONE:293 if self.status == PackageUploadStatus.DONE:
@@ -1334,31 +1394,6 @@
1334 # guaranteed to exist in the DB. We don't care if sections are1394 # guaranteed to exist in the DB. We don't care if sections are
1335 # not official.1395 # not official.
13361396
1337 def verifyBeforeAccept(self):
1338 """See `IPackageUploadBuild`."""
1339 distribution = self.packageupload.distroseries.distribution
1340 known_filenames = []
1341 # Check if the uploaded binaries are already published in the archive.
1342 for binary_package in self.build.binarypackages:
1343 for binary_file in binary_package.files:
1344 try:
1345 published_binary = distribution.getFileByName(
1346 binary_file.libraryfile.filename, source=False,
1347 archive=self.packageupload.archive)
1348 except NotFoundError:
1349 # Only unknown files are ok.
1350 continue
1351
1352 known_filenames.append(binary_file.libraryfile.filename)
1353
1354 # If any of the uploaded files are already present we have a problem.
1355 if len(known_filenames) > 0:
1356 filename_list = "\n\t%s".join(
1357 [filename for filename in known_filenames])
1358 raise QueueInconsistentStateError(
1359 'The following files are already published in %s:\n%s' % (
1360 self.packageupload.archive.displayname, filename_list))
1361
1362 def publish(self, logger=None):1397 def publish(self, logger=None):
1363 """See `IPackageUploadBuild`."""1398 """See `IPackageUploadBuild`."""
1364 # Determine the build's architecturetag1399 # Determine the build's architecturetag