Merge lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad
- unembargo-oops-526645
- Merge into devel
Status: | Merged
---|---
Approved by: | Muharem Hrnjadovic
Approved revision: | no longer in the source branch
Merged at revision: | not available
Proposed branch: | lp:~al-maisan/launchpad/unembargo-oops-526645
Merge into: | lp:launchpad
Diff against target: | 145 lines (+64/-46), 2 files modified: lib/lp/soyuz/interfaces/queue.py (+0/-17), lib/lp/soyuz/model/queue.py (+64/-29)
To merge this branch: | bzr merge lp:~al-maisan/launchpad/unembargo-oops-526645
Related bugs: | #526645 cannot unembargo due to LP API timeouts

Reviewer | Review Type | Date Requested | Status
---|---|---|---
Michael Nelson (community) | code | | Approve

Review via email: mp+21126@code.launchpad.net
Commit message
Description of the change
Hello there!
The 4 OOPSs attached to the related bug indicate the optimization with the
highest ROI (see the top 3 queries in the "Repeated SQL Statements" section):
we query the database for *each* binary in order to find out whether it's
already published in the destination archive. That's inefficient and accounts
for 50% of the SQL time.
The branch at hand performs the "is the binary published in the destination
archive already?" check in one go.
Tests to run:
bin/test -vv test_uploadproc
No "make lint" issues.
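The optimization, in miniature: replace one existence query per binary with a single set-based query. Here is a self-contained sketch of the before/after query shapes, using sqlite3 and made-up table names (not Launchpad's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE published (filename TEXT);
    INSERT INTO published VALUES ('foo_1.0_i386.deb'), ('bar_2.0_i386.deb');
""")

uploaded = ["foo_1.0_i386.deb", "baz_3.0_i386.deb"]

# Before: one round trip per uploaded file.
known_slow = [f for f in uploaded
              if conn.execute(
                  "SELECT 1 FROM published WHERE filename = ?", (f,)
              ).fetchone()]

# After: a single query with an IN clause covering all files at once.
placeholders = ", ".join("?" for _ in uploaded)
known_fast = [row[0] for row in conn.execute(
    "SELECT DISTINCT filename FROM published WHERE filename IN (%s)"
    % placeholders, uploaded)]

# Same answer, one round trip instead of N.
assert sorted(known_slow) == sorted(known_fast)
```

The per-file variant is what the OOPS reports flagged as the repeated statement; the batched variant is the shape this branch moves to.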
Michael Nelson (michael.nelson) wrote:
```diff
=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py	2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py	2010-03-11 13:54:03 +0000
@@ -243,37 +243,38 @@
         if len(builds) == 0:
             return
 
-        inner_query = """
-            SELECT DISTINCT lfa.filename
-            FROM
-                binarypackagefile bpf, binarypackagerelease bpr,
-                libraryfilealias lfa
-            WHERE
-                bpr.build IN %s
-                AND bpf.binarypackagerelease = bpr.id
-                AND bpf.libraryfile = lfa.id
-            """ % sqlvalues([build.id for build in builds])
-
-        query = """
-            SELECT DISTINCT lfa.filename
-            FROM
-                binarypackagefile bpf, binarypackagepublishinghistory bpph,
-                distroarchseries das, distroseries ds, libraryfilealias lfa
-            WHERE
-                bpph.archive = %s
-                AND bpph.distroarchseries = das.id
-                AND das.distroseries = ds.id
-                AND ds.distribution = %s
-                AND bpph.binarypackagerelease = bpf.binarypackagerelease
-                AND bpf.libraryfile = lfa.id
-                AND lfa.filename IN (%%s)
-            """ % sqlvalues(self.archive, self.distroseries.distribution)
-        # Inject the inner query.
-        query %= inner_query
-
-        store = IMasterStore(PackageUpload)
-        result_set = store.execute(query)
-        known_filenames = [row[0] for row in result_set.get_all()]
+        from canonical.launchpad.database.librarian import LibraryFileAlias
+        from lp.soyuz.model.files import BinaryPackageFile
+        from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
+        from lp.soyuz.model.build import Build
+        from storm.store import Store
+        from storm.expr import And, Select
+        build_ids = [build.id for build in builds]
+        lfa_filenames = Select(
+            LibraryFileAlias.filename,
+            And(
+                LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
+                And(
+                    BinaryPackageFile.binarypackagerelease == BinaryPackageRelease.id,
+                    BinaryPackageRelease.buildID.is_in(build_ids))),
+            distinct=True)
+
+        from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
+        from lp.registry.model.distribution import Distribution
+        from lp.registry.model.distroseries import DistroSeries
+        from lp.soyuz.model.distroarchseries import DistroArchSeries
+        published_lfas = Store.of(self).find(
+            LibraryFileAlias,
+            LibraryFileAlias.filename.is_in(lfa_filenames),
+            BinaryPackageFile.libraryfile == LibraryFileAlias.id,
+            BinaryPackagePublishingHistory.binarypackagereleaseID ==
+                BinaryPackageFile.binarypackagereleaseID,
+            DistroSeries.distribution == self.distroseries.distribution,
+            DistroArchSeries.distroseries == DistroSeries.id,
+            BinaryPackagePublishingHistory.distroarchseries == DistroArchSeries.id,
+            BinaryPackagePublishingHistory.archive == self.archive)
+        published_lfas.config(distinct=True)
+        known_filenames = list(published_lfas.values(LibraryFileAlias.filename))
 
         # Do any of the files to be uploaded already exist in the destination
         # archive?
```
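For reference, the Storm variant above hands the filename subselect (`lfa_filenames`) straight to `is_in`, so the whole check still executes as a single SQL statement. The shape of the SQL it generates is roughly the following, sketched here against a toy sqlite3 schema (illustrative table names, not Launchpad's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE upload_files (build_id INTEGER, filename TEXT);
    CREATE TABLE published_files (filename TEXT);
    INSERT INTO upload_files VALUES (1, 'foo.deb'), (1, 'bar.deb'), (2, 'baz.deb');
    INSERT INTO published_files VALUES ('bar.deb'), ('quux.deb');
""")

# The outer query filters on an inner DISTINCT-filename subselect, which
# mirrors LibraryFileAlias.filename.is_in(lfa_filenames) in the Storm code:
# one round trip, subquery evaluated server-side.
known = [row[0] for row in conn.execute("""
    SELECT DISTINCT filename FROM published_files
    WHERE filename IN (
        SELECT DISTINCT filename FROM upload_files WHERE build_id IN (1, 2))
""")]
```

Either way (raw SQL with an injected inner query, or Storm's `Select` expression) the database sees one nested query rather than one query per binary.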
Muharem Hrnjadovic (al-maisan) wrote:
On 03/11/2010 03:06 PM, Michael Nelson wrote:
> Review: Approve code
>
> Wow! I'm looking forward to seeing how this improves the response time
> for uploads with lots of binaries :)
Hello Michael,
thanks very much for the review! I am quite confident that this change
will have a positive impact :)
> I've only got a few suggestions (extra comments, use of MasterStore,
> possible renaming etc.). I've also attached a diff with the equivalent
> (unformatted) storm version that works too (as I know you don't like
> to work with storm ;) ). Of course, it's up to you whether you use it
> or go with the SQL.
Hrm .. the SQL version has been running on dogfood for a few days now
and I am quite confident that it actually works..
Please see also my in-line responses below as well as the attached
incremental diff.
> Thanks!
> -Michael
>
> On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
> <email address hidden> wrote:
>> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
>> [...]
```diff
=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py	2010-03-11 09:39:39 +0000
+++ lib/lp/soyuz/model/queue.py	2010-03-11 14:38:48 +0000
@@ -218,10 +218,10 @@
             raise QueueInconsistentStateError(info)
 
         self._checkForBinariesinDestinationArchive(
-            [pub.build for pub in self.builds])
-        for build in self.builds:
+            [queue_build.build for queue_build in self.builds])
+        for queue_build in self.builds:
             try:
-                build.checkComponentAndSection()
+                queue_build.checkComponentAndSection()
             except QueueBuildAcceptError, info:
                 raise QueueInconsistentStateError(info)
 
@@ -243,6 +243,7 @@
         if len(builds) == 0:
             return
 
+        # Collects the binary file names for all builds.
         inner_query = """
             SELECT DISTINCT lfa.filename
             FROM
@@ -254,6 +255,8 @@
                 AND bpf.libraryfile = lfa.id
             """ % sqlvalues([build.id for build in builds])
 
+        # Check whether any of the binary file names have already been
+        # published in the destination archive.
         query = """
             SELECT DISTINCT lfa.filename
             FROM
@@ -271,7 +274,7 @@
         # Inject the inner query.
         query %= inner_query
 
-        store = IMasterStore(PackageUpload)
+        store = Store.of(self)
         result_set = store.execute(query)
         known_filenames = [row[0] for row in result_set.get_all()]
```
Preview Diff
```diff
=== modified file 'lib/lp/soyuz/interfaces/queue.py'
--- lib/lp/soyuz/interfaces/queue.py	2009-07-19 04:41:14 +0000
+++ lib/lp/soyuz/interfaces/queue.py	2010-03-12 16:59:28 +0000
@@ -411,23 +411,6 @@
         title=_("The related build"), required=True, readonly=False,
         )
 
-    def verifyBeforeAccept():
-        """Perform overall checks before accepting a binary upload.
-
-        Ensure each uploaded binary file can be published in the targeted
-        archive.
-
-        If any of the uploaded binary files are already published a
-        QueueInconsistentStateError is raised containing all filenames
-        that cannot be published.
-
-        This check is very similar to the one we do for source upload and
-        was designed to prevent the creation of binary publications that
-        will never reach the archive.
-
-        See bug #227184 for further details.
-        """
-
     def publish(logger=None):
         """Publish this queued source in the distroseries referred to by
         the parent queue item.
 
=== modified file 'lib/lp/soyuz/model/queue.py'
--- lib/lp/soyuz/model/queue.py	2010-01-10 04:58:44 +0000
+++ lib/lp/soyuz/model/queue.py	2010-03-12 16:59:28 +0000
@@ -217,17 +217,77 @@
         except QueueSourceAcceptError, info:
             raise QueueInconsistentStateError(info)
 
-        for build in self.builds:
-            # as before, but for QueueBuildAcceptError
-            build.verifyBeforeAccept()
+        self._checkForBinariesinDestinationArchive(
+            [queue_build.build for queue_build in self.builds])
+        for queue_build in self.builds:
             try:
-                build.checkComponentAndSection()
+                queue_build.checkComponentAndSection()
             except QueueBuildAcceptError, info:
                 raise QueueInconsistentStateError(info)
 
         # if the previous checks applied and pass we do set the value
         self.status = PassthroughStatusValue(PackageUploadStatus.ACCEPTED)
 
+    def _checkForBinariesinDestinationArchive(self, builds):
+        """
+        Check for existing binaries (in destination archive) for all binary
+        uploads to be accepted.
+
+        Before accepting binary uploads we check whether any of the binaries
+        already exists in the destination archive and raise an exception
+        (QueueInconsistentStateError) if this is the case.
+
+        The only way to find pre-existing binaries is to match on binary
+        package file names.
+        """
+        if len(builds) == 0:
+            return
+
+        # Collects the binary file names for all builds.
+        inner_query = """
+            SELECT DISTINCT lfa.filename
+            FROM
+                binarypackagefile bpf, binarypackagerelease bpr,
+                libraryfilealias lfa
+            WHERE
+                bpr.build IN %s
+                AND bpf.binarypackagerelease = bpr.id
+                AND bpf.libraryfile = lfa.id
+            """ % sqlvalues([build.id for build in builds])
+
+        # Check whether any of the binary file names have already been
+        # published in the destination archive.
+        query = """
+            SELECT DISTINCT lfa.filename
+            FROM
+                binarypackagefile bpf, binarypackagepublishinghistory bpph,
+                distroarchseries das, distroseries ds, libraryfilealias lfa
+            WHERE
+                bpph.archive = %s
+                AND bpph.distroarchseries = das.id
+                AND bpph.dateremoved IS NULL
+                AND das.distroseries = ds.id
+                AND ds.distribution = %s
+                AND bpph.binarypackagerelease = bpf.binarypackagerelease
+                AND bpf.libraryfile = lfa.id
+                AND lfa.filename IN (%%s)
+            """ % sqlvalues(self.archive, self.distroseries.distribution)
+        # Inject the inner query.
+        query %= inner_query
+
+        store = Store.of(self)
+        result_set = store.execute(query)
+        known_filenames = [row[0] for row in result_set.get_all()]
+
+        # Do any of the files to be uploaded already exist in the destination
+        # archive?
+        if len(known_filenames) > 0:
+            filename_list = "\n\t%s".join(
+                [filename for filename in known_filenames])
+            raise QueueInconsistentStateError(
+                'The following files are already published in %s:\n%s' % (
+                    self.archive.displayname, filename_list))
+
     def setDone(self):
         """See `IPackageUpload`."""
         if self.status == PackageUploadStatus.DONE:
@@ -1334,31 +1394,6 @@
         # guaranteed to exist in the DB. We don't care if sections are
         # not official.
 
-    def verifyBeforeAccept(self):
-        """See `IPackageUploadBuild`."""
-        distribution = self.packageupload.distroseries.distribution
-        known_filenames = []
-        # Check if the uploaded binaries are already published in the archive.
-        for binary_package in self.build.binarypackages:
-            for binary_file in binary_package.files:
-                try:
-                    published_binary = distribution.getFileByName(
-                        binary_file.libraryfile.filename, source=False,
-                        archive=self.packageupload.archive)
-                except NotFoundError:
-                    # Only unknown files are ok.
-                    continue
-
-                known_filenames.append(binary_file.libraryfile.filename)
-
-        # If any of the uploaded files are already present we have a problem.
-        if len(known_filenames) > 0:
-            filename_list = "\n\t%s".join(
-                [filename for filename in known_filenames])
-            raise QueueInconsistentStateError(
-                'The following files are already published in %s:\n%s' % (
-                    self.packageupload.archive.displayname, filename_list))
-
     def publish(self, logger=None):
         """See `IPackageUploadBuild`."""
         # Determine the build's architecturetag
```
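One detail of the SQL version worth noting: the outer query string is interpolated twice, so the slot for the inner query is written as `%%s` to survive the first `sqlvalues()` pass. A minimal sketch of that two-stage pattern, with hypothetical table names:

```python
# Two-stage %-interpolation, as in `query %= inner_query` above.
inner_query = "SELECT name FROM files WHERE build IN (1, 2)"

# First pass fills the real parameters; '%%s' escapes to '%s' and
# survives as the slot for the inner query.
query = "SELECT name FROM published WHERE archive = %s AND name IN (%%s)" % 7

# Second pass injects the inner query into the surviving slot.
query %= inner_query

assert "%s" not in query                      # no unfilled slots remain
assert "IN (SELECT name FROM files" in query  # subquery was injected
```

This is why the filename condition reads `lfa.filename IN (%%s)` in the diff: a single `%s` there would be consumed by the `sqlvalues()` interpolation instead of by the later injection.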
Review: Approve code
Wow! I'm looking forward to seeing how this improves the response time
for uploads with lots of binaries :)
I've only got a few suggestions (extra comments, use of MasterStore,
possible renaming etc.). I've also attached a diff with the equivalent
(unformatted) storm version that works too (as I know you don't like
to work with storm ;) ). Of course, it's up to you whether you use it
or go with the SQL.
Thanks!
-Michael
On Thu, Mar 11, 2010 at 10:44 AM, Muharem Hrnjadovic
<email address hidden> wrote:
> Muharem Hrnjadovic has proposed merging lp:~al-maisan/launchpad/unembargo-oops-526645 into lp:launchpad/devel.
> [...]
> @@ -217,9 +217,9 @@
>          except QueueSourceAcceptError, info:
>              raise QueueInconsistentStateError(info)
>
> +        self._checkForBinariesinDestinationArchive(
> +            [pub.build for pub in self.builds])
It looks like we should rename self.builds here?
> for build in self.builds:
> - ...