Merge lp:~cprov/launchpad/bug-430552-unblock-death-row into lp:launchpad/db-devel
- bug-430552-unblock-death-row
- Merge into db-devel
Status: | Merged |
---|---|
Merged at revision: | not available |
Proposed branch: | lp:~cprov/launchpad/bug-430552-unblock-death-row |
Merge into: | lp:launchpad/db-devel |
Diff against target: |
261 lines 4 files modified
lib/lp/archivepublisher/deathrow.py (+10/-5) lib/lp/archivepublisher/diskpool.py (+10/-4) lib/lp/archivepublisher/tests/test_deathrow.py (+155/-0) lib/lp/soyuz/interfaces/publishing.py (+10/-0) |
To merge this branch: | bzr merge lp:~cprov/launchpad/bug-430552-unblock-death-row |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Brad Crittenden (community) | release-critical | Approve | |
Michael Nelson (community) | code | Approve | |
Review via email: mp+12346@code.launchpad.net |
Commit message
Description of the change
Celso Providelo (cprov) wrote : | # |
Michael Nelson (michael.nelson) wrote : | # |
Great work Celso - you make complex things easy to understand :)
r=me
> = Summary =
>
> This branch fixes https:/
>
> The is a issue currently blocking `process-death-row` (p-d-r) in
> production and increasing its backlog of removal-candidates, since
> it's not able to remove files from pool/
>
> After an investigation we discovered that p-d-r is failing due
> because it trying to remove files that do not exist anymore.
>
> The missing files (symlinks, in fact) could be gone by the following
> reasons:
>
> * Someone butter-fingered the ubuntu archive disk trying to fix
> something else (only remotely possible, there are many broken
> files)
>
> * The last effective run of p-d-r was aborted during the DB update
> (quite possible, disk operations and updates are not atomic)
>
Thanks for the detailed explanation - I'd been wondering about this
issue while seeing you and Julian chatting about it.
It's not possible that this could have happened while converting the
privacy of ppas? I guess you would already know that - just a thought.
>
> == Proposed fix ==
>
> The fix for this problem is to deal with 'missing symlinks' as we deal
> with 'missing files'.
>
> It will be an *attempt* to remove what was already removed, somehow,
> so should carry on and update the DB records as if we have
> successfully removed the files from disk.
Great.
>
> == Tests ==
>
> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>
> == Demo and Q/A ==
>
> The best! Code is cowboyed in production and dealing with 200k records
> backlog (ETA 5h).
ouch. It can't even be tested on df? Can't we mess with df and delete
some symlinks that p-d-r expects?
>
> = Launchpad lint =
>
> Checking for conflicts. and issues in doctests and templates.
> Running jslint, xmllint, pyflakes, and pylint.
> Using normal rules.
>
> Linting changed files:
> lib/lp/
> lib/lp/
> lib/lp/
> lib/lp/
> === modified file 'lib/lp/
> --- lib/lp/
> +++ lib/lp/
> @@ -23,7 +23,8 @@
> from lp.soyuz.
> ISecureBinaryPa
> ISecureSourcePa
> -from canonical.
> +from lp.soyuz.
> + MissingSymlinkI
>
>
> def getDeathRow(
> @@ -292,12 +293,16 @@
> try:
> bytes += self._removeFile(
> component_name, source_name, file_name)
> - except NotInPool:
> + except NotInPool, info:
> # It's safe for us to let this slide because it means that
> # the file is already gone.
> - self.logger.debug(
> - "File for removing %s %s/%s is not in pool, skipping" %
> - (component_name, source_name, file_name))
> + self.logger....
Celso Providelo (cprov) wrote : | # |
On Thu, Sep 24, 2009 at 10:51 AM, Michael Nelson
<email address hidden> wrote:
> Review: Approve code
> Great work Celso - you make complex things easy to understand :)
> r=me
Aha! thanks Michael, you always make me few important.
>> = Summary =
>>
>> This branch fixes https:/
>>
>> The is a issue currently blocking `process-death-row` (p-d-r) in
>> production and increasing its backlog of removal-candidates, since
>> it's not able to remove files from pool/
>>
>> After an investigation we discovered that p-d-r is failing due
>> because it trying to remove files that do not exist anymore.
>>
>> The missing files (symlinks, in fact) could be gone by the following
>> reasons:
>>
>> * Someone butter-fingered the ubuntu archive disk trying to fix
>> something else (only remotely possible, there are many broken
>> files)
>>
>> * The last effective run of p-d-r was aborted during the DB update
>> (quite possible, disk operations and updates are not atomic)
>>
>
> Thanks for the detailed explanation - I'd been wondering about this
> issue while seeing you and Julian chatting about it.
>
> It's not possible that this could have happened while converting the
> privacy of ppas? I guess you would already know that - just a thought.
No I don't think so ... it's certainly some slip in the death-row
workflow, most likely to be the second scenario, when it removed the
symlinks from disk but failed to update the DB.
We have to change the code to make those actions atomic, that will
probably help us to also reduce the transaction time by doing isolated
batches of work.
Future ...
>>
>> == Proposed fix ==
>>
>> The fix for this problem is to deal with 'missing symlinks' as we deal
>> with 'missing files'.
>>
>> It will be an *attempt* to remove what was already removed, somehow,
>> so should carry on and update the DB records as if we have
>> successfully removed the files from disk.
>
> Great.
>
>>
>> == Tests ==
>>
>> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>>
>> == Demo and Q/A ==
>>
>> The best! Code is cowboyed in production and dealing with 200k records
>> backlog (ETA 5h).
>
> ouch. It can't even be tested on df? Can't we mess with df and delete
> some symlinks that p-d-r expects?
Yes, but it would be as artificial as the test-case, we can't reproduce the
huge backlog we have in prod neither 'God knows how many' missing symlinks
in pool. If it wasn't that urgent and complex I would have invested time to
tweak DF.
>>
>> = Launchpad lint =
>>
>> Checking for conflicts. and issues in doctests and templates.
>> Running jslint, xmllint, pyflakes, and pylint.
>> Using normal rules.
>>
>> Linting changed files:
>> lib/lp/
>> lib/lp/
>> lib/lp/
>> lib/lp/
>
>> === modified file 'lib/lp/
>> --- lib/lp/
>> +++ lib/lp/
>> @@ -23,7 +23,8 @@
>> from lp.soyuz.
>> ISecureBinaryPa
Brad Crittenden (bac) : | # |
Preview Diff
1 | === modified file 'lib/lp/archivepublisher/deathrow.py' |
2 | --- lib/lp/archivepublisher/deathrow.py 2009-06-24 23:28:16 +0000 |
3 | +++ lib/lp/archivepublisher/deathrow.py 2009-09-24 16:51:13 +0000 |
4 | @@ -23,7 +23,8 @@ |
5 | from lp.soyuz.interfaces.publishing import ( |
6 | ISecureBinaryPackagePublishingHistory, |
7 | ISecureSourcePackagePublishingHistory) |
8 | -from canonical.launchpad.interfaces import NotInPool |
9 | +from lp.soyuz.interfaces.publishing import ( |
10 | + MissingSymlinkInPool, NotInPool) |
11 | |
12 | |
13 | def getDeathRow(archive, log, pool_root_override): |
14 | @@ -292,12 +293,16 @@ |
15 | try: |
16 | bytes += self._removeFile( |
17 | component_name, source_name, file_name) |
18 | - except NotInPool: |
19 | + except NotInPool, info: |
20 | # It's safe for us to let this slide because it means that |
21 | # the file is already gone. |
22 | - self.logger.debug( |
23 | - "File for removing %s %s/%s is not in pool, skipping" % |
24 | - (component_name, source_name, file_name)) |
25 | + self.logger.debug(str(info)) |
26 | + except MissingSymlinkInPool, info: |
27 | + # This one is a little more worrying, because an expected |
28 | + # symlink has vanished from the pool/ (could be a code |
29 | + # mistake) but there is nothing we can do about it at this |
30 | + # point. |
31 | + self.logger.warn(str(info)) |
32 | |
33 | self.logger.info("Total bytes freed: %s" % bytes) |
34 | |
35 | |
36 | === modified file 'lib/lp/archivepublisher/diskpool.py' |
37 | --- lib/lp/archivepublisher/diskpool.py 2009-06-24 23:28:16 +0000 |
38 | +++ lib/lp/archivepublisher/diskpool.py 2009-09-24 16:51:13 +0000 |
39 | @@ -8,8 +8,8 @@ |
40 | |
41 | from lp.archivepublisher import HARDCODED_COMPONENT_ORDER |
42 | from canonical.cachedproperty import cachedproperty |
43 | -from canonical.launchpad.interfaces import ( |
44 | - NotInPool, PoolFileOverwriteError) |
45 | +from lp.soyuz.interfaces.publishing import ( |
46 | + MissingSymlinkInPool, NotInPool, PoolFileOverwriteError) |
47 | from canonical.librarian.utils import copy_and_close, sha1_from_path |
48 | |
49 | |
50 | @@ -229,7 +229,10 @@ |
51 | 3) Remove the main file and there are symlinks left. |
52 | """ |
53 | if not self.file_component: |
54 | - raise NotInPool() |
55 | + raise NotInPool( |
56 | + "File for removing %s %s/%s is not in pool, skipping." % |
57 | + (component, self.source, self.filename)) |
58 | + |
59 | |
60 | # Okay, it's there, if it's a symlink then we need to remove |
61 | # it simply. |
62 | @@ -242,7 +245,10 @@ |
63 | assert os.path.islink(link_path) |
64 | return self._reallyRemove(component) |
65 | |
66 | - assert component == self.file_component |
67 | + if component != self.file_component: |
68 | + raise MissingSymlinkInPool( |
69 | + "Symlink for %s/%s in %s is missing, skipping." % |
70 | + (self.source, self.filename, component)) |
71 | |
72 | # It's not a symlink, this means we need to check whether we |
73 | # have symlinks or not. |
74 | |
75 | === added file 'lib/lp/archivepublisher/tests/test_deathrow.py' |
76 | --- lib/lp/archivepublisher/tests/test_deathrow.py 1970-01-01 00:00:00 +0000 |
77 | +++ lib/lp/archivepublisher/tests/test_deathrow.py 2009-09-24 16:51:13 +0000 |
78 | @@ -0,0 +1,155 @@ |
79 | +# Copyright 2009 Canonical Ltd. This software is licensed under the |
80 | +# GNU Affero General Public License version 3 (see the file LICENSE). |
81 | + |
82 | +"""Tests for deathrow class.""" |
83 | + |
84 | +__metaclass__ = type |
85 | + |
86 | + |
87 | +import os |
88 | +import shutil |
89 | +import tempfile |
90 | +import unittest |
91 | + |
92 | +from zope.component import getUtility |
93 | + |
94 | +from canonical.testing import LaunchpadZopelessLayer |
95 | +from canonical.launchpad.scripts.logger import BufferLogger |
96 | + |
97 | +from lp.archivepublisher.deathrow import DeathRow |
98 | +from lp.archivepublisher.diskpool import DiskPool |
99 | +from lp.registry.interfaces.distribution import IDistributionSet |
100 | +from lp.soyuz.interfaces.component import IComponentSet |
101 | +from lp.soyuz.model.publishing import SourcePackagePublishingHistory |
102 | +from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
103 | +from lp.testing import TestCase |
104 | + |
105 | + |
106 | +class TestDeathRow(TestCase): |
107 | + |
108 | + layer = LaunchpadZopelessLayer |
109 | + |
110 | + def getTestPublisher(self, distroseries): |
111 | + """Return an `SoyuzTestPublisher`instance.""" |
112 | + stp = SoyuzTestPublisher() |
113 | + stp.addFakeChroots(distroseries) |
114 | + stp.setUpDefaultDistroSeries(distroseries) |
115 | + return stp |
116 | + |
117 | + def getDeathRow(self, archive): |
118 | + """Return an `DeathRow` for the given archive. |
119 | + |
120 | + Created the temporary 'pool' and 'temp' directories and register |
121 | + a 'cleanup' to purge them after the test runs. |
122 | + """ |
123 | + pool_path = tempfile.mkdtemp('-pool') |
124 | + temp_path = tempfile.mkdtemp('-pool-tmp') |
125 | + def clean_pool(pool_path, temp_path): |
126 | + shutil.rmtree(pool_path) |
127 | + shutil.rmtree(temp_path) |
128 | + self.addCleanup(clean_pool, pool_path, temp_path) |
129 | + |
130 | + logger = BufferLogger() |
131 | + diskpool = DiskPool(pool_path, temp_path, logger) |
132 | + return DeathRow(archive, diskpool, logger) |
133 | + |
134 | + def getDiskPoolPath(self, pub_file, diskpool): |
135 | + """Return the absolute path to a published file in the disk pool/.""" |
136 | + return diskpool.pathFor( |
137 | + pub_file.componentname.encode('utf-8'), |
138 | + pub_file.sourcepackagename.encode('utf8'), |
139 | + pub_file.libraryfilealiasfilename.encode('utf-8') |
140 | + ) |
141 | + |
142 | + def assertIsFile(self, path): |
143 | + """Assert the path exists and is a regular file.""" |
144 | + self.assertTrue( |
145 | + os.path.exists(path), |
146 | + "File %s does not exist" % os.path.basename(path)) |
147 | + self.assertFalse( |
148 | + os.path.islink(path), |
149 | + "File %s is a symbolic link" % os.path.basename(path)) |
150 | + |
151 | + def assertIsLink(self, path): |
152 | + """Assert the path exists and is a symbolic link.""" |
153 | + self.assertTrue( |
154 | + os.path.exists(path), |
155 | + "File %s does not exist" % os.path.basename(path)) |
156 | + self.assertTrue( |
157 | + os.path.islink(path), |
158 | + "File %s is a not symbolic link" % os.path.basename(path)) |
159 | + |
160 | + def assertDoesNotExist(self, path): |
161 | + """Assert the path does not exit.""" |
162 | + self.assertFalse( |
163 | + os.path.exists(path), |
164 | + "File %s exists" % os.path.basename(path)) |
165 | + |
166 | + def test_MissingSymLinkInPool(self): |
167 | + # When a publication is promoted from 'universe' to 'main' and |
168 | + # the symbolic links expected in 'universe' are not present, |
169 | + # a `MissingSymlinkInPool` error is generated and immediately |
170 | + # ignored by the `DeathRow` processor. Even in this adverse |
171 | + # circumstances the database record (removal candidate) is |
172 | + # updated to match the disk status. |
173 | + |
174 | + # Setup an `SoyuzTestPublisher` and a `DeathRow` instance. |
175 | + ubuntu = getUtility(IDistributionSet).getByName('ubuntu') |
176 | + hoary = ubuntu.getSeries('hoary') |
177 | + stp = self.getTestPublisher(hoary) |
178 | + deathrow = self.getDeathRow(hoary.main_archive) |
179 | + |
180 | + # Create a source publication with a since file (DSC) in |
181 | + # 'universe' and promote it to 'main'. |
182 | + source_universe = stp.getPubSource(component='universe') |
183 | + secure_record = source_universe.changeOverride( |
184 | + new_component=getUtility(IComponentSet)['main']) |
185 | + source_main = SourcePackagePublishingHistory.get( |
186 | + secure_record.id) |
187 | + test_publications = (source_universe, source_main) |
188 | + |
189 | + # Commit for exposing the just-created librarian files. |
190 | + self.layer.commit() |
191 | + |
192 | + # Publish the testing publication on disk, the file for the |
193 | + # 'universe' component will be a symbolic link to the one |
194 | + # in 'main'. |
195 | + for pub in test_publications: |
196 | + pub.publish(deathrow.diskpool, deathrow.logger) |
197 | + [main_dsc_path] = [ |
198 | + self.getDiskPoolPath(pub_file, deathrow.diskpool) |
199 | + for pub_file in source_main.files] |
200 | + [universe_dsc_path] = [ |
201 | + self.getDiskPoolPath(pub_file, deathrow.diskpool) |
202 | + for pub_file in source_universe.files] |
203 | + self.assertIsFile(main_dsc_path) |
204 | + self.assertIsLink(universe_dsc_path) |
205 | + |
206 | + # Remove the symbolic link to emulate MissingSymlinkInPool scenario. |
207 | + os.remove(universe_dsc_path) |
208 | + |
209 | + # Remove the testing publications. |
210 | + for pub in test_publications: |
211 | + pub.requestObsolescence() |
212 | + |
213 | + # Commit for exposing the just-created removal candidates. |
214 | + self.layer.commit() |
215 | + |
216 | + # Due to the MissingSymlinkInPool scenario, it takes 2 iteration to |
217 | + # remove both references to the shared file in pool/. |
218 | + deathrow.reap() |
219 | + deathrow.reap() |
220 | + |
221 | + for pub in test_publications: |
222 | + self.assertTrue( |
223 | + pub.dateremoved is not None, |
224 | + '%s (%s) is not marked as removed.' |
225 | + % (pub.displayname, pub.component.name)) |
226 | + |
227 | + self.assertDoesNotExist(main_dsc_path) |
228 | + self.assertDoesNotExist(universe_dsc_path) |
229 | + |
230 | + |
231 | +def test_suite(): |
232 | + return unittest.TestLoader().loadTestsFromName(__name__) |
233 | + |
234 | |
235 | === modified file 'lib/lp/soyuz/interfaces/publishing.py' |
236 | --- lib/lp/soyuz/interfaces/publishing.py 2009-09-15 09:24:15 +0000 |
237 | +++ lib/lp/soyuz/interfaces/publishing.py 2009-09-24 16:51:13 +0000 |
238 | @@ -18,6 +18,7 @@ |
239 | 'ISecureSourcePackagePublishingHistory', |
240 | 'ISourcePackageFilePublishing', |
241 | 'ISourcePackagePublishingHistory', |
242 | + 'MissingSymlinkInPool', |
243 | 'NotInPool', |
244 | 'PackagePublishingPriority', |
245 | 'PackagePublishingStatus', |
246 | @@ -60,6 +61,15 @@ |
247 | requires manual intervention in the archive. |
248 | """ |
249 | |
250 | +class MissingSymlinkInPool(Exception): |
251 | + """Raised when there is a missing symlink in pool. |
252 | + |
253 | + This condition is ignored, similarly to what we do for `NotInPool`, |
254 | + since the pool entry requested to be removed is not there anymore. |
255 | + |
256 | + The corresponding record is marked as removed and the process |
257 | + continues. |
258 | + """ |
259 | |
260 | class PackagePublishingStatus(DBEnumeratedType): |
261 | """Package Publishing Status |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
= Summary =
This branch fixes https:/ /bugs.edge. launchpad. net/bugs/ 430552.
The is a issue currently blocking `process-death-row` (p-d-r) in
production and increasing its backlog of removal-candidates, since
it's not able to remove files from pool/
After an investigation we discovered that p-d-r is failing due
because it trying to remove files that do not exist anymore.
The missing files (symlinks, in fact) could be gone by the following
reasons:
* Someone butter-fingered the ubuntu archive disk trying to fix
something else (only remotely possible, there are many broken
files)
* The last effective run of p-d-r was aborted during the DB update
(quite possible, disk operations and updates are not atomic)
== Proposed fix ==
The fix for this problem is to deal with 'missing symlinks' as we deal
with 'missing files'.
It will be an *attempt* to remove what was already removed, somehow,
so should carry on and update the DB records as if we have
successfully removed the files from disk.
== Tests ==
./bin/test -vv -t TestDeathRow -t deathrow.txt
== Demo and Q/A ==
The best! Code is cowboyed in production and dealing with 200k records
backlog (ETA 5h).
= Launchpad lint =
Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.
Linting changed files: archivepublishe r/deathrow. py soyuz/interface s/publishing. py archivepublishe r/diskpool. py archivepublishe r/tests/ test_deathrow. py
lib/lp/
lib/lp/
lib/lp/
lib/lp/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEARECAAYFAkq 7ZvgACgkQ7KBXuX yZSjDw9ACgrP9P0 ceJ38CB75+ hX8GkmP0/ xrUWwvNwVZSgMlQ 9U
kogAnjmB5M46Lfy
=xq9C
-----END PGP SIGNATURE-----