Merge lp:~cprov/launchpad/bug-430552-unblock-death-row into lp:launchpad/db-devel

Proposed by Celso Providelo
Status: Merged
Merged at revision: not available
Proposed branch: lp:~cprov/launchpad/bug-430552-unblock-death-row
Merge into: lp:launchpad/db-devel
Diff against target: 261 lines
4 files modified
lib/lp/archivepublisher/deathrow.py (+10/-5)
lib/lp/archivepublisher/diskpool.py (+10/-4)
lib/lp/archivepublisher/tests/test_deathrow.py (+155/-0)
lib/lp/soyuz/interfaces/publishing.py (+10/-0)
To merge this branch: bzr merge lp:~cprov/launchpad/bug-430552-unblock-death-row
Reviewer Review Type Date Requested Status
Brad Crittenden (community) release-critical Approve
Michael Nelson (community) code Approve
Review via email: mp+12346@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Celso Providelo (cprov) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

= Summary =

This branch fixes https://bugs.edge.launchpad.net/bugs/430552.

The is a issue currently blocking `process-death-row` (p-d-r) in
production and increasing its backlog of removal-candidates, since
it's not able to remove files from pool/

After an investigation we discovered that p-d-r is failing due
because it trying to remove files that do not exist anymore.

The missing files (symlinks, in fact) could be gone by the following
reasons:

 * Someone butter-fingered the ubuntu archive disk trying to fix
   something else (only remotely possible, there are many broken
   files)

 * The last effective run of p-d-r was aborted during the DB update
   (quite possible, disk operations and updates are not atomic)

== Proposed fix ==

The fix for this problem is to deal with 'missing symlinks' as we deal
with 'missing files'.

It will be an *attempt* to remove what was already removed, somehow,
so should carry on and update the DB records as if we have
successfully removed the files from disk.

== Tests ==

./bin/test -vv -t TestDeathRow -t deathrow.txt

== Demo and Q/A ==

The best! Code is cowboyed in production and dealing with 200k records
backlog (ETA 5h).

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/archivepublisher/deathrow.py
  lib/lp/soyuz/interfaces/publishing.py
  lib/lp/archivepublisher/diskpool.py
  lib/lp/archivepublisher/tests/test_deathrow.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkq7ZvgACgkQ7KBXuXyZSjDw9ACgrP9P0ceJ38CB75+hX8GkmP0/
kogAnjmB5M46LfyxrUWwvNwVZSgMlQ9U
=xq9C
-----END PGP SIGNATURE-----

Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (12.7 KiB)

Great work Celso - you make complex things easy to understand :)
r=me

> = Summary =
>
> This branch fixes https://bugs.edge.launchpad.net/bugs/430552.
>
> The is a issue currently blocking `process-death-row` (p-d-r) in
> production and increasing its backlog of removal-candidates, since
> it's not able to remove files from pool/
>
> After an investigation we discovered that p-d-r is failing due
> because it trying to remove files that do not exist anymore.
>
> The missing files (symlinks, in fact) could be gone by the following
> reasons:
>
> * Someone butter-fingered the ubuntu archive disk trying to fix
> something else (only remotely possible, there are many broken
> files)
>
> * The last effective run of p-d-r was aborted during the DB update
> (quite possible, disk operations and updates are not atomic)
>

Thanks for the detailed explanation - I'd been wondering about this
issue while seeing you and Julian chatting about it.

It's not possible that this could have happened while converting the
privacy of ppas? I guess you would already know that - just a thought.

>
> == Proposed fix ==
>
> The fix for this problem is to deal with 'missing symlinks' as we deal
> with 'missing files'.
>
> It will be an *attempt* to remove what was already removed, somehow,
> so should carry on and update the DB records as if we have
> successfully removed the files from disk.

Great.

>
> == Tests ==
>
> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>
> == Demo and Q/A ==
>
> The best! Code is cowboyed in production and dealing with 200k records
> backlog (ETA 5h).

ouch. It can't even be tested on df? Can't we mess with df and delete
some symlinks that p-d-r expects?

>
> = Launchpad lint =
>
> Checking for conflicts. and issues in doctests and templates.
> Running jslint, xmllint, pyflakes, and pylint.
> Using normal rules.
>
> Linting changed files:
> lib/lp/archivepublisher/deathrow.py
> lib/lp/soyuz/interfaces/publishing.py
> lib/lp/archivepublisher/diskpool.py
> lib/lp/archivepublisher/tests/test_deathrow.py

> === modified file 'lib/lp/archivepublisher/deathrow.py'
> --- lib/lp/archivepublisher/deathrow.py 2009-06-24 23:28:16 +0000
> +++ lib/lp/archivepublisher/deathrow.py 2009-09-24 12:54:03 +0000
> @@ -23,7 +23,8 @@
> from lp.soyuz.interfaces.publishing import (
> ISecureBinaryPackagePublishingHistory,
> ISecureSourcePackagePublishingHistory)
> -from canonical.launchpad.interfaces import NotInPool
> +from lp.soyuz.interfaces.publishing import (
> + MissingSymlinkInPool, NotInPool)
>
>
> def getDeathRow(archive, log, pool_root_override):
> @@ -292,12 +293,16 @@
> try:
> bytes += self._removeFile(
> component_name, source_name, file_name)
> - except NotInPool:
> + except NotInPool, info:
> # It's safe for us to let this slide because it means that
> # the file is already gone.
> - self.logger.debug(
> - "File for removing %s %s/%s is not in pool, skipping" %
> - (component_name, source_name, file_name))
> + self.logger....

review: Approve (code)
Revision history for this message
Celso Providelo (cprov) wrote :
Download full text (14.1 KiB)

On Thu, Sep 24, 2009 at 10:51 AM, Michael Nelson
<email address hidden> wrote:
> Review: Approve code
> Great work Celso - you make complex things easy to understand :)
> r=me

Aha! thanks Michael, you always make me few important.

>> = Summary =
>>
>> This branch fixes https://bugs.edge.launchpad.net/bugs/430552.
>>
>> The is a issue currently blocking `process-death-row` (p-d-r) in
>> production and increasing its backlog of removal-candidates, since
>> it's not able to remove files from pool/
>>
>> After an investigation we discovered that p-d-r is failing due
>> because it trying to remove files that do not exist anymore.
>>
>> The missing files (symlinks, in fact) could be gone by the following
>> reasons:
>>
>>  * Someone butter-fingered the ubuntu archive disk trying to fix
>>    something else (only remotely possible, there are many broken
>>    files)
>>
>>  * The last effective run of p-d-r was aborted during the DB update
>>    (quite possible, disk operations and updates are not atomic)
>>
>
> Thanks for the detailed explanation - I'd been wondering about this
> issue while seeing you and Julian chatting about it.
>
> It's not possible that this could have happened while converting the
> privacy of ppas? I guess you would already know that - just a thought.

No I don't think so ... it's certainly some slip in the death-row
workflow, most likely to be the second scenario, when it removed the
symlinks from disk but failed to update the DB.

We have to change the code to make those actions atomic, that will
probably help us to also reduce the transaction time by doing isolated
batches of work.

Future ...

>>
>> == Proposed fix ==
>>
>> The fix for this problem is to deal with 'missing symlinks' as we deal
>> with 'missing files'.
>>
>> It will be an *attempt* to remove what was already removed, somehow,
>> so should carry on and update the DB records as if we have
>> successfully removed the files from disk.
>
> Great.
>
>>
>> == Tests ==
>>
>> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>>
>> == Demo and Q/A ==
>>
>> The best! Code is cowboyed in production and dealing with 200k records
>> backlog (ETA 5h).
>
> ouch. It can't even be tested on df? Can't we mess with df and delete
> some symlinks that p-d-r expects?

Yes, but it would be as artificial as the test-case, we can't reproduce the
huge backlog we have in prod neither 'God knows how many' missing symlinks
in pool. If it wasn't that urgent and complex I would have invested time to
tweak DF.

>>
>> = Launchpad lint =
>>
>> Checking for conflicts. and issues in doctests and templates.
>> Running jslint, xmllint, pyflakes, and pylint.
>> Using normal rules.
>>
>> Linting changed files:
>>   lib/lp/archivepublisher/deathrow.py
>>   lib/lp/soyuz/interfaces/publishing.py
>>   lib/lp/archivepublisher/diskpool.py
>>   lib/lp/archivepublisher/tests/test_deathrow.py
>
>> === modified file 'lib/lp/archivepublisher/deathrow.py'
>> --- lib/lp/archivepublisher/deathrow.py       2009-06-24 23:28:16 +0000
>> +++ lib/lp/archivepublisher/deathrow.py       2009-09-24 12:54:03 +0000
>> @@ -23,7 +23,8 @@
>>  from lp.soyuz.interfaces.publishing import (
>>      ISecureBinaryPackagePu...

Revision history for this message
Brad Crittenden (bac) :
review: Approve (release-critical)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/deathrow.py'
2--- lib/lp/archivepublisher/deathrow.py 2009-06-24 23:28:16 +0000
3+++ lib/lp/archivepublisher/deathrow.py 2009-09-24 16:51:13 +0000
4@@ -23,7 +23,8 @@
5 from lp.soyuz.interfaces.publishing import (
6 ISecureBinaryPackagePublishingHistory,
7 ISecureSourcePackagePublishingHistory)
8-from canonical.launchpad.interfaces import NotInPool
9+from lp.soyuz.interfaces.publishing import (
10+ MissingSymlinkInPool, NotInPool)
11
12
13 def getDeathRow(archive, log, pool_root_override):
14@@ -292,12 +293,16 @@
15 try:
16 bytes += self._removeFile(
17 component_name, source_name, file_name)
18- except NotInPool:
19+ except NotInPool, info:
20 # It's safe for us to let this slide because it means that
21 # the file is already gone.
22- self.logger.debug(
23- "File for removing %s %s/%s is not in pool, skipping" %
24- (component_name, source_name, file_name))
25+ self.logger.debug(str(info))
26+ except MissingSymlinkInPool, info:
27+ # This one is a little more worrying, because an expected
28+ # symlink has vanished from the pool/ (could be a code
29+ # mistake) but there is nothing we can do about it at this
30+ # point.
31+ self.logger.warn(str(info))
32
33 self.logger.info("Total bytes freed: %s" % bytes)
34
35
36=== modified file 'lib/lp/archivepublisher/diskpool.py'
37--- lib/lp/archivepublisher/diskpool.py 2009-06-24 23:28:16 +0000
38+++ lib/lp/archivepublisher/diskpool.py 2009-09-24 16:51:13 +0000
39@@ -8,8 +8,8 @@
40
41 from lp.archivepublisher import HARDCODED_COMPONENT_ORDER
42 from canonical.cachedproperty import cachedproperty
43-from canonical.launchpad.interfaces import (
44- NotInPool, PoolFileOverwriteError)
45+from lp.soyuz.interfaces.publishing import (
46+ MissingSymlinkInPool, NotInPool, PoolFileOverwriteError)
47 from canonical.librarian.utils import copy_and_close, sha1_from_path
48
49
50@@ -229,7 +229,10 @@
51 3) Remove the main file and there are symlinks left.
52 """
53 if not self.file_component:
54- raise NotInPool()
55+ raise NotInPool(
56+ "File for removing %s %s/%s is not in pool, skipping." %
57+ (component, self.source, self.filename))
58+
59
60 # Okay, it's there, if it's a symlink then we need to remove
61 # it simply.
62@@ -242,7 +245,10 @@
63 assert os.path.islink(link_path)
64 return self._reallyRemove(component)
65
66- assert component == self.file_component
67+ if component != self.file_component:
68+ raise MissingSymlinkInPool(
69+ "Symlink for %s/%s in %s is missing, skipping." %
70+ (self.source, self.filename, component))
71
72 # It's not a symlink, this means we need to check whether we
73 # have symlinks or not.
74
75=== added file 'lib/lp/archivepublisher/tests/test_deathrow.py'
76--- lib/lp/archivepublisher/tests/test_deathrow.py 1970-01-01 00:00:00 +0000
77+++ lib/lp/archivepublisher/tests/test_deathrow.py 2009-09-24 16:51:13 +0000
78@@ -0,0 +1,155 @@
79+# Copyright 2009 Canonical Ltd. This software is licensed under the
80+# GNU Affero General Public License version 3 (see the file LICENSE).
81+
82+"""Tests for deathrow class."""
83+
84+__metaclass__ = type
85+
86+
87+import os
88+import shutil
89+import tempfile
90+import unittest
91+
92+from zope.component import getUtility
93+
94+from canonical.testing import LaunchpadZopelessLayer
95+from canonical.launchpad.scripts.logger import BufferLogger
96+
97+from lp.archivepublisher.deathrow import DeathRow
98+from lp.archivepublisher.diskpool import DiskPool
99+from lp.registry.interfaces.distribution import IDistributionSet
100+from lp.soyuz.interfaces.component import IComponentSet
101+from lp.soyuz.model.publishing import SourcePackagePublishingHistory
102+from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
103+from lp.testing import TestCase
104+
105+
106+class TestDeathRow(TestCase):
107+
108+ layer = LaunchpadZopelessLayer
109+
110+ def getTestPublisher(self, distroseries):
111+ """Return an `SoyuzTestPublisher`instance."""
112+ stp = SoyuzTestPublisher()
113+ stp.addFakeChroots(distroseries)
114+ stp.setUpDefaultDistroSeries(distroseries)
115+ return stp
116+
117+ def getDeathRow(self, archive):
118+ """Return an `DeathRow` for the given archive.
119+
120+ Created the temporary 'pool' and 'temp' directories and register
121+ a 'cleanup' to purge them after the test runs.
122+ """
123+ pool_path = tempfile.mkdtemp('-pool')
124+ temp_path = tempfile.mkdtemp('-pool-tmp')
125+ def clean_pool(pool_path, temp_path):
126+ shutil.rmtree(pool_path)
127+ shutil.rmtree(temp_path)
128+ self.addCleanup(clean_pool, pool_path, temp_path)
129+
130+ logger = BufferLogger()
131+ diskpool = DiskPool(pool_path, temp_path, logger)
132+ return DeathRow(archive, diskpool, logger)
133+
134+ def getDiskPoolPath(self, pub_file, diskpool):
135+ """Return the absolute path to a published file in the disk pool/."""
136+ return diskpool.pathFor(
137+ pub_file.componentname.encode('utf-8'),
138+ pub_file.sourcepackagename.encode('utf8'),
139+ pub_file.libraryfilealiasfilename.encode('utf-8')
140+ )
141+
142+ def assertIsFile(self, path):
143+ """Assert the path exists and is a regular file."""
144+ self.assertTrue(
145+ os.path.exists(path),
146+ "File %s does not exist" % os.path.basename(path))
147+ self.assertFalse(
148+ os.path.islink(path),
149+ "File %s is a symbolic link" % os.path.basename(path))
150+
151+ def assertIsLink(self, path):
152+ """Assert the path exists and is a symbolic link."""
153+ self.assertTrue(
154+ os.path.exists(path),
155+ "File %s does not exist" % os.path.basename(path))
156+ self.assertTrue(
157+ os.path.islink(path),
158+ "File %s is a not symbolic link" % os.path.basename(path))
159+
160+ def assertDoesNotExist(self, path):
161+ """Assert the path does not exit."""
162+ self.assertFalse(
163+ os.path.exists(path),
164+ "File %s exists" % os.path.basename(path))
165+
166+ def test_MissingSymLinkInPool(self):
167+ # When a publication is promoted from 'universe' to 'main' and
168+ # the symbolic links expected in 'universe' are not present,
169+ # a `MissingSymlinkInPool` error is generated and immediately
170+ # ignored by the `DeathRow` processor. Even in this adverse
171+ # circumstances the database record (removal candidate) is
172+ # updated to match the disk status.
173+
174+ # Setup an `SoyuzTestPublisher` and a `DeathRow` instance.
175+ ubuntu = getUtility(IDistributionSet).getByName('ubuntu')
176+ hoary = ubuntu.getSeries('hoary')
177+ stp = self.getTestPublisher(hoary)
178+ deathrow = self.getDeathRow(hoary.main_archive)
179+
180+ # Create a source publication with a since file (DSC) in
181+ # 'universe' and promote it to 'main'.
182+ source_universe = stp.getPubSource(component='universe')
183+ secure_record = source_universe.changeOverride(
184+ new_component=getUtility(IComponentSet)['main'])
185+ source_main = SourcePackagePublishingHistory.get(
186+ secure_record.id)
187+ test_publications = (source_universe, source_main)
188+
189+ # Commit for exposing the just-created librarian files.
190+ self.layer.commit()
191+
192+ # Publish the testing publication on disk, the file for the
193+ # 'universe' component will be a symbolic link to the one
194+ # in 'main'.
195+ for pub in test_publications:
196+ pub.publish(deathrow.diskpool, deathrow.logger)
197+ [main_dsc_path] = [
198+ self.getDiskPoolPath(pub_file, deathrow.diskpool)
199+ for pub_file in source_main.files]
200+ [universe_dsc_path] = [
201+ self.getDiskPoolPath(pub_file, deathrow.diskpool)
202+ for pub_file in source_universe.files]
203+ self.assertIsFile(main_dsc_path)
204+ self.assertIsLink(universe_dsc_path)
205+
206+ # Remove the symbolic link to emulate MissingSymlinkInPool scenario.
207+ os.remove(universe_dsc_path)
208+
209+ # Remove the testing publications.
210+ for pub in test_publications:
211+ pub.requestObsolescence()
212+
213+ # Commit for exposing the just-created removal candidates.
214+ self.layer.commit()
215+
216+ # Due to the MissingSymlinkInPool scenario, it takes 2 iteration to
217+ # remove both references to the shared file in pool/.
218+ deathrow.reap()
219+ deathrow.reap()
220+
221+ for pub in test_publications:
222+ self.assertTrue(
223+ pub.dateremoved is not None,
224+ '%s (%s) is not marked as removed.'
225+ % (pub.displayname, pub.component.name))
226+
227+ self.assertDoesNotExist(main_dsc_path)
228+ self.assertDoesNotExist(universe_dsc_path)
229+
230+
231+def test_suite():
232+ return unittest.TestLoader().loadTestsFromName(__name__)
233+
234
235=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
236--- lib/lp/soyuz/interfaces/publishing.py 2009-09-15 09:24:15 +0000
237+++ lib/lp/soyuz/interfaces/publishing.py 2009-09-24 16:51:13 +0000
238@@ -18,6 +18,7 @@
239 'ISecureSourcePackagePublishingHistory',
240 'ISourcePackageFilePublishing',
241 'ISourcePackagePublishingHistory',
242+ 'MissingSymlinkInPool',
243 'NotInPool',
244 'PackagePublishingPriority',
245 'PackagePublishingStatus',
246@@ -60,6 +61,15 @@
247 requires manual intervention in the archive.
248 """
249
250+class MissingSymlinkInPool(Exception):
251+ """Raised when there is a missing symlink in pool.
252+
253+ This condition is ignored, similarly to what we do for `NotInPool`,
254+ since the pool entry requested to be removed is not there anymore.
255+
256+ The corresponding record is marked as removed and the process
257+ continues.
258+ """
259
260 class PackagePublishingStatus(DBEnumeratedType):
261 """Package Publishing Status

Subscribers

People subscribed via source and target branches

to status/vote changes: