Merge lp:~spiv/bzr/graceful-broken-lockdir-619872-2.0 into lp:bzr/2.0

Proposed by Andrew Bennetts
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 4757
Proposed branch: lp:~spiv/bzr/graceful-broken-lockdir-619872-2.0
Merge into: lp:bzr/2.0
Diff against target: 256 lines (+172/-2)
5 files modified
bzrlib/errors.py (+12/-0)
bzrlib/lockdir.py (+44/-2)
bzrlib/tests/test_errors.py (+7/-0)
bzrlib/tests/test_lockdir.py (+101/-0)
bzrlib/tests/test_rio.py (+8/-0)
To merge this branch: bzr merge lp:~spiv/bzr/graceful-broken-lockdir-619872-2.0
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+34657@code.launchpad.net

Commit message

more gracefully handle corrupt lockdirs (missing/null held files)

Description of the change

This patch addresses bug 619872. It adds a LockCorrupt error so that the
lockdir code can clearly distinguish apparently-held-but-damaged locks from
other conditions like not-held. Specifically this deals with the case where the
info file is present, but unparseable, e.g. contains only NUL bytes. This
immediately improves the error reporting to the user when a corrupt
lock/held/info file is encountered (avoids a traceback, and suggests the user
try 'bzr break-lock').

Further, it improves break_lock to cope gracefully with LockCorrupt, so that it
will prompt “Break (corrupt $lock_repr)” [similar to the usual “Break
($lock_info)”], and successfully clears the corrupt lock.

I've added quite a few tests to try make sure we're covering all the interesting
cases. Ideally any time there is a held/ directory in a LockDir without a
parseable info file we'd raise LockCorrupt, but actually there are many cases
already where we silently treat an empty held/ directory as the same as an
unheld lock:
 * many places assume NoSuchFile when reading held/info implies there is no lock
 * attempt_lock assumes that if rename(new-dir, 'held') raises no error, then
   there was no lock, but on POSIX rename(new-dir, empty-dir) silently passes
   [but not on Windows, IIUC, so it would raise LockContention there... but
   wait_lock would then try to peek after the timeout and get LockCorrupt!]

I don't think we can fix these cases without possibly impacting performance, and
arguably at least some of these cases are harmless, so I've left this behaviour
untouched, although I've explicitly allowed some tests to tolerate LockCorrupt
if a future implementation chooses to notice these issues. Most importantly at
least there are now tests that we don't explode in these cases.

I've written this patch against 2.0.x, although perhaps this is a bit too much
like a new feature to belong on the stable branch. (Certainly if we had
translations this patch would violate Ubuntu's SRU policy by introducing new
strings.) I haven't yet written the NEWS entries for this patch, but if we
decide it is ok for 2.0 I certainly will!

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

That looks good.

I do wonder how we end up with lock files containing nulls.

I think the LoggingUIFactory should be moved into somewhere more generally reusable, and definitely not copy&pasted.

The error could have a link to the bug reporting the kind of corruption we saw.

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/errors.py'
2--- bzrlib/errors.py 2009-08-27 05:22:14 +0000
3+++ bzrlib/errors.py 2010-09-06 07:46:12 +0000
4@@ -1054,6 +1054,18 @@
5 self.target = target
6
7
8+class LockCorrupt(LockError):
9+
10+ _fmt = ("Lock is apparently held, but corrupted: %(corruption_info)s\n"
11+ "Use 'bzr break-lock' to clear it")
12+
13+ internal_error = False
14+
15+ def __init__(self, corruption_info, file_data=None):
16+ self.corruption_info = corruption_info
17+ self.file_data = file_data
18+
19+
20 class LockNotHeld(LockError):
21
22 _fmt = "Lock not held: %(lock)s"
23
24=== modified file 'bzrlib/lockdir.py'
25--- bzrlib/lockdir.py 2010-02-26 06:51:51 +0000
26+++ bzrlib/lockdir.py 2010-09-06 07:46:12 +0000
27@@ -118,6 +118,7 @@
28 LockBreakMismatch,
29 LockBroken,
30 LockContention,
31+ LockCorrupt,
32 LockFailed,
33 LockNotHeld,
34 NoSuchFile,
35@@ -345,7 +346,13 @@
36 it possibly being still active.
37 """
38 self._check_not_locked()
39- holder_info = self.peek()
40+ try:
41+ holder_info = self.peek()
42+ except LockCorrupt, e:
43+ # The lock info is corrupt.
44+ if bzrlib.ui.ui_factory.get_boolean("Break (corrupt %r)" % (self,)):
45+ self.force_break_corrupt(e.file_data)
46+ return
47 if holder_info is not None:
48 lock_info = '\n'.join(self._format_lock_info(holder_info))
49 if bzrlib.ui.ui_factory.get_boolean("Break %s" % lock_info):
50@@ -392,6 +399,35 @@
51 for hook in self.hooks['lock_broken']:
52 hook(result)
53
54+ def force_break_corrupt(self, corrupt_info_lines):
55+ """Release a lock that has been corrupted.
56+
57+ This is very similar to force_break, it except it doesn't assume that
58+ self.peek() can work.
59+
60+ :param corrupt_info_lines: the lines of the corrupted info file, used
61+ to check that the lock hasn't changed between reading the (corrupt)
62+ info file and calling force_break_corrupt.
63+ """
64+ # XXX: this copes with unparseable info files, but what about missing
65+ # info files? Or missing lock dirs?
66+ self._check_not_locked()
67+ tmpname = '%s/broken.%s.tmp' % (self.path, rand_chars(20))
68+ self.transport.rename(self._held_dir, tmpname)
69+ # check that we actually broke the right lock, not someone else;
70+ # there's a small race window between checking it and doing the
71+ # rename.
72+ broken_info_path = tmpname + self.__INFO_NAME
73+ f = self.transport.get(broken_info_path)
74+ broken_lines = f.readlines()
75+ if broken_lines != corrupt_info_lines:
76+ raise LockBreakMismatch(self, broken_lines, corrupt_info_lines)
77+ self.transport.delete(broken_info_path)
78+ self.transport.rmdir(tmpname)
79+ result = lock.LockResult(self.transport.abspath(self.path))
80+ for hook in self.hooks['lock_broken']:
81+ hook(result)
82+
83 def _check_not_locked(self):
84 """If the lock is held by this instance, raise an error."""
85 if self._lock_held:
86@@ -456,7 +492,13 @@
87 return s.to_string()
88
89 def _parse_info(self, info_file):
90- stanza = rio.read_stanza(info_file.readlines())
91+ lines = info_file.readlines()
92+ try:
93+ stanza = rio.read_stanza(lines)
94+ except ValueError, e:
95+ mutter('Corrupt lock info file: %r', lines)
96+ raise LockCorrupt("could not parse lock info file: " + str(e),
97+ lines)
98 if stanza is None:
99 # see bug 185013; we fairly often end up with the info file being
100 # empty after an interruption; we could log a message here but
101
102=== modified file 'bzrlib/tests/test_errors.py'
103--- bzrlib/tests/test_errors.py 2009-08-21 02:37:18 +0000
104+++ bzrlib/tests/test_errors.py 2010-09-06 07:46:12 +0000
105@@ -132,6 +132,13 @@
106 "cannot be broken.",
107 str(error))
108
109+ def test_lock_corrupt(self):
110+ error = errors.LockCorrupt("corruption info")
111+ self.assertEqualDiff("Lock is apparently held, but corrupted: "
112+ "corruption info\n"
113+ "Use 'bzr break-lock' to clear it",
114+ str(error))
115+
116 def test_knit_data_stream_incompatible(self):
117 error = errors.KnitDataStreamIncompatible(
118 'stream format', 'target format')
119
120=== modified file 'bzrlib/tests/test_lockdir.py'
121--- bzrlib/tests/test_lockdir.py 2010-02-26 06:51:51 +0000
122+++ bzrlib/tests/test_lockdir.py 2010-09-06 07:46:12 +0000
123@@ -568,6 +568,62 @@
124 finally:
125 bzrlib.ui.ui_factory = orig_factory
126
127+ def test_break_lock_corrupt_info(self):
128+ """break_lock works even if the info file is corrupt (and tells the UI
129+ that it is corrupt).
130+ """
131+ ld = self.get_lock()
132+ ld2 = self.get_lock()
133+ ld.create()
134+ ld.lock_write()
135+ ld.transport.put_bytes_non_atomic('test_lock/held/info', '\0')
136+ class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
137+ def __init__(self):
138+ self.prompts = []
139+ def get_boolean(self, prompt):
140+ self.prompts.append(('boolean', prompt))
141+ return True
142+ ui = LoggingUIFactory()
143+ orig_factory = bzrlib.ui.ui_factory
144+ bzrlib.ui.ui_factory = ui
145+ try:
146+ ld2.break_lock()
147+ self.assertLength(1, ui.prompts)
148+ self.assertEqual('boolean', ui.prompts[0][0])
149+ self.assertStartsWith(ui.prompts[0][1], 'Break (corrupt LockDir')
150+ self.assertRaises(LockBroken, ld.unlock)
151+ finally:
152+ bzrlib.ui.ui_factory = orig_factory
153+
154+ def test_break_lock_missing_info(self):
155+ """break_lock works even if the info file is missing (and tells the UI
156+ that it is corrupt).
157+ """
158+ ld = self.get_lock()
159+ ld2 = self.get_lock()
160+ ld.create()
161+ ld.lock_write()
162+ ld.transport.delete('test_lock/held/info')
163+ class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
164+ def __init__(self):
165+ self.prompts = []
166+ def get_boolean(self, prompt):
167+ self.prompts.append(('boolean', prompt))
168+ return True
169+ ui = LoggingUIFactory()
170+ orig_factory = bzrlib.ui.ui_factory
171+ bzrlib.ui.ui_factory = ui
172+ try:
173+ ld2.break_lock()
174+ self.assertRaises(LockBroken, ld.unlock)
175+ self.assertLength(0, ui.prompts)
176+ finally:
177+ bzrlib.ui.ui_factory = orig_factory
178+ # Suppress warnings due to ld not being unlocked
179+ # XXX: if lock_broken hook was invoked in this case, this hack would
180+ # not be necessary. - Andrew Bennetts, 2010-09-06.
181+ del self._lock_actions[:]
182+
183 def test_create_missing_base_directory(self):
184 """If LockDir.path doesn't exist, it can be created
185
186@@ -688,6 +744,51 @@
187 'locked (unknown)'],
188 formatted_info)
189
190+ def test_corrupt_lockdir_info(self):
191+ """We can cope with corrupt (and thus unparseable) info files."""
192+ # This seems like a fairly common failure case too - see
193+ # <https://bugs.edge.launchpad.net/bzr/+bug/619872> for instance.
194+ # In particular some systems tend to fill recently created files with
195+ # nul bytes after recovering from a system crash.
196+ t = self.get_transport()
197+ t.mkdir('test_lock')
198+ t.mkdir('test_lock/held')
199+ t.put_bytes('test_lock/held/info', '\0')
200+ lf = LockDir(t, 'test_lock')
201+ self.assertRaises(errors.LockCorrupt, lf.peek)
202+ # Currently attempt_lock gives LockContention, but LockCorrupt would be
203+ # a reasonable result too.
204+ self.assertRaises(
205+ (errors.LockCorrupt, errors.LockContention), lf.attempt_lock)
206+ self.assertRaises(errors.LockCorrupt, lf.validate_token, 'fake token')
207+
208+ def test_missing_lockdir_info(self):
209+ """We can cope with absent info files."""
210+ t = self.get_transport()
211+ t.mkdir('test_lock')
212+ t.mkdir('test_lock/held')
213+ lf = LockDir(t, 'test_lock')
214+ # In this case we expect the 'not held' result from peek, because peek
215+ # cannot be expected to notice that there is a 'held' directory with no
216+ # 'info' file.
217+ self.assertEqual(None, lf.peek())
218+ # And lock/unlock may work or give LockContention (but not any other
219+ # error).
220+ try:
221+ lf.attempt_lock()
222+ except LockContention:
223+ # LockContention is ok, and expected on Windows
224+ pass
225+ else:
226+ # no error is ok, and expected on POSIX (because POSIX allows
227+ # os.rename over an empty directory).
228+ lf.unlock()
229+ # Currently raises TokenMismatch, but LockCorrupt would be reasonable
230+ # too.
231+ self.assertRaises(
232+ (errors.TokenMismatch, errors.LockCorrupt),
233+ lf.validate_token, 'fake token')
234+
235
236 class TestLockDirHooks(TestCaseWithTransport):
237
238
239=== modified file 'bzrlib/tests/test_rio.py'
240--- bzrlib/tests/test_rio.py 2009-03-23 14:59:43 +0000
241+++ bzrlib/tests/test_rio.py 2010-09-06 07:46:12 +0000
242@@ -188,6 +188,14 @@
243 self.assertEqual(s, None)
244 self.assertTrue(s is None)
245
246+ def test_read_nul_byte(self):
247+ """File consisting of a nul byte causes an error."""
248+ self.assertRaises(ValueError, read_stanza, ['\0'])
249+
250+ def test_read_nul_bytes(self):
251+ """File consisting of many nul bytes causes an error."""
252+ self.assertRaises(ValueError, read_stanza, ['\0' * 100])
253+
254 def test_read_iter(self):
255 """Read several stanzas from file"""
256 tmpf = TemporaryFile()

Subscribers

People subscribed via source and target branches