Merge lp:~mbp/bzr/185103-2.0-lockdir-info into lp:bzr/2.0

Proposed by Martin Pool
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~mbp/bzr/185103-2.0-lockdir-info
Merge into: lp:bzr/2.0
Diff against target: 114 lines (+44/-8)
3 files modified
NEWS (+5/-0)
bzrlib/lockdir.py (+19/-7)
bzrlib/tests/test_lockdir.py (+20/-1)
To merge this branch: bzr merge lp:~mbp/bzr/185103-2.0-lockdir-info
Reviewer Review Type Date Requested Status
John A Meinel Approve
Vincent Ladeuil Approve
Review via email: mp+20110@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

Cope with the lock/info file being often; very commonly hit by dupes of bug 185103. Proposed for 2.0.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Nice !

I wonder if we can make it clearer:

108 + ['lock %s' % t.abspath('test_lock'),
109 + 'held by None on host None [process #None]',
110 + 'locked (unknown)'],

   lock URL held by None on host None [process #None] locked (unknown)

is not user-friendly, what about:

   lock URL held by <unknown> on host <unknown> locked <file's last modification time>

or even better:

   lock URL left in a broken state by <unknown> since <file's last modification time>

which clearly states that something weird occurred ?

For the test, forcing the mtime (like John did in
transform.DiskTreeTransform._set_mtime()) will allow
checking the whole formatted string.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/185103-2.0-lockdir-info into lp:bzr/2.0.
>
> Requested reviews:
> bzr-core (bzr-core)
>
>
> Cope with the lock/info file being often; very commonly hit by dupes of bug 185103. Proposed for 2.0.
>

I'm still a bit concerned about how this happened. Specifically, if you
got an interrupt, I wouldn't be surprised if 'pending.XXXX/info' ended
up empty, but we shouldn't be renaming it to 'held'.

Anyway, I think failing gracefully is better than failing poorly. I just
feel like we probably are missing something.

caveat Vincent's comments:

 merge: approve

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuGxjwACgkQJdeBCYSNAAMnowCfTGBquWFP9k2MB4UbY13P7TaM
KLQAoIR78fDXkismPgOOPhKN98uOrhgF
=nSeI
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

On 26 February 2010 05:51, John A Meinel <email address hidden> wrote:
> I'm still a bit concerned about how this happened. Specifically, if you
> got an interrupt, I wouldn't be surprised if 'pending.XXXX/info' ended
> up empty, but we shouldn't be renaming it to 'held'.

To me the most likely case is that the whole machine crashed around
the time the lock was being taken. It's plausible that the metadata
was recovered from the journal but the lock contents were not.

We've never got a really clear description of how the lock got into
this state, but people do seem to hit it a lot.

> Anyway, I think failing gracefully is better than failing poorly. I just
> feel like we probably are missing something.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Martin Pool (mbp) wrote :

On 25 February 2010 19:16, Vincent Ladeuil <email address hidden> wrote:
> Review: Approve
> Nice !
>
> I wonder if we can make it clearer:
>
> 108     + ['lock %s' % t.abspath('test_lock'),
> 109     + 'held by None on host None [process #None]',
> 110     + 'locked (unknown)'],
>
>   lock URL held by None on host None [process #None] locked (unknown)
>
> is not user-friendly, what about:
>
>
>   lock URL held by <unknown> on host <unknown> locked <file's last modification time>

Showing '<unknown>' rather than None would be good. But I somewhat
dislike the idea of doing extra work to stat the file when we're
already in a state we shouldn't normally hit...

I suspect if we put this in we may still get bugs about "why did it
say that?" but then we can push harder to understand what happened.

One option actually would be to fsync the lock file, but on some
filesystems that flushes the whole filesystem so it can be extremely
slow, and of course this will be hit on almost every data-writing bzr
command.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Vincent Ladeuil (vila) wrote :

Already merged but anyhow,

> But I somewhat
> dislike the idea of doing extra work to stat the file when we're
> already in a state we shouldn't normally hit...

but that's exactly why I asked for it.
The error hapenned in a previous run, now, as a user, I'm not aware
of who is responsible and worse I don't even know *when* it occurred.
Displaying the last modification time gives a clue: If it was two days ago,
I can safely break-lock, it it's 5 mins ago... I may wait a bit hoping that the
responsible person will just do that for me.

So spending a bit more time here doesn't matter but really add value to the
error message.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-02-19 07:13:45 +0000
3+++ NEWS 2010-02-26 06:54:17 +0000
4@@ -17,6 +17,11 @@
5 * Avoid ``malloc(0)`` in ``patiencediff``, which is non-portable.
6 (Martin Pool, #331095)
7
8+* Cope with the lockdir ``held/info`` file being empty, which seems to
9+ happen fairly often if the process is suddenly interrupted while taking
10+ a lock.
11+ (Martin Pool, #185103)
12+
13 * Handle renames correctly when there are files or directories that
14 differ only in case. (Chris Jones, Martin Pool, #368931)
15
16
17=== modified file 'bzrlib/lockdir.py'
18--- bzrlib/lockdir.py 2009-12-21 06:17:05 +0000
19+++ bzrlib/lockdir.py 2010-02-26 06:54:17 +0000
20@@ -1,4 +1,4 @@
21-# Copyright (C) 2006, 2007, 2008 Canonical Ltd
22+# Copyright (C) 2006, 2007, 2008, 2010 Canonical Ltd
23 #
24 # This program is free software; you can redistribute it and/or modify
25 # it under the terms of the GNU General Public License as published by
26@@ -250,7 +250,7 @@
27 if info is None:
28 raise LockFailed(self, "lock was renamed into place, but "
29 "now is missing!")
30- if info['nonce'] != self.nonce:
31+ if info.get('nonce') != self.nonce:
32 self._trace("rename succeeded, "
33 "but lock is still held by someone else")
34 raise LockContention(self)
35@@ -427,7 +427,7 @@
36 def peek(self):
37 """Check if the lock is held by anyone.
38
39- If it is held, this returns the lock info structure as a rio Stanza,
40+ If it is held, this returns the lock info structure as a dict
41 which contains some information about the current lock holder.
42 Otherwise returns None.
43 """
44@@ -456,7 +456,14 @@
45 return s.to_string()
46
47 def _parse_info(self, info_file):
48- return rio.read_stanza(info_file.readlines()).as_dict()
49+ stanza = rio.read_stanza(info_file.readlines())
50+ if stanza is None:
51+ # see bug 185013; we fairly often end up with the info file being
52+ # empty after an interruption; we could log a message here but
53+ # there may not be much we can say
54+ return {}
55+ else:
56+ return stanza.as_dict()
57
58 def attempt_lock(self):
59 """Take the lock; fail if it's already held.
60@@ -604,11 +611,16 @@
61 def _format_lock_info(self, info):
62 """Turn the contents of peek() into something for the user"""
63 lock_url = self.transport.abspath(self.path)
64- delta = time.time() - int(info['start_time'])
65+ start_time = info.get('start_time')
66+ if start_time is None:
67+ time_ago = '(unknown)'
68+ else:
69+ time_ago = format_delta(time.time() - int(info['start_time']))
70 return [
71 'lock %s' % (lock_url,),
72- 'held by %(user)s on host %(hostname)s [process #%(pid)s]' % info,
73- 'locked %s' % (format_delta(delta),),
74+ 'held by %s on host %s [process #%s]' %
75+ tuple([info.get(x, '<unknown>') for x in ['user', 'hostname', 'pid']]),
76+ 'locked %s' % (time_ago,),
77 ]
78
79 def validate_token(self, token):
80
81=== modified file 'bzrlib/tests/test_lockdir.py'
82--- bzrlib/tests/test_lockdir.py 2009-06-23 08:43:05 +0000
83+++ bzrlib/tests/test_lockdir.py 2010-02-26 06:54:17 +0000
84@@ -1,4 +1,4 @@
85-# Copyright (C) 2006, 2007, 2008 Canonical Ltd
86+# Copyright (C) 2006, 2007, 2008, 2010 Canonical Ltd
87 #
88 # This program is free software; you can redistribute it and/or modify
89 # it under the terms of the GNU General Public License as published by
90@@ -669,6 +669,25 @@
91 # no kibble
92 check_dir(['held'])
93
94+ def test_no_lockdir_info(self):
95+ """We can cope with empty info files."""
96+ # This seems like a fairly common failure case - see
97+ # <https://bugs.edge.launchpad.net/bzr/+bug/185103> and all its dupes.
98+ # Processes are often interrupted after opening the file
99+ # before the actual contents are committed.
100+ t = self.get_transport()
101+ t.mkdir('test_lock')
102+ t.mkdir('test_lock/held')
103+ t.put_bytes('test_lock/held/info', '')
104+ lf = LockDir(t, 'test_lock')
105+ info = lf.peek()
106+ formatted_info = lf._format_lock_info(info)
107+ self.assertEquals(
108+ ['lock %s' % t.abspath('test_lock'),
109+ 'held by <unknown> on host <unknown> [process #<unknown>]',
110+ 'locked (unknown)'],
111+ formatted_info)
112+
113
114 class TestLockDirHooks(TestCaseWithTransport):
115

Subscribers

People subscribed via source and target branches