Merge lp:~lifeless/bzr/bug-395556 into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/bug-395556
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 284 lines (has conflicts)
Text conflict in NEWS
To merge this branch: bzr merge lp:~lifeless/bzr/bug-395556
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing
Review via email: mp+8857@code.launchpad.net

This proposal supersedes a proposal from 2009-07-15.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Fix dirstate.set_state_from_inventory to have a more stable
old_iterator, fixing rename problems in new trees. (Robert Collins, bug
395556
)

I've removed some tests that helped to diagnose the problem but had
nothing to do with where or how it was occuring; I don't think they add
value to the test environment - the particular way they tickled the bug,
though interesting, is not demonstrating an interface problem, and is a
particularly expensive way of showing that lower levels aren't broken.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

This certainly seems to be the simple and straightforward fix. I believe a "better" fix (faster, but more complex) would be to copy the objects when we want to mutate them, rather than trying to mutate-in-place.

Considering we are trying to get away from mutate-in-place anyway (CHKInventory doesn't support apply_delta anymore, etc.)

Anyway, we might want a comment about performance if we see a regression from this. ISTR that Inventory.copy() time can easily became a dominant factor in a lot of operations. But we can certainly start here.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

> Anyway, we might want a comment about performance if we see a regression from
> this. ISTR that Inventory.copy() time can easily became a dominant factor in a
> lot of operations. But we can certainly start here.

Yah, copy() is bad ;). That said, we're copying tuples lists and strings not complex objects, so there is some reason to believe it will be lighter weight.

Revision history for this message
Robert Collins (lifeless) wrote :

This branch has been updated; the first fix was flawed.

This one just changes the update_minimal to know how to remove a totally moved entry. I also added some tracing support for debugging future things of this ilk.

Revision history for this message
John A Meinel (jameinel) wrote :

As near as I can tell, you don't use deepcopy anymore so you don't need to import it
+from copy import deepcopy

and you need to revert the comment change:
41 - # list - which is a shallow copy.
42 + # list using a deep copy so that forward changes don't make the logic
43 + # more complex. Using a shallow copy results in all entries being seen
44 + # but the state of the entries being wrong, and that leads to stale
45 + # entries being left behind.

62 + if tracing:
63 + trace.mutter("Truncating from old '%s/%s'.",
64 + current_old[0][0].decode('utf'),
65 + current_old[0][1].decode('utf8'))

I don't know what encoding "utf" is, but I'd be really surprised if that works.

same here:
93 + if tracing:
94 + trace.mutter("Deleting from old '%s/%s'.",
95 + current_old[0][0].decode('utf'),
96 + current_old[0][1].decode('utf8'))

Perhaps a smoke test that you can set tracing on and have it not abort, though to really test that requires exercising all the code paths, which seems a bit much.

115 - raise AssertionError('could not find block for %s' % (other_key,))
116 + raise AssertionError('could not find block for %s' % (
117 + other_key,))
118 other_entry_index, present = self._find_entry_index(other_key,
119 self._dirblocks[other_block_index][1])
120 if not present:
121 - raise AssertionError('could not find entry for %s' % (other_key,))
122 + raise AssertionError('could not find entry for %s' % (
123 + other_key,))
124 if path_utf8 is None:
125 raise AssertionError('no path')
126 - self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
127 + other_block = self._dirblocks[other_block_index][1]

move the "other_block = self._dirblocks[other_block_index][1]" up higher, and re-use it in the "other_entry_index, present = XXX" code.
...

126 - self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
127 + other_block = self._dirblocks[other_block_index][1]
128 + # Turn this other location into a reference to the new
129 + # location. This also updates the aliased iterator that
130 + # set_state_from_inventory uses so that the old entry, if
131 + # not already examined, is skipped over.
132 + other_block[other_entry_index][1][0] = \
133 ('r', path_utf8, 0, False, '')
134 + if len(other_block[other_entry_index][1]) == 1:
135 + # We only have one tree in use, so an 'r' record is not
136 + # needed: remove it.
137 + other_block.pop(other_entry_index)

^- Why do you bother setting it, if you are just going to pop it out again right after?

The code comment here also is saying "updates the aliased iterator" but I don't see any such thing.

It at least looks like the code you've written is only tested for when there is a single tree, and not when there are merged trees. I could be wrong, but it at least feels like we need to do something more.

So at least, I would like you to:

1) Fix up the comments that are no longer correct
2) Fix 'utf' => 'utf-8'
3) Make sure (at least manually) that things are correct when there is a merge tree around.

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 21:43 +0000, John A Meinel wrote:
> 131 + # not already examined, is skipped over.
> 132 + other_block[other_entry_index][1][0] = \
> 133 ('r', path_utf8, 0, False, '')
> 134 + if len(other_block[other_entry_index][1]) == 1:
> 135 + # We only have one tree in use, so an 'r' record is not
> 136 + # needed: remove it.
> 137 + other_block.pop(other_entry_index)
>
> ^- Why do you bother setting it, if you are just going to pop it out
> again right after?

Because:

> The code comment here also is saying "updates the aliased iterator"
> but I don't see any such thing.

old_iterator - the iterator over the shallow copy will see the entry
tuples, and thus needs to see the 'r' record, to skip over it. If it
doesn't see the 'r' record, it changes it to 'a' in the row, which
corrupts multi column dirstates. Auditing the full logic again may make
it cleaner, but I'm really on a diversion here from fast-commit, and I
have limited time budget :(.

> It at least looks like the code you've written is only tested for when
> there is a single tree, and not when there are merged trees. I could
> be wrong, but it at least feels like we need to do something more.

> So at least, I would like you to:
>
> 1) Fix up the comments that are no longer correct

Done. And I've tweaked 'aliased iterator' to mention current_old
specifically.

> 2) Fix 'utf' => 'utf-8'

Done.

> 3) Make sure (at least manually) that things are correct when there is
> a merge tree around.

It is correct :).

The loop finds all the rows that are used by the merge trees; column 0
is set to 'r' in them all, and in the special case of there being - ah,
found a remaining conceptual bug. Will tweak, and I think it will make
it more obvious to you.

-Rob

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-07-16 08:00:03 +0000
3+++ NEWS 2009-07-16 16:35:19 +0000
4@@ -25,6 +25,7 @@
5 * BranchBuilder now accepts timezone to avoid test failures in countries far
6 from GMT. (Vincent Ladeuil, #397716)
7
8+<<<<<<< TREE
9 * ``bzr commit`` no longer saves the unversioning of missing files until
10 the commit has completed on the branch. This means that aborting a
11 commit that found a missing file will leave the tree unedited.
12@@ -40,6 +41,12 @@
13 were present in a parent tree but renamed in the working tree.
14 (Robert Collins, #187207)
15
16+=======
17+* Renames to lexographically lower basenames in trees that have never been
18+ committed to will no longer corrupt the dirstate. This was caused by an
19+ bug in the dirstate update_minimal method. (Robert Collins, #395556)
20+
21+>>>>>>> MERGE-SOURCE
22 Improvements
23 ************
24
25
26=== modified file 'bzrlib/dirstate.py'
27--- bzrlib/dirstate.py 2009-07-10 05:33:07 +0000
28+++ bzrlib/dirstate.py 2009-07-16 16:35:19 +0000
29@@ -203,6 +203,7 @@
30
31 import bisect
32 import binascii
33+from copy import deepcopy
34 import errno
35 import os
36 from stat import S_IEXEC
37@@ -2422,6 +2423,9 @@
38 if 'evil' in debug.debug_flags:
39 trace.mutter_callsite(1,
40 "set_state_from_inventory called; please mutate the tree instead")
41+ tracing = 'dirstate' in debug.debug_flags
42+ if tracing:
43+ trace.mutter("set_state_from_inventory trace:")
44 self._read_dirblocks_if_needed()
45 # sketch:
46 # Two iterators: current data and new data, both in dirblock order.
47@@ -2436,7 +2440,10 @@
48 new_iterator = new_inv.iter_entries_by_dir()
49 # we will be modifying the dirstate, so we need a stable iterator. In
50 # future we might write one, for now we just clone the state into a
51- # list - which is a shallow copy.
52+ # list using a deep copy so that forward changes don't make the logic
53+ # more complex. Using a shallow copy results in all entries being seen
54+ # but the state of the entries being wrong, and that leads to stale
55+ # entries being left behind.
56 old_iterator = iter(list(self._iter_entries()))
57 # both must have roots so this is safe:
58 current_new = new_iterator.next()
59@@ -2476,12 +2483,19 @@
60 # we make both end conditions explicit
61 if not current_old:
62 # old is finished: insert current_new into the state.
63+ if tracing:
64+ trace.mutter("Appending from new '%s'.",
65+ new_path_utf8.decode('utf8'))
66 self.update_minimal(new_entry_key, current_new_minikind,
67 executable=current_new[1].executable,
68 path_utf8=new_path_utf8, fingerprint=fingerprint)
69 current_new = advance(new_iterator)
70 elif not current_new:
71 # new is finished
72+ if tracing:
73+ trace.mutter("Truncating from old '%s/%s'.",
74+ current_old[0][0].decode('utf'),
75+ current_old[0][1].decode('utf8'))
76 self._make_absent(current_old)
77 current_old = advance(old_iterator)
78 elif new_entry_key == current_old[0]:
79@@ -2494,6 +2508,9 @@
80 # kind has changed.
81 if (current_old[1][0][3] != current_new[1].executable or
82 current_old[1][0][0] != current_new_minikind):
83+ if tracing:
84+ trace.mutter("Updating in-place change '%s'.",
85+ new_path_utf8.decode('utf8'))
86 self.update_minimal(current_old[0], current_new_minikind,
87 executable=current_new[1].executable,
88 path_utf8=new_path_utf8, fingerprint=fingerprint)
89@@ -2505,6 +2522,9 @@
90 and new_entry_key[1:] < current_old[0][1:])):
91 # new comes before:
92 # add a entry for this and advance new
93+ if tracing:
94+ trace.mutter("Inserting from new '%s'.",
95+ new_path_utf8.decode('utf8'))
96 self.update_minimal(new_entry_key, current_new_minikind,
97 executable=current_new[1].executable,
98 path_utf8=new_path_utf8, fingerprint=fingerprint)
99@@ -2512,11 +2532,17 @@
100 else:
101 # we've advanced past the place where the old key would be,
102 # without seeing it in the new list. so it must be gone.
103+ if tracing:
104+ trace.mutter("Deleting from old '%s/%s'.",
105+ current_old[0][0].decode('utf'),
106+ current_old[0][1].decode('utf8'))
107 self._make_absent(current_old)
108 current_old = advance(old_iterator)
109 self._dirblock_state = DirState.IN_MEMORY_MODIFIED
110 self._id_index = None
111 self._packed_stat_index = None
112+ if tracing:
113+ trace.mutter("set_state_from_inventory complete.")
114
115 def _make_absent(self, current_old):
116 """Mark current_old - an entry - as absent for tree 0.
117@@ -2618,17 +2644,29 @@
118 # relocations when updating an existing record but its not:
119 # the test for existing kinds is different: this can be
120 # factored out to a helper though.
121- other_block_index, present = self._find_block_index_from_key(other_key)
122+ other_block_index, present = self._find_block_index_from_key(
123+ other_key)
124 if not present:
125- raise AssertionError('could not find block for %s' % (other_key,))
126+ raise AssertionError('could not find block for %s' % (
127+ other_key,))
128 other_entry_index, present = self._find_entry_index(other_key,
129 self._dirblocks[other_block_index][1])
130 if not present:
131- raise AssertionError('could not find entry for %s' % (other_key,))
132+ raise AssertionError('could not find entry for %s' % (
133+ other_key,))
134 if path_utf8 is None:
135 raise AssertionError('no path')
136- self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
137+ other_block = self._dirblocks[other_block_index][1]
138+ # Turn this other location into a reference to the new
139+ # location. This also updates the aliased iterator that
140+ # set_state_from_inventory uses so that the old entry, if
141+ # not already examined, is skipped over.
142+ other_block[other_entry_index][1][0] = \
143 ('r', path_utf8, 0, False, '')
144+ if len(other_block[other_entry_index][1]) == 1:
145+ # We only have one tree in use, so an 'r' record is not
146+ # needed: remove it.
147+ other_block.pop(other_entry_index)
148
149 num_present_parents = self._num_present_parents()
150 for lookup_index in xrange(1, num_present_parents + 1):
151
152=== modified file 'bzrlib/help_topics/en/debug-flags.txt'
153--- bzrlib/help_topics/en/debug-flags.txt 2009-07-06 09:47:35 +0000
154+++ bzrlib/help_topics/en/debug-flags.txt 2009-07-16 16:35:19 +0000
155@@ -5,6 +5,7 @@
156 prefix) put in the ``debug_flags`` variable in ``bazaar.conf``.
157
158 -Dauth Trace authentication sections used.
159+-Ddirstate Trace dirstate activity (verbose!)
160 -Derror Instead of normal error handling, always print a traceback
161 on error.
162 -Devil Capture call sites that do expensive or badly-scaling
163
164=== modified file 'bzrlib/tests/blackbox/test_commit.py'
165--- bzrlib/tests/blackbox/test_commit.py 2009-07-08 05:27:06 +0000
166+++ bzrlib/tests/blackbox/test_commit.py 2009-07-16 16:35:20 +0000
167@@ -639,25 +639,3 @@
168 out, err = self.run_bzr("commit tree/hello.txt")
169 last_rev = tree.branch.repository.get_revision(tree.last_revision())
170 self.assertEqual('save me some typing\n', last_rev.message)
171-
172- def test_commit_and_mv_dance_a(self):
173- # see https://bugs.launchpad.net/bzr/+bug/395556
174- tree = self.make_branch_and_tree(".")
175- self.build_tree(["a"])
176- tree.add("a")
177- self.check_output("a => b\n", ["mv", "a", "b"])
178- self.check_output("", ["commit", "-q", "-m", "Actually no, b"])
179- self.check_output("b => a\n", ["mv", "b", "a"])
180- self.check_output("", ["commit", "-q", "-m", "No, really, a"])
181-
182- def test_commit_and_mv_dance_b(self):
183- # see https://bugs.launchpad.net/bzr/+bug/395556
184- tree = self.make_branch_and_tree(".")
185- self.build_tree(["b"])
186- tree.add("b")
187- self.check_output("b => a\n", ["mv", "b", "a"])
188- self.check_output("", ["commit", "-q", "-m", "Actually no, a"])
189- self.check_output("a => b\n", ["mv", "a", "b"])
190- self.expectFailure("bug 395556: gives DuplicateFileId "
191- "committing renames",
192- self.check_output, "", ["commit", "-q", "-m", "No, really, b"])
193
194=== modified file 'bzrlib/tests/per_workingtree/test_commit.py'
195--- bzrlib/tests/per_workingtree/test_commit.py 2009-07-15 05:54:37 +0000
196+++ bzrlib/tests/per_workingtree/test_commit.py 2009-07-16 16:35:20 +0000
197@@ -611,29 +611,3 @@
198 revid = tree.commit('first post')
199 committed_tree = tree.basis_tree()
200 self.assertTrue(committed_tree.has_filename("newfile"))
201-
202- def test_commit_and_mv_dance_a(self):
203- # should fail because of
204- # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does
205- # not, while the blackbox.test_commit equivalent does - maybe because
206- # of different format combinations
207- tree = self.make_branch_and_tree(".")
208- self.build_tree(["a"])
209- tree.add("a")
210- tree.rename_one("a", "b")
211- tree.commit("Actually no, b")
212- tree.rename_one("b", "a")
213- tree.commit("No, really, a")
214-
215- def test_commit_and_mv_dance_b(self):
216- # should fail because of
217- # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does
218- # not, while the blackbox.test_commit equivalent does - maybe because
219- # of different format combinations
220- tree = self.make_branch_and_tree(".")
221- self.build_tree(["b"])
222- tree.add("b")
223- tree.rename_one("b", "a")
224- tree.commit("Actually no, a")
225- tree.rename_one("a", "b")
226- tree.commit("No, really, b")
227
228=== modified file 'bzrlib/tests/test_dirstate.py'
229--- bzrlib/tests/test_dirstate.py 2009-05-06 05:36:28 +0000
230+++ bzrlib/tests/test_dirstate.py 2009-07-16 16:35:20 +0000
231@@ -827,7 +827,6 @@
232 finally:
233 tree.unlock()
234
235-
236 def test_set_state_from_inventory_mixed_paths(self):
237 tree1 = self.make_branch_and_tree('tree1')
238 self.build_tree(['tree1/a/', 'tree1/a/b/', 'tree1/a-b/',
239@@ -942,7 +941,6 @@
240 finally:
241 state.unlock()
242
243-
244 def test_set_parent_trees_no_content(self):
245 # set_parent_trees is a slow but important api to support.
246 tree1 = self.make_branch_and_memory_tree('tree1')
247@@ -1238,6 +1236,38 @@
248 self.assertRaises(errors.BzrError,
249 state.add, '..', 'ass-id', 'directory', None, None)
250
251+ def test_set_state_with_rename_b_a_bug_395556(self):
252+ # bug 395556 uncovered a bug where the dirstate ends up with a false
253+ # relocation record - in a tree with no parents there should be no
254+ # absent or relocated records. This then leads to further corruption
255+ # when a commit occurs, as the incorrect relocation gathers an
256+ # incorrect absent in tree 1, and future changes go to pot.
257+ tree1 = self.make_branch_and_tree('tree1')
258+ self.build_tree(['tree1/b'])
259+ tree1.lock_write()
260+ try:
261+ tree1.add(['b'], ['b-id'])
262+ root_id = tree1.get_root_id()
263+ inv = tree1.inventory
264+ state = dirstate.DirState.initialize('dirstate')
265+ try:
266+ # Set the initial state with 'b'
267+ state.set_state_from_inventory(inv)
268+ inv.rename('b-id', root_id, 'a')
269+ # Set the new state with 'a', which currently corrupts.
270+ state.set_state_from_inventory(inv)
271+ expected_result1 = [('', '', root_id, 'd'),
272+ ('', 'a', 'b-id', 'f'),
273+ ]
274+ values = []
275+ for entry in state._iter_entries():
276+ values.append(entry[0] + entry[1][0][:1])
277+ self.assertEqual(expected_result1, values)
278+ finally:
279+ state.unlock()
280+ finally:
281+ tree1.unlock()
282+
283
284 class TestGetLines(TestCaseWithDirState):
285