Merge lp:~lifeless/bzr/bug-395556 into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/bug-395556
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 284 lines (has conflicts)
Text conflict in NEWS
To merge this branch: bzr merge lp:~lifeless/bzr/bug-395556
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing
Review via email: mp+8857@code.launchpad.net

This proposal supersedes a proposal from 2009-07-15.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Fix dirstate.set_state_from_inventory to have a more stable
old_iterator, fixing rename problems in new trees. (Robert Collins, bug
395556
)

I've removed some tests that helped to diagnose the problem but had
nothing to do with where or how it was occuring; I don't think they add
value to the test environment - the particular way they tickled the bug,
though interesting, is not demonstrating an interface problem, and is a
particularly expensive way of showing that lower levels aren't broken.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

This certainly seems to be the simple and straightforward fix. I believe a "better" fix (faster, but more complex) would be to copy the objects when we want to mutate them, rather than trying to mutate-in-place.

Considering we are trying to get away from mutate-in-place anyway (CHKInventory doesn't support apply_delta anymore, etc.)

Anyway, we might want a comment about performance if we see a regression from this. ISTR that Inventory.copy() time can easily became a dominant factor in a lot of operations. But we can certainly start here.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

> Anyway, we might want a comment about performance if we see a regression from
> this. ISTR that Inventory.copy() time can easily became a dominant factor in a
> lot of operations. But we can certainly start here.

Yah, copy() is bad ;). That said, we're copying tuples lists and strings not complex objects, so there is some reason to believe it will be lighter weight.

Revision history for this message
Robert Collins (lifeless) wrote :

This branch has been updated; the first fix was flawed.

This one just changes the update_minimal to know how to remove a totally moved entry. I also added some tracing support for debugging future things of this ilk.

Revision history for this message
John A Meinel (jameinel) wrote :

As near as I can tell, you don't use deepcopy anymore so you don't need to import it
+from copy import deepcopy

and you need to revert the comment change:
41 - # list - which is a shallow copy.
42 + # list using a deep copy so that forward changes don't make the logic
43 + # more complex. Using a shallow copy results in all entries being seen
44 + # but the state of the entries being wrong, and that leads to stale
45 + # entries being left behind.

62 + if tracing:
63 + trace.mutter("Truncating from old '%s/%s'.",
64 + current_old[0][0].decode('utf'),
65 + current_old[0][1].decode('utf8'))

I don't know what encoding "utf" is, but I'd be really surprised if that works.

same here:
93 + if tracing:
94 + trace.mutter("Deleting from old '%s/%s'.",
95 + current_old[0][0].decode('utf'),
96 + current_old[0][1].decode('utf8'))

Perhaps a smoke test that you can set tracing on and have it not abort, though to really test that requires exercising all the code paths, which seems a bit much.

115 - raise AssertionError('could not find block for %s' % (other_key,))
116 + raise AssertionError('could not find block for %s' % (
117 + other_key,))
118 other_entry_index, present = self._find_entry_index(other_key,
119 self._dirblocks[other_block_index][1])
120 if not present:
121 - raise AssertionError('could not find entry for %s' % (other_key,))
122 + raise AssertionError('could not find entry for %s' % (
123 + other_key,))
124 if path_utf8 is None:
125 raise AssertionError('no path')
126 - self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
127 + other_block = self._dirblocks[other_block_index][1]

move the "other_block = self._dirblocks[other_block_index][1]" up higher, and re-use it in the "other_entry_index, present = XXX" code.
...

126 - self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
127 + other_block = self._dirblocks[other_block_index][1]
128 + # Turn this other location into a reference to the new
129 + # location. This also updates the aliased iterator that
130 + # set_state_from_inventory uses so that the old entry, if
131 + # not already examined, is skipped over.
132 + other_block[other_entry_index][1][0] = \
133 ('r', path_utf8, 0, False, '')
134 + if len(other_block[other_entry_index][1]) == 1:
135 + # We only have one tree in use, so an 'r' record is not
136 + # needed: remove it.
137 + other_block.pop(other_entry_index)

^- Why do you bother setting it, if you are just going to pop it out again right after?

The code comment here also is saying "updates the aliased iterator" but I don't see any such thing.

It at least looks like the code you've written is only tested for when there is a single tree, and not when there are merged trees. I could be wrong, but it at least feels like we need to do something more.

So at least, I would like you to:

1) Fix up the comments that are no longer correct
2) Fix 'utf' => 'utf-8'
3) Make sure (at least manually) that things are correct when there is a merge tree around.

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 21:43 +0000, John A Meinel wrote:
> 131 + # not already examined, is skipped over.
> 132 + other_block[other_entry_index][1][0] = \
> 133 ('r', path_utf8, 0, False, '')
> 134 + if len(other_block[other_entry_index][1]) == 1:
> 135 + # We only have one tree in use, so an 'r' record is not
> 136 + # needed: remove it.
> 137 + other_block.pop(other_entry_index)
>
> ^- Why do you bother setting it, if you are just going to pop it out
> again right after?

Because:

> The code comment here also is saying "updates the aliased iterator"
> but I don't see any such thing.

old_iterator - the iterator over the shallow copy will see the entry
tuples, and thus needs to see the 'r' record, to skip over it. If it
doesn't see the 'r' record, it changes it to 'a' in the row, which
corrupts multi column dirstates. Auditing the full logic again may make
it cleaner, but I'm really on a diversion here from fast-commit, and I
have limited time budget :(.

> It at least looks like the code you've written is only tested for when
> there is a single tree, and not when there are merged trees. I could
> be wrong, but it at least feels like we need to do something more.

> So at least, I would like you to:
>
> 1) Fix up the comments that are no longer correct

Done. And I've tweaked 'aliased iterator' to mention current_old
specifically.

> 2) Fix 'utf' => 'utf-8'

Done.

> 3) Make sure (at least manually) that things are correct when there is
> a merge tree around.

It is correct :).

The loop finds all the rows that are used by the merge trees; column 0
is set to 'r' in them all, and in the special case of there being - ah,
found a remaining conceptual bug. Will tweak, and I think it will make
it more obvious to you.

-Rob

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-07-16 08:00:03 +0000
+++ NEWS 2009-07-16 16:35:19 +0000
@@ -25,6 +25,7 @@
25* BranchBuilder now accepts timezone to avoid test failures in countries far25* BranchBuilder now accepts timezone to avoid test failures in countries far
26 from GMT. (Vincent Ladeuil, #397716)26 from GMT. (Vincent Ladeuil, #397716)
2727
28<<<<<<< TREE
28* ``bzr commit`` no longer saves the unversioning of missing files until29* ``bzr commit`` no longer saves the unversioning of missing files until
29 the commit has completed on the branch. This means that aborting a30 the commit has completed on the branch. This means that aborting a
30 commit that found a missing file will leave the tree unedited.31 commit that found a missing file will leave the tree unedited.
@@ -40,6 +41,12 @@
40 were present in a parent tree but renamed in the working tree.41 were present in a parent tree but renamed in the working tree.
41 (Robert Collins, #187207)42 (Robert Collins, #187207)
4243
44=======
45* Renames to lexographically lower basenames in trees that have never been
46 committed to will no longer corrupt the dirstate. This was caused by an
47 bug in the dirstate update_minimal method. (Robert Collins, #395556)
48
49>>>>>>> MERGE-SOURCE
43Improvements50Improvements
44************51************
4552
4653
=== modified file 'bzrlib/dirstate.py'
--- bzrlib/dirstate.py 2009-07-10 05:33:07 +0000
+++ bzrlib/dirstate.py 2009-07-16 16:35:19 +0000
@@ -203,6 +203,7 @@
203203
204import bisect204import bisect
205import binascii205import binascii
206from copy import deepcopy
206import errno207import errno
207import os208import os
208from stat import S_IEXEC209from stat import S_IEXEC
@@ -2422,6 +2423,9 @@
2422 if 'evil' in debug.debug_flags:2423 if 'evil' in debug.debug_flags:
2423 trace.mutter_callsite(1,2424 trace.mutter_callsite(1,
2424 "set_state_from_inventory called; please mutate the tree instead")2425 "set_state_from_inventory called; please mutate the tree instead")
2426 tracing = 'dirstate' in debug.debug_flags
2427 if tracing:
2428 trace.mutter("set_state_from_inventory trace:")
2425 self._read_dirblocks_if_needed()2429 self._read_dirblocks_if_needed()
2426 # sketch:2430 # sketch:
2427 # Two iterators: current data and new data, both in dirblock order.2431 # Two iterators: current data and new data, both in dirblock order.
@@ -2436,7 +2440,10 @@
2436 new_iterator = new_inv.iter_entries_by_dir()2440 new_iterator = new_inv.iter_entries_by_dir()
2437 # we will be modifying the dirstate, so we need a stable iterator. In2441 # we will be modifying the dirstate, so we need a stable iterator. In
2438 # future we might write one, for now we just clone the state into a2442 # future we might write one, for now we just clone the state into a
2439 # list - which is a shallow copy.2443 # list using a deep copy so that forward changes don't make the logic
2444 # more complex. Using a shallow copy results in all entries being seen
2445 # but the state of the entries being wrong, and that leads to stale
2446 # entries being left behind.
2440 old_iterator = iter(list(self._iter_entries()))2447 old_iterator = iter(list(self._iter_entries()))
2441 # both must have roots so this is safe:2448 # both must have roots so this is safe:
2442 current_new = new_iterator.next()2449 current_new = new_iterator.next()
@@ -2476,12 +2483,19 @@
2476 # we make both end conditions explicit2483 # we make both end conditions explicit
2477 if not current_old:2484 if not current_old:
2478 # old is finished: insert current_new into the state.2485 # old is finished: insert current_new into the state.
2486 if tracing:
2487 trace.mutter("Appending from new '%s'.",
2488 new_path_utf8.decode('utf8'))
2479 self.update_minimal(new_entry_key, current_new_minikind,2489 self.update_minimal(new_entry_key, current_new_minikind,
2480 executable=current_new[1].executable,2490 executable=current_new[1].executable,
2481 path_utf8=new_path_utf8, fingerprint=fingerprint)2491 path_utf8=new_path_utf8, fingerprint=fingerprint)
2482 current_new = advance(new_iterator)2492 current_new = advance(new_iterator)
2483 elif not current_new:2493 elif not current_new:
2484 # new is finished2494 # new is finished
2495 if tracing:
2496 trace.mutter("Truncating from old '%s/%s'.",
2497 current_old[0][0].decode('utf'),
2498 current_old[0][1].decode('utf8'))
2485 self._make_absent(current_old)2499 self._make_absent(current_old)
2486 current_old = advance(old_iterator)2500 current_old = advance(old_iterator)
2487 elif new_entry_key == current_old[0]:2501 elif new_entry_key == current_old[0]:
@@ -2494,6 +2508,9 @@
2494 # kind has changed.2508 # kind has changed.
2495 if (current_old[1][0][3] != current_new[1].executable or2509 if (current_old[1][0][3] != current_new[1].executable or
2496 current_old[1][0][0] != current_new_minikind):2510 current_old[1][0][0] != current_new_minikind):
2511 if tracing:
2512 trace.mutter("Updating in-place change '%s'.",
2513 new_path_utf8.decode('utf8'))
2497 self.update_minimal(current_old[0], current_new_minikind,2514 self.update_minimal(current_old[0], current_new_minikind,
2498 executable=current_new[1].executable,2515 executable=current_new[1].executable,
2499 path_utf8=new_path_utf8, fingerprint=fingerprint)2516 path_utf8=new_path_utf8, fingerprint=fingerprint)
@@ -2505,6 +2522,9 @@
2505 and new_entry_key[1:] < current_old[0][1:])):2522 and new_entry_key[1:] < current_old[0][1:])):
2506 # new comes before:2523 # new comes before:
2507 # add a entry for this and advance new2524 # add a entry for this and advance new
2525 if tracing:
2526 trace.mutter("Inserting from new '%s'.",
2527 new_path_utf8.decode('utf8'))
2508 self.update_minimal(new_entry_key, current_new_minikind,2528 self.update_minimal(new_entry_key, current_new_minikind,
2509 executable=current_new[1].executable,2529 executable=current_new[1].executable,
2510 path_utf8=new_path_utf8, fingerprint=fingerprint)2530 path_utf8=new_path_utf8, fingerprint=fingerprint)
@@ -2512,11 +2532,17 @@
2512 else:2532 else:
2513 # we've advanced past the place where the old key would be,2533 # we've advanced past the place where the old key would be,
2514 # without seeing it in the new list. so it must be gone.2534 # without seeing it in the new list. so it must be gone.
2535 if tracing:
2536 trace.mutter("Deleting from old '%s/%s'.",
2537 current_old[0][0].decode('utf'),
2538 current_old[0][1].decode('utf8'))
2515 self._make_absent(current_old)2539 self._make_absent(current_old)
2516 current_old = advance(old_iterator)2540 current_old = advance(old_iterator)
2517 self._dirblock_state = DirState.IN_MEMORY_MODIFIED2541 self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2518 self._id_index = None2542 self._id_index = None
2519 self._packed_stat_index = None2543 self._packed_stat_index = None
2544 if tracing:
2545 trace.mutter("set_state_from_inventory complete.")
25202546
2521 def _make_absent(self, current_old):2547 def _make_absent(self, current_old):
2522 """Mark current_old - an entry - as absent for tree 0.2548 """Mark current_old - an entry - as absent for tree 0.
@@ -2618,17 +2644,29 @@
2618 # relocations when updating an existing record but its not:2644 # relocations when updating an existing record but its not:
2619 # the test for existing kinds is different: this can be2645 # the test for existing kinds is different: this can be
2620 # factored out to a helper though.2646 # factored out to a helper though.
2621 other_block_index, present = self._find_block_index_from_key(other_key)2647 other_block_index, present = self._find_block_index_from_key(
2648 other_key)
2622 if not present:2649 if not present:
2623 raise AssertionError('could not find block for %s' % (other_key,))2650 raise AssertionError('could not find block for %s' % (
2651 other_key,))
2624 other_entry_index, present = self._find_entry_index(other_key,2652 other_entry_index, present = self._find_entry_index(other_key,
2625 self._dirblocks[other_block_index][1])2653 self._dirblocks[other_block_index][1])
2626 if not present:2654 if not present:
2627 raise AssertionError('could not find entry for %s' % (other_key,))2655 raise AssertionError('could not find entry for %s' % (
2656 other_key,))
2628 if path_utf8 is None:2657 if path_utf8 is None:
2629 raise AssertionError('no path')2658 raise AssertionError('no path')
2630 self._dirblocks[other_block_index][1][other_entry_index][1][0] = \2659 other_block = self._dirblocks[other_block_index][1]
2660 # Turn this other location into a reference to the new
2661 # location. This also updates the aliased iterator that
2662 # set_state_from_inventory uses so that the old entry, if
2663 # not already examined, is skipped over.
2664 other_block[other_entry_index][1][0] = \
2631 ('r', path_utf8, 0, False, '')2665 ('r', path_utf8, 0, False, '')
2666 if len(other_block[other_entry_index][1]) == 1:
2667 # We only have one tree in use, so an 'r' record is not
2668 # needed: remove it.
2669 other_block.pop(other_entry_index)
26322670
2633 num_present_parents = self._num_present_parents()2671 num_present_parents = self._num_present_parents()
2634 for lookup_index in xrange(1, num_present_parents + 1):2672 for lookup_index in xrange(1, num_present_parents + 1):
26352673
=== modified file 'bzrlib/help_topics/en/debug-flags.txt'
--- bzrlib/help_topics/en/debug-flags.txt 2009-07-06 09:47:35 +0000
+++ bzrlib/help_topics/en/debug-flags.txt 2009-07-16 16:35:19 +0000
@@ -5,6 +5,7 @@
5prefix) put in the ``debug_flags`` variable in ``bazaar.conf``.5prefix) put in the ``debug_flags`` variable in ``bazaar.conf``.
66
7-Dauth Trace authentication sections used.7-Dauth Trace authentication sections used.
8-Ddirstate Trace dirstate activity (verbose!)
8-Derror Instead of normal error handling, always print a traceback9-Derror Instead of normal error handling, always print a traceback
9 on error.10 on error.
10-Devil Capture call sites that do expensive or badly-scaling11-Devil Capture call sites that do expensive or badly-scaling
1112
=== modified file 'bzrlib/tests/blackbox/test_commit.py'
--- bzrlib/tests/blackbox/test_commit.py 2009-07-08 05:27:06 +0000
+++ bzrlib/tests/blackbox/test_commit.py 2009-07-16 16:35:20 +0000
@@ -639,25 +639,3 @@
639 out, err = self.run_bzr("commit tree/hello.txt")639 out, err = self.run_bzr("commit tree/hello.txt")
640 last_rev = tree.branch.repository.get_revision(tree.last_revision())640 last_rev = tree.branch.repository.get_revision(tree.last_revision())
641 self.assertEqual('save me some typing\n', last_rev.message)641 self.assertEqual('save me some typing\n', last_rev.message)
642
643 def test_commit_and_mv_dance_a(self):
644 # see https://bugs.launchpad.net/bzr/+bug/395556
645 tree = self.make_branch_and_tree(".")
646 self.build_tree(["a"])
647 tree.add("a")
648 self.check_output("a => b\n", ["mv", "a", "b"])
649 self.check_output("", ["commit", "-q", "-m", "Actually no, b"])
650 self.check_output("b => a\n", ["mv", "b", "a"])
651 self.check_output("", ["commit", "-q", "-m", "No, really, a"])
652
653 def test_commit_and_mv_dance_b(self):
654 # see https://bugs.launchpad.net/bzr/+bug/395556
655 tree = self.make_branch_and_tree(".")
656 self.build_tree(["b"])
657 tree.add("b")
658 self.check_output("b => a\n", ["mv", "b", "a"])
659 self.check_output("", ["commit", "-q", "-m", "Actually no, a"])
660 self.check_output("a => b\n", ["mv", "a", "b"])
661 self.expectFailure("bug 395556: gives DuplicateFileId "
662 "committing renames",
663 self.check_output, "", ["commit", "-q", "-m", "No, really, b"])
664642
=== modified file 'bzrlib/tests/per_workingtree/test_commit.py'
--- bzrlib/tests/per_workingtree/test_commit.py 2009-07-15 05:54:37 +0000
+++ bzrlib/tests/per_workingtree/test_commit.py 2009-07-16 16:35:20 +0000
@@ -611,29 +611,3 @@
611 revid = tree.commit('first post')611 revid = tree.commit('first post')
612 committed_tree = tree.basis_tree()612 committed_tree = tree.basis_tree()
613 self.assertTrue(committed_tree.has_filename("newfile"))613 self.assertTrue(committed_tree.has_filename("newfile"))
614
615 def test_commit_and_mv_dance_a(self):
616 # should fail because of
617 # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does
618 # not, while the blackbox.test_commit equivalent does - maybe because
619 # of different format combinations
620 tree = self.make_branch_and_tree(".")
621 self.build_tree(["a"])
622 tree.add("a")
623 tree.rename_one("a", "b")
624 tree.commit("Actually no, b")
625 tree.rename_one("b", "a")
626 tree.commit("No, really, a")
627
628 def test_commit_and_mv_dance_b(self):
629 # should fail because of
630 # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does
631 # not, while the blackbox.test_commit equivalent does - maybe because
632 # of different format combinations
633 tree = self.make_branch_and_tree(".")
634 self.build_tree(["b"])
635 tree.add("b")
636 tree.rename_one("b", "a")
637 tree.commit("Actually no, a")
638 tree.rename_one("a", "b")
639 tree.commit("No, really, b")
640614
=== modified file 'bzrlib/tests/test_dirstate.py'
--- bzrlib/tests/test_dirstate.py 2009-05-06 05:36:28 +0000
+++ bzrlib/tests/test_dirstate.py 2009-07-16 16:35:20 +0000
@@ -827,7 +827,6 @@
827 finally:827 finally:
828 tree.unlock()828 tree.unlock()
829829
830
831 def test_set_state_from_inventory_mixed_paths(self):830 def test_set_state_from_inventory_mixed_paths(self):
832 tree1 = self.make_branch_and_tree('tree1')831 tree1 = self.make_branch_and_tree('tree1')
833 self.build_tree(['tree1/a/', 'tree1/a/b/', 'tree1/a-b/',832 self.build_tree(['tree1/a/', 'tree1/a/b/', 'tree1/a-b/',
@@ -942,7 +941,6 @@
942 finally:941 finally:
943 state.unlock()942 state.unlock()
944943
945
946 def test_set_parent_trees_no_content(self):944 def test_set_parent_trees_no_content(self):
947 # set_parent_trees is a slow but important api to support.945 # set_parent_trees is a slow but important api to support.
948 tree1 = self.make_branch_and_memory_tree('tree1')946 tree1 = self.make_branch_and_memory_tree('tree1')
@@ -1238,6 +1236,38 @@
1238 self.assertRaises(errors.BzrError,1236 self.assertRaises(errors.BzrError,
1239 state.add, '..', 'ass-id', 'directory', None, None)1237 state.add, '..', 'ass-id', 'directory', None, None)
12401238
1239 def test_set_state_with_rename_b_a_bug_395556(self):
1240 # bug 395556 uncovered a bug where the dirstate ends up with a false
1241 # relocation record - in a tree with no parents there should be no
1242 # absent or relocated records. This then leads to further corruption
1243 # when a commit occurs, as the incorrect relocation gathers an
1244 # incorrect absent in tree 1, and future changes go to pot.
1245 tree1 = self.make_branch_and_tree('tree1')
1246 self.build_tree(['tree1/b'])
1247 tree1.lock_write()
1248 try:
1249 tree1.add(['b'], ['b-id'])
1250 root_id = tree1.get_root_id()
1251 inv = tree1.inventory
1252 state = dirstate.DirState.initialize('dirstate')
1253 try:
1254 # Set the initial state with 'b'
1255 state.set_state_from_inventory(inv)
1256 inv.rename('b-id', root_id, 'a')
1257 # Set the new state with 'a', which currently corrupts.
1258 state.set_state_from_inventory(inv)
1259 expected_result1 = [('', '', root_id, 'd'),
1260 ('', 'a', 'b-id', 'f'),
1261 ]
1262 values = []
1263 for entry in state._iter_entries():
1264 values.append(entry[0] + entry[1][0][:1])
1265 self.assertEqual(expected_result1, values)
1266 finally:
1267 state.unlock()
1268 finally:
1269 tree1.unlock()
1270
12411271
1242class TestGetLines(TestCaseWithDirState):1272class TestGetLines(TestCaseWithDirState):
12431273