Merge lp:~lifeless/bzr/bug-395556 into lp:~bzr/bzr/trunk-old
- bug-395556
- Merge into trunk-old
Status: | Merged |
---|---|
Merged at revision: | not available |
Proposed branch: | lp:~lifeless/bzr/bug-395556 |
Merge into: | lp:~bzr/bzr/trunk-old |
Diff against target: |
284 lines (has conflicts)
Text conflict in NEWS |
To merge this branch: | bzr merge lp:~lifeless/bzr/bug-395556 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
John A Meinel | Needs Fixing | ||
Review via email: mp+8857@code.launchpad.net |
This proposal supersedes a proposal from 2009-07-15.
Commit message
Description of the change
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal | # |
This certainly seems to be the simple and straightforward fix. I believe a "better" fix (faster, but more complex) would be to copy the objects when we want to mutate them, rather than trying to mutate-in-place.
Considering we are trying to get away from mutate-in-place anyway (CHKInventory doesn't support apply_delta anymore, etc.)
Anyway, we might want a comment about performance if we see a regression from this. ISTR that Inventory.copy() time can easily became a dominant factor in a lot of operations. But we can certainly start here.
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
> Anyway, we might want a comment about performance if we see a regression from
> this. ISTR that Inventory.copy() time can easily became a dominant factor in a
> lot of operations. But we can certainly start here.
Yah, copy() is bad ;). That said, we're copying tuples lists and strings not complex objects, so there is some reason to believe it will be lighter weight.
Robert Collins (lifeless) wrote : | # |
This branch has been updated; the first fix was flawed.
This one just changes the update_minimal to know how to remove a totally moved entry. I also added some tracing support for debugging future things of this ilk.
John A Meinel (jameinel) wrote : | # |
As near as I can tell, you don't use deepcopy anymore so you don't need to import it
+from copy import deepcopy
and you need to revert the comment change:
41 - # list - which is a shallow copy.
42 + # list using a deep copy so that forward changes don't make the logic
43 + # more complex. Using a shallow copy results in all entries being seen
44 + # but the state of the entries being wrong, and that leads to stale
45 + # entries being left behind.
62 + if tracing:
63 + trace.mutter(
64 + current_
65 + current_
I don't know what encoding "utf" is, but I'd be really surprised if that works.
same here:
93 + if tracing:
94 + trace.mutter(
95 + current_
96 + current_
Perhaps a smoke test that you can set tracing on and have it not abort, though to really test that requires exercising all the code paths, which seems a bit much.
115 - raise AssertionError(
116 + raise AssertionError(
117 + other_key,))
118 other_entry_index, present = self._find_
119 self._dirblocks
120 if not present:
121 - raise AssertionError(
122 + raise AssertionError(
123 + other_key,))
124 if path_utf8 is None:
125 raise AssertionError('no path')
126 - self._dirblocks
127 + other_block = self._dirblocks
move the "other_block = self._dirblocks
...
126 - self._dirblocks
127 + other_block = self._dirblocks
128 + # Turn this other location into a reference to the new
129 + # location. This also updates the aliased iterator that
130 + # set_state_
131 + # not already examined, is skipped over.
132 + other_block[
133 ('r', path_utf8, 0, False, '')
134 + if len(other_
135 + # We only have one tree in use, so an 'r' record is not
136 + # needed: remove it.
137 + other_block.
^- Why do you bother setting it, if you are just going to pop it out again right after?
The code comment here also is saying "updates the aliased iterator" but I don't see any such thing.
It at least looks like the code you've written is only tested for when there is a single tree, and not when there are merged trees. I could be wrong, but it at least feels like we need to do something more.
So at least, I would like you to:
1) Fix up the comments that are no longer correct
2) Fix 'utf' => 'utf-8'
3) Make sure (at least manually) that things are correct when there is a merge tree around.
Robert Collins (lifeless) wrote : | # |
On Thu, 2009-07-16 at 21:43 +0000, John A Meinel wrote:
> 131 + # not already examined, is skipped over.
> 132 + other_block[
> 133 ('r', path_utf8, 0, False, '')
> 134 + if len(other_
> 135 + # We only have one tree in use, so an 'r' record is not
> 136 + # needed: remove it.
> 137 + other_block.
>
> ^- Why do you bother setting it, if you are just going to pop it out
> again right after?
Because:
> The code comment here also is saying "updates the aliased iterator"
> but I don't see any such thing.
old_iterator - the iterator over the shallow copy will see the entry
tuples, and thus needs to see the 'r' record, to skip over it. If it
doesn't see the 'r' record, it changes it to 'a' in the row, which
corrupts multi column dirstates. Auditing the full logic again may make
it cleaner, but I'm really on a diversion here from fast-commit, and I
have limited time budget :(.
> It at least looks like the code you've written is only tested for when
> there is a single tree, and not when there are merged trees. I could
> be wrong, but it at least feels like we need to do something more.
> So at least, I would like you to:
>
> 1) Fix up the comments that are no longer correct
Done. And I've tweaked 'aliased iterator' to mention current_old
specifically.
> 2) Fix 'utf' => 'utf-8'
Done.
> 3) Make sure (at least manually) that things are correct when there is
> a merge tree around.
It is correct :).
The loop finds all the rows that are used by the merge trees; column 0
is set to 'r' in them all, and in the special case of there being - ah,
found a remaining conceptual bug. Will tweak, and I think it will make
it more obvious to you.
-Rob
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-07-16 08:00:03 +0000 |
3 | +++ NEWS 2009-07-16 16:35:19 +0000 |
4 | @@ -25,6 +25,7 @@ |
5 | * BranchBuilder now accepts timezone to avoid test failures in countries far |
6 | from GMT. (Vincent Ladeuil, #397716) |
7 | |
8 | +<<<<<<< TREE |
9 | * ``bzr commit`` no longer saves the unversioning of missing files until |
10 | the commit has completed on the branch. This means that aborting a |
11 | commit that found a missing file will leave the tree unedited. |
12 | @@ -40,6 +41,12 @@ |
13 | were present in a parent tree but renamed in the working tree. |
14 | (Robert Collins, #187207) |
15 | |
16 | +======= |
17 | +* Renames to lexographically lower basenames in trees that have never been |
18 | + committed to will no longer corrupt the dirstate. This was caused by an |
19 | + bug in the dirstate update_minimal method. (Robert Collins, #395556) |
20 | + |
21 | +>>>>>>> MERGE-SOURCE |
22 | Improvements |
23 | ************ |
24 | |
25 | |
26 | === modified file 'bzrlib/dirstate.py' |
27 | --- bzrlib/dirstate.py 2009-07-10 05:33:07 +0000 |
28 | +++ bzrlib/dirstate.py 2009-07-16 16:35:19 +0000 |
29 | @@ -203,6 +203,7 @@ |
30 | |
31 | import bisect |
32 | import binascii |
33 | +from copy import deepcopy |
34 | import errno |
35 | import os |
36 | from stat import S_IEXEC |
37 | @@ -2422,6 +2423,9 @@ |
38 | if 'evil' in debug.debug_flags: |
39 | trace.mutter_callsite(1, |
40 | "set_state_from_inventory called; please mutate the tree instead") |
41 | + tracing = 'dirstate' in debug.debug_flags |
42 | + if tracing: |
43 | + trace.mutter("set_state_from_inventory trace:") |
44 | self._read_dirblocks_if_needed() |
45 | # sketch: |
46 | # Two iterators: current data and new data, both in dirblock order. |
47 | @@ -2436,7 +2440,10 @@ |
48 | new_iterator = new_inv.iter_entries_by_dir() |
49 | # we will be modifying the dirstate, so we need a stable iterator. In |
50 | # future we might write one, for now we just clone the state into a |
51 | - # list - which is a shallow copy. |
52 | + # list using a deep copy so that forward changes don't make the logic |
53 | + # more complex. Using a shallow copy results in all entries being seen |
54 | + # but the state of the entries being wrong, and that leads to stale |
55 | + # entries being left behind. |
56 | old_iterator = iter(list(self._iter_entries())) |
57 | # both must have roots so this is safe: |
58 | current_new = new_iterator.next() |
59 | @@ -2476,12 +2483,19 @@ |
60 | # we make both end conditions explicit |
61 | if not current_old: |
62 | # old is finished: insert current_new into the state. |
63 | + if tracing: |
64 | + trace.mutter("Appending from new '%s'.", |
65 | + new_path_utf8.decode('utf8')) |
66 | self.update_minimal(new_entry_key, current_new_minikind, |
67 | executable=current_new[1].executable, |
68 | path_utf8=new_path_utf8, fingerprint=fingerprint) |
69 | current_new = advance(new_iterator) |
70 | elif not current_new: |
71 | # new is finished |
72 | + if tracing: |
73 | + trace.mutter("Truncating from old '%s/%s'.", |
74 | + current_old[0][0].decode('utf'), |
75 | + current_old[0][1].decode('utf8')) |
76 | self._make_absent(current_old) |
77 | current_old = advance(old_iterator) |
78 | elif new_entry_key == current_old[0]: |
79 | @@ -2494,6 +2508,9 @@ |
80 | # kind has changed. |
81 | if (current_old[1][0][3] != current_new[1].executable or |
82 | current_old[1][0][0] != current_new_minikind): |
83 | + if tracing: |
84 | + trace.mutter("Updating in-place change '%s'.", |
85 | + new_path_utf8.decode('utf8')) |
86 | self.update_minimal(current_old[0], current_new_minikind, |
87 | executable=current_new[1].executable, |
88 | path_utf8=new_path_utf8, fingerprint=fingerprint) |
89 | @@ -2505,6 +2522,9 @@ |
90 | and new_entry_key[1:] < current_old[0][1:])): |
91 | # new comes before: |
92 | # add a entry for this and advance new |
93 | + if tracing: |
94 | + trace.mutter("Inserting from new '%s'.", |
95 | + new_path_utf8.decode('utf8')) |
96 | self.update_minimal(new_entry_key, current_new_minikind, |
97 | executable=current_new[1].executable, |
98 | path_utf8=new_path_utf8, fingerprint=fingerprint) |
99 | @@ -2512,11 +2532,17 @@ |
100 | else: |
101 | # we've advanced past the place where the old key would be, |
102 | # without seeing it in the new list. so it must be gone. |
103 | + if tracing: |
104 | + trace.mutter("Deleting from old '%s/%s'.", |
105 | + current_old[0][0].decode('utf'), |
106 | + current_old[0][1].decode('utf8')) |
107 | self._make_absent(current_old) |
108 | current_old = advance(old_iterator) |
109 | self._dirblock_state = DirState.IN_MEMORY_MODIFIED |
110 | self._id_index = None |
111 | self._packed_stat_index = None |
112 | + if tracing: |
113 | + trace.mutter("set_state_from_inventory complete.") |
114 | |
115 | def _make_absent(self, current_old): |
116 | """Mark current_old - an entry - as absent for tree 0. |
117 | @@ -2618,17 +2644,29 @@ |
118 | # relocations when updating an existing record but its not: |
119 | # the test for existing kinds is different: this can be |
120 | # factored out to a helper though. |
121 | - other_block_index, present = self._find_block_index_from_key(other_key) |
122 | + other_block_index, present = self._find_block_index_from_key( |
123 | + other_key) |
124 | if not present: |
125 | - raise AssertionError('could not find block for %s' % (other_key,)) |
126 | + raise AssertionError('could not find block for %s' % ( |
127 | + other_key,)) |
128 | other_entry_index, present = self._find_entry_index(other_key, |
129 | self._dirblocks[other_block_index][1]) |
130 | if not present: |
131 | - raise AssertionError('could not find entry for %s' % (other_key,)) |
132 | + raise AssertionError('could not find entry for %s' % ( |
133 | + other_key,)) |
134 | if path_utf8 is None: |
135 | raise AssertionError('no path') |
136 | - self._dirblocks[other_block_index][1][other_entry_index][1][0] = \ |
137 | + other_block = self._dirblocks[other_block_index][1] |
138 | + # Turn this other location into a reference to the new |
139 | + # location. This also updates the aliased iterator that |
140 | + # set_state_from_inventory uses so that the old entry, if |
141 | + # not already examined, is skipped over. |
142 | + other_block[other_entry_index][1][0] = \ |
143 | ('r', path_utf8, 0, False, '') |
144 | + if len(other_block[other_entry_index][1]) == 1: |
145 | + # We only have one tree in use, so an 'r' record is not |
146 | + # needed: remove it. |
147 | + other_block.pop(other_entry_index) |
148 | |
149 | num_present_parents = self._num_present_parents() |
150 | for lookup_index in xrange(1, num_present_parents + 1): |
151 | |
152 | === modified file 'bzrlib/help_topics/en/debug-flags.txt' |
153 | --- bzrlib/help_topics/en/debug-flags.txt 2009-07-06 09:47:35 +0000 |
154 | +++ bzrlib/help_topics/en/debug-flags.txt 2009-07-16 16:35:19 +0000 |
155 | @@ -5,6 +5,7 @@ |
156 | prefix) put in the ``debug_flags`` variable in ``bazaar.conf``. |
157 | |
158 | -Dauth Trace authentication sections used. |
159 | +-Ddirstate Trace dirstate activity (verbose!) |
160 | -Derror Instead of normal error handling, always print a traceback |
161 | on error. |
162 | -Devil Capture call sites that do expensive or badly-scaling |
163 | |
164 | === modified file 'bzrlib/tests/blackbox/test_commit.py' |
165 | --- bzrlib/tests/blackbox/test_commit.py 2009-07-08 05:27:06 +0000 |
166 | +++ bzrlib/tests/blackbox/test_commit.py 2009-07-16 16:35:20 +0000 |
167 | @@ -639,25 +639,3 @@ |
168 | out, err = self.run_bzr("commit tree/hello.txt") |
169 | last_rev = tree.branch.repository.get_revision(tree.last_revision()) |
170 | self.assertEqual('save me some typing\n', last_rev.message) |
171 | - |
172 | - def test_commit_and_mv_dance_a(self): |
173 | - # see https://bugs.launchpad.net/bzr/+bug/395556 |
174 | - tree = self.make_branch_and_tree(".") |
175 | - self.build_tree(["a"]) |
176 | - tree.add("a") |
177 | - self.check_output("a => b\n", ["mv", "a", "b"]) |
178 | - self.check_output("", ["commit", "-q", "-m", "Actually no, b"]) |
179 | - self.check_output("b => a\n", ["mv", "b", "a"]) |
180 | - self.check_output("", ["commit", "-q", "-m", "No, really, a"]) |
181 | - |
182 | - def test_commit_and_mv_dance_b(self): |
183 | - # see https://bugs.launchpad.net/bzr/+bug/395556 |
184 | - tree = self.make_branch_and_tree(".") |
185 | - self.build_tree(["b"]) |
186 | - tree.add("b") |
187 | - self.check_output("b => a\n", ["mv", "b", "a"]) |
188 | - self.check_output("", ["commit", "-q", "-m", "Actually no, a"]) |
189 | - self.check_output("a => b\n", ["mv", "a", "b"]) |
190 | - self.expectFailure("bug 395556: gives DuplicateFileId " |
191 | - "committing renames", |
192 | - self.check_output, "", ["commit", "-q", "-m", "No, really, b"]) |
193 | |
194 | === modified file 'bzrlib/tests/per_workingtree/test_commit.py' |
195 | --- bzrlib/tests/per_workingtree/test_commit.py 2009-07-15 05:54:37 +0000 |
196 | +++ bzrlib/tests/per_workingtree/test_commit.py 2009-07-16 16:35:20 +0000 |
197 | @@ -611,29 +611,3 @@ |
198 | revid = tree.commit('first post') |
199 | committed_tree = tree.basis_tree() |
200 | self.assertTrue(committed_tree.has_filename("newfile")) |
201 | - |
202 | - def test_commit_and_mv_dance_a(self): |
203 | - # should fail because of |
204 | - # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does |
205 | - # not, while the blackbox.test_commit equivalent does - maybe because |
206 | - # of different format combinations |
207 | - tree = self.make_branch_and_tree(".") |
208 | - self.build_tree(["a"]) |
209 | - tree.add("a") |
210 | - tree.rename_one("a", "b") |
211 | - tree.commit("Actually no, b") |
212 | - tree.rename_one("b", "a") |
213 | - tree.commit("No, really, a") |
214 | - |
215 | - def test_commit_and_mv_dance_b(self): |
216 | - # should fail because of |
217 | - # <https://bugs.launchpad.net/bzr/+bug/395556> but apparently does |
218 | - # not, while the blackbox.test_commit equivalent does - maybe because |
219 | - # of different format combinations |
220 | - tree = self.make_branch_and_tree(".") |
221 | - self.build_tree(["b"]) |
222 | - tree.add("b") |
223 | - tree.rename_one("b", "a") |
224 | - tree.commit("Actually no, a") |
225 | - tree.rename_one("a", "b") |
226 | - tree.commit("No, really, b") |
227 | |
228 | === modified file 'bzrlib/tests/test_dirstate.py' |
229 | --- bzrlib/tests/test_dirstate.py 2009-05-06 05:36:28 +0000 |
230 | +++ bzrlib/tests/test_dirstate.py 2009-07-16 16:35:20 +0000 |
231 | @@ -827,7 +827,6 @@ |
232 | finally: |
233 | tree.unlock() |
234 | |
235 | - |
236 | def test_set_state_from_inventory_mixed_paths(self): |
237 | tree1 = self.make_branch_and_tree('tree1') |
238 | self.build_tree(['tree1/a/', 'tree1/a/b/', 'tree1/a-b/', |
239 | @@ -942,7 +941,6 @@ |
240 | finally: |
241 | state.unlock() |
242 | |
243 | - |
244 | def test_set_parent_trees_no_content(self): |
245 | # set_parent_trees is a slow but important api to support. |
246 | tree1 = self.make_branch_and_memory_tree('tree1') |
247 | @@ -1238,6 +1236,38 @@ |
248 | self.assertRaises(errors.BzrError, |
249 | state.add, '..', 'ass-id', 'directory', None, None) |
250 | |
251 | + def test_set_state_with_rename_b_a_bug_395556(self): |
252 | + # bug 395556 uncovered a bug where the dirstate ends up with a false |
253 | + # relocation record - in a tree with no parents there should be no |
254 | + # absent or relocated records. This then leads to further corruption |
255 | + # when a commit occurs, as the incorrect relocation gathers an |
256 | + # incorrect absent in tree 1, and future changes go to pot. |
257 | + tree1 = self.make_branch_and_tree('tree1') |
258 | + self.build_tree(['tree1/b']) |
259 | + tree1.lock_write() |
260 | + try: |
261 | + tree1.add(['b'], ['b-id']) |
262 | + root_id = tree1.get_root_id() |
263 | + inv = tree1.inventory |
264 | + state = dirstate.DirState.initialize('dirstate') |
265 | + try: |
266 | + # Set the initial state with 'b' |
267 | + state.set_state_from_inventory(inv) |
268 | + inv.rename('b-id', root_id, 'a') |
269 | + # Set the new state with 'a', which currently corrupts. |
270 | + state.set_state_from_inventory(inv) |
271 | + expected_result1 = [('', '', root_id, 'd'), |
272 | + ('', 'a', 'b-id', 'f'), |
273 | + ] |
274 | + values = [] |
275 | + for entry in state._iter_entries(): |
276 | + values.append(entry[0] + entry[1][0][:1]) |
277 | + self.assertEqual(expected_result1, values) |
278 | + finally: |
279 | + state.unlock() |
280 | + finally: |
281 | + tree1.unlock() |
282 | + |
283 | |
284 | class TestGetLines(TestCaseWithDirState): |
285 |
Fix dirstate. set_state_ from_inventory to have a more stable
old_iterator, fixing rename problems in new trees. (Robert Collins, bug
395556)
I've removed some tests that helped to diagnose the problem but had
nothing to do with where or how it was occuring; I don't think they add
value to the test environment - the particular way they tickled the bug,
though interesting, is not demonstrating an interface problem, and is a
particularly expensive way of showing that lower levels aren't broken.
-Rob