Merge lp:~spiv/bzr/insert-stream-check-chk-root into lp:bzr/2.0
Proposed by: Andrew Bennetts
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/insert-stream-check-chk-root
Merge into: lp:bzr/2.0
Diff against target: 408 lines
To merge this branch: bzr merge lp:~spiv/bzr/insert-stream-check-chk-root
Reviewer: Robert Collins (community), Approve
Review via email: mp+11033@code.launchpad.net
Andrew Bennetts (spiv) wrote:

This is a partial fix for bug 406687, and an incremental step towards a full fix. It adds a check to RepositoryPackCollection._commit_write_group that verifies that every revision added by the write group has a corresponding inventory, and that the chk root entries for those inventories, and for any present parent inventories, are also present; presence in a fallback repository does not count. Ghost parent inventories are still allowed. This check is sufficient to catch at least some bugs we have had in the past where, in some cases, no chk records were transferred at all.

(The full fix, which I am working on, will involve checking that all relevant chk records are present, and that the text versions named by those records are also present.)

I experimented with not requiring chk root records in a stacked repository for inventories that have not changed, because in principle they aren't required (comparing the key names is enough to show that the delta between those revisions is empty). However, bzr currently fails on such a repository, so permitting this would produce data incompatible with e.g. 2.0rc1, which I think is undesirable.

I am a little worried about the possible performance impact of this change. I don't think the code is particularly wasteful (except perhaps in holding collections of keys that are larger or longer-lived than strictly necessary), but it is fundamentally more work to perform on every 2a repository write.
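Below is a minimal usage sketch of the reworked _KeyRefs API that this branch introduces in bzrlib/knit.py, showing how a write group learns which revision keys it added. The method names are taken from the knit.py hunk in the diff; the revision keys are purely illustrative.

    from bzrlib import knit

    # Track unsatisfied parent refs, and remember every key added in this
    # write group.
    refs = knit._KeyRefs(track_new_keys=True)

    # A record for 'rev-2' arrives, naming 'rev-1' as a parent not seen yet.
    # add_references() ends by calling add_key(), so ('rev-2',) is also
    # recorded as a new key.
    refs.add_references(('rev-2',), [('rev-1',)])

    # Keys later found to be already present merely satisfy refs; they are
    # not counted as new.
    refs.satisfy_refs_for_keys([('rev-1',)])

    print refs.get_new_keys()    # set([('rev-2',)]): this write group's keys
    refs.clear()                 # reset on commit, abort or suspend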
Robert Collins (lifeless) wrote:
tweak
I think it would be clearer to use method overriding to handle pack vs. chk rather than a check on self in the pack code: the bulk of the method is chk-specific.
[though pack repos want their inventories checked too, so that suggests either duplicated or missing coverage]
Review: Approve
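For reference, a minimal sketch of the overriding approach suggested above, reusing the existing RepositoryPackCollection and GCRepositoryPackCollection class names from bzrlib; the split shown is hypothetical and is not what this branch implements (the diff below keeps a single method guarded by a getattr check for chk_bytes).

    class RepositoryPackCollection(object):

        def _check_new_inventories(self):
            # Plain pack repositories have no chk structures to verify.
            # (Per the bracketed remark above, they may still want their new
            # inventories checked, which this sketch leaves uncovered.)
            return set()


    class GCRepositoryPackCollection(RepositoryPackCollection):

        def _check_new_inventories(self):
            # The chk-specific body from the pack_repo.py hunk below would
            # move here: map each new revision key to its inventory, then
            # verify the chk root entries for those inventories and for any
            # present parent inventories, ignoring fallback repositories.
            missing = set()
            # ... chk-specific checks as in the diff ...
            return missing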
Preview Diff
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py	2009-08-26 16:47:51 +0000
+++ bzrlib/groupcompress.py	2009-09-02 03:35:25 +0000
@@ -1714,7 +1714,7 @@
 
     def __init__(self, graph_index, is_locked, parents=True,
         add_callback=None, track_external_parent_refs=False,
-        inconsistency_fatal=True):
+        inconsistency_fatal=True, track_new_keys=False):
         """Construct a _GCGraphIndex on a graph_index.
 
         :param graph_index: An implementation of bzrlib.index.GraphIndex.
@@ -1740,7 +1740,8 @@
         self._is_locked = is_locked
         self._inconsistency_fatal = inconsistency_fatal
         if track_external_parent_refs:
-            self._key_dependencies = knit._KeyRefs()
+            self._key_dependencies = knit._KeyRefs(
+                track_new_keys=track_new_keys)
         else:
             self._key_dependencies = None
 
@@ -1800,10 +1801,14 @@
                     result.append((key, value))
             records = result
         key_dependencies = self._key_dependencies
-        if key_dependencies is not None and self._parents:
-            for key, value, refs in records:
-                parents = refs[0]
-                key_dependencies.add_references(key, parents)
+        if key_dependencies is not None:
+            if self._parents:
+                for key, value, refs in records:
+                    parents = refs[0]
+                    key_dependencies.add_references(key, parents)
+            else:
+                for key, value, refs in records:
+                    new_keys.add_key(key)
         self._add_callback(records)
 
     def _check_read(self):
@@ -1866,7 +1871,7 @@
         """Return the keys of missing parents."""
         # Copied from _KnitGraphIndex.get_missing_parents
         # We may have false positives, so filter those out.
-        self._key_dependencies.add_keys(
+        self._key_dependencies.satisfy_refs_for_keys(
             self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
         return frozenset(self._key_dependencies.get_unsatisfied_refs())
 
@@ -1926,17 +1931,17 @@
 
         This allows this _GCGraphIndex to keep track of any missing
         compression parents we may want to have filled in to make those
-        indices valid.
+        indices valid. It also allows _GCGraphIndex to track any new keys.
 
         :param graph_index: A GraphIndex
         """
-        if self._key_dependencies is not None:
-            # Add parent refs from graph_index (and discard parent refs that
-            # the graph_index has).
-            add_refs = self._key_dependencies.add_references
-            for node in graph_index.iter_all_entries():
-                add_refs(node[1], node[3][0])
+        key_dependencies = self._key_dependencies
+        if key_dependencies is None:
+            return
+        for node in graph_index.iter_all_entries():
+            # Add parent refs from graph_index (and discard parent refs
+            # that the graph_index has).
+            key_dependencies.add_references(node[1], node[3][0])
 
 
 from bzrlib._groupcompress_py import (
=== modified file 'bzrlib/knit.py'
--- bzrlib/knit.py	2009-08-26 16:44:27 +0000
+++ bzrlib/knit.py	2009-09-02 03:35:25 +0000
@@ -2777,9 +2777,19 @@
 
 class _KeyRefs(object):
 
-    def __init__(self):
+    def __init__(self, track_new_keys=False):
         # dict mapping 'key' to 'set of keys referring to that key'
         self.refs = {}
+        if track_new_keys:
+            self.new_keys = set()
+        else:
+            self.new_keys = None
+
+    def clear(self):
+        if self.refs:
+            self.refs.clear()
+        if self.new_keys:
+            self.new_keys.clear()
 
     def add_references(self, key, refs):
         # Record the new references
@@ -2792,19 +2802,28 @@
         # Discard references satisfied by the new key
         self.add_key(key)
 
+    def get_new_keys(self):
+        return self.new_keys
+
     def get_unsatisfied_refs(self):
         return self.refs.iterkeys()
 
-    def add_key(self, key):
+    def _satisfy_refs_for_key(self, key):
         try:
             del self.refs[key]
         except KeyError:
             # No keys depended on this key. That's ok.
             pass
 
-    def add_keys(self, keys):
+    def add_key(self, key):
+        # satisfy refs for key, and remember that we've seen this key.
+        self._satisfy_refs_for_key(key)
+        if self.new_keys is not None:
+            self.new_keys.add(key)
+
+    def satisfy_refs_for_keys(self, keys):
         for key in keys:
-            self.add_key(key)
+            self._satisfy_refs_for_key(key)
 
     def get_referrers(self):
         result = set()
@@ -2972,7 +2991,7 @@
         # If updating this, you should also update
         # groupcompress._GCGraphIndex.get_missing_parents
         # We may have false positives, so filter those out.
-        self._key_dependencies.add_keys(
+        self._key_dependencies.satisfy_refs_for_keys(
             self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
         return frozenset(self._key_dependencies.get_unsatisfied_refs())
 
 
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py	2009-08-24 19:34:13 +0000
+++ bzrlib/repofmt/groupcompress_repo.py	2009-09-02 03:35:25 +0000
@@ -651,7 +651,7 @@
             _GCGraphIndex(self._pack_collection.revision_index.combined_index,
                 add_callback=self._pack_collection.revision_index.add_callback,
                 parents=True, is_locked=self.is_locked,
-                track_external_parent_refs=True),
+                track_external_parent_refs=True, track_new_keys=True),
             access=self._pack_collection.revision_index.data_access,
             delta=False)
         self.signatures = GroupCompressVersionedFiles(
 
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py	2009-08-14 11:11:29 +0000
+++ bzrlib/repofmt/pack_repo.py	2009-09-02 03:35:25 +0000
@@ -2063,6 +2063,53 @@
             self._remove_pack_indices(resumed_pack)
         del self._resumed_packs[:]
 
+    def _check_new_inventories(self):
+        """Detect missing inventories or chk root entries for the new revisions
+        in this write group.
+
+        :returns: set of missing keys. Note that not every missing key is
+            guaranteed to be reported.
+        """
+        if getattr(self.repo, 'chk_bytes', None) is None:
+            return set()
+        # Ensure that all revisions added in this write group have:
+        #   - corresponding inventories,
+        #   - chk root entries for those inventories,
+        #   - and any present parent inventories have their chk root
+        #     entries too.
+        # And all this should be independent of any fallback repository.
+        key_deps = self.repo.revisions._index._key_dependencies
+        new_revisions_keys = key_deps.get_new_keys()
+        no_fallback_inv_index = self.repo.inventories._index
+        no_fallback_chk_bytes_index = self.repo.chk_bytes._index
+        inv_parent_map = no_fallback_inv_index.get_parent_map(
+            new_revisions_keys)
+        # Are any inventories for corresponding to the new revisions missing?
+        corresponding_invs = set(inv_parent_map)
+        missing_corresponding = set(new_revisions_keys)
+        missing_corresponding.difference_update(corresponding_invs)
+        if missing_corresponding:
+            return [('inventories', key) for key in missing_corresponding]
+        # Are any chk root entries missing for any inventories?  This includes
+        # any present parent inventories, which may be used when calculating
+        # deltas for streaming.
+        all_inv_keys = set(corresponding_invs)
+        for parent_inv_keys in inv_parent_map.itervalues():
+            all_inv_keys.update(parent_inv_keys)
+        # Filter out ghost parents.
+        all_inv_keys.intersection_update(
+            no_fallback_inv_index.get_parent_map(all_inv_keys))
+        all_missing = set()
+        inv_ids = [key[-1] for key in all_inv_keys]
+        for inv in self.repo.iter_inventories(inv_ids, 'unordered'):
+            root_keys = set([inv.id_to_entry.key()])
+            if inv.parent_id_basename_to_file_id is not None:
+                root_keys.add(inv.parent_id_basename_to_file_id.key())
+            present = no_fallback_chk_bytes_index.get_parent_map(root_keys)
+            missing = root_keys.difference(present)
+            all_missing.update([('chk_bytes',) + key for key in missing])
+        return all_missing
+
     def _commit_write_group(self):
         all_missing = set()
         for prefix, versioned_file in (
@@ -2073,6 +2120,7 @@
             ):
             missing = versioned_file.get_missing_compression_parent_keys()
             all_missing.update([(prefix,) + key for key in missing])
+        all_missing.update(self._check_new_inventories())
         if all_missing:
             raise errors.BzrCheckError(
                 "Repository %s has missing compression parent(s) %r "
@@ -2222,7 +2270,7 @@
             % (self._format, self.bzrdir.transport.base))
 
     def _abort_write_group(self):
-        self.revisions._index._key_dependencies.refs.clear()
+        self.revisions._index._key_dependencies.clear()
         self._pack_collection._abort_write_group()
 
     def _get_source(self, to_format):
@@ -2242,13 +2290,14 @@
         self._pack_collection._start_write_group()
 
     def _commit_write_group(self):
-        self.revisions._index._key_dependencies.refs.clear()
-        return self._pack_collection._commit_write_group()
+        hint = self._pack_collection._commit_write_group()
+        self.revisions._index._key_dependencies.clear()
+        return hint
 
     def suspend_write_group(self):
         # XXX check self._write_group is self.get_transaction()?
         tokens = self._pack_collection._suspend_write_group()
-        self.revisions._index._key_dependencies.refs.clear()
+        self.revisions._index._key_dependencies.clear()
         self._write_group = None
         return tokens
 
 
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py	2009-08-30 22:02:45 +0000
+++ bzrlib/repository.py	2009-09-02 03:35:25 +0000
@@ -1604,7 +1604,7 @@
         # but at the moment we're only checking for texts referenced by
         # inventories at the graph's edge.
         key_deps = self.revisions._index._key_dependencies
-        key_deps.add_keys(present_inventories)
+        key_deps.satisfy_refs_for_keys(present_inventories)
         referrers = frozenset(r[0] for r in key_deps.get_referrers())
         file_ids = self.fileids_altered_by_revision_ids(referrers)
         missing_texts = set()
 
=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- bzrlib/tests/per_repository/test_write_group.py	2009-08-17 04:18:57 +0000
+++ bzrlib/tests/per_repository/test_write_group.py	2009-09-02 03:35:25 +0000
@@ -361,6 +361,143 @@
         sink.insert_stream((), repo._format, tokens)
         self.assertEqual([True], call_log)
 
+    def test_missing_chk_root_for_inventory(self):
+        """commit_write_group fails with BzrCheckError when the chk root record
+        for a new inventory is missing.
+        """
+        builder = self.make_branch_builder('simple-branch')
+        builder.build_snapshot('A-id', None, [
+            ('add', ('', 'root-id', 'directory', None)),
+            ('add', ('file', 'file-id', 'file', 'content\n'))])
+        b = builder.get_branch()
+        if not b.repository._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
+        b.lock_read()
+        self.addCleanup(b.unlock)
+        repo = self.make_repository('damaged-repo')
+        repo.lock_write()
+        repo.start_write_group()
+        # Now, add the objects manually
+        text_keys = [('file-id', 'A-id'), ('root-id', 'A-id')]
+        # Directly add the texts, inventory, and revision object for 'A-id' --
+        # but don't add the chk_bytes.
+        src_repo = b.repository
+        repo.texts.insert_record_stream(src_repo.texts.get_record_stream(
+            text_keys, 'unordered', True))
+        repo.inventories.insert_record_stream(
+            src_repo.inventories.get_record_stream(
+                [('A-id',)], 'unordered', True))
+        repo.revisions.insert_record_stream(
+            src_repo.revisions.get_record_stream(
+                [('A-id',)], 'unordered', True))
+        # Make sure the presence of the missing data in a fallback does not
+        # avoid the error.
+        repo.add_fallback_repository(b.repository)
+        self.assertRaises(errors.BzrCheckError, repo.commit_write_group)
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertRaises(
+            errors.BzrCheckError, reopened_repo.commit_write_group)
+        reopened_repo.abort_write_group()
+
+    def test_missing_chk_root_for_unchanged_inventory(self):
+        """commit_write_group fails with BzrCheckError when the chk root record
+        for a new inventory is missing, even if the parent inventory is present
+        and has identical content (i.e. the same chk root).
+
+        A stacked repository containing only a revision with an identical
+        inventory to its parent will still have the chk root records for those
+        inventories.
+
+        (In principle the chk records are unnecessary in this case, but in
+        practice bzr 2.0rc1 (at least) expects to find them.)
+        """
+        # Make a branch where the last two revisions have identical
+        # inventories.
+        builder = self.make_branch_builder('simple-branch')
+        builder.build_snapshot('A-id', None, [
+            ('add', ('', 'root-id', 'directory', None)),
+            ('add', ('file', 'file-id', 'file', 'content\n'))])
+        builder.build_snapshot('B-id', None, [])
+        builder.build_snapshot('C-id', None, [])
+        b = builder.get_branch()
+        if not b.repository._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
+        b.lock_read()
+        self.addCleanup(b.unlock)
+        # check our setup: B-id and C-id should have identical chk root keys.
+        inv_b = b.repository.get_inventory('B-id')
+        inv_c = b.repository.get_inventory('C-id')
+        self.assertEqual(inv_b.id_to_entry.key(), inv_c.id_to_entry.key())
+        # Now, manually insert objects for a stacked repo with only revision
+        # C-id:
+        # We need ('revisions', 'C-id'), ('inventories', 'C-id'),
+        # ('inventories', 'B-id'), and the corresponding chk roots for those
+        # inventories.
+        repo = self.make_repository('damaged-repo')
+        repo.lock_write()
+        repo.start_write_group()
+        src_repo = b.repository
+        repo.inventories.insert_record_stream(
+            src_repo.inventories.get_record_stream(
+                [('B-id',), ('C-id',)], 'unordered', True))
+        repo.revisions.insert_record_stream(
+            src_repo.revisions.get_record_stream(
+                [('C-id',)], 'unordered', True))
+        # Make sure the presence of the missing data in a fallback does not
+        # avoid the error.
+        repo.add_fallback_repository(b.repository)
+        self.assertRaises(errors.BzrCheckError, repo.commit_write_group)
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertRaises(
+            errors.BzrCheckError, reopened_repo.commit_write_group)
+        reopened_repo.abort_write_group()
+
+    def test_missing_chk_root_for_parent_inventory(self):
+        """commit_write_group fails with BzrCheckError when the chk root record
+        for a parent inventory of a new revision is missing.
+        """
+        builder = self.make_branch_builder('simple-branch')
+        builder.build_snapshot('A-id', None, [
+            ('add', ('', 'root-id', 'directory', None)),
+            ('add', ('file', 'file-id', 'file', 'content\n'))])
+        builder.build_snapshot('B-id', None, [])
+        builder.build_snapshot('C-id', None, [
+            ('modify', ('file-id', 'new-content'))])
+        b = builder.get_branch()
+        if not b.repository._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
+        b.lock_read()
+        self.addCleanup(b.unlock)
+        # Now, manually insert objects for a stacked repo with only revision
+        # C-id, *except* the chk root entry for the parent inventory.
+        # We need ('revisions', 'C-id'), ('inventories', 'C-id'),
+        # ('inventories', 'B-id'), and the corresponding chk roots for those
+        # inventories.
+        inv_c = b.repository.get_inventory('C-id')
+        chk_keys_for_c_only = [
+            inv_c.id_to_entry.key(), inv_c.parent_id_basename_to_file_id.key()]
+        repo = self.make_repository('damaged-repo')
+        repo.lock_write()
+        repo.start_write_group()
+        src_repo = b.repository
+        repo.chk_bytes.insert_record_stream(
+            src_repo.chk_bytes.get_record_stream(
+                chk_keys_for_c_only, 'unordered', True))
+        repo.inventories.insert_record_stream(
+            src_repo.inventories.get_record_stream(
+                [('B-id',), ('C-id',)], 'unordered', True))
+        repo.revisions.insert_record_stream(
+            src_repo.revisions.get_record_stream(
+                [('C-id',)], 'unordered', True))
+        # Make sure the presence of the missing data in a fallback does not
+        # avoid the error.
+        repo.add_fallback_repository(b.repository)
+        self.assertRaises(errors.BzrCheckError, repo.commit_write_group)
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertRaises(
+            errors.BzrCheckError, reopened_repo.commit_write_group)
+        reopened_repo.abort_write_group()
+
 
 class TestResumeableWriteGroup(TestCaseWithRepository):
 