Merge lp:~spiv/bzr/insert-stream-check-chks-part-2 into lp:bzr/2.0
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/insert-stream-check-chks-part-2
Merge into: lp:bzr/2.0
Diff against target: not available
To merge this branch: bzr merge lp:~spiv/bzr/insert-stream-check-chks-part-2
Related bugs: (none listed)
Reviewer: Martin Pool (Needs Fixing)
Review via email: mp+11290@code.launchpad.net
Commit message
Description of the change
Andrew Bennetts (spiv) wrote:
Robert Collins (lifeless) wrote:
So a few things.
Firstly, check is trying to move away from raising a single error at the first sign of trouble; raising BzrCheckError in code that check uses takes that back a few steps. You could pass down a flag asking for an exception, or something. This occurs at least twice. I may be misreading though - the code in question might not be used by check at all.
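The pattern Robert is describing - accumulate problems and let the caller decide whether to fail fast - can be sketched generically. This is an illustrative sketch, not bzrlib API; the names and the flag are hypothetical:

```python
def find_missing_keys(expected, present, raise_on_error=False):
    """Return the expected keys that are absent from present.

    With raise_on_error=True, raise instead of returning a non-empty
    result, for callers (like commit_write_group) that want to stop at
    the first sign of trouble; check-style callers omit the flag and
    collect every problem before reporting.
    """
    missing = sorted(set(expected) - set(present))
    if raise_on_error and missing:
        raise ValueError("missing keys: %r" % (missing,))
    return missing

# check-style use: gather all problems, report at the end.
problems = find_missing_keys(['a', 'b', 'c'], ['a'])
# commit-style use: find_missing_keys([...], [...], raise_on_error=True)
```

This keeps one implementation serving both callers, rather than one code path that raises and a parallel one that returns a list.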
+        repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
Please move these tests to per_repository_chks
Other than that it looks plausible to me.
Martin Pool (mbp) wrote:
It seems like you should have a news entry before you merge.
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py
+++ bzrlib/groupcompress.py
@@ -1182,6 +1182,12 @@
+    def without_fallbacks(self):
OK, it's fairly obvious, but a docstring would still be good.
+        gcvf = GroupCompressVersionedFiles(
+            self._index, self._access, self._delta)
+        gcvf._unadded_refs = dict(self._unadded_refs)
+        return gcvf
+
It seems a bit potentially problematic to poke this in post
construction, if someone for instance changes the constructor so that it
does actually start opening things. How about instead adding a private
parameter to the constructor to pass this in?
def add_lines(self, key, parents, lines, parent_texts=None,
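Martin's suggestion amounts to something like the following. This is a hypothetical sketch with a simplified stand-in class, not the actual GroupCompressVersionedFiles signature:

```python
class VersionedFiles(object):
    """Simplified stand-in for GroupCompressVersionedFiles."""

    def __init__(self, index, access, delta=True, _unadded_refs=None):
        # The private parameter lets without_fallbacks() hand over state
        # during construction, so nothing is poked in after __init__ has
        # finished (and started opening things, if it ever does).
        self._index = index
        self._access = access
        self._delta = delta
        self._unadded_refs = dict(_unadded_refs or {})
        self._fallback_vfs = []

    def without_fallbacks(self):
        """Return a copy of self that ignores fallback repositories."""
        return VersionedFiles(self._index, self._access, self._delta,
                              _unadded_refs=self._unadded_refs)

vf = VersionedFiles(index=object(), access=object())
vf._unadded_refs['key'] = ('parent',)
copy = vf.without_fallbacks()
```

The copy gets its own dict, so mutating the copy's unadded refs cannot affect the original.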
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py
+++ bzrlib/repofmt/groupcompress_repo.py
@@ -591,8 +591,6 @@
:returns: set of missing keys. Note that not every missing key is
    guaranteed to be reported.
"""
- if getattr(self.repo, 'chk_bytes', None) is None:
- return set()
# Ensure that all revisions added in this write group have:
# - corresponding inventories,
# - chk root entries for those inventories,
@@ -603,6 +601,7 @@
+        no_fallback_texts_index = self.repo.texts._index
# Are any inventories for corresponding to the new revisions missing?
@@ -610,7 +609,9 @@
         if missing_corresponding:
-            return [('inventories', key) for key in missing_corresponding]
+            raise errors.BzrCheckError(
+                "Repository %s missing inventories for new revisions %r "
+                % (self.repo, sorted(missing_corresponding)))
As Robert says, it seems like a step backwards to be raising an
exception directly rather than returning a list of problems - is there a
reason why you have to?
# Are any chk root entries missing for any inventories? This includes
# any present parent inventories, which may be used when calculating
# deltas for streaming.
@@ -620,17 +621,57 @@
# Filter out ghost parents.
+        parent_invs_only_keys = all_inv_keys.symmetric_difference(
+            corresponding_invs)
inv_ids = [key[-1] for key in all_inv_keys]
...
Martin Pool (mbp) wrote:
that status is meant to mean 'tweak', but feel free to call me to talk about it.
Preview Diff
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py 2009-09-04 07:39:08 +0000
+++ bzrlib/groupcompress.py 2009-09-07 05:52:04 +0000
@@ -1182,6 +1182,12 @@
         self._group_cache = LRUSizeCache(max_size=50*1024*1024)
         self._fallback_vfs = []
 
+    def without_fallbacks(self):
+        gcvf = GroupCompressVersionedFiles(
+            self._index, self._access, self._delta)
+        gcvf._unadded_refs = dict(self._unadded_refs)
+        return gcvf
+
     def add_lines(self, key, parents, lines, parent_texts=None,
         left_matching_blocks=None, nostore_sha=None, random_id=False,
         check_content=True):
 
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-09-07 03:00:23 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-09-07 08:39:36 +0000
@@ -591,8 +591,6 @@
         :returns: set of missing keys. Note that not every missing key is
             guaranteed to be reported.
         """
-        if getattr(self.repo, 'chk_bytes', None) is None:
-            return set()
         # Ensure that all revisions added in this write group have:
         # - corresponding inventories,
         # - chk root entries for those inventories,
@@ -603,6 +601,7 @@
         new_revisions_keys = key_deps.get_new_keys()
         no_fallback_inv_index = self.repo.inventories._index
         no_fallback_chk_bytes_index = self.repo.chk_bytes._index
+        no_fallback_texts_index = self.repo.texts._index
         inv_parent_map = no_fallback_inv_index.get_parent_map(
             new_revisions_keys)
         # Are any inventories for corresponding to the new revisions missing?
@@ -610,7 +609,9 @@
         missing_corresponding = set(new_revisions_keys)
         missing_corresponding.difference_update(corresponding_invs)
         if missing_corresponding:
-            return [('inventories', key) for key in missing_corresponding]
+            raise errors.BzrCheckError(
+                "Repository %s missing inventories for new revisions %r "
+                % (self.repo, sorted(missing_corresponding)))
         # Are any chk root entries missing for any inventories? This includes
         # any present parent inventories, which may be used when calculating
         # deltas for streaming.
@@ -620,17 +621,57 @@
         # Filter out ghost parents.
         all_inv_keys.intersection_update(
             no_fallback_inv_index.get_parent_map(all_inv_keys))
+        parent_invs_only_keys = all_inv_keys.symmetric_difference(
+            corresponding_invs)
         all_missing = set()
         inv_ids = [key[-1] for key in all_inv_keys]
-        for inv in self.repo.iter_inventories(inv_ids, 'unordered'):
-            root_keys = set([inv.id_to_entry.key()])
-            if inv.parent_id_basename_to_file_id is not None:
-                root_keys.add(inv.parent_id_basename_to_file_id.key())
-            present = no_fallback_chk_bytes_index.get_parent_map(root_keys)
-            missing = root_keys.difference(present)
-            all_missing.update([('chk_bytes',) + key for key in missing])
-        return all_missing
+        parent_invs_only_ids = [key[-1] for key in parent_invs_only_keys]
+        root_key_info = _build_interesting_key_sets(
+            self.repo, inv_ids, parent_invs_only_ids)
+        expected_chk_roots = root_key_info.all_keys()
+        present_chk_roots = no_fallback_chk_bytes_index.get_parent_map(
+            expected_chk_roots)
+        missing_chk_roots = expected_chk_roots.difference(present_chk_roots)
+        if missing_chk_roots:
+            # Don't bother checking any further.
+            raise errors.BzrCheckError(
+                "Repository %s missing chk root keys %r for new revisions"
+                % (self.repo, sorted(missing_chk_roots)))
+        # Find all interesting chk_bytes records, and make sure they are
+        # present, as well as the text keys they reference.
+        chk_bytes_no_fallbacks = self.repo.chk_bytes.without_fallbacks()
+        chk_bytes_no_fallbacks._search_key_func = \
+            self.repo.chk_bytes._search_key_func
+        chk_diff = chk_map.iter_interesting_nodes(
+            chk_bytes_no_fallbacks, root_key_info.interesting_root_keys,
+            root_key_info.uninteresting_root_keys)
+        bytes_to_info = inventory.CHKInventory._bytes_to_utf8name_key
+        text_keys = set()
+        try:
+            for record in _filter_text_keys(chk_diff, text_keys, bytes_to_info):
+                pass
+        except errors.NoSuchRevision, e:
+            # XXX: It would be nice if we could give a more precise error here.
+            raise errors.BzrCheckError(
+                "Repository %s missing chk node(s) for new revisions."
+                % (self.repo,))
+        chk_diff = chk_map.iter_interesting_nodes(
+            chk_bytes_no_fallbacks, root_key_info.interesting_pid_root_keys,
+            root_key_info.uninteresting_pid_root_keys)
+        try:
+            for interesting_rec, interesting_map in chk_diff:
+                pass
+        except errors.NoSuchRevision, e:
+            raise errors.BzrCheckError(
+                "Repository %s missing chk node(s) for new revisions."
+                % (self.repo,))
+        present_text_keys = no_fallback_texts_index.get_parent_map(text_keys)
+        missing_text_keys = text_keys.difference(present_text_keys)
+        if missing_text_keys:
+            raise errors.BzrCheckError(
+                "Repository %s missing text keys %r for new revisions"
+                % (self.repo, sorted(missing_text_keys)))
 
     def _execute_pack_operations(self, pack_operations,
                                  _packer_class=GCCHKPacker,
                                  reload_func=None):
@@ -898,17 +939,12 @@
             parent_keys)
         present_parent_inv_ids = set(
             [k[-1] for k in present_parent_inv_keys])
-        uninteresting_root_keys = set()
-        interesting_root_keys = set()
         inventories_to_read = set(revision_ids)
         inventories_to_read.update(present_parent_inv_ids)
-        for inv in self.iter_inventories(inventories_to_read):
-            entry_chk_root_key = inv.id_to_entry.key()
-            if inv.revision_id in present_parent_inv_ids:
-                uninteresting_root_keys.add(entry_chk_root_key)
-            else:
-                interesting_root_keys.add(entry_chk_root_key)
-
+        root_key_info = _build_interesting_key_sets(
+            self, inventories_to_read, present_parent_inv_ids)
+        interesting_root_keys = root_key_info.interesting_root_keys
+        uninteresting_root_keys = root_key_info.uninteresting_root_keys
         chk_bytes = self.chk_bytes
         for record, items in chk_map.iter_interesting_nodes(chk_bytes,
             interesting_root_keys, uninteresting_root_keys,
@@ -1048,13 +1084,10 @@
         bytes_to_info = inventory.CHKInventory._bytes_to_utf8name_key
         chk_bytes = self.from_repository.chk_bytes
         def _filter_id_to_entry():
-            for record, items in chk_map.iter_interesting_nodes(chk_bytes,
-                        self._chk_id_roots, uninteresting_root_keys):
-                for name, bytes in items:
-                    # Note: we don't care about name_utf8, because we are always
-                    # rich-root = True
-                    _, file_id, revision_id = bytes_to_info(bytes)
-                    self._text_keys.add((file_id, revision_id))
+            interesting_nodes = chk_map.iter_interesting_nodes(chk_bytes,
+                        self._chk_id_roots, uninteresting_root_keys)
+            for record in _filter_text_keys(interesting_nodes, self._text_keys,
+                                            bytes_to_info):
                 if record is not None:
                     yield record
             # Consumed
@@ -1098,7 +1131,7 @@
                     missing_inventory_keys.add(key[1:])
         if self._chk_id_roots or self._chk_p_id_roots:
             raise AssertionError('Cannot call get_stream_for_missing_keys'
-                ' untill all of get_stream() has been consumed.')
+                ' until all of get_stream() has been consumed.')
         # Yield the inventory stream, so we can find the chk stream
         # Some of the missing_keys will be missing because they are ghosts.
         # As such, we can ignore them. The Sink is required to verify there are
@@ -1111,6 +1144,54 @@
             yield stream_info
 
 
+class _InterestingKeyInfo(object):
+    def __init__(self):
+        self.interesting_root_keys = set()
+        self.interesting_pid_root_keys = set()
+        self.uninteresting_root_keys = set()
+        self.uninteresting_pid_root_keys = set()
+
+    def all_interesting(self):
+        return self.interesting_root_keys.union(self.interesting_pid_root_keys)
+
+    def all_uninteresting(self):
+        return self.uninteresting_root_keys.union(
+            self.uninteresting_pid_root_keys)
+
+    def all_keys(self):
+        return self.all_interesting().union(self.all_uninteresting())
+
+
+def _build_interesting_key_sets(repo, inventory_ids, parent_only_inv_ids):
+    result = _InterestingKeyInfo()
+    for inv in repo.iter_inventories(inventory_ids, 'unordered'):
+        root_key = inv.id_to_entry.key()
+        pid_root_key = inv.parent_id_basename_to_file_id.key()
+        if inv.revision_id in parent_only_inv_ids:
+            result.uninteresting_root_keys.add(root_key)
+            result.uninteresting_pid_root_keys.add(pid_root_key)
+        else:
+            result.interesting_root_keys.add(root_key)
+            result.interesting_pid_root_keys.add(pid_root_key)
+    return result
+
+
+def _filter_text_keys(interesting_nodes_iterable, text_keys, bytes_to_info):
+    """Iterate the result of iter_interesting_nodes, yielding the records
+    and adding to text_keys.
+    """
+    for record, items in interesting_nodes_iterable:
+        for name, bytes in items:
+            # Note: we don't care about name_utf8, because groupcompress repos
+            # are always rich-root, so there are no synthesised root records to
+            # ignore.
+            _, file_id, revision_id = bytes_to_info(bytes)
+            text_keys.add((file_id, revision_id))
+        yield record
+
+
+
+
 class RepositoryFormatCHK1(RepositoryFormatPack):
     """A hashed CHK+group compress pack repository."""
 
 
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2009-09-07 03:00:23 +0000
+++ bzrlib/repofmt/pack_repo.py 2009-09-07 06:01:48 +0000
@@ -2071,7 +2071,7 @@
         """
         # The base implementation does no checks. GCRepositoryPackCollection
         # overrides this.
-        return set()
+        pass
 
     def _commit_write_group(self):
         all_missing = set()
@@ -2087,11 +2087,7 @@
             raise errors.BzrCheckError(
                 "Repository %s has missing compression parent(s) %r "
                 % (self.repo, sorted(all_missing)))
-        all_missing = self._check_new_inventories()
-        if all_missing:
-            raise errors.BzrCheckError(
-                "Repository %s missing keys for new revisions %r "
-                % (self.repo, sorted(all_missing)))
+        self._check_new_inventories()
         self._remove_pack_indices(self._new_pack)
         any_new_content = False
         if self._new_pack.data_inserted():
 
=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- bzrlib/tests/per_repository/test_write_group.py 2009-09-02 03:07:23 +0000
+++ bzrlib/tests/per_repository/test_write_group.py 2009-09-07 08:06:25 +0000
@@ -365,16 +365,16 @@
         """commit_write_group fails with BzrCheckError when the chk root record
         for a new inventory is missing.
         """
+        repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
         builder = self.make_branch_builder('simple-branch')
         builder.build_snapshot('A-id', None, [
             ('add', ('', 'root-id', 'directory', None)),
             ('add', ('file', 'file-id', 'file', 'content\n'))])
         b = builder.get_branch()
-        if not b.repository._format.supports_chks:
-            raise TestNotApplicable('requires repository with chk_bytes')
         b.lock_read()
         self.addCleanup(b.unlock)
-        repo = self.make_repository('damaged-repo')
         repo.lock_write()
         repo.start_write_group()
         # Now, add the objects manually
@@ -411,6 +411,9 @@
         (In principle the chk records are unnecessary in this case, but in
         practice bzr 2.0rc1 (at least) expects to find them.)
         """
+        repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
         # Make a branch where the last two revisions have identical
         # inventories.
         builder = self.make_branch_builder('simple-branch')
@@ -420,8 +423,6 @@
         builder.build_snapshot('B-id', None, [])
         builder.build_snapshot('C-id', None, [])
         b = builder.get_branch()
-        if not b.repository._format.supports_chks:
-            raise TestNotApplicable('requires repository with chk_bytes')
         b.lock_read()
         self.addCleanup(b.unlock)
         # check our setup: B-id and C-id should have identical chk root keys.
@@ -433,10 +434,71 @@
         # We need ('revisions', 'C-id'), ('inventories', 'C-id'),
         # ('inventories', 'B-id'), and the corresponding chk roots for those
         # inventories.
+        repo.lock_write()
+        repo.start_write_group()
+        src_repo = b.repository
+        repo.inventories.insert_record_stream(
+            src_repo.inventories.get_record_stream(
+                [('B-id',), ('C-id',)], 'unordered', True))
+        repo.revisions.insert_record_stream(
+            src_repo.revisions.get_record_stream(
+                [('C-id',)], 'unordered', True))
+        # Make sure the presence of the missing data in a fallback does not
+        # avoid the error.
+        repo.add_fallback_repository(b.repository)
+        self.assertRaises(errors.BzrCheckError, repo.commit_write_group)
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertRaises(
+            errors.BzrCheckError, reopened_repo.commit_write_group)
+        reopened_repo.abort_write_group()
+
+    def test_missing_chk_leaf_for_inventory(self):
+        """commit_write_group fails with BzrCheckError when the chk root record
+        for a parent inventory of a new revision is missing.
+        """
         repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
+        builder = self.make_branch_builder('simple-branch')
+        # add and modify files with very long file-ids, so that the chk map
+        # will need more than just a root node.
+        file_adds = []
+        file_modifies = []
+        for char in 'abc':
+            name = char * 10000
+            file_adds.append(
+                ('add', ('file-' + name, 'file-%s-id' % name, 'file',
+                         'content %s\n' % name)))
+            file_modifies.append(
+                ('modify', ('file-%s-id' % name, 'new content %s\n' % name)))
+        builder.build_snapshot('A-id', None, [
+            ('add', ('', 'root-id', 'directory', None))] +
+            file_adds)
+        builder.build_snapshot('B-id', None, [])
+        builder.build_snapshot('C-id', None, file_modifies)
+        b = builder.get_branch()
+        src_repo = b.repository
+        src_repo.lock_read()
+        self.addCleanup(src_repo.unlock)
+        # Now, manually insert objects for a stacked repo with only revision
+        # C-id, *except* drop the non-root chk records.
+        inv_b = src_repo.get_inventory('B-id')
+        inv_c = src_repo.get_inventory('C-id')
+        chk_root_keys_only = [
+            inv_b.id_to_entry.key(), inv_b.parent_id_basename_to_file_id.key(),
+            inv_c.id_to_entry.key(), inv_c.parent_id_basename_to_file_id.key()]
+        all_chks = src_repo.chk_bytes.keys()
+        # Pick a non-root key to drop
+        key_to_drop = all_chks.difference(chk_root_keys_only).pop()
+        all_chks.discard(key_to_drop)
         repo.lock_write()
         repo.start_write_group()
-        src_repo = b.repository
+        repo.chk_bytes.insert_record_stream(
+            src_repo.chk_bytes.get_record_stream(
+                all_chks, 'unordered', True))
+        repo.texts.insert_record_stream(
+            src_repo.texts.get_record_stream(
+                src_repo.texts.keys(), 'unordered', True))
         repo.inventories.insert_record_stream(
             src_repo.inventories.get_record_stream(
                 [('B-id',), ('C-id',)], 'unordered', True))
@@ -456,16 +518,10 @@
         """commit_write_group fails with BzrCheckError when the chk root record
         for a parent inventory of a new revision is missing.
         """
-        builder = self.make_branch_builder('simple-branch')
-        builder.build_snapshot('A-id', None, [
-            ('add', ('', 'root-id', 'directory', None)),
-            ('add', ('file', 'file-id', 'file', 'content\n'))])
-        builder.build_snapshot('B-id', None, [])
-        builder.build_snapshot('C-id', None, [
-            ('modify', ('file-id', 'new-content'))])
-        b = builder.get_branch()
-        if not b.repository._format.supports_chks:
+        repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
             raise TestNotApplicable('requires repository with chk_bytes')
+        b = self.make_branch_with_multiple_chk_nodes()
         b.lock_read()
         self.addCleanup(b.unlock)
         # Now, manually insert objects for a stacked repo with only revision
@@ -476,7 +532,6 @@
         inv_c = b.repository.get_inventory('C-id')
         chk_keys_for_c_only = [
             inv_c.id_to_entry.key(), inv_c.parent_id_basename_to_file_id.key()]
-        repo = self.make_repository('damaged-repo')
         repo.lock_write()
         repo.start_write_group()
         src_repo = b.repository
@@ -498,6 +553,63 @@
             errors.BzrCheckError, reopened_repo.commit_write_group)
         reopened_repo.abort_write_group()
 
+    def make_branch_with_multiple_chk_nodes(self):
+        # add and modify files with very long file-ids, so that the chk map
+        # will need more than just a root node.
+        builder = self.make_branch_builder('simple-branch')
+        file_adds = []
+        file_modifies = []
+        for char in 'abc':
+            name = char * 10000
+            file_adds.append(
+                ('add', ('file-' + name, 'file-%s-id' % name, 'file',
+                         'content %s\n' % name)))
+            file_modifies.append(
+                ('modify', ('file-%s-id' % name, 'new content %s\n' % name)))
+        builder.build_snapshot('A-id', None, [
+            ('add', ('', 'root-id', 'directory', None))] +
+            file_adds)
+        builder.build_snapshot('B-id', None, [])
+        builder.build_snapshot('C-id', None, file_modifies)
+        return builder.get_branch()
+
+    def test_missing_text_record(self):
+        """commit_write_group fails with BzrCheckError when a text is missing.
+        """
+        repo = self.make_repository('damaged-repo')
+        if not repo._format.supports_chks:
+            raise TestNotApplicable('requires repository with chk_bytes')
+        b = self.make_branch_with_multiple_chk_nodes()
+        src_repo = b.repository
+        src_repo.lock_read()
+        self.addCleanup(src_repo.unlock)
+        # Now, manually insert objects for a stacked repo with only revision
+        # C-id, *except* drop one changed text.
+        all_texts = src_repo.texts.keys()
+        all_texts.remove(('file-%s-id' % ('c'*10000,), 'C-id'))
+        repo.lock_write()
+        repo.start_write_group()
+        repo.chk_bytes.insert_record_stream(
+            src_repo.chk_bytes.get_record_stream(
+                src_repo.chk_bytes.keys(), 'unordered', True))
+        repo.texts.insert_record_stream(
+            src_repo.texts.get_record_stream(
+                all_texts, 'unordered', True))
+        repo.inventories.insert_record_stream(
+            src_repo.inventories.get_record_stream(
+                [('B-id',), ('C-id',)], 'unordered', True))
+        repo.revisions.insert_record_stream(
+            src_repo.revisions.get_record_stream(
+                [('C-id',)], 'unordered', True))
+        # Make sure the presence of the missing data in a fallback does not
+        # avoid the error.
+        repo.add_fallback_repository(b.repository)
+        self.assertRaises(errors.BzrCheckError, repo.commit_write_group)
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertRaises(
+            errors.BzrCheckError, reopened_repo.commit_write_group)
+        reopened_repo.abort_write_group()
+
 
 class TestResumeableWriteGroup(TestCaseWithRepository):
 
This completes the fix for bug 406687.
We could go much further, I think, in terms of reusing code. I've taken a nibble at that in this branch, but there's room to do much, much better. We now have three code paths in groupcompress_repo.py that are doing essentially the same logic of finding text references from a set of revisions:
- GroupCHKStreamSource
- fileids_altered_by_revision_ids
- _check_new_inventories (the new checks in this patch)
However I think the incremental improvement in this patch is landable without tackling all the duplication at once. (But I'm happy to do any more low-hanging fruit that a reviewer considers appropriate.)
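The common step those three paths share is "walk interesting chk nodes, yield the records, and collect the text keys they reference", which is what the new _filter_text_keys helper captures. A simplified stand-alone version of that helper, exercised with fake node data and a fake bytes_to_info (real chk records are not needed to show the shape):

```python
def filter_text_keys(interesting_nodes_iterable, text_keys, bytes_to_info):
    # Mirrors _filter_text_keys from the diff: yield each record unchanged
    # while accumulating the (file_id, revision_id) pairs it references.
    for record, items in interesting_nodes_iterable:
        for name, item_bytes in items:
            _, file_id, revision_id = bytes_to_info(item_bytes)
            text_keys.add((file_id, revision_id))
        yield record

# Stand-in for CHKInventory._bytes_to_utf8name_key, which parses a chk
# item into (name_utf8, file_id, revision_id):
def fake_bytes_to_info(item_bytes):
    return tuple(item_bytes.split(':'))

nodes = [('rec1', [('a', 'a:file-a-id:rev-1')]),
         ('rec2', [('b', 'b:file-b-id:rev-2')])]
text_keys = set()
records = list(filter_text_keys(nodes, text_keys, fake_bytes_to_info))
```

Because the helper is a pass-through generator, each caller (streaming, fileid lookup, write-group checking) can keep its own record handling while sharing the text-key extraction.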