Merge lp:~lifeless/bzr/check-1 into lp:~bzr/bzr/trunk-old
- check-1
- Merge into trunk-old
Status: | Merged |
---|---|
Merged at revision: | not available |
Proposed branch: | lp:~lifeless/bzr/check-1 |
Merge into: | lp:~bzr/bzr/trunk-old |
Diff against target: | None lines |
To merge this branch: | bzr merge lp:~lifeless/bzr/check-1 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Vincent Ladeuil | Approve | ||
bzr-core | Pending | ||
Review via email: mp+7489@code.launchpad.net |
Commit message
Description of the change
Robert Collins (lifeless) wrote : | # |
Ian Clatworthy (ian-clatworthy) wrote : | # |
Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/check-1 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This gives a single-pass progress bar to check, makes it talk 1/2 the
> time on 1.9 flavour formats and nochange on CHK formats. More work to
> come but this seems unqualified better to me.
>
>
Thanks for working on check. This command will get quite a workout from
lots of users as they migrate to 2a so I think getting it performing
well and reliably is quite a high priority.
This is a pretty big patch and I'm finding it difficult to review it all
at once. Here's a partial review with numerous clean-ups outside the
guts of your new code. To begin with, I'm happy with the -Dprogress
changes. These are arguably completely independent from check so it
would have been nice to submit them as a separate patch. In any case,
the changes related to that are good to go so you can land them immediately.
> +API Changes
> +***********
> +
> +* ``WorkingTree.
> + those returned by ``WorkingTree.
> +
>
And more importantly, you've changed the public API Branch.check() to
have a new mandatory parameter and InventoryEntry.
tree parameter. You need to document these as API breaks.
> + def find_lefthand_
> + """Find the distance to null for all the keys in keys.
> +
> + :param keys: keys to lookup.
> + :return: A dict key->distance for all of keys.
> + """
> + # Optimisable by concurrent searching, but a random spread should get
> + # some sort of hit rate.
> + result = {}
>
result isn't used in this routine.
> + else:
> + # At the moment, check does not extra work over get_record_stream
> + return self.get_
>
This comment is confusing and needs rewording.
> + for revid, revision in revisions_iterator:
> + if revision is None:
> + pass
> + parent_map = vf.get_
>
"pass" does nothing here so I'm assuming it's wrong. "continue" maybe?
> === modified file 'bzrlib/
> --- bzrlib/
> +++ bzrlib/
> @@ -37,7 +37,6 @@
> from bzrlib import (
> debug,
> errors,
> - osutils,
> symbol_versioning,
> trace,
> ui,
> @@ -46,7 +45,8 @@
> from bzrlib.smart import client, protocol, request, vfs
> from bzrlib.transport import ssh
> """)
> -
> +#usually already imported, and getting IllegalScoperRe
> +from bzrlib import osutils
>
Was this a transient issue? If not, it looks like a completely separate
bug to the check stuff and maybe ought to be submitted as such.
> def check(self, progress_bar=None):
> - """Check this object for integrity."""
> + """Check this object for integrity.
> +
> + :param progress_bar: A progress bar to output as the check progresses.
> + :param keys: Specific keys within the Versione...
Robert Collins (lifeless) wrote : | # |
I've stopped working on check while the current progress is reviewed.
I'll go through and look at the issues you've noted.
With regard to the API changes, I can document Branch.check, as it is
conceivable that people have used it. WT._check is private;
InventoryEntry.
object - it can only be used from within check.
-Rob
Vincent Ladeuil (vila) wrote : | # |
Huge patch to review here... for which the main intent is not
obvious.
The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.
If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)
Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.
My main concern here is that you're changing check without adding
test facilities to create broken repo/branch/
understand the desire to not check for every conceivable bug, I'd
feel more confident in the code if some tests were easier to
write...
Other than that, it looks good so with as many tweaks as you can
take in to account:
Review: approve
>>>>> "robert" == Robert Collins <email address hidden> writes:
...
robert> === modified file 'bzrlib/branch.py'
robert> --- bzrlib/branch.py 2009-06-10 03:56:49 +0000
robert> +++ bzrlib/branch.py 2009-06-16 05:29:42 +0000
robert> @@ -125,6 +125,14 @@
robert> raise errors.
robert> self.repository
robert> + def _get_check_
robert> + """Get the references needed for check().
robert> +
robert> + See bzrlib.check.
robert> + """
robert> + revid = self.last_
robert> + return [('revision-
'revision-
is not. Add a comment, if only to point the reader to the right
place to understand it.
I find the way to specify the different checks quite pleasant at
that point :)
I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).
Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?
Is there some other reason than keeping the definition and
accesses use the same key ?
<snip/>
robert> === modified file 'bzrlib/check.py'
...
robert> - def check(self):
robert> + def check(self, callback_refs=None, check_repo=True):
robert> + if callback_refs is None:
...
robert> + self.progress.
robert> + if self.check_repo:
robert> + self.progress.
robert> + self.check_
robert> + self.progress.
robert> + self.repository
robert> + self.progress.
robert> + # check_weaves is done after the revision scan so that
robert> + # revision index is known to be valid.
robert> + self.check_weaves()
robert> + self.progress.
The above is clear, but put the following if block in a method by
itse...
Vincent Ladeuil (vila) wrote : | # |
Bah, forgot to sign my mail :-/
Andrew Bennetts (spiv) wrote : | # |
Vincent Ladeuil wrote:
[...]
> robert> + def _check_text(self, record, checker, item_data):
> robert> + """Check a single text."""
> robert> + # Check it is extractable.
> robert> + # TODO: check length.
>
> Add '-- lifeless <aaaammjj>' please.
I think you mean yyyymmdd? :)
-Andrew.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-06-15 15:20:24 +0000 |
3 | +++ NEWS 2009-06-16 05:29:42 +0000 |
4 | @@ -17,6 +17,12 @@ |
5 | diverged-branches`` when a push fails because the branches have |
6 | diverged. (Neil Martinsen-Burrell, #269477) |
7 | |
8 | +API Changes |
9 | +*********** |
10 | + |
11 | +* ``WorkingTree._check`` now requires a references dict with keys matching |
12 | + those returned by ``WorkingTree._get_check_refs``. (Robert Collins) |
13 | + |
14 | Internals |
15 | ********* |
16 | |
17 | @@ -402,6 +408,9 @@ |
18 | cause mismatched physical locks to cause test errors rather than just |
19 | reporting to the screen. (Robert Collins) |
20 | |
21 | +* -Dprogress will cause pdb to start up if a progress view jumps |
22 | + backwards. (Robert Collins) |
23 | + |
24 | * Fallback ``CredentialStore`` instances registered with ``fallback=True`` |
25 | are now be able to provide credentials if obtaining credentials |
26 | via ~/.bazaar/authentication.conf fails. (Jelmer Vernooij, |
27 | |
28 | === modified file 'bzrlib/branch.py' |
29 | --- bzrlib/branch.py 2009-06-10 03:56:49 +0000 |
30 | +++ bzrlib/branch.py 2009-06-16 05:29:42 +0000 |
31 | @@ -125,6 +125,14 @@ |
32 | raise errors.UnstackableRepositoryFormat(self.repository._format, |
33 | self.repository.base) |
34 | |
35 | + def _get_check_refs(self): |
36 | + """Get the references needed for check(). |
37 | + |
38 | + See bzrlib.check. |
39 | + """ |
40 | + revid = self.last_revision() |
41 | + return [('revision-existence', revid), ('lefthand-distance', revid)] |
42 | + |
43 | @staticmethod |
44 | def open(base, _unsupported=False, possible_transports=None): |
45 | """Open the branch rooted at base. |
46 | @@ -1132,7 +1140,7 @@ |
47 | target._set_all_reference_info(target_reference_dict) |
48 | |
49 | @needs_read_lock |
50 | - def check(self): |
51 | + def check(self, refs): |
52 | """Check consistency of the branch. |
53 | |
54 | In particular this checks that revisions given in the revision-history |
55 | @@ -1141,42 +1149,23 @@ |
56 | |
57 | Callers will typically also want to check the repository. |
58 | |
59 | + :param refs: Calculated refs for this branch as specified by |
60 | + branch._get_check_refs() |
61 | :return: A BranchCheckResult. |
62 | """ |
63 | - ret = BranchCheckResult(self) |
64 | - mainline_parent_id = None |
65 | + result = BranchCheckResult(self) |
66 | last_revno, last_revision_id = self.last_revision_info() |
67 | - real_rev_history = [] |
68 | - try: |
69 | - for revid in self.repository.iter_reverse_revision_history( |
70 | - last_revision_id): |
71 | - real_rev_history.append(revid) |
72 | - except errors.RevisionNotPresent: |
73 | - ret.ghosts_in_mainline = True |
74 | - else: |
75 | - ret.ghosts_in_mainline = False |
76 | - real_rev_history.reverse() |
77 | - if len(real_rev_history) != last_revno: |
78 | - raise errors.BzrCheckError('revno does not match len(mainline)' |
79 | - ' %s != %s' % (last_revno, len(real_rev_history))) |
80 | - # TODO: We should probably also check that real_rev_history actually |
81 | - # matches self.revision_history() |
82 | - for revision_id in real_rev_history: |
83 | - try: |
84 | - revision = self.repository.get_revision(revision_id) |
85 | - except errors.NoSuchRevision, e: |
86 | - raise errors.BzrCheckError("mainline revision {%s} not in repository" |
87 | - % revision_id) |
88 | - # In general the first entry on the revision history has no parents. |
89 | - # But it's not illegal for it to have parents listed; this can happen |
90 | - # in imports from Arch when the parents weren't reachable. |
91 | - if mainline_parent_id is not None: |
92 | - if mainline_parent_id not in revision.parent_ids: |
93 | - raise errors.BzrCheckError("previous revision {%s} not listed among " |
94 | - "parents of {%s}" |
95 | - % (mainline_parent_id, revision_id)) |
96 | - mainline_parent_id = revision_id |
97 | - return ret |
98 | + actual_revno = refs[('lefthand-distance', last_revision_id)] |
99 | + if actual_revno != last_revno: |
100 | + result.errors.append(errors.BzrCheckError( |
101 | + 'revno does not match len(mainline) %s != %s' % ( |
102 | + last_revno, actual_revno))) |
103 | + # TODO: We should probably also check that self.revision_history |
104 | + # matches the repository for older branch formats. |
105 | + # If looking for the code that cross-checks repository parents against |
106 | + # the iter_reverse_revision_history output, that is now a repository |
107 | + # specific check. |
108 | + return result |
109 | |
110 | def _get_checkout_format(self): |
111 | """Return the most suitable metadir for a checkout of this branch. |
112 | @@ -2784,7 +2773,7 @@ |
113 | |
114 | def __init__(self, branch): |
115 | self.branch = branch |
116 | - self.ghosts_in_mainline = False |
117 | + self.errors = [] |
118 | |
119 | def report_results(self, verbose): |
120 | """Report the check results via trace.note. |
121 | @@ -2792,11 +2781,10 @@ |
122 | :param verbose: Requests more detailed display of what was checked, |
123 | if any. |
124 | """ |
125 | - note('checked branch %s format %s', |
126 | - self.branch.base, |
127 | - self.branch._format) |
128 | - if self.ghosts_in_mainline: |
129 | - note('branch contains ghosts in mainline') |
130 | + note('checked branch %s format %s', self.branch.base, |
131 | + self.branch._format) |
132 | + for error in self.errors: |
133 | + note('found error:%s', error) |
134 | |
135 | |
136 | class Converter5to6(object): |
137 | |
138 | === modified file 'bzrlib/check.py' |
139 | --- bzrlib/check.py 2009-03-23 14:59:43 +0000 |
140 | +++ bzrlib/check.py 2009-06-16 00:53:41 +0000 |
141 | @@ -32,6 +32,20 @@ |
142 | # raising them. If there's more than one exception it'd be good to see them |
143 | # all. |
144 | |
145 | +"""Checking of bzr objects. |
146 | + |
147 | +check_refs is a concept used for optimising check. Objects that depend on other |
148 | +objects (e.g. tree on repository) can list the objects they would be requesting |
149 | +so that when the dependent object is checked, matches can be pulled out and |
150 | +evaluated in-line rather than re-reading the same data many times. |
151 | +check_refs are tuples (kind, value). Currently defined kinds are: |
152 | +* 'trees', where value is a revid and the looked up objects are revision trees. |
153 | +* 'lefthand-distance', where value is a revid and the looked up objects are the |
154 | + distance along the lefthand path to NULL for that revid. |
155 | +* 'revision-existence', where value is a revid, and the result is True or False |
156 | + indicating that the revision was found/not found. |
157 | +""" |
158 | + |
159 | from bzrlib import errors, osutils |
160 | from bzrlib import repository as _mod_repository |
161 | from bzrlib import revision |
162 | @@ -39,6 +53,7 @@ |
163 | from bzrlib.bzrdir import BzrDir |
164 | from bzrlib.errors import BzrCheckError |
165 | from bzrlib.repository import Repository |
166 | +from bzrlib.revision import NULL_REVISION |
167 | from bzrlib.symbol_versioning import deprecated_function, deprecated_in |
168 | from bzrlib.trace import log_error, note |
169 | import bzrlib.ui |
170 | @@ -49,76 +64,145 @@ |
171 | |
172 | # The Check object interacts with InventoryEntry.check, etc. |
173 | |
174 | - def __init__(self, repository): |
175 | + def __init__(self, repository, check_repo=True): |
176 | self.repository = repository |
177 | - self.checked_text_cnt = 0 |
178 | self.checked_rev_cnt = 0 |
179 | - self.ghosts = [] |
180 | - self.repeated_text_cnt = 0 |
181 | + self.ghosts = set() |
182 | self.missing_parent_links = {} |
183 | self.missing_inventory_sha_cnt = 0 |
184 | self.missing_revision_cnt = 0 |
185 | - # maps (file-id, version) -> sha1; used by InventoryFile._check |
186 | - self.checked_texts = {} |
187 | self.checked_weaves = set() |
188 | self.unreferenced_versions = set() |
189 | self.inconsistent_parents = [] |
190 | self.rich_roots = repository.supports_rich_root() |
191 | self.text_key_references = {} |
192 | + self.check_repo = check_repo |
193 | + self.other_results = [] |
194 | + # Plain text lines to include in the report |
195 | + self._report_items = [] |
196 | + # Keys we are looking for; may be large and need spilling to disk. |
197 | + # key->(type(revision/inventory/text/signature/map), sha1, first-referer) |
198 | + self.pending_keys = {} |
199 | + # Ancestors map for all of revisions being checked; while large helper |
200 | + # functions we call would create it anyway, so better to have once and |
201 | + # keep. |
202 | + self.ancestors = {} |
203 | |
204 | - def check(self): |
205 | + def check(self, callback_refs=None, check_repo=True): |
206 | + if callback_refs is None: |
207 | + callback_refs = {} |
208 | self.repository.lock_read() |
209 | self.progress = bzrlib.ui.ui_factory.nested_progress_bar() |
210 | try: |
211 | - self.progress.update('retrieving inventory', 0, 2) |
212 | - # do not put in init, as it should be done with progess, |
213 | - # and inside the lock. |
214 | - self.inventory_weave = self.repository.inventories |
215 | - self.progress.update('checking revision graph', 1) |
216 | - self.check_revision_graph() |
217 | - self.plan_revisions() |
218 | - revno = 0 |
219 | - while revno < len(self.planned_revisions): |
220 | - rev_id = self.planned_revisions[revno] |
221 | - self.progress.update('checking revision', revno, |
222 | - len(self.planned_revisions)) |
223 | - revno += 1 |
224 | - self.check_one_rev(rev_id) |
225 | - # check_weaves is done after the revision scan so that |
226 | - # revision index is known to be valid. |
227 | - self.check_weaves() |
228 | + self.progress.update('check', 0, 4) |
229 | + if self.check_repo: |
230 | + self.progress.update('checking revisions', 0) |
231 | + self.check_revisions() |
232 | + self.progress.update('checking commit contents', 1) |
233 | + self.repository._check_inventories(self) |
234 | + self.progress.update('checking file graphs', 2) |
235 | + # check_weaves is done after the revision scan so that |
236 | + # revision index is known to be valid. |
237 | + self.check_weaves() |
238 | + self.progress.update('checking branches and trees', 3) |
239 | + if callback_refs: |
240 | + repo = self.repository |
241 | + # calculate all refs, and callback the objects requesting them. |
242 | + refs = {} |
243 | + wanting_items = set() |
244 | + # Current crude version calculates everything and calls |
245 | + # everything at once. Doing a queue and popping as things are |
246 | + # satisfied would be cheaper on memory [but few people have |
247 | + # huge numbers of working trees today. TODO: fix before |
248 | + # landing]. |
249 | + distances = set() |
250 | + existences = set() |
251 | + for ref, wantlist in callback_refs.iteritems(): |
252 | + wanting_items.update(wantlist) |
253 | + kind, value = ref |
254 | + if kind == 'trees': |
255 | + refs[ref] = repo.revision_tree(value) |
256 | + elif kind == 'lefthand-distance': |
257 | + distances.add(value) |
258 | + elif kind == 'revision-existence': |
259 | + existences.add(value) |
260 | + else: |
261 | + raise AssertionError( |
262 | + 'unknown ref kind for ref %s' % ref) |
263 | + node_distances = repo.get_graph().find_lefthand_distances(distances) |
264 | + for key, distance in node_distances.iteritems(): |
265 | + refs[('lefthand-distance', key)] = distance |
266 | + if key in existences and distance > 0: |
267 | + refs[('revision-existence', key)] = True |
268 | + existences.remove(key) |
269 | + parent_map = repo.get_graph().get_parent_map(existences) |
270 | + for key in parent_map: |
271 | + refs[('revision-existence', key)] = True |
272 | + existences.remove(key) |
273 | + for key in existences: |
274 | + refs[('revision-existence', key)] = False |
275 | + for item in wanting_items: |
276 | + if isinstance(item, WorkingTree): |
277 | + item._check(refs) |
278 | + if isinstance(item, Branch): |
279 | + self.other_results.append(item.check(refs)) |
280 | finally: |
281 | self.progress.finished() |
282 | self.repository.unlock() |
283 | |
284 | - def check_revision_graph(self): |
285 | + def _check_revisions(self, revisions_iterator): |
286 | + """Check revision objects by decorating a generator. |
287 | + |
288 | + :param revisions_iterator: An iterator of(revid, Revision-or-None). |
289 | + :return: A generator of the contents of revisions_iterator. |
290 | + """ |
291 | + self.planned_revisions = set() |
292 | + for revid, revision in revisions_iterator: |
293 | + yield revid, revision |
294 | + self._check_one_rev(revid, revision) |
295 | + # Flatten the revisions we found to guarantee consistent later |
296 | + # iteration. |
297 | + self.planned_revisions = list(self.planned_revisions) |
298 | + # TODO: extract digital signatures as items to callback on too. |
299 | + |
300 | + def check_revisions(self): |
301 | + """Scan revisions, checking data directly available as we go.""" |
302 | + revision_iterator = self.repository._iter_revisions(None) |
303 | + revision_iterator = self._check_revisions(revision_iterator) |
304 | + # We read the all revisions here: |
305 | + # - doing this allows later code to depend on the revision index. |
306 | + # - we can fill out existence flags at this point |
307 | + # - we can read the revision inventory sha at this point |
308 | + # - we can check properties and serialisers etc. |
309 | if not self.repository.revision_graph_can_have_wrong_parents(): |
310 | - # This check is not necessary. |
311 | + # The check against the index isn't needed. |
312 | self.revs_with_bad_parents_in_index = None |
313 | - return |
314 | - bad_revisions = self.repository._find_inconsistent_revision_parents() |
315 | - self.revs_with_bad_parents_in_index = list(bad_revisions) |
316 | - |
317 | - def plan_revisions(self): |
318 | - repository = self.repository |
319 | - self.planned_revisions = repository.all_revision_ids() |
320 | - self.progress.clear() |
321 | - inventoried = set(key[-1] for key in self.inventory_weave.keys()) |
322 | - awol = set(self.planned_revisions) - inventoried |
323 | - if len(awol) > 0: |
324 | - raise BzrCheckError('Stored revisions missing from inventory' |
325 | - '{%s}' % ','.join([f for f in awol])) |
326 | + for thing in revision_iterator: |
327 | + pass |
328 | + else: |
329 | + bad_revisions = self.repository._find_inconsistent_revision_parents( |
330 | + revision_iterator) |
331 | + self.revs_with_bad_parents_in_index = list(bad_revisions) |
332 | |
333 | def report_results(self, verbose): |
334 | + if self.check_repo: |
335 | + self._report_repo_results(verbose) |
336 | + for result in self.other_results: |
337 | + result.report_results(verbose) |
338 | + |
339 | + def _report_repo_results(self, verbose): |
340 | note('checked repository %s format %s', |
341 | self.repository.bzrdir.root_transport, |
342 | self.repository._format) |
343 | note('%6d revisions', self.checked_rev_cnt) |
344 | note('%6d file-ids', len(self.checked_weaves)) |
345 | - note('%6d unique file texts', self.checked_text_cnt) |
346 | - note('%6d repeated file texts', self.repeated_text_cnt) |
347 | - note('%6d unreferenced text versions', |
348 | - len(self.unreferenced_versions)) |
349 | + if verbose: |
350 | + note('%6d unreferenced text versions', |
351 | + len(self.unreferenced_versions)) |
352 | + if verbose and len(self.unreferenced_versions): |
353 | + for file_id, revision_id in self.unreferenced_versions: |
354 | + log_error('unreferenced version: {%s} in %s', revision_id, |
355 | + file_id) |
356 | if self.missing_inventory_sha_cnt: |
357 | note('%6d revisions are missing inventory_sha1', |
358 | self.missing_inventory_sha_cnt) |
359 | @@ -138,10 +222,6 @@ |
360 | note(' %s should be in the ancestry for:', link) |
361 | for linker in linkers: |
362 | note(' * %s', linker) |
363 | - if verbose: |
364 | - for file_id, revision_id in self.unreferenced_versions: |
365 | - log_error('unreferenced version: {%s} in %s', revision_id, |
366 | - file_id) |
367 | if len(self.inconsistent_parents): |
368 | note('%6d inconsistent parents', len(self.inconsistent_parents)) |
369 | if verbose: |
370 | @@ -161,60 +241,75 @@ |
371 | ' %s has wrong parents in index: ' |
372 | '%r should be %r', |
373 | revision_id, index_parents, actual_parents) |
374 | - |
375 | - def check_one_rev(self, rev_id): |
376 | - """Check one revision. |
377 | - |
378 | - rev_id - the one to check |
379 | + for item in self._report_items: |
380 | + note(item) |
381 | + |
382 | + def _check_one_rev(self, rev_id, rev): |
383 | + """Cross-check one revision. |
384 | + |
385 | + :param rev_id: A revision id to check. |
386 | + :param rev: A revision or None to indicate a missing revision. |
387 | """ |
388 | - rev = self.repository.get_revision(rev_id) |
389 | - |
390 | if rev.revision_id != rev_id: |
391 | - raise BzrCheckError('wrong internal revision id in revision {%s}' |
392 | - % rev_id) |
393 | - |
394 | + self._report_items.append( |
395 | + 'Mismatched internal revid {%s} and index revid {%s}' % ( |
396 | + rev.revision_id, rev_id)) |
397 | + rev_id = rev.revision_id |
398 | + # Check this revision tree etc, and count as seen when we encounter a |
399 | + # reference to it. |
400 | + self.planned_revisions.add(rev_id) |
401 | + # It is not a ghost |
402 | + self.ghosts.discard(rev_id) |
403 | + # Count all parents as ghosts if we haven't seen them yet. |
404 | for parent in rev.parent_ids: |
405 | if not parent in self.planned_revisions: |
406 | - # rev has a parent we didn't know about. |
407 | - missing_links = self.missing_parent_links.get(parent, []) |
408 | - missing_links.append(rev_id) |
409 | - self.missing_parent_links[parent] = missing_links |
410 | - # list based so somewhat slow, |
411 | - # TODO have a planned_revisions list and set. |
412 | - if self.repository.has_revision(parent): |
413 | - missing_ancestry = self.repository.get_ancestry(parent) |
414 | - for missing in missing_ancestry: |
415 | - if (missing is not None |
416 | - and missing not in self.planned_revisions): |
417 | - self.planned_revisions.append(missing) |
418 | - else: |
419 | - self.ghosts.append(rev_id) |
420 | - |
421 | - if rev.inventory_sha1: |
422 | - # Loopback - this is currently circular logic as the |
423 | - # knit get_inventory_sha1 call returns rev.inventory_sha1. |
424 | - # Repository.py's get_inventory_sha1 should instead return |
425 | - # inventories.get_record_stream([(revid,)]).next().sha1 or |
426 | - # similar. |
427 | - inv_sha1 = self.repository.get_inventory_sha1(rev_id) |
428 | - if inv_sha1 != rev.inventory_sha1: |
429 | - raise BzrCheckError('Inventory sha1 hash doesn\'t match' |
430 | - ' value in revision {%s}' % rev_id) |
431 | - self._check_revision_tree(rev_id) |
432 | + self.ghosts.add(parent) |
433 | + |
434 | + self.ancestors[rev_id] = tuple(rev.parent_ids) or (NULL_REVISION,) |
435 | + self.add_pending_item(rev_id, ('inventories', rev_id), 'inventory', |
436 | + rev.inventory_sha1) |
437 | self.checked_rev_cnt += 1 |
438 | |
439 | + def add_pending_item(self, referer, key, kind, sha1): |
440 | + """Add a reference to a sha1 to be cross checked against a key. |
441 | + |
442 | + :param referer: The referer that expects key to have sha1. |
443 | + :param key: A storage key e.g. ('texts', 'foo@bar-20040504-1234') |
444 | + :param kind: revision/inventory/text/map/signature |
445 | + :param sha1: A hex sha1 or None if no sha1 is known. |
446 | + """ |
447 | + existing = self.pending_keys.get(key) |
448 | + if existing: |
449 | + if sha1 != existing[1]: |
450 | + self._report_items.append('Multiple expected sha1s for %s. {%s}' |
451 | + ' expects {%s}, {%s} expects {%s}', ( |
452 | + key, referer, sha1, existing[1], existing[0])) |
453 | + else: |
454 | + self.pending_keys[key] = (kind, sha1, referer) |
455 | + |
456 | def check_weaves(self): |
457 | """Check all the weaves we can get our hands on. |
458 | """ |
459 | weave_ids = [] |
460 | - self.progress.update('checking inventory', 0, 2) |
461 | - self.inventory_weave.check(progress_bar=self.progress) |
462 | - self.progress.update('checking text storage', 1, 2) |
463 | - self.repository.texts.check(progress_bar=self.progress) |
464 | - weave_checker = self.repository._get_versioned_file_checker( |
465 | - text_key_references=self.text_key_references) |
466 | + storebar = bzrlib.ui.ui_factory.nested_progress_bar() |
467 | + try: |
468 | + self._check_weaves(storebar) |
469 | + finally: |
470 | + storebar.finished() |
471 | + |
472 | + def _check_weaves(self, storebar): |
473 | + storebar.update('text-index', 0, 2) |
474 | + if self.repository._format.fast_deltas: |
475 | + # We haven't considered every fileid instance so far. |
476 | + weave_checker = self.repository._get_versioned_file_checker( |
477 | + ancestors=self.ancestors) |
478 | + else: |
479 | + weave_checker = self.repository._get_versioned_file_checker( |
480 | + text_key_references=self.text_key_references, |
481 | + ancestors=self.ancestors) |
482 | + storebar.update('file-graph', 1) |
483 | result = weave_checker.check_file_version_parents( |
484 | - self.repository.texts, progress_bar=self.progress) |
485 | + self.repository.texts) |
486 | self.checked_weaves = weave_checker.file_ids |
487 | bad_parents, unused_versions = result |
488 | bad_parents = bad_parents.items() |
489 | @@ -228,28 +323,8 @@ |
490 | (revision_id, weave_id, weave_parents, correct_parents)) |
491 | self.unreferenced_versions.update(unused_versions) |
492 | |
493 | - def _check_revision_tree(self, rev_id): |
494 | - tree = self.repository.revision_tree(rev_id) |
495 | - inv = tree.inventory |
496 | - seen_ids = set() |
497 | - seen_names = set() |
498 | - for path, ie in inv.iter_entries(): |
499 | - self._add_entry_to_text_key_references(inv, ie) |
500 | - file_id = ie.file_id |
501 | - if file_id in seen_ids: |
502 | - raise BzrCheckError('duplicated file_id {%s} ' |
503 | - 'in inventory for revision {%s}' |
504 | - % (file_id, rev_id)) |
505 | - seen_ids.add(file_id) |
506 | - ie.check(self, rev_id, inv, tree) |
507 | - if path in seen_names: |
508 | - raise BzrCheckError('duplicated path %s ' |
509 | - 'in inventory for revision {%s}' |
510 | - % (path, rev_id)) |
511 | - seen_names.add(path) |
512 | - |
513 | def _add_entry_to_text_key_references(self, inv, entry): |
514 | - if not self.rich_roots and entry == inv.root: |
515 | + if not self.rich_roots and entry.name == '': |
516 | return |
517 | key = (entry.file_id, entry.revision) |
518 | self.text_key_references.setdefault(key, False) |
519 | @@ -263,13 +338,14 @@ |
520 | |
521 | Results are reported through logging. |
522 | |
523 | - Deprecated in 1.6. Please use check_branch instead. |
524 | + Deprecated in 1.6. Please use check_dwim instead. |
525 | |
526 | :raise BzrCheckError: if there's a consistency error. |
527 | """ |
528 | check_branch(branch, verbose) |
529 | |
530 | |
531 | +@deprecated_function(deprecated_in((1,16,0))) |
532 | def check_branch(branch, verbose): |
533 | """Run consistency checks on a branch. |
534 | |
535 | @@ -279,56 +355,108 @@ |
536 | """ |
537 | branch.lock_read() |
538 | try: |
539 | - branch_result = branch.check() |
540 | + needed_refs = {} |
541 | + for ref in branch._get_check_refs(): |
542 | + needed_refs.setdefault(ref, []).append(branch) |
543 | + result = branch.repository.check([branch.last_revision()], needed_refs) |
544 | + branch_result = result.other_results[0] |
545 | finally: |
546 | branch.unlock() |
547 | branch_result.report_results(verbose) |
548 | |
549 | |
550 | +def scan_branch(branch, needed_refs, to_unlock): |
551 | + """Scan a branch for refs. |
552 | + |
553 | + :param branch: The branch to schedule for checking. |
554 | + :param needed_refs: Refs we are accumulating. |
555 | + :param to_unlock: The unlock list accumulating. |
556 | + """ |
557 | + note("Checking branch at '%s'." % (branch.base,)) |
558 | + branch.lock_read() |
559 | + to_unlock.append(branch) |
560 | + branch_refs = branch._get_check_refs() |
561 | + for ref in branch_refs: |
562 | + reflist = needed_refs.setdefault(ref, []) |
563 | + reflist.append(branch) |
564 | + |
565 | + |
566 | +def scan_tree(base_tree, tree, needed_refs, to_unlock): |
567 | + """Scan a tree for refs. |
568 | + |
569 | + :param base_tree: The original tree check opened, used to detect duplicate |
570 | + tree checks. |
571 | + :param tree: The tree to schedule for checking. |
572 | + :param needed_refs: Refs we are accumulating. |
573 | + :param to_unlock: The unlock list accumulating. |
574 | + """ |
575 | + if base_tree is not None and tree.basedir == base_tree.basedir: |
576 | + return |
577 | + note("Checking working tree at '%s'." % (tree.basedir,)) |
578 | + tree.lock_read() |
579 | + to_unlock.append(tree) |
580 | + tree_refs = tree._get_check_refs() |
581 | + for ref in tree_refs: |
582 | + reflist = needed_refs.setdefault(ref, []) |
583 | + reflist.append(tree) |
584 | + |
585 | + |
586 | def check_dwim(path, verbose, do_branch=False, do_repo=False, do_tree=False): |
587 | try: |
588 | - tree, branch, repo, relpath = \ |
589 | + base_tree, branch, repo, relpath = \ |
590 | BzrDir.open_containing_tree_branch_or_repository(path) |
591 | except errors.NotBranchError: |
592 | - tree = branch = repo = None |
593 | - |
594 | - if do_tree: |
595 | - if tree is not None: |
596 | - note("Checking working tree at '%s'." |
597 | - % (tree.bzrdir.root_transport.base,)) |
598 | - tree._check() |
599 | - else: |
600 | - log_error("No working tree found at specified location.") |
601 | - |
602 | - if branch is not None: |
603 | - # We have a branch |
604 | - if repo is None: |
605 | - # The branch is in a shared repository |
606 | - repo = branch.repository |
607 | - branches = [branch] |
608 | - elif repo is not None: |
609 | - branches = repo.find_branches(using=True) |
610 | - |
611 | - if repo is not None: |
612 | - repo.lock_read() |
613 | - try: |
614 | - if do_repo: |
615 | - note("Checking repository at '%s'." |
616 | - % (repo.bzrdir.root_transport.base,)) |
617 | - result = repo.check() |
618 | + base_tree = branch = repo = None |
619 | + |
620 | + to_unlock = [] |
621 | + needed_refs= {} |
622 | + try: |
623 | + if base_tree is not None: |
624 | + # If the tree is a lightweight checkout we won't see it in |
625 | + # repo.find_branches - add now. |
626 | + if do_tree: |
627 | + scan_tree(None, base_tree, needed_refs, to_unlock) |
628 | + branch = base_tree.branch |
629 | + if branch is not None: |
630 | + # We have a branch |
631 | + if repo is None: |
632 | + # The branch is in a shared repository |
633 | + repo = branch.repository |
634 | + if repo is not None: |
635 | + repo.lock_read() |
636 | + to_unlock.append(repo) |
637 | + branches = repo.find_branches(using=True) |
638 | + saw_tree = False |
639 | + if do_branch or do_tree: |
640 | + for branch in branches: |
641 | + if do_tree: |
642 | + try: |
643 | + tree = branch.bzrdir.open_workingtree() |
644 | + saw_tree = True |
645 | + except (errors.NotLocalUrl, errors.NoWorkingTree): |
646 | + pass |
647 | + else: |
648 | + scan_tree(base_tree, tree, needed_refs, to_unlock) |
649 | + if do_branch: |
650 | + scan_branch(branch, needed_refs, to_unlock) |
651 | + if do_branch and not branches: |
652 | + log_error("No branch found at specified location.") |
653 | + if do_tree and base_tree is None and not saw_tree: |
654 | + log_error("No working tree found at specified location.") |
655 | + if do_repo or do_branch or do_tree: |
656 | + if do_repo: |
657 | + note("Checking repository at '%s'." |
658 | + % (repo.bzrdir.root_transport.base,)) |
659 | + result = repo.check(None, callback_refs=needed_refs, |
660 | + check_repo=do_repo) |
661 | result.report_results(verbose) |
662 | + else: |
663 | + if do_tree: |
664 | + log_error("No working tree found at specified location.") |
665 | if do_branch: |
666 | - if branches == []: |
667 | - log_error("No branch found at specified location.") |
668 | - else: |
669 | - for branch in branches: |
670 | - note("Checking branch at '%s'." |
671 | - % (branch.bzrdir.root_transport.base,)) |
672 | - check_branch(branch, verbose) |
673 | - finally: |
674 | - repo.unlock() |
675 | - else: |
676 | - if do_branch: |
677 | - log_error("No branch found at specified location.") |
678 | - if do_repo: |
679 | - log_error("No repository found at specified location.") |
680 | + log_error("No branch found at specified location.") |
681 | + if do_repo: |
682 | + log_error("No repository found at specified location.") |
683 | + finally: |
684 | + for thing in to_unlock: |
685 | + thing.unlock() |
686 | |
687 | === modified file 'bzrlib/graph.py' |
688 | --- bzrlib/graph.py 2009-06-10 03:56:49 +0000 |
689 | +++ bzrlib/graph.py 2009-06-16 05:29:42 +0000 |
690 | @@ -311,6 +311,27 @@ |
691 | # get there. |
692 | return known_revnos[cur_tip] + num_steps |
693 | |
694 | + def find_lefthand_distances(self, keys): |
695 | + """Find the distance to null for all the keys in keys. |
696 | + |
697 | + :param keys: keys to lookup. |
698 | + :return: A dict key->distance for all of keys. |
699 | + """ |
700 | + # Optimisable by concurrent searching, but a random spread should get |
701 | + # some sort of hit rate. |
702 | + result = {} |
703 | + known_revnos = [] |
704 | + ghosts = [] |
705 | + for key in keys: |
706 | + try: |
707 | + known_revnos.append( |
708 | + (key, self.find_distance_to_null(key, known_revnos))) |
709 | + except errors.GhostRevisionsHaveNoRevno: |
710 | + ghosts.append(key) |
711 | + for key in ghosts: |
712 | + known_revnos.append((key, -1)) |
713 | + return dict(known_revnos) |
714 | + |
715 | def find_unique_ancestors(self, unique_revision, common_revisions): |
716 | """Find the unique ancestors for a revision versus others. |
717 | |
718 | |
719 | === modified file 'bzrlib/groupcompress.py' |
720 | --- bzrlib/groupcompress.py 2009-06-10 03:56:49 +0000 |
721 | +++ bzrlib/groupcompress.py 2009-06-16 05:29:42 +0000 |
722 | @@ -1049,11 +1049,14 @@ |
723 | reannotate(parent_lines, lines, key, None, head_cache)) |
724 | return parent_cache[key] |
725 | |
726 | - def check(self, progress_bar=None): |
727 | + def check(self, progress_bar=None, keys=None): |
728 | """See VersionedFiles.check().""" |
729 | - keys = self.keys() |
730 | - for record in self.get_record_stream(keys, 'unordered', True): |
731 | - record.get_bytes_as('fulltext') |
732 | + if keys is None: |
733 | + keys = self.keys() |
734 | + for record in self.get_record_stream(keys, 'unordered', True): |
735 | + record.get_bytes_as('fulltext') |
736 | + else: |
737 | + return self.get_record_stream(keys, 'unordered', True) |
738 | |
739 | def _check_add(self, key, lines, random_id, check_content): |
740 | """check that version_id and lines are safe to add.""" |
741 | |
742 | === modified file 'bzrlib/help_topics/en/debug-flags.txt' |
743 | --- bzrlib/help_topics/en/debug-flags.txt 2009-03-18 23:43:51 +0000 |
744 | +++ bzrlib/help_topics/en/debug-flags.txt 2009-05-12 06:30:56 +0000 |
745 | @@ -19,6 +19,7 @@ |
746 | -Dindex Trace major index operations. |
747 | -Dknit Trace knit operations. |
748 | -Dlock Trace when lockdir locks are taken or released. |
749 | +-Dprogress Trace progress bar operations. |
750 | -Dmerge Emit information for debugging merges. |
751 | -Dpack Emit information about pack operations. |
752 | -Dsftp Trace SFTP internals. |
753 | |
754 | === modified file 'bzrlib/inventory.py' |
755 | --- bzrlib/inventory.py 2009-06-10 03:56:49 +0000 |
756 | +++ bzrlib/inventory.py 2009-06-16 05:29:42 +0000 |
757 | @@ -262,7 +262,7 @@ |
758 | def versionable_kind(kind): |
759 | return (kind in ('file', 'directory', 'symlink', 'tree-reference')) |
760 | |
761 | - def check(self, checker, rev_id, inv, tree): |
762 | + def check(self, checker, rev_id, inv): |
763 | """Check this inventory entry is intact. |
764 | |
765 | This is a template method, override _check for kind specific |
766 | @@ -274,18 +274,18 @@ |
767 | :param rev_id: Revision id from which this InventoryEntry was loaded. |
768 | Not necessarily the last-changed revision for this file. |
769 | :param inv: Inventory from which the entry was loaded. |
770 | - :param tree: RevisionTree for this entry. |
771 | """ |
772 | if self.parent_id is not None: |
773 | if not inv.has_id(self.parent_id): |
774 | raise BzrCheckError('missing parent {%s} in inventory for revision {%s}' |
775 | % (self.parent_id, rev_id)) |
776 | - self._check(checker, rev_id, tree) |
777 | + checker._add_entry_to_text_key_references(inv, self) |
778 | + self._check(checker, rev_id) |
779 | |
780 | - def _check(self, checker, rev_id, tree): |
781 | + def _check(self, checker, rev_id): |
782 | """Check this inventory entry for kind specific errors.""" |
783 | - raise BzrCheckError('unknown entry kind %r in revision {%s}' % |
784 | - (self.kind, rev_id)) |
785 | + checker._report_items.append( |
786 | + 'unknown entry kind %r in revision {%s}' % (self.kind, rev_id)) |
787 | |
788 | def copy(self): |
789 | """Clone this inventory entry.""" |
790 | @@ -404,7 +404,7 @@ |
791 | 'text_id', 'parent_id', 'children', 'executable', |
792 | 'revision', 'symlink_target', 'reference_revision'] |
793 | |
794 | - def _check(self, checker, rev_id, tree): |
795 | + def _check(self, checker, rev_id): |
796 | """See InventoryEntry._check""" |
797 | |
798 | def __init__(self, file_id): |
799 | @@ -433,11 +433,16 @@ |
800 | 'text_id', 'parent_id', 'children', 'executable', |
801 | 'revision', 'symlink_target', 'reference_revision'] |
802 | |
803 | - def _check(self, checker, rev_id, tree): |
804 | + def _check(self, checker, rev_id): |
805 | """See InventoryEntry._check""" |
806 | - if self.text_sha1 is not None or self.text_size is not None or self.text_id is not None: |
807 | - raise BzrCheckError('directory {%s} has text in revision {%s}' |
808 | + if (self.text_sha1 is not None or self.text_size is not None or |
809 | + self.text_id is not None): |
810 | + checker._report_items.append('directory {%s} has text in revision {%s}' |
811 | % (self.file_id, rev_id)) |
812 | + # Directories are stored as ''. |
813 | + checker.add_pending_item(rev_id, |
814 | + ('texts', self.file_id, self.revision), 'text', |
815 | + 'da39a3ee5e6b4b0d3255bfef95601890afd80709') |
816 | |
817 | def copy(self): |
818 | other = InventoryDirectory(self.file_id, self.name, self.parent_id) |
819 | @@ -476,27 +481,16 @@ |
820 | 'text_id', 'parent_id', 'children', 'executable', |
821 | 'revision', 'symlink_target', 'reference_revision'] |
822 | |
823 | - def _check(self, checker, tree_revision_id, tree): |
824 | + def _check(self, checker, tree_revision_id): |
825 | """See InventoryEntry._check""" |
826 | - key = (self.file_id, self.revision) |
827 | - if key in checker.checked_texts: |
828 | - prev_sha = checker.checked_texts[key] |
829 | - if prev_sha != self.text_sha1: |
830 | - raise BzrCheckError( |
831 | - 'mismatched sha1 on {%s} in {%s} (%s != %s) %r' % |
832 | - (self.file_id, tree_revision_id, prev_sha, self.text_sha1, |
833 | - t)) |
834 | - else: |
835 | - checker.repeated_text_cnt += 1 |
836 | - return |
837 | - |
838 | - checker.checked_text_cnt += 1 |
839 | - # We can't check the length, because Weave doesn't store that |
840 | - # information, and the whole point of looking at the weave's |
841 | - # sha1sum is that we don't have to extract the text. |
842 | - if (self.text_sha1 != tree._repository.texts.get_sha1s([key])[key]): |
843 | - raise BzrCheckError('text {%s} version {%s} wrong sha1' % key) |
844 | - checker.checked_texts[key] = self.text_sha1 |
845 | + # TODO: check size too. |
846 | + checker.add_pending_item(tree_revision_id, |
847 | + ('texts', self.file_id, self.revision), 'text', |
848 | + self.text_sha1) |
849 | + if self.text_size is None: |
850 | + checker._report_items.append( |
851 | + 'fileid {%s} in {%s} has None for text_size' % (self.file_id, |
852 | + tree_revision_id)) |
853 | |
854 | def copy(self): |
855 | other = InventoryFile(self.file_id, self.name, self.parent_id) |
856 | @@ -600,14 +594,20 @@ |
857 | 'text_id', 'parent_id', 'children', 'executable', |
858 | 'revision', 'symlink_target', 'reference_revision'] |
859 | |
860 | - def _check(self, checker, rev_id, tree): |
861 | + def _check(self, checker, rev_id): |
862 | """See InventoryEntry._check""" |
863 | if self.text_sha1 is not None or self.text_size is not None or self.text_id is not None: |
864 | - raise BzrCheckError('symlink {%s} has text in revision {%s}' |
865 | + checker._report_items.append( |
866 | + 'symlink {%s} has text in revision {%s}' |
867 | % (self.file_id, rev_id)) |
868 | if self.symlink_target is None: |
869 | - raise BzrCheckError('symlink {%s} has no target in revision {%s}' |
870 | + checker._report_items.append( |
871 | + 'symlink {%s} has no target in revision {%s}' |
872 | % (self.file_id, rev_id)) |
873 | + # Symlinks are stored as '' |
874 | + checker.add_pending_item(tree_revision_id, |
875 | + ('texts', self.file_id, self.revision), 'text', |
876 | + 'da39a3ee5e6b4b0d3255bfef95601890afd80709') |
877 | |
878 | def copy(self): |
879 | other = InventoryLink(self.file_id, self.name, self.parent_id) |
880 | |
881 | === modified file 'bzrlib/knit.py' |
882 | --- bzrlib/knit.py 2009-06-10 03:56:49 +0000 |
883 | +++ bzrlib/knit.py 2009-06-16 05:29:42 +0000 |
884 | @@ -1005,8 +1005,15 @@ |
885 | """See VersionedFiles.annotate.""" |
886 | return self._factory.annotate(self, key) |
887 | |
888 | - def check(self, progress_bar=None): |
889 | + def check(self, progress_bar=None, keys=None): |
890 | """See VersionedFiles.check().""" |
891 | + if keys is None: |
892 | + return self._logical_check() |
893 | + else: |
894 | + # At the moment, check does not extra work over get_record_stream |
895 | + return self.get_record_stream(keys, 'unordered', True) |
896 | + |
897 | + def _logical_check(self): |
898 | # This doesn't actually test extraction of everything, but that will |
899 | # impact 'bzr check' substantially, and needs to be integrated with |
900 | # care. However, it does check for the obvious problem of a delta with |
901 | |
902 | === modified file 'bzrlib/remote.py' |
903 | --- bzrlib/remote.py 2009-06-11 09:11:21 +0000 |
904 | +++ bzrlib/remote.py 2009-06-16 05:29:42 +0000 |
905 | @@ -1414,9 +1414,10 @@ |
906 | return self._real_repository.get_revision_reconcile(revision_id) |
907 | |
908 | @needs_read_lock |
909 | - def check(self, revision_ids=None): |
910 | + def check(self, revision_ids=None, callback_refs=None): |
911 | self._ensure_real() |
912 | - return self._real_repository.check(revision_ids=revision_ids) |
913 | + return self._real_repository.check(revision_ids=revision_ids, |
914 | + callback_refs=callback_refs) |
915 | |
916 | def copy_content_into(self, destination, revision_id=None): |
917 | self._ensure_real() |
918 | |
919 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' |
920 | --- bzrlib/repofmt/groupcompress_repo.py 2009-06-12 01:11:00 +0000 |
921 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-06-16 05:29:42 +0000 |
922 | @@ -718,6 +718,10 @@ |
923 | finally: |
924 | basis_tree.unlock() |
925 | |
926 | + def deserialise_inventory(self, revision_id, bytes): |
927 | + return inventory.CHKInventory.deserialise(self.chk_bytes, bytes, |
928 | + (revision_id,)) |
929 | + |
930 | def _iter_inventories(self, revision_ids): |
931 | """Iterate over many inventory objects.""" |
932 | keys = [(revision_id,) for revision_id in revision_ids] |
933 | |
934 | === modified file 'bzrlib/repofmt/knitrepo.py' |
935 | --- bzrlib/repofmt/knitrepo.py 2009-06-10 03:56:49 +0000 |
936 | +++ bzrlib/repofmt/knitrepo.py 2009-06-16 05:29:42 +0000 |
937 | @@ -229,24 +229,29 @@ |
938 | def _make_parents_provider(self): |
939 | return _KnitsParentsProvider(self.revisions) |
940 | |
941 | - def _find_inconsistent_revision_parents(self): |
942 | + def _find_inconsistent_revision_parents(self, revisions_iterator=None): |
943 | """Find revisions with different parent lists in the revision object |
944 | and in the index graph. |
945 | |
946 | + :param revisions_iterator: None, or an iterator of (revid, |
947 | + Revision-or-None). This iterator controls the revisions checked. |
948 | :returns: an iterator yielding tuples of (revison-id, parents-in-index, |
949 | parents-in-revision). |
950 | """ |
951 | if not self.is_locked(): |
952 | raise AssertionError() |
953 | vf = self.revisions |
954 | - for index_version in vf.keys(): |
955 | - parent_map = vf.get_parent_map([index_version]) |
956 | + if revisions_iterator is None: |
957 | + revisions_iterator = self._iter_revisions(None) |
958 | + for revid, revision in revisions_iterator: |
959 | + if revision is None: |
960 | + pass |
961 | + parent_map = vf.get_parent_map([(revid,)]) |
962 | parents_according_to_index = tuple(parent[-1] for parent in |
963 | - parent_map[index_version]) |
964 | - revision = self.get_revision(index_version[-1]) |
965 | + parent_map[(revid,)]) |
966 | parents_according_to_revision = tuple(revision.parent_ids) |
967 | if parents_according_to_index != parents_according_to_revision: |
968 | - yield (index_version[-1], parents_according_to_index, |
969 | + yield (revid, parents_according_to_index, |
970 | parents_according_to_revision) |
971 | |
972 | def _check_for_inconsistent_revision_parents(self): |
973 | |
974 | === modified file 'bzrlib/repofmt/pack_repo.py' |
975 | --- bzrlib/repofmt/pack_repo.py 2009-06-10 03:56:49 +0000 |
976 | +++ bzrlib/repofmt/pack_repo.py 2009-06-16 05:29:42 +0000 |
977 | @@ -2219,52 +2219,6 @@ |
978 | self.revisions._index._key_dependencies.refs.clear() |
979 | self._pack_collection._abort_write_group() |
980 | |
981 | - def _find_inconsistent_revision_parents(self): |
982 | - """Find revisions with incorrectly cached parents. |
983 | - |
984 | - :returns: an iterator yielding tuples of (revison-id, parents-in-index, |
985 | - parents-in-revision). |
986 | - """ |
987 | - if not self.is_locked(): |
988 | - raise errors.ObjectNotLocked(self) |
989 | - pb = ui.ui_factory.nested_progress_bar() |
990 | - result = [] |
991 | - try: |
992 | - revision_nodes = self._pack_collection.revision_index \ |
993 | - .combined_index.iter_all_entries() |
994 | - index_positions = [] |
995 | - # Get the cached index values for all revisions, and also the |
996 | - # location in each index of the revision text so we can perform |
997 | - # linear IO. |
998 | - for index, key, value, refs in revision_nodes: |
999 | - node = (index, key, value, refs) |
1000 | - index_memo = self.revisions._index._node_to_position(node) |
1001 | - if index_memo[0] != index: |
1002 | - raise AssertionError('%r != %r' % (index_memo[0], index)) |
1003 | - index_positions.append((index_memo, key[0], |
1004 | - tuple(parent[0] for parent in refs[0]))) |
1005 | - pb.update("Reading revision index", 0, 0) |
1006 | - index_positions.sort() |
1007 | - batch_size = 1000 |
1008 | - pb.update("Checking cached revision graph", 0, |
1009 | - len(index_positions)) |
1010 | - for offset in xrange(0, len(index_positions), 1000): |
1011 | - pb.update("Checking cached revision graph", offset) |
1012 | - to_query = index_positions[offset:offset + batch_size] |
1013 | - if not to_query: |
1014 | - break |
1015 | - rev_ids = [item[1] for item in to_query] |
1016 | - revs = self.get_revisions(rev_ids) |
1017 | - for revision, item in zip(revs, to_query): |
1018 | - index_parents = item[2] |
1019 | - rev_parents = tuple(revision.parent_ids) |
1020 | - if index_parents != rev_parents: |
1021 | - result.append((revision.revision_id, index_parents, |
1022 | - rev_parents)) |
1023 | - finally: |
1024 | - pb.finished() |
1025 | - return result |
1026 | - |
1027 | def _make_parents_provider(self): |
1028 | return graph.CachingParentsProvider(self) |
1029 | |
1030 | |
1031 | === modified file 'bzrlib/repository.py' |
1032 | --- bzrlib/repository.py 2009-06-12 01:11:00 +0000 |
1033 | +++ bzrlib/repository.py 2009-06-16 05:29:42 +0000 |
1034 | @@ -1154,6 +1154,119 @@ |
1035 | # The old API returned a list, should this actually be a set? |
1036 | return parent_map.keys() |
1037 | |
1038 | + def _check_inventories(self, checker): |
1039 | + """Check the inventories found from the revision scan. |
1040 | + |
1041 | + This is responsible for verifying the sha1 of inventories and |
1042 | + creating a pending_keys set that covers data referenced by inventories. |
1043 | + """ |
1044 | + bar = ui.ui_factory.nested_progress_bar() |
1045 | + try: |
1046 | + self._do_check_inventories(checker, bar) |
1047 | + finally: |
1048 | + bar.finished() |
1049 | + |
1050 | + def _do_check_inventories(self, checker, bar): |
1051 | + """Helper for _check_inventories.""" |
1052 | + revno = 0 |
1053 | + keys = {'chk_bytes':set(), 'inventories':set(), 'texts':set()} |
1054 | + kinds = ['chk_bytes', 'texts'] |
1055 | + count = len(checker.pending_keys) |
1056 | + bar.update("inventories", 0, 2) |
1057 | + current_keys = checker.pending_keys |
1058 | + checker.pending_keys = {} |
1059 | + # Accumulate current checks. |
1060 | + for key in current_keys: |
1061 | + if key[0] != 'inventories' and key[0] not in kinds: |
1062 | + checker._report_items.append('unknown key type %r' % (key,)) |
1063 | + keys[key[0]].add(key[1:]) |
1064 | + if keys['inventories']: |
1065 | + # NB: output order *should* be roughly sorted - topo or |
1066 | + # inverse topo depending on repository - either way decent |
1067 | + # to just delta against. However, pre-CHK formats didn't |
1068 | + # try to optimise inventory layout on disk. As such the |
1069 | + # pre-CHK code path does not use inventory deltas. |
1070 | + last_object = None |
1071 | + for record in self.inventories.check(keys=keys['inventories']): |
1072 | + if record.storage_kind == 'absent': |
1073 | + checker._report_items.append( |
1074 | + 'Missing inventory {%s}' % (record.key,)) |
1075 | + else: |
1076 | + last_object = self._check_record('inventories', record, |
1077 | + checker, last_object, |
1078 | + current_keys[('inventories',) + record.key]) |
1079 | + del keys['inventories'] |
1080 | + else: |
1081 | + return |
1082 | + bar.update("texts", 1) |
1083 | + while (checker.pending_keys or keys['chk_bytes'] |
1084 | + or keys['texts']): |
1085 | + # Something to check. |
1086 | + current_keys = checker.pending_keys |
1087 | + checker.pending_keys = {} |
1088 | + # Accumulate current checks. |
1089 | + for key in current_keys: |
1090 | + if key[0] not in kinds: |
1091 | + checker._report_items.append('unknown key type %r' % (key,)) |
1092 | + keys[key[0]].add(key[1:]) |
1093 | + # Check the outermost kind only - inventories || chk_bytes || texts |
1094 | + for kind in kinds: |
1095 | + if keys[kind]: |
1096 | + last_object = None |
1097 | + for record in getattr(self, kind).check(keys=keys[kind]): |
1098 | + if record.storage_kind == 'absent': |
1099 | + checker._report_items.append( |
1100 | + 'Missing inventory {%s}' % (record.key,)) |
1101 | + else: |
1102 | + last_object = self._check_record(kind, record, |
1103 | + checker, last_object, current_keys[(kind,) + record.key]) |
1104 | + keys[kind] = set() |
1105 | + break |
1106 | + |
1107 | + def _check_record(self, kind, record, checker, last_object, item_data): |
1108 | + """Check a single text from this repository.""" |
1109 | + if kind == 'inventories': |
1110 | + rev_id = record.key[0] |
1111 | + inv = self.deserialise_inventory(rev_id, |
1112 | + record.get_bytes_as('fulltext')) |
1113 | + if last_object is not None: |
1114 | + delta = inv._make_delta(last_object) |
1115 | + for old_path, path, file_id, ie in delta: |
1116 | + if ie is None: |
1117 | + continue |
1118 | + ie.check(checker, rev_id, inv) |
1119 | + else: |
1120 | + for path, ie in inv.iter_entries(): |
1121 | + ie.check(checker, rev_id, inv) |
1122 | + if self._format.fast_deltas: |
1123 | + return inv |
1124 | + elif kind == 'chk_bytes': |
1125 | + # No code written to check chk_bytes for this repo format. |
1126 | + checker._report_items.append( |
1127 | + 'unsupported key type chk_bytes for %s' % (record.key,)) |
1128 | + elif kind == 'texts': |
1129 | + self._check_text(record, checker, item_data) |
1130 | + else: |
1131 | + checker._report_items.append( |
1132 | + 'unknown key type %s for %s' % (kind, record.key)) |
1133 | + |
1134 | + def _check_text(self, record, checker, item_data): |
1135 | + """Check a single text.""" |
1136 | + # Check it is extractable. |
1137 | + # TODO: check length. |
1138 | + if record.storage_kind == 'chunked': |
1139 | + chunks = record.get_bytes_as(record.storage_kind) |
1140 | + sha1 = osutils.sha_strings(chunks) |
1141 | + length = sum(map(len, chunks)) |
1142 | + else: |
1143 | + content = record.get_bytes_as('fulltext') |
1144 | + sha1 = osutils.sha_string(content) |
1145 | + length = len(content) |
1146 | + if item_data and sha1 != item_data[1]: |
1147 | + checker._report_items.append( |
1148 | + 'sha1 mismatch: %s has sha1 %s expected %s referenced by %s' % |
1149 | + (record.key, sha1, item_data[1], item_data[2])) |
1150 | + |
1151 | @staticmethod |
1152 | def create(a_bzrdir): |
1153 | """Construct the current default format repository in a_bzrdir.""" |
1154 | @@ -1700,25 +1813,49 @@ |
1155 | |
1156 | @needs_read_lock |
1157 | def get_revisions(self, revision_ids): |
1158 | - """Get many revisions at once.""" |
1159 | + """Get many revisions at once. |
1160 | + |
1161 | + Repositories that need to check data on every revision read should |
1162 | + subclass this method. |
1163 | + """ |
1164 | return self._get_revisions(revision_ids) |
1165 | |
1166 | @needs_read_lock |
1167 | def _get_revisions(self, revision_ids): |
1168 | """Core work logic to get many revisions without sanity checks.""" |
1169 | - for rev_id in revision_ids: |
1170 | - if not rev_id or not isinstance(rev_id, basestring): |
1171 | - raise errors.InvalidRevisionId(revision_id=rev_id, branch=self) |
1172 | + revs = {} |
1173 | + for revid, rev in self._iter_revisions(revision_ids): |
1174 | + if rev is None: |
1175 | + raise errors.NoSuchRevision(self, revid) |
1176 | + revs[revid] = rev |
1177 | + return [revs[revid] for revid in revision_ids] |
1178 | + |
1179 | + def _iter_revisions(self, revision_ids): |
1180 | + """Iterate over revision objects. |
1181 | + |
1182 | + :param revision_ids: An iterable of revisions to examine. None may be |
1183 | + passed to request all revisions known to the repository. Note that |
1184 | + not all repositories can find unreferenced revisions; for those |
1185 | + repositories only referenced ones will be returned. |
1186 | + :return: An iterator of (revid, revision) tuples. Absent revisions ( |
1187 | + those asked for but not available) are returned as (revid, None). |
1188 | + """ |
1189 | + if revision_ids is None: |
1190 | + revision_ids = self.all_revision_ids() |
1191 | + else: |
1192 | + for rev_id in revision_ids: |
1193 | + if not rev_id or not isinstance(rev_id, basestring): |
1194 | + raise errors.InvalidRevisionId(revision_id=rev_id, branch=self) |
1195 | keys = [(key,) for key in revision_ids] |
1196 | stream = self.revisions.get_record_stream(keys, 'unordered', True) |
1197 | - revs = {} |
1198 | for record in stream: |
1199 | + revid = record.key[0] |
1200 | if record.storage_kind == 'absent': |
1201 | - raise errors.NoSuchRevision(self, record.key[0]) |
1202 | - text = record.get_bytes_as('fulltext') |
1203 | - rev = self._serializer.read_revision_from_string(text) |
1204 | - revs[record.key[0]] = rev |
1205 | - return [revs[revid] for revid in revision_ids] |
1206 | + yield (revid, None) |
1207 | + else: |
1208 | + text = record.get_bytes_as('fulltext') |
1209 | + rev = self._serializer.read_revision_from_string(text) |
1210 | + yield (revid, rev) |
1211 | |
1212 | @needs_read_lock |
1213 | def get_revision_xml(self, revision_id): |
1214 | @@ -2056,8 +2193,7 @@ |
1215 | batch_size] |
1216 | if not to_query: |
1217 | break |
1218 | - for rev_tree in self.revision_trees(to_query): |
1219 | - revision_id = rev_tree.get_revision_id() |
1220 | + for revision_id in to_query: |
1221 | parent_ids = ancestors[revision_id] |
1222 | for text_key in revision_keys[revision_id]: |
1223 | pb.update("Calculating text parents", processed_texts) |
1224 | @@ -2417,7 +2553,8 @@ |
1225 | [parents_provider, other_repository._make_parents_provider()]) |
1226 | return graph.Graph(parents_provider) |
1227 | |
1228 | - def _get_versioned_file_checker(self, text_key_references=None): |
1229 | + def _get_versioned_file_checker(self, text_key_references=None, |
1230 | + ancestors=None): |
1231 | """Return an object suitable for checking versioned files. |
1232 | |
1233 | :param text_key_references: if non-None, an already built |
1234 | @@ -2425,9 +2562,12 @@ |
1235 | to whether they were referred to by the inventory of the |
1236 | revision_id that they contain. If None, this will be |
1237 | calculated. |
1238 | + :param ancestors: Optional result from |
1239 | + self.get_graph().get_parent_map(self.all_revision_ids()) if already |
1240 | + available. |
1241 | """ |
1242 | return _VersionedFileChecker(self, |
1243 | - text_key_references=text_key_references) |
1244 | + text_key_references=text_key_references, ancestors=ancestors) |
1245 | |
1246 | def revision_ids_to_search_result(self, result_set): |
1247 | """Convert a set of revision ids to a graph SearchResult.""" |
1248 | @@ -2483,19 +2623,25 @@ |
1249 | return record.get_bytes_as('fulltext') |
1250 | |
1251 | @needs_read_lock |
1252 | - def check(self, revision_ids=None): |
1253 | + def check(self, revision_ids=None, callback_refs=None, check_repo=True): |
1254 | """Check consistency of all history of given revision_ids. |
1255 | |
1256 | Different repository implementations should override _check(). |
1257 | |
1258 | :param revision_ids: A non-empty list of revision_ids whose ancestry |
1259 | will be checked. Typically the last revision_id of a branch. |
1260 | + :param callback_refs: A dict of check-refs to resolve and callback |
1261 | + the check/_check method on the items listed as wanting the ref. |
1262 | + see bzrlib.check. |
1263 | + :param check_repo: If False do not check the repository contents, just |
1264 | + calculate the data callback_refs requires and call them back. |
1265 | """ |
1266 | - return self._check(revision_ids) |
1267 | + return self._check(revision_ids, callback_refs=callback_refs, |
1268 | + check_repo=check_repo) |
1269 | |
1270 | - def _check(self, revision_ids): |
1271 | - result = check.Check(self) |
1272 | - result.check() |
1273 | + def _check(self, revision_ids, callback_refs, check_repo): |
1274 | + result = check.Check(self, check_repo=check_repo) |
1275 | + result.check(callback_refs) |
1276 | return result |
1277 | |
1278 | def _warn_if_deprecated(self): |
1279 | @@ -3923,10 +4069,10 @@ |
1280 | |
1281 | class _VersionedFileChecker(object): |
1282 | |
1283 | - def __init__(self, repository, text_key_references=None): |
1284 | + def __init__(self, repository, text_key_references=None, ancestors=None): |
1285 | self.repository = repository |
1286 | self.text_index = self.repository._generate_text_key_index( |
1287 | - text_key_references=text_key_references) |
1288 | + text_key_references=text_key_references, ancestors=ancestors) |
1289 | |
1290 | def calculate_file_version_parents(self, text_key): |
1291 | """Calculate the correct parents for a file version according to |
1292 | @@ -3950,6 +4096,18 @@ |
1293 | revision_id) tuples for versions that are present in this versioned |
1294 | file, but not used by the corresponding inventory. |
1295 | """ |
1296 | + local_progress = None |
1297 | + if progress_bar is None: |
1298 | + local_progress = ui.ui_factory.nested_progress_bar() |
1299 | + progress_bar = local_progress |
1300 | + try: |
1301 | + return self._check_file_version_parents(texts, progress_bar) |
1302 | + finally: |
1303 | + if local_progress: |
1304 | + local_progress.finished() |
1305 | + |
1306 | + def _check_file_version_parents(self, texts, progress_bar): |
1307 | + """See check_file_version_parents.""" |
1308 | wrong_parents = {} |
1309 | self.file_ids = set([file_id for file_id, _ in |
1310 | self.text_index.iterkeys()]) |
1311 | @@ -3964,8 +4122,7 @@ |
1312 | text_keys = self.repository.texts.keys() |
1313 | unused_keys = frozenset(text_keys) - set(self.text_index) |
1314 | for num, key in enumerate(self.text_index.iterkeys()): |
1315 | - if progress_bar is not None: |
1316 | - progress_bar.update('checking text graph', num, n_versions) |
1317 | + progress_bar.update('checking text graph', num, n_versions) |
1318 | correct_parents = self.calculate_file_version_parents(key) |
1319 | try: |
1320 | knit_parents = parent_map[key] |
1321 | |
1322 | === modified file 'bzrlib/smart/medium.py' |
1323 | --- bzrlib/smart/medium.py 2009-06-10 03:56:49 +0000 |
1324 | +++ bzrlib/smart/medium.py 2009-06-16 05:29:42 +0000 |
1325 | @@ -37,7 +37,6 @@ |
1326 | from bzrlib import ( |
1327 | debug, |
1328 | errors, |
1329 | - osutils, |
1330 | symbol_versioning, |
1331 | trace, |
1332 | ui, |
1333 | @@ -46,7 +45,8 @@ |
1334 | from bzrlib.smart import client, protocol, request, vfs |
1335 | from bzrlib.transport import ssh |
1336 | """) |
1337 | - |
1338 | +#usually already imported, and getting IllegalScoperReplacer on it here. |
1339 | +from bzrlib import osutils |
1340 | |
1341 | # We must not read any more than 64k at a time so we don't risk "no buffer |
1342 | # space available" errors on some platforms. Windows in particular is likely |
1343 | |
1344 | === modified file 'bzrlib/tests/blackbox/test_check.py' |
1345 | --- bzrlib/tests/blackbox/test_check.py 2009-03-23 14:59:43 +0000 |
1346 | +++ bzrlib/tests/blackbox/test_check.py 2009-05-12 07:27:33 +0000 |
1347 | @@ -34,23 +34,21 @@ |
1348 | tree = self.make_branch_and_tree('.') |
1349 | tree.commit('hallelujah') |
1350 | out, err = self.run_bzr('check') |
1351 | - self.assertContainsRe(err, r"^Checking working tree at '.*'\.\n" |
1352 | - r"Checking repository at '.*'\.\n" |
1353 | - r"checked repository.*\n" |
1354 | + self.assertContainsRe(err, r"Checking working tree at '.*'\.\n") |
1355 | + self.assertContainsRe(err, r"Checking repository at '.*'\.\n") |
1356 | + self.assertContainsRe(err, r"checked repository.*\n" |
1357 | r" 1 revisions\n" |
1358 | r" 0 file-ids\n" |
1359 | - r" 0 unique file texts\n" |
1360 | - r" 0 repeated file texts\n" |
1361 | - r" 0 unreferenced text versions\n" |
1362 | - r"Checking branch at '.*'\.\n" |
1363 | - r"checked branch.*\n$") |
1364 | + ) |
1365 | + self.assertContainsRe(err, r"Checking branch at '.*'\.\n") |
1366 | + self.assertContainsRe(err, r"checked branch.*") |
1367 | |
1368 | def test_check_branch(self): |
1369 | tree = self.make_branch_and_tree('.') |
1370 | tree.commit('foo') |
1371 | out, err = self.run_bzr('check --branch') |
1372 | self.assertContainsRe(err, r"^Checking branch at '.*'\.\n" |
1373 | - r"checked branch.*\n$") |
1374 | + r"checked branch.*") |
1375 | |
1376 | def test_check_repository(self): |
1377 | tree = self.make_branch_and_tree('.') |
1378 | @@ -60,9 +58,7 @@ |
1379 | r"checked repository.*\n" |
1380 | r" 1 revisions\n" |
1381 | r" 0 file-ids\n" |
1382 | - r" 0 unique file texts\n" |
1383 | - r" 0 repeated file texts\n" |
1384 | - r" 0 unreferenced text versions$") |
1385 | + ) |
1386 | |
1387 | def test_check_tree(self): |
1388 | tree = self.make_branch_and_tree('.') |
1389 | @@ -76,7 +72,7 @@ |
1390 | out, err = self.run_bzr('check --tree --branch') |
1391 | self.assertContainsRe(err, r"^Checking working tree at '.*'\.\n" |
1392 | r"Checking branch at '.*'\.\n" |
1393 | - r"checked branch.*\n$") |
1394 | + r"checked branch.*") |
1395 | |
1396 | def test_check_missing_tree(self): |
1397 | branch = self.make_branch('.') |
1398 | @@ -87,9 +83,9 @@ |
1399 | branch = self.make_branch('.') |
1400 | out, err = self.run_bzr('check --tree --branch') |
1401 | self.assertContainsRe(err, |
1402 | - r"^No working tree found at specified location\.\n" |
1403 | r"Checking branch at '.*'\.\n" |
1404 | - r"checked branch.*\n$") |
1405 | + r"No working tree found at specified location\.\n" |
1406 | + r"checked branch.*") |
1407 | |
1408 | def test_check_missing_branch_in_shared_repo(self): |
1409 | self.make_repository('shared', shared=True) |
1410 | |
1411 | === modified file 'bzrlib/tests/branch_implementations/test_check.py' |
1412 | --- bzrlib/tests/branch_implementations/test_check.py 2009-06-10 03:56:49 +0000 |
1413 | +++ bzrlib/tests/branch_implementations/test_check.py 2009-06-16 05:29:42 +0000 |
1414 | @@ -16,7 +16,9 @@ |
1415 | |
1416 | """Tests for branch implementations - test check() functionality""" |
1417 | |
1418 | -from bzrlib import errors |
1419 | +from StringIO import StringIO |
1420 | + |
1421 | +from bzrlib import errors, tests, ui |
1422 | from bzrlib.tests.branch_implementations import TestCaseWithBranch |
1423 | |
1424 | |
1425 | @@ -54,24 +56,57 @@ |
1426 | # with set_last_revision_info |
1427 | tree.branch.set_last_revision_info(3, r5) |
1428 | |
1429 | - e = self.assertRaises(errors.BzrCheckError, |
1430 | - tree.branch.check) |
1431 | - self.assertEqual('Internal check failed:' |
1432 | - ' revno does not match len(mainline) 3 != 5', str(e)) |
1433 | + tree.lock_read() |
1434 | + self.addCleanup(tree.unlock) |
1435 | + refs = self.make_refs(tree.branch) |
1436 | + result = tree.branch.check(refs) |
1437 | + ui.ui_factory = tests.TestUIFactory(stdout=StringIO()) |
1438 | + result.report_results(True) |
1439 | + self.assertContainsRe('revno does not match len', |
1440 | + ui.ui_factory.stdout.getvalue()) |
1441 | |
1442 | def test_check_branch_report_results(self): |
1443 | """Checking a branch produces results which can be printed""" |
1444 | branch = self.make_branch('.') |
1445 | - result = branch.check() |
1446 | + branch.lock_read() |
1447 | + self.addCleanup(branch.unlock) |
1448 | + result = branch.check(self.make_refs(branch)) |
1449 | # reports results through logging |
1450 | result.report_results(verbose=True) |
1451 | result.report_results(verbose=False) |
1452 | |
1453 | - def test_check_detects_ghosts_in_mainline(self): |
1454 | - tree = self.make_branch_and_tree('test') |
1455 | - tree.set_parent_ids(['thisisaghost'], allow_leftmost_as_ghost=True) |
1456 | - r1 = tree.commit('one') |
1457 | - r2 = tree.commit('two') |
1458 | - result = tree.branch.check() |
1459 | - self.assertEquals(True, result.ghosts_in_mainline) |
1460 | + def test__get_check_refs(self): |
1461 | + tree = self.make_branch_and_tree('.') |
1462 | + revid = tree.commit('foo') |
1463 | + self.assertEqual( |
1464 | + set([('revision-existence', revid), ('lefthand-distance', revid)]), |
1465 | + set(tree.branch._get_check_refs())) |
1466 | |
1467 | + def make_refs(self, branch): |
1468 | + needed_refs = branch._get_check_refs() |
1469 | + refs = {} |
1470 | + distances = set() |
1471 | + existences = set() |
1472 | + for ref in needed_refs: |
1473 | + kind, value = ref |
1474 | + if kind == 'lefthand-distance': |
1475 | + distances.add(value) |
1476 | + elif kind == 'revision-existence': |
1477 | + existences.add(value) |
1478 | + else: |
1479 | + raise AssertionError( |
1480 | + 'unknown ref kind for ref %s' % ref) |
1481 | + node_distances = branch.repository.get_graph().find_lefthand_distances( |
1482 | + distances) |
1483 | + for key, distance in node_distances.iteritems(): |
1484 | + refs[('lefthand-distance', key)] = distance |
1485 | + if key in existences and distance > 0: |
1486 | + refs[('revision-existence', key)] = True |
1487 | + existences.remove(key) |
1488 | + parent_map = branch.repository.get_graph().get_parent_map(existences) |
1489 | + for key in parent_map: |
1490 | + refs[('revision-existence', key)] = True |
1491 | + existences.remove(key) |
1492 | + for key in existences: |
1493 | + refs[('revision-existence', key)] = False |
1494 | + return refs |
1495 | |
1496 | === modified file 'bzrlib/tests/per_repository/test_check.py' |
1497 | --- bzrlib/tests/per_repository/test_check.py 2009-04-09 20:23:07 +0000 |
1498 | +++ bzrlib/tests/per_repository/test_check.py 2009-05-12 05:34:15 +0000 |
1499 | @@ -36,18 +36,14 @@ |
1500 | tree = self.make_branch_and_tree('.') |
1501 | self.build_tree(['foo']) |
1502 | tree.smart_add(['.']) |
1503 | - tree.commit('1') |
1504 | + revid1 = tree.commit('1') |
1505 | self.build_tree(['bar']) |
1506 | tree.smart_add(['.']) |
1507 | - tree.commit('2') |
1508 | - # XXX: check requires a non-empty revision IDs list, but it ignores the |
1509 | - # contents of it! |
1510 | - check_object = tree.branch.repository.check(['ignored']) |
1511 | + revid2 = tree.commit('2') |
1512 | + check_object = tree.branch.repository.check([revid1, revid2]) |
1513 | check_object.report_results(verbose=True) |
1514 | log = self._get_log(keep_log_file=True) |
1515 | - self.assertContainsRe( |
1516 | - log, |
1517 | - "0 unreferenced text versions") |
1518 | + self.assertContainsRe(log, "0 unreferenced text versions") |
1519 | |
1520 | |
1521 | class TestFindInconsistentRevisionParents(TestCaseWithBrokenRevisionIndex): |
1522 | @@ -100,3 +96,32 @@ |
1523 | "revision-id has wrong parents in index: " |
1524 | r"\('incorrect-parent',\) should be \(\)") |
1525 | |
1526 | + |
1527 | +class TestCallbacks(TestCaseWithRepository): |
1528 | + |
1529 | + def test_callback_tree_and_branch(self): |
1530 | + # use a real tree to get actual refs that will work |
1531 | + tree = self.make_branch_and_tree('foo') |
1532 | + revid = tree.commit('foo') |
1533 | + tree.lock_read() |
1534 | + self.addCleanup(tree.unlock) |
1535 | + needed_refs = {} |
1536 | + for ref in tree._get_check_refs(): |
1537 | + needed_refs.setdefault(ref, []).append(tree) |
1538 | + for ref in tree.branch._get_check_refs(): |
1539 | + needed_refs.setdefault(ref, []).append(tree.branch) |
1540 | + self.tree_check = tree._check |
1541 | + self.branch_check = tree.branch.check |
1542 | + tree._check = self.tree_callback |
1543 | + tree.branch.check = self.branch_callback |
1544 | + self.callbacks = [] |
1545 | + tree.branch.repository.check([revid], callback_refs=needed_refs) |
1546 | + self.assertNotEqual([], self.callbacks) |
1547 | + |
1548 | + def tree_callback(self, refs): |
1549 | + self.callbacks.append(('tree', refs)) |
1550 | + return self.tree_check(refs) |
1551 | + |
1552 | + def branch_callback(self, refs): |
1553 | + self.callbacks.append(('branch', refs)) |
1554 | + return self.branch_check(refs) |
1555 | |
1556 | === modified file 'bzrlib/tests/test_graph.py' |
1557 | --- bzrlib/tests/test_graph.py 2009-06-10 03:56:49 +0000 |
1558 | +++ bzrlib/tests/test_graph.py 2009-06-16 05:29:42 +0000 |
1559 | @@ -526,6 +526,19 @@ |
1560 | graph = self.make_graph(history_shortcut) |
1561 | self.assertEqual(set(['rev2b']), graph.find_lca('rev3a', 'rev3b')) |
1562 | |
1563 | + def test_lefthand_distance_smoke(self): |
1564 | + """A simple does it work test for graph.lefthand_distance(keys).""" |
1565 | + graph = self.make_graph(history_shortcut) |
1566 | + distance_graph = graph.find_lefthand_distances(['rev3b', 'rev2a']) |
1567 | + self.assertEqual({'rev2a': 2, 'rev3b': 3}, distance_graph) |
1568 | + |
1569 | + def test_lefthand_distance_ghosts(self): |
1570 | + """A simple does it work test for graph.lefthand_distance(keys).""" |
1571 | + nodes = {'nonghost':[NULL_REVISION], 'toghost':['ghost']} |
1572 | + graph = self.make_graph(nodes) |
1573 | + distance_graph = graph.find_lefthand_distances(['nonghost', 'toghost']) |
1574 | + self.assertEqual({'nonghost': 1, 'toghost': -1}, distance_graph) |
1575 | + |
1576 | def test_recursive_unique_lca(self): |
1577 | """Test finding a unique least common ancestor. |
1578 | |
1579 | |
1580 | === modified file 'bzrlib/tests/test_versionedfile.py' |
1581 | --- bzrlib/tests/test_versionedfile.py 2009-05-01 18:09:24 +0000 |
1582 | +++ bzrlib/tests/test_versionedfile.py 2009-06-15 00:24:04 +0000 |
1583 | @@ -30,6 +30,7 @@ |
1584 | knit as _mod_knit, |
1585 | osutils, |
1586 | progress, |
1587 | + ui, |
1588 | ) |
1589 | from bzrlib.errors import ( |
1590 | RevisionNotPresent, |
1591 | @@ -1510,6 +1511,28 @@ |
1592 | self.assertRaises(RevisionNotPresent, |
1593 | files.annotate, prefix + ('missing-key',)) |
1594 | |
1595 | + def test_check_no_parameters(self): |
1596 | + files = self.get_versionedfiles() |
1597 | + |
1598 | + def test_check_progressbar_parameter(self): |
1599 | + """A progress bar can be supplied because check can be a generator.""" |
1600 | + pb = ui.ui_factory.nested_progress_bar() |
1601 | + self.addCleanup(pb.finished) |
1602 | + files = self.get_versionedfiles() |
1603 | + files.check(progress_bar=pb) |
1604 | + |
1605 | + def test_check_with_keys_becomes_generator(self): |
1606 | + files = self.get_versionedfiles() |
1607 | + self.get_diamond_files(files) |
1608 | + keys = files.keys() |
1609 | + entries = files.check(keys=keys) |
1610 | + seen = set() |
1611 | + # Texts output should be fulltexts. |
1612 | + self.capture_stream(files, entries, seen.add, |
1613 | + files.get_parent_map(keys), require_fulltext=True) |
1614 | + # All texts should be output. |
1615 | + self.assertEqual(set(keys), seen) |
1616 | + |
1617 | def test_construct(self): |
1618 | """Each parameterised test can be constructed on a transport.""" |
1619 | files = self.get_versionedfiles() |
1620 | @@ -1669,7 +1692,8 @@ |
1621 | 'knit-delta-closure', 'knit-delta-closure-ref', |
1622 | 'groupcompress-block', 'groupcompress-block-ref']) |
1623 | |
1624 | - def capture_stream(self, f, entries, on_seen, parents): |
1625 | + def capture_stream(self, f, entries, on_seen, parents, |
1626 | + require_fulltext=False): |
1627 | """Capture a stream for testing.""" |
1628 | for factory in entries: |
1629 | on_seen(factory.key) |
1630 | @@ -1680,6 +1704,8 @@ |
1631 | self.assertEqual(parents[factory.key], factory.parents) |
1632 | self.assertIsInstance(factory.get_bytes_as(factory.storage_kind), |
1633 | str) |
1634 | + if require_fulltext: |
1635 | + factory.get_bytes_as('fulltext') |
1636 | |
1637 | def test_get_record_stream_interface(self): |
1638 | """each item in a stream has to provide a regular interface.""" |
1639 | @@ -2547,8 +2573,8 @@ |
1640 | self.assertRaises(NotImplementedError, |
1641 | self.texts.add_mpdiffs, []) |
1642 | |
1643 | - def test_check(self): |
1644 | - self.assertTrue(self.texts.check()) |
1645 | + def test_check_noerrors(self): |
1646 | + self.texts.check() |
1647 | |
1648 | def test_insert_record_stream(self): |
1649 | self.assertRaises(NotImplementedError, self.texts.insert_record_stream, |
1650 | |
1651 | === modified file 'bzrlib/tests/workingtree_implementations/__init__.py' |
1652 | --- bzrlib/tests/workingtree_implementations/__init__.py 2009-06-10 03:56:49 +0000 |
1653 | +++ bzrlib/tests/workingtree_implementations/__init__.py 2009-06-16 05:29:42 +0000 |
1654 | @@ -60,45 +60,49 @@ |
1655 | |
1656 | |
1657 | def load_tests(standard_tests, module, loader): |
1658 | + test_names = [ |
1659 | + 'add_reference', |
1660 | + 'add', |
1661 | + 'basis_inventory', |
1662 | + 'basis_tree', |
1663 | + 'break_lock', |
1664 | + 'changes_from', |
1665 | + 'check', |
1666 | + 'content_filters', |
1667 | + 'commit', |
1668 | + 'eol_conversion', |
1669 | + 'executable', |
1670 | + 'flush', |
1671 | + 'get_file_mtime', |
1672 | + 'get_parent_ids', |
1673 | + 'inv', |
1674 | + 'is_control_filename', |
1675 | + 'is_ignored', |
1676 | + 'locking', |
1677 | + 'merge_from_branch', |
1678 | + 'mkdir', |
1679 | + 'move', |
1680 | + 'nested_specifics', |
1681 | + 'parents', |
1682 | + 'paths2ids', |
1683 | + 'pull', |
1684 | + 'put_file', |
1685 | + 'readonly', |
1686 | + 'read_working_inventory', |
1687 | + 'remove', |
1688 | + 'rename_one', |
1689 | + 'revision_tree', |
1690 | + 'set_root_id', |
1691 | + 'smart_add', |
1692 | + 'uncommit', |
1693 | + 'unversion', |
1694 | + 'views', |
1695 | + 'walkdirs', |
1696 | + 'workingtree', |
1697 | + ] |
1698 | test_workingtree_implementations = [ |
1699 | - 'bzrlib.tests.workingtree_implementations.test_add_reference', |
1700 | - 'bzrlib.tests.workingtree_implementations.test_add', |
1701 | - 'bzrlib.tests.workingtree_implementations.test_basis_inventory', |
1702 | - 'bzrlib.tests.workingtree_implementations.test_basis_tree', |
1703 | - 'bzrlib.tests.workingtree_implementations.test_break_lock', |
1704 | - 'bzrlib.tests.workingtree_implementations.test_changes_from', |
1705 | - 'bzrlib.tests.workingtree_implementations.test_content_filters', |
1706 | - 'bzrlib.tests.workingtree_implementations.test_commit', |
1707 | - 'bzrlib.tests.workingtree_implementations.test_eol_conversion', |
1708 | - 'bzrlib.tests.workingtree_implementations.test_executable', |
1709 | - 'bzrlib.tests.workingtree_implementations.test_flush', |
1710 | - 'bzrlib.tests.workingtree_implementations.test_get_file_mtime', |
1711 | - 'bzrlib.tests.workingtree_implementations.test_get_parent_ids', |
1712 | - 'bzrlib.tests.workingtree_implementations.test_inv', |
1713 | - 'bzrlib.tests.workingtree_implementations.test_is_control_filename', |
1714 | - 'bzrlib.tests.workingtree_implementations.test_is_ignored', |
1715 | - 'bzrlib.tests.workingtree_implementations.test_locking', |
1716 | - 'bzrlib.tests.workingtree_implementations.test_merge_from_branch', |
1717 | - 'bzrlib.tests.workingtree_implementations.test_mkdir', |
1718 | - 'bzrlib.tests.workingtree_implementations.test_move', |
1719 | - 'bzrlib.tests.workingtree_implementations.test_nested_specifics', |
1720 | - 'bzrlib.tests.workingtree_implementations.test_parents', |
1721 | - 'bzrlib.tests.workingtree_implementations.test_paths2ids', |
1722 | - 'bzrlib.tests.workingtree_implementations.test_pull', |
1723 | - 'bzrlib.tests.workingtree_implementations.test_put_file', |
1724 | - 'bzrlib.tests.workingtree_implementations.test_readonly', |
1725 | - 'bzrlib.tests.workingtree_implementations.test_read_working_inventory', |
1726 | - 'bzrlib.tests.workingtree_implementations.test_remove', |
1727 | - 'bzrlib.tests.workingtree_implementations.test_rename_one', |
1728 | - 'bzrlib.tests.workingtree_implementations.test_revision_tree', |
1729 | - 'bzrlib.tests.workingtree_implementations.test_set_root_id', |
1730 | - 'bzrlib.tests.workingtree_implementations.test_smart_add', |
1731 | - 'bzrlib.tests.workingtree_implementations.test_uncommit', |
1732 | - 'bzrlib.tests.workingtree_implementations.test_unversion', |
1733 | - 'bzrlib.tests.workingtree_implementations.test_views', |
1734 | - 'bzrlib.tests.workingtree_implementations.test_walkdirs', |
1735 | - 'bzrlib.tests.workingtree_implementations.test_workingtree', |
1736 | - ] |
1737 | + 'bzrlib.tests.workingtree_implementations.test_' + name for |
1738 | + name in test_names] |
1739 | |
1740 | scenarios = make_scenarios( |
1741 | tests.default_transport, |
1742 | |
1743 | === added file 'bzrlib/tests/workingtree_implementations/test_check.py' |
1744 | --- bzrlib/tests/workingtree_implementations/test_check.py 1970-01-01 00:00:00 +0000 |
1745 | +++ bzrlib/tests/workingtree_implementations/test_check.py 2009-05-08 02:06:36 +0000 |
1746 | @@ -0,0 +1,55 @@ |
1747 | +# Copyright (C) 2009 Canonical Ltd |
1748 | +# |
1749 | +# This program is free software; you can redistribute it and/or modify |
1750 | +# it under the terms of the GNU General Public License as published by |
1751 | +# the Free Software Foundation; either version 2 of the License, or |
1752 | +# (at your option) any later version. |
1753 | +# |
1754 | +# This program is distributed in the hope that it will be useful, |
1755 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
1756 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1757 | +# GNU General Public License for more details. |
1758 | +# |
1759 | +# You should have received a copy of the GNU General Public License |
1760 | +# along with this program; if not, write to the Free Software |
1761 | +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
1762 | + |
1763 | +"""Tests for checking of trees.""" |
1764 | + |
1765 | +from bzrlib import ( |
1766 | + tests, |
1767 | + ) |
1768 | +from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree |
1769 | + |
1770 | + |
1771 | +class TestCheck(TestCaseWithWorkingTree): |
1772 | + |
1773 | + def test__get_check_refs_new(self): |
1774 | + tree = self.make_branch_and_tree('tree') |
1775 | + self.assertEqual(set([('trees', 'null:')]), |
1776 | + set(tree._get_check_refs())) |
1777 | + |
1778 | + def test__get_check_refs_basis(self): |
1779 | + # with a basis, all current bzr trees cache it and so need the |
1780 | + # inventory to cross-check. |
1781 | + tree = self.make_branch_and_tree('tree') |
1782 | + revid = tree.commit('first post') |
1783 | + self.assertEqual(set([('trees', revid)]), |
1784 | + set(tree._get_check_refs())) |
1785 | + |
1786 | + def test__check_with_refs(self): |
1787 | + # _check can be called with a dict of the things required. |
1788 | + tree = self.make_branch_and_tree('tree') |
1789 | + tree.lock_write() |
1790 | + self.addCleanup(tree.unlock) |
1791 | + revid = tree.commit('first post') |
1792 | + needed_refs = tree._get_check_refs() |
1793 | + repo = tree.branch.repository |
1794 | + for ref in needed_refs: |
1795 | + kind, revid = ref |
1796 | + refs = {} |
1797 | + if kind == 'trees': |
1798 | + refs[ref] = repo.revision_tree(revid) |
1799 | + else: |
1800 | + self.fail('unknown ref kind') |
1801 | + tree._check(refs) |
1802 | |
1803 | === modified file 'bzrlib/ui/text.py' |
1804 | --- bzrlib/ui/text.py 2009-04-10 19:37:20 +0000 |
1805 | +++ bzrlib/ui/text.py 2009-05-12 06:30:56 +0000 |
1806 | @@ -26,6 +26,7 @@ |
1807 | from bzrlib.lazy_import import lazy_import |
1808 | lazy_import(globals(), """ |
1809 | from bzrlib import ( |
1810 | + debug, |
1811 | progress, |
1812 | osutils, |
1813 | symbol_versioning, |
1814 | @@ -128,6 +129,7 @@ |
1815 | self._last_task = None |
1816 | self._total_byte_count = 0 |
1817 | self._bytes_since_update = 0 |
1818 | + self._fraction = 0 |
1819 | |
1820 | def _show_line(self, s): |
1821 | n = self._width - 1 |
1822 | @@ -151,9 +153,14 @@ |
1823 | cols = 20 |
1824 | if self._last_task is None: |
1825 | completion_fraction = 0 |
1826 | + self._fraction = 0 |
1827 | else: |
1828 | completion_fraction = \ |
1829 | self._last_task._overall_completion_fraction() or 0 |
1830 | + if (completion_fraction < self._fraction and 'progress' in |
1831 | + debug.debug_flags): |
1832 | + import pdb;pdb.set_trace() |
1833 | + self._fraction = completion_fraction |
1834 | markers = int(round(float(cols) * completion_fraction)) - 1 |
1835 | bar_str = '[' + ('#' * markers + spin_str).ljust(cols) + '] ' |
1836 | return bar_str |
1837 | |
1838 | === modified file 'bzrlib/versionedfile.py' |
1839 | --- bzrlib/versionedfile.py 2009-06-10 03:56:49 +0000 |
1840 | +++ bzrlib/versionedfile.py 2009-06-16 05:29:42 +0000 |
1841 | @@ -876,7 +876,16 @@ |
1842 | raise NotImplementedError(self.annotate) |
1843 | |
1844 | def check(self, progress_bar=None): |
1845 | - """Check this object for integrity.""" |
1846 | + """Check this object for integrity. |
1847 | + |
1848 | + :param progress_bar: A progress bar to output as the check progresses. |
1849 | + :param keys: Specific keys within the VersionedFiles to check. When |
1850 | + this parameter is not None, check() becomes a generator as per |
1851 | + get_record_stream. The difference to get_record_stream is that |
1852 | + more or deeper checks will be performed. |
1853 | + :return: None, or if keys was supplied a generator as per |
1854 | + get_record_stream. |
1855 | + """ |
1856 | raise NotImplementedError(self.check) |
1857 | |
1858 | @staticmethod |
1859 | @@ -1092,10 +1101,15 @@ |
1860 | result.append((prefix + (origin,), line)) |
1861 | return result |
1862 | |
1863 | - def check(self, progress_bar=None): |
1864 | + def check(self, progress_bar=None, keys=None): |
1865 | """See VersionedFiles.check().""" |
1866 | + # XXX: This is over-enthusiastic but as we only thunk for Weaves today |
1867 | + # this is tolerable. Ideally we'd pass keys down to check() and |
1868 | + # have the older VersiondFile interface updated too. |
1869 | for prefix, vf in self._iter_all_components(): |
1870 | vf.check() |
1871 | + if keys is not None: |
1872 | + return self.get_record_stream(keys, 'unordered', True) |
1873 | |
1874 | def get_parent_map(self, keys): |
1875 | """Get a map of the parents of keys. |
1876 | |
1877 | === modified file 'bzrlib/workingtree.py' |
1878 | --- bzrlib/workingtree.py 2009-06-15 15:47:45 +0000 |
1879 | +++ bzrlib/workingtree.py 2009-06-16 05:29:42 +0000 |
1880 | @@ -290,6 +290,16 @@ |
1881 | self._control_files.break_lock() |
1882 | self.branch.break_lock() |
1883 | |
1884 | + def _get_check_refs(self): |
1885 | + """Return the references needed to perform a check of this tree. |
1886 | + |
1887 | + The default implementation returns no refs, and is only suitable for |
1888 | + trees that have no local caching and can commit on ghosts at any time. |
1889 | + |
1890 | + :seealso: bzrlib.check for details about check_refs. |
1891 | + """ |
1892 | + return [] |
1893 | + |
1894 | def requires_rich_root(self): |
1895 | return self._format.requires_rich_root |
1896 | |
1897 | @@ -2515,12 +2525,17 @@ |
1898 | return un_resolved, resolved |
1899 | |
1900 | @needs_read_lock |
1901 | - def _check(self): |
1902 | + def _check(self, references): |
1903 | + """Check the tree for consistency. |
1904 | + |
1905 | + :param references: A dict with keys matching the items returned by |
1906 | + self._get_check_refs(), and values from looking those keys up in |
1907 | + the repository. |
1908 | + """ |
1909 | tree_basis = self.basis_tree() |
1910 | tree_basis.lock_read() |
1911 | try: |
1912 | - repo_basis = self.branch.repository.revision_tree( |
1913 | - self.last_revision()) |
1914 | + repo_basis = references[('trees', self.last_revision())] |
1915 | if len(list(repo_basis.iter_changes(tree_basis))) > 0: |
1916 | raise errors.BzrCheckError( |
1917 | "Mismatched basis inventory content.") |
1918 | @@ -2572,6 +2587,10 @@ |
1919 | if self._inventory is None: |
1920 | self.read_working_inventory() |
1921 | |
1922 | + def _get_check_refs(self): |
1923 | + """Return the references needed to perform a check of this tree.""" |
1924 | + return [('trees', self.last_revision())] |
1925 | + |
1926 | def lock_tree_write(self): |
1927 | """See WorkingTree.lock_tree_write(). |
1928 | |
1929 | @@ -2634,6 +2653,10 @@ |
1930 | mode=self.bzrdir._get_file_mode()) |
1931 | return True |
1932 | |
1933 | + def _get_check_refs(self): |
1934 | + """Return the references needed to perform a check of this tree.""" |
1935 | + return [('trees', self.last_revision())] |
1936 | + |
1937 | @needs_tree_write_lock |
1938 | def set_conflicts(self, conflicts): |
1939 | self._put_rio('conflicts', conflicts.to_stanzas(), |
1940 | |
1941 | === added file 'doc/developers/check.txt' |
1942 | --- doc/developers/check.txt 1970-01-01 00:00:00 +0000 |
1943 | +++ doc/developers/check.txt 2009-05-12 03:50:39 +0000 |
1944 | @@ -0,0 +1,63 @@ |
1945 | +Check Notes |
1946 | +=========== |
1947 | + |
1948 | +.. contents:: :local: |
1949 | + |
1950 | +Overview |
1951 | +-------- |
1952 | + |
1953 | +Check has multiple responsibilities: |
1954 | + |
1955 | +* Ensure that the data as recorded on disk is accessible intact and unaltered. |
1956 | +* Ensure that a branch/repository/tree/whatever is ready for upgrade. |
1957 | +* Look for and report on recorded-data issues where previous bzr's, or changing |
1958 | + situations have lead so some form of inconsistency. |
1959 | +* Report sufficient information for a user to either fix the issue themselves |
1960 | + or report a bug that will hopefully be sufficiently detailed we can fix based |
1961 | + on the initial report. |
1962 | +* Not scare users when run if everything is okey-dokey. |
1963 | + |
1964 | +Ideally one check invocation can do all these things. |
1965 | + |
1966 | +Repository |
1967 | +---------- |
1968 | + |
1969 | +Things that can go wrong: |
1970 | +* Bit errors or alterations may occur in raw data. |
1971 | +* Data that is referenced may be missing |
1972 | +* There could be a lot of garbage in upload etc. |
1973 | +* File graphs may be inconsistent with inventories and parents. |
1974 | +* The revision graph cache can be inconsistent with the revision data. |
1975 | + |
1976 | +Branch |
1977 | +------ |
1978 | + |
1979 | +Things that can go wrong: |
1980 | +* Tag or tip revision ids may be missing from the repo. |
1981 | +* The revno tip cache may be wrong. |
1982 | +* Various urls could be problematic (not inaccessible, just invalid) |
1983 | +* Stacked-on branch could be inaccessible. |
1984 | + |
1985 | +Tree |
1986 | +---- |
1987 | + |
1988 | +Things that can go wrong: |
1989 | +* Bit errors in dirstate. |
1990 | +* Corrupt or invalid shelves. |
1991 | +* Corrupt dirstates written to disk. |
1992 | +* Cached inventories might not match repository. |
1993 | + |
1994 | +Duplicate work |
1995 | +-------------- |
1996 | + |
1997 | +If we check every branch in a repo separately we will encounter duplicate |
1998 | +effort in assessing things like missing tags/tips, revno cache etc. |
1999 | + |
2000 | +Outline of approach |
2001 | +------------------- |
2002 | + |
2003 | +To check a repository, we scan for branches, open their trees and generate |
2004 | +summary data. We then collect all the summary data in as compact a form as |
2005 | +possible and do a detailed check on the repository, calling back out to branch |
2006 | +and trees as we encounter the actual data that that tree/branch requires to |
2007 | +perform its check. |
2008 |
This gives a single-pass progress bar to check, makes it talk 1/2 the
time on 1.9 flavour formats and nochange on CHK formats. More work to
come but this seems unqualified better to me.
(NB: tiny projects show this code as slower, but testing on things the
size of bzr or larger gives the better timings I'm seeing).
--