Merge lp:~lifeless/bzr/check-1 into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/check-1
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~lifeless/bzr/check-1
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
bzr-core Pending
Review via email: mp+7489@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

This gives a single-pass progress bar to check, makes it talk 1/2 the
time on 1.9 flavour formats and nochange on CHK formats. More work to
come but this seems unqualified better to me.

(NB: tiny projects show this code as slower, but testing on things the
size of bzr or larger gives the better timings I'm seeing).

--

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :
Download full text (4.5 KiB)

Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/check-1 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This gives a single-pass progress bar to check, makes it talk 1/2 the
> time on 1.9 flavour formats and nochange on CHK formats. More work to
> come but this seems unqualified better to me.
>
>
Thanks for working on check. This command will get quite a workout from
lots of users as they migrate to 2a so I think getting it performing
well and reliably is quite a high priority.

This is a pretty big patch and I'm finding it difficult to review it all
at once. Here's a partial review with numerous clean-ups outside the
guts of your new code. To begin with, I'm happy with the -Dprogress
changes. These are arguably completely independent from check so it
would have been nice to submit them as a separate patch. In any case,
the changes related to that are good to go so you can land them immediately.

> +API Changes
> +***********
> +
> +* ``WorkingTree._check`` now requires a references dict with keys matching
> + those returned by ``WorkingTree._get_check_refs``. (Robert Collins)
> +
>
And more importantly, you've changed the public API Branch.check() to
have a new mandatory parameter and InventoryEntry.check() to not have a
tree parameter. You need to document these as API breaks.

> + def find_lefthand_distances(self, keys):
> + """Find the distance to null for all the keys in keys.
> +
> + :param keys: keys to lookup.
> + :return: A dict key->distance for all of keys.
> + """
> + # Optimisable by concurrent searching, but a random spread should get
> + # some sort of hit rate.
> + result = {}
>

result isn't used in this routine.

> + else:
> + # At the moment, check does not extra work over get_record_stream
> + return self.get_record_stream(keys, 'unordered', True)
>

This comment is confusing and needs rewording.

> + for revid, revision in revisions_iterator:
> + if revision is None:
> + pass
> + parent_map = vf.get_parent_map([(revid,)])
>
"pass" does nothing here so I'm assuming it's wrong. "continue" maybe?

> === modified file 'bzrlib/smart/medium.py'
> --- bzrlib/smart/medium.py 2009-06-10 03:56:49 +0000
> +++ bzrlib/smart/medium.py 2009-06-16 05:29:42 +0000
> @@ -37,7 +37,6 @@
> from bzrlib import (
> debug,
> errors,
> - osutils,
> symbol_versioning,
> trace,
> ui,
> @@ -46,7 +45,8 @@
> from bzrlib.smart import client, protocol, request, vfs
> from bzrlib.transport import ssh
> """)
> -
> +#usually already imported, and getting IllegalScoperReplacer on it here.
> +from bzrlib import osutils
>

Was this a transient issue? If not, it looks like a completely separate
bug to the check stuff and maybe ought to be submitted as such.
> def check(self, progress_bar=None):
> - """Check this object for integrity."""
> + """Check this object for integrity.
> +
> + :param progress_bar: A progress bar to output as the check progresses.
> + :param keys: Specific keys within the Versione...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

I've stopped working on check while the current progress is reviewed.

I'll go through and look at the issues you've noted.

With regard to the API changes, I can document Branch.check, as it is
conceivable that people have used it. WT._check is private;
InventoryEntry.check is also private by virtue of requiring a check
object - it can only be used from within check.

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (9.8 KiB)

Huge patch to review here... for which the main intent is not
obvious.

The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.

If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)

Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.

My main concern here is that you're changing check without adding
test facilities to create broken repo/branch/tree...and while I
understand the desire to not check for every conceivable bug, I'd
feel more confident in the code if some tests were easier to
write...

Other than that, it looks good so with as many tweaks as you can
take in to account:

Review: approve

>>>>> "robert" == Robert Collins <email address hidden> writes:
...
    robert> === modified file 'bzrlib/branch.py'
    robert> --- bzrlib/branch.py 2009-06-10 03:56:49 +0000
    robert> +++ bzrlib/branch.py 2009-06-16 05:29:42 +0000
    robert> @@ -125,6 +125,14 @@
    robert> raise errors.UnstackableRepositoryFormat(self.repository._format,
    robert> self.repository.base)

    robert> + def _get_check_refs(self):
    robert> + """Get the references needed for check().
    robert> +
    robert> + See bzrlib.check.
    robert> + """
    robert> + revid = self.last_revision()
    robert> + return [('revision-existence', revid), ('lefthand-distance', revid)]

'revision-existence' is self-explanatory, but 'lefthand-distance'
is not. Add a comment, if only to point the reader to the right
place to understand it.

I find the way to specify the different checks quite pleasant at
that point :)

I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).

Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?

Is there some other reason than keeping the definition and
accesses use the same key ?

<snip/>

    robert> === modified file 'bzrlib/check.py'
...
    robert> - def check(self):
    robert> + def check(self, callback_refs=None, check_repo=True):
    robert> + if callback_refs is None:
...
    robert> + self.progress.update('check', 0, 4)
    robert> + if self.check_repo:
    robert> + self.progress.update('checking revisions', 0)
    robert> + self.check_revisions()
    robert> + self.progress.update('checking commit contents', 1)
    robert> + self.repository._check_inventories(self)
    robert> + self.progress.update('checking file graphs', 2)
    robert> + # check_weaves is done after the revision scan so that
    robert> + # revision index is known to be valid.
    robert> + self.check_weaves()
    robert> + self.progress.update('checking branches and trees', 3)

The above is clear, but put the following if block in a method by
itse...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

Bah, forgot to sign my mail :-/

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

Vincent Ladeuil wrote:
[...]
> robert> + def _check_text(self, record, checker, item_data):
> robert> + """Check a single text."""
> robert> + # Check it is extractable.
> robert> + # TODO: check length.
>
> Add '-- lifeless <aaaammjj>' please.

I think you mean yyyymmdd? :)

-Andrew.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-06-15 15:20:24 +0000
3+++ NEWS 2009-06-16 05:29:42 +0000
4@@ -17,6 +17,12 @@
5 diverged-branches`` when a push fails because the branches have
6 diverged. (Neil Martinsen-Burrell, #269477)
7
8+API Changes
9+***********
10+
11+* ``WorkingTree._check`` now requires a references dict with keys matching
12+ those returned by ``WorkingTree._get_check_refs``. (Robert Collins)
13+
14 Internals
15 *********
16
17@@ -402,6 +408,9 @@
18 cause mismatched physical locks to cause test errors rather than just
19 reporting to the screen. (Robert Collins)
20
21+* -Dprogress will cause pdb to start up if a progress view jumps
22+ backwards. (Robert Collins)
23+
24 * Fallback ``CredentialStore`` instances registered with ``fallback=True``
25 are now be able to provide credentials if obtaining credentials
26 via ~/.bazaar/authentication.conf fails. (Jelmer Vernooij,
27
28=== modified file 'bzrlib/branch.py'
29--- bzrlib/branch.py 2009-06-10 03:56:49 +0000
30+++ bzrlib/branch.py 2009-06-16 05:29:42 +0000
31@@ -125,6 +125,14 @@
32 raise errors.UnstackableRepositoryFormat(self.repository._format,
33 self.repository.base)
34
35+ def _get_check_refs(self):
36+ """Get the references needed for check().
37+
38+ See bzrlib.check.
39+ """
40+ revid = self.last_revision()
41+ return [('revision-existence', revid), ('lefthand-distance', revid)]
42+
43 @staticmethod
44 def open(base, _unsupported=False, possible_transports=None):
45 """Open the branch rooted at base.
46@@ -1132,7 +1140,7 @@
47 target._set_all_reference_info(target_reference_dict)
48
49 @needs_read_lock
50- def check(self):
51+ def check(self, refs):
52 """Check consistency of the branch.
53
54 In particular this checks that revisions given in the revision-history
55@@ -1141,42 +1149,23 @@
56
57 Callers will typically also want to check the repository.
58
59+ :param refs: Calculated refs for this branch as specified by
60+ branch._get_check_refs()
61 :return: A BranchCheckResult.
62 """
63- ret = BranchCheckResult(self)
64- mainline_parent_id = None
65+ result = BranchCheckResult(self)
66 last_revno, last_revision_id = self.last_revision_info()
67- real_rev_history = []
68- try:
69- for revid in self.repository.iter_reverse_revision_history(
70- last_revision_id):
71- real_rev_history.append(revid)
72- except errors.RevisionNotPresent:
73- ret.ghosts_in_mainline = True
74- else:
75- ret.ghosts_in_mainline = False
76- real_rev_history.reverse()
77- if len(real_rev_history) != last_revno:
78- raise errors.BzrCheckError('revno does not match len(mainline)'
79- ' %s != %s' % (last_revno, len(real_rev_history)))
80- # TODO: We should probably also check that real_rev_history actually
81- # matches self.revision_history()
82- for revision_id in real_rev_history:
83- try:
84- revision = self.repository.get_revision(revision_id)
85- except errors.NoSuchRevision, e:
86- raise errors.BzrCheckError("mainline revision {%s} not in repository"
87- % revision_id)
88- # In general the first entry on the revision history has no parents.
89- # But it's not illegal for it to have parents listed; this can happen
90- # in imports from Arch when the parents weren't reachable.
91- if mainline_parent_id is not None:
92- if mainline_parent_id not in revision.parent_ids:
93- raise errors.BzrCheckError("previous revision {%s} not listed among "
94- "parents of {%s}"
95- % (mainline_parent_id, revision_id))
96- mainline_parent_id = revision_id
97- return ret
98+ actual_revno = refs[('lefthand-distance', last_revision_id)]
99+ if actual_revno != last_revno:
100+ result.errors.append(errors.BzrCheckError(
101+ 'revno does not match len(mainline) %s != %s' % (
102+ last_revno, actual_revno)))
103+ # TODO: We should probably also check that self.revision_history
104+ # matches the repository for older branch formats.
105+ # If looking for the code that cross-checks repository parents against
106+ # the iter_reverse_revision_history output, that is now a repository
107+ # specific check.
108+ return result
109
110 def _get_checkout_format(self):
111 """Return the most suitable metadir for a checkout of this branch.
112@@ -2784,7 +2773,7 @@
113
114 def __init__(self, branch):
115 self.branch = branch
116- self.ghosts_in_mainline = False
117+ self.errors = []
118
119 def report_results(self, verbose):
120 """Report the check results via trace.note.
121@@ -2792,11 +2781,10 @@
122 :param verbose: Requests more detailed display of what was checked,
123 if any.
124 """
125- note('checked branch %s format %s',
126- self.branch.base,
127- self.branch._format)
128- if self.ghosts_in_mainline:
129- note('branch contains ghosts in mainline')
130+ note('checked branch %s format %s', self.branch.base,
131+ self.branch._format)
132+ for error in self.errors:
133+ note('found error:%s', error)
134
135
136 class Converter5to6(object):
137
138=== modified file 'bzrlib/check.py'
139--- bzrlib/check.py 2009-03-23 14:59:43 +0000
140+++ bzrlib/check.py 2009-06-16 00:53:41 +0000
141@@ -32,6 +32,20 @@
142 # raising them. If there's more than one exception it'd be good to see them
143 # all.
144
145+"""Checking of bzr objects.
146+
147+check_refs is a concept used for optimising check. Objects that depend on other
148+objects (e.g. tree on repository) can list the objects they would be requesting
149+so that when the dependent object is checked, matches can be pulled out and
150+evaluated in-line rather than re-reading the same data many times.
151+check_refs are tuples (kind, value). Currently defined kinds are:
152+* 'trees', where value is a revid and the looked up objects are revision trees.
153+* 'lefthand-distance', where value is a revid and the looked up objects are the
154+ distance along the lefthand path to NULL for that revid.
155+* 'revision-existence', where value is a revid, and the result is True or False
156+ indicating that the revision was found/not found.
157+"""
158+
159 from bzrlib import errors, osutils
160 from bzrlib import repository as _mod_repository
161 from bzrlib import revision
162@@ -39,6 +53,7 @@
163 from bzrlib.bzrdir import BzrDir
164 from bzrlib.errors import BzrCheckError
165 from bzrlib.repository import Repository
166+from bzrlib.revision import NULL_REVISION
167 from bzrlib.symbol_versioning import deprecated_function, deprecated_in
168 from bzrlib.trace import log_error, note
169 import bzrlib.ui
170@@ -49,76 +64,145 @@
171
172 # The Check object interacts with InventoryEntry.check, etc.
173
174- def __init__(self, repository):
175+ def __init__(self, repository, check_repo=True):
176 self.repository = repository
177- self.checked_text_cnt = 0
178 self.checked_rev_cnt = 0
179- self.ghosts = []
180- self.repeated_text_cnt = 0
181+ self.ghosts = set()
182 self.missing_parent_links = {}
183 self.missing_inventory_sha_cnt = 0
184 self.missing_revision_cnt = 0
185- # maps (file-id, version) -> sha1; used by InventoryFile._check
186- self.checked_texts = {}
187 self.checked_weaves = set()
188 self.unreferenced_versions = set()
189 self.inconsistent_parents = []
190 self.rich_roots = repository.supports_rich_root()
191 self.text_key_references = {}
192+ self.check_repo = check_repo
193+ self.other_results = []
194+ # Plain text lines to include in the report
195+ self._report_items = []
196+ # Keys we are looking for; may be large and need spilling to disk.
197+ # key->(type(revision/inventory/text/signature/map), sha1, first-referer)
198+ self.pending_keys = {}
199+ # Ancestors map for all of revisions being checked; while large helper
200+ # functions we call would create it anyway, so better to have once and
201+ # keep.
202+ self.ancestors = {}
203
204- def check(self):
205+ def check(self, callback_refs=None, check_repo=True):
206+ if callback_refs is None:
207+ callback_refs = {}
208 self.repository.lock_read()
209 self.progress = bzrlib.ui.ui_factory.nested_progress_bar()
210 try:
211- self.progress.update('retrieving inventory', 0, 2)
212- # do not put in init, as it should be done with progess,
213- # and inside the lock.
214- self.inventory_weave = self.repository.inventories
215- self.progress.update('checking revision graph', 1)
216- self.check_revision_graph()
217- self.plan_revisions()
218- revno = 0
219- while revno < len(self.planned_revisions):
220- rev_id = self.planned_revisions[revno]
221- self.progress.update('checking revision', revno,
222- len(self.planned_revisions))
223- revno += 1
224- self.check_one_rev(rev_id)
225- # check_weaves is done after the revision scan so that
226- # revision index is known to be valid.
227- self.check_weaves()
228+ self.progress.update('check', 0, 4)
229+ if self.check_repo:
230+ self.progress.update('checking revisions', 0)
231+ self.check_revisions()
232+ self.progress.update('checking commit contents', 1)
233+ self.repository._check_inventories(self)
234+ self.progress.update('checking file graphs', 2)
235+ # check_weaves is done after the revision scan so that
236+ # revision index is known to be valid.
237+ self.check_weaves()
238+ self.progress.update('checking branches and trees', 3)
239+ if callback_refs:
240+ repo = self.repository
241+ # calculate all refs, and callback the objects requesting them.
242+ refs = {}
243+ wanting_items = set()
244+ # Current crude version calculates everything and calls
245+ # everything at once. Doing a queue and popping as things are
246+ # satisfied would be cheaper on memory [but few people have
247+ # huge numbers of working trees today. TODO: fix before
248+ # landing].
249+ distances = set()
250+ existences = set()
251+ for ref, wantlist in callback_refs.iteritems():
252+ wanting_items.update(wantlist)
253+ kind, value = ref
254+ if kind == 'trees':
255+ refs[ref] = repo.revision_tree(value)
256+ elif kind == 'lefthand-distance':
257+ distances.add(value)
258+ elif kind == 'revision-existence':
259+ existences.add(value)
260+ else:
261+ raise AssertionError(
262+ 'unknown ref kind for ref %s' % ref)
263+ node_distances = repo.get_graph().find_lefthand_distances(distances)
264+ for key, distance in node_distances.iteritems():
265+ refs[('lefthand-distance', key)] = distance
266+ if key in existences and distance > 0:
267+ refs[('revision-existence', key)] = True
268+ existences.remove(key)
269+ parent_map = repo.get_graph().get_parent_map(existences)
270+ for key in parent_map:
271+ refs[('revision-existence', key)] = True
272+ existences.remove(key)
273+ for key in existences:
274+ refs[('revision-existence', key)] = False
275+ for item in wanting_items:
276+ if isinstance(item, WorkingTree):
277+ item._check(refs)
278+ if isinstance(item, Branch):
279+ self.other_results.append(item.check(refs))
280 finally:
281 self.progress.finished()
282 self.repository.unlock()
283
284- def check_revision_graph(self):
285+ def _check_revisions(self, revisions_iterator):
286+ """Check revision objects by decorating a generator.
287+
288+ :param revisions_iterator: An iterator of(revid, Revision-or-None).
289+ :return: A generator of the contents of revisions_iterator.
290+ """
291+ self.planned_revisions = set()
292+ for revid, revision in revisions_iterator:
293+ yield revid, revision
294+ self._check_one_rev(revid, revision)
295+ # Flatten the revisions we found to guarantee consistent later
296+ # iteration.
297+ self.planned_revisions = list(self.planned_revisions)
298+ # TODO: extract digital signatures as items to callback on too.
299+
300+ def check_revisions(self):
301+ """Scan revisions, checking data directly available as we go."""
302+ revision_iterator = self.repository._iter_revisions(None)
303+ revision_iterator = self._check_revisions(revision_iterator)
304+ # We read the all revisions here:
305+ # - doing this allows later code to depend on the revision index.
306+ # - we can fill out existence flags at this point
307+ # - we can read the revision inventory sha at this point
308+ # - we can check properties and serialisers etc.
309 if not self.repository.revision_graph_can_have_wrong_parents():
310- # This check is not necessary.
311+ # The check against the index isn't needed.
312 self.revs_with_bad_parents_in_index = None
313- return
314- bad_revisions = self.repository._find_inconsistent_revision_parents()
315- self.revs_with_bad_parents_in_index = list(bad_revisions)
316-
317- def plan_revisions(self):
318- repository = self.repository
319- self.planned_revisions = repository.all_revision_ids()
320- self.progress.clear()
321- inventoried = set(key[-1] for key in self.inventory_weave.keys())
322- awol = set(self.planned_revisions) - inventoried
323- if len(awol) > 0:
324- raise BzrCheckError('Stored revisions missing from inventory'
325- '{%s}' % ','.join([f for f in awol]))
326+ for thing in revision_iterator:
327+ pass
328+ else:
329+ bad_revisions = self.repository._find_inconsistent_revision_parents(
330+ revision_iterator)
331+ self.revs_with_bad_parents_in_index = list(bad_revisions)
332
333 def report_results(self, verbose):
334+ if self.check_repo:
335+ self._report_repo_results(verbose)
336+ for result in self.other_results:
337+ result.report_results(verbose)
338+
339+ def _report_repo_results(self, verbose):
340 note('checked repository %s format %s',
341 self.repository.bzrdir.root_transport,
342 self.repository._format)
343 note('%6d revisions', self.checked_rev_cnt)
344 note('%6d file-ids', len(self.checked_weaves))
345- note('%6d unique file texts', self.checked_text_cnt)
346- note('%6d repeated file texts', self.repeated_text_cnt)
347- note('%6d unreferenced text versions',
348- len(self.unreferenced_versions))
349+ if verbose:
350+ note('%6d unreferenced text versions',
351+ len(self.unreferenced_versions))
352+ if verbose and len(self.unreferenced_versions):
353+ for file_id, revision_id in self.unreferenced_versions:
354+ log_error('unreferenced version: {%s} in %s', revision_id,
355+ file_id)
356 if self.missing_inventory_sha_cnt:
357 note('%6d revisions are missing inventory_sha1',
358 self.missing_inventory_sha_cnt)
359@@ -138,10 +222,6 @@
360 note(' %s should be in the ancestry for:', link)
361 for linker in linkers:
362 note(' * %s', linker)
363- if verbose:
364- for file_id, revision_id in self.unreferenced_versions:
365- log_error('unreferenced version: {%s} in %s', revision_id,
366- file_id)
367 if len(self.inconsistent_parents):
368 note('%6d inconsistent parents', len(self.inconsistent_parents))
369 if verbose:
370@@ -161,60 +241,75 @@
371 ' %s has wrong parents in index: '
372 '%r should be %r',
373 revision_id, index_parents, actual_parents)
374-
375- def check_one_rev(self, rev_id):
376- """Check one revision.
377-
378- rev_id - the one to check
379+ for item in self._report_items:
380+ note(item)
381+
382+ def _check_one_rev(self, rev_id, rev):
383+ """Cross-check one revision.
384+
385+ :param rev_id: A revision id to check.
386+ :param rev: A revision or None to indicate a missing revision.
387 """
388- rev = self.repository.get_revision(rev_id)
389-
390 if rev.revision_id != rev_id:
391- raise BzrCheckError('wrong internal revision id in revision {%s}'
392- % rev_id)
393-
394+ self._report_items.append(
395+ 'Mismatched internal revid {%s} and index revid {%s}' % (
396+ rev.revision_id, rev_id))
397+ rev_id = rev.revision_id
398+ # Check this revision tree etc, and count as seen when we encounter a
399+ # reference to it.
400+ self.planned_revisions.add(rev_id)
401+ # It is not a ghost
402+ self.ghosts.discard(rev_id)
403+ # Count all parents as ghosts if we haven't seen them yet.
404 for parent in rev.parent_ids:
405 if not parent in self.planned_revisions:
406- # rev has a parent we didn't know about.
407- missing_links = self.missing_parent_links.get(parent, [])
408- missing_links.append(rev_id)
409- self.missing_parent_links[parent] = missing_links
410- # list based so somewhat slow,
411- # TODO have a planned_revisions list and set.
412- if self.repository.has_revision(parent):
413- missing_ancestry = self.repository.get_ancestry(parent)
414- for missing in missing_ancestry:
415- if (missing is not None
416- and missing not in self.planned_revisions):
417- self.planned_revisions.append(missing)
418- else:
419- self.ghosts.append(rev_id)
420-
421- if rev.inventory_sha1:
422- # Loopback - this is currently circular logic as the
423- # knit get_inventory_sha1 call returns rev.inventory_sha1.
424- # Repository.py's get_inventory_sha1 should instead return
425- # inventories.get_record_stream([(revid,)]).next().sha1 or
426- # similar.
427- inv_sha1 = self.repository.get_inventory_sha1(rev_id)
428- if inv_sha1 != rev.inventory_sha1:
429- raise BzrCheckError('Inventory sha1 hash doesn\'t match'
430- ' value in revision {%s}' % rev_id)
431- self._check_revision_tree(rev_id)
432+ self.ghosts.add(parent)
433+
434+ self.ancestors[rev_id] = tuple(rev.parent_ids) or (NULL_REVISION,)
435+ self.add_pending_item(rev_id, ('inventories', rev_id), 'inventory',
436+ rev.inventory_sha1)
437 self.checked_rev_cnt += 1
438
439+ def add_pending_item(self, referer, key, kind, sha1):
440+ """Add a reference to a sha1 to be cross checked against a key.
441+
442+ :param referer: The referer that expects key to have sha1.
443+ :param key: A storage key e.g. ('texts', 'foo@bar-20040504-1234')
444+ :param kind: revision/inventory/text/map/signature
445+ :param sha1: A hex sha1 or None if no sha1 is known.
446+ """
447+ existing = self.pending_keys.get(key)
448+ if existing:
449+ if sha1 != existing[1]:
450+ self._report_items.append('Multiple expected sha1s for %s. {%s}'
451+ ' expects {%s}, {%s} expects {%s}', (
452+ key, referer, sha1, existing[1], existing[0]))
453+ else:
454+ self.pending_keys[key] = (kind, sha1, referer)
455+
456 def check_weaves(self):
457 """Check all the weaves we can get our hands on.
458 """
459 weave_ids = []
460- self.progress.update('checking inventory', 0, 2)
461- self.inventory_weave.check(progress_bar=self.progress)
462- self.progress.update('checking text storage', 1, 2)
463- self.repository.texts.check(progress_bar=self.progress)
464- weave_checker = self.repository._get_versioned_file_checker(
465- text_key_references=self.text_key_references)
466+ storebar = bzrlib.ui.ui_factory.nested_progress_bar()
467+ try:
468+ self._check_weaves(storebar)
469+ finally:
470+ storebar.finished()
471+
472+ def _check_weaves(self, storebar):
473+ storebar.update('text-index', 0, 2)
474+ if self.repository._format.fast_deltas:
475+ # We haven't considered every fileid instance so far.
476+ weave_checker = self.repository._get_versioned_file_checker(
477+ ancestors=self.ancestors)
478+ else:
479+ weave_checker = self.repository._get_versioned_file_checker(
480+ text_key_references=self.text_key_references,
481+ ancestors=self.ancestors)
482+ storebar.update('file-graph', 1)
483 result = weave_checker.check_file_version_parents(
484- self.repository.texts, progress_bar=self.progress)
485+ self.repository.texts)
486 self.checked_weaves = weave_checker.file_ids
487 bad_parents, unused_versions = result
488 bad_parents = bad_parents.items()
489@@ -228,28 +323,8 @@
490 (revision_id, weave_id, weave_parents, correct_parents))
491 self.unreferenced_versions.update(unused_versions)
492
493- def _check_revision_tree(self, rev_id):
494- tree = self.repository.revision_tree(rev_id)
495- inv = tree.inventory
496- seen_ids = set()
497- seen_names = set()
498- for path, ie in inv.iter_entries():
499- self._add_entry_to_text_key_references(inv, ie)
500- file_id = ie.file_id
501- if file_id in seen_ids:
502- raise BzrCheckError('duplicated file_id {%s} '
503- 'in inventory for revision {%s}'
504- % (file_id, rev_id))
505- seen_ids.add(file_id)
506- ie.check(self, rev_id, inv, tree)
507- if path in seen_names:
508- raise BzrCheckError('duplicated path %s '
509- 'in inventory for revision {%s}'
510- % (path, rev_id))
511- seen_names.add(path)
512-
513 def _add_entry_to_text_key_references(self, inv, entry):
514- if not self.rich_roots and entry == inv.root:
515+ if not self.rich_roots and entry.name == '':
516 return
517 key = (entry.file_id, entry.revision)
518 self.text_key_references.setdefault(key, False)
519@@ -263,13 +338,14 @@
520
521 Results are reported through logging.
522
523- Deprecated in 1.6. Please use check_branch instead.
524+ Deprecated in 1.6. Please use check_dwim instead.
525
526 :raise BzrCheckError: if there's a consistency error.
527 """
528 check_branch(branch, verbose)
529
530
531+@deprecated_function(deprecated_in((1,16,0)))
532 def check_branch(branch, verbose):
533 """Run consistency checks on a branch.
534
535@@ -279,56 +355,108 @@
536 """
537 branch.lock_read()
538 try:
539- branch_result = branch.check()
540+ needed_refs = {}
541+ for ref in branch._get_check_refs():
542+ needed_refs.setdefault(ref, []).append(branch)
543+ result = branch.repository.check([branch.last_revision()], needed_refs)
544+ branch_result = result.other_results[0]
545 finally:
546 branch.unlock()
547 branch_result.report_results(verbose)
548
549
550+def scan_branch(branch, needed_refs, to_unlock):
551+ """Scan a branch for refs.
552+
553+ :param branch: The branch to schedule for checking.
554+ :param needed_refs: Refs we are accumulating.
555+ :param to_unlock: The unlock list accumulating.
556+ """
557+ note("Checking branch at '%s'." % (branch.base,))
558+ branch.lock_read()
559+ to_unlock.append(branch)
560+ branch_refs = branch._get_check_refs()
561+ for ref in branch_refs:
562+ reflist = needed_refs.setdefault(ref, [])
563+ reflist.append(branch)
564+
565+
566+def scan_tree(base_tree, tree, needed_refs, to_unlock):
567+ """Scan a tree for refs.
568+
569+ :param base_tree: The original tree check opened, used to detect duplicate
570+ tree checks.
571+ :param tree: The tree to schedule for checking.
572+ :param needed_refs: Refs we are accumulating.
573+ :param to_unlock: The unlock list accumulating.
574+ """
575+ if base_tree is not None and tree.basedir == base_tree.basedir:
576+ return
577+ note("Checking working tree at '%s'." % (tree.basedir,))
578+ tree.lock_read()
579+ to_unlock.append(tree)
580+ tree_refs = tree._get_check_refs()
581+ for ref in tree_refs:
582+ reflist = needed_refs.setdefault(ref, [])
583+ reflist.append(tree)
584+
585+
586 def check_dwim(path, verbose, do_branch=False, do_repo=False, do_tree=False):
587 try:
588- tree, branch, repo, relpath = \
589+ base_tree, branch, repo, relpath = \
590 BzrDir.open_containing_tree_branch_or_repository(path)
591 except errors.NotBranchError:
592- tree = branch = repo = None
593-
594- if do_tree:
595- if tree is not None:
596- note("Checking working tree at '%s'."
597- % (tree.bzrdir.root_transport.base,))
598- tree._check()
599- else:
600- log_error("No working tree found at specified location.")
601-
602- if branch is not None:
603- # We have a branch
604- if repo is None:
605- # The branch is in a shared repository
606- repo = branch.repository
607- branches = [branch]
608- elif repo is not None:
609- branches = repo.find_branches(using=True)
610-
611- if repo is not None:
612- repo.lock_read()
613- try:
614- if do_repo:
615- note("Checking repository at '%s'."
616- % (repo.bzrdir.root_transport.base,))
617- result = repo.check()
618+ base_tree = branch = repo = None
619+
620+ to_unlock = []
621+ needed_refs= {}
622+ try:
623+ if base_tree is not None:
624+ # If the tree is a lightweight checkout we won't see it in
625+ # repo.find_branches - add now.
626+ if do_tree:
627+ scan_tree(None, base_tree, needed_refs, to_unlock)
628+ branch = base_tree.branch
629+ if branch is not None:
630+ # We have a branch
631+ if repo is None:
632+ # The branch is in a shared repository
633+ repo = branch.repository
634+ if repo is not None:
635+ repo.lock_read()
636+ to_unlock.append(repo)
637+ branches = repo.find_branches(using=True)
638+ saw_tree = False
639+ if do_branch or do_tree:
640+ for branch in branches:
641+ if do_tree:
642+ try:
643+ tree = branch.bzrdir.open_workingtree()
644+ saw_tree = True
645+ except (errors.NotLocalUrl, errors.NoWorkingTree):
646+ pass
647+ else:
648+ scan_tree(base_tree, tree, needed_refs, to_unlock)
649+ if do_branch:
650+ scan_branch(branch, needed_refs, to_unlock)
651+ if do_branch and not branches:
652+ log_error("No branch found at specified location.")
653+ if do_tree and base_tree is None and not saw_tree:
654+ log_error("No working tree found at specified location.")
655+ if do_repo or do_branch or do_tree:
656+ if do_repo:
657+ note("Checking repository at '%s'."
658+ % (repo.bzrdir.root_transport.base,))
659+ result = repo.check(None, callback_refs=needed_refs,
660+ check_repo=do_repo)
661 result.report_results(verbose)
662+ else:
663+ if do_tree:
664+ log_error("No working tree found at specified location.")
665 if do_branch:
666- if branches == []:
667- log_error("No branch found at specified location.")
668- else:
669- for branch in branches:
670- note("Checking branch at '%s'."
671- % (branch.bzrdir.root_transport.base,))
672- check_branch(branch, verbose)
673- finally:
674- repo.unlock()
675- else:
676- if do_branch:
677- log_error("No branch found at specified location.")
678- if do_repo:
679- log_error("No repository found at specified location.")
680+ log_error("No branch found at specified location.")
681+ if do_repo:
682+ log_error("No repository found at specified location.")
683+ finally:
684+ for thing in to_unlock:
685+ thing.unlock()
686
687=== modified file 'bzrlib/graph.py'
688--- bzrlib/graph.py 2009-06-10 03:56:49 +0000
689+++ bzrlib/graph.py 2009-06-16 05:29:42 +0000
690@@ -311,6 +311,27 @@
691 # get there.
692 return known_revnos[cur_tip] + num_steps
693
694+ def find_lefthand_distances(self, keys):
695+ """Find the distance to null for all the keys in keys.
696+
697+ :param keys: keys to lookup.
698+ :return: A dict key->distance for all of keys.
699+ """
700+ # Optimisable by concurrent searching, but a random spread should get
701+ # some sort of hit rate.
702+ result = {}
703+ known_revnos = []
704+ ghosts = []
705+ for key in keys:
706+ try:
707+ known_revnos.append(
708+ (key, self.find_distance_to_null(key, known_revnos)))
709+ except errors.GhostRevisionsHaveNoRevno:
710+ ghosts.append(key)
711+ for key in ghosts:
712+ known_revnos.append((key, -1))
713+ return dict(known_revnos)
714+
715 def find_unique_ancestors(self, unique_revision, common_revisions):
716 """Find the unique ancestors for a revision versus others.
717
718
719=== modified file 'bzrlib/groupcompress.py'
720--- bzrlib/groupcompress.py 2009-06-10 03:56:49 +0000
721+++ bzrlib/groupcompress.py 2009-06-16 05:29:42 +0000
722@@ -1049,11 +1049,14 @@
723 reannotate(parent_lines, lines, key, None, head_cache))
724 return parent_cache[key]
725
726- def check(self, progress_bar=None):
727+ def check(self, progress_bar=None, keys=None):
728 """See VersionedFiles.check()."""
729- keys = self.keys()
730- for record in self.get_record_stream(keys, 'unordered', True):
731- record.get_bytes_as('fulltext')
732+ if keys is None:
733+ keys = self.keys()
734+ for record in self.get_record_stream(keys, 'unordered', True):
735+ record.get_bytes_as('fulltext')
736+ else:
737+ return self.get_record_stream(keys, 'unordered', True)
738
739 def _check_add(self, key, lines, random_id, check_content):
740 """check that version_id and lines are safe to add."""
741
742=== modified file 'bzrlib/help_topics/en/debug-flags.txt'
743--- bzrlib/help_topics/en/debug-flags.txt 2009-03-18 23:43:51 +0000
744+++ bzrlib/help_topics/en/debug-flags.txt 2009-05-12 06:30:56 +0000
745@@ -19,6 +19,7 @@
746 -Dindex Trace major index operations.
747 -Dknit Trace knit operations.
748 -Dlock Trace when lockdir locks are taken or released.
749+-Dprogress Trace progress bar operations.
750 -Dmerge Emit information for debugging merges.
751 -Dpack Emit information about pack operations.
752 -Dsftp Trace SFTP internals.
753
754=== modified file 'bzrlib/inventory.py'
755--- bzrlib/inventory.py 2009-06-10 03:56:49 +0000
756+++ bzrlib/inventory.py 2009-06-16 05:29:42 +0000
757@@ -262,7 +262,7 @@
758 def versionable_kind(kind):
759 return (kind in ('file', 'directory', 'symlink', 'tree-reference'))
760
761- def check(self, checker, rev_id, inv, tree):
762+ def check(self, checker, rev_id, inv):
763 """Check this inventory entry is intact.
764
765 This is a template method, override _check for kind specific
766@@ -274,18 +274,18 @@
767 :param rev_id: Revision id from which this InventoryEntry was loaded.
768 Not necessarily the last-changed revision for this file.
769 :param inv: Inventory from which the entry was loaded.
770- :param tree: RevisionTree for this entry.
771 """
772 if self.parent_id is not None:
773 if not inv.has_id(self.parent_id):
774 raise BzrCheckError('missing parent {%s} in inventory for revision {%s}'
775 % (self.parent_id, rev_id))
776- self._check(checker, rev_id, tree)
777+ checker._add_entry_to_text_key_references(inv, self)
778+ self._check(checker, rev_id)
779
780- def _check(self, checker, rev_id, tree):
781+ def _check(self, checker, rev_id):
782 """Check this inventory entry for kind specific errors."""
783- raise BzrCheckError('unknown entry kind %r in revision {%s}' %
784- (self.kind, rev_id))
785+ checker._report_items.append(
786+ 'unknown entry kind %r in revision {%s}' % (self.kind, rev_id))
787
788 def copy(self):
789 """Clone this inventory entry."""
790@@ -404,7 +404,7 @@
791 'text_id', 'parent_id', 'children', 'executable',
792 'revision', 'symlink_target', 'reference_revision']
793
794- def _check(self, checker, rev_id, tree):
795+ def _check(self, checker, rev_id):
796 """See InventoryEntry._check"""
797
798 def __init__(self, file_id):
799@@ -433,11 +433,16 @@
800 'text_id', 'parent_id', 'children', 'executable',
801 'revision', 'symlink_target', 'reference_revision']
802
803- def _check(self, checker, rev_id, tree):
804+ def _check(self, checker, rev_id):
805 """See InventoryEntry._check"""
806- if self.text_sha1 is not None or self.text_size is not None or self.text_id is not None:
807- raise BzrCheckError('directory {%s} has text in revision {%s}'
808+ if (self.text_sha1 is not None or self.text_size is not None or
809+ self.text_id is not None):
810+ checker._report_items.append('directory {%s} has text in revision {%s}'
811 % (self.file_id, rev_id))
812+ # Directories are stored as ''.
813+ checker.add_pending_item(rev_id,
814+ ('texts', self.file_id, self.revision), 'text',
815+ 'da39a3ee5e6b4b0d3255bfef95601890afd80709')
816
817 def copy(self):
818 other = InventoryDirectory(self.file_id, self.name, self.parent_id)
819@@ -476,27 +481,16 @@
820 'text_id', 'parent_id', 'children', 'executable',
821 'revision', 'symlink_target', 'reference_revision']
822
823- def _check(self, checker, tree_revision_id, tree):
824+ def _check(self, checker, tree_revision_id):
825 """See InventoryEntry._check"""
826- key = (self.file_id, self.revision)
827- if key in checker.checked_texts:
828- prev_sha = checker.checked_texts[key]
829- if prev_sha != self.text_sha1:
830- raise BzrCheckError(
831- 'mismatched sha1 on {%s} in {%s} (%s != %s) %r' %
832- (self.file_id, tree_revision_id, prev_sha, self.text_sha1,
833- t))
834- else:
835- checker.repeated_text_cnt += 1
836- return
837-
838- checker.checked_text_cnt += 1
839- # We can't check the length, because Weave doesn't store that
840- # information, and the whole point of looking at the weave's
841- # sha1sum is that we don't have to extract the text.
842- if (self.text_sha1 != tree._repository.texts.get_sha1s([key])[key]):
843- raise BzrCheckError('text {%s} version {%s} wrong sha1' % key)
844- checker.checked_texts[key] = self.text_sha1
845+ # TODO: check size too.
846+ checker.add_pending_item(tree_revision_id,
847+ ('texts', self.file_id, self.revision), 'text',
848+ self.text_sha1)
849+ if self.text_size is None:
850+ checker._report_items.append(
851+ 'fileid {%s} in {%s} has None for text_size' % (self.file_id,
852+ tree_revision_id))
853
854 def copy(self):
855 other = InventoryFile(self.file_id, self.name, self.parent_id)
856@@ -600,14 +594,20 @@
857 'text_id', 'parent_id', 'children', 'executable',
858 'revision', 'symlink_target', 'reference_revision']
859
860- def _check(self, checker, rev_id, tree):
861+ def _check(self, checker, rev_id):
862 """See InventoryEntry._check"""
863 if self.text_sha1 is not None or self.text_size is not None or self.text_id is not None:
864- raise BzrCheckError('symlink {%s} has text in revision {%s}'
865+ checker._report_items.append(
866+ 'symlink {%s} has text in revision {%s}'
867 % (self.file_id, rev_id))
868 if self.symlink_target is None:
869- raise BzrCheckError('symlink {%s} has no target in revision {%s}'
870+ checker._report_items.append(
871+ 'symlink {%s} has no target in revision {%s}'
872 % (self.file_id, rev_id))
873+ # Symlinks are stored as ''
874+ checker.add_pending_item(tree_revision_id,
875+ ('texts', self.file_id, self.revision), 'text',
876+ 'da39a3ee5e6b4b0d3255bfef95601890afd80709')
877
878 def copy(self):
879 other = InventoryLink(self.file_id, self.name, self.parent_id)
880
881=== modified file 'bzrlib/knit.py'
882--- bzrlib/knit.py 2009-06-10 03:56:49 +0000
883+++ bzrlib/knit.py 2009-06-16 05:29:42 +0000
884@@ -1005,8 +1005,15 @@
885 """See VersionedFiles.annotate."""
886 return self._factory.annotate(self, key)
887
888- def check(self, progress_bar=None):
889+ def check(self, progress_bar=None, keys=None):
890 """See VersionedFiles.check()."""
891+ if keys is None:
892+ return self._logical_check()
893+ else:
894+ # At the moment, check does not extra work over get_record_stream
895+ return self.get_record_stream(keys, 'unordered', True)
896+
897+ def _logical_check(self):
898 # This doesn't actually test extraction of everything, but that will
899 # impact 'bzr check' substantially, and needs to be integrated with
900 # care. However, it does check for the obvious problem of a delta with
901
902=== modified file 'bzrlib/remote.py'
903--- bzrlib/remote.py 2009-06-11 09:11:21 +0000
904+++ bzrlib/remote.py 2009-06-16 05:29:42 +0000
905@@ -1414,9 +1414,10 @@
906 return self._real_repository.get_revision_reconcile(revision_id)
907
908 @needs_read_lock
909- def check(self, revision_ids=None):
910+ def check(self, revision_ids=None, callback_refs=None):
911 self._ensure_real()
912- return self._real_repository.check(revision_ids=revision_ids)
913+ return self._real_repository.check(revision_ids=revision_ids,
914+ callback_refs=callback_refs)
915
916 def copy_content_into(self, destination, revision_id=None):
917 self._ensure_real()
918
919=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
920--- bzrlib/repofmt/groupcompress_repo.py 2009-06-12 01:11:00 +0000
921+++ bzrlib/repofmt/groupcompress_repo.py 2009-06-16 05:29:42 +0000
922@@ -718,6 +718,10 @@
923 finally:
924 basis_tree.unlock()
925
926+ def deserialise_inventory(self, revision_id, bytes):
927+ return inventory.CHKInventory.deserialise(self.chk_bytes, bytes,
928+ (revision_id,))
929+
930 def _iter_inventories(self, revision_ids):
931 """Iterate over many inventory objects."""
932 keys = [(revision_id,) for revision_id in revision_ids]
933
934=== modified file 'bzrlib/repofmt/knitrepo.py'
935--- bzrlib/repofmt/knitrepo.py 2009-06-10 03:56:49 +0000
936+++ bzrlib/repofmt/knitrepo.py 2009-06-16 05:29:42 +0000
937@@ -229,24 +229,29 @@
938 def _make_parents_provider(self):
939 return _KnitsParentsProvider(self.revisions)
940
941- def _find_inconsistent_revision_parents(self):
942+ def _find_inconsistent_revision_parents(self, revisions_iterator=None):
943 """Find revisions with different parent lists in the revision object
944 and in the index graph.
945
946+ :param revisions_iterator: None, or an iterator of (revid,
947+ Revision-or-None). This iterator controls the revisions checked.
948 :returns: an iterator yielding tuples of (revison-id, parents-in-index,
949 parents-in-revision).
950 """
951 if not self.is_locked():
952 raise AssertionError()
953 vf = self.revisions
954- for index_version in vf.keys():
955- parent_map = vf.get_parent_map([index_version])
956+ if revisions_iterator is None:
957+ revisions_iterator = self._iter_revisions(None)
958+ for revid, revision in revisions_iterator:
959+ if revision is None:
960+ pass
961+ parent_map = vf.get_parent_map([(revid,)])
962 parents_according_to_index = tuple(parent[-1] for parent in
963- parent_map[index_version])
964- revision = self.get_revision(index_version[-1])
965+ parent_map[(revid,)])
966 parents_according_to_revision = tuple(revision.parent_ids)
967 if parents_according_to_index != parents_according_to_revision:
968- yield (index_version[-1], parents_according_to_index,
969+ yield (revid, parents_according_to_index,
970 parents_according_to_revision)
971
972 def _check_for_inconsistent_revision_parents(self):
973
974=== modified file 'bzrlib/repofmt/pack_repo.py'
975--- bzrlib/repofmt/pack_repo.py 2009-06-10 03:56:49 +0000
976+++ bzrlib/repofmt/pack_repo.py 2009-06-16 05:29:42 +0000
977@@ -2219,52 +2219,6 @@
978 self.revisions._index._key_dependencies.refs.clear()
979 self._pack_collection._abort_write_group()
980
981- def _find_inconsistent_revision_parents(self):
982- """Find revisions with incorrectly cached parents.
983-
984- :returns: an iterator yielding tuples of (revison-id, parents-in-index,
985- parents-in-revision).
986- """
987- if not self.is_locked():
988- raise errors.ObjectNotLocked(self)
989- pb = ui.ui_factory.nested_progress_bar()
990- result = []
991- try:
992- revision_nodes = self._pack_collection.revision_index \
993- .combined_index.iter_all_entries()
994- index_positions = []
995- # Get the cached index values for all revisions, and also the
996- # location in each index of the revision text so we can perform
997- # linear IO.
998- for index, key, value, refs in revision_nodes:
999- node = (index, key, value, refs)
1000- index_memo = self.revisions._index._node_to_position(node)
1001- if index_memo[0] != index:
1002- raise AssertionError('%r != %r' % (index_memo[0], index))
1003- index_positions.append((index_memo, key[0],
1004- tuple(parent[0] for parent in refs[0])))
1005- pb.update("Reading revision index", 0, 0)
1006- index_positions.sort()
1007- batch_size = 1000
1008- pb.update("Checking cached revision graph", 0,
1009- len(index_positions))
1010- for offset in xrange(0, len(index_positions), 1000):
1011- pb.update("Checking cached revision graph", offset)
1012- to_query = index_positions[offset:offset + batch_size]
1013- if not to_query:
1014- break
1015- rev_ids = [item[1] for item in to_query]
1016- revs = self.get_revisions(rev_ids)
1017- for revision, item in zip(revs, to_query):
1018- index_parents = item[2]
1019- rev_parents = tuple(revision.parent_ids)
1020- if index_parents != rev_parents:
1021- result.append((revision.revision_id, index_parents,
1022- rev_parents))
1023- finally:
1024- pb.finished()
1025- return result
1026-
1027 def _make_parents_provider(self):
1028 return graph.CachingParentsProvider(self)
1029
1030
1031=== modified file 'bzrlib/repository.py'
1032--- bzrlib/repository.py 2009-06-12 01:11:00 +0000
1033+++ bzrlib/repository.py 2009-06-16 05:29:42 +0000
1034@@ -1154,6 +1154,119 @@
1035 # The old API returned a list, should this actually be a set?
1036 return parent_map.keys()
1037
1038+ def _check_inventories(self, checker):
1039+ """Check the inventories found from the revision scan.
1040+
1041+ This is responsible for verifying the sha1 of inventories and
1042+ creating a pending_keys set that covers data referenced by inventories.
1043+ """
1044+ bar = ui.ui_factory.nested_progress_bar()
1045+ try:
1046+ self._do_check_inventories(checker, bar)
1047+ finally:
1048+ bar.finished()
1049+
1050+ def _do_check_inventories(self, checker, bar):
1051+ """Helper for _check_inventories."""
1052+ revno = 0
1053+ keys = {'chk_bytes':set(), 'inventories':set(), 'texts':set()}
1054+ kinds = ['chk_bytes', 'texts']
1055+ count = len(checker.pending_keys)
1056+ bar.update("inventories", 0, 2)
1057+ current_keys = checker.pending_keys
1058+ checker.pending_keys = {}
1059+ # Accumulate current checks.
1060+ for key in current_keys:
1061+ if key[0] != 'inventories' and key[0] not in kinds:
1062+ checker._report_items.append('unknown key type %r' % (key,))
1063+ keys[key[0]].add(key[1:])
1064+ if keys['inventories']:
1065+ # NB: output order *should* be roughly sorted - topo or
1066+ # inverse topo depending on repository - either way decent
1067+ # to just delta against. However, pre-CHK formats didn't
1068+ # try to optimise inventory layout on disk. As such the
1069+ # pre-CHK code path does not use inventory deltas.
1070+ last_object = None
1071+ for record in self.inventories.check(keys=keys['inventories']):
1072+ if record.storage_kind == 'absent':
1073+ checker._report_items.append(
1074+ 'Missing inventory {%s}' % (record.key,))
1075+ else:
1076+ last_object = self._check_record('inventories', record,
1077+ checker, last_object,
1078+ current_keys[('inventories',) + record.key])
1079+ del keys['inventories']
1080+ else:
1081+ return
1082+ bar.update("texts", 1)
1083+ while (checker.pending_keys or keys['chk_bytes']
1084+ or keys['texts']):
1085+ # Something to check.
1086+ current_keys = checker.pending_keys
1087+ checker.pending_keys = {}
1088+ # Accumulate current checks.
1089+ for key in current_keys:
1090+ if key[0] not in kinds:
1091+ checker._report_items.append('unknown key type %r' % (key,))
1092+ keys[key[0]].add(key[1:])
1093+ # Check the outermost kind only - inventories || chk_bytes || texts
1094+ for kind in kinds:
1095+ if keys[kind]:
1096+ last_object = None
1097+ for record in getattr(self, kind).check(keys=keys[kind]):
1098+ if record.storage_kind == 'absent':
1099+ checker._report_items.append(
1100+ 'Missing inventory {%s}' % (record.key,))
1101+ else:
1102+ last_object = self._check_record(kind, record,
1103+ checker, last_object, current_keys[(kind,) + record.key])
1104+ keys[kind] = set()
1105+ break
1106+
1107+ def _check_record(self, kind, record, checker, last_object, item_data):
1108+ """Check a single text from this repository."""
1109+ if kind == 'inventories':
1110+ rev_id = record.key[0]
1111+ inv = self.deserialise_inventory(rev_id,
1112+ record.get_bytes_as('fulltext'))
1113+ if last_object is not None:
1114+ delta = inv._make_delta(last_object)
1115+ for old_path, path, file_id, ie in delta:
1116+ if ie is None:
1117+ continue
1118+ ie.check(checker, rev_id, inv)
1119+ else:
1120+ for path, ie in inv.iter_entries():
1121+ ie.check(checker, rev_id, inv)
1122+ if self._format.fast_deltas:
1123+ return inv
1124+ elif kind == 'chk_bytes':
1125+ # No code written to check chk_bytes for this repo format.
1126+ checker._report_items.append(
1127+ 'unsupported key type chk_bytes for %s' % (record.key,))
1128+ elif kind == 'texts':
1129+ self._check_text(record, checker, item_data)
1130+ else:
1131+ checker._report_items.append(
1132+ 'unknown key type %s for %s' % (kind, record.key))
1133+
1134+ def _check_text(self, record, checker, item_data):
1135+ """Check a single text."""
1136+ # Check it is extractable.
1137+ # TODO: check length.
1138+ if record.storage_kind == 'chunked':
1139+ chunks = record.get_bytes_as(record.storage_kind)
1140+ sha1 = osutils.sha_strings(chunks)
1141+ length = sum(map(len, chunks))
1142+ else:
1143+ content = record.get_bytes_as('fulltext')
1144+ sha1 = osutils.sha_string(content)
1145+ length = len(content)
1146+ if item_data and sha1 != item_data[1]:
1147+ checker._report_items.append(
1148+ 'sha1 mismatch: %s has sha1 %s expected %s referenced by %s' %
1149+ (record.key, sha1, item_data[1], item_data[2]))
1150+
1151 @staticmethod
1152 def create(a_bzrdir):
1153 """Construct the current default format repository in a_bzrdir."""
1154@@ -1700,25 +1813,49 @@
1155
1156 @needs_read_lock
1157 def get_revisions(self, revision_ids):
1158- """Get many revisions at once."""
1159+ """Get many revisions at once.
1160+
1161+ Repositories that need to check data on every revision read should
1162+ subclass this method.
1163+ """
1164 return self._get_revisions(revision_ids)
1165
1166 @needs_read_lock
1167 def _get_revisions(self, revision_ids):
1168 """Core work logic to get many revisions without sanity checks."""
1169- for rev_id in revision_ids:
1170- if not rev_id or not isinstance(rev_id, basestring):
1171- raise errors.InvalidRevisionId(revision_id=rev_id, branch=self)
1172+ revs = {}
1173+ for revid, rev in self._iter_revisions(revision_ids):
1174+ if rev is None:
1175+ raise errors.NoSuchRevision(self, revid)
1176+ revs[revid] = rev
1177+ return [revs[revid] for revid in revision_ids]
1178+
1179+ def _iter_revisions(self, revision_ids):
1180+ """Iterate over revision objects.
1181+
1182+ :param revision_ids: An iterable of revisions to examine. None may be
1183+ passed to request all revisions known to the repository. Note that
1184+ not all repositories can find unreferenced revisions; for those
1185+ repositories only referenced ones will be returned.
1186+ :return: An iterator of (revid, revision) tuples. Absent revisions (
1187+ those asked for but not available) are returned as (revid, None).
1188+ """
1189+ if revision_ids is None:
1190+ revision_ids = self.all_revision_ids()
1191+ else:
1192+ for rev_id in revision_ids:
1193+ if not rev_id or not isinstance(rev_id, basestring):
1194+ raise errors.InvalidRevisionId(revision_id=rev_id, branch=self)
1195 keys = [(key,) for key in revision_ids]
1196 stream = self.revisions.get_record_stream(keys, 'unordered', True)
1197- revs = {}
1198 for record in stream:
1199+ revid = record.key[0]
1200 if record.storage_kind == 'absent':
1201- raise errors.NoSuchRevision(self, record.key[0])
1202- text = record.get_bytes_as('fulltext')
1203- rev = self._serializer.read_revision_from_string(text)
1204- revs[record.key[0]] = rev
1205- return [revs[revid] for revid in revision_ids]
1206+ yield (revid, None)
1207+ else:
1208+ text = record.get_bytes_as('fulltext')
1209+ rev = self._serializer.read_revision_from_string(text)
1210+ yield (revid, rev)
1211
1212 @needs_read_lock
1213 def get_revision_xml(self, revision_id):
1214@@ -2056,8 +2193,7 @@
1215 batch_size]
1216 if not to_query:
1217 break
1218- for rev_tree in self.revision_trees(to_query):
1219- revision_id = rev_tree.get_revision_id()
1220+ for revision_id in to_query:
1221 parent_ids = ancestors[revision_id]
1222 for text_key in revision_keys[revision_id]:
1223 pb.update("Calculating text parents", processed_texts)
1224@@ -2417,7 +2553,8 @@
1225 [parents_provider, other_repository._make_parents_provider()])
1226 return graph.Graph(parents_provider)
1227
1228- def _get_versioned_file_checker(self, text_key_references=None):
1229+ def _get_versioned_file_checker(self, text_key_references=None,
1230+ ancestors=None):
1231 """Return an object suitable for checking versioned files.
1232
1233 :param text_key_references: if non-None, an already built
1234@@ -2425,9 +2562,12 @@
1235 to whether they were referred to by the inventory of the
1236 revision_id that they contain. If None, this will be
1237 calculated.
1238+ :param ancestors: Optional result from
1239+ self.get_graph().get_parent_map(self.all_revision_ids()) if already
1240+ available.
1241 """
1242 return _VersionedFileChecker(self,
1243- text_key_references=text_key_references)
1244+ text_key_references=text_key_references, ancestors=ancestors)
1245
1246 def revision_ids_to_search_result(self, result_set):
1247 """Convert a set of revision ids to a graph SearchResult."""
1248@@ -2483,19 +2623,25 @@
1249 return record.get_bytes_as('fulltext')
1250
1251 @needs_read_lock
1252- def check(self, revision_ids=None):
1253+ def check(self, revision_ids=None, callback_refs=None, check_repo=True):
1254 """Check consistency of all history of given revision_ids.
1255
1256 Different repository implementations should override _check().
1257
1258 :param revision_ids: A non-empty list of revision_ids whose ancestry
1259 will be checked. Typically the last revision_id of a branch.
1260+ :param callback_refs: A dict of check-refs to resolve and callback
1261+ the check/_check method on the items listed as wanting the ref.
1262+ see bzrlib.check.
1263+ :param check_repo: If False do not check the repository contents, just
1264+ calculate the data callback_refs requires and call them back.
1265 """
1266- return self._check(revision_ids)
1267+ return self._check(revision_ids, callback_refs=callback_refs,
1268+ check_repo=check_repo)
1269
1270- def _check(self, revision_ids):
1271- result = check.Check(self)
1272- result.check()
1273+ def _check(self, revision_ids, callback_refs, check_repo):
1274+ result = check.Check(self, check_repo=check_repo)
1275+ result.check(callback_refs)
1276 return result
1277
1278 def _warn_if_deprecated(self):
1279@@ -3923,10 +4069,10 @@
1280
1281 class _VersionedFileChecker(object):
1282
1283- def __init__(self, repository, text_key_references=None):
1284+ def __init__(self, repository, text_key_references=None, ancestors=None):
1285 self.repository = repository
1286 self.text_index = self.repository._generate_text_key_index(
1287- text_key_references=text_key_references)
1288+ text_key_references=text_key_references, ancestors=ancestors)
1289
1290 def calculate_file_version_parents(self, text_key):
1291 """Calculate the correct parents for a file version according to
1292@@ -3950,6 +4096,18 @@
1293 revision_id) tuples for versions that are present in this versioned
1294 file, but not used by the corresponding inventory.
1295 """
1296+ local_progress = None
1297+ if progress_bar is None:
1298+ local_progress = ui.ui_factory.nested_progress_bar()
1299+ progress_bar = local_progress
1300+ try:
1301+ return self._check_file_version_parents(texts, progress_bar)
1302+ finally:
1303+ if local_progress:
1304+ local_progress.finished()
1305+
1306+ def _check_file_version_parents(self, texts, progress_bar):
1307+ """See check_file_version_parents."""
1308 wrong_parents = {}
1309 self.file_ids = set([file_id for file_id, _ in
1310 self.text_index.iterkeys()])
1311@@ -3964,8 +4122,7 @@
1312 text_keys = self.repository.texts.keys()
1313 unused_keys = frozenset(text_keys) - set(self.text_index)
1314 for num, key in enumerate(self.text_index.iterkeys()):
1315- if progress_bar is not None:
1316- progress_bar.update('checking text graph', num, n_versions)
1317+ progress_bar.update('checking text graph', num, n_versions)
1318 correct_parents = self.calculate_file_version_parents(key)
1319 try:
1320 knit_parents = parent_map[key]
1321
1322=== modified file 'bzrlib/smart/medium.py'
1323--- bzrlib/smart/medium.py 2009-06-10 03:56:49 +0000
1324+++ bzrlib/smart/medium.py 2009-06-16 05:29:42 +0000
1325@@ -37,7 +37,6 @@
1326 from bzrlib import (
1327 debug,
1328 errors,
1329- osutils,
1330 symbol_versioning,
1331 trace,
1332 ui,
1333@@ -46,7 +45,8 @@
1334 from bzrlib.smart import client, protocol, request, vfs
1335 from bzrlib.transport import ssh
1336 """)
1337-
1338+#usually already imported, and getting IllegalScoperReplacer on it here.
1339+from bzrlib import osutils
1340
1341 # We must not read any more than 64k at a time so we don't risk "no buffer
1342 # space available" errors on some platforms. Windows in particular is likely
1343
1344=== modified file 'bzrlib/tests/blackbox/test_check.py'
1345--- bzrlib/tests/blackbox/test_check.py 2009-03-23 14:59:43 +0000
1346+++ bzrlib/tests/blackbox/test_check.py 2009-05-12 07:27:33 +0000
1347@@ -34,23 +34,21 @@
1348 tree = self.make_branch_and_tree('.')
1349 tree.commit('hallelujah')
1350 out, err = self.run_bzr('check')
1351- self.assertContainsRe(err, r"^Checking working tree at '.*'\.\n"
1352- r"Checking repository at '.*'\.\n"
1353- r"checked repository.*\n"
1354+ self.assertContainsRe(err, r"Checking working tree at '.*'\.\n")
1355+ self.assertContainsRe(err, r"Checking repository at '.*'\.\n")
1356+ self.assertContainsRe(err, r"checked repository.*\n"
1357 r" 1 revisions\n"
1358 r" 0 file-ids\n"
1359- r" 0 unique file texts\n"
1360- r" 0 repeated file texts\n"
1361- r" 0 unreferenced text versions\n"
1362- r"Checking branch at '.*'\.\n"
1363- r"checked branch.*\n$")
1364+ )
1365+ self.assertContainsRe(err, r"Checking branch at '.*'\.\n")
1366+ self.assertContainsRe(err, r"checked branch.*")
1367
1368 def test_check_branch(self):
1369 tree = self.make_branch_and_tree('.')
1370 tree.commit('foo')
1371 out, err = self.run_bzr('check --branch')
1372 self.assertContainsRe(err, r"^Checking branch at '.*'\.\n"
1373- r"checked branch.*\n$")
1374+ r"checked branch.*")
1375
1376 def test_check_repository(self):
1377 tree = self.make_branch_and_tree('.')
1378@@ -60,9 +58,7 @@
1379 r"checked repository.*\n"
1380 r" 1 revisions\n"
1381 r" 0 file-ids\n"
1382- r" 0 unique file texts\n"
1383- r" 0 repeated file texts\n"
1384- r" 0 unreferenced text versions$")
1385+ )
1386
1387 def test_check_tree(self):
1388 tree = self.make_branch_and_tree('.')
1389@@ -76,7 +72,7 @@
1390 out, err = self.run_bzr('check --tree --branch')
1391 self.assertContainsRe(err, r"^Checking working tree at '.*'\.\n"
1392 r"Checking branch at '.*'\.\n"
1393- r"checked branch.*\n$")
1394+ r"checked branch.*")
1395
1396 def test_check_missing_tree(self):
1397 branch = self.make_branch('.')
1398@@ -87,9 +83,9 @@
1399 branch = self.make_branch('.')
1400 out, err = self.run_bzr('check --tree --branch')
1401 self.assertContainsRe(err,
1402- r"^No working tree found at specified location\.\n"
1403 r"Checking branch at '.*'\.\n"
1404- r"checked branch.*\n$")
1405+ r"No working tree found at specified location\.\n"
1406+ r"checked branch.*")
1407
1408 def test_check_missing_branch_in_shared_repo(self):
1409 self.make_repository('shared', shared=True)
1410
1411=== modified file 'bzrlib/tests/branch_implementations/test_check.py'
1412--- bzrlib/tests/branch_implementations/test_check.py 2009-06-10 03:56:49 +0000
1413+++ bzrlib/tests/branch_implementations/test_check.py 2009-06-16 05:29:42 +0000
1414@@ -16,7 +16,9 @@
1415
1416 """Tests for branch implementations - test check() functionality"""
1417
1418-from bzrlib import errors
1419+from StringIO import StringIO
1420+
1421+from bzrlib import errors, tests, ui
1422 from bzrlib.tests.branch_implementations import TestCaseWithBranch
1423
1424
1425@@ -54,24 +56,57 @@
1426 # with set_last_revision_info
1427 tree.branch.set_last_revision_info(3, r5)
1428
1429- e = self.assertRaises(errors.BzrCheckError,
1430- tree.branch.check)
1431- self.assertEqual('Internal check failed:'
1432- ' revno does not match len(mainline) 3 != 5', str(e))
1433+ tree.lock_read()
1434+ self.addCleanup(tree.unlock)
1435+ refs = self.make_refs(tree.branch)
1436+ result = tree.branch.check(refs)
1437+ ui.ui_factory = tests.TestUIFactory(stdout=StringIO())
1438+ result.report_results(True)
1439+ self.assertContainsRe('revno does not match len',
1440+ ui.ui_factory.stdout.getvalue())
1441
1442 def test_check_branch_report_results(self):
1443 """Checking a branch produces results which can be printed"""
1444 branch = self.make_branch('.')
1445- result = branch.check()
1446+ branch.lock_read()
1447+ self.addCleanup(branch.unlock)
1448+ result = branch.check(self.make_refs(branch))
1449 # reports results through logging
1450 result.report_results(verbose=True)
1451 result.report_results(verbose=False)
1452
1453- def test_check_detects_ghosts_in_mainline(self):
1454- tree = self.make_branch_and_tree('test')
1455- tree.set_parent_ids(['thisisaghost'], allow_leftmost_as_ghost=True)
1456- r1 = tree.commit('one')
1457- r2 = tree.commit('two')
1458- result = tree.branch.check()
1459- self.assertEquals(True, result.ghosts_in_mainline)
1460+ def test__get_check_refs(self):
1461+ tree = self.make_branch_and_tree('.')
1462+ revid = tree.commit('foo')
1463+ self.assertEqual(
1464+ set([('revision-existence', revid), ('lefthand-distance', revid)]),
1465+ set(tree.branch._get_check_refs()))
1466
1467+ def make_refs(self, branch):
1468+ needed_refs = branch._get_check_refs()
1469+ refs = {}
1470+ distances = set()
1471+ existences = set()
1472+ for ref in needed_refs:
1473+ kind, value = ref
1474+ if kind == 'lefthand-distance':
1475+ distances.add(value)
1476+ elif kind == 'revision-existence':
1477+ existences.add(value)
1478+ else:
1479+ raise AssertionError(
1480+ 'unknown ref kind for ref %s' % ref)
1481+ node_distances = branch.repository.get_graph().find_lefthand_distances(
1482+ distances)
1483+ for key, distance in node_distances.iteritems():
1484+ refs[('lefthand-distance', key)] = distance
1485+ if key in existences and distance > 0:
1486+ refs[('revision-existence', key)] = True
1487+ existences.remove(key)
1488+ parent_map = branch.repository.get_graph().get_parent_map(existences)
1489+ for key in parent_map:
1490+ refs[('revision-existence', key)] = True
1491+ existences.remove(key)
1492+ for key in existences:
1493+ refs[('revision-existence', key)] = False
1494+ return refs
1495
1496=== modified file 'bzrlib/tests/per_repository/test_check.py'
1497--- bzrlib/tests/per_repository/test_check.py 2009-04-09 20:23:07 +0000
1498+++ bzrlib/tests/per_repository/test_check.py 2009-05-12 05:34:15 +0000
1499@@ -36,18 +36,14 @@
1500 tree = self.make_branch_and_tree('.')
1501 self.build_tree(['foo'])
1502 tree.smart_add(['.'])
1503- tree.commit('1')
1504+ revid1 = tree.commit('1')
1505 self.build_tree(['bar'])
1506 tree.smart_add(['.'])
1507- tree.commit('2')
1508- # XXX: check requires a non-empty revision IDs list, but it ignores the
1509- # contents of it!
1510- check_object = tree.branch.repository.check(['ignored'])
1511+ revid2 = tree.commit('2')
1512+ check_object = tree.branch.repository.check([revid1, revid2])
1513 check_object.report_results(verbose=True)
1514 log = self._get_log(keep_log_file=True)
1515- self.assertContainsRe(
1516- log,
1517- "0 unreferenced text versions")
1518+ self.assertContainsRe(log, "0 unreferenced text versions")
1519
1520
1521 class TestFindInconsistentRevisionParents(TestCaseWithBrokenRevisionIndex):
1522@@ -100,3 +96,32 @@
1523 "revision-id has wrong parents in index: "
1524 r"\('incorrect-parent',\) should be \(\)")
1525
1526+
1527+class TestCallbacks(TestCaseWithRepository):
1528+
1529+ def test_callback_tree_and_branch(self):
1530+ # use a real tree to get actual refs that will work
1531+ tree = self.make_branch_and_tree('foo')
1532+ revid = tree.commit('foo')
1533+ tree.lock_read()
1534+ self.addCleanup(tree.unlock)
1535+ needed_refs = {}
1536+ for ref in tree._get_check_refs():
1537+ needed_refs.setdefault(ref, []).append(tree)
1538+ for ref in tree.branch._get_check_refs():
1539+ needed_refs.setdefault(ref, []).append(tree.branch)
1540+ self.tree_check = tree._check
1541+ self.branch_check = tree.branch.check
1542+ tree._check = self.tree_callback
1543+ tree.branch.check = self.branch_callback
1544+ self.callbacks = []
1545+ tree.branch.repository.check([revid], callback_refs=needed_refs)
1546+ self.assertNotEqual([], self.callbacks)
1547+
1548+ def tree_callback(self, refs):
1549+ self.callbacks.append(('tree', refs))
1550+ return self.tree_check(refs)
1551+
1552+ def branch_callback(self, refs):
1553+ self.callbacks.append(('branch', refs))
1554+ return self.branch_check(refs)
1555
1556=== modified file 'bzrlib/tests/test_graph.py'
1557--- bzrlib/tests/test_graph.py 2009-06-10 03:56:49 +0000
1558+++ bzrlib/tests/test_graph.py 2009-06-16 05:29:42 +0000
1559@@ -526,6 +526,19 @@
1560 graph = self.make_graph(history_shortcut)
1561 self.assertEqual(set(['rev2b']), graph.find_lca('rev3a', 'rev3b'))
1562
1563+ def test_lefthand_distance_smoke(self):
1564+ """A simple does it work test for graph.lefthand_distance(keys)."""
1565+ graph = self.make_graph(history_shortcut)
1566+ distance_graph = graph.find_lefthand_distances(['rev3b', 'rev2a'])
1567+ self.assertEqual({'rev2a': 2, 'rev3b': 3}, distance_graph)
1568+
1569+ def test_lefthand_distance_ghosts(self):
1570+ """A simple does it work test for graph.lefthand_distance(keys)."""
1571+ nodes = {'nonghost':[NULL_REVISION], 'toghost':['ghost']}
1572+ graph = self.make_graph(nodes)
1573+ distance_graph = graph.find_lefthand_distances(['nonghost', 'toghost'])
1574+ self.assertEqual({'nonghost': 1, 'toghost': -1}, distance_graph)
1575+
1576 def test_recursive_unique_lca(self):
1577 """Test finding a unique least common ancestor.
1578
1579
1580=== modified file 'bzrlib/tests/test_versionedfile.py'
1581--- bzrlib/tests/test_versionedfile.py 2009-05-01 18:09:24 +0000
1582+++ bzrlib/tests/test_versionedfile.py 2009-06-15 00:24:04 +0000
1583@@ -30,6 +30,7 @@
1584 knit as _mod_knit,
1585 osutils,
1586 progress,
1587+ ui,
1588 )
1589 from bzrlib.errors import (
1590 RevisionNotPresent,
1591@@ -1510,6 +1511,28 @@
1592 self.assertRaises(RevisionNotPresent,
1593 files.annotate, prefix + ('missing-key',))
1594
1595+ def test_check_no_parameters(self):
1596+ files = self.get_versionedfiles()
1597+
1598+ def test_check_progressbar_parameter(self):
1599+ """A progress bar can be supplied because check can be a generator."""
1600+ pb = ui.ui_factory.nested_progress_bar()
1601+ self.addCleanup(pb.finished)
1602+ files = self.get_versionedfiles()
1603+ files.check(progress_bar=pb)
1604+
1605+ def test_check_with_keys_becomes_generator(self):
1606+ files = self.get_versionedfiles()
1607+ self.get_diamond_files(files)
1608+ keys = files.keys()
1609+ entries = files.check(keys=keys)
1610+ seen = set()
1611+ # Texts output should be fulltexts.
1612+ self.capture_stream(files, entries, seen.add,
1613+ files.get_parent_map(keys), require_fulltext=True)
1614+ # All texts should be output.
1615+ self.assertEqual(set(keys), seen)
1616+
1617 def test_construct(self):
1618 """Each parameterised test can be constructed on a transport."""
1619 files = self.get_versionedfiles()
1620@@ -1669,7 +1692,8 @@
1621 'knit-delta-closure', 'knit-delta-closure-ref',
1622 'groupcompress-block', 'groupcompress-block-ref'])
1623
1624- def capture_stream(self, f, entries, on_seen, parents):
1625+ def capture_stream(self, f, entries, on_seen, parents,
1626+ require_fulltext=False):
1627 """Capture a stream for testing."""
1628 for factory in entries:
1629 on_seen(factory.key)
1630@@ -1680,6 +1704,8 @@
1631 self.assertEqual(parents[factory.key], factory.parents)
1632 self.assertIsInstance(factory.get_bytes_as(factory.storage_kind),
1633 str)
1634+ if require_fulltext:
1635+ factory.get_bytes_as('fulltext')
1636
1637 def test_get_record_stream_interface(self):
1638 """each item in a stream has to provide a regular interface."""
1639@@ -2547,8 +2573,8 @@
1640 self.assertRaises(NotImplementedError,
1641 self.texts.add_mpdiffs, [])
1642
1643- def test_check(self):
1644- self.assertTrue(self.texts.check())
1645+ def test_check_noerrors(self):
1646+ self.texts.check()
1647
1648 def test_insert_record_stream(self):
1649 self.assertRaises(NotImplementedError, self.texts.insert_record_stream,
1650
1651=== modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
1652--- bzrlib/tests/workingtree_implementations/__init__.py 2009-06-10 03:56:49 +0000
1653+++ bzrlib/tests/workingtree_implementations/__init__.py 2009-06-16 05:29:42 +0000
1654@@ -60,45 +60,49 @@
1655
1656
1657 def load_tests(standard_tests, module, loader):
1658+ test_names = [
1659+ 'add_reference',
1660+ 'add',
1661+ 'basis_inventory',
1662+ 'basis_tree',
1663+ 'break_lock',
1664+ 'changes_from',
1665+ 'check',
1666+ 'content_filters',
1667+ 'commit',
1668+ 'eol_conversion',
1669+ 'executable',
1670+ 'flush',
1671+ 'get_file_mtime',
1672+ 'get_parent_ids',
1673+ 'inv',
1674+ 'is_control_filename',
1675+ 'is_ignored',
1676+ 'locking',
1677+ 'merge_from_branch',
1678+ 'mkdir',
1679+ 'move',
1680+ 'nested_specifics',
1681+ 'parents',
1682+ 'paths2ids',
1683+ 'pull',
1684+ 'put_file',
1685+ 'readonly',
1686+ 'read_working_inventory',
1687+ 'remove',
1688+ 'rename_one',
1689+ 'revision_tree',
1690+ 'set_root_id',
1691+ 'smart_add',
1692+ 'uncommit',
1693+ 'unversion',
1694+ 'views',
1695+ 'walkdirs',
1696+ 'workingtree',
1697+ ]
1698 test_workingtree_implementations = [
1699- 'bzrlib.tests.workingtree_implementations.test_add_reference',
1700- 'bzrlib.tests.workingtree_implementations.test_add',
1701- 'bzrlib.tests.workingtree_implementations.test_basis_inventory',
1702- 'bzrlib.tests.workingtree_implementations.test_basis_tree',
1703- 'bzrlib.tests.workingtree_implementations.test_break_lock',
1704- 'bzrlib.tests.workingtree_implementations.test_changes_from',
1705- 'bzrlib.tests.workingtree_implementations.test_content_filters',
1706- 'bzrlib.tests.workingtree_implementations.test_commit',
1707- 'bzrlib.tests.workingtree_implementations.test_eol_conversion',
1708- 'bzrlib.tests.workingtree_implementations.test_executable',
1709- 'bzrlib.tests.workingtree_implementations.test_flush',
1710- 'bzrlib.tests.workingtree_implementations.test_get_file_mtime',
1711- 'bzrlib.tests.workingtree_implementations.test_get_parent_ids',
1712- 'bzrlib.tests.workingtree_implementations.test_inv',
1713- 'bzrlib.tests.workingtree_implementations.test_is_control_filename',
1714- 'bzrlib.tests.workingtree_implementations.test_is_ignored',
1715- 'bzrlib.tests.workingtree_implementations.test_locking',
1716- 'bzrlib.tests.workingtree_implementations.test_merge_from_branch',
1717- 'bzrlib.tests.workingtree_implementations.test_mkdir',
1718- 'bzrlib.tests.workingtree_implementations.test_move',
1719- 'bzrlib.tests.workingtree_implementations.test_nested_specifics',
1720- 'bzrlib.tests.workingtree_implementations.test_parents',
1721- 'bzrlib.tests.workingtree_implementations.test_paths2ids',
1722- 'bzrlib.tests.workingtree_implementations.test_pull',
1723- 'bzrlib.tests.workingtree_implementations.test_put_file',
1724- 'bzrlib.tests.workingtree_implementations.test_readonly',
1725- 'bzrlib.tests.workingtree_implementations.test_read_working_inventory',
1726- 'bzrlib.tests.workingtree_implementations.test_remove',
1727- 'bzrlib.tests.workingtree_implementations.test_rename_one',
1728- 'bzrlib.tests.workingtree_implementations.test_revision_tree',
1729- 'bzrlib.tests.workingtree_implementations.test_set_root_id',
1730- 'bzrlib.tests.workingtree_implementations.test_smart_add',
1731- 'bzrlib.tests.workingtree_implementations.test_uncommit',
1732- 'bzrlib.tests.workingtree_implementations.test_unversion',
1733- 'bzrlib.tests.workingtree_implementations.test_views',
1734- 'bzrlib.tests.workingtree_implementations.test_walkdirs',
1735- 'bzrlib.tests.workingtree_implementations.test_workingtree',
1736- ]
1737+ 'bzrlib.tests.workingtree_implementations.test_' + name for
1738+ name in test_names]
1739
1740 scenarios = make_scenarios(
1741 tests.default_transport,
1742
1743=== added file 'bzrlib/tests/workingtree_implementations/test_check.py'
1744--- bzrlib/tests/workingtree_implementations/test_check.py 1970-01-01 00:00:00 +0000
1745+++ bzrlib/tests/workingtree_implementations/test_check.py 2009-05-08 02:06:36 +0000
1746@@ -0,0 +1,55 @@
1747+# Copyright (C) 2009 Canonical Ltd
1748+#
1749+# This program is free software; you can redistribute it and/or modify
1750+# it under the terms of the GNU General Public License as published by
1751+# the Free Software Foundation; either version 2 of the License, or
1752+# (at your option) any later version.
1753+#
1754+# This program is distributed in the hope that it will be useful,
1755+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1756+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1757+# GNU General Public License for more details.
1758+#
1759+# You should have received a copy of the GNU General Public License
1760+# along with this program; if not, write to the Free Software
1761+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1762+
1763+"""Tests for checking of trees."""
1764+
1765+from bzrlib import (
1766+ tests,
1767+ )
1768+from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree
1769+
1770+
1771+class TestCheck(TestCaseWithWorkingTree):
1772+
1773+ def test__get_check_refs_new(self):
1774+ tree = self.make_branch_and_tree('tree')
1775+ self.assertEqual(set([('trees', 'null:')]),
1776+ set(tree._get_check_refs()))
1777+
1778+ def test__get_check_refs_basis(self):
1779+ # with a basis, all current bzr trees cache it and so need the
1780+ # inventory to cross-check.
1781+ tree = self.make_branch_and_tree('tree')
1782+ revid = tree.commit('first post')
1783+ self.assertEqual(set([('trees', revid)]),
1784+ set(tree._get_check_refs()))
1785+
1786+ def test__check_with_refs(self):
1787+ # _check can be called with a dict of the things required.
1788+ tree = self.make_branch_and_tree('tree')
1789+ tree.lock_write()
1790+ self.addCleanup(tree.unlock)
1791+ revid = tree.commit('first post')
1792+ needed_refs = tree._get_check_refs()
1793+ repo = tree.branch.repository
1794+ for ref in needed_refs:
1795+ kind, revid = ref
1796+ refs = {}
1797+ if kind == 'trees':
1798+ refs[ref] = repo.revision_tree(revid)
1799+ else:
1800+ self.fail('unknown ref kind')
1801+ tree._check(refs)
1802
1803=== modified file 'bzrlib/ui/text.py'
1804--- bzrlib/ui/text.py 2009-04-10 19:37:20 +0000
1805+++ bzrlib/ui/text.py 2009-05-12 06:30:56 +0000
1806@@ -26,6 +26,7 @@
1807 from bzrlib.lazy_import import lazy_import
1808 lazy_import(globals(), """
1809 from bzrlib import (
1810+ debug,
1811 progress,
1812 osutils,
1813 symbol_versioning,
1814@@ -128,6 +129,7 @@
1815 self._last_task = None
1816 self._total_byte_count = 0
1817 self._bytes_since_update = 0
1818+ self._fraction = 0
1819
1820 def _show_line(self, s):
1821 n = self._width - 1
1822@@ -151,9 +153,14 @@
1823 cols = 20
1824 if self._last_task is None:
1825 completion_fraction = 0
1826+ self._fraction = 0
1827 else:
1828 completion_fraction = \
1829 self._last_task._overall_completion_fraction() or 0
1830+ if (completion_fraction < self._fraction and 'progress' in
1831+ debug.debug_flags):
1832+ import pdb;pdb.set_trace()
1833+ self._fraction = completion_fraction
1834 markers = int(round(float(cols) * completion_fraction)) - 1
1835 bar_str = '[' + ('#' * markers + spin_str).ljust(cols) + '] '
1836 return bar_str
1837
1838=== modified file 'bzrlib/versionedfile.py'
1839--- bzrlib/versionedfile.py 2009-06-10 03:56:49 +0000
1840+++ bzrlib/versionedfile.py 2009-06-16 05:29:42 +0000
1841@@ -876,7 +876,16 @@
1842 raise NotImplementedError(self.annotate)
1843
1844 def check(self, progress_bar=None):
1845- """Check this object for integrity."""
1846+ """Check this object for integrity.
1847+
1848+ :param progress_bar: A progress bar to output as the check progresses.
1849+ :param keys: Specific keys within the VersionedFiles to check. When
1850+ this parameter is not None, check() becomes a generator as per
1851+ get_record_stream. The difference to get_record_stream is that
1852+ more or deeper checks will be performed.
1853+ :return: None, or if keys was supplied a generator as per
1854+ get_record_stream.
1855+ """
1856 raise NotImplementedError(self.check)
1857
1858 @staticmethod
1859@@ -1092,10 +1101,15 @@
1860 result.append((prefix + (origin,), line))
1861 return result
1862
1863- def check(self, progress_bar=None):
1864+ def check(self, progress_bar=None, keys=None):
1865 """See VersionedFiles.check()."""
1866+ # XXX: This is over-enthusiastic but as we only thunk for Weaves today
1867+ # this is tolerable. Ideally we'd pass keys down to check() and
1868+ # have the older VersiondFile interface updated too.
1869 for prefix, vf in self._iter_all_components():
1870 vf.check()
1871+ if keys is not None:
1872+ return self.get_record_stream(keys, 'unordered', True)
1873
1874 def get_parent_map(self, keys):
1875 """Get a map of the parents of keys.
1876
1877=== modified file 'bzrlib/workingtree.py'
1878--- bzrlib/workingtree.py 2009-06-15 15:47:45 +0000
1879+++ bzrlib/workingtree.py 2009-06-16 05:29:42 +0000
1880@@ -290,6 +290,16 @@
1881 self._control_files.break_lock()
1882 self.branch.break_lock()
1883
1884+ def _get_check_refs(self):
1885+ """Return the references needed to perform a check of this tree.
1886+
1887+ The default implementation returns no refs, and is only suitable for
1888+ trees that have no local caching and can commit on ghosts at any time.
1889+
1890+ :seealso: bzrlib.check for details about check_refs.
1891+ """
1892+ return []
1893+
1894 def requires_rich_root(self):
1895 return self._format.requires_rich_root
1896
1897@@ -2515,12 +2525,17 @@
1898 return un_resolved, resolved
1899
1900 @needs_read_lock
1901- def _check(self):
1902+ def _check(self, references):
1903+ """Check the tree for consistency.
1904+
1905+ :param references: A dict with keys matching the items returned by
1906+ self._get_check_refs(), and values from looking those keys up in
1907+ the repository.
1908+ """
1909 tree_basis = self.basis_tree()
1910 tree_basis.lock_read()
1911 try:
1912- repo_basis = self.branch.repository.revision_tree(
1913- self.last_revision())
1914+ repo_basis = references[('trees', self.last_revision())]
1915 if len(list(repo_basis.iter_changes(tree_basis))) > 0:
1916 raise errors.BzrCheckError(
1917 "Mismatched basis inventory content.")
1918@@ -2572,6 +2587,10 @@
1919 if self._inventory is None:
1920 self.read_working_inventory()
1921
1922+ def _get_check_refs(self):
1923+ """Return the references needed to perform a check of this tree."""
1924+ return [('trees', self.last_revision())]
1925+
1926 def lock_tree_write(self):
1927 """See WorkingTree.lock_tree_write().
1928
1929@@ -2634,6 +2653,10 @@
1930 mode=self.bzrdir._get_file_mode())
1931 return True
1932
1933+ def _get_check_refs(self):
1934+ """Return the references needed to perform a check of this tree."""
1935+ return [('trees', self.last_revision())]
1936+
1937 @needs_tree_write_lock
1938 def set_conflicts(self, conflicts):
1939 self._put_rio('conflicts', conflicts.to_stanzas(),
1940
1941=== added file 'doc/developers/check.txt'
1942--- doc/developers/check.txt 1970-01-01 00:00:00 +0000
1943+++ doc/developers/check.txt 2009-05-12 03:50:39 +0000
1944@@ -0,0 +1,63 @@
1945+Check Notes
1946+===========
1947+
1948+.. contents:: :local:
1949+
1950+Overview
1951+--------
1952+
1953+Check has multiple responsibilities:
1954+
1955+* Ensure that the data as recorded on disk is accessible intact and unaltered.
1956+* Ensure that a branch/repository/tree/whatever is ready for upgrade.
1957+* Look for and report on recorded-data issues where previous bzr's, or changing
1958+ situations have lead so some form of inconsistency.
1959+* Report sufficient information for a user to either fix the issue themselves
1960+ or report a bug that will hopefully be sufficiently detailed we can fix based
1961+ on the initial report.
1962+* Not scare users when run if everything is okey-dokey.
1963+
1964+Ideally one check invocation can do all these things.
1965+
1966+Repository
1967+----------
1968+
1969+Things that can go wrong:
1970+* Bit errors or alterations may occur in raw data.
1971+* Data that is referenced may be missing
1972+* There could be a lot of garbage in upload etc.
1973+* File graphs may be inconsistent with inventories and parents.
1974+* The revision graph cache can be inconsistent with the revision data.
1975+
1976+Branch
1977+------
1978+
1979+Things that can go wrong:
1980+* Tag or tip revision ids may be missing from the repo.
1981+* The revno tip cache may be wrong.
1982+* Various urls could be problematic (not inaccessible, just invalid)
1983+* Stacked-on branch could be inaccessible.
1984+
1985+Tree
1986+----
1987+
1988+Things that can go wrong:
1989+* Bit errors in dirstate.
1990+* Corrupt or invalid shelves.
1991+* Corrupt dirstates written to disk.
1992+* Cached inventories might not match repository.
1993+
1994+Duplicate work
1995+--------------
1996+
1997+If we check every branch in a repo separately we will encounter duplicate
1998+effort in assessing things like missing tags/tips, revno cache etc.
1999+
2000+Outline of approach
2001+-------------------
2002+
2003+To check a repository, we scan for branches, open their trees and generate
2004+summary data. We then collect all the summary data in as compact a form as
2005+possible and do a detailed check on the repository, calling back out to branch
2006+and trees as we encounter the actual data that that tree/branch requires to
2007+perform its check.
2008