Huge patch to review here... for which the main intent is not
obvious.
The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.
If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)
Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.
My main concern here is that you're changing check without adding
test facilities to create broken repo/branch/tree...and while I
understand the desire to not check for every conceivable bug, I'd
feel more confident in the code if some tests were easier to
write...
Other than that, it looks good so with as many tweaks as you can
take in to account:
'revision-existence' is self-explanatory, but 'lefthand-distance'
is not. Add a comment, if only to point the reader to the right
place to understand it.
I find the way to specify the different checks quite pleasant at
that point :)
I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).
Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?
Is there some other reason than keeping the definition and
accesses use the same key ?
<snip/>
robert> === modified file 'bzrlib/check.py'
...
robert> - def check(self):
robert> + def check(self, callback_refs=None, check_repo=True):
robert> + if callback_refs is None:
...
robert> + self.progress.update('check', 0, 4)
robert> + if self.check_repo:
robert> + self.progress.update('checking revisions', 0)
robert> + self.check_revisions()
robert> + self.progress.update('checking commit contents', 1)
robert> + self.repository._check_inventories(self)
robert> + self.progress.update('checking file graphs', 2)
robert> + # check_weaves is done after the revision scan so that
robert> + # revision index is known to be valid.
robert> + self.check_weaves()
robert> + self.progress.update('checking branches and trees', 3)
The above is clear, but put the following if block in a method by
itself, it's bigger than the rest of the check() method and
certainly look like something you want to continue working on
(and so that it can be overridden), please.
I can't tell for sure, but this may also need a step in the
progress report (it *looks* like something that can vary greatly
at least).
I can't see how kinds can contain 'inventories' here, the comment
needs to be updated.
robert> + for kind in kinds:
robert> + if keys[kind]:
robert> + last_object = None
robert> + for record in getattr(self, kind).check(keys=keys[kind]):
robert> + if record.storage_kind == 'absent':
robert> + checker._report_items.append(
robert> + 'Missing inventory {%s}' % (record.key,))
This is... unclean, copy/pasted from above or bogus. It's hard to
tell without digging really deep, but just reading the code, I
fail to understand why you couldn't have a missing 'text' here
(and as said above, I fail to see how it can be an inventory). If
that's the case, a better design or at minimum a comment will be
welcome.
Also do you miss a test that would have caught that ?
...
robert> + def _check_record(self, kind, record, checker, last_object, item_data):
...
robert> + elif kind == 'chk_bytes':
robert> + # No code written to check chk_bytes for this repo format.
Is that a TODO or a FIXME ?
...
robert> + def _check_text(self, record, checker, item_data):
robert> + """Check a single text."""
robert> + # Check it is extractable.
robert> + # TODO: check length.
Add '-- lifeless <aaaammjj>' please.
...
robert> @@ -3950,6 +4096,18 @@
robert> revision_id) tuples for versions that are present in this versioned
robert> file, but not used by the corresponding inventory.
robert> """
robert> + local_progress = None
robert> + if progress_bar is None:
robert> + local_progress = ui.ui_factory.nested_progress_bar()
robert> + progress_bar = local_progress
robert> + try:
robert> + return self._check_file_version_parents(texts, progress_bar)
robert> + finally:
robert> + if local_progress:
robert> + local_progress.finished()
robert> +
robert> + def _check_file_version_parents(self, texts, progress_bar):
robert> + """See check_file_version_parents."""
robert> wrong_parents = {}
robert> self.file_ids = set([file_id for file_id, _ in
robert> self.text_index.iterkeys()])
robert> @@ -3964,8 +4122,7 @@
robert> text_keys = self.repository.texts.keys()
robert> unused_keys = frozenset(text_keys) - set(self.text_index)
robert> for num, key in enumerate(self.text_index.iterkeys()):
robert> - if progress_bar is not None:
robert> - progress_bar.update('checking text graph', num, n_versions)
robert> + progress_bar.update('checking text graph', num, n_versions)
robert> correct_parents = self.calculate_file_version_parents(key)
robert> try:
robert> knit_parents = parent_map[key]
12 lines added for 1 deleted ? Is that a debug attempt ? Do you
really want to land that ? Why ?
...
robert> === modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
...
robert> def load_tests(standard_tests, module, loader):
...
robert> + 'bzrlib.tests.workingtree_implementations.test_' + name for
robert> + name in test_names]
+1 ;-D
robert> === added file 'doc/developers/check.txt'
robert> +Repository
robert> +----------
robert> +
robert> +Things that can go wrong:
robert> +* Bit errors or alterations may occur in raw data.
May be good to say that the above are due to hardware or OS
errors while the following are more likely due to bzr bugs.
robert> +* Data that is referenced may be missing
robert> +* There could be a lot of garbage in upload etc.
robert> +* File graphs may be inconsistent with inventories and parents.
robert> +* The revision graph cache can be inconsistent with the revision data.
Huge patch to review here... for which the main intent is not
obvious.
The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.
If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)
Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.
My main concern here is that you're changing check without adding tree... and while I
test facilities to create broken repo/branch/
understand the desire to not check for every conceivable bug, I'd
feel more confident in the code if some tests were easier to
write...
Other than that, it looks good so with as many tweaks as you can
take in to account:
Review: approve
>>>>> "robert" == Robert Collins <email address hidden> writes: UnstackableRepo sitoryFormat( self.repository ._format, .base)
...
robert> === modified file 'bzrlib/branch.py'
robert> --- bzrlib/branch.py 2009-06-10 03:56:49 +0000
robert> +++ bzrlib/branch.py 2009-06-16 05:29:42 +0000
robert> @@ -125,6 +125,14 @@
robert> raise errors.
robert> self.repository
robert> + def _get_check_ refs(self) : revision( ) existence' , revid), ('lefthand- distance' , revid)]
robert> + """Get the references needed for check().
robert> +
robert> + See bzrlib.check.
robert> + """
robert> + revid = self.last_
robert> + return [('revision-
'revision- existence' is self-explanatory, but 'lefthand-distance'
is not. Add a comment, if only to point the reader to the right
place to understand it.
I find the way to specify the different checks quite pleasant at
that point :)
I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).
Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?
Is there some other reason than keeping the definition and
accesses use the same key ?
<snip/>
robert> === modified file 'bzrlib/check.py' update( 'check' , 0, 4) update( 'checking revisions', 0) revisions( ) update( 'checking commit contents', 1) ._check_ inventories( self) update( 'checking file graphs', 2) update( 'checking branches and trees', 3)
...
robert> - def check(self):
robert> + def check(self, callback_refs=None, check_repo=True):
robert> + if callback_refs is None:
...
robert> + self.progress.
robert> + if self.check_repo:
robert> + self.progress.
robert> + self.check_
robert> + self.progress.
robert> + self.repository
robert> + self.progress.
robert> + # check_weaves is done after the revision scan so that
robert> + # revision index is known to be valid.
robert> + self.check_weaves()
robert> + self.progress.
The above is clear, but put the following if block in a method by
itself, it's bigger than the rest of the check() method and
certainly look like something you want to continue working on
(and so that it can be overridden), please.
I can't tell for sure, but this may also need a step in the
progress report (it *looks* like something that can vary greatly
at least).
robert> + if callback_refs: finished( ) .unlock( )
...
robert> finally:
robert> self.progress.
robert> self.repository
... help_topics/ en/debug- flags.txt' help_topics/ en/debug- flags.txt 2009-03-18 23:43:51 +0000 help_topics/ en/debug- flags.txt 2009-05-12 06:30:56 +0000
robert> === modified file 'bzrlib/
robert> --- bzrlib/
robert> +++ bzrlib/
robert> @@ -19,6 +19,7 @@
robert> -Dindex Trace major index operations.
robert> -Dknit Trace knit operations.
robert> -Dlock Trace when lockdir locks are taken or released.
robert> +-Dprogress Trace progress bar operations.
Cough. Misleading description. 'Enable progress bar debugging' ? repofmt/ pack_repo. py' repofmt/ pack_repo. py 2009-06-10 03:56:49 +0000 repofmt/ pack_repo. py 2009-06-16 05:29:42 +0000 _index. _key_dependenci es.refs. clear() collection. _abort_ write_group( )
...
robert> === modified file 'bzrlib/
robert> --- bzrlib/
robert> +++ bzrlib/
robert> @@ -2219,52 +2219,6 @@
robert> self.revisions.
robert> self._pack_
robert> - def _find_inconsist ent_revision_ parents( self):
It's hard to tell if this deleted version is safely replaced by
the generic one, can you confirm ?
robert> === modified file 'bzrlib/ repository. py' inventories( self, checker, bar):
...
robert> + def _do_check_
...
robert> + # Check the outermost kind only - inventories || chk_bytes || texts
I can't see how kinds can contain 'inventories' here, the comment
needs to be updated.
robert> + for kind in kinds: keys=keys[ kind]): _report_ items.append(
robert> + if keys[kind]:
robert> + last_object = None
robert> + for record in getattr(self, kind).check(
robert> + if record.storage_kind == 'absent':
robert> + checker.
robert> + 'Missing inventory {%s}' % (record.key,))
This is... unclean, copy/pasted from above or bogus. It's hard to
tell without digging really deep, but just reading the code, I
fail to understand why you couldn't have a missing 'text' here
(and as said above, I fail to see how it can be an inventory). If
that's the case, a better design or at minimum a comment will be
welcome.
Also do you miss a test that would have caught that ?
...
robert> + def _check_record(self, kind, record, checker, last_object, item_data):
...
robert> + elif kind == 'chk_bytes':
robert> + # No code written to check chk_bytes for this repo format.
Is that a TODO or a FIXME ?
...
robert> + def _check_text(self, record, checker, item_data):
robert> + """Check a single text."""
robert> + # Check it is extractable.
robert> + # TODO: check length.
Add '-- lifeless <aaaammjj>' please. nested_ progress_ bar() file_version_ parents( texts, progress_bar) finished( ) file_version_ parents( self, texts, progress_bar): version_ parents. """ index.iterkeys( )]) .texts. keys() text_keys) - set(self. text_index) self.text_ index.iterkeys( )): bar.update( 'checking text graph', num, n_versions) bar.update( 'checking text graph', num, n_versions) file_version_ parents( key)
...
robert> @@ -3950,6 +4096,18 @@
robert> revision_id) tuples for versions that are present in this versioned
robert> file, but not used by the corresponding inventory.
robert> """
robert> + local_progress = None
robert> + if progress_bar is None:
robert> + local_progress = ui.ui_factory.
robert> + progress_bar = local_progress
robert> + try:
robert> + return self._check_
robert> + finally:
robert> + if local_progress:
robert> + local_progress.
robert> +
robert> + def _check_
robert> + """See check_file_
robert> wrong_parents = {}
robert> self.file_ids = set([file_id for file_id, _ in
robert> self.text_
robert> @@ -3964,8 +4122,7 @@
robert> text_keys = self.repository
robert> unused_keys = frozenset(
robert> for num, key in enumerate(
robert> - if progress_bar is not None:
robert> - progress_
robert> + progress_
robert> correct_parents = self.calculate_
robert> try:
robert> knit_parents = parent_map[key]
12 lines added for 1 deleted ? Is that a debug attempt ? Do you
really want to land that ? Why ?
... tests/branch_ implementations /test_check. py'
robert> === modified file 'bzrlib/
...
robert> -from bzrlib import errors
robert> +from StringIO import StringIO
Why not cStringIO ?
...
robert> + def make_refs(self, branch):
Nice, but seems to duplicate a bit too much real code :-/
robert> === modified file 'bzrlib/ tests/per_ repository/ test_check. py'
...
robert> - # XXX: check requires a non-empty revision IDs list, but it ignores the
robert> - # contents of it!
<Cough> It looks like you didn't address the XXX revid2/ 'ignored' / below doesn't make the test fail), why
(s/revid1,
deleting it then ?
robert> - check_object = tree.branch. repository. check([ 'ignored' ]) repository. check([ revid1, revid2])
robert> + revid2 = tree.commit('2')
robert> + check_object = tree.branch.
... tests/workingtr ee_implementati ons/__init_ _.py' standard_ tests, module, loader): tests.workingtr ee_implementati ons.test_ ' + name for
robert> === modified file 'bzrlib/
...
robert> def load_tests(
...
robert> + 'bzrlib.
robert> + name in test_names]
+1 ;-D
robert> === added file 'doc/developers /check. txt'
robert> +Repository
robert> +----------
robert> +
robert> +Things that can go wrong:
robert> +* Bit errors or alterations may occur in raw data.
May be good to say that the above are due to hardware or OS
errors while the following are more likely due to bzr bugs.
robert> +* Data that is referenced may be missing
robert> +* There could be a lot of garbage in upload etc.
robert> +* File graphs may be inconsistent with inventories and parents.
robert> +* The revision graph cache can be inconsistent with the revision data.