Bazaar

Code review comment for lp:~lifeless/bzr/check-1

check-1
Merge into trunk-old

Revision history for this message

Vincent Ladeuil (vila) wrote on 2009-07-07:

Huge patch to review here... for which the main intent is not
obvious.

The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.

If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)

Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.

My main concern here is that you're changing check without adding
test facilities to create broken repo/branch/tree...and while I
understand the desire to not check for every conceivable bug, I'd
feel more confident in the code if some tests were easier to
write...

Other than that, it looks good so with as many tweaks as you can
take in to account:

Review: approve

>>>>> "robert" == Robert Collins <email address hidden> writes:
...
    robert> === modified file 'bzrlib/branch.py'
    robert> --- bzrlib/branch.py 2009-06-10 03:56:49 +0000
    robert> +++ bzrlib/branch.py 2009-06-16 05:29:42 +0000
    robert> @@ -125,6 +125,14 @@
    robert> raise errors.UnstackableRepositoryFormat(self.repository._format,
    robert> self.repository.base)

    robert> + def _get_check_refs(self):
    robert> + """Get the references needed for check().
    robert> +
    robert> + See bzrlib.check.
    robert> + """
    robert> + revid = self.last_revision()
    robert> + return [('revision-existence', revid), ('lefthand-distance', revid)]

'revision-existence' is self-explanatory, but 'lefthand-distance'
is not. Add a comment, if only to point the reader to the right
place to understand it.

I find the way to specify the different checks quite pleasant at
that point :)

I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).

Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?

Is there some other reason than keeping the definition and
accesses use the same key ?

<snip/>

    robert> === modified file 'bzrlib/check.py'
...
    robert> - def check(self):
    robert> + def check(self, callback_refs=None, check_repo=True):
    robert> + if callback_refs is None:
...
    robert> + self.progress.update('check', 0, 4)
    robert> + if self.check_repo:
    robert> + self.progress.update('checking revisions', 0)
    robert> + self.check_revisions()
    robert> + self.progress.update('checking commit contents', 1)
    robert> + self.repository._check_inventories(self)
    robert> + self.progress.update('checking file graphs', 2)
    robert> + # check_weaves is done after the revision scan so that
    robert> + # revision index is known to be valid.
    robert> + self.check_weaves()
    robert> + self.progress.update('checking branches and trees', 3)

The above is clear, but put the following if block in a method by
itself, it's bigger than the rest of the check() method and
certainly look like something you want to continue working on
(and so that it can be overridden), please.

I can't tell for sure, but this may also need a step in the
progress report (it *looks* like something that can vary greatly
at least).

    robert> + if callback_refs:
...
    robert> finally:
    robert> self.progress.finished()
    robert> self.repository.unlock()

...
    robert> === modified file 'bzrlib/help_topics/en/debug-flags.txt'
    robert> --- bzrlib/help_topics/en/debug-flags.txt 2009-03-18 23:43:51 +0000
    robert> +++ bzrlib/help_topics/en/debug-flags.txt 2009-05-12 06:30:56 +0000
    robert> @@ -19,6 +19,7 @@
    robert> -Dindex Trace major index operations.
    robert> -Dknit Trace knit operations.
    robert> -Dlock Trace when lockdir locks are taken or released.
    robert> +-Dprogress Trace progress bar operations.

Cough. Misleading description. 'Enable progress bar debugging' ?
...
    robert> === modified file 'bzrlib/repofmt/pack_repo.py'
    robert> --- bzrlib/repofmt/pack_repo.py 2009-06-10 03:56:49 +0000
    robert> +++ bzrlib/repofmt/pack_repo.py 2009-06-16 05:29:42 +0000
    robert> @@ -2219,52 +2219,6 @@
    robert> self.revisions._index._key_dependencies.refs.clear()
    robert> self._pack_collection._abort_write_group()

robert> - def _find_inconsistent_revision_parents(self):

It's hard to tell if this deleted version is safely replaced by
the generic one, can you confirm ?

    robert> === modified file 'bzrlib/repository.py'
...
    robert> + def _do_check_inventories(self, checker, bar):
...
    robert> + # Check the outermost kind only - inventories || chk_bytes || texts

I can't see how kinds can contain 'inventories' here, the comment
needs to be updated.

This is... unclean, copy/pasted from above or bogus. It's hard to
tell without digging really deep, but just reading the code, I
fail to understand why you couldn't have a missing 'text' here
(and as said above, I fail to see how it can be an inventory). If
that's the case, a better design or at minimum a comment will be
welcome.

Also do you miss a test that would have caught that ?
...
    robert> + def _check_record(self, kind, record, checker, last_object, item_data):
...
    robert> + elif kind == 'chk_bytes':
    robert> + # No code written to check chk_bytes for this repo format.

Is that a TODO or a FIXME ?
...
    robert> + def _check_text(self, record, checker, item_data):
    robert> + """Check a single text."""
    robert> + # Check it is extractable.
    robert> + # TODO: check length.

Add '-- lifeless <aaaammjj>' ...
robert> robert> robert> robert> robert> + robert> + robert> + robert> + robert> + robert> + robert> + robert> + robert> + robert> robert> + robert> + robert> robert> robert> robert> robert> robert> robert> robert> - robert> - robert> + robert> robert> robert> please.
@@ -3950,6 +4096,18 @@
revision_id) tuples for versions that are present in this versioned
file, but not used by the corresponding inventory.
"""
local_progress = None
if progress_bar is None:
local_progress = ui.ui_factory.nested_progress_bar()
progress_bar = local_progress
try:
return self._check_file_version_parents(texts, progress_bar)
finally:
if local_progress:
local_progress.finished()
+
def _check_file_version_parents(self, texts, progress_bar):
"""See check_file_version_parents."""
wrong_parents = {}
self.file_ids = set([file_id for file_id, _ in
self.text_index.iterkeys()])
@@ -3964,8 +4122,7 @@
text_keys = self.repository.texts.keys()
unused_keys = frozenset(text_keys) - set(self.text_index)
for num, key in enumerate(self.text_index.iterkeys()):
if progress_bar is not None:
progress_bar.update('checking text graph', num, n_versions)
progress_bar.update('checking text graph', num, n_versions)
correct_parents = self.calculate_file_version_parents(key)
try:
knit_parents = parent_map[key]

12 lines added for 1 deleted ? Is that a debug attempt ? Do you
really want to land that ? Why ?

...
    robert> === modified file 'bzrlib/tests/branch_implementations/test_check.py'
...
    robert> -from bzrlib import errors
    robert> +from StringIO import StringIO

Why not cStringIO ?
...
robert> + def make_refs(self, branch):

Nice, but seems to duplicate a bit too much real code :-/

    robert> === modified file 'bzrlib/tests/per_repository/test_check.py'
...
    robert> - # XXX: check requires a non-empty revision IDs list, but it ignores the
    robert> - # contents of it!

<Cough> It looks like you didn't address the XXX
(s/revid1,revid2/'ignored'/ below doesn't make the test fail), why
deleting it then ?

    robert> - check_object = tree.branch.repository.check(['ignored'])
    robert> + revid2 = tree.commit('2')
    robert> + check_object = tree.branch.repository.check([revid1, revid2])

...
    robert> === modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
...
    robert> def load_tests(standard_tests, module, loader):
...
    robert> + 'bzrlib.tests.workingtree_implementations.test_' + name for
    robert> + name in test_names]

+1 ;-D

    robert> === added file 'doc/developers/check.txt'
    robert> +Repository
    robert> +----------
    robert> +
    robert> +Things that can go wrong:
    robert> +* Bit errors or alterations may occur in raw data.

May be good to say that the above are due to hardware or OS
errors while the following are more likely due to bzr bugs.

    robert> +* Data that is referenced may be missing
    robert> +* There could be a lot of garbage in upload etc.
    robert> +* File graphs may be inconsistent with inventories and parents.
    robert> +* The revision graph cache can be inconsistent with the revision data.

Huge patch to review here... for which the main intent is not
obvious.

The overall feeling is that you stopped your effort at an
intermediate point, with enough gains to land. Yet, many things
seems unfinished.

If you could add comments as brain dump before landing I'm sure a
lot of energy will not be wasted by the future maintainer (who
may or not be yourself ;-)

Some remarks ask for changes, they are small enough to be
considered as tweaks. If you disagree, an explanation will be
welcome.

Other than that, it looks good so with as many tweaks as you can
take in to account:

Review: approve

>>>>> "robert" == Robert Collins <robertc@robertcollins.net> writes:
...
    robert> === modified file 'bzrlib/branch.py'
    robert> --- bzrlib/branch.py	2009-06-10 03:56:49 +0000
    robert> +++ bzrlib/branch.py	2009-06-16 05:29:42 +0000
    robert> @@ -125,6 +125,14 @@
    robert>              raise errors.UnstackableRepositoryFormat(self.repository._format,
    robert>                  self.repository.base)
 
    robert> +    def _get_check_refs(self):
    robert> +        """Get the references needed for check().
    robert> +
    robert> +        See bzrlib.check.
    robert> +        """
    robert> +        revid = self.last_revision()
    robert> +        return [('revision-existence', revid), ('lefthand-distance', revid)]

'revision-existence' is self-explanatory, but 'lefthand-distance'
is not. Add a comment, if only to point the reader to the right
place to understand it.

I find the way to specify the different checks quite pleasant at
that point :)

I'm less sure about the way to store the results. AIUI 'refs' (as
used in Check.check()) will be a dict with keys being (kind,
id).

Why didn't you use a first level of dict with 'kind' as key and
new dict with 'id' as key instead ?

Is there some other reason than keeping the definition and
accesses use the same key ?

<snip/>

robert> === modified file 'bzrlib/check.py'
...
    robert> -    def check(self):
    robert> +    def check(self, callback_refs=None, check_repo=True):
    robert> +        if callback_refs is None:
...
    robert> +            self.progress.update('check', 0, 4)
    robert> +            if self.check_repo:
    robert> +                self.progress.update('checking revisions', 0)
    robert> +                self.check_revisions()
    robert> +                self.progress.update('checking commit contents', 1)
    robert> +                self.repository._check_inventories(self)
    robert> +                self.progress.update('checking file graphs', 2)
    robert> +                # check_weaves is done after the revision scan so that
    robert> +                # revision index is known to be valid.
    robert> +                self.check_weaves()
    robert> +            self.progress.update('checking branches and trees', 3)

I can't tell for sure, but this may also need a step in the
progress report (it *looks* like something that can vary greatly
at least).

robert> +            if callback_refs:
...
    robert>          finally:
    robert>              self.progress.finished()
    robert>              self.repository.unlock()

...
    robert> === modified file 'bzrlib/help_topics/en/debug-flags.txt'
    robert> --- bzrlib/help_topics/en/debug-flags.txt	2009-03-18 23:43:51 +0000
    robert> +++ bzrlib/help_topics/en/debug-flags.txt	2009-05-12 06:30:56 +0000
    robert> @@ -19,6 +19,7 @@
    robert>  -Dindex           Trace major index operations.
    robert>  -Dknit            Trace knit operations.
    robert>  -Dlock            Trace when lockdir locks are taken or released.
    robert> +-Dprogress        Trace progress bar operations.

Cough. Misleading description. 'Enable progress bar debugging' ?
...
    robert> === modified file 'bzrlib/repofmt/pack_repo.py'
    robert> --- bzrlib/repofmt/pack_repo.py	2009-06-10 03:56:49 +0000
    robert> +++ bzrlib/repofmt/pack_repo.py	2009-06-16 05:29:42 +0000
    robert> @@ -2219,52 +2219,6 @@
    robert>          self.revisions._index._key_dependencies.refs.clear()
    robert>          self._pack_collection._abort_write_group()
 
    robert> -    def _find_inconsistent_revision_parents(self):

It's hard to tell if this deleted version is safely replaced by
the generic one, can you confirm ?

robert> === modified file 'bzrlib/repository.py'
...
    robert> +    def _do_check_inventories(self, checker, bar):
...
    robert> +            # Check the outermost kind only - inventories || chk_bytes || texts

I can't see how kinds can contain 'inventories' here, the comment
needs to be updated.

robert> +            for kind in kinds:
    robert> +                if keys[kind]:
    robert> +                    last_object = None
    robert> +                    for record in getattr(self, kind).check(keys=keys[kind]):
    robert> +                        if record.storage_kind == 'absent':
    robert> +                            checker._report_items.append(
    robert> +                                'Missing inventory {%s}' % (record.key,))

Also do you miss a test that would have caught that ?
...
    robert> +    def _check_record(self, kind, record, checker, last_object, item_data):
...
    robert> +        elif kind == 'chk_bytes':
    robert> +            # No code written to check chk_bytes for this repo format.

Is that a TODO or a FIXME ?
...
    robert> +    def _check_text(self, record, checker, item_data):
    robert> +        """Check a single text."""
    robert> +        # Check it is extractable.
    robert> +        # TODO: check length.

Add '-- lifeless <aaaammjj>' please.
...
    robert> @@ -3950,6 +4096,18 @@
    robert>              revision_id) tuples for versions that are present in this versioned
    robert>              file, but not used by the corresponding inventory.
    robert>          """
    robert> +        local_progress = None
    robert> +        if progress_bar is None:
    robert> +            local_progress = ui.ui_factory.nested_progress_bar()
    robert> +            progress_bar = local_progress
    robert> +        try:
    robert> +            return self._check_file_version_parents(texts, progress_bar)
    robert> +        finally:
    robert> +            if local_progress:
    robert> +                local_progress.finished()
    robert> +
    robert> +    def _check_file_version_parents(self, texts, progress_bar):
    robert> +        """See check_file_version_parents."""
    robert>          wrong_parents = {}
    robert>          self.file_ids = set([file_id for file_id, _ in
    robert>              self.text_index.iterkeys()])
    robert> @@ -3964,8 +4122,7 @@
    robert>          text_keys = self.repository.texts.keys()
    robert>          unused_keys = frozenset(text_keys) - set(self.text_index)
    robert>          for num, key in enumerate(self.text_index.iterkeys()):
    robert> -            if progress_bar is not None:
    robert> -                progress_bar.update('checking text graph', num, n_versions)
    robert> +            progress_bar.update('checking text graph', num, n_versions)
    robert>              correct_parents = self.calculate_file_version_parents(key)
    robert>              try:
    robert>                  knit_parents = parent_map[key]

12 lines added for 1 deleted ? Is that a debug attempt ? Do you
really want to land that ? Why ?

...
    robert> === modified file 'bzrlib/tests/branch_implementations/test_check.py'
... 
    robert> -from bzrlib import errors
    robert> +from StringIO import StringIO

Why not cStringIO ?
...
    robert> +    def make_refs(self, branch):

Nice, but seems to duplicate a bit too much real code :-/

robert> === modified file 'bzrlib/tests/per_repository/test_check.py'
...
    robert> -        # XXX: check requires a non-empty revision IDs list, but it ignores the
    robert> -        # contents of it!

<Cough> It looks like you didn't address the XXX 
(s/revid1,revid2/'ignored'/ below doesn't make the test fail), why
deleting it then ?

robert> -        check_object = tree.branch.repository.check(['ignored'])
    robert> +        revid2 = tree.commit('2')
    robert> +        check_object = tree.branch.repository.check([revid1, revid2])

...
    robert> === modified file 'bzrlib/tests/workingtree_implementations/__init__.py'
...
    robert>  def load_tests(standard_tests, module, loader):
...
    robert> +        'bzrlib.tests.workingtree_implementations.test_' + name for
    robert> +        name in test_names]
 
+1 ;-D

May be good to say that the above are due to hardware or OS
errors while the following are more likely due to bzr bugs.

« Back to merge proposal