Merge lp:~mbp/bzr/415508-content-filtering into lp:bzr/1.18

Proposed by Martin Pool
Status: Merged
Merged at revision: not available
Proposed branch: lp:~mbp/bzr/415508-content-filtering
Merge into: lp:bzr/1.18
Diff against target: 345 lines
To merge this branch: bzr merge lp:~mbp/bzr/415508-content-filtering
Reviewer Review Type Date Requested Status
Ian Clatworthy (community) Approve
John A Meinel (community) Approve
Review via email: mp+10640@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

fix for https://bugs.edge.launchpad.net/bzr/+bug/415508:

* when getting the contents_summary on a file, don't return the size because it's not needed and it's not cheap to compute for content-filtered trees

Revision history for this message
John A Meinel (jameinel) wrote :

I think this patch is good.

I don't see a test added that would catch the bug that we discovered. Namely, that we *don't* add a new node for texts that have a content filter, but which have not otherwise changed.

I would like to see something along those lines, but I'm not going to block this patch on it, because I think the bug we have right now is pretty huge and needs fixing.

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

I need an additional test change, like this, for the path_content_summary tests:

=== modified file 'bzrlib/tests/per_tree/test_path_content_summary.py'
--- bzrlib/tests/per_tree/test_path_content_summary.py 2009-07-10 07:14:02 +0000
+++ bzrlib/tests/per_tree/test_path_content_summary.py 2009-08-25 05:55:44 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2007 Canonical Ltd
+# Copyright (C) 2007, 2009 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -34,6 +34,18 @@
         self.addCleanup(result.unlock)
         return result

+ def check_content_summary_size(self, tree, summary, expected_size):
+ # if the tree supports content filters, then it's allowed to leave out
+ # the size because it might be difficult to compute. otherwise, it
+ # must be present and correct
+ returned_size = summary[1]
+ if returned_size == expected_size or (
+ tree.supports_content_filtering()
+ and returned_size is None):
+ pass
+ else:
+ self.fail("invalid size in summary: %r" % (returned_size,))
+
     def test_symlink_content_summary(self):
         self.requireFeature(tests.SymlinkFeature)
         tree = self.make_branch_and_tree('tree')
@@ -76,8 +88,7 @@
         summary = self._convert_tree(tree).path_content_summary('path')
         self.assertEqual(4, len(summary))
         self.assertEqual('file', summary[0])
- # size must be known
- self.assertEqual(22, summary[1])
+ self.check_content_summary_size(tree, summary, 22)
         # executable
         self.assertEqual(True, summary[2])
         # may have hash,
@@ -91,8 +102,7 @@
         summary = self._convert_tree(tree).path_content_summary('path')
         self.assertEqual(4, len(summary))
         self.assertEqual('file', summary[0])
- # size must be known
- self.assertEqual(22, summary[1])
+ self.check_content_summary_size(tree, summary, 22)
         # not executable
         if osutils.supports_executable:
             self.assertEqual(False, summary[2])

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

The code and tests look good to me. The docstring and code in _path_content_summary() aren't obviously in sync but I don't have good ideas for improving either. Perhaps dirstate_sha1 should be called dirstate_fingerprint instead?

The NEWS items needs a trivial tweak: 'of file' -> 'of a file'.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-08-20 08:38:09 +0000
+++ NEWS 2009-08-26 06:35:38 +0000
@@ -7,6 +7,24 @@
7 :depth: 17 :depth: 1
88
99
10bzr 1.18.1 NOT RELEASED YET
11###########################
12
13Bug Fixes
14*********
15
16* Fixed a problem where using content filtering and especially end-of-line
17 conversion will commit too many copies a file.
18 (Martin Pool, #415508)
19
20API Changes
21***********
22
23* ``Tree.path_content_summary`` may return a size of None, when called on
24 a tree with content filtering where the size of the canonical form
25 cannot be cheaply determined. (Martin Pool)
26
27
10bzr 1.1828bzr 1.18
11########29########
1230
1331
=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2009-07-15 05:54:37 +0000
+++ bzrlib/commit.py 2009-08-26 06:35:39 +0000
@@ -802,10 +802,11 @@
802 # _update_builder_with_changes.802 # _update_builder_with_changes.
803 continue803 continue
804 content_summary = self.work_tree.path_content_summary(path)804 content_summary = self.work_tree.path_content_summary(path)
805 kind = content_summary[0]
805 # Note that when a filter of specific files is given, we must only806 # Note that when a filter of specific files is given, we must only
806 # skip/record deleted files matching that filter.807 # skip/record deleted files matching that filter.
807 if not specific_files or is_inside_any(specific_files, path):808 if not specific_files or is_inside_any(specific_files, path):
808 if content_summary[0] == 'missing':809 if kind == 'missing':
809 if not deleted_paths:810 if not deleted_paths:
810 # path won't have been split yet.811 # path won't have been split yet.
811 path_segments = splitpath(path)812 path_segments = splitpath(path)
@@ -818,23 +819,20 @@
818 continue819 continue
819 # TODO: have the builder do the nested commit just-in-time IF and820 # TODO: have the builder do the nested commit just-in-time IF and
820 # only if needed.821 # only if needed.
821 if content_summary[0] == 'tree-reference':822 if kind == 'tree-reference':
822 # enforce repository nested tree policy.823 # enforce repository nested tree policy.
823 if (not self.work_tree.supports_tree_reference() or824 if (not self.work_tree.supports_tree_reference() or
824 # repository does not support it either.825 # repository does not support it either.
825 not self.branch.repository._format.supports_tree_reference):826 not self.branch.repository._format.supports_tree_reference):
826 content_summary = ('directory',) + content_summary[1:]827 kind = 'directory'
827 kind = content_summary[0]828 content_summary = (kind, None, None, None)
828 # TODO: specific_files filtering before nested tree processing829 elif self.recursive == 'down':
829 if kind == 'tree-reference':
830 if self.recursive == 'down':
831 nested_revision_id = self._commit_nested_tree(830 nested_revision_id = self._commit_nested_tree(
832 file_id, path)831 file_id, path)
833 content_summary = content_summary[:3] + (832 content_summary = (kind, None, None, nested_revision_id)
834 nested_revision_id,)
835 else:833 else:
836 content_summary = content_summary[:3] + (834 nested_revision_id = self.work_tree.get_reference_revision(file_id)
837 self.work_tree.get_reference_revision(file_id),)835 content_summary = (kind, None, None, nested_revision_id)
838836
839 # Record an entry for this item837 # Record an entry for this item
840 # Note: I don't particularly want to have the existing_ie838 # Note: I don't particularly want to have the existing_ie
841839
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-08-06 02:23:37 +0000
+++ bzrlib/repository.py 2009-08-26 06:35:39 +0000
@@ -464,9 +464,9 @@
464 if content_summary[2] is None:464 if content_summary[2] is None:
465 raise ValueError("Files must not have executable = None")465 raise ValueError("Files must not have executable = None")
466 if not store:466 if not store:
467 if (# if the file length changed we have to store:467 # We can't trust a check of the file length because of content
468 parent_entry.text_size != content_summary[1] or468 # filtering...
469 # if the exec bit has changed we have to store:469 if (# if the exec bit has changed we have to store:
470 parent_entry.executable != content_summary[2]):470 parent_entry.executable != content_summary[2]):
471 store = True471 store = True
472 elif parent_entry.text_sha1 == content_summary[3]:472 elif parent_entry.text_sha1 == content_summary[3]:
@@ -539,6 +539,9 @@
539 ie.revision = parent_entry.revision539 ie.revision = parent_entry.revision
540 return self._get_delta(ie, basis_inv, path), False, None540 return self._get_delta(ie, basis_inv, path), False, None
541 ie.reference_revision = content_summary[3]541 ie.reference_revision = content_summary[3]
542 if ie.reference_revision is None:
543 raise AssertionError("invalid content_summary for nested tree: %r"
544 % (content_summary,))
542 self._add_text_to_weave(ie.file_id, '', heads, None)545 self._add_text_to_weave(ie.file_id, '', heads, None)
543 else:546 else:
544 raise NotImplementedError('unknown kind')547 raise NotImplementedError('unknown kind')
545548
=== modified file 'bzrlib/tests/per_tree/test_path_content_summary.py'
--- bzrlib/tests/per_tree/test_path_content_summary.py 2009-07-10 07:14:02 +0000
+++ bzrlib/tests/per_tree/test_path_content_summary.py 2009-08-26 06:35:39 +0000
@@ -1,4 +1,4 @@
1# Copyright (C) 2007 Canonical Ltd1# Copyright (C) 2007, 2009 Canonical Ltd
2#2#
3# This program is free software; you can redistribute it and/or modify3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by4# it under the terms of the GNU General Public License as published by
@@ -34,6 +34,18 @@
34 self.addCleanup(result.unlock)34 self.addCleanup(result.unlock)
35 return result35 return result
3636
37 def check_content_summary_size(self, tree, summary, expected_size):
38 # if the tree supports content filters, then it's allowed to leave out
39 # the size because it might be difficult to compute. otherwise, it
40 # must be present and correct
41 returned_size = summary[1]
42 if returned_size == expected_size or (
43 tree.supports_content_filtering()
44 and returned_size is None):
45 pass
46 else:
47 self.fail("invalid size in summary: %r" % (returned_size,))
48
37 def test_symlink_content_summary(self):49 def test_symlink_content_summary(self):
38 self.requireFeature(tests.SymlinkFeature)50 self.requireFeature(tests.SymlinkFeature)
39 tree = self.make_branch_and_tree('tree')51 tree = self.make_branch_and_tree('tree')
@@ -76,8 +88,7 @@
76 summary = self._convert_tree(tree).path_content_summary('path')88 summary = self._convert_tree(tree).path_content_summary('path')
77 self.assertEqual(4, len(summary))89 self.assertEqual(4, len(summary))
78 self.assertEqual('file', summary[0])90 self.assertEqual('file', summary[0])
79 # size must be known91 self.check_content_summary_size(tree, summary, 22)
80 self.assertEqual(22, summary[1])
81 # executable92 # executable
82 self.assertEqual(True, summary[2])93 self.assertEqual(True, summary[2])
83 # may have hash,94 # may have hash,
@@ -91,8 +102,7 @@
91 summary = self._convert_tree(tree).path_content_summary('path')102 summary = self._convert_tree(tree).path_content_summary('path')
92 self.assertEqual(4, len(summary))103 self.assertEqual(4, len(summary))
93 self.assertEqual('file', summary[0])104 self.assertEqual('file', summary[0])
94 # size must be known105 self.check_content_summary_size(tree, summary, 22)
95 self.assertEqual(22, summary[1])
96 # not executable106 # not executable
97 if osutils.supports_executable:107 if osutils.supports_executable:
98 self.assertEqual(False, summary[2])108 self.assertEqual(False, summary[2])
99109
=== modified file 'bzrlib/tests/per_workingtree/test_content_filters.py'
--- bzrlib/tests/per_workingtree/test_content_filters.py 2009-07-10 07:14:02 +0000
+++ bzrlib/tests/per_workingtree/test_content_filters.py 2009-08-26 06:35:39 +0000
@@ -43,6 +43,25 @@
43 return _converter_helper(chunks, 'lower')43 return _converter_helper(chunks, 'lower')
4444
4545
46_trailer_string = '\nend string\n'
47
48
49def _append_text(chunks, context=None):
50 """A content filter that appends a string to the end of the file.
51
52 This tests filters that change the length."""
53 return chunks + [_trailer_string]
54
55
56def _remove_appended_text(chunks, context=None):
57 """Remove the appended text."""
58
59 text = ''.join(chunks)
60 if text.endswith(_trailer_string):
61 text = text[:-len(_trailer_string)]
62 return [text]
63
64
46class TestWorkingTreeWithContentFilters(TestCaseWithWorkingTree):65class TestWorkingTreeWithContentFilters(TestCaseWithWorkingTree):
4766
48 def create_cf_tree(self, txt_reader, txt_writer, dir='.'):67 def create_cf_tree(self, txt_reader, txt_writer, dir='.'):
@@ -159,3 +178,34 @@
159 self.assertFileEqual("fOO tXT", 'target/file1.txt')178 self.assertFileEqual("fOO tXT", 'target/file1.txt')
160 changes = target.changes_from(source.basis_tree())179 changes = target.changes_from(source.basis_tree())
161 self.assertFalse(changes.has_changed())180 self.assertFalse(changes.has_changed())
181
182 def test_path_content_summary(self):
183 """path_content_summary should always talk about the canonical form."""
184 # see https://bugs.edge.launchpad.net/bzr/+bug/415508
185 #
186 # set up a tree where the canonical form has a string added to the
187 # end
188 source, txt_fileid, bin_fileid = self.create_cf_tree(
189 txt_reader=_append_text,
190 txt_writer=_remove_appended_text,
191 dir='source')
192 if not source.supports_content_filtering():
193 return
194 source.lock_read()
195 self.addCleanup(source.unlock)
196
197 expected_canonical_form = 'Foo Txt\nend string\n'
198 self.assertEquals(source.get_file(txt_fileid, filtered=True).read(),
199 expected_canonical_form)
200 self.assertEquals(source.get_file(txt_fileid, filtered=False).read(),
201 'Foo Txt')
202
203 # results are: kind, size, executable, sha1_or_link_target
204 result = source.path_content_summary('file1.txt')
205
206 self.assertEquals(result,
207 ('file', None, False, None))
208
209 # we could give back the length of the canonical form, but in general
210 # that will be expensive to compute, so it's acceptable to just return
211 # None.
162212
=== modified file 'bzrlib/tree.py'
--- bzrlib/tree.py 2009-07-17 06:04:35 +0000
+++ bzrlib/tree.py 2009-08-26 06:35:39 +0000
@@ -218,10 +218,14 @@
218 def path_content_summary(self, path):218 def path_content_summary(self, path):
219 """Get a summary of the information about path.219 """Get a summary of the information about path.
220220
221 All the attributes returned are for the canonical form, not the
222 convenient form (if content filters are in use.)
223
221 :param path: A relative path within the tree.224 :param path: A relative path within the tree.
222 :return: A tuple containing kind, size, exec, sha1-or-link.225 :return: A tuple containing kind, size, exec, sha1-or-link.
223 Kind is always present (see tree.kind()).226 Kind is always present (see tree.kind()).
224 size is present if kind is file, None otherwise.227 size is present if kind is file and the size of the
228 canonical form can be cheaply determined, None otherwise.
225 exec is None unless kind is file and the platform supports the 'x'229 exec is None unless kind is file and the platform supports the 'x'
226 bit.230 bit.
227 sha1-or-link is the link target if kind is symlink, or the sha1 if231 sha1-or-link is the link target if kind is symlink, or the sha1 if
228232
=== modified file 'bzrlib/workingtree.py'
--- bzrlib/workingtree.py 2009-08-04 04:36:34 +0000
+++ bzrlib/workingtree.py 2009-08-26 06:35:39 +0000
@@ -613,6 +613,8 @@
613613
614 def get_file_size(self, file_id):614 def get_file_size(self, file_id):
615 """See Tree.get_file_size"""615 """See Tree.get_file_size"""
616 # XXX: this returns the on-disk size; it should probably return the
617 # canonical size
616 try:618 try:
617 return os.path.getsize(self.id2abspath(file_id))619 return os.path.getsize(self.id2abspath(file_id))
618 except OSError, e:620 except OSError, e:
@@ -749,11 +751,7 @@
749 raise751 raise
750 kind = _mapper(stat_result.st_mode)752 kind = _mapper(stat_result.st_mode)
751 if kind == 'file':753 if kind == 'file':
752 size = stat_result.st_size754 return self._file_content_summary(path, stat_result)
753 # try for a stat cache lookup
754 executable = self._is_executable_from_path_and_stat(path, stat_result)
755 return (kind, size, executable, self._sha_from_stat(
756 path, stat_result))
757 elif kind == 'directory':755 elif kind == 'directory':
758 # perhaps it looks like a plain directory, but it's really a756 # perhaps it looks like a plain directory, but it's really a
759 # reference.757 # reference.
@@ -766,6 +764,13 @@
766 else:764 else:
767 return (kind, None, None, None)765 return (kind, None, None, None)
768766
767 def _file_content_summary(self, path, stat_result):
768 size = stat_result.st_size
769 executable = self._is_executable_from_path_and_stat(path, stat_result)
770 # try for a stat cache lookup
771 return ('file', size, executable, self._sha_from_stat(
772 path, stat_result))
773
769 def _check_parents_for_ghosts(self, revision_ids, allow_leftmost_as_ghost):774 def _check_parents_for_ghosts(self, revision_ids, allow_leftmost_as_ghost):
770 """Common ghost checking functionality from set_parent_*.775 """Common ghost checking functionality from set_parent_*.
771776
772777
=== modified file 'bzrlib/workingtree_4.py'
--- bzrlib/workingtree_4.py 2009-08-04 11:25:23 +0000
+++ bzrlib/workingtree_4.py 2009-08-26 06:35:39 +0000
@@ -1300,6 +1300,32 @@
1300 return statvalue, sha11300 return statvalue, sha1
13011301
13021302
1303class ContentFilteringDirStateWorkingTree(DirStateWorkingTree):
1304 """Dirstate working tree that supports content filtering.
1305
1306 The dirstate holds the hash and size of the canonical form of the file,
1307 and most methods must return that.
1308 """
1309
1310 def _file_content_summary(self, path, stat_result):
1311 # This is to support the somewhat obsolete path_content_summary method
1312 # with content filtering: see
1313 # <https://bugs.edge.launchpad.net/bzr/+bug/415508>.
1314 #
1315 # If the dirstate cache is up to date and knows the hash and size,
1316 # return that.
1317 # Otherwise if there are no content filters, return the on-disk size
1318 # and leave the hash blank.
1319 # Otherwise, read and filter the on-disk file and use its size and
1320 # hash.
1321 #
1322 # The dirstate doesn't store the size of the canonical form so we
1323 # can't trust it for content-filtered trees. We just return None.
1324 dirstate_sha1 = self._dirstate.sha1_from_stat(path, stat_result)
1325 executable = self._is_executable_from_path_and_stat(path, stat_result)
1326 return ('file', None, executable, dirstate_sha1)
1327
1328
1303class WorkingTree4(DirStateWorkingTree):1329class WorkingTree4(DirStateWorkingTree):
1304 """This is the Format 4 working tree.1330 """This is the Format 4 working tree.
13051331
@@ -1313,7 +1339,7 @@
1313 """1339 """
13141340
13151341
1316class WorkingTree5(DirStateWorkingTree):1342class WorkingTree5(ContentFilteringDirStateWorkingTree):
1317 """This is the Format 5 working tree.1343 """This is the Format 5 working tree.
13181344
1319 This differs from WorkingTree4 by:1345 This differs from WorkingTree4 by:
@@ -1323,7 +1349,7 @@
1323 """1349 """
13241350
13251351
1326class WorkingTree6(DirStateWorkingTree):1352class WorkingTree6(ContentFilteringDirStateWorkingTree):
1327 """This is the Format 6 working tree.1353 """This is the Format 6 working tree.
13281354
1329 This differs from WorkingTree5 by:1355 This differs from WorkingTree5 by:
@@ -1550,7 +1576,11 @@
15501576
15511577
1552class DirStateRevisionTree(Tree):1578class DirStateRevisionTree(Tree):
1553 """A revision tree pulling the inventory from a dirstate."""1579 """A revision tree pulling the inventory from a dirstate.
1580
1581 Note that this is one of the historical (ie revision) trees cached in the
1582 dirstate for easy access, not the workingtree.
1583 """
15541584
1555 def __init__(self, dirstate, revision_id, repository):1585 def __init__(self, dirstate, revision_id, repository):
1556 self._dirstate = dirstate1586 self._dirstate = dirstate

Subscribers

People subscribed via source and target branches