Merge lp:~spiv/bzr/hardlink-2a-408193 into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Approved by: Martin Pool
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/hardlink-2a-408193
Merge into: lp:bzr
Diff against target: 202 lines (+60/-40)
8 files modified
NEWS (+4/-0)
bzrlib/builtins.py (+0/-3)
bzrlib/tests/blackbox/test_branch.py (+1/-9)
bzrlib/tests/blackbox/test_checkout.py (+1/-9)
bzrlib/tests/per_tree/test_iter_search_rules.py (+0/-1)
bzrlib/tests/test_transform.py (+45/-2)
bzrlib/transform.py (+6/-2)
bzrlib/workingtree_4.py (+3/-14)
To merge this branch: bzr merge lp:~spiv/bzr/hardlink-2a-408193
Reviewer Review Type Date Requested Status
Martin Pool Approve
John A Meinel Approve
Review via email: mp+15591@code.launchpad.net

This proposal supersedes a proposal from 2009-11-27.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This tries to fix bug 408193, --hardlink not working on recent tree formats. I'm not certain this is a correct fix, as I haven't looked at content filtering code before, so I'd love some careful feedback on this approach.

This patch doesn't seem to break any tests, and it doesn't seem to change performance for the worse. For formats that don't support content filtering, nothing changes. For formats that do, this (assuming I've written this correctly) looks at each file to see if any filtering rules apply to it and if so skips hardlinking of it.

In my testing, if you have no rules configured on a 2a tree, on bzr trees it gives the same performance and hardlinks as --hardlink of a 1.9 format tree. (Both cases with 2a branch and repo, just varying tree format.)

In practice, for me, this makes branching 2a format bzr.dev trees speed up from 3s to 1s. Plus of course uses less disk.

This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-11-27 at 00:50 +0000, Andrew Bennetts wrote:

> This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

That sounds appropriate to me, and the approach you've taken sounds
plausible.

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

news please.

This looks like a plausible fix to me. I probably wouldn't trust it to go into 2.0 though, at least not without a shakedown in trunk.

There is another test relevant to this in test_checkout so you should at least reenable that.

I guess ideally you want a test that shows that things are not hardlinked when the filters cause them to have different content. However, you could make a case that we should optimistically land this and then see if things actually fail that way.

review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote :

Thanks very much for the reviews so far. This patch adds a test, as well as the other changes suggested in the earlier reviews.

The test is a bit ugly, but I'm not sure if there's a better way to write it. There are quite a few layers involved in the content filtering APIs. I think it's probably ok, but an extra review won't hurt :)

I agree that this is a bit risky for 2.0, so I wouldn't propose backporting it without a) someone clamouring for it, and b) time for it to be tested in trunk.

Revision history for this message
John A Meinel (jameinel) wrote :

46 + self.assertEqual(source_stat, target_stat)

^- It is generally better to use "self.assertEqualStat()".

I realize it isn't strictly required here, since you aren't doing an fstat versus lstat, etc.

118 + os.mkdir(self.test_home_dir + '/.bazaar')
119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
120 + f = open(rules_filename, 'w')
121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
122 + f.close()

^- Could you use ".build_tree()" and ".build_tree_contents()" here? I'll also note that you should at least use 'wb', since I don't think you want to get \r\n content on Windows. (maybe this doesn't matter as much since it is a config file?)

Would it be easier to use something like "fn = config.get_rules_filename()" ? I don't know how that path is currently determined, but it doesn't seem like you should be going via self.test_home_dir. (If there isn't an api for it, just submit a bug, and leave the test alone.)

Otherwise I think the change is reasonable.

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Approve
> 46 + self.assertEqual(source_stat, target_stat)
>
> ^- It is generally better to use "self.assertEqualStat()".
>
> I realize it isn't strictly required here, since you aren't doing an fstat
> versus lstat, etc.

Ok, although there's no assertNotEqualStat, so I'll leave the corresponding
assertNotEqual as it is.

> 118 + os.mkdir(self.test_home_dir + '/.bazaar')
> 119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
> 120 + f = open(rules_filename, 'w')
> 121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
> 122 + f.close()
>
> ^- Could you use ".build_tree()" and ".build_tree_contents()" here? I'll also
> note that you should at least use 'wb', since I don't think you want to get
> \r\n content on Windows. (maybe this doesn't matter as much since it is a
> config file?)

Good point about 'wb', changed (although I don't know the precise rules for
newlines in config files).

I cannot easily use build_tree_contents, because doesn't provide a convenient
way to build a tree relative to a dir other than '.'. Perhaps it should be
updated to take an optional transport and use put_bytes_non_atomic like
TestCase.build_tree? And I still need to remove or rename the file afterwards
before I call reset_rules()... probably that's something that should be handled
by the base TestCase, but it isn't at the moment.

> Would it be easier to use something like "fn = config.get_rules_filename()" ?
> I don't know how that path is currently determined, but it doesn't seem like
> you should be going via self.test_home_dir. (If there isn't an api for it,
> just submit a bug, and leave the test alone.)
>
> Otherwise I think the change is reasonable.

Hmm, there's bzrlib.rules.rules_filename(). I kind of like the safety of using
the test_home_dir explicitly though, it's easy to imagine a naïve optimisation
of rules_filename() caching and returning the wrong value — after all I need to
call reset_rules() because of similar caching. I might be too pessimistic
though. Anyway, I'd still need to makedirs(os.path.dirname(fn)), so it wouldn't
end up being much easier.

You do have a point that ideally I'd have essentially a one-liner:

     write_file(rules_filename(), '''...''')

And instead I have 5 lines, even ignoring the reset_rules call and cleanups.

So this is something to revisit if we write more tests that want to temporarily
install some rules, but I hope it's good enough for now.

-Andrew.

Revision history for this message
Martin Pool (mbp) wrote :

If I touched this again I'd probably add assertIsHardLinked. Not needed now. (And this is a case where testtools's assertIs would be nice, so you didn't need to separately write assertIsNotHardLinked.)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-12-03 02:24:54 +0000
+++ NEWS 2009-12-04 06:17:11 +0000
@@ -48,6 +48,10 @@
48* Terminate ssh subprocesses when no references to them remain, fixing48* Terminate ssh subprocesses when no references to them remain, fixing
49 subprocess and file descriptor leaks. (Andrew Bennetts, #426662)49 subprocess and file descriptor leaks. (Andrew Bennetts, #426662)
50 50
51* The ``--hardlink`` option of ``bzr branch`` and ``bzr checkout`` now
52 works for 2a format trees. Only files unaffected by content filters
53 will be hardlinked. (Andrew Bennetts, #408193)
54
51* The new glob expansion on Windows would replace all ``\`` characters55* The new glob expansion on Windows would replace all ``\`` characters
52 with ``/`` even if it there wasn't a glob to expand, the arg was quoted,56 with ``/`` even if it there wasn't a glob to expand, the arg was quoted,
53 etc. Now only change slashes if there is something being glob expanded.57 etc. Now only change slashes if there is something being glob expanded.
5458
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2009-12-03 20:56:18 +0000
+++ bzrlib/builtins.py 2009-12-04 06:17:11 +0000
@@ -1209,9 +1209,6 @@
1209 from bzrlib.tag import _merge_tags_if_possible1209 from bzrlib.tag import _merge_tags_if_possible
1210 accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(1210 accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
1211 from_location)1211 from_location)
1212 if (accelerator_tree is not None and
1213 accelerator_tree.supports_content_filtering()):
1214 accelerator_tree = None
1215 revision = _get_one_revision('branch', revision)1212 revision = _get_one_revision('branch', revision)
1216 br_from.lock_read()1213 br_from.lock_read()
1217 try:1214 try:
12181215
=== modified file 'bzrlib/tests/blackbox/test_branch.py'
--- bzrlib/tests/blackbox/test_branch.py 2009-08-20 12:30:49 +0000
+++ bzrlib/tests/blackbox/test_branch.py 2009-12-04 06:17:11 +0000
@@ -171,15 +171,7 @@
171 out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink'])171 out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink'])
172 source_stat = os.stat('source/file1')172 source_stat = os.stat('source/file1')
173 target_stat = os.stat('target/file1')173 target_stat = os.stat('target/file1')
174 same_file = (source_stat == target_stat)174 self.assertEqual(source_stat, target_stat)
175 if same_file:
176 pass
177 else:
178 # https://bugs.edge.launchpad.net/bzr/+bug/408193
179 self.assertContainsRe(err, "hardlinking working copy files is "
180 "not currently supported")
181 raise KnownFailure("--hardlink doesn't work in formats "
182 "that support content filtering (#408193)")
183175
184 def test_branch_standalone(self):176 def test_branch_standalone(self):
185 shared_repo = self.make_repository('repo', shared=True)177 shared_repo = self.make_repository('repo', shared=True)
186178
=== modified file 'bzrlib/tests/blackbox/test_checkout.py'
--- bzrlib/tests/blackbox/test_checkout.py 2009-08-03 04:19:03 +0000
+++ bzrlib/tests/blackbox/test_checkout.py 2009-12-04 06:17:11 +0000
@@ -160,12 +160,4 @@
160 '--hardlink'])160 '--hardlink'])
161 source_stat = os.stat('source/file1')161 source_stat = os.stat('source/file1')
162 target_stat = os.stat('target/file1')162 target_stat = os.stat('target/file1')
163 same_file = (source_stat == target_stat)163 self.assertEqual(source_stat, target_stat)
164 if same_file:
165 pass
166 else:
167 # https://bugs.edge.launchpad.net/bzr/+bug/408193
168 self.assertContainsRe(err, "hardlinking working copy files is "
169 "not currently supported")
170 raise KnownFailure("--hardlink doesn't work in formats "
171 "that support content filtering (#408193)")
172164
=== modified file 'bzrlib/tests/per_tree/test_iter_search_rules.py'
--- bzrlib/tests/per_tree/test_iter_search_rules.py 2009-07-10 07:14:02 +0000
+++ bzrlib/tests/per_tree/test_iter_search_rules.py 2009-12-04 06:17:11 +0000
@@ -18,7 +18,6 @@
1818
19from bzrlib import (19from bzrlib import (
20 rules,20 rules,
21 tests,
22 )21 )
23from bzrlib.tests.per_tree import TestCaseWithTree22from bzrlib.tests.per_tree import TestCaseWithTree
2423
2524
=== modified file 'bzrlib/tests/test_transform.py'
--- bzrlib/tests/test_transform.py 2009-11-18 16:10:18 +0000
+++ bzrlib/tests/test_transform.py 2009-12-04 06:17:11 +0000
@@ -15,18 +15,18 @@
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1616
17import os17import os
18import stat
19from StringIO import StringIO18from StringIO import StringIO
20import sys19import sys
2120
22from bzrlib import (21from bzrlib import (
23 bencode,22 bencode,
24 errors,23 errors,
24 filters,
25 generate_ids,25 generate_ids,
26 osutils,26 osutils,
27 progress,27 progress,
28 revision as _mod_revision,28 revision as _mod_revision,
29 symbol_versioning,29 rules,
30 tests,30 tests,
31 urlutils,31 urlutils,
32 )32 )
@@ -1868,6 +1868,49 @@
1868 self.assertEqual([], list(target.iter_changes(revision_tree)))1868 self.assertEqual([], list(target.iter_changes(revision_tree)))
1869 self.assertTrue(source.is_executable('file1-id'))1869 self.assertTrue(source.is_executable('file1-id'))
18701870
1871 def install_rot13_content_filter(self, pattern):
1872 original_registry = filters._reset_registry()
1873 def restore_registry():
1874 filters._reset_registry(original_registry)
1875 self.addCleanup(restore_registry)
1876 def rot13(chunks, context=None):
1877 return [''.join(chunks).encode('rot13')]
1878 rot13filter = filters.ContentFilter(rot13, rot13)
1879 filters.register_filter_stack_map('rot13', {'yes': [rot13filter]}.get)
1880 os.mkdir(self.test_home_dir + '/.bazaar')
1881 rules_filename = self.test_home_dir + '/.bazaar/rules'
1882 f = open(rules_filename, 'wb')
1883 f.write('[name %s]\nrot13=yes\n' % (pattern,))
1884 f.close()
1885 def uninstall_rules():
1886 os.remove(rules_filename)
1887 rules.reset_rules()
1888 self.addCleanup(uninstall_rules)
1889 rules.reset_rules()
1890
1891 def test_build_tree_content_filtered_files_are_not_hardlinked(self):
1892 """build_tree will not hardlink files that have content filtering rules
1893 applied to them (but will still hardlink other files from the same tree
1894 if it can).
1895 """
1896 self.requireFeature(HardlinkFeature)
1897 self.install_rot13_content_filter('file1')
1898 source = self.create_ab_tree()
1899 target = self.make_branch_and_tree('target')
1900 revision_tree = source.basis_tree()
1901 revision_tree.lock_read()
1902 self.addCleanup(revision_tree.unlock)
1903 build_tree(revision_tree, target, source, hardlink=True)
1904 target.lock_read()
1905 self.addCleanup(target.unlock)
1906 self.assertEqual([], list(target.iter_changes(revision_tree)))
1907 source_stat = os.stat('source/file1')
1908 target_stat = os.stat('target/file1')
1909 self.assertNotEqual(source_stat, target_stat)
1910 source_stat = os.stat('source/file2')
1911 target_stat = os.stat('target/file2')
1912 self.assertEqualStat(source_stat, target_stat)
1913
1871 def test_case_insensitive_build_tree_inventory(self):1914 def test_case_insensitive_build_tree_inventory(self):
1872 if (tests.CaseInsensitiveFilesystemFeature.available()1915 if (tests.CaseInsensitiveFilesystemFeature.available()
1873 or tests.CaseInsCasePresFilenameFeature.available()):1916 or tests.CaseInsCasePresFilenameFeature.available()):
18741917
=== modified file 'bzrlib/transform.py'
--- bzrlib/transform.py 2009-12-03 02:24:54 +0000
+++ bzrlib/transform.py 2009-12-04 06:17:11 +0000
@@ -2300,8 +2300,12 @@
2300 new_desired_files = desired_files2300 new_desired_files = desired_files
2301 else:2301 else:
2302 iter = accelerator_tree.iter_changes(tree, include_unchanged=True)2302 iter = accelerator_tree.iter_changes(tree, include_unchanged=True)
2303 unchanged = dict((f, p[1]) for (f, p, c, v, d, n, k, e)2303 unchanged = [(f, p[1]) for (f, p, c, v, d, n, k, e)
2304 in iter if not (c or e[0] != e[1]))2304 in iter if not (c or e[0] != e[1])]
2305 if accelerator_tree.supports_content_filtering():
2306 unchanged = [(f, p) for (f, p) in unchanged
2307 if not accelerator_tree.iter_search_rules([p]).next()]
2308 unchanged = dict(unchanged)
2305 new_desired_files = []2309 new_desired_files = []
2306 count = 02310 count = 0
2307 for file_id, (trans_id, tree_path) in desired_files:2311 for file_id, (trans_id, tree_path) in desired_files:
23082312
=== modified file 'bzrlib/workingtree_4.py'
--- bzrlib/workingtree_4.py 2009-11-14 11:06:51 +0000
+++ bzrlib/workingtree_4.py 2009-12-04 06:17:11 +0000
@@ -1447,21 +1447,10 @@
1447 if basis_root_id is not None:1447 if basis_root_id is not None:
1448 wt._set_root_id(basis_root_id)1448 wt._set_root_id(basis_root_id)
1449 wt.flush()1449 wt.flush()
1450 # If content filtering is supported, do not use the accelerator
1451 # tree - the cost of transforming the content both ways and
1452 # checking for changed content can outweight the gains it gives.
1453 # Note: do NOT move this logic up higher - using the basis from
1454 # the accelerator tree is still desirable because that can save
1455 # a minute or more of processing on large trees!
1456 # The original tree may not have the same content filters
1457 # applied so we can't safely build the inventory delta from
1458 # the source tree.
1459 if wt.supports_content_filtering():1450 if wt.supports_content_filtering():
1460 if hardlink:1451 # The original tree may not have the same content filters
1461 # see https://bugs.edge.launchpad.net/bzr/+bug/4081931452 # applied so we can't safely build the inventory delta from
1462 trace.warning("hardlinking working copy files is not currently "1453 # the source tree.
1463 "supported in %r" % (wt,))
1464 accelerator_tree = None
1465 delta_from_tree = False1454 delta_from_tree = False
1466 else:1455 else:
1467 delta_from_tree = True1456 delta_from_tree = True