Merge lp:~spiv/bzr/hardlink-2a-408193 into lp:bzr

Proposed by Andrew Bennetts
Status: Superseded
Proposed branch: lp:~spiv/bzr/hardlink-2a-408193
Merge into: lp:bzr
Diff against target: 202 lines (+60/-40)
8 files modified
NEWS (+4/-0)
bzrlib/builtins.py (+0/-3)
bzrlib/tests/blackbox/test_branch.py (+1/-9)
bzrlib/tests/blackbox/test_checkout.py (+1/-9)
bzrlib/tests/per_tree/test_iter_search_rules.py (+0/-1)
bzrlib/tests/test_transform.py (+45/-2)
bzrlib/transform.py (+6/-2)
bzrlib/workingtree_4.py (+3/-14)
To merge this branch: bzr merge lp:~spiv/bzr/hardlink-2a-408193
Reviewer Review Type Date Requested Status
Martin Pool Needs Fixing
Review via email: mp+15299@code.launchpad.net

This proposal has been superseded by a proposal from 2009-12-03.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote :

This tries to fix bug 408193, --hardlink not working on recent tree formats. I'm not certain this is a correct fix, as I haven't looked at content filtering code before, so I'd love some careful feedback on this approach.

This patch doesn't seem to break any tests, and it doesn't seem to change performance for the worse. For formats that don't support content filtering, nothing changes. For formats that do, this (assuming I've written this correctly) looks at each file to see if any filtering rules apply to it and if so skips hardlinking of it.

In my testing, if you have no rules configured on a 2a tree, on bzr trees it gives the same performance and hardlinks as --hardlink of a 1.9 format tree. (Both cases with 2a branch and repo, just varying tree format.)

In practice, for me, this makes branching 2a format bzr.dev trees speed up from 3s to 1s. Plus of course uses less disk.

This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2009-11-27 at 00:50 +0000, Andrew Bennetts wrote:

> This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

That sounds appropriate to me, and the approach you've taken sounds
plausible.

Revision history for this message
Martin Pool (mbp) wrote :

news please.

This looks like a plausible fix to me. I probably wouldn't trust it to go into 2.0 though, at least not without a shakedown in trunk.

There is another test relevant to this in test_checkout so you should at least reenable that.

I guess ideally you want a test that shows that things are not hardlinked when the filters cause them to have different content. However, you could make a case that we should optimistically land this and then see if things actually fail that way.

review: Needs Fixing

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-12-03 02:24:54 +0000
3+++ NEWS 2009-12-03 06:01:13 +0000
4@@ -48,6 +48,10 @@
5 * Terminate ssh subprocesses when no references to them remain, fixing
6 subprocess and file descriptor leaks. (Andrew Bennetts, #426662)
7
8+* The ``--hardlink`` option of ``bzr branch`` and ``bzr checkout`` now
9+ works for 2a format trees. Only files unaffected by content filters
10+ will be hardlinked. (Andrew Bennetts, #408193)
11+
12 * The new glob expansion on Windows would replace all ``\`` characters
13 with ``/`` even if it there wasn't a glob to expand, the arg was quoted,
14 etc. Now only change slashes if there is something being glob expanded.
15
16=== modified file 'bzrlib/builtins.py'
17--- bzrlib/builtins.py 2009-12-02 23:14:10 +0000
18+++ bzrlib/builtins.py 2009-12-03 06:01:13 +0000
19@@ -1207,9 +1207,6 @@
20 from bzrlib.tag import _merge_tags_if_possible
21 accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
22 from_location)
23- if (accelerator_tree is not None and
24- accelerator_tree.supports_content_filtering()):
25- accelerator_tree = None
26 revision = _get_one_revision('branch', revision)
27 br_from.lock_read()
28 try:
29
30=== modified file 'bzrlib/tests/blackbox/test_branch.py'
31--- bzrlib/tests/blackbox/test_branch.py 2009-08-20 12:30:49 +0000
32+++ bzrlib/tests/blackbox/test_branch.py 2009-12-03 06:01:13 +0000
33@@ -171,15 +171,7 @@
34 out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink'])
35 source_stat = os.stat('source/file1')
36 target_stat = os.stat('target/file1')
37- same_file = (source_stat == target_stat)
38- if same_file:
39- pass
40- else:
41- # https://bugs.edge.launchpad.net/bzr/+bug/408193
42- self.assertContainsRe(err, "hardlinking working copy files is "
43- "not currently supported")
44- raise KnownFailure("--hardlink doesn't work in formats "
45- "that support content filtering (#408193)")
46+ self.assertEqual(source_stat, target_stat)
47
48 def test_branch_standalone(self):
49 shared_repo = self.make_repository('repo', shared=True)
50
51=== modified file 'bzrlib/tests/blackbox/test_checkout.py'
52--- bzrlib/tests/blackbox/test_checkout.py 2009-08-03 04:19:03 +0000
53+++ bzrlib/tests/blackbox/test_checkout.py 2009-12-03 06:01:13 +0000
54@@ -160,12 +160,4 @@
55 '--hardlink'])
56 source_stat = os.stat('source/file1')
57 target_stat = os.stat('target/file1')
58- same_file = (source_stat == target_stat)
59- if same_file:
60- pass
61- else:
62- # https://bugs.edge.launchpad.net/bzr/+bug/408193
63- self.assertContainsRe(err, "hardlinking working copy files is "
64- "not currently supported")
65- raise KnownFailure("--hardlink doesn't work in formats "
66- "that support content filtering (#408193)")
67+ self.assertEqual(source_stat, target_stat)
68
69=== modified file 'bzrlib/tests/per_tree/test_iter_search_rules.py'
70--- bzrlib/tests/per_tree/test_iter_search_rules.py 2009-07-10 07:14:02 +0000
71+++ bzrlib/tests/per_tree/test_iter_search_rules.py 2009-12-03 06:01:13 +0000
72@@ -18,7 +18,6 @@
73
74 from bzrlib import (
75 rules,
76- tests,
77 )
78 from bzrlib.tests.per_tree import TestCaseWithTree
79
80
81=== modified file 'bzrlib/tests/test_transform.py'
82--- bzrlib/tests/test_transform.py 2009-11-18 16:10:18 +0000
83+++ bzrlib/tests/test_transform.py 2009-12-03 06:01:13 +0000
84@@ -15,18 +15,18 @@
85 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
86
87 import os
88-import stat
89 from StringIO import StringIO
90 import sys
91
92 from bzrlib import (
93 bencode,
94 errors,
95+ filters,
96 generate_ids,
97 osutils,
98 progress,
99 revision as _mod_revision,
100- symbol_versioning,
101+ rules,
102 tests,
103 urlutils,
104 )
105@@ -1868,6 +1868,49 @@
106 self.assertEqual([], list(target.iter_changes(revision_tree)))
107 self.assertTrue(source.is_executable('file1-id'))
108
109+ def install_rot13_content_filter(self, pattern):
110+ original_registry = filters._reset_registry()
111+ def restore_registry():
112+ filters._reset_registry(original_registry)
113+ self.addCleanup(restore_registry)
114+ def rot13(chunks, context=None):
115+ return [''.join(chunks).encode('rot13')]
116+ rot13filter = filters.ContentFilter(rot13, rot13)
117+ filters.register_filter_stack_map('rot13', {'yes': [rot13filter]}.get)
118+ os.mkdir(self.test_home_dir + '/.bazaar')
119+ rules_filename = self.test_home_dir + '/.bazaar/rules'
120+ f = open(rules_filename, 'w')
121+ f.write('[name %s]\nrot13=yes\n' % (pattern,))
122+ f.close()
123+ def uninstall_rules():
124+ os.remove(rules_filename)
125+ rules.reset_rules()
126+ self.addCleanup(uninstall_rules)
127+ rules.reset_rules()
128+
129+ def test_build_tree_content_filtered_files_are_not_hardlinked(self):
130+ """build_tree will not hardlink files that have content filtering rules
131+ applied to them (but will still hardlink other files from the same tree
132+ if it can).
133+ """
134+ self.requireFeature(HardlinkFeature)
135+ self.install_rot13_content_filter('file1')
136+ source = self.create_ab_tree()
137+ target = self.make_branch_and_tree('target')
138+ revision_tree = source.basis_tree()
139+ revision_tree.lock_read()
140+ self.addCleanup(revision_tree.unlock)
141+ build_tree(revision_tree, target, source, hardlink=True)
142+ target.lock_read()
143+ self.addCleanup(target.unlock)
144+ self.assertEqual([], list(target.iter_changes(revision_tree)))
145+ source_stat = os.stat('source/file1')
146+ target_stat = os.stat('target/file1')
147+ self.assertNotEqual(source_stat, target_stat)
148+ source_stat = os.stat('source/file2')
149+ target_stat = os.stat('target/file2')
150+ self.assertEqual(source_stat, target_stat)
151+
152 def test_case_insensitive_build_tree_inventory(self):
153 if (tests.CaseInsensitiveFilesystemFeature.available()
154 or tests.CaseInsCasePresFilenameFeature.available()):
155
156=== modified file 'bzrlib/transform.py'
157--- bzrlib/transform.py 2009-12-03 02:24:54 +0000
158+++ bzrlib/transform.py 2009-12-03 06:01:13 +0000
159@@ -2300,8 +2300,12 @@
160 new_desired_files = desired_files
161 else:
162 iter = accelerator_tree.iter_changes(tree, include_unchanged=True)
163- unchanged = dict((f, p[1]) for (f, p, c, v, d, n, k, e)
164- in iter if not (c or e[0] != e[1]))
165+ unchanged = [(f, p[1]) for (f, p, c, v, d, n, k, e)
166+ in iter if not (c or e[0] != e[1])]
167+ if accelerator_tree.supports_content_filtering():
168+ unchanged = [(f, p) for (f, p) in unchanged
169+ if not accelerator_tree.iter_search_rules([p]).next()]
170+ unchanged = dict(unchanged)
171 new_desired_files = []
172 count = 0
173 for file_id, (trans_id, tree_path) in desired_files:
174
175=== modified file 'bzrlib/workingtree_4.py'
176--- bzrlib/workingtree_4.py 2009-11-14 11:06:51 +0000
177+++ bzrlib/workingtree_4.py 2009-12-03 06:01:13 +0000
178@@ -1447,21 +1447,10 @@
179 if basis_root_id is not None:
180 wt._set_root_id(basis_root_id)
181 wt.flush()
182- # If content filtering is supported, do not use the accelerator
183- # tree - the cost of transforming the content both ways and
184- # checking for changed content can outweight the gains it gives.
185- # Note: do NOT move this logic up higher - using the basis from
186- # the accelerator tree is still desirable because that can save
187- # a minute or more of processing on large trees!
188- # The original tree may not have the same content filters
189- # applied so we can't safely build the inventory delta from
190- # the source tree.
191 if wt.supports_content_filtering():
192- if hardlink:
193- # see https://bugs.edge.launchpad.net/bzr/+bug/408193
194- trace.warning("hardlinking working copy files is not currently "
195- "supported in %r" % (wt,))
196- accelerator_tree = None
197+ # The original tree may not have the same content filters
198+ # applied so we can't safely build the inventory delta from
199+ # the source tree.
200 delta_from_tree = False
201 else:
202 delta_from_tree = True