Merge lp:~spiv/bzr/hardlink-2a-408193 into lp:bzr
- hardlink-2a-408193
- Merge into bzr.dev
Status: | Merged |
---|---|
Approved by: | Martin Pool |
Approved revision: | not available |
Merged at revision: | not available |
Proposed branch: | lp:~spiv/bzr/hardlink-2a-408193 |
Merge into: | lp:bzr |
Diff against target: |
202 lines (+60/-40) 8 files modified
NEWS (+4/-0) bzrlib/builtins.py (+0/-3) bzrlib/tests/blackbox/test_branch.py (+1/-9) bzrlib/tests/blackbox/test_checkout.py (+1/-9) bzrlib/tests/per_tree/test_iter_search_rules.py (+0/-1) bzrlib/tests/test_transform.py (+45/-2) bzrlib/transform.py (+6/-2) bzrlib/workingtree_4.py (+3/-14) |
To merge this branch: | bzr merge lp:~spiv/bzr/hardlink-2a-408193 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Martin Pool | Approve | ||
John A Meinel | Approve | ||
Review via email: mp+15591@code.launchpad.net |
This proposal supersedes a proposal from 2009-11-27.
Commit message
Description of the change
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal | # |
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
On Fri, 2009-11-27 at 00:50 +0000, Andrew Bennetts wrote:
> This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?
That sounds appropriate to me, and the approach you've taken sounds
plausible.
Martin Pool (mbp) wrote : Posted in a previous version of this proposal | # |
news please.
This looks like a plausible fix to me. I probably wouldn't trust it to go into 2.0 though, at least not without a shakedown in trunk.
There is another test relevant to this in test_checkout so you should at least reenable that.
I guess ideally you want a test that shows that things are not hardlinked when the filters cause them to have different content. However, you could make a case that we should optimistically land this and then see if things actually fail that way.
Andrew Bennetts (spiv) wrote : | # |
Thanks very much for the reviews so far. This patch adds a test, as well as the other changes suggested in the earlier reviews.
The test is a bit ugly, but I'm not sure if there's a better way to write it. There are quite a few layers involved in the content filtering APIs. I think it's probably ok, but an extra review won't hurt :)
I agree that this is a bit risky for 2.0, so I wouldn't propose backporting it without a) someone clamouring for it, and b) time for it to be tested in trunk.
John A Meinel (jameinel) wrote : | # |
46 + self.assertEqua
^- It is generally better to use "self.assertEqu
I realize it isn't strictly required here, since you aren't doing an fstat versus lstat, etc.
118 + os.mkdir(
119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
120 + f = open(rules_
121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
122 + f.close()
^- Could you use ".build_tree()" and ".build_
Would it be easier to use something like "fn = config.
Otherwise I think the change is reasonable.
Andrew Bennetts (spiv) wrote : | # |
John A Meinel wrote:
> Review: Approve
> 46 + self.assertEqua
>
> ^- It is generally better to use "self.assertEqu
>
> I realize it isn't strictly required here, since you aren't doing an fstat
> versus lstat, etc.
Ok, although there's no assertNotEqualStat, so I'll leave the corresponding
assertNotEqual as it is.
> 118 + os.mkdir(
> 119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
> 120 + f = open(rules_
> 121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
> 122 + f.close()
>
> ^- Could you use ".build_tree()" and ".build_
> note that you should at least use 'wb', since I don't think you want to get
> \r\n content on Windows. (maybe this doesn't matter as much since it is a
> config file?)
Good point about 'wb', changed (although I don't know the precise rules for
newlines in config files).
I cannot easily use build_tree_
way to build a tree relative to a dir other than '.'. Perhaps it should be
updated to take an optional transport and use put_bytes_
TestCase.
before I call reset_rules()... probably that's something that should be handled
by the base TestCase, but it isn't at the moment.
> Would it be easier to use something like "fn = config.
> I don't know how that path is currently determined, but it doesn't seem like
> you should be going via self.test_home_dir. (If there isn't an api for it,
> just submit a bug, and leave the test alone.)
>
> Otherwise I think the change is reasonable.
Hmm, there's bzrlib.
the test_home_dir explicitly though, it's easy to imagine a naïve optimisation
of rules_filename() caching and returning the wrong value — after all I need to
call reset_rules() because of similar caching. I might be too pessimistic
though. Anyway, I'd still need to makedirs(
end up being much easier.
You do have a point that ideally I'd have essentially a one-liner:
write_
And instead I have 5 lines, even ignoring the reset_rules call and cleanups.
So this is something to revisit if we write more tests that want to temporarily
install some rules, but I hope it's good enough for now.
-Andrew.
Martin Pool (mbp) wrote : | # |
If I touched this again I'd probably add assertIsHardLinked. Not needed now. (And this is a case where testtools's assertIs would be nice, so you didn't need to separately write assertIsNotHard
Preview Diff
1 | === modified file 'NEWS' | |||
2 | --- NEWS 2009-12-03 02:24:54 +0000 | |||
3 | +++ NEWS 2009-12-04 06:17:11 +0000 | |||
4 | @@ -48,6 +48,10 @@ | |||
5 | 48 | * Terminate ssh subprocesses when no references to them remain, fixing | 48 | * Terminate ssh subprocesses when no references to them remain, fixing |
6 | 49 | subprocess and file descriptor leaks. (Andrew Bennetts, #426662) | 49 | subprocess and file descriptor leaks. (Andrew Bennetts, #426662) |
7 | 50 | 50 | ||
8 | 51 | * The ``--hardlink`` option of ``bzr branch`` and ``bzr checkout`` now | ||
9 | 52 | works for 2a format trees. Only files unaffected by content filters | ||
10 | 53 | will be hardlinked. (Andrew Bennetts, #408193) | ||
11 | 54 | |||
12 | 51 | * The new glob expansion on Windows would replace all ``\`` characters | 55 | * The new glob expansion on Windows would replace all ``\`` characters |
13 | 52 | with ``/`` even if it there wasn't a glob to expand, the arg was quoted, | 56 | with ``/`` even if it there wasn't a glob to expand, the arg was quoted, |
14 | 53 | etc. Now only change slashes if there is something being glob expanded. | 57 | etc. Now only change slashes if there is something being glob expanded. |
15 | 54 | 58 | ||
16 | === modified file 'bzrlib/builtins.py' | |||
17 | --- bzrlib/builtins.py 2009-12-03 20:56:18 +0000 | |||
18 | +++ bzrlib/builtins.py 2009-12-04 06:17:11 +0000 | |||
19 | @@ -1209,9 +1209,6 @@ | |||
20 | 1209 | from bzrlib.tag import _merge_tags_if_possible | 1209 | from bzrlib.tag import _merge_tags_if_possible |
21 | 1210 | accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch( | 1210 | accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch( |
22 | 1211 | from_location) | 1211 | from_location) |
23 | 1212 | if (accelerator_tree is not None and | ||
24 | 1213 | accelerator_tree.supports_content_filtering()): | ||
25 | 1214 | accelerator_tree = None | ||
26 | 1215 | revision = _get_one_revision('branch', revision) | 1212 | revision = _get_one_revision('branch', revision) |
27 | 1216 | br_from.lock_read() | 1213 | br_from.lock_read() |
28 | 1217 | try: | 1214 | try: |
29 | 1218 | 1215 | ||
30 | === modified file 'bzrlib/tests/blackbox/test_branch.py' | |||
31 | --- bzrlib/tests/blackbox/test_branch.py 2009-08-20 12:30:49 +0000 | |||
32 | +++ bzrlib/tests/blackbox/test_branch.py 2009-12-04 06:17:11 +0000 | |||
33 | @@ -171,15 +171,7 @@ | |||
34 | 171 | out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink']) | 171 | out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink']) |
35 | 172 | source_stat = os.stat('source/file1') | 172 | source_stat = os.stat('source/file1') |
36 | 173 | target_stat = os.stat('target/file1') | 173 | target_stat = os.stat('target/file1') |
46 | 174 | same_file = (source_stat == target_stat) | 174 | self.assertEqual(source_stat, target_stat) |
38 | 175 | if same_file: | ||
39 | 176 | pass | ||
40 | 177 | else: | ||
41 | 178 | # https://bugs.edge.launchpad.net/bzr/+bug/408193 | ||
42 | 179 | self.assertContainsRe(err, "hardlinking working copy files is " | ||
43 | 180 | "not currently supported") | ||
44 | 181 | raise KnownFailure("--hardlink doesn't work in formats " | ||
45 | 182 | "that support content filtering (#408193)") | ||
47 | 183 | 175 | ||
48 | 184 | def test_branch_standalone(self): | 176 | def test_branch_standalone(self): |
49 | 185 | shared_repo = self.make_repository('repo', shared=True) | 177 | shared_repo = self.make_repository('repo', shared=True) |
50 | 186 | 178 | ||
51 | === modified file 'bzrlib/tests/blackbox/test_checkout.py' | |||
52 | --- bzrlib/tests/blackbox/test_checkout.py 2009-08-03 04:19:03 +0000 | |||
53 | +++ bzrlib/tests/blackbox/test_checkout.py 2009-12-04 06:17:11 +0000 | |||
54 | @@ -160,12 +160,4 @@ | |||
55 | 160 | '--hardlink']) | 160 | '--hardlink']) |
56 | 161 | source_stat = os.stat('source/file1') | 161 | source_stat = os.stat('source/file1') |
57 | 162 | target_stat = os.stat('target/file1') | 162 | target_stat = os.stat('target/file1') |
67 | 163 | same_file = (source_stat == target_stat) | 163 | self.assertEqual(source_stat, target_stat) |
59 | 164 | if same_file: | ||
60 | 165 | pass | ||
61 | 166 | else: | ||
62 | 167 | # https://bugs.edge.launchpad.net/bzr/+bug/408193 | ||
63 | 168 | self.assertContainsRe(err, "hardlinking working copy files is " | ||
64 | 169 | "not currently supported") | ||
65 | 170 | raise KnownFailure("--hardlink doesn't work in formats " | ||
66 | 171 | "that support content filtering (#408193)") | ||
68 | 172 | 164 | ||
69 | === modified file 'bzrlib/tests/per_tree/test_iter_search_rules.py' | |||
70 | --- bzrlib/tests/per_tree/test_iter_search_rules.py 2009-07-10 07:14:02 +0000 | |||
71 | +++ bzrlib/tests/per_tree/test_iter_search_rules.py 2009-12-04 06:17:11 +0000 | |||
72 | @@ -18,7 +18,6 @@ | |||
73 | 18 | 18 | ||
74 | 19 | from bzrlib import ( | 19 | from bzrlib import ( |
75 | 20 | rules, | 20 | rules, |
76 | 21 | tests, | ||
77 | 22 | ) | 21 | ) |
78 | 23 | from bzrlib.tests.per_tree import TestCaseWithTree | 22 | from bzrlib.tests.per_tree import TestCaseWithTree |
79 | 24 | 23 | ||
80 | 25 | 24 | ||
81 | === modified file 'bzrlib/tests/test_transform.py' | |||
82 | --- bzrlib/tests/test_transform.py 2009-11-18 16:10:18 +0000 | |||
83 | +++ bzrlib/tests/test_transform.py 2009-12-04 06:17:11 +0000 | |||
84 | @@ -15,18 +15,18 @@ | |||
85 | 15 | # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | 15 | # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
86 | 16 | 16 | ||
87 | 17 | import os | 17 | import os |
88 | 18 | import stat | ||
89 | 19 | from StringIO import StringIO | 18 | from StringIO import StringIO |
90 | 20 | import sys | 19 | import sys |
91 | 21 | 20 | ||
92 | 22 | from bzrlib import ( | 21 | from bzrlib import ( |
93 | 23 | bencode, | 22 | bencode, |
94 | 24 | errors, | 23 | errors, |
95 | 24 | filters, | ||
96 | 25 | generate_ids, | 25 | generate_ids, |
97 | 26 | osutils, | 26 | osutils, |
98 | 27 | progress, | 27 | progress, |
99 | 28 | revision as _mod_revision, | 28 | revision as _mod_revision, |
101 | 29 | symbol_versioning, | 29 | rules, |
102 | 30 | tests, | 30 | tests, |
103 | 31 | urlutils, | 31 | urlutils, |
104 | 32 | ) | 32 | ) |
105 | @@ -1868,6 +1868,49 @@ | |||
106 | 1868 | self.assertEqual([], list(target.iter_changes(revision_tree))) | 1868 | self.assertEqual([], list(target.iter_changes(revision_tree))) |
107 | 1869 | self.assertTrue(source.is_executable('file1-id')) | 1869 | self.assertTrue(source.is_executable('file1-id')) |
108 | 1870 | 1870 | ||
109 | 1871 | def install_rot13_content_filter(self, pattern): | ||
110 | 1872 | original_registry = filters._reset_registry() | ||
111 | 1873 | def restore_registry(): | ||
112 | 1874 | filters._reset_registry(original_registry) | ||
113 | 1875 | self.addCleanup(restore_registry) | ||
114 | 1876 | def rot13(chunks, context=None): | ||
115 | 1877 | return [''.join(chunks).encode('rot13')] | ||
116 | 1878 | rot13filter = filters.ContentFilter(rot13, rot13) | ||
117 | 1879 | filters.register_filter_stack_map('rot13', {'yes': [rot13filter]}.get) | ||
118 | 1880 | os.mkdir(self.test_home_dir + '/.bazaar') | ||
119 | 1881 | rules_filename = self.test_home_dir + '/.bazaar/rules' | ||
120 | 1882 | f = open(rules_filename, 'wb') | ||
121 | 1883 | f.write('[name %s]\nrot13=yes\n' % (pattern,)) | ||
122 | 1884 | f.close() | ||
123 | 1885 | def uninstall_rules(): | ||
124 | 1886 | os.remove(rules_filename) | ||
125 | 1887 | rules.reset_rules() | ||
126 | 1888 | self.addCleanup(uninstall_rules) | ||
127 | 1889 | rules.reset_rules() | ||
128 | 1890 | |||
129 | 1891 | def test_build_tree_content_filtered_files_are_not_hardlinked(self): | ||
130 | 1892 | """build_tree will not hardlink files that have content filtering rules | ||
131 | 1893 | applied to them (but will still hardlink other files from the same tree | ||
132 | 1894 | if it can). | ||
133 | 1895 | """ | ||
134 | 1896 | self.requireFeature(HardlinkFeature) | ||
135 | 1897 | self.install_rot13_content_filter('file1') | ||
136 | 1898 | source = self.create_ab_tree() | ||
137 | 1899 | target = self.make_branch_and_tree('target') | ||
138 | 1900 | revision_tree = source.basis_tree() | ||
139 | 1901 | revision_tree.lock_read() | ||
140 | 1902 | self.addCleanup(revision_tree.unlock) | ||
141 | 1903 | build_tree(revision_tree, target, source, hardlink=True) | ||
142 | 1904 | target.lock_read() | ||
143 | 1905 | self.addCleanup(target.unlock) | ||
144 | 1906 | self.assertEqual([], list(target.iter_changes(revision_tree))) | ||
145 | 1907 | source_stat = os.stat('source/file1') | ||
146 | 1908 | target_stat = os.stat('target/file1') | ||
147 | 1909 | self.assertNotEqual(source_stat, target_stat) | ||
148 | 1910 | source_stat = os.stat('source/file2') | ||
149 | 1911 | target_stat = os.stat('target/file2') | ||
150 | 1912 | self.assertEqualStat(source_stat, target_stat) | ||
151 | 1913 | |||
152 | 1871 | def test_case_insensitive_build_tree_inventory(self): | 1914 | def test_case_insensitive_build_tree_inventory(self): |
153 | 1872 | if (tests.CaseInsensitiveFilesystemFeature.available() | 1915 | if (tests.CaseInsensitiveFilesystemFeature.available() |
154 | 1873 | or tests.CaseInsCasePresFilenameFeature.available()): | 1916 | or tests.CaseInsCasePresFilenameFeature.available()): |
155 | 1874 | 1917 | ||
156 | === modified file 'bzrlib/transform.py' | |||
157 | --- bzrlib/transform.py 2009-12-03 02:24:54 +0000 | |||
158 | +++ bzrlib/transform.py 2009-12-04 06:17:11 +0000 | |||
159 | @@ -2300,8 +2300,12 @@ | |||
160 | 2300 | new_desired_files = desired_files | 2300 | new_desired_files = desired_files |
161 | 2301 | else: | 2301 | else: |
162 | 2302 | iter = accelerator_tree.iter_changes(tree, include_unchanged=True) | 2302 | iter = accelerator_tree.iter_changes(tree, include_unchanged=True) |
165 | 2303 | unchanged = dict((f, p[1]) for (f, p, c, v, d, n, k, e) | 2303 | unchanged = [(f, p[1]) for (f, p, c, v, d, n, k, e) |
166 | 2304 | in iter if not (c or e[0] != e[1])) | 2304 | in iter if not (c or e[0] != e[1])] |
167 | 2305 | if accelerator_tree.supports_content_filtering(): | ||
168 | 2306 | unchanged = [(f, p) for (f, p) in unchanged | ||
169 | 2307 | if not accelerator_tree.iter_search_rules([p]).next()] | ||
170 | 2308 | unchanged = dict(unchanged) | ||
171 | 2305 | new_desired_files = [] | 2309 | new_desired_files = [] |
172 | 2306 | count = 0 | 2310 | count = 0 |
173 | 2307 | for file_id, (trans_id, tree_path) in desired_files: | 2311 | for file_id, (trans_id, tree_path) in desired_files: |
174 | 2308 | 2312 | ||
175 | === modified file 'bzrlib/workingtree_4.py' | |||
176 | --- bzrlib/workingtree_4.py 2009-11-14 11:06:51 +0000 | |||
177 | +++ bzrlib/workingtree_4.py 2009-12-04 06:17:11 +0000 | |||
178 | @@ -1447,21 +1447,10 @@ | |||
179 | 1447 | if basis_root_id is not None: | 1447 | if basis_root_id is not None: |
180 | 1448 | wt._set_root_id(basis_root_id) | 1448 | wt._set_root_id(basis_root_id) |
181 | 1449 | wt.flush() | 1449 | wt.flush() |
182 | 1450 | # If content filtering is supported, do not use the accelerator | ||
183 | 1451 | # tree - the cost of transforming the content both ways and | ||
184 | 1452 | # checking for changed content can outweight the gains it gives. | ||
185 | 1453 | # Note: do NOT move this logic up higher - using the basis from | ||
186 | 1454 | # the accelerator tree is still desirable because that can save | ||
187 | 1455 | # a minute or more of processing on large trees! | ||
188 | 1456 | # The original tree may not have the same content filters | ||
189 | 1457 | # applied so we can't safely build the inventory delta from | ||
190 | 1458 | # the source tree. | ||
191 | 1459 | if wt.supports_content_filtering(): | 1450 | if wt.supports_content_filtering(): |
197 | 1460 | if hardlink: | 1451 | # The original tree may not have the same content filters |
198 | 1461 | # see https://bugs.edge.launchpad.net/bzr/+bug/408193 | 1452 | # applied so we can't safely build the inventory delta from |
199 | 1462 | trace.warning("hardlinking working copy files is not currently " | 1453 | # the source tree. |
195 | 1463 | "supported in %r" % (wt,)) | ||
196 | 1464 | accelerator_tree = None | ||
200 | 1465 | delta_from_tree = False | 1454 | delta_from_tree = False |
201 | 1466 | else: | 1455 | else: |
202 | 1467 | delta_from_tree = True | 1456 | delta_from_tree = True |
This tries to fix bug 408193, --hardlink not working on recent tree formats. I'm not certain this is a correct fix, as I haven't looked at content filtering code before, so I'd love some careful feedback on this approach.
This patch doesn't seem to break any tests, and it doesn't seem to change performance for the worse. For formats that don't support content filtering, nothing changes. For formats that do, this (assuming I've written this correctly) looks at each file to see if any filtering rules apply to it and if so skips hardlinking of it.
In my testing, if you have no rules configured on a 2a tree, on bzr trees it gives the same performance and hardlinks as --hardlink of a 1.9 format tree. (Both cases with 2a branch and repo, just varying tree format.)
In practice, for me, this makes branching 2a format bzr.dev trees speed up from 3s to 1s. Plus of course uses less disk.
This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?