Merge lp:~jameinel/bzr/2.1-peak-mem-tweak into lp:bzr
Proposed by: John A Meinel
Status: Merged
Approved by: Ian Clatworthy
Approved revision: no longer in the source branch
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1-peak-mem-tweak
Merge into: lp:bzr
Diff against target: 279 lines (9 files modified)
  bzrlib/btree_index.py (+13/-0)
  bzrlib/groupcompress.py (+5/-0)
  bzrlib/index.py (+20/-0)
  bzrlib/repofmt/groupcompress_repo.py (+5/-0)
  bzrlib/tests/per_versionedfile.py (+4/-0)
  bzrlib/tests/test_btree_index.py (+27/-0)
  bzrlib/tests/test_groupcompress.py (+13/-1)
  bzrlib/tests/test_index.py (+35/-0)
  bzrlib/versionedfile.py (+7/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1-peak-mem-tweak
Related bugs: none
Reviewer: Ian Clatworthy (Approve)
Review via email: mp+13582@code.launchpad.net
Ian Clatworthy (ian-clatworthy) wrote:
In terms of code quality, this looks good. You've commented out an assert on the internal_node_cache size after calling clear_cache(), though. Maybe you want to grab the size beforehand and check it's unchanged afterwards? Or maybe that line can go altogether.
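For illustration, a sketch of that check (hypothetical test code, not part of the submitted patch, reusing the attribute names from the diff) could look like:

    # Capture the internal node cache size before clearing, then assert that
    # clear_cache() leaves it untouched while emptying the leaf node cache.
    num_internal_nodes = len(index._internal_node_cache)
    index.clear_cache()
    self.assertEqual(num_internal_nodes, len(index._internal_node_cache))
    self.assertEqual(0, len(index._leaf_node_cache))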
I'll run some usertest benchmarks in the coming days once you're ready for that. As we measure more, we may tweak things further, I guess. Even so, I can't see any faults in this patch, and it's a step forward resource-usage-wise, so I'm happy to see this land.
Review: Approve
Preview Diff
=== modified file 'bzrlib/btree_index.py'
--- bzrlib/btree_index.py	2009-10-15 18:18:44 +0000
+++ bzrlib/btree_index.py	2009-10-20 04:27:13 +0000
@@ -853,6 +853,19 @@
             new_tips = next_tips
         return final_offsets

+    def clear_cache(self):
+        """Clear out any cached/memoized values.
+
+        This can be called at any time, but generally it is used when we have
+        extracted some information, but don't expect to be requesting any more
+        from this index.
+        """
+        # Note that we don't touch self._root_node or self._internal_node_cache
+        # We don't expect either of those to be big, and it can save
+        # round-trips in the future. We may re-evaluate this if InternalNode
+        # memory starts to be an issue.
+        self._leaf_node_cache.clear()
+
     def external_references(self, ref_list_num):
         if self._root_node is None:
             self._get_root_node()
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py	2009-10-17 04:43:14 +0000
+++ bzrlib/groupcompress.py	2009-10-20 04:27:13 +0000
@@ -1265,6 +1265,11 @@
         else:
             return self.get_record_stream(keys, 'unordered', True)

+    def clear_cache(self):
+        """See VersionedFiles.clear_cache()"""
+        self._group_cache.clear()
+        self._index._graph_index.clear_cache()
+
     def _check_add(self, key, lines, random_id, check_content):
         """check that version_id and lines are safe to add."""
         version_id = key[-1]
=== modified file 'bzrlib/index.py'
--- bzrlib/index.py	2009-10-14 13:54:09 +0000
+++ bzrlib/index.py	2009-10-20 04:27:13 +0000
@@ -232,6 +232,13 @@
         if self._nodes_by_key is not None and self._key_length > 1:
             self._update_nodes_by_key(key, value, node_refs)

+    def clear_cache(self):
+        """See GraphIndex.clear_cache()
+
+        This is a no-op, but we need the api to conform to a generic 'Index'
+        abstraction.
+        """
+
     def finish(self):
         lines = [_SIGNATURE]
         lines.append(_OPTION_NODE_REFS + str(self.reference_lists) + '\n')
@@ -461,6 +468,14 @@
             # there must be one line - the empty trailer line.
             raise errors.BadIndexData(self)

+    def clear_cache(self):
+        """Clear out any cached/memoized values.
+
+        This can be called at any time, but generally it is used when we have
+        extracted some information, but don't expect to be requesting any more
+        from this index.
+        """
+
     def external_references(self, ref_list_num):
         """Return references that are not present in this index.
         """
@@ -1226,6 +1241,11 @@
             self.__class__.__name__,
             ', '.join(map(repr, self._indices)))

+    def clear_cache(self):
+        """See GraphIndex.clear_cache()"""
+        for index in self._indices:
+            index.clear_cache()
+
     def get_parent_map(self, keys):
         """See graph.StackedParentsProvider.get_parent_map"""
         search_keys = set(keys)
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py	2009-09-25 21:24:21 +0000
+++ bzrlib/repofmt/groupcompress_repo.py	2009-10-20 04:27:13 +0000
@@ -1105,7 +1105,10 @@
         for stream_info in self._fetch_revision_texts(revision_ids):
             yield stream_info
         self._revision_keys = [(rev_id,) for rev_id in revision_ids]
+        self.from_repository.revisions.clear_cache()
+        self.from_repository.signatures.clear_cache()
         yield self._get_inventory_stream(self._revision_keys)
+        self.from_repository.inventories.clear_cache()
         # TODO: The keys to exclude might be part of the search recipe
         # For now, exclude all parents that are at the edge of ancestry, for
         # which we have inventories
@@ -1114,7 +1117,9 @@
             self._revision_keys)
         for stream_info in self._get_filtered_chk_streams(parent_keys):
             yield stream_info
+        self.from_repository.chk_bytes.clear_cache()
         yield self._get_text_stream()
+        self.from_repository.texts.clear_cache()

     def get_stream_for_missing_keys(self, missing_keys):
         # missing keys can only occur when we are byte copying and not
=== modified file 'bzrlib/tests/per_versionedfile.py'
--- bzrlib/tests/per_versionedfile.py	2009-08-26 16:44:27 +0000
+++ bzrlib/tests/per_versionedfile.py	2009-10-20 04:27:13 +0000
@@ -1581,6 +1581,10 @@
         # All texts should be output.
         self.assertEqual(set(keys), seen)

+    def test_clear_cache(self):
+        files = self.get_versionedfiles()
+        files.clear_cache()
+
     def test_construct(self):
         """Each parameterised test can be constructed on a transport."""
         files = self.get_versionedfiles()
=== modified file 'bzrlib/tests/test_btree_index.py'
--- bzrlib/tests/test_btree_index.py	2009-10-09 15:02:19 +0000
+++ bzrlib/tests/test_btree_index.py	2009-10-20 04:27:13 +0000
@@ -124,6 +124,12 @@

 class TestBTreeBuilder(BTreeTestCase):

+    def test_clear_cache(self):
+        builder = btree_index.BTreeBuilder(reference_lists=0, key_elements=1)
+        # This is a no-op, but we need the api to be consistent with other
+        # BTreeGraphIndex apis.
+        builder.clear_cache()
+
     def test_empty_1_0(self):
         builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
         # NamedTemporaryFile dies on builder.finish().read(). weird.
@@ -639,6 +645,27 @@
         size = trans.put_file('index', stream)
         return btree_index.BTreeGraphIndex(trans, 'index', size)

+    def test_clear_cache(self):
+        nodes = self.make_nodes(160, 2, 2)
+        index = self.make_index(ref_lists=2, key_elements=2, nodes=nodes)
+        self.assertEqual(1, len(list(index.iter_entries([nodes[30][0]]))))
+        self.assertEqual([1, 4], index._row_lengths)
+        self.assertIsNot(None, index._root_node)
+        internal_node_pre_clear = index._internal_node_cache.keys()
+        self.assertTrue(len(index._leaf_node_cache) > 0)
+        index.clear_cache()
+        # We don't touch _root_node or _internal_node_cache, both should be
+        # small, and can save a round trip or two
+        self.assertIsNot(None, index._root_node)
+        # NOTE: We don't want to affect the _internal_node_cache, as we expect
+        #       it will be small, and if we ever do touch this index again, it
+        #       will save round-trips. This assertion isn't very strong,
+        #       becuase without a 3-level index, we don't have any internal
+        #       nodes cached.
+        self.assertEqual(internal_node_pre_clear,
+                         index._internal_node_cache.keys())
+        self.assertEqual(0, len(index._leaf_node_cache))
+
     def test_trivial_constructor(self):
         transport = get_transport('trace+' + self.get_url(''))
         index = btree_index.BTreeGraphIndex(transport, 'index', None)
=== modified file 'bzrlib/tests/test_groupcompress.py'
--- bzrlib/tests/test_groupcompress.py	2009-10-17 04:43:14 +0000
+++ bzrlib/tests/test_groupcompress.py	2009-10-20 04:27:13 +0000
@@ -459,7 +459,8 @@
             ], block._dump())


-class TestCaseWithGroupCompressVersionedFiles(tests.TestCaseWithTransport):
+class TestCaseWithGroupCompressVersionedFiles(
+        tests.TestCaseWithMemoryTransport):

     def make_test_vf(self, create_graph, keylength=1, do_cleanup=True,
                      dir='.', inconsistency_fatal=True):
@@ -732,6 +733,17 @@
             " \('b',\) \('42 32 0 8', \(\(\),\)\) \('74 32"
             " 0 8', \(\(\('a',\),\),\)\)")

+    def test_clear_cache(self):
+        vf = self.make_source_with_b(True, 'source')
+        vf.writer.end()
+        for record in vf.get_record_stream([('a',), ('b',)], 'unordered',
+                                            True):
+            pass
+        self.assertTrue(len(vf._group_cache) > 0)
+        vf.clear_cache()
+        self.assertEqual(0, len(vf._group_cache))
+
+

 class StubGCVF(object):
     def __init__(self, canned_get_blocks=None):
=== modified file 'bzrlib/tests/test_index.py'
--- bzrlib/tests/test_index.py	2009-09-09 18:52:56 +0000
+++ bzrlib/tests/test_index.py	2009-10-20 04:27:13 +0000
@@ -173,6 +173,11 @@
             "key\x00\x00\t\x00data\n"
             "\n", contents)

+    def test_clear_cache(self):
+        builder = GraphIndexBuilder(reference_lists=2)
+        # This is a no-op, but the api should exist
+        builder.clear_cache()
+
     def test_node_references_are_byte_offsets(self):
         builder = GraphIndexBuilder(reference_lists=1)
         builder.add_node(('reference', ), 'data', ([], ))
@@ -383,6 +388,12 @@
         size = trans.put_file('index', stream)
         return GraphIndex(trans, 'index', size)

+    def test_clear_cache(self):
+        index = self.make_index()
+        # For now, we just want to make sure the api is available. As this is
+        # old code, we don't really worry if it *does* anything.
+        index.clear_cache()
+
     def test_open_bad_index_no_error(self):
         trans = self.get_transport()
         trans.put_bytes('name', "not an index\n")
@@ -1071,6 +1082,30 @@
         index.insert_index(0, index1)
         self.assertEqual([(index1, ('key', ), '')], list(index.iter_all_entries()))

+    def test_clear_cache(self):
+        log = []
+
+        class ClearCacheProxy(object):
+
+            def __init__(self, index):
+                self._index = index
+
+            def __getattr__(self, name):
+                return getattr(self._index)
+
+            def clear_cache(self):
+                log.append(self._index)
+                return self._index.clear_cache()
+
+        index = CombinedGraphIndex([])
+        index1 = self.make_index('name', 0, nodes=[(('key', ), '', ())])
+        index.insert_index(0, ClearCacheProxy(index1))
+        index2 = self.make_index('name', 0, nodes=[(('key', ), '', ())])
+        index.insert_index(1, ClearCacheProxy(index2))
+        # CombinedGraphIndex should call 'clear_cache()' on all children
+        index.clear_cache()
+        self.assertEqual(sorted([index1, index2]), sorted(log))
+
     def test_iter_all_entries_empty(self):
         index = CombinedGraphIndex([])
         self.assertEqual([], list(index.iter_all_entries()))
=== modified file 'bzrlib/versionedfile.py'
--- bzrlib/versionedfile.py	2009-08-17 22:08:21 +0000
+++ bzrlib/versionedfile.py	2009-10-20 04:27:13 +0000
@@ -930,6 +930,13 @@
     def check_not_reserved_id(version_id):
         revision.check_not_reserved_id(version_id)

+    def clear_cache(self):
+        """Clear whatever caches this VersionedFile holds.
+
+        This is generally called after an operation has been performed, when we
+        don't expect to be using this versioned file again soon.
+        """
+
     def _check_lines_not_unicode(self, lines):
         """Check that lines being added to a versioned file are not unicode."""
         for line in lines:
Description of the change

This patch probably gives the greatest peak memory reduction of the patches I've proposed so far.
The basic work is to just provide "clear_cache()" functions, and then have them called at appropriate times.
Namely, VersionedFiles.clear_cache() and BTreeGraphIndex.clear_cache() are the actual points that matter. I then updated GroupCompressStreamSource so that caches are cleared once we have streamed the content out of that VF.
(So we stream revisions, and then clear the cache, then inventories, etc.)
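In caller terms, the pattern is simply "finish streaming from a VersionedFiles object, then drop its caches". A minimal sketch of that idea follows (simplified illustration only; the real change is the groupcompress_repo.py hunk above):

    def stream_and_release(vf, keys):
        """Stream every record for keys out of vf, then drop its caches."""
        for record in vf.get_record_stream(keys, 'unordered', True):
            # In the real fetch code the record would be forwarded to the
            # sink here.
            pass
        # We don't expect to read from this VF again during the fetch, so
        # release its GroupCompressBlock cache and its index leaf-node cache.
        vf.clear_cache()

During a fetch this is applied per data kind, in order: revisions and signatures, then inventories, then chk_bytes, then texts.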
The basic observation is that all of the GCVF instances have a '50MB' LRU cache for their GroupCompressBlocks. In reality, that 50MB is often 160MB due to 'counting errors'. I haven't tracked down the details there yet, but it doesn't really matter that the cache is oversized. What specifically matters is that once we are done reading all the inventories, we aren't going back and reading them again, but we keep the cached state around.
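To make that concrete, here is a toy byte-budgeted cache (an illustration only, not bzrlib's real LRUSizeCache): it holds a reference to every block it has admitted, so the memory stays live until something explicitly calls clear(), and its budget accounting can undercount the true footprint.

    class ToySizeCache(object):
        """Toy stand-in for a byte-budgeted block cache."""

        def __init__(self, max_bytes=50 * 1024 * 1024):
            self._max_bytes = max_bytes
            self._current_bytes = 0
            self._blocks = {}

        def add(self, key, block_bytes):
            # Only the payload is counted against the budget; dict, key and
            # object overhead are not, which is one way the real footprint can
            # exceed the nominal 50MB. (Eviction when over budget is omitted.)
            self._blocks[key] = block_bytes
            self._current_bytes += len(block_bytes)

        def clear(self):
            # Dropping the references is what actually lets the memory be
            # reclaimed; until then the blocks stay live even if they are
            # never read again.
            self._blocks.clear()
            self._current_bytes = 0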
As for the effect: branching 'launchpad' locally, I don't see a change in time (8m37s both times), but I see:
760MB bzr.dev@4756
551MB clear from_repository versioned files after 'get_record_stream()'
One thing that this could affect is get_stream_for_missing_keys(), as it probably triggers the indexes to re-read some pages they might otherwise have had in cache.
I *could* change the code to only clear the GroupCompressBlock cache. However, my numbers from earlier show that as:
652MB clear _group_cache
551MB clear _group_cache + index LeafNode cache
So clearing _group_cache is 110MB saved, clearing the index cache is another 100MB saved. (Give or take 10MB on each measurement, as they were done at different times.)
Note that I also tried adding 'self.to_repo.X.clear_cache()' to the StreamSink code; however, that was:
549MB clear source and target
And 2MB wasn't worth it.
The numbers for branching bzr.dev
345MB bzr.dev@4756
227MB clear both