Merge lp:~jameinel/bzr/2.1-static-tuple-chk-map into lp:bzr
- 2.1-static-tuple-chk-map
- Merge into bzr.dev
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the source branch
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1-static-tuple-chk-map
Merge into: lp:bzr
Diff against target: 1246 lines, 10 files modified
  NEWS (+9/-8)
  bzrlib/_chk_map_py.py (+4/-3)
  bzrlib/_chk_map_pyx.pyx (+41/-18)
  bzrlib/_static_tuple_c.pxd (+5/-1)
  bzrlib/chk_map.py (+99/-20)
  bzrlib/groupcompress.py (+18/-1)
  bzrlib/inventory.py (+28/-20)
  bzrlib/repofmt/groupcompress_repo.py (+13/-4)
  bzrlib/tests/test__chk_map.py (+22/-16)
  bzrlib/tests/test_chk_map.py (+43/-70)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1-static-tuple-chk-map
Related bugs: (none)
Reviewer: Andrew Bennetts (Approve)
Review via email: mp+13740@code.launchpad.net
Commit message
Description of the change
John A Meinel (jameinel) wrote:
Matt Nordhoff (mnordhoff) wrote:
I just tried this branch (well, + r4762 of bzr.dev), on my client and
server. Pushing to the server gave a traceback:
Thu 2009-10-22 03:35:30 +0000
0.357 bzr arguments: [u'serve', u'--inet', u'--directory=/',
u'--allow-writes']
0.372 looking for plugins in /home/mnordhoff
0.372 looking for plugins in /usr/local/
0.568 looking for plugins in
/usr/lib/
0.571 encoding stdout as osutils.
2.439 bzr-svn: using Subversion 1.4.6 ()
10.883 Traceback (most recent call last):
File "/usr/local/
317, in _call_convertin
return callable(*args, **kwargs)
File "/usr/local/
727, in _inserter_thread
stream, src_format, self.tokens)
File "/usr/local/
in insert_stream
return self._locked_
File "/usr/local/
in _locked_
hint = self.target_
File "/usr/local/
in commit_write_group
result = self._commit_
File "/usr/local/
2269, in _commit_write_group
hint = self._pack_
File "/usr/local/
2097, in _commit_write_group
problems = self._check_
File
"/usr/local/
line 653, in _check_
for record in _filter_
File
"/usr/local/
line 1189, in _filter_text_keys
for record, items in interesting_
File "/usr/local/
process
for record in self._read_
File "/usr/local/
_read_all_roots
self.
File "/usr/local/
_read_nodes_
search_
File "/usr/local/
_deserialise
search_
File "/usr/local/
deserialise
' StaticTuple not %s' % (type(key),))
AssertionError: deserialise should be called with a StaticTuple not
<type 'tuple'>
A local "bzr merge" did too:
Thu 2009-10-22 03:40:07 +0000
0.035 bzr arguments: [u'merge', u'/srv/
0.047 looking for plugins in /home/mnordhoff
0.047 looking for plugins in /usr/local/
0.162 looking for plugins in
/usr/lib/
0.218 opening working tree '/usr/local/
0.288 Using fetch logic to copy between
CHKInventoryRep
John A Meinel (jameinel) wrote:
Matt Nordhoff wrote:
> I just tried this branch (well, + r4762 of bzr.dev), on my client and
> server. Pushing to the server gave a traceback:
Thanks for the heads up.
...
> line 653, in _check_
> for record in _filter_
> File
> "/usr/local/
> line 1189, in _filter_text_keys
> for record, items in interesting_
> File "/usr/local/
> process
> for record in self._read_
> File "/usr/local/
> _read_all_roots
> self._read_
> File "/usr/local/
> _read_nodes_
> search_
> File "/usr/local/
> _deserialise
> search_
> File "/usr/local/
> deserialise
> ' StaticTuple not %s' % (type(key),))
> AssertionError: deserialise should be called with a StaticTuple not
> <type 'tuple'>
I'll try to track down where the plain 'tuple' object came into play,
and also why I didn't catch it with the test suite. Admittedly I only
ran a subset, but I thought I ran "selftest -s bt.per_repo" which should
have covered this.
...
> for result in self.target.
> File "/usr/local/
> in iter_changes
> self.id_
> File "/usr/local/
> iter_changes
> self._ensure_root()
> File "/usr/local/
> _ensure_root
> self._root_node = self._get_
> File "/usr/local/
> _get_node
> search_
> File "/usr/local/
> _deserialise
> search_
> File "/usr/local/
> deserialise
> search_
> File "_chk_map_pyx.pyx", line 368, in
> _chk_map_
> TypeError: key ('sha1:
> a StaticTuple
>
> Turning off plugins did not help the latter one; I didn't try it with
> the first one.
^- This is pretty surprising, I'll certainly give it a look. Namely, it
looks like the root_key in 'basis.id_to_entry' is not a StaticTuple,
which is surprising given that the code that sets the root key has:
if type(node) is tuple:
node = StaticTuple.
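For reference, the promotion guard John quotes can be sketched in plain Python. The StaticTuple stand-in below is hypothetical (the real one is a C extension in bzrlib, which also supports interning); only the tuple-promotion logic is the point:

```python
# Hypothetical pure-Python stand-in for bzrlib's C StaticTuple.
class StaticTuple(tuple):
    @classmethod
    def from_sequence(cls, seq):
        # The real from_sequence copies any sequence into a StaticTuple.
        return cls(seq)

def ensure_static(node):
    # The guard quoted above: promote plain tuples so that later
    # `type(x) is StaticTuple` checks see the expected type.
    if type(node) is tuple:
        node = StaticTuple.from_sequence(node)
    return node

root_key = ensure_static(('sha1:deadbeef',))
assert type(root_key) is StaticTuple
```

If that guard runs on every path that sets the root key, a plain tuple should never survive to deserialise(), which is what makes the reported traceback surprising.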
Thanks for looking closely.
John
=:->
John A Meinel (jameinel) wrote:
pulling this back out since it seems to have failures, and I certainly can't trust PQM to catch them right now :)
John A Meinel (jameinel) wrote:
Unfortunately, I'm not able to reproduce either of Matt's failures. At this point, *my* best guess is that he had not recompiled the extensions yet, so it wasn't getting all of the code paths corrected. I certainly could be wrong, but I haven't found how yet.
Matt Nordhoff (mnordhoff) wrote:
John A Meinel wrote:
> Unfortunately, I'm not able to reproduce either of Matt's failures. At this point, *my* best guess is that he had not recompiled the extensions yet, so it wasn't getting all of the code paths corrected. I certainly could be wrong, but I haven't found how yet.
That's totally possible. At the time, keeping track of which changes I
was making to bzr.dev was really starting to drive me batty.
Making sure to recompile everything, I just tried to reproduce the
"merge" error and couldn't.
Pushing to a test branch (since I don't have any other revisions to
push), I can't reproduce that error, either.
...Ah, I found some real data to push. Still can't reproduce it.
Sorry for the false alarm.
John A Meinel (jameinel) wrote:
Ok, so I'm resubmitting this one again, only this time, it's even better.
In addition to the changes to CHKMap to only use StaticTuple internally, I also found that _filter_text_keys was creating a whole lot of file_key tuples and not interning the various attributes.
I also found that we create *millions* of integer objects, and most of them are redundant because of the identical 'group' start and end information. (IIRC, we create 1.4M integers at peak of parsing the chk stream, and only 300k of them are unique.)
The _filter_text_keys fix saved around 40MB peak, and interning the integers saves another 7MB.
Overall, with this patch, I'm now down to 457MB peak when branching all of Launchpad, which is very close to my 50% goal. I also know a way to save another ~10MB or so, but it requires using SimpleSet, which I'm not sure I want to do yet.
Anyway, versus bzr.dev, this patch drops me from 548MB => 457MB peak memory.
Also, I've focused a bit on 'streaming' data out of a repository (versus the insert on the other side). In that scenario, the numbers are:
583MB bzr 2.0.1
422MB bzr.dev
338MB this patch
So not quite 50% savings, but I expect it to still be fairly noticeable on Launchpad's code hosting.
Andrew Bennetts (spiv) wrote:
Looks ok, it's mostly mechanical despite the large line count. A couple of questions:
772 - self.parent_
773 + (self.parent_
That change (and the other similar ones) looks a bit odd.
Oh, is that how you're casting a StaticTuple to a tuple for string formatting? I would have thought "x.as_tuple()" would be about as fast as "(x[0],)", and it's certainly more readable. I suppose it is noticeably slower?
1115 -class TestSearchKeyFu
Why delete this TestCase?
John A Meinel (jameinel) wrote:
Andrew Bennetts wrote:
> Review: Approve
> Looks ok, it's mostly mechanical despite the large line count. A couple of questions:
>
> 772 - self.parent_
> 773 + (self.parent_
>
> That change (and the other similar ones) looks a bit odd.
The old code was formatting with:
"foo %s" % key
Which happened to work because 'key' was a tuple, but I don't think it
was actually *intentional*. I think it was written thinking that the
object was a string / simply formatted. I have the habit of always using
%(,) formatting, just in case the object I'm using happens to end up as
a tuple.
And the rest is just returning the simple sha1: string, rather than a
'tuple' (or StaticTuple).
I can certainly use '.as_tuple()' if you feel that is better. I've
always felt that the string formatting syntax *should* be
'str %s' % (arg1, arg2)
To make sure you don't pass an arg you don't quite understand. Sort of
the same idea that you should never do:
fprintf(file, str)
but always
fprintf(file, "%s", str)
the former will often work, until it breaks horribly.
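John's habit can be demonstrated directly; the failure mode only shows up once the right-hand side happens to be a multi-element tuple:

```python
# '%' treats a tuple on the right-hand side as the full argument list.
key1 = ('sha1:aaa',)            # 1-tuple: silently unpacked, "works"
key2 = ('file-id', 'rev-id')    # 2-tuple: two arguments for one %s

assert 'foo %s' % key1 == 'foo sha1:aaa'

try:
    'foo %s' % key2
    raise AssertionError('expected TypeError')
except TypeError:
    pass  # "not all arguments converted during string formatting"

# Wrapping in an explicit 1-tuple always formats the value itself:
assert 'foo %s' % (key2,) == "foo ('file-id', 'rev-id')"
```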
>
> Oh, is that how you're casting a StaticTuple to a tuple for string formatting? I would have thought "x.as_tuple()" would be about as fast as "(x[0],)", and it's certainly more readable. I suppose it is noticeably slower?
>
> 1115 -class TestSearchKeyFu
>
> Why delete this TestCase?
Because all of the tests are covered in "test__chk_map", which is also
permuted against the C and Python versions of _chk_map.
John
=;->
John A Meinel (jameinel) wrote:
Andrew Bennetts wrote:
> Review: Approve
> Looks ok, it's mostly mechanical despite the large line count. A couple of questions:
>
> 772 - self.parent_
> 773 + (self.parent_
>
> That change (and the other similar ones) looks a bit odd.
>
> Oh, is that how you're casting a StaticTuple to a tuple for string formatting? I would have thought "x.as_tuple()" would be about as fast as "(x[0],)", and it's certainly more readable. I suppose it is noticeably slower?
>
> 1115 -class TestSearchKeyFu
>
> Why delete this TestCase?
I'll mention, I'm not particularly comfortable landing this until we
sort how to get PQM actually failing appropriately again. I'm pretty
sure it is good, and I'll run the tests here, but it is a more-invasive
change. I suppose Babune will give me the real fallout?
John
=:->
Robert Collins (lifeless) wrote:
On Mon, 2009-10-26 at 02:00 +0000, John A Meinel wrote:
>
>
> I'll mention, I'm not particularly comfortable landing this until we
> sort how to get PQM actually failing appropriately again. I'm pretty
> sure it is good, and I'll run the tests here, but it is a
> more-invasive
> change. I suppose Babune will give me the real fallout?
Yes, it will.
-Rob
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-10-26 06:44:40 +0000 |
3 | +++ NEWS 2009-10-26 14:59:15 +0000 |
4 | @@ -46,6 +46,7 @@ |
5 | (John Arbash Meinel) |
6 | |
7 | * Peak memory under certain operations has been reduced significantly. |
8 | + (eg, 'bzr branch launchpad standalone' is cut in half) |
9 | (John Arbash Meinel) |
10 | |
11 | Documentation |
12 | @@ -74,14 +75,14 @@ |
13 | (John Arbash Meinel) |
14 | |
15 | * ``bzrlib._static_tuple_c.StaticTuple`` is now available and used by |
16 | - the btree index parser. This class functions similarly to ``tuple`` |
17 | - objects. However, it can only point to a limited collection of types. |
18 | - (Currently StaticTuple, str, unicode, None, bool, int, long, float, and |
19 | - not subclasses). This allows us to remove it from the garbage collector |
20 | - (it cannot be in a cycle), it also allows us to intern the objects. In |
21 | - testing, this can reduce peak memory by 20-40%, and significantly |
22 | - improve performance by removing objects from being inspected by the |
23 | - garbage collector. (John Arbash Meinel) |
24 | + the btree index parser and the chk map parser. This class functions |
25 | + similarly to ``tuple`` objects. However, it can only point to a limited |
26 | + collection of types. (Currently StaticTuple, str, unicode, None, bool, |
27 | + int, long, float, but not subclasses). This allows us to remove it from |
28 | + the garbage collector (it cannot be in a cycle), it also allows us to |
29 | + intern the objects. In testing, this can reduce peak memory by 20-40%, |
30 | + and significantly improve performance by removing objects from being |
31 | + inspected by the garbage collector. (John Arbash Meinel) |
32 | |
33 | * ``GroupCompressBlock._ensure_content()`` will now release the |
34 | ``zlib.decompressobj()`` when the first request is for all of the |
35 | |
36 | === modified file 'bzrlib/_chk_map_py.py' |
37 | --- bzrlib/_chk_map_py.py 2009-10-08 04:35:01 +0000 |
38 | +++ bzrlib/_chk_map_py.py 2009-10-26 14:59:15 +0000 |
39 | @@ -19,6 +19,8 @@ |
40 | import zlib |
41 | import struct |
42 | |
43 | +from bzrlib.static_tuple import StaticTuple |
44 | + |
45 | _LeafNode = None |
46 | _InternalNode = None |
47 | _unknown = None |
48 | @@ -93,7 +95,7 @@ |
49 | value_lines = lines[pos:pos+num_value_lines] |
50 | pos += num_value_lines |
51 | value = '\n'.join(value_lines) |
52 | - items[tuple(elements[:-1])] = value |
53 | + items[StaticTuple.from_sequence(elements[:-1])] = value |
54 | if len(items) != length: |
55 | raise AssertionError("item count (%d) mismatch for key %s," |
56 | " bytes %r" % (length, key, bytes)) |
57 | @@ -141,7 +143,7 @@ |
58 | for line in lines[5:]: |
59 | line = common_prefix + line |
60 | prefix, flat_key = line.rsplit('\x00', 1) |
61 | - items[prefix] = (flat_key,) |
62 | + items[prefix] = StaticTuple(flat_key,) |
63 | if len(items) == 0: |
64 | raise AssertionError("We didn't find any item for %s" % key) |
65 | result._items = items |
66 | @@ -155,4 +157,3 @@ |
67 | result._node_width = len(prefix) |
68 | result._search_prefix = common_prefix |
69 | return result |
70 | - |
71 | |
72 | === modified file 'bzrlib/_chk_map_pyx.pyx' |
73 | --- bzrlib/_chk_map_pyx.pyx 2009-10-08 04:35:01 +0000 |
74 | +++ bzrlib/_chk_map_pyx.pyx 2009-10-26 14:59:15 +0000 |
75 | @@ -29,9 +29,8 @@ |
76 | |
77 | cdef extern from "Python.h": |
78 | ctypedef int Py_ssize_t # Required for older pyrex versions |
79 | - struct _PyObject: |
80 | + ctypedef struct PyObject: |
81 | pass |
82 | - ctypedef _PyObject PyObject |
83 | int PyTuple_CheckExact(object p) |
84 | Py_ssize_t PyTuple_GET_SIZE(object t) |
85 | int PyString_CheckExact(object) |
86 | @@ -52,6 +51,18 @@ |
87 | char *PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *s) |
88 | object PyString_FromStringAndSize(char*, Py_ssize_t) |
89 | |
90 | +# cimport all of the definitions we will need to access |
91 | +from _static_tuple_c cimport StaticTuple,\ |
92 | + import_static_tuple_c, StaticTuple_New, \ |
93 | + StaticTuple_Intern, StaticTuple_SET_ITEM, StaticTuple_CheckExact |
94 | + |
95 | +cdef extern from "_static_tuple_c.h": |
96 | + # Defined explicitly rather than cimport-ing. Trying to use cimport, the |
97 | + # type for PyObject is a different class that happens to have the same |
98 | + # name... |
99 | + PyObject * StaticTuple_GET_ITEM_ptr "StaticTuple_GET_ITEM" (StaticTuple, |
100 | + Py_ssize_t) |
101 | + |
102 | cdef extern from "zlib.h": |
103 | ctypedef unsigned long uLong |
104 | ctypedef unsigned int uInt |
105 | @@ -60,8 +71,14 @@ |
106 | uLong crc32(uLong crc, Bytef *buf, uInt len) |
107 | |
108 | |
109 | +# Set up the StaticTuple C_API functionality |
110 | +import_static_tuple_c() |
111 | + |
112 | +cdef object _LeafNode |
113 | _LeafNode = None |
114 | +cdef object _InternalNode |
115 | _InternalNode = None |
116 | +cdef object _unknown |
117 | _unknown = None |
118 | |
119 | # We shouldn't just copy this from _dirstate_helpers_pyx |
120 | @@ -91,9 +108,9 @@ |
121 | cdef char *c_out |
122 | cdef PyObject *bit |
123 | |
124 | - if not PyTuple_CheckExact(key): |
125 | - raise TypeError('key %r is not a tuple' % (key,)) |
126 | - num_bits = PyTuple_GET_SIZE(key) |
127 | + if not StaticTuple_CheckExact(key): |
128 | + raise TypeError('key %r is not a StaticTuple' % (key,)) |
129 | + num_bits = len(key) |
130 | # 4 bytes per crc32, and another 1 byte between bits |
131 | num_out_bytes = (9 * num_bits) - 1 |
132 | out = PyString_FromStringAndSize(NULL, num_out_bytes) |
133 | @@ -105,7 +122,7 @@ |
134 | # We use the _ptr variant, because GET_ITEM returns a borrowed |
135 | # reference, and Pyrex assumes that returned 'object' are a new |
136 | # reference |
137 | - bit = PyTuple_GET_ITEM_ptr(key, i) |
138 | + bit = StaticTuple_GET_ITEM_ptr(key, i) |
139 | if not PyString_CheckExact_ptr(bit): |
140 | raise TypeError('Bit %d of %r is not a string' % (i, key)) |
141 | c_bit = <Bytef *>PyString_AS_STRING_ptr(bit) |
142 | @@ -129,9 +146,9 @@ |
143 | cdef char *c_out |
144 | cdef PyObject *bit |
145 | |
146 | - if not PyTuple_CheckExact(key): |
147 | - raise TypeError('key %r is not a tuple' % (key,)) |
148 | - num_bits = PyTuple_GET_SIZE(key) |
149 | + if not StaticTuple_CheckExact(key): |
150 | + raise TypeError('key %r is not a StaticTuple' % (key,)) |
151 | + num_bits = len(key) |
152 | # 4 bytes per crc32, and another 1 byte between bits |
153 | num_out_bytes = (5 * num_bits) - 1 |
154 | out = PyString_FromStringAndSize(NULL, num_out_bytes) |
155 | @@ -140,10 +157,10 @@ |
156 | if i > 0: |
157 | c_out[0] = c'\x00' |
158 | c_out = c_out + 1 |
159 | - bit = PyTuple_GET_ITEM_ptr(key, i) |
160 | + bit = StaticTuple_GET_ITEM_ptr(key, i) |
161 | if not PyString_CheckExact_ptr(bit): |
162 | - raise TypeError('Bit %d of %r is not a string: %r' % (i, key, |
163 | - <object>bit)) |
164 | + raise TypeError('Bit %d of %r is not a string: %r' |
165 | + % (i, key, <object>bit)) |
166 | c_bit = <Bytef *>PyString_AS_STRING_ptr(bit) |
167 | c_len = PyString_GET_SIZE_ptr(bit) |
168 | crc_val = crc32(0, c_bit, c_len) |
169 | @@ -195,6 +212,7 @@ |
170 | cdef char *prefix, *value_start, *prefix_tail |
171 | cdef char *next_null, *last_null, *line_start |
172 | cdef char *c_entry, *entry_start |
173 | + cdef StaticTuple entry_bits |
174 | |
175 | if _LeafNode is None: |
176 | from bzrlib import chk_map |
177 | @@ -265,12 +283,14 @@ |
178 | if next_line == NULL: |
179 | raise ValueError('missing trailing newline') |
180 | cur = next_line + 1 |
181 | - entry_bits = PyTuple_New(width) |
182 | + entry_bits = StaticTuple_New(width) |
183 | for i from 0 <= i < num_prefix_bits: |
184 | + # TODO: Use PyList_GetItem, or turn prefix_bits into a |
185 | + # tuple/StaticTuple |
186 | entry = prefix_bits[i] |
187 | # SET_ITEM 'steals' a reference |
188 | Py_INCREF(entry) |
189 | - PyTuple_SET_ITEM(entry_bits, i, entry) |
190 | + StaticTuple_SET_ITEM(entry_bits, i, entry) |
191 | value = PyString_FromStringAndSize(value_start, next_line - value_start) |
192 | # The next entry bit needs the 'tail' from the prefix, and first part |
193 | # of the line |
194 | @@ -288,7 +308,7 @@ |
195 | memcpy(c_entry + prefix_tail_len, line_start, next_null - line_start) |
196 | Py_INCREF(entry) |
197 | i = num_prefix_bits |
198 | - PyTuple_SET_ITEM(entry_bits, i, entry) |
199 | + StaticTuple_SET_ITEM(entry_bits, i, entry) |
200 | while next_null != last_null: # We have remaining bits |
201 | i = i + 1 |
202 | if i > width: |
203 | @@ -301,11 +321,12 @@ |
204 | entry = PyString_FromStringAndSize(entry_start, |
205 | next_null - entry_start) |
206 | Py_INCREF(entry) |
207 | - PyTuple_SET_ITEM(entry_bits, i, entry) |
208 | + StaticTuple_SET_ITEM(entry_bits, i, entry) |
209 | if len(entry_bits) != width: |
210 | raise AssertionError( |
211 | 'Incorrect number of elements (%d vs %d)' |
212 | % (len(entry_bits)+1, width + 1)) |
213 | + entry_bits = StaticTuple_Intern(entry_bits) |
214 | PyDict_SetItem(items, entry_bits, value) |
215 | if len(items) != length: |
216 | raise ValueError("item count (%d) mismatch for key %s," |
217 | @@ -343,6 +364,8 @@ |
218 | _unknown = chk_map._unknown |
219 | result = _InternalNode(search_key_func=search_key_func) |
220 | |
221 | + if not StaticTuple_CheckExact(key): |
222 | + raise TypeError('key %r is not a StaticTuple' % (key,)) |
223 | if not PyString_CheckExact(bytes): |
224 | raise TypeError('bytes must be a plain string not %s' % (type(bytes),)) |
225 | |
226 | @@ -384,7 +407,8 @@ |
227 | memcpy(c_item_prefix + prefix_length, cur, next_null - cur) |
228 | flat_key = PyString_FromStringAndSize(next_null + 1, |
229 | next_line - next_null - 1) |
230 | - PyDict_SetItem(items, item_prefix, (flat_key,)) |
231 | + flat_key = StaticTuple(flat_key).intern() |
232 | + PyDict_SetItem(items, item_prefix, flat_key) |
233 | cur = next_line + 1 |
234 | assert len(items) > 0 |
235 | result._items = items |
236 | @@ -398,4 +422,3 @@ |
237 | result._node_width = len(item_prefix) |
238 | result._search_prefix = PyString_FromStringAndSize(prefix, prefix_length) |
239 | return result |
240 | - |
241 | |
242 | === modified file 'bzrlib/_static_tuple_c.pxd' |
243 | --- bzrlib/_static_tuple_c.pxd 2009-10-07 15:57:25 +0000 |
244 | +++ bzrlib/_static_tuple_c.pxd 2009-10-26 14:59:15 +0000 |
245 | @@ -36,5 +36,9 @@ |
246 | |
247 | # Steals a reference and val must be a valid type, no checking is done |
248 | void StaticTuple_SET_ITEM(StaticTuple key, Py_ssize_t offset, object val) |
249 | - object StaticTuple_GET_ITEM(StaticTuple key, Py_ssize_t offset) |
250 | + # We would normally use PyObject * here. However it seems that cython/pyrex |
251 | + # treat the PyObject defined in this header as something different than one |
252 | + # defined in a .pyx file. And since we don't INCREF, we need a raw pointer, |
253 | + # not an 'object' return value. |
254 | + void *StaticTuple_GET_ITEM(StaticTuple key, Py_ssize_t offset) |
255 | int StaticTuple_CheckExact(object) |
256 | |
257 | === modified file 'bzrlib/chk_map.py' |
258 | --- bzrlib/chk_map.py 2009-10-20 20:30:21 +0000 |
259 | +++ bzrlib/chk_map.py 2009-10-26 14:59:15 +0000 |
260 | @@ -52,6 +52,7 @@ |
261 | registry, |
262 | trace, |
263 | ) |
264 | +from bzrlib.static_tuple import StaticTuple |
265 | |
266 | # approx 4MB |
267 | # If each line is 50 bytes, and you have 255 internal pages, with 255-way fan |
268 | @@ -114,8 +115,9 @@ |
269 | """ |
270 | delete_count = 0 |
271 | # Check preconditions first. |
272 | - new_items = set([key for (old, key, value) in delta if key is not None |
273 | - and old is None]) |
274 | + as_st = StaticTuple.from_sequence |
275 | + new_items = set([as_st(key) for (old, key, value) in delta |
276 | + if key is not None and old is None]) |
277 | existing_new = list(self.iteritems(key_filter=new_items)) |
278 | if existing_new: |
279 | raise errors.InconsistentDeltaDelta(delta, |
280 | @@ -135,7 +137,7 @@ |
281 | |
282 | def _ensure_root(self): |
283 | """Ensure that the root node is an object not a key.""" |
284 | - if type(self._root_node) is tuple: |
285 | + if type(self._root_node) is StaticTuple: |
286 | # Demand-load the root |
287 | self._root_node = self._get_node(self._root_node) |
288 | |
289 | @@ -149,7 +151,7 @@ |
290 | :param node: A tuple key or node object. |
291 | :return: A node object. |
292 | """ |
293 | - if type(node) is tuple: |
294 | + if type(node) is StaticTuple: |
295 | bytes = self._read_bytes(node) |
296 | return _deserialise(bytes, node, |
297 | search_key_func=self._search_key_func) |
298 | @@ -196,7 +198,7 @@ |
299 | for key, value in sorted(node._items.iteritems()): |
300 | # Don't use prefix nor indent here to line up when used in |
301 | # tests in conjunction with assertEqualDiff |
302 | - result.append(' %r %r' % (key, value)) |
303 | + result.append(' %r %r' % (tuple(key), value)) |
304 | return result |
305 | |
306 | @classmethod |
307 | @@ -220,6 +222,9 @@ |
308 | root_key = klass._create_directly(store, initial_value, |
309 | maximum_size=maximum_size, key_width=key_width, |
310 | search_key_func=search_key_func) |
311 | + if type(root_key) is not StaticTuple: |
312 | + raise AssertionError('we got a %s instead of a StaticTuple' |
313 | + % (type(root_key),)) |
314 | return root_key |
315 | |
316 | @classmethod |
317 | @@ -240,9 +245,11 @@ |
318 | node = LeafNode(search_key_func=search_key_func) |
319 | node.set_maximum_size(maximum_size) |
320 | node._key_width = key_width |
321 | - node._items = dict(initial_value) |
322 | + as_st = StaticTuple.from_sequence |
323 | + node._items = dict([(as_st(key), val) for key, val |
324 | + in initial_value.iteritems()]) |
325 | node._raw_size = sum([node._key_value_len(key, value) |
326 | - for key,value in initial_value.iteritems()]) |
327 | + for key,value in node._items.iteritems()]) |
328 | node._len = len(node._items) |
329 | node._compute_search_prefix() |
330 | node._compute_serialised_prefix() |
331 | @@ -484,11 +491,14 @@ |
332 | def iteritems(self, key_filter=None): |
333 | """Iterate over the entire CHKMap's contents.""" |
334 | self._ensure_root() |
335 | + if key_filter is not None: |
336 | + as_st = StaticTuple.from_sequence |
337 | + key_filter = [as_st(key) for key in key_filter] |
338 | return self._root_node.iteritems(self._store, key_filter=key_filter) |
339 | |
340 | def key(self): |
341 | """Return the key for this map.""" |
342 | - if type(self._root_node) is tuple: |
343 | + if type(self._root_node) is StaticTuple: |
344 | return self._root_node |
345 | else: |
346 | return self._root_node._key |
347 | @@ -503,6 +513,7 @@ |
348 | :param key: A key to map. |
349 | :param value: The value to assign to key. |
350 | """ |
351 | + key = StaticTuple.from_sequence(key) |
352 | # Need a root object. |
353 | self._ensure_root() |
354 | prefix, node_details = self._root_node.map(self._store, key, value) |
355 | @@ -519,12 +530,15 @@ |
356 | def _node_key(self, node): |
357 | """Get the key for a node whether it's a tuple or node.""" |
358 | if type(node) is tuple: |
359 | + node = StaticTuple.from_sequence(node) |
360 | + if type(node) is StaticTuple: |
361 | return node |
362 | else: |
363 | return node._key |
364 | |
365 | def unmap(self, key, check_remap=True): |
366 | """remove key from the map.""" |
367 | + key = StaticTuple.from_sequence(key) |
368 | self._ensure_root() |
369 | if type(self._root_node) is InternalNode: |
370 | unmapped = self._root_node.unmap(self._store, key, |
371 | @@ -544,7 +558,7 @@ |
372 | |
373 | :return: The key of the root node. |
374 | """ |
375 | - if type(self._root_node) is tuple: |
376 | + if type(self._root_node) is StaticTuple: |
377 | # Already saved. |
378 | return self._root_node |
379 | keys = list(self._root_node.serialise(self._store)) |
380 | @@ -881,7 +895,7 @@ |
381 | lines.append(serialized[prefix_len:]) |
382 | lines.extend(value_lines) |
383 | sha1, _, _ = store.add_lines((None,), (), lines) |
384 | - self._key = ("sha1:" + sha1,) |
385 | + self._key = StaticTuple("sha1:" + sha1,).intern() |
386 | bytes = ''.join(lines) |
387 | if len(bytes) != self._current_size(): |
388 | raise AssertionError('Invalid _current_size') |
389 | @@ -1004,6 +1018,9 @@ |
390 | :param key: The key that the serialised node has. |
391 | :return: An InternalNode instance. |
392 | """ |
393 | + if type(key) is not StaticTuple: |
394 | + raise AssertionError('deserialise should be called with a' |
395 | + ' StaticTuple not %s' % (type(key),)) |
396 | return _deserialise_internal_node(bytes, key, |
397 | search_key_func=search_key_func) |
398 | |
399 | @@ -1034,7 +1051,7 @@ |
400 | # for whatever we are missing |
401 | shortcut = True |
402 | for prefix, node in self._items.iteritems(): |
403 | - if node.__class__ is tuple: |
404 | + if node.__class__ is StaticTuple: |
405 | keys[node] = (prefix, None) |
406 | else: |
407 | yield node, None |
408 | @@ -1069,7 +1086,7 @@ |
409 | # A given key can only match 1 child node, if it isn't |
410 | # there, then we can just return nothing |
411 | return |
412 | - if node.__class__ is tuple: |
413 | + if node.__class__ is StaticTuple: |
414 | keys[node] = (search_prefix, [key]) |
415 | else: |
416 | # This is loaded, and the only thing that can match, |
417 | @@ -1102,7 +1119,7 @@ |
418 | # We can ignore this one |
419 | continue |
420 | node_key_filter = prefix_to_keys[search_prefix] |
421 | - if node.__class__ is tuple: |
422 | + if node.__class__ is StaticTuple: |
423 | keys[node] = (search_prefix, node_key_filter) |
424 | else: |
425 | yield node, node_key_filter |
426 | @@ -1117,7 +1134,7 @@ |
427 | if sub_prefix in length_filter: |
428 | node_key_filter.extend(prefix_to_keys[sub_prefix]) |
429 | if node_key_filter: # this key matched something, yield it |
430 | - if node.__class__ is tuple: |
431 | + if node.__class__ is StaticTuple: |
432 | keys[node] = (prefix, node_key_filter) |
433 | else: |
434 | yield node, node_key_filter |
435 | @@ -1255,7 +1272,7 @@ |
436 | :return: An iterable of the keys inserted by this operation. |
437 | """ |
438 | for node in self._items.itervalues(): |
439 | - if type(node) is tuple: |
440 | + if type(node) is StaticTuple: |
441 | # Never deserialised. |
442 | continue |
443 | if node._key is not None: |
444 | @@ -1272,7 +1289,7 @@ |
445 | lines.append('%s\n' % (self._search_prefix,)) |
446 | prefix_len = len(self._search_prefix) |
447 | for prefix, node in sorted(self._items.items()): |
448 | - if type(node) is tuple: |
449 | + if type(node) is StaticTuple: |
450 | key = node[0] |
451 | else: |
452 | key = node._key[0] |
453 | @@ -1282,7 +1299,7 @@ |
454 | % (serialised, self._search_prefix)) |
455 | lines.append(serialised[prefix_len:]) |
456 | sha1, _, _ = store.add_lines((None,), (), lines) |
457 | - self._key = ("sha1:" + sha1,) |
458 | + self._key = StaticTuple("sha1:" + sha1,).intern() |
459 | _page_cache.add(self._key, ''.join(lines)) |
460 | yield self._key |
461 | |
462 | @@ -1317,7 +1334,7 @@ |
463 | raise AssertionError("unserialised nodes have no refs.") |
464 | refs = [] |
465 | for value in self._items.itervalues(): |
466 | - if type(value) is tuple: |
467 | + if type(value) is StaticTuple: |
468 | refs.append(value) |
469 | else: |
470 | refs.append(value.key()) |
471 | @@ -1437,6 +1454,12 @@ |
472 | |
473 | def __init__(self, store, new_root_keys, old_root_keys, |
474 | search_key_func, pb=None): |
475 | + # TODO: Should we add a StaticTuple barrier here? It would be nice to |
476 | + # force callers to use StaticTuple, because there will often be |
477 | + # lots of keys passed in here. And even if we cast it locally, |
478 | + # that just meanst that we will have *both* a StaticTuple and a |
479 | + # tuple() in memory, referring to the same object. (so a net |
480 | + # increase in memory, not a decrease.) |
481 | self._store = store |
482 | self._new_root_keys = new_root_keys |
483 | self._old_root_keys = old_root_keys |
484 | @@ -1444,11 +1467,16 @@ |
485 | # All uninteresting chks that we have seen. By the time they are added |
486 | # here, they should be either fully ignored, or queued up for |
487 | # processing |
488 | + # TODO: This might grow to a large size if there are lots of merge |
489 | + # parents, etc. However, it probably doesn't scale to O(history) |
490 | + # like _processed_new_refs does. |
491 | self._all_old_chks = set(self._old_root_keys) |
492 | # All items that we have seen from the old_root_keys |
493 | self._all_old_items = set() |
494 | # These are interesting items which were either read, or already in the |
495 | # interesting queue (so we don't need to walk them again) |
496 | + # TODO: processed_new_refs becomes O(all_chks), consider switching to |
497 | + # SimpleSet here. |
498 | self._processed_new_refs = set() |
499 | self._search_key_func = search_key_func |
500 | |
501 | @@ -1466,6 +1494,7 @@ |
502 | # this code. (We may want to evaluate saving the raw bytes into the |
503 | # page cache, which would allow a working tree update after the fetch |
504 | # to not have to read the bytes again.) |
505 | + as_st = StaticTuple.from_sequence |
506 | stream = self._store.get_record_stream(keys, 'unordered', True) |
507 | for record in stream: |
508 | if self._pb is not None: |
509 | @@ -1478,10 +1507,18 @@ |
510 | if type(node) is InternalNode: |
511 | # Note we don't have to do node.refs() because we know that |
512 | # there are no children that have been pushed into this node |
513 | + # Note: Using as_st() here seemed to save 1.2MB, which would |
514 | + # indicate that we keep 100k prefix_refs around while |
515 | + # processing. They *should* be shorter lived than that... |
516 | + # It does cost us ~10s of processing time |
517 | + #prefix_refs = [as_st(item) for item in node._items.iteritems()] |
518 | prefix_refs = node._items.items() |
519 | items = [] |
520 | else: |
521 | prefix_refs = [] |
522 | + # Note: We don't use a StaticTuple here. Profiling showed a |
523	|	+ # minor memory improvement (0.8MB out of 335MB peak, 0.2%)	|
524 | + # But a significant slowdown (15s / 145s, or 10%) |
525 | items = node._items.items() |
526 | yield record, node, prefix_refs, items |
527 | |
528 | @@ -1495,6 +1532,10 @@ |
529 | if p_r[1] not in all_old_chks] |
530 | new_refs = [p_r[1] for p_r in prefix_refs] |
531 | all_old_chks.update(new_refs) |
532 | + # TODO: This might be a good time to turn items into StaticTuple |
533 | + # instances and possibly intern them. However, this does not |
534 | + # impact 'initial branch' performance, so I'm not worrying |
535 | + # about this yet |
536 | self._all_old_items.update(items) |
537 | # Queue up the uninteresting references |
538 | # Don't actually put them in the 'to-read' queue until we have |
539 | @@ -1553,6 +1594,9 @@ |
540 | # current design allows for this, as callers will do the work |
541 | # to make the results unique. We might profile whether we |
542 | # gain anything by ensuring unique return values for items |
543 | + # TODO: This might be a good time to cast to StaticTuple, as |
544 | + # self._new_item_queue will hold the contents of multiple |
545 | + # records for an extended lifetime |
546 | new_items = [item for item in items |
547 | if item not in self._all_old_items] |
548 | self._new_item_queue.extend(new_items) |
549 | @@ -1583,16 +1627,31 @@ |
550 | if new_items: |
551 | yield None, new_items |
552 | refs = refs.difference(all_old_chks) |
553 | + processed_new_refs.update(refs) |
554 | while refs: |
555 | + # TODO: Using a SimpleSet for self._processed_new_refs and |
556	|	+ # TODO: Using a SimpleSet for self._processed_new_refs	|
557 | + # implementing a non-pyrex version. |
558 | next_refs = set() |
559 | next_refs_update = next_refs.update |
560 | # Inlining _read_nodes_from_store improves 'bzr branch bzr.dev' |
561 | # from 1m54s to 1m51s. Consider it. |
562 | for record, _, p_refs, items in self._read_nodes_from_store(refs): |
563 | - items = [item for item in items |
564 | - if item not in all_old_items] |
565 | + if all_old_items: |
566	|	+ # using the 'if' check saves about 4s (145s => 141s) when	|
567	|	+ # streaming the initial branch of Launchpad data.	|
568 | + items = [item for item in items |
569 | + if item not in all_old_items] |
570 | yield record, items |
571 | next_refs_update([p_r[1] for p_r in p_refs]) |
572 | + del p_refs |
573 | + # set1.difference(set/dict) walks all of set1, and checks if it |
574 | + # exists in 'other'. |
575 | + # set1.difference(iterable) walks all of iterable, and does a |
576 | + # 'difference_update' on a clone of set1. Pick wisely based on the |
577 | + # expected sizes of objects. |
578 | + # in our case it is expected that 'new_refs' will always be quite |
579 | + # small. |
580 | next_refs = next_refs.difference(all_old_chks) |
581 | next_refs = next_refs.difference(processed_new_refs) |
582 | processed_new_refs.update(next_refs) |
583 | @@ -1605,6 +1664,7 @@ |
584 | self._old_queue = [] |
585 | all_old_chks = self._all_old_chks |
586 | for record, _, prefix_refs, items in self._read_nodes_from_store(refs): |
587 | + # TODO: Use StaticTuple here? |
588 | self._all_old_items.update(items) |
589 | refs = [r for _,r in prefix_refs if r not in all_old_chks] |
590 | self._old_queue.extend(refs) |
591 | @@ -1660,3 +1720,22 @@ |
592 | ) |
593 | search_key_registry.register('hash-16-way', _search_key_16) |
594 | search_key_registry.register('hash-255-way', _search_key_255) |
595 | + |
596 | + |
597 | +def _check_key(key): |
598 | + """Helper function to assert that a key is properly formatted. |
599 | + |
600 | + This generally shouldn't be used in production code, but it can be helpful |
601 | + to debug problems. |
602 | + """ |
603 | + if type(key) is not StaticTuple: |
604 | + raise TypeError('key %r is not StaticTuple but %s' % (key, type(key))) |
605 | + if len(key) != 1: |
606 | + raise ValueError('key %r should have length 1, not %d' % (key, len(key),)) |
607 | + if type(key[0]) is not str: |
608 | + raise TypeError('key %r should hold a str, not %r' |
609 | + % (key, type(key[0]))) |
610 | + if not key[0].startswith('sha1:'): |
611 | + raise ValueError('key %r should point to a sha1:' % (key,)) |
612 | + |
613 | + |
614 | |
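The `set1.difference` comment in the chk_map.py hunk above can be checked with a small standalone snippet (plain Python, independent of bzrlib):

```python
# set.difference accepts both sets and arbitrary iterables, but the cost
# model differs: with a set/dict argument it walks self and membership-tests
# each element against the argument; with a plain iterable it copies self
# and discards each element of the iterable. The result is the same either
# way, so pick based on the expected sizes of the two collections.
new_refs = {'sha1:aa', 'sha1:bb', 'sha1:cc'}
all_old_chks = {'sha1:bb'}
processed_new_refs = {'sha1:cc'}

remaining = new_refs.difference(all_old_chks)                # set argument
remaining = remaining.difference(list(processed_new_refs))   # iterable argument
assert remaining == {'sha1:aa'}
```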
615 | === modified file 'bzrlib/groupcompress.py' |
616 | --- bzrlib/groupcompress.py 2009-10-19 15:45:10 +0000 |
617 | +++ bzrlib/groupcompress.py 2009-10-26 14:59:15 +0000 |
618 | @@ -1269,6 +1269,7 @@ |
619 | """See VersionedFiles.clear_cache()""" |
620 | self._group_cache.clear() |
621 | self._index._graph_index.clear_cache() |
622 | + self._index._int_cache.clear() |
623 | |
624 | def _check_add(self, key, lines, random_id, check_content): |
625 | """check that version_id and lines are safe to add.""" |
626 | @@ -1832,6 +1833,9 @@ |
627 | self.has_graph = parents |
628 | self._is_locked = is_locked |
629 | self._inconsistency_fatal = inconsistency_fatal |
630 | + # GroupCompress records tend to have the same 'group' start + offset |
631 | + # repeated over and over, this creates a surplus of ints |
632 | + self._int_cache = {} |
633 | if track_external_parent_refs: |
634 | self._key_dependencies = knit._KeyRefs( |
635 | track_new_keys=track_new_keys) |
636 | @@ -2013,11 +2017,24 @@ |
637 | """Convert an index value to position details.""" |
638 | bits = node[2].split(' ') |
639 | # It would be nice not to read the entire gzip. |
640 | + # start and stop are put into _int_cache because they are very common. |
641 | + # They define the 'group' that an entry is in, and many groups can have |
642 | + # thousands of objects. |
643 | + # Branching Launchpad, for example, saves ~600k integers, at 12 bytes |
644 | + # each, or about 7MB. Note that it might be even more when you consider |
645 | + # how PyInt is allocated in separate slabs. And you can't return a slab |
646 | + # to the OS if even 1 int on it is in use. Note though that Python uses |
647 | + # a LIFO when re-using PyInt slots, which probably causes more |
648 | + # fragmentation. |
649 | start = int(bits[0]) |
650 | + start = self._int_cache.setdefault(start, start) |
651 | stop = int(bits[1]) |
652 | + stop = self._int_cache.setdefault(stop, stop) |
653 | basis_end = int(bits[2]) |
654 | delta_end = int(bits[3]) |
655 | - return node[0], start, stop, basis_end, delta_end |
656 | + # We can't use StaticTuple here, because node[0] is a BTreeGraphIndex |
657 | + # instance... |
658 | + return (node[0], start, stop, basis_end, delta_end) |
659 | |
660 | def scan_unvalidated_index(self, graph_index): |
661 | """Inform this _GCGraphIndex that there is an unvalidated index. |
662 | |
663 | === modified file 'bzrlib/inventory.py' |
664 | --- bzrlib/inventory.py 2009-10-20 20:29:11 +0000 |
665 | +++ bzrlib/inventory.py 2009-10-26 14:59:15 +0000 |
666 | @@ -51,6 +51,7 @@ |
667 | ) |
668 | from bzrlib.symbol_versioning import deprecated_in, deprecated_method |
669 | from bzrlib.trace import mutter |
670 | +from bzrlib.static_tuple import StaticTuple |
671 | |
672 | |
673 | class InventoryEntry(object): |
674 | @@ -1599,8 +1600,6 @@ |
675 | interesting.add(None) # this will auto-filter it in the loop |
676 | remaining_parents.discard(None) |
677 | while remaining_parents: |
678 | - if None in remaining_parents: |
679 | - import pdb; pdb.set_trace() |
680 | next_parents = set() |
681 | for entry in self._getitems(remaining_parents): |
682 | next_parents.add(entry.parent_id) |
683 | @@ -1615,7 +1614,7 @@ |
684 | while directories_to_expand: |
685 | # Expand directories by looking in the |
686 | # parent_id_basename_to_file_id map |
687 | - keys = [(f,) for f in directories_to_expand] |
688 | + keys = [StaticTuple(f,).intern() for f in directories_to_expand] |
689 | directories_to_expand = set() |
690 | items = self.parent_id_basename_to_file_id.iteritems(keys) |
691 | next_file_ids = set([item[1] for item in items]) |
692 | @@ -1810,7 +1809,7 @@ |
693 | pass |
694 | deletes.add(file_id) |
695 | else: |
696 | - new_key = (file_id,) |
697 | + new_key = StaticTuple(file_id,) |
698 | new_value = result._entry_to_bytes(entry) |
699 | # Update caches. It's worth doing this whether |
700 | # we're propagating the old caches or not. |
701 | @@ -1819,13 +1818,13 @@ |
702 | if old_path is None: |
703 | old_key = None |
704 | else: |
705 | - old_key = (file_id,) |
706 | + old_key = StaticTuple(file_id,) |
707 | if self.id2path(file_id) != old_path: |
708 | raise errors.InconsistentDelta(old_path, file_id, |
709 | "Entry was at wrong other path %r." % |
710 | self.id2path(file_id)) |
711 | altered.add(file_id) |
712 | - id_to_entry_delta.append((old_key, new_key, new_value)) |
713 | + id_to_entry_delta.append(StaticTuple(old_key, new_key, new_value)) |
714 | if result.parent_id_basename_to_file_id is not None: |
715 | # parent_id, basename changes |
716 | if old_path is None: |
717 | @@ -1923,7 +1922,13 @@ |
718 | search_key_name = intern(info.get('search_key_name', 'plain')) |
719 | parent_id_basename_to_file_id = intern(info.get( |
720 | 'parent_id_basename_to_file_id', None)) |
721 | + if not parent_id_basename_to_file_id.startswith('sha1:'): |
722 | + raise ValueError('parent_id_basename_to_file_id should be a sha1' |
723 | + ' key not %r' % (parent_id_basename_to_file_id,)) |
724 | id_to_entry = info['id_to_entry'] |
725 | + if not id_to_entry.startswith('sha1:'): |
726 | + raise ValueError('id_to_entry should be a sha1' |
727 | + ' key not %r' % (id_to_entry,)) |
728 | |
729 | result = CHKInventory(search_key_name) |
730 | result.revision_id = revision_id |
731 | @@ -1932,12 +1937,13 @@ |
732 | result._search_key_name) |
733 | if parent_id_basename_to_file_id is not None: |
734 | result.parent_id_basename_to_file_id = chk_map.CHKMap( |
735 | - chk_store, (parent_id_basename_to_file_id,), |
736 | + chk_store, StaticTuple(parent_id_basename_to_file_id,), |
737 | search_key_func=search_key_func) |
738 | else: |
739 | result.parent_id_basename_to_file_id = None |
740 | |
741 | - result.id_to_entry = chk_map.CHKMap(chk_store, (id_to_entry,), |
742 | + result.id_to_entry = chk_map.CHKMap(chk_store, |
743 | + StaticTuple(id_to_entry,), |
744 | search_key_func=search_key_func) |
745 | if (result.revision_id,) != expected_revision_id: |
746 | raise ValueError("Mismatched revision id and expected: %r, %r" % |
747 | @@ -1965,7 +1971,8 @@ |
748 | id_to_entry_dict = {} |
749 | parent_id_basename_dict = {} |
750 | for path, entry in inventory.iter_entries(): |
751 | - id_to_entry_dict[(entry.file_id,)] = entry_to_bytes(entry) |
752 | + key = StaticTuple(entry.file_id,).intern() |
753 | + id_to_entry_dict[key] = entry_to_bytes(entry) |
754 | p_id_key = parent_id_basename_key(entry) |
755 | parent_id_basename_dict[p_id_key] = entry.file_id |
756 | |
757 | @@ -1994,7 +2001,7 @@ |
758 | parent_id = entry.parent_id |
759 | else: |
760 | parent_id = '' |
761 | - return parent_id, entry.name.encode('utf8') |
762 | + return StaticTuple(parent_id, entry.name.encode('utf8')).intern() |
763 | |
764 | def __getitem__(self, file_id): |
765 | """map a single file_id -> InventoryEntry.""" |
766 | @@ -2005,7 +2012,7 @@ |
767 | return result |
768 | try: |
769 | return self._bytes_to_entry( |
770 | - self.id_to_entry.iteritems([(file_id,)]).next()[1]) |
771 | + self.id_to_entry.iteritems([StaticTuple(file_id,)]).next()[1]) |
772 | except StopIteration: |
773 | # really we're passing an inventory, not a tree... |
774 | raise errors.NoSuchId(self, file_id) |
775 | @@ -2024,7 +2031,7 @@ |
776 | remaining.append(file_id) |
777 | else: |
778 | result.append(entry) |
779 | - file_keys = [(f,) for f in remaining] |
780 | + file_keys = [StaticTuple(f,).intern() for f in remaining] |
781 | for file_key, value in self.id_to_entry.iteritems(file_keys): |
782 | entry = self._bytes_to_entry(value) |
783 | result.append(entry) |
784 | @@ -2035,7 +2042,8 @@ |
785 | # Perhaps have an explicit 'contains' method on CHKMap ? |
786 | if self._fileid_to_entry_cache.get(file_id, None) is not None: |
787 | return True |
788 | - return len(list(self.id_to_entry.iteritems([(file_id,)]))) == 1 |
789 | + return len(list( |
790 | + self.id_to_entry.iteritems([StaticTuple(file_id,)]))) == 1 |
791 | |
792 | def is_root(self, file_id): |
793 | return file_id == self.root_id |
794 | @@ -2193,7 +2201,7 @@ |
795 | basename_utf8 = basename.encode('utf8') |
796 | file_id = self._path_to_fileid_cache.get(cur_path, None) |
797 | if file_id is None: |
798 | - key_filter = [(current_id, basename_utf8)] |
799 | + key_filter = [StaticTuple(current_id, basename_utf8)] |
800 | items = parent_id_index.iteritems(key_filter) |
801 | for (parent_id, name_utf8), file_id in items: |
802 | if parent_id != current_id or name_utf8 != basename_utf8: |
803 | @@ -2215,16 +2223,16 @@ |
804 | lines.append('search_key_name: %s\n' % (self._search_key_name,)) |
805 | lines.append("root_id: %s\n" % self.root_id) |
806 | lines.append('parent_id_basename_to_file_id: %s\n' % |
807 | - self.parent_id_basename_to_file_id.key()) |
808 | + (self.parent_id_basename_to_file_id.key()[0],)) |
809 | lines.append("revision_id: %s\n" % self.revision_id) |
810 | - lines.append("id_to_entry: %s\n" % self.id_to_entry.key()) |
811 | + lines.append("id_to_entry: %s\n" % (self.id_to_entry.key()[0],)) |
812 | else: |
813 | lines.append("revision_id: %s\n" % self.revision_id) |
814 | lines.append("root_id: %s\n" % self.root_id) |
815 | if self.parent_id_basename_to_file_id is not None: |
816 | lines.append('parent_id_basename_to_file_id: %s\n' % |
817 | - self.parent_id_basename_to_file_id.key()) |
818 | - lines.append("id_to_entry: %s\n" % self.id_to_entry.key()) |
819 | + (self.parent_id_basename_to_file_id.key()[0],)) |
820 | + lines.append("id_to_entry: %s\n" % (self.id_to_entry.key()[0],)) |
821 | return lines |
822 | |
823 | @property |
824 | @@ -2271,8 +2279,8 @@ |
825 | parent_id_index = self._chk_inventory.parent_id_basename_to_file_id |
826 | child_keys = set() |
827 | for (parent_id, name_utf8), file_id in parent_id_index.iteritems( |
828 | - key_filter=[(self.file_id,)]): |
829 | - child_keys.add((file_id,)) |
830 | + key_filter=[StaticTuple(self.file_id,)]): |
831 | + child_keys.add(StaticTuple(file_id,)) |
832 | cached = set() |
833 | for file_id_key in child_keys: |
834 | entry = self._chk_inventory._fileid_to_entry_cache.get( |
835 | |
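`StaticTuple(...).intern()`, used throughout the inventory.py hunk above, is a bzrlib C extension with no stdlib equivalent. The behaviour the diff relies on (one canonical key object shared by every caller) can be approximated with ordinary tuples and a dict-based pool; `intern_key` below is a hypothetical helper for illustration only:

```python
import sys

_key_pool = {}

def intern_key(*parts, pool=_key_pool):
    # rough stand-in for StaticTuple(*parts).intern(): intern the string
    # parts, then keep one canonical tuple per distinct key value
    key = tuple(sys.intern(p) for p in parts)
    return pool.setdefault(key, key)

k1 = intern_key('file-id', 'na' + 'me')
k2 = intern_key('file-id', 'name')
assert k1 == ('file-id', 'name')
assert k1 is k2  # both callers share a single key object
```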
836 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' |
837 | --- bzrlib/repofmt/groupcompress_repo.py 2009-10-19 16:21:20 +0000 |
838 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-10-26 14:59:15 +0000 |
839 | @@ -53,6 +53,7 @@ |
840 | ResumedPack, |
841 | Packer, |
842 | ) |
843 | +from bzrlib.static_tuple import StaticTuple |
844 | |
845 | |
846 | class GCPack(NewPack): |
847 | @@ -814,14 +815,16 @@ |
848 | ' no new_path %r' % (file_id,)) |
849 | if new_path == '': |
850 | new_inv.root_id = file_id |
851 | - parent_id_basename_key = ('', '') |
852 | + parent_id_basename_key = StaticTuple('', '').intern() |
853 | else: |
854 | utf8_entry_name = entry.name.encode('utf-8') |
855 | - parent_id_basename_key = (entry.parent_id, utf8_entry_name) |
856 | + parent_id_basename_key = StaticTuple(entry.parent_id, |
857 | + utf8_entry_name).intern() |
858 | new_value = entry_to_bytes(entry) |
859 | # Populate Caches? |
860 | # new_inv._path_to_fileid_cache[new_path] = file_id |
861 | - id_to_entry_dict[(file_id,)] = new_value |
862 | + key = StaticTuple(file_id).intern() |
863 | + id_to_entry_dict[key] = new_value |
864 | parent_id_basename_dict[parent_id_basename_key] = file_id |
865 | |
866 | new_inv._populate_from_dicts(self.chk_bytes, id_to_entry_dict, |
867 | @@ -949,6 +952,10 @@ |
868 | pb=pb): |
869 | for name, bytes in items: |
870 | (name_utf8, file_id, revision_id) = bytes_to_info(bytes) |
871 | + # TODO: consider interning file_id, revision_id here, or |
872 | + # pushing that intern() into bytes_to_info() |
873 | + # TODO: rich_root should always be True here, for all |
874 | + # repositories that support chk_bytes |
875 | if not rich_root and name_utf8 == '': |
876 | continue |
877 | try: |
878 | @@ -1189,7 +1196,9 @@ |
879 | # are always rich-root, so there are no synthesised root records to |
880 | # ignore. |
881 | _, file_id, revision_id = bytes_to_info(bytes) |
882 | - text_keys.add((file_id, revision_id)) |
883 | + file_id = intern(file_id) |
884 | + revision_id = intern(revision_id) |
885 | + text_keys.add(StaticTuple(file_id, revision_id).intern()) |
886 | yield record |
887 | |
888 | |
889 | |
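The `intern(file_id)` calls added above use Python 2's builtin string interning (the same builtin lives at `sys.intern` in Python 3). A quick demonstration of why it helps when many text keys repeat the same file ids:

```python
import sys

# two equal-but-distinct strings, as you would get from parsing bytes twice
raw_a = ''.join(['file-', 'id'])
raw_b = ''.join(['file-', 'id'])

file_id_a = sys.intern(raw_a)
file_id_b = sys.intern(raw_b)
assert file_id_a is file_id_b  # interning collapses duplicates

# the text_keys set still deduplicates by value either way; interning
# means the surviving entry references one string object, not several
text_keys = set()
text_keys.add((file_id_a, 'rev-1'))
text_keys.add((file_id_b, 'rev-1'))
assert len(text_keys) == 1
```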
890 | === modified file 'bzrlib/tests/test__chk_map.py' |
891 | --- bzrlib/tests/test__chk_map.py 2009-04-09 20:23:07 +0000 |
892 | +++ bzrlib/tests/test__chk_map.py 2009-10-26 14:59:15 +0000 |
893 | @@ -20,6 +20,8 @@ |
894 | chk_map, |
895 | tests, |
896 | ) |
897 | +from bzrlib.static_tuple import StaticTuple |
898 | +stuple = StaticTuple |
899 | |
900 | |
901 | def load_tests(standard_tests, module, loader): |
902 | @@ -67,25 +69,25 @@ |
903 | self.assertEqual(expected, actual, 'actual: %r' % (actual,)) |
904 | |
905 | def test_simple_16(self): |
906 | - self.assertSearchKey16('8C736521', ('foo',)) |
907 | - self.assertSearchKey16('8C736521\x008C736521', ('foo', 'foo')) |
908 | - self.assertSearchKey16('8C736521\x0076FF8CAA', ('foo', 'bar')) |
909 | - self.assertSearchKey16('ED82CD11', ('abcd',)) |
910 | + self.assertSearchKey16('8C736521', stuple('foo',)) |
911 | + self.assertSearchKey16('8C736521\x008C736521', stuple('foo', 'foo')) |
912 | + self.assertSearchKey16('8C736521\x0076FF8CAA', stuple('foo', 'bar')) |
913 | + self.assertSearchKey16('ED82CD11', stuple('abcd',)) |
914 | |
915 | def test_simple_255(self): |
916 | - self.assertSearchKey255('\x8cse!', ('foo',)) |
917 | - self.assertSearchKey255('\x8cse!\x00\x8cse!', ('foo', 'foo')) |
918 | - self.assertSearchKey255('\x8cse!\x00v\xff\x8c\xaa', ('foo', 'bar')) |
919 | + self.assertSearchKey255('\x8cse!', stuple('foo',)) |
920 | + self.assertSearchKey255('\x8cse!\x00\x8cse!', stuple('foo', 'foo')) |
921 | + self.assertSearchKey255('\x8cse!\x00v\xff\x8c\xaa', stuple('foo', 'bar')) |
922 | # The standard mapping for these would include '\n', so it should be |
923 | # mapped to '_' |
924 | - self.assertSearchKey255('\xfdm\x93_\x00P_\x1bL', ('<', 'V')) |
925 | + self.assertSearchKey255('\xfdm\x93_\x00P_\x1bL', stuple('<', 'V')) |
926 | |
927 | def test_255_does_not_include_newline(self): |
928 | # When mapping via _search_key_255, we should never have the '\n' |
929 | # character, but all other 255 values should be present |
930 | chars_used = set() |
931 | for char_in in range(256): |
932 | - search_key = self.module._search_key_255((chr(char_in),)) |
933 | + search_key = self.module._search_key_255(stuple(chr(char_in),)) |
934 | chars_used.update(search_key) |
935 | all_chars = set([chr(x) for x in range(256)]) |
936 | unused_chars = all_chars.symmetric_difference(chars_used) |
937 | @@ -113,10 +115,11 @@ |
938 | |
939 | def test_deserialise_empty(self): |
940 | node = self.module._deserialise_leaf_node( |
941 | - "chkleaf:\n10\n1\n0\n\n", ("sha1:1234",)) |
942 | + "chkleaf:\n10\n1\n0\n\n", stuple("sha1:1234",)) |
943 | self.assertEqual(0, len(node)) |
944 | self.assertEqual(10, node.maximum_size) |
945 | self.assertEqual(("sha1:1234",), node.key()) |
946 | + self.assertIsInstance(node.key(), StaticTuple) |
947 | self.assertIs(None, node._search_prefix) |
948 | self.assertIs(None, node._common_serialised_prefix) |
949 | |
950 | @@ -194,7 +197,8 @@ |
951 | |
952 | def assertDeserialiseErrors(self, text): |
953 | self.assertRaises((ValueError, IndexError), |
954 | - self.module._deserialise_internal_node, text, 'not-a-real-sha') |
955 | + self.module._deserialise_internal_node, text, |
956 | + stuple('not-a-real-sha',)) |
957 | |
958 | def test_raises_on_non_internal(self): |
959 | self.assertDeserialiseErrors('') |
960 | @@ -211,7 +215,7 @@ |
961 | |
962 | def test_deserialise_one(self): |
963 | node = self.module._deserialise_internal_node( |
964 | - "chknode:\n10\n1\n1\n\na\x00sha1:abcd\n", ('sha1:1234',)) |
965 | + "chknode:\n10\n1\n1\n\na\x00sha1:abcd\n", stuple('sha1:1234',)) |
966 | self.assertIsInstance(node, chk_map.InternalNode) |
967 | self.assertEqual(1, len(node)) |
968 | self.assertEqual(10, node.maximum_size) |
969 | @@ -221,7 +225,7 @@ |
970 | |
971 | def test_deserialise_with_prefix(self): |
972 | node = self.module._deserialise_internal_node( |
973 | - "chknode:\n10\n1\n1\npref\na\x00sha1:abcd\n", ('sha1:1234',)) |
974 | + "chknode:\n10\n1\n1\npref\na\x00sha1:abcd\n", stuple('sha1:1234',)) |
975 | self.assertIsInstance(node, chk_map.InternalNode) |
976 | self.assertEqual(1, len(node)) |
977 | self.assertEqual(10, node.maximum_size) |
978 | @@ -230,7 +234,7 @@ |
979 | self.assertEqual({'prefa': ('sha1:abcd',)}, node._items) |
980 | |
981 | node = self.module._deserialise_internal_node( |
982 | - "chknode:\n10\n1\n1\npref\n\x00sha1:abcd\n", ('sha1:1234',)) |
983 | + "chknode:\n10\n1\n1\npref\n\x00sha1:abcd\n", stuple('sha1:1234',)) |
984 | self.assertIsInstance(node, chk_map.InternalNode) |
985 | self.assertEqual(1, len(node)) |
986 | self.assertEqual(10, node.maximum_size) |
987 | @@ -240,7 +244,8 @@ |
988 | |
989 | def test_deserialise_pref_with_null(self): |
990 | node = self.module._deserialise_internal_node( |
991 | - "chknode:\n10\n1\n1\npref\x00fo\n\x00sha1:abcd\n", ('sha1:1234',)) |
992 | + "chknode:\n10\n1\n1\npref\x00fo\n\x00sha1:abcd\n", |
993 | + stuple('sha1:1234',)) |
994 | self.assertIsInstance(node, chk_map.InternalNode) |
995 | self.assertEqual(1, len(node)) |
996 | self.assertEqual(10, node.maximum_size) |
997 | @@ -250,7 +255,8 @@ |
998 | |
999 | def test_deserialise_with_null_pref(self): |
1000 | node = self.module._deserialise_internal_node( |
1001 | - "chknode:\n10\n1\n1\npref\x00fo\n\x00\x00sha1:abcd\n", ('sha1:1234',)) |
1002 | + "chknode:\n10\n1\n1\npref\x00fo\n\x00\x00sha1:abcd\n", |
1003 | + stuple('sha1:1234',)) |
1004 | self.assertIsInstance(node, chk_map.InternalNode) |
1005 | self.assertEqual(1, len(node)) |
1006 | self.assertEqual(10, node.maximum_size) |
1007 | |
1008 | === modified file 'bzrlib/tests/test_chk_map.py' |
1009 | --- bzrlib/tests/test_chk_map.py 2009-10-08 04:35:01 +0000 |
1010 | +++ bzrlib/tests/test_chk_map.py 2009-10-26 14:59:15 +0000 |
1011 | @@ -31,6 +31,7 @@ |
1012 | LeafNode, |
1013 | Node, |
1014 | ) |
1015 | +from bzrlib.static_tuple import StaticTuple |
1016 | |
1017 | |
1018 | class TestNode(tests.TestCase): |
1019 | @@ -831,13 +832,13 @@ |
1020 | # 'ab' and 'ac' nodes |
1021 | chkmap.map(('aad',), 'v') |
1022 | self.assertIsInstance(chkmap._root_node._items['aa'], InternalNode) |
1023 | - self.assertIsInstance(chkmap._root_node._items['ab'], tuple) |
1024 | - self.assertIsInstance(chkmap._root_node._items['ac'], tuple) |
1025 | + self.assertIsInstance(chkmap._root_node._items['ab'], StaticTuple) |
1026 | + self.assertIsInstance(chkmap._root_node._items['ac'], StaticTuple) |
1027 | # Unmapping 'acd' can notice that 'aa' is an InternalNode and not have |
1028 | # to map in 'ab' |
1029 | chkmap.unmap(('acd',)) |
1030 | self.assertIsInstance(chkmap._root_node._items['aa'], InternalNode) |
1031 | - self.assertIsInstance(chkmap._root_node._items['ab'], tuple) |
1032 | + self.assertIsInstance(chkmap._root_node._items['ab'], StaticTuple) |
1033 | |
1034 | def test_unmap_without_fitting_doesnt_page_in(self): |
1035 | store = self.get_chk_bytes() |
1036 | @@ -860,8 +861,8 @@ |
1037 | chkmap.map(('aaf',), 'v') |
1038 | # At this point, the previous nodes should not be paged in, but the |
1039 | # newly added nodes would be |
1040 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1041 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1042 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1043 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1044 | self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode) |
1045 | self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode) |
1046 | self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode) |
1047 | @@ -869,8 +870,8 @@ |
1048 | # Now unmapping one of the new nodes will use only the already-paged-in |
1049 | # nodes to determine that we don't need to do more. |
1050 | chkmap.unmap(('aaf',)) |
1051 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1052 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1053 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1054 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1055 | self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode) |
1056 | self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode) |
1057 | self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode) |
1058 | @@ -897,9 +898,9 @@ |
1059 | chkmap.map(('aad',), 'v') |
1060 | # At this point, the previous nodes should not be paged in, but the |
1061 | # newly added node would be |
1062 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1063 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1064 | - self.assertIsInstance(chkmap._root_node._items['aac'], tuple) |
1065 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1066 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1067 | + self.assertIsInstance(chkmap._root_node._items['aac'], StaticTuple) |
1068 | self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode) |
1069 | # Unmapping the new node will check the existing nodes to see if they |
1070 | # would fit. |
1071 | @@ -937,9 +938,9 @@ |
1072 | chkmap.map(('aad',), 'v') |
1073 | # At this point, the previous nodes should not be paged in, but the |
1074 | # newly added node would be |
1075 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1076 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1077 | - self.assertIsInstance(chkmap._root_node._items['aac'], tuple) |
1078 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1079 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1080 | + self.assertIsInstance(chkmap._root_node._items['aac'], StaticTuple) |
1081 | self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode) |
1082 | # Now clear the page cache, and only include 2 of the children in the |
1083 | # cache |
1084 | @@ -954,7 +955,7 @@ |
1085 | # Unmapping the new node will check the nodes from the page cache |
1086 | # first, and not have to read in 'aaa' |
1087 | chkmap.unmap(('aad',)) |
1088 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1089 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1090 | self.assertIsInstance(chkmap._root_node._items['aab'], LeafNode) |
1091 | self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode) |
1092 | |
1093 | @@ -974,9 +975,9 @@ |
1094 | chkmap.map(('aaf',), 'val') |
1095 | # At this point, the previous nodes should not be paged in, but the |
1096 | # newly added node would be |
1097 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1098 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1099 | - self.assertIsInstance(chkmap._root_node._items['aac'], tuple) |
1100 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1101 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1102 | + self.assertIsInstance(chkmap._root_node._items['aac'], StaticTuple) |
1103 | self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode) |
1104 | self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode) |
1105 | self.assertIsInstance(chkmap._root_node._items['aaf'], LeafNode) |
1106 | @@ -984,9 +985,9 @@ |
1107 | # Unmapping a new node will see the other nodes that are already in |
1108 | # memory, and not need to page in anything else |
1109 | chkmap.unmap(('aad',)) |
1110 | - self.assertIsInstance(chkmap._root_node._items['aaa'], tuple) |
1111 | - self.assertIsInstance(chkmap._root_node._items['aab'], tuple) |
1112 | - self.assertIsInstance(chkmap._root_node._items['aac'], tuple) |
1113 | + self.assertIsInstance(chkmap._root_node._items['aaa'], StaticTuple) |
1114 | + self.assertIsInstance(chkmap._root_node._items['aab'], StaticTuple) |
1115 | + self.assertIsInstance(chkmap._root_node._items['aac'], StaticTuple) |
1116 | self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode) |
1117 | self.assertIsInstance(chkmap._root_node._items['aaf'], LeafNode) |
1118 | |
1119 | @@ -1031,8 +1032,8 @@ |
1120 | {('a',): 'content here', ('b',): 'more content'}, |
1121 | chk_bytes=basis._store, maximum_size=10) |
1122 | list(target.iter_changes(basis)) |
1123 | - self.assertIsInstance(target._root_node, tuple) |
1124 | - self.assertIsInstance(basis._root_node, tuple) |
1125 | + self.assertIsInstance(target._root_node, StaticTuple) |
1126 | + self.assertIsInstance(basis._root_node, StaticTuple) |
1127 | |
1128 | def test_iter_changes_ab_ab_changed_values_shown(self): |
1129 | basis = self._get_map({('a',): 'content here', ('b',): 'more content'}, |
1130 | @@ -1144,9 +1145,12 @@ |
1131 | |
1132 | def test_iteritems_keys_prefixed_by_2_width_nodes_hashed(self): |
1133 | search_key_func = chk_map.search_key_registry.get('hash-16-way') |
1134 | - self.assertEqual('E8B7BE43\x00E8B7BE43', search_key_func(('a', 'a'))) |
1135 | - self.assertEqual('E8B7BE43\x0071BEEFF9', search_key_func(('a', 'b'))) |
1136 | - self.assertEqual('71BEEFF9\x0000000000', search_key_func(('b', ''))) |
1137 | + self.assertEqual('E8B7BE43\x00E8B7BE43', |
1138 | + search_key_func(StaticTuple('a', 'a'))) |
1139 | + self.assertEqual('E8B7BE43\x0071BEEFF9', |
1140 | + search_key_func(StaticTuple('a', 'b'))) |
1141 | + self.assertEqual('71BEEFF9\x0000000000', |
1142 | + search_key_func(StaticTuple('b', ''))) |
1143 | chkmap = self._get_map( |
1144 | {("a","a"):"content here", ("a", "b",):"more content", |
1145 | ("b", ""): 'boring content'}, |
1146 | @@ -1449,41 +1453,6 @@ |
1147 | , chkmap._dump_tree()) |
1148 | |
1149 | |
1150 | -class TestSearchKeyFuncs(tests.TestCase): |
1151 | - |
1152 | - def assertSearchKey16(self, expected, key): |
1153 | - self.assertEqual(expected, chk_map._search_key_16(key)) |
1154 | - |
1155 | - def assertSearchKey255(self, expected, key): |
1156 | - actual = chk_map._search_key_255(key) |
1157 | - self.assertEqual(expected, actual, 'actual: %r' % (actual,)) |
1158 | - |
1159 | - def test_simple_16(self): |
1160 | - self.assertSearchKey16('8C736521', ('foo',)) |
1161 | - self.assertSearchKey16('8C736521\x008C736521', ('foo', 'foo')) |
1162 | - self.assertSearchKey16('8C736521\x0076FF8CAA', ('foo', 'bar')) |
1163 | - self.assertSearchKey16('ED82CD11', ('abcd',)) |
1164 | - |
1165 | - def test_simple_255(self): |
1166 | - self.assertSearchKey255('\x8cse!', ('foo',)) |
1167 | - self.assertSearchKey255('\x8cse!\x00\x8cse!', ('foo', 'foo')) |
1168 | - self.assertSearchKey255('\x8cse!\x00v\xff\x8c\xaa', ('foo', 'bar')) |
1169 | - # The standard mapping for these would include '\n', so it should be |
1170 | - # mapped to '_' |
1171 | - self.assertSearchKey255('\xfdm\x93_\x00P_\x1bL', ('<', 'V')) |
1172 | - |
1173 | - def test_255_does_not_include_newline(self): |
1174 | - # When mapping via _search_key_255, we should never have the '\n' |
1175 | - # character, but all other 255 values should be present |
1176 | - chars_used = set() |
1177 | - for char_in in range(256): |
1178 | - search_key = chk_map._search_key_255((chr(char_in),)) |
1179 | - chars_used.update(search_key) |
1180 | - all_chars = set([chr(x) for x in range(256)]) |
1181 | - unused_chars = all_chars.symmetric_difference(chars_used) |
1182 | - self.assertEqual(set('\n'), unused_chars) |
1183 | - |
1184 | - |
1185 | class TestLeafNode(TestCaseWithStore): |
1186 | |
1187 | def test_current_size_empty(self): |
1188 | @@ -1908,18 +1877,19 @@ |
1189 | search_key_func = chk_map.search_key_registry.get('hash-255-way') |
1190 | node = InternalNode(search_key_func=search_key_func) |
1191 | leaf1 = LeafNode(search_key_func=search_key_func) |
1192 | - leaf1.map(None, ('foo bar',), 'quux') |
1193 | + leaf1.map(None, StaticTuple('foo bar',), 'quux') |
1194 | leaf2 = LeafNode(search_key_func=search_key_func) |
1195 | - leaf2.map(None, ('strange',), 'beast') |
1196 | - self.assertEqual('\xbeF\x014', search_key_func(('foo bar',))) |
1197 | - self.assertEqual('\x85\xfa\xf7K', search_key_func(('strange',))) |
1198 | + leaf2.map(None, StaticTuple('strange',), 'beast') |
1199 | + self.assertEqual('\xbeF\x014', search_key_func(StaticTuple('foo bar',))) |
1200 | + self.assertEqual('\x85\xfa\xf7K', search_key_func(StaticTuple('strange',))) |
1201 | node.add_node("\xbe", leaf1) |
1202 | # This sets up a path that should not be followed - it will error if |
1203 | # the code tries to. |
1204 | node._items['\xbe'] = None |
1205 | node.add_node("\x85", leaf2) |
1206 | self.assertEqual([(('strange',), 'beast')], |
1207 | - sorted(node.iteritems(None, [('strange',), ('weird',)]))) |
1208 | + sorted(node.iteritems(None, [StaticTuple('strange',), |
1209 | + StaticTuple('weird',)]))) |
1210 | |
1211 | def test_iteritems_partial_empty(self): |
1212 | node = InternalNode() |
1213 | @@ -1932,7 +1902,7 @@ |
1214 | # Ensure test validity: nothing paged in below the root. |
1215 | self.assertEqual(2, |
1216 | len([value for value in node._items.values() |
1217 | - if type(value) == tuple])) |
1218 | + if type(value) is StaticTuple])) |
1219 | # now, mapping to k3 should add a k3 leaf |
1220 | prefix, nodes = node.map(None, ('k3',), 'quux') |
1221 | self.assertEqual("k", prefix) |
1222 | @@ -1971,7 +1941,7 @@ |
1223 | # Ensure test validity: nothing paged in below the root. |
1224 | self.assertEqual(2, |
1225 | len([value for value in node._items.values() |
1226 | - if type(value) == tuple])) |
1227 | + if type(value) is StaticTuple])) |
1228 | # now, mapping to k23 causes k22 ('k2' in node) to split into k22 and |
1229 | # k23, which for simplicity in the current implementation generates |
1230 | # a new internal node between node, and k22/k23. |
1231 | @@ -2016,9 +1986,12 @@ |
1232 | node = InternalNode(search_key_func=search_key_func) |
1233 | node._key_width = 2 |
1234 | node._node_width = 4 |
1235 | - self.assertEqual('E8B7BE43\x0071BEEFF9', search_key_func(('a', 'b'))) |
1236 | - self.assertEqual('E8B7', node._search_prefix_filter(('a', 'b'))) |
1237 | - self.assertEqual('E8B7', node._search_prefix_filter(('a',))) |
1238 | + self.assertEqual('E8B7BE43\x0071BEEFF9', search_key_func( |
1239 | + StaticTuple('a', 'b'))) |
1240 | + self.assertEqual('E8B7', node._search_prefix_filter( |
1241 | + StaticTuple('a', 'b'))) |
1242 | + self.assertEqual('E8B7', node._search_prefix_filter( |
1243 | + StaticTuple('a',))) |
1244 | |
1245 | def test_unmap_k23_from_k1_k22_k23_gives_k1_k22_tree_new(self): |
1246 | chkmap = self._get_map( |
This is an update to my earlier chk_map work (which Andrew had approved).
The main changes are:
1) I have benchmarked it as being specifically helpful: it saves 30MB during 'bzr branch launchpad' (548MB with bzr.dev => 518MB).
2) I changed it temporarily to require StaticTuples everywhere, and then updated the code in 'inventory.py' and 'groupcompress_repo.py' that was touching things directly.
3) I then updated the interface so that LeafNode, InternalNode, and everything underneath them require exactly StaticTuples, but CHKMap itself will cast to StaticTuple on demand. This seemed a fair way to migrate. Production code should never poke at the internals of the nodes, and should instead work via the CHKMap API.
This should help avoid spurious failures in the short term, and still encourage good habits in the long term.
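The migration strategy in point 3 can be sketched in miniature: the public class coerces plain tuples at the API boundary, while the internal node classes insist on the canonical key type with an identity check (`type(key) is ...`, mirroring the `type(value) is StaticTuple` checks in the diff). This is only an illustrative sketch, not bzrlib's actual code; `Key`, `_LeafNode`, and `ChkMapLike` are hypothetical stand-ins for StaticTuple, LeafNode, and CHKMap.

```python
class Key(tuple):
    """Hypothetical stand-in for bzrlib's StaticTuple (an immutable key type)."""
    __slots__ = ()


class _LeafNode(object):
    """Internal node: requires exactly the canonical key type."""

    def __init__(self):
        self._items = {}

    def map(self, key, value):
        # Strict identity check, like the diff's `type(value) is StaticTuple`:
        # internals never coerce, so bad callers fail loudly.
        if type(key) is not Key:
            raise TypeError('expected Key, got %s' % type(key).__name__)
        self._items[key] = value


class ChkMapLike(object):
    """Public API: casts plain tuples to Key on demand at the boundary."""

    def __init__(self):
        self._root = _LeafNode()

    def map(self, key, value):
        # Coerce once at the boundary so existing callers keep working
        # while the internals stay strict.
        if type(key) is not Key:
            key = Key(key)
        self._root.map(key, value)


m = ChkMapLike()
m.map(('file-id', 'rev-1'), 'content')  # plain tuple accepted here
# The stored key has been coerced to the canonical type:
assert type(next(iter(m._root._items))) is Key
```

Because the coercion happens only in the public entry point, spurious failures are avoided in the short term, while direct users of the node internals are still forced onto the new type.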