Merge lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern into lp:bzr
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the source branch
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern
Merge into: lp:bzr
Diff against target: 553 lines, 12 files modified
  - NEWS (+13/-7)
  - bzrlib/_bencode_pyx.pyx (+9/-1)
  - bzrlib/_btree_serializer_pyx.pyx (+80/-38)
  - bzrlib/_static_tuple_c.c (+8/-2)
  - bzrlib/btree_index.py (+2/-0)
  - bzrlib/builtins.py (+4/-1)
  - bzrlib/index.py (+4/-1)
  - bzrlib/repository.py (+7/-0)
  - bzrlib/static_tuple.py (+25/-0)
  - bzrlib/tests/test__static_tuple.py (+21/-0)
  - bzrlib/util/_bencode_py.py (+7/-0)
  - setup.py (+2/-1)
To merge this branch: bzr merge lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern
Related bugs: (none listed)

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Andrew Bennetts | | | Approve |
| bzr-core | | | Pending |

Review via email: mp+13296@code.launchpad.net
Commit message
Description of the change
John A Meinel (jameinel) wrote:
Andrew Bennetts (spiv) wrote:
384 + refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list])
385 + for ref_list in node[3]])

I wonder if it would be worth adding a convenience method, perhaps StaticTuple.as_tuples().
431 + # I don't believe we can define a method by which
432 + # (prefix,) + StaticTuple will work, though we could
In plain Python you could define an __radd__ for this, so surely there's a way to do this in C?
    class T(object):
        def __radd__(self, other):
            return 'haha!'

    t = T()
    print ('tuple',) + t  # prints 'haha!'
You may need to do something odd like provide the nb_add slot, even though this isn't really a numeric type, but I think that's ok. (All pure python classes would have that I think, even the non-numeric ones, so presumably having tp_as_number filled doesn't automatically make Python do dumb things.)
I think we can live without this, but it would be nice.
488 + k1 = stuple(
489 + stuple('<email address hidden>',))
490 + k2 = stuple(
491 + stuple('<email address hidden>',)),
492 + stuple('<email address hidden>',))
This test data is needlessly complex and hard to read. Why not e.g.:
k1 = stuple(stuple('a',), stuple('b',))
k2 = stuple(stuple(stuple('c',), stuple('d',)),
            stuple('b',))
Which is structurally the same and much easier to follow.
John A Meinel (jameinel) wrote:
Andrew Bennetts wrote:
> Review: Approve
> 384 + refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list])
> 385 + for ref_list in node[3]])
>
> I wonder if it would be worth adding a convenience method, perhaps StaticTuple.as_tuples().
As for "as_tuples()", I would be fine just extending ".as_tuple()" to do
exactly that. The main restriction is that we may not always have tuples
at this point.
At least so far, tuple is interchangeable with StaticTuple.
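A recursive conversion helper along these lines would cover both the "as_tuples()" and the extended ".as_tuple()" ideas (a hypothetical sketch, not bzrlib's actual API; the name `as_tuples` and the string check are assumptions):

```python
def as_tuples(obj):
    """Recursively convert a StaticTuple-or-tuple structure to plain tuples.

    Strings pass through unchanged; any other sequence (tuple, list, or a
    StaticTuple-like object) is rebuilt as a plain tuple, descending into
    nested members.
    """
    if isinstance(obj, str):
        return obj
    return tuple(as_tuples(item) for item in obj)
```

This sidesteps the concern that "we may not always have tuples at this point": the helper does not care whether a member is a tuple or a StaticTuple, only whether it is a string.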
>
> 431 + # I don't believe we can define a method by which
> 432 + # (prefix,) + StaticTuple will work, though we could
>
> In plain Python you could define an __radd__ for this, so surely there's a way to do this in C?
>
>     class T(object):
>         def __radd__(self, other):
>             return 'haha!'
>     t = T()
>
>     print ('tuple',) + t  # prints 'haha!'
>
Tuple uses "tp_as_sequence" (sq_concat) for concatenation, which I
didn't think worked the other way. But thanks for pointing me to this,
I'll look into it.
> You may need to do something odd like provide the nb_add slot, even though this isn't really a numeric type, but I think that's ok. (All pure python classes would have that I think, even the non-numeric ones, so presumably having tp_as_number filled doesn't automatically make Python do dumb things.)
>
> I think we can live without this, but it would be nice.
Actually, the main reason I added the comment is that I expect things
to fail at that point, but I haven't gotten a test case to trigger it,
and it also won't trigger with --2a formats... (They don't have missing
compression parents.)
>
> 488 + k1 = stuple(
> 489 + stuple('<email address hidden>',))
> 490 + k2 = stuple(
> 491 + stuple('<email address hidden>',)),
> 492 + stuple('<email address hidden>',))
>
> This test data is needlessly complex and hard to read. Why not e.g.:
>
> k1 = stuple(stuple('a',), stuple('b',))
> k2 = stuple(stuple(stuple('c',), stuple('d',)),
>             stuple('b',))
>
> Which is structurally the same and much easier to follow.
Sure. I did the above because that was the actual data I was getting. Of
course, I've since narrowed it down to a bug in interning....
Anyway, I'm happy to simplify it, and should have done so before submitting.
John
=:->
Andrew Bennetts (spiv) wrote:
[...]
> > In plain Python you could define an __radd__ for this, so surely there's a
> > way to do this in C?
[...]
> > You may need to do something odd like provide the nb_add slot, even though
> > this isn't really a numeric type, but I think that's ok. (All pure python
> > classes would have that I think, even the non-numeric ones, so presumably
> > having tp_as_number filled doesn't automatically make Python do dumb
> > things.)
By the way, because lack of support for tuple + StaticTuple caused all the
babune builders to go red, I took a look at this. (specifically,
bzrlib.
was failing)
You actually need nb_coerce as well as nb_add. Here's a rough patch that does
this. The error it gives when you try to add a plain tuple with incompatible
elements (e.g. ints) is probably not ideal, but it works.
-Andrew.
1 | === modified file 'bzrlib/_static_tuple_c.c' |
2 | --- bzrlib/_static_tuple_c.c 2009-10-15 18:18:44 +0000 |
3 | +++ bzrlib/_static_tuple_c.c 2009-10-16 08:24:36 +0000 |
4 | @@ -513,6 +513,78 @@ |
5 | "Check to see if this tuple has been interned.\n"; |
6 | |
7 | |
8 | +int |
9 | +StaticTuple_coerce(PyObject **v, PyObject **w) |
10 | +{ |
11 | + StaticTuple *st; |
12 | + if (PyTuple_Check(*v)) { |
13 | + st = (StaticTuple*) StaticTuple_new_constructor( |
14 | + &StaticTuple_Type, *v, NULL); |
15 | + if (!st) |
16 | + return -1; |
17 | + Py_INCREF(st); |
18 | + *v = (PyObject*)st; |
19 | + } else if (StaticTuple_CheckExact(*v)) |
20 | + Py_INCREF(*v); |
21 | + else |
22 | + return 1; |
23 | + |
24 | + if (PyTuple_Check(*w)) { |
25 | + st = (StaticTuple*) StaticTuple_new_constructor( |
26 | + &StaticTuple_Type, *w, NULL); |
27 | + if (!st) |
28 | + return -1; |
29 | + Py_INCREF(st); |
30 | + *w = (PyObject*)st; |
31 | + } else if (StaticTuple_CheckExact(*w)) |
32 | + Py_INCREF(*w); |
33 | + else |
34 | + return 1; |
35 | + return 0; |
36 | +} |
37 | + |
38 | +static PyObject * |
39 | +StaticTuple_add(PyObject *v, PyObject *w) |
40 | +{ |
41 | + PyObject *v_t = NULL, *w_t = NULL; |
42 | + PyObject *tmp_tuple, *result; |
43 | + /* StaticTuples and plain tuples may be added (concatenated) to |
44 | + * StaticTuples. |
45 | + */ |
46 | + if (StaticTuple_CheckExact(v)) { |
47 | + v_t = StaticTuple_as_tuple((StaticTuple*)v); |
48 | + if (!v_t) |
49 | + goto fail; |
50 | + } else if (PyTuple_Check(v)) |
51 | + v_t = v; |
52 | + else |
53 | + goto not_imp; |
54 | + |
55 | + if (StaticTuple_CheckExact(w)) { |
56 | + w_t = StaticTuple_as_tuple((StaticTuple*)w); |
57 | + if (!w_t) |
58 | + goto fail; |
59 | + } else if (PyTuple_Check(w)) |
60 | + w_t = w; |
61 | + else |
62 | + goto not_imp; |
63 | + |
64 | + tmp_tuple = PySequence_Concat(v_t, w_t); |
65 | + result = StaticTuple_new_constructor(&StaticTuple_Type, tmp_tuple, NULL); |
66 | + Py_DECREF(tmp_tuple); |
67 | + Py_INCREF(result); |
68 | + return result; |
69 | + |
70 | +not_imp: |
71 | + Py_XDECREF(v_t); |
72 | + Py_XDECREF(w_t); |
73 | + return Py_NotImplemented; |
74 | +fail: |
75 | + Py_XDECREF(v_t); |
76 | + Py_XDECREF(w_t); |
77 | + return NULL; |
78 | +} |
79 | + |
80 | static PyObject * |
81 | StaticTuple_item(StaticTuple *self, Py_ssize_t offset) |
82 | { |
83 | @@ -574,6 +646,29 @@ |
84 | {NULL, NULL} /* sentinel */ |
85 | }; |
86 | |
87 | + |
88 | +static PyNumberMethods StaticTuple_as_number = { |
89 | + (binaryfunc) StaticTuple_add, /* nb_add */ |
90 | + 0, /* nb_subtract */ |
91 | + 0, /* nb_multiply */ |
92 | + 0, /* nb_divide */ |
93 | + 0, /* nb_remainder */ |
94 | + 0, /* nb_divmod */ |
95 | + 0, /* nb_power */ |
96 | + 0, /* nb_negative */ |
97 | + 0, /* nb_positive */ |
98 | + 0, /* nb_absolute */ |
99 | + 0, /* nb_nonzero */ |
100 | + 0, /* nb_invert */ |
101 | + 0, /* nb_lshift */ |
102 | + 0, /* nb_rshift */ |
103 | + 0, /* nb_and */ |
104 | + 0, /* nb_xor */ |
105 | + 0, /* nb_or */ |
106 | + StaticTuple_coerce, /* nb_coerce */ |
107 | +}; |
108 | + |
109 | + |
110 | static PySequenceMethods StaticTuple_as_sequence = { |
111 | (lenfunc)StaticTuple_length, /* sq_length */ |
112 | 0, /* sq_concat */ |
113 | @@ -604,7 +699,7 @@ |
114 | 0, /* tp_setattr */ |
115 | 0, /* tp_compare */ |
116 | (reprfunc)StaticTuple_repr, /* tp_repr */ |
117 | - 0, /* tp_as_number */ |
118 | + &StaticTuple_as_number, /* tp_as_number */ |
119 | &StaticTuple_as_sequence, /* tp_as_sequence */ |
120 | 0, /* tp_as_mapping */ |
121 | (hashfunc)StaticTuple_hash, /* tp_hash */ |
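The behaviour the patch is after can be illustrated with a pure-Python stand-in (an analogy only; `MiniTuple` is a made-up class, and the C version goes through the `nb_add`/`nb_coerce` slots rather than `__add__`/`__radd__`):

```python
class MiniTuple(object):
    """Pure-Python analogy for StaticTuple's add/concat behaviour."""

    def __init__(self, *items):
        self._items = items

    def __add__(self, other):
        # MiniTuple + (MiniTuple or plain tuple) -> MiniTuple
        if isinstance(other, MiniTuple):
            other = other._items
        if not isinstance(other, tuple):
            return NotImplemented
        return MiniTuple(*(self._items + other))

    def __radd__(self, other):
        # tuple + MiniTuple: tuple.__add__ returns NotImplemented for a
        # non-tuple right operand, so Python falls back to this method.
        if not isinstance(other, tuple):
            return NotImplemented
        return MiniTuple(*(other + self._items))

    def __eq__(self, other):
        if isinstance(other, MiniTuple):
            other = other._items
        return self._items == other
```

With this in place, `('prefix',) + MiniTuple('a', 'b')` yields a `MiniTuple`, which is exactly the `(prefix,) + key` pattern discussed in the review.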
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-10-15 04:06:32 +0000 |
3 | +++ NEWS 2009-10-15 18:31:17 +0000 |
4 | @@ -25,6 +25,11 @@ |
5 | Improvements |
6 | ************ |
7 | |
8 | +* When reading index files, we now use a ``StaticTuple`` rather than a |
9 | + plain ``tuple`` object. This generally gives a 20% decrease in peak |
10 | + memory, and can give a performance boost up to 40% on large projects. |
11 | + (John Arbash Meinel) |
12 | + |
13 | Documentation |
14 | ************* |
15 | |
16 | @@ -45,13 +50,14 @@ |
17 | used as the interning structure for StaticTuple objects. |
18 | (John Arbash Meinel) |
19 | |
20 | -* ``bzrlib._static_tuple_pyx.StaticTuple`` is now available. This class |
21 | - functions similarly to ``tuple`` objects. However, it can only point at |
22 | - other ``StaticTuple`` instances or strings. This allows us to remove it |
23 | - from the garbage collector (it cannot be in a cycle), it also allows us |
24 | - to intern the objects. In testing, this can reduce peak memory by |
25 | - 20-40%, and significantly improve performance by removing objects from |
26 | - being inspected by the garbage collector. (John Arbash Meinel) |
27 | +* ``bzrlib._static_tuple_pyx.StaticTuple`` is now available and used by |
28 | + the btree index parser. This class functions similarly to ``tuple`` |
29 | + objects. However, it can only point at other ``StaticTuple`` instances |
30 | + or strings. This allows us to remove it from the garbage collector (it |
31 | + cannot be in a cycle), it also allows us to intern the objects. In |
32 | + testing, this can reduce peak memory by 20-40%, and significantly |
33 | + improve performance by removing objects from being inspected by the |
34 | + garbage collector. (John Arbash Meinel) |
35 | |
36 | Testing |
37 | ******* |
38 | |
39 | === modified file 'bzrlib/_bencode_pyx.pyx' |
40 | --- bzrlib/_bencode_pyx.pyx 2009-06-05 01:48:32 +0000 |
41 | +++ bzrlib/_bencode_pyx.pyx 2009-10-15 18:31:17 +0000 |
42 | @@ -58,6 +58,13 @@ |
43 | void D_UPDATE_TAIL(Decoder, int n) |
44 | void E_UPDATE_TAIL(Encoder, int n) |
45 | |
46 | +# To maintain compatibility with older versions of pyrex, we have to use the |
47 | +# relative import here, rather than 'bzrlib._static_tuple_c' |
48 | +from _static_tuple_c cimport StaticTuple, StaticTuple_CheckExact, \ |
49 | + import_static_tuple_c |
50 | + |
51 | +import_static_tuple_c() |
52 | + |
53 | |
54 | cdef class Decoder: |
55 | """Bencode decoder""" |
56 | @@ -371,7 +378,8 @@ |
57 | self._encode_int(x) |
58 | elif PyLong_CheckExact(x): |
59 | self._encode_long(x) |
60 | - elif PyList_CheckExact(x) or PyTuple_CheckExact(x): |
61 | + elif (PyList_CheckExact(x) or PyTuple_CheckExact(x) |
62 | + or StaticTuple_CheckExact(x)): |
63 | self._encode_list(x) |
64 | elif PyDict_CheckExact(x): |
65 | self._encode_dict(x) |
66 | |
67 | === modified file 'bzrlib/_btree_serializer_pyx.pyx' |
68 | --- bzrlib/_btree_serializer_pyx.pyx 2009-10-08 05:12:01 +0000 |
69 | +++ bzrlib/_btree_serializer_pyx.pyx 2009-10-15 18:31:17 +0000 |
70 | @@ -38,6 +38,8 @@ |
71 | Py_ssize_t PyString_Size(object p) |
72 | Py_ssize_t PyString_GET_SIZE_ptr "PyString_GET_SIZE" (PyObject *) |
73 | char * PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *) |
74 | + char * PyString_AS_STRING(object) |
75 | + Py_ssize_t PyString_GET_SIZE(object) |
76 | int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len) |
77 | void PyString_InternInPlace(PyObject **) |
78 | int PyTuple_CheckExact(object t) |
79 | @@ -55,6 +57,12 @@ |
80 | # void *memrchr(void *s, int c, size_t n) |
81 | int strncmp(char *s1, char *s2, size_t n) |
82 | |
83 | +# It seems we need to import the definitions so that the pyrex compiler has |
84 | +# local names to access them. |
85 | +from _static_tuple_c cimport StaticTuple, \ |
86 | + import_static_tuple_c, StaticTuple_New, \ |
87 | + StaticTuple_Intern, StaticTuple_SET_ITEM, StaticTuple_CheckExact |
88 | + |
89 | |
90 | # TODO: Find some way to import this from _dirstate_helpers |
91 | cdef void* _my_memrchr(void *s, int c, size_t n): |
92 | @@ -71,6 +79,7 @@ |
93 | pos = pos - 1 |
94 | return NULL |
95 | |
96 | + |
97 | # TODO: Import this from _dirstate_helpers when it is merged |
98 | cdef object safe_string_from_size(char *s, Py_ssize_t size): |
99 | if size < 0: |
100 | @@ -94,6 +103,10 @@ |
101 | Py_DECREF_ptr(py_str) |
102 | return result |
103 | |
104 | +from bzrlib import _static_tuple_c |
105 | +# This sets up the StaticTuple C_API functionality |
106 | +import_static_tuple_c() |
107 | + |
108 | |
109 | cdef class BTreeLeafParser: |
110 | """Parse the leaf nodes of a BTree index. |
111 | @@ -133,6 +146,7 @@ |
112 | self._cur_str = NULL |
113 | self._end_str = NULL |
114 | self._header_found = 0 |
115 | + # keys are tuples |
116 | |
117 | cdef extract_key(self, char * last): |
118 | """Extract a key. |
119 | @@ -142,8 +156,9 @@ |
120 | """ |
121 | cdef char *temp_ptr |
122 | cdef int loop_counter |
123 | - # keys are tuples |
124 | - key = PyTuple_New(self.key_length) |
125 | + cdef StaticTuple key |
126 | + |
127 | + key = StaticTuple_New(self.key_length) |
128 | for loop_counter from 0 <= loop_counter < self.key_length: |
129 | # grab a key segment |
130 | temp_ptr = <char*>memchr(self._start, c'\0', last - self._start) |
131 | @@ -158,15 +173,19 @@ |
132 | last - self._start))) |
133 | raise AssertionError(failure_string) |
134 | # capture the key string |
135 | - # TODO: Consider using PyIntern_FromString, the only caveat is that |
136 | - # it assumes a NULL-terminated string, so we have to check if |
137 | - # temp_ptr[0] == c'\0' or some other char. |
138 | - key_element = safe_interned_string_from_size(self._start, |
139 | + if (self.key_length == 1 |
140 | + and (temp_ptr - self._start) == 45 |
141 | + and strncmp(self._start, 'sha1:', 5) == 0): |
142 | + key_element = safe_string_from_size(self._start, |
143 | + temp_ptr - self._start) |
144 | + else: |
145 | + key_element = safe_interned_string_from_size(self._start, |
146 | temp_ptr - self._start) |
147 | # advance our pointer |
148 | self._start = temp_ptr + 1 |
149 | Py_INCREF(key_element) |
150 | - PyTuple_SET_ITEM(key, loop_counter, key_element) |
151 | + StaticTuple_SET_ITEM(key, loop_counter, key_element) |
152 | + key = StaticTuple_Intern(key) |
153 | return key |
154 | |
155 | cdef int process_line(self) except -1: |
156 | @@ -176,6 +195,7 @@ |
157 | cdef char *ref_ptr |
158 | cdef char *next_start |
159 | cdef int loop_counter |
160 | + cdef Py_ssize_t str_len |
161 | |
162 | self._start = self._cur_str |
163 | # Find the next newline |
164 | @@ -211,12 +231,25 @@ |
165 | # Invalid line |
166 | raise AssertionError("Failed to find the value area") |
167 | else: |
168 | - # capture the value string |
169 | - value = safe_string_from_size(temp_ptr + 1, last - temp_ptr - 1) |
170 | + # Because of how conversions were done, we ended up with *lots* of |
171 | + # values that are identical. These are all of the 0-length nodes |
172 | + # that are referred to by the TREE_ROOT (and likely some other |
173 | + # directory nodes.) For example, bzr has 25k references to |
174 | + # something like '12607215 328306 0 0', which ends up consuming 1MB |
175 | + # of memory, just for those strings. |
176 | + str_len = last - temp_ptr - 1 |
177 | + if (str_len > 4 |
178 | + and strncmp(" 0 0", last - 4, 4) == 0): |
179 | + # This drops peak mem for bzr.dev from 87.4MB => 86.2MB |
180 | + # For Launchpad 236MB => 232MB |
181 | + value = safe_interned_string_from_size(temp_ptr + 1, str_len) |
182 | + else: |
183 | + value = safe_string_from_size(temp_ptr + 1, str_len) |
184 | # shrink the references end point |
185 | last = temp_ptr |
186 | + |
187 | if self.ref_list_length: |
188 | - ref_lists = [] |
189 | + ref_lists = StaticTuple_New(self.ref_list_length) |
190 | loop_counter = 0 |
191 | while loop_counter < self.ref_list_length: |
192 | ref_list = [] |
193 | @@ -248,18 +281,20 @@ |
194 | if temp_ptr == NULL: |
195 | # key runs to the end |
196 | temp_ptr = ref_ptr |
197 | + |
198 | PyList_Append(ref_list, self.extract_key(temp_ptr)) |
199 | - PyList_Append(ref_lists, tuple(ref_list)) |
200 | + ref_list = StaticTuple_Intern(StaticTuple(*ref_list)) |
201 | + Py_INCREF(ref_list) |
202 | + StaticTuple_SET_ITEM(ref_lists, loop_counter - 1, ref_list) |
203 | # prepare for the next reference list |
204 | self._start = next_start |
205 | - ref_lists = tuple(ref_lists) |
206 | - node_value = (value, ref_lists) |
207 | + node_value = StaticTuple(value, ref_lists) |
208 | else: |
209 | if last != self._start: |
210 | # unexpected reference data present |
211 | raise AssertionError("unexpected reference data present") |
212 | - node_value = (value, ()) |
213 | - PyList_Append(self.keys, (key, node_value)) |
214 | + node_value = StaticTuple(value, StaticTuple()) |
215 | + PyList_Append(self.keys, StaticTuple(key, node_value)) |
216 | return 0 |
217 | |
218 | def parse(self): |
219 | @@ -294,7 +329,6 @@ |
220 | cdef Py_ssize_t flat_len |
221 | cdef Py_ssize_t key_len |
222 | cdef Py_ssize_t node_len |
223 | - cdef PyObject * val |
224 | cdef char * value |
225 | cdef Py_ssize_t value_len |
226 | cdef char * out |
227 | @@ -303,13 +337,12 @@ |
228 | cdef int first_ref_list |
229 | cdef int first_reference |
230 | cdef int i |
231 | - cdef PyObject *ref_bit |
232 | cdef Py_ssize_t ref_bit_len |
233 | |
234 | - if not PyTuple_CheckExact(node): |
235 | - raise TypeError('We expected a tuple() for node not: %s' |
236 | + if not PyTuple_CheckExact(node) and not StaticTuple_CheckExact(node): |
237 | + raise TypeError('We expected a tuple() or StaticTuple() for node not: %s' |
238 | % type(node)) |
239 | - node_len = PyTuple_GET_SIZE(node) |
240 | + node_len = len(node) |
241 | have_reference_lists = reference_lists |
242 | if have_reference_lists: |
243 | if node_len != 4: |
244 | @@ -318,8 +351,17 @@ |
245 | elif node_len < 3: |
246 | raise ValueError('Without ref_lists, we need at least 3 entries not: %s' |
247 | % len(node)) |
248 | - # I don't expect that we can do faster than string.join() |
249 | - string_key = '\0'.join(<object>PyTuple_GET_ITEM_ptr_object(node, 1)) |
250 | + # TODO: We can probably do better than string.join(), namely |
251 | + # when key has only 1 item, we can just grab that string |
252 | + # And when there are 2 items, we could do a single malloc + len() + 1 |
253 | + # also, doing .join() requires a PyObject_GetAttrString call, which |
254 | + # we could also avoid. |
255 | + # TODO: Note that pyrex 0.9.6 generates fairly crummy code here, using the |
256 | + # python object interface, versus 0.9.8+ which uses a helper that |
257 | + # checks if this supports the sequence interface. |
258 | + # We *could* do more work on our own, and grab the actual items |
259 | + # lists. For now, just ask people to use a better compiler. :) |
260 | + string_key = '\0'.join(node[1]) |
261 | |
262 | # TODO: instead of using string joins, precompute the final string length, |
263 | # and then malloc a single string and copy everything in. |
264 | @@ -336,7 +378,7 @@ |
265 | refs_len = 0 |
266 | if have_reference_lists: |
267 | # Figure out how many bytes it will take to store the references |
268 | - ref_lists = <object>PyTuple_GET_ITEM_ptr_object(node, 3) |
269 | + ref_lists = node[3] |
270 | next_len = len(ref_lists) # TODO: use a Py function |
271 | if next_len > 0: |
272 | # If there are no nodes, we don't need to do any work |
273 | @@ -350,31 +392,31 @@ |
274 | # references |
275 | refs_len = refs_len + (next_len - 1) |
276 | for reference in ref_list: |
277 | - if not PyTuple_CheckExact(reference): |
278 | + if (not PyTuple_CheckExact(reference) |
279 | + and not StaticTuple_CheckExact(reference)): |
280 | raise TypeError( |
281 | 'We expect references to be tuples not: %s' |
282 | % type(reference)) |
283 | - next_len = PyTuple_GET_SIZE(reference) |
284 | + next_len = len(reference) |
285 | if next_len > 0: |
286 | # We will need (len - 1) '\x00' characters to |
287 | # separate the reference key |
288 | refs_len = refs_len + (next_len - 1) |
289 | - for i from 0 <= i < next_len: |
290 | - ref_bit = PyTuple_GET_ITEM_ptr_object(reference, i) |
291 | - if not PyString_CheckExact_ptr(ref_bit): |
292 | + for ref_bit in reference: |
293 | + if not PyString_CheckExact(ref_bit): |
294 | raise TypeError('We expect reference bits' |
295 | ' to be strings not: %s' |
296 | % type(<object>ref_bit)) |
297 | - refs_len = refs_len + PyString_GET_SIZE_ptr(ref_bit) |
298 | + refs_len = refs_len + PyString_GET_SIZE(ref_bit) |
299 | |
300 | # So we have the (key NULL refs NULL value LF) |
301 | key_len = PyString_Size(string_key) |
302 | - val = PyTuple_GET_ITEM_ptr_object(node, 2) |
303 | - if not PyString_CheckExact_ptr(val): |
304 | + val = node[2] |
305 | + if not PyString_CheckExact(val): |
306 | raise TypeError('Expected a plain str for value not: %s' |
307 | - % type(<object>val)) |
308 | - value = PyString_AS_STRING_ptr(val) |
309 | - value_len = PyString_GET_SIZE_ptr(val) |
310 | + % type(val)) |
311 | + value = PyString_AS_STRING(val) |
312 | + value_len = PyString_GET_SIZE(val) |
313 | flat_len = (key_len + 1 + refs_len + 1 + value_len + 1) |
314 | line = PyString_FromStringAndSize(NULL, flat_len) |
315 | # Get a pointer to the new buffer |
316 | @@ -396,14 +438,14 @@ |
317 | out[0] = c'\r' |
318 | out = out + 1 |
319 | first_reference = 0 |
320 | - next_len = PyTuple_GET_SIZE(reference) |
321 | + next_len = len(reference) |
322 | for i from 0 <= i < next_len: |
323 | if i != 0: |
324 | out[0] = c'\x00' |
325 | out = out + 1 |
326 | - ref_bit = PyTuple_GET_ITEM_ptr_object(reference, i) |
327 | - ref_bit_len = PyString_GET_SIZE_ptr(ref_bit) |
328 | - memcpy(out, PyString_AS_STRING_ptr(ref_bit), ref_bit_len) |
329 | + ref_bit = reference[i] |
330 | + ref_bit_len = PyString_GET_SIZE(ref_bit) |
331 | + memcpy(out, PyString_AS_STRING(ref_bit), ref_bit_len) |
332 | out = out + ref_bit_len |
333 | out[0] = c'\0' |
334 | out = out + 1 |
335 | |
336 | === modified file 'bzrlib/_static_tuple_c.c' |
337 | --- bzrlib/_static_tuple_c.c 2009-10-15 16:18:47 +0000 |
338 | +++ bzrlib/_static_tuple_c.c 2009-10-15 18:31:17 +0000 |
339 | @@ -418,9 +418,15 @@ |
340 | return NULL; /* There seems to be an error */ |
341 | } |
342 | if (result == Py_NotImplemented) { |
343 | - PyErr_BadInternalCall(); |
344 | Py_DECREF(result); |
345 | - return NULL; |
346 | + /* One side must have had a string and the other a StaticTuple. |
347 | + * This clearly means that they are not equal. |
348 | + */ |
349 | + if (op == Py_EQ) { |
350 | + Py_INCREF(Py_False); |
351 | + return Py_False; |
352 | + } |
353 | + result = PyObject_RichCompare(v_obj, w_obj, Py_EQ); |
354 | } |
355 | if (result == Py_False) { |
356 | /* This entry is not identical |
357 | |
358 | === modified file 'bzrlib/btree_index.py' |
359 | --- bzrlib/btree_index.py 2009-10-15 04:01:26 +0000 |
360 | +++ bzrlib/btree_index.py 2009-10-15 18:31:17 +0000 |
361 | @@ -163,6 +163,7 @@ |
362 | node_refs, _ = self._check_key_ref_value(key, references, value) |
363 | if key in self._nodes: |
364 | raise errors.BadIndexDuplicateKey(key, self) |
365 | + # TODO: StaticTuple |
366 | self._nodes[key] = (node_refs, value) |
367 | self._keys.add(key) |
368 | if self._nodes_by_key is not None and self._key_length > 1: |
369 | @@ -625,6 +626,7 @@ |
370 | for line in lines[2:]: |
371 | if line == '': |
372 | break |
373 | + # TODO: Switch to StaticTuple here. |
374 | nodes.append(tuple(map(intern, line.split('\0')))) |
375 | return nodes |
376 | |
377 | |
378 | === modified file 'bzrlib/builtins.py' |
379 | --- bzrlib/builtins.py 2009-10-08 16:32:43 +0000 |
380 | +++ bzrlib/builtins.py 2009-10-15 18:31:17 +0000 |
381 | @@ -431,7 +431,10 @@ |
382 | for node in bt.iter_all_entries(): |
383 | # Node is made up of: |
384 | # (index, key, value, [references]) |
385 | - self.outf.write('%s\n' % (node[1:],)) |
386 | + refs_as_tuples = tuple([tuple([tuple(ref) for ref in ref_list]) |
387 | + for ref_list in node[3]]) |
388 | + as_tuple = (tuple(node[1]), node[2], refs_as_tuples) |
389 | + self.outf.write('%s\n' % (as_tuple,)) |
390 | |
391 | |
392 | class cmd_remove_tree(Command): |
393 | |
394 | === modified file 'bzrlib/index.py' |
395 | --- bzrlib/index.py 2009-10-13 05:20:50 +0000 |
396 | +++ bzrlib/index.py 2009-10-15 18:31:17 +0000 |
397 | @@ -40,6 +40,7 @@ |
398 | debug, |
399 | errors, |
400 | ) |
401 | +from bzrlib.static_tuple import StaticTuple |
402 | |
403 | _HEADER_READV = (0, 200) |
404 | _OPTION_KEY_ELEMENTS = "key_elements=" |
405 | @@ -102,7 +103,7 @@ |
406 | |
407 | def _check_key(self, key): |
408 | """Raise BadIndexKey if key is not a valid key for this index.""" |
409 | - if type(key) != tuple: |
410 | + if type(key) not in (tuple, StaticTuple): |
411 | raise errors.BadIndexKey(key) |
412 | if self._key_length != len(key): |
413 | raise errors.BadIndexKey(key) |
414 | @@ -202,7 +203,9 @@ |
415 | if reference not in self._nodes: |
416 | self._check_key(reference) |
417 | absent_references.append(reference) |
418 | + # TODO: StaticTuple |
419 | node_refs.append(tuple(reference_list)) |
420 | + # TODO: StaticTuple |
421 | return tuple(node_refs), absent_references |
422 | |
423 | def add_node(self, key, value, references=()): |
424 | |
425 | === modified file 'bzrlib/repository.py' |
426 | --- bzrlib/repository.py 2009-10-08 22:53:13 +0000 |
427 | +++ bzrlib/repository.py 2009-10-15 18:31:17 +0000 |
428 | @@ -4319,6 +4319,13 @@ |
429 | ): |
430 | if versioned_file is None: |
431 | continue |
432 | + # TODO: key is often going to be a StaticTuple object |
433 | + # I don't believe we can define a method by which |
434 | + # (prefix,) + StaticTuple will work, though we could |
435 | + # define a StaticTuple.sq_concat that would allow you to |
436 | + # pass in either a tuple or a StaticTuple as the second |
437 | + # object, so instead we could have: |
438 | + # StaticTuple(prefix) + key here... |
439 | missing_keys.update((prefix,) + key for key in |
440 | versioned_file.get_missing_compression_parent_keys()) |
441 | except NotImplementedError: |
442 | |
443 | === added file 'bzrlib/static_tuple.py' |
444 | --- bzrlib/static_tuple.py 1970-01-01 00:00:00 +0000 |
445 | +++ bzrlib/static_tuple.py 2009-10-15 18:31:17 +0000 |
446 | @@ -0,0 +1,25 @@ |
447 | +# Copyright (C) 2009 Canonical Ltd |
448 | +# |
449 | +# This program is free software; you can redistribute it and/or modify |
450 | +# it under the terms of the GNU General Public License as published by |
451 | +# the Free Software Foundation; either version 2 of the License, or |
452 | +# (at your option) any later version. |
453 | +# |
454 | +# This program is distributed in the hope that it will be useful, |
455 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
456 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
457 | +# GNU General Public License for more details. |
458 | +# |
459 | +# You should have received a copy of the GNU General Public License |
460 | +# along with this program; if not, write to the Free Software |
461 | +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
462 | + |
463 | +"""Interface thunk for a StaticTuple implementation.""" |
464 | + |
465 | +try: |
466 | + from bzrlib._static_tuple_c import StaticTuple |
467 | +except ImportError, e: |
468 | + from bzrlib import osutils |
469 | + osutils.failed_to_load_extension(e) |
470 | + from bzrlib._static_tuple_py import StaticTuple |
471 | + |
472 | |
473 | === modified file 'bzrlib/tests/test__static_tuple.py' |
474 | --- bzrlib/tests/test__static_tuple.py 2009-10-12 18:10:24 +0000 |
475 | +++ bzrlib/tests/test__static_tuple.py 2009-10-15 18:31:17 +0000 |
476 | @@ -23,6 +23,7 @@ |
477 | _static_tuple_py, |
478 | errors, |
479 | osutils, |
480 | + static_tuple, |
481 | tests, |
482 | ) |
483 | |
484 | @@ -278,6 +279,16 @@ |
485 | self.assertCompareEqual(k3, (k1, ('foo', 'bar'))) |
486 | self.assertCompareEqual((k1, ('foo', 'bar')), k3) |
487 | |
488 | + def test_compare_mixed_depths(self): |
489 | + stuple = self.module.StaticTuple |
490 | + k1 = stuple(stuple('a',), stuple('b',)) |
491 | + k2 = stuple(stuple(stuple('c',), stuple('d',)), |
492 | + stuple('b',)) |
493 | + # This requires comparing a StaticTuple to a 'string', and then |
494 | + # interpreting that value in the next higher StaticTuple. This used to |
495 | # generate a PyErr_BadInternalCall. We now fall back to *something*. |
496 | + self.assertCompareNoRelation(k1, k2) |
497 | + |
498 | def test_hash(self): |
499 | k = self.module.StaticTuple('foo') |
500 | self.assertEqual(hash(k), hash(('foo',))) |
501 | @@ -416,3 +427,13 @@ |
502 | if self.module is _static_tuple_py: |
503 | return |
504 | self.assertIsNot(None, self.module._C_API) |
505 | + |
506 | + def test_static_tuple_thunk(self): |
507 | + # Make sure the right implementation is available from |
508 | + # bzrlib.static_tuple.StaticTuple. |
509 | + if self.module is _static_tuple_py: |
510 | + if CompiledStaticTuple.available(): |
511 | + # We will be using the C version |
512 | + return |
513 | + self.assertIs(static_tuple.StaticTuple, |
514 | + self.module.StaticTuple) |
515 | |
516 | === modified file 'bzrlib/util/_bencode_py.py' |
517 | --- bzrlib/util/_bencode_py.py 2009-06-10 03:56:49 +0000 |
518 | +++ bzrlib/util/_bencode_py.py 2009-10-15 18:31:17 +0000 |
519 | @@ -154,6 +154,13 @@ |
520 | encode_int(int(x), r) |
521 | encode_func[BooleanType] = encode_bool |
522 | |
523 | +try: |
524 | + from bzrlib._static_tuple_c import StaticTuple |
525 | +except ImportError: |
526 | + pass |
527 | +else: |
528 | + encode_func[StaticTuple] = encode_list |
529 | + |
530 | |
531 | def bencode(x): |
532 | r = [] |
533 | |
534 | === modified file 'setup.py' |
535 | --- setup.py 2009-10-12 17:03:40 +0000 |
536 | +++ setup.py 2009-10-15 18:31:17 +0000 |
537 | @@ -270,7 +270,6 @@ |
538 | |
539 | add_pyrex_extension('bzrlib._annotator_pyx') |
540 | add_pyrex_extension('bzrlib._bencode_pyx') |
541 | -add_pyrex_extension('bzrlib._btree_serializer_pyx') |
542 | add_pyrex_extension('bzrlib._chunks_to_lines_pyx') |
543 | add_pyrex_extension('bzrlib._groupcompress_pyx', |
544 | extra_source=['bzrlib/diff-delta.c']) |
545 | @@ -303,6 +302,8 @@ |
546 | add_pyrex_extension('bzrlib._simple_set_pyx') |
547 | ext_modules.append(Extension('bzrlib._static_tuple_c', |
548 | ['bzrlib/_static_tuple_c.c'])) |
549 | +add_pyrex_extension('bzrlib._btree_serializer_pyx') |
550 | + |
551 | |
552 | if unavailable_files: |
553 | print 'C extension(s) not found:' |
This is the same as an earlier patch for using StaticTuple as part of the btree code. It has a couple of small additions:

1) A small fix for 'bzr dump-btree' that casts the objects back to tuples for nicer formatting.

2) 'StaticTuple' is added as a type that 'bencode' knows how to deal with (it just treats it as another tuple/list object).

Arguably we probably want to end up with 'decode_...' returning StaticTuple instances. For now, though, this was all that was necessary to get the test suite to pass on my machine. (Though a lot of the tests that were failing on PQM weren't failing here anyway...)
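The bencode hookup in change (2) amounts to registering the StaticTuple type against the existing list encoder. A minimal sketch of that dispatch pattern (with stand-in encoder functions; this is not bzrlib's actual `_bencode_py` code, and `FrozenKey` is a made-up stand-in for StaticTuple):

```python
def encode_int(x, r):
    r.extend(('i', str(x), 'e'))

def encode_str(x, r):
    r.extend((str(len(x)), ':', x))

def encode_list(x, r):
    # Any sequence type registered against this function is bencoded the
    # same way: 'l' <encoded items> 'e'.
    r.append('l')
    for item in x:
        encode_func[type(item)](item, r)
    r.append('e')

encode_func = {int: encode_int, str: encode_str,
               list: encode_list, tuple: encode_list}

class FrozenKey(tuple):
    """Stand-in for StaticTuple: an immutable tuple-like key type."""

encode_func[FrozenKey] = encode_list   # the one-line registration

def bencode(x):
    r = []
    encode_func[type(x)](x, r)
    return ''.join(r)
```

For example, `bencode(FrozenKey(('ab', 'c')))` produces the same bytes as encoding the plain tuple `('ab', 'c')`.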