Code review comment for lp:~mnordhoff/loggerhead/statictuples

John A Meinel (jameinel) wrote :

I'm obviously not an official reviewer, but I figured I'd poke my head in here.

I'm a bit surprised at some of the changes that seem pretty unrelated. Such as:
195 if (base < 100) and (dot != 0):
196 - out += '.%d' % (dot,)
197 + out += '.%d' % (dot)

(I realize that the latter probably works, but if 'dot' ever ends up as a tuple, you'll get a rather cryptic failure mode.)
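
To make the failure mode concrete (a made-up value for 'dot'):

dot = (5, 1)          # suppose 'dot' unexpectedly ends up as a tuple
out = '.%d' % (dot,)  # fails loudly: "%d format: a number is required, not tuple"
out = '.%d' % (dot)   # same as '.%d' % dot -- fails with the far more cryptic
                      # "not all arguments converted during string formatting"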

Also, I would be quite skeptical about this change:
 @classmethod
223 - def is_installed(cls):
224 - return os.environ.get(cls._reloader_environ_key)
225 + def is_installed(self):
226 + return os.environ.get(self._reloader_environ_key)

^- You are changing a "classmethod" to use the "self" attribute, which is, at best, confusing. You don't have an instance here, you just have the Class object, hence why the original author used "cls" to clarify that.
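
Just to spell out why "cls" is the clearer name (a toy stand-in class; the environ key is a placeholder, not the real one):

import os

class Reloader(object):
    _reloader_environ_key = 'EXAMPLE_RELOADER_KEY'   # placeholder value

    @classmethod
    def is_installed(self):
        # @classmethod passes the *class* as the first argument whatever you
        # call it, so this works -- but 'self' implies an instance that never
        # exists here.
        return os.environ.get(self._reloader_environ_key)

Reloader.is_installed()   # called on the class itself, no instance involved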

279 - revision_graph[key] = tuple(parent for parent in parents if parent
280 - in revision_graph)
281 + # TODO: It'd be nice if from_sequence supported generators.
282 + revision_graph[key] = StaticTuple.from_sequence(tuple(
283 + parent for parent in parents if parent in revision_graph))

^- It probably wouldn't be hard to add this sort of thing to FromSequence. Namely change these lines:
    if (!PySequence_Check(sequence)) {
        PyErr_Format(PyExc_TypeError, "Type %s is not a sequence type",
                     Py_TYPE(sequence)->tp_name);
        return NULL;
    }

to

PyObject *as_tuple = NULL;

if (!PySequence_Check(sequence)) {
    /* e.g. a generator: materialize it into a real tuple first */
    as_tuple = PySequence_Tuple(sequence);
    if (as_tuple == NULL)
        return NULL;
    sequence = as_tuple;
}

...

Py_XDECREF(as_tuple);
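
With a change along those lines, the TODO in the hunk above could go away; something like this (untested sketch) should be enough:

# the generator expression goes straight into from_sequence,
# no intermediate tuple() needed
revision_graph[key] = StaticTuple.from_sequence(
    parent for parent in parents if parent in revision_graph)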

Here are some performance numbers for a Launchpad tree. This is ~66k revisions.
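
("TIMEIT" below is shorthand; each figure comes from something along these lines, with the statement swapped out per test:)

import timeit
n = 10
t = timeit.timeit(
    "o = marshal.loads(zlib.decompress(open('test.data', 'rb').read()))",
    setup="import marshal, zlib", number=n)
print "%.0f msec per loop" % (t / n * 1000)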

 61 msec TIMEIT "zlib.decompress(open('test.data', 'rb').read())"
139 msec TIMEIT "o = marshal.loads(zlib.decompress(open('test.data', 'rb').read()))"
  (78 msec to marshal.loads())

 76 msec TIMEIT "zlib.decompress(open('test.pickle2.z', 'rb').read())"
273 msec TIMEIT "o = cPickle.loads(....)"
  (196 msec to cPickle.loads())

So marshal is *almost* as fast as zlib on the same data.

 97 msec TIMEIT "zlib.decompress(...'test.json.z')"
356 msec TIMEIT "simplejson.loads(...'test.json.z')"

(the raw JSON is 16 MB)

 94 msec TIMEIT "zlib.decompress(...'test.bencode.z')"
271 msec TIMEIT "o = _bencode_pyx.bdecode(...'test.bencode.z')"

So bdecode is as fast as cPickle, and we can probably teach bdecode(as_tuples=True) to return StaticTuple objects. The existing as-tuple variant is a bit slower, though:

306 msec TIMEIT "o = _bencode_pyx.bdecode_as_tuple(...)"

 92 msec TIMEIT "zlib.decompress(...'test.pickle2-st.z')"
335 msec TIMEIT "o = cPickle.loads(...'test.pickle2-st.z')"

So the StaticTuple form is slightly slower than the tuple form. My guess is that cPickle has a bit of extra overhead when you don't use built-in types. (Like it reads in as a list/tuple, and then passes the data to the constructor.)
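
Roughly the picture I have in mind (a toy stand-in, assuming StaticTuple goes through a __reduce__-style hook when pickled):

class MiniStaticTuple(tuple):
    # cPickle stores the class plus the plain-tuple contents; on load it
    # rebuilds the tuple first and then makes one extra call into the
    # constructor to get the MiniStaticTuple back.
    def __reduce__(self):
        return (MiniStaticTuple, (tuple(self),))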

Now, what about using marshal plus casting to StaticTuples:

import marshal, zlib
from bzrlib import static_tuple

o = marshal.loads(zlib.decompress(...))
st = static_tuple.StaticTuple
as_st = static_tuple.StaticTuple.from_sequence
o2 = [st(as_st(x[0]), as_st(x[1]), as_st(x[2])) for x in o]

256 msec TIMEIT "o2 = ..."

So it turns out that using 'marshal' + casting the data into StaticTuple instances is faster than using cPickle.

As for whether things should be interned... I would say that you probably only benefit if you are sharing between branches. Namely, the entries up there are:
  [(merge-info, where-merged, parents)].
        `merge-info` is (seq, revid, merge_depth, revno_str, end_of_merge) --
        like a merged sorted list, but the revno is stringified.
        `where-merged` is a tuple of revisions that have this revision as a
        non-lefthand parent. Finally, `parents` is just the usual list of
        parents of this revision.

However, btree indexes use a tuple of parent *keys* (tuples), not a tuple of parent-ids (strings). So you don't actually share the data there. Though graph.iter_ancestry() is probably using the same parents lists, and thus could share with this revision-info cache. (But I would have guessed that you would only have *one* of them at a time.)

That said, if you do load 2 branches of the same project at the same time, you will save *tremendously* by interning pretty much everything. You could probably intern at the "outer" tuple if you wanted. Since I think the inner data isn't going to be repeated much outside of that.
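
For instance, interning at the outer level could look something like this (a sketch built on the [(merge-info, where-merged, parents)] layout above):

from bzrlib import static_tuple

st = static_tuple.StaticTuple
as_st = st.from_sequence            # local alias: one attribute lookup, not one per call

def intern_entries(entries):
    result = []
    for merge_info, where_merged, parents in entries:
        entry = st(as_st(merge_info), as_st(where_merged), as_st(parents))
        result.append(entry.intern())   # identical entries get shared across branches
    return result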

I should also note that here:
104 + data_statictuples = [[StaticTuple.from_sequence(t) for t in l] for
105 + l in data_tuples]

^- You also almost certainly want to use a StaticTuple instead of that inner list. Otherwise the cost of the list will eat up most of the savings you get from using ST. If it is always a 3-item list, you can use my form from above, or you could use another ".from_sequence()".

I'll also note that you really do want to create a local variable for "StaticTuple.from_sequence", otherwise you pay an attribute lookup on every call.
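
Putting those two together, a sketch reusing the names from the hunk above (and assuming each inner list always has 3 items):

as_st = StaticTuple.from_sequence   # looked up once, outside the loop
data_statictuples = [StaticTuple(as_st(l[0]), as_st(l[1]), as_st(l[2]))
                     for l in data_tuples]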

In short, using StaticTuple.from_sequence(marshal.loads()) costs (on my machine, with Launchpad's repo) 256 ms, up from 138 ms for just marshal, which is itself up from 61 ms for just zlib.decompress.

The bigger question is whether it is worthwhile or not. My initial testing doesn't show significant memory savings *just* from the marshal change; in fact, it may even show an increase. However, this was on a single branch. Also, tuples don't free their memory: CPython keeps a free list of up to 2k tuples for every size from 1-20. That means doing:

x = (a, b, c)
x = StaticTuple.from_sequence(x)

Will actually keep the memory allocated as both a StaticTuple *and* a Tuple.

Eventually those tuples are likely to be re-used, of course. But 'marshal.loads()' specifically is going to create all 60k+ (*3) tuples up front, and casting them into StaticTuples then leaves a lot of unused tuples sitting around.
