Code review comment for lp:~mnordhoff/loggerhead/statictuples

Revision history for this message
John A Meinel (jameinel) wrote :

...

>

>>
>>     PyObject *as_tuple = NULL;
>>
>>     if (!PySequence_Check(sequence)) {
>>         /* Accept arbitrary iterables by materializing them as a tuple. */
>>         as_tuple = PySequence_Tuple(sequence);
>>         if (as_tuple == NULL)
>>             return NULL;
>>         sequence = as_tuple;
>>     }
>>
>>     ...
>>
>>     Py_XDECREF(as_tuple);
>
> Oh, cool!
>
> (Are you suggesting you'll do it or I do it? Cuz I don't mind, but if
> you were going to...)
>
> (If it supports arbitrary iterables, it could be renamed to
> "from_iterable", though I guess it makes sense to keep it
> "from_sequence" for backwards compatibility and to discourage passing
> arbitrary iterables since it's less efficient...)

So I should comment a bit more. The code you have is something like:

for r, parents in graph.iteritems():
  val = tuple(p for p in parents if p in graph)

However, it turns out that list comprehensions are faster than
generator expressions, mostly because a generator expression carries
its own execution frame ('context') that a list comprehension doesn't have.

In essence, using a generator expression adds the overhead of an extra
function call. So doing the above as

for r, parents in graph.iteritems():
  val = tuple([p for p in parents if p in graph])

should actually be faster, and it has the distinct advantage of
producing a sequence rather than an iterator, so

for r, parents in graph.iteritems():
  val = StaticTuple.from_sequence([p for p in parents if p in graph])

just works :)

The only reason to prefer a generator expression over a list
comprehension is when the number of items would cause significant
memory overhead. That certainly isn't the case for parent graphs, where
each revision only has a small handful of parents.
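
If you want to verify that on your own machine, a quick comparison
along these lines should show the gap (just a sketch using the stdlib
timeit module; the dummy graph and sizes are placeholders, so the
absolute numbers will vary):

import timeit

setup = "graph = dict.fromkeys(range(100)); parents = list(range(200))"

# Generator expression: tuple() has to drive a separate generator frame.
gen_time = timeit.timeit("tuple(p for p in parents if p in graph)",
                         setup=setup, number=100000)
# List comprehension: built inline in the current frame, then copied.
list_time = timeit.timeit("tuple([p for p in parents if p in graph])",
                          setup=setup, number=100000)
print("genexp: %.3fs listcomp: %.3fs" % (gen_time, list_time))
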
...

> If we intern things, it probably makes sense to intern the whole 3-tuple.
>
> Interning things bothers me, though, since _static_tuple_py interns
> forever and I don't want to increase memory usage that much for people
> with older bzrlibs.
>
> (Also, interning hurts my brain a bit. :-P )

It is up to you, but interning saves *tons* of memory. You could use
a flag based on whether you have the compiled extensions...
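
Something like this would do it (a sketch only; I'm assuming the usual
bzrlib module names here, and maybe_intern is a made-up helper):

try:
    from bzrlib._static_tuple_c import StaticTuple
    _HAS_COMPILED_ST = True
except ImportError:
    from bzrlib._static_tuple_py import StaticTuple
    _HAS_COMPILED_ST = False

def maybe_intern(st):
    # Only intern when the compiled extension is present; the pure-Python
    # implementation interns forever, which costs memory on older bzrlibs.
    if _HAS_COMPILED_ST:
        return st.intern()
    return st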

>
>> I should also note that here:
>> 104 + data_statictuples = [[StaticTuple.from_sequence(t) for t in l] for
>> 105 + l in data_tuples]
>>
>> ^- You also almost definitely want to use a StaticTuple instead of that inner list; otherwise the cost of the list will eat up most of your savings from using StaticTuple. If the inner list always has 3 elements, you can use my form, or you could use another ".from_sequence()".
>
> Yeah, it uses the inner list because `where-merged` is populated in a
> second loop. I pretty much don't understand that code
> (loggerhead/wholehistory.py) or the performance implications of changing
> it, so I haven't looked into it yet.
>
> Hmm, a quick hack is easier than I thought. I'll check it out right now.
>
>> I'll also note that you really do want to create a local variable for "StaticTuple.from_sequence"; otherwise you pay an attribute lookup on every call.
>
> Ehh, didn't figure that was significant. OK, will do.
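
Putting both points together, something like this is what I had in
mind (a sketch only; I haven't looked at how wholehistory.py populates
where-merged either, so that loop is elided):

from_sequence = StaticTuple.from_sequence  # one attribute lookup, not one per call

# Keep the inner containers as mutable lists while the second loop
# still needs to append to them:
data_statictuples = [[from_sequence(t) for t in l] for l in data_tuples]
# ... second loop populating where-merged goes here ...
# Then freeze each inner list once nothing appends to it any more:
data_statictuples = [from_sequence(l) for l in data_statictuples]
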
>
>> In short, using StaticTuple.from_sequence(marshal.loads()) costs (on my machine, with Launchpad's repo) 256ms, up from 138ms for just marshal, up from 61ms for just zlib.decompress.
>>
>> The bigger question is whether it is worthwhile or not. My initial testing doesn't show a significant memory saving *just* from the marshal change; in fact, it may even show an increase. However, this was on a single branch. Also, tuples don't free their memory immediately: CPython keeps a free list of up to 2k tuples for every size from 1-20. Which means that doing:
>>
>> x = (a, b, c)
>> x = StaticTuple.from_sequence(x)
>>
>> Will actually keep the memory allocated as both a StaticTuple *and* a Tuple.
>>
>> However, eventually those tuples are likely to be re-used. But 'marshal.loads()' specifically is going to create all 60k+ (×3) tuples up front, and then you cast them into StaticTuples, leaving a lot of unused tuples around.
>
> Keeping a bunch of free tuples around is probably acceptable, since they
> will be reused next time a cache gets loaded, but it's still rather
> unfortunate. Huh.

It is only 2000 per size, though, versus the 60,000 or so that are
created. So it isn't a huge overhead.
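
For reference, the conversion I was timing looks roughly like this (a
sketch; load_cache is a made-up name, and the actual cache layout in
loggerhead is more nested than a flat list):

import marshal
import zlib

from bzrlib._static_tuple_c import StaticTuple  # or the _static_tuple_py fallback

def load_cache(compressed):
    # marshal.loads() materializes ordinary tuples; converting them
    # afterwards strands up to ~2k tuples per size on CPython's free
    # list, out of the 60k+ that were created.
    data_tuples = marshal.loads(zlib.decompress(compressed))
    from_sequence = StaticTuple.from_sequence
    return [from_sequence(t) for t in data_tuples]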

John
=:->
