Code review comment for lp:~mnordhoff/loggerhead/statictuples

Matt Nordhoff (mnordhoff) wrote:

(Oh, forgot about replying to this again.)

John A Meinel wrote:
> So I should comment a bit more. The code you have is something like:
>
> for r, parents in graph.iteritems():
>     val = tuple(p for p in parents if p in graph)
>
> However, it turns out that list comprehensions are faster than
> generator expressions, mostly because a generator carries a 'context'
> (its own frame) that list comprehensions don't have.
>
> In essence if you use a generator, it adds the overhead of 1 function
> call. So doing the above as
>
> for r, parents in graph.iteritems():
>     val = tuple([p for p in parents if p in graph])
>
> should actually be faster. And it has the distinct advantage that it
> produces a sequence rather than an iterator, so that
>
> for r, parents in graph.iteritems():
>     val = StaticTuple.from_sequence([p for p in parents if p in graph])
>
> just works :)
>
> The only reason not to use a list comprehension over a generator is when
> the number of items would cause significant memory overhead. That
> certainly isn't true in the case of parent graphs, where you only have a
> small handful of keys.
> ...

OK. Done.

I didn't benchmark it, though; my system is rather overloaded right
now. And I'm lazy. :-P
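
If I do get around to it, the comparison is only a few lines (a rough
timeit sketch; the toy graph below is made up, not Loggerhead's real
ancestry data):

    import timeit

    setup = (
        "graph = dict(('r%d' % i, ('r%d' % (i - 1),))"
        " for i in range(1000))")
    gen_expr = (
        "for r, parents in graph.iteritems():\n"
        "    val = tuple(p for p in parents if p in graph)")
    list_comp = (
        "for r, parents in graph.iteritems():\n"
        "    val = tuple([p for p in parents if p in graph])")

    # The list comprehension should win: the generator expression pays
    # for an extra frame/function call on every tuple() invocation.
    print timeit.Timer(gen_expr, setup).timeit(number=1000)
    print timeit.Timer(list_comp, setup).timeit(number=1000)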

>> If we intern things, it probably makes sense to intern the whole 3-tuple.
>
>> Interning things bothers me, though, since _static_tuple_py interns
>> forever and I don't want to increase memory usage that much for people
>> with older bzrlibs.
>
>> (Also, interning hurts my brain a bit. :-P )
>
> It is up to you, but it saves *tons* of memory otherwise. You could use
> a flag based on whether you have the compiled extensions...

Eh. I'll work on interning next.

I actually wrote the code for using a flag based on whether the compiled
extensions are available, but I'm not sure I'll use it yet. I don't like
the overhead. :-P
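
The shape of it is roughly this (a sketch; the import paths follow
bzrlib's C/Python module split, but `_interning_is_cheap` is just a
placeholder name of mine):

    try:
        # Compiled extension available: its intern table can drop
        # entries again, so enabling interning is relatively safe.
        from bzrlib._static_tuple_c import StaticTuple
        _interning_is_cheap = True
    except ImportError:
        # Pure-Python fallback: _static_tuple_py interns forever, so
        # leave interning off to avoid the permanent memory cost.
        from bzrlib._static_tuple_py import StaticTuple
        _interning_is_cheap = False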

I wonder how much of an impact interning the stringified revnos would
have? They're tiny, but they add up, and most of them are duplicates...
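
Something as dumb as a module-level cache would probably cover it (a
sketch; `_revno_cache` and `intern_revno` are hypothetical names):

    _revno_cache = {}

    def intern_revno(revno_str):
        # Return the canonical copy of this revno string ('1', '1.2.3',
        # ...), so the many duplicates all share one object.
        return _revno_cache.setdefault(revno_str, revno_str)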

>> Keeping a bunch of free tuples around is probably acceptable, since they
>> will be reused next time a cache gets loaded, but it's still rather
>> unfortunate. Huh.
>
> It is only 2000, though. Versus the 60,000 or so that are created. So it
> isn't a huge overhead.

2000 tuples is _nothing_. There are already 50-60k tuples in memory
before Loggerhead gets a single request, from all of the libraries and
everything.

> John
> =:->