Merge lp:~spiv/bzr/gc-batching into lp:bzr/2.0

Proposed by Andrew Bennetts
Status: Merged
Merge reported by: Andrew Bennetts
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/gc-batching
Merge into: lp:bzr/2.0
Diff against target: 18 lines
To merge this branch: bzr merge lp:~spiv/bzr/gc-batching
Reviewer         Review Type    Date Requested    Status
John A Meinel                                      Approve
Martin Pool                                        Approve
Review via email: mp+10643@code.launchpad.net
Revision history for this message
Andrew Bennetts (spiv) wrote :

Fixes #402657, at least for the originally reported case of "bzr branch http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel".

It changes the implementation of GroupCompressVersionedFiles._get_remaining_record_stream to batch up block fetching a bit more. It used to call _get_block once per key, which would do a single IO request for each block, and many of the blocks were very small (a few hundred bytes) for some reason. Now instead it essentially accumulates a list of blocks to fetch and only performs the IO when the expected fetch size is reasonably large (256kB currently, basically an arbitrary choice), or if the batch needs to be flushed before returning keys from a different source (such as self._unadded_refs).
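
For illustration, here is a minimal sketch of that accumulate-then-flush pattern. It is not the actual bzrlib code: the fetch_blocks callable and the (index, start, length) memo shape are just assumptions for the sketch.

    BATCH_SIZE = 256 * 1024  # flush threshold in bytes, mirroring the arbitrary choice above

    def batched_fetch(read_memos, fetch_blocks, batch_size=BATCH_SIZE):
        """Yield (read_memo, block) pairs, grouping many small reads per IO.

        read_memos are assumed to be (index, start, length) tuples, and
        fetch_blocks(memos) is assumed to return blocks in the same order.
        """
        pending = []
        pending_bytes = 0
        for memo in read_memos:
            pending.append(memo)
            pending_bytes += memo[2]  # estimated byte length of this block
            if pending_bytes > batch_size:
                # One IO request for the whole batch instead of one per key.
                for pair in zip(pending, fetch_blocks(pending)):
                    yield pair
                pending = []
                pending_bytes = 0
        if pending:
            # Final flush for whatever is left over.
            for pair in zip(pending, fetch_blocks(pending)):
                yield pair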

The code is still somewhat at the mercy of the ordering asked for by the caller; the "bzr branch ..." case does 'unordered' IO, and this patch helps significantly there (based on log+http traces). Cases that trigger other orderings might not be helped as much.

_get_block is now gone, replaced by _get_blocks. There's a new class, _BatchingBlockFetcher, which has most of the batching logic, although the decision about when to actually fetch the batch is still in _get_remaining_record_stream. _get_remaining_record_stream is shorter and clearer now, which is nice.

I haven't managed to write any convincing automated tests for this improvement, so there are no test changes in this patch :( . I have tested it quite a bit manually, and I'm confident it is as correct as the old code, and I think the new code is pretty clear. So I'm pretty happy with it despite that. Suggestions for good tests are welcome, of course!
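
For what it's worth, one possible shape for such a test (sketched here with a hypothetical wrapper, not an existing bzrlib test helper) would be to count the IO requests made while fetching many small records:

    class CountingAccess(object):
        """Hypothetical test double: records each get_raw_records() call."""

        def __init__(self, real_access):
            self._real_access = real_access
            self.requests = []

        def get_raw_records(self, memos):
            memos = list(memos)
            self.requests.append(memos)
            return self._real_access.get_raw_records(memos)

A test could substitute something like this for the object providing get_raw_records, stream a few hundred small texts, and assert that the number of recorded requests stays well below the number of keys.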

At one point yesterday I observed a small (~5%), reproducible, improvement in "co ." time for Launchpad, but I cannot reproduce a significant difference at the moment; seemingly something to do with throwing away and refetching my launchpad/devel branch? (I'm actually seeing 50% worse times than yesterday, even with bzr.dev!) Anyway, it's definitely no worse and still perhaps a little better even for local IO, so that's good.

Revision history for this message
John A Meinel (jameinel) wrote :

I suspect yet another lp:mad bug here, as I did not get an email to review this proposal. (I just double checked my inbox.)

Anyway, I'm working on a review, but between lp:mad getting the content of the diff wrong and it not sending me an email, it will be a bit... :)

Revision history for this message
John A Meinel (jameinel) wrote :

To start with, I think your layering here is pretty nice. Adding an object does make managing the state a bit clearer. get_record_stream() is complex enough that it probably should have started out as an object interface rather than just a simple generator. Oh well.

In short, I think we have a race condition because of how LRUSizeCache (or just LRUCache) interacts with _get_blocks() being called in arbitrary order. We need a different caching algorithm that ensures requested blocks are never flushed from the cache while they are still needed.

The comment here is no longer correct:
+        for read_memo in read_memos:
+            try:
+                yield cached[read_memo]
+            except KeyError:
+                # read the group
+                zdata = raw_records.next()
+                # decompress - whole thing - this is not a bug, as it
+                # permits caching. We might want to store the partially
+                # decompresed group and decompress object, so that recent
+                # texts are not penalised by big groups.
+                block = GroupCompressBlock.from_bytes(zdata)
+                self._group_cache[read_memo] = block
+                cached[read_memo] = block
+                yield block

^- We are caching the 'block' and not the raw content anymore. And the block may only partially evaluate the compressed content.

However, I'm more concerned that in this loop:
+        for read_memo in read_memos:
+            if read_memo in cached:
+                # Don't fetch what we already have
+                continue
+            if read_memo in not_cached_seen:
+                # Don't try to fetch the same data twice
+                continue
+            not_cached.append(read_memo)
+            not_cached_seen.add(read_memo)
+        raw_records = self._access.get_raw_records(not_cached)
+        for read_memo in read_memos:
+            try:
+                yield cached[read_memo]
+            except KeyError:
+                # read the group
+                zdata = raw_records.next()
+                # decompress - whole thing - this is not a bug, as it
+                # permits caching. We might want to store the partially
+                # decompresed group and decompress object, so that recent
+                # texts are not penalised by big groups.
+                block = GroupCompressBlock.from_bytes(zdata)
+                self._group_cache[read_memo] = block
+                cached[read_memo] = block
+                yield block

^- There is an assumption that raw_records is in the exact ordering of 'read_memos'.

And I think there is another small assumption in here: the code that filters out duplicates requires that groups always fit in the cache and aren't expired before you get to the duplicate. Perhaps an example:

Assume we have the groups G1 through G4, and that texts are referred to by [group, text] pairs, e.g. [G1,T1].

If the request ends up being for:

[G1,T1], [G2, T1-T100], [G3, T1-T100], [G1,T2]

The code above will not request G1 twice.
However, it will put G2 and G3 into the cache in the meantime, which gives G1 time to be flushed from the cache before its second use.
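
To make that concrete, here is a standalone toy of the eviction, using a hypothetical two-entry LRU in place of LRUSizeCache; the second reference to G1 misses because G2 and G3 pushed it out:

    from collections import OrderedDict

    class TinyLRU(object):
        """Toy stand-in for an LRU cache that keeps at most max_items entries."""

        def __init__(self, max_items):
            self.max_items = max_items
            self._data = OrderedDict()

        def __contains__(self, key):
            return key in self._data

        def __setitem__(self, key, value):
            self._data[key] = value
            while len(self._data) > self.max_items:
                self._data.popitem(last=False)  # evict the oldest entry

    cache = TinyLRU(max_items=2)
    cache['G1'] = 'block 1'   # first reference to G1 caches it...
    cache['G2'] = 'block 2'
    cache['G3'] = 'block 3'   # ...but inserting G3 evicts G1 again
    assert 'G1' not in cache  # so the later [G1,T2] lookup would miss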

Even more worrisome...


review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Needs Fixing
> To start with, I think your layering here is pretty nice. Adding an object
> does make managing the state a bit clearer. get_record_stream() is complex
> enough that it probably should have started out as an object interface rather
> than just a simple generator. Oh well.

Yes. I'm glad I structured it this way, it feels quite nice to me once I
figured out which bits to move out of _get_remaining_record_stream.

> In short, I think we have a race condition because of how LRUSizeCache (or
> just LRUCache) interacts with _get_blocks() being called in arbitrary order.
> We need a different caching algorithm that ensures requested blocks are never
> flushed from the cache while they are still needed.

Ok, I see that. I've fixed it by making add_key grab and keep cached data
immediately; we know we're about to use it so it would be perverse to allow it
to fall out of LRUCache and then not be used. This shifts some of the "plan
what to get" logic out of yield_factories to add_key, which I was considering
doing anyway.

> The comment here is no longer correct:
> +        for read_memo in read_memos:
> +            try:
> +                yield cached[read_memo]
> +            except KeyError:
> +                # read the group
> +                zdata = raw_records.next()
> +                # decompress - whole thing - this is not a bug, as it
> +                # permits caching. We might want to store the partially
> +                # decompresed group and decompress object, so that recent
> +                # texts are not penalised by big groups.
> +                block = GroupCompressBlock.from_bytes(zdata)
> +                self._group_cache[read_memo] = block
> +                cached[read_memo] = block
> +                yield block
>
> ^- We are caching the 'block' and not the raw content anymore. And the block may only partially evaluate the compressed content.

Ok. I've deleted that large, wrong comment and just put this at the top of the
except block:

                # Read the block, and cache it.

> However, I'm more concerned that in this loop:
[...]
>
> ^- There is an assumption that raw_records is in the exact ordering of 'read_memos'.

I've added the assertion you suggest below (without an 'assert' statement, of
course).

> And I think there is another small assumption in here, which is that the code
> that filters out duplicates is going to require that groups always perfectly
> fit in cache and aren't expired before you get to the duplicate. Perhaps an
> example:
[...]
> Even more worrisome, is "large groups" which may never get put into the cache
> in the first place. (LRUSizeCache says "if a request is larger than my size,
> don't cache it".)
>
> I think we'll never run into these bugs in test data, but we'll see them 'in
> the wild' once we have data that may not fit well in the cache.

Yeah, I think you're right. I wish that we could make the test data encounter
these cases as much as wild data...

> So I think what we really need is a different caching logic. Namely that
> "_get_blocks()" could keep a counter for how many times it needs a given
> block, and d...


=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py 2009-08-25 07:36:53 +0000
+++ bzrlib/groupcompress.py 2009-08-26 05:41:49 +0000
@@ -44,12 +44,15 @@
     VersionedFiles,
     )
 
+# Minimum number of uncompressed bytes to try fetch at once when retrieving
+# groupcompress blocks.
+BATCH_SIZE = 2**16
+
 _USE_LZMA = False and (pylzma is not None)
 
 # osutils.sha_string('')
 _null_sha1 = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
 
-
 def sort_gc_optimal(parent_map):
     """Sort and group the keys in parent_map into groupcompress order.
 
@@ -984,25 +987,45 @@
         self.gcvf = gcvf
         self.locations = locations
         self.keys = []
+        self.batch_memos = {}
+        self.memos_to_get = []
         self.total_bytes = 0
         self.last_read_memo = None
         self.manager = None
 
     def add_key(self, key):
-        """Add another to key to fetch."""
+        """Add another to key to fetch.
+
+        :return: The estimated number of bytes needed to fetch the batch so
+            far.
+        """
         self.keys.append(key)
         index_memo, _, _, _ = self.locations[key]
         read_memo = index_memo[0:3]
-        # This looks a bit dangerous, but it's ok: we're assuming that memos in
-        # _group_cache now will still be there when yield_factories is called
-        # (and that uncached memos don't become cached). This ought to be
-        # true. But if it isn't that's ok, yield_factories will still work.
-        # The only negative effect is that the estimated 'total_bytes' value
-        # here will be wrong, so we might fetch bigger/smaller batches than
-        # intended.
-        if read_memo not in self.gcvf._group_cache:
+        # Three possibilities for this read_memo:
+        #  - it's already part of this batch; or
+        #  - it's not yet part of this batch, but is already cached; or
+        #  - it's not yet part of this batch and will need to be fetched.
+        if read_memo in self.batch_memos:
+            # This read memo is already in this batch.
+            return self.total_bytes
+        try:
+            cached_block = self.gcvf._group_cache[read_memo]
+        except KeyError:
+            # This read memo is new to this batch, and the data isn't cached
+            # either.
+            self.batch_memos[read_memo] = None
+            self.memos_to_get.append(read_memo)
             byte_length = read_memo[2]
             self.total_bytes += byte_length
+        else:
+            # This read memo is new to this batch, but cached.
+            # Keep a reference to the cached block in batch_memos because it's
+            # certain that we'll use it when this batch is processed, but
+            # there's a risk that it would fall out of _group_cache between now
+            # and then.
+            self.batch_memos[read_memo] = cached_block
+        return self.total_bytes
 
     def _flush_manager(self):
         if self.manager is not None:
@@ -1021,18 +1044,11 @@
         """
         if self.manager is None and not self.keys:
             return
-        # First, determine the list of memos to get.
-        memos_to_get = []
-        last_read_memo = self.last_read_memo
-        for key in self.keys:
-            index_memo = self.locations[key][0]
-            read_memo = index_memo[:3]
-            if last_read_memo != read_memo:
-                memos_to_get.append(read_memo)
-                last_read_memo = read_memo
-        # Second, we fetch all those memos in one batch.
-        blocks = self.gcvf._get_blocks(memos_to_get)
-        # Finally, we turn blocks into factories and yield them.
+        # Fetch all memos in this batch.
+        blocks = self.gcvf._get_blocks(self.memos_to_get)
+        # Turn blocks into factories and yield them.
+        memos_to_get_stack = list(self.memos_to_get)
+        memos_to_get_stack.reverse()
         for key in self.keys:
             index_memo, _, parents, _ = self.locations[key]
             read_memo = index_memo[:3]
@@ -1042,9 +1058,19 @@
                 # now, so yield records
                 for factory in self._flush_manager():
                     yield factory
-                # Now start a new manager. The next block from _get_blocks
-                # will be the block we need.
-                block = blocks.next()
+                # Now start a new manager.
+                if memos_to_get_stack and memos_to_get_stack[-1] == read_memo:
+                    # The next block from _get_blocks will be the block we
+                    # need.
+                    block_read_memo, block = blocks.next()
+                    if block_read_memo != read_memo:
+                        raise AssertionError(
+                            "block_read_memo out of sync with read_memo"
+                            "(%r != %r)" % (block_read_memo, read_memo))
+                    self.batch_memos[read_memo] = block
+                    memos_to_get_stack.pop()
+                else:
+                    block = self.batch_memos[read_memo]
                 self.manager = _LazyGroupContentManager(block)
                 self.last_read_memo = read_memo
             start, end = index_memo[3:5]
@@ -1053,6 +1079,8 @@
             for factory in self._flush_manager():
                 yield factory
         del self.keys[:]
+        self.batch_memos.clear()
+        del self.memos_to_get[:]
         self.total_bytes = 0
 
 
@@ -1222,7 +1250,8 @@
     def _get_blocks(self, read_memos):
         """Get GroupCompressBlocks for the given read_memos.
 
-        Blocks are returned in the order specified in read_memos.
+        :returns: a series of (read_memo, block) pairs, in the order they were
+            originally passed.
         """
         cached = {}
         for read_memo in read_memos:
@@ -1246,18 +1275,14 @@
         raw_records = self._access.get_raw_records(not_cached)
         for read_memo in read_memos:
             try:
-                yield cached[read_memo]
+                yield read_memo, cached[read_memo]
             except KeyError:
-                # read the group
+                # Read the block, and cache it.
                 zdata = raw_records.next()
-                # decompress - whole thing - this is not a bug, as it
-                # permits caching. We might want to store the partially
-                # decompresed group and decompress object, so that recent
-                # texts are not penalised by big groups.
                 block = GroupCompressBlock.from_bytes(zdata)
                 self._group_cache[read_memo] = block
                 cached[read_memo] = block
-                yield block
+                yield read_memo, block
 
     def get_missing_compression_parent_keys(self):
         """Return the keys of missing compression parents.
@@ -1432,34 +1457,32 @@
         # Batch up as many keys as we can until either:
         #  - we encounter an unadded ref, or
         #  - we run out of keys, or
-        #  - the total bytes to retrieve for this batch > 256k
+        #  - the total bytes to retrieve for this batch > BATCH_SIZE
         batcher = _BatchingBlockFetcher(self, locations)
-        BATCH_SIZE = 2**18
        for source, keys in source_keys:
             if source is self:
                 for key in keys:
                     if key in self._unadded_refs:
                         # Flush batch, then yield unadded ref from
                         # self._compressor.
-                        for _ in batcher.yield_factories(full_flush=True):
-                            yield _
+                        for factory in batcher.yield_factories(full_flush=True):
+                            yield factory
                         bytes, sha1 = self._compressor.extract(key)
                         parents = self._unadded_refs[key]
                         yield FulltextContentFactory(key, parents, sha1, bytes)
                         continue
-                    batcher.add_key(key)
-                    if batcher.total_bytes > BATCH_SIZE:
+                    if batcher.add_key(key) > BATCH_SIZE:
                         # Ok, this batch is big enough. Yield some results.
-                        for _ in batcher.yield_factories():
-                            yield _
+                        for factory in batcher.yield_factories():
+                            yield factory
             else:
-                for _ in batcher.yield_factories(full_flush=True):
-                    yield _
+                for factory in batcher.yield_factories(full_flush=True):
+                    yield factory
                 for record in source.get_record_stream(keys, ordering,
                                                        include_delta_closure):
                     yield record
-        for _ in batcher.yield_factories(full_flush=True):
-            yield _
+        for factory in batcher.yield_factories(full_flush=True):
+            yield factory
 
     def get_sha1s(self, keys):
         """See VersionedFiles.get_sha1s()."""
Revision history for this message
Martin Pool (mbp) wrote :

I looked over this with spiv, and it looks like it's a reasonable change to merge to 2.0. It needs to be cherrypicked because the current change is based off trunk.

It would still be nice if John could check that this addresses all his comments.

As a follow-on it would be good to add some specific tests for the _BatchingBlockFetcher.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

...

>> So I think what we really need is a different caching logic. Namely that
>> "_get_blocks()" could keep a counter for how many times it needs a given
>> block, and doesn't flush the block out of the cache until that value is
>> satisfied. In addition, it is probably reasonable to continue to cache things
>> in gcvf._cache. I'm not sure about the LRU effects, and whether we should
>> always make a request on the _cache to keep it in sync with the actual
>> requests...
>
> What I've done as mentioned above is actually retrieve and keep the block on
> _BatchingBlockFetcher at add_key time. As new blocks are received from
> _get_blocks they are also stored on _BatchingBlockFetcher. So each block, once
> retrieved, whether via cache or _get_blocks, will then be present for the rest
> of the batch (i.e. until the end of yield_factories). This fixes that issue, I
> think.
>
> An improvement to this scheme would be to then forget blocks from the batch once
> we know nothing else in the batch will use them (e.g. by tracking how many keys
> in the batch need a particular block/read_memo), but that's probably not very
> important.

It is something I'm a little bit concerned about. Perhaps your block
fetcher is limited enough in scope (since you do try to keep an upper
bound on the request size). Still, it is something I think we should keep
an eye on, as we've really been running into problems lately with
consuming far too much memory.
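
A rough sketch of the "forget blocks early" idea quoted above, with made-up names rather than the real bzrlib API: count how many keys in the batch still need each read_memo, and drop the block once that count reaches zero.

    class BatchBlockHolder(object):
        """Hold fetched blocks only while something in the batch still needs them."""

        def __init__(self, read_memos_for_keys):
            # read_memos_for_keys: one read_memo per key in the batch, in order.
            self._remaining = {}
            for memo in read_memos_for_keys:
                self._remaining[memo] = self._remaining.get(memo, 0) + 1
            self._blocks = {}

        def add_block(self, memo, block):
            self._blocks[memo] = block

        def use_block(self, memo):
            block = self._blocks[memo]
            self._remaining[memo] -= 1
            if self._remaining[memo] == 0:
                del self._blocks[memo]  # no later key in this batch needs it
            return block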

...
>
...

>> ^- I personally don't like to see "_" as a variable that ever gets used on the right hand side.
>> I'd prefer:
>> for factory in batcher.yield_factories():
>> yield factory
>
> Ok, changed. (Although I find the longer lines with the word 'factory' repeated
> negates the readability benefit for me, so it's much of a muchness. So I'm
> happy to go with your preference.)
>

Well, there is always:

for f in batcher.yield_factories():
  yield f

So from what I can see the diff looks fine. As Martin mentions:

1) We are still missing some sort of unit testing here.

2) We should evaluate if memory consumption remains reasonable with the
current batching scheme.

But these are all things that can happen later. (Which is good, given
that this has landed :).

John
=:->


Revision history for this message
John A Meinel (jameinel) wrote :

I don't seem to be able to set the status of this branch to "merged" in the submission request to get it into the 2.0 branch. I assume it was cherrypicked by Martin, since I see the code present in the 2.0rc1 release.

Perhaps Martin can set the status? (So it doesn't show up as something left to be done.)

review: Approve

Preview Diff

=== modified file 'doc/developers/bug-handling.txt'
--- doc/developers/bug-handling.txt 2009-08-24 00:29:31 +0000
+++ doc/developers/bug-handling.txt 2009-08-26 18:35:24 +0000
@@ -142,12 +142,8 @@
     it's not a good idea for a developer to spend time reproducing the bug
     until they're going to work on it.)
 Triaged
-    This is an odd state - one we consider a bug in launchpad, as it really
-    means "Importance has been set". We use this to mean the same thing
-    as confirmed, and set no preference on whether Confirmed or Triaged are
-    used. Please do not change a "Confirmed" bug to "Triaged" or vice verca -
-    any reports we create or use will always search for both "Confirmed" and
-    "Triaged" or neither "Confirmed" nor "Triaged".
+    We don't use this status. If it is set, it means the same as
+    Confirmed.
 In Progress
     Someone has started working on this.
 Won't Fix
