Merge lp:~mbp/bzr/590638-buffering into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 5321
Proposed branch: lp:~mbp/bzr/590638-buffering
Merge into: lp:bzr
Diff against target: 53 lines (+6/-17)
2 files modified
bzrlib/smart/protocol.py (+1/-0)
bzrlib/tests/test_smart_transport.py (+5/-17)
To merge this branch: bzr merge lp:~mbp/bzr/590638-buffering
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Vincent Ladeuil Approve
Review via email: mp+27578@code.launchpad.net

Commit message

flush stream parts after each length-prefixed bit, like we do when sending

Description of the change

In looking into bug 590638 with spiv and igc, we noticed that bzr can buffer up to 1MB of useful data when streaming from the server to the client. In other words we can get into the situation where we have 0.999MiB of data on the server ready to be sent to the client, but all the OS buffers on both sides are empty. This can at the least cause a state where the pipe is not filled, and it may also (?) cause poor tcp window behaviour.
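For concreteness, the buffering pattern described above looks roughly like this (a simplified sketch with hypothetical names, not bzrlib's actual code): writes accumulate in the application until the threshold is reached, and only then hit the socket, so the OS buffers on both sides can run dry while useful data waits.

```python
def make_buffered_writer(write, threshold=1024 * 1024):
    """Return (buffered_write, flush): data is held in the application
    until `threshold` bytes are queued, and only then handed to `write`
    (which pushes it toward the socket)."""
    buf = []
    size = 0

    def buffered_write(data):
        nonlocal size
        buf.append(data)
        size += len(data)
        if size >= threshold:
            # Nothing reaches the OS until ~1MB has piled up here.
            flush()

    def flush():
        nonlocal size
        if buf:
            write(b"".join(buf))
            del buf[:]
            size = 0

    return buffered_write, flush
```

With a 1MiB threshold, up to 0.999MiB can sit in `buf` while the kernel's send buffer is empty, which is the stall the patch addresses by flushing after each chunk.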

TCP behaviour here is not trivial; this patch probably doesn't make us optimal, but it should make us better.

This change also makes it symmetric with the client code and other places where data is flushed down.

I think on platforms that support it we probably want to set TCP_CORK and then write as soon as we have data, then release the cork at the end of the logical operation.
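The TCP_CORK idea could look roughly like this (a sketch with a hypothetical helper name; TCP_CORK is Linux-only, so the sketch degrades to plain writes elsewhere):

```python
import socket

def send_corked(sock, parts):
    """Hypothetical helper sketching the TCP_CORK approach: write each
    part as soon as we have it, but let the kernel coalesce partial
    frames until the cork is released at the end of the operation."""
    cork = hasattr(socket, "TCP_CORK")  # Linux only
    if cork:
        # Cork: the kernel holds partially-filled frames until uncorked.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for part in parts:
            sock.sendall(part)  # write as soon as data is available
    finally:
        if cork:
            # Release the cork at the end of the logical operation;
            # any held partial frame is sent immediately.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

This keeps data flowing into the kernel early without emitting a flurry of tiny segments, though as noted later in the thread it doesn't apply when tunnelling over an SSH channel.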

I have tested this but I haven't attempted to measure a performance change. It will probably often be lost in the noise but there may be cases where it matters.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Hmm, I feel your pain; such a hard diagnosis for a one-line fix deserves a reward: Approve :)

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/590638-buffering into lp:bzr has been updated.
>
> Status: Needs review => Approved

I guess I'd like to understand why flushing all the time is actually
better. I agree with the concept of TCP_CORK, though it doesn't really
apply when we are using an SSH channel.

John
=:->


Revision history for this message
Robert Collins (lifeless) wrote :

It's not clear to me whether it is better or worse. Certainly if you starve TCP, the buffer on the receiver won't open as wide. However, the flush in this method is actually 'write': it simply triggers an OS-level write so the data hits the socket.

However, there is a comment (visible in the diff) that is invalidated by this change, and the logging call needs to be updated as well - because we're no longer buffering here, it will make a difference.

I'm at -0.5 on this without some testing: we put the buffering in because we were bouncing between the socket and disk in a very coupled manner, and IIRC we found we got more CPU use out of bzr by buffering a bit. Also, the chunks being written can be very, very small (less so with 2a *today*, but we want skinny network transfers, so we shouldn't discount that).

If testing this is particularly hard, spiv's network test environment post may be very useful. I'm in PP mode this week, so once I've got the reviews done I may see about looking into this.

An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.
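The reader/writer-thread alternative could be sketched like this (hypothetical names; nothing like this is in bzr today). A bounded queue provides the arbitrary buffer cap, and the point where `put()` would block is the natural place to log that the network is slow:

```python
import queue
import threading

def stream_via_writer_thread(chunks, write, max_buffered=20):
    """Sketch: decouple producing chunks from network writes.  The
    producer keeps running while a separate thread blocks on IO; the
    bounded queue caps how much data backs up when the pipe is slow."""
    buf = queue.Queue(maxsize=max_buffered)
    _DONE = object()  # sentinel telling the writer thread to stop

    def _writer():
        while True:
            chunk = buf.get()
            if chunk is _DONE:
                break
            write(chunk)  # may block on the socket; producer keeps going

    t = threading.Thread(target=_writer)
    t.start()
    for chunk in chunks:
        if buf.full():
            # This would be the place to log 'hey, network is slow'.
            pass
        buf.put(chunk)  # blocks once the cap is reached
    buf.put(_DONE)
    t.join()
```

As Martin notes below, a cap like this overlaps with what the OS socket buffers already provide, which is part of the debate.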

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Robert Collins <email address hidden> writes:

<snip/>

    > An alternative I proposed a while back was a reader thread and a
    > writer thread,

+1. While threads may turn out to be a little harder to debug, splitting
the work and testing each part should help us *not* get there, by
catching the problems earlier.

         Vincent

Revision history for this message
Martin Pool (mbp) wrote :

[I talked to Robert about this offline, here's a summary for the record]

I should have mentioned earlier that in looking at data received from
Launchpad, we saw a lag of a few seconds, then 1MB of data arriving
all at once, then the same pattern repeats. That's fairly clearly
driven by this code.

Since we already have the behavior of this patch when sending from the
client, but not from the server, it's a bit hard for me to believe it's
really intentional. At the least I would like to see a comment explaining
why we decided they ought to be different.

On 21 June 2010 07:42, Robert Collins <email address hidden> wrote:
> Review: Needs Fixing
> Its not clear to me that it is better, or worse. Certainly if you starve TCP the buffer on the receiver won't open as wide. However the flush in this method is actually 'write' - it simply triggers an OS level write so the data hits the socket.

I don't see why that's a "however". At the moment we starve the OS
buffers. If they empty we may leave the wire idle or as a knock-on
effect cause worse tcp behavior.

> An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.

That sounds a lot like what OS socket buffers do, so I'd like to only
add this when we're sure OS buffers can't be made to work well.

However Robert clarified offline that what he actually wanted was
apparently multiple threads trying to read from disk, so that the OS
can see more than one request at a time, but presumably no extra
buffering on the socket.

--
Martin

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Robert Collins (lifeless) wrote :

We had an outbound mail failure in PQM, and I'm not sure why; tracking
it down may be tricky today (no LOSAs around).

Revision history for this message
Martin Pool (mbp) wrote :

It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.

But maybe we could have a bit more headroom in the mail size limit?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Jun 25, 2010 at 3:05 PM, Martin Pool <email address hidden> wrote:
> It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.
>
> But maybe we could have a bit more headroom in the mail size limit?

It's set to 10MB at the moment; if we want more, file an RT for it. I
can do that if you want.

We should also make PQM truncate in that case, I think.
-Rob

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

=== modified file 'bzrlib/smart/protocol.py'
--- bzrlib/smart/protocol.py	2010-03-26 04:26:55 +0000
+++ bzrlib/smart/protocol.py	2010-06-25 05:05:44 +0000
@@ -1231,6 +1231,7 @@
             if first_chunk is None:
                 first_chunk = chunk
             self._write_prefixed_body(chunk)
+            self.flush()
             if 'hpssdetail' in debug.debug_flags:
                 # Not worth timing separately, as _write_func is
                 # actually buffered

=== modified file 'bzrlib/tests/test_smart_transport.py'
--- bzrlib/tests/test_smart_transport.py	2010-06-16 05:47:02 +0000
+++ bzrlib/tests/test_smart_transport.py	2010-06-25 05:05:44 +0000
@@ -2864,9 +2864,10 @@
         self.responder = protocol.ProtocolThreeResponder(self.writes.append)

     def assertWriteCount(self, expected_count):
+        # self.writes can be quite large; don't show the whole thing
         self.assertEqual(
             expected_count, len(self.writes),
-            "Too many writes: %r" % (self.writes,))
+            "Too many writes: %d, expected %d" % (len(self.writes), expected_count))

     def test_send_error_writes_just_once(self):
         """An error response is written to the medium all at once."""
@@ -2895,22 +2896,9 @@
         response = _mod_request.SuccessfulSmartServerResponse(
             ('arg', 'arg'), body_stream=['chunk1', 'chunk2'])
         self.responder.send_response(response)
-        # We will write just once, despite the multiple chunks, due to
-        # buffering.
-        self.assertWriteCount(1)
-
-    def test_send_response_with_body_stream_flushes_buffers_sometimes(self):
-        """When there are many bytes (>1MB), multiple writes will occur rather
-        than buffering indefinitely.
-        """
-        # Construct a response with stream with ~1.5MB in it. This should
-        # trigger 2 writes, but not 3
-        onekib = '12345678' * 128
-        body_stream = [onekib] * (1024 + 512)
-        response = _mod_request.SuccessfulSmartServerResponse(
-            ('arg', 'arg'), body_stream=body_stream)
-        self.responder.send_response(response)
-        self.assertWriteCount(2)
+        # Per the discussion in bug 590638 we flush once after the header and
+        # then once after each chunk
+        self.assertWriteCount(3)


 class TestSmartClientUnicode(tests.TestCase):