Merge lp:~mbp/bzr/590638-buffering into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 5321
Proposed branch: lp:~mbp/bzr/590638-buffering
Merge into: lp:bzr
Diff against target: 53 lines (+6/-17)
2 files modified
bzrlib/smart/protocol.py (+1/-0)
bzrlib/tests/test_smart_transport.py (+5/-17)
To merge this branch: bzr merge lp:~mbp/bzr/590638-buffering
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Vincent Ladeuil Approve
Review via email: mp+27578@code.launchpad.net

Commit message

flush stream parts after each length-prefixed bit, like we do when sending

Description of the change

In looking into bug 590638 with spiv and igc, we noticed that bzr can buffer up to 1MB of useful data when streaming from the server to the client. In other words, we can get into a situation where we have 0.999MiB of data on the server ready to be sent to the client, but all the OS buffers on both sides are empty. At the least this can leave the pipe unfilled, and it may also (?) cause poor TCP window behaviour.

TCP behaviour here is not trivial; this patch probably doesn't make us optimal, but it should make us better.

This change also makes the server symmetric with the client code and the other places where data is flushed as it is produced.

I think on platforms that support it we probably want to set TCP_CORK and then write as soon as we have data, then release the cork at the end of the logical operation.
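On Linux that could look roughly like the sketch below. `send_corked` is a hypothetical helper for illustration, not bzrlib code; `TCP_CORK` only exists on Linux, and the sketch falls back to plain sends elsewhere (or when the socket is not TCP).

```python
import socket

def send_corked(sock, chunks):
    """Write chunks as soon as they are ready, holding TCP_CORK across
    the whole logical operation so the kernel coalesces small writes.

    TCP_CORK is Linux-only; on other platforms (or non-TCP sockets)
    this degrades to ordinary sends.
    """
    corked = hasattr(socket, 'TCP_CORK')
    if corked:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
        except OSError:
            corked = False  # e.g. not a TCP socket
    try:
        for chunk in chunks:
            sock.sendall(chunk)  # data handed to the OS immediately
    finally:
        if corked:
            # Releasing the cork pushes out any partial segment.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

As John notes below, this only helps on a raw TCP connection, not when the traffic is tunnelled through an SSH channel.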

I have tested this but I haven't attempted to measure a performance change. It will probably often be lost in the noise but there may be cases where it matters.
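Stripped of bzr specifics, the change amounts to one extra call in the streaming loop. A minimal sketch (the function and parameter names here are illustrative, not the real bzrlib signatures):

```python
def send_body_stream(stream, write_prefixed_body, flush):
    """Write each length-prefixed chunk and push it to the OS immediately.

    Before the patch, the flush happened only once an internal ~1MB
    buffer filled up, so the server could sit on almost 1MB of ready
    data while the socket buffers on both ends were empty.
    """
    for chunk in stream:
        write_prefixed_body(chunk)
        flush()  # the one-line fix: hand the chunk to the OS now
```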

Revision history for this message
Vincent Ladeuil (vila) wrote :

Hmm, I feel your pain; such a hard diagnosis for a one-line fix deserves a reward: Approve :)

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/590638-buffering into lp:bzr has been updated.
>
> Status: Needs review => Approved

I guess I'd like to understand why flushing all the time is actually
better. I agree with the concept of TCP_CORK, though it doesn't really
apply when we are using an SSH channel.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwXmNgACgkQJdeBCYSNAANTDwCeJaQOTsqV5JIey/9FyStuqYCK
WuMAn3SaQ7jkHyuHUNhRTcHNhzbPrkxY
=xlKb
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

It's not clear to me whether it is better or worse. Certainly if you starve TCP the buffer on the receiver won't open as wide. However, the flush in this method is actually 'write': it simply triggers an OS-level write so the data hits the socket.

However, there is a comment (visible in the diff) that is invalidated by this change, and the logging call needs to be updated as well: because we're no longer buffering here, it will make a difference.

I'm at -0.5 on this without some testing: we put in the buffering because we were bouncing between the socket and disk in a very coupled manner, IIRC, and we found we got more CPU use out of bzr by buffering a bit. The chunks being written can be very, very small (less so with 2a *today*, but we want skinny network transfers, so we shouldn't discount that).

If testing this is particularly hard, spiv's network test environment post may be very useful. I'm in PP mode this week, so once I've got the reviews done I may see about looking into this.

An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.
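The bounded-buffer part of that proposal can be sketched with a writer thread fed from a size-limited queue; the producer blocks once the buffer is full, which is the "stop pulling from the data generator" behaviour described above. This is only an illustration of the idea (`threaded_pipe` and its parameters are hypothetical, and the bound here is a chunk count rather than 20MB of bytes):

```python
import threading
import queue

def threaded_pipe(source, write_func, max_buffered=16):
    """Decouple chunk generation from socket writes with a bounded queue.

    The writer thread drains the queue; the producer blocks in put()
    once max_buffered chunks are waiting, so at most that much data
    is backed up and not yet on the pipe. That would also be a natural
    place to log 'hey, network is slow'. Uses None as an end-of-stream
    sentinel, so the source must not yield None itself.
    """
    q = queue.Queue(maxsize=max_buffered)

    def writer():
        while True:
            chunk = q.get()
            if chunk is None:
                break  # end-of-stream sentinel
            write_func(chunk)

    t = threading.Thread(target=writer)
    t.start()
    for chunk in source:
        q.put(chunk)  # blocks when the buffer is full
    q.put(None)
    t.join()
```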

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Robert Collins <email address hidden> writes:

<snip/>

    > An alternative I proposed a while back was a reader thread and a
    > writer thread,

+1. While threads may turn out to be a little harder to debug, splitting
the work and testing each part should help us avoid getting there, by
catching the problems earlier.

         Vincent

Revision history for this message
Martin Pool (mbp) wrote :

[I talked to Robert about this offline, here's a summary for the record]

I should have mentioned earlier that in looking at data received from
Launchpad, we saw a lag of a few seconds, then 1MB of data arriving
all at once, then the same pattern repeats. That's fairly clearly
driven by this code.

Since we already have the behaviour of this patch when sending from the
client but not from the server, it's a bit hard for me to believe it's
really intentional. At the least I would like to see a comment explaining
why we decided they ought to be different.

On 21 June 2010 07:42, Robert Collins <email address hidden> wrote:
> Review: Needs Fixing
> Its not clear to me that it is better, or worse. Certainly if you starve TCP the buffer on the receiver won't open as wide. However the flush in this method is actually 'write' - it simply triggers an OS level write so the data hits the socket.

I don't see why that's a "however". At the moment we starve the OS
buffers. If they empty, we may leave the wire idle or, as a knock-on
effect, cause worse TCP behaviour.

> An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.

That sounds a lot like what OS socket buffers do, so I'd like to only
add this when we're sure OS buffers can't be made to work well.

However Robert clarified offline that what he actually wanted was
apparently multiple threads trying to read from disk, so that the OS
can see more than one request at a time, but presumably no extra
buffering on the socket.

--
Martin

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Robert Collins (lifeless) wrote :

We had an outbound mail failure in PQM, and I'm not sure why; tracking
it down may be tricky today (no losas around).

Revision history for this message
Martin Pool (mbp) wrote :

It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.

But maybe we could have a bit more headroom in the mail size limit?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Jun 25, 2010 at 3:05 PM, Martin Pool <email address hidden> wrote:
> It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.
>
> But maybe we could have a bit more headroom in the mail size limit?

It's set to 10MB at the moment; if we want more, file an RT for it. I
can do that if you want.

We should also make PQM truncate in that case, I think.
-Rob

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

=== modified file 'bzrlib/smart/protocol.py'
--- bzrlib/smart/protocol.py 2010-03-26 04:26:55 +0000
+++ bzrlib/smart/protocol.py 2010-06-25 05:05:44 +0000
@@ -1231,6 +1231,7 @@
             if first_chunk is None:
                 first_chunk = chunk
             self._write_prefixed_body(chunk)
+            self.flush()
         if 'hpssdetail' in debug.debug_flags:
             # Not worth timing separately, as _write_func is
             # actually buffered
=== modified file 'bzrlib/tests/test_smart_transport.py'
--- bzrlib/tests/test_smart_transport.py 2010-06-16 05:47:02 +0000
+++ bzrlib/tests/test_smart_transport.py 2010-06-25 05:05:44 +0000
@@ -2864,9 +2864,10 @@
         self.responder = protocol.ProtocolThreeResponder(self.writes.append)
 
     def assertWriteCount(self, expected_count):
+        # self.writes can be quite large; don't show the whole thing
         self.assertEqual(
             expected_count, len(self.writes),
-            "Too many writes: %r" % (self.writes,))
+            "Too many writes: %d, expected %d" % (len(self.writes), expected_count))
 
     def test_send_error_writes_just_once(self):
         """An error response is written to the medium all at once."""
@@ -2895,22 +2896,9 @@
         response = _mod_request.SuccessfulSmartServerResponse(
             ('arg', 'arg'), body_stream=['chunk1', 'chunk2'])
         self.responder.send_response(response)
-        # We will write just once, despite the multiple chunks, due to
-        # buffering.
-        self.assertWriteCount(1)
-
-    def test_send_response_with_body_stream_flushes_buffers_sometimes(self):
-        """When there are many bytes (>1MB), multiple writes will occur rather
-        than buffering indefinitely.
-        """
-        # Construct a response with stream with ~1.5MB in it. This should
-        # trigger 2 writes, but not 3
-        onekib = '12345678' * 128
-        body_stream = [onekib] * (1024 + 512)
-        response = _mod_request.SuccessfulSmartServerResponse(
-            ('arg', 'arg'), body_stream=body_stream)
-        self.responder.send_response(response)
-        self.assertWriteCount(2)
+        # Per the discussion in bug 590638 we flush once after the header and
+        # then once after each chunk
+        self.assertWriteCount(3)
 
 
 class TestSmartClientUnicode(tests.TestCase):