Merge lp:~mbp/bzr/590638-buffering into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 5321
Proposed branch: lp:~mbp/bzr/590638-buffering
Merge into: lp:bzr
Diff against target: 53 lines (+6/-17)
2 files modified
bzrlib/smart/protocol.py (+1/-0)
bzrlib/tests/test_smart_transport.py (+5/-17)
To merge this branch: bzr merge lp:~mbp/bzr/590638-buffering
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Vincent Ladeuil Approve
Review via email: mp+27578@code.launchpad.net

Commit message

flush stream parts after each length-prefixed bit, like we do when sending

Description of the change

In looking into bug 590638 with spiv and igc, we noticed that bzr can buffer up to 1MB of useful data when streaming from the server to the client. In other words we can get into the situation where we have 0.999MiB of data on the server ready to be sent to the client, but all the OS buffers on both sides are empty. This can at the least cause a state where the pipe is not filled, and it may also (?) cause poor tcp window behaviour.
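For concreteness, the buffering pattern described above looks roughly like this (a simplified sketch with hypothetical names, not bzrlib's actual code): writes accumulate in the application until the threshold is reached, and only then hit the socket, so the OS buffers on both sides can run dry while useful data waits.

```python
def make_buffered_writer(write, threshold=1024 * 1024):
    """Return (buffered_write, flush): data is held in the application
    until `threshold` bytes are queued, and only then handed to `write`
    (which pushes it toward the socket)."""
    buf = []
    size = 0

    def buffered_write(data):
        nonlocal size
        buf.append(data)
        size += len(data)
        if size >= threshold:
            # Nothing reaches the OS until ~1MB has piled up here.
            flush()

    def flush():
        nonlocal size
        if buf:
            write(b"".join(buf))
            del buf[:]
            size = 0

    return buffered_write, flush
```

With a 1MiB threshold, up to 0.999MiB can sit in `buf` while the kernel's send buffer is empty, which is the stall the patch addresses by flushing after each chunk.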

TCP behaviour here is not trivial; this patch probably doesn't make us optimal, but it should make us better.

This change also makes it symmetric with the client code and other places where data is flushed down.

I think on platforms that support it we probably want to set TCP_CORK and then write as soon as we have data, then release the cork at the end of the logical operation.
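The TCP_CORK idea could look roughly like this (a sketch with a hypothetical helper name; TCP_CORK is Linux-only, so the sketch degrades to plain writes elsewhere):

```python
import socket

def send_corked(sock, parts):
    """Hypothetical helper sketching the TCP_CORK approach: write each
    part as soon as we have it, but let the kernel coalesce partial
    frames until the cork is released at the end of the operation."""
    cork = hasattr(socket, "TCP_CORK")  # Linux only
    if cork:
        # Cork: the kernel holds partially-filled frames until uncorked.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for part in parts:
            sock.sendall(part)  # write as soon as data is available
    finally:
        if cork:
            # Release the cork at the end of the logical operation;
            # any held partial frame is sent immediately.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

This keeps data flowing into the kernel early without emitting a flurry of tiny segments, though as noted later in the thread it doesn't apply when tunnelling over an SSH channel.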

I have tested this but I haven't attempted to measure a performance change. It will probably often be lost in the noise but there may be cases where it matters.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Hmm, I feel your pain; such a hard diagnosis for a one-line fix deserves a reward: Approve :)

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/590638-buffering into lp:bzr has been updated.
>
> Status: Needs review => Approved

I guess I'd like to understand why flushing all the time is actually
better. I agree with the concept of TCP_CORK, though it doesn't really
apply when we are using an SSH channel.

John
=:->


Revision history for this message
Robert Collins (lifeless) wrote :

It's not clear to me whether it is better or worse. Certainly if you starve TCP, the buffer on the receiver won't open as wide. However, the flush in this method is actually 'write': it simply triggers an OS-level write so the data hits the socket.

However, there is a comment (visible in the diff) that is invalidated by this change, and the logging call needs to be updated as well - because we're no longer buffering here, it will make a difference.

I'm at -0.5 on this without some testing: we put the buffering in because we were bouncing between the socket and disk in a very coupled manner, and IIRC we found we got more CPU use out of bzr by buffering a bit. Also, the chunks being written can be very, very small (less so with 2a *today*, but we want skinny network transfers, so we shouldn't discount that).

If testing this is particularly hard, spiv's network test environment post may be very useful. I'm in PP mode this week, so once I've got the reviews done I may see about looking into this.

An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.
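The reader/writer-thread alternative could be sketched like this (hypothetical names; nothing like this is in bzr today). A bounded queue provides the arbitrary buffer cap, and the point where `put()` would block is the natural place to log that the network is slow:

```python
import queue
import threading

def stream_via_writer_thread(chunks, write, max_buffered=20):
    """Sketch: decouple producing chunks from network writes.  The
    producer keeps running while a separate thread blocks on IO; the
    bounded queue caps how much data backs up when the pipe is slow."""
    buf = queue.Queue(maxsize=max_buffered)
    _DONE = object()  # sentinel telling the writer thread to stop

    def _writer():
        while True:
            chunk = buf.get()
            if chunk is _DONE:
                break
            write(chunk)  # may block on the socket; producer keeps going

    t = threading.Thread(target=_writer)
    t.start()
    for chunk in chunks:
        if buf.full():
            # This would be the place to log 'hey, network is slow'.
            pass
        buf.put(chunk)  # blocks once the cap is reached
    buf.put(_DONE)
    t.join()
```

As Martin notes below, a cap like this overlaps with what the OS socket buffers already provide, which is part of the debate.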

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Robert Collins <email address hidden> writes:

<snip/>

    > An alternative I proposed a while back was a reader thread and a
    > writer thread,

+1. While threads may turn out to be a little harder to debug, splitting
the work and testing each part should help us *not* get there, by
catching the problems earlier.

         Vincent

Revision history for this message
Martin Pool (mbp) wrote :

[I talked to Robert about this offline, here's a summary for the record]

I should have mentioned earlier that in looking at data received from
Launchpad, we saw a lag of a few seconds, then 1MB of data arriving
all at once, then the same pattern repeats. That's fairly clearly
driven by this code.

Since we already have the behavior of this patch when sending from the
client, but not from the server, it's a bit hard for me to believe it's
really intentional. At the least I would like to see a comment explaining
why we decided they ought to be different.

On 21 June 2010 07:42, Robert Collins <email address hidden> wrote:
> Review: Needs Fixing
> Its not clear to me that it is better, or worse. Certainly if you starve TCP the buffer on the receiver won't open as wide. However the flush in this method is actually 'write' - it simply triggers an OS level write so the data hits the socket.

I don't see why that's a "however". At the moment we starve the OS
buffers. If they empty we may leave the wire idle or as a knock-on
effect cause worse tcp behavior.

> An alternative I proposed a while back was a reader thread and a writer thread, so that while we may block on either end on IO, python can at least keep going. That way we could set an arbitrary (say 20MB) buffer and stop pulling from the data generator when we've got that much backed up and not on the pipe. We could also sensibly *log* stuff at that point, to say 'hey, network is slow'.

That sounds a lot like what OS socket buffers do, so I'd like to only
add this when we're sure OS buffers can't be made to work well.

However Robert clarified offline that what he actually wanted was
apparently multiple threads trying to read from disk, so that the OS
can see more than one request at a time, but presumably no extra
buffering on the socket.

--
Martin

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Robert Collins (lifeless) wrote :

We had an outbound mail failure in PQM, and I'm not sure why; tracking
it down may be tricky today (no LOSAs around).

Revision history for this message
Martin Pool (mbp) wrote :

It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.

But maybe we could have a bit more headroom in the mail size limit?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Jun 25, 2010 at 3:05 PM, Martin Pool <email address hidden> wrote:
> It turned out that this generates a failure from assertWriteCount with a 1.7MB error message, which gives the mailserver a headache.
>
> But maybe we could have a bit more headroom in the mail size limit?

It's set to 10MB at the moment; if we want more, file an RT for it. I
can do that if you want.

We should also make PQM truncate in that case, I think.
-Rob

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

=== modified file 'bzrlib/smart/protocol.py'
--- bzrlib/smart/protocol.py	2010-03-26 04:26:55 +0000
+++ bzrlib/smart/protocol.py	2010-06-25 05:05:44 +0000
@@ -1231,6 +1231,7 @@
             if first_chunk is None:
                 first_chunk = chunk
             self._write_prefixed_body(chunk)
+            self.flush()
             if 'hpssdetail' in debug.debug_flags:
                 # Not worth timing separately, as _write_func is
                 # actually buffered

=== modified file 'bzrlib/tests/test_smart_transport.py'
--- bzrlib/tests/test_smart_transport.py	2010-06-16 05:47:02 +0000
+++ bzrlib/tests/test_smart_transport.py	2010-06-25 05:05:44 +0000
@@ -2864,9 +2864,10 @@
         self.responder = protocol.ProtocolThreeResponder(self.writes.append)

     def assertWriteCount(self, expected_count):
+        # self.writes can be quite large; don't show the whole thing
         self.assertEqual(
             expected_count, len(self.writes),
-            "Too many writes: %r" % (self.writes,))
+            "Too many writes: %d, expected %d" % (len(self.writes), expected_count))

     def test_send_error_writes_just_once(self):
         """An error response is written to the medium all at once."""
@@ -2895,22 +2896,9 @@
         response = _mod_request.SuccessfulSmartServerResponse(
             ('arg', 'arg'), body_stream=['chunk1', 'chunk2'])
         self.responder.send_response(response)
-        # We will write just once, despite the multiple chunks, due to
-        # buffering.
-        self.assertWriteCount(1)
-
-    def test_send_response_with_body_stream_flushes_buffers_sometimes(self):
-        """When there are many bytes (>1MB), multiple writes will occur rather
-        than buffering indefinitely.
-        """
-        # Construct a response with stream with ~1.5MB in it. This should
-        # trigger 2 writes, but not 3
-        onekib = '12345678' * 128
-        body_stream = [onekib] * (1024 + 512)
-        response = _mod_request.SuccessfulSmartServerResponse(
-            ('arg', 'arg'), body_stream=body_stream)
-        self.responder.send_response(response)
-        self.assertWriteCount(2)
+        # Per the discussion in bug 590638 we flush once after the header and
+        # then once after each chunk
+        self.assertWriteCount(3)


 class TestSmartClientUnicode(tests.TestCase):