Thread traceback running test suite on Maverick

Bug #767177 reported by John A Meinel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
In Progress
Medium
John A Meinel

Bug Description

After some time, I've come back to running the full test suite. I cannot run it without "--parallel=something" because we end up getting out-of-file-handle errors. However, I just got these tracebacks which are happening in threads (and thus why it isn't blocking the test suite run as actual failed tests.) I wanted to record it before I forgot about it:

Exception in thread Thread-1768:TransportTests.test_stat(ChrootTransport,TestingChrootServer)
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/pymodules/python2.6/paramiko/sftp_file.py", line 453, in _prefetch_thread
    self.sftp._async_request(self, CMD_READ, self.handle, long(offset), int(length))
  File "/usr/lib/pymodules/python2.6/paramiko/sftp_client.py", line 649, in _async_request
    self._send_packet(t, str(msg))
  File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 172, in _send_packet
    self._write_all(out)
  File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 136, in _write_all
    n = self.sock.send(out)
  File "/home/jameinel/dev/bzr/work/bzrlib/transport/ssh.py", line 200, in send
    return self.__socket.send(data)
  File "/usr/lib/python2.6/socket.py", line 167, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Exception in thread Thread-1774:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/pymodules/python2.6/paramiko/sftp_file.py", line 453, in _prefetch_thread
    self.sftp._async_request(self, CMD_READ, self.handle, long(offset), int(length))
  File "/usr/lib/pymodules/python2.6/paramiko/sftp_client.py", line 649, in _async_request
    self._send_packet(t, str(msg))
  File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 172, in _send_packet
    self._write_all(out)
  File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 136, in _write_all
    n = self.sock.send(out)
  File "/home/jameinel/dev/bzr/work/bzrlib/transport/ssh.py", line 200, in send
    return self.__socket.send(data)
  File "/usr/lib/python2.6/socket.py", line 167, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

I notice that it is all async requests. I also note that SFTPFile handles use an async request to close the handle when they are deconstructed. Leading to a high likelyhood that we are letting the file close async instead of a try/finally/addCleanup .close(). And that fails intermittently based on whether the server is closed before the client's file handle tries to close itself.

Tags: selftest

Related branches

John A Meinel (jameinel)
Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
status: Confirmed → In Progress
Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 767177] [NEW] Thread traceback running test suite on Maverick

On Wed, 2011-04-20 at 13:29 +0000, John A Meinel wrote:
> Public bug reported:
>
> After some time, I've come back to running the full test suite. I cannot
> run it without "--parallel=something" because we end up getting out-of-
> file-handle errors. However, I just got these tracebacks which are
> happening in threads (and thus why it isn't blocking the test suite run
> as actual failed tests.) I wanted to record it before I forgot about it:
>
> Exception in thread Thread-1768:TransportTests.test_stat(ChrootTransport,TestingChrootServer)
> Traceback (most recent call last):
> File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
> self.run()
> File "/usr/lib/python2.6/threading.py", line 484, in run
> self.__target(*self.__args, **self.__kwargs)
> File "/usr/lib/pymodules/python2.6/paramiko/sftp_file.py", line 453, in _prefetch_thread
> self.sftp._async_request(self, CMD_READ, self.handle, long(offset), int(length))
> File "/usr/lib/pymodules/python2.6/paramiko/sftp_client.py", line 649, in _async_request
> self._send_packet(t, str(msg))
> File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 172, in _send_packet
> self._write_all(out)
> File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 136, in _write_all
> n = self.sock.send(out)
> File "/home/jameinel/dev/bzr/work/bzrlib/transport/ssh.py", line 200, in send
> return self.__socket.send(data)
> File "/usr/lib/python2.6/socket.py", line 167, in _dummy
> raise error(EBADF, 'Bad file descriptor')
> error: [Errno 9] Bad file descriptor
>
> Exception in thread Thread-1774:
> Traceback (most recent call last):
> File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
> self.run()
> File "/usr/lib/python2.6/threading.py", line 484, in run
> self.__target(*self.__args, **self.__kwargs)
> File "/usr/lib/pymodules/python2.6/paramiko/sftp_file.py", line 453, in _prefetch_thread
> self.sftp._async_request(self, CMD_READ, self.handle, long(offset), int(length))
> File "/usr/lib/pymodules/python2.6/paramiko/sftp_client.py", line 649, in _async_request
> self._send_packet(t, str(msg))
> File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 172, in _send_packet
> self._write_all(out)
> File "/usr/lib/pymodules/python2.6/paramiko/sftp.py", line 136, in _write_all
> n = self.sock.send(out)
> File "/home/jameinel/dev/bzr/work/bzrlib/transport/ssh.py", line 200, in send
> return self.__socket.send(data)
> File "/usr/lib/python2.6/socket.py", line 167, in _dummy
> raise error(EBADF, 'Bad file descriptor')
> error: [Errno 9] Bad file descriptor
>
> I notice that it is all async requests. I also note that SFTPFile
> handles use an async request to close the handle when they are
> deconstructed. Leading to a high likelyhood that we are letting the file
> close async instead of a try/finally/addCleanup .close(). And that fails
> intermittently based on whether the server is closed before the client's
> file handle tries to close itself.
FWIW I can reproduce this on Natty as well.

Cheers,

Jelmer

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>> I notice that it is all async requests. I also note that SFTPFile
>> handles use an async request to close the handle when they are
>> deconstructed. Leading to a high likelyhood that we are letting the file
>> close async instead of a try/finally/addCleanup .close(). And that fails
>> intermittently based on whether the server is closed before the client's
>> file handle tries to close itself.
> FWIW I can reproduce this on Natty as well.
>
> Cheers,
>
> Jelmer
>

Spelunking through the code, a few more things come up:

1) I only see this with --parallel=fork
2) SFTPTransport._readv() seems to be a big offender.

Specifically, the SFTPTransport._readv asks for a file pointer, hands it
over to a helper, but never calls fp.close().
- From what I can tell, LocalTransport.readv() is probably also an
offender here.

Part of the difficulty is that those are defined as generators, that we
want to 'stream' back the content as it comes in over the wire. So we
can't trivially say that we can close the handle until we are done
streaming.

However, I would be happy to add "_seek_and_readv()" consumes the file
pointer (so it calls .close() when it is done).

I wonder a tiny little bit if we want to have this done synchronously in
production code, though. By leaving it as an async request, we aren't
stopped by the forced round trip after doing a partial read of file X
and moving on to a partial read of file Y.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2u5UsACgkQJdeBCYSNAAPGigCgxesYic1fRv3q5inAZL7ABHbk
DKUAoMBZpjgnNu0dXTSoTLtS+3RUoAIQ
=OxT9
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was wrong, we have this code:

    # Now that we've read some data, see if we can yield anything back
    while cur_offset_and_size in data_map:
        this_data = data_map.pop(cur_offset_and_size)
        this_offset = cur_offset_and_size[0]
        try:
            cur_offset_and_size = offset_stack.next()
        except StopIteration:
            # Close the file handle as there will be no more data
            # The handle would normally be cleaned up as this code goes
            # out of scope, but as we are a generator, not all code
            # will re-enter once we have consumed all the expected
            # data. For example:
            # zip(range(len(requests)), readv(foo, requests))
            # Will stop because the range is done, and not run the
            # cleanup code for the readv().
            fp.close()
            cur_offset_and_size = None
        yield this_offset, this_data

So we are already calling fp.close() in _seek_and_read under specific
circumstances. So I feel fine extending this to SFTP's logic.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2u5xMACgkQJdeBCYSNAANEkgCeIEMr/0wY5hLQAmKGtixyLGmN
M5EAoIPpdvIoWHDEIjnZ+2E37YLQdoVK
=PAUX
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 767177] Re: Thread traceback running test suite on Maverick

I think this is the same as the quite old bug 656170?

Revision history for this message
John A Meinel (jameinel) wrote :

Yep.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.