Merge lp:~bzr/bzr/transport_post_connect_hook into lp:bzr
Status: | Merged |
---|---|
Approved by: | Jelmer Vernooij |
Approved revision: | no longer in the source branch. |
Merged at revision: | 6405 |
Proposed branch: | lp:~bzr/bzr/transport_post_connect_hook |
Merge into: | lp:bzr |
Prerequisite: | lp:~spiv/bzr/hooks-refactoring |
Diff against target: |
347 lines (+99/-106) 7 files modified
bzrlib/hooks.py (+1/-0) bzrlib/smart/medium.py (+5/-0) bzrlib/tests/__init__.py (+14/-10) bzrlib/tests/per_transport.py (+34/-0) bzrlib/tests/test_transport.py (+23/-0) bzrlib/tests/transport_util.py (+4/-95) bzrlib/transport/__init__.py (+18/-1) |
To merge this branch: | bzr merge lp:~bzr/bzr/transport_post_connect_hook |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jelmer Vernooij (community) | Approve | ||
Vincent Ladeuil | Needs Information | ||
Martin Packman | Pending | ||
Review via email: mp+85735@code.launchpad.net |
This proposal supersedes a proposal from 2010-10-14.
Commit message
Add a post_connect hook for transports.
Description of the change
Creates a new post_connect hook for transports, and uses it in the testing framework to ensure transports are disconnected when the test finishes so we don't leak resources. This replaces the current hack that overrides get_transport which was incomplete and causing other problems. See the mailing list thread for more:
<https:/
I'm not really sure how ready this is, but putting up an mp seems as good a way of getting feedback as bugging people on IRC.
The hook is hurled together based on lp:~parthm/bzr/173274-export-hooks; if there are opinions on how it should be written differently, I'd like to know.
Todo:
News. I'm leaving this till landing actually happens to lessen pain of flux.
Other documentation changes?
Some tests that don't suck. Suggestions welcome.
(originally by mgz, updated to merge bzr.dev)
Vincent Ladeuil (vila) wrote : Posted in a previous version of this proposal | # |
Martin Packman (gz) wrote : Posted in a previous version of this proposal | # |
So, very nearly this exact hook exists in bzrlib.
Martin Packman (gz) wrote : Posted in a previous version of this proposal | # |
Just having a hook turned out to be insufficient, and results in worse hangs than the old get_transport hack as can be seen from the nearly four hour babune runtime last night:
<http://
The problem is that RemoteTransport classes don't use _set_connection and generally behave rather differently. Throwing in a hook point in __init__ when a new medium is built gets us to a leak-free 40 minute runtime:
<http://
However the semantics are not quite correct there. Even if we don't have to worry about remote reconnections (do we?) the post_connect hook is happening before the real connection, and potentially twice in the RemoteHTTPTransport case. Combined with the existing confusion over how many times disconnect gets called, it suggests this hook probably isn't sane enough to be generally useful. We do want the leak problem fixed asap though, so if anyone has any clever ideas...
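The timing concern can be illustrated with two toy mediums, one firing the hook in __init__ and one firing it when the connection is actually made. This is only a sketch under assumed names (EagerHookMedium, LazyHookMedium, record are all hypothetical), not how the bzrlib mediums are written.

```python
# Hypothetical toy classes; the names are illustrative, not bzrlib's.
events = []


class EagerHookMedium(object):
    """Fires the hook in __init__, before any connection exists."""

    def __init__(self, hooks):
        self.connected = False
        for hook in hooks:
            hook(self)

    def ensure_connection(self):
        self.connected = True


class LazyHookMedium(object):
    """Fires the hook only once the connection has been made."""

    def __init__(self, hooks):
        self.connected = False
        self._hooks = hooks

    def ensure_connection(self):
        if self.connected:
            return
        self.connected = True
        for hook in self._hooks:
            hook(self)


def record(medium):
    # Log which medium called us and whether it was connected at the time.
    events.append((type(medium).__name__, medium.connected))


eager = EagerHookMedium([record])
lazy = LazyHookMedium([record])
lazy.ensure_connection()
```

In the eager case the callback sees an unconnected object, so a hook named post_connect would be misleading there.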
Vincent Ladeuil (vila) wrote : Posted in a previous version of this proposal | # |
>>>>> Martin [gz] <email address hidden> writes:
> Just having a hook turned out to be insufficient, and results in
> worse hangs than the old get_transport hack as can be seen from
> the nearly four hour babune runtime last night:
> <http://
> The problem is that RemoteTransport classes don't use
> _set_connection and generally behave rather differently. Throwing
> in a hook point in __init__ when a new medium is built gets us to
> a leak-free 40 minute runtime:
> <http://
Ok, so this validates the approach of calling disconnect for all
transports that have been able to connect to their server and ensures
all code paths are covered. As such, I'm tempted to accept the patch as
is waiting for a better solution in a followon.
From a design point of view, I think this outlines the divergence
between the smart transport and the others and came from the time where
we implemented connection sharing.
The smart transport has a medium object that implements
_ensure_
_SharedConnection object which is used as a data container by the
transport, but the transport object implements connect_xxx() and
disconnect() while calling _set_connection() when the connection is
established (including reconnections).
So while the post_connect hook can be implemented for the transports
objects, it can't be properly implemented for smart transports where the
transport object is not available at the medium level when the
connection occurs (in theory we could pass it down but I'm worried about
ref cycles there).
This hints at a connection hook instead of a transport hook but this
doesn't play well with the transport objects, which consider the connection
as an opaque attribute so far.
> However the semantics are not quite correct there. Even if we
> don't have to worry about remote reconnections (do we?) the
> post_connect hook is happening before the real connection, and
> potentially twice in the RemoteHTTPTransport case. Combined with
> the existing confusion over how many times disconnect gets called,
I think the confusion comes from the fact that disconnect() should be
implemented at the transport level since it is defined as closing the
connection even if there are other transports sharing this connection
(as opposed to closing the connection when the *last* transport using
the connection requires it).
In the test suite, all transports sharing a connection call
disconnect() to ensure we don't get leaks but that's the expected
behaviour.
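As a rough illustration of those disconnect() semantics, assuming a toy container shared between clones (SharedConnection and ToyTransport are hypothetical names, not the bzrlib classes):

```python
class SharedConnection(object):
    """Hypothetical container shared by cloned transports."""

    def __init__(self):
        self.connection = None


class ToyTransport(object):

    def __init__(self, _from_transport=None):
        if _from_transport is None:
            self._shared = SharedConnection()
        else:
            # Clones reuse the same container, hence the same connection.
            self._shared = _from_transport._shared

    def clone(self):
        return ToyTransport(_from_transport=self)

    def connect(self):
        if self._shared.connection is None:
            self._shared.connection = object()

    def is_connected(self):
        return self._shared.connection is not None

    def disconnect(self):
        # Closes the connection even if other transports share it;
        # a repeated call finds nothing left to close.
        self._shared.connection = None


t1 = ToyTransport()
t2 = t1.clone()
t1.connect()
connected_via_clone = t2.is_connected()
t1.disconnect()
t2.disconnect()  # second call on the same shared connection
```

Calling disconnect() from every sharing transport is redundant in this sketch but harmless.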
> it suggests this hook probably isn't sane enough to be generally
> useful.
Indeed, it lies for the smart transports as it's called *before* the
connection occurs.
> We do want the leak problem fixed asap though, so if anyone has
> any clever ideas...
One alternative would be to add a 'created' hook for transports which,
for the tests, will also call disconnect(). From your hack_transport
branch we know that calling disconnect() for all created transports is
enough to fix the leaks ev...
Jelmer Vernooij (jelmer) wrote : | # |
FWIW This no longer causes hangs with bzr.dev. I'm looking at using the post_connect hook to test the number of connections made in various blackbox tests.
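A sketch of what such a connection-counting test could look like, using a hypothetical hook list and client rather than the real bzrlib transports and blackbox test helpers:

```python
# Hypothetical registry and client; only the counting pattern matters.
post_connect_hooks = []


class ToyClient(object):

    def __init__(self):
        self._connected = False

    def request(self):
        if not self._connected:
            self._connected = True
            for hook in post_connect_hooks:
                hook(self)


def run_command(clients):
    """Stand-in for a command under test that issues several requests."""
    for client in clients:
        client.request()
        client.request()


connections = []
post_connect_hooks.append(connections.append)
a, b = ToyClient(), ToyClient()
run_command([a, b])
```

Four requests go out but only two entries are logged, one per connection actually made.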
Vincent Ladeuil (vila) wrote : | # |
Wow, this is so old... A few thoughts from reading the comments:
- the hangs referred to in the discussion for the previous version of this proposal are now fixed
- the issue about smart transport and connected transports being different beasts still exists
Vincent Ladeuil (vila) wrote : | # |
So, I think the same objection still stands: the hook is called for smart
transports *before* the connection occurs which is misleading for a
post_connect hook wanting to precisely track the effective connections.
We can rename it to connection_set...
But in any case, we should explain in the hook documentation that in some
cases, the hook will be called for a transport that hasn't established its
connection yet.
Not pretty.
This also means that when using RemoteHTTPTransport the hook is most
probably called twice, once for the smart transport and once for the http
transport.
Basically we want to push the hook down at the medium level with some
reference to its transport (which is a good way to introduce ref-cycles ;),
wait, didn't I mention that already ?). There may be a neat trick waiting to
be found but it escapes me right now...
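One generic way to hand a medium a reference to its transport without creating a cycle is a weak reference. This is only a sketch of that standard Python technique, with hypothetical class names (WeakMedium, OwnerTransport); nothing in this thread proposes or uses it.

```python
import gc
import weakref


class WeakMedium(object):
    """Hypothetical medium holding only a weak reference to its owner."""

    def __init__(self, transport):
        # A weak reference does not keep the transport alive.
        self._transport_ref = weakref.ref(transport)

    def get_transport(self):
        # Returns None once the transport has been collected.
        return self._transport_ref()


class OwnerTransport(object):

    def __init__(self):
        self.medium = WeakMedium(self)


owner = OwnerTransport()
medium = owner.medium
alive_before = medium.get_transport() is owner
del owner
gc.collect()  # not needed on CPython, kept for other implementations
gone_after = medium.get_transport() is None
```

The cost is that any hook reached this way has to cope with the transport having already gone away.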
Looking at the smart mediums:
- the Pipe one can arguably call the hook at init time.
- the ssh based ones want to call the hook in _ensure_connection
- the http one... should not call the hook since its backing http transport
will. Or should it still call the hook ? There are arguments both ways
depending on what the hook is used for...
- SmartClientAlre
definition (and the comment in the overriding _ensure_connection is a
hint that we are on the right track).
Maybe we should just focus on the test needs for now and consider yet
another hook ?
Ignoring all cloning issues and connection issues, the original need was to
disconnect all created transports without worrying about disconnecting
unused transports or transports reused multiple times.
@Jelmer: What is your need ? A precise number of connections ?
Martin Packman (gz) wrote : | # |
Have addressed the smart transport case by moving the hook callback down into the medium classes where the connection actually happens. The one wrinkle here is that the medium then gets passed rather than the transport. I think this is preferable to having the medium and transport hold references to each other, and the medium classes also have a disconnect method. This should make the hook useful, without actually needing to restructure the transport classes quite yet.
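Since both kinds of object expose disconnect(), a callback can stay agnostic about which one it receives. A minimal sketch with hypothetical toy classes (ToyConnectedTransport, ToySmartMedium), not the real ones:

```python
class ToyConnectedTransport(object):
    """Hypothetical transport with a disconnect method."""

    def __init__(self):
        self.open = True

    def disconnect(self):
        self.open = False


class ToySmartMedium(object):
    """Hypothetical medium, unrelated by inheritance, same method name."""

    def __init__(self):
        self.open = True

    def disconnect(self):
        self.open = False


cleanups = []


def schedule_disconnect(connected):
    """Hook callback: accepts anything that has a disconnect() method."""
    cleanups.append(connected.disconnect)


toy_transport = ToyConnectedTransport()
toy_medium = ToySmartMedium()
for obj in (toy_transport, toy_medium):
    schedule_disconnect(obj)

# Run the scheduled cleanups, last in first out.
while cleanups:
    cleanups.pop()()
```

A callback that needs more than disconnect() would have to check the type of its argument, as the per_transport tests in the diff do.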
Jelmer Vernooij (jelmer) : | # |
Preview Diff
1 | === modified file 'bzrlib/hooks.py' |
2 | --- bzrlib/hooks.py 2011-12-19 13:23:58 +0000 |
3 | +++ bzrlib/hooks.py 2011-12-23 19:43:24 +0000 |
4 | @@ -83,6 +83,7 @@ |
5 | ('bzrlib.smart.client', '_SmartClient.hooks', 'SmartClientHooks'), |
6 | ('bzrlib.smart.server', 'SmartTCPServer.hooks', 'SmartServerHooks'), |
7 | ('bzrlib.status', 'hooks', 'StatusHooks'), |
8 | + ('bzrlib.transport', 'Transport.hooks', 'TransportHooks'), |
9 | ('bzrlib.version_info_formats.format_rio', 'RioVersionInfoBuilder.hooks', |
10 | 'RioVersionInfoBuilderHooks'), |
11 | ('bzrlib.merge_directive', 'BaseMergeDirective.hooks', |
12 | |
13 | === modified file 'bzrlib/smart/medium.py' |
14 | --- bzrlib/smart/medium.py 2011-12-19 13:23:58 +0000 |
15 | +++ bzrlib/smart/medium.py 2011-12-23 19:43:24 +0000 |
16 | @@ -43,6 +43,7 @@ |
17 | debug, |
18 | errors, |
19 | trace, |
20 | + transport, |
21 | ui, |
22 | urlutils, |
23 | ) |
24 | @@ -1021,6 +1022,8 @@ |
25 | raise AssertionError( |
26 | "Unexpected io_kind %r from %r" |
27 | % (io_kind, self._ssh_connection)) |
28 | + for hook in transport.Transport.hooks["post_connect"]: |
29 | + hook(self) |
30 | |
31 | def _flush(self): |
32 | """See SmartClientStreamMedium._flush().""" |
33 | @@ -1129,6 +1132,8 @@ |
34 | raise errors.ConnectionError("failed to connect to %s:%d: %s" % |
35 | (self._host, port, err_msg)) |
36 | self._connected = True |
37 | + for hook in transport.Transport.hooks["post_connect"]: |
38 | + hook(self) |
39 | |
40 | |
41 | class SmartClientAlreadyConnectedSocketMedium(SmartClientSocketMedium): |
42 | |
43 | === modified file 'bzrlib/tests/__init__.py' |
44 | --- bzrlib/tests/__init__.py 2011-12-18 12:46:49 +0000 |
45 | +++ bzrlib/tests/__init__.py 2011-12-23 19:43:24 +0000 |
46 | @@ -2734,16 +2734,20 @@ |
47 | |
48 | def setUp(self): |
49 | super(TestCaseWithMemoryTransport, self).setUp() |
50 | - # Ensure that ConnectedTransport doesn't leak sockets |
51 | - def get_transport_from_url_with_cleanup(*args, **kwargs): |
52 | - t = orig_get_transport_from_url(*args, **kwargs) |
53 | - if isinstance(t, _mod_transport.ConnectedTransport): |
54 | - self.addCleanup(t.disconnect) |
55 | - return t |
56 | - |
57 | - orig_get_transport_from_url = self.overrideAttr( |
58 | - _mod_transport, 'get_transport_from_url', |
59 | - get_transport_from_url_with_cleanup) |
60 | + |
61 | + def _add_disconnect_cleanup(transport): |
62 | + """Schedule disconnection of given transport at test cleanup |
63 | + |
64 | + This needs to happen for all connected transports or leaks occur. |
65 | + |
66 | + Note reconnections may mean we call disconnect multiple times per |
67 | + transport which is suboptimal but seems harmless. |
68 | + """ |
69 | + self.addCleanup(transport.disconnect) |
70 | + |
71 | + _mod_transport.Transport.hooks.install_named_hook('post_connect', |
72 | + _add_disconnect_cleanup, None) |
73 | + |
74 | self._make_test_root() |
75 | self.addCleanup(os.chdir, os.getcwdu()) |
76 | self.makeAndChdirToTestDir() |
77 | |
78 | === modified file 'bzrlib/tests/per_transport.py' |
79 | --- bzrlib/tests/per_transport.py 2011-11-17 18:06:46 +0000 |
80 | +++ bzrlib/tests/per_transport.py 2011-12-23 19:43:24 +0000 |
81 | @@ -53,9 +53,11 @@ |
82 | from bzrlib.tests.test_transport import TestTransportImplementation |
83 | from bzrlib.transport import ( |
84 | ConnectedTransport, |
85 | + Transport, |
86 | _get_transport_modules, |
87 | ) |
88 | from bzrlib.transport.memory import MemoryTransport |
89 | +from bzrlib.transport.remote import RemoteTransport |
90 | |
91 | |
92 | def get_transport_test_permutations(module): |
93 | @@ -1836,3 +1838,35 @@ |
94 | self.build_tree([needlessly_escaped_dir], transport=t1) |
95 | t2 = t1.clone(needlessly_escaped_dir) |
96 | self.assertEqual(t1.base + "-.09AZ_az~/", t2.base) |
97 | + |
98 | + def test_hook_post_connection_one(self): |
99 | + """Fire post_connect hook after a ConnectedTransport is first used""" |
100 | + log = [] |
101 | + Transport.hooks.install_named_hook("post_connect", log.append, None) |
102 | + t = self.get_transport() |
103 | + self.assertEqual([], log) |
104 | + t.has("non-existant") |
105 | + if isinstance(t, RemoteTransport): |
106 | + self.assertEqual([t.get_smart_medium()], log) |
107 | + elif isinstance(t, ConnectedTransport): |
108 | + self.assertEqual([t], log) |
109 | + else: |
110 | + self.assertEqual([], log) |
111 | + |
112 | + def test_hook_post_connection_multi(self): |
113 | + """Fire post_connect hook once per unshared underlying connection""" |
114 | + log = [] |
115 | + Transport.hooks.install_named_hook("post_connect", log.append, None) |
116 | + t1 = self.get_transport() |
117 | + t2 = t1.clone(".") |
118 | + t3 = self.get_transport() |
119 | + self.assertEqual([], log) |
120 | + t1.has("x") |
121 | + t2.has("x") |
122 | + t3.has("x") |
123 | + if isinstance(t1, RemoteTransport): |
124 | + self.assertEqual([t.get_smart_medium() for t in [t1, t3]], log) |
125 | + elif isinstance(t1, ConnectedTransport): |
126 | + self.assertEqual([t1, t3], log) |
127 | + else: |
128 | + self.assertEqual([], log) |
129 | |
130 | === modified file 'bzrlib/tests/test_transport.py' |
131 | --- bzrlib/tests/test_transport.py 2011-12-05 14:21:55 +0000 |
132 | +++ bzrlib/tests/test_transport.py 2011-12-23 19:43:24 +0000 |
133 | @@ -443,6 +443,29 @@ |
134 | self.assertEqual('chroot-%d:///' % id(server), server.get_url()) |
135 | |
136 | |
137 | +class TestHooks(tests.TestCase): |
138 | + """Basic tests for transport hooks""" |
139 | + |
140 | + def _get_connected_transport(self): |
141 | + return transport.ConnectedTransport("bogus:nowhere") |
142 | + |
143 | + def test_transporthooks_initialisation(self): |
144 | + """Check all expected transport hook points are set up""" |
145 | + hookpoint = transport.TransportHooks() |
146 | + self.assertTrue("post_connect" in hookpoint, |
147 | + "post_connect not in %s" % (hookpoint,)) |
148 | + |
149 | + def test_post_connect(self): |
150 | + """Ensure the post_connect hook is called when _set_transport is""" |
151 | + calls = [] |
152 | + transport.Transport.hooks.install_named_hook("post_connect", |
153 | + calls.append, None) |
154 | + t = self._get_connected_transport() |
155 | + self.assertLength(0, calls) |
156 | + t._set_connection("connection", "auth") |
157 | + self.assertEqual(calls, [t]) |
158 | + |
159 | + |
160 | class PathFilteringDecoratorTransportTest(tests.TestCase): |
161 | """Pathfilter decoration specific tests.""" |
162 | |
163 | |
164 | === modified file 'bzrlib/tests/transport_util.py' |
165 | --- bzrlib/tests/transport_util.py 2011-08-19 22:34:02 +0000 |
166 | +++ bzrlib/tests/transport_util.py 2011-12-23 19:43:24 +0000 |
167 | @@ -14,125 +14,34 @@ |
168 | # along with this program; if not, write to the Free Software |
169 | # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
170 | |
171 | -import bzrlib.hooks |
172 | -from bzrlib import transport |
173 | from bzrlib.tests import features |
174 | |
175 | # SFTPTransport offers better performances but relies on paramiko, if paramiko |
176 | # is not available, we fallback to FtpTransport |
177 | if features.paramiko.available(): |
178 | from bzrlib.tests import test_sftp_transport |
179 | - from bzrlib.transport import sftp |
180 | + from bzrlib.transport import sftp, Transport |
181 | _backing_scheme = 'sftp' |
182 | _backing_transport_class = sftp.SFTPTransport |
183 | _backing_test_class = test_sftp_transport.TestCaseWithSFTPServer |
184 | else: |
185 | - from bzrlib.transport import ftp |
186 | + from bzrlib.transport import ftp, Transport |
187 | from bzrlib.tests import test_ftp_transport |
188 | _backing_scheme = 'ftp' |
189 | _backing_transport_class = ftp.FtpTransport |
190 | _backing_test_class = test_ftp_transport.TestCaseWithFTPServer |
191 | |
192 | -from bzrlib.transport import ( |
193 | - ConnectedTransport, |
194 | - register_transport, |
195 | - register_urlparse_netloc_protocol, |
196 | - unregister_transport, |
197 | - _unregister_urlparse_netloc_protocol, |
198 | - ) |
199 | - |
200 | - |
201 | - |
202 | -class TransportHooks(bzrlib.hooks.Hooks): |
203 | - """Dict-mapping hook name to a list of callables for transport hooks""" |
204 | - |
205 | - def __init__(self): |
206 | - super(TransportHooks, self).__init__("bzrlib.tests.transport_util", |
207 | - "InstrumentedTransport.hooks") |
208 | - # Invoked when the transport has just created a new connection. |
209 | - # The api signature is (transport, connection, credentials) |
210 | - self['_set_connection'] = [] |
211 | - |
212 | -_hooked_scheme = 'hooked' |
213 | - |
214 | -def _change_scheme_in(url, actual, desired): |
215 | - if not url.startswith(actual + '://'): |
216 | - raise AssertionError('url "%r" does not start with "%r]"' |
217 | - % (url, actual)) |
218 | - return desired + url[len(actual):] |
219 | - |
220 | - |
221 | -class InstrumentedTransport(_backing_transport_class): |
222 | - """Instrumented transport class to test commands behavior""" |
223 | - |
224 | - hooks = TransportHooks() |
225 | - |
226 | - def __init__(self, base, _from_transport=None): |
227 | - if not base.startswith(_hooked_scheme + '://'): |
228 | - raise ValueError(base) |
229 | - # We need to trick the backing transport class about the scheme used |
230 | - # We'll do the reverse when we need to talk to the backing server |
231 | - fake_base = _change_scheme_in(base, _hooked_scheme, _backing_scheme) |
232 | - super(InstrumentedTransport, self).__init__( |
233 | - fake_base, _from_transport=_from_transport) |
234 | - # The following is needed to minimize the effects of our trick above |
235 | - # while retaining the best compatibility. |
236 | - self._parsed_url.scheme = _hooked_scheme |
237 | - super(ConnectedTransport, self).__init__(str(self._parsed_url)) |
238 | - |
239 | - |
240 | -class ConnectionHookedTransport(InstrumentedTransport): |
241 | - """Transport instrumented to inspect connections""" |
242 | - |
243 | - def _set_connection(self, connection, credentials): |
244 | - """Called when a new connection is created """ |
245 | - super(ConnectionHookedTransport, self)._set_connection(connection, |
246 | - credentials) |
247 | - for hook in self.hooks['_set_connection']: |
248 | - hook(self, connection, credentials) |
249 | - |
250 | |
251 | class TestCaseWithConnectionHookedTransport(_backing_test_class): |
252 | |
253 | def setUp(self): |
254 | - register_urlparse_netloc_protocol(_hooked_scheme) |
255 | - register_transport(_hooked_scheme, ConnectionHookedTransport) |
256 | - self.addCleanup(unregister_transport, _hooked_scheme, |
257 | - ConnectionHookedTransport) |
258 | - self.addCleanup(_unregister_urlparse_netloc_protocol, _hooked_scheme) |
259 | super(TestCaseWithConnectionHookedTransport, self).setUp() |
260 | self.reset_connections() |
261 | - # Add the 'hooked' url to the permitted url list. |
262 | - # XXX: See TestCase.start_server. This whole module shouldn't need to |
263 | - # exist - a bug has been filed on that. once its cleanedup/removed, the |
264 | - # standard test support code will work and permit the server url |
265 | - # correctly. |
266 | - url = self.get_url() |
267 | - t = transport.get_transport_from_url(url) |
268 | - if t.base.endswith('work/'): |
269 | - t = t.clone('../..') |
270 | - self.permit_url(t.base) |
271 | - |
272 | - def get_url(self, relpath=None): |
273 | - super_self = super(TestCaseWithConnectionHookedTransport, self) |
274 | - url = super_self.get_url(relpath) |
275 | - # Replace the backing scheme by our own (see |
276 | - # InstrumentedTransport.__init__) |
277 | - url = _change_scheme_in(url, _backing_scheme, _hooked_scheme) |
278 | - return url |
279 | |
280 | def start_logging_connections(self): |
281 | - self.overrideAttr(InstrumentedTransport, 'hooks', TransportHooks()) |
282 | - # We preserved the hooks class attribute. Now we install our hook. |
283 | - ConnectionHookedTransport.hooks.install_named_hook( |
284 | - '_set_connection', self._collect_connection, None) |
285 | + Transport.hooks.install_named_hook('post_connect', |
286 | + self.connections.append, None) |
287 | |
288 | def reset_connections(self): |
289 | self.connections = [] |
290 | |
291 | - def _collect_connection(self, transport, connection, credentials): |
292 | - # Note: uncomment the following line and use 'bt' under pdb, that will |
293 | - # identify all the connections made including the extraneous ones. |
294 | - # import pdb; pdb.set_trace() |
295 | - self.connections.append(connection) |
296 | - |
297 | |
298 | === modified file 'bzrlib/transport/__init__.py' |
299 | --- bzrlib/transport/__init__.py 2011-12-19 10:58:39 +0000 |
300 | +++ bzrlib/transport/__init__.py 2011-12-23 19:43:24 +0000 |
301 | @@ -52,7 +52,10 @@ |
302 | from bzrlib.trace import ( |
303 | mutter, |
304 | ) |
305 | -from bzrlib import registry |
306 | +from bzrlib import ( |
307 | + hooks, |
308 | + registry, |
309 | + ) |
310 | |
311 | |
312 | # a dictionary of open file streams. Keys are absolute paths, values are |
313 | @@ -283,6 +286,16 @@ |
314 | self.transport.append_bytes(self.relpath, bytes) |
315 | |
316 | |
317 | +class TransportHooks(hooks.Hooks): |
318 | + """Mapping of hook names to registered callbacks for transport hooks""" |
319 | + def __init__(self): |
320 | + super(TransportHooks, self).__init__() |
321 | + self.add_hook("post_connect", |
322 | + "Called after a new connection is established or a reconnect " |
323 | + "occurs. The sole argument passed is either the connected " |
324 | + "transport or smart medium instance.", (2, 5)) |
325 | + |
326 | + |
327 | class Transport(object): |
328 | """This class encapsulates methods for retrieving or putting a file |
329 | from/to a storage location. |
330 | @@ -307,6 +320,8 @@ |
331 | # where the biggest benefit between combining reads and |
332 | # and seeking is. Consider a runtime auto-tune. |
333 | _bytes_to_read_before_seek = 0 |
334 | + |
335 | + hooks = TransportHooks() |
336 | |
337 | def __init__(self, base): |
338 | super(Transport, self).__init__() |
339 | @@ -1496,6 +1511,8 @@ |
340 | """ |
341 | self._shared_connection.connection = connection |
342 | self._shared_connection.credentials = credentials |
343 | + for hook in self.hooks["post_connect"]: |
344 | + hook(self) |
345 | |
346 | def _get_connection(self): |
347 | """Returns the transport specific connection object.""" |
129 + for hook in self.hooks[ "post_connect" ]:
130 + # GZ 2010-10-14: Should the hook be passed the new connection and
131 + # credentials too or does opaque really mean that?
132 + hook(self)
The hook already receives 'self' so it can access the connection/credentials if needed.
But they are specific to each transport class...
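To make that concrete, a small sketch of a callback that is given only the transport and pulls the connection out of it. ToyConnected is a hypothetical class whose method names merely echo the ones in the diff; how a real transport exposes its connection depends on the transport class.

```python
class ToyConnected(object):
    """Hypothetical transport keeping its connection as an attribute."""

    def __init__(self):
        self._connection = None
        self._credentials = None
        self.hooks = []

    def _set_connection(self, connection, credentials=None):
        self._connection = connection
        self._credentials = credentials
        for hook in self.hooks:
            hook(self)  # only the transport itself is passed

    def _get_connection(self):
        return self._connection


seen = []


def log_connection(transport):
    # The callback fetches what it needs from the transport it was given.
    seen.append(transport._get_connection())


t = ToyConnected()
t.hooks.append(log_connection)
t._set_connection("connection", "auth")
```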
The tests are ok. We know the hook is heavily exercised or we get leaks anyway.
If you really really want to add tests you can check what happens when several transports share a connection, but even that sounds overkill.
Write the NEWS entry, we'll see what is available when your patch lands. Don't forget the hooks-help.txt file.
I'll ping people on the pre-requisites.