Merge lp:~mbp/bzr/497274-http-405 into lp:bzr
- 497274-http-405
- Merge into bzr.dev
Status: | Rejected |
---|---|
Rejected by: | Martin Pool |
Proposed branch: | lp:~mbp/bzr/497274-http-405 |
Merge into: | lp:bzr |
Diff against target: |
237 lines (+104/-31) 5 files modified
NEWS (+8/-0) bzrlib/bzrdir.py (+1/-1) bzrlib/tests/__init__.py (+1/-0) bzrlib/transport/http/__init__.py (+58/-1) bzrlib/transport/http/_pycurl.py (+36/-29) |
To merge this branch: | bzr merge lp:~mbp/bzr/497274-http-405 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Vincent Ladeuil | Needs Information | ||
John A Meinel | Approve | ||
Review via email: mp+16229@code.launchpad.net |
Commit message
Description of the change
Martin Pool (mbp) wrote : | # |
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https:/
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>
I would ask Vincent for some help, but as I understand he already has
all the infrastructure to create an HTTP server that gives canned
responses. So setting one up to return 405 on .bzr/branch-format and
ensuring that bzr then raises NoSuchBranch rather than
InvalidHTTPResponse seems a reasonable high-level test.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAks
biAAn3asxZvkOtM
=JW9R
-----END PGP SIGNATURE-----
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https:/
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>
Oh, and:
review: approve
- From me, though it would be nice to have the test case before landing.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAks
gp8AoJXZrwqN8eb
=mBqz
-----END PGP SIGNATURE-----
Vincent Ladeuil (vila) wrote : | # |
> This tries to handle 'http 405 not allowed' from Google code as meaning "not
> allowed to look for .bzr", ie there's no bzr branch there. That may allow
> foreign branches to work against it. However, their servers keep falling over
> with '503 not available' errors, which make testing it difficult.
Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and probably incomplete)
fix with tests.
_post() is used only by smart_http_
InvalidHttpResponse into SmartProtocolError which higher layers interpret as NotABranch when needed.
>
> I've only tested this interactively not added tests. I probably should.
See above.
>
> This prints and logs the actual message from the http response page.
I doubt users are really interested in the general case, muttering may be enough though.
>
> It also tries to unify http response handling currently scattered across
> _pycurl.py into one method in the base class.
I'm pretty sure I went mostly in the opposite direction in the past based on problems
encountered while implementing the webdav support.
I didn't try to unify both implementations either... mostly because we will deprecate pycurl
one day and I was happy with the urllib one. The idea there is that there are three categories
of error codes:
- the ones that represent success,
- the ones that represent known failures,
- the others
Upon reception, the callers must handle the success cases, may handle some known
failures specifically before relying on a fallback for the remaining ones.
Basically the http requests are generic enough that the same error code can be interpreted
differently in different contexts. Unifying the error handling too much makes it harder
to raise the correct bzr exceptions.
So far, only 403 was really unambiguously common, even 404 needed to be handled separately
for HEAD (which is the main reason why I prefer to *not* have it in the common code).
All in all, that gives a coherent way to code all the methods with limited duplication
(mostly 404 but everybody knows what 404 means :)
> This may be slightly dangerous
> to change as it's lightly tested and tries to work around server quirks, but
> any breakage should be shallow and it's worth clearing it up I think. I
> haven't yet gone through the urllib implementation.
See above, that's a one-line fix, the pycurl implementation doesn't even need to be fixed.
That may explains why you can't test it against google, I suspect others use the
urllib implementation and needs the fix.
Note that I fixed a bug in send_http_
So I vote needs_fixing and will mark as Work In Progress.
Martin Pool (mbp) wrote : | # |
> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there. That may allow
> > foreign branches to work against it. However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
>
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.
I'll merge that into mine and see how it looks.
> _post() is used only by smart_http_
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
>
> >
> > I've only tested this interactively not added tests. I probably should.
>
> See above.
>
> >
> > This prints and logs the actual message from the http response page.
>
> I doubt users are really interested in the general case, muttering may be
> enough though.
They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded. We tend to get bug reports about this - though just getting it into the trace should help us with that.
>
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
>
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
>
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
>
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
>
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
>
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
>
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)
There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations. I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.
>
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think. I
> > haven't yet gone through the urllib implementatio...
Martin Pool (mbp) wrote : | # |
@vila: How do you see the interaction between the various http exception classes working. For instance, why is this code as it is:
def http_error_
if code == 403:
raise errors.
' for %s' % req.get_full_url())
else:
raise errors.
Martin Pool (mbp) wrote : | # |
google code seems to be up again and I've just been testing this interactively, and I can't actually get trunk to fail there, even with pycurl disabled. So I'm not sure what the future of this patch is: to me at least some of it seems like a useful cleanup, but given all the quirks here I don't think it's worth landing it without at least one real-world test showing it's better.
vila, how did you reproduce an improvement with your patch?
Martin Pool (mbp) wrote : | # |
I've split out the html error handling (pycurl only) into https:/
Vincent Ladeuil (vila) wrote : | # |
>>>>> "martin" == Martin Pool <email address hidden> writes:
Disclaimer: I wrote my patch in urgency without trying to
integrate it with yours, as such I tried to address the missing
parts. The were no tests and no urllib implementation. I stopped
there and didn't explain it as well as I should have, see below
for more comments.
<snip/>
martin> I'll merge that into mine and see how it looks.
That was the idea, I'm fine with whatever end result you come up
with.
<snip/>
>> I doubt users are really interested in the general case,
>> muttering may be enough though.
martin> They might not be, but only giving the status code is
martin> not very helpful if there is something mysterious
martin> going on, like the server being overloaded. We tend
martin> to get bug reports about this - though just getting
martin> it into the trace should help us with that.
Yup. I was a bit uncomfortable with unhtml() but not that
much. My main concern is that its a bit brute force and in
presence of errors I like, as a dev, to have access to the raw
data. Users have others expectancies and I don't a clear answer
about what should be presented.
<snip/>
martin> There are two types of possible duplication: across
martin> different call paths (get vs post vs others) and
martin> across different http client implementations. I can
martin> see how different callers might want different
martin> interpretations for some codes, but I don't see why
martin> we would want different interpretations across urllib
martin> and pycurl, unless it really is something where the
martin> client library interferes with the result, as it may
martin> for redirects.
So, thanks for bringing a fresh eye here. Historically I've never
attempt to unify the two implementations very hard based on the
assumption that pycurl will be deprecated (shudder).
But as you point above, both goals are a bit contradictory, you
have to care to leave errors escape the lower layer handling so
that the higher (common) one can work.
<snip/>
martin> So your code has, in handle_response:
martin> elif code == 405:
martin> # The server refused to handle the request, the data presumably
martin> # contains details about the error and not useful data, but we leave
martin> # that to the higher levels.
martin> return data
martin> which strongly looks like you're going to treat the
martin> html text of the error response as a valid response
martin> body, which is definitely not right. OK, apparently
martin> this doesn't necessarily happen because the base
martin> class send_http_
martin> there's still some risk of that.
Yup, that's why I said the implementation is incomplete. This was
the less intrusive fix but I was not happy with the risk.
martin> If you pass around (http_status, body) too much then
martin> multiple layers need to know how to interpret the
martin> status code. Without understanding it, they can't
martin> understand how to use the body data. For insta2nce,
martin> send_http_
Vincent Ladeuil (vila) wrote : | # |
martin> I've split out the html error handling (pycurl only) into
martin> https:/
The diff isn't online yet, I'll give it a look in a couple of
days. See my other mail and feel free to bring more of your work
based on my remarks there.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> martin> I guess I don't properly understand whether you think
> martin> my patch is actually wrong or just needs tests. I
> martin> will try to use your tests with mine and see where we
> martin> get to.
>
> I hope I clarified that: except for 404, I'm fine with a common
> error handling, may be I just haven't seen yet a good way to
> implement that but I didn't spend much time on it either and I
> trust you to find it ;)
>
If the only code that uses 404 specially is HEAD, then I would be happy
to have it trap the exception. #1 because I don't think we use
"transport.has*" in any live code path. LBYL vs EAFTP.
I think the only code that uses it is probably old-format repositories.
(Where having a revision was indicated by the presence of the
'.bzr/revisions
I'm pretty sure everything else would treate 404 as NoSuchFile.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAks
vOMAoKCYRxzzy7R
=dmsP
-----END PGP SIGNATURE-----
Martin Pool (mbp) wrote : | # |
2009/12/22 John A Meinel <email address hidden>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>> martin> I guess I don't properly understand whether you think
>> martin> my patch is actually wrong or just needs tests. I
>> martin> will try to use your tests with mine and see where we
>> martin> get to.
>>
>> I hope I clarified that: except for 404, I'm fine with a common
>> error handling, may be I just haven't seen yet a good way to
>> implement that but I didn't spend much time on it either and I
>> trust you to find it ;)
>>
>
> If the only code that uses 404 specially is HEAD, then I would be happy
> to have it trap the exception. #1 because I don't think we use
> "transport.has*" in any live code path. LBYL vs EAFTP.
>
> I think the only code that uses it is probably old-format repositories.
> (Where having a revision was indicated by the presence of the
> '.bzr/revisions
>
> I'm pretty sure everything else would treate 404 as NoSuchFile.
I think it would be reasonable for .has() to just catch NoSuchFile.
Having looked at this a bit more: InvalidHttpResponse mixes both "not
valid http" and "not the http code we expected." So perhaps a good
step would be to separate that into "HttpError" containing all the
right fields, then we can translate that into a specific bzr error,
taking into account the type of operation we were doing.
However, since imports from google are apparently now working, this is
not urgent for me.
--
Martin <http://
Vincent Ladeuil (vila) wrote : | # |
@poolie: From the comments, I get the impression that you have un-pushed changes there. If that's right, can you update this mp ? I'm pretty sure it's in a good enough state to land on trunk if only to allow whoever being able to reproduce the issue to test it.
This probably needs to be refreshed against the current trunk, watch the conflicts with an hawk eye ;)
Martin Pool (mbp) wrote : | # |
vila, on the whole, I'll just abandon this branch. I think the situation it was trying to fix may have been a transient problem at Google's end, and there's no longer enough obvious benefit to be worth changing it.
Vincent Ladeuil (vila) wrote : | # |
>>>>> Martin Pool <email address hidden> writes:
> vila, on the whole, I'll just abandon this branch. I think the
> situation it was trying to fix may have been a transient problem
> at Google's end, and there's no longer enough obvious benefit to
> be worth changing it.
Did you write this before or after commenting on bug #497274 ?
https:/
still consider this proposal to be useful no ?
Martin Pool (mbp) wrote : | # |
On 21 February 2011 18:48, Vincent Ladeuil <email address hidden> wrote:
>>>>>> Martin Pool <email address hidden> writes:
>
> > vila, on the whole, I'll just abandon this branch. I think the
> > situation it was trying to fix may have been a transient problem
> > at Google's end, and there's no longer enough obvious benefit to
> > be worth changing it.
>
> Did you write this before or after commenting on bug #497274 ?
>
> https:/
> still consider this proposal to be useful no ?
I wrote that before.
I don't think this mp is still in progress, but it can be resurrected
if that's useful.
Unmerged revisions
- 4917. By Martin Pool
-
Remove obsolete comments
- 4916. By Martin Pool
-
Show unhtml_roughly for some http errors to give more of a clue why
- 4915. By Martin Pool
-
Add unhtml_roughly to show html http errors
- 4914. By Martin Pool
-
doc
- 4913. By Martin Pool
-
Move http->error translation into HttpTransportBase
- 4912. By Martin Pool
-
TransportNotPos
sible taken to mean there's no branch present - 4911. By Martin Pool
-
Tweak http error
- 4910. By Martin Pool
-
Interpret http 405 'not allowed' as meaning there's no smart server present
- 4909. By Martin Pool
-
Tweak http error message
- 4908. By Martin Pool
-
Map http 405 to TransportNotPos
sible
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-12-15 19:59:00 +0000 |
3 | +++ NEWS 2009-12-16 08:26:16 +0000 |
4 | @@ -20,6 +20,11 @@ |
5 | Bug Fixes |
6 | ********* |
7 | |
8 | +* HTTP 405 "Not Allowed" is taken to mean there's no bzr branch or smart |
9 | + server at the URL. This is sent by Google Code and blocks use of |
10 | + foreign branch plugins. |
11 | + (Martin Pool, #497274) |
12 | + |
13 | Improvements |
14 | ************ |
15 | |
16 | @@ -32,6 +37,9 @@ |
17 | Internals |
18 | ********* |
19 | |
20 | +* PyCurl http/https error handling unified; this may have knock-on effects |
21 | + on quirky web servers. (Martin Pool) |
22 | + |
23 | Testing |
24 | ******* |
25 | |
26 | |
27 | === modified file 'bzrlib/bzrdir.py' |
28 | --- bzrlib/bzrdir.py 2009-12-02 17:56:06 +0000 |
29 | +++ bzrlib/bzrdir.py 2009-12-16 08:26:16 +0000 |
30 | @@ -1828,7 +1828,7 @@ |
31 | """Return the .bzrdir style format present in a directory.""" |
32 | try: |
33 | format_string = transport.get_bytes(".bzr/branch-format") |
34 | - except errors.NoSuchFile: |
35 | + except (errors.NoSuchFile, errors.TransportNotPossible): |
36 | raise errors.NotBranchError(path=transport.base) |
37 | |
38 | try: |
39 | |
40 | === modified file 'bzrlib/tests/__init__.py' |
41 | --- bzrlib/tests/__init__.py 2009-12-08 21:46:07 +0000 |
42 | +++ bzrlib/tests/__init__.py 2009-12-16 08:26:16 +0000 |
43 | @@ -3934,6 +3934,7 @@ |
44 | 'bzrlib.symbol_versioning', |
45 | 'bzrlib.tests', |
46 | 'bzrlib.timestamp', |
47 | + 'bzrlib.transport.http', |
48 | 'bzrlib.version_info_formats.format_custom', |
49 | ] |
50 | |
51 | |
52 | === modified file 'bzrlib/transport/http/__init__.py' |
53 | --- bzrlib/transport/http/__init__.py 2009-11-26 14:39:31 +0000 |
54 | +++ bzrlib/transport/http/__init__.py 2009-12-16 08:26:16 +0000 |
55 | @@ -205,6 +205,51 @@ |
56 | # use. |
57 | _get_max_size = 0 |
58 | |
59 | + def _raise_mapped_error(self, url, code, response_body, |
60 | + context_message=None): |
61 | + """Translate an http error into a bzrlib exception. |
62 | + |
63 | + Some methods may choose to override this for particular cases. |
64 | + |
65 | + The URL and code are automatically included as appropriate. |
66 | + |
67 | + :param code: integer response code |
68 | + :param response_body: string containing the body of the error response |
69 | + (typically html) |
70 | + """ |
71 | + # XXX: This should also be used through the urllib implementation, but |
72 | + # it's not yet |
73 | + mutter("http error: %d %s" % (code, url)) |
74 | + mutter(" response body: %r" % response_body) |
75 | + if response_body: |
76 | + plaintext_body = unhtml_roughly(response_body) |
77 | + else: |
78 | + plaintext_body = '' |
79 | + if code == 404: |
80 | + raise errors.NoSuchFile(url) |
81 | + elif code == 403: |
82 | + # "The server understood the request, but is refusing to fulfill |
83 | + # it. Authorization will not help and the request SHOULD NOT be |
84 | + # repeated." |
85 | + raise errors.InvalidHttpResponse(url, '403 Forbidden') |
86 | + elif code == 405: |
87 | + # "The method specified in the Request-Line is not allowed for the |
88 | + # resource identified by the Request-URI. The response MUST |
89 | + # include an Allow header containing a list of valid methods for |
90 | + # the requested resource." |
91 | + # |
92 | + # Sent by Google code when probing for .bzr |
93 | + # <https://bugs.edge.launchpad.net/bzr/+bug/497274> |
94 | + raise errors.TransportNotPossible(url, '405 Not Allowed: ' |
95 | + + plaintext_body) |
96 | + else: |
97 | + if context_message is None: |
98 | + msg = '' |
99 | + else: |
100 | + msg = ': ' + context_message |
101 | + raise errors.InvalidHttpResponse( |
102 | + url, 'http response %d%s: %s' % (code, msg, plaintext_body)) |
103 | + |
104 | def _readv(self, relpath, offsets): |
105 | """Get parts of the file at the given relative path. |
106 | |
107 | @@ -614,10 +659,13 @@ |
108 | t = self._http_transport_ref() |
109 | code, body_filelike = t._post(bytes) |
110 | if code != 200: |
111 | + # this should normally be handled by _post, but check it here |
112 | raise InvalidHttpResponse( |
113 | t._remote_path('.bzr/smart'), |
114 | 'Expected 200 response code, got %r' % (code,)) |
115 | - except (errors.InvalidHttpResponse, errors.ConnectionReset), e: |
116 | + except (errors.TransportNotPossible, |
117 | + errors.InvalidHttpResponse, |
118 | + errors.ConnectionReset), e: |
119 | raise errors.SmartProtocolError(str(e)) |
120 | return body_filelike |
121 | |
122 | @@ -661,3 +709,12 @@ |
123 | def _finished_reading(self): |
124 | """See SmartClientMediumRequest._finished_reading.""" |
125 | pass |
126 | + |
127 | + |
128 | +def unhtml_roughly(maybe_html): |
129 | + """Very approximate html->text translation, for presenting error bodies. |
130 | + |
131 | + >>> unhtml_roughly("<b>bad</b> things happened\\n") |
132 | + ' bad things happened ' |
133 | + """ |
134 | + return re.subn(r"(<[^>]*>|\n| )", " ", maybe_html)[0] |
135 | |
136 | === modified file 'bzrlib/transport/http/_pycurl.py' |
137 | --- bzrlib/transport/http/_pycurl.py 2009-08-19 16:33:39 +0000 |
138 | +++ bzrlib/transport/http/_pycurl.py 2009-12-16 08:26:16 +0000 |
139 | @@ -1,4 +1,4 @@ |
140 | -# Copyright (C) 2006 Canonical Ltd |
141 | +# Copyright (C) 2006, 2009 Canonical Ltd |
142 | # |
143 | # This program is free software; you can redistribute it and/or modify |
144 | # it under the terms of the GNU General Public License as published by |
145 | @@ -206,13 +206,10 @@ |
146 | code = curl.getinfo(pycurl.HTTP_CODE) |
147 | data.seek(0) |
148 | |
149 | - if code == 404: |
150 | - raise errors.NoSuchFile(abspath) |
151 | - if code != 200: |
152 | - self._raise_curl_http_error( |
153 | - curl, 'expected 200 or 404 for full response.') |
154 | - |
155 | - return code, data |
156 | + if code == 200: |
157 | + return code, data |
158 | + else: |
159 | + self._raise_curl_http_error(curl) |
160 | |
161 | # The parent class use 0 to minimize the requests, but since we can't |
162 | # exploit the results as soon as they are received (pycurl limitation) we'd |
163 | @@ -237,16 +234,21 @@ |
164 | |
165 | code = curl.getinfo(pycurl.HTTP_CODE) |
166 | |
167 | - if code == 404: # not found |
168 | - raise errors.NoSuchFile(abspath) |
169 | + if code in (200, 206): |
170 | + msg = self._parse_headers(header) |
171 | + return code, response.handle_response(abspath, code, msg, data) |
172 | elif code in (400, 416): |
173 | # We don't know which, but one of the ranges we specified was |
174 | # wrong. |
175 | + # |
176 | + # 400 is not strictly 'invalid range', but rather 'bad request' |
177 | + # but (perhaps) some servers use it for that meaning. Aside from |
178 | + # that, this could go into _raise_curl_http_error. |
179 | raise errors.InvalidHttpRange(abspath, range_header, |
180 | 'Server return code %d' |
181 | % curl.getinfo(pycurl.HTTP_CODE)) |
182 | - msg = self._parse_headers(header) |
183 | - return code, response.handle_response(abspath, code, msg, data) |
184 | + else: |
185 | + self._raise_curl_http_error(curl) |
186 | |
187 | def _parse_headers(self, status_and_headers): |
188 | """Transform the headers provided by curl into an HTTPMessage""" |
189 | @@ -285,26 +287,31 @@ |
190 | raise |
191 | data.seek(0) |
192 | code = curl.getinfo(pycurl.HTTP_CODE) |
193 | - msg = self._parse_headers(header) |
194 | - return code, response.handle_response(abspath, code, msg, data) |
195 | - |
196 | - |
197 | - def _raise_curl_http_error(self, curl, info=None): |
198 | + if code == 200: |
199 | + msg = self._parse_headers(header) |
200 | + return code, response.handle_response(abspath, code, msg, data) |
201 | + else: |
202 | + self._raise_curl_http_error(curl, body=data) |
203 | + |
204 | + |
205 | + def _raise_curl_http_error(self, curl, info=None, body=None): |
206 | + """Common curl->bzrlib error translation. |
207 | + |
208 | + Some methods may choose to override this for particular cases. |
209 | + |
210 | + The URL and code are automatically included as appropriate. |
211 | + |
212 | + :param info: Extra information to include in the message. |
213 | + :param body: File-like object from which the body of the page can be read. |
214 | + """ |
215 | code = curl.getinfo(pycurl.HTTP_CODE) |
216 | url = curl.getinfo(pycurl.EFFECTIVE_URL) |
217 | - # Some error codes can be handled the same way for all |
218 | - # requests |
219 | - if code == 403: |
220 | - raise errors.TransportError( |
221 | - 'Server refuses to fulfill the request (403 Forbidden)' |
222 | - ' for %s' % url) |
223 | + if body is not None: |
224 | + response_body = body.read() |
225 | else: |
226 | - if info is None: |
227 | - msg = '' |
228 | - else: |
229 | - msg = ': ' + info |
230 | - raise errors.InvalidHttpResponse( |
231 | - url, 'Unable to handle http code %d%s' % (code,msg)) |
232 | + response_body = None |
233 | + self._raise_mapped_error(url, code, response_body=response_body, |
234 | + context_message=info) |
235 | |
236 | def _debug_cb(self, kind, text): |
237 | if kind in (pycurl.INFOTYPE_HEADER_IN, pycurl.INFOTYPE_DATA_IN, |
This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
I've only tested this interactively not added tests. I probably should.
This prints and logs the actual message from the http response page.
It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.