Merge lp:~mbp/bzr/497274-http-405 into lp:bzr

Proposed by Martin Pool
Status: Rejected
Rejected by: Martin Pool
Proposed branch: lp:~mbp/bzr/497274-http-405
Merge into: lp:bzr
Diff against target: 237 lines (+104/-31)
5 files modified
NEWS (+8/-0)
bzrlib/bzrdir.py (+1/-1)
bzrlib/tests/__init__.py (+1/-0)
bzrlib/transport/http/__init__.py (+58/-1)
bzrlib/transport/http/_pycurl.py (+36/-29)
To merge this branch: bzr merge lp:~mbp/bzr/497274-http-405
Reviewer Review Type Date Requested Status
Vincent Ladeuil Needs Information
John A Meinel Approve
Review via email: mp+16229@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.

I've only tested this interactively not added tests. I probably should.

This prints and logs the actual message from the http response page.

It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

I would ask Vincent for some help, but as I understand he already has
all the infrastructure to create an HTTP server that gives canned
responses. So setting one up to return 405 on .bzr/branch-format and
ensuring that bzr then raises NoSuchBranch rather than
InvalidHTTPResponse seems a reasonable high-level test.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBNQACgkQJdeBCYSNAAMOsACguGOLfNw3cNBoJNFYjSMjl3io
biAAn3asxZvkOtMob2ryoGSEF7laoBGT
=JW9R
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

Oh, and:

 review: approve

- From me, though it would be nice to have the test case before landing.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBRcACgkQJdeBCYSNAAPp+ACfQl/BMB8fhepPk1/O5vH2FP5v
gp8AoJXZrwqN8ebjB9aAi36oc11xabJz
=mBqz
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

> This tries to handle 'http 405 not allowed' from Google code as meaning "not
> allowed to look for .bzr", ie there's no bzr branch there. That may allow
> foreign branches to work against it. However, their servers keep falling over
> with '503 not available' errors, which make testing it difficult.

Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and probably incomplete)
fix with tests.

_post() is used only by smart_http_request() and the later already translate
InvalidHttpResponse into SmartProtocolError which higher layers interpret as NotABranch when needed.

>
> I've only tested this interactively not added tests. I probably should.

See above.

>
> This prints and logs the actual message from the http response page.

I doubt users are really interested in the general case, muttering may be enough though.

>
> It also tries to unify http response handling currently scattered across
> _pycurl.py into one method in the base class.

I'm pretty sure I went mostly in the opposite direction in the past based on problems
encountered while implementing the webdav support.

I didn't try to unify both implementations either... mostly because we will deprecate pycurl
one day and I was happy with the urllib one. The idea there is that there are three categories
of error codes:
- the ones that represent success,
- the ones that represent known failures,
- the others

Upon reception, the callers must handle the success cases, may handle some known
failures specifically before relying on a fallback for the remaining ones.

Basically the http requests are generic enough that the same error code can be interpreted
differently in different contexts. Unifying the error handling too much makes it harder
to raise the correct bzr exceptions.

So far, only 403 was really unambiguously common, even 404 needed to be handled separately
for HEAD (which is the main reason why I prefer to *not* have it in the common code).

All in all, that gives a coherent way to code all the methods with limited duplication
(mostly 404 but everybody knows what 404 means :)

> This may be slightly dangerous
> to change as it's lightly tested and tries to work around server quirks, but
> any breakage should be shallow and it's worth clearing it up I think. I
> haven't yet gone through the urllib implementation.

See above, that's a one-line fix, the pycurl implementation doesn't even need to be fixed.

That may explains why you can't test it against google, I suspect others use the
urllib implementation and needs the fix.

Note that I fixed a bug in send_http_smart_request so you want to backport at least that part.

So I vote needs_fixing and will mark as Work In Progress.

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :
Download full text (4.8 KiB)

> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there. That may allow
> > foreign branches to work against it. However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
>
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.

I'll merge that into mine and see how it looks.

> _post() is used only by smart_http_request() and the later already translate
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
>
> >
> > I've only tested this interactively not added tests. I probably should.
>
> See above.
>
> >
> > This prints and logs the actual message from the http response page.
>
> I doubt users are really interested in the general case, muttering may be
> enough though.

They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded. We tend to get bug reports about this - though just getting it into the trace should help us with that.

>
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
>
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
>
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
>
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
>
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
>
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
>
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)

There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations. I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.

>
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think. I
> > haven't yet gone through the urllib implementatio...

Read more...

Revision history for this message
Martin Pool (mbp) wrote :

@vila: How do you see the interaction between the various http exception classes working. For instance, why is this code as it is:

    def http_error_default(self, req, fp, code, msg, hdrs):
        if code == 403:
            raise errors.TransportError(
                'Server refuses to fulfill the request (403 Forbidden)'
                ' for %s' % req.get_full_url())
        else:
            raise errors.InvalidHttpResponse(req.get_full_url(),
                                             'Unable to handle http code %d: %s'
                                             % (code, msg))

Revision history for this message
Martin Pool (mbp) wrote :

google code seems to be up again and I've just been testing this interactively, and I can't actually get trunk to fail there, even with pycurl disabled. So I'm not sure what the future of this patch is: to me at least some of it seems like a useful cleanup, but given all the quirks here I don't think it's worth landing it without at least one real-world test showing it's better.

vila, how did you reproduce an improvement with your patch?

Revision history for this message
Martin Pool (mbp) wrote :

I've split out the html error handling (pycurl only) into https://code.edge.launchpad.net/~mbp/bzr/http-messages

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (4.2 KiB)

>>>>> "martin" == Martin Pool <email address hidden> writes:

Disclaimer: I wrote my patch in urgency without trying to
integrate it with yours, as such I tried to address the missing
parts. The were no tests and no urllib implementation. I stopped
there and didn't explain it as well as I should have, see below
for more comments.

<snip/>

    martin> I'll merge that into mine and see how it looks.

That was the idea, I'm fine with whatever end result you come up
with.

<snip/>

    >> I doubt users are really interested in the general case,
    >> muttering may be enough though.

    martin> They might not be, but only giving the status code is
    martin> not very helpful if there is something mysterious
    martin> going on, like the server being overloaded. We tend
    martin> to get bug reports about this - though just getting
    martin> it into the trace should help us with that.

Yup. I was a bit uncomfortable with unhtml() but not that
much. My main concern is that its a bit brute force and in
presence of errors I like, as a dev, to have access to the raw
data. Users have others expectancies and I don't a clear answer
about what should be presented.

<snip/>

    martin> There are two types of possible duplication: across
    martin> different call paths (get vs post vs others) and
    martin> across different http client implementations. I can
    martin> see how different callers might want different
    martin> interpretations for some codes, but I don't see why
    martin> we would want different interpretations across urllib
    martin> and pycurl, unless it really is something where the
    martin> client library interferes with the result, as it may
    martin> for redirects.

So, thanks for bringing a fresh eye here. Historically I've never
attempt to unify the two implementations very hard based on the
assumption that pycurl will be deprecated (shudder).

But as you point above, both goals are a bit contradictory, you
have to care to leave errors escape the lower layer handling so
that the higher (common) one can work.

<snip/>

    martin> So your code has, in handle_response:

    martin> elif code == 405:
    martin> # The server refused to handle the request, the data presumably
    martin> # contains details about the error and not useful data, but we leave
    martin> # that to the higher levels.
    martin> return data

    martin> which strongly looks like you're going to treat the
    martin> html text of the error response as a valid response
    martin> body, which is definitely not right. OK, apparently
    martin> this doesn't necessarily happen because the base
    martin> class send_http_smart_request translates it, but
    martin> there's still some risk of that.

Yup, that's why I said the implementation is incomplete. This was
the less intrusive fix but I was not happy with the risk.

    martin> If you pass around (http_status, body) too much then
    martin> multiple layers need to know how to interpret the
    martin> status code. Without understanding it, they can't
    martin> understand how to use the body data. For insta2nce,
    martin> send_http_smart_request ...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

    martin> I've split out the html error handling (pycurl only) into
    martin> https://code.edge.launchpad.net/~mbp/bzr/http-messages

The diff isn't online yet, I'll give it a look in a couple of
days. See my other mail and feel free to bring more of your work
based on my remarks there.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> martin> I guess I don't properly understand whether you think
> martin> my patch is actually wrong or just needs tests. I
> martin> will try to use your tests with mine and see where we
> martin> get to.
>
> I hope I clarified that: except for 404, I'm fine with a common
> error handling, may be I just haven't seen yet a good way to
> implement that but I didn't spend much time on it either and I
> trust you to find it ;)
>

If the only code that uses 404 specially is HEAD, then I would be happy
to have it trap the exception. #1 because I don't think we use
"transport.has*" in any live code path. LBYL vs EAFTP.

I think the only code that uses it is probably old-format repositories.
(Where having a revision was indicated by the presence of the
'.bzr/revisions/$REVID' file.)

I'm pretty sure everything else would treate 404 as NoSuchFile.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksvi9wACgkQJdeBCYSNAAM7ygCffCBwDol9uBmR8XgrlxnQTe1e
vOMAoKCYRxzzy7RQmM+Byq7iKO7EP0VF
=dmsP
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

2009/12/22 John A Meinel <email address hidden>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>>     martin> I guess I don't properly understand whether you think
>>     martin> my patch is actually wrong or just needs tests.  I
>>     martin> will try to use your tests with mine and see where we
>>     martin> get to.
>>
>> I hope I clarified that: except for 404, I'm fine with a common
>> error handling, may be I just haven't seen yet a good way to
>> implement that but I didn't spend much time on it either and I
>> trust you to find it ;)
>>
>
> If the only code that uses 404 specially is HEAD, then I would be happy
> to have it trap the exception. #1 because I don't think we use
> "transport.has*" in any live code path. LBYL vs EAFTP.
>
> I think the only code that uses it is probably old-format repositories.
> (Where having a revision was indicated by the presence of the
> '.bzr/revisions/$REVID' file.)
>
> I'm pretty sure everything else would treate 404 as NoSuchFile.

I think it would be reasonable for .has() to just catch NoSuchFile.

Having looked at this a bit more: InvalidHttpResponse mixes both "not
valid http" and "not the http code we expected." So perhaps a good
step would be to separate that into "HttpError" containing all the
right fields, then we can translate that into a specific bzr error,
taking into account the type of operation we were doing.

However, since imports from google are apparently now working, this is
not urgent for me.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Vincent Ladeuil (vila) wrote :

@poolie: From the comments, I get the impression that you have un-pushed changes there. If that's right, can you update this mp ? I'm pretty sure it's in a good enough state to land on trunk if only to allow whoever being able to reproduce the issue to test it.

This probably needs to be refreshed against the current trunk, watch the conflicts with an hawk eye ;)

review: Needs Information
Revision history for this message
Martin Pool (mbp) wrote :

vila, on the whole, I'll just abandon this branch. I think the situation it was trying to fix may have been a transient problem at Google's end, and there's no longer enough obvious benefit to be worth changing it.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Martin Pool <email address hidden> writes:

    > vila, on the whole, I'll just abandon this branch. I think the
    > situation it was trying to fix may have been a transient problem
    > at Google's end, and there's no longer enough obvious benefit to
    > be worth changing it.

Did you write this before or after commenting on bug #497274 ?

https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of imply you
still consider this proposal to be useful no ?

Revision history for this message
Martin Pool (mbp) wrote :

On 21 February 2011 18:48, Vincent Ladeuil <email address hidden> wrote:
>>>>>> Martin Pool <email address hidden> writes:
>
>    > vila, on the whole, I'll just abandon this branch.  I think the
>    > situation it was trying to fix may have been a transient problem
>    > at Google's end, and there's no longer enough obvious benefit to
>    > be worth changing it.
>
> Did you write this before or after commenting on bug #497274 ?
>
> https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of imply you
> still consider this proposal to be useful no ?

I wrote that before.

I don't think this mp is still in progress, but it can be resurrected
if that's useful.

Unmerged revisions

4917. By Martin Pool

Remove obsolete comments

4916. By Martin Pool

Show unhtml_roughly for some http errors to give more of a clue why

4915. By Martin Pool

Add unhtml_roughly to show html http errors

4914. By Martin Pool

doc

4913. By Martin Pool

Move http->error translation into HttpTransportBase

4912. By Martin Pool

TransportNotPossible taken to mean there's no branch present

4911. By Martin Pool

Tweak http error

4910. By Martin Pool

Interpret http 405 'not allowed' as meaning there's no smart server present

4909. By Martin Pool

Tweak http error message

4908. By Martin Pool

Map http 405 to TransportNotPossible

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-12-15 19:59:00 +0000
3+++ NEWS 2009-12-16 08:26:16 +0000
4@@ -20,6 +20,11 @@
5 Bug Fixes
6 *********
7
8+* HTTP 405 "Not Allowed" is taken to mean there's no bzr branch or smart
9+ server at the URL. This is sent by Google Code and blocks use of
10+ foreign branch plugins.
11+ (Martin Pool, #497274)
12+
13 Improvements
14 ************
15
16@@ -32,6 +37,9 @@
17 Internals
18 *********
19
20+* PyCurl http/https error handling unified; this may have knock-on effects
21+ on quirky web servers. (Martin Pool)
22+
23 Testing
24 *******
25
26
27=== modified file 'bzrlib/bzrdir.py'
28--- bzrlib/bzrdir.py 2009-12-02 17:56:06 +0000
29+++ bzrlib/bzrdir.py 2009-12-16 08:26:16 +0000
30@@ -1828,7 +1828,7 @@
31 """Return the .bzrdir style format present in a directory."""
32 try:
33 format_string = transport.get_bytes(".bzr/branch-format")
34- except errors.NoSuchFile:
35+ except (errors.NoSuchFile, errors.TransportNotPossible):
36 raise errors.NotBranchError(path=transport.base)
37
38 try:
39
40=== modified file 'bzrlib/tests/__init__.py'
41--- bzrlib/tests/__init__.py 2009-12-08 21:46:07 +0000
42+++ bzrlib/tests/__init__.py 2009-12-16 08:26:16 +0000
43@@ -3934,6 +3934,7 @@
44 'bzrlib.symbol_versioning',
45 'bzrlib.tests',
46 'bzrlib.timestamp',
47+ 'bzrlib.transport.http',
48 'bzrlib.version_info_formats.format_custom',
49 ]
50
51
52=== modified file 'bzrlib/transport/http/__init__.py'
53--- bzrlib/transport/http/__init__.py 2009-11-26 14:39:31 +0000
54+++ bzrlib/transport/http/__init__.py 2009-12-16 08:26:16 +0000
55@@ -205,6 +205,51 @@
56 # use.
57 _get_max_size = 0
58
59+ def _raise_mapped_error(self, url, code, response_body,
60+ context_message=None):
61+ """Translate an http error into a bzrlib exception.
62+
63+ Some methods may choose to override this for particular cases.
64+
65+ The URL and code are automatically included as appropriate.
66+
67+ :param code: integer response code
68+ :param response_body: string containing the body of the error response
69+ (typically html)
70+ """
71+ # XXX: This should also be used through the urllib implementation, but
72+ # it's not yet
73+ mutter("http error: %d %s" % (code, url))
74+ mutter(" response body: %r" % response_body)
75+ if response_body:
76+ plaintext_body = unhtml_roughly(response_body)
77+ else:
78+ plaintext_body = ''
79+ if code == 404:
80+ raise errors.NoSuchFile(url)
81+ elif code == 403:
82+ # "The server understood the request, but is refusing to fulfill
83+ # it. Authorization will not help and the request SHOULD NOT be
84+ # repeated."
85+ raise errors.InvalidHttpResponse(url, '403 Forbidden')
86+ elif code == 405:
87+ # "The method specified in the Request-Line is not allowed for the
88+ # resource identified by the Request-URI. The response MUST
89+ # include an Allow header containing a list of valid methods for
90+ # the requested resource."
91+ #
92+ # Sent by Google code when probing for .bzr
93+ # <https://bugs.edge.launchpad.net/bzr/+bug/497274>
94+ raise errors.TransportNotPossible(url, '405 Not Allowed: '
95+ + plaintext_body)
96+ else:
97+ if context_message is None:
98+ msg = ''
99+ else:
100+ msg = ': ' + context_message
101+ raise errors.InvalidHttpResponse(
102+ url, 'http response %d%s: %s' % (code, msg, plaintext_body))
103+
104 def _readv(self, relpath, offsets):
105 """Get parts of the file at the given relative path.
106
107@@ -614,10 +659,13 @@
108 t = self._http_transport_ref()
109 code, body_filelike = t._post(bytes)
110 if code != 200:
111+ # this should normally be handled by _post, but check it here
112 raise InvalidHttpResponse(
113 t._remote_path('.bzr/smart'),
114 'Expected 200 response code, got %r' % (code,))
115- except (errors.InvalidHttpResponse, errors.ConnectionReset), e:
116+ except (errors.TransportNotPossible,
117+ errors.InvalidHttpResponse,
118+ errors.ConnectionReset), e:
119 raise errors.SmartProtocolError(str(e))
120 return body_filelike
121
122@@ -661,3 +709,12 @@
123 def _finished_reading(self):
124 """See SmartClientMediumRequest._finished_reading."""
125 pass
126+
127+
128+def unhtml_roughly(maybe_html):
129+ """Very approximate html->text translation, for presenting error bodies.
130+
131+ >>> unhtml_roughly("<b>bad</b> things happened\\n")
132+ ' bad things happened '
133+ """
134+ return re.subn(r"(<[^>]*>|\n|&nbsp;)", " ", maybe_html)[0]
135
136=== modified file 'bzrlib/transport/http/_pycurl.py'
137--- bzrlib/transport/http/_pycurl.py 2009-08-19 16:33:39 +0000
138+++ bzrlib/transport/http/_pycurl.py 2009-12-16 08:26:16 +0000
139@@ -1,4 +1,4 @@
140-# Copyright (C) 2006 Canonical Ltd
141+# Copyright (C) 2006, 2009 Canonical Ltd
142 #
143 # This program is free software; you can redistribute it and/or modify
144 # it under the terms of the GNU General Public License as published by
145@@ -206,13 +206,10 @@
146 code = curl.getinfo(pycurl.HTTP_CODE)
147 data.seek(0)
148
149- if code == 404:
150- raise errors.NoSuchFile(abspath)
151- if code != 200:
152- self._raise_curl_http_error(
153- curl, 'expected 200 or 404 for full response.')
154-
155- return code, data
156+ if code == 200:
157+ return code, data
158+ else:
159+ self._raise_curl_http_error(curl)
160
161 # The parent class use 0 to minimize the requests, but since we can't
162 # exploit the results as soon as they are received (pycurl limitation) we'd
163@@ -237,16 +234,21 @@
164
165 code = curl.getinfo(pycurl.HTTP_CODE)
166
167- if code == 404: # not found
168- raise errors.NoSuchFile(abspath)
169+ if code in (200, 206):
170+ msg = self._parse_headers(header)
171+ return code, response.handle_response(abspath, code, msg, data)
172 elif code in (400, 416):
173 # We don't know which, but one of the ranges we specified was
174 # wrong.
175+ #
176+ # 400 is not strictly 'invalid range', but rather 'bad request'
177+ # but (perhaps) some servers use it for that meaning. Aside from
178+ # that, this could go into _raise_curl_http_error.
179 raise errors.InvalidHttpRange(abspath, range_header,
180 'Server return code %d'
181 % curl.getinfo(pycurl.HTTP_CODE))
182- msg = self._parse_headers(header)
183- return code, response.handle_response(abspath, code, msg, data)
184+ else:
185+ self._raise_curl_http_error(curl)
186
187 def _parse_headers(self, status_and_headers):
188 """Transform the headers provided by curl into an HTTPMessage"""
189@@ -285,26 +287,31 @@
190 raise
191 data.seek(0)
192 code = curl.getinfo(pycurl.HTTP_CODE)
193- msg = self._parse_headers(header)
194- return code, response.handle_response(abspath, code, msg, data)
195-
196-
197- def _raise_curl_http_error(self, curl, info=None):
198+ if code == 200:
199+ msg = self._parse_headers(header)
200+ return code, response.handle_response(abspath, code, msg, data)
201+ else:
202+ self._raise_curl_http_error(curl, body=data)
203+
204+
205+ def _raise_curl_http_error(self, curl, info=None, body=None):
206+ """Common curl->bzrlib error translation.
207+
208+ Some methods may choose to override this for particular cases.
209+
210+ The URL and code are automatically included as appropriate.
211+
212+ :param info: Extra information to include in the message.
213+ :param body: File-like object from which the body of the page can be read.
214+ """
215 code = curl.getinfo(pycurl.HTTP_CODE)
216 url = curl.getinfo(pycurl.EFFECTIVE_URL)
217- # Some error codes can be handled the same way for all
218- # requests
219- if code == 403:
220- raise errors.TransportError(
221- 'Server refuses to fulfill the request (403 Forbidden)'
222- ' for %s' % url)
223+ if body is not None:
224+ response_body = body.read()
225 else:
226- if info is None:
227- msg = ''
228- else:
229- msg = ': ' + info
230- raise errors.InvalidHttpResponse(
231- url, 'Unable to handle http code %d%s' % (code,msg))
232+ response_body = None
233+ self._raise_mapped_error(url, code, response_body=response_body,
234+ context_message=info)
235
236 def _debug_cb(self, kind, text):
237 if kind in (pycurl.INFOTYPE_HEADER_IN, pycurl.INFOTYPE_DATA_IN,