Merge lp:~mbp/bzr/497274-http-405 into lp:bzr

Proposed by Martin Pool
Status: Rejected
Rejected by: Martin Pool
Proposed branch: lp:~mbp/bzr/497274-http-405
Merge into: lp:bzr
Diff against target: 237 lines (+104/-31)
5 files modified
NEWS (+8/-0)
bzrlib/bzrdir.py (+1/-1)
bzrlib/tests/__init__.py (+1/-0)
bzrlib/transport/http/__init__.py (+58/-1)
bzrlib/transport/http/_pycurl.py (+36/-29)
To merge this branch: bzr merge lp:~mbp/bzr/497274-http-405
Reviewer Review Type Date Requested Status
Vincent Ladeuil Needs Information
John A Meinel Approve
Review via email: mp+16229@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.

I've only tested this interactively not added tests. I probably should.

This prints and logs the actual message from the http response page.

It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

I would ask Vincent for some help, but as I understand he already has
all the infrastructure to create an HTTP server that gives canned
responses. So setting one up to return 405 on .bzr/branch-format and
ensuring that bzr then raises NoSuchBranch rather than
InvalidHTTPResponse seems a reasonable high-level test.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBNQACgkQJdeBCYSNAAMOsACguGOLfNw3cNBoJNFYjSMjl3io
biAAn3asxZvkOtMob2ryoGSEF7laoBGT
=JW9R
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

Oh, and:

 review: approve

- From me, though it would be nice to have the test case before landing.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBRcACgkQJdeBCYSNAAPp+ACfQl/BMB8fhepPk1/O5vH2FP5v
gp8AoJXZrwqN8ebjB9aAi36oc11xabJz
=mBqz
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

> This tries to handle 'http 405 not allowed' from Google code as meaning "not
> allowed to look for .bzr", ie there's no bzr branch there. That may allow
> foreign branches to work against it. However, their servers keep falling over
> with '503 not available' errors, which make testing it difficult.

Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and probably incomplete)
fix with tests.

_post() is used only by smart_http_request() and the later already translate
InvalidHttpResponse into SmartProtocolError which higher layers interpret as NotABranch when needed.

>
> I've only tested this interactively not added tests. I probably should.

See above.

>
> This prints and logs the actual message from the http response page.

I doubt users are really interested in the general case, muttering may be enough though.

>
> It also tries to unify http response handling currently scattered across
> _pycurl.py into one method in the base class.

I'm pretty sure I went mostly in the opposite direction in the past based on problems
encountered while implementing the webdav support.

I didn't try to unify both implementations either... mostly because we will deprecate pycurl
one day and I was happy with the urllib one. The idea there is that there are three categories
of error codes:
- the ones that represent success,
- the ones that represent known failures,
- the others

Upon reception, the callers must handle the success cases, may handle some known
failures specifically before relying on a fallback for the remaining ones.

Basically the http requests are generic enough that the same error code can be interpreted
differently in different contexts. Unifying the error handling too much makes it harder
to raise the correct bzr exceptions.

So far, only 403 was really unambiguously common, even 404 needed to be handled separately
for HEAD (which is the main reason why I prefer to *not* have it in the common code).

All in all, that gives a coherent way to code all the methods with limited duplication
(mostly 404 but everybody knows what 404 means :)

> This may be slightly dangerous
> to change as it's lightly tested and tries to work around server quirks, but
> any breakage should be shallow and it's worth clearing it up I think. I
> haven't yet gone through the urllib implementation.

See above, that's a one-line fix, the pycurl implementation doesn't even need to be fixed.

That may explains why you can't test it against google, I suspect others use the
urllib implementation and needs the fix.

Note that I fixed a bug in send_http_smart_request so you want to backport at least that part.

So I vote needs_fixing and will mark as Work In Progress.

review: Needs Fixing
Revision history for this message
Martin Pool (mbp) wrote :
Download full text (4.8 KiB)

> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there. That may allow
> > foreign branches to work against it. However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
>
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.

I'll merge that into mine and see how it looks.

> _post() is used only by smart_http_request() and the later already translate
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
>
> >
> > I've only tested this interactively not added tests. I probably should.
>
> See above.
>
> >
> > This prints and logs the actual message from the http response page.
>
> I doubt users are really interested in the general case, muttering may be
> enough though.

They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded. We tend to get bug reports about this - though just getting it into the trace should help us with that.

>
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
>
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
>
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
>
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
>
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
>
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
>
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)

There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations. I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.

>
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think. I
> > haven't yet gone through the urllib implementatio...

Read more...

Revision history for this message
Martin Pool (mbp) wrote :

@vila: How do you see the interaction between the various http exception classes working. For instance, why is this code as it is:

    def http_error_default(self, req, fp, code, msg, hdrs):
        if code == 403:
            raise errors.TransportError(
                'Server refuses to fulfill the request (403 Forbidden)'
                ' for %s' % req.get_full_url())
        else:
            raise errors.InvalidHttpResponse(req.get_full_url(),
                                             'Unable to handle http code %d: %s'
                                             % (code, msg))

Revision history for this message
Martin Pool (mbp) wrote :

google code seems to be up again and I've just been testing this interactively, and I can't actually get trunk to fail there, even with pycurl disabled. So I'm not sure what the future of this patch is: to me at least some of it seems like a useful cleanup, but given all the quirks here I don't think it's worth landing it without at least one real-world test showing it's better.

vila, how did you reproduce an improvement with your patch?

Revision history for this message
Martin Pool (mbp) wrote :

I've split out the html error handling (pycurl only) into https://code.edge.launchpad.net/~mbp/bzr/http-messages

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (4.2 KiB)

>>>>> "martin" == Martin Pool <email address hidden> writes:

Disclaimer: I wrote my patch in urgency without trying to
integrate it with yours, as such I tried to address the missing
parts. The were no tests and no urllib implementation. I stopped
there and didn't explain it as well as I should have, see below
for more comments.

<snip/>

    martin> I'll merge that into mine and see how it looks.

That was the idea, I'm fine with whatever end result you come up
with.

<snip/>

    >> I doubt users are really interested in the general case,
    >> muttering may be enough though.

    martin> They might not be, but only giving the status code is
    martin> not very helpful if there is something mysterious
    martin> going on, like the server being overloaded. We tend
    martin> to get bug reports about this - though just getting
    martin> it into the trace should help us with that.

Yup. I was a bit uncomfortable with unhtml() but not that
much. My main concern is that its a bit brute force and in
presence of errors I like, as a dev, to have access to the raw
data. Users have others expectancies and I don't a clear answer
about what should be presented.

<snip/>

    martin> There are two types of possible duplication: across
    martin> different call paths (get vs post vs others) and
    martin> across different http client implementations. I can
    martin> see how different callers might want different
    martin> interpretations for some codes, but I don't see why
    martin> we would want different interpretations across urllib
    martin> and pycurl, unless it really is something where the
    martin> client library interferes with the result, as it may
    martin> for redirects.

So, thanks for bringing a fresh eye here. Historically I've never
attempt to unify the two implementations very hard based on the
assumption that pycurl will be deprecated (shudder).

But as you point above, both goals are a bit contradictory, you
have to care to leave errors escape the lower layer handling so
that the higher (common) one can work.

<snip/>

    martin> So your code has, in handle_response:

    martin> elif code == 405:
    martin> # The server refused to handle the request, the data presumably
    martin> # contains details about the error and not useful data, but we leave
    martin> # that to the higher levels.
    martin> return data

    martin> which strongly looks like you're going to treat the
    martin> html text of the error response as a valid response
    martin> body, which is definitely not right. OK, apparently
    martin> this doesn't necessarily happen because the base
    martin> class send_http_smart_request translates it, but
    martin> there's still some risk of that.

Yup, that's why I said the implementation is incomplete. This was
the less intrusive fix but I was not happy with the risk.

    martin> If you pass around (http_status, body) too much then
    martin> multiple layers need to know how to interpret the
    martin> status code. Without understanding it, they can't
    martin> understand how to use the body data. For insta2nce,
    martin> send_http_smart_request ...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

    martin> I've split out the html error handling (pycurl only) into
    martin> https://code.edge.launchpad.net/~mbp/bzr/http-messages

The diff isn't online yet, I'll give it a look in a couple of
days. See my other mail and feel free to bring more of your work
based on my remarks there.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> martin> I guess I don't properly understand whether you think
> martin> my patch is actually wrong or just needs tests. I
> martin> will try to use your tests with mine and see where we
> martin> get to.
>
> I hope I clarified that: except for 404, I'm fine with a common
> error handling, may be I just haven't seen yet a good way to
> implement that but I didn't spend much time on it either and I
> trust you to find it ;)
>

If the only code that uses 404 specially is HEAD, then I would be happy
to have it trap the exception. #1 because I don't think we use
"transport.has*" in any live code path. LBYL vs EAFTP.

I think the only code that uses it is probably old-format repositories.
(Where having a revision was indicated by the presence of the
'.bzr/revisions/$REVID' file.)

I'm pretty sure everything else would treate 404 as NoSuchFile.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksvi9wACgkQJdeBCYSNAAM7ygCffCBwDol9uBmR8XgrlxnQTe1e
vOMAoKCYRxzzy7RQmM+Byq7iKO7EP0VF
=dmsP
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

2009/12/22 John A Meinel <email address hidden>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>>     martin> I guess I don't properly understand whether you think
>>     martin> my patch is actually wrong or just needs tests.  I
>>     martin> will try to use your tests with mine and see where we
>>     martin> get to.
>>
>> I hope I clarified that: except for 404, I'm fine with a common
>> error handling, may be I just haven't seen yet a good way to
>> implement that but I didn't spend much time on it either and I
>> trust you to find it ;)
>>
>
> If the only code that uses 404 specially is HEAD, then I would be happy
> to have it trap the exception. #1 because I don't think we use
> "transport.has*" in any live code path. LBYL vs EAFTP.
>
> I think the only code that uses it is probably old-format repositories.
> (Where having a revision was indicated by the presence of the
> '.bzr/revisions/$REVID' file.)
>
> I'm pretty sure everything else would treate 404 as NoSuchFile.

I think it would be reasonable for .has() to just catch NoSuchFile.

Having looked at this a bit more: InvalidHttpResponse mixes both "not
valid http" and "not the http code we expected." So perhaps a good
step would be to separate that into "HttpError" containing all the
right fields, then we can translate that into a specific bzr error,
taking into account the type of operation we were doing.

However, since imports from google are apparently now working, this is
not urgent for me.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Vincent Ladeuil (vila) wrote :

@poolie: From the comments, I get the impression that you have un-pushed changes there. If that's right, can you update this mp ? I'm pretty sure it's in a good enough state to land on trunk if only to allow whoever being able to reproduce the issue to test it.

This probably needs to be refreshed against the current trunk, watch the conflicts with an hawk eye ;)

review: Needs Information
Revision history for this message
Martin Pool (mbp) wrote :

vila, on the whole, I'll just abandon this branch. I think the situation it was trying to fix may have been a transient problem at Google's end, and there's no longer enough obvious benefit to be worth changing it.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Martin Pool <email address hidden> writes:

    > vila, on the whole, I'll just abandon this branch. I think the
    > situation it was trying to fix may have been a transient problem
    > at Google's end, and there's no longer enough obvious benefit to
    > be worth changing it.

Did you write this before or after commenting on bug #497274 ?

https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of imply you
still consider this proposal to be useful no ?

Revision history for this message
Martin Pool (mbp) wrote :

On 21 February 2011 18:48, Vincent Ladeuil <email address hidden> wrote:
>>>>>> Martin Pool <email address hidden> writes:
>
>    > vila, on the whole, I'll just abandon this branch.  I think the
>    > situation it was trying to fix may have been a transient problem
>    > at Google's end, and there's no longer enough obvious benefit to
>    > be worth changing it.
>
> Did you write this before or after commenting on bug #497274 ?
>
> https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of imply you
> still consider this proposal to be useful no ?

I wrote that before.

I don't think this mp is still in progress, but it can be resurrected
if that's useful.

Unmerged revisions

4917. By Martin Pool

Remove obsolete comments

4916. By Martin Pool

Show unhtml_roughly for some http errors to give more of a clue why

4915. By Martin Pool

Add unhtml_roughly to show html http errors

4914. By Martin Pool

doc

4913. By Martin Pool

Move http->error translation into HttpTransportBase

4912. By Martin Pool

TransportNotPossible taken to mean there's no branch present

4911. By Martin Pool

Tweak http error

4910. By Martin Pool

Interpret http 405 'not allowed' as meaning there's no smart server present

4909. By Martin Pool

Tweak http error message

4908. By Martin Pool

Map http 405 to TransportNotPossible

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-12-15 19:59:00 +0000
+++ NEWS 2009-12-16 08:26:16 +0000
@@ -20,6 +20,11 @@
20Bug Fixes20Bug Fixes
21*********21*********
2222
23* HTTP 405 "Not Allowed" is taken to mean there's no bzr branch or smart
24 server at the URL. This is sent by Google Code and blocks use of
25 foreign branch plugins.
26 (Martin Pool, #497274)
27
23Improvements28Improvements
24************29************
2530
@@ -32,6 +37,9 @@
32Internals37Internals
33*********38*********
3439
40* PyCurl http/https error handling unified; this may have knock-on effects
41 on quirky web servers. (Martin Pool)
42
35Testing43Testing
36*******44*******
3745
3846
=== modified file 'bzrlib/bzrdir.py'
--- bzrlib/bzrdir.py 2009-12-02 17:56:06 +0000
+++ bzrlib/bzrdir.py 2009-12-16 08:26:16 +0000
@@ -1828,7 +1828,7 @@
1828 """Return the .bzrdir style format present in a directory."""1828 """Return the .bzrdir style format present in a directory."""
1829 try:1829 try:
1830 format_string = transport.get_bytes(".bzr/branch-format")1830 format_string = transport.get_bytes(".bzr/branch-format")
1831 except errors.NoSuchFile:1831 except (errors.NoSuchFile, errors.TransportNotPossible):
1832 raise errors.NotBranchError(path=transport.base)1832 raise errors.NotBranchError(path=transport.base)
18331833
1834 try:1834 try:
18351835
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2009-12-08 21:46:07 +0000
+++ bzrlib/tests/__init__.py 2009-12-16 08:26:16 +0000
@@ -3934,6 +3934,7 @@
3934 'bzrlib.symbol_versioning',3934 'bzrlib.symbol_versioning',
3935 'bzrlib.tests',3935 'bzrlib.tests',
3936 'bzrlib.timestamp',3936 'bzrlib.timestamp',
3937 'bzrlib.transport.http',
3937 'bzrlib.version_info_formats.format_custom',3938 'bzrlib.version_info_formats.format_custom',
3938 ]3939 ]
39393940
39403941
=== modified file 'bzrlib/transport/http/__init__.py'
--- bzrlib/transport/http/__init__.py 2009-11-26 14:39:31 +0000
+++ bzrlib/transport/http/__init__.py 2009-12-16 08:26:16 +0000
@@ -205,6 +205,51 @@
205 # use.205 # use.
206 _get_max_size = 0206 _get_max_size = 0
207207
208 def _raise_mapped_error(self, url, code, response_body,
209 context_message=None):
210 """Translate an http error into a bzrlib exception.
211
212 Some methods may choose to override this for particular cases.
213
214 The URL and code are automatically included as appropriate.
215
216 :param code: integer response code
217 :param response_body: string containing the body of the error response
218 (typically html)
219 """
220 # XXX: This should also be used through the urllib implementation, but
221 # it's not yet
222 mutter("http error: %d %s" % (code, url))
223 mutter(" response body: %r" % response_body)
224 if response_body:
225 plaintext_body = unhtml_roughly(response_body)
226 else:
227 plaintext_body = ''
228 if code == 404:
229 raise errors.NoSuchFile(url)
230 elif code == 403:
231 # "The server understood the request, but is refusing to fulfill
232 # it. Authorization will not help and the request SHOULD NOT be
233 # repeated."
234 raise errors.InvalidHttpResponse(url, '403 Forbidden')
235 elif code == 405:
236 # "The method specified in the Request-Line is not allowed for the
237 # resource identified by the Request-URI. The response MUST
238 # include an Allow header containing a list of valid methods for
239 # the requested resource."
240 #
241 # Sent by Google code when probing for .bzr
242 # <https://bugs.edge.launchpad.net/bzr/+bug/497274>
243 raise errors.TransportNotPossible(url, '405 Not Allowed: '
244 + plaintext_body)
245 else:
246 if context_message is None:
247 msg = ''
248 else:
249 msg = ': ' + context_message
250 raise errors.InvalidHttpResponse(
251 url, 'http response %d%s: %s' % (code, msg, plaintext_body))
252
208 def _readv(self, relpath, offsets):253 def _readv(self, relpath, offsets):
209 """Get parts of the file at the given relative path.254 """Get parts of the file at the given relative path.
210255
@@ -614,10 +659,13 @@
614 t = self._http_transport_ref()659 t = self._http_transport_ref()
615 code, body_filelike = t._post(bytes)660 code, body_filelike = t._post(bytes)
616 if code != 200:661 if code != 200:
662 # this should normally be handled by _post, but check it here
617 raise InvalidHttpResponse(663 raise InvalidHttpResponse(
618 t._remote_path('.bzr/smart'),664 t._remote_path('.bzr/smart'),
619 'Expected 200 response code, got %r' % (code,))665 'Expected 200 response code, got %r' % (code,))
620 except (errors.InvalidHttpResponse, errors.ConnectionReset), e:666 except (errors.TransportNotPossible,
667 errors.InvalidHttpResponse,
668 errors.ConnectionReset), e:
621 raise errors.SmartProtocolError(str(e))669 raise errors.SmartProtocolError(str(e))
622 return body_filelike670 return body_filelike
623671
@@ -661,3 +709,12 @@
661 def _finished_reading(self):709 def _finished_reading(self):
662 """See SmartClientMediumRequest._finished_reading."""710 """See SmartClientMediumRequest._finished_reading."""
663 pass711 pass
712
713
714def unhtml_roughly(maybe_html):
715 """Very approximate html->text translation, for presenting error bodies.
716
717 >>> unhtml_roughly("<b>bad</b> things happened\\n")
718 ' bad things happened '
719 """
720 return re.subn(r"(<[^>]*>|\n|&nbsp;)", " ", maybe_html)[0]
664721
=== modified file 'bzrlib/transport/http/_pycurl.py'
--- bzrlib/transport/http/_pycurl.py 2009-08-19 16:33:39 +0000
+++ bzrlib/transport/http/_pycurl.py 2009-12-16 08:26:16 +0000
@@ -1,4 +1,4 @@
1# Copyright (C) 2006 Canonical Ltd1# Copyright (C) 2006, 2009 Canonical Ltd
2#2#
3# This program is free software; you can redistribute it and/or modify3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by4# it under the terms of the GNU General Public License as published by
@@ -206,13 +206,10 @@
206 code = curl.getinfo(pycurl.HTTP_CODE)206 code = curl.getinfo(pycurl.HTTP_CODE)
207 data.seek(0)207 data.seek(0)
208208
209 if code == 404:209 if code == 200:
210 raise errors.NoSuchFile(abspath)210 return code, data
211 if code != 200:211 else:
212 self._raise_curl_http_error(212 self._raise_curl_http_error(curl)
213 curl, 'expected 200 or 404 for full response.')
214
215 return code, data
216213
217 # The parent class use 0 to minimize the requests, but since we can't214 # The parent class use 0 to minimize the requests, but since we can't
218 # exploit the results as soon as they are received (pycurl limitation) we'd215 # exploit the results as soon as they are received (pycurl limitation) we'd
@@ -237,16 +234,21 @@
237234
238 code = curl.getinfo(pycurl.HTTP_CODE)235 code = curl.getinfo(pycurl.HTTP_CODE)
239236
240 if code == 404: # not found237 if code in (200, 206):
241 raise errors.NoSuchFile(abspath)238 msg = self._parse_headers(header)
239 return code, response.handle_response(abspath, code, msg, data)
242 elif code in (400, 416):240 elif code in (400, 416):
243 # We don't know which, but one of the ranges we specified was241 # We don't know which, but one of the ranges we specified was
244 # wrong.242 # wrong.
243 #
244 # 400 is not strictly 'invalid range', but rather 'bad request'
245 # but (perhaps) some servers use it for that meaning. Aside from
246 # that, this could go into _raise_curl_http_error.
245 raise errors.InvalidHttpRange(abspath, range_header,247 raise errors.InvalidHttpRange(abspath, range_header,
246 'Server return code %d'248 'Server return code %d'
247 % curl.getinfo(pycurl.HTTP_CODE))249 % curl.getinfo(pycurl.HTTP_CODE))
248 msg = self._parse_headers(header)250 else:
249 return code, response.handle_response(abspath, code, msg, data)251 self._raise_curl_http_error(curl)
250252
251 def _parse_headers(self, status_and_headers):253 def _parse_headers(self, status_and_headers):
252 """Transform the headers provided by curl into an HTTPMessage"""254 """Transform the headers provided by curl into an HTTPMessage"""
@@ -285,26 +287,31 @@
285 raise287 raise
286 data.seek(0)288 data.seek(0)
287 code = curl.getinfo(pycurl.HTTP_CODE)289 code = curl.getinfo(pycurl.HTTP_CODE)
288 msg = self._parse_headers(header)290 if code == 200:
289 return code, response.handle_response(abspath, code, msg, data)291 msg = self._parse_headers(header)
290292 return code, response.handle_response(abspath, code, msg, data)
291293 else:
292 def _raise_curl_http_error(self, curl, info=None):294 self._raise_curl_http_error(curl, body=data)
295
296
297 def _raise_curl_http_error(self, curl, info=None, body=None):
298 """Common curl->bzrlib error translation.
299
300 Some methods may choose to override this for particular cases.
301
302 The URL and code are automatically included as appropriate.
303
304 :param info: Extra information to include in the message.
305 :param body: File-like object from which the body of the page can be read.
306 """
293 code = curl.getinfo(pycurl.HTTP_CODE)307 code = curl.getinfo(pycurl.HTTP_CODE)
294 url = curl.getinfo(pycurl.EFFECTIVE_URL)308 url = curl.getinfo(pycurl.EFFECTIVE_URL)
295 # Some error codes can be handled the same way for all309 if body is not None:
296 # requests310 response_body = body.read()
297 if code == 403:
298 raise errors.TransportError(
299 'Server refuses to fulfill the request (403 Forbidden)'
300 ' for %s' % url)
301 else:311 else:
302 if info is None:312 response_body = None
303 msg = ''313 self._raise_mapped_error(url, code, response_body=response_body,
304 else:314 context_message=info)
305 msg = ': ' + info
306 raise errors.InvalidHttpResponse(
307 url, 'Unable to handle http code %d%s' % (code,msg))
308315
309 def _debug_cb(self, kind, text):316 def _debug_cb(self, kind, text):
310 if kind in (pycurl.INFOTYPE_HEADER_IN, pycurl.INFOTYPE_DATA_IN,317 if kind in (pycurl.INFOTYPE_HEADER_IN, pycurl.INFOTYPE_DATA_IN,