Merge into bzr.dev : 497274-http-405 : Code : Bazaar

Reviewer	Date Requested	Status
Vincent Ladeuil		Needs Information on 2011-02-19
John A Meinel	2009-12-16	Approve on 2009-12-16
Review via email: mp+16229@code.launchpad.net

Martin Pool (mbp) wrote on 2009-12-16:

This tries to handle 'http 405 not allowed' from Google Code as meaning "not allowed to look for .bzr", i.e. there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which makes testing difficult.

I've only tested this interactively and have not added tests. I probably should.

This prints and logs the actual message from the http response page.

It also tries to unify the http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change, as the code is lightly tested and works around server quirks, but any breakage should be shallow and I think it's worth cleaning up. I haven't yet gone through the urllib implementation.

John A Meinel (jameinel) wrote on 2009-12-16:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

I would ask Vincent for some help; as I understand it, he already has
all the infrastructure to create an HTTP server that gives canned
responses. So setting one up to return 405 on .bzr/branch-format and
ensuring that bzr then raises NoSuchBranch rather than
InvalidHttpResponse seems a reasonable high-level test.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBNQACgkQJdeBCYSNAAMOsACguGOLfNw3cNBoJNFYjSMjl3io
biAAn3asxZvkOtMob2ryoGSEF7laoBGT
=JW9R
-----END PGP SIGNATURE-----
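
A minimal sketch of the high-level test John describes, using only the Python standard library rather than bzr's own test infrastructure. The names `Canned405Handler`, `probe_branch_format` and `run_check` are hypothetical, and `NotBranchError` is a local stand-in (the bzrdir.py hunk in the preview diff raises `errors.NotBranchError`); none of this is bzrlib code.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NotBranchError(Exception):
    """Local stand-in for the error bzr raises when no branch is found."""


class Canned405Handler(BaseHTTPRequestHandler):
    """Answer every GET with a canned 405 response."""

    def do_GET(self):
        body = b"<html><body>Method Not Allowed</body></html>"
        self.send_response(405)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def probe_branch_format(base_url):
    """Hypothetical probe: return .bzr/branch-format or raise NotBranchError."""
    # An empty ProxyHandler keeps the request on the loopback interface.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + "/.bzr/branch-format", timeout=5) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        if e.code in (404, 405):
            raise NotBranchError(base_url) from e
        raise


def run_check():
    """Start the canned server and report whether the probe raised NotBranchError."""
    server = HTTPServer(("127.0.0.1", 0), Canned405Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d" % server.server_address[1]
        try:
            probe_branch_format(url)
        except NotBranchError:
            return True
        return False
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 lets the operating system pick a free port, so the check does not depend on any fixed port being available.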

John A Meinel (jameinel) wrote on 2009-12-16:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/497274-http-405 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #497274 should interpret http 405 "not allowed" as "no smart server here" - breaks foreign branches on google code
> https://bugs.launchpad.net/bugs/497274
>
>
> This tries to handle 'http 405 not allowed' from Google code as meaning "not allowed to look for .bzr", ie there's no bzr branch there. That may allow foreign branches to work against it. However, their servers keep falling over with '503 not available' errors, which make testing it difficult.
>
> I've only tested this interactively not added tests. I probably should.
>
> This prints and logs the actual message from the http response page.
>
> It also tries to unify http response handling currently scattered across _pycurl.py into one method in the base class. This may be slightly dangerous to change as it's lightly tested and tries to work around server quirks, but any breakage should be shallow and it's worth clearing it up I think. I haven't yet gone through the urllib implementation.
>

Oh, and:

review: approve

- From me, though it would be nice to have the test case before landing.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkspBRcACgkQJdeBCYSNAAPp+ACfQl/BMB8fhepPk1/O5vH2FP5v
gp8AoJXZrwqN8ebjB9aAi36oc11xabJz
=mBqz
-----END PGP SIGNATURE-----

review: Approve

Vincent Ladeuil (vila) wrote on 2009-12-18:

> This tries to handle 'http 405 not allowed' from Google code as meaning "not
> allowed to look for .bzr", ie there's no bzr branch there. That may allow
> foreign branches to work against it. However, their servers keep falling over
> with '503 not available' errors, which make testing it difficult.

Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and probably incomplete)
fix with tests.

_post() is used only by smart_http_request(), and the latter already translates
InvalidHttpResponse into SmartProtocolError, which higher layers interpret as NotABranch when needed.

>
> I've only tested this interactively not added tests. I probably should.

See above.

>
> This prints and logs the actual message from the http response page.

I doubt users are really interested in the general case; muttering may be enough, though.

>
> It also tries to unify http response handling currently scattered across
> _pycurl.py into one method in the base class.

I'm pretty sure I went mostly in the opposite direction in the past based on problems
encountered while implementing the webdav support.

I didn't try to unify both implementations either... mostly because we will deprecate pycurl
one day and I was happy with the urllib one. The idea there is that there are three categories
of error codes:
- the ones that represent success,
- the ones that represent known failures,
- the others

Upon reception, callers must handle the success cases, and may handle some known
failures specifically before relying on a fallback for the remaining ones.
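
The three categories above can be sketched roughly as follows. This is an illustration only, not the bzrlib code: the exception classes are local stand-ins, the function names are hypothetical, and treating 200 and 206 as the success codes for a GET is an assumption.

```python
class TransportError(Exception):
    """Local stand-in, not the bzrlib class."""


class NoSuchFile(Exception):
    """Local stand-in, not the bzrlib class."""


class InvalidHttpResponse(Exception):
    """Local stand-in, not the bzrlib class."""


def fallback(url, code):
    """Common handling for codes the caller did not deal with itself."""
    if code == 403:
        raise TransportError(
            "Server refuses to fulfill the request (403 Forbidden) for %s" % url)
    raise InvalidHttpResponse(
        "Unable to handle http code %d for %s" % (code, url))


def get(url, code, body):
    """GET: success codes return data, 404 is a known failure."""
    if code in (200, 206):  # assumed success codes
        return body
    if code == 404:
        raise NoSuchFile(url)
    fallback(url, code)


def has(url, code):
    """HEAD: here 404 is an ordinary answer, not an error."""
    if code == 200:
        return True
    if code == 404:
        return False
    fallback(url, code)
```

The same 404 ends up as an exception in `get()` and as a plain `False` in `has()`, which is the kind of context-dependent interpretation the comment describes.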

Basically the http requests are generic enough that the same error code can be interpreted
differently in different contexts. Unifying the error handling too much makes it harder
to raise the correct bzr exceptions.

So far, only 403 was really unambiguously common; even 404 needed to be handled separately
for HEAD (which is the main reason why I prefer *not* to have it in the common code).

All in all, that gives a coherent way to code all the methods with limited duplication
(mostly 404 but everybody knows what 404 means :)

> This may be slightly dangerous
> to change as it's lightly tested and tries to work around server quirks, but
> any breakage should be shallow and it's worth clearing it up I think. I
> haven't yet gone through the urllib implementation.

See above: that's a one-line fix, and the pycurl implementation doesn't even need to be fixed.

That may explain why you can't test it against Google: I suspect others use the
urllib implementation, which needs the fix.

Note that I fixed a bug in send_http_smart_request, so you will want to backport at least that part.

So I vote needs_fixing and will mark as Work In Progress.

review: Needs Fixing

Martin Pool (mbp) wrote on 2009-12-21:

> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there.  That may allow
> > foreign branches to work against it.  However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
> 
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.

I'll merge that into mine and see how it looks.

> _post() is used only by smart_http_request() and the later already translate
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
> 
> >
> > I've only tested this interactively not added tests.  I probably should.
> 
> See above.
> 
> >
> > This prints and logs the actual message from the http response page.
> 
> I doubt users are really interested in the general case, muttering may be
> enough though.

They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded.  We tend to get bug reports about this - though just getting it into the trace should help us with that.

> 
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
> 
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
> 
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
> 
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
> 
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
> 
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
> 
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)

There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations.  I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.

> 
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think.  I
> > haven't yet gone through the urllib implementation.
> 
> See above, that's a one-line fix, the pycurl implementation doesn't even need
> to be fixed.

OK.
 
> That may explains why you can't test it against google, I suspect others use
> the
> urllib implementation and needs the fix.
> 
> Note that I fixed a bug in send_http_smart_request so you want to backport at
> least that part.
> 
> So I vote needs_fixing and will mark as Work In Progress.

So your code has, in handle_response:

    elif code == 405:
        # The server refused to handle the request, the data presumably
        # contains details about the error and not useful data, but we leave
        # that to the higher levels.
        return data

which strongly looks like you're going to treat the html text of the error response as a valid response body, which is definitely not right.  OK, apparently this doesn't necessarily happen because the base class send_http_smart_request translates it, but there's still some risk of that.

If you pass around (http_status, body) too much then multiple layers need to know how to interpret the status code.  Without understanding it, they can't understand how to use the body data.  For instance, send_http_smart_request assumes that only 200 means success, but that's not quite absolutely true.

So I'd like to have one reasonably well defined place that turns error-like responses into exceptions.  Then if some code wants to handle them specially, it can always catch the exception.  I think it's a bit bad that send_http_smart_request needs to both check the status and also catch exceptions.

I guess I don't properly understand whether you think my patch is actually wrong or just needs tests.  I will try to use your tests with mine and see where we get to.
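
A minimal sketch of the "one well defined place" idea from this comment, not the actual patch. The exception classes are local stand-ins, `raise_for_status` and `send_smart_request` are hypothetical names, and counting every 2xx code as success is an assumption.

```python
class TransportError(Exception):
    """Local stand-in, not the bzrlib class."""


class NoSuchFile(Exception):
    """Local stand-in, not the bzrlib class."""


class InvalidHttpResponse(Exception):
    """Local stand-in, not the bzrlib class."""


class SmartProtocolError(Exception):
    """Local stand-in, not the bzrlib class."""


def raise_for_status(url, code, body):
    """The single place that turns error-like responses into exceptions."""
    if 200 <= code < 300:  # assumption: any 2xx counts as success
        return body
    if code == 404:
        raise NoSuchFile(url)
    if code == 403:
        raise TransportError(
            "Server refuses to fulfill the request (403 Forbidden) for %s" % url)
    raise InvalidHttpResponse(
        "Unable to handle http code %d for %s" % (code, url))


def send_smart_request(url, code, body):
    """Hypothetical caller: no status check of its own, only exception handling."""
    try:
        return raise_for_status(url, code, body)
    except InvalidHttpResponse as e:
        raise SmartProtocolError(str(e)) from e
```

With this shape an error page body never reaches the caller as if it were response data; the caller either gets a body from a successful response or an exception it can choose to catch.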

Martin Pool (mbp) wrote on 2009-12-21:

@vila: How do you see the interaction between the various http exception classes working? For instance, why is this code written the way it is:

    def http_error_default(self, req, fp, code, msg, hdrs):
        if code == 403:
            raise errors.TransportError(
                'Server refuses to fulfill the request (403 Forbidden)'
                ' for %s' % req.get_full_url())
        else:
            raise errors.InvalidHttpResponse(req.get_full_url(),
                                             'Unable to handle http code %d: %s'
                                             % (code, msg))

Martin Pool (mbp) wrote on 2009-12-21:

Google Code seems to be up again. I've just been testing this interactively, and I can't actually get trunk to fail there, even with pycurl disabled. So I'm not sure what the future of this patch is: to me at least some of it seems like a useful cleanup, but given all the quirks here I don't think it's worth landing it without at least one real-world test showing it's better.

vila, how did you reproduce an improvement with your patch?

Martin Pool (mbp) wrote on 2009-12-21:

I've split out the html error handling (pycurl only) into https://code.edge.launchpad.net/~mbp/bzr/http-messages

Vincent Ladeuil (vila) wrote on 2009-12-21:

>>>>> "martin" == Martin Pool <mbp@sourcefrog.net> writes:

Disclaimer: I wrote my patch in a hurry without trying to
integrate it with yours; I only tried to address the missing
parts: there were no tests and no urllib implementation. I stopped
there and didn't explain it as well as I should have, see below
for more comments.

<snip/>

martin> I'll merge that into mine and see how it looks.

That was the idea, I'm fine with whatever end result you come up
with.

<snip/>

>> I doubt users are really interested in the general case,
    >> muttering may be enough though.

martin> They might not be, but only giving the status code is
    martin> not very helpful if there is something mysterious
    martin> going on, like the server being overloaded.  We tend
    martin> to get bug reports about this - though just getting
    martin> it into the trace should help us with that.

Yup. I was a bit uncomfortable with unhtml() but not that
much. My main concern is that it's a bit brute force, and in the
presence of errors I like, as a dev, to have access to the raw
data. Users have other expectations and I don't have a clear answer
about what should be presented.

<snip/>

martin> There are two types of possible duplication: across
    martin> different call paths (get vs post vs others) and
    martin> across different http client implementations.  I can
    martin> see how different callers might want different
    martin> interpretations for some codes, but I don't see why
    martin> we would want different interpretations across urllib
    martin> and pycurl, unless it really is something where the
    martin> client library interferes with the result, as it may
    martin> for redirects.

So, thanks for bringing a fresh eye here. Historically I've never
attempted very hard to unify the two implementations, based on the
assumption that pycurl will be deprecated (shudder).

But as you point out above, both goals are a bit contradictory: you
have to take care to let errors escape the lower-layer handling so
that the higher (common) one can work.

<snip/>

martin> So your code has, in handle_response:

martin>     elif code == 405:
    martin>         # The server refused to handle the request, the data presumably
    martin>         # contains details about the error and not useful data, but we leave
    martin>         # that to the higher levels.
    martin>         return data

martin> which strongly looks like you're going to treat the
    martin> html text of the error response as a valid response
    martin> body, which is definitely not right.  OK, apparently
    martin> this doesn't necessarily happen because the base
    martin> class send_http_smart_request translates it, but
    martin> there's still some risk of that.

Yup, that's why I said the implementation is incomplete. This was
the least intrusive fix but I was not happy with the risk.

martin> If you pass around (http_status, body) too much then
    martin> multiple layers need to know how to interpret the
    martin> status code.  Without understanding it, they can't
    martin> understand how to use the body data.  For instance,
    martin> send_http_smart_request assumes that only 200 means
    martin> success, but that's not quite absolutely true.

martin> So I'd like to have one reasonably well defined place
    martin> that turns error-like responses into exceptions.
    martin> Then if some code wants to handle them specially, it
    martin> can always catch the exception.  I think it's a bit
    martin> bad that send_http_smart_request needs to both check
    martin> the status and also catch exceptions.

Yes.

martin> I guess I don't properly understand whether you think
    martin> my patch is actually wrong or just needs tests.  I
    martin> will try to use your tests with mine and see where we
    martin> get to.

I hope I clarified that: except for 404, I'm fine with common
error handling. Maybe I just haven't yet seen a good way to
implement that, but I didn't spend much time on it either and I
trust you to find it ;)

And reading the other comments: I didn't test it myself against
google code, I just started with the tests (which I recognize may
look a bit like black magic).

Vincent Ladeuil (vila) wrote on 2009-12-21:

martin> I've split out the html error handling (pycurl only) into
martin> https://code.edge.launchpad.net/~mbp/bzr/http-messages

The diff isn't online yet, I'll give it a look in a couple of
days. See my other mail and feel free to bring more of your work
based on my remarks there.

John A Meinel (jameinel) wrote on 2009-12-21:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> martin> I guess I don't properly understand whether you think
> martin> my patch is actually wrong or just needs tests. I
> martin> will try to use your tests with mine and see where we
> martin> get to.
>
> I hope I clarified that: except for 404, I'm fine with a common
> error handling, may be I just haven't seen yet a good way to
> implement that but I didn't spend much time on it either and I
> trust you to find it ;)
>

If the only code that uses 404 specially is HEAD, then I would be happy
to have it trap the exception. #1 because I don't think we use
"transport.has*" in any live code path. LBYL vs EAFP.

I think the only code that uses it is probably old-format repositories.
(Where having a revision was indicated by the presence of the
'.bzr/revisions/$REVID' file.)

I'm pretty sure everything else would treat 404 as NoSuchFile.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksvi9wACgkQJdeBCYSNAAM7ygCffCBwDol9uBmR8XgrlxnQTe1e
vOMAoKCYRxzzy7RQmM+Byq7iKO7EP0VF
=dmsP
-----END PGP SIGNATURE-----

Martin Pool (mbp) wrote on 2010-01-05:

2009/12/22 John A Meinel <email address hidden>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>> martin> I guess I don't properly understand whether you think
>> martin> my patch is actually wrong or just needs tests. I
>> martin> will try to use your tests with mine and see where we
>> martin> get to.
>>
>> I hope I clarified that: except for 404, I'm fine with a common
>> error handling, may be I just haven't seen yet a good way to
>> implement that but I didn't spend much time on it either and I
>> trust you to find it ;)
>>
>
> If the only code that uses 404 specially is HEAD, then I would be happy
> to have it trap the exception. #1 because I don't think we use
> "transport.has*" in any live code path. LBYL vs EAFTP.
>
> I think the only code that uses it is probably old-format repositories.
> (Where having a revision was indicated by the presence of the
> '.bzr/revisions/$REVID' file.)
>
> I'm pretty sure everything else would treate 404 as NoSuchFile.

I think it would be reasonable for .has() to just catch NoSuchFile.

Having looked at this a bit more: InvalidHttpResponse mixes both "not
valid http" and "not the http code we expected." So perhaps a good
step would be to separate that into "HttpError" containing all the
right fields, then we can translate that into a specific bzr error,
taking into account the type of operation we were doing.
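
One way to picture that split, as a sketch under stated assumptions rather than a proposed implementation: `HttpError` is the name suggested above, but its fields, the `translate` function, the operation label `"probe"` and the stand-in error classes are all hypothetical.

```python
class HttpError(Exception):
    """A valid http response whose status code was not the one expected."""

    def __init__(self, url, code, reason, body=b""):
        super().__init__("%s: %d %s" % (url, code, reason))
        self.url = url
        self.code = code
        self.reason = reason
        self.body = body


class NoSuchFile(Exception):
    """Local stand-in, not the bzrlib class."""


class NotBranchError(Exception):
    """Local stand-in, not the bzrlib class."""


class TransportError(Exception):
    """Local stand-in, not the bzrlib class."""


def translate(err, operation):
    """Map an HttpError to a more specific error, given the operation attempted."""
    if err.code == 404:
        return NoSuchFile(err.url)
    if err.code == 405 and operation == "probe":
        # Looking for .bzr was refused: treat it as "no branch here".
        return NotBranchError(err.url)
    if err.code == 403:
        return TransportError(
            "Server refuses to fulfill the request (403 Forbidden) for %s"
            % err.url)
    return err  # nothing more specific is known; keep the generic error
```

A caller would then write something like `raise translate(err, "probe")`, so the knowledge of what each code means for each operation lives in one function.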

However, since imports from google are apparently now working, this is
not urgent for me.

--
Martin <http://launchpad.net/~mbp/>

Vincent Ladeuil (vila) wrote on 2011-02-19:

@poolie: From the comments, I get the impression that you have un-pushed changes there. If that's right, can you update this mp? I'm pretty sure it's in a good enough state to land on trunk, if only to allow whoever is able to reproduce the issue to test it.

This probably needs to be refreshed against the current trunk; watch the conflicts with a hawk's eye ;)

review: Needs Information

Martin Pool (mbp) wrote on 2011-02-21:

vila, on the whole, I'll just abandon this branch. I think the situation it was trying to fix may have been a transient problem at Google's end, and there's no longer enough obvious benefit to be worth changing it.

Vincent Ladeuil (vila) wrote on 2011-02-21:

>>>>> Martin Pool <email address hidden> writes:

    > vila, on the whole, I'll just abandon this branch. I think the
    > situation it was trying to fix may have been a transient problem
    > at Google's end, and there's no longer enough obvious benefit to
    > be worth changing it.

Did you write this before or after commenting on bug #497274?

https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of implies you
still consider this proposal to be useful, no?

Martin Pool (mbp) wrote on 2011-02-21:

On 21 February 2011 18:48, Vincent Ladeuil <email address hidden> wrote:
>>>>>> Martin Pool <email address hidden> writes:
>
> > vila, on the whole, I'll just abandon this branch. I think the
> > situation it was trying to fix may have been a transient problem
> > at Google's end, and there's no longer enough obvious benefit to
> > be worth changing it.
>
> Did you write this before or after commenting on bug #497274 ?
>
> https://bugs.launchpad.net/bzr/+bug/497274/comments/15 kind of implies you
> still consider this proposal to be useful, no?

I wrote that before.

I don't think this mp is still in progress, but it can be resurrected
if that's useful.

Preview Diff

 === modified file 'NEWS'
 --- NEWS	2009-12-15 19:59:00 +0000
 +++ NEWS	2009-12-16 08:26:16 +0000
@@ -20,6 +20,11 @@
  Bug Fixes
  *********
++* HTTP 405 "Not Allowed" is taken to mean there's no bzr branch or smart
++  server at the URL.  This is sent by Google Code and blocks use of
++  foreign branch plugins.
++  (Martin Pool, #497274)
++
  Improvements
  ************
@@ -32,6 +37,9 @@
  Internals
  *********
++* PyCurl http/https error handling unified; this may have knock-on effects
++  on quirky web servers.  (Martin Pool)
++
  Testing
  *******
 === modified file 'bzrlib/bzrdir.py'
 --- bzrlib/bzrdir.py	2009-12-02 17:56:06 +0000
 +++ bzrlib/bzrdir.py	2009-12-16 08:26:16 +0000
@@ -1828,7 +1828,7 @@
          """Return the .bzrdir style format present in a directory."""
          try:
              format_string = transport.get_bytes(".bzr/branch-format")
--        except errors.NoSuchFile:
++        except (errors.NoSuchFile, errors.TransportNotPossible):
              raise errors.NotBranchError(path=transport.base)
          try:
 === modified file 'bzrlib/tests/__init__.py'
 --- bzrlib/tests/__init__.py	2009-12-08 21:46:07 +0000
 +++ bzrlib/tests/__init__.py	2009-12-16 08:26:16 +0000
@@ -3934,6 +3934,7 @@
          'bzrlib.symbol_versioning',
          'bzrlib.tests',
          'bzrlib.timestamp',
++        'bzrlib.transport.http',
          'bzrlib.version_info_formats.format_custom',
          ]
 === modified file 'bzrlib/transport/http/__init__.py'
 --- bzrlib/transport/http/__init__.py	2009-11-26 14:39:31 +0000
 +++ bzrlib/transport/http/__init__.py	2009-12-16 08:26:16 +0000
@@ -205,6 +205,51 @@
      # use.
      _get_max_size = 0
++    def _raise_mapped_error(self, url, code, response_body,
++        context_message=None):
++        """Translate an http error into a bzrlib exception.
++
++        Some methods may choose to override this for particular cases.
++
++        The URL and code are automatically included as appropriate.
++
++        :param code: integer response code
++        :param response_body: string containing the body of the error response
++            (typically html)
++        """
++        # XXX: This should also be used through the urllib implementation, but
++        # it's not yet
++        mutter("http error: %d %s" % (code, url))
++        mutter("  response body: %r" % response_body)
++        if response_body:
++            plaintext_body = unhtml_roughly(response_body)
++        else:
++            plaintext_body = ''
++        if code == 404:
++            raise errors.NoSuchFile(url)
++        elif code == 403:
++            # "The server understood the request, but is refusing to fulfill
++            # it. Authorization will not help and the request SHOULD NOT be
++            # repeated."
++            raise errors.InvalidHttpResponse(url, '403 Forbidden')
++        elif code == 405:
++            # "The method specified in the Request-Line is not allowed for the
++            # resource identified by the Request-URI. The response MUST
++            # include an Allow header containing a list of valid methods for
++            # the requested resource."
++            #
++            # Sent by Google code when probing for .bzr
++            # <https://bugs.edge.launchpad.net/bzr/+bug/497274>
++            raise errors.TransportNotPossible(url, '405 Not Allowed: '
++                + plaintext_body)
++        else:
++            if context_message is None:
++                msg = ''
++            else:
++                msg = ': ' + context_message
++            raise errors.InvalidHttpResponse(
++                url, 'http response %d%s: %s' % (code, msg, plaintext_body))
++
      def _readv(self, relpath, offsets):
          """Get parts of the file at the given relative path.
@@ -614,10 +659,13 @@
              t = self._http_transport_ref()
              code, body_filelike = t._post(bytes)
              if code != 200:
++                # this should normally be handled by _post, but check it here
                  raise InvalidHttpResponse(
                      t._remote_path('.bzr/smart'),
                      'Expected 200 response code, got %r' % (code,))
--        except (errors.InvalidHttpResponse, errors.ConnectionReset), e:
++        except (errors.TransportNotPossible,
++                errors.InvalidHttpResponse,
++                errors.ConnectionReset), e:
              raise errors.SmartProtocolError(str(e))
          return body_filelike
@@ -661,3 +709,12 @@
      def _finished_reading(self):
          """See SmartClientMediumRequest._finished_reading."""
          pass
++
++
++def unhtml_roughly(maybe_html):
++    """Very approximate html->text translation, for presenting error bodies.
++
++    >>> unhtml_roughly("<b>bad</b> things happened\\n")
++    ' bad  things happened '
++    """
++    return re.subn(r"(<[^>]*>|\n|&nbsp;)", " ", maybe_html)[0]
 === modified file 'bzrlib/transport/http/_pycurl.py'
 --- bzrlib/transport/http/_pycurl.py	2009-08-19 16:33:39 +0000
 +++ bzrlib/transport/http/_pycurl.py	2009-12-16 08:26:16 +0000
@@ -1,4 +1,4 @@
--# Copyright (C) 2006 Canonical Ltd
++# Copyright (C) 2006, 2009 Canonical Ltd
  #
  # This program is free software; you can redistribute it and/or modify
  # it under the terms of the GNU General Public License as published by
@@ -206,13 +206,10 @@
          code = curl.getinfo(pycurl.HTTP_CODE)
          data.seek(0)
--        if code == 404:
--            raise errors.NoSuchFile(abspath)
--        if code != 200:
--            self._raise_curl_http_error(
--                curl, 'expected 200 or 404 for full response.')
--
--        return code, data
++        if code == 200:
++            return code, data
++        else:
++            self._raise_curl_http_error(curl)
      # The parent class use 0 to minimize the requests, but since we can't
      # exploit the results as soon as they are received (pycurl limitation) we'd
@@ -237,16 +234,21 @@
          code = curl.getinfo(pycurl.HTTP_CODE)
--        if code == 404: # not found
--            raise errors.NoSuchFile(abspath)
++        if code in (200, 206):
++            msg = self._parse_headers(header)
++            return code, response.handle_response(abspath, code, msg, data)
          elif code in (400, 416):
              # We don't know which, but one of the ranges we specified was
              # wrong.
++            #
++            # 400 is not strictly 'invalid range', but rather 'bad request'
++            # but (perhaps) some servers use it for that meaning.  Aside from
++            # that, this could go into _raise_curl_http_error.
              raise errors.InvalidHttpRange(abspath, range_header,
                                            'Server return code %d'
                                            % curl.getinfo(pycurl.HTTP_CODE))
--        msg = self._parse_headers(header)
--        return code, response.handle_response(abspath, code, msg, data)
++        else:
++            self._raise_curl_http_error(curl)
      def _parse_headers(self, status_and_headers):
          """Transform the headers provided by curl into an HTTPMessage"""
@@ -285,26 +287,31 @@
                  raise
          data.seek(0)
          code = curl.getinfo(pycurl.HTTP_CODE)
--        msg = self._parse_headers(header)
--        return code, response.handle_response(abspath, code, msg, data)
--
--
--    def _raise_curl_http_error(self, curl, info=None):
++        if code == 200:
++            msg = self._parse_headers(header)
++            return code, response.handle_response(abspath, code, msg, data)
++        else:
++            self._raise_curl_http_error(curl, body=data)
++
++
++    def _raise_curl_http_error(self, curl, info=None, body=None):
++        """Common curl->bzrlib error translation.
++
++        Some methods may choose to override this for particular cases.
++
++        The URL and code are automatically included as appropriate.
++
++        :param info: Extra information to include in the message.
++        :param body: File-like object from which the body of the page can be read.
++        """
          code = curl.getinfo(pycurl.HTTP_CODE)
          url = curl.getinfo(pycurl.EFFECTIVE_URL)
--        # Some error codes can be handled the same way for all
--        # requests
--        if code == 403:
--            raise errors.TransportError(
--                'Server refuses to fulfill the request (403 Forbidden)'
--                ' for %s' % url)
++        if body is not None:
++            response_body = body.read()
          else:
--            if info is None:
--                msg = ''
--            else:
--                msg = ': ' + info
--            raise errors.InvalidHttpResponse(
--                url, 'Unable to handle http code %d%s' % (code,msg))
++            response_body = None
++        self._raise_mapped_error(url, code, response_body=response_body,
++            context_message=info)
      def _debug_cb(self, kind, text):
          if kind in (pycurl.INFOTYPE_HEADER_IN, pycurl.INFOTYPE_DATA_IN,