Bazaar

Code review comment for lp:~mbp/bzr/497274-http-405

497274-http-405
Merge into bzr.dev

Revision history for this message

Martin Pool (mbp) wrote on 2009-12-21:

> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there. That may allow
> > foreign branches to work against it. However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
>
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.

I'll merge that into mine and see how it looks.

> _post() is used only by smart_http_request() and the later already translate
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
>
> >
> > I've only tested this interactively not added tests. I probably should.
>
> See above.
>
> >
> > This prints and logs the actual message from the http response page.
>
> I doubt users are really interested in the general case, muttering may be
> enough though.

They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded. We tend to get bug reports about this - though just getting it into the trace should help us with that.

>
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
>
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
>
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
>
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
>
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
>
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
>
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)

There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations. I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.

>
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think. I
> > haven't yet gone through the urllib implementation.
>
> See above, that's a one-line fix, the pycurl implementation doesn't even need
> to be fixed.

ok,

> That may explains why you can't test it against google, I suspect others use
> the
> urllib implementation and needs the fix.
>
> Note that I fixed a bug in send_http_smart_request so you want to backport at
> least that part.
>
> So I vote needs_fixing and will mark as Work In Progress.

So your code has, in handle_response:

    elif code == 405:
        # The server refused to handle the request, the data presumably
        # contains details about the error and not useful data, but we leave
        # that to the higher levels.
        return data

which strongly looks like you're going to treat the html text of the error response as a valid response body, which is definitely not right. OK, apparently this doesn't necessarily happen because the base class send_http_smart_request translates it, but there's still some risk of that.

If you pass around (http_status, body) too much then multiple layers need to know how to interpret the status code. Without understanding it, they can't understand how to use the body data. For instance, send_http_smart_request assumes that only 200 means success, but that's not quite absolutely true.

So I'd like to have one reasonably well defined place that turns error-like responses into exceptions. Then if some code wants to handle them specially, it can always catch the exception. I think it's a bit bad that send_http_smart_request needs to both check the status and also catch exceptions.

I guess I don't properly understand whether you think my patch is actually wrong or just needs tests. I will try to use your tests with mine and see where we get to.

> > This tries to handle 'http 405 not allowed' from Google code as meaning "not
> > allowed to look for .bzr", ie there's no bzr branch there.  That may allow
> > foreign branches to work against it.  However, their servers keep falling
> over
> > with '503 not available' errors, which make testing it difficult.
> 
> Then look at lp:~vila/bzr/497274-http-405 for an alternative, minimal (and
> probably incomplete)
> fix with tests.

I'll merge that into mine and see how it looks.

> _post() is used only by smart_http_request() and the later already translate
> InvalidHttpResponse into SmartProtocolError which higher layers interpret as
> NotABranch when needed.
> 
> >
> > I've only tested this interactively not added tests.  I probably should.
> 
> See above.
> 
> >
> > This prints and logs the actual message from the http response page.
> 
> I doubt users are really interested in the general case, muttering may be
> enough though.

They might not be, but only giving the status code is not very helpful if there is something mysterious going on, like the server being overloaded.  We tend to get bug reports about this - though just getting it into the trace should help us with that.

> 
> >
> > It also tries to unify http response handling currently scattered across
> > _pycurl.py into one method in the base class.
> 
> I'm pretty sure I went mostly in the opposite direction in the past based on
> problems
> encountered while implementing the webdav support.
> 
> I didn't try to unify both implementations either... mostly because we will
> deprecate pycurl
> one day and I was happy with the urllib one. The idea there is that there are
> three categories
> of error codes:
> - the ones that represent success,
> - the ones that represent known failures,
> - the others
> 
> Upon reception, the callers must handle the success cases, may handle some
> known
> failures specifically before relying on a fallback for the remaining ones.
> 
> Basically the http requests are generic enough that the same error code can be
> interpreted
> differently in different contexts. Unifying the error handling too much makes
> it harder
> to raise the correct bzr exceptions.
> 
> So far, only 403 was really unambiguously common, even 404 needed to be
> handled separately
> for HEAD (which is the main reason why I prefer to *not* have it in the common
> code).
> 
> All in all, that gives a coherent way to code all the methods with limited
> duplication
> (mostly 404 but everybody knows what 404 means :)

There are two types of possible duplication: across different call paths (get vs post vs others) and across different http client implementations.  I can see how different callers might want different interpretations for some codes, but I don't see why we would want different interpretations across urllib and pycurl, unless it really is something where the client library interferes with the result, as it may for redirects.

> 
> > This may be slightly dangerous
> > to change as it's lightly tested and tries to work around server quirks, but
> > any breakage should be shallow and it's worth clearing it up I think.  I
> > haven't yet gone through the urllib implementation.
> 
> See above, that's a one-line fix, the pycurl implementation doesn't even need
> to be fixed.

ok, 
 
> That may explains why you can't test it against google, I suspect others use
> the
> urllib implementation and needs the fix.
> 
> Note that I fixed a bug in send_http_smart_request so you want to backport at
> least that part.
> 
> So I vote needs_fixing and will mark as Work In Progress.

So your code has, in handle_response:

which strongly looks like you're going to treat the html text of the error response as a valid response body, which is definitely not right.  OK, apparently this doesn't necessarily happen because the base class send_http_smart_request translates it, but there's still some risk of that.

If you pass around (http_status, body) too much then multiple layers need to know how to interpret the status code.  Without understanding it, they can't understand how to use the body data.  For instance, send_http_smart_request assumes that only 200 means success, but that's not quite absolutely true.

So I'd like to have one reasonably well defined place that turns error-like responses into exceptions.  Then if some code wants to handle them specially, it can always catch the exception.  I think it's a bit bad that send_http_smart_request needs to both check the status and also catch exceptions.

I guess I don't properly understand whether you think my patch is actually wrong or just needs tests.  I will try to use your tests with mine and see where we get to.

« Back to merge proposal