Wow, that is indeed a long conversation (and an instructive one too), but I have a concern. If we are going to fix this issue of returning exceptions, we should first patch the update method of python-couchdb, since it is the one really doing the job; put_records_batch simply uses it and later adds the required attachments. From my point of view it is not worth looking for a solution at the desktopcouch level, but rather in python-couchdb. I'm sure that the guys at python-couchdb will understand that returning instances of exceptions is not the best way, but will they want to depend on twisted? I personally don't think so. Nevertheless, we will have to talk with them and fix the real code that has the problem; if the exceptions are created by python-couchdb, we are simply making things worse.
I hope I make sense,
Manuel
> Man, I don't like returning an exception instance. After a long discussion on > #python, though, I'm not sure there's a better way. Entire discussion pasted > below (sorry, it's quite long).
>
> If I've got a function which does multiple things, some of > which may fail, but failure of some things doesn't mean that the others didn't > succeed, what's the best way of signalling that failure? If I raise a > ItPartiallyWorked exception, then I'll still need to pass back the successful > results (from the successful actions) in that exception too, which seems a bit > wrong > aquarius, a warning sounds in order > aquarius, just return a tuple that might have None in some bits > also, the "warning" module > lvh, well, that's what we're doing at the moment: we return a > list of results, where each result is either a successful result or an > exception. But returning a list with exception instances in it seems weird. I > mean, we could throw a warning *too* to alert people that it happened, of > course.
> aquarius, you probably want to split the separate bits up into smaller > functions that *do* do the right thing and raise an exception, but aren't > taxed with the extra problem of still delivering partial results, like the > composing function > aquarius: output a warning line to stderr ? > lvh, ah, I can't split it up into little bits because I'm > calling an external non-Python thing which actually does all the actions :) > aquarius, yeah no, don't use exception objects as sentinel values > aquarius, use None instead, your users will hate you less > lvh, so I pass all the actions in one go to the external thing > (CouchDB, in this case), and it gives me back a list of successes or failures. > aquarius, does the absolute order matter? IE is the fact that > something is at position 2 meaningful? > if I use None, though, there's no indication of *why* that > action failed. > aquarius, or is it just the relative position (2 comes before 3 and > after 1) > lvh, absolute order does not matter (the list that comes back > is *actually* a list of tuples (id_of_action, success_or_failure) > aquarius, lvh: you can also use a dict {'stepA': 'ok', > 'stepB': 'foo failed'} > if you raise an exception inside a generator and you only catch it > outside of it, is the state lost? > the logical thing to assume would be yes, let me give it a shot > Alberth, yes. that's what we do. What should "foo failed" be, > though? I mean, it could just be a string, but then what I've done is > reinvented string Exceptions :) > aquarius, sentinel value > aquarius, typically yourmodule.FAILURE or None > of course, FAILURE = None # at module scope works fine > lvh, a sentinel value won't tell me what type of failure it > was, though, it'll just tell me that it did fail. I'd like to know why > nice idea, a set of constant values could work > NO_FOO=1 ; NO_BAR =2 > but that's just exceptions reinvented! 
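For reference, the sentinel-value idea from the chat above might look roughly like the sketch below. Every name in it (Sentinel, OK, CONFLICT, NO_NAME, fake_bulk_put, existing_ids) is made up for illustration and is not part of desktopcouch or python-couchdb; the bulk call is faked so the example runs without a CouchDB server.

```python
# Hypothetical sketch: module-level sentinels compared by identity ("is").
class Sentinel(object):
    """A named marker object; there is exactly one instance per outcome."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<%s>" % self.name


OK = Sentinel("OK")
CONFLICT = Sentinel("CONFLICT")
NO_NAME = Sentinel("NO_NAME")


def fake_bulk_put(records, existing_ids=()):
    """Stand-in for one bulk call; returns {record_id: sentinel}."""
    results = {}
    for record in records:
        rid = record["id"]
        if rid in existing_ids:
            results[rid] = CONFLICT      # an id that is already stored
        elif "name" not in record:
            results[rid] = NO_NAME       # fails the (made-up) validation rule
        else:
            results[rid] = OK
    return results


results = fake_bulk_put(
    [{"id": "a", "name": "Ann"}, {"id": "b"}, {"id": "c", "name": "Cy"}],
    existing_ids={"c"},
)
# Identity checks work because everyone shares the same sentinel objects.
failed = sorted(rid for rid, status in results.items() if status is not OK)
```

As pointed out in the chat, a sentinel says which kind of failure happened but cannot carry per-record detail such as a message from the server.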
> aquarius, no, return values do not travel up the stack > exceptions are used to stop execution imho > well, they do: but only once > that is not what you do here, here you want to continue, and > return a collection of results upwards > right > so if there are fifty separate ways it can fail, and for the > "do one action" function I've defined fifty different exceptions, I now need > to define fifty equivalent sentinel values as well so they can be returned by > the multiple-actions function? this seems...less than efficient. > aquarius, if you have fifty different ways it can fail it shouldn't be > one giant black box > aquarius, chopping it up into tiny bits was the first thing I > suggested for a reason ;-) > lvh, why? urllib.urlopen can fail in fifty different ways for > exactly the same reason; the remote web server can throw loads of different > HTTP errors, the network might not be there, etc. > aquarius, and only a few of those are meaningful to catch > lvh, I understand the point about chopping it up, and you are > welcome to call do_one_action twenty times and trap the exception, but the > whole point of the multiple-actions thing is precisely that you can call it > once with twenty things, it calls the external process (CouchDB) once with all > twenty things, and it'll tell you whether each succeeded or failed. It's for > efficiency. > aquarius, and for each of the twenty things twenty *distinct* problems > can happen? > lvh, could do, yes. > aquarius, then I don't understand how your function works at all > aquarius, if you're making an n-ary version of do_one_action i would > expect the n-ary to not return significantly different exceptions > besides perhaps TypeError (the thing you gave me isn't iterable) > lvh, CouchDB is a database whose API is HTTP. So to store a > new document in it you PUT/POST the document contents to a URL specific to > that document. 
A document PUT may succeed, or it may fail because it conflicts > with an existing document or because it violates the rules about what a > document may contain in many different ways, or because the server is down. > aquarius, I know CouchDB > lvh, ah, ok. so do_many_actions wraps Couch's bulk update > function. > aquarius, right -- but none of those errors are *distinct* > aquarius, all of the failures that happen on the n-ary can also happen > on a single update, right? > correct. > aquarius, now it's making much more sense > but they can happen individually to each component of an > update, and they may happen to some components and not others > we use validation, so that a document must conform to a > specific schema (for instance, a "contact" document must have a name > aquarius, twisted had to solve a similar problem with Deferreds and > Failures > lvh, violating that validation throws an error specific to the > violation ("a contact record must have a name field"), for example > aquarius, sentinel values is how you do it, sorry > lvh, so I don't want to have to define a sentinel value for > every possible failure, since there are *loads*, and we already have > exceptions for this > well, twisted didn't really do it with a sentinel value, but > twisted.python.failure.Failure isn't really what you need > what's actually the problem with passing back an exception > instance, other than inelegance? > aquarius, because then you're using instances as sentinel values > aquarius, the point of sentinel values is that everyone uses the exact > same object > which is why None is so popular: it's a guaranteed singleton > yeah, but...so? I don't really care whether I can deduce that > row5 and row12 failed for the same reason. I never need to test equivalence > between sentinels, which is why I don't think that sentinels are the right way > :) > if you don't need equivalence, why not use a boolean? 
> aquarius, exceptions travel up the stack until caught > Alberth, because a boolean won't tell me *why* it failed, just > that it did > lvh, only if they're raised. > traveling up the stack is not what you want, because you store state > at the bottom of it > aquarius, I've already told you why the only thing you do with > exceptions is raise > aquarius: how are you deciding that without comparing them? > aquarius, how would *returning* an exception instance be different > from using a sentinel value > besides being more useless because sentinel values are typically used > because you know everyone's using the same one > Alberth, I can compare exception_instance.reason, which is a > string, if I want to; that's effectively a sentinel. > aquarius, sure, but why does it need to be an exception to do that? > pass the string up instead? > Alberth, I'm not sure I like that, strings are a data structure of > last resort > NO_FOO="no foo" ; NO_BAR="no bar" > Alberth, yep, that's what I thought of, but passing back a > string instead of a more complex data structure means that it's hard to pass > additional data. This is (one of the reasons) why string exceptions went away. > aquarius, I don't understand why you want it to be an Exeption rather > than any other type > *Exception > aquarius, I understand your argument for complex data structures and I > agree > aquarius, twisted needed that and that resulted in > twisted.python.failure.Failure > lvh, it doesn't have to be an Exception. It just seemed a bit > silly to me to create some other type MySignallingOfAnErrorThing in fifty > different varieties when I'll use them in a very similar way to Exceptions, > and when I already have Exceptions defined for all these things > seems...wasteful. > aquarius, yeah, but...so? I don't really care whether I can > deduce that row5 and row12 failed for the same reason. 
I never need to test > equivalence between sentinels, which is why I don't think that sentinels are > the right way :) # could you explain that in more detail? I'm not sure I > understand > also, I don't agree with "when I'll use them in a very similar way to > Exceptions" > you're using them in a way that isn't like an exception at all > both lvh and I believe that an exception object should be used > only for raise/catch, and not as data value. > actually, come to think of it, if you write your thing as a generator > twisted.python.failure.Failure.throwExceptionIntoGenerator might be *exactly* > what you want > imagine that, say, if PUTting a new record to Couch results in > a NoNameDefined problem. My put_one_record function, if Couch throws that > problem, would raise a NoNameDefined exception and I want to tell the punter > that. So calling it would be "try: put_one_record(data); except NoNameDefined: > error_list.add("You didn't define a name") > aquarius, presumably append and not add, but I get what you're saying > aquarius, basically you want to be able to preserve generator state > despite uncaught exceptions > the put_many_records would do something very similar: > "results=put_many_records(list_of_data); for result in results: if > result.problem == NoNameDefinedSentinelValue: error_list.add("You didn't > define a name for %s" % result.id) > aquarius, but I'm pretty sure you can't do that without ugly hacks or > a time machine > or twisted.python.failure.Failure.throwExceptionIntoGenerator > those two bits of code are way similar, it's just that one > uses exceptions and the other doesn't. that's what I mean by "using them > similarly" > (I used "add" 'cos I'm thinking of error_list as a Gtk list > display widget or something, but that's not important :)) > yeah. 
I couldn't really think of a decent way to do this; it's > all a bit hacky, hence coming here in case there was something cool I don't > know about :) > aquarius, twisted.python.failure.Failure.throwExceptionIntoGenerator > is something cool and you probably didn't know about it > the twisted thing might be good, but...is it independent, or > do I have to turn my app into a Twisted app to get at it? > lvh, already reading about it ;) > aquarius, pretty much everything in twisted.python can be used in 3rd > party non-twisted apps > BUT Failure is explicitly designed for asynchronous error handling > because exceptions aren't good at it > aquarius, in all honesty you're probably better off doing what you're > doing now, except don't make the things you're returning Exceptions > either use sentinel values, or, if you need to pass more data, custom > types > if you don't really care *which* row failed, use custom types > errr, sentinel values > lvh, that's the conclusion I'm coming to...you still think > that I should define fifty different > ThingThatIsSortaLikeAnExceptionButNotReallys, one for each Exception I've got? > aquarius, I think you probably shouldn't have fifty > aquarius, and I think you probably shouldn't have the n-ary thing in > your API > especially if it makes error handling this difficult > why don't people just do a for x in L: try: do_one_thing(x); except > UglyError: continue? > lvh, it's in couch's API for a reason, though, and that's that > making one HTTP call is way, way more efficient if you're trying to update 200 > documents at once. > let alone 20000 :-) > aquarius, if you're convinced you need that, sentinel values or frame > inspection > at the moment you precisely do have to call do_one_thing 20000 > times; we're wrapping couch's bulk_update API now, which is why we're trying > to work out how to do it in a way that doesn't suck :) > frame inspection? 
> aquarius, you should probably forget I suggested that > every time I've manually crawled up the stack and looked at > frames I've felt guilty about it and six months later I haven't been able to > understand why I did it, but that doesn't mean that there's not some case > somewhere where it's actually a good idea ;) > aquarius, anyway, the reaosn i didn't want exception instances is > because you need to be able to figure out WHICH ONE it is at runtime > which is why identiy in sentinel values is a good idea > WHICH ONE in what sense? I mean, I can tell what type of > exception it is, no? > aquarius, how? typechecking? > a dispatch dict is way cleaner > or a sequence of "if e is mymodule.QUANTUM_TRANSMOGRIFIER_EXPLOSION:" > lvh, I can have a dispatch dict keyed with Exception > subclasses, though, no? and then dispatch on isinstance > aquarius: instead of trying one big do-everything-in-one-call, > why not split it in phases? make an object that collects sub-queries you have > to do (as sub-data objects), a single 'do-it' that takes the collected sub- > queries, and does them together, and stores the exec results back in the sub- > queries in the sub-data objects, then investigate each sub-query object for > the result afterwards? 
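The collect-then-execute suggestion just above could be sketched as follows. This is only an illustration under assumptions: PendingPut, Batch, and fake_send_all are hypothetical names, the bulk call is faked, and the failure detail is a plain reason string here (it could just as well be a richer object).

```python
# Hypothetical sketch: collect sub-queries, run them in one call,
# then store each result back on its own sub-query object.
class PendingPut(object):
    """One queued record; the outcome is written back here after execute()."""

    def __init__(self, record_id, data):
        self.record_id = record_id
        self.data = data
        self.done = False
        self.error = None   # stays None on success, else a reason string


class Batch(object):
    def __init__(self, send_all):
        # send_all takes [(record_id, data), ...] and returns one
        # (ok, detail) pair per item, in the same order.
        self._send_all = send_all
        self.puts = []

    def add(self, record_id, data):
        put = PendingPut(record_id, data)
        self.puts.append(put)
        return put

    def execute(self):
        outcomes = self._send_all([(p.record_id, p.data) for p in self.puts])
        for put, (ok, detail) in zip(self.puts, outcomes):
            put.done = True
            put.error = None if ok else detail


def fake_send_all(items):
    """Stand-in for a single bulk HTTP request."""
    outcomes = []
    for record_id, data in items:
        if "name" in data:
            outcomes.append((True, None))
        else:
            outcomes.append((False, "a contact record must have a name field"))
    return outcomes


batch = Batch(fake_send_all)
first = batch.add("contact-1", {"name": "Ann"})
second = batch.add("contact-2", {})
batch.execute()
problems = [p.record_id for p in batch.puts if p.error is not None]
```

The caller inspects each PendingPut afterwards, so nothing is raised and nothing is lost when only some of the items fail.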
> aquarius, no, you can't dispatch inside a dictionary based on a > boolean value > aquarius, you can do it with type instead of isinstance, but then > you're using the type in order to get to a sentinel value > lvh, yeah, I just realised that my idea's stupid :) > aquarius, it would make much more sense to just use the sentinel > value, surely > aquarius, the reason you would use type is because the type is always > the same object > aquarius, which is a critically important quality of sentinels (but > I'm starting to sound like a stuck record here) > *nod* > aquarius, so basically everything you've come up with yourself is > secretly a sentinel already anyway > I really don't want to have to dictate that every time we > define a new Exception that the API can throw we've gotta also define an > equivalent NotQuiteAnException thing as well which looks identical. It seems > horrid. > aquarius, I think you have too many exceptions > I'd define one single SomethingHasGoneWrongException and set > exception.reason = NotQuiteAnExceptionSubclass if you could do "except" based > on attributes of the exception :) > aquarius, but you can't use except, because you're not raising > anything > in put_many_records, sure. But I do in put_one_record, because > that's the logical way to do it. If I start returning things and then make > everyone check the return value to see if it means there was a problem, I > might as well be a C programmer and kill myself. 
:) > so basically what we actually want is exceptions that don't lose state > when travelling up the stack > but python doesn't do it > aquarius, ah, right, I was assuming we're talking about the function > that still has to be written > aquarius, there is one way though, I guess, sort of > we are, mostly, and obviously the way you call > put_many_records will differ from the way you call put_one_record, but they > should share as much as possible, purely under the principle of least surprise > :) > aquarius, raise an exception that contains enough state to pick up > where you left off, but I'm not entirely sure what you would want your API to > look like then > if you've already written code that calls put_one_record lots > of times and you want to update your code to the new put_many_records API, you > should't have to totally throw away all your handling code and redo it because > something totally different happens, if at all possible. > aquarius, that would have to be a custom class with a very interesting > __iter__ > since iter is only called once > aquarius, right > aquarius, be right back, I think this might not be entirely terrible > aquarius, you can't do it without some except SomeError, e: > call_the_loop_again(e.enough_state_to_call_the_loop_again) > aquarius, you could do it with exceptions but the exceptions need to > contain enough information to regenerate the state of the generator that you > just lost because it raised an exception and it wasn't caught inside the > generator > aquarius, unfortunately you can't pickle generators > *nod* > aquarius, but you can store return values in a classattr of a Results > class > and clean up on __del__ > yeah, I've wondered about returning a Results object rather > than just a plain list. 
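One possible shape for the Results object mentioned above is sketched here. It is an assumption-laden illustration, not the desktopcouch API: Outcome, Results, unwrap, and this put_many_records are hypothetical, and the bulk call is faked. Note that it still stores exception instances as data, which is exactly what lvh and Alberth argue against in the chat; the sketch only shows how existing try/except handling from put_one_record could be reused.

```python
# Hypothetical sketch: a Results container whose items can re-raise.
class NoNameDefined(Exception):
    """Made-up validation error: the record has no name field."""


class Outcome(object):
    def __init__(self, record_id, value=None, error=None):
        self.record_id = record_id
        self.value = value
        self.error = error

    def unwrap(self):
        """Return the value, or raise the stored exception."""
        if self.error is not None:
            raise self.error
        return self.value


class Results(object):
    def __init__(self, outcomes):
        self._outcomes = list(outcomes)

    def __iter__(self):
        return iter(self._outcomes)

    def failures(self):
        return [o for o in self._outcomes if o.error is not None]


def put_many_records(records):
    """Stand-in for the bulk call; no CouchDB is involved."""
    outcomes = []
    for record_id, data in records:
        if "name" in data:
            outcomes.append(Outcome(record_id, value=record_id))
        else:
            outcomes.append(Outcome(record_id, error=NoNameDefined(record_id)))
    return Results(outcomes)


error_list = []
for outcome in put_many_records([("c1", {"name": "Ann"}), ("c2", {})]):
    try:
        outcome.unwrap()
    except NoNameDefined:
        error_list.append("You didn't define a name for %s" % outcome.record_id)
```

Because the exception is raised per item inside the loop body, one failure does not interrupt the iteration over the remaining results.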
> continueFrom(e) doesn't seem entirely disgusting but you would need to > call the for loop recursively, afaik you can't change the thing the for loop > is currently iterating over > without disgusting inspection tricks > you can, of course > in C > but I will shoot you in the face if you do it > * aquarius grins > no intention of doing that :) > aquarius, never mind the shooting in the face, I'm not even sure if > it'd be that bad actually > you're doing something unpythonic in order to do something pythonic > also it involves frame inspection, so maybe this would be a fitting > penance for suggesting Exception instances as return values > aquarius, perhaps you should write some of your talking and my > blubbering up and pour it into a comp.lang.python post > that's not a bad idea, actually. > * aquarius adds to todo list > aquarius, to recap my final idea that I realise is disgusting: C code > would figure out through inspection what the iterator is that the for loop > that's iterating over the Results object is currently iterating over > aquarius, then replace that PyObject* pointer with one to a newly > generated iterator based on the last raised exception, which has a reference > to the set of not-yet-processed things > aquarius, either mention that I thought this was ugly or don't mention > my name -- I'd like to be employable if my current job goes tits up > lvh, jesus wept. That's horrid. 
:) > aquarius, none of the alternatives allow you to use try/except clauses > aquarius, except ones that are worse > lvh, yeah, I see the reason for it, it's just that I have to > be able to maintain this code a year from now and I'm not as clever as you :) > aquarius, this is the bad kind of clever > aquarius, it's the kind of clever that made me realise i need to stop > doing perl > * aquarius laughs > also, if I write that, which I'm not totally sure I'm capable > of, and submit it for code review, my team will boil me alive :) > aquarius, it's not *thhaaaaat* bad > aquarius, the thing you're iterating over is still bound to a name, so > you can just implement Results.__iter__ to store a local copy of the last- > returned iterator > aquarius, that way you don't even have to use C or anythin'! > aquarius, well, except for getting around the exceptions that aren't > really exceptions bit of course > lvh, yeah. food for thoughts. > aquarius, s/thoughts/nightmares/