Merge lp:~robru/gwibber/async into lp:~barry/gwibber/py3

Proposed by Robert Bruce Park
Status: Merged
Merged at revision: 1407
Proposed branch: lp:~robru/gwibber/async
Merge into: lp:~barry/gwibber/py3
Diff against target: 65 lines (+22/-5)
2 files modified
gwibber/gwibber/protocols/base.py (+7/-2)
gwibber/gwibber/tests/test_plugins.py (+15/-3)
To merge this branch: bzr merge lp:~robru/gwibber/async
Reviewer Review Type Date Requested Status
Barry Warsaw Approve
Review via email: mp+122430@code.launchpad.net

Description of the change

Alright barry, sorry for spamming your inbox with that other merge proposal. What happened was I had implemented the threading code inside of a decorator that could be applied to any method, which was sexy in a general purpose kind of way, but then I realized that for the decorator to be effective, every single plugin was going to need to decorate every single method, which was going to be cumbersome and error prone (all it would take is one plugin author to forget to decorate one method and then suddenly all of gwibber is blocked on IO when that method gets called).

After that, I had the flash of brilliance that I could just decorate BasePlugin.__call__ and that would be enough to make all other methods invoked through __call__ to run in threads. That was a good start, but I was a little bit concerned that there were too many layers of misdirection happening (calling plugin('foo') meant invoking the decorator, which invoked a thread, which invoked a lambda, which invoked __call__, which invoked getattr, which invoked the correct operation). Although it wasn't anywhere near as bad, it was starting to look like the old dispatcher code, and I wanted to avoid that at all costs. So at that point I gave up on the decorator protocol and just merged the threading logic directly into the __call__ method. I was also able to factor out the lambda, which means __call__ invokes a thread, which invokes getattr, which invokes the desired operation directly.

The best part about this particular approach is that it means we can test all plugin operations in a very straightforward *synchronous* manner, without needing to write any tests against the threading module.

To post a comment you must log in.
lp:~robru/gwibber/async updated
1407. By Robert Bruce Park

Use thread.join() instead of time.sleep to ensure that the thread has completed before testing for it's results. This resolves a race issue in the test.

1408. By Robert Bruce Park

Rebase branch on barry's renamed plugin/protocols.

Revision history for this message
Barry Warsaw (barry) wrote :

Hi Robert. I'm going to review the newly pushed update, which contains the
change of s/plugin/protocol/

=== modified file 'gwibber/gwibber/protocols/base.py'
--- gwibber/gwibber/protocols/base.py 2012-09-04 20:39:25 +0000
+++ gwibber/gwibber/protocols/base.py 2012-09-04 21:09:51 +0000
> @@ -19,12 +19,17 @@
> ]
>
>
> +from threading import Thread
> +
> +
> class Base:
> def __init__(self, account):
> self.account = account
>
> - def __call__(self, operation, **kws):
> - return getattr(self, operation)(**kws)
> + def __call__(self, operation, *args, **kwargs):
> + """Use threads to avoid blocking on I/O."""
> + Thread(target=getattr(self, operation),
> + args=args, kwargs=kwargs).start()

You probably want to make this a daemon thread, so that if it doesn't
complete, it won't prevent exiting of the main process. Here's what I plan on
committing:

    def __call__(self, operation, *args, **kwargs):
        """Use threads to avoid blocking on I/O."""
        # Let AttributeErrors propagate up.
        method = getattr(self, operation)
        thread = Thread(target=method, args=args, kwargs=kwargs)
        thread.daemon = True
        thread.start()

Although it might be better to squirrel the thread away in an instance
variable, so that it can be directly joined, instead of having to loop through
threading.enumerate() as in your test.

>
> def _login(self):
> """Prevent redundant login attempts."""

=== modified file 'gwibber/gwibber/tests/test_plugins.py'
--- gwibber/gwibber/tests/test_plugins.py 2012-09-04 20:39:25 +0000
+++ gwibber/gwibber/tests/test_plugins.py 2012-09-04 21:09:51 +0000
> @@ -32,9 +32,10 @@
>
> class MyProtocol(Base):
> """Simplest possible protocol implementation to allow testing of Base."""
> + result = ''

Do you think it would be better to initialize the result to None in Base?
Also, what if you try to access result why the thread is still running?
Should you get an exception? Or just None? should trying to get the result
implicitly join the sub-thread, thus guaranteeing that the self.result is the
return value? Of course, that could block, but maybe that's better than
getting a potentially incorrect value.

Putting result in Base could be accompanied by a comment indicating the
required API. Thinking about this some more, I'm worried about race
conditions on self.result. What if two operations on the same plugin are
invoked? Let's say they both complete before the consumer reads self.result.
Won't this cause one result to get overwritten and lost?

I think we're now stumbling into the use cases for the defer package. :)

Revision history for this message
Robert Bruce Park (robru) wrote :
Download full text (4.1 KiB)

On 12-09-04 04:40 PM, Barry Warsaw wrote:
>> + def __call__(self, operation, *args, **kwargs):
>> + """Use threads to avoid blocking on I/O."""
>> + Thread(target=getattr(self, operation),
>> + args=args, kwargs=kwargs).start()
>
> You probably want to make this a daemon thread, so that if it doesn't
> complete, it won't prevent exiting of the main process.

Ah, ok. I wasn't sure what daemon mode meant specifically. That does
sound reasonable.

> Here's what I plan on committing:
>
> def __call__(self, operation, *args, **kwargs):
> """Use threads to avoid blocking on I/O."""
> # Let AttributeErrors propagate up.
> method = getattr(self, operation)

Do you really need to assign to this variable called 'method'? It's a
pet peeve of mine to assign variables that only get used once. Python
programmers are obsessed with doing this and I can't figure out why. I
guess it helps readability but it does introduce a performance penalty,
even if minor.

> thread = Thread(target=method, args=args, kwargs=kwargs)
> thread.daemon = True
> thread.start()
>
> Although it might be better to squirrel the thread away in an instance
> variable, so that it can be directly joined, instead of having to loop through
> threading.enumerate() as in your test.

I don't like it being an instance variable, that keeps a reference to it
around that keeps it alive after it exits. I like the idea of the thread
being "inaccessible" in the background, just like when you are using
some other asynch API (like GObject or even expat or whatever), you
don't have any way of polling it until it calls your callback.

The other problem with it being an instance variable is that __call__ is
going to get called a lot, thus is going to make a bunch of threads, so
you can't just have self.thread, it'd have to be a list() or a set() or
something, then you have to do a bunch of management on that. easier
just to let the reference go out of scope and let the garbage collector
pick it up once it stops executing. My goal here is ultimate simplicity,
and comprehensibility in the code. Don't over-engineer! ;-)

Also, threading.enumerate was quite good I thought, I had originally
written that as time.sleep(0.5) but then figured threading.enumerate
would block for testing purposes without wasting extra time.

>> class MyProtocol(Base):
>> """Simplest possible protocol implementation to allow testing of Base."""
>> + result = ''
>
> Do you think it would be better to initialize the result to None in Base?

No, because I do not intend self.result to actually be used in the real
API, that was just a bit of testing instrumentation for the test suite
only. Real API should be calling a callback at the end of the method,
not setting some variable. if all it does is set some variable, then we
will have to have some other thing that polls that variable. That's
stupid. Callbacks are a way smarter thing to do here. Just think of
self.result as a 'noop' callback for the 'noop' method.

> Also, what if you try to access result why the thread is still running?
> Should you get an exception? ...

Read more...

Revision history for this message
Barry Warsaw (barry) wrote :
Download full text (3.2 KiB)

On Sep 04, 2012, at 10:07 PM, Robert Bruce Park wrote:

>> Here's what I plan on committing:
>>
>> def __call__(self, operation, *args, **kwargs):
>> """Use threads to avoid blocking on I/O."""
>> # Let AttributeErrors propagate up.
>> method = getattr(self, operation)
>
>Do you really need to assign to this variable called 'method'? It's a
>pet peeve of mine to assign variables that only get used once. Python
>programmers are obsessed with doing this and I can't figure out why. I
>guess it helps readability but it does introduce a performance penalty,
>even if minor.

Readability is probably the #1 reason why it's done this way, and especially
if it would send the line off into the rightward nether regions. But also, it
can provide a useful place to add some comments and I'm personally quite
verbose when it comes to commenting my code.

I think Python programmers often have a visceral reaction to stuffing too much
into one-liners. As for performance penalties, it will almost never make any
practical difference (and Pythonistas general attitude is that you still
shouldn't care until there's actually a problem with measurements to back it
up with. ;).
>
>> thread = Thread(target=method, args=args, kwargs=kwargs)
>> thread.daemon = True
>> thread.start()
>>
>> Although it might be better to squirrel the thread away in an instance
>> variable, so that it can be directly joined, instead of having to loop
>> through threading.enumerate() as in your test.
>
>I don't like it being an instance variable, that keeps a reference to it
>around that keeps it alive after it exits. I like the idea of the thread
>being "inaccessible" in the background, just like when you are using
>some other asynch API (like GObject or even expat or whatever), you
>don't have any way of polling it until it calls your callback.
>
>The other problem with it being an instance variable is that __call__ is
>going to get called a lot, thus is going to make a bunch of threads, so
>you can't just have self.thread, it'd have to be a list() or a set() or
>something, then you have to do a bunch of management on that. easier
>just to let the reference go out of scope and let the garbage collector
>pick it up once it stops executing. My goal here is ultimate simplicity,
>and comprehensibility in the code. Don't over-engineer! ;-)

Yep, fair enough. I started experimenting with a different style and ended up
wanting to hide the thread in an outer class that gets returned. It has a
'result' property that will not return a result unless it's ready.

But if we can pretty much guarantee that the async __call__() will never
return a result in real life, then we can as you say keep it simple.

>Also, threading.enumerate was quite good I thought, I had originally
>written that as time.sleep(0.5) but then figured threading.enumerate
>would block for testing purposes without wasting extra time.

Yeah, you'd have to put the time.sleep() inside the operation to be an
accurate test of the async-ness.

Okay, I will merge the branch largely as-is (though I can't resist pissing on
the style hydrant just a little bit :). I w...

Read more...

Revision history for this message
Barry Warsaw (barry) wrote :

Approved and merged with some minor changes and adding the refusal to run non-public operations.

review: Approve
Revision history for this message
Robert Bruce Park (robru) wrote :

On 12-09-04 05:22 PM, Barry Warsaw wrote:
> I think Python programmers often have a visceral reaction to stuffing too much
> into one-liners. As for performance penalties, it will almost never make any
> practical difference (and Pythonistas general attitude is that you still
> shouldn't care until there's actually a problem with measurements to back it
> up with. ;).

Fair enough. My last big project was doing lots of data parsing,
particularly with expat, and I got so lost in profiling-hell that I now
obsessively do things that are known to be faster, readability be damned ;-)

>> The other problem with it being an instance variable is that __call__ is
>> going to get called a lot, thus is going to make a bunch of threads, so
>> you can't just have self.thread, it'd have to be a list() or a set() or
>> something, then you have to do a bunch of management on that. easier
>> just to let the reference go out of scope and let the garbage collector
>> pick it up once it stops executing. My goal here is ultimate simplicity,
>> and comprehensibility in the code. Don't over-engineer! ;-)
>
> Yep, fair enough. I started experimenting with a different style and ended up
> wanting to hide the thread in an outer class that gets returned. It has a
> 'result' property that will not return a result unless it's ready.
>
> But if we can pretty much guarantee that the async __call__() will never
> return a result in real life, then we can as you say keep it simple.

I think we can make that guarantee. I'm not sure when a return value is
going to make sense really, most of the time you're going to want to
call something that results in something getting displayed on the screen.

>> Also, threading.enumerate was quite good I thought, I had originally
>> written that as time.sleep(0.5) but then figured threading.enumerate
>> would block for testing purposes without wasting extra time.
>
> Yeah, you'd have to put the time.sleep() inside the operation to be an
> accurate test of the async-ness.

No, it wasn't testing asyncness, I just needed a way to say "wait for
the thread to finish in order to test that the result is correct", my
first attempt used time.sleep, then I realized that even half a second
was actually waiting much too long, so thread.enumerate offered a very
clean way to say "wait the perfect amount of time for all threads to
complete, and THEN run the test", no wasteful sleeping.

> Okay, I will merge the branch largely as-is (though I can't resist pissing on
> the style hydrant just a little bit :). I want to make it explicit that we do
> not expect, nor really support, __call__() setting any results.

Ok, fair.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'gwibber/gwibber/protocols/base.py'
2--- gwibber/gwibber/protocols/base.py 2012-09-04 20:39:25 +0000
3+++ gwibber/gwibber/protocols/base.py 2012-09-04 21:11:19 +0000
4@@ -19,12 +19,17 @@
5 ]
6
7
8+from threading import Thread
9+
10+
11 class Base:
12 def __init__(self, account):
13 self.account = account
14
15- def __call__(self, operation, **kws):
16- return getattr(self, operation)(**kws)
17+ def __call__(self, operation, *args, **kwargs):
18+ """Use threads to avoid blocking on I/O."""
19+ Thread(target=getattr(self, operation),
20+ args=args, kwargs=kwargs).start()
21
22 def _login(self):
23 """Prevent redundant login attempts."""
24
25=== modified file 'gwibber/gwibber/tests/test_plugins.py'
26--- gwibber/gwibber/tests/test_plugins.py 2012-09-04 20:39:25 +0000
27+++ gwibber/gwibber/tests/test_plugins.py 2012-09-04 21:11:19 +0000
28@@ -32,9 +32,10 @@
29
30 class MyProtocol(Base):
31 """Simplest possible protocol implementation to allow testing of Base."""
32+ result = ''
33
34 def noop(self, one=None, two=None):
35- return '{}:{}'.format(one, two)
36+ self.result = '{}:{}'.format(one, two)
37
38 def _locked_login(self, old_token):
39 self.account['access_token'] = 'fake_token'
40@@ -43,12 +44,23 @@
41 class TestProtocols(unittest.TestCase):
42 """Test protocol implementations."""
43
44- def test_basic_api(self):
45+ def test_basic_api_sync(self):
46 # Protocols get instantiated with an account, and the instance gets
47 # called to perform an operation.
48 fake_account = object()
49 my_protocol = MyProtocol(fake_account)
50- self.assertEqual(my_protocol('noop', one='one', two='two'), 'one:two')
51+ my_protocol.noop(one='foo', two='bar')
52+ self.assertEqual(my_protocol.result, 'foo:bar')
53+
54+ def test_basic_api_async(self):
55+ fake_account = object()
56+ my_protocol = MyProtocol(fake_account)
57+ # Using __call__ makes invocation happen asynchronously in a thread
58+ my_protocol('noop', one='one', two='two')
59+ for thread in threading.enumerate()[1:]:
60+ # Wait for the thread to finish processing.
61+ thread.join()
62+ self.assertEqual(my_protocol.result, 'one:two')
63
64 def test_basic_login(self):
65 # Try to log in twice. The second login attempt returns False because

Subscribers

People subscribed via source and target branches