Merge lp:~stub/launchpad/garbo into lp:launchpad

Proposed by Stuart Bishop
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~stub/launchpad/garbo
Merge into: lp:launchpad
Diff against target: 192 lines (+39/-27)
2 files modified
lib/canonical/launchpad/doc/looptuner.txt (+4/-11)
lib/canonical/launchpad/utilities/looptuner.py (+35/-16)
To merge this branch: bzr merge lp:~stub/launchpad/garbo
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) code Approve
Review via email: mp+22680@code.launchpad.net

Commit message

Let garbo tasks timeout while waiting for replication lag or long running transactions

Description of the change

garbo-*.py lets a timeout be set for tasks so they abort if they don't complete withing a given time frame. This timeout is not checked for when blocking for long running transactions or for replication lag.

This branch fixes that. Now tasks will correctly be aborted when they timeout and report this, rather than block indefinitely.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Hi Stuart,

Nice to see loop tuning continue to get better. One design question: your timeout check accompanies time.sleep calls. Why not have:

 * An attribute flag that says "I consider myself timed out" (even if it hasn't happened yet but is bound to happen because of impending sleep)

 * A _checkTimeLeft(seconds=0) method that you call in various places to set the timeout flag if the requisite number of seconds isn't available

 * A self.sleep method that checks for remaining time and time.sleeps only if the sleep ends before the deadline, so you don't have to check this for every time.sleep

 * Instead of time checks, a check for the timeout flag.

On a more nitpicky note, _is_timed_out is not the right spelling for a method; that'd be _isTimedOut. And I don't see the point of a leading underscore in _msg_counter.

Jeroen

review: Needs Information
Revision history for this message
Stuart Bishop (stub) wrote :

On Mon, Apr 5, 2010 at 7:29 PM, Jeroen T. Vermeulen <email address hidden> wrote:

> Nice to see loop tuning continue to get better.  One design question: your timeout check accompanies time.sleep calls.  Why not have:
>
>  * An attribute flag that says "I consider myself timed out" (even if it hasn't happened yet but is bound to happen because of impending sleep)
>
>  * A _checkTimeLeft(seconds=0) method that you call in various places to set the timeout flag if the requisite number of seconds isn't available
>
>  * A self.sleep method that checks for remaining time and time.sleeps only if the sleep ends before the deadline, so you don't have to check this for every time.sleep
>
>  * Instead of time checks, a check for the timeout flag.

I've added a self._sleep method to simplify the 'if not timed out
sleep' stanzas.

Having a flag in addition to the check method is less clear, and we
are only doing this a few times per second so it isn't a performance
issue.

> On a more nitpicky note, _is_timed_out is not the right spelling for a method; that'd be _isTimedOut.  And I don't see the point of a leading underscore in _msg_counter.

I've fixed both of these. And changed the 'extra' parameter to 'extra seconds'.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The reason I brought up an "I'm timed out" flag was concern that the tuner might decide to go straight back to work after a sleep was short-circuited because it would have timed out. But turning some "while True" loops into ones with timeout checks in the loop condition is better. Approved.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/doc/looptuner.txt'
2--- lib/canonical/launchpad/doc/looptuner.txt 2009-07-14 05:50:46 +0000
3+++ lib/canonical/launchpad/doc/looptuner.txt 2010-04-07 16:01:48 +0000
4@@ -72,7 +72,7 @@
5 ... self.clock = 0
6 ...
7 ... def isDone(self):
8- ... done = (self.iteration > len(self.timings)+1)
9+ ... done = self.iteration >= len(self.timings)
10 ... if done:
11 ... print "done"
12 ... return done
13@@ -80,6 +80,8 @@
14 ... def __call__(self, chunk_size):
15 ... print_change(self.last_chunk_size, chunk_size)
16 ... self.last_chunk_size = chunk_size
17+ ... self.clock += self.timings[self.iteration]
18+ ... self.iteration += 1
19
20
21 In combination with that, we tweak LoopTuner to simulate the timings we gave
22@@ -88,12 +90,7 @@
23
24 >>> class TestTuner(LoopTuner):
25 ... def _time(self):
26- ... clock_now = float(self.operation.clock)
27- ... if self.operation.iteration < len(self.operation.timings):
28- ... next = self.operation.timings[self.operation.iteration]
29- ... self.operation.clock += next
30- ... self.operation.iteration += 1
31- ... return clock_now
32+ ... return float(self.operation.clock)
33
34
35 === Trivial Case ===
36@@ -103,7 +100,6 @@
37 >>> body = PlannedLoop([])
38 >>> loop = TestTuner(body, goal_seconds, 100)
39 >>> loop.run()
40- start
41 done
42
43
44@@ -131,7 +127,6 @@
45 increased (0.0)
46 same (0.0)
47 decreased (-0.0)
48- same (0.0)
49 done
50
51
52@@ -152,7 +147,6 @@
53 same (0.0)
54 same (0.0)
55 same (0.0)
56- same (0.0)
57 done
58
59
60@@ -177,7 +171,6 @@
61 decreased (-0.1)
62 increased (0.1)
63 same (0.0)
64- same (0.0)
65 done
66
67
68
69=== modified file 'lib/canonical/launchpad/utilities/looptuner.py'
70--- lib/canonical/launchpad/utilities/looptuner.py 2010-03-25 12:22:13 +0000
71+++ lib/canonical/launchpad/utilities/looptuner.py 2010-04-07 16:01:48 +0000
72@@ -94,17 +94,21 @@
73 else:
74 self.log = log
75
76+ def _isTimedOut(self, extra_seconds=0):
77+ """Return True if the task will be timed out in extra_seconds."""
78+ return self.abort_time is not None and (
79+ self._time() + extra_seconds >= self.start_time + self.abort_time)
80+
81 def run(self):
82 """Run the loop to completion."""
83 chunk_size = self.minimum_chunk_size
84 iteration = 0
85 total_size = 0
86- start_time = self._time()
87- last_clock = start_time
88+ self.start_time = self._time()
89+ last_clock = self.start_time
90 while not self.operation.isDone():
91
92- if (self.abort_time is not None
93- and last_clock >= start_time + self.abort_time):
94+ if self._isTimedOut():
95 self.log.warn(
96 "Task aborted after %d seconds." % self.abort_time)
97 break
98@@ -137,7 +141,7 @@
99 chunk_size = min(chunk_size, self.maximum_chunk_size)
100 iteration += 1
101
102- total_time = last_clock - start_time
103+ total_time = last_clock - self.start_time
104 average_size = total_size/max(1, iteration)
105 average_speed = total_size/max(1, total_time)
106 self.log.debug(
107@@ -161,7 +165,7 @@
108 if self.cooldown_time is None or self.cooldown_time <= 0.0:
109 return bedtime
110 else:
111- time.sleep(self.cooldown_time)
112+ self._sleep(self.cooldown_time)
113 return self._time()
114
115 def _time(self):
116@@ -172,6 +176,15 @@
117 """
118 return time.time()
119
120+ def _sleep(self, seconds):
121+ """Sleep.
122+
123+ If the sleep interval would put us over the tasks timeout,
124+ do nothing.
125+ """
126+ if not self._isTimedOut(self.cooldown_time):
127+ time.sleep(self.cooldown_time)
128+
129
130 def timedelta_to_seconds(td):
131 return 24 * 60 * td.days + td.seconds
132@@ -208,7 +221,7 @@
133 """When database replication lag is high, block until it drops."""
134 # Lag is most meaningful on the master.
135 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
136- while True:
137+ while not self._isTimedOut():
138 lag = store.execute("SELECT replication_lag()").get_one()[0]
139 if lag is None:
140 lag = timedelta(seconds=0)
141@@ -225,7 +238,7 @@
142 % time_to_sleep)
143
144 transaction.abort() # Don't become a long running transaction!
145- time.sleep(time_to_sleep)
146+ self._sleep(time_to_sleep)
147
148 def _blockForLongRunningTransactions(self):
149 """If there are long running transactions, block to avoid making
150@@ -233,7 +246,8 @@
151 if self.long_running_transaction is None:
152 return
153 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
154- while True:
155+ msg_counter = 0
156+ while not self._isTimedOut():
157 results = list(store.execute("""
158 SELECT
159 CURRENT_TIMESTAMP - xact_start,
160@@ -247,13 +261,18 @@
161 """ % self.long_running_transaction).get_all())
162 if not results:
163 break
164- for runtime, procpid, usename, datname, query in results:
165- self.log.info(
166- "Blocked on %s old xact %s@%s/%d - %s."
167- % (runtime, usename, datname, procpid, query))
168- self.log.info("Sleeping for 5 minutes.")
169+
170+ # Check for long running transactions every 30 seconds, but
171+ # only report every 10 minutes to avoid log spam.
172+ msg_counter += 1
173+ if msg_counter % 20 == 1:
174+ for runtime, procpid, usename, datname, query in results:
175+ self.log.info(
176+ "Blocked on %s old xact %s@%s/%d - %s."
177+ % (runtime, usename, datname, procpid, query))
178+ self.log.info("Sleeping for up to 10 minutes.")
179 transaction.abort() # Don't become a long running transaction!
180- time.sleep(5*60)
181+ self._sleep(30)
182
183 def _coolDown(self, bedtime):
184 """As per LoopTuner._coolDown, except we always wait until there
185@@ -263,7 +282,7 @@
186 self._blockWhenLagged()
187 if self.cooldown_time is not None and self.cooldown_time > 0.0:
188 remaining_nap = self._time() - bedtime + self.cooldown_time
189- if remaining_nap > 0.0:
190+ if remaining_nap > 0.0 and not self._is_timed_out(remaining_nap):
191 time.sleep(remaining_nap)
192 return self._time()
193