Merge lp:~stub/launchpad/garbo into lp:launchpad

Proposed by Stuart Bishop
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~stub/launchpad/garbo
Merge into: lp:launchpad
Diff against target: 192 lines (+39/-27)
2 files modified
lib/canonical/launchpad/doc/looptuner.txt (+4/-11)
lib/canonical/launchpad/utilities/looptuner.py (+35/-16)
To merge this branch: bzr merge lp:~stub/launchpad/garbo
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) code Approve
Review via email: mp+22680@code.launchpad.net

Commit message

Let garbo tasks timeout while waiting for replication lag or long running transactions

Description of the change

garbo-*.py lets a timeout be set for tasks so they abort if they don't complete withing a given time frame. This timeout is not checked for when blocking for long running transactions or for replication lag.

This branch fixes that. Now tasks will correctly be aborted when they timeout and report this, rather than block indefinitely.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Hi Stuart,

Nice to see loop tuning continue to get better. One design question: your timeout check accompanies time.sleep calls. Why not have:

 * An attribute flag that says "I consider myself timed out" (even if it hasn't happened yet but is bound to happen because of impending sleep)

 * A _checkTimeLeft(seconds=0) method that you call in various places to set the timeout flag if the requisite number of seconds isn't available

 * A self.sleep method that checks for remaining time and time.sleeps only if the sleep ends before the deadline, so you don't have to check this for every time.sleep

 * Instead of time checks, a check for the timeout flag.

On a more nitpicky note, _is_timed_out is not the right spelling for a method; that'd be _isTimedOut. And I don't see the point of a leading underscore in _msg_counter.

Jeroen

review: Needs Information
Revision history for this message
Stuart Bishop (stub) wrote :

On Mon, Apr 5, 2010 at 7:29 PM, Jeroen T. Vermeulen <email address hidden> wrote:

> Nice to see loop tuning continue to get better.  One design question: your timeout check accompanies time.sleep calls.  Why not have:
>
>  * An attribute flag that says "I consider myself timed out" (even if it hasn't happened yet but is bound to happen because of impending sleep)
>
>  * A _checkTimeLeft(seconds=0) method that you call in various places to set the timeout flag if the requisite number of seconds isn't available
>
>  * A self.sleep method that checks for remaining time and time.sleeps only if the sleep ends before the deadline, so you don't have to check this for every time.sleep
>
>  * Instead of time checks, a check for the timeout flag.

I've added a self._sleep method to simplify the 'if not timed out
sleep' stanzas.

Having a flag in addition to the check method is less clear, and we
are only doing this a few times per second so it isn't a performance
issue.

> On a more nitpicky note, _is_timed_out is not the right spelling for a method; that'd be _isTimedOut.  And I don't see the point of a leading underscore in _msg_counter.

I've fixed both of these. And changed the 'extra' parameter to 'extra seconds'.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The reason I brought up an "I'm timed out" flag was concern that the tuner might decide to go straight back to work after a sleep was short-circuited because it would have timed out. But turning some "while True" loops into ones with timeout checks in the loop condition is better. Approved.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/canonical/launchpad/doc/looptuner.txt'
--- lib/canonical/launchpad/doc/looptuner.txt 2009-07-14 05:50:46 +0000
+++ lib/canonical/launchpad/doc/looptuner.txt 2010-04-07 16:01:48 +0000
@@ -72,7 +72,7 @@
72 ... self.clock = 072 ... self.clock = 0
73 ...73 ...
74 ... def isDone(self):74 ... def isDone(self):
75 ... done = (self.iteration > len(self.timings)+1)75 ... done = self.iteration >= len(self.timings)
76 ... if done:76 ... if done:
77 ... print "done"77 ... print "done"
78 ... return done78 ... return done
@@ -80,6 +80,8 @@
80 ... def __call__(self, chunk_size):80 ... def __call__(self, chunk_size):
81 ... print_change(self.last_chunk_size, chunk_size)81 ... print_change(self.last_chunk_size, chunk_size)
82 ... self.last_chunk_size = chunk_size82 ... self.last_chunk_size = chunk_size
83 ... self.clock += self.timings[self.iteration]
84 ... self.iteration += 1
8385
8486
85In combination with that, we tweak LoopTuner to simulate the timings we gave87In combination with that, we tweak LoopTuner to simulate the timings we gave
@@ -88,12 +90,7 @@
8890
89 >>> class TestTuner(LoopTuner):91 >>> class TestTuner(LoopTuner):
90 ... def _time(self):92 ... def _time(self):
91 ... clock_now = float(self.operation.clock)93 ... return float(self.operation.clock)
92 ... if self.operation.iteration < len(self.operation.timings):
93 ... next = self.operation.timings[self.operation.iteration]
94 ... self.operation.clock += next
95 ... self.operation.iteration += 1
96 ... return clock_now
9794
9895
99=== Trivial Case ===96=== Trivial Case ===
@@ -103,7 +100,6 @@
103 >>> body = PlannedLoop([])100 >>> body = PlannedLoop([])
104 >>> loop = TestTuner(body, goal_seconds, 100)101 >>> loop = TestTuner(body, goal_seconds, 100)
105 >>> loop.run()102 >>> loop.run()
106 start
107 done103 done
108104
109105
@@ -131,7 +127,6 @@
131 increased (0.0)127 increased (0.0)
132 same (0.0)128 same (0.0)
133 decreased (-0.0)129 decreased (-0.0)
134 same (0.0)
135 done130 done
136131
137132
@@ -152,7 +147,6 @@
152 same (0.0)147 same (0.0)
153 same (0.0)148 same (0.0)
154 same (0.0)149 same (0.0)
155 same (0.0)
156 done150 done
157151
158152
@@ -177,7 +171,6 @@
177 decreased (-0.1)171 decreased (-0.1)
178 increased (0.1)172 increased (0.1)
179 same (0.0)173 same (0.0)
180 same (0.0)
181 done174 done
182175
183176
184177
=== modified file 'lib/canonical/launchpad/utilities/looptuner.py'
--- lib/canonical/launchpad/utilities/looptuner.py 2010-03-25 12:22:13 +0000
+++ lib/canonical/launchpad/utilities/looptuner.py 2010-04-07 16:01:48 +0000
@@ -94,17 +94,21 @@
94 else:94 else:
95 self.log = log95 self.log = log
9696
97 def _isTimedOut(self, extra_seconds=0):
98 """Return True if the task will be timed out in extra_seconds."""
99 return self.abort_time is not None and (
100 self._time() + extra_seconds >= self.start_time + self.abort_time)
101
97 def run(self):102 def run(self):
98 """Run the loop to completion."""103 """Run the loop to completion."""
99 chunk_size = self.minimum_chunk_size104 chunk_size = self.minimum_chunk_size
100 iteration = 0105 iteration = 0
101 total_size = 0106 total_size = 0
102 start_time = self._time()107 self.start_time = self._time()
103 last_clock = start_time108 last_clock = self.start_time
104 while not self.operation.isDone():109 while not self.operation.isDone():
105110
106 if (self.abort_time is not None111 if self._isTimedOut():
107 and last_clock >= start_time + self.abort_time):
108 self.log.warn(112 self.log.warn(
109 "Task aborted after %d seconds." % self.abort_time)113 "Task aborted after %d seconds." % self.abort_time)
110 break114 break
@@ -137,7 +141,7 @@
137 chunk_size = min(chunk_size, self.maximum_chunk_size)141 chunk_size = min(chunk_size, self.maximum_chunk_size)
138 iteration += 1142 iteration += 1
139143
140 total_time = last_clock - start_time144 total_time = last_clock - self.start_time
141 average_size = total_size/max(1, iteration)145 average_size = total_size/max(1, iteration)
142 average_speed = total_size/max(1, total_time)146 average_speed = total_size/max(1, total_time)
143 self.log.debug(147 self.log.debug(
@@ -161,7 +165,7 @@
161 if self.cooldown_time is None or self.cooldown_time <= 0.0:165 if self.cooldown_time is None or self.cooldown_time <= 0.0:
162 return bedtime166 return bedtime
163 else:167 else:
164 time.sleep(self.cooldown_time)168 self._sleep(self.cooldown_time)
165 return self._time()169 return self._time()
166170
167 def _time(self):171 def _time(self):
@@ -172,6 +176,15 @@
172 """176 """
173 return time.time()177 return time.time()
174178
179 def _sleep(self, seconds):
180 """Sleep.
181
182 If the sleep interval would put us over the tasks timeout,
183 do nothing.
184 """
185 if not self._isTimedOut(self.cooldown_time):
186 time.sleep(self.cooldown_time)
187
175188
176def timedelta_to_seconds(td):189def timedelta_to_seconds(td):
177 return 24 * 60 * td.days + td.seconds190 return 24 * 60 * td.days + td.seconds
@@ -208,7 +221,7 @@
208 """When database replication lag is high, block until it drops."""221 """When database replication lag is high, block until it drops."""
209 # Lag is most meaningful on the master.222 # Lag is most meaningful on the master.
210 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)223 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
211 while True:224 while not self._isTimedOut():
212 lag = store.execute("SELECT replication_lag()").get_one()[0]225 lag = store.execute("SELECT replication_lag()").get_one()[0]
213 if lag is None:226 if lag is None:
214 lag = timedelta(seconds=0)227 lag = timedelta(seconds=0)
@@ -225,7 +238,7 @@
225 % time_to_sleep)238 % time_to_sleep)
226239
227 transaction.abort() # Don't become a long running transaction!240 transaction.abort() # Don't become a long running transaction!
228 time.sleep(time_to_sleep)241 self._sleep(time_to_sleep)
229242
230 def _blockForLongRunningTransactions(self):243 def _blockForLongRunningTransactions(self):
231 """If there are long running transactions, block to avoid making244 """If there are long running transactions, block to avoid making
@@ -233,7 +246,8 @@
233 if self.long_running_transaction is None:246 if self.long_running_transaction is None:
234 return247 return
235 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)248 store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
236 while True:249 msg_counter = 0
250 while not self._isTimedOut():
237 results = list(store.execute("""251 results = list(store.execute("""
238 SELECT252 SELECT
239 CURRENT_TIMESTAMP - xact_start,253 CURRENT_TIMESTAMP - xact_start,
@@ -247,13 +261,18 @@
247 """ % self.long_running_transaction).get_all())261 """ % self.long_running_transaction).get_all())
248 if not results:262 if not results:
249 break263 break
250 for runtime, procpid, usename, datname, query in results:264
251 self.log.info(265 # Check for long running transactions every 30 seconds, but
252 "Blocked on %s old xact %s@%s/%d - %s."266 # only report every 10 minutes to avoid log spam.
253 % (runtime, usename, datname, procpid, query))267 msg_counter += 1
254 self.log.info("Sleeping for 5 minutes.")268 if msg_counter % 20 == 1:
269 for runtime, procpid, usename, datname, query in results:
270 self.log.info(
271 "Blocked on %s old xact %s@%s/%d - %s."
272 % (runtime, usename, datname, procpid, query))
273 self.log.info("Sleeping for up to 10 minutes.")
255 transaction.abort() # Don't become a long running transaction!274 transaction.abort() # Don't become a long running transaction!
256 time.sleep(5*60)275 self._sleep(30)
257276
258 def _coolDown(self, bedtime):277 def _coolDown(self, bedtime):
259 """As per LoopTuner._coolDown, except we always wait until there278 """As per LoopTuner._coolDown, except we always wait until there
@@ -263,7 +282,7 @@
263 self._blockWhenLagged()282 self._blockWhenLagged()
264 if self.cooldown_time is not None and self.cooldown_time > 0.0:283 if self.cooldown_time is not None and self.cooldown_time > 0.0:
265 remaining_nap = self._time() - bedtime + self.cooldown_time284 remaining_nap = self._time() - bedtime + self.cooldown_time
266 if remaining_nap > 0.0:285 if remaining_nap > 0.0 and not self._is_timed_out(remaining_nap):
267 time.sleep(remaining_nap)286 time.sleep(remaining_nap)
268 return self._time()287 return self._time()
269288