Merge lp:~flacoste/launchpad/ppr-constant-memory into lp:launchpad

Proposed by Francis J. Lacoste
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 11795
Proposed branch: lp:~flacoste/launchpad/ppr-constant-memory
Merge into: lp:launchpad
Diff against target: 628 lines (+246/-169)
1 file modified
lib/lp/scripts/utilities/pageperformancereport.py (+246/-169)
To merge this branch: bzr merge lp:~flacoste/launchpad/ppr-constant-memory
Reviewer                      Review Type    Date Requested    Status
Robert Collins (community)                                     Approve
Review via email: mp+39324@code.launchpad.net

Commit message

Refactor page-performance-report to use less memory by using a SQLite3 db to hold the requests and generating statistics for only one key at a time.

Description of the change

This branch changes the algorithm used by the Page Performance Report in
order to reduce its memory usage.

The current algorithm builds the statistics entirely in memory as it parses
the logs. This uses a great amount of memory because it maintains an array of
request times in memory for every key (category, page id, URL) it reports on.
It currently fails to generate any weekly or monthly reports and has trouble
with some daily reports too.

The new algorithm parses all the logs into a SQLite3 database and then
generates statistics for one key at a time. It still does the statistics
computation in memory, so the amount of memory still grows linearly with the
number of requests: the 'all' category requires an array holding every
request time.
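
In outline, the per-key pass works like this (a condensed, self-contained
sketch of the get_times() loop from the diff below; the table layout matches
the branch, the sample data is illustrative only):

    import sqlite3

    con = sqlite3.connect(':memory:')
    con.execute('CREATE TABLE category_request (category INTEGER, '
                'time REAL, sql_statements INTEGER, sql_time REAL)')
    con.executemany('INSERT INTO category_request VALUES (?,?,?,?)',
                    [(0, 1.5, 10, 0.4), (0, 2.0, -1, -1), (1, 0.3, 2, 0.1)])

    def times_per_key(cur):
        # Rows arrive sorted by key, so only one key's times are held in
        # memory at a time; the report builds a Stats object per group.
        current_key, times = None, []
        for row in cur:
            if row[0] != current_key:
                if current_key is not None:
                    yield current_key, times
                current_key, times = row[0], []
            times.append(row[1:])
        if current_key is not None:
            yield current_key, times

    cur = con.execute('SELECT * FROM category_request ORDER BY category')
    groups = list(times_per_key(cur))  # [(0, [(1.5, 10, 0.4), ...]), (1, ...)]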

Other changes:

* I've dropped the variance column from the report. We already include the
standard deviation, which is its square root and more useful anyway.

* I've used numpy.clip to cap the input to the histogram instead of doing it
in pure Python with a comprehension (see the sketch below).
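
For example, both forms produce the same capped input to numpy.histogram; the
values and width below are made up for the sketch:

    import numpy

    times = numpy.asarray([0.5, 3.2, 45.0, 7.1], dtype=numpy.float32)
    histogram_width = 18  # int(timeout * 1.5) with the default 12s timeout

    # Before: cap each value in Python and rebuild the array.
    capped = numpy.fromiter(
        (min(t, histogram_width) for t in times), numpy.float32, len(times))

    # After: a single vectorized call.
    capped = numpy.clip(times, 0, histogram_width)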

Locally, on a 300,000-request file, here is the performance diff:

              Old    New
    User time 1m33   1m52
    Sys time  0m1.6  0m5
    RSS       483M   229M

QA

I've compared the reports generated using the old algorithm with the new one,
and the reports are identical (apart from the removed column).

On sodium, I've been able to generate the problematic daily reports. Memory
peaked at 2.2G for 4 million requests. I'm still not sure that the weekly and
monthly reports can be computed. Trying that now.

Revision history for this message
Francis J. Lacoste (flacoste) wrote :

As far as stats go, I forgot to report that the SQLite3 DB size was 55M for
300,000 requests and 776M for 4.1M.

Revision history for this message
Robert Collins (lifeless) :
review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

Seems plausible; it might be better not to put the time and SQL time in the same table.

If you used different tables, you could avoid all the masking stuff entirely.
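
For illustration, a hypothetical split schema along those lines (not part of
this branch): a request with no SQL activity would simply have no row in the
SQL table, so the -1 sentinels and numpy.ma masking would not be needed:

    import sqlite3

    con = sqlite3.connect(':memory:')
    # One table per kind of measurement instead of -1 placeholders.
    con.execute('CREATE TABLE category_time (category INTEGER, time REAL)')
    con.execute('CREATE TABLE category_sql (category INTEGER, '
                'sql_statements INTEGER, sql_time REAL)')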

Revision history for this message
Francis J. Lacoste (flacoste) wrote :

Yeah, computing one statistic at a time would also reduce the peak amount of memory used, at the cost of more processing time. I'll see how it goes for the weekly and monthly reports and assess whether another round is needed.
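
A hypothetical sketch of that trade-off, reusing the branch's table layout:
one SELECT per column keeps a single array in memory at a time, at the cost
of re-reading the table for each statistic:

    import numpy

    def single_column_stats(con, column, category):
        # Hypothetical helper, not in the branch: fetch one column only.
        cur = con.execute(
            'SELECT %s FROM category_request WHERE category = ?' % column,
            (category,))
        values = numpy.asarray(
            [row[0] for row in cur], dtype=numpy.float32)
        return values.mean(), numpy.median(values), values.std()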

Preview Diff

=== modified file 'lib/lp/scripts/utilities/pageperformancereport.py'
--- lib/lp/scripts/utilities/pageperformancereport.py 2010-08-20 20:31:18 +0000
+++ lib/lp/scripts/utilities/pageperformancereport.py 2010-10-25 21:49:08 +0000
@@ -13,7 +13,10 @@
 import re
 import subprocess
 from textwrap import dedent
+import sqlite3
+import tempfile
 import time
+import warnings
 
 import numpy
 import simplejson as json
@@ -24,6 +27,9 @@
 from canonical.launchpad.scripts.logger import log
 from lp.scripts.helpers import LPOptionParser
 
+# We don't care about conversion to nan, they are expected.
+warnings.filterwarnings(
+    'ignore', '.*converting a masked element to nan.', UserWarning)
 
 class Request(zc.zservertracelog.tracereport.Request):
     url = None
@@ -52,19 +58,14 @@
 
     Requests belong to a Category if the URL matches a regular expression.
     """
-    def __init__(self, title, regexp, timeout):
+    def __init__(self, title, regexp):
         self.title = title
         self.regexp = regexp
         self._compiled_regexp = re.compile(regexp, re.I | re.X)
-        self.times = Times(timeout)
-
-    def add(self, request):
-        """Add a request to a Category if it belongs.
-
-        Does nothing if the request does not belong in this Category.
-        """
-        if self._compiled_regexp.search(request.url) is not None:
-            self.times.add(request)
+
+    def match(self, request):
+        """Return true when the request match this category."""
+        return self._compiled_regexp.search(request.url) is not None
 
     def __cmp__(self, other):
         return cmp(self.title.lower(), other.title.lower())
@@ -81,7 +82,6 @@
     mean = 0 # Mean time per hit.
     median = 0 # Median time per hit.
     std = 0 # Standard deviation per hit.
-    var = 0 # Variance per hit.
     ninetyninth_percentile_time = 0
     histogram = None # # Request times histogram.
 
@@ -89,46 +89,16 @@
     mean_sqltime = 0 # Mean time spend waiting for SQL to process.
     median_sqltime = 0 # Median time spend waiting for SQL to process.
     std_sqltime = 0 # Standard deviation of SQL time.
-    var_sqltime = 0 # Variance of SQL time
 
     total_sqlstatements = 0 # Total number of SQL statements issued.
     mean_sqlstatements = 0
     median_sqlstatements = 0
     std_sqlstatements = 0
-    var_sqlstatements = 0
-
-empty_stats = Stats() # Singleton.
-
-
-class Times:
-    """Collection of request times."""
-    def __init__(self, timeout):
-        self.total_hits = 0
-        self.total_time = 0
-        self.request_times = []
-        self.sql_statements = []
-        self.sql_times = []
-        self.ticks = []
-        self.histogram_width = int(1.5*timeout)
-
-    def add(self, request):
-        """Add the application time from the request to the collection."""
-        self.total_hits += 1
-        self.total_time += request.app_seconds
-        self.request_times.append(request.app_seconds)
-        if request.sql_statements is not None:
-            self.sql_statements.append(request.sql_statements)
-        if request.sql_seconds is not None:
-            self.sql_times.append(request.sql_seconds)
-        if request.ticks is not None:
-            self.ticks.append(request.ticks)
-
-    _stats = None
-
-    def stats(self):
-        """Generate statistics about our request times.
-
-        Returns a `Stats` instance.
-
+
+    def __init__(self, times, timeout):
+        """Compute the stats based on times.
+
+        Times is a list of (app_time, sql_statements, sql_times).
+
         The histogram is a list of request counts per 1 second bucket.
         ie. histogram[0] contains the number of requests taking between 0 and
@@ -136,67 +106,201 @@
         1 and 2 seconds etc. histogram is None if there are no requests in
         this Category.
         """
-        if not self.total_hits:
-            return empty_stats
+        if not times:
+            return
 
-        if self._stats is not None:
-            return self._stats
-
-        stats = Stats()
-
-        stats.total_hits = self.total_hits
-
-        # Time stats
-        array = numpy.asarray(self.request_times, numpy.float32)
-        stats.total_time = numpy.sum(array)
-        stats.mean = numpy.mean(array)
-        stats.median = numpy.median(array)
-        stats.std = numpy.std(array)
-        stats.var = numpy.var(array)
+        self.total_hits = len(times)
+
+        # Ignore missing values (-1) in computation.
+        times_array = numpy.ma.masked_values(
+            numpy.asarray(times, dtype=numpy.float32), -1.)
+
+        self.total_time, self.total_sqlstatements, self.total_sqltime = (
+            times_array.sum(axis=0))
+
+        self.mean, self.mean_sqlstatements, self.mean_sqltime = (
+            times_array.mean(axis=0))
+
+        self.median, self.median_sqlstatements, self.median_sqltime = (
+            numpy.median(times_array, axis=0))
+
+        self.std, self.std_sqlstatements, self.std_sqltime = (
+            numpy.std(times_array, axis=0))
+
         # This is an approximation which may not be true: we don't know if we
         # have a std distribution or not. We could just find the 99th
         # percentile by counting. Shock. Horror; however this appears pretty
         # good based on eyeballing things so far - once we're down in the 2-3
         # second range for everything we may want to revisit.
-        stats.ninetyninth_percentile_time = stats.mean + stats.std*3
-        capped_times = (min(a_time, self.histogram_width) for a_time in
-            self.request_times)
-        array = numpy.fromiter(capped_times, numpy.float32,
-            len(self.request_times))
+        self.ninetyninth_percentile_time = self.mean + self.std*3
+
+        histogram_width = int(timeout*1.5)
+        histogram_times = numpy.clip(times_array[:,0], 0, histogram_width)
         histogram = numpy.histogram(
-            array, normed=True,
-            range=(0, self.histogram_width), bins=self.histogram_width)
-        stats.histogram = zip(histogram[1], histogram[0])
+            histogram_times, normed=True, range=(0, histogram_width),
+            bins=histogram_width)
+        self.histogram = zip(histogram[1], histogram[0])
 
-        # SQL time stats.
-        array = numpy.asarray(self.sql_times, numpy.float32)
-        stats.total_sqltime = numpy.sum(array)
-        stats.mean_sqltime = numpy.mean(array)
-        stats.median_sqltime = numpy.median(array)
-        stats.std_sqltime = numpy.std(array)
-        stats.var_sqltime = numpy.var(array)
-
-        # SQL query count.
-        array = numpy.asarray(self.sql_statements, numpy.int)
-        stats.total_sqlstatements = int(numpy.sum(array))
-        stats.mean_sqlstatements = numpy.mean(array)
-        stats.median_sqlstatements = numpy.median(array)
-        stats.std_sqlstatements = numpy.std(array)
-        stats.var_sqlstatements = numpy.var(array)
+
+class SQLiteRequestTimes:
+    """SQLite-based request times computation."""
+
+    def __init__(self, categories, options):
+        if options.db_file is None:
+            fd, self.filename = tempfile.mkstemp(suffix='.db', prefix='ppr')
+            os.close(fd)
+        else:
+            self.filename = options.db_file
+        self.con = sqlite3.connect(self.filename, isolation_level='EXCLUSIVE')
+        log.debug('Using request database %s' % self.filename)
+        # Some speed optimization.
+        self.con.execute('PRAGMA synchronous = off')
+        self.con.execute('PRAGMA journal_mode = off')
 
-        # Cache for next invocation.
-        self._stats = stats
-        return stats
-
-    def __str__(self):
-        results = self.stats()
-        total, mean, median, std, histogram = results
-        hstr = " ".join("%2d" % v for v in histogram)
-        return "%2.2f %2.2f %2.2f %s" % (
-            total, mean, median, std, hstr)
-
-    def __cmp__(self, b):
-        return cmp(self.total_time, b.total_time)
+        self.categories = categories
+        self.store_all_request = options.pageids or options.top_urls
+        self.timeout = options.timeout
+        self.cur = self.con.cursor()
+
+        # Create the tables, ignore errors about them being already present.
+        try:
+            self.cur.execute('''
+                CREATE TABLE category_request (
+                    category INTEGER,
+                    time REAL,
+                    sql_statements INTEGER,
+                    sql_time REAL)
+                ''');
+        except sqlite3.OperationalError, e:
+            if 'already exists' in str(e):
+                pass
+            else:
+                raise
+
+        if self.store_all_request:
+            try:
+                self.cur.execute('''
+                    CREATE TABLE request (
+                        pageid TEXT,
+                        url TEXT,
+                        time REAL,
+                        sql_statements INTEGER,
+                        sql_time REAL)
+                    ''');
+            except sqlite3.OperationalError, e:
+                if 'already exists' in str(e):
+                    pass
+                else:
+                    raise
+
+    def add_request(self, request):
+        """Add a request to the cache."""
+        sql_statements = request.sql_statements
+        sql_seconds = request.sql_seconds
+
+        # Store missing value as -1, as it makes dealing with those
+        # easier with numpy.
+        if sql_statements is None:
+            sql_statements = -1
+        if sql_seconds is None:
+            sql_seconds = -1
+        for idx, category in enumerate(self.categories):
+            if category.match(request):
+                self.con.execute(
+                    "INSERT INTO category_request VALUES (?,?,?,?)",
+                    (idx, request.app_seconds, sql_statements, sql_seconds))
+
+        if self.store_all_request:
+            pageid = request.pageid or 'Unknown'
+            self.con.execute(
+                "INSERT INTO request VALUES (?,?,?,?,?)",
+                (pageid, request.url, request.app_seconds, sql_statements,
+                    sql_seconds))
+
+    def commit(self):
+        """Call commit on the underlying connection."""
+        self.con.commit()
+
+    def get_category_times(self):
+        """Return the times for each category."""
+        category_query = 'SELECT * FROM category_request ORDER BY category'
+
+        empty_stats = Stats([], 0)
+        categories = dict(self.get_times(category_query))
+        return [
+            (category, categories.get(idx, empty_stats))
+            for idx, category in enumerate(self.categories)]
+
+    def get_top_urls_times(self, top_n):
+        """Return the times for the Top URL by total time"""
+        top_url_query = '''
+            SELECT url, time, sql_statements, sql_time
+            FROM request WHERE url IN (
+                SELECT url FROM (SELECT url, sum(time) FROM request
+                    GROUP BY url
+                    ORDER BY sum(time) DESC
+                    LIMIT %d))
+            ORDER BY url
+        ''' % top_n
+        # Sort the result by total time
+        return sorted(
+            self.get_times(top_url_query), key=lambda x: x[1].total_time,
+            reverse=True)
+
+    def get_pageid_times(self):
+        """Return the times for the pageids."""
+        pageid_query = '''
+            SELECT pageid, time, sql_statements, sql_time
+            FROM request
+            ORDER BY pageid
+        '''
+        return self.get_times(pageid_query)
+
+    def get_times(self, query):
+        """Return a list of key, stats based on the query.
+
+        The query should return rows of the form:
+        [key, app_time, sql_statements, sql_times]
+
+        And should be sorted on key.
+        """
+        times = []
+        current_key = None
+        results = []
+        self.cur.execute(query)
+        while True:
+            rows = self.cur.fetchmany()
+            if len(rows) == 0:
+                break
+            for row in rows:
+                # We are encountering a new group...
+                if row[0] != current_key:
+                    # Compute the stats of the previous group
+                    if current_key != None:
+                        results.append(
+                            (current_key, Stats(times, self.timeout)))
+                    # Initialize the new group.
+                    current_key = row[0]
+                    times = []
+
+                times.append(row[1:])
+        # Compute the stats of the last group
+        if current_key != None:
+            results.append((current_key, Stats(times, self.timeout)))
+
+        return results
+
+    def close(self, remove=False):
+        """Close the SQLite connection.
+
+        :param remove: If true, the DB file will be removed.
+        """
+        self.con.close()
+        if remove:
+            log.debug('Deleting request database.')
+            os.unlink(self.filename)
+        else:
+            log.debug('Keeping request database %s.' % self.filename)
 
 
 def main():
@@ -235,13 +339,17 @@
         # Default to 12: the staging timeout.
         default=12, type="int",
         help="The configured timeout value : determines high risk page ids.")
+    parser.add_option(
+        "--db-file", dest="db_file",
+        default=None, metavar="FILE",
+        help="Do not parse the records, generate reports from the DB file.")
 
     options, args = parser.parse_args()
 
     if not os.path.isdir(options.directory):
         parser.error("Directory %s does not exist" % options.directory)
 
-    if len(args) == 0:
+    if len(args) == 0 and options.db_file is None:
         parser.error("At least one zserver tracelog file must be provided")
 
     if options.from_ts is not None and options.until_ts is not None:
@@ -266,7 +374,7 @@
     for option in script_config.options('categories'):
         regexp = script_config.get('categories', option)
         try:
-            categories.append(Category(option, regexp, options.timeout))
+            categories.append(Category(option, regexp))
         except sre_constants.error, x:
             log.fatal("Unable to compile regexp %r (%s)" % (regexp, x))
             return 1
@@ -275,18 +383,23 @@
     if len(categories) == 0:
         parser.error("No data in [categories] section of configuration.")
 
-    pageid_times = {}
-    url_times = {}
-
-    parse(args, categories, pageid_times, url_times, options)
-
-    # Truncate the URL times to the top N.
+    times = SQLiteRequestTimes(categories, options)
+
+    if len(args) > 0:
+        parse(args, times, options)
+    times.commit()
+
+    log.debug('Generating category statistics...')
+    category_times = times.get_category_times()
+
+    pageid_times = []
+    url_times= []
     if options.top_urls:
-        sorted_urls = sorted(
-            ((times, url) for url, times in url_times.items()
-             if times.total_hits > 0), reverse=True)
-        url_times = [(url, times)
-                     for times, url in sorted_urls[:options.top_urls]]
+        log.debug('Generating top %d urls statistics...' % options.top_urls)
+        url_times = times.get_top_urls_times(options.top_urls)
+    if options.pageids:
+        log.debug('Generating pageid statistics...')
+        pageid_times = times.get_pageid_times()
 
     def _report_filename(filename):
         return os.path.join(options.directory, filename)
@@ -295,7 +408,7 @@
     if options.categories:
         report_filename = _report_filename('categories.html')
         log.info("Generating %s", report_filename)
-        html_report(open(report_filename, 'w'), categories, None, None)
+        html_report(open(report_filename, 'w'), category_times, None, None)
 
     # Pageid only report.
     if options.pageids:
@@ -313,7 +426,8 @@
     if options.categories and options.pageids:
         report_filename = _report_filename('combined.html')
         html_report(
-            open(report_filename, 'w'), categories, pageid_times, url_times)
+            open(report_filename, 'w'),
+            category_times, pageid_times, url_times)
 
     # Report of likely timeout candidates
     report_filename = _report_filename('timeout-candidates.html')
@@ -322,6 +436,7 @@
         open(report_filename, 'w'), None, pageid_times, None,
         options.timeout - 2)
 
+    times.close(options.db_file is None)
     return 0
 
 
@@ -363,7 +478,7 @@
         *(int(elem) for elem in match.groups() if elem is not None))
 
 
-def parse(tracefiles, categories, pageid_times, url_times, options):
+def parse(tracefiles, times, options):
     requests = {}
     total_requests = 0
     for tracefile in tracefiles:
@@ -444,35 +559,7 @@
                         log.debug("Parsed %d requests", total_requests)
 
                     # Add the request to any matching categories.
-                    if options.categories:
-                        for category in categories:
-                            category.add(request)
-
-                    # Add the request to the times for that pageid.
-                    if options.pageids:
-                        pageid = request.pageid
-                        try:
-                            times = pageid_times[pageid]
-                        except KeyError:
-                            times = Times(options.timeout)
-                            pageid_times[pageid] = times
-                        times.add(request)
-
-                    # Add the request to the times for that URL.
-                    if options.top_urls:
-                        url = request.url
-                        # Hack to remove opstats from top N report. This
-                        # should go into a config file if we end up with
-                        # more pages that need to be ignored because
-                        # they are just noise.
-                        if not (url is None or url.endswith('+opstats')):
-                            try:
-                                times = url_times[url]
-                            except KeyError:
-                                times = Times(options.timeout)
-                                url_times[url] = times
-                            times.add(request)
-
+                    times.add_request(request)
                 else:
                     raise MalformedLine('Unknown record type %s', record_type)
             except MalformedLine, x:
@@ -491,7 +578,6 @@
     elif prefix == 't':
         if len(args) != 4:
             raise MalformedLine("Wrong number of arguments %s" % (args,))
-        request.ticks = int(args[1])
         request.sql_statements = int(args[2])
         request.sql_seconds = float(args[3]) / 1000
     else:
@@ -500,12 +586,12 @@
 
 
 def html_report(
-    outf, categories, pageid_times, url_times,
+    outf, category_times, pageid_times, url_times,
     ninetyninth_percentile_threshold=None):
     """Write an html report to outf.
 
     :param outf: A file object to write the report to.
-    :param categories: Categories to report.
+    :param category_times: The time statistics for categories.
     :param pageid_times: The time statistics for pageids.
     :param url_times: The time statistics for the top XXX urls.
     :param ninetyninth_percentile_threshold: Lower threshold for inclusion of
@@ -575,20 +661,17 @@
 
     <th class="clickable">Mean Time (secs)</th>
     <th class="clickable">Time Standard Deviation</th>
-    <th class="clickable">Time Variance</th>
     <th class="clickable">Median Time (secs)</th>
     <th class="sorttable_nosort">Time Distribution</th>
 
     <th class="clickable">Total SQL Time (secs)</th>
     <th class="clickable">Mean SQL Time (secs)</th>
     <th class="clickable">SQL Time Standard Deviation</th>
-    <th class="clickable">SQL Time Variance</th>
     <th class="clickable">Median SQL Time (secs)</th>
 
     <th class="clickable">Total SQL Statements</th>
     <th class="clickable">Mean SQL Statements</th>
     <th class="clickable">SQL Statement Standard Deviation</th>
-    <th class="clickable">SQL Statement Variance</th>
     <th class="clickable">Median SQL Statements</th>
 
     </tr>
@@ -600,8 +683,7 @@
     # Store our generated histograms to output Javascript later.
     histograms = []
 
-    def handle_times(html_title, times):
-        stats = times.stats()
+    def handle_times(html_title, stats):
         histograms.append(stats.histogram)
         print >> outf, dedent("""\
             <tr>
@@ -611,7 +693,6 @@
             <td class="numeric 99pc_under">%.2f</td>
             <td class="numeric mean_time">%.2f</td>
             <td class="numeric std_time">%.2f</td>
-            <td class="numeric var_time">%.2f</td>
             <td class="numeric median_time">%.2f</td>
             <td>
                 <div class="histogram" id="histogram%d"></div>
@@ -619,30 +700,27 @@
             <td class="numeric total_sqltime">%.2f</td>
             <td class="numeric mean_sqltime">%.2f</td>
             <td class="numeric std_sqltime">%.2f</td>
-            <td class="numeric var_sqltime">%.2f</td>
             <td class="numeric median_sqltime">%.2f</td>
 
-            <td class="numeric total_sqlstatements">%d</td>
+            <td class="numeric total_sqlstatements">%.f</td>
             <td class="numeric mean_sqlstatements">%.2f</td>
             <td class="numeric std_sqlstatements">%.2f</td>
-            <td class="numeric var_sqlstatements">%.2f</td>
             <td class="numeric median_sqlstatements">%.2f</td>
             </tr>
         """ % (
             html_title,
             stats.total_hits, stats.total_time,
             stats.ninetyninth_percentile_time,
-            stats.mean, stats.std, stats.var, stats.median,
+            stats.mean, stats.std, stats.median,
             len(histograms) - 1,
             stats.total_sqltime, stats.mean_sqltime,
-            stats.std_sqltime, stats.var_sqltime, stats.median_sqltime,
+            stats.std_sqltime, stats.median_sqltime,
             stats.total_sqlstatements, stats.mean_sqlstatements,
-            stats.std_sqlstatements, stats.var_sqlstatements,
-            stats.median_sqlstatements))
+            stats.std_sqlstatements, stats.median_sqlstatements))
 
     # Table of contents
     print >> outf, '<ol>'
-    if categories:
+    if category_times:
         print >> outf, '<li><a href="#catrep">Category Report</a></li>'
     if pageid_times:
         print >> outf, '<li><a href="#pageidrep">Pageid Report</a></li>'
@@ -650,22 +728,21 @@
         print >> outf, '<li><a href="#topurlrep">Top URL Report</a></li>'
     print >> outf, '</ol>'
 
-    if categories:
+    if category_times:
         print >> outf, '<h2 id="catrep">Category Report</h2>'
         print >> outf, table_header
-        for category in categories:
+        for category, times in category_times:
             html_title = '%s<br/><span class="regexp">%s</span>' % (
                 html_quote(category.title), html_quote(category.regexp))
-            handle_times(html_title, category.times)
+            handle_times(html_title, times)
         print >> outf, table_footer
 
     if pageid_times:
         print >> outf, '<h2 id="pageidrep">Pageid Report</h2>'
         print >> outf, table_header
-        for pageid, times in sorted(pageid_times.items()):
-            pageid = pageid or 'None'
+        for pageid, times in pageid_times:
             if (ninetyninth_percentile_threshold is not None and
                 (times.ninetyninth_percentile_time <
                  ninetyninth_percentile_threshold)):
                 continue
             handle_times(html_quote(pageid), times)