Merge lp:~spiv/launchpad/codebrowse-oops-84838 into lp:launchpad

Proposed by Andrew Bennetts
Status: Superseded
Proposed branch: lp:~spiv/launchpad/codebrowse-oops-84838
Merge into: lp:launchpad
Prerequisite: lp:~spiv/launchpad/loggerhead-logging
Diff against target: 210 lines (+148/-1)
4 files modified
configs/development/launchpad-lazr.conf (+3/-0)
lib/canonical/config/schema-lazr.conf (+9/-0)
lib/launchpad_loggerhead/app.py (+134/-0)
scripts/start-loggerhead.py (+2/-1)
To merge this branch: bzr merge lp:~spiv/launchpad/codebrowse-oops-84838
Reviewer Review Type Date Requested Status
Andrew Bennetts (community) Needs Resubmitting
Michael Hudson-Doyle Needs Fixing
Review via email: mp+31007@code.launchpad.net

This proposal has been superseded by a proposal from 2010-07-27.

Description of the change

This fixes a 5-digit bug number. :)

This fixes codebrowse to log tracebacks from failed requests as OOPSes, rather than dumping them in debug.log. This is a fairly basic implementation: it shows very elementary HTML to the user, and it doesn't capture any information in the OOPS beyond the traceback and the URL... but even those flaws are improvements over the status quo, and are clearly marked with XXXs.

The only outstanding issue I know of is finding out what the production config values for this should be, particularly the error_dir. And of course arranging for the production OOPSes to get synced and reported on like OOPS reports from other systems.

This is based on an already-approved (but as yet unmerged) branch that makes a more modest tweak to codebrowse's logging.

To post a comment you must log in.
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

1) Yay!
2) I think it would be better to have some of the state around 'really-called' etc in an object
3) Unless I'm misreading/misremembering, the wsgi protocol is that the application can return an _iterable_, not necessarily an _iterator_, so I think you need to say "app_iter = iter(app(...))".
4) The oops page should still be a "500 Internal Server Error" at the HTTP level.
5) Maybe the oops template rendering can be factored out?
6) Yay!, again

review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote :

Ok, I think I've addressed all your comments, plus a couple of nits you didn't notice :)

It still appears to work after the changes, so that's a plus ;)

Please re-review.

review: Needs Resubmitting

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'configs/development/launchpad-lazr.conf'
2--- configs/development/launchpad-lazr.conf 2010-07-25 05:16:58 +0000
3+++ configs/development/launchpad-lazr.conf 2010-07-27 07:29:59 +0000
4@@ -52,6 +52,9 @@
5 log_folder: /var/tmp/codebrowse.launchpad.dev/logs
6 launchpad_root: https://code.launchpad.dev/
7 secret_path: configs/development/codebrowse-secret
8+error_dir: /var/tmp/codebrowse.launchpad.dev/errors
9+oops_prefix: CB
10+copy_to_zlog: false
11
12 [codehosting]
13 launch: True
14
15=== modified file 'lib/canonical/config/schema-lazr.conf'
16--- lib/canonical/config/schema-lazr.conf 2010-07-26 09:23:26 +0000
17+++ lib/canonical/config/schema-lazr.conf 2010-07-27 07:29:59 +0000
18@@ -283,6 +283,15 @@
19 # datatype: string
20 secret_path:
21
22+# See [error_reports].
23+copy_to_zlog: False
24+
25+# See [error_reports].
26+error_dir: none
27+
28+# See [error_reports].
29+oops_prefix: CB
30+
31
32 [codehosting]
33 # datatype: string
34
35=== modified file 'lib/launchpad_loggerhead/app.py'
36--- lib/launchpad_loggerhead/app.py 2010-05-15 17:43:59 +0000
37+++ lib/launchpad_loggerhead/app.py 2010-07-27 07:29:59 +0000
38@@ -3,6 +3,7 @@
39
40 import logging
41 import os
42+import sys
43 import threading
44 import urllib
45 import urlparse
46@@ -26,6 +27,8 @@
47 from canonical.config import config
48 from canonical.launchpad.xmlrpc import faults
49 from canonical.launchpad.webapp.vhosts import allvhosts
50+from canonical.launchpad.webapp.errorlog import (
51+ ErrorReportingUtility, ScriptRequest)
52 from lp.code.interfaces.codehosting import (
53 BRANCH_TRANSPORT, LAUNCHPAD_ANONYMOUS)
54 from lp.codehosting.vfs import get_lp_server
55@@ -227,3 +230,134 @@
56 bzr_branch.unlock()
57 finally:
58 lp_server.stop_server()
59+
60+
61+def make_oops_logging_exception_hook(error_utility, request):
62+ """Make a hook for logging OOPSes."""
63+ def log_oops():
64+ error_utility.raising(sys.exc_info(), request)
65+ return log_oops
66+
67+
68+def make_error_utility():
69+ """Make an error utility for logging errors from codebrowse."""
70+ error_utility = ErrorReportingUtility()
71+ error_utility.configure('codebrowse')
72+ return error_utility
73+
74+
75+# XXX: This HTML template should be replaced with the same one that lpnet uses
76+# for reporting OOPSes to users, or at least something that looks similar. But
77+# even this is better than the "Internal Server Error" you'd get otherwise.
78+# - Andrew Bennetts, 2010-07-27.
79+_oops_html_template = '''\
80+<html>
81+<head><title>Oops! %(oopsid)s</title></head>
82+<body>
83+<h1>Oops!</h1>
84+<p>Something broke while generating the page.
85+Please try again in a few minutes, and if the problem persists file a bug at
86+<a href="https://bugs.launchpad.net/launchpad-code"
87+>https://bugs.launchpad.net/launchpad-code</a>
88+and quote OOPS-ID <strong>%(oopsid)s</strong>
89+</p></body></html>'''
90+
91+
92+_error_status = '500 Internal Server Error'
93+_error_headers = [('Content-Type', 'text/html')]
94+
95+
96+class WrappedStartResponse(object):
97+ """Wraps start_response (from a WSGI request) to keep track of whether
98+ start_response was called (and whether the callable it returns has been
99+ called).
100+
101+ Used by oops_middleware.
102+ """
103+
104+ def __init__(self, start_response):
105+ self._real_start_response = start_response
106+ self.response_start = None
107+ self._write_callable = None
108+
109+ @property
110+ def body_started(self):
111+ return self._write_callable is not None
112+
113+ def start_response(self, status, headers, exc_info=None):
114+ # Just keep a note of the values we were called with for now. We don't
115+ # need to invoke the real start_response until the response body
116+ # starts.
117+ self.response_start = (status, headers)
118+ if exc_info is not None:
119+ self.response_start += (exc_info,)
120+ return self.write_wrapper
121+
122+ def really_start(self):
123+ self._write_callable = self._real_start_response(*self.response_start)
124+
125+ def write_wrapper(self, data):
126+ if self._write_callable is None:
127+ self.really_start()
128+ self._write_callable(data)
129+
130+ def generate_oops(self, environ, error_utility):
131+ """Generate an OOPS.
132+
133+ :returns: True if the error page was sent to the user, and False if it
134+ couldn't (i.e. if the response body was already started).
135+ """
136+ oopsid = report_oops(environ, error_utility)
137+ if self.body_started:
138+ return False
139+ write = self._real_start_response(_error_status, _error_headers)
140+ write(_oops_html_template % {'oopsid': oopsid})
141+ return True
142+
143+
144+def report_oops(environ, error_utility):
145+ # XXX: We should capture more per-request information to include in
146+ # the OOPS here, e.g. duration, user, etc. But even an OOPS with
147+ # just a traceback and URL is better than nothing.
148+ # - Andrew Bennetts, 2010-07-27.
149+ request = ScriptRequest(
150+ [], URL=construct_url(environ))
151+ error_utility.raising(sys.exc_info(), request)
152+ return request.oopsid
153+
154+
155+def oops_middleware(app):
156+ """Middleware to log an OOPS if the request fails.
157+
158+ If the request fails before the response body has started then this returns
159+ a basic HTML error page with the OOPS ID to the user (and status code 500).
160+ """
161+ error_utility = make_error_utility()
162+ def wrapped_app(environ, start_response):
163+ wrapped = WrappedStartResponse(start_response)
164+ # Start processing this request
165+ try:
166+ app_iter = iter(app(environ, wrapped.start_response))
167+ except:
168+ error_page_sent = wrapped.generate_oops(environ, error_utility)
169+ if error_page_sent:
170+ return
171+ # Could not send error page to user, so... just give up.
172+ raise
173+ # Start yielding the response
174+ while True:
175+ try:
176+ data = app_iter.next()
177+ except StopIteration:
178+ return
179+ except:
180+ error_page_sent = wrapped.generate_oops(environ, error_utility)
181+ if error_page_sent:
182+ return
183+ # Could not send error page to user, so... just give up.
184+ raise
185+ else:
186+ if not wrapped.body_started:
187+ wrapped.really_start()
188+ yield data
189+ return wrapped_app
190
191=== modified file 'scripts/start-loggerhead.py'
192--- scripts/start-loggerhead.py 2010-06-08 15:43:13 +0000
193+++ scripts/start-loggerhead.py 2010-07-27 07:29:59 +0000
194@@ -119,7 +119,7 @@
195
196 from launchpad_loggerhead.debug import (
197 change_kill_thread_criteria, threadpool_debug)
198-from launchpad_loggerhead.app import RootApp
199+from launchpad_loggerhead.app import RootApp, oops_middleware
200 from launchpad_loggerhead.session import SessionHandler
201
202 SESSION_VAR = 'lh.session'
203@@ -153,6 +153,7 @@
204 return wrapped
205 app = set_scheme(app)
206 app = change_kill_thread_criteria(app)
207+app = oops_middleware(app)
208
209 try:
210 httpserver.serve(