Merge lp:~spiv/launchpad/codebrowse-oops-84838 into lp:launchpad

Proposed by Andrew Bennetts
Status: Merged
Approved by: Michael Hudson-Doyle
Approved revision: no longer in the source branch.
Merged at revision: 11288
Proposed branch: lp:~spiv/launchpad/codebrowse-oops-84838
Merge into: lp:launchpad
Prerequisite: lp:~spiv/launchpad/loggerhead-logging
Diff against target: 210 lines (+148/-1)
4 files modified
configs/development/launchpad-lazr.conf (+3/-0)
lib/canonical/config/schema-lazr.conf (+9/-0)
lib/launchpad_loggerhead/app.py (+134/-0)
scripts/start-loggerhead.py (+2/-1)
To merge this branch: bzr merge lp:~spiv/launchpad/codebrowse-oops-84838
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle Approve
Andrew Bennetts Pending
Review via email: mp+31022@code.launchpad.net

This proposal supersedes a proposal from 2010-07-27.

Commit message

Unhandled errors in codebrowse now log OOPSes

Description of the change

This fixes a 5-digit bug number. :)

This fixes codebrowse to log tracebacks from failed requests as OOPSes, rather than dumping them in debug.log. This is a fairly basic implementation: it shows very elementary HTML to the user, and it doesn't capture any information in the OOPS beyond the traceback and the URL... but even those flaws are improvements over the status quo, and are clearly marked with XXXs.

The only outstanding issue I know of is finding out what the production config values for this should be, particularly the error_dir. And of course arranging for the production OOPSes to get synced and reported on like OOPS reports from other systems.

This is based on an already-approved (but as yet unmerged) branch that makes a more modest tweak to codebrowse's logging.

To post a comment you must log in.
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Posted in a previous version of this proposal

1) Yay!
2) I think it would be better to have some of the state around 'really-called' etc in an object
3) Unless I'm misreading/misremembering, the wsgi protocol is that the application can return an _iterable_, not necessarily an _iterator_, so I think you need to say "app_iter = iter(app(...))".
4) The oops page should still be a "500 Internal Server Error" at the HTTP level.
5) Maybe the oops template rendering can be factored out?
6) Yay!, again

review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Ok, I think I've addressed all your comments, plus a couple of nits you didn't notice :)

It still appears to work after the changes, so that's a plus ;)

Please re-review.

review: Needs Resubmitting
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yay, this looks really good now. Only one near-trivial comment: WrappedStartResponse.write_wrapper should use the body_started property?

Otherwise fine. Do you need me to land this for you?

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

Michael Hudson wrote:
> Review: Approve
> Yay, this looks really good now. Only one near-trivial comment: WrappedStartResponse.write_wrapper should use the body_started property?

Done.

> Otherwise fine. Do you need me to land this for you?

Yes please.

Revision history for this message
Robert Collins (lifeless) wrote :

For the configuration stuff - ask losas@c.c to allocate the details.

-Rob

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

On Tue, 03 Aug 2010 00:25:57 -0000, Andrew Bennetts <email address hidden> wrote:
> Michael Hudson wrote:
> > Review: Approve
> > Yay, this looks really good now. Only one near-trivial comment: WrappedStartResponse.write_wrapper should use the body_started property?
>
> Done.

Woo.

> > Otherwise fine. Do you need me to land this for you?
>
> Yes please.

My lucid install ate itself and I haven't gotten EC2 and so on set up
yet so you might want to chase someone else for this. I'll get to it,
just maybe not very soon :-)

Cheers,
mwh

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'configs/development/launchpad-lazr.conf'
2--- configs/development/launchpad-lazr.conf 2010-07-25 05:16:58 +0000
3+++ configs/development/launchpad-lazr.conf 2010-08-03 00:26:18 +0000
4@@ -52,6 +52,9 @@
5 log_folder: /var/tmp/codebrowse.launchpad.dev/logs
6 launchpad_root: https://code.launchpad.dev/
7 secret_path: configs/development/codebrowse-secret
8+error_dir: /var/tmp/codebrowse.launchpad.dev/errors
9+oops_prefix: CB
10+copy_to_zlog: false
11
12 [codehosting]
13 launch: True
14
15=== modified file 'lib/canonical/config/schema-lazr.conf'
16--- lib/canonical/config/schema-lazr.conf 2010-07-26 09:23:26 +0000
17+++ lib/canonical/config/schema-lazr.conf 2010-08-03 00:26:18 +0000
18@@ -283,6 +283,15 @@
19 # datatype: string
20 secret_path:
21
22+# See [error_reports].
23+copy_to_zlog: False
24+
25+# See [error_reports].
26+error_dir: none
27+
28+# See [error_reports].
29+oops_prefix: CB
30+
31
32 [codehosting]
33 # datatype: string
34
35=== modified file 'lib/launchpad_loggerhead/app.py'
36--- lib/launchpad_loggerhead/app.py 2010-05-15 17:43:59 +0000
37+++ lib/launchpad_loggerhead/app.py 2010-08-03 00:26:18 +0000
38@@ -3,6 +3,7 @@
39
40 import logging
41 import os
42+import sys
43 import threading
44 import urllib
45 import urlparse
46@@ -26,6 +27,8 @@
47 from canonical.config import config
48 from canonical.launchpad.xmlrpc import faults
49 from canonical.launchpad.webapp.vhosts import allvhosts
50+from canonical.launchpad.webapp.errorlog import (
51+ ErrorReportingUtility, ScriptRequest)
52 from lp.code.interfaces.codehosting import (
53 BRANCH_TRANSPORT, LAUNCHPAD_ANONYMOUS)
54 from lp.codehosting.vfs import get_lp_server
55@@ -227,3 +230,134 @@
56 bzr_branch.unlock()
57 finally:
58 lp_server.stop_server()
59+
60+
61+def make_oops_logging_exception_hook(error_utility, request):
62+ """Make a hook for logging OOPSes."""
63+ def log_oops():
64+ error_utility.raising(sys.exc_info(), request)
65+ return log_oops
66+
67+
68+def make_error_utility():
69+ """Make an error utility for logging errors from codebrowse."""
70+ error_utility = ErrorReportingUtility()
71+ error_utility.configure('codebrowse')
72+ return error_utility
73+
74+
75+# XXX: This HTML template should be replaced with the same one that lpnet uses
76+# for reporting OOPSes to users, or at least something that looks similar. But
77+# even this is better than the "Internal Server Error" you'd get otherwise.
78+# - Andrew Bennetts, 2010-07-27.
79+_oops_html_template = '''\
80+<html>
81+<head><title>Oops! %(oopsid)s</title></head>
82+<body>
83+<h1>Oops!</h1>
84+<p>Something broke while generating the page.
85+Please try again in a few minutes, and if the problem persists file a bug at
86+<a href="https://bugs.launchpad.net/launchpad-code"
87+>https://bugs.launchpad.net/launchpad-code</a>
88+and quote OOPS-ID <strong>%(oopsid)s</strong>
89+</p></body></html>'''
90+
91+
92+_error_status = '500 Internal Server Error'
93+_error_headers = [('Content-Type', 'text/html')]
94+
95+
96+class WrappedStartResponse(object):
97+ """Wraps start_response (from a WSGI request) to keep track of whether
98+ start_response was called (and whether the callable it returns has been
99+ called).
100+
101+ Used by oops_middleware.
102+ """
103+
104+ def __init__(self, start_response):
105+ self._real_start_response = start_response
106+ self.response_start = None
107+ self._write_callable = None
108+
109+ @property
110+ def body_started(self):
111+ return self._write_callable is not None
112+
113+ def start_response(self, status, headers, exc_info=None):
114+ # Just keep a note of the values we were called with for now. We don't
115+ # need to invoke the real start_response until the response body
116+ # starts.
117+ self.response_start = (status, headers)
118+ if exc_info is not None:
119+ self.response_start += (exc_info,)
120+ return self.write_wrapper
121+
122+ def really_start(self):
123+ self._write_callable = self._real_start_response(*self.response_start)
124+
125+ def write_wrapper(self, data):
126+ if not self.body_started:
127+ self.really_start()
128+ self._write_callable(data)
129+
130+ def generate_oops(self, environ, error_utility):
131+ """Generate an OOPS.
132+
133+ :returns: True if the error page was sent to the user, and False if it
134+ couldn't (i.e. if the response body was already started).
135+ """
136+ oopsid = report_oops(environ, error_utility)
137+ if self.body_started:
138+ return False
139+ write = self._real_start_response(_error_status, _error_headers)
140+ write(_oops_html_template % {'oopsid': oopsid})
141+ return True
142+
143+
144+def report_oops(environ, error_utility):
145+ # XXX: We should capture more per-request information to include in
146+ # the OOPS here, e.g. duration, user, etc. But even an OOPS with
147+ # just a traceback and URL is better than nothing.
148+ # - Andrew Bennetts, 2010-07-27.
149+ request = ScriptRequest(
150+ [], URL=construct_url(environ))
151+ error_utility.raising(sys.exc_info(), request)
152+ return request.oopsid
153+
154+
155+def oops_middleware(app):
156+ """Middleware to log an OOPS if the request fails.
157+
158+ If the request fails before the response body has started then this returns
159+ a basic HTML error page with the OOPS ID to the user (and status code 500).
160+ """
161+ error_utility = make_error_utility()
162+ def wrapped_app(environ, start_response):
163+ wrapped = WrappedStartResponse(start_response)
164+ # Start processing this request
165+ try:
166+ app_iter = iter(app(environ, wrapped.start_response))
167+ except:
168+ error_page_sent = wrapped.generate_oops(environ, error_utility)
169+ if error_page_sent:
170+ return
171+ # Could not send error page to user, so... just give up.
172+ raise
173+ # Start yielding the response
174+ while True:
175+ try:
176+ data = app_iter.next()
177+ except StopIteration:
178+ return
179+ except:
180+ error_page_sent = wrapped.generate_oops(environ, error_utility)
181+ if error_page_sent:
182+ return
183+ # Could not send error page to user, so... just give up.
184+ raise
185+ else:
186+ if not wrapped.body_started:
187+ wrapped.really_start()
188+ yield data
189+ return wrapped_app
190
191=== modified file 'scripts/start-loggerhead.py'
192--- scripts/start-loggerhead.py 2010-06-08 15:43:13 +0000
193+++ scripts/start-loggerhead.py 2010-08-03 00:26:18 +0000
194@@ -119,7 +119,7 @@
195
196 from launchpad_loggerhead.debug import (
197 change_kill_thread_criteria, threadpool_debug)
198-from launchpad_loggerhead.app import RootApp
199+from launchpad_loggerhead.app import RootApp, oops_middleware
200 from launchpad_loggerhead.session import SessionHandler
201
202 SESSION_VAR = 'lh.session'
203@@ -153,6 +153,7 @@
204 return wrapped
205 app = set_scheme(app)
206 app = change_kill_thread_criteria(app)
207+app = oops_middleware(app)
208
209 try:
210 httpserver.serve(