Merge lp:~bac/launchpad/bug-341935-captcha into lp:launchpad

Proposed by Brad Crittenden
Status: Merged
Approved by: Brad Crittenden
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~bac/launchpad/bug-341935-captcha
Merge into: lp:launchpad
Diff against target: 319 lines
7 files modified
lib/canonical/launchpad/browser/tests/registration.py (+2/-0)
lib/canonical/launchpad/pagetests/standalone/xx-new-account-redirection-url.txt (+2/-0)
lib/canonical/launchpad/templates/launchpad-login.pt (+23/-5)
lib/canonical/launchpad/webapp/login.py (+47/-0)
lib/lp/registry/stories/foaf/xx-createaccount.txt (+17/-4)
lib/lp/registry/stories/foaf/xx-reg-with-existing-email.txt (+5/-1)
lib/lp/testing/registration.py (+32/-0)
To merge this branch: bzr merge lp:~bac/launchpad/bug-341935-captcha
Reviewer Review Type Date Requested Status
Martin Albisetti (community) ui Approve
Michael Nelson (community) ui* Approve
Edwin Grubbs (community) code ui* Approve
Review via email: mp+13022@code.launchpad.net

Commit message

Add a simple math-based captcha to the registration pages to thwart dumb bots.

To post a comment you must log in.
Revision history for this message
Brad Crittenden (bac) wrote :

= Summary =

Bug 341935 addresses the need to do *something* to disrupt the ability for bots to
trick Launchpad into sending registration email to unsuspecting people. It was
agreed to do a simple, text-based math problem captcha as a first step to try to
defeat the bots. While it would be trivial for bots to defeat this captcha (see the
pagetest!) it is our belief that we are not a real target so no one would invest in a
custom solution for use against Launchpad.

== Proposed fix ==

Add a simple captcha into the form.

== Pre-implementation notes ==

Discussions with Barry and Curtis.

== Implementation details ==

The view for the login page is a mess. It's not a LaunchpadFormView so the additions
were done 'by hand'.

In order to make it a low bar the math question is simple addition with the answer
being in the range [10, 20].

Screenshot at http://people.canonical.com/~bac/captcha.png

== Tests ==

bin/test xx-createaccount.txt

== Demo and Q/A ==

https://launchpad.dev/+login

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/canonical/launchpad/templates/launchpad-login.pt
  lib/lp/registry/stories/foaf/xx-createaccount.txt
  lib/canonical/launchpad/webapp/login.py

Revision history for this message
Edwin Grubbs (edwin-grubbs) wrote :

Hi Brad,

This looks good. I just have one minor comment.

merge-approved

-Edwin

>=== modified file 'lib/lp/registry/stories/foaf/xx-createaccount.txt'
>--- lib/lp/registry/stories/foaf/xx-createaccount.txt 2009-09-11 18:45:31 +0000
>+++ lib/lp/registry/stories/foaf/xx-createaccount.txt 2009-10-07 21:39:02 +0000
>@@ -21,12 +21,34 @@
> The email address you provided isn't valid. Please verify it and try
> again.
>
>-Jane tries again, providing a valid email address and asking to create a new
>-account.
>+Next she enters a valid email address but enters a wrong answer for
>+the incredibly trivial match captcha. She gets an error message.

Did you intend "math captcha" instead of "match captcha"?

> >>> browser.getControl(name='loginpage_email', index=1).value = (
> ... '<email address hidden>')
>- >>> browser.getControl('Register').click()
>+ >>> browser.getControl(name='loginpage_captcha_submission').value = '-1'
>+ >>> browser.getControl('Register').click()
>+ >>> print_feedback_messages(browser.contents)
>+ The answer to the simple math question was incorrect or missing.
>+ Please try again.
>+
>+Jane tries again, providing a the correct captcha answer. Her valid
>+email address from before has been retained in the form.
>+
>+ >>> import re
>+ >>> def get_captcha_answer(contents):
>+ ... expr = re.compile("What is (\d+ .{1} \d+)?")
>+ ... match = expr.search(contents)
>+ ... if match:
>+ ... question = match.group(1)
>+ ... answer = eval(question)
>+ ... return str(answer)
>+ ... return ''
>+
>+ >>> browser.getControl(name='loginpage_captcha_submission').value = (
>+ ... get_captcha_answer(browser.contents))
>+ >>> browser.getControl('Register').click()
>+ >>> print_feedback_messages(browser.contents)
> >>> print extract_text(find_main_content(browser.contents))
> Registration mail sent
> Instructions on completing your registration have been sent to
>@@ -36,7 +58,7 @@
> Launchpad sends Jane an email message containing a token she must use to
> complete the registration process.
>
>- >>> import email, re
>+ >>> import email
> >>> from lp.services.mail import stub
> >>> from canonical.launchpad.ftests.logintoken import (
> ... get_token_url_from_email)
>@@ -91,6 +113,8 @@
> >>> browser.open('http://launchpad.dev/+login')
> >>> browser.getControl(name='loginpage_email', index=1).value = (
> ... '<email address hidden>')
>+ >>> browser.getControl(name='loginpage_captcha_submission').value = (
>+ ... get_captcha_answer(browser.contents))
> >>> browser.getControl('Register').click()
>
> >>> from_addr, to_addrs, raw_msg = stub.test_emails.pop()
>

review: Approve (code ui*)
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Hi bac,

Thanks for getting this done! Looks very straight-forward.

I've noticed that many sites have at least a casual explanation for why they're asking people to answer a maths question. Something like: "We like to make sure that you're not a bot." as a description for the form field, but I'm assuming you've made the decision that there's no benefit?

review: Approve (ui*)
Revision history for this message
Martin Albisetti (beuno) wrote :

08:11 < beuno> noodles775, good morning
08:11 < noodles775> hi beuno
08:11 < beuno> noodles775, looking at the captcha review
08:12 < beuno> it's a great point about telling people why we ask that
08:12 < beuno> I wonder if we could have a (?) icon, that is clickable, and mrevell's help popup comes up
08:12 < beuno> so we don't have so much text on the page
08:12 < noodles775> Yeah, that'd be a good way to keep it minimal :)
08:13 < beuno> the other thing that came to mind was, maybe the field shouldn't be that length
08:13 < beuno> to make it distincctivly different than password
08:13 < beuno> if you don't read labels, it's easy to be fooled
08:16 < noodles775> yeah, true - I hadn't thought of that - mistaking it for a password field simply because it's following the email field.
08:16 < noodles775> I had a quick look at a few other sites (gmail signup and ubuntu forums) and both define a label like "Random question" or "Word
                    verification", and are visually quite different...
08:17 < beuno> right, you may also be able to get away with a very short description
08:17 < beuno> I think the key here is that it look visually different enough
08:17 < beuno> this is where "don't read the merge proposal" comes in useful :)
08:18 < noodles775> So, laying it out like the gmail signup:
08:19 < noodles775> hmm... not sure what we'd use for a label, but having the actual simple math question in the right section (ie. aligned with the inputs)
                    and the input below it could do that.
08:20 < beuno> yeap, that would work
08:21 < noodles775> perhaps the label could simply be 'Random question'.
08:21 < beuno> yeah, or "spam test"
08:21 < beuno> something that gets the message across
08:21 < noodles775> I'll play with it and do a diff.
08:21 < beuno> it's a great point that the label is not a good place to put the question

review: Needs Fixing (ui)
Revision history for this message
Michael Nelson (michael.nelson) wrote :

> Hi bac,
>
> Thanks for getting this done! Looks very straight-forward.
>
> I've noticed that many sites have at least a casual explanation for why
> they're asking people to answer a maths question. Something like: "We like to
> make sure that you're not a bot." as a description for the form field, but I'm
> assuming you've made the decision that there's no benefit?

I was just chatting with Martin, and a few things came out of it:

1. It would be good to ensure that the captcha field is visually different (it's possible that some people won't read the label but will simply type their password in there...being so used to entering their email followed by a password to create accounts etc.)
2. It might be good to have some indication of why we're asking for a maths question (as above) - but it doesn't need to be prominent.

So along those lines, here's two thoughts - see what you think:

1. http://people.canonical.com/~michaeln/tmp/captcha_eg1.png
This deals with (1) by giving the field a normal label and moving the question into the right-hand column (similar to what gmail does with their 'Word verification' when you try to register an account). That said, I don't particularly like the smaller input all on its own... so,

2. http://people.canonical.com/~michaeln/tmp/captcha_eg2.png
shows another idea - modifying the question to simply 8 + 8 = and having the input inline there. This looks much simpler to me and is very hard to mistake as a password field ;). I'll see what you guys think.

Note, in the second screenshot, you'll see that I also added the explanation of the random question to the steps above the fields - this seems like a good place to put it (as it is a required step) and it keeps the form clean. Another thought Martin had was to have a small '?' help link next to the question that would pop-up an explanation.

Anyway, let me know what you think.

Revision history for this message
Brad Crittenden (bac) wrote :

Thanks for the review and suggestions Michael and Martin.

Michael I like your second screenshot and have implemented it. A new screenshot is at http://people.canonical.com/~bac/captcha-2.png and a diff of the changes at http://pastebin.ubuntu.com/288678/

Revision history for this message
Martin Albisetti (beuno) :
review: Approve (ui)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/browser/tests/registration.py'
2--- lib/canonical/launchpad/browser/tests/registration.py 2009-08-05 18:52:52 +0000
3+++ lib/canonical/launchpad/browser/tests/registration.py 2009-10-08 20:25:21 +0000
4@@ -8,6 +8,7 @@
5 from canonical.launchpad.ftests import logout
6 from canonical.launchpad.testing.pages import setupBrowser
7 from canonical.launchpad.webapp import canonical_url
8+from lp.testing.registration import set_captcha_answer
9
10
11 def start_registration_through_the_web(email):
12@@ -20,6 +21,7 @@
13 browser = setupBrowser()
14 browser.open('http://launchpad.dev/+login')
15 browser.getControl(name='loginpage_email', index=1).value = email
16+ set_captcha_answer(browser)
17 browser.getControl('Register').click()
18 return browser
19
20
21=== modified file 'lib/canonical/launchpad/pagetests/standalone/xx-new-account-redirection-url.txt'
22--- lib/canonical/launchpad/pagetests/standalone/xx-new-account-redirection-url.txt 2009-05-12 01:39:29 +0000
23+++ lib/canonical/launchpad/pagetests/standalone/xx-new-account-redirection-url.txt 2009-10-08 20:25:21 +0000
24@@ -39,8 +39,10 @@
25 If she now creates a new account, we'll certainly redirect her back to
26 whatever we had stored on the redirection_url hidden field.
27
28+ >>> from lp.testing.registration import set_captcha_answer
29 >>> anon_browser.getControl('E-mail address:', index=1).value = (
30 ... 'granny@canonical.com')
31+ >>> set_captcha_answer(anon_browser)
32 >>> anon_browser.getControl('Register').click()
33 >>> print anon_browser.contents
34 <...
35
36=== modified file 'lib/canonical/launchpad/templates/launchpad-login.pt'
37--- lib/canonical/launchpad/templates/launchpad-login.pt 2009-07-17 17:59:07 +0000
38+++ lib/canonical/launchpad/templates/launchpad-login.pt 2009-10-08 20:25:21 +0000
39@@ -154,10 +154,11 @@
40 <!-- Preserve extra query parameters as hidden fields. -->
41 <tal:replacement tal:replace="structure view/preserve_query" />
42 </form>
43- <p><strong>Note:</strong> You must enable cookies in your browser to log into or register for Launchpad.</p>
44+ <p><strong>Note:</strong> You must enable cookies in your browser
45+ to log into or register for Launchpad.</p>
46 <!-- 5a. didn't try to register last time: -->
47 <h1 tal:condition="not: view/registration_error">
48- Not registered yet?
49+ Not registered yet?
50 </h1>
51 <!-- 5b. just tried to register and failed: -->
52 <tal:block condition="view/registration_error">
53@@ -168,19 +169,24 @@
54 <!-- 5. might want to register: -->
55 <p>
56 Creating your Launchpad account is easy. Here's what to do:</p>
57-
58+
59 <ol class="subordinate">
60 <li>Make sure cookies are enabled in your browser.</li>
61- <li>Enter your e-mail address.</li>
62+ <li>Enter your e-mail address and answer our random question
63+ so we know that you're human.
64+ </li>
65 <li>Follow the URL in the confirmation e-mail that Launchpad sends and you're done!</li>
66 </ol>
67-
68+
69
70
71 <form name="join" method="POST">
72 <input type="hidden" name="redirection_url"
73 tal:condition="not: request/form/redirection_url|nothing"
74 tal:attributes="value view/getRedirectionURL" />
75+ <input type="hidden"
76+ tal:attributes="name view/captcha_hash;
77+ value view/get_captcha_hash" />
78 <table>
79 <tr>
80 <th>
81@@ -194,6 +200,18 @@
82 </td>
83 </tr>
84 <tr>
85+ <th>
86+ <label>Random question:</label>
87+ </th>
88+ <td>
89+ <span id="problem" tal:content="view/captcha_problem" />
90+ <input type="text" size="5"
91+ id="captcha_submission" value=""
92+ tal:attributes="name view/captcha_submission"
93+ />
94+ </td>
95+ </tr>
96+ <tr>
97 <th></th>
98 <td>
99 <input
100
101=== modified file 'lib/canonical/launchpad/webapp/login.py'
102--- lib/canonical/launchpad/webapp/login.py 2009-07-31 19:56:48 +0000
103+++ lib/canonical/launchpad/webapp/login.py 2009-10-08 20:25:21 +0000
104@@ -8,6 +8,8 @@
105 import cgi
106 import urllib
107 from datetime import datetime, timedelta
108+import md5
109+import random
110
111 from BeautifulSoup import UnicodeDammit
112
113@@ -18,6 +20,7 @@
114
115 from z3c.ptcompat import ViewPageTemplateFile
116
117+from canonical.cachedproperty import cachedproperty
118 from canonical.config import config
119 from canonical.launchpad import _
120 from canonical.launchpad.interfaces.account import AccountStatus
121@@ -167,6 +170,8 @@
122 submit_registration = form_prefix + 'submit_registration'
123 input_email = form_prefix + 'email'
124 input_password = form_prefix + 'password'
125+ captcha_submission = form_prefix + 'captcha_submission'
126+ captcha_hash = form_prefix + 'captcha_hash'
127
128 # Instance variables that represent the state of the form.
129 login_error = None
130@@ -292,12 +297,20 @@
131 redirection_url = redirection_url_list[0]
132
133 self.email = request.form.get(self.input_email).strip()
134+
135 if not valid_email(self.email):
136 self.registration_error = (
137 "The email address you provided isn't valid. "
138 "Please verify it and try again.")
139 return
140
141+ # Validate the user is human, more or less.
142+ if not self.validateCaptcha():
143+ self.registration_error = (
144+ "The answer to the simple math question was incorrect "
145+ "or missing. Please try again.")
146+ return
147+
148 registered_email = getUtility(IEmailAddressSet).getByEmail(self.email)
149 if registered_email is not None:
150 person = registered_email.person
151@@ -386,6 +399,40 @@
152 L.append(html % (name, cgi.escape(value, quote=True)))
153 return '\n'.join(L)
154
155+ def validateCaptcha(self):
156+ """Validate the submitted captcha value matches what we expect."""
157+ expected = self.request.form.get(self.captcha_hash)
158+ submitted = self.request.form.get(self.captcha_submission)
159+ if expected is not None and submitted is not None:
160+ return md5.new(submitted).hexdigest() == expected
161+ return False
162+
163+ @cachedproperty
164+ def captcha_answer(self):
165+ """Get the answer for the current captcha challenge.
166+
167+ With each failed attempt a new challenge will be given. Our answer
168+ space is acknowledged to be ridiculously small but is chosen in the
169+ interest of ease-of-use. We're not trying to create an iron-clad
170+ challenge but only a minimal obstacle to dumb bots.
171+ """
172+ return random.randint(10, 20)
173+
174+ @property
175+ def get_captcha_hash(self):
176+ """Get the captcha hash.
177+
178+ The hash is the value we put in the form for later comparison.
179+ """
180+ return md5.new(str(self.captcha_answer)).hexdigest()
181+
182+ @property
183+ def captcha_problem(self):
184+ """Create the captcha challenge."""
185+ op1 = random.randint(1, self.captcha_answer)
186+ op2 = self.captcha_answer - op1
187+ return '%d + %d =' % (op1, op2)
188+
189
190 def logInPrincipal(request, principal, email):
191 """Log the principal in. Password validation must be done in callsites."""
192
193=== modified file 'lib/lp/registry/stories/foaf/xx-createaccount.txt'
194--- lib/lp/registry/stories/foaf/xx-createaccount.txt 2009-09-11 18:45:31 +0000
195+++ lib/lp/registry/stories/foaf/xx-createaccount.txt 2009-10-08 20:25:21 +0000
196@@ -21,12 +21,24 @@
197 The email address you provided isn't valid. Please verify it and try
198 again.
199
200-Jane tries again, providing a valid email address and asking to create a new
201-account.
202+Next she enters a valid email address but enters a wrong answer for
203+the incredibly trivial math captcha. She gets an error message.
204
205 >>> browser.getControl(name='loginpage_email', index=1).value = (
206 ... 'jane@example.com')
207- >>> browser.getControl('Register').click()
208+ >>> browser.getControl(name='loginpage_captcha_submission').value = '-1'
209+ >>> browser.getControl('Register').click()
210+ >>> print_feedback_messages(browser.contents)
211+ The answer to the simple math question was incorrect or missing.
212+ Please try again.
213+
214+Jane tries again, providing a the correct captcha answer. Her valid
215+email address from before has been retained in the form.
216+
217+ >>> from lp.testing.registration import set_captcha_answer
218+ >>> set_captcha_answer(browser)
219+ >>> browser.getControl('Register').click()
220+ >>> print_feedback_messages(browser.contents)
221 >>> print extract_text(find_main_content(browser.contents))
222 Registration mail sent
223 Instructions on completing your registration have been sent to
224@@ -36,7 +48,7 @@
225 Launchpad sends Jane an email message containing a token she must use to
226 complete the registration process.
227
228- >>> import email, re
229+ >>> import email
230 >>> from lp.services.mail import stub
231 >>> from canonical.launchpad.ftests.logintoken import (
232 ... get_token_url_from_email)
233@@ -91,6 +103,7 @@
234 >>> browser.open('http://launchpad.dev/+login')
235 >>> browser.getControl(name='loginpage_email', index=1).value = (
236 ... 'jperson@example.com')
237+ >>> set_captcha_answer(browser)
238 >>> browser.getControl('Register').click()
239
240 >>> from_addr, to_addrs, raw_msg = stub.test_emails.pop()
241
242=== modified file 'lib/lp/registry/stories/foaf/xx-reg-with-existing-email.txt'
243--- lib/lp/registry/stories/foaf/xx-reg-with-existing-email.txt 2009-05-12 01:39:29 +0000
244+++ lib/lp/registry/stories/foaf/xx-reg-with-existing-email.txt 2009-10-08 20:25:21 +0000
245@@ -15,6 +15,8 @@
246
247 >>> browser.getControl('E-mail address:', index=1).value = (
248 ... 'test@canonical.com')
249+ >>> from lp.testing.registration import set_captcha_answer
250+ >>> set_captcha_answer(browser)
251 >>> browser.getControl('Register').click()
252
253 >>> for message in get_feedback_messages(browser.contents):
254@@ -31,6 +33,7 @@
255 on with the registration process.
256
257 >>> browser.getControl('E-mail address:', index=1).value = 'mpo@iki.fi'
258+ >>> set_captcha_answer(browser)
259 >>> browser.getControl('Register').click()
260
261 >>> print extract_text(find_tag_by_id(browser.contents, 'address'))
262@@ -82,6 +85,7 @@
263
264 >>> anon_browser.getControl('E-mail address:', index=1).value = (
265 ... 'christian.reis@ubuntulinux.com')
266+ >>> set_captcha_answer(anon_browser)
267 >>> anon_browser.getControl('Register').click()
268
269 >>> from_addr, to_addr, msg = stub.test_emails.pop()
270@@ -143,6 +147,7 @@
271
272 >>> browser.getControl('E-mail address:', index=1).value = (
273 ... 'bad-user@canonical.com')
274+ >>> set_captcha_answer(browser)
275 >>> browser.getControl('Register').click()
276
277 >>> print extract_text(find_main_content(browser.contents).p)
278@@ -179,4 +184,3 @@
279 <div ...>This profile cannot be claimed because the account is suspended.
280 Contact a <a href="mailto:feedback@launchpad.net?subject=SU...">Launchpad
281 admin</a> about this issue.</div>
282-
283
284=== added file 'lib/lp/testing/registration.py'
285--- lib/lp/testing/registration.py 1970-01-01 00:00:00 +0000
286+++ lib/lp/testing/registration.py 2009-10-08 20:25:21 +0000
287@@ -0,0 +1,32 @@
288+# Copyright 2009 Canonical Ltd. This software is licensed under the
289+# GNU Affero General Public License version 3 (see the file LICENSE).
290+
291+"""Helper functions dealing with registration in tests.
292+"""
293+__metaclass__ = type
294+
295+__all__ = [
296+ 'get_captcha_answer',
297+ 'set_captcha_answer',
298+ ]
299+
300+import re
301+
302+
303+def get_captcha_answer(contents):
304+ """Search the browser contents and get the captcha answer."""
305+ expr = re.compile("(\d+ .{1} \d+) =")
306+ match = expr.search(contents)
307+ if match:
308+ question = match.group(1)
309+ answer = eval(question)
310+ return str(answer)
311+ return ''
312+
313+
314+def set_captcha_answer(browser, answer=None):
315+ """Given a browser, set the login captcha with the correct answer."""
316+ if answer is None:
317+ answer = get_captcha_answer(browser.contents)
318+ browser.getControl(name='loginpage_captcha_submission').value = (
319+ answer)