Merge lp:~lifeless/launchpad/malone into lp:launchpad

Proposed by Robert Collins
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 11199
Proposed branch: lp:~lifeless/launchpad/malone
Merge into: lp:launchpad
Diff against target: 296 lines (+91/-25)
9 files modified
lib/canonical/database/nl_search.py (+56/-7)
lib/canonical/launchpad/doc/textsearching.txt (+18/-6)
lib/lp/answers/doc/faqtarget.txt (+1/-0)
lib/lp/answers/doc/questiontarget.txt (+5/-0)
lib/lp/answers/stories/question-add-in-other-languages.txt (+0/-2)
lib/lp/answers/stories/question-add.txt (+0/-3)
lib/lp/bugs/doc/bugtask.txt (+8/-0)
lib/lp/bugs/model/bugtask.py (+3/-4)
lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt (+0/-3)
To merge this branch: bzr merge lp:~lifeless/launchpad/malone
Reviewer                 Review Type  Date Requested  Status
Paul Hummer (community)  code                         Approve
Review via email: mp+30507@code.launchpad.net

Commit message

Avoid performing hundreds of thousands of comparisons as a prelude to actual searching - the fti index is the right place to perform relevance assessment.

Description of the change

Do less work searching, so searching is faster. Faster searching -> less load -> fewer timeouts. We all win. Winning is good. Goodness is good.
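
As a rough illustration of the fast path, here is a stand-alone sketch in Python 3. It is not the branch code itself: the names fast_phrase_query and BAD_WORDS are made up for the example, and the terms are assumed to have been extracted (stemmed, stop words removed) already, which in the branch is the job of nl_term_candidates().

```python
# Hypothetical stand-alone sketch of the fast path: prune a small fixed
# set of overly common words and OR the rest together, leaving relevance
# ranking to the full text index.

# Words assumed to be too common to be useful as search terms.
BAD_WORDS = frozenset(['firefox', 'ubuntu', 'launchpad'])


def fast_phrase_query(terms):
    """Return a tsearch2-style OR query built from candidate terms.

    Duplicate terms are collapsed and bad words are dropped using a
    case-insensitive comparison. An empty string means there is nothing
    useful to search for.
    """
    kept = set(term for term in terms if term.lower() not in BAD_WORDS)
    # Sorted only so that the result is deterministic and easy to test.
    return '|'.join(sorted(kept))


if __name__ == '__main__':
    print(fast_phrase_query(['system', 'slow', 'run', 'firefox', 'ubuntu']))
```

No row counting happens here at all, which is where the time goes in the slow path. The cost is a fixed, hand-picked word list instead of per-term statistics.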

Paul Hummer (rockstar) wrote :

8 - extra_constraints_tables=None):
9 + extra_constraints_tables=None,
10 + fast_enabled=True):
11 + """Return the tsearch2 query that should be use to do a phrase search.
12 +
13 + The precise heuristics applied by this function will vary as we tune
14 + the system.
15 +

*used

review: Approve (code)

Preview Diff

1=== modified file 'lib/canonical/database/nl_search.py'
2--- lib/canonical/database/nl_search.py 2009-06-25 05:30:52 +0000
3+++ lib/canonical/database/nl_search.py 2010-07-22 05:49:45 +0000
4@@ -36,7 +36,58 @@
5
6
7 def nl_phrase_search(phrase, table, constraints='',
8- extra_constraints_tables=None):
9+ extra_constraints_tables=None,
10+ fast_enabled=True):
11+ """Return the tsearch2 query that should be used to do a phrase search.
12+
13+ The precise heuristics applied by this function will vary as we tune
14+ the system.
15+
16+ It is the interface by which a user query should be turned into a backend
17+ search language query.
18+
19+ Caveats: The model class must define a 'fti' column which is then used used
20+ for full text searching.
21+
22+ :param phrase: A search phrase.
23+ :param table: This should be the SQLBase class representing the base type.
24+ :param constraints: Additional SQL clause that limits the rows to a subset
25+ of the table.
26+ :param extra_constraints_tables: A list of additional table names that are
27+ needed by the constraints clause.
28+ :param fast_enabled: If true use a fast, but less precise, code path. When
29+ feature flags are available this will be converted to a feature flag.
30+ :return: A tsearch2 query string.
31+ """
32+ terms = nl_term_candidates(phrase)
33+ if len(terms) == 0:
34+ return ''
35+ if fast_enabled:
36+ return _nl_phrase_search(terms, table, constraints,
37+ extra_constraints_tables)
38+ else:
39+ return _slow_nl_phrase_search(terms, table, constraints,
40+ extra_constraints_tables)
41+
42+
43+def _nl_phrase_search(terms, table, constraints, extra_constraints_tables):
44+ """Perform a very simple pruning of the phrase, letting fti do ranking.
45+
46+ See nl_phrase_search for the contract of this function.
47+ """
48+ bad_words = set(['firefox', 'ubuntu', 'launchpad'])
49+ terms = set(terms)
50+ filtered_terms = []
51+ for term in terms:
52+ if term.lower() in bad_words:
53+ continue
54+ filtered_terms.append(term)
55+ # sorted for testing convenience.
56+ return '|'.join(sorted(filtered_terms))
57+
58+
59+def _slow_nl_phrase_search(terms, table, constraints,
60+ extra_constraints_tables):
61 """Return the tsearch2 query that should be use to do a phrase search.
62
63 This function implement an algorithm similar to the one used by MySQL
64@@ -56,7 +107,7 @@
65 closer in the text at the top of the list, while still returning rows that
66 use only some of the terms.
67
68- :phrase: A search phrase.
69+ :terms: Some candidate search terms.
70
71 :table: This should be the SQLBase class representing the base type.
72
73@@ -66,14 +117,12 @@
74 :extra_constraints_tables: A list of additional table names that are
75 needed by the constraints clause.
76
77- Caveat: The SQLBase class must define a 'fti' column .
78- This is the column that is used for full text searching.
79+ Caveat: The model class must define a 'fti' column which is then used
80+ for full text searching.
81 """
82 total = table.select(
83 constraints, clauseTables=extra_constraints_tables).count()
84- term_candidates = nl_term_candidates(phrase)
85- if len(term_candidates) == 0:
86- return ''
87+ term_candidates = terms
88 if total < 5:
89 return '|'.join(term_candidates)
90
91
92=== modified file 'lib/canonical/launchpad/doc/textsearching.txt'
93--- lib/canonical/launchpad/doc/textsearching.txt 2009-07-23 17:49:31 +0000
94+++ lib/canonical/launchpad/doc/textsearching.txt 2010-07-22 05:49:45 +0000
95@@ -588,7 +588,12 @@
96
97 To get the actual tsearch2 query that should be run, you will use the
98 nl_phrase_search() function. This one takes two mandatory parameters and
99-two optional ones. You pass in the search phrase and a SQLObject class.
100+two optional ones. You pass in the search phrase and a database model class.
101+
102+The original nl_phrase_search has proved slow, so there are now two
103+implementations in the core.
104+
105+First we describe the slow implementation.
106
107 The select method of that class will be use to count the number of rows
108 that is matched by each term. Term matching 50% or more of the total
109@@ -608,7 +613,8 @@
110
111 So firefox will be removed from the final query:
112
113- >>> nl_phrase_search('system is slow when running firefox', Question)
114+ >>> nl_phrase_search('system is slow when running firefox', Question,
115+ ... fast_enabled=False)
116 u'system|slow|run'
117
118 >>> nl_term_candidates('how do I do this?')
119@@ -616,6 +622,12 @@
120 >>> nl_phrase_search('how do I do this?', Question)
121 ''
122
123+The fast code path only removes firefox, ubuntu and launchpad today:
124+
125+ >>> nl_phrase_search('system is slow when running firefox on ubuntu',
126+ ... Question)
127+ u'run|slow|system'
128+
129
130 ==== Using other constraints ====
131
132@@ -640,13 +652,13 @@
133 >>> nl_phrase_search(
134 ... 'firefox gets very slow on flickr', Question,
135 ... "Question.product = %s AND Product.active = 't'" % firefox_product.id,
136- ... ['Product'])
137+ ... ['Product'], fast_enabled=False)
138 u'slow|flickr'
139
140 When the query only has stop words or common words in it, the returned
141 query will be the empty string:
142
143- >>> nl_phrase_search('firefox will not do it', Question)
144+ >>> nl_phrase_search('ubuntu will not do it', Question)
145 ''
146
147 When there are no candidate rows, only stemming and stop words removal
148@@ -656,7 +668,7 @@
149 0
150 >>> nl_phrase_search('firefox is very slow on flickr', Question,
151 ... 'product = -1')
152- u'firefox|slow|flickr'
153+ u'flickr|slow'
154
155
156 ==== No keywords filtering with few rows ====
157@@ -693,4 +705,4 @@
158 ... 'firefox is slow', Question,
159 ... 'distribution = %s AND sourcepackagename = %s' % sqlvalues(
160 ... ubuntu, firefox_package.sourcepackagename))
161- u'firefox|slow'
162+ u'slow'
163
164=== modified file 'lib/lp/answers/doc/faqtarget.txt'
165--- lib/lp/answers/doc/faqtarget.txt 2010-03-18 18:55:02 +0000
166+++ lib/lp/answers/doc/faqtarget.txt 2010-07-22 05:49:45 +0000
167@@ -159,6 +159,7 @@
168 >>> for faq in target.findSimilarFAQs('How do I use the Answer Tracker'):
169 ... print faq.title
170 How to answer a question
171+ How to use bug mail
172 How to become a Launchpad king
173
174 The results are ordered by relevancy. The first document is considered
175
176=== modified file 'lib/lp/answers/doc/questiontarget.txt'
177--- lib/lp/answers/doc/questiontarget.txt 2010-04-08 20:24:52 +0000
178+++ lib/lp/answers/doc/questiontarget.txt 2010-07-22 05:49:45 +0000
179@@ -359,6 +359,11 @@
180 ... print t.title
181 New question
182 Another question
183+ Question title3
184+ Question title2
185+ Question title1
186+ Question title0
187+
188
189
190 Answer contacts
191
192=== modified file 'lib/lp/answers/stories/question-add-in-other-languages.txt'
193--- lib/lp/answers/stories/question-add-in-other-languages.txt 2009-11-11 22:17:17 +0000
194+++ lib/lp/answers/stories/question-add-in-other-languages.txt 2010-07-22 05:49:45 +0000
195@@ -45,8 +45,6 @@
196 ... row.first('a').renderContents()
197 'Installation of Java Runtime Environment for Mozilla'
198 'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
199- 'mailto: problem in webpage'
200- '\xd8\xb9\xd9\x83\xd8\xb3 ...'
201
202 >>> for tag in find_tags_by_class(browser.contents, 'warning message'):
203 ... print tag.renderContents()
204
205=== modified file 'lib/lp/answers/stories/question-add.txt'
206--- lib/lp/answers/stories/question-add.txt 2009-11-17 02:33:27 +0000
207+++ lib/lp/answers/stories/question-add.txt 2010-07-22 05:49:45 +0000
208@@ -83,9 +83,6 @@
209 Installation of Java Runtime Environment for Mozilla (Answered)
210 posted on 2006-07-20 by Sample Person
211 in ...mozilla-firefox... package in Ubuntu
212- mailto: problem in webpage (Solved)
213- posted on 2006-07-20 by Sample Person, answered by Foo Bar
214- in ...mozilla-firefox... package in Ubuntu
215 >>> similar_questions.a['href']
216 u'http://answers.../ubuntu/+source/mozilla-firefox/+question/...'
217
218
219=== modified file 'lib/lp/bugs/doc/bugtask.txt'
220--- lib/lp/bugs/doc/bugtask.txt 2010-06-23 13:31:35 +0000
221+++ lib/lp/bugs/doc/bugtask.txt 2010-07-22 05:49:45 +0000
222@@ -1375,6 +1375,13 @@
223 >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
224 ... print "%s: %s" % (similar_bug.id, similar_bug.title)
225 1: Firefox does not support SVG
226+ 2: Blackhole Trash folder
227+ 9: Thunderbird crashes
228+ 10: another test bug
229+ 16: a test bug
230+ 17: a test bug
231+ 18: New Bug
232+
233
234 ... and for SourcePackages.
235
236@@ -1387,6 +1394,7 @@
237 >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
238 ... print "%s: %s" % (similar_bug.id, similar_bug.title)
239 1: Firefox does not support SVG
240+ 18: New Bug
241
242 Private bugs won't show up in the list of similar bugs unless the user
243 is a direct subscriber. We'll demonstrate this by creating a new bug
244
245=== modified file 'lib/lp/bugs/model/bugtask.py'
246--- lib/lp/bugs/model/bugtask.py 2010-07-08 13:10:41 +0000
247+++ lib/lp/bugs/model/bugtask.py 2010-07-22 05:49:45 +0000
248@@ -29,6 +29,7 @@
249
250 from storm.expr import And, Alias, AutoTables, In, Join, LeftJoin, Or, SQL
251 from storm.sqlobject import SQLObjectResultSet
252+from storm.store import EmptyResultSet
253 from storm.zope.interfaces import IResultSet, ISQLObjectResultSet
254
255 import pytz
256@@ -1462,6 +1463,8 @@
257 def findSimilar(self, user, summary, product=None, distribution=None,
258 sourcepackagename=None):
259 """See `IBugTaskSet`."""
260+ if not summary:
261+ return EmptyResultSet()
262 # Avoid circular imports.
263 from lp.bugs.model.bug import Bug
264 search_params = BugTaskSearchParams(user)
265@@ -1482,9 +1485,6 @@
266 else:
267 raise AssertionError('Need either a product or distribution.')
268
269- if not summary:
270- return BugTask.select('1 = 2')
271-
272 search_params.fast_searchtext = nl_phrase_search(
273 summary, Bug, ' AND '.join(constraint_clauses), ['BugTask'])
274 return self.search(search_params)
275@@ -2069,7 +2069,6 @@
276 ', '.join(tables), ' AND '.join(clauses))
277 return clause
278
279-
280 def search(self, params, *args):
281 """See `IBugTaskSet`."""
282 store_selector = getUtility(IStoreSelector)
283
284=== modified file 'lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt'
285--- lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2009-07-15 13:22:29 +0000
286+++ lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2010-07-22 05:49:45 +0000
287@@ -20,9 +20,6 @@
288 #4 Reflow problems with complex page layouts
289 New (0 comments) last updated 2006-07-14...
290 Malone pages that use more complex layouts...
291- #5 Firefox install instructions should be complete
292- New (0 comments) last updated 2006-07-14...
293- All ways of downloading firefox should provide complete...
294
295 If we instead enter a summary that matches bug #4 better, the result will
296 be reversed.