Merge lp:~lifeless/launchpad/malone into lp:launchpad

Proposed by Robert Collins
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 11199
Proposed branch: lp:~lifeless/launchpad/malone
Merge into: lp:launchpad
Diff against target: 296 lines (+91/-25)
9 files modified
lib/canonical/database/nl_search.py (+56/-7)
lib/canonical/launchpad/doc/textsearching.txt (+18/-6)
lib/lp/answers/doc/faqtarget.txt (+1/-0)
lib/lp/answers/doc/questiontarget.txt (+5/-0)
lib/lp/answers/stories/question-add-in-other-languages.txt (+0/-2)
lib/lp/answers/stories/question-add.txt (+0/-3)
lib/lp/bugs/doc/bugtask.txt (+8/-0)
lib/lp/bugs/model/bugtask.py (+3/-4)
lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt (+0/-3)
To merge this branch: bzr merge lp:~lifeless/launchpad/malone
Reviewer                 Review Type   Date Requested   Status
Paul Hummer (community)  code                           Approve
Review via email: mp+30507@code.launchpad.net

Commit message

Avoid performing hundreds of thousands of comparisons as a prelude to actual searching - the fti index is the right place to perform relevance assessment.

Description of the change

Do less work searching, so searching is faster. Faster searching -> less load -> fewer timeouts. We all win. Winning is good. Goodness is good.
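
For illustration, the fast path can be sketched as a standalone function. This is a minimal sketch under assumptions, not the code from this branch: the names fast_phrase_query and BAD_WORDS are hypothetical, and the input is assumed to be the term list that nl_term_candidates() would return (stop words already removed, stemming already applied). The only work left is dropping a few over-common words and OR-ing the rest, so that ranking is left to the fti index.

```python
# Illustrative sketch only; names here are hypothetical, not Launchpad API.
BAD_WORDS = frozenset(['firefox', 'ubuntu', 'launchpad'])


def fast_phrase_query(terms):
    """Build an OR query from candidate terms, dropping over-common words.

    `terms` stands in for the output of nl_term_candidates(), which is
    assumed to have removed stop words and stemmed the remaining words.
    """
    # De-duplicate and filter in one pass; comparison is case-insensitive.
    kept = set(term for term in terms if term.lower() not in BAD_WORDS)
    # Sorted so the result is deterministic, which keeps tests simple.
    return '|'.join(sorted(kept))


print(fast_phrase_query(['system', 'slow', 'run', 'firefox', 'ubuntu']))
```

No per-term row counts are needed here, which is where the slow path spends its time.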

Paul Hummer (rockstar) wrote :

-                     extra_constraints_tables=None):
+                     extra_constraints_tables=None,
+                     fast_enabled=True):
+    """Return the tsearch2 query that should be use to do a phrase search.
+
+    The precise heuristics applied by this function will vary as we tune
+    the system.
+

*used

review: Approve (code)

Preview Diff

=== modified file 'lib/canonical/database/nl_search.py'
--- lib/canonical/database/nl_search.py 2009-06-25 05:30:52 +0000
+++ lib/canonical/database/nl_search.py 2010-07-22 05:49:45 +0000
@@ -36,7 +36,58 @@
 
 
 def nl_phrase_search(phrase, table, constraints='',
-                     extra_constraints_tables=None):
+                     extra_constraints_tables=None,
+                     fast_enabled=True):
+    """Return the tsearch2 query that should be used to do a phrase search.
+
+    The precise heuristics applied by this function will vary as we tune
+    the system.
+
+    It is the interface by which a user query should be turned into a backend
+    search language query.
+
+    Caveats: The model class must define a 'fti' column which is then used used
+    for full text searching.
+
+    :param phrase: A search phrase.
+    :param table: This should be the SQLBase class representing the base type.
+    :param constraints: Additional SQL clause that limits the rows to a subset
+        of the table.
+    :param extra_constraints_tables: A list of additional table names that are
+        needed by the constraints clause.
+    :param fast_enabled: If true use a fast, but less precise, code path. When
+        feature flags are available this will be converted to a feature flag.
+    :return: A tsearch2 query string.
+    """
+    terms = nl_term_candidates(phrase)
+    if len(terms) == 0:
+        return ''
+    if fast_enabled:
+        return _nl_phrase_search(terms, table, constraints,
+            extra_constraints_tables)
+    else:
+        return _slow_nl_phrase_search(terms, table, constraints,
+            extra_constraints_tables)
+
+
+def _nl_phrase_search(terms, table, constraints, extra_constraints_tables):
+    """Perform a very simple pruning of the phrase, letting fti do ranking.
+
+    See nl_phrase_search for the contract of this function.
+    """
+    bad_words = set(['firefox', 'ubuntu', 'launchpad'])
+    terms = set(terms)
+    filtered_terms = []
+    for term in terms:
+        if term.lower() in bad_words:
+            continue
+        filtered_terms.append(term)
+    # sorted for testing convenience.
+    return '|'.join(sorted(filtered_terms))
+
+
+def _slow_nl_phrase_search(terms, table, constraints,
+                           extra_constraints_tables):
     """Return the tsearch2 query that should be use to do a phrase search.
 
     This function implement an algorithm similar to the one used by MySQL
@@ -56,7 +107,7 @@
     closer in the text at the top of the list, while still returning rows that
     use only some of the terms.
 
-    :phrase: A search phrase.
+    :terms: Some candidate search terms.
 
     :table: This should be the SQLBase class representing the base type.
 
@@ -66,14 +117,12 @@
     :extra_constraints_tables: A list of additional table names that are
     needed by the constraints clause.
 
-    Caveat: The SQLBase class must define a 'fti' column .
-    This is the column that is used for full text searching.
+    Caveat: The model class must define a 'fti' column which is then used
+    for full text searching.
     """
     total = table.select(
         constraints, clauseTables=extra_constraints_tables).count()
-    term_candidates = nl_term_candidates(phrase)
-    if len(term_candidates) == 0:
-        return ''
+    term_candidates = terms
     if total < 5:
         return '|'.join(term_candidates)
 

=== modified file 'lib/canonical/launchpad/doc/textsearching.txt'
--- lib/canonical/launchpad/doc/textsearching.txt 2009-07-23 17:49:31 +0000
+++ lib/canonical/launchpad/doc/textsearching.txt 2010-07-22 05:49:45 +0000
@@ -588,7 +588,12 @@
 
 To get the actual tsearch2 query that should be run, you will use the
 nl_phrase_search() function. This one takes two mandatory parameters and
-two optional ones. You pass in the search phrase and a SQLObject class.
+two optional ones. You pass in the search phrase and a database model class.
+
+The original nl_phrase_search has proved slow, so there are now two
+implementations in the core.
+
+First we describe the slow implementation.
 
 The select method of that class will be use to count the number of rows
 that is matched by each term. Term matching 50% or more of the total
@@ -608,7 +613,8 @@
 
 So firefox will be removed from the final query:
 
-    >>> nl_phrase_search('system is slow when running firefox', Question)
+    >>> nl_phrase_search('system is slow when running firefox', Question,
+    ...                  fast_enabled=False)
     u'system|slow|run'
 
     >>> nl_term_candidates('how do I do this?')
@@ -616,6 +622,12 @@
     >>> nl_phrase_search('how do I do this?', Question)
     ''
 
+The fast code path only removes firefox, ubuntu and launchpad today:
+
+    >>> nl_phrase_search('system is slow when running firefox on ubuntu',
+    ...                  Question)
+    u'run|slow|system'
+
 
 ==== Using other constraints ====
 
@@ -640,13 +652,13 @@
     >>> nl_phrase_search(
     ...     'firefox gets very slow on flickr', Question,
     ...     "Question.product = %s AND Product.active = 't'" % firefox_product.id,
-    ...     ['Product'])
+    ...     ['Product'], fast_enabled=False)
     u'slow|flickr'
 
 When the query only has stop words or common words in it, the returned
 query will be the empty string:
 
-    >>> nl_phrase_search('firefox will not do it', Question)
+    >>> nl_phrase_search('ubuntu will not do it', Question)
     ''
 
 When there are no candidate rows, only stemming and stop words removal
@@ -656,7 +668,7 @@
     0
     >>> nl_phrase_search('firefox is very slow on flickr', Question,
     ...                  'product = -1')
-    u'firefox|slow|flickr'
+    u'flickr|slow'
 
 
 ==== No keywords filtering with few rows ====
@@ -693,4 +705,4 @@
     ...     'firefox is slow', Question,
     ...     'distribution = %s AND sourcepackagename = %s' % sqlvalues(
     ...     ubuntu, firefox_package.sourcepackagename))
-    u'firefox|slow'
+    u'slow'

=== modified file 'lib/lp/answers/doc/faqtarget.txt'
--- lib/lp/answers/doc/faqtarget.txt 2010-03-18 18:55:02 +0000
+++ lib/lp/answers/doc/faqtarget.txt 2010-07-22 05:49:45 +0000
@@ -159,6 +159,7 @@
     >>> for faq in target.findSimilarFAQs('How do I use the Answer Tracker'):
     ...     print faq.title
     How to answer a question
+    How to use bug mail
     How to become a Launchpad king
 
 The results are ordered by relevancy. The first document is considered

=== modified file 'lib/lp/answers/doc/questiontarget.txt'
--- lib/lp/answers/doc/questiontarget.txt 2010-04-08 20:24:52 +0000
+++ lib/lp/answers/doc/questiontarget.txt 2010-07-22 05:49:45 +0000
@@ -359,6 +359,11 @@
     ...     print t.title
     New question
     Another question
+    Question title3
+    Question title2
+    Question title1
+    Question title0
+
 
 
 Answer contacts

=== modified file 'lib/lp/answers/stories/question-add-in-other-languages.txt'
--- lib/lp/answers/stories/question-add-in-other-languages.txt 2009-11-11 22:17:17 +0000
+++ lib/lp/answers/stories/question-add-in-other-languages.txt 2010-07-22 05:49:45 +0000
@@ -45,8 +45,6 @@
     ...     row.first('a').renderContents()
     'Installation of Java Runtime Environment for Mozilla'
     'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
-    'mailto: problem in webpage'
-    '\xd8\xb9\xd9\x83\xd8\xb3 ...'
 
     >>> for tag in find_tags_by_class(browser.contents, 'warning message'):
     ...     print tag.renderContents()

=== modified file 'lib/lp/answers/stories/question-add.txt'
--- lib/lp/answers/stories/question-add.txt 2009-11-17 02:33:27 +0000
+++ lib/lp/answers/stories/question-add.txt 2010-07-22 05:49:45 +0000
@@ -83,9 +83,6 @@
     Installation of Java Runtime Environment for Mozilla (Answered)
     posted on 2006-07-20 by Sample Person
     in ...mozilla-firefox... package in Ubuntu
-    mailto: problem in webpage (Solved)
-    posted on 2006-07-20 by Sample Person, answered by Foo Bar
-    in ...mozilla-firefox... package in Ubuntu
     >>> similar_questions.a['href']
     u'http://answers.../ubuntu/+source/mozilla-firefox/+question/...'
 

=== modified file 'lib/lp/bugs/doc/bugtask.txt'
--- lib/lp/bugs/doc/bugtask.txt 2010-06-23 13:31:35 +0000
+++ lib/lp/bugs/doc/bugtask.txt 2010-07-22 05:49:45 +0000
@@ -1375,6 +1375,13 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
+    2: Blackhole Trash folder
+    9: Thunderbird crashes
+    10: another test bug
+    16: a test bug
+    17: a test bug
+    18: New Bug
+
 
 ... and for SourcePackages.
 
@@ -1387,6 +1394,7 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
+    18: New Bug
 
 Private bugs won't show up in the list of similar bugs unless the user
 is a direct subscriber. We'll demonstrate this by creating a new bug

=== modified file 'lib/lp/bugs/model/bugtask.py'
--- lib/lp/bugs/model/bugtask.py 2010-07-08 13:10:41 +0000
+++ lib/lp/bugs/model/bugtask.py 2010-07-22 05:49:45 +0000
@@ -29,6 +29,7 @@
 
 from storm.expr import And, Alias, AutoTables, In, Join, LeftJoin, Or, SQL
 from storm.sqlobject import SQLObjectResultSet
+from storm.store import EmptyResultSet
 from storm.zope.interfaces import IResultSet, ISQLObjectResultSet
 
 import pytz
@@ -1462,6 +1463,8 @@
     def findSimilar(self, user, summary, product=None, distribution=None,
                     sourcepackagename=None):
         """See `IBugTaskSet`."""
+        if not summary:
+            return EmptyResultSet()
         # Avoid circular imports.
         from lp.bugs.model.bug import Bug
         search_params = BugTaskSearchParams(user)
@@ -1482,9 +1485,6 @@
         else:
             raise AssertionError('Need either a product or distribution.')
 
-        if not summary:
-            return BugTask.select('1 = 2')
-
         search_params.fast_searchtext = nl_phrase_search(
             summary, Bug, ' AND '.join(constraint_clauses), ['BugTask'])
         return self.search(search_params)
@@ -2069,7 +2069,6 @@
             ', '.join(tables), ' AND '.join(clauses))
         return clause
 
-
     def search(self, params, *args):
         """See `IBugTaskSet`."""
         store_selector = getUtility(IStoreSelector)

=== modified file 'lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt'
--- lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2009-07-15 13:22:29 +0000
+++ lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2010-07-22 05:49:45 +0000
@@ -20,9 +20,6 @@
     #4 Reflow problems with complex page layouts
     New (0 comments) last updated 2006-07-14...
     Malone pages that use more complex layouts...
-    #5 Firefox install instructions should be complete
-    New (0 comments) last updated 2006-07-14...
-    All ways of downloading firefox should provide complete...
 
 If we instead enter a summary that matches bug #4 better, the result will
 be reversed.